Example doesn't "learn" anything #6

Closed
sigaloid opened this issue Jun 22, 2022 · 2 comments
Comments

@sigaloid

Running the example and adding some debugging code, I'm finding that the neural network is not learning anything at all.

    let mut trainer = AgentTrainer::new();
    let mut agent = MyAgent {
        state: MyState { x: 0, y: 0 },
    };
    trainer.train(
        &mut agent,
        &QLearning::new(0.2, 0.01, 2.),
        &mut FixedIterations::new(10000000),
        &RandomExploration::new(),
    );
    let state1 = MyState { x: 1, y: 0 };
    let state2 = MyState { x: 0, y: 1 };
    let actions = vec![MyAction { dx: 0, dy: -1 }, MyAction { dx: -1, dy: 0 }];
    for action in actions {
        println!(
            "1: {:?} {:?} {:?}",
            state1,
            action,
            trainer.expected_value(&state1, &action),
        );
        println!(
            "2: {:?} {:?} {:?}",
            state2,
            action,
            trainer.expected_value(&state2, &action),
        );
        println!();
    }

Output:

    1: MyState { x: 1, y: 0 } MyAction { dx: 0, dy: -1 } Some(-13.582118848154376)
    2: MyState { x: 0, y: 1 } MyAction { dx: 0, dy: -1 } Some(-14.27795681221249)

    1: MyState { x: 1, y: 0 } MyAction { dx: -1, dy: 0 } Some(-14.27795681221249)
    2: MyState { x: 0, y: 1 } MyAction { dx: -1, dy: 0 } Some(-13.582118848154376)

It seems that it hasn't learned that even with x:1 and y:0, dx:-1 and dy:0 is the best move. Am I misunderstanding the example or anything here?

@milanboers
Owner

It needs to arrive at 10,10 and you are at 1,0. Both dx:-1,dy:0 and dx:0,dy:-1 are moving away from the target. The best move is either dx:1,dy:0 or dx:0,dy:1.
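
To make the geometry concrete, here is a minimal standalone sketch that does not use the library at all. It assumes the target is 10,10 as stated above and measures plain Euclidean distance. The `MyState` and `MyAction` structs are stand-ins mirroring the names in the snippet, and `distance_to_target`, `apply`, and `moves_closer` are hypothetical helpers introduced only for illustration. The example's real reward function and any bounds clamping may differ.

```rust
// Stand-ins for the example's types; not the library's definitions.
#[derive(Debug, Clone, Copy)]
struct MyState {
    x: i32,
    y: i32,
}

#[derive(Debug, Clone, Copy)]
struct MyAction {
    dx: i32,
    dy: i32,
}

// Target position, taken from the reply above.
const TARGET: MyState = MyState { x: 10, y: 10 };

/// Euclidean distance from `s` to the target (assumed metric).
fn distance_to_target(s: MyState) -> f64 {
    let dx = (TARGET.x - s.x) as f64;
    let dy = (TARGET.y - s.y) as f64;
    (dx * dx + dy * dy).sqrt()
}

/// State reached by applying `a` to `s` (no bounds clamping in this sketch).
fn apply(s: MyState, a: MyAction) -> MyState {
    MyState {
        x: s.x + a.dx,
        y: s.y + a.dy,
    }
}

/// True if taking `a` from `s` ends strictly closer to the target.
fn moves_closer(s: MyState, a: MyAction) -> bool {
    distance_to_target(apply(s, a)) < distance_to_target(s)
}

fn main() {
    let start = MyState { x: 1, y: 0 };
    let actions = [
        MyAction { dx: -1, dy: 0 },
        MyAction { dx: 0, dy: -1 },
        MyAction { dx: 1, dy: 0 },
        MyAction { dx: 0, dy: 1 },
    ];
    for &a in actions.iter() {
        println!(
            "{:?}: distance after move {:.3}, closer: {}",
            a,
            distance_to_target(apply(start, a)),
            moves_closer(start, a)
        );
    }
}
```

Under these assumptions, the two actions tested in the issue increase the distance from 1,0 to the target, while dx:1,dy:0 and dx:0,dy:1 decrease it.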

@sigaloid
Author

Oh okay, I see. I misunderstood the goal. Thank you very much!
