Example doesn't "learn" anything #6

sigaloid · 2022-06-22T00:18:19Z

After running the example with some added debugging code, I find that the agent does not seem to learn anything at all.

    let mut trainer = AgentTrainer::new();
    let mut agent = MyAgent {
        state: MyState { x: 0, y: 0 },
    };
    trainer.train(
        &mut agent,
        &QLearning::new(0.2, 0.01, 2.),
        &mut FixedIterations::new(10000000),
        &RandomExploration::new(),
    );
    let state1 = MyState { x: 1, y: 0 };
    let state2 = MyState { x: 0, y: 1 };
    let actions = vec![MyAction { dx: 0, dy: -1 }, MyAction { dx: -1, dy: 0 }];
    for action in actions {
        println!(
            "1: {:?} {:?} {:?}",
            state1,
            action,
            trainer.expected_value(&state1, &action),
        );
        println!(
            "2: {:?} {:?} {:?}",
            state2,
            action,
            trainer.expected_value(&state2, &action),
        );
        println!();
    }

Output:

    1: MyState { x: 1, y: 0 } MyAction { dx: 0, dy: -1 } Some(-13.582118848154376)
    2: MyState { x: 0, y: 1 } MyAction { dx: 0, dy: -1 } Some(-14.27795681221249)

    1: MyState { x: 1, y: 0 } MyAction { dx: -1, dy: 0 } Some(-14.27795681221249)
    2: MyState { x: 0, y: 1 } MyAction { dx: -1, dy: 0 } Some(-13.582118848154376)

It seems it hasn't learned that, even at x:1, y:0, the move dx:-1, dy:0 is the best one. Am I misunderstanding the example?

milanboers · 2022-06-22T05:57:05Z

It needs to arrive at 10,10 and you are at 1,0. Both dx:-1,dy:0 and dx:0,dy:-1 are moving away from the target. The best move is either dx:1,dy:0 or dx:0,dy:1.
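
To make the point above concrete, here is a minimal standalone sketch that does not use the library at all. It assumes the reward is the negative Euclidean distance to the target (10, 10); that reward function is an assumption about the example, and `State`, `Action`, `reward`, `step`, and `best_action` are hypothetical names introduced only for illustration. It compares the immediate reward after each of the four moves from (1, 0), which is a one-step greedy check and not the learned Q-values that `expected_value` returns.

```rust
// Hypothetical stand-ins for the example's types; this is not the library's API.
#[derive(Debug, Clone, Copy, PartialEq)]
struct State {
    x: i32,
    y: i32,
}

#[derive(Debug, Clone, Copy, PartialEq)]
struct Action {
    dx: i32,
    dy: i32,
}

// Target taken from the comment above: the agent must reach (10, 10).
const TARGET: State = State { x: 10, y: 10 };

const ACTIONS: [Action; 4] = [
    Action { dx: 0, dy: -1 },
    Action { dx: 0, dy: 1 },
    Action { dx: -1, dy: 0 },
    Action { dx: 1, dy: 0 },
];

/// Assumed reward: negative Euclidean distance from `s` to the target.
fn reward(s: State) -> f64 {
    let dx = (TARGET.x - s.x) as f64;
    let dy = (TARGET.y - s.y) as f64;
    -(dx * dx + dy * dy).sqrt()
}

/// Apply an action to a state.
fn step(s: State, a: Action) -> State {
    State { x: s.x + a.dx, y: s.y + a.dy }
}

/// Pick the action whose resulting state has the highest immediate reward.
fn best_action(s: State, actions: &[Action]) -> Action {
    actions
        .iter()
        .copied()
        .max_by(|a, b| {
            reward(step(s, *a))
                .partial_cmp(&reward(step(s, *b)))
                .expect("rewards are finite")
        })
        .expect("at least one action")
}

fn main() {
    let start = State { x: 1, y: 0 };
    for a in ACTIONS {
        // Moves with negative dx or dy end up farther from (10, 10).
        println!("{:?} -> {:?}: reward {:.3}", a, step(start, a), reward(step(start, a)));
    }
    println!("greedy choice from {:?}: {:?}", start, best_action(start, &ACTIONS));
}
```

Under these assumptions, the two actions queried in the original debugging code are both among the worse options, so similar low values for them are consistent with an agent that has learned to move toward the target.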

sigaloid · 2022-06-22T20:05:41Z

Oh okay, I see. I misunderstood the goal. Thank you very much!

sigaloid closed this as completed Jun 22, 2022
