Mindmaker is returning the same action for every receiveAction call during Evaluation phase. #21
Comments
It sounds like your agent learned an incorrect strategy and got caught in a local equilibrium rather than discovering whatever global strategy you wanted it to learn. I suggest changing the algorithm or experimenting with the hyperparameters. This is not at all uncommon; finding good hyperparameters can be tricky. |
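(As a concrete illustration of what "changing the algorithm" means in practice: with the stable-baselines library linked later in this thread, swapping algorithms is mostly a one-line change. A minimal sketch with a placeholder environment and untested hyperparameter values:)
import gym
from stable_baselines import A2C, PPO2

env = gym.make("CartPole-v1")  # placeholder for the MindMaker-wrapped environment

# Try one algorithm; if it settles into a poor local strategy, swap in
# another and/or adjust hyperparameters such as the learning rate:
model = A2C("MlpPolicy", env, learning_rate=7e-4, verbose=1)
# model = PPO2("MlpPolicy", env, learning_rate=2.5e-4, ent_coef=0.01, verbose=1)
model.learn(total_timesteps=50000)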
Oh okay, I'll look into it and let you know if I need more help with this. Thank you very much for the paper you sent me, especially since it also focuses on autonomous driving :) By the way, could this have any influence on the results? |
UPDATE: @krumiaa Okay, so I've been changing a few things. Instead of a discrete action space I am now using a continuous action space of 2 float values between -1 and 1 that go directly into the throttle and steering inputs of my car. I figured this would simplify my work a bit more and also give me some more insight into what the algorithm is actually producing. As it turns out, this runs into the same issue as before, but it gave me a lot more insight.
The algorithm correctly produces a general strategy for avoiding a wall in this case. If, for example, I place the car facing a wall, it will produce a forward throttle input and a steering input that steers it away from the wall in the direction of the road.
However, the float values of the actions that are produced do not change at all in the evaluation phase, which to my understanding should be impossible, since the input values are changing simply by the car moving. It seems to me like the algorithm generalizes the entire dataset into one singular action, which is simply put out for every frame regardless of the inputs. This obviously isn't what we're aiming for, so I'm a little bit confused about how to fix this error, since in the cartpole example it perfectly adapts to new inputs constantly.
Any idea what I could try? Many thanks in advance - Floroid <3 |
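(For reference, a continuous two-float action space like the one described above would typically be declared with Gym's Box space. This is a generic sketch, not MindMaker's actual setup code:)
import numpy as np
from gym import spaces

# Two continuous actions in [-1, 1]: throttle and steering.
action_space = spaces.Box(low=-1.0, high=1.0, shape=(2,), dtype=np.float32)

sample = action_space.sample()  # e.g. array([ 0.42, -0.87], dtype=float32)
throttle, steering = float(sample[0]), float(sample[1])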
In most of the algorithms in the DRL MindMaker there is an automatic transition from exploration to exploitation, so you don't necessarily need to choose just one or the other. Read the docs on SAC, for instance:
https://stable-baselines.readthedocs.io/en/master/modules/sac.html
As the algorithm trains, it will automatically start choosing more exploitative actions and fewer exploratory ones. I don't have a lot of experience with self-driving cars, but you could look at something like this blog and the values they used for epsilon-greedy, which controls the explore/exploit trade-off:
https://www.inspiritai.com/blogs/ai-blog/2021/9/4/student-blog-project-on-self-driving-cars
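(To make the explore/exploit trade-off concrete: in the stable-baselines library linked above, DQN exposes an epsilon-greedy schedule directly as constructor arguments. A minimal sketch with a placeholder environment and illustrative values:)
import gym
from stable_baselines import DQN

env = gym.make("CartPole-v1")  # placeholder environment

# Epsilon anneals from 1.0 down to exploration_final_eps over the first
# 20% of training, after which the agent acts greedily 98% of the time.
model = DQN("MlpPolicy", env,
            exploration_fraction=0.2,
            exploration_final_eps=0.02,
            verbose=1)
model.learn(total_timesteps=50000)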
|
I believe what doesn't quite click with me yet is why, while the inputs change, the outputs during the evaluation phase never change by even 0.0001, which to my understanding shouldn't be mathematically possible. So I guess I'm a little fearful that the issue lies outside of the hyperparameters, but I'm not sure either ^^° It clearly works in the cartpole example, because the output changes depending on which side the pole is falling towards, but my output doesn't change at all @krumiaa |
To further showcase the exact issue, here is a video example of a 25,000-training-episode session. At around 12 seconds the training is completed and it switches to the evaluation, where suddenly it starts only doing right turns with the same static outputs. During the training phase it seems to slowly learn to drive without any anomalies, as you can see in the first 12 seconds. From what I understand, when I load up the model it will run however many evaluation episodes I told it to run again, but it will not continue the training where I left off, even though it will still print out the training episodes. It's just that in the MindMaker executable it no longer says that training is in progress. |
So I ran into another weird anomaly :) @krumiaa I sincerely apologize for the amount of questions. I hope it's not too much. |
sorry to bother, but I still need help with these issues @krumiaa |
When loading a model, try setting the number of training episodes to zero, or vice versa. I think it's currently configured for continuous training with the loaded model rather than just exploitation; that's probably what's causing the odd behavior. If you wanted to load a model only for demonstration, you could go to the python source code included with the examples, dig through it, and modify it as necessary, as per the code here: https://stable-baselines.readthedocs.io/en/master/guide/examples.html I'm currently working on some other projects related to MindMaker; if this is mission-critical for your project, we could discuss options for having me consult with you on this and modify the source to fit your needs. |
Thank you very much @krumiaa I will look into it and let you know if consultation is needed :) Does that mean I can continue training a model when I load it? Also - do you have any clues as to why the evaluation phase is always spitting out the same action for me? |
Yes, right now I believe it's set up for continuous training after loading a saved model, rather than exploitation. The lines to modify in the source code (quoted verbatim from the file, typos and all) are:
print("Loading the Agent for Continous Training")
logmessages = "Loading the Agent for Continous Training"
sio.emit('messages', logmessages)
obs = env.reset()
intaction = 0
#Begin strategic behvaior
model.set_env(env)
model.learn(total_timesteps=trainepisodes)
You would want to use some code like the following to evaluate it instead:
import gym
from stable_baselines import DQN
from stable_baselines.common.evaluation import evaluate_policy

env = gym.make("LunarLander-v2")
# Load the trained agent
model = DQN.load("dqn_lunar", env=env)
# Evaluate the agent
mean_reward, std_reward = evaluate_policy(model, env, n_eval_episodes=10)
# Enjoy the trained agent
obs = env.reset()
for i in range(1000):
    action, _states = model.predict(obs)
    obs, rewards, dones, info = env.step(action)
    env.render()
The key difference is calling model.predict(obs) instead of model.learn().
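(Putting the pieces together: a sketch of what the modified load branch in the MindMaker source might look like. This is not the actual file contents; it assumes env, model, and sio exist as in the snippet quoted above, and evalepisodes is a placeholder name for however many evaluation steps you want:)
print("Loading the Agent for Evaluation Only")
sio.emit('messages', "Loading the Agent for Evaluation Only")
model.set_env(env)
obs = env.reset()
evalepisodes = 100  # placeholder evaluation budget
# Exploit the learned policy instead of continuing to train it:
# model.learn() is replaced with a predict/step loop.
for i in range(evalepisodes):
    action, _states = model.predict(obs)
    obs, reward, done, info = env.step(action)
    if done:
        obs = env.reset()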
|
Which file do I need to open to find this code, and where do I need to look for it? I assume it will be local to the project; however, there are a lot of files and I'm not sure which file I'm looking for. @krumiaa |
Still needing help with this ^^ I'm very lost in the files haha @krumiaa |
Another quick reminder @krumiaa |
I believe it's in the Content MindMaker folder under Source. It should be the only .py file in the entire project directory, if you want to do a search for *.py.
|
So I'm in my project directory under the MindMaker folder searching for only *.py files, but that brings up over 250 files. I also ran a PowerShell command through the directory to search for files containing various search strings such as model.learn or model.predict(obs), but to no avail. |
If you download the latest version of MindMaker DRL from the Marketplace and look under Content\MindMaker\Source, there is a file called mindmakerUE5.py. This is the file you want. |
I'm currently using UE 4.27 for this project. Will this still work under that version? I'm a little bit afraid to break something, because UE 4.27 is no longer among the supported engine versions on the Marketplace page. |
I updated the Marketplace listing of MindMaker so it has the 4.27 version. If you download it and go to Content\MindMaker\MindMakerSource, you will see a file called mindmakerUE4.py; this is the python source file you will want to modify. |
That's wonderful news! I'll try it out now :) |
I've set everything up in my project now and the agent is training without any errors. However, when the evaluation phase starts, it will continue executing the same action it executed in the first evaluation episode for every evaluation episode until the evaluation is done.
What could be causing this, and how could I go about fixing it? Until I can actually properly evaluate what the agent is learning, I can't really make use of the training, so fixing this is crucial for me.
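(One way to narrow down where the constant action comes from, sketched against the stable-baselines API rather than MindMaker's own loop, assuming env and model are set up as in the snippets above: print the observation alongside the prediction at each step. If the observations change but the sampled actions never do, the policy has collapsed to a single action; if the observations themselves never change, the environment state isn't reaching the model:)
import numpy as np

obs = env.reset()
for step in range(20):
    # deterministic=False samples from the policy; a healthy stochastic
    # policy should show at least small step-to-step variation
    action, _states = model.predict(obs, deterministic=False)
    print(step, np.round(obs, 4), np.round(action, 4))
    obs, reward, done, info = env.step(action)
    if done:
        obs = env.reset()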