Mindmaker is returning the same action for every receiveAction call during Evaluation phase. #21
Comments
It sounds like your agent learned an incorrect strategy and got caught in a local equilibrium rather than discovering whatever global strategy you wanted it to learn. I suggest changing the algorithm or experimenting with the hyperparameters. This is not at all uncommon; finding good hyperparameters can be tricky. |
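(As a concrete illustration of what "changing the algorithm" means in practice: with the stable-baselines library linked later in this thread, swapping algorithms is mostly a one-line change. A minimal sketch with a placeholder environment and untested hyperparameter values:)
import gym
from stable_baselines import A2C, PPO2

env = gym.make("CartPole-v1")  # placeholder for the MindMaker-wrapped environment

# Try one algorithm; if it settles into a poor local strategy, swap in
# another and/or adjust hyperparameters such as the learning rate:
model = A2C("MlpPolicy", env, learning_rate=7e-4, verbose=1)
# model = PPO2("MlpPolicy", env, learning_rate=2.5e-4, ent_coef=0.01, verbose=1)
model.learn(total_timesteps=50000)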
Oh okay, I'll look into it and let you know if I need more help with this. Thank you very much for the paper you sent me, especially since it also focuses on autonomous driving :) By the way, could this have any influence on the results? |
UPDATE: @krumiaa Okay, so I've been changing a few things. Instead of a discrete action space I am now using a continuous action space of 2 float values between -1 and 1 that go directly into the throttle and steering inputs of my car. I figured this would simplify my work a bit more and also give me some more insight into what the algorithm is actually producing. As it turns out, this runs into the same issue as before, but it gave me a lot more insight.
The algorithm correctly produces a general strategy for avoiding a wall in this case. If, for example, I place the car facing a wall, it will produce a forward throttle input and a steering input that steers it away from the wall in the direction of the road.
However, the float values of the actions that are produced do not change at all in the evaluation phase, which to my understanding should be impossible, since the input values are changing simply by the car moving. It seems to me like the algorithm generalizes the entire dataset into one singular action, which is simply put out for every frame regardless of the inputs. This obviously isn't what we're aiming for, so I'm a little bit confused about how to fix this error, since in the cartpole example it perfectly adapts to new inputs constantly.
Any idea what I could try? Many thanks in advance - Floroid <3 |
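(For reference, a continuous two-float action space like the one described above would typically be declared with Gym's Box space. This is a generic sketch, not MindMaker's actual setup code:)
import numpy as np
from gym import spaces

# Two continuous actions in [-1, 1]: throttle and steering.
action_space = spaces.Box(low=-1.0, high=1.0, shape=(2,), dtype=np.float32)

sample = action_space.sample()  # e.g. array([ 0.42, -0.87], dtype=float32)
throttle, steering = float(sample[0]), float(sample[1])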
In most of the algorithms in the DRL MindMaker there is an automatic transition from exploration to exploitation, so you don't necessarily need to choose just one or the other. Read the docs on SAC, for instance:
https://stable-baselines.readthedocs.io/en/master/modules/sac.html
As the algorithm trains, it will automatically start choosing more exploitative actions and fewer exploratory ones. I don't have a lot of experience with self-driving cars, but you could look at something like this blog and the values they used for epsilon-greedy, which controls the explore/exploit trade-off:
https://www.inspiritai.com/blogs/ai-blog/2021/9/4/student-blog-project-on-self-driving-cars
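(To make the explore/exploit trade-off concrete: in the stable-baselines library linked above, DQN exposes an epsilon-greedy schedule directly as constructor arguments. A minimal sketch with a placeholder environment and illustrative values:)
import gym
from stable_baselines import DQN

env = gym.make("CartPole-v1")  # placeholder environment

# Epsilon anneals from 1.0 down to exploration_final_eps over the first
# 20% of training, after which the agent acts greedily 98% of the time.
model = DQN("MlpPolicy", env,
            exploration_fraction=0.2,
            exploration_final_eps=0.02,
            verbose=1)
model.learn(total_timesteps=50000)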
|
I believe what doesn't quite click with me yet is why, while the inputs change, the outputs during the evaluation phase never change by even 0.0001, which to my understanding shouldn't be mathematically possible. So I guess I'm a little fearful that the issue lies outside of the hyperparameters, but I'm not sure either ^^° It clearly works in the cartpole example, because the output changes depending on which side the pole is falling towards, but my output doesn't change at all @krumiaa |
To further showcase the exact issue, here is a video example of a 25,000-training-episode session. At around 12 seconds the training is completed and it switches to the evaluation, where suddenly it starts only doing right turns with the same static outputs. During the training phase it seems to slowly learn to drive without any anomalies, as you can see in the first 12 seconds. From what I understand, when I load up the model it will run however many evaluation episodes I told it to run again, but it will not continue the training where I left off, even though it will still print out the training episodes. It's just that in the MindMaker executable it no longer says that training is in progress. |
So I ran into another weird anomaly :) @krumiaa I sincerely apologize for the amount of questions. I hope it's not too much. |
sorry to bother, but I still need help with these issues @krumiaa |
When loading a model, try setting the number of training episodes to zero, or vice versa. I think it's currently configured for continuous training with the loaded model rather than just exploitation; that's probably what's causing the odd behavior. If you wanted to load a model only for demonstration, you could go to the python source code included with the examples, dig through it, and modify it as necessary, as per the code here: https://stable-baselines.readthedocs.io/en/master/guide/examples.html I'm currently working on some other projects related to MindMaker; if this is mission-critical for your project, we could discuss options for having me consult with you on this and modify the source to fit your needs. |
Thank you very much @krumiaa I will look into it and let you know if consultation is needed :) Does that mean I can continue training a model when I load it? Also - do you have any clues as to why the evaluation phase is always spitting out the same action for me? |
Yes, right now I believe it's set up for continuous training after loading a saved model, rather than exploitation. The lines to modify in the source code (quoted verbatim from the file, typos and all) are:
print("Loading the Agent for Continous Training")
logmessages = "Loading the Agent for Continous Training"
sio.emit('messages', logmessages)
obs = env.reset()
intaction = 0
#Begin strategic behvaior
model.set_env(env)
model.learn(total_timesteps=trainepisodes)
You would want to use some code like the following to evaluate it instead:
import gym
from stable_baselines import DQN
from stable_baselines.common.evaluation import evaluate_policy

env = gym.make("LunarLander-v2")
# Load the trained agent
model = DQN.load("dqn_lunar", env=env)
# Evaluate the agent
mean_reward, std_reward = evaluate_policy(model, env, n_eval_episodes=10)
# Enjoy the trained agent
obs = env.reset()
for i in range(1000):
    action, _states = model.predict(obs)
    obs, rewards, dones, info = env.step(action)
    env.render()
The key difference is calling model.predict(obs) instead of model.learn().
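(Putting the pieces together: a sketch of what the modified load branch in the MindMaker source might look like. This is not the actual file contents; it assumes env, model, and sio exist as in the snippet quoted above, and evalepisodes is a placeholder name for however many evaluation steps you want:)
print("Loading the Agent for Evaluation Only")
sio.emit('messages', "Loading the Agent for Evaluation Only")
model.set_env(env)
obs = env.reset()
evalepisodes = 100  # placeholder evaluation budget
# Exploit the learned policy instead of continuing to train it:
# model.learn() is replaced with a predict/step loop.
for i in range(evalepisodes):
    action, _states = model.predict(obs)
    obs, reward, done, info = env.step(action)
    if done:
        obs = env.reset()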
|
Which file do I need to open to find this code, and where do I need to look for it? I assume it will be local to the project; however, there are a lot of files and I'm not sure which file I'm looking for. @krumiaa |
Still needing help with this ^^ I'm very lost in the files haha @krumiaa |
Another quick reminder @krumiaa |
I believe it's in the Content MindMaker folder under Source. It should be the only .py file in the entire project directory, if you want to do a search for *.py.
|
So I'm in my project directory under the MindMaker folder searching for only *.py files, but that brings up over 250 files. I also ran a PowerShell command through the directory to search for files containing various search strings such as model.learn or model.predict(obs), but to no avail. |
If you download the latest version of MindMaker DRL from the Marketplace and look under Content\MindMaker\Source, there is a file called mindmakerUE5.py. This is the file you want. |
I'm currently using UE 4.27 for this project. Will this still work under that version? I'm a little bit afraid to break something, because UE 4.27 is no longer among the supported engine versions on the Marketplace page. |
I updated the Marketplace listing of MindMaker so it has the 4.27 version. If you download it and go to Content\MindMaker\MindMakerSource, you will see a file called mindmakerUE4.py; this is the python source file you will want to modify. |
That's wonderful news! I'll try it out now :) |
I've set everything up in my project now and the agent is training without any errors. However, when the evaluation phase starts, it will continue executing the same action it executed in the first evaluation episode for every evaluation episode until the evaluation is done.
What could be causing this, and how could I go about fixing it? Until I can actually properly evaluate what the agent is learning, I can't really make use of the training, so fixing this is crucial for me.
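(One way to narrow down where the constant action comes from, sketched against the stable-baselines API rather than MindMaker's own loop, assuming env and model are set up as in the snippets above: print the observation alongside the prediction at each step. If the observations change but the sampled actions never do, the policy has collapsed to a single action; if the observations themselves never change, the environment state isn't reaching the model:)
import numpy as np

obs = env.reset()
for step in range(20):
    # deterministic=False samples from the policy; a healthy stochastic
    # policy should show at least small step-to-step variation
    action, _states = model.predict(obs, deterministic=False)
    print(step, np.round(obs, 4), np.round(action, 4))
    obs, reward, done, info = env.step(action)
    if done:
        obs = env.reset()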