MountainCar v0 solution

A solution to the OpenAI Gym MountainCar-v0 environment using Deep Q-Learning.

Background

OpenAI offers Gym (http://gym.openai.com/), a toolkit for developing and comparing reinforcement learning algorithms. This is my implementation of a solution to the MountainCar-v0 environment, in which an underpowered car sits in a valley and must reach the flag on the crest of the hill to its right. The car receives a reward of -1 for every time step it takes, so the faster it reaches the flag, the higher its score. At each step it can choose one of three discrete actions: accelerate left, accelerate right, or do nothing. After each action, the environment returns a reward and the car's new state (its position and velocity).
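
For reference, here is a minimal sketch of that interaction loop, assuming the classic `gym` API contemporary with this project (a random policy, no learning):

```python
import gym

# Illustrative sketch of the agent/environment loop described above.
env = gym.make("MountainCar-v0")
state = env.reset()                    # state = (position, velocity)
done = False
total_reward = 0.0
while not done:
    action = env.action_space.sample()            # 0 = push left, 1 = no push, 2 = push right
    state, reward, done, info = env.step(action)  # environment returns reward and new state
    total_reward += reward                        # -1 per step until the flag is reached
print("Episode score:", total_reward)
```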

The agent starts out acting mostly at random, then gradually learns to favor the actions that lead to higher scores. You can watch this happen by following the score for each episode during training.
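
That trial-and-error behavior is the usual epsilon-greedy scheme: act randomly with probability epsilon, otherwise take the action the network currently rates highest. A sketch of what this might look like, assuming a Keras-style model (the names here are illustrative, not this repo's exact code):

```python
import numpy as np

def choose_action(model, state, epsilon, n_actions=3):
    """Epsilon-greedy action selection (illustrative sketch)."""
    if np.random.rand() < epsilon:
        return np.random.randint(n_actions)                  # explore: random action
    q_values = model.predict(state[np.newaxis], verbose=0)   # shape (1, n_actions)
    return int(np.argmax(q_values[0]))                       # exploit: best-rated action
```

Epsilon typically starts near 1.0 and decays over training, which is why the log below shows it pinned at 0.001 by the final episodes.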

Results

Full results are in train_results.log and test_results.log for training and testing, respectively.

Training

For training, I set a success threshold of -110 for the average score over the last 100 episodes. Since the car receives -1 per time step and episodes are capped at 200 steps, an episode scores -200 if the car never reaches the flag; reaching the flag ends the episode early, so the faster the car gets there, the better its score. The network is trained toward the standard Q-learning target, reward + gamma * np.max(next_Q_target), which drives the car to reach the flag as quickly as possible.
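
A sketch of how that target might be computed over a replay batch (variable names are illustrative and assume a Keras-style online network plus a separate target network, not necessarily this repo's exact structure):

```python
import numpy as np

def q_targets(model, target_model, states, actions, rewards, next_states, dones, gamma=0.99):
    """Build Q-learning targets: reward + gamma * max_a' Q_target(s', a')."""
    targets = model.predict(states, verbose=0)             # current estimates, shape (batch, n_actions)
    next_q = target_model.predict(next_states, verbose=0)  # bootstrap from the target network
    td = rewards + gamma * np.max(next_q, axis=1) * (1 - dones)  # no bootstrap on terminal steps
    targets[np.arange(len(actions)), actions] = td         # only the taken action's value changes
    return targets

# Usage inside the training loop: model.fit(states, q_targets(...), verbose=0)
```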

Episode 743	Time Taken: 31.48 sec	Score: -109.00	State: 0.505025593192	Average Q-Target: -14.1120	Epsilon: 0.001	Average Score: -110.03	
Episode 744	Time Taken: 36.07 sec	Score: -125.00	State: 0.501472864486	Average Q-Target: -60.9826	Epsilon: 0.001	Average Score: -110.14	
Episode 745	Time Taken: 30.07 sec	Score: -104.00	State: 0.521960339024	Average Q-Target: -56.2346	Epsilon: 0.001	Average Score: -110.31	
Episode 746	Time Taken: 26.00 sec	Score: -90.00	State: 0.510257725214	Average Q-Target: -36.6640	Epsilon: 0.001	Average Score: -110.15	
Episode 747	Time Taken: 34.39 sec	Score: -119.00	State: 0.536389897949	Average Q-Target: -11.7476	Epsilon: 0.001	Average Score: -109.70	
Episode 748	Time Taken: 32.40 sec	Score: -112.00	State: 0.520691140417	Average Q-Target: -44.0669	Epsilon: 0.001	Average Score: -109.76	
Episode 749	Time Taken: 31.25 sec	Score: -108.00	State: 0.501927383055	Average Q-Target: -57.2569	Epsilon: 0.001	Average Score: -109.75	
Episode 750	Time Taken: 35.22 sec	Score: -122.00	State: 0.507334402534	Average Q-Target: -46.2420	Epsilon: 0.001	Average Score: -109.35	
Episode 751	Time Taken: 32.34 sec	Score: -112.00	State: 0.519616429384	Average Q-Target: -59.1135	Epsilon: 0.001	Average Score: -109.60	
Model training finished! 
Average Score over last 100 episodes: -109.6	Number of Episodes: 751
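
The stopping rule shown above is simply a rolling average over the last 100 episode scores checked against the -110 threshold; something like:

```python
from collections import deque

def solved(scores, threshold=-110.0, window=100):
    """Stopping rule: average score over the last `window` episodes beats the threshold."""
    return len(scores) == window and sum(scores) / window >= threshold

recent_scores = deque(maxlen=100)   # holds only the last 100 episode scores
# In the training loop, after each episode:
#   recent_scores.append(episode_score)
#   if solved(recent_scores): break
```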

[Training scores plot]

Testing

Iteration: 92	Score: -105.0
Iteration: 93	Score: -105.0
Iteration: 94	Score: -105.0
Iteration: 95	Score: -105.0
Iteration: 96	Score: -104.0
Iteration: 97	Score: -105.0
Iteration: 98	Score: -105.0
Iteration: 99	Score: -105.0
Iteration: 100	Score: -103.0
Total Avg. Score over 100 consecutive iterations : -102.84
Agent finished test within expected reward boundary! Environment is solved.

[Testing scores plot]
[Testing gif]

Learning

This was my first Deep Q-Learning project after my Udacity course, so it was very interesting to model and plan. I took some inspiration from GitHub user "harshitandro" to get started and get my feet wet. I enjoyed watching the training progress and tweaking the input parameters to get everything working. MountainCar showed me how a controller for a continuous state space can be learned through Deep Q-Learning rather than hand-engineered through arduous developer hours.
