flappybird_q_learning

This environment contains more than 44 million possible state-action pairs. The environment is dynamic and random, meaning it could take billions of iterations for the agent to explore every state-action pair. I did not train until convergence because of the sheer amount of time that would take.

Two facts are known: the Q-value updates are numerically correct, and the Q-table is being updated in the correct location. Given these two facts, convergence is inevitable with enough training iterations.

Watching how the agent moved led me to implement a way of training it by playing the game myself. This let me make sure that the state-action pairs that yield the highest reward are updated early on, which allows for better learning. This is more a proof of concept that the implementation is correct than an optimal real-world solution to the problem.

Images can be found in the repo.

hackerman600/flappybird_q_learning

Folders and files

Latest commit

History

Repository files navigation

flappybird_q_learning

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages