A genetic algorithm implementation that learns to play Chrome's dino game.
For the Chinese version of this introduction, see here. The English version is translated from the Chinese one.
Recently I watched Code Bullet's videos about AIs learning to do various things. Here is his tutorial on the algorithm used in those videos. It's pretty good, and I highly recommend watching it.
This project was inspired by his video "AI learns to play Google Chrome Dinosaur Game || Can you beat it??". You can find his code in the video's description. I made a simplified version of it, with some improvements.
I tried to copy Chrome's dino game, which you can play at `chrome://dino` if you are a Chrome user. It looks like this:
The heights and widths of the dino and the obstacles are taken from Code Bullet's project. The rest were chosen from my own experience; that doesn't mean they're perfect, only workable.
Here I use Newton's second law, assuming no air resistance and a gravity that is constant at every height. The duration of each frame is represented by `deltaT` in class `Game`. The obstacles move at the same speed as the ground, which is defined by both a base speed (`groundBaseSpeed` in class `Game`) and a speed correction (`speedFix` in class `Game`). `speedFix` increases by 0.002 after each frame, and the resulting speed is capped at 1.3 times the base speed.
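The speed update described above can be sketched as follows. This is a minimal sketch under assumptions: the base speed value is made up, and `speedFix` is read as a multiplicative correction starting at 1.0 (the project may implement it differently); only the 0.002 increment per frame and the 1.3x cap come from the text.

```java
public class SpeedSketch {
    // Assumed base speed; the real groundBaseSpeed value is not given in the text.
    static final double groundBaseSpeed = 6.0;

    // Compute the obstacle/ground speed after a number of frames:
    // the correction grows by 0.002 per frame until the total speed
    // reaches 1.3 times the base speed.
    static double speedAfter(int frames) {
        double speedFix = 1.0; // multiplicative correction, starts neutral
        for (int i = 0; i < frames; i++) {
            speedFix = Math.min(speedFix + 0.002, 1.3);
        }
        return groundBaseSpeed * speedFix;
    }

    public static void main(String[] args) {
        System.out.println("speed at start: " + speedAfter(0)); // 6.0
        System.out.println("speed after 200 frames: " + speedAfter(200));
    }
}
```

After 150 frames the correction hits the 1.3 cap, so the speed stops growing from then on.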
The dino's jump is implemented by giving it a velocity. Big and small jumps use different velocities, both of which stay constant during the game.
In the game, the dino is represented by a black block. As its name suggests, it is defined by class `Dino`. Each dino has a `setDown(status)` function (`status` is `true` when the dino needs to duck, `false` when it doesn't) to control its movement. In each frame, the dino's `move()` function is called to update its position.
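The jump-and-gravity mechanics might look like the sketch below. All numeric values here (frame time, gravity, jump velocities) are assumptions; only the mechanism comes from the text: constant gravity, constant per-jump launch velocities, and per-frame integration with `deltaT`.

```java
public class DinoSketch {
    // Assumed constants; the project's actual values differ.
    static final double deltaT = 1.0 / 60.0;     // seconds per frame
    static final double gravity = -50.0;          // constant, any height
    static final double bigJumpVelocity = 18.0;
    static final double smallJumpVelocity = 12.0;

    double y = 0;          // height above the ground
    double velocityY = 0;
    boolean down = false;

    // Jumping just sets a constant initial velocity (big or small).
    void jump(boolean big) {
        if (y == 0) velocityY = big ? bigJumpVelocity : smallJumpVelocity;
    }

    // true = duck, false = stand up
    void setDown(boolean status) { down = status; }

    // Called every frame: integrate Newton's second law with constant g.
    void move() {
        if (y > 0 || velocityY > 0) {
            velocityY += gravity * deltaT;
            y += velocityY * deltaT;
            if (y <= 0) { y = 0; velocityY = 0; } // landed
        }
    }

    public static void main(String[] args) {
        DinoSketch dino = new DinoSketch();
        dino.jump(true);
        int airborneFrames = 0;
        do { dino.move(); airborneFrames++; } while (dino.y > 0);
        System.out.println("big jump lasted " + airborneFrames + " frames");
    }
}
```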
There are two kinds of obstacles: cacti (class `Obstacle`) and birds (class `Bird`). Both are represented by red blocks. The `move()` function is called every frame to update each obstacle's position. The `collided(dino)` function checks whether the given dino has collided with the obstacle, returning `true` if it has. For this part I referred to Code Bullet's code.
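Since both the dino and the obstacles are rectangular blocks, `collided(dino)` is most naturally an axis-aligned bounding-box overlap test; whether the project uses exactly this form is an assumption.

```java
public class CollisionSketch {
    // Two axis-aligned rectangles overlap iff they overlap on both axes.
    static boolean collided(double dinoX, double dinoY, double dinoW, double dinoH,
                            double obsX, double obsY, double obsW, double obsH) {
        return dinoX < obsX + obsW && dinoX + dinoW > obsX
            && dinoY < obsY + obsH && dinoY + dinoH > obsY;
    }

    public static void main(String[] args) {
        System.out.println(collided(0, 0, 20, 40, 15, 0, 10, 30)); // overlapping -> true
        System.out.println(collided(0, 0, 20, 40, 25, 0, 10, 30)); // apart -> false
    }
}
```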
Players are defined by class `Player`. Each player has its own dino, representing its character in the game. Each also has a `Brain` object, which is responsible for making decisions. When a player needs one, the brain computes an answer based on the current positions of the dino and the obstacles. After every movement, the game checks whether the dino has collided with an obstacle; if it has, the player is marked as game over.
`Brain` has an MLP with 11 inputs (the first obstacle's position (x and y) and size (width and height), the game's speed, the dino's height, the second obstacle's position and size, and a bias) and 3 outputs (whether to jump, whether the jump is big or small, and whether to duck). There is also a hidden layer with 5 neurons. All neurons are sigmoid neurons. In the first generation you have to call the `randomWeight()` function to randomize the weights and biases; the values come from the `nextDouble()` function, which returns a random double between $\pm10$. When a player needs a decision, it first calls `brain.lookAround(obstacles, dino)` to feed the dino's and obstacles' status into the MLP. Then `brain.computeAction()` is called to compute the output. Any output greater than 0.8 is treated as true.
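A forward pass through such a network can be sketched as below. The 11-5-3 layout, sigmoid activations, ±10 weight range, and 0.8 threshold come from the text; the weight-matrix layout and treating the bias as a constant 11th input are assumptions.

```java
import java.util.Random;

public class BrainSketch {
    static final Random rng = new Random();

    // 11 inputs -> 5 hidden sigmoid neurons -> 3 sigmoid outputs.
    double[][] w1 = new double[5][11];
    double[][] w2 = new double[3][5];

    // Matches the described nextDouble(): a random double in [-10, 10].
    static double randomWeight() { return rng.nextDouble() * 20 - 10; }

    // Called in the first generation to randomize all weights.
    void randomizeWeights() {
        for (double[] row : w1) for (int i = 0; i < row.length; i++) row[i] = randomWeight();
        for (double[] row : w2) for (int i = 0; i < row.length; i++) row[i] = randomWeight();
    }

    static double sigmoid(double x) { return 1.0 / (1.0 + Math.exp(-x)); }

    // One fully connected sigmoid layer: out = sigmoid(W * in).
    static double[] layer(double[][] w, double[] in) {
        double[] out = new double[w.length];
        for (int n = 0; n < w.length; n++) {
            double sum = 0;
            for (int i = 0; i < in.length; i++) sum += w[n][i] * in[i];
            out[n] = sigmoid(sum);
        }
        return out;
    }

    // inputs[10] is the bias input (constant 1); outputs above 0.8 count as true.
    boolean[] computeAction(double[] inputs) {
        double[] out = layer(w2, layer(w1, inputs));
        return new boolean[]{ out[0] > 0.8, out[1] > 0.8, out[2] > 0.8 };
    }

    public static void main(String[] args) {
        BrainSketch brain = new BrainSketch();
        brain.randomizeWeights();
        double[] inputs = new double[11];
        inputs[10] = 1.0; // bias
        boolean[] action = brain.computeAction(inputs);
        System.out.println("jump=" + action[0] + " big=" + action[1] + " duck=" + action[2]);
    }
}
```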
When every player is marked as game over, the current generation ends. The program calls `game.naturalSelection()` to generate the next generation. It finds the player with the highest score and clones it into the next generation. To keep the population size constant across generations, the rest are cloned from parents chosen by `game.selectParent()` based on score: the higher the score, the higher the chance of being chosen. Finally, `game.mutateAndPrepareNextGeneration(rate)` is called. Every weight and bias of each player's brain has a chance (the rate you give) to mutate; when a weight mutates, it is re-randomized. The function also resets the speed and the obstacles.
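The score-weighted parent choice and the per-weight mutation can be sketched as follows. Fitness-proportionate ("roulette wheel") selection matches the description of `game.selectParent()`, but the exact implementation in the project is an assumption.

```java
import java.util.Random;

public class SelectionSketch {
    static final Random rng = new Random();

    // Pick an index with probability proportional to its score:
    // higher score, higher chance of being chosen as a parent.
    static int selectParent(double[] scores) {
        double total = 0;
        for (double s : scores) total += s;
        double pick = rng.nextDouble() * total;
        double running = 0;
        for (int i = 0; i < scores.length; i++) {
            running += scores[i];
            if (running >= pick) return i;
        }
        return scores.length - 1; // guard against rounding at the top end
    }

    // Each weight mutates with probability `rate`; a mutated weight is
    // simply re-randomized in [-10, 10], as described above.
    static void mutate(double[] weights, double rate) {
        for (int i = 0; i < weights.length; i++) {
            if (rng.nextDouble() < rate) weights[i] = rng.nextDouble() * 20 - 10;
        }
    }

    public static void main(String[] args) {
        double[] scores = {1, 1, 100}; // player 2 should be picked most often
        int[] counts = new int[3];
        for (int t = 0; t < 10000; t++) counts[selectParent(scores)]++;
        System.out.println("picks: " + counts[0] + " " + counts[1] + " " + counts[2]);
    }
}
```

With these scores, the third player is expected to win roughly 98% of the draws, which is why it nearly always survives into the next generation even before the explicit best-player clone.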
The program simulates the players' evolution: each generation has a number of players. Because of the random weights, their behavior is unpredictable, but some players always get further by luck. The program carries those lucky players into the next generation and adds some mutations. Repeated again and again, the players become more and more skilled on average, until eventually a generation of players can keep playing the game without ever hitting game over. At that point we can conclude that the AI has learned how to play Chrome's dino.
In the real world...
The principle is one thing; the real world is another. The process involves randomness, which makes it very unpredictable. In my experience, one run finished learning at generation 23, but another hadn't finished after 80+ generations. Today, while writing this document, one run finished at generation 30 and another was still stuck at generation 80. The run before that finished at generation 53. The results also differ: the generation-30 player could both jump and duck, while the generation-53 one could only jump over the obstacles.
Therefore, if the best score after 20+ generations is still less than 10, you should probably restart. If it still hasn't finished after 60 generations, I think it's hopeless for it to finish learning.
You can also make it finish faster by changing some arguments, for example increasing the mutation rate to create more chances of producing a workable player. But this may also cause the best player of the previous generation to be eliminated after mutating.