Centipede AI Project (Deep Q-Network RL)

This project focuses on building an intelligent agent capable of playing the Centipede video game within the Gymnasium (previously openAI Gym) environment with unsupervised learning. The agent selects actions based on a small neural network constructed using TensorFlow and Keras, trained with Deep Q-Network (DQN) reinforcement learning.

Project Overview / Motivation

Unlike the original DeepMind paper, which used a shared Convolutional Neural Network (CNN) architecture across all Atari games, I was curious to see if a much smaller NN would accomplish the job in one particular game (if fed rich customized parameters). I chose to extract a custom state consisting of 18 parameters (location of player,spider,centipedes), and this state representation was then fed into a small neural network consisting of only 3 fully connected layers of 64 x 32 x 32 neurons respectively (~65k parameters). The architecture of the network was determined through trial and error, aiming for a balance between complexity and performance.

Repository Contents

test_agent.py: Can be run to test the agent. It creates a gymnasium environment, an agent, and renders a window with the game being played.
agent.py: Acts as the agent, loads the pre-trained model and interacts with the env.
state_extractor.py: Is used by the agent to extract a low dimensional state from the environment, to feed into the model.
DQN_training.py: Was used to train the agent. It implements a DQN RL approach (observe, learn, repeat, test) to train the model iteratively.
'gifs_imgs_txts/': Has some extra resources; gifs showing the agents in action, as well as a jupiter notebook with graphs of the training process.

Training Process

Training the agent was a challenging and iterative process. The Deep Q-Learning algorithm, while powerful, can be finicky to get working optimally. The training process was not linear, and I encountered various obstacles and challenges along the way. It required careful tuning of hyperparameters, exploration strategies, and network architecture to achieve satisfactory performance. A more in-depth analysis can be found in the 'graficos_y_comentarios.ipynb'.

Results

Agent Before Training

Agent After Training

After several nights of training, the best performing agent achieves significantly higher scores than a random agent (~8400 vs ~2500 pts on average). Although the scores are not nearly as good as the ones Deepmind achieved, the agent visibly demonstrates the ability to learn and adapt to the game environment. It correctly learns to follow the centipedes location and avoid spiders, on the other hand it lacks the ability to dodge the centipede once it reaches the bottom. The project serves as a valuable learning experience in reinforcement learning and deep learning, I was surprised with the possibilities that can be achieved with totally unsupervised learning!

Future Work

In the future, there is potential to further improve the agent's performance by exploring additional reinforcement learning algorithms like experience replay on human games, fine-tuning hyperparameters, or experimenting with CNN architectures. Additionally, integrating advanced techniques such as prioritized replay could enhance the stability and efficiency of the training process.

Name		Name	Last commit message	Last commit date
Latest commit History 18 Commits
all_first_models		all_first_models
gifs_imgs_txts		gifs_imgs_txts
.gitattributes		.gitattributes
DQN_training.py		DQN_training.py
README.md		README.md
agent.py		agent.py
state_extractor.py		state_extractor.py
test_agent.py		test_agent.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Centipede AI Project (Deep Q-Network RL)

Project Overview / Motivation

Repository Contents

Training Process

Results

Agent Before Training

Agent After Training

Future Work

About

Releases

Packages

Languages

ggorondi/centipede_AI

Folders and files

Latest commit

History

Repository files navigation

Centipede AI Project (Deep Q-Network RL)

Project Overview / Motivation

Repository Contents

Training Process

Results

Agent Before Training

Agent After Training

Future Work

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages