# COGS 118B - Final Project

## Reinforcement Learning & OpenAI Gym for Playable Pokemon Battle AI Agent

## Group members

- Karun Mokha
- Ke Zhang
- Pranjli Khana

# Abstract 
Our project aims to develop an AI agent capable of playing Pokémon battles competitively at varying levels of difficulty. We will create a battle environment within OpenAI Gym, leveraging existing open-source battle simulators so that we can focus on developing and tuning our AI rather than getting bogged down in the intricacies of the game mechanics. Simulating many possible battle sequences lets the AI learn which moves and strategies maximize its odds of winning. Our idea is to use a technique such as Monte Carlo Tree Search (MCTS) or reinforcement learning (for example, Proximal Policy Optimization), in which the AI evaluates potential outcomes at each turn of a battle and refines its strategy through simulation. In essence, the agent chooses the move that leads toward the scenarios most likely to end in a win, given the game state and the feature engineering and reward shaping we implement. Performance will be measured by the AI’s win rate over a fixed number of test battles against scripted and human opponents, by the number of moves chosen by the AI that are super effective against the opponent, and by adaptability, measured as the change in win rate across different scripted opponents.

# Background

Since the Pokémon video game series debuted on the Game Boy in 1996 (the generation that includes Pokémon Red, Blue, and Yellow), players have encountered and battled computer-controlled opponents throughout their adventures. These in-game AI opponents followed simple, rule-based strategies to select moves and switch Pokémon, providing a baseline level of challenge for players. However, these early AI implementations were limited in their strategic depth and adaptability.

The field of artificial intelligence for game playing has seen significant advancements over the years, with notable successes in complex games such as chess, Go, and Dota 2. DeepMind's AlphaGo demonstrated the power of reinforcement learning and Monte Carlo Tree Search (MCTS) in mastering the game of Go, a game with a vast state space and deep strategic complexity. AlphaGo's success was achieved by combining deep neural networks with MCTS, allowing the AI to evaluate and plan moves by simulating future states and backpropagating results through a search tree.

OpenAI developed AI agents capable of playing the multiplayer online battle arena game Dota 2 at a competitive level. These agents used a combination of reinforcement learning techniques to learn complex strategies and adapt to dynamic game environments.

Applying these advanced AI techniques to Pokémon battles introduces unique challenges due to the game's specific mechanics, such as type advantages, move sets, and turn-based strategies. Recent research in this area has explored various approaches to developing competitive AI for Pokémon battles.

For example, a project by Stanford University implemented a state-based AI algorithm for Pokémon battles using minimax and Monte Carlo methods. The Stanford project utilized Monte Carlo Tree Search (MCTS) to simulate numerous future battle sequences and select optimal moves based on the outcomes of these simulations. MCTS, combined with an evaluation function that considers the health of Pokémon, allowed the AI to make informed decisions and adapt its strategy based on the game state.

Additionally, statistical data on Generation 1 Pokémon, available from sources like Kaggle, provides a comprehensive foundation for modeling the game accurately. This data includes critical information about Pokémon stats, moves, and type interactions, which are essential for developing a realistic and competitive AI.

We aim to build upon this prior work by developing an AI agent that leverages MCTS and reinforcement learning. The agent should not only excel in strategic decision-making but also adapt to different opponents, providing a robust and challenging experience for players.

# Problem Statement

The primary problem we aim to solve is developing an AI agent capable of competitively playing Pokémon battles at varying levels of difficulty. This problem involves creating an AI that can make strategic decisions during battles to maximize its chances of winning.

Our problem admits multiple ML-relevant solutions, including Monte Carlo Tree Search and reinforcement learning methods such as Proximal Policy Optimization. The problem is quantifiable: a battle can be expressed entirely in mathematical or logical form through a state representation (stats, HP, and condition of each Pokémon), an action space (all actions the AI can take), and a reward function. It is measurable through win rate, move effectiveness, and adaptability against different opponents. It is also replicable: the problem occurs more than once, and each battle can be simulated with the same or different Pokémon to ensure the reliability and validity of results. Because the environment is built on OpenAI Gym, the agent can be trained repeatedly under the same conditions, which makes our experiments and results more reproducible.
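
The three measures named above can be computed from simple per-battle logs. The following is a minimal sketch, not our final evaluation code: the `BattleLog` record and its field names are hypothetical, and adaptability is interpreted here as the spread between the best and worst per-opponent win rates, which is only one possible reading of "change in win rate".

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class BattleLog:
    """Hypothetical per-battle record; the field names are illustrative."""
    opponent: str          # label of the scripted or human opponent
    won: bool              # True if our agent won the battle
    moves_used: int        # attacking moves the agent selected
    super_effective: int   # how many of those moves were super effective


def win_rate(logs):
    """Fraction of battles won."""
    return sum(log.won for log in logs) / len(logs)


def super_effective_rate(logs):
    """Fraction of the agent's attacking moves that were super effective."""
    total_moves = sum(log.moves_used for log in logs)
    return sum(log.super_effective for log in logs) / total_moves


def win_rate_by_opponent(logs):
    """Win rate computed separately for each opponent label."""
    grouped = defaultdict(list)
    for log in logs:
        grouped[log.opponent].append(log)
    return {name: win_rate(group) for name, group in grouped.items()}


def adaptability_gap(logs):
    """Best minus worst per-opponent win rate (assumed adaptability measure)."""
    rates = win_rate_by_opponent(logs).values()
    return max(rates) - min(rates)


# Made-up example logs against two hypothetical scripted opponents.
logs = [
    BattleLog("random", True, 10, 4),
    BattleLog("random", True, 8, 2),
    BattleLog("max_damage", True, 12, 6),
    BattleLog("max_damage", False, 10, 3),
]
print(win_rate(logs), super_effective_rate(logs), adaptability_gap(logs))
```

A smaller gap would indicate that the agent performs consistently across opponents rather than exploiting one scripted strategy.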

# Data

We will utilize two key datasets available on Kaggle to support our AI development:

- [Pokemon Dataset with Team Combat](https://www.kaggle.com/datasets/tuannguyenvananh/pokemon-dataset-with-team-combat): This dataset includes information on individual Pokémon statistics and their performance in both 1 vs 1 and 6 vs 6 team combats. It contains four CSV files detailing various aspects of Pokémon battles.
- [Pokemon with Stats](https://www.kaggle.com/datasets/abcsds/pokemon): This dataset provides detailed information on 721 Pokémon, including their basic stats and types, which is essential for understanding how Pokémon attributes affect their battle performance.
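
As a rough illustration of how the stats data could be prepared for the battle model, the sketch below parses rows into a lookup table keyed by Pokémon name. It assumes the CSV header of the Pokemon with Stats dataset uses the column names shown (`Name`, `Type 1`, `Type 2`, `HP`, `Generation`, and so on); the three inline rows are a stand-in for the downloaded file, and `load_pokemon` is a hypothetical helper.

```python
import csv
import io

# Stand-in for the downloaded file; the header is assumed to match the dataset.
SAMPLE_CSV = """#,Name,Type 1,Type 2,Total,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed,Generation,Legendary
1,Bulbasaur,Grass,Poison,318,45,49,49,65,65,45,1,False
4,Charmander,Fire,,309,39,52,43,60,50,65,1,False
152,Chikorita,Grass,,318,45,49,65,49,65,45,2,False
"""

STAT_COLUMNS = ["HP", "Attack", "Defense", "Sp. Atk", "Sp. Def", "Speed"]


def load_pokemon(handle, generation=None):
    """Parse CSV rows into {name: {"types": [...], "stats": {...}}}.

    If `generation` is given, only rows from that generation are kept.
    """
    pokedex = {}
    for row in csv.DictReader(handle):
        if generation is not None and int(row["Generation"]) != generation:
            continue
        # The second type is empty for single-type Pokémon, so drop blanks.
        types = [t for t in (row["Type 1"], row["Type 2"]) if t]
        stats = {column: int(row[column]) for column in STAT_COLUMNS}
        pokedex[row["Name"]] = {"types": types, "stats": stats}
    return pokedex


gen1 = load_pokemon(io.StringIO(SAMPLE_CSV), generation=1)
print(sorted(gen1))
```

With the real file, the same function could be called on an open file handle (for example one opened with `newline=""`, as the `csv` module recommends); the actual file name depends on the download.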

# Proposed Solution

Our solution involves developing an AI agent using a combination of Monte Carlo Tree Search (MCTS) and Reinforcement Learning (RL) to effectively play Pokémon battles by making strategic decisions to maximize its chances of winning.

## Algorithmic Description

### State Representation

Each game state will include the current Pokémon's stats (HP, Attack, Defense, Speed, etc.), status conditions (e.g., Burned, Paralyzed), move details (type, power, accuracy), and type matchups against the opponent's Pokémon.
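
One way such a state could be turned into a fixed-length numeric vector for a learning algorithm is sketched below. Everything here is an assumption made for illustration: the `PokemonState` fields, the list of statuses, the normalisation constant, and the choice to summarise type matchups as one multiplier per move slot.

```python
from dataclasses import dataclass

# Assumed status vocabulary and normalisation constant, for illustration only.
STATUSES = ["none", "burned", "paralyzed", "poisoned", "asleep", "frozen"]
MAX_STAT = 255.0


@dataclass
class PokemonState:
    """Hypothetical snapshot of one active Pokémon."""
    max_hp: int
    current_hp: int
    attack: int
    defense: int
    speed: int
    status: str = "none"


def encode_pokemon(p):
    """Return [HP fraction, scaled stats..., one-hot status] as floats."""
    hp_fraction = p.current_hp / p.max_hp
    stats = [p.attack / MAX_STAT, p.defense / MAX_STAT, p.speed / MAX_STAT]
    status_one_hot = [1.0 if p.status == s else 0.0 for s in STATUSES]
    return [hp_fraction] + stats + status_one_hot


def encode_state(own, opponent, move_multipliers):
    """Concatenate both active Pokémon and the type multiplier of each move."""
    return encode_pokemon(own) + encode_pokemon(opponent) + list(move_multipliers)


own = PokemonState(max_hp=100, current_hp=50, attack=49, defense=49,
                   speed=45, status="burned")
foe = PokemonState(max_hp=80, current_hp=80, attack=52, defense=43, speed=65)
vector = encode_state(own, foe, move_multipliers=[2.0, 1.0, 0.5, 1.0])
print(len(vector))
```

Keeping the vector length fixed makes it straightforward to declare a matching observation space in the Gym environment.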

### Action Space

The AI can choose any available move of its current Pokémon or switch to another Pokémon. These actions are represented in a discrete action space.
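
A discrete action space of this kind can be indexed with plain integers. The sketch below assumes four move slots and a six-Pokémon team, and adds a legality mask so that fainted team members or moves without remaining uses are never selected; the function names and the mask convention are hypothetical.

```python
N_MOVES = 4   # assumed move slots per Pokémon
N_TEAM = 6    # assumed team size; switch actions index team slots
N_ACTIONS = N_MOVES + N_TEAM


def decode_action(action):
    """Map an integer action to ("move", slot) or ("switch", team_slot)."""
    if not 0 <= action < N_ACTIONS:
        raise ValueError(f"action must be in [0, {N_ACTIONS})")
    if action < N_MOVES:
        return ("move", action)
    return ("switch", action - N_MOVES)


def legal_action_mask(remaining_uses, active_slot, fainted_slots):
    """Return a list of booleans, one per action, marking legal choices.

    remaining_uses: remaining uses for each known move of the active Pokémon
    active_slot:    team index of the Pokémon currently in battle
    fainted_slots:  set of team indices that can no longer battle
    """
    mask = []
    for slot in range(N_MOVES):
        mask.append(slot < len(remaining_uses) and remaining_uses[slot] > 0)
    for slot in range(N_TEAM):
        mask.append(slot != active_slot and slot not in fainted_slots)
    return mask


mask = legal_action_mask([10, 0, 5], active_slot=0, fainted_slots={2})
print(decode_action(5), mask)
```

With this layout, actions 0 to 3 select a move and actions 4 to 9 switch to the corresponding team slot.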

### Reward Function

The reward function will be designed to encourage effective gameplay. Rewards will be given for actions that lead to favorable outcomes, such as dealing super effective damage or causing a status condition. Penalties will be applied for ineffective actions, such as using moves that are not very effective.
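
A shaped reward along these lines might look like the sketch below. The numeric values are placeholders to be tuned, and the terminal win/loss term is an added assumption: the description above only names per-action bonuses and penalties.

```python
def shaped_reward(effectiveness, inflicted_status, won=None,
                  bonus=0.1, penalty=0.1, terminal=1.0):
    """Per-turn reward sketch with placeholder magnitudes.

    effectiveness:    type multiplier of the move used (e.g. 2.0, 1.0, 0.5),
                      or None if the agent did not attack this turn
    inflicted_status: True if the action caused a status condition
    won:              True/False on the final turn, None while the battle continues
    """
    reward = 0.0
    if effectiveness is not None:
        if effectiveness > 1.0:      # super effective
            reward += bonus
        elif effectiveness < 1.0:    # not very effective (or no effect)
            reward -= penalty
    if inflicted_status:
        reward += bonus
    if won is True:                  # assumed terminal term
        reward += terminal
    elif won is False:
        reward -= terminal
    return reward


print(shaped_reward(2.0, False), shaped_reward(0.5, False))
```

Keeping the per-turn terms small relative to the terminal term is one way to discourage the agent from collecting shaping bonuses at the expense of actually winning.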

### Monte Carlo Tree Search

MCTS will be used to explore potential future states of the game by simulating multiple possible sequences of actions. At each turn, the AI will use MCTS to predict the outcomes of different move sequences and choose the move that leads to the most favorable predicted outcome. Because it looks several turns ahead, MCTS helps the AI account for the long-term effects of its actions.
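
To make the four MCTS phases (selection, expansion, simulation, backpropagation) concrete, here is a minimal sketch on a toy duel rather than a real battle simulator. The toy rules are assumptions made for illustration: players alternate turns (real Pokémon battles select moves simultaneously), `attack` deals 3 damage, and `heavy` deals 5 damage with 2 recoil. Child selection uses the standard UCB1 rule.

```python
import math
import random

MOVES = {"attack": (3, 0), "heavy": (5, 2)}  # name -> (damage, recoil); toy values


def step(state, move):
    """Apply a move for the player to act; return the state seen by the other player."""
    my_hp, opp_hp = state
    damage, recoil = MOVES[move]
    return (opp_hp - damage, my_hp - recoil)  # perspectives are swapped


def winner(state):
    """+1 if the player to move has won, -1 if they lost, None if still playing."""
    my_hp, opp_hp = state
    if my_hp <= 0:
        return -1   # knocked out by the previous mover
    if opp_hp <= 0:
        return 1    # the previous mover fainted from recoil
    return None


class Node:
    def __init__(self, state, parent=None, move=None):
        self.state, self.parent, self.move = state, parent, move
        self.children = []
        self.untried = [] if winner(state) is not None else list(MOVES)
        self.visits = 0
        self.value = 0.0  # total reward for the player who made `move`

    def ucb1(self, c=1.4):
        exploit = self.value / self.visits
        explore = c * math.sqrt(math.log(self.parent.visits) / self.visits)
        return exploit + explore


def rollout(state, rng):
    """Play random moves to the end; result is for the player to move at `state`."""
    sign = 1
    while winner(state) is None:
        state = step(state, rng.choice(list(MOVES)))
        sign = -sign
    return sign * winner(state)


def mcts(root_state, iterations=500, seed=0):
    rng = random.Random(seed)
    root = Node(root_state)
    for _ in range(iterations):
        node = root
        # 1. Selection: follow UCB1 while the node is fully expanded.
        while not node.untried and node.children:
            node = max(node.children, key=Node.ucb1)
        # 2. Expansion: add one unexplored child, if any.
        if node.untried:
            move = node.untried.pop()
            child = Node(step(node.state, move), parent=node, move=move)
            node.children.append(child)
            node = child
        # 3. Simulation: random playout from the new node.
        reward = -rollout(node.state, rng)  # view of the player who moved into node
        # 4. Backpropagation: flip the sign at each level going up.
        while node is not None:
            node.visits += 1
            node.value += reward
            reward = -reward
            node = node.parent
    return max(root.children, key=lambda child: child.visits).move


# We have 3 HP and the opponent has 5: only "heavy" wins before the opponent acts.
print(mcts((3, 5)))
```

In the actual project, `step` and `winner` would be replaced by calls into the battle simulator, and random rollouts could be replaced or truncated by an evaluation function.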

### Reinforcement Learning with Proximal Policy Optimization

We will implement the PPO algorithm using the Stable Baselines3 library in Python. PPO optimizes a clipped surrogate objective, which limits how far each update can move the policy away from the policy that collected the data and thereby keeps policy updates stable.
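
The clipped surrogate objective can be written out in a few lines. The sketch below is a plain-Python illustration of the formula only, not how Stable Baselines3 implements it internally; the function name and the list-based inputs are assumptions, and the clip range of 0.2 is a commonly used value rather than a tuned choice.

```python
import math


def clipped_surrogate(new_log_probs, old_log_probs, advantages, clip_range=0.2):
    """Mean PPO clipped surrogate objective (to be maximised) over a batch.

    new_log_probs: log pi_new(a|s) for each sampled action
    old_log_probs: log pi_old(a|s) under the policy that collected the data
    advantages:    advantage estimate for each sampled action
    """
    total = 0.0
    for new_lp, old_lp, adv in zip(new_log_probs, old_log_probs, advantages):
        ratio = math.exp(new_lp - old_lp)  # pi_new(a|s) / pi_old(a|s)
        clipped = max(1.0 - clip_range, min(ratio, 1.0 + clip_range))
        # Taking the minimum removes any incentive to push the ratio
        # beyond the clip range in the direction that helps the objective.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)


# A ratio of 2 with a positive advantage is capped at 1 + clip_range.
print(clipped_surrogate([math.log(2.0)], [0.0], [1.0]))
```

When the new and old policies agree (ratio of 1), the objective reduces to the mean advantage, so clipping only takes effect once an update starts to move the policy substantially.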

# Ethics & Privacy

The main goal of our project is to develop an AI agent that uses MCTS and RL techniques to make strategic decisions that maximize its expected chance of winning Pokémon battles. We also take the ethical and privacy aspects of our project seriously.

## Data Privacy & Consent

Since our project utilizes battle data from various sources, which may include sensitive information regarding the game and its users, we will ensure that all data is anonymized and handled in compliance with data privacy regulations. We will strictly adhere to data protection laws, ensuring the privacy and consent of all individuals involved in the data collection process.

## Bias and Representation

The dataset used in our project might contain inherent biases due to factors such as cultural preferences, language, and geographical distribution. We acknowledge that the initial dataset may not equally represent all types of Pokémon battles or strategies. To address this, we will actively work to minimize and mitigate such biases, striving to create a more balanced and inclusive AI model that can cater to diverse battling styles and preferences.

## Generalization and Impact

Our project dataset is limited, but we aim to develop methods to generalize the AI's performance across various battle scenarios and Pokémon types. We will take care to avoid overgeneralization, which could misrepresent certain battle strategies or Pokémon characteristics. Our goal is to ensure that the AI provides recommendations and strategies that benefit a broad spectrum of players and battle conditions.

## Transparency and Accountability

In line with ethical research practices, we commit to full transparency regarding our methodologies, data sources, and findings. We will document and share our research processes, ensuring that our work can be reviewed, reproduced, and validated by others in the community. This transparency will help us maintain accountability and foster trust in our project's outcomes.

# Team Expectations

We expect each member to keep the other members updated on their progress and to let the group know in advance if they are unable to complete their work.

We will regularly meet at the planned times and will let the other members know if something prevents us from attending.

Each member will complete their portion of the work on time and let the other members know if the workload is too much for them.

We will respect each member and listen to each other regardless of our differences.

# Project Timeline Proposal

| Meet Date | Meet Time | Completed Before Meeting | Discuss at Meeting |
|---|---|---|---|
| 5/9 | 5 PM | Brainstorm topics/questions | Determine best way of communication, discuss and finalize project idea, discuss problem/solution, determine best days and time for meetings |
| 5/14 | 3 PM | Do background research on topic | Discuss and finalize the ideal dataset that may be used for the project, begin draft of project proposal and split independent work |
| 5/17 | 5 PM | Edit proposal | Discuss and agree as a group on team expectations and team calendar, go through each part of proposal together and make any final edits, finalize and submit proposal |
| 5/20 | 5 PM | Begin cleaning the dataset only including the battle related data | Discuss and assign group members to lead specific parts of project, discuss analysis plan |
| 6/01 | 5 PM | Begin programming for project | Discuss and edit code, discuss what are the final steps for the project code |
| 6/07 | 5 PM | Finalize results, draft results/conclusion/discussion | Discuss and split the remaining parts from the project for individual work |
| 6/11 | 5 PM | Review and finalize project | Turn in Final Project |

# Footnotes

- Barradas, A. Pokemon with Stats. Retrieved May 17, 2024, from https://www.kaggle.com/datasets/abcsds/pokemon/data
- Nguyen Van Anh, T. Pokemon Dataset with Team Combat. Retrieved May 17, 2024, from https://www.kaggle.com/datasets/tuannguyenvananh/pokemon-dataset-with-team-combat
- Silver, D., et al. "Mastering the game of Go with deep neural networks and tree search." Nature 529.7587 (2016): 484-489.
- OpenAI. "OpenAI Five: The world’s first AI to defeat a team of professionals at Dota 2." OpenAI, 2018.
- Kaggle dataset on Generation 1 Pokémon: https://www.kaggle.com/abcsds/pokemon