# COGS 118B - Final Project

# ChessMindsAI

## Group members

- Shihua Yang
- Zhiheng Wu
- Ha Duong

# Abstract 
This project implements and compares four approaches to chess artificial intelligence: minimax with alpha-beta pruning, Monte Carlo Tree Search (MCTS), neural network evaluation, and a hybrid approach that combines neural networks with MCTS. We developed a chess environment to evaluate these methods objectively through tournament play and computational efficiency measurements. Our results show that the hybrid approach achieves the strongest performance, winning 18 of its 60 games (30%) without a loss. Pure minimax shows exceptional stability, with the highest number of draws and no losses. MCTS performed poorly with a limited iteration budget, underscoring the importance of sufficient exploration. The neural network approach performed well in the tournament despite unstable training. These findings highlight the complementary strengths of traditional search algorithms and modern machine learning techniques and suggest that combining them can produce the strongest chess AI under resource constraints.

# Background

Chess has long been a central challenge in AI. Early methods relied mainly on minimax search and handcrafted evaluation functions<a name="1"></a>[<sup>[1]</sup>](#1). In 1997, IBM's Deep Blue, built on this approach, defeated world champion Garry Kasparov <a name="2"></a>[<sup>[2]</sup>](#2). Game-playing AI changed dramatically when Monte Carlo Tree Search (MCTS) was introduced in 2006. MCTS has several advantages over traditional minimax: it requires no domain-specific evaluation function, scales well with available computation, and can be combined effectively with machine learning methods. This gave DeepMind's AlphaZero a foundation to build on in 2017, which demonstrated that a neural network combined with MCTS can reach superhuman strength through pure self-play, with no human knowledge beyond the rules of the game<a name="3"></a>[<sup>[3]</sup>](#3). Recent work has focused on creating more computationally efficient hybrid methods. The open-source Leela Chess Zero project showed that AlphaZero's approach could be reproduced outside DeepMind and could match its success, demonstrating how neural networks can be combined effectively with tree search<a name="4"></a>[<sup>[4]</sup>](#4). The success of these different methods has generated interest in comparing their relative strengths and weaknesses in controlled environments, especially under limited resources.

# Problem Statement

This project addresses the problem of implementing and comparing different AI algorithms for playing chess. We focus on four methods: minimax with alpha-beta pruning, MCTS, neural network evaluation, and a hybrid approach that combines neural networks with MCTS. Performance is evaluated with two well-defined metrics:

1. Playing strength, measured by win rates and tournament performance

2. Computational efficiency, measured in nodes explored per second and time per move

Our tournament system makes both metrics straightforward to measure: it automatically plays games with the same parameters for every pairing of agents. Because the system covers both traditional AI search techniques and modern machine learning methods, all of them can be compared directly under the same conditions and with the same evaluation criteria.

# Data

Unlike many ML projects that use static datasets, our project primarily generates data through self-play and agent interactions. The primary data source is the state space of chess positions and move sequences generated during agent gameplay and training.

Self-generated data:

1. Training data for the neural network: Generated through self-play during PPO training. Each game yields sequences of board states, actions, rewards, and game outcomes.

2. Tournament results: Generated from systematic matchups between all agent combinations, tracking wins, losses, draws, and detailed game statistics.

3. Computational performance metrics: Nodes explored, time per move, and efficiency measurements for each agent.

Data representation:

Our chess environment (chess_env.py) represents the chess state in several formats:

1. 8×8×14 tensor representation: Each position is encoded as an 8×8 board with 14 channels (12 piece planes for 6 piece types × 2 colors, plus 2 auxiliary channels)

2. FEN string: Standard chess notation for board positions

3. Move sequences: Stored in UCI format (e.g., e2e4)

4. PGN files: Complete games recorded in Portable Game Notation format
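A minimal sketch of how a FEN string could be turned into the 8×8×14 tensor described above. The report does not state the channel order or what the two auxiliary channels hold, so the layout here (white pieces in planes 0-5, black pieces in 6-11, side to move in 12, castling rights in 13) and the name `encode_fen` are assumptions. It parses FEN directly and returns nested lists, so it runs without a chess library or NumPy.

```python
from typing import List

# Hypothetical layout: planes 0-5 = white P, N, B, R, Q, K; planes 6-11 = black p, n, b, r, q, k.
PIECE_TO_PLANE = {p: i for i, p in enumerate("PNBRQKpnbrqk")}
SIDE_TO_MOVE = 12  # assumption: all ones if White is to move, else all zeros
CASTLING = 13      # assumption: fraction of the four castling rights still available

Tensor = List[List[List[float]]]  # indexed as [row][file][channel], row 0 = rank 8


def encode_fen(fen: str) -> Tensor:
    """Encode a FEN string as an 8x8x14 nested list."""
    placement, turn, castling = fen.split()[:3]
    board = [[[0.0] * 14 for _ in range(8)] for _ in range(8)]

    # Piece placement lists ranks 8 down to 1, separated by '/'.
    for row, rank_str in enumerate(placement.split("/")):
        col = 0
        for ch in rank_str:
            if ch.isdigit():
                col += int(ch)  # a digit means that many empty squares
            else:
                board[row][col][PIECE_TO_PLANE[ch]] = 1.0
                col += 1

    # Auxiliary planes hold the same value on every square.
    stm = 1.0 if turn == "w" else 0.0
    rights = 0.0 if castling == "-" else len(castling) / 4.0
    for row in range(8):
        for col in range(8):
            board[row][col][SIDE_TO_MOVE] = stm
            board[row][col][CASTLING] = rights
    return board
```

For the starting position, the white king on e1 lands at `board[7][4][5]` and the 12 piece planes contain 32 ones in total.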

The GameRecorder class (in game_recorder.py) records all games in standard PGN format. It stores metadata like player names, dates, and results along with the complete move sequences. These PGN files serve multiple purposes:

1. Permanent storage of game records for later analysis
2. Input for our visualization tools (pgn_visualizer.py)
3. Source data for GIF creation of interesting games (create_gifs.py)
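The report does not show the GameRecorder interface, so the following is only a hypothetical sketch of serializing one game to PGN with the standard library. It writes the Seven Tag Roster and numbered movetext, and it assumes moves are already in SAN, since PGN movetext uses SAN rather than UCI (moves stored as e2e4 would first need converting with a chess library).

```python
from datetime import date
from typing import List, Optional


def to_pgn(white: str, black: str, result: str, san_moves: List[str],
           event: str = "?", played: Optional[date] = None) -> str:
    """Return a PGN game: Seven Tag Roster headers plus numbered SAN movetext."""
    played = played or date.today()
    tags = [
        ("Event", event),
        ("Site", "?"),
        ("Date", played.strftime("%Y.%m.%d")),  # PGN dates use YYYY.MM.DD
        ("Round", "?"),
        ("White", white),
        ("Black", black),
        ("Result", result),  # "1-0", "0-1", "1/2-1/2", or "*"
    ]
    header = "\n".join(f'[{name} "{value}"]' for name, value in tags)

    # Prefix each White move with its move number: "1. e4 e5 2. Nf3 ..."
    tokens: List[str] = []
    for ply, move in enumerate(san_moves):
        if ply % 2 == 0:
            tokens.append(f"{ply // 2 + 1}.")
        tokens.append(move)
    tokens.append(result)  # movetext ends with the game result
    return header + "\n\n" + " ".join(tokens) + "\n"
```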

When training the neural network, we accumulated approximately:

- 50 training episodes
- ~50,000 board positions
- ~5,000 complete games

For evaluation, our tournament generated:

- 120 total games (10 games for each of the 12 ordered pairings of the 4 agents, so each pair of agents met 20 times with colors alternated)
- a tournament PGN file containing all games
- detailed statistics on win rates, draw rates, and game lengths


# Proposed Solution

Our solution implements four distinct chess-playing approaches, all designed to operate within a common environment for fair comparison.

We developed a minimax agent with alpha-beta pruning that searches the game tree to a configurable depth (default: 4 ply). The MinimaxAgent implements optimizations including alpha-beta pruning to eliminate irrelevant branches, iterative deepening for timely responses, and move ordering to maximize efficiency. Position evaluation uses a straightforward material-based function with standard piece values, providing a simple but effective heuristic.
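A game-agnostic sketch of depth-limited minimax with alpha-beta pruning, simple move ordering, and a material evaluation. To stay runnable without a chess library, move generation is abstracted as a `children` callback and the demo searches a small hand-built tree. The piece values (1/3/3/5/9) are the conventional ones and are an assumption about what the report calls "standard piece values"; the function names are illustrative. Iterative deepening would wrap `alphabeta` in a loop over increasing depths until a time limit and is omitted here.

```python
import math
from typing import Any, Callable, Dict, List

# Assumed "standard" piece values; the king is not counted as material.
PIECE_VALUES = {"p": 1, "n": 3, "b": 3, "r": 5, "q": 9, "k": 0}


def material_eval(fen: str) -> int:
    """Material balance from White's point of view, read from the FEN placement field."""
    score = 0
    for ch in fen.split()[0]:
        value = PIECE_VALUES.get(ch.lower())
        if value is not None:  # skips digits (empty squares) and '/' separators
            score += value if ch.isupper() else -value
    return score


def alphabeta(state: Any, depth: int, alpha: float, beta: float, maximizing: bool,
              children: Callable[[Any], List[Any]], evaluate: Callable[[Any], float],
              stats: Dict[str, int]) -> float:
    """Depth-limited minimax with alpha-beta pruning; stats['nodes'] counts visited nodes."""
    stats["nodes"] += 1
    kids = children(state)
    if depth == 0 or not kids:
        return evaluate(state)
    # Move ordering: try statically promising children first so cutoffs come earlier.
    kids = sorted(kids, key=evaluate, reverse=maximizing)
    if maximizing:
        value = -math.inf
        for child in kids:
            value = max(value, alphabeta(child, depth - 1, alpha, beta, False,
                                         children, evaluate, stats))
            alpha = max(alpha, value)
            if alpha >= beta:
                break  # beta cutoff: the minimizing opponent will avoid this line
        return value
    value = math.inf
    for child in kids:
        value = min(value, alphabeta(child, depth - 1, alpha, beta, True,
                                     children, evaluate, stats))
        beta = min(beta, value)
        if alpha >= beta:
            break  # alpha cutoff: the maximizing player already has something better
    return value


# Demo on a hand-built two-ply tree (leaf scores from the maximizing side's view).
TREE = {"root": ["A", "B", "C"], "A": ["A1", "A2"], "B": ["B1", "B2"], "C": ["C1", "C2"]}
LEAVES = {"A1": 3, "A2": 5, "B1": 2, "B2": 9, "C1": 0, "C2": 1}

stats = {"nodes": 0}
best = alphabeta("root", 2, -math.inf, math.inf, True,
                 lambda s: TREE.get(s, []), lambda s: LEAVES.get(s, 0), stats)
print(best, stats["nodes"])  # 3 8  (plain minimax would visit all 10 nodes)
```

In the real agent, `children` would generate legal positions and `evaluate` would call a material function such as `material_eval`, negated when Black is the maximizing side.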

Our second approach explores Monte Carlo Tree Search (MCTS), which builds a statistical model of the game tree. The MCTSAgent balances exploration and exploitation using the UCB1 formula and conducts random playouts to estimate position values instead of using a complex evaluation function. As simulations accumulate, the search concentrates on promising lines, and a timeout ensures the agent still responds in real time.
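A compact sketch of the four MCTS phases (selection, expansion, simulation, backpropagation) with UCB1 selection, score = w/n + c·√(ln N / n), and random playouts. So that it runs without a chess library, it plays a toy Nim variant (remove 1-3 stones; whoever takes the last stone wins) instead of chess. The exploration constant √2, the default iteration count, and every name here are assumptions, not the MCTSAgent's actual settings; the optional time limit mirrors the timeout handling described above.

```python
import math
import random
import time
from typing import List, Optional

MAX_TAKE = 3  # toy Nim variant: remove 1-3 stones; whoever takes the last stone wins


def legal_moves(stones: int) -> List[int]:
    return [k for k in range(1, MAX_TAKE + 1) if k <= stones]


class Node:
    def __init__(self, stones: int, parent: Optional["Node"] = None, move: int = 0):
        self.stones = stones
        self.parent = parent
        self.move = move                  # move that led from parent to this node
        self.children: List["Node"] = []
        self.untried = legal_moves(stones)
        self.visits = 0
        self.wins = 0.0                   # wins for the player who made `move`

    def ucb1(self, c: float = math.sqrt(2)) -> float:
        # Average reward (exploitation) plus a bonus for rarely tried moves (exploration).
        return self.wins / self.visits + c * math.sqrt(math.log(self.parent.visits) / self.visits)


def rollout(stones: int, rng: random.Random) -> float:
    """Random playout; returns 1.0 if the player to move at `stones` wins, else 0.0."""
    plies = 0
    while stones > 0:
        stones -= rng.choice(legal_moves(stones))
        plies += 1
    # An odd number of plies means the original player made the last (winning) move.
    return 1.0 if plies % 2 == 1 else 0.0


def mcts(stones: int, iterations: int = 2000, time_limit: Optional[float] = None,
         seed: int = 0) -> int:
    """Return the move chosen by UCB1-guided MCTS from a position with `stones` > 0."""
    rng = random.Random(seed)
    deadline = None if time_limit is None else time.perf_counter() + time_limit
    root = Node(stones)
    for _ in range(iterations):
        node = root
        # 1. Selection: descend through fully expanded nodes by UCB1.
        while not node.untried and node.children:
            node = max(node.children, key=Node.ucb1)
        # 2. Expansion: add one untried move as a new child.
        if node.untried:
            move = node.untried.pop(rng.randrange(len(node.untried)))
            child = Node(node.stones - move, node, move)
            node.children.append(child)
            node = child
        # 3. Simulation: reward from the view of the player to move at `node`.
        reward = rollout(node.stones, rng)
        # 4. Backpropagation: credit each node to the player who moved into it.
        while node is not None:
            node.visits += 1
            node.wins += 1.0 - reward
            reward = 1.0 - reward
            node = node.parent
        if deadline is not None and time.perf_counter() > deadline:
            break  # timeout: stop after this iteration and use the statistics so far
    # Play the most-visited move, a common robust final choice.
    return max(root.children, key=lambda ch: ch.visits).move
```

For example, `mcts(5)` should choose 1, leaving the opponent a multiple of four, which is a lost position in this game. With too few iterations the visit counts are noisy, which is the same weakness the tournament exposed for the MCTSAgent.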

The third approach utilizes deep learning with our NeuralAgent. This agent employs a convolutional neural network architecture with separate policy and value heads. Board states are represented as 8×8×14 tensors, with the policy head outputting move probabilities and the value head evaluating positions. We trained this network using Proximal Policy Optimization (PPO) through self-play, allowing the agent to learn chess strategy without human knowledge beyond the rules.
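The report does not give its PPO hyperparameters or full loss, so the sketch below shows only the standard PPO clipped surrogate objective for a batch of sampled moves, in plain Python for readability. The clip range `eps = 0.2` is a common default rather than a value from the report, and the value-loss and entropy terms that a full implementation would add are omitted.

```python
import math
from typing import Sequence


def ppo_clip_objective(logp_new: Sequence[float], logp_old: Sequence[float],
                       advantages: Sequence[float], eps: float = 0.2) -> float:
    """Mean clipped surrogate objective (maximize it, or minimize its negative as a loss)."""
    total = 0.0
    for new, old, adv in zip(logp_new, logp_old, advantages):
        ratio = math.exp(new - old)                      # pi_theta(a|s) / pi_theta_old(a|s)
        clipped = min(max(ratio, 1.0 - eps), 1.0 + eps)  # clip(ratio, 1 - eps, 1 + eps)
        total += min(ratio * adv, clipped * adv)         # pessimistic (lower) bound
    return total / len(advantages)
```

Taking the minimum of the clipped and unclipped terms removes any incentive to push the policy ratio far from 1 in a single update, which is what keeps PPO updates conservative; with noisy self-play rewards like those in our training curve, this stability matters.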

Our final approach is the HybridAgent, which combines MCTS with neural guidance. This agent uses the neural network's policy output to guide tree expansion, replacing random playouts with neural value predictions for leaf node evaluation. This hybrid approach balances traditional search with learned knowledge, typically requiring fewer iterations than pure MCTS to achieve superior performance.
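The report does not specify how the HybridAgent turns the policy output into a selection rule. One common choice, used by AlphaZero, is the PUCT score Q + c·P·√N / (1 + n), which replaces the UCB1 exploration term of the MCTSAgent with one weighted by the network's prior P. The sketch below assumes that rule, a hypothetical `c_puct = 1.5`, value estimates in [-1, 1], and an illustrative `Edge` structure; leaf values would come from the value head instead of a random playout.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Edge:
    """Statistics for one move (hypothetical structure, not the project's class)."""
    prior: float            # P(s, a): policy-head probability for this move
    visits: int = 0         # N(s, a)
    value_sum: float = 0.0  # W(s, a): sum of backed-up value-head estimates

    @property
    def q(self) -> float:   # Q(s, a): mean value, 0 for unvisited moves
        return self.value_sum / self.visits if self.visits else 0.0


def puct_score(edge: Edge, parent_visits: int, c_puct: float = 1.5) -> float:
    # Exploration bonus grows with the prior and shrinks as the move is visited more.
    u = c_puct * edge.prior * math.sqrt(parent_visits) / (1 + edge.visits)
    return edge.q + u


def select(edges: List[Edge], parent_visits: int) -> Edge:
    return max(edges, key=lambda e: puct_score(e, parent_visits))


def backup(path: List[Edge], leaf_value: float) -> None:
    """Propagate a value-head estimate up the search path.

    `leaf_value` is from the viewpoint of the player who made the last move on `path`;
    the sign flips at each ply because the players alternate.
    """
    for edge in reversed(path):
        edge.visits += 1
        edge.value_sum += leaf_value
        leaf_value = -leaf_value
```

Because unvisited moves with high priors get the largest bonus, the search spends its limited iterations on moves the network already favors, which is how a hybrid agent can need fewer iterations than pure MCTS.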

All agents operate within the same ChessEnvironment wrapper, ensuring consistent interfaces and fair comparison across these diverse approaches to chess AI.

# Evaluation Metrics

We evaluated our chess agents using two categories of metrics:

### Tournament Performance

We conducted a round-robin tournament in which each agent played every other agent multiple times, alternating colors.
Key metrics included:

1. Win rate: Percentage of games won

2. Draw rate: Percentage of games ending in a draw

3. Loss rate: Percentage of games lost

4. Points: Standard chess scoring (1 point for win, 0.5 for draw, 0 for loss)

This comparison provides direct evidence of relative playing strength.
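A minimal sketch of the round-robin schedule and the scoring rule above, assuming each ordered (white, black) pairing is played a fixed number of times so that colors are balanced. `schedule` and `standings` are hypothetical names, not the project's tournament code.

```python
from collections import defaultdict
from itertools import permutations
from typing import Dict, List, Tuple


def schedule(agents: List[str], games_per_pairing: int) -> List[Tuple[str, str]]:
    """All ordered (white, black) pairings, so every pair meets with both color assignments."""
    return [pair for pair in permutations(agents, 2) for _ in range(games_per_pairing)]


def standings(results: List[Tuple[str, str, str]]) -> Dict[str, Dict[str, float]]:
    """Tally wins, draws, losses, and points from (white, black, PGN result) tuples."""
    table: Dict[str, Dict[str, float]] = defaultdict(
        lambda: {"W": 0, "D": 0, "L": 0, "points": 0.0})
    for white, black, result in results:
        if result == "1/2-1/2":
            for player in (white, black):
                table[player]["D"] += 1
                table[player]["points"] += 0.5   # draw = 0.5
            continue
        if result == "1-0":
            winner, loser = white, black
        elif result == "0-1":
            winner, loser = black, white
        else:
            raise ValueError(f"unexpected result {result!r}")
        table[winner]["W"] += 1
        table[winner]["points"] += 1.0           # win = 1
        table[loser]["L"] += 1                   # loss = 0 points
    return dict(table)
```

With four agents and 10 games per ordered pairing, `schedule` yields 4 × 3 × 10 = 120 games, 60 per agent. Under this scoring, for example, 18 wins and 42 draws give 18 + 0.5 × 42 = 39 points.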

### Computational Efficiency

We measured computational efficiency through:

1. Nodes per second: Number of positions evaluated per second

2. Time per move: Average time spent deciding each move

3. Memory usage: Peak memory consumption during play

These metrics reflect how well each algorithm utilizes available resources.
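A sketch of how one move decision could be profiled with the standard library, assuming a hypothetical zero-argument `choose_move` callable that returns the chosen move and the number of nodes it searched. Note that `tracemalloc` slows down Python allocations, so timing and memory are more accurate when measured in separate runs, and it only sees memory allocated through Python's allocators, so native or GPU buffers (for example those of a deep learning framework) may be missed.

```python
import time
import tracemalloc
from typing import Any, Callable, Dict, Tuple


def profile_move(choose_move: Callable[[], Tuple[Any, int]]) -> Dict[str, float]:
    """Time one move decision and record peak Python memory during it."""
    tracemalloc.start()
    start = time.perf_counter()
    _move, nodes = choose_move()
    elapsed = time.perf_counter() - start
    _current, peak = tracemalloc.get_traced_memory()  # (current, peak) in bytes
    tracemalloc.stop()
    return {
        "time_per_move_s": elapsed,
        "nodes_per_second": nodes / elapsed if elapsed > 0 else float("inf"),
        "peak_memory_bytes": float(peak),
    }
```

Per-game figures would then aggregate these samples, for example total nodes divided by total search time for nodes per second, and the mean of `time_per_move_s` for time per move.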

# Results

### Tournament Performance

Our tournament consisted of 120 games: 10 games for each of the 12 ordered pairings of the 4 agents, so each agent played 60 games. The results reveal notable performance differences between approaches:

![Tournament Results](Image 3)

Key observations:

Minimax showed remarkable stability. It recorded the most draws (60) and no losses, but also no wins.

Hybrid demonstrated the strongest overall performance with 18 wins, 42 draws, and 0 losses, resulting in the highest point total.

Neural performed well with 8 wins, 49 draws, and 3 losses.

MCTS struggled significantly, recording no wins, 37 draws, and 23 losses.

The tournament standings based on points (win=1, draw=0.5, loss=0):

1. Hybrid: 39 points

2. Neural: 32.5 points

3. Minimax: 30 points

4. MCTS: 18.5 points

### Win Rate Matrix Analysis

The pairwise win rate matrix provides deeper insight into specific matchup dynamics:

![Win Rate Matrix](Image 1)

This matrix shows each agent's performance against specific opponents.

Notable patterns:

Hybrid performed exceptionally well against MCTS (0.84 win rate)

Neural showed balanced strength against Minimax and MCTS (0.77 win rate)

Minimax maintained moderate success against MCTS (0.56)

MCTS struggled against all opponents, particularly against Hybrid (0.84 loss rate)

### Neural Network Training Progress
The neural network training showed interesting patterns:

![Training Progress](Image 2)

The training curve reveals:

High volatility in raw rewards (light blue line)

Gradual improvement in the smoothed reward trend (dark blue line)

Persistent negative rewards indicating training challenges

No clear convergence after 50 episodes

### Computational Efficiency
Performance metrics revealed significant differences in computational approaches:

| Agent  | Nodes/Second | Time/Move (s) | Memory Usage |
|---|---|---|---|
| Minimax  |  17820.02 |  0.1227  | Medium | 
| MCTS  |  1048.83 |  0.4767 | Low | 
| Neural  | 429.92  | 0.9304  | High   |
| Hybrid  | 650.02  | 0.6564 | High |

The minimax agent was by far the most efficient, processing nodes about 41 times faster than the neural agent, about 27 times faster than the hybrid agent, and about 17 times faster than MCTS. These differences reflect the fundamental tradeoff between cheap evaluation (minimax's material count) and expensive but potentially more informative evaluation (neural networks).

# Discussion

### Interpreting the results

Our results offer several insights into chess AI methods:

1. The Advantage of Hybrid Methods: 
The hybrid agent's strong performance supports the advantage of combining traditional search with neural guidance. By using the neural network to guide the MCTS process, the hybrid agent made better decisions while exploring fewer nodes than pure MCTS. This supports the core premise of systems like AlphaZero and suggests that the approach can pay off even at small scale with limited training.

2. The Stability of Minimax: 
Despite its simplicity, minimax with alpha-beta pruning was remarkably stable and did not lose a single game in the tournament. Its high draw rate suggests that a simple evaluation function can still produce sound moves when combined with sufficient search depth, which is consistent with the success of search-based systems like Deep Blue.

3. The Challenge faced by Neural Networks: 
Although the neural agent was unstable during training, it still performed well in the tournament. This suggests that the network learned some effective chess patterns but did not converge to consistent evaluations. The negative rewards throughout training show how difficult it is to build a stable self-play training setup. Nevertheless, its tournament results show the potential of learned representations.

4. The Limitation of MCTS: 
The poor performance of pure MCTS was surprising, but still informative. With a limited budget of 1000 iterations per move, MCTS cannot explore the game tree effectively, especially in complex middlegame positions. This suggests that MCTS needs substantial computing resources to match the other methods, which also explains why modern systems usually combine MCTS with neural guidance.

### Limitations

1. Limited Computational Resources: The performance of the MCTS and neural network agents may have been severely limited by our computing resources. In particular, MCTS needs a large number of iterations to explore the game tree effectively, and our resources did not allow us to reach that budget.
2. Limited Training Data: The neural network and hybrid agents were trained on a relatively small dataset of approximately 50,000 board positions and 5,000 games. This is small: an experienced human player may have studied a comparable number of games, so the agents likely fell well short of their potential and developed only a limited understanding of chess.
3. Simple Evaluation Metrics: Our metrics are too coarse to capture the full complexity of real games. Deeper strategic assessments would reveal each agent's understanding of chess in more detail.


### Future work
1. Hybrid Algorithm Development: Given the limitations of the individual algorithms, future work could develop a hybrid method that combines minimax, MCTS, and neural networks. For example, neural networks could guide the search of minimax or MCTS, or minimax and MCTS could generate high-quality training data for neural networks.
2. Evaluation Metrics Improvement: The current metrics, win rates and computational efficiency, provide a foundational understanding of each algorithm's performance. We could develop richer metrics that measure strategic depth, adaptability to different opponents, and long-term planning.
3. Real World Deployment: To better assess the practicality of the algorithms, we could deploy them in real-world games or use them to help train human players. This could reveal issues that only emerge in practice and let us address them quickly.
4. Computing Resources: This limitation cannot be addressed in the short term, but we plan to retrain the agents once more computing resources are available.

### Ethics & Privacy

Chess AI may seem ethically simple, but it raises subtle concerns. Our agents developed distinct playing styles, showing that algorithmic bias can emerge even in purely rule-based domains. This raises the question of how AI systems develop their own “tendencies” without explicit programming.

The computational differences between our agents also highlight resource inequality in AI research. Neural and hybrid methods require substantial computing power, which can limit who is able to advance these technologies and favors well-resourced institutions.

Since chess AI surpassed human players, human chess has changed as well. Players increasingly use engines to study and imitate moves, which could affect human creativity in the game. The relationship between humans and machines continues to evolve, shaping chess education and professional play.

Neural networks lack the transparency of traditional algorithms. We can explain precisely why the minimax agent chose a move, but the neural agent's decisions depend on learned weights that cannot be easily interpreted. This lack of transparency makes it harder to understand and trust the decisions of chess AI, and of AI systems in general.


### Conclusion

Our comparison of chess AI methods shows that different algorithms have different strengths and weaknesses. The hybrid method combining neural networks with MCTS performed best overall, demonstrating the value of combining learned knowledge with traditional search. Minimax with alpha-beta pruning is remarkably stable, while pure MCTS struggles under a limited iteration budget.

These findings suggest that the future of chess AI lies not in choosing between traditional algorithms and machine learning, but in finding the best way to combine them. Our hybrid agent succeeded despite limited training, showing that even a relatively simple neural architecture can substantially improve tree search.

From a resource perspective, minimax offers the best balance between performance and efficiency, which makes it especially valuable in resource-constrained environments. Neural methods are promising but require extensive training to reach their potential.

This work provides a controlled comparison useful to researchers with limited computing resources and contributes more broadly to game-playing AI. Future work will focus on improving the neural training process, incorporating richer evaluation functions, and exploring dynamic resource allocation strategies.

# Footnotes
<a name="lorenznote"></a>1.[^](#1): Thompson, T. (2023, August 31). History of AI in games – chess. modl.ai | AI Engine for Game Development. https://modl.ai/chess/<br>
<a name="lorenznote"></a>2.[^](#2): Wikipedia contributors. (2025, February 10). Deep Blue versus Garry Kasparov. Wikipedia. https://en.wikipedia.org/wiki/Deep_Blue_versus_Garry_Kasparov<br>
<a name="lorenznote"></a>3.[^](#3): Scheiermann, J., & Konen, W. (2023). AlphaZero-inspired game learning: Faster training by using MCTS only at test time. IEEE Transactions on Games, 15(4), 637-647. https://doi.org/10.1109/TG.2022.3206733<br>
<a name="lorenznote"></a>4.[^](#4): Wikipedia contributors. (2025, February 8). Leela Chess Zero. Wikipedia. https://en.wikipedia.org/wiki/Leela_Chess_Zero<br>
