
UoM COMP90054 Contest Project

Team Name : Pac-Champs

Team Members

List here the full name, email, and student number for each member of the team:

YouTube Link: https://www.youtube.com/watch?v=7Ib14vkOxLo


Overview

For our project, the initial objective was to understand the "Capture the Flag" problem and determine which models and approaches could be implemented to simplify it. We settled on three techniques for our agents: Q-learning, a heuristic evaluation function, and Monte Carlo tree search, combined into two candidate implementations.

The first implementation used Q-learning for our defensive agent and A* heuristic search for our offensive agent, where A* was used to find the distance to the nearest food pellet or capsule. The second implementation combined a heuristic function with the Monte Carlo method: the heuristic algorithm drove the offensive agent, while Monte Carlo tree search was used for the defensive agent. The heuristic plus Monte Carlo combination performed better than the Q-learning technique, so we used it for our final submission.

Our final submission therefore combines the heuristic function with the Monte Carlo method, which proved to be the more powerful and efficient implementation, outperforming our earlier approaches.

Techniques implemented

Q learning:

We implemented Q-learning together with A* heuristic search. Q-learning drives our defensive agent: it finds a strategic position from which to attack the enemy by looking for the alley with the most food pellets and waiting for the enemy to enter that zone. Based on two features, the distance to the ghost and the number of food pellets left, the agent chooses its next move from the Q-values. Our offensive agent instead uses A* heuristic search, which lets it learn the distance to the nearest food pellet or capsule and move into the opponents' territory easily and effectively.
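The snippet below is a minimal, self-contained sketch of what this feature-based Q-learning looks like: Q(s, a) is a weighted sum of the two features named above, and the weights are adjusted after each step. The function names, feature values, and the alpha and gamma settings are illustrative assumptions, not the submitted code.

```python
# Minimal sketch of approximate (feature-based) Q-learning using the two
# features mentioned above: distance to the nearest ghost and the number of
# food pellets left. Feature extraction from the real game state is omitted;
# the feature values, alpha and gamma below are illustrative only.

def q_value(weights, features):
    """Q(s, a) as a linear combination of feature values and learned weights."""
    return sum(weights[f] * v for f, v in features.items())

def q_update(weights, features, reward, next_action_features, alpha=0.2, gamma=0.8):
    """One approximate Q-learning step: w_f += alpha * correction * f."""
    best_next = max(q_value(weights, f) for f in next_action_features)
    correction = (reward + gamma * best_next) - q_value(weights, features)
    for f, v in features.items():
        weights[f] += alpha * correction * v
    return weights

# Toy usage with hand-made feature vectors (normally derived from the game state).
weights = {"distance_to_ghost": 0.0, "food_pellets_left": 0.0}
current = {"distance_to_ghost": 0.4, "food_pellets_left": 0.8}
next_options = [  # one feature vector per legal action in the successor state
    {"distance_to_ghost": 0.6, "food_pellets_left": 0.8},
    {"distance_to_ghost": 0.2, "food_pellets_left": 0.7},
]
weights = q_update(weights, current, reward=1.0, next_action_features=next_options)
print(weights)
```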

Heuristic Method:

The heuristic function we implemented uses a linear-regression-style weighted sum of features, which depend on several factors such as the Pacman's mode, i.e. offensive or defensive, start mode and kill mode. The approach we took is

y = f1*w1 + f2*w2 + f3*w3 +f4*w4

where,

  • y is the predicted value of the linear evaluation; f1, f2, f3 and f4 are the features, and w1, w2, w3 and w4 are the corresponding weights.

f1: Offense_mode
f2: successor score
f3: distance to food
f4: distance to ghost
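As a concrete illustration of the weighted sum y = f1*w1 + f2*w2 + f3*w3 + f4*w4, the sketch below scores one candidate action. Only the form of the formula and the feature names come from the report; the specific weight and feature values are placeholders, not the tuned weights.

```python
def evaluate(features, weights):
    """Weighted sum y = sum(f_i * w_i) over the four features."""
    return sum(features[name] * weights[name] for name in weights)

# Placeholder weights; the actual weights were tuned through testing.
weights = {"offense_mode": 1.0, "successor_score": 100.0,
           "distance_to_food": -1.0, "distance_to_ghost": -2.0}

# Feature values for one candidate action; the action with the best
# evaluation across all legal actions is the one the agent takes.
candidate = {"offense_mode": 1, "successor_score": 2,
             "distance_to_food": 5, "distance_to_ghost": 8}
print(evaluate(candidate, weights))
```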

The objective of the heuristic function is to select the action with the best heuristic cost. We designed the Pacman's behaviour so that at the start it heads to the centre of the maze, which we call the start mode. Once the Pacman reaches the centre, if there is no enemy around, the start mode changes to the offensive mode; otherwise it changes to the defensive mode and the agent watches for the enemy Pacman. If an enemy Pacman enters our zone, our agent switches to kill mode and follows the enemy Pacman to kill it. The Pacman has five possible actions, N, S, E, W and Stop, and our objective is to decide which action to take next based on the heuristic function. Our heuristic code therefore evaluates the next move depending on the Pacman's current mode and the possible actions.
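The mode switching described above can be summarised as a small state machine; the sketch below uses the mode names from the report, but the decision conditions are simplified assumptions rather than the exact implementation.

```python
def next_mode(mode, reached_centre, enemy_visible, enemy_in_our_zone):
    """Return the agent's next mode, using the mode names from the report."""
    if enemy_in_our_zone:
        return "kill"                       # chase the invading Pacman
    if mode == "start" and reached_centre:
        return "defensive" if enemy_visible else "offensive"
    return mode

# The five possible actions the heuristic chooses between each turn.
ACTIONS = ["North", "South", "East", "West", "Stop"]
```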

Monte Carlo Method:

A heuristic search algorithm is applied to choose the best of the available moves. Heuristic functions with different weights are applied to every possible move, and all of the moves, together with their functions and weights, are fed into a Monte Carlo tree simulation, which then returns the best possible move to take. One limitation is that the current design cannot handle reverse moves and falls into a deadlock; this is handled by removing reverse moves before feeding the moves into the tree.
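The sketch below illustrates this pipeline under the assumption of a roll-out-style Monte Carlo simulation: reverse moves are removed first, then each remaining move is scored by random playouts of a weighted heuristic. The game-interface callables, simulation count and depth are stand-ins, not the actual implementation.

```python
import random

REVERSE = {"North": "South", "South": "North", "East": "West", "West": "East"}

def monte_carlo_choice(state, last_move, get_legal_moves, apply_move,
                       heuristic_value, n_simulations=30, depth=10):
    # Drop the reverse of the previous move so the search cannot oscillate.
    moves = [m for m in get_legal_moves(state) if m != REVERSE.get(last_move)]
    if not moves:
        moves = get_legal_moves(state)

    def rollout(move):
        s, prev = apply_move(state, move), move
        for _ in range(depth):
            options = [m for m in get_legal_moves(s) if m != REVERSE.get(prev)]
            if not options:
                break
            prev = random.choice(options)
            s = apply_move(s, prev)
        return heuristic_value(s)

    # Score each candidate move by the total value of its random roll-outs.
    return max(moves, key=lambda m: sum(rollout(m) for _ in range(n_simulations)))
```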

Game Strategy

Defensive Agent

The defensive agent picks the centre point of the maze on its home side, taking into account the walls along its boundary column. The ghost agent keeps patrolling this centre point until it sees an enemy Pacman in its visible range. It then finds the closest path to the enemy Pacman using the available heuristic methods, attempts to kill it, and returns to the centre base once the enemy is killed.
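As an illustration, a patrol point of this kind could be computed as below; the sketch assumes the usual contest convention of a 2-D boolean walls grid and the red team occupying the left half of the maze, and is not the exact code used by the agent.

```python
def patrol_point(walls, width, height, red_team):
    """Pick the open cell on our boundary column closest to the vertical middle."""
    x = width // 2 - 1 if red_team else width // 2   # boundary column on our home side
    open_cells = [(x, y) for y in range(height) if not walls[x][y]]
    return min(open_cells, key=lambda cell: abs(cell[1] - height // 2))
```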

Offensive Agent

A Monte Carlo tree simulation is run in which all the possible actions are retrieved and weighted. The algorithm checks whether the agent could move backwards and removes those moves, since the current version of the algorithm breaks when reverse moves exist. Once a capsule has been eaten, the agent ignores the ghosts around it by assigning them a negative weight. While the capsule power is active, the food-carrying capacity is boosted and the agent stops returning to base after eating a single food item. If the agent does not see any ghost while in the enemy region, it heads for the capsule, eats it, and then eats as much food as possible.

Analysis

Of the approaches we implemented, we decided to use the Monte Carlo method combined with the heuristic search function, because during testing this final implementation outperformed our other approaches.

Please find below the analysis conducted for the different algorithms, comparing our implementations against each other and against the baseline team.

| Teams | Layout | Win Rate | Average Win Score | Tie Rate | Win Rate (opponent) | Average Win Score (opponent) |
| --- | --- | --- | --- | --- | --- | --- |
| Monte Carlo Heuristics vs Baseline | Default | 100% | 9.5 | 0% | 0% | 0 |
| Monte Carlo Heuristics vs Baseline | Office | 95% | 21 | 5% | 0% | 0 |
| Monte Carlo Heuristics vs qLearning | Default | 75% | 6.5 | 10% | 15% | 4 |
| Monte Carlo Heuristics vs qLearning | Office | 85% | 23 | 5% | 10% | 5 |
| qLearning vs Baseline | Default | 30% | 6.5 | 50% | 10% | 3 |
| qLearning vs Baseline | Office | 40% | 15.5 | 45% | 15% | 7 |

Below is the win-rate analysis of the different Pacman agents.

Challenges

  • The Q-learning agent we implemented takes longer than 1 second to make a decision in some contest layouts, which makes it ineligible for the competition.
  • The Monte Carlo agent tends to enter deep alleys if there is food around it, making it vulnerable to attacks from the enemy in such scenarios.
  • It was challenging to tune the weights; only after multiple iterations were we able to settle on the most suitable weights for our Monte Carlo tree search model.

Future Improvements

  • We could implement a strategy where, if the Pacman is in capsule-eating mode, it is sent to eat the farthest food first.
  • If we know the positions of the traps, we can deploy a method to avoid those alleys so that the Pacman does not get stuck in one position and killed.
  • Trap detection: during the first 10 seconds of the game we can locate all the coordinates that lead to an alley or dead end within 3-4 moves. For every 20 moves we take, we can store the Pacman's positions in a visited list, and each time the Pacman revisits a coordinate we increment that coordinate's count. If the counter shows that the Pacman has visited a state more than 6-7 times within 20 moves, one of our agents is probably stuck, so we should send our offensive agent to call the heuristic function and eat food so that we can still score points (a sketch of the visit-counting idea follows this list).
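A minimal sketch of the visit-counting part of this idea is given below; the window of 20 moves and the visit threshold come from the description above, while the class and method names are hypothetical.

```python
from collections import Counter

class StuckDetector:
    """Count repeated positions inside a sliding window of recent moves."""

    def __init__(self, window=20, threshold=6):
        self.window = window        # look at the last 20 moves
        self.threshold = threshold  # "stuck" if a cell is visited 6+ times
        self.history = []

    def record(self, position):
        self.history.append(position)
        if len(self.history) > self.window:
            self.history.pop(0)

    def is_stuck(self):
        counts = Counter(self.history)
        return any(c >= self.threshold for c in counts.values())

# Example: an agent bouncing between two cells is flagged as stuck.
detector = StuckDetector()
for _ in range(10):
    detector.record((5, 3))
    detector.record((5, 4))
print(detector.is_stuck())  # True
```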