Navigate to the folder CS3243_P2/sudoku and run the command python CS3243_P2_Sudoku_47.py input4.txt dump.txt to test the solver.
General algorithm
- Maintain a list of assignable numbers (the constraints) for each empty cell on the board
- Use backtracking to try the assignable numbers for each cell
- If a trial does not violate the Sudoku rules, proceed to the next empty cell
- If it does, backtrack by resetting the current cell to empty and returning to the previous assignment
- After each valid assignment, update the lists of assignable numbers for all remaining empty cells on the board (see the sketch below)
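A minimal sketch of this backtracking loop in Python, assuming a 9x9 board stored as a list of lists with 0 marking empty cells; the helper names candidates and solve are illustrative, not the project's actual identifiers.

def candidates(board, r, c):
    """Numbers assignable to cell (r, c) without violating the Sudoku rules."""
    used = set(board[r]) | {board[i][c] for i in range(9)}
    br, bc = 3 * (r // 3), 3 * (c // 3)
    used |= {board[br + i][bc + j] for i in range(3) for j in range(3)}
    return [n for n in range(1, 10) if n not in used]

def solve(board):
    """Backtracking search: try each candidate, recurse, undo on failure."""
    for r in range(9):
        for c in range(9):
            if board[r][c] == 0:
                for n in candidates(board, r, c):
                    board[r][c] = n
                    if solve(board):
                        return True
                    board[r][c] = 0   # violated later: reset and backtrack
                return False          # no candidate fits this cell
    return True                       # no empty cell left: solved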
Approach 1:
- Fill the empty cells from top to bottom, left to right
- Try each assignable number in ascending order
Results:
- Input1: 24.9235s
- Input2: 2.7022s
- Input3: 2.1306s
- Input4: 0.4958s
Approach 2:
- Fill the empty cells from top to bottom, left to right
- Try each assignable number in random order
Results:
- Input1: 25.6315s (better case), 213.7323s (worse case)
- Input2: 6.1811s (better case), 12.1366s (worse case)
- Input3: 0.9792s
- Input4: 0.7423s
Notes: The time taken varies widely. On average, there is no significant improvement over Approach 1, and for the harder cases (input1 and input2) this approach occasionally struggles to compute an answer.
Approach 3:
- Fill the empty cells from top to bottom, left to right
- Try each assignable number in ascending order
- Perform forward checking: stop early when the constraints are no longer satisfiable (some empty cell already has no assignable number)
Results:
- Input1: 11.5302s
- Input2: 0.6481s
- Input3: 0.3025s
- Input4: 0.2247s
Notes: Adding forward checking more than halved the running time in most cases; a minimal sketch of the check follows.
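The sketch below reuses the illustrative candidates helper from the earlier sketch; it is called after each assignment, and a False result makes the solver undo the assignment and try the next number instead of recursing.

def forward_check(board):
    """Return False as soon as some empty cell has no assignable number."""
    return all(candidates(board, r, c)
               for r in range(9) for c in range(9)
               if board[r][c] == 0)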
Approach 4:
- Fill the empty cells starting with the most constrained one (the fewest assignable numbers)
- Try each assignable number in ascending order
- Perform forward checking: stop early when the constraints are no longer satisfiable (some empty cell already has no assignable number)
Results:
- Input1: 3.9446s
- Input2: 0.04649s
- Input3: 0.01629s
- Input4: 0.02353s
Notes: The most-constrained-cell heuristic further improved performance, because it narrows down the number of assignments to try at each step (see the sketch below).
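A minimal sketch of the most-constrained-cell selection, again reusing the illustrative candidates helper; the solver fills the returned cell first instead of scanning top to bottom.

def most_constrained_cell(board):
    """Return the empty cell with the fewest assignable numbers, or None if full."""
    empties = [(r, c) for r in range(9) for c in range(9) if board[r][c] == 0]
    if not empties:
        return None
    return min(empties, key=lambda rc: len(candidates(board, rc[0], rc[1])))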
The update formula for the Q-value, from Wikipedia:

Q(s, a) <- Q(s, a) + alpha * [reward + gamma * max over a' of Q(s', a') - Q(s, a)]
The GameState class in pacman.py specifies the full game state, including the food, capsules, agent configurations and score changes.
It is used by the Game object to capture the actual state of the game and also used by the agent to reason about the game.
getLegalActions(self, agentIndex=0): returns the legal actions for the specified agent; returns [] if no action is possible (i.e. in a terminal state)
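A hypothetical usage fragment, assuming the GameState API described above (choose_action is an illustrative name, not part of the framework):

def choose_action(state):
    legal = state.getLegalActions(0)  # agentIndex 0 is Pacman
    if not legal:                     # an empty list signals a terminal state
        return None
    return legal[0]                   # placeholder policy: first legal action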
Agent (game.py)
|
--> ValueEstimationAgent (learningAgents.py)
|
--> ReinforcementAgent (learningAgents.py)
|
--> QlearningAgent (qlearningAgents.py)*
|
--> PacmanQAgent (qlearningAgents.py)
|
--> ApproximateQAgent (qlearningAgents.py)*
QLearningAgent
- Attributes:
alpha -------- learning rate
epsilon ------ exploration rate
gamma -------- discount factor
numTraining -- number of training episodes
actionFn = lambda state: state.getLegalActions()
(function that takes a state, returns a list of legal actions)
- Functions:
getQValue(self, state, action):
- if the state has not been seen, return 0.0
- otherwise, return the Q-value of the given state & action pair
computeValueFromQValues(self, state):
- return the maximum Q-value among all legal actions in the given state
- if no legal action exists, return 0.0
computeActionFromQValues(self, state):
- return the best action (the one that gives the highest Q-value) for a given state
- if no legal action exists, return `None`
getAction(self, state):
- return the action to take given the state
- if no legal action exists, return `None`.
- the agent either explores (picks a random legal action) or follows the learned policy
- the probability of exploring is given by `self.epsilon`
update(self, state, action, nextState, reward):
- update the Q-value for the state & action pair, given the observed reward and the Q-values of nextState
- the calculation follows the Q-learning formula above (a full sketch of the class follows)
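A minimal sketch of how these functions fit together, assuming the project's util.flipCoin and util.Counter helpers; keying the Q-table on (state, action) tuples is one common choice, not necessarily the project's exact layout.

import random
import util

class QLearningAgentSketch:
    """Illustrative skeleton, not the project's exact class."""
    def __init__(self, actionFn, alpha=0.5, epsilon=0.1, gamma=0.9):
        self.actionFn = actionFn
        self.alpha, self.epsilon, self.gamma = alpha, epsilon, gamma
        self.qvalues = util.Counter()  # unseen pairs default to 0.0

    def getQValue(self, state, action):
        return self.qvalues[(state, action)]

    def computeValueFromQValues(self, state):
        actions = self.actionFn(state)
        if not actions:
            return 0.0
        return max(self.getQValue(state, a) for a in actions)

    def computeActionFromQValues(self, state):
        actions = self.actionFn(state)
        if not actions:
            return None
        return max(actions, key=lambda a: self.getQValue(state, a))

    def getAction(self, state):
        actions = self.actionFn(state)
        if not actions:
            return None
        if util.flipCoin(self.epsilon):               # explore
            return random.choice(actions)
        return self.computeActionFromQValues(state)   # exploit

    def update(self, state, action, nextState, reward):
        # Q(s,a) <- Q(s,a) + alpha * [r + gamma * max_a' Q(s',a') - Q(s,a)]
        sample = reward + self.gamma * self.computeValueFromQValues(nextState)
        self.qvalues[(state, action)] += self.alpha * (sample - self.getQValue(state, action))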
Task 1
Run "python2 pacman.py -p PacmanQAgent -x 2000 -n 2010 -l smallGrid"
Completed 2000 out of 2000 training episodes
Average Rewards over all training: -50.29
Average Rewards for last 100 episodes: 245.66
Episode took 1.12 seconds
Average Score: 500.2
Win Rate: 10/10 (1.00)
Run "python2 autograder.py -q q1"
Reinforcement Learning Status:
Completed 100 test episodes
Average Rewards over testing: 500.48
Average Rewards for last 100 episodes: 500.48
Episode took 1.15 seconds
Average Score: 500.48
Win Rate: 100/100 (1.00)
ApproximateQAgent
- Attributes:
featExtractor = util.lookup(extractor, globals())()
weights = util.Counter()
- Functions:
getQValue(self, state, action):
- return the Q-value for the given state & action pair by taking the dot product of the feature values and their corresponding weights
update(self, state, action, nextState, reward):
- update the weight of each feature based on the Q-learning update rule (see the sketch below)
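A minimal sketch of the two computations, assuming features is a util.Counter produced by one of the extractors described below; the function names and default hyperparameters are illustrative.

import util

def approx_q_value(weights, features):
    """Q(s, a) = dot product of the feature values and their weights."""
    return sum(weights[f] * v for f, v in features.items())

def approx_update(weights, features, reward, next_value, q_value,
                  alpha=0.2, gamma=0.8):
    """w_i <- w_i + alpha * difference * f_i(s, a)."""
    difference = (reward + gamma * next_value) - q_value
    for f, v in features.items():
        weights[f] += alpha * difference * v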
Task 2
Run "python2 pacman.py -p ApproximateQAgent -x 2000 -n 2010 -l smallGrid"
Reinforcement Learning Status:
Completed 2000 out of 2000 training episodes
Average Rewards over all training: -77.55
Average Rewards for last 100 episodes: 186.18
Episode took 1.50 seconds
Average Score: 499.8
Win Rate: 10/10 (1.00)
Run "python2 pacman.py -p ApproximateQAgent -a extractor=SimpleExtractor -x 50 -n 60 -l mediumGrid"
Average Score: 528.2
Win Rate: 10/10 (1.00)
Run "python2 pacman.py -p ApproximateQAgent -a extractor=SimpleExtractor -x 50 -n 60 -l mediumClassic"
Average Score: 1323.9
Win Rate: 10/10 (1.00)
Run "python2 autograder.py -q q2"
*** PASS: test_cases/q2/1-tinygrid.test
*** PASS: test_cases/q2/2-tinygrid-noisy.test
*** PASS: test_cases/q2/3-bridge.test
*** PASS: test_cases/q2/4-discountgrid.test
*** PASS: test_cases/q2/5-coord-extractor.test
FeatureExtractor (featureExtractors.py)
|
--> IdentityExtractor (assigns a single feature to every state & action pair)
|
--> CoordinateExtractor (additionally considers the x-y coordinates of the player agent)
|
--> SimpleExtractor (considers whether ghosts are one step away & the nearest food)
|
--> NewExtractor (additionally considers the nearest capsule & nearest scared ghost; sketched below)
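A hedged sketch of the two extra features NewExtractor considers, assuming the framework's state.getCapsules(), state.getGhostStates() (whose entries carry a scaredTimer attribute) and util.manhattanDistance; the feature names and absence of scaling are illustrative.

import util

def capsule_and_scared_ghost_features(state, pos, features):
    """Add nearest-capsule and nearest-scared-ghost distances to features."""
    capsules = state.getCapsules()
    if capsules:
        features["closest-capsule"] = min(
            util.manhattanDistance(pos, c) for c in capsules)
    scared = [g.getPosition() for g in state.getGhostStates()
              if g.scaredTimer > 0]
    if scared:
        features["closest-scared-ghost"] = min(
            util.manhattanDistance(pos, g) for g in scared)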
Task 3
Run "python2 pacman.py -p ApproximateQAgent -a extractor=NewExtractor -x 50 -n 60 -l mediumClassic"
Average Score: 1583.5
Win Rate: 9/10 (0.90)
Average Score: 1424.6
Win Rate: 8/10 (0.80)
Average Score: 1615.3
Win Rate: 10/10 (1.00)
Run "python2 pacman.py -p ApproximateQAgent -a extractor=NewExtractor -x 50 -n 60 -l mediumClassic"
(prioritise less on hunting capsule)
Average Score: 1611.2
Win Rate: 10/10 (1.00)
Average Score: 1695.6
Win Rate: 10/10 (1.00)
Average Score: 1588.1
Win Rate: 9/10 (0.90)
Run "python2 pacman.py -p ApproximateQAgent -a extractor=NewExtractor -x 50 -n 60 -l mediumClassic"
(prioritise less on hunting capsule, and more on eating scared ghosts)
Average Score: 1554.2
Win Rate: 9/10 (0.90)
Average Score: 1636.2
Win Rate: 10/10 (1.00)
Average Score: 1651.0
Win Rate: 10/10 (1.00)
Note: Considering the eating of capsules and scared ghosts clearly increases the reward compared with SimpleExtractor, by roughly 300 points. Moreover, the win rate is more stable when eating capsules is given lower priority.
Functions
flipCoin(p): returns True with the given probability p
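This is essentially a Bernoulli trial; a plausible one-line implementation (the project's util.py does something equivalent):

import random

def flipCoin(p):
    return random.random() < p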
Counter class
A counter maintains counts for a set of keys. It extends the dictionary with numeric values (integer / float), for example:
{'test': 2, 'blah': 1, 'item': 5}
All keys default to the value 0, so we can count things without initialisation:
from util import Counter

a = Counter()
a['blah'] += 1
print(a['blah'])
The terminal will print 1 without error.
- incrementAll(self, keys, count): increments the value of every key in keys by count
- argMax(self): returns the key with the highest value
- sortedKeys(self): returns a list of keys sorted by their values in descending order
- totalCount(self): returns the sum of counts for all keys
- normalize(self): edits the counter so that the total count of all keys equals 1; the ratios between counts remain the same
- divideAll(self, divisor): divides all counts by divisor
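A short usage sketch of these helpers, assuming the project's util.Counter is importable:

from util import Counter

b = Counter()
b.incrementAll(['a', 'b', 'c'], 1)  # {'a': 1, 'b': 1, 'c': 1}
b['a'] += 4                         # {'a': 5, 'b': 1, 'c': 1}
print(b.argMax())                   # 'a'
b.normalize()                       # counts now sum to 1, ratios preserved
print(b.totalCount())               # 1.0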
