

### Schema

You are given a dataset of chess games played on the Lichess website (lichess.org). Each game document has the following fields:


* **event**: Type of event in which the game was played
* **site**: Where the game was played 
* **date**: date the game was played
* **UTCdate** : date in UTC format
* **UTCtime**: time the game was played in UTC format
* **White**: username of the White player
* **Black**: username of the Black player
* **ECO**: Alphanumeric code of the opening played in the game
* **opening**: Name of the opening played in the game
* **TimeControl**: Time control under which the game was played, e.g. 60 seconds + 0 seconds increment per move, or 5 minutes + 3 seconds increment per move
* **result**: result of the game
* **WhiteElo**: Elo rating of the White player at the time of the game (https://www.chess.com/terms/elo-rating-chess)
* **BlackElo**: Elo rating of the Black player at the time of the game
* **moves**: list of moves of the game, each move is a sub-document with the following fields:

    * **number**: Number of the move
    * **turn**: false if it was White's move, true if it was Black's move. Note that it is important to consider the move number together with the turn: one can have number = 1 and turn = False (White's first move) and number = 1 and turn = True (Black's first move)
    * **clock**: player’s remaining time to the next time control after this move, in seconds
    * **move**: the move in algebraic notation (https://en.wikipedia.org/wiki/Algebraic_notation_(chess))

    * **eval**: a sub-document with the evaluation of the position after the move, calculated by a computer chess engine: {'unit' : 'centipawns' or 'mate', 'value' : an integer}.

      If unit is 'centipawns', then 'value' is an integer that quantifies the advantage or disadvantage in the position after the move, from White's point of view. A positive value indicates that White has the better position; a negative value indicates that Black has the better position.

      If unit is 'mate', then 'value' is the number of moves until checkmate. A positive value indicates that White can checkmate in 'value' moves; a negative value indicates that Black can checkmate in the absolute value of 'value' moves.

      As checkmate ends the game, a mate evaluation indicates a very large advantage.
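The sign convention of the eval sub-document can be illustrated with a small helper. This is only a sketch: `describe_eval` is a hypothetical name, not part of the coursework template, and it assumes the unit strings are exactly 'centipawns' and 'mate' as listed above.

```python
def describe_eval(ev):
    """Return a short reading of an eval sub-document (hypothetical helper)."""
    unit, value = ev["unit"], ev["value"]
    if unit not in ("centipawns", "mate"):
        raise ValueError(f"unexpected unit: {unit!r}")
    if value == 0:
        return "level position"
    # Positive values favour White, negative values favour Black.
    side = "White" if value > 0 else "Black"
    if unit == "mate":
        return f"{side} can checkmate in {abs(value)} moves"
    return f"{side} is better by {abs(value)} centipawns"


print(describe_eval({"unit": "centipawns", "value": -35}))
print(describe_eval({"unit": "mate", "value": 3}))
```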


### Marking scheme

- The result provided is correct: An incorrect answer will receive between 0% and 40% of the mark, depending on the nature of the mistake. Questions with only one possible answer will receive 0%; questions where the result is correct in some cases and not in others will be marked at 20% or 40%. Feel free to create as many notebooks as you want for experimenting, and transcribe your final answer to the one you submit.

- The result is provided in the expected format and output: 20% will be deducted from correct results that are not in the expected format, because a bad format breaks the automated marking scripts. If in doubt, ask.

- Efficiency of the answer: Measured in terms of execution time. There are many ways to reach the correct result; some are more efficient than others, and some are more straightforward than others.

- The tables below detail the percentage of the mark you get according to the efficiency of the answer. Each cell shows the maximum time allowed to get the percentage in the corresponding column. Answers that take longer than the time in the 60% column will be declared timed out and get zero points.


| Task 1 | Marks | 100% | 80% | 60% |
| --- | --- | --- | --- | --- |
| q1 | 5 | 5 sec | 10 sec | 1 min |
| q2 | 5 | 10 sec | 40 sec | 3 min |
| q3 | 6 | 30 sec | 60 sec | 4 min |
| q4 | 6 | 30 sec | 60 sec | 4 min |
| q5 | 8 | 5 sec | 10 sec | 1 min |
| q6 | 10 | 30 sec | 90 sec | 5 min |
| q7 | 8 | 15 sec | 60 sec | 4 min |
| q8 | 12 | 30 sec | 120 sec | 5 min |
| q9 | 10 | 30 sec | 120 sec | 5 min |
| q10 | 10 | 30 sec | 120 sec | 5 min |


| Task 2 | Marks | 100% | 80% | 60% |
| --- | --- | --- | --- | --- |
| q11 | 8 | 90 sec | 3 min | 6 min |
| q12 | 12 | 90 sec | 3 min | 6 min |
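To get a rough idea of where an answer falls in the tables above, you can time it locally. The sketch below is an assumption about how the thresholds might be applied, not the actual marking script: `time_call`, `mark_fraction` and `Q1_LIMITS` are hypothetical names, and timings on your machine may differ from those on the marking machine.

```python
import time


def time_call(func, *args):
    """Run func(*args) and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start


def mark_fraction(elapsed_seconds, limits):
    """Map an execution time to the fraction of the mark it would earn.

    limits is (max time for 100%, max time for 80%, max time for 60%),
    all in seconds. Anything slower than the 60% limit earns 0.
    """
    for fraction, limit in zip((1.0, 0.8, 0.6), limits):
        if elapsed_seconds <= limit:
            return fraction
    return 0.0


# Thresholds for q1, copied from the Task 1 table (in seconds).
Q1_LIMITS = (5, 10, 60)

_, elapsed = time_call(sum, range(1000))  # stand-in for a real answer
print(mark_fraction(elapsed, Q1_LIMITS))
```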

In [None]:
#import section
import pymongo
from pymongo import MongoClient
from datetime import datetime
from pprint import pprint
import networkx as nx
import pandas as pd

# Creation of pyMongo connection object
client = MongoClient('mongodb://coursework:coursework@localhost:27017')
db = client['coursework']
games_collection = db['games']

## The snippet below deletes two games that create an issue with Question 2
## For more info, check the announcement
from bson.objectid import ObjectId

games_collection.delete_one({'_id':ObjectId('6353c7b266d6f91385e3bd57')})
games_collection.delete_one({'_id':ObjectId('6353d8ac66d6f91385e3cd63')})

### Question 1


Write a function that receives as input a username, a colour ("White" or "Black") and a result, and returns all games where the input username plays as the input colour that ended with the input result.

For efficiency evaluation, we will use the username with most games played.



In [None]:
def get_games(games_collection,username,colour,result):
    #Your code here
    #Return list of game documents
    pass

### Question 2

Write a function that finds and removes duplicate games, leaving a single instance of each game. Games are duplicates if they have the same White, Black, UTCdate and UTCtime values.


(5 marks)
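The duplicate rule can be illustrated on plain Python dictionaries. This is only a sketch of the definition, using made-up sample games and a hypothetical helper `split_duplicates`; it works in memory and is not the database-side solution the question asks for.

```python
def split_duplicates(games):
    """Split games into (first instances, later duplicates).

    Two games are duplicates when White, Black, UTCdate and UTCtime all match.
    """
    seen = set()
    keep, extra = [], []
    for game in games:
        key = (game["White"], game["Black"], game["UTCdate"], game["UTCtime"])
        if key in seen:
            extra.append(game)
        else:
            seen.add(key)
            keep.append(game)
    return keep, extra


# Made-up sample data for illustration only.
sample = [
    {"White": "a", "Black": "b", "UTCdate": "2020.01.01", "UTCtime": "10:00:00"},
    {"White": "a", "Black": "b", "UTCdate": "2020.01.01", "UTCtime": "10:00:00"},
    {"White": "a", "Black": "b", "UTCdate": "2020.01.01", "UTCtime": "10:05:00"},
]
keep, extra = split_duplicates(sample)
print(len(keep), len(extra))
```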

In [None]:
def remove_duplicates(games_collection):
    #your code here
    pass

### Question 3

Write a function that returns the number of knight moves minus the number of bishop moves in all games in the dataset. Recall that in algebraic notation, a bishop move starts with the letter "B" and a knight move starts with the letter "N".

(6 marks)
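The first-letter rule can be checked on a plain list of move strings. The sketch below uses a hypothetical helper `knight_minus_bishop` and a made-up move list; it illustrates the rule in memory rather than the database query the question asks for. Note that the comparison is case-sensitive: a pawn move on the b-file, such as "b4", starts with a lowercase "b", and a promotion such as "e8=N" is a pawn move, not a knight move.

```python
def knight_minus_bishop(moves):
    """Count moves starting with 'N' minus moves starting with 'B'."""
    knights = sum(1 for move in moves if move.startswith("N"))
    bishops = sum(1 for move in moves if move.startswith("B"))
    return knights - bishops


# Made-up sample: three knight moves (Nf3, Nc6, Nf6) and one bishop move (Bb5).
sample_moves = ["e4", "e5", "Nf3", "Nc6", "Bb5", "Nf6"]
print(knight_minus_bishop(sample_moves))  # prints 2
```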

In [None]:
def knight_vs_bishop(games_collection):
    # your code here
    # returns integer value
    pass
    

### Question 4

A colleague's exploration of the dataset reports: 

* the "date" field contains wrong values; the "UTCdate" field contains the real dates
* the WhiteElo and BlackElo fields are strings, when they should be integers

Write a function that drops the "date" field from all documents, and converts WhiteElo and BlackElo to integers.

(6 marks)
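The intended effect on a single document can be sketched on a plain dictionary. `fix_game` is a hypothetical helper and the sample document is made up; the question itself asks for an update applied to the whole collection. If an Elo string is not numeric, `int()` raises `ValueError`; whether such values occur in this dataset is not stated here.

```python
def fix_game(game):
    """Return a copy of game without 'date' and with integer Elo ratings."""
    fixed = {key: value for key, value in game.items() if key != "date"}
    fixed["WhiteElo"] = int(fixed["WhiteElo"])
    fixed["BlackElo"] = int(fixed["BlackElo"])
    return fixed


# Made-up sample document for illustration only.
before = {"date": "????", "UTCdate": "2020.01.01", "WhiteElo": "1500", "BlackElo": "1620"}
after = fix_game(before)
print(after)
```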

In [None]:
def date_and_elo_fix(games_collection):
    #your code here
    pass

# Apply the function to execute the update
date_and_elo_fix(games_collection)

### Question 5

An "upset" is a game between two players with a (large) rating difference won by the lower rated player.  

Write a function that receives as input an integer 'ratingDifference' and returns the number of upsets, counting only games whose rating difference is greater than the input 'ratingDifference'.
(Note: this was amended from a previous version, which stated "less than" instead of "greater than".)

Examples, with 'ratingDifference' = 50:
* WhiteElo = 2000, BlackElo = 2100, result "1-0" ---> An upset
* WhiteElo = 2000, BlackElo = 2049, result "1-0" ---> Not an upset
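The definition and the two examples above can be expressed as a predicate on a single game. This is a sketch under stated assumptions: `is_upset` is a hypothetical helper, the Elo values are already integers, and a win for Black is encoded as "0-1" (only "1-0" appears in the examples above). Any other result, such as a draw, is treated as not an upset.

```python
def is_upset(white_elo, black_elo, result, rating_difference):
    """Return True if the lower-rated player won and the gap exceeds the threshold."""
    if abs(white_elo - black_elo) <= rating_difference:
        return False  # the gap must be strictly greater than the threshold
    if result == "1-0":  # White won
        return white_elo < black_elo
    if result == "0-1":  # Black won (assumed encoding)
        return black_elo < white_elo
    return False  # draws and other results are not upsets


print(is_upset(2000, 2100, "1-0", 50))  # True
print(is_upset(2000, 2049, "1-0", 50))  # False
```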


For efficiency evaluation, we will use 'ratingDifference' = 50 



In [None]:
def count_upsets(games_collection,ratingDifference):
    #Your code here
    #return integer value
    pass