

### Context 

You have been hired by ChessChowDown Inc. to provide insights on chess games played on the Lichess portal. The company will use your insights to develop new learning materials. 

### Schema

You are given a dataset of chess games played on the Lichess website (lichess.org). Each game document has the following fields:


* **event**: Type of event in which the game was played 
* **site**: Where the game was played 
* **date**: date the game was played
* **UTCdate** : date in UTC format
* **UTCtime**: time the game was played in UTC format
* **White**: username of the White player
* **Black**: username of the Black player
* **ECO**: Alphanumeric code of the opening played in the game
* **opening**: Name of the opening played in the game
* **TimeControl**: Time control under which the game was played, e.g., 60 seconds + 0 seconds increment per move, or 5 minutes + 3 seconds increment per move
* **result**: result of the game
* **WhiteElo**: Elo rating of the White player at the time of the game (https://www.chess.com/terms/elo-rating-chess)
* **BlackElo**: Elo rating of the Black player at the time of the game
* **moves**: list of moves of the game, each move is a sub-document with the following fields:

    * **number**: Number of the move
    * **turn**: false if it was White's move, true if it was Black's move. Note that it is important to consider the move number together with the turn: one can have number = 1 and turn = False (White's first move) and number = 1 and turn = True (Black's first move)
    * **clock**: player’s remaining time to the next time control after this move, in seconds
    * **move**: the algebraic notation of the move https://en.wikipedia.org/wiki/Algebraic_notation_(chess) 

    * **eval**: A subdocument with the evaluation of the position after the move, calculated by a computer chess engine: 
      {'unit' : 'centipawns' or 'mate', 'value' : an integer}. 

      If unit is 'centipawns', then 'value' is an integer that quantifies the advantage/disadvantage in the position after the move, from the point of view of the White player. A positive value indicates that White has a better position; a negative value indicates that Black has a better position. 
      If unit is 'mate', then the absolute value of 'value' is the number of moves until checkmate. A positive value indicates that White can checkmate in that many moves; a negative value indicates that Black can. 
      As checkmate ends the game, a mate evaluation indicates a very large advantage.
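
The two evaluation units can be folded into a single number, which makes positions easier to compare. The sketch below is one possible way to do that, not a prescribed one: the helper name `eval_to_score` and the sentinel `MATE_SCORE` are assumptions introduced here, and the distance to mate is deliberately ignored.

```python
# Assumed sentinel: larger than any realistic centipawn evaluation.
MATE_SCORE = 100_000


def eval_to_score(evaluation):
    """Map an eval subdocument to one number from White's point of view.

    Hypothetical helper: a forced mate becomes +/-MATE_SCORE, and only the
    sign of 'value' is used to decide which side delivers the mate.
    """
    if evaluation['unit'] == 'mate':
        return MATE_SCORE if evaluation['value'] > 0 else -MATE_SCORE
    return evaluation['value']


print(eval_to_score({'unit': 'centipawns', 'value': -150}))
print(eval_to_score({'unit': 'mate', 'value': 3}))
```

With this mapping, "mate in 4" and "mate in 12" for the same side get the same score, which is a simplification that may or may not suit a given analysis.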


| Task1 | Marks | 100% | 80% | 60% |
| --- | --- | --- | --- | --- |
| q6 | 10 | 30 sec | 90 sec | 5 min |
| q7 | 8 | 15 sec | 60 sec | 4 min |
| q8 | 12 | 30 sec | 120 sec | 5 min |
| q9 | 10 | 30 sec | 120 sec | 5 min |
| q10 | 10 | 30 sec | 120 sec | 5 min |




In [11]:
#import section
import pymongo
from pymongo import MongoClient
from datetime import datetime
from pprint import pprint
import networkx as nx
import pandas as pd

# Creation of pyMongo connection object
client = MongoClient('mongodb://localhost:27017/')
db = client['admin']
games_collection = db['games']

## The snippet below deletes two games that create an issue with Question 2
## For more info, check the announcement
from bson.objectid import ObjectId

games_collection.delete_one({'_id':ObjectId('634d2bc62269f69fcfb74448')})
games_collection.delete_one({'_id':ObjectId('634d2bc62269f69fcfb7444a')})

<pymongo.results.DeleteResult at 0x26af24d1a80>

### Question 6

A "blunder" is defined as a move that worsens the evaluation of the position by 200 centipawns or more, relative to the evaluation after the opponent's previous move, _from the point of view of the player who made the move_. 

Examples:

* After Black's move 21 the evaluation is 50 centipawns and after White's move 22 the evaluation is -150 centipawns. White's move 22 is a blunder. 
* After White's move 36 the evaluation is 600 centipawns and after Black's move 36 the evaluation is checkmate for White in any number of moves. Black's move 36 is a blunder.
* After White's move 26 the evaluation is 50 centipawns and after Black's move 26 the evaluation is 300 centipawns. Black's move 26 is a blunder.
* After Black's move 44 the evaluation is Mate in 4, that is, White checkmates in 4 moves, and after White's move 45 the evaluation is 600 centipawns. White's move 45 is a blunder.

(Note the following case is not considered a blunder: After Black's move 44, the evaluation is Mate in 4, that is, White checkmates in 4 moves, and after White's move 45 the evaluation is Mate in 12.)


Remember that evaluation values are expressed from White's point of view. In the third example above, the evaluation increased by more than 200 centipawns, but from Black's point of view the position is more than 200 centipawns worse; hence, the move was a blunder.

Write a function that adds a boolean field 'blunder' to each "move" subdocument of each game document: true if the move is a blunder, false otherwise.


===== Further explanation of the blunder definition: there are 6 cases to consider, and we provide an example for each of them:

**1) White blunders into a worse position**

   Previous move by Black {eval: {unit: centipawns, value: 300}} (White has a 300 centipawns advantage)

   Current move by White {eval: {unit: centipawns, value: 100}} (White has a 100 centipawns advantage)

   White's move lost 200 centipawns of advantage. It is a blunder.

**2) Black blunders into a worse position**

   Previous move by White {eval: {unit: centipawns, value: -300}} (Black has a 300 centipawns advantage)

   Current move by Black {eval: {unit: centipawns, value: -100}} (Black has a 100 centipawns advantage)

   Black's move lost 200 centipawns of advantage. It is a blunder.

**3) White blunders mate**

   Previous move by Black {eval: {unit: centipawns, value: -300}} (White has a 300 centipawns disadvantage)

   Current move by White {eval: {unit: mate, value: -3}} (Black can mate in 3 moves)

   White's move took them from a disadvantage to being mated. It is a blunder.


**4) Black blunders mate**

   Previous move by White {eval: {unit: centipawns, value: -300}} (Black has a 300 centipawns advantage)

   Current move by Black {eval: {unit: mate, value: 3}} (White can mate in 3 moves)

   Black's move took them from an advantage to mate for White. It is a blunder.

**5) White misses mate**

   Previous move by Black {eval: {unit: mate, value: 3}} (White can mate in 3 moves)

   Current move by White {eval: {unit: centipawns, value: 800}} (White has an 800 centipawns advantage)

   White's move took them from mate to an advantage. It is a blunder (here is where it might be useful to think about mate for White as +100000).

   Note that the following is also a blunder:

   Previous move by Black {eval: {unit: mate, value: 3}} (White can mate in 3 moves)

   Current move by White {eval: {unit: mate, value: -5}} (Black can mate in 5 moves)

   White's move took them from mate for them to mate for Black. It is a blunder.

**6) Black misses mate**

   Previous move by White {eval: {unit: mate, value: -3}} (Black can mate in 3 moves)

   Current move by Black {eval: {unit: centipawns, value: 800}} (White has an 800 centipawns advantage)

   Black's move took them from mate to a disadvantage. It is a blunder (here is where it might be useful to think about mate for Black as -100000).
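
The six cases can be checked with one comparison once mate is treated as a very large score, as the hints above suggest. The following is a minimal sketch under that assumption; `score`, `is_blunder`, `MATE_SCORE` and `BLUNDER_THRESHOLD` are hypothetical names, and the sketch does not decide what to do with a game's first move or with moves that have no evaluation.

```python
MATE_SCORE = 100_000      # assumed stand-in for "mate", per the hint above
BLUNDER_THRESHOLD = 200   # centipawns lost, from the mover's point of view


def score(evaluation):
    """Evaluation as one number from White's point of view."""
    if evaluation['unit'] == 'mate':
        return MATE_SCORE if evaluation['value'] > 0 else -MATE_SCORE
    return evaluation['value']


def is_blunder(prev_eval, curr_eval, black_moved):
    """True if the move that produced curr_eval lost >= 200 centipawns.

    prev_eval is the evaluation after the opponent's previous move;
    black_moved mirrors the 'turn' field (False = White, True = Black).
    """
    prev, curr = score(prev_eval), score(curr_eval)
    # White loses ground when the score drops, Black when it rises.
    loss = (curr - prev) if black_moved else (prev - curr)
    return loss >= BLUNDER_THRESHOLD


cp = lambda v: {'unit': 'centipawns', 'value': v}
mate = lambda v: {'unit': 'mate', 'value': v}

print(is_blunder(cp(300), cp(100), black_moved=False))    # case 1
print(is_blunder(mate(-3), cp(800), black_moved=True))    # case 6
print(is_blunder(mate(4), mate(12), black_moved=False))   # the non-blunder
```

Because both "mate in 4" and "mate in 12" map to the same score, the mate-to-longer-mate case yields a loss of zero and is not flagged, which matches the note in the question.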




In [18]:
def add_blunder_field(games_collection):
    # Sanity check: fetch one rated classical game and show its event type
    sample = games_collection.find_one({'event': 'Rated Classical game'})
    if sample is not None:
        print(sample['event'])
    #your code here

# Apply the function to add blunder fields
add_blunder_field(games_collection)

Rated Classical game


###  Question 7

Now that we have blunder data, the company wants to get insights into the factors that may be related to blunders. They believe that openings where more blunders happen are "more difficult"; hence, creating learning resources for those would make business sense. They define the opening phase of a game as the first 15 moves (inclusive, and of both White and Black) and ask you to prepare data for a statistical analysis of the correlation between ECO codes and opening phase blunders.

Write code for a function that returns a pandas DataFrame with three columns:

1. ECO: the opening code.
2. Games: the number of games with that ECO
3. MOB: Median Opening Blunders, the median number of blunders per game in the opening phase 


Example output:

| ECO | Games | MOB |
| --- | --- | ---- |
| C38 | 3756  |   8   |
| C39 | 2100 | 4 |
| C40 | 1152 | 2 |
| ... | ... | ... |

NOTE: The correctness of question 6 will not affect your mark on question 7.
If you find question 6 challenging, you may want to work on question 7 with test data before coming back to question 6.
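
As an illustration of the requested output shape, here is a small sketch that works on an in-memory list of game documents rather than on the MongoDB collection. The function name `opening_blunder_table` and the sample data are made up for this example; it assumes each move already carries the 'blunder' field and treats a missing field as "not a blunder".

```python
import pandas as pd

OPENING_MOVES = 15  # opening phase: move numbers 1..15 for both colours


def opening_blunder_table(games):
    """Build the ECO / Games / MOB table from a list of game dicts."""
    rows = []
    for game in games:
        blunders = sum(
            1 for move in game['moves']
            if move['number'] <= OPENING_MOVES and move.get('blunder', False)
        )
        rows.append({'ECO': game['ECO'], 'blunders': blunders})

    per_game = pd.DataFrame(rows, columns=['ECO', 'blunders'])
    # One row per ECO: how many games, and the median opening blunder count.
    return (per_game.groupby('ECO')
            .agg(Games=('blunders', 'size'), MOB=('blunders', 'median'))
            .reset_index())


sample_games = [
    {'ECO': 'C40', 'moves': [{'number': 1, 'blunder': True},
                             {'number': 16, 'blunder': True}]},
    {'ECO': 'C40', 'moves': [{'number': 2, 'blunder': True},
                             {'number': 3, 'blunder': True},
                             {'number': 15, 'blunder': True}]},
    {'ECO': 'B01', 'moves': [{'number': 1, 'blunder': False}]},
]
print(opening_blunder_table(sample_games))
```

Doing the counting on the database side with an aggregation pipeline would likely be faster on the full collection; the in-memory version is only meant to make the grouping logic concrete.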



In [None]:
def blunders_vs_openings(games_collection):
    ## Your code here
    # return a pandas DataFrame
    pass
    

### Question 8

An essential aspect of online chess is cheating detection. Cheaters feed the moves of their games to chess computer engines and play back the moves computed by the engine. Two factors that suggest a player might be cheating are long winning streaks and a low move time standard deviation, that is, taking about the same time to play each of their moves. Games involving cheating are useless for the purposes of ChessChowDown; therefore, the bosses are interested in filtering them out.

Write a function that receives the following input parameters: 

 * minStreak: minimum number of consecutive wins to warrant cheating analysis
 * TimeControl: Time Control to analyse
 * maxTimeStd: maximum move time standard deviation to flag as cheater
 
 and returns a list of usernames of potential cheaters that match all of the following conditions:
 
 1. Have won 'minStreak' or more consecutive games played with 'TimeControl' at least once.
 2. For all of the username's winning streaks with length greater than or equal to 'minStreak' played with 'TimeControl', each game has a move time standard deviation lower than or equal to maxTimeStd 
 
 For efficiency evaluation, we will use the TimeControl with the most games, minStreak = 3 and maxTimeStd = 1
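
The two building blocks of this question, move times and winning streaks, can be sketched on plain Python data. Everything below is an assumption made for illustration: the helper names, the `increment` parameter, the choice of the population standard deviation, and the decision to skip each player's first move (there is no earlier clock reading to subtract from).

```python
from itertools import groupby
from statistics import pstdev


def move_times(moves, black, increment=0):
    """Seconds one side spent per move, from consecutive clock readings.

    Assumes 'clock' is the time left after the move, increment included.
    """
    clocks = [m['clock'] for m in moves if m['turn'] == black]
    return [before - after + increment
            for before, after in zip(clocks, clocks[1:])]


def move_time_std(moves, black, increment=0):
    """Population standard deviation of move times, or None if no data."""
    times = move_times(moves, black, increment)
    return pstdev(times) if times else None


def winning_streaks(outcomes, min_streak):
    """(start index, length) of each run of wins of at least min_streak.

    outcomes is a chronologically ordered list of booleans (True = win).
    """
    streaks, index = [], 0
    for won, run in groupby(outcomes):
        length = len(list(run))
        if won and length >= min_streak:
            streaks.append((index, length))
        index += length
    return streaks


moves = [
    {'turn': False, 'clock': 60}, {'turn': True, 'clock': 60},
    {'turn': False, 'clock': 58}, {'turn': True, 'clock': 50},
    {'turn': False, 'clock': 56}, {'turn': True, 'clock': 49},
]
print(move_time_std(moves, black=False))
print(move_time_std(moves, black=True))
print(winning_streaks([True, True, True, False, True, True, True, True], 3))
```

A player would then be flagged only if at least one qualifying streak exists and every game in every qualifying streak stays at or below maxTimeStd.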




In [None]:
def cheater_detector(games_collection,minStreak,timeControl,maxTimeStd):
    #your code here
    #return list of suspicious usernames  
    pass

### Question 9

The company now wants to explore the social aspect of the games. The team decides it is a good idea to store explicit information about who plays against whom, in preparation for further social network analysis. 

Write a function that creates a new collection named "social" with the following schema:

* "username" : username
* "numgames" : number of games played by username with any colour
* "played" : list of subdocuments with the following schema:
    * "username" : opponent's username (different from parent username)
    * "numgames" : number of games played between parent username and opponent username
    * "gameids" : list of ids of the games played between parent username and opponent username

An example subset of the new collection is shown below.



```JSON
{
"username" : "DataKnight",
"numgames" : 28,
"played" : [ { "username" : "MongoQueen", "numgames" : 12,
               "gameids" : ["ids of the games between DataKnight and MongoQueen"] },
             { "username" : "ThePandas", "numgames" : 14,
               "gameids" : ["ids of the games between DataKnight and ThePandas"] },
             { "username" : "JupyterGod", "numgames" : 2,
               "gameids" : ["ids of the games between DataKnight and JupyterGod"] } ]
}
{
"username" : "MongoQueen",
"numgames" : 20,
"played" : [ { "username" : "DataKnight", "numgames" : 12,
               "gameids" : ["ids of the games between MongoQueen and DataKnight"] },
             { "username" : "Hadoooooooop", "numgames" : 6,
               "gameids" : ["ids of the games between MongoQueen and Hadoooooooop"] },
             { "username" : "JupyterGod", "numgames" : 2,
               "gameids" : ["ids of the games between MongoQueen and JupyterGod"] } ]
}
```
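
The schema above can be assembled from game documents with two nested dictionaries. The sketch below runs on an in-memory list of games; `build_social_documents` is a hypothetical name, and skipping games where both usernames are identical is an assumption based on the "different from parent username" remark.

```python
from collections import defaultdict


def build_social_documents(games):
    """Return one 'social' document per username from a list of game dicts."""
    # opponents[user][opponent] -> ids of the games the two played
    opponents = defaultdict(lambda: defaultdict(list))
    for game in games:
        white, black = game['White'], game['Black']
        if white == black:
            continue  # assumed: a user is never their own opponent
        opponents[white][black].append(game['_id'])
        opponents[black][white].append(game['_id'])

    documents = []
    for user, played in opponents.items():
        documents.append({
            'username': user,
            'numgames': sum(len(ids) for ids in played.values()),
            'played': [
                {'username': opponent, 'numgames': len(ids), 'gameids': ids}
                for opponent, ids in played.items()
            ],
        })
    return documents


sample_games = [
    {'_id': 1, 'White': 'DataKnight', 'Black': 'MongoQueen'},
    {'_id': 2, 'White': 'MongoQueen', 'Black': 'DataKnight'},
    {'_id': 3, 'White': 'DataKnight', 'Black': 'ThePandas'},
]
for document in build_social_documents(sample_games):
    print(document)
```

The resulting list could then be written to the new collection, for example with pymongo's `insert_many`, although building it inside MongoDB with an aggregation pipeline is another reasonable route.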

In [None]:
def create_social(games_collection):
    #your code here
    pass

### Question 10

The company wants to investigate communities of players that play the same openings.

Write a function that receives as input a list of ECO codes 'ecoCodes' and constructs a networkx graph as follows:

 * Nodes are labeled with usernames
 * A directed edge from user1 to user2 for each game g such that g's ECO is in the ecoCodes list, user1 played White and user2 played Black
   (This was the subject of a clarification; it previously read "if a game g exists such that")
 * Each directed edge is labeled with the result of the game it represents.
 (Note this is a directed multigraph, i.e., a networkx MultiDiGraph)
 
 Using that graph, return the following dictionary:
 
 { 'graph' : the networkx graph,
   'mostWhite' : username that played the most games as White ,   
   'mostBlack' : username that played the most games as Black  ,
   'keyPlayers' : list of usernames with highest betweenness centrality 
 }
 
 Assume the length of ecoCodes is restricted to at most 5. For efficiency evaluation, we will use these 5 ECO codes: ['A80','B16','B17','B18','B19']
 (This was the subject of a correction; it was previously the 5 ECO codes with the most games.)
 
 Hint: To compute centrality, use the subgraph of the MultiDiGraph induced by removing all but one of the parallel edges
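
The construction and the hint can be sketched on a small in-memory list of games. The helper names and sample data below are assumptions for illustration; the sketch relies on `nx.DiGraph(graph)` collapsing parallel edges, and it reads "most games as White/Black" as the highest out-degree/in-degree within the filtered graph, which is one possible interpretation.

```python
import math

import networkx as nx


def build_opening_graph(games, eco_codes):
    """One White -> Black edge per game whose ECO is in eco_codes."""
    graph = nx.MultiDiGraph()
    for game in games:
        if game['ECO'] in eco_codes:
            graph.add_edge(game['White'], game['Black'], result=game['result'])
    return graph


def summarise(graph):
    """Build the result dictionary; assumes the graph has at least one node."""
    simple = nx.DiGraph(graph)  # keeps a single edge per ordered pair
    centrality = nx.betweenness_centrality(simple)
    top = max(centrality.values())
    return {
        'graph': graph,
        # Out-degree counts games as White, in-degree games as Black.
        'mostWhite': max(graph.nodes, key=graph.out_degree),
        'mostBlack': max(graph.nodes, key=graph.in_degree),
        'keyPlayers': [user for user, value in centrality.items()
                       if math.isclose(value, top)],
    }


sample_games = [
    {'White': 'a', 'Black': 'b', 'ECO': 'A80', 'result': '1-0'},
    {'White': 'a', 'Black': 'b', 'ECO': 'A80', 'result': '0-1'},
    {'White': 'b', 'Black': 'c', 'ECO': 'B16', 'result': '1/2-1/2'},
    {'White': 'c', 'Black': 'd', 'ECO': 'B16', 'result': '1-0'},
    {'White': 'e', 'Black': 'a', 'ECO': 'C20', 'result': '1-0'},
]
summary = summarise(build_opening_graph(sample_games, ['A80', 'B16']))
print(summary['mostWhite'], summary['mostBlack'], summary['keyPlayers'])
```

Ties for mostWhite and mostBlack are broken by node insertion order here, which the question does not specify.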




In [None]:
def opening_community(games_collection,ecoCodes):
    #your code here
    # Return dictionary below with corresponding values
    """
    return { 'graph' : the networkx graph
    'mostWhite' : username that played the most games as White 
    'mostBlack' : username that played the most games as black 
    'keyPlayers' : list of usernames with highest betweenness centrality 
    }
    """
    pass