# STA 141B Assignment 3

### Preliminaries
Due __February 27__ by __11:59pm__.

Submit your work by uploading it to Gradescope. Submission requires two files: the original Jupyter Notebook and its PDF export.
Please rename this file to "H1_Lastname_Firstname_srnr", where srnr is the last four digits of your student ID number, and name the PDF export file the same way.

### Objective

The objective of this homework assignment is to solidify your understanding of and proficiency with __APIs__, and, more generally, with Web scraping.

### Instructions

1. Provide your solutions in new cells between the `Solution START` cell and the `Solution END` cell. Create as many new cells as necessary within these two blocks. Use code cells for your Python scripts and Markdown cells for explanatory text or answers to non-coding questions.

2. You must execute the code following every `Validation` block to receive credit for the corresponding task. Failure to do so may result in a loss of points.

3. Prioritize code readability. Just as in writing a book, the clarity of each line matters. Adopt the __one-statement-per-line__ rule. If you have a lengthy code statement, consider breaking it into multiple lines for clarity. Note that you can use `'''` to start and end Python strings that span multiple lines.

4. To help understand and maintain code, you should add comments to explain your code. Use the hash symbol (#) to start writing a comment.

5. Submit your work by uploading it to __Gradescope__. Submission requires two files: the original Jupyter Notebook (.ipynb) and its PDF export. To convert your Jupyter notebook file into a PDF, navigate to "File", select "Download as", and then choose either "PDF via LaTeX" or "HTML". If "PDF via LaTeX" does not work for you, export to "HTML", and then use Chrome to print the .html file into PDF.

6. This assignment will be graded on your proficiency in programming. Be sure to demonstrate your abilities and submit your own, correct and readable solutions.
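
As an illustration of instructions 3 and 4, here is a minimal sketch of the readability conventions. The names `message`, `total`, and `line_count` are made up for this example and have nothing to do with the tasks below.

```python
# A multi-line string delimited by triple quotes.
message = '''Dear grader,
this string spans
several lines.'''

# A long expression broken across lines inside parentheses,
# one term per line, instead of one very long line.
total = (
    1
    + 2
    + 3
)

# One statement per line: avoid writing `a = 1; b = 2` on a single line.
line_count = len(message.splitlines())
print(line_count, total)
```

Wrapping a long expression in parentheses lets Python continue it over several lines without backslashes.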

### Code of conduct

The use of AI for this homework is strictly forbidden.

### Setting

We will use the [lichess](https://lichess.org/api) API to retrieve some information about the current state of chess in the world. To answer the questions below, make precise and economical requests. This API is well documented, so please make sure to read the documentation before working on the tasks.

Kindly note that the __results depend on the time of your request.__ Thus, your answer might be correct even if your output differs from the one provided at the end of your task.

In [1]:
import requests
import json
import pandas as pd
import time

from datetime import datetime

## Exercise 1 [10 points]

### 1a) [1 Point]

#### Task

Consider the top 10 chess players (in terms of their rating) of some chess `variant` in lichess.

Write a function that takes a string `variant` as input and returns a list or Pandas.Series consisting of their __ids__.

Afterwards, use this function to create a Pandas.Series called `top_pls` that contains the ids of the top 10 players of __classical__ chess on lichess.

#### Solution START

All code for this task must be written between this `Solution START` and the following `Solution END` block.

#### Solution END

#### Examples

The following examples are provided to help you for the task. Note that these data depend on the time of your request and therefore you might get different results.

In [3]:
top_pls[0:2]

0      yuuki-asuna
1    chesstheory64
Name: id, dtype: object

#### Validation
Please run the following code lines. Wrong results or errors in this code may still earn partial credit, as long as the code is executed.

In [None]:
print(type(top_pls))
print(len(top_pls))

In [None]:
top_pls

### 1b) [1 Point]

#### Task

Create a list or Pandas.Series `variants` consisting of all chess styles on lichess (this includes both chess games with different rules and games at different speeds). Do NOT fill the list by hand. Instead, create the list by post-processing a request that returns the top 10 lists of all chess variants, and retrieve the variants from there. In particular, note that puzzles are not considered a chess variant.

#### Solution START

All code for this task must be written between this `Solution START` and the following `Solution END` block.

#### Solution END

#### Examples

The following examples are provided to help you for the task.

In [7]:
len(variants)

13

In [8]:
variants[0:2]

['bullet', 'blitz']

#### Validation
Please run the following code lines. Wrong results or errors in this code may still earn partial credit, as long as the code is executed.

In [None]:
variants

### 1c) [2 Points]

#### Task
Create a DataFrame `df` that contains the ids of the top 10 players of all chess styles. Each column shall contain the ids of the top 10 players of one chess style. You may either use functions from previous tasks or process a different request.

Afterwards, create a list or Pandas.Series `multi_talents` of all player ids that appear in more than one top 10 list. The result shall contain the id as index and the number of occurrences as values.
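
The idea of counting how often an item appears across several lists can be sketched on toy data. This is only an illustration using the standard library: the style names and player ids below are made up, and the task itself asks for a result built from `df`.

```python
from collections import Counter

# Toy data (made up): one short top list per style.
toy_top_lists = {
    "styleA": ["anna", "ben", "cleo"],
    "styleB": ["ben", "dave", "anna"],
    "styleC": ["ben", "eve", "finn"],
}

# Count how often each id occurs over all lists.
counts = Counter(
    player_id
    for ids in toy_top_lists.values()
    for player_id in ids
)

# Keep only the ids that occur more than once.
repeated = {
    player_id: n
    for player_id, n in counts.items()
    if n > 1
}
print(repeated)
```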

#### Solution START

All code for this task must be written between this `Solution START` and the following `Solution END` block.

#### Solution END

#### Examples

The following examples are provided to help you for the task. Note that these data depend on the time of your request and therefore you might get different results.

In [11]:
df.head(1)

|   | bullet | blitz | rapid | classical | ultraBullet | crazyhouse | chess960 | kingOfTheHill | threeCheck | antichess | atomic | horde | racingKings |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | mraquariyaz67 | wonderland305 | yuuki-asuna | yuuki-asuna | mraquariyaz67 | larso | chess-art-us | mraquariyaz67 | mraquariyaz67 | tetiksh1agrawal | casperyliu | matvei-e2e4 | seth_777 |


In [12]:
multi_talents[0:2]

mraquariyaz67    5
chess-art-us     4
Name: count, dtype: int64

#### Validation
Please run the following code lines. Wrong results or errors in this code may still earn partial credit, as long as the code is executed.

In [None]:
df

In [None]:
print(multi_talents)

In [None]:
type(multi_talents)

### 1d) [1 Point]

#### Task

Write a function `get_user_info` that takes a user's id as an argument and returns their name, their title, their lichess rating in the _classical_ variant, and their FIDE rating. Whenever one of these properties cannot be found, replace the corresponding entry with `None` (but return all the other properties). Furthermore, if a player has no profile at all, return `None` for each of their properties.
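
One way to fall back to `None` when a nested property is missing is sketched below. The helper `safe_get` and the field names in `record` are hypothetical and are not taken from the lichess API; consult the documentation for the real structure of a response.

```python
def safe_get(data, *keys):
    """Follow `keys` through nested dicts; return None if any step is missing."""
    current = data
    for key in keys:
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


# Made-up record; these field names are NOT the lichess field names.
record = {"id": "someone", "stats": {"classical": {"score": 1500}}}

# An existing path returns the stored value.
print(safe_get(record, "stats", "classical", "score"))

# A missing path returns None instead of raising a KeyError.
print(safe_get(record, "details", "nickname"))
```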

#### Solution START

#### Solution END

#### Examples

The following examples are provided to help you for the task. Note that these data depend on the time of your request and therefore you might get different results.

In [17]:
get_user_info('kurald_galain')

['Anomander Rake', None, 2292, None]

#### Validation
Please run the following code lines. Wrong results or errors in this code may still earn partial credit, as long as the code is executed.

In [None]:
get_user_info('muisback')

In [None]:
get_user_info('chesstheory64')

In [None]:
get_user_info('mysterious-master')

### 1e) [2 Points]

#### Task

Write two functions `get_best_style` and `get_favourite_style` that take the player's id as input and do the following:

`get_best_style` determines the player's best current rating across all variants and returns a dictionary with the best variant as key and the corresponding rating as value. If the best variant is not unique, the dictionary shall contain all best variants.

`get_favourite_style` determines, for each variant, the number of games the player has played and returns a dictionary with the variant the player has played most often as key and the number of games as value. If the most played variant is not unique, the dictionary shall contain all such variants.
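
The tie rule in both descriptions (keep every key that attains the maximum) can be sketched on toy data. The helper name `keys_with_max` and the values below are made up for illustration.

```python
def keys_with_max(values):
    """Return a dict of every key whose value equals the maximum value."""
    if not values:
        return {}
    best = max(values.values())
    return {
        key: value
        for key, value in values.items()
        if value == best
    }


# Toy input with a tie between two keys.
toy = {"styleA": 2100, "styleB": 2300, "styleC": 2300}
print(keys_with_max(toy))
```

Computing the maximum first and filtering afterwards keeps all tied entries, whereas a single call to `max` with a `key` argument would return only one of them.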

#### Solution START

#### Solution END

#### Examples

The following examples are provided to help you for the task. Note that these data depend on the time of your request and therefore you might get different results.

In [23]:
get_best_style('kurald_galain')

{'bullet': 2998}

In [24]:
get_favourite_style('kurald_galain')

{'bullet': 38173}

#### Validation
Please run the following code lines. Wrong results or errors in this code may still earn partial credit, as long as the code is executed.

In [None]:
get_best_style(top_pls[0])

In [None]:
get_best_style('muisback')

In [None]:
get_favourite_style(top_pls[0])

In [None]:
get_favourite_style('muisback')

### 1f) [1 Point]

#### Task

Define a function `get_best_rating` that takes the user's id and a variant as input and determines the highest rating this player has ever reached in this specific variant. The function shall return a dictionary with the date of the best rating as key and the highest rating as value.

Note: if a user has not played any games of this variant yet, return the dictionary `{None: None}`.

#### Solution START

#### Solution END

#### Examples

The following examples are provided to help you for the task. Note that these data depend on the time of your request and therefore you might get different results.

In [30]:
print(get_best_rating('kurald_galain', 'Bullet'))

{datetime.datetime(2025, 3, 6, 0, 0): np.int64(3068)}


#### Validation
Please run the following code lines. Wrong results or errors in this code may still earn partial credit, as long as the code is executed.

In [None]:
print(get_best_rating(top_pls[0], 'Classical'))

In [None]:
print(get_best_rating('muisback', 'Classical'))

In [None]:
print(get_best_rating('muisback', 'Blitz'))

### 1g) [2 Points]

#### Task

Write a function `get_games` that takes two (different) user ids as arguments and returns the total number of games these two players have played against each other.

Afterwards, find out against which player in the top 10 list of the variant 'Classical' the player 'yuuki-asuna' has played the most games, and how many games they played. Display this player (e.g. by using the `print` command).

Hint: For this request, you may easily exceed the rate limit. To avoid this, wait for 10 seconds between each request.
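
The waiting pattern from the hint can be sketched without any network access. The helper `call_with_pause` and the stand-in `fake_request` are hypothetical names; in your solution, the function being called would send the actual request.

```python
import time


def call_with_pause(func, arguments, pause_seconds=10, sleep=time.sleep):
    """Call `func` once per argument, pausing between consecutive calls."""
    results = []
    for position, argument in enumerate(arguments):
        # No pause is needed before the very first call.
        if position > 0:
            sleep(pause_seconds)
        results.append(func(argument))
    return results


# Stand-in for a function that would send one request.
def fake_request(name):
    return len(name)


# A pause of 0 keeps this demonstration fast.
print(call_with_pause(fake_request, ["ab", "cde"], pause_seconds=0))
```

Passing the sleep function as a parameter is a design choice that makes the pause easy to shorten or replace when trying the code out.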

#### Solution START

#### Solution END

## Exercise 2 [5 points]

### 2a) [1 Point]

#### Task

Write a function `get_all_games_vs` that takes a user_id `opponent` as argument, loads all games between 'yuuki-asuna' and the player `opponent` (regardless of the variant), and returns a DataFrame that contains (at least) the following columns: `players`, `winner` and `opening`.


Hint: For this task, it may be helpful to use the ndjson package because the API returns its response in NDJSON format (one JSON object per line). To do so, you might want to consider using the following lines:

`import ndjson`: Imports the package (don't forget to install it first)

`headers = {'Accept': 'application/x-ndjson'}`: Add headers to your request

`response.json(cls=ndjson.Decoder)`: Decode the response using the ndjson.Decoder.
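
To see what NDJSON decoding amounts to, here is a minimal sketch that parses such text with the standard `json` module instead of the ndjson package. The sample text and its field names are made up and do not reflect an actual API response.

```python
import json

# Made-up NDJSON text: one JSON object per line.
ndjson_text = '{"id": "g1", "moves": 40}\n{"id": "g2", "moves": 25}\n'

# Parse each non-empty line on its own.
records = [
    json.loads(line)
    for line in ndjson_text.splitlines()
    if line.strip()
]
print(len(records), records[0]["id"])
```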

#### Solution START

#### Solution END

#### Validation
Please run the following code lines. Wrong results or errors in this code may still earn partial credit, as long as the code is executed.

In [None]:
df = get_all_games_vs('byebob')

In [None]:
df

In [None]:
len(df)

### 2b) [1 Point]

#### Task

Write a function `get_players` that takes one game from the DataFrame in 2a) as argument (that is, it gets a row of the DataFrame) and returns a dictionary.
The dictionary contains 'white' and 'black' as keys and the user_ids of the corresponding white and black players as values.

#### Solution START

#### Solution END

#### Validation
Please run the following code lines. Wrong results or errors in this code may still earn partial credit, as long as the code is executed.

In [None]:
get_players(df.loc[0])

### 2c) [1 Point]

#### Task

Write a function `get_winner` that takes one game from 2a) as argument (that is, it gets a row of the DataFrame) and returns the user_id of the player who won the game. For this task, you may use the function from 2b). If the game ended in a draw, the function should return `None`.

Afterwards, create a Pandas.Series or list `winners` that contains the winners of all games between 'byebob' and 'yuuki-asuna'.

#### Solution START

#### Solution END

#### Validation
Please run the following code lines. Wrong results or errors in this code may still earn partial credit, as long as the code is executed.

In [None]:
get_winner(df.loc[0])

In [None]:
winners

### 2d) [2 Points]

#### Task

Define a function `get_opening` that takes one game from 2a) as argument (that is, it gets a row of the DataFrame) and returns a dictionary. This dictionary consists of only one entry: the user_id of the white player as key and the name of the opening as value.

Create a DataFrame of all games between 'yuuki-asuna' and 'byebob' that contains the user_id of the white player in the first column and the opening in the second column. Afterwards, create a new DataFrame consisting of all openings that 'yuuki-asuna' played as the white player and how often they played each opening. Find the two most common openings among all games 'yuuki-asuna' played as the white player against 'byebob' and display them (e.g. by using the `print` command).

#### Solution START

#### Solution END