# Scraping the Details of the Top 100 Chess Players Using Python

## In this project, we are going to scrape the top 100 chess players in the world by rating category.


### Web scraping is the process of extracting and parsing data from websites in an automated fashion using a computer program
1. Import the `requests` library.
2. Download web pages using the `requests` library.
3. The `requests` library lets you send HTTP requests to a URL and read the response.

![](https://i.imgur.com/heKuzSD.png)



### Chess Rankings Website
Chess-rankings.com is a website that provides lists of the world's top 100 chess players by rating category.
![](https://i.imgur.com/cdBW0aV.jpeg)

##### **Objective**:
Scrape the `Top 100 Chess Players` in each rating category by parsing the information on this website into tabular data.

##### **Fields listed on the website:**

1. Rank
2. Rank Change
3. Name
4. Title
5. Federation
6. Rating
7. Rating Change
8. Age
9. K (K-factor)
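
As a sketch that is not part of the original notebook, the nine fields can be modeled as a typed record. The class name `PlayerRecord`, its `from_cells` helper and the chosen types are assumptions; the sample values are copied from Magnus Carlsen's row in the page output shown later.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PlayerRecord:
    """Hypothetical typed record for one row of the rankings table."""
    rank: int
    rank_change: str      # e.g. "-", "↑4", "↓1" (kept as text)
    name: str
    title: str            # may be empty for untitled players
    federation: str
    rating: float         # live ratings can be fractional, e.g. 2808.4
    rating_change: float
    age: int
    k: int                # K-factor

    @classmethod
    def from_cells(cls, cells: List[str]) -> "PlayerRecord":
        """Build a record from the nine stripped <td> strings of one row."""
        if len(cells) != 9:
            raise ValueError(f"expected 9 cells, got {len(cells)}")
        return cls(
            rank=int(cells[0]),
            rank_change=cells[1],
            name=cells[2],
            title=cells[3],
            federation=cells[4],
            rating=float(cells[5]),
            rating_change=float(cells[6]),  # float() accepts a leading "+", e.g. "+2.4"
            age=int(cells[7]),
            k=int(cells[8]),
        )


carlsen = PlayerRecord.from_cells(
    ["1", "-", "Carlsen, Magnus", "GM", "Norway", "2864", "0", "32", "10"]
)
print(carlsen.rating, carlsen.k)  # 2864.0 10
```

Keeping `rank_change` as text is a deliberate choice here, because its arrow notation is not a plain number.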

## **Outline of the project:**
1. Understanding the structure of the [Chess Rankings website](https://chess-rankings.com)
2. Installing and importing the required libraries
3. Extracting the players' details from the website using `BeautifulSoup`
4. Parsing the details of the top 100 chess players into 9 fields (Rank, Rank Change, Name, Title, Federation, Rating, Rating Change, Age and K) using helper functions.
5. Storing the extracted data in a dictionary.
6. Compiling all the data into a DataFrame using `Pandas` and saving it to a `CSV` file.

In [4]:
# Install the library
!pip install requests --upgrade --quiet

In [5]:
import requests

In [6]:
url_1 = "https://chess-rankings.com/?categ=3000&label=1"

#### To download the web page, we use the `requests.get` function

In [7]:
response = requests.get(url_1)

In [8]:
type(response)

requests.models.Response

#### If the request was successful, the response status code will be between 200 and 299

In [9]:
response.status_code

200
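
As a hedged aside: the Requests documentation describes `response.ok` as True for any status code below 400, so it also accepts 3xx codes. A minimal sketch of a strict 2xx check follows; the helper name `is_successful` is hypothetical, and the standard library's `http.HTTPStatus` enum is used only to label the example codes.

```python
from http import HTTPStatus


def is_successful(status_code: int) -> bool:
    """Return True for any 2xx (success) HTTP status code."""
    return 200 <= status_code <= 299


# With the notebook's response object this would be: is_successful(response.status_code)
for code in (HTTPStatus.OK, HTTPStatus.NO_CONTENT,
             HTTPStatus.MOVED_PERMANENTLY, HTTPStatus.NOT_FOUND):
    print(int(code), code.phrase, is_successful(code))
```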

In [10]:
page_contents = response.text

In [11]:
len(page_contents)

63569

#### The page contains 63,569 characters, so we will view only the first 1,000 characters.

In [12]:
page_contents[:1000]

'\n\n\n\n\n\n<!DOCTYPE html>\n<html   style="min-width:375px;width:100%;overflow-x:hidden">\n<head>\n<meta charset="UTF-8" />\n\n<meta http-equiv="Content-Type" content="text/html; charset=utf-8"/>\n<meta http-equiv="X-UA-Compatible" content="IE=edge,chrome=1"> \n<meta name="viewport" content="width=device-width, initial-scale=1">\n<meta name="description" content="Chess Live Rankings & Ratings, player games🔥🔥 🤖 Statistics, graphs, customized ranks, players by country, by age,  calculators, world ⌛ Chess Tournaments ~400.000 players database " />\n<meta name="keywords" content="Ajedrez, España, Ajedrez España, Chess, Ranking, Sub 2300, Sub 2000, ajedrez por edades, categorias, top, elo FIDE" />\n\n<!--plugin fb-->\n  <meta property="og:url"           content="https://chess-rankings.com" />\n  <meta property="og:type"          content="website" />\n  <meta property="og:title"         content="Chess-Rankings" />\n  <meta property="og:description"   content="Chess Live Rankings & Ratings,

### Saving the page contents to a local HTML file

In [13]:
with open('chess-rankings.html', 'w', encoding="utf-8") as file:
    file.write(page_contents)

In [14]:
# Install the library
!pip install beautifulsoup4 --upgrade --quiet

## Use Beautiful Soup to parse and extract information
#### To extract information from the HTML source code of a page, we can use the Beautiful Soup library. To use it, import the `BeautifulSoup` class from the `bs4` module.

In [15]:
from bs4 import BeautifulSoup

In [16]:
with open('chess-rankings.html', 'r', encoding="utf-8") as f:  # read with the same encoding used for writing
    html_source = f.read()
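
The page contains non-ASCII characters (for example "España" and emoji in its meta description), so it should be read back with the same encoding it was written with. Without `encoding=...`, `open()` falls back to the platform's default encoding, which may not be UTF-8 (for example on Windows) and can then fail to decode the file. A minimal self-contained sketch of the round trip, using a temporary file and an illustrative sample string:

```python
import os
import tempfile

# Illustrative sample containing non-ASCII characters like those on the page
sample = '<meta name="keywords" content="Ajedrez, España" /> ⌛'

# Write and read with the same explicit encoding so non-ASCII characters survive the round trip
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "page.html")
    with open(path, "w", encoding="utf-8") as f:
        f.write(sample)
    with open(path, "r", encoding="utf-8") as f:
        restored = f.read()

print(restored == sample)  # True
```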

In [17]:
html_source[:1000]

'\n\n\n\n\n\n<!DOCTYPE html>\n<html   style="min-width:375px;width:100%;overflow-x:hidden">\n<head>\n<meta charset="UTF-8" />\n\n<meta http-equiv="Content-Type" content="text/html; charset=utf-8"/>\n<meta http-equiv="X-UA-Compatible" content="IE=edge,chrome=1"> \n<meta name="viewport" content="width=device-width, initial-scale=1">\n<meta name="description" content="Chess Live Rankings & Ratings, player games🔥🔥 🤖 Statistics, graphs, customized ranks, players by country, by age,  calculators, world ⌛ Chess Tournaments ~400.000 players database " />\n<meta name="keywords" content="Ajedrez, España, Ajedrez España, Chess, Ranking, Sub 2300, Sub 2000, ajedrez por edades, categorias, top, elo FIDE" />\n\n<!--plugin fb-->\n  <meta property="og:url"           content="https://chess-rankings.com" />\n  <meta property="og:type"          content="website" />\n  <meta property="og:title"         content="Chess-Rankings" />\n  <meta property="og:description"   content="Chess Live Rankings & Ratings,

#### Parsing the HTML page with BeautifulSoup to read its content

In [18]:
doc = BeautifulSoup(html_source, 'html.parser')

In [19]:
type(doc)

bs4.BeautifulSoup

In [20]:
doc.title

<title>Live Chess Ratings &amp; Rankings - Chess-Rankings.com
  </title>

In [21]:
doc.title.text

'Live Chess Ratings & Rankings - Chess-Rankings.com\n  '
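
`doc.title.text` keeps the newline and spaces that surround the title in the page source. A small self-contained sketch compares `.text`, `.text.strip()` and Beautiful Soup's `get_text(strip=True)`; the HTML snippet is a trimmed copy of the page's `<title>`.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the page's <title>, including its trailing newline and spaces
html = "<html><head><title>Live Chess Ratings &amp; Rankings - Chess-Rankings.com\n  </title></head></html>"
soup = BeautifulSoup(html, "html.parser")

print(repr(soup.title.text))                  # keeps the surrounding whitespace
print(repr(soup.title.text.strip()))          # whitespace removed with str.strip()
print(repr(soup.title.get_text(strip=True)))  # whitespace removed by Beautiful Soup
```

For a tag with several child strings, `get_text(strip=True)` strips each piece and joins them with no separator unless a `separator` argument is given, so the two approaches can differ there.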

#### Fetching all table rows (`<tr>` tags) inside the table body with `find_all` to get the player data

In [23]:
rows = doc.tbody.find_all('tr')

In [24]:
rows[:5]

[<tr id="1"><td>1</td><td>-</td><td class="nombre"><a href="https://chess-rankings.com/jugador.php?nombre=Carlsen, Magnus&amp;id=1503014">    Carlsen, Magnus     </a></td><td class="GM" style="font-weight:bold;">  GM  </td><td><img height="22" src="https://chess-rankings.com/img/banderas/no.png" style="border: 1px solid #CCCCE0;box-shadow: 1px 1px 1px grey;" title="Norway" width="35"/></td><td>  2864  </td><td>  0  </td><td>  32  </td><td>  10  </td></tr>,
 <tr id="2"><td>2</td><td>-</td><td class="nombre"><a href="https://chess-rankings.com/jugador.php?nombre=Ding, Liren&amp;id=8603677">    Ding, Liren     </a></td><td class="GM" style="font-weight:bold;">  GM  </td><td><img height="22" src="https://chess-rankings.com/img/banderas/cn.png" style="border: 1px solid #CCCCE0;box-shadow: 1px 1px 1px grey;" title="China" width="35"/></td><td>  2808.4  </td><td class="subeElo"> +2.4  </td><td>  30  </td><td>  10  </td></tr>,
 <tr id="3"><td>3</td><td class="subeElo">↑4</td><td class="nombre"

#### Creating a dictionary of lists to store the data for each column

In [25]:
rank = []
rank_change = []
name = []
title = []
fed = []
rating = []
rating_change = []
age = []
k = []
table_dict = {'Rank': rank,
              'Rank Change': rank_change,
              'Name': name,
              'Title':title,
              'Federation':fed,
              'FIDE Rating': rating,
              'Rating Change':rating_change,
              'Age':age,
              'K':k
              }

In [26]:
doc.tbody.find_all('td')

[<td>1</td>,
 <td>-</td>,
 <td class="nombre"><a href="https://chess-rankings.com/jugador.php?nombre=Carlsen, Magnus&amp;id=1503014">    Carlsen, Magnus     </a></td>,
 <td class="GM" style="font-weight:bold;">  GM  </td>,
 <td><img height="22" src="https://chess-rankings.com/img/banderas/no.png" style="border: 1px solid #CCCCE0;box-shadow: 1px 1px 1px grey;" title="Norway" width="35"/></td>,
 <td>  2864  </td>,
 <td>  0  </td>,
 <td>  32  </td>,
 <td>  10  </td>,
 <td>2</td>,
 <td>-</td>,
 <td class="nombre"><a href="https://chess-rankings.com/jugador.php?nombre=Ding, Liren&amp;id=8603677">    Ding, Liren     </a></td>,
 <td class="GM" style="font-weight:bold;">  GM  </td>,
 <td><img height="22" src="https://chess-rankings.com/img/banderas/cn.png" style="border: 1px solid #CCCCE0;box-shadow: 1px 1px 1px grey;" title="China" width="35"/></td>,
 <td>  2808.4  </td>,
 <td class="subeElo"> +2.4  </td>,
 <td>  30  </td>,
 <td>  10  </td>,
 <td>3</td>,
 <td class="subeElo">↑4</td>,
 <td cla

In [28]:
rows[0].find_all('td')[2].text.strip()

'Carlsen, Magnus'

In [29]:
rows[0].find_all('td')[4].img['title'].strip()

'Norway'

#### Appending each row's values to the dictionary's lists

Each row (`<tr>` tag) contains 9 `<td>` tags, which hold the details of one player.

![](https://i.imgur.com/DVrVAhX.png)
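A hedged sketch of parsing one row into a dictionary by looking the cells up once and pairing their texts with the column names via `zip`. `FIELDS` and `parse_row` are hypothetical names, and the HTML is Carlsen's row from the output above with most attributes trimmed.

```python
from bs4 import BeautifulSoup

# Hypothetical column order, mirroring the nine <td> cells of each row
FIELDS = ["Rank", "Rank Change", "Name", "Title", "Federation",
          "FIDE Rating", "Rating Change", "Age", "K"]

# Carlsen's row from the page output above, trimmed to the relevant markup
row_html = (
    '<table><tbody><tr id="1"><td>1</td><td>-</td>'
    '<td class="nombre"><a href="#">    Carlsen, Magnus     </a></td>'
    '<td class="GM">  GM  </td><td><img title="Norway"/></td>'
    '<td>  2864  </td><td>  0  </td><td>  32  </td><td>  10  </td></tr></tbody></table>'
)


def parse_row(row):
    """Return one player's fields as a dict, calling find_all('td') only once."""
    cells = row.find_all("td")
    values = [cell.text.strip() for cell in cells]
    # The Federation cell holds only a flag image; the country name is in its title attribute
    img = cells[4].img
    values[4] = img["title"].strip() if img is not None else ""
    return dict(zip(FIELDS, values))


row = BeautifulSoup(row_html, "html.parser").tbody.find("tr")
print(parse_row(row))
```

Returning one dictionary per row also makes it possible to build the DataFrame from a list of records instead of nine parallel lists.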

In [30]:
for row in rows:
    cells = row.find_all('td')  # look up the 9 cells of the row once
    rank.append(cells[0].text.strip())
    rank_change.append(cells[1].text.strip())
    name.append(cells[2].text.strip())
    title.append(cells[3].text.strip())
    fed.append(cells[4].img['title'].strip())  # country name is the flag image's title
    rating.append(cells[5].text.strip())
    rating_change.append(cells[6].text.strip())
    age.append(cells[7].text.strip())
    k.append(cells[8].text.strip())

#### Converting the dictionary into a DataFrame

In [31]:
import pandas as pd
df = pd.DataFrame(table_dict)
df

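|     | Rank | Rank Change | Name | Title | Federation | FIDE Rating | Rating Change | Age | K |
|-----|------|-------------|------|-------|------------|-------------|---------------|-----|---|
| 0 | 1 | - | Carlsen, Magnus | GM | Norway | 2864 | 0 | 32 | 10 |
| 1 | 2 | - | Ding, Liren | GM | China | 2808.4 | +2.4 | 30 | 10 |
| 2 | 3 | ↑4 | Nepomniachtchi, Ian | GM | Russia | 2792.4 | +26.4 | 32 | 10 |
| 3 | 4 | ↓1 | Firouzja, Alireza | GM | France | 2778.2 | -14.8 | 19 | 10 |
| 4 | 5 | ↓1 | Caruana, Fabiano | GM | United States | 2775.4 | -7.6 | 30 | 10 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 95 | 96 | - | Shevchenko, Kirill | GM | Ukraine | 2654 | 0 | 20 | 10 |
| 96 | 97 | - | Demchenko, Anton | GM | FIDE | 2653 | 0 | 35 | 10 |
| 97 | 98 | - | Swiercz, Dariusz | GM | United States | 2652 | 0 | 28 | 10 |
| 98 | 99 | - | Jones, Gawain C B | GM | UNITED KINGDOM | 2652 | 0 | 35 | 10 |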
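
Every value collected with `.text.strip()` is a string, and the Rank Change column mixes a dash with arrow symbols (`-`, `↑4`, `↓1`). A hypothetical helper, `parse_rank_change`, converts these to signed integers, assuming that `↑` means the player moved up that many places, `↓` means down, and `-` means no change.

```python
def parse_rank_change(value: str) -> int:
    """Convert '↑4' -> 4, '↓1' -> -1 and '-' -> 0 (assumed meaning of the symbols)."""
    value = value.strip()
    if value in ("", "-"):
        return 0
    if value.startswith("↑"):
        return int(value[1:])
    if value.startswith("↓"):
        return -int(value[1:])
    # Fail loudly on any notation not seen in the scraped sample
    raise ValueError(f"unrecognised rank change: {value!r}")


print([parse_rank_change(v) for v in ["-", "↑4", "↓1"]])  # [0, 4, -1]
```

Applied to the DataFrame, this would be `df["Rank Change"].map(parse_rank_change)`. Raising on unknown values, rather than silently returning 0, surfaces any other notation the site might use.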


### Creating helper functions for separate tasks

##### Helper function for fetching a page and parsing it with BeautifulSoup()

In [33]:
# Creating a function that fetches a page and parses it with BeautifulSoup()
def get_page(url):
    response = requests.get(url)
    if response.status_code != 200:
        raise Exception('Failed to load page {}'.format(url))
    page_content = response.text
    return BeautifulSoup(page_content,'html.parser')


##### Helper function that returns a dictionary of empty lists

In [34]:
# Creating a function that returns a dictionary with one empty list per column
def get_dict():
    rank = []
    rank_change = []
    name = []
    title = []
    fed = []
    rating = []
    rating_change = []
    age = []
    k = []
    table_dict = {'Rank': rank,'Rank Change': rank_change,'Name': name,'Title':title,'Federation':fed,'FIDE Rating': rating,
                 'Rating Change':rating_change,'Age':age,'K':k}
    return table_dict
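
`get_dict()` needs a *separate* empty list for each column. A more compact sketch builds the same structure with a dict comprehension; the names `COLUMNS` and `get_dict_compact` are hypothetical. It also shows why the tempting shortcut `dict.fromkeys(COLUMNS, [])` does not work: every key would share a single list.

```python
COLUMNS = ["Rank", "Rank Change", "Name", "Title", "Federation",
           "FIDE Rating", "Rating Change", "Age", "K"]


def get_dict_compact():
    """Same result as get_dict(): a fresh, independent empty list per column."""
    return {column: [] for column in COLUMNS}


# Pitfall: dict.fromkeys(COLUMNS, []) reuses ONE list object for every key
shared = dict.fromkeys(COLUMNS, [])
shared["Rank"].append("1")
print(shared["Name"])  # ['1'] -- the append shows up under every key

fresh = get_dict_compact()
fresh["Rank"].append("1")
print(fresh["Name"])   # []
```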

##### Helper function for getting data from the rows and appending it to the dictionary

In [35]:
# Creating a function that reads every table row and appends its values to the dictionary
def get_rows(doc, table_dict):
    rows = doc.tbody.find_all('tr')
    for row in rows:
        cells = row.find_all('td')  # look up the 9 cells of the row once
        table_dict['Rank'].append(cells[0].text.strip())
        table_dict['Rank Change'].append(cells[1].text.strip())
        table_dict['Name'].append(cells[2].text.strip())
        table_dict['Title'].append(cells[3].text.strip())
        # The Federation cell contains only a flag image; the country name is its title attribute
        table_dict['Federation'].append(cells[4].img['title'].strip())
        table_dict['FIDE Rating'].append(cells[5].text.strip())
        table_dict['Rating Change'].append(cells[6].text.strip())
        table_dict['Age'].append(cells[7].text.strip())
        table_dict['K'].append(cells[8].text.strip())
    return table_dict

##### Helper function for converting the dictionary into a DataFrame

In [37]:
# Creating a function that converts the dictionary into a DataFrame
def get_df(dictionary):
    df = pd.DataFrame(dictionary)
    return df

##### Helper function for writing the CSV file

In [38]:
# Creating a function that writes the DataFrame to a CSV file named after the rating category
def write_csv(dataframe,fideRating):
    file_name = 'top_100_player_under_'+str(fideRating)+'.csv'
    dataframe.to_csv(file_name,index=False)

#### Master function that takes a URL and stores the data in a CSV file

In [39]:
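# Creating a function that takes a URL, scrapes its table and writes it to a CSV file
from urllib.parse import urlparse, parse_qs

def get_players(url):
    # Read the rating category from the 'categ' query parameter instead of slicing url[34:38]
    fideRating = parse_qs(urlparse(url).query)['categ'][0]
    doc = get_page(url)
    table_dict = get_dict()
    table_dict = get_rows(doc, table_dict)
    df = get_df(table_dict)
    write_csv(df, fideRating)
    file_name = 'top_100_player_under_' + str(fideRating) + '.csv'
    return 'Top 100 Players Under "{}" written to file "{}"'.format(fideRating, file_name)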

In [40]:
get_players('https://chess-rankings.com/?categ=3000&label=1')

'Top 100 Players Under "3000" written to file "top_100_player_under_3000.csv"'

In [41]:
data = pd.read_csv("top_100_player_under_3000.csv")
data

|     | Rank | Rank Change | Name | Title | Federation | FIDE Rating | Rating Change | Age | K |
|-----|------|-------------|------|-------|------------|-------------|---------------|-----|---|
| 0 | 1 | - | Carlsen, Magnus | GM | Norway | 2864.0 | 0.0 | 32 | 10 |
| 1 | 2 | - | Ding, Liren | GM | China | 2808.4 | 2.4 | 30 | 10 |
| 2 | 3 | ↑4 | Nepomniachtchi, Ian | GM | Russia | 2792.4 | 26.4 | 32 | 10 |
| 3 | 4 | ↓1 | Firouzja, Alireza | GM | France | 2778.2 | -14.8 | 19 | 10 |
| 4 | 5 | ↓1 | Caruana, Fabiano | GM | United States | 2775.4 | -7.6 | 30 | 10 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 95 | 96 | - | Shevchenko, Kirill | GM | Ukraine | 2654.0 | 0.0 | 20 | 10 |
| 96 | 97 | - | Demchenko, Anton | GM | FIDE | 2653.0 | 0.0 | 35 | 10 |
| 97 | 98 | - | Swiercz, Dariusz | GM | United States | 2652.0 | 0.0 | 28 | 10 |
| 98 | 99 | - | Jones, Gawain C B | GM | UNITED KINGDOM | 2652.0 | 0.0 | 35 | 10 |
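
Reading the CSV back shows that `pd.read_csv` re-infers column types: `2864` comes back as `2864.0` and `+2.4` as `2.4`. A small sketch reproduces that round trip with an in-memory buffer, using two rows copied from the table above and kept as strings, as they were scraped. Passing `dtype=str` is one way to keep the values exactly as scraped, if that is preferred.

```python
import io

import pandas as pd

# Two rows in the scraped string form (values taken from the table above)
scraped = pd.DataFrame({
    "Name": ["Carlsen, Magnus", "Ding, Liren"],
    "FIDE Rating": ["2864", "2808.4"],
    "Rating Change": ["0", "+2.4"],
})

buffer = io.StringIO()
scraped.to_csv(buffer, index=False)

# Default: pandas infers numeric dtypes, so "2864" -> 2864.0 and "+2.4" -> 2.4
inferred = pd.read_csv(io.StringIO(buffer.getvalue()))
# dtype=str keeps every column exactly as it was scraped
as_text = pd.read_csv(io.StringIO(buffer.getvalue()), dtype=str)

print(inferred.dtypes["FIDE Rating"], inferred["Rating Change"].tolist())
print(as_text["FIDE Rating"].tolist(), as_text["Rating Change"].tolist())
```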


In [42]:
get_players('https://chess-rankings.com/?categ=2300&label=2')

'Top 100 Players Under "2300" written to file "top_100_player_under_2300.csv"'

In [43]:
data_1 = pd.read_csv("top_100_player_under_2300.csv")
data_1

|     | Rank | Rank Change | Name | Title | Federation | FIDE Rating | Rating Change | Age | K |
|-----|------|-------------|------|-------|------------|-------------|---------------|-----|---|
| 0 | 1 | - | Van Foreest, Machteld | FM | Netherlands | 2299.0 | 0.0 | 15 | 40 |
| 1 | 2 | - | Bulockin, Martin | FM | Czech Republic | 2299.0 | 0.0 | 20 | 20 |
| 2 | 3 | - | Wendler, David |  | Germany | 2299.0 | 0.0 | 21 | 20 |
| 3 | 4 | ↑1 | Yakubbaeva, Nilufar | WGM | Uzbekistan | 2299.0 | 0.0 | 22 | 20 |
| 4 | 5 | ↑1 | Garcia Garcia, Alejandro | FM | Cuba | 2299.0 | 0.0 | 23 | 20 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 95 | 96 | ↑1 | Perpinya Rofes, Lluis Maria | IM | Spain | 2297.0 | 0.0 | 48 | 10 |
| 96 | 97 | ↑1 | Merriman, John | FM | UNITED KINGDOM | 2297.0 | 0.0 | 48 | 20 |
| 97 | 98 | ↑1 | Rechmann, Peter | FM | Germany | 2297.0 | 0.0 | 55 | 20 |
| 98 | 99 | ↑1 | Martin, Thomas | FM | Germany | 2297.0 | 0.0 | 56 | 20 |


# **Summary**

- The scraping was done with the Python libraries Requests and Beautiful Soup, and the data was tabulated with Pandas.
- Scraped the details of the top 100 chess players, such as rank, name, federation, rating and age.
- Saved the scraped data for each rating category to a CSV file with 100 rows and 9 columns.

# **References**

https://chess-rankings.com/

https://www.youtube.com/watch?v=RKsLLG-bzEY

https://dorianlazar.medium.com/scraping-medium-with-python-beautiful-soup-3314f898bbf5
