# Web Scraping Sovereign Debt Ratings
The historical sovereign credit ratings are scraped from https://countryeconomy.com/ratings. The screenshot below shows the ratings homepage of that site.

![Ratings homepage of countryeconomy.com](img/ratings_homepage.png)

In [1]:
from bs4 import BeautifulSoup, SoupStrainer
import requests
import pandas as pd
import time
url_countries = requests.get("https://countryeconomy.com/ratings").content
# We are only interested in the main table of the homepage, which contains the list of all the countries.
parse_table = SoupStrainer(id="tb1T")
soup = BeautifulSoup(url_countries, 'lxml',parse_only=parse_table)
print(soup.prettify())

<!DOCTYPE html>
<table class="table tabledat table-striped table-condensed table-hover" id="tb1T">
 <thead>
  <tr class="tableheader">
   <th style=" width:19%;">
   </th>
   <th style=" width:27%;">
    <a href="/ratings/moodys">
     Moody's ratings [+]
    </a>
   </th>
   <th style=" width:27%;">
    <a href="/ratings/standardandpoors">
     S&amp;P ratings [+]
    </a>
   </th>
   <th style=" width:27%;">
    <a href="/ratings/fitch">
     Fitch ratings [+]
    </a>
   </th>
  </tr>
 </thead>
 <tbody>
  <tr>
   <td>
    <a href="/ratings/usa">
     United States [+]
    </a>
   </td>
   <td>
    <span class="graph_hbar" style="background-color: #8DEEEE; width: 77%;">
    </span>
    <span class="padleft">
     Aaa
    </span>
   </td>
   <td>
    <span class="graph_hbar" style="background-color: #00D600; width: 74%;">
    </span>
    <span class="padleft">
     AA+
    </span>
   </td>
   <td>
    <span class="graph_hbar" style="background-color: #8DEEEE; width: 77%;">
    </span>

In [2]:
#In the main table, the countries are listed as part of the body.
table = soup.find('tbody')
print(table.prettify())

<tbody>
 <tr>
  <td>
   <a href="/ratings/usa">
    United States [+]
   </a>
  </td>
  <td>
   <span class="graph_hbar" style="background-color: #8DEEEE; width: 77%;">
   </span>
   <span class="padleft">
    Aaa
   </span>
  </td>
  <td>
   <span class="graph_hbar" style="background-color: #00D600; width: 74%;">
   </span>
   <span class="padleft">
    AA+
   </span>
  </td>
  <td>
   <span class="graph_hbar" style="background-color: #8DEEEE; width: 77%;">
   </span>
   <span class="padleft">
    AAA
   </span>
  </td>
 </tr>
 <tr>
  <td>
   <a href="/ratings/uk">
    United Kingdom [+]
   </a>
  </td>
  <td>
   <span class="graph_hbar" style="background-color: #00D600; width: 71%;">
   </span>
   <span class="padleft">
    Aa2
   </span>
  </td>
  <td>
   <span class="graph_hbar" style="background-color: #00D600; width: 71%;">
   </span>
   <span class="padleft">
    AA
   </span>
  </td>
  <td>
   <span class="graph_hbar" style="background-color: #00D600; width: 71%;">
   </span>
 

In [3]:
#In the table, each country is a link.
country_links=[]
for link in table('a'):
    #Get a clean country name and its link as a tuple.
    link = (link.get_text().replace('[+]','').strip(),link['href'])
    #Append to list of countries
    country_links.append(link)
print('Number of countries: ',len(country_links))
#Visualize first 10 countries in the list
country_links[:10]

Number of countries:  143


[('United States', '/ratings/usa'),
 ('United Kingdom', '/ratings/uk'),
 ('Germany', '/ratings/germany'),
 ('France', '/ratings/france'),
 ('Japan', '/ratings/japan'),
 ('Spain', '/ratings/spain'),
 ('Italy', '/ratings/italy'),
 ('Portugal', '/ratings/portugal'),
 ('Greece', '/ratings/greece'),
 ('Ireland', '/ratings/ireland')]
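
The link extraction above can be reproduced on a small inline snippet. This is a sketch under assumptions: `sample_tbody` is a hand-written stand-in for the real table body, `BASE_URL` is a name introduced here for illustration, and `urljoin` from the standard library is used to build absolute URLs rather than string concatenation.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

BASE_URL = "https://countryeconomy.com"  # illustrative constant

# Hand-written stand-in for the homepage table body.
sample_tbody = """
<table><tbody>
  <tr><td><a href="/ratings/usa">United States [+]</a></td></tr>
  <tr><td><a href="/ratings/uk">United Kingdom [+]</a></td></tr>
</tbody></table>
"""

body = BeautifulSoup(sample_tbody, "html.parser").find("tbody")

# Same cleaning as above: drop the "[+]" marker and surrounding whitespace.
sample_links = [
    (a.get_text().replace("[+]", "").strip(), a["href"])
    for a in body.find_all("a")
]

# Resolve each relative path against the site root.
absolute_urls = [urljoin(BASE_URL, path) for _, path in sample_links]
print(sample_links)
print(absolute_urls)
```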

In [4]:
#Check for any invalid links
error_country=[]
for link in country_links:
    try:
        r = requests.get("https://countryeconomy.com"+link[1])
        r.raise_for_status()
    except requests.exceptions.HTTPError as err:
        error_country.append(link[0])
        print (err)
if len(error_country)==0:
    print('All links are working')

All links are working
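
The check above only catches HTTP status errors. A possible variant, sketched below, also treats timeouts and connection failures as broken links by catching the broader `requests.exceptions.RequestException`. The function name `find_broken_links`, its `get` parameter (which allows a stand-in for `requests.get` during testing), and the 10-second timeout are assumptions made for this example, not part of the original notebook.

```python
import requests

BASE_URL = "https://countryeconomy.com"  # illustrative constant


def find_broken_links(links, get=requests.get, timeout=10):
    """Return (country, error message) pairs for links that cannot be fetched.

    `links` is a list of (country name, relative path) tuples.
    `get` defaults to requests.get; a stub can be passed in for offline tests.
    """
    broken = []
    for name, path in links:
        try:
            response = get(BASE_URL + path, timeout=timeout)
            response.raise_for_status()
        except requests.exceptions.RequestException as err:
            # Covers HTTPError, ConnectionError and Timeout alike.
            broken.append((name, str(err)))
    return broken
```

With the list built earlier, this would be called as `find_broken_links(country_links)`, and an empty result corresponds to the "All links are working" message.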


In [5]:
#Look through the HTML of the first country's page
url = requests.get("https://countryeconomy.com"+country_links[0][1]).content
#Only interested in the table containing the data
parse_tables = SoupStrainer(id="myTabContent")
soup = BeautifulSoup(url, 'lxml',parse_only=parse_tables)
print(soup.prettify())

<!DOCTYPE html>
<div class="tab-content col-sm-12" id="myTabContent">
 <div class="tab-pane fade in active" id="moodys">
  <a id="MOODYS">
  </a>
  <div class="tabletit">
   Rating Moody's United States
  </div>
  <div class="table-responsive">
   <table class="table tabledat table-striped table-condensed table-hover" id="tb0_963">
    <thead>
     <tr class="tableheader">
      <th class="wborder" colspan="4">
       Long term Rating
      </th>
      <th class="wborder" colspan="4">
       Short term Rating
      </th>
     </tr>
     <tr class="tableheader">
      <th class="wborder" colspan="2">
       Foreign currency
      </th>
      <th class="wborder" colspan="2">
       Local currency
      </th>
      <th class="wborder" colspan="2">
       Foreign currency
      </th>
      <th class="wborder" colspan="2">
       Local currency
      </th>
     </tr>
     <tr class="tableheader">
      <th class="wborder">
       Date
      </th>
      <th class="wborder">
       Rating(Out

In [6]:
tables = soup.find_all('tbody')
print('Number of tables: ',len(tables))

Number of tables:  3


In [7]:
#Instantiate a list that will contain all the scraped ratings.
ratings = []
#Labels of the three credit rating agencies
agencies= ['moodys','sp','fitch']
for link in country_links:
    time.sleep(1)
    url = requests.get("https://countryeconomy.com"+link[1]).content
    parse_tables = SoupStrainer(id="myTabContent")
    soup = BeautifulSoup(url, 'lxml',parse_only=parse_tables)
    tables = soup.find_all('tbody')
    #There are three tbody tags in each page, one for each of the credit agencies.
    for table, agency in zip(tables,agencies):
        #Print task-at-hand
        print(f'Scraping {link[0]} {agency} soverign credit ratings')
        for row in table('tr'):
            #Each row is a credit rating received
            #If the first cell of the row is empty then no long-term sovereign credit rating was given
            if not row('td')[0].get_text():
                continue
            date = row('td')[0].get_text(strip=True)
            #get ratings without the outlook.
            rating = row('td')[1].get_text().replace("("," ").split(" ")[0].strip()
            #Add the rating to the list
            ratings.append([date,link[0],rating,agency])
    print(f'Web scraping for {link[0]} completed\n')
#Convert ratings list into pandas dataframe
ratings_df = pd.DataFrame(ratings, columns=['date','country','rating','agency'])
print('Ratings dataframe ready.')

Scraping United States moodys soverign credit ratings
Scraping United States sp soverign credit ratings
Scraping United States fitch soverign credit ratings
Web scraping for United States completed

Scraping United Kingdom moodys soverign credit ratings
Scraping United Kingdom sp soverign credit ratings
Scraping United Kingdom fitch soverign credit ratings
Web scraping for United Kingdom completed

Scraping Germany moodys soverign credit ratings
Scraping Germany sp soverign credit ratings
Scraping Germany fitch soverign credit ratings
Web scraping for Germany completed

Scraping France moodys soverign credit ratings
Scraping France sp soverign credit ratings
Scraping France fitch soverign credit ratings
Web scraping for France completed

Scraping Japan moodys soverign credit ratings
Scraping Japan sp soverign credit ratings
Scraping Japan fitch soverign credit ratings
Web scraping for Japan completed

Scraping Spain moodys soverign credit ratings
Scraping Spain sp soverign credit ratin

Scraping Cyprus moodys soverign credit ratings
Scraping Cyprus sp soverign credit ratings
Scraping Cyprus fitch soverign credit ratings
Web scraping for Cyprus completed

Scraping Czech Republic moodys soverign credit ratings
Scraping Czech Republic sp soverign credit ratings
Scraping Czech Republic fitch soverign credit ratings
Web scraping for Czech Republic completed

Scraping Denmark moodys soverign credit ratings
Scraping Denmark sp soverign credit ratings
Scraping Denmark fitch soverign credit ratings
Web scraping for Denmark completed

Scraping Dominican Republic moodys soverign credit ratings
Scraping Dominican Republic sp soverign credit ratings
Scraping Dominican Republic fitch soverign credit ratings
Web scraping for Dominican Republic completed

Scraping Ecuador moodys soverign credit ratings
Scraping Ecuador sp soverign credit ratings
Scraping Ecuador fitch soverign credit ratings
Web scraping for Ecuador completed

Scraping Estonia moodys soverign credit ratings
Scraping 

Scraping Malta moodys soverign credit ratings
Scraping Malta sp soverign credit ratings
Scraping Malta fitch soverign credit ratings
Web scraping for Malta completed

Scraping Mauritius moodys soverign credit ratings
Scraping Mauritius sp soverign credit ratings
Scraping Mauritius fitch soverign credit ratings
Web scraping for Mauritius completed

Scraping Maldives moodys soverign credit ratings
Scraping Maldives sp soverign credit ratings
Scraping Maldives fitch soverign credit ratings
Web scraping for Maldives completed

Scraping Malawi moodys soverign credit ratings
Scraping Malawi sp soverign credit ratings
Scraping Malawi fitch soverign credit ratings
Web scraping for Malawi completed

Scraping Mexico moodys soverign credit ratings
Scraping Mexico sp soverign credit ratings
Scraping Mexico fitch soverign credit ratings
Web scraping for Mexico completed

Scraping Malaysia moodys soverign credit ratings
Scraping Malaysia sp soverign credit ratings
Scraping Malaysia fitch soverign cr

Scraping Saint Vincent and the Grenadines moodys soverign credit ratings
Scraping Saint Vincent and the Grenadines sp soverign credit ratings
Scraping Saint Vincent and the Grenadines fitch soverign credit ratings
Web scraping for Saint Vincent and the Grenadines completed

Scraping Venezuela moodys soverign credit ratings
Scraping Venezuela sp soverign credit ratings
Scraping Venezuela fitch soverign credit ratings
Web scraping for Venezuela completed

Scraping Vietnam moodys soverign credit ratings
Scraping Vietnam sp soverign credit ratings
Scraping Vietnam fitch soverign credit ratings
Web scraping for Vietnam completed

Scraping South Africa moodys soverign credit ratings
Scraping South Africa sp soverign credit ratings
Scraping South Africa fitch soverign credit ratings
Web scraping for South Africa completed

Scraping Zambia moodys soverign credit ratings
Scraping Zambia sp soverign credit ratings
Scraping Zambia fitch soverign credit ratings
Web scraping for Zambia completed

R
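
The expression used in the loop to strip the outlook from a rating cell is easiest to understand on a few example strings. In the sketch below, the helper name `extract_rating` and the sample cell texts are assumptions for illustration; the actual cell contents on the site may be formatted differently.

```python
def extract_rating(cell_text: str) -> str:
    """Return the rating part of a 'Rating (Outlook)' cell, or '' if there is none."""
    # Turn "(" into a space so the rating is always the first space-separated token.
    return cell_text.replace("(", " ").split(" ")[0].strip()


# Assumed examples: rating with outlook, no space before the outlook,
# outlook only, and an empty cell.
samples = ["Aaa (Stable)", "AA+(Negative)", "(Negative)", ""]
extracted = [extract_rating(text) for text in samples]
print(extracted)
```

A cell that holds only an outlook yields an empty string rather than a missing value, so such rows need separate handling after the dataframe is built.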

In [8]:
ratings_df.head(10)

| | date | country | rating | agency |
|---|---|---|---|---|
| 0 | 2018-04-25 | United States | Aaa | moodys |
| 1 | 2013-07-18 | United States | Aaa | moodys |
| 2 | 2011-08-02 | United States | Aaa | moodys |
| 3 | 2011-07-13 | United States | | moodys |
| 4 | 2003-11-15 | United States | | moodys |
| 5 | 1949-02-05 | United States | Aaa | moodys |
| 6 | 2013-06-10 | United States | | sp |
| 7 | 2011-08-05 | United States | AA+ | sp |
| 8 | 2019-04-02 | United States | AAA | fitch |
| 9 | 2018-04-05 | United States | AAA | fitch |


In [9]:
ratings_df.head(10).isnull()

| | date | country | rating | agency |
|---|---|---|---|---|
| 0 | False | False | False | False |
| 1 | False | False | False | False |
| 2 | False | False | False | False |
| 3 | False | False | False | False |
| 4 | False | False | False | False |
| 5 | False | False | False | False |
| 6 | False | False | False | False |
| 7 | False | False | False | False |
| 8 | False | False | False | False |
| 9 | False | False | False | False |


Some ratings are missing because, on those dates, the agency issued only an outlook and no rating. As the `isnull()` check shows, these entries are not NaN; they are empty strings. To drop them from the dataset, they must first be converted to `np.nan`.

In [10]:
import numpy as np
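#Assign the result back instead of using inplace=True on a column, which may not update the dataframe in recent pandas versions.
ratings_df['rating'] = ratings_df['rating'].replace("", np.nan)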

In [11]:
ratings_df.dropna(inplace=True)
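
The two cleaning steps can be verified on a toy dataframe. The sketch below uses made-up rows in a frame named `toy` (both are assumptions for illustration) to show that an empty string is not counted as null until it is replaced with `np.nan`.

```python
import numpy as np
import pandas as pd

# Made-up rows: the middle one mimics an outlook-only entry.
toy = pd.DataFrame(
    {
        "date": ["2018-04-25", "2011-07-13", "2011-08-05"],
        "rating": ["Aaa", "", "AA+"],
    }
)

# Empty strings are ordinary values, so nothing is flagged as missing yet.
nulls_before = int(toy["rating"].isnull().sum())

# Replace empty strings with NaN, then drop the rows without a rating.
toy["rating"] = toy["rating"].replace("", np.nan)
nulls_after = int(toy["rating"].isnull().sum())
cleaned = toy.dropna(subset=["rating"])

print(nulls_before, nulls_after, len(cleaned))
```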

In [12]:
ratings_df.to_csv('data/country_ratings.csv', index=False)