# Lesson 7: Advanced Web Scraping and Data Gathering
## Topic 3: Reading data from an API
This notebook shows how to use a free API (no authorization or API key needed) to download basic information about countries around the world and load it into a pandas DataFrame.

### Import libraries

In [1]:
import urllib.request, urllib.parse
from urllib.error import HTTPError, URLError
import pandas as pd

### Exercise 20: Define the base URL

In [2]:
serviceurl = 'https://restcountries.eu/rest/v2/name/'
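
Country names that contain spaces or other special characters need to be percent-encoded before they are appended to the base URL. A minimal sketch using `urllib.parse.quote` is shown below; `build_country_url` is a hypothetical helper name and is not part of the original notebook.

```python
import urllib.parse

serviceurl = 'https://restcountries.eu/rest/v2/name/'

def build_country_url(country, base=serviceurl):
    """Hypothetical helper: append a percent-encoded country name to the base URL."""
    # quote() replaces characters that are unsafe in a URL path, e.g. ' ' becomes '%20'
    return base + urllib.parse.quote(str(country))

print(build_country_url('South Africa'))
```

Names made up of plain ASCII letters, such as `'Kenya'`, pass through `quote` unchanged.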

### Exercise 21: Define a function to pull the country data from the API

In [3]:
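def get_country_data(country):
    """
    Function to get data about a country from the "https://restcountries.eu" API.
    Returns the response body as a string, or None if the request fails.
    """
    country_name = str(country)
    # Percent-encode the name so that spaces and special characters are URL-safe
    url = serviceurl + urllib.parse.quote(country_name)
    
    try: 
        uh = urllib.request.urlopen(url)
    except HTTPError:
        # The server responded with an error status, for example when the name is not found
        print("Sorry! Could not retrieve anything on {}".format(country_name))
        return None
    except URLError as e:
        # The request never got a valid response (network or DNS problem, timeout, ...)
        print('Failed to reach a server.')
        print('Reason: ', e.reason)
        return None
    else:
        data = uh.read().decode()
        print("Retrieved data on {}. Total {} characters read.".format(country_name, len(data)))
        return data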
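
The order of the `except` clauses in `get_country_data` matters: `HTTPError` is a subclass of `URLError`, so it has to be caught first, otherwise the more general handler would catch it instead. The sketch below illustrates this without touching the network. `classify_error` is a hypothetical helper, and the URL and status used to build the example exceptions are made up for illustration.

```python
import io
from urllib.error import HTTPError, URLError

def classify_error(exc):
    """Hypothetical helper: mirror the except-clause order used in get_country_data."""
    if isinstance(exc, HTTPError):   # checked first because it is the more specific class
        return 'http'
    if isinstance(exc, URLError):    # any other failure to complete the request
        return 'url'
    return 'other'

# Constructor signature: HTTPError(url, code, msg, hdrs, fp); all values here are made up
not_found = HTTPError('https://example.invalid/name/Switzerland1', 404,
                      'Not Found', {}, io.BytesIO())
unreachable = URLError('connection timed out')

print(issubclass(HTTPError, URLError))
print(classify_error(not_found))
print(classify_error(unreachable))
```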

### Exercise 22: Test the function by passing a correct and an incorrect argument

In [4]:
country_name = 'Switzerland'

In [5]:
data=get_country_data(country_name)

Failed to reach a server.
Reason:  [WinError 10060] A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond


In [6]:
country_name1 = 'Switzerland1'

In [7]:
data1=get_country_data(country_name1)

Failed to reach a server.
Reason:  [WinError 10060] A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond


### Exercise 23: Use the built-in `json` library to read and examine the data properly

In [8]:
import json

In [9]:
# Load from string 'data'
x=json.loads(data)

TypeError: the JSON object must be str, bytes or bytearray, not NoneType
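
The `TypeError` above occurs because `get_country_data` returned `None` (the server could not be reached), and `json.loads` only accepts text or bytes. A minimal sketch of a guard is shown below; `parse_country_json` is a hypothetical helper name, not part of the original notebook.

```python
import json

def parse_country_json(data):
    """Hypothetical helper: return the parsed JSON, or None if nothing usable was retrieved."""
    if data is None:
        # Nothing was downloaded, so there is nothing to parse
        return None
    try:
        return json.loads(data)
    except json.JSONDecodeError:
        # The response was not valid JSON (for example, an HTML error page)
        return None

print(parse_country_json(None))
print(parse_country_json('[{"name": "Switzerland"}]'))
```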

In [None]:
# Load the only element
y=x[0]

In [None]:
type(y)

In [None]:
y.keys()
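
To see what these inspection steps do without a network connection, the sketch below runs them on a small hand-written JSON string. The string `sample` is an illustrative stand-in with only a few keys. It assumes the general shape used in this notebook (a list containing one dictionary per match) and is not the actual API output.

```python
import json

# Hand-written stand-in for a response; NOT real API output
sample = '[{"name": "Switzerland", "capital": "Bern", "region": "Europe"}]'

x = json.loads(sample)   # a JSON array becomes a Python list
y = x[0]                 # first (here: only) element of the list

print(type(x).__name__)
print(type(y).__name__)
print(list(y.keys()))
```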

### Exercise 24: Can you print all the data elements one by one?

In [None]:
for k,v in y.items():
    print("{}: {}".format(k,v))

### Exercise 25: The dictionary values are not of the same type - print all the languages spoken

In [None]:
for lang in y['languages']:
    print(lang['name'])
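
Some values in the dictionary are plain strings or numbers, while others (such as `languages`) are lists of dictionaries, which is why the loop above indexes each element with `['name']`. The sketch below shows this on a hand-written stand-in record; every value in `y_sample` is an invented placeholder, not real API output.

```python
# Hand-written stand-in record with invented values; NOT real API output
y_sample = {
    'name': 'Examplia',
    'population': 1000,
    'languages': [{'name': 'Alpha'}, {'name': 'Beta'}],
}

# Report the Python type of each value
for k, v in y_sample.items():
    print("{}: {}".format(k, type(v).__name__))

# A list comprehension collects the 'name' of every language
language_names = [lang['name'] for lang in y_sample['languages']]
print(language_names)
```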

### Exercise 26: Write a function which can take a list of countries and return a DataFrame containing key info
* Capital
* Region
* Sub-region
* Population
* Latitude/longitude
* Area
* Gini index
* Timezones
* Currencies
* Languages

In [None]:
import pandas as pd
import json
def build_country_database(list_country):
    """
    Takes a list of country names.
    Returns a DataFrame with key information about those countries.
    Countries for which no data could be retrieved are skipped.
    """
    # Define a dictionary of empty lists, one per DataFrame column
    country_dict={'Country':[],'Capital':[],'Region':[],'Sub-region':[],'Population':[],
                  'Latitude':[],'Longitude':[],'Area':[],'Gini':[],'Timezones':[],
                  'Currencies':[],'Languages':[]}
    
    for c in list_country:
        data = get_country_data(c)
        # get_country_data returns None when the request fails
        if data is not None:
            x = json.loads(data)
            # Use the first element of the returned list
            y=x[0]
            country_dict['Country'].append(y['name'])
            country_dict['Capital'].append(y['capital'])
            country_dict['Region'].append(y['region'])
            country_dict['Sub-region'].append(y['subregion'])
            country_dict['Population'].append(y['population'])
            # 'latlng' holds latitude first, then longitude
            country_dict['Latitude'].append(y['latlng'][0])
            country_dict['Longitude'].append(y['latlng'][1])
            country_dict['Area'].append(y['area'])
            country_dict['Gini'].append(y['gini'])
            # Note the code to handle possibility of multiple timezones as a list
            if len(y['timezones'])>1:
                country_dict['Timezones'].append(','.join(y['timezones']))
            else:
                country_dict['Timezones'].append(y['timezones'][0])
            # Note the code to handle possibility of multiple currencies as dictionaries
            if len(y['currencies'])>1:
                lst_currencies = []
                for i in y['currencies']:
                    lst_currencies.append(i['name'])
                country_dict['Currencies'].append(','.join(lst_currencies))
            else:
                country_dict['Currencies'].append(y['currencies'][0]['name'])
            # Note the code to handle possibility of multiple languages as dictionaries
            if len(y['languages'])>1:
                lst_languages = []
                for i in y['languages']:
                    lst_languages.append(i['name'])
                country_dict['Languages'].append(','.join(lst_languages))
            else:
                country_dict['Languages'].append(y['languages'][0]['name'])
    
    # Return as a Pandas DataFrame
    return pd.DataFrame(country_dict)
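
The handling of timezones, currencies, and languages above follows one pattern: join the entries of a list into a single comma-separated string. Since `str.join` also works on a one-element list, the three cases could share a single helper. The sketch below shows one possible way to do this; `join_names` is a hypothetical helper that is not part of the original notebook, and the sample values are invented.

```python
def join_names(items, key=None):
    """Hypothetical helper: join list entries into one comma-separated string.

    If key is given, each entry is treated as a dictionary and entry[key] is used.
    """
    if key is not None:
        items = [item[key] for item in items]
    # str.join works for a single entry as well as for many
    return ','.join(items)

print(join_names(['UTC+01:00']))
print(join_names([{'name': 'Alpha'}, {'name': 'Beta'}], key='name'))
```

Unlike the indexing with `[0]` used above, this version returns an empty string for an empty list instead of raising an `IndexError`.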

### Exercise 27: Test the function by building a small database of countries' info. Include an incorrect name too.

In [None]:
df1=build_country_database(['Nigeria','Switzerland','France',
                            'Turmeric','Russia','Kenya','Singapore'])

In [None]:
df1