# BAIS:3250 - Final Project
### Scraping API Data

**Author(s):** Natalie Brown, Max Kaiser

**Date Modified:** 11-13-2024 (*date created:* 11-13-2024)


**Description:** Gather data on countries via API

---

### REST Countries API

1) An API token/key is required to access the data
2) Visit the [request access token page](https://restfulcountries.com/request-access-token)
3) Enter your email address and website URL, then hit submit

**GENERATED API TOKEN:** 1631|hpnftSt6KVRzucz3rtC5ZxqCBOOm0S03FC9rzBVP
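
Because a token pasted into a notebook travels with every copy of the file, one option is to read it from the environment instead. The sketch below is only an illustration: `RESTFUL_COUNTRIES_TOKEN` is a hypothetical variable name chosen here, and `build_headers` is a helper introduced for this example. The header layout matches the one used later in this notebook.

```python
import os


def build_headers(token: str) -> dict:
    """Return request headers for bearer-token authentication."""
    return {'Accept': 'application/json', 'Authorization': f'Bearer {token}'}


# RESTFUL_COUNTRIES_TOKEN is a hypothetical name, not something the API defines;
# an empty string is used when the variable is not set
api_token = os.environ.get('RESTFUL_COUNTRIES_TOKEN', '')
headers = build_headers(api_token)
```

The resulting `headers` dictionary can be passed to `requests.get` in the same way as the hard-coded version.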
                                
---

### Import Libraries
* **pandas:** for data frames and data wrangling functions
* **requests:** send HTTP requests and handle responses
* **json:** provides methods to work with JSON data
* **time:** used to wait an allotted amount of time before proceeding to the next page

In [4]:
import pandas as pd
import requests
import json
import time

---
### Get Data for all Countries

* First, I tested the code on a single country
* Once this was confirmed to work, I gathered data for all countries
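
A single-country check along these lines could be sketched as below. `parse_country` is a hypothetical helper, and `sample` is a made-up placeholder record (not API output) that mirrors the field names read in the full loop. A missing president falls back to the string `'None'`, as in the loop.

```python
def parse_country(country: dict) -> dict:
    """Flatten one country record into the columns collected in this notebook."""
    covid = country.get('covid19') or {}
    president = country.get('current_president')
    if not isinstance(president, dict):
        president = {}  # some countries have no president data
    return {
        'country': (country.get('name') or '').strip(),
        'capital': country.get('capital'),
        'covid_cases': covid.get('total_case'),
        'covid_deaths': covid.get('total_deaths'),
        'president_name': president.get('name', 'None'),
        'president_gender': president.get('gender', 'None'),
        'president_appointment_start_date': president.get('appointment_start_date', 'None'),
        'president_appointment_end_date': president.get('appointment_end_date', 'None'),
        'continent': country.get('continent'),
        'size': country.get('size'),
        'population': country.get('population'),
    }


# placeholder record with invented values, used only to exercise the parser
sample = {
    'name': ' Exampleland ',
    'capital': 'Example City',
    'covid19': {'total_case': '10', 'total_deaths': '1'},
    'current_president': None,
    'continent': 'Nowhere',
    'size': '1 km²',
    'population': '100',
}
row = parse_country(sample)
```

In the notebook, the same function could be applied to the first element of `raw_data['data']` before looping over every page.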

In [6]:
# define api token
api_token='1631|hpnftSt6KVRzucz3rtC5ZxqCBOOm0S03FC9rzBVP'

# define base url and parameters
base_url='https://restfulcountries.com/api/v1/countries' 
params={'per_page': 100}  

# define headers
headers={'Accept': 'application/json', 'Authorization': f'Bearer {api_token}'} 

# define empty lists to store data
country_names=[]
country_capitals=[]
covid_cases_list=[]
covid_deaths_list=[]
president_names=[]
president_genders=[]
president_appointment_start_dates=[]
president_appointment_end_dates=[]
continents=[]
sizes=[]
populations=[]

# initialize page number variable
page=1

while True:

    # define current page number
    params['page']=page
    
    # Send GET request to the API 
    response=requests.get(base_url, headers=headers, params=params)

    # check if the request was successful
    if response.status_code == 200:

        # display current retrieval page
        print(f'Retrieving Page: {page}')

        # initialize counter for the number of records on this page
        n=0
        
        # get data
        raw_data=response.json()
    
        # define list to get data
        raw_list=raw_data['data']
        
        # iterate through list
        for country in raw_list:
            
            # count iterations
            n+=1
            
            # get information
            name=country.get('name').strip()
            capital=country.get('capital')
        
            # get information on covid-19, which is stored in a nested dictionary
            covid_list=country.get('covid19')

            # get information from the nested dictionary
            covid_cases=covid_list.get('total_case')
            covid_deaths=covid_list.get('total_deaths')
        
        
            # get information on the president, which is stored in a nested dictionary
            president_list=country.get('current_president')

            # some countries do not have president data, so fall back to 'None'
            if isinstance(president_list, dict):
                president_name=president_list.get('name')
                president_gender=president_list.get('gender')
                appointment_start_date=president_list.get('appointment_start_date')
                appointment_end_date=president_list.get('appointment_end_date')
            else:
                president_name='None'
                president_gender='None'
                appointment_start_date='None'
                appointment_end_date='None'
        
            # continue to get information
            continent=country.get('continent')
            size=country.get('size')
            population=country.get('population')
        
        
            # append to lists
            country_names.append(name)
            country_capitals.append(capital)
            covid_cases_list.append(covid_cases)
            covid_deaths_list.append(covid_deaths)
            president_names.append(president_name)
            president_genders.append(president_gender)
            president_appointment_start_dates.append(appointment_start_date)
            president_appointment_end_dates.append(appointment_end_date)
            continents.append(continent)
            sizes.append(size)
            populations.append(population)

        # if there is no more information to be retrieved then stop
        if len(raw_list) < 100:
            retrieval_status='success'
            print('Less than 100 records retrieved, ending retrieval\n')
            break

        # increment page number to move to next page
        page+=1

        # if there is still information to be retrieved, wait 60 seconds then continue
        print(f'Max requests hit: {n}\nWaiting 60 seconds before next request\n')
        time.sleep(60) # wait for 60 seconds before the next request

    # if the request failed, display a message, the status code, and the error response, then stop
    else:
        retrieval_status='Failed'
        print('Failed to retrieve data')
        print(f'Status Code: {response.status_code}')
        print(f'Response: {response.text}')
        break

# if retrieved properly, put lists into a dictionary, then display success and the number of countries retrieved
if retrieval_status=='success':
    # Combine lists into a dictionary
    country_dict = {
        'country': country_names,
        'capital': country_capitals,
        'covid_cases': covid_cases_list,
        'covid_deaths': covid_deaths_list,
        'president_name': president_names,
        'president_gender': president_genders,
        'president_appointment_start_date': president_appointment_start_dates,
        'president_appointment_end_date': president_appointment_end_dates,
        'continent': continents,
        'size': sizes,
        'population': populations
    }

    # display success
    print(f'Data Retrieval Successful\n-------------------------------------------------\nNumber of Countries Retrieved: {len(country_dict["country"])}')

Retrieving Page: 1
Max requests hit: 100
Waiting 60 seconds before next request

Retrieving Page: 2
Less than 100 records retrieved, ending retrieval

Data Retrieval Successful
-------------------------------------------------
Number of Countries Retrieved: 186
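
The paging logic above can also be separated from the HTTP call, which makes it possible to exercise the stop rule without contacting the API. This is a minimal sketch, not the notebook's actual code: `fetch_all` and `fetch_page` are hypothetical names, and the `sleep` parameter exists only so the wait can be swapped out during a dry run. The stop rule is the same one used above: a page with fewer than `per_page` records ends retrieval.

```python
import time


def fetch_all(fetch_page, per_page=100, wait_seconds=60, sleep=time.sleep):
    """Collect records page by page until a short page signals the end."""
    records = []
    page = 1
    while True:
        batch = fetch_page(page)  # expected to return a list of records
        records.extend(batch)
        if len(batch) < per_page:
            return records
        page += 1
        sleep(wait_seconds)  # pause before requesting the next page


# dry run with fake pages and a recorded (not real) wait
fake_pages = {1: ['a', 'b'], 2: ['c']}
waits = []
result = fetch_all(lambda p: fake_pages.get(p, []), per_page=2, sleep=waits.append)
```

In the notebook, `fetch_page` would wrap the `requests.get` call with the current page number in `params` and return `response.json()['data']`, raising an error on any status code other than 200.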


---
### Save Country Dictionary to Data Frame

In [8]:
# save df
country_df=pd.DataFrame(country_dict)

# display header
country_df.head()

| | country | capital | covid_cases | covid_deaths | president_name | president_gender | president_appointment_start_date | president_appointment_end_date | continent | size | population |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Afghanistan | Kabul | 46498 | 1774 | Ashraf Ghani | Male | 2020-03-09 | | Asia | 652,000 km² | 39306195 |
| 1 | Albania | Tirana | 37625 | 798 | Ilir Rexhep Meta | Male | 2017-07-24 | | Europe | 28,748 km² | 2876490 |
| 2 | Algeria | Algiers | 83199 | 2431 | | | | | Africa | 2,381,741 km² | 44190030 |
| 3 | Andorra | Andorra la Vella | 6712 | 76 | | | | | Europe | 468 km² | 77317 |
| 4 | Angola | Luanda | 15139 | 348 | | | | | Africa | 1,246,620 km² | 33312843 |


---
### Save Raw Data Frame to CSV File
* **api_country_data_raw.csv**

In [10]:
# save to csv
country_df.to_csv('api_country_data_raw.csv',sep=',',encoding='utf-8',index=False,header=True)
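
To see what `index=False` changes, the sketch below writes a tiny data frame to an in-memory buffer instead of a file and reads it back. The two rows are invented placeholder values, and the variable names are hypothetical. The point is only that omitting the index keeps the columns identical after a round trip, while writing it adds an extra `Unnamed: 0` column on reload.

```python
import io

import pandas as pd

# placeholder rows with invented values, not data from the API
demo_df = pd.DataFrame({'country': ['Exampleland', 'Samplestan'],
                        'size': ['1,000 km²', '20 km²']})

# write without the index, then read the text back
no_index = io.StringIO()
demo_df.to_csv(no_index, index=False)
no_index.seek(0)
reloaded = pd.read_csv(no_index)

# write with the index to show the extra column it produces on reload
with_index = io.StringIO()
demo_df.to_csv(with_index, index=True)
with_index.seek(0)
reloaded_with_index = pd.read_csv(with_index)
```

Values that contain commas, such as the `size` strings, are quoted automatically on write, so they come back as a single field.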