### Scraping Berkeley Earth

**This notebook retrieves the average temperature data for each country available on Berkeley Earth.**

In [29]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

%matplotlib inline

**List of countries**
 - First, build the list of all the countries.
 - The file 'countries.csv' used below lists all the countries available on Berkeley Earth.
 - The file was compiled manually from a Berkeley Earth copy.

In [30]:
countries = pd.read_csv('countries.csv')
country_lst = list(countries.Country.unique())  # list of all unique country names
country_lst = [country.replace(" ", "-") for country in country_lst]  # URLs use hyphens instead of spaces
print('Total Countries:', len(country_lst))

Total Countries: 237


**Files to download**
 - All necessary files can be found at the following URL:

    http://berkeleyearth.lbl.gov/auto/Regional/TAVG/Text/
    
 - The directory at this location contains the necessary text files in the following format:

    '{country}-TAVG-Trend.txt'
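
As a minimal sketch of how the file name pattern above combines with the directory URL, the snippet below builds the full URL for one country. `build_url` and `BASE_URL` are hypothetical names introduced only for illustration; lowercasing the country name mirrors what the download function in this notebook does.

```python
# Directory listed above that holds the per-country text files.
BASE_URL = "http://berkeleyearth.lbl.gov/auto/Regional/TAVG/Text/"

def build_url(country):
    """Return the URL of the TAVG trend file for a hyphenated country name."""
    # Assumption: the country name is lowercased, as in the download function.
    return f"{BASE_URL}{country.lower()}-TAVG-Trend.txt"

print(build_url("American-Samoa"))
```

Keeping the URL construction in one place makes it easier to inspect the exact address requested when a download fails.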
    
**The two functions below download the available files: `dwnld_data` fetches the file for a single country, and `countries_data` applies it to a list of countries and returns the ones that could not be downloaded.**

In [31]:
import requests  # library for making HTTP requests

def dwnld_data(country):
    '''Download the Berkeley Earth text file for a given country.

    Returns '404' if the file is not found; otherwise saves it and returns None.'''
    # Build the URL for this country, lowercasing the country name.
    url = f'http://berkeleyearth.lbl.gov/auto/Regional/TAVG/Text/{country.lower()}-TAVG-Trend.txt'
    response = requests.get(url)  # send the request and record the response
    if response.status_code == 404:  # file not found on the server
        return '404'
    # Otherwise write the response body to ./country_data/<country>.txt
    # (the country_data folder must already exist).
    with open(f"./country_data/{country}.txt", "wb") as f:
        f.write(response.content)
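
The download function writes into `./country_data/`, and `open` raises `FileNotFoundError` if that folder is missing. A minimal sketch for creating it beforehand, assuming the folder should sit in the current working directory (`ensure_dir` is a hypothetical helper, not part of the original notebook):

```python
from pathlib import Path

def ensure_dir(path="country_data"):
    """Create the folder (and any missing parents) if it does not already exist."""
    folder = Path(path)
    folder.mkdir(parents=True, exist_ok=True)  # no error if it is already there
    return folder

out_dir = ensure_dir()
```

Running this once before the downloads is enough; calling it again is harmless because of `exist_ok=True`.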

In [32]:
def countries_data(country_list):
    '''Pass each country in a list to dwnld_data
    and collect the countries whose download failed.'''
    unsuccessful = []  # countries whose file could not be downloaded
    for country in country_list:
        if dwnld_data(country) == '404':  # dwnld_data returns '404' when the file is not found
            unsuccessful.append(country)
    return unsuccessful

**Testing the above code: a sample of the country list is shown first, and the download is then run on the full list.**

In [34]:
country_sample = country_lst[0:30] #sample list of countries to test
country_sample

['Afghanistan',
 'Åland',
 'Albania',
 'Algeria',
 'American-Samoa',
 'Andorra',
 'Angola',
 'Anguilla',
 'Antarctica',
 'Antigua-and-Barbuda',
 'Argentina',
 'Armenia',
 'Aruba',
 'Australia',
 'Austria',
 'Azerbaijan',
 'Bahamas',
 'Bahrain',
 'Baker-Island',
 'Bangladesh',
 'Barbados',
 'Belarus',
 'Belgium',
 'Belize',
 'Benin',
 'Bhutan',
 'Bolivia',
 'Bonaire,-Saint-Eustatius-and-Saba',
 'Bosnia-and-Herzegovina',
 'Botswana']

In [38]:
unsuccessful = countries_data(country_lst)

In [39]:
print('The countries that did not download:', unsuccessful)

The countries that did not download: ['Åland', "Côte-d'Ivoire", 'Curaçao', 'Saint-Barthélemy']


In [40]:
print(len(unsuccessful))

4
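
All four failed names contain a non-ASCII character, which suggests (but does not confirm) that the server spells these file names differently. One thing to try, purely as an assumption, is requesting ASCII-folded versions of the names. The sketch below only produces the candidate spellings; `to_ascii` is a hypothetical helper, and whether these spellings exist on Berkeley Earth would still need to be checked.

```python
import unicodedata

def to_ascii(name):
    """Strip accents by decomposing characters and dropping the combining marks."""
    decomposed = unicodedata.normalize("NFKD", name)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

# The four names reported as unsuccessful above.
failed = ['Åland', "Côte-d'Ivoire", 'Curaçao', 'Saint-Barthélemy']
candidates = [to_ascii(name) for name in failed]
print(candidates)
```

The apostrophe in "Côte-d'Ivoire" is left untouched here, so that name may need separate handling.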
