# Analyse historical data from the Climate Change Knowledge Portal 

Author: [Giuseppe La Rocca](mailto:giuseppe.larocca@egi.eu)

Creation date: 03-Sept-2019

Last updated: 04-Sept-2019

---

## Description: 

Analyse the historical temperature data derived from the Climate Research Unit (Mitchell et al., 2003), aggregated to country and basin levels. The data is obtained from the [World Bank Data Catalog](https://datacatalog.worldbank.org/dataset/climate-change-knowledge-portal-historical-data) and made available in the EGI DataHub with PID http://hdl.handle.net/21.T15999/3Byz9Cw

The "Climate Change Knowledge Portal: Historical Data" spreadsheet contains the following tabs:

* <u>Country_temperatureCRU</u>: mean monthly and annual temperatures by country for the period 1961-1999.  Values are in degrees Celsius.
* <u>Country_precipitationCRU</u>: mean monthly and annual precipitation by country for the period 1961-1999.  Values are in millimeters (mm).

For this exercise the dataset in the <u>Country_temperatureCRU</u> tab will be used.


## Import necessary libraries

In [None]:
import pandas as pd
from pandas import DataFrame
import matplotlib.pyplot as plt


# datahub and auxiliary libraries
import os
import requests
from fs.onedatafs import OnedataFS

## Resolve PID to DataHub files

In [None]:
# First get DataHub share from handle
PID = 'http://hdl.handle.net/21.T15999/3Byz9Cw'

r = requests.get(PID, allow_redirects=False)
share = os.path.basename(r.headers['Location'])

# And now get the path of the file in Onedata,
# starting from the share info
r = requests.get('https://datahub.egi.eu/api/v3/onezone/shares/%s' % share,
                 headers={'X-Auth-Token': os.environ['ONECLIENT_ACCESS_TOKEN'],
                          'Accept': 'application/json'})
share_info = r.json()
space_id = share_info['spaceId']
folder_name = share_info['name']
# And the space info
r = requests.get('https://%s/api/v3/oneprovider/spaces/%s' % (os.environ['ONEPROVIDER_HOST'], space_id),
                 headers={'X-Auth-Token': os.environ['ONECLIENT_ACCESS_TOKEN']})
space_name = r.json()['name']
datahub_path = os.path.join('/', space_name, folder_name)

print("Data is available at %s" % datahub_path)
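
The first step above relies on the handle server answering with an HTTP redirect whose `Location` header ends in the share identifier. A minimal sketch of that extraction as a standalone helper, assuming the share ID is the last path segment of the redirect URL; the function name `share_id_from_location` and the example URL are hypothetical and used only for illustration:

```python
import os
from urllib.parse import urlparse


def share_id_from_location(location):
    """Return the last path segment of a redirect URL (assumed to be the share ID)."""
    # Parse first so a query string or fragment does not end up in the result
    path = urlparse(location).path
    # Strip a trailing slash, otherwise basename() would return an empty string
    return os.path.basename(path.rstrip('/'))


# Hypothetical redirect target, for illustration only
example_location = 'https://datahub.example.org/share/abc123/'
print(share_id_from_location(example_location))
```

Compared with calling `os.path.basename` directly on the header value, this variant tolerates a trailing slash or query string, which may or may not occur in practice.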

### Provide the ISO_3DIGIT code of the country you want to analyse

In [None]:
ISO_3DIGIT="ITA"

## Load the historical dataset from DataHub and create a DataFrame object

In [None]:
file_name = os.path.join(datahub_path, 'cckp_historical_data_0.xls')

# Create connection to Oneprovider
odfs = OnedataFS(os.environ['ONEPROVIDER_HOST'],
                 os.environ['ONECLIENT_ACCESS_TOKEN'],
                 force_direct_io=True)

raw_data = pd.read_excel(odfs.open(file_name, 'rb'), sheet_name='Country_temperatureCRU')

### Show keys() and datasets

In [None]:
# Show available keys()
raw_data.keys()

In [None]:
raw_data[:10]

### Group datasets based on the "ISO_3DIGIT" code and check data structure

In [None]:
# Group by a single key passed as a plain string, so get_group() accepts a scalar code
average_annual_temperature = raw_data.groupby('ISO_3DIGIT')
#average_annual_temperature.describe()

In [None]:
# Filter datasets by ISO_3DIGIT
iso_3digit_average_annual_temperature = average_annual_temperature.get_group(ISO_3DIGIT)
iso_3digit_average_annual_temperature
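
As an alternative to `groupby` followed by `get_group`, the same rows can be selected with a boolean mask. A minimal sketch on a small made-up table; the country codes and values below are invented for illustration and are not taken from the dataset:

```python
import pandas as pd

# Made-up sample rows, for illustration only (not real CCKP values)
sample = pd.DataFrame({
    'ISO_3DIGIT': ['AAA', 'BBB', 'CCC'],
    'Jan_Temp': [1.0, 2.0, 3.0],
})

code = 'BBB'

# Selection via groupby/get_group, as in the cells above
by_group = sample.groupby('ISO_3DIGIT').get_group(code)

# Equivalent selection via a boolean mask
by_mask = sample[sample['ISO_3DIGIT'] == code]

# Check whether both approaches return the same rows
print(by_group.equals(by_mask))
```

The mask is usually sufficient when only one country is needed; grouping becomes more useful when several countries are processed in turn.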

### Create the DataFrame to plot

In [None]:
Data = {
    'Mean monthly and annual temperature for period 1961-1999': [
          'Jan','Feb','Mar','Apr','May','Jun','Jul','Aug','Sept','Oct','Nov','Dec'
    ],
    
    'Temperatures': [
          iso_3digit_average_annual_temperature['Jan_Temp'].values[0],
          iso_3digit_average_annual_temperature['Feb_temp'].values[0],
          iso_3digit_average_annual_temperature['Mar_temp'].values[0],
          iso_3digit_average_annual_temperature['Apr_Temp'].values[0],
          iso_3digit_average_annual_temperature['May_temp'].values[0],
          iso_3digit_average_annual_temperature['Jun_Temp'].values[0],
          iso_3digit_average_annual_temperature['July_Temp'].values[0],
          iso_3digit_average_annual_temperature['Aug_Temp'].values[0],
          iso_3digit_average_annual_temperature['Sept_temp'].values[0],
          iso_3digit_average_annual_temperature['Oct_temp'].values[0],
          iso_3digit_average_annual_temperature['Nov_Temp'].values[0],
          iso_3digit_average_annual_temperature['Dec_temp'].values[0]
    ]
}

data_frame=DataFrame(Data, columns=['Mean monthly and annual temperature for period 1961-1999', 'Temperatures'])
data_frame
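
The column names in the spreadsheet mix capitalisations (`Jan_Temp`, `Feb_temp`) and month abbreviations (`July_Temp`, `Sept_temp`), so typing them by hand is error-prone. A minimal sketch of a case-insensitive lookup that could be applied to `raw_data.columns`; the helper name `month_columns` is hypothetical, and the matching rule (a month prefix followed by a `_temp` suffix) is an assumption based on the names used above:

```python
MONTH_PREFIXES = ['jan', 'feb', 'mar', 'apr', 'may', 'jun',
                  'jul', 'aug', 'sep', 'oct', 'nov', 'dec']


def month_columns(columns):
    """Return the temperature column for each month, in calendar order."""
    found = []
    for prefix in MONTH_PREFIXES:
        # Compare in lower case; require the '_temp' suffix to skip other columns
        matches = [c for c in columns
                   if c.lower().startswith(prefix) and c.lower().endswith('_temp')]
        if len(matches) != 1:
            raise ValueError('expected one column for %r, got %r' % (prefix, matches))
        found.append(matches[0])
    return found


# Column names as used in the cell above, plus one unrelated column
names = ['ISO_3DIGIT', 'Jan_Temp', 'Feb_temp', 'Mar_temp', 'Apr_Temp',
         'May_temp', 'Jun_Temp', 'July_Temp', 'Aug_Temp', 'Sept_temp',
         'Oct_temp', 'Nov_Temp', 'Dec_temp']
print(month_columns(names))
```

With the real data, `month_columns(raw_data.columns)` would give the twelve columns to read the values from, provided the sheet follows the same naming.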

### Plot the DataFrame 

In [None]:
data_frame.plot(
    x='Mean monthly and annual temperature for period 1961-1999', 
    y='Temperatures',
    color='lightblue', 
    figsize=(10,5),
    linewidth=3)

# Add grid and legend
plt.grid()
plt.legend()

# Saving the final plot
plt.savefig("temperatures.png")