# Nature in a warming World - Data wrangling

This notebook shows the extraction and cleaning of data used in our project.

## Data Sources

**1. International Union for Conservation of Nature [IUCN Red List API v3](https://apiv3.iucnredlist.org/api/v3/docs)**: Data on threatened species around the world. Accessing the data requires requesting a token.

**2. [World Bank: Climate Change Knowledge Portal](https://climateknowledgeportal.worldbank.org)**: Climate change variables: monthly and yearly precipitation and average temperature for each country.

**3. [National Centers for Environmental Information](https://www.ncdc.noaa.gov/cag/)**: Temperature Anomalies dataset.

**4. [EONET API v.2.1](https://eonet.sci.gsfc.nasa.gov/docs/v2.1)**: Occurrences of natural events around the world.

**5. [NASA Climate](https://climate.nasa.gov/vital-signs/sea-level/)**: Sea level rise measurements.

# Importing libraries

In [1]:
import pandas as pd
from pandas import ExcelFile
import numpy as np
import re
import math
import requests
import json
import geopip
import scipy.stats as ss
import geopandas as gpd
import getpass

from datetime import datetime

pd.set_option('display.max_columns', None)

# 1. Data Extraction

First, we define a helper function that we will use throughout the data cleaning process.

In [2]:
def null_cols(data):
    
    """
    This function takes a dataframe and returns, for each column that has NaN values,
    the percentage of rows that are NaN
    
    """
    
    nulls = data.isna().sum()
    return nulls[nulls > 0] / len(data) * 100
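
As a quick check of what `null_cols` returns, here is a minimal sketch on a toy dataframe. The values are made up for illustration; the function body is repeated so the cell runs on its own.

```python
import numpy as np
import pandas as pd


def null_cols(data):
    """Return the percentage of missing values for each column that has any."""
    nulls = data.isna().sum()
    return nulls[nulls > 0] / len(data) * 100


# Toy frame (made-up values): 'b' has 1 of 4 missing, 'c' has 2 of 4 missing.
toy = pd.DataFrame({
    "a": [1, 2, 3, 4],
    "b": [1.0, np.nan, 3.0, 4.0],
    "c": [None, "x", None, "y"],
})

report = null_cols(toy)
print(report)
```

Columns without missing values, such as `a` here, are left out of the report.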

## 1.1. IUCN Red List data set

For copyright reasons, we could not extract all the information on bird species from the IUCN API. We therefore start from species datasets obtained from the IUCN via FTP and then complete this data with the API.

In [11]:
def collect(file):
    """
    Returns a dataset that concatenates the 'file' CSVs from the CR, Extinct and rest folders
    """
    c0 = pd.read_csv("data/CR/"+ file +".csv")
    c1 = pd.read_csv("data/Extinct/" + file +".csv")
    c2 = pd.read_csv("data/rest/" + file +".csv")

    return pd.concat([c0, c1, c2])
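
`pd.concat` keeps the index of each file, so the combined frame has repeated index labels and no record of which folder a row came from. The sketch below is one possible variant, not the version used in this notebook: `collect_with_source` and the `source` column are hypothetical names, and the CSV files are tiny fakes written to a temporary directory so the example runs without the real data.

```python
import tempfile
from pathlib import Path

import pandas as pd


def collect_with_source(file, root, folders=("CR", "Extinct", "rest")):
    """Concatenate <root>/<folder>/<file>.csv and record the source folder."""
    frames = []
    for folder in folders:
        frame = pd.read_csv(Path(root) / folder / f"{file}.csv")
        frame["source"] = folder  # hypothetical helper column, not in the IUCN files
        frames.append(frame)
    # ignore_index=True avoids repeated index labels across the three files
    return pd.concat(frames, ignore_index=True)


# Build a tiny fake directory tree so the sketch runs without the real data.
with tempfile.TemporaryDirectory() as root:
    for i, folder in enumerate(("CR", "Extinct", "rest")):
        (Path(root) / folder).mkdir()
        pd.DataFrame({"assessmentId": [i * 10, i * 10 + 1]}).to_csv(
            Path(root) / folder / "assessments.csv", index=False
        )
    demo = collect_with_source("assessments", root)

print(demo)
```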

We collect the ``assessments``, ``countries`` and ``taxonomy`` files using the function above:

In [19]:
assessments = collect("assessments")
iucnCountries = collect("countries")
taxons= collect("taxonomy")

Let us have a look at our datasets:

In [20]:
assessments.head(2)

Unnamed: 0,assessmentId,internalTaxonId,scientificName,redlistCategory,redlistCriteria,yearPublished,assessmentDate,criteriaVersion,language,rationale,habitat,threats,population,populationTrend,range,useTrade,systems,conservationActions,realm,yearLastSeen,possiblyExtinct,possiblyExtinctInTheWild,scopes
0,497499,132523146,Hubbsina turneri,Critically Endangered,"B1ab(i,ii,iii,iv)+2ab(i,ii,iii,iv)",2019,2018-04-17 00:00:00 UTC,3.1,English,The Highland Splitfin is now only known to be ...,<p>This species lives in quiet waters with cur...,The species has a restricted range and it is t...,"The only remaining population, that of Lago Za...",Decreasing,The Highland Splitfin is a freshwater fish spe...,The Highland Splitfin is not a target species ...,Freshwater (=Inland waters),No conservation actions targeting&#160;<em>Hub...,Neotropical,,False,False,Global
1,500479,11058,Kubaryia pilikia,Critically Endangered,B1ab(iii),2012,2011-08-22 00:00:00 UTC,3.1,English,"<p><em><span lang=""EN-US""></em><span lang=""EN-...",This species of snail is ground-dwelling and h...,"<p><span lang=""EN-US"">This species is threaten...",There is no information available on this spec...,Unknown,"<p><span lang=""EN-US"">The geographic range of ...",This species is not utilized.,Terrestrial,"<span lang=""EN-US""><span lang=""EN-US"">Field wo...",Oceanian,2003.0,True,False,Global


In [21]:
iucnCountries.head(2)

Unnamed: 0,assessmentId,internalTaxonId,scientificName,code,name,presence,origin,seasonality,formerlyBred
0,500479,11058,Kubaryia pilikia,PW,Palau,Possibly Extinct,Native,,
1,502298,11256,Obovaria haddletoni,US,United States,Possibly Extinct,Native,,


In [22]:
taxons.head(2)

Unnamed: 0,internalTaxonId,scientificName,kingdomName,phylumName,orderName,className,familyName,genusName,speciesName,infraType,infraName,infraAuthority,subpopulationName,authority,taxonomicNotes
0,132523146,Hubbsina turneri,ANIMALIA,CHORDATA,CYPRINODONTIFORMES,ACTINOPTERYGII,GOODEIDAE,Hubbsina,turneri,,,,,"(de Buen, 1940)",Fernando de Buén described<em>&#160;Hubbsina t...
1,11058,Kubaryia pilikia,ANIMALIA,MOLLUSCA,LITTORINIMORPHA,GASTROPODA,ASSIMINEIDAE,Kubaryia,pilikia,,,,,"Clench, 1948",


From now on we will work on a copy of the ``assessments`` dataset.

In [25]:
df = assessments.copy()

# selecting relevant columns
df = df[['assessmentId', 'internalTaxonId', 'scientificName', 'redlistCategory',
       'yearPublished', 'assessmentDate', 'populationTrend', 'systems',
        'realm','scopes']]

#merging the taxon and assessments data
df = df.merge(taxons[["internalTaxonId", "kingdomName", "className"]], on = "internalTaxonId", how = "left")
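
A left merge keeps every assessment row, but it silently adds rows if the taxonomy table has repeated ids and leaves NaN where an id has no match. The following sketch shows one way to check both cases with the `validate` and `indicator` options of `DataFrame.merge`. The frames are toy data and the id `999` is made up.

```python
import pandas as pd

assess = pd.DataFrame({
    "internalTaxonId": [11058, 132523146, 999],
    "redlistCategory": ["Critically Endangered", "Critically Endangered", "Extinct"],
})
taxa = pd.DataFrame({
    "internalTaxonId": [11058, 132523146],
    "className": ["GASTROPODA", "ACTINOPTERYGII"],
})

merged = assess.merge(
    taxa,
    on="internalTaxonId",
    how="left",
    validate="many_to_one",  # raises MergeError if taxa has duplicate ids
    indicator=True,          # adds a '_merge' column: 'both' or 'left_only'
)
unmatched = merged[merged["_merge"] == "left_only"]
print(unmatched)
```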

In [26]:
df.head(2)

Unnamed: 0,assessmentId,internalTaxonId,scientificName,redlistCategory,yearPublished,assessmentDate,populationTrend,systems,realm,scopes,kingdomName,className
0,497499,132523146,Hubbsina turneri,Critically Endangered,2019,2018-04-17 00:00:00 UTC,Decreasing,Freshwater (=Inland waters),Neotropical,Global,ANIMALIA,ACTINOPTERYGII
1,500479,11058,Kubaryia pilikia,Critically Endangered,2012,2011-08-22 00:00:00 UTC,Unknown,Terrestrial,Oceanian,Global,ANIMALIA,GASTROPODA


In [28]:
df.shape

(29284, 12)

In [36]:
iucnCountries.drop_duplicates(inplace = True)

In [40]:
dfen = iucnCountries.merge(df, on= ["assessmentId", "internalTaxonId", "scientificName"], how = "left")

In [43]:
dfen.head()

Unnamed: 0,assessmentId,internalTaxonId,scientificName,code,name,presence,origin,seasonality,formerlyBred,redlistCategory,yearPublished,assessmentDate,populationTrend,systems,realm,scopes,kingdomName,className
0,500479,11058,Kubaryia pilikia,PW,Palau,Possibly Extinct,Native,,,Critically Endangered,2012,2011-08-22 00:00:00 UTC,Unknown,Terrestrial,Oceanian,Global,ANIMALIA,GASTROPODA
1,502298,11256,Obovaria haddletoni,US,United States,Possibly Extinct,Native,,,Critically Endangered,2012,2012-04-11 00:00:00 UTC,,Freshwater (=Inland waters),Nearctic,Global,ANIMALIA,BIVALVIA
2,502943,11479,Lemiox rimosus,US,United States,Extant,Native,"[""Resident""]",,Critically Endangered,2012,2012-03-12 00:00:00 UTC,Decreasing,Freshwater (=Inland waters),Nearctic,Global,ANIMALIA,BIVALVIA
3,509782,12803,Margaritifera hembeli,US,United States,Extant,Native,"[""Resident""]",,Critically Endangered,2012,2012-03-05 00:00:00 UTC,Decreasing,Freshwater (=Inland waters),Nearctic,Global,ANIMALIA,BIVALVIA
4,510583,12930,Medionidus walkeri,US,United States,Extant,Native,"[""Resident""]",,Critically Endangered,2012,2012-03-05 00:00:00 UTC,Decreasing,Freshwater (=Inland waters),Nearctic,Global,ANIMALIA,BIVALVIA


In [42]:
null_cols(dfen)

code                0.217727
seasonality        45.626141
formerlyBred       86.529007
populationTrend    19.614763
systems             0.036873
realm              13.613218
dtype: float64

In [44]:
dfen.drop(columns = ["seasonality", "formerlyBred"], inplace = True)

In [46]:
null_cols(dfen)

code                0.217727
populationTrend    19.614763
systems             0.036873
realm              13.613218
dtype: float64

In [48]:
dfen.shape

(56952, 16)

## 1.2. IUCN Red List API v3

Using this API requires a token. We read it with `getpass` so that it is neither echoed nor stored in the notebook.

In [3]:
token = getpass.getpass()  # enter the full query string: "?token=" followed by your IUCN token

········
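
The request helpers in this notebook append `token` directly to the endpoint URL, so the value typed at the prompt has to include the `?token=` prefix. A possible alternative, sketched here under that assumption, is to keep the raw token and build the query string explicitly. `build_url` and `BASE` are hypothetical names and `demo-token` is a placeholder, not a real credential.

```python
from urllib.parse import urlencode

BASE = "https://apiv3.iucnredlist.org/api/v3"


def build_url(path, token):
    """Join the API base, an endpoint path and the token query string."""
    return f"{BASE}/{path.lstrip('/')}?{urlencode({'token': token})}"


# 'demo-token' is a placeholder, not a real credential.
url = build_url("species/category/CR", "demo-token")
print(url)
```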


Now we define functions to obtain the data from the API.

In [4]:
categories = ["DD", "LC", "NT", "VU", "EN", "CR", "EW", "EX", "LRlc", "LRnt", "LRcd"]

base_category = "https://apiv3.iucnredlist.org/api/v3/species/category/EN"
base_reg = "https://apiv3.iucnredlist.org/api/v3/region/list"
base_hist = "https://apiv3.iucnredlist.org/api/v3/species/history/{}/:{}" # {name of species}


def extract_spe(keys, data):
    """
    Adds one column per key to the dataframe data, filled from the dictionaries
    stored in data.result, and returns data
    """
    for key in keys:
        data[key] = data.result.apply(lambda x: x[key])
    return data

def extract_country(keys, data):
    
    """
    Adds one column per key to the dataframe data, filled from the dictionaries
    stored in data.results, and returns data
    """
    for key in keys:
        data[key] = data.results.apply(lambda x: x[key])
    return data

def get_iucn_cat(cat):
    """
    Takes a category "cat" and returns all the endangered species
    whose vulnerability status is "cat".
    """
    
    base_category = "https://apiv3.iucnredlist.org/api/v3/species/category/"
    url = base_category + cat + token
    result = requests.get(url)
    df0 = pd.DataFrame(result.json(), columns = ["category",  "result"])
    
    return df0

def get_iucn_country_list():
    
    """
    Returns the country list with iso3 codes from the IUCN data.
    """
    keys_d = ['isocode', 'country']
    url = "https://apiv3.iucnredlist.org/api/v3/country/list"+ token
    result = requests.get(url)
    df0= pd.DataFrame(result.json(), columns = ["results"])
    df0 = extract_country(keys_d, df0)
    df0.drop(columns = "results", inplace = True)
    
    return df0

def get_iucn_country(name):
    """
    Returns the description of endangered species whose habitat is the country
    "name".
    """
    
    base = "https://apiv3.iucnredlist.org/api/v3/country/getspecies/"

    url = base +name+token
    result = requests.get(url)
    return pd.DataFrame(result.json(), columns = ["country", "result"])
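
`extract_spe` and `extract_country` copy one key at a time out of a column of dictionaries. As an alternative sketch, `pd.json_normalize` can expand every key in one pass. The frame below only imitates the shape of the API output; its values are made up.

```python
import pandas as pd

# Toy frame shaped like the API output: one dict per row in 'result'.
raw = pd.DataFrame({
    "category": ["CR", "CR"],
    "result": [
        {"taxonid": 1, "scientific_name": "Species one"},
        {"taxonid": 2, "scientific_name": "Species two"},
    ],
})

# json_normalize turns the list of dicts into columns; the index is
# aligned with 'raw' so the join pairs each row with its own record.
expanded = pd.json_normalize(raw["result"].tolist())
expanded.index = raw.index
flat = raw.drop(columns="result").join(expanded)
print(flat)
```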
    

In [5]:
codes = pd.read_csv("data/country_code.csv")   

In [6]:
codes.head()

Unnamed: 0,Country,ISO2,ISO3
0,Afghanistan,AF,AFG
1,Albania,AL,ALB
2,Algeria,DZ,DZA
3,American Samoa,AS,ASM
4,Andorra,AD,AND


In [None]:
def get_iucn_group_list():
    
    """
    Returns a list having all the comprehensive groups established by the IUCN.
    """
    
    url = "https://apiv3.iucnredlist.org/api/v3/comp-group/list"+ token
    result = requests.get(url)
    df0 = pd.DataFrame(result.json(), columns = ["result"])
    df0 = extract_spe(["group_name"], df0)
    df0.drop(columns = "result", inplace = True)
    
    return df0.group_name.to_list()

def get_iucn_group(name):
    
    base = "https://apiv3.iucnredlist.org/api/v3/comp-group/getspecies/"

    url = base +name+token
    result = requests.get(url)
    df0 = pd.DataFrame(result.json(), columns = ["result"])
    df0["group"] = name
    
    return df0
    

In [None]:
def get_data(list_cat, cat_function):
    
    """
    Returns a dataset listing all species whose comprehensive group is within 'list_cat'
    by using the function cat_function for extracting the data from API.
    """
    
    list_data = []
    for name in list_cat:
        list_data.append(cat_function(name))
    
    df0 = pd.concat(list_data)
    df0.reset_index(inplace= True, drop = True)
    
    return df0
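
`get_data` calls the fetch function once per name and stops at the first exception, which is inconvenient when thousands of requests are made in a row. The sketch below is a hedged variant that records failed names and can pause between calls. `get_data_safe`, `pause` and `fake_fetch` are hypothetical names, and `fake_fetch` stands in for a real API call.

```python
import time

import pandas as pd


def get_data_safe(names, fetch, pause=0.0):
    """Call fetch(name) for each name; return (combined frame, failed names)."""
    frames, failed = [], []
    for name in names:
        try:
            frames.append(fetch(name))
        except Exception:  # broad on purpose: any failure is recorded, not raised
            failed.append(name)
        time.sleep(pause)  # optional delay between consecutive requests
    combined = pd.concat(frames, ignore_index=True) if frames else pd.DataFrame()
    return combined, failed


def fake_fetch(name):
    """Stand-in for an API call; fails for one made-up name."""
    if name == "bad":
        raise ValueError("simulated request failure")
    return pd.DataFrame({"group": [name], "result": [{"taxonid": len(name)}]})


data, failed = get_data_safe(["birds", "bad", "mammals"], fake_fetch)
print(data)
print(failed)
```

The failed names can then be retried in a second pass instead of restarting the whole download.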
              

In [None]:
groups = get_data(get_iucn_group_list(), get_iucn_group)

In [None]:
by_groups = extract_spe(['taxonid', 'scientific_name', 'subspecies', 'rank', 'subpopulation', 'category'], groups)

In [None]:
all_categories = get_data(['LC', 'DD', 'VU', 'EN', 'NT', 'CR', 'EX', 'EW', 'LRlc', 'LRnt', 'LRcd'], get_iucn_cat)

In [None]:
by_c= extract_spe(['taxonid', 'scientific_name', 'subspecies', 'rank', 'subpopulation'], all_categories)

In [None]:
by_groups.shape

In [None]:
by_groups.drop(columns = ["result", "subspecies", "rank", "subpopulation"], inplace= True)

In [None]:
by_groups.nunique()

In [None]:
by_c.drop(columns = ["result", "subspecies", "rank", "subpopulation"], inplace = True)

In [None]:
by_c.shape

In [None]:
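# NOTE: `countries` is not defined in the cells shown; it is assumed to hold the
# per-country API results, e.g. get_data(codes.ISO2, get_iucn_country)
countries.reset_index(drop =True, inplace= True)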

In [None]:
key = list(countries.result[0].keys())

In [None]:
dfc = extract_spe(['taxonid', 'scientific_name', 'subspecies', 'rank', 'subpopulation', 'category'], countries)

In [None]:
dfc.drop(columns = ["result", "subspecies", "rank", "subpopulation"], inplace= True)

In [None]:
dfc.nunique()

In [None]:
by_c.nunique()

In [None]:
dfc = dfc.merge(codes, left_on = "country", right_on = "ISO2", how = "left")

In [None]:
null_cols(dfc)

In [None]:
dfc.category.value_counts()

In [None]:
def get_iucn_species(number):
    
    base = "https://apiv3.iucnredlist.org/api/v3/species/id/"

    url = base +str(number)+token
    res = requests.get(url)
    df0 = pd.DataFrame(res.json())
    
    return df0
    

In [None]:
spe_endang = list(dfc[dfc.category.isin({"VU", "EN", "CR", "EX", "EW"})].taxonid)

In [None]:
len(spe_endang)

In [None]:
s0 = get_data(spe_endang[0:2000], get_iucn_species)

In [None]:
s1 = get_data(spe_endang[2000:4000], get_iucn_species)

In [None]:
all_spe = get_data(list_spe[0:1000], get_iucn_species)

In [None]:
all_spe2 = get_data(list_spe[1000:2000], get_iucn_species)

In [None]:
all_spe = pd.concat([all_spe, all_spe2])

In [None]:
all_spe.reset_index(drop = True, inplace= True)

In [None]:
dfc.taxonid.nunique()

In [None]:
assessments = collect("assessments")

In [None]:
iucnCountries = collect("countries")

In [None]:
assessments.shape

In [None]:
assessments.redlistCategory.value_counts()

In [None]:
df = assessments.copy()

In [None]:
dfc.category.value_counts()

In [None]:
df.nunique()

In [None]:
df.redlistCategory.value_counts()

In [None]:
df = df[['assessmentId', 'internalTaxonId', 'scientificName', 'redlistCategory',
       'yearPublished', 'assessmentDate', 'populationTrend', 'systems',
        'realm','scopes']]

In [None]:
df.head()

In [None]:
taxons= collect("taxonomy")

In [None]:
df = df.merge(taxons[["internalTaxonId", "kingdomName", "className"]], on = "internalTaxonId", how = "left")

In [None]:
dfc[dfc.category == "EX"].country.value_counts()

In [None]:
dfc.drop_duplicates(inplace = True)

In [None]:
dfcop = dfc.copy()

In [None]:
dfen = dfcop[dfcop.category.isin({"VU", "EN", "CR", "EX", "EW"})]

In [None]:
dfen = dfen.merge(df, left_on = ["taxonid", "scientific_name"], right_on = ["internalTaxonId", "scientificName"], how = "left")

In [None]:
df.head()

In [None]:
null_cols(dfen)

In [None]:
missingids = list(dfen[dfen.assessmentId.isna()].taxonid.value_counts().index)

In [None]:
len(missingids)

In [None]:
missing = get_data(missingids, get_iucn_species)

In [None]:
missing.result[0]

In [None]:
all_spe.result[0].keys()

In [None]:
missing = extract_spe(['taxonid', 'scientific_name', 'kingdom', 'phylum', 'class', 'order', 'family',
                       'genus', 'main_common_name', 'authority', 'published_year', 'assessment_date', 
                       'category', 'criteria', 'population_trend', 'marine_system', 'freshwater_system', 
                       'terrestrial_system', 'assessor', 'reviewer', 'aoo_km2', 'eoo_km2', 'elevation_upper',
                       'elevation_lower', 'depth_upper', 'depth_lower', 'errata_flag', 'errata_reason', 'amended_flag',
                       'amended_reason'], missing)

In [None]:
missing.drop(columns = ["name", "result"], inplace = True)

In [None]:
missing2 = missing.copy()

In [None]:
missing2.drop(columns= ["phylum", "order", "family", "genus", "main_common_name", "authority", "criteria", "assessor", "reviewer", 'aoo_km2', 'eoo_km2', 'elevation_upper',
                       'elevation_lower', 'depth_upper', 'depth_lower', 'errata_flag', 'errata_reason', 'amended_flag',
                       'amended_reason'], inplace = True)

In [None]:
missing2.head()

In [None]:
missing.to_csv("data/missing.csv")

In [None]:
df.head()

In [None]:
dff = df[["internalTaxonId", "scientificName", "kingdomName", "className", "yearPublished", "assessmentDate", "redlistCategory", "populationTrend", "systems", "realm"]].copy()

In [None]:
missing2.columns

In [None]:
colnames = {"internalTaxonId": 'taxonid', 
            "scientificName": 'scientific_name', 
            "kingdomName": 'kingdom', 
            "className": 'class', 
            "yearPublished":'published_year',
            "assessmentDate":'assessment_date', 
            "redlistCategory":'category', 
            "populationTrend": 'population_trend'}

dff.rename(columns = colnames, inplace = True)

In [None]:
dff.systems.value_counts()

In [None]:
def get_system(system):
    if system == "Terrestrial":
        return "terrestrial"
    elif system == "Freshwater (=Inland waters)":
        return "freshwater"
    elif system == "Terrestrial|Freshwater (=Inland waters)":
        return "terrestrial, freshwater"
    elif (system == "Marine" or system== "Marine|Marine"):
        return "marine"
    elif system == "Terrestrial|Marine":
        return "terrestrial, marine"
    elif system == "Freshwater (=Inland waters)|Marine":
        return "freshwater, marine"
    elif system == "Terrestrial|Freshwater (=Inland waters)|Marine":
        return "terrestrial, freshwater, marine"
    

In [None]:
dff["system"] = dff.systems.apply(get_system)

In [None]:
dff.drop(columns = "systems", inplace = True)

In [None]:
dff.system.fillna("terrestrial", inplace= True)

In [None]:
dff["marine_system"] = dff.system.apply(lambda n: 1 if "marine" in n.split(", ") else 0)

In [None]:
dff.system.value_counts()

In [None]:
dff["freshwater_system"] = dff.system.apply(lambda n: 1 if "freshwater" in n.split(", ") else 0)

In [None]:
dff["terrestrial_system"] = dff.system.apply(lambda n: 1 if "terrestrial" in n.split(", ") else 0)
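
The `get_system` mapping followed by three `apply` calls can also be expressed with `Series.str.get_dummies`, which splits on the `|` separator and returns one 0/1 column per distinct value. This is a sketch on made-up rows, not the code used for the results in this notebook; repeated values such as `Marine|Marine` simply yield a single flag.

```python
import pandas as pd

systems = pd.Series([
    "Terrestrial",
    "Terrestrial|Freshwater (=Inland waters)",
    "Freshwater (=Inland waters)|Marine",
    None,
])

# Same default as the notebook: missing values are treated as terrestrial.
flags = systems.fillna("Terrestrial").str.get_dummies(sep="|")
flags = flags.rename(columns={
    "Terrestrial": "terrestrial_system",
    "Freshwater (=Inland waters)": "freshwater_system",
    "Marine": "marine_system",
})
print(flags)
```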

In [None]:
dff.drop(columns = "realm", inplace = True)

In [None]:
dff.drop(columns = "system", inplace = True)

In [None]:
missing2.head()

In [None]:
dff.head()

In [None]:
status = {"VU": "Vulnerable",
          "EN": "Endangered",
          "CR": "Critically Endangered",
          "EX": "Extinct",
          "EW": "Extinct in the Wild"
          }
missing2["category"] = missing2.category.apply(lambda n: status[n])

In [None]:
taxondata = pd.concat([dff,missing2]).reset_index(drop = True)

In [None]:
taxondata.head()

In [None]:
taxondata.category.value_counts()

In [None]:
geogdata = dfcop[dfcop.category.isin({"VU", "EN", "CR", "EX", "EW"})].copy()

In [None]:
spe_complete = geogdata.merge(taxondata, on = ["taxonid", "scientific_name"], how = "left")

In [None]:
spe_complete.head()

In [None]:
spe_complete.drop(columns = "category_x", inplace = True)

In [None]:
spe_complete.rename(columns = {"category_y": "category"}, inplace= True)

In [None]:
spe_complete.nunique()

In [None]:
null_cols(spe_complete)

In [None]:
spe_complete.Country.fillna("Disputed Territory", inplace = True)

In [None]:
null_cols(spe_complete)

In [None]:
spe_complete.drop(columns = ["country"], inplace = True)

In [None]:
spe_complete["kingdom"].value_counts()

In [None]:
spe_relevant = spe_complete[spe_complete.kingdom.isin({"ANIMALIA", "PLANTAE"})].copy()

In [None]:
spe_relevant.Country.value_counts()

In [None]:
realms = pd.read_csv("data/regions.csv")

In [None]:
realms.drop(columns = "Unnamed: 0", inplace = True)

In [None]:
realms.head()

In [None]:
spe_relevant = spe_relevant[['taxonid', 'kingdom','class','scientific_name',
              'published_year', 'assessment_date', 'category',
           'population_trend', 'marine_system', 'freshwater_system',
            'terrestrial_system', 'Country', 'ISO2', 'ISO3']]

In [None]:
spe_relevant.head()

In [None]:
spe_relevant = spe_relevant.merge(realms[["Realm", "Continent", "Region", "Subregion", "ISO3"]], on = "ISO3", how = "left")

In [None]:
spe_relevant.to_csv("data/species-realm-curate.csv")

In [None]:
spe_relevant.head()

In [None]:
spe_relevant.Realm.value_counts()

In [None]:
spe_relevant.dtypes

In [None]:
spe_relevant["assessment_date"] = pd.to_datetime(spe_relevant.assessment_date, infer_datetime_format=True, utc= True)


In [None]:
spe_relevant["year"] = spe_relevant.assessment_date.dt.year
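
The assessment dates are strings such as `2018-04-17 00:00:00 UTC`. The sketch below shows the same parsing step on a toy series, with `errors="coerce"` added as an assumption so that an unparseable entry becomes `NaT` instead of raising.

```python
import pandas as pd

dates = pd.Series(["2018-04-17 00:00:00 UTC", "2011-08-22 00:00:00 UTC", "not a date"])

# errors="coerce" turns unparseable strings into NaT instead of raising.
parsed = pd.to_datetime(dates, utc=True, errors="coerce")
years = parsed.dt.year
print(years)
```

Because of the missing value, the resulting year column is floating point rather than integer.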

In [None]:
spe_relevant = spe_relevant.sort_values("assessment_date")

In [None]:
spe_relevant.shape

In [None]:
spe_relevant.year.value_counts()

In [None]:
spe_relevant.groupby(["year", "Realm", "category", "kingdom"]).size()

In [None]:
year_realm = (spe_relevant[["year", "Realm", "taxonid"]]
            .groupby(["year", "Realm"])
            .count()
            .reset_index())

In [None]:
year_realm.head()

In [None]:
pivot_year_realm = pd.pivot_table(year_realm, values='taxonid', 
                                index=['year'], 
                                columns=['Realm'], 
                                aggfunc=np.sum).reset_index()

#Filling with zeros
pivot_year_realm = pivot_year_realm.fillna(0)

In [None]:
cols = ['Afrotropical', 'Antarctic', 'Australasian', 'Indomalayan','Nearctic', 'Neotropical', 'Oceanian', 'Palearctic']

for col in cols:
    pivot_year_realm[col] = pivot_year_realm[col].cumsum()
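
The pivot and cumulative sum above turn yearly counts into running totals per realm. Here is a compact sketch of the same idea on made-up counts, using `fill_value=0` in the pivot and a single `cumsum` over all realm columns.

```python
import pandas as pd

# Made-up counts of assessed species per year and realm.
counts = pd.DataFrame({
    "year": [2000, 2000, 2001, 2002],
    "Realm": ["Nearctic", "Oceanian", "Nearctic", "Oceanian"],
    "taxonid": [3, 1, 2, 4],
})

wide = (
    counts.pivot_table(values="taxonid", index="year", columns="Realm",
                       aggfunc="sum", fill_value=0)
    .sort_index()
    .cumsum()  # running total down the years, one column per realm
)
print(wide)
```

Keeping `year` as the index means `cumsum` only touches the realm columns, so no explicit column list is needed.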

In [None]:
pivot_year_realm

In [None]:
# Here we use the hvplot library
import hvplot.pandas  # registers the .hvplot accessor on pandas objects

# Plotting both lines: variation extent and variation area
"""
j = pivot_cat_year.hvplot.area(x ='yearPublished', 
                       y = ["Extinct in the Wild", "Extinct", 'Critically Endangered', 'Endangered',  "Vulnerable"], 
                       value_label ='Number of species', # counts of species is the numerical feature
                       title = "Animal species in the IUCN red list (2000-2019)",
                       xlabel = "Year",
                       cmap = "Pastel1", # colormap set
                       width =800, 
                       height =400,
                       line_width = 0.5,
                       line_join = "round")

j.opts(legend_position='top_left')
"""

j = (pivot_year_realm.hvplot.line(x= "year", 
                          y= ["Afrotropical", "Antarctic", 'Australasian', 'Indomalayan',  "Nearctic", "Neotropical", "Oceanian", "Palearctic"], 
                          value_label='number of species',
                          title = "Endangered and Extinct species by Bio Realm",
                          xlabel = "year",
                          ylabel = "number of species",
                          #  logy = True, possible to do
                          cmap = "glasbey_cool",
                          width=900, 
                          height=400,
                          line_width = 3,
                          alpha = 0.6))

#positioning legends
j.opts(legend_position='right')

In [None]:
plants = set(spe_relevant[spe_relevant["kingdom"] == "PLANTAE"]["class"].value_counts().index)


In [None]:
spe_relevant[spe_relevant["Realm"]== "Australasian"]["class"].value_counts()

In [None]:
#Plot with HoloViews
import holoviews as hv
from holoviews import opts

hv.extension('bokeh')

temp_an = hv.Curve(global_anomalies, 
                   ('Year', 'Year'), 
                   ('J-D', 'Annual variation'), 
                   label="Temperature Anomalies")

#Line options
temp_an.opts(opts.Curve(height=500, 
                        width=800, 
                        line_width=2, 
                        color="orange", 
                        tools=['hover']))

#baseline plotting
baseline = (hv.HLine(0))

baseline.opts(opts.HLine(color = "cornflowerblue", 
                         line_width = 1, 
                         tools = ["hover"], 
                         line_dash='dashed'))

#Composing the plot
temp_an * baseline * hv.Text(2000, -0.05, 'Baseline temperature 1960')

# Temperatures and Precipitation datasets

Temperature and precipitation datasets: https://climateknowledgeportal.worldbank.org

We aim to classify or cluster countries into 4 categories:
 * Cold Dry
 * Cold Wet
 * Hot Dry
 * Hot Wet

We then explore the impact of climate change on representative countries from each category, including how often they suffer natural events and how vulnerable they are to them.
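
One simple way to assign the four categories, sketched here as an assumption rather than the method finally used, is to split countries at the median temperature and the median precipitation. The country values are made up; the column names `Temperature` and `Precipitation` follow the `prec_temp` dataset, and `climate_class` is a hypothetical column name.

```python
import numpy as np
import pandas as pd

# Made-up country averages; units assumed to be °C and mm.
climate = pd.DataFrame({
    "Country": ["A", "B", "C", "D"],
    "Temperature": [2.0, 5.0, 24.0, 27.0],
    "Precipitation": [300.0, 1500.0, 200.0, 2200.0],
})

# Assumption: "hot" and "wet" mean above the median of the sample.
hot = climate["Temperature"] > climate["Temperature"].median()
wet = climate["Precipitation"] > climate["Precipitation"].median()

temp_label = pd.Series(np.where(hot, "Hot", "Cold"), index=climate.index)
rain_label = pd.Series(np.where(wet, "Wet", "Dry"), index=climate.index)
climate["climate_class"] = temp_label + " " + rain_label
print(climate)
```

A clustering algorithm could replace the median split later; the thresholds here are only a starting point.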


In [None]:
temp_anomal = pd.read_csv('data/graph.txt', sep="     ", header=None, engine= "python")


In [None]:
temp_anomal.columns = ["Year", "temp", "lowest"]

In [None]:
temp_anomal.drop([0], inplace= True)

In [None]:
temp_anomal = temp_anomal.apply(pd.to_numeric)

In [None]:
temp_anomal.dtypes

In [None]:
prec_temp = pd.read_csv("data/prec_temp.csv")

In [None]:
prec_temp.reset_index(drop = True, inplace = True)

In [None]:
prec_temp.drop(columns = "Unnamed: 0", inplace= True)

In [None]:
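# prec_temp.merge()  # incomplete call: DataFrame.merge requires a `right` frame; the merge with the country geometries is done further down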

In [None]:
from bokeh.models import HoverTool

years = hv.HoloMap(kdims=['Year'])

for i in range(2000, 2017):
    years[i] = hv.Points(prec_temp[prec_temp.Year == i], 
                   ["Precipitation", "Temperature"],
                   ['Country', 'Year', "Realm", "ISO3"]).sort('Year')
    
    tooltips = [('Country', '@Country'),
            ('Year', '@Year'), ("Realm", "@Realm")
            ]

    hover = HoverTool(tooltips=tooltips)

    years[i].opts(tools=[hover], 
            color='Realm', 
            cmap='Set2',
            line_color='black', 
            padding=0.1, 
            size = 5,
            width=600, 
            height=350, 
            show_grid=True,
            #logx = True,
            title='Temperature vs precipitation')
    
years.opts(legend_position='right')    

In [None]:
prec_temp.head()

In [None]:
geom = gpd.read_file('data/countries.geojson')

In [None]:
geom.head()

In [None]:
geom.shape

In [None]:
prec_temp = prec_temp.merge(geom[["ISO_A3", "geometry"]], left_on = "ISO3", right_on = "ISO_A3")

In [None]:
prec_temp.drop(columns = "ISO_A3", inplace = True)

In [None]:
prec_temp.dtypes

In [None]:
from geopandas import GeoDataFrame

gdf = GeoDataFrame(prec_temp, geometry="geometry")

In [None]:
gdf = gdf[gdf.Year==1991]

In [None]:
import geoviews as gv
from cartopy import crs

gv.extension('bokeh')

gv.Polygons(gdf, 
            vdims=['Temperature', "Country"]).opts(tools=['hover'],
                                                   width=800,
                                                   height= 500,
                                                   projection=crs.PlateCarree(),
                                                   cmap = "Spectral_r")

In [None]:
prec_temp.head()

In [None]:
pd.read_csv("data/temperature.csv")

In [None]:
pd.read_csv("data/africa.csv")

In [None]:
emissions_pc = pd.read_csv("data/co-emissions-per-capita.csv")

In [None]:
emissions_share = pd.read_csv("data/annual-share-of-co2-emissions.csv")

In [None]:
emissions_share.rename(columns = {"Share of global CO₂ emissions (%)": "share_perc"}, inplace = True)

In [None]:
emissions_pc.rename(columns = {"Per capita CO₂ emissions (tonnes per capita)": "emissions_pc"}, inplace = True)

In [None]:
emissions_share.merge(regions[["ISO3"]])

In [None]:
emissions_share = emissions_share.merge(realms[["Realm", "ISO3"]], left_on = "Code", right_on = "ISO3", how = "left")

In [None]:
emissions_pc= emissions_pc.merge(realms[["Realm", "ISO3"]], left_on = "Code", right_on = "ISO3", how = "left")

In [None]:
emissions_share.drop(columns = "ISO3", inplace = True)
emissions_pc.drop(columns = "ISO3", inplace = True)


In [None]:
emissions_share.head()

In [None]:
emissions_share[emissions_share.Code.isna()].Entity.value_counts()

In [None]:
realms.head()