## Data manipulation with Pandas
<img src="https://pandas.pydata.org/_static/pandas_logo.png" width="400" align="left"/>

Here are the standard ways to import NumPy, Pandas, and Pyplot:

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

#### Task: use `conda` to install the packages `seaborn`, `openpyxl`, and `requests` within the LUcompute environment

Run the following in a terminal (or anaconda prompt):
```bash
conda install seaborn=0.8
conda install openpyxl=2.4
conda install requests=2.18 
```

Here we import [Seaborn](https://seaborn.pydata.org/index.html)

In [None]:
import seaborn as sns
sns.set() # seaborn overwrites matplotlib's defaults

In [None]:
plt.rcParams.update({'font.size': 14, 'figure.figsize': [8, 4]})
import warnings
warnings.filterwarnings('ignore')

The Pandas library provides: 
* an implementation of a `DataFrame` that is a multidimensional array with row and column labels;
* implementations of data operations that are typical of spreadsheet programs (e.g. Excel).

Pandas data structures are:
* `Series`
* `DataFrame`
* `Index`

A Pandas `Series` is a 1D array where each value is associated with an explicitly defined index

In [None]:
data = pd.Series(np.linspace(7,10,4),index=['a','e','i','o'])
data

In [None]:
data.values

In [None]:
data.index

As with a Python dictionary, a `Series` maps keys (the index labels) to values

In [None]:
data['a']
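
The dictionary analogy can be pushed a little further. The sketch below rebuilds the same `Series` under the illustrative name `s` and shows membership tests, `keys()`, and assignment to a new label, all of which behave as they would for a `dict`.

```python
import numpy as np
import pandas as pd

s = pd.Series(np.linspace(7, 10, 4), index=['a', 'e', 'i', 'o'])

has_a = 'a' in s          # membership is tested against the index, not the values
keys = list(s.keys())     # keys() returns the index labels
s['u'] = 11.0             # assigning to a new label extends the Series

print(has_a, keys)
print(s)
```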

Pandas structures can be thought of as advanced dictionaries (`json` structure).<br>
Here is a dictionary (`dict`) that associates some European countries with their emissions of carbon dioxide (in kilograms per capita) in 2014 (source: Eurostat)

In [None]:
emission_co2_dict = { 'Estonia' : 14759.47137, 
              'Luxembourg' : 14695.69977, 
              'Netherlands': 11601.47373,
              'Sweden': 5296.85373 }

#### Task: Show the `type` of `data['a']` and `emission_co2_dict`

In [None]:
print(type(data['a']))
print(type(emission_co2_dict))

#### Saving the dictionary as a `json` file

In [None]:
import json
with open('data/data.json', 'w') as outfile:
    json.dump(emission_co2_dict, outfile)

A Pandas `Series` can be readily created from a dictionary

In [None]:
emission_co2 = pd.Series(emission_co2_dict)
emission_co2

Even in this case, indices can be explicitly specified

In [None]:
pd.Series(emission_co2_dict,index=['Sweden','Netherlands'])
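
Passing `index` both selects and orders the entries. As a small sketch of what happens when a requested label is absent from the dictionary (here `'Italy'`, chosen only for illustration), the corresponding value is filled with `NaN`:

```python
import pandas as pd

emission_co2_dict = {'Estonia': 14759.47137, 'Luxembourg': 14695.69977,
                     'Netherlands': 11601.47373, 'Sweden': 5296.85373}

# 'Italy' is not a key of the dictionary, so its value becomes NaN
subset = pd.Series(emission_co2_dict, index=['Sweden', 'Italy'])
print(subset)
```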

A `DataFrame` consists of a sequence of `Series` with common indices.<br>
Here we create a `DataFrame` containing values of emissions (in kilograms per capita) of CO$_2$ and N$_2$O in some European countries in 2014 (eurostat).

In [None]:
emission_n20_dict = { 'Estonia' : 2.10151, 
              'Luxembourg' : 1.60177, 
              'Netherlands': 1.58958,
              'Sweden': 1.73669, 
              'Italy': 1.73669 }

emissions = pd.DataFrame({'CO$_2$':emission_co2_dict,'N$_2$O':emission_n20_dict})
emissions
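
When the columns do not share exactly the same labels, Pandas aligns them on the union of the indices and fills the gaps with `NaN`. The following minimal sketch uses two shortened `Series` (the names `co2`, `n2o`, and `toy` are illustrative) to make the alignment visible:

```python
import pandas as pd

co2 = pd.Series({'Estonia': 14759.47137, 'Sweden': 5296.85373})
n2o = pd.Series({'Estonia': 2.10151, 'Sweden': 1.73669, 'Italy': 1.73669})

# rows are the union of both indices; Italy has no CO2 value, so it gets NaN
toy = pd.DataFrame({'co2': co2, 'n2o': n2o})
print(toy)
print(toy['co2'].isna())
```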

A `DataFrame` can be indexed using `loc` (with keys) or `iloc` (with row number)

In [None]:
emissions.loc['Estonia':'Netherlands']

In [None]:
emissions.iloc[0:3] # similar to numpy slicing

In [None]:
emissions['CO$_2$']
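
Both `loc` and `iloc` also accept a column selector as a second argument. The sketch below, on a small frame with rounded, illustrative values, shows that a label slice includes its end label whereas a positional slice excludes its end position:

```python
import pandas as pd

toy = pd.DataFrame({'co2': [14759.5, 14695.7, 11601.5, 5296.9],
                    'n2o': [2.10, 1.60, 1.59, 1.74]},
                   index=['Estonia', 'Luxembourg', 'Netherlands', 'Sweden'])

by_label = toy.loc['Estonia':'Luxembourg', 'n2o']  # end label is included
by_position = toy.iloc[0:2, 1]                     # end position is excluded

print(by_label)
print(by_position)
```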

If the column name is a valid Python identifier and does not clash with an existing attribute or method of the `DataFrame`, we can get the column as `DataFrame.column_name`<br>
We create an additional column where CO$_2$ emissions are expressed in tonnes

In [None]:
emissions['tonnes_co2'] = [19401989, np.nan, 175497,195658947,51358876]
emissions.tonnes_co2

In [None]:
emissions[emissions['CO$_2$']<1e4] # similar to masking numpy arrays

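We compute the number of inhabitants from the 1st and 3rd columns: the total emissions in tonnes are converted to kilograms and divided by the emissions in kilograms per capita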

In [None]:
emissions['population'] = emissions.tonnes_co2 / emissions['CO$_2$'] * 1e3
emissions

### Case Study: Individual Temperature Observations from [Bolin Centre Database](http://bolin.su.se/data/stockholm)

In [None]:
url = 'http://bolin.su.se/data/stockholm/files/stockholm-historical-weather-observations-ver-1.0.2016/temperature/daily/raw/stockholm_daily_temp_obs_1859_1960_t1t2t3txtn.txt'
temp1859_1960 = pd.read_table(url,header=None,engine='python',delim_whitespace=True,
                     names=['year','month','day','morning','noon','evening','tmax','tmin'])
url = 'http://bolin.su.se/data/stockholm/files/stockholm-historical-weather-observations-ver-1.0.2016/temperature/daily/raw/stockholm_daily_temp_obs_1961_2012_t1t2t3txtntm.txt'
temp1961_2012 = pd.read_table(url,header=None,engine='python',delim_whitespace=True,
                     names=['year','month','day','morning','noon','evening','tmax','tmin','estimated diurnal mean'])
url = 'http://bolin.su.se/data/stockholm/files/stockholm-historical-weather-observations-ver-1.0.2016/temperature/daily/raw/stockholm_daily_temp_obs_2013_2016_t1t2t3txtntm.txt'
temp2013_2016 = pd.read_table(url,header=None,engine='python',delim_whitespace=True,
                     names=['year','month','day','morning','noon','evening','tmax','tmin','estimated diurnal mean'])

Now we can turn the dates into indices and keep the temperatures as columns

In [None]:
temp1859_1960.set_index(['year','month','day'],inplace=True)
temp1961_2012.set_index(['year','month','day'],inplace=True)
temp2013_2016.set_index(['year','month','day'],inplace=True)

The three `DataFrame`s are concatenated into a single `DataFrame`

In [None]:
temp = pd.concat([temp1859_1960,temp1961_2012,temp2013_2016])
temp[:3]
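
The frames being concatenated do not all have the same columns (only the later files include an estimated diurnal mean). As a minimal sketch with made-up numbers, `pd.concat` keeps the union of the columns and fills the missing entries with `NaN`:

```python
import pandas as pd

# toy stand-ins for the early and late records; the values are invented
early = pd.DataFrame({'tmax': [1.0, 2.0]}, index=[1859, 1860])
late = pd.DataFrame({'tmax': [3.0], 'estimated diurnal mean': [2.5]}, index=[1961])

combined = pd.concat([early, late])
print(combined)
```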

Save tab-separated value file

In [None]:
# reset_index() turns the indices into columns
sthlm_2d = temp.reset_index()
sthlm_2d = sthlm_2d[['year','tmax']]
# we save as tab-separated values all tmax for all years
sthlm_2d.to_csv('data/sthlm_2d.csv', sep='\t',index=False,na_rep=np.nan)
!head -n 5 data/sthlm_2d.csv

#### Seaborn allows you to plot `DataFrames` directly

In [None]:
# here we plot the joint distributions between tmax and tmin 
ax = sns.jointplot(x='tmin',y='tmax',data=temp.reset_index(),kind='hex')
# raw strings keep the LaTeX backslashes from being read as escape sequences
ax.set_axis_labels(xlabel=r'$T_{min}$ ($^{\circ}C$)', ylabel=r'$T_{max}$ ($^{\circ}C$)')
plt.show()

`GroupBy` and `mean()` allow us to group by an index (year) and calculate the mean of records from a column (tmax).<br>
This is a simple way to extract a `Series` averaging over sets of values in the `DataFrame`.<br>
We smooth the curve using a LOWESS function (Locally Weighted Scatterplot Smoothing).

In [None]:
import statsmodels.api as sm
lowess = sm.nonparametric.lowess

In [None]:
tmax_year_mean = temp.groupby('year')['tmax'].mean()
plt.plot(tmax_year_mean.index,tmax_year_mean.values,label='Mean $T_{max}$',lw=2,color='grey')
mean_lowess_STHLM = lowess(tmax_year_mean.values,tmax_year_mean.index,frac=0.5)
plt.plot(mean_lowess_STHLM[:,0],mean_lowess_STHLM[:,1],label='Mean $T_{max}$ with LOWESS',lw=4,color='k')
plt.legend(frameon=False)
plt.xlabel('Year'); plt.ylabel(r'Mean of $T_{max}$ ($^\circ$C)')
plt.title('Stockholm - yearly mean temperature'); plt.ylim(7,13); plt.show()
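
To see what `groupby('year')` does without the full data set, here is a minimal sketch on a four-row frame with invented temperatures. As in the cell above, `'year'` is an index level rather than a column, and `groupby` accepts the level name directly:

```python
import pandas as pd

obs = pd.DataFrame({'year': [2000, 2000, 2001, 2001],
                    'month': [1, 2, 1, 2],
                    'tmax': [1.0, 3.0, 2.0, 6.0]}).set_index(['year', 'month'])

# one mean per year: the rows sharing a year are averaged together
yearly = obs.groupby('year')['tmax'].mean()
print(yearly)
```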

#### Task: use `GroupBy`, `mean()`, and `std()` to group by year and calculate the mean and standard deviation of the mean of morning, noon, and evening temperatures. Plot the data with `Matplotlib`.

In [None]:
for i,c in zip(['morning','noon','evening'],plt.rcParams['axes.prop_cycle'].by_key()['color']):
    mean = temp.groupby('year')[i].mean()
    std = temp.groupby('year')[i].std()/np.sqrt(temp.groupby('year')[i].count())
    plt.errorbar(mean.index,mean.values,std.values,label=i,lw=0,marker='o',color=c,
                ms=3, elinewidth=1., capsize=3, capthick=1.)
    lowess_mean = lowess(mean.values,mean.index,frac=0.5)
    plt.plot(lowess_mean[:,0],lowess_mean[:,1],label=i+' LOWESS',lw=4,color=c)
plt.legend(frameon=False, ncol=2)
plt.xlabel('Year'); 
plt.ylabel(r'$\langle T \rangle$ ($^\circ$C)'); 
plt.ylim(2,15); 
plt.show()
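
The error bars above are the standard deviation of the mean, i.e. the sample standard deviation divided by the square root of the number of observations. As a sketch on invented values, the same quantity is also available through the grouped `sem()` method:

```python
import numpy as np
import pandas as pd

obs = pd.DataFrame({'year': [2000, 2000, 2000, 2001, 2001],
                    'noon': [1.0, 3.0, 5.0, 2.0, 4.0]})
grouped = obs.groupby('year')['noon']

manual_sem = grouped.std() / np.sqrt(grouped.count())  # as computed in the cell above
builtin_sem = grouped.sem()                            # standard error of the mean

print(manual_sem)
print(builtin_sem)
```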

To compute the monthly maximum of $T_{max}$, use either `GroupBy` and the `aggregate()` method, or use the convenient `pivot_table` method which groups entries into 2D tables.

In [None]:
temp.groupby(['year','month'])['tmax'].aggregate('max').unstack()[:3]

In [None]:
temp.pivot_table('tmax',index='year',columns='month',aggfunc='max')[:3]
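
The two approaches give the same table. The sketch below checks this on a small frame with invented values; unlike `temp`, the toy frame keeps `year` and `month` as ordinary columns, which both methods also accept.

```python
import pandas as pd

obs = pd.DataFrame({'year': [2000, 2000, 2000, 2001, 2001, 2001],
                    'month': [1, 1, 2, 1, 2, 2],
                    'tmax': [1.0, 4.0, 3.0, 2.0, 6.0, 5.0]})

# group by two keys, take the maximum, then move 'month' to the columns
via_groupby = obs.groupby(['year', 'month'])['tmax'].max().unstack()
# the same 2D table in a single call
via_pivot = obs.pivot_table('tmax', index='year', columns='month', aggfunc='max')

print(via_groupby)
print(via_pivot)
```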

### Case Study: Data on Waste Generation from [eurostat](http://ec.europa.eu/eurostat/data/database)
<img src="figs/eurostat.png" width="1000" />

In this case study, we are going to analyze a data set on waste generated by EU countries categorized by economic activity, waste category, and hazard.<br>
The data set contains more than one million entries.<br>
We are going to load, parse, save, explore, and plot the data that has been downloaded from the European Commission website as a tab-separated values file. 

In [None]:
!head -n 2 data/env_wasgen.tsv

The header mixes comma- and tab-separated column names, so we pass the regular expression `,|\t` as the separator, which requires the Python parsing engine

In [None]:
df = pd.read_table('data/env_wasgen.tsv',header=0,sep=',|\t',engine='python')

In [None]:
df.columns.values
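
To try the regular-expression separator without the downloaded file, the sketch below parses a hypothetical two-line excerpt held in memory. The excerpt only mimics the mixed comma/tab layout; its column names and values are invented.

```python
import io
import pandas as pd

# hypothetical excerpt: keys separated by commas, yearly values by tabs
raw = "unit,hazard,country\t2014\t2012\nKG_HAB,HAZ,SE\t251\t270 e\n"

toy = pd.read_table(io.StringIO(raw), header=0, sep=r',|\t', engine='python')
print(toy.columns.values)
print(toy)
```

The flagged entry `270 e` is read as a string, which is why a cleaning step is needed before numerical work.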

We can also rename the column labels

In [None]:
df = pd.read_table('data/env_wasgen.tsv',header=0,sep=',|\t',engine='python',
                  names=['unit','hazard','nace_r2','waste','country','2014','2012','2010','2008','2006','2004'])

In [None]:
df.columns.values

In [None]:
df.shape

NACE is the statistical classification of economic activities in the European Community<br>
* TOTAL_HH is all NACE activities plus households
* EP_HH is households
* A is agriculture, forestry and fishing 
* B is mining and quarrying
* C is manufacturing
* F is construction

Waste Categories

In [None]:
df.waste.unique()

* W06 is metal waste
* W07 is glass, paper, cardboard, plastic, rubber, wood, textile, and equipment (except waste containing PCB)
* W10 is mixed ordinary wastes which includes
    * W101 Household and similar wastes
    * W102 Mixed and undifferentiated materials
    * W103 Sorting residues
* W124 is combustion waste
* W128_13 is mineral wastes from waste treatment and stabilised wastes
* TOT_X_MIN is waste excluding major mineral wastes

We rename some categories accordingly.

In [None]:
df.loc[df.waste=='TOTAL','waste']='Total'
df.loc[df.waste=='W06_07A','waste']='Recyclable'
df.loc[df.waste=='W10','waste']='Mixed' 
df.loc[df.waste=='W12-13','waste']='Mineral'

In [None]:
df.loc[14:19]

In [None]:
type(df['2014'].values[0]) # values were read as string 

* The values are strings: to analyze the data (_e.g._ sorting, plotting) we need to convert them into numbers.<br>
* Some values are missing (:) or followed by a letter (_e.g._ _e_ stands for estimated, _u_ stands for low reliability).<br>
* Forcing the conversion of all strings into numerical values (`float`) would turn values followed by a letter into `NaN`s.<br>
* We therefore trim the trailing letters using `map()`, a `lambda` function, and `rstrip()`.<br>
* We then use `pd.to_numeric` to convert; with `errors='coerce'`, the remaining non-numeric entries (the missing values) become `NaN`.

In [None]:
for col in df.columns:
    if col[0] == '2': # check that the column name starts with 2, i.e. it's a year
        df[col] = df[col].map(lambda x: x.rstrip('bcdefinprsuz')) # remove trailing char from each value of the column
        df[col] = pd.to_numeric(df[col], errors='coerce')
print(df['2014'].values)
type(df['2014'].values[0])
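
The effect of trimming before converting can be seen on a handful of values. In this sketch the four strings are invented, and the flags are attached without a space for simplicity; the comparison shows that direct coercion loses the flagged numbers while trimming first preserves them.

```python
import pandas as pd

raw = pd.Series(['251b', '270', ':', '12u'])   # invented values with flags

direct = pd.to_numeric(raw, errors='coerce')               # flagged values become NaN
trimmed = raw.map(lambda x: x.rstrip('bcdefinprsuz'))      # remove trailing flag letters
numeric = pd.to_numeric(trimmed, errors='coerce')          # only ':' becomes NaN

print(direct)
print(numeric)
```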

#### Task: Create a `Series` from the dictionary { 'EE' : '14759.4713a', 'LU' : '195.677c', 'NL': '111.437b', 'SE': '596.8ba' } and use `map()` and `rstrip` as above to remove the characters 'abc'

In [None]:
series = pd.Series( { 'EE' : '14759.4713a', 'LU' : '195.677c', 'NL': '111.437b', 'SE': '596.8ba' } )
def strip_abc(item):
    return item.rstrip('abc')
series = series.map(strip_abc)
series

We defined the function `strip_abc` that takes a string and uses `rstrip` to remove any trailing a, b, or c character; `map()` applies it to each item of the `Series` named `series`.<br>
Instead, we can use a `lambda` function.

In [None]:
print( list( map(lambda item: item.rstrip('abc'),series) ) )
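
As an alternative sketch (not used elsewhere in this notebook), the vectorized string accessor `Series.str` offers `rstrip` directly and returns the same result as `map()` with a `lambda` function:

```python
import pandas as pd

series = pd.Series({'EE': '14759.4713a', 'LU': '195.677c',
                    'NL': '111.437b', 'SE': '596.8ba'})

via_map = series.map(lambda item: item.rstrip('abc'))  # element-wise with map()
via_str = series.str.rstrip('abc')                     # vectorized string method

print(via_str)
```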

#### Hazardous waste in Sweden
We can now select data based on waste category, hazard, source (economic activity), country, and unit (tonnes or kilograms per capita).

In [None]:
df.loc[(df.hazard == 'HAZ') & (df.unit == 'KG_HAB')
        & (df.nace_r2=='TOTAL_HH') & (df.country == 'SE')][:5]

### MultiIndex Creation

The first five columns of this multi-dimensional data set store the keys. Pandas provides *hierarchical indexing* (`MultiIndex`) to represent such data as a 2D `DataFrame`.

In [None]:
df_mi = df.set_index(['unit','hazard','nace_r2','waste','country'])

In [None]:
df_mi.index.names

In [None]:
df_mi.columns

Alternative MultiIndex Creation:

In [None]:
index = pd.MultiIndex.from_arrays([df.unit,df.hazard,df.nace_r2,df.waste,df.country],
                                  names=['unit','hazard','nace_r2','waste','country'])

In [None]:
df_mi = pd.DataFrame(df.iloc[:,5:11].values,index=index,columns=['2014','2012','2010','2008','2006','2004'])

In [None]:
df_mi[::1000][:4]
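
With a hierarchical index, `loc` accepts a partial key (the outer levels only) or a full key. The sketch below uses a toy frame with three index levels and invented values; the names `toy`, `partial`, and `value` are illustrative.

```python
import pandas as pd

index = pd.MultiIndex.from_arrays(
    [['KG_HAB', 'KG_HAB', 'KG_HAB', 'T'],
     ['HAZ', 'NHAZ', 'NHAZ', 'NHAZ'],
     ['SE', 'SE', 'FI', 'SE']],
    names=['unit', 'hazard', 'country'])
toy = pd.DataFrame({'2014': [10.0, 200.0, 150.0, 5.0]}, index=index).sort_index()

# partial key: the two outer levels are consumed, 'country' remains as the index
partial = toy.loc[('KG_HAB', 'NHAZ'), :]
# full key plus a column label returns a single value
value = toy.loc[('KG_HAB', 'NHAZ', 'SE'), '2014']

print(partial)
print(value)
```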

#### Task: what percentage of the total non-hazardous waste was generated by agriculture in your country in 2014?

In [None]:
tot_se = df_mi.loc['KG_HAB','NHAZ','TOTAL_HH','Total','SE']['2014']
agriculture_se = df_mi.loc['KG_HAB','NHAZ','A','Total','SE']['2014'] # NACE A is agriculture
print('Waste from agriculture in Sweden in 2014: {:1.1f}%'.format(agriculture_se/tot_se*100))

### Ranking countries by waste generated in 2014
Here we select three household waste categories (total, recyclable, and mixed) and rank the countries by waste generation in 2014, from largest to smallest.

In [None]:
df_tot = df_mi.loc['KG_HAB','NHAZ','EP_HH','Total'].sort_values(by='2014',ascending=False)
df_rec = df_mi.loc['KG_HAB','NHAZ','EP_HH','Recyclable'].sort_values(by='2014',ascending=False)
df_mix = df_mi.loc['KG_HAB','NHAZ','EP_HH','Mixed'].sort_values(by='2014',ascending=False)

`DataFrames` can be conveniently saved as Excel files.

In [None]:
# the context manager saves and closes the file when the block ends
with pd.ExcelWriter('data/waste.xlsx') as output:
    df_tot.to_excel(output,sheet_name='Sheet1',index=False)
    df_rec.to_excel(output,sheet_name='Sheet2',index=False)
    df_mix.to_excel(output,sheet_name='Sheet3',index=False)
df_tot[:5]

### Comparison between Sweden and EU average

#### Total waste from household and all NACE Rev. 2 activities 

In [None]:
df_mi.loc['KG_HAB','NHAZ','TOTAL_HH','Total'].loc[['EU28','SE'],:] # a list selects several labels

#### Task: compare the total waste from mining (NACE Rev.2 B) in 2014 in your country to the EU average

In [None]:
df_mi.loc['KG_HAB','NHAZ','B','Total'].loc[['EU28','SE'],:]['2014']

#### Cross-Section of a `DataFrame` [focusing on a subcategory (waste type) for all supercategories (activities)]: Which activities generated most of the recyclable wastes in Sweden in 2014

In [None]:
df_mi.loc['KG_HAB','HAZ_NHAZ'][['2014']].xs(('Recyclable','SE'),
                                     level=['waste','country']).sort_values(by='2014',ascending=False)[:3]
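
The cross-section method `xs()` fixes the values of inner index levels while keeping the outer ones. A minimal sketch on a toy frame with invented values and a reduced set of levels:

```python
import pandas as pd

index = pd.MultiIndex.from_tuples(
    [('A', 'Recyclable', 'SE'), ('C', 'Recyclable', 'SE'),
     ('C', 'Mixed', 'SE'), ('C', 'Recyclable', 'FI')],
    names=['nace_r2', 'waste', 'country'])
toy = pd.DataFrame({'2014': [3.0, 40.0, 7.0, 55.0]}, index=index)

# keep only recyclable waste in Sweden; 'nace_r2' remains as the index
recyclable_se = (toy.xs(('Recyclable', 'SE'), level=['waste', 'country'])
                    .sort_values(by='2014', ascending=False))
print(recyclable_se)
```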

#### Plotting the yearly progress in Swedish, Finnish, and British mining waste generation compared to the EU average

In [None]:
for country, c in zip(['SE','EU28','FI','UK'],['r','b','g','y']):
    years = df_mi.loc['KG_HAB','NHAZ','B','Total'].loc[country]
    plt.plot(years.index.astype(int),years.values,color=c)
    plt.plot(years.index.astype(int),years.values,label=country,color=c,marker='X',lw=0,ms=10)
plt.legend(frameon=False,ncol=2,labelspacing=1.5)
plt.yscale('log'); plt.ylabel('kilograms per capita'); plt.xlabel('year'); plt.show()

#### Task: Plot the yearly progress in Swedish household recyclable and mixed waste generation compared to the EU average

In [None]:
for country in ['SE','EU28']:
    for waste, c in zip(['Mixed','Recyclable'],['r','b']):
        years = df_mi.loc['KG_HAB','NHAZ','EP_HH',waste].loc[country]
        ls = '--' if (country == 'EU28') else '-'
        m = 'o' if (country == 'EU28') else 's'
        plt.plot(years.index.astype(int),years.values,color=c,ls=ls)
        plt.plot(years.index.astype(int),years.values,label=waste+', '+country,color=c,marker=m,lw=0,ms=10)
plt.legend(frameon=False,ncol=2,labelspacing=1.5)
plt.ylabel('kilograms per capita'); plt.xlabel('year'); plt.show()

Now we want to create a stacked bar plot showing the total, recyclable, and mixed household waste for each country, ranked in descending order with respect to the total household waste in 2014.<br>
The `DataFrames` `df_mix`, `df_tot`, and `df_rec` were sorted based on the respective 2014 values. Now we want to sort them based on the 2014 values of `df_tot`.

In [None]:
df_mix = df_mix.reindex(df_tot.index)
df_rec = df_rec.reindex(df_tot.index)
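
`reindex` reorders the rows of one object to follow the index of another. The sketch below illustrates this with two short `Series` of invented values that were each sorted by their own values:

```python
import pandas as pd

tot = pd.Series({'SE': 300.0, 'FI': 500.0, 'DK': 400.0}).sort_values(ascending=False)
mix = pd.Series({'SE': 120.0, 'FI': 90.0, 'DK': 200.0}).sort_values(ascending=False)

# mix is ordered DK, SE, FI; after reindexing it follows the order of tot
mix_aligned = mix.reindex(tot.index)

print(list(tot.index))
print(mix_aligned)
```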

In [None]:
ind = range(df_mix['2014'].values.size) # the x locations for the groups
width = 0.35 # the width of the bars: can also be len(x) sequence
plt.bar(ind, df_tot['2014'].values, width, label='Total')
plt.bar(ind, df_rec['2014'].values, width, label='Recyclable')
plt.bar(ind, df_mix['2014'].values, width, bottom=df_rec['2014'].values, label='Mixed ordinary')
plt.ylabel('kilograms per capita'); plt.title('Household wastes in 2014')
plt.xticks(ind, df_tot.index, rotation=45); plt.xlim(-.5,18.5)
plt.legend(loc='upper right',frameon=False); plt.show()

#### Task: create a bar plot showing the mineral waste generated by each country from mining (NACE Rev.2 B) and construction (NACE Rev.2 F) ranked in descending order with respect to the sum mining+construction (select data from 2014)

In [None]:
df_mining = df_mi.loc['KG_HAB','HAZ_NHAZ','B','Mineral']
df_constr = df_mi.loc['KG_HAB','HAZ_NHAZ','F','Mineral']
df_sum = df_mining + df_constr
df_sum = df_sum.sort_values(by='2014',ascending=False)
df_mining = df_mining.reindex(df_sum.index)
df_constr = df_constr.reindex(df_sum.index)
ind = range(df_constr['2014'].values.size)
plt.bar(ind, df_mining['2014'].values, width, label='Mining and quarrying')
plt.bar(ind, df_constr['2014'].values, width, 
             bottom=df_mining['2014'].values, label='Construction and deconstruction')
plt.ylabel('kilograms per capita'); plt.title('Mineral and solidified wastes in 2014')
plt.xticks(ind, df_mining.index, rotation=45); plt.xlim(-.5,18.5)
plt.legend(loc='upper right',frameon=False); 

In [None]:
plt.show()

### Supplemental Case Study: Data on Waste Treatment from [eurostat](http://ec.europa.eu/eurostat/data/database)

In [None]:
!head -n 1 data/env_wastrt.tsv

In [None]:
tr = pd.read_table('data/env_wastrt.tsv',header=0,sep=',|\t',engine='python',
                   names=['unit','hazard','wst_oper','waste','country','2014','2012','2010','2008','2006','2004'])

Here we rename some of the waste treatment operations

In [None]:
tr.loc[tr.wst_oper=='TRT','wst_oper']='Total'
tr.loc[tr.wst_oper=='DSP_L','wst_oper']='Landfill'
tr.loc[tr.wst_oper=='INC','wst_oper']='Incineration/Disposal' 
tr.loc[tr.wst_oper=='RCV_E','wst_oper']='Incineration/Energy recovery'
tr.loc[tr.wst_oper=='RCV_B','wst_oper']='Backfilling'
tr.loc[tr.wst_oper=='RCV_O','wst_oper']='Recovery'

In [None]:
tr = tr.set_index(['unit','hazard','wst_oper','waste','country'])

In [None]:
for col in tr.columns:
    tr[col] = tr[col].map(lambda x: x.rstrip('bcdefinprsuz'))
    tr[col] = pd.to_numeric(tr[col], errors='coerce')

#### Plastic wastes treatment in the EU in 2014

In [None]:
tot = tr.loc['T','HAZ_NHAZ','Total','W074'][['2014']]
inc = tr.loc['T','HAZ_NHAZ','Incineration/Energy recovery','W074'][['2014']]
sorter = (inc/tot).sort_values(by='2014',ascending=False)
tot = tot.reindex(sorter.index)
ind = range(sorter.index.size)
width = 0.35
bottom = np.zeros(sorter.index.shape,dtype=float)

In [None]:
for operation in ['Recovery','Backfilling','Landfill','Incineration/Disposal','Incineration/Energy recovery']:
    op = tr.loc['T','HAZ_NHAZ',operation,'W074'][['2014']]
    op = op.reindex(sorter.index)
    ratio_op = op / tot
    plt.bar(ind[:21], ratio_op['2014'].values[:21], width, bottom=bottom[:21], label=operation)
    bottom = np.add(ratio_op['2014'].values,bottom)
plt.ylabel('Fraction of total plastic wastes'); plt.title('Treatment of plastic wastes in 2014')
plt.xticks(ind, sorter.index, rotation=45); plt.xlim(1.5,20.5); plt.ylim(0,1.6)
plt.legend(loc='upper right',frameon=False,ncol=2); plt.show()

### Supplemental Case Study: Data Scraping – Temperature in Oxford

We often need to gather data directly from a website.<br>
Here we show how to download a page with `requests` and parse the response with `lxml`.

In [None]:
from lxml import html 
import requests
from time import sleep
import io
url = 'https://www.metoffice.gov.uk/pub/data/weather/uk/climate/stationdata/oxforddata.txt'
response = requests.get(url)
sleep(3)
parser = html.fromstring(response.text)

In [None]:
temp_oxford = pd.read_table(io.StringIO(parser.text),header=None,engine='python',delim_whitespace=True,
                     skiprows=range(7),names=['yyyy','mm','tmax','tmin','af','rain','sun','comment'])

We set the year and month labels as indices and remove the last four columns from the `DataFrame`

In [None]:
temp_oxford.set_index(['yyyy','mm'],inplace=True)
temp_oxford.drop(['af','rain','sun','comment'],axis=1,inplace=True)
temp_oxford[-8:-5]

Since some values are marked with a * to indicate that they are estimates, temperatures were read as strings.<br>
Now we convert them to numbers.

In [None]:
for col in temp_oxford.columns:
    temp_oxford[col] = temp_oxford[col].map(lambda x: x.rstrip('*'))
    temp_oxford[col] = pd.to_numeric(temp_oxford[col], errors='coerce')

### Supplemental Case Study: Date Formatting – Temperature in L'Aquila

Some useful keywords in `pd.read_table()`:
* `na_values`: sets the values to be replaced by `NaN`
* `parse_dates`: specifies the columns to be interpreted as dates instead of as strings
* `date_parser`: specifies a function (here a `lambda` function) that parses dates according to our format

In [None]:
from datetime import datetime
dateparse = lambda x: datetime.strptime(x, '%d/%m/%Y')
yearly_frames = [] # one DataFrame per year, concatenated after the loop
for year in range(1968,2017):
    url = 'http://meteorema.aquila.infn.it/tempaq/dati/hist/'+str(year)+'.txt'
    temp_year_aquila = pd.read_table(url,header=None,engine='python',delim_whitespace=True,
                     comment='#',names=['Date','day','tmin','tmax'],na_values=-999.99,
                     parse_dates=['Date'], date_parser=dateparse)
    yearly_frames.append(temp_year_aquila)
temp_aquila = pd.concat(yearly_frames,ignore_index=True)
temp_aquila.drop('day',axis=1,inplace=True)
temp_aquila['year'] = temp_aquila['Date'].dt.year
temp_aquila['month'] = temp_aquila['Date'].dt.month
temp_aquila['day'] = temp_aquila['Date'].dt.day
temp_aquila.drop('Date',axis=1,inplace=True)
temp_aquila.set_index(['year','month','day'],inplace=True)
temp_aquila = temp_aquila.dropna() # dropna() returns a new DataFrame, so we reassign it
temp_aquila[-3:]
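
The same steps can be tried offline. This sketch parses a hypothetical three-line excerpt held in memory (the layout and numbers are invented to resemble the files above), marks `-999.99` as missing, and converts the dates afterwards with `pd.to_datetime` and an explicit format, which is an alternative to passing `date_parser`.

```python
import io
import pandas as pd

# hypothetical excerpt: a comment line followed by two whitespace-separated records
raw = "# date day tmin tmax\n01/02/1968 32 -3.1 4.0\n02/02/1968 33 -999.99 5.5\n"

toy = pd.read_table(io.StringIO(raw), header=None, sep=r'\s+', comment='#',
                    names=['Date', 'day', 'tmin', 'tmax'], na_values=[-999.99])
toy['Date'] = pd.to_datetime(toy['Date'], format='%d/%m/%Y')  # day/month/year

# the .dt accessor exposes the date components
toy['year'] = toy['Date'].dt.year
toy['month'] = toy['Date'].dt.month
print(toy)
```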

In [None]:
tmax_year_mean_ox = temp_oxford.groupby('yyyy')['tmax'].mean()
mean_lowess_OXFORD = lowess(tmax_year_mean_ox.values,tmax_year_mean_ox.index,frac=0.5)
tmax_year_mean_aq = temp_aquila.groupby('year')['tmax'].mean()
mean_lowess_AQUILA = lowess(tmax_year_mean_aq.values,tmax_year_mean_aq.index,frac=0.5)
plt.plot(mean_lowess_AQUILA[:,0],mean_lowess_AQUILA[:,1],label="Mean $T_{max}$ in L'Aquila",lw=4,color='b')
plt.plot(mean_lowess_OXFORD[:,0],mean_lowess_OXFORD[:,1],label='Mean $T_{max}$ in Oxford',lw=4,color='r')
plt.plot(mean_lowess_STHLM[:,0],mean_lowess_STHLM[:,1],label='Mean $T_{max}$ in Stockholm',lw=4,color='g')
plt.legend(frameon=False)
plt.xlabel('Year'); plt.ylabel(r'Mean of $T_{max}$ ($^\circ$C)'); plt.show()