# Pandas Workshop at DataHarvest 2016

Start by importing pandas. By convention it is imported as `pd`.

In [1]:
import pandas as pd

Load a CSV file with `pd.read_csv`. pandas also has readers for other formats, such as `pd.read_excel` and `pd.read_json`.

In [2]:
df = pd.read_csv('gb_201415.csv')
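
If you don't have the workshop file at hand, a small in-memory CSV can stand in for it. The sketch below assumes made-up sample rows (not taken from `gb_201415.csv`) and relies on `pd.read_csv` accepting any file-like object, here an `io.StringIO` buffer.

```python
import io

import pandas as pd

# Hypothetical sample data for illustration only, not from gb_201415.csv.
csv_text = """year,recipient_name,amount
2015,Example Farm A,"1,250.50"
2014,Example Farm B,"300.00"
"""

# read_csv accepts a file-like object as well as a file path.
sample = pd.read_csv(io.StringIO(csv_text))

print(sample.shape)
print(sample.dtypes)
```

Because one value contains a thousands separator, the `amount` column is read as text rather than as numbers, which mirrors the problem addressed further down.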

`df` is short for DataFrame, which is essentially a table with rows and columns.

`df.head()` shows the first 5 rows. Pass a number to show more, e.g. `df.head(20)`.

In [3]:
df.head()

Unnamed: 0,year,recipient_id,recipient_name,recipient_postcode,recipient_location,agency,scheme,amount,country,currency
0,2015,GB-SY23-WELSH ASSEMBLY GOVERNMENT,WELSH ASSEMBLY GOVERNMENT,SY23,ABERYSTWYTH,WG,Technical Assistance,7636181.83,GB,GBP
1,2015,GB-WR5-Natural England,Natural England,WR5,WORCESTER,RPA,Technical Assistance,7373572.0,GB,GBP
2,2014,GB-SY23-WELSH ASSEMBLY GOVERNMENT,WELSH ASSEMBLY GOVERNMENT,SY23,ABERYSTWYTH,WG,Technical Assistance,6128222.74,GB,GBP
3,2015,GB-CB7-G'S GROWERS LTD,G'S GROWERS LTD,CB7,ELY,RPA,Aid in fruit and vegetables sector,5807429.64,GB,GBP
4,2014,GB-PL30-National Trust,National Trust,PL30,Bodmin,RPA,Agri-environment payments,4646028.86,GB,GBP


Let's look at the data types of the columns.

In [4]:
df.dtypes

year                   int64
recipient_id          object
recipient_name        object
recipient_postcode    object
recipient_location    object
agency                object
scheme                object
amount                object
country               object
currency              object
dtype: object

The `amount` column has dtype `object` instead of a numeric type, apparently because of thousands separators. Let's clean it!

In [5]:
df['amount_clean'] = df['amount'].str.replace(',', '')
df.head()

Unnamed: 0,year,recipient_id,recipient_name,recipient_postcode,recipient_location,agency,scheme,amount,country,currency,amount_clean
0,2015,GB-SY23-WELSH ASSEMBLY GOVERNMENT,WELSH ASSEMBLY GOVERNMENT,SY23,ABERYSTWYTH,WG,Technical Assistance,7636181.83,GB,GBP,7636181.83
1,2015,GB-WR5-Natural England,Natural England,WR5,WORCESTER,RPA,Technical Assistance,7373572.0,GB,GBP,7373572.0
2,2014,GB-SY23-WELSH ASSEMBLY GOVERNMENT,WELSH ASSEMBLY GOVERNMENT,SY23,ABERYSTWYTH,WG,Technical Assistance,6128222.74,GB,GBP,6128222.74
3,2015,GB-CB7-G'S GROWERS LTD,G'S GROWERS LTD,CB7,ELY,RPA,Aid in fruit and vegetables sector,5807429.64,GB,GBP,5807429.64
4,2014,GB-PL30-National Trust,National Trust,PL30,Bodmin,RPA,Agri-environment payments,4646028.86,GB,GBP,4646028.86


In [6]:
df['amount_clean'] = pd.to_numeric(df['amount_clean'])
df.dtypes

year                    int64
recipient_id           object
recipient_name         object
recipient_postcode     object
recipient_location     object
agency                 object
scheme                 object
amount                 object
country                object
currency               object
amount_clean          float64
dtype: object
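
The two cleaning steps above can also be tried on a tiny example. This is a minimal sketch with illustrative sample rows. It also shows an alternative that may save the manual cleanup: the `thousands` parameter of `pd.read_csv`, which strips the separators while parsing.

```python
import io

import pandas as pd

# Illustrative sample rows; the amounts are written with thousands separators.
csv_text = """year,amount
2015,"7,636,181.83"
2014,"4,646,028.86"
"""

# Option 1: clean after loading, as in the cells above.
df_a = pd.read_csv(io.StringIO(csv_text))
df_a['amount'] = pd.to_numeric(df_a['amount'].str.replace(',', ''))

# Option 2: tell read_csv which character separates thousands.
df_b = pd.read_csv(io.StringIO(csv_text), thousands=',')

print(df_a.dtypes)
print(df_b.dtypes)
```

Both routes should end with a `float64` column. The first keeps the raw text around until you convert it, which helps when you want to inspect the bad values first.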

In [7]:
df = df.drop('amount', axis=1)
df.head()

Unnamed: 0,year,recipient_id,recipient_name,recipient_postcode,recipient_location,agency,scheme,country,currency,amount_clean
0,2015,GB-SY23-WELSH ASSEMBLY GOVERNMENT,WELSH ASSEMBLY GOVERNMENT,SY23,ABERYSTWYTH,WG,Technical Assistance,GB,GBP,7636181.83
1,2015,GB-WR5-Natural England,Natural England,WR5,WORCESTER,RPA,Technical Assistance,GB,GBP,7373572.0
2,2014,GB-SY23-WELSH ASSEMBLY GOVERNMENT,WELSH ASSEMBLY GOVERNMENT,SY23,ABERYSTWYTH,WG,Technical Assistance,GB,GBP,6128222.74
3,2015,GB-CB7-G'S GROWERS LTD,G'S GROWERS LTD,CB7,ELY,RPA,Aid in fruit and vegetables sector,GB,GBP,5807429.64
4,2014,GB-PL30-National Trust,National Trust,PL30,Bodmin,RPA,Agri-environment payments,GB,GBP,4646028.86


In [10]:
df = df.rename(columns={'amount_clean': 'amount'})
df.head()

Unnamed: 0,year,recipient_id,recipient_name,recipient_postcode,recipient_location,agency,scheme,country,currency,amount
0,2015,GB-SY23-WELSH ASSEMBLY GOVERNMENT,WELSH ASSEMBLY GOVERNMENT,SY23,ABERYSTWYTH,WG,Technical Assistance,GB,GBP,7636181.83
1,2015,GB-WR5-Natural England,Natural England,WR5,WORCESTER,RPA,Technical Assistance,GB,GBP,7373572.0
2,2014,GB-SY23-WELSH ASSEMBLY GOVERNMENT,WELSH ASSEMBLY GOVERNMENT,SY23,ABERYSTWYTH,WG,Technical Assistance,GB,GBP,6128222.74
3,2015,GB-CB7-G'S GROWERS LTD,G'S GROWERS LTD,CB7,ELY,RPA,Aid in fruit and vegetables sector,GB,GBP,5807429.64
4,2014,GB-PL30-National Trust,National Trust,PL30,Bodmin,RPA,Agri-environment payments,GB,GBP,4646028.86
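
The clean, drop and rename steps can be compared with a shortcut that overwrites the column directly. This is a sketch on a hypothetical two-row frame, not the workshop data. Keeping a separate `amount_clean` column is still useful while you are checking the result.

```python
import pandas as pd

# Hypothetical frame standing in for the workshop data.
raw = pd.DataFrame({'year': [2015, 2014], 'amount': ['1,000.50', '250.00']})

# Step by step, as in the cells above.
steps = raw.copy()
steps['amount_clean'] = pd.to_numeric(steps['amount'].str.replace(',', ''))
steps = steps.drop('amount', axis=1)
steps = steps.rename(columns={'amount_clean': 'amount'})

# Shortcut: overwrite the original column in one assignment.
short = raw.copy()
short['amount'] = pd.to_numeric(short['amount'].str.replace(',', ''))

print(steps)
print(short)
```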


In [11]:
df['amount'].sum()

508795166.64000005
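
The trailing digits in the sum come from binary floating-point arithmetic, not from the data. A small sketch with made-up values shows the same effect and how rounding to two decimals gives a readable amount.

```python
import pandas as pd

# Made-up values chosen because they are not exactly representable in binary.
amounts = pd.Series([0.1, 0.2])

total = amounts.sum()
print(total)            # may show extra trailing digits
print(round(total, 2))  # rounded to two decimals for display
```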

In [12]:
df['scheme'].value_counts()

Single area payment scheme                                                              578
Implementing local development strategies. Quality of life/diversification               65
Agri-environment payments                                                                61
Adding value to agricultural and forestry products                                       49
Aid in fruit and vegetables sector                                                       36
Investment in forest area development and improvement of the viability of forests        28
Investments in physical assets                                                           21
Non-productive investments                                                               18
Vocational training and information actions                                              16
First afforestation of agricultural land                                                 16
Agri-environment-climate                                                        