# Capstone Project Part 2: Open Challenge

For part 2 of your Capstone Project assignment, I want you to submit your own Jupyter Notebook written from scratch. I also want you to select your own data source **and** *your own questions* to ask about the data you have selected.

This sounds difficult -- and it is. But the point here is to give you the experience in exploring data yourself and understanding that a big part of data science is in asking questions and exploring on your own. Who knows, you might find something interesting and valuable enough that this time next year you could be CEO of your own multimillion pound start-up!

Think back to exercise 8 (London 2012 Olympics data) and the kinds of questions I set for you in that challenge. This time however, I want you to demonstrate as much of what you have learned in this course as possible. In particular, I want you to create a Jupyter Notebook that demonstrates the following:
 - Gathering data from a data source. You could do this programmatically (e.g. with a Python library querying an API such as `tweepy`), or just downloaded from somewhere. If the latter, please add some text describing where you got the data from and why you thought it might be interesting.
 - Data formatting and cleaning. If your data is semi-structured and not already in a CSV, it would be great to see how you mapped it across using some string formatting. Also examples of data cleaning -- removing spurious values or dealing with missing values.
 - Using `DataFrame`s - intermediate ones, processed ones, etc. By now you should know that the `DataFrame` is an essential tool!
 - Visualizations. We already know we can visualize directly from `DataFrame`s but it would also be great to see if you could utilize `bokeh` to create other charts.
 - Classification using `scikit-learn` or Natural Language Processing using `nltk`. After the Machine Learning lecture, if you want to try out some classification or NLP, that would be great to see.


Let's get started by loading some libraries.

In [1]:
import pandas as pd
import numpy as np
from bokeh.io import show, output_notebook
from bokeh.plotting import figure
from bokeh.palettes import brewer
from bokeh.models import ColumnDataSource, value, Range1d

output_notebook()

## Loading the datasets

Here is an initial list of questions:
- What were the main crops between 1990 and 2005?
    - Which crops are growing fastest?
    - What is the prediction for 2030 and 2050? (interactive?)
- GHG emissions:
    - By area: which continent produces more?
    - Global:
        - Per use (crops & livestock, deforestation, degraded peatland, fire): find net GHG emissions due to land use change and deforestation
- Are there other sources of GHG?
- Which livestock produces more GHG?




In [17]:
crops = pd.read_csv('kaggle_global-food-agriculture-statistics/fao_data_crops_data.csv')
emissions = pd.read_csv('kaggle_global-food-agriculture-statistics/Environment_Emissions_intensities_E_All_Data.csv')

In [113]:
crops.sample(10)
#crops[(crops.country_or_area == 'Afghanistan') & (crops.element == 'Production Quantity')]

Unnamed: 0,country_or_area,element_code,element,year,unit,value,value_footnotes,category
2502,Afghanistan,51,Production Quantity,2002.0,tonnes,15000.0,F,almonds_with_shell
2503,Afghanistan,51,Production Quantity,2001.0,tonnes,15000.0,F,almonds_with_shell
2504,Afghanistan,51,Production Quantity,2000.0,tonnes,12000.0,F,almonds_with_shell
2505,Afghanistan,51,Production Quantity,1999.0,tonnes,11000.0,F,almonds_with_shell
2514,Afghanistan,51,Production Quantity,1990.0,tonnes,9500.0,F,almonds_with_shell
2515,Afghanistan,51,Production Quantity,1989.0,tonnes,8800.0,F,almonds_with_shell
2516,Afghanistan,51,Production Quantity,1988.0,tonnes,9000.0,F,almonds_with_shell
2517,Afghanistan,51,Production Quantity,1987.0,tonnes,9000.0,F,almonds_with_shell
2518,Afghanistan,51,Production Quantity,1986.0,tonnes,10000.0,F,almonds_with_shell
2519,Afghanistan,51,Production Quantity,1985.0,tonnes,9000.0,F,almonds_with_shell


In [201]:
# Need to get rid of the footnote rows (rows with missing values) for each crop
crops = crops.dropna()
crops.dtypes
crops.year = crops.year.astype(int)
crops.country_or_area = crops.country_or_area.astype(str)
crops.head()

Unnamed: 0,country_or_area,element_code,element,year,unit,value,value_footnotes,category
0,Americas +,31,Area Harvested,2007,Ha,49404.0,A,agave_fibres_nes
1,Americas +,31,Area Harvested,2006,Ha,49404.0,A,agave_fibres_nes
2,Americas +,31,Area Harvested,2005,Ha,49404.0,A,agave_fibres_nes
3,Americas +,31,Area Harvested,2004,Ha,49113.0,A,agave_fibres_nes
4,Americas +,31,Area Harvested,2003,Ha,48559.0,A,agave_fibres_nes


In [202]:
crops.groupby(['country_or_area', 'category', 'element', 'unit']).sum()
crops.element.unique()
crops[crops.element == 'Production Quantity'].category.unique()

array(['agave_fibres_nes', 'almonds_with_shell',
       'anise_badian_fennel_corian', 'apples', 'apricots', 'arecanuts',
       'artichokes', 'asparagus', 'avocados', 'bambara_beans', 'bananas',
       'barley', 'beans_dry', 'beans_green', 'berries_nes', 'blueberries',
       'brazil_nuts_with_shell', 'broad_beans_horse_beans_dry',
       'buckwheat', 'cabbages_and_other_brassicas', 'canary_seed',
       'carobs', 'carrots_and_turnips', 'cashew_nuts_with_shell',
       'cashewapple', 'cassava', 'castor_oil_seed',
       'cauliflowers_and_broccoli', 'cereals_nes',
       'cereals_rice_milled_eqv', 'cereals_total', 'cherries',
       'chestnuts', 'chick_peas', 'chicory_roots',
       'chillies_and_peppers_dry', 'chillies_and_peppers_green',
       'cinnamon_canella', 'citrus_fruit_nes', 'citrus_fruit_total',
       'cloves', 'coarse_grain_total', 'cocoa_beans', 'coconuts',
       'coffee_green', 'coir', 'cow_peas_dry', 'cranberries',
       'cucumbers_and_gherkins', 'currants', 'dates',


In [203]:
col_list = crops.country_or_area.unique()
col_list
areas = ['Americas +', 
         'Asia +', 
         'Africa +',
         'Caribbean +', 
         'Central America +',
         'Low Income Food Deficit Countries +',
         'Net Food Importing Developing Countries +',
         'Small Island Developing States +',
         'South America +',
         'South-Eastern Asia +',
         'World +',
         'Australia and New Zealand +',
         'Oceania +',
         'Central Asia +',
         'Eastern Asia +',
         'Eastern Europe +',
         'Europe +',
         'European Union +',
         'Least Developed Countries +',
         'LandLocked developing countries +',
         'Northern Africa +', 
         'Northern America +',
         'Southern Africa +',
         'Southern Asia +',
         'Southern Europe +',
         'Western Africa +',
         'Western Asia +',
         'Western Europe +',
         'Eastern Africa +',
         'Northern Europe +',
         'Middle Africa +',
         'Micronesia +',
         'Polynesia +', 
         'Melanesia +'
        ]

In [204]:
crops_regions = crops[crops.country_or_area.isin(areas)]
crops_regions = crops_regions.rename(columns={'country_or_area' : 'area'})

In [205]:
def fix_country_name(country):
    """Will remove the '+' from each region label"""
    return (country.strip(' +'))
crops_regions.area = crops_regions.area.apply(fix_country_name)
#crops_regions.year = crops_regions.year.astype(int)
#crops_regions.area = crops_regions.area.astype(str)
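
The same clean-up can be done without `apply`, using the vectorized string methods of pandas. This is only a sketch on a toy `Series`; the labels are illustrative, and `rstrip` is used instead of `strip` so that only the trailing ` +` is removed.

```python
import pandas as pd

# Toy labels in the same style as the aggregate regions in the crops table.
areas = pd.Series(['Americas +', 'World +', 'South-Eastern Asia +', 'France'])

# rstrip only touches the right-hand end, so the start of each label is preserved.
cleaned = areas.str.rstrip(' +')
print(cleaned.tolist())
```

Labels without a trailing ` +` (such as plain country names) pass through unchanged.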

In [322]:
crops_world = crops_regions[(crops_regions['area'] == 'World') & (crops_regions['element'] == 'Production Quantity')]
crops_world_y = crops_world.groupby(['year']).sum()
crops_world_y.value = crops_world_y.value / 1e6

Unnamed: 0_level_0,value
year,Unnamed: 1_level_1
1961,5667.38353
1962,5864.358561
1963,5975.716968
1964,6283.512291
1965,6360.521134
1966,6719.650084
1967,6980.871816
1968,7144.079723
1969,7188.637229
1970,7426.938215


In [207]:
p0 = figure(plot_height=400, title="Total world crop production since 1960", tools = 'hover')
p0.line(crops_world_y.index, crops_world_y.value, line_width=0.9, legend = "Total crop production")

p0.xaxis.axis_label = 'Year'
p0.yaxis.axis_label = 'Total production (Mtonnes)'
p0.legend.location = "top_left"
show(p0)

In [351]:
from sklearn import linear_model
crops_world_pred = linear_model.LinearRegression()
crops_world_pred.fit([[x] for x in crops_world_y.index], crops_world_y.value)
m = crops_world_pred.coef_[0]
b = crops_world_pred.intercept_
print("slope=", m, "intercept=", b)

slope= 201.93949723462066 intercept= -390407.87277897674


In [361]:
future = np.arange(2025, 2051, 25)
years = np.concatenate([crops_world_y.index.values,future])
years

array([1961, 1962, 1963, 1964, 1965, 1966, 1967, 1968, 1969, 1970, 1971,
       1972, 1973, 1974, 1975, 1976, 1977, 1978, 1979, 1980, 1981, 1982,
       1983, 1984, 1985, 1986, 1987, 1988, 1989, 1990, 1991, 1992, 1993,
       1994, 1995, 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004,
       2005, 2006, 2007, 2025, 2050])

In [210]:
# Let's add the regression to the previous plot
crops_world_pred_val = [crops_world_pred.coef_ * i + crops_world_pred.intercept_ for i in years]

p0.line(years, crops_world_pred_val, line_width=1, color='gray', line_dash=[4, 4], legend = "Predicted values")

p0.y_range = Range1d(min(crops_world_pred_val).item(0), max(crops_world_pred_val).item(0))
show(p0)

In [211]:
max(crops_world_pred_val).item(0)

23568.096551995608
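
As a sanity check, the extrapolated value can be reproduced by hand from the slope and intercept printed by the regression cell. The helper name `predict_production` is my own and not part of the notebook; it simply evaluates the fitted straight line.

```python
# Slope and intercept copied from the printed regression output
# (production in millions of tonnes, year as the only feature).
slope = 201.93949723462066
intercept = -390407.87277897674


def predict_production(year):
    """Evaluate the fitted line y = slope * year + intercept."""
    return slope * year + intercept


for year in (2025, 2050):
    print(year, round(predict_production(year), 1))
```

The value for 2050 should agree with the maximum of the predicted values shown above, since the fitted line is increasing and 2050 is the last year in `years`.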

In [395]:
crops_countries = crops[~crops.country_or_area.isin(areas)].drop(columns = ['element_code', 'value_footnotes'])
crops_countries = crops_countries.rename(columns = {'country_or_area' : 'country'})

prod_countries = crops_countries[crops_countries.element == 'Production Quantity'].sort_values(by = ['country']).reset_index(drop = True)
prod_countries = prod_countries.groupby(['country', 'year']).sum()
prod_countries = prod_countries.reset_index()
prod_countries

Unnamed: 0,country,year,value
0,Afghanistan,1961,9971250.0
1,Afghanistan,1962,9827604.0
2,Afghanistan,1963,9408353.0
3,Afghanistan,1964,10118091.0
4,Afghanistan,1965,10810225.0
5,Afghanistan,1966,9832047.0
6,Afghanistan,1967,10597257.0
7,Afghanistan,1968,11000752.0
8,Afghanistan,1969,11306749.0
9,Afghanistan,1970,10048512.0


In [308]:
top_countries = prod_countries[prod_countries.year == 2005].sort_values(by='value', ascending = False).country.unique()[:11]
top_countries = np.delete(top_countries, 1, 0) # to remove china, mainland but keep china
top_countries = sorted(top_countries)

top_prod = prod_countries[prod_countries.country.isin(top_countries)]
top_prod = top_prod.pivot(index = 'year', columns = 'country')
top_prod

Unnamed: 0_level_0,value,value,value,value,value,value,value,value,value,value
country,Argentina,Brazil,Canada,China,France,India,Indonesia,Nigeria,Russian Federation,United States of America
year,Unnamed: 1_level_2,Unnamed: 2_level_2,Unnamed: 3_level_2,Unnamed: 4_level_2,Unnamed: 5_level_2,Unnamed: 6_level_2,Unnamed: 7_level_2,Unnamed: 8_level_2,Unnamed: 9_level_2,Unnamed: 10_level_2
1961,50352174.0,74673331.0,47051484.0,613620400.0,86414729.0,263439946.0,60080419.0,60237429.0,,538835200.0
1962,47702805.0,80286927.0,77148610.0,613558300.0,101451455.0,269342378.0,65968608.0,61795717.0,,540432300.0
1963,54784730.0,85774996.0,87567396.0,626838400.0,102818449.0,282221092.0,60918774.0,64358973.0,,572284800.0
1964,62799697.0,87956708.0,74306952.0,644886800.0,98283170.0,289262715.0,68201389.0,65338958.0,,524659400.0
1965,61917675.0,100285775.0,84620760.0,692673400.0,107796101.0,267093746.0,64663482.0,69214145.0,,601846300.0
1966,58006052.0,95968199.0,99374127.0,799371100.0,101273574.0,271761671.0,68499510.0,56879771.0,,606270100.0
1967,64602921.0,104927819.0,79261625.0,813499200.0,116248576.0,303376903.0,57954487.0,59906235.0,,676037900.0
1968,55204484.0,108138933.0,91337297.0,799172200.0,118462816.0,320526209.0,64376901.0,60804439.0,,658919500.0
1969,61220817.0,110445337.0,95516861.0,817401500.0,115231664.0,333001951.0,63818829.0,68248189.0,,676702400.0
1970,68652965.0,118712817.0,84445327.0,832929600.0,115353726.0,354723849.0,67331809.0,73305139.0,,623756100.0
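
Removing the overlapping China entry by position (`np.delete(..., 1, 0)`) only works while that entry happens to sit at index 1. A possibly more robust variant, sketched here on a toy table with made-up values, drops the entry by name and then picks the largest producers with `nlargest`. The label `'China, mainland'` is assumed from the comment in the cell above.

```python
import pandas as pd

# Toy production table; names and values are illustrative only.
prod = pd.DataFrame({
    'country': ['China', 'China, mainland', 'India', 'Brazil', 'France'],
    'year': [2005] * 5,
    'value': [900.0, 880.0, 500.0, 300.0, 100.0],
})

# Drop the overlapping entry by name, then keep the n largest producers.
duplicated_entries = ['China, mainland']
top = (prod[(prod.year == 2005) & ~prod.country.isin(duplicated_entries)]
       .nlargest(3, 'value'))
top_countries = sorted(top.country)
print(top_countries)
```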


In [312]:
numlines = len(top_prod.columns)
colors=brewer['Paired'][11]
xs = [top_prod.index.values]*numlines
ys = [top_prod[name].values for name in top_prod]

p = figure(plot_height=500, plot_width=800, title="Crop production since 1960 for the 10 largest producers", tools = 'hover')

for (colr, leg, x, y ) in zip(colors, top_countries, xs, ys):
    p.line(x, y, color = colr, legend = leg, line_width = 2)
    
p.xaxis.axis_label = 'Year'
p.yaxis.axis_label = 'Production (tonnes)'
p.legend.location = "top_left"

show(p)

In [307]:
for (colr, leg, x, y ) in zip(colors, top_countries, xs, [top_prod[name] for name in top_prod]):
    print(leg, y)

China year
1961     50352174.0
1962     47702805.0
1963     54784730.0
1964     62799697.0
1965     61917675.0
1966     58006052.0
1967     64602921.0
1968     55204484.0
1969     61220817.0
1970     68652965.0
1971     75167913.0
1972     61387954.0
1973     78345681.0
1974     80778703.0
1975     75164707.0
1976     78463161.0
1977     78313653.0
1978     91924524.0
1979     86127851.0
1980     67104671.0
1981    100237102.0
1982    108028782.0
1983    102511684.0
1984    104322962.0
1985     99225087.0
1986     94276527.0
1987     84451989.0
1988     87723864.0
1989     82115920.0
1990     93586148.0
1991    103341334.0
1992    112904130.0
1993    107838046.0
1994    111163427.0
1995    114840030.0
1996    126655765.0
1997    145173974.0
1998    161788382.0
1999    131899167.0
2000    160246645.0
2001    158526413.0
2002    153176683.0
2003    166764860.0
2004    165498865.0
2005    187747249.0
2006    173948047.0
2007    215295127.0
Name: (value, Argentina), dtype: float64
United S

In [98]:
#a = d_regions[(d_regions['element'] == 'Production Quantity') & (d_regions.year == '2010')].groupby('area').sum()
crops_area2005 = crops_regions[(crops_regions['year'] == 2005) & (crops_regions['element'] == 'Production Quantity')]
crops_harvest_2005 = crops_area2005.groupby('area')['value'].sum()

['China',
 'United States of America',
 'India',
 'Brazil',
 'Indonesia',
 'Russian Federation',
 'Nigeria',
 'France',
 'Argentina',
 'Canada']

In [87]:
c_list = crops_harvest_2005.index.tolist()
p1 = figure(x_range=c_list, plot_height=500, title="Production in 2005 per region of the World")
p1.vbar(x=c_list, top=crops_harvest_2005.values, width=0.9,)

# Set some properties to make the plot look better
p1.xgrid.grid_line_color = None
p1.y_range.start = 0
p1.xaxis.major_label_orientation = 1

show(p1)

## Emissions table

In [34]:
### Let's work on emissions and create a table with the world emissions
emissions.head(10)

Unnamed: 0,Area Code,Area,Item Code,Item,Element Code,Element,Unit,Y1961,Y1961F,Y1962,...,Y2012,Y2012F,Y2013,Y2013F,Y2014,Y2014F,Y2015,Y2015F,Y2016,Y2016F
0,2,Afghanistan,1718,Cereals excluding rice,71761,Emissions intensity,kg CO2eq/kg product,0.1191,Fc,0.1209,...,0.1583,Fc,0.1274,Fc,0.1206,Fc,0.1231,Fc,0.127,Fc
1,2,Afghanistan,1718,Cereals excluding rice,7231,Emissions (CO2eq),gigagrams,402.2165,Fc,408.3269,...,930.8136,Fc,765.7368,Fc,749.0513,Fc,664.6333,Fc,657.6652,Fc
2,2,Afghanistan,1718,Cereals excluding rice,5510,Production,tonnes,3376000.0,A,3377000.0,...,5879000.0,A,6008235.0,A,6211113.0,A,5400710.0,A,5178655.0,A
3,2,Afghanistan,27,"Rice, paddy",71761,Emissions intensity,kg CO2eq/kg product,2.0864,Fc,2.0864,...,1.5049,Fc,1.3826,Fc,1.3897,Fc,1.3828,Fc,1.1923,Fc
4,2,Afghanistan,27,"Rice, paddy",7231,Emissions (CO2eq),gigagrams,665.5675,Fc,665.5675,...,752.4375,Fc,708.0397,Fc,746.2642,Fc,566.957,Fc,425.1177,Fc
5,2,Afghanistan,27,"Rice, paddy",5510,Production,tonnes,319000.0,,319000.0,...,500000.0,,512094.0,,537000.0,,410000.0,,356565.0,
6,2,Afghanistan,867,"Meat, cattle",71761,Emissions intensity,kg CO2eq/kg product,36.6727,Fc,39.1258,...,6.415,Fc,6.3922,Fc,7.6995,Fc,10.9185,Fc,11.4844,Fc
7,2,Afghanistan,867,"Meat, cattle",7231,Emissions (CO2eq),gigagrams,1576.926,Fc,1791.962,...,891.6801,Fc,856.5577,Fc,931.1033,Fc,1194.205,Fc,1186.901,Fc
8,2,Afghanistan,867,"Meat, cattle",5510,Production,tonnes,43000.0,F,45800.0,...,139000.0,,134000.0,,120931.0,Im,109375.0,Im,103349.0,Im
9,2,Afghanistan,882,"Milk, whole fresh cow",71761,Emissions intensity,kg CO2eq/kg product,3.349,Fc,3.349,...,4.4425,Fc,4.5232,Fc,4.7642,Fc,4.6513,Fc,4.6513,Fc


In [37]:
to_drop = []
for kk in emissions.keys():
    if kk[-1]=='F':
        to_drop.append(kk)
    
emissions = emissions.drop(to_drop,axis=1) # Remove the flag columns (Y1961F, Y1962F, ...)
emissions.columns = emissions.columns.str.replace('Y','') # Remove the Y prefix from the year columns

<class 'pandas.core.frame.DataFrame'>
Int64Index: 14 entries, 6138 to 6177
Data columns (total 63 columns):
Area Code       14 non-null int64
Area            14 non-null object
Item Code       14 non-null int64
Item            14 non-null object
Element Code    14 non-null int64
Element         14 non-null object
Unit            14 non-null object
1961            14 non-null float64
1962            14 non-null float64
1963            14 non-null float64
1964            14 non-null float64
1965            14 non-null float64
1966            14 non-null float64
1967            14 non-null float64
1968            14 non-null float64
1969            14 non-null float64
1970            14 non-null float64
1971            14 non-null float64
1972            14 non-null float64
1973            14 non-null float64
1974            14 non-null float64
1975            14 non-null float64
1976            14 non-null float64
1977            14 non-null float64
1978            14 non-null float64
19
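
The column clean-up can also be written so that it only touches the year columns. The sketch below uses a toy frame with made-up values that mimics the layout of the emissions table; the regular expression strips the `Y` only when it is followed by exactly four digits, so a column that merely contains a capital Y would be left alone.

```python
import pandas as pd

# Toy frame mimicking the layout of the emissions table (values are made up).
df = pd.DataFrame({
    'Area': ['World'], 'Unit': ['tonnes'],
    'Y1961': [1.0], 'Y1961F': ['A'],
    'Y1962': [2.0], 'Y1962F': ['A'],
})

# Collect the flag columns such as 'Y1961F' and drop them.
flag_cols = [c for c in df.columns if c.startswith('Y') and c.endswith('F')]
df = df.drop(columns=flag_cols)

# Strip the 'Y' only when it prefixes a four-digit year.
df.columns = df.columns.str.replace(r'^Y(?=\d{4}$)', '', regex=True)
print(df.columns.tolist())
```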

In [372]:
prod_world = emissions[(emissions.Element == 'Production') & (emissions.Area == 'World')]
tot_prod_world = prod_world.sum()[7:] / 1e6
tot_prod_world.index = tot_prod_world.index.astype(int) # need to turn the index into int for future calculations
tot_prod_world

1961    1292.45
1962    1354.45
1963    1372.76
1964     1431.7
1965     1448.6
1966    1541.48
1967    1599.91
1968    1647.83
1969    1661.41
1970    1688.29
1971    1802.54
1972    1775.32
1973    1881.27
1974    1865.16
1975    1906.05
1976    2022.07
1977    2032.67
1978    2169.54
1979    2137.97
1980     2161.4
1981    2250.65
1982     2324.2
1983    2282.41
1984    2451.77
1985    2500.92
1986    2528.69
1987     2469.9
1988    2441.03
1989    2593.94
1990    2684.17
1991    2619.19
1992    2699.84
1993    2637.92
1994    2702.12
1995    2662.98
1996    2830.28
1997     2876.7
1998    2884.75
1999    2901.41
2000    2888.92
2001    2947.65
2002    2918.91
2003    2957.91
2004    3184.65
2005    3193.03
2006    3211.06
2007    3325.76
2008    3522.97
2009    3505.72
2010    3505.77
2011    3651.07
2012    3650.76
2013    3873.08
2014    3959.54
2015    3949.91
2016    4002.44
dtype: object

In [373]:
p0 = figure(plot_height=400, title="Total world agriculture production since 1960", tools = 'hover')
p0.line(tot_prod_world.index, tot_prod_world.values, line_width=0.9, legend = "World total agriculture production")

p0.xaxis.axis_label = 'Year'
p0.yaxis.axis_label = 'Total production (Mtonnes)'
p0.legend.location = "top_left"
show(p0)

In [374]:
from sklearn import linear_model
prod_world_pred = linear_model.LinearRegression()

prod_world_pred.fit([[x] for x in tot_prod_world.index], tot_prod_world.values)

m = prod_world_pred.coef_[0]
b = prod_world_pred.intercept_
print("slope=", m, "intercept=", b)

slope= 45.45504429569378 intercept= -87844.76878545136


In [375]:
future = np.arange(2025, 2051, 25)
years = np.concatenate([tot_prod_world.index.values,future])

In [376]:
# Let's add the regression to the previous plot
prod_world_pred_d = [prod_world_pred.coef_ * i + prod_world_pred.intercept_ for i in years]

p0.line(years, prod_world_pred_d, line_width=1, color='gray', line_dash=[4, 4], legend = "Predicted values")

p0.y_range = Range1d(min(prod_world_pred_d).item(0), max(prod_world_pred_d).item(0))
show(p0)

The linear increase holds from 1960 to 2000, but of course this is not a perfect fit. The regression most likely would not fit earlier or future periods. Since pre-industrial times, the increase was probably exponential. For the future, our model does not take into account issues such as land limits or decreasing yields due to climate change.
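
One way to put a number on "not a perfect fit" is the coefficient of determination, R². The sketch below uses one value per decade copied from the world production series printed earlier, and swaps in `np.polyfit` for `LinearRegression` to keep it short, so the slope will differ slightly from the one fitted on all years.

```python
import numpy as np

# One value per decade, copied from the world production series (Mtonnes).
years = np.array([1961, 1971, 1981, 1991, 2001, 2011])
production = np.array([1292.45, 1802.54, 2250.65, 2619.19, 2947.65, 3651.07])

# Least-squares straight line; np.polyfit stands in for LinearRegression here.
slope, intercept = np.polyfit(years, production, deg=1)
fitted = slope * years + intercept

# R^2 = 1 - SS_res / SS_tot
ss_res = np.sum((production - fitted) ** 2)
ss_tot = np.sum((production - production.mean()) ** 2)
r_squared = 1 - ss_res / ss_tot
print(f"slope={slope:.2f} Mtonnes/year, R^2={r_squared:.3f}")
```

A high R² only says the line describes the observed period well; it says nothing about whether the trend continues to 2050.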

In [399]:
emissions.Area.unique()
Area = ['World', 'Africa',
       'Eastern Africa', 'Middle Africa', 'Northern Africa',
       'Southern Africa', 'Western Africa', 'Americas',
       'Northern America', 'Central America', 'Caribbean',
       'South America', 'Asia', 'Central Asia', 'Eastern Asia',
       'Southern Asia', 'South-Eastern Asia', 'Western Asia', 'Europe',
       'Eastern Europe', 'Northern Europe', 'Southern Europe',
       'Western Europe', 'Oceania', 'Australia & New Zealand',
       'Melanesia', 'Micronesia', 'Polynesia', 'European Union',
       'Least Developed Countries', 'Land Locked Developing Countries',
       'Small Island Developing States',
       'Low Income Food Deficit Countries',
       'Net Food Importing Developing Countries', 'Annex I countries',
       'Non-Annex I countries', 'OECD']

prod_country = emissions[(emissions.Element == 'Production') & (~emissions.Area.isin(Area))]
prod_country = prod_country.drop(columns=['Area Code', 'Item Code','Element Code'])
prod_country = prod_country.groupby('Area').sum()
prod_country

Unnamed: 0_level_0,1961,1962,1963,1964,1965,1966,1967,1968,1969,1970,...,2007,2008,2009,2010,2011,2012,2013,2014,2015,2016
Area,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
Afghanistan,4401720.0,4407875.0,4165140.0,4538050.0,4658575.0,4417590.0,4832912.0,4976212.0,5099912.0,4453500.0,...,7.814318e+06,5.956178e+06,8.555158e+06,8.010544e+06,6.765330e+06,8.560700e+06,8.684129e+06,8.807396e+06,7.734861e+06,7.443596e+06
Albania,491787.0,515378.0,499204.0,560824.0,553535.0,658715.0,727169.0,739292.0,772536.0,795645.0,...,1.639484e+06,1.767008e+06,1.796008e+06,1.886413e+06,1.927158e+06,1.946786e+06,1.977037e+06,1.968486e+06,1.967438e+06,1.986795e+06
Algeria,1369376.0,2785200.0,2723739.0,1925421.0,2212857.0,1287413.0,2198858.0,2723148.0,2483187.0,2687234.0,...,6.990719e+06,5.008809e+06,8.923676e+06,8.231804e+06,8.661527e+06,9.787355e+06,9.847840e+06,8.571521e+06,9.130518e+06,8.660438e+06
American Samoa,463.0,409.0,511.0,616.0,750.0,750.0,492.0,436.0,379.0,375.0,...,4.240000e+02,4.120000e+02,4.160000e+02,4.200000e+02,4.090000e+02,4.080000e+02,4.080000e+02,4.070000e+02,4.070000e+02,4.080000e+02
Angola,688099.0,693299.0,671484.0,728752.0,730509.0,672580.0,691027.0,706716.0,836291.0,771177.0,...,1.157221e+06,1.140268e+06,1.415884e+06,1.594406e+06,1.822431e+06,9.340576e+05,2.118154e+06,2.284557e+06,2.520925e+06,2.135018e+06
Antigua and Barbuda,2790.0,2829.0,2864.0,3337.0,3298.0,3256.0,3275.0,3477.0,3642.0,3673.0,...,6.987206e+03,7.215000e+03,6.950463e+03,3.769554e+03,6.839075e+03,6.251449e+03,6.190964e+03,6.144505e+03,3.476565e+03,3.167625e+03
Argentina,21375639.0,20974994.0,24758088.0,27827476.0,20716107.0,25021767.0,27109014.0,24193267.0,26562240.0,27357885.0,...,5.736344e+07,6.035777e+07,4.210059e+07,5.600989e+07,6.742442e+07,6.425008e+07,6.742854e+07,7.257132e+07,7.285085e+07,8.272607e+07
Armenia,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,1.177804e+06,1.162985e+06,1.081737e+06,1.026319e+06,1.133844e+06,1.164554e+06,1.300692e+06,1.384886e+06,1.445734e+06,1.462397e+06
Australia,16946154.0,19606859.0,20518848.0,21921071.0,18773312.0,25767839.0,19256561.0,27907195.0,23469396.0,22896351.0,...,3.137958e+07,3.984705e+07,4.723962e+07,4.613926e+07,5.295120e+07,5.690138e+07,4.965097e+07,5.287655e+07,5.166882e+07,4.768488e+07
Austria,5717478.0,5877259.0,5833488.0,6038316.0,5778863.0,6392368.0,6832440.0,6946438.0,7294197.0,6950026.0,...,8.630847e+06,9.620548e+06,9.063037e+06,8.792660e+06,9.747282e+06,8.991637e+06,8.718857e+06,9.857726e+06,9.064342e+06,9.967694e+06


In [422]:
top_countries = prod_country.sort_values(by='2005', ascending = False).index.unique()[:11]
top_countries = np.delete(top_countries, 1, 0) # to remove china, mainland but keep china
top_countries = sorted(top_countries)
top_countries

top_prod = prod_country[prod_country.index.isin(top_countries)]
top_prod = top_prod.transpose() # transposing the table makes it easier to plot
top_prod = (top_prod / 1e6) # convert the values to megatonnes
top_prod.index.name = 'year'
#top_prod = top_prod.reset_index()
top_prod

Area,Brazil,Canada,China,France,Germany,India,Indonesia,Pakistan,Russian Federation,United States of America
year,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1
1961,22.614802,25.464487,113.489007,43.668031,45.648718,109.558747,14.925783,13.0917,0.0,239.901475
1962,23.793985,37.807986,124.583404,47.795171,49.095466,109.207306,16.822444,13.47701,0.0,239.236842
1963,24.613541,42.769688,143.589929,48.290605,49.996718,112.192848,14.526273,13.90922,0.0,251.945365
1964,25.370397,37.320855,159.81925,48.65966,52.148968,115.159462,16.663816,14.4601,0.0,240.007858
1965,29.833072,40.853216,170.211329,53.172384,51.327329,100.901855,15.943624,14.92869,0.0,261.10619
1966,27.508485,47.286106,186.369275,51.362572,52.115979,101.496118,17.964666,14.4094,0.0,260.803275
1967,30.06201,38.832737,190.33623,58.270523,57.25246,117.383512,16.174301,15.45098,0.0,284.977758
1968,30.753505,43.151897,186.250261,60.25139,59.701364,125.763555,20.947234,18.28379,0.0,279.082475
1969,31.104159,44.683626,185.349026,59.75376,58.884199,130.053224,20.917576,19.28254,0.0,281.535252
1970,34.47834,37.065111,209.609494,58.040381,56.90774,136.929592,22.792042,20.006157,0.0,264.249728


In [423]:
numlines = len(top_prod.columns)
colors=brewer['Paired'][11]
xs = [top_prod.index.values]*numlines
ys = [top_prod[name].values for name in top_prod]

p = figure(plot_height=500, plot_width=800, title="Agriculture production since 1960 for the 10 largest producers", tools = 'hover')

for (colr, leg, x, y ) in zip(colors, top_countries, xs, ys):
    p.line(x, y, color = colr, legend = leg, line_width = 2)
    
p.xaxis.axis_label = 'Year'
p.yaxis.axis_label = 'Production (Mtonnes)'
p.legend.location = "top_left"

show(p)

In [None]:
emissions_int_world = emissions[(emissions.Element == 'Emissions intensity') & (emissions.Area == 'World')]
emissions_int_world.info() # Check whether the subset contains any empty cells

In [128]:
# Pivot the table to plot it to have rows of years

tot_emission_int = emissions_int_world.drop(['Area', 'Area Code','Element Code','Item Code'],axis=1)
tot_emission_int = tot_emission_int.pivot_table(index=None, columns='Item')
tot_emission_int

Item,Cereals excluding rice,"Eggs, hen, in shell","Meat, buffalo","Meat, cattle","Meat, chicken","Meat, goat","Meat, pig","Meat, sheep","Milk, whole fresh buffalo","Milk, whole fresh camel","Milk, whole fresh cow","Milk, whole fresh goat","Milk, whole fresh sheep","Rice, paddy"
1961,0.1618,1.1505,103.2366,37.588,1.0073,53.4359,3.4908,43.8651,1.6897,3.8394,1.6273,2.2959,5.1939,1.8606
1962,0.163,1.1476,101.1451,36.5186,1.0115,55.2056,3.4938,43.252,1.6719,3.8313,1.6105,2.3469,5.2,1.8387
1963,0.1719,1.1287,100.0774,35.1963,1.009,56.6251,3.3577,43.3207,1.6584,3.7916,1.6272,2.3051,5.0408,1.6947
1964,0.1759,1.1012,101.5181,35.4845,0.9839,55.3735,3.2808,44.027,1.6496,3.7753,1.5882,2.3267,5.0146,1.6617
1965,0.1871,1.1035,100.6839,35.8378,0.929,53.5321,3.1694,44.4964,1.6492,3.759,1.5194,2.381,4.8876,1.7231
1966,0.188,1.1077,105.4525,34.7524,0.8909,52.3708,3.0807,44.0231,1.6468,3.7764,1.5143,2.3609,4.844,1.7108
1967,0.1926,1.1032,106.5046,34.0016,0.8923,51.9045,3.1317,43.5618,1.6895,3.724,1.474,2.4182,5.0444,1.6304
1968,0.1971,1.0998,106.8264,32.978,0.8912,51.8701,3.1049,42.8697,1.6632,3.7097,1.4525,2.4835,5.2116,1.5991
1969,0.2045,1.0846,107.1353,32.2959,0.8541,50.696,3.1155,43.7275,1.6549,3.7154,1.4494,2.5134,5.1329,1.5837
1970,0.2162,1.0904,105.4367,32.4799,0.8133,49.895,3.0368,41.732,1.7136,3.4356,1.4237,2.5372,5.1478,1.5094


In [95]:
p2 = figure(x_range=tot_emission_int.columns.tolist(), plot_height=500, title="Total of the GHG emissions intensity for agriculture for 1961 - 2016")
p2.vbar(x=tot_emission_int.columns.tolist(), top=tot_emission_int.sum(), width=0.9)

# Set some properties to make the plot look better
p2.xgrid.grid_line_color = None
p2.y_range.start = 0
p2.xaxis.major_label_orientation = 1
p2.yaxis.axis_label = 'GHG emissions (kg CO2eq/kg product)'
show(p2)
##### should try to sort the values for easier conclusions
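
Sorting before plotting could look like the sketch below. It uses a few 1961 and 1962 world intensities copied from the table above rather than the full `tot_emission_int` frame, so the totals are only illustrative.

```python
import pandas as pd

# A few 1961 and 1962 world intensities copied from the table above.
intensity = pd.DataFrame(
    {'Meat, cattle': [37.588, 36.5186],
     'Meat, buffalo': [103.2366, 101.1451],
     'Rice, paddy': [1.8606, 1.8387],
     'Meat, goat': [53.4359, 55.2056]},
    index=['1961', '1962'],
)

# Sum over years, then order the products from highest to lowest.
totals = intensity.sum().sort_values(ascending=False)
print(totals.index.tolist())
```

The sorted labels (`totals.index.tolist()`) and heights (`totals.values`) could then be passed as `x_range`/`x` and `top` to the bar chart.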

In [96]:
#### from here I decided to switch the plot axes (easier?): stack by product instead of by year
# categories will be x axis
# years will be stackers
years = ['1961', '1971', '1981', '1991', '2001', '2011']
products = sorted(emissions_int_world['Item'].tolist())

temp_d = emissions_int_world[years].join(emissions_int_world['Item'])
temp_d = temp_d.sort_values(by=['Item']).reset_index(drop=True)
temp_d

Unnamed: 0,1961,1971,1981,1991,2001,2011,Item
0,0.1618,0.2053,0.2414,0.2413,0.2277,0.225,Cereals excluding rice
1,1.1505,1.0897,0.8777,0.7639,0.6644,0.6773,"Eggs, hen, in shell"
2,103.2366,103.2246,90.8772,74.2182,65.4022,57.1905,"Meat, buffalo"
3,37.588,33.3138,31.2939,28.437,27.6655,25.5779,"Meat, cattle"
4,1.0073,0.8079,0.7553,0.7722,0.6543,0.5983,"Meat, chicken"
5,53.4359,49.2669,46.3433,36.784,34.3226,30.3772,"Meat, goat"
6,3.4908,3.0872,2.7672,2.2297,1.7479,1.5846,"Meat, pig"
7,43.8651,41.2118,39.5268,34.3866,24.1582,22.1005,"Meat, sheep"
8,1.6897,1.7338,1.5547,1.4486,1.0854,0.9653,"Milk, whole fresh buffalo"
9,3.8394,3.425,3.2587,3.3682,3.0666,2.6224,"Milk, whole fresh camel"


In [91]:
from bokeh.palettes import brewer
from bokeh.models import ColumnDataSource, value

source = ColumnDataSource(data=temp_d)

p3 = figure(x_range = products, plot_width=800, title="Total of the GHG emissions intensity for agriculture")#, toolbar_location='above', tools=TOOLS)
colors = brewer['Spectral'][6]

p3.vbar_stack(years, x='Item', width=0.9, source=source, color=colors,
                legend=[value(x) for x in years])

p3.xgrid.grid_line_color = None
p3.y_range.start = 0
p3.xaxis.major_label_orientation = 1
p3.xaxis.axis_label = 'Product'
p3.yaxis.axis_label = 'GHG emissions (kg CO2eq/kg product)'
show(p3)

In [129]:
# Now we focus on meat to display emissions per type of meat
tot_emission_int.reset_index(inplace=True) # turn the year index into columns
tot_emission_int = tot_emission_int.rename(columns={'index' : 'years'})# rename the column
tot_emission_int
#d_tot_emission.years = pd.to_numeric(d_tot_emission.years, downcast='integer') # turn years into int for filtering

Item,years,Cereals excluding rice,"Eggs, hen, in shell","Meat, buffalo","Meat, cattle","Meat, chicken","Meat, goat","Meat, pig","Meat, sheep","Milk, whole fresh buffalo","Milk, whole fresh camel","Milk, whole fresh cow","Milk, whole fresh goat","Milk, whole fresh sheep","Rice, paddy"
0,1961,0.1618,1.1505,103.2366,37.588,1.0073,53.4359,3.4908,43.8651,1.6897,3.8394,1.6273,2.2959,5.1939,1.8606
1,1962,0.163,1.1476,101.1451,36.5186,1.0115,55.2056,3.4938,43.252,1.6719,3.8313,1.6105,2.3469,5.2,1.8387
2,1963,0.1719,1.1287,100.0774,35.1963,1.009,56.6251,3.3577,43.3207,1.6584,3.7916,1.6272,2.3051,5.0408,1.6947
3,1964,0.1759,1.1012,101.5181,35.4845,0.9839,55.3735,3.2808,44.027,1.6496,3.7753,1.5882,2.3267,5.0146,1.6617
4,1965,0.1871,1.1035,100.6839,35.8378,0.929,53.5321,3.1694,44.4964,1.6492,3.759,1.5194,2.381,4.8876,1.7231
5,1966,0.188,1.1077,105.4525,34.7524,0.8909,52.3708,3.0807,44.0231,1.6468,3.7764,1.5143,2.3609,4.844,1.7108
6,1967,0.1926,1.1032,106.5046,34.0016,0.8923,51.9045,3.1317,43.5618,1.6895,3.724,1.474,2.4182,5.0444,1.6304
7,1968,0.1971,1.0998,106.8264,32.978,0.8912,51.8701,3.1049,42.8697,1.6632,3.7097,1.4525,2.4835,5.2116,1.5991
8,1969,0.2045,1.0846,107.1353,32.2959,0.8541,50.696,3.1155,43.7275,1.6549,3.7154,1.4494,2.5134,5.1329,1.5837
9,1970,0.2162,1.0904,105.4367,32.4799,0.8133,49.895,3.0368,41.732,1.7136,3.4356,1.4237,2.5372,5.1478,1.5094
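
The reset-and-rename step above can be tried on a tiny frame. This is a minimal sketch, assuming the years start out as an unnamed index of strings; `demo`, `demo_long` and `recent` are hypothetical names, and the two rows are copied from the first two rows of the table above.

```python
import pandas as pd

# Hypothetical miniature of the intensity table: years as the index, items as columns.
demo = pd.DataFrame(
    {"Meat, cattle": [37.588, 36.5186], "Meat, pig": [3.4908, 3.4938]},
    index=["1961", "1962"],
)

# reset_index() moves the unnamed index into a column called 'index';
# rename() then gives that column a meaningful name.
demo_long = demo.reset_index().rename(columns={"index": "years"})

# Optional: convert the year labels to integers so they can be filtered numerically.
demo_long["years"] = demo_long["years"].astype(int)
recent = demo_long[demo_long["years"] >= 1962]
```

The integer conversion is only useful for filtering. The plots in this notebook pass the years as a categorical `x_range`, for which the string labels are kept.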


In [43]:
allyears = tot_emission_int.years.tolist()
meat_products = ['Meat, buffalo',
                 'Meat, cattle',
                 'Meat, chicken',
                 'Meat, goat',
                 'Meat, pig',
                 'Meat, sheep']
meat_emissions = tot_emission_int[meat_products].join(tot_emission_int['years'])
meat_emissions
meat_emissions.columns = ['Buffalo', 'Cattle', 'Chicken', 'Goat', 'Pig', 'Sheep', 'years']

In [92]:
source1 = ColumnDataSource(data=meat_emissions)

p5 = figure(x_range = allyears, plot_width=800, title="GHG emission intensity per meat since 1961", tools = 'hover')#, toolbar_location='above', tools=TOOLS)
colors = brewer['Paired'][6]

p5.line(meat_emissions['years'], meat_emissions['Buffalo'], legend="Buffalo meat", line_color = colors[0], line_width = 3)
p5.line(meat_emissions['years'], meat_emissions['Cattle'], legend="Cattle meat", line_color = colors[1], line_width = 3)
p5.line(meat_emissions['years'], meat_emissions['Chicken'], legend="Chicken meat", line_color = colors[2], line_width = 3)
p5.line(meat_emissions['years'], meat_emissions['Goat'], legend="Goat meat", line_color = colors[3], line_width = 3)
p5.line(meat_emissions['years'], meat_emissions['Pig'], legend="Pig meat", line_color = colors[4], line_width = 3)
p5.line(meat_emissions['years'], meat_emissions['Sheep'], legend="Sheep meat", line_color = colors[5], line_width = 3)


#p1.xgrid.grid_line_color = None
#p1.y_range.start = 0
p5.xaxis.major_label_orientation = 1
p5.xaxis.axis_label = 'Year'
p5.yaxis.axis_label = 'GHG emissions intensity (kg CO2eq/kg product)'
#p1.legend.orientation = "horizontal"
show(p5)

In [51]:
# This is great, but we should look at total emissions rather than intensity
emissions_world = emissions[emissions.Area == 'World']
emissions_world = emissions_world.drop(['Area', 'Area Code','Element Code','Item Code'],axis=1)
tot_emission = emissions_world[emissions_world.Element == 'Emissions (CO2eq)'].pivot_table(index=None, columns='Item')

tot_emission.reset_index(inplace=True) # turn the year index into columns
tot_emission = tot_emission.rename(columns={'index' : 'years'})# rename the column

tot_emission_meat = tot_emission[meat_products].join(tot_emission_int['years'])


In [52]:
tot_emission_meat.columns = ['Buffalo', 'Cattle', 'Chicken', 'Goat', 'Pig', 'Sheep', 'years']
tot_emission_meat

Unnamed: 0,Buffalo,Cattle,Chicken,Goat,Pig,Sheep,years
0,110582.3045,1040606.232,7611.1211,58880.2143,86392.6475,216268.5358,1961
1,111699.0235,1066464.756,7975.4951,62224.0881,91035.168,217974.9823,1962
2,112131.2,1086008.292,8447.8633,63718.86,94073.2604,218251.7541,1963
3,117817.0571,1109833.961,8537.6829,63217.7011,94089.147,220859.9061,1964
4,119016.2957,1141737.479,8746.0995,62773.9873,99155.8477,224239.716,1965
5,126511.1802,1166097.592,8908.0017,63237.2844,99852.9953,224974.1619,1966
6,130797.0168,1199209.878,9427.5908,63456.5663,106056.2557,229625.434,1967
7,134912.7118,1219039.366,9831.1443,64546.189,106817.5009,232707.022,1968
8,137470.2069,1224953.512,10200.1685,64641.2943,106308.9064,235040.8148,1969
9,138438.4749,1245584.72,10687.3161,64588.3205,108707.0268,231047.368,1970
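
The reshape used above turns a wide table (one column per year) into one row per year with one column per item. A more explicit way to get the same layout is sketched below, assuming one row per item after filtering on `Element`; `wide`, `year_cols` and `by_year` are hypothetical names, and the values are the 1961 and 1962 world totals for cattle and pig from the table above.

```python
import pandas as pd

# Hypothetical two-item extract in the wide layout (one column per year).
wide = pd.DataFrame({
    "Item": ["Meat, cattle", "Meat, pig"],
    "Element": ["Emissions (CO2eq)", "Emissions (CO2eq)"],
    "1961": [1040606.232, 86392.6475],
    "1962": [1066464.756, 91035.168],
})

# Keep only the columns whose label is a year.
year_cols = [c for c in wide.columns if c.isdigit()]

# Transpose so that items become columns and years become rows.
by_year = wide.set_index("Item")[year_cols].T
by_year.index.name = "years"
by_year = by_year.reset_index()
```

Naming the index before calling `reset_index()` avoids the separate rename step.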


In [53]:
source1 = ColumnDataSource(data=tot_emission_meat)

p2 = figure(x_range = allyears, plot_width=800, title="Total GHG emission per animal since 1961", tools = 'hover')#, toolbar_location='above', tools=TOOLS)
colors = brewer['Paired'][6]

p2.line(tot_emission_meat['years'], tot_emission_meat['Buffalo'], legend="Buffalo meat", line_color = colors[0], line_width = 3)
p2.line(tot_emission_meat['years'], tot_emission_meat['Cattle'], legend="Cattle meat", line_color = colors[1], line_width = 3)
p2.line(tot_emission_meat['years'], tot_emission_meat['Chicken'], legend="Chicken meat", line_color = colors[2], line_width = 3)
p2.line(tot_emission_meat['years'], tot_emission_meat['Goat'], legend="Goat meat", line_color = colors[3], line_width = 3)
p2.line(tot_emission_meat['years'], tot_emission_meat['Pig'], legend="Pig meat", line_color = colors[4], line_width = 3)
p2.line(tot_emission_meat['years'], tot_emission_meat['Sheep'], legend="Sheep meat", line_color = colors[5], line_width = 3)


#p1.xgrid.grid_line_color = None
#p1.y_range.start = 0
p2.xaxis.major_label_orientation = 1
p2.xaxis.axis_label = 'Year'
p2.yaxis.axis_label = 'GHG emissions (kilotonnes)'
p2.legend.location = "center_right"
show(p2)

In [54]:
emissions_count = emissions[emissions.Element == 'Emissions (CO2eq)'].drop(columns = ['Area Code','Item Code','Element Code', ])
emissions_count

Unnamed: 0,Area,Item,Element,Unit,1961,1962,1963,1964,1965,1966,...,2007,2008,2009,2010,2011,2012,2013,2014,2015,2016
1,Afghanistan,Cereals excluding rice,Emissions (CO2eq),gigagrams,402.2165,408.3269,385.7396,406.7923,410.0940,392.3671,...,5.585772e+02,3.999802e+02,6.229562e+02,5.694834e+02,5.021896e+02,9.308136e+02,7.657368e+02,7.490513e+02,6.646333e+02,6.576652e+02
4,Afghanistan,"Rice, paddy",Emissions (CO2eq),gigagrams,665.5675,665.5675,665.5675,699.3576,699.3576,703.5647,...,5.642252e+02,6.268125e+02,6.644793e+02,6.896050e+02,7.030602e+02,7.524375e+02,7.080397e+02,7.462642e+02,5.669570e+02,4.251177e+02
7,Afghanistan,"Meat, cattle",Emissions (CO2eq),gigagrams,1576.9262,1791.9616,1806.2973,1842.1366,1813.4652,1892.3115,...,9.726768e+02,1.035754e+03,1.018551e+03,1.270859e+03,1.235737e+03,8.916801e+02,8.565577e+02,9.311033e+02,1.194205e+03,1.186901e+03
10,Afghanistan,"Milk, whole fresh cow",Emissions (CO2eq),gigagrams,1172.1377,1172.1377,1306.0963,1306.0963,1456.7998,1607.5032,...,5.023447e+03,5.525792e+03,5.525792e+03,6.530482e+03,6.363033e+03,6.697930e+03,6.764909e+03,6.781654e+03,6.019667e+03,5.992188e+03
13,Afghanistan,"Meat, goat",Emissions (CO2eq),gigagrams,717.8808,693.8384,607.4048,543.4226,454.2740,454.2740,...,6.669083e+02,8.799692e+02,7.571233e+02,9.701842e+02,1.116490e+03,1.068717e+03,1.026062e+03,9.795335e+02,1.076443e+03,1.036090e+03
16,Afghanistan,"Milk, whole fresh goat",Emissions (CO2eq),gigagrams,177.8707,177.8707,203.0370,203.0370,228.2034,228.2034,...,4.819996e+02,4.819996e+02,4.819996e+02,4.777341e+02,5.118580e+02,4.905306e+02,4.747483e+02,5.259689e+02,5.706735e+02,5.523760e+02
19,Afghanistan,"Meat, sheep",Emissions (CO2eq),gigagrams,2325.0786,2348.4854,2332.8808,2340.6831,2412.8542,2740.5498,...,2.740550e+02,8.153380e+02,1.187312e+03,1.284645e+03,1.354148e+03,1.309090e+03,1.240212e+03,1.274675e+03,1.247456e+03,1.251954e+03
22,Afghanistan,"Milk, whole fresh sheep",Emissions (CO2eq),gigagrams,1185.9461,1191.7978,1275.6723,1365.3985,1410.2616,1277.6229,...,1.306881e+03,1.273722e+03,1.209353e+03,1.306881e+03,1.427754e+03,1.386597e+03,1.323031e+03,1.355668e+03,1.330807e+03,1.335515e+03
25,Afghanistan,"Milk, whole fresh camel",Emissions (CO2eq),gigagrams,31.2559,35.0066,43.7583,41.2578,37.5071,37.5071,...,3.250610e+01,3.163100e+01,3.250610e+01,3.250610e+01,3.250610e+01,3.250610e+01,3.250610e+01,2.730270e+01,2.718260e+01,2.725140e+01
28,Afghanistan,"Meat, chicken",Emissions (CO2eq),gigagrams,2.7665,1.9761,1.5809,1.9761,1.1857,1.1857,...,3.300100e+00,6.280100e+00,5.900600e+00,1.852790e+01,1.493140e+01,1.467060e+01,1.009000e+01,6.315600e+00,9.982300e+00,1.000170e+01


In [140]:
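# Sum the emissions of all items for each area (numeric year columns only)
emissions_count_top = emissions_count.groupby('Area').sum(numeric_only=True)
# Note: sort_values returns a new DataFrame; the sorted result is not stored here
emissions_count_top.sort_values(by = '2007', ascending = False)
emissions_count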

Unnamed: 0,Area,Item,Element,Unit,1961,1962,1963,1964,1965,1966,...,2007,2008,2009,2010,2011,2012,2013,2014,2015,2016
1,Afghanistan,Cereals excluding rice,Emissions (CO2eq),gigagrams,402.2165,408.3269,385.7396,406.7923,410.0940,392.3671,...,5.585772e+02,3.999802e+02,6.229562e+02,5.694834e+02,5.021896e+02,9.308136e+02,7.657368e+02,7.490513e+02,6.646333e+02,6.576652e+02
4,Afghanistan,"Rice, paddy",Emissions (CO2eq),gigagrams,665.5675,665.5675,665.5675,699.3576,699.3576,703.5647,...,5.642252e+02,6.268125e+02,6.644793e+02,6.896050e+02,7.030602e+02,7.524375e+02,7.080397e+02,7.462642e+02,5.669570e+02,4.251177e+02
7,Afghanistan,"Meat, cattle",Emissions (CO2eq),gigagrams,1576.9262,1791.9616,1806.2973,1842.1366,1813.4652,1892.3115,...,9.726768e+02,1.035754e+03,1.018551e+03,1.270859e+03,1.235737e+03,8.916801e+02,8.565577e+02,9.311033e+02,1.194205e+03,1.186901e+03
10,Afghanistan,"Milk, whole fresh cow",Emissions (CO2eq),gigagrams,1172.1377,1172.1377,1306.0963,1306.0963,1456.7998,1607.5032,...,5.023447e+03,5.525792e+03,5.525792e+03,6.530482e+03,6.363033e+03,6.697930e+03,6.764909e+03,6.781654e+03,6.019667e+03,5.992188e+03
13,Afghanistan,"Meat, goat",Emissions (CO2eq),gigagrams,717.8808,693.8384,607.4048,543.4226,454.2740,454.2740,...,6.669083e+02,8.799692e+02,7.571233e+02,9.701842e+02,1.116490e+03,1.068717e+03,1.026062e+03,9.795335e+02,1.076443e+03,1.036090e+03
16,Afghanistan,"Milk, whole fresh goat",Emissions (CO2eq),gigagrams,177.8707,177.8707,203.0370,203.0370,228.2034,228.2034,...,4.819996e+02,4.819996e+02,4.819996e+02,4.777341e+02,5.118580e+02,4.905306e+02,4.747483e+02,5.259689e+02,5.706735e+02,5.523760e+02
19,Afghanistan,"Meat, sheep",Emissions (CO2eq),gigagrams,2325.0786,2348.4854,2332.8808,2340.6831,2412.8542,2740.5498,...,2.740550e+02,8.153380e+02,1.187312e+03,1.284645e+03,1.354148e+03,1.309090e+03,1.240212e+03,1.274675e+03,1.247456e+03,1.251954e+03
22,Afghanistan,"Milk, whole fresh sheep",Emissions (CO2eq),gigagrams,1185.9461,1191.7978,1275.6723,1365.3985,1410.2616,1277.6229,...,1.306881e+03,1.273722e+03,1.209353e+03,1.306881e+03,1.427754e+03,1.386597e+03,1.323031e+03,1.355668e+03,1.330807e+03,1.335515e+03
25,Afghanistan,"Milk, whole fresh camel",Emissions (CO2eq),gigagrams,31.2559,35.0066,43.7583,41.2578,37.5071,37.5071,...,3.250610e+01,3.163100e+01,3.250610e+01,3.250610e+01,3.250610e+01,3.250610e+01,3.250610e+01,2.730270e+01,2.718260e+01,2.725140e+01
28,Afghanistan,"Meat, chicken",Emissions (CO2eq),gigagrams,2.7665,1.9761,1.5809,1.9761,1.1857,1.1857,...,3.300100e+00,6.280100e+00,5.900600e+00,1.852790e+01,1.493140e+01,1.467060e+01,1.009000e+01,6.315600e+00,9.982300e+00,1.000170e+01
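
The aggregation in the cell above can be checked on a small example. This is a minimal sketch with invented numbers (they are not taken from the FAO data); `demo`, `totals` and `ranked` are hypothetical names.

```python
import pandas as pd

# Invented values, for illustration only.
demo = pd.DataFrame({
    "Area": ["France", "France", "Brazil", "Brazil"],
    "Item": ["Meat, cattle", "Meat, pig", "Meat, cattle", "Meat, pig"],
    "2007": [10.0, 2.0, 30.0, 4.0],
})

# One row per area; numeric_only=True leaves out text columns such as 'Item'.
totals = demo.groupby("Area").sum(numeric_only=True)

# sort_values returns a new DataFrame, so the result has to be assigned to be kept.
ranked = totals.sort_values(by="2007", ascending=False)
```

`ranked.head(10)` would then list the ten largest emitters for that year.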


In [56]:
def total_emission_country(country):
    """Plot the total agricultural GHG emissions of one area for every year."""
    ghg = emissions_count_top.loc[country]
    p3 = figure(x_range = allyears, plot_width=800, tools = 'hover')#, toolbar_location='above', tools=TOOLS)
    colors = brewer['Paired'][6]
    
    p3.line(ghg.index, ghg, line_color = colors[0], line_width = 3)    
    
    p3.xaxis.major_label_orientation = 1
    p3.xaxis.axis_label = 'Year'
    p3.yaxis.axis_label = 'GHG emissions (kilotonnes)'
    show(p3)

In [57]:
total_emission_country('France')

In [58]:
from ipywidgets import interact, interactive, fixed, interact_manual
all_area = emissions_count_top.index.values
_ = interact(total_emission_country, country=list(all_area))


In [137]:
def total_emission_animal_country(country):
    """Plot the GHG emissions per type of meat, plus the agricultural total, for one area."""
    ghg = emissions_count_top.loc[country]
    ghg_country_buffalo = emissions_count[(emissions_count.Area == country) & (emissions_count.Item == products[2])].pivot_table(index=None, columns='Item')
    ghg_country_cattle = emissions_count[(emissions_count.Area == country) & (emissions_count.Item == products[3])].pivot_table(index=None, columns='Item')
    ghg_country_chicken = emissions_count[(emissions_count.Area == country) & (emissions_count.Item == products[4])].pivot_table(index=None, columns='Item')
    ghg_country_goat = emissions_count[(emissions_count.Area == country) & (emissions_count.Item == products[5])].pivot_table(index=None, columns='Item')
    ghg_country_pig = emissions_count[(emissions_count.Area == country) & (emissions_count.Item == products[6])].pivot_table(index=None, columns='Item')
    ghg_country_sheep = emissions_count[(emissions_count.Area == country) & (emissions_count.Item == products[7])].pivot_table(index=None, columns='Item')
    
                
    p = figure(x_range = allyears, plot_width=800, title="Total GHG emission per animal since 1961", tools = 'hover')#, toolbar_location='above', tools=TOOLS)
    colors = brewer['Paired'][6]
    
    # It turns out that not every country farms every animal (no pigs in Afghanistan, no buffalo in France, for example),
    # so each line is only plotted when data exists for that animal
    if not ghg_country_buffalo.empty:
        p.line(ghg_country_buffalo.index.values, ghg_country_buffalo.iloc[:,0], legend="Buffalo meat", line_color = colors[0], line_width = 3)
    if not ghg_country_cattle.empty:
        p.line(ghg_country_cattle.index.values, ghg_country_cattle.iloc[:,0], legend="Cattle meat", line_color = colors[1], line_width = 3)
    if not ghg_country_chicken.empty:
        p.line(ghg_country_chicken.index.values, ghg_country_chicken.iloc[:,0], legend="Chicken meat", line_color = colors[2], line_width = 3)
    if not ghg_country_goat.empty:
        p.line(ghg_country_goat.index.values, ghg_country_goat.iloc[:,0], legend="Goat meat", line_color = colors[3], line_width = 3)
    if not ghg_country_pig.empty:
        p.line(ghg_country_pig.index.values, ghg_country_pig.iloc[:,0], legend="Pig meat", line_color = colors[4], line_width = 3)
    if not ghg_country_sheep.empty:
        p.line(ghg_country_sheep.index.values, ghg_country_sheep.iloc[:,0], legend="Sheep meat", line_color = colors[5], line_width = 3)
    p.line(ghg.index, ghg, line_color = 'black', line_dash = [4, 4], legend="Total emissions from agriculture",  line_width = 3)
    
    #p1.xgrid.grid_line_color = None
    #p1.y_range.start = 0
    p.xaxis.major_label_orientation = 1
    p.xaxis.axis_label = 'Year'
    p.yaxis.axis_label = 'GHG emissions (kilotonnes)'
    p.legend.location = "top_left"
    show(p)
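
The six near-identical filters and `if` checks in the function above could be collapsed into one helper that only returns the animals a country actually reports. The sketch below is one possible way to do it, not the approach used in this notebook; `meat_series` and `demo` are hypothetical names, and it assumes one row per (Area, Item) pair and year columns labelled with digit strings. The demo values for Afghanistan are the 1961 and 1962 cattle figures from the table above; the French values are invented.

```python
import pandas as pd

def meat_series(df, country, items):
    """Return {item: Series indexed by year} for the items that exist for `country`."""
    year_cols = [c for c in df.columns if c.isdigit()]
    subset = df[df["Area"] == country].set_index("Item")
    # Items missing for this country are skipped instead of raising a KeyError.
    return {item: subset.loc[item, year_cols] for item in items if item in subset.index}

# Small demo frame: Afghanistan reports cattle but no pig.
demo = pd.DataFrame({
    "Area": ["Afghanistan", "France", "France"],
    "Item": ["Meat, cattle", "Meat, cattle", "Meat, pig"],
    "1961": [1576.9262, 100.0, 50.0],
    "1962": [1791.9616, 110.0, 55.0],
})

afghan = meat_series(demo, "Afghanistan", ["Meat, cattle", "Meat, pig"])
french = meat_series(demo, "France", ["Meat, cattle", "Meat, pig"])
```

Each entry of the returned dictionary could then be drawn in a single loop, one `p.line(...)` call per item.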

In [138]:
total_emission_animal_country('World')

In [139]:
all_area = emissions_count.Area.unique()
_ = interact(total_emission_animal_country, country=list(all_area))


In [164]:
def emission_easyhist(country,year):
    """Plot a bar chart of the GHG emissions of each item for a given country and year."""
    emission = emissions_count[emissions_count.Area == country]
    year = str(year)  # the year columns are labelled with strings
    p7 = figure(x_range=emission.Item.unique(), plot_height=500, title="Total GHG emissions for agriculture by item")
    p7.vbar(x=emission.Item.unique(), top=emission[year], width=0.9)
    
    # Set some properties to make the plot look better
    p7.xgrid.grid_line_color = None
    p7.y_range.start = 0
    p7.xaxis.major_label_orientation = 1
    p7.yaxis.axis_label = 'GHG emissions (kilotonnes)'
    show(p7)

In [165]:
emission_easyhist('France',2006)

In [166]:
import ipywidgets as widgets
_ = widgets.interact(emission_easyhist, 
                     country=list(all_area),
                     year=widgets.IntSlider(min=1961, max=2016, value=2010))
