In [3]:
import pandas as pd
import os
import numpy as np
from crop_production_analysis import *
import pickle

In [4]:
filename="../Data/current_FAO/raw_files/Production_Crops_E_All_Data_(Normalized).csv"
df=pd.read_csv(filename,encoding="ISO-8859-1")
df.sample(5)

Unnamed: 0,Area Code,Area,Item Code,Item,Element Code,Element,Year Code,Year,Unit,Value,Flag
1722637,213,Turkmenistan,826,"Tobacco, unmanufactured",5510,Production,1992,1992,tonnes,2200.0,F
2081154,5203,Northern America,122,Sweet potatoes,5525,Seed,1985,1985,tonnes,47600.0,A
72649,10,Australia,329,Cottonseed,5510,Production,1965,1965,tonnes,17400.0,
2184292,5301,Central Asia,260,Olives,5312,Area harvested,2014,2014,ha,150.0,A
416825,47,Cook Islands,619,"Fruit, fresh nes",5510,Production,1989,1989,tonnes,380.0,F


# Geographical data
We are interested in a geographical analysis of the data. The dataset contains a field named "Area", which corresponds to the country. In order to plot the data on a map, we have to match each country name in our dataset with an ID contained in a [geoJson file](https://github.com/python-visualization/folium/blob/master/examples/data/world-countries.json). Getting a match required some manipulation of the data. Rows corresponding to multiple countries (for instance Belgium and Luxembourg) were simply exploded into one row per country.
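The explode-and-match step can be sketched roughly as follows. This is not the code behind `get_df_with_ids`, only a minimal illustration: the function name `attach_country_ids`, the lookup table `id_table`, the `-` separator and the toy rows are all assumptions made for the example.

```python
import pandas as pd

def attach_country_ids(df, id_table):
    """Hypothetical helper: explode multi-country rows, then look up an ID per name."""
    out = df.copy()
    # Assumption: combined areas are written with a "-" separator.
    out["Area"] = out["Area"].str.lower().str.split("-")
    out = out.explode("Area")              # one row per country name
    out["Area"] = out["Area"].str.strip()
    # A left merge keeps unmatched areas; their ID is NaN.
    return out.merge(id_table, on="Area", how="left")

# Toy data, not taken from the FAO file.
toy = pd.DataFrame({"Area": ["Belgium-Luxembourg", "Australia", "World"],
                    "Value": [10.0, 5.0, 99.0]})
ids = pd.DataFrame({"Area": ["belgium", "luxembourg", "australia"],
                    "ID": ["BEL", "LUX", "AUS"]})

matched = attach_country_ids(toy, ids)
print(matched)
```

Note that exploding in this way copies the original value to every resulting row, so totals computed afterwards count a combined area once per member country.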

In [5]:
df_with_ids=get_df_with_ids(df)

Exploding dataframe :
Getting IDs :
82  Countries without IDs


In [6]:
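# Rows whose area name could not be matched to a geoJson ID
df_nan_ids=df_with_ids[df_with_ids.ID.isna()]
# Distinct names of the unmatched areas
unmatched_areas=df_nan_ids.Area.unique()
for el in unmatched_areas:
    print(el)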

american samoa
antigua and barbuda
bahrain
barbados
bermuda
british virgin islands
cabo verde
cayman islands
comoros
cook islands
czechia
czechoslovakia
dominica
faroe islands
grenada
guadeloupe
guam
kiribati
liechtenstein
maldives
malta
marshall islands
martinique
mauritius
micronesia (federated states of)
montserrat
nauru
niue
occupied palestinian territory
pacific islands trust territory
réunion
saint helena, ascension and tristan da cunha
saint kitts and nevis
saint lucia
saint pierre and miquelon
saint vincent and the grenadines
samoa
sao tome and principe
seychelles
singapore
syrian arab republic
leste
tokelau
tonga
tuvalu
united states virgin islands
wallis and futuna islands
western sahara
yugoslav sfr
world
africa
eastern africa
middle africa
northern africa
southern africa
western africa
americas
northern america
central america
caribbean
south america
asia
central asia
eastern asia
southern asia
south
western asia
europe
eastern europe
northern europe
southern europe
western

# Categorizing the crops
Once the data is clean and usable, we start manipulating it in order to make some observations. There are multiple ways to classify crops into categories. One category we are interested in is food crops, that is, crops used for human consumption. The list of food crops that we use here can be found [in the world crop database](https://world-crops.com/food-crops/). From our dataset, we extract only the data concerning the food crops and separate the dataframe into four: area harvested, production, seed and yield.
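As an illustration of this filter-then-split step, here is a minimal sketch. It is not the implementation of `get_food_crop_data`: the name `split_food_crops`, the short crop list and the toy rows are hypothetical, and the only thing taken from the dataset is that the "Element" column carries labels such as "Area harvested", "Production" and "Seed".

```python
import pandas as pd

def split_food_crops(df, food_crops):
    """Hypothetical helper: keep food crops only, then split by the Element column."""
    food = df[df["Item"].isin(food_crops)]
    # One sub-dataframe per element label.
    return {element: part for element, part in food.groupby("Element")}

# Toy rows shaped like the FAO table; the crop list is illustrative only.
toy = pd.DataFrame({
    "Item": ["Olives", "Olives", "Cottonseed", "Sweet potatoes"],
    "Element": ["Area harvested", "Production", "Production", "Seed"],
    "Value": [150.0, 300.0, 17400.0, 47600.0],
})

parts = split_food_crops(toy, food_crops={"Olives", "Sweet potatoes"})
print(sorted(parts))
```

Returning a dictionary keyed by element avoids hard-coding the four names, at the cost of a lookup such as `parts["Production"]` instead of tuple unpacking.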


In [7]:
food_crop_area_df , food_crop_production_df , food_crop_seed_df , food_crop_yield_df = get_food_crop_data(df_with_ids)

# Visualising the data
After cleaning and reorganising, we can visualise the data. In order to compare results, we plotted the data with two different libraries. The first, folium, uses the same [geoJson file](https://github.com/python-visualization/folium/blob/master/examples/data/world-countries.json) to create an overlay. The second, bokeh, uses geopandas and a [dataframe](https://www.naturalearthdata.com/downloads/110m-cultural-vectors/) containing the geographical and geometrical information.

### Pros and cons
Folium is easy to use and can provide popups but, in my opinion, offers too little control over colorbar customisation. Bokeh, on the other hand, offers extensive customisation tools but is harder to get the hang of.

In [8]:
visualise_world_data_folium(food_crop_production_df,1993)

In [9]:
show(visualise_world_data_bokeh(food_crop_production_df[food_crop_production_df.Year==1993][['ID','Value']].groupby('ID').sum()))
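
The aggregation passed to the bokeh plot above (filter one year, keep `ID` and `Value`, sum per country) may be easier to read and reuse as a small helper. The sketch below uses a hypothetical name, `total_by_country`, and toy rows; it only assumes the `ID`, `Year` and `Value` columns used in the cell above.

```python
import pandas as pd

def total_by_country(df, year):
    """Hypothetical helper: sum Value per country ID for a single year."""
    one_year = df.loc[df["Year"] == year, ["ID", "Value"]]
    return one_year.groupby("ID").sum()

# Toy rows, not taken from the FAO file.
toy = pd.DataFrame({
    "ID": ["AUS", "AUS", "BEL", "AUS"],
    "Year": [1993, 1993, 1993, 1994],
    "Value": [1.0, 2.0, 4.0, 8.0],
})

totals = total_by_country(toy, 1993)
print(totals)
```

With such a helper, the last cell could be written as `show(visualise_world_data_bokeh(total_by_country(food_crop_production_df,1993)))`.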