# SOIL & FOOD DATA - So what and what now?

## Data Sources

- **The Global Soil Dataset for Earth System Modeling** Soil Organic Carbon Density dataset at 5 minute resolution
    - Land-Atmosphere Interaction Research Group at Sun Yat-sen University
        - http://globalchange.bnu.edu.cn/research/soilwd.jsp
- **FAOSTAT** Trade: Crops and livestock products | Trade: Detailed trade matrix | Production: Crops and livestock products
    - Food and Agriculture Organization of the United Nations
        - https://www.fao.org/faostat/en/#data/TCL
        - https://www.fao.org/faostat/en/#data/TM
        - https://www.fao.org/faostat/en/#data/QCL
        
## Non-geographical Plotting

I'll pull in the Soil Organic Carbon Density dataset I prepared earlier, and load the food production and trade datasets to work with alongside it.

In [139]:
# # view plots inside the notebook
# %matplotlib inline  
# import package dependencies for environment
import numpy as np
import pandas as pd
# import geopandas
# import matplotlib.pyplot as plt
# import plotly.offline as pyo
# # Set notebook mode to work in offline
# pyo.init_notebook_mode()
# import plotly.io as pio
# import plotly.figure_factory as ff
# import plotly.express as px
import plotly.graph_objects as go # or plotly.express as px

In [154]:
# load the cached variables from earlier SOCD analysis
%store -r gdf2flat
# load the unique lists of depths from cache also
%store -r depths

In [155]:
# write gdf2flat to a csv file so the app build needs fewer processing steps
# for now, save it to my hack folder until I can configure storage specific to the app deployment
gdf2flat.to_csv('/Users/kathrynhurchla/Documents/hack_mylfs_GitHub_projects/gdf2flat.csv')

In [143]:
# group by depth and count group records with pandas
# shows that there are not records for all depths at all locations; 
# with the first depth containing the most
gdf2flat.groupby('depth').count()

depth,index,lon,lat,SOCD,geometry
4.5,2166784,2166784,2166784,2166784,2166784
9.1,2166633,2166633,2166633,2166633,2166633
16.6,2152719,2152719,2152719,2152719,2152719
28.9,2152662,2152662,2152662,2152662,2152662
49.299999,2152593,2152593,2152593,2152593,2152593
82.900002,2151635,2151635,2151635,2151635,2151635
138.300003,2147396,2147396,2147396,2147396,2147396
229.600006,1672027,1672027,1672027,1672027,1672027


In [156]:
# take only the 4.5 depth records
gdf2flatsurface = gdf2flat[gdf2flat['depth'] == 4.5]

In [157]:
# write a CSV of only the 4.5 depth, to start with or in case the app is too slow
gdf2flatsurface.to_csv('/Users/kathrynhurchla/Documents/hack_mylfs_GitHub_projects/gdf2flatsurface.csv')

In [145]:
# is the count the same as the group done earlier for this depth?
gdf2flatsurface['index'].count()

2166784

In [None]:
# group by depth with a plotly 'groupby' transform, based on the example here: https://plotly.com/python/group-by/
# as a test; I'm not yet sure how useful this is,
# or if I need to iterate over the records still, e.g. to show a mean
# note: as far as I can tell, transforms are deprecated in recent Plotly releases, so this may need replacing
# depths contains array([  4.5       ,   9.10000038,  16.60000038,  28.89999962,
#         49.29999924,  82.90000153, 138.30000305, 229.6000061 ])
import plotly.io as pio       # not imported in the first cell, but needed for pio.show
import plotly.express as px   # used here only for the Set1 qualitative color list

data = [dict(
  type = 'scatter',
  x = gdf2flat['depth'],      # one x value per record, so x and y have the same length
  y = gdf2flat['SOCD'],
  mode = 'markers',
  marker = dict(size = 5),
  transforms = [dict(
    type = 'groupby',
    groups = gdf2flat['depth'],
    # one style per unique depth, each with its own Set1 color
    styles = [
        dict(target = d, value = dict(marker = dict(color = c)))
        for d, c in zip(depths, px.colors.qualitative.Set1)
    ]
  )]
)]

fig_dict = dict(data=data)
pio.show(fig_dict, validate=False)

In [146]:
# check my working directory
!pwd

/Users/kathrynhurchla/Documents/GitHub/sustain-our-soil-for-our-food/analysis


In [None]:
# look for the file path of the trade file
!ls ../data

In [None]:
# check the file name of trade file (path is quoted because it contains parentheses)
!ls "../data/Trade_CropsLivestock_E_All_Data_(Normalized)/Trade_Crops_Livestock_E_All_Data_(Normalized).csv"

In [None]:
# load in the food trade data from git repository origin directory
# dftrade = pd.read_csv('../data/Trade_CropsLivestock_E_All_Data_(Normalized)/Trade_Crops_Livestock_E_All_Data_(Normalized).csv')

In [None]:
# view top of the dataframe
# dftrade.head()
# unfortunately my Git LFS (large file storage) was shut down and no longer serves the file;
# until I can correct that or take the repository offline, I will try to load the file from elsewhere

In [None]:
!ls /Users/kathrynhurchla/Documents/hack_mylfs_GitHub_projects

In [None]:
# # load in the food trade data copy freshly downloaded from an alternate directory
# # adding , encoding = "ISO-8859-1" to resolve "UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe9 in position 158927: invalid continuation byte"
# # alternately use the alias 'latin' for encoding
# dftrade = pd.read_csv('/Users/kathrynhurchla/Documents/hack_mylfs_GitHub_projects/Trade_Crops_Livestock_E_All_Data_(Normalized).csv', encoding = "ISO-8859-1")

In [None]:
# # view a sample top/bottom of the dataframe
# dftrade

In [None]:
# # view the column variables
# dftrade.columns

In [147]:
# but ideally what I want is to see which country exports to which country, in pairs in a record
# load in the food trade detailed matrix copy freshly downloaded from https://www.fao.org/faostat/en/#data/TM to an alternate directory
# adding , encoding = "ISO-8859-1" to resolve "UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf4 in position 38698: invalid continuation byte"
# alternately use the alias 'latin' for encoding
dftrade_mx = pd.read_csv('/Users/kathrynhurchla/Documents/hack_mylfs_GitHub_projects/Trade_DetailedTradeMatrix_E_All_Data_(Normalized).csv', encoding = "ISO-8859-1")
dftrade_mx

Unnamed: 0,Reporter Country Code,Reporter Countries,Partner Country Code,Partner Countries,Item Code,Item,Element Code,Element,Year Code,Year,Unit,Value,Flag
0,2,Afghanistan,4,Algeria,230,"Cashew nuts, shelled",5910,Export Quantity,2016,2016,tonnes,3.0,*
1,2,Afghanistan,4,Algeria,230,"Cashew nuts, shelled",5922,Export Value,2016,2016,1000 US$,23.0,*
2,2,Afghanistan,4,Algeria,1293,Crude materials,5922,Export Value,2015,2015,1000 US$,1.0,*
3,2,Afghanistan,4,Algeria,1293,Crude materials,5922,Export Value,2016,2016,1000 US$,1.0,*
4,2,Afghanistan,4,Algeria,1293,Crude materials,5922,Export Value,2017,2017,1000 US$,5.0,R
...,...,...,...,...,...,...,...,...,...,...,...,...,...
39473947,181,Zimbabwe,181,Zimbabwe,826,"Tobacco, unmanufactured",5622,Import Value,1986,1986,1000 US$,571.0,
39473948,181,Zimbabwe,181,Zimbabwe,826,"Tobacco, unmanufactured",5622,Import Value,1990,1990,1000 US$,5.0,
39473949,181,Zimbabwe,181,Zimbabwe,826,"Tobacco, unmanufactured",5622,Import Value,1991,1991,1000 US$,223.0,
39473950,181,Zimbabwe,181,Zimbabwe,565,Vermouths & similar,5610,Import Quantity,1986,1986,tonnes,1.0,
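
The `UnicodeDecodeError` quoted in the cell above comes from bytes such as `0xe9` or `0xf4`, which are accented letters in ISO-8859-1 but are not valid on their own in UTF-8. A small sketch reproducing this with a throwaway file; the file name and the two rows are invented for the demonstration and are not FAOSTAT data:

```python
import os
import tempfile
import pandas as pd

# write a tiny CSV encoded as ISO-8859-1 (Latin-1); 'ô' is stored as the single byte 0xf4
path = os.path.join(tempfile.mkdtemp(), 'latin1_demo.csv')
with open(path, 'w', encoding='ISO-8859-1') as f:
    f.write("Country,Value\nCôte d'Ivoire,1\nRéunion,2\n")

# reading with the default UTF-8 decoding fails on those bytes
try:
    pd.read_csv(path)
    utf8_failed = False
except UnicodeDecodeError:
    utf8_failed = True

# naming the encoding explicitly decodes the accented names correctly
df_demo = pd.read_csv(path, encoding='ISO-8859-1')
```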


In [None]:
# view a unique list of the element codes/elements
# The input to this function needs to be one-dimensional, so multiple columns will need to be combined.
# select the values and then view them in a flattened numpy array
pd.unique(dftrade_mx[['Element Code','Element']].values.ravel('K'))

In [None]:
# find the value of Element Code for the 'Export Quantity' element
export_qty_code = dftrade_mx.loc[dftrade_mx['Element'] == 'Export Quantity', 'Element Code'].iloc[0]
print('Export Quantity = Element Code:', export_qty_code)
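
Flattening both columns with `ravel` mixes codes and names into one array, so the code/name pairing is lost. A possible alternative is `drop_duplicates`, sketched here on a small stand-in frame that reuses the two column names and the element codes visible in the `dftrade_mx` output above; `demo_mx` and `element_key` are hypothetical names:

```python
import pandas as pd

# small stand-in for dftrade_mx with only the two relevant columns
demo_mx = pd.DataFrame({
    'Element Code': [5910, 5922, 5922, 5622, 5610, 5910],
    'Element': ['Export Quantity', 'Export Value', 'Export Value',
                'Import Value', 'Import Quantity', 'Export Quantity'],
})

# one row per distinct (code, name) pair, kept side by side
element_key = (
    demo_mx[['Element Code', 'Element']]
    .drop_duplicates()
    .reset_index(drop=True)
)

# look up a code by its element name from the key table
export_qty = int(
    element_key.loc[element_key['Element'] == 'Export Quantity', 'Element Code'].iloc[0]
)
```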

In [149]:
# view the unique combinations of reporter country codes and names
# this code is referred to as Country Code (and/or Country Group Code for 5100+) in the Definitions and standards
# on the FAO website at https://www.fao.org/faostat/en/#data/QCL
# in this trade matrix the reporter codes end at 273, so no country groupings appear in the output
# note FAO provides a downloadable key file of this Country Code with ISO2, ISO3, and M49 codes for each country,
# should I need it for any linkage
dftrade_mx.groupby(['Reporter Country Code','Reporter Countries']).size()

Reporter Country Code  Reporter Countries 
1                      Armenia                 84671
2                      Afghanistan             21652
3                      Albania                 77308
4                      Algeria                 97692
8                      Antigua and Barbuda     32132
                                               ...  
251                    Zambia                  77887
255                    Belgium                719112
256                    Luxembourg             155042
272                    Serbia                 155510
273                    Montenegro              71640
Length: 180, dtype: int64

In [150]:
# filter for just the 'Export Quantity' rows by the element code identified earlier
dftrade_mx_xq = dftrade_mx[dftrade_mx['Element Code'] == 5910]
dftrade_mx_xq

Unnamed: 0,Reporter Country Code,Reporter Countries,Partner Country Code,Partner Countries,Item Code,Item,Element Code,Element,Year Code,Year,Unit,Value,Flag
0,2,Afghanistan,4,Algeria,230,"Cashew nuts, shelled",5910,Export Quantity,2016,2016,tonnes,3.0,*
5,2,Afghanistan,4,Algeria,561,Raisins,5910,Export Quantity,2014,2014,tonnes,12.0,*
7,2,Afghanistan,4,Algeria,723,Spices nes,5910,Export Quantity,2014,2014,tonnes,0.0,*
33,2,Afghanistan,1,Armenia,537,Plums dried (prunes),5910,Export Quantity,2019,2019,tonnes,0.0,*
35,2,Afghanistan,1,Armenia,561,Raisins,5910,Export Quantity,2017,2017,tonnes,1.0,R
...,...,...,...,...,...,...,...,...,...,...,...,...,...
39473867,181,Zimbabwe,251,Zambia,892,"Yoghurt, concentrated or not",5910,Export Quantity,1998,1998,tonnes,1.0,
39473868,181,Zimbabwe,251,Zambia,892,"Yoghurt, concentrated or not",5910,Export Quantity,2015,2015,tonnes,2.0,
39473869,181,Zimbabwe,251,Zambia,892,"Yoghurt, concentrated or not",5910,Export Quantity,2019,2019,tonnes,76.0,
39473890,181,Zimbabwe,181,Zimbabwe,1293,Crude materials,5910,Export Quantity,1998,1998,tonnes,0.0,
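
To get closer to one record per exporter/partner pair, the filtered rows could be summed per pair. A minimal sketch, assuming columns named as in `dftrade_mx_xq`; `demo_xq` holds a handful of rows copied from the output above, and adding tonnes across different items and years is a simplification that may or may not suit the analysis:

```python
import pandas as pd

# a few 'Export Quantity' rows taken from the output above
demo_xq = pd.DataFrame({
    'Reporter Countries': ['Afghanistan', 'Afghanistan', 'Afghanistan', 'Zimbabwe'],
    'Partner Countries': ['Algeria', 'Algeria', 'Armenia', 'Zambia'],
    'Item': ['Cashew nuts, shelled', 'Raisins', 'Raisins', 'Yoghurt, concentrated or not'],
    'Value': [3.0, 12.0, 1.0, 76.0],
})

# total tonnes per (exporter, partner) pair, largest flows first
pair_totals = (
    demo_xq
    .groupby(['Reporter Countries', 'Partner Countries'], as_index=False)['Value']
    .sum()
    .sort_values('Value', ascending=False)
    .reset_index(drop=True)
)
```

Filtering to a single `Item` or `Year` before grouping would keep the totals comparable.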


### Plot food export partners matrix

Now that I have a dataset showing where food comes from and where it's exported to, I'll see whether I can show this visually.

In [152]:
# using Plotly Graph Objects (go), plot lines on a map
# based on an example at https://plotly.com/python/lines-on-maps/
# world scope with locations by country names (collect ISO-3 codes if names don't work well, i.e. leave gaps)
# dftrade_mx_xq for paths
# see for projection_type options: https://plotly.com/python/reference/layout/geo/#layout-geo-projection-type

# fig = go.Figure()

# fig.add_trace(go.Scattergeo(
#     locationmode = 'country names',
#     locations = dftrade_mx_xq['Reporter Countries'],
#     hoverinfo = 'text',
# #     # string concatenation in pandas for hover text
# #     # also a <br> within quotes can put that data on a new line in the hover text optionally
# #     text = dftrade_mx_xq['Reporter Countries'].astype(str) + " exported " +  dftrade_mx_xq["Value"].astype(str) + " " + dftrade_mx_xq["Unit"].astype(str) + " of " + dftrade_mx_xq["Item"].astype(str) + " to " + dftrade_mx_xq["Partner Countries"].astype(str) + " in " + dftrade_mx_xq["Year"].astype(str),
#     text = dftrade_mx_xq["Item"],
#     mode = 'markers',
#     marker = dict(
#         size = 2,
#         color = 'rgb(255, 0, 0)',
#         line = dict(
#             width = 3,
#             color = 'rgba(68, 68, 68, 0)'
#         )
#     )))

# fig.add_trace(
#     go.Scattergeo(
#         locationmode = 'country names',
# #         hoverinfo = 'text',
# #         text = dftrade_mx_xq['Item'],
#         mode = 'lines',
#         line = dict(width = 1,color = 'red'),
#         opacity = 0.5
#     )
# )

# fig.update_layout(
#     title_text = 'Food Trade<br>(Hover for item exported)',
#     showlegend = False,
#     geo = go.layout.Geo(
#         scope = 'world',
#         projection_type = 'winkel tripel',
#         showland = True,
#         landcolor = 'rgb(243, 243, 243)',
#         countrycolor = 'rgb(204, 204, 204)',
#     ),
#     height=700,
# )

# fig.show()

In [None]:
# next step is to try with gdf2flatsurface, which I have lat long values for;
# for now this cell still plots dftrade_mx_xq by country name
fig = go.Figure()

fig.add_trace(go.Scattergeo(
    locationmode = 'country names',
    locations = dftrade_mx_xq['Reporter Countries'],
    hoverinfo = 'text',
#     # string concatenation in pandas for hover text
#     # also a <br> within quotes can put that data on a new line in the hover text optionally
#     text = dftrade_mx_xq['Reporter Countries'].astype(str) + " exported " +  dftrade_mx_xq["Value"].astype(str) + " " + dftrade_mx_xq["Unit"].astype(str) + " of " + dftrade_mx_xq["Item"].astype(str) + " to " + dftrade_mx_xq["Partner Countries"].astype(str) + " in " + dftrade_mx_xq["Year"].astype(str),
    text = dftrade_mx_xq["Item"],
    mode = 'markers',
    marker = dict(
        size = 2,
        color = 'rgb(255, 0, 0)',
        line = dict(
            width = 3,
            color = 'rgba(68, 68, 68, 0)'
        )
    )))

# placeholder for the trade paths: this trace has no locations or lon/lat yet, so nothing is drawn
fig.add_trace(
    go.Scattergeo(
        locationmode = 'country names',
#         hoverinfo = 'text',
#         text = dftrade_mx_xq['Item'],
        mode = 'lines',
        line = dict(width = 1,color = 'red'),
        opacity = 0.5
    )
)

fig.update_layout(
    title_text = 'Food Trade<br>(Hover for item exported)',
    showlegend = False,
    geo = go.layout.Geo(
        scope = 'world',
        projection_type = 'winkel tripel',
        showland = True,
        landcolor = 'rgb(243, 243, 243)',
        countrycolor = 'rgb(204, 204, 204)',
    ),
    height=700,
)

fig.show()

In [151]:
# run through a standalone (within this single cell) test with Dash
# for a web app to build outside of jupyter notebook
import plotly.graph_objects as go # or plotly.express as px
# Dash 2+ bundles dcc and html; the separate dash_core_components / dash_html_components packages are deprecated
from dash import Dash, dcc, html

fig = go.Figure() # or any Plotly Express function e.g. px.bar(...)
# fig.add_trace( ... )      # placeholder: add traces here
# fig.update_layout( ... )  # placeholder: set layout options here

app = Dash(__name__)
app.layout = html.Div([
    dcc.Graph(figure=fig)
])

# Turn off reloader if inside Jupyter; newer Dash releases prefer app.run(...) over app.run_server(...)
app.run_server(debug=True, use_reloader=False)