####  World Agricultural Outlook (2000-2019)
##### (Part-1 / Exploratory Data Analysis)

Objective:

    1. This study is based on the OECD database.
    2. The analysis covers the 20 years from 2000 to 2019.

    3. The main aim is to examine whether world agricultural and livestock production is keeping pace with the growing population.

    4. A second aim is to use the data to evaluate hypotheses about the correlation between global warming and cattle breeding.
    
    5. A third aim is to compare countries' production and consumption of agricultural and livestock products.

OECD Source Link: https://stats.oecd.org

#### Import Libraries

In [3]:
import pandas as pd
import numpy as np
import array as arr

import matplotlib
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns

from scipy import stats

#### Read the CSV Files and Inspect Them

In [27]:
wa = pd.read_csv("world_agricultore.csv")

In [84]:
wp = pd.read_csv("world_population_total_bysex.csv")

In [85]:
cy = pd.read_csv("country_year.csv")

In [86]:
wa.head(2)

Unnamed: 0,COUNTRY,Country,COMMODITY,Commodity,VARIABLE,Variable,TIME,Time,Value,Flag Codes,Flags
0,WLD,World,WT,Wheat,QP,"Production, kt (for Biofuels in millions of li...",2000,2000,585705.5797,,
1,WLD,World,WT,Wheat,QP,"Production, kt (for Biofuels in millions of li...",2001,2001,588610.4753,,


In [87]:
wp.head(2)

Unnamed: 0,LocID,Location,VarID,Variant,Time,MidPeriod,PopMale,PopFemale,PopTotal,PopDensity
0,4,Afghanistan,2,Medium,1950,1950.5,4099.243,3652.874,7752.117,11.874
1,4,Afghanistan,2,Medium,1951,1951.5,4134.756,3705.395,7840.151,12.009


In [88]:
cy.head(2)

Unnamed: 0,wa_all,wa_country,wa_org,wp_all,wp_country,wp_org,years,abbrv,meaning
0,World,Australia,World,World,Australia,World,2000.0,QP,Production
1,OECD,Canada,OECD,Organisation for Economic Co-operation and Dev...,Canada,Organisation for Economic Co-operation and Dev...,2001.0,IM,Imports


In [89]:
wa.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 197739 entries, 0 to 197738
Data columns (total 11 columns):
 #   Column      Non-Null Count   Dtype  
---  ------      --------------   -----  
 0   COUNTRY     197739 non-null  object 
 1   Country     197739 non-null  object 
 2   COMMODITY   197739 non-null  object 
 3   Commodity   197739 non-null  object 
 4   VARIABLE    197739 non-null  object 
 5   Variable    197739 non-null  object 
 6   TIME        197739 non-null  int64  
 7   Time        197739 non-null  int64  
 8   Value       197739 non-null  float64
 9   Flag Codes  0 non-null       float64
 10  Flags       0 non-null       float64
dtypes: float64(3), int64(2), object(6)
memory usage: 16.6+ MB


In [90]:
wp.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 280932 entries, 0 to 280931
Data columns (total 10 columns):
 #   Column      Non-Null Count   Dtype  
---  ------      --------------   -----  
 0   LocID       280932 non-null  int64  
 1   Location    280932 non-null  object 
 2   VarID       280932 non-null  int64  
 3   Variant     280932 non-null  object 
 4   Time        280932 non-null  int64  
 5   MidPeriod   280932 non-null  float64
 6   PopMale     250876 non-null  float64
 7   PopFemale   250876 non-null  float64
 8   PopTotal    280932 non-null  float64
 9   PopDensity  280932 non-null  float64
dtypes: float64(5), int64(3), object(2)
memory usage: 21.4+ MB


In [91]:
cy.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 52 entries, 0 to 51
Data columns (total 9 columns):
 #   Column      Non-Null Count  Dtype  
---  ------      --------------  -----  
 0   wa_all      52 non-null     object 
 1   wa_country  41 non-null     object 
 2   wa_org      10 non-null     object 
 3   wp_all      52 non-null     object 
 4   wp_country  41 non-null     object 
 5   wp_org      10 non-null     object 
 6   years       20 non-null     float64
 7   abbrv       19 non-null     object 
 8   meaning     19 non-null     object 
dtypes: float64(1), object(8)
memory usage: 3.8+ KB


In [92]:
wa.shape

(197739, 11)

In [93]:
wp.shape

(280932, 10)

In [94]:
cy.shape

(52, 9)

In [95]:
wa.describe()

Unnamed: 0,TIME,Time,Value,Flag Codes,Flags
count,197739.0,197739.0,197739.0,0.0,0.0
mean,2009.592554,2009.592554,12759.99,,
std,5.770262,5.770262,99002.62,,
min,2000.0,2000.0,-102730.3,,
25%,2005.0,2005.0,1.4,,
50%,2010.0,2010.0,85.43534,,
75%,2015.0,2015.0,1550.98,,
max,2019.0,2019.0,8781220.0,,


In [96]:
wp.describe()

Unnamed: 0,LocID,VarID,Time,MidPeriod,PopMale,PopFemale,PopTotal,PopDensity
count,280932.0,280932.0,280932.0,280932.0,250876.0,250876.0,280932.0,280932.0
mean,703.125062,22.411345,2051.026494,2051.526494,232436.2,229875.4,412853.6,426.65904
std,631.959759,55.972847,33.642148,33.642148,693883.7,682982.5,1308911.0,2418.990784
min,4.0,2.0,1950.0,1950.5,6.812,6.889,0.151,0.052
25%,300.0,2.0,2031.0,2031.5,1847.834,1860.604,1240.1,34.20525
50%,586.0,5.0,2055.0,2055.5,11368.49,11645.85,14205.55,91.59
75%,903.0,9.0,2078.0,2078.5,85264.02,86488.54,117166.1,216.98575
max,5501.0,207.0,2100.0,2100.5,10920000.0,10712730.0,21632740.0,56025.839


In [97]:
cy.describe()

Unnamed: 0,years
count,20.0
mean,2009.5
std,5.91608
min,2000.0
25%,2004.75
50%,2009.5
75%,2014.25
max,2019.0


#### Creating Lists from cy dataset

In [98]:
cy.head(1)

Unnamed: 0,wa_all,wa_country,wa_org,wp_all,wp_country,wp_org,years,abbrv,meaning
0,World,Australia,World,World,Australia,World,2000.0,QP,Production


In [99]:
wa_all_list = list(cy.wa_all.unique())
wa_country_list = list(cy.wa_country.unique())
wa_org_list = list(cy.wa_org.unique())
wp_all_list = list(cy.wp_all.unique())
wp_country_list = list(cy.wp_country.unique())
wp_org_list = list(cy.wp_org.unique())
years_list = list(cy.years.unique())
abbr_list = list(cy.abbrv.unique())
mean_list = list(cy.meaning.unique())

In [100]:
wa_country_list.pop(-1)
wa_org_list.pop(-1)
wp_country_list.pop(-1)
wp_org_list.pop(-1)
years_list.pop(-1)
abbr_list.pop(-1)
mean_list.pop(-1)

nan
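
The `pop(-1)` calls above work only when the `NaN` padding is the last element of each list. A minimal sketch of a more position-independent alternative follows, assuming a toy frame (`toy_cy`, with made-up contents) that stands in for `cy`; calling `dropna()` before `unique()` removes the padding wherever it occurs.

```python
import numpy as np
import pandas as pd

# Toy stand-in for cy: shorter columns are padded with NaN, as in the real file.
toy_cy = pd.DataFrame({
    "wa_all": ["World", "OECD", "Australia"],
    "wa_org": ["World", "OECD", np.nan],
    "years": [2000.0, np.nan, np.nan],
})

# dropna() removes the padding wherever it occurs, so no pop(-1) is needed.
wa_org_demo = toy_cy["wa_org"].dropna().unique().tolist()
years_demo = toy_cy["years"].dropna().astype(int).unique().tolist()

print(wa_org_demo)
print(years_demo)
```

This variant also converts the years to integers in the same step, which is optional here.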

In [101]:
wp_org_list

['World',
 'Organisation for Economic Co-operation and Development (OECD)',
 'More developed regions',
 'ESCAP: ADB Developing member countries (DMCs)',
 'ESCAP: Least Developed Countries (LDCs)',
 'European Union (EU: 28)',
 'WHO: European Region (EURO)',
 'Africa',
 'Latin America and the Caribbean',
 'Asia']

#### Drop wa & wp Columns

In [102]:
wa.head(1)

Unnamed: 0,COUNTRY,Country,COMMODITY,Commodity,VARIABLE,Variable,TIME,Time,Value,Flag Codes,Flags
0,WLD,World,WT,Wheat,QP,"Production, kt (for Biofuels in millions of li...",2000,2000,585705.5797,,


In [103]:
wa.drop(columns=["Flag Codes","Flags","COUNTRY","COMMODITY","TIME"], inplace=True)

In [104]:
wa.head(1)

Unnamed: 0,Country,Commodity,VARIABLE,Variable,Time,Value
0,World,Wheat,QP,"Production, kt (for Biofuels in millions of li...",2000,585705.5797


In [105]:
wp.head(1)

Unnamed: 0,LocID,Location,VarID,Variant,Time,MidPeriod,PopMale,PopFemale,PopTotal,PopDensity
0,4,Afghanistan,2,Medium,1950,1950.5,4099.243,3652.874,7752.117,11.874


In [106]:
wp.drop(columns=["LocID","VarID","Variant","MidPeriod","PopDensity"], inplace=True)

In [107]:
wp.head(1)

Unnamed: 0,Location,Time,PopMale,PopFemale,PopTotal
0,Afghanistan,1950,4099.243,3652.874,7752.117


#### Rename the wa & wp Columns

In [108]:
wa.rename(columns = {'VARIABLE':'Var_abbr', 'Variable':'Var_expl'}, inplace = True)
wp.rename(columns = {'Location':'Country'}, inplace = True)

#### Insert & Reorder Population Columns in wa

In [109]:
wa.head(1)

Unnamed: 0,Country,Commodity,Var_abbr,Var_expl,Time,Value
0,World,Wheat,QP,"Production, kt (for Biofuels in millions of li...",2000,585705.5797


In [110]:
wa['Var_mean'] = wa['Var_abbr']
wa['PopTotal'] = wa['Country']
wa['PopMale'] = wa['Country']
wa['PopFemale'] = wa['Country']

In [111]:
wa.head(1)

Unnamed: 0,Country,Commodity,Var_abbr,Var_expl,Time,Value,Var_mean,PopTotal,PopMale,PopFemale
0,World,Wheat,QP,"Production, kt (for Biofuels in millions of li...",2000,585705.5797,QP,World,World,World


In [112]:
wa = wa.iloc[:, [0,1,2,6,3,4,5,7,8,9]]

In [113]:
wa.head(1)

Unnamed: 0,Country,Commodity,Var_abbr,Var_mean,Var_expl,Time,Value,PopTotal,PopMale,PopFemale
0,World,Wheat,QP,QP,"Production, kt (for Biofuels in millions of li...",2000,585705.5797,World,World,World


#### Reorder the wp Columns

In [114]:
wp.head(1)

Unnamed: 0,Country,Time,PopMale,PopFemale,PopTotal
0,Afghanistan,1950,4099.243,3652.874,7752.117


In [115]:
wp = wp.iloc[:, [0,1,4,2,3]]

In [116]:
wp.head(1)

Unnamed: 0,Country,Time,PopTotal,PopMale,PopFemale
0,Afghanistan,1950,7752.117,4099.243,3652.874


#### Insert the Meanings of Var_abbr to Var_mean

In [117]:
wa.head(2)

Unnamed: 0,Country,Commodity,Var_abbr,Var_mean,Var_expl,Time,Value,PopTotal,PopMale,PopFemale
0,World,Wheat,QP,QP,"Production, kt (for Biofuels in millions of li...",2000,585705.5797,World,World,World
1,World,Wheat,QP,QP,"Production, kt (for Biofuels in millions of li...",2001,588610.4753,World,World,World


In [118]:
var_mean_dict = dict(zip(abbr_list, mean_list))
var_mean_dict

{'QP': 'Production',
 'IM': 'Imports',
 'QC': 'Consumption',
 'ST': 'Ending_stocks',
 'EX': 'Exports',
 'NT': 'Trade_balance',
 'AH': 'Area_harvested',
 'FE': 'Feed',
 'FO': 'Food',
 'BF': 'Biofuel_use',
 'OU': 'Other_use',
 'YLD': 'Yield',
 'XP': 'World_Price',
 'PC': 'Consumption_per_cap',
 'PP': 'Producer_price',
 'CR': 'Crush',
 'CI': 'Cow_inventory',
 'QP__BME': 'Ethanol_Bio',
 'QP__BMD': 'Biodiesel_Bio'}

In [119]:
# Assign the result back instead of calling replace(inplace=True) on a column selection
wa["Var_mean"] = wa["Var_mean"].replace(var_mean_dict)

In [120]:
wa.head(2)

Unnamed: 0,Country,Commodity,Var_abbr,Var_mean,Var_expl,Time,Value,PopTotal,PopMale,PopFemale
0,World,Wheat,QP,Production,"Production, kt (for Biofuels in millions of li...",2000,585705.5797,World,World,World
1,World,Wheat,QP,Production,"Production, kt (for Biofuels in millions of li...",2001,588610.4753,World,World,World
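
The relabelling of `Var_mean` can also be sketched with `Series.map`. The example below uses a toy series and a shortened dictionary (both illustrative, including the made-up code `"ZZ"`); unlike `replace`, `map` returns `NaN` for codes that have no entry, which makes gaps in the dictionary visible.

```python
import pandas as pd

codes = pd.Series(["QP", "IM", "QC", "ZZ"])  # "ZZ" is a made-up code with no entry
labels = {"QP": "Production", "IM": "Imports", "QC": "Consumption"}

# map() returns NaN for codes missing from the dict, which makes gaps easy to spot;
# replace() would leave "ZZ" unchanged instead.
mapped = codes.map(labels)
unmapped = codes[mapped.isna()].tolist()

print(mapped.tolist())
print(unmapped)
```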


#### Drop the wp values before the year 2000 

In [121]:
wp = wp[wp['Time'] >= 2000] 

In [122]:
wp.head(1)

Unnamed: 0,Country,Time,PopTotal,PopMale,PopFemale
50,Afghanistan,2000,20779.957,10689.508,10090.449


#### Select only the countries shared with wa from wp into wp1

In [123]:
wp.Country.unique()

array(['Afghanistan', 'Africa', 'African Group', 'African Union',
       'African Union: Central Africa', 'African Union: Eastern Africa',
       'African Union: Northern Africa', 'African Union: Southern Africa',
       'African Union: Western Africa',
       'African, Caribbean and Pacific (ACP) Group of States', 'Albania',
       'Algeria', 'American Samoa', 'Andean Community', 'Andorra',
       'Angola', 'Anguilla', 'Antigua and Barbuda', 'Argentina',
       'Armenia', 'Aruba', 'Asia',
       'Asia-Pacific Economic Cooperation (APEC)', 'Asia-Pacific Group',
       'Association of Southeast Asian Nations (ASEAN)', 'Australia',
       'Australia/New Zealand', 'Austria', 'Azerbaijan', 'BRIC', 'BRICS',
       'Bahamas', 'Bahrain', 'Bangladesh', 'Barbados', 'Belarus',
       'Belgium', 'Belize', 'Belt-Road Initiative (BRI)',
       'Belt-Road Initiative: Africa', 'Belt-Road Initiative: Asia',
       'Belt-Road Initiative: Europe',
       'Belt-Road Initiative: Latin America and the Cari

In [124]:
wp_all_list

['World',
 'Organisation for Economic Co-operation and Development (OECD)',
 'More developed regions',
 'ESCAP: ADB Developing member countries (DMCs)',
 'ESCAP: Least Developed Countries (LDCs)',
 'Australia',
 'Canada',
 'Chile',
 'European Union (EU: 28)',
 'Israel',
 'Japan',
 "Dem. People's Republic of Korea",
 'Mexico',
 'New Zealand',
 'Turkey',
 'United States of America',
 'WHO: European Region (EURO)',
 'Kazakhstan',
 'Russian Federation',
 'Ukraine',
 'Africa',
 'Algeria',
 'Egypt',
 'Ethiopia',
 'Ghana',
 'Mozambique',
 'Nigeria',
 'Sub-Saharan Africa',
 'South Africa',
 'Sudan',
 'United Republic of Tanzania',
 'Zambia',
 'Latin America and the Caribbean',
 'Argentina',
 'Brazil',
 'Colombia',
 'Haiti',
 'Peru',
 'Paraguay',
 'Uruguay',
 'Asia',
 'Bangladesh',
 'China',
 'India',
 'Indonesia',
 'Iran (Islamic Republic of)',
 'Malaysia',
 'Pakistan',
 'Philippines',
 'Saudi Arabia',
 'Thailand',
 'Viet Nam']

In [125]:
wp.head(1)

Unnamed: 0,Country,Time,PopTotal,PopMale,PopFemale
50,Afghanistan,2000,20779.957,10689.508,10090.449


In [126]:
mask = wp.Country.isin(wp_all_list)

In [127]:
wp[mask].sample(5)

Unnamed: 0,Country,Time,PopTotal,PopMale,PopFemale
198325,Philippines,2092,197675.167,97407.498,100267.669
237760,Sub-Saharan Africa,2060,2812687.255,1405158.817,1407528.438
277577,World,2039,9460502.153,4763627.207,4696874.946
269917,Viet Nam,2029,101616.102,50629.531,50986.571
169647,Mozambique,2081,130397.599,64028.999,66368.6


In [128]:
wp1 = wp.loc[mask, ["Country","Time","PopTotal","PopMale","PopFemale"]]

In [129]:
wp1.sample(5)

Unnamed: 0,Country,Time,PopTotal,PopMale,PopFemale
116618,India,2044,1649400.593,852815.305,796585.288
19788,Bangladesh,2075,135430.293,66524.251,68889.487
3968,Algeria,2058,64321.49,32496.065,31825.425
12017,Asia,2000,3741263.352,1912363.401,1828899.951
48034,China,2079,1510046.121,776768.49,733277.631


In [130]:
wp1[wp1.Country=="African, Caribbean and Pacific (ACP) Group of States"]

Unnamed: 0,Country,Time,PopTotal,PopMale,PopFemale


In [131]:
wp.shape

(257082, 5)

In [132]:
wp1.shape

(40537, 5)

In [133]:
wp1.Country.nunique()

52
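
Comparing `nunique()` with the length of the list confirms that every requested name was found. When the counts differ, a set difference shows which names are missing. The sketch below assumes toy data (`toy_wp` and `wanted` are illustrative names, and `"Vietnam"` is a deliberate mismatch).

```python
import pandas as pd

toy_wp = pd.DataFrame({
    "Country": ["World", "Africa", "Albania", "Viet Nam"],
    "Time": [2000, 2000, 2000, 2000],
})
wanted = ["World", "Viet Nam", "Vietnam"]  # "Vietnam" does not match the spelling in toy_wp

mask = toy_wp["Country"].isin(wanted)
# Names requested but never found point to spelling differences between sources.
missing = sorted(set(wanted) - set(toy_wp.loc[mask, "Country"]))
print(missing)
```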

#### Rename the shared country names in wp1 to match wa

In [134]:
wp1.sample(5)

Unnamed: 0,Country,Time,PopTotal,PopMale,PopFemale
159875,Mexico,2070,167987.914,82621.674,85366.24
150722,Malaysia,2081,49005.466,25027.815,23977.651
237811,Sub-Saharan Africa,2030,1370019.451,684353.664,685665.787
118238,Iran (Islamic Republic of),2058,99053.614,49743.595,49310.019
192702,Pakistan,2097,924970.358,473382.398,451587.96


In [135]:
len(wp_all_list)

52

In [136]:
len(wa_all_list)

52

In [137]:
wp_all_dict = dict(zip(wp_all_list, wa_all_list))
wp_all_dict

{'World': 'World',
 'Organisation for Economic Co-operation and Development (OECD)': 'OECD',
 'More developed regions': 'Developed',
 'ESCAP: ADB Developing member countries (DMCs)': 'Developing',
 'ESCAP: Least Developed Countries (LDCs)': 'Least Developed Countries',
 'Australia': 'Australia',
 'Canada': 'Canada',
 'Chile': 'Chile',
 'European Union (EU: 28)': 'European Union-27',
 'Israel': 'Israel',
 'Japan': 'Japan',
 "Dem. People's Republic of Korea": 'Korea',
 'Mexico': 'Mexico',
 'New Zealand': 'New Zealand',
 'Turkey': 'Turkey',
 'United States of America': 'United States',
 'WHO: European Region (EURO)': 'EUROPE',
 'Kazakhstan': 'Kazakhstan',
 'Russian Federation': 'Russia',
 'Ukraine': 'Ukraine',
 'Africa': 'AFRICA',
 'Algeria': 'Algeria',
 'Egypt': 'Egypt',
 'Ethiopia': 'Ethiopia',
 'Ghana': 'Ghana',
 'Mozambique': 'Mozambique',
 'Nigeria': 'Nigeria',
 'Sub-Saharan Africa': 'Sub Saharan Africa',
 'South Africa': 'Republic of South Africa',
 'Sudan': 'Sudan',
 'United Republ

In [138]:
wp1[wp1.Country=="Organisation for Economic Co-operation and Development (OECD)"].head(2)

Unnamed: 0,Country,Time,PopTotal,PopMale,PopFemale
191607,Organisation for Economic Co-operation and Dev...,2000,1155286.565,566714.934,588571.631
191608,Organisation for Economic Co-operation and Dev...,2001,1163475.156,570810.829,592664.327


In [139]:
# Assign the result back instead of calling replace(inplace=True) on a column selection
wp1["Country"] = wp1["Country"].replace(wp_all_dict)

In [140]:
wp1[wp1.Country=="Organisation for Economic Co-operation and Development (OECD)"].head(2)

Unnamed: 0,Country,Time,PopTotal,PopMale,PopFemale
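
The `dict(zip(wp_all_list, wa_all_list))` call pairs two lists that were each built with `unique()`, so it is correct only while both columns have no repeated names. A minimal sketch of a row-wise alternative follows, assuming a toy frame in place of `cy` (`toy_cy`, `toy_wp1` and `name_map` are illustrative names): zipping the two columns directly keeps each pair on the row where it was defined.

```python
import pandas as pd

toy_cy = pd.DataFrame({
    "wp_all": ["World", "Russian Federation", "Viet Nam"],
    "wa_all": ["World", "Russia", "Viet Nam"],
})

# Pair the two columns row by row, so each wp name maps to the wa name on the same row.
name_map = dict(zip(toy_cy["wp_all"], toy_cy["wa_all"]))

toy_wp1 = pd.DataFrame({"Country": ["Russian Federation", "World"]})
# Assigning the result back avoids relying on inplace=True on a column selection.
toy_wp1["Country"] = toy_wp1["Country"].replace(name_map)
print(toy_wp1["Country"].tolist())
```

A row-wise mapping is also easier to audit pair by pair when the two sources name the same entity differently.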


#### Drop the "OECD countries" & "Non-OECD" values from wa

In [141]:
wa[wa["Country"]=="OECD countries"].head(5)


Unnamed: 0,Country,Commodity,Var_abbr,Var_mean,Var_expl,Time,Value,PopTotal,PopMale,PopFemale
1556,OECD countries,Wheat,QP,Production,"Production, kt (for Biofuels in millions of li...",2000,269144.4766,OECD countries,OECD countries,OECD countries
1557,OECD countries,Wheat,QP,Production,"Production, kt (for Biofuels in millions of li...",2001,249739.4752,OECD countries,OECD countries,OECD countries
1558,OECD countries,Wheat,QP,Production,"Production, kt (for Biofuels in millions of li...",2002,229837.7329,OECD countries,OECD countries,OECD countries
1559,OECD countries,Wheat,QP,Production,"Production, kt (for Biofuels in millions of li...",2003,249871.4523,OECD countries,OECD countries,OECD countries
1560,OECD countries,Wheat,QP,Production,"Production, kt (for Biofuels in millions of li...",2004,281502.1956,OECD countries,OECD countries,OECD countries


In [142]:
wa.drop(list(wa[wa["Country"]=="OECD countries"].index.values), inplace=True)

In [143]:
wa[wa["Country"]=="OECD countries"]

Unnamed: 0,Country,Commodity,Var_abbr,Var_mean,Var_expl,Time,Value,PopTotal,PopMale,PopFemale


In [144]:
wa[wa["Country"]=="Non-OECD"].head(5)


Unnamed: 0,Country,Commodity,Var_abbr,Var_mean,Var_expl,Time,Value,PopTotal,PopMale,PopFemale
528,Non-OECD,Wheat,QP,Production,"Production, kt (for Biofuels in millions of li...",2000,316561.1031,Non-OECD,Non-OECD,Non-OECD
529,Non-OECD,Wheat,QP,Production,"Production, kt (for Biofuels in millions of li...",2001,338871.0001,Non-OECD,Non-OECD,Non-OECD
530,Non-OECD,Wheat,QP,Production,"Production, kt (for Biofuels in millions of li...",2002,343659.8801,Non-OECD,Non-OECD,Non-OECD
531,Non-OECD,Wheat,QP,Production,"Production, kt (for Biofuels in millions of li...",2003,310399.0781,Non-OECD,Non-OECD,Non-OECD
532,Non-OECD,Wheat,QP,Production,"Production, kt (for Biofuels in millions of li...",2004,349839.3341,Non-OECD,Non-OECD,Non-OECD


In [145]:
wa.drop(list(wa[wa["Country"]=="Non-OECD"].index.values), inplace=True)

In [146]:
wa[wa["Country"]=="Non-OECD"]

Unnamed: 0,Country,Commodity,Var_abbr,Var_mean,Var_expl,Time,Value,PopTotal,PopMale,PopFemale
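
The two `drop` calls above can also be expressed as a single boolean filter. This is a sketch on toy data (`toy_wa`, `unwanted` and `filtered` are illustrative names), not a change to the cells above.

```python
import pandas as pd

toy_wa = pd.DataFrame({
    "Country": ["World", "OECD countries", "Non-OECD", "Japan"],
    "Value": [1.0, 2.0, 3.0, 4.0],
})

# Keep only the rows whose Country is not one of the unwanted aggregate labels.
unwanted = ["OECD countries", "Non-OECD"]
filtered = toy_wa[~toy_wa["Country"].isin(unwanted)]
print(filtered["Country"].tolist())
```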


#### Drop duplicated values if present

In [147]:
wa.duplicated(keep="last").value_counts()

False    190770
dtype: int64

In [148]:
wa.drop_duplicates(keep="last", inplace=True)

In [149]:
wa.duplicated(keep="last").value_counts()

False    190770
dtype: int64

In [150]:
wa.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 190770 entries, 0 to 197738
Data columns (total 10 columns):
 #   Column     Non-Null Count   Dtype  
---  ------     --------------   -----  
 0   Country    190770 non-null  object 
 1   Commodity  190770 non-null  object 
 2   Var_abbr   190770 non-null  object 
 3   Var_mean   190770 non-null  object 
 4   Var_expl   190770 non-null  object 
 5   Time       190770 non-null  int64  
 6   Value      190770 non-null  float64
 7   PopTotal   190770 non-null  object 
 8   PopMale    190770 non-null  object 
 9   PopFemale  190770 non-null  object 
dtypes: float64(1), int64(1), object(8)
memory usage: 16.0+ MB


#### Check for null values

In [151]:
wa.isnull().value_counts().sum()

190770
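
Note that `isnull().value_counts().sum()` adds up the counts of every distinct True/False row pattern, so it always equals the number of rows and does not show whether any nulls exist. A minimal sketch on a toy frame (`toy` is an illustrative name) shows the difference and one way to count and drop nulls.

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    "Country": ["World", "OECD", None],
    "Value": [1.0, np.nan, 3.0],
})

# value_counts() counts distinct True/False row patterns; their sum is just len(toy).
rows_counted = toy.isnull().value_counts().sum()

# Column-wise and total null counts.
nulls_per_column = toy.isnull().sum()
total_nulls = int(nulls_per_column.sum())

# dropna() returns a new frame without the rows that contain any null.
cleaned = toy.dropna()

print(rows_counted, total_nulls, len(cleaned))
```

For `wa`, the `info()` output above already shows 190770 non-null entries in every column, so there are no nulls to drop at this stage.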

#### Convert the population & year columns to numeric types

In [152]:
wa.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 190770 entries, 0 to 197738
Data columns (total 10 columns):
 #   Column     Non-Null Count   Dtype  
---  ------     --------------   -----  
 0   Country    190770 non-null  object 
 1   Commodity  190770 non-null  object 
 2   Var_abbr   190770 non-null  object 
 3   Var_mean   190770 non-null  object 
 4   Var_expl   190770 non-null  object 
 5   Time       190770 non-null  int64  
 6   Value      190770 non-null  float64
 7   PopTotal   190770 non-null  object 
 8   PopMale    190770 non-null  object 
 9   PopFemale  190770 non-null  object 
dtypes: float64(1), int64(1), object(8)
memory usage: 16.0+ MB


In [153]:
wa['PopTotal'] = pd.to_numeric(wa['PopTotal'],errors='coerce')
wa['PopMale'] = pd.to_numeric(wa['PopMale'],errors='coerce')
wa['PopFemale'] = pd.to_numeric(wa['PopFemale'],errors='coerce')

In [154]:
cy['years'] = cy['years'].fillna(0).astype(np.int64)

In [155]:
wa.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 190770 entries, 0 to 197738
Data columns (total 10 columns):
 #   Column     Non-Null Count   Dtype  
---  ------     --------------   -----  
 0   Country    190770 non-null  object 
 1   Commodity  190770 non-null  object 
 2   Var_abbr   190770 non-null  object 
 3   Var_mean   190770 non-null  object 
 4   Var_expl   190770 non-null  object 
 5   Time       190770 non-null  int64  
 6   Value      190770 non-null  float64
 7   PopTotal   0 non-null       float64
 8   PopMale    0 non-null       float64
 9   PopFemale  0 non-null       float64
dtypes: float64(4), int64(1), object(5)
memory usage: 16.0+ MB


In [156]:
cy.dtypes

wa_all        object
wa_country    object
wa_org        object
wp_all        object
wp_country    object
wp_org        object
years          int64
abbrv         object
meaning       object
dtype: object

#### Assigning population values from wp1 to wa population columns

In [157]:
wp1[wp1["Country"]=="World"].head(2)

Unnamed: 0,Country,Time,PopTotal,PopMale,PopFemale
277295,World,2000,6143493.806,3093433.858,3050059.948
277296,World,2001,6222626.531,3133601.761,3089024.77


In [158]:
wa[wa["Country"]=="World"].head(2)

Unnamed: 0,Country,Commodity,Var_abbr,Var_mean,Var_expl,Time,Value,PopTotal,PopMale,PopFemale
0,World,Wheat,QP,Production,"Production, kt (for Biofuels in millions of li...",2000,585705.5797,,,
1,World,Wheat,QP,Production,"Production, kt (for Biofuels in millions of li...",2001,588610.4753,,,


In [159]:
wp1.Country.nunique()

52

In [160]:
wa.Country.nunique()

52

In [161]:
wp1_cntry_list = list(wp1.Country.unique())

In [162]:
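def PopTotal_func (cntlist, yearlist):
    # For every (country, year) pair, copy PopTotal from wp1 into the matching wa rows.
    for a in cntlist:
        for b in yearlist:
            # Index label of the first wp1 row for this country and year
            q = list(wp1[(wp1["Country"]==a) & (wp1["Time"]==b)].index.values)[0]
            p = wp1.loc[q, "PopTotal"]
            wa.loc[(wa["Country"]==a) & (wa["Time"]==b), "PopTotal"] = p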

In [163]:
PopTotal_func(wp1_cntry_list,years_list)

In [164]:
wa[(wa.Country=="World") & (wa.Time==2006)].head(1)

Unnamed: 0,Country,Commodity,Var_abbr,Var_mean,Var_expl,Time,Value,PopTotal,PopMale,PopFemale
6,World,Wheat,QP,Production,"Production, kt (for Biofuels in millions of li...",2006,600783.4302,6623517.917,,


In [165]:
wp1[(wp1.Country=="World") & (wp1.Time==2006)].head(1)

Unnamed: 0,Country,Time,PopTotal,PopMale,PopFemale
277301,World,2006,6623517.917,3338132.929,3285384.988


In [166]:
def PopMale_func (cntlist, yearlist):
    for a in cntlist:
        for b in yearlist:
            q = list(wp1[(wp1["Country"]==a) & (wp1["Time"]==b)].index.values)[0]
            p = wp1.loc[q, "PopMale"]
            wa.loc[(wa["Country"]==a) & (wa["Time"]==b), "PopMale"] = p

In [167]:
PopMale_func(wp1_cntry_list,years_list)

In [168]:
wa[(wa.Country=="World") & (wa.Time==2006)].head(1)

Unnamed: 0,Country,Commodity,Var_abbr,Var_mean,Var_expl,Time,Value,PopTotal,PopMale,PopFemale
6,World,Wheat,QP,Production,"Production, kt (for Biofuels in millions of li...",2006,600783.4302,6623517.917,3338132.929,


In [169]:
wp1[(wp1.Country=="World") & (wp1.Time==2006)].head(1)

Unnamed: 0,Country,Time,PopTotal,PopMale,PopFemale
277301,World,2006,6623517.917,3338132.929,3285384.988


In [170]:
def PopFemale_func (cntlist, yearlist):
    for a in cntlist:
        for b in yearlist:
            q = list(wp1[(wp1["Country"]==a) & (wp1["Time"]==b)].index.values)[0]
            p = wp1.loc[q, "PopFemale"]
            wa.loc[(wa["Country"]==a) & (wa["Time"]==b), "PopFemale"] = p

In [171]:
PopFemale_func(wp1_cntry_list,years_list)

In [172]:
wa[(wa.Country=="World") & (wa.Time==2006)].head(1)

Unnamed: 0,Country,Commodity,Var_abbr,Var_mean,Var_expl,Time,Value,PopTotal,PopMale,PopFemale
6,World,Wheat,QP,Production,"Production, kt (for Biofuels in millions of li...",2006,600783.4302,6623517.917,3338132.929,3285384.988


In [173]:
wp1[(wp1.Country=="World") & (wp1.Time==2006)].head(1)

Unnamed: 0,Country,Time,PopTotal,PopMale,PopFemale
277301,World,2006,6623517.917,3338132.929,3285384.988
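
The three loop-based functions above each scan `wa` once per country and year. The same assignment can be sketched as a single left merge on `Country` and `Time`. The example below uses toy frames (`toy_wa`, `toy_wp1`, `pop_lookup` and `merged` are illustrative names) with a few values copied from the outputs above. It assumes the lookup table may contain several rows per country and year and keeps the first one, which mirrors the `[0]` in the functions.

```python
import pandas as pd

toy_wa = pd.DataFrame({
    "Country": ["World", "World", "Australia"],
    "Time": [2000, 2001, 2000],
    "Value": [585705.5797, 588610.4753, 22108.0],
})
toy_wp1 = pd.DataFrame({
    "Country": ["World", "World", "World", "Australia"],
    "Time": [2000, 2000, 2001, 2000],  # (World, 2000) is repeated on purpose
    "PopTotal": [6143493.806, 6143493.806, 6222626.531, 18991.434],
    "PopMale": [3093433.858, 3093433.858, 3133601.761, 9473.544],
    "PopFemale": [3050059.948, 3050059.948, 3089024.77, 9517.89],
})

# Keep the first row per (Country, Time), as the loop-based functions do with [0].
pop_lookup = toy_wp1.drop_duplicates(subset=["Country", "Time"], keep="first")

# A left merge keeps every agriculture row and attaches the matching population
# figures; validate="many_to_one" raises if the lookup still has duplicate keys.
merged = toy_wa.merge(pop_lookup, on=["Country", "Time"], how="left", validate="many_to_one")
print(merged)
```

In the notebook itself, the placeholder `PopTotal`, `PopMale` and `PopFemale` columns in `wa` would have to be dropped before such a merge to avoid suffixed duplicate columns.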


#### Creating Organizations Dataset from wa

In [174]:
wa[wa["Country"].isin(["OECD countries", "Non-OECD"])].head(3)

Unnamed: 0,Country,Commodity,Var_abbr,Var_mean,Var_expl,Time,Value,PopTotal,PopMale,PopFemale


In [175]:
wa_org_list

['World',
 'OECD',
 'Developed',
 'Developing',
 'Least Developed Countries',
 'European Union-27',
 'EUROPE',
 'AFRICA',
 'LATIN AMERICA AND CARIBBEAN',
 'ASIA']

In [176]:
Data_World = wa[wa["Country"]=="World"]
Data_OECD = wa[wa["Country"]=="OECD"]
Data_Developed = wa[wa["Country"]=="Developed"]
Data_Developing = wa[wa["Country"]=="Developing"]
Data_LDC = wa[wa["Country"]=="Least Developed Countries"]
Data_EU27 = wa[wa["Country"]=="European Union-27"]
Data_EU = wa[wa["Country"]=="EUROPE"]
Data_Africa = wa[wa["Country"]=="AFRICA"]
Data_Lat_Amer = wa[wa["Country"]=="LATIN AMERICA AND CARIBBEAN"]
Data_Asia = wa[wa["Country"]=="ASIA"]

In [177]:
Data_World.head(5)

Unnamed: 0,Country,Commodity,Var_abbr,Var_mean,Var_expl,Time,Value,PopTotal,PopMale,PopFemale
0,World,Wheat,QP,Production,"Production, kt (for Biofuels in millions of li...",2000,585705.5797,6143493.806,3093433.858,3050059.948
1,World,Wheat,QP,Production,"Production, kt (for Biofuels in millions of li...",2001,588610.4753,6222626.531,3133601.761,3089024.77
2,World,Wheat,QP,Production,"Production, kt (for Biofuels in millions of li...",2002,573497.613,6301773.172,3173900.449,3127872.723
3,World,Wheat,QP,Production,"Production, kt (for Biofuels in millions of li...",2003,560270.5304,6381185.141,3214422.031,3166763.11
4,World,Wheat,QP,Production,"Production, kt (for Biofuels in millions of li...",2004,631341.5297,6461159.391,3255262.626,3205896.765
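
As an alternative to one variable per organisation, the filtered frames can be kept in a dictionary keyed by name. This is a minimal sketch on toy data (`toy_wa`, `org_names` and `org_frames` are illustrative names; `org_names` stands in for `wa_org_list`).

```python
import pandas as pd

toy_wa = pd.DataFrame({
    "Country": ["World", "OECD", "World", "Japan"],
    "Value": [1.0, 2.0, 3.0, 4.0],
})
org_names = ["World", "OECD"]  # stands in for wa_org_list

# One filtered frame per organisation, looked up by name instead of by variable.
org_frames = {name: toy_wa[toy_wa["Country"] == name] for name in org_names}
print(org_frames["World"])
```

With this layout, adding or removing an organisation only changes the list of names.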


#### Creating Country_Only Dataset from wa as wa1

In [178]:
wa = wa.set_index("Country")

In [179]:
wa.columns

Index(['Commodity', 'Var_abbr', 'Var_mean', 'Var_expl', 'Time', 'Value',
       'PopTotal', 'PopMale', 'PopFemale'],
      dtype='object')

In [180]:
wa1 = wa.copy()

In [181]:
wa_org_list

['World',
 'OECD',
 'Developed',
 'Developing',
 'Least Developed Countries',
 'European Union-27',
 'EUROPE',
 'AFRICA',
 'LATIN AMERICA AND CARIBBEAN',
 'ASIA']

In [182]:
wa1.drop(index=wa_org_list, inplace=True)

In [183]:
wa1.sample(5)

Unnamed: 0_level_0,Commodity,Var_abbr,Var_mean,Var_expl,Time,Value,PopTotal,PopMale,PopFemale
Country,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1
Japan,Vegetable oils,QP,Production,"Production, kt (for Biofuels in millions of li...",2017,1440.273359,127502.728,62282.219,65220.509
Peru,Skim milk powder (pw),EX,Exports,"Exports, kt (for Biofuels in millions of litres)",2009,0.033,28792.663,14361.216,14431.447
Peru,Coarse grains,NT,Trade_balance,"Trade balance, kt (for Biofuels in millions of...",2019,-2782.60035,32510.462,16148.241,16362.221
Turkey,Protein meals,NT,Trade_balance,"Trade balance, kt (for Biofuels in millions of...",2012,-943.995116,74651.046,36708.592,37942.454
Uruguay,Sheepmeat(cwe),EX,Exports,"Exports, kt (for Biofuels in millions of litres)",2010,15.183343,3359.273,1618.886,1740.387


In [184]:
wa1.reset_index(inplace=True)

In [185]:
wa1.head(3)

Unnamed: 0,Country,Commodity,Var_abbr,Var_mean,Var_expl,Time,Value,PopTotal,PopMale,PopFemale
0,Australia,Wheat,QP,Production,"Production, kt (for Biofuels in millions of li...",2000,22108.0,18991.434,9473.544,9517.89
1,Australia,Wheat,QP,Production,"Production, kt (for Biofuels in millions of li...",2001,24298.0,19194.676,9577.028,9617.648
2,Australia,Wheat,QP,Production,"Production, kt (for Biofuels in millions of li...",2002,10132.0,19401.366,9680.332,9721.034


In [186]:
wa.reset_index(inplace=True)

In [187]:
wa.head(3)

Unnamed: 0,Country,Commodity,Var_abbr,Var_mean,Var_expl,Time,Value,PopTotal,PopMale,PopFemale
0,World,Wheat,QP,Production,"Production, kt (for Biofuels in millions of li...",2000,585705.5797,6143493.806,3093433.858,3050059.948
1,World,Wheat,QP,Production,"Production, kt (for Biofuels in millions of li...",2001,588610.4753,6222626.531,3133601.761,3089024.77
2,World,Wheat,QP,Production,"Production, kt (for Biofuels in millions of li...",2002,573497.613,6301773.172,3173900.449,3127872.723


In [188]:
wa1[wa1["Country"]=="Australia"].head(3)

Unnamed: 0,Country,Commodity,Var_abbr,Var_mean,Var_expl,Time,Value,PopTotal,PopMale,PopFemale
0,Australia,Wheat,QP,Production,"Production, kt (for Biofuels in millions of li...",2000,22108.0,18991.434,9473.544,9517.89
1,Australia,Wheat,QP,Production,"Production, kt (for Biofuels in millions of li...",2001,24298.0,19194.676,9577.028,9617.648
2,Australia,Wheat,QP,Production,"Production, kt (for Biofuels in millions of li...",2002,10132.0,19401.366,9680.332,9721.034


In [189]:
wa1[wa1["Country"].isin(["World", "OECD", "Developed"])].head(3)

Unnamed: 0,Country,Commodity,Var_abbr,Var_mean,Var_expl,Time,Value,PopTotal,PopMale,PopFemale


#### Save the final versions of wa, wa1, wp, wp1 & cy

In [190]:
wa.to_csv("wa_for_visualization.csv")
wa1.to_csv("wa1_for_visualization.csv")
wp.to_csv("wp_for_visualization.csv")
wp1.to_csv("wp1_for_visualization.csv")
cy.to_csv("cy_for_visualization.csv")
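
By default `to_csv` also writes the row index, and reading such a file back with `pd.read_csv` yields an extra `Unnamed: 0` column. A small round-trip sketch on a toy frame (`toy` is an illustrative name; an in-memory buffer is used instead of a file) shows the effect of `index=False`.

```python
import io
import pandas as pd

toy = pd.DataFrame({"Country": ["World"], "Time": [2000]})

# Default to_csv() writes the index as an unnamed first column...
with_index = pd.read_csv(io.StringIO(toy.to_csv()))
# ...while index=False writes only the data columns.
without_index = pd.read_csv(io.StringIO(toy.to_csv(index=False)))

print(list(with_index.columns))
print(list(without_index.columns))
```

Whether to keep the index depends on whether the original row labels are needed in the visualization part.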

#### Useful Code Snippets

In [191]:
# transform() returns one IQR per row, aligned with df; agg() would return one value per group
df['IQR'] = df.groupby(["Country", "Commodity", "VARIABLE"])['Value'].transform(stats.iqr)

In [None]:
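# Aggregation names for the grouped "Value" column
# ("mode" is left out here; it is not assumed to be a valid groupby aggregation name)
df.groupby(["Country", "Commodity", "VARIABLE"])['Value'].agg(["sum","mean","median","max","min","std","var"])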

In [None]:
wp.Location.isin(Out_Cntry_List).sum()

884

In [None]:
Q1 = np.quantile(df.Value, 0.25)
Q3 = np.quantile(df.Value, 0.75)
IQR = Q3-Q1
Q0 = Q1 - (IQR*1.5)
Q4 = Q3 + (IQR*1.5)
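
The fences `Q0` and `Q4` computed above are typically used to flag outliers. A minimal sketch on a toy column follows (`toy`, `lower`, `upper` and `outliers` are illustrative names; the values are made up).

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({"Value": [1.0, 2.0, 3.0, 4.0, 100.0]})

q1 = np.quantile(toy["Value"], 0.25)
q3 = np.quantile(toy["Value"], 0.75)
iqr = q3 - q1
lower = q1 - 1.5 * iqr  # Q0 in the cell above
upper = q3 + 1.5 * iqr  # Q4 in the cell above

# Rows outside [lower, upper] are flagged as potential outliers.
outliers = toy[(toy["Value"] < lower) | (toy["Value"] > upper)]
print(outliers)
```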

In [None]:
# dropna() first: NaN padding is a float and has no strip() method
wa_country_list = [s.strip() for s in cy.wa_country.dropna().unique()]

In [None]:
country_list_wp = [s.strip() for s in list(cy.Country_wp.unique())]

In [None]:
df['column name'] = df['column name'].fillna(0).astype(np.int64)

In [None]:
cy.years = pd.to_numeric(cy.years,errors='coerce')

In [None]:
cy.years.round(0).astype(int)