***
# Livestock (Raw Data Processing)
Capstone Project - Ali Sehpar Shikoh
***

<b>Previous Notebook:</b> FertilizerUse-RAW

<b>Next Notebook:</b> ManureApplied-RAW

This is the sixth notebook of the project. It processes livestock statistics recorded per country on a yearly basis. The number of livestock units may reflect changes in climate and may therefore be indirectly related to agricultural yield.

### Exploratory Data Analysis

Importing pandas library.

In [2]:
import pandas as pd

Importing the CSV file into a dataframe called LiveStock_df1.

In [3]:
LiveStock_df1 = pd.read_csv('DataFiles/01-RawDataFiles/Livestock-RAW/Livestock-RAW.csv')
LiveStock_df1

Unnamed: 0,Domain Code,Domain,Area Code (FAO),Area,Element Code,Element,Item Code,Item,Year Code,Year,Unit,Value,Flag,Flag Description
0,EK,Livestock Patterns,2,Afghanistan,7213,Livestock units per agricultural land area,1752,Major livestock types,1961,1961,LSU/ha,0.14,Fc,Calculated data
1,EK,Livestock Patterns,2,Afghanistan,7213,Livestock units per agricultural land area,1752,Major livestock types,1962,1962,LSU/ha,0.14,Fc,Calculated data
2,EK,Livestock Patterns,2,Afghanistan,7213,Livestock units per agricultural land area,1752,Major livestock types,1963,1963,LSU/ha,0.14,Fc,Calculated data
3,EK,Livestock Patterns,2,Afghanistan,7213,Livestock units per agricultural land area,1752,Major livestock types,1964,1964,LSU/ha,0.15,Fc,Calculated data
4,EK,Livestock Patterns,2,Afghanistan,7213,Livestock units per agricultural land area,1752,Major livestock types,1965,1965,LSU/ha,0.15,Fc,Calculated data
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
22020,EK,Livestock Patterns,181,Zimbabwe,5118,Stocks,1752,Major livestock types,2015,2015,Livestock units (LSU),3529759.20,A,"Aggregate, may include official, semi-official..."
22021,EK,Livestock Patterns,181,Zimbabwe,5118,Stocks,1752,Major livestock types,2016,2016,Livestock units (LSU),3557201.40,A,"Aggregate, may include official, semi-official..."
22022,EK,Livestock Patterns,181,Zimbabwe,5118,Stocks,1752,Major livestock types,2017,2017,Livestock units (LSU),3518697.90,A,"Aggregate, may include official, semi-official..."
22023,EK,Livestock Patterns,181,Zimbabwe,5118,Stocks,1752,Major livestock types,2018,2018,Livestock units (LSU),3556852.90,A,"Aggregate, may include official, semi-official..."


Looking at individual columns.

In [4]:
LiveStock_df1.columns

Index(['Domain Code', 'Domain', 'Area Code (FAO)', 'Area', 'Element Code',
       'Element', 'Item Code', 'Item', 'Year Code', 'Year', 'Unit', 'Value',
       'Flag', 'Flag Description'],
      dtype='object')

As seen there are a total of 14 columns.

Looking at 'Item' and 'Domain' columns.

In [5]:
LiveStock_df1['Item'].unique()

array(['Major livestock types'], dtype=object)

In [6]:
LiveStock_df1['Domain'].unique()

array(['Livestock Patterns'], dtype=object)

As seen above, the 'Item' and 'Domain' columns each contain a single value: 'Major livestock types' and 'Livestock Patterns' respectively. These columns do not distinguish any rows, so they can be dropped, and the information they carry can be folded into the names of other columns if required.
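Rather than checking columns one at a time, single-valued columns can be found in one pass. The snippet below is a minimal sketch: `sample_df` is a hypothetical two-row stand-in built from rows shown above, and in the notebook the same calls would be applied to `LiveStock_df1`.

```python
import pandas as pd

# Hypothetical two-row sample built from rows displayed above;
# in the notebook this would be LiveStock_df1.
sample_df = pd.DataFrame({
    'Domain': ['Livestock Patterns', 'Livestock Patterns'],
    'Area': ['Afghanistan', 'Zimbabwe'],
    'Element': ['Livestock units per agricultural land area', 'Stocks'],
    'Item': ['Major livestock types', 'Major livestock types'],
    'Flag': ['Fc', 'A'],
})

# nunique() counts distinct values per column; a count of 1 marks a constant column.
unique_counts = sample_df.nunique()
constant_columns = unique_counts[unique_counts == 1].index.tolist()
print(constant_columns)
```

On this small sample the constant columns are 'Domain' and 'Item'. On the full dataset other columns may also turn out to be constant.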

Looking at the 'Flag' column.

In [7]:
LiveStock_df1['Flag'].unique()

array(['Fc', 'A'], dtype=object)

As seen, the 'Flag' column contains two flag types: 'Fc' (calculated data) and 'A' (aggregate), as described in the 'Flag Description' column. Neither flag is filtered out, because these values come from the methodologies used to fill gaps in the dataset and minimize null values. Keeping both makes the data as extensive as possible.
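To see how many rows each flag covers, and how the flags are spread across the 'Element' categories, a frequency count and a cross-tabulation are useful. This is a sketch on a hypothetical three-row sample (`sample_df`) taken from rows displayed above, so the counts say nothing about the full dataset.

```python
import pandas as pd

# Hypothetical sample assembled from rows displayed above;
# in the notebook this would be LiveStock_df1.
sample_df = pd.DataFrame({
    'Area': ['Afghanistan', 'Afghanistan', 'Zimbabwe'],
    'Element': ['Livestock units per agricultural land area',
                'Livestock units per agricultural land area',
                'Stocks'],
    'Flag': ['Fc', 'Fc', 'A'],
})

# Number of rows carrying each flag.
flag_counts = sample_df['Flag'].value_counts()

# Rows per (Element, Flag) combination; combinations that never occur show 0.
flag_by_element = pd.crosstab(sample_df['Element'], sample_df['Flag'])

print(flag_counts)
print(flag_by_element)
```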

Looking at the 'Element' column.

In [8]:
LiveStock_df1['Element'].unique()

array(['Livestock units per agricultural land area', 'Stocks'],
      dtype=object)

As seen, the 'Element' column contains two categories: 'Stocks' and 'Livestock units per agricultural land area'. Both describe livestock numbers: 'Stocks' as a total in livestock units (LSU), and the other as a density in LSU/ha. Since the two are closely related, only the 'Stocks' category is kept for further analysis.
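The difference between the two 'Element' categories is easiest to see by pairing each one with its unit. The sketch below uses a hypothetical four-row sample (`sample_df`) copied from rows displayed above; in the notebook the same expression would be applied to `LiveStock_df1`.

```python
import pandas as pd

# Hypothetical sample copied from rows displayed above.
sample_df = pd.DataFrame({
    'Element': ['Livestock units per agricultural land area',
                'Livestock units per agricultural land area',
                'Stocks',
                'Stocks'],
    'Unit': ['LSU/ha', 'LSU/ha',
             'Livestock units (LSU)', 'Livestock units (LSU)'],
    'Value': [0.14, 0.15, 3529759.20, 3557201.40],
})

# One row per distinct (Element, Unit) pair, turned into a lookup dictionary.
element_units = (
    sample_df[['Element', 'Unit']]
    .drop_duplicates()
    .set_index('Element')['Unit']
    .to_dict()
)
print(element_units)
```

Because the two categories are measured in different units, their values should not be mixed in a single column, which is one more reason to keep only one of them.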

### Refined Dataset Creation and Exportation

Filtering the dataset on the 'Element' column to keep only the rows where 'Element' equals 'Stocks'. Dropping redundant columns such as 'Unit', 'Flag', etc. Subsequently renaming the retained columns to convey more useful information.

In [9]:
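# Keep only the 'Stocks' rows; .copy() makes the result independent of LiveStock_df1.
LiveStock_df2 = LiveStock_df1.loc[LiveStock_df1['Element'] == 'Stocks'].copy()
# Drop redundant columns using the columns keyword (the positional axis argument is deprecated).
LiveStock_df2 = LiveStock_df2.drop(columns=['Domain', 'Domain Code', 'Element', 'Element Code', 'Item', 'Item Code', 'Year Code', 'Unit', 'Flag', 'Flag Description'])
# Rename the retained columns to be more descriptive.
LiveStock_df2 = LiveStock_df2.rename(columns={'Area Code (FAO)': 'Area Code', 'Value': 'Livestock units'})
LiveStock_df2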



Unnamed: 0,Area Code,Area,Year,Livestock units
59,2,Afghanistan,1961,5257236.40
60,2,Afghanistan,1962,5259200.50
61,2,Afghanistan,1963,5471475.75
62,2,Afghanistan,1964,5597970.00
63,2,Afghanistan,1965,5734080.00
...,...,...,...,...
22020,181,Zimbabwe,2015,3529759.20
22021,181,Zimbabwe,2016,3557201.40
22022,181,Zimbabwe,2017,3518697.90
22023,181,Zimbabwe,2018,3556852.90
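Before exporting, a couple of quick sanity checks on the refined table can catch problems early. The sketch below assumes each country should appear at most once per year, which is an assumption about the data and not something stated in the notebook. `refined_sample` is a hypothetical four-row stand-in for `LiveStock_df2`, using rows displayed above.

```python
import pandas as pd

# Hypothetical stand-in for LiveStock_df2, using rows displayed above.
refined_sample = pd.DataFrame({
    'Area Code': [2, 2, 181, 181],
    'Area': ['Afghanistan', 'Afghanistan', 'Zimbabwe', 'Zimbabwe'],
    'Year': [1961, 1962, 2017, 2018],
    'Livestock units': [5257236.40, 5259200.50, 3518697.90, 3556852.90],
})

# Count missing values in each column.
missing_per_column = refined_sample.isna().sum()

# Count rows that repeat an earlier (Area Code, Year) pair.
# Assumption: one record per country per year is expected.
duplicate_keys = int(refined_sample.duplicated(subset=['Area Code', 'Year']).sum())

print(missing_per_column.to_dict())
print(duplicate_keys)
```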


Exporting the refined dataset to a folder containing refined/filtered data and working files.

In [10]:
LiveStock_df2.to_csv(r'DataFiles/02-RefinedDataFiles/Livestock-REFINED.csv', index = False)
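The effect of `index = False` can be checked by reading an exported file back in. The sketch below writes to a temporary directory instead of the project folder, and `refined_sample` is a hypothetical two-row stand-in for `LiveStock_df2`.

```python
import os
import tempfile

import pandas as pd

# Hypothetical stand-in for LiveStock_df2, using rows displayed above.
refined_sample = pd.DataFrame({
    'Area Code': [2, 181],
    'Area': ['Afghanistan', 'Zimbabwe'],
    'Year': [1961, 2018],
    'Livestock units': [5257236.40, 3556852.90],
})

with tempfile.TemporaryDirectory() as tmp_dir:
    # Export without the index, as in the notebook.
    path_no_index = os.path.join(tmp_dir, 'without_index.csv')
    refined_sample.to_csv(path_no_index, index=False)
    reloaded = pd.read_csv(path_no_index)

    # Export with the index for comparison.
    path_with_index = os.path.join(tmp_dir, 'with_index.csv')
    refined_sample.to_csv(path_with_index, index=True)
    reloaded_with_index = pd.read_csv(path_with_index)

print(list(reloaded.columns))
print(list(reloaded_with_index.columns))
```

With `index=False` the reloaded file has exactly the four refined columns. Writing the index instead adds an extra 'Unnamed: 0' column when the file is read back, which later notebooks would have to drop.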

### Summary of things done in this notebook:

- Performed basic EDA.
- Kept only the 'Stocks' records by applying a filter on the 'Element' column.
- Dropped redundant columns.
- Renamed the retained columns to be more descriptive.
- Exported the refined data to a CSV file.
