***
# Manure Applied (Raw Data Processing)
Capstone Project - Ali Sehpar Shikoh
***

<b>Previous Notebook:</b> Livestock-RAW

<b>Next Notebook:</b> Population-RAW

This is the seventh notebook of the project. It cleans the 'Manure Applied' dataset, a feature that could affect crop yield.

Manure is the decomposed form of dead plants and animals [1]. Its application supports crop production, maintains soil fertility, and recycles locally available nutrients in cold humid temperate regions, consistent with the principles of sustainable agriculture. Manure may be an excellent nitrogen (N) fertilizer for crops if it provides plant-available N, as ammonium and from organic N mineralization, in synchrony with crop N demands [2]. The dataset processed here records the N content of manure applied to soils.

### Exploratory Data Analysis

Importing the pandas library.

In [1]:
import pandas as pd

Importing the CSV file into a dataframe called Manure_df1 and looking at the imported dataset.

In [2]:
Manure_df1 = pd.read_csv('DataFiles/01-RawDataFiles/ManureApplied-RAW/ManureApplied-RAW.csv')
Manure_df1

Unnamed: 0,Domain Code,Domain,Area Code (FAO),Area,Element Code,Element,Item Code,Item,Year Code,Year,Source Code,Source,Unit,Value,Flag,Flag Description,Note
0,GU,Manure applied to Soils,2,Afghanistan,72381,Manure applied to soils (N content),1755,All Animals,1961,1961,3050,FAO TIER 1,kg,5.624520e+07,A,"Aggregate, may include official, semi-official...",
1,GU,Manure applied to Soils,2,Afghanistan,723812,Manure applied to soils that leaches (N content),1755,All Animals,1961,1961,3050,FAO TIER 1,kg,1.687356e+07,A,"Aggregate, may include official, semi-official...",
2,GU,Manure applied to Soils,2,Afghanistan,723811,Manure applied to soils that volatilises (N co...,1755,All Animals,1961,1961,3050,FAO TIER 1,kg,1.124904e+07,A,"Aggregate, may include official, semi-official...",
3,GU,Manure applied to Soils,2,Afghanistan,72381,Manure applied to soils (N content),1755,All Animals,1962,1962,3050,FAO TIER 1,kg,5.704647e+07,A,"Aggregate, may include official, semi-official...",
4,GU,Manure applied to Soils,2,Afghanistan,723812,Manure applied to soils that leaches (N content),1755,All Animals,1962,1962,3050,FAO TIER 1,kg,1.711394e+07,A,"Aggregate, may include official, semi-official...",
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
33277,GU,Manure applied to Soils,181,Zimbabwe,723812,Manure applied to soils that leaches (N content),1755,All Animals,2018,2018,3050,FAO TIER 1,kg,4.073850e+06,A,"Aggregate, may include official, semi-official...",
33278,GU,Manure applied to Soils,181,Zimbabwe,723811,Manure applied to soils that volatilises (N co...,1755,All Animals,2018,2018,3050,FAO TIER 1,kg,2.715900e+06,A,"Aggregate, may include official, semi-official...",
33279,GU,Manure applied to Soils,181,Zimbabwe,72381,Manure applied to soils (N content),1755,All Animals,2019,2019,3050,FAO TIER 1,kg,1.483084e+07,A,"Aggregate, may include official, semi-official...",
33280,GU,Manure applied to Soils,181,Zimbabwe,723812,Manure applied to soils that leaches (N content),1755,All Animals,2019,2019,3050,FAO TIER 1,kg,4.449253e+06,A,"Aggregate, may include official, semi-official...",


Looking at the columns of the dataframe.

In [3]:
Manure_df1.columns

Index(['Domain Code', 'Domain', 'Area Code (FAO)', 'Area', 'Element Code',
       'Element', 'Item Code', 'Item', 'Year Code', 'Year', 'Source Code',
       'Source', 'Unit', 'Value', 'Flag', 'Flag Description', 'Note'],
      dtype='object')

As seen, there are a total of 17 columns present in the dataset.
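
Counting columns by eye works for a small frame, but the same check can be done programmatically. The following is a minimal sketch on a toy frame rather than the real file: `sample_df` and its four columns are assumptions made for illustration, with values copied from the rows shown above.

```python
import pandas as pd

# Toy stand-in for Manure_df1 (assumption: only four of the raw columns,
# values copied from the output above).
sample_df = pd.DataFrame({
    'Area': ['Afghanistan', 'Afghanistan', 'Zimbabwe'],
    'Year': [1961, 1962, 2019],
    'Value': [5.624520e+07, 5.704647e+07, 1.483084e+07],
    'Note': [None, None, None],
})

n_rows, n_cols = sample_df.shape             # (number of rows, number of columns)
missing_per_column = sample_df.isna().sum()  # missing entries in each column
column_types = sample_df.dtypes              # dtype inferred for each column

print(n_rows, n_cols)
print(missing_per_column)
print(column_types)
```

Run against Manure_df1, the same three calls would likely confirm the column count and show which columns, such as 'Note', are mostly empty.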

Checking for unique categories in 'Domain' and 'Element' columns.

In [4]:
Manure_df1['Domain'].unique()

array(['Manure applied to Soils'], dtype=object)

In [5]:
Manure_df1['Element'].unique()

array(['Manure applied to soils (N content)',
       'Manure applied to soils that leaches (N content)',
       'Manure applied to soils that volatilises (N content)'],
      dtype=object)

As seen, the 'Domain' column contains only one categorical value, so it carries no useful information. The 'Element' column, on the other hand, contains three categories, of which only 'Manure applied to soils (N content)' is of interest.

Looking at the 'Item' column.

In [6]:
Manure_df1['Item'].unique()

array(['All Animals'], dtype=object)

Similar to the 'Domain' column, the 'Item' column contains only one categorical value, thus making this column of less interest.
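
Instead of calling `unique()` on one column at a time, single-valued columns can be found in one pass with `nunique()`. This is a small sketch on a toy frame; `sample_df` is a hypothetical stand-in for Manure_df1 that holds only the three columns inspected above.

```python
import pandas as pd

# Toy stand-in for Manure_df1 (assumption: three columns only).
sample_df = pd.DataFrame({
    'Domain': ['Manure applied to Soils'] * 3,
    'Element': [
        'Manure applied to soils (N content)',
        'Manure applied to soils that leaches (N content)',
        'Manure applied to soils that volatilises (N content)',
    ],
    'Item': ['All Animals'] * 3,
})

# Number of distinct values in each column.
unique_counts = sample_df.nunique()

# Columns with a single distinct value carry no information for modelling.
constant_columns = unique_counts[unique_counts == 1].index.tolist()
print(constant_columns)
```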

### Refined Dataset Creation and Exportation

Filtering the dataframe on the 'Element' column to keep only the rows in the 'Manure applied to soils (N content)' category.

In [7]:
Manure_df2 = Manure_df1.loc[(Manure_df1['Element'] == 'Manure applied to soils (N content)')]
Manure_df2.head(2)

Unnamed: 0,Domain Code,Domain,Area Code (FAO),Area,Element Code,Element,Item Code,Item,Year Code,Year,Source Code,Source,Unit,Value,Flag,Flag Description,Note
0,GU,Manure applied to Soils,2,Afghanistan,72381,Manure applied to soils (N content),1755,All Animals,1961,1961,3050,FAO TIER 1,kg,56245200.0,A,"Aggregate, may include official, semi-official...",
3,GU,Manure applied to Soils,2,Afghanistan,72381,Manure applied to soils (N content),1755,All Animals,1962,1962,3050,FAO TIER 1,kg,57046470.0,A,"Aggregate, may include official, semi-official...",
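
A quick way to check a filter like this is to confirm that each country-year pair now appears only once. The sketch below assumes a toy frame, `sample_df`, built from the first five rows shown earlier; the uniqueness check is an added suggestion and not part of the original workflow.

```python
import pandas as pd

# Toy stand-in for Manure_df1, using the first five rows shown earlier.
sample_df = pd.DataFrame({
    'Area': ['Afghanistan'] * 5,
    'Element': [
        'Manure applied to soils (N content)',
        'Manure applied to soils that leaches (N content)',
        'Manure applied to soils that volatilises (N content)',
        'Manure applied to soils (N content)',
        'Manure applied to soils that leaches (N content)',
    ],
    'Year': [1961, 1961, 1961, 1962, 1962],
    'Value': [5.624520e+07, 1.687356e+07, 1.124904e+07,
              5.704647e+07, 1.711394e+07],
})

target = 'Manure applied to soils (N content)'
filtered_df = sample_df.loc[sample_df['Element'] == target]

# After filtering, every (Area, Year) pair should occur exactly once.
has_duplicates = filtered_df.duplicated(subset=['Area', 'Year']).any()
print(len(filtered_df), has_duplicates)
```

The filtered frame keeps the original row labels (0, 3, ...), which matches the index values in the output above.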


Selecting the columns of interest, which drops the redundant ones, and renaming key columns appropriately.

In [8]:
Manure_df3 = Manure_df2[['Area Code (FAO)', 'Area', 'Year', 'Value']]
# rename() without inplace returns a new dataframe, which avoids pandas' SettingWithCopyWarning
Manure_df3 = Manure_df3.rename(columns = {'Value':'Manure applied to soil - N content (kg)', 'Area Code (FAO)':'Area Code'})
Manure_df3


Unnamed: 0,Area Code,Area,Year,Manure applied to soil - N content (kg)
0,2,Afghanistan,1961,5.624520e+07
3,2,Afghanistan,1962,5.704647e+07
6,2,Afghanistan,1963,5.859424e+07
9,2,Afghanistan,1964,5.980591e+07
12,2,Afghanistan,1965,6.162054e+07
...,...,...,...,...
33267,181,Zimbabwe,2015,1.520726e+07
33270,181,Zimbabwe,2016,1.655674e+07
33273,181,Zimbabwe,2017,1.395568e+07
33276,181,Zimbabwe,2018,1.357950e+07


Exporting the refined dataset to a folder containing refined/filtered data and working files.

In [9]:
Manure_df3.to_csv(r'DataFiles/02-RefinedDataFiles/ManureApplied-REFINED.csv', index = False)
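
Reading the exported file back is a simple way to confirm that the export behaved as intended. The following sketch uses a two-row toy frame and a temporary directory instead of the project folder; `refined_df` and `out_path` are hypothetical names introduced for illustration.

```python
import os
import tempfile

import pandas as pd

# Toy stand-in for Manure_df3; the index (0, 3) mimics the labels left by filtering.
refined_df = pd.DataFrame(
    {
        'Area Code': [2, 2],
        'Area': ['Afghanistan', 'Afghanistan'],
        'Year': [1961, 1962],
        'Manure applied to soil - N content (kg)': [5.624520e+07, 5.704647e+07],
    },
    index=[0, 3],
)

with tempfile.TemporaryDirectory() as tmp_dir:
    # Temporary location (assumption), not the project's DataFiles folder.
    out_path = os.path.join(tmp_dir, 'ManureApplied-REFINED.csv')
    refined_df.to_csv(out_path, index=False)  # index=False drops the old row labels
    reloaded_df = pd.read_csv(out_path)

# The column names should survive the round trip unchanged.
columns_match = list(reloaded_df.columns) == list(refined_df.columns)
print(columns_match, reloaded_df.index.tolist())
```

Because the file is written with `index = False`, the reloaded frame gets a fresh 0-based index and no extra index column.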

### Summary of things done in this notebook:

- Performed basic EDA.
- Kept only the 'Manure applied to soils (N content)' records by applying a filter on the 'Element' column.
- Dropped redundant columns.
- Renamed key columns to make them more descriptive.
- Exported the refined data to a CSV file.


### References

[1] “What Is Manure? - Definition, Types & Advantages of Manure.” BYJUS, https://byjus.com/biology/manure/. Accessed 1 Apr. 2022.

[2] “Manure - an Overview.” ScienceDirect Topics, https://www.sciencedirect.com/topics/agricultural-and-biological-sciences/manure. Accessed 1 Apr. 2022.