# Overview
According to The World Bank, India is a global agricultural powerhouse. It is the world's largest producer of milk, pulses, and spices, and has the world's largest cattle herd (buffaloes), as well as the largest area under wheat, rice and cotton. It is the second largest producer of rice, wheat, cotton, sugarcane, farmed fish, sheep & goat meat, fruit, vegetables and tea. While agriculture’s share in India’s economy has progressively declined to less than 15% due to the high growth rates of the industrial and services sectors, the sector’s importance in India’s economic and social fabric goes well beyond this indicator.

# Objective 
Let us analyze Indian crop production using data collected from 1997 to 2022. We will ask interesting questions of the existing data, get production and area statistics, and learn more about the history of crop production in India.

# Source of Data
The data is openly available on [Kaggle](https://www.kaggle.com/pyatakov/india-agriculture-crop-production) and is made available by the [Ministry of Agriculture and Farmers Welfare of India](https://aps.dac.gov.in/Home.aspx?ReturnUrl=%2f)
 

# Import required libraries
- Import required analysis libraries
- Import data in pandas dataframe

In [None]:
import pandas as pd
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
import seaborn as sns 

In [None]:
agri_df = pd.read_csv("../input/india-agriculture-crop-production/India Agriculture Crop Production.csv")
agri_df

From the above output, we can see the following columns - 
- State - Indian state for which the record exists 
- District - District of the Indian state in the record 
- Crop - Crop for which production is recorded 
- Year - Year of the record 
- Season - Agricultural season. India has several cropping seasons; see this [link](https://studynlearn.com/blog/cropping-seasons/#:~:text=India%20has%20three%20cropping%20seasons%20%E2%80%94%20Rabi%2C%20Kharif%2C%20and%20Zaid.) for more details
- Production - Quantity of the crop produced, measured in the unit given in Production Units 
- Production Units - Unit in which Production is reported, for example Tonnes (1 Tonne = 1,000 kg)
- Yield - Harvested production per unit of harvested area for crop products. Visit this [link](https://data.oecd.org/agroutput/crop-production.htm) for more details
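
The yield definition above amounts to production divided by harvested area. A minimal sketch on made-up numbers, assuming an `Area` column in hectares (a hypothetical column name that is not in the list above):

```python
import pandas as pd

# Toy records; all values are made up for illustration.
toy_df = pd.DataFrame({
    "Crop": ["Rice", "Wheat"],
    "Production": [300.0, 120.0],  # tonnes
    "Area": [100.0, 60.0],         # hectares (hypothetical column)
})

# Yield = production per unit of harvested area (tonnes per hectare here).
toy_df["Yield"] = toy_df["Production"] / toy_df["Area"]
print(toy_df)
```

If the real dataset reports area in mixed units, those would need standardizing before this division is meaningful.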


# Understanding existing data 
Let us understand the historical data with descriptive analytics

In [None]:
agri_df.info()

There is no missing data, as shown in the above output. 
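
The `info()` output only shows non-null counts, so it can help to check for gaps explicitly. A minimal sketch on a toy frame (the values are made up, and the toy frame deliberately contains one gap to show what a non-zero count looks like):

```python
import pandas as pd

# Toy frame standing in for agri_df; the values are made up for illustration.
toy_df = pd.DataFrame({
    "Crop": ["Rice", "Maize", "Rice"],
    "Production": [100.0, None, 250.0],
})

# Count missing values per column; a column without gaps reports 0.
missing_per_column = toy_df.isna().sum()
print(missing_per_column)

# True only if the whole frame has no missing values at all.
has_no_missing = not toy_df.isna().any().any()
print(has_no_missing)
```

Running the same two expressions on `agri_df` would confirm the observation directly.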

In [None]:
agri_df.describe().T

# Exploratory data analysis 
Let us explore the data to answer various questions. We will first list the general questions we want the data to answer, starting with some basics such as which crops, states and districts appear in our records. 

## Unique counts and values for crops, states and districts 

In [None]:
unique_crop_list = agri_df["Crop"].unique()
print("Total number of unique crops - ", len(unique_crop_list))
print("\nWe have following unique crops in the dataset - \n", unique_crop_list)

In [None]:
unique_states = agri_df["State"].unique()
print("Total number of states and union territories found in records - ", len(unique_states))
print("\n Name of unique states and union territories in the record dataset -\n", unique_states)

In [None]:
unique_districts = agri_df["District"].unique()
print("Total number of districts found in records - ", len(unique_districts))
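
The three counts above can also be obtained in one call with `nunique()`. One caveat when counting districts by name alone: the same district name can exist in more than one state (Aurangabad is a district in both Bihar and Maharashtra), so counting (State, District) pairs may be more robust. A minimal sketch on made-up rows:

```python
import pandas as pd

# Toy records; a district name is repeated across two states on purpose.
toy_df = pd.DataFrame({
    "State": ["Bihar", "Maharashtra", "Maharashtra", "Bihar"],
    "District": ["Aurangabad", "Aurangabad", "Pune", "Patna"],
    "Crop": ["Rice", "Rice", "Maize", "Rice"],
})

# Number of distinct values in each column, in one call.
unique_counts = toy_df[["State", "District", "Crop"]].nunique()
print(unique_counts)

# Distinct district names versus distinct (State, District) pairs.
district_names = toy_df["District"].nunique()
state_district_pairs = len(toy_df[["State", "District"]].drop_duplicates())
print(district_names, state_district_pairs)
```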

## Dealing with various units of production 
We can observe a column named Production Units, which is the unit of measurement for crop production. We need to standardize on one specific unit to compare records properly. Let us get the units we have in our dataset

In [None]:
unique_units = list(agri_df["Production Units"].unique())
print(unique_units)

As per https://www.convertunits.com/, 1 Ton = 4.5929637955182825 Bale, where the Bale is a US standard unit of measurement. Let us convert all Bales to Tonnes by dividing them by 4.59.

Nuts is the unit used for Coconut, where 1 Coconut = 1 Nut. As per [this resource](https://www.howmuchisin.com/produce_converters/coconut#:~:text=One%20medium%20coconut%20weighs%201.5,diameter%20and%20weighs%202.3%20pounds) an average coconut weighs 1.5 to 2 kg. Let us assume that one coconut weighs 2 kg, so we will have 500 coconuts in 1 Tonne (1,000 kg).
To convert Nuts to Tonnes we will divide Coconut production by 500.
**Note - This assumption may not be exactly correct, and totals for coconut-growing states are sensitive to it, but we can take it as an assumption and go ahead with the analysis until an expert in coconut farming corrects us :)** 

In [None]:
def unit_standardization(row):
    """
    Converts a record's Production from Nuts or Bales into Tonnes so that all records share one unit 
    """
    
    # Nuts (coconuts): 500 nuts per Tonne, assuming 2 kg per coconut
    if row["Production Units"] == "Nuts":
        return row["Production"] / 500
        
    # Already in Tonnes, nothing to convert
    elif row["Production Units"] == "Tonnes":
        return row["Production"]
    
    # Remaining records are in Bales: 4.59 Bales per Tonne
    else:
        return row["Production"] / 4.59

        
# axis = 1 passes each row of the dataframe to the function
agri_df["New Production"] = agri_df.apply(unit_standardization, axis = 1)
agri_df.sample(10)
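
Row-wise `apply` can be slow on a large dataframe. As an alternative, here is a minimal vectorized sketch that looks up a conversion factor per unit with `Series.map`. The function name `standardize_vectorized` and the toy data are hypothetical, and the factors are assumptions: 4.59 Bales per Tonne, and 500 Nuts per Tonne (2 kg per coconut, 1,000 kg per Tonne).

```python
import pandas as pd

# Tonnes represented by one unit of each production unit (assumed factors).
TONNES_PER_UNIT = {"Tonnes": 1.0, "Bales": 1 / 4.59, "Nuts": 1 / 500}


def standardize_vectorized(df, tonnes_per_unit=TONNES_PER_UNIT):
    """Return production in tonnes as a Series, without row-wise apply."""
    factors = df["Production Units"].map(tonnes_per_unit)
    # Units missing from the mapping become NaN; fail loudly instead of
    # silently treating them as bales.
    if factors.isna().any():
        unknown = df.loc[factors.isna(), "Production Units"].unique()
        raise ValueError(f"Unknown production units: {list(unknown)}")
    return df["Production"] * factors


# Toy records; the values are made up for illustration.
toy_df = pd.DataFrame({
    "Production": [1000.0, 459.0, 20.0],
    "Production Units": ["Nuts", "Bales", "Tonnes"],
})
toy_df["New Production"] = standardize_vectorized(toy_df)
print(toy_df)
```

Keeping the factors in one dictionary makes the coconut assumption easy to revise, and raising on unknown units avoids converting an unexpected unit as if it were Bales.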

We can now drop *Production and Production Units* as all our units are in Tonnes and New Production represents the standard production we calculated

In [None]:
agri_df.drop(columns = ["Production", "Production Units"], inplace = True)

## Which crop is the most common choice for agriculture in India ?

In [None]:
agri_df["Crop"].value_counts().head()

Rice appears in more than 21,000 records, so it seems to be the most popular choice for farmers in India, followed by Maize and Moong.

## Get the statewise total production of crops 

In [None]:
total_production_list = []
for state in unique_states:
    total_crop = agri_df.loc[agri_df["State"] == state, "New Production"].sum()
    total_production_list.append(total_crop)
    

crop_production_df = pd.DataFrame({"State" : unique_states, 
             "Total Crop Production" : total_production_list})

In [None]:
crop_production_df.sort_values("Total Crop Production", ascending = False).head()

From the above output we can conclude that **Uttar Pradesh, Kerala, Tamil Nadu, Karnataka, and Maharashtra are the top 5 states by total crop production over the years 1997 to 2022**
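
The per-state loop above can also be written as a single `groupby`. A minimal sketch on a toy frame (the values are made up; the column names follow the ones used in this notebook):

```python
import pandas as pd

# Toy records standing in for agri_df; the values are made up.
toy_df = pd.DataFrame({
    "State": ["Kerala", "Punjab", "Kerala", "Punjab", "Goa"],
    "New Production": [10.0, 40.0, 5.0, 20.0, 1.0],
})

# Sum production per state, then sort from highest to lowest.
state_totals = (
    toy_df.groupby("State", as_index=False)["New Production"]
    .sum()
    .rename(columns={"New Production": "Total Crop Production"})
    .sort_values("Total Crop Production", ascending=False)
)
print(state_totals.head())
```

This produces the same totals as the loop while letting pandas do the grouping in one pass.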

## Which state is the highest producer of Rice for total production since 1997 ?

In [None]:
agri_rice_df = agri_df[agri_df["Crop"] == "Rice"]
agri_rice_df

In [None]:
#Create a dataframe with only Rice as a crop
agri_rice_df = agri_df[agri_df["Crop"] == "Rice"]

#Get unique list of states with rice production
unique_rice_growing_states = list(agri_rice_df["State"].unique())

#Summation of rice production records for each respective states 
total_rice_production = []
for state in unique_rice_growing_states:
    total_rice_crop = agri_rice_df.loc[agri_rice_df["State"] == state, "New Production"].sum()
    total_rice_production.append(total_rice_crop)
    

#Create a dataframe with required information
rice_crop_production_df = pd.DataFrame({"State" : unique_rice_growing_states, 
             "Total Rice Production" : total_rice_production})

#List the top 5 rice producing states 
rice_crop_production_df.sort_values("Total Rice Production", ascending = False).head()

From the above output we can see that **West Bengal, Uttar Pradesh, Punjab, Andhra Pradesh and Odisha are the top 5 states by total rice production** 
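
The same question can be asked for any crop by wrapping the filter and the summation in a small helper. A minimal sketch, where `top_states_for_crop` is a hypothetical helper name and the toy values are made up:

```python
import pandas as pd


def top_states_for_crop(df, crop, n=5):
    """Return the n states with the highest total production of one crop."""
    crop_rows = df[df["Crop"] == crop]
    # nlargest returns the totals already sorted from highest to lowest.
    return crop_rows.groupby("State")["New Production"].sum().nlargest(n)


# Toy records; the values are made up for illustration.
toy_df = pd.DataFrame({
    "State": ["West Bengal", "West Bengal", "Punjab", "Odisha", "Punjab"],
    "Crop": ["Rice", "Rice", "Rice", "Rice", "Maize"],
    "New Production": [50.0, 30.0, 60.0, 10.0, 100.0],
})

top_rice = top_states_for_crop(toy_df, "Rice", n=2)
print(top_rice)
```

Calling the helper with another crop name, for example `"Maize"`, reuses the same logic without copying the loop.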

## Best year for Agriculture
The year with the highest total crop production across all records is the best year for production. Let us calculate it

In [None]:
unique_year_list = list(agri_df["Year"].unique())
print(unique_year_list)

In [None]:
yearly_production_list = []
for year in unique_year_list:
    total_yearly_production = agri_df.loc[agri_df["Year"] == year, "New Production"].sum()
    yearly_production_list.append(total_yearly_production)
    

yearly_production_df = pd.DataFrame({"year" : unique_year_list, 
                                   "total crop production" : yearly_production_list})


yearly_production_df.sort_values("total crop production", ascending = False).head()

From the above output, we can see that the years **2018-19, 2017-18, 2011-12, 2014-15 and 2013-14, in that order, had the highest crop production of all years in the records**

In [None]:
yearly_production_df.sort_values("total crop production", ascending = True).head()

From the above output we can see that 1997-98 was the year with the least production. This is possibly due to limited capability at the time to record crop production across India
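
One way to probe the explanation above is to compare the number of records per year alongside the totals: if an early year has far fewer records, a low total may reflect coverage rather than output. A minimal sketch on made-up values (the output column names are hypothetical):

```python
import pandas as pd

# Toy records standing in for agri_df; the values are made up.
toy_df = pd.DataFrame({
    "Year": ["1997-98", "1997-98", "2018-19", "2018-19", "2018-19"],
    "New Production": [10.0, 20.0, 30.0, 40.0, 50.0],
})

# Total production and number of records for each year.
yearly = toy_df.groupby("Year")["New Production"].agg(
    total_production="sum", record_count="count"
)
# Average production per record gives a rough coverage-adjusted view.
yearly["production_per_record"] = (
    yearly["total_production"] / yearly["record_count"]
)

best_year = yearly["total_production"].idxmax()
worst_year = yearly["total_production"].idxmin()
print(yearly)
print(best_year, worst_year)
```

A low `record_count` in the weakest year would be consistent with incomplete data collection, although it would not prove it.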

# Summary
While there is huge potential for further analysis, let us summarize what we analyzed - 

|S.No|Summary Statement|Stats if any|
|------|------|-----|
|1|Rice is the most common choice of farmers, with 21,175 records of rice production. Almost all states grow rice|NA|
|2|Uttar Pradesh is the biggest producer of crops in India|Total 4,18,25,67,000 Tonnes of crops produced|
|3|2018-19 was the best year with maximum crop production in India|Total 1,14,93,77,000 Tonnes of crops were produced|
|4|1997-98 was the year with least production. This is potentially because of the lack of a framework and mechanism to collect crop production data across the country|Total 55,91,10,900 Tonnes of crops were produced|
|5|West Bengal is the highest producer of Rice as per the records |Total of 33,89,84,869 Tonnes of Rice was produced in West Bengal|

# References
- [Blog on Indian Agriculture by The World Bank](https://www.worldbank.org/en/news/feature/2012/05/17/india-agriculture-issues-priorities#:~:text=India%20is%20a%20global%20agricultural,under%20wheat%2C%20rice%20and%20cotton.&text=In%20addition%2C%20forests%20cover%20some%2065m%20ha%20of%20India's%20land)
- [About The World Bank](https://www.worldbank.org/en/home)