## Welcome

This notebook covers exploratory data analysis (EDA) and visualisation of agriculture in the state of Tamil Nadu, India.

The history of agriculture in India dates back to the Indus Valley Civilization. India ranks second worldwide in farm outputs. As of 2018, agriculture employed more than 50% of the Indian workforce and contributed 17–18% to the country's GDP. According to the latest report, agriculture is the primary source of livelihood for 58% of India's population.

In 2016, agriculture and allied sectors like animal husbandry, forestry and fisheries accounted for 15.4% of the GDP (gross domestic product), with about 31% of the workforce in 2014. India ranks first in the world with the highest net cropped area, followed by the US and China. The economic contribution of agriculture to India's GDP is steadily declining with the country's broad-based economic growth. Still, agriculture is demographically the broadest economic sector and plays a significant role in the overall socio-economic fabric of India. India exported $38 billion worth of agricultural products in 2013, making it the seventh largest agricultural exporter worldwide and the sixth largest net exporter. Most of its agricultural exports serve developing and least developed nations. Indian agricultural/horticultural and processed foods are exported to more than 120 countries, primarily to Japan, Southeast Asia, SAARC countries, the European Union and the United States.

**Source:** [Wikipedia](https://en.wikipedia.org/wiki/Agriculture_in_India)

In [None]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# You can write up to 5GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All" 
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session

# Packages Required

In [None]:
import numpy as np
import pandas as pd
import missingno as msno

import seaborn as sns
import matplotlib.pyplot as plt
import plotly.express as px
import plotly.graph_objects as go

# DataFrame Analysis

In [None]:
df = pd.read_csv('../input/tamilnadu-cropproduction/Tamilnadu agriculture yield data.csv')
df.sample(10)

In [None]:
df.shape

In [None]:
df.info()

In [None]:
df.describe()

## Missing values

In [None]:
df.isnull().sum()

Missing values are present. Let's visualise them.

In [None]:
msno.matrix(df)
plt.show()

Only Production has null values. We will drop those rows for now.

Since this dataset covers only Tamil Nadu, the State_Name column carries no extra information and can be dropped.

In [None]:
df['State_Name'].value_counts()

In [None]:
df.drop('State_Name', axis=1, inplace=True)

In [None]:
df.dropna(how='any', inplace=True)
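Since only Production has nulls, dropping on any column and dropping on Production alone should give the same rows here. A minimal sketch on a toy frame (made-up values, column names borrowed from the dataset) shows the more explicit form, which would keep behaving the same if another column later gained nulls:

```python
import numpy as np
import pandas as pd

# Toy frame with made-up values; only the column names mirror the dataset
toy = pd.DataFrame({
    "District_Name": ["A", "B", "C"],
    "Area": [10.0, 20.0, 30.0],
    "Production": [5.0, np.nan, 12.0],
})

# Drop only the rows where Production is missing
cleaned = toy.dropna(subset=["Production"])
print(len(toy) - len(cleaned), "row(s) dropped")
```

Printing the number of dropped rows is a cheap way to confirm how much data the step removed.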

Null values handled. Are there any duplicates?

In [None]:
print("Duplicates:", df.duplicated().sum())

Good. We don't want those in our visuals.

Another noticeable thing is that the district names are in all caps. We can easily capitalise them.

In [None]:
df['District_Name'] = df['District_Name'].str.capitalize()
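One thing to watch: `capitalize` upper-cases only the first character of the whole string, so a multi-word district name ends up with a lower-case second word. A small sketch of the difference against `str.title()`, using two example names that are assumed for illustration and not checked against the dataset:

```python
import pandas as pd

# Example names, assumed for illustration only
names = pd.Series(["ARIYALUR", "THE NILGIRIS"])

capitalized = names.str.capitalize()  # first letter of the string only
titled = names.str.title()            # first letter of every word

print(capitalized.tolist())  # ['Ariyalur', 'The nilgiris']
print(titled.tolist())       # ['Ariyalur', 'The Nilgiris']
```

Which one is preferable depends on how the names should appear on the chart axes.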

# Area for Agriculture over the years

Agricultural land has sadly decreased over the years as the state has modernised and buildings have replaced what was once fertile soil. Let's see how much it changed from 1997 to 2013.

In [None]:
grp = df.groupby("Crop_Year")["Area"].sum().sort_index(ascending=True)

In [None]:
ag_area = pd.DataFrame({'Year': grp.index,
                        'Agricultural Area': grp.values})
ag_area.head()

In [None]:
fig = go.Figure(data=go.Scatter(x=ag_area['Year'],
                                y=ag_area['Agricultural Area'],
                                marker_color=ag_area['Agricultural Area']))
fig.update_layout(title='Agricultural Area over the years',
                  xaxis=dict(tickmode='linear', dtick=1))
fig.show()

It is saddening to see this fall from a peak of around 12M to just above 4M. Hopefully something can be done.
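A yearly total can also drop simply because fewer districts or crops were recorded for that year, so it may be worth checking coverage before reading too much into the curve. A minimal sketch on a toy frame (made-up numbers, column names borrowed from the dataset):

```python
import pandas as pd

# Toy data with made-up values; 2013 deliberately has fewer records
toy = pd.DataFrame({
    "Crop_Year": [1998, 1998, 1998, 2013],
    "District_Name": ["A", "B", "C", "A"],
    "Area": [100.0, 200.0, 300.0, 150.0],
})

# Per-year record count, number of distinct districts, and summed area
coverage = toy.groupby("Crop_Year").agg(
    records=("Area", "size"),
    districts=("District_Name", "nunique"),
    total_area=("Area", "sum"),
)
print(coverage)
```

If `records` or `districts` varies a lot between years in the real data, part of the change in total area may come from reporting rather than from land use.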

# Agricultural Area in each District for a certain Year

Since 1998 was a pronounced peak, let's see how the agricultural area was distributed across districts in that year.

In [None]:
grp_dist = df[df.Crop_Year == 1998].groupby("District_Name")["Area"].sum().sort_values(ascending = False)

In [None]:
dist_df = pd.DataFrame({'District': grp_dist.index, 'Agricultural Area': grp_dist.values})
dist_df.head()

In [None]:
fig = px.bar(dist_df, x='District', y='Agricultural Area', color='Agricultural Area', height=600, width=1000, text='Agricultural Area', title='Agricultural Area in 1998')
fig.update_traces(texttemplate='%{text:.2s}', textposition='outside')
fig.update_layout(uniformtext_minsize=8, uniformtext_mode='hide')
fig.show()

What once was :'(

# Analysis based on Season

Crops are grouped into two main agricultural seasons:

* [Kharif Crops](https://en.wikipedia.org/wiki/Kharif_crop)

* [Rabi Crops](https://en.wikipedia.org/wiki/Rabi_crop)

There are also crops that can be grown at any time of the year.

**Season based counts**

In [None]:
df.Season.value_counts()

**Types of crops grown through the year**

In [None]:
df.Crop.value_counts()

In [None]:
se_crop = df.groupby(['Season', 'Crop'])["Production"].sum()

In [None]:
seas_crops = pd.DataFrame({"Production": se_crop}).reset_index()
seas_crops.head()

In [None]:
seas_crops.Season.value_counts()

Let's compare the two seasonal crop types (Kharif and Rabi) first, because combining them with the Whole Year crops makes them hard to see.

In [None]:
wy = seas_crops[seas_crops['Season'] == 'Whole Year']
nwy = seas_crops[seas_crops['Season'] != 'Whole Year']
nwy.head()
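The split above relies on exact string equality, so stray whitespace in the Season labels would silently send every row to the "not Whole Year" side. Whether this dataset has such padding is an assumption to verify against the `value_counts()` output; the toy example below only shows the defensive step:

```python
import pandas as pd

# Toy labels with trailing spaces (hypothetical; check the real column first)
toy = pd.DataFrame({"Season": ["Kharif     ", "Rabi", "Whole Year "]})

# Without stripping, the comparison finds no match
matches_before = (toy["Season"] == "Whole Year").sum()

# Strip surrounding whitespace, then compare again
toy["Season"] = toy["Season"].str.strip()
matches_after = (toy["Season"] == "Whole Year").sum()

print(matches_before, matches_after)  # 0 1
```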

In [None]:
fig = px.sunburst(nwy, path=['Season', 'Crop'], values='Production')
fig.show()

Click on a season to see the crops grown in it :)

Now let's add Whole Year back in.

Whole Year production squeezed the Rabi crops out of view, so I took a 40% sample of the Whole Year rows.

In [None]:
crop_df = pd.concat([wy.sample(frac=0.4), nwy])

In [None]:
fig = px.sunburst(crop_df, path=['Season', 'Crop'], values='Production')
fig.show()
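Because the sample is random, the chart changes on every run. A table of each season's share of total production is one way to put a number on the imbalance without sampling. This is a sketch on made-up values with placeholder crop names, not figures from the dataset:

```python
import pandas as pd

# Made-up production values; crop names are placeholders
toy = pd.DataFrame({
    "Season": ["Kharif", "Rabi", "Whole Year"],
    "Crop": ["crop_a", "crop_b", "crop_c"],
    "Production": [1_000.0, 500.0, 98_500.0],
})

# Total production per season, then each season's percentage of the grand total
season_total = toy.groupby("Season")["Production"].sum()
season_share = (season_total / season_total.sum() * 100).round(1)
print(season_share)
```

On the real data, the same two lines applied to `seas_crops` would show how much of the sunburst each season occupies.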

## Area and Production in each Season

In [None]:
fig = px.scatter(df, x="Production", y="Area", size="Crop_Year", color="Season",
                 log_x=True, size_max=15, title="Area and Production in each season")
fig.show()

# Kharif Production vs Rabi Production 

How does production differ between the two seasons? Let's find out.

In [None]:
dist_s = df.groupby(["District_Name", "Season"])["Production"].sum()

In [None]:
kr = pd.DataFrame({"Production": dist_s}).reset_index()
kr = kr.sort_values("Production", ascending=False)
kr = kr[kr.Season != 'Whole Year']
kr.Season.value_counts()

In [None]:
fig = px.bar(kr, x="District_Name", y="Production", color="Season", title="Kharif vs Rabi in each District")
fig.show()
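For exact numbers alongside the bars, the same grouped data can be pivoted so that each district has one column per season. A minimal sketch on a toy frame (made-up values; `Kharif_minus_Rabi` is a hypothetical helper column):

```python
import pandas as pd

# Toy data with made-up production values
toy = pd.DataFrame({
    "District_Name": ["A", "A", "B", "B"],
    "Season": ["Kharif", "Rabi", "Kharif", "Rabi"],
    "Production": [100.0, 40.0, 60.0, 90.0],
})

# One row per district, one column per season
wide = toy.pivot_table(index="District_Name", columns="Season",
                       values="Production", aggfunc="sum", fill_value=0)

# Positive means Kharif production exceeds Rabi in that district
wide["Kharif_minus_Rabi"] = wide["Kharif"] - wide["Rabi"]
print(wide)
```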

# Final Table

We've looked at agricultural area, seasonal crops and production across districts. Hope you liked it :)

I will leave you with one final table before closing.

It shows total production for each combination of:

* Season

* District

* Crop

In [None]:
fin = df.groupby(["Season","District_Name","Crop"])["Production"].sum()

In [None]:
final_df = pd.DataFrame({"Production": fin}).reset_index()
final_df.sort_values("Production", ascending=False, inplace=True)

In [None]:
fig = go.Figure(data=[go.Table(
    header=dict(values=list(final_df.columns),
                fill_color='lightblue',
                align='left'),
    cells=dict(values=[final_df.Season, final_df.District_Name, final_df.Crop, final_df.Production],
               fill_color='pink',
               align='left'))
])
fig.show()

# That's all folks

That's it for the analysis and EDA. Hope you liked it. Make sure to check out the dataset [here](https://www.kaggle.com/aishu200023/tamilnadu-cropproduction) if you are interested in analysing it yourself. If you have anything to say about the data or the notebook, let me know in the comments below :)