# Crop Production Analysis in India

## Introduction
Technology is reshaping the agriculture business domain. This project analyzes a dataset on crop production in India to uncover the key indicators and metrics that influence production, as a basis for predicting it. The goal is to create visualizations and dashboards that communicate the findings effectively and facilitate stakeholder collaboration.


## Import Libraries

Import the necessary libraries for data analysis and visualization.


In [None]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

: 

## Data Loading and Preprocessing
In this section, we load the dataset and take a first look at it: its shape, column types, summary statistics, number of unique values, and missing values per column.


In [None]:
# Load data (source: https://data.world/thatzprem/agriculture-india)
data = pd.read_csv('crop_production/crop_production.csv')
pd.set_option("display.max_columns", None)

: 

In [None]:
data.shape

: 

In [None]:
data.head()

: 

In [None]:
data.info()

: 

In [None]:
print(data.describe())

: 

In [None]:
print(data.nunique())

: 

In [None]:
print(data.isnull().sum())


: 
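Raw missing-value counts are easier to judge next to the share of rows they represent. The following is a minimal sketch of that idea; the `sample` frame holds made-up rows for illustration only, and `missing_summary` is a hypothetical name, not part of the notebook.

```python
import numpy as np
import pandas as pd

# Made-up rows for illustration only; not taken from the real dataset.
sample = pd.DataFrame({
    "Crop": ["Rice", "Wheat", "Maize", "Rice"],
    "Area": [100.0, 250.0, 80.0, 120.0],
    "Production": [300.0, np.nan, 160.0, np.nan],
})

# Count and percentage of missing values per column
missing_summary = pd.DataFrame({
    "missing": sample.isnull().sum(),
    "percent": (sample.isnull().mean() * 100).round(2),
})
print(missing_summary)
```

Applying the same two expressions to `data` would show whether dropping incomplete rows discards a small or a large part of the dataset.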

## Data Cleaning

Remove rows with missing values, cast `Crop_Year` to an integer type, and drop duplicate rows so that later aggregations are not distorted.


In [None]:
# Drop rows with missing values
data.dropna(inplace=True)

# Ensure Correct Data Types
data['Crop_Year'] = data['Crop_Year'].astype(int)

# Check for Duplicates
data = data.drop_duplicates()
data.shape


: 
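It can help to record how many rows each cleaning step removes. The sketch below applies the same three steps to a small made-up frame and reports the counts; the variable names (`rows_start`, `cleaned`, and so on) are illustrative assumptions, not part of the notebook.

```python
import numpy as np
import pandas as pd

# Made-up rows for illustration only; not taken from the real dataset.
sample = pd.DataFrame({
    "Crop_Year": [2000.0, 2000.0, 2001.0, 2001.0, 2002.0],
    "Crop": ["Rice", "Rice", "Wheat", "Maize", "Rice"],
    "Production": [300.0, 300.0, np.nan, 160.0, 310.0],
})

rows_start = len(sample)

# Step 1: drop rows with any missing value
cleaned = sample.dropna()
rows_after_dropna = len(cleaned)

# Step 2: drop exact duplicate rows
cleaned = cleaned.drop_duplicates()
rows_after_dedup = len(cleaned)

# Step 3: store the year as an integer
cleaned = cleaned.astype({"Crop_Year": int})

print(f"Dropped for missing values: {rows_start - rows_after_dropna}")
print(f"Dropped as duplicates: {rows_after_dropna - rows_after_dedup}")
print(f"Rows remaining: {rows_after_dedup}")
```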

## Exploratory Data Analysis (EDA)
This section includes various visualizations to understand the data better and identify patterns and trends.


## Univariate Analysis

### Distribution of Numerical Features
We will plot histograms for numerical features to understand their distribution.


In [None]:
import matplotlib.pyplot as plt

# Plot histograms for numerical features
data.hist(figsize=(12, 10))
plt.suptitle('Histograms of Numerical Features')
plt.show()


: 

### Distribution of Categorical Features
Next, we will visualize the distribution of categorical features using bar charts.


In [None]:
import seaborn as sns

# Plot bar charts for categorical features
for col in ['State_Name', 'District_Name', 'Season', 'Crop']:
    plt.figure(figsize=(12, 6))
    sns.countplot(x=data[col], order=data[col].value_counts().index)
    plt.title(f'Distribution of {col}')
    plt.xticks(rotation=90)
    plt.show()


: 
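Columns with many distinct values can produce bar charts that are hard to read. One possible approach, sketched here on made-up values, is to keep only the most frequent categories and report how many records fall outside them. The cutoff `top_n` is a hypothetical parameter chosen for illustration.

```python
import pandas as pd

# Made-up values for illustration only; not taken from the real dataset.
crops = pd.Series(
    ["Rice", "Rice", "Rice", "Wheat", "Wheat", "Maize", "Bajra", "Jowar"],
    name="Crop",
)

top_n = 2  # hypothetical cutoff; a larger value would suit real data

# value_counts() sorts categories from most to least frequent
counts = crops.value_counts()
top_counts = counts.head(top_n)
other_count = counts.iloc[top_n:].sum()

print(top_counts)
print(f"Records in remaining categories: {other_count}")
```

The `top_counts` index could then be passed as the `order` argument of `sns.countplot` after filtering the data to those categories.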

## Bivariate Analysis

## Yearly Crop Production

Analyze and visualize the total crop production for each year to observe trends and patterns over time.


In [None]:
# Group by Crop_Year and sum Production
df_yearly_production = data.groupby('Crop_Year')['Production'].sum().reset_index()
print(df_yearly_production)
# Plot crop production over time
plt.figure(figsize=(12, 6))
plt.plot(df_yearly_production['Crop_Year'], df_yearly_production['Production'], marker='o')
plt.xlabel('Crop Year')
plt.ylabel('Total Production')
plt.title('Yearly Crop Production')
plt.show()


: 
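Year-over-year percentage change can make the trend in yearly totals easier to quantify. The sketch below assumes a frame shaped like `df_yearly_production`; the numbers are made up, and the column name `YoY_Change_Pct` is a hypothetical addition.

```python
import pandas as pd

# Made-up yearly totals for illustration only.
yearly = pd.DataFrame({
    "Crop_Year": [2000, 2001, 2002, 2003],
    "Production": [100.0, 110.0, 99.0, 198.0],
})

# pct_change() compares each row with the previous one, so sort by year first
yearly = yearly.sort_values("Crop_Year")
yearly["YoY_Change_Pct"] = yearly["Production"].pct_change() * 100

print(yearly)
```

The first year has no predecessor, so its change is `NaN`.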


## Top 10 Crops by Area

Identify and visualize the top 10 crops by total area under cultivation.


In [None]:
# Group by Crop and sum Area
df_top_crops = data.groupby('Crop')['Area'].sum().reset_index()
df_top_crops = df_top_crops.sort_values('Area', ascending=False).head(10)
df_top_crops

: 

In [None]:
# Plot bar chart
plt.figure(figsize=(12, 6))
plt.barh(df_top_crops['Crop'], df_top_crops['Area'])
plt.xlabel('Total Area')
plt.title('Top 10 Crops by Area')
plt.show()

: 

## Pie Chart (Production Proportions)

In [None]:
# Group by Crop and sum Production
df_production = data.groupby('Crop')['Production'].sum().reset_index()
df_production = df_production.sort_values('Production', ascending=False)
print(df_production)
# Plot pie chart
plt.figure(figsize=(10, 8))
plt.pie(df_production['Production'], labels=df_production['Crop'], autopct='%1.1f%%')
plt.title('Crop Production Distribution')
plt.show()


: 
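A pie chart with one slice per crop can become crowded when there are many crops. A common alternative, sketched here under the assumption that a few crops dominate, is to keep the largest slices and fold the rest into a single "Other" slice. The totals and the cutoff `top_n` are made up for illustration.

```python
import pandas as pd

# Made-up production totals per crop, for illustration only.
totals = pd.Series(
    {
        "Coconut": 500.0,
        "Sugarcane": 300.0,
        "Rice": 120.0,
        "Wheat": 50.0,
        "Maize": 20.0,
        "Bajra": 10.0,
    },
    name="Production",
)

top_n = 3  # hypothetical cutoff

# Keep the largest crops and sum everything else into one slice
totals = totals.sort_values(ascending=False)
top = totals.head(top_n)
other = totals.iloc[top_n:].sum()
pie_data = pd.concat([top, pd.Series({"Other": other})])

# Percentage share of each slice
shares = (pie_data / pie_data.sum() * 100).round(1)
print(shares)
```

The resulting `pie_data` values and index could be passed to `plt.pie` as the sizes and `labels`.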

## Correlation Heatmap

In [None]:
# Compute correlation matrix
corr = data[['Crop_Year', 'Area', 'Production']].corr()
print(corr)
# Plot heatmap
plt.figure(figsize=(10, 8))
sns.heatmap(corr, annot=True, cmap='coolwarm')
plt.title('Correlation Matrix')
plt.show()


: 
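`DataFrame.corr()` computes Pearson correlation by default, which a few extreme values can dominate. If the numerical columns turn out to be heavily skewed (an assumption, not something verified here), a rank-based Spearman correlation may be a useful cross-check. The sketch below uses made-up numbers in which one large observation drives the Pearson coefficient.

```python
import pandas as pd

# Made-up values for illustration only: four small rows and one extreme row.
sample = pd.DataFrame({
    "Area": [1.0, 2.0, 3.0, 4.0, 1000.0],
    "Production": [5.0, 4.0, 3.0, 2.0, 1500.0],
})

# Pearson works on raw values; Spearman works on ranks
pearson = sample.corr(method="pearson")
spearman = sample.corr(method="spearman")

print("Pearson:", pearson.loc["Area", "Production"])
print("Spearman:", spearman.loc["Area", "Production"])
```

In this toy example the two coefficients disagree sharply because the single extreme row dominates the raw values but counts as only one rank.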

## Seasonal Analysis

### Seasonal Crops
Analyze which crops are prevalent in different seasons.


In [None]:
# Group by Crop_Year and Season
df_seasonal_crops = data.groupby(['Crop_Year', 'Season'])['Crop'].unique().reset_index()
df_seasonal_crops['Crop'] = df_seasonal_crops['Crop'].apply(','.join)
df_seasonal_crops = df_seasonal_crops.sort_values('Crop')
print(df_seasonal_crops)
# Plot seasonal crops
plt.figure(figsize=(12, 6))
sns.countplot(data=data, x='Season', order=data['Season'].value_counts().index)
plt.title('Crop Distribution by Season')
plt.xticks(rotation=45)
plt.show()


: 
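Season counts show how many records each season has, not how much was produced. The sketch below totals production per season on made-up rows. It also strips whitespace from the labels first, on the assumption that the `Season` column may carry stray padding that would split one season into several groups.

```python
import pandas as pd

# Made-up rows for illustration only; the padded labels are an assumption.
sample = pd.DataFrame({
    "Season": ["Kharif     ", "Kharif", "Rabi  ", "Rabi", "Whole Year "],
    "Production": [100.0, 50.0, 80.0, 20.0, 300.0],
})

# Normalize labels so padded and unpadded variants fall into one group
sample["Season"] = sample["Season"].str.strip()

# Total production per season, largest first
season_totals = (
    sample.groupby("Season")["Production"].sum().sort_values(ascending=False)
)
print(season_totals)
```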

## Production Analysis

In [None]:
# Plot top crops by total production
df_top_production = data.groupby('Crop')['Production'].sum().reset_index()
df_top_production = df_top_production.sort_values('Production', ascending=False).head(10)
print(df_top_production)
plt.figure(figsize=(12, 6))
plt.barh(df_top_production['Crop'], df_top_production['Production'])
plt.xlabel('Total Production')
plt.title('Top 10 Crops by Production')
plt.show()


: 

## Conclusion
The EDA revealed trends and distributions in crop production. We examined the distributions of the numerical and categorical features, tracked total production by year, ranked crops by area and by production, checked correlations among year, area, and production, and compared record counts across seasons. These insights will help guide further analysis and modeling.

