# 📊 Unemployment Analysis with Python - COVID-19 Impact Study

This Jupyter Notebook provides a comprehensive analysis of unemployment trends in India during 2020, with a particular focus on the impact of the COVID-19 pandemic. It covers data cleaning, exploratory data analysis, visualization, and an in-depth assessment of COVID-19's influence on unemployment rates.

## Table of Contents

- [1. Introduction](#1.-Introduction)
- [2. Data Loading and Initial Exploration](#2.-Data-Loading-and-Initial-Exploration)
- [3. Data Cleaning and Preprocessing](#3.-Data-Cleaning-and-Preprocessing)
- [4. Exploratory Data Analysis (EDA) and Visualization](#4.-Exploratory-Data-Analysis-(EDA)-and-Visualization)
- [5. COVID-19 Impact Analysis](#5.-COVID-19-Impact-Analysis)
- [6. Key Patterns and Seasonal Trends](#6.-Key-Patterns-and-Seasonal-Trends)
- [7. Conclusion and Policy Implications](#7.-Conclusion-and-Policy-Implications)

## 1. Introduction

The COVID-19 pandemic significantly disrupted global economies and labor markets. This project aims to analyze the unemployment data in India to understand how the pandemic affected employment, identify regional disparities, and observe any underlying trends.

## 2. Data Loading and Initial Exploration

First, we load the dataset and perform an initial inspection to understand its structure and data types and to spot any immediate issues.


In [None]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px

# Load the dataset
df = pd.read_csv('unemployment_rate_upto_11_2020.csv')

# Display basic information about the dataset
print('Dataset Info:')
df.info()

# Display the first few rows of the dataset
print('\nFirst 5 rows of the dataset:')
print(df.head())

# Display descriptive statistics
print('\nDescriptive Statistics:')
print(df.describe())
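
Before relying on column names or positions, it can help to normalize the headers that `read_csv` returns. The following is a minimal sketch, assuming the raw headers might carry stray leading or trailing spaces (a common CSV export quirk; the notebook itself does not say whether this file has them). The helper `strip_header_whitespace` and the toy frame are hypothetical and use made-up values.

```python
import pandas as pd


def strip_header_whitespace(frame: pd.DataFrame) -> pd.DataFrame:
    """Return a copy whose column names have no leading/trailing spaces."""
    cleaned = frame.copy()
    cleaned.columns = [str(name).strip() for name in cleaned.columns]
    return cleaned


# Toy frame with made-up values; it only mimics headers with stray spaces.
toy = pd.DataFrame({
    'Region': ['A', 'B'],
    ' Date': ['31-01-2020', '29-02-2020'],
    ' Rate ': [5.0, 6.0],
})
tidy = strip_header_whitespace(toy)
print(list(tidy.columns))
```

Applied to the real data, the call would be `df = strip_header_whitespace(df)` right after loading; the original frame is left untouched because the helper works on a copy.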


## 3. Data Cleaning and Preprocessing

We clean the data by checking for missing values, renaming the columns for easier access, and converting the date and categorical columns to appropriate types.


In [None]:
# Check for missing values
print('\nMissing Values:')
print(df.isnull().sum())

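# Rename columns for easier access.
# Note: this assignment is positional, so it assumes the CSV has exactly
# these nine columns in this order.
df.columns = [
    'Region', 'Date', 'Frequency',
    'Estimated Unemployment Rate', 'Estimated Employed',
    'Estimated Labour Participation Rate',
    'Region_Category', 'longitude', 'latitude',
]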

# Convert 'Date' column to datetime objects
df['Date'] = pd.to_datetime(df['Date'], dayfirst=True)

# Convert 'Frequency' and 'Region_Category' to categorical type
df['Frequency'] = df['Frequency'].astype('category')
df['Region_Category'] = df['Region_Category'].astype('category')

# Display updated info to confirm data types
print('\nUpdated Dataset Info after type conversion:')
df.info()

# Save the cleaned data to a new CSV for further use (optional, for script-based analysis)
df.to_csv('cleaned_unemployment_data.csv', index=False)
print('\nCleaned data saved to cleaned_unemployment_data.csv')
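
The cleaning steps above can also be wrapped in one function that fails loudly when the input does not have the expected shape. This is a sketch under stated assumptions rather than the notebook's own method: `rename_and_parse` and `EXPECTED_COLUMNS` are hypothetical names, the date strings are assumed to be day-first (as the `dayfirst=True` call above implies), and the toy rows are made-up values.

```python
import pandas as pd

# Target names, in the positional order used by the rename step above.
EXPECTED_COLUMNS = [
    'Region', 'Date', 'Frequency',
    'Estimated Unemployment Rate', 'Estimated Employed',
    'Estimated Labour Participation Rate',
    'Region_Category', 'longitude', 'latitude',
]


def rename_and_parse(raw: pd.DataFrame) -> pd.DataFrame:
    """Rename columns positionally, parse dates, and set categorical dtypes."""
    if raw.shape[1] != len(EXPECTED_COLUMNS):
        raise ValueError(
            f'expected {len(EXPECTED_COLUMNS)} columns, got {raw.shape[1]}'
        )
    out = raw.copy()
    out.columns = EXPECTED_COLUMNS
    # Strip stray spaces before parsing so padded strings do not break parsing.
    out['Date'] = pd.to_datetime(out['Date'].astype(str).str.strip(), dayfirst=True)
    for col in ('Frequency', 'Region_Category'):
        out[col] = out[col].astype('category')
    return out


# Toy input with made-up values and placeholder header names.
toy_raw = pd.DataFrame(
    [
        ['Toyland', ' 31-01-2020', 'M', 5.0, 1000, 40.0, 'North', 10.0, 20.0],
        ['Toyland', ' 29-02-2020', 'M', 6.0, 990, 39.5, 'North', 10.0, 20.0],
    ],
    columns=[f'raw_{i}' for i in range(9)],
)
cleaned = rename_and_parse(toy_raw)
print(cleaned.dtypes)
```

The column-count check is there because a positional rename silently mislabels data if the file gains, loses, or reorders a column.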


## 4. Exploratory Data Analysis (EDA) and Visualization

This section focuses on visualizing the data to understand distributions, relationships, and initial trends.


In [None]:
# 1. Unemployment Rate Distribution
plt.figure(figsize=(10, 6))
sns.histplot(df['Estimated Unemployment Rate'], kde=True)
plt.title('Distribution of Estimated Unemployment Rate')
plt.xlabel('Estimated Unemployment Rate (%)')
plt.ylabel('Frequency')
plt.savefig('unemployment_rate_distribution.png')
plt.show()

# 2. Unemployment Rate by Region Category
plt.figure(figsize=(12, 7))
sns.boxplot(x='Region_Category', y='Estimated Unemployment Rate', data=df)
plt.title('Estimated Unemployment Rate by Region Category')
plt.xlabel('Region Category')
plt.ylabel('Estimated Unemployment Rate (%)')
plt.savefig('unemployment_rate_by_region_category.png')
plt.show()

# 3. Correlation Heatmap
plt.figure(figsize=(8, 6))
numeric_cols = ['Estimated Unemployment Rate', 'Estimated Employed', 'Estimated Labour Participation Rate']
sns.heatmap(df[numeric_cols].corr(), annot=True, cmap='coolwarm')
plt.title('Correlation Heatmap of Key Economic Indicators')
plt.savefig('correlation_heatmap.png')
plt.show()
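
A numeric table can complement the box plot by stating the per-category figures that the plot only shows visually. The sketch below assumes the cleaned column names used in this notebook; `summarize_by_category` is a hypothetical helper, and the toy values are made up for illustration.

```python
import pandas as pd


def summarize_by_category(
    frame: pd.DataFrame,
    value_col: str = 'Estimated Unemployment Rate',
    group_col: str = 'Region_Category',
) -> pd.DataFrame:
    """Count, median, mean, and max of value_col per group, highest median first."""
    summary = (
        frame.groupby(group_col, observed=True)[value_col]
        .agg(['count', 'median', 'mean', 'max'])
    )
    return summary.sort_values('median', ascending=False)


# Toy data with made-up values, not taken from the real dataset.
toy = pd.DataFrame({
    'Region_Category': ['North', 'North', 'North', 'South', 'South'],
    'Estimated Unemployment Rate': [4.0, 8.0, 12.0, 3.0, 5.0],
})
summary = summarize_by_category(toy)
print(summary)
```

On the real data, `summarize_by_category(df)` would give one row per region category. The median is listed alongside the mean because unemployment rates with a few extreme months can pull the mean upward.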


## 5. COVID-19 Impact Analysis

We analyze how the unemployment rate changed during the COVID-19 pandemic by comparing three periods: pre-COVID (before March 2020), peak COVID (March to August 2020), and post-peak (after August 2020).


In [None]:
# Group by date and calculate the average unemployment rate
monthly_unemployment = df.groupby('Date')['Estimated Unemployment Rate'].mean().reset_index()

plt.figure(figsize=(15, 7))
sns.lineplot(x='Date', y='Estimated Unemployment Rate', data=monthly_unemployment)
plt.title('Average Estimated Unemployment Rate Over Time (All Regions)')
plt.xlabel('Date')
plt.ylabel('Average Estimated Unemployment Rate (%)')
plt.xticks(rotation=45)
plt.grid(True)
plt.tight_layout()
plt.savefig('average_unemployment_rate_overall.png')
plt.show()

# Analyze pre-COVID vs. during COVID (assuming COVID impact starts around March-April 2020)
pre_covid_df = df[df['Date'] < '2020-03-01']
during_covid_df = df[(df['Date'] >= '2020-03-01') & (df['Date'] <= '2020-08-31')] # Peak impact period
post_peak_covid_df = df[df['Date'] > '2020-08-31']

print('Average Unemployment Rate Pre-COVID (Jan-Feb 2020):')
print(pre_covid_df.groupby('Region_Category')['Estimated Unemployment Rate'].mean())

print('Average Unemployment Rate During Peak COVID (Mar-Aug 2020):')
print(during_covid_df.groupby('Region_Category')['Estimated Unemployment Rate'].mean())

print('Average Unemployment Rate Post-Peak COVID (Sep-Nov 2020):')
print(post_peak_covid_df.groupby('Region_Category')['Estimated Unemployment Rate'].mean())
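
The three printed series above can be combined into a single table, which makes the change between periods easier to read. This is a minimal sketch that reuses the same date boundaries as the filters above; `label_period`, `period_means`, and the `peak_minus_pre` column are hypothetical names, and the toy rows are made-up values.

```python
import pandas as pd

# Same boundaries as the filters above: peak runs from 1 March to 31 August 2020.
PEAK_START = pd.Timestamp('2020-03-01')
PEAK_END = pd.Timestamp('2020-08-31')


def label_period(dates: pd.Series) -> pd.Series:
    """Label each date as 'pre', 'peak', or 'post' relative to the peak window."""
    period = pd.Series('peak', index=dates.index)
    period[dates < PEAK_START] = 'pre'
    period[dates > PEAK_END] = 'post'
    return period


def period_means(frame: pd.DataFrame) -> pd.DataFrame:
    """Mean unemployment rate per region category and period, plus the pre-to-peak change."""
    labelled = frame.assign(Period=label_period(frame['Date']))
    table = labelled.pivot_table(
        index='Region_Category',
        columns='Period',
        values='Estimated Unemployment Rate',
        aggfunc='mean',
    ).reindex(columns=['pre', 'peak', 'post'])
    table['peak_minus_pre'] = table['peak'] - table['pre']
    return table


# Toy data with made-up values, not taken from the real dataset.
toy = pd.DataFrame({
    'Region_Category': ['North', 'North', 'North', 'North', 'South', 'South', 'South'],
    'Date': pd.to_datetime([
        '2020-01-31', '2020-04-30', '2020-05-31', '2020-10-31',
        '2020-02-29', '2020-04-30', '2020-09-30',
    ]),
    'Estimated Unemployment Rate': [5.0, 20.0, 16.0, 8.0, 4.0, 10.0, 6.0],
})
table = period_means(toy)
print(table)
```

The difference column is a simple percentage-point gap between period averages; it describes the size of the change and does not by itself establish what caused it.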


## 6. Key Patterns and Seasonal Trends

This section identifies monthly and regional patterns in the unemployment data.


In [None]:
# Monthly average unemployment rate across all regions
df['Month'] = df['Date'].dt.month_name()
monthly_avg_unemployment = df.groupby('Month')['Estimated Unemployment Rate'].mean().reindex([
    'January', 'February', 'March', 'April', 'May', 'June',
    'July', 'August', 'September', 'October', 'November', 'December'
]).reset_index()

plt.figure(figsize=(12, 6))
sns.barplot(x='Month', y='Estimated Unemployment Rate', data=monthly_avg_unemployment, palette='viridis')
plt.title('Average Estimated Unemployment Rate by Month (Overall)')
plt.xlabel('Month')
plt.ylabel('Average Estimated Unemployment Rate (%)')
plt.xticks(rotation=45)
plt.tight_layout()
plt.savefig('monthly_unemployment_trends.png')
plt.show()

# State-wise unemployment rate during peak COVID
peak_covid_state_unemployment = during_covid_df.groupby('Region')['Estimated Unemployment Rate'].mean().reset_index()
peak_covid_state_unemployment = peak_covid_state_unemployment.sort_values(by='Estimated Unemployment Rate', ascending=False)

plt.figure(figsize=(15, 8))
sns.barplot(x='Estimated Unemployment Rate', y='Region', data=peak_covid_state_unemployment.head(10), palette='coolwarm')
plt.title('Top 10 Regions with Highest Average Unemployment Rate During Peak COVID')
plt.xlabel('Average Estimated Unemployment Rate (%)')
plt.ylabel('Region')
plt.tight_layout()
plt.savefig('top_10_regions_peak_covid.png')
plt.show()

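# Regional impact over time (using Plotly for interactivity)
# line_group='Region' draws one line per region; without it, Plotly joins
# points from different regions of the same category into one zigzag trace.
fig = px.line(df, x='Date', y='Estimated Unemployment Rate', color='Region_Category',
              line_group='Region',
              title='Estimated Unemployment Rate Over Time by Region Category (Interactive)',
              hover_data={'Region': True, 'Estimated Employed': True})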
fig.write_html('unemployment_rate_time_series_interactive.html')
fig.show()
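
Grouping by month name with a fixed twelve-month order leaves an empty entry for any month that is absent from the data, and it would pool the same month across years if more than one year were present. One alternative is to key the averages by year and month and add the month-over-month change. The sketch below assumes the cleaned `Date` column is already a datetime; `monthly_mean_with_change` and its output column names are hypothetical, and the toy values are made up.

```python
import pandas as pd


def monthly_mean_with_change(
    frame: pd.DataFrame,
    value_col: str = 'Estimated Unemployment Rate',
) -> pd.DataFrame:
    """Mean rate per calendar month (YYYY-MM) and its change from the previous observed month."""
    monthly = frame.groupby(frame['Date'].dt.to_period('M'))[value_col].mean()
    out = monthly.to_frame('mean_rate')
    # diff() compares consecutive rows, i.e. consecutive months present in the data.
    out['change_vs_prev_month'] = out['mean_rate'].diff()
    out.index = out.index.astype(str)  # e.g. '2020-04'
    return out


# Toy data with made-up values, not taken from the real dataset.
toy = pd.DataFrame({
    'Date': pd.to_datetime([
        '2020-03-31', '2020-03-31', '2020-04-30', '2020-04-30', '2020-05-31',
    ]),
    'Estimated Unemployment Rate': [8.0, 10.0, 20.0, 24.0, 18.0],
})
monthly_table = monthly_mean_with_change(toy)
print(monthly_table)
```

Because the data here covers a single year, these monthly figures describe the 2020 trajectory; separating a genuine seasonal pattern from the pandemic shock would need several years of data.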


## 7. Conclusion and Policy Implications

The analysis indicates that the COVID-19 pandemic severely disrupted India's labor market in 2020. Unemployment peaked in April-May 2020, which coincided with, and is most plausibly explained by, the lockdown measures, and the impact fell disproportionately on certain regions.

**Key Policy Implications:**

-   **Targeted Support**: Regions most affected (e.g., Tripura, Bihar, Delhi) require targeted employment support programs and economic aid to accelerate recovery.
-   **Labor Market Flexibility**: The rapid recovery in some regions suggests the importance of flexibility and adaptability for businesses and workers.
-   **Continuous Monitoring**: Ongoing monitoring of labor market indicators is crucial to anticipate future shocks and adjust policies accordingly.
-   **Economic Diversification**: Encouraging economic diversification in regions heavily reliant on vulnerable sectors could strengthen their resilience to crises.

This report provides a foundation for understanding unemployment dynamics during a crisis and underscores the importance of a data-driven approach to effective policy-making.
