<a href="https://colab.research.google.com/github/noobhacker02/CBT-CIP/blob/main/Project_2.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# 📊 Task 2: Unemployment Analysis - Gradio Web App (CipherByte Internship)

As part of my internship at **CipherByte Technologies**, I developed an interactive data analysis web application to visualize and analyze unemployment trends in India using Gradio and Python.

## 🚀 Key Features
- 📁 **Dual File Upload**: Accepts two `.xlsx` files:
  - `Unemployment_Rate_upto_11_2020.xlsx`
  - `Unemployment in India.xlsx`
- 🔍 **Preprocessing**: Drops unused columns and rows without a region, parses day-first dates, and concatenates both sheets into one time-series table.
- 📈 **Data Visualizations**:
  - 📉 **Time-Series Plot**: Rural vs. Urban unemployment trends.
  - 📦 **Boxplot**: State-wise distribution of unemployment.
  - 🔥 **Correlation Heatmap**: Relationship among employment indicators.
- 📑 **Text Outputs**:
  - 🧮 Summary statistics of key columns.
  - 🏆 Top 5 states by mean unemployment rate during COVID (March 2020 onward).
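
The COVID-era ranking can be sketched roughly as below. This is a minimal illustration on synthetic rows (the region labels `A`, `B`, `C` and all numbers are placeholders, not real figures), and `top_states_since` is a hypothetical helper name, not a function from the notebook.

```python
import pandas as pd

RATE = "Estimated Unemployment Rate (%)"

# Synthetic placeholder rows; these are NOT real unemployment figures.
df = pd.DataFrame({
    "Region": ["A", "A", "B", "B", "C", "C"],
    "Date": pd.to_datetime([
        "2020-01-31", "2020-04-30", "2020-04-30",
        "2020-05-31", "2020-02-29", "2020-05-31",
    ]),
    RATE: [5.0, 20.0, 10.0, 14.0, 30.0, 8.0],
})

def top_states_since(frame, start="2020-03-01", n=5):
    """Rank regions by mean unemployment rate from `start` onward (hypothetical helper)."""
    recent = frame[frame["Date"] >= pd.Timestamp(start)]
    return recent.groupby("Region")[RATE].mean().nlargest(n)

ranking = top_states_since(df, n=2)
print(ranking)
```

Rows dated before the cutoff are ignored, so a region with a high rate only in early 2020 does not enter the ranking.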

## 🛠️ Tech Stack
- `Python`
- `Gradio` – for interactive UI
- `Pandas` – for data manipulation
- `Seaborn` & `Matplotlib` – for data visualization
- `Pillow` – for converting saved plots into images that Gradio can display
- `Openpyxl` – for Excel file support
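
To show how Matplotlib and Pillow fit together here, the sketch below saves a figure to an in-memory PNG and reopens it as a `PIL.Image`. The helper name `figure_to_image` and the plotted numbers are placeholders for illustration; the `Agg` backend is an assumption made so the snippet runs without a display.

```python
import io

import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so no display is required
import matplotlib.pyplot as plt
from PIL import Image

def figure_to_image(fig):
    """Render a Matplotlib figure to an in-memory PNG and load it with Pillow."""
    buf = io.BytesIO()
    fig.savefig(buf, format="png")
    plt.close(fig)   # release the figure once it has been saved
    buf.seek(0)      # rewind so Pillow reads from the start of the buffer
    img = Image.open(buf)
    img.load()       # decode now, while the buffer is still in scope
    return img

# Placeholder data, only to have something to draw.
fig, ax = plt.subplots(figsize=(4, 3))
ax.plot([1, 2, 3], [4.0, 6.5, 5.2])
ax.set_title("Placeholder series")

image = figure_to_image(fig)
print(image.format, image.size)
```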

## 📂 How It Works
1. Upload the two specified Excel sheets.
2. The app cleans each sheet, concatenates them into a single table, and prepares the visualizations.
3. Outputs include:
   - Summary statistics
   - COVID-era top 5 high-unemployment states
   - Trendline, boxplot, and correlation heatmap
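
The cleaning and combining step can be sketched on two tiny stand-in frames, assuming the column names used in the notebook. The region labels and values are synthetic placeholders, and `preprocess` is a hypothetical helper name.

```python
import pandas as pd

RATE = "Estimated Unemployment Rate (%)"

# Stand-ins for the two uploaded sheets; all values are synthetic placeholders.
sheet1 = pd.DataFrame({
    "Region": ["State A", "State B"],
    "Date": ["31-01-2020", "29-02-2020"],
    "Frequency": ["M", "M"],
    RATE: [5.5, 10.3],
    "Region.1": ["Zone 1", "Zone 2"],
})
sheet2 = pd.DataFrame({
    "Region": ["State C", None],   # the second row mimics a blank spreadsheet row
    "Date": ["31-05-2019", None],
    RATE: [6.6, None],
    "Area": ["Rural", None],
})

def preprocess(first, second):
    """Drop unused columns and blank rows, parse day-first dates, then stack the frames."""
    first = first.drop(columns=["Region.1", "Frequency"], errors="ignore").copy()
    second = second.dropna(subset=["Region"]).copy()
    for frame in (first, second):
        frame["Date"] = pd.to_datetime(frame["Date"], dayfirst=True)
    return pd.concat([first, second], ignore_index=True)

combined = preprocess(sheet1, sheet2)
print(combined)
```

Because the frames are stacked rather than joined on a key, columns that exist in only one sheet (such as `Area` here) are filled with `NaN` for rows from the other sheet.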

## 📊 Visual Examples
- **Unemployment Over Time**: Tracks rate fluctuation by area (urban/rural).
- **Boxplot by Region**: Shows variance and outliers across Indian states.
- **Heatmap**: Highlights relationships between employment metrics.
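
The heatmap is a rendering of a pairwise correlation matrix. The sketch below computes such a matrix with `DataFrame.corr()` on synthetic placeholder values (not real data), using the three indicator columns that the notebook looks for.

```python
import pandas as pd

UNEMP = "Estimated Unemployment Rate (%)"
EMPLOYED = "Estimated Employed"
PARTICIPATION = "Estimated Labour Participation Rate (%)"

# Synthetic placeholder values, chosen only to make the relationships easy to see.
metrics = pd.DataFrame({
    UNEMP: [4.0, 6.0, 8.0, 10.0],
    EMPLOYED: [400, 300, 200, 100],
    PARTICIPATION: [40.0, 42.0, 41.0, 43.0],
})

# Pearson correlation between every pair of numeric columns.
corr = metrics.corr()
print(corr.round(2))
```

In this toy data the employed count falls linearly as the unemployment rate rises, so that pair has a correlation of -1; real data will generally show weaker relationships.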

## 🎯 Purpose
To demonstrate practical data analytics skills through interactive visualization, enabling users to explore trends and gain insights from raw unemployment data.

## 👨‍💻 Developed By
**Talha Shaikh**  
🔗 [LinkedIn](https://www.linkedin.com/in/talha-s-145729339/)  
📌 Project for **#CipherByteTech** Internship

---

> “Bringing data to life with interactive, insightful visuals.”


In [1]:
# Install dependencies (if needed)
!pip install gradio pandas openpyxl matplotlib seaborn pillow --quiet

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import gradio as gr
import io
from PIL import Image

sns.set(style="whitegrid")

def buffer_to_image(buf):
    return Image.open(buf)

def analyze_unemployment(file1, file2):
    # Read Excel files
    df1 = pd.read_excel(file1, sheet_name='Unemployment_Rate_upto_11_2020')
    df2 = pd.read_excel(file2, sheet_name='Unemployment in India')

    # Clean and preprocess
    df1 = df1.drop(columns=['Region.1', 'Frequency'], errors='ignore')
    df2 = df2.dropna(subset=['Region'])

    df1['Date'] = pd.to_datetime(df1['Date'], dayfirst=True)
    df2['Date'] = pd.to_datetime(df2['Date'], dayfirst=True)

    combined_df = pd.concat([df1, df2], ignore_index=True)

    # Check required columns
    required_cols = ['Estimated Unemployment Rate (%)', 'Date', 'Region', 'Area']
    for col in required_cols:
        if col not in combined_df.columns:
            raise ValueError(f"Missing required column: {col}")

    # Summary Statistics
    summary_stats = combined_df.describe().to_string()

    # Time Series Plot
    fig1, ax1 = plt.subplots(figsize=(10, 5))
    sns.lineplot(x='Date', y='Estimated Unemployment Rate (%)', hue='Area', data=combined_df, ax=ax1, errorbar=None)
    ax1.set_title('Unemployment Rate Over Time (Rural vs. Urban)')
    ax1.tick_params(axis='x', rotation=45)
    buf1 = io.BytesIO()
    fig1.tight_layout()
    fig1.savefig(buf1, format="png")
    buf1.seek(0)
    plt.close(fig1)

    # Top 5 states by mean unemployment rate during COVID (March 2020 onwards)
    covid_period = combined_df[combined_df['Date'] >= '2020-03-01']
    top_states = covid_period.groupby('Region')['Estimated Unemployment Rate (%)'].mean().nlargest(5).to_string()

    # Boxplot by Region
    fig2, ax2 = plt.subplots(figsize=(12, 6))
    sns.boxplot(x='Region', y='Estimated Unemployment Rate (%)', data=combined_df, palette='viridis', ax=ax2)
    ax2.tick_params(axis='x', rotation=90)
    ax2.set_title('Unemployment Rate Distribution by State')
    buf2 = io.BytesIO()
    fig2.tight_layout()
    fig2.savefig(buf2, format="png")
    buf2.seek(0)
    plt.close(fig2)

    # Correlation Heatmap
    fig3, ax3 = plt.subplots(figsize=(6, 4))
    corr_columns = ['Estimated Unemployment Rate (%)', 'Estimated Employed', 'Estimated Labour Participation Rate (%)']
    corr_df = combined_df[[col for col in corr_columns if col in combined_df.columns]]
    corr_matrix = corr_df.corr()
    sns.heatmap(corr_matrix, annot=True, cmap='coolwarm', ax=ax3)
    ax3.set_title('Correlation Matrix')
    buf3 = io.BytesIO()
    fig3.tight_layout()
    fig3.savefig(buf3, format="png")
    buf3.seek(0)
    plt.close(fig3)

    # Return all results
    return summary_stats, top_states, buffer_to_image(buf1), buffer_to_image(buf2), buffer_to_image(buf3)

# Gradio Interface
iface = gr.Interface(
    fn=analyze_unemployment,
    inputs=[
        gr.File(label="Upload Unemployment_Rate_upto_11_2020.xlsx"),
        gr.File(label="Upload Unemployment in India.xlsx")
    ],
    outputs=[
        gr.Textbox(label="Summary Statistics"),
        gr.Textbox(label="Top 5 States with Highest Unemployment (March 2020 onward)"),
        gr.Image(label="Unemployment Rate Over Time"),
        gr.Image(label="Boxplot by Region"),
        gr.Image(label="Correlation Matrix")
    ],
    title="Unemployment Analysis - CipherByte Internship",
    description="Developed by Talha Shaikh | [LinkedIn](https://www.linkedin.com/in/talha-s-145729339/) | #cipherbytetech"

)

iface.launch()



