<a href="https://colab.research.google.com/github/sahil9022-crypto/data-science-project-all-/blob/main/Customer_Segmentation_static_.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Install required packages (for Colab)
!pip install gradio scikit-learn plotly

import textwrap

import gradio as gr
import pandas as pd
import plotly.express as px
from sklearn.cluster import KMeans, AgglomerativeClustering, DBSCAN
from sklearn.decomposition import PCA
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

FEATURES = ["Age", "Annual Income (k$)", "Spending Score (1-100)"]

# ---------------------
# Core function
# ---------------------
def customer_segmentation(file, algorithm, n_clusters, eps, min_samples):
    try:
        if file is None:
            return "❌ Please upload a CSV file first.", None, None, None

        # gr.File(type="filepath") passes a path string; older Gradio versions pass a file object
        path = file if isinstance(file, str) else file.name
        df = pd.read_csv(path)

        # Select relevant features and standardize them so that no single
        # feature (e.g. income in k$) dominates the distance calculations
        features = df[FEATURES]
        X = StandardScaler().fit_transform(features)

        # Sliders may deliver floats; scikit-learn expects integers here
        n_clusters, min_samples = int(n_clusters), int(min_samples)

        # Build the selected model
        if algorithm == "KMeans":
            model = KMeans(n_clusters=n_clusters, n_init=10, random_state=42)
        elif algorithm == "Agglomerative":
            model = AgglomerativeClustering(n_clusters=n_clusters)
        elif algorithm == "DBSCAN":
            model = DBSCAN(eps=eps, min_samples=min_samples)
        else:
            return "Invalid Algorithm", None, None, None
        labels = model.fit_predict(X)

        # Add labels to dataframe
        df["Cluster"] = labels

        # PCA on the standardized features for a 2-D view
        reduced = PCA(n_components=2).fit_transform(X)
        df["PCA1"], df["PCA2"] = reduced[:, 0], reduced[:, 1]

        # Plot clusters
        fig = px.scatter(
            df, x="PCA1", y="PCA2", color=df["Cluster"].astype(str),
            hover_data=FEATURES,
            title=f"{algorithm} Clustering Results"
        )

        # DBSCAN labels noise points -1; they do not form a cluster
        mask = labels != -1
        n_found = len(set(labels[mask]))
        n_noise = int((~mask).sum())

        # Silhouette score needs 2 <= n_labels <= n_samples - 1 (noise excluded)
        sil_score = "N/A"
        if 2 <= n_found < mask.sum():
            sil_score = round(float(silhouette_score(X[mask], labels[mask])), 3)

        # AI-style explanation (dedented so Markdown does not render it as a code block)
        noise_note = f" ({n_noise} noise points)" if algorithm == "DBSCAN" else ""
        explanation = textwrap.dedent(f"""
            📊 **Customer Segmentation Summary**:
            - Algorithm Used: {algorithm}
            - Number of clusters: {n_found}{noise_note}
            - Silhouette Score: {sil_score}

            **Business Insight**:
            These clusters can help marketing teams target customers.
            - High income + high spending score → Premium Shoppers
            - High income + low spending score → Potential Retention Target
            - Low income + high spending score → Value Seekers
            """)

        return explanation, df.head().to_html(), fig, str(sil_score)

    except Exception as e:
        return f"❌ Error: {str(e)}", None, None, None


# ---------------------
# Gradio UI
# ---------------------
with gr.Blocks() as demo:
    gr.Markdown("## 🛍 Customer Segmentation Dashboard")

    with gr.Row():
        file = gr.File(label="Upload Mall Customers CSV", type="filepath")

    with gr.Row():
        algorithm = gr.Radio(
            ["KMeans", "Agglomerative", "DBSCAN"],
            value="KMeans", label="Clustering Algorithm"
        )
        n_clusters = gr.Slider(2, 12, value=4, step=1, label="n_clusters (KMeans/Agglomerative)")
        eps = gr.Slider(0.1, 10.0, value=0.5, step=0.1, label="eps (DBSCAN)")
        min_samples = gr.Slider(1, 10, value=5, step=1, label="min_samples (DBSCAN)")

    run_btn = gr.Button("🚀 Run Segmentation")

    with gr.Row():
        explanation = gr.Markdown()

    with gr.Row():
        df_view = gr.HTML()

    with gr.Row():
        fig_view = gr.Plot()

    with gr.Row():
        sil_score = gr.Textbox(label="Silhouette Score")

    run_btn.click(
        customer_segmentation,
        inputs=[file, algorithm, n_clusters, eps, min_samples],
        outputs=[explanation, df_view, fig_view, sil_score]
    )

demo.launch(share=True)

# Customer Segmentation Dashboard (K-Means, Agglomerative, DBSCAN + PCA)

## 📌 Project Pitch (Portfolio / Resume Ready)

This project demonstrates my ability as a **data scientist** to build an **end-to-end customer segmentation solution** with real-world business applications in **Marketing Analytics**.

- **Problem Statement**: Businesses need to understand customer behavior to personalize offers, improve retention, and increase revenue. Manual, rule-based segmentation (for example, fixed age or income brackets) is hard to scale and does not adapt as customer behavior changes.
- **Solution**: Developed an **interactive customer segmentation dashboard** using clustering algorithms (K-Means, Agglomerative, DBSCAN) and dimensionality reduction (PCA). The dashboard summarizes each run with template-based, AI-style explanations that map income and spending patterns to marketing personas.
- **Business Impact**:
  - Identify high-value customers for premium targeting.
  - Recognize discount-sensitive customers for promotions.
  - Support cross-sell/upsell strategies.
  - Provide segment-level insights for data-driven decision-making.
- **Wow Factors**:
  - Interactive Gradio UI for clustering exploration.
  - Automatic cluster explanations with marketing recommendations.
  - Elbow & Silhouette analysis for optimal cluster selection.
  - Downloadable clustered dataset for downstream marketing teams.
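The persona rules in the dashboard's explanation text (high or low income crossed with high or low spending score) can be turned into one label per cluster. A minimal sketch, assuming "high" means at or above the dataset median; the helper name `label_personas`, the median threshold, and the fourth label "Low Engagement" (low income + low spending, which the original rules leave uncovered) are illustrative assumptions rather than the project's method. DBSCAN noise (`-1`) would show up as its own row here.

```python
import pandas as pd

def label_personas(df: pd.DataFrame, cluster_col: str = "Cluster") -> pd.DataFrame:
    """Assign a persona to each cluster from its mean income and spending score.

    Assumption: "high" means at or above the dataset median (illustrative threshold).
    """
    income_median = df["Annual Income (k$)"].median()
    score_median = df["Spending Score (1-100)"].median()
    profile = df.groupby(cluster_col)[["Annual Income (k$)", "Spending Score (1-100)"]].mean()

    def persona(row: pd.Series) -> str:
        high_income = row["Annual Income (k$)"] >= income_median
        high_score = row["Spending Score (1-100)"] >= score_median
        if high_income and high_score:
            return "Premium Shoppers"
        if high_income:
            return "Potential Retention Target"
        if high_score:
            return "Value Seekers"
        return "Low Engagement"  # hypothetical label; not part of the original rules

    profile["Persona"] = profile.apply(persona, axis=1)
    return profile

# Made-up rows with four obvious groups, one per persona
sample = pd.DataFrame({
    "Annual Income (k$)": [120, 118, 115, 117, 20, 24, 22, 26],
    "Spending Score (1-100)": [90, 88, 10, 14, 85, 87, 12, 16],
    "Cluster": [0, 0, 1, 1, 2, 2, 3, 3],
})
result = label_personas(sample)
print(result)
```

A median split keeps the rule independent of the dataset's units; fixed cut-offs (for example, income above 70 k$) would work too but need to be chosen per dataset.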

---

## ⚙️ Tech Stack
- **Language**: Python
- **Libraries**: Pandas, NumPy, scikit-learn, Plotly, Gradio, Matplotlib
- **ML Algorithms**: KMeans, Agglomerative Clustering, DBSCAN
- **Dimensionality Reduction**: PCA
- **Visualization**: Plotly (interactive charts)
- **Deployment UI**: Gradio (Colab-friendly, shareable)
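Because the dashboard's scatter plot shows only two principal components of three features, it is worth checking how much variance that 2-D view keeps. A sketch on synthetic stand-in data (the random columns below are made up; `pd.read_csv("Mall_Customers.csv")` would replace them in real use). Standardizing before PCA is an assumption here, chosen so that income's larger numeric range does not dominate the components.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

FEATURES = ["Age", "Annual Income (k$)", "Spending Score (1-100)"]

# Synthetic stand-in for Mall_Customers.csv (made-up values, not the real dataset)
rng = np.random.default_rng(42)
df = pd.DataFrame({
    "Age": rng.integers(18, 70, size=200),
    "Annual Income (k$)": rng.integers(15, 140, size=200),
    "Spending Score (1-100)": rng.integers(1, 101, size=200),
})

# Standardize, then fit a 2-component PCA for the 2-D view
X = StandardScaler().fit_transform(df[FEATURES])
pca = PCA(n_components=2).fit(X)
ratios = pca.explained_variance_ratio_

print(f"PC1: {ratios[0]:.1%}  PC2: {ratios[1]:.1%}  retained: {ratios.sum():.1%}")
```

If the retained share is low, clusters that overlap in the 2-D plot may still be separated in the full feature space, so the plot is best read as a projection of the clustering, not the clustering itself.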

---

## 📊 Features
- Upload Mall Customers dataset (CSV) or use default.
- Choose clustering algorithm (KMeans / Agglomerative / DBSCAN).
- Elbow plot + Silhouette score for evaluation.
- PCA 2D scatterplot with clusters.
- Cluster profile visualization (bar chart).
- Automated AI-style persona explanations + marketing recommendations.
- Download clustered CSV for further analysis.
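Elbow and silhouette evaluation amount to fitting K-Means over a range of `k` and recording, for each fit, the inertia (within-cluster sum of squares, scikit-learn's `inertia_`) and the silhouette score. A sketch on synthetic `make_blobs` data standing in for the three numeric columns; the range 2 to 8 and the "highest silhouette wins" rule are assumptions, not the project's settings.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Synthetic 3-feature data (stand-in for Age / Income / Spending Score)
X_raw, _ = make_blobs(n_samples=300, centers=4, n_features=3, cluster_std=1.0, random_state=0)
X = StandardScaler().fit_transform(X_raw)

rows = []
for k in range(2, 9):
    km = KMeans(n_clusters=k, n_init=10, random_state=42).fit(X)
    rows.append({
        "k": k,
        "inertia": km.inertia_,                         # elbow curve: within-cluster sum of squares
        "silhouette": silhouette_score(X, km.labels_),  # in [-1, 1], higher is better
    })

sweep = pd.DataFrame(rows)
best_k = int(sweep.loc[sweep["silhouette"].idxmax(), "k"])
print(sweep.round(3))
print("k with highest silhouette:", best_k)
```

`px.line(sweep, x="k", y="inertia")` draws the elbow curve. Inertia tends to keep falling as `k` grows, so the "elbow" (where the drop flattens) is a judgment call; reading it next to the silhouette column gives a second, more objective signal.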

---

## 🧑‍💻 Steps to Run in Google Colab
1. Upload this notebook/script into Google Colab.
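2. Install required dependencies in a Colab code cell (the `!` prefix runs the command in a shell from the notebook):
   ```
   !pip install -q gradio pandas scikit-learn plotly matplotlib numpy
   ```
3. Run the notebook cells. `demo.launch(share=True)` prints a local URL and a temporary public share link (Gradio share links expire after one week).
4. Upload your **Mall_Customers.csv** dataset; the app clusters on its `Age`, `Annual Income (k$)` and `Spending Score (1-100)` columns.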
5. Explore segments interactively.
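Once segments look reasonable, a cluster profile (the data behind a profile bar chart) and a downloadable CSV come down to a group-by and a `to_csv` call. A sketch with hypothetical helper names (`cluster_profile`, `export_clusters`) and made-up sample rows; returning the saved path to a `gr.File` output is one way a Gradio app could offer the download, but that wiring is an assumption and is not in the code above.

```python
import tempfile
from pathlib import Path

import pandas as pd

FEATURES = ["Age", "Annual Income (k$)", "Spending Score (1-100)"]

def cluster_profile(df: pd.DataFrame) -> pd.DataFrame:
    """Mean feature values and member count per cluster."""
    profile = df.groupby("Cluster")[FEATURES].mean().round(1)
    profile["Size"] = df.groupby("Cluster").size()
    return profile

def export_clusters(df: pd.DataFrame, out_dir: str) -> str:
    """Write the labelled dataset to CSV and return the file path (hypothetical helper)."""
    path = Path(out_dir) / "clustered_customers.csv"
    df.to_csv(path, index=False)
    return str(path)

# Made-up rows that already carry a Cluster column
sample = pd.DataFrame({
    "Age": [25, 31, 45, 52],
    "Annual Income (k$)": [40, 42, 90, 95],
    "Spending Score (1-100)": [80, 76, 20, 24],
    "Cluster": [0, 0, 1, 1],
})
profile = cluster_profile(sample)
out_path = export_clusters(sample, tempfile.mkdtemp())
print(profile)
print("Saved to", out_path)
```

For the bar chart, `px.bar(profile.reset_index(), x="Cluster", y=FEATURES, barmode="group")` plots one group of bars per cluster.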

---

## 📂 Dataset Info
- **Mall_Customers.csv**
  - `CustomerID`
  - `Gender`
  - `Age`
  - `Annual Income (k$)`
  - `Spending Score (1-100)`
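Only the three numeric columns feed the clustering; `CustomerID` is an identifier and `Gender` is categorical, so neither enters the distance calculation. A small up-front check like the sketch below (hypothetical helper `validate_mall_customers`, made-up sample rows) turns a wrong upload into a readable message instead of a raw pandas `KeyError`.

```python
import pandas as pd

REQUIRED_COLUMNS = ["Age", "Annual Income (k$)", "Spending Score (1-100)"]

def validate_mall_customers(df: pd.DataFrame) -> pd.DataFrame:
    """Return the clustering features, or raise ValueError if the CSV does not match."""
    missing = [c for c in REQUIRED_COLUMNS if c not in df.columns]
    if missing:
        raise ValueError(f"Missing required column(s): {missing}")
    # Non-numeric entries become NaN so they can be reported together
    features = df[REQUIRED_COLUMNS].apply(pd.to_numeric, errors="coerce")
    if features.isna().any().any():
        raise ValueError("Clustering columns must be numeric with no missing values")
    return features

# Made-up rows in the dataset's column layout
sample = pd.DataFrame({
    "CustomerID": [1, 2],
    "Gender": ["Male", "Female"],
    "Age": [19, 21],
    "Annual Income (k$)": [15, 16],
    "Spending Score (1-100)": [39, 81],
})
print(validate_mall_customers(sample).shape)
```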

---

## 📈 Business Applications
- **Marketing Strategy**: Personalization, targeted promotions.
- **Customer Lifecycle Management**: Retention and upsell strategies.
- **Revenue Optimization**: Align offers with high-value customers.
- **Data-Driven Decision Making**: Actionable insights for marketing and sales.

---

## 📞 Contact Details
**Name**: Sahil Pawar  
**Role**: Data Science Enthusiast | Aspiring Data Scientist | Full-Stack Learner  
**Location**: Sangli, Maharashtra, India  
**Email**: publichacker9999@gmail.com  
**GitHub**: https://github.com/sahil9022-crypto  

---

✅ This project is **portfolio-ready** and highlights both **technical depth** (ML + PCA + dashboard) and **business relevance** (marketing analytics). Perfect for showcasing in **internship/job applications**.