# Advanced Customer Segmentation using Unsupervised Machine Learning
## Part 1: Comprehensive Exploratory Data Analysis (EDA)

**Objective**: Implement advanced customer segmentation using multiple unsupervised learning algorithms to identify distinct customer groups for targeted marketing strategies.

**Author**: [Your Name]  
**Course**: BMCS2003 Artificial Intelligence  
**Assignment**: Machine Learning (Unsupervised)


## 1. Setup and Library Installation
Install and import the libraries required for the clustering analysis.


In [None]:
# Install required packages for advanced clustering
!pip install pandas numpy matplotlib seaborn scikit-learn plotly
!pip install yellowbrick umap-learn kneed
!pip install streamlit --quiet

# Import essential libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots
import warnings
warnings.filterwarnings('ignore')

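# Set visualization style
# 'seaborn-v0_8' exists only in matplotlib >= 3.6; older versions call it 'seaborn'
style = 'seaborn-v0_8' if 'seaborn-v0_8' in plt.style.available else 'seaborn'
plt.style.use(style)
sns.set_palette("husl")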

# Display settings
pd.set_option('display.max_columns', None)
pd.set_option('display.width', 1000)

print("✅ Core libraries imported successfully!")
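
The cell above installs several packages (yellowbrick, umap-learn, kneed) that it does not import, so a clean run does not confirm they are usable. A minimal sketch of an availability check follows; the helper name `missing_modules` and the `REQUIRED_MODULES` list are illustrative assumptions, not part of the original notebook.

```python
from importlib.util import find_spec

# Import names (not pip names) for the packages installed above.
# Note: scikit-learn imports as `sklearn` and umap-learn as `umap`.
REQUIRED_MODULES = [
    "pandas", "numpy", "matplotlib", "seaborn", "sklearn",
    "plotly", "yellowbrick", "umap", "kneed",
]

def missing_modules(modules):
    """Return the module names that cannot be found in the current environment."""
    return [name for name in modules if find_spec(name) is None]

missing = missing_modules(REQUIRED_MODULES)
if missing:
    print(f"Missing modules: {', '.join(missing)}")
else:
    print("All required modules are available.")
```

`find_spec` only locates a module without importing it, so the check stays fast even for heavy packages.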


## 2. Data Loading and Initial Inspection
Load the dataset and run an initial profile of its structure and quality.


In [None]:
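# Load the dataset
# In Google Colab, upload the file first; elsewhere, place it in the working directory
try:
    from google.colab import files
    uploaded = files.upload()
except ImportError:
    pass  # Not running in Colab: read the CSV from the working directory

# Load data
df = pd.read_csv('shopping_trends.csv')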

print(f"📊 Dataset Shape: {df.shape}")
print(f"📅 Data loaded successfully with {df.shape[0]:,} customers and {df.shape[1]} features")
print("\n" + "="*50)
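
If the CSV is missing or empty, `pd.read_csv` fails with an error that can be hard to interpret in a notebook. One possible way to fail early with a clearer message is sketched below; `load_dataset` is a hypothetical helper introduced here for illustration, and the default filename is taken from the cell above.

```python
from pathlib import Path

import pandas as pd

def load_dataset(path="shopping_trends.csv"):
    """Read the CSV at `path`, raising a readable error if it is missing or empty."""
    csv_path = Path(path)
    if not csv_path.is_file():
        raise FileNotFoundError(
            f"Dataset not found at {csv_path.resolve()}; upload or copy it there first."
        )
    data = pd.read_csv(csv_path)
    if data.empty:
        # A header-only file parses without error but has no rows to analyse
        raise ValueError(f"{csv_path} was read but contains no rows.")
    return data
```

Under these assumptions, `df = load_dataset()` could stand in for the direct `pd.read_csv` call.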


In [None]:
# Comprehensive data overview
def data_overview(df):
    """
    Print shape, memory usage, dtype counts, missing values, and duplicate rows.
    Returns df unchanged so the call can be chained.
    """
    print("🔍 DATA OVERVIEW")
    print("="*50)
    
    # Basic info
    print(f"Shape: {df.shape}")
    print(f"Memory Usage: {df.memory_usage(deep=True).sum() / 1024**2:.2f} MB")
    
    # Data types
    print("\n📋 Data Types:")
    dtype_counts = df.dtypes.value_counts()
    for dtype, count in dtype_counts.items():
        print(f"  {dtype}: {count} columns")
    
    # Missing values
    print("\n❌ Missing Values:")
    missing = df.isnull().sum()
    if missing.sum() == 0:
        print("  ✅ No missing values found!")
    else:
        missing_pct = (missing / len(df)) * 100
        missing_df = pd.DataFrame({
            'Missing Count': missing[missing > 0],
            'Percentage': missing_pct[missing > 0]
        })
        print(missing_df)
    
    # Duplicate rows
    duplicates = df.duplicated().sum()
    print(f"\n🔄 Duplicate Rows: {duplicates} ({duplicates/len(df)*100:.2f}%)")
    
    return df

df = data_overview(df)
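
`data_overview` prints dataset-level totals. A per-column view can complement it when deciding which features to keep for clustering. The sketch below is one way to build such a table; `column_profile` and the toy data are assumptions made for illustration and do not reflect the actual columns of `shopping_trends.csv`.

```python
import pandas as pd

def column_profile(df):
    """Return one row per column: dtype, missing count and percentage, unique values."""
    n_rows = len(df)
    missing = df.isnull().sum()
    profile = pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "missing": missing,
        # Guard against division by zero on an empty frame
        "missing_pct": (missing / n_rows * 100).round(2) if n_rows else 0.0,
        "n_unique": df.nunique(),  # NaN is not counted as a distinct value
    })
    # Columns with the most missing values first
    return profile.sort_values("missing", ascending=False)

# Toy frame with made-up values, only to show the shape of the output
toy = pd.DataFrame({"age": [25, 31, None, 31], "segment": ["a", "b", "b", "b"]})
print(column_profile(toy))
```

Returning a DataFrame rather than printing keeps the result available for filtering, for example to list columns whose `missing_pct` exceeds a chosen threshold.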
