# Excel-Style Data Cleaning and Summary with Pandas

This project demonstrates how to clean and summarize customer sales data. It includes removing duplicates, handling missing values, and generating summary reports by region. This workflow simulates typical Excel-based reporting tasks often performed in business settings.


```python
import pandas as pd

# Simulate messy data
data = {
    'Customer': ['Alice', 'Bob', 'Charlie', 'Alice', None],
    'Sales': [250, 150, None, 250, 200],
    'Region': ['West', 'East', 'East', 'West', None]
}
df = pd.DataFrame(data)
```

## Step 1: View and Understand Raw Data

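The raw dataset has the kinds of problems common in real-world Excel exports. Alice's West sale of 250 appears twice, Charlie has no Sales figure, and the last row is missing both its Customer and its Region.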
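A quick way to confirm these problems before cleaning is to count missing values per column and flag repeated rows. The sketch below rebuilds the same sample so it runs on its own. It uses only standard pandas calls (`isna`, `duplicated`). The names `missing_per_column` and `duplicate_mask` are illustrative and do not come from the original notebook.

```python
import pandas as pd

# Rebuild the same messy sample so this block runs on its own
data = {
    'Customer': ['Alice', 'Bob', 'Charlie', 'Alice', None],
    'Sales': [250, 150, None, 250, 200],
    'Region': ['West', 'East', 'East', 'West', None],
}
df = pd.DataFrame(data)

# Count missing values per column (None and NaN both count as missing)
missing_per_column = df.isna().sum()

# Flag rows that exactly repeat an earlier row (keep='first' is the default)
duplicate_mask = df.duplicated()

print(df)
print("\nMissing values per column:\n", missing_per_column)
print("\nDuplicate rows:\n", df[duplicate_mask])
```

Only the second Alice row is flagged, because `duplicated()` marks repeats after the first occurrence.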


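```python
# Clean data
# Drop exact duplicate rows first so the repeated sale does not skew the mean below
df = df.drop_duplicates()
# Fill the missing Sales value with the mean of the remaining sales
df['Sales'] = df['Sales'].fillna(df['Sales'].mean())
# Label missing text fields explicitly instead of leaving them blank
df['Customer'] = df['Customer'].fillna('Unknown')
df['Region'] = df['Region'].fillna('Unknown')
```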

## Step 2: Clean Data and Summarize Sales by Region

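We remove duplicates, fill missing values, and then aggregate total sales by region using `groupby`. Duplicates are dropped first so that Alice's repeated sale does not inflate the mean used to fill Charlie's missing Sales value. The mean of the remaining figures (250, 150, and 200) is 200, which is why East totals 350 in the summary. These are essential data preparation steps before analysis or dashboarding.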
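The order of these steps changes the result. The sketch below wraps the cleaning in a hypothetical helper, `region_totals`, with a `dedupe_first` flag so the two orders can be compared. Both names are illustrative and not part of the original notebook. The helper uses `DataFrame.assign` rather than in-place column assignment. That is a stylistic choice so the input frame is never mutated.

```python
import pandas as pd

# Same messy sample as in the notebook
data = {
    'Customer': ['Alice', 'Bob', 'Charlie', 'Alice', None],
    'Sales': [250, 150, None, 250, 200],
    'Region': ['West', 'East', 'East', 'West', None],
}
raw = pd.DataFrame(data)


def region_totals(frame: pd.DataFrame, dedupe_first: bool = True) -> pd.Series:
    """Hypothetical helper: clean `frame` and total Sales by Region.

    dedupe_first=True drops duplicate rows before computing the mean used
    to fill missing Sales (the notebook's order); False imputes first.
    """
    df = frame.drop_duplicates() if dedupe_first else frame
    # Fill missing Sales with the mean of the rows present at this point
    df = df.assign(Sales=df['Sales'].fillna(df['Sales'].mean()))
    if not dedupe_first:
        df = df.drop_duplicates()
    # Label missing text fields, as the notebook does
    df = df.fillna({'Customer': 'Unknown', 'Region': 'Unknown'})
    return df.groupby('Region')['Sales'].sum()


dedupe_then_impute = region_totals(raw, dedupe_first=True)
impute_then_dedupe = region_totals(raw, dedupe_first=False)
print(dedupe_then_impute)
print(impute_then_dedupe)
```

If you impute first, Alice's 250 is counted twice. The fill value for Charlie then becomes 212.5 instead of 200, and East's total rises from 350 to 362.5.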


```python
# Summary
summary = df.groupby('Region')['Sales'].sum()
print("Total Sales by Region:\n", summary)
```

```text
Total Sales by Region:
 Region
East       350.0
Unknown    200.0
West       250.0
Name: Sales, dtype: float64
```
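In Excel, this summary would usually be a PivotTable. As a sketch, pandas' `DataFrame.pivot_table` with `margins=True` produces a comparable layout that includes a grand-total row. The `Total` label and the `Share` column (each region's share of the grand total) are illustrative additions, not part of the original notebook.

```python
import pandas as pd

# Same sample data and cleaning order as the notebook (dedupe, then impute)
data = {
    'Customer': ['Alice', 'Bob', 'Charlie', 'Alice', None],
    'Sales': [250, 150, None, 250, 200],
    'Region': ['West', 'East', 'East', 'West', None],
}
df = pd.DataFrame(data).drop_duplicates()
df = df.assign(Sales=df['Sales'].fillna(df['Sales'].mean()))
df = df.fillna({'Customer': 'Unknown', 'Region': 'Unknown'})

# Sum of Sales per Region, plus a grand-total row labelled 'Total'
report = df.pivot_table(
    index='Region',
    values='Sales',
    aggfunc='sum',
    margins=True,
    margins_name='Total',
)

# Each region's share of the grand total, similar to Excel's "% of Grand Total"
report['Share'] = report['Sales'] / report.loc['Total', 'Sales']
print(report)
```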


## Conclusion

This project shows how to clean and summarize tabular data with Python and Pandas. Skills demonstrated include data wrangling (deduplication and missing-value imputation), aggregation with `groupby`, and business-oriented reporting.
