# Customer Data Analysis Notbook

This notbook explores a dataset of customer purchaces. The goal is to understand buying patterns and identify key segments. Note that this contains several intentional typos in the markdown text for demonstration purposes.

We'll cover:
- Data loading and prepation
- Exploratory data analysis
- Data vizualization
- Key insights and recomendations

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Load the dataset
# Note: This will fail since the file doesn't exist, but the code is syntactically correct
df = pd.read_csv('customer_data.csv')

# Display basic info
df.info()
df.head()

## Data Preperation

First we need to clean and prepare the data:
1. Handle missing values in the age and income columns
2. Convert the purchase_date to a datetime object
3. Remove duplicate customer records
4. Standardize categorical variables like 'customer_tier'

Let's check for missing values first:

In [None]:
# Check for missing values
missing_values = df.isnull().sum()
print("Missing values per column:")
print(missing_values[missing_values > 0])

# Fill missing numerical values with median
for col in ['age', 'income']:
    if col in df.columns:
        df[col].fillna(df[col].median(), inplace=True)

# Convert date column
df['purchase_date'] = pd.to_datetime(df['purchase_date'])

## Exploratory Analysis

Now let's explore the data to understand:
- The distribution of customer ages
- Purchase frequency by customer tier
- Correlations between spending and demographics
- Any unusual patterns or outliers

We'll start with some basic statistics and then create visualizations:

In [None]:
# Basic statistics
desc_stats = df.describe(include='all')
desc_stats

# Age distribution
plt.figure(figsize=(10, 6))
sns.histplot(df['age'], kde=True)
plt.title('Age Distribution of Customers')
plt.xlabel('Age')
plt.ylabel('Frequency')
plt.show()

# Spending by customer tier
if 'customer_tier' in df.columns:
    plt.figure(figsize=(10, 6))
    sns.boxplot(x='customer_tier', y='total_spend', data=df)
    plt.title('Spending by Customer Tier')
    plt.show()

## Key Findings and Recomendations

From our analysis, we can make several observations:
1. The majority of customers fall in the 25-45 age range, which should be our primarry target for marketing
2. There's a strong correlation between income level and total spending (correlation coeficient of 0.72)
3. Premium tier customers account for 60% of total revenue despite being only 20% of the customer base
4. There are several outliers in the spending data that warrent further investigation

Based on these findings, we recomend:
- Creating targeted campaigns for the 25-45 age group
- Developing a loyalty program to upgrade mid-tier customers
- Investigating the high-spending outliers for potential fraud or special circumstances
- Allocating more resources to retain premium tier customers