# Customer Behavior Analysis

## Introduction
This notebook analyzes customer behavior data from an online retail platform to support business decisions. It focuses on purchasing patterns, customer segmentation, and transaction trends.

## Data Description and Preparation

In [None]:
# Import required libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Set display options
pd.set_option('display.max_columns', None)
pd.set_option('display.float_format', lambda x: '%.3f' % x)

In [None]:
# Read the dataset
df = pd.read_excel('Online Retail.xlsx')

# Display basic information about the dataset
print("Dataset Info:")
print("-" * 50)
df.info()

print("\nFirst few rows of the dataset:")
print("-" * 50)
df.head()

### Data Structure Analysis

The Online Retail dataset is a **structured dataset** with the following characteristics:
- Each row represents one line of a transaction, with a quantity and a unit price
- Contains both numerical variables (such as `Quantity` and `UnitPrice`) and categorical variables
- Has a clear schema with defined columns
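
The split between numerical and categorical variables can be checked directly from the column dtypes. The cell below is a minimal sketch on a hypothetical sample: apart from `Quantity` and `UnitPrice`, the column names and all values are assumptions for illustration, and `split_columns_by_kind` is a helper introduced here, not part of the original notebook.

```python
import pandas as pd

# Hypothetical sample rows. Only Quantity and UnitPrice are confirmed by this
# notebook; the other column names and all values are illustrative assumptions.
sample = pd.DataFrame({
    "InvoiceNo": ["A001", "A001", "A002"],
    "StockCode": ["P10", "P11", "P12"],
    "Quantity": [6, 6, 2],
    "UnitPrice": [2.55, 3.39, 1.85],
    "CustomerID": [101.0, 101.0, None],
    "Country": ["United Kingdom", "United Kingdom", "France"],
})

def split_columns_by_kind(frame):
    """Return (numerical, categorical) column-name lists based on dtype."""
    numerical = frame.select_dtypes(include="number").columns.tolist()
    categorical = frame.select_dtypes(exclude="number").columns.tolist()
    return numerical, categorical

numerical_cols, categorical_cols = split_columns_by_kind(sample)
print("Numerical:", numerical_cols)
print("Categorical:", categorical_cols)
```

A dtype-based split is only a first pass: an identifier stored as a number (like the assumed `CustomerID` here) is reported as numerical even though it behaves like a category in analysis.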

### Data Cleaning and Preprocessing

In [None]:
# Check for missing values
print("Missing values in each column:")
print("-" * 50)
print(df.isnull().sum())

# Check for duplicates
print("\nNumber of duplicate rows:")
print("-" * 50)
print(df.duplicated().sum())

In [None]:
# Data cleaning steps
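def clean_data(df):
    """Return a cleaned copy of df with an added TotalAmount column."""
    # Work on a copy so the raw dataframe stays untouched
    df_clean = df.copy()

    # Remove rows with a missing value in any column
    df_clean = df_clean.dropna()

    # Remove exact duplicate rows
    df_clean = df_clean.drop_duplicates()

    # Keep only rows with a positive quantity and a positive unit price;
    # .copy() makes the filtered frame independent before a column is added
    positive = (df_clean['Quantity'] > 0) & (df_clean['UnitPrice'] > 0)
    df_clean = df_clean[positive].copy()

    # Add the line total (quantity times unit price)
    df_clean['TotalAmount'] = df_clean['Quantity'] * df_clean['UnitPrice']

    return df_clean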

# Clean the data
df_clean = clean_data(df)

# Display basic statistics of the cleaned dataset
print("Cleaned dataset statistics:")
print("-" * 50)
df_clean.describe()

### Data Preprocessing Summary

The following preprocessing steps were performed:
1. Removed rows with a missing value in any column
2. Removed exact duplicate rows
3. Filtered out invalid rows (zero or negative quantity or unit price)
4. Added a `TotalAmount` column (`Quantity` multiplied by `UnitPrice`) for transaction value analysis

The cleaned dataset is now ready for further analysis of customer behavior patterns.
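
To see how much data each step removes, the row count can be recorded after every stage. The cell below is a rough sketch that repeats the cleaning steps in the order listed above on a hypothetical toy dataframe. `cleaning_report` and the toy values are assumptions for illustration and were not run against the real dataset.

```python
import pandas as pd

def cleaning_report(frame):
    """Count the rows remaining after each cleaning step."""
    counts = {"raw": len(frame)}

    step = frame.dropna()
    counts["after_dropna"] = len(step)

    step = step.drop_duplicates()
    counts["after_drop_duplicates"] = len(step)

    step = step[(step["Quantity"] > 0) & (step["UnitPrice"] > 0)]
    counts["after_positive_filter"] = len(step)

    return pd.Series(counts, name="rows")

# Hypothetical toy data: one duplicate row, one negative quantity,
# one zero price, and one missing price.
toy = pd.DataFrame({
    "Quantity": [2, 2, -1, 5, 3],
    "UnitPrice": [1.5, 1.5, 2.0, 0.0, None],
    "CustomerID": [1.0, 1.0, 2.0, 3.0, 4.0],
})

report = cleaning_report(toy)
print(report)
```

Calling `cleaning_report(df)` on the raw dataframe would give the same breakdown for the real data, which helps judge whether a step such as `dropna()` discards more rows than expected.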