# 01 - Data Exploration

This notebook performs an initial exploration of the retail transaction dataset (the UCI Online Retail dataset).

**Note**: This notebook is designed to run on Google Colab.

## Objectives
- Load and examine the dataset structure
- Understand data types and missing values
- Perform initial statistical analysis
- Identify data quality issues
- Visualize basic distributions
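
For the data-type and missing-value objectives, a per-column summary is often the first table worth producing. What follows is a minimal sketch. `summarize_columns` is a hypothetical helper, and the toy frame is made up for illustration. Once the dataset is loaded, pass the real `df` instead.

```python
import pandas as pd


def summarize_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Return one row per column with its dtype, missing-value counts and cardinality."""
    summary = pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "n_missing": df.isna().sum(),
        "pct_missing": (df.isna().mean() * 100).round(2),
        "n_unique": df.nunique(dropna=True),
    })
    # Put the columns with the most missing values first
    return summary.sort_values("n_missing", ascending=False)


# Small illustrative frame (hypothetical values); use the real `df` once loaded
toy = pd.DataFrame({
    "Quantity": [6, 8, -1, 2],
    "CustomerID": [17850.0, None, 13047.0, None],
    "Country": ["United Kingdom", "France", "United Kingdom", "France"],
})
print(summarize_columns(toy))
```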

## Setup for Colab
1. Upload the dataset to Colab or mount Google Drive
2. Install required packages
3. Import necessary libraries
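
Because the notebook assumes a Colab runtime, it can help to detect the environment before mounting Drive or choosing a data path. This is a minimal sketch. The names `running_in_colab`, `IN_COLAB` and `DATA_PATH` are hypothetical, and the local path `data/raw/OnlineRetail.csv` is an assumption that mirrors the folder layout of the Drive path used when loading the data.

```python
def running_in_colab() -> bool:
    """True when the google.colab package is importable, i.e. inside a Colab runtime."""
    try:
        import google.colab  # noqa: F401  (only present inside Colab runtimes)
    except ImportError:
        return False
    return True


IN_COLAB = running_in_colab()

# Hypothetical default locations: adjust to wherever the file actually lives
DATA_PATH = "/content/OnlineRetail.csv" if IN_COLAB else "data/raw/OnlineRetail.csv"
print(f"Running in Colab: {IN_COLAB}; expecting data at {DATA_PATH}")
```

With this flag, the Drive mount can be wrapped in `if IN_COLAB:` so that the same notebook also runs locally.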


```python
# Mount Google Drive (only needed if the dataset is stored in Drive)
from google.colab import drive
drive.mount('/content/drive')

# Alternatively, upload the file directly: open the Files pane in the left
# sidebar and click the Upload icon (uploaded files land in /content)
```



```python
# Install required packages (once per runtime session; already-installed packages are skipped)
!pip install pandas numpy matplotlib seaborn plotly mlxtend scikit-learn statsmodels prophet

# Import required libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import warnings

# Silence warnings to keep the output readable (this also hides deprecation warnings)
warnings.filterwarnings('ignore')

# Set visualization style
sns.set_style('whitegrid')
plt.rcParams['figure.figsize'] = (12, 6)
```
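
For the "Visualize basic distributions" objective, long-tailed columns such as prices are often easier to read with a log-scaled count axis. What follows is a minimal, matplotlib-only sketch. `plot_distribution` is a hypothetical helper, and the demo uses synthetic lognormal data rather than the real dataset.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def plot_distribution(values: pd.Series, bins: int = 50, log_y: bool = True):
    """Histogram of a numeric column; a log y-axis keeps long-tailed counts readable."""
    fig, ax = plt.subplots()
    ax.hist(values.dropna(), bins=bins)
    if log_y:
        ax.set_yscale("log")
    ax.set_xlabel(values.name or "value")
    ax.set_ylabel("count (log scale)" if log_y else "count")
    ax.set_title(f"Distribution of {values.name or 'values'}")
    return fig, ax


# Synthetic long-tailed data, for illustration only
rng = np.random.default_rng(0)
demo = pd.Series(rng.lognormal(mean=1.0, sigma=1.0, size=1_000), name="UnitPrice")
fig, ax = plot_distribution(demo)
plt.show()
```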


```python
# Load the dataset: keep exactly one of the options below uncommented
# Option 1: file uploaded directly to the Colab session
df = pd.read_csv('/content/OnlineRetail.csv', encoding='latin-1')

# Option 2: file stored in Google Drive (requires the Drive mount above)
# df = pd.read_csv('/content/drive/MyDrive/Retail-Stock-Market-Behavior-Analysis/data/raw/OnlineRetail.csv', encoding='latin-1')

# Option 3: download the original Excel file from the UCI repository
# !wget -O '/content/Online Retail.xlsx' https://archive.ics.uci.edu/ml/machine-learning-databases/00352/Online%20Retail.xlsx
# df = pd.read_excel('/content/Online Retail.xlsx', sheet_name='Online Retail')

# Display basic information
print("Dataset Shape:", df.shape)
print("\nColumn Names:", df.columns.tolist())
print("\nData Types:")
print(df.dtypes)
print("\nFirst Few Rows:")
df.head()
```
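
The objectives also call for identifying data quality issues. What follows is a minimal sketch. It assumes the loaded file uses the UCI Online Retail column names (`InvoiceNo`, `Quantity`, `UnitPrice`, `CustomerID`). In that dataset, an invoice number starting with "C" marks a cancellation. `quality_report` is a hypothetical helper, and the toy rows are illustrative only.

```python
import pandas as pd


def quality_report(df: pd.DataFrame) -> dict:
    """Count common data-quality issues; column names assume the UCI Online Retail schema."""
    invoice = df["InvoiceNo"].astype(str)
    return {
        "rows": len(df),
        "duplicate_rows": int(df.duplicated().sum()),
        "missing_customer_id": int(df["CustomerID"].isna().sum()),
        # Invoice numbers starting with "C" mark cancelled transactions
        "cancelled_invoices": int(invoice.str.startswith("C").sum()),
        "non_positive_quantity": int((df["Quantity"] <= 0).sum()),
        "non_positive_price": int((df["UnitPrice"] <= 0).sum()),
    }


# Hypothetical toy rows: one exact duplicate, one cancellation, one zero price
toy = pd.DataFrame({
    "InvoiceNo": ["536365", "536365", "C536379", "536380"],
    "Quantity": [6, 6, -1, 3],
    "UnitPrice": [2.55, 2.55, 27.50, 0.0],
    "CustomerID": [17850.0, 17850.0, 14527.0, None],
})
print(quality_report(toy))
```

On the real data, `quality_report(df)` gives a quick tally that can guide later cleaning decisions, such as whether to drop cancellations or rows without a `CustomerID`.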
