# Exploratory Data Analysis for Tide Dynamic Pricing

This notebook contains an exploratory data analysis (EDA) for the dynamic pricing model for Tide products at GlobalMart. The goal is to understand the data, identify patterns, and derive insights that can inform the pricing strategy.

In [None]:
# --- Python Cell ---
# Import necessary libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
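import seaborn as sns
from IPython.display import display  # display() is predefined in Jupyter; the explicit import makes the dependency visible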

# Set visualization style
sns.set_theme(style='whitegrid')

# --- Python Cell ---
# Load raw data files from data/raw
sales_data = pd.read_csv('../data/raw/sales_data_dictionary.csv')
inventory_data = pd.read_csv('../data/raw/inventory_data_dictionary.csv')
customer_behavior_data = pd.read_csv('../data/raw/customer_behavior_data_dictionary.csv')
competitor_data = pd.read_csv('../data/raw/competitor_data_dictionary.csv')

# Display the first few rows of each dataset
print("Sales Data:")
display(sales_data.head())
print("Inventory Data:")
display(inventory_data.head())
print("Customer Behavior Data:")
display(customer_behavior_data.head())
print("Competitor Data:")
display(competitor_data.head())
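
# --- Markdown Cell ---
The `head()` previews show a few rows but not the size or cleanliness of each file. A compact overview can put the row count, column count, exact duplicate rows, and total missing cells of all four datasets side by side. This is a minimal sketch; the helper name `overview` and its output column names are illustrative and not part of the original notebook.

# --- Python Cell ---
```python
from typing import Mapping

import pandas as pd


def overview(frames: Mapping[str, pd.DataFrame]) -> pd.DataFrame:
    """Summarize several DataFrames side by side, one row per dataset."""
    rows = []
    for name, df in frames.items():
        rows.append({
            "dataset": name,
            "rows": len(df),
            "columns": df.shape[1],
            # Rows identical to an earlier row in every column (first occurrence not counted).
            "duplicate_rows": int(df.duplicated().sum()),
            # Total number of missing cells across all columns.
            "missing_cells": int(df.isna().sum().sum()),
        })
    return pd.DataFrame(rows).set_index("dataset")


# Usage in this notebook (not executed here):
# overview({"sales": sales_data, "inventory": inventory_data,
#           "customer_behavior": customer_behavior_data, "competitor": competitor_data})
```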

# --- Python Cell ---
# Summary statistics for sales data
sales_data.describe(include='all')

# --- Python Cell ---
# Check for missing values in all datasets
print("Sales Data missing values:")
print(sales_data.isnull().sum())
print("\nInventory Data missing values:")
print(inventory_data.isnull().sum())
print("\nCustomer Behavior Data missing values:")
print(customer_behavior_data.isnull().sum())
print("\nCompetitor Data missing values:")
print(competitor_data.isnull().sum())
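
# --- Markdown Cell ---
Raw missing-value counts are hard to compare across datasets of different lengths. Expressing them as a percentage of rows, and listing only the affected columns, makes it easier to decide which fields need imputation or removal. The helper below is a hypothetical sketch, not part of the original notebook.

# --- Python Cell ---
```python
import pandas as pd


def missing_report(df: pd.DataFrame) -> pd.DataFrame:
    """Count and percentage of missing values per column, worst first.

    Columns with no missing values are dropped so the report stays short.
    """
    counts = df.isna().sum()
    report = pd.DataFrame({
        "missing": counts,
        "pct_missing": (counts / len(df) * 100).round(2),
    })
    return report[report["missing"] > 0].sort_values("missing", ascending=False)


# Usage in this notebook (not executed here):
# missing_report(sales_data)
```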

# --- Python Cell ---
# Visualize the distribution of SellingPrice in sales data
plt.figure(figsize=(10, 6))
sns.histplot(sales_data['SellingPrice'], bins=30, kde=True)
plt.title('Selling Price Distribution of Tide Products')
plt.xlabel('Selling Price')
plt.ylabel('Frequency')
plt.show()
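
# --- Markdown Cell ---
The histogram shows the overall shape of `SellingPrice` but not which rows sit in the tails. One common rule, Tukey's fences, flags values lying more than 1.5 times the interquartile range (IQR) below the first quartile or above the third. The sketch below applies that rule; the multiplier `k=1.5` is the conventional default rather than a value chosen for this project, and the helper name is hypothetical.

# --- Python Cell ---
```python
import pandas as pd


def iqr_outliers(values: pd.Series, k: float = 1.5) -> pd.Series:
    """Boolean mask marking values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    q1 = values.quantile(0.25)  # quantile() ignores NaN
    q3 = values.quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    # NaN compares False on both sides, so missing prices are never flagged.
    return (values < lower) | (values > upper)


# Usage in this notebook (not executed here):
# sales_data.loc[iqr_outliers(sales_data['SellingPrice'])]
```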

# --- Python Cell ---
# Correlation heatmap for numeric columns in sales data
plt.figure(figsize=(8, 6))
corr = sales_data.select_dtypes(include=[np.number]).corr()
sns.heatmap(corr, annot=True, fmt='.2f', cmap='coolwarm')
plt.title('Correlation Heatmap - Sales Data')
plt.show()
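
# --- Markdown Cell ---
With many numeric columns, the annotated heatmap becomes hard to scan. Listing the strongest off-diagonal pairs by absolute correlation is one way to pick out candidates for redundancy (or for price drivers) at a glance. This is a sketch with a hypothetical helper name; it takes any square correlation matrix, such as `corr` from the cell above.

# --- Python Cell ---
```python
import numpy as np
import pandas as pd


def top_correlated_pairs(corr: pd.DataFrame, n: int = 10) -> pd.Series:
    """Return the n strongest off-diagonal pairs of a correlation matrix, by absolute value."""
    # Indices of the strict upper triangle: each pair once, self-correlations excluded.
    rows, cols = np.triu_indices(len(corr), k=1)
    pairs = pd.Series(
        corr.to_numpy()[rows, cols],
        index=pd.MultiIndex.from_arrays(
            [corr.index[rows], corr.columns[cols]], names=["feature_a", "feature_b"]
        ),
    )
    # Constant columns yield NaN correlations; drop them before ranking.
    return pairs.dropna().sort_values(key=lambda s: s.abs(), ascending=False).head(n)


# Usage in this notebook (not executed here):
# top_correlated_pairs(corr)
```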

# --- Markdown Cell ---
## Insights and Next Steps

The previews, summary statistics, missing-value counts, selling-price distribution, and correlations among the numeric sales fields above characterize the raw data files and will inform refinements to the dynamic pricing model. The next steps are feature engineering and model training.

In [None]:
# Load the dataset
data_path = '../data/processed/tide_pricing_data.csv'
tide_data = pd.read_csv(data_path)

# Display the first few rows of the dataset
tide_data.head()

In [None]:
# Summary statistics
tide_data.describe()

In [None]:
# Check for missing values
missing_values = tide_data.isnull().sum()
missing_values[missing_values > 0]

In [None]:
# Visualize the distribution of prices
plt.figure(figsize=(10, 6))
sns.histplot(tide_data['price'], bins=30, kde=True)
plt.title('Price Distribution of Tide Products')
plt.xlabel('Price')
plt.ylabel('Frequency')
plt.show()
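
The histogram can be complemented with a table that puts numbers on the price distribution. A minimal sketch, assuming quantile-based bands are useful here: `pd.qcut` splits prices into bands holding roughly equal numbers of observations, and the counts per band are returned. The quartile choice (`q=4`) and the helper name are illustrative assumptions.

```python
import pandas as pd


def price_band_counts(prices: pd.Series, q: int = 4) -> pd.Series:
    """Count observations per quantile-based price band (quartiles by default)."""
    # duplicates="drop" merges bands when many prices share the same value,
    # so q is an upper bound on the number of bands returned.
    bands = pd.qcut(prices, q=q, duplicates="drop")
    # sort=False keeps the bands in ascending price order.
    return bands.value_counts(sort=False)


# Usage in this notebook (not executed here):
# price_band_counts(tide_data['price'])
```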

In [None]:
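# Correlation heatmap over numeric columns only (pandas >= 2.0 raises a ValueError if non-numeric columns are included)
plt.figure(figsize=(12, 8))
correlation_matrix = tide_data.corr(numeric_only=True)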
sns.heatmap(correlation_matrix, annot=True, fmt='.2f', cmap='coolwarm')
plt.title('Correlation Heatmap')
plt.show()
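
The heatmap uses Pearson correlation, which measures linear association only. Because a column can relate to price monotonically but not linearly, comparing Pearson with Spearman's rank correlation may reveal relationships the heatmap understates. The sketch below does this for a single target column; the helper name is hypothetical, and `'price'` is the column plotted in the price histogram.

```python
import pandas as pd


def pearson_vs_spearman(df: pd.DataFrame, target: str) -> pd.DataFrame:
    """Pearson (linear) vs Spearman (rank) correlation of each numeric column with `target`.

    `target` must itself be a numeric column of `df`.
    """
    numeric = df.select_dtypes(include="number")
    out = pd.DataFrame({
        "pearson": numeric.corr(method="pearson")[target],
        "spearman": numeric.corr(method="spearman")[target],
    })
    # Drop the target's trivial correlation with itself.
    return out.drop(index=target)


# Usage in this notebook (not executed here):
# pearson_vs_spearman(tide_data, 'price')
```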

## Insights and Next Steps

The summary statistics, missing-value check, price distribution, and correlation heatmap for the processed dataset (`tide_pricing_data.csv`) will inform refinements to the dynamic pricing model. The next steps are feature engineering and model training.