## Interactive Sales Dashboard — Preprocessing 

This notebook focuses on **data cleaning and transformation** to prepare the Global Superstore dataset for dashboard development.

The goal is to address issues identified during the EDA and create a **clean, structured dataset** ready for interactive visualization.

**Main tasks:**
- Handle missing values and correct data types.
- Remove or adjust outliers when necessary.
- Create new features (e.g., month, year, profit margin).
- Aggregate or transform data for efficient dashboard use.
- Export the processed dataset to a `processed/` folder.
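
The outlier task in the list above can be approached in several ways. The following is a minimal sketch of one common option, flagging values outside the interquartile-range (IQR) fences. The helper name `flag_iqr_outliers`, the multiplier `k = 1.5`, the `Sales.Outlier` column, and the toy values are all assumptions made for illustration; none of them comes from the dataset or from the cells below.

```python
import pandas as pd


def flag_iqr_outliers(values: pd.Series, k: float = 1.5) -> pd.Series:
    """Return a boolean mask that is True where a value lies outside the IQR fences."""
    q1 = values.quantile(0.25)
    q3 = values.quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return (values < lower) | (values > upper)


# Toy data standing in for a numeric column such as Sales (values are made up).
toy = pd.DataFrame({"Sales": [10.0, 12.0, 11.0, 13.0, 12.0, 500.0]})

# Flag rather than drop, so the dashboard can decide how to treat extreme rows.
toy["Sales.Outlier"] = flag_iqr_outliers(toy["Sales"])
print(toy)
```

Flagging instead of deleting keeps the row count stable and leaves the decision to filter extreme orders to the dashboard layer.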


In [1]:
import pandas as pd

In [12]:
df = pd.read_csv('../data/superstore.csv', encoding='utf-8')
print(f"Dataset with {df.shape[0]} rows and {df.shape[1]} columns.")

Dataset with 51290 rows and 27 columns.


In [13]:
df['Order.Date'] = pd.to_datetime(df['Order.Date'])
df['Ship.Date'] = pd.to_datetime(df['Ship.Date'])
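
When the date format is known, it may be safer to pass it explicitly and count the values that fail to parse. The sketch below assumes an ISO-style `%Y-%m-%d` format purely for illustration; the real format of `Order.Date` and `Ship.Date` in the CSV should be checked first, and the sample strings are made up.

```python
import pandas as pd

# Made-up strings: one valid date, one impossible date, one non-date value.
raw = pd.Series(["2014-01-05", "2014-02-30", "not a date"])

# errors="coerce" turns unparseable values into NaT instead of raising.
parsed = pd.to_datetime(raw, format="%Y-%m-%d", errors="coerce")

# Counting NaT values shows how many rows need attention after parsing.
n_bad = int(parsed.isna().sum())
print(f"{n_bad} of {len(raw)} values could not be parsed.")
```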


In [14]:
df.isnull().sum()

Category          0
City              0
Country           0
Customer.ID       0
Customer.Name     0
Discount          0
Market            0
记录数               0
Order.Date        0
Order.ID          0
Order.Priority    0
Product.ID        0
Product.Name      0
Profit            0
Quantity          0
Region            0
Row.ID            0
Sales             0
Segment           0
Ship.Date         0
Ship.Mode         0
Shipping.Cost     0
State             0
Sub.Category      0
Year              0
Market2           0
weeknum           0
dtype: int64

In [15]:
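# Keep only rows where the ship date is not earlier than the order date.
# .copy() avoids a possible SettingWithCopyWarning when columns are added later.
df = df[df['Ship.Date'] >= df['Order.Date']].copy()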
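
Before applying a filter like this, it can help to report how many rows it removes, so the cleaning step is auditable. This is a sketch on made-up dates; the names `valid`, `n_dropped`, and `toy_clean` are illustrative only.

```python
import pandas as pd

# Toy orders: the second row ships before it was ordered (made-up values).
toy = pd.DataFrame({
    "Order.Date": pd.to_datetime(["2014-01-05", "2014-02-10", "2014-03-01"]),
    "Ship.Date": pd.to_datetime(["2014-01-07", "2014-02-08", "2014-03-01"]),
})

# Build the mask once so it can be both counted and applied.
valid = toy["Ship.Date"] >= toy["Order.Date"]
n_dropped = int((~valid).sum())

toy_clean = toy[valid].copy()
print(f"Dropping {n_dropped} of {len(toy)} rows with a ship date before the order date.")
```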

In [19]:
df['Year'] = df['Order.Date'].dt.year
df['Month'] = df['Order.Date'].dt.month
df['Month-Year'] = df['Order.Date'].dt.to_period('M').astype(str)


In [16]:
df['Profit.Margin'] = (df['Profit'] / df['Sales']).round(2)
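
If any row had `Sales` equal to zero, the division above would yield `inf` (or `NaN` for 0/0) rather than raise an error. Whether such rows exist in this dataset is not shown here, so the following is only a defensive sketch on made-up values: it replaces zero sales with `NaN` before dividing.

```python
import numpy as np
import pandas as pd

# Made-up rows, including one with zero sales.
toy = pd.DataFrame({"Sales": [100.0, 0.0, 50.0], "Profit": [25.0, 5.0, -10.0]})

# Replace zero denominators with NaN so the margin is missing, not infinite.
safe_sales = toy["Sales"].replace(0, np.nan)
toy["Profit.Margin"] = (toy["Profit"] / safe_sales).round(2)
print(toy)
```

A missing margin is usually easier for a dashboard to handle than an infinite one, which can distort averages and axis ranges.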

In [20]:
df.to_csv("../data/processed/superstore.csv", index=False, encoding="utf-8")

print(f"Processed dataset saved with {df.shape[0]} rows and {df.shape[1]} columns.")

Processed dataset saved with 51290 rows and 30 columns.
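
`to_csv` does not create missing directories, so the export fails if the `processed/` folder does not exist yet. The sketch below creates the folder first and reads the file back as a quick check. It writes to a temporary directory and uses made-up data so that it runs anywhere; in the notebook, the target would be the `../data/processed/` path used above.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Made-up data standing in for the processed DataFrame.
toy = pd.DataFrame({"Sales": [100.0, 50.0], "Profit": [25.0, -10.0]})

with tempfile.TemporaryDirectory() as tmp:
    # Create the output folder if it is missing; no error if it already exists.
    out_dir = Path(tmp) / "processed"
    out_dir.mkdir(parents=True, exist_ok=True)

    out_path = out_dir / "superstore.csv"
    toy.to_csv(out_path, index=False, encoding="utf-8")

    # Read the file back to confirm the shape survived the round trip.
    reloaded = pd.read_csv(out_path)

print(f"Reloaded {reloaded.shape[0]} rows and {reloaded.shape[1]} columns.")
```

Note that datetime columns are written as text, so they need to be parsed again with `pd.to_datetime` (or `parse_dates` in `read_csv`) when the dashboard loads the file.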
