In [1]:
import pandas as pd

In [2]:
sales = pd.read_csv('/content/dataset_fashion_store_sales.csv')
campaigns = pd.read_csv('/content/dataset_fashion_store_campaigns.csv')
channels = pd.read_csv('/content/dataset_fashion_store_channels.csv')
customers = pd.read_csv('/content/dataset_fashion_store_customers.csv')
products = pd.read_csv('/content/dataset_fashion_store_products.csv')
salesitems = pd.read_csv('/content/dataset_fashion_store_salesitems.csv')
stock = pd.read_csv('/content/dataset_fashion_store_stock.csv')

# Data Modeling (MySQL + Workbench)

## Objective
Store the raw CSV data in a relational MySQL schema, defining primary keys and foreign keys so that data integrity is enforced by the database.


## Imported Tables
The following tables were successfully imported into the `fashion_store` database:

- `sales`
- `salesitems`
- `products`
- `customers`
- `stock`
- `campaigns`
- `channels`

## Data Quality Check (Python Script)

A Python script was executed to check for:

- Null values
- Empty strings
- Column data types
- Number of unique values

**Result**: All tables are clean. No nulls or empty strings were found.

In [4]:
# Dictionary to hold all tables
datasets = {
    'sales': sales,
    'salesitems': salesitems,
    'campaigns': campaigns,
    'products': products,
    'stock': stock,
    'customers': customers,
    'channels': channels
}

In [6]:
# Define a reusable function for the data quality check
def data_quality_report(df, name):
    print("=" * 60)
    print(f"Data Quality Check: {name.upper()}")
    print("-" * 60)
    print(f"Shape: {df.shape[0]} rows × {df.shape[1]} columns")

    nulls = df.isnull().sum()
    empties = (df == '').sum()

    print("\n Null Values per Column:")
    print(nulls[nulls > 0] if nulls.sum() > 0 else "No null values found.")

    print("\n Empty Strings per Column:")
    print(empties[empties > 0] if empties.sum() > 0 else "No empty strings found.")

    print("\n Column Data Types:")
    print(df.dtypes)

    print("\n Unique Values per Column:")
    print(df.nunique())
    print("=" * 60 + "\n")

# Run the quality check on every table
for name, df in datasets.items():
    data_quality_report(df, name)

Data Quality Check: SALES
------------------------------------------------------------
Shape: 905 rows × 7 columns

 Null Values per Column:
No null values found.

 Empty Strings per Column:
No empty strings found.

 Column Data Types:
sale_id           int64
channel          object
discounted        int64
total_amount    float64
sale_date        object
customer_id       int64
country          object
dtype: object

 Unique Values per Column:
sale_id         905
channel           2
discounted        2
total_amount    898
sale_date        51
customer_id     580
country           6
dtype: int64

Data Quality Check: SALESITEMS
------------------------------------------------------------
Shape: 2253 rows × 13 columns

 Null Values per Column:
No null values found.

 Empty Strings per Column:
No empty strings found.

 Column Data Types:
item_id                int64
sale_id                int64
product_id             int64
quantity               int64
original_price       float64
unit_price  




## Primary Keys Definition

Primary keys were defined to uniquely identify each row in the respective tables.

```sql
-- Example: Setting primary key for products
ALTER TABLE products
ADD PRIMARY KEY (product_id);
```

The same statement was applied to every table that has a unique identifier column, such as `sale_id` in `sales`.
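
Before running `ADD PRIMARY KEY`, it can help to confirm in pandas that the candidate column has no nulls and no duplicates, since MySQL rejects the statement otherwise. The following is a minimal sketch, not the script used in this project: the helper `check_primary_key` and the `*_demo` frames are hypothetical, and the toy values are made up for illustration.

```python
import pandas as pd


def check_primary_key(df: pd.DataFrame, column: str) -> dict:
    """Report whether `column` could serve as a primary key for `df`."""
    col = df[column]
    null_count = int(col.isnull().sum())
    # duplicated() flags every repeat after the first occurrence of a value
    duplicate_count = int(col.duplicated().sum())
    return {
        "column": column,
        "nulls": null_count,
        "duplicates": duplicate_count,
        "is_valid_key": null_count == 0 and duplicate_count == 0,
    }


# Toy data standing in for the real tables (values are invented)
products_demo = pd.DataFrame(
    {"product_id": [1, 2, 3], "category": ["Tops", "Shoes", "Tops"]}
)
sales_demo = pd.DataFrame({"sale_id": [10, 11, 11], "customer_id": [1, 2, 2]})

products_report = check_primary_key(products_demo, "product_id")
sales_report = check_primary_key(sales_demo, "sale_id")
print(products_report)
print(sales_report)
```

On the real data, the same helper could be called on each frame in the `datasets` dictionary with its identifier column.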

---

## Foreign Keys Definition

Foreign keys were added to establish relationships between tables and ensure referential integrity.

```sql
-- Link sales.customer_id to customers
ALTER TABLE sales
ADD CONSTRAINT fk_sales_customer
FOREIGN KEY (customer_id) REFERENCES customers(customer_id);

-- Link sales.channel to channels
ALTER TABLE sales
ADD CONSTRAINT fk_sales_channel
FOREIGN KEY (channel) REFERENCES channels(channel);

-- Link salesitems.sale_id to sales
ALTER TABLE salesitems
ADD CONSTRAINT fk_salesitems_sale
FOREIGN KEY (sale_id) REFERENCES sales(sale_id);

-- Link salesitems.product_id to products
ALTER TABLE salesitems
ADD CONSTRAINT fk_salesitems_product
FOREIGN KEY (product_id) REFERENCES products(product_id);

-- Link stock.product_id to products
ALTER TABLE stock
ADD CONSTRAINT fk_stock_product
FOREIGN KEY (product_id) REFERENCES products(product_id);

-- Link campaigns.channel to channels
ALTER TABLE campaigns
ADD CONSTRAINT fk_campaigns_channel
FOREIGN KEY (channel) REFERENCES channels(channel);
```

Note: Some foreign keys could only be added after values were corrected manually so that every value in the child column had a match in the parent table (e.g., `campaigns.channel` had to match a valid value in `channels.channel`).
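
Mismatched values of this kind can be located in pandas before the constraint is added. The sketch below is one possible approach, not the procedure followed here: `find_orphans` is a hypothetical helper, and the channel names in the demo frames are invented rather than taken from the dataset.

```python
import pandas as pd


def find_orphans(child: pd.DataFrame, child_col: str,
                 parent: pd.DataFrame, parent_col: str) -> list:
    """Return the distinct child values that have no match in the parent column."""
    # NULLs are skipped because a nullable foreign key column may contain them
    child_values = child[child_col].dropna()
    mask = ~child_values.isin(parent[parent_col])
    return sorted(child_values[mask].unique().tolist())


# Toy data standing in for the real tables (values are invented)
channels_demo = pd.DataFrame({"channel": ["App Mobile", "E-commerce"]})
campaigns_demo = pd.DataFrame(
    {
        "campaign_id": [1, 2, 3],
        "channel": ["E-commerce", "Website Banner", "App Mobile"],
    }
)

orphans = find_orphans(campaigns_demo, "channel", channels_demo, "channel")
print(orphans)
```

Any value returned by the helper would cause the corresponding `ADD CONSTRAINT` statement to fail until it is corrected or added to the parent table.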

---

## Basic Queries for Initial Exploration

A few basic SQL queries were written to understand the dataset:

```sql
-- Total number of sales
SELECT COUNT(*) AS total_sales
FROM sales;

-- Sample of sales table for inspection
SELECT * FROM sales
LIMIT 10;

-- Total revenue
SELECT SUM(total_amount) AS total_revenue
FROM sales;

-- Number of unique customers
SELECT COUNT(DISTINCT customer_id) AS unique_customers
FROM sales;

-- Average ticket size
SELECT ROUND(SUM(total_amount) / COUNT(*), 2) AS avg_ticket
FROM sales;

-- Total products sold
SELECT SUM(quantity) AS total_products_sold
FROM salesitems;

-- Total number of campaigns
SELECT COUNT(*) AS total_campaigns
FROM campaigns;

-- Total profit (revenue - cost)
SELECT
    SUM(salesitems.item_total - products.cost_price * salesitems.quantity) AS total_profit
FROM
    salesitems
JOIN
    products ON salesitems.product_id = products.product_id;
```
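
The same figures can be cross-checked in pandas against the DataFrames loaded at the top of the notebook. The sketch below uses small invented frames (`*_demo`) so that it runs on its own. It assumes the column names used in the SQL above (`total_amount`, `quantity`, `item_total`, `cost_price`).

```python
import pandas as pd

# Toy data standing in for the real tables (values are invented)
sales_demo = pd.DataFrame(
    {
        "sale_id": [1, 2, 3],
        "customer_id": [100, 101, 100],
        "total_amount": [50.0, 30.0, 15.0],
    }
)
salesitems_demo = pd.DataFrame(
    {
        "item_id": [1, 2, 3, 4],
        "sale_id": [1, 1, 2, 3],
        "product_id": [7, 8, 7, 8],
        "quantity": [2, 2, 3, 1],
        "item_total": [20.0, 30.0, 30.0, 15.0],
    }
)
products_demo = pd.DataFrame({"product_id": [7, 8], "cost_price": [6.0, 10.0]})

# COUNT(*), SUM(total_amount), COUNT(DISTINCT customer_id)
total_sales = len(sales_demo)
total_revenue = sales_demo["total_amount"].sum()
unique_customers = sales_demo["customer_id"].nunique()

# ROUND(SUM(total_amount) / COUNT(*), 2)
avg_ticket = round(total_revenue / total_sales, 2)

# SUM(quantity)
total_products_sold = salesitems_demo["quantity"].sum()

# Inner join on product_id, then SUM(item_total - cost_price * quantity)
merged = salesitems_demo.merge(products_demo, on="product_id", how="inner")
total_profit = (merged["item_total"] - merged["cost_price"] * merged["quantity"]).sum()

print(total_sales, total_revenue, unique_customers)
print(avg_ticket, total_products_sold, total_profit)
```

If the pandas results and the MySQL results disagree on the real data, the import step is the first place to look, for example for rows dropped or types coerced during loading.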

