<a href="https://colab.research.google.com/github/polydiaguiar/portfolio-data-analysis/blob/main/Supplement_Sales_Data.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# ☑ Imports

In [10]:
# ========================================================
# Library import
# ========================================================

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from scipy.stats import shapiro, skew

In [3]:
# ========================================================
# Data import
# ========================================================

# Install dependencies as needed:
!pip install -q kagglehub[pandas-datasets]

import kagglehub
from kagglehub import KaggleDatasetAdapter


# ========================================================
# Load the latest version of the specified file from the dataset
# ========================================================


file_path = "Supplement_Sales_Weekly_Expanded.csv"


df = kagglehub.load_dataset(
  KaggleDatasetAdapter.PANDAS,
  "zahidmughal2343/supplement-sales-data",
  file_path,
)




# 📚 Dataset Details

**About Dataset**

*Supplement Sales Data (2020–2025)*

Overview
This dataset contains weekly sales data for a variety of health and wellness supplements from January 2020 to April 2025. The data includes products in categories like Protein, Vitamins, Omega, and Amino Acids, among others, and covers multiple e-commerce platforms such as Amazon, Walmart, and iHerb. The dataset also tracks sales in several locations including the USA, UK, and Canada.


Dataset Details

- Time Range: January 2020 to April 2025

- Frequency: Weekly (Every Monday)

- Number of Rows: 4,384

- Columns:

  - Date: The week of the sale.

  - Product Name: The name of the supplement (e.g., Whey Protein, Vitamin C, etc.).

  - Category: The category of the supplement (e.g., Protein, Vitamin, Omega).

  - Units Sold: The number of units sold in that week.

  - Price: The selling price of the product.

  - Revenue: The total revenue generated (Units Sold * Price).

  - Discount: The discount applied to the product, stored as a fraction of the original price (e.g., 0.25 means 25%).

  - Units Returned: The number of units returned in that week.

  - Location: The location of the sale (USA, UK, or Canada).

  - Platform: The e-commerce platform (Amazon, Walmart, iHerb).
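
The Revenue definition in the column list (Units Sold * Price) can be spot-checked directly. The following is a minimal sketch, not part of the original notebook: the helper name `revenue_mismatches` and the 0.01 tolerance are hypothetical choices, and the sample values are copied from the first five rows printed in this notebook.

```python
import pandas as pd


def revenue_mismatches(frame: pd.DataFrame, tol: float = 0.01) -> pd.DataFrame:
    """Return rows where Revenue differs from Units Sold * Price by more than tol.

    The helper name and tolerance are illustrative assumptions.
    """
    expected = frame["Units Sold"] * frame["Price"]
    return frame[(frame["Revenue"] - expected).abs() > tol]


# Values taken from the first five rows of the dataset shown in this notebook.
sample = pd.DataFrame({
    "Units Sold": [143, 139, 161, 140, 157],
    "Price": [31.98, 42.51, 12.91, 16.07, 35.47],
    "Revenue": [4573.14, 5908.89, 2078.51, 2249.80, 5568.79],
})

mismatches = revenue_mismatches(sample)
print(f"Rows with inconsistent revenue: {len(mismatches)}")
```

On the full dataset, the same call would be `revenue_mismatches(df)`; an empty result would suggest that Revenue is computed from the listed price without a separate discount adjustment.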

In [4]:
print(df)

            Date        Product Name     Category  Units Sold  Price  Revenue  \
0     2020-01-06        Whey Protein      Protein         143  31.98  4573.14   
1     2020-01-06           Vitamin C      Vitamin         139  42.51  5908.89   
2     2020-01-06            Fish Oil        Omega         161  12.91  2078.51   
3     2020-01-06        Multivitamin      Vitamin         140  16.07  2249.80   
4     2020-01-06         Pre-Workout  Performance         157  35.47  5568.79   
...          ...                 ...          ...         ...    ...      ...   
4379  2025-03-31           Melatonin    Sleep Aid         160  47.79  7646.40   
4380  2025-03-31              Biotin      Vitamin         154  38.12  5870.48   
4381  2025-03-31   Green Tea Extract   Fat Burner         139  20.40  2835.60   
4382  2025-03-31     Iron Supplement      Mineral         154  18.31  2819.74   
4383  2025-03-31  Electrolyte Powder    Hydration         178  39.12  6963.36   

      Discount  Units Retur

# 📊 Data Wrangling

In [5]:
# Display the first five and the last five rows
print(df.head())
print(df.tail())

         Date  Product Name     Category  Units Sold  Price  Revenue  \
0  2020-01-06  Whey Protein      Protein         143  31.98  4573.14   
1  2020-01-06     Vitamin C      Vitamin         139  42.51  5908.89   
2  2020-01-06      Fish Oil        Omega         161  12.91  2078.51   
3  2020-01-06  Multivitamin      Vitamin         140  16.07  2249.80   
4  2020-01-06   Pre-Workout  Performance         157  35.47  5568.79   

   Discount  Units Returned Location Platform  
0      0.03               2   Canada  Walmart  
1      0.04               0       UK   Amazon  
2      0.25               0   Canada   Amazon  
3      0.08               0   Canada  Walmart  
4      0.25               3   Canada    iHerb  
            Date        Product Name    Category  Units Sold  Price  Revenue  \
4379  2025-03-31           Melatonin   Sleep Aid         160  47.79  7646.40   
4380  2025-03-31              Biotin     Vitamin         154  38.12  5870.48   
4381  2025-03-31   Green Tea Extract  F

In [6]:
# Summary of column information: dtype, missing values, and cardinality
informacoes = pd.DataFrame({
    'Columns': df.columns,
    'type': df.dtypes,
    'NaN': df.isna().sum(),
    '% NaN': (df.isna().sum() / df.shape[0]) * 100,
    'Values unique for features': df.nunique(),
})

informacoes

| | Columns | type | NaN | % NaN | Values unique for features |
|---|---|---|---|---|---|
| Date | Date | object | 0 | 0.0 | 274 |
| Product Name | Product Name | object | 0 | 0.0 | 16 |
| Category | Category | object | 0 | 0.0 | 10 |
| Units Sold | Units Sold | int64 | 0 | 0.0 | 81 |
| Price | Price | float64 | 0 | 0.0 | 2919 |
| Revenue | Revenue | float64 | 0 | 0.0 | 4326 |
| Discount | Discount | float64 | 0 | 0.0 | 26 |
| Units Returned | Units Returned | int64 | 0 | 0.0 | 9 |
| Location | Location | object | 0 | 0.0 | 3 |
| Platform | Platform | object | 0 | 0.0 | 3 |


In [8]:
# Display summary statistics for the numeric columns
df.drop('Date', axis=1).describe()

| | Units Sold | Price | Revenue | Discount | Units Returned |
|---|---|---|---|---|---|
| count | 4384.0 | 4384.0 | 4384.0 | 4384.0 | 4384.0 |
| mean | 150.200274 | 34.781229 | 5226.569446 | 0.124398 | 1.531478 |
| std | 12.396099 | 14.198309 | 2192.491946 | 0.071792 | 1.258479 |
| min | 103.0 | 10.0 | 1284.0 | 0.0 | 0.0 |
| 25% | 142.0 | 22.5975 | 3349.3725 | 0.06 | 1.0 |
| 50% | 150.0 | 34.72 | 5173.14 | 0.12 | 1.0 |
| 75% | 158.0 | 46.7125 | 7009.96 | 0.19 | 2.0 |
| max | 194.0 | 59.97 | 10761.85 | 0.25 | 8.0 |


In [9]:
# Display summary information for the categorical (object) columns
df.select_dtypes(include='object').describe()

| | Date | Product Name | Category | Location | Platform |
|---|---|---|---|---|---|
| count | 4384 | 4384 | 4384 | 4384 | 4384 |
| unique | 274 | 16 | 10 | 3 | 3 |
| top | 2025-03-31 | Whey Protein | Vitamin | Canada | iHerb |
| freq | 16 | 274 | 822 | 1507 | 1499 |


## 📑 Executive Summary: Data Wrangling - Data Quality
  - There are no missing values in the dataset.
  - The 'Date' column is currently stored as `object` (text); its type will be converted later in Power BI to support proper time-based analysis.
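
The notebook defers the 'Date' conversion to Power BI. If the conversion were done in pandas instead, it might look like the sketch below. This is an alternative route and not what the notebook does: the helper name `parse_dates` is hypothetical, the `%Y-%m-%d` format is assumed from the date strings shown in the outputs, and the two sample dates are the first and last dates printed above.

```python
import pandas as pd


def parse_dates(frame: pd.DataFrame, column: str = "Date") -> pd.DataFrame:
    """Return a copy with `column` parsed from ISO date strings to datetime.

    The helper name and the assumed format are illustrative, not from the notebook.
    """
    out = frame.copy()
    out[column] = pd.to_datetime(out[column], format="%Y-%m-%d")
    return out


# First and last dates as they appear in the printed DataFrame.
sample = pd.DataFrame({
    "Date": ["2020-01-06", "2025-03-31"],
    "Units Sold": [143, 178],
})

parsed = parse_dates(sample)

# With a datetime column, the "every Monday" frequency claim can be checked:
# dayofweek is 0 for Monday.
all_mondays = bool((parsed["Date"].dt.dayofweek == 0).all())
print(parsed.dtypes)
print(f"All dates fall on a Monday: {all_mondays}")
```

Applied to the full dataset, `parse_dates(df)` would leave the original `df` untouched and return a copy suitable for resampling or time-based grouping.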


