### **Exploratory Data Analysis**

#### Questioning
1. Specific  
    ❌ "What do users think?"   
    ✅ "What is the average rating given by users on the mobile app in April 2025?"

2. Measurable   
    ❌ "Is our marketing effective?"  
    ✅ "What was the click-through rate (CTR) of our email campaign in March 2025?"

3. Action-oriented      
    ❌ "How many users do we have?"     
    ✅ "Which user segments should we target to increase product engagement by 15% next quarter?"

4. Relevant     
    ❌ "What are the weather patterns in Canada?"   
    ✅ "What impact did delivery delays in Canada have on customer satisfaction scores in Q1 2025?"

5. Time-bound   
    ❌ "How much revenue are we making?"    
    ✅ "What was the total revenue generated in the last 30 days compared to the previous 30 days?"
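A well-formed question usually maps directly onto a computation. As a rough sketch, the time-bound revenue question above could be answered as follows. The `orders` table, its column names, its values, and the `as_of` reporting date are all hypothetical and exist only for illustration.

```python
import pandas as pd

# Hypothetical order data; column names and values are illustrative only.
orders = pd.DataFrame({
    "order_date": pd.to_datetime([
        "2025-03-05", "2025-03-20", "2025-04-02", "2025-04-10", "2025-04-25",
    ]),
    "revenue": [120.0, 80.0, 150.0, 90.0, 60.0],
})

# Assumed reporting date and window length.
as_of = pd.Timestamp("2025-04-30")
window = pd.Timedelta(days=30)

# Last 30 days: (as_of - 30d, as_of]
in_last = (orders["order_date"] > as_of - window) & (orders["order_date"] <= as_of)
# Previous 30 days: (as_of - 60d, as_of - 30d]
in_prev = (orders["order_date"] > as_of - 2 * window) & (orders["order_date"] <= as_of - window)

last_total = orders.loc[in_last, "revenue"].sum()
prev_total = orders.loc[in_prev, "revenue"].sum()

# Relative change between the two windows, in percent.
change_pct = (last_total - prev_total) / prev_total * 100

print(f"Last 30 days: {last_total:.2f}")
print(f"Previous 30 days: {prev_total:.2f}")
print(f"Change: {change_pct:.1f}%")
```

Because the question fixes both the metric and the time window, there is little ambiguity about what to compute.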

---

#### Exploring Data

In [4]:
"""
Run this cell before continuing
""" 

import numpy as np
import pandas as pd

df = pd.DataFrame.from_records((
    (2, 83.82, 8.4),
    (4, 99.31, 16.97),
    (3, 96.52, 14.41),
    (6, 114.3, 20.14),
    (4, 101.6, 16.91),
    (2, 86.36, 12.64),
    (3, 92.71, 14.23),
    (2, 85.09, 11.11),
    (2, 85.85, 14.18),
    (5, 106.68, 20.01),
    (4, 99.06, 13.17),
    (5, 109.22, 15.36),
    (4, 100.84, 14.78),
    (6, 115.06, 20.06),
    (2, 84.07, 10.02),
    (7, 121.67, 28.4),
    (3, 94.49, 14.05),
    (6, 116.59, 17.55),
    (7, 121.92, 22.96),
), columns=("age", "height_cm", "weight_kg"))

df

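|   | age | height_cm | weight_kg |
|---|-----|-----------|-----------|
| 0 | 2 | 83.82 | 8.4 |
| 1 | 4 | 99.31 | 16.97 |
| 2 | 3 | 96.52 | 14.41 |
| 3 | 6 | 114.3 | 20.14 |
| 4 | 4 | 101.6 | 16.91 |
| 5 | 2 | 86.36 | 12.64 |
| 6 | 3 | 92.71 | 14.23 |
| 7 | 2 | 85.09 | 11.11 |
| 8 | 2 | 85.85 | 14.18 |
| 9 | 5 | 106.68 | 20.01 |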


In [5]:
"""
Understand data structure
"""

df.head()         # Show first 5 rows
df.tail()         # Show last 5 rows
df.info()         # Column types and missing values
df.shape          # Rows and columns
df.columns        # List of column names

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 19 entries, 0 to 18
Data columns (total 3 columns):
 #   Column     Non-Null Count  Dtype  
---  ------     --------------  -----  
 0   age        19 non-null     int64  
 1   height_cm  19 non-null     float64
 2   weight_kg  19 non-null     float64
dtypes: float64(2), int64(1)
memory usage: 584.0 bytes


Index(['age', 'height_cm', 'weight_kg'], dtype='object')
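A notebook cell renders only the value of its last expression, which is why the output above shows the printed `df.info()` report and the `df.columns` index but not the results of `head()`, `tail()`, or `shape`. One way to see every result is to print each one explicitly. The sketch below uses a five-row stand-in for the full DataFrame so that it runs on its own.

```python
import pandas as pd

# Stand-in for the DataFrame built earlier (first five rows only).
df = pd.DataFrame.from_records((
    (2, 83.82, 8.4),
    (4, 99.31, 16.97),
    (3, 96.52, 14.41),
    (6, 114.3, 20.14),
    (4, 101.6, 16.91),
), columns=("age", "height_cm", "weight_kg"))

# Print each inspection result so none of them is silently discarded.
print(df.head(3))        # first 3 rows
print(df.tail(2))        # last 2 rows
print(df.shape)          # (number of rows, number of columns)
print(list(df.columns))  # column names as a plain list
print(df.dtypes)         # data type of each column
```

Alternatively, each call can be placed in its own cell so that its value is rendered as the cell output.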

In [None]:
"""
Summary statistics
"""

df.describe()     # Mean, std, min, 25%, 50%, 75%, max
df.nunique()      # Count of unique values

In [None]:
"""
Check data quality
"""

df.isnull().sum()
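Missing values are only one kind of quality problem. A minimal sketch of two further checks, duplicated rows and out-of-range values, is shown below. The `sample` frame is an assumption made for illustration: its first three rows come from the data above, and the last three are deliberately flawed rows that do not exist in the original dataset.

```python
import numpy as np
import pandas as pd

# Illustrative data: three real rows followed by three deliberately flawed rows.
sample = pd.DataFrame.from_records((
    (2, 83.82, 8.4),
    (4, 99.31, 16.97),
    (3, 96.52, 14.41),
    (3, 96.52, 14.41),    # exact duplicate of the previous row
    (5, 106.68, np.nan),  # missing weight
    (4, -101.6, 16.91),   # impossible negative height
), columns=("age", "height_cm", "weight_kg"))

# Count missing values per column.
missing_per_column = sample.isnull().sum()

# Count rows that exactly repeat an earlier row.
duplicate_rows = sample.duplicated().sum()

# Select rows whose height is not a positive number.
invalid_height = sample[sample["height_cm"] <= 0]

print(missing_per_column)
print("Duplicate rows:", duplicate_rows)
print(invalid_height)
```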

In [2]:
"""
Grouping
""" 

df.groupby(by="age").mean()

| age | height_cm | weight_kg |
|-----|-----------|-----------|
| 2 | 85.038 | 11.27 |
| 3 | 94.573333 | 14.23 |
| 4 | 100.2025 | 15.4575 |
| 5 | 107.95 | 17.685 |
| 6 | 115.316667 | 19.25 |
| 7 | 121.795 | 25.68 |


In [3]:
"""
Aggregate
""" 

df.groupby(by='age').agg({
    'height_cm': 'mean',
    'weight_kg': ['mean','max', 'min'],
})

| age | height_cm (mean) | weight_kg (mean) | weight_kg (max) | weight_kg (min) |
|-----|------------------|------------------|-----------------|-----------------|
| 2 | 85.038 | 11.27 | 14.18 | 8.4 |
| 3 | 94.573333 | 14.23 | 14.41 | 14.05 |
| 4 | 100.2025 | 15.4575 | 16.97 | 13.17 |
| 5 | 107.95 | 17.685 | 20.01 | 15.36 |
| 6 | 115.316667 | 19.25 | 20.14 | 17.55 |
| 7 | 121.795 | 25.68 | 28.4 | 22.96 |
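Passing a dictionary to `agg` produces two-level column labels such as `('weight_kg', 'mean')`, which can be awkward to select later. As an alternative sketch, pandas named aggregation yields the same numbers with flat column names. The output names used here (`height_mean`, `weight_mean`, and so on) are arbitrary choices for this example.

```python
import pandas as pd

# Same data as the DataFrame defined at the start of this section.
df = pd.DataFrame.from_records((
    (2, 83.82, 8.4), (4, 99.31, 16.97), (3, 96.52, 14.41),
    (6, 114.3, 20.14), (4, 101.6, 16.91), (2, 86.36, 12.64),
    (3, 92.71, 14.23), (2, 85.09, 11.11), (2, 85.85, 14.18),
    (5, 106.68, 20.01), (4, 99.06, 13.17), (5, 109.22, 15.36),
    (4, 100.84, 14.78), (6, 115.06, 20.06), (2, 84.07, 10.02),
    (7, 121.67, 28.4), (3, 94.49, 14.05), (6, 116.59, 17.55),
    (7, 121.92, 22.96),
), columns=("age", "height_cm", "weight_kg"))

# Named aggregation: output_name=(source_column, aggregation_function).
summary = df.groupby("age").agg(
    height_mean=("height_cm", "mean"),
    weight_mean=("weight_kg", "mean"),
    weight_max=("weight_kg", "max"),
    weight_min=("weight_kg", "min"),
)

print(summary)
```

With flat names, a single statistic can be selected directly, for example `summary["weight_max"]`.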


In [None]:
"""
Class Activity: Understanding the data
"""
# TODO: Load the data from the E-Commerce Public Dataset
# TODO: Briefly explore the data to understand its structure
# TODO: Define at least 3 questions that you would like to answer
# TODO: Run descriptive statistics and EDA to answer your questions

### **Reflection**
Take a moment to reflect on what we've learned so far. What insights have you gained? Write your thoughts in your own words.

(answer here)

### **Exploration**
You've already gained a fundamental understanding of the mathematical concepts behind data analytics. Next, we'll learn about data visualization: the tools we use to communicate insights after analysis.
- https://www.geeksforgeeks.org/data-visualization-with-python/