# **AI TECH INSTITUTE** · *Intermediate AI & Data Science*
### Week 01 · Notebook 01 — Introduction to Pandas & Series
**Instructor:** Amir Charkhi  |  **Goal:** Master the transition from Python basics to Pandas data structures.

> Format: short theory → quick practice → build understanding → mini-challenges.


---
## Learning Objectives
- Understand why we need Pandas for data analysis
- Master Series creation and manipulation
- Connect Week 0 Python concepts to Pandas operations
- Prepare for DataFrames

## 1. Why Pandas? From Lists to Series
Remember our Week 0 lists? Let's see why we need something more powerful.

In [12]:
# Week 0 way - calculating average temperature
temps = [22.5, 23.1, 21.8, 24.2, 22.9]
avg_temp = sum(temps) / len(temps)
print(f"Average (Python list): {avg_temp:.2f}°C")

# What if we want temps above 23?
above_23 = [t for t in temps if t > 23]
print(f"Temps > 23: {above_23}")

Average (Python list): 22.90°C
Temps > 23: [23.1, 24.2]


In [13]:
# Week 1 way - with Pandas!
import pandas as pd
import numpy as np

# Create a Series: a one-dimensional labelled array with built-in statistics and filtering
temps_series = pd.Series([22.5, 23.1, 21.8, 24.2, 22.9])
print("Pandas Series:")
print(temps_series)
print(f"\nAverage (Pandas): {temps_series.mean():.2f}°C")
print(f"\nTemps > 23:")
print(temps_series[temps_series > 23])

Pandas Series:
0    22.5
1    23.1
2    21.8
3    24.2
4    22.9
dtype: float64

Average (Pandas): 22.90°C

Temps > 23:
1    23.1
3    24.2
dtype: float64
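
The filter `temps_series[temps_series > 23]` works in two steps: the comparison first builds a Series of `True`/`False` values (a boolean mask), and indexing with that mask keeps only the `True` rows. The sketch below makes both steps explicit; the names `mask` and `between` are illustrative and not part of the original notebook.

```python
import pandas as pd

temps_series = pd.Series([22.5, 23.1, 21.8, 24.2, 22.9])

# Step 1: the comparison runs on every element and returns a boolean Series
mask = temps_series > 23
print(mask)

# Step 2: indexing with the mask keeps only the rows where it is True
print(temps_series[mask])

# Conditions combine with & (and), | (or), ~ (not);
# each condition needs its own parentheses
between = temps_series[(temps_series > 22) & (temps_series < 24)]
print(between)
```

Python's `and`/`or` keywords do not work element-wise on a Series, which is why the `&` and `|` operators are used here.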


**Exercise 1 — Feel the Difference (easy)**  
Create a Series of 5 student scores and find: mean, max, min, and scores above 80.


In [15]:
# Your turn
scores = pd.Series([68, 72, 75, 89, 96])
print(f"Average score: {scores.mean()}")
print(f"\nMax score: {scores.max()}")
print(f"\nMin score: {scores.min()}")
print(f"\nScores above 80:\n{scores[scores>80]}")

Average score: 80.0

Max score: 96

Min score: 68

Scores above 80:
3    89
4    96
dtype: int64


<details>
<summary><b>Solution</b></summary>

```python
scores = pd.Series([75, 82, 91, 68, 87])
print(f"Mean: {scores.mean():.1f}")
print(f"Max: {scores.max()}")
print(f"Min: {scores.min()}")
print("\nScores > 80:")
print(scores[scores > 80])
```
</details>

## 2. Series with Index Labels
Unlike a list, which is indexed only by position, a Series can carry meaningful index labels!

In [16]:
# Create a Series with custom index
cities = ['Perth', 'Sydney', 'Melbourne', 'Brisbane', 'Adelaide']
populations = [2.1, 5.3, 5.0, 2.6, 1.4]  # in millions

pop_series = pd.Series(populations, index=cities, name='Population (M)')
print(pop_series)
print(f"\nPerth population: {pop_series['Perth']}M")
print(f"\nCities over 3M:")
print(pop_series[pop_series > 3])

Perth        2.1
Sydney       5.3
Melbourne    5.0
Brisbane     2.6
Adelaide     1.4
Name: Population (M), dtype: float64

Perth population: 2.1M

Cities over 3M:
Sydney       5.3
Melbourne    5.0
Name: Population (M), dtype: float64
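
Once a Series has labels, there are two ways to reach a value: by label or by integer position. A minimal sketch using the same city data, with `.loc` for labels and `.iloc` for positions (the variable names are illustrative):

```python
import pandas as pd

pop_series = pd.Series(
    [2.1, 5.3, 5.0, 2.6, 1.4],
    index=['Perth', 'Sydney', 'Melbourne', 'Brisbane', 'Adelaide'],
    name='Population (M)',
)

by_label = pop_series.loc['Sydney']                 # look up by index label
by_position = pop_series.iloc[1]                    # look up by 0-based position
several = pop_series.loc[['Perth', 'Adelaide']]     # a list of labels returns a Series
label_slice = pop_series.loc['Sydney':'Brisbane']   # label slices include BOTH endpoints

print(by_label, by_position)
print(several)
print(label_slice)
```

Note the difference from list slicing: a label slice such as `'Sydney':'Brisbane'` includes the end label, whereas a positional slice like `.iloc[1:3]` excludes the end position.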


**Exercise 2 — Product Inventory (medium)**  
Create a Series for product inventory: iPhone:45, iPad:32, MacBook:18, AirPods:67.
Find products with stock < 40.


In [19]:
# Your turn
product = ['iPhone', 'iPad', 'MacBook', 'AirPods']
inventory = pd.Series([45,32,18,67], index=product, name = 'stock count')
print(f"Product stock < 40: \n{inventory[inventory<40]}")

Product stock < 40: 
iPad       32
MacBook    18
Name: stock count, dtype: int64


<details>
<summary><b>Solution</b></summary>

```python
inventory = pd.Series(
    {'iPhone': 45, 'iPad': 32, 'MacBook': 18, 'AirPods': 67},
    name='Stock Count'
)
print("Current Inventory:")
print(inventory)
print("\nLow stock items (< 40):")
print(inventory[inventory < 40])
```
</details>
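
The exercise attempt builds the Series from a list plus an `index`, while the solution passes a dictionary. Both routes should give the same result, since dictionary keys become the index labels. A small sketch to check this (the names `stock`, `from_dict`, and `from_lists` are illustrative):

```python
import pandas as pd

stock = {'iPhone': 45, 'iPad': 32, 'MacBook': 18, 'AirPods': 67}

# Route 1: dictionary keys become the index, values become the data
from_dict = pd.Series(stock, name='Stock Count')

# Route 2: separate list of values and list of labels
from_lists = pd.Series(list(stock.values()),
                       index=list(stock.keys()),
                       name='Stock Count')

# equals() compares the values and the index of the two Series
print(from_dict.equals(from_lists))

# Low-stock items, smallest first
low_stock = from_dict[from_dict < 40].sort_values()
print(low_stock)
```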

## 3. Series Operations & Methods

In [42]:
# Mathematical operations work element-wise
prices = pd.Series([99.99, 149.99, 199.99, 79.99], 
                   index=['Basic', 'Standard', 'Premium', 'Student'])

# Apply 20% discount
discounted = prices * 0.8
print("Original prices:")
print(prices)
print("\nAfter 20% discount:")
print(discounted.round(2))

# Useful methods
print(f"\nPrice stats:")
print(f"Mean: ${prices.mean():.2f}")
print(f"Median: ${prices.median():.2f}")
print(f"Std Dev: ${prices.std():.2f}")

Original prices:
Basic        99.99
Standard    149.99
Premium     199.99
Student      79.99
dtype: float64

After 20% discount:
Basic        79.99
Standard    119.99
Premium     159.99
Student      63.99
dtype: float64

Price stats:
Mean: $132.49
Median: $124.99
Std Dev: $53.77
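
Beyond `mean()`, `median()`, and `std()`, a few other Series methods are handy for a quick look at the data. The sketch below reuses the `prices` Series; the variable names are illustrative. One detail worth knowing: `Series.std()` computes the sample standard deviation (`ddof=1`) by default, so it differs from the population figure obtained with `ddof=0`.

```python
import pandas as pd

prices = pd.Series([99.99, 149.99, 199.99, 79.99],
                   index=['Basic', 'Standard', 'Premium', 'Student'])

summary = prices.describe()                    # count, mean, std, min, quartiles, max
cheapest = prices.idxmin()                     # label of the smallest value
ranked = prices.sort_values(ascending=False)   # returns a new Series, highest first

sample_std = prices.std()                      # default: divides by n - 1
population_std = prices.std(ddof=0)            # divides by n

print(summary)
print(f"Cheapest plan: {cheapest}")
print(ranked)
print(f"Sample std: {sample_std:.2f}, population std: {population_std:.2f}")
```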


## 4. Handling Missing Data

In [43]:
# Real data often has missing values
sales = pd.Series([1200, None, 1450, 980, None, 1680],
                  index=['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat'])
print("Sales with missing data:")
print(sales)
print(f"\nCount of missing: {sales.isna().sum()}")
print(f"Mean (ignoring NaN): ${sales.mean():.2f}")

# Fill missing values
sales_filled = sales.fillna(sales.mean())
print("\nAfter filling with mean:")
print(sales_filled.round(2))

Sales with missing data:
Mon    1200.0
Tue       NaN
Wed    1450.0
Thu     980.0
Fri       NaN
Sat    1680.0
dtype: float64

Count of missing: 2
Mean (ignoring NaN): $1327.50

After filling with mean:
Mon    1200.0
Tue    1327.5
Wed    1450.0
Thu     980.0
Fri    1327.5
Sat    1680.0
dtype: float64
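
Filling with the mean is only one option. In the output above, the `None` entries became `NaN` and the Series was stored as `float64`, even though the inputs were whole numbers. The sketch below shows three other common strategies on the same `sales` data; which one is appropriate depends on the dataset, so treat these as options rather than recommendations.

```python
import pandas as pd

sales = pd.Series([1200, None, 1450, 980, None, 1680],
                  index=['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat'])

dropped = sales.dropna()                        # remove the days with no data
carried = sales.ffill()                         # carry the last known value forward
filled_median = sales.fillna(sales.median())    # median is less sensitive to outliers

print(dropped)
print(carried)
print(filled_median)
```

None of these methods modifies `sales` itself; each returns a new Series.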


**Exercise 3 — Temperature Analysis (medium)**  
Given a week of temperatures with some missing values, fill them with the median and find days above average.


In [50]:
temps = pd.Series([22.5, None, 24.1, 23.8, None, 25.2, 21.9],
                  index=['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun'])
temps_filled = temps.fillna(temps.median())
print(f"Original temperature: \n{temps}")
print(f"\nMissing values replaced with the median temperature ({temps.median():.1f} degC)")
print(f"\nNew temperature: \n{temps_filled}")
print(f"\nDays above average temperature ({temps_filled.mean():.1f} degC):\n{temps_filled[temps_filled>temps_filled.mean()]}")


Original temperature: 
Mon    22.5
Tue     NaN
Wed    24.1
Thu    23.8
Fri     NaN
Sat    25.2
Sun    21.9
dtype: float64

Missing values replaced with the median temperature (23.8 degC)

New temperature: 
Mon    22.5
Tue    23.8
Wed    24.1
Thu    23.8
Fri    23.8
Sat    25.2
Sun    21.9
dtype: float64

Days above average temperature (23.6 degC):
Tue    23.8
Wed    24.1
Thu    23.8
Fri    23.8
Sat    25.2
dtype: float64


<details>
<summary><b>Solution</b></summary>

```python
temps = pd.Series([22.5, None, 24.1, 23.8, None, 25.2, 21.9],
                  index=['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun'])
print("Original temperatures:")
print(temps)

# Fill with median
temps_filled = temps.fillna(temps.median())
print("\nFilled temperatures:")
print(temps_filled)

# Find above average days
avg_temp = temps_filled.mean()
print(f"\nAverage: {avg_temp:.1f}°C")
print("\nDays above average:")
print(temps_filled[temps_filled > avg_temp])
```
</details>

## 5. Series Alignment & Combining

In [20]:
# Pandas aligns by index automatically!
q1_sales = pd.Series({'Product_A': 100, 'Product_B': 150, 'Product_C': 200})
q2_sales = pd.Series({'Product_B': 180, 'Product_C': 220, 'Product_D': 90})

print("Q1 Sales:")
print(q1_sales)
print("\nQ2 Sales:")
print(q2_sales)

# Addition aligns by index
total_sales = q1_sales.add(q2_sales, fill_value=0)
print("\nTotal Sales (Q1 + Q2):")
print(total_sales)

Q1 Sales:
Product_A    100
Product_B    150
Product_C    200
dtype: int64

Q2 Sales:
Product_B    180
Product_C    220
Product_D     90
dtype: int64

Total Sales (Q1 + Q2):
Product_A    100.0
Product_B    330.0
Product_C    420.0
Product_D     90.0
dtype: float64
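
To see why `fill_value=0` matters, compare it with the plain `+` operator. With `+`, any label that appears in only one Series produces `NaN`, because there is nothing to add it to. A short sketch with the same quarterly data (the names `plain_sum`, `filled_sum`, and `missing` are illustrative):

```python
import pandas as pd

q1_sales = pd.Series({'Product_A': 100, 'Product_B': 150, 'Product_C': 200})
q2_sales = pd.Series({'Product_B': 180, 'Product_C': 220, 'Product_D': 90})

# Plain +: labels present in only one Series become NaN
plain_sum = q1_sales + q2_sales
print(plain_sum)

# add() with fill_value=0 treats a missing label as 0 before adding
filled_sum = q1_sales.add(q2_sales, fill_value=0)
print(filled_sum)

# Labels that did not match across the two quarters
missing = plain_sum[plain_sum.isna()].index.tolist()
print(missing)
```

In both cases the result is `float64`: aligning on the union of labels introduces `NaN` internally, and `NaN` is a floating-point value.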


**Exercise 4 — Revenue Calculator (hard)**  
Given prices and quantities sold, calculate total revenue per product and overall total.


In [40]:
# Your turn
prices = pd.Series({'Laptop': 1200, 'Mouse': 25, 'Keyboard': 80, 'Monitor': 350})
quantities = pd.Series({'Laptop': 5, 'Mouse': 45, 'Keyboard': 30, 'Webcam': 15})
revenue = prices*quantities

print(f"Total revenue per product ($): \n{revenue}")
print(f"\nOverall total revenue: ${revenue.sum():.0f}")
print(f"\nBest seller: {revenue.idxmax()} with revenue of ${revenue.max():.0f}")

Total revenue per product ($): 
Keyboard    2400.0
Laptop      6000.0
Monitor        NaN
Mouse       1125.0
Webcam         NaN
dtype: float64

Overall total revenue: $9525

Best seller: Laptop with revenue of $6000


<details>
<summary><b>Solution</b></summary>

```python
prices = pd.Series({'Laptop': 1200, 'Mouse': 25, 'Keyboard': 80, 'Monitor': 350})
quantities = pd.Series({'Laptop': 5, 'Mouse': 45, 'Keyboard': 30, 'Webcam': 15})

# Multiply element-wise; products missing from either Series become NaN
revenue = prices * quantities
print("Revenue per product:")
print(revenue.dropna())  # Drop products we can't calculate

print(f"\nTotal revenue: ${revenue.sum():.2f}")
print(f"Best seller: {revenue.idxmax()} (${revenue.max():.2f})")
```
</details>
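
An alternative to multiplying first and dropping `NaN` afterwards is to restrict both Series to the products they share before multiplying. The sketch below is one possible approach, not the notebook's official solution; `common`, `unpriced`, and `unsold` are illustrative names.

```python
import pandas as pd

prices = pd.Series({'Laptop': 1200, 'Mouse': 25, 'Keyboard': 80, 'Monitor': 350})
quantities = pd.Series({'Laptop': 5, 'Mouse': 45, 'Keyboard': 30, 'Webcam': 15})

# Labels present in both Series
common = prices.index.intersection(quantities.index)

# No NaN is introduced, so the result keeps its integer dtype
revenue = prices[common] * quantities[common]
print(revenue)
print(f"Total revenue: ${revenue.sum()}")

# Report what was left out, so mismatches are not silently ignored
unpriced = quantities.index.difference(prices.index).tolist()   # sold but no price
unsold = prices.index.difference(quantities.index).tolist()     # priced but no sales
print(f"No price found for: {unpriced}")
print(f"No sales recorded for: {unsold}")
```

Listing the unmatched labels explicitly makes it easier to spot a data problem such as a misspelled product name, which `dropna()` would hide.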

## 6. Mini-Challenges
- **M1 (easy):** Create a Series of 10 random numbers and find values > mean
- **M2 (medium):** Create a grade Series, convert letter grades to numeric (A=4, B=3, etc.)
- **M3 (hard):** Combine two Series with different indices and calculate percentage change

In [60]:
# Your turn - try the challenges!
# M1
random_numbers = pd.Series(np.random.randint(1,101, size=10))
print(f"Ten Random Numbers:\n{random_numbers}")
print(f"\nMean = {random_numbers.mean()}")
print(f"\nNumbers Above Average:\n{random_numbers[random_numbers>random_numbers.mean()]}")

Ten Random Numbers:
0    52
1    82
2    24
3    97
4    48
5    10
6    26
7    22
8    98
9    53
dtype: int32

Mean = 51.2

Numbers Above Average:
0    52
1    82
3    97
8    98
9    53
dtype: int32


In [71]:
# M2
grades = pd.Series(np.random.choice(['A','B','C','D'], size=8))
grade_map = {'A':4,'B':3,'C':2,'D':1}
numeric_grades = grades.map(grade_map)
grade_table = pd.DataFrame({"Grades":grades,"GPA":numeric_grades})  # a DataFrame, so not named *_series
print(grade_table)

  Grades  GPA
0      A    4
1      B    3
2      C    2
3      B    3
4      B    3
5      D    1
6      D    1
7      D    1


In [78]:
# M3
June = pd.Series({'Week 1':25,'Week 2':100,'Week 3':15,'Week 4':100})
July = pd.Series({'Week 1':5,'Week 2':50,'Week 3':25,'Week 4':50,'Week 5':150})
percentage_rainfall_change = ((July-June)/June*100).round(2)
print(f"Percentage change in rainfall from June to July 2025: \n{percentage_rainfall_change.dropna()}")

Percentage change in rainfall from June to July 2025: 
Week 1   -80.00
Week 2   -50.00
Week 3    66.67
Week 4   -50.00
dtype: float64


<details>
<summary><b>Solutions</b></summary>

```python
# M1
random_series = pd.Series(np.random.randn(10))
print(random_series[random_series > random_series.mean()])

# M2
grades = pd.Series(['A', 'B', 'A', 'C', 'B', 'D'])
grade_map = {'A': 4, 'B': 3, 'C': 2, 'D': 1, 'F': 0}
numeric_grades = grades.map(grade_map)
print(f"GPA: {numeric_grades.mean():.2f}")

# M3
jan = pd.Series({'A': 100, 'B': 200, 'C': 150})
feb = pd.Series({'B': 220, 'C': 140, 'D': 80})
pct_change = ((feb - jan) / jan * 100).round(2)
print(pct_change.dropna())
```
</details>

## Wrap-Up & Next Steps
✅ You've mastered Series - the building block of DataFrames!  
✅ You can create, filter, and manipulate data efficiently  
✅ You understand index alignment and missing data handling  

**Next:** DataFrames - think of them as multiple Series combined into a table!
