# **AI TECH INSTITUTE** · *Intermediate AI & Data Science*
### Week 01 · Notebook 01 — Introduction to Pandas & Series
**Instructor:** Amir Charkhi  |  **Goal:** Master the transition from Python basics to Pandas data structures.

> Format: short theory → quick practice → build understanding → mini-challenges.


---
## Learning Objectives
- Understand why we need Pandas for data analysis
- Master Series creation and manipulation
- Connect Week 0 Python concepts to Pandas operations
- Prepare for DataFrames

## 1. Why Pandas? From Lists to Series
Remember our Week 0 lists? Let's see why we need something more powerful.

In [2]:
temps = [22.5, 23.1, 21.8, 24.2, 22.9]
print(type(temps))

<class 'list'>


In [3]:
import pandas as pd

In [6]:
temp_series = pd.Series(temps)  # 1-D labeled array: values plus an index
print(type(temp_series))

<class 'pandas.core.series.Series'>


In [7]:
# Week 0 way - calculating average temperature
temps = [22.5, 23.1, 21.8, 24.2, 22.9]
avg_temp = sum(temps) / len(temps)
print(f"Average (Python list): {avg_temp:.2f}°C")



Average (Python list): 22.90°C


In [8]:
# Week 1 way - with Pandas!
import pandas as pd
import numpy as np

# Create a Series (like a smart list with superpowers)
temps_series = pd.Series([22.5, 23.1, 21.8, 24.2, 22.9])
print("Pandas Series:")
print(temps_series)
print(f"\nAverage (Pandas): {temps_series.mean():.2f}°C")
print("\nTemps > 23:")
print(temps_series[temps_series > 23])

Pandas Series:
0    22.5
1    23.1
2    21.8
3    24.2
4    22.9
dtype: float64

Average (Pandas): 22.90°C

Temps > 23:
1    23.1
3    24.2
dtype: float64


In [13]:
temp_series > 23

0    False
1     True
2    False
3     True
4    False
dtype: bool

In [14]:
temps

[22.5, 23.1, 21.8, 24.2, 22.9]

In [15]:
for num in temps:
    if num > 23:
        print(num)

23.1
24.2


In [16]:
temp_series[temp_series>23]

1    23.1
3    24.2
dtype: float64
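
The comparison `temp_series > 23` returns a boolean Series (a "mask"), and indexing with that mask keeps only the positions where it is `True`. The sketch below spells out those two steps and shows that masks can be combined with `&` and `|`. The names `warm` and `mild` are illustrative and not part of the original notebook.

```python
import pandas as pd

temp_series = pd.Series([22.5, 23.1, 21.8, 24.2, 22.9])

# Step 1: the comparison returns a boolean Series (the mask)
warm = temp_series > 23

# Step 2: indexing with the mask keeps rows where the mask is True
print(temp_series[warm])

# Masks combine with & (and) / | (or); each condition needs its own parentheses
mild = (temp_series > 22) & (temp_series < 23)
print(temp_series[mild])

# The plain-Python equivalent is a list comprehension, which loses the index
print([t for t in temp_series if t > 23])
```

Note that Python's `and` / `or` keywords do not work on Series; the element-wise operators `&` and `|` are needed instead.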

**Exercise 1 — Feel the Difference (easy)**  
Create a Series of 5 student scores and find: mean, max, min, and scores above 80.


In [None]:
# Your turn


<details>
<summary><b>Solution</b></summary>

```python
scores = pd.Series([75, 82, 91, 68, 87])
print(f"Mean: {scores.mean():.1f}")
print(f"Max: {scores.max()}")
print(f"Min: {scores.min()}")
print("\nScores > 80:")
print(scores[scores > 80])
```
</details>

## 2. Series with Index Labels
Unlike lists, which are accessed only by position, a Series can carry meaningful labels in its index.

In [18]:
# Create a Series with custom index
cities = ['Perth', 'Sydney', 'Melbourne', 'Brisbane', 'Adelaide']
populations = [2.1, 5.3, 5.0, 2.6, 1.4]  # in millions

pop_series = pd.Series(populations, index=cities, name='Population (M)')
print(pop_series)
print(f"\nPerth population: {pop_series['Perth']}M")
print("\nCities over 3M:")
print(pop_series[pop_series > 3])

Perth        2.1
Sydney       5.3
Melbourne    5.0
Brisbane     2.6
Adelaide     1.4
Name: Population (M), dtype: float64

Perth population: 2.1M

Cities over 3M:
Sydney       5.3
Melbourne    5.0
Name: Population (M), dtype: float64
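
With a labeled index there are two ways to reach a value: by label or by position. The following is a minimal sketch of both, using the `.loc` and `.iloc` accessors; it rebuilds `pop_series` so it can run on its own.

```python
import pandas as pd

pop_series = pd.Series(
    [2.1, 5.3, 5.0, 2.6, 1.4],
    index=['Perth', 'Sydney', 'Melbourne', 'Brisbane', 'Adelaide'],
    name='Population (M)',
)

# .loc selects by label
print(pop_series.loc['Sydney'])      # 5.3

# .iloc selects by integer position (0-based, like a list)
print(pop_series.iloc[0])            # 2.1

# Label slices include the end label; position slices exclude the end
print(pop_series.loc['Perth':'Melbourne'])   # 3 rows
print(pop_series.iloc[0:2])                  # 2 rows

# Membership tests check the index labels, not the values
print('Perth' in pop_series)         # True
```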


**Exercise 2 — Product Inventory (medium)**  
Create a Series for product inventory: iPhone:45, iPad:32, MacBook:18, AirPods:67.
Find products with stock < 40.


In [None]:
# Your turn


<details>
<summary><b>Solution</b></summary>

```python
inventory = pd.Series(
    {'iPhone': 45, 'iPad': 32, 'MacBook': 18, 'AirPods': 67},
    name='Stock Count'
)
print("Current Inventory:")
print(inventory)
print("\nLow stock items (< 40):")
print(inventory[inventory < 40])
```
</details>

## 3. Series Operations & Methods

In [19]:
prices = pd.Series([99.99, 149.99, 199.99, 79.99], 
                   index=['Basic', 'Standard', 'Premium', 'Student'])

prices

Basic        99.99
Standard    149.99
Premium     199.99
Student      79.99
dtype: float64

In [20]:
discounted = prices * 0.8

discounted

Basic        79.992
Standard    119.992
Premium     159.992
Student      63.992
dtype: float64

In [21]:
# Mathematical operations work element-wise
prices = pd.Series([99.99, 149.99, 199.99, 79.99], 
                   index=['Basic', 'Standard', 'Premium', 'Student'])

# Apply 20% discount
discounted = prices * 0.8
print("Original prices:")
print(prices)
print("\nAfter 20% discount:")
print(discounted.round(2))

# Useful methods
print("\nPrice stats:")
print(f"Mean: ${prices.mean():.2f}")
print(f"Median: ${prices.median():.2f}")
print(f"Std Dev: ${prices.std():.2f}")

Original prices:
Basic        99.99
Standard    149.99
Premium     199.99
Student      79.99
dtype: float64

After 20% discount:
Basic        79.99
Standard    119.99
Premium     159.99
Student      63.99
dtype: float64

Price stats:
Mean: $132.49
Median: $124.99
Std Dev: $53.77
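
To connect this back to Week 0, here is a small sketch that puts the list-style loop next to the element-wise Series expression, and then uses `describe()` to collect the common statistics in a single call. The names `discounted_list` and `summary` are illustrative.

```python
import pandas as pd

prices = pd.Series([99.99, 149.99, 199.99, 79.99],
                   index=['Basic', 'Standard', 'Premium', 'Student'])

# Week 0 style: visit the values one by one (labels are lost)
discounted_list = [round(p * 0.8, 2) for p in prices]

# Pandas style: one expression applied to every element, labels kept
discounted = (prices * 0.8).round(2)
print(discounted)

# describe() bundles count, mean, std, min, quartiles, and max
summary = prices.describe()
print(summary)
```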


## 4. Handling Missing Data

In [36]:
# Real data often has missing values
sales = pd.Series([1200, None, 1450, 980, None, 1680],
                  index=['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat'])
print("Sales with missing data:")
print(sales)
print(f"\nCount of missing: {sales.isna().sum()}")
print(f"Mean (ignoring NaN): ${sales.mean():.2f}")

# Fill missing values
sales_filled = sales.fillna(sales.mean())
print("\nAfter filling with mean:")
print(sales_filled.round(2))

Sales with missing data:
Mon    1200.0
Tue       NaN
Wed    1450.0
Thu     980.0
Fri       NaN
Sat    1680.0
dtype: float64

Count of missing: 2
Mean (ignoring NaN): $1327.50

After filling with mean:
Mon    1200.0
Tue    1327.5
Wed    1450.0
Thu     980.0
Fri    1327.5
Sat    1680.0
dtype: float64
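
Filling with the mean is only one option. As a rough sketch, the block below compares three common choices on the same `sales` data: dropping the missing days, filling with zero, and filling with the mean. Which one is appropriate depends on what a missing value means in your data; `missing_days` is an illustrative name.

```python
import pandas as pd

sales = pd.Series([1200, None, 1450, 980, None, 1680],
                  index=['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat'])

# Which days are missing? isna() gives a boolean mask
missing_days = sales[sales.isna()].index.tolist()
print(missing_days)

# Option 1: drop the missing days entirely
print(sales.dropna())

# Option 2: treat missing as zero sales (pulls the mean down)
print(sales.fillna(0).mean())

# Option 3: fill with the mean (leaves the mean unchanged)
print(sales.fillna(sales.mean()).mean())
```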


**Exercise 3 — Temperature Analysis (medium)**  
Given a week of temperatures with some missing values, fill them with the median and find days above average.


In [None]:
# Your turn
# temps = pd.Series([22.5, None, 24.1, 23.8, None, 25.2, 21.9],
#                   index=['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun'])


<details>
<summary><b>Solution</b></summary>

```python
temps = pd.Series([22.5, None, 24.1, 23.8, None, 25.2, 21.9],
                  index=['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun'])
print("Original temperatures:")
print(temps)

# Fill with median
temps_filled = temps.fillna(temps.median())
print("\nFilled temperatures:")
print(temps_filled)

# Find above average days
avg_temp = temps_filled.mean()
print(f"\nAverage: {avg_temp:.1f}°C")
print("\nDays above average:")
print(temps_filled[temps_filled > avg_temp])
```
</details>

## 5. Series Alignment & Combining

In [39]:
# Pandas aligns Series by index label, not by position
q1_sales = pd.Series({'Product_A': 100, 'Product_B': 150, 'Product_C': 200})
q2_sales = pd.Series({'Product_B': 180, 'Product_C': 220, 'Product_D': 90})

print("Q1 Sales:")
print(q1_sales)
print("\nQ2 Sales:")
print(q2_sales)

Q1 Sales:
Product_A    100
Product_B    150
Product_C    200
dtype: int64

Q2 Sales:
Product_B    180
Product_C    220
Product_D     90
dtype: int64


In [40]:


# .add() aligns by index; fill_value=0 treats a product missing from one quarter as 0
total_sales = q1_sales.add(q2_sales, fill_value=0)
print("\nTotal Sales (Q1 + Q2):")
print(total_sales)


Total Sales (Q1 + Q2):
Product_A    100.0
Product_B    330.0
Product_C    420.0
Product_D     90.0
dtype: float64
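
To see why `fill_value=0` matters, it helps to compare against the plain `+` operator. This sketch reuses the same two Series; `plain_sum` and `filled_sum` are illustrative names.

```python
import pandas as pd

q1_sales = pd.Series({'Product_A': 100, 'Product_B': 150, 'Product_C': 200})
q2_sales = pd.Series({'Product_B': 180, 'Product_C': 220, 'Product_D': 90})

# Plain + aligns on labels; a label present in only one Series becomes NaN
plain_sum = q1_sales + q2_sales
print(plain_sum)

# .add(fill_value=0) substitutes 0 for the missing side before adding
filled_sum = q1_sales.add(q2_sales, fill_value=0)
print(filled_sum)
```

Both results come back as `float64` because the aligned data can contain `NaN`, which the default integer dtype cannot represent.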


**Exercise 4 — Revenue Calculator (hard)**  
Given prices and quantities sold, calculate total revenue per product and overall total.


In [None]:
# Your turn
# prices = pd.Series({'Laptop': 1200, 'Mouse': 25, 'Keyboard': 80, 'Monitor': 350})
# quantities = pd.Series({'Laptop': 5, 'Mouse': 45, 'Keyboard': 30, 'Webcam': 15})


<details>
<summary><b>Solution</b></summary>

```python
prices = pd.Series({'Laptop': 1200, 'Mouse': 25, 'Keyboard': 80, 'Monitor': 350})
quantities = pd.Series({'Laptop': 5, 'Mouse': 45, 'Keyboard': 30, 'Webcam': 15})

# Multiply label by label; products missing from either Series become NaN
revenue = prices * quantities
print("Revenue per product:")
print(revenue.dropna())  # Drop products we can't calculate

print(f"\nTotal revenue: ${revenue.sum():.2f}")
print(f"Top earner: {revenue.idxmax()} (${revenue.max():.2f})")
```
</details>

## 6. Mini-Challenges
- **M1 (easy):** Create a Series of 10 random numbers and find values > mean
- **M2 (medium):** Create a grade Series, convert letter grades to numeric (A=4, B=3, etc.)
- **M3 (hard):** Combine two Series with different indices and calculate percentage change

In [None]:
# Your turn - try the challenges!


<details>
<summary><b>Solutions</b></summary>

```python
# M1
random_series = pd.Series(np.random.randn(10))
print(random_series[random_series > random_series.mean()])

# M2
grades = pd.Series(['A', 'B', 'A', 'C', 'B', 'D'])
grade_map = {'A': 4, 'B': 3, 'C': 2, 'D': 1, 'F': 0}
numeric_grades = grades.map(grade_map)
print(f"GPA: {numeric_grades.mean():.2f}")

# M3
jan = pd.Series({'A': 100, 'B': 200, 'C': 150})
feb = pd.Series({'B': 220, 'C': 140, 'D': 80})
# Labels present in only one month (A, D) produce NaN and are dropped below
pct_change = ((feb - jan) / jan * 100).round(2)
print(pct_change.dropna())
```
</details>

## Wrap-Up & Next Steps
✅ You've mastered Series - the building block of DataFrames!  
✅ You can create, filter, and manipulate data efficiently  
✅ You understand index alignment and missing data handling  

**Next:** DataFrames - think of them as multiple Series combined into a table!
