# AI School - Epoch 1 - Checkpoint 2

**Total Points:** 100 points (Main exercises) + 10 bonus points

**Minimum Required:** You need **85/100 points** from the main exercises.

You don't need to complete everything perfectly. The bonus exercises are **optional but recommended** for extra practice with advanced topics.

**Dataset:** `omc_members.csv` : a synthetic (messy) dataset of Open Minds Club member records.

**Submission:** Create a GitHub repo, push your `.ipynb` file and the CSV, submit the repo link in the following form: https://forms.gle/h7Xqa78SHZiPXuM47

*Attempt the bonus if you finish early, and don't stress about perfection - learning is the goal!*


---

## Setup

In [14]:
# pandas installation
%pip install pandas 

Defaulting to user installation because normal site-packages is not writeable
Note: you may need to restart the kernel to use updated packages.



[notice] A new release of pip is available: 25.0.1 -> 26.0.1
[notice] To update, run: python.exe -m pip install --upgrade pip


In [15]:
import pandas as pd
import numpy as np

---

## Part 1: Data Loading & First Inspection

### Exo 1: Load the Data (5 points)

Load `omc_members.csv` into a DataFrame called `df`.

Then display:
- The **shape** of the DataFrame
- The **column names**
- The **data types** of each column


In [16]:
# Load the CSV
df = pd.read_csv("omc_members.csv")
df.head()

Unnamed: 0,member_id,full_name,age,wilaya,university,level,track,role,join_date,workshops_attended,project_score,github_profile,status,email
0,OMC-2024-465,Nadine Bensalem,25.0,Sétif,ESI,L1,Web Development,Core Team,2024-04-14,7,,github.com/nadine62,Active,nadine.bensalem@email.com
1,OMC-2024-622,Imane Saidi,26.0,BLIDA,UMBB,M2,AI & Data,Member,2024-07-04,4,16.1,github.com/imane85,Inactive,imane.saidi@email.com
2,OMC-2024-398,Sofiane Ziani,23.0,Béjaïa,Univ. Oran 1,M2,Cybersecurity,Member,2024-02-27,7,15.4,github.com/sofiane45,Active,sofiane.ziani@email.com
3,OMC-2024-766,Wafa Rahmani,23.0,Annaba,ENSIA,M2,Software Engineering,Core Team,2024-09-09,2,11.1,,Active,wafa.rahmani@email.com
4,OMC-2024-345,Nassim Rahmani,25.0,BATNA,ENSIA,M2,Web Development,Member,2024-07-19,0,9.0,github.com/nassim64,Active,nassim.rahmani@email.com


In [13]:
# Display shape, column names, and dtypes
print("DataFrame Informations:")

print(f"1- DataFrame shape (rows, columns)  → {df.shape}")

print(f"2- DataFrame columns names  → {df.columns.tolist()}")

print("3- DataFrame columns types:")
print(df.dtypes)


DataFrame Informations:
1- DataFrame shape (rows, columns)  → (128, 14)
2- DataFrame columns names  → ['member_id', 'full_name', 'age', 'wilaya', 'university', 'level', 'track', 'role', 'join_date', 'workshops_attended', 'project_score', 'github_profile', 'status', 'email']
3- DataFrame columns types:
member_id                 str
full_name                 str
age                   float64
wilaya                    str
university                str
level                     str
track                     str
role                      str
join_date                 str
workshops_attended      int64
project_score         float64
github_profile            str
status                    str
email                     str
dtype: object


---

### Exo 2: First Look (5 points)

Display:
- The **first 5 rows** of the DataFrame
- A **statistical summary** of the numeric columns
- Use `.info()` to get a concise overview


In [15]:
# First 5 rows
print("The first 5 rows:")
print(df.head(5))

The first 5 rows:
      member_id        full_name   age  wilaya    university level  \
0  OMC-2024-465  Nadine Bensalem  25.0   Sétif           ESI    L1   
1  OMC-2024-622      Imane Saidi  26.0   BLIDA          UMBB    M2   
2  OMC-2024-398    Sofiane Ziani  23.0  Béjaïa  Univ. Oran 1    M2   
3  OMC-2024-766     Wafa Rahmani  23.0  Annaba         ENSIA    M2   
4  OMC-2024-345   Nassim Rahmani  25.0   BATNA         ENSIA    M2   

                  track       role   join_date  workshops_attended  \
0       Web Development  Core Team  2024-04-14                   7   
1             AI & Data     Member  2024-07-04                   4   
2         Cybersecurity     Member  2024-02-27                   7   
3  Software Engineering  Core Team  2024-09-09                   2   
4       Web Development     Member  2024-07-19                   0   

   project_score        github_profile    status                      email  
0            NaN   github.com/nadine62    Active  nadine.bensa

In [18]:
# Statistical summary
print("Statistical summary of the numeric columns: ")
print(df.describe())

Statistical summary of the numeric columns: 
              age  workshops_attended  project_score
count  128.000000          128.000000     116.000000
mean    30.304688            3.500000      13.957759
std     87.099324            2.459114       3.433699
min      0.000000            0.000000       8.000000
25%     20.000000            1.000000      11.300000
50%     21.500000            3.000000      13.350000
75%     25.000000            6.000000      17.200000
max    999.000000            7.000000      20.000000


In [62]:
# Concise overview
print("Concise overview: ")
df.info()

Concise overview: 
<class 'pandas.DataFrame'>
RangeIndex: 128 entries, 0 to 127
Data columns (total 14 columns):
 #   Column              Non-Null Count  Dtype  
---  ------              --------------  -----  
 0   member_id           128 non-null    str    
 1   full_name           128 non-null    str    
 2   age                 128 non-null    float64
 3   wilaya              120 non-null    str    
 4   university          122 non-null    str    
 5   level               128 non-null    str    
 6   track               128 non-null    str    
 7   role                128 non-null    str    
 8   join_date           128 non-null    str    
 9   workshops_attended  128 non-null    int64  
 10  project_score       116 non-null    float64
 11  github_profile      113 non-null    str    
 12  status              128 non-null    str    
 13  email               128 non-null    str    
dtypes: float64(2), int64(1), str(11)
memory usage: 14.1 KB


---

### Exo 3: Missing Values Report (10 points)

1. Display the **count** of missing values per column
2. Display the **percentage** of missing values per column (rounded to 2 decimal places)
3. Print the names of columns that have **at least one** missing value


In [65]:
# Count of missing values per column
print("Count of missing values per column:")
df.isnull().sum()

Count of missing values per column:


member_id              0
full_name              0
age                    0
wilaya                 8
university             6
level                  0
track                  0
role                   0
join_date              0
workshops_attended     0
project_score         12
github_profile        15
status                 0
email                  0
dtype: int64

In [69]:
# Percentage of missing values per column
print("Percentage of missing values per column:")
( df.isnull().sum() / df.shape[0] ) * 100 

Percentage of missing values per column:


member_id              0.00000
full_name              0.00000
age                    0.00000
wilaya                 6.25000
university             4.68750
level                  0.00000
track                  0.00000
role                   0.00000
join_date              0.00000
workshops_attended     0.00000
project_score          9.37500
github_profile        11.71875
status                 0.00000
email                  0.00000
dtype: float64

In [71]:
# Column names with at least one missing value
print("Column names with at least one missing value:")
df.columns[df.isnull().sum() >= 1]


Column names with at least one missing value:


Index(['wilaya', 'university', 'project_score', 'github_profile'], dtype='str')

---

## Part 2: Data Cleaning

### Exo 4: Fix Data Types (15 points)

When you loaded the data, some columns were not stored in the correct type. Fix the following:

- `age` → should be **integer** (it's currently a string like `"22.0"`)
- `workshops_attended` → should be **integer** (it's currently a string)
- `join_date` → should be **datetime**

After converting, print the dtypes to confirm.

**Hint:** Convert `age` to float first, then to int.


In [73]:
# Convert age to integer
df['age'] = df['age'].astype(float)
df['age'] = df['age'].astype(int)

In [74]:
# Convert workshops_attended to integer
df['workshops_attended'] = df['workshops_attended'].astype(int)

In [75]:
# Convert join_date to datetime
df['join_date'] = pd.to_datetime(df['join_date'])

In [77]:
# Confirm dtypes
print("DataFrame columns types after conversion:")
df.dtypes

DataFrame columns types after conversion:


member_id                        str
full_name                        str
age                            int64
wilaya                           str
university                       str
level                            str
track                            str
role                             str
join_date             datetime64[us]
workshops_attended             int64
project_score                float64
github_profile                   str
status                           str
email                            str
dtype: object

---

### Exo 5: Handle Missing Values (10 points)

Fill missing values as follows:

- `project_score` → fill with the **median** of the column
- `wilaya` → fill with `"Unknown"`
- `university` → fill with `"Unknown"`
- `github_profile` → fill with `"Not Provided"`
- `email` → drop rows where email is missing (we can't contact them!)

After cleaning, confirm there are no more missing values.


In [80]:
# Fill missing project_score with median
df['project_score'].fillna(df['project_score'].median())

0      13.35
1      16.10
2      15.40
3      11.10
4       9.00
       ...  
123    19.80
124    17.00
125    13.35
126    13.35
127    18.00
Name: project_score, Length: 128, dtype: float64

In [79]:
# Fill missing wilaya and university with 'Unknown'
df.fillna({'wilaya': 'Unknown', 'university': 'Unknown'})

Unnamed: 0,member_id,full_name,age,wilaya,university,level,track,role,join_date,workshops_attended,project_score,github_profile,status,email
0,OMC-2024-465,Nadine Bensalem,25,Sétif,ESI,L1,Web Development,Core Team,2024-04-14,7,,github.com/nadine62,Active,nadine.bensalem@email.com
1,OMC-2024-622,Imane Saidi,26,BLIDA,UMBB,M2,AI & Data,Member,2024-07-04,4,16.1,github.com/imane85,Inactive,imane.saidi@email.com
2,OMC-2024-398,Sofiane Ziani,23,Béjaïa,Univ. Oran 1,M2,Cybersecurity,Member,2024-02-27,7,15.4,github.com/sofiane45,Active,sofiane.ziani@email.com
3,OMC-2024-766,Wafa Rahmani,23,Annaba,ENSIA,M2,Software Engineering,Core Team,2024-09-09,2,11.1,,Active,wafa.rahmani@email.com
4,OMC-2024-345,Nassim Rahmani,25,BATNA,ENSIA,M2,Web Development,Member,2024-07-19,0,9.0,github.com/nassim64,Active,nassim.rahmani@email.com
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
123,OMC-2024-458,Amina Hadj,24,TIZI OUZOU,UDBA,M2,Web Development,Member,2024-08-11,3,19.8,,Active,amina.hadj@email.com
124,OMC-2024-806,Nadine Mekki,23,Algiers,Univ. Oran 1,M1,AI & Data,Core Team,2024-07-12,4,17.0,github.com/nadine55,Active,nadine.mekki@email.com
125,OMC-2024-004,Ayoub Hadjadj,26,Blida,ENSIA,M1,Web Development,Core Team,2024-07-18,0,,,active,ayoub.hadjadj@email.com
126,OMC-2024-240,Wafa Khelif,19,Tizi Ouzou,Univ. Constantine,M1,Embedded Systems,Core Team,2024-03-16,0,,github.com/wafa85,Active,wafa.khelif@email.com


In [81]:
# Fill missing github_profile with 'Not Provided'
df.fillna({'github_profile': 'Not Provided'})

Unnamed: 0,member_id,full_name,age,wilaya,university,level,track,role,join_date,workshops_attended,project_score,github_profile,status,email
0,OMC-2024-465,Nadine Bensalem,25,Sétif,ESI,L1,Web Development,Core Team,2024-04-14,7,,github.com/nadine62,Active,nadine.bensalem@email.com
1,OMC-2024-622,Imane Saidi,26,BLIDA,UMBB,M2,AI & Data,Member,2024-07-04,4,16.1,github.com/imane85,Inactive,imane.saidi@email.com
2,OMC-2024-398,Sofiane Ziani,23,Béjaïa,Univ. Oran 1,M2,Cybersecurity,Member,2024-02-27,7,15.4,github.com/sofiane45,Active,sofiane.ziani@email.com
3,OMC-2024-766,Wafa Rahmani,23,Annaba,ENSIA,M2,Software Engineering,Core Team,2024-09-09,2,11.1,Not Provided,Active,wafa.rahmani@email.com
4,OMC-2024-345,Nassim Rahmani,25,BATNA,ENSIA,M2,Web Development,Member,2024-07-19,0,9.0,github.com/nassim64,Active,nassim.rahmani@email.com
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
123,OMC-2024-458,Amina Hadj,24,TIZI OUZOU,UDBA,M2,Web Development,Member,2024-08-11,3,19.8,Not Provided,Active,amina.hadj@email.com
124,OMC-2024-806,Nadine Mekki,23,Algiers,Univ. Oran 1,M1,AI & Data,Core Team,2024-07-12,4,17.0,github.com/nadine55,Active,nadine.mekki@email.com
125,OMC-2024-004,Ayoub Hadjadj,26,Blida,ENSIA,M1,Web Development,Core Team,2024-07-18,0,,Not Provided,active,ayoub.hadjadj@email.com
126,OMC-2024-240,Wafa Khelif,19,Tizi Ouzou,Univ. Constantine,M1,Embedded Systems,Core Team,2024-03-16,0,,github.com/wafa85,Active,wafa.khelif@email.com


In [83]:
# Drop rows where email is missing
df_clean = df.dropna(subset=['email'])

In [88]:
# Confirm no more missing values
df_clean.isnull().sum() 

member_id              0
full_name              0
age                    0
wilaya                 8
university             6
level                  0
track                  0
role                   0
join_date              0
workshops_attended     0
project_score         12
github_profile        15
status                 0
email                  0
dtype: int64

---

### Exo 6: Remove Duplicates (5 points)

1. Print how many **duplicate rows** exist in the dataset
2. Remove them, keeping the **first** occurrence
3. Print the shape of the DataFrame before and after to confirm


In [89]:
# Count duplicate rows
df.duplicated().sum() 

np.int64(8)

In [93]:
# Remove duplicates and confirm

print(f"DataFrame before cleaning → {df.shape}") # original dataframe 
df_clean = df.drop_duplicates()
print(f"DataFrame after cleaning → {df_clean.shape}") # dataframe after removing duplicates

DataFrame before cleaning → (128, 14)
DataFrame after cleaning → (120, 14)


---

### Exo 7: Standardize Text Columns (10 points)

The `wilaya` and `status` columns have **inconsistent casing** (some values are uppercase, some lowercase, some mixed).

1. Standardize `wilaya` to **title case** (e.g. `"ALGIERS"` → `"Algiers"`)
2. Standardize `status` to **title case** (e.g. `"active"` → `"Active"`)
3. Display the unique values of each column after cleaning to confirm


In [17]:
# Standardize wilaya to title case
df_original = df.copy() # save the original dataframe
df['wilaya'] = df['wilaya'].str.title() # modify the dataframe wilya column

In [18]:
# Standardize status to title case
df['status'] = df['status'].str.title() # modify the dataframe status column

In [19]:
# Confirm unique values
print("1- Values before standardization: ")
print(df_original[['wilaya', 'status']].head())

print("2- Values After standardization: ")

df['wilaya'] = df['wilaya'].str.title()
df['status'] = df['status'].str.title()

print(df[['wilaya', 'status']].head())

1- Values before standardization: 
   wilaya    status
0   Sétif    Active
1   BLIDA  Inactive
2  Béjaïa    Active
3  Annaba    Active
4   BATNA    Active
2- Values After standardization: 
   wilaya    status
0   Sétif    Active
1   Blida  Inactive
2  Béjaïa    Active
3  Annaba    Active
4   Batna    Active


---

## Part 3: Filtering & Selection

### Exo 8: Boolean Filtering (10 points)

Answer the following questions using boolean indexing:

1. How many members have a `project_score` **greater than 15**?
2. Show all **Core Team** or **Lead** members who are from **Algiers**
3. Show all **Active** members who attended **more than 4 workshops**


In [24]:
# 1. Members with project_score > 15
df[df['project_score'] > 15]

Unnamed: 0,member_id,full_name,age,wilaya,university,level,track,role,join_date,workshops_attended,project_score,github_profile,status,email
1,OMC-2024-622,Imane Saidi,26.0,Blida,UMBB,M2,AI & Data,Member,2024-07-04,4,16.1,github.com/imane85,Inactive,imane.saidi@email.com
2,OMC-2024-398,Sofiane Ziani,23.0,Béjaïa,Univ. Oran 1,M2,Cybersecurity,Member,2024-02-27,7,15.4,github.com/sofiane45,Active,sofiane.ziani@email.com
6,OMC-2024-463,Nadine Boudiaf,25.0,Tlemcen,USTHB,M2,Software Engineering,Member,2024-01-02,7,18.2,github.com/nadine64,Active,nadine.boudiaf@email.com
7,OMC-2024-317,Wafa Ouali,22.0,Sétif,ENP,M2,Cybersecurity,Member,2024-07-22,6,16.1,github.com/wafa32,Active,wafa.ouali@email.com
8,OMC-2024-274,Meriem Boudaoud,18.0,Sétif,USTHB,M1,Cybersecurity,Member,2024-08-12,4,17.8,github.com/meriem39,Active,double@@email.com
9,OMC-2024-017,Ilyes Larbi,19.0,Algiers,ENSIA,M1,Web Development,Member,2024-02-16,7,17.3,github.com/ilyes75,Active,ilyes.larbi@email.com
12,OMC-2024-622,Imane Saidi,26.0,Blida,UMBB,M2,AI & Data,Member,2024-07-04,4,16.1,github.com/imane85,Inactive,imane.saidi@email.com
13,OMC-2024-624,Meriem Mekki,18.0,Annaba,ENSIA,M1,Embedded Systems,Member,2024-09-08,5,18.1,github.com/meriem17,Active,meriem.mekki@email.com
20,OMC-2024-181,Islem Boukhalfa,26.0,Algiers,UMBB,L1,Embedded Systems,Member,2024-08-22,5,19.4,github.com/islem66,Active,islem.boukhalfa@email.com
21,OMC-2024-398,Sofiane Ziani,23.0,Béjaïa,Univ. Oran 1,M2,Cybersecurity,Member,2024-02-27,7,15.4,github.com/sofiane45,Active,sofiane.ziani@email.com


In [29]:
# 2. Core Team or Lead members from Algiers
df[df['wilaya'].isin(['Algiers']) & df['role'].isin(['Core Team', 'Lead'])]

Unnamed: 0,member_id,full_name,age,wilaya,university,level,track,role,join_date,workshops_attended,project_score,github_profile,status,email
36,OMC-2024-482,Youcef Mekki,20.0,Algiers,UDBA,L2,AI & Data,Lead,2024-01-25,0,19.4,github.com/youcef35,Active,youcef.mekki@email.com
77,OMC-2024-084,Oussama Boudiaf,25.0,Algiers,UDBA,L1,Software Engineering,Lead,2024-03-05,0,11.4,github.com/oussama70,Active,oussama.boudiaf@email.com
98,OMC-2024-784,Mehdi Guerfi,19.0,Algiers,ENSIA,M1,Web Development,Core Team,2024-08-09,2,,,Inactive,mehdi.guerfi@email.com
124,OMC-2024-806,Nadine Mekki,23.0,Algiers,Univ. Oran 1,M1,AI & Data,Core Team,2024-07-12,4,17.0,github.com/nadine55,Active,nadine.mekki@email.com


In [30]:
# 3. Active members who attended more than 4 workshops
df[df['workshops_attended'] > 4]

Unnamed: 0,member_id,full_name,age,wilaya,university,level,track,role,join_date,workshops_attended,project_score,github_profile,status,email
0,OMC-2024-465,Nadine Bensalem,25.0,Sétif,ESI,L1,Web Development,Core Team,2024-04-14,7,,github.com/nadine62,Active,nadine.bensalem@email.com
2,OMC-2024-398,Sofiane Ziani,23.0,Béjaïa,Univ. Oran 1,M2,Cybersecurity,Member,2024-02-27,7,15.4,github.com/sofiane45,Active,sofiane.ziani@email.com
6,OMC-2024-463,Nadine Boudiaf,25.0,Tlemcen,USTHB,M2,Software Engineering,Member,2024-01-02,7,18.2,github.com/nadine64,Active,nadine.boudiaf@email.com
7,OMC-2024-317,Wafa Ouali,22.0,Sétif,ENP,M2,Cybersecurity,Member,2024-07-22,6,16.1,github.com/wafa32,Active,wafa.ouali@email.com
9,OMC-2024-017,Ilyes Larbi,19.0,Algiers,ENSIA,M1,Web Development,Member,2024-02-16,7,17.3,github.com/ilyes75,Active,ilyes.larbi@email.com
13,OMC-2024-624,Meriem Mekki,18.0,Annaba,ENSIA,M1,Embedded Systems,Member,2024-09-08,5,18.1,github.com/meriem17,Active,meriem.mekki@email.com
14,OMC-2024-902,Feriel Larbi,25.0,Béjaïa,USTHB,M2,Cybersecurity,Member,2024-04-24,6,,github.com/feriel40,Inactive,no_at_sign.com
16,OMC-2024-992,Cyrine Bensalem,26.0,Blida,,L1,Web Development,Core Team,2024-04-05,7,,github.com/cyrine23,Active,cyrine.bensalem@email.com
17,OMC-2024-356,Amina Saidi,20.0,Tizi Ouzou,ESI,M2,Software Engineering,Member,2024-07-27,7,9.3,github.com/amina56,Inactive,amina.saidi@email.com
18,OMC-2024-356,Amina Saidi,20.0,Tizi Ouzou,ESI,M2,Software Engineering,Member,2024-07-27,7,9.3,github.com/amina56,Inactive,amina.saidi@email.com


---

### Exo 9: Selection with `.isin()` and `.loc[]` (10 points)

1. Use `.isin()` to filter members whose `track` is either `"AI & Data"` or `"Cybersecurity"`. How many are there?
2. Use `.loc[]` to display only the `full_name`, `track`, and `project_score` columns for **M1 and M2** level members.


In [42]:
# 1. Filter by track using .isin()


In [43]:
# 2. M1 and M2 members — selected columns using .loc[]


---

### Exo 10: String Filtering (10 points)

Use `.str` methods to answer:

1. Find all members whose `full_name` **starts with the letter 'A'**
2. Find all members whose `email` **contains `benali`** (case-insensitive)
3. How many members have a `github_profile` that is **not** `"Not Provided"`?

**Hint:** For question 3, you can combine `.str` filtering or just use a comparison.


In [44]:
# 1. Names starting with 'A'


In [45]:
# 2. Emails containing 'benali'


In [46]:
# 3. Members with a real GitHub profile


---

## Part 4: Exploratory Analysis

### Exo 11: Value Counts & Distributions (10 points)

1. Which **3 wilayas** have the most members? Display the count for each.
2. What is the distribution of members across **tracks**? Show as percentages (normalized).
3. How many members are **Active** vs **Inactive**?


In [47]:
# 1. Top 3 wilayas by member count


In [48]:
# 2. Track distribution as percentages


In [49]:
# 3. Active vs Inactive count


---

### Exo 12: Group-Level Statistics (10 points)

Use `.groupby()` to answer the following:

1. What is the **average `project_score`** per `track`? Sort from highest to lowest.
2. What is the **average number of workshops attended** per `level` (L1, L2, L3, M1, M2)?
3. Which `role` has the **highest median `project_score`**?


In [50]:
# 1. Average project_score per track


In [51]:
# 2. Average workshops attended per level


In [52]:
# 3. Role with highest median project_score


---

## Part 5: Bonus — Optional

### Bonus 1: Email Validator (3 points)

Remember the email validator you wrote in **Checkpoint 1.1**? Time to bring it back — but this time inside a Pandas DataFrame.

The `email` column contains some invalid entries that slipped through. Use the **same regex rules** from Checkpoint 1.1:

- Contains exactly one `@`
- Has characters before and after `@`
- Only contains letters, numbers, dots, and underscores before `@`
- Ends with `.com`, `.dz`, or `.edu`

Your tasks:
1. Write the function `is_valid_email(email)` using `re`
2. Apply it to the `email` column to create a new boolean column called `email_valid`
3. Print how many emails are **valid** and how many are **invalid**
4. Display the rows where the email is **invalid** — show only `full_name` and `email`

**Hint:** Use `.apply()` to apply your function across the column.


In [53]:
import re

def is_valid_email(email):
    # TODO: paste and adapt your regex from Checkpoint 1.1
    raise NotImplementedError


In [54]:
# Apply the function to create the email_valid column


In [55]:
# Count valid vs invalid


In [56]:
# Display invalid rows (full_name and email only)


---

### Bonus 2: Detect & Handle Outliers (4 points)

The `age` column has some clearly invalid values (e.g. someone is 150 years old).

1. Display the rows where `age` is **outside the range [17, 30]**
2. Replace those invalid ages with `NaN` using `.loc[]`
3. Fill the NaN ages with the **median** age
4. Confirm the min and max age are now within valid range


In [57]:
# 1. Display outlier rows


In [58]:
# 2. Replace invalid ages with NaN


In [59]:
# 3. Fill NaN ages with median


In [60]:
# 4. Confirm valid range


---

### Bonus 3: Top Performers (3 points)

Create a new DataFrame called `top_performers` that contains only members who satisfy **all** of the following:

- `status` is `"Active"`
- `project_score` is in the **top 10%** of all scores
- `workshops_attended` is **greater than or equal to 5**

Display their `full_name`, `track`, `project_score`, and `workshops_attended`, sorted by `project_score` descending.

**Hint:** Use `df['project_score'].quantile(0.9)` to get the 90th percentile.


In [61]:
# Build top_performers DataFrame


---

## Acknowledgments

- Notebook authored by: Open Minds Club - AI Leadership
- Dataset: Synthetic OMC member data generated by Claude for educational purposes
- Workshop content inspired by Pandas official documentation: https://pandas.pydata.org/docs/