Problem 1: Setup & First DataFrame

**Tasks:**
1. Import pandas with alias pd
2. Import numpy with alias np  
3. Create DataFrame with student data:
   - name: ['Alice', 'Bob', 'Charlie', 'Diana']
   - age: [20, 21, 19, 22]
   - grade: ['A', 'B', 'A', 'C']
   - subject: ['Math', 'Physics', 'Math', 'Chemistry']
4. Display the DataFrame
5. Show the shape

In [2]:
import pandas as pd
import numpy as np

# Build the dataset: the four students from the task, extended to 14 rows
dict_data = {
    'name': ['Alice', 'Bob', 'Charlie', 'Diana', 'Emma', 'Frank', 'Grace', 'Henry', 'Ivy', 'Jack', 'Kate', 'Liam', 'Maya', 'Noah'],
    'age': [20, 21, 19, 22, 23, 20, 21, 19, 22, 20, 18, 21, 19, 23],
    'grade': ['A', 'B', 'A', 'C', 'B', 'A', 'C', 'B', 'A', 'C', 'B', 'A', 'C', 'B'],
    'subject': ['Math', 'Physics', 'Math', 'Chemistry', 'Biology', 'Physics', 'Math', 'Chemistry', 'Biology', 'Physics', 'Math', 'Chemistry', 'Biology', 'Physics']
}
df = pd.DataFrame(dict_data)
df.sample(n=5)  # Random sample of 5 rows (not rendered: only a cell's last expression is shown)

df.shape  # (rows, columns); also not rendered mid-cell
df.info(verbose=True)  # Print column dtypes and non-null counts
df.describe()  # Summary statistics for the numeric columns

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 14 entries, 0 to 13
Data columns (total 4 columns):
 #   Column   Non-Null Count  Dtype 
---  ------   --------------  ----- 
 0   name     14 non-null     object
 1   age      14 non-null     int64 
 2   grade    14 non-null     object
 3   subject  14 non-null     object
dtypes: int64(1), object(3)
memory usage: 580.0+ bytes


             age
count       14.0
mean   20.571429
std     1.554858
min         18.0
25%        19.25
50%         20.5
75%        21.75
max         23.0
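A notebook renders only the value of the last expression in a cell, which is why the output above contains the `info()` printout and the `describe()` table but neither the sampled rows nor the shape. The sketch below uses the four students from the task list and explicit `print` calls so that every step produces output; the variable name `students` is an assumption made for illustration.

```python
import pandas as pd

# The four students listed in the task description
students = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie', 'Diana'],
    'age': [20, 21, 19, 22],
    'grade': ['A', 'B', 'A', 'C'],
    'subject': ['Math', 'Physics', 'Math', 'Chemistry'],
})

print(students)        # explicit print, so the frame appears even mid-cell
print(students.shape)  # (rows, columns)
```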


Problem 2: Data Filtering and Selection

**Tasks:**
1. Filter students who are 20 years old or younger
2. Select only students with grade 'A'
3. Get all Math students
4. Find students who are either in Physics OR have grade 'B'
5. Select students with age between 19 and 21 (inclusive)
6. Get the names of students with grade 'A' in Math
7. Count how many students are in each subject
8. Display only the 'name' and 'grade' columns for students older than 20

In [20]:

# Filter students with age greater than 20
filtered_students = df[df['age'] > 20]
display(filtered_students)
# Display only the name and age of students older than 20
filtered_with_name = df[df['age'] > 20][['name', 'age']]
display(filtered_with_name)
# math students
math_students = df[df['subject'] == 'Math']
display(math_students)




     name  age  grade    subject
1     Bob   21      B    Physics
3   Diana   22      C  Chemistry
4    Emma   23      B    Biology
6   Grace   21      C       Math
8     Ivy   22      A    Biology
11   Liam   21      A  Chemistry
13   Noah   23      B    Physics


     name  age
1     Bob   21
3   Diana   22
4    Emma   23
6   Grace   21
8     Ivy   22
11   Liam   21
13   Noah   23


       name  age  grade  subject
0     Alice   20      A     Math
2   Charlie   19      A     Math
6     Grace   21      C     Math
10     Kate   18      B     Math
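Tasks 1 and 2 ask for students aged 20 or younger and for students with grade 'A', while the cell above filters on age greater than 20. A minimal sketch of the two filters as stated in the task list, rebuilding the same 14-row frame so the block runs on its own (the names `young` and `grade_a` are illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie', 'Diana', 'Emma', 'Frank', 'Grace',
             'Henry', 'Ivy', 'Jack', 'Kate', 'Liam', 'Maya', 'Noah'],
    'age': [20, 21, 19, 22, 23, 20, 21, 19, 22, 20, 18, 21, 19, 23],
    'grade': ['A', 'B', 'A', 'C', 'B', 'A', 'C', 'B', 'A', 'C', 'B', 'A', 'C', 'B'],
    'subject': ['Math', 'Physics', 'Math', 'Chemistry', 'Biology', 'Physics', 'Math',
                'Chemistry', 'Biology', 'Physics', 'Math', 'Chemistry', 'Biology', 'Physics'],
})

# Task 1: students who are 20 years old or younger
young = df[df['age'] <= 20]
print(young[['name', 'age']])

# Task 2: students with grade 'A'
grade_a = df[df['grade'] == 'A']
print(grade_a[['name', 'grade']])
```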


In [29]:
# Students in Physics or with grade B

physics_or_grade_b = df[(df['subject'] == 'Physics') | (df['grade'] == 'B')]
physics_or_grade_b


     name  age  grade    subject
1     Bob   21      B    Physics
4    Emma   23      B    Biology
5   Frank   20      A    Physics
7   Henry   19      B  Chemistry
9    Jack   20      C    Physics
10   Kate   18      B       Math
13   Noah   23      B    Physics
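The same OR condition can also be written with `DataFrame.query`, which takes the condition as a string and avoids the parentheses that `|` requires around each comparison. This is an alternative sketch, not the approach used in the cell above; the frame is rebuilt so the block runs on its own.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie', 'Diana', 'Emma', 'Frank', 'Grace',
             'Henry', 'Ivy', 'Jack', 'Kate', 'Liam', 'Maya', 'Noah'],
    'age': [20, 21, 19, 22, 23, 20, 21, 19, 22, 20, 18, 21, 19, 23],
    'grade': ['A', 'B', 'A', 'C', 'B', 'A', 'C', 'B', 'A', 'C', 'B', 'A', 'C', 'B'],
    'subject': ['Math', 'Physics', 'Math', 'Chemistry', 'Biology', 'Physics', 'Math',
                'Chemistry', 'Biology', 'Physics', 'Math', 'Chemistry', 'Biology', 'Physics'],
})

# Boolean-mask version, as in the cell above
mask_result = df[(df['subject'] == 'Physics') | (df['grade'] == 'B')]

# query() version of the same condition
query_result = df.query("subject == 'Physics' or grade == 'B'")

print(query_result)
```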


In [36]:
#select students with age greater than 20 and grade B
filtered_age_grade = df[(df['age'] > 20) & (df['grade'] == 'B')]
display(filtered_age_grade)

#students with age between 19 and 21 inclusive
students_19_21 = df[df['age'].between(19, 21)]
display(students_19_21)

# names of students with grade A
with_grade_a = df[df['grade'] == 'A']['name']
display(with_grade_a)

    name  age  grade  subject
1    Bob   21      B  Physics
4   Emma   23      B  Biology
13  Noah   23      B  Physics


       name  age  grade    subject
0     Alice   20      A       Math
1       Bob   21      B    Physics
2   Charlie   19      A       Math
5     Frank   20      A    Physics
6     Grace   21      C       Math
7     Henry   19      B  Chemistry
9      Jack   20      C    Physics
11     Liam   21      A  Chemistry
12     Maya   19      C    Biology


0       Alice
2     Charlie
5       Frank
8         Ivy
11       Liam
Name: name, dtype: object
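Task 6 asks for the names of students with grade 'A' in Math specifically, which adds a subject condition to the grade filter shown above. A minimal sketch using `.loc` with a combined mask and a column label (the name `a_in_math` is illustrative; the frame is rebuilt so the block runs on its own):

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie', 'Diana', 'Emma', 'Frank', 'Grace',
             'Henry', 'Ivy', 'Jack', 'Kate', 'Liam', 'Maya', 'Noah'],
    'age': [20, 21, 19, 22, 23, 20, 21, 19, 22, 20, 18, 21, 19, 23],
    'grade': ['A', 'B', 'A', 'C', 'B', 'A', 'C', 'B', 'A', 'C', 'B', 'A', 'C', 'B'],
    'subject': ['Math', 'Physics', 'Math', 'Chemistry', 'Biology', 'Physics', 'Math',
                'Chemistry', 'Biology', 'Physics', 'Math', 'Chemistry', 'Biology', 'Physics'],
})

# Both conditions must hold, so combine them with & (each side in parentheses)
a_in_math = df.loc[(df['grade'] == 'A') & (df['subject'] == 'Math'), 'name']
print(a_in_math)
```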

In [47]:
# Name and age for students older than 20 with grade B
older_with_b = df[(df['age'] > 20) & (df['grade'] == 'B')][['name','age']]
display(older_with_b)

    name  age
1    Bob   21
4   Emma   23
13  Noah   23
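Task 8 asks for the 'name' and 'grade' columns of every student older than 20, without the extra grade condition used in the cell above. A minimal sketch of that selection (the name `older_name_grade` is illustrative; the frame is rebuilt so the block runs on its own):

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie', 'Diana', 'Emma', 'Frank', 'Grace',
             'Henry', 'Ivy', 'Jack', 'Kate', 'Liam', 'Maya', 'Noah'],
    'age': [20, 21, 19, 22, 23, 20, 21, 19, 22, 20, 18, 21, 19, 23],
    'grade': ['A', 'B', 'A', 'C', 'B', 'A', 'C', 'B', 'A', 'C', 'B', 'A', 'C', 'B'],
    'subject': ['Math', 'Physics', 'Math', 'Chemistry', 'Biology', 'Physics', 'Math',
                'Chemistry', 'Biology', 'Physics', 'Math', 'Chemistry', 'Biology', 'Physics'],
})

# Row filter and column selection in a single .loc call
older_name_grade = df.loc[df['age'] > 20, ['name', 'grade']]
print(older_name_grade)
```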


In [49]:
# count number of students in each subject
subject_counts = df['subject'].value_counts()
display(subject_counts)

subject
Math         4
Physics      4
Chemistry    3
Biology      3
Name: count, dtype: int64
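An equivalent count can be obtained with `groupby(...).size()`. Unlike `value_counts()`, which orders by descending count, this returns the subjects sorted alphabetically. This is an alternative sketch rather than a replacement for the cell above (the name `per_subject` is illustrative; the frame is rebuilt so the block runs on its own).

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie', 'Diana', 'Emma', 'Frank', 'Grace',
             'Henry', 'Ivy', 'Jack', 'Kate', 'Liam', 'Maya', 'Noah'],
    'age': [20, 21, 19, 22, 23, 20, 21, 19, 22, 20, 18, 21, 19, 23],
    'grade': ['A', 'B', 'A', 'C', 'B', 'A', 'C', 'B', 'A', 'C', 'B', 'A', 'C', 'B'],
    'subject': ['Math', 'Physics', 'Math', 'Chemistry', 'Biology', 'Physics', 'Math',
                'Chemistry', 'Biology', 'Physics', 'Math', 'Chemistry', 'Biology', 'Physics'],
})

# Number of rows in each subject group, indexed by subject name
per_subject = df.groupby('subject').size()
print(per_subject)
```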

Problem 3: Data Manipulation and New Columns

**Tasks:**
1. Add a new column 'age_group' with values:
   - 'Young' for age <= 20
   - 'Adult' for age > 20
2. Create a 'grade_points' column based on grades:
   - 'A' = 4.0, 'B' = 3.0, 'C' = 2.0
3. Add a 'full_info' column combining name and subject (e.g., "Alice - Math")
4. Create a boolean column 'is_stem' for Math, Physics, Chemistry subjects
5. Calculate the age in months (assume current age is in years)
6. Add a 'performance' column:
   - 'Excellent' for grade 'A'
   - 'Good' for grade 'B'  
   - 'Average' for grade 'C'
7. Create a 'senior' column (True if age >= 21, False otherwise)
8. Display the updated DataFrame with all new columns

In [None]:
# Problem 3: Data Manipulation and New Columns

# Make a copy of the DataFrame to avoid modifying the original
df_new = df.copy()

# 1. Add 'age_group' column
df_new['age_group'] = np.where(df_new['age'] > 20, 'Adult', 'Young')

# 2. Create 'grade_points' column
df_new['grade_points'] = np.nan
df_new.loc[df_new['grade'] == 'A', 'grade_points'] = 4.0
df_new.loc[df_new['grade'] == 'B', 'grade_points'] = 3.0
df_new.loc[df_new['grade'] == 'C', 'grade_points'] = 2.0  

def grade_points(grade):
    if grade == 'A':
        return 4.0
    elif grade == 'B':
        return 3.0
    elif grade == 'C':
        return 2.0
    else:
        return 0.0
    
# Apply the function to create 'grade_points_apply' column    
df_new['grade_points_apply'] = df_new['grade'].apply(grade_points)

# Alternatively, using map to create 'grade_points_map' column
df_new['grade_points_map'] = df_new['grade'].map({'A': 4.0, 'B': 3.0, 'C': 2.0})

# 3. Add 'full_info' column combining name and subject
df_new['full_info'] = df_new['name'] + ' - ' + df_new['subject']

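# 4. Create boolean 'is_stem' column
# Note: the task lists only Math, Physics and Chemistry; Biology is also counted as STEM here.
df_new['is_stem'] = df_new['subject'].isin(['Math', 'Physics', 'Chemistry', 'Biology'])

# Alternatives that overwrite the column with the same values for this dataset:
df_new['is_stem'] = df_new['subject'].apply(lambda x: x in ['Math', 'Physics', 'Chemistry', 'Biology'])

# str.contains does a regex (substring) match, so it is looser than isin
df_new['is_stem'] = df_new['subject'].str.contains('Math|Physics|Chemistry|Biology')

# map() yields NaN for any subject that is missing from the dict
df_new['is_stem'] = df_new['subject'].map({"Math": True, "Physics": True, "Chemistry": True, "Biology": True, "Other": False})

# Drop the redundant alternative grade-point columns
df_new.drop(columns=['grade_points_apply','grade_points_map'], inplace=True)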

# 5. Calculate age in months
df_new['age_in_months'] = df_new['age'] * 12

# 6. Add 'performance' column
df_new['Performance'] = np.where(
    df_new['grade'] == 'A', 'Excellent',
    np.where(df_new['grade'] == 'B', 'Good', 'Average')
)

# Equivalent alternative using map (overwrites the column with the same values)
df_new['Performance'] = df_new['grade'].map({'A':'Excellent','B':'Good','C':'Average'})

# 7. Create 'senior' column (True if age >= 21, per the task)
df_new['senior'] = df_new['age'] >= 21

# 8. Display the updated DataFrame
display(df_new.head())

      name  age  grade    subject  age_group  grade_points          full_info  is_stem  age_in_months  Performance
0    Alice   20      A       Math      Young           4.0       Alice - Math     True            240    Excellent
1      Bob   21      B    Physics      Adult           3.0      Bob - Physics     True            252         Good
2  Charlie   19      A       Math      Young           4.0     Charlie - Math     True            228    Excellent
3    Diana   22      C  Chemistry      Adult           2.0  Diana - Chemistry     True            264      Average
4     Emma   23      B    Biology      Adult           3.0     Emma - Biology     True            276         Good
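The output above marks Biology as STEM and does not include a 'senior' column, whereas the task list names only Math, Physics and Chemistry for `is_stem` and defines `senior` as age 21 or older. The sketch below follows the task wording literally; the list name `stem_subjects` is an assumption made for illustration, and the frame is rebuilt so the block runs on its own.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie', 'Diana', 'Emma', 'Frank', 'Grace',
             'Henry', 'Ivy', 'Jack', 'Kate', 'Liam', 'Maya', 'Noah'],
    'age': [20, 21, 19, 22, 23, 20, 21, 19, 22, 20, 18, 21, 19, 23],
    'grade': ['A', 'B', 'A', 'C', 'B', 'A', 'C', 'B', 'A', 'C', 'B', 'A', 'C', 'B'],
    'subject': ['Math', 'Physics', 'Math', 'Chemistry', 'Biology', 'Physics', 'Math',
                'Chemistry', 'Biology', 'Physics', 'Math', 'Chemistry', 'Biology', 'Physics'],
})
df_new = df.copy()  # leave the original frame untouched

# Task 4: only the three subjects named in the task count as STEM here
stem_subjects = ['Math', 'Physics', 'Chemistry']
df_new['is_stem'] = df_new['subject'].isin(stem_subjects)

# Task 7: inclusive threshold, so a 21-year-old is a senior
df_new['senior'] = df_new['age'] >= 21

# Task 2: a dict lookup; any grade missing from the dict would become NaN
df_new['grade_points'] = df_new['grade'].map({'A': 4.0, 'B': 3.0, 'C': 2.0})

print(df_new[['name', 'subject', 'is_stem', 'age', 'senior', 'grade_points']].head())
```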
