**Q1: Write a Python function that accepts a list of numbers and returns a new list containing only the even numbers.**

A function is a reusable block of code that performs a specific task.

> The function should iterate through the given list and check if each number is even.

> Even numbers are divisible by 2 (i.e., num % 2 == 0).

In [1]:
def filter_even_numbers(numbers):
    """Returns a list of even numbers from the input list."""
    return [num for num in numbers if num % 2 == 0]

# Example usage
numbers = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
even_numbers = filter_even_numbers(numbers)
print(even_numbers)


[2, 4, 6, 8, 10]
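
The iterate-and-check description above can also be written as an explicit loop. The following is a minimal sketch that should behave the same as the list comprehension; the name `filter_even_numbers_loop` is an illustrative choice, not part of the original answer.

```python
def filter_even_numbers_loop(numbers):
    """Loop-based equivalent of the list-comprehension version."""
    evens = []
    for num in numbers:
        # A number is even when dividing by 2 leaves no remainder
        if num % 2 == 0:
            evens.append(num)
    return evens


print(filter_even_numbers_loop([1, 2, 3, 4, 5, 6]))  # [2, 4, 6]
print(filter_even_numbers_loop([-4, -3, 0, 7]))      # [-4, 0]
print(filter_even_numbers_loop([]))                  # []
```

Zero and negative even numbers pass the same test, and an empty input simply yields an empty list.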


**Q2: Given a DataFrame named df with columns id, age, and score, write a Pandas script to filter the DataFrame to only include rows where age is greater than 25 and score is above 80.**

> Filtering in Pandas is done using Boolean conditions.

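> We combine conditions with the element-wise operators & (AND) and | (OR). Each condition needs its own parentheses because & and | bind more tightly than comparisons such as >.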

In [2]:
import pandas as pd

# Sample DataFrame
data = {'id': [1, 2, 3, 4, 5],
        'age': [22, 30, 40, 24, 35],
        'score': [85, 75, 90, 95, 78]}

df = pd.DataFrame(data)

# Filtering condition: age > 25 and score > 80
filtered_df = df[(df['age'] > 25) & (df['score'] > 80)]

print(filtered_df)


   id  age  score
2   3   40     90
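
The Q2 answer only uses `&`. As a rough sketch of the other operators, the snippet below reuses the same sample data with `|` (OR) and `~` (NOT), and shows the original AND filter written with `DataFrame.query`. The thresholds in the OR example are chosen only for illustration.

```python
import pandas as pd

df = pd.DataFrame({'id': [1, 2, 3, 4, 5],
                   'age': [22, 30, 40, 24, 35],
                   'score': [85, 75, 90, 95, 78]})

# OR: keep rows where at least one condition holds (illustrative thresholds)
either = df[(df['age'] > 35) | (df['score'] > 90)]

# NOT: ~ inverts a Boolean mask, here the original AND filter
not_both = df[~((df['age'] > 25) & (df['score'] > 80))]

# The original AND filter expressed as a query string
via_query = df.query('age > 25 and score > 80')

print(either['id'].tolist())     # [3, 4]
print(not_both['id'].tolist())   # [1, 2, 4, 5]
print(via_query['id'].tolist())  # [3]
```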


**Q3: Explain how you would handle missing values in a DataFrame.**

> Missing values occur when some data points are unavailable.

> In Pandas, missing values are usually represented as NaN (Not a Number); None and NaT (for datetimes) are also treated as missing.

> Common methods to handle missing values:

1. Removing Missing Values: df.dropna() removes every row that contains at least one NaN.

2. Filling Missing Values:

> df.fillna(value) replaces NaN with a specific value.

> df.fillna(df.mean()) replaces NaN in each numeric column with that column's mean.

> Forward Fill (ffill) and Backward Fill (bfill):

> df.ffill() propagates the last valid value forward.

> df.bfill() propagates the next valid value backward.

> The older spelling df.fillna(method='ffill') is deprecated in recent Pandas releases, so df.ffill() and df.bfill() are preferred.

In [3]:
import pandas as pd
import numpy as np

# Sample DataFrame with missing values
data = {'id': [1, 2, 3, 4, 5],
        'age': [25, np.nan, 30, np.nan, 40],
        'score': [85, 90, np.nan, 75, 95]}

df = pd.DataFrame(data)

# Handling missing values
df_filled = df.fillna(df.mean())  # Replace NaN with column mean
print(df_filled)


   id        age  score
0   1  25.000000  85.00
1   2  31.666667  90.00
2   3  30.000000  86.25
3   4  31.666667  75.00
4   5  40.000000  95.00
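
The cell above demonstrates only the mean fill. As a small sketch using the same sample data, the other strategies listed (dropping rows, forward fill, backward fill) could look like this, along with a quick count of missing values per column:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'id': [1, 2, 3, 4, 5],
                   'age': [25, np.nan, 30, np.nan, 40],
                   'score': [85, 90, np.nan, 75, 95]})

# Count missing values per column before deciding on a strategy
missing_counts = df.isna().sum()

dropped = df.dropna()   # keeps only rows without any NaN
forward = df.ffill()    # each NaN takes the previous valid value in its column
backward = df.bfill()   # each NaN takes the next valid value in its column

print(missing_counts.to_dict())   # {'id': 0, 'age': 2, 'score': 1}
print(dropped['id'].tolist())     # [1, 5]
print(forward['age'].tolist())    # [25.0, 25.0, 30.0, 30.0, 40.0]
print(backward['age'].tolist())   # [25.0, 30.0, 30.0, 40.0, 40.0]
```

Which strategy fits depends on the data: dropping rows loses information, while forward and backward fill mainly make sense when the rows are ordered, for example in a time series.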


**Q4: Write a Pandas command to group a DataFrame df containing sales data with columns date, region, and sales, to show total sales by region.**

> Grouping in Pandas is done using groupby().

> We can sum up sales for each region using .sum().

In [4]:
import pandas as pd

# Sample sales data
data = {'date': ['2025-03-01', '2025-03-02', '2025-03-03', '2025-03-04'],
        'region': ['North', 'South', 'North', 'South'],
        'sales': [200, 150, 300, 100]}

df = pd.DataFrame(data)

# Grouping by region and summing up sales
sales_by_region = df.groupby('region')['sales'].sum().reset_index()

print(sales_by_region)


  region  sales
0  North    500
1  South    250
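
If more than the total is needed, the same groupby() can return several statistics at once. The sketch below uses named aggregation on the same sample data; the output column names `total`, `average`, and `orders` are illustrative choices.

```python
import pandas as pd

df = pd.DataFrame({'date': ['2025-03-01', '2025-03-02', '2025-03-03', '2025-03-04'],
                   'region': ['North', 'South', 'North', 'South'],
                   'sales': [200, 150, 300, 100]})

# One row per region with total, mean, and number of sales records
summary = (df.groupby('region')['sales']
             .agg(total='sum', average='mean', orders='count')
             .reset_index())

print(summary)
```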


**Q5: Using a DataFrame df with columns customer_id, purchase_amount, and date_of_purchase, write a Pandas script to add a new column showing the cumulative purchase amount for each customer.**

> The cumulative sum of a column is calculated using .cumsum().

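> To calculate it per customer, we group by customer_id with groupby() first. The running total follows the current row order, so the rows should be sorted by date_of_purchase if they are not already chronological.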

In [5]:
import pandas as pd

# Sample DataFrame
data = {'customer_id': [101, 101, 102, 102, 101, 103],
        'purchase_amount': [100, 200, 150, 300, 50, 400],
        'date_of_purchase': ['2025-01-01', '2025-01-02', '2025-01-03',
                             '2025-01-04', '2025-01-05', '2025-01-06']}

df = pd.DataFrame(data)

# Cumulative sum per customer
df['cumulative_purchase'] = df.groupby('customer_id')['purchase_amount'].cumsum()

print(df)


   customer_id  purchase_amount date_of_purchase  cumulative_purchase
0          101              100       2025-01-01                  100
1          101              200       2025-01-02                  300
2          102              150       2025-01-03                  150
3          102              300       2025-01-04                  450
4          101               50       2025-01-05                  350
5          103              400       2025-01-06                  400
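
The sample data above is already in date order. When that is not guaranteed, a reasonable approach is to sort before accumulating. The sketch below uses hypothetical out-of-order rows, not the sample from the answer, and assumes the dates are stored as strings that pd.to_datetime can parse.

```python
import pandas as pd

# Hypothetical purchases listed out of chronological order
df = pd.DataFrame({'customer_id': [101, 102, 101, 102],
                   'purchase_amount': [200, 300, 100, 150],
                   'date_of_purchase': ['2025-01-02', '2025-01-04',
                                        '2025-01-01', '2025-01-03']})

# Parse the dates, then sort so each customer's rows are chronological
df['date_of_purchase'] = pd.to_datetime(df['date_of_purchase'])
df = df.sort_values(['customer_id', 'date_of_purchase']).reset_index(drop=True)

# Running total per customer, now in date order
df['cumulative_purchase'] = df.groupby('customer_id')['purchase_amount'].cumsum()

print(df['cumulative_purchase'].tolist())  # [100, 300, 150, 450]
```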


**Q6: Describe a scenario where merging two DataFrames in Pandas would be useful. Provide an example command that merges two DataFrames on a common column called customer_id.**

> Merging is used to combine data from multiple sources.

> Example: If one DataFrame holds customer details and another holds purchase records, merging them gives each customer's complete transaction history.

> Pandas provides the merge() function for this.

In [6]:
import pandas as pd

# Customer details DataFrame
customers = pd.DataFrame({'customer_id': [101, 102, 103],
                          'name': ['Alice', 'Bob', 'Charlie'],
                          'age': [25, 30, 35]})

# Purchases DataFrame
purchases = pd.DataFrame({'customer_id': [101, 102, 101, 103],
                          'purchase_amount': [200, 150, 50, 400]})

# Merging DataFrames on 'customer_id'
merged_df = pd.merge(customers, purchases, on='customer_id', how='inner')

print(merged_df)


   customer_id     name  age  purchase_amount
0          101    Alice   25              200
1          101    Alice   25               50
2          102      Bob   30              150
3          103  Charlie   35              400
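
An inner merge keeps only customers that appear in both DataFrames. As a sketch of an alternative, a left merge keeps every customer, and `indicator=True` adds a `_merge` column showing where each row came from. The customer 104 ("Dana") is a hypothetical addition to show a customer with no purchases.

```python
import pandas as pd

# Customer 104 ("Dana") is hypothetical and has no purchase records
customers = pd.DataFrame({'customer_id': [101, 102, 103, 104],
                          'name': ['Alice', 'Bob', 'Charlie', 'Dana']})

purchases = pd.DataFrame({'customer_id': [101, 102, 101, 103],
                          'purchase_amount': [200, 150, 50, 400]})

# Left merge: every customer is kept; unmatched rows get NaN purchase_amount
left_merged = pd.merge(customers, purchases, on='customer_id',
                       how='left', indicator=True)

# Rows marked 'left_only' are customers without any purchase
no_purchases = left_merged[left_merged['_merge'] == 'left_only']

print(len(left_merged))                # 5
print(no_purchases['name'].tolist())   # ['Dana']
```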
