Scenario 1: Unexpected Results in Data Analysis
You are analyzing customer transaction data for a bank and want to calculate the average transaction amount for each customer. Below is the Python code, but it’s producing incorrect results. Identify and fix the issue.

In [None]:
import pandas as pd

# Sample data
data = {
    'CustomerID': [1, 2, 1, 3, 2, 3],
    'TransactionAmount': [100, 200, 150, 300, 250, 400]
}

df = pd.DataFrame(data)

# Calculate average transaction amount for each customer
average_transaction = df.groupby('CustomerID')['TransactionAmount'].sum()
print(average_transaction)

Problem framing: the user wants the average transaction amount per customer, so they group on CustomerID and apply an aggregation to each group. However, they used .sum() rather than .mean(), so the output is each customer's total transaction amount rather than the average. The fix is to replace .sum() with .mean(). I found this by examining the output and then the grouping logic.

In [None]:
# Corrected code
import pandas as pd

# Sample data
data = {
    'CustomerID': [1, 2, 1, 3, 2, 3],
    'TransactionAmount': [100, 200, 150, 300, 250, 400]
}

df = pd.DataFrame(data)

# Calculate average transaction amount for each customer
average_transaction = df.groupby('CustomerID')['TransactionAmount'].mean()
print(average_transaction)
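
To make the difference between the two aggregations visible, both can be computed side by side. This is a minimal sketch on the same sample data; the output column names total and average are labels chosen here for illustration, not part of the exercise.

```python
import pandas as pd

df = pd.DataFrame({
    'CustomerID': [1, 2, 1, 3, 2, 3],
    'TransactionAmount': [100, 200, 150, 300, 250, 400],
})

# Named aggregation: one row per customer, one column per statistic
per_customer = df.groupby('CustomerID')['TransactionAmount'].agg(
    total='sum',
    average='mean',
)
print(per_customer)
```

Seeing 250 next to 125.0 for customer 1 makes it clear which function produced the original output.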

Scenario 2: Handling Missing Values
You're working with a dataset containing customer demographic information. You notice that some entries in the "Age" column are missing (NaN). You want to fill these missing values with the average age for all customers. Here's the code you've written, but it doesn't seem to work correctly.

In [7]:
import pandas as pd
import numpy as np

# Sample data
data = {
    'CustomerID': [1, 2, 3, 4, 5],
    'Age': [25, np.nan, 30, np.nan, 40]
}

df = pd.DataFrame(data)

# Fill missing values with the average age
df['Age'] = df['Age'].fillna(df.mean())
print(df)


   CustomerID        Age
0           1  25.000000
1           2  31.666667
2           3  30.000000
3           4  31.666667
4           5  40.000000


Problem framing: I am working with customer demographic information and want to impute missing values in the Age column with the average age of all customers. When the code runs, the missing values in Age are not imputed. Examining the imputation line, I notice that fillna is given df.mean(), the per-column means of the entire DataFrame, rather than the mean of the Age column. df.mean() returns a Series indexed by column name, and fillna aligns on index labels, so no row finds a matching value. The fix is fillna(df['Age'].mean()).

In [None]:
# Corrected code
import pandas as pd
import numpy as np

# Sample data
data = {
    'CustomerID': [1, 2, 3, 4, 5],
    'Age': [25, np.nan, 30, np.nan, 40]
}

df = pd.DataFrame(data)

# Fill missing values with the average age
df['Age'] = df['Age'].fillna(df['Age'].mean())
print(df)
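
The index alignment behind this bug can be checked directly. The sketch below reuses the sample data; the variable names column_means, still_missing, and filled are introduced here only for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'CustomerID': [1, 2, 3, 4, 5],
    'Age': [25, np.nan, 30, np.nan, 40],
})

# df.mean() is a Series labelled by column name, not by row
column_means = df.mean()
print(column_means.index.tolist())   # ['CustomerID', 'Age']

# fillna aligns on index labels; rows 0..4 find no match, so nothing is filled
still_missing = df['Age'].fillna(column_means)
print(still_missing.isna().sum())    # 2

# A scalar is applied to every missing row
filled = df['Age'].fillna(df['Age'].mean())
print(filled.isna().sum())           # 0
```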



Scenario 3: Feature Creation with a Logical Error
You are working on a dataset with customer transaction data. You want to create a new column, HighSpender, which will indicate whether a customer’s total transaction amount exceeds a threshold of $500. If the total is greater than $500, the value should be 1 (high spender); otherwise, it should be 0.

Here’s the code you wrote, but the output is not as expected:

In [12]:
import pandas as pd

# Sample data
data = {
    'CustomerID': [1, 2, 3, 4],
    'TransactionAmount': [450, 700, 200, 550]
}

df = pd.DataFrame(data)

# Create HighSpender column
df['HighSpender'] = df['TransactionAmount'] > 500
print(df)


   CustomerID  TransactionAmount  HighSpender
0           1                450        False
1           2                700         True
2           3                200        False
3           4                550         True


Printing the results, I see that instead of binary 0/1 values the column contains True/False. This is expected because the column was created with a comparison, which returns booleans. An if/else operation could be used instead: if TransactionAmount > 500 then 1, else 0. A simpler method, however, is to convert the boolean values to integers.

In [13]:
import pandas as pd

# Sample data
data = {
    'CustomerID': [1, 2, 3, 4],
    'TransactionAmount': [450, 700, 200, 550]
}

df = pd.DataFrame(data)

# Create HighSpender column
df['HighSpender'] = (df['TransactionAmount'] > 500).astype(int)
print(df)


   CustomerID  TransactionAmount  HighSpender
0           1                450            0
1           2                700            1
2           3                200            0
3           4                550            1


In [14]:
import pandas as pd

# Sample data
data = {
    'CustomerID': [1, 2, 3, 4],
    'TransactionAmount': [450, 700, 200, 550]
}

df = pd.DataFrame(data)

# Create HighSpender column using lambda to work across all values in TransactionAmount
df['HighSpender'] = df['TransactionAmount'].apply(lambda x:1 if x > 500 else 0)
print(df)


   CustomerID  TransactionAmount  HighSpender
0           1                450            0
1           2                700            1
2           3                200            0
3           4                550            1
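
A third option, sketched here as an alternative rather than as the intended answer, is numpy's where function, which expresses the if/else rule without a Python-level lambda. The constant name THRESHOLD is an assumption introduced for readability.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'CustomerID': [1, 2, 3, 4],
    'TransactionAmount': [450, 700, 200, 550],
})

THRESHOLD = 500  # assumed name for the $500 cutoff

# np.where(condition, value_if_true, value_if_false), evaluated element-wise
df['HighSpender'] = np.where(df['TransactionAmount'] > THRESHOLD, 1, 0)
print(df)
```

All three approaches give the same column on this data; astype(int) and np.where operate on the whole column at once, while apply calls the lambda once per row.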


Scenario 4: Categorize Age Groups
You are working with a dataset containing customer ages. You want to create a new column called AgeGroup that categorizes customers based on their age:

Child: Age < 18
Adult: 18 ≤ Age < 60
Senior: Age ≥ 60
Write code to create the AgeGroup column using a lambda function and the .apply() method.



In [19]:
import pandas as pd

# Sample data
data = {
    'CustomerID': [1, 2, 3, 4, 5],
    'Age': [12, 25, 8, 67, 45]
}

df = pd.DataFrame(data)

# Nested conditional expression: check Child first; otherwise evaluate the parenthesized Adult/Senior check
df['AgeGroup'] = df['Age'].apply(lambda x: 'Child' if x < 18 else ('Adult' if x < 60 else 'Senior'))


print(df)

   CustomerID  Age AgeGroup
0           1   12    Child
1           2   25    Adult
2           3    8    Child
3           4   67   Senior
4           5   45    Adult
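
The exercise asks for a lambda with .apply(), but the same binning can also be written with pd.cut. The sketch below is an alternative under the same age boundaries, not a replacement for the requested solution.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'CustomerID': [1, 2, 3, 4, 5],
    'Age': [12, 25, 8, 67, 45],
})

# right=False makes each bin closed on the left: [-inf, 18), [18, 60), [60, inf)
df['AgeGroup'] = pd.cut(
    df['Age'],
    bins=[-np.inf, 18, 60, np.inf],
    labels=['Child', 'Adult', 'Senior'],
    right=False,
)
print(df)
```

With right=False, an age of exactly 18 falls in Adult and exactly 60 falls in Senior, which matches the rules listed above.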


Scenario 5: Detect Outliers
You are working with a dataset that contains numerical data about customer spending. You want to create a new column called IsOutlier to flag whether a transaction amount is an outlier. Define an outlier as any value that is more than 1.5 times the interquartile range (IQR) above the third quartile (Q3) or below the first quartile (Q1).

The DataFrame should have an additional column IsOutlier with 1 for outliers and 0 for non-outliers.

In [29]:
import pandas as pd
import numpy as np
# Sample data
data = {
    'CustomerID': [1, 2, 3, 4, 5, 6],
    'TransactionAmount': [200, 300, 150, 700, 250, 900]
}

df = pd.DataFrame(data)

Q1 = np.percentile(df['TransactionAmount'] , 25)

Q3 =  np.percentile(df['TransactionAmount'] , 75)

IQR = Q3 - Q1

lower_bound = Q1 - 1.5 * IQR
upper_bound = Q3 + 1.5 * IQR
df['IsOutlier'] = df['TransactionAmount'].apply( lambda x: 1 if x < lower_bound or x > upper_bound else 0)

df


   CustomerID  TransactionAmount  IsOutlier
0           1                200          0
1           2                300          0
2           3                150          0
3           4                700          0
4           5                250          0
5           6                900          0
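
None of the sample values falls outside the bounds, so every flag is 0. To confirm that the rule does flag something, the sketch below adds one extreme value (5000, an assumption made only for this check) and uses the pandas quantile and between methods instead of apply.

```python
import pandas as pd

df = pd.DataFrame({
    'CustomerID': [1, 2, 3, 4, 5, 6, 7],
    # 5000 is an extra value added here only to trigger the flag
    'TransactionAmount': [200, 300, 150, 700, 250, 900, 5000],
})

q1 = df['TransactionAmount'].quantile(0.25)
q3 = df['TransactionAmount'].quantile(0.75)
iqr = q3 - q1
lower_bound = q1 - 1.5 * iqr
upper_bound = q3 + 1.5 * iqr

# between() is inclusive on both ends by default; negate it to flag outliers
inside = df['TransactionAmount'].between(lower_bound, upper_bound)
df['IsOutlier'] = (~inside).astype(int)
print(df)
```

Here Q1 is 225 and Q3 is 800, so the bounds are -637.5 and 1662.5, and only the 5000 row is flagged.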


Scenario 6: For Loop with Lists
You want to calculate the square of each number in a list and store the results in a new list. However, the following code doesn’t work as expected:

In [33]:
numbers = [1, 2, 3, 4, 5]
squared_numbers = []

for i in range(len(numbers)):
    squared_numbers.append(numbers[i] * i)

print(squared_numbers)


[0, 2, 6, 12, 20]


In [34]:
numbers = [1, 2, 3, 4, 5]
squared_numbers = []

for i in range(len(numbers)):
    squared_numbers.append(numbers[i] ** 2)

print(squared_numbers)


[1, 4, 9, 16, 25]


Looking at the output, I noticed that the list had the correct number of entries, which told me the loop itself was not the issue but rather the calculation: numbers[i] * i multiplies each value by its index instead of squaring it.
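
Since the index is not needed for squaring, the loop can iterate over the values themselves. A list comprehension is one common way to write this; it is shown here as an optional alternative to the corrected loop.

```python
numbers = [1, 2, 3, 4, 5]

# Iterate over the values directly; no index is involved
squared_numbers = [n ** 2 for n in numbers]
print(squared_numbers)  # [1, 4, 9, 16, 25]
```

Without an index variable in scope, the original mistake of multiplying by the index cannot occur.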

Scenario 7: While Loop
You are trying to create a simple countdown using a while loop. The code is meant to count down from 5 to 1 and then print "Liftoff!" However, it gets stuck in an infinite loop.

In [4]:
count = 5
iteration_limit = 10
iterations = 0 

while count >= 0:
    if count > 0:
        print(count)
        count = count - 1
        iterations += 1
    else:
        print(count)
        print("Liftoff!")
        iterations += 1
    if iterations > iteration_limit:
        raise Exception("Potential infinite loop detected!")

    
print("Liftoff!")


5
4
3
2
1
0
Liftoff!
0
Liftoff!
0
Liftoff!
0
Liftoff!
0
Liftoff!
0
Liftoff!


Exception: Potential infinite loop detected!

I used the code above only as practice with raising exceptions.

In [5]:
# simple way to correct original code
count = 5

while count > 0:
    print(count)
    count = count - 1
print("Liftoff!")


5
4
3
2
1
Liftoff!
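
A for loop over range is another way to write the countdown, sketched here as an alternative to the while loop. Because range produces a fixed sequence, there is no counter to forget to update. The list name countdown is introduced only for illustration.

```python
# range(5, 0, -1) yields 5, 4, 3, 2, 1 and then stops on its own
countdown = list(range(5, 0, -1))

for count in countdown:
    print(count)
print("Liftoff!")
```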


Scenario 8:  Loop with a Dictionary
You want to iterate through a dictionary of customer names and their account balances. For customers with a balance below 100, you want to add a key-value pair {'NeedsAttention': True}. However, the code throws an error.

In [6]:
customers = {
    "Alice": {"Balance": 50},
    "Bob": {"Balance": 200},
    "Charlie": {"Balance": 80}
}

for name, info in customers.items():
    if info['Balance'] < 100:
        info['NeedsAttention'] = True

print(customers)


{'Alice': {'Balance': 50, 'NeedsAttention': True}, 'Bob': {'Balance': 200}, 'Charlie': {'Balance': 80, 'NeedsAttention': True}}


No error is thrown. However, Bob has no flag showing that he does not need attention. It is also not ideal to modify the original dictionary in place, so one option is to build a new one.

In [9]:
# key = name, value = info

# create a new dictionary
updated_customers = {
    name: {**info, "NeedsAttention": info['Balance'] < 100}  # key: new inner dictionary with the flag added
    for name, info in customers.items()  # iterates through key-value pairs in the original customers dictionary
}

print(updated_customers)

{'Alice': {'Balance': 50, 'NeedsAttention': True}, 'Bob': {'Balance': 200, 'NeedsAttention': False}, 'Charlie': {'Balance': 80, 'NeedsAttention': True}}
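
If modifying the original data is acceptable, the flag can also be set in place for every customer. The sketch below shows why no error occurs: Python raises a RuntimeError only when keys are added to or removed from the dictionary being iterated, and here only the inner dictionaries change.

```python
customers = {
    "Alice": {"Balance": 50},
    "Bob": {"Balance": 200},
    "Charlie": {"Balance": 80},
}

# Adding a key to an inner dictionary does not resize the outer one,
# so this is safe while iterating over customers.values()
for info in customers.values():
    info["NeedsAttention"] = info["Balance"] < 100

print(customers)
```

Assigning the comparison result directly gives every customer an explicit True or False, so Bob is no longer missing the key.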


Scenario 9: Nested Loops with Strings
You want to count the number of vowels in a list of words using nested loops, but the code produces incorrect results.


In [10]:
words = ["hello", "world", "python"]
vowels = "aeiou"
vowel_count = 0

for word in words:
    for char in word:
        if vowels in char:
            vowel_count += 1

print(vowel_count)


0


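Prints 0, but I expected more because there are multiple vowels in hello, world, and python. The issue is that the condition tests whether the whole string "aeiou" occurs inside a single character, which is never true. The check needs to be reversed so that each character is tested against the vowels.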

In [15]:
words = ["hello", "world", "python"]
vowels = ["a", "e", "i", "o", "u"]
vowel_count = 0

for word in words:
    for char in word:
        if char in vowels:
            vowel_count += 1

print(vowel_count)

4
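
The direction of the in operator is the whole bug, and it can be checked in isolation. The sketch below also shows a compact alternative to the nested loops using sum with a generator; the set is an optional choice made here, and like the code above it only counts lowercase vowels.

```python
# "x in y" asks whether x occurs inside y
print("aeiou" in "e")  # False: a five-letter string cannot fit inside one character
print("e" in "aeiou")  # True

words = ["hello", "world", "python"]
vowels = set("aeiou")

# One count per character that is a vowel, across all words
vowel_count = sum(1 for word in words for char in word if char in vowels)
print(vowel_count)  # 4
```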


Scenario 10:  Customer Rewards Program
You are working on a program to calculate rewards points for a bank's customers based on their monthly spending. The program consists of three main functions:

calculate_points(): Calculates rewards points for a single transaction.
get_monthly_points(): Calculates the total points for a customer in a given month.
generate_rewards_summary(): Generates a rewards summary for all customers.
However, the program is not working as expected and has several bugs.

{
    "Alice": {"total_spent": 550, "total_points": 400.0},
    "Bob": {"total_spent": 250, "total_points": 30.0},
    "Charlie": {"total_spent": 750, "total_points": 775.0}
}


In [16]:
# Function to calculate points for a single transaction
def calculate_points(transaction_amount):
    if transaction_amount <= 100:
        return 0
    elif transaction_amount > 100:
        return (transaction_amount - 100) * 1.5

# Function to calculate total points for a customer in a month
def get_monthly_points(transactions):
    total_points = 0
    for t in transactions: # loops through items in list 
        total_points += calculate_points(t) # for each transaction, calls to points calc function
    return total_points

# Function to generate a rewards summary for all customers
def generate_rewards_summary(customers):
    summary = {}
    for customer, data in customers.items(): # loops through each key value pair
        transactions = data['transactions'] # grabs value of inner dictionary, returning a list of transaction numbers
        monthly_points = get_monthly_points(transactions) # sends list of transaction to points functions
        summary[customer] = {
            'total_spent': sum(transactions),
            'total_points': monthly_points
        }
    return summary

# Customer data
customers = {
    "Alice": {"transactions": [50, 200, 300]},
    "Bob": {"transactions": [120, 80, 50]},
    "Charlie": {"transactions": [400, 150, 200]},
}

# Generate rewards summary
summary = generate_rewards_summary(customers)
print(summary)


{'Alice': {'total_spent': 550, 'total_points': 450.0}, 'Bob': {'total_spent': 250, 'total_points': 30.0}, 'Charlie': {'total_spent': 750, 'total_points': 675.0}}


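The code already works as expected: it runs without errors and the totals follow the rule in calculate_points. The sample summary listed with the problem shows 400.0 points for Alice and 775.0 for Charlie, which does not match that rule; the computed values are 450.0 and 675.0. Asking chat for a new problem with actual errors.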
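
The totals can be verified by listing the points earned on each transaction. The sketch below restates the points rule with max as a compact equivalent of the if/elif version above; that rewrite is introduced here for the check only.

```python
def calculate_points(transaction_amount):
    # Same rule as above: 1.5 points per unit spent beyond 100, per transaction
    return max(transaction_amount - 100, 0) * 1.5

transactions = {
    "Alice": [50, 200, 300],
    "Bob": [120, 80, 50],
    "Charlie": [400, 150, 200],
}

breakdown = {}
for name, amounts in transactions.items():
    per_transaction = [calculate_points(a) for a in amounts]
    breakdown[name] = per_transaction
    print(name, per_transaction, sum(per_transaction))
```

For Alice this gives 0 + 150 + 300 = 450 points, and for Charlie 450 + 75 + 150 = 675 points.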

Scenario 11: Analyzing Student Grades
You are working on a program that analyzes student grades and generates a summary. The program contains the following functions:

calculate_average(grades): Calculates the average of a list of grades.
get_letter_grade(average): Converts a numeric average to a letter grade based on standard grading.
generate_summary(students): Generates a summary of all students, including their average grade and letter grade.
However, the program throws errors and/or produces incorrect results. Your job is to debug and fix it.

In [17]:
# Function to calculate the average grade
def calculate_average(grades):
    return sum(grades) / len(grades)

# Function to get letter grade
def get_letter_grade(average):
    if average >= 90:
        return "A"
    elif average >= 80:
        return "B"
    elif average >= 70:
        return "C"
    elif average >= 60:
        return "D"
    elif average < 60:
        return "F"

# Function to generate summary for all students
def generate_summary(students):
    summary = {}
    for student, grades in students.items():
        avg = calculate_average(grades)
        letter_grade = get_letter_grade(avg)
        summary[student] = {"average": avg, "grade": letter_grade}
    return summary

# Student data
students = {
    "Alice": [95, 85, 92],
    "Bob": [70, 65],  # Edge case: Missing grades
    "Charlie": [],    # Edge case: Empty grades
    "Diana": [88, 90, 85, 100]
}

# Generate the summary
summary = generate_summary(students)
print(summary)


ZeroDivisionError: division by zero

In [27]:
# Function to calculate the average grade
def calculate_average(grades, student):
    if not grades:
        print(f"No grades available for {student}")
        return None
    try:
        average =  sum(grades) / len(grades)
        return average
    except TypeError:
        print("Invalid grade data encountered.")
        return None
    
# Function to get letter grade
def get_letter_grade(average):
    if average is None:  # Handle None gracefully
        return "N/A"  # No grade available
    try:
        if average >= 90:
            return "A"
        elif average >= 80:
            return "B"
        elif average >= 70:
            return "C"
        elif average >= 60:
            return "D"
        else:
            return "F"
    except TypeError:
        print("Invalid average value encountered.")
        return "Error"
    
# Function to generate summary for all students
def generate_summary(students):
    summary = {}
    for student, grades in students.items():
        avg = calculate_average(grades, student)
        letter_grade = get_letter_grade(avg)
        summary[student] = {"average": avg, "grade": letter_grade}
    return summary

# Student data
students = {
    "Alice": [95, 85, 92],
    "Bob": [70, 65],  # Edge case: Missing grades
    "Charlie": [],    # Edge case: Empty grades
    "Diana": [88, 90, 85, 100],
    "Tom": [0]
}

# Generate the summary
summary = generate_summary(students)
print(summary)


No grades available for Charlie
{'Alice': {'average': 90.66666666666667, 'grade': 'A'}, 'Bob': {'average': 67.5, 'grade': 'D'}, 'Charlie': {'average': None, 'grade': 'N/A'}, 'Diana': {'average': 90.75, 'grade': 'A'}, 'Tom': {'average': 0.0, 'grade': 'F'}}
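
The grade cutoffs can also be kept in a small table and checked at the boundaries. This is a sketch of one possible restructuring, not the required fix; the name GRADE_CUTOFFS and the test values are assumptions introduced here.

```python
# (minimum average, letter) pairs, highest first; same cutoffs as above
GRADE_CUTOFFS = [(90, "A"), (80, "B"), (70, "C"), (60, "D")]

def get_letter_grade(average):
    if average is None:  # no grades recorded for this student
        return "N/A"
    for minimum, letter in GRADE_CUTOFFS:
        if average >= minimum:
            return letter
    return "F"

# Values on and just below each boundary, plus the empty case
for value in [None, 0, 59.9, 60, 89.99, 90]:
    print(value, get_letter_grade(value))
```

Checking values such as 59.9 and 60 confirms that each cutoff is inclusive on the lower end, which the sample students alone do not exercise.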
