# Task 02 - Track B: Advanced DataFrame Operations

**Course:** Database Applications Development  
**Lesson:** 02 - Working with DataFrames (Preparing for SQL)  

---

## Instructions

Complete all exercises in this notebook. Track B includes significantly more practice with multiple conditions, chaining operations, and deeper analysis.

**Submission:**
1. Save this notebook as `dbAppsTask02TrackB.ipynb`
2. Add, commit, and push to your `databaseApplications` repository on GitHub
3. Verify the file appears correctly on GitHub

---

## Setup: Load the Dataset

In [None]:
# Import pandas and load data
import pandas as pd

titanic = pd.read_csv('Titanic Dataset.csv')

# Quick check
print(titanic.shape)
titanic.head()
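
The exercises below refer to lowercase column names such as 'name', 'sex', 'age', 'fare', 'pclass', and 'survived'. If a later cell raises a `KeyError`, a quick check of the column names and missing values can help. The sketch below uses a made-up three-row CSV in place of the real file, so `csv_text`, `sample`, and `expected` are illustrative names only; on your own data you would run the same checks on `titanic`.

```python
import io
import pandas as pd

# Hypothetical three-row CSV standing in for 'Titanic Dataset.csv'
# (the values are made up for illustration)
csv_text = """name,sex,age,fare,pclass,survived
A,female,29.0,211.3,1,1
B,male,,7.9,3,0
C,female,2.0,151.6,1,0
"""
sample = pd.read_csv(io.StringIO(csv_text))

# Column names the exercises assume; adjust if your file differs
expected = ["name", "sex", "age", "fare", "pclass", "survived"]
missing_columns = [col for col in expected if col not in sample.columns]
print("Missing columns:", missing_columns)

# Number of missing values in each column (the empty age field counts as one)
print(sample.isna().sum())
```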

---

## Part 1: Selecting Columns

### Exercise 1.1: Select Single Columns

Select and display the first 10 values of each column:
1. The 'age' column
2. The 'fare' column

In [None]:
# Age column


In [None]:
# Fare column


### Exercise 1.2: Select Multiple Columns

Create a DataFrame called `passenger_info` with these columns: 'name', 'sex', 'age', 'survived'

Display the first 10 rows.

In [None]:
# Your code here
passenger_info = 



---

## Part 2: Filtering Rows

### Exercise 2.1: Filter by Survival

Create a DataFrame called `survivors` with only passengers who survived.

Print how many survivors there were.

In [None]:
# Your code here
survivors = 

print("Number of survivors:", )


### Exercise 2.2: Filter by Gender

Create a DataFrame called `females` with only female passengers.

Print how many there were.

In [None]:
# Your code here


### Exercise 2.3: Filter by Class

Create a DataFrame called `first_class` with only first class passengers.

Print how many there were.

In [None]:
# Your code here


### Exercise 2.4: Filter by Age

Create a DataFrame called `children` with passengers 12 or younger.

Print how many there were.

In [None]:
# Your code here


---

## Part 3: Sorting Data

### Exercise 3.1: Sort by Age

Sort passengers by age and display name and age for the 10 youngest.

In [None]:
# Your code here


### Exercise 3.2: Sort by Fare

Sort passengers by fare (highest first) and display name and fare for the top 10.

In [None]:
# Your code here


---

## Part 4: Statistics

### Exercise 4.1: Overall Statistics

Calculate and print:
1. Average age
2. Average fare
3. Maximum fare
4. Minimum age

In [None]:
# Your code here


### Exercise 4.2: Statistics on Filtered Data

Calculate:
1. Average age of survivors
2. Average fare of first class passengers

In [None]:
# Your code here


---

## Part 5: Value Counts

### Exercise 5.1: Count Values

Use `.value_counts()` to find:
1. How many in each class
2. How many male vs female
3. How many survived vs died
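
As a rough illustration of what `.value_counts()` returns, the sketch below runs it on a made-up `port` column rather than on the Titanic data. The column name and values are assumptions for the example only.

```python
import pandas as pd

# Made-up column of embarkation codes (not taken from the dataset)
port = pd.Series(["C", "S", "S", "Q", "S", "C"], name="port")

# Counts per distinct value, most frequent first
counts = port.value_counts()
print(counts)

# normalize=True returns fractions of the total instead of raw counts
shares = port.value_counts(normalize=True)
print(shares)
```

Multiplying the normalized result by 100 gives percentages, which can be handy for the analysis questions later on.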

In [None]:
# Class counts


In [None]:
# Gender counts


In [None]:
# Survival counts


---

## Part 6: Multiple Condition Filtering (Track B)

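Combine conditions using `&` (AND) and `|` (OR). Wrap each condition in parentheses: `&` and `|` bind more tightly than comparisons such as `==`, so an unparenthesized expression is evaluated in the wrong order and usually raises an error.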

### Example: Combining Conditions

In [None]:
# AND (&) - Both conditions must be True
# Find male survivors
male_survivors = titanic[
    (titanic['sex'] == 'male') & 
    (titanic['survived'] == 1)
]
print(f"Male survivors: {len(male_survivors)}")

# OR (|) - At least one condition must be True
# Find first or second class passengers
upper_class = titanic[
    (titanic['pclass'] == 1) | 
    (titanic['pclass'] == 2)
]
print(f"Upper class passengers: {len(upper_class)}")

# CRITICAL: Each condition MUST have parentheses around it!
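
To see why the parentheses matter, here is a small sketch on a made-up five-row DataFrame (`demo` is a stand-in, not the real data). It runs the same filter with and without parentheses and reports what happens in each case.

```python
import pandas as pd

# Made-up five-row stand-in for the Titanic data (not real passengers)
demo = pd.DataFrame({
    "sex": ["male", "female", "female", "male", "female"],
    "survived": [0, 1, 0, 1, 1],
})

# With parentheses: each comparison produces a True/False column first,
# then & combines the two columns row by row
male_survivors = demo[(demo["sex"] == "male") & (demo["survived"] == 1)]
print(len(male_survivors))

# Without parentheses: Python evaluates "male" & demo["survived"] first,
# which is not a meaningful operation, so the line raises an error
try:
    demo[demo["sex"] == "male" & demo["survived"] == 1]
    raised_error = False
except Exception as exc:
    raised_error = True
    print("Error:", type(exc).__name__)
```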

### Exercise 6.1: Female Survivors

Find all female passengers who survived.

How many were there?

In [None]:
# Your code here
# Hint: (titanic['sex'] == 'female') & (titanic['survived'] == 1)


### Exercise 6.2: Upper Class Passengers

Find passengers in either first OR second class.

How many were there?

In [None]:
# Your code here
# Hint: (titanic['pclass'] == 1) | (titanic['pclass'] == 2)


### Exercise 6.3: Male First Class Survivors

Find male passengers in first class who survived.

Display name, age, and fare for all of them.

In [None]:
# Your code here (three conditions with &)


### Exercise 6.4: Children Who Survived

Find passengers who were 12 or younger AND survived.

How many children survived?

In [None]:
# Your code here


### Exercise 6.5: Expensive Tickets in Third Class

Find third class passengers who paid more than $20 for their fare.

How many were there? (This seems unusual - third class was cheap!)

In [None]:
# Your code here


### Exercise 6.6: Complex Filter

Find passengers who meet ALL these conditions:
- Female
- Age between 20 and 40 (inclusive)
- Paid more than $30 fare

Display name, age, fare for the first 10.

In [None]:
# Your code here (four conditions!)


---

## Part 7: Advanced Sorting (Track B)

### Example: Sort by Multiple Columns

In [None]:
# Sort by class (ascending), then by fare within each class (descending)
sorted_df = titanic.sort_values(
    ['pclass', 'fare'],           # List of columns to sort by
    ascending=[True, False]        # True for ascending, False for descending
)

# Display name, class, and fare for first 20
print(sorted_df[['name', 'pclass', 'fare']].head(20))

# Rows are ordered by class first; within each class, they are ordered by fare (highest first)
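
The effect of the two sort keys is easier to see on a handful of rows. The sketch below uses a made-up five-row DataFrame (`demo`, with placeholder names A to E) and the same `sort_values` call as the example above.

```python
import pandas as pd

# Made-up rows; names A-E are placeholders, not real passengers
demo = pd.DataFrame({
    "name": ["A", "B", "C", "D", "E"],
    "pclass": [3, 1, 2, 1, 3],
    "fare": [8.0, 80.0, 20.0, 120.0, 15.0],
})

# First key: pclass ascending. Second key: fare descending within each class.
ordered = demo.sort_values(["pclass", "fare"], ascending=[True, False])
print(ordered)

# The original row labels travel with the rows; only the order changes
order_of_names = ordered["name"].tolist()
print(order_of_names)
```

Here the two first class rows come first (D before B, because D paid more), then the single second class row, then the third class rows with E ahead of A.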

### Exercise 7.1: Sort by Multiple Columns

Sort passengers by class (ascending), then by fare within each class (descending).

Display name, pclass, fare for the first 20 rows.

In [None]:
# Your code here
# Hint: sort_values(['pclass', 'fare'], ascending=[True, False])


### Exercise 7.2: Who Paid the Least?

Find the 10 passengers who paid the lowest fares.

Display name, pclass, and fare.

In [None]:
# Your code here


### Exercise 7.3: Oldest in Each Class

Sort by class (ascending) and age (descending).

This keeps each class together, with the oldest passengers first within each class. Display name, pclass, and age for the first 30 rows.

In [None]:
# Your code here


---

## Part 8: Chaining Operations (Track B)

Combine filtering, selecting, and sorting in one statement.

### Example: Chaining Multiple Operations

In [None]:
# Chain: Filter → Select columns → Sort → Display
result = titanic[
    titanic['survived'] == 1                    # Step 1: Filter for survivors
][['name', 'age', 'pclass']                     # Step 2: Select specific columns
].sort_values('age'                             # Step 3: Sort by age
).head(10)                                       # Step 4: Get first 10

print(result)

# This gets the 10 youngest survivors with their name, age, and class
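
A chain is just the step-by-step version written without intermediate variables. The sketch below shows both forms side by side on a made-up DataFrame (`demo`) and checks that they give the same result. Wrapping the chain in parentheses is one common way to split it across lines; it is a style choice, not a requirement.

```python
import pandas as pd

# Made-up rows; one age is missing on purpose
demo = pd.DataFrame({
    "name": ["A", "B", "C", "D", "E"],
    "age": [30.0, 4.0, None, 22.0, 50.0],
    "survived": [1, 1, 1, 0, 1],
})

# Step by step, one variable per operation
survivors = demo[demo["survived"] == 1]
selected = survivors[["name", "age"]]
by_age = selected.sort_values("age")
stepwise = by_age.head(3)

# The same four operations as a single chain
chained = (
    demo[demo["survived"] == 1][["name", "age"]]
    .sort_values("age")
    .head(3)
)

print(chained)
print(stepwise.equals(chained))
```

Note that `sort_values` places rows with a missing age last by default, so passenger C does not appear among the three youngest.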

### Exercise 8.1: Female Survivors by Age

Chain operations to:
1. Filter for female survivors
2. Select name, age, pclass columns
3. Sort by age
4. Display first 10

In [None]:
# Your code here (one long chain!)


### Exercise 8.2: First Class by Fare

Chain operations to:
1. Filter for first class
2. Select name, age, fare
3. Sort by fare (highest first)
4. Display first 15

In [None]:
# Your code here


### Exercise 8.3: Young Male Survivors

Chain operations to:
1. Filter for males under 18 who survived
2. Select name, age, pclass
3. Sort by age (youngest first)
4. Display all

In [None]:
# Your code here


---

## Part 9: Extended Statistics (Track B)

### Example: Statistics by Subgroups

In [None]:
# Calculate statistics for different subgroups

# Average fare for first class
first_class = titanic[titanic['pclass'] == 1]
avg_first_fare = first_class['fare'].mean()
print(f"First class average fare: ${avg_first_fare:.2f}")

# Average fare for third class
third_class = titanic[titanic['pclass'] == 3]
avg_third_fare = third_class['fare'].mean()
print(f"Third class average fare: ${avg_third_fare:.2f}")

# Compare the two
print(f"Difference: ${avg_first_fare - avg_third_fare:.2f}")
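
Two details of `.mean()` are worth knowing before the exercises: it skips missing values, and the mean of a column of 0s and 1s is the fraction of 1s. The sketch below demonstrates both on a made-up DataFrame (`demo`); the numbers are invented for illustration.

```python
import pandas as pd

# Made-up rows; one third class fare is missing on purpose
demo = pd.DataFrame({
    "pclass": [1, 1, 3, 3, 3],
    "fare": [100.0, 60.0, 10.0, None, 20.0],
    "survived": [1, 1, 0, 0, 1],
})

third_class = demo[demo["pclass"] == 3]

# The missing fare is ignored: (10 + 20) / 2
avg_third_fare = third_class["fare"].mean()
print(f"Third class average fare: ${avg_third_fare:.2f}")

# Mean of a 0/1 column = share of 1s; multiply by 100 for a percentage
third_survival_pct = third_class["survived"].mean() * 100
print(f"Third class survival rate: {third_survival_pct:.1f}%")
```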

### Exercise 9.1: Statistics by Class

Calculate the average fare for each class separately.

Which class paid the most on average?

In [None]:
# Your code here


### Exercise 9.2: Survival Rates by Gender

Calculate the percentage of males who survived and the percentage of females who survived.

Which gender had a higher survival rate?

In [None]:
# Your code here


### Exercise 9.3: Age Statistics by Survival

Calculate:
1. Average age of survivors
2. Average age of non-survivors
3. Who was older on average?

In [None]:
# Your code here


---

## Part 10: Analysis Questions

### Question 1: Survival Rate

What percentage of passengers survived?

In [None]:
# Your code here


**Answer:** 

### Question 2: Class Distribution

Which class had the most passengers?

In [None]:
# Your code here


**Answer:** 

### Question 3: Age Comparison

What was the average age of first class vs third class passengers?

In [None]:
# Your code here


**Answer:** 

### Question 4: Gender Distribution

Were there more male or female passengers?

In [None]:
# Your code here


**Answer:** 

### Question 5: Fare Comparison

Did survivors pay more on average than non-survivors?

In [None]:
# Your code here


**Answer:** 

### Question 6: Child Survival Rate

What percentage of children (age <= 12) survived compared to adults (age > 12)?

In [None]:
# Your code here


**Answer:** 

### Question 7: Fare and Survival

Did survivors pay higher fares on average than non-survivors? Why might this be?

In [None]:
# Your code here


**Answer:** 

### Question 8: Class Survival Rates

Calculate the survival rate for each class (1st, 2nd, 3rd). Which class had the highest survival rate?

In [None]:
# Your code here


**Answer:** 

---

## Part 11: Critical Thinking (Track B)

### Question 9: Important Factors

Based on your analysis, what were the THREE most important factors for survival?

Support with specific statistics.

**Your Answer:**

1. 
2. 
3. 

### Question 10: Pandas to SQL

You filtered with: `titanic[(titanic['sex'] == 'female') & (titanic['survived'] == 1)]`

How do you think this might look in SQL? (Guess - we'll learn this next!)

**Your Answer:** 

---

## Submission Checklist

- [ ] Completed all exercises including Track B sections
- [ ] Ran all cells successfully
- [ ] Added name and date at top
- [ ] Answered all questions
- [ ] Saved as `dbAppsTask02TrackB.ipynb`
- [ ] Pushed to GitHub

**Excellent work! You're ready for SQL!**