# Python for Data Science - Assignment
# Pandas Data Accessing and Filtering

**Course:** Python for Data Science  
**Instructor:** Siva Jasthi  
**Total Points:** 25 points  
**Points per Question:** 2.5 points

---

## Assignment Overview

In this assignment, you will practice accessing, filtering, and traversing data in a pandas DataFrame using the famous Titanic dataset. You'll use various techniques including:
- Accessing specific rows, columns, and cells
- Using `loc` and `iloc` indexers
- Filtering data with boolean conditions
- Combining multiple selection methods

---

## Instructions

1. **Run the setup cell** to load the Titanic dataset
2. **Answer all 10 questions** in the provided code cells
3. **Test your code** to ensure it produces the correct output
4. **Do not modify** the question cells - only write your code in the answer cells
5. **Submit your completed notebook** by the due date

---

## Dataset Information

The Titanic dataset contains information about passengers aboard the Titanic. Here are the key columns:

- **PassengerId**: Unique ID for each passenger
- **Survived**: Survival status (0 = No, 1 = Yes)
- **Pclass**: Passenger class (1 = 1st, 2 = 2nd, 3 = 3rd)
- **Name**: Passenger name
- **Sex**: Gender (male/female)
- **Age**: Age in years
- **SibSp**: Number of siblings/spouses aboard
- **Parch**: Number of parents/children aboard
- **Ticket**: Ticket number
- **Fare**: Passenger fare
- **Cabin**: Cabin number
- **Embarked**: Port of embarkation (C = Cherbourg, Q = Queenstown, S = Southampton)

---

## Setup: Load the Dataset

**Run this cell first!** This will load the Titanic dataset and display basic information.

In [None]:
# Import necessary libraries
import pandas as pd
import numpy as np

# Load the Titanic dataset
url = 'https://raw.githubusercontent.com/sjasthi/Python-DS-Data-Science/main/datasets/titanic_data.csv'
df = pd.read_csv(url)

# Display basic information
print("Dataset Shape:", df.shape)
print("\nFirst 5 rows:")
display(df.head())

print("\nColumn Names:")
print(df.columns.tolist())

print("\nData Types:")
print(df.dtypes)

---

## Question 1: Accessing a Single Column (2.5 points)

**Task:** Access and display the 'Age' column from the DataFrame.

**Expected Output:** A Series containing all age values.

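**Hint:** Use bracket notation `df['column_name']`. Dot notation `df.column_name` also works, but only when the column name is a valid Python identifier that does not clash with an existing DataFrame attribute or method.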
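
To illustrate the difference between the two notations, here is a minimal sketch on a small made-up DataFrame (the `toy` frame and its `city` and `count` columns are hypothetical and unrelated to the Titanic data):

```python
import pandas as pd

# Hypothetical toy data, not the Titanic dataset
toy = pd.DataFrame({"city": ["Oslo", "Rome", "Lima"], "count": [3, 5, 8]})

by_bracket = toy["city"]   # bracket notation works for any column label
by_dot = toy.city          # dot notation works here: "city" is a plain identifier

print(type(by_bracket).__name__)   # Series
print(by_bracket.equals(by_dot))   # True

# "count" is also the name of a DataFrame method, so dot notation
# returns the method instead of the column.
print(callable(toy.count))         # True
print(toy["count"].tolist())       # [3, 5, 8]
```

Bracket notation is the safer default because it never collides with method names.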

In [None]:
# YOUR CODE HERE
# Access the 'Age' column



---

## Question 2: Accessing Multiple Columns (2.5 points)

**Task:** Create a new DataFrame containing only the 'Name', 'Sex', and 'Age' columns. Display the first 10 rows.

**Expected Output:** A DataFrame with 3 columns and the first 10 rows.

**Hint:** Use double brackets `df[['col1', 'col2', 'col3']]` to select multiple columns.
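
As a quick sketch of why the second pair of brackets matters, the example below compares single and double brackets on a hypothetical toy DataFrame (the column names are invented for illustration):

```python
import pandas as pd

# Hypothetical toy data, not the Titanic dataset
toy = pd.DataFrame({
    "city": ["Oslo", "Rome", "Lima", "Kobe"],
    "temp": [4, 18, 22, 15],
    "rain": [True, False, False, True],
})

one_column = toy["city"]             # single brackets with a string -> Series
two_columns = toy[["city", "temp"]]  # a list of labels -> DataFrame

print(type(one_column).__name__)     # Series
print(type(two_columns).__name__)    # DataFrame
print(two_columns.shape)             # (4, 2)

# A one-element list still produces a DataFrame, not a Series
print(toy[["city"]].shape)           # (4, 1)
```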

In [None]:
# YOUR CODE HERE
# Create a DataFrame with Name, Sex, and Age columns, then display first 10 rows



---

## Question 3: Using iloc for Position-Based Access (2.5 points)

**Task:** Using `iloc`, access and display all data for the passenger in row 100 (that is, at position 100).

**Expected Output:** A Series showing all information for the passenger at position 100.

**Hint:** Remember that `iloc` uses zero-based positions (0, 1, 2, ...). Row 100 means position 100, which is the 101st row of the DataFrame.
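
The zero-based counting can be checked on a small example. The sketch below uses a hypothetical six-row DataFrame so the positions are easy to verify by eye:

```python
import pandas as pd

# Hypothetical toy data, not the Titanic dataset
toy = pd.DataFrame({
    "letter": list("abcdef"),
    "value": [10, 20, 30, 40, 50, 60],
})

row = toy.iloc[2]                 # position 2 is the third row
print(type(row).__name__)         # Series
print(row["letter"])              # c

# Negative positions count from the end, as with Python lists
print(toy.iloc[-1]["letter"])     # f
```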

In [None]:
# YOUR CODE HERE
# Use iloc to access the passenger at position 100



---

## Question 4: Using loc for Label-Based Access (2.5 points)

**Task:** Using `loc`, access and display the 'Name' and 'Fare' for passengers with index labels 5, 15, and 25.

**Expected Output:** A DataFrame with 2 columns ('Name' and 'Fare') and 3 rows.

**Hint:** You can pass a list of index labels to `loc`: `df.loc[[5, 15, 25], ['Name', 'Fare']]`
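
Because `loc` selects by label rather than by position, the two only coincide when the index is the default 0, 1, 2, ... range. A minimal sketch with a hypothetical toy DataFrame whose index labels are deliberately not positions:

```python
import pandas as pd

# Hypothetical toy data with custom index labels 10, 20, 30, 40
toy = pd.DataFrame(
    {"item": ["pen", "ink", "cap", "nib"], "price": [1.5, 4.0, 0.5, 2.0]},
    index=[10, 20, 30, 40],
)

# loc takes index labels and column labels
picked = toy.loc[[10, 30], ["item", "price"]]
print(picked["item"].tolist())     # ['pen', 'cap']

# iloc takes positions: labels 10 and 30 sit at positions 0 and 2
same_rows = toy.iloc[[0, 2]]
print(picked.equals(same_rows))    # True
```

With the assignment's `df`, loaded by `pd.read_csv` without an index column, labels and positions happen to match.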

In [None]:
# YOUR CODE HERE
# Use loc to access Name and Fare for passengers at index 5, 15, and 25



---

## Question 5: Accessing a Specific Cell (2.5 points)

**Task:** Find and display the age of the passenger at index 50. With the default integer index, label 50 and position 50 refer to the same row.

**Expected Output:** A single number representing the age.

**Hint:** You can use `df.loc[50, 'Age']` or `df.iloc[50, column_position]`. If you use `iloc`, first find the column position of 'Age', for example with `df.columns.get_loc('Age')`.
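
Both routes to a single cell can be compared on a small example. This sketch uses a hypothetical toy DataFrame and looks up the column position with `Index.get_loc`:

```python
import pandas as pd

# Hypothetical toy data, not the Titanic dataset
toy = pd.DataFrame({"item": ["pen", "ink", "cap"], "price": [1.5, 4.0, 0.5]})

# Label-based: row label 1, column label "price"
by_label = toy.loc[1, "price"]

# Position-based: look up the column's integer position first
col_pos = toy.columns.get_loc("price")
by_position = toy.iloc[1, col_pos]

print(col_pos)                 # 1
print(by_label, by_position)   # 4.0 4.0
```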

In [None]:
# YOUR CODE HERE
# Access the Age value for the passenger at index 50



---

## Question 6: Simple Boolean Filtering (2.5 points)

**Task:** Filter the DataFrame to show only first class passengers (Pclass == 1):
- Display only the 'Name', 'Age', and 'Fare' columns
- Show the first 10 results

**Expected Output:** A DataFrame with 3 columns showing first class passengers.

**Hint:** Use boolean indexing: `df[df['Pclass'] == 1][['Name', 'Age', 'Fare']].head(10)`
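
It can help to look at the boolean mask on its own before using it as a filter. A minimal sketch on a hypothetical toy DataFrame:

```python
import pandas as pd

# Hypothetical toy data, not the Titanic dataset
toy = pd.DataFrame({
    "item": ["pen", "ink", "cap", "nib"],
    "grade": [1, 2, 1, 3],
    "price": [1.5, 4.0, 0.5, 2.0],
})

# The comparison produces one True/False value per row
mask = toy["grade"] == 1
print(mask.tolist())               # [True, False, True, False]

# Rows where the mask is True are kept, then columns are selected
subset = toy[mask][["item", "price"]]
print(subset["item"].tolist())     # ['pen', 'cap']

# Filtering keeps the original index labels; it does not renumber rows
print(subset.index.tolist())       # [0, 2]
```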

In [None]:
# YOUR CODE HERE
# Filter for first class passengers and display Name, Age, and Fare (first 10 rows)



---

## Question 7: Multiple Condition Filtering (AND) (2.5 points)

**Task:** Filter the DataFrame to show passengers who meet ALL of these conditions:
- Female passengers (Sex == 'female')
- Who survived (Survived == 1)
- Who were in first or second class (Pclass == 1 OR Pclass == 2)

Display the count of how many passengers meet these criteria.

**Expected Output:** A number showing the count of passengers matching all conditions.

**Hint:** Use `&` for AND, `|` for OR, and wrap each condition in parentheses: `df[(condition1) & (condition2)]`. Also wrap the whole OR group in its own parentheses, because `&` binds more tightly than `|`.
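
The effect of grouping the OR part can be seen on a small example. The sketch below uses a hypothetical toy DataFrame (the `kind`, `flag`, and `grade` columns are invented) and compares a grouped and an ungrouped version of the same filter:

```python
import pandas as pd

# Hypothetical toy data, not the Titanic dataset
toy = pd.DataFrame({
    "kind": ["x", "x", "y", "y"],
    "flag": [1, 0, 1, 1],
    "grade": [1, 2, 2, 3],
})

is_x = toy["kind"] == "x"
flagged = toy["flag"] == 1

# OR group wrapped in parentheses: kind x AND flagged AND (grade 1 OR grade 2)
grouped = toy[is_x & flagged & ((toy["grade"] == 1) | (toy["grade"] == 2))]

# Without the outer parentheses, & is evaluated first, so this means
# (kind x AND flagged AND grade 1) OR (grade 2)
ungrouped = toy[is_x & flagged & (toy["grade"] == 1) | (toy["grade"] == 2)]

print(len(grouped))     # 1
print(len(ungrouped))   # 3
```

Both expressions run without error, so a missing pair of parentheses shows up only as a wrong count.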

In [None]:
# YOUR CODE HERE
# Filter for female survivors in first or second class, then count them



---

## Question 8: Using isin() for Filtering (2.5 points)

**Task:** Find all passengers who embarked from either Cherbourg ('C') or Queenstown ('Q').
- Use the `isin()` method
- Display the 'Name', 'Embarked', and 'Pclass' columns
- Show only the first 15 results

**Expected Output:** A DataFrame showing passengers who embarked from C or Q.

**Hint:** Use `df[df['Embarked'].isin(['C', 'Q'])]`
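
As a sketch of what `isin()` does, the example below applies it to a hypothetical toy DataFrame that includes one missing value, and compares it with the equivalent chain of `==` tests:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data, not the Titanic dataset; one port is missing
toy = pd.DataFrame({
    "name": ["Ann", "Bob", "Cy", "Dee", "Eve"],
    "port": ["C", "S", "Q", np.nan, "C"],
})

# True where the value is one of the listed options; missing values give False
mask = toy["port"].isin(["C", "Q"])
print(mask.tolist())                  # [True, False, True, False, True]

# Same result as combining two equality tests with |
equivalent = (toy["port"] == "C") | (toy["port"] == "Q")
print((mask == equivalent).all())     # True

print(toy[mask]["name"].tolist())     # ['Ann', 'Cy', 'Eve']
```

`isin()` scales better than chained `|` conditions as the list of accepted values grows.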

In [None]:
# YOUR CODE HERE
# Filter for passengers who embarked from C or Q, display Name, Embarked, and Pclass (first 15)



---

## Question 9: Combining loc with Boolean Filtering (2.5 points)

**Task:** Using `loc` with boolean filtering, find passengers older than 60 years (Age > 60):
- Select only these columns: 'Name', 'Age', 'Sex', 'Survived'
- Sort the results by Age in descending order (oldest first)

**Expected Output:** A DataFrame with 4 columns showing elderly passengers, sorted by age.

**Hint:** Combine boolean filtering with loc: `df.loc[df['Age'] > 60, ['Name', 'Age', 'Sex', 'Survived']].sort_values('Age', ascending=False)`
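
The pattern of passing a row mask and a column list to `loc` in one step can be sketched on a hypothetical toy DataFrame. One age is deliberately missing to show that a comparison against a missing value evaluates to False:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data, not the Titanic dataset
toy = pd.DataFrame({
    "name": ["Ann", "Bob", "Cy", "Dee"],
    "age": [65, 34, np.nan, 71],
    "team": ["red", "blue", "red", "blue"],
})

# Row mask and column list go into loc together, then the result is sorted
seniors = toy.loc[toy["age"] > 60, ["name", "age"]].sort_values("age", ascending=False)

print(seniors["name"].tolist())    # ['Dee', 'Ann']
print(seniors.index.tolist())      # [3, 0]
```

Sorting reorders the rows but keeps their original index labels.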

In [None]:
# YOUR CODE HERE
# Use loc to filter passengers older than 60 and select specific columns, then sort by age



---

## Question 10: Advanced Filtering Challenge (2.5 points)

**Task:** Create a filtered DataFrame that shows passengers who meet these criteria:
- Paid a fare greater than 50
- Had at least one sibling or spouse aboard (SibSp >= 1)
- Were either male OR in third class (Sex == 'male' OR Pclass == 3)

Display the following columns: 'Name', 'Sex', 'Pclass', 'SibSp', 'Fare'

Then, print:
1. The total number of passengers matching these criteria
2. The average fare paid by these passengers

**Expected Output:**
- A DataFrame with the filtered results
- A count of matching passengers
- The average fare (rounded to 2 decimal places)

**Hint:** Break it down into steps:
1. Create condition1 for Fare > 50
2. Create condition2 for SibSp >= 1
3. Create condition3 for (Sex == 'male') | (Pclass == 3)
4. Combine all with & and select columns
5. Use `.shape[0]` for the count and `.mean()` on the 'Fare' column for the average, rounded with `round(value, 2)`
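
The step-by-step approach above can be sketched on a hypothetical toy DataFrame. The column names (`price`, `qty`, `kind`, `grade`) and thresholds are invented for illustration and do not correspond to the Titanic columns:

```python
import pandas as pd

# Hypothetical toy data, not the Titanic dataset
toy = pd.DataFrame({
    "price": [60.0, 80.0, 20.0, 90.0, 55.0],
    "qty": [1, 0, 2, 3, 1],
    "kind": ["a", "a", "b", "b", "b"],
    "grade": [1, 3, 3, 3, 2],
})

# Name each condition separately so it can be inspected on its own
cond_price = toy["price"] > 50
cond_qty = toy["qty"] >= 1
cond_either = (toy["kind"] == "a") | (toy["grade"] == 3)

# Combine with & and select columns in the same loc call
result = toy.loc[cond_price & cond_qty & cond_either, ["kind", "grade", "price"]]

print(result.shape[0])                    # 2
print(round(result["price"].mean(), 2))   # 75.0
```

Storing the OR group in its own variable avoids the operator-precedence problem without extra parentheses in the final expression.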

In [None]:
# YOUR CODE HERE
# Create complex filter and display results with statistics



---

## Submission Checklist

Before submitting, make sure you have:

- [ ] Answered all 10 questions
- [ ] Run all cells to verify your code works
- [ ] Checked that your outputs make sense
- [ ] Saved your notebook
- [ ] Submitted the completed `.ipynb` file

---

## Grading Rubric

Each question is worth 2.5 points, graded as follows:

- **2.5 points**: Correct solution, code runs without errors, proper use of methods
- **2.0 points**: Mostly correct but minor errors or inefficient approach
- **1.5 points**: Partial solution, shows understanding but significant errors
- **1.0 points**: Attempted but incorrect approach or major errors
- **0.5 points**: Minimal effort or fundamentally wrong approach
- **0 points**: No attempt or completely incorrect

---

**Good luck! Remember to test your code and use the pandas documentation if you need help!**