# Titanic Lab (Student) — Selecting, Filtering, Sorting (Beginner)

This student notebook guides you through simple pandas tasks using the Titanic dataset. Each section contains a short demonstration you can run to see how something works, followed by a blank code cell where you'll try it yourself.

Goals:
- Inspect a DataFrame and list its columns
- Select single and multiple columns
- Build and test filtering conditions (including negation)
- Sort the DataFrame and show top/bottom rows

Run cells one at a time and experiment by changing numbers or conditions.

In [None]:
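# Demo: load dataset (run this cell to create `df`)
import pandas as pd
import seaborn as sns

try:
    # use the local copy if it is present
    df = pd.read_csv('data/titanic.csv')
except FileNotFoundError:
    # otherwise fall back to the copy bundled with seaborn (needs internet on first use)
    df = sns.load_dataset('titanic')

# the exercises use lowercase column names; normalize in case the local CSV capitalizes them
df.columns = df.columns.str.lower()

print('Dataset loaded. Rows, columns:', df.shape)
# show first rows to inspect
df.head()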

## Exercise 1 — Inspect the DataFrame

Run the demonstration cell to see the data, then try the tasks below.

Demonstration (run it to see how it works):
```python
# show DataFrame info and columns
print(df.shape)
print(df.head())
print('\nColumns:')
print(list(df.columns))
```
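
Beyond `head()` and `columns`, it often helps to check column types and missing values before you start filtering. The sketch below is one way to do that. It uses a tiny hand-made DataFrame called `toy` (invented for illustration, not the Titanic data) so the cell runs on its own; the same calls should work on `df`.

```python
import pandas as pd

# Hypothetical toy data, used only so this cell is self-contained
toy = pd.DataFrame({
    'age': [22.0, None, 35.0],
    'sex': ['male', 'female', 'female'],
    'fare': [7.25, 71.28, 8.05],
})

# data type of each column
print(toy.dtypes)

# number of missing values per column
missing = toy.isna().sum()
print(missing)

# summary statistics for the numeric columns
print(toy.describe())
```

Missing values matter later: a comparison such as `age > 30` evaluates to `False` for rows where `age` is missing.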

Your turn: use the blank cell below to run `df.head()`, `df.tail()`, and `df.columns`. If you get an error, make sure you ran the previous cell that loads the dataset.

In [None]:
# Your code here: Inspect the DataFrame



## Exercise 2 — Selecting Columns

Demonstration:
```python
# single column returns a Series
ages = df['age']
print(type(ages))
print(ages.head())

# multiple columns return a DataFrame
sel = df[['survived', 'pclass', 'sex', 'age', 'fare']]
print(type(sel))
print(sel.head())
```
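
A common point of confusion is the difference between single and double brackets. The sketch below, on an invented `toy` DataFrame (not the Titanic data), shows that a single name gives a Series while a list containing one name gives a one-column DataFrame, and that a misspelled column name raises `KeyError`.

```python
import pandas as pd

# Hypothetical toy data for illustration only
toy = pd.DataFrame({'age': [22, 38, 26], 'fare': [7.25, 71.28, 7.92]})

as_series = toy['age']     # single brackets -> Series
as_frame = toy[['age']]    # list with one name -> DataFrame with one column

print(type(as_series).__name__)
print(type(as_frame).__name__)
print(as_frame.shape)

# Column names are case-sensitive: 'Age' is not 'age'
try:
    toy['Age']
except KeyError as err:
    print('KeyError:', err)
```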

Your turn: create `df_sel` with columns `['survived','pclass','sex','age','fare']` using a blank code cell below.

In [None]:
# Create df_sel with the requested columns and inspect it



## Exercise 3 — Filtering rows (build conditions step-by-step)

Demonstration — single condition and negation:
```python
# boolean Series (mask)
mask = df['age'] > 30
print(mask.head())

# use mask to filter
df_age30 = df[mask]
print(df_age30.shape)

# equality and negation
not_survived = df[df['survived'] != 1]   # using !=
print(not_survived.head())

female_mask = (df['sex'] == 'female')
not_female = df[~female_mask]            # using ~ to invert mask
print(not_female.head())
```
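
Conditions can also be combined. The sketch below uses an invented `toy` DataFrame (not the Titanic data) to show `&` (and), `|` (or), and `~` applied to a combined condition. Each individual condition needs its own parentheses, because `&` and `|` bind more tightly than comparisons such as `>` and `==`.

```python
import pandas as pd

# Hypothetical toy data for illustration only
toy = pd.DataFrame({
    'sex': ['male', 'female', 'female', 'male'],
    'age': [22.0, 38.0, 4.0, 61.0],
    'fare': [7.25, 71.28, 16.70, 26.55],
})

# AND: both conditions must be true
women_high_fare = toy[(toy['sex'] == 'female') & (toy['fare'] > 50)]
print(women_high_fare)

# OR: at least one condition must be true
young_or_old = toy[(toy['age'] < 10) | (toy['age'] > 60)]
print(young_or_old)

# NOT around a combined condition: everyone else
neither = toy[~((toy['age'] < 10) | (toy['age'] > 60))]
print(neither)
```

Using Python's `and` / `or` keywords here raises a `ValueError`, because they cannot compare whole Series element by element.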

Your turn: in the blank cell below, build the condition `df['age'] > 40` and use it to filter rows. Then select passengers who did NOT survive. Finally, combine two conditions with `&` (each wrapped in parentheses) to find 1st-class passengers older than 30.

In [None]:
# Your code here: Filtering practice

# 1) Filter rows where age > 40

# 2) Select passengers who did NOT survive

# 3) 1st class and older than 30

# 4) Repeat task 2 (those who didn't survive), this time using ~ instead of !=



## Exercise 4 — Convenience filters (`isin`, `str.contains`)

Demonstration:
```python
# passengers in classes 1 or 2
print(df[df['pclass'].isin([1,2])].shape)

# substring match; the seaborn copy of the dataset has no 'name' column, hence the check
if 'name' in df.columns:
    print(df[df['name'].str.contains('Smith', na=False)].shape)
```
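
Two details of these helpers are easy to miss: `isin` can be inverted with `~`, and `str.contains` is case-sensitive unless told otherwise. The sketch below illustrates both on an invented `toy` DataFrame; the `cabin_letter` column and its values are hypothetical and not taken from the Titanic data.

```python
import pandas as pd

# Hypothetical toy data for illustration only
toy = pd.DataFrame({
    'name': ['Smith, Mr. John', 'Brown, Mrs. Ann', None, 'smithson, Miss. Eve'],
    'cabin_letter': ['C', 'A', 'B', None],
})

# rows whose value is in the list
in_a_or_b = toy[toy['cabin_letter'].isin(['A', 'B'])]

# inverted: rows NOT in the list (a missing value counts as "not in")
not_a_or_b = toy[~toy['cabin_letter'].isin(['A', 'B'])]

# case-sensitive by default; na=False treats missing names as non-matches
exact_case = toy[toy['name'].str.contains('Smith', na=False)]

# case=False ignores upper/lower case
any_case = toy[toy['name'].str.contains('smith', case=False, na=False)]

print(len(in_a_or_b), len(not_a_or_b), len(exact_case), len(any_case))
```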

Your turn: try selecting passengers whose `embarked` is in `['C','Q']` using `isin` in the blank cell below.

In [None]:
# Your code here: Convenience filters



## Exercise 5 — Sorting and top/bottom rows

Demonstration (single- and multi-column sorting):
```python
# sort by age descending and show top 5 oldest
print(df.sort_values('age', ascending=False).head(5))

# sort by fare ascending and show the 10 lowest fares
print(df.sort_values('fare', ascending=True).head(10))

# sort by multiple columns: class ascending, fare descending
print(df.sort_values(['pclass','fare'], ascending=[True, False]).head(10))
```
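
Columns with missing values, such as `age`, raise the question of where those rows end up after sorting. The sketch below, on an invented `toy` DataFrame (not the Titanic data), shows that `sort_values` puts missing values last by default in both directions, that `na_position='first'` changes this, and that `nlargest` / `nsmallest` offer a shortcut for "sort, then take the first n rows".

```python
import pandas as pd

# Hypothetical toy data for illustration only
toy = pd.DataFrame({
    'who': ['a', 'b', 'c', 'd'],
    'age': [30.0, None, 52.0, 8.0],
})

# missing ages go to the end for both sort directions
oldest_first = toy.sort_values('age', ascending=False)
youngest_first = toy.sort_values('age', ascending=True)

# na_position='first' moves missing values to the top instead
nan_first = toy.sort_values('age', na_position='first')

# shortcuts for "sort and take n rows"
top2 = toy.nlargest(2, 'age')
bottom2 = toy.nsmallest(2, 'age')

print(oldest_first)
print(nan_first)
print(top2)
print(bottom2)
```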

Your turn: in the blank cell below, show the top 10 passengers by fare. Then show the first 10 rows when sorting by class (ascending) and, within each class, by fare (descending).

In [None]:
# Your code here: Sorting practice

# 1) Top 10 passengers by fare

# 2) Top 10 ordered by class asc, fare desc

## Small challenges (optional)

- Average fare for passengers who survived vs who did not
- Percentage of survivors by class
- Top 5 oldest passengers who survived

Hints and reminders:
- Use `df[(condition)]` to filter
- Combine conditions with `&` and `|` (use parentheses)
- Use `~` or `!=` to express negation
- `df.sort_values(['col1','col2'], ascending=[True,False])` sorts by multiple columns
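
The hints above can be combined with `.mean()` to compute averages and percentages. The sketch below shows the general pattern on an invented `toy` DataFrame; the `passed` and `score` columns are hypothetical, so the challenges themselves are left for you to adapt.

```python
import pandas as pd

# Hypothetical toy data for illustration only
toy = pd.DataFrame({
    'passed': [1, 0, 1, 1],
    'score': [80.0, 55.0, 90.0, 70.0],
})

# filter first, then average one column of the result
mean_passed = toy[toy['passed'] == 1]['score'].mean()
mean_failed = toy[toy['passed'] != 1]['score'].mean()

# the mean of a boolean mask is the fraction of True values
share_passed = (toy['passed'] == 1).mean() * 100

print(mean_passed, mean_failed, share_passed)
```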

---

Good luck! Run the demonstration cells to see how things work, then complete the blank cells to practice.