# Task 01 - Track B: Python Basics & Advanced Dataset Exploration

**Course:** Database Applications Development  
**Lesson:** 01 - Introduction to JupyterLab & Python Fundamentals  

---

## Instructions

Complete all exercises in this notebook. Each section includes example code to help guide you. Read the examples carefully, then complete the exercises below them.

**Track B** includes all Track A exercises PLUS additional challenges and deeper analysis.

**Submission:**
1. Save this notebook as `dbAppsTask01TrackB.ipynb`
2. Add, commit, and push to your `databaseApplications` repository on GitHub
3. Verify the file appears correctly on GitHub

---

## Titanic Dataset - Data Dictionary

You'll be working with the Titanic dataset throughout this task. Here's what each column means:

| Column | Description | Data Type | Example Values |
|--------|-------------|-----------|----------------|
| **pclass** | Passenger class (ticket class) | Integer | 1 = First Class<br>2 = Second Class<br>3 = Third Class |
| **survived** | Survival status | Integer | 0 = Did not survive<br>1 = Survived |
| **name** | Passenger's full name | String | "Braund, Mr. Owen Harris" |
| **sex** | Gender | String | "male" or "female" |
| **age** | Age in years | Float | 22.0, 38.0, 26.0 |
| **sibsp** | Number of siblings/spouses aboard | Integer | 0, 1, 2, etc. |
| **parch** | Number of parents/children aboard | Integer | 0, 1, 2, etc. |
| **ticket** | Ticket number | String | "A/5 21171", "PC 17599" |
| **fare** | Passenger fare (ticket price) | Float | 7.25, 71.28, 8.05 |
| **cabin** | Cabin number | String | "C85", "E46", "B96 B98" |
| **embarked** | Port of embarkation | String | C = Cherbourg<br>Q = Queenstown<br>S = Southampton |
| **boat** | Lifeboat number | String | "13", "4", "D" |
| **body** | Body identification number | Integer | For victims recovered |
| **home.dest** | Home/destination | String | "New York, NY", "Montreal, PQ" |

**Note:** Some columns may have missing values (NaN = Not a Number), which is common in real-world datasets.
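
As a small illustration of what NaN is, the sketch below uses plain Python rather than pandas. The variable name `missing_age` is made up for this example. It shows that NaN is a special float value that is not equal to anything, including itself, which is why a dedicated check is used instead of `==`.

```python
import math

# float("nan") creates the special "Not a Number" value
missing_age = float("nan")

# NaN is a float, even though it stands for "no value recorded"
print(type(missing_age))           # <class 'float'>

# NaN is never equal to anything, not even itself
print(missing_age == missing_age)  # False

# Use math.isnan() to test for it instead of ==
print(math.isnan(missing_age))     # True
```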

---

## Part 1: Variables and Data Types

Variables store data for later use. Python has several built-in data types.

### Example: Creating Variables

In [None]:
# Example of different data types
student_name = "Alice Johnson"  # String (text)
student_age = 17                # Integer (whole number)
student_gpa = 3.75              # Float (decimal number)
is_passing = True               # Boolean (True/False)

# Print with descriptive labels
print("Name:", student_name)
print("Age:", student_age)
print("GPA:", student_gpa)
print("Passing:", is_passing)

### Exercise 1.1: Create Your Own Variables

Create the following variables with your own information:
- `my_name` - your full name (string)
- `my_age` - your age (integer)
- `my_gpa` - a GPA value (float)
- `enrolled` - whether you're enrolled in this course (boolean)

Print each variable with a descriptive label.

In [2]:
# Your code here
my_name = "Gavin Waibel"
my_age = 17
my_gpa = 4.00
enrolled = True

# Print statements
print(f"{my_name} is {my_age} and is {enrolled} (enrolled) with a GPA of {my_gpa}")

Gavin Waibel is 17 and is True (enrolled) with a GPA of 4.0


### Example: Checking Data Types

In [None]:
# Use type() to check what kind of data is stored
course_name = "Database Applications"
course_code = 145085

print("Type of course_name:", type(course_name))  # <class 'str'>
print("Type of course_code:", type(course_code))  # <class 'int'>

### Exercise 1.2: Check Data Types

Use the `type()` function to check the data type of each variable you created in Exercise 1.1. Print the results.

In [None]:
# Your code here


### Example: Type Conversion (Track B)

In [None]:
# Sometimes you need to convert between data types
number_as_string = "42"
print("Original type:", type(number_as_string))  # <class 'str'>

# Convert string to integer
number_as_int = int(number_as_string)
print("Converted type:", type(number_as_int))    # <class 'int'>

# Now we can do math with it
result = number_as_int + 8
print("Result:", result)  # 50
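
Not every string converts cleanly. The sketch below, with made-up values, shows one common case: a string containing a decimal point cannot be passed straight to `int()`. It uses `try`/`except`, which this lesson has not covered, only to keep the cell from stopping on the error.

```python
# Converting a whole-number string works directly
whole = int("42")
print(whole + 8)            # 50

# A decimal string cannot go straight to int; int("4.5") raises ValueError
decimal_string = "4.5"
try:
    int(decimal_string)
except ValueError as error:
    print("Conversion failed:", error)

# Convert to float first, then to int (int() drops the decimal part)
as_float = float(decimal_string)
as_int = int(as_float)
print(as_float, as_int)     # 4.5 4
```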

### Exercise 1.3 (Track B): Type Conversion

Create a variable `grade_string = "87"` (a string).

1. Convert it to an integer and store it in `grade_int`
2. Add 5 points to it
3. Print the result with a descriptive message

In [None]:
# Your code here
grade_string = "87"


---

## Part 2: Basic Operations

Python can perform calculations and manipulate strings.

### Example: Mathematical Operations

In [None]:
# Calculate total cost for movie tickets
ticket_price = 10.50
number_of_tickets = 4

total_cost = ticket_price * number_of_tickets

print(f"Total cost for {number_of_tickets} tickets: ${total_cost:.2f}")  # :.2f shows two decimal places

### Exercise 2.1: Calculate Ticket Prices

A theater charges:
- Adult tickets: $12.50
- Child tickets: $7.50

Calculate the total cost for a family with:
- 2 adults
- 3 children

Store the result in a variable called `total_cost` and print it with a descriptive message.

In [None]:
# Your code here
adult_price = 
child_price = 
num_adults = 
num_children = 

total_cost = 

print()


### Example: String Concatenation

In [None]:
# Combining strings together
first = "John"
last = "Smith"
age = 25

# Method 1: Using + operator
full_name = first + " " + last
print(full_name)

# Method 2: Using f-strings (formatted string literals)
message = f"Hello! My name is {first} {last} and I am {age} years old."
print(message)

### Exercise 2.2: String Concatenation

Create three variables:
- `first_name`
- `last_name`
- `favorite_subject`

Combine them to create and print a sentence like:  
"Hello! My name is [first] [last] and my favorite subject is [subject]."

You can use either the + operator or f-strings.

In [None]:
# Your code here
first_name = 
last_name = 
favorite_subject = 

sentence = 

print(sentence)


### Example: Integer Division and Modulus (Track B)

In [None]:
# Integer division (//) gives you the whole number part
# Modulus (%) gives you the remainder

total_students = 47
students_per_team = 5

full_teams = total_students // students_per_team  # How many complete teams?
leftover = total_students % students_per_team     # How many students left over?

print(f"We can make {full_teams} full teams with {leftover} students left over.")
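
To see how `//` and `%` fit together, the following sketch reuses the numbers from the example above. The name `rebuilt` is introduced here only for illustration. It also shows the built-in `divmod()` function, which returns both results at once.

```python
total_students = 47
students_per_team = 5

full_teams = total_students // students_per_team   # 9
leftover = total_students % students_per_team      # 2

# The two results always rebuild the original number
rebuilt = full_teams * students_per_team + leftover
print(rebuilt == total_students)   # True

# divmod() returns both values at once as a tuple
print(divmod(total_students, students_per_team))   # (9, 2)

# Regular division (/) always gives a float instead
print(total_students / students_per_team)          # 9.4
```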

### Exercise 2.3 (Track B): Advanced Calculations

Calculate the following:

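1. If a student scores 87% on a test worth 150 points, how many points did they earn? (Turn the percentage into a decimal with regular division `/`, multiply by the points possible, then convert the result to an integer with `int()`)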
2. A dataset has 1309 rows. If you split it into batches of 50, how many full batches do you get? (Use `//`)
3. How many remaining rows don't fit in a full batch? (Use modulus `%`)

In [None]:
# Your code here


### Example: String Methods (Track B)

In [None]:
# Strings have built-in methods for manipulation
text = "hello world"

print(text.upper())       # HELLO WORLD
print(text.title())       # Hello World
print(text.replace("world", "Python"))  # hello Python
print(text.count("l"))    # 3 (counts how many times 'l' appears)
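
One detail worth knowing: string methods return a new string and leave the original unchanged. The sketch below illustrates this with the same `text` value; the names `shouted` and `greeting` are made up for this example.

```python
text = "hello world"

# Calling a method does not modify the original string
text.upper()
print(text)                 # hello world

# Store the result in a variable to keep it
shouted = text.upper()
print(shouted)              # HELLO WORLD

# Methods can be chained, left to right
greeting = text.replace("world", "python").title()
print(greeting)             # Hello Python
```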

### Exercise 2.4 (Track B): String Methods

Create a variable `passenger_name = "smith, mr. john"`. 

Use string methods to:
1. Convert it to uppercase
2. Convert it to title case (first letter of each word capitalized)
3. Replace "mr." with "Mr."
4. Count how many times the letter 'm' appears (lowercase)

In [None]:
# Your code here
passenger_name = "smith, mr. john"


---

## Part 3: Working with Comments

Comments explain what your code does. They're ignored by Python but help humans understand your work.

### Example: Using Comments

In [None]:
# Calculate the area of a circle
# Formula: area = π × radius²

radius = 5          # Circle radius in centimeters
pi = 3.14159        # Approximation of π (pi)

area = pi * radius ** 2  # ** means "to the power of"

print("Area of circle:", area, "square cm")

### Exercise 3.1: Add Comments

The code below calculates the area of a rectangle. Add comments to explain each step.

In [None]:
length = 10
width = 5
area = length * width
print("The area is:", area)


### Exercise 3.2 (Track B): Write Documented Code

Write code that calculates the perimeter of a rectangle.

**Formula:** perimeter = 2 × length + 2 × width

Use variables for length and width, and add comments explaining each step.

In [None]:
# Your code here


---

## Part 4: Introduction to Pandas

Pandas is a widely used Python library for working with tabular data. We'll use it to load and explore the Titanic dataset.

### Example: Importing Pandas

In [None]:
# Import pandas with the standard alias 'pd'
import pandas as pd

# Now we can use pandas functions by typing pd.function_name()
print("Pandas version:", pd.__version__)

### Exercise 4.1: Import Pandas

Import the pandas library using the standard alias `pd`.

In [None]:
# Your code here


### Example: Loading a CSV File

In [None]:
# Load a CSV file into a DataFrame
# Replace 'sample.csv' with your actual filename
data = pd.read_csv('sample.csv')

# The data is now stored in a DataFrame object
# A DataFrame is like a spreadsheet - it has rows and columns
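
If you want to try a DataFrame without a CSV file, the sketch below builds a tiny one from a dictionary. The data and the name `sample` are made up for illustration and are unrelated to the Titanic file.

```python
import pandas as pd

# Made-up sample data: each key becomes a column,
# and each list holds that column's values
sample = pd.DataFrame({
    "name": ["Ana", "Ben", "Cy"],
    "age": [31.0, 24.0, 40.0],
    "member": [True, False, True],
})

print(sample)
print(len(sample))            # 3 rows
print(list(sample.columns))   # ['name', 'age', 'member']
```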

### Exercise 4.2: Load the Titanic Dataset

Use pandas to read the `Titanic_Dataset.csv` file into a DataFrame called `titanic`.

**Note:** Make sure the CSV file is in the same folder as this notebook!

In [None]:
# Your code here
titanic = 


### Example: Viewing the First Rows

In [None]:
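# View the first 5 rows of any DataFrame
data.head()

# You can specify how many rows to show
data.head(10)  # Shows first 10 rows

# Note: a notebook cell displays only the result of its last line,
# so running this cell as written shows just the 10-row table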

### Exercise 4.3: View the First Rows

Display the first 10 rows of the Titanic dataset using the `.head(10)` method.

In [None]:
# Your code here


### Example: Dataset Information

In [None]:
# Get information about the DataFrame structure
data.info()

# This shows:
# - Number of rows and columns
# - Column names
# - Data types of each column
# - How many non-null (non-missing) values in each column

### Exercise 4.4: Dataset Information

Use the `.info()` method to see the structure of the Titanic dataset.

In [None]:
# Your code here


### Exercise 4.5: Answer Questions

Based on the `.info()` output, answer these questions:

1. How many rows (passengers) are in the dataset?
2. How many columns are in the dataset?
3. What is the data type of the 'age' column?
4. What is the data type of the 'survived' column?
5. Which columns have missing (null) values?

**Your Answers:**

1. Number of rows: 
2. Number of columns: 
3. Data type of 'age': 
4. Data type of 'survived': 
5. Columns with missing values: 

---

## Part 5: Basic DataFrame Exploration

Let's explore the Titanic dataset using pandas methods.

### Example: Common DataFrame Methods

In [None]:
# Useful DataFrame methods and attributes:

data.tail()       # View last rows (method)
data.columns      # Get column names (attribute, no parentheses)
data.shape        # Get (rows, columns) as a tuple (attribute, no parentheses)
data.describe()   # Get statistics for numerical columns
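
The sketch below tries these on a small made-up DataFrame (the name `sample` and its contents are assumptions for illustration). It uses `print()` so that every result is shown, and it unpacks the `.shape` tuple into two variables.

```python
import pandas as pd

# Made-up sample data with 3 rows and 2 columns
sample = pd.DataFrame({
    "city": ["Lima", "Oslo", "Cairo"],
    "visits": [3, 1, 2],
})

# .shape and .columns are attributes, so they take no parentheses
print(sample.shape)                  # (3, 2)

# A tuple can be unpacked into separate variables
row_count, column_count = sample.shape
print(f"{row_count} rows and {column_count} columns")

# .tail() and .describe() are methods, so they need parentheses
print(sample.tail(2))
```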

### Exercise 5.1: View Last Rows

Display the last 10 rows of the dataset using the `.tail(10)` method.

In [None]:
# Your code here


### Exercise 5.2: Get Column Names

Print all the column names in the dataset using `.columns`.

In [None]:
# Your code here


### Exercise 5.3: Get Dataset Shape

Print the shape (rows, columns) of the dataset using `.shape`.

**Hint:** This will return a tuple like (1309, 14) meaning 1309 rows and 14 columns.

In [None]:
# Your code here


### Example: Descriptive Statistics

In [None]:
# Get basic statistics about numerical columns
data.describe()

# This shows:
# count - number of non-missing values
# mean - average value
# std - standard deviation (measure of spread)
# min - minimum value
# 25% - first quartile
# 50% - median (middle value)
# 75% - third quartile
# max - maximum value
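
To get a feel for what these numbers mean, the sketch below computes several of them by hand for a short made-up list, using Python's built-in `statistics` module instead of pandas. Quartiles are left out because different tools calculate them in slightly different ways.

```python
import statistics

# Made-up sample values
ages = [22.0, 38.0, 26.0, 35.0, 54.0]

print(len(ages))                  # count: 5
print(statistics.mean(ages))      # mean: 35.0
print(statistics.median(ages))    # 50%: 35.0 (middle value once sorted)
print(min(ages), max(ages))       # min and max: 22.0 54.0

# Sample standard deviation; pandas' .std() uses the same definition by default
print(round(statistics.stdev(ages), 2))   # 12.45
```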

### Exercise 5.4: Basic Statistics

Use the `.describe()` method to see basic statistics about numerical columns in the Titanic dataset.

In [None]:
# Your code here


### Exercise 5.5: Interpret Statistics

Based on the `.describe()` output, answer these questions:

1. What is the average (mean) age of passengers?
2. What is the maximum fare paid?
3. What percentage of passengers survived? (Hint: Look at the 'survived' column mean - it will be a decimal between 0 and 1)
4. What is the median (50th percentile) fare?
5. What is the standard deviation of age?
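
The hint in question 3 relies on a general property of columns that hold only 0s and 1s. Here is a minimal sketch with made-up values (the names `flags` and `fraction_yes` are illustrative only):

```python
# Made-up sample data: 1 = yes, 0 = no
flags = [1, 0, 0, 1, 0]

# Adding up 0s and 1s counts the 1s, so the mean is the fraction of 1s
fraction_yes = sum(flags) / len(flags)
print(fraction_yes)                   # 0.4

# Multiply by 100 to express the fraction as a percentage
print(f"{fraction_yes * 100:.1f}%")   # 40.0%
```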

**Your Answers:**

1. Average age: 
2. Maximum fare: 
3. Survival rate (as percentage): 
4. Median fare: 
5. Standard deviation of age: 

---

## Part 6 (Track B): Advanced DataFrame Operations

Perform more advanced exploration and analysis.

### Example: Selecting Columns

In [None]:
# Select a single column (returns a Series)
names = data['name']

# Select multiple columns (returns a DataFrame)
subset = data[['name', 'age', 'sex']]

# Display first few rows
subset.head()
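
The difference between single and double brackets can be checked with `type()`. The sketch below uses a small made-up DataFrame named `sample`, not the Titanic data.

```python
import pandas as pd

# Made-up sample data
sample = pd.DataFrame({
    "name": ["Ana", "Ben", "Cy"],
    "age": [31.0, 24.0, 40.0],
    "city": ["Lima", "Oslo", "Cairo"],
})

one_column = sample["name"]             # single brackets, one name
two_columns = sample[["name", "age"]]   # double brackets, a list of names

print(type(one_column).__name__)        # Series
print(type(two_columns).__name__)       # DataFrame

# A list with a single name still returns a DataFrame
print(type(sample[["name"]]).__name__)  # DataFrame
```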

### Exercise 6.1: Select a Single Column

Select and display the 'name' column from the Titanic dataset. Show the first 10 entries using `.head(10)`.

In [None]:
# Your code here


### Exercise 6.2: Select Multiple Columns

Create a new DataFrame containing only these columns: 'name', 'sex', 'age', 'survived'.

Display the first 10 rows.

In [None]:
# Your code here


### Example: Value Counts

In [None]:
# Count how many times each value appears in a column
# General pattern: data['column_name'].value_counts()

# For example, to count genders:
data['sex'].value_counts()
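
Here is a small sketch of what `.value_counts()` returns, using a made-up Series of colors. It also shows the optional `normalize=True` argument, which reports each value's share of the total instead of a count.

```python
import pandas as pd

# Made-up sample data
colors = pd.Series(["red", "blue", "red", "green", "red", "blue"])

counts = colors.value_counts()
print(counts)              # sorted from most to least frequent
print(counts["red"])       # 3

# normalize=True returns fractions of the total instead of counts
shares = colors.value_counts(normalize=True)
print(shares["red"])       # 0.5
```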

### Exercise 6.3: Count Values

Use `.value_counts()` to answer these questions:

1. How many passengers were in each class (pclass)?
2. How many male vs. female passengers?
3. How many passengers survived vs. died?

In [None]:
# Passenger class distribution


In [None]:
# Gender distribution


In [None]:
# Survival distribution


### Example: Column Statistics

In [None]:
# Calculate specific statistics for a column
data['age'].min()      # Minimum age
data['age'].max()      # Maximum age
data['age'].mean()     # Average age
data['age'].sum()      # Sum of all ages
data['age'].median()   # Median age

### Exercise 6.4: Calculate Specific Statistics

Calculate the following:

1. Minimum age in the dataset
2. Maximum fare in the dataset
3. Total number of siblings/spouses (sum of 'sibsp' column)
4. Average fare for all passengers

In [None]:
# Your code here


### Example: Checking for Missing Data

In [None]:
# Count missing values in each column
data.isnull().sum()

# This returns how many NaN (missing) values are in each column
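
To see why `.isnull().sum()` works, the sketch below applies it to a small made-up DataFrame in which `None` marks the missing entries. The names `sample` and `missing` are assumptions for illustration.

```python
import pandas as pd

# Made-up sample data; None marks a missing entry
sample = pd.DataFrame({
    "name": ["Ana", "Ben", "Cy", "Di"],
    "age": [31.0, None, 40.0, None],
    "city": ["Lima", "Oslo", None, "Rome"],
})

# isnull() marks each cell True (missing) or False (present)
print(sample.isnull())

# True counts as 1 when summed, giving missing values per column
missing = sample.isnull().sum()
print(missing)

# Sort so the columns with the most missing values come first
print(missing.sort_values(ascending=False))
```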

### Exercise 6.5: Check for Missing Data

Use `.isnull().sum()` to count how many missing values exist in each column.

Which columns have the most missing data?

In [None]:
# Your code here


**Answer:** The columns with the most missing data are: 

### Exercise 6.6: Data Type Identification

Look at the dataset and the data dictionary. Answer these questions:

1. Which columns contain **strings (text)**?
2. Which columns contain **integers**?
3. Which columns contain **floats (decimals)**?
4. Why do you think 'survived' is stored as an integer (0 or 1) instead of a boolean (True/False)?

**Your Answers:**

1. String columns: 
2. Integer columns: 
3. Float columns: 
4. Why survived is an integer: 

---

## Part 7 (Track B): Critical Thinking

Answer these questions based on your exploration.

### Exercise 7.1: Dataset Context

Based on what you've learned about the Titanic dataset, answer these questions:

1. What does the 'pclass' column represent? What values does it contain?
2. What does a 'survived' value of 1 mean? What about 0?
3. Why might the 'cabin' column have so many missing values?
4. What could the 'embarked' column represent? (Hint: Look at the values - they're single letters. Check the data dictionary!)
5. Why might the 'age' column have so many missing values? 

**Your Answers:**

1. 
2. 
3. 
4. 
5. 

---

## Submission Checklist

Before submitting, make sure you have:

- [ ] Completed all exercises (including Track B sections)
- [ ] Run all cells successfully (no errors)
- [ ] Added your name and date at the top
- [ ] Answered all written questions
- [ ] Saved the notebook as `dbAppsTask01TrackB.ipynb`
- [ ] Pushed to your `databaseApplications` repository on GitHub
- [ ] Verified the file appears on GitHub

**Excellent work on completing the Track B challenges!**