# **Notebook 1: Python Fundamentals for Historians**

Welcome to Python for Historians! This notebook is your first step into programming. You're not becoming a software developer - you're learning to use Python as a powerful research tool.

Think of Python as a tireless research assistant that never gets bored with repetitive tasks. You just need to learn how to give it clear instructions.

**What you'll learn:**
- Storing information in variables
- Working with different types of data (text, numbers, lists)
- Cleaning messy historical data
- Creating reusable functions
- Managing collections of historical information

**Why this matters for historians:**
These skills will let you clean thousands of documents, analyze large datasets, and automate repetitive research tasks.

## Step 1: Storing Information with Variables

The most fundamental concept in programming is storing information. We do this with variables - think of them as labeled containers for data.

**Step 1a: Your first variable**

Let's start with historical information. Copy this code:
```python
historical_figure = "Ada Lovelace"
print(historical_figure)
```

In [None]:
# Your first variable - copy and run this code
historical_figure = "Ada Lovelace"
print(historical_figure)

**Step 1b: Different types of data**

Python can store different types of information. Let's see the main types historians use:

Copy this code:
```python
# Text (called strings) - always in quotes
name = "Marie Curie"
nationality = "Polish-French"

# Whole numbers (called integers)
birth_year = 1867
nobel_prizes = 2

# True/False values (called booleans)
first_woman_nobel = True

print(f"Name: {name}")
print(f"Birth year: {birth_year}")
print(f"First woman to win Nobel Prize: {first_woman_nobel}")
```

In [None]:
# Different types of data - copy and run this code
# Text (called strings) - always in quotes
name = "Marie Curie"
nationality = "Polish-French"

# Whole numbers (called integers)
birth_year = 1867
nobel_prizes = 2

# True/False values (called booleans)
first_woman_nobel = True

print(f"Name: {name}")
print(f"Birth year: {birth_year}")
print(f"First woman to win Nobel Prize: {first_woman_nobel}")
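
If you are ever unsure what kind of data a variable holds, Python can tell you. The sketch below reuses the Marie Curie variables and adds one made-up measurement, `page_width_cm`, purely to illustrate a fourth common type, the decimal number (called a float):

```python
name = "Marie Curie"        # str (text)
birth_year = 1867           # int (whole number)
first_woman_nobel = True    # bool (True/False)
page_width_cm = 21.5        # float (decimal number); a hypothetical value for illustration

# type() reports the kind of data stored in each variable
print(type(name))
print(type(birth_year))
print(type(first_woman_nobel))
print(type(page_width_cm))
```

Knowing the type matters because it decides what you can do with a value: you can subtract two integers, but not an integer from a string.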

**Step 1c: Understanding f-strings**

The `f"..."` syntax is called an f-string. It's the easiest way to combine text and variables.

In [None]:
# F-strings let you insert variables into text
person = "Leonardo da Vinci"
century = 15

# The variables go inside {curly braces}
description = f"{person} was a Renaissance genius from the {century}th century."
print(description)

# You can also do calculations inside f-strings
current_year = 2024
birth_year = 1452
print(f"If {person} were alive today, he would be {current_year - birth_year} years old.")
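
F-strings can also control how a value is displayed. As a small sketch, the code below adds a format instruction after a colon inside the braces. The numbers are invented for illustration and are not sourced statistics:

```python
population = 3689257    # illustrative figure only
literacy_share = 0.4567 # illustrative proportion only

# :, adds thousands separators
print(f"Population: {population:,}")

# :.1% shows a proportion as a percentage with one decimal place
print(f"Share: {literacy_share:.1%}")

# :04d pads a whole number with leading zeros to four digits
print(f"Year: {987:04d}")
```

This is handy when you print tables of figures and want them to line up or read naturally.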

**Step 1d: Extracting information from text**

Historians often need to extract specific information from sentences. Let's learn to parse birthdates from biographical text:

In [None]:
biography = "Emily Carr was born in 1871 in Victoria."
print(f"Original sentence: {biography}")

# Split the sentence into individual words
words = biography.split()
print(f"Words: {words}")
      
# Find the birth year: it is the sixth word, so it sits at index 5 (counting starts at 0)
birth_year_string = words[5]
# At this point the year is still text, not a number
print(f"Birth year as string: '{birth_year_string}' (type: {type(birth_year_string)})")
            
# Convert string to integer so we can do math with it
birth_year_integer = int(birth_year_string)
print(f"Birth year as integer: {birth_year_integer} (type: {type(birth_year_integer)})")
      
# Now we can do calculations!
current_year = 2024  # kept consistent with the other cells; update as needed
age_if_alive = current_year - birth_year_integer
      
print(f"If Emily Carr were alive today, she would be {age_if_alive} years old.")
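
Picking the word at index 5 only works when every sentence has the same shape. A slightly more flexible sketch is shown below: it checks each word and keeps the first one that looks like a four-digit year. It uses a `for` loop and an `if` statement, which Steps 5 and 7 explain properly, and the name `found_year` is just an illustrative choice:

```python
biography = "Emily Carr was born in 1871 in Victoria."

found_year = None
for word in biography.split():
    # Remove punctuation that may be stuck to the word, such as a final period
    cleaned = word.strip(".,;:")
    # Keep the first word made only of digits that is four characters long
    if cleaned.isdigit() and len(cleaned) == 4:
        found_year = int(cleaned)
        break

print(f"Year found: {found_year}")
```

This is still a rough heuristic: it would also pick up any other four-digit number, such as a page count, so always spot-check the results against your sources.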

### 🔄 **Your Turn: Extract and Convert a Date**
 
Practice extracting a year from a sentence and converting it from a string to an integer.

Given this sentence: "The Constitution Act was passed in 1982."

1. Find the year in the sentence
2. Convert it from a string to an integer
3. Calculate how many years ago that was
4. Print the result

In [None]:
# Your exercise: Extract and convert a date
historical_fact = "The Constitution Act was passed in 1982."
print(f"Sentence: {historical_fact}")

# 1. Find the year: it is the seventh word (index 6), with a final period to strip off
year_string = historical_fact.split()[6].strip(".")
print(f"Year as string: '{year_string}' (type: {type(year_string)})")

# 2. Convert to integer
year_integer = int(year_string)
print(f"Year as integer: {year_integer} (type: {type(year_integer)})")

# 3. Calculate years ago
current_year = 2024
years_ago = current_year - year_integer

# 4. Print result
print(f"The Constitution Act was passed {years_ago} years ago.")

### 🔄 **Your Turn: Create Historical Variables**

Now create variables for a historical figure of your choice. Include:
1. Their name (string)
2. Birth year (integer)
3. A significant achievement (string)
4. Whether they're still alive (boolean)

Then use an f-string to create a description.

In [None]:
# Your exercise: Create variables for a historical figure
# Example with Louis Riel - replace with your choice
name = "Louis Riel"
birth_year = 1844
achievement = "Leader of Métis resistance movements"
still_alive = False

# Create an f-string description
description = f"{name} was born in {birth_year}. He was known for {achievement}. Still alive: {still_alive}"
print(description)

# Try with your own historical figure:
# name = "Your choice"
# birth_year = ...
# achievement = "..."
# still_alive = True or False
# description = f"..."
# print(description)

## Step 2: Working with Text (Strings)

Historical sources are mostly text, so understanding strings is crucial. Python has powerful tools for cleaning and manipulating text.

**Step 2a: String methods - the basics**

Methods are actions you can perform on data. For strings, common methods include cleaning and formatting:

Copy this code:
```python
messy_title = "  THE DECLARATION of INDEPENDENCE  "

# .strip() removes extra spaces from beginning and end
no_spaces = messy_title.strip()
print(f"After strip(): '{no_spaces}'")

# .lower() makes everything lowercase
lowercase = messy_title.lower()
print(f"After lower(): '{lowercase}'")

# .title() capitalizes the first letter of each word
title_case = messy_title.title()
print(f"After title(): '{title_case}'")
```

In [None]:
# String methods - copy and run this code
messy_title = "  THE DECLARATION of INDEPENDENCE  "

# .strip() removes extra spaces from beginning and end
no_spaces = messy_title.strip()
print(f"After strip(): '{no_spaces}'")

# .lower() makes everything lowercase
lowercase = messy_title.lower()
print(f"After lower(): '{lowercase}'")

# .title() capitalizes the first letter of each word
title_case = messy_title.title()
print(f"After title(): '{title_case}'")
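
One detail worth seeing for yourself: string methods never change the original text. They hand back a new string, which you store in a new variable. The short sketch below also shows two checks that are useful when sorting through titles, `.startswith()` and the `in` keyword:

```python
messy_title = "  THE DECLARATION of INDEPENDENCE  "
tidy_title = messy_title.strip()

# repr() shows the quotes and spaces exactly as stored
print(repr(messy_title))   # the original still has its extra spaces
print(repr(tidy_title))    # the new string does not

# Simple yes/no questions about a string
print(tidy_title.startswith("THE"))
print("of" in tidy_title)
```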

**Step 2b: The replace() method**

One of the most useful methods for historians is `.replace()` - it finds and replaces text:

Copy this code:
```python
manuscript_note = "John Smith---born 1650---died 1725"

# Replace the dashes with commas
cleaned_note = manuscript_note.replace("---", ", ")
print(f"Original: {manuscript_note}")
print(f"Cleaned:  {cleaned_note}")

# You can chain multiple replacements
scan_data = "[DAMAGED] King Charles II [UNCLEAR] ruled from 1660"
clean_scan = scan_data.replace("[DAMAGED]", "").replace("[UNCLEAR]", "")
print(f"\nOriginal scan: {scan_data}")
print(f"Cleaned scan:  {clean_scan}")
```

In [None]:
# The replace() method - copy and run this code
manuscript_note = "John Smith---born 1650---died 1725"

# Replace the dashes with commas
cleaned_note = manuscript_note.replace("---", ", ")
print(f"Original: {manuscript_note}")
print(f"Cleaned:  {cleaned_note}")

# You can chain multiple replacements
scan_data = "[DAMAGED] King Charles II [UNCLEAR] ruled from 1660"
clean_scan = scan_data.replace("[DAMAGED]", "").replace("[UNCLEAR]", "")
print(f"\nOriginal scan: {scan_data}")
print(f"Cleaned scan:  {clean_scan}")
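
Removing a tag with `.replace()` leaves the spaces around it behind, so the cleaned scan above still starts with a space and has a double space in the middle. One common way to tidy this, sketched below, is to split the text into words and join them back together with single spaces:

```python
scan_data = "[DAMAGED] King Charles II [UNCLEAR] ruled from 1660"
without_tags = scan_data.replace("[DAMAGED]", "").replace("[UNCLEAR]", "")
print(repr(without_tags))   # note the leading space and the double space

# split() with no arguments breaks on any run of whitespace;
# " ".join(...) glues the words back together with one space each
tidy_scan = " ".join(without_tags.split())
print(repr(tidy_scan))
```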

**Step 2c: Chaining methods together**

You can combine multiple string methods in one line. This is very powerful for cleaning historical data:

In [None]:
# Messy historical document title
messy_document = "  THE MAGNA CARTA---original document [scan quality: poor]  "

# Clean it in one line by chaining methods
# (strip() runs after the replacements so it also trims the space left behind by the removed note)
clean_document = messy_document.replace("---", ": ").replace("[scan quality: poor]", "").strip().title()

print(f"Original: {messy_document}")
print(f"Cleaned:  {clean_document}")

# Let's break this down step by step to understand what happened:
step1 = messy_document.replace("---", ": ")  # Replace dashes
step2 = step1.replace("[scan quality: poor]", "")  # Remove scan note
step3 = step2.strip()  # Remove leftover spaces at both ends
step4 = step3.title()  # Make title case

print("\nStep by step:")
print(f"1. After replace(): '{step1}'")
print(f"2. After second replace(): '{step2}'")
print(f"3. After strip(): '{step3}'")
print(f"4. After title(): '{step4}'")

### 🔄 **Your Turn: Clean a Historical Document Title**

Practice cleaning this messy archival document title:
```
"  LETTERS FROM NEW FRANCE---volume III [digitized from microfilm]  "
```

Your task:
1. Remove extra spaces
2. Replace "---" with ": "
3. Remove the "[digitized from microfilm]" note
4. Make it proper title case

Try both step-by-step and chained approaches.

In [None]:
# Your exercise: Clean the document title
messy_title = "  LETTERS FROM NEW FRANCE---volume III [digitized from microfilm]  "

# Method 1: Step by step
# (strip() comes after the replacements so it also trims the space left by the removed note)
step1 = messy_title.replace("---", ": ")
step2 = step1.replace("[digitized from microfilm]", "")
step3 = step2.strip()
step4 = step3.title()

print("Step by step approach:")
print(f"1. After replace(): '{step1}'")
print(f"2. After second replace(): '{step2}'")
print(f"3. After strip(): '{step3}'")
print(f"4. After title(): '{step4}'")

# Method 2: Chained (all in one line)
clean_title = messy_title.replace("---", ": ").replace("[digitized from microfilm]", "").strip().title()

print(f"\nChained approach result: '{clean_title}'")
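
A caution about `.title()`: it capitalizes the first letter of every word and lowercases the rest, which mangles Roman numerals such as the "III" in this exercise. The sketch below shows the problem and one possible workaround, patching the numeral back afterwards. This is a quick fix for a single known case, not a general solution:

```python
raw_title = "letters from new france: volume III"

title_cased = raw_title.title()
print(title_cased)   # the Roman numeral is no longer all capitals

# One possible workaround: restore the numeral after title-casing
repaired = title_cased.replace("Iii", "III")
print(repaired)
```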

## Step 3: Collections of Data (Lists)

Historians often work with multiple pieces of information. Lists let you store and work with collections of data.

**Step 3a: Creating your first list**

Lists are created with square brackets `[]` and can hold multiple items:

Copy this code:
```python
# A list of French kings
french_kings = ["Louis XIV", "Louis XV", "Louis XVI"]
print(f"French kings: {french_kings}")

# Lists can hold numbers too
reign_years = [1643, 1715, 1774]
print(f"Reign start years: {reign_years}")

# How many items in the list?
print(f"Number of kings: {len(french_kings)}")
```

In [None]:
# Creating your first list - copy and run this code
# A list of French kings
french_kings = ["Louis XIV", "Louis XV", "Louis XVI"]
print(f"French kings: {french_kings}")

# Lists can hold numbers too
reign_years = [1643, 1715, 1774]
print(f"Reign start years: {reign_years}")

# How many items in the list?
print(f"Number of kings: {len(french_kings)}")
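
Once data is in a list, you can ask simple questions about it. As a brief sketch, the code below checks whether an item is present and finds the earliest and latest years using Python's built-in `min()`, `max()`, and `sorted()`:

```python
french_kings = ["Louis XIV", "Louis XV", "Louis XVI"]
reign_years = [1643, 1715, 1774]

# Is a particular name in the list?
print("Louis XV" in french_kings)
print("Henri IV" in french_kings)

# Earliest and latest values
print(f"Earliest: {min(reign_years)}, latest: {max(reign_years)}")

# sorted() returns a new list; the original stays as it was
newest_first = sorted(reign_years, reverse=True)
print(newest_first)
```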

**Step 3b: Accessing individual items**

You can get specific items from a list using their position (called an index). Python starts counting from 0:

Copy this code:
```python
canadian_provinces = ["Ontario", "Quebec", "British Columbia", "Alberta"]

# Get items by position (starting from 0)
first_province = canadian_provinces[0]
second_province = canadian_provinces[1]
last_province = canadian_provinces[-1]  # -1 gets the last item

print(f"First province: {first_province}")
print(f"Second province: {second_province}")
print(f"Last province: {last_province}")
```

In [None]:
# Accessing individual items - copy and run this code
canadian_provinces = ["Ontario", "Quebec", "British Columbia", "Alberta"]

# Get items by position (starting from 0)
first_province = canadian_provinces[0]
second_province = canadian_provinces[1]
last_province = canadian_provinces[-1]  # -1 gets the last item

print(f"First province: {first_province}")
print(f"Second province: {second_province}")
print(f"Last province: {last_province}")
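
Indexing returns one item; slicing returns several at once. The sketch below uses the `start:stop` form inside the square brackets, where the item at `stop` is not included:

```python
canadian_provinces = ["Ontario", "Quebec", "British Columbia", "Alberta"]

first_two = canadian_provinces[:2]    # positions 0 and 1
middle_two = canadian_provinces[1:3]  # positions 1 and 2
last_two = canadian_provinces[-2:]    # the final two items

print(first_two)
print(middle_two)
print(last_two)
```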

**Step 3c: Adding items to lists**

Lists can grow and change. You can add new items:

In [None]:
# Start with an empty list
important_battles = []
print(f"Starting list: {important_battles}")

# Add items one by one
important_battles.append("Battle of Hastings")
important_battles.append("Battle of Waterloo")
important_battles.append("Battle of Gettysburg")

print(f"After adding battles: {important_battles}")
print(f"Number of battles: {len(important_battles)}")

# You can also start with items and add more
ancient_philosophers = ["Socrates", "Plato"]
ancient_philosophers.append("Aristotle")
print(f"Philosophers: {ancient_philosophers}")
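
`.append()` always adds one item to the end. A few other list methods cover the remaining everyday cases; the sketch below shows `.extend()`, `.insert()`, and `.remove()` on the philosophers list. The extra names are there only to have something to add and remove:

```python
ancient_philosophers = ["Socrates", "Plato"]

# extend() adds every item from another list
ancient_philosophers.extend(["Aristotle", "Epicurus"])

# insert() places an item at a chosen position (0 = the front)
ancient_philosophers.insert(0, "Thales")

# remove() deletes the first matching item
ancient_philosophers.remove("Epicurus")

print(ancient_philosophers)
```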

### 🔄 **Your Turn: Build a Historical Timeline**

Create a list of important events in Canadian history. Start with an empty list and add events one by one:

1. Start with an empty list called `canadian_events`
2. Add at least 4 historical events using `.append()`
3. Print the list and its length
4. Print the first and last events using indexing

In [None]:
# Your exercise: Build a Canadian history timeline
# 1. Start with empty list
canadian_events = []

# 2. Add events with .append()
canadian_events.append("Confederation (1867)")
canadian_events.append("Constitution Act (1982)")
canadian_events.append("Battle of Plains of Abraham (1759)")
canadian_events.append("Women get federal vote (1918)")

# 3. Print the list and length
print(f"Canadian events: {canadian_events}")
print(f"Number of events: {len(canadian_events)}")

# 4. Print first and last events
print(f"First event: {canadian_events[0]}")
print(f"Last event: {canadian_events[-1]}")

# Add your own events:
# canadian_events.append("Your event here")
# print(f"Updated list: {canadian_events}")

## Step 4: Organizing Information (Dictionaries)

Sometimes you need to store related pieces of information together. Dictionaries let you create structured records - perfect for historical data.

**Step 4a: Your first dictionary**

Dictionaries store information in key-value pairs. Think of them like a filing system:

Copy this code:
```python
# Information about a historical figure
napoleon = {
    "name": "Napoleon Bonaparte",
    "birth_year": 1769,
    "nationality": "French",
    "occupation": "Emperor"
}

print(f"Historical figure: {napoleon}")
```

In [None]:
# Your first dictionary - copy and run this code
# Information about a historical figure
napoleon = {
    "name": "Napoleon Bonaparte",
    "birth_year": 1769,
    "nationality": "French",
    "occupation": "Emperor"
}

print(f"Historical figure: {napoleon}")

**Step 4b: Accessing dictionary information**

You get information from a dictionary using the key names:

Copy this code:
```python
# Get specific information
name = napoleon["name"]
birth_year = napoleon["birth_year"]

print(f"Name: {name}")
print(f"Born: {birth_year}")

# Use in f-strings
description = f"{napoleon['name']} was born in {napoleon['birth_year']} and was {napoleon['nationality']}."
print(description)
```

In [None]:
# Accessing dictionary information - copy and run this code
# Get specific information
name = napoleon["name"]
birth_year = napoleon["birth_year"]

print(f"Name: {name}")
print(f"Born: {birth_year}")

# Use in f-strings
description = f"{napoleon['name']} was born in {napoleon['birth_year']} and was {napoleon['nationality']}."
print(description)
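
Historical records are often incomplete, and asking a dictionary for a key it does not have stops your code with an error. A gentler option, sketched below, is the `.get()` method, which returns a fallback value when the key is missing. The `burial_place` key is a hypothetical example of a field that was never recorded:

```python
napoleon = {
    "name": "Napoleon Bonaparte",
    "birth_year": 1769,
    "nationality": "French",
    "occupation": "Emperor"
}

# The key exists, so .get() returns its value
print(napoleon.get("birth_year"))

# The key is missing, so .get() returns the fallback instead of raising an error
burial_place = napoleon.get("burial_place", "not recorded")
print(f"Burial place: {burial_place}")
```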

**Step 4c: Adding and changing information**

Dictionaries can be updated with new information:

In [None]:
# Add new information
napoleon["death_year"] = 1821
napoleon["place_of_death"] = "St. Helena"

print(f"Updated information: {napoleon}")

# Change existing information
napoleon["occupation"] = "Former Emperor"
print(f"Updated occupation: {napoleon['occupation']}")

# Calculate age at death
age_at_death = napoleon["death_year"] - napoleon["birth_year"]
print(f"{napoleon['name']} died at age {age_at_death}")

### 🔄 **Your Turn: Create a Historical Record**

Create a dictionary for a Canadian historical figure of your choice. Include:
1. name
2. birth_year
3. province_or_territory
4. occupation
5. major_achievement

Then add death_year (if applicable) and create a biographical sentence using f-strings.

In [None]:
# Your exercise: Create a Canadian historical figure dictionary
# Example with Emily Carr
canadian_figure = {
    "name": "Emily Carr",
    "birth_year": 1871,
    "province_or_territory": "British Columbia",
    "occupation": "Artist",
    "major_achievement": "Painting Indigenous culture and West Coast landscapes"
}

# Add death_year if needed
canadian_figure["death_year"] = 1945

# Create a biographical sentence
biography = f"{canadian_figure['name']} was born in {canadian_figure['birth_year']} in {canadian_figure['province_or_territory']}. She was a {canadian_figure['occupation']} known for {canadian_figure['major_achievement']}."
print(biography)

# Calculate lifespan
lifespan = canadian_figure["death_year"] - canadian_figure["birth_year"]
print(f"Emily Carr lived for {lifespan} years.")

# Try with your own Canadian figure:
# my_figure = {
#     "name": "...",
#     "birth_year": ...,
#     "province_or_territory": "...",
#     "occupation": "...",
#     "major_achievement": "..."
# }

## Step 5: Making Decisions with Conditions

Often in historical research, you need to categorize or filter information based on certain criteria. Python's `if` statements let you make decisions.

**Step 5a: Basic if statements**

Copy this code:
```python
# Categorize historical periods
year = 1066

if year < 1000:
    period = "Early Medieval"
elif year < 1500:
    period = "Late Medieval"
else:
    period = "Early Modern"

print(f"The year {year} is in the {period} period.")
```

In [None]:
# Basic if statements - copy and run this code
# Categorize historical periods
year = 1066

if year < 1000:
    period = "Early Medieval"
elif year < 1500:
    period = "Late Medieval"
else:
    period = "Early Modern"

print(f"The year {year} is in the {period} period.")
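
Conditions can be combined with `and`, `or`, and `not`, and Python also lets you write a range check in one expression. The sketch below is only an illustration of the syntax; the variable names and the "round-number date" idea are invented for the example:

```python
year = 1066

# A chained comparison: is the year between 1001 and 1100 inclusive?
in_eleventh_century = 1001 <= year <= 1100

# "or" is True when at least one side is True
is_round_date = year % 100 == 0 or year % 50 == 0

# "and" needs both sides to be True; "not" flips True/False
if in_eleventh_century and not is_round_date:
    note = "11th century, not a round-number date"
else:
    note = "outside the 11th century, or a round-number date"

print(f"{year}: {note}")
```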

**Step 5b: Conditions with historical data**

Let's use conditions to analyze our historical figures:

In [None]:
# Analyze a historical figure
marie_curie = {
    "name": "Marie Curie",
    "birth_year": 1867,
    "death_year": 1934,
    "nationality": "Polish-French",
    "nobel_prizes": 2
}

# Calculate lifespan
lifespan = marie_curie["death_year"] - marie_curie["birth_year"]

# Categorize by lifespan
if lifespan < 50:
    lifespan_category = "short life"
elif lifespan < 70:
    lifespan_category = "average life"
else:
    lifespan_category = "long life"

# Analyze Nobel Prize achievements
if marie_curie["nobel_prizes"] > 1:
    achievement_level = "exceptional"
else:
    achievement_level = "notable"

print(f"{marie_curie['name']} lived {lifespan} years ({lifespan_category})")
print(f"With {marie_curie['nobel_prizes']} Nobel Prizes, she had {achievement_level} achievements.")

### 🔄 **Your Turn: Categorize Historical Periods**

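Write `if`/`elif`/`else` logic that sorts years into Canadian historical periods. The sample solution wraps this logic in a `for` loop, which Step 7 covers in detail; for now, read it as "repeat this block for each year in the list".
- Before 1534: Pre-European Contact
- 1534-1762: French Colonial Period
- 1763-1866: British Colonial Period
- 1867 and after: Confederation Era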

Test it with several important dates in Canadian history.

In [None]:
# Your exercise: Categorize Canadian historical periods
# Test with these years: 1534, 1608, 1759, 1867, 1982

test_years = [1534, 1608, 1759, 1867, 1982]

print("Categorizing Canadian Historical Periods:")
print("=" * 40)

for year in test_years:
    # Categorize each year
    if year < 1534:
        period = "Pre-European Contact"
    elif year < 1763:
        period = "French Colonial Period"
    elif year < 1867:
        period = "British Colonial Period"
    else:
        period = "Confederation Era"
    
    print(f"{year}: {period}")

# Try with your own important dates:
# my_year = 1755  # Example: Deportation of Acadians
# Add your categorization logic here

## Step 6: Creating Reusable Code (Functions)

When you find yourself doing the same task repeatedly, it's time to create a function. Functions are reusable blocks of code that make your work more efficient.

**Step 6a: Your first function**

Functions are defined with `def` and can take input parameters:

Copy this code:
```python
def clean_document_title(messy_title):
    """Clean a messy historical document title"""
    clean_title = messy_title.replace("---", ": ").replace("[scan]", "").replace("[digitized]", "").replace("[manuscript]", "").strip().title()
    return clean_title

# Test the function
messy1 = "  the federalist papers---number 10 [scan]  "
messy2 = "  COMMON SENSE---by thomas paine [scan]  "

clean1 = clean_document_title(messy1)
clean2 = clean_document_title(messy2)

print(f"Original: {messy1}")
print(f"Cleaned:  {clean1}")
print(f"\nOriginal: {messy2}")
print(f"Cleaned:  {clean2}")
```

In [None]:
# Define our document cleaning function first
def clean_document_title(messy_title):
    """Clean a messy historical document title"""
    clean_title = messy_title.replace("---", ": ").replace("[scan]", "").replace("[digitized]", "").replace("[manuscript]", "").strip().title()
    return clean_title

# Your first function - copy and run this code
messy1 = "  the federalist papers---number 10 [scan]  "
messy2 = "  COMMON SENSE---by thomas paine [scan]  "

clean1 = clean_document_title(messy1)
clean2 = clean_document_title(messy2)

print(f"Original: {messy1}")
print(f"Cleaned:  {clean1}")
print(f"\nOriginal: {messy2}")
print(f"Cleaned:  {clean2}")

**Step 6b: Functions with multiple parameters**

Functions can take multiple inputs to make them more flexible:

Copy this code:
```python
def calculate_age_at_event(birth_year, event_year, person_name, event_name):
    """Calculate how old someone was when an event occurred"""
    age = event_year - birth_year
    return f"{person_name} was {age} years old during {event_name} ({event_year})"

# Test with historical examples
result1 = calculate_age_at_event(1732, 1776, "George Washington", "American Revolution")
result2 = calculate_age_at_event(1815, 1867, "John A. Macdonald", "Canadian Confederation")

print(result1)
print(result2)
```

In [None]:
def calculate_age_at_event(birth_year, event_year, person_name, event_name):
    """Calculate how old someone was when an event occurred"""
    age = event_year - birth_year
    return f"{person_name} was {age} years old during {event_name} ({event_year})"

# Test with historical examples
result1 = calculate_age_at_event(1732, 1776, "George Washington", "American Revolution")
result2 = calculate_age_at_event(1815, 1867, "John A. Macdonald", "Canadian Confederation")

print(result1)
print(result2)
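
With four parameters it is easy to pass values in the wrong order. Two features help: you can name the arguments when calling a function, and you can give parameters default values so they become optional. The sketch below uses a separate, hypothetical function name, `describe_age_at_event`, so it does not overwrite the function defined above:

```python
def describe_age_at_event(birth_year, event_year, person_name="This person", event_name="the event"):
    """Variant of calculate_age_at_event with default values (illustrative only)"""
    age = event_year - birth_year
    return f"{person_name} was {age} years old during {event_name} ({event_year})"

# Naming the arguments makes the call easier to read and check
full = describe_age_at_event(1815, 1867, person_name="John A. Macdonald", event_name="Canadian Confederation")

# Leaving out the optional arguments falls back to the defaults
brief = describe_age_at_event(1815, 1867)

print(full)
print(brief)
```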

**Step 6c: Functions that work with lists**

Functions can process entire collections of data:

In [None]:
def analyze_historical_figures(figures_list):
    """Analyze a list of historical figure dictionaries"""
    print(f"Analyzing {len(figures_list)} historical figures:")
    print("=" * 50)
    
    for figure in figures_list:
        name = figure["name"]
        birth_year = figure["birth_year"]
        
        # Calculate current age if they were alive
        current_year = 2024
        theoretical_age = current_year - birth_year
        
        print(f"{name} (born {birth_year}) would be {theoretical_age} today")

# Test with multiple figures
historical_figures = [
    {"name": "Emily Carr", "birth_year": 1871},
    {"name": "Louis Riel", "birth_year": 1844},
    {"name": "Nellie McClung", "birth_year": 1873}
]

analyze_historical_figures(historical_figures)
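
The function above prints its findings, which is fine for a quick look but means the results cannot be reused. A common alternative, sketched here, is to `return` the results as a list so later code can sort, filter, or save them. The function name `ages_in_year` and the `reference_year` parameter are illustrative choices, not part of the notebook's earlier code:

```python
def ages_in_year(figures_list, reference_year):
    """Return (name, age) pairs for a chosen year instead of printing them"""
    results = []
    for figure in figures_list:
        # Age the person would have reached in the reference year
        age = reference_year - figure["birth_year"]
        results.append((figure["name"], age))
    return results

historical_figures = [
    {"name": "Emily Carr", "birth_year": 1871},
    {"name": "Louis Riel", "birth_year": 1844},
    {"name": "Nellie McClung", "birth_year": 1873}
]

ages_1900 = ages_in_year(historical_figures, 1900)
print(ages_1900)
```

Passing the year in as a parameter also avoids hard-coding a "current year" that goes out of date.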

### 🔄 **Your Turn: Create a Historical Analysis Function**

Create a function called `categorize_by_century` that:
1. Takes a birth year as input
2. Determines which century the person was born in
3. Returns a string like "19th century" or "20th century"

Test it with several Canadian historical figures.

In [None]:
# Your exercise: Create a century categorization function
def categorize_by_century(birth_year):
    """Determine which century a person was born in"""
    # Years 1801-1900 = 19th century, 1901-2000 = 20th century, etc.
    century = ((birth_year - 1) // 100) + 1

    # Pick the ordinal suffix. 11th, 12th and 13th break the usual
    # 1st / 2nd / 3rd pattern, so check the last two digits first.
    if 11 <= century % 100 <= 13:
        suffix = "th"
    elif century % 10 == 1:
        suffix = "st"
    elif century % 10 == 2:
        suffix = "nd"
    elif century % 10 == 3:
        suffix = "rd"
    else:
        suffix = "th"

    return f"{century}{suffix} century"

# Test your function
test_figures = [
    ("Samuel de Champlain", 1574),
    ("Laura Secord", 1775),
    ("Tommy Douglas", 1904)
]

print("Century categorization:")
for name, birth_year in test_figures:
    century = categorize_by_century(birth_year)
    print(f"{name} ({birth_year}) was born in the {century}")

# Try with your own historical figures:
# my_figure = ("Your choice", birth_year)
# century = categorize_by_century(birth_year)
# print(f"{my_figure[0]} was born in the {century}")

## Step 7: Processing Multiple Items (Loops)

Historians often need to process large amounts of data. Loops let you perform the same operation on many items automatically.

**Step 7a: Basic for loops**

For loops process each item in a list:

Copy this code:
```python
# Process a list of historical documents
documents = [
    "  the MAGNA CARTA [manuscript]  ",
    "  DECLARATION of independence [copy]  ",
    "  the CONSTITUTION of canada [official]  "
]

print("Cleaning historical documents:")
print("=" * 40)

for document in documents:
    clean_doc = document.replace("[manuscript]", "").replace("[copy]", "").replace("[official]", "").strip().title()
    print(f"Cleaned: {clean_doc}")
```

In [None]:
# Basic for loops - copy and run this code
# Process a list of historical documents
documents = [
    "  the MAGNA CARTA [manuscript]  ",
    "  DECLARATION of independence [copy]  ",
    "  the CONSTITUTION of canada [official]  "
]

print("Cleaning historical documents:")
print("=" * 40)

for document in documents:
    clean_doc = document.replace("[manuscript]", "").replace("[copy]", "").replace("[official]", "").strip().title()
    print(f"Cleaned: {clean_doc}")
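
When you need a running number alongside each item, for example to produce a numbered inventory, Python's built-in `enumerate()` supplies it. A minimal sketch, using already-clean titles to keep the focus on the loop:

```python
documents = ["Magna Carta", "Declaration of Independence", "Constitution of Canada"]

numbered = []
# enumerate() yields a counter and the item together; start=1 begins counting at 1
for number, document in enumerate(documents, start=1):
    line = f"{number}. {document}"
    numbered.append(line)
    print(line)
```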

**Step 7b: Loops with conditions**

Combine loops with if statements to filter and categorize data:

In [None]:
# Analyze a list of historical events by date
canadian_events = [
    {"event": "Confederation", "year": 1867},
    {"event": "Battle of Plains of Abraham", "year": 1759},
    {"event": "Women get federal vote", "year": 1918},
    {"event": "Constitution Act", "year": 1982}
]

print("Categorizing Canadian historical events:")
print("=" * 45)

for event in canadian_events:
    name = event["event"]
    year = event["year"]
    
    # Categorize by historical period
    if year < 1763:
        period = "French Colonial"
    elif year < 1867:
        period = "British Colonial"
    elif year < 1950:
        period = "Early Confederation"
    else:
        period = "Modern Canada"
    
    print(f"{name} ({year}) - {period} period")

**Step 7c: Building new lists with loops**

You can create new lists by processing existing ones:

In [None]:
# Start with messy historical data
messy_titles = [
    "  HISTORY of NEW FRANCE---volume i [digitized]  ",
    "  the FUR TRADE in canada---economic impact [scan]  ",
    "  INDIGENOUS peoples---pre contact [manuscript]  "
]

# Create a new list of cleaned titles
clean_titles = []

for messy_title in messy_titles:
    # Clean each title using our function from earlier
    clean_title = clean_document_title(messy_title)
    clean_titles.append(clean_title)

print("Original titles:")
for title in messy_titles:
    print(f"  - {title}")

print("\nCleaned titles:")
for title in clean_titles:
    print(f"  - {title}")
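
The "empty list, loop, append" pattern is so common that Python offers a compact form called a list comprehension. The sketch below is an optional shorthand, not a replacement for the loop above, and it uses simplified titles and plain string methods so that it runs on its own:

```python
messy_titles = ["  history of new france  ", "  THE FUR TRADE IN CANADA  "]

# Read as: "the cleaned version of each title, for every title in messy_titles"
clean_titles = [title.strip().title() for title in messy_titles]
print(clean_titles)

# An optional "if" at the end keeps only the items that pass a test
long_titles = [title for title in clean_titles if len(title) > 22]
print(long_titles)
```

If the one-line form feels cryptic, stick with the full loop; both produce the same list.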

### 🔄 **Your Turn: Process a Historical Dataset**

You have a list of Canadian Prime Ministers with their data. Process this list to:
1. Print each PM's name and the length of their term
2. Identify PMs who served more than 10 years
3. Create a new list of just the long-serving PMs

```python
prime_ministers = [
    {"name": "John A. Macdonald", "start_year": 1867, "end_year": 1873},
    {"name": "Wilfrid Laurier", "start_year": 1896, "end_year": 1911},
    {"name": "William Lyon Mackenzie King", "start_year": 1921, "end_year": 1948},
    {"name": "Pierre Trudeau", "start_year": 1968, "end_year": 1984}
]
```

In [None]:
# Your exercise: Process a historical dataset
prime_ministers = [
    {"name": "John A. Macdonald", "start_year": 1867, "end_year": 1873},
    {"name": "Wilfrid Laurier", "start_year": 1896, "end_year": 1911},
    {"name": "William Lyon Mackenzie King", "start_year": 1921, "end_year": 1948},
    {"name": "Pierre Trudeau", "start_year": 1968, "end_year": 1984}
]

print("Canadian Prime Ministers Analysis:")
print("=" * 45)

# Initialize list for long-serving PMs
long_serving_pms = []

# 1. Loop through the PMs and calculate term length
for pm in prime_ministers:
    name = pm["name"]
    term_length = pm["end_year"] - pm["start_year"]
    
    # 2. Print each PM's name and term length
    print(f"{name}: {term_length} years in office")
    
    # 3. Identify PMs who served more than 10 years
    if term_length > 10:
        long_serving_pms.append(pm)
        print(f"  → Long-serving PM!")

# 4. Show the long-serving PMs
print(f"\nLong-serving Prime Ministers (>10 years):")
for pm in long_serving_pms:
    term_length = pm["end_year"] - pm["start_year"]
    print(f"- {pm['name']}: {term_length} years")

print(f"\nTotal long-serving PMs: {len(long_serving_pms)}")

## Understanding Common Errors

Before diving into the final challenge, let's understand common Python errors you might encounter and how to handle them.

**Step 7d: Common Python Errors**

Learning to read error messages is crucial for debugging your code. Let's see some common errors:

In [None]:
# Common errors and how to fix them

print("1. TypeError - mixing incompatible data types:")
try:
    # This will cause an error:
    # age = "25"  # This is a string
    # future_age = age + 10  # Can't add string and number
    
    # Fixed version:
    age = "25"  # String from user input
    age_integer = int(age)  # Convert to integer
    future_age = age_integer + 10
    print(f"   Fixed: Current age {age_integer}, future age {future_age}")
except Exception as e:
    print(f"   Error caught: {e}")

print("\n2. KeyError - accessing dictionary key that doesn't exist:")
try:
    historical_figure = {"name": "Louis Riel", "birth_year": 1844}
    # This will cause an error:
    # death_year = historical_figure["death_year"]  # Key doesn't exist
    
    # Fixed version - check if key exists first:
    if "death_year" in historical_figure:
        death_year = historical_figure["death_year"]
    else:
        death_year = "Unknown"
    print(f"   Fixed: Death year is {death_year}")
except Exception as e:
    print(f"   Error caught: {e}")

print("\n3. IndexError - accessing list position that doesn't exist:")
try:
    provinces = ["Ontario", "Quebec"]
    # This will cause an error:
    # third_province = provinces[2]  # Only 0 and 1 exist
    
    # Fixed version - check list length first:
    if len(provinces) > 2:
        third_province = provinces[2]
    else:
        third_province = "Not available"
    print(f"   Fixed: Third province is {third_province}")
except Exception as e:
    print(f"   Error caught: {e}")

print("\nRemember: Error messages tell you what went wrong and where!")
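
In the cell above the failing lines are commented out, so the `except` blocks never actually run. To see error handling in action, the sketch below lets two errors happen and catches each by its specific name. Catching a named error type rather than the general `Exception` is usually preferable, because unexpected problems are not silently hidden:

```python
historical_figure = {"name": "Louis Riel", "birth_year": 1844}

try:
    death_year = historical_figure["death_year"]   # this key is missing
except KeyError as error:
    death_year = None
    key_message = f"Missing key: {error}"
    print(key_message)

try:
    year = int("1982.")   # the trailing period makes this text an invalid number
except ValueError as error:
    year = None
    print(f"Could not convert: {error}")
```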

## Final Challenge: Historical Data Processing Project

Now let's combine everything you've learned! You'll process a dataset of historical Canadian women and extract meaningful insights.

**Your task:** Analyze this dataset and answer these research questions:
1. Who lived the longest?
2. How many were born in each century?
3. What were the most common occupations?
4. Clean up the messy achievement descriptions

In [None]:
# Historical Canadian women dataset
canadian_women = [
    {
        "name": "Emily Carr",
        "birth_year": 1871,
        "death_year": 1945,
        "occupation": "Artist",
        "achievement": "  FAMOUS for---painting indigenous culture [renowned]  "
    },
    {
        "name": "Nellie McClung",
        "birth_year": 1873,
        "death_year": 1951,
        "occupation": "Activist",
        "achievement": "  WOMEN'S suffrage---pioneer [influential]  "
    },
    {
        "name": "Agnes Macphail",
        "birth_year": 1890,
        "death_year": 1954,
        "occupation": "Politician",
        "achievement": "  FIRST woman---in parliament [groundbreaking]  "
    },
    {
        "name": "Roberta Bondar",
        "birth_year": 1945,
        "death_year": None,  # Still alive
        "occupation": "Astronaut",
        "achievement": "  FIRST canadian woman---in space [historic]  "
    }
]

print("Analyzing Historical Canadian Women")
print("=" * 40)

# 1. FIND WHO LIVED THE LONGEST
print("1. Analyzing lifespans:")
longest_lived = None
max_lifespan = 0

for woman in canadian_women:
    name = woman["name"]
    birth_year = woman["birth_year"]
    death_year = woman["death_year"]
    
    if death_year is not None:  # Only calculate for those who have died
        lifespan = death_year - birth_year
        print(f"   {name}: {lifespan} years ({birth_year}-{death_year})")
        
        if lifespan > max_lifespan:
            max_lifespan = lifespan
            longest_lived = woman
    else:
        # Hard-coded reference year: update it when you rerun this analysis
        current_year = 2024
        current_age = current_year - birth_year
        print(f"   {name}: {current_age} years old (still alive, born {birth_year})")

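# Living people are skipped above, so this compares completed lifespans only
if longest_lived:
    print(f"\n   → Longest lived (of those who have died): {longest_lived['name']} ({max_lifespan} years)")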

# 2. COUNT BIRTHS BY CENTURY
print("\n2. Births by century:")
century_counts = {}

for woman in canadian_women:
    birth_year = woman["birth_year"]
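    # Calculate century (1801-1900 = 19th, 1901-2000 = 20th, etc.)
    century = ((birth_year - 1) // 100) + 1
    # The "th" suffix fits the 19th and 20th centuries in this dataset,
    # but would give "21th century" for anyone born after 2000
    century_name = f"{century}th century"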
    
    if century_name in century_counts:
        century_counts[century_name] += 1
    else:
        century_counts[century_name] = 1

for century, count in century_counts.items():
    print(f"   {century}: {count} women")

# 3. COUNT OCCUPATIONS
print("\n3. Occupations:")
occupation_counts = {}

for woman in canadian_women:
    occupation = woman["occupation"]
    if occupation in occupation_counts:
        occupation_counts[occupation] += 1
    else:
        occupation_counts[occupation] = 1

for occupation, count in occupation_counts.items():
    print(f"   {occupation}: {count} women")

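# 4. CLEAN ACHIEVEMENT DESCRIPTIONS
import string  # for string.capwords(), which capitalizes each word

print("\n4. Cleaned achievements:")
for woman in canadian_women:
    name = woman["name"]
    messy_achievement = woman["achievement"]
    
    # Clean the achievement text using our string methods.
    # Parentheses let us put one step of the chain on each line.
    tidied = (
        messy_achievement
        .replace("---", " ")
        .replace("[renowned]", "")
        .replace("[influential]", "")
        .replace("[groundbreaking]", "")
        .replace("[historic]", "")
    )
    # string.capwords() also trims leftover spaces and, unlike .title(),
    # does not turn "WOMEN'S" into "Women'S"
    clean_achievement = string.capwords(tidied)
    
    print(f"   {name}: {clean_achievement}")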

print(f"\n🎉 Analysis complete! You've successfully processed {len(canadian_women)} historical records.")
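
The counting loops in steps 2 and 3 follow a pattern common enough that Python's standard library covers it. The sketch below is one possible shortcut, not a replacement for the loops you just wrote. It assumes a shortened copy of the dataset (the `sample_women` list keeps only the fields needed here) and uses `collections.Counter` for the tally and `max()` with a `key` function for the longest lifespan. The helper name `lifespan` is made up for this example.

```python
from collections import Counter

# Shortened copy of the dataset above, keeping only the fields used here
sample_women = [
    {"name": "Emily Carr", "birth_year": 1871, "death_year": 1945, "occupation": "Artist"},
    {"name": "Nellie McClung", "birth_year": 1873, "death_year": 1951, "occupation": "Activist"},
    {"name": "Agnes Macphail", "birth_year": 1890, "death_year": 1954, "occupation": "Politician"},
    {"name": "Roberta Bondar", "birth_year": 1945, "death_year": None, "occupation": "Astronaut"},
]

# Collect the occupations, then let Counter tally them
occupations = []
for woman in sample_women:
    occupations.append(woman["occupation"])
occupation_tally = Counter(occupations)

for occupation, count in occupation_tally.items():
    print(f"{occupation}: {count}")


def lifespan(woman):
    """Years between birth and death (only valid when death_year is set)."""
    return woman["death_year"] - woman["birth_year"]


# Keep only the records that have a death year before comparing lifespans
deceased = []
for woman in sample_women:
    if woman["death_year"] is not None:
        deceased.append(woman)

# max() calls lifespan() on each record and returns the record with the largest result
longest_lived = max(deceased, key=lifespan)
print(f"Longest lived: {longest_lived['name']} ({lifespan(longest_lived)} years)")
```

With a larger dataset, `occupation_tally.most_common(3)` would list the three most frequent occupations, which answers research question 3 more directly than printing every count.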

## Summary: Your Python Foundation

🎉 **Congratulations!** You've built a solid foundation in Python for historical research. You now know:

**Core Programming Concepts:**
- ✅ Variables (strings, integers, booleans)
- ✅ Lists for collections of data
- ✅ Dictionaries for structured records
- ✅ Conditions (if/elif/else) for decision-making
- ✅ Functions for reusable code
- ✅ Loops for processing multiple items

**Text Processing Skills:**
- ✅ String cleaning with .strip(), .lower(), .title()
- ✅ Text replacement with .replace()
- ✅ Method chaining for complex cleaning
- ✅ F-strings for readable output

**Historical Data Skills:**
- ✅ Organizing historical information in structured formats
- ✅ Categorizing data by time periods
- ✅ Processing collections of historical records
- ✅ Building reusable analysis functions
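
As one illustration of a reusable analysis function, the century calculation from the final challenge can be wrapped in a function. This is a minimal sketch: the name `century_label` is hypothetical, and the suffix logic is an addition that avoids labels like "21th century" for years after 2000.

```python
def century_label(year):
    """Return a label such as '19th century' for a given year."""
    # Same formula as the final challenge: 1801-1900 is the 19th century
    century = ((year - 1) // 100) + 1

    # 11th, 12th and 13th are exceptions to the st/nd/rd rule
    if 11 <= century % 100 <= 13:
        suffix = "th"
    elif century % 10 == 1:
        suffix = "st"
    elif century % 10 == 2:
        suffix = "nd"
    elif century % 10 == 3:
        suffix = "rd"
    else:
        suffix = "th"

    return f"{century}{suffix} century"


for year in [1871, 1945, 2001]:
    print(f"{year}: {century_label(year)}")
```

Once the logic lives in a function, the same label can be reused in any loop or dataset without copying the formula.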

**You're now ready for Notebook 2!**
In the next notebook, you'll use these skills to scrape historical data from websites, extract information from Internet Archive documents, and work with real Canadian historical sources.

**Key takeaway:** Python is a tool for historians. Every concept you've learned here will help you process, analyze, and understand historical data faster and more consistently than working by hand.

### 🔄 **Bonus Challenge: Your Own Research Project**

Now that you've mastered the basics, try analyzing your own historical dataset! Choose Canadian historical figures from a specific time period or region and create your own analysis.

**Suggested projects:**
1. **Canadian Prime Ministers**: Analyze term lengths, provinces, and political parties
2. **Indigenous Leaders**: Research and categorize leaders by nation and time period
3. **Canadian Artists**: Analyze birth decades, art forms, and regional distribution
4. **Women in Politics**: Track the progression of women's political participation

Use the template below to structure your analysis:

In [None]:
# Your research project template
# Choose your own historical figures and research questions

# Step 1: Define your dataset
my_research_topic = "Your choice: e.g., 'Canadian Scientists'"
my_historical_figures = [
    {
        "name": "Your first figure",
        "birth_year": 1900,
        "death_year": 1990,  # or None if still alive
        "category": "Your categorization",  # e.g., field of science, political party, etc.
        "achievement": "What they're known for"
    },
    # Add more figures here...
]

print(f"Research Topic: {my_research_topic}")
print(f"Number of figures: {len(my_historical_figures)}")

# Step 2: Your research questions
# Example questions:
# - Who lived the longest/shortest?
# - What time periods are represented?
# - What are the most common categories?
# - How can you clean and standardize the achievement descriptions?

# Step 3: Write your analysis code
# Use the techniques you've learned:
# - Loops to process each figure
# - Conditions to categorize data
# - Dictionaries to count categories
# - String methods to clean text
# - Functions to make reusable code

# Your analysis code here:
print("\nYour analysis results will appear here when you add your code!")

# Step 4: Present your findings
# Create clear, readable output that answers your research questions

print("\n📊 Research Findings:")
print("1. [Your first finding]")
print("2. [Your second finding]")
print("3. [Your third finding]")