# Lecture 3: Python Best Practices

How to "read" this lecture notebook
<details>
<summary>click to expand</summary>

As you go through this notebook (or any notebook for this class), you will encounter new concepts and Python code that implements them -- just like you would see in a textbook. Of course, in a textbook, it's easy to read code and an explanation of what it does and think that you understand it.
<br />

### Learn by doing
But this notebook is different from a textbook because it allows you to not just read the code, but play with it. **You can and should try out changing the code that you see**. In fact, in many places throughout this reading notebook, you will be asked to write your own code to experiment with a concept that was just covered. This is a form of "active reading" and the idea behind it is that we really learn by **doing**. 
<br />

### Change everything
But don't feel limited to only change code when I prompt you. This notebook is your learning environment and your playground. I encourage you to try changing and running all the code throughout the notebook and even to **add your own notes and new code blocks**. Adding comments to code to explain what you are testing, experimenting with or trying to do is really helpful to understand what you were thinking when you revisit it later. 
<br />

### Make this notebook your own
Make this notebook your own. Write your questions and thoughts. At the end of every reading notebook, I will ask the same set of questions to try to elicit your questions, reaction and feedback. When we review the reading notebook in class, I encourage you to share these.
</details>

## Learning Objectives

By the end of this lecture, you will be able to:
- Master f-strings for readable string formatting
- Write concise, Pythonic code using comprehensions
- Understand and apply lambda functions appropriately
- Read and understand type hints in modern Python code

<div class="callout" style="
  width: 80%;
  background: rgba(127,127,127,0.15);
  border: 1px solid rgba(127,127,127,0.3);
  padding: 10px 30px;
  margin: 20px;
  border-radius: 6px;
  text-align: justify;
  text-align-last: left;
  font-size: 11pt;
">
  <span style="
    font-size: 60pt;
    line-height: 1;
    float: left;
    margin: 0px 0px 0px 0;
  ">
    🧩
  </span>

  This lecture includes lots of specific Python details about syntax, formats, grammar, etc. Don't get overwhelmed or hung up on trying to memorize them. **We don't code in a vacuum!** We have lots of tools at our disposal to help us when writing code. It's better to internalize concepts like "I know there is a way to line up my string output nicely like a table with f-string formatting rules" even if you forget exactly how to do so. *You can always look it up!*  
  <!-- clearfix -->
  <div style="clear: both;"></div>
</div>

# 3.0 Code Preface

In [None]:
from pathlib import Path
from datetime import datetime

# 3.1 F-Strings: String Formatting That Doesn't Stink

<img alt="Old string formatting was like bomb defusal... easy to mess up" src="../images/L03_bomb_defusal.png" width="800" style="display:block;">
<font size=2> Charlie defuses a "bomb" in <i>Die Hard with a Vengeance (1995)</i>. Don't worry, it was just pancake syrup. </font>

Before Python 3.6, formatting strings was... painful. Like watching someone try to defuse a bomb.

## The Old Ways (Don't Do This)

Let's say we want to print out some information about a character. Here's how people used to do it:

In [None]:
# The ancient ways of string formatting
hero_name = "John McClane"
building = "Nakatomi Plaza"
terrorists_defeated = 12

# Method 1: String concatenation (ugly and error-prone)
message1 = hero_name + " defeated " + str(terrorists_defeated) + " terrorists at " + building
print(message1)

In [None]:
# Method 2: %-formatting (old school, hard to read)
message2 = "%s defeated %d terrorists at %s" % (hero_name, terrorists_defeated, building)
print(message2)

In [None]:
# Method 3: .format() (better, but still clunky)
message3 = "{} defeated {} terrorists at {}".format(hero_name, terrorists_defeated, building)
print(message3)

All three approaches work, but they're hard to read and easy to mess up. The variable names are separated from where they appear in the string, making the code harder to understand at a glance.

## F-Strings: The Modern Way

F-strings (formatted string literals) were introduced in Python 3.6 and they changed everything. Just put an `f` before your string and you can embed expressions directly inside curly braces `{}`.

<div class="callout" style="
  width: 80%;
  background: rgba(127,127,127,0.15);
  border: 1px solid rgba(127,127,127,0.3);
  padding: 10px 30px;
  margin: 20px;
  border-radius: 6px;
  text-align: justify;
  text-align-last: left;
  font-size: 11pt;
">
  <span style="
    font-size: 60pt;
    line-height: 1;
    float: left;
    margin: 0px 0px 0px 0;
  ">
    💡
  </span>

  The `f` in f-strings stands for "formatted". You'll sometimes see them called "formatted string literals" in documentation.
  
  <!-- clearfix -->
  <div style="clear: both;"></div>
</div>

In [None]:
# F-strings: clean, readable, and powerful
hero_name = "John McClane"
building = "Nakatomi Plaza"
terrorists_defeated = 12

message = f"{hero_name} defeated {terrorists_defeated} terrorists at {building}"
print(message)

Notice how much cleaner that is! The variables appear exactly where they'll be inserted into the string. No more counting placeholder positions or worrying about the order of arguments.

## Expressions Inside F-Strings

F-strings aren't limited to just variable names. You can put **any valid Python expression inside the curly braces**:

In [None]:
# Math expressions
proton_packs = 4
ghosts_per_pack = 127
print(f"The Ghostbusters can trap {proton_packs * ghosts_per_pack} ghosts total.")


In [None]:
# Method calls
venkman_quote = "he slimed me!"
print(f"Catchphrase: {venkman_quote.upper()}")


In [None]:
# Conditional expressions
slimer_caught = False # Try changing this value to True and re-running the code.
print(f"Slimer status: {'Caught!' if slimer_caught else 'Still loose!'}")

## Formatting Numbers

F-strings really shine when you need to format numbers. After the expression inside the curly braces, add a colon `:` followed by a format specifier.

Here are some common format specifiers:
- `:.2f` - Float with 2 decimal places
- `:.1%` - Percentage with 1 decimal place (multiplies by 100 automatically)
- `:,` - Add commas as thousands separators
- `:<15` - Left-align in a field of width 15 characters
- `:>15` - Right-align in a field of width 15 characters
- `:^15` - Center in a field of width 15 characters

For example:

In [None]:
# Formatting floating point numbers
model_accuracy = 0.8734521
print(f"Model accuracy: {model_accuracy}")
print(f"Model accuracy: {model_accuracy:.2f}")  # 2 decimal places
print(f"Model accuracy: {model_accuracy:.1%}")  # As percentage with 1 decimal

In [None]:
# Large numbers with thousands separator
box_office = 2847246203  # Avatar's box office
print(f"Box office: ${box_office:,}")

In [None]:
# Padding and alignment
for movie, year in [("Ghostbusters", 1984), ("Top Gun", 1986), ("Die Hard", 1988)]:
    print(f"{movie:<15} | {year}")

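The specifiers above can be combined into a single spec, in the order: fill character and alignment, then width, then the grouping option (`,`), then precision and type. The sketch below builds a small right-aligned budget table. The dollar figures are the budgets from the pandas example later in this lecture (given there in millions); the column widths are arbitrary choices.

```python
# (title, budget in dollars): budgets from this lecture's pandas example
budgets = [("The Terminator", 6_400_000), ("Ghostbusters", 30_000_000), ("Top Gun", 15_000_000)]

rows = [f"{'Movie':<15}|{'Budget ($)':>15}"]  # header: left- and right-aligned labels
rows.append(f"{'':-^31}")  # fill char '-' + center align applied to an empty string = a 31-char rule
for movie, budget in budgets:
    # left-align the title; right-align the budget with comma separators and 2 decimals
    rows.append(f"{movie:<15}|{budget:>15,.2f}")

print("\n".join(rows))
```

Because every row has the same total width, the `|` separators line up without any manual spacing.
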
## Formatting Dates

<img alt="Date formatting skynet" src="../images/L03_skynet.png" width="800" style="display:block;">
<font size=2>Sarah Connor dreams of skynet nuclear strike in <i>Terminator 2: Judgment Day (1991)</i>.</font>

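F-strings work beautifully with datetime objects: whatever follows the colon is handed to the datetime's own formatting, which uses the same format codes as the `strftime()` method.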

Here are some common date format codes:
- `%Y` - 4-digit year (1997)
- `%m` - 2-digit month (08)
- `%d` - 2-digit day (29)
- `%B` - Full month name (August)
- `%A` - Full weekday name (Friday)
- `%I` - Hour (12-hour clock)
- `%H` - Hour (24-hour clock)
- `%M` - Minute
- `%p` - AM/PM

For example:

In [None]:
from datetime import datetime

# Date formatting
skynet_activation = datetime(1997, 8, 29, 2, 14, 0)

print(f"Skynet became self-aware on {skynet_activation}")
print(f"Date: {skynet_activation:%B %d, %Y}")
print(f"Time: {skynet_activation:%I:%M %p}")
print(f"Full: {skynet_activation:%A, %B %d, %Y at %I:%M %p}")

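Since the text after the colon goes to the datetime's own formatting, an f-string and `strftime()` with the same codes produce identical text. The inverse operation, `datetime.strptime()`, parses a string back into a datetime using the same codes. This sketch sticks to numeric codes, because the names produced by `%A` and `%B` depend on the system locale.

```python
from datetime import datetime

skynet_activation = datetime(1997, 8, 29, 2, 14, 0)

# Formatting: f-string spec vs. the equivalent strftime() call
via_fstring = f"{skynet_activation:%Y-%m-%d %H:%M}"
via_strftime = skynet_activation.strftime("%Y-%m-%d %H:%M")
print(via_fstring, via_fstring == via_strftime)

# Parsing: strptime() turns text back into a datetime using the same codes
parsed = datetime.strptime("1997-08-29 02:14", "%Y-%m-%d %H:%M")
print(parsed == skynet_activation)
```
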
## Multi-line F-Strings

For longer formatted strings, you can use triple quotes:

In [None]:
# ML model report using multi-line f-string
model_name = "RandomForest"
accuracy = 0.8923
precision = 0.8756
recall = 0.9102
training_time = 45.7

report = f"""
╔══════════════════════════════════════╗
║         MODEL PERFORMANCE REPORT     ║
╠══════════════════════════════════════╣
║  Model:     {model_name:<24} ║
║  Accuracy:  {accuracy:<24.2%} ║
║  Precision: {precision:<24.2%} ║
║  Recall:    {recall:<24.2%} ║
║  Training:  {f"{training_time:.1f} sec":<24} ║
╚══════════════════════════════════════╝
"""
print(report)

<div class="callout" style="
  width: 80%;
  background: rgba(127,127,127,0.15);
  border: 1px solid rgba(127,127,127,0.3);
  padding: 10px 30px;
  margin: 20px;
  border-radius: 6px;
  text-align: justify;
  text-align-last: left;
  font-size: 11pt;
">
  <span style="
    font-size: 60pt;
    line-height: 1;
    float: left;
    margin: 0px 0px 0px 0;
  ">
    💡
  </span>

  You might have noticed that the <b>Training</b> line in the example above uses a nested f-string. Why? The inner f-string formats the float to one decimal place and appends " sec" (giving <code>45.7 sec</code>); the outer f-string then left-aligns that whole string in a 24-character field so the right border lines up with the other rows.
  
  <!-- clearfix -->
  <div style="clear: both;"></div>
</div>

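If the nested f-string feels cramped, an equivalent and arguably easier-to-read option is to build the inner string in its own variable first. In this sketch the `|` characters simply stand in for the box borders of the report.

```python
training_time = 45.7

# Step 1: format the number and attach the unit
training_str = f"{training_time:.1f} sec"
# Step 2: pad the finished string to a 24-character field
two_step = f"|{training_str:<24}|"

# The nested version produces exactly the same text
nested = f"|{f'{training_time:.1f} sec':<24}|"
print(two_step)
print(two_step == nested)
```
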
There's more about f-strings that I'm not telling you:
<img alt="Visual cheat sheet of Python f-string format specifiers" src="../images/L03_fstring_cheatsheet.png" style="display:block;">
But honestly, if I need to do something more sophisticated, I just look it up.

## Debugging with F-Strings (Python 3.8+)

Here's a neat trick: adding `=` after an expression will **print both the expression and its value**. It's like having a tiny debugger built into your strings. As Egon Spengler might say, "Print is good."

In [None]:
# The = specifier for debugging (Python 3.8+)
x = 42
y = 17
print(f"{x=}, {y=}, {x + y=}")

In [None]:

# Works with more complex expressions too
ghost_types = ["Slimer", "Stay Puft", "Library Ghost"]
print(f"{len(ghost_types)=}")
print(f"{ghost_types[0].upper()=}")

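Two details of the `=` specifier are worth knowing. You can still add a format spec after the `=`, and without a format spec the value is shown with `repr()`, so strings appear in quotes. Spaces written around the `=` are kept in the output as well.

```python
accuracy = 0.8734521
name = "Slimer"
x = 42

with_spec = f"{accuracy=:.2%}"   # '=' followed by a format spec
with_repr = f"{name=}"           # no spec: repr() is used, so the quotes appear
with_spaces = f"{x = }"          # whitespace around '=' is preserved

print(with_spec)
print(with_repr)
print(with_spaces)
```
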
<!-- Start Exercise 3.1 -->
<hr/>
<img src="../images/stop_right_margin.png" align="left">

<font size=3 color="darkred"> Exercise: F-String Formatting </font>
<div class="inclass_exercise_body" style="padding-left: 130px; width: 85%; text-align: justify;text-align-last: left;">
You've trained an ML model and need to report its results. Create an f-string that formats the following variables into a readable report. Your output should look something like:<br><br>
<code>Model 'LogisticRegression' achieved 87.3% accuracy on 1,234 samples (training time: 12.46 seconds)</code>
</div>

In [None]:
# Exercise: Format these variables into a nice report string
model_name = "LogisticRegression"
accuracy = 0.87345
n_samples = 1234
training_time_seconds = 12.4567

# Your code here - create an f-string called 'report'
report = ""  # Replace this with your f-string

print(report)

<hr/>
<!-- End Exercise 3.1 -->

# 3.2 List and Dictionary Comprehensions
<img alt="Comprehensions are elegant... like catching a fly with chopsticks" src="../images/L03_miyagi_fly.png" style="display:block;">
<font size=2>Mr. Miyagi tries to catch a fly with chopsticks in <i>Karate Kid (1984)</i>. Daniel succeeded moments later. Hmph, beginner's luck. </font>

Comprehensions are one of Python's most elegant features. They let you create lists, dictionaries, and sets in a single, readable line. As Daniel LaRusso learned from Mr. Miyagi, sometimes the most effective technique is also the most elegant.

## List Comprehensions: The Basics

Let's start with a simple example. Suppose we want to create a list of squared numbers:

In [None]:
# The traditional loop approach
squares_loop = []
for i in range(10):
    squares_loop.append(i ** 2)
print(f"Loop result: {squares_loop}")

In [None]:
# The list comprehension approach
squares_comp = [i ** 2 for i in range(10)]
print(f"Comprehension result: {squares_comp}")

Both produce the same result, but the comprehension is:
1. More concise (1 line vs 3 lines)
2. More readable once you're familiar with the syntax
3. Often faster (the comprehension adds each item directly instead of looking up and calling `.append()` on every iteration)

The basic syntax is:
```python
[expression for item in iterable]
```

## Adding Conditions: Filtering

You can add an `if` clause to filter which items get included:

In [None]:
# Get only the 80s movies from a list
movies = [
    ("The Terminator", 1984),
    ("Aliens", 1986),
    ("Jurassic Park", 1993),
    ("Predator", 1987),
    ("The Matrix", 1999),
    ("Beetlejuice", 1988),
    ("Big Trouble in Little China", 1986)
]

# Traditional approach
eighties_movies_loop = []
for movie, year in movies:
    if 1980 <= year < 1990:
        eighties_movies_loop.append(movie)
print(f"Loop: {eighties_movies_loop}")

In [None]:
# Comprehension approach
eighties_movies_comp = [movie for movie, year in movies if 1980 <= year < 1990]
print(f"Comprehension: {eighties_movies_comp}")

The syntax with filtering:
```python
[expression for item in iterable if condition]
```

In [None]:
# Label movies as '80s' or '90s'
movie_labels = [
    f"{movie} (80s)" if 1980 <= year < 1990 else f"{movie} (90s)"
    for movie, year in movies
]
print(movie_labels)

Note the difference:
- `if` at the **end** = filtering (which items to include)
- `if/else` in the **expression** = transforming (what value to create)

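Both kinds of `if` can appear in the same comprehension. The filtering `if` at the end decides which items survive; the `if/else` up front then decides what each survivor becomes. Note that the filter position does not accept an `else`, so writing `if ... else` at the end is a syntax error.

```python
numbers = range(10)

# Filtering only: the 'if' at the end drops odd numbers
evens = [n for n in numbers if n % 2 == 0]

# Transforming only: every number is kept, but its value depends on the condition
labels = ["even" if n % 2 == 0 else "odd" for n in numbers]

# Both: keep even numbers, then multiply the ones above 4 by 10
mixed = [n * 10 if n > 4 else n for n in numbers if n % 2 == 0]

print(evens)
print(labels[:4])
print(mixed)
```
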
## Dictionary Comprehensions

The same concept works for dictionaries. The syntax uses curly braces and a colon for key-value pairs:

In [None]:
# Create a dictionary mapping movie names to years
movie_years = {movie: year for movie, year in movies}
print(movie_years)

# Only 80s movies
eighties_dict = {movie: year for movie, year in movies if 1980 <= year < 1990}
print(eighties_dict)

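Comprehensions also build sets. A set comprehension uses curly braces like a dictionary comprehension, but with no colon, and duplicate values collapse automatically. (An empty `{}` is a dictionary, not a set; use `set()` for an empty set.) The sketch reuses the `movies` list from above.

```python
movies = [
    ("The Terminator", 1984), ("Aliens", 1986), ("Jurassic Park", 1993),
    ("Predator", 1987), ("The Matrix", 1999), ("Beetlejuice", 1988),
    ("Big Trouble in Little China", 1986),
]

# Which decades appear? e.g. 1984 // 10 * 10 == 1980
decades = {year // 10 * 10 for _, year in movies}
print(decades)

# Distinct release years: 1986 appears twice in the data but only once here
distinct_years = sorted({year for _, year in movies})
print(distinct_years)
```
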
## Using `zip()` to Combine Iterables

The `zip()` function is incredibly useful with comprehensions. It pairs up elements from multiple iterables like a zipper bringing two sides together.

<div class="callout" style="
  width: 80%;
  background: rgba(127,127,127,0.15);
  border: 1px solid rgba(127,127,127,0.3);
  padding: 10px 30px;
  margin: 20px;
  border-radius: 6px;
  text-align: justify;
  text-align-last: left;
  font-size: 11pt;
">
  <span style="
    font-size: 60pt;
    line-height: 1;
    float: left;
    margin: 0px 0px 0px 0;
  ">
    💡
  </span>

  Think of <code>zip()</code> like pairing up dance partners. If you have a list of leads and a list of follows, zip brings them together pair by pair. If the lists are different lengths, it stops when the shorter one runs out.
  
  <!-- clearfix -->
  <div style="clear: both;"></div>
</div>

In [None]:
# Two separate lists
features = ["age", "income", "education", "experience"]
importances = [0.23, 0.45, 0.12, 0.20]

# Combine them into a dictionary with zip + dict comprehension
feature_importance = {feat: imp for feat, imp in zip(features, importances)}
print(feature_importance)

# Or even simpler - dict() can take zip directly!
feature_importance_simple = dict(zip(features, importances))
print(feature_importance_simple)

In [None]:
# Zip is great for iterating over parallel lists
callsigns = ["Maverick", "Goose", "Iceman", "Viper"]
pilots = ["Pete Mitchell", "Nick Bradshaw", "Tom Kazansky", "Mike Metcalf"]

# Print matched pairs
for callsign, real_name in zip(callsigns, pilots):
    print(f"{callsign}'s real name is {real_name}")

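The callout above noted that `zip()` stops at the shorter input. That is convenient, but it can also hide a data bug. The sketch below shows the silent truncation, `itertools.zip_longest()` as a padding alternative, and the `strict=True` flag that `zip()` gained in Python 3.10 (wrapped in a version check here in case you are on an older Python).

```python
import sys
from itertools import zip_longest

callsigns = ["Maverick", "Goose", "Iceman", "Viper"]
pilots = ["Pete Mitchell", "Nick Bradshaw", "Tom Kazansky"]  # one name missing

# zip() stops at the shorter list, so 'Viper' is silently dropped
pairs = list(zip(callsigns, pilots))
print(pairs)

# zip_longest() keeps going and fills the gap with fillvalue
padded = list(zip_longest(callsigns, pilots, fillvalue="unknown"))
print(padded[-1])

# Python 3.10+: strict=True raises ValueError instead of truncating
if sys.version_info >= (3, 10):
    try:
        list(zip(callsigns, pilots, strict=True))
    except ValueError as err:
        print(f"Caught mismatch: {err}")
```
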
## Real-World Application: Finding Files with `pathlib`

One of the most practical uses of list comprehensions is working with file systems. The `pathlib` module combined with comprehensions makes it easy to find and filter files.

In [None]:
from pathlib import Path

# Get the current working directory
current_dir = Path(".")

# Find all Jupyter notebook files in the current directory (non-recursive)
notebook_files = [f for f in current_dir.iterdir() if f.suffix == ".ipynb"]
print(f"Notebook files in current dir: {notebook_files}")

# Find all PNG files recursively, starting from the parent directory
all_png_files = [f for f in Path("..").glob("**/*.png")]
print(f"All PNG files (recursive): {all_png_files[:5]}...")  # Show first 5

In [None]:
# More complex example: find all PNG files and get their sizes
# Let's use the images directory if it exists
images_dir = Path("../images")

if images_dir.exists():
    png_files_with_sizes = [
        (f.name, f.stat().st_size / 1024)  # size in KB
        for f in images_dir.glob("*.png")
    ]
    
    # Create a nice report
    for name, size_kb in png_files_with_sizes:
        print(f"{name}: {size_kb:.1f} KB")
else:
    print("Images directory not found - that's okay for this demo!")

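The cells above depend on what happens to be on your disk. Here is a self-contained sketch that builds a throwaway directory with `tempfile` and then combines `glob()`, `rglob()` (the recursive variant, equivalent to prefixing the pattern with `**/`), and a dictionary comprehension that counts files by extension. The file names are hypothetical, created only for this demo.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "data").mkdir()
    # Hypothetical files, just for this demo
    for rel in ["a.png", "b.png", "notes.txt", "data/c.png", "data/d.csv"]:
        (root / rel).write_text("placeholder")

    # glob() looks in this folder only; rglob() also descends into subfolders
    top_level_pngs = sorted(p.name for p in root.glob("*.png"))
    all_pngs = sorted(p.relative_to(root).as_posix() for p in root.rglob("*.png"))

    files = [p for p in root.rglob("*") if p.is_file()]
    # set comprehension for the distinct suffixes, dict comprehension for the counts
    suffix_counts = {s: sum(1 for p in files if p.suffix == s) for s in {p.suffix for p in files}}

print(top_level_pngs)
print(all_pngs)
print(suffix_counts)
```
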
## Nested Comprehensions

You can nest comprehensions, but be careful: readability matters! If a comprehension gets too complex, it's often better to use a regular loop.

In [None]:
# Flatten a matrix (list of lists) into a single list
some_matrix = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9]
]

print(f"Original matrix:\n{some_matrix}")

# Nested comprehension to flatten
flattened = [num for row in some_matrix for num in row]
print(f"Flattened: {flattened}")

# Read it like nested loops: "for row in matrix" then "for num in row"

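To see why the `for` clauses are ordered that way, compare the flattening comprehension with the explicit loops it stands for: the clauses appear in the same order the loops would be nested. Putting a comprehension *inside* another one is different: it builds one inner list per outer item, which is how this sketch transposes the matrix.

```python
some_matrix = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

# Explicit loops, nested in the same order as the comprehension's 'for' clauses
flat_loop = []
for row in some_matrix:
    for num in row:
        flat_loop.append(num)

flat_comp = [num for row in some_matrix for num in row]
print(flat_loop == flat_comp)

# A comprehension inside a comprehension: one inner list per column index
transposed = [[row[i] for row in some_matrix] for i in range(len(some_matrix[0]))]
print(transposed)
```
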
In [None]:
# Create a multiplication table as a nested list
# This creates a list of lists (each inner list is a row)
mult_table = [[i * j for j in range(1, 6)] for i in range(1, 6)]  # Read this nested comprehension from the outside in

print("Multiplication table:")
print(mult_table)

In [None]:
print("\nFormatted multiplication table:")
for row in mult_table:
    print(" ".join(f"{value:3d}" for value in row))  # Here we're using the string join() method to build a neatly aligned string for each row

## When NOT to Use Comprehensions

Comprehensions are great, but they're not always the best choice:

1. **When the logic is complex** - If you need multiple conditions or transformations, a loop is clearer
2. **When you need side effects** - Comprehensions are for creating new collections, not for executing actions
3. **When the line gets too long** - If it doesn't fit on one readable line, break it up

Remember what Axel Foley might say: "Just because you can doesn't mean you should."

In [None]:
# BAD: This comprehension is too complex and hard to read
#result = [transform(x) if condition1(x) else other_transform(x) for x in data if filter1(x) and filter2(x) or filter3(x)]

# BETTER: Use a regular loop when logic is complex
def process_data(data):
    result = []
    for x in data:
        # Complex filtering logic
        if not (x > 0 and x < 100):
            continue
        if x % 2 == 0 and x % 3 == 0:
            continue
            
        # Complex transformation
        if x < 50:
            result.append(x * 2)
        else:
            result.append(x ** 0.5)
    
    return result

# Clear, maintainable, debuggable

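A middle ground, when the conditions are the only messy part, is to give them names. The sketch below re-expresses the rules of `process_data` (keep values strictly between 0 and 100 that are not divisible by both 2 and 3, then double the ones under 50 and take the square root of the rest) as two hypothetical helpers, `keep` and `transform`. With those in place, the comprehension reads almost like a sentence.

```python
def keep(x):
    """Same filter as process_data: 0 < x < 100 and not divisible by both 2 and 3."""
    return 0 < x < 100 and not (x % 2 == 0 and x % 3 == 0)

def transform(x):
    """Same transformation as process_data: double small values, square-root large ones."""
    return x * 2 if x < 50 else x ** 0.5

data = [-5, 3, 6, 12, 49, 64, 81, 150]
result = [transform(x) for x in data if keep(x)]
print(result)
```
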
<!-- Start Exercise 3.2 -->
<hr/>
<img src="../images/stop_right_margin.png" align="left">

<font size=3 color="darkred"> Exercise: Comprehension Practice </font>
<div class="inclass_exercise_body" style="padding-left: 130px; width: 85%; text-align: justify;text-align-last: left;">
Given the following data about ML models, complete the exercises below:
</div>

In [None]:
# Model data: (name, accuracy, training_time_seconds)
models = [
    ("LogisticRegression", 0.82, 5.2),
    ("RandomForest", 0.91, 45.7),
    ("GradientBoosting", 0.93, 120.3),
    ("SVM", 0.78, 200.1),
    ("NeuralNetwork", 0.95, 350.8),
    ("DecisionTree", 0.75, 2.1),
]

# 1: Create a list of model names that have accuracy > 0.85
# Expected: ['RandomForest', 'GradientBoosting', 'NeuralNetwork']
high_accuracy_models = []  # Your comprehension here
print(f"High accuracy models: {high_accuracy_models}")

In [None]:

# 2: Create a dictionary mapping model names to their accuracy
# Expected: {'LogisticRegression': 0.82, 'RandomForest': 0.91, ...}
model_accuracy_dict = {}  # Your comprehension here
print(f"Model accuracies: {model_accuracy_dict}")

In [None]:

# 3: Use zip to create a list of tuples pairing names with a "fast"/"slow" label
# (training_time < 60 = "fast", otherwise "slow")
# Hint: You'll need a conditional expression
model_speed_labels = []  # Your comprehension here
print(f"Speed labels: {model_speed_labels}")

<hr/>
<!-- End Exercise 3.2 -->

# 3.3 Lambda Functions: Anonymous One-Liners

<img alt="Lambdas are one-liners" src="../images/L03_arnie_chopper.png" width="800" style="display:block;">
<font size=2> Arnie's classic one-liner from <i>Predator (1987)</i> </font>

Lambda functions are small, anonymous functions that you can define inline. They're like the hired guns of the Python world: they show up, do one job, and disappear. As the mercenary Dutch might say before a mission, "Get to the chopper!" Sometimes you just need something quick and disposable.

## Basic Lambda Syntax

A lambda function is defined with the `lambda` keyword. The syntax is:
```python
lambda arguments: expression
```

For example:

In [None]:
# Regular function
def square(x):
    return x ** 2

# Equivalent lambda function
square_lambda = lambda x: x ** 2

# Both work the same way
print(f"Regular function: {square(5)}")
print(f"Lambda function: {square_lambda(5)}")

Key limitations of lambda functions:
- Can only contain a single expression (no statements)
- No assignment statements, no loops, no multiple lines
- The expression's result is automatically returned

But, they can take more than one (or even zero) argument:

In [None]:
# Lambdas can take multiple arguments
add = lambda x, y: x + y
print(f"3 + 4 = {add(3, 4)}")

In [None]:
# Lambdas can also take no arguments at all
get_greeting = lambda: "Hello, World!"
print(get_greeting())

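"Single expression" still leaves room for a conditional expression, and lambdas accept default argument values just like `def` functions. The two lambdas below are bound to names only so the demo can call them; in real code you would pass them inline (or write a `def`).

```python
# A conditional expression counts as one expression, so it is allowed in a lambda
grade = lambda acc: "good" if acc >= 0.9 else "needs work"

# Default argument values work the same way as in def functions
to_percent = lambda x, decimals=1: round(x * 100, decimals)

print(grade(0.93), grade(0.82))
print(to_percent(0.8734), to_percent(0.8734, decimals=2))
```
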
## The Real Power: Lambdas as Arguments

The real power of lambdas isn't in assigning them to variables (PEP 8 actually discourages that; just write a regular `def` function instead). It's in passing them as arguments to other functions.

Common applications of lambdas include:
- Sorting with a custom `key`
- Higher-order built-ins like `map()` and `filter()`
- Transforming DataFrames with `apply()`


### Sorting with Custom Keys

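The `sorted()` function and `.sort()` method accept a `key` parameter: a function that is called on each item, after which the items are compared by the values it returns rather than by the items themselves: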

In [None]:
# Sort movies by year

# movies is a list of tuples: (title, year, box_office_millions)
movies = [
    ("The Terminator", 1984, 78.3),
    ("Aliens", 1986, 131.1),
    ("Predator", 1987, 98.3),
    ("Die Hard", 1988, 140.8),
    ("Beetlejuice", 1988, 73.7),
]

# Sort by year (index 1)
by_year = sorted(movies, key=lambda movie: movie[1]) # key = a lambda function that takes a movie tuple as input and returns the year (index 1)
print("Sorted by year:")
for m in by_year:
    print(f"  {m[0]} ({m[1]})")

In [None]:
# Sort by box office (index 2), descending
by_box_office = sorted(movies, key=lambda movie: movie[2], reverse=True) # key = a lambda function that takes a movie tuple as input and returns the box office (index 2)
print("Sorted by box office (descending):")
for m in by_box_office:
    print(f"  {m[0]}: ${m[2]}M")

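Two common extensions of the `key` idea: returning a tuple to sort on several fields at once, and using `operator.itemgetter` from the standard library as a ready-made replacement for index-grabbing lambdas. The same `key=` argument also works with `min()` and `max()`.

```python
from operator import itemgetter

movies = [
    ("The Terminator", 1984, 78.3),
    ("Aliens", 1986, 131.1),
    ("Predator", 1987, 98.3),
    ("Die Hard", 1988, 140.8),
    ("Beetlejuice", 1988, 73.7),
]

# Multi-level sort: the key returns a tuple, which is compared element by element.
# Negating the year sorts newest first; ties are then broken by title A-Z.
newest_first = sorted(movies, key=lambda m: (-m[1], m[0]))
print([m[0] for m in newest_first])

# operator.itemgetter(1) does the same job as lambda m: m[1]
by_year = sorted(movies, key=itemgetter(1))
print([m[0] for m in by_year])  # the sort is stable: Die Hard stays ahead of Beetlejuice

# key= works with min() and max() too
biggest_hit = max(movies, key=lambda m: m[2])
print(biggest_hit[0])
```
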
### Using `map()` and `filter()` (and other higher-order functions)

The `map()` and `filter()` functions are classic use cases for lambdas:

In [None]:
numbers = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

# map() applies a function to every element
squared = list(map(lambda x: x ** 2, numbers))
print(f"Squared: {squared}")

# filter() keeps only elements where the function returns True
evens = list(filter(lambda x: x % 2 == 0, numbers))
print(f"Evens: {evens}")

# For this very simple use case, comprehensions are a much cleaner approach
squared_comp = [x ** 2 for x in numbers]
evens_comp = [x for x in numbers if x % 2 == 0]
print(f"Squared (comprehension): {squared_comp}")
print(f"Evens (comprehension): {evens_comp}")

# BUT, map()/filter() can read better when you already have a named function to apply (e.g. map(str.upper, words)),
#  or when chaining several steps lazily. (Generator expressions like (x ** 2 for x in numbers) are lazy too.)

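The one classic higher-order function that actually lives in `functools` is `reduce()`, which folds a sequence into a single value. The sketch chains `filter()`, `map()`, and `reduce()`, then shows the usually clearer `sum()` plus generator-expression version. It also demonstrates a common gotcha: `map()` and `filter()` return lazy iterators that can be consumed only once.

```python
from functools import reduce

numbers = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

# Chained pipeline: keep evens -> square them -> fold them into one total
even_squares = map(lambda x: x ** 2, filter(lambda x: x % 2 == 0, numbers))
total = reduce(lambda acc, x: acc + x, even_squares, 0)
print(total)

# The same computation with a generator expression and sum()
total_genexp = sum(x ** 2 for x in numbers if x % 2 == 0)
print(total_genexp)

# map/filter objects are lazy, single-use iterators
squares_iter = map(lambda x: x ** 2, numbers)
first_pass = list(squares_iter)
second_pass = list(squares_iter)  # already exhausted, so this is empty
print(first_pass[:3], second_pass)
```
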
## Lambdas in Pandas

One place where lambdas really shine is in pandas operations. The `.apply()` method is commonly used with lambdas:

In [None]:
import pandas as pd
pd.set_option('display.width', 1000)  # Set a wide display width to prevent wrapping


# Create a sample DataFrame
df = pd.DataFrame({
    'movie': ['The Terminator', 'Ghostbusters', 'Top Gun', 'Ferris Bueller'],
    'year': [1984, 1984, 1986, 1986],
    'budget_millions': [6.4, 30.0, 15.0, 5.8],
    'box_office_millions': [78.3, 295.2, 356.8, 70.1]
})

print("Original DataFrame:")
print(df)

In [None]:
# Calculate ROI using apply with a lambda
df['roi'] = df.apply(
    lambda row: (row['box_office_millions'] - row['budget_millions']) / row['budget_millions'],
    axis=1
)

# Format movie titles using apply on a series
df['movie_formatted'] = df['movie'].apply(lambda x: x.upper())   # Though, for this use case, I would just do df['movie'].str.upper()

print("DataFrame with new columns:")
print(df)

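With `axis=1`, `apply()` calls the lambda once per row in Python. For plain arithmetic like ROI, whole-column (vectorized) operations give the same numbers without the per-row function calls, and are usually much faster on large frames. A minimal comparison using the same four movies:

```python
import pandas as pd

df = pd.DataFrame({
    'movie': ['The Terminator', 'Ghostbusters', 'Top Gun', 'Ferris Bueller'],
    'year': [1984, 1984, 1986, 1986],
    'budget_millions': [6.4, 30.0, 15.0, 5.8],
    'box_office_millions': [78.3, 295.2, 356.8, 70.1],
})

# Row by row: the lambda runs once for every row
roi_apply = df.apply(
    lambda row: (row['box_office_millions'] - row['budget_millions']) / row['budget_millions'],
    axis=1,
)

# Vectorized: the arithmetic operates on whole columns at once
roi_vectorized = (df['box_office_millions'] - df['budget_millions']) / df['budget_millions']

print(pd.DataFrame({'movie': df['movie'], 'roi_apply': roi_apply, 'roi_vectorized': roi_vectorized}))
```
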
**Use a lambda when:**
- The function is simple (one expression)
- You're using it once, inline
- It makes the code more readable, not less

**Use a named function when:**
- The logic is complex or multi-step
- You need to reuse the function
- You want a descriptive name for documentation
- You need to debug (lambdas show up as `<lambda>` in tracebacks)

<!-- Start Exercise 3.3 -->
<hr/>
<img src="../images/stop_right_margin.png" align="left">

<font size=3 color="darkred"> Exercise: Lambda Practice </font>
<div class="inclass_exercise_body" style="padding-left: 130px; width: 85%; text-align: justify;text-align-last: left;">
Complete the following exercises using lambda functions:
</div>

In [None]:
# Data: List of (model_name, accuracy, training_time) tuples
models = [
    ("LogisticRegression", 0.82, 5.2),
    ("RandomForest", 0.91, 45.7),
    ("GradientBoosting", 0.93, 120.3),
    ("SVM", 0.78, 200.1),
    ("NeuralNetwork", 0.95, 350.8),
]

# 1: Sort models by accuracy (highest first)
# Hint: Use sorted() with a lambda as the key, and reverse=True
by_accuracy = None  # Your code here
print(f"By accuracy: {by_accuracy}")

#  2: Sort models by "efficiency" (accuracy / training_time)
# Higher efficiency = better
by_efficiency = None  # Your code here
print(f"By efficiency: {by_efficiency}")

<hr/>
<!-- End Exercise 3.3 -->

# 3.4 Type Hints: Documentation That Helps You Code

<img alt="Type hints are like a mission briefing that helps you code" width="900" src="../images/L03_starwars_briefing.png" style="display:block;">
<font size=2>The rebel alliance gets a mission briefing before their mission to destroy the death star in <i>Star Wars (1977)</i></font>

Python is a dynamically typed language: you don't have to declare variable types like you would in Java or C++. But as codebases grow larger, it becomes helpful to have some indication of what types a function expects and returns.

Enter type hints (also called type annotations). Type hints for functions were standardized in Python 3.5 (PEP 484), the variable annotations shown below arrived in Python 3.6, and hints have become increasingly popular in modern Python code. Like having a **detailed briefing before a mission**, type hints tell you exactly what you're working with.

## Basic Type Hints for Variables

You can annotate variables with their expected type using a colon. The syntax is:

```python
variable_name: variable_type = assigned_value
```

Here are some examples:


In [None]:
# Basic type hints for variables
name: str = "Sarah Connor"
age: int = 29
is_targeted: bool = True
survival_probability: float = 0.73

# Python doesn't enforce these at runtime!
# This runs without error (though a type checker such as mypy, or your IDE, can flag it)
age = "twenty-nine"  # No error, but clearly wrong
print(f"Age is now: {age}")

<div class="callout" style="
  width: 80%;
  background: rgba(127,127,127,0.15);
  border: 1px solid rgba(127,127,127,0.3);
  padding: 10px 30px;
  margin: 20px;
  border-radius: 6px;
  text-align: justify;
  text-align-last: left;
  font-size: 11pt;
">
  <span style="
    font-size: 60pt;
    line-height: 1;
    float: left;
    margin: 0px 0px 0px 0;
  ">
    💡
  </span>

  <b>Important:</b> Type hints are NOT enforced at runtime! Python will happily let you assign a string to an <code>int</code> variable. The hints are there for documentation, IDE support, and optional static analysis tools like <code>mypy</code>.
  
  <!-- clearfix -->
  <div style="clear: both;"></div>
</div>

## Type Hints for Functions

Where type hints really shine is in function signatures. You can annotate parameters and return types. The syntax is:

```python
def fn_name(argument_name: argument_type = default_value, ...) -> return_type:
    ...
```

where `= default_value` is optional (for cases when you want to specify a default value for the argument).

Here are some examples:

In [None]:
# Function without type hints (harder to understand)
def calculate_accuracy(correct, total):
    return correct / total

# Function WITH type hints (self-documenting!)
def calculate_accuracy_typed(correct: int, total: int) -> float:
    """Calculate accuracy as a ratio of correct predictions to total."""
    return correct / total

# The -> float indicates the return type
result = calculate_accuracy_typed(85, 100)
print(f"Accuracy: {result:.2%}")

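Annotations are stored on the function object, in its `__annotations__` attribute, where tools and IDEs can read them, but Python does not check them when you call the function. A quick way to see both facts:

```python
def calculate_accuracy_typed(correct: int, total: int) -> float:
    """Calculate accuracy as a ratio of correct predictions to total."""
    return correct / total

# The hints are just data attached to the function...
print(calculate_accuracy_typed.__annotations__)

# ...so passing a float where an int is annotated runs without complaint
print(calculate_accuracy_typed(8.5, 10))
```
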
## Collection Types: Lists, Dicts, and More

For collections, you can specify what types they contain. In Python 3.9+, you can use the built-in types directly. In earlier versions, you need to import from `typing`:

In [None]:
# Python 3.9+ syntax (preferred)
movie_titles: list[str] = ["Robocop", "Total Recall", "Running Man"]
movie_years: dict[str, int] = {"Robocop": 1987, "Total Recall": 1990}
coordinates: tuple[float, float] = (34.0522, -118.2437)  # LA coordinates

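# For Python 3.8 and earlier, import the capitalized versions from typing
from typing import List, Dict, Tuple
movie_titles_old: List[str] = ["Robocop", "Total Recall", "Running Man"]
movie_years_old: Dict[str, int] = {"Robocop": 1987, "Total Recall": 1990}
coordinates_old: Tuple[float, float] = (34.0522, -118.2437)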

print(f"Movies: {movie_titles}")
print(f"Years: {movie_years}")

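Collection hints can nest, and long ones can be given a short name (a type alias) with plain assignment. The sketch types the `(name, accuracy, training_time)` records used in this lecture's exercises; the alias name `ModelRecord` and the helper `best_model` are made up for illustration, and the `list[...]`/`tuple[...]` syntax assumes Python 3.9+.

```python
# A type alias: just a variable whose value is a type
ModelRecord = tuple[str, float, float]   # (name, accuracy, training_time_seconds)

models: list[ModelRecord] = [
    ("LogisticRegression", 0.82, 5.2),
    ("RandomForest", 0.91, 45.7),
]

# Nested collection: each genre maps to a list of titles
genre_titles: dict[str, list[str]] = {"sci-fi": ["Robocop", "Total Recall"]}

def best_model(records: list[ModelRecord]) -> str:
    """Return the name of the record with the highest accuracy."""
    return max(records, key=lambda r: r[1])[0]

print(best_model(models))
```
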
## Optional and Union Types

Sometimes a value can be `None`, or it could be one of several types. Use `Optional` and `Union` (or the `|` operator in Python 3.10+):

In [None]:
from typing import Optional, Union

# Optional means "this type OR None"
def find_movie(title: str, movie_years: dict[str, int]) -> Optional[int]:
    """Return the year of a movie, or None if not found."""
    return movie_years.get(title)

# Union means "one of these types"
def get_id(identifier: Union[int, str]) -> str:
    """Accept either an int or string ID, return as string."""
    return str(identifier)

# Python 3.10+ syntax (cleaner!)
# def find_movie(title: str, movie_years: dict[str, int]) -> int | None:
# def get_id(identifier: int | str) -> str:

# Example usage
movies = {"Aliens": 1986, "Predator": 1987}
year = find_movie("Aliens", movies)
print(f"Aliens came out in: {year}")

missing = find_movie("Jaws", movies)
print(f"Jaws: {missing}")  # None

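An `Optional[int]` return type is a reminder to handle the `None` case before treating the result as an `int`. Checking `is None` first (often called narrowing) is the usual pattern, and static type checkers understand it. `describe_movie` is a hypothetical helper for illustration.

```python
from typing import Optional

def find_movie(title: str, movie_years: dict[str, int]) -> Optional[int]:
    """Return the year of a movie, or None if not found."""
    return movie_years.get(title)

def describe_movie(title: str, movie_years: dict[str, int]) -> str:
    year = find_movie(title, movie_years)
    if year is None:
        # Without this check, year // 10 below would raise a TypeError for missing titles
        return f"{title}: not found"
    # From here on, year is known to be an int
    return f"{title} is a {year // 10 * 10}s movie"

movies = {"Aliens": 1986, "Predator": 1987}
print(describe_movie("Aliens", movies))
print(describe_movie("Jaws", movies))
```
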
## Real-World Example: ML Function Signatures

Here's what type hints look like in a realistic ML context:

In [None]:
from typing import Optional
import pandas as pd

def train_model(
    X_train: pd.DataFrame,
    y_train: pd.Series,
    model_type: str = "random_forest",
    n_estimators: int = 100,
    random_state: Optional[int] = None
) -> dict[str, float]:
    """
    Train a classification model and return performance metrics.
    
    Args:
        X_train: Feature matrix
        y_train: Target labels
        model_type: Type of model ('random_forest', 'gradient_boost', etc.)
        n_estimators: Number of estimators for ensemble models
        random_state: Random seed for reproducibility
        
    Returns:
        Dictionary with 'accuracy', 'precision', 'recall', 'f1' scores
    """
    # Implementation would go here...
    # For now, just return dummy metrics
    return {
        "accuracy": 0.92,
        "precision": 0.89,
        "recall": 0.94,
        "f1": 0.91
    }

# When you hover over this function in VS Code, you'll see all the type info!
# This makes the function self-documenting
metrics = train_model(X_train=pd.DataFrame(), y_train=pd.Series())
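
Editor tooltips come from annotation data that Python stores on every function. To see roughly what that data looks like, you can read it back with the standard library's `inspect.signature`. In this sketch, `split_sizes` is a hypothetical helper and is not part of any ML library.

```python
import inspect

def split_sizes(n_rows: int, test_size: float = 0.2) -> tuple[int, int]:
    """Return (n_train, n_test) row counts for a train/test split."""
    n_test = round(n_rows * test_size)
    return n_rows - n_test, n_test

sig = inspect.signature(split_sizes)

# Each parameter carries its annotation and its default value (if any)
for name, param in sig.parameters.items():
    default = "no default" if param.default is inspect.Parameter.empty else param.default
    print(f"{name}: {param.annotation.__name__}, {default}")

print(f"returns: {sig.return_annotation}")  # returns: tuple[int, int]
print(split_sizes(1000))  # (800, 200)
```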

## Why Use Type Hints?

1. **IDE Support**: Editors such as VS Code and PyCharm use hints to offer better autocomplete and to flag many mistakes before you run the code
2. **Documentation**: The function signature tells you exactly what types to pass
3. **Maintainability**: Explicit types make code easier to read and to refactor later; future you (and your teammates) will thank you
4. **Bug Prevention**: Static analysis tools like `mypy` can catch type errors without running the code
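
Points 1 and 4 work because tools check the hints. Python itself does not. The sketch below uses a hypothetical `total_runtime` function and runs without error even though the argument does not match its hint.

```python
def total_runtime(minutes: list[int]) -> int:
    """Add up movie runtimes given in whole minutes."""
    return sum(minutes)

# The hint says list[int], but Python does not check it at runtime:
# the floats are accepted and the result is a float, not an int
result = total_runtime([102.5, 107.0])
print(result, type(result).__name__)  # 209.5 float
```

If you ran `mypy` on a file containing this call, it would report the float list items as incompatible with `list[int]`, without executing any code.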

<div class="callout" style="
  width: 80%;
  background: rgba(127,127,127,0.15);
  border: 1px solid rgba(127,127,127,0.3);
  padding: 10px 30px;
  margin: 20px;
  border-radius: 6px;
  text-align: justify;
  text-align-last: left;
  font-size: 11pt;
">
  <span style="
    font-size: 60pt;
    line-height: 1;
    float: left;
    margin: 0px 0px 0px 0;
  ">
    🧠
  </span>

<font color=blue><b>Q</b></font>: Do I have to use type hints?

<font color=blue><b>A</b></font>: No! Type hints are completely optional. You'll see them everywhere in modern Python libraries, so you need to be able to <i>read</i> them. Whether you <i>write</i> them is up to you and your team. For data science and ML work, many people skip them for quick scripts but use them for production code.
  
  <!-- clearfix -->
  <div style="clear: both;"></div>
</div>

<!-- Start Exercise 3.4 -->
<hr/>
<img src="../images/stop_right_margin.png" align="left">

<font size=3 color="darkred"> In-Class Exercise: Reading Type Hints </font>
<div class="inclass_exercise_body" style="padding-left: 130px; width: 85%; text-align: justify;text-align-last: left;">
Look at the following function signature and answer the questions in the comments:
</div>

In [None]:
from typing import Optional

def evaluate_models(
    model_names: list[str],
    scores: dict[str, float],
    threshold: float = 0.8,
    include_metadata: bool = False
) -> tuple[list[str], Optional[dict]]:
    """Evaluate models and return those passing the threshold."""
    pass  # Implementation not shown

# Questions:
# 1. What type should model_names be? 
#    Answer: 

# 2. What types does scores map from and to?
#    Answer: 

# 3. What is the default value of threshold?
#    Answer: 

# 4. What type does this function return?
#    Answer: 

# 5. What does the Optional mean in the return of this function?
#    Answer:

<hr/>
<!-- End Exercise 3.4 -->

# Summary

In this lecture, we covered four essential Python best practices and modern patterns:

## F-Strings
- Clean, readable string formatting with `f"...{expression}..."`
- Format specifiers for numbers, percentages, and dates
- Debugging with the `=` specifier

## List and Dictionary Comprehensions
- Concise list creation: `[expr for item in iterable if condition]`
- Dictionary comprehensions: `{key: value for item in iterable}`
- Using `zip()` to combine iterables
- Real-world file operations with `pathlib`

## Lambda Functions
- Anonymous one-liners: `lambda args: expression`
- Best used as arguments to `sorted()`, `map()`, `filter()`, `.apply()`
- When to use named functions instead

## Type Hints
- Annotating variables and functions with types
- Common types: `str`, `int`, `float`, `list[T]`, `dict[K, V]`, `Optional[T]`
- Benefits: IDE support, documentation, bug prevention
- Not enforced at runtime: just helpful documentation
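
The short recap sketch below uses all four patterns together. The titles are the ones used earlier in this lecture. The `release` mapping and the `lookup` helper were made up for this example.

```python
from typing import Optional

# Illustrative data: titles and their release years
titles: list[str] = ["Robocop", "Total Recall", "Running Man"]
years: list[int] = [1987, 1990, 1987]

# Dictionary comprehension + zip: build a title -> year mapping
release: dict[str, int] = {title: year for title, year in zip(titles, years)}

# Lambda as a sort key: order titles by year, then alphabetically
by_year: list[str] = sorted(release, key=lambda t: (release[t], t))

def lookup(title: str) -> Optional[int]:
    """Type-hinted helper that returns None for unknown titles."""
    return release.get(title)

# Generator expression inside sum(): fraction of titles from 1987
share_1987 = sum(1 for y in years if y == 1987) / len(years)

# f-strings: the = debugging specifier and a percentage format specifier
print(f"{by_year=}")
print(f"Share from 1987: {share_1987:.1%}")
print(f"{lookup('Jaws') = }")
```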

These patterns will make your code more readable, maintainable, and robust. You'll see them everywhere in modern Python libraries and production code; they are like the unwritten rules that everyone in the field just knows.

# Questions, Reactions, and Feedback

As always, I'd like to hear your thoughts on this material:

1. **What concepts were most new or surprising to you?**

2. **What would you like to see explained in more depth?**

3. **Any questions that came up as you worked through the notebook?**

4. **What connections do you see between these patterns and the ML work you'll be doing?**