# Python for ML - Part 2: Advanced Data Structures & Functions

**Prerequisites:** Python Basics Part 1 (variables, operators, strings, basic lists)

## 📖 Topics Covered

### Advanced Data Structures
- **Tuples**: Immutable sequences, packing/unpacking, use cases
- **Sets**: Unique elements, set operations, performance benefits
- **Dictionaries**: Key-value pairs, methods, nested dicts, comprehensions

### Functions
- Function basics: definition, parameters, return values
- Default arguments, *args, **kwargs
- Lambda functions, map, filter, reduce



# Part 1: Advanced Data Structures

## 1. Tuples - Immutable Sequences

### What are Tuples?
- **Ordered** collection (like lists)
- **Immutable** (cannot be changed after creation)
- Created using parentheses `()` or just commas

### When to Use Tuples?
1. **Fixed data**: Coordinates `(x, y)`, RGB colors `(255, 0, 0)`
2. **Function returns**: Return multiple values
3. **Dictionary keys**: Tuples can be dict keys, lists cannot!
4. **Performance**: Slightly smaller in memory and quicker to create than lists in CPython (iteration speed is about the same)
5. **Data integrity**: Prevent accidental modifications

### ML Use Cases
- Image dimensions: `(height, width, channels)`
- Model architecture: `(input_size, hidden_size, output_size)`
- Data shape: `tensor.shape` returns a tuple
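
The use cases above can be sketched in a few lines. This is a minimal illustration, not part of the original notebook: the helper `min_max` and the `pixel_values` dictionary are hypothetical names chosen for the example.

```python
def min_max(values):
    """Return the smallest and largest value as a (min, max) tuple."""
    return min(values), max(values)  # the comma packs both results into one tuple


low, high = min_max([0.2, 0.9, 0.5])  # unpack the returned tuple
print(f"low={low}, high={high}")

# A tuple of hashable items is hashable, so it can serve as a dictionary key.
pixel_values = {(0, 0): 255, (0, 1): 128}
print(f"Value at (0, 1): {pixel_values[(0, 1)]}")

# A list is mutable and unhashable, so using it as a key raises TypeError.
try:
    pixel_values[[1, 1]] = 64
except TypeError as error:
    print(f"TypeError: {error}")
```

The failed assignment leaves `pixel_values` unchanged, which is why lists cannot stand in for tuples here.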


In [None]:
# ========================================
# TUPLES: BASIC OPERATIONS
# ========================================

# Creating tuples
print("=== Creating Tuples ===")
point = (3, 5)
rgb = (255, 128, 0)
single = (42,)  # Note the comma! Without it, it's just a number
empty = ()
no_parens = 1, 2, 3  # Tuple packing

print(f"Point: {point}")
print(f"RGB: {rgb}")
print(f"Single element: {single}, type: {type(single)}")
print(f"Without comma: {(42)}, type: {type((42))}")
print(f"No parentheses: {no_parens}, type: {type(no_parens)}")

# Accessing elements (just like lists)
print("\n=== Accessing Elements ===")
print(f"First element: {point[0]}")
print(f"Last element: {rgb[-1]}")
print(f"Slicing: {rgb[1:3]}")

# Common operations
print("\n=== Common Operations ===")
print(f"Length: {len(rgb)}")
print(f"Count 255: {rgb.count(255)}")
print(f"Index of 128: {rgb.index(128)}")
print(f"Concatenation: {point + (7, 9)}")
print(f"Repetition: {(1, 2) * 3}")

# Immutability demonstration
print("\n=== Immutability ===")
try:
    point[0] = 10  # This will fail!
except TypeError as e:
    # Tuples do not support item assignment, so Python raises TypeError
    print(f"Error: {e}")
    print("✓ Tuples cannot be modified!")

### 🎁 Tuple Unpacking - A Powerful Pattern

Unpacking is one of Python's most elegant features. It's used EVERYWHERE in ML code!

In [None]:
# ========================================
# TUPLE UNPACKING PATTERNS
# ========================================

# Basic unpacking
print("=== Basic Unpacking ===")
point = (10, 20)
x, y = point
print(f"x = {x}, y = {y}")


# Ignoring values with _
print("\n=== Ignoring Values ===")
rgb = (255, 128, 64)
red, _, _ = rgb  # We only care about red
print(f"Red channel: {red}")

# Unpacking in loops
print("\n=== Unpacking in Loops ===")
coordinates = [(0, 0), (1, 2), (3, 4)]
for x, y in coordinates:
    # Each (x, y) tuple is unpacked directly in the loop header
    print(f"Point: ({x}, {y})")

# Real ML example: unpacking batch data
print("\n=== ML Example: Batch Unpacking ===")
# Simulating a batch of (features, label) pairs
batch = [
    ([1.2, 3.4, 5.6], 0),
    ([2.1, 4.3, 6.5], 1),
    ([3.2, 5.4, 7.6], 0)
]

for features, label in batch:
    # Each item is a (features, label) tuple, unpacked into two names
    print(f"Features: {features}, Label: {label}")
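
Unpacking also supports a starred name that collects "everything else". The sketch below is a small addition to the patterns above; the variable names are illustrative only.

```python
scores = (92, 85, 78, 95, 88)

# The starred name collects the remaining items into a list.
first, *middle, last = scores
print(f"first={first}, middle={middle}, last={last}")

# The star can sit at the end to split off the head of a sequence.
head, *tail = scores
print(f"head={head}, tail={tail}")

# Use *_ when the collected items are not needed.
first_only, *_ = scores
print(f"first_only={first_only}")
```

Note that the starred part is always a list, even when the source is a tuple.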

### ✏️ Exercise 1: Tuple Practice

Complete the following tasks:

In [None]:
# Exercise 1.1: Create a tuple representing an image shape (224, 224, 3)
# and unpack it into height, width, channels
image_shape = None  # Your code here
# height, width, channels = image_shape
# print(f"Image: {height}x{width}, {channels} channels")

# Exercise 1.2: Given coordinates, swap them
# coord1, coord2 = (5, 10), (15, 20)
# Swap them in one line
# Your code here

# Exercise 1.3: Extract first and last from this tuple
# scores = (92, 85, 78, 95, 88)
# first, *_, last = ???
# Your code here

**Solutions:**

In [None]:
# Solution 1.1: Create tuple and unpack
image_shape = (224, 224, 3)
height, width, channels = image_shape
print(f"Image: {height}x{width}, {channels} channels")
# Output: Image: 224x224, 3 channels

# Solution 1.2: Swap coordinates
coord1, coord2 = (5, 10), (15, 20)
coord1, coord2 = coord2, coord1  # Swap in one line
print(f"After swap: coord1={coord1}, coord2={coord2}")
# Output: After swap: coord1=(15, 20), coord2=(5, 10)

# Solution 1.3: Extract first and last
scores = (92, 85, 78, 95, 88)
first, *_, last = scores
print(f"First: {first}, Last: {last}")
# Output: First: 92, Last: 88

### Advanced Tuple Techniques

In [1]:
# ========================================
# ADVANCED TUPLE PATTERNS
# ========================================

# Named tuples (from collections module)
print("=== Named Tuples ===")
from collections import namedtuple

# Define a Point type
Point = namedtuple('Point', ['x', 'y'])
p = Point(3, 5)
print(f"Point: {p}")
print(f"Access by name: p.x={p.x}, p.y={p.y}")
print(f"Access by index: p[0]={p[0]}, p[1]={p[1]}")

# ML example: Model configuration
ModelConfig = namedtuple('ModelConfig', ['input_size', 'hidden_size', 'output_size', 'dropout'])
config = ModelConfig(784, 256, 10, 0.5)
print(f"\nModel Config: {config}")
print(f"Hidden layer size: {config.hidden_size}")

# Tuples as dictionary keys
print("\n=== Tuples as Dict Keys ===")
# Useful for storing data by coordinates, pairs, etc.
grid = {}
grid[(0, 0)] = "origin"
grid[(1, 0)] = "right"
grid[(0, 1)] = "up"

print(f"Grid: {grid}")
print(f"Value at (1, 0): {grid[(1, 0)]}")

# ML example: Confusion matrix
confusion_matrix = {
    ('actual_0', 'pred_0'): 95,
    ('actual_0', 'pred_1'): 5,
    ('actual_1', 'pred_0'): 3,
    ('actual_1', 'pred_1'): 97
}
print(f"\nConfusion Matrix: {confusion_matrix}")
print(f"True Positives: {confusion_matrix[('actual_1', 'pred_1')]}")

=== Named Tuples ===
Point: Point(x=3, y=5)
Access by name: p.x=3, p.y=5
Access by index: p[0]=3, p[1]=5

Model Config: ModelConfig(input_size=784, hidden_size=256, output_size=10, dropout=0.5)
Hidden layer size: 256

=== Tuples as Dict Keys ===
Grid: {(0, 0): 'origin', (1, 0): 'right', (0, 1): 'up'}
Value at (1, 0): right

Confusion Matrix: {('actual_0', 'pred_0'): 95, ('actual_0', 'pred_1'): 5, ('actual_1', 'pred_0'): 3, ('actual_1', 'pred_1'): 97}
True Positives: 97
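
The confusion matrix above was typed in by hand. As a hedged sketch, the same kind of tuple-keyed counts can be built from label lists with `collections.Counter`; the `y_true` and `y_pred` lists below are made-up data for illustration.

```python
from collections import Counter

y_true = [0, 0, 1, 1, 1, 0]  # made-up ground-truth labels
y_pred = [0, 1, 1, 1, 0, 0]  # made-up model predictions

# zip() yields (actual, predicted) tuples; tuples are hashable, so Counter can count them.
pair_counts = Counter(zip(y_true, y_pred))

true_positives = pair_counts[(1, 1)]
false_positives = pair_counts[(0, 1)]
print(dict(pair_counts))
print(f"TP={true_positives}, FP={false_positives}")
```

A `Counter` returns 0 for pairs that never occurred, so missing cells do not raise `KeyError`.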


## 2. Sets - Unique Elements Collection

### What are Sets?
- **Unordered** collection of **unique** elements
- Created using curly braces `{}` or `set()`
- **Mutable** (can add/remove elements)
- Based on hash tables → **very fast** lookups!

### When to Use Sets?
1. **Remove duplicates** from a list
2. **Membership testing**: "Is X in the collection?"
3. **Set operations**: Union, intersection, difference
4. **Unique value tracking**: Track unique users, unique words, etc.
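
One caveat on removing duplicates: a set does not keep the original order. The sketch below contrasts `set()` with `dict.fromkeys()`, which keeps first-seen order on Python 3.7 and later; the `labels` list is made-up example data.

```python
labels = ['dog', 'cat', 'dog', 'bird', 'cat']

# set() removes duplicates, but the resulting order is arbitrary.
unique_unordered = set(labels)
print(f"As a set: {unique_unordered}")

# dict keys are unique and keep insertion order, so this preserves first-seen order.
unique_ordered = list(dict.fromkeys(labels))
print(f"Order preserved: {unique_ordered}")
```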

### 📊 ML Use Cases
- Vocabulary creation (unique words in text)
- Label validation (ensure all labels are from expected set)
- Feature engineering (unique categories)
- Data cleaning (find unique values, duplicates)

### Performance Tip
```python
# Checking if item in collection:
item in my_list    # O(n) - scans element by element, slow for large lists
item in my_set     # O(1) on average - a single hash lookup
```
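
A rough way to see the difference is to time both lookups with `timeit`. This is a sketch, not a rigorous benchmark: the helper `time_membership` is a hypothetical name, and absolute timings depend on the machine.

```python
import timeit


def time_membership(container, item, repeats=100):
    """Return the total seconds taken by `repeats` checks of `item in container`."""
    return timeit.timeit(lambda: item in container, number=repeats)


n = 100_000
as_list = list(range(n))
as_set = set(as_list)
target = n - 1  # last element: the worst case for a list scan

list_time = time_membership(as_list, target)
set_time = time_membership(as_set, target)
print(f"list: {list_time:.6f}s  set: {set_time:.6f}s")
```

On typical hardware the set lookup should be far faster, and the gap grows with `n`.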


In [6]:
# ========================================
# SETS: BASIC OPERATIONS
# ========================================

# Creating sets
print("=== Creating Sets ===")
fruits = {'apple', 'banana', 'cherry'}
numbers = set([1, 2, 3, 2, 1])  # Duplicates removed!
empty_set = set()  # Note: {} creates an empty dict, not set!

print(f"Fruits: {fruits}")
print(f"Numbers (duplicates removed): {numbers}")
print(f"Empty set: {empty_set}, type: {type(empty_set)}")
print(f"Empty dict: {{}}, type: {type({})}")

# Adding and removing
print("\n=== Adding and Removing ===")
fruits.add('mango')
# .add(): Add a single element to the set (no effect if already exists)
print(f"After add: {fruits}")

fruits.remove('banana')  # Raises KeyError if not found
print(f"After remove: {fruits}")

fruits.discard('grape')  # No error if not found
print(f"After discard (non-existent): {fruits}")

popped = fruits.pop()  # Remove and return arbitrary element
print(f"Popped: {popped}, Remaining: {fruits}")

# Membership testing (FAST!)
print("\n=== Membership Testing ===")
print(f"'apple' in fruits: {'apple' in fruits}")
print(f"'banana' in fruits: {'banana' in fruits}")

# Real ML example: Unique labels
print("\n=== ML Example: Finding Unique Labels ===")
labels = [0, 1, 0, 0, 1, 2, 1, 0, 2, 1, 1, 0]
unique_labels = set(labels)
# set(): Create a set - unordered collection of unique elements, O(1) lookup
print(f"All labels: {labels}")
print(f"Unique labels: {unique_labels}")
print(f"Number of classes: {len(unique_labels)}")

=== Creating Sets ===
Fruits: {'apple', 'cherry', 'banana'}
Numbers (duplicates removed): {1, 2, 3}
Empty set: set(), type: <class 'set'>
Empty dict: {}, type: <class 'dict'>

=== Adding and Removing ===
After add: {'apple', 'mango', 'cherry', 'banana'}
After remove: {'apple', 'mango', 'cherry'}
After discard (non-existent): {'apple', 'mango', 'cherry'}
Popped: apple, Remaining: {'mango', 'cherry'}

=== Membership Testing ===
'apple' in fruits: False
'banana' in fruits: False

=== ML Example: Finding Unique Labels ===
All labels: [0, 1, 0, 0, 1, 2, 1, 0, 2, 1, 1, 0]
Unique labels: {0, 1, 2}
Number of classes: 3


In [3]:
# ========================================
# SET OPERATIONS (The Real Power!)
# ========================================

# Setup
A = {1, 2, 3, 4, 5}
B = {4, 5, 6, 7, 8}

print("Set A:", A)
print("Set B:", B)

# Union: All elements from both sets
print("\n=== Union (A ∪ B) ===")
print(f"A | B = {A | B}")
print(f"A.union(B) = {A.union(B)}")

# Intersection: Common elements
print("\n=== Intersection (A ∩ B) ===")
print(f"A & B = {A & B}")
print(f"A.intersection(B) = {A.intersection(B)}")

# Difference: Elements in A but not in B
print("\n=== Difference (A - B) ===")
print(f"A - B = {A - B}")
print(f"B - A = {B - A}")
print(f"A.difference(B) = {A.difference(B)}")

# Symmetric Difference: Elements in either A or B but not both
print("\n=== Symmetric Difference (A △ B) ===")
print(f"A ^ B = {A ^ B}")
print(f"A.symmetric_difference(B) = {A.symmetric_difference(B)}")

# Subset and Superset
print("\n=== Subset and Superset ===")
C = {1, 2, 3}
print(f"C = {C}")
print(f"C is subset of A: {C.issubset(A)}")
# .issubset(): Check if all elements of this set are in another set
print(f"A is superset of C: {A.issuperset(C)}")
# .issuperset(): Check if this set contains all elements of another set

# Disjoint
print("\n=== Disjoint Sets ===")
D = {10, 20, 30}
print(f"D = {D}")
print(f"A and D are disjoint: {A.isdisjoint(D)}")
print(f"A and B are disjoint: {A.isdisjoint(B)}")

Set A: {1, 2, 3, 4, 5}
Set B: {4, 5, 6, 7, 8}

=== Union (A ∪ B) ===
A | B = {1, 2, 3, 4, 5, 6, 7, 8}
A.union(B) = {1, 2, 3, 4, 5, 6, 7, 8}

=== Intersection (A ∩ B) ===
A & B = {4, 5}
A.intersection(B) = {4, 5}

=== Difference (A - B) ===
A - B = {1, 2, 3}
B - A = {8, 6, 7}
A.difference(B) = {1, 2, 3}

=== Symmetric Difference (A △ B) ===
A ^ B = {1, 2, 3, 6, 7, 8}
A.symmetric_difference(B) = {1, 2, 3, 6, 7, 8}

=== Subset and Superset ===
C = {1, 2, 3}
C is subset of A: True
A is superset of C: True

=== Disjoint Sets ===
D = {10, 20, 30}
A and D are disjoint: True
A and B are disjoint: False
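
As one illustration of combining these operations, the sketch below computes the Jaccard similarity of two sets (intersection size divided by union size). The function name and the empty-set convention are choices made for this example, not something taken from the notebook.

```python
def jaccard_similarity(a, b):
    """Return |a ∩ b| / |a ∪ b| for two sets."""
    union = a | b
    if not union:
        return 1.0  # convention chosen here: two empty sets count as identical
    return len(a & b) / len(union)


A = {1, 2, 3, 4, 5}
B = {4, 5, 6, 7, 8}
print(jaccard_similarity(A, B))  # 2 shared elements / 8 total = 0.25
```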


### 🎓 Real ML Examples with Sets

In [None]:
# ========================================
# SETS IN ML: PRACTICAL EXAMPLES
# ========================================

# Example 1: Text Vocabulary
print("=== Example 1: Building Vocabulary ===")
documents = [
    "machine learning is awesome",
    "deep learning is powerful",
    "machine learning and deep learning"
]

# Build vocabulary (unique words)
vocabulary = set()
# set(): Create a set - unordered collection of unique elements, O(1) lookup
for doc in documents:
    words = doc.split()       # Split on whitespace into a list of words
    vocabulary.update(words)  # Add all words to set

print(f"Vocabulary: {vocabulary}")
print(f"Vocabulary size: {len(vocabulary)}")

# Example 2: Train-Test Split Validation
print("\n=== Example 2: Checking Data Leakage ===")
train_ids = {1, 2, 3, 4, 5, 6, 7, 8}
test_ids = {9, 10, 11, 12}

# Check for overlap (data leakage!)
overlap = train_ids & test_ids
if overlap:
    # A non-empty set is truthy, so this branch runs only when IDs are shared
    print(f"⚠️ WARNING: Data leakage detected! Overlap: {overlap}")
else:
    print("✓ No data leakage - train and test are disjoint")

# Example 3: Label Validation
print("\n=== Example 3: Validating Labels ===")
expected_labels = {'cat', 'dog', 'bird'}
actual_labels = ['cat', 'dog', 'cat', 'bird', 'fish', 'cat']

actual_label_set = set(actual_labels)
# set(): Create a set - unordered collection of unique elements, O(1) lookup
unexpected = actual_label_set - expected_labels

if unexpected:
    # Any label outside the expected set ends up in the difference
    print(f"⚠️ Unexpected labels found: {unexpected}")
else:
    print("✓ All labels are valid")

# Example 4: Finding Common Features
print("\n=== Example 4: Feature Selection ===")
features_dataset1 = {'age', 'height', 'weight', 'income', 'education'}
features_dataset2 = {'age', 'weight', 'occupation', 'income', 'city'}

common_features = features_dataset1 & features_dataset2
print(f"Common features: {common_features}")

unique_to_dataset1 = features_dataset1 - features_dataset2
unique_to_dataset2 = features_dataset2 - features_dataset1
print(f"Unique to dataset 1: {unique_to_dataset1}")
print(f"Unique to dataset 2: {unique_to_dataset2}")

### ✏️ Exercise 2: Set Practice

In [None]:
# Exercise 2.1: Remove duplicates from this list
# user_ids = [101, 102, 103, 102, 104, 101, 105, 103]
# unique_users = ???
# Your code here

# Exercise 2.2: Find students who are in both classes
# class_A = {'Alice', 'Bob', 'Charlie', 'David'}
# class_B = {'Charlie', 'David', 'Eve', 'Frank'}
# both_classes = ???
# Your code here

# Exercise 2.3: Find words that appear in doc1 but not in doc2
# doc1 = "python is great for machine learning"
# doc2 = "machine learning is powerful"
# unique_to_doc1 = ???
# Your code here

**Solutions:**

In [None]:
# Solution 2.1: Remove duplicates
user_ids = [101, 102, 103, 102, 104, 101, 105, 103]
unique_users = set(user_ids)
# set(): Create a set - unordered collection of unique elements, O(1) lookup
print(f"Unique users: {unique_users}")
# Output: {101, 102, 103, 104, 105} (set display order may vary)
# or convert back to sorted list:
unique_users_list = sorted(unique_users)
# sorted(): Return new sorted list from iterable (original unchanged)
print(f"Sorted unique users: {unique_users_list}")
# Output: [101, 102, 103, 104, 105]

# Solution 2.2: Find students in both classes
class_A = {'Alice', 'Bob', 'Charlie', 'David'}
class_B = {'Charlie', 'David', 'Eve', 'Frank'}
both_classes = class_A & class_B  # or class_A.intersection(class_B)
print(f"Students in both classes: {both_classes}")
# Output: {'Charlie', 'David'}

# Solution 2.3: Words unique to doc1
doc1 = "python is great for machine learning"
doc2 = "machine learning is powerful"
words_doc1 = set(doc1.split())
# set(): Create a set - unordered collection of unique elements, O(1) lookup
words_doc2 = set(doc2.split())
# set(): Create a set - unordered collection of unique elements, O(1) lookup
unique_to_doc1 = words_doc1 - words_doc2
print(f"Words unique to doc1: {unique_to_doc1}")
# Output: {'python', 'great', 'for'}

## 3. Dictionaries - Key-Value Power

### What are Dictionaries?
- Collection of **key-value pairs**
- Keys must be **unique** and **hashable** (typically immutable types such as strings, numbers, and tuples of hashable items)
- Values can be anything
- Created using `{}` or `dict()`
- Since Python 3.7, dicts are guaranteed to preserve insertion order
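
The points about ordering and hashable keys can be checked directly. The sketch below is illustrative: `is_hashable` is a hypothetical helper, and the `config` values are made up.

```python
config = {}
config['lr'] = 0.01
config['epochs'] = 5
config['batch_size'] = 32
print(list(config))  # keys come back in the order they were inserted


def is_hashable(obj):
    """Return True if obj can be used as a dictionary key."""
    try:
        hash(obj)
    except TypeError:
        return False
    return True


# A tuple that contains a list is not hashable, even though tuples are immutable.
for candidate in ('text', 42, (1, 2), [1, 2], {'a': 1}, (1, [2])):
    print(f"{candidate!r}: hashable={is_hashable(candidate)}")
```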

### When to Use Dictionaries?
1. **Lookup by name/ID**: Fast retrieval by key
2. **Counting**: Count occurrences of items
3. **Grouping**: Group data by category
4. **Configuration**: Store settings and parameters
5. **Caching**: Store computed results
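
The caching use case can be sketched with a plain dict. Here `slow_square` is a hypothetical stand-in for an expensive computation, and the `calls` counter exists only to show when the cache is hit.

```python
cache = {}
calls = 0  # counts how often the "expensive" function actually runs


def slow_square(n):
    """Stand-in for an expensive computation (hypothetical)."""
    global calls
    calls += 1
    return n * n


def cached_square(n):
    """Compute slow_square(n) once per distinct n and reuse the stored result."""
    if n not in cache:
        cache[n] = slow_square(n)
    return cache[n]


print(cached_square(12))  # computed and stored
print(cached_square(12))  # served from the cache
print(f"cache={cache}, calls={calls}")
```

The standard library offers the same idea as a decorator, `functools.lru_cache`.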

### 📊 ML Use Cases
- Model hyperparameters
- Feature mappings (label encoding)
- Word frequency counts
- Batch data (keys: feature names, values: arrays)
- Configuration files (JSON → dict)

### Performance
- **Lookup, insert, delete**: O(1) average case
- **Much faster than lists** for searching!


In [7]:
# ========================================
# DICTIONARIES: BASICS
# ========================================

# Creating dictionaries
print("=== Creating Dictionaries ===")
student = {'name': 'Alice', 'age': 20, 'grade': 'A'}
empty_dict = {}
from_pairs = dict([('a', 1), ('b', 2), ('c', 3)])
# dict(): Create a dictionary - key-value pairs with O(1) access by key
from_kwargs = dict(x=10, y=20, z=30)
# dict(): Create a dictionary - key-value pairs with O(1) access by key

print(f"Student: {student}")
print(f"Empty dict: {empty_dict}")
print(f"From pairs: {from_pairs}")
print(f"From kwargs: {from_kwargs}")

# Accessing values
print("\n=== Accessing Values ===")
print(f"Student name: {student['name']}")
print(f"Student age: {student.get('age')}")
# .get(): Safely get dict value by key (returns None if key not found, no error)
print(f"Student email (not exists): {student.get('email', 'Not provided')}")
# .get(key, default): Return the given default instead of None when the key is missing

# Try-except for missing keys
try:
    print(student['email'])  # KeyError!
except KeyError:
    # Square-bracket access raises KeyError for a missing key, unlike .get()
    print("KeyError: 'email' not found")

# Adding and modifying
print("\n=== Adding and Modifying ===")
student['email'] = 'alice@example.com'
student['age'] = 21  # Modify existing
print(f"Updated student: {student}")

# Removing items
print("\n=== Removing Items ===")
del student['grade']
print(f"After del: {student}")

popped_age = student.pop('age')
print(f"Popped age: {popped_age}, Remaining: {student}")

# Checking keys
print("\n=== Checking Keys ===")
print(f"'name' in student: {'name' in student}")
print(f"'age' in student: {'age' in student}")

=== Creating Dictionaries ===
Student: {'name': 'Alice', 'age': 20, 'grade': 'A'}
Empty dict: {}
From pairs: {'a': 1, 'b': 2, 'c': 3}
From kwargs: {'x': 10, 'y': 20, 'z': 30}

=== Accessing Values ===
Student name: Alice
Student age: 20
Student email (not exists): Not provided
KeyError: 'email' not found

=== Adding and Modifying ===
Updated student: {'name': 'Alice', 'age': 21, 'grade': 'A', 'email': 'alice@example.com'}

=== Removing Items ===
After del: {'name': 'Alice', 'age': 21, 'email': 'alice@example.com'}
Popped age: 21, Remaining: {'name': 'Alice', 'email': 'alice@example.com'}

=== Checking Keys ===
'name' in student: True
'age' in student: False


In [8]:
# ========================================
# DICTIONARY METHODS
# ========================================

scores = {'Alice': 95, 'Bob': 87, 'Charlie': 92, 'David': 88}

print("=== Dictionary Methods ===")
print(f"Original: {scores}")

# keys(), values(), items()
print("\n=== Keys, Values, Items ===")
print(f"Keys: {list(scores.keys())}")
# .keys(): Return view of all dictionary keys
print(f"Values: {list(scores.values())}")
# .values(): Return view of all dictionary values
print(f"Items: {list(scores.items())}")
# .items(): Return view of (key, value) pairs as tuples

# Iterating
print("\n=== Iteration ===")
for name in scores:  # Iterates over keys
    print(f"{name}: {scores[name]}")

print()
for name, score in scores.items():  # Better way!
    print(f"{name} scored {score}")

# update() - merge dictionaries
print("\n=== Update (Merge) ===")
new_scores = {'Eve': 90, 'Frank': 85}
scores.update(new_scores)
# .update(): Add multiple elements to set or update dict with key-value pairs
print(f"After update: {scores}")

# setdefault() - get or set default
print("\n=== Setdefault ===")
print(f"Alice's score: {scores.setdefault('Alice', 0)}")
# .setdefault(): Get value if key exists, otherwise set default and return it
print(f"Grace's score: {scores.setdefault('Grace', 0)}")
# .setdefault(): Get value if key exists, otherwise set default and return it
print(f"After setdefault: {scores}")

# fromkeys() - create dict from keys
print("\n=== Fromkeys ===")
names = ['Alice', 'Bob', 'Charlie']
initialized = dict.fromkeys(names, 0)
print(f"Initialized scores: {initialized}")

=== Dictionary Methods ===
Original: {'Alice': 95, 'Bob': 87, 'Charlie': 92, 'David': 88}

=== Keys, Values, Items ===
Keys: ['Alice', 'Bob', 'Charlie', 'David']
Values: [95, 87, 92, 88]
Items: [('Alice', 95), ('Bob', 87), ('Charlie', 92), ('David', 88)]

=== Iteration ===
Alice: 95
Bob: 87
Charlie: 92
David: 88

Alice scored 95
Bob scored 87
Charlie scored 92
David scored 88

=== Update (Merge) ===
After update: {'Alice': 95, 'Bob': 87, 'Charlie': 92, 'David': 88, 'Eve': 90, 'Frank': 85}

=== Setdefault ===
Alice's score: 95
Grace's score: 0
After setdefault: {'Alice': 95, 'Bob': 87, 'Charlie': 92, 'David': 88, 'Eve': 90, 'Frank': 85, 'Grace': 0}

=== Fromkeys ===
Initialized scores: {'Alice': 0, 'Bob': 0, 'Charlie': 0}


### 🎓 Real ML Examples with Dictionaries

In [9]:
# ========================================
# DICTIONARIES IN ML: PRACTICAL EXAMPLES
# ========================================

# Example 1: Word Frequency Count
print("=== Example 1: Word Frequency Counter ===")
text = "machine learning is great machine learning is powerful learning is fun"
words = text.split()

# Method 1: Manual counting
word_count = {}
for word in words:
    if word in word_count:
        word_count[word] += 1  # Seen before: increment the count
    else:
        word_count[word] = 1   # First occurrence: start at 1

print(f"Word frequencies: {word_count}")

# Method 2: Using get()
word_count2 = {}
for word in words:
    # get() returns 0 for unseen words, so no if/else is needed
    word_count2[word] = word_count2.get(word, 0) + 1

print(f"Word frequencies (get): {word_count2}")

# Method 3: Using Counter (best way!)
from collections import Counter
word_count3 = Counter(words)
print(f"Word frequencies (Counter): {dict(word_count3)}")
print(f"Most common 3: {word_count3.most_common(3)}")

# Example 2: Label Encoding
print("\n=== Example 2: Label Encoding ===")
categories = ['cat', 'dog', 'bird', 'cat', 'bird', 'cat']

# Create label encoder
label_encoder = {}
label_decoder = {}
# Note: set iteration order is arbitrary, so the integer IDs can differ between runs.
# Use sorted(set(categories)) when stable IDs are needed.
for i, label in enumerate(set(categories)):
    label_encoder[label] = i
    label_decoder[i] = label

print(f"Label encoder: {label_encoder}")
print(f"Label decoder: {label_decoder}")

# Encode labels
encoded = [label_encoder[cat] for cat in categories]
# List comprehension: Create list using compact for-loop syntax
print(f"Original: {categories}")
print(f"Encoded: {encoded}")

# Example 3: Model Configuration
print("\n=== Example 3: Model Configuration ===")
model_config = {
    'architecture': 'CNN',
    'layers': [
        {'type': 'conv', 'filters': 32, 'kernel_size': 3},
        {'type': 'pool', 'size': 2},
        {'type': 'conv', 'filters': 64, 'kernel_size': 3},
        {'type': 'dense', 'units': 128}
    ],
    'optimizer': 'adam',
    'learning_rate': 0.001,
    'batch_size': 32,
    'epochs': 10
}

print(f"Model config: {model_config}")
print(f"Learning rate: {model_config['learning_rate']}")
print(f"Number of layers: {len(model_config['layers'])}")

# Example 4: Grouping Data
print("\n=== Example 4: Grouping by Category ===")
data = [
    ('Alice', 'Math', 95),
    ('Bob', 'Math', 87),
    ('Alice', 'Science', 92),
    ('Charlie', 'Math', 78),
    ('Bob', 'Science', 85)
]

# Group scores by subject
grouped = {}
for name, subject, score in data:
    if subject not in grouped:
        grouped[subject] = []  # First time this subject appears: start an empty list
    grouped[subject].append((name, score))

print("Grouped by subject:")
for subject, scores in grouped.items():
    print(f"  {subject}: {scores}")

=== Example 1: Word Frequency Counter ===
Word frequencies: {'machine': 2, 'learning': 3, 'is': 3, 'great': 1, 'powerful': 1, 'fun': 1}
Word frequencies (get): {'machine': 2, 'learning': 3, 'is': 3, 'great': 1, 'powerful': 1, 'fun': 1}
Word frequencies (Counter): {'machine': 2, 'learning': 3, 'is': 3, 'great': 1, 'powerful': 1, 'fun': 1}
Most common 3: [('learning', 3), ('is', 3), ('machine', 2)]

=== Example 2: Label Encoding ===
Label encoder: {'cat': 0, 'bird': 1, 'dog': 2}
Label decoder: {0: 'cat', 1: 'bird', 2: 'dog'}
Original: ['cat', 'dog', 'bird', 'cat', 'bird', 'cat']
Encoded: [0, 2, 1, 0, 1, 0]

=== Example 3: Model Configuration ===
Model config: {'architecture': 'CNN', 'layers': [{'type': 'conv', 'filters': 32, 'kernel_size': 3}, {'type': 'pool', 'size': 2}, {'type': 'conv', 'filters': 64, 'kernel_size': 3}, {'type': 'dense', 'units': 128}], 'optimizer': 'adam', 'learning_rate': 0.001, 'batch_size': 32, 'epochs': 10}
Learning rate: 0.001
Number of layers: 4

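=== Example 4: Grouping by Category ===
Grouped by subject:
  Math: [('Alice', 95), ('Bob', 87), ('Charlie', 78)]
  Science: [('Alice', 92), ('Bob', 85)]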

### Dictionary Comprehensions

In [10]:
# ========================================
# DICTIONARY COMPREHENSIONS
# ========================================

# Basic syntax: {key_expr: value_expr for item in iterable}

# Example 1: Square numbers
squares = {x: x**2 for x in range(1, 6)}
# Dict comprehension: Create dictionary using compact for-loop syntax
print(f"Squares: {squares}")

# Example 2: With condition
even_squares = {x: x**2 for x in range(10) if x % 2 == 0}
print(f"Even squares: {even_squares}")

# Example 3: From two lists (zip)
names = ['Alice', 'Bob', 'Charlie']
scores = [95, 87, 92]
student_scores = {name: score for name, score in zip(names, scores)}
# Dict comprehension: Create dictionary using compact for-loop syntax
print(f"Student scores: {student_scores}")

# Example 4: Transform dictionary
original = {'a': 1, 'b': 2, 'c': 3}
doubled = {k: v*2 for k, v in original.items()}
# Dict comprehension: Create dictionary using compact for-loop syntax
print(f"Doubled: {doubled}")

# Example 5: Filter dictionary
scores = {'Alice': 95, 'Bob': 65, 'Charlie': 92, 'David': 55}
passing = {name: score for name, score in scores.items() if score >= 70}
# Dict comprehension: Create dictionary using compact for-loop syntax
print(f"Passing students: {passing}")

# ML Example: Normalize features
print("\n=== ML Example: Feature Normalization ===")
features = {'age': 25, 'height': 175, 'weight': 70}
max_values = {'age': 100, 'height': 200, 'weight': 150}

normalized = {k: v/max_values[k] for k, v in features.items()}
# Dict comprehension: Divide each feature by its maximum to scale it into [0, 1]
print(f"Original: {features}")
print(f"Normalized: {normalized}")

Squares: {1: 1, 2: 4, 3: 9, 4: 16, 5: 25}
Even squares: {0: 0, 2: 4, 4: 16, 6: 36, 8: 64}
Student scores: {'Alice': 95, 'Bob': 87, 'Charlie': 92}
Doubled: {'a': 2, 'b': 4, 'c': 6}
Passing students: {'Alice': 95, 'Charlie': 92}

=== ML Example: Feature Normalization ===
Original: {'age': 25, 'height': 175, 'weight': 70}
Normalized: {'age': 0.25, 'height': 0.875, 'weight': 0.4666666666666667}
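
Another common use of the same syntax is inverting a mapping, for example to turn a label encoder into a decoder. This is a sketch with a made-up `label_encoder`; swapping keys and values is only safe when the values are unique and hashable.

```python
label_encoder = {'cat': 0, 'dog': 1, 'bird': 2}  # made-up mapping for illustration

# Swap keys and values; duplicate values would silently overwrite each other.
label_decoder = {index: label for label, index in label_encoder.items()}
print(f"Decoder: {label_decoder}")

encoded = [0, 2, 2, 1]
decoded = [label_decoder[i] for i in encoded]
print(f"Decoded: {decoded}")
```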


### 📦 Nested Dictionaries

In [11]:
# ========================================
# NESTED DICTIONARIES
# ========================================

# Example: Student database
students = {
    'S001': {
        'name': 'Alice',
        'age': 20,
        'grades': {'Math': 95, 'Science': 92, 'English': 88}
    },
    'S002': {
        'name': 'Bob',
        'age': 21,
        'grades': {'Math': 87, 'Science': 85, 'English': 90}
    },
    'S003': {
        'name': 'Charlie',
        'age': 20,
        'grades': {'Math': 92, 'Science': 88, 'English': 85}
    }
}

print("=== Nested Dictionary Access ===")
print(f"Alice's data: {students['S001']}")
print(f"Alice's Math grade: {students['S001']['grades']['Math']}")

# Iterating nested dicts
print("\n=== Iterating Nested Dicts ===")
for student_id, info in students.items():
    print(f"\nStudent {student_id}:")
    print(f"  Name: {info['name']}, Age: {info['age']}")
    print("  Grades:")
    for subject, grade in info['grades'].items():
        print(f"    {subject}: {grade}")

# Calculate average grade for each student
print("\n=== Average Grades ===")
for student_id, info in students.items():
    grades = list(info['grades'].values())  # Copy the values view into a list
    avg = sum(grades) / len(grades)         # Mean = total / count
    print(f"{info['name']}: {avg:.2f}")

# ML Example: Model training history
print("\n=== ML Example: Training History ===")
training_history = {
    'epoch_1': {'loss': 0.5, 'accuracy': 0.85, 'val_loss': 0.6, 'val_accuracy': 0.82},
    'epoch_2': {'loss': 0.4, 'accuracy': 0.88, 'val_loss': 0.55, 'val_accuracy': 0.85},
    'epoch_3': {'loss': 0.3, 'accuracy': 0.91, 'val_loss': 0.52, 'val_accuracy': 0.87}
}

for epoch, metrics in training_history.items():
    # metrics is the inner dict for this epoch
    print(f"{epoch}: Loss={metrics['loss']:.2f}, Acc={metrics['accuracy']:.2f}")

=== Nested Dictionary Access ===
Alice's data: {'name': 'Alice', 'age': 20, 'grades': {'Math': 95, 'Science': 92, 'English': 88}}
Alice's Math grade: 95

=== Iterating Nested Dicts ===

Student S001:
  Name: Alice, Age: 20
  Grades:
    Math: 95
    Science: 92
    English: 88

Student S002:
  Name: Bob, Age: 21
  Grades:
    Math: 87
    Science: 85
    English: 90

Student S003:
  Name: Charlie, Age: 20
  Grades:
    Math: 92
    Science: 88
    English: 85

=== Average Grades ===
Alice: 91.67
Bob: 87.33
Charlie: 88.33

=== ML Example: Training History ===
epoch_1: Loss=0.50, Acc=0.85
epoch_2: Loss=0.40, Acc=0.88
epoch_3: Loss=0.30, Acc=0.91


### ✏️ Exercise 3: Dictionary Practice

In [None]:
# Exercise 3.1: Count character frequencies in this string
# text = "hello world"
# char_freq = ???
# Expected: {'h': 1, 'e': 1, 'l': 3, 'o': 2, ' ': 1, 'w': 1, 'r': 1, 'd': 1}
# Your code here

# Exercise 3.2: Create a dictionary mapping numbers to their squares
# Only include odd numbers from 1 to 10
# odd_squares = {1: 1, 3: 9, 5: 25, 7: 49, 9: 81}
# Your code here (use dict comprehension)

# Exercise 3.3: Given this data, group by department
# employees = [
#     ('Alice', 'Engineering', 95000),
#     ('Bob', 'Marketing', 75000),
#     ('Charlie', 'Engineering', 90000),
#     ('David', 'Marketing', 80000)
# ]
# Expected: {'Engineering': [('Alice', 95000), ('Charlie', 90000)], ...}
# Your code here

**Solutions:**

In [None]:
# Solution 3.1: Count character frequencies
text = "hello world"
char_freq = {}
for char in text:
    # get() returns 0 for characters not seen yet
    char_freq[char] = char_freq.get(char, 0) + 1
print(f"Character frequencies: {char_freq}")
# Output: {'h': 1, 'e': 1, 'l': 3, 'o': 2, ' ': 1, 'w': 1, 'r': 1, 'd': 1}

# Alternative using dict comprehension:
char_freq_alt = {char: text.count(char) for char in set(text)}
# Dict comprehension: Create dictionary using compact for-loop syntax
print(f"Alternative method: {char_freq_alt}")

# Solution 3.2: Dictionary of odd squares
odd_squares = {x: x**2 for x in range(1, 11) if x % 2 == 1}
print(f"Odd squares: {odd_squares}")
# Output: {1: 1, 3: 9, 5: 25, 7: 49, 9: 81}

# Solution 3.3: Group by department
employees = [
    ('Alice', 'Engineering', 95000),
    ('Bob', 'Marketing', 75000),
    ('Charlie', 'Engineering', 90000),
    ('David', 'Marketing', 80000)
]

# Method 1: Using loops
grouped = {}
for name, dept, salary in employees:
# for: Loop through each item in an iterable (list, tuple, set, dict, etc.)
    if dept not in grouped:
    # if: Execute code block only if condition is True
        grouped[dept] = []
    grouped[dept].append((name, salary))
    # .append(): Add element to end of list (modifies in-place)
print(f"Grouped by department: {grouped}")
# Output: {'Engineering': [('Alice', 95000), ('Charlie', 90000)], 'Marketing': [('Bob', 75000), ('David', 80000)]}

# Method 2: Using setdefault
grouped_alt = {}
for name, dept, salary in employees:
# for: Loop through each item in an iterable (list, tuple, set, dict, etc.)
    grouped_alt.setdefault(dept, []).append((name, salary))
    # .setdefault(): Get value if key exists, otherwise set default and return it
print(f"Alternative method: {grouped_alt}")
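
The counting and grouping patterns in Solutions 3.1 and 3.3 are common enough that the standard library's `collections` module covers them. The sketch below shows one possible alternative using `Counter` and `defaultdict`; it reuses the exercise data and is not part of the original solutions.

```python
from collections import Counter, defaultdict

# Counter tallies hashable items in a single pass over the string
text = "hello world"
char_freq = Counter(text)
print(f"Count of 'l': {char_freq['l']}")
print(f"Most common: {char_freq.most_common(1)}")

# defaultdict(list) creates an empty list the first time a key is accessed,
# so no "if dept not in grouped" check is needed
employees = [
    ('Alice', 'Engineering', 95000),
    ('Bob', 'Marketing', 75000),
    ('Charlie', 'Engineering', 90000),
    ('David', 'Marketing', 80000)
]
grouped = defaultdict(list)
for name, dept, salary in employees:
    grouped[dept].append((name, salary))
print(f"Grouped: {dict(grouped)}")
```

Converting back with `dict(grouped)` gives a plain dictionary, which avoids silently creating keys on later lookups.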

### Quick Recap: Data Structures
- **Tuples**: Immutable, unpacking, fast iteration
- **Sets**: Unique elements, fast lookups, set operations
- **Dictionaries**: Key-value pairs, average O(1) lookup, comprehensions


# Part 2: Functions

## Why Functions?

### Benefits of Functions
1. **Reusability**: Write once, use many times
2. **Organization**: Break complex problems into smaller pieces
3. **Readability**: Give names to operations
4. **Testing**: Test small units independently
5. **Abstraction**: Hide implementation details

### In Machine Learning
- Data preprocessing functions
- Custom loss functions
- Evaluation metrics
- Model training pipelines
- Feature engineering


## 1. Function Basics

In [12]:
# ========================================
# FUNCTION BASICS
# ========================================

# Basic function
def greet(name):
# def: Define a function - reusable block of code with parameters and return value
    """Say hello to someone."""
    return f"Hello, {name}!"
    # return: Send value back to caller and exit the function

# Call the function
message = greet("Alice")
print(message)

# Function with multiple parameters
def add(a, b):
# def: Define a function - reusable block of code with parameters and return value
    """Add two numbers."""
    return a + b
    # return: Send value back to caller and exit the function

result = add(5, 3)
print(f"5 + 3 = {result}")

# Function with no return (returns None)
def print_info(name, age):
# def: Define a function - reusable block of code with parameters and return value
    """Print information about a person."""
    print(f"{name} is {age} years old.")

print_info("Bob", 25)

# Function returning multiple values (packed into a tuple)
def get_stats(numbers):
# def: Define a function - reusable block of code with parameters and return value
    """Calculate mean and std of a list."""
    mean = sum(numbers) / len(numbers)
    # sum(): Add all numeric values in an iterable
    variance = sum((x - mean) ** 2 for x in numbers) / len(numbers)
    # Population variance: divides by n, not n - 1
    std = variance ** 0.5
    return mean, std  # Returns a tuple!

data = [10, 20, 30, 40, 50]
mean, std = get_stats(data)  # Tuple unpacking
print(f"Mean: {mean:.2f}, Std: {std:.2f}")

# Docstrings (documentation)
print("\n=== Function Documentation ===")
print(get_stats.__doc__)
help(get_stats)

Hello, Alice!
5 + 3 = 8
Bob is 25 years old.
Mean: 30.00, Std: 14.14

=== Function Documentation ===
Calculate mean and std of a list.
Help on function get_stats in module __main__:

get_stats(numbers)
    Calculate mean and std of a list.
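
`get_stats` divides the squared deviations by `n`, which gives the population standard deviation. Python's built-in `statistics` module provides both the population and the sample version, so the hand-written result can be cross-checked. A short sketch using the same data:

```python
import statistics

data = [10, 20, 30, 40, 50]

mean = statistics.mean(data)
population_std = statistics.pstdev(data)  # divides by n (matches get_stats)
sample_std = statistics.stdev(data)       # divides by n - 1

print(f"Mean: {mean:.2f}")
print(f"Population std: {population_std:.2f}")
print(f"Sample std: {sample_std:.2f}")
```

The sample version is usually preferred when the data is a sample drawn from a larger population, which is common for ML datasets.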



## 2. Function Arguments Deep Dive

In [None]:
# ========================================
# FUNCTION ARGUMENTS
# ========================================

# Positional vs Keyword arguments
def describe_pet(animal_type, pet_name):
# def: Define a function - reusable block of code with parameters and return value
    print(f"I have a {animal_type} named {pet_name}.")

# Positional
describe_pet("dog", "Max")

# Keyword
describe_pet(pet_name="Max", animal_type="dog")

# Default arguments
print("\n=== Default Arguments ===")
def power(base, exponent=2):
# def: Define a function - reusable block of code with parameters and return value
    """Calculate base^exponent. Default exponent is 2."""
    return base ** exponent
    # return: Send value back to caller and exit the function

print(f"power(5) = {power(5)}")       # Uses default
print(f"power(5, 3) = {power(5, 3)}") # Override default

# ML Example: Training function with defaults
def train_model(data, learning_rate=0.001, batch_size=32, epochs=10):
# def: Define a function - reusable block of code with parameters and return value
    """Train a model with configurable hyperparameters."""
    print("Training with:")
    print(f"  Learning rate: {learning_rate}")
    print(f"  Batch size: {batch_size}")
    print(f"  Epochs: {epochs}")
    return "Model trained!"
    # return: Send value back to caller and exit the function

# Use defaults
train_model("data.csv")

print()
# Override some defaults
train_model("data.csv", learning_rate=0.01, epochs=20)
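
One pitfall with default arguments is worth a quick sketch: default values are evaluated once, when the function is defined, so a mutable default such as a list is shared between calls. The helper names `append_bad` and `append_good` below are hypothetical and exist only for this illustration.

```python
def append_bad(item, bucket=[]):
    """Buggy: the same list object is reused across calls."""
    bucket.append(item)
    return bucket

def append_good(item, bucket=None):
    """Safe: create a fresh list when the caller does not pass one."""
    if bucket is None:
        bucket = []
    bucket.append(item)
    return bucket

first_bad = append_bad(1)
second_bad = append_bad(2)
print(f"Bad version, second call: {second_bad}")
print(f"Same list object reused: {first_bad is second_bad}")

first_good = append_good(1)
second_good = append_good(2)
print(f"Good version, second call: {second_good}")
```

Using `None` as the default and creating the list inside the function is the usual idiom for optional mutable parameters.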


## 3. *args and **kwargs

In [13]:
# ========================================
# *args and **kwargs
# ========================================

# *args: Variable number of positional arguments
def sum_all(*args):
# def: Define a function - reusable block of code with parameters and return value
    """Sum any number of arguments."""
    print(f"args = {args}")  # args is a tuple
    return sum(args)
    # return: Send value back to caller and exit the function

print("=== *args ===")
print(f"sum_all(1, 2, 3) = {sum_all(1, 2, 3)}")
print(f"sum_all(10, 20, 30, 40, 50) = {sum_all(10, 20, 30, 40, 50)}")

# **kwargs: Variable number of keyword arguments
def print_info(**kwargs):
# def: Define a function - reusable block of code with parameters and return value
    """Print any number of keyword arguments."""
    print(f"kwargs = {kwargs}")  # kwargs is a dict
    for key, value in kwargs.items():
    # for: Loop through each item in an iterable (list, tuple, set, dict, etc.)
        print(f"  {key}: {value}")

print("\n=== **kwargs ===")
print_info(name="Alice", age=25, city="NYC")

# Combining regular args, *args, and **kwargs (optional follows *args, so it is keyword-only)
def full_example(required, *args, optional=None, **kwargs):
# def: Define a function - reusable block of code with parameters and return value
    """Function with all types of arguments."""
    print(f"Required: {required}")
    print(f"*args: {args}")
    print(f"Optional: {optional}")
    print(f"**kwargs: {kwargs}")

print("\n=== Combined ===")
full_example("A", "B", "C", optional="D", x=1, y=2)

# ML Example: Flexible model configuration
def create_model(architecture, *layers, **config):
# def: Define a function - reusable block of code with parameters and return value
    """Create a model with flexible configuration."""
    print(f"Creating {architecture} model")
    print(f"Layers: {layers}")
    print(f"Config: {config}")
    return {"architecture": architecture, "layers": layers, "config": config}
    # return: Send value back to caller and exit the function

print("\n=== ML Example ===")
model = create_model(
    "CNN",
    "conv2d", "maxpool", "conv2d", "dense",
    optimizer="adam",
    learning_rate=0.001,
    batch_size=32
)

# Unpacking in function calls
print("\n=== Unpacking in Calls ===")
def multiply(a, b, c):
# def: Define a function - reusable block of code with parameters and return value
    return a * b * c
    # return: Send value back to caller and exit the function

values = [2, 3, 4]
result = multiply(*values)  # Unpacks list
print(f"multiply(*[2, 3, 4]) = {result}")

config = {"a": 2, "b": 3, "c": 4}
result = multiply(**config)  # Unpacks dict
print(f"multiply(**{{'a': 2, 'b': 3, 'c': 4}}) = {result}")

=== *args ===
args = (1, 2, 3)
sum_all(1, 2, 3) = 6
args = (10, 20, 30, 40, 50)
sum_all(10, 20, 30, 40, 50) = 150

=== **kwargs ===
kwargs = {'name': 'Alice', 'age': 25, 'city': 'NYC'}
  name: Alice
  age: 25
  city: NYC

=== Combined ===
Required: A
*args: ('B', 'C')
Optional: D
**kwargs: {'x': 1, 'y': 2}

=== ML Example ===
Creating CNN model
Layers: ('conv2d', 'maxpool', 'conv2d', 'dense')
Config: {'optimizer': 'adam', 'learning_rate': 0.001, 'batch_size': 32}

=== Unpacking in Calls ===
multiply(*[2, 3, 4]) = 24
multiply(**{'a': 2, 'b': 3, 'c': 4}) = 24
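
Parameters that come after `*args` can only be passed by name. The same effect is available without collecting extra positionals by writing a bare `*` in the signature. The sketch below assumes a hypothetical `split_data` helper to show how this keeps hyperparameter-style arguments explicit at the call site.

```python
def split_data(data, *, test_ratio=0.25):
    """Hypothetical helper: parameters after the bare * must be passed by name."""
    n_test = int(len(data) * test_ratio)
    split_at = len(data) - n_test
    return data[:split_at], data[split_at:]

train, test = split_data(list(range(8)), test_ratio=0.25)
print(f"Train: {train}")
print(f"Test: {test}")

# Passing test_ratio positionally is rejected with a TypeError
try:
    split_data(list(range(8)), 0.25)
    positional_rejected = False
except TypeError:
    positional_rejected = True
print(f"Positional call rejected: {positional_rejected}")
```

Keyword-only parameters make calls self-documenting and prevent arguments from being swapped by accident.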


## 4. Lambda Functions (Anonymous Functions)

In [14]:
# ========================================
# LAMBDA FUNCTIONS
# ========================================

# Lambda syntax: lambda arguments: expression

# Regular function
def square(x):
# def: Define a function - reusable block of code with parameters and return value
    return x ** 2
    # return: Send value back to caller and exit the function

# Equivalent lambda (assigned to a name only for comparison; PEP 8 prefers def for named functions)
square_lambda = lambda x: x ** 2

print("=== Lambda Basics ===")
print(f"square(5) = {square(5)}")
print(f"square_lambda(5) = {square_lambda(5)}")

# Lambda with multiple arguments (this rebinds the name add from the Function Basics cell)
add = lambda a, b: a + b
print(f"add(3, 4) = {add(3, 4)}")

# Common use: Sorting with custom key
print("\n=== Sorting with Lambda ===")
students = [
    {'name': 'Alice', 'grade': 85},
    {'name': 'Bob', 'grade': 92},
    {'name': 'Charlie', 'grade': 78}
]

# Sort by grade
sorted_students = sorted(students, key=lambda s: s['grade'])
# sorted(): Return new sorted list from iterable (original unchanged)
print("Sorted by grade:")
for student in sorted_students:
# for: Loop through each item in an iterable (list, tuple, set, dict, etc.)
    print(f"  {student}")

# Sort by name
sorted_by_name = sorted(students, key=lambda s: s['name'])
# sorted(): Return new sorted list from iterable (original unchanged)
print("\nSorted by name:")
for student in sorted_by_name:
# for: Loop through each item in an iterable (list, tuple, set, dict, etc.)
    print(f"  {student}")

# Lambda with map()
print("\n=== Lambda with map() ===")
numbers = [1, 2, 3, 4, 5]
squared = list(map(lambda x: x**2, numbers))
# list(): Convert iterable to a list - ordered, mutable collection
print(f"Original: {numbers}")
print(f"Squared: {squared}")

# Lambda with filter()
print("\n=== Lambda with filter() ===")
numbers = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
evens = list(filter(lambda x: x % 2 == 0, numbers))
print(f"Original: {numbers}")
print(f"Evens: {evens}")

# ML Example: Data transformations
print("\n=== ML Example: Transformations ===")
features = [10, 20, 30, 40, 50]

# Scale by the maximum so values fall in (0, 1] (assumes positive features)
max_val = max(features)
# max(): Return the largest value
normalized = list(map(lambda x: x/max_val, features))
# list(): Convert iterable to a list - ordered, mutable collection
print(f"Original features: {features}")
print(f"Normalized: {normalized}")

# Remove outliers: keep only values <= 35
filtered = list(filter(lambda x: x <= 35, features))
# list(): Convert iterable to a list - ordered, mutable collection
print(f"After filtering (>35): {filtered}")

=== Lambda Basics ===
square(5) = 25
square_lambda(5) = 25
add(3, 4) = 7

=== Sorting with Lambda ===
Sorted by grade:
  {'name': 'Charlie', 'grade': 78}
  {'name': 'Alice', 'grade': 85}
  {'name': 'Bob', 'grade': 92}

Sorted by name:
  {'name': 'Alice', 'grade': 85}
  {'name': 'Bob', 'grade': 92}
  {'name': 'Charlie', 'grade': 78}

=== Lambda with map() ===
Original: [1, 2, 3, 4, 5]
Squared: [1, 4, 9, 16, 25]

=== Lambda with filter() ===
Original: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
Evens: [2, 4, 6, 8, 10]

=== ML Example: Transformations ===
Original features: [10, 20, 30, 40, 50]
Normalized: [0.2, 0.4, 0.6, 0.8, 1.0]
After filtering (>35): [10, 20, 30]
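
For the common case of sorting by a dictionary key, `operator.itemgetter` from the standard library is an alternative to writing a lambda, and the same `key` argument also works with `max()` and `min()`. A brief sketch reusing the `students` list from the cell above:

```python
from operator import itemgetter

students = [
    {'name': 'Alice', 'grade': 85},
    {'name': 'Bob', 'grade': 92},
    {'name': 'Charlie', 'grade': 78}
]

# itemgetter('grade') behaves like lambda s: s['grade']
by_grade_desc = sorted(students, key=itemgetter('grade'), reverse=True)
print("Highest grade first:")
for student in by_grade_desc:
    print(f"  {student}")

# key= also works with max() and min()
top_student = max(students, key=itemgetter('grade'))
print(f"Top student: {top_student['name']}")
```

Whether to use `itemgetter` or a lambda is mostly a matter of taste; the lambda is more flexible when the key needs a computation.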


## 5. Map, Filter, Reduce

In [15]:
# ========================================
# MAP, FILTER, REDUCE
# ========================================

# map(function, iterable): Apply function to each item
print("=== map() ===")
numbers = [1, 2, 3, 4, 5]

# Using map with lambda
doubled = list(map(lambda x: x * 2, numbers))
# list(): Convert iterable to a list - ordered, mutable collection
print(f"Doubled: {doubled}")

# Using map with named function
def cube(x):
# def: Define a function - reusable block of code with parameters and return value
    return x ** 3
    # return: Send value back to caller and exit the function

cubed = list(map(cube, numbers))
# list(): Convert iterable to a list - ordered, mutable collection
print(f"Cubed: {cubed}")

# Map with multiple iterables
a = [1, 2, 3]
b = [10, 20, 30]
sums = list(map(lambda x, y: x + y, a, b))
# list(): Convert iterable to a list - ordered, mutable collection
print(f"Element-wise sum: {sums}")

# filter(function, iterable): Keep items where function returns True
print("\n=== filter() ===")
numbers = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

# Filter evens
evens = list(filter(lambda x: x % 2 == 0, numbers))
print(f"Evens: {evens}")

# Filter numbers > 5
greater_than_5 = list(filter(lambda x: x > 5, numbers))
# list(): Convert iterable to a list - ordered, mutable collection
print(f"Greater than 5: {greater_than_5}")

# reduce(function, iterable[, initializer]): Combine items pairwise into a single value
print("\n=== reduce() ===")
from functools import reduce

numbers = [1, 2, 3, 4, 5]

# Sum using reduce
total = reduce(lambda x, y: x + y, numbers)
print(f"Sum: {total}")

# Product using reduce
product = reduce(lambda x, y: x * y, numbers)
print(f"Product: {product}")

# Find maximum
maximum = reduce(lambda x, y: x if x > y else y, numbers)
print(f"Maximum: {maximum}")

# ML Example: Data pipeline
print("\n=== ML Example: Data Pipeline ===")
raw_data = [10, -5, 20, 0, 15, -10, 25, 30]

# Step 1: Filter out negative and zero values
positive = list(filter(lambda x: x > 0, raw_data))
# list(): Convert iterable to a list - ordered, mutable collection
print(f"After filtering: {positive}")

# Step 2: Scale by the maximum so values fall in (0, 1]
max_val = max(positive)
# max(): Return the largest value
normalized = list(map(lambda x: x/max_val, positive))
# list(): Convert iterable to a list - ordered, mutable collection
print(f"After normalization: {normalized}")

# Step 3: Calculate mean
mean = reduce(lambda x, y: x + y, normalized) / len(normalized)
print(f"Mean: {mean:.4f}")

# Compare with list comprehension (often more Pythonic)
print("\n=== List Comprehension vs map/filter ===")
numbers = [1, 2, 3, 4, 5]

# Using map/filter
result1 = list(map(lambda x: x**2, filter(lambda x: x % 2 == 0, numbers)))
print(f"map/filter: {result1}")

# Using list comprehension (more readable!)
result2 = [x**2 for x in numbers if x % 2 == 0]
print(f"List comp: {result2}")

=== map() ===
Doubled: [2, 4, 6, 8, 10]
Cubed: [1, 8, 27, 64, 125]
Element-wise sum: [11, 22, 33]

=== filter() ===
Evens: [2, 4, 6, 8, 10]
Greater than 5: [6, 7, 8, 9, 10]

=== reduce() ===
Sum: 15
Product: 120
Maximum: 5

=== ML Example: Data Pipeline ===
After filtering: [10, 20, 15, 25, 30]
After normalization: [0.3333333333333333, 0.6666666666666666, 0.5, 0.8333333333333334, 1.0]
Mean: 0.6667

=== List Comprehension vs map/filter ===
map/filter: [4, 16]
List comp: [4, 16]
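
Two details of these tools are easy to miss: `reduce()` fails on an empty iterable unless it is given a starting value, and `map()` / `filter()` return one-shot iterators rather than lists. A small sketch of both behaviors:

```python
from functools import reduce

# reduce() on an empty iterable raises TypeError unless an initializer is given
try:
    reduce(lambda x, y: x + y, [])
    empty_raised = False
except TypeError:
    empty_raised = True
print(f"Empty reduce without initializer raised TypeError: {empty_raised}")

# The third argument is the starting value, returned as-is for empty input
total_empty = reduce(lambda x, y: x + y, [], 0)
print(f"Empty reduce with initializer 0: {total_empty}")

# map() and filter() return one-shot iterators, not lists
squares = map(lambda x: x ** 2, [1, 2, 3])
first_pass = list(squares)   # consumes the iterator
second_pass = list(squares)  # nothing left to consume
print(f"First pass: {first_pass}")
print(f"Second pass: {second_pass}")
```

This is why the examples above wrap `map()` and `filter()` in `list()` before printing or reusing the results.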


### ✏️ Exercise 4: Functions Practice

In [None]:
# Exercise 4.1: Write a function that calculates the mean and standard deviation
# def calculate_stats(numbers):
#     # Your code here
#     return mean, std

# Exercise 4.2: Write a function that takes any number of arguments
# and returns the maximum value
# def find_max(*args):
#     # Your code here

# Exercise 4.3: Use lambda and filter to extract words longer than 3 characters
# words = ['cat', 'elephant', 'dog', 'butterfly', 'ant']
# long_words = ???

# Exercise 4.4: Use map to convert list of strings to integers
# string_numbers = ['1', '2', '3', '4', '5']
# integers = ???

**Solutions:**

In [None]:
# Solution 4.1: Calculate mean and standard deviation
def calculate_stats(numbers):
# def: Define a function - reusable block of code with parameters and return value
    """Calculate mean and standard deviation of a list of numbers."""
    n = len(numbers)
    # len(): Return the number of items in a collection
    mean = sum(numbers) / n
    # sum(): Add all numeric values in an iterable
    
    # Calculate population variance (divide by n)
    variance = sum((x - mean) ** 2 for x in numbers) / n
    # sum(): Add all numeric values in an iterable
    
    # Standard deviation is square root of variance
    std = variance ** 0.5
    
    return mean, std
    # return: Send value back to caller and exit the function

# Test
data = [10, 12, 23, 23, 16, 23, 21, 16]
mean, std = calculate_stats(data)
print(f"Mean: {mean:.2f}, Std Dev: {std:.2f}")
# Output: Mean: 18.00, Std Dev: 4.90

# Solution 4.2: Find maximum using *args
def find_max(*args):
# def: Define a function - reusable block of code with parameters and return value
    """Return the maximum value from any number of arguments."""
    if not args:
    # if: Execute code block only if condition is True
        return None
        # return: Send value back to caller and exit the function
    return max(args)
    # return: Send value back to caller and exit the function

# Test
print(f"Max: {find_max(5, 12, 3, 89, 45, 23)}")
# Output: Max: 89

# Solution 4.3: Filter words longer than 3 characters
words = ['cat', 'elephant', 'dog', 'butterfly', 'ant']
long_words = list(filter(lambda word: len(word) > 3, words))
# list(): Convert iterable to a list - ordered, mutable collection
print(f"Long words: {long_words}")
# Output: Long words: ['elephant', 'butterfly']

# Alternative using list comprehension (more Pythonic):
long_words_alt = [word for word in words if len(word) > 3]
# List comprehension: Create list using compact for-loop syntax
print(f"Long words (alt): {long_words_alt}")

# Solution 4.4: Convert strings to integers
string_numbers = ['1', '2', '3', '4', '5']
integers = list(map(int, string_numbers))
# list(): Convert iterable to a list - ordered, mutable collection
print(f"Integers: {integers}")
# Output: Integers: [1, 2, 3, 4, 5]

# Alternative using list comprehension:
integers_alt = [int(x) for x in string_numbers]
# List comprehension: Create list using compact for-loop syntax
print(f"Integers (alt): {integers_alt}")
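
As a possible simplification of Solution 4.2, the built-in `max()` accepts a `default` keyword that is returned when the iterable is empty, so the explicit empty check can be dropped. A sketch of that variant:

```python
def find_max(*args):
    """Return the largest argument, or None when called with no arguments."""
    # default=None is returned instead of raising ValueError on empty input
    return max(args, default=None)

print(f"Max: {find_max(5, 12, 3, 89, 45, 23)}")
print(f"Max of nothing: {find_max()}")
```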

# Part 3: Integration & Real-World Examples

## Mini-Project: Data Cleaning Pipeline

A complete data cleaning pipeline that combines sets, dictionaries, comprehensions, `map`/`filter`, and functions:

In [None]:
# ========================================
# DATA CLEANING PIPELINE
# ========================================

# Simulated dataset
raw_data = [
    {'id': 1, 'name': 'Alice', 'age': 25, 'score': 85, 'city': 'NYC'},
    {'id': 2, 'name': 'Bob', 'age': None, 'score': 92, 'city': 'LA'},
    {'id': 3, 'name': 'Charlie', 'age': 30, 'score': -1, 'city': 'NYC'},  # Invalid score
    {'id': 1, 'name': 'Alice', 'age': 25, 'score': 85, 'city': 'NYC'},  # Duplicate
    {'id': 4, 'name': 'David', 'age': 22, 'score': 78, 'city': 'SF'},
    {'id': 5, 'name': 'Eve', 'age': 28, 'score': 95, 'city': None},  # Missing city
]

print("=== Original Data ===")
for record in raw_data:
# for: Loop through each item in an iterable (list, tuple, set, dict, etc.)
    print(record)

# Step 1: Remove duplicates
def remove_duplicates(data):
# def: Define a function - reusable block of code with parameters and return value
    """Remove duplicate records based on id."""
    seen_ids = set()
    # set(): Create a set - unordered collection of unique elements, O(1) lookup
    unique_data = []
    
    for record in data:
    # for: Loop through each item in an iterable (list, tuple, set, dict, etc.)
        if record['id'] not in seen_ids:
        # if: Execute code block only if condition is True
            seen_ids.add(record['id'])
            # .add(): Add a single element to the set (no effect if already exists)
            unique_data.append(record)
            # .append(): Add element to end of list (modifies in-place)
    
    return unique_data
    # return: Send value back to caller and exit the function

# Step 2: Handle missing values
def handle_missing(data, default_values):
# def: Define a function - reusable block of code with parameters and return value
    """Fill missing values with defaults."""
    cleaned = []
    
    for record in data:
    # for: Loop through each item in an iterable (list, tuple, set, dict, etc.)
        cleaned_record = record.copy()
        for key, value in record.items():
        # for: Loop through each item in an iterable (list, tuple, set, dict, etc.)
            if value is None and key in default_values:
            # if: Execute code block only if condition is True
                cleaned_record[key] = default_values[key]
        cleaned.append(cleaned_record)
        # .append(): Add element to end of list (modifies in-place)
    
    return cleaned
    # return: Send value back to caller and exit the function

# Step 3: Validate and filter invalid records
def validate_data(data):
# def: Define a function - reusable block of code with parameters and return value
    """Filter out records with invalid values."""
    def is_valid(record):
    # def: Define a function - reusable block of code with parameters and return value
        # Age must not be negative
        if record.get('age') is not None and record['age'] < 0:
        # if: Execute code block only if condition is True
            return False
            # return: Send value back to caller and exit the function
        # Score must be 0-100
        if record.get('score') is not None and not (0 <= record['score'] <= 100):
        # if: Execute code block only if condition is True
            return False
            # return: Send value back to caller and exit the function
        return True
        # return: Send value back to caller and exit the function
    
    return list(filter(is_valid, data))
    # return: Send value back to caller and exit the function

# Step 4: Transform data (categorize ages)
def transform_data(data):
# def: Define a function - reusable block of code with parameters and return value
    """Apply transformations to data."""
    def add_age_category(record):
    # def: Define a function - reusable block of code with parameters and return value
        age = record.get('age')
        if age is None:
        # if: Execute code block only if condition is True
            category = 'Unknown'
        elif age < 25:
        # elif: Check additional condition if previous if/elif was False
            category = 'Young'
        elif age < 30:
        # elif: Check additional condition if previous if/elif was False
            category = 'Adult'
        else:
        # else: Execute if all previous if/elif conditions were False
            category = 'Senior'
        
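        # Work on a copy so the caller's record is not mutated (as handle_missing does)
        updated = record.copy()
        updated['age_category'] = category
        return updated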
        # return: Send value back to caller and exit the function
    
    return list(map(add_age_category, data))
    # return: Send value back to caller and exit the function

# Step 5: Create summary statistics
def create_summary(data):
# def: Define a function - reusable block of code with parameters and return value
    """Create summary statistics from cleaned data."""
    # Count by city
    city_counts = {}
    for record in data:
    # for: Loop through each item in an iterable (list, tuple, set, dict, etc.)
        city = record.get('city') or 'Unknown'  # also covers an explicit None
        city_counts[city] = city_counts.get(city, 0) + 1
    
    # Calculate average score
    scores = [r['score'] for r in data if 'score' in r and r['score'] is not None]
    # List comprehension: Create list using compact for-loop syntax
    avg_score = sum(scores) / len(scores) if scores else 0
    # sum(): Add all numeric values in an iterable
    
    # Age distribution
    age_distribution = {}
    for record in data:
    # for: Loop through each item in an iterable (list, tuple, set, dict, etc.)
        category = record.get('age_category', 'Unknown')
        age_distribution[category] = age_distribution.get(category, 0) + 1
    
    # return: Send value back to caller and exit the function
    return {
        'total_records': len(data),
        'city_counts': city_counts,
        'average_score': avg_score,
        'age_distribution': age_distribution
    }

# Run the pipeline
print("\n=== Running Data Cleaning Pipeline ===")

# Apply transformations
data = remove_duplicates(raw_data)
print(f"\nAfter removing duplicates: {len(data)} records")

data = handle_missing(data, {'age': 25, 'city': 'Unknown'})
print(f"After handling missing: {len(data)} records")

data = validate_data(data)
print(f"After validation: {len(data)} records")

data = transform_data(data)
print(f"After transformation: {len(data)} records")

# Show cleaned data
print("\n=== Cleaned Data ===")
for record in data:
# for: Loop through each item in an iterable (list, tuple, set, dict, etc.)
    print(record)

# Create and show summary
summary = create_summary(data)
print("\n=== Summary Statistics ===")
print(f"Total records: {summary['total_records']}")
print(f"Average score: {summary['average_score']:.2f}")
print(f"City distribution: {summary['city_counts']}")
print(f"Age distribution: {summary['age_distribution']}")
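
The pipeline above reassigns `data` after each step. When every step takes a dataset and returns a dataset, the sequence can also be expressed as a list of functions applied in order. The sketch below uses `functools.reduce` for this; `drop_negative`, `scale_by_max`, and `run_pipeline` are hypothetical names on toy numeric data, not part of the pipeline above.

```python
from functools import reduce

def drop_negative(values):
    """Keep only non-negative values."""
    return [v for v in values if v >= 0]

def scale_by_max(values):
    """Divide every value by the largest one."""
    peak = max(values)
    return [v / peak for v in values]

def run_pipeline(data, steps):
    """Feed the output of each step into the next, starting from data."""
    return reduce(lambda current, step: step(current), steps, data)

cleaned = run_pipeline([10, -5, 20, 40], [drop_negative, scale_by_max])
print(f"Cleaned: {cleaned}")
```

Steps that need extra arguments, such as `handle_missing(data, default_values)`, could be adapted with `functools.partial` or a small lambda before being added to the list.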

## 🎯 Final Practice Exercises

Try these on your own!

In [None]:
# Exercise 1: Word Frequency Analyzer
# Write a function that takes a text and returns:
# - Total word count
# - Unique word count
# - Top 5 most frequent words
# - Dictionary of word frequencies

def analyze_text(text):
# def: Define a function - reusable block of code with parameters and return value
    # Your code here
    pass

# Test
# sample_text = "machine learning is great machine learning is powerful"
# result = analyze_text(sample_text)
# print(result)

# Exercise 2: Feature Encoder
# Create a class that can encode categorical features to numbers
# and decode them back

# class LabelEncoder:
#     def fit(self, labels):
#         # Learn the unique labels
#         pass
#     
#     def transform(self, labels):
#         # Convert labels to numbers
#         pass
#     
#     def inverse_transform(self, encoded):
#         # Convert numbers back to labels
#         pass

# Exercise 3: Data Validator
# Write a function that validates a dataset against a schema
# Schema example: {'name': str, 'age': int, 'score': float}

def validate_dataset(data, schema):
# def: Define a function - reusable block of code with parameters and return value
    # Your code here
    pass

# Test
# schema = {'name': str, 'age': int, 'score': float}
# data = [
#     {'name': 'Alice', 'age': 25, 'score': 85.5},
#     {'name': 'Bob', 'age': '30', 'score': 92.0},  # Invalid age type
# ]
# errors = validate_dataset(data, schema)
# print(errors)