# Python basics (6): functools and itertools
## `functools`: Higher-Order Functions (Very Useful!)

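The `functools` module provides tools for working with higher-order functions. They make common patterns, such as fixing some arguments of a function or folding a sequence into a single value, cleaner and more expressive.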

### What Is a Higher-Order Function?
Higher-order functions are functions that take other functions as arguments or return functions.


In [None]:
# apply_twice accepts a function as an argument, so it is a higher-order function.
def apply_twice(func, x):
    return func(func(x))

def add_one(n):
    return n + 1

print("Apply add_one twice to 3:", apply_twice(add_one, 3))
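
The definition above also covers functions that return functions. A minimal sketch of that second case follows; the names `make_multiplier`, `double`, and `triple` are made up for illustration.

```python
def make_multiplier(factor):
    """Return a new function that multiplies its input by `factor`."""
    def multiply(n):
        return n * factor
    return multiply  # returning a function makes make_multiplier higher-order

double = make_multiplier(2)
triple = make_multiplier(3)

print("Double 5:", double(5))  # Double 5: 10
print("Triple 5:", triple(5))  # Triple 5: 15
```

Each call to `make_multiplier` produces an independent function that remembers its own `factor`.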


### `functools.partial` — Freezing Arguments
`partial()` creates a new function by fixing some arguments of an existing function. It turns a general function into a more specific one.

Very common when:
- passing functions into pipelines
- reusing a function with fixed parameters

In [None]:
from functools import partial

def power(base, exponent):
    return base ** exponent

square = partial(power, exponent=2)
cube = partial(power, exponent=3)

print("Square of 4:", square(4))
print("Cube of 4:", cube(4))
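
Two further details may be easier to see in code: `partial` can also freeze positional arguments (starting from the left), and the resulting function can be handed to tools such as `map`. This is a small sketch; `power` is repeated so the block runs on its own, and `two_to_the` and `parse_binary` are illustrative names.

```python
from functools import partial

def power(base, exponent):
    return base ** exponent

# Freezing a positional argument fixes the leftmost parameter (base).
two_to_the = partial(power, 2)
print("2 ** 10:", two_to_the(10))  # 2 ** 10: 1024

# A partial can be passed wherever a one-argument function is expected.
parse_binary = partial(int, base=2)
values = list(map(parse_binary, ["101", "111", "1000"]))
print("Parsed binary strings:", values)  # Parsed binary strings: [5, 7, 8]
```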


### `functools.reduce` — Reducing an Iterable to One Value
`reduce()` applies a function cumulatively to items in an iterable, producing a single result.


In [None]:
from functools import reduce

numbers = [1, 2, 3, 4]

def add(a, b):
    return a + b

sum_all = reduce(add, numbers) # Add items together from left to right.

print("Sum of numbers:", sum_all)


In [None]:
# reduce with an initial value: the fold starts from that value, and reduce returns it unchanged for an empty iterable.

sum_with_initial = reduce(add, numbers, 10)

print("Sum with initial value 10:", sum_with_initial)
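
The sketch below shows why the initial value matters, under the assumption that the input may sometimes be empty. It uses `itertools.accumulate` only to display the intermediate totals that `reduce` passes along; `add` and `numbers` are repeated so the block runs on its own.

```python
from functools import reduce
from itertools import accumulate

def add(a, b):
    return a + b

numbers = [1, 2, 3, 4]

# accumulate shows every intermediate value that reduce passes along.
steps = list(accumulate(numbers, add, initial=10))
print("Running totals:", steps)  # Running totals: [10, 11, 13, 16, 20]

# Without an initial value, an empty iterable raises TypeError.
try:
    reduce(add, [])
except TypeError as error:
    print("Empty input, no initial value:", error)

# With an initial value, the empty case simply returns that value.
empty_total = reduce(add, [], 0)
print("Empty input with initial value 0:", empty_total)  # 0
```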


### Real Data Science Example: Merging Many DataFrames

Suppose you have:
- a base DataFrame, `original_dataframe`
- a list of additional DataFrames to merge horizontally

In [None]:
# Base DataFrame (original)

import pandas as pd

original_dataframe = pd.DataFrame({
    "name": ["Alice", "Bob", "Charlie", "Diana"]
})

print("Original dataframe:")
print(original_dataframe)


In [None]:
# Additional DataFrames (characteristics)

age_df = pd.DataFrame({
    "name": ["Alice", "Bob", "Charlie"],
    "age": [25, 30, 35]
})

job_df = pd.DataFrame({
    "name": ["Alice", "Diana"],
    "job": ["Engineer", "Designer"]
})

city_df = pd.DataFrame({
    "name": ["Bob", "Charlie", "Diana"],
    "city": ["New York", "Chicago", "San Francisco"]
})


#### The Naive Way (What We Want to Avoid)

Problems:
- Repetitive
- Hard to scale
- Error-prone when you add more tables

In [None]:

merged = pd.merge(original_dataframe, age_df, on="name", how="left")
merged = pd.merge(merged, job_df, on="name", how="left")
merged = pd.merge(merged, city_df, on="name", how="left")

print("Merged dataframe")
print(merged)

#### The Elegant Way: `reduce` + `merge`

In [None]:
from functools import reduce

merging = lambda left, right: pd.merge(left, right, on="name", how="left") # alternative way, using lambda

list_of_dataframes = [original_dataframe, age_df, job_df, city_df]

merged_dataframe = reduce(merging, list_of_dataframes)

print("Merged dataframe:")
print(merged_dataframe)
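
To see what `reduce` is doing in this kind of merge without pandas, the same fold can be sketched on plain dictionaries. This is an illustration only: `left_join` and the sample records are hypothetical stand-ins, the base records are passed as the initial value, and unlike a pandas left merge this version does not fill missing values with `NaN`.

```python
from functools import reduce

def left_join(records, lookup):
    """Attach the fields in `lookup` (keyed by name) to each record."""
    return [{**record, **lookup.get(record["name"], {})} for record in records]

people = [{"name": "Alice"}, {"name": "Bob"}]
ages = {"Alice": {"age": 25}, "Bob": {"age": 30}}
jobs = {"Alice": {"job": "Engineer"}}

# reduce expands to left_join(left_join(people, ages), jobs).
merged_records = reduce(left_join, [ages, jobs], people)
for record in merged_records:
    print(record)
```

Adding another lookup table only means appending it to the list; the join rule itself stays in one place.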


#### Why Is `reduce` Perfect Here?
- You don’t know how many characteristic tables you’ll have
- You want one consistent merge rule
- You want clean, scalable code

This pattern shows up a lot in:
- feature engineering
- panel data construction
- survey + admin data integration

## `itertools`: Fast, Memory-Efficient Iterators

The `itertools` module provides efficient tools for working with iterators.
Its functions generate values lazily, one at a time, instead of storing everything in memory.
The module is especially useful for combinatorics.
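
A short sketch of what generating values on the fly means in practice. `count` and `islice` are standard `itertools` functions; the variable names are illustrative.

```python
from itertools import count, islice

# count() is an endless iterator: 0, 1, 2, ... Nothing is computed up front.
even_numbers = (n for n in count() if n % 2 == 0)

# islice pulls only the first five values, so the loop terminates.
first_five = list(islice(even_numbers, 5))
print("First five even numbers:", first_five)  # [0, 2, 4, 6, 8]

# An iterator remembers its position: the next slice continues where we stopped.
next_three = list(islice(even_numbers, 3))
print("Next three:", next_three)  # [10, 12, 14]
```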

### `itertools.chain`: Iterate Over Multiple Iterables as One
`chain()` lets you treat multiple iterables as one continuous sequence.

In [None]:
from itertools import chain

names_a = ["Alice", "Bob"]
names_b = ["Charlie", "Diana"]

all_names = chain(names_a, names_b)

print("All names:")
for name in all_names:
    print(name)
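
When the iterables are already collected in a list, `chain.from_iterable` avoids unpacking them by hand. A small sketch with made-up classroom data, which also shows that a chain object can be consumed only once:

```python
from itertools import chain

# Hypothetical example: one list of names per classroom.
classrooms = [["Alice", "Bob"], ["Charlie"], ["Diana", "Eve"]]

# chain.from_iterable flattens one level of nesting lazily.
all_students = chain.from_iterable(classrooms)
student_list = list(all_students)
print("All students:", student_list)

# A chain object is an iterator, so it is empty after one full pass.
leftover = list(all_students)
print("Second pass:", leftover)  # Second pass: []
```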


### `pairwise`: Consecutive Pairs (Python 3.10+)
`pairwise()` returns adjacent pairs from an iterable.
It is commonly used for time-series differences and for comparing consecutive observations.

In [None]:
from itertools import pairwise

numbers = [1, 2, 3, 4]

pairs = list(pairwise(numbers))
print("Pairwise values:", pairs)
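
As a sketch of the time-series use case, the block below computes the change between consecutive observations. The prices are invented, and the fallback definition is an assumption added for Python versions older than 3.10.

```python
try:
    from itertools import pairwise  # available in Python 3.10+
except ImportError:
    def pairwise(iterable):
        """Fallback for older Pythons (works for sequences such as lists)."""
        return zip(iterable, iterable[1:])

# Hypothetical daily closing prices.
prices = [100, 103, 101, 106]

# Difference between each observation and the one before it.
changes = [later - earlier for earlier, later in pairwise(prices)]
print("Day-over-day changes:", changes)  # [3, -2, 5]
```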


### `product`: Cartesian Product (All Possible Pairs)
`product()` generates every tuple that takes one element from each iterable (like nested loops).
Commonly used in grid search, parameter combinations, experimental design, etc.

In [None]:
from itertools import product

colors = ["red", "blue"]
sizes = ["S", "M", "L"]

all_combinations = list(product(colors, sizes))

print("Color–size combinations:")
print(all_combinations)
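
For the grid-search use case, one possible sketch is to expand a dictionary of candidate values into one settings dictionary per combination. The parameter names and values here are illustrative and not tied to any particular library.

```python
from itertools import product

# Hypothetical hyperparameter grid; names and values are illustrative.
param_grid = {
    "learning_rate": [0.01, 0.1],
    "max_depth": [3, 5, 7],
}

# One tuple of values per combination, turned back into a named dict.
settings = [
    dict(zip(param_grid.keys(), values))
    for values in product(*param_grid.values())
]

print("Number of settings:", len(settings))  # 6
print("First setting:", settings[0])  # {'learning_rate': 0.01, 'max_depth': 3}
```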


### `permutations`: Order Matters
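`permutations(iterable, r)` generates all possible orderings of `r` elements drawn from the iterable, so `("Alice", "Bob")` and `("Bob", "Alice")` count as different results.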

In [None]:
from itertools import permutations

names = ["Alice", "Bob", "Charlie"]

perms = list(permutations(names, 2))
print("Permutations (order matters):")
print(perms)
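
A quick way to check how many results to expect: the number of permutations is n! / (n - r)!, which `math.perm` (Python 3.8+) computes. A small sketch:

```python
from itertools import permutations
from math import perm

names = ["Alice", "Bob", "Charlie"]

ordered_pairs = list(permutations(names, 2))
full_orderings = list(permutations(names))  # r defaults to len(names)

# The count is n! / (n - r)!, which math.perm computes directly.
print(len(ordered_pairs), perm(3, 2))   # 6 6
print(len(full_orderings), perm(3, 3))  # 6 6

# Both orders of the same two people are present.
print(("Alice", "Bob") in ordered_pairs, ("Bob", "Alice") in ordered_pairs)  # True True
```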


### `combinations`: Order Does NOT Matter
`combinations(iterable, r)` generates every unique group of `r` elements, ignoring order; with `r=2` these are unique pairs.

An example from Dr. Wang's research: I was implementing a method that measures the novelty of scientific papers. The method looked at all the combinations of referenced journals cited in papers. I collected a dataset of about 300,000 AI papers, which cite 19,474 journals. I used `itertools.combinations` and identified 7.8 million combinations of journals ([Wang et al., 2024](https://scholarspace.manoa.hawaii.edu/items/90d7e2de-d649-44b7-ab1a-705e0582f1cc)).

In [None]:
from itertools import combinations

names = ["Alice", "Bob", "Charlie"]

combination_of_names = list(combinations(names, 2))
print("Combinations (order does not matter):")
print(combination_of_names)
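
As a rough illustration of the journal-pair idea described above (not the published method; the paper IDs and reference lists below are invented), `combinations` can be paired with `collections.Counter` to count how often two journals are cited together.

```python
from collections import Counter
from itertools import combinations

# Hypothetical reference lists: paper id -> journals it cites.
references = {
    "paper_1": ["Nature", "Science", "JMLR"],
    "paper_2": ["Science", "Nature"],
    "paper_3": ["JMLR", "Nature"],
}

pair_counts = Counter()
for journals in references.values():
    # Sorting the unique journals gives every pair one canonical order,
    # so ("Nature", "Science") and ("Science", "Nature") are counted together.
    pair_counts.update(combinations(sorted(set(journals)), 2))

for pair, n_papers in pair_counts.most_common():
    print(pair, n_papers)
```

Because `combinations` is lazy, each paper's pairs are counted as they are produced rather than stored in a list first.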

### `combinations_with_replacement`: Reuse Allowed
Like `combinations`, but an element can be paired with itself, so `("Alice", "Alice")` is included.

In [None]:
from itertools import combinations_with_replacement

names = ["Alice", "Bob", "Charlie"]

combination_of_names = list(combinations_with_replacement(names, 2))
print("Combinations with replacement:")
print(combination_of_names)
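
One way to see the difference from plain `combinations` is to count unordered outcomes of rolling two dice. This is a sketch with an invented example; `math.comb` (Python 3.8+) is used only to confirm the counts.

```python
from itertools import combinations, combinations_with_replacement
from math import comb

faces = range(1, 7)  # a six-sided die

# Unordered outcomes of rolling two dice: doubles such as (3, 3) are allowed.
rolls = list(combinations_with_replacement(faces, 2))
print("Distinct unordered rolls:", len(rolls))  # 21

# Without replacement the doubles disappear.
no_doubles = list(combinations(faces, 2))
print("Rolls without doubles:", len(no_doubles))  # 15

# Counts follow C(n + r - 1, r) and C(n, r) with n = 6, r = 2.
print(comb(6 + 2 - 1, 2), comb(6, 2))  # 21 15
```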


## Regular Expressions

### Python `re` Module: Regular Expressions (Regex)

Python’s built-in `re` module provides support for regular expressions, which are powerful tools for searching, matching, and manipulating text patterns.

Most IDEs have regex search built in, and it is a convenient place to practice patterns.

Recommended reading: [W3Schools Python RegEx Tutorial](https://www.w3schools.com/python/python_regex.asp) (good beginner-friendly reference)

### A Realistic Pattern Example (Course Codes)

We want to match EST 300-level courses, allowing flexible formatting:
- EST389
- EST 371
- EST-303

In [None]:
import re

pattern = r"(EST)( |-)*(3[0-9]{2})"

# Part          Meaning
# (EST)         The literal string "EST"
# ( |-)*        A space or a dash, repeated zero or more times
# (3[0-9]{2})   A 300-level course number: a 3 followed by exactly two digits (each from 0 to 9)

In [None]:
# Matching against a list of courses

course_list = ["EST389", "EST 371", "CSE 213", "EST-303", "EST   332", "EST 232", "EST-3A3"]

for course in course_list:
    match = re.match(pattern, course)
    if match:
        print("Matched course:", course)
