# From Python to Production
## Notebook 7: Functional Programming

By **Prerna Joshi** | #25DaysOfDataTech

"Functional thinking reduces bugs: clean, predictable code scales effortlessly."

---

### What you'll learn
- Pure functions, referential transparency, and immutability (as a practice)
- Higher-order functions: `map`, `filter`, `reduce`, `sorted(key=...)`
- Comprehensions vs functional forms (when to pick what)
- The `itertools` toolbox for streaming pipelines
- Function utilities: `functools.partial`, `lru_cache`, `wraps`
- The `operator` module for fast, readable function objects
- Generators & iterators; lazy evaluation; back-pressure friendly design
- Composition patterns, error handling, and side‚Äëeffect boundaries
- Practical pipelines for data cleaning and analytics


> **Why this matters for data work**  
> Functional style reduces hidden state and side effects, making data code more predictable and testable. It pairs well with streaming/large files and ETL.


## 1. Pure Functions & Immutability (by discipline)

A **pure function** returns the same output for the same input and has no side effects.  
Python doesn't enforce immutability, but we can *practice* it by:
- Avoiding mutation of inputs
- Returning new objects
- Keeping I/O at the edges


In [1]:
def normalize_score(x: float, mean: float, std: float) -> float:
    # pure: no side effects, same input → same output
    return (x - mean) / std

normalize_score(88, mean=80, std=5)


1.6
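
To make the "avoid mutating inputs" practice concrete, here is a minimal sketch that contrasts an in-place update with a version that returns a new object. The names `add_bonus_inplace` and `with_bonus` are hypothetical and exist only for this illustration.

```python
def add_bonus_inplace(scores: list, bonus: int) -> list:
    # impure: changes the caller's list
    for i in range(len(scores)):
        scores[i] += bonus
    return scores

def with_bonus(scores: list, bonus: int) -> list:
    # pure: builds a new list and leaves the input untouched
    return [s + bonus for s in scores]

original = [70, 80, 90]
boosted = with_bonus(original, 5)
unchanged_copy = list(original)   # snapshot taken before any mutation
print(original, boosted)

add_bonus_inplace(original, 5)    # the caller's list is now different
print(original)
```

The pure version can be tested by comparing return values alone; the in-place version also requires checking what happened to the argument.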

## 2. Higher-Order Functions: `map`, `filter`, `sorted(key=...)`, `reduce`

Prefer comprehensions for readability; use these when composing lazy pipelines or when a function handle improves clarity.


In [2]:
from functools import reduce

nums = [3, 10, 7, 2, 8]
squared = list(map(lambda x: x*x, nums))
even = list(filter(lambda x: x % 2 == 0, nums))
total = reduce(lambda a,b: a+b, nums, 0)
top3 = sorted(nums, reverse=True)[:3]

squared, even, total, top3


([9, 100, 49, 4, 64], [10, 2, 8], 30, [10, 8, 7])
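
The heading names `sorted(key=...)`, but the cell above only sorts plain numbers. As a small sketch on made-up records, a key function tells `sorted`, `min`, and `max` what to compare without changing the items themselves. The `records` data and the `by_score` helper are assumptions for illustration.

```python
records = [("alice", 91), ("bob", 78), ("carol", 88)]  # illustrative data

def by_score(rec):
    # key function: extract the value to compare on
    return rec[1]

ranked = sorted(records, key=by_score, reverse=True)
shortest_name = min(records, key=lambda rec: len(rec[0]))

print(ranked)
print(shortest_name)
```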

## 3. Comprehensions vs Functional Forms

- Prefer **list/dict/set comprehensions** for simple transforms/filters (more Pythonic).  
- Prefer functional forms when you already have named functions or want lazy evaluation.


In [3]:
nums = [1,2,3,4,5,6]
comp = [x*x for x in nums if x % 2 == 0]
func = list(map(lambda x: x*x, filter(lambda x: x % 2 == 0, nums)))
comp, func


([4, 16, 36], [4, 16, 36])
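
One practical difference is laziness: a list comprehension builds its whole result immediately, while `map` returns an iterator that computes items only when asked. The sketch below records each call in a list (the `calls` list and `square` helper are hypothetical) so the evaluation order is visible, and it maps over an infinite source to show that only the requested items are ever computed.

```python
from itertools import count, islice

calls = []

def square(x):
    calls.append(x)   # record each evaluation so laziness is observable
    return x * x

lazy = map(square, count(1))          # count(1) is infinite; nothing runs yet
calls_before = list(calls)

first_three = list(islice(lazy, 3))   # pulls exactly three items
print(calls_before, first_three, calls)
```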

## 4. The `itertools` Toolbox (streaming friendly)

- `count`, `cycle`, `repeat` (infinite iterators)
- `accumulate`, `chain`, `compress`, `dropwhile`, `takewhile`
- `islice`, `tee`, `pairwise`, `groupby`


In [4]:
from itertools import islice, accumulate, chain, pairwise, groupby

nums = [1,2,3,4,5]
prefix = list(accumulate(nums))                 # running totals
pairs = list(pairwise(nums))                    # adjacent pairs (3.10+)
chained = list(chain("ab", "cd"))
grouped = {k:list(g) for k, g in groupby("aaabbccc")}

prefix, pairs, chained, grouped


([1, 3, 6, 10, 15],
 [(1, 2), (2, 3), (3, 4), (4, 5)],
 ['a', 'b', 'c', 'd'],
 {'a': ['a', 'a', 'a'], 'b': ['b', 'b'], 'c': ['c', 'c', 'c']})
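
Several tools from the list are not exercised in the cell above. The following sketch, on illustrative data, shows `takewhile`, `dropwhile`, `compress`, and `islice` over an infinite `count`. It also shows why `groupby` usually needs sorted input: it only merges consecutive items with equal keys.

```python
from itertools import takewhile, dropwhile, compress, islice, count, groupby

readings = [1, 3, 5, 12, 4, 2]  # illustrative data

head = list(takewhile(lambda x: x < 10, readings))  # stops at the first failing item
tail = list(dropwhile(lambda x: x < 10, readings))  # starts at the first failing item
picked = list(compress("abcd", [1, 0, 1, 0]))        # keeps items whose selector is truthy
evens = list(islice(count(0, 2), 4))                 # first four items of an infinite stream

# groupby merges only consecutive equal keys, so unsorted input splits a group
unsorted_runs = [(k, len(list(g))) for k, g in groupby("aabba")]
sorted_runs = [(k, len(list(g))) for k, g in groupby(sorted("aabba"))]

print(head, tail, picked, evens)
print(unsorted_runs, sorted_runs)
```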

## 5. The `operator` Module: Faster & Readable Callables

Use prebuilt function objects instead of tiny lambdas.


In [5]:
import operator as op

rows = [
    {"name":"alice","score":91},
    {"name":"bob","score":78},
    {"name":"carol","score":88},
]
top = max(rows, key=op.itemgetter("score"))
names = list(map(op.itemgetter("name"), rows))
product = op.mul(6, 7)

top, names, product


({'name': 'alice', 'score': 91}, ['alice', 'bob', 'carol'], 42)
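
`itemgetter` has two siblings that cover attribute access and method calls. A brief sketch, assuming a hypothetical `Player` record type that is not part of the original notebook:

```python
from collections import namedtuple
from operator import attrgetter, itemgetter, methodcaller

Player = namedtuple("Player", ["name", "score"])  # hypothetical record type
players = [Player("alice", 91), Player("bob", 78), Player("carol", 88)]

best = max(players, key=attrgetter("score"))                 # like lambda p: p.score
titles = list(map(methodcaller("title"), ["alice", "bob"]))  # like lambda s: s.title()
# itemgetter with several keys returns a tuple of the looked-up values
pairs = list(map(itemgetter("name", "score"), [{"name": "dave", "score": 95}]))

print(best, titles, pairs)
```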

## 6. `functools.partial` & Currying (lightweight)

Freeze some arguments of a function to make a specialized version.


In [6]:
from functools import partial

def scale_and_shift(x, scale=1.0, shift=0.0):
    return x * scale + shift

stdize = partial(scale_and_shift, scale=1/5, shift=-80/5)  # (x - 80)/5
stdize(95), stdize(80)


(3.0, 0.0)
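
`partial` also works on built-ins, and the resulting object exposes what was frozen. For comparison, the sketch includes a hand-written two-argument curry; `curry2` and `add` are hypothetical helpers, since Python has no built-in currying.

```python
from functools import partial

parse_binary = partial(int, base=2)   # same as calling int(text, base=2)
five = parse_binary("101")

def curry2(fn):
    # hypothetical helper: turns fn(a, b) into fn(a)(b)
    def take_a(a):
        def take_b(b):
            return fn(a, b)
        return take_b
    return take_a

def add(a, b):
    return a + b

add10 = curry2(add)(10)
fifteen = add10(5)

print(five, fifteen, parse_binary.func, parse_binary.keywords)
```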

## 7. Caching with `lru_cache`

Memoize expensive pure-ish functions to speed up repeated calls.


In [7]:
from functools import lru_cache

@lru_cache(maxsize=128)
def fib(n: int) -> int:
    if n < 2:
        return n
    return fib(n-1) + fib(n-2)

[fib(i) for i in range(10)], fib.cache_info()


([0, 1, 1, 2, 3, 5, 8, 13, 21, 34],
 CacheInfo(hits=16, misses=10, maxsize=128, currsize=10))
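
Two practical details are worth knowing: `lru_cache` builds its lookup key from the arguments, so they must be hashable, and the cache can be reset with `cache_clear()`. A minimal sketch, where the `total` function is a made-up example:

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def total(values: tuple) -> int:
    return sum(values)

total((1, 2, 3))            # first call: miss
total((1, 2, 3))            # second call: hit
info = total.cache_info()

try:
    total([1, 2, 3])        # a list is unhashable, so no cache key can be built
    raised = False
except TypeError:
    raised = True

total.cache_clear()         # drops cached entries and resets the counters
cleared = total.cache_info()
print(info, raised, cleared)
```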

## 8. Generators & Lazy Evaluation

A generator yields items one-by-one and remembers its state. Great for large/streaming data and back-pressure friendly pipelines.


In [8]:
def gen_chunks(iterable, size=3):
    chunk = []
    for x in iterable:
        chunk.append(x)
        if len(chunk) == size:
            yield chunk
            chunk = []
    if chunk:
        yield chunk

list(gen_chunks(range(10), size=4))


[[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
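
The "remembers its state" part can be seen by stepping a generator by hand with `next`. The sketch below also chains generator expressions into a small pipeline in which no work happens until the final consumer pulls values. `countdown` and the sample `lines` are illustrative assumptions.

```python
def countdown(n):
    while n > 0:
        yield n      # execution pauses here and resumes on the next request
        n -= 1

gen = countdown(3)
first = next(gen)    # runs until the first yield
second = next(gen)   # resumes where it left off
rest = list(gen)     # drains what remains
again = list(gen)    # an exhausted generator yields nothing more

# a lazy pipeline: each stage pulls from the previous one on demand
lines = ["3", "", " 4 ", "5"]
stripped = (ln.strip() for ln in lines)
non_empty = (ln for ln in stripped if ln)
numbers = (int(ln) for ln in non_empty)
total = sum(numbers)  # the only step that actually drives the pipeline

print(first, second, rest, again, total)
```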

## 9. Composition & Pipelines

Keep I/O at the edges; compose pure transforms in the middle. Small helpers = easier tests.


In [9]:
import re, unicodedata

def strip_accents(s: str) -> str:
    nfkd = unicodedata.normalize("NFKD", s)
    return "".join(ch for ch in nfkd if unicodedata.category(ch) != "Mn")

WS = re.compile(r"\s+")
PUNCT = str.maketrans({c:" " for c in ",.;:!?"})

def normalize(s: str) -> str:
    s = unicodedata.normalize("NFKC", s)
    s = strip_accents(s).casefold().translate(PUNCT)
    s = WS.sub(" ", s).strip()
    return s

def tokens(s: str):
    return (t for t in normalize(s).split() if t)  # generator

text = "Café — Data, AI; Engineering!"
list(tokens(text))


['cafe', '—', 'data', 'ai', 'engineering']
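
Composition itself can be captured in a small helper. This is one possible sketch, not a standard-library function: `pipe` is a hypothetical name, built on `functools.reduce`, that applies functions left to right.

```python
from functools import reduce

def pipe(*funcs):
    # hypothetical helper: pipe(f, g, h)(x) == h(g(f(x)))
    def run(value):
        return reduce(lambda acc, fn: fn(acc), funcs, value)
    return run

clean = pipe(str.strip, str.casefold, str.split)
words = clean("  Data AI  Engineering ")
identity = pipe()("unchanged")   # with no functions, the value passes through

print(words, identity)
```

Because each stage is an ordinary function, every stage can be unit-tested on its own before it is composed.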

## 10. Error Handling in Pipelines

Keep transforms total (defined for all inputs) or isolate edge cases. Use small adapters for validation and fallback.


In [10]:
def to_int(s, default=None):
    try:
        return int(s)
    except (TypeError, ValueError):
        return default

values = ["10", "x", None, "30"]
converted = list(map(lambda v: to_int(v, default=-1), values))
converted


[10, -1, -1, 30]
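
A sentinel default such as `-1` becomes ambiguous when `-1` is also a valid value. One alternative, sketched here with a hypothetical `try_parse` adapter, is to return a success flag alongside the value so that good and bad inputs can be routed separately.

```python
def try_parse(parse, value):
    # returns (True, parsed) on success and (False, original) on failure
    try:
        return True, parse(value)
    except (TypeError, ValueError):
        return False, value

values = ["10", "x", None, "30", "-1"]
results = [try_parse(int, v) for v in values]

good = [val for ok, val in results if ok]
bad = [val for ok, val in results if not ok]

print(good, bad)
```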

## 11. Recursion (and Python's TCO note)

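Python **does not** perform tail-call optimization, so every recursive call adds a stack frame and deep recursion raises `RecursionError` (CPython's default limit is 1000 frames). Prefer iteration for deep recursion; raise the limit with `sys.setrecursionlimit` only with care, because a limit that is too high can crash the interpreter. Note also that `lst[1:]` in the example copies the list on every call, so this version takes quadratic time.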


In [11]:
def rec_sum(lst):
    if not lst:
        return 0
    return lst[0] + rec_sum(lst[1:])

rec_sum([1,2,3,4])


10
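
The limit is easy to hit in practice. The sketch below assumes the interpreter's recursion limit is near its default, builds a list slightly longer than that limit, and compares the recursive version with an iterative one (`iter_sum` is a hypothetical name).

```python
import sys

def rec_sum(lst):
    if not lst:
        return 0
    return lst[0] + rec_sum(lst[1:])

def iter_sum(lst):
    # same result with a loop: constant stack depth, no list copies
    total = 0
    for x in lst:
        total += x
    return total

big = list(range(sys.getrecursionlimit() + 100))

try:
    rec_sum(big)
    overflowed = False
except RecursionError:
    overflowed = True

print(overflowed, iter_sum(big))
```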

## 12. Decorators (Functional Perspective)

Decorators take a function and return a function, which makes them a good fit for cross-cutting concerns (auth, timing, caching).


In [12]:
import time
from functools import wraps

def timer(fn):
    @wraps(fn)
    def wrapper(*args, **kwargs):
        t0 = time.perf_counter()
        try:
            return fn(*args, **kwargs)
        finally:
            dt = (time.perf_counter() - t0)*1000
            print(f"{fn.__name__} took {dt:.2f} ms")
    return wrapper

@timer
def slow_pow(a,b):
    time.sleep(0.03)
    return a**b

slow_pow(2, 10)


slow_pow took 30.13 ms


1024
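
Decorators can also take arguments by adding one more layer of nesting. As a sketch under stated assumptions, `retry` is a hypothetical decorator factory (not a standard-library function) that re-runs a function when it raises `ValueError`. The example also shows that `wraps` keeps the original name and docstring.

```python
from functools import wraps

def retry(times):
    # hypothetical decorator factory: allow up to `times` attempts on ValueError
    def decorate(fn):
        @wraps(fn)
        def wrapper(*args, **kwargs):
            for attempt in range(1, times + 1):
                try:
                    return fn(*args, **kwargs)
                except ValueError:
                    if attempt == times:
                        raise   # out of attempts: let the error propagate
        return wrapper
    return decorate

attempts = []

@retry(times=3)
def flaky():
    """Fails twice, then succeeds."""
    attempts.append(1)
    if len(attempts) < 3:
        raise ValueError("not yet")
    return "ok"

result = flaky()
print(result, len(attempts), flaky.__name__, flaky.__doc__)
```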

## 13. Practical Examples

**Example A: streaming CSV rows.** Transform, filter bad rows, compute aggregates lazily.  
**Example B: top-k rolling metrics.** Use `heapq.nlargest` in a pipeline.  
(We use tiny synthetic data here.)


In [13]:
from io import StringIO
import csv, heapq

CSV = StringIO("""user,score
alice,91
bob,x
carol,88
dave,95
""")

def read_csv_rows(fobj):
    r = csv.DictReader(fobj)
    for row in r:
        yield row

def parse_score(row):
    # return a new dict rather than mutating the input row
    try:
        return {**row, "score": int(row["score"])}
    except (TypeError, ValueError):  # non-numeric or missing score
        return None

rows = (parse_score(r) for r in read_csv_rows(CSV))
valid = (r for r in rows if r is not None)
top2 = heapq.nlargest(2, valid, key=lambda r: r["score"])
top2


[{'user': 'dave', 'score': 95}, {'user': 'alice', 'score': 91}]
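
The cell above covers filtering and top-k. For the "compute aggregates lazily" part of Example A, here is a minimal single-pass sketch on the same tiny data. `valid_scores` and `mean_and_count` are hypothetical helper names, and the approach keeps only a running total and a count in memory.

```python
import csv
from io import StringIO

def valid_scores(fobj):
    # yields integer scores, skipping rows that fail to parse
    for row in csv.DictReader(fobj):
        try:
            yield int(row["score"])
        except (TypeError, ValueError):
            continue

def mean_and_count(numbers):
    # single pass over the stream with constant memory
    total = count = 0
    for n in numbers:
        total += n
        count += 1
    return (total / count if count else None), count

data = StringIO("user,score\nalice,91\nbob,x\ncarol,88\ndave,95\n")
mean, n = mean_and_count(valid_scores(data))
print(round(mean, 2), n)
```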

## 14. Mini Cheatsheet

- Prefer pure, stateless helpers; isolate I/O and side effects
- Use comprehensions for concise transforms; `itertools` for streaming
- Reach for `operator.itemgetter`, `attrgetter`, `methodcaller` for clarity
- Cache pure-ish expensive calls with `@lru_cache`
- Compose small functions; test them in isolation


## 15. Practice (Try first, then reveal solutions)

1. **pipeline_numbers**: Given a list, square only the even numbers and return their sum (try both comprehension and `map/filter/reduce`).  
2. **top_k_words**: Given a token stream (iterator), return the top-k words by frequency lazily (no full list materialization if possible).  
3. **moving_avg**: Write a generator `moving_avg(iterable, w)` that yields the windowed average.  
4. **compose2**: Implement `compose2(f, g)` that returns a function `h(x)=f(g(x))`.  
5. **partial_demo**: Create `to_fixed(base)` via `partial` that formats numbers to `base` decimal places.  
6. **safe_map**: Implement `safe_map(fn, iterable, default=None)` that applies `fn` and yields `default` on exceptions.  
7. **unique_everseen**: Generator that yields the first time each element appears (like `itertools` recipe).  
8. **chunked**: Generator that yields fixed-size chunks from an iterable.  
9. **cached_slow**: Wrap a slow pure function with `lru_cache` and show speedup by calling it repeatedly.  
10. **groupby_len**: Using `groupby`, group words by their length (remember to sort first!).  
11. **argmax_op**: Using `operator`, find the dict in a list with the largest `"score"` key.  
12. **normalize_pipeline**: Build a functional text normalize pipeline using `strip_accents` + lower + punctuation removal and return top-3 tokens by frequency.


## 16. Practice Solutions  
*(Click to reveal after solving.)*

<details>
<summary><strong>Solution 1: pipeline_numbers</strong></summary>

```python
from functools import reduce
# Comprehension
def sum_squares_even_comp(nums):
    return sum(x*x for x in nums if x % 2 == 0)

# map/filter/reduce
def sum_squares_even_hof(nums):
    return reduce(lambda a,b: a+b, map(lambda x: x*x, filter(lambda x: x%2==0, nums)), 0)
```
</details>

<details>
<summary><strong>Solution 2: top_k_words</strong></summary>

```python
import heapq
from collections import Counter

def top_k_words(tokens, k=3):
    # Materialize minimal structure via Counter (needs one pass)
    c = Counter(tokens)
    return heapq.nlargest(k, c.items(), key=lambda kv: kv[1])
```
</details>

<details>
<summary><strong>Solution 3: moving_avg</strong></summary>

```python
from collections import deque

def moving_avg(iterable, w):
    d = deque()
    s = 0
    for x in iterable:
        d.append(x)
        s += x
        if len(d) > w:
            s -= d.popleft()
        if len(d) == w:
            yield s / w
```
</details>

<details>
<summary><strong>Solution 4: compose2</strong></summary>

```python
def compose2(f, g):
    def h(x):
        return f(g(x))
    return h
```
</details>

<details>
<summary><strong>Solution 5: partial_demo</strong></summary>

```python
from functools import partial

def to_fixed(x, base=2):
    return f"{x:.{base}f}"

two_dp = partial(to_fixed, base=2)
three_dp = partial(to_fixed, base=3)
```
</details>

<details>
<summary><strong>Solution 6: safe_map</strong></summary>

```python
def safe_map(fn, iterable, default=None):
    for x in iterable:
        try:
            yield fn(x)
        except Exception:
            yield default
```
</details>

<details>
<summary><strong>Solution 7: unique_everseen</strong></summary>

```python
def unique_everseen(iterable):
    seen = set()
    for x in iterable:
        if x not in seen:
            seen.add(x)
            yield x
```
</details>

<details>
<summary><strong>Solution 8: chunked</strong></summary>

```python
def chunked(iterable, size):
    chunk = []
    for x in iterable:
        chunk.append(x)
        if len(chunk) == size:
            yield chunk
            chunk = []
    if chunk:
        yield chunk
```
</details>

<details>
<summary><strong>Solution 9: cached_slow</strong></summary>

```python
import time
from functools import lru_cache

def slow_square(x):
    time.sleep(0.02)
    return x*x

@lru_cache(maxsize=None)
def cached_square(x):
    return slow_square(x)
```
</details>

<details>
<summary><strong>Solution 10: groupby_len</strong></summary>

```python
from itertools import groupby

def groupby_len(words):
    words = sorted(words, key=len)
    return {k:list(g) for k,g in groupby(words, key=len)}
```
</details>

<details>
<summary><strong>Solution 11: argmax_op</strong></summary>

```python
import operator as op

def argmax_score(rows):
    return max(rows, key=op.itemgetter("score"))
```
</details>

<details>
<summary><strong>Solution 12: normalize_pipeline</strong></summary>

```python
import re, unicodedata
from collections import Counter

WS = re.compile(r"\s+")  # one or more whitespace characters
PUNCT = str.maketrans({c:" " for c in ",.;:!?"})

def strip_accents(s: str) -> str:
    nfkd = unicodedata.normalize("NFKD", s)
    return "".join(ch for ch in nfkd if unicodedata.category(ch) != "Mn")

def normalize_tokens(s: str):
    s = unicodedata.normalize("NFKC", s)
    s = strip_accents(s).casefold().translate(PUNCT)
    s = WS.sub(" ", s).strip()
    return (t for t in s.split() if t)

def top3_tokens(text):
    cnt = Counter(normalize_tokens(text))
    return cnt.most_common(3)
```
</details>
