In [1]:
# Setup code; make sure to run this if using Binder or Colab
import sys
import os
sys.path.insert(0, os.path.abspath(os.path.join(os.getcwd(), '..', 'shared')))
import setup_code
stroke_data = setup_code.stroke_data

# Module 6f: Extra (OPTIONAL)


<div class="alert alert-block alert-success">
<b>Section Objectives:</b><br> 
- Learn some useful functions we use often.
</div>

These are bonus concepts. They aren't essential for day-to-day data analysis, but they are useful to know and come up in more advanced tasks and when reading other people's code.



## Recursion

A recursive function is a function that calls itself to solve a problem by breaking it down into smaller parts. In contrast, iteration (such as a `for` loop) repeats a task step by step, for example by looping through rows or columns in a DataFrame.

Most of the time, iteration is preferred because it’s more efficient and easier to understand for straightforward tasks. Recursion isn't commonly used in day-to-day DataFrame operations, but it's a powerful concept that helps you understand how certain algorithms (e.g. decision trees) work.

Let’s say we want to calculate a factorial:

In [2]:
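def factorial(n):
    if n < 0:
        raise ValueError("n must be non-negative")  # otherwise the base case is never reached
    if n == 0 or n == 1:  # base case: stop recursing
        return 1
    return n * factorial(n - 1)  # recursive case: n! = n * (n - 1)!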

`factorial(3)` returns `3 * factorial(2)`, which becomes `3 * 2 * factorial(1)`. `factorial(1)` is the base case and returns `1`, so the final result is `3 * 2 * 1 = 6`.



```{note}
Be careful! Recursive functions must always have a base case (a stopping condition). Without one, the function keeps calling itself until Python reaches its recursion limit and raises a `RecursionError`.
```
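To make the contrast with iteration concrete, here is a minimal sketch that computes the same factorial with a `for` loop and then shows what Python does when a base case is missing. The names `factorial_recursive`, `factorial_iterative`, and `countdown_without_base_case` are hypothetical and used only for illustration.

```python
def factorial_recursive(n):
    # Base case: stop recursing at 0 or 1
    if n == 0 or n == 1:
        return 1
    return n * factorial_recursive(n - 1)


def factorial_iterative(n):
    # Same result, built up step by step with a loop
    result = 1
    for i in range(2, n + 1):
        result *= i
    return result


def countdown_without_base_case(n):
    # No stopping condition: this call chain never ends on its own
    return countdown_without_base_case(n - 1)


print(factorial_recursive(5), factorial_iterative(5))

hit_limit = False
try:
    countdown_without_base_case(3)
except RecursionError as err:
    # Python aborts the runaway recursion once its recursion limit is reached
    hit_limit = True
    print("Stopped by Python:", type(err).__name__)
```

Both versions give the same answer; the loop simply avoids the chain of nested calls.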

Using our stroke dataset for the next example, let's see how recursion can be useful:

Without going into the details of what a decision tree is, imagine you want to sort patients into groups based on a question like, “Is the patient older than 60?”

You ask this question and split the patients into two groups:

Group A: Patients older than 60

Group B: Patients 60 or younger

For each group, you then ask another question to split it further, for example, “Does the patient have hypertension?”



You keep splitting groups like this, recursively breaking down the problem, until the groups are small enough or you've asked enough questions. That stopping rule plays the role of the base case.
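The splitting idea can be sketched in a few lines of plain Python. This is only an illustration of the recursion, not how a real decision tree library works: the five patients are made up (they are not rows from the stroke dataset), the questions are fixed in advance rather than learned from data, and `split_group`, `is_over_60`, and `has_hypertension` are hypothetical names.

```python
# Made-up patients standing in for rows of the dataset
patients = [
    {"id": 1, "age": 67, "hypertension": 0},
    {"id": 2, "age": 45, "hypertension": 1},
    {"id": 3, "age": 80, "hypertension": 1},
    {"id": 4, "age": 33, "hypertension": 0},
    {"id": 5, "age": 71, "hypertension": 0},
]


def is_over_60(patient):
    return patient["age"] > 60


def has_hypertension(patient):
    return patient["hypertension"] == 1


# Questions are asked in this order, one per level of the tree
questions = [("age > 60", is_over_60), ("hypertension", has_hypertension)]


def split_group(group, questions, min_size=2):
    # Base case: no questions left, or the group is already small enough
    if not questions or len(group) < min_size:
        return [p["id"] for p in group]

    label, test = questions[0]
    yes_group = [p for p in group if test(p)]
    no_group = [p for p in group if not test(p)]

    # Recursive case: split each half again using the remaining questions
    return {
        f"{label}: yes": split_group(yes_group, questions[1:], min_size),
        f"{label}: no": split_group(no_group, questions[1:], min_size),
    }


tree = split_group(patients, questions)
print(tree)
```

Each call handles one question and hands the two smaller groups back to the same function, which is the pattern described above.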

## Anonymous (Lambda) Functions

A lambda function is a quick way to define a simple function in one line, without using `def`. It's called an anonymous function because you create it on the spot, without giving it a name. It's very useful with `.apply()` in pandas.

The general syntax of a lambda function is `lambda arguments: expression`, where `lambda` is the keyword that defines an anonymous function, `arguments` are the function inputs, and `expression` is the single expression whose result is returned automatically.
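As a quick illustration of that syntax, the sketch below writes the same small function once with `def` and once as a lambda. The names `add_ten`, `add_ten_lambda`, and `bmi` are hypothetical and exist only for this example.

```python
# A named function written with def
def add_ten(x):
    return x + 10


# The same logic as a lambda; it is stored in a variable only so we can call it here
add_ten_lambda = lambda x: x + 10

# Lambdas can take more than one argument
bmi = lambda weight_kg, height_m: weight_kg / height_m ** 2

print(add_ten(5), add_ten_lambda(5))
print(round(bmi(70, 1.75), 1))
```

In practice a lambda is usually passed straight to another function rather than stored in a variable.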

Let’s label patients with a `bmi_status` column, based on their BMI:

In [3]:
stroke_data['bmi_status'] = stroke_data['bmi'].apply(lambda x: 'high' if x > 30 else 'normal')

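This adds a new column with the label 'high' if BMI is over 30 and 'normal' if it is less than or equal to 30. Any missing BMI values (NaN) would also be labelled 'normal', because `NaN > 30` evaluates to `False`.
The code above is the same as writing the following: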

In [4]:
def bmi_label(x):
    return 'high' if x > 30 else 'normal'

stroke_data['bmi_status'] = stroke_data['bmi'].apply(bmi_label)

```{note}
When to use lambda instead of a regular function?

- When your function is short and simple, like a quick calculation or condition.

- When you want to pass a function as an argument (e.g., inside `.apply()`, `.map()`, or `sorted()`).

- When you don’t need to reuse the function elsewhere.
```
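The note mentions `sorted()`; here is a minimal sketch of passing a lambda as its `key` argument. The `(patient_id, age)` pairs are made up for illustration.

```python
# Hypothetical (patient_id, age) pairs, not taken from the stroke dataset
patients = [(101, 67), (102, 45), (103, 80)]

# key= expects a function; the lambda picks the age out of each pair
youngest_first = sorted(patients, key=lambda pair: pair[1])
oldest_first = sorted(patients, key=lambda pair: pair[1], reverse=True)

print(youngest_first)
print(oldest_first)
```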

## Useful Built-in Functions: `map`, `filter`, and `zip`

These are functions you can use with lists or Series to quickly transform or filter data.

### `map(function, iterable)`

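Applies a function to every item in a list or Series. The example below uses the pandas Series method `.map()`, which also accepts a dictionary that maps each old value to a new one.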

In [5]:
#Example: Let’s convert categorical gender labels into numeric codes.
stroke_data['gender_code'] = stroke_data['gender'].map({'Male': 1, 'Female': 0, 'Other': 2})
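The cell above uses the pandas `.map()` method. Python's built-in `map(function, iterable)` works on any iterable, such as a plain list. A minimal sketch, with made-up BMI values standing in for a column:

```python
# Hypothetical BMI values, not taken from the stroke dataset
bmi_values = [36.6, 24.0, 31.2]

# map() is lazy: it returns an iterator, so wrap it in list() to see the results
rounded = list(map(round, bmi_values))
labels = list(map(lambda x: 'high' if x > 30 else 'normal', bmi_values))

print(rounded)
print(labels)
```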

From the examples so far, you might have noticed that `.map` and `.apply` work similarly.

The main difference is that:
- `.map()` is designed for element-wise value mapping on a single pandas Series, often using dictionaries for fast replacements.
- `.apply()` is more flexible, working on both pandas Series and DataFrames to apply functions that can operate on elements, rows, or columns, including aggregations.
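
The difference can be seen on a small example. This sketch assumes pandas is installed and uses a made-up three-row DataFrame rather than the real stroke data; the column names mirror the dataset, but the values and the variable names (`gender_code`, `risk_flag`, `column_max`) are hypothetical.

```python
import pandas as pd

# Hypothetical three-row frame standing in for stroke_data
df = pd.DataFrame({
    "gender": ["Male", "Female", "Other"],
    "age": [67, 45, 80],
    "bmi": [36.6, 24.0, 31.2],
})

# Series.map: element-wise lookup on one column, here with a dictionary
gender_code = df["gender"].map({"Male": 1, "Female": 0, "Other": 2})

# DataFrame.apply with axis=1: the function receives one row at a time
risk_flag = df.apply(lambda row: row["age"] > 60 and row["bmi"] > 30, axis=1)

# DataFrame.apply with the default axis=0: the function receives one column at a time
column_max = df[["age", "bmi"]].apply(lambda col: col.max())

print(gender_code.tolist())
print(risk_flag.tolist())
print(column_max)
```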

You can read more about the differences in this [Stack Overflow Q&A](https://stackoverflow.com/questions/19798153/difference-between-map-applymap-and-apply-methods-in-pandas).

### `filter(function, iterable)`


Keeps only the items of an iterable for which the function returns `True`.

In [None]:
#Example: Get a list of patient ages 80 or older
ages = stroke_data['age'].tolist()
senior_patients = list(filter(lambda x: x >= 80, ages))
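
For comparison, the same filtering is often written as a list comprehension. A minimal sketch with made-up ages (not taken from the stroke dataset) shows that both forms give the same list:

```python
# Hypothetical ages, used only for illustration
ages = [67, 45, 80, 82, 14]

# filter() is lazy like map(), so wrap it in list() to get the values
with_filter = list(filter(lambda x: x >= 80, ages))

# Equivalent list comprehension
with_comprehension = [x for x in ages if x >= 80]

print(with_filter)
print(with_comprehension)
```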

### `zip(iterable1, iterable2, ...)`


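Combines two or more iterables into tuples by pairing up the items at the same position. It stops as soon as the shortest input runs out.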

In [None]:
#Example: Pair each patient’s ID and age into tuples
patient_ids = stroke_data['id'].head(3)
ages = stroke_data['age'].head(3)

list(zip(patient_ids, ages))
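
A few more things `zip` can do, sketched with plain lists of made-up values (the IDs, ages, and genders here are hypothetical):

```python
patient_ids = [101, 102, 103]
ages = [67, 45, 80]
genders = ['Male', 'Female']  # deliberately one item short

# Pair items by position
pairs = list(zip(patient_ids, ages))

# With inputs of different lengths, zip stops at the shortest one
triples = list(zip(patient_ids, ages, genders))

# dict() turns pairs into a lookup table
age_by_id = dict(zip(patient_ids, ages))

# zip(*pairs) reverses the pairing and gives back the original sequences as tuples
ids_again, ages_again = zip(*pairs)

print(pairs)
print(triples)
print(age_by_id)
```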