<a id='list-comp'></a> 
# Comprehensions

Comprehensions in Python provide a concise and readable way to create new lists or dictionaries. They are inspired by concepts from functional programming, and Haskell is often cited as an influence.

Instead of writing a multi-line for loop, you can often accomplish the same task in a single line of code.

In this notebook, you will learn:

- [List comprehensions](#list)
- [Dictionary comprehensions](#dict)
- [<mark>Exercises</mark>](#ex)


<a id=list></a>
## List comprehensions

List comprehensions are an elegant way to define and create lists in Python without writing an explicit `for` loop. As a starting point, here is a list of the numbers 0 to 9:

In [None]:
new_list = list(range(10))

So instead of something like this:

In [None]:
square_values = []

for x in new_list:
    square_values.append(x**2)

square_values

A list comprehension can be used:

In [None]:
[x**2 for x in new_list]

Why is this better?

- It fits on one line
- There is no need to create an empty list first
- There is no need to call `.append()` on every iteration (repeating that method lookup and call adds overhead)
- The expression directly produces the new list
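To check the speed claim on your own machine, a quick comparison with the standard-library `timeit` module is enough. The two helper functions below (`with_loop` and `with_comprehension`, names chosen for this sketch) build the same list of squares. Timings vary with hardware and Python version, so no output is shown; in CPython the comprehension is often modestly faster, mainly because it avoids looking up and calling `.append()` on every iteration.

```python
import timeit


def with_loop(n):
    """Build a list of squares with an explicit loop and .append()."""
    result = []
    for x in range(n):
        result.append(x**2)
    return result


def with_comprehension(n):
    """Build the same list of squares with a list comprehension."""
    return [x**2 for x in range(n)]


# Both approaches produce exactly the same list
assert with_loop(10) == with_comprehension(10)

# Time 100 repetitions of each on 10,000 numbers (results vary by machine)
loop_time = timeit.timeit(lambda: with_loop(10_000), number=100)
comp_time = timeit.timeit(lambda: with_comprehension(10_000), number=100)
print(f"loop: {loop_time:.3f}s  comprehension: {comp_time:.3f}s")
```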

In [None]:
# keep only the odd numbers, then square them
[x**2 for x in new_list if x % 2 != 0]
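The trailing `if` in the cell above acts as a *filter*: items that fail the test are skipped entirely, so the result is shorter than the input. Written out as a loop (using a local `numbers` list so the sketch runs on its own), the same logic looks like this:

```python
numbers = list(range(10))

# Loop version: the inner `if` skips even numbers before anything is appended
odd_squares_loop = []
for x in numbers:
    if x % 2 != 0:
        odd_squares_loop.append(x**2)

# Comprehension version: the trailing `if` clause is the same filter
odd_squares = [x**2 for x in numbers if x % 2 != 0]

print(odd_squares)  # [1, 9, 25, 49, 81]
```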

In [None]:
# make the even numbers strings and the odd numbers floats
[float(x) if x % 2 != 0 else str(x) for x in new_list]

![](images/comprehension2.png)
<!-- source? -->
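The two cells above use `if` in different positions, and the position changes the meaning. In this sketch with a small `numbers` list, a trailing `if` filters items, while `value_if_true if condition else value_if_false` placed before the `for` is an ordinary conditional expression applied to every item. An `else` is not allowed in the trailing filter clause; the last lines check that Python rejects it with a `SyntaxError`.

```python
numbers = list(range(6))

# Filter: the `if` after the `for` drops items, so the result can be shorter
evens = [x for x in numbers if x % 2 == 0]
print(evens)  # [0, 2, 4]

# Conditional expression: `if ... else` before the `for` transforms every item,
# so the result has the same length as the input
labels = ["even" if x % 2 == 0 else "odd" for x in numbers]
print(labels)  # ['even', 'odd', 'even', 'odd', 'even', 'odd']

# Both at once: filter out 0 first, then transform what remains
mixed = [x * 10 if x % 2 == 0 else -x for x in numbers if x > 0]
print(mixed)  # [-1, 20, -3, 40, -5]

# An `else` in the trailing filter clause is a syntax error
try:
    compile("[x for x in range(3) if x > 0 else 0]", "<example>", "eval")
    filter_else_allowed = True
except SyntaxError:
    filter_else_allowed = False
print(filter_else_allowed)  # False
```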

<a id='ex-list-comp'></a> 
## <mark> Exercise: List comprehensions </mark>

1. Add to the code below to create a new list of all the UNEXPECTED columns in the `pokemon` DataFrame, i.e. all columns that DO NOT appear in the `expected_columns` list.

In [None]:
import pandas as pd

expected_columns = ['id', 'name', 'type', 'hp', 'attack', 'defense', 'special_atk', 'speed', 'legendary']

pokemon = pd.read_csv('data/pokemon.csv')

# Your code here:


2. Data Quality Flags 

You're cleaning survey data where age values can be messy. Create a new list where:
- Valid ages (between 0 and 120) stay as numbers
- Invalid ages are replaced with the string `'Invalid'`

**Bonus:** Modify your comprehension to replace invalid ages with the median of the valid ages instead of `'Invalid'`.

In [None]:
ages = [25, 150, 30, -5, 42, 200, 18, 0, 99, -10, 65]

# Your code here using list comprehension with if/else
cleaned_ages = ...

<a id='dict'></a>
## Dictionary comprehensions

Comprehensions are not limited to lists: we can also use them to build dictionaries.

For example, let's make a dictionary that maps each letter to its position in the alphabet.

In [None]:
import string

letters = string.ascii_lowercase

The built-in `enumerate` function is really helpful here: it yields each item together with its index.

In [None]:
for index, item in enumerate(letters):
    print(index, item)
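Each value `enumerate` produces is an `(index, item)` tuple, and counting starts at 0 unless you pass the optional `start` argument. Converting the results to a list makes this visible (the short string `"abc"` is used here just to keep the output small):

```python
letters = "abc"

# Default: indexes start at 0
pairs = list(enumerate(letters))
print(pairs)  # [(0, 'a'), (1, 'b'), (2, 'c')]

# start=1 shifts the count, which is handy for 1-based positions
pairs_from_one = list(enumerate(letters, start=1))
print(pairs_from_one)  # [(1, 'a'), (2, 'b'), (3, 'c')]
```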

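A dictionary comprehension has the general form:

`{key: value for item in iterable if condition}`

The `if condition` part is optional. When the iterable yields pairs, as `enumerate` does, you can unpack them directly in the `for` clause:
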
In [None]:
{letter: index + 1 for index, letter in enumerate(letters)}
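Two variations on the cell above, sketched with the same `letters` setup. Passing `start=1` to `enumerate` removes the need for `index + 1`, and a trailing `if` filters entries exactly as it does in a list comprehension (keeping only the vowels here is just an illustrative choice):

```python
import string

letters = string.ascii_lowercase

# start=1 gives 1-based positions directly
positions = {letter: pos for pos, letter in enumerate(letters, start=1)}
print(positions["a"], positions["z"])  # 1 26

# The optional `if` clause keeps only the matching key/value pairs
vowel_positions = {
    letter: pos
    for pos, letter in enumerate(letters, start=1)
    if letter in "aeiou"
}
print(vowel_positions)  # {'a': 1, 'e': 5, 'i': 9, 'o': 15, 'u': 21}
```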

<a id='ex-dict-comp'></a> 
## <mark> Exercise: Dictionary comprehension </mark>

Create a dictionary such that each key is a letter from 'a' to 'g' repeated 3 times and each value is the square of that letter's position in the alphabet. The list of letters is already given to you.

*example*: `'ccc': 9`

**bonus points**: use a dictionary comprehension

**extra**: capitalize the keys.

**extra**: only keep letters whose position number is even.

In [None]:
use_this_list = ['a','b','c','d','e', 'f', 'g']

## your code here ##

<a id='ex'></a>
## <mark>Exercises</mark>

### <mark>Exercise 1: Extract Numeric Columns </mark>

You're doing exploratory data analysis and want to quickly identify which columns contain numeric data for statistical analysis.

Given a DataFrame, create a list of all column names that have numeric data types (`int64`, `float64`).

You should be able to check your code using:
```python
print(f"Numeric columns: {numeric_columns}")
# Expected output: ['user_id', 'age', 'salary']
```

**Bonus:** Create a second list that contains only the float columns (excluding integers).


In [None]:
import pandas as pd

# Sample DataFrame
df = pd.DataFrame({
    'user_id': [1, 2, 3, 4, 5],
    'name': ['Alice', 'Bob', 'Carol', 'Dan', 'Eve'],
    'age': [25, 30, 35, 28, 42],
    'email': ['alice@example.com', 'bob@example.com', 'carol@example.com', 'dan@example.com', 'eve@example.com'],
    'salary': [50000.0, 60000.0, 55000.0, 48000.0, 72000.0],
    'is_active': [True, True, False, True, True]
})

# TODO: Create a list of numeric column names using list comprehension
# Hint: Use df.dtypes to get column data types
# Hint: Numeric types include 'int64', 'float64'

numeric_columns = ... # your code here

### <mark>Exercise 2: Column Data Type Mapping</mark>

You need to document your DataFrame schema for a data pipeline. Create a dictionary that maps each column name to its data type as a string.

You should be able to check your code using:

```python
print("Column types:")
for col, dtype in column_types.items():
    print(f"  {col}: {dtype}")

# Expected output:
# Column types:
#   user_id: int64
#   name: object
#   age: int64
#   email: object
#   salary: float64
#   is_active: bool
```

In [None]:
import pandas as pd

# Sample DataFrame
df = pd.DataFrame({
    'user_id': [1, 2, 3, 4, 5],
    'name': ['Alice', 'Bob', 'Carol', 'Dan', 'Eve'],
    'age': [25, 30, 35, 28, 42],
    'email': ['alice@example.com', 'bob@example.com', 'carol@example.com', 'dan@example.com', 'eve@example.com'],
    'salary': [50000.0, 60000.0, 55000.0, 48000.0, 72000.0],
    'is_active': [True, True, False, True, True]
})

# TODO: Create a dictionary mapping column names to their data types
# Hint: Use df.dtypes which returns a Series with column names as index
# Hint: You can iterate over df.dtypes.items() to get (column, dtype) pairs
# Hint: Convert dtype to string using str(dtype)

column_types = ... # your code here

**Bonus 1:** Create a dictionary that only includes columns with numeric types (`int64`, `float64`).

**Bonus 2:** Create a dictionary where the key is the data type and the value is a list of column names with that type.

Example output for Bonus 2:
```python
{
    'int64': ['user_id', 'age'],
    'object': ['name', 'email'],
    'float64': ['salary'],
    'bool': ['is_active']
}
```

**Answers**: Uncomment the `%load` line in each cell below and run the cell to load the answer into it, then run the cell again to execute the answer.

In [None]:
# %load answers/list-comp-1.py

In [None]:
# %load answers/list-comp-2.py

In [None]:
# %load answers/ex-dict-comp.py

In [None]:
# %load answers/comp-1.py

In [None]:
# %load answers/comp-2.py

<a id=summ></a>

## Summary: List & Dictionary Comprehensions

**Key Takeaways:**

Comprehensions provide a **concise and readable** way to create lists and dictionaries in a single line, replacing multi-line for loops with cleaner syntax.

**List comprehension syntax:** `[expression for item in iterable if condition]`
- Use for transforming and filtering lists
- Example: `[col for col in df.columns if df[col].dtype == 'int64']`

**Dictionary comprehension syntax:** `{key: value for item in iterable if condition}`
- Use for creating mappings and groupings
- Example: `{col: str(dtype) for col, dtype in df.dtypes.items()}`

**Common data science patterns:**
- Filter columns by type: `[col for col in df.columns if condition]`
- Map columns to properties: `{col: df[col].mean() for col in numeric_cols}`
- Document schema: `{col: str(dtype) for col, dtype in df.dtypes.items()}`

**Best practices:**
- Use comprehensions for simple transformations (1-2 operations)
- Keep it readable: if the logic is complex, use a regular loop
- Avoid nesting beyond 2 levels
- For computations on DataFrame values, prefer vectorized pandas operations; comprehensions work best on small collections such as column names or dtypes
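To see why nesting gets hard to read, here is a two-level comprehension that flattens a small list of lists (`matrix` is a made-up example). The `for` clauses are written in the same order as the equivalent nested loops, outer loop first; every extra level adds another clause to keep track of.

```python
matrix = [[1, 2, 3], [4, 5, 6]]

# Loop version: outer loop over rows, inner loop over values
flat_loop = []
for row in matrix:
    for value in row:
        flat_loop.append(value)

# Comprehension: the `for` clauses appear in the same order as the loops above
flat = [value for row in matrix for value in row]
print(flat)  # [1, 2, 3, 4, 5, 6]
```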

**Remember** that comprehensions are about readability and conciseness: use them when they make your code clearer, not just shorter! 🐍