## Introduction to Python Data Types and Lists


Understanding **data types** is foundational in Python programming, particularly in data science. Python’s data type system defines what kind of values can be stored, how they behave, and how operations are performed upon them. Mastering data types enables clearer reasoning, robust code design, and efficient data manipulation—essential skills for any data scientist.


### Container Sequences: Theory and Principles

**Container sequences** are data structures designed to store and manage collections of other objects (such as numbers, strings, or even other sequences). They provide powerful mechanisms for aggregation, organisation, sorting, and iteration.

### Key Properties of Container Sequences:
- **Mutable vs Immutable**:
  - *Mutable*: Contents can change after creation (e.g., lists; sets and dictionaries are also mutable containers, although they are not sequences).
  - *Immutable*: Contents cannot change after creation (e.g., tuples, strings).
- **Iterability**:
  - Support iteration (looping), enabling systematic data processing.
- **Ordering**:
  - Some container types (lists, tuples) maintain a defined order of elements.
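
The properties above can be seen in a minimal sketch that contrasts a list with a tuple. The variable names are illustrative only.

```python
# A list is mutable: it can be changed in place after creation.
numbers = [1, 2, 3]
numbers[0] = 10
numbers.append(4)
print(numbers)  # [10, 2, 3, 4]

# A tuple is immutable: item assignment raises TypeError.
point = (1, 2, 3)
try:
    point[0] = 10
except TypeError as exc:
    print(f"Cannot modify a tuple: {exc}")

# Both types are ordered and iterable, so a for loop visits items in order.
for value in point:
    print(value)
```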


### Lists: Fundamentals and Characteristics

Lists in Python are one of the most versatile and frequently used container types.

#### Essential Properties:
- **Mutable**: You can add, remove, and modify elements freely.
- **Ordered**: Lists preserve the order of insertion, allowing reliable indexing.
- **Dynamic**: Lists can grow or shrink as needed.

#### Syntax Example:
```python
list_example = [element1, element2, element3]
```

### Accessing List Elements: Indexing Basics

List indexing uses integer positions, starting at **0** (the first element):

```python
my_list = ["item1", "item2", "item3"]
print(my_list[1])  # Outputs: item2
```

#### Negative indexing:
```python
print(my_list[-1])  # Outputs: item3 (last item)
```


### Adding Elements: `.append()` Method

- Adds a single new element at the end of a list.

```python
my_list.append("new_item")
print(my_list)
# ["item1", "item2", "item3", "new_item"]
```

### Combining Lists: Concatenation and Extension

Lists can be combined through **concatenation** (`+`) or **extension** (`.extend()`):

#### Concatenation:
- Creates a new combined list.
```python
list1 = [1, 2]
list2 = [3, 4]
combined_list = list1 + list2
# combined_list: [1, 2, 3, 4]
```

#### Extension:
- Adds elements from one list directly into another existing list.
```python
list1.extend(list2)
# list1 is now: [1, 2, 3, 4]
```

### Locating Elements in Lists: `.index()` Method

Use `.index()` to find an element’s position:

```python
position = my_list.index("item2")
print(position)
# Outputs: 1
```

**Important**: `.index()` raises a `ValueError` if the item isn't present, so guard the call with an `in` check or handle the exception.
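
Two common ways to guard a lookup are sketched below. The helper `find_position` is a hypothetical name introduced for illustration, not a built-in.

```python
my_list = ["item1", "item2", "item3"]

# Option 1: check membership first, then look up the position.
if "item2" in my_list:
    print(my_list.index("item2"))  # 1

# Option 2: attempt the lookup and handle the ValueError.
def find_position(items, target):
    """Return the index of target in items, or None if it is absent."""
    try:
        return items.index(target)
    except ValueError:
        return None

print(find_position(my_list, "item3"))    # 2
print(find_position(my_list, "missing"))  # None
```

The `in` check scans the list once and `.index()` scans it again, so the `try`/`except` form does less work on long lists.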

### Removing Elements from Lists: `.pop()` Method

`.pop()` removes and returns an element at a specific index (default is last element):

```python
removed_item = my_list.pop(1)
print(removed_item)  # "item2"
print(my_list)       # ["item1", "item3"]
```

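- `.pop()` with no argument removes from the end in constant time, which makes a list a natural stack. `.pop(0)` must shift every remaining element, so for queues `collections.deque` is usually the better choice.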
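
As a rough sketch of the stack and queue patterns: a plain list works well as a stack, while the standard library's `collections.deque` is the usual choice for a queue. The variable names are illustrative.

```python
from collections import deque

# Stack (last in, first out): append and pop both act on the end of the list.
stack = ["a", "b", "c"]
stack.append("d")
top = stack.pop()
print(top, stack)  # d ['a', 'b', 'c']

# Queue (first in, first out): deque removes from the left efficiently.
queue = deque(["a", "b", "c"])
queue.append("d")
first = queue.popleft()
print(first, list(queue))  # a ['b', 'c', 'd']
```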


### Iterating Over Lists: List Comprehensions

List comprehensions provide concise, efficient iteration and transformation:

```python
new_list = [operation(element) for element in original_list]
```

Example of capitalising all elements:
```python
capitalised_list = [item.title() for item in ["apple", "banana"]]
# ["Apple", "Banana"]
```

- Concise syntax improves readability.
- Often runs somewhat faster than an equivalent `for` loop that builds the list with `.append()`.
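
To make the comparison concrete, here is a small sketch of the same transformation written as a traditional loop and as a comprehension with a filter condition. The data is made up for illustration.

```python
# Traditional loop: collect the squares of the even numbers below 10.
even_squares_loop = []
for n in range(10):
    if n % 2 == 0:
        even_squares_loop.append(n ** 2)

# Equivalent list comprehension with a filtering condition.
even_squares = [n ** 2 for n in range(10) if n % 2 == 0]

print(even_squares)                       # [0, 4, 16, 36, 64]
print(even_squares == even_squares_loop)  # True
```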


### Sorting Lists: `sorted()` vs `.sort()`

Python provides two ways to sort a list: the built-in `sorted()` function and the `.sort()` list method.

#### `sorted()` Function:
- Returns a new, sorted list without changing the original.
```python
original_list = ["b", "a", "c"]
sorted_list = sorted(original_list)
# sorted_list: ["a", "b", "c"]
# original_list remains unchanged
```

#### `.sort()` Method:
- Modifies the original list directly (in-place).
```python
original_list.sort()
# original_list: ["a", "b", "c"]
```

#### Sorting considerations:
- Both can sort numerical and textual data.
- `sorted()` is preferred when you need to retain the original list.
- `.sort()` avoids creating a second list, which helps with large datasets when memory is a consideration. It returns `None`, so do not assign its result.
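
The sketch below illustrates these considerations, along with the `key` and `reverse` arguments that both `sorted()` and `.sort()` accept. The sample data is illustrative.

```python
scores = [42, 7, 19]
words = ["banana", "Cherry", "apple"]

# sorted() returns a new list; the original is untouched.
descending = sorted(scores, reverse=True)
print(descending)  # [42, 19, 7]
print(scores)      # [42, 7, 19]

# Default string sorting is case-sensitive: uppercase letters sort first.
print(sorted(words))  # ['Cherry', 'apple', 'banana']

# key= sorts by a derived value, here the lowercased word.
print(sorted(words, key=str.lower))  # ['apple', 'banana', 'Cherry']

# .sort() works in place and returns None.
result = scores.sort()
print(result)  # None
print(scores)  # [7, 19, 42]
```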


### Rationale of Lists

- **Dynamic aggregation**: Lists effectively store and manipulate collections of data, fundamental in exploratory data analysis (EDA), simulation, and statistical computations.
- **Sorting and ordering**: Essential for tasks like finding medians, percentiles, and performing group operations.
- **Iteration efficiency**: List comprehensions facilitate powerful data transformations, optimising code readability and computational speed.


Python lists are indispensable in data science workflows, underpinning tasks from data cleaning and transformation to algorithmic data processing. Mastering list operations enhances analytical rigour, computational efficiency, and code readability. As foundational building blocks, lists empower data scientists to systematically manage, manipulate, and derive insights from structured collections of data across any domain or context.


In [1]:
import csv
import os
from pathlib import Path

In [2]:
# Accessing single items in a list
cookies = ["chocolate chip", "peanut butter", "sugar"]

In [3]:
cookies.append("tirggel")

In [4]:
print(cookies)

['chocolate chip', 'peanut butter', 'sugar', 'tirggel']


In [5]:
print(cookies[2])

sugar


In [6]:
# Combining lists using operators
cakes = ["strawberry", "vanilla"]

desserts = cookies + cakes

print(desserts)

['chocolate chip', 'peanut butter', 'sugar', 'tirggel', 'strawberry', 'vanilla']


In [7]:
# .extend() method merges a list into another list at the end
cookies.extend(cakes)

In [8]:
print(cookies)

['chocolate chip', 'peanut butter', 'sugar', 'tirggel', 'strawberry', 'vanilla']


In [9]:
# .index() method locates the position of a data element in a list
position = cookies.index("sugar")

print(position)

2


In [10]:
# .pop() method removes an item from a list and allows you to save it
name = cookies.pop(position)

print(name)

sugar


In [11]:
# List comprehensions are a common way of iterating over a list to perform some action on them
title_case_cookies = [cookie.title() for cookie in cookies]

print(title_case_cookies)

['Chocolate Chip', 'Peanut Butter', 'Tirggel', 'Strawberry', 'Vanilla']


In [12]:
# The sorted() function sorts data in numerical or alphabetical order and returns a new list
sorted_cookies = sorted(cookies)

print(sorted_cookies)

['chocolate chip', 'peanut butter', 'strawberry', 'tirggel', 'vanilla']


### Manipulating lists for fun and profit
You may be familiar with adding individual data elements to a list by using the `.append()` method. However, if you want to combine a list with another iterable (list, set, tuple), you can use the `.extend()` method on the list.

You can also use the `.index()` method to find the position of an item in a list. You can then use that position to remove the item with the `.pop()` method.

In [13]:
# Create a list called baby_names with the names 'Ximena', 'Aliza', 'Ayden', and 'Calvin'.
baby_names = ["Ximena", "Aliza", "Ayden", "Calvin"]


# Use the .extend() method on baby_names to add 'Rowen' and 'Sandeep' and print the list.
baby_names.extend(["Rowen", "Sandeep"])

# Use the .index() method to find the position of 'Rowen' in the list. Save the result as position.
position = baby_names.index("Rowen")

# Use the .pop() method with position to remove 'Rowen' from the list.
baby_names.pop(position)

'Rowen'

### Looping over lists
Previously, you've used a `for` loop to iterate over a list, but you can also use a list comprehension. List comprehensions take the form of `[action for item in list]` and return a new list.

We can use the `sorted()` function to sort the data in a list from lowest to highest in the case of numbers and alphabetical order if the list contains strings. The `sorted()` function returns a new list and does not affect the list you passed into the function. You can learn more about `sorted()` in the Python documentation.

In [14]:
# 1. Verify where Jupyter is running
print("Current working directory:", os.getcwd())

# 2. File path
csv_path = Path("data_types") / "data" / "baby_names.csv"
print("CSV exists at:", csv_path.exists(), "→", csv_path)

# 3. Read and store all rows before the file closes
with open(csv_path, mode="r", newline="", encoding="utf-8") as file:
    records = list(csv.reader(file))

# 4. Inspect the result
print(f"Loaded {len(records)} total rows.")


Current working directory: c:\Users\jhonm\Downloads\Code\Python\Courses\DataCamp\Python\python_fundamentals
CSV exists at: True → data_types\data\baby_names.csv
Loaded 13963 total rows.


In [15]:
# Use a list comprehension on records to create a list called baby_names that contains the name, found in the fourth element of row.
baby_names = [record[3] for record in records]

# Print baby_names in alphabetical order using the sorted() function.
print(sorted(baby_names)[:10])

['AALIYAH', 'AALIYAH', 'AALIYAH', 'AALIYAH', 'AALIYAH', 'AALIYAH', 'AALIYAH', 'AALIYAH', 'AALIYAH', 'AALIYAH']


## Meet the Tuples


**Tuples** are one of Python’s core sequence data types. They provide an ordered, immutable structure for grouping data elements, enabling efficient, predictable, and safe handling of related items. Understanding tuples is essential for effective Python programming, algorithm design, and data manipulation.


### Key Characteristics of Tuples

- **Ordered**: Tuples retain the order in which their elements were added.
- **Immutable**: Once created, tuples cannot be altered—no adding, removing, or changing elements.
- **Indexable**: Elements can be accessed using integer indices, similar to lists.
- **Pairing and Grouping**: Tuples are ideal for storing paired or grouped values.
- **Unpackable**: Python supports tuple unpacking, enabling expressive variable assignments and control over complex data structures.


### Creating Tuples: Syntax and Subtleties

#### Parentheses and Commas

- Standard tuple creation:
    ```python
    my_tuple = (element1, element2)
    ```
- Parentheses are **not** strictly required if commas are present:
    ```python
    my_tuple = element1, element2
    ```
- For a **single-element tuple**, a trailing comma is required:
    ```python
    single_tuple = ("only_element",)
    ```


### Zipping: Creating Tuples by Pairing Sequences

**Zipping** combines multiple iterables (such as lists or ranges) into an iterator of tuples:

```python
a = [1, 2, 3]
b = ['x', 'y', 'z']
paired = list(zip(a, b))
# paired: [(1, 'x'), (2, 'y'), (3, 'z')]
```

- Useful for combining related data, parallel iteration, or data alignment.
- Can zip any number of iterables; iteration stops at the shortest sequence.
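
A short sketch of the "stops at the shortest" behaviour, together with `itertools.zip_longest` from the standard library for cases where the leftover items should be kept. The sample names and scores are made up.

```python
from itertools import zip_longest

names = ["Ada", "Grace", "Linus"]
scores = [95, 88]  # deliberately one item shorter

# zip() stops as soon as the shortest iterable is exhausted.
truncated = list(zip(names, scores))
print(truncated)  # [('Ada', 95), ('Grace', 88)]

# zip_longest() pads the shorter iterable with fillvalue instead.
padded = list(zip_longest(names, scores, fillvalue=None))
print(padded)  # [('Ada', 95), ('Grace', 88), ('Linus', None)]
```

Silent truncation can hide misaligned data, so comparing lengths before zipping is a reasonable safeguard.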


### Tuple Unpacking: Assigning Elements to Variables

Tuple unpacking is the process of assigning individual elements of a tuple to distinct variables:

```python
pair = (42, "answer")
number, label = pair
# number: 42, label: "answer"
```

- Increases code readability and clarity, especially when working with grouped data.


### Unpacking Tuples in Loops

Unpacking is especially powerful and concise in loops, enabling elegant iteration over sequences of tuples:

```python
pairs = [(1, 'a'), (2, 'b'), (3, 'c')]
for number, letter in pairs:
    print(number, letter)
# Outputs:
# 1 a
# 2 b
# 3 c
```

### Enumerating Positions: Using `enumerate()` for Indexed Tuples

The `enumerate()` function returns pairs of index and value when looping through a sequence:

```python
for idx, item in enumerate(['alpha', 'beta', 'gamma']):
    print(idx, item)
# 0 alpha
# 1 beta
# 2 gamma
```

- Combines tuple unpacking with position tracking, crucial for algorithms requiring both element and index.


### Pitfalls and Best Practices in Tuple Construction

- **Beware of single-element tuples:** Always include a trailing comma, e.g., `("element",)`; omitting it produces a string or other data type.
- **Use `zip()` and `enumerate()`** for robust tuple generation in batch data operations, especially in data science pipelines and iterative algorithms.
- **Immutability ensures safety:** Tuples are often used as dictionary keys or set elements because a tuple whose elements are all hashable is itself hashable.


### Theoretical Rationale and Applications

- **Immutability:** Tuples are ideal for representing fixed collections of heterogeneous data (such as records, points in space, database rows), guaranteeing the integrity of groupings throughout program execution.
- **Functional programming:** Their immutability aligns with functional paradigms, allowing for safe and reproducible operations.
- **Efficient memory usage:** Tuples consume less memory than lists of the same length due to their fixed structure, which is advantageous in large-scale or performance-critical applications.
- **Hashability:** Dictionary keys and set elements must be hashable. Tuples of hashable elements qualify, whereas lists never do, which makes tuples valuable for compound keys and similar structures.
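
The hashability and memory points can be checked with a small sketch. The `grid` dictionary is illustrative, and the size comparison reflects CPython; exact byte counts vary by version and platform.

```python
import sys

# Tuples of hashable elements can serve as dictionary keys.
grid = {(0, 0): "origin", (1, 2): "treasure"}
print(grid[(1, 2)])  # treasure

# Lists are not hashable, so using one as a key raises TypeError.
try:
    grid[[3, 4]] = "invalid"
except TypeError as exc:
    print(f"List as key failed: {exc}")

# A tuple that contains a list is not hashable either.
try:
    hash((1, [2, 3]))
    tuple_with_list_hashable = True
except TypeError:
    tuple_with_list_hashable = False
print(tuple_with_list_hashable)  # False

# In CPython, a tuple is smaller than a list holding the same items.
tuple_size = sys.getsizeof((1, 2, 3))
list_size = sys.getsizeof([1, 2, 3])
print(tuple_size < list_size)  # True
```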


### Typical Applications 

- **Return multiple values from functions**: Tuples provide a concise way to return several items at once.
- **Paired data iteration**: Aligning features and labels in machine learning, or row-wise operations in data analysis.
- **Database-style records**: Storing small, fixed groups of heterogeneous attributes.
- **Efficient lookup tables**: Using tuples as compound keys in dictionaries.
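
Two of these applications are sketched below: a function that returns several values as a tuple, and a dictionary keyed by compound tuples. The function `min_max` and the `totals` data are hypothetical examples.

```python
def min_max(values):
    """Return the smallest and largest value as a (minimum, maximum) tuple."""
    return min(values), max(values)

# The returned tuple unpacks directly into two variables.
low, high = min_max([4, 9, 1, 7])
print(low, high)  # 1 9

# Compound keys: each (year, quarter) tuple identifies one entry.
totals = {(2024, "Q1"): 120, (2024, "Q2"): 135}
totals[(2025, "Q1")] = 150
print(totals[(2024, "Q2")])  # 135

# Keys unpack in a loop just like any other tuple.
for (year, quarter), total in totals.items():
    print(year, quarter, total)
```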


Tuples are a foundational Python construct—ordered, immutable, and elegantly suited for grouping, pairing, and transporting related data. Their syntactic simplicity, memory efficiency, and functional safety make them indispensable for both general-purpose programming and sophisticated data science tasks. Mastery of tuple creation, unpacking, and manipulation empowers robust, readable, and error-resistant Python code.


In [16]:
us_cookies = [
    "Chocolate Chip",
    "Brownies",
    "Peanut Butter",
    "Oreos",
    "Oatmeal Raisin"]

in_cookies = ["Punjabi", 
              "Fruit Cake Rust", 
              "Marble Cookies", 
              "Kaju Pista Cookies", 
              "Almond Cookies"]

In [17]:
top_pairs = list(zip(us_cookies, in_cookies))
print(top_pairs)

[('Chocolate Chip', 'Punjabi'), ('Brownies', 'Fruit Cake Rust'), ('Peanut Butter', 'Marble Cookies'), ('Oreos', 'Kaju Pista Cookies'), ('Oatmeal Raisin', 'Almond Cookies')]


In [18]:
# Unpacking tuples
us_num_1, in_num_1 = top_pairs[0]

print(us_num_1)
print(in_num_1)

Chocolate Chip
Punjabi


In [19]:
for us_cookie, in_cookie in top_pairs:
    print(in_cookie)
    print(us_cookie)

Punjabi
Chocolate Chip
Fruit Cake Rust
Brownies
Marble Cookies
Peanut Butter
Kaju Pista Cookies
Oreos
Almond Cookies
Oatmeal Raisin


In [20]:
for idx, item in enumerate(top_pairs):
    us_cookie, in_cookie = item
    print(idx, us_cookie, in_cookie)

0 Chocolate Chip Punjabi
1 Brownies Fruit Cake Rust
2 Peanut Butter Marble Cookies
3 Oreos Kaju Pista Cookies
4 Oatmeal Raisin Almond Cookies


### Using and unpacking tuples
If you have a tuple like `('chocolate chip cookies', 15)` and you want to access each part of the data, you can use an index just like a list. However, you can also "unpack" the tuple into multiple variables, such as `type, count = ('chocolate chip cookies', 15)`, which sets `type` to `'chocolate chip cookies'` and `count` to `15`. (In real code, pick a name other than `type`, because it shadows the built-in `type()` function.)

Often you'll want to pair up multiple sequences. The `zip()` function does just that. It returns an iterator of tuples, each containing one element from every iterable passed into `zip()`; wrap it in `list()` to get a list of tuples.

When looping over a list, you can also track your position in the list by using the `enumerate()` function. The function returns the index of the list item you are currently on in the list and the list item itself. 

In [21]:
# Create a list of unique boy names from the records, filtering for MALE gender and converting names to title case for consistency
boy_unique = list(set([name[3].title() for name in records if name[1] == "MALE"]))
print(sorted(boy_unique)[:10])
print(len(boy_unique))

['Aahil', 'Aarav', 'Aaron', 'Aayan', 'Abdiel', 'Abdoul', 'Abdoulaye', 'Abdul', 'Abdullah', 'Abel']
766


In [22]:
# Create a list of unique girl names from the records, filtering for FEMALE gender and converting names to title case for consistency
girl_unique = list(set([name[3].title() for name in records if name[1] == "FEMALE"]))
print(sorted(girl_unique)[:10])
print(len(girl_unique))

['Aaliyah', 'Aarya', 'Abby', 'Abigail', 'Abrielle', 'Abril', 'Ada', 'Addison', 'Adelaide', 'Adele']
889


In [23]:
# Use the zip() function to pair up girl_unique and boy_unique into a variable called pairs.
# Both lists were built from sets, so the pairing order is arbitrary rather than a true ranking.
pairs = list(zip(girl_unique, boy_unique))

# Use a for loop to loop through the first 10 pairs, using enumerate() to keep track of your position. Unpack pairs into the variables rank and pair.
for rank, pair in enumerate(pairs[:10]):
    
    # Unpack pair: girl_name, boy_name
    girl_name, boy_name = pair
    
    # Print the rank, girl name, and boy name, in that order. The rank is contained in rank.
    print(f'Rank {rank+1}: {girl_name} and {boy_name}')

Rank 1: Fernanda and Eliyahu
Rank 2: Zelda and Shloime
Rank 3: Jayda and Kenneth
Rank 4: Kimora and Carmine
Rank 5: Kailyn and Yehoshua
Rank 6: Michaela and Nathanael
Rank 7: Lea and Arian
Rank 8: Amelia and Yitzchok
Rank 9: Sandra and Alter
Rank 10: Aubree and Menashe


### Making tuples by accident
Tuples are very powerful and useful, and it's super easy to make one by accident. All you have to do is create a variable and follow the assignment with a comma. This becomes an error when you try to use the variable later expecting it to be a string or a number.

You can verify the data type of a variable with the `type()` function. In this exercise, you'll see for yourself how easy it is to make a tuple by accident.

In [24]:
# Create a variable named normal and set it equal to 'simple'.
normal = "simple"

# Create a variable named error and set it equal to 'trailing comma',.
error = "trailing comma",

# Print the type of the normal and error variables.
print(type(normal))
print(type(error))

<class 'str'>
<class 'tuple'>


## Strings in Python: Construction, Manipulation, and Best Practices

**Strings** are one of the most fundamental and versatile data types in Python. They represent sequences of Unicode characters, serving as the backbone for textual data storage, manipulation, and analysis in nearly every Python application. From data cleaning and natural language processing to configuration and output formatting, mastery of string operations is indispensable for robust and expressive Python programming.

### Creating Formatted Strings: f-Strings and Beyond

#### f-Strings (Formatted String Literals)

Introduced in Python 3.6, **f-strings** provide a concise and readable way to embed expressions inside string literals, prefixed with `f`:

```python
name = "Alex"
age = 30
formatted = f"{name} is {age} years old."
# "Alex is 30 years old."
```

**Key features:**
- Embed variables and expressions directly in curly braces.
- Support arbitrary Python expressions (e.g., `{value * 100:.2f}` for formatting).
- Generally faster and more readable than alternatives like `.format()` or `%` formatting.
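
A few format specifiers in action, as a brief sketch; the variable names and values are illustrative.

```python
ratio = 0.4567
count = 1234567
name = "Alex"

# Expression plus a fixed two-decimal format.
percent = f"{ratio * 100:.2f}%"
print(percent)  # 45.67%

# Thousands separator.
grouped = f"{count:,}"
print(grouped)  # 1,234,567

# Right-align within a field ten characters wide.
aligned = f"{name:>10}|"
print(aligned)  #       Alex|

# !r inserts the repr() of the value, which is handy when debugging.
debug = f"{name!r}"
print(debug)  # 'Alex'
```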


### Joining Strings: Concatenation and the `join()` Method

#### Efficiently Joining Lists of Strings

The `join()` method is the preferred way to concatenate an iterable of strings with a chosen separator:

```python
words = ["data", "science", "is", "fun"]
sentence = " ".join(words)
# "data science is fun"
```

**Why use `join()`?**
- **Efficiency:** Faster than repeated string addition in a loop.
- **Flexibility:** Can join any iterable of strings (lists, tuples, generators).
- **Custom separators:** Use commas, spaces, or any string.

#### Example: Dynamic List Joining

```python
items = ["apple", "banana", "cherry"]
print(f"Available fruits: {', '.join(items[:-1])}, and {items[-1]}.")
# "Available fruits: apple, banana, and cherry."
```


### Matching Parts of a String: Prefixes, Suffixes, and Patterns

#### Using `.startswith()` and `.endswith()`

These methods return `True` if the string starts or ends with the specified substring, respectively:

```python
filename = "data_report.csv"
filename.startswith("data")    # True
filename.endswith(".csv")      # True
```

- **Case-sensitive**: Always consider letter case; `"Data"` and `"data"` are not the same.
- **Multiple candidates**: You can pass a tuple (not a list) of prefixes or suffixes to check against several possibilities at once.
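
Both points are illustrated in the sketch below, using made-up filenames.

```python
filenames = ["data_report.csv", "notes.txt", "Data_summary.tsv", "image.png"]

# A tuple of suffixes matches any of the listed endings.
tabular = [name for name in filenames if name.endswith((".csv", ".tsv"))]
print(tabular)  # ['data_report.csv', 'Data_summary.tsv']

# startswith() is case-sensitive, so "Data_summary.tsv" is missed here.
strict = [name for name in filenames if name.startswith("data")]
print(strict)  # ['data_report.csv']

# Lowercasing first makes the prefix check case-insensitive.
relaxed = [name for name in filenames if name.lower().startswith("data")]
print(relaxed)  # ['data_report.csv', 'Data_summary.tsv']
```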


### Searching Within Strings: The `in` Operator

The `in` operator checks for substring presence:

```python
sentence = "Python is powerful."
"Python" in sentence       # True
"power" in sentence        # True
"java" in sentence         # False
```

**Why is this powerful?**
- Highly readable.
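- Also works as a membership test on other containers such as lists, tuples, sets, and dictionary keys (for strings it matches substrings; for those containers it matches whole elements).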


### Case Sensitivity and Case Normalisation

String operations in Python are **case-sensitive** by default:

```python
"data" in "Data Science"    # False
```

#### Case Insensitive Searching

A common strategy is to convert both strings to the same case (usually lower) before searching or comparison:

```python
target = "data"
sentence = "Data Science is fun."
target in sentence.lower()   # True
```

**Best practice:** Use `.lower()` or `.upper()` for robust, case-insensitive comparisons.
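
For text that may contain non-English characters, `str.casefold()` is a more aggressive alternative to `.lower()`. The sketch below wraps it in a hypothetical helper, `contains_ignore_case`, introduced only for illustration.

```python
def contains_ignore_case(text, target):
    """Return True if target occurs in text, ignoring letter case."""
    return target.casefold() in text.casefold()

print(contains_ignore_case("Data Science is fun.", "data"))  # True
print(contains_ignore_case("Data Science is fun.", "java"))  # False

# .lower() leaves the German sharp s unchanged; .casefold() maps it to "ss".
print("straße".lower() == "STRASSE".lower())        # False
print("straße".casefold() == "STRASSE".casefold())  # True
```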


### Rationale and Applications

- **Text processing**: Strings form the basis for reading, cleaning, and analysing textual data.
- **User interaction and reporting**: String formatting creates clear, professional output and error messages.
- **Search and pattern matching**: Enables filtering, classification, and information retrieval.
- **Configuration and templating**: Dynamic string construction is foundational for scripting, templating engines, and web development.

### Advanced String Manipulation Strategies

- **Slicing**: Extract substrings via indexing (e.g., `s[1:4]`).
- **Regular expressions**: Use the `re` module for complex pattern matching and manipulation.
- **Escape characters**: Manage special characters (e.g., newline `\n`, tab `\t`).
- **Immutability**: Strings are immutable; all string operations return new strings.
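
Each of the four strategies above is shown briefly in this sketch; the filename is an illustrative value.

```python
import re

filename = "data_report_2024.csv"

# Slicing: extract substrings by position.
prefix = filename[0:4]
extension = filename[-4:]
print(prefix, extension)  # data .csv

# Regular expressions: find the first run of four digits.
match = re.search(r"\d{4}", filename)
year = match.group() if match else None
print(year)  # 2024

# Escape characters: \t inserts a tab and \n starts a new line.
print("name\tvalue\nalpha\t1")

# Immutability: methods return new strings and leave the original alone.
shouted = filename.upper()
print(shouted)   # DATA_REPORT_2024.CSV
print(filename)  # data_report_2024.csv
```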

Strings are an essential and richly featured data type in Python. Mastery of their creation, manipulation, and analysis underpins effective programming in data science, automation, user interaction, and beyond. By understanding the principles, syntax, and best practices covered here, you’ll wield string operations with confidence and precision, powering robust, expressive, and high-performance Python code.


In [25]:
cookie_name = "Anzac"
cookie_price = "$1.99"

print(f"Each {cookie_name} cookie costs {cookie_price}.")

Each Anzac cookie costs $1.99.


In [26]:
child_ages = ["3", "4", "7", "8"]
print(", ".join(child_ages))

3, 4, 7, 8


In [27]:
print(f"The children are ages {', '.join(child_ages[0:3])}, and {child_ages[-1]}.")

The children are ages 3, 4, 7, and 8.


In [28]:
list_boys = ["Mohamed", "Youssef", "Ahmed"]
print([name for name in list_boys if name.startswith("A")])

['Ahmed']


In [29]:
"long" in "Life is a long lesson in humility."

True

In [30]:
"life" in "Life is a long lesson in humility".lower()

True

### Formatted String Literals ("f" strings)
We've been using plain strings with `""` or `''` in this class so far, but Python offers several kinds of strings and several ways to blend variables into them. The most recent addition is the f-string, short for formatted string literal. F-strings make it easy to mix strings with variables and formatting to get exactly the output you want; you create one by prefixing the quotes with the letter f, as in `f""`. To include a variable in an f-string, wrap it in `{}` and its value is inserted into the string. For example, if a variable `count` held the number 12, `f"{count} cookies"` would produce the string `"12 cookies"`.

In the exercise below, `top_ten_girl_names` contains tuples that hold the `top_ten_rank` and name for each position.

In [31]:
# 1. Verify where Jupyter is running
print("Current working directory:", os.getcwd())

# 2. File path
csv_path = Path("data_types") / "data" / "baby_names.csv"
print("CSV exists at:", csv_path.exists(), "→", csv_path)

# 3. Read and store all rows before the file closes
with open(csv_path, mode="r", newline="", encoding="utf-8") as file:
    records = list(csv.reader(file))

# 4. Inspect the result
print(f"Loaded {len(records)} total rows.")

Current working directory: c:\Users\jhonm\Downloads\Code\Python\Courses\DataCamp\Python\python_fundamentals
CSV exists at: True → data_types\data\baby_names.csv
Loaded 13963 total rows.


In [32]:
# Create lists of boy and girl names from the records, filtering by gender and converting to title case
boy_names = [name[3].title() for name in records if name[1] == "MALE"]
girl_names = [name[3].title() for name in records if name[1] == "FEMALE"]

In [33]:
from collections import Counter

# Count occurrences of each boy name in the dataset
top_boys_names = Counter(boy_names)

# Count occurrences of each girl name in the dataset
top_girls_names = Counter(girl_names)

In [34]:
# Create a tuple of the top 10 most popular girl names with proper ranking

# Step 1: Get the top 10 most common names (returns list of (name, count) tuples)
top_10_most_common = top_girls_names.most_common(10)

# Step 2: Convert to (rank, name) tuples where rank starts at 1
top_ten_girl_names = tuple(
    (rank, name) for rank, (name, count) in enumerate(top_10_most_common, start=1)
)

In [35]:
# Loop over the top_ten_girl_names tuple and use tuple unpacking to get the top_ten_rank and name.
for top_ten_rank, name in top_ten_girl_names:
    
    # Print each rank and name in the form "Rank #1: Jada", where 1 is the rank and Jada is the name.
    print(f"Rank #{top_ten_rank}: {name}")

Rank #1: Grace
Rank #2: Hailey
Rank #3: Hannah
Rank #4: Isabella
Rank #5: Jasmine
Rank #6: Kaitlyn
Rank #7: Kayla
Rank #8: Kaylee
Rank #9: Leah
Rank #10: Madison


### Combining multiple strings
F-strings work great for a few variables, but what if you want to combine a whole list of values into a string? You can use the `.join()` method for just that. Call it on the separator string you want between the items and pass the list as the argument. For example, to join all the items in a list named `cookies` with a comma and a space, write `", ".join(cookies)`.

In [36]:
# Make a string that contains: The top ten boy names are: and store it as preamble.
preamble = "The top ten boy names are: "

# , and as conjunction
conjunction = ", and "

# Make a string that combines the first 9 names in boy_names list with a comma and store it as first_nine_names.
first_nine_names = ", ".join(boy_names[:9])

# Print f-string preamble, first_nine_names, conjunction, the tenth item in boy_names and a period
print(f"{preamble}{first_nine_names}{conjunction}{boy_names[9]}.")

The top ten boy names are: Aarav, Aaron, Abdul, Abdullah, Adam, Aditya, Adrian, Ahmed, Aidan, and Aiden.


### Finding strings in other strings
Many times when we are working with strings, we care about which characters are in the string. For example, we may want to know how many cookies in a list of cookies have the word `Chocolate` in them, or how many start with the letter `C`. We can perform these checks by using the `in` keyword and the `.startswith()` method on a string. We can also use conditionals on a list comprehension in the form of `[action for item in list if something is true]`. Using our cookies examples, it would be something like `[cookie_name for cookie_name in cookies if 'chocolate' in cookie_name.lower()]`. Note these checks are case sensitive so we're using the `.lower()` method on the string. We can also "chain" methods together by calling them one after the other.

In [37]:
# Store and print a list of girl_unique that start with "Ste".
girl_ste = [girl for girl in girl_unique if girl.startswith("Ste")]
print(girl_ste)

['Stephanie', 'Stella', 'Stephany']


In [38]:
# Store and print a list of the names in girl_unique that contain "angel" (case-insensitive).
girl_angel = [girl for girl in girl_unique if "angel" in girl.lower()]
print(girl_angel)

['Angel', 'Angelique', 'Angely', 'Angelina', 'Angeline', 'Angelica', 'Evangeline', 'Angela']


## Using Dictionaries: Theory, Syntax, and Techniques in Python


**Dictionaries** are powerful, flexible, and fundamental data structures in Python. Often known as associative arrays, hash tables, or hash maps in other programming languages, dictionaries store and retrieve data through **key-value pairs**, enabling rapid, intuitive, and efficient data access. Mastery of dictionaries underpins effective programming, algorithmic reasoning, and complex data management in data science and beyond.



### Key Characteristics of Dictionaries

- **Key-Value Structure**: Store and retrieve values based on unique keys.
- **Mutable**: Dictionaries can dynamically grow or shrink; elements can be added, modified, or removed.
- **Unordered (Historically)**: Before Python 3.7, the language did not guarantee any ordering; from Python 3.7 onward, dictionaries preserve insertion order.
- **Iterable**: Iteration over keys, values, or key-value pairs is straightforward and efficient.
- **Nestable**: Values within dictionaries can themselves be dictionaries or other complex data types, enabling hierarchical data storage.

### Creating Dictionaries: Syntax and Approaches

#### Using `{}` literal syntax:
```python
dictionary_example = {"key1": "value1", "key2": "value2"}
```

#### Using the `dict()` constructor:
```python
dictionary_example = dict(key1="value1", key2="value2")
```

#### From an iterable of pairs:
```python
pairs = [("key1", "value1"), ("key2", "value2")]
dictionary_example = dict(pairs)
```


### Looping Through Dictionaries: Key Principles

Looping through dictionaries can be done over keys, values, or items (key-value pairs):

#### Iterating over keys:
```python
for key in dictionary:
    print(key)
```

#### Iterating over values:
```python
for value in dictionary.values():
    print(value)
```

#### Iterating over key-value pairs:
```python
for key, value in dictionary.items():
    print(key, value)
```

### Printing Dictionaries with Ordered Output

Dictionaries preserve insertion order (Python 3.7+), which is not necessarily alphabetical or numeric order. When you need the latter, sort the keys before iterating:

```python
for key in sorted(dictionary):
    print(key, dictionary[key])
```

This ensures a consistent, human-readable ordering.
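
Sorting by value instead of by key is a common variation. The sketch below shows both, using an illustrative `stock` dictionary.

```python
stock = {"pear": 3, "apple": 10, "fig": 7}

# sorted() on a dictionary returns a list of its keys in sorted order.
keys_in_order = sorted(stock)
print(keys_in_order)  # ['apple', 'fig', 'pear']

# Sort the (key, value) pairs by value, largest first.
by_count = sorted(stock.items(), key=lambda item: item[1], reverse=True)
print(by_count)  # [('apple', 10), ('fig', 7), ('pear', 3)]

for fruit, count in by_count:
    print(f"{fruit}: {count}")
```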

### Accessing Dictionary Values Safely: Rationale and Techniques

Dictionaries access values using keys. However, directly accessing a key that does not exist raises a `KeyError`, which interrupts program execution.

#### Direct key access (potentially unsafe):
```python
value = dictionary["nonexistent_key"]  # Raises KeyError
```

**Safest practice** is using `.get()`:

#### `.get()` Method:
Returns a default value if the key is not present, avoiding exceptions:

```python
value = dictionary.get("nonexistent_key", "Default Value")
# Returns "Default Value" if key does not exist
```

- If no default is provided, `.get()` returns `None` when key is absent.

#### Example of Robust Usage:
```python
key = "example_key"
value = dictionary.get(key, f"{key} not found")
print(value)
```
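
One frequent use of `.get()` with a default is tallying items without first checking whether the key exists. A minimal sketch with made-up data:

```python
colours = ["Gray", "Black", "Gray", "Cinnamon", "Gray"]

counts = {}
for colour in colours:
    # Start from 0 the first time a colour is seen, then add one.
    counts[colour] = counts.get(colour, 0) + 1

print(counts)  # {'Gray': 3, 'Black': 1, 'Cinnamon': 1}

# A missing key returns the default instead of raising KeyError.
print(counts.get("White", 0))  # 0
```

The standard library's `collections.Counter` does this tallying for you; the manual version is shown here only to illustrate `.get()`.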

### Nesting Dictionaries: Handling Complex Data Structures

Dictionaries are ideal for hierarchical data storage:

```python
nested_dict = {
    "outer_key1": {"inner_key1": "inner_value1"},
    "outer_key2": {"inner_key2": "inner_value2"}
}
```

**Accessing Nested Data:**
```python
nested_value = nested_dict["outer_key1"]["inner_key1"]
```

- Nesting dictionaries provides intuitive management of complex datasets (JSON, API responses, configuration files).
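
Direct nested indexing raises `KeyError` if any level is missing. One hedged approach is to chain `.get()` calls with an empty dictionary as the intermediate default, as sketched below with a hypothetical `config` structure.

```python
config = {
    "database": {"host": "localhost", "port": 5432},
    "cache": {"host": "localhost"},
}

# Direct access works when every key is known to exist.
db_port = config["database"]["port"]
print(db_port)  # 5432

# Chained .get() calls supply defaults at each level.
cache_port = config.get("cache", {}).get("port", 6379)
queue_host = config.get("queue", {}).get("host")
print(cache_port)  # 6379
print(queue_host)  # None

# Nested loops walk every section and setting.
for section, settings in config.items():
    for key, value in settings.items():
        print(f"{section}.{key} = {value}")
```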

### Theoretical and Computational Rationale

- **Fast access**: Dictionary lookups use hash tables, providing average-case O(1) complexity, ideal for large-scale data.
- **Data relationships**: Dictionaries naturally represent relationships between keys and associated values, critical in databases, caches, and in-memory indexing.
- **Dynamic storage**: They allow dynamic addition, deletion, and updates of key-value pairs, suited to streaming data, user input, or real-time modifications.
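
One practical consequence of hash-based lookup is that keys must be hashable. The following sketch, with made-up sighting data, shows a tuple working as a key while a list is rejected:

```python
# Tuples of immutable values are hashable, so they can serve as keys.
sightings = {("Seward Park", "Gray"): 3}
sightings[("Seward Park", "Black")] = 1

# Lists are mutable and therefore unhashable; using one as a key fails.
try:
    sightings[["Seward Park", "White"]] = 2
    list_key_error = None
except TypeError as exc:
    list_key_error = str(exc)

print(list_key_error)
```

The exact wording of the error message varies between Python versions, but the exception type is `TypeError` in each case.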



### Creating and looping through dictionaries
You'll often need to loop over array-like data, as in Chapter 1, and give it some structure so you can quickly find the data you want.

Start by creating an empty dictionary, then assign part of each record as the key and the rest as the value.

Previously, you used `sorted()` to organize your data in a list. Dictionaries can also be sorted. By default, using `sorted()` on a dictionary will sort by the keys of the dictionary.

In [39]:
squirrels = [
    ("Marcus Garvey Park", ("Black", "Cinnamon", "Cleaning", None)),
    ("Highbridge Park", ("Gray", "Cinnamon", "Running, Eating", "Runs From, watches us in short tree")),
    ("Madison Square Park", ("Gray", None, "Foraging", "Indifferent")),
    ("City Hall Park", ("Gray", "Cinnamon", "Eating", "Approaches")),
    ("J. Hood Wright Park", ("Gray", "White", "Running", "Indifferent")),
    ("Seward Park", ("Gray", "Cinnamon", "Eating", "Indifferent")), 
    ("Union Square Park", ("Gray", "Black", "Climbing", None)),
    ("Tompkins Square Park", ("Gray", "Gray", "Lounging", "Approaches")),
]

In [40]:
# Create an empty dictionary called squirrels_by_park.
squirrels_by_park = {}

# Loop over squirrels, unpacking each entry into the variables park and squirrel_details.
for park, squirrel_details in squirrels:

    # Inside the loop, add each squirrel_details to the squirrels_by_park dictionary using the park as the key.
    squirrels_by_park[park] = squirrel_details

# Loop over the squirrels_by_park keys in ascending order and print each park and its value using an f-string.
for park in sorted(squirrels_by_park):
    
    # Print each park and its value in squirrels_by_park
    print(f"{park}: {squirrels_by_park[park]}")


City Hall Park: ('Gray', 'Cinnamon', 'Eating', 'Approaches')
Highbridge Park: ('Gray', 'Cinnamon', 'Running, Eating', 'Runs From, watches us in short tree')
J. Hood Wright Park: ('Gray', 'White', 'Running', 'Indifferent')
Madison Square Park: ('Gray', None, 'Foraging', 'Indifferent')
Marcus Garvey Park: ('Black', 'Cinnamon', 'Cleaning', None)
Seward Park: ('Gray', 'Cinnamon', 'Eating', 'Indifferent')
Tompkins Square Park: ('Gray', 'Gray', 'Lounging', 'Approaches')
Union Square Park: ('Gray', 'Black', 'Climbing', None)


### Safely finding by key
As demonstrated in the video, if you attempt to access a key that isn't present in a dictionary, you'll get a `KeyError`. One option to handle this type of error is to use a `try: except:` block. You can learn more about error handling in **Python Data Science Toolbox (Part 1)**.

Python provides a more concise and versatile tool for this problem in the form of the `.get()` method. The `.get()` method takes the key to look up and, optionally, the value to return if the key is not found.
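
The two options can be compared side by side. This is a small sketch using an assumed `parks` dictionary rather than the exercise data:

```python
# Hypothetical data, for illustration only.
parks = {"Union Square Park": 7}

# Option 1: try/except handles the missing key explicitly.
try:
    count = parks["Central Park"]
except KeyError:
    count = 0

# Option 2: .get() expresses the same fallback in a single call.
count_via_get = parks.get("Central Park", 0)

print(count, count_via_get)
```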

In [41]:
# Safely print 'Union Square Park' from the squirrels_by_park dictionary.
print(f"Union Square Park: {squirrels_by_park.get('Union Square Park')}")

# Safely print the type of 'Fort Tryon Park' from the squirrels_by_park dictionary.
print(f"Fort Tryon Park: {type(squirrels_by_park.get('Fort Tryon Park'))}")

# Safely print 'Central Park' from the squirrels_by_park dictionary or 'Not Found'.
print(f"Central Park: {squirrels_by_park.get('Central Park', 'Not Found')}")

Union Square Park: ('Gray', 'Black', 'Climbing', None)
Fort Tryon Park: <class 'NoneType'>
Central Park: Not Found


## Altering Dictionaries: Techniques, Theory, and Best Practices in Python


Dictionaries in Python are powerful data structures that allow flexible management of key-value pairs. Beyond simple creation and retrieval, Python dictionaries can be **dynamically altered**—keys and values added, updated, or removed—making them exceptionally useful in evolving data workflows, real-time applications, and robust programming solutions. This section rigorously explores the methods, theory, syntax, and best practices for safely and efficiently altering dictionaries.

### Adding and Extending Dictionaries

#### Adding Single Key-Value Pairs

The simplest way to add a new entry to an existing dictionary is through assignment:

```python
dictionary["new_key"] = "new_value"
```

- If `"new_key"` already exists, its value is updated.
- If not, a new key-value pair is added.

#### Extending with `.update()`

To merge dictionaries or add multiple key-value pairs simultaneously, use the `.update()` method:

```python
dict1 = {"a": 1, "b": 2}
dict2 = {"b": 3, "c": 4}

dict1.update(dict2)
# dict1: {"a": 1, "b": 3, "c": 4}
```

- **Dictionary merging:** Values from the second dictionary (`dict2`) overwrite those in the first (`dict1`) if keys overlap.
- Can also update using iterables of tuples or keyword arguments:
    ```python
    dict1.update([("d", 5), ("e", 6)])
    dict1.update(f=7, g=8)
    ```


### Updating Existing Dictionaries: Deep Dive

#### Updating with Tuples or Lists of Pairs

`.update()` conveniently supports various iterable types containing key-value pairs:

```python
updates = [("key1", "value1"), ("key2", "value2")]
dictionary.update(updates)
```

#### Updating Using Keyword Arguments

You can also pass key-value pairs directly as keyword arguments if keys are valid identifiers:

```python
dictionary.update(key1="value1", key2="value2")
```

#### Rationale and Theory of Updating:
- Provides flexible and efficient means to keep dictionaries up-to-date, especially beneficial when data is received incrementally or modified frequently.
- Promotes concise and readable code when performing bulk updates.
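
One detail worth seeing in action is that `.update()` changes the dictionary in place and returns `None`, so its result should not be assigned back. A minimal sketch with an invented `inventory` dictionary:

```python
# Hypothetical data, for illustration only.
inventory = {"apples": 3, "pears": 5}

# .update() mutates the dictionary in place and returns None.
result = inventory.update({"pears": 6, "plums": 2})

# Later batches can arrive as pairs or as keyword arguments.
inventory.update([("figs", 4)])
inventory.update(apples=10)

print(inventory)
```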


### Removing Elements from Dictionaries: Safe and Unsafe Methods

Dictionaries often require deletion of obsolete, redundant, or incorrect data entries. Python provides multiple robust methods for safely managing these scenarios.

#### Using `del` (Deletion)

The `del` statement removes a key-value pair directly, but raises a `KeyError` if the key is not found:

```python
del dictionary["key"]
```

- **Caution:** Always check for existence of the key or handle exceptions to avoid program interruption.

#### Using `.pop()` Method

The `.pop()` method removes a key and returns its associated value, optionally providing a default value if the key is absent:

```python
value_removed = dictionary.pop("key", "default_value")
```

- If the key is present, removes it and returns its value.
- If absent and no default provided, raises a `KeyError`.
- Recommended for robust, safe deletion with predictable outcomes.
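
The three cases listed above can be checked with a short sketch. The `settings` dictionary is an assumption made for illustration:

```python
# Hypothetical data, for illustration only.
settings = {"theme": "dark", "font_size": 12}

# Key present: the pair is removed and the value returned.
theme = settings.pop("theme")

# Key absent, default given: the default is returned and nothing is raised.
language = settings.pop("language", "en")

# Key absent, no default: KeyError.
try:
    settings.pop("language")
    raised = False
except KeyError:
    raised = True

print(theme, language, raised, settings)
```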

### Rationale, Theoretical Foundations, and Applications

#### Mutability and Dynamic Data Management

- Dictionaries' **mutability** permits dynamic, flexible modifications—ideal for real-time data updates, configuration adjustments, and interactive systems.
- Supports evolving data schemas without requiring restructuring or recreation of the entire data object.

#### Safe Handling and Robustness

- Utilizing safe methods (such as `.pop()` with a default value, and `.update()` for bulk changes) enhances program reliability, reducing runtime errors and enabling predictable behaviour.
- Enables smoother error handling and programmatic correction of data anomalies.

### Adding and extending dictionaries
If you have a dictionary and you want to add data to it, you can simply create a new key and assign the data you desire to it. It's important to remember that if it's a nested dictionary, then all the keys in the data path must exist, and each key in the path must be assigned individually.

You can also use the `.update()` method to add several entries at once, passing another dictionary, an iterable of key-value tuples, or keyword arguments.
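
The point about nested paths can be sketched as follows. The data is made up, and `.setdefault()` is shown as one possible way to create a missing level, not as the method the exercises require:

```python
parks = {}

# Assigning through a missing intermediate key fails with KeyError.
try:
    parks["Seward Park"]["gray"] = 3
    failed = False
except KeyError:
    failed = True

# Create each level first, then assign into it.
parks["Seward Park"] = {}
parks["Seward Park"]["gray"] = 3

# .setdefault() returns the existing inner dictionary, or inserts the
# given default first if the key is missing.
parks.setdefault("Union Square Park", {})["gray"] = 5
parks.setdefault("Seward Park", {})["black"] = 1

print(parks)
```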

In [42]:
squirrels_madison = [
    {
        "primary_fur_color": "Gray",
        "highlights_in_fur_color": None,
        "activities": "Foraging",
        "interactions_with_humans": "Indifferent",
    },
    {
        "primary_fur_color": "Gray",
        "highlights_in_fur_color": None,
        "activities": "Sitting",
        "interactions_with_humans": "Indifferent",
    },
]

In [43]:
squirrels_union = [
        {
            "primary_fur_color": "Gray",
            "highlights_in_fur_color": None,
            "activities": "Eating, Foraging",
            "interactions_with_humans": None,
        },
        {
            "primary_fur_color": "Gray",
            "highlights_in_fur_color": "Cinnamon",
            "activities": "Climbing, Eating",
            "interactions_with_humans": None,
        },
        {
            "primary_fur_color": "Cinnamon",
            "highlights_in_fur_color": None,
            "activities": "Foraging",
            "interactions_with_humans": "Indifferent",
        },
        {
            "primary_fur_color": "Gray",
            "highlights_in_fur_color": None,
            "activities": "Running, Digging",
            "interactions_with_humans": "Runs From",
        },
        {
            "primary_fur_color": "Gray",
            "highlights_in_fur_color": None,
            "activities": "Digging",
            "interactions_with_humans": "Indifferent",
        },
        {
            "primary_fur_color": "Gray",
            "highlights_in_fur_color": "Black",
            "activities": "Climbing",
            "interactions_with_humans": None,
        },
        {
            "primary_fur_color": "Gray",
            "highlights_in_fur_color": None,
            "activities": "Eating, Foraging",
            "interactions_with_humans": None,
        }
    ]

In [44]:
squirrels_by_park["Madison Square Park"] = squirrels_madison
print(type(squirrels_by_park["Madison Square Park"]))
print(squirrels_by_park["Madison Square Park"])

<class 'list'>
[{'primary_fur_color': 'Gray', 'highlights_in_fur_color': None, 'activities': 'Foraging', 'interactions_with_humans': 'Indifferent'}, {'primary_fur_color': 'Gray', 'highlights_in_fur_color': None, 'activities': 'Sitting', 'interactions_with_humans': 'Indifferent'}]


In [45]:
# Assign the squirrels_union list to the "Union Square Park" key of squirrels_by_park
squirrels_by_park["Union Square Park"] = squirrels_union

In [46]:
print(squirrels_by_park["Union Square Park"])

[{'primary_fur_color': 'Gray', 'highlights_in_fur_color': None, 'activities': 'Eating, Foraging', 'interactions_with_humans': None}, {'primary_fur_color': 'Gray', 'highlights_in_fur_color': 'Cinnamon', 'activities': 'Climbing, Eating', 'interactions_with_humans': None}, {'primary_fur_color': 'Cinnamon', 'highlights_in_fur_color': None, 'activities': 'Foraging', 'interactions_with_humans': 'Indifferent'}, {'primary_fur_color': 'Gray', 'highlights_in_fur_color': None, 'activities': 'Running, Digging', 'interactions_with_humans': 'Runs From'}, {'primary_fur_color': 'Gray', 'highlights_in_fur_color': None, 'activities': 'Digging', 'interactions_with_humans': 'Indifferent'}, {'primary_fur_color': 'Gray', 'highlights_in_fur_color': 'Black', 'activities': 'Climbing', 'interactions_with_humans': None}, {'primary_fur_color': 'Gray', 'highlights_in_fur_color': None, 'activities': 'Eating, Foraging', 'interactions_with_humans': None}]


In [47]:
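# OUTDATED DATACAMP EXERCISE: the code below does not run against this dictionary.
# Parks still stored as tuples yield plain strings (or None) when iterated, and those have no .get() method.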
# # Loop over the park_name in the squirrels_by_park dictionary
# for park_name in squirrels_by_park:
#     # Safely print a list of the primary_fur_color for each squirrel in park_name
#     print(
#         park_name,
#         [squirrel.get("primary_fur_color", "N/A") for squirrel in squirrels_by_park[park_name]],
#     )

In [48]:
# Loop over the park_name in the squirrels_by_park dictionary
for park_name in squirrels_by_park:
    # Check if the park data is a list of dictionaries or a tuple
    park_data = squirrels_by_park[park_name]
    if isinstance(park_data, list):
        # For lists of dictionaries, safely get primary_fur_color
        print(
            park_name,
            [squirrel.get("primary_fur_color", "N/A") for squirrel in park_data],
        )

Madison Square Park ['Gray', 'Gray']
Union Square Park ['Gray', 'Gray', 'Cinnamon', 'Gray', 'Gray', 'Gray', 'Gray']


### Popping and deleting from dictionaries
Often, you will want to remove keys and values from a dictionary. You can do so using the `del` statement. Remember that `del` raises a `KeyError` if the key you are trying to delete does not exist. Unlike a lookup, a deletion cannot be made safe with `.get()`; instead, wrap `del` in a `try`/`except KeyError` block or check for the key first.

If you want to save the deleted data in another variable for further processing, the `.pop()` dictionary method does just that. You can supply a default value to `.pop()`, much as you did for `.get()`, to deal safely with missing keys. For that reason, `.pop()` with a default is often preferred over `del` when a key may be absent.
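
The two ways of guarding `del` mentioned above might look like this. The `parks` dictionary and the `was_missing` flag are assumptions made for the sketch:

```python
# Hypothetical data, for illustration only.
parks = {"Seward Park": 4, "City Hall Park": 2}

# Guard 1: only delete when the key is present.
if "Central Park" in parks:
    del parks["Central Park"]

# Guard 2: attempt the deletion and handle the KeyError.
try:
    del parks["Central Park"]
    was_missing = False
except KeyError:
    was_missing = True

# Deleting an existing key needs no guard.
del parks["City Hall Park"]

print(parks, was_missing)
```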

In [49]:
# Remove "Madison Square Park" from squirrels_by_park and store it as squirrels_madison.
squirrels_madison = squirrels_by_park.pop("Madison Square Park", {})

# Safely remove "City Hall Park" from squirrels_by_park and store it as squirrels_city_hall, passing an empty dictionary {} as the second argument to .pop() so it is returned if the key is missing.
squirrels_city_hall = squirrels_by_park.pop("City Hall Park", {})

# Delete "Union Square Park" from squirrels_by_park.
del squirrels_by_park["Union Square Park"]

# Print squirrels_by_park.
for key, value in squirrels_by_park.items():
    print(f"{key}: {value}")

Marcus Garvey Park: ('Black', 'Cinnamon', 'Cleaning', None)
Highbridge Park: ('Gray', 'Cinnamon', 'Running, Eating', 'Runs From, watches us in short tree')
J. Hood Wright Park: ('Gray', 'White', 'Running', 'Indifferent')
Seward Park: ('Gray', 'Cinnamon', 'Eating', 'Indifferent')
Tompkins Square Park: ('Gray', 'Gray', 'Lounging', 'Approaches')


## Pythonically Using Dictionaries


The Python philosophy emphasises clarity, efficiency, and elegance—what is often described as being “*Pythonic*.” When working with dictionaries, adopting idiomatic Python patterns leads to code that is not only correct but also concise, readable, robust, and performant. This section explores **Pythonic** strategies for accessing, iterating, and conditionally using dictionaries, clarifying what “Pythonic” means in the context of dictionary operations.


### What Does “Pythonic” Mean?

“Pythonic” code embraces the core principles of Python:
- **Readability:** Code should be clear and easy to understand.
- **Explicitness:** Operations are direct, not hidden behind unnecessary abstractions.
- **Efficiency:** Prefer built-in, optimised methods over manual or verbose solutions.
- **Elegance:** Code is concise and expressive, avoiding redundancy.


### Iterating Over Dictionaries the Pythonic Way

#### `.items()` Method

To loop through all key-value pairs in a dictionary, use the `.items()` method, which yields pairs as tuples. This approach is canonical in Python and avoids the need to manually index into the dictionary.

```python
for key, value in my_dict.items():
    print(f"Key: {key}, Value: {value}")
```

**Why is this Pythonic?**
- It is direct, avoids unnecessary lookups, and leverages built-in dictionary capabilities.
- It ensures clarity by making both the key and value explicit in each iteration.


### Conditional Lookup and Key Presence

#### The `in` Operator

The idiomatic way to check for the existence of a key in a dictionary is with the `in` operator:

```python
if "some_key" in my_dict:
    print("Found:", my_dict["some_key"])
else:
    print("Key not found.")
```

- **Efficiency:** The `in` operator is highly optimised for dictionaries (average O(1) time).
- **Readability:** The intent is clear; the test is concise and unambiguous.

#### Why Not Always Use `.get()`?

While `.get()` is excellent for providing fallback (default) values, it’s not always the most Pythonic for control flow when the presence or absence of the key affects program logic:

```python
value = my_dict.get("some_key", "Default Value")
# Good for fetching with fallback, but less clear for branching logic.
```

**Guideline:**
- Use `in` for presence checks, branching, and control flow.
- Use `.get()` when you need a fallback/default value and do not care about control flow.
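
A short sketch of the guideline: the branch taken depends on whether the key exists, so `in` reads naturally, while a read-only lookup with a fallback suits `.get()`. The `record_sighting` helper and its data are hypothetical:

```python
# Hypothetical data, for illustration only.
sightings = {"Seward Park": ["Gray"]}

def record_sighting(park, color):
    """Hypothetical helper: append to an existing list or start a new one."""
    if park in sightings:          # presence decides which branch runs
        sightings[park].append(color)
    else:
        sightings[park] = [color]

record_sighting("Seward Park", "Black")
record_sighting("Central Park", "Gray")

# .get() suits read-only lookups where a fallback value is enough.
unknown = sightings.get("Fort Tryon Park", [])

print(sightings, unknown)
```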


### More Idiomatic Patterns for Dictionaries

#### Dictionary Comprehensions

Pythonic code often employs **dictionary comprehensions** for constructing dictionaries from iterables in a single, expressive statement:

```python
squared = {x: x**2 for x in range(5)}
# {0: 0, 1: 1, 2: 4, 3: 9, 4: 16}
```
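
Comprehensions can also filter or reshape an existing dictionary by iterating over `.items()`. The sketch below assumes a small `counts` dictionary invented for the purpose:

```python
# Hypothetical data, for illustration only.
counts = {"gray": 12, "cinnamon": 3, "black": 1}

# Keep only the entries whose value passes a condition.
common = {color: n for color, n in counts.items() if n >= 3}

# Swap keys and values (safe here because the values are unique).
by_count = {n: color for color, n in counts.items()}

print(common)
print(by_count)
```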

#### Unpacking Dictionaries

Python’s syntax supports unpacking dictionaries for merging or passing as arguments:

```python
# Merging
merged = {**dict1, **dict2}

# Passing as keyword arguments
def func(**kwargs):
    ...
func(**my_dict)
```
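
A runnable version of the same two ideas, with concrete (assumed) dictionaries and a hypothetical `describe` function, might look like this:

```python
# Hypothetical data, for illustration only.
defaults = {"color": "unknown", "activity": "unknown"}
observed = {"color": "Gray"}

# Merging: later mappings win when keys repeat.
merged = {**defaults, **observed}

def describe(color, activity):
    """Hypothetical function used only to illustrate ** unpacking."""
    return f"{color} squirrel, activity: {activity}"

# Each key in merged is passed as the keyword argument of the same name.
summary = describe(**merged)

print(summary)
```

Neither `defaults` nor `observed` is modified by the merge; `merged` is a new dictionary.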

### Theoretical Rationale and Principles

- **Hash Table Efficiency:** Dictionaries use hash tables, so `in` and `.get()` run in average O(1) time regardless of dictionary size; iterating over `.items()` is linear in the number of entries but avoids a separate lookup per key.
- **Error Prevention:** Explicit presence checks (`in`) prevent `KeyError`s, making code safer and more predictable.
- **Expressive Logic:** Pythonic patterns communicate programmer intent, making maintenance and collaboration easier.


### Working with dictionaries more pythonically
So far, you've worked a lot with the keys of a dictionary to access data, but in Python, the preferred manner for iterating over items in a dictionary is with the `.items()` method.

This returns each key and value from the dictionary as a tuple, which you can unpack in a `for` loop. You'll now get practice doing this.

In [50]:
# Add the Union Square Park squirrel data to the squirrels_by_park dictionary
squirrels_by_park["Union Square Park"] = squirrels_union

# Print the data type and contents to verify the assignment
print(type(squirrels_by_park["Union Square Park"]))
print(squirrels_by_park["Union Square Park"])

<class 'list'>
[{'primary_fur_color': 'Gray', 'highlights_in_fur_color': None, 'activities': 'Eating, Foraging', 'interactions_with_humans': None}, {'primary_fur_color': 'Gray', 'highlights_in_fur_color': 'Cinnamon', 'activities': 'Climbing, Eating', 'interactions_with_humans': None}, {'primary_fur_color': 'Cinnamon', 'highlights_in_fur_color': None, 'activities': 'Foraging', 'interactions_with_humans': 'Indifferent'}, {'primary_fur_color': 'Gray', 'highlights_in_fur_color': None, 'activities': 'Running, Digging', 'interactions_with_humans': 'Runs From'}, {'primary_fur_color': 'Gray', 'highlights_in_fur_color': None, 'activities': 'Digging', 'interactions_with_humans': 'Indifferent'}, {'primary_fur_color': 'Gray', 'highlights_in_fur_color': 'Black', 'activities': 'Climbing', 'interactions_with_humans': None}, {'primary_fur_color': 'Gray', 'highlights_in_fur_color': None, 'activities': 'Eating, Foraging', 'interactions_with_humans': None}]


In [51]:
# Add the Madison Square Park squirrel data to the squirrels_by_park dictionary
squirrels_by_park["Madison Square Park"] = squirrels_madison

# Print the data type and contents to verify the assignment
print(type(squirrels_by_park["Madison Square Park"]))
print(squirrels_by_park["Madison Square Park"])

<class 'list'>
[{'primary_fur_color': 'Gray', 'highlights_in_fur_color': None, 'activities': 'Foraging', 'interactions_with_humans': 'Indifferent'}, {'primary_fur_color': 'Gray', 'highlights_in_fur_color': None, 'activities': 'Sitting', 'interactions_with_humans': 'Indifferent'}]


In [52]:
# Iterate over the first record in squirrels_by_park["Madison Square Park"], unpacking its items into field and value.
for field, value in squirrels_by_park["Madison Square Park"][0].items():
    # Print each field and value.
    print(field, value)

print("-" * 13)

# Repeat the process for the second record in squirrels_by_park["Union Square Park"].
for field, value in squirrels_by_park["Union Square Park"][1].items():
    # Print field and value
    print(field, value)

primary_fur_color Gray
highlights_in_fur_color None
activities Foraging
interactions_with_humans Indifferent
-------------
primary_fur_color Gray
highlights_in_fur_color Cinnamon
activities Climbing, Eating
interactions_with_humans None


### Checking dictionaries for data
You can check whether a key exists in a dictionary by using the `in` operator.

For example, you can check whether `'cookies'` is a key in the `recipes` dictionary with `if 'cookies' in recipes:`. This lets you react safely to data being present in the dictionary.

We've loaded a `squirrels_by_park` dictionary with park names for the keys and a list of dictionaries of the squirrels.

In [53]:
# Check to see if Tompkins Square Park is in the squirrels_by_park dictionary, and print 'Found Tompkins Square Park' if it is present.
if "Tompkins Square Park" in squirrels_by_park:

    # Print 'Found Tompkins Square Park'
    print("Found Tompkins Square Park")

# Check to see if Central Park is in squirrels_by_park. Then, print 'Found Central Park' if found and 'Central Park missing' if not found.
if "Central Park" in squirrels_by_park:

    # Print 'Found Central Park' if found
    print("Found Central Park")

else:
    # Print 'Central Park missing' if not found
    print("Central Park missing")

Found Tompkins Square Park
Central Park missing


## Mixed Data Types in Dictionaries

Dictionaries in Python are not limited to flat mappings—they are fully capable of **nesting**, storing other dictionaries, lists, or even more complex objects as their values. This powerful feature enables the construction of intricate, multi-level data structures mirroring hierarchical, relational, or tree-like data found in real-world applications (such as JSON objects, configuration files, or database records). Proper handling and access of nested (mixed-type) dictionaries is a cornerstone of idiomatic Python programming, especially in data science, web development, and systems design.


### Key Principles of Mixed-Type and Nested Dictionaries

#### 1. Dictionaries as Values

- A dictionary value can itself be any object—another dictionary, list, tuple, or even a custom class.
- This allows for flexible representation of complex, repeating, or hierarchical data structures.

#### 2. Pythonic Representation

- Nested dictionaries often mirror the structure of real-world data, such as collections of records grouped by keys (IDs, categories, etc.).
- Key insight: Python’s syntax and methods support *arbitrary depth* and seamless, readable access to such data.


### Working with Nested Dictionaries: Core Syntax

#### Creating a Nested Dictionary

```python
nested_dict = {
    "outer_key1": {
        "inner_key1": "value1",
        "inner_key2": "value2"
    },
    "outer_key2": {
        "inner_key3": "value3"
    }
}
```

#### Accessing Top-Level Keys

The `.keys()` method returns a “view” of all top-level keys—these can be iterated, sorted, or converted to lists:

```python
top_keys = nested_dict.keys()
# dict_keys(['outer_key1', 'outer_key2'])
```

- This allows efficient inspection and enumeration of hierarchical data’s first level.
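
Because `.keys()` returns a view rather than a copy, it reflects later changes to the dictionary. The sketch below, using an assumed `parks` dictionary, contrasts the view with a list snapshot:

```python
# Hypothetical data, for illustration only.
parks = {"Seward Park": 4}

keys_view = parks.keys()       # live view onto the dictionary's keys
snapshot = list(keys_view)     # independent copy taken now

parks["Union Square Park"] = 7

# The view sees the new key; the snapshot does not.
print(list(keys_view))
print(snapshot)
```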

#### Accessing Nested Data

Accessing values within a nested dictionary involves chaining key lookups:

```python
value = nested_dict["outer_key1"]["inner_key1"]
# "value1"
```

- This syntax can be repeated as deeply as the structure requires.

#### Using `.get()` for Safe Access

The `.get()` method remains invaluable at each level to avoid `KeyError`s:

```python
outer = nested_dict.get("outer_key1", {})
value = outer.get("inner_key1", None)
```

- This approach is robust against missing keys at any depth.
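
For deeper structures, the chained `.get()` pattern can be wrapped in a small helper. `deep_get` below is a hypothetical function written for this sketch, not part of Python's standard library, and the `config` data is invented:

```python
def deep_get(mapping, path, default=None):
    """Follow a sequence of keys through nested dictionaries (hypothetical helper)."""
    current = mapping
    for key in path:
        # Stop as soon as a level is not a dictionary or lacks the key.
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current

# Hypothetical data, for illustration only.
config = {"db": {"primary": {"host": "localhost"}}}

host = deep_get(config, ["db", "primary", "host"])
replica_host = deep_get(config, ["db", "replica", "host"], "n/a")

print(host, replica_host)
```

Unlike chained `.get(key, {})` calls, this version also returns the default when an intermediate value exists but is not a dictionary.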

### Dealing with Repeating Structures and Hierarchical Data

Nested dictionaries are the canonical way to manage:
- Collections of grouped records (e.g., items by category, users by region).
- Tree-like or multi-level data (e.g., configuration trees, file system-like structures).
- Structured data read from JSON, YAML, or hierarchical database records.


### Pythonic Patterns for Nested Dictionaries

#### Idiomatic Iteration

Use nested loops and `.items()` to traverse and process multi-level dictionaries:

```python
for outer_key, inner_dict in nested_dict.items():
    for inner_key, value in inner_dict.items():
        print(f"({outer_key}, {inner_key}): {value}")
```

#### Safe, Readable Access

Adopt chained `.get()` calls or try/except patterns to gracefully handle missing keys, especially in user-facing or production code.

#### Comprehensions

Nested dictionary comprehensions enable transformation, filtering, or flattening of complex structures:

```python
flattened = {
    (outer, inner): value
    for outer, inner_dict in nested_dict.items()
    for inner, value in inner_dict.items()
}
```



In [54]:
for key, value in squirrels_by_park.items():
    print(key, value)

Marcus Garvey Park ('Black', 'Cinnamon', 'Cleaning', None)
Highbridge Park ('Gray', 'Cinnamon', 'Running, Eating', 'Runs From, watches us in short tree')
J. Hood Wright Park ('Gray', 'White', 'Running', 'Indifferent')
Seward Park ('Gray', 'Cinnamon', 'Eating', 'Indifferent')
Tompkins Square Park ('Gray', 'Gray', 'Lounging', 'Approaches')
Union Square Park [{'primary_fur_color': 'Gray', 'highlights_in_fur_color': None, 'activities': 'Eating, Foraging', 'interactions_with_humans': None}, {'primary_fur_color': 'Gray', 'highlights_in_fur_color': 'Cinnamon', 'activities': 'Climbing, Eating', 'interactions_with_humans': None}, {'primary_fur_color': 'Cinnamon', 'highlights_in_fur_color': None, 'activities': 'Foraging', 'interactions_with_humans': 'Indifferent'}, {'primary_fur_color': 'Gray', 'highlights_in_fur_color': None, 'activities': 'Running, Digging', 'interactions_with_humans': 'Runs From'}, {'primary_fur_color': 'Gray', 'highlights_in_fur_color': None, 'activities': 'Digging', 'intera

In [55]:
keys = [
    "primary_fur_color",
    "highlights_in_fur_color",
    "activities",
    "interactions_with_humans",
]

for park, entry in squirrels_by_park.items():

    # If the entry is a tuple, zip it to a dictionary
    if isinstance(entry, tuple):
        squirrels_by_park[park] = dict(zip(keys, entry))

    # If the entry is a list of dictionaries, use the first squirrel data
    elif isinstance(entry, list) and entry:
        squirrels_by_park[park] = {k: entry[0].get(k, None) for k in keys}

    # if entry is empty or unexpected format, create empty record
    else:
        squirrels_by_park[park] = {k: None for k in keys}


print(squirrels_by_park)

{'Marcus Garvey Park': {'primary_fur_color': 'Black', 'highlights_in_fur_color': 'Cinnamon', 'activities': 'Cleaning', 'interactions_with_humans': None}, 'Highbridge Park': {'primary_fur_color': 'Gray', 'highlights_in_fur_color': 'Cinnamon', 'activities': 'Running, Eating', 'interactions_with_humans': 'Runs From, watches us in short tree'}, 'J. Hood Wright Park': {'primary_fur_color': 'Gray', 'highlights_in_fur_color': 'White', 'activities': 'Running', 'interactions_with_humans': 'Indifferent'}, 'Seward Park': {'primary_fur_color': 'Gray', 'highlights_in_fur_color': 'Cinnamon', 'activities': 'Eating', 'interactions_with_humans': 'Indifferent'}, 'Tompkins Square Park': {'primary_fur_color': 'Gray', 'highlights_in_fur_color': 'Gray', 'activities': 'Lounging', 'interactions_with_humans': 'Approaches'}, 'Union Square Park': {'primary_fur_color': 'Gray', 'highlights_in_fur_color': None, 'activities': 'Eating, Foraging', 'interactions_with_humans': None}, 'Madison Square Park': {'primary_fur

In [56]:
# Print the keys of the squirrels_by_park dictionary, NOTE: They are park_names.
print(squirrels_by_park.keys())

# Print the record stored in squirrels_by_park for the park_name Union Square Park.
print(squirrels_by_park["Union Square Park"])

# Loop over the squirrels_by_park dictionary.
for park_name in squirrels_by_park:
    # Inside the loop, safely print the park_name and its highlights_in_fur_color. 'N/A' is printed only when the key is missing; a stored None is returned as-is.
    print(park_name, squirrels_by_park[park_name].get("highlights_in_fur_color", "N/A"))

dict_keys(['Marcus Garvey Park', 'Highbridge Park', 'J. Hood Wright Park', 'Seward Park', 'Tompkins Square Park', 'Union Square Park', 'Madison Square Park'])
{'primary_fur_color': 'Gray', 'highlights_in_fur_color': None, 'activities': 'Eating, Foraging', 'interactions_with_humans': None}
Marcus Garvey Park Cinnamon
Highbridge Park Cinnamon
J. Hood Wright Park White
Seward Park Cinnamon
Tompkins Square Park Gray
Union Square Park None
Madison Square Park None
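
In the output above, two parks print `None` rather than `N/A`, because the default passed to `.get()` is used only when the key is missing, not when the key holds `None`. If both cases should display as `N/A`, an explicit check is one option. The `record` dictionary below is a stand-in for one park's entry:

```python
# Stand-in for a single park record, for illustration only.
record = {"primary_fur_color": "Gray", "highlights_in_fur_color": None}

# The key exists, so the stored None is returned and the default is ignored.
value = record.get("highlights_in_fur_color", "N/A")

# Treat a missing key and a stored None the same way.
highlight = record.get("highlights_in_fur_color")
display = "N/A" if highlight is None else highlight

print(value, display)
```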


### Dealing with nested mixed types
Previously, we used the `in` operator to see if data is in a dictionary, such as `if 'cookies' in recipes_dict`. But what if we want to find data in a dictionary value that is a list of dictionaries? In that scenario, we can use a `for` loop to iterate over the items in the nested list and operate on them. We can also use list comprehensions to filter nested lists of dictionaries. For example, `[cookie for cookie in recipes["cookies"] if "chocolate chip" in cookie["name"]]` returns the cookies in `recipes["cookies"]` whose `name` value contains `chocolate chip`.
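
The cookie example can be made runnable with some invented data. The contents of `recipes` below are assumptions for illustration; only the comprehension itself comes from the text above:

```python
# Hypothetical data, for illustration only.
recipes = {
    "cookies": [
        {"name": "chocolate chip", "bake_minutes": 12},
        {"name": "oatmeal raisin", "bake_minutes": 14},
        {"name": "double chocolate chip", "bake_minutes": 11},
    ]
}

# Keep the cookie dictionaries whose name contains "chocolate chip".
chip_cookies = [
    cookie for cookie in recipes["cookies"] if "chocolate chip" in cookie["name"]
]
chip_names = [cookie["name"] for cookie in chip_cookies]

# A more defensive variant that tolerates a missing list or a missing name.
safe_chip_cookies = [
    cookie
    for cookie in recipes.get("cookies", [])
    if "chocolate chip" in cookie.get("name", "")
]

print(chip_names)
```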

In [57]:
squirrels_tompkins = [
    {
        "primary_fur_color": "Gray",
        "highlights_in_fur_color": "Gray",
        "activities": "Foraging",
        "interactions_with_humans": "Approaches",
    },
    {
        "primary_fur_color": "Gray",
        "highlights_in_fur_color": "Gray",
        "activities": "Climbing (down tree)",
        "interactions_with_humans": "Indifferent",
    },
    {
        "primary_fur_color": "Gray",
        "highlights_in_fur_color": "Gray",
        "interactions_with_humans": "Indifferent",
    },
    {
        "primary_fur_color": "Gray",
        "highlights_in_fur_color": "Gray",
        "activities": "Foraging",
        "interactions_with_humans": "Indifferent",
    },
]

In [58]:
squirrels_union = [
    {
        "primary_fur_color": "Gray",
        "highlights_in_fur_color": None,
        "activities": "Eating, Foraging",
        "interactions_with_humans": None,
    },
    {
        "primary_fur_color": "Cinnamon",
        "highlights_in_fur_color": None,
        "activities": "Foraging",
        "interactions_with_humans": None,
    },
    {
        "primary_fur_color": "Gray",
        "highlights_in_fur_color": None,
        "activities": "Eating, Foraging",
        "interactions_with_humans": None,
    },
    {
        "primary_fur_color": "Gray",
        "highlights_in_fur_color": None,
        "activities": "Digging",
        "interactions_with_humans": "Indifferent",
    },
]

In [59]:
squirrels_by_park["Union Square Park"] = squirrels_union

In [60]:
squirrels_by_park["Tompkins Square Park"] = squirrels_tompkins
for squirrel in squirrels_by_park["Tompkins Square Park"]:
    print(squirrel)

{'primary_fur_color': 'Gray', 'highlights_in_fur_color': 'Gray', 'activities': 'Foraging', 'interactions_with_humans': 'Approaches'}
{'primary_fur_color': 'Gray', 'highlights_in_fur_color': 'Gray', 'activities': 'Climbing (down tree)', 'interactions_with_humans': 'Indifferent'}
{'primary_fur_color': 'Gray', 'highlights_in_fur_color': 'Gray', 'interactions_with_humans': 'Indifferent'}
{'primary_fur_color': 'Gray', 'highlights_in_fur_color': 'Gray', 'activities': 'Foraging', 'interactions_with_humans': 'Indifferent'}


In [61]:
# Use a for loop to iterate over the squirrels found in the Tompkins Square Park key of squirrels_by_park:
for squirrel in squirrels_by_park["Tompkins Square Park"]:
    
    # Safely print the activities of each squirrel.
    print(squirrel.get("activities"))

Foraging
Climbing (down tree)
None
Foraging
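
The `None` in the output above comes from `.get()`, which returns `None` for a missing key instead of raising an error. A minimal sketch of the difference, using a hypothetical single-squirrel dictionary and an illustrative fallback string:

```python
# Hypothetical record with no "activities" key
squirrel = {"primary_fur_color": "Gray", "interactions_with_humans": "Indifferent"}

# .get() returns None when the key is missing...
missing = squirrel.get("activities")
print(missing)  # None

# ...or a fallback value of your choosing.
with_default = squirrel.get("activities", "Not recorded")
print(with_default)  # Not recorded

# Square-bracket access raises KeyError for the same missing key.
try:
    squirrel["activities"]
    lookup = "found"
except KeyError:
    lookup = "KeyError"
print(lookup)  # KeyError
```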


In [62]:
# Print the list of 'Cinnamon' primary_fur_color squirrels found in Union Square Park using a list comprehension.
print(
    [
        squirrel
        for squirrel in squirrels_by_park["Union Square Park"]
        if "Cinnamon" in squirrel["primary_fur_color"]
    ]
)

[{'primary_fur_color': 'Cinnamon', 'highlights_in_fur_color': None, 'activities': 'Foraging', 'interactions_with_humans': None}]


## Numeric Data Types in Python

Numeric data types form the backbone of scientific, statistical, and financial computations in Python. A deep understanding of **integers**, **floats**, and **decimals**—including their limitations, use cases, and behaviour—enables robust and reliable programming across domains from data science to engineering. This section presents a rigorous overview of these numeric types, their operations, and best practices for precision, formatting, and idiomatic usage.

### Integer (`int`)

#### Definition and Characteristics

- Represents **whole numbers** (positive, negative, or zero) with **arbitrary precision** (limited only by available memory, not by bit width).
- No distinction between "small" and "large" integers—Python's `int` type transparently expands to handle huge values.

**Example:**
```python
a = 123456789123456789
print(type(a))  # <class 'int'>
```

#### Use Cases

- Counting, indexing, discrete values.
- Applications requiring lossless manipulation of large numbers (cryptography, combinatorics).

### Float (`float`)

#### Definition and Characteristics

- Represents **floating-point numbers** (real numbers with a fractional component).
- Implements the IEEE 754 double-precision binary floating-point standard.
- Supports scientific notation for compact representation of very small or large numbers.

**Examples:**
```python
b = 1.2345678912345678e+17  # Scientific notation
c = 0.00001
print(b)  # 1.2345678912345678e+17
print(c)  # 1e-05
```

##### Approximation and Precision

- Floats are inherently **approximate** due to binary representation of decimal fractions.
- Not suitable for applications requiring exact decimal precision (e.g., financial calculations).
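
A minimal sketch of what "approximate" means in practice. The variable names are illustrative, and `Decimal(0.1)` is used here only to reveal the binary value that the float literal actually stores.

```python
from decimal import Decimal

total = 0.1 + 0.2

# The sum is close to 0.3 but not exactly equal to it.
is_exact = (total == 0.3)
print(is_exact)  # False

# Converting the float itself (not a string) to Decimal exposes the stored value,
# which differs from the exact decimal 0.1.
stored = Decimal(0.1)
print(stored == Decimal("0.1"))  # False

# Rounding for display hides the error but does not remove it.
print(f"{total:.2f}")  # 0.30
```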


### Decimal (`decimal.Decimal`)

#### Definition and Characteristics

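- The `Decimal` type (from the `decimal` module) provides **decimal floating-point arithmetic with user-configurable precision**.
- Avoids common float pitfalls: decimal values such as 0.1 are represented exactly. Fractions with no finite decimal expansion, such as 1/3, are still rounded to the context precision (28 significant digits by default).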
- Ideal for financial applications, currency, and any scenario where **exactness** is paramount.

**Example:**
```python
from decimal import Decimal
d = Decimal('123456789123456789')
print(d)  # 123456789123456789
```
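
The sketch below illustrates two points about `Decimal`, using illustrative variable names: values built from strings are stored exactly, while a quotient with no finite decimal expansion is rounded to the context precision.

```python
from decimal import Decimal, getcontext

# Build Decimals from strings so that no binary float error is carried over.
price_sum = Decimal("0.1") + Decimal("0.2")
print(price_sum == Decimal("0.3"))  # True

# 28 significant digits is the default precision; it is set explicitly for clarity.
getcontext().prec = 28
third = Decimal(1) / Decimal(3)
print(third)  # 0.3333333333333333333333333333

# Because the quotient was rounded, multiplying back does not give exactly 1.
print(third * 3 == Decimal(1))  # False
```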

#### Use Cases

- Currency calculations.
- Legal, scientific, or financial applications with regulatory precision requirements.


### Printing and Formatting Floating-Point Numbers

Python provides versatile tools for controlling float display:

#### Scientific vs. Fixed-Point Notation

```python
x = 0.00001
print(x)              # Scientific: 1e-05
print(f"{x:f}")       # Fixed-point: 0.000010

y = 0.0000001
print(f"{y:f}")       # 0.000000
print(f"{y:.7f}")     # 0.0000001
```

- Adjust the number of decimal places with precision specifiers (e.g., `.7f`).
- Use scientific notation for concise representation of very small or large numbers.
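
A few further format specifiers, shown as a small sketch with illustrative values: `e` forces scientific notation, `g` picks whichever notation is shorter, and `,` adds a thousands separator.

```python
amount = 1234567.891

# Thousands separator with two decimal places
grouped = f"{amount:,.2f}"
print(grouped)  # 1,234,567.89

# Scientific notation with three digits after the decimal point
scientific = f"{amount:.3e}"
print(scientific)  # 1.235e+06

# General format: chooses fixed-point or scientific automatically
general = f"{0.00001:g}"
print(general)  # 1e-05
```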

### Division in Python: True and Floor Division

Python distinguishes between **true division** (`/`) and **floor division** (`//`):

#### True Division (`/`)

- Returns a floating-point result, even for two integers:
    ```python
    4 / 2   # 2.0
    ```

#### Floor Division (`//`)

- Returns the **largest integer less than or equal to the division result** (rounds down):
    ```python
    7 // 3  # 2
    ```

- Especially useful for discrete algorithms, partitioning, and index calculations.
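
"Rounds down" means toward negative infinity, which matters for negative operands. A short sketch with illustrative values, which also shows the built-in `divmod()` for getting quotient and remainder together:

```python
# Floor division rounds toward negative infinity, not toward zero.
positive = 7 // 3
negative = -7 // 3
print(positive, negative)  # 2 -3

# With a float operand the result is still floored, but its type is float.
floored_float = 7.0 // 2
print(floored_float)  # 3.0

# divmod() returns the floor quotient and the remainder in one call.
quotient, remainder = divmod(17, 5)
print(quotient, remainder)  # 3 2
```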

### Rationale, Theoretical Context, and Best Practices

#### When to Use Each Type

- **int**: Use for all discrete, countable quantities; safe for very large numbers.
- **float**: Use for continuous data, scientific calculations, and where performance trumps exact precision.
- **Decimal**: Use when you require absolute precision (financial calculations, regulatory reporting, exact arithmetic).

#### Pythonic Principles

- **Be explicit**: Always choose the type matching the required precision and behaviour.
- **Format output**: Use formatted strings (f-strings) for human-readable, context-appropriate display.
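- **Test for equality cautiously**: Avoid `==` on floats unless you are certain of the underlying value; prefer tolerances or the `math.isclose()` function. `Decimal` values built from strings compare exactly, but a `Decimal` built from a float inherits that float's error.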
- **Embrace built-in modules**: Use `decimal` for precision and `fractions` for rational numbers as needed.
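
As a brief sketch of the `fractions` module mentioned above (variable names are illustrative), `Fraction` keeps an exact numerator and denominator, so thirds add up to exactly one:

```python
from fractions import Fraction

third = Fraction(1, 3)
sums_to_one = (third + third + third == 1)
print(sums_to_one)  # True

# Build from a string to avoid inheriting binary float error.
tenth = Fraction("0.1")
print(tenth)  # 1/10

# Results are reduced to lowest terms automatically.
combined = Fraction(3, 4) + Fraction(1, 6)
print(combined)  # 11/12
```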


Mastery of Python’s numeric data types—**int**, **float**, and **Decimal**—is critical for writing robust, precise, and maintainable code in any scientific, engineering, or financial context. Python’s flexible, extensible numeric model supports applications ranging from quick calculations to exact, regulatory-grade computations. The most Pythonic approach is to select the simplest type that guarantees the correctness of your results, format output for clarity, and leverage Python’s rich standard library for advanced requirements.


## Booleans — The Logical Data Type in Python

**Booleans** are a foundational data type in Python, representing truth values for logical operations, decision-making, and control flow. While simple in concept, their nuanced behaviour and deep integration into the language's logic model make them essential for writing expressive, robust, and idiomatic Python code.

### The Boolean Data Type: True and False

In Python, the two Boolean values are represented by the built-in constants `True` and `False` (note the **capitalisation**; lowercase `true` and `false` are not defined and raise a `NameError`). This contrasts with languages such as JavaScript, or C with `<stdbool.h>`, which spell them in lowercase.

```python
is_active = True
is_closed = False
```

#### Booleans in Control Flow

Boolean values are most commonly used to control the execution of code:

```python
ready_to_run = True
if ready_to_run:
    print("Proceeding with execution.")
```

### Truthy and Falsey: Implicit Boolean Contexts

Python adopts a flexible system where not only explicit `True` and `False` but also many other objects can be used in Boolean contexts:

#### Truthy Values

Values that evaluate to `True` in a Boolean context:
- Any non-empty string: `"Hello"`
- Any non-empty list, tuple, or dictionary: `[1]`, `{"key": "value"}`
- Any non-zero number: `1`, `-1`, `0.01`
- Custom objects unless otherwise specified with `__bool__` or `__len__`

#### Falsey Values

Values that evaluate to `False`:
- `None`
- `False`
- Zero of any numeric type: `0`, `0.0`
- Empty sequences and collections: `''`, `[]`, `{}`, `set()`
- Objects whose `__bool__` method returns `False`, or whose `__len__` method returns `0` when `__bool__` is not defined

**Example:**
```python
if "abc":
    print("Non-empty strings are truthy.")
if []:
    print("Will not execute: empty lists are falsey.")
```

#### Practical Implications

This system allows concise and expressive code for checking conditions:

```python
value = []
if value:
    print("Has content.")
else:
    print("Is empty or falsey.")
```
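
The same rules apply to your own classes. The sketch below uses two hypothetical classes, `Basket` and `AlwaysOff`, to show how `__len__` and `__bool__` determine truthiness:

```python
class Basket:
    """Hypothetical container used only to illustrate truthiness."""

    def __init__(self, items=None):
        self.items = list(items or [])

    def __len__(self):
        # With no __bool__ defined, bool() falls back to len() != 0.
        return len(self.items)


class AlwaysOff:
    """Hypothetical class that is falsey regardless of its contents."""

    def __bool__(self):
        return False


empty_basket = Basket()
full_basket = Basket(["apple"])

print(bool(empty_basket))  # False
print(bool(full_basket))   # True
print(bool(AlwaysOff()))   # False
```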

### Boolean Operators and Expressions

Python provides several **comparison and logical operators** that return Boolean values, including:

- `==` : equal to
- `!=` : not equal to
- `<`  : less than
- `<=` : less than or equal to
- `>`  : greater than
- `>=` : greater than or equal to

**Example:**
```python
score = 72  # illustrative value
result = (score >= 50)  # True
```

Logical operators combine or negate Boolean expressions:

- `and`: True if both operands are true
- `or` : True if at least one operand is true
- `not`: Logical negation

**Example:**
```python
user_logged_in = True  # illustrative values
is_banned = False
if user_logged_in and not is_banned:
    print("Access granted.")
```
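
One detail worth knowing: `and` and `or` return one of their operands rather than a strict `True` or `False`, and they stop evaluating as soon as the result is decided. A small sketch, with a hypothetical helper `record_call()` that exists only to show whether it was invoked:

```python
# `or` returns the first truthy operand, or the last operand if none is truthy.
display_name = "" or "anonymous"
print(display_name)  # anonymous

# `and` returns the first falsey operand, or the last operand if all are truthy.
first_item = [] and "never reached"
print(first_item)  # []

# Short-circuiting: the right-hand side is skipped when the left side decides the result.
calls = []


def record_call():
    calls.append("called")
    return True


result = False and record_call()
print(result, calls)  # False []
```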

### Floating Point Caution: Equality Pitfalls

**Do not use `==` for direct comparison of floating point numbers.**

Due to how floating point arithmetic is implemented, direct equality checks can fail even when the values are mathematically equivalent:

```python
x = 0.1 + 1.1
print(x == 1.2)  # False!
print(x)         # 1.2000000000000002
```

**Pythonic solution:** Use `math.isclose()` for tolerant floating point comparisons:

```python
import math
if math.isclose(x, 1.2):
    print("Values are effectively equal.")
```
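
`math.isclose()` uses a relative tolerance by default (`rel_tol=1e-09`) and an absolute tolerance of zero, so comparisons against zero need an explicit `abs_tol`. A short sketch with illustrative values:

```python
import math

x = 0.1 + 1.1

# The default relative tolerance is enough for values away from zero.
default_check = math.isclose(x, 1.2)
print(default_check)  # True

# Near zero, a relative tolerance alone never matches...
near_zero_default = math.isclose(1e-12, 0.0)
print(near_zero_default)  # False

# ...so supply an absolute tolerance suited to the scale of your data.
near_zero_abs = math.isclose(1e-12, 0.0, abs_tol=1e-9)
print(near_zero_abs)  # True
```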

Booleans in Python are much more than just `True` or `False`. They are the bedrock of logic, flow control, and idiomatic Python. Understanding their behaviour—including truthiness, falseyness, and the subtle pitfalls of floating-point comparisons—equips you to write clear, reliable, and efficient code. Pythonic usage leverages the language’s rich semantics for concise, elegant logic, making your programs more readable and robust.


In [63]:
# Create an empty list called my_list
my_list = []

# Print the truthiness of my_list.
print(bool(my_list))

# Append the string 'cookies' to my_list
my_list.append("cookies")

# Check the truthiness of my_list
print(bool(my_list))

False
True


In [64]:
penguins = [
    {"species": "Adlie", "flipper_length": 190.0, "body_mass": 3050.0, "sex": "FEMALE"},
    {"species": "Adlie", "flipper_length": 184.0, "body_mass": 3325.0, "sex": "FEMALE"},
    {"species": "Gentoo", "flipper_length": 209.0, "body_mass": 4800.0, "sex": "FEMALE",},
    {"species": "Adlie", "flipper_length": 193.0, "body_mass": 4200.0, "sex": "MALE"},
    {"species": "Gentoo", "flipper_length": 210.0, "body_mass": 4400.0, "sex": "FEMALE",},
    {"species": "Gentoo", "flipper_length": 213.0, "body_mass": 4650.0, "sex": "FEMALE",},
    {"species": "Chinstrap", "flipper_length": 193.0, "body_mass": 3600.0, "sex": "FEMALE",},
    {"species": "Adlie", "flipper_length": 193.0, "body_mass": 3800.0, "sex": "MALE"},
    {"species": "Chinstrap", "flipper_length": 199.0, "body_mass": 3900.0, "sex": "FEMALE",},
    {"species": "Chinstrap", "flipper_length": 195.0, "body_mass": 3650.0, "sex": "FEMALE",},
    {"species": "Adlie", "flipper_length": 185.0, "body_mass": 3700.0, "sex": "FEMALE"},
    {"species": "Gentoo", "flipper_length": 208.0, "body_mass": 4575.0, "sex": "FEMALE",},
    {"species": "Adlie", "flipper_length": 196.0, "body_mass": 4350.0, "sex": "MALE"},
    {"species": "Adlie", "flipper_length": 191.0, "body_mass": 3700.0, "sex": "FEMALE"},
    {"species": "Chinstrap", "flipper_length": 195.0, "body_mass": 3300.0, "sex": "FEMALE",},
    {"species": "Adlie", "flipper_length": 195.0, "body_mass": 3450.0, "sex": "FEMALE"},
    {"species": "Gentoo","flipper_length": 217.0, "body_mass": 4875.0, "sex": ".",},
    {"species": "Gentoo", "flipper_length": 212.0, "body_mass": 4875.0, "sex": "FEMALE",},
    {"species": "Adlie", "flipper_length": 205.0, "body_mass": 4300.0, "sex": "MALE"},
    {"species": "Gentoo", "flipper_length": 220.0, "body_mass": 6000.0, "sex": "MALE"},
]

In [65]:
# Use a for loop to iterate over the penguins list.
for penguin in penguins:
    
    # Check the penguin entry for a body_mass of more than 3300 grams.
    if penguin["body_mass"] > 3300:
        
        # Print the species and sex of the penguin if true.
        print(f"{penguin['species']} - {penguin['sex']}")

Adlie - FEMALE
Gentoo - FEMALE
Adlie - MALE
Gentoo - FEMALE
Gentoo - FEMALE
Chinstrap - FEMALE
Adlie - MALE
Chinstrap - FEMALE
Chinstrap - FEMALE
Adlie - FEMALE
Gentoo - FEMALE
Adlie - MALE
Adlie - FEMALE
Adlie - FEMALE
Gentoo - .
Gentoo - FEMALE
Adlie - MALE
Gentoo - MALE


In [66]:
penguin_305_details = {
    "species": "Adlie",
    "flipper_length": 190.0,
    "body_mass": 3050.0,
    "tracked": True,
    "sex": "FEMALE",
}

In [67]:
# Check the truthiness of penguin_305_details sex key.
if penguin_305_details["sex"]:
    # If true, check if sex is True and store it as sex_is_true.
    sex_is_true = penguin_305_details["sex"] is True
    
    # Print the sex key's value and sex_is_true
    print(f"{penguin_305_details['sex']}: {sex_is_true}")

FEMALE: False


In [68]:
# Check the truthiness of penguin_305_details tracked key.
if penguin_305_details["tracked"]:

    # If true, check if tracked is True and store it as tracked_is_true.
    tracked_is_true = penguin_305_details["tracked"] is True

    # Print the tracked key and tracked_is_true
    print(f"{penguin_305_details['tracked']}: {tracked_is_true}")

True: True


## Sets in Python: Unordered Collections and Optimised Logic Operations

**Sets** are a fundamental data structure in Python, offering unique, unordered, and mutable collections optimised for rapid membership testing and powerful mathematical operations. Python's set type is a direct, practical implementation of set theory, making it a critical tool for any programmer needing to handle distinct elements, eliminate duplicates, or perform efficient logic-based data analysis.


### Core Properties of Sets

- **Unique Elements:** Sets automatically remove duplicates—every element is unique.
- **Unordered:** The order of elements is not preserved or meaningful.
- **Mutable:** Sets can be modified after creation by adding or removing elements.
- **Efficient Membership Testing:** Sets use hash tables under the hood, enabling extremely fast `in` checks.


### Creating Sets

Sets can be created from any iterable (typically lists or tuples):

```python
my_list = ['A', 'B', 'A', 'C']
my_set = set(my_list)
# my_set: {'A', 'B', 'C'}
```

- Direct creation of an empty set requires `set()`, not `{}` (the latter creates an empty dictionary).
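
A quick sketch of the point above, together with the two other common ways of building a set: a literal and a set comprehension. The sample words are illustrative.

```python
# Empty braces create a dictionary; set() creates an empty set.
empty_dict = {}
empty_set = set()
print(type(empty_dict).__name__)  # dict
print(type(empty_set).__name__)   # set

# A non-empty literal in braces (without key: value pairs) is a set.
vowels = {"a", "e", "i", "o", "u"}

# A set comprehension deduplicates as it builds.
word_lengths = {len(word) for word in ["tea", "cup", "kettle"]}
print(word_lengths == {3, 6})  # True
```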

### Modifying Sets

#### Adding Elements

- Use `.add()` to insert a single element. If the element is already present, the set remains unchanged.

```python
unique_values = set()
unique_values.add('X')
unique_values.add('Y')
# {'X', 'Y'}
unique_values.add('X')
# Still {'X', 'Y'}
```

#### Updating with Multiple Elements

- Use `.update()` to add elements from another iterable or set. All duplicates are ignored.

```python
unique_values.update(['Z', 'Y'])
# {'X', 'Y', 'Z'}
```

### Removing Elements

#### Safe Removal with `.discard()`

- `.discard()` removes an element if present; does nothing if absent (no error).

```python
unique_values.discard('Y')
# {'X', 'Z'}
unique_values.discard('NotThere')  # No error
```

#### Arbitrary Removal with `.pop()`

- `.pop()` removes and returns an arbitrary element (not the “first” or “last,” as sets are unordered).
- Raises `KeyError` if the set is empty.

```python
removed = unique_values.pop()
print(removed)
```
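
The removal methods differ mainly in how they treat a missing element. The sketch below also uses `.remove()`, a standard set method not covered above, which raises `KeyError` where `.discard()` stays silent. The set contents are illustrative.

```python
colours = {"red", "green"}

# Absent element: .discard() does nothing.
colours.discard("blue")

# Absent element: .remove() raises KeyError.
try:
    colours.remove("blue")
    remove_outcome = "removed"
except KeyError:
    remove_outcome = "KeyError"
print(remove_outcome)  # KeyError

# .pop() on an empty set also raises KeyError.
empty = set()
try:
    empty.pop()
    pop_outcome = "popped"
except KeyError:
    pop_outcome = "KeyError"
print(pop_outcome)  # KeyError
```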

### Set Operations: Mathematical Logic at Scale

Sets implement all classical set-theoretic operations, making them invaluable for logic, filtering, and analytics:

#### Union

- Returns all elements present in either set (logical OR).

```python
A = {1, 2, 3}
B = {3, 4}
A.union(B)  # {1, 2, 3, 4}
A | B       # Alternate syntax
```

#### Intersection

- Returns elements present in **both** sets (logical AND).

```python
A.intersection(B)  # {3}
A & B
```

#### Difference

- Returns elements in the first set that are **not** in the second (A - B).

```python
A.difference(B)  # {1, 2}
A - B
```

#### Symmetric Difference

- Returns elements in either set, but **not** both (XOR).

```python
A.symmetric_difference(B)  # {1, 2, 4}
A ^ B
```
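
The method and operator forms are not fully interchangeable: the methods accept any iterable, whereas the operators require both operands to be sets. A brief sketch, which also shows the in-place operator `|=` and two comparison helpers (`issubset`, `isdisjoint`) from the standard set API:

```python
A = {1, 2, 3}
B = {3, 4}

# Methods accept any iterable, such as a list.
from_list = A.union([3, 4])
print(from_list == {1, 2, 3, 4})  # True

# Operators require sets on both sides.
try:
    A | [3, 4]
    operator_outcome = "ok"
except TypeError:
    operator_outcome = "TypeError"
print(operator_outcome)  # TypeError

# In-place variants modify the set instead of returning a new one.
combined = set(A)
combined |= B
print(combined == {1, 2, 3, 4})  # True

# Relationship checks return Booleans.
is_subset = {1, 2}.issubset(A)
is_disjoint = A.isdisjoint({7, 8})
print(is_subset, is_disjoint)  # True True
```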

Python’s set type brings the rigour and power of set theory to everyday programming—delivering robust, efficient, and expressive solutions for handling unordered, unique collections. By using sets for deduplication, membership testing, and logic operations, and following idiomatic patterns, programmers gain both clarity and computational advantage in data-driven applications. Mastering sets is a prerequisite for any Pythonista working with logic, analytics, or large-scale data workflows.


In [69]:
male_penguin_species = {"Adlie", "Gentoo"}

In [70]:
# Use a list comprehension to iterate over each penguin in penguins and save the result as female_species_list.
# If the sex of the penguin is 'FEMALE', keep the species value.
female_species_list = [penguin["species"] for penguin in penguins if penguin["sex"] == "FEMALE"]

# Create a set using the female_species_list as female_penguin_species.
female_penguin_species = set(female_species_list)

# Find the difference between female_penguin_species and male_penguin_species. Store the result as differences.
differences = female_penguin_species.difference(male_penguin_species)

# Print the differences
print(differences)

{'Chinstrap'}


### Finding all the data and the overlapping data between sets
Sets have several methods for combining and comparing them, all based on mathematical set theory. The `.union()` method returns a set of all the elements found in the set you called the method on plus any sets passed as arguments to the method. You can also look for overlapping data in sets by using the `.intersection()` method on a set and passing another set as an argument. It will return an empty set if nothing matches.

Your job in this exercise is to find the union and intersection of the species recorded for male and female penguins, using the two sets created above: `female_penguin_species` and `male_penguin_species`.

In [71]:
# Combine all the species in female_penguin_species and male_penguin_species by computing their union. Store the result as all_species.
all_species = female_penguin_species.union(male_penguin_species)

# Print the count of species in all_species
print(len(all_species))

3


In [72]:
# Find all the species that occur in both female_penguin_species and male_penguin_species by computing their intersection. Store the result as overlapping_species.
overlapping_species = female_penguin_species.intersection(male_penguin_species)

# Print the count of species in overlapping_species
print(len(overlapping_species))

2


## Dictionaries of Unknown Structure: The Power of `defaultdict`

When working with real-world data, it is often impossible to know all the keys that will be present in a dictionary ahead of time, or you may want to aggregate multiple values under the same key without tedious existence checks. Python's `defaultdict` (from the `collections` module) provides an elegant, efficient, and Pythonic solution to these challenges, eliminating boilerplate and making code both cleaner and safer.

### The Problem: Managing Dictionaries with Unknown or Dynamic Keys

#### The Classic Pattern

A common pattern when aggregating data by key is to check if the key already exists in the dictionary and, if not, initialise it:

```python
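# Illustrative sample data: any iterable of (key, value) pairs works here
iterable = [("fruit", "apple"), ("veg", "carrot"), ("fruit", "pear")]

by_category = {}
for key, value in iterable:
    if key not in by_category:
        by_category[key] = []
    by_category[key].append(value)
# by_category: {'fruit': ['apple', 'pear'], 'veg': ['carrot']}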
```

- This approach is verbose and error-prone.
- It clutters logic with checks and initialisations, reducing readability and introducing opportunities for subtle bugs.

### The Pythonic Solution: `collections.defaultdict`

#### What is `defaultdict`?

`defaultdict` is a subclass of `dict` that overrides one behaviour: when a missing key is looked up with square brackets (`d[key]`), it **automatically creates a default value** by calling a factory function (such as `list`, `int`, `set`, or a custom callable) and stores it under that key. The `.get()` method and the `in` operator do not trigger the factory. This means you do not have to check whether a key exists before updating its value.

#### Basic Syntax

```python
from collections import defaultdict

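# Illustrative sample data: any iterable of (key, value) pairs works here
iterable = [("fruit", "apple"), ("veg", "carrot"), ("fruit", "pear")]

# Example: Aggregating lists by key
grouped = defaultdict(list)
for key, value in iterable:
    grouped[key].append(value)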
```

- If `grouped[key]` does not exist, `defaultdict` automatically creates `grouped[key] = []` (or whatever you specify as the default factory) before appending.

#### Supported Factories

- `list`: for aggregating multiple values
- `int`: for counting or summing values
- `set`: for collecting unique items
- Any callable that returns a default value

### Examples

#### 1. Aggregating Items by Key

```python
from collections import defaultdict

records = [('A', 1), ('B', 2), ('A', 3)]
by_key = defaultdict(list)
for key, value in records:
    by_key[key].append(value)
# by_key: {'A': [1, 3], 'B': [2]}
```

#### 2. Counting Occurrences

```python
from collections import defaultdict

counts = defaultdict(int)
for category in ["apple", "banana", "apple"]:
    counts[category] += 1
# counts: {'apple': 2, 'banana': 1}
```

#### 3. Flexible Data Aggregation

You can use any factory function to initialise complex structures, such as a `dict` of sets:

```python
from collections import defaultdict

nested = defaultdict(set)
nested["fruits"].add("apple")
nested["fruits"].add("banana")
# nested: {'fruits': {'apple', 'banana'}}
```

### Pythonic Usage

- Always use `defaultdict` when you are aggregating values under dynamic or unknown keys.
- Select the default factory (`list`, `set`, `int`, etc.) that matches your aggregation logic.
- For nested or hierarchical data, use nested `defaultdict` structures (e.g., `defaultdict(lambda: defaultdict(list))`).
- Avoid mixing `defaultdict` with manual key initialisation—let the data structure do the work.
- For interoperability, you can convert a `defaultdict` back to a regular `dict` if needed via `dict(my_defaultdict)`.
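
The nested pattern mentioned above can be sketched as follows. The `sightings` data is hypothetical, and the final comprehension is one possible way to convert both levels back to plain dictionaries, since `dict()` on its own converts only the outer level.

```python
from collections import defaultdict

# Hypothetical sightings: (park, fur colour, activity)
sightings = [
    ("Union Square", "Gray", "Foraging"),
    ("Union Square", "Cinnamon", "Eating"),
    ("Tompkins Square", "Gray", "Climbing"),
    ("Union Square", "Gray", "Digging"),
]

# Outer keys map to inner defaultdicts, whose keys map to lists.
by_park = defaultdict(lambda: defaultdict(list))
for park, colour, activity in sightings:
    by_park[park][colour].append(activity)

# Convert both levels to plain dicts.
plain = {park: dict(colours) for park, colours in by_park.items()}
print(plain["Union Square"]["Gray"])  # ['Foraging', 'Digging']
```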


### Limitations

- `defaultdict` only invokes the default factory on **missing** keys—pre-existing keys are never overwritten.
- Not always suitable if you require explicit handling or validation before initialising new keys (e.g., for error checking or logging).
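- When serialising or saving, be aware of two details: `json.dumps()` accepts a `defaultdict` because it is a `dict` subclass, but the default factory is lost in the output, and pickling fails when the factory is a `lambda`. Converting to a standard dictionary first avoids both surprises.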
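
One further behaviour to keep in mind, sketched here with illustrative keys: reading a missing key with square brackets inserts it, so a lookup intended only as a check can silently grow the dictionary. Membership tests and `.get()` do not have this side effect.

```python
from collections import defaultdict

counts = defaultdict(int)
counts["apple"] += 1

# Neither `in` nor .get() creates the key.
has_pear = "pear" in counts
pear_via_get = counts.get("pear")
keys_before = sorted(counts)
print(keys_before)  # ['apple']

# Square-bracket access to a missing key creates it with the default value.
pear_count = counts["pear"]
keys_after = sorted(counts)
print(keys_after)  # ['apple', 'pear']
```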


### Using Counter on lists
`Counter`, found in the `collections` module, is a powerful tool for counting, validating, and learning more about the elements within a dataset. You pass an iterable (list, set, tuple) or a dictionary to `Counter`. You can also use a `Counter` object much like a dictionary, with key/value assignment such as `counter[key] = value`.
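
A minimal sketch of these behaviours, using a hypothetical list of votes: missing keys read as zero, `.update()` adds further observations, and direct assignment overwrites a count.

```python
from collections import Counter

votes = Counter(["yes", "no", "yes", "yes"])
print(votes["yes"])  # 3

# A missing key reads as 0 and is not added to the Counter.
abstentions = votes["abstain"]
print(abstentions)  # 0

# .update() adds counts from another iterable.
votes.update(["no", "abstain"])
top = votes.most_common(1)
print(top)  # [('yes', 3)]

# Dictionary-style assignment replaces the stored count.
votes["no"] = 10
print(votes["no"])  # 10
```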

A common usage for `Counter` is checking data for consistency prior to using it, so let's do just that.

In [73]:
# Import the Counter object from collections.
from collections import Counter

In [74]:
# Create a Counter of the penguins list called penguins_sex_counter; use a list comprehension to pass the sex of each penguin to the Counter.
penguins_sex_counter = Counter([penguin["sex"] for penguin in penguins])

In [75]:
# Print penguins_sex_counter
print(penguins_sex_counter)

Counter({'FEMALE': 14, 'MALE': 5, '.': 1})


In [76]:
# Create a Counter of the penguins list called penguins_species_counts; use a list comprehension to return the Species of each penguin to the Counter.
penguins_species_counts = Counter([penguin["species"] for penguin in penguins])

# Print the three most common species counts.
print(penguins_species_counts.most_common(3))

[('Adlie', 9), ('Gentoo', 7), ('Chinstrap', 4)]


### Creating dictionaries of an unknown structure
Occasionally, you'll need a structure to hold nested data, and you may not be certain that the keys will all actually exist. This can be an issue if you're trying to append items to a list for that key. To solve the problem with a regular dictionary, you need to test whether the key exists in the dictionary and, if not, add it with an empty list.

You'll be working with a list of entries that contains the species, sex, and body mass of the female penguins in our study. The next exercise solves this same type of problem with a much simpler approach.

In [77]:
weight_log = [
    ("Chinstrap", "FEMALE", 3800.0),
    ("Adlie", "FEMALE", 3450.0),
    ("Gentoo", "FEMALE", 4300.0),
    ("Adlie", "FEMALE", 3550.0),
    ("Adlie", "FEMALE", 3175.0),
]

In [78]:
# Create an empty dictionary called female_penguin_weights.
female_penguin_weights = {}

# Iterate over weight_log, unpacking it into the variables species, sex, and body_mass.
for species, sex, body_mass in weight_log:

    # Check to see if species is already in the dictionary
    if species not in female_penguin_weights:

        # Create an empty list for any missing species
        female_penguin_weights[species] = []

    # Append the sex and body_mass as a tuple to the species key's list
    female_penguin_weights[species].append((sex, body_mass))

# Print the weights for 'Adlie'
print(female_penguin_weights["Adlie"])

[('FEMALE', 3450.0), ('FEMALE', 3550.0), ('FEMALE', 3175.0)]


### Safely appending to a key's value list
Often when working with dictionaries, you need to initialize a value before you can use it. A prime example is a list, which has to be created for each key before you can append to it.

A `defaultdict` lets you define what each uninitialized key will contain. When creating a `defaultdict`, you pass it a callable that produces the default value, typically a type such as `list`, `tuple`, `set`, `int`, `str`, or `dict`.

You'll be working with the same weight log as last exercise, but with the male penguins in our study.


- Iterate over the list `weight_log`, unpacking it into the variables `species`, `sex`, and `body_mass`, as you did in the previous exercise. Use `species` as the key of the `male_penguin_weights` dictionary and append `body_mass` to its value.
- Print the first 2 items of the `male_penguin_weights` dictionary. You can use the `.items()` method for this; remember to convert the result to a list first.

In [79]:
weight_log = [
    ("Gentoo", "MALE", 5500.0),
    ("Chinstrap", "MALE", 4300.0),
    ("Adlie", "MALE", 3800.0),
    ("Gentoo", "MALE", 5800.0),
    ("Chinstrap", "MALE", 4100.0),
    ("Adlie", "MALE", 3975.0),
    ("Gentoo", "MALE", 5400.0),
    ("Chinstrap", "MALE", 4800.0),
    ("Chinstrap", "MALE", 3950.0),
    ("Gentoo", "MALE", 5250.0),
    ("Gentoo", "MALE", 4925.0),
    ("Adlie", "MALE", 3950.0),
    ("Chinstrap", "MALE", 3800.0),
    ("Chinstrap", "MALE", 4050.0),
    ("Adlie", "MALE", 3650.0),
]

In [80]:
# Import defaultdict from collections.
from collections import defaultdict

# Create a defaultdict with a default type of list called male_penguin_weights.
male_penguin_weights = defaultdict(list)

# Iterate over weight_log, unpacking each entry into species, sex, and body_mass.
for species, sex, body_mass in weight_log:
    # Use the species as the key, and append the body_mass to it
    male_penguin_weights[species].append(body_mass)

# Print the first 2 items of the male_penguin_weights dictionary
print(list(male_penguin_weights.items())[:2])

[('Gentoo', [5500.0, 5800.0, 5400.0, 5250.0, 4925.0]), ('Chinstrap', [4300.0, 4100.0, 4800.0, 3950.0, 3800.0, 4050.0])]


## `namedtuple`: Readable, Immutable, and Pythonic Structured Data


A **`namedtuple`** is an advanced tuple type in Python, provided by the `collections` module, that allows you to assign names to the fields of a tuple. This feature combines the efficiency and immutability of standard tuples with the readability and clarity of named fields, making it a powerful tool for writing expressive, self-documenting, and error-resistant code. `namedtuple` is often used as a lightweight alternative to custom classes or even single rows in a Pandas DataFrame, especially when you need immutable, fixed-structure records with named access.


### What is a `namedtuple`?

- A subclass of the built-in tuple.
- Each position in the tuple has a **field name** for attribute-style access.
- The structure is **fixed**: all instances share the same fields and order.
- Remains **immutable**: fields cannot be changed after instantiation.
- Fields can be accessed either by name (`record.field_name`) or by index (`record[index]`).

### Why Use `namedtuple`?

- **Readability:** Attribute access is clearer and less error-prone than positional index access.
- **Self-Documentation:** The field names convey the meaning of each value.
- **Performance:** Nearly as fast and memory-efficient as regular tuples, much lighter than dictionaries or custom classes.
- **Immutability:** Ensures data consistency and safety, crucial in concurrent or functional programming paradigms.
- **Pythonic:** Leverages Python’s object model for expressive, clear, and idiomatic code.

### Creating a `namedtuple`

Import and define a new `namedtuple` type by specifying a name and a sequence of field names:

```python
from collections import namedtuple

# Define a new namedtuple type with three fields
Record = namedtuple('Record', ['field1', 'field2', 'field3'])
```

- Field names can be any valid Python identifier that is not a keyword and does not start with an underscore.
- Fields can be passed as a list, tuple, or a whitespace/comma-separated string.
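
The field-specification options above can be sketched as follows. The type names are illustrative, and the `defaults` argument (available from Python 3.7) is an extra not covered above; it supplies values for the rightmost fields.

```python
from collections import namedtuple

# Three equivalent ways to declare the same fields
FromList = namedtuple("FromList", ["name", "quantity"])
FromString = namedtuple("FromString", "name quantity")
FromCommas = namedtuple("FromCommas", "name, quantity")

same_fields = FromList._fields == FromString._fields == FromCommas._fields
print(same_fields)  # True

# defaults apply to the rightmost fields, here only `quantity`.
Cookie = namedtuple("Cookie", ["name", "quantity"], defaults=[1])
single = Cookie("chocolate chip")
print(single)  # Cookie(name='chocolate chip', quantity=1)
```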

### Instantiating and Using `namedtuple` Objects

Create an instance just like a tuple, but with named arguments for clarity:

```python
record = Record(field1='A', field2='B', field3='C')

# Access by attribute name
print(record.field1)  # 'A'

# Access by index
print(record[1])      # 'B'
```

#### Iterating Over `namedtuple` Objects

You can unpack, iterate, or convert to a dictionary for flexible use:

```python
# Unpack fields
a, b, c = record

# Convert to dict for serialization or further manipulation
record_dict = record._asdict()
```

#### Example: Collecting Structured Records

```python
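from collections import namedtuple

Record = namedtuple('Record', ['name', 'location', 'category'])

# Illustrative sample data standing in for any iterable of 3-tuples
iterable_of_tuples = [
    ('Oak', 'Union Square Park', 'tree'),
    ('Bench 12', 'Tompkins Square Park', 'furniture'),
]

records = []
for item in iterable_of_tuples:
    details = Record(item[0], item[1], item[2])
    records.append(details)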

# Attribute access is natural and readable
for rec in records:
    print(rec.name, rec.location, rec.category)
```

### Rationale 

- **Immutability** supports safer code in concurrent and functional contexts.
- **Fixed structure** makes data validation, serialisation, and documentation easier.
- **Expressiveness** of named fields minimises mistakes and improves code review and maintainability.

#### Comparison to Alternatives

- **vs. Tuples:** More readable, less error-prone (no magic index numbers).
- **vs. Dictionaries:** More memory-efficient and guarantees a consistent set and order of fields; attribute access reads more cleanly than string keys.
- **vs. Classes:** Lighter and quicker to define, but less suited to records that need custom methods, validation, or mutable state.


### Pythonic Patterns 

- Use `namedtuple` for simple, immutable data structures, configuration records, or as return values for functions that return multiple related values.
- For mutable records, prefer `dataclasses.dataclass` (Python 3.7+).
- Use `_replace()` to create a new instance with some values changed.
- Use `_fields` to inspect field names programmatically.
- Use `_asdict()` for quick conversion to a dictionary.
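
A short sketch of the three helper methods, using a hypothetical `Point` type. It also shows that assigning to a field raises `AttributeError`, which is why `_replace()` returns a new instance instead of modifying the original.

```python
from collections import namedtuple

Point = namedtuple("Point", ["x", "y"])
origin = Point(0, 0)

# Inspect the field names.
print(Point._fields)  # ('x', 'y')

# _replace() returns a new instance; the original is untouched.
moved = origin._replace(x=5)
print(moved)   # Point(x=5, y=0)
print(origin)  # Point(x=0, y=0)

# _asdict() maps field names to values.
as_dict = dict(moved._asdict())
print(as_dict)  # {'x': 5, 'y': 0}

# Direct assignment is not allowed on an immutable record.
try:
    origin.x = 1
    assigned = True
except AttributeError:
    assigned = False
print(assigned)  # False
```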



Python’s `namedtuple` is a best-of-both-worlds solution for lightweight, immutable, readable data structures. It is the most Pythonic tool for situations where you want tuple efficiency but with named access and documentation. Mastery of `namedtuple` enables you to write robust, clear, and high-performance code for a wide range of structured data scenarios.


### Creating namedtuples for storing data
Oftentimes when working with data, you will use a dictionary just so you can use key names to make the code easier to read and the data easier to access. Python has another container called a `namedtuple`, which is a tuple that has a name for each position. You create one by passing a name for the tuple type and a list of field names.

For example, `Cookie = namedtuple("Cookie", ['name', 'quantity'])` will create a container, and you can create new ones of the type using `Cookie('chocolate chip', 1)` where you can access the name using the `name` attribute, and then get the quantity using the `quantity` attribute.

In this exercise, you're going to restructure the penguin weight log data you've been working with into `namedtuples` for more descriptive code.

```python
weight_log = [
    ("Gentoo", "MALE", 5500.0),
    ("Chinstrap", "MALE", 4300.0),
    ("Adlie", "MALE", 3800.0),
    ("Gentoo", "MALE", 5800.0),
    ("Chinstrap", "MALE", 4100.0),
    ("Adlie", "MALE", 3975.0),
    ("Gentoo", "MALE", 5400.0),
    ("Chinstrap", "MALE", 4800.0),
    ("Chinstrap", "FEMALE", 3800.0),
    ("Adlie", "FEMALE", 3450.0),
    ("Chinstrap", "MALE", 3950.0),
    ("Gentoo", "MALE", 5250.0),
    ("Gentoo", "FEMALE", 4300.0),
    ("Gentoo", "MALE", 4925.0),
    ("Adlie", "FEMALE", 3550.0),
    ("Adlie", "MALE", 3950.0),
    ("Chinstrap", "MALE", 3800.0),
    ("Chinstrap", "MALE", 4050.0),
    ("Adlie", "MALE", 3650.0),
    ("Adlie", "FEMALE", 3175.0),
]
```

```python
# Import namedtuple from collections.
from collections import namedtuple

# Create the SpeciesDetails type with fields 'species', 'sex', and 'body_mass'.
SpeciesDetails = namedtuple("SpeciesDetails", ["species", "sex", "body_mass"])

# Create a list called labeled_entries.
labeled_entries = []

# Iterate over weight_log, unpacking each row into species, sex, and body_mass.
for species, sex, body_mass in weight_log:
    # Create a SpeciesDetails instance for the row and append it to labeled_entries.
    details = SpeciesDetails(species, sex, body_mass)
    labeled_entries.append(details)

print(labeled_entries[5])
```

```
SpeciesDetails(species='Adlie', sex='MALE', body_mass=3975.0)
```
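
The loop above can also be written more compactly. The sketch below shows two equivalent forms, star-unpacking and the `_make()` class method; it repeats the first three rows of `weight_log` under the assumed name `rows` so that it runs on its own.

```python
from collections import namedtuple

SpeciesDetails = namedtuple("SpeciesDetails", ["species", "sex", "body_mass"])

# First three rows of weight_log, repeated here so the snippet is self-contained.
rows = [
    ("Gentoo", "MALE", 5500.0),
    ("Chinstrap", "MALE", 4300.0),
    ("Adlie", "MALE", 3800.0),
]

# Star-unpack each row into the positional fields.
via_unpacking = [SpeciesDetails(*row) for row in rows]

# _make() builds an instance from any iterable of the right length.
via_make = [SpeciesDetails._make(row) for row in rows]

print(via_unpacking == via_make)  # True
print(via_make[1].body_mass)      # 4300.0
```

Either form avoids the temporary `details` variable; the explicit loop remains a reasonable choice when each row needs extra cleaning before it is stored.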


### Leveraging attributes on namedtuples
Once you have a namedtuple, you can write more expressive code that is easier to understand. Remember, you can access the elements in the tuple by their name as an attribute. For example, you can access the species of the namedtuples in the previous exercise using the `.species` attribute.

Here, you'll use the tuples you made in the previous exercise to see how this works.

```python
# Iterate over the first twenty entries in the labeled_entries list:
for entry in labeled_entries[:20]:

    # If it is a Chinstrap species:
    if entry.species == "Chinstrap":

        # Print the entry's sex and body_mass separated by a colon.
        print(f"{entry.sex}:{entry.body_mass}")
```

```
MALE:4300.0
MALE:4100.0
MALE:4800.0
FEMALE:3800.0
MALE:3950.0
MALE:3800.0
MALE:4050.0
```
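
Attribute access also keeps aggregation code readable. As a minimal sketch, the snippet below groups body masses by species and averages them. It uses only the first five rows of `weight_log`, stored under the assumed name `sample`, so the numbers are not results for the full log.

```python
from collections import defaultdict, namedtuple

SpeciesDetails = namedtuple("SpeciesDetails", ["species", "sex", "body_mass"])

# First five rows of weight_log, repeated here so the snippet is self-contained.
sample = [
    SpeciesDetails("Gentoo", "MALE", 5500.0),
    SpeciesDetails("Chinstrap", "MALE", 4300.0),
    SpeciesDetails("Adlie", "MALE", 3800.0),
    SpeciesDetails("Gentoo", "MALE", 5800.0),
    SpeciesDetails("Chinstrap", "MALE", 4100.0),
]

# Collect the body masses for each species.
masses_by_species = defaultdict(list)
for entry in sample:
    masses_by_species[entry.species].append(entry.body_mass)

# Average each group.
mean_mass = {
    species: sum(masses) / len(masses)
    for species, masses in masses_by_species.items()
}

print(mean_mass)  # {'Gentoo': 5650.0, 'Chinstrap': 4200.0, 'Adlie': 3800.0}
```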
