# Native Data Structures

## Introduction

We have seen how to work with primitive data types such as numbers and primitive operations like arithmetic. We learned to build compound functions through composition and control, and to create functional abstractions by assigning names to processes. Moreover, we observed that higher-order functions extend the expressive power of a language by allowing us to manipulate and reason about general methods of computation—capturing much of the essence of programming.

We now turn our attention to data itself. The techniques introduced here will enable us to represent and process information across many domains. With the explosive growth of data, a vast volume of structured information is now available to both businesses and individuals, enabling computation to address a wide spectrum of problems. Mastering the use of built-in and user-defined data types is therefore essential to building effective data processing applications.

## Range

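A range is an immutable sequence of evenly spaced integers, most often consecutive ones. It is defined by a start, a stop (which is excluded), and an optional step.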

In [None]:
a_list = [1, 2, 3]
type(a_list)

list

In [2]:
a_range = range(10)
type(a_range)

range

In [8]:
a_range = range(0, 10, 2)

In [9]:
print(a_range)

range(0, 10, 2)
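Printing a range shows its arguments rather than its elements. The sketch below follows the `range(start, stop, step)` form used above and converts to a list to make the elements visible; the variable names are chosen here only for illustration.

```python
# range(start, stop, step): stop is never included in the result.
evens = range(0, 10, 2)
print(list(evens))        # [0, 2, 4, 6, 8]

# A negative step counts down.
countdown = range(5, 0, -1)
print(list(countdown))    # [5, 4, 3, 2, 1]

# A range supports len, indexing, and membership without building a list.
print(len(evens))         # 5
print(evens[1])           # 2
print(6 in evens)         # True
```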


## List

In [15]:
type([1, 2, 3])

list

In [18]:
list("alice2123")

['a', 'l', 'i', 'c', 'e', '2', '1', '2', '3']

In [19]:
list(range(10))

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

In [1]:
[1, 3, 4] + [5, 7, 9]

[1, 3, 4, 5, 7, 9]

In [4]:
a = [1, 3, 4]
b = [5, 7, 9]
a + b

[1, 3, 4, 5, 7, 9]

In [None]:
a = [1, 3, 4]
b = [5, 7, 9]


In [7]:
a = list(range(10))
a[2:6]

[2, 3, 4, 5]

In [9]:
a

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

In [11]:
len(a)

10

In [10]:
a_str = "hello"
len(a_str)

5

In [13]:
a = [1, 3, 4]
b = ["a", "b", "c"]

In [37]:
a = [1, 2, 3]
b = ["a", "b", "c"]

In [47]:
a = [1, 2, 3, 4, 5]

In [48]:
a.pop(2)

3

In [49]:
a

[1, 2, 4, 5]

In [50]:
a.extend(range(6, 10))

A value can be tested for membership in a sequence. Python has two operators, `in` and `not in`, that evaluate to `True` or `False` depending on whether an element appears in a sequence.

In [57]:
digits = [2, 3, 4]
2 in digits

True

In [None]:
5 not in digits

True

In [62]:
def find_common_element(array_left, array_right):
    result = []
    for a in array_left:
        if a in array_right:
            result.append(a)
    return result


find_common_element([1, 2, 3], [2, 3, 4, 5])

[2, 3]

List comprehensions are a concise Python syntax for creating lists by transforming and optionally filtering items from an iterable in a single expression.

Basic form: `[<expression> for <item> in <iterable>]`

With filtering: `[<expression> for <item> in <iterable> if <condition>]`

With conditional: `[<expr_if_true> if <condition> else <expr_if_false> for <item> in <iterable>]`

In [None]:
def some_operation(array):
    result = []
    for a in array:
        result.append(a**2)
    return result


some_operation([1, 2, 3])

[1, 4, 9]

Squares:

In [67]:
[n**2 for n in range(1, 6)]

[1, 4, 9, 16, 25]

Even numbers:

In [72]:
[n for n in range(10) if n % 2 == 0]

[0, 2, 4, 6, 8]

Normalize strings:

In [75]:
names = [" Alice ", " BoB "]
[s.strip().lower() for s in names]

['alice', 'bob']

Cartesian pairs:

In [77]:
[(i, j) for i in range(3) for j in range(2)]

[(0, 0), (0, 1), (1, 0), (1, 1), (2, 0), (2, 1)]
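The conditional form listed earlier is not covered by the examples above. A minimal sketch, with the labels `"even"` and `"odd"` chosen purely for illustration, shows how it differs from filtering: the conditional expression keeps every item and chooses a value, while a trailing `if` drops items.

```python
numbers = range(5)

# Conditional expression: one output per input item.
labels = ["even" if n % 2 == 0 else "odd" for n in numbers]
print(labels)   # ['even', 'odd', 'even', 'odd', 'even']

# Trailing if: items that fail the test are dropped.
odds = [n for n in numbers if n % 2 != 0]
print(odds)     # [1, 3]
```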

**Exercise**

Implement `promoted`, which takes a sequence `s` and a one-argument function `f`. It returns a list with the same elements as `s`, but with all elements `e` for which `f(e)` is a true value placed first. Among those placed first and those placed after, the order stays the same.

In [None]:
# Equal to
# def odd(x):
#     return x % 2 != 0
odd = lambda x: x % 2 != 0


def promoted(s, f):
    """
    Return a list with the same elements as s, but with all
    elements e for which f(e) is a true value placed first.

    >>> promoted(range(10), odd) # odds in front
    [1, 3, 5, 7, 9, 0, 2, 4, 6, 8]
    """
    return "____"

In [None]:
def promoted(s, f):
    result = [e for e in s if f(e)]
    falses = [e for e in s if not f(e)]
    result.extend(falses)
    return result


promoted(range(10), lambda x: x % 3 == 0)

Implement `double_eights`, which takes a list `s` and returns whether two consecutive items are both 8.

In [None]:
def double_eights(s):
    """
    Return whether two consecutive items of list s are 8.

    >>> double_eights([1, 2, 8, 8])
    True
    >>> double_eights([8, 8, 0])
    True
    >>> double_eights([5, 3, 8, 8, 3, 5])
    True
    >>> double_eights([2, 8, 4, 6, 8, 2])
    False
    """

    for i in "___":
        if "___":
            return True
    return False

In [None]:
def double_eights(s):
    """
    Return whether two consecutive items of list s are 8.

    >>> double_eights([1, 2, 8, 8])
    True
    >>> double_eights([8, 8, 0])
    True
    >>> double_eights([5, 3, 8, 8, 3, 5])
    True
    >>> double_eights([2, 8, 4, 6, 8, 2])
    False
    """

    if "___":
        return True
    elif len(s) < 2:
        return False
    else:
        return "___"

Implement the function `squares`, which takes in a list of positive integers. It returns a list that contains the square roots of the elements of the original list that are perfect squares. Use a list comprehension.

Hint: To find if `x` is a perfect square, you can check if `sqrt(x)` equals `round(sqrt(x))`.

In [None]:
from math import sqrt


def squares(s):
    """Returns a new list containing square roots of the elements of the
    original list that are perfect squares.

    >>> seq = [8, 49, 8, 9, 2, 1, 100, 102]
    >>> squares(seq)
    [7, 3, 1, 10]
    >>> seq = [500, 30]
    >>> squares(seq)
    []
    """
    return ["___" for n in s if "___"]

## String

In [110]:
import datetime

time_now = datetime.datetime.now()
time_now_str = f"Current time is: {time_now}."
print(time_now_str)

Current time is: 2025-10-05 14:24:07.787561.


In [112]:
city = "Frankfurt"

In [113]:
len(city)

9

In [114]:
city[:3]

'Fra'

In [115]:
country = "Germany"

In [116]:
city + ", " + country

'Frankfurt, Germany'

In [None]:
"Shabu " * 2  # Equal to "Shabu " + "Shabu "

'Shabu Shabu '

In [None]:
["Shabu"] * 2  # Equal to ["Shabu"] + ["Shabu"]

['Shabu', 'Shabu']

Use `in` with strings to test for substring containment, not whole-word membership.

In [121]:
"Where" in "Where's Waldo?"

True
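To make the difference between substring containment and whole-word membership concrete, here is a small sketch. Splitting on whitespace is only one possible way to define "words" and is an assumption of this example.

```python
title = "Where's Waldo?"

# `in` on a string looks for a contiguous run of characters.
print("Wal" in title)       # True: "Wal" is a substring of "Waldo?"

# To test for a whole word, split into words first; `in` on a list
# compares against whole elements.
words = title.split()
print(words)                # ["Where's", 'Waldo?']
print("Wal" in words)       # False
print("Waldo?" in words)    # True
```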

A string can be created from any object in Python by calling the `str` constructor with the object as its argument.

In [124]:
digits = [2, 3, 4]
digits_str = str(digits)
digits_str[0]

'['

In [125]:
digits = [2, 3, 4]
str(2) + " is an element of " + str(digits)

'2 is an element of [2, 3, 4]'

In [126]:
"Alice".lower()

'alice'

In [127]:
"Alice".upper()

'ALICE'

In [None]:
"  Alice    ".strip().lower()

'alice'

In [138]:
name = "  AlICe    "
name_cleaned = name.strip().lower()

In [140]:
name

'  AlICe    '

In [139]:
name_cleaned

'alice'

In [None]:
name[3] = "a"  # TypeError: 'str' object does not support item assignment

A string can be split into a list of substrings, and a list of strings can be joined back into a single string.

In [None]:
sentence = "Alice is happy."
sentence_list = sentence.split(" ")
sentence_list

['Alice', 'is', 'happy.']


In [147]:
words = ["A", "Wonderful", "Day"]
" ".join(words)

'A Wonderful Day'

In [148]:
", ".join(words)

'A, Wonderful, Day'

**Exercise**

When parking vehicles in a row:

* A **motorcycle** takes up **1 parking spot**,
* A **car** takes up **2 adjacent parking spots**.

We represent a row of `n` adjacent parking spots as a string of length `n`:

* `'%'` for a motorcycle,
* `'<>'` for a car,
* `'.'` for an empty spot.

For example:

```
'.%%.<><>'
```

represents a row of 8 spots: an empty spot, two motorcycles, an empty spot, and two cars.

Implement the function `park` that returns all possible ways vehicles can be parked in `n` adjacent spots, represented as strings. Spots may be left empty.

```python
def park(n):
    """Return the ways to park cars and motorcycles in n adjacent spots.

    >>> park(1)
    ['%', '.']

    >>> park(2)
    ['%%', '%.', '.%', '..', '<>']

    >>> len(park(4))  # some examples: '<><>', '.%%.', '%<>%', '%.<>'
    29
    """
    if n < 0:
        return "____"
    elif n == 0:
        return "____"
    else:
        return "____"
```

## Mutability

Mutability describes whether an object’s contents can change after it is created. Lists and dictionaries are mutable, while tuples and strings are immutable.

In [None]:
a = "astring"
a[3] = "s"  # TypeError: 'str' object does not support item assignment
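The contrast between a mutable list and an immutable string can be sketched side by side. The variable names are illustrative, and the `try`/`except` is there only so that the example runs to the end.

```python
# Lists are mutable: item assignment changes the list in place.
letters = ["a", "b", "c"]
letters[1] = "z"
print(letters)            # ['a', 'z', 'c']

# Strings are immutable: item assignment raises TypeError.
word = "astring"
try:
    word[3] = "s"
except TypeError as error:
    print("TypeError:", error)

# To "change" a string, build a new one; the original is untouched.
new_word = word[:3] + "S" + word[4:]
print(new_word)           # astSing
print(word)               # astring
```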

## Tuple

A tuple is an immutable sequence. In CPython it also tends to use slightly less memory than a list holding the same items. Use a list when the collection may change; use a tuple when the data is fixed and should be protected from modification.

In [153]:
a_tuple = ("Hello", "Python", "World")

In [155]:
rgb = (255, 128, 0)
rgb[1]

128
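A short sketch of what "fixed" means in practice, using a made-up two-element point. It also shows one practical consequence of immutability: a tuple of hashable items can be used as a dictionary key, which a list cannot. The size comparison at the end is printed without an expected value because the numbers depend on the interpreter.

```python
import sys

point_list = [3, 4]
point_tuple = (3, 4)

# Item assignment on a tuple raises TypeError.
try:
    point_tuple[0] = 10
except TypeError as error:
    print("TypeError:", error)

# Tuples unpack into names like any other sequence.
x, y = point_tuple
print(x, y)                  # 3 4

# A tuple of hashable items can serve as a dictionary key.
distances = {point_tuple: 5.0}
print(distances[(3, 4)])     # 5.0

# Memory footprint of each container (interpreter-dependent).
print(sys.getsizeof(point_tuple), sys.getsizeof(point_list))
```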

## Dictionary

Dictionaries are Python's built-in data type for storing and manipulating correspondence relationships. A dictionary contains key-value pairs, where both the keys and values are objects. The purpose of a dictionary is to provide an abstraction for storing and retrieving values that are indexed not by consecutive integers, but by descriptive keys.

Strings commonly serve as keys, because strings are our conventional representation for names of things. This dictionary literal gives the values of various Roman numerals.

In [160]:
numerals = {"I": 1, "V": 5, "X": 10}
numerals["X"]

10

In [161]:
numerals["V"]

5

In [162]:
numerals["L"] = 50

In [167]:
numerals.values()

dict_values([1, 5, 10, 50])

In [61]:
numerals.keys()

dict_keys(['I', 'V', 'X', 'L'])

In [180]:
numerals.items()

dict_items([('I', 1), ('V', 5), ('X', 10), ('L', 50)])

In [182]:
for k, v in numerals.items():
    print(k, "->", v)

I -> 1
V -> 5
X -> 10
L -> 50


In [183]:
"X" in numerals

True

In [80]:
10 in numerals

False

In [184]:
10 in numerals.values()

True
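As the cells above show, `in` tests a dictionary's keys, not its values. The sketch below looks up a key that may be absent; the key `"M"` and its value are added here only for illustration.

```python
numerals = {"I": 1, "V": 5, "X": 10, "L": 50}

# Indexing with a missing key raises KeyError.
try:
    numerals["M"]
except KeyError as error:
    print("Missing key:", error)

# get returns a default (None unless specified) instead of raising.
print(numerals.get("M"))       # None
print(numerals.get("M", 0))    # 0
print(numerals.get("X", 0))    # 10

# Guarding with `not in` checks the keys before inserting.
if "M" not in numerals:
    numerals["M"] = 1000
print(numerals["M"])           # 1000
```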

Dictionary comprehensions let you build dicts in one expression by generating key–value pairs from an iterable, with optional filtering.

Basic form: `{<key_expr>: <value_expr> for <item> in <iterable>}`

With filtering: `{<key_expr>: <value_expr> for <item> in <iterable> if <condition>}`

With conditional value: `{<key_expr>: (<value_if_true> if <condition> else <value_if_false>) for <item> in <iterable>}`

With conditional key and value: `{(<key_if_true> if <cond> else <key_if_false>): (<val_if_true> if <cond> else <val_if_false>) for <item> in <iterable>}`

Squares map:

In [196]:
a_dict = {n: n * n for n in range(25, 30)}

Filtered:

In [None]:
{n: n * n for n in range(10) if n % 2 == 0}

{0: 0, 2: 4, 4: 16, 6: 36, 8: 64}

From existing dict with transform:

In [199]:
d = {"I": 1, "V": 5, "X": 10}

{k.lower(): v * 2 for k, v in d.items()}

{'i': 2, 'v': 10, 'x': 20}

From two iterables:

In [205]:
keys = ["one", "two", "three"]
values = [1, 2, 3]
{k: v for k, v in zip(keys, values)}

{'one': 1, 'two': 2, 'three': 3}



Use a dictionary comprehension when you need to build a new dict in a single expression; prefer a regular loop if the expression gets hard to read.
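The conditional-value form listed earlier is not among the examples above. A minimal sketch, with the labels chosen for illustration, together with the `dict(zip(...))` shortcut that gives the same result as the two-iterable comprehension when no transformation is needed:

```python
# Conditional value: every key is kept, the value depends on the test.
parity = {n: ("even" if n % 2 == 0 else "odd") for n in range(4)}
print(parity)   # {0: 'even', 1: 'odd', 2: 'even', 3: 'odd'}

# Pairing two iterables without transforming them: dict(zip(...)) is
# equivalent to the comprehension form.
keys = ["one", "two", "three"]
values = [1, 2, 3]
paired = dict(zip(keys, values))
print(paired)   # {'one': 1, 'two': 2, 'three': 3}
print(paired == {k: v for k, v in zip(keys, values)})   # True
```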

**Exercise**

Write a function `count_frequency(seq)` that takes a sequence of elements and returns a dictionary mapping each distinct element to the number of times it appears.

In [None]:
seq = ["hi", "I", "am", "Alexa", "I", "would", "just", "like", "to", "say", "hi"]

In [213]:
result = {"hi": 1}

In [216]:
result.get("hi", 0)

1

In [217]:
def count_frequency(seq):
    """
    Return a dictionary of element frequencies in seq.

    >>> seq = ["hi", "I", "am", "Alice", "I", "would", "just", "like", "to", "say", "hi"]
    >>> count_frequency(seq)
    {'hi': 2, 'I': 2, 'am': 1, 'Alice': 1, 'would': 1, 'just': 1, 'like': 1, 'to': 1, 'say': 1}
    """
    result = {}
    for e in seq:
        result[e] = result.get(e, 0) + 1
    return result


seq = ["hi", "I", "am", "Alice", "I", "would", "just", "like", "to", "say", "hi"]
count_frequency(seq)

{'hi': 2,
 'I': 2,
 'am': 1,
 'Alice': 1,
 'would': 1,
 'just': 1,
 'like': 1,
 'to': 1,
 'say': 1}

Implement `divide`, which takes two lists of positive integers `quotients` and `divisors`. It returns a dictionary whose keys are the elements of `quotients`. For each key `q`, its corresponding value is a list of all the elements of `divisors` that can be evenly divided by `q`.

Hint: The value for each key needs to be a list, so a list comprehension might be useful here.

In [None]:
def divide(quotients, divisors):
    """Return a dictionary in which each quotient q is a key for the list of
    divisors that it divides evenly.

    >>> divide([3, 4, 5], [8, 9, 10, 11, 12])
    {3: [9, 12], 4: [8, 12], 5: [10]}
    >>> divide(range(1, 5), range(20, 25))
    {1: [20, 21, 22, 23, 24], 2: [20, 22, 24], 3: [21, 24], 4: [20, 24]}
    """
    return {q: [d for d in divisors if d % q == 0] for q in quotients}


divide(range(1, 5), range(20, 25))

{1: [20, 21, 22, 23, 24], 2: [20, 22, 24], 3: [21, 24], 4: [20, 24]}

Implement the `buy` function, which takes three parameters:

* `fruits_to_buy`: A list of strings representing the fruits you need to buy. At least one of each fruit must be bought.
* `prices`: A dictionary where the keys are fruit names (strings) and the values are positive integers representing the cost of each fruit.
* `total_amount`: An integer representing the total money available for purchasing the fruits.

Take a look at the docstring for more details on the input structure.

The function should print all possible ways to buy the required fruits so that the combined cost equals `total_amount`. You can only select fruits listed in `fruits_to_buy`.

Note: You can use the `display` function to format the output. Call `display(fruit, count)` for each fruit and its corresponding quantity to generate a string showing the type and amount of fruit bought.

Hint: How can you ensure that every combination includes at least one of each fruit listed in `fruits_to_buy`?

In [None]:
def buy(fruits_to_buy, prices, total_amount):
    """Print ways to buy some of each fruit so that the sum of prices is total_amount.

    >>> prices = {'oranges': 4, 'apples': 3, 'bananas': 2, 'kiwis': 9}
    >>> buy(['apples', 'oranges', 'bananas'], prices, 12)  # We can only buy apple, orange, and banana, but not kiwi
    [2 apples][1 orange][1 banana]
    >>> buy(['apples', 'oranges', 'bananas'], prices, 16)
    [2 apples][1 orange][3 bananas]
    [2 apples][2 oranges][1 banana]
    >>> buy(['apples', 'kiwis'], prices, 36)
    [3 apples][3 kiwis]
    [6 apples][2 kiwis]
    [9 apples][1 kiwi]
    """

    def add(fruits, amount, cart):
        if fruits == [] and amount == 0:
            print(cart)
        elif fruits and amount > 0:
            fruit = fruits[0]
            price = ____
            for k in ____:
                # Hint: The display function will help you add fruit to the cart.
                add(____, ____, ____)

    add(fruits_to_buy, total_amount, "")


def display(fruit, count):
    """Display a count of a fruit in square brackets.

    >>> display('apples', 3)
    '[3 apples]'
    >>> display('apples', 1)
    '[1 apple]'
    >>> print(display('apples', 3) + display('kiwis', 3))
    [3 apples][3 kiwis]
    """
    assert count >= 1 and fruit[-1] == "s"
    if count == 1:
        fruit = fruit[:-1]  # get rid of the plural s
    return "[" + str(count) + " " + fruit + "]"

## Identity and Equality

Identity refers to whether two variables point to the exact same object in memory, and equality refers to whether two objects have the same value.

In [225]:
a = [2, 4]
b = [4, 2]

if a != b:
    print("They are not equal.")
else:
    print("They are equal.")

They are not equal.


In [226]:
a = [1, 2, 4]
b = [1, 2, 4]
a.append(5)

In [230]:
a = [1, 2, 4]
b = a
a.append(5)

In [231]:
b

[1, 2, 4, 5]

In [232]:
id(a)

4595952512

In [233]:
id(b)

4595952512

In [234]:
a

[1, 2, 4, 5]

In [235]:
a.append(6)

In [236]:
b

[1, 2, 4, 5, 6]

In [237]:
a is b

True

In [238]:
a == b

True

In [2]:
a = [1, 2, 3]
b = [1, 2, 3]

In [3]:
id(a)

4609754112

In [4]:
id(b)

4609751104

In [5]:
a.append(4)

In [6]:
b

[1, 2, 3]
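The two cases above differ only in whether `b` names the same list as `a` or a separate list with equal contents. A short sketch of making an independent copy on purpose, so that later changes to one list do not show up in the other; the names `alias` and `snapshot` are illustrative.

```python
a = [1, 2, 3]
alias = a               # same object, second name
snapshot = a.copy()     # new list with the same items (list(a) or a[:] also work)
# Note: this is a shallow copy; nested lists would still be shared.

a.append(4)

print(alias)            # [1, 2, 3, 4]  (follows a)
print(snapshot)         # [1, 2, 3]     (unaffected)

print(alias is a)       # True
print(snapshot is a)    # False
print(snapshot == [1, 2, 3])   # True: equality compares contents
```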

In [None]:
def a_func_with_optional_parameter(a=None):
    if a is None:
        print("a is not provided.")
    else:
        print(f"a is provided, the value is {a}.")


a_func_with_optional_parameter("hello")

a is provided, the value is hello.


In [None]:
if type(a) is list:
    pass

if isinstance(a, list):
    pass
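The two checks above usually agree, but they differ for subclasses: `type(x) is list` is true only for exact lists, while `isinstance(x, list)` also accepts instances of subclasses. A sketch with a hypothetical subclass `Stack`, introduced here only for illustration:

```python
class Stack(list):
    """A hypothetical list subclass, used only to illustrate the difference."""


s = Stack([1, 2, 3])

print(type(s) is list)       # False: the exact type is Stack
print(isinstance(s, list))   # True: Stack inherits from list

# `is` is also the idiomatic way to compare against the singleton None.
value = None
print(value is None)         # True
```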

**Exercise**

Implement `shuffle`, which takes a sequence `s` (such as a list or range) with an even number of elements. It returns a new list that interleaves the elements of the first half of `s` with the elements of the second half. It does not modify `s`.

To interleave two sequences `s0` and `s1` is to create a new list containing the first element of `s0`, the first element of `s1`, the second element of `s0`, the second element of `s1`, and so on.
For example, if `s0 = [1, 2, 3]` and `s1 = [4, 5, 6]`, then interleaving `s0` and `s1` would result in `[1, 4, 2, 5, 3, 6]`.

In [83]:
def shuffle(s):
    """Return a shuffled list that interleaves the two halves of s.
    >>> shuffle(range(6))
    [0, 3, 1, 4, 2, 5]
    >>> letters = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h']
    >>> shuffle(letters)
    ['a', 'e', 'b', 'f', 'c', 'g', 'd', 'h']
    >>> shuffle(shuffle(letters))
    ['a', 'c', 'e', 'g', 'b', 'd', 'f', 'h']
    >>> letters  # Original list should not be modified
    ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h']
    """
    assert len(s) % 2 == 0, "len(s) must be even"
    "*** YOUR CODE HERE ***"

**Definition:** A nested list of numbers is a list that contains numbers and lists. It may contain only numbers, only lists, or a mixture of both. Any inner lists must themselves be nested lists of numbers.

For example: `[1, [2, [3]], 4]`, `[1, 2, 3]`, and `[[1, 2], [3, 4]]` are all nested lists of numbers.

Write a function `deep_map` that takes two arguments: a one-argument function `f` and a nested list of numbers `s`. It modifies `s` **in place** by applying `f` to each number within `s` and replacing the number with the result of calling `f` on that number.

`deep_map` returns `None` and should **not** create any new lists.

> Hint: `type(a) is list` will evaluate to `True` if `a` is a list.

In [82]:
def deep_map(f, s):
    """Replace all non-list elements x with f(x) in the nested list s.
    >>> six = [1, 2, [3, [4], 5], 6]
    >>> deep_map(lambda x: x * x, six)
    >>> six
    [1, 4, [9, [16], 25], 36]
    >>> # Check that you're not making new lists
    >>> s = [3, [1, [4, [1]]]]
    >>> s1 = s[1]
    >>> s2 = s1[1]
    >>> s3 = s2[1]
    >>> deep_map(lambda x: x + 1, s)
    >>> s
    [4, [2, [5, [2]]]]
    >>> s1 is s[1]
    True
    >>> s2 is s1[1]
    True
    >>> s3 is s2[1]
    True
    """
    "*** YOUR CODE HERE ***"

## Files

### Read and Write

In [8]:
# Modes: 'r' = read, 'w' = write, 'a' = append, 'b' = binary

with open("inputs.txt", "r") as f:
    data = f.read()  # read entire file as string
    print(data)

1
2
3
4
5
6
7
8


In [12]:
# Read line by line
with open("inputs.txt", "r") as f:
    for line in f:
        print(line.strip())

1
2
3
4
5
6
7
8


In [13]:
# Read all lines as list
with open("inputs.txt", "r") as f:
    lines = f.readlines()
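The cells above only read. Here is a sketch of the `'w'` and `'a'` modes mentioned in the comment at the top of this section. It writes into a temporary directory so that no existing file is touched, and the file name `notes.txt` is made up for illustration.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "notes.txt")   # hypothetical file name

    # 'w' creates the file, or truncates it if it already exists.
    with open(path, "w") as f:
        f.write("first line\n")

    # 'a' appends to the end without truncating.
    with open(path, "a") as f:
        f.write("second line\n")

    # Read the result back while the temporary directory still exists.
    with open(path, "r") as f:
        lines = f.readlines()

print(lines)   # ['first line\n', 'second line\n']
```

Note that `readlines` keeps the trailing newline on each line, which is why the line-by-line cell above calls `strip()` before printing.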

### CSV

In [17]:
import csv

# Writing
rows = [
    ["name", "age"],
    ["Alice", 30],
    ["Bob", 25],
]

with open("people.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerows(rows)

# Reading
with open("people.csv", "r") as f:
    reader = csv.reader(f)
    for row in reader:
        print(row)

['name', 'age']
['Alice', '30']
['Bob', '25']
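Note that `csv.reader` returns every field as a string, which is why the ages above print as `'30'` and `'25'`. The sketch below uses `csv.DictReader`, which takes the header row as keys, and converts the age back to `int` explicitly. It reads from an in-memory string rather than `people.csv` so that it runs on its own.

```python
import csv
import io

text = "name,age\nAlice,30\nBob,25\n"

# DictReader takes the first row as field names.
reader = csv.DictReader(io.StringIO(text))

# Convert the age field from str to int while collecting the rows.
people = [{"name": row["name"], "age": int(row["age"])} for row in reader]

print(people)
# [{'name': 'Alice', 'age': 30}, {'name': 'Bob', 'age': 25}]
```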


### YAML

YAML is a human-readable format commonly used for configuration files. The example below requires the third-party `PyYAML` library, which provides the `yaml` module.

In [None]:
import yaml

# Writing
data = {"name": "Alice", "age": 30, "skills": ["Python", "SQL"]}
with open("config.yaml", "w") as f:
    yaml.dump(data, f)

# Reading
with open("config.yaml", "r") as f:
    loaded = yaml.safe_load(f)
    print(loaded["skills"])

['Python', 'SQL']


**Exercise**

You are given a text file called **`quotes.txt`** with the following contents:

```
The journey of a thousand miles begins with one step.
Life is what happens when you're busy making other plans.
That which does not kill us makes us stronger.
When the going gets tough, the tough get going.
```

Read the contents of `quotes.txt` and write a new file called `short_quotes.txt` that contains only the lines with fewer than 50 characters.


In [29]:
def remove_long_lines(input_file, output_file):
    with open(input_file, "r") as f:
        lines = f.readlines()

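    # Ignore the trailing newline when measuring, so a final line without one is treated the same.
    short_lines = [line for line in lines if len(line.rstrip("\n")) < 50]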

    with open(output_file, "w") as f:
        f.writelines(short_lines)


remove_long_lines("quotes.txt", "short_quotes.txt")

## Classes and Objects

In [30]:
class Account:
    def __init__(self, name, balance=0):
        self.name = name
        self.balance = balance

In [31]:
a_account = Account(name="Alice", balance=100)

In [32]:
type(a_account)

__main__.Account
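Attributes set in `__init__` are read with dot notation, and the default `balance=0` applies when the argument is omitted. The sketch below repeats the class so that it runs on its own; the second account, `"Bob"`, is made up for illustration.

```python
class Account:
    def __init__(self, name, balance=0):
        self.name = name
        self.balance = balance


alice = Account(name="Alice", balance=100)
bob = Account("Bob")               # balance falls back to the default 0

print(alice.name, alice.balance)   # Alice 100
print(bob.balance)                 # 0

# Instances are mutable: attributes can be reassigned.
alice.balance = alice.balance + 50
print(alice.balance)               # 150

print(isinstance(alice, Account))  # True
```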