# Data Structures

Organize and store data with Python's built-in collections (lists, tuples, sets, and dictionaries), and understand how mutability affects them.
        


## Learning Objectives

- Create and manipulate lists, tuples, sets, and dictionaries for everyday data tasks.
- Use indexing, slicing, unpacking, and the `del` statement to reshape and clean collections.
- Explain mutability, identity vs. equality, and when to copy objects to avoid side effects.
- Combine structures (e.g., lists of dicts, `zip`, ranges) to model real datasets.


## 1. Lists: ordered, mutable sequences
A list can hold numbers, strings, or even other lists. Use square brackets `[]`.
        


In [None]:
# A list of numbers
prices = [10.5, 20.0, 5.99, 100.0]
print(prices)

# A list of strings
departments = ["Legal", "Economics", "IT", "HR"]
print(departments)

# A mixed list
mixed = ["Vienna", 123, True]
print(mixed)
        


### Accessing elements and checking size
- **Indexing**: Access a specific item by its position (starting at 0; negative indexes count from the end).
- **Slicing**: Get a sub-list with `[start:stop]`; the `stop` position is not included.
- **`len()`**: Returns the number of items in the list, for example how many rows are in your dataset.
- **`in`**: Checks whether an item exists in the list.

In [None]:
print(departments[0])   # First element
print(departments[-1])  # Last element

print("Is IT listed?", "IT" in departments)
print("Number of departments:", len(departments))
        


### List properties and quick stats
Useful helpers include `len()`, `.count()`, `.index()`, and for numeric lists `min()`, `max()`, and `sum()`.
        


In [None]:
transactions = [5, 3, 1, 4, 1]
print("Count of 1s:", transactions.count(1))
print("Index of first 4:", transactions.index(4))
print("Smallest:", min(transactions))
print("Largest:", max(transactions))
print("Total:", sum(transactions))
print("Average:", sum(transactions) / len(transactions))
        


### Modifying lists
`append()` adds one element, `extend()` adds many, and `insert()` places an element at a position. You can also concatenate with `+` or repeat with `*`.
        


In [None]:
team = ["Legal", "IT"]
team.append("HR")  # Add a single element
team.extend(["Finance", "Audit"])  # Add multiple elements
team.insert(1, "Comms")  # Insert at position 1
print(team)

# Concatenate and repeat
new_team = team + ["Procurement"]
print(new_team)
repeated = [1, 2] * 3
print(repeated)
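
A common stumbling block is the difference between `append()` and `extend()` when the argument is itself a list. The sketch below contrasts the two and shows that `+` leaves its operands untouched; the variable names (`nested`, `flat`, `base`, `combined`) are illustrative and not part of the cells above.

```python
# append() adds its argument as ONE element, even if that argument is a list.
nested = ["Legal", "IT"]
nested.append(["Finance", "Audit"])
print(nested)  # ['Legal', 'IT', ['Finance', 'Audit']]

# extend() adds each element of the iterable individually.
flat = ["Legal", "IT"]
flat.extend(["Finance", "Audit"])
print(flat)  # ['Legal', 'IT', 'Finance', 'Audit']

# + builds a new list; the original operand is not modified.
base = ["Legal", "IT"]
combined = base + ["HR"]
print(base)      # ['Legal', 'IT']
print(combined)  # ['Legal', 'IT', 'HR']
```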
        


### Rearranging and removing elements
`.sort()` orders the elements in place, `.reverse()` flips the order, `.pop()` removes and returns an element (the last one by default), `.remove()` deletes the first occurrence of a value, and `.clear()` empties the list.
        


In [None]:
scores = [5, 3, 1, 4, 1]
scores.sort()
print("Sorted:", scores)
scores.reverse()
print("Reversed:", scores)
last = scores.pop()  # Remove last element
print("After pop:", scores, "removed", last)
scores.remove(3)  # Remove first occurrence of 3
print("After removing 3:", scores)
scores_copy = scores.copy()
scores_copy.clear()
print("Cleared copy:", scores_copy)
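
Three details of these methods are easy to miss: `.sort()` returns `None`, `.pop()` accepts an optional index, and `.remove()` raises `ValueError` when the value is absent. A minimal sketch, using an illustrative list named `ratings`:

```python
ratings = [5, 3, 1, 4, 1]

# .sort() works in place and returns None, so do not assign its result.
result = ratings.sort(reverse=True)
print(result)   # None
print(ratings)  # [5, 4, 3, 1, 1]

# .pop() accepts an optional index; without one it removes the last element.
first = ratings.pop(0)
print(first, ratings)  # 5 [4, 3, 1, 1]

# .remove() raises ValueError when the value is not in the list.
try:
    ratings.remove(99)
except ValueError as exc:
    print("Nothing removed:", exc)
```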
        


### Unpacking lists
Lists can be unpacked into variables. Using `*` lets you collect "the rest" of the elements.
        


In [None]:
first, *middle, last = [10, 20, 30, 40, 50]
print(first, middle, last)

a, b, *_ = [1, 2, 3, 4]
print(a, b)
        


### Slicing and slice assignment
Use `my_list[start:stop:step]` to read a range; `start` is included, `stop` is not, and any of the three parts can be omitted. Assigning to a slice replaces that range; assigning an empty list deletes it. `my_list[::-1]` returns a reversed copy without changing the original.
        


In [None]:
numbers = [0, 1, 2, 3, 4, 5]
print(numbers[1:4])
print(numbers[::-1])  # Read reversed

numbers[2:4] = [20, 30]
print("After replacement:", numbers)

numbers[1:3] = []  # Delete a slice
print("After deleting slice:", numbers)
        


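### The `del` statement
`del` is a statement, not a function or operator. It removes a name or removes items from a container. With lists you can delete a single item by index (`del labels[1]`) or a whole range with slice notation (`del labels[1:3]`).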
        


In [None]:
labels = ["A", "B", "C", "D"]
del labels[1]
print(labels)

placeholder = 42
del placeholder  # Removes the name
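
The cell above deletes a single index and a name. Deleting a range with slice notation can be sketched as follows; the list `letters` and the name `temp` are illustrative only.

```python
letters = ["A", "B", "C", "D", "E"]

del letters[1:3]  # removes "B" and "C" (the stop index is exclusive)
after_range = letters.copy()
print(after_range)  # ['A', 'D', 'E']

del letters[:]  # removes every element but keeps the list object itself
print(letters)  # []

temp = 42
del temp  # unbinds the name; the name can no longer be used
try:
    print(temp)
except NameError as exc:
    print("Name is gone:", exc)
```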
        


### Nested lists
Lists can contain other lists (e.g., for matrices). Access elements with multiple indexes.
        


In [None]:
matrix = [
    [1, 2, 3],
    [4, 5, 6],
]
print(matrix[1][0])  # Row 2, column 1
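
Beyond reading a single cell, nested lists are usually processed row by row. The sketch below, with an illustrative `table` variable, loops over rows, pulls out one column, and updates a cell.

```python
table = [
    [1, 2, 3],
    [4, 5, 6],
]

# Each element of the outer list is itself a list (one row).
for row in table:
    print(row, "sum:", sum(row))

# Collect the second item of every row to get a column.
second_column = [row[1] for row in table]
print(second_column)  # [2, 5]

# Two indexes address one cell: first the row, then the column.
table[0][2] = 30
print(table)  # [[1, 2, 30], [4, 5, 6]]
```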
        


### Lists and strings
Strings behave like sequences: you can index and slice them. They are immutable, so string methods never change the original. `.split()` returns a list of new strings, and `.join()` builds a new string from a list.
        


In [None]:
text = "Hello, World"
print(text[0], text[-1])
print(text[7:12])

words = text.split(", ")
print(words)
joined = " - ".join(words)
print(joined)
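
To see string immutability in action, the following sketch tries an in-place change and then builds new strings instead. The variable names are illustrative.

```python
greeting = "Hello, World"

# Item assignment is not allowed on strings.
try:
    greeting[0] = "J"
except TypeError as exc:
    print("Strings cannot change in place:", exc)

# Methods and slicing return NEW strings; the original is untouched.
shouted = greeting.upper()
replaced = "J" + greeting[1:]
print(greeting)  # Hello, World
print(shouted)   # HELLO, WORLD
print(replaced)  # Jello, World
```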
        


### Ranges
`range(start, stop, step)` represents a sequence of integers from `start` up to, but not including, `stop` (often used in loops). Printing a range shows only its definition; convert it to a list to see the values.
        


In [None]:
r = range(2, 10, 2)
print(r)
print(list(r))
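
A few more ways ranges show up in practice, sketched with illustrative names (`countdown`, `quarters`, `big`): counting down with a negative step, looping over indexes, and indexing a very long range without building a list.

```python
# A negative step counts down; the stop value (0) is still excluded.
countdown = list(range(5, 0, -1))
print(countdown)  # [5, 4, 3, 2, 1]

# range(len(...)) yields every valid index of a list.
quarters = ["Q1", "Q2", "Q3", "Q4"]
for i in range(len(quarters)):
    print(i, quarters[i])

# A range stores only start, stop, and step, so a long one stays cheap.
big = range(1_000_000)
print(len(big), big[-1])  # 1000000 999999
```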
        


## 2. Tuples (immutable sequences)
Tuples look like lists but are written with parentheses `()` and cannot be changed after creation. Use them for fixed records such as coordinates.
        


In [None]:
coordinates = (48.2, 16.37)
print(coordinates[0])
# coordinates[0] = 0  # Would raise a TypeError because tuples are immutable
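
Tuples are often unpacked straight into variables, and because they are immutable they can serve as dictionary keys (provided their elements are hashable too). A short sketch with illustrative names:

```python
location = (48.2, 16.37)

# Unpack a tuple into one variable per element.
latitude, longitude = location
print(latitude, longitude)  # 48.2 16.37

# The comma makes the tuple, not the parentheses.
single = (5,)
not_a_tuple = (5)
print(type(single).__name__, type(not_a_tuple).__name__)  # tuple int

# Tuples of hashable values can be used as dictionary keys.
offices = {(48.2, 16.37): "Vienna"}
print(offices[location])  # Vienna
```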
        


### Match-case with tuples
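Pattern matching (the `match` statement, available from Python 3.10) can unpack tuples and react to specific shapes of data. Cases are tried from top to bottom, and the first one that matches runs.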
        


In [None]:
point = (0, 1)

match point:
    case (0, y):
        print(f"On the y-axis at {y}")
    case (x, 0):
        print(f"On the x-axis at {x}")
    case (x, y):
        print(f"Point at ({x}, {y})")
        


## 3. `zip()`: combine iterables
`zip()` pairs elements from multiple iterables (lists, tuples, ranges) into tuples, position by position. It stops as soon as the shortest iterable is exhausted.
        


In [None]:
names = ["Alice", "Bob", "Charlie"]
departments_short = ["Legal", "IT", "Economics"]
paired = list(zip(names, departments_short))
print(paired)
print(dict(zip(names, departments_short)))
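
The sketch below shows what happens when the inputs have different lengths, how `zip()` is typically used in a loop, and how `zip(*pairs)` reverses the pairing. The lists `people` and `units` are made up for illustration.

```python
people = ["Alice", "Bob", "Charlie"]
units = ["Legal", "IT"]  # one entry short on purpose

# zip() stops at the shortest input, so "Charlie" is silently dropped.
pairs = list(zip(people, units))
print(pairs)  # [('Alice', 'Legal'), ('Bob', 'IT')]

# Looping over zip() unpacks each pair into two variables.
for person, unit in zip(people, units):
    print(f"{person} works in {unit}")

# zip(*pairs) turns a list of pairs back into separate tuples.
paired_people, paired_units = zip(*pairs)
print(paired_people)  # ('Alice', 'Bob')
print(paired_units)   # ('Legal', 'IT')
```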
        


## 4. Sets (unique, unordered collections)
Sets remove duplicates automatically and support mathematical operations: union `|`, intersection `&`, difference `-`, and symmetric difference `^`. They are mutable.
        


In [None]:
set_a = {1, 2, 3, 4}
set_b = {3, 4, 5}

print("Union:", set_a | set_b)
print("Intersection:", set_a & set_b)
print("Difference:", set_a - set_b)
print("Symmetric difference:", set_a ^ set_b)

set_a.add(5)
set_a.remove(1)
print("Modified set_a:", set_a)
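
Two everyday uses of sets are removing duplicates from a list and testing membership. The sketch below also contrasts `.remove()` with `.discard()`; the `visits` data is invented for illustration.

```python
visits = ["Legal", "IT", "Legal", "HR", "IT"]

# Converting to a set keeps one copy of each value.
unique_visits = set(visits)
print(len(visits), len(unique_visits))  # 5 3

# Sets have no defined order, so sort them when order matters.
print(sorted(unique_visits))  # ['HR', 'IT', 'Legal']
print("HR" in unique_visits)  # True

# .discard() ignores missing elements; .remove() raises KeyError.
unique_visits.discard("Audit")
try:
    unique_visits.remove("Audit")
except KeyError:
    print("remove() fails when the element is missing")
```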
        


## 5. Mutability at a glance
- **Mutable**: lists, sets, dictionaries (they can change in place).
- **Immutable**: tuples, strings (a new object is created when you "change" them).

Aliasing mutable objects (assigning them to multiple variables) means changes are visible through all references.
        


In [None]:
a = b = []
a.append("shared")
print("Lists share state:", b)

immutable_tuple = (1, 2, 3)
try:
    immutable_tuple[0] = 99
except TypeError as exc:
    print("Tuples cannot change:", exc)
        


### Identity vs. equality
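`==` compares values, while `is` checks whether two names refer to the same object. Small integers and short strings may be interned (reused) by Python, so the result of `is` on such values is an implementation detail. Use `==` to compare values and reserve `is` for checks such as `x is None`.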
        


In [None]:
x = [1, 2, 3]
y = [1, 2, 3]
print("Same value?", x == y)
print("Same object?", x is y)

name1 = "ab"
name2 = "a" + "b"
print("Interned small strings might be identical:", name1 is name2)
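
The most common legitimate use of `is` is testing for `None`. The sketch below uses a hypothetical helper, `describe`, to show why `is None` differs from a plain truthiness check, and then contrasts an alias with a copy.

```python
def describe(value=None):
    """Hypothetical helper: report whether a value was supplied."""
    if value is None:
        return "no value given"
    return f"got {value!r}"

print(describe())   # no value given
print(describe(0))  # got 0  (0 is falsy, but it is not None)

first = [1, 2]
alias = first        # second name for the same object
clone = list(first)  # new object with equal contents

print(alias is first)  # True
print(clone is first)  # False
print(clone == first)  # True
```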
        


### Shallow vs. deep copies
`list(x)` and `x.copy()` each make a shallow copy: a new outer list whose nested objects are still shared with the original. `copy.deepcopy()` clones the nested objects as well.
        


In [None]:
import copy

original = [[1, 2], [3, 4]]
shallow = list(original)
deep = copy.deepcopy(original)

original[0][0] = 99
print("Shallow copy sees change:", shallow)
print("Deep copy isolated:", deep)
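
A shallow copy is independent at the top level only. The following sketch, using an illustrative `grid` list, shows that adding a row to the copy does not affect the original, while changing a shared inner list does.

```python
grid = [[1, 2], [3, 4]]
shallow_grid = grid.copy()

# The outer lists are different objects, so top-level changes stay separate.
shallow_grid.append([5, 6])
print(len(grid), len(shallow_grid))  # 2 3

# The inner lists are the SAME objects in both outer lists.
print(grid[0] is shallow_grid[0])  # True
shallow_grid[0].append(99)
print(grid[0])  # [1, 2, 99]
```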
        


### Mutability and functions
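A function receives a reference to the same object the caller passed in. If that object is mutable, in-place changes made inside the function are visible to the caller. Immutable objects cannot be changed in place, so an operation like `n += 10` only rebinds the local name to a new object.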
        


In [None]:
def add_item(items, value):
    items.append(value)

def try_to_change_number(n):
    n += 10
    return n

basket = ["apple"]
add_item(basket, "banana")
print("Basket after function:", basket)

number = 5
new_number = try_to_change_number(number)
print("Original number unchanged:", number)
print("Returned value:", new_number)
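
When the side effect on the caller's list is not wanted, one option is to copy inside the function and return the new list. This is a sketch of that approach, with a hypothetical function name `with_item`; it is not the only way to avoid the side effect.

```python
def with_item(items, value):
    """Return a new list with value added; the caller's list is untouched."""
    result = list(items)  # shallow copy of the argument
    result.append(value)
    return result

cart = ["apple"]
bigger_cart = with_item(cart, "banana")

print("Original:", cart)         # Original: ['apple']
print("New list:", bigger_cart)  # New list: ['apple', 'banana']
```

A shallow copy is enough here because the elements are strings; a list of nested mutable objects would need `copy.deepcopy()` for full isolation.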
        


## 6. Dictionaries (key-value pairs)
Dictionaries map keys to values and use curly braces `{}`. Keys must be unique and hashable (for example strings, numbers, or tuples of these), and dictionaries are mutable.
        


In [None]:
employee = {
    "name": "Alice Smith",
    "department": "Legal",
    "salary": 60000,
    "active": True,
}
print(employee)
        


### Accessing and modifying values
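Use square brackets to read or write a value by key; reading a missing key raises `KeyError`. The `in` keyword checks whether a key exists, `.keys()` and `.values()` return views of the keys and values, and `.get()` returns `None` (or a default you supply) instead of raising an error when the key is missing.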
        


In [None]:
print(employee["name"])
print(employee.get("salary"))

employee["salary"] = 65000  # Update a value
employee["email"] = "alice.smith@fma.gv.at"  # Add a new pair
print("Keys:", list(employee.keys()))
print("Has 'department' key?", "department" in employee)
print(employee)
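
The cell above does not show `.get()` with a default or `.values()`. The sketch below fills that gap with an illustrative `person` dictionary; `.items()` and `.pop()` are further standard dictionary methods not covered in the text above.

```python
person = {"name": "Alice Smith", "department": "Legal"}

# person["phone"] would raise KeyError; .get() returns a fallback instead.
print(person.get("phone"))         # None
print(person.get("phone", "n/a"))  # n/a

# .values() is a view; wrap it in list() to print the contents.
print(list(person.values()))  # ['Alice Smith', 'Legal']

# .items() yields (key, value) pairs, handy in loops.
for key, value in person.items():
    print(key, "->", value)

# .pop() removes a key and returns its value.
dept = person.pop("department")
print(dept)    # Legal
print(person)  # {'name': 'Alice Smith'}
```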
        


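### The `*` and `**` operators with dictionaries
Using `*my_dict` unpacks the keys only. Using `**my_dict` unpacks key-value pairs, which lets you merge dictionaries into a new one; if a key appears more than once, the value from the dictionary unpacked last wins.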
        


In [None]:
print([*employee])  # Keys only
other = {"office": "Vienna"}
merged = {**employee, **other}
print(merged)
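
When the same key appears in both dictionaries, the order of unpacking decides which value survives. A small sketch with made-up dictionaries `defaults` and `overrides`:

```python
defaults = {"office": "Vienna", "active": True}
overrides = {"office": "Graz"}

# Later entries replace earlier ones with the same key.
settings = {**defaults, **overrides}
print(settings)  # {'office': 'Graz', 'active': True}

# Reversing the order lets the defaults win instead.
reversed_settings = {**overrides, **defaults}
print(reversed_settings["office"])  # Vienna

# The source dictionaries are not modified.
print(defaults)  # {'office': 'Vienna', 'active': True}
```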
        


## 7. Lists of dictionaries (real-world data)
Datasets often combine both structures: a **list** of **dictionaries**, one per record (similar to CSV or JSON data).
        


In [None]:
employees = [
    {"name": "Alice", "dept": "Legal", "salary": 60000},
    {"name": "Bob", "dept": "IT", "salary": 75000},
    {"name": "Charlie", "dept": "Economics", "salary": 62000},
]

total_salary = sum(emp["salary"] for emp in employees)
avg_salary = total_salary / len(employees)
print(f"Average salary: {avg_salary:,.2f}")  # Two decimals with a thousands separator
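
Filtering, sorting, and building a lookup table are typical next steps with a list of records. The sketch below repeats the same three records under the illustrative name `staff` so that it runs on its own; the 61000 threshold is arbitrary.

```python
staff = [
    {"name": "Alice", "dept": "Legal", "salary": 60000},
    {"name": "Bob", "dept": "IT", "salary": 75000},
    {"name": "Charlie", "dept": "Economics", "salary": 62000},
]

# Filter: names of everyone above an (arbitrary) salary threshold.
high_earners = [emp["name"] for emp in staff if emp["salary"] > 61000]
print(high_earners)  # ['Bob', 'Charlie']

# Sort: sorted() returns a new list and leaves staff in its original order.
by_salary = sorted(staff, key=lambda emp: emp["salary"], reverse=True)
print([emp["name"] for emp in by_salary])  # ['Bob', 'Charlie', 'Alice']

# Lookup table: map each name to its salary.
salary_by_name = {emp["name"]: emp["salary"] for emp in staff}
print(salary_by_name["Bob"])  # 75000
```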
        


## Summary
- **Lists**: access, modify, unpack, slice, nest, and connect to strings and ranges.
- **Tuples**: ordered but immutable; great with pattern matching.
- **Sets**: unique unordered items with fast membership tests and set algebra.
- **Dictionaries**: key-value stores, unpackable with `*`/`**` and useful with lists of records.
- **Mutability**: understand identity, copying, and side effects when sharing objects or passing them to functions.
        
