# Chapter 4: Data Structures - Organizing Information

---

## The CRAWL → WALK → RUN Framework

This textbook uses a structured approach to learning Python while developing effective AI collaboration skills. Each chapter follows three distinct phases:

| Mode | Icon | AI Policy | Purpose |
|------|------|-----------|--------|
| **CRAWL** | 🐛 | No AI assistance | Build foundational skills you can demonstrate independently |
| **WALK** | 🚶 | AI for understanding only | Use AI to explain concepts and errors, but write your own code |
| **RUN** | 🚀 | Full AI collaboration | Partner with AI on complex tasks while documenting your process |

**Why This Matters:** Your exams will test CRAWL and WALK material with no AI assistance. If you skip the foundational work and rely entirely on AI, you won't pass. The progression ensures you build genuine competence before leveraging AI as a professional tool.

## 📊 Case Study Continues: Lehigh Student Success Dataset

We continue working with the Lehigh student dataset. Now that you can write functions, you'll learn to:

- **Store multiple student records** in lists
- **Represent individual students** as dictionaries
- **Find unique values** (colleges, majors) using sets
- **Combine structures** for complex data organization

This chapter is the bridge between basic Python and real data analysis. The patterns you learn here, especially lists of dictionaries, map closely onto the rows and columns you will work with in pandas DataFrames.

**Dataset Reminder:**

| Variable | Type | Description |
|----------|------|-------------|
| Student_ID | String | Unique identifier (LU100001) |
| College | String | Lehigh college |
| Major | String | Declared major |
| Class_Year | String | Academic standing |
| GPA | Float | Grade Point Average (0.0-4.0) |
| Credits_Attempted | Integer | Total credits registered |
| Credits_Earned | Integer | Credits successfully passed |

**Heads up:** In Week 3, you'll encounter the "messy" version of this dataset. The data structure skills from this chapter are essential for cleaning inconsistent college names, handling missing values, and detecting duplicates.

## Learning Objectives

By the end of this chapter, you will:

- 🐛 Create, access, and modify lists
- 🐛 Use list indexing and slicing to extract data
- 🐛 Apply common list methods (append, extend, insert, remove, pop, sort)
- 🐛 Understand the difference between lists and tuples
- 🐛 Create and access dictionary key-value pairs
- 🐛 Iterate over dictionaries using keys, values, and items
- 🚶 Use sets to find unique values and perform set operations
- 🚶 Nest data structures (lists of dictionaries, dictionaries of lists)
- 🚶 Choose the right data structure for different problems
- 🚀 Build a student records management system using combined data structures

---

# 🐛 CRAWL: Lists and Tuples

**Rules for this section:**
- Close all AI tools (ChatGPT, Claude, Copilot, etc.)
- Work through examples by typing them yourself
- Use only this notebook, Python documentation, or your instructor for help
- This material will appear on exams without AI assistance

---

## 📚 DataCamp Resources for Chapter 4

**[Introduction to Python](https://www.datacamp.com/courses/intro-to-python-for-data-science)** - Complete these:

| Chapter | Topics Covered | Alignment |
|---------|---------------|------------|
| Chapter 2: Python Lists | List creation, indexing, manipulation | Sections 4.1-4.4 |

**[Data Types for Data Science in Python](https://www.datacamp.com/courses/data-types-for-data-science-in-python)** - Complete these:

| Chapter | Topics Covered | Alignment |
|---------|---------------|------------|
| Chapter 1: Fundamental Data Types | Lists, tuples, strings | Sections 4.1-4.5 |
| Chapter 2: Dictionaries | Dictionary creation, methods | Sections 4.6-4.8 |
| Chapter 3: Meet the Collections Module | Sets, advanced collections | Section 4.9 |

**Estimated time:** 3-4 hours total

---

## 4.1 Introduction to Lists

A **list** is an ordered, mutable (changeable) collection of items. Lists can contain any type of data, including mixed types.

Lists are created using square brackets `[]` with items separated by commas.

In [None]:
# Creating lists
gpas = [3.41, 3.55, 3.60, 3.44, 3.90]
colleges = ["Business", "Engineering", "Arts and Sciences", "Health", "Education"]
credits = [105, 73, 45, 125, 89]

print(gpas)
print(colleges)
print(credits)

In [None]:
# Lists can contain mixed types (though usually not recommended)
student_info = ["LU100001", "Finance", 3.41, 105, True]
print(student_info)

In [None]:
# Empty list
empty_list = []
print(empty_list)
print(len(empty_list))  # Length is 0

In [None]:
# Check list length
print(f"Number of GPAs: {len(gpas)}")
print(f"Number of colleges: {len(colleges)}")

## 4.2 List Indexing

Access individual items using their **index** (position). Python uses **zero-based indexing**, meaning the first item is at index 0.

| Index | 0 | 1 | 2 | 3 | 4 |
|-------|---|---|---|---|---|
| Item | "Business" | "Engineering" | "Arts and Sciences" | "Health" | "Education" |
| Negative Index | -5 | -4 | -3 | -2 | -1 |
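
The two index rows in the table describe the same positions counted from opposite ends. A minimal sketch of that relationship follows; the variable `n` is introduced here only for illustration.

```python
colleges = ["Business", "Engineering", "Arts and Sciences", "Health", "Education"]
n = len(colleges)  # 5 items, so valid positive indices are 0 through 4

# Index -1 and index n - 1 both name the last item.
print(colleges[-1] == colleges[n - 1])   # True

# Index -n and index 0 both name the first item.
print(colleges[-n] == colleges[0])       # True

# In general, a negative index i refers to position n + i.
print(colleges[-2], colleges[n - 2])     # Health Health
```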

In [None]:
colleges = ["Business", "Engineering", "Arts and Sciences", "Health", "Education"]

# Access by positive index
print(colleges[0])   # First item
print(colleges[2])   # Third item
print(colleges[4])   # Fifth (last) item

In [None]:
# Negative indexing counts from the end
print(colleges[-1])  # Last item
print(colleges[-2])  # Second to last
print(colleges[-5])  # Fifth from end (same as first)

In [None]:
# What happens if you go out of bounds?
# Uncomment to see the error:
# print(colleges[10])  # IndexError!

In [None]:
# Modify items by index
gpas = [3.41, 3.55, 3.60, 3.44, 3.90]
print(f"Before: {gpas}")

gpas[1] = 3.75  # Update second GPA
print(f"After: {gpas}")

## 4.3 List Slicing

Extract a portion of a list using **slicing**. The syntax is `list[start:stop:step]`.

- `start`: Index to begin (inclusive, default 0)
- `stop`: Index to end (exclusive, default end of list)
- `step`: How far to move between items (default 1, meaning every item; 2 means every other item)

**Key insight:** The stop index is NOT included in the result.
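
One way to remember the exclusive stop is that a slice `list[start:stop]` holds exactly `stop - start` items (when both indices are in range). The short sketch below checks this; the names `start`, `stop`, and `middle` are chosen just for this illustration.

```python
gpas = [3.41, 3.55, 3.60, 3.44, 3.90, 2.85, 3.75]

start, stop = 1, 4
middle = gpas[start:stop]

# The slice holds stop - start items because index `stop` itself is excluded.
print(middle)                        # [3.55, 3.6, 3.44]
print(len(middle) == stop - start)   # True

# Splitting at the same index gives two pieces that rebuild the original list.
print(gpas[:3] + gpas[3:] == gpas)   # True
```

Because the stop index is excluded, `gpas[:3]` and `gpas[3:]` never overlap and never leave a gap.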

In [None]:
gpas = [3.41, 3.55, 3.60, 3.44, 3.90, 2.85, 3.75]

# Basic slicing
print(gpas[1:4])    # Items at index 1, 2, 3 (not 4!)
print(gpas[:3])     # First three items (0, 1, 2)
print(gpas[4:])     # From index 4 to end
print(gpas[:])      # Copy of entire list

In [None]:
# Slicing with step
print(gpas[::2])    # Every other item (0, 2, 4, 6)
print(gpas[1::2])   # Every other item starting at index 1
print(gpas[::-1])   # Reverse the list!

In [None]:
# Negative indices in slicing
print(gpas[-3:])    # Last three items
print(gpas[:-2])    # Everything except last two

In [None]:
# Practical example: Get top 3 GPAs (assuming sorted)
sorted_gpas = sorted(gpas, reverse=True)
print(f"All GPAs (sorted): {sorted_gpas}")
print(f"Top 3 GPAs: {sorted_gpas[:3]}")

## 4.4 List Methods

Lists have many built-in methods. The most important ones are shown below, where `items` stands for any list variable:

| Method | Purpose | Example |
|--------|---------|--------|
| `append(x)` | Add x to end | `items.append(3.5)` |
| `extend(iterable)` | Add all items from iterable | `items.extend([1, 2, 3])` |
| `insert(i, x)` | Insert x at position i | `items.insert(0, "first")` |
| `remove(x)` | Remove first occurrence of x | `items.remove(3.5)` |
| `pop(i)` | Remove and return item at i (the last item if i is omitted) | `items.pop(0)` |
| `index(x)` | Return index of first x | `items.index(3.5)` |
| `count(x)` | Count occurrences of x | `items.count(3.5)` |
| `sort()` | Sort list in place | `items.sort()` |
| `reverse()` | Reverse list in place | `items.reverse()` |
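
A detail the table implies but does not spell out: methods that change the list in place (such as `append`) return `None`, while methods like `pop`, `index`, and `count` return a value. The sketch below illustrates the difference; the variable names `result` and `last` are only for this example.

```python
gpas = [3.41, 3.90, 3.55]

# append() changes the list in place and returns None.
result = gpas.append(3.60)
print(result)             # None
print(gpas)               # [3.41, 3.9, 3.55, 3.6]

# pop() with no index removes and returns the last item.
last = gpas.pop()
print(last)               # 3.6

# index() and count() only report; they leave the list unchanged.
print(gpas.index(3.90))   # 1
print(gpas.count(3.41))   # 1
print(gpas)               # [3.41, 3.9, 3.55]
```

This is why writing `gpas = gpas.append(3.60)` is a common mistake: it replaces the list with `None`.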

In [None]:
# Building a list of student GPAs
gpas = []

# Add students one by one
gpas.append(3.41)
gpas.append(3.55)
gpas.append(3.60)
print(f"After appends: {gpas}")

# Add multiple at once
gpas.extend([3.44, 3.90])
print(f"After extend: {gpas}")

In [None]:
# Insert at a specific position
colleges = ["Business", "Engineering", "Health"]
colleges.insert(2, "Arts and Sciences")  # Insert at index 2
print(colleges)

In [None]:
# Remove items
gpas = [3.41, 3.55, 3.60, 3.44, 3.90]

# Remove by value
gpas.remove(3.55)
print(f"After remove: {gpas}")

# Remove by index and get the value
removed_gpa = gpas.pop(0)
print(f"Removed: {removed_gpa}")
print(f"After pop: {gpas}")

In [None]:
# Find items
gpas = [3.41, 3.55, 3.60, 3.44, 3.55, 3.90]

print(f"Index of 3.55: {gpas.index(3.55)}")  # First occurrence
print(f"Count of 3.55: {gpas.count(3.55)}")

In [None]:
# Sorting
gpas = [3.41, 3.55, 3.60, 3.44, 3.90]

# Sort in place (modifies original list)
gpas.sort()
print(f"Sorted ascending: {gpas}")

gpas.sort(reverse=True)
print(f"Sorted descending: {gpas}")

In [None]:
# sorted() returns a NEW list (doesn't modify original)
gpas = [3.41, 3.55, 3.60, 3.44, 3.90]
sorted_gpas = sorted(gpas)

print(f"Original: {gpas}")
print(f"Sorted copy: {sorted_gpas}")

### Common Pitfall: append vs extend

In [None]:
# append adds the item as a single element
list1 = [1, 2, 3]
list1.append([4, 5])
print(f"append: {list1}")  # [1, 2, 3, [4, 5]] - nested list!

# extend adds each item from the iterable
list2 = [1, 2, 3]
list2.extend([4, 5])
print(f"extend: {list2}")  # [1, 2, 3, 4, 5] - flat list

## 4.5 Tuples

A **tuple** is an ordered, immutable (unchangeable) collection. Once created, you cannot add, remove, or modify items.

Tuples are created using parentheses `()` or just commas.

In [None]:
# Creating tuples
student = ("LU100001", "Finance", 3.41)
coordinates = (40.6084, -75.3785)  # Lehigh's location

print(student)
print(coordinates)

In [None]:
# Tuple without parentheses (packing)
point = 10, 20, 30
print(point)
print(type(point))

In [None]:
# Single-item tuple needs a trailing comma!
not_a_tuple = ("Business")  # This is just a string
is_a_tuple = ("Business",)  # This is a tuple

print(type(not_a_tuple))
print(type(is_a_tuple))

In [None]:
# Indexing and slicing work the same as lists
student = ("LU100001", "Finance", 3.41, 105)

print(student[0])      # First item
print(student[-1])     # Last item
print(student[1:3])    # Slice

In [None]:
# But you CANNOT modify a tuple
# Uncomment to see the error:
# student[2] = 3.50  # TypeError!

In [None]:
# Tuple unpacking - very useful!
student = ("LU100001", "Finance", 3.41)

student_id, major, gpa = student

print(f"ID: {student_id}")
print(f"Major: {major}")
print(f"GPA: {gpa}")

In [None]:
# Functions often return tuples
def get_min_max(numbers):
    return min(numbers), max(numbers)

gpas = [3.41, 3.55, 3.60, 3.44, 3.90]
lowest, highest = get_min_max(gpas)
print(f"GPA range: {lowest} to {highest}")

### When to Use Tuples vs Lists

| Use Lists When... | Use Tuples When... |
|-------------------|--------------------|
| Collection will change | Data should not change |
| Items are similar (all GPAs) | Items are different fields (ID, name, GPA) |
| Order might change | Order is fixed and meaningful |
| Need to add/remove items | Need to unpack values |

**Rule of thumb:** If you're storing a fixed record of related but different values, use a tuple. If you're storing a collection that might grow or change, use a list.
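
The rule of thumb often leads to using both structures together. The following is a small sketch, with a made-up `roster` variable, of a growing list whose items are fixed tuple records.

```python
# Each record is a tuple: the fields (id, major, gpa) and their order are fixed.
record_a = ("LU100001", "Finance", 3.41)
record_b = ("LU100002", "Computer Science", 3.85)

# The roster is a list: it can grow as more students are added.
roster = [record_a]
roster.append(record_b)

print(len(roster))       # 2
print(roster[1][1])      # Computer Science
print(type(roster).__name__, type(roster[0]).__name__)   # list tuple
```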

---

# 🐛 CRAWL: Dictionaries

---

## 4.6 Introduction to Dictionaries

A **dictionary** stores key-value pairs. Instead of accessing items by position (index), you access them by name (key).

This is perfect for structured data like student records where each field has a name.

Dictionaries are created using curly braces `{}` with `key: value` pairs.
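
To see why named keys help, compare the same record stored two ways. This is only an illustrative sketch; `as_tuple` and `as_dict` are names invented for the comparison.

```python
# The same student stored by position (tuple) and by name (dictionary).
as_tuple = ("LU100001", "Finance", 3.41)
as_dict = {"id": "LU100001", "major": "Finance", "gpa": 3.41}

# With a tuple you must remember that the GPA lives at position 2.
print(as_tuple[2])      # 3.41

# With a dictionary the key says what the value means.
print(as_dict["gpa"])   # 3.41
```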

In [None]:
# A student record as a dictionary
student = {
    "id": "LU100001",
    "college": "College of Business",
    "major": "Finance",
    "class_year": "Senior",
    "gpa": 3.41,
    "credits_attempted": 105,
    "credits_earned": 99
}

print(student)

In [None]:
# Access values by key
print(student["id"])
print(student["gpa"])
print(student["major"])

In [None]:
# Calculate completion rate using dictionary values
completion_rate = student["credits_earned"] / student["credits_attempted"]
print(f"{student['id']} completion rate: {completion_rate:.1%}")

In [None]:
# What happens if you access a key that doesn't exist?
# Uncomment to see:
# print(student["email"])  # KeyError!

In [None]:
# Use .get() to safely access keys (returns None if missing)
print(student.get("email"))         # None (no error)
print(student.get("email", "N/A"))  # Default value if missing

## 4.7 Modifying Dictionaries

In [None]:
student = {
    "id": "LU100001",
    "major": "Finance",
    "gpa": 3.41
}

# Add a new key-value pair
student["email"] = "abc123@lehigh.edu"
print(student)

In [None]:
# Update an existing value
student["gpa"] = 3.55
print(f"Updated GPA: {student['gpa']}")

In [None]:
# Remove a key-value pair
del student["email"]
print(student)

In [None]:
# Remove and get the value
gpa = student.pop("gpa")
print(f"Removed GPA: {gpa}")
print(f"Remaining: {student}")

In [None]:
# Update multiple values at once
student = {"id": "LU100001", "major": "Finance", "gpa": 3.41}

student.update({
    "gpa": 3.55,
    "credits": 105,
    "class_year": "Senior"
})

print(student)

## 4.8 Iterating Over Dictionaries

In [None]:
student = {
    "id": "LU100001",
    "major": "Finance",
    "gpa": 3.41,
    "credits": 105
}

# Iterate over keys (default behavior)
print("Keys:")
for key in student:
    print(f"  {key}")

In [None]:
# Iterate over values
print("Values:")
for value in student.values():
    print(f"  {value}")

In [None]:
# Iterate over key-value pairs (most useful)
print("Key-Value Pairs:")
for key, value in student.items():
    print(f"  {key}: {value}")

In [None]:
# Check if a key exists
print("gpa" in student)      # True
print("email" in student)    # False

if "gpa" in student:
    print(f"GPA is: {student['gpa']}")

In [None]:
# Practical example: Count students by college
student_colleges = ["Business", "Engineering", "Business", "Health", 
                    "Engineering", "Business", "Arts", "Health"]

college_counts = {}
for college in student_colleges:
    if college in college_counts:
        college_counts[college] = college_counts[college] + 1
    else:
        college_counts[college] = 1

print(college_counts)

In [None]:
# Cleaner version using .get()
college_counts = {}
for college in student_colleges:
    college_counts[college] = college_counts.get(college, 0) + 1

print(college_counts)

---

## 🐛 CRAWL Practice Problems

Complete these problems without any AI assistance.

---

### Problem 4.1: List Manipulation
Given the list `gpas = [3.41, 3.55, 2.10, 3.90, 1.85, 3.67, 4.0, 2.95]`:
1. Find the highest and lowest GPA
2. Calculate the average GPA
3. Count how many GPAs are above 3.0
4. Create a new list containing only GPAs on the Dean's List (>= 3.5)

In [None]:
# Your code here
gpas = [3.41, 3.55, 2.10, 3.90, 1.85, 3.67, 4.0, 2.95]


### Problem 4.2: List Slicing
Given `credits = [15, 30, 45, 60, 75, 90, 105, 120]`:
1. Get the first three values
2. Get the last two values
3. Get every other value starting from index 1
4. Reverse the list using slicing

In [None]:
# Your code here
credits = [15, 30, 45, 60, 75, 90, 105, 120]


### Problem 4.3: Tuple Unpacking
Given a list of student tuples, unpack and print each student's information:
```python
students = [
    ("LU100001", "Finance", 3.41),
    ("LU100002", "Computer Science", 3.85),
    ("LU100003", "Biology", 2.95)
]
```
Print each as: "Student LU100001 studies Finance with a 3.41 GPA"

In [None]:
# Your code here
students = [
    ("LU100001", "Finance", 3.41),
    ("LU100002", "Computer Science", 3.85),
    ("LU100003", "Biology", 2.95)
]


### Problem 4.4: Dictionary Creation
Create a dictionary for a student with:
- ID: "LU100042"
- College: "P.C. Rossin College of Engineering"
- Major: "Computer Science"
- GPA: 3.75
- Credits: 89

Then print a formatted summary of the student.

In [None]:
# Your code here


### Problem 4.5: Dictionary Counting
Given a list of majors, count how many students are in each major:
```python
majors = ["Finance", "CS", "Finance", "Biology", "CS", "CS", "Finance", "Biology"]
```
Expected output: `{"Finance": 3, "CS": 3, "Biology": 2}`

In [None]:
# Your code here
majors = ["Finance", "CS", "Finance", "Biology", "CS", "CS", "Finance", "Biology"]


### Problem 4.6: Predict the Output
What does each of these print? Predict first, then run to check.

```python
print([1, 2, 3] + [4, 5])               # a)
print([1, 2, 3] * 2)                    # b)
print(list("hello"))                    # c)
print({"a": 1, "b": 2}.get("c", 0))     # d)
print(len({"x": 1, "y": 2, "z": 3}))    # e)
```

In [None]:
# Check your predictions


---

# 🚶 WALK: Sets and Nested Structures

**Rules for this section:**
- You may use AI tools to **explain** concepts and errors
- You must **write all code yourself**
- Good prompts: "Explain set intersection" or "When should I use a set vs a list?"
- Bad prompts: "Write code that does X"

---

## 4.9 Sets

A **set** is an unordered collection of unique items. Sets automatically remove duplicates and support mathematical set operations.

Sets are created using curly braces with items inside, such as `{"Finance", "CS"}`, or with the `set()` function.
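
"Unordered" and "unique" have practical consequences. The sketch below, using two made-up sets `a` and `b`, shows that order does not matter when comparing sets and that repeated items are stored once.

```python
a = {"Finance", "Biology", "CS"}
b = {"CS", "Finance", "Biology"}

# Sets with the same items are equal, whatever order they were written in.
print(a == b)                                   # True

# Lists with the same items in a different order are not equal.
print(["Finance", "CS"] == ["CS", "Finance"])   # False

# Repeated items are stored only once.
print(len({"CS", "CS", "CS"}))                  # 1

# Use sorted() when you need a predictable order for display.
print(sorted(a))                                # ['Biology', 'CS', 'Finance']
```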

In [None]:
# Creating sets
colleges = {"Business", "Engineering", "Health", "Business", "Engineering"}
print(colleges)  # Duplicates removed!

In [None]:
# Convert a list to a set to find unique values
majors_list = ["Finance", "CS", "Finance", "Biology", "CS", "Finance"]
unique_majors = set(majors_list)
print(f"All majors: {majors_list}")
print(f"Unique majors: {unique_majors}")
print(f"Number of unique majors: {len(unique_majors)}")

In [None]:
# Empty set must use set(), not {}
empty_set = set()     # This is an empty set
empty_dict = {}       # This is an empty dictionary!

print(type(empty_set))
print(type(empty_dict))

In [None]:
# Add and remove items
majors = {"Finance", "Biology"}

majors.add("Computer Science")
print(f"After add: {majors}")

majors.add("Finance")  # Already exists, no change
print(f"After duplicate add: {majors}")

majors.remove("Biology")
print(f"After remove: {majors}")

### Set Operations

Sets support mathematical operations that are very useful for data analysis:

| Operation | Method | Operator | Result |
|-----------|--------|----------|--------|
| Union | `a.union(b)` | `a \| b` | All items from both sets |
| Intersection | `a.intersection(b)` | `a & b` | Items in both sets |
| Difference | `a.difference(b)` | `a - b` | Items in a but not in b |
| Symmetric Difference | `a.symmetric_difference(b)` | `a ^ b` | Items in one but not both |
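
The method and operator columns in the table give the same results. A brief sketch with two small example sets (the names are invented for illustration) confirms this; `sorted()` is used only to make the printed order predictable.

```python
a = {"Alice", "Bob", "Charlie"}
b = {"Bob", "Diana"}

# Each method produces the same set as its operator.
print(a.union(b) == (a | b))                          # True
print(a.intersection(b) == (a & b))                   # True
print(a.difference(b) == (a - b))                     # True
print(a.symmetric_difference(b) == (a ^ b))           # True

print(sorted(a.union(b)))                  # ['Alice', 'Bob', 'Charlie', 'Diana']
print(sorted(a.intersection(b)))           # ['Bob']
print(sorted(a.difference(b)))             # ['Alice', 'Charlie']
print(sorted(a.symmetric_difference(b)))   # ['Alice', 'Charlie', 'Diana']
```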

In [None]:
# Students who took Python course
python_students = {"Alice", "Bob", "Charlie", "Diana"}

# Students who took Data Analysis course
data_students = {"Bob", "Diana", "Eve", "Frank"}

# Union: students who took at least one course
print(f"Either course: {python_students | data_students}")

# Intersection: students who took both courses
print(f"Both courses: {python_students & data_students}")

# Difference: Python only (not Data Analysis)
print(f"Python only: {python_students - data_students}")

# Symmetric difference: one but not both
print(f"Exactly one course: {python_students ^ data_students}")

In [None]:
# Practical example: find non-standard college names in the messy dataset
clean_colleges = {"College of Business", "P.C. Rossin College of Engineering", 
                  "College of Arts and Sciences", "College of Health", "College of Education"}

# Colleges found in messy data
messy_colleges = {"College of Business", "COB", "Business", "Engineering", 
                  "P.C. Rossin College of Engineering", "RCOE"}

# Find non-standard names that need cleaning
needs_cleaning = messy_colleges - clean_colleges
print(f"Non-standard names to fix: {needs_cleaning}")

## 4.10 Nested Data Structures

Real data often requires combining data structures. The most common patterns:

1. **List of dictionaries** - Multiple records (like a database table)
2. **Dictionary of lists** - Grouping items by category
3. **Dictionary of dictionaries** - Hierarchical data
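
The third pattern, a dictionary of dictionaries, can be sketched as follows. The variable `students_by_id` and its two records are made up for this illustration: the outer key is the student ID and the inner dictionary holds that student's fields.

```python
# Outer key: student ID. Inner dictionary: that student's fields.
students_by_id = {
    "LU100001": {"college": "Business", "gpa": 3.41},
    "LU100002": {"college": "Engineering", "gpa": 3.55},
}

# Look up a student directly by ID, then a field by name.
print(students_by_id["LU100002"]["gpa"])   # 3.55

# Update one field of one student.
students_by_id["LU100001"]["gpa"] = 3.50

# Loop over IDs and records together.
for student_id, record in students_by_id.items():
    print(f"{student_id}: {record['college']}, GPA={record['gpa']:.2f}")
```

Compared with a list of dictionaries, this layout trades position-based access for direct lookup by ID.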

In [None]:
# List of dictionaries - most common for tabular data
students = [
    {"id": "LU100001", "college": "Business", "gpa": 3.41, "credits": 105},
    {"id": "LU100002", "college": "Engineering", "gpa": 3.55, "credits": 73},
    {"id": "LU100003", "college": "Engineering", "gpa": 1.85, "credits": 45},
    {"id": "LU100004", "college": "Health", "gpa": 3.90, "credits": 125},
]

# Access a specific student
print(students[0])  # First student (dictionary)

# Access a specific field of a specific student
print(students[0]["gpa"])  # First student's GPA

In [None]:
# Loop through list of dictionaries
print("Student Report:")
print("-" * 40)
for student in students:
    print(f"{student['id']}: {student['college']}, GPA={student['gpa']:.2f}")

In [None]:
# Calculate average GPA from list of dictionaries
total_gpa = 0
for student in students:
    total_gpa = total_gpa + student["gpa"]
    
average_gpa = total_gpa / len(students)
print(f"Average GPA: {average_gpa:.2f}")

In [None]:
# Filter students with GPA >= 3.5
deans_list = []
for student in students:
    if student["gpa"] >= 3.5:
        deans_list.append(student)

print("Dean's List:")
for student in deans_list:
    print(f"  {student['id']}: {student['gpa']}")

In [None]:
# Dictionary of lists - grouping by category
students_by_college = {
    "Business": ["LU100001", "LU100005"],
    "Engineering": ["LU100002", "LU100003"],
    "Health": ["LU100004"]
}

# Get all Engineering students
print(f"Engineering students: {students_by_college['Engineering']}")

# Count students per college
for college, student_list in students_by_college.items():
    print(f"{college}: {len(student_list)} students")

In [None]:
# Build dictionary of lists from flat data
students = [
    {"id": "LU100001", "college": "Business", "gpa": 3.41},
    {"id": "LU100002", "college": "Engineering", "gpa": 3.55},
    {"id": "LU100003", "college": "Engineering", "gpa": 1.85},
    {"id": "LU100004", "college": "Health", "gpa": 3.90},
    {"id": "LU100005", "college": "Business", "gpa": 2.75},
]

# Group students by college
by_college = {}
for student in students:
    college = student["college"]
    if college not in by_college:
        by_college[college] = []
    by_college[college].append(student["id"])

print(by_college)

## 4.11 Choosing the Right Data Structure

| Need | Use | Example |
|------|-----|--------|
| Ordered collection that changes | List | GPAs, credit hours |
| Fixed record of related values | Tuple | (id, name, gpa) |
| Named fields for a single record | Dictionary | {"id": "LU001", "gpa": 3.5} |
| Multiple records with same structure | List of dictionaries | Student database |
| Unique values only | Set | Unique majors |
| Quick lookup by name | Dictionary | config settings |
| Grouping items by category | Dictionary of lists | Students by college |
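
As a rough illustration of the table, the sketch below takes the same three made-up records and stores them in several structures at once, each answering a different question. All names here (`rows`, `records`, `majors`, `ids_by_major`) are chosen for the example.

```python
# Fixed records: a list of tuples (id, major, gpa).
rows = [
    ("LU100001", "Finance", 3.41),
    ("LU100002", "CS", 3.85),
    ("LU100003", "Finance", 2.95),
]

records = []        # list of dictionaries: named fields for each record
majors = set()      # set: unique values only
ids_by_major = {}   # dictionary of lists: grouping by category

for student_id, major, gpa in rows:
    records.append({"id": student_id, "major": major, "gpa": gpa})
    majors.add(major)
    if major not in ids_by_major:
        ids_by_major[major] = []
    ids_by_major[major].append(student_id)

print(records[1]["gpa"])         # 3.85
print(sorted(majors))            # ['CS', 'Finance']
print(ids_by_major["Finance"])   # ['LU100001', 'LU100003']
```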

---

## 🚶 WALK Practice Problems

Use AI to help you understand concepts and errors, but write all code yourself.

---

### Problem 4.7: Set Operations for Data Cleaning
You're preparing to clean the messy dataset. Given:
- `valid_colleges`: The 5 official college names
- `found_colleges`: The college names found in the messy data

Find:
1. Which college names need to be standardized (in found but not valid)
2. Which valid colleges appear in the data
3. All unique college names (valid or not)

In [None]:
# Your code here
valid_colleges = {"College of Business", "P.C. Rossin College of Engineering", 
                  "College of Arts and Sciences", "College of Health", "College of Education"}

found_colleges = {"College of Business", "COB", "Business", "college of business",
                  "P.C. Rossin College of Engineering", "Engineering", "RCOE",
                  "College of Arts and Sciences", "CAS", "A&S"}


### Problem 4.8: Processing List of Dictionaries
Given the student data below:
1. Find all students on academic probation (GPA < 2.0)
2. Calculate average GPA by college
3. Find the student with the highest GPA

In [None]:
# Your code here
students = [
    {"id": "LU100001", "college": "Business", "gpa": 3.41},
    {"id": "LU100002", "college": "Engineering", "gpa": 3.55},
    {"id": "LU100003", "college": "Engineering", "gpa": 1.85},
    {"id": "LU100004", "college": "Health", "gpa": 3.90},
    {"id": "LU100005", "college": "Business", "gpa": 2.10},
    {"id": "LU100006", "college": "Business", "gpa": 1.65},
    {"id": "LU100007", "college": "Health", "gpa": 3.75},
]


### Problem 4.9: Building Nested Structures
Transform the flat student list into a nested structure organized by college.
The result should look like:
```python
{
    "Business": [
        {"id": "LU100001", "gpa": 3.41},
        {"id": "LU100005", "gpa": 2.10},
        ...
    ],
    "Engineering": [...],
    ...
}
```

In [None]:
# Your code here
students = [
    {"id": "LU100001", "college": "Business", "gpa": 3.41},
    {"id": "LU100002", "college": "Engineering", "gpa": 3.55},
    {"id": "LU100003", "college": "Engineering", "gpa": 1.85},
    {"id": "LU100004", "college": "Health", "gpa": 3.90},
    {"id": "LU100005", "college": "Business", "gpa": 2.10},
]


### Problem 4.10: Debug These Errors
Fix these code cells. Use AI to understand the errors.

In [None]:
# Error 1: Fix this
student = {"id": "LU001", "gpa": 3.5}
print(student["name"])

In [None]:
# Error 2: Fix this
gpas = [3.5, 3.2, 3.8]
gpas[5] = 4.0

In [None]:
# Error 3: Why doesn't this work?
data = ("LU001", "Finance", 3.5)
data[2] = 3.75

In [None]:
# Error 4: Fix this (subtle!)
students = [
    {"id": "LU001", "gpa": 3.5}
    {"id": "LU002", "gpa": 3.2}
]

---

# 🚀 RUN: Real-World Application

**Rules for this section:**
- Full AI collaboration is encouraged
- Document your process
- You must understand and be able to explain every line

---

## Chapter Project: Student Records Management System

Build a complete system to manage student records using the data structures from this chapter. This simulates what you'd build before having access to pandas.

### Data
```python
students = [
    {"id": "LU100001", "college": "Business", "major": "Finance", "gpa": 3.41, "credits": 105},
    {"id": "LU100002", "college": "Engineering", "major": "Computer Science", "gpa": 3.55, "credits": 73},
    {"id": "LU100003", "college": "Engineering", "major": "Mechanical Engineering", "gpa": 1.85, "credits": 45},
    {"id": "LU100004", "college": "Health", "major": "Nursing", "gpa": 3.90, "credits": 125},
    {"id": "LU100005", "college": "Business", "major": "Marketing", "gpa": 2.10, "credits": 89},
    {"id": "LU100006", "college": "Education", "major": "Elementary Education", "gpa": 4.0, "credits": 102},
    {"id": "LU100007", "college": "Health", "major": "Public Health", "gpa": 1.65, "credits": 30},
    {"id": "LU100008", "college": "Arts and Sciences", "major": "Psychology", "gpa": 3.67, "credits": 68},
    {"id": "LU100009", "college": "Engineering", "major": "Computer Science", "gpa": 3.85, "credits": 91},
    {"id": "LU100010", "college": "Business", "major": "Finance", "gpa": 2.95, "credits": 115},
]
```

### Requirements

Build functions to:

1. **`get_unique_values(students, field)`** - Return a set of unique values for any field

2. **`filter_students(students, field, value)`** - Return list of students matching a field value

3. **`group_by(students, field)`** - Return dictionary grouping students by a field

4. **`calculate_stats_by_group(students, group_field, stat_field)`** - Calculate average of stat_field for each group

5. **`find_at_risk(students)`** - Return students on probation (GPA < 2.0) or low completion (credits < 30 and not First Year)

6. **`generate_report(students)`** - Print a comprehensive report including:
   - Total students and unique colleges/majors
   - Average GPA by college
   - Dean's List students (GPA >= 3.5)
   - At-risk students

### AI Collaboration Tips
Good prompts:
- "How do I calculate an average from values in a list of dictionaries?"
- "What's the cleanest way to group items by a key in Python?"

Avoid:
- "Write a student management system for me"

In [None]:
# STUDENT RECORDS MANAGEMENT SYSTEM
#
# AI Collaboration Log:
# - Prompts used:
# - Key insights:
# - My modifications:

students = [
    {"id": "LU100001", "college": "Business", "major": "Finance", "gpa": 3.41, "credits": 105},
    {"id": "LU100002", "college": "Engineering", "major": "Computer Science", "gpa": 3.55, "credits": 73},
    {"id": "LU100003", "college": "Engineering", "major": "Mechanical Engineering", "gpa": 1.85, "credits": 45},
    {"id": "LU100004", "college": "Health", "major": "Nursing", "gpa": 3.90, "credits": 125},
    {"id": "LU100005", "college": "Business", "major": "Marketing", "gpa": 2.10, "credits": 89},
    {"id": "LU100006", "college": "Education", "major": "Elementary Education", "gpa": 4.0, "credits": 102},
    {"id": "LU100007", "college": "Health", "major": "Public Health", "gpa": 1.65, "credits": 30},
    {"id": "LU100008", "college": "Arts and Sciences", "major": "Psychology", "gpa": 3.67, "credits": 68},
    {"id": "LU100009", "college": "Engineering", "major": "Computer Science", "gpa": 3.85, "credits": 91},
    {"id": "LU100010", "college": "Business", "major": "Finance", "gpa": 2.95, "credits": 115},
]

# Your functions here...

def get_unique_values(students, field):
    """Return a set of unique values for the specified field."""
    pass  # Replace with your implementation

def filter_students(students, field, value):
    """Return list of students where field equals value."""
    pass  # Replace with your implementation

def group_by(students, field):
    """Return dictionary grouping students by field value."""
    pass  # Replace with your implementation

def calculate_stats_by_group(students, group_field, stat_field):
    """Calculate average of stat_field for each group."""
    pass  # Replace with your implementation

def find_at_risk(students):
    """Return students on probation or with low completion."""
    pass  # Replace with your implementation

def generate_report(students):
    """Print comprehensive student report."""
    pass  # Replace with your implementation

# Test your functions
print("=" * 50)
print("STUDENT RECORDS MANAGEMENT SYSTEM")
print("=" * 50)

# Add test code here...


### Project Reflection

1. Which data structure was most useful for which task?
2. How would you handle the messy dataset issues (inconsistent college names) using sets?
3. What would be different if you used pandas instead of these basic structures?
4. Which function was hardest to implement? Why?

*Your reflection here:*



---

# Preview: Week 3 Data Cleaning Challenge

In Week 3, you'll use these data structure skills to clean the **messy version** of the Lehigh student dataset. Here's what you'll face:

| Issue | Data Structure Solution |
|-------|------------------------|
| Inconsistent college names ("COB", "Business", "college of business") | Dictionary mapping + Set of valid names |
| Missing GPA values | List filtering with `if gpa is not None` |
| Duplicate student records | Set of seen IDs to detect duplicates |
| Finding unique majors with typos | Set operations to compare clean vs messy |
| Grouping data for analysis | Dictionary of lists |

The skills from this chapter translate directly to data cleaning tasks. When you use pandas later, you'll find that working with a DataFrame feels much like working with a list of dictionaries that has many convenience tools built in.
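
To make the table more concrete, here is a minimal sketch of three of those techniques working together. The rows and the `name_map` dictionary are invented for illustration and are not taken from the real messy dataset; the actual issues you meet in Week 3 may differ.

```python
# Hypothetical messy rows, invented for this sketch (not the real dataset).
raw_rows = [
    {"id": "LU100001", "college": "COB", "gpa": 3.41},
    {"id": "LU100002", "college": "college of business", "gpa": None},
    {"id": "LU100001", "college": "COB", "gpa": 3.41},   # duplicate ID
    {"id": "LU100003", "college": "College of Business", "gpa": 2.95},
]

# Dictionary mapping: non-standard name -> official name (assumed mapping).
name_map = {
    "COB": "College of Business",
    "Business": "College of Business",
    "college of business": "College of Business",
}

seen_ids = set()    # set of IDs already processed, used to skip duplicates
clean_rows = []
for row in raw_rows:
    if row["id"] not in seen_ids:
        seen_ids.add(row["id"])
        # Fall back to the original name when no mapping exists.
        official = name_map.get(row["college"], row["college"])
        clean_rows.append({"id": row["id"], "college": official, "gpa": row["gpa"]})

# List filtering: keep only GPAs that are not missing.
known_gpas = []
for row in clean_rows:
    if row["gpa"] is not None:
        known_gpas.append(row["gpa"])

print(len(clean_rows))   # 3
print(known_gpas)        # [3.41, 2.95]
```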

---

# Accountability Check

## 🐛 CRAWL (Must do without AI)
- [ ] Create lists and access items by index
- [ ] Use slicing to extract portions of a list
- [ ] Apply list methods: append, extend, remove, pop, sort
- [ ] Understand the difference between lists and tuples
- [ ] Create dictionaries and access values by key
- [ ] Iterate over dictionaries using keys(), values(), items()
- [ ] Check if a key exists in a dictionary

## 🚶 WALK (AI to learn, write code yourself)
- [ ] Use sets to find unique values
- [ ] Perform set operations (union, intersection, difference)
- [ ] Work with nested structures (list of dictionaries)
- [ ] Group data using dictionaries
- [ ] Choose the right data structure for a problem

## 🚀 RUN (AI-assisted, must understand)
- [ ] Build functions that process complex data structures
- [ ] Transform data between different structure formats
- [ ] Filter, group, and aggregate data
- [ ] Generate reports from structured data

**Review CRAWL material if you can't do it from memory.**

---

## What's Next?

In **Week 3**, you'll tackle:

- **File I/O:** Reading and writing CSV files
- **Exception Handling:** Dealing with errors gracefully
- **Data Cleaning Project:** The messy Lehigh dataset with 28+ data quality issues
- **Midterm Review:** Preparing for the no-AI exam

The data structures from this chapter are the foundation for everything that follows. Make sure you can work with lists, dictionaries, and sets confidently before moving on.

---