# Python Collections Module Exercises


In [116]:
import pandas as pd
from collections import Counter, defaultdict, deque, namedtuple, ChainMap

**Dataset:** *Injection Mold 365-Day Inspection Log*

**Objective:** Use `collections` to solve typical quality control and maintenance tracking problems in manufacturing workflows.


In [117]:
file_path = 'injection_mold_365day_dataset.csv'

In [118]:
df = pd.read_csv(file_path)

In [119]:
df.head()

Unnamed: 0,Date,Shift,Mold ID,Part Name,Wear Level (%),Surface Roughness (Ra μm),Clearance (mm),Alignment Accuracy (mm),Leak Check,Corrosion,Crack Detected,Function Test,Lubrication Status,Pass/Fail
0,2025-01-01,Shift 1,MOLD741,Guide Pin,60.8,1.03,0.323,0.076,Pass,Severe,No,Fail,Needs Reapply,Fail
1,2025-01-01,Shift 1,MOLD742,Core Plate,43.5,0.28,0.339,0.188,Pass,Moderate,Yes,Fail,Good,Fail
2,2025-01-01,Shift 1,MOLD560,Runner Channel,15.0,0.57,0.418,0.128,Fail,Moderate,Yes,Pass,Needs Reapply,Pass
3,2025-01-01,Shift 2,MOLD126,Support Pillar,68.7,0.47,0.109,0.045,Pass,Moderate,No,Pass,Good,Pass
4,2025-01-01,Shift 2,MOLD727,Core Plate,53.7,0.41,0.351,0.181,Fail,Moderate,Yes,Fail,Needs Reapply,Fail


In [120]:
df.columns

Index(['Date', 'Shift', 'Mold ID', 'Part Name', 'Wear Level (%)',
       'Surface Roughness (Ra μm)', 'Clearance (mm)',
       'Alignment Accuracy (mm)', 'Leak Check', 'Corrosion', 'Crack Detected',
       'Function Test', 'Lubrication Status', 'Pass/Fail'],
      dtype='object')

### **Problem 1: Count Defective Parts per Day**

**Purpose**: Practice using `Counter` to aggregate counts.

> Count how many parts failed the final `Pass/Fail` test for each date.


In [121]:
failures = df[df['Pass/Fail'] == 'Fail']

In [122]:
fail_per_date = Counter(failures['Date'])

In [123]:
fail_per_date

Counter({'2025-01-10': 6,
         '2025-05-06': 6,
         '2025-08-07': 6,
         '2025-11-30': 6,
         '2025-01-08': 5,
         '2025-01-16': 5,
         '2025-03-18': 5,
         '2025-03-23': 5,
         '2025-03-28': 5,
         '2025-04-03': 5,
         '2025-04-08': 5,
         '2025-04-11': 5,
         '2025-06-01': 5,
         '2025-06-03': 5,
         '2025-06-20': 5,
         '2025-07-08': 5,
         '2025-07-19': 5,
         '2025-07-20': 5,
         '2025-08-09': 5,
         '2025-08-10': 5,
         '2025-08-13': 5,
         '2025-09-02': 5,
         '2025-09-07': 5,
         '2025-09-17': 5,
         '2025-09-20': 5,
         '2025-11-13': 5,
         '2025-11-18': 5,
         '2025-12-03': 5,
         '2025-12-16': 5,
         '2025-12-23': 5,
         '2025-01-12': 4,
         '2025-01-21': 4,
         '2025-01-25': 4,
         '2025-01-30': 4,
         '2025-02-08': 4,
         '2025-02-10': 4,
         '2025-02-15': 4,
         '2025-02-16': 4,
         '20
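
Once the counts exist, `Counter` can rank them directly. The sketch below uses a small made-up list of failure dates (not the real dataset) to show `most_common` and the zero default for unseen keys:

```python
from collections import Counter

# Hypothetical sample: the 'Date' of each failed inspection (not the real data).
failed_dates = [
    '2025-01-10', '2025-01-08', '2025-01-10',
    '2025-01-12', '2025-01-10', '2025-01-08',
]

fail_per_date = Counter(failed_dates)

# most_common(n) returns the n largest counts as (key, count) pairs.
top_two = fail_per_date.most_common(2)
print(top_two)  # [('2025-01-10', 3), ('2025-01-08', 2)]

# A date with no failures reads as 0 instead of raising KeyError.
missing = fail_per_date['2025-02-01']
print(missing)  # 0
```

With the notebook's data, `fail_per_date.most_common(5)` would list the five worst days.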

### **Problem 2: Track Mold Issues with `defaultdict`**

**Purpose**: Practice using `defaultdict(set)` for grouped data.

> Create a dictionary where each mold ID is the key and its value is a set of all issues recorded for that mold (e.g., "Leak Check" is "Fail", "Corrosion" is "Severe").


In [124]:
mold_issues = defaultdict(set)

In [125]:
for _, row in df.iterrows():
    mold_id = row['Mold ID']

    if row['Leak Check'] == 'Fail':
        # Store a descriptive label; the raw value 'Fail' would be ambiguous in the set
        mold_issues[mold_id].add('Leak Check Failed')

    if row['Corrosion'] in ['Moderate', 'Severe']:
        mold_issues[mold_id].add(f'Corrosion: {row["Corrosion"]}')

    if row['Crack Detected'] == 'Yes':
        mold_issues[mold_id].add('Crack Detected')

    if row['Function Test'] == 'Fail':
        mold_issues[mold_id].add('Function Test Failed')

    if row['Lubrication Status'] == 'Needs Reapply':
        mold_issues[mold_id].add('Needs Lubrication')

In [126]:
dict(list(mold_issues.items())[:5])

{'MOLD741': {'Corrosion: Severe', 'Function Test Failed', 'Needs Lubrication'},
 'MOLD742': {'Corrosion: Moderate',
  'Corrosion: Severe',
  'Crack Detected',
  'Function Test Failed',
  'Needs Lubrication'},
 'MOLD560': {'Corrosion: Moderate',
  'Crack Detected',
  'Leak Check Failed',
  'Needs Lubrication'},
 'MOLD126': {'Corrosion: Moderate'},
 'MOLD727': {'Corrosion: Moderate',
  'Corrosion: Severe',
  'Crack Detected',
  'Function Test Failed',
  'Leak Check Failed',
  'Needs Lubrication'}}

## What is `defaultdict`?

`defaultdict` is a `dict` subclass from Python's `collections` module.
It **automatically creates a default value** for a missing key the first time that key is looked up with `d[key]`. Membership tests (`key in d`) and `d.get(key)` do not create the key.



### Example Comparison

#### Using regular `dict`:

```python
issues = {}
issues['MOLD741'].append('Leak Check Failed')  # ERROR! Key doesn't exist
```

You would get:

```
KeyError: 'MOLD741'
```

#### Using `defaultdict(list)` or `defaultdict(set)`:

```python
from collections import defaultdict

issues = defaultdict(list)
issues['MOLD741'].append('Leak Check Failed')  # Works automatically
```

It **creates a new empty list** for `'MOLD741'` the first time it's accessed.
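
To make the "when accessed" behaviour concrete, here is a minimal sketch (the mold IDs are illustrative sample values). Only a subscript lookup calls the default factory; `in` and `.get()` leave the dictionary untouched:

```python
from collections import defaultdict

issues = defaultdict(list)

# Membership tests and .get() do NOT create the key.
has_key_before = 'MOLD741' in issues
fetched = issues.get('MOLD741')

# A subscript lookup calls the factory (list) and stores the result.
issues['MOLD741'].append('Leak Check Failed')

# Merely reading a missing key with [] also inserts it.
_ = issues['MOLD999']

print(has_key_before)  # False
print(fetched)         # None
print(dict(issues))    # {'MOLD741': ['Leak Check Failed'], 'MOLD999': []}
```

The last line is worth remembering: reading with `[]` can leave empty entries behind, so use `.get()` or `in` when you only want to inspect.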



## What does `defaultdict` do better than a regular `dict`?

| Feature                                            | `dict` | `defaultdict` |
| -------------------------------------------------- | ------ | ------------- |
| Needs manual key checking                          | ✅      | ❌             |
| Auto-creates empty value                           | ❌      | ✅             |
| Cleaner syntax for appending                       | ❌      | ✅             |
| Better for grouping/aggregation                    | ❌      | ✅             |
| Works with any default type (list, set, int, etc.) | ❌      | ✅             |



## Why is `defaultdict(set)` useful for **this mold inspection problem**?

In this specific case:

* Each **Mold ID** (like `'MOLD741'`) maps to a **set of issues**.
* With `defaultdict(set)`, you don't need to check if the mold already has an entry:

```python
mold_issues[mold_id].add('Function Test Failed')  # Automatically works
```

Using a plain `dict` would require this extra check every time:

```python
if mold_id not in mold_issues:
    mold_issues[mold_id] = set()
mold_issues[mold_id].add('Function Test Failed')
```

That's more code and easier to mess up.
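
For comparison, a plain `dict` can also skip the explicit check by using `dict.setdefault`. This is a sketch with illustrative sample values, shown only to contrast the two styles:

```python
mold_issues = {}

# setdefault returns the existing value, or stores and returns the default.
mold_issues.setdefault('MOLD741', set()).add('Function Test Failed')
mold_issues.setdefault('MOLD741', set()).add('Needs Lubrication')
mold_issues.setdefault('MOLD560', set()).add('Crack Detected')

print(mold_issues['MOLD560'])  # {'Crack Detected'}
```

Note that `setdefault` builds a fresh `set()` on every call, even when the key already exists, whereas `defaultdict` only calls its factory for missing keys.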



## Summary: Should You Always Use `defaultdict`?

Not always. But if you:

* Need to **group** things by a key
* Want to **append**, **extend**, or **add** to lists/sets/counters
* Are **tired of writing key-checking code**

Then `defaultdict` is your best friend.

### **Problem 2.1: Group Parts by Mold ID**

**Purpose**: Practice using `defaultdict(set)` for grouping.

> Create a dictionary where each **Mold ID** is the key, and the value is the collection of unique **part names** it has produced over time.

Use `defaultdict(set)`. Avoid using any `if key not in dict` logic.

In [137]:
part_names = defaultdict(set)

In [138]:
for _, row in df.iterrows():
    mold_id = row['Mold ID']
    part_names[mold_id].add(row['Part Name'])

In [139]:
dict(list(part_names.items())[:3])

{'MOLD741': {'Guide Pin'},
 'MOLD742': {'Cavity Plate', 'Core Plate', 'Sprue Bushing'},
 'MOLD560': {'Runner Channel'}}

In [140]:
# Convert each set to a list (sets are unordered, so the element order is arbitrary)
part_names_cleaned = {mold: list(parts) for mold, parts in part_names.items()}

In [141]:
dict(list(part_names_cleaned.items())[:3])

{'MOLD741': ['Guide Pin'],
 'MOLD742': ['Core Plate', 'Cavity Plate', 'Sprue Bushing'],
 'MOLD560': ['Runner Channel']}

### **Problem 2.2: Track All Days a Mold Had a Lubrication Problem**

**Purpose**: Practice using `defaultdict(set)` to collect **unique dates**.

> Create a dictionary where each **Mold ID** maps to a **set of dates** when the **"Lubrication Status" was "Needs Reapply"**.

Use `defaultdict(set)` to store only the date, ensuring no duplicate dates for any mold.

In [142]:
lubrication_reapply_dates = defaultdict(set)

In [143]:
for _, row in df.iterrows():
    mold_id = row['Mold ID']
    
    if row['Lubrication Status'] == 'Needs Reapply':
        lubrication_reapply_dates[mold_id].add(row['Date'])

In [144]:
dict(list(lubrication_reapply_dates.items())[:3])

{'MOLD741': {'2025-01-01',
  '2025-06-06',
  '2025-09-16',
  '2025-09-29',
  '2025-11-13',
  '2025-12-11'},
 'MOLD560': {'2025-01-01',
  '2025-02-12',
  '2025-04-07',
  '2025-05-02',
  '2025-05-17',
  '2025-12-13'},
 'MOLD727': {'2025-01-01',
  '2025-01-06',
  '2025-04-17',
  '2025-06-03',
  '2025-11-04',
  '2025-12-16',
  '2025-12-20'}}

### **Problem 2.3: Count All Issue Types per Mold ID**

**Purpose**: Practice using `defaultdict(Counter)` for multi-category aggregation.

> Build a dictionary where each **Mold ID** is a key, and the value is a `Counter` object tallying different issue types (like `"Leak Check Failed"`, `"Crack Detected"`).

Use nested `defaultdict(Counter)` to maintain a count of issue types per mold.

### What does `defaultdict(Counter)` mean?

For every new Mold ID, Python automatically creates an empty `Counter()` the first time that key is looked up.

Because a `Counter` returns 0 for a missing key, you can then tally issue types with `+= 1` without checking whether either key exists.
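
A minimal sketch of the pattern on a handful of made-up `(mold_id, issue)` pairs (hypothetical observations, not rows from the dataset):

```python
from collections import Counter, defaultdict

# Hypothetical (mold_id, issue) observations.
observations = [
    ('MOLD741', 'Needs Lubrication'),
    ('MOLD742', 'Crack Detected'),
    ('MOLD741', 'Needs Lubrication'),
    ('MOLD741', 'Function Test Failed'),
]

issue_counts = defaultdict(Counter)
for mold_id, issue in observations:
    # The outer lookup creates an empty Counter; the inner one starts at 0.
    issue_counts[mold_id][issue] += 1

top_issue = issue_counts['MOLD741'].most_common(1)
print(top_issue)  # [('Needs Lubrication', 2)]
```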

In [145]:
mold_issues_counter = defaultdict(Counter)

In [146]:
for _, row in df.iterrows():
    mold_id = row['Mold ID']
    
    if row['Leak Check'] == 'Fail':
        mold_issues_counter[mold_id]['Leak Check Failed'] += 1
        
    if row['Corrosion'] in ['Moderate', 'Severe']:
        mold_issues_counter[mold_id][f'Corrosion: {row["Corrosion"]}'] += 1
        
    if row['Crack Detected'] == 'Yes':
        mold_issues_counter[mold_id]['Crack Detected'] += 1
        
    if row['Function Test'] == 'Fail':
        mold_issues_counter[mold_id]['Function Test Failed'] += 1
        
    if row['Lubrication Status'] == 'Needs Reapply':
        mold_issues_counter[mold_id]['Needs Lubrication'] += 1

In [147]:
dict(list(mold_issues_counter.items())[:3])

{'MOLD741': Counter({'Corrosion: Severe': 6,
          'Function Test Failed': 6,
          'Needs Lubrication': 6}),
 'MOLD742': Counter({'Function Test Failed': 19,
          'Corrosion: Moderate': 15,
          'Crack Detected': 12,
          'Needs Lubrication': 7,
          'Corrosion: Severe': 4}),
 'MOLD560': Counter({'Leak Check Failed': 6,
          'Corrosion: Moderate': 6,
          'Crack Detected': 6,
          'Needs Lubrication': 6})}

### **Problem 2.4: Count Inspections per Part Name**

**Purpose**: Practice using `defaultdict(int)` for simple counting.

> Count how many times each **Part Name** appears in the dataset using `defaultdict(int)`.

In [148]:
part_names_count = defaultdict(int)

In [149]:
for _, row in df.iterrows():
    part_names_count[row['Part Name']] += 1

In [150]:
dict(list(part_names_count.items())[:3])

{'Guide Pin': 104, 'Core Plate': 138, 'Runner Channel': 116}
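
As a cross-check, `Counter` produces the same tallies as the `defaultdict(int)` loop in one call. The sketch below uses a short made-up list of part names rather than the dataset column:

```python
from collections import Counter, defaultdict

# Hypothetical sample of part names.
part_names = ['Guide Pin', 'Core Plate', 'Guide Pin',
              'Runner Channel', 'Core Plate', 'Guide Pin']

# Manual counting: int() supplies the starting value 0.
manual = defaultdict(int)
for name in part_names:
    manual[name] += 1

# Counter does the same loop internally.
automatic = Counter(part_names)

print(dict(manual) == dict(automatic))  # True
```

On the notebook's DataFrame the equivalent would be `Counter(df['Part Name'])`, mirroring the approach used in Problem 1.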

### **Problem 2.5: Sum Total Wear Level per Mold**

**Purpose**: Practice using `defaultdict(float)` for accumulating numerical values.

> For each **Mold ID**, compute the **total Wear Level (%)** across all its inspections.

In [151]:
total_wear_level = defaultdict(float)

In [154]:
for _, row in df.iterrows():
    total_wear_level[row['Mold ID']] += row['Wear Level (%)']
    

In [155]:
dict(list(total_wear_level.items())[:3])

{'MOLD741': 364.8, 'MOLD742': 846.2, 'MOLD560': 90.0}
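
A raw total is hard to compare between molds that were inspected a different number of times. One possible extension, sketched here on made-up readings, is to keep a second `defaultdict(int)` for the inspection count and divide:

```python
from collections import defaultdict

# Hypothetical (mold_id, wear_level) readings.
readings = [('MOLD741', 60.0), ('MOLD742', 40.0), ('MOLD741', 70.0),
            ('MOLD742', 50.0), ('MOLD560', 15.0)]

total_wear = defaultdict(float)       # running sum per mold
inspection_count = defaultdict(int)   # number of readings per mold

for mold_id, wear in readings:
    total_wear[mold_id] += wear
    inspection_count[mold_id] += 1

mean_wear = {mold: total_wear[mold] / inspection_count[mold]
             for mold in total_wear}
print(mean_wear)  # {'MOLD741': 65.0, 'MOLD742': 45.0, 'MOLD560': 15.0}
```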

### **Problem 2.6: Store Latest Function Test by Date**

**Purpose**: Practice using `defaultdict(dict)` to hold structured values.

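> Create a dictionary where each **Mold ID** maps to another dictionary of `{Date: Function Test}`. If a mold is inspected more than once on the same date, the later row overwrites the earlier one, so only the latest result per date is kept.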

In [156]:
function_test_results = defaultdict(dict)

In [157]:
for _, row in df.iterrows():
    mold = row['Mold ID']
    date = row['Date']
    function_test_results[mold][date] = row['Function Test']

In [158]:
dict(list(function_test_results.items())[:3])

{'MOLD741': {'2025-01-01': 'Fail',
  '2025-06-06': 'Fail',
  '2025-09-16': 'Fail',
  '2025-09-29': 'Fail',
  '2025-11-13': 'Fail',
  '2025-12-11': 'Fail'},
 'MOLD742': {'2025-01-01': 'Fail',
  '2025-01-31': 'Fail',
  '2025-02-07': 'Fail',
  '2025-02-19': 'Fail',
  '2025-04-16': 'Fail',
  '2025-06-05': 'Fail',
  '2025-06-15': 'Fail',
  '2025-07-03': 'Fail',
  '2025-07-12': 'Fail',
  '2025-08-27': 'Fail',
  '2025-09-12': 'Fail',
  '2025-10-11': 'Fail',
  '2025-10-26': 'Fail',
  '2025-11-01': 'Fail',
  '2025-11-03': 'Fail',
  '2025-11-06': 'Fail',
  '2025-11-18': 'Fail',
  '2025-11-23': 'Fail',
  '2025-12-20': 'Fail'},
 'MOLD560': {'2025-01-01': 'Pass',
  '2025-02-12': 'Pass',
  '2025-04-07': 'Pass',
  '2025-05-02': 'Pass',
  '2025-05-17': 'Pass',
  '2025-12-13': 'Pass'}}

### **Problem 2.7: Keep Last 5 Roughness Values Per Mold**

**Purpose**: Practice using `defaultdict(deque)` with a `maxlen`.

> For each **Mold ID**, track the **last 5 surface roughness readings** using a sliding window.

In [162]:
surface_roughness = defaultdict(lambda: deque(maxlen=5))

In [163]:
for _, row in df.iterrows():
    surface_roughness[row['Mold ID']].append(row['Surface Roughness (Ra μm)'])

In [164]:
dict(list(surface_roughness.items())[:3])

{'MOLD741': deque([1.03, 1.03, 1.03, 1.03, 1.03], maxlen=5),
 'MOLD742': deque([0.28, 0.28, 0.28, 0.28, 0.15], maxlen=5),
 'MOLD560': deque([0.57, 0.57, 0.57, 0.57, 0.57], maxlen=5)}

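### Explanation:
`deque(maxlen=5)` maintains a sliding window of the last 5 items. When a 6th value is appended, the oldest one is automatically discarded.

`defaultdict` expects a zero-argument factory, so the `lambda` is needed to pass `maxlen=5` to each new `deque`.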
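
The eviction behaviour can be checked on a tiny example. The values below are arbitrary sample readings, and the second half uses a window of 2 only to keep the output short:

```python
from collections import defaultdict, deque

window = deque(maxlen=5)
for value in [0.21, 0.28, 0.41, 0.47, 0.57, 0.65]:
    window.append(value)  # the 6th append pushes 0.21 out

print(list(window))  # [0.28, 0.41, 0.47, 0.57, 0.65]

# The lambda is the zero-argument factory that builds each bounded deque.
per_mold = defaultdict(lambda: deque(maxlen=2))
for mold_id, value in [('MOLD741', 1.0), ('MOLD741', 2.0), ('MOLD741', 3.0)]:
    per_mold[mold_id].append(value)

print(per_mold['MOLD741'])  # deque([2.0, 3.0], maxlen=2)
```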

### **Problem 2.8: Build Nested Inspection Count by Shift and Mold**

**Purpose**: Practice using nested `defaultdict(lambda: defaultdict(int))`.

> Build a structure like:

```python
{
  'Shift 1': {'MOLD741': 12, 'MOLD742': 15},
  'Shift 2': {'MOLD741': 10}
}
```

In [174]:
shift_mold_total = defaultdict(lambda: defaultdict(int))

In [175]:
for _, row in df.iterrows():
    shift_mold_total[row['Shift']][row['Mold ID']] += 1

In [176]:
dict(list(shift_mold_total.items())[:3])

{'Shift 1': defaultdict(int,
             {'MOLD741': 3,
              'MOLD742': 9,
              'MOLD560': 2,
              'MOLD255': 7,
              'MOLD522': 4,
              'MOLD903': 1,
              'MOLD565': 9,
              'MOLD360': 3,
              'MOLD772': 11,
              'MOLD541': 3,
              'MOLD226': 4,
              'MOLD973': 4,
              'MOLD474': 4,
              'MOLD203': 3,
              'MOLD666': 5,
              'MOLD431': 3,
              'MOLD752': 9,
              'MOLD727': 3,
              'MOLD156': 7,
              'MOLD503': 11,
              'MOLD365': 2,
              'MOLD744': 1,
              'MOLD462': 1,
              'MOLD271': 10,
              'MOLD553': 4,
              'MOLD880': 3,
              'MOLD458': 2,
              'MOLD312': 7,
              'MOLD771': 3,
              'MOLD142': 5,
              'MOLD279': 2,
              'MOLD231': 8,
              'MOLD317': 5,
              'MOLD397': 3,
              'M
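
The nested `defaultdict` repr is noisy, and later lookups on it can silently create keys. A common follow-up, sketched here on a few made-up `(shift, mold_id)` pairs, is to convert the result to plain dictionaries once counting is finished:

```python
from collections import defaultdict

# Hypothetical (shift, mold_id) pairs.
records = [('Shift 1', 'MOLD741'), ('Shift 1', 'MOLD742'),
           ('Shift 2', 'MOLD741'), ('Shift 1', 'MOLD741')]

shift_mold_total = defaultdict(lambda: defaultdict(int))
for shift, mold_id in records:
    shift_mold_total[shift][mold_id] += 1

# Freeze into plain dicts: missing keys now raise KeyError again.
plain = {shift: dict(molds) for shift, molds in shift_mold_total.items()}
print(plain)
# {'Shift 1': {'MOLD741': 2, 'MOLD742': 1}, 'Shift 2': {'MOLD741': 1}}
```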

### **Problem 3: FIFO Buffer with `deque`**

**Purpose**: Simulate a real-time production buffer.

> Load inspection records for "Shift 1" into a deque buffer and remove the first 5 inspections (simulate processing). Show the remaining items.

Problem 3 gives a hands-on feel for how `deque` works, especially in scenarios like:

* Real-time quality control buffers
* Production line queueing
* First-In-First-Out (FIFO) simulation

#### FIFO Concept Recap:

* The first items added are the first to be removed.
* Ideal for queues, logs, and real-time buffers.
* `collections.deque` is optimized for fast appends and pops from both ends.
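
A short sketch of the four end operations, using placeholder strings in place of inspection records:

```python
from collections import deque

queue = deque(['insp-1', 'insp-2', 'insp-3'])

queue.append('insp-4')      # enqueue on the right
oldest = queue.popleft()    # dequeue from the left (FIFO order)
queue.appendleft('urgent')  # push to the front of the queue
newest = queue.pop()        # remove from the right

print(oldest)       # insp-1
print(newest)       # insp-4
print(list(queue))  # ['urgent', 'insp-2', 'insp-3']
```

A plain `list` can do the same, but `list.pop(0)` and `list.insert(0, x)` shift every remaining element, which is why `deque` is preferred for queues.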

In [181]:
df_clean = df.copy()

In [182]:
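# regex=False treats '(' and ')' as literal characters on every pandas version
df_clean.columns = df_clean.columns.str.replace(' ', '_', regex=False).str.replace('(', '', regex=False).str.replace(')', '', regex=False)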

In [183]:
# filter Shift 1
shift1_records = df_clean[df_clean['Shift'] == 'Shift 1']

In [187]:
# create deque
buffer = deque(shift1_records.to_dict('records'))

In [188]:
for _ in range(5):
    if buffer:
        buffer.popleft()

In [189]:
list(buffer)[:3]

[{'Date': '2025-01-02',
  'Shift': 'Shift 1',
  'Mold_ID': 'MOLD903',
  'Part_Name': 'Guide Pin',
  'Wear_Level_%': 78.9,
  'Surface_Roughness_Ra_μm': 0.21,
  'Clearance_mm': 0.315,
  'Alignment_Accuracy_mm': 0.097,
  'Leak_Check': 'Pass',
  'Corrosion': 'Moderate',
  'Crack_Detected': 'Yes',
  'Function_Test': 'Fail',
  'Lubrication_Status': 'Dry',
  'Pass/Fail': 'Pass'},
 {'Date': '2025-01-03',
  'Shift': 'Shift 1',
  'Mold_ID': 'MOLD565',
  'Part_Name': 'Runner Channel',
  'Wear_Level_%': 16.2,
  'Surface_Roughness_Ra_μm': 0.65,
  'Clearance_mm': 0.213,
  'Alignment_Accuracy_mm': 0.097,
  'Leak_Check': 'Pass',
  'Corrosion': 'Moderate',
  'Crack_Detected': 'Yes',
  'Function_Test': 'Pass',
  'Lubrication_Status': 'Needs Reapply',
  'Pass/Fail': 'Pass'},
 {'Date': '2025-01-03',
  'Shift': 'Shift 1',
  'Mold_ID': 'MOLD360',
  'Part_Name': 'Support Pillar',
  'Wear_Level_%': 44.3,
  'Surface_Roughness_Ra_μm': 0.45,
  'Clearance_mm': 0.083,
  'Alignment_Accuracy_mm': 0.119,
  'Leak_Chec

The five problems below, numbered from **3.1**, practice `deque` in real-world production scenarios: sliding windows, multi-shift processing, rollback simulation, and more.

---

### 🔁 **Deque Practice Problems (3.1 to 3.5)**

**Goal**: Strengthen understanding of real-time buffering, order-sensitive operations, and state management using `collections.deque`.

---

### 📘 **Problem 3.1: Sliding Average of Wear Level**

**Purpose**: Use `deque(maxlen=5)` to maintain a window of the last 5 wear levels per mold.

> For each **Mold ID**, track the **last 5 wear levels**, and calculate the **average** after each new reading is added.
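
One possible approach, sketched on made-up `(mold_id, wear_level)` readings rather than the dataset, combines `defaultdict` with a bounded `deque` and recomputes the mean after every append:

```python
from collections import defaultdict, deque

# Hypothetical readings in arrival order.
readings = [('MOLD741', 60.0), ('MOLD741', 70.0),
            ('MOLD742', 40.0), ('MOLD741', 80.0)]

windows = defaultdict(lambda: deque(maxlen=5))
running_averages = []

for mold_id, wear in readings:
    window = windows[mold_id]
    window.append(wear)  # values beyond the last 5 drop off the left
    running_averages.append((mold_id, sum(window) / len(window)))

print(running_averages)
# [('MOLD741', 60.0), ('MOLD741', 65.0), ('MOLD742', 40.0), ('MOLD741', 70.0)]
```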

---

### 📘 **Problem 3.2: Per-Shift Inspection Queue**

**Purpose**: Group `deque` buffers by shift to simulate parallel buffers.

> Create a dictionary where each **Shift** is a key and its value is a `deque` of inspections. Limit each shift's buffer to the **last 10 inspections** only.

---

### 📘 **Problem 3.3: Rollback Last N Processed Items**

**Purpose**: Use `deque.appendleft()` to simulate rollbacks.

> Simulate a production error: after removing 3 items from the left (processed), **push them back to the front** of the buffer using `appendleft()`.
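
A sketch of the rollback with single-letter placeholders standing in for inspection records. The detail to watch is ordering: `appendleft` pushes to the front, so the removed items must be restored in reverse to recover the original sequence.

```python
from collections import deque

buffer = deque(['A', 'B', 'C', 'D', 'E'])

# "Process" three items from the left.
processed = [buffer.popleft() for _ in range(3)]
print(processed)     # ['A', 'B', 'C']

# Roll back: the last item removed must be pushed back first.
for item in reversed(processed):
    buffer.appendleft(item)

print(list(buffer))  # ['A', 'B', 'C', 'D', 'E']
```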

---

### 📘 **Problem 3.4: Identify First Mold with Consecutive Failures**

**Purpose**: Scan a `deque` buffer of rows and find the first **Mold ID** with **3 consecutive 'Fail' values** in `"Pass/Fail"`.

> Use a buffer of recent rows, and scan for any mold whose last 3 `"Pass/Fail"` values were `"Fail"`.
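
One way to read this problem is "three failures in a row for the same mold, ignoring rows from other molds in between". Under that assumption, a sketch on made-up `(mold_id, result)` rows could look like this:

```python
from collections import defaultdict, deque

# Hypothetical (mold_id, pass_fail) rows in inspection order.
rows = [
    ('MOLD741', 'Fail'), ('MOLD742', 'Fail'), ('MOLD741', 'Fail'),
    ('MOLD742', 'Pass'), ('MOLD741', 'Fail'), ('MOLD742', 'Fail'),
]

recent = defaultdict(lambda: deque(maxlen=3))  # last 3 results per mold
first_offender = None

for mold_id, result in rows:
    window = recent[mold_id]
    window.append(result)
    # A full window of 'Fail' means three consecutive failures.
    if len(window) == 3 and all(r == 'Fail' for r in window):
        first_offender = mold_id
        break

print(first_offender)  # MOLD741
```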

---

### 📘 **Problem 3.5: FIFO Alert Dispatcher**

**Purpose**: Use a `deque` to manage a queue of alert messages.

> Create a buffer where each element is a string like `"Alert: Mold MOLD741 - Wear Level 92%"`. Add new alerts as needed, and always remove the **oldest alert** once the buffer size exceeds 5.
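
A minimal sketch, assuming that "remove the oldest once the size exceeds 5" can be delegated to `maxlen`. The mold numbers and wear values are generated placeholders:

```python
from collections import deque

alerts = deque(maxlen=5)  # the oldest alert is dropped automatically

for n in range(1, 8):  # seven alerts arrive, only five fit
    alerts.append(f'Alert: Mold MOLD{n:03d} - Wear Level {90 + n}%')

print(len(alerts))  # 5
print(alerts[0])    # Alert: Mold MOLD003 - Wear Level 93%
print(alerts[-1])   # Alert: Mold MOLD007 - Wear Level 97%
```

If dropped alerts need to be logged first, an unbounded `deque` with an explicit `popleft()` when `len(alerts) > 5` would be the alternative.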

---


### **Problem 4: Identify Most Common Faults**

**Purpose**: Use `Counter` to summarize quality issues.

> Find and display the most common combination of "Crack Detected" and "Function Test" outcomes.
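
Because tuples are hashable, `Counter` can count combinations directly. A sketch on made-up `(crack_detected, function_test)` pairs:

```python
from collections import Counter

# Hypothetical (Crack Detected, Function Test) pairs.
pairs = [('Yes', 'Fail'), ('No', 'Pass'), ('Yes', 'Fail'),
         ('No', 'Fail'), ('Yes', 'Pass'), ('Yes', 'Fail')]

combo_counts = Counter(pairs)
most_common_combo = combo_counts.most_common(1)
print(most_common_combo)  # [(('Yes', 'Fail'), 3)]
```

On the notebook's DataFrame, the pairs could be built with `zip(df['Crack Detected'], df['Function Test'])`.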

---

### **Problem 5: Use `namedtuple` for Part Summaries**

**Purpose**: Store row data using `namedtuple` for readability.

> Define a `namedtuple` called `InspectionSummary` and use it to represent the first 10 records in the dataset.
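
A sketch of the idea using the first two rows shown by `df.head()`. The choice of four fields and their lowercase names is an assumption made for brevity; the exercise leaves the field list open.

```python
from collections import namedtuple

# Field names are an illustrative choice, not prescribed by the exercise.
InspectionSummary = namedtuple(
    'InspectionSummary', ['date', 'mold_id', 'part_name', 'result']
)

rows = [
    ('2025-01-01', 'MOLD741', 'Guide Pin', 'Fail'),
    ('2025-01-01', 'MOLD742', 'Core Plate', 'Fail'),
]
summaries = [InspectionSummary(*row) for row in rows]

first = summaries[0]
print(first.mold_id)    # MOLD741 (access by name)
print(first[2])         # Guide Pin (access by position still works)
print(first._asdict())  # mapping of field name to value
```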

---

### **Problem 6: Build a Lubrication Log**

**Purpose**: Use `defaultdict` for grouped status.

> Build a log that groups all `Mold ID`s by their `Lubrication Status`. For example, show all molds whose status is "Needs Reapply".

---

### **Problem 7: Compare Configurations with `ChainMap`**

**Purpose**: Use `ChainMap` to layer configuration settings.

> Create two dictionaries: one with default quality thresholds (e.g. `Wear Level < 50`) and another with temporary stricter thresholds for specific parts. Layer them with `ChainMap` so the stricter values take precedence, and display the effective configuration.
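
A minimal sketch of the layering. The threshold names and values are hypothetical; the point is that `ChainMap` searches its mappings from left to right, so the override goes first.

```python
from collections import ChainMap

# Hypothetical threshold names and values.
default_thresholds = {'max_wear_level': 50, 'max_surface_roughness': 1.0}
strict_overrides = {'max_wear_level': 40}

# Lookups check strict_overrides first, then fall back to the defaults.
effective = ChainMap(strict_overrides, default_thresholds)

print(effective['max_wear_level'])         # 40
print(effective['max_surface_roughness'])  # 1.0

# Neither source dictionary is copied or modified.
print(default_thresholds['max_wear_level'])  # 50
```

`dict(effective)` flattens the layers into a single dictionary if a snapshot is needed.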

---

### **Problem 8: Determine Part Counts Per Mold**

**Purpose**: Use `Counter` to find mold utilization.

> Count how many times each `Mold ID` appears in the dataset (i.e., how many inspected parts it accounts for).

---

### **Problem 9: Real-Time Sliding Window with `deque`**

**Purpose**: Use deque for windowed average calculations.

> Use a sliding window of the last 7 “Surface Roughness” values and compute the average after each insertion.

---

### **Problem 10: Build a Daily Issue Dashboard**

**Purpose**: Use `defaultdict(set)` to avoid duplicates.

> For each day, create a set of all issues (e.g., "Crack", "Leak", "Corrosion") reported, without duplicates.

---

