# Python Collections Module Exercises


In [116]:
import pandas as pd
from collections import Counter, defaultdict, deque, namedtuple, ChainMap

**Dataset:** *Injection Mold 365-Day Inspection Log*

**Objective:** Use `collections` to solve typical quality control and maintenance tracking problems in manufacturing workflows.


In [117]:
file_path = 'injection_mold_365day_dataset.csv'

In [118]:
df = pd.read_csv(file_path)

In [119]:
df.head()

Unnamed: 0,Date,Shift,Mold ID,Part Name,Wear Level (%),Surface Roughness (Ra μm),Clearance (mm),Alignment Accuracy (mm),Leak Check,Corrosion,Crack Detected,Function Test,Lubrication Status,Pass/Fail
0,2025-01-01,Shift 1,MOLD741,Guide Pin,60.8,1.03,0.323,0.076,Pass,Severe,No,Fail,Needs Reapply,Fail
1,2025-01-01,Shift 1,MOLD742,Core Plate,43.5,0.28,0.339,0.188,Pass,Moderate,Yes,Fail,Good,Fail
2,2025-01-01,Shift 1,MOLD560,Runner Channel,15.0,0.57,0.418,0.128,Fail,Moderate,Yes,Pass,Needs Reapply,Pass
3,2025-01-01,Shift 2,MOLD126,Support Pillar,68.7,0.47,0.109,0.045,Pass,Moderate,No,Pass,Good,Pass
4,2025-01-01,Shift 2,MOLD727,Core Plate,53.7,0.41,0.351,0.181,Fail,Moderate,Yes,Fail,Needs Reapply,Fail


In [120]:
df.columns

Index(['Date', 'Shift', 'Mold ID', 'Part Name', 'Wear Level (%)',
       'Surface Roughness (Ra μm)', 'Clearance (mm)',
       'Alignment Accuracy (mm)', 'Leak Check', 'Corrosion', 'Crack Detected',
       'Function Test', 'Lubrication Status', 'Pass/Fail'],
      dtype='object')

### **Problem 1: Count Defective Parts per Day**

**Purpose**: Practice using `Counter` to aggregate counts.

> Count how many parts failed the final `Pass/Fail` test for each date.


In [121]:
failures = df[df['Pass/Fail'] == 'Fail']

In [122]:
fail_per_date = Counter(failures['Date'])

In [123]:
fail_per_date

Counter({'2025-01-10': 6,
         '2025-05-06': 6,
         '2025-08-07': 6,
         '2025-11-30': 6,
         '2025-01-08': 5,
         '2025-01-16': 5,
         '2025-03-18': 5,
         '2025-03-23': 5,
         '2025-03-28': 5,
         '2025-04-03': 5,
         '2025-04-08': 5,
         '2025-04-11': 5,
         '2025-06-01': 5,
         '2025-06-03': 5,
         '2025-06-20': 5,
         '2025-07-08': 5,
         '2025-07-19': 5,
         '2025-07-20': 5,
         '2025-08-09': 5,
         '2025-08-10': 5,
         '2025-08-13': 5,
         '2025-09-02': 5,
         '2025-09-07': 5,
         '2025-09-17': 5,
         '2025-09-20': 5,
         '2025-11-13': 5,
         '2025-11-18': 5,
         '2025-12-03': 5,
         '2025-12-16': 5,
         '2025-12-23': 5,
         '2025-01-12': 4,
         '2025-01-21': 4,
         '2025-01-25': 4,
         '2025-01-30': 4,
         '2025-02-08': 4,
         '2025-02-10': 4,
         '2025-02-15': 4,
         '2025-02-16': 4,
         '20
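
To pull out only the worst days instead of scanning the whole `Counter`, `most_common(n)` returns the `n` highest counts. The following is a minimal sketch; `sample_fail_dates` is a small hand-made list (hypothetical values, not the real dataset) standing in for `failures['Date']`.

```python
from collections import Counter

# Hypothetical sample of failure dates; in the notebook this would be failures['Date'].
sample_fail_dates = [
    '2025-01-10', '2025-01-10', '2025-01-10',
    '2025-01-08', '2025-01-08',
    '2025-01-12',
]

sample_counts = Counter(sample_fail_dates)

# most_common(2) returns the two dates with the highest counts as (date, count) pairs.
worst_days = sample_counts.most_common(2)
print(worst_days)  # [('2025-01-10', 3), ('2025-01-08', 2)]

# A date that never appears counts as 0; no KeyError is raised.
print(sample_counts['2025-02-01'])  # 0
```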

### **Problem 2: Track Mold Issues with `defaultdict`**

**Purpose**: Practice using `defaultdict(set)` for grouped data.

> Create a dictionary where each mold ID is the key, and its value is a set of all issues recorded (e.g., if “Leak Check” is “Fail”, “Corrosion” is “Severe”, etc.).


In [124]:
mold_issues = defaultdict(set)

In [125]:
for _, row in df.iterrows():
    mold_id = row['Mold ID']

    # Record a descriptive label, not the raw cell value 'Fail'
    if row['Leak Check'] == 'Fail':
        mold_issues[mold_id].add('Leak Check Failed')

    if row['Corrosion'] in ['Moderate', 'Severe']:
        mold_issues[mold_id].add(f'Corrosion: {row["Corrosion"]}')

    if row['Crack Detected'] == 'Yes':
        mold_issues[mold_id].add('Crack Detected')

    if row['Function Test'] == 'Fail':
        mold_issues[mold_id].add('Function Test Failed')

    if row['Lubrication Status'] == 'Needs Reapply':
        mold_issues[mold_id].add('Needs Lubrication')

In [126]:
dict(list(mold_issues.items())[:5])

{'MOLD741': {'Corrosion: Severe', 'Function Test Failed', 'Needs Lubrication'},
 'MOLD742': {'Corrosion: Moderate',
  'Corrosion: Severe',
  'Crack Detected',
  'Function Test Failed',
  'Needs Lubrication'},
 'MOLD560': {'Corrosion: Moderate',
  'Crack Detected',
  'Leak Check Failed',
  'Needs Lubrication'},
 'MOLD126': {'Corrosion: Moderate'},
 'MOLD727': {'Corrosion: Moderate',
  'Corrosion: Severe',
  'Crack Detected',
  'Function Test Failed',
  'Leak Check Failed',
  'Needs Lubrication'}}

# What is `defaultdict`?

`defaultdict` is a special kind of dictionary from Python’s `collections` module.
It **automatically creates a default value** for a missing key the first time that key is looked up with `d[key]`; membership tests (`key in d`) and `d.get(key)` do not create anything.



### Example Comparison

#### Using regular `dict`:

```python
issues = {}
issues['MOLD741'].append('Leak Check Failed')  # ERROR! Key doesn't exist
```

You would get:

```
KeyError: 'MOLD741'
```

#### Using `defaultdict(list)` or `defaultdict(set)`:

```python
from collections import defaultdict

issues = defaultdict(list)
issues['MOLD741'].append('Leak Check Failed')  # Works automatically
```

It **creates a new empty list** for `'MOLD741'` the first time it's accessed.
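
The word "accessed" is worth pinning down, because only indexing with `[]` triggers the default. The short sketch below checks which operations insert a key and which leave the dictionary untouched.

```python
from collections import defaultdict

issues = defaultdict(list)

# Membership tests and .get() do not call the default factory.
print('MOLD741' in issues)    # False
print(issues.get('MOLD741'))  # None
print(len(issues))            # 0

# Indexing a missing key with [] does: it inserts an empty list.
issues['MOLD741']
print(len(issues))            # 1
print(dict(issues))           # {'MOLD741': []}
```

One consequence is that merely reading `issues[some_id]` to check for problems adds an empty entry, so `in` or `.get()` is the safer choice for read-only checks.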



## What does `defaultdict` do better than a regular `dict`?

| Feature                                            | `dict` | `defaultdict` |
| -------------------------------------------------- | ------ | ------------- |
| Needs manual key checking                          | ✅      | ❌             |
| Auto-creates empty value                           | ❌      | ✅             |
| Cleaner syntax for appending                       | ❌      | ✅             |
| Better for grouping/aggregation                    | ❌      | ✅             |
| Works with any default type (list, set, int, etc.) | ❌      | ✅             |
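
The last row of the table can be illustrated with two other default factories. This is a small sketch: the shift labels match the five rows shown by `df.head()`, while `'MOLD999'` and the `'Not inspected'` text are made up for illustration.

```python
from collections import defaultdict

# int() returns 0, so defaultdict(int) works as a simple tally.
shift_counts = defaultdict(int)
for shift in ['Shift 1', 'Shift 1', 'Shift 1', 'Shift 2', 'Shift 2']:
    shift_counts[shift] += 1
print(dict(shift_counts))  # {'Shift 1': 3, 'Shift 2': 2}

# Any zero-argument callable can serve as the factory, including a lambda.
status = defaultdict(lambda: 'Not inspected')
status['MOLD741'] = 'Fail'
print(status['MOLD741'])  # Fail
print(status['MOLD999'])  # Not inspected  (hypothetical mold ID)
```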



## Why is `defaultdict(set)` useful for **this mold inspection problem**?

In this specific case:

* Each **Mold ID** (like `'MOLD741'`) maps to a **set of issues**.
* With `defaultdict(set)`, you don't need to check if the mold already has an entry:

```python
mold_issues[mold_id].add('Function Test Failed')  # Automatically works
```

Using a plain `dict` would require this extra check every time:

```python
if mold_id not in mold_issues:
    mold_issues[mold_id] = set()
mold_issues[mold_id].add('Function Test Failed')
```

That's more code and easier to mess up.
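
For comparison, a plain `dict` can also avoid the explicit `if` by using `setdefault`. The sketch below shows that alternative; it is not part of the exercise solution.

```python
mold_issues = {}

# setdefault returns the existing set for the key, or stores and returns the new one.
mold_issues.setdefault('MOLD741', set()).add('Function Test Failed')
mold_issues.setdefault('MOLD741', set()).add('Needs Lubrication')

print(mold_issues == {'MOLD741': {'Function Test Failed', 'Needs Lubrication'}})  # True
```

Note that `setdefault` builds a fresh `set()` on every call even when the key already exists, whereas `defaultdict` calls its factory only for missing keys.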



## Summary: Should You Always Use `defaultdict`?

Not always. But if you:

* Need to **group** things by a key
* Want to **append**, **extend**, or **add** to lists/sets/counters
* Are **tired of writing key-checking code**

Then `defaultdict` is your best friend.

### **Problem 2.1: Group Parts by Mold ID**

**Purpose**: Practice using `defaultdict(set)` for grouping.

> Create a dictionary where each **Mold ID** is the key, and the value is a set of **part names** it has produced over time.

Use `defaultdict(set)`. Avoid using any `if key not in dict` logic.

In [137]:
part_names = defaultdict(set)

In [138]:
for _, row in df.iterrows():
    mold_id = row['Mold ID']
    part_names[mold_id].add(row['Part Name'])

In [139]:
dict(list(part_names.items())[:3])

{'MOLD741': {'Guide Pin'},
 'MOLD742': {'Cavity Plate', 'Core Plate', 'Sprue Bushing'},
 'MOLD560': {'Runner Channel'}}

In [140]:
# convert set to list
part_names_cleaned = {mold: list(parts) for mold, parts in part_names.items()}

In [141]:
dict(list(part_names_cleaned.items())[:3])

{'MOLD741': ['Guide Pin'],
 'MOLD742': ['Core Plate', 'Cavity Plate', 'Sprue Bushing'],
 'MOLD560': ['Runner Channel']}

### **Problem 2.2: Track All Days a Mold Had a Lubrication Problem**

**Purpose**: Practice using `defaultdict(set)` to collect **unique dates**.

> Create a dictionary where each **Mold ID** maps to a **set of dates** when the **"Lubrication Status" was "Needs Reapply"**.

Use `defaultdict(set)` to store only the date, ensuring no duplicate dates for any mold.

In [142]:
lubrication_reapply_dates = defaultdict(set)

In [143]:
for _, row in df.iterrows():
    mold_id = row['Mold ID']
    
    if row['Lubrication Status'] == 'Needs Reapply':
        lubrication_reapply_dates[mold_id].add(row['Date'])

In [144]:
dict(list(lubrication_reapply_dates.items())[:3])

{'MOLD741': {'2025-01-01',
  '2025-06-06',
  '2025-09-16',
  '2025-09-29',
  '2025-11-13',
  '2025-12-11'},
 'MOLD560': {'2025-01-01',
  '2025-02-12',
  '2025-04-07',
  '2025-05-02',
  '2025-05-17',
  '2025-12-13'},
 'MOLD727': {'2025-01-01',
  '2025-01-06',
  '2025-04-17',
  '2025-06-03',
  '2025-11-04',
  '2025-12-16',
  '2025-12-20'}}

### **Problem 2.3: Count All Issue Types per Mold ID**

**Purpose**: Practice using `defaultdict(Counter)` for multi-category aggregation.

> Build a dictionary where each **Mold ID** is a key, and the value is a `Counter` object tallying different issue types (like `"Leak Check Failed"`, `"Crack Detected"`).

Use nested `defaultdict(Counter)` to maintain a count of issue types per mold.
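
One possible approach is sketched below. To keep it self-contained, it runs on `sample_rows`, a hand-copied subset of the five rows shown by `df.head()`; with the full DataFrame, the loop would iterate over `df.iterrows()` as in Problem 2. The issue labels are illustrative choices.

```python
from collections import Counter, defaultdict

# Issue-related fields copied from the five rows of df.head().
sample_rows = [
    {'Mold ID': 'MOLD741', 'Leak Check': 'Pass', 'Crack Detected': 'No',  'Function Test': 'Fail'},
    {'Mold ID': 'MOLD742', 'Leak Check': 'Pass', 'Crack Detected': 'Yes', 'Function Test': 'Fail'},
    {'Mold ID': 'MOLD560', 'Leak Check': 'Fail', 'Crack Detected': 'Yes', 'Function Test': 'Pass'},
    {'Mold ID': 'MOLD126', 'Leak Check': 'Pass', 'Crack Detected': 'No',  'Function Test': 'Pass'},
    {'Mold ID': 'MOLD727', 'Leak Check': 'Fail', 'Crack Detected': 'Yes', 'Function Test': 'Fail'},
]

# Each missing mold ID gets an empty Counter, and each missing issue starts at 0.
issue_counts = defaultdict(Counter)

for row in sample_rows:
    mold_id = row['Mold ID']
    if row['Leak Check'] == 'Fail':
        issue_counts[mold_id]['Leak Check Failed'] += 1
    if row['Crack Detected'] == 'Yes':
        issue_counts[mold_id]['Crack Detected'] += 1
    if row['Function Test'] == 'Fail':
        issue_counts[mold_id]['Function Test Failed'] += 1

print(dict(issue_counts['MOLD727']))
# {'Leak Check Failed': 1, 'Crack Detected': 1, 'Function Test Failed': 1}

# MOLD126 had none of these issues, so it was never indexed and has no entry.
print('MOLD126' in issue_counts)  # False
```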

---

### **Problem 3: FIFO Buffer with `deque`**

**Purpose**: Simulate a real-time production buffer.

> Load inspection records for "Shift 1" into a deque buffer and remove the first 5 inspections (simulate processing). Show the remaining items.
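
A minimal sketch of the FIFO pattern follows. The `shift1_records` list holds placeholder strings (hypothetical, not real rows); with the DataFrame, one option would be `df[df['Shift'] == 'Shift 1'].to_dict('records')`.

```python
from collections import deque

# Hypothetical stand-ins for Shift 1 inspection records.
shift1_records = [f'inspection-{i}' for i in range(1, 9)]

buffer = deque(shift1_records)

# popleft() removes from the front in O(1), so the oldest record is processed first.
processed = [buffer.popleft() for _ in range(5)]

print(processed)
# ['inspection-1', 'inspection-2', 'inspection-3', 'inspection-4', 'inspection-5']
print(list(buffer))
# ['inspection-6', 'inspection-7', 'inspection-8']
```

A `list` would also work here, but `list.pop(0)` shifts every remaining element, which is why `deque` is the usual choice for queues.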

---

### **Problem 4: Identify Most Common Faults**

**Purpose**: Use `Counter` to summarize quality issues.

> Find and display the most common combination of "Crack Detected" and "Function Test" outcomes.
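
Because tuples are hashable, a `Counter` can tally combinations of two columns directly. The sketch below uses the (Crack Detected, Function Test) pairs from the five rows of `df.head()`; on the full DataFrame, `zip(df['Crack Detected'], df['Function Test'])` would produce the same kind of pairs.

```python
from collections import Counter

# (Crack Detected, Function Test) pairs copied from the five rows of df.head().
pairs = [
    ('No', 'Fail'),
    ('Yes', 'Fail'),
    ('Yes', 'Pass'),
    ('No', 'Pass'),
    ('Yes', 'Fail'),
]

combo_counts = Counter(pairs)

# most_common(1) returns a one-element list holding the top (pair, count) entry.
print(combo_counts.most_common(1))  # [(('Yes', 'Fail'), 2)]
```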

---

### **Problem 5: Use `namedtuple` for Part Summaries**

**Purpose**: Store row data using `namedtuple` for readability.

> Define a `namedtuple` called `InspectionSummary` and use it to represent the first 10 records in the dataset.
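
A short sketch of the idea: `namedtuple` field names must be valid identifiers, so column names such as `Mold ID` need renaming. The field names chosen here (`date`, `mold_id`, and so on) are an assumption, and only three of the ten records are shown, copied from `df.head()`.

```python
from collections import namedtuple

# Field names are illustrative; columns with spaces cannot be used as-is.
InspectionSummary = namedtuple(
    'InspectionSummary',
    ['date', 'mold_id', 'part_name', 'wear_level', 'result'],
)

# Selected columns from the first three rows of df.head().
rows = [
    ('2025-01-01', 'MOLD741', 'Guide Pin', 60.8, 'Fail'),
    ('2025-01-01', 'MOLD742', 'Core Plate', 43.5, 'Fail'),
    ('2025-01-01', 'MOLD560', 'Runner Channel', 15.0, 'Pass'),
]

summaries = [InspectionSummary(*row) for row in rows]

first = summaries[0]
print(first.mold_id, first.wear_level)  # MOLD741 60.8

# Fields remain accessible by position and can be unpacked like a normal tuple.
date, mold_id, part_name, wear_level, result = summaries[2]
print(part_name, result)  # Runner Channel Pass
```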

---

### **Problem 6: Build a Lubrication Log**

**Purpose**: Use `defaultdict` for grouped status.

> Build a log that groups all `Mold ID`s by their `Lubrication Status`. For example, show all molds whose status is "Needs Reapply".

---

### **Problem 7: Compare Configurations with `ChainMap`**

**Purpose**: Use `ChainMap` to layer configuration settings.

> Create two dictionaries: one with default quality thresholds (e.g. `Wear Level < 50`) and another with temporary, stricter thresholds for specific parts. Layer them with `ChainMap` so the stricter values take precedence, and display the effective configuration.
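
The lookup order is the key point: `ChainMap` searches its mappings from left to right and returns the first match. In the sketch below, the threshold names and values are hypothetical and serve only to show the layering.

```python
from collections import ChainMap

# Hypothetical thresholds; key names and numbers are for illustration only.
default_thresholds = {'max_wear_level': 50, 'max_surface_roughness': 1.0}
strict_overrides = {'max_wear_level': 40}

# Earlier mappings win, so the overrides are listed first.
effective = ChainMap(strict_overrides, default_thresholds)

print(effective['max_wear_level'])         # 40  (from strict_overrides)
print(effective['max_surface_roughness'])  # 1.0 (falls through to the defaults)

# ChainMap keeps references rather than copies, so later changes show through.
default_thresholds['max_clearance'] = 0.4
print(effective['max_clearance'])          # 0.4
```

Unlike `{**default_thresholds, **strict_overrides}`, nothing is merged or copied, so removing the temporary layer later is only a matter of dropping `strict_overrides` from the chain.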

---

### **Problem 8: Determine Part Counts Per Mold**

**Purpose**: Use `Counter` to find mold utilization.

> Count how many times each `Mold ID` appears in the dataset (i.e., how many inspection records it has).

---

### **Problem 9: Real-Time Sliding Window with `deque`**

**Purpose**: Use deque for windowed average calculations.

> Use a sliding window of the last 7 “Surface Roughness” values and compute the average after each insertion.
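
A `deque` created with `maxlen` discards its oldest element automatically once it is full, which makes it a natural sliding window. The helper name `rolling_means` below is hypothetical. The demo uses the five roughness values from `df.head()` with a window of 3 so that the eviction is visible on a short sample; the exercise itself asks for 7, which is the default here.

```python
from collections import deque


def rolling_means(values, window=7):
    """Return the mean of the last `window` values after each insertion."""
    recent = deque(maxlen=window)  # the oldest value is dropped once the deque is full
    means = []
    for value in values:
        recent.append(value)
        # Until the window fills up, the mean covers only the values seen so far.
        means.append(sum(recent) / len(recent))
    return means


# Surface roughness values copied from the five rows of df.head().
roughness = [1.03, 0.28, 0.57, 0.47, 0.41]

for mean in rolling_means(roughness, window=3):
    print(round(mean, 3))
```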

---

### **Problem 10: Build a Daily Issue Dashboard**

**Purpose**: Use `defaultdict(set)` to avoid duplicates.

> For each day, create a set of all issues (e.g., "Crack", "Leak", "Corrosion") reported, without duplicates.

---

