# What is namedtuple()?
`namedtuple()` is a factory function in the `collections` module that creates a subclass of `tuple` with named fields. It lets you access elements by name with dot notation, which makes code more readable.

## **Difference between `tuple` and `namedtuple`** in Python:



### **1. Readability**

#### `tuple`:

You access items by **index**, which can be unclear:

```python
t = ('2025-01-01', 'OK', 120)
print(t[0])  # What's this? Hard to tell
```

#### `namedtuple`:

You access items by **field name**, improving clarity:

```python
from collections import namedtuple

FeedLog = namedtuple('FeedLog', ['date', 'status', 'duration'])
log = FeedLog('2025-01-01', 'OK', 120)

print(log.date)  # Much clearer
```



### **2. Self-documenting Code**

Using `namedtuple` gives your data **semantic meaning**:

```python
# tuple
log = ('2025-01-01', 'OK', 120)

# namedtuple
FeedLog = namedtuple('FeedLog', ['date', 'status', 'duration'])
log = FeedLog('2025-01-01', 'OK', 120)
```

The second version communicates more clearly what each value represents.



### **3. Still Memory-Efficient**

Unlike instances of a regular class, which by default each carry their own `__dict__`, a `namedtuple` instance:

* Is as **lightweight** as a regular tuple (field names are stored once on the class, not per instance)
* Has a **fixed size** and is **immutable**
* Supports **indexing**, **iteration**, and **unpacking** just like `tuple`
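
You can check these claims quickly by comparing a namedtuple instance with a plain tuple and with a regular class instance. This is only a sketch. `sys.getsizeof()` reports just the object's own size, and the exact byte counts vary by Python version and platform, so only the comparison matters. `PointClass` is a made-up class used for contrast.

```python
import sys
from collections import namedtuple

Point = namedtuple('Point', ['x', 'y'])


class PointClass:
    """A plain class for comparison; its instances carry a per-instance __dict__."""

    def __init__(self, x, y):
        self.x = x
        self.y = y


p = Point(1, 2)
t = (1, 2)
c = PointClass(1, 2)

# Same size as a plain 2-tuple: field names live on the class, not on each instance
print(sys.getsizeof(p), sys.getsizeof(t))

# namedtuple instances have no per-instance __dict__; regular class instances do
print(hasattr(p, '__dict__'), hasattr(c, '__dict__'))

# Tuple behaviour is preserved: indexing, iteration, unpacking
x, y = p
print(p[0], list(p), x, y)
```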



### **4. Adds Useful Methods**

`namedtuple` adds:

* `._fields` → returns a tuple of field names
* `._asdict()` → returns a `dict` of field names and values (an `OrderedDict` before Python 3.8)
* `._replace(**kwargs)` → returns a new instance with one or more fields replaced

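```python
from collections import namedtuple

FeedLog = namedtuple('FeedLog', ['date', 'status', 'duration'])
log = FeedLog('2025-01-01', 'OK', 120)

print(log._asdict())   # {'date': '2025-01-01', 'status': 'OK', 'duration': 120}
print(log._replace(status='Error'))  # a new FeedLog; log itself is unchanged
```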
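
The snippet above does not use `._fields`. The sketch below shows it next to `._make()`, another documented classmethod that builds an instance from any iterable, which is handy for CSV rows. It also confirms that `._replace()` leaves the original untouched.

```python
from collections import namedtuple

FeedLog = namedtuple('FeedLog', ['date', 'status', 'duration'])

# _fields: the field names, in order, as a tuple of strings
print(FeedLog._fields)  # ('date', 'status', 'duration')

# _make: build an instance from any iterable, e.g. a row parsed from CSV
row = ['2025-01-01', 'OK', 120]
log = FeedLog._make(row)

# _replace: returns a NEW instance; the original is unchanged
fixed = log._replace(status='Error')
print(log.status, fixed.status)  # OK Error

# _asdict: a regular dict on Python 3.8+ (an OrderedDict on 3.1-3.7)
print(log._asdict())
```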



### Limitations of `namedtuple`

* You **cannot modify** fields once created (just like `tuple`)
* Every field must be supplied at creation, unless you give defaults with the `defaults=` parameter (Python 3.7+)
* Not ideal if you need mutable state or substantial custom behavior → use a `dataclass` instead
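
Here are the first two limitations in code, as a sketch. It uses the `defaults` parameter (Python 3.7+) to make a trailing field optional and shows the `AttributeError` raised on assignment. The `Reading` type and its `unit` field are made up for illustration.

```python
from collections import namedtuple

# defaults apply to the rightmost fields: here only 'unit' is optional
Reading = namedtuple('Reading', ['timestamp', 'value', 'unit'], defaults=['mm/s'])

r = Reading('2025-05-01 08:00', 12.5)
print(r)  # Reading(timestamp='2025-05-01 08:00', value=12.5, unit='mm/s')

try:
    r.value = 13.0  # fields are read-only
except AttributeError as exc:
    print('Cannot modify:', exc)

# The immutable way to "change" a field is to build a new instance
r2 = r._replace(value=13.0)
print(r2.value)  # 13.0
```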



### Summary Table

| Feature             | `tuple`   | `namedtuple`          |
| ------------------- | --------- | --------------------- |
| Access by name      | ❌ No      | ✅ Yes                 |
| Memory efficiency   | ✅ Yes     | ✅ Yes                 |
| Field documentation | ❌ No      | ✅ Yes (`._fields`)    |
| Field immutability  | ✅ Yes     | ✅ Yes                 |
| Use in real apps    | ❌ Limited | ✅ Clearer, safer data |



In [1]:
from collections import namedtuple
import csv
import pandas as pd

In [2]:
Point = namedtuple('Point', ['x', 'y']) # type: ignore

In [3]:
p = Point(1, 2)

In [4]:
p.x

1

In [5]:
p[0]

1

In [6]:
p[1]

2

In [7]:
p.y

2


You get both:

* **Index access like a tuple**
* **Attribute access like an object**



## Learning Plan: Skill-Building Activities

We'll divide your learning into **four tiers**:

| Tier          | Focus                                             | Dataset                            | Skill Level     |
| ------------- | ------------------------------------------------- | ---------------------------------- | --------------- |
| 1. Basics     | Create and use namedtuples                        | Manual/small list                  | 🟢 Beginner     |
| 2. Iteration  | Parse and store records                           | Simulated machine log              | 🟡 Intermediate |
| 3. Processing | Filtering, sorting, aggregating                   | CSV-like structured data           | 🟠 Advanced     |
| 4. Complex    | Full analysis using `namedtuple` and real dataset | `epson_feeding_365day_dataset.csv` | 🔴 Expert       |



## TIER 1 – Basics of `namedtuple()`

### Skills Covered:

* Creating a namedtuple
* Accessing elements by attribute and index
* Unpacking a namedtuple

### Activity 1.1: Define and Use

```python
from collections import namedtuple

SensorReading = namedtuple('SensorReading', ['timestamp', 'value'])
reading = SensorReading('2025-05-01 08:00', 12.5)

print(f"Timestamp: {reading.timestamp}, Value: {reading.value}")
```

**Challenge**:

* Unpack it with `a, b = reading`
* Try accessing a non-existent field: `reading.temperature` → What error do you get?
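
One way to check both challenge items, as a sketch. Unpacking works because a namedtuple is a tuple, and reading a field that was never defined raises `AttributeError`.

```python
from collections import namedtuple

SensorReading = namedtuple('SensorReading', ['timestamp', 'value'])
reading = SensorReading('2025-05-01 08:00', 12.5)

# Unpacking works exactly as with a plain tuple
a, b = reading
print(a, b)  # 2025-05-01 08:00 12.5

# Accessing a field that was never defined raises AttributeError
try:
    reading.temperature
except AttributeError as exc:
    error_message = str(exc)
    print(type(exc).__name__, '-', error_message)
```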




## Goal:
Understand what a namedtuple is, how to create one, and how to use its fields.

### TIER 1 – Activities: namedtuple() Basics
#### Skills You’ll Learn:
- Defining a namedtuple type
- Creating an instance
- Accessing elements by name and index
- Unpacking values

### Activity 1.1: Define and Use a Simple namedtuple
#### Task:
Create a SensorReading namedtuple with fields:

- timestamp
- value

In [8]:
SensorReading = namedtuple('SensorReading', ['timestamp', 'value']) # type: ignore 

In [9]:
reading = SensorReading('2025-05-18 14:00', 25.6)

In [10]:
print(f'Timestamp: {reading.timestamp}')

Timestamp: 2025-05-18 14:00


In [11]:
print(f'Value: {reading.value}')

Value: 25.6


### Activity 1.2: Access by index vs Name

In [12]:
# index access
reading[0]

'2025-05-18 14:00'

In [13]:
# name access
reading.timestamp

'2025-05-18 14:00'

### Activity 1.3: Unpack the NamedTuple

In [14]:
timestamp, value = reading

In [15]:
timestamp

'2025-05-18 14:00'

In [16]:
value

25.6

### Activity 1.4: Use asdict()
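Turn a record into a dictionary for easy inspection or logging. Note that the cells below switch to a `dataclass` version of `SensorReading` and use `dataclasses.asdict()`. For a namedtuple, the equivalent is the `._asdict()` method.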

In [17]:
from dataclasses import dataclass, asdict

In [18]:
@dataclass
class SensorReading:
    timestamp: str
    value: float

In [19]:
reading = SensorReading('2025-05-18 14:00', 25.6)

In [20]:
asdict(reading)

{'timestamp': '2025-05-18 14:00', 'value': 25.6}

### Activity 1.5: Replace a Value with replace()
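Namedtuples are immutable, so you can't update a field directly; their `._replace()` method returns a modified copy instead. With the dataclass version used here, the counterpart is `dataclasses.replace()`, which also returns a new object and leaves the original untouched. A regular, non-frozen dataclass could be changed in place, but `replace()` does not do that: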

In [21]:
from dataclasses import replace

In [22]:
new_reading = replace(reading, value=27.3)

In [23]:
new_reading

SensorReading(timestamp='2025-05-18 14:00', value=27.3)
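
For comparison, here is a sketch of Activities 1.4 and 1.5 written with a namedtuple instead of a dataclass. `._asdict()` and `._replace()` are the namedtuple counterparts of `dataclasses.asdict()` and `dataclasses.replace()`. The new value `27.3` is arbitrary.

```python
from collections import namedtuple

SensorReading = namedtuple('SensorReading', ['timestamp', 'value'])
reading = SensorReading('2025-05-18 14:00', 25.6)

# namedtuple counterpart of dataclasses.asdict() (a plain dict on Python 3.8+)
print(reading._asdict())  # {'timestamp': '2025-05-18 14:00', 'value': 25.6}

# namedtuple counterpart of dataclasses.replace(): returns a new instance
new_reading = reading._replace(value=27.3)
print(new_reading)  # SensorReading(timestamp='2025-05-18 14:00', value=27.3)
print(reading.value)  # 25.6 (the original is untouched)
```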

### Challenge 1.6: Create and Unpack a TemperatureReading

In [24]:
TemperatureReading = namedtuple('TemperatureReading', ['location', 'celsius']) # type: ignore

In [25]:
reading = TemperatureReading('Calgary', 18.7)

In [26]:
location_, temp_c = reading

In [27]:
location_

'Calgary'

In [28]:
temp_f = temp_c * 1.8 + 32

In [29]:
temp_f

65.66

### Challenge 1.7: Use asdict() to log a MaterialStatus

In [30]:
from typing import Optional
@dataclass
class MaterialStatus:
    material: str
    status: str
    code: Optional[str] # this means it can be str or None

In [31]:
status = MaterialStatus('Plastic Film', 'OK', None)

In [32]:
asdict(status)

{'material': 'Plastic Film', 'status': 'OK', 'code': None}

### Quick Note on Optional[str]
```
Optional[str] == Union[str, None]
```

It tells type checkers (and readers) that this field can be either:

- A str (like 'E105')
- Or None (missing code)

Python itself does not enforce this at runtime.
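
This sketch shows what `Optional[str]` does and does not do. The equality with `Union[str, None]` holds at runtime. However, the annotation is only a hint, so a value of the wrong type is accepted silently unless a type checker flags it.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class MaterialStatus:
    material: str
    status: str
    code: Optional[str]  # str or None, as a hint for type checkers


# The two spellings describe the same type
print(Optional[str] == Union[str, None])  # True

ok = MaterialStatus('Plastic Film', 'OK', None)
with_code = MaterialStatus('Plastic Film', 'Error', 'E105')

# The hint is not checked at runtime: an int is accepted without error
unchecked = MaterialStatus('Plastic Film', 'Error', 105)
print(type(unchecked.code).__name__)  # int
```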

## TIER 2 – Working with Lists of NamedTuples

### Skills Covered:

* Reading from a list or simulated data
* Storing multiple records in namedtuples
* Filtering and displaying results

### Activity 2.1: Feed Log Simulation

```python
from collections import namedtuple

FeedLog = namedtuple('FeedLog', ['day', 'status', 'duration'])

logs = [
    FeedLog('2025-01-01', 'OK', 120),
    FeedLog('2025-01-02', 'Error', 5),
    FeedLog('2025-01-03', 'OK', 110),
]

# Filter short feeds
short_feeds = [log for log in logs if log.duration < 30]
print(short_feeds)
```

**Challenge**:

* Count the number of 'OK' feeds.
* Sort the logs by duration (hint: `sorted()` and lambda).

In [33]:
FeedLog = namedtuple('FeedLog', ['day', 'status', 'duration']) # type: ignore

In [34]:
logs = [
    FeedLog('2025-01-01', 'OK', 120),
    FeedLog('2025-01-02', 'Error', 5),
    FeedLog('2025-01-03', 'OK', 110),
]

In [35]:
short_feeds = [log for log in logs if log.duration < 30]

In [36]:
short_feeds

[FeedLog(day='2025-01-02', status='Error', duration=5)]

In [37]:
ok_feeds = sum(1 for log in logs if log.status == 'OK')

In [38]:
ok_feeds

2

In [39]:
sorted_duration = sorted(logs, key=lambda log: log.duration)

In [40]:
sorted_duration

[FeedLog(day='2025-01-02', status='Error', duration=5),
 FeedLog(day='2025-01-03', status='OK', duration=110),
 FeedLog(day='2025-01-01', status='OK', duration=120)]

## Bonus Exercises

### Challenge 2.3: Total Feed Time for 'OK' Entries


In [41]:
total_ok_time = sum(log.duration for log in logs if log.status == 'OK')

In [42]:
total_ok_time

230

### Challenge 2.4: Find the Day with the Shortest Feed


In [43]:
shortest_day_with_shortest_feed = min(logs, key=lambda log: log.duration)

In [44]:
shortest_day_with_shortest_feed

FeedLog(day='2025-01-02', status='Error', duration=5)

## TIER 3 – Reading and Processing Realistic CSV Records

### **Skills Covered**:

| Skill                            | Description                                |
| -------------------------------- | ------------------------------------------ |
| Reading CSV rows                 | Load realistic data using the `csv` module |
| Creating `namedtuple` from rows  | Dynamically map rows into namedtuples      |
| Handling optional/missing fields | Convert blank values to `None`             |
| Processing                       | Filtering, summarizing, aggregating        |

In [45]:
Feed = namedtuple('Feed', ['date', 'status', 'duration', 'error']) # type: ignore

In [46]:
with open('feed_sample.csv', newline='') as f:
    reader = csv.reader(f)
    next(reader) # skip header
    
    logs = [
        Feed(date, status, int(duration), error or None)
        for date, status, duration, error in reader
    ]

In [47]:
error_logs = [log for log in logs if log.status == 'Error']

In [48]:
for e in error_logs:
    print(e)

Feed(date='2025-01-02', status='Error', duration=10, error='E105')


## Challenges for Tier 3
### Challenge 3.1: Count how many feeds are successful (OK)

In [49]:
successful_feeds = sum(1 for log in logs if log.status == 'OK')

In [50]:
successful_feeds

2

### Challenge 3.2: Find the average duration of all feeds

In [51]:
average_duration_all_feeds = round(sum(log.duration for log in logs) / len(logs), 2)

In [52]:
average_duration_all_feeds

86.67

### Challenge 3.3: Find the day with the shortest feed

In [53]:
day_with_shortest_feed = min(logs, key=lambda log: log.duration)

In [54]:
day_with_shortest_feed

Feed(date='2025-01-02', status='Error', duration=10, error='E105')

### Challenge 3.4: Total downtime due to errors

In [55]:
error_time = sum(log.duration for log in logs if log.status == 'Error')

In [56]:
error_time

10

## **Tier 1–3 Consolidated Exercises**

### **Exercise 1: Create a namedtuple class**

Define a `namedtuple` named `FeedRecord` with the appropriate fields based on the dataset headers.



In [57]:
data = pd.read_csv('epson_feeding_system_dataset.csv')

In [58]:
df = data  # pd.read_csv already returns a DataFrame, so wrapping it in pd.DataFrame() is unnecessary

In [59]:
dataset_headers = list(data.columns)

In [60]:
dataset_headers

['Date',
 'Shift',
 'Batch ID',
 'Printer Model',
 'Paper Size',
 'Machine ID',
 'Feed Motor RPM',
 'Roller Speed (mm/s)',
 'Skew Angle (°)',
 'Encoder Count per Rotation',
 'Pickup Roller Wear (%)',
 'Paper Tray Capacity (sheets)',
 'Paper Thickness (mm)',
 'Jam Detection',
 'Misfeed Detection',
 'Double Feed Detection',
 'Skew Detection',
 'Paper Feed Time (ms)',
 'Feed Accuracy (%)',
 'Sensor Calibration Status',
 'Operator ID',
 'Pass/Fail']

In [61]:
# clean headers --> regex substitution
import re

In [62]:
clean_headers = [re.sub(r'\W|^(?=\d)', '_', col.strip()) for col in dataset_headers]

In [63]:
clean_headers

['Date',
 'Shift',
 'Batch_ID',
 'Printer_Model',
 'Paper_Size',
 'Machine_ID',
 'Feed_Motor_RPM',
 'Roller_Speed__mm_s_',
 'Skew_Angle____',
 'Encoder_Count_per_Rotation',
 'Pickup_Roller_Wear____',
 'Paper_Tray_Capacity__sheets_',
 'Paper_Thickness__mm_',
 'Jam_Detection',
 'Misfeed_Detection',
 'Double_Feed_Detection',
 'Skew_Detection',
 'Paper_Feed_Time__ms_',
 'Feed_Accuracy____',
 'Sensor_Calibration_Status',
 'Operator_ID',
 'Pass_Fail']

In [64]:
FeedRecord = namedtuple('FeedRecord', clean_headers) # type: ignore

### Let’s break down this line of code:

```python
re.sub(r'\W|^(?=\d)', '_', col.strip())
```

You're using it to **sanitize column names** for use in `namedtuple`, and it involves **regex substitution**.



### What This Does

| Component        | Explanation                                           |
| ---------------- | ----------------------------------------------------- |
| `re.sub(...)`    | Replaces all matches of the pattern in a string       |
| `r'\W\|^(?=\d)'` | Regular expression pattern (explained below)          |
| `'_'`            | Replacement string (what we use instead of the match) |
| `col.strip()`    | Removes leading/trailing whitespace from column name  |



### Regex Pattern Breakdown: `r'\W|^(?=\d)'`

Let’s split this into two parts:

#### 1. `\W`

* Matches any **non-word character**
* Word characters are letters, digits, and `_`: `A–Z`, `a–z`, `0–9`, plus (for Python 3 `str` patterns) other Unicode letters and digits such as `é`
* So it replaces characters like spaces, dashes, parentheses, `%`, `°`, etc.

Examples:

* `'Batch ID'` → `'Batch_ID'`
* `'Skew Angle (°)'` → `'Skew_Angle____'` (four underscores, one each for the space, `(`, `°`, and `)`)


#### 2. `^(?=\d)`

* `^` = beginning of the string
* `(?=\d)` = **lookahead** that checks if the next character is a digit
* So it matches the beginning of a string **only if** the string starts with a number

Example:

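* `'123Rate'` → `'_123Rate'` (the lookahead is zero-width, so `_` is inserted before the digit and nothing is removed)

This ensures field names don’t start with a digit, which is **illegal in Python identifiers**. Note, however, that `namedtuple()` also rejects field names that start with an underscore: it raises `ValueError` unless you pass `rename=True`. So a header that starts with a digit still needs separate handling. None of the Epson headers start with a digit, so the notebook itself is unaffected.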
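
The sketch below demonstrates the underscore problem and two workarounds. The first is `rename=True`, which swaps invalid names for positional ones such as `_1`. The second is a prefix that does not start with `_`. The `headers` list, the `'f_'` prefix, and the `clean()` helper are illustrative choices, not part of the original notebook.

```python
import re
from collections import namedtuple

headers = ['Batch ID', '123Rate']

# The notebook's cleaner: replace non-word characters, prefix a leading digit with '_'
underscored = [re.sub(r'\W|^(?=\d)', '_', h.strip()) for h in headers]
print(underscored)  # ['Batch_ID', '_123Rate']

try:
    namedtuple('Row', underscored)
except ValueError as exc:
    print('Rejected:', exc)

# Workaround 1: let namedtuple rename invalid fields to positional names
Renamed = namedtuple('Renamed', underscored, rename=True)
print(Renamed._fields)  # ('Batch_ID', '_1')


# Workaround 2 (hypothetical helper): use a prefix that does not start with '_'
def clean(name):
    return re.sub(r'^(?=\d)', 'f_', re.sub(r'\W', '_', name.strip()))


Row = namedtuple('Row', [clean(h) for h in headers])
print(Row._fields)  # ('Batch_ID', 'f_123Rate')
```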



### Example Workflow

```python
import re

original = '  Roller Speed (mm/s)'
cleaned = re.sub(r'\W|^(?=\d)', '_', original.strip())
print(cleaned)  # Roller_Speed__mm_s_  (the closing ')' also becomes '_')
```



### Full Usage in Context

```python
clean_headers = [re.sub(r'\W|^(?=\d)', '_', col.strip()) for col in dataset_headers]
```

* Applies the cleanup to each column name
* Makes the names safe for `namedtuple`



### **Exercise 2: Load the CSV into a list of `FeedRecord`**

Skip the header row, convert the `Paper Feed Time (ms)` field (`Paper_Feed_Time__ms_`) to `float`, and turn empty fields into `None`.

In [65]:
with open('epson_feeding_system_dataset.csv', newline='') as f:
    reader = csv.reader(f)
    next(reader)  # skip the header row
    
    records = []
    for row in reader:
        cleaned_row = []
        for h, v in zip(clean_headers, row):
            if h == 'Paper_Feed_Time__ms_':
                # feed time becomes a float; a blank becomes 0.0 (which would pull averages down)
                cleaned_row.append(float(v) if v else 0.0)
            elif v == '':
                cleaned_row.append(None)  # any other blank field becomes None
            else:
                cleaned_row.append(v)  # everything else stays a string
        records.append(FeedRecord(*cleaned_row))

In [66]:
records[:2]

[FeedRecord(Date='2025-04-01', Shift='Shift 1', Batch_ID='FEED7674', Printer_Model='EcoTank ET-3850', Paper_Size='Letter', Machine_ID='Feed Line B', Feed_Motor_RPM='201', Roller_Speed__mm_s_='63.3', Skew_Angle____='-0.23', Encoder_Count_per_Rotation='1238', Pickup_Roller_Wear____='76.7', Paper_Tray_Capacity__sheets_='196', Paper_Thickness__mm_='0.15', Jam_Detection='Pass', Misfeed_Detection='Fail', Double_Feed_Detection='Fail', Skew_Detection='Pass', Paper_Feed_Time__ms_=1172.7, Feed_Accuracy____='97.96', Sensor_Calibration_Status='Not Calibrated', Operator_ID='TECH938', Pass_Fail='Pass'),
 FeedRecord(Date='2025-04-01', Shift='Shift 1', Batch_ID='FEED8629', Printer_Model='EcoTank ET-2800', Paper_Size='5x7', Machine_ID='Feed Line B', Feed_Motor_RPM='251', Roller_Speed__mm_s_='53.9', Skew_Angle____='0.87', Encoder_Count_per_Rotation='551', Pickup_Roller_Wear____='35.4', Paper_Tray_Capacity__sheets_='120', Paper_Thickness__mm_='0.122', Jam_Detection='Fail', Misfeed_Detection='Fail', Doubl
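
The loop above converts only the feed-time column, so numeric fields such as `Feed_Accuracy____` stay strings. One alternative is a per-column converter table combined with `._make()`. The sketch below runs on a tiny in-memory CSV with a hypothetical three-column excerpt of the real headers. `CONVERTERS`, `convert()`, and `Record` are illustrative names. Here a blank value becomes `None` rather than `0.0`, so it has to be skipped before averaging.

```python
import csv
import io
import re
from collections import namedtuple

# Hypothetical three-column excerpt in the same shape as the real file
SAMPLE = """Date,Paper Feed Time (ms),Feed Accuracy (%)
2025-04-01,1172.7,97.96
2025-04-02,,95.37
"""

reader = csv.reader(io.StringIO(SAMPLE))
header = next(reader)
fields = [re.sub(r'\W|^(?=\d)', '_', h.strip()) for h in header]
Record = namedtuple('Record', fields)

# Columns to convert, and how; anything not listed stays a string
CONVERTERS = {'Paper_Feed_Time__ms_': float, 'Feed_Accuracy____': float}


def convert(field, value):
    """Blank -> None; listed columns -> converted type; others unchanged."""
    if value == '':
        return None
    return CONVERTERS.get(field, str)(value)


rows = [Record._make(convert(f, v) for f, v in zip(fields, row)) for row in reader]
print(rows[1])
```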

### **Exercise 3: Print the total number of records**

Display the total number of records loaded into the list.

In [67]:
len(records)

180

### **Exercise 4: Filter and print all 'Fail' records**

Show all records where `Pass/Fail == 'Fail'`

In [68]:
fail_records = [record for record in records if record.Pass_Fail == 'Fail']

In [69]:
fail_records[:2]

[FeedRecord(Date='2025-04-01', Shift='Shift 2', Batch_ID='FEED4425', Printer_Model='L3210', Paper_Size='5x7', Machine_ID='Feed Line B', Feed_Motor_RPM='296', Roller_Speed__mm_s_='86.6', Skew_Angle____='0.83', Encoder_Count_per_Rotation='669', Pickup_Roller_Wear____='82.3', Paper_Tray_Capacity__sheets_='208', Paper_Thickness__mm_='0.099', Jam_Detection='Fail', Misfeed_Detection='Fail', Double_Feed_Detection='Fail', Skew_Detection='Fail', Paper_Feed_Time__ms_=652.4, Feed_Accuracy____='95.37', Sensor_Calibration_Status='Calibrated', Operator_ID='TECH621', Pass_Fail='Fail'),
 FeedRecord(Date='2025-04-02', Shift='Shift 1', Batch_ID='FEED5458', Printer_Model='L3210', Paper_Size='A4', Machine_ID='Feed Line A', Feed_Motor_RPM='276', Roller_Speed__mm_s_='75.9', Skew_Angle____='0.31', Encoder_Count_per_Rotation='524', Pickup_Roller_Wear____='20.0', Paper_Tray_Capacity__sheets_='241', Paper_Thickness__mm_='0.116', Jam_Detection='Pass', Misfeed_Detection='Pass', Double_Feed_Detection='Pass', Skew_

### **Exercise 5: Count how many feeds failed per shift**

> Count how many `'Pass/Fail' == 'Fail'` records occurred for each `'Shift'`. Print the totals per shift (in this dataset, `Shift 1` and `Shift 2`).


In [70]:
from collections import Counter

feeds_failed_per_shift = Counter(record.Shift for record in fail_records)

In [71]:
feeds_failed_per_shift

Counter({'Shift 2': 41, 'Shift 1': 29})

### **Exercise 6: Calculate the average Paper Feed Time for passed feeds**

> For rows where `'Pass/Fail' == 'Pass'`, calculate the average of `'Paper Feed Time (ms)'`.

In [76]:
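passed_feeds = [record.Paper_Feed_Time__ms_ for record in records if record.Pass_Fail == 'Pass']

In [77]:
average_feed_time_passed_feeds = round(sum(passed_feeds) / len(passed_feeds), 2)

In [78]:
average_feed_time_passed_feeds

(The output originally recorded here, `773.59`, came from filtering on `'Fail'` instead of `'Pass'`, so it is the average for failed feeds. Re-run the cell to get the passed-feed average.)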

### **Exercise 7: Sort the records by Feed Accuracy (%) and show the bottom 10**

> Convert `'Feed Accuracy (%)'` to `float`, sort in ascending order, and print the 10 least accurate feeds.
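
A possible approach to Exercise 7, sketched on a few hypothetical records. Only the two fields needed are modelled; with the notebook's data you would use `records` and `FeedRecord` directly. The accuracy was loaded as a string, so the sort key converts it with `float()`, and `[:10]` keeps the ten least accurate feeds.

```python
from collections import namedtuple

# Hypothetical stand-in for FeedRecord, keeping only the fields this exercise needs
Rec = namedtuple('Rec', ['Batch_ID', 'Feed_Accuracy____'])

sample_records = [
    Rec('FEED0001', '97.96'),
    Rec('FEED0002', '95.37'),
    Rec('FEED0003', '99.10'),
    Rec('FEED0004', None),  # blank in the CSV -> None in the notebook's loader
]

# Skip rows without a value, then sort ascending by accuracy converted to float
with_accuracy = [r for r in sample_records if r.Feed_Accuracy____]
bottom_10 = sorted(with_accuracy, key=lambda r: float(r.Feed_Accuracy____))[:10]

for r in bottom_10:
    print(r.Batch_ID, r.Feed_Accuracy____)
```

If fewer than ten rows exist, the slice simply returns all of them.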


---

### ✅ **Exercise 8: Count how many jams occurred**

> Count the number of rows where `'Jam Detection'` indicates a jam. Check how the column is recorded before counting: in the records printed above, `Jam_Detection` holds `'Pass'` or `'Fail'` rather than `'Yes'` or `'1'`.
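
A sketch for Exercise 8 on hypothetical records. It first inspects the distinct values. It then counts jams under the assumption that `'Fail'` means a jam was detected.

```python
from collections import Counter, namedtuple

# Hypothetical records with only the fields this exercise needs
Rec = namedtuple('Rec', ['Batch_ID', 'Jam_Detection'])

sample_records = [
    Rec('FEED0001', 'Pass'),
    Rec('FEED0002', 'Fail'),
    Rec('FEED0003', 'Fail'),
    Rec('FEED0004', None),  # blank in the CSV
]

# Step 1: inspect how the column is recorded before deciding what counts as a jam
value_counts = Counter(r.Jam_Detection for r in sample_records)
print(value_counts)

# Step 2: count jams, assuming 'Fail' means a jam was detected
jam_count = sum(1 for r in sample_records if r.Jam_Detection == 'Fail')
print(jam_count)  # 2
```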

---

### ✅ **Exercise 9: Find the operator with the most failed feeds**

> Group records by `'Operator ID'`, and count how many times each one failed (`'Pass/Fail' == 'Fail'`). Print the operator with the highest count.
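
A sketch for Exercise 9 on hypothetical records. `Counter` groups and counts the failures in one pass, and `most_common(1)` returns the top `(operator, count)` pair. Ties go to the operator encountered first, and with no failures at all the list is empty, so `[0]` would raise `IndexError`.

```python
from collections import Counter, namedtuple

# Hypothetical records with only the fields this exercise needs
Rec = namedtuple('Rec', ['Operator_ID', 'Pass_Fail'])

sample_records = [
    Rec('TECH100', 'Fail'),
    Rec('TECH200', 'Pass'),
    Rec('TECH200', 'Fail'),
    Rec('TECH100', 'Fail'),
]

# Count failures per operator
fails_per_operator = Counter(r.Operator_ID for r in sample_records if r.Pass_Fail == 'Fail')

# most_common(1) gives a one-element list with the highest (operator, count) pair
top_operator, top_fails = fails_per_operator.most_common(1)[0]
print(top_operator, top_fails)  # TECH100 2
```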

---

### ✅ **Exercise 10: Create a quality summary report**

> For each `'Printer Model'`, calculate:

* Total number of feeds
* Number of failures
* Average `'Feed Accuracy (%)'`

Structure your summary like:

```python
{
  'Epson X': {'feeds': 25, 'fails': 3, 'avg_accuracy': 98.7},
  'Epson Y': {...}
}
```
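
A sketch for Exercise 10 on hypothetical records. The model names appear in the records printed earlier, but the rows are made up. It accumulates totals per model in a single pass, then builds a summary in the requested shape. Rows with a blank accuracy count as feeds but are left out of the average.

```python
from collections import defaultdict, namedtuple

# Hypothetical records with only the fields this exercise needs
Rec = namedtuple('Rec', ['Printer_Model', 'Pass_Fail', 'Feed_Accuracy____'])

sample_records = [
    Rec('EcoTank ET-3850', 'Pass', '97.96'),
    Rec('EcoTank ET-3850', 'Fail', '95.00'),
    Rec('L3210', 'Fail', '95.37'),
    Rec('L3210', 'Pass', None),  # blank accuracy: counted as a feed, not averaged
]

# Running totals per printer model
totals = defaultdict(lambda: {'feeds': 0, 'fails': 0, 'acc_sum': 0.0, 'acc_n': 0})
for r in sample_records:
    t = totals[r.Printer_Model]
    t['feeds'] += 1
    if r.Pass_Fail == 'Fail':
        t['fails'] += 1
    if r.Feed_Accuracy____:
        t['acc_sum'] += float(r.Feed_Accuracy____)
        t['acc_n'] += 1

# Final report in the requested shape
summary = {
    model: {
        'feeds': t['feeds'],
        'fails': t['fails'],
        'avg_accuracy': round(t['acc_sum'] / t['acc_n'], 2) if t['acc_n'] else None,
    }
    for model, t in totals.items()
}
print(summary)
```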

---

### 🧠 Tip for All Exercises

Make sure you’re using the **cleaned headers**, like:

```python
'Pass_Fail', 'Paper_Feed_Time__ms_', 'Feed_Accuracy____', 'Operator_ID', 'Printer_Model'
```








## TIER 3 – Loading from CSV and Processing

### 🔹 Skills Covered:

* Reading CSV into namedtuple
* Converting strings and numeric types
* Aggregating and filtering

### Activity 3.1: Feed Summary from CSV

We’ll use a **simulated smaller dataset**, similar to your real Epson file:

**Sample CSV: `feed_sample.csv`**

```csv
Date,Status,Duration,ErrorCode
2025-01-01,OK,130,
2025-01-02,Error,10,E105
2025-01-03,OK,120,
```

```python
import csv
from collections import namedtuple

Feed = namedtuple('Feed', ['date', 'status', 'duration', 'error'])

with open('feed_sample.csv', newline='') as f:  # newline='' as the csv module recommends
    reader = csv.reader(f)
    next(reader)  # skip the header row
    data = [Feed(date, status, int(duration), error or None)
            for date, status, duration, error in reader]

errors = [entry for entry in data if entry.status == 'Error']
print("Error Entries:", errors)
```

**Challenge**:

* Find average duration of successful feeds.
* Get the day with the shortest feed duration.
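
One way to answer both challenges, as a sketch. The sample CSV above is inlined through `io.StringIO` so the example runs without the file. Swap in `open('feed_sample.csv', newline='')` to read the real file.

```python
import csv
import io
from collections import namedtuple

# The sample CSV from above, inlined so the sketch is self-contained
SAMPLE = """Date,Status,Duration,ErrorCode
2025-01-01,OK,130,
2025-01-02,Error,10,E105
2025-01-03,OK,120,
"""

Feed = namedtuple('Feed', ['date', 'status', 'duration', 'error'])

reader = csv.reader(io.StringIO(SAMPLE))
next(reader)  # skip the header row
data = [Feed(date, status, int(duration), error or None)
        for date, status, duration, error in reader]

# Challenge 1: average duration of successful ('OK') feeds
ok_durations = [entry.duration for entry in data if entry.status == 'OK']
avg_ok = sum(ok_durations) / len(ok_durations)
print(avg_ok)  # 125.0

# Challenge 2: the day with the shortest feed
shortest = min(data, key=lambda entry: entry.duration)
print(shortest.date)  # 2025-01-02
```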



## TIER 4 – Real Dataset: `epson_feeding_365day_dataset.csv`

### Skills Covered:

* Working with 365 records
* Performance considerations
* Advanced filtering and reporting

### Activity 4.1: Load and Analyze

```python
import csv
from collections import namedtuple

Feeding = namedtuple('Feeding', ['date', 'shift', 'status', 'duration', 'material', 'error_code'])

with open('/mnt/data/epson_feeding_365day_dataset.csv', newline='') as f:
    reader = csv.reader(f)
    next(reader)  # skip the header; assumes exactly these six columns, in this order
    records = [
        Feeding(date, shift, status, int(duration), material, error_code or None)
        for date, shift, status, duration, material, error_code in reader
    ]
```

**Advanced Challenges**:

1. 🔍 How many errors occurred per shift?
2. 🧮 What is the average duration per material type?
3. 🗓️ What day had the highest number of errors?
4. 📊 Generate a summary per month: total feeds, average duration, % errors.
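
A sketch covering all four challenges on a handful of hypothetical records. It assumes the six-column `Feeding` layout above, a status of `'OK'` or `'Error'` as in the smaller sample, and ISO `YYYY-MM-DD` dates, so that `date[:7]` gives the month.

```python
from collections import Counter, defaultdict, namedtuple

Feeding = namedtuple('Feeding', ['date', 'shift', 'status', 'duration', 'material', 'error_code'])

# Hypothetical records; assumes status is 'OK' or 'Error' and dates are ISO YYYY-MM-DD
sample_records = [
    Feeding('2025-01-01', 'Day', 'OK', 120, 'Paper', None),
    Feeding('2025-01-01', 'Night', 'Error', 10, 'Film', 'E105'),
    Feeding('2025-01-02', 'Night', 'Error', 15, 'Paper', 'E105'),
    Feeding('2025-01-02', 'Day', 'Error', 20, 'Paper', 'E105'),
    Feeding('2025-02-01', 'Day', 'OK', 110, 'Film', None),
]

errors = [r for r in sample_records if r.status == 'Error']

# 1. Errors per shift
errors_per_shift = Counter(r.shift for r in errors)

# 2. Average duration per material
durations = defaultdict(list)
for r in sample_records:
    durations[r.material].append(r.duration)
avg_by_material = {m: sum(d) / len(d) for m, d in durations.items()}

# 3. Day with the most errors (ties go to the first day encountered)
worst_day, worst_count = Counter(r.date for r in errors).most_common(1)[0]

# 4. Monthly summary: total feeds, average duration, % errors
by_month = defaultdict(list)
for r in sample_records:
    by_month[r.date[:7]].append(r)  # 'YYYY-MM'
monthly = {
    month: {
        'feeds': len(rs),
        'avg_duration': sum(r.duration for r in rs) / len(rs),
        'pct_errors': 100 * sum(r.status == 'Error' for r in rs) / len(rs),
    }
    for month, rs in by_month.items()
}
print(errors_per_shift, avg_by_material, worst_day, monthly, sep='\n')
```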



## Common Pitfalls and How to Solve Them

| Problem                                       | Cause                                      | Solution                                |
| --------------------------------------------- | ------------------------------------------ | --------------------------------------- |
| `TypeError` about missing or extra positional arguments | Wrong number of fields passed | Double-check field count in CSV |
| `'tuple' object has no attribute`             | Used plain `tuple` instead of `namedtuple` | Ensure you're using the named version   |
| Empty field in CSV                            | Missing conversion                         | Use `value or None` for optional fields |
| Can't modify field                            | `namedtuple` is immutable                  | Use `_replace()` method                 |

