# Part 1: Introduction to defaultdict()
### What is defaultdict()?
defaultdict is a subclass of Python’s built-in dict that overrides one behavior: looking up a missing key with d[key] does not raise a KeyError. Instead, defaultdict calls the factory function it was created with (e.g., int, list, set), stores the result under that key, and returns it.
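To make the difference concrete, here is a minimal sketch that contrasts a plain dict with a defaultdict. The variable names (`plain`, `dd_demo`) are illustrative only. It also shows that the factory runs only on subscript access; `.get()` and `in` leave the dictionary unchanged.

```python
from collections import defaultdict

# A regular dict raises KeyError on a missing key.
plain = {}
try:
    plain['missing']
    plain_raised = False
except KeyError:
    plain_raised = True

# A defaultdict calls its factory (here list) and stores the result.
dd_demo = defaultdict(list)
value = dd_demo['missing']        # factory runs and returns a new empty list
inserted = 'missing' in dd_demo   # the key now exists in the dictionary

# .get() and membership tests do not call the factory.
got = dd_demo.get('other')
still_absent = 'other' not in dd_demo

print(plain_raised, value, inserted, got, still_absent)
```

Because a lookup inserts the key, reading from a defaultdict can grow it. Use `.get()` or `in` when you only want to check for a key.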

In [155]:
import csv
from collections import defaultdict
from pprint import pprint

In [156]:
dd = defaultdict(list)

In [157]:
dd['a'].append(1)

In [158]:
dd

defaultdict(list, {'a': [1]})

In [159]:
dict(dd)

{'a': [1]}

# Skill-Building Activities
## Level 1: Basic Mechanics
### 1.1 Using int as a default factory

Skill: Automatically initializing numeric counters

Challenge: Forgetting that the factory only applies to missing keys. Values already stored in the dictionary are never converted or reset.

In [160]:
word_counts = defaultdict(int)
sentence = 'solar panel assembly is efficient solar energy source'

In [161]:
for word in sentence.split():
    word_counts[word] += 1

In [162]:
dict(word_counts)

{'solar': 2,
 'panel': 1,
 'assembly': 1,
 'is': 1,
 'efficient': 1,
 'energy': 1,
 'source': 1}
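The challenge noted for this activity can be reproduced with a small sketch. It assumes a hypothetical starting mapping (`existing`) that already holds a non-numeric value. The factory is used for the missing key but leaves the stored value alone.

```python
from collections import defaultdict

# Hypothetical starting data: 'solar' already has a non-numeric value.
existing = {'solar': None}
counts_demo = defaultdict(int, existing)

existing_value = counts_demo['solar']   # stays None, the factory is not called
missing_value = counts_demo['panel']    # missing key, so int() supplies 0

# Incrementing the pre-existing None value fails.
try:
    counts_demo['solar'] += 1
    increment_failed = False
except TypeError:
    increment_failed = True

print(existing_value, missing_value, increment_failed)
```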

### 1.2 Using list to group values
Skill: Grouping items into categories

Challenge: Accidentally using a regular dict, which raises a KeyError the first time a new key is appended to.

In [163]:
groups = defaultdict(list)

In [164]:
data = [('Line A', 'Operator1'), ('Line B', 'Operator2'), ('Line A', 'Operator3')]

In [165]:
for line, operator in data:
    groups[line].append(operator)

In [166]:
groups

defaultdict(list,
            {'Line A': ['Operator1', 'Operator3'], 'Line B': ['Operator2']})

In [167]:
dict(groups)

{'Line A': ['Operator1', 'Operator3'], 'Line B': ['Operator2']}
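For comparison, the sketch below runs the same grouping with a regular dict. It first shows the KeyError mentioned in the challenge, then one common workaround, `dict.setdefault`. The names `plain_groups` and `dd_groups` are illustrative.

```python
from collections import defaultdict

data = [('Line A', 'Operator1'), ('Line B', 'Operator2'), ('Line A', 'Operator3')]

# With a regular dict, the first append to a new key raises KeyError.
plain_groups = {}
try:
    for line, operator in data:
        plain_groups[line].append(operator)
    raised = False
except KeyError:
    raised = True

# Workaround: setdefault creates the list when the key is missing.
plain_groups = {}
for line, operator in data:
    plain_groups.setdefault(line, []).append(operator)

# defaultdict(list) gives the same result without the extra call.
dd_groups = defaultdict(list)
for line, operator in data:
    dd_groups[line].append(operator)

print(raised, plain_groups == dict(dd_groups))
```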

### 1.3 Using set to avoid duplicates

In [168]:
unique_parts = defaultdict(set)

In [169]:
records = [('Module A', 'Frame'), ('Module A', 'Cell'), ('Module A', 'Cell')]

In [170]:
for module, part in records:
    # each record holds a single part, so add() deduplicates per module
    unique_parts[module].add(part)

In [171]:
unique_parts

defaultdict(set, {'Module A': {'Cell', 'Frame'}})

In [172]:
dict(unique_parts)

{'Module A': {'Cell', 'Frame'}}
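Sets have no guaranteed display order, so the output above may list the parts differently from run to run. If a stable order matters, for example in a report, one option is to sort each set when reading it out. This is a sketch of that idea, with `stable` as an illustrative name.

```python
from collections import defaultdict

records = [('Module A', 'Frame'), ('Module A', 'Cell'), ('Module A', 'Cell')]

unique_parts = defaultdict(set)
for module, part in records:
    unique_parts[module].add(part)

# Convert each set to a sorted list for deterministic output.
stable = {module: sorted(parts) for module, parts in unique_parts.items()}
print(stable)  # {'Module A': ['Cell', 'Frame']}
```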

## Level 2: Intermediate Usage
### 2.1 Nested defaultdict (factory returning another defaultdict)
Skill: Multi-level grouping (e.g., Assembly Line ➜ Component ➜ Count)

In [173]:
def nested_dict():
    return defaultdict(int)

In [174]:
inventory = defaultdict(nested_dict)

In [175]:
inventory['Line A']['Frame'] += 10
inventory['Line A']['Cell'] += 5
inventory['Line B']['Glass'] += 3

In [176]:
inventory

defaultdict(<function __main__.nested_dict()>,
            {'Line A': defaultdict(int, {'Frame': 10, 'Cell': 5}),
             'Line B': defaultdict(int, {'Glass': 3})})

In [177]:
dict(inventory)

{'Line A': defaultdict(int, {'Frame': 10, 'Cell': 5}),
 'Line B': defaultdict(int, {'Glass': 3})}
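As the output above shows, dict(inventory) converts only the outer level and the inner values remain defaultdicts. The sketch below uses a hypothetical helper, `to_plain_dict`, that converts every level recursively. It also builds the nested structure with a lambda, a common shorthand for the nested_dict function.

```python
from collections import defaultdict

def to_plain_dict(obj):
    """Hypothetical helper: recursively convert dict subclasses to plain dicts."""
    if isinstance(obj, dict):
        return {key: to_plain_dict(value) for key, value in obj.items()}
    return obj

# Same structure as above, with a lambda as the outer factory.
inventory = defaultdict(lambda: defaultdict(int))
inventory['Line A']['Frame'] += 10
inventory['Line A']['Cell'] += 5
inventory['Line B']['Glass'] += 3

plain_inventory = to_plain_dict(inventory)
print(plain_inventory)
# {'Line A': {'Frame': 10, 'Cell': 5}, 'Line B': {'Glass': 3}}
```

A fully plain dict raises KeyError on missing keys again, which helps once the data is complete and accidental key creation is no longer wanted.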

In [178]:
for line, components in inventory.items():
    for component, count in components.items():
        print(f'{line} --> {component} : {count}')
        

Line A --> Frame : 10
Line A --> Cell : 5
Line B --> Glass : 3


### 2.2 Aggregating numerical values by category

In [179]:
assembly_times = [
    ('Line A', 45), ('Line A', 55), ('Line B', 30), ('Line A', 40)
]

In [180]:
totals = defaultdict(int)

In [181]:
counts = defaultdict(int)

In [182]:
for line, time in assembly_times:
    totals[line] += time
    counts[line] += 1

In [183]:
averages = {line: round(totals[line] / counts[line], 2) for line in totals}

In [184]:
averages

{'Line A': 46.67, 'Line B': 30.0}
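The two dictionaries above can also be merged into one. This sketch is an alternative, not a replacement for the approach shown: each value is a two-element list holding the running total and the sample count. The name `stats` and the `[total, count]` layout are assumptions made for illustration.

```python
from collections import defaultdict

assembly_times = [
    ('Line A', 45), ('Line A', 55), ('Line B', 30), ('Line A', 40)
]

# Each value is [running total, number of samples].
stats = defaultdict(lambda: [0, 0])
for line, duration in assembly_times:
    stats[line][0] += duration
    stats[line][1] += 1

averages = {
    line: round(total / n, 2)
    for line, (total, n) in stats.items()
}
print(averages)  # {'Line A': 46.67, 'Line B': 30.0}
```

The loop variable is named `duration` rather than `time` so it does not shadow the standard `time` module if that is imported in the same session.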

## Level 3: Advanced Use Cases for defaultdict()

In [185]:
import pandas as pd
# read_csv already returns a DataFrame, so no pd.DataFrame() wrapper is needed
df = pd.read_csv('solar_panel_assembly_dataset.csv')

In [187]:
list(df.columns)

['Date',
 'Shift',
 'Panel Serial No.',
 'Panel Type',
 'Assembly Line',
 'Cycle Time (s)',
 'Number of Cells',
 'Cell Alignment Deviation (mm)',
 'Glass Thickness (mm)',
 'Junction Box Attached',
 'Power Output (W)',
 'Efficiency (%)',
 'Insulation Resistance (MΩ)',
 'Flash Test Result',
 'Visual Inspection',
 'Operator ID',
 'Final Inspection']

### 3.1: Total Panels Assembled Per Operator
Goal: Count how many solar panels each operator assembled.

In [188]:
import csv
panel_counts = defaultdict(int)

In [189]:
with open('solar_panel_assembly_dataset.csv', newline='') as csvfile:
    reader = csv.DictReader(csvfile)
    for row in reader:
        operator = row['Operator ID']
        panel_counts[operator] += 1
        

In [190]:
for op, count in panel_counts.items():
    print(f'Operator: {op} --> {count} panels')

Operator: EMP593 --> 1 panels
Operator: EMP448 --> 1 panels
Operator: EMP993 --> 1 panels
Operator: EMP533 --> 1 panels
Operator: EMP415 --> 1 panels
Operator: EMP358 --> 1 panels
Operator: EMP795 --> 1 panels
Operator: EMP538 --> 2 panels
Operator: EMP138 --> 1 panels
Operator: EMP554 --> 1 panels
Operator: EMP973 --> 1 panels
Operator: EMP727 --> 2 panels
Operator: EMP508 --> 1 panels
Operator: EMP708 --> 2 panels
Operator: EMP476 --> 1 panels
Operator: EMP519 --> 1 panels
Operator: EMP713 --> 1 panels
Operator: EMP532 --> 1 panels
Operator: EMP164 --> 1 panels
Operator: EMP909 --> 1 panels
Operator: EMP305 --> 1 panels
Operator: EMP132 --> 1 panels
Operator: EMP497 --> 1 panels
Operator: EMP402 --> 1 panels
Operator: EMP794 --> 1 panels
Operator: EMP762 --> 1 panels
Operator: EMP813 --> 1 panels
Operator: EMP695 --> 1 panels
Operator: EMP349 --> 1 panels
Operator: EMP705 --> 1 panels
Operator: EMP211 --> 1 panels
Operator: EMP656 --> 1 panels
Operator: EMP528 --> 1 panels
Operator: 
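The per-operator listing is easier to read when sorted by count. The sketch below shows one way to do that and checks the result against collections.Counter, which is built for this kind of tally. To stay self-contained it reads from an in-memory CSV: the rows and operator IDs in `sample_csv` are made up and do not come from solar_panel_assembly_dataset.csv.

```python
import csv
import io
from collections import Counter, defaultdict

# Hypothetical rows standing in for the real dataset.
sample_csv = io.StringIO(
    "Operator ID,Panel Type\n"
    "EMP001,Thin-Film\n"
    "EMP002,Monocrystalline\n"
    "EMP001,Polycrystalline\n"
)

panel_counts = defaultdict(int)
for row in csv.DictReader(sample_csv):
    panel_counts[row['Operator ID']] += 1

# Sort operators by panel count, highest first.
ranked = sorted(panel_counts.items(), key=lambda item: item[1], reverse=True)
for op, count in ranked:
    print(f'Operator: {op} --> {count} panels')

# Counter produces the same tallies in a single expression.
sample_csv.seek(0)
counter = Counter(row['Operator ID'] for row in csv.DictReader(sample_csv))
print(dict(counter) == dict(panel_counts))
```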

### 3.2: Average Efficiency Per Panel Type
Goal: Group by 'Panel Type' and calculate average 'Efficiency (%)'.

In [191]:
efficiencies = defaultdict(list)

In [192]:
with open('solar_panel_assembly_dataset.csv', newline='') as csvfile:
    reader = csv.DictReader(csvfile)
    for row in reader:
        panel_type = row['Panel Type']
        efficiency = float(row['Efficiency (%)'])
        efficiencies[panel_type].append(efficiency)

In [193]:
average_efficiency = {
    ptype: round(sum(vals)/len(vals), 2)
    for ptype, vals in efficiencies.items()
}

In [195]:
for ptype, avg in average_efficiency.items():
    print(f'{ptype} --> Average Efficiency: {avg}%')

Thin-Film --> Average Efficiency: 18.37%
Polycrystalline --> Average Efficiency: 19.06%
Monocrystalline --> Average Efficiency: 18.6%
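The loop above assumes every row has a numeric 'Efficiency (%)' value, so float() raises ValueError on a blank or malformed cell. Whether the real dataset contains such rows is not known here. The sketch below shows one defensive variant that skips and counts bad rows. The in-memory rows in `sample_csv` are invented for illustration.

```python
import csv
import io
from collections import defaultdict

# Hypothetical rows, including one with a blank efficiency value.
sample_csv = io.StringIO(
    "Panel Type,Efficiency (%)\n"
    "Thin-Film,18.0\n"
    "Thin-Film,19.0\n"
    "Monocrystalline,\n"
    "Monocrystalline,20.5\n"
)

efficiencies = defaultdict(list)
skipped = 0
for row in csv.DictReader(sample_csv):
    try:
        efficiency = float(row['Efficiency (%)'])
    except (TypeError, ValueError):
        skipped += 1  # blank or non-numeric cell
        continue
    efficiencies[row['Panel Type']].append(efficiency)

average_efficiency = {
    ptype: round(sum(vals) / len(vals), 2)
    for ptype, vals in efficiencies.items()
}
print(average_efficiency, f'skipped rows: {skipped}')
```

Because a key is only created when a valid value is appended, no list in `efficiencies` is empty and the division by `len(vals)` is safe.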
