# Understanding defaultdict in Python: An Interactive Guide

## Introduction
`defaultdict` is a subclass of Python's built-in `dict` that adds one feature: when you look up a key that doesn't exist (for example with `d[key]` or `d[key] += 1`), it automatically creates the key with a default value instead of raising `KeyError`. This makes it well suited to patterns such as counting, grouping, and building nested structures.

First, let's import defaultdict:
```python
from collections import defaultdict
```

## 1. Basic Usage and Comparison with Regular Dict

### Regular dict behavior:
```python
# Regular dict
regular_dict = {}
try:
    regular_dict['missing_key'] += 1
except KeyError:
    print("KeyError: key doesn't exist!")
```
Output: `KeyError: key doesn't exist!`

### defaultdict behavior:
```python
# defaultdict with int as default_factory
d = defaultdict(int)
d['missing_key'] += 1
print(d['missing_key'])  # Output: 1
```

🤔 **Why does this work?**
When you access a non-existent key in a defaultdict, it:
1. Calls the default_factory function (int() in this case)
2. Uses that value (0 for int()) as the default
3. Inserts it into the dictionary
4. Returns the default value
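
The four steps above can be sketched with a plain `dict` subclass. This only illustrates the mechanism and is not how `collections.defaultdict` is actually implemented; the class name `SketchDefaultDict` is hypothetical. It relies on the documented `__missing__` hook, which `dict` calls when `d[key]` does not find the key.

```python
from collections import defaultdict


class SketchDefaultDict(dict):
    """Hypothetical stand-in that mimics defaultdict's missing-key steps."""

    def __init__(self, default_factory):
        super().__init__()
        self.default_factory = default_factory

    def __missing__(self, key):
        # Step 1: call the factory to build a default value
        value = self.default_factory()
        # Steps 2-3: store that value under the missing key
        self[key] = value
        # Step 4: return it to the caller
        return value


sketch = SketchDefaultDict(int)
sketch['missing_key'] += 1

real = defaultdict(int)
real['missing_key'] += 1

print(sketch == real)   # compares contents: both hold {'missing_key': 1}
print('other' in real)  # membership tests do not create keys
```

Note that only item lookup (`d[key]`) triggers the factory; `in` and `.get()` leave the dictionary untouched.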

## 2. Common Use Cases

### 2.1 Counting Elements
```python
# Count occurrences of words in a text
text = "the quick brown fox jumps over the lazy dog"
word_count = defaultdict(int)

for word in text.split():
    word_count[word] += 1

print(dict(word_count))
```
Output: `{'the': 2, 'quick': 1, 'brown': 1, 'fox': 1, 'jumps': 1, 'over': 1, 'lazy': 1, 'dog': 1}`

### 2.2 Grouping Items
```python
# Group animals by their first letter
animals = ['cat', 'dog', 'crow', 'deer', 'bear']
animals_by_letter = defaultdict(list)

for animal in animals:
    animals_by_letter[animal[0]].append(animal)

print(dict(animals_by_letter))
```
Output: `{'c': ['cat', 'crow'], 'd': ['dog', 'deer'], 'b': ['bear']}`

### 2.3 Nested defaultdicts
```python
# Create a nested structure for storing city names by country and state
locations = defaultdict(lambda: defaultdict(list))
locations['USA']['California'].append('Los Angeles')
locations['USA']['California'].append('San Francisco')
locations['Canada']['Ontario'].append('Toronto')

# The raw output shows the defaultdict structure:
print(dict(locations))
# Output: {'USA': defaultdict(<class 'list'>, {'California': ['Los Angeles', 'San Francisco']}), 
#          'Canada': defaultdict(<class 'list'>, {'Ontario': ['Toronto']})}

# To get a pure nested dictionary, you need to convert each level:
pure_dict = {
    country: dict(states)
    for country, states in locations.items()
}
print(pure_dict)
# Output: {'USA': {'California': ['Los Angeles', 'San Francisco']}, 
#          'Canada': {'Ontario': ['Toronto']}}
```
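
The comprehension above handles exactly two levels. For deeper nesting, a small recursive helper can do the conversion. The sketch below assumes every nested container is a `dict` or `defaultdict`; the name `to_plain_dict` is hypothetical.

```python
from collections import defaultdict


def to_plain_dict(obj):
    """Recursively convert nested dict/defaultdict objects to plain dicts."""
    if isinstance(obj, dict):  # defaultdict is a dict subclass
        return {key: to_plain_dict(value) for key, value in obj.items()}
    return obj  # leaves (lists, numbers, strings) are returned unchanged


locations = defaultdict(lambda: defaultdict(list))
locations['USA']['California'].append('Los Angeles')
locations['Canada']['Ontario'].append('Toronto')

plain = to_plain_dict(locations)
print(plain)
print(type(plain['USA']) is dict)
```

Because the helper recurses on any `dict` instance, it works regardless of how many `defaultdict` levels are stacked.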

## Best Practices and Tips

1. **Choose the Right Default Factory**:
   - `int`: for counting
   - `list`: for grouping items
   - `set`: for unique collections
   - `lambda`: for custom defaults or nested structures

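2. **Converting to Regular Dict**:
   - Use `dict(defaultdict_obj)` when you need to serialize or print; the conversion is shallow, so nested defaultdicts must be converted level by level
   - Useful when the default values are no longer needed, since lookups of missing keys then raise `KeyError` again

3. **Memory Considerations**:
   - Reading a missing key with `d[key]` inserts it, so every key you look up stays in memory
   - To check for key existence without creating the key, use `key in d` or `d.get(key)`, or use a regular dict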

4. **Type Hints**:
```python
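from typing import DefaultDict
counts: DefaultDict[str, int] = defaultdict(int)
# On Python 3.9+, the class itself can be subscripted instead:
# counts: defaultdict[str, int] = defaultdict(int)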
```
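
The key-creation behavior behind the memory tip is easy to see in a short sketch. It uses only standard `defaultdict` behavior: indexing inserts missing keys, `in` and `.get()` do not, and setting `default_factory` to `None` makes later misses raise `KeyError` again. The variable names are illustrative.

```python
from collections import defaultdict

counts = defaultdict(int)
counts['a'] += 1

# Indexing a missing key inserts it as a side effect
_ = counts['b']
print(sorted(counts))  # 'b' is now a key

# Membership tests and .get() leave the dictionary unchanged
has_c = 'c' in counts
value_d = counts.get('d')
print(len(counts))

# Turning off the factory makes missing keys raise KeyError again
counts.default_factory = None
try:
    counts['e']
except KeyError:
    print("KeyError for 'e'")
```

Disabling the factory this way is one option when a dictionary has been fully built and accidental key creation should become an error.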



## 3. Practice Problems

### Problem 1: Character Counter
Write a function that counts the occurrence of each character in a string using defaultdict.



```python
from collections import defaultdict

def count_characters(s):
    counting_dict = defaultdict(int)
    for letter in s:
        counting_dict[letter] += 1
    return counting_dict

# Test your function
print(count_characters("hello world"))
# Expected output: {'h': 1, 'e': 1, 'l': 3, 'o': 2, ' ': 1, 'w': 1, 'r': 1, 'd': 1}
```
Output (the function returns the `defaultdict` itself, so the printed form includes the wrapper): `defaultdict(<class 'int'>, {'h': 1, 'e': 1, 'l': 3, 'o': 2, ' ': 1, 'w': 1, 'r': 1, 'd': 1})`



### Problem 2: Group by Length
Write a function that groups words by their length using defaultdict.

```python
def group_by_length(words):
    grouping_dict = defaultdict(list)
    for word in words:
        grouping_dict[len(word)].append(word)
    return grouping_dict

# Test your function
words = ['cat', 'dog', 'elephant', 'rat', 'giraffe', 'pig']
print(group_by_length(words))
# Expected output (keys appear in insertion order): {3: ['cat', 'dog', 'rat', 'pig'], 8: ['elephant'], 7: ['giraffe']}
```
Output: `defaultdict(<class 'list'>, {3: ['cat', 'dog', 'rat', 'pig'], 8: ['elephant'], 7: ['giraffe']})`



### Problem 3: Building a Simple Graph
Create a function that builds an adjacency list representation of a graph using defaultdict.


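```python
def build_graph(edges):
    graph = defaultdict(set)
    for node, neighbor in edges:
        # Record the edge in both directions (undirected graph)
        graph[node].add(neighbor)
        graph[neighbor].add(node)
    return graph

# Test your function
edges = [(1, 2), (1, 3), (2, 4), (3, 4)]
print(build_graph(edges))
# Expected output (directed adjacency list): {1: [2, 3], 2: [4], 3: [4], 4: []}
```
Output (this version treats each edge as undirected and stores neighbors in sets, so it differs from the expected directed adjacency list): `defaultdict(<class 'set'>, {1: {2, 3}, 2: {1, 4}, 3: {1, 4}, 4: {2, 3}})`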



## 4. Advanced Challenge

Create a function that processes a list of transactions and returns a nested defaultdict structure showing:
- Total amount spent per customer
- List of items bought per customer
- Average transaction value per customer

Sample data:
```python
transactions = [
    ('customer1', 'apple', 0.5),
    ('customer1', 'banana', 0.3),
    ('customer2', 'apple', 0.5),
    ('customer1', 'orange', 0.6)
]
```

Try to solve this yourself before looking at the solution!

```python
transaction_dict = defaultdict(lambda: {
    "items": [],
    "transaction_value": [],
    "average_transaction_value": 0,
    "total_spend": 0,
})

# Collect each customer's items and transaction values
for customer, item, cost in transactions:
    transaction_dict[customer]["items"].append(item)
    transaction_dict[customer]["transaction_value"].append(cost)

# Derive the total and the average from the collected values
for customer, details in transaction_dict.items():
    values = details["transaction_value"]
    details["total_spend"] = sum(values)
    details["average_transaction_value"] = sum(values) / len(values)

print(transaction_dict)
```
Output: `defaultdict(<function <lambda> at 0x106a86c00>, {'customer1': {'items': ['apple', 'banana', 'orange'], 'transaction_value': [0.5, 0.3, 0.6], 'average_transaction_value': 0.4666666666666666, 'total_spend': 1.4}, 'customer2': {'items': ['apple'], 'transaction_value': [0.5], 'average_transaction_value': 0.5, 'total_spend': 0.5}})`

## 5. Solutions

<details>
<summary>Click to see solutions</summary>

### Solution 1: Character Counter
```python
def count_characters(s):
    char_count = defaultdict(int)
    for char in s:
        char_count[char] += 1
    return dict(char_count)
```

### Solution 2: Group by Length
```python
def group_by_length(words):
    groups = defaultdict(list)
    for word in words:
        groups[len(word)].append(word)
    return dict(groups)
```

### Solution 3: Building a Simple Graph
```python
def build_graph(edges):
    graph = defaultdict(list)
    for src, dest in edges:
        graph[src].append(dest)
        graph[dest]  # touch dest so nodes without outgoing edges (like 4) still appear
    return dict(graph)
```

### Solution 4: Advanced Challenge
```python
def analyze_transactions(transactions):
    # Create nested defaultdict structure
    customer_data = defaultdict(lambda: {
        'total_spent': 0,
        'items': defaultdict(int),  # item name -> number of times bought
        'transaction_count': 0
    })
    
    # Process transactions
    for customer, item, amount in transactions:
        customer_data[customer]['total_spent'] += amount
        customer_data[customer]['items'][item] += 1
        customer_data[customer]['transaction_count'] += 1
    
    # Calculate averages and convert to regular dict
    result = {}
    for customer, data in customer_data.items():
        result[customer] = {
            'total_spent': data['total_spent'],
            'items': dict(data['items']),
            'average_transaction': data['total_spent'] / data['transaction_count']
        }
    
    return result
```

</details>

