# Notebook 1: Introduction to Serializable Dictionaries in Python

Welcome! This notebook introduces the fundamental concept of dictionary serialization in Python.

Dictionaries are incredibly versatile, but they exist only in memory while your script runs. Serialization lets us convert these dictionaries into formats that can be saved to files or sent across networks, allowing for data persistence, configuration management, and much more.

**Learning Objectives:**
*   Understand what serialization and deserialization mean.
*   Learn why serializing dictionaries is useful.
*   Explore common serialization formats: JSON, Pickle, YAML, and MessagePack.
*   See basic examples of serializing and deserializing dictionaries using `json` and `pickle`.

## Part 1: Understanding Dictionary Serialization in Python

Serialization is the process of converting a data object (like a Python dictionary) into a format that can be stored (e.g., in a file) or transmitted (e.g., over a network). Deserialization is the reverse process: reconstructing the original object from the serialized format.

This is essential when you need to:

*   Save application state to disk
*   Transfer data between different systems
*   Cache computation results
*   Store configuration settings
*   Exchange data between different programming languages

Python dictionaries map naturally to common formats like JSON, but nested structures that contain non-JSON types (sets, datetimes, custom objects) or non-string keys need extra handling.
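
As a rough illustration of where the mapping stops being exact, the sketch below round-trips a nested dictionary through the standard-library `json` module. The dictionary contents are made up for this example; the point is that integer keys come back as strings and tuples come back as lists.

```python
import json

# Hypothetical nested dictionary with a tuple value and integer keys
original = {"point": (1, 2), "lookup": {1: "one", 2: "two"}}

restored = json.loads(json.dumps(original))

# JSON object keys are always strings, and JSON has no tuple type
print(restored)
print(restored == original)
```

The round trip succeeds without an error, so this kind of change is easy to miss unless you compare the result against the original.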

### Why Serialize Dictionaries?

In-memory dictionaries vanish when your program ends. Serialization gives them permanence.

1.  **Data Persistence**: Save dictionaries to files for later retrieval.
2.  **Data Exchange**: Transfer dictionary data between systems or applications.
3.  **Caching**: Store results of heavy computations to speed up future requests.
4.  **Configuration Management**: Easily save and load application settings.
5.  **API Communication**: Send and receive structured data (often as JSON) via web APIs.
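
The first item, data persistence, can be sketched with `json.dump()` and `json.load()`, the file-based counterparts of the string functions. The settings dictionary and the file name `settings.json` are assumptions for illustration, and a temporary directory is used so the example leaves nothing behind.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical application settings
settings = {"theme": "dark", "font_size": 12, "recent_files": ["a.txt", "b.txt"]}

with tempfile.TemporaryDirectory() as tmp_dir:
    settings_path = Path(tmp_dir) / "settings.json"

    # json.dump() writes to an open file object
    with settings_path.open("w", encoding="utf-8") as f:
        json.dump(settings, f, indent=2)

    # json.load() reads from an open file object
    with settings_path.open("r", encoding="utf-8") as f:
        loaded_settings = json.load(f)

print(loaded_settings == settings)
```

In a real application you would write to a stable location instead of a temporary directory, so the file is still there the next time the program starts.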

### Common Serialization Formats

Python offers several libraries for serialization. Let's look at the most common ones.

**1. JSON (JavaScript Object Notation)**

JSON is lightweight, human-readable, and a good fit for basic Python types: strings, numbers, booleans, `None`, and lists and dictionaries nested to any depth. It is the de facto standard for web APIs.

*Limitations:* Cannot directly handle types such as sets, custom objects, dates, or binary data. Dictionary keys are always written as strings, and tuples are written as lists.

In [None]:
import json
from pprint import pprint # For pretty printing output

# Our sample dictionary
my_dict = {'name': 'Alice', 'age': 30, 'city': 'New York', 'is_student': False, 'courses': None, 'skills': ['Python', 'Data Science']}

print("Original Dictionary:")
pprint(my_dict)

# --- Serialization (Dictionary to JSON string) ---
# Use dumps() to serialize to a string
# `indent=2` makes the output human-readable
json_string = json.dumps(my_dict, indent=2)

print("\nSerialized JSON String:")
print(json_string)
print(f"Type of serialized data: {type(json_string)}")

# --- Deserialization (JSON string to Dictionary) ---
# Use loads() to deserialize from a string
reconstructed_dict = json.loads(json_string)

print("\nReconstructed Dictionary:")
pprint(reconstructed_dict)
print(f"Type of reconstructed data: {type(reconstructed_dict)}")

# Verify it's the same
assert my_dict == reconstructed_dict
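
To make the limitation concrete, here is a minimal sketch of what happens with a set and a `datetime`, and one possible workaround using the `default=` parameter of `json.dumps()`. The helper name `encode_extra_types` and the sample record are assumptions made for this example, not part of the `json` module.

```python
import json
from datetime import datetime

# Hypothetical record containing two types JSON cannot represent directly
record = {"tags": {"a", "b"}, "created": datetime(2024, 1, 15, 9, 30)}

# Without help, json.dumps() raises TypeError
try:
    json.dumps(record)
except TypeError as exc:
    print(f"TypeError: {exc}")


def encode_extra_types(obj):
    """Hypothetical fallback, called only for objects json cannot handle."""
    if isinstance(obj, set):
        return sorted(obj)        # sets become sorted lists
    if isinstance(obj, datetime):
        return obj.isoformat()    # datetimes become ISO 8601 strings
    raise TypeError(f"Cannot serialize {type(obj).__name__}")


json_text = json.dumps(record, default=encode_extra_types)
print(json_text)
```

Note that this conversion is one-way: `json.loads()` returns a list and a string, not a set and a `datetime`. Restoring the original types requires a convention of your own on the loading side.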

**2. Pickle**

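Pickle is Python's native serialization format. It can handle almost *any* Python object, including instances of custom classes, sets, and dates. Functions and classes are pickled by reference to their importable name, so lambdas and locally defined functions cannot be pickled.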

*Limitations:* The output is binary (not human-readable) and Python-specific. **Crucially, unpickling data from untrusted sources is a security risk** because it can execute arbitrary code.

In [None]:
import pickle
from datetime import datetime

# A more complex dictionary
complex_dict = {
    'data': [1, 2, 3],
    'metadata': {'source': 'sensor', 'timestamp': datetime.now()},
    'unique_ids': {101, 102, 103} # A set - JSON cannot handle this directly
}

print("Original Complex Dictionary:")
pprint(complex_dict)

# --- Serialization (Dictionary to bytes) ---
# Use dumps() to serialize to a bytes object
serialized_pickle = pickle.dumps(complex_dict)

print("\nSerialized Pickle Data (bytes):")
# print(serialized_pickle) # This is binary, not very readable
print(f"Type of serialized data: {type(serialized_pickle)}")
print(f"Size of serialized data: {len(serialized_pickle)} bytes")

# --- Deserialization (bytes to Dictionary) ---
# Use loads() to deserialize from bytes
deserialized_dict = pickle.loads(serialized_pickle)

print("\nReconstructed Dictionary:")
pprint(deserialized_dict)
print(f"Type of reconstructed data: {type(deserialized_dict)}")

# Verify it's the same
assert complex_dict == deserialized_dict
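
The security warning is easier to appreciate with a harmless demonstration. In the sketch below, the hypothetical `Payload` class uses `__reduce__` to tell pickle which callable to invoke when the data is loaded. Here that callable is just `print`, but an attacker could name any importable function.

```python
import pickle


class Payload:
    """Hypothetical class whose pickled form calls a function of its choosing."""

    def __reduce__(self):
        # Tells pickle: "to rebuild me, call print(...)".
        # A malicious payload could name a far more dangerous callable here.
        return (print, ("This message was printed during unpickling!",))


untrusted_bytes = pickle.dumps(Payload())

# loads() calls print() for us; we never get a Payload object back
result = pickle.loads(untrusted_bytes)
print(f"Object returned by loads(): {result!r}")
```

Nothing in the loading code asked for `print` to run, which is why `pickle.loads()` should only be used on data you created yourself or received from a trusted source.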

**3. YAML (YAML Ain't Markup Language)**

YAML is often considered more human-readable than JSON. It supports comments, anchors/aliases (for reusing repeated structures), and a few additional native types such as timestamps.

*Requires Installation:* `pip install pyyaml`

In [None]:
pip install pyyaml

In [None]:
# Note: You need to install PyYAML first: pip install pyyaml
try:
    import yaml
    
    config_dict = {
        'server': {
            'host': 'localhost', 
            'port': 8080
        }, 
        'debug': True, 
        'features': ['auth', 'logging']
    }
    
    print("Original Config Dictionary:")
    pprint(config_dict)
    
    # --- Serialization (Dictionary to YAML string) ---
    # Use dump() to serialize to a string (default_flow_style=False makes it block style)
    yaml_string = yaml.dump(config_dict, default_flow_style=False)
    
    print("\nSerialized YAML String:")
    print(yaml_string)
    print(f"Type of serialized data: {type(yaml_string)}")
    
    # --- Deserialization (YAML string to Dictionary) ---
    # Use safe_load(): it builds only plain Python types, so it is safe for untrusted input
    loaded_config = yaml.safe_load(yaml_string)
    
    print("\nReconstructed Dictionary:")
    pprint(loaded_config)
    print(f"Type of reconstructed data: {type(loaded_config)}")
    
    assert config_dict == loaded_config
    
except ImportError:
    print("PyYAML not installed. Run 'pip install pyyaml' to run this cell.")
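
Comments and anchors/aliases only show up when the YAML text is written by hand, so here is a small sketch that loads such a document. The configuration text is invented for this example, and the `<<` merge key is a YAML 1.1 feature that PyYAML supports. The import is guarded in the same way as the cell above.

```python
try:
    import yaml
except ImportError:
    yaml = None  # PyYAML is a third-party package and may be missing

# Hypothetical config using a comment, an anchor (&), and an alias (*)
yaml_text = """
# Shared connection settings
defaults: &defaults
  host: localhost
  port: 8080

staging:
  <<: *defaults   # merge the anchored mapping, then override one key
  port: 9090
"""

if yaml is not None:
    config = yaml.safe_load(yaml_text)
    print(config["defaults"])
    print(config["staging"])
else:
    config = None
    print("PyYAML not installed; skipping.")
```

Comments are discarded on load and anchors are expanded into ordinary dictionaries, so dumping `config` back to YAML will not reproduce the original text.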

**4. MessagePack**

MessagePack is a binary serialization format. Like Pickle, its output is not human-readable, but like JSON it is language-independent, and it is designed to be faster and more compact than JSON. It's often used in performance-critical applications and for inter-process communication.

*Requires Installation:* `pip install msgpack` (the package was renamed from `msgpack-python` in version 0.5)

In [None]:
pip install msgpack

In [None]:
# Note: You need to install MessagePack first: pip install msgpack
try:
    import msgpack
    
    data_dict = {'values': [1, 2, 3, 4, 5], 'metadata': {'source': 'sensor', 'active': True}}
    
    print("Original Data Dictionary:")
    pprint(data_dict)
    
    # --- Serialization (Dictionary to bytes) ---
    # Use packb() to serialize to bytes
    packed_data = msgpack.packb(data_dict)
    
    print("\nSerialized MessagePack Data (bytes):")
    # print(packed_data) # Binary, not very readable
    print(f"Type of serialized data: {type(packed_data)}")
    print(f"Size of serialized data: {len(packed_data)} bytes")
    
    # --- Deserialization (bytes to Dictionary) ---
    # Use unpackb() to deserialize from bytes
    # raw=False (the default since msgpack 1.0) decodes MessagePack strings to Python str
    unpacked_data = msgpack.unpackb(packed_data, raw=False)
    
    print("\nReconstructed Dictionary:")
    pprint(unpacked_data)
    print(f"Type of reconstructed data: {type(unpacked_data)}")

    assert data_dict == unpacked_data

except ImportError:
    print("msgpack not installed. Run 'pip install msgpack' to run this cell.")
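
The "more compact than JSON" claim can be checked directly. The sketch below encodes the same dictionary both ways and compares byte counts; the sample data mirrors the cell above, and the JSON is written without extra whitespace to keep the comparison fair. It measures size only, not speed.

```python
import json

try:
    import msgpack
except ImportError:
    msgpack = None  # third-party package; may be missing

reading = {"values": [1, 2, 3, 4, 5], "metadata": {"source": "sensor", "active": True}}

# Compact JSON (no spaces after separators), encoded to bytes
json_bytes = json.dumps(reading, separators=(",", ":")).encode("utf-8")
print(f"JSON:        {len(json_bytes)} bytes")

if msgpack is not None:
    msgpack_bytes = msgpack.packb(reading)
    print(f"MessagePack: {len(msgpack_bytes)} bytes")
else:
    msgpack_bytes = None
    print("msgpack not installed; skipping.")
```

The size of the gap depends heavily on the data, so it is worth measuring with a representative sample of your own dictionaries before choosing a format on size alone.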

### Summary

We've seen the basics of serializing Python dictionaries using common formats:

*   **JSON**: Human-readable, great for web, limited types.
*   **Pickle**: Python-specific, handles almost anything, binary, potential security risks.
*   **YAML**: Very readable, good for config, requires install.
*   **MessagePack**: Fast, compact, binary, requires install.

Choosing the right format depends on your needs: readability, compatibility, performance, and the types of data in your dictionaries.

**Next Steps:** In the next notebook, we'll dive into handling more complex scenarios like custom objects and optimizing performance.