# Python Pickle Tutorial: Serializing Python Objects

This notebook covers:
1. **What is Pickle?** - Understanding serialization
2. **Basic Usage** - Saving and loading objects
3. **Real-World Use Cases** - Practical examples
4. **Best Practices & Security** - Safe usage guidelines

## 📦 What is Pickle?

**Pickle** is Python's built-in module for **serialization** - converting Python objects into a byte stream that can be:
- Saved to a file
- Sent over a network
- Stored in a database

Later, you can **deserialize** (unpickle) the byte stream back into the original Python object.

In [1]:
import pickle
import os

print("✅ Pickle module imported!")
print(f"📊 Highest supported pickle protocol: {pickle.HIGHEST_PROTOCOL}")

✅ Pickle module imported!
📊 Highest supported pickle protocol: 5

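The protocol number affects both the size of the output and which Python versions can read it. The following is a minimal sketch, assuming only the standard library: `pickle.dumps()` accepts a `protocol` argument, and `pickle.DEFAULT_PROTOCOL` is what you get when you omit it. The `sample` dictionary is made up for illustration.

```python
import pickle

# Hypothetical sample data, used only for illustration
sample = {"values": list(range(100)), "label": "demo"}

print(f"Default protocol: {pickle.DEFAULT_PROTOCOL}")
print(f"Highest protocol: {pickle.HIGHEST_PROTOCOL}")

# Serialize the same object with every available protocol and compare sizes
sizes = {}
for protocol in range(pickle.HIGHEST_PROTOCOL + 1):
    payload = pickle.dumps(sample, protocol=protocol)
    sizes[protocol] = len(payload)
    # Every protocol must round-trip to an equal object
    assert pickle.loads(payload) == sample

for protocol, size in sizes.items():
    print(f"Protocol {protocol}: {size} bytes")
```

Older protocols are readable by older Python versions, while newer ones are usually more compact. The default differs between Python releases, so pass `protocol` explicitly when a file must be read elsewhere.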

---
## 🔰 Use Case 1: Saving Simple Python Objects

Save and load basic Python types: lists, dictionaries, tuples, etc.

In [15]:
# Create a directory for our pickle files
PICKLE_DIR = "pickle_examples"
os.makedirs(PICKLE_DIR, exist_ok=True)

# Example: Save a dictionary
user_data = {
    "name": "Sujith",
    "age": 30,
    "skills": ["Python", "ML", "Data Science"],
    "scores": {"math": 95, "science": 88}
}

# Save to pickle file
pickle_path = os.path.join(PICKLE_DIR, "user_data.pkl")
with open(pickle_path, "wb") as f:  # 'wb' = write binary
    pickle.dump(user_data, f)

print(f"✅ Saved dictionary to: {os.path.abspath(pickle_path)}")
print(f"📊 File size: {os.path.getsize(pickle_path)} bytes")

✅ Saved dictionary to: /home/sujith/github/rag/airflow_mlflow_kubeflow/00_mlflow/mlflow_usecases/phase2/notebook/pickle_examples/user_data.pkl
📊 File size: 116 bytes


In [3]:
# Load from pickle file
with open(pickle_path, "rb") as f:  # 'rb' = read binary
    loaded_data = pickle.load(f)

print("✅ Loaded data:")
print(f"   Name: {loaded_data['name']}")
print(f"   Skills: {loaded_data['skills']}")
print(f"   Original == Loaded: {user_data == loaded_data}")

✅ Loaded data:
   Name: Sujith
   Skills: ['Python', 'ML', 'Data Science']
   Original == Loaded: True


---
## 🤖 Use Case 2: Saving Machine Learning Models

Saving trained ML models is one of the most common uses of pickle. A saved model should be loaded with the same library versions that were used to train it.

In [4]:
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Train a simple model
iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42
)

model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

accuracy = model.score(X_test, y_test)
print(f"✅ Model trained! Accuracy: {accuracy:.4f}")

✅ Model trained! Accuracy: 1.0000


In [5]:
# Save the trained model
model_path = os.path.join(PICKLE_DIR, "trained_model.pkl")
with open(model_path, "wb") as f:
    pickle.dump(model, f)

print(f"✅ Model saved to: {os.path.abspath(model_path)}")
print(f"📊 File size: {os.path.getsize(model_path) / 1024:.2f} KB")

✅ Model saved to: /home/sujith/github/rag/airflow_mlflow_kubeflow/00_mlflow/mlflow_usecases/phase2/notebook/pickle_examples/trained_model.pkl
📊 File size: 173.08 KB


In [6]:
# Load the model and use it
with open(model_path, "rb") as f:
    loaded_model = pickle.load(f)

# Make predictions with loaded model
loaded_accuracy = loaded_model.score(X_test, y_test)
print(f"✅ Loaded model accuracy: {loaded_accuracy:.4f}")
print(f"📊 Same as original: {accuracy == loaded_accuracy}")

✅ Loaded model accuracy: 1.0000
📊 Same as original: True

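A pickled model may fail to load, or behave differently, under library versions other than the ones that produced it. One way to catch this early is to write version metadata as a first pickle in the file and check it before loading the model itself. This is only a sketch: `save_with_metadata` and `load_with_metadata` are hypothetical helpers, the version strings are placeholders, and `model_stub` is a plain dictionary standing in for a trained model so the example runs without scikit-learn.

```python
import os
import pickle
import platform
import tempfile

def save_with_metadata(obj, path, library_versions):
    """Hypothetical helper: write version metadata, then the object itself."""
    metadata = {
        "python_version": platform.python_version(),
        "library_versions": library_versions,
    }
    with open(path, "wb") as f:
        pickle.dump(metadata, f)  # First pickle: plain metadata
        pickle.dump(obj, f)       # Second pickle: the object

def load_with_metadata(path, current_versions):
    """Hypothetical helper: check the metadata before loading the object."""
    with open(path, "rb") as f:
        metadata = pickle.load(f)  # Only load files you created yourself
        saved = metadata["library_versions"]
        mismatches = {
            name: (saved.get(name), version)
            for name, version in current_versions.items()
            if saved.get(name) != version
        }
        if mismatches:
            raise RuntimeError(f"Version mismatch (saved, current): {mismatches}")
        return pickle.load(f)

# Stand-in for a trained model so the sketch runs without scikit-learn
model_stub = {"kind": "RandomForestClassifier", "n_estimators": 100}

with tempfile.TemporaryDirectory() as tmp_dir:
    bundle_path = os.path.join(tmp_dir, "model_bundle.pkl")
    save_with_metadata(model_stub, bundle_path, {"example_lib": "1.0.0"})

    # Matching versions: the object is loaded
    restored = load_with_metadata(bundle_path, {"example_lib": "1.0.0"})

    # Different versions: loading is refused before the object is touched
    try:
        load_with_metadata(bundle_path, {"example_lib": "1.1.0"})
        error_message = None
    except RuntimeError as exc:
        error_message = str(exc)

print(f"Restored: {restored}")
print(f"Error: {error_message}")
```

In a real project the version strings would come from the installed libraries, for example `sklearn.__version__`.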

---
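## 🔧 Use Case 3: Saving Custom Classes

Pickle can serialize instances of custom Python classes. It stores the instance's attributes plus a reference to the class by name, not the class code or its methods, so the class definition must be importable when you unpickle.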

In [7]:
class MLExperiment:
    """A custom class to store ML experiment results"""
    
    def __init__(self, name, accuracy, params):
        self.name = name
        self.accuracy = accuracy
        self.params = params
        self.history = []
    
    def add_result(self, epoch, loss):
        self.history.append({"epoch": epoch, "loss": loss})
    
    def summary(self):
        return f"Experiment: {self.name}, Accuracy: {self.accuracy:.4f}"

# Create an experiment
exp = MLExperiment(
    name="RandomForest_v1",
    accuracy=0.95,
    params={"n_estimators": 100, "max_depth": 10}
)
exp.add_result(1, 0.5)
exp.add_result(2, 0.3)
exp.add_result(3, 0.1)

print(exp.summary())
print(f"History: {exp.history}")

Experiment: RandomForest_v1, Accuracy: 0.9500
History: [{'epoch': 1, 'loss': 0.5}, {'epoch': 2, 'loss': 0.3}, {'epoch': 3, 'loss': 0.1}]


In [8]:
# Save the custom object
exp_path = os.path.join(PICKLE_DIR, "experiment.pkl")
with open(exp_path, "wb") as f:
    pickle.dump(exp, f)

# Load it back
with open(exp_path, "rb") as f:
    loaded_exp = pickle.load(f)

print(f"✅ Loaded: {loaded_exp.summary()}")
print(f"📊 History preserved: {loaded_exp.history}")
print(f"📊 Params preserved: {loaded_exp.params}")

✅ Loaded: Experiment: RandomForest_v1, Accuracy: 0.9500
📊 History preserved: [{'epoch': 1, 'loss': 0.5}, {'epoch': 2, 'loss': 0.3}, {'epoch': 3, 'loss': 0.1}]
📊 Params preserved: {'n_estimators': 100, 'max_depth': 10}

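By default pickle saves an instance's `__dict__`, which fails when an attribute cannot be pickled (locks, open files, sockets). A class can control what is saved through `__getstate__` and `__setstate__`. The sketch below assumes a hypothetical `TrackedExperiment` class that holds a `threading.Lock`; the lock is dropped when saving and recreated when loading.

```python
import pickle
import threading

class TrackedExperiment:
    """Hypothetical class holding an attribute that cannot be pickled."""

    def __init__(self, name):
        self.name = name
        self.history = []
        self._lock = threading.Lock()  # Lock objects are not picklable

    def add_result(self, epoch, loss):
        with self._lock:
            self.history.append({"epoch": epoch, "loss": loss})

    def __getstate__(self):
        # Copy the instance dict and drop the unpicklable lock
        state = self.__dict__.copy()
        del state["_lock"]
        return state

    def __setstate__(self, state):
        # Restore the saved attributes and create a fresh lock
        self.__dict__.update(state)
        self._lock = threading.Lock()

tracked = TrackedExperiment("RandomForest_v2")
tracked.add_result(1, 0.5)

restored = pickle.loads(pickle.dumps(tracked))
restored.add_result(2, 0.3)  # The recreated lock works as usual

print(restored.name)
print(restored.history)
```

Without the two methods, `pickle.dumps(tracked)` would raise a `TypeError` because of the lock.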

---
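## 📦 Use Case 4: Saving Multiple Objects

You can save multiple objects to a single pickle file by calling `pickle.dump()` once per object. Each `pickle.load()` call then returns the next object, in the order they were written.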

In [9]:
# Create multiple objects
config = {"learning_rate": 0.001, "epochs": 100}
results = [0.8, 0.85, 0.9, 0.92, 0.95]
metadata = ("experiment_1", "2026-01-10")

# Save all objects together
multi_path = os.path.join(PICKLE_DIR, "multiple_objects.pkl")
with open(multi_path, "wb") as f:
    pickle.dump(config, f)
    pickle.dump(results, f)
    pickle.dump(metadata, f)

print("✅ Saved 3 objects to single file")

✅ Saved 3 objects to single file


In [10]:
# Load multiple objects (in same order)
with open(multi_path, "rb") as f:
    loaded_config = pickle.load(f)
    loaded_results = pickle.load(f)
    loaded_metadata = pickle.load(f)

print(f"Config: {loaded_config}")
print(f"Results: {loaded_results}")
print(f"Metadata: {loaded_metadata}")

Config: {'learning_rate': 0.001, 'epochs': 100}
Results: [0.8, 0.85, 0.9, 0.92, 0.95]
Metadata: ('experiment_1', '2026-01-10')
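
When the number of objects in a file is not known in advance, you can keep calling `pickle.load()` until it raises `EOFError`. The sketch below uses a hypothetical `load_all` generator and an in-memory `io.BytesIO` buffer in place of a real file.

```python
import io
import pickle

def load_all(file_obj):
    """Yield every pickled object in the stream, in the order it was written."""
    while True:
        try:
            yield pickle.load(file_obj)
        except EOFError:
            # pickle.load() raises EOFError once no data is left
            return

# An in-memory buffer stands in for a file opened in binary mode
buffer = io.BytesIO()
items = [{"learning_rate": 0.001}, [0.8, 0.85, 0.9], ("experiment_1", "2026-01-10")]
for item in items:
    pickle.dump(item, buffer)

buffer.seek(0)  # Rewind before reading
loaded_items = list(load_all(buffer))
print(f"Loaded {len(loaded_items)} objects: {loaded_items}")
```

A simpler alternative is to put the objects in one list or dictionary and pickle that once.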


---
## 🔄 Use Case 5: Using pickle.dumps() and pickle.loads()

These functions work with **bytes** instead of files, which is useful for sending objects over a network or storing them in a database.

In [11]:
# Convert object to bytes (no file needed)
data = {"message": "Hello, Pickle!", "values": [1, 2, 3]}

# Serialize to bytes
pickled_bytes = pickle.dumps(data)
print(f"📊 Type: {type(pickled_bytes)}")
print(f"📊 Size: {len(pickled_bytes)} bytes")
print(f"📊 First 50 bytes: {pickled_bytes[:50]}")

📊 Type: <class 'bytes'>
📊 Size: 62 bytes
📊 First 50 bytes: b'\x80\x04\x953\x00\x00\x00\x00\x00\x00\x00}\x94(\x8c\x07message\x94\x8c\x0eHello, Pickle!\x94\x8c\x06values\x94'


In [12]:
# Deserialize from bytes
unpickled_data = pickle.loads(pickled_bytes)
print(f"✅ Recovered: {unpickled_data}")

✅ Recovered: {'message': 'Hello, Pickle!', 'values': [1, 2, 3]}

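When pickled bytes travel over a stream, the receiver needs to know where each message ends. A common approach is to prefix every payload with its length. The following is a sketch under assumptions: `write_message` and `read_message` are hypothetical helpers, and an `io.BytesIO` buffer stands in for a network connection.

```python
import io
import pickle
import struct

HEADER = struct.Struct("!I")  # 4-byte unsigned length, network byte order

def write_message(stream, obj):
    """Hypothetical helper: write a length header followed by the pickle."""
    payload = pickle.dumps(obj)
    stream.write(HEADER.pack(len(payload)))
    stream.write(payload)

def read_message(stream):
    """Hypothetical helper: read one length-prefixed pickle from the stream."""
    header = stream.read(HEADER.size)
    if len(header) < HEADER.size:
        raise EOFError("stream ended before a full header was read")
    (length,) = HEADER.unpack(header)
    payload = stream.read(length)
    if len(payload) < length:
        raise EOFError("stream ended before the full payload was read")
    # Only unpickle payloads that come from a peer you trust
    return pickle.loads(payload)

# An in-memory buffer stands in for a network connection
channel = io.BytesIO()
write_message(channel, {"message": "Hello, Pickle!"})
write_message(channel, [1, 2, 3])

channel.seek(0)  # Rewind so the "receiver" starts at the beginning
first = read_message(channel)
second = read_message(channel)
print(first, second)
```

A real socket may return fewer bytes than requested, so a production reader would loop until `length` bytes have arrived.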

---
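## ⚠️ Security Warning

**NEVER unpickle data from untrusted sources!**

Unpickling can execute arbitrary code: a crafted pickle can make `pickle.load()` or `pickle.loads()` call any importable function with attacker-chosen arguments.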
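
The risk can be demonstrated harmlessly. In the sketch below, the hypothetical `PrintOnLoad` class produces a pickle that calls `print()` when loaded; a real attacker could substitute any other callable. The `RestrictedUnpickler` follows the allow-list pattern from the Python documentation, overriding `find_class()` to reject every global not named in `SAFE_BUILTINS` (the contents of that set are an assumption about what your data needs).

```python
import builtins
import io
import pickle

# Assumption: only these builtins are needed by the data we expect to load
SAFE_BUILTINS = {"range", "complex", "set", "frozenset", "slice"}

class RestrictedUnpickler(pickle.Unpickler):
    """Unpickler that refuses every global outside a small allow-list."""

    def find_class(self, module, name):
        if module == "builtins" and name in SAFE_BUILTINS:
            return getattr(builtins, name)
        raise pickle.UnpicklingError(f"global '{module}.{name}' is forbidden")

def restricted_loads(data):
    """Hypothetical helper mirroring pickle.loads() with the allow-list applied."""
    return RestrictedUnpickler(io.BytesIO(data)).load()

class PrintOnLoad:
    """Hypothetical object whose pickle asks the loader to call print()."""

    def __reduce__(self):
        return (print, ("This line ran during unpickling!",))

payload = pickle.dumps(PrintOnLoad())

# Plain pickle.loads() calls print() as a side effect of loading
pickle.loads(payload)

# The restricted loader rejects the same payload before print() is called
try:
    restricted_loads(payload)
    blocked_reason = None
except pickle.UnpicklingError as exc:
    blocked_reason = str(exc)
print(f"Blocked: {blocked_reason}")

# Ordinary containers and allow-listed builtins still load
safe_result = restricted_loads(pickle.dumps({"values": [1, 2, range(3)]}))
print(f"Loaded safely: {safe_result}")
```

Restricting globals narrows the attack surface but is not a guarantee of safety; avoiding untrusted pickles altogether remains the safer choice.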

In [13]:
# Best practices:
print("""⚠️ PICKLE SECURITY BEST PRACTICES:

1. ❌ Never load pickle files from untrusted sources
2. ✅ Only unpickle data you created yourself
3. ✅ For sharing data, use JSON or other safe formats
4. ✅ Use pickle only for local caching/storage
5. ✅ Consider using joblib for large NumPy arrays
""")

⚠️ PICKLE SECURITY BEST PRACTICES:

1. ❌ Never load pickle files from untrusted sources
2. ✅ Only unpickle data you created yourself
3. ✅ For sharing data, use JSON or other safe formats
4. ✅ Use pickle only for local caching/storage
5. ✅ Consider using joblib for large NumPy arrays


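For plain data such as dictionaries, lists, strings, and numbers, JSON is a safe alternative because loading it never executes code. This minimal sketch uses the standard `json` module; the `record` dictionary mirrors the shape of the earlier user data and is only illustrative.

```python
import json

# Illustrative record containing only JSON-compatible types
record = {
    "name": "Sujith",
    "age": 30,
    "skills": ["Python", "ML", "Data Science"],
    "scores": {"math": 95, "science": 88},
}

text = json.dumps(record)            # Serialize to a str, not bytes
restored_record = json.loads(text)   # Parsing JSON cannot run code
print(restored_record == record)

# JSON has no tuple type, so tuples come back as lists
pair = json.loads(json.dumps(("experiment_1", "2026-01-10")))
print(type(pair).__name__, pair)
```

The trade-off is that JSON handles only basic types, so custom classes and trained models need their own conversion step or a different format.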

---
## 🧹 Cleanup

In [16]:
import shutil

# Remove the pickle examples directory
shutil.rmtree(PICKLE_DIR, ignore_errors=True)
print(f"✅ Cleaned up {PICKLE_DIR}/")

✅ Cleaned up pickle_examples/


---
## 📝 Summary

| Function | Description |
|----------|-------------|
| `pickle.dump(obj, file)` | Save object to file |
| `pickle.load(file)` | Load object from file |
| `pickle.dumps(obj)` | Convert object to bytes |
| `pickle.loads(bytes)` | Convert bytes to object |

### Common Use Cases:
- 🤖 Saving trained ML models
- 💾 Caching expensive computations
- 📊 Storing experiment configurations
- 🔄 Session state persistence
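
As an illustration of the caching use case, here is a minimal sketch of a file-based cache. `cached_call` is a hypothetical helper, `slow_sum_of_squares` stands in for an expensive computation, and a temporary directory keeps the example self-cleaning.

```python
import os
import pickle
import tempfile

def cached_call(cache_path, compute):
    """Hypothetical helper: return the cached result, computing it on a miss."""
    if os.path.exists(cache_path):
        with open(cache_path, "rb") as f:
            return pickle.load(f)  # The cache file was written by this program
    result = compute()
    with open(cache_path, "wb") as f:
        pickle.dump(result, f)
    return result

calls = []  # Records how many times the computation actually ran

def slow_sum_of_squares():
    """Stand-in for an expensive computation."""
    calls.append(1)
    return sum(i * i for i in range(1000))

with tempfile.TemporaryDirectory() as tmp_dir:
    cache_file = os.path.join(tmp_dir, "squares.pkl")
    first = cached_call(cache_file, slow_sum_of_squares)   # Computes and saves
    second = cached_call(cache_file, slow_sum_of_squares)  # Reads from the file

print(f"Results equal: {first == second}")
print(f"Computation ran {len(calls)} time(s)")
```

This sketch never invalidates the cache, so a real version would need to delete or rename the file whenever the inputs change.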