# Working with Pickled Data

Pickled (`.pkl`) data in Python refers to a serialized format used to store Python objects in a binary file. 

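The standard-library `pickle` module lets you save complex data structures (lists, dictionaries, tuples, instances of your own classes, and more) and later restore them with the same types and values. Functions and classes themselves are stored by reference to their importable name rather than by value, so their definitions must be importable wherever you load the file.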
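
A minimal round trip shows the idea: `pickle.dumps` turns an object into bytes and `pickle.loads` rebuilds an equal object from them, with tuples and sets coming back as the same types. The `record` dictionary is made-up example data.

```python
import pickle

# Made-up example data mixing several built-in Python types
record = {
    "name": "Alice",
    "scores": (91, 85, 78),      # a tuple stays a tuple (JSON would turn it into a list)
    "tags": {"admin", "beta"},   # sets have no JSON equivalent at all
    "history": [{"year": 2023, "active": True}],
}

# Serialize to bytes, then rebuild an equivalent object from those bytes
payload = pickle.dumps(record)
restored = pickle.loads(payload)

print(type(payload).__name__)             # bytes
print(restored == record)                 # True: same values
print(restored is record)                 # False: a new object, not the original
print(type(restored["scores"]).__name__)  # tuple
```

`pickle.dump` and `pickle.load` do the same thing against an open binary file instead of an in-memory bytes object.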

Unlike formats such as JSON or CSV, pickle preserves Python-specific types and object hierarchies. That makes it a common choice for saving machine learning models, session state, and other Python-native data, so if you work in data science for long enough, you're bound to come across this format.

**Please note, Pickle files are not human-readable and should never be loaded from untrusted sources, as they can execute arbitrary code during deserialization.**
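
If you ever must load a pickle whose origin you can't fully vouch for, one mitigation is the "Restricting Globals" approach from the Python `pickle` documentation: subclass `pickle.Unpickler` and override `find_class`, which pickle calls whenever the file asks it to look up a class or function. This is a sketch, not a sandbox, and the rule above still stands. It assumes your file holds only plain containers and scalars (dicts, lists, tuples, sets, strings, numbers, booleans, `None`), which load without `find_class`; the names `RestrictedUnpickler`, `restricted_loads`, and `Sneaky` are made up for illustration.

```python
import io
import pickle


class RestrictedUnpickler(pickle.Unpickler):
    """Unpickler that refuses to look up ANY global (class, function, etc.).

    Plain containers and scalars are rebuilt from pickle opcodes without
    calling find_class, so they still load. Anything needing a lookup fails.
    """

    def find_class(self, module, name):
        raise pickle.UnpicklingError(f"global '{module}.{name}' is forbidden")


def restricted_loads(data: bytes):
    """Stand-in for pickle.loads() that uses the restricted unpickler."""
    return RestrictedUnpickler(io.BytesIO(data)).load()


class Sneaky:
    """Harmless stand-in for a malicious object: unpickling it calls print()."""

    def __reduce__(self):
        # On load, a normal pickle.loads() would call print("this ran ...")
        return (print, ("this ran during unpickling!",))


plain = pickle.dumps({"rows": [(1, "a"), (2, "b")], "ok": True})
sneaky = pickle.dumps(Sneaky())

print(restricted_loads(plain))  # plain data loads normally

try:
    restricted_loads(sneaky)
except pickle.UnpicklingError as exc:
    print("blocked:", exc)  # blocked: global 'builtins.print' is forbidden
```

Calling plain `pickle.loads(sneaky)` instead would run `print` during loading, which is exactly the mechanism a malicious file would use with something far less friendly than `print`.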

I've included some `.pkl` datasets in the `./data` folder and will be using them for this tutorial, but I'm just a guy on the internet, so I'd wholeheartedly encourage you to generate some synthetic `.pkl` data of your own to work with for this workbook. It's just a good habit to be in.
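
Here's one way to generate your own practice files: a short script that builds a few random rows and writes them out in the same three shapes this tutorial uses. The output folder `synthetic_data`, the `synthetic_*.pkl` file names, and the name/age/city fields are all hypothetical choices; change them freely.

```python
import pickle
import random
from pathlib import Path

# Hypothetical output folder; point it wherever you keep your practice data
out_dir = Path("synthetic_data")
out_dir.mkdir(parents=True, exist_ok=True)

random.seed(42)  # fixed seed so the "random" data is reproducible
names = ["Alice", "Bob", "Charlie", "Dana"]
cities = ["London", "Paris", "Berlin", "Madrid"]

# Build rows as tuples first, then derive the other two shapes from them
rows = tuple(
    (random.choice(names), random.randint(18, 80), random.choice(cities))
    for _ in range(5)
)
columns = ("name", "age", "city")

structures = {
    "tuple_of_tuples": rows,
    "list_of_dicts": [dict(zip(columns, row)) for row in rows],
    "dict_of_lists": {col: [row[i] for row in rows] for i, col in enumerate(columns)},
}

for label, obj in structures.items():
    path = out_dir / f"synthetic_{label}.pkl"
    with open(path, "wb") as f:  # "wb": pickle writes bytes, not text
        pickle.dump(obj, f)
    print("wrote", path)
```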

```python
# Our data folder has 4 .pkl files, representing some of the common types of data stored in this format:
#     data_dict_of_lists.pkl
#     data_list_of_dicts.pkl
#     data_tuple_of_tuples.pkl
#     data_structures.pkl

# We'll cover data_structures.pkl at the end of the tutorial, as the only difference there is that you'll be importing multiple types of data.
# This will most likely be a rarer occurrence in your world, but it's still something you may come across and ought to be prepared to unpack.

import pickle
import pandas as pd


# Uncomment the type of file that you'd like to work with
file_path = "../data/data_list_of_dicts.pkl"
# file_path = "../data/data_tuple_of_tuples.pkl"
# file_path = "../data/data_dict_of_lists.pkl"

# Open in binary mode ("rb"): pickle files contain bytes, not text
with open(file_path, "rb") as f:
    data = pickle.load(f)

df = pd.DataFrame(data)

# Print the first few rows of the DataFrame
print("Resultant DataFrame created from Pickle file:")
print(df.head())
```

```
Resultant DataFrame created from Pickle file:
               search_query        time platform
0     vacation spots recipe  1722760240   mobile
1  what is invest in crypto  1734252942   mobile
2           AI tools recipe  1728010297  desktop
3      best camera symptoms  1730697978   mobile
4   best meditation near me  1735175397   mobile
```
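
If you switch `file_path` to `data_tuple_of_tuples.pkl`, the loading code still runs, but plain tuples carry no field names, so `pd.DataFrame` labels the columns `0, 1, 2`. The sketch below uses made-up records rather than the real file, and the column names passed to `columns=` are assumptions about what each position means.

```python
import pandas as pd

# Made-up records shaped like a tuple of tuples (the real file's contents may differ)
records = (
    ("Alice", 30, "London"),
    ("Bob", 25, "Paris"),
    ("Charlie", 35, "Berlin"),
)

# Without names, pandas falls back to positional integer column labels
print(pd.DataFrame(records).columns.tolist())  # [0, 1, 2]

# Supply the names yourself; these are assumed, not stored in the data
df = pd.DataFrame(records, columns=["name", "age", "city"])
print(df)
```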


## A word about each of these data structures

- `data_dict_of_lists.pkl`

A dictionary where each key is a column name, and each value is a list of column values.

```python
{
  "name": ["Alice", "Bob", "Charlie"],
  "age": [30, 25, 35],
  "city": ["London", "Paris", "Berlin"]
}
```

Think of it like a spreadsheet stored by column.
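
Because the data is already organized by column, `pd.DataFrame` can use it directly: keys become column names and each list fills its column in row order. A small sketch with the example data above, plus a deliberately ragged dictionary (made up for illustration) to show that every list must be the same length:

```python
import pandas as pd

dict_of_lists = {
    "name": ["Alice", "Bob", "Charlie"],
    "age": [30, 25, 35],
    "city": ["London", "Paris", "Berlin"],
}

# Each key becomes a column; each list supplies that column's values in order
df = pd.DataFrame(dict_of_lists)
print(df.columns.tolist())  # ['name', 'age', 'city']

# Lists of different lengths can't line up into rows, so pandas raises ValueError
ragged = {"name": ["Alice", "Bob"], "age": [30]}
try:
    pd.DataFrame(ragged)
except ValueError as exc:
    print("ValueError:", exc)
```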

-----

- `data_list_of_dicts.pkl`
  
A list where each item is a dictionary representing a row.

```python
[
  {"name": "Alice", "age": 30, "city": "London"},
  {"name": "Bob", "age": 25, "city": "Paris"},
  {"name": "Charlie", "age": 35, "city": "Berlin"}
]
```

This is a row-by-row format that maps naturally onto JSON records, API responses, and MongoDB documents.
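
One practical difference from the dict-of-lists shape: the rows don't have to agree on their keys. When converting, pandas uses every key it sees as a column (in first-seen order) and fills the gaps with `NaN`. The rows below, including the email address, are made up to show this.

```python
import pandas as pd

list_of_dicts = [
    {"name": "Alice", "age": 30, "city": "London"},
    {"name": "Bob", "age": 25},  # no "city" key
    {"name": "Charlie", "age": 35, "city": "Berlin", "email": "c@example.com"},
]

# Columns are the union of all keys; missing values become NaN
df = pd.DataFrame(list_of_dicts)
print(df.columns.tolist())  # ['name', 'age', 'city', 'email']

# Count the gaps pandas filled in, per column
print(df.isna().sum())
```

It's worth checking for these gaps right after loading, since a single irregular row can quietly add a mostly empty column.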

-----

- `data_tuple_of_tuples.pkl`

A tuple of immutable records — each one is a flat tuple of values.

```python
(
  ("Alice", 30, "London"),
  ("Bob", 25, "Paris"),
  ("Charlie", 35, "Berlin")
)
```

This is a lightweight, fixed-shape format that’s efficient for iteration or indexing.
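
-----

- `data_structures.pkl`

This file bundles several of the shapes above into one pickle. Its exact layout isn't shown in this tutorial, so the sketch below assumes a hypothetical one: a single pickled dictionary whose values are a dict of lists, a list of dicts, and a tuple of tuples. The idea is to inspect what you loaded before converting it, because each shape needs slightly different handling. The `bundle` contents and the `["name", "age"]` column names are made up.

```python
import pickle

import pandas as pd

# Hypothetical stand-in for a file like data_structures.pkl: ONE pickled dict
# bundling several differently shaped structures. The real file may differ.
bundle = {
    "dict_of_lists": {"name": ["Alice", "Bob"], "age": [30, 25]},
    "list_of_dicts": [{"name": "Charlie", "age": 35}],
    "tuple_of_tuples": (("Dana", 41), ("Eve", 29)),
}
payload = pickle.dumps(bundle)  # stands in for reading the .pkl file from disk

loaded = pickle.loads(payload)

frames = {}
for key, obj in loaded.items():
    # Check what each entry actually is before converting it
    print(key, "->", type(obj).__name__)
    if isinstance(obj, tuple):
        # Plain tuples carry no field names, so supply them (assumed here)
        frames[key] = pd.DataFrame(obj, columns=["name", "age"])
    else:
        # Dict-of-lists and list-of-dicts both carry their own column names
        frames[key] = pd.DataFrame(obj)

for key, frame in frames.items():
    print(key, frame.shape)
```

If a file was instead written with several separate `pickle.dump` calls, a single `pickle.load` returns only the first object; calling `pickle.load` repeatedly on the same open file until it raises `EOFError` reads the rest.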



## Conclusion

Once we've unpickled our data and loaded it into a DataFrame, we're ready to use it just like any other `pandas` DataFrame!

You're ready to move on to [Basic Data Interrogation](../basic-data-interrogation/notebook.ipynb).