# **10. Input/Output Operations**

# 🥒 7. Pickle Files in Pandas

```python
import pandas as pd
```

## 1️⃣ What It Does and When to Use It

### ✅ What it does:

Pickle files **serialize** (save) and **deserialize** (load) Python objects, including pandas DataFrames, in a **binary format** produced by Python’s built-in `pickle` module.

Pandas provides convenient functions:

* `df.to_pickle()` – to **save** a DataFrame as a pickle file.
* `pd.read_pickle()` – to **load** a DataFrame from a pickle file.

---

### 📌 When to use:

* ✅ When you want **fast and efficient saving/loading** of pandas objects.
* ✅ When sharing data between **Python programs**.
* ✅ When preserving **DataFrame structure, data types, indexes, and metadata** exactly.
* ❌ Not recommended for **cross-language** or **long-term** storage (because of compatibility risks).

## 2️⃣ Syntax and Key Parameters

### 🔹 `df.to_pickle()`

```python
df.to_pickle(path, compression='infer', protocol=pickle.HIGHEST_PROTOCOL, storage_options=None)
```

| Parameter         | Description                                                  |
| ----------------- | ------------------------------------------------------------ |
| `path`            | File path or buffer where the pickle will be saved           |
| `compression`     | Compression method: `'gzip'`, `'bz2'`, `'zip'`, `'xz'`, etc.; the default `'infer'` chooses it from the file extension |
| `protocol`        | Pickle protocol version (default is highest available)       |
| `storage_options` | Extra options for remote storage (e.g., S3)                  |
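
The parameters above can be sketched as follows. This is an illustrative example only: the frame, the temporary path, and the choice of `protocol=4` are assumptions, picked to show that the default `compression='infer'` reacts to a `.gz` suffix and that an older protocol can be requested explicitly.

```python
import gzip
import pickle
import tempfile
from pathlib import Path

import pandas as pd

df_demo = pd.DataFrame({"Name": ["Alice", "Bob"], "Age": [25, 30]})

with tempfile.TemporaryDirectory() as tmp:
    gz_path = Path(tmp) / "people.pkl.gz"

    # compression='infer' (the default) selects gzip from the '.gz' suffix.
    # protocol=4 is an illustrative choice; protocol 5 requires Python 3.8+.
    df_demo.to_pickle(gz_path, protocol=4)

    # Check the gzip magic bytes to confirm the file is really compressed.
    is_gzip = gz_path.read_bytes()[:2] == b"\x1f\x8b"

    # Inside the gzip wrapper is an ordinary pickle stream.
    with gzip.open(gz_path, "rb") as fh:
        restored = pickle.load(fh)

print(is_gzip)
print(restored.equals(df_demo))
```

Lowering the protocol only helps with older Python interpreters; it does not guarantee that an older pandas version can rebuild the object.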

---

### 🔹 `pd.read_pickle()`

```python
pd.read_pickle(filepath_or_buffer, compression='infer', storage_options=None)
```

| Parameter            | Description                                  |
| -------------------- | -------------------------------------------- |
| `filepath_or_buffer` | Path or buffer from which to read the pickle |
| `compression`        | Compression of the file being read; the default `'infer'` detects it from the file extension |
| `storage_options`    | Optional storage args for cloud/remote reads |
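
Since the first argument may be a buffer as well as a path, a round trip can stay entirely in memory. The sketch below assumes a reasonably recent pandas release that accepts file-like objects in `to_pickle`; the variable names are made up for the example.

```python
import io

import pandas as pd

df_src = pd.DataFrame({"Name": ["Alice", "Bob"], "Age": [25, 30]})

# Write the pickle into an in-memory binary buffer instead of a file.
buffer = io.BytesIO()
df_src.to_pickle(buffer)

# Rewind so that reading starts at the beginning of the pickle stream.
buffer.seek(0)
df_from_buffer = pd.read_pickle(buffer)

print(df_from_buffer.equals(df_src))
```

With a buffer there is no file extension to inspect, so `compression='infer'` has nothing to infer from; pass an explicit `compression` value if the bytes are compressed.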


## 3️⃣ Examples of Reading/Writing

### 📤 Writing to a Pickle File

```python
df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35]
})

df
```

|     | Name    | Age |
| --- | ------- | --- |
| 0   | Alice   | 25  |
| 1   | Bob     | 30  |
| 2   | Charlie | 35  |


```python
# Save to a pickle file (the target directory must already exist)
df.to_pickle('data files\\pickle\\people.pkl')
```

### 📥 Reading from a Pickle File

```python
df_loaded = pd.read_pickle('data files\\pickle\\people.pkl')
df_loaded
```

|     | Name    | Age |
| --- | ------- | --- |
| 0   | Alice   | 25  |
| 1   | Bob     | 30  |
| 2   | Charlie | 35  |


### 🗜️ Compressed Pickle Example

```python
# Save using gzip compression
df.to_pickle('data files\\pickle\\people.pkl.gz', compression='gzip')
```

```python
# Load from the compressed pickle (gzip is inferred from the '.gz' extension)
df_loaded = pd.read_pickle('data files\\pickle\\people.pkl.gz')
df_loaded
```

|     | Name    | Age |
| --- | ------- | --- |
| 0   | Alice   | 25  |
| 1   | Bob     | 30  |
| 2   | Charlie | 35  |


## 4️⃣ Common Pitfalls

| Pitfall                     | Explanation & Fix                                                                                    |
| --------------------------- | ---------------------------------------------------------------------------------------------------- |
| ❌ **Not human-readable**    | Binary format can't be inspected manually — use only for code-based data exchange.                   |
| ❌ **Version compatibility** | Pickles created in newer Python or pandas versions may **not work** in older versions.               |
| ❌ **Security risks**        | Unpickling untrusted files is dangerous because loading can **execute malicious code**. Only load pickles from sources you trust. |
| ❌ **Not interoperable**     | Not suitable for sharing with R, Java, or web apps — use formats like CSV, JSON, or Parquet instead. |
| ❌ **Harder to debug**       | If something breaks, it’s harder to debug binary data than text-based formats.                       |
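
The security pitfall is easier to appreciate with a harmless demonstration. The sketch below uses only the standard `pickle` module, which `pd.read_pickle()` relies on; the `Payload` class is a made-up stand-in for a malicious object, and it calls the harmless built-in `sorted` where an attacker could name any importable callable.

```python
import pickle


class Payload:
    """Hypothetical object whose pickle asks the loader to call a function."""

    def __reduce__(self):
        # On load, pickle calls sorted([3, 1, 2]) and returns its result.
        # An attacker could substitute any importable callable here.
        return (sorted, ([3, 1, 2],))


malicious_bytes = pickle.dumps(Payload())

# Loading does not rebuild a Payload; it runs the embedded call instead.
result = pickle.loads(malicious_bytes)
print(type(result).__name__, result)  # list [1, 2, 3]
```

The loader never checks what the callable does, which is why a pickle file should be treated like executable code rather than plain data.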


## 5️⃣ Real-World Usage

### 🚀 Fast Data Caching

* Save **intermediate processing results** to disk for quick reuse, instead of recomputing expensive operations every time.
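
One way such a cache could look is sketched below. The helper `load_or_compute`, the `expensive_step` function, and the cache file name are all hypothetical names introduced for this illustration, not part of pandas.

```python
import tempfile
from pathlib import Path
from typing import Callable

import pandas as pd


def load_or_compute(cache_path: Path, compute: Callable[[], pd.DataFrame]) -> pd.DataFrame:
    """Return the cached DataFrame if present; otherwise compute and cache it."""
    if cache_path.exists():
        return pd.read_pickle(cache_path)
    result = compute()
    # to_pickle does not create missing directories, so create them first.
    cache_path.parent.mkdir(parents=True, exist_ok=True)
    result.to_pickle(cache_path)
    return result


calls = []  # records how many times the expensive step actually runs


def expensive_step() -> pd.DataFrame:
    calls.append(1)
    return pd.DataFrame({"x": range(5)}).assign(y=lambda d: d["x"] ** 2)


with tempfile.TemporaryDirectory() as tmp:
    cache_file = Path(tmp) / "cache" / "squares.pkl"
    first = load_or_compute(cache_file, expensive_step)   # computes and saves
    second = load_or_compute(cache_file, expensive_step)  # loads from disk

print(len(calls))
print(first.equals(second))
```

This sketch never invalidates the cache; in practice the file would need to be deleted or renamed whenever the inputs or the processing code change.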

### 🧪 Machine Learning Pipelines

* Store **preprocessed features**, **model input**, or **final datasets** for experimentation and tuning.

### 🧑‍💻 Application Development

* Save application state held in DataFrames, such as settings or session data, in a **desktop or backend application**.

### 🔄 Repeated Analysis

* Efficiently store datasets for **daily/weekly reruns** of scripts or notebooks.

## ✅ Summary Table

| Task                       | Method                                   |
| -------------------------- | ---------------------------------------- |
| Save DataFrame as Pickle   | `df.to_pickle()`                         |
| Load DataFrame from Pickle | `pd.read_pickle()`                       |
| Best for                   | Speed & exact structure retention        |
| Not suited for             | Sharing outside Python / untrusted files |
| Optional Compression       | Yes (`gzip`, `bz2`, etc.)                |

