# Saving and Loading Arrays

**Module 06 | Notebook 01**

---

## Objective
By the end of this notebook, you will master:
- Saving arrays to .npy and .npz files
- Loading arrays from files
- Storing multiple arrays in a single file
- Memory-mapped files for large data
- Best practices for data persistence

In [None]:
import numpy as np
import os
np.set_printoptions(precision=2)

---
## 1. np.save() and np.load() - Single Array

In [None]:
# Save a single array to .npy file
arr = np.arange(10)
print(f"Array to save: {arr}")

np.save('my_array.npy', arr)
print("Saved to my_array.npy")

In [None]:
# Load the array
loaded = np.load('my_array.npy')
print(f"Loaded array: {loaded}")
print(f"Same as original: {np.array_equal(arr, loaded)}")

In [None]:
# .npy extension added automatically if missing
np.save('test_array', arr)  # Creates test_array.npy
print(f"File exists: {os.path.exists('test_array.npy')}")

In [None]:
# What's in a .npy file?
# It's a binary format with:
# - Magic number
# - Version info
# - Header with dtype, shape, order
# - Raw data bytes

print(f"File size: {os.path.getsize('my_array.npy')} bytes")
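
The layout described above can be checked directly. This is a minimal sketch that uses the helpers in `numpy.lib.format` to read the magic string, version, and header of a freshly saved file; the file name `header_demo.npy` is only an illustration, and the sketch assumes the file was written as format version 1.0, which is what `np.save` normally produces for a small array like this one.

```python
import os
import tempfile

import numpy as np
from numpy.lib import format as npy_format

arr = np.arange(10)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "header_demo.npy")  # hypothetical file name
    np.save(path, arr)

    with open(path, "rb") as f:
        magic = f.read(6)                   # magic string identifying the format
        f.seek(0)
        version = npy_format.read_magic(f)  # (major, minor) format version
        # Assumption: version 1.0 header, the usual case for simple arrays
        shape, fortran_order, dtype = npy_format.read_array_header_1_0(f)
        data_offset = f.tell()              # raw data bytes start here

    file_size = os.path.getsize(path)

print(f"Magic: {magic!r}, version: {version}")
print(f"Header: shape={shape}, fortran_order={fortran_order}, dtype={dtype}")
print(f"Header bytes: {data_offset}, data bytes: {file_size - data_offset}")
```

Everything after the header is the array's raw buffer, so the data byte count should equal `arr.nbytes`.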

---
## 2. np.savez() - Multiple Arrays

In [None]:
# Save multiple arrays to a single .npz file
arr1 = np.arange(10)
arr2 = np.random.rand(3, 4)
arr3 = np.array(['a', 'b', 'c'])

# Named arrays
np.savez('multiple_arrays.npz', 
         integers=arr1, 
         floats=arr2, 
         strings=arr3)

print("Saved multiple arrays to .npz")

In [None]:
# Load .npz file
loaded = np.load('multiple_arrays.npz')

# It's like a dictionary
print(f"Keys: {list(loaded.keys())}")
print(f"integers: {loaded['integers']}")
print(f"floats shape: {loaded['floats'].shape}")

In [None]:
# Close the file (good practice)
loaded.close()

# Or use context manager
with np.load('multiple_arrays.npz') as data:
    print(f"Keys: {list(data.keys())}")

In [None]:
# Unnamed arrays (automatic names: arr_0, arr_1, ...)
np.savez('unnamed.npz', arr1, arr2, arr3)

with np.load('unnamed.npz') as data:
    print(f"Auto-generated keys: {list(data.keys())}")
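
Positional and keyword arguments can also be mixed in one call. The sketch below writes to an in-memory `io.BytesIO` buffer instead of a file on disk, purely to keep the example self-contained; the array names are illustrative.

```python
import io

import numpy as np

first = np.arange(3)
second = np.ones((2, 2))

buffer = io.BytesIO()  # in-memory stand-in for a .npz file on disk
np.savez(buffer, first, named=second)  # one positional, one keyword array
buffer.seek(0)

with np.load(buffer) as data:
    keys = sorted(data.files)
    # Copy the arrays out while the archive is still open
    restored = {name: data[name] for name in data.files}

print(f"Keys: {keys}")
```

The positional array receives an automatic `arr_0` key, while the keyword array keeps the name it was given.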

---
## 3. np.savez_compressed() - Compressed Archives

In [None]:
# Create a larger array for comparison
large_arr = np.random.rand(1000, 1000)

# Uncompressed
np.savez('uncompressed.npz', data=large_arr)

# Compressed
np.savez_compressed('compressed.npz', data=large_arr)

print(f"Uncompressed size: {os.path.getsize('uncompressed.npz') / 1e6:.2f} MB")
print(f"Compressed size: {os.path.getsize('compressed.npz') / 1e6:.2f} MB")

In [None]:
# Compression is far more effective on repetitive data than on random values
structured_arr = np.zeros((1000, 1000))  # Lots of repeated values

np.savez('struct_uncompressed.npz', data=structured_arr)
np.savez_compressed('struct_compressed.npz', data=structured_arr)

print(f"Zeros uncompressed: {os.path.getsize('struct_uncompressed.npz') / 1e6:.2f} MB")
print(f"Zeros compressed: {os.path.getsize('struct_compressed.npz') / 1e3:.2f} KB")
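
To see how strongly the result depends on the data, the following sketch computes the ratio of compressed to uncompressed size for three kinds of content. The helper `compression_ratio` is a hypothetical name introduced here, and in-memory buffers are used instead of files; exact ratios will vary with NumPy and zlib versions.

```python
import io

import numpy as np


def compression_ratio(arr):
    """Return compressed size divided by uncompressed size for one array."""
    raw, packed = io.BytesIO(), io.BytesIO()
    np.savez(raw, data=arr)
    np.savez_compressed(packed, data=arr)
    return len(packed.getvalue()) / len(raw.getvalue())


rng = np.random.default_rng(0)
samples = {
    "random floats": rng.random((200, 200)),
    "all zeros": np.zeros((200, 200)),
    "few distinct ints": rng.integers(0, 4, size=(200, 200)),
}

ratios = {name: compression_ratio(a) for name, a in samples.items()}
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.3f}")
```

A ratio close to 1 means compression bought almost nothing, which is the typical outcome for random floating-point values.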

---
## 4. Memory-Mapped Files

In [None]:
# For very large files, use memory mapping:
# data is read from disk on demand instead of being loaded into RAM all at once

# Create a large array and save it
large = np.arange(1000000).reshape(1000, 1000)
np.save('large_array.npy', large)

In [None]:
# Load with memory mapping
mmap = np.load('large_array.npy', mmap_mode='r')  # read-only

print(f"Shape: {mmap.shape}")
print(f"First row: {mmap[0, :5]}")

# The entire array is NOT loaded into RAM

In [None]:
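# Memory map modes:
# 'r'  - read-only
# 'r+' - read-write (changes are written back to the file)
# 'w+' - create or overwrite a file for read-write (for np.memmap; it would wipe an existing file)
# 'c'  - copy-on-write (changes stay in memory and are never written to the file)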

mmap_rw = np.load('large_array.npy', mmap_mode='r+')
# mmap_rw[0, 0] = 999  # Would modify the file
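
The difference between the modes is easiest to see by writing through each one and then reloading the file. This is a small sketch on a temporary file (the name `modes_demo.npy` is illustrative), not a recommendation for how to structure real code.

```python
import os
import tempfile

import numpy as np

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "modes_demo.npy")  # hypothetical file name
    np.save(path, np.zeros(5, dtype=np.int64))

    # 'c': the write is visible in memory but never reaches the file
    cow = np.load(path, mmap_mode="c")
    cow[0] = 111
    cow_value = int(cow[0])
    del cow
    after_cow = np.load(path)

    # 'r+': the write is persisted to the file
    rw = np.load(path, mmap_mode="r+")
    rw[0] = 222
    rw.flush()
    del rw
    after_rw = np.load(path)

    # 'r': any write attempt is rejected
    ro = np.load(path, mmap_mode="r")
    try:
        ro[0] = 333
        read_only_error = None
    except ValueError as exc:
        read_only_error = exc
    del ro

print(f"In-memory value under 'c': {cow_value}")
print(f"On disk after 'c' write:  {after_cow[0]}")
print(f"On disk after 'r+' write: {after_rw[0]}")
print(f"Write under 'r' raised:   {type(read_only_error).__name__}")
```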

In [None]:
# Create memory-mapped array directly
fp = np.memmap('temp_memmap.dat', dtype='float32', mode='w+', shape=(1000, 1000))

# Write data
fp[:] = np.random.rand(1000, 1000)

# Flush to disk
fp.flush()

print(f"Memmap shape: {fp.shape}")
del fp  # Close
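
Unlike a `.npy` file, a raw file created with `np.memmap` carries no header, so the dtype and shape must be supplied again when it is reopened. The sketch below illustrates this on a small temporary file; the file name and sizes are illustrative.

```python
import os
import tempfile

import numpy as np

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "raw_demo.dat")  # hypothetical file name

    writer = np.memmap(path, dtype="float32", mode="w+", shape=(3, 4))
    writer[:] = np.arange(12, dtype="float32").reshape(3, 4)
    writer.flush()
    del writer

    # Without a shape, the file is interpreted as a flat 1-D array
    flat = np.memmap(path, dtype="float32", mode="r")
    flat_shape = flat.shape

    # Passing the original shape restores the 2-D view
    shaped = np.memmap(path, dtype="float32", mode="r", shape=(3, 4))
    restored = np.array(shaped)  # copy into ordinary memory
    del flat, shaped

print(f"Shape without hint: {flat_shape}")
print(f"Shape with hint:    {restored.shape}")
```

Passing the wrong dtype would not raise an error either; the bytes would simply be reinterpreted, so the dtype and shape need to be recorded somewhere alongside the file.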

---
## 5. allow_pickle Parameter

In [None]:
# Object arrays require pickle
obj_arr = np.array([{'a': 1}, {'b': 2}], dtype=object)

np.save('object_array.npy', obj_arr, allow_pickle=True)
print("Saved object array")

In [None]:
# Loading object arrays requires allow_pickle=True
try:
    loaded = np.load('object_array.npy', allow_pickle=False)
except ValueError as e:
    print(f"Error without pickle: {e}")

# With pickle enabled
loaded = np.load('object_array.npy', allow_pickle=True)
print(f"Loaded: {loaded}")

In [None]:
# Security note:
# pickle can execute arbitrary code while a file is being loaded.
# np.load() defaults to allow_pickle=False; only pass allow_pickle=True
# for files that come from a trusted source.
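
When the objects are simple metadata, one way to avoid pickle altogether is to serialize them as a JSON string and store that string as an ordinary array. This is a sketch of that idea under the assumption that the metadata is JSON-compatible; the key names are illustrative.

```python
import io
import json

import numpy as np

metadata = {"a": 1, "b": 2}
values = np.arange(5)

buffer = io.BytesIO()  # in-memory stand-in for a .npz file on disk
# A 0-d string array has a plain unicode dtype, so no pickling is involved
np.savez(buffer, values=values, metadata=np.array(json.dumps(metadata)))
buffer.seek(0)

with np.load(buffer, allow_pickle=False) as data:
    restored_values = data["values"]
    restored_metadata = json.loads(data["metadata"].item())

print(f"Values: {restored_values}")
print(f"Metadata: {restored_metadata}")
```

Because the archive loads with `allow_pickle=False`, it can be opened without the code-execution risk described above.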

---
## 6. Practical Patterns

In [None]:
# Pattern: Save model weights
weights = {
    'layer1': np.random.rand(784, 256),
    'layer2': np.random.rand(256, 128),
    'layer3': np.random.rand(128, 10)
}

np.savez_compressed('model_weights.npz', **weights)
print("Model weights saved")

# Load
with np.load('model_weights.npz') as data:
    loaded_weights = {k: data[k] for k in data.keys()}
    print(f"Loaded layers: {list(loaded_weights.keys())}")

In [None]:
# Pattern: Checkpoint with metadata
checkpoint = {
    'epoch': np.array(10),
    'loss': np.array([0.5, 0.3, 0.2, 0.1]),
    'weights': np.random.rand(100, 50),
    'optimizer_state': np.random.rand(100)
}

np.savez('checkpoint.npz', **checkpoint)
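
Restoring such a checkpoint is the mirror image of saving it. The sketch below uses a smaller, illustrative checkpoint and an in-memory buffer; the point is that scalar metadata comes back as a 0-d array and can be converted to a Python value with `.item()`.

```python
import io

import numpy as np

checkpoint = {
    "epoch": np.array(10),
    "loss": np.array([0.5, 0.3, 0.2, 0.1]),
    "weights": np.ones((4, 2)),  # placeholder weights
}

buffer = io.BytesIO()  # in-memory stand-in for checkpoint.npz
np.savez(buffer, **checkpoint)
buffer.seek(0)

with np.load(buffer) as data:
    epoch = data["epoch"].item()         # 0-d array -> Python int
    last_loss = float(data["loss"][-1])  # most recent loss value
    weights = data["weights"]

print(f"Resuming from epoch {epoch}, last loss {last_loss}")
print(f"Weights shape: {weights.shape}")
```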

In [None]:
# Pattern: Incremental save to memory-mapped file
n_samples = 10000
n_features = 100

# Create empty memory-mapped file
fp = np.memmap('dataset.dat', dtype='float32', mode='w+', 
               shape=(n_samples, n_features))

# Simulate writing data in batches
batch_size = 1000
for i in range(0, n_samples, batch_size):
    stop = min(i + batch_size, n_samples)  # the last batch may be shorter
    fp[i:stop] = np.random.rand(stop - i, n_features)
    fp.flush()  # Ensure data is written

print(f"Dataset shape: {fp.shape}")
del fp
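
A possible variation on this pattern is `numpy.lib.format.open_memmap`, which creates a memory-mapped `.npy` file so that the dtype and shape are stored in the header and `np.load()` can read the result directly. The sketch below uses small illustrative sizes and a constant placeholder value per batch instead of real data.

```python
import os
import tempfile

import numpy as np
from numpy.lib.format import open_memmap

n_samples, n_features, batch_size = 2500, 8, 1000  # illustrative sizes

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "dataset_demo.npy")  # hypothetical file name

    out = open_memmap(path, mode="w+", dtype="float32",
                      shape=(n_samples, n_features))
    for start in range(0, n_samples, batch_size):
        stop = min(start + batch_size, n_samples)  # last batch may be shorter
        out[start:stop] = float(start)             # placeholder batch contents
    out.flush()
    del out

    # No dtype or shape needed: both are read from the .npy header
    loaded = np.load(path, mmap_mode="r")
    loaded_shape, loaded_dtype = loaded.shape, loaded.dtype
    last_row_value = float(loaded[-1, 0])
    del loaded

print(f"Shape: {loaded_shape}, dtype: {loaded_dtype}")
print(f"Value in last row: {last_row_value}")
```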

---
## Key Points Summary

**Save/Load Functions:**
- `np.save()`: Single array to .npy
- `np.savez()`: Multiple arrays to .npz (uncompressed)
- `np.savez_compressed()`: Compressed .npz
- `np.load()`: Load .npy or .npz

**Memory Mapping:**
- Use `mmap_mode` in `np.load()` for large files
- `np.memmap()` for direct memory-mapped arrays
- Modes: 'r' (read-only), 'r+' (read-write), 'w+' (create or overwrite), 'c' (copy-on-write)

**Best Practices:**
- Use `.npz` for related arrays
- Use `np.savez_compressed()` for large arrays with repetitive content (sparse data, many repeated values)
- Use memory mapping for arrays larger than RAM

---
## Interview Tips

**Q1: Difference between .npy and .npz?**
> - `.npy`: A single array in a binary format (header plus raw data)
> - `.npz`: Multiple arrays in a zip archive, accessed like a dictionary

**Q2: When to use savez_compressed?**
> When file size matters and the data is compressible (sparse arrays, repeated values). The trade-off is slower saving and loading, because the data must be compressed and decompressed.

**Q3: What is memory mapping and when to use it?**
> Memory mapping reads a file directly from disk on demand instead of loading it entirely into RAM. Use it for arrays larger than available memory or when you only need to access portions of the data.

**Q4: Why is allow_pickle a security concern?**
> Pickle can execute arbitrary code during deserialization. Malicious .npy files could run harmful code when loaded.

---
## Cleanup

In [None]:
# Clean up test files
# Note: this removes EVERY .npy, .npz, and .dat file in the current directory
import glob

for f in glob.glob('*.npy') + glob.glob('*.npz') + glob.glob('*.dat'):
    os.remove(f)
    print(f"Removed: {f}")

---
## Next Notebook
**02_working_with_text_files.ipynb** - Reading and writing text/CSV files with NumPy.