# <font color="#418FDE" size="6.5" uppercase>**Saving and Loading**</font>

>Last update: 20251225.
    
By the end of this lecture, you will be able to:
- Save and load arrays using NumPy-specific formats such as .npy and .npz with appropriate options.
- Import and export array data in text-based formats like CSV while controlling dtypes and delimiters.
- Integrate NumPy arrays with external tools by converting to and from common formats such as Python buffers and memory-mapped files. 


## **1. NumPy File Formats**

### **1.1. Saving and Loading Arrays**

<img src="https://cdn.jsdelivr.net/gh/mhrafiei/contents@main/LFF/NumPy (2.2.6) A-Z/Module_03/Lecture_B/image_01_01.jpg?v=1766693100" width="250">



>* NumPy binary files store arrays exactly and safely
>* Avoid recomputing expensive data by reloading later

>* Binary files store array values plus metadata
>* Arrays reload exactly, preserving shape, type, precision

>* Stable, versioned format supports long-term, shared data
>* Enables reliable pipelines without formatting or rounding issues



In [None]:
#@title Python Code - Saving and Loading Arrays

# Demonstrate saving NumPy arrays using .npy format.
# Demonstrate loading NumPy arrays back from disk.
# Show that shape and values remain exactly preserved.

import numpy as np  # Import NumPy numerical library.

# Create a simple temperature array in Fahrenheit degrees.
temperatures_f = np.array([68.0, 70.5, 73.2, 75.0])

# Save the array to disk using NumPy binary format.
np.save("daily_temps.npy", temperatures_f)

# Load the array back from the saved .npy file.
loaded_temps_f = np.load("daily_temps.npy")

# Print original and loaded arrays to compare values and shapes.
print("Original temperatures:", temperatures_f, "shape:", temperatures_f.shape)

# Print loaded array information showing identical content and structure.
print("Loaded temperatures:", loaded_temps_f, "shape:", loaded_temps_f.shape)
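
The claim that arrays reload exactly can also be checked programmatically instead of by reading printed output. The following is a minimal sketch, not a definitive recipe: the array contents and the file name `sensor_counts.npy` are hypothetical, and a temporary directory is assumed so that nothing is left on disk.

```python
import tempfile
from pathlib import Path

import numpy as np

# Hypothetical sample data: a small 2-D grid of 16-bit integer sensor counts.
sensor_counts = np.arange(12, dtype=np.int16).reshape(3, 4)

with tempfile.TemporaryDirectory() as tmp_dir:
    file_path = Path(tmp_dir) / "sensor_counts.npy"

    # allow_pickle=False restricts the file to plain array data.
    np.save(file_path, sensor_counts, allow_pickle=False)
    restored = np.load(file_path, allow_pickle=False)

# Compare values, shape, and dtype explicitly rather than by eye.
same_values = np.array_equal(sensor_counts, restored)
same_shape = restored.shape == sensor_counts.shape
same_dtype = restored.dtype == sensor_counts.dtype
print("values:", same_values, "shape:", same_shape, "dtype:", same_dtype)
```

Checking the dtype separately matters because `np.array_equal` compares values only, so an int16 array and an int64 array with the same numbers would still compare equal.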




### **1.2. Comparing NPY And NPZ**

<img src="https://cdn.jsdelivr.net/gh/mhrafiei/contents@main/LFF/NumPy (2.2.6) A-Z/Module_03/Lecture_B/image_01_02.jpg?v=1766693123" width="250">



>* NPY stores one array with full metadata
>* NPZ bundles multiple named arrays into one archive

>* Use NPY when working with one main array
>* Use NPZ to keep multiple related arrays together

>* NPY is faster, simpler for single arrays
>* NPZ groups related arrays into shareable packages



In [None]:
#@title Python Code - Comparing NPY And NPZ

# Demonstrate saving single array with NPY format clearly and simply.
# Demonstrate saving multiple related arrays together using NPZ format.
# Compare how loading from NPY and NPZ works using simple printed summaries.

import numpy as np
from pathlib import Path

# Create simple arrays representing daily temperatures in Fahrenheit and humidity in percent.
temperatures_f = np.array([68.0, 70.0, 75.0, 80.0], dtype=np.float32)
humidity_percent = np.array([40, 45, 50, 55], dtype=np.int32)

# Save a single array using NPY format, ideal for one primary dataset.
np.save("daily_temperatures.npy", temperatures_f)

# Save multiple related arrays together using NPZ archive with clear names.
np.savez("weather_data.npz", temps_f=temperatures_f, humidity=humidity_percent)

# Load the single array back from the NPY file and inspect its shape.
loaded_temps = np.load("daily_temperatures.npy")
print("Loaded NPY temperatures shape and dtype:", loaded_temps.shape, loaded_temps.dtype)

# Load the NPZ archive, which behaves like a dictionary of named arrays.
weather_archive = np.load("weather_data.npz")
print("NPZ archive keys available:", list(weather_archive.keys()))

# Access individual arrays from NPZ by their names and show small summaries.
loaded_temps_npz = weather_archive["temps_f"]
loaded_humidity_npz = weather_archive["humidity"]
print("NPZ temps first value and length:", loaded_temps_npz[0], len(loaded_temps_npz))

# Show that humidity array is stored alongside temperatures inside the same NPZ file.
print("NPZ humidity first value and length:", loaded_humidity_npz[0], len(loaded_humidity_npz))

# Confirm that both files exist on disk, highlighting one file versus combined archive.
print("NPY file exists:", Path("daily_temperatures.npy").exists(), "NPZ file exists:", Path("weather_data.npz").exists())
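
The object returned by `np.load` for an NPZ file keeps the archive open until it is closed, and it reads each array only when that name is accessed. One way to handle this, sketched here with hypothetical array values and a hypothetical file name, is to use the archive as a context manager.

```python
import tempfile
from pathlib import Path

import numpy as np

# Hypothetical sample arrays for the sketch.
temps_f = np.array([68.0, 70.0, 75.0], dtype=np.float32)
humidity = np.array([40, 45, 50], dtype=np.int32)

with tempfile.TemporaryDirectory() as tmp_dir:
    archive_path = Path(tmp_dir) / "weather_demo.npz"
    np.savez(archive_path, temps_f=temps_f, humidity=humidity)

    # The archive object holds an open file handle; the with block closes it.
    with np.load(archive_path) as archive:
        stored_names = sorted(archive.files)
        # Each array is read from disk only when its name is accessed.
        humidity_loaded = archive["humidity"]

print("Stored names:", stored_names)
print("Humidity:", humidity_loaded, humidity_loaded.dtype)
```

Arrays pulled out inside the `with` block remain usable afterwards because they are ordinary in-memory arrays, not views into the archive.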



### **1.3. Compressed archives with savez**

<img src="https://cdn.jsdelivr.net/gh/mhrafiei/contents@main/LFF/NumPy (2.2.6) A-Z/Module_03/Lecture_B/image_01_03.jpg?v=1766693138" width="250">



>* Bundle many arrays into one compressed file
>* Name each array and reload them by name

>* Compression saves disk space and speeds data transfer
>* But adds CPU cost and slows frequent access

>* Archives capture full experiment or simulation state
>* Named arrays improve sharing, organization, reproducibility



In [None]:
#@title Python Code - Compressed archives with savez

# Demonstrate saving multiple arrays using numpy savez_compressed archives.
# Show tradeoff between compressed size and uncompressed size clearly.
# Load archive and access arrays by their chosen names easily.

import numpy as np
import os

# Create two example arrays representing daily temperatures in Fahrenheit.
city_a_temps = np.array([70.0, 72.5, 68.0, 75.0, 71.5])
city_b_temps = np.array([60.0, 62.0, 59.5, 63.0, 61.0])

# Save arrays separately using uncompressed numpy format for comparison.
np.save("city_a_temps.npy", city_a_temps)
np.save("city_b_temps.npy", city_b_temps)

# Save both arrays together using a single compressed archive file.
np.savez_compressed("temps_archive.npz", a_city=city_a_temps, b_city=city_b_temps)

# Compare file sizes in bytes; for arrays this small, archive overhead can outweigh any compression savings.
size_a = os.path.getsize("city_a_temps.npy")
size_b = os.path.getsize("city_b_temps.npy")
size_archive = os.path.getsize("temps_archive.npz")

print("Uncompressed separate total bytes:", size_a + size_b)
print("Compressed archive total bytes:", size_archive)

# Load the compressed archive and access arrays by their stored names.
loaded = np.load("temps_archive.npz")
print("Loaded keys from archive:", list(loaded.keys()))

print("City A temperatures from archive:", loaded["a_city"])
print("City B temperatures from archive:", loaded["b_city"])
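
Compression pays off mainly on larger arrays with repeated structure. The sketch below assumes a hypothetical dataset built by repeating one short pattern many times, and compares `np.savez` with `np.savez_compressed` on the same data; real measurements will usually compress less well than this.

```python
import os
import tempfile

import numpy as np

# Hypothetical data: one short daily pattern repeated many times.
daily_pattern = np.array([70.0, 72.5, 68.0, 75.0, 71.5])
repeated_temps = np.tile(daily_pattern, 20_000)  # 100,000 float64 values

with tempfile.TemporaryDirectory() as tmp_dir:
    plain_path = os.path.join(tmp_dir, "plain.npz")
    packed_path = os.path.join(tmp_dir, "packed.npz")

    # Same data, stored without and with compression.
    np.savez(plain_path, temps=repeated_temps)
    np.savez_compressed(packed_path, temps=repeated_temps)

    plain_size = os.path.getsize(plain_path)
    packed_size = os.path.getsize(packed_path)

    # Compression is lossless, so the values come back unchanged.
    with np.load(packed_path) as archive:
        restored = archive["temps"]

round_trip_exact = np.array_equal(restored, repeated_temps)
print("Uncompressed archive bytes:", plain_size)
print("Compressed archive bytes:", packed_size)
print("Round trip exact:", round_trip_exact)
```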



## **2. Text Data Loading**

### **2.1. Loading Text Files**

<img src="https://cdn.jsdelivr.net/gh/mhrafiei/contents@main/LFF/NumPy (2.2.6) A-Z/Module_03/Lecture_B/image_02_01.jpg?v=1766693156" width="250">



>* Convert raw text data into numeric arrays
>* Handle headers, metadata, and malformed or missing entries

>* Match file lines and fields to array rows and columns
>* Choose correct delimiter so columns map correctly

>* Expect messy text files with irregular, noisy entries
>* Configure loading to skip noise and preserve numbers



In [None]:
#@title Python Code - Loading Text Files

# Demonstrate loading simple numeric text data using NumPy loadtxt function.
# Show how lines become rows and separated values become array columns.
# Create a small CSV file, load it, and print the resulting array.

import numpy as np
from pathlib import Path

# Create example CSV text with header and three numeric data rows.
csv_text = "date,temperature_F,humidity_percent\n2024-01-01,68.0,40\n2024-01-02,70.5,42\n2024-01-03,69.0,41\n"

# Write the CSV text into a small file inside the Colab working directory.
csv_path = Path("weather_data.csv")
csv_path.write_text(csv_text, encoding="utf-8")

# Load numeric columns from the text file, skipping the header description row.
data = np.loadtxt(csv_path, delimiter=",", skiprows=1, usecols=(1, 2))

# Print the loaded array to see rows and columns created from file lines.
print("Loaded numeric data array (temperature_F, humidity_percent):")
print(data)

# Show the array shape to confirm three rows and two columns were parsed.
print("Array shape (rows, columns):", data.shape)
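
The reverse direction, exporting an array to text, can be sketched with `np.savetxt`. This is an illustrative example only: the readings are hypothetical, and an in-memory `StringIO` buffer stands in for a file so the sketch leaves nothing on disk.

```python
import io

import numpy as np

# Hypothetical readings: temperature (F) and humidity (%) for three days.
readings = np.array([[68.0, 40.0], [70.5, 42.0], [69.0, 41.0]])

# An in-memory text buffer stands in for a file on disk.
buffer = io.StringIO()
np.savetxt(
    buffer,
    readings,
    delimiter=",",
    fmt="%.1f",
    header="temperature_F,humidity_percent",
)
csv_text = buffer.getvalue()
print(csv_text)

# loadtxt skips lines starting with '#', which is how savetxt marks the header.
reloaded = np.loadtxt(io.StringIO(csv_text), delimiter=",")
round_trip_ok = np.allclose(readings, reloaded)
print("Round trip matches:", round_trip_ok)
```

Unlike the binary formats, the text round trip is only as exact as the `fmt` string allows; `"%.1f"` is enough here because every value has one decimal place.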



### **2.2. Handling Missing Text Data**

<img src="https://cdn.jsdelivr.net/gh/mhrafiei/contents@main/LFF/NumPy (2.2.6) A-Z/Module_03/Lecture_B/image_02_02.jpg?v=1766693172" width="250">



>* Missing values appear in many text data formats
>* Identify and encode them carefully to avoid distortions

>* Define all text patterns that mean missing data
>* Map them to one consistent internal representation

>* Missing values affect dtypes and array structure
>* Use flexible dtypes or masks to preserve analysis



In [None]:
#@title Python Code - Handling Missing Text Data

# Demonstrate loading text data with missing values using NumPy genfromtxt.
# Show how different missing markers become consistent NaN values in arrays.
# Compare behavior with and without specifying missing value options.

import numpy as np
from io import StringIO

# Create a small CSV style text with different missing markers.
csv_text = "day,temp_f,humidity\n1,72,40\n2,NA,42\n3,,41\n4,-9999,43\n"

# Wrap the text inside a file like StringIO object for loading.
text_file = StringIO(csv_text)

# Load without missing value handling, observe dtype and strange numeric values.
raw_data = np.genfromtxt(text_file, delimiter=",", names=True, dtype=None, encoding=None)

print("Raw loaded data array:", raw_data)

# Reset the StringIO position before loading again with better options.
text_file.seek(0)

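# Load again, mapping the NA marker and empty fields to NaN in every column.
clean_data = np.genfromtxt(
    text_file,
    delimiter=",",
    names=True,
    missing_values="NA",
    filling_values=np.nan,
)

# The -9999 sentinel parses as a valid number, so replace it explicitly.
clean_data["temp_f"][clean_data["temp_f"] == -9999] = np.nan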

print("Clean loaded data array:", clean_data)
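
Masks are an alternative to NaN for marking missing entries. As a hedged sketch, the same CSV text can be loaded with `usemask=True`, which returns a masked array whose mask records which fields matched a missing marker. Passing the markers as one comma-separated string is assumed here so that they apply to every column.

```python
from io import StringIO

import numpy as np

csv_text = "day,temp_f,humidity\n1,72,40\n2,NA,42\n3,,41\n4,-9999,43\n"

# A comma-separated string applies every marker to every column;
# empty fields are treated as missing by default.
masked = np.genfromtxt(
    StringIO(csv_text),
    delimiter=",",
    skip_header=1,
    missing_values="NA,-9999",
    usemask=True,
)

# Column 1 holds the temperatures; masked entries are ignored by mean().
temps = masked[:, 1]
print("Mask for temp_f:", temps.mask)
print("Mean of valid temps:", temps.mean())
```

With a mask, the sentinel is matched as text before any numeric conversion, so the numeric `-9999` never leaks into statistics computed on the masked array.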




### **2.3. Controlling Dtype And Delimiter**

<img src="https://cdn.jsdelivr.net/gh/mhrafiei/contents@main/LFF/NumPy (2.2.6) A-Z/Module_03/Lecture_B/image_02_03.jpg?v=1766693188" width="250">



>* Choose correct delimiter to separate file columns
>* Set dtypes explicitly to avoid distorted data

>* Match each column's dtype to its role
>* Prevents data corruption and saves memory space

>* Choose delimiters carefully to avoid mis-parsing
>* Match delimiters and dtypes for reliable arrays



In [None]:
#@title Python Code - Controlling Dtype And Delimiter

# Demonstrate loading text with custom delimiter control.
# Show how dtype choices change parsed column meanings.
# Use small CSV examples that run easily in Colab.

import numpy as np
from io import StringIO

# Create example text data with semicolon delimiter.
# Columns represent id, temperature, distance, and status.
text_data = """001;72.5;10.0;OK
002;68.0;15.5;FAIL
003;75.2;20.0;OK"""

# Wrap the string as a file like object for numpy loading.
file_like = StringIO(text_data)

# Load using wrong delimiter, everything becomes one merged string column.
wrong = np.loadtxt(file_like, delimiter=",", dtype=str)

# Reset file like object position before second loading attempt.
file_like.seek(0)

# Load using correct delimiter and mixed dtypes for each column.
correct = np.loadtxt(file_like, delimiter=";", dtype=[("id", "U3"), ("temp_f", float), ("dist_miles", float), ("status", "U4")])

# Print both results to compare delimiter and dtype effects.
print("Wrong delimiter result array:", wrong)

print("\nCorrect delimiter structured array:", correct)
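
The effect of dtype on meaning and memory can be made concrete with a small comparison. This sketch reuses the semicolon-delimited layout from above (without the status column) and is meant as an illustration under those assumptions rather than a general recipe.

```python
from io import StringIO

import numpy as np

text_data = "001;72.5;10.0\n002;68.0;15.5\n003;75.2;20.0"

# Parsed with the default float dtype, the zero-padded IDs become plain numbers.
ids_as_float = np.loadtxt(StringIO(text_data), delimiter=";", usecols=0)

# Parsed as text, the IDs keep their leading zeros.
ids_as_text = np.loadtxt(StringIO(text_data), delimiter=";", usecols=0, dtype=str)

# float32 halves the memory of the default float64 for the measurement columns.
measurements_64 = np.loadtxt(StringIO(text_data), delimiter=";", usecols=(1, 2))
measurements_32 = np.loadtxt(
    StringIO(text_data), delimiter=";", usecols=(1, 2), dtype=np.float32
)

print("IDs as float:", ids_as_float)
print("IDs as text:", ids_as_text)
print("Bytes (float64 vs float32):", measurements_64.nbytes, measurements_32.nbytes)
```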



## **3. Advanced NumPy IO**

### **3.1. Memory Mapped Arrays**

<img src="https://cdn.jsdelivr.net/gh/mhrafiei/contents@main/LFF/NumPy (2.2.6) A-Z/Module_03/Lecture_B/image_03_01.jpg?v=1766693212" width="250">



>* Treat huge on-disk files like NumPy arrays
>* Load and process only needed chunks efficiently

>* Match dtype, shape, and mode to file
>* Mapped arrays mirror file contents and disk behavior

>* Share huge datasets across tools and languages
>* Map files once, reuse arrays efficiently everywhere



In [None]:
#@title Python Code - Memory Mapped Arrays

# Demonstrate creating and using NumPy memory mapped arrays safely.
# Show reading and writing without loading entire file into memory.
# Useful for large datasets like long time series or images.

import numpy as np
import os

# Define file path inside current working directory for safety.
filename = os.path.join(os.getcwd(), "sensor_data_mmap.dat")

# Create a large array representing hourly temperatures for many years.
hours_per_year = 24 * 365
years = 10

# Use memmap to create file backed array without full memory allocation.
mm_write = np.memmap(filename, dtype="float32", mode="w+", shape=(years, hours_per_year))

# Fill the first week of the first year with a simple pattern of rising daily temperatures.
for day in range(7):
    start = day * 24
    end = start + 24

    # Simulate temperatures from 60 to 80 Fahrenheit during each example day.
    mm_write[0, start:end] = np.linspace(60.0, 80.0, 24)

# Flush changes to disk and delete reference to release resources.
mm_write.flush()
del mm_write

# Reopen same file as read only memory mapped array for analysis.
mm_read = np.memmap(filename, dtype="float32", mode="r", shape=(years, hours_per_year))

# Compute average temperature for first example week using vectorized operations.
first_week = mm_read[0, 0:7 * 24]
week_average = float(first_week.mean())

# Show that data behaves like normal NumPy array despite being file backed.
print("Memory mapped week shape:", first_week.shape)
print("Average Fahrenheit temperature:", round(week_average, 2))

# Release the mapping, then remove the file to avoid cluttering the Colab environment.
del first_week, mm_read
os.remove(filename)
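
A raw `np.memmap` file carries no metadata, so dtype and shape must be supplied by hand each time. When the data is stored as a `.npy` file instead, `np.load` with `mmap_mode="r"` can read both from the file header. The sketch below assumes a hypothetical array of hourly readings and a temporary directory.

```python
import os
import tempfile

import numpy as np

# Hypothetical dataset: 10 "years" of 8,760 hourly readings each.
hourly = np.arange(10 * 8760, dtype=np.float32).reshape(10, 8760)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "hourly.npy")
    np.save(path, hourly)

    # dtype and shape come from the .npy header, so none are passed here.
    mapped = np.load(path, mmap_mode="r")
    is_memmap = isinstance(mapped, np.memmap)
    mapped_shape = mapped.shape

    # Copy one small slice into memory; the rest of the file stays on disk.
    first_day = np.array(mapped[0, :24])

    # Drop the mapping before the temporary directory is removed.
    del mapped

print("Memory mapped:", is_memmap, "shape:", mapped_shape)
print("First three values of day one:", first_day[:3])
```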



### **3.2. Buffer Based Interactions**

<img src="https://cdn.jsdelivr.net/gh/mhrafiei/contents@main/LFF/NumPy (2.2.6) A-Z/Module_03/Lecture_B/image_03_02.jpg?v=1766693233" width="250">



>* Buffers expose NumPy array data as raw bytes
>* Other tools use these bytes directly without copying

>* Buffers connect NumPy arrays to binary-based systems
>* Enable zero-copy streaming, processing, and visualization workflows

>* Use buffers to map custom binary layouts
>* Coordinate dtypes and alignment for efficient pipelines



In [None]:
#@title Python Code - Buffer Based Interactions

# Demonstrate NumPy array sharing data using Python buffer protocol.
# Show converting an array to bytes and rebuilding an array from those bytes.
# Illustrate using memoryview to modify shared underlying buffer.

import numpy as np

# Create a simple NumPy array with float32 values representing temperatures Fahrenheit.
arr = np.array([32.0, 68.0, 77.0, 104.0], dtype=np.float32)

# Get a memoryview that exposes the raw bytes of the NumPy array.
view = memoryview(arr)

# Convert the memoryview to an immutable bytes object for sending or saving.
raw_bytes = bytes(view)

# Reconstruct a new NumPy array from the raw bytes buffer without manual copying.
arr_from_bytes = np.frombuffer(raw_bytes, dtype=np.float32)

# Create a writable byte-level memoryview on the original array to modify shared bytes.
write_view = memoryview(arr).cast("B")

# Change the first float32 value by writing raw bytes representing 212.0 Fahrenheit.
write_view[:4] = np.array([212.0], dtype=np.float32).tobytes()

# Print original and reconstructed arrays to show shared buffer behavior clearly.
print("Original array after modification:", arr)

# Print array created from bytes to show independent copy not affected by later changes.
print("Array reconstructed from bytes buffer:", arr_from_bytes)
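
Mapping a custom binary layout usually means skipping a header and viewing the rest as typed values. The following sketch assumes a hypothetical packet format (a 4-byte little-endian count followed by float32 values) and uses the `count` and `offset` parameters of `np.frombuffer` to view the payload in place.

```python
import struct

import numpy as np

# Hypothetical packet layout: a 4-byte little-endian count, then float32 values.
payload = np.array([1.5, 2.5, 3.5], dtype="<f4")
packet = struct.pack("<I", payload.size) + payload.tobytes()

# Read the count from the header, then view the payload bytes in place.
(count,) = struct.unpack_from("<I", packet, 0)
values = np.frombuffer(packet, dtype="<f4", count=count, offset=4)

# A bytes object is immutable, so the resulting array is read-only.
print("Values:", values, "writeable:", values.flags.writeable)

# A bytearray is mutable, so the array and the buffer share writable memory.
mutable = bytearray(packet)
shared = np.frombuffer(mutable, dtype="<f4", count=count, offset=4)
shared[0] = 9.0
(first_value,) = struct.unpack_from("<f", mutable, 4)
print("First payload value seen through the bytearray:", first_value)
```

Spelling out the byte order in the dtype (`"<f4"`) keeps the layout unambiguous when the bytes come from another tool or machine.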



### **3.3. Efficient Large Array Storage**

<img src="https://cdn.jsdelivr.net/gh/mhrafiei/contents@main/LFF/NumPy (2.2.6) A-Z/Module_03/Lecture_B/image_03_03.jpg?v=1766693250" width="250">



>* Balance storage size with speed and access
>* Choose formats based on workflows and tool compatibility

>* Store arrays contiguously to match hardware behavior
>* Layout enables partial reads, reducing disk and memory use

>* Use formats that support growth and subsetting
>* Organize data for cheap, long-term, tool-friendly access



In [None]:
#@title Python Code - Efficient Large Array Storage

# Demonstrate efficient large array storage using NumPy binary formats.
# Compare saving one big file versus several smaller chunked files.
# Show how chunked storage loads only needed data efficiently.

import numpy as np
import os

# Create a large synthetic array representing hourly temperatures in Fahrenheit.
num_days = 365 * 5
hours_per_day = 24
large_array = np.random.uniform(low=50.0, high=100.0, size=(num_days, hours_per_day))

# Show approximate size in megabytes for the full array in memory.
bytes_total = large_array.nbytes
mb_total = bytes_total / (1024 * 1024)
print("Full array size megabytes approximately:", round(mb_total, 2))

# Save the entire array into a single NumPy binary file on disk.
np.save("all_temps.npy", large_array)
size_single = os.path.getsize("all_temps.npy") / (1024 * 1024)
print("Single file size megabytes approximately:", round(size_single, 2))

# Save the same data into yearly chunked files for efficient partial loading.
years = 5
rows_per_year = num_days // years
for year in range(years):
    start_row = year * rows_per_year
    end_row = start_row + rows_per_year
    chunk = large_array[start_row:end_row]
    filename = f"temps_year_{year + 1}.npy"
    np.save(filename, chunk)

# Load only one year from chunked files instead of the entire dataset.
loaded_year_three = np.load("temps_year_3.npy", mmap_mode=None)
mean_year_three = loaded_year_three.mean()
print("Mean temperature year three Fahrenheit:", round(mean_year_three, 2))

# Load the full dataset again and compute the global mean temperature.
loaded_all = np.load("all_temps.npy", mmap_mode=None)
mean_all = loaded_all.mean()
print("Mean temperature all years Fahrenheit:", round(mean_all, 2))

# Show that chunked storage allows smaller targeted reads for common workflows.
print("Rows loaded from single year file:", loaded_year_three.shape[0])
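
Chunked files are one way to get partial reads; a single memory-mapped `.npy` file is another. As a sketch under stated assumptions (constant placeholder values per year, a temporary directory, float32 storage), `numpy.lib.format.open_memmap` can create the full file on disk and fill it piece by piece, and `np.load` with `mmap_mode="r"` can later pull out one year without loading the rest.

```python
import os
import tempfile

import numpy as np
from numpy.lib.format import open_memmap

rows_per_year, hours_per_day, years = 365, 24, 5

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "all_temps.npy")

    # Create the full-size .npy file on disk without building it in memory.
    out = open_memmap(
        path, mode="w+", dtype=np.float32, shape=(years * rows_per_year, hours_per_day)
    )
    for year in range(years):
        start = year * rows_per_year
        # Hypothetical data: each year is filled with a constant 60 + year.
        out[start:start + rows_per_year] = 60.0 + year
    out.flush()
    del out

    # Later, map the file and copy out only the third year.
    mapped = np.load(path, mmap_mode="r")
    year_three = np.array(mapped[2 * rows_per_year:3 * rows_per_year])
    del mapped

print("Year three shape:", year_three.shape)
print("Year three mean:", float(year_three.mean()))
```

The trade-off against per-year files is that one file is simpler to manage and slice arbitrarily, while separate files are easier to add, replace, or ship individually.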



# <font color="#418FDE" size="6.5" uppercase>**Saving and Loading**</font>


In this lecture, you learned to:
- Save and load arrays using NumPy-specific formats such as .npy and .npz with appropriate options. 
- Import and export array data in text-based formats like CSV while controlling dtypes and delimiters.
- Integrate NumPy arrays with external tools by converting to and from common formats such as Python buffers and memory-mapped files. 

<font color='yellow'>Congratulations on completing this course!</font>