📝 **Author:** Amirhossein Heydari - 📧 **Email:** <amirhosseinheydari78@gmail.com> - 📍 **Origin:** [mr-pylin/numpy-workshop](https://github.com/mr-pylin/numpy-workshop)

---


**Table of contents**<a id='toc0_'></a>    
- [Dependencies](#toc1_)    
- [NumPy - Input and output](#toc2_)    
  - [NumPy binary files (npy, npz)](#toc2_1_)    
  - [Text files](#toc2_2_)    
  - [String formatting](#toc2_3_)    
  - [Memory mapping files](#toc2_4_)    
  - [Text formatting options](#toc2_5_)    

<!-- vscode-jupyter-toc-config
	numbering=false
	anchor=true
	flat=false
	minLevel=1
	maxLevel=6
	/vscode-jupyter-toc-config -->
<!-- THIS CELL WILL BE REPLACED ON TOC UPDATE. DO NOT WRITE YOUR TEXT IN THIS CELL -->

# <a id='toc1_'></a>[Dependencies](#toc0_)


In [1]:
import sys

import numpy as np

In [2]:
rng = np.random.default_rng(seed=42)

# <a id='toc2_'></a>[NumPy - Input and output](#toc0_)

📝 Doc:

- Input and output: [numpy.org/doc/stable/reference/routines.io.html](https://numpy.org/doc/stable/reference/routines.io.html)


## <a id='toc2_1_'></a>[NumPy binary files (npy, npz)](#toc0_)

- npy
  - Stores a single NumPy array in binary format
  - Efficient for loading and saving large arrays
  - Use `np.save` to save, `np.load` to load
- npz
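  - Stores multiple NumPy arrays in a single zip archive, one `.npy` entry per array
  - Use `np.savez` (uncompressed) or `np.savez_compressed` (compressed) to save, `np.load` to load
  - Only `np.savez_compressed` applies compression, trading extra CPU time for a smaller file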

<table style="margin: 0 auto;">
  <thead>
    <tr>
      <th style="text-align: center;">Function</th>
      <th style="text-align: center;">Description</th>
      <th style="text-align: center;">Details</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td><code>np.load</code></td>
      <td>Load arrays or pickled objects from <code>.npy</code>, <code>.npz</code> or pickled files</td>
      <td style="text-align: center;"><a href="https://numpy.org/doc/stable/reference/generated/numpy.load.html">link</a></td>
    </tr>
    <tr>
      <td><code>np.save</code></td>
      <td>Save an array to a binary file in NumPy <code>.npy</code> format</td>
      <td style="text-align: center;"><a href="https://numpy.org/doc/stable/reference/generated/numpy.save.html">link</a></td>
    </tr>
    <tr>
      <td><code>np.savez</code></td>
      <td>Save several arrays into a single file in uncompressed <code>.npz</code> format</td>
      <td style="text-align: center;"><a href="https://numpy.org/doc/stable/reference/generated/numpy.savez.html">link</a></td>
    </tr>
    <tr>
      <td><code>np.savez_compressed</code></td>
      <td>Save several arrays into a single file in compressed <code>.npz</code> format</td>
      <td style="text-align: center;"><a href="https://numpy.org/doc/stable/reference/generated/numpy.savez_compressed.html">link</a></td>
    </tr>
  </tbody>
</table>


In [3]:
arr_1d_1 = np.array([1, 2, 3, 4, 5])

# save
np.save("../assets/binaries/binary_1.npy", arr_1d_1)

In [4]:
arr_1d_2 = np.array([1, 2, 3, 4, 5])
arr_1d_3 = np.array([5, 4, 3, 2, 1])

# savez
np.savez("../assets/binaries/binary_2.npz", arr_1d_2, arr_1d_3)

In [5]:
arr_1d_4 = np.array([4, 5, 6, 7, 8])
arr_1d_5 = np.array([8, 7, 6, 5, 4])

# savez_compressed
np.savez_compressed("../assets/binaries/binary_3.npz", arr_1d_4, arr_1d_5)

In [6]:
# load .npy
arr_1d_6 = np.load("../assets/binaries/binary_1.npy")

# load .npz
load_1 = np.load("../assets/binaries/binary_2.npz")
arr_1d_7, arr_1d_8 = load_1["arr_0"], load_1["arr_1"]

# load .npz [compressed file]
load_2 = np.load("../assets/binaries/binary_3.npz")
arr_1d_9, arr_1d_10 = load_2["arr_0"], load_2["arr_1"]

# log
print(f"arr_1d_6 : {arr_1d_6}")
print(f"arr_1d_7 : {arr_1d_7}")
print(f"arr_1d_8 : {arr_1d_8}")
print(f"arr_1d_9 : {arr_1d_9}")
print(f"arr_1d_10: {arr_1d_10}")

arr_1d_6 : [1 2 3 4 5]
arr_1d_7 : [1 2 3 4 5]
arr_1d_8 : [5 4 3 2 1]
arr_1d_9 : [4 5 6 7 8]
arr_1d_10: [8 7 6 5 4]
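
Arrays passed positionally to `np.savez` are stored under the automatic names `arr_0`, `arr_1`, and so on, as in the cell above. The following is a minimal sketch of the keyword form, which stores each array under a name of your choice. The file name `named.npz` and the keys `first` and `second` are hypothetical, and a temporary directory is assumed so that the example does not write into `../assets`:

```python
import tempfile
from pathlib import Path

import numpy as np

first = np.array([1, 2, 3, 4, 5])
second = np.array([5, 4, 3, 2, 1])

with tempfile.TemporaryDirectory() as tmp_dir:
    npz_path = Path(tmp_dir) / "named.npz"  # hypothetical file name

    # keyword arguments become the keys inside the archive
    np.savez(npz_path, first=first, second=second)

    # the object returned for .npz files can be used as a context manager,
    # which closes the underlying file on exit
    with np.load(npz_path) as archive:
        stored_keys = sorted(archive.files)
        loaded_first = archive["first"]
        loaded_second = archive["second"]

# log
print(f"stored_keys   : {stored_keys}")
print(f"loaded_first  : {loaded_first}")
print(f"loaded_second : {loaded_second}")
```

Named keys tend to be easier to maintain than positional ones, because the loading code does not depend on the order in which the arrays were saved.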


## <a id='toc2_2_'></a>[Text files](#toc0_)

<table style="margin: 0 auto;">
  <thead>
    <tr>
      <th style="text-align: center;">Function</th>
      <th style="text-align: center;">Description</th>
      <th style="text-align: center;">Details</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td><code>np.loadtxt</code></td>
      <td>Load data from a text file</td>
      <td style="text-align: center;"><a href="https://numpy.org/doc/stable/reference/generated/numpy.loadtxt.html">link</a></td>
    </tr>
    <tr>
      <td><code>np.savetxt</code></td>
      <td>Save an array to a text file</td>
      <td style="text-align: center;"><a href="https://numpy.org/doc/stable/reference/generated/numpy.savetxt.html">link</a></td>
    </tr>
    <tr>
      <td><code>np.genfromtxt</code></td>
      <td>Load data from a text file, with missing values handled as specified</td>
      <td style="text-align: center;"><a href="https://numpy.org/doc/stable/reference/generated/numpy.genfromtxt.html">link</a></td>
    </tr>
    <tr>
      <td><code>np.fromstring</code></td>
      <td>Create a new 1-D array from text data in a string</td>
      <td style="text-align: center;"><a href="https://numpy.org/doc/stable/reference/generated/numpy.fromstring.html">link</a></td>
    </tr>
    <tr>
      <td><code>np.ndarray.tofile</code></td>
      <td>Write array to a file as text or binary (default)</td>
      <td style="text-align: center;"><a href="https://numpy.org/doc/stable/reference/generated/numpy.ndarray.tofile.html">link</a></td>
    </tr>
    <tr>
      <td><code>np.ndarray.tolist</code></td>
      <td>Return the array as an <code>a.ndim</code>-levels deep nested list of Python scalars</td>
      <td style="text-align: center;"><a href="https://numpy.org/doc/stable/reference/generated/numpy.ndarray.tolist.html">link</a></td>
    </tr>
  </tbody>
</table>


In [7]:
arr_2d_1 = np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0], [10.0, 11.0, 12.0]])

# savetxt
np.savetxt("../assets/txtfiles/file_1.csv", X=arr_2d_1, fmt="%i", delimiter=",", header="A, B, C", comments="")

# load txt file
arr_2d_2 = np.loadtxt(
    fname="../assets/txtfiles/file_1.csv",
    dtype=np.int64,
    delimiter=",",
    skiprows=1,
)

# log
print(f"arr_2d_2:\n{arr_2d_2}")

arr_2d_2:
[[ 1  2  3]
 [ 4  5  6]
 [ 7  8  9]
 [10 11 12]]
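
`np.loadtxt` can also read a subset of the columns. The sketch below assumes a small in-memory CSV (the `csv_text` content is hypothetical and stands in for a file on disk) and uses the `usecols` and `unpack` parameters to get one array per selected column:

```python
from io import StringIO

import numpy as np

# in-memory stand-in for a small CSV file (hypothetical content)
csv_text = "A,B,C\n1,2,3\n4,5,6\n7,8,9\n"

# read only columns 0 and 2, and return one array per column
col_a, col_c = np.loadtxt(
    StringIO(csv_text),
    dtype=np.int64,
    delimiter=",",
    skiprows=1,
    usecols=(0, 2),
    unpack=True,
)

# log
print(f"col_a: {col_a}")
print(f"col_c: {col_c}")
```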


In [8]:
# `None` is converted to `np.nan` when the array is created with a float dtype
arr_2d_2 = np.array([[1.0, 2.0, 3.0], [4.0, None, 6.0], [7.0, 8.0, 9.0], [10.0, 11.0, 12.0]], dtype=np.float64)

# savetxt [contains null values]
np.savetxt("../assets/txtfiles/file_2.csv", X=arr_2d_2, fmt="%.1f", delimiter=",", header="A, B, C", comments="")

# load txt file [advanced]
arr_2d_3 = np.genfromtxt(
    fname="../assets/txtfiles/file_2.csv",
    dtype=np.float64,
    delimiter=",",
    names=True,
)

# log
print(f"arr_2d_3       : {arr_2d_3}")
print(f"arr_2d_3.dtype : {arr_2d_3.dtype}")

arr_2d_3       : [( 1.,  2.,  3.) ( 4., nan,  6.) ( 7.,  8.,  9.) (10., 11., 12.)]
arr_2d_3.dtype : [('A', '<f8'), ('B', '<f8'), ('C', '<f8')]
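
With `names=True`, `np.genfromtxt` returns a structured array, so columns are selected by field name rather than by index. The sketch below is one possible way to work with such a result. It assumes a small in-memory CSV with one empty field (the `csv_text` content is hypothetical), and it uses `structured_to_unstructured` from `numpy.lib.recfunctions` to get back a plain 2-D array:

```python
from io import StringIO

import numpy as np
from numpy.lib import recfunctions as rfn

# in-memory stand-in for a CSV file with one empty field (hypothetical content)
csv_text = "A,B,C\n1.0,2.0,3.0\n4.0,,6.0\n7.0,8.0,9.0\n"

records = np.genfromtxt(StringIO(csv_text), dtype=np.float64, delimiter=",", names=True)

# a field name selects a whole column of the structured array
col_b = records["B"]

# missing float entries are read as NaN, so they can be located with np.isnan
missing_mask = np.isnan(col_b)

# convert the structured array to a plain 2-D float array
plain = rfn.structured_to_unstructured(records)

# log
print(f"col_b        : {col_b}")
print(f"missing_mask : {missing_mask}")
print(f"plain.shape  : {plain.shape}")
```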


In [9]:
str_1 = "1.0,2.0,3.0,4.0,5.0"

# fromstring
fromstring_1 = np.fromstring(str_1, sep=",")

# log
print(f"fromstring_1: {fromstring_1}")

fromstring_1: [1. 2. 3. 4. 5.]
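
The table above also lists `np.ndarray.tolist` and `np.ndarray.tofile`, which the cells do not exercise. The following is a minimal sketch of both. The file name `flat.txt` is hypothetical, a temporary directory is assumed, and the shape is restored by hand because `tofile` writes only the flattened values:

```python
import tempfile
from pathlib import Path

import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6]])

# tolist: nested Python lists holding Python scalars
as_list = arr.tolist()

with tempfile.TemporaryDirectory() as tmp_dir:
    txt_path = Path(tmp_dir) / "flat.txt"  # hypothetical file name

    # a non-empty `sep` switches tofile to text mode; the shape is not stored
    arr.tofile(txt_path, sep=",")
    written_text = txt_path.read_text()

# read the values back from the text and restore the shape by hand
restored = np.fromstring(written_text, dtype=arr.dtype, sep=",").reshape(arr.shape)

# log
print(f"as_list      : {as_list}")
print(f"written_text : {written_text}")
print(f"restored:\n{restored}")
```

Because neither the shape nor the dtype is stored, `tofile` is better suited to quick dumps than to long-term storage, where `np.save` is usually the safer choice.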


## <a id='toc2_3_'></a>[String formatting](#toc0_)

<table style="margin: 0 auto;">
  <thead>
    <tr>
      <th style="text-align: center;">Function</th>
      <th style="text-align: center;">Description</th>
      <th style="text-align: center;">Details</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td><code>np.array2string</code></td>
      <td>Return a string representation of an array</td>
      <td style="text-align: center;"><a href="https://numpy.org/doc/stable/reference/generated/numpy.array2string.html">link</a></td>
    </tr>
  </tbody>
</table>


In [10]:
arr_2d_4 = np.array([[1, 2], [3, 4], [5, 6]])

# array2string
array2string_1 = np.array2string(arr_2d_4)

# log
print(f"type(array2string_1) : {type(array2string_1)}")
print(f"array2string_1:\n{array2string_1}")

type(array2string_1) : <class 'str'>
array2string_1:
[[1 2]
 [3 4]
 [5 6]]
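
`np.array2string` accepts formatting parameters of its own, so a single array can be rendered differently without touching the global print options. The sketch below shows the `precision`, `separator` and `formatter` parameters. The array values and the percent-style formatter are made up for illustration:

```python
import numpy as np

arr = np.array([[1.23456, 2.5], [3.0, 4.98765]])

# precision limits the digits after the decimal point; separator is placed between elements
formatted = np.array2string(arr, precision=2, separator=", ")

# formatter maps a type category to a callable that renders each element
percent = np.array2string(arr, formatter={"float_kind": lambda x: f"{x:.1f}%"})

# log
print(f"formatted:\n{formatted}")
print(f"percent:\n{percent}")
```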


## <a id='toc2_4_'></a>[Memory mapping files](#toc0_)

- `np.memmap` is useful when a dataset is too large to fit into RAM
- It lets you index a file on disk like an ordinary array, without loading the whole file into memory
- Only the parts of the data you actually access are read from disk

<table style="margin: 0 auto;">
  <thead>
    <tr>
      <th style="text-align: center;">Function</th>
      <th style="text-align: center;">Description</th>
      <th style="text-align: center;">Details</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td><code>np.memmap</code></td>
      <td>Create a memory-map to an array stored in a binary file on disk</td>
      <td style="text-align: center;"><a href="https://numpy.org/doc/stable/reference/generated/numpy.memmap.html">link</a></td>
    </tr>
  </tbody>
</table>


In [11]:
# create an empty memmap array
memmap_arr_1 = np.memmap("../assets/memmaps/memmap_1.dat", dtype=np.float32, mode="w+", shape=(10, 10))

# fill the array with data
random_data_1 = rng.random((10, 10))
memmap_arr_1[:] = random_data_1

# flush changes to disk
memmap_arr_1.flush()

# open the memory-mapped file in read mode
memmap_arr_2 = np.memmap("../assets/memmaps/memmap_1.dat", dtype=np.float32, mode="r", shape=(10, 10))

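# log
# note: `sys.getsizeof` counts the data buffer only for arrays that own it, so the memmap's 400 bytes of data are not included
# note: `random_data_1` holds float64 values (800 bytes of data), while the memmap stores them as float32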
print(f"memmap_arr_1  size : {sys.getsizeof(memmap_arr_1)} bytes")
print(f"memmap_arr_2  size : {sys.getsizeof(memmap_arr_2)} bytes")
print(f"random_data_1 size : {sys.getsizeof(random_data_1)} bytes")

# when done, delete the reference to the memmap array
del memmap_arr_1
del memmap_arr_2

memmap_arr_1  size : 160 bytes
memmap_arr_2  size : 160 bytes
random_data_1 size : 928 bytes
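
The size of the mapped data itself is available through `nbytes`, and slicing a memmap reads only the requested part of the file. The following is a minimal sketch under a few assumptions: the file name `sketch.dat` is hypothetical, and a temporary directory is used instead of `../assets/memmaps`:

```python
import tempfile
from pathlib import Path

import numpy as np

rng = np.random.default_rng(seed=42)
data = rng.random((10, 10)).astype(np.float32)

with tempfile.TemporaryDirectory() as tmp_dir:
    dat_path = Path(tmp_dir) / "sketch.dat"  # hypothetical file name

    # write the data through a writable memory map, then flush it to disk
    writer = np.memmap(dat_path, dtype=np.float32, mode="w+", shape=(10, 10))
    writer[:] = data
    writer.flush()
    data_nbytes = writer.nbytes  # size of the mapped data: 10 * 10 * 4 bytes
    del writer

    # map the file read-only and copy only the rows that are needed
    reader = np.memmap(dat_path, dtype=np.float32, mode="r", shape=(10, 10))
    first_rows = np.array(reader[:2])  # an in-memory copy of two rows
    del reader  # release the mapping before the directory is removed

# log
print(f"data_nbytes      : {data_nbytes} bytes")
print(f"first_rows.shape : {first_rows.shape}")
```

Copying the slice with `np.array` detaches it from the file, so the result stays valid after the mapping is released.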


## <a id='toc2_5_'></a>[Text formatting options](#toc0_)

<table style="margin: 0 auto;">
  <thead>
    <tr>
      <th style="text-align: center;">Function</th>
      <th style="text-align: center;">Description</th>
      <th style="text-align: center;">Details</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td><code>np.set_printoptions</code></td>
      <td>Set printing options</td>
      <td style="text-align: center;"><a href="https://numpy.org/doc/stable/reference/generated/numpy.set_printoptions.html">link</a></td>
    </tr>
    <tr>
      <td><code>np.get_printoptions</code></td>
      <td>Return the current print options</td>
      <td style="text-align: center;"><a href="https://numpy.org/doc/stable/reference/generated/numpy.get_printoptions.html">link</a></td>
    </tr>
    <tr>
      <td><code>np.printoptions</code></td>
      <td>Context manager for setting print options</td>
      <td style="text-align: center;"><a href="https://numpy.org/doc/stable/reference/generated/numpy.printoptions.html">link</a></td>
    </tr>
  </tbody>
</table>


In [12]:
arr_2d_5 = np.array([[1.123456789, 2.987654321, 3.2353278765], [3.141592653, 4.567890123, 1.2461287468175]])

# log
print(f"arr_2d_5:\n{arr_2d_5}")

arr_2d_5:
[[1.12345679 2.98765432 3.23532788]
 [3.14159265 4.56789012 1.24612875]]


In [13]:
# retrieve the current print options
print_options_1 = np.get_printoptions()

# log
print(f"print_options_1:\n{print_options_1}")

print_options_1:
{'edgeitems': 3, 'threshold': 1000, 'floatmode': 'maxprec', 'precision': 8, 'suppress': False, 'linewidth': 75, 'nanstr': 'nan', 'infstr': 'inf', 'sign': '-', 'formatter': None, 'legacy': False, 'override_repr': None}


In [14]:
# use as a context manager to temporarily set print options within a specific block of code
with np.printoptions(precision=2, suppress=True, linewidth=20):
    print(f"arr_2d_5:\n{arr_2d_5}")

# log
print(f"arr_2d_5:\n{arr_2d_5}")

arr_2d_5:
[[1.12 2.99 3.24]
 [3.14 4.57 1.25]]
arr_2d_5:
[[1.12345679 2.98765432 3.23532788]
 [3.14159265 4.56789012 1.24612875]]


In [15]:
# set global print options for NumPy arrays
np.set_printoptions(precision=2, suppress=True, linewidth=20)

# log
print(f"arr_2d_5:\n{arr_2d_5}")

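# calling `np.set_printoptions()` with no arguments changes nothing: every parameter defaults to `None` (keep the current value)
# restore the defaults explicitly for the options changed above
np.set_printoptions(precision=8, suppress=False, linewidth=75)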

arr_2d_5:
[[1.12 2.99 3.24]
 [3.14 4.57 1.25]]
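
Global print options can also be restored without hard-coding any values, by taking a snapshot with `np.get_printoptions` before changing them. This is a minimal sketch of that pattern. The variable names are illustrative, and it relies on the dictionary keys matching the keyword arguments of `np.set_printoptions`:

```python
import numpy as np

arr = np.array([1.123456789, 2.987654321])

# snapshot the options in effect right now
saved_options = np.get_printoptions()

# change the global options and render the array
np.set_printoptions(precision=2, suppress=True)
short_text = str(arr)

# restore the snapshot; the dictionary keys match the keyword arguments
np.set_printoptions(**saved_options)
restored_options = np.get_printoptions()

# log
print(f"short_text       : {short_text}")
print(f"options restored : {restored_options == saved_options}")
```

For short-lived changes the `np.printoptions` context manager shown earlier is usually simpler, since it restores the previous options automatically.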
