## Saving data with Python native libraries

We can save current data to a file for later retrieval. We can also share the data with other people.

In [1]:
import numpy as np

### Save data to a normal ASCII file

In [2]:
data = list(range(10))  # define data as a list of 10 values
print(data)

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]


Save the data to a CSV file named data.csv in the current directory.

In [3]:
np.savetxt("data.csv", data, delimiter=",")

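Read back the data we just wrote. Notice that the data type was not preserved: the list of Python integers comes back as a Numpy array of floats. This happens because `np.savetxt` writes floating-point text by default (`fmt="%.18e"`) and `np.loadtxt` parses values as floats by default.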

In [4]:
read_data = np.loadtxt("data.csv", delimiter=",")
print(read_data)
print(type(read_data))

[0. 1. 2. 3. 4. 5. 6. 7. 8. 9.]
<class 'numpy.ndarray'>
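If you need the integers back, one option (a sketch, not the only approach) is to write them with an integer format string and ask `np.loadtxt` for an integer dtype. `fmt` and `dtype` are standard parameters of `np.savetxt` and `np.loadtxt`; the file name `data_int.csv` is just an example.

```python
from pathlib import Path

import numpy as np

data = list(range(10))

# fmt="%d" writes each value as a plain integer instead of the default "%.18e"
np.savetxt("data_int.csv", data, fmt="%d", delimiter=",")

# dtype=int tells loadtxt to parse the values as integers rather than floats
read_data = np.loadtxt("data_int.csv", delimiter=",", dtype=int)
print(read_data)                   # [0 1 2 3 4 5 6 7 8 9]
print(read_data.dtype)             # the platform's default integer type, e.g. int64
print(read_data.tolist() == data)  # True

Path("data_int.csv").unlink()  # remove the example file
```

The container is still a Numpy array rather than a list; `.tolist()` converts it back if you need a plain Python list.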


### Save multiple arrays to a normal ASCII file

In [5]:
data_1 = np.arange(10)
data_2 = np.arange(10) + 100

To save several arrays to one CSV file, pass them together as a list. The arrays must all be the same length, because each one is written as a separate row.

In [6]:
np.savetxt("data.csv", [data_1, data_2], delimiter=",")

In [7]:
read_data = np.loadtxt("data.csv", delimiter=",")
print(read_data)

[[  0.   1.   2.   3.   4.   5.   6.   7.   8.   9.]
 [100. 101. 102. 103. 104. 105. 106. 107. 108. 109.]]
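Writing each array as a row is one layout; a column per variable is often easier to read in a spreadsheet. A minimal sketch, assuming the arrays have equal length: `np.column_stack` puts them side by side, the `header` parameter labels the columns (it is written as a `#` comment line, which `np.loadtxt` skips by default), and `unpack=True` returns one array per column. The file name `columns.csv` is hypothetical.

```python
from pathlib import Path

import numpy as np

data_1 = np.arange(10)
data_2 = np.arange(10) + 100

# Stack the arrays as columns: shape (10, 2) instead of (2, 10)
table = np.column_stack([data_1, data_2])

# header is written as "# data_1,data_2"; fmt="%d" keeps the values as integers in the text
np.savetxt("columns.csv", table, fmt="%d", delimiter=",", header="data_1,data_2")

# unpack=True transposes the result so each column comes back as its own array
col_1, col_2 = np.loadtxt("columns.csv", delimiter=",", dtype=int, unpack=True)
print(col_1)  # [0 1 2 3 4 5 6 7 8 9]
print(col_2)  # [100 101 102 103 104 105 106 107 108 109]

Path("columns.csv").unlink()  # remove the example file
```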


### Save to a Numpy binary file for reduced space and faster read/write speeds

Notice that the data type is preserved. `np.save` stores only one array per `.npy` file.

In [8]:
data = np.arange(10)
print('data.dtype: ', data.dtype)
np.save("data.npy", data)
read_data = np.load("data.npy")
print(read_data)
print('read_data.dtype:', read_data.dtype)

data.dtype:  int64
[0 1 2 3 4 5 6 7 8 9]
read_data.dtype: int64
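To see the space savings the heading mentions, we can write the same array both ways and compare file sizes. This is an illustrative sketch: the exact byte counts depend on the array, its dtype, and the text format (here `np.savetxt`'s default `%.18e`). The file names are hypothetical.

```python
import os
from pathlib import Path

import numpy as np

data = np.arange(1000)

np.savetxt("size_test.csv", data, delimiter=",")  # text: one "%.18e" value per line
np.save("size_test.npy", data)                     # binary: short header + raw bytes

text_size = os.path.getsize("size_test.csv")
binary_size = os.path.getsize("size_test.npy")
print("text bytes:     ", text_size)
print("binary bytes:   ", binary_size)
print("raw array bytes:", data.nbytes)  # the .npy file is this plus a short header

for name in ("size_test.csv", "size_test.npy"):
    Path(name).unlink()  # remove the example files
```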


### Saving multiple variables to a single file
We can save multiple Numpy arrays to a single `.npz` file with `np.savez`.

In [9]:
data_1 = np.arange(10)
data_2 = np.arange(20, dtype=float)
data_3 = np.zeros(5, dtype=bool)
filename = "data.npz"

# Arrays passed as keyword arguments are stored under the keyword name.
# Arrays passed positionally (data_1 here) get default names: arr_0, arr_1, ...
np.savez(filename, data_1, data_2=data_2, data_3=data_3)  # save to Numpy binary file

Now read the data and print the variable names and values. Notice the `.files` attribute, which lists the names of the arrays stored in the file. The name makes sense once you know that an `.npz` file is a ZIP archive in which each variable is written to its own `.npy` file, named after the variable. Because `data_1` was passed positionally, it was stored under the default name `arr_0`.

In [10]:
read_data = np.load(filename)

list_names = read_data.files
print("list_names:", list_names)
print("read_data['data_2']:", read_data['data_2'])

list_names: ['data_2', 'data_3', 'arr_0']
read_data['data_2']: [ 0.  1.  2.  3.  4.  5.  6.  7.  8.  9. 10. 11. 12. 13. 14. 15. 16. 17.
 18. 19.]
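Two details of the `.npz` format can be checked directly. Because an `.npz` file is a ZIP archive, Python's standard `zipfile` module can list the `.npy` member files inside it. And passing arrays as keyword arguments (here by `**`-unpacking a dictionary) gives every array its own name instead of `arr_0`. A sketch; the file name `named.npz` and the dictionary `arrays` are hypothetical.

```python
import zipfile
from pathlib import Path

import numpy as np

arrays = {
    "data_1": np.arange(10),
    "data_2": np.arange(20, dtype=float),
    "data_3": np.zeros(5, dtype=bool),
}

# **arrays passes each entry as a keyword argument, so every array keeps its key as its name
np.savez("named.npz", **arrays)

# An .npz file is a ZIP archive with one .npy file per array
with zipfile.ZipFile("named.npz") as archive:
    members = sorted(archive.namelist())
print(members)  # ['data_1.npy', 'data_2.npy', 'data_3.npy']

# The object returned by np.load works as a context manager and closes the file when done
with np.load("named.npz") as npz:
    loaded = {name: npz[name] for name in npz.files}

print(sorted(loaded))  # ['data_1', 'data_2', 'data_3']
Path("named.npz").unlink()  # remove the example file
```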


### We can compress the data to save space on the disk

In [11]:
data_1 = np.arange(10000)
data_2 = np.arange(20000, dtype=float)
data_3 = np.zeros(5000, dtype=bool)
filename = "data_compressed.npz"

np.savez_compressed(filename, data_1=data_1, data_2=data_2, data_3=data_3)

Read the data from the file. Notice that we did not have to tell `np.load` the data were compressed; it detects this automatically.

In [12]:
read_data = np.load(filename)
print('read_data.files:', read_data.files)
for var_name in read_data.files:
    print(var_name, ":", read_data[var_name])

read_data.files: ['data_1', 'data_2', 'data_3']
data_1 : [   0    1    2 ... 9997 9998 9999]
data_2 : [0.0000e+00 1.0000e+00 2.0000e+00 ... 1.9997e+04 1.9998e+04 1.9999e+04]
data_3 : [False False False ... False False False]
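To check that compression actually saves space, we can save the same arrays with `np.savez` and `np.savez_compressed` and compare file sizes. The savings depend heavily on the data: long runs of identical values such as `np.zeros` compress very well, while random floats may barely shrink. The file names are hypothetical.

```python
import os
from pathlib import Path

import numpy as np

arrays = {
    "data_1": np.arange(10000),
    "data_2": np.arange(20000, dtype=float),
    "data_3": np.zeros(5000, dtype=bool),
}

np.savez("plain.npz", **arrays)              # ZIP archive, members stored uncompressed
np.savez_compressed("packed.npz", **arrays)  # ZIP archive, members deflate-compressed

plain_size = os.path.getsize("plain.npz")
packed_size = os.path.getsize("packed.npz")
print("uncompressed bytes:", plain_size)
print("compressed bytes:  ", packed_size)

# np.load reads both files the same way; the ZIP format records how each member was stored
with np.load("packed.npz") as npz:
    round_trip_ok = all(bool((npz[k] == v).all()) for k, v in arrays.items())
print("round trip ok:", round_trip_ok)

for name in ("plain.npz", "packed.npz"):
    Path(name).unlink()  # remove the example files
```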


## Saving data not in Numpy format
What if we want to save objects that are not Numpy arrays? Then we can use Python's built-in `pickle` module. It serializes an object to a binary file and, when the file is loaded, returns the same kind of Python object.

NOTE:
Pickle files are not secure. Never unpickle a file from an untrusted or unverified source: a maliciously crafted pickle can run arbitrary Python code when you load it.
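To see why the warning matters, here is a harmless illustration. A class can define `__reduce__` to tell pickle how to rebuild it, and pickle calls whatever function `__reduce__` names when the data are loaded. Here that function is just `print`, but a malicious file could name something far more dangerous. The class name `Sneaky` is hypothetical.

```python
import pickle


class Sneaky:
    """Hypothetical class whose pickled form calls a function on load."""

    def __reduce__(self):
        # Tell pickle: "to rebuild this object, call print(...) with these arguments"
        return (print, ("This ran inside pickle.loads()!",))


payload = pickle.dumps(Sneaky())  # bytes that anyone could write to a .pkl file

# Loading does not return a Sneaky instance; it calls print() and returns print's result
result = pickle.loads(payload)
print("result:", result)  # result: None
```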

In [13]:
import pickle

Create a dictionary with some information.

In [14]:
favorite_color = {"lion": "yellow", "kitty": "red"}

Dump the dictionary to a binary file.

In [15]:
filename = "save.pkl"
# The with block closes (and flushes) the file as soon as the dump finishes
with open(filename, "wb") as f:
    pickle.dump(favorite_color, f)

del favorite_color  # Remove the dictionary just to prove it is read back in.

Read the dictionary back into memory.

In [16]:
with open(filename, "rb") as f:
    favorite_color = pickle.load(f)
print("favorite_color:", favorite_color)

favorite_color: {'lion': 'yellow', 'kitty': 'red'}
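`pickle.dump` and `pickle.load` work with files; the related functions `pickle.dumps` and `pickle.loads` do the same round trip with a `bytes` object in memory, which is handy for checking what would be written. The optional `protocol` argument selects the pickle format version; `pickle.HIGHEST_PROTOCOL` is the newest one your Python supports, which older Python versions may not be able to read.

```python
import pickle

favorite_color = {"lion": "yellow", "kitty": "red"}

# Serialize to bytes in memory using the newest protocol this Python supports
payload = pickle.dumps(favorite_color, protocol=pickle.HIGHEST_PROTOCOL)
print(type(payload), len(payload), "bytes")

# Rebuild an equal (but separate) dictionary from those bytes
restored = pickle.loads(payload)
print(restored)                    # {'lion': 'yellow', 'kitty': 'red'}
print(restored == favorite_color)  # True
print(restored is favorite_color)  # False
```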


### We can also pickle multiple variables and load them back into multiple variables
Create several objects of different types.

In [17]:
favorite_color = {"lion": "yellow", "kitty": "red"}
favorite_season = 'winter'
favorite_numbers = [4, 42, np.pi]
favorite_series = np.array([2, 3, 5, 7, 11, 13, 17, 19, 23, 29], dtype=np.int8)

Dump the different objects to a single file. Each call to `pickle.dump()` takes one object, so we pack our objects into a single tuple. That tuple can hold as many objects as we want.

In [18]:
filename = "save.pkl"
with open(filename, "wb") as f:
    single_tuple = (favorite_color, favorite_season, favorite_numbers, favorite_series)
    pickle.dump(single_tuple, f)

# Remove the variables just to prove they are read back in.
del favorite_color, favorite_season, favorite_numbers, favorite_series

Read the tuple back into memory and unpack it into the original variable names.

In [19]:
with open(filename, "rb") as f:
    favorite_color, favorite_season, favorite_numbers, favorite_series = pickle.load(f)
print("favorite_color:", favorite_color)
print("favorite_season:", favorite_season)
print("favorite_numbers:", favorite_numbers)
print("favorite_series:", favorite_series, favorite_series.dtype)

favorite_color: {'lion': 'yellow', 'kitty': 'red'}
favorite_season: winter
favorite_numbers: [4, 42, 3.141592653589793]
favorite_series: [ 2  3  5  7 11 13 17 19 23 29] int8
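Packing everything into one tuple is one approach. Another, which the `pickle` module also supports, is to call `pickle.dump` several times on the same open file and then call `pickle.load` the same number of times, in the same order. A sketch; the file name `several.pkl` is hypothetical.

```python
import pickle
from pathlib import Path

import numpy as np

favorite_season = "winter"
favorite_series = np.array([2, 3, 5, 7, 11, 13, 17, 19, 23, 29], dtype=np.int8)

# Each dump() appends one pickled object to the file
with open("several.pkl", "wb") as f:
    pickle.dump(favorite_season, f)
    pickle.dump(favorite_series, f)

# Each load() reads the next object, so the order must match the order of the dumps
with open("several.pkl", "rb") as f:
    season = pickle.load(f)
    series = pickle.load(f)

print(season)                # winter
print(series, series.dtype)  # [ 2  3  5  7 11 13 17 19 23 29] int8
Path("several.pkl").unlink()  # remove the example file
```

The tuple approach is simpler when you always want everything at once; repeated dumps let you read objects one at a time without unpickling the rest.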


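We created several files on disk. Run this code to clean up. It deletes only the files named below, so other files in the current directory are left alone.

In [20]:
from pathlib import Path

# missing_ok=True (Python 3.8+) skips any file that has already been removed
for name in ["data.csv", "data.npy", "data.npz", "data_compressed.npz", "save.pkl"]:
    Path(name).unlink(missing_ok=True)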
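If you would rather not clean up by hand, one alternative is to write scratch files into a temporary directory from Python's standard `tempfile` module; the directory and everything in it is deleted when the `with` block ends. A sketch; the file name inside the directory is arbitrary.

```python
import tempfile
from pathlib import Path

import numpy as np

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "data.npy"
    np.save(path, np.arange(10))
    read_data = np.load(path)  # a .npy load reads the whole array into memory
    print(path.exists())       # True while inside the with block

print(path.exists())  # False: the directory and its files were removed
print(read_data)      # [0 1 2 3 4 5 6 7 8 9]
```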