# How to save and load data in Python
In this notebook I will cover the following:
- Save and load files using HDF5 (using h5py package)
- Save and load files using pickle
- Save and load files from MATLAB
- Save and load files using HDF5 (using Pandas package) # to be added

To continue we need to create some data, e.g., data1 and data2.

In [1]:
import numpy as np

In [2]:
data1 = np.random.random((10,10))
data2 = np.random.random((100,20))
print(data2[20:22,3:6])

[[0.10586635 0.53926726 0.88506764]
 [0.92514506 0.43486768 0.111493  ]]


# HDF5
There are three main types of items in HDF5 files:
- File
- Group
- Dataset

Their names are used as access keys.

### HDF Files: How to create an HDF file and how to read from it

In [3]:
import h5py

# How to create and save HDF files
with h5py.File('/home/meysam/github/ML_with_Keras/SaveLoad/HDF_filename.h5','w') as h: # replace HDF_filename.h5 with your desired file name
    h.create_dataset('d1',data=data1) # inside the parentheses, pass a name first and then the data
    h.create_dataset('d2',data=data2)
    
# How to read from HDF files
with h5py.File('/home/meysam/github/ML_with_Keras/SaveLoad/HDF_filename.h5','r') as h: # good practice: use a context manager so the file is always closed
    ls = list(h.keys()) # this gives the list of dataset names in the file
    print('List of datasets in this file:', ls)
    da1 = h.get('d1')
    loaded_data1 = np.array(da1)
    da2 = h.get('d2')
    loaded_data2 = np.asarray(da2)
    
# sanity check
print(loaded_data2[20:22,3:6])

List of datasets in this file: ['d1', 'd2']
[[0.10586635 0.53926726 0.88506764]
 [0.92514506 0.43486768 0.111493  ]]
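The cell above copies each dataset fully into memory with `np.array`. As a minimal sketch of an alternative, an h5py dataset can also be indexed like a NumPy array so that only the requested block is read from disk. The file name `slice_demo.h5` and the temporary directory are assumptions made for illustration, not part of the original notebook.

```python
import os
import tempfile

import h5py
import numpy as np

data = np.random.random((100, 20))

# A temporary directory stands in for the real save location (assumption).
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "slice_demo.h5")  # hypothetical file name

    with h5py.File(path, "w") as h:
        h.create_dataset("d2", data=data)

    with h5py.File(path, "r") as h:
        dset = h["d2"]            # a handle to the dataset; no data is read yet
        block = dset[20:22, 3:6]  # reads only this 2x3 block into a NumPy array
        full = dset[()]           # reads the whole dataset into a NumPy array

# The slice read from disk should match the same slice of the in-memory array.
slice_matches = np.array_equal(block, data[20:22, 3:6])
print(block.shape, slice_matches)
```

Reading a slice this way can help when the dataset is too large to load at once. Note that the data must be read while the file is still open.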


### HDF Groups: How to create HDF groups and how to read from them

In [4]:
matrix1 = np.random.random((20,20))
matrix2 = np.random.random((20,20))
matrix3 = np.random.random((20,20))
matrix4 = np.random.random((20,20))
matrix3[4:5,6:8]

array([[0.56184845, 0.75124976]])

In [5]:
# How to create groups and subgroups
with h5py.File('/home/meysam/github/ML_with_Keras/SaveLoad/HDF_groups.h5','w') as h:
    G1 = h.create_group('Group1')
    G1.create_dataset('dataset1',data=matrix1)
    G1.create_dataset('dataset4',data=matrix4)
    
    G2 = h.create_group('Group2/subGroup1')
    G2.create_dataset('dataset3',data=matrix3)

    G2_2 = h.create_group('Group2/subGroup2')
    G2_2.create_dataset('dataset2',data=matrix2)

In [6]:
# How to read groups and subgroups, and how to check their members
with h5py.File('/home/meysam/github/ML_with_Keras/SaveLoad/HDF_groups.h5','r') as h:
    ls_items = list(h.items()) # note the difference: h.items() returns (name, object) pairs, h.keys() returns only the names
    print('List of datasets in the base dictionary: \n', ls_items)
    G1 = h.get('Group1')
    G1_items = list(G1.items())
    print('Items in group 1:\n',G1_items)
    
    G2 = h.get('Group2')
    G2_items = list(G2.items())
    print('Items in group 2:\n',G2_items)
    
    # read the dataset in subgroup 1 of group 2
    G2_1 = G2.get('/Group2/subGroup1') # absolute path, as shown in the output of list(G2.items())
    G2_1_items = list(G2_1.items())
    print('Items in subgroup 1 of group 2:\n',G2_1_items)
    mat3 = np.array(G2_1.get('dataset3'))

List of datasets in the base dictionary: 
 [('Group1', <HDF5 group "/Group1" (2 members)>), ('Group2', <HDF5 group "/Group2" (2 members)>)]
Items in group 1:
 [('dataset1', <HDF5 dataset "dataset1": shape (20, 20), type "<f8">), ('dataset4', <HDF5 dataset "dataset4": shape (20, 20), type "<f8">)]
Items in group 2:
 [('subGroup1', <HDF5 group "/Group2/subGroup1" (1 members)>), ('subGroup2', <HDF5 group "/Group2/subGroup2" (1 members)>)]
Items in subgroup 1 of group 2:
 [('dataset3', <HDF5 dataset "dataset3": shape (20, 20), type "<f8">)]


In [7]:
mat3[4:5,6:8] # sanity check

array([[0.56184845, 0.75124976]])
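Because names act as access keys, a nested dataset can also be reached in one step with dictionary-style indexing and a full path, instead of chaining `get` calls. The sketch below assumes a small file `groups_demo.h5` written to a temporary directory (both hypothetical) and uses `visit` to list every name in the file.

```python
import os
import tempfile

import h5py
import numpy as np

matrix3 = np.random.random((20, 20))

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "groups_demo.h5")  # hypothetical file name

    with h5py.File(path, "w") as h:
        # Creating 'Group2/subGroup1' also creates the parent group 'Group2'.
        sub = h.create_group("Group2/subGroup1")
        sub.create_dataset("dataset3", data=matrix3)

    with h5py.File(path, "r") as h:
        node = h["Group2/subGroup1"]           # path-style key returns the group
        is_group = isinstance(node, h5py.Group)

        dset = h["Group2/subGroup1/dataset3"]  # full path straight to the dataset
        is_dataset = isinstance(dset, h5py.Dataset)
        mat3 = dset[()]                        # read the data while the file is open

        names = []
        h.visit(names.append)                  # collect the name of every group and dataset

print(names)
```

Checking `isinstance` against `h5py.Group` and `h5py.Dataset` is one way to tell the two kinds of items apart when walking an unfamiliar file.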

Note that you can compress the datasets, or give them attributes, etc. More details __[here](https://www.youtube.com/watch?v=beat9RO0-x4&index=8&list=PLea0WJq13cnB_ORdGzEkPlZEN20TSt6Lx)__.
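As a minimal sketch of the two features mentioned above, the following block writes a gzip-compressed dataset and attaches attributes to it. The attribute names (`description`, `units`), the compression level, and the file name `compressed_demo.h5` are illustrative assumptions.

```python
import os
import tempfile

import h5py
import numpy as np

data1 = np.random.random((10, 10))

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "compressed_demo.h5")  # hypothetical file name

    with h5py.File(path, "w") as h:
        # gzip compression; compression_opts sets the level (0-9)
        dset = h.create_dataset("d1", data=data1,
                                compression="gzip", compression_opts=4)
        # attributes are small pieces of metadata stored with the dataset
        dset.attrs["description"] = "random test matrix"  # hypothetical attribute
        dset.attrs["units"] = "arbitrary"                 # hypothetical attribute

    with h5py.File(path, "r") as h:
        dset = h["d1"]
        compression = dset.compression        # name of the compression filter
        description = dset.attrs["description"]
        attr_names = sorted(dset.attrs.keys())
        restored = dset[()]                   # decompression happens transparently

print(compression, attr_names)
```

Compression is lossless here, so the restored array is identical to the original. Random data like this compresses poorly, so the size benefit would show mainly on more regular data.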

# pickle
### How to save and load files with pickle

In [8]:
import pickle
# Save files with pickle
# with open('name_by_which_file_will_be_saved','wb') as pw:
#     pickle.dump([var1, var2, ...(vars to be saved)], pw)
with open('/home/meysam/github/ML_with_Keras/SaveLoad/Pickle_filename','wb') as pw:
    pickle.dump([data1, data2],pw)

    
# Load files with pickle
# with open('name_of_file_to_be_loaded','rb') as pr:
#     [var1, var2, ...(vars to be loaded)] = pickle.load(pr)
with open('/home/meysam/github/ML_with_Keras/SaveLoad/Pickle_filename','rb') as pr:
    data1_pickle, data2_pickle = pickle.load(pr)


# Sanity check
print(data2_pickle[20:22,3:6])

[[0.10586635 0.53926726 0.88506764]
 [0.92514506 0.43486768 0.111493  ]]
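Unpacking a pickled list relies on remembering the order in which the variables were saved. A possible variation, sketched below, pickles a dictionary so that each array is retrieved by name. The file name `pickle_demo.pkl` and the choice of `pickle.HIGHEST_PROTOCOL` are assumptions for illustration.

```python
import os
import pickle
import tempfile

import numpy as np

data1 = np.random.random((10, 10))
data2 = np.random.random((100, 20))

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "pickle_demo.pkl")  # hypothetical file name

    # Save: a dict keeps the variable names together with the data
    with open(path, "wb") as pw:
        pickle.dump({"data1": data1, "data2": data2}, pw,
                    protocol=pickle.HIGHEST_PROTOCOL)

    # Load: the same dict comes back, so order no longer matters
    with open(path, "rb") as pr:
        loaded = pickle.load(pr)

print(sorted(loaded.keys()))
print(loaded["data2"][20:22, 3:6].shape)
```

Keep in mind that `pickle.load` can execute arbitrary code, so it should only be used on files from a trusted source.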


# MATLAB
### How to read data from MATLAB files (.mat)

In [None]:
# FileName is the path to your .mat file

# Method 1
import hdf5storage      # This package reads .mat files in general, including MATLAB v7.3 files
desired_name_for_loaded_file = hdf5storage.loadmat(FileName)  

# Method 2
import scipy.io as sio  # This package is less general and does not work for v7.3 .mat files
desired_name_for_loaded_file = sio.loadmat(FileName)
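The cell above only covers loading. As a sketch of the saving direction, `scipy.io.savemat` writes a dictionary of arrays to a .mat file, and `loadmat` returns a dictionary keyed by variable name. The file name `matlab_demo.mat` is an assumption, and this round trip uses scipy's default format rather than v7.3.

```python
import os
import tempfile

import numpy as np
import scipy.io as sio

data1 = np.random.random((10, 10))

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "matlab_demo.mat")  # hypothetical file name

    # Save: each dict key becomes a MATLAB variable name
    sio.savemat(path, {"data1": data1})

    # Load: the result is a dict that also holds metadata entries
    contents = sio.loadmat(path)

# Metadata keys start with a double underscore, e.g. '__header__'
variable_names = [k for k in contents if not k.startswith("__")]
restored = contents["data1"]
print(variable_names, restored.shape)
```

Filtering out the double-underscore keys leaves only the variables that were actually stored in the file.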

# HDF5 with pandas
See __[here](https://www.youtube.com/watch?v=beat9RO0-x4&index=8&list=PLea0WJq13cnB_ORdGzEkPlZEN20TSt6Lx)__.