In [12]:
import pickle
import os
import numpy as np

os.getcwd()


## Serialization and Deserialization of Python objects.

References: 

1. (Main) https://www.datacamp.com/tutorial/pickle-python-tutorial

2. https://machinelearningmastery.com/a-gentle-introduction-to-serialization-for-python/

In Python, we work with high-level data structures such as lists, tuples, and sets. However, when we want to save these objects to a file or send them over a network, they must first be converted into a sequence of bytes. This process is called **serialization (pickling)**.

The next time we want to use the same data structure, this sequence of bytes must be converted back into the original high-level object, a process known as **deserialization (unpickling)**.

- `pickle` and `h5py` are two common Python libraries for serializing data objects, such as dictionaries and TensorFlow models, for storage and transmission.
- Here we focus on the `pickle` library.

The pickle module is part of the Python standard library and implements methods to serialize and deserialize Python objects.
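As a minimal sketch of that round trip, the snippet below pickles a few of the built-in structures mentioned above (a list, a tuple, and a set, bundled in a dictionary with illustrative names) into a `bytes` object in memory and then restores them.

```python
import pickle

# A few high-level Python objects: a list, a tuple and a set bundled in a dict
data = {"scores": [90, 85, 77], "point": (3, 4), "tags": {"a", "b"}}

# Serialization (pickling): Python object -> bytes
payload = pickle.dumps(data)
print(type(payload))              # <class 'bytes'>

# Deserialization (unpickling): bytes -> equivalent Python object
restored = pickle.loads(payload)
print(restored == data)           # True
print(type(restored["point"]))    # <class 'tuple'>
print(type(restored["tags"]))     # <class 'set'>
```

The restored object is a new, equal copy: the container types (tuple, set) survive the trip, which is exactly what a plain text dump loses.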

In [2]:
# understanding importance of serialization
# define a nested dictionary
students = {
    'Student 1': {
        'Name': 'Alice', 'Age': 10, 'Grade': 4
    },

    'Student 2': {
        'Name': 'Bob', 'Age': 11, 'Grade': 5
    },

    'Student 3': {
        'Name': 'Elena', 'Age': 14, 'Grade': 8
    }
}

# dictionary type object
type(students)

dict

In [3]:
with open('student_info.txt', 'w') as data:
    data.write(str(students))  # write the dictionary's string representation as text

Notice that because we can only write strings to a text file, we converted the dictionary to a string with the `str()` function. The file therefore stores only the dictionary's text representation, and reading it back gives us a string rather than the original dictionary.

In [None]:
with open("student_info.txt", 'r') as f:
    print(f.read())

In [4]:
# iterate over the file's lines (it has only one); a new name keeps the `students` dict from being overwritten
with open("student_info.txt", 'r') as f:
    for line in f:
        print(line)


type(line)

{'Student 1': {'Name': 'Alice', 'Age': 10, 'Grade': 4}, 'Student 2': {'Name': 'Bob', 'Age': 11, 'Grade': 5}, 'Student 3': {'Name': 'Elena', 'Age': 14, 'Grade': 8}}


str

The nested dictionary is now read back as a plain string, so trying to access its keys or values (for example, indexing it with `'Student 1'`) raises a `TypeError`. This is where serialization comes in. When dealing with more complex data types such as dictionaries, DataFrames, and nested lists, serialization preserves the object’s original state without losing any relevant information.
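A quick way to see the difference: the sketch below re-creates the `students` dictionary from above so it runs on its own, shows that the string version cannot be indexed by key, and shows that a pickled-and-restored copy still behaves as a nested dictionary.

```python
import pickle

students = {
    'Student 1': {'Name': 'Alice', 'Age': 10, 'Grade': 4},
    'Student 2': {'Name': 'Bob', 'Age': 11, 'Grade': 5},
    'Student 3': {'Name': 'Elena', 'Age': 14, 'Grade': 8},
}

# What str() + a text file gives back: a plain string
as_text = str(students)
try:
    as_text['Student 1']              # strings accept only integer indices or slices
except TypeError as err:
    print("TypeError:", err)

# What pickle gives back: a real nested dict
as_pickle = pickle.loads(pickle.dumps(students))
print(as_pickle['Student 1']['Name'])  # Alice
```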

Python’s `pickle` module is a popular way to serialize and deserialize data. Its format is specific to Python, so pickled objects generally cannot be loaded by programs written in other languages; keep this in mind if the data has to be shared outside Python.

## Serialization: Python object to byte object
The `pickle.dump()` and `pickle.dumps()` functions serialize an object. The only difference between them is that `dump()` writes the data to a file, while `dumps()` returns it as a `bytes` object.
- `pickle.dump(obj, file, protocol=None, *, fix_imports=True, buffer_callback=None)`
- `pickle.dumps(obj, protocol=None, *, fix_imports=True, buffer_callback=None)`
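The difference between the two can be sketched with an in-memory `io.BytesIO` buffer standing in for a real file (an illustrative choice; a file opened in `"wb"` mode behaves the same way). `dumps()` hands the bytes back to us, while `dump()` writes them through the file object's `write()` method. The `protocol` argument is left at its default.

```python
import io
import pickle

student_names = ["Alice", "Bob", "Elena", "Jane", "Kyle"]

# dumps(): serialize to a bytes object we can keep, send, or inspect
as_bytes = pickle.dumps(student_names)
print(type(as_bytes), len(as_bytes))

# dump(): serialize directly into a binary file-like object
buffer = io.BytesIO()
pickle.dump(student_names, buffer)

# Either form deserializes back to the same list
print(pickle.loads(as_bytes) == pickle.loads(buffer.getvalue()) == student_names)  # True
```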

In [8]:
# Example 1: create simple list
student_names = ["Alice", "Bob", "Elena", "Jane", "Kyle"]

with open("student_file.pkl", mode="wb") as f: # open file to write in binary format
    pickle.dump(obj=student_names, file=f)     # "dump" or serialize the list into the file student_file.pkl

# We open the file in "wb" (write binary) mode because pickle produces bytes, not text

## Deserialization: byte object to Python object
Similarly, `pickle.load()` reads pickled objects from a file, whereas `pickle.loads()` deserializes them from a bytes-like object.
- `pickle.load(file, *, fix_imports=True, encoding='ASCII', errors='strict', buffers=None)`
- `pickle.loads(data, /, *, fix_imports=True, encoding="ASCII", errors="strict", buffers=None)`
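A minimal sketch of both functions, again using `io.BytesIO` as a stand-in for a file opened with `"rb"`; the objects are illustrative. It also shows a useful property of `pickle.load()`: each call reads exactly one pickled object, so objects dumped one after another into the same file are read back by calling `load()` once per object, in the same order.

```python
import io
import pickle

# loads(): deserialize directly from a bytes object
grades = pickle.loads(pickle.dumps({"Alice": 4, "Bob": 5}))
print(grades)          # {'Alice': 4, 'Bob': 5}

# load(): deserialize from a binary file-like object, one object per call
buffer = io.BytesIO()
pickle.dump(["Alice", "Bob"], buffer)   # first object
pickle.dump((10, 11), buffer)           # second object, appended after the first
buffer.seek(0)                          # rewind, as a freshly opened "rb" file would be

first = pickle.load(buffer)
second = pickle.load(buffer)
print(first, second)   # ['Alice', 'Bob'] (10, 11)
```

Calling `pickle.load()` once more on the exhausted buffer raises `EOFError`, which is a common way to detect the end of such a file.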

In [11]:
# notice that the list type of the object is retained after deserialization

with open("student_file.pkl", "rb") as f:
    student_names_loaded = pickle.load(f)  # deserialize the byte type object
    print(student_names_loaded); print(type(student_names_loaded))

['Alice', 'Bob', 'Elena', 'Jane', 'Kyle']
<class 'list'>


### Example 2: using pandas DataFrame
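A minimal sketch, assuming `pandas` is installed and reusing the student records from above (the file name `student_df.pkl` is illustrative): a DataFrame pickles just like the list in Example 1, and its values, column dtypes, and index come back intact. pandas also provides the convenience methods `DataFrame.to_pickle()` and `pd.read_pickle()`, which do the same job.

```python
import pickle
import pandas as pd

# Build a small DataFrame from the same student records used above
df = pd.DataFrame(
    {"Name": ["Alice", "Bob", "Elena"], "Age": [10, 11, 14], "Grade": [4, 5, 8]},
    index=["Student 1", "Student 2", "Student 3"],
)

# Serialize with the pickle module, exactly as in Example 1
with open("student_df.pkl", mode="wb") as f:
    pickle.dump(df, f)

# Deserialize and check that it is still a DataFrame with the same contents
with open("student_df.pkl", mode="rb") as f:
    df_loaded = pickle.load(f)

print(type(df_loaded))       # <class 'pandas.core.frame.DataFrame'>
print(df_loaded.equals(df))  # True

# pandas' own wrappers around pickle
df.to_pickle("student_df.pkl")
print(pd.read_pickle("student_df.pkl").equals(df))  # True
```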

## NumPy: convert an array to bytes and load it back as an array
* Here we convert a NumPy array into a `bytes` object and then load it back as an array.
* The array's `tobytes()` method serializes the array into a `bytes` object, and `np.frombuffer()` deserializes the bytes back into an array.

Ref: https://www.askpython.com/python-modules/gzip-module-in-python

In [18]:
a = np.array([1, 2, 3, 4, 5, 6])
print("Original array:")
print(a); print(type(a))

Original array:
[1 2 3 4 5 6]
<class 'numpy.ndarray'>


In [27]:
# Convert the array 'a' to raw bytes with a.tobytes() and store them in 'a_bytes'
a_bytes = a.tobytes()
a.dtype  # default integer dtype depends on platform and NumPy version (int32 in this run)

dtype('int32')
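`tobytes()` returns only the raw element data: the number of bytes equals `a.size * a.itemsize`, and neither the dtype nor the shape is stored. That is why `np.frombuffer()` needs the dtype passed in explicitly, and why a multi-dimensional array comes back flat unless it is reshaped. A small sketch (the 2×3 array is illustrative, and `int32` is set explicitly so the byte counts do not depend on the platform default):

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5, 6], dtype=np.int32)
a_bytes = a.tobytes()
print(len(a_bytes), a.size * a.itemsize)      # 24 24

# A 2-D array serializes to the same flat byte sequence (C order by default)
m = a.reshape(2, 3)
m_bytes = m.tobytes()
flat = np.frombuffer(m_bytes, dtype=m.dtype)  # shape is lost: 1-D result
print(flat.shape)                             # (6,)
restored = flat.reshape(m.shape)              # shape must be supplied separately
print(np.array_equal(restored, m))            # True

# Reading with the wrong dtype silently reinterprets the bytes
wrong = np.frombuffer(a_bytes, dtype=np.int64)
print(wrong.shape)                            # (3,)
```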

In [15]:
# Creating a new NumPy array 'a2' by reading from 'a_bytes' using np.frombuffer()
# Specifying the datatype of 'a2' as the same as 'a' using a.dtype
a2 = np.frombuffer(buffer=a_bytes, dtype=a.dtype)

In [17]:
# Print the array reconstructed from the bytes object
print("After loading from bytes:")
print(a2); print(type(a2))

After loading from bytes:
[1 2 3 4 5 6]
<class 'numpy.ndarray'>


In [19]:
# Checking if both arrays 'a' and 'a2' are equal using np.array_equal()
print(np.array_equal(a, a2)) 

True
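One detail worth knowing: a Python `bytes` object is immutable, and the array returned by `np.frombuffer()` shares its memory, so that array is read-only. A sketch of checking this and taking a writable copy (the dtype is pinned to `int32` only to make the snippet self-contained):

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5, 6], dtype=np.int32)
a2 = np.frombuffer(a.tobytes(), dtype=a.dtype)
print(a2.flags.writeable)   # False

try:
    a2[0] = 100             # in-place writes fail on a read-only array
except ValueError as err:
    print("ValueError:", err)

a2_copy = a2.copy()         # .copy() allocates new, writable memory
a2_copy[0] = 100
print(a2_copy)
```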


In [29]:
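# offset is in bytes: skipping a.itemsize bytes skips exactly one element (4 bytes for int32)
a2_new = np.frombuffer(buffer=a_bytes, dtype=a.dtype, offset=a.itemsize)
print(a2_new)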

[2 3 4 5 6]
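`offset` is measured in bytes, not elements; a 4-byte offset skips one element only when each element is 4 bytes wide, as with the `int32` array in this run. A sketch that converts an element count into bytes via `a.itemsize` and also uses the `count` parameter to read a fixed number of elements (the value `k = 2` is illustrative):

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5, 6])   # platform default integer dtype
a_bytes = a.tobytes()

# Skip the first k elements by converting elements to bytes via itemsize
k = 2
tail = np.frombuffer(a_bytes, dtype=a.dtype, offset=k * a.itemsize)
print(tail)    # [3 4 5 6]

# Read only `count` elements starting at that offset
window = np.frombuffer(a_bytes, dtype=a.dtype, count=3, offset=k * a.itemsize)
print(window)  # [3 4 5]
```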
