### 0 Serialization
We can serialize an object in several ways:
- Serialize into a JSON string: one way is to use the Pydantic library, create Pydantic objects, and then call the ```.model_dump_json()``` method (```.json()``` in Pydantic v1).
- Serialize into a byte string: we can use the pickle module and its ```pickle.dumps()``` function.
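As a rough sketch of the two options above, the snippet below serializes the same dictionary both ways. It uses the standard-library `json` module in place of Pydantic so that it runs without extra dependencies, and the `record` dictionary is made up for illustration.

```python
import json
import pickle

record = {"name": "Alice", "age": 30}  # illustrative data, not from the notebook

json_text = json.dumps(record)       # JSON serialization returns a str
pickle_bytes = pickle.dumps(record)  # pickle serialization returns bytes

print(type(json_text).__name__, json_text)
print(type(pickle_bytes).__name__, len(pickle_bytes), "bytes")

# Both formats round-trip back to an equal dictionary
restored_from_json = json.loads(json_text)
restored_from_pickle = pickle.loads(pickle_bytes)
```

The JSON text is readable and language-neutral, while the pickle bytes are specific to Python.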

#### 1 Serialization into byte stream: Native Python data structure
- Serialization in Python refers to the process of converting an object into a byte stream, which can then be stored in a file or transmitted over a network. This can be done using the pickle module in Python, which provides a way to serialize and deserialize Python objects.

- Here we use a dictionary as the Python-native data structure
    - It is one of the most straightforward data structures to serialize

- Let's assume we have a simple object, here just a dictionary
    - Serialization converts an in-memory object (such as a dictionary, list, or custom object) into a format that can be saved to a file, sent over a network, or stored in a database.
    - When you call ```pickle.dumps(obj)```, the pickle module serializes the given object (obj) into a byte stream. This byte stream can be stored or transmitted and later deserialized back into an equivalent object with ```pickle.loads()```.



##### 1-0 What is Byte-Serialization?
- Converting an object into a byte stream, in layman's terms, means transforming the object's data into a sequence of bytes that can be stored in a file, sent over a network, or saved in a database.
    - Think of it like taking a snapshot of an object's information and saving that snapshot in a form that can be easily transmitted or stored. When you need the object again, you can use this snapshot to recreate it exactly as it was.
- Imagine you have a detailed Lego structure. Converting the object to a byte stream is like taking a step-by-step manual of how the Lego pieces are put together and saving that manual. The manual itself isn't the Lego structure but contains all the information needed to recreate it.
    - When you want the structure back, you follow the manual to put the Lego pieces together in the exact same way. Similarly, deserializing the byte stream reconstructs the original object from the saved sequence of bytes.


- Use Cases
    - Saving State: Save the state of an object to a file or database.
    - Data Transmission: Send an object over a network in a serialized form.
    - Caching: Store a serialized object in a cache to improve performance.
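The "Saving State" use case can be sketched as follows. This is a minimal example that writes to a temporary directory; the file name `state.pkl` and the `app_state` dictionary are hypothetical. Note that `pickle.dump()` and `pickle.load()` work with an open binary file, whereas `pickle.dumps()` and `pickle.loads()` work with bytes in memory.

```python
import os
import pickle
import tempfile

app_state = {"step": 3, "scores": [0.5, 0.75]}  # hypothetical state to save

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "state.pkl")  # hypothetical file name

    # Save: the file must be opened in binary write mode
    with open(path, "wb") as f:
        pickle.dump(app_state, f)

    # Load: the file must be opened in binary read mode
    with open(path, "rb") as f:
        restored_state = pickle.load(f)

print(restored_state)
```

Only unpickle data from sources you trust, because loading a pickle can execute arbitrary code.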

##### 1-1 Serialization

In [22]:
import pickle

# Original object in Python-native-data-structure: a simple dictionary
original_obj = {'name': 'Alice', 'age': 30, 'city': 'Wonderland'}

# Serialize the dictionary to a byte stream
serialized_obj = pickle.dumps(original_obj)

print("Serialized object:", serialized_obj)

Serialized object: b'\x80\x04\x950\x00\x00\x00\x00\x00\x00\x00}\x94(\x8c\x04name\x94\x8c\x05Alice\x94\x8c\x03age\x94K\x1e\x8c\x04city\x94\x8c\nWonderland\x94u.'


##### 1-2 Deserialization

In [23]:
# Deserialize the byte stream back to the original object
deserialized_obj = pickle.loads(serialized_obj)

print("Deserialized object:", deserialized_obj)

Deserialized object: {'name': 'Alice', 'age': 30, 'city': 'Wonderland'}
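One detail worth checking is that the round trip produces an equal but separate object. A small sketch, using a made-up dictionary with a nested list:

```python
import pickle

original = {"name": "Alice", "tags": ["a", "b"]}  # illustrative data
clone = pickle.loads(pickle.dumps(original))

print(clone == original)                  # same content
print(clone is original)                  # but a different object
print(clone["tags"] is original["tags"])  # nested objects are rebuilt as well

# Changing the clone does not affect the original
clone["tags"].append("c")
print(original["tags"])
```

In this sense a pickle round trip behaves like a deep copy of the data.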


#### 2 More complicated Object: Non-Python-native-data-structure

##### 2-1 Object with attributes still in Python-native data structures

- This is relatively simple, and pickle can serialize it to bytes
- Note that the ```complex_operation()``` method is not part of the **state** of the object, so it is not included in the serialized bytes

In [24]:
import pickle
import math

# Define a class with a method that performs complicated operations
class SimpleCalculator:
    def __init__(self, data):
        self.data = data

    def complex_operation(self):
        # Example of a complicated operation
        return sum(math.sqrt(x) for x in self.data)

    def __repr__(self):
        return f"SimpleCalculator(data={self.data})"

    def __getstate__(self):
        # Customize what gets pickled; the state is a plain Python dict
        state = self.__dict__.copy()  # Here the state is just {'data': [1, 4, 9, 16]}
        return state

    def __setstate__(self, state):
        # Restore the state that was serialized earlier (here {'data': [1, 4, 9, 16]})
        self.__dict__.update(state)

# Create an instance of the class
calculator = SimpleCalculator([1, 4, 9, 16])
print(f"Original object: {calculator}")
print(f"Result of complex_operation: {calculator.complex_operation()}")

# Explore the state that pickle will serialize
python_native_state = calculator.__getstate__()  # Returns a copy of self.__dict__: {'data': [1, 4, 9, 16]}
print(f"\n The Python native state of created object: {python_native_state}")

# Serialize the instance to a byte stream
byte_stream = pickle.dumps(calculator)
print(f"\n Serialized byte stream: {byte_stream}")

# Deserialize the byte stream back to an object
deserialized_calculator = pickle.loads(byte_stream)
print(f"\n Deserialized object: {deserialized_calculator}")
print(f" Result of complex_operation: {deserialized_calculator.complex_operation()}")


Original object: SimpleCalculator(data=[1, 4, 9, 16])
Result of complex_operation: 10.0

 The Python native state of created object: {'data': [1, 4, 9, 16]}

 Serialized byte stream: b'\x80\x04\x95;\x00\x00\x00\x00\x00\x00\x00\x8c\x08__main__\x94\x8c\x10SimpleCalculator\x94\x93\x94)\x81\x94}\x94\x8c\x04data\x94]\x94(K\x01K\x04K\tK\x10esb.'

 Deserialized object: SimpleCalculator(data=[1, 4, 9, 16])
 Result of complex_operation: 10.0
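To see that methods are not part of the pickled state, one can search the byte stream for names. The sketch below uses a hypothetical `Greeter` class: the class name and the attribute appear in the bytes, while the method name does not, because pickle stores a reference to the class rather than its code.

```python
import pickle

class Greeter:
    """Hypothetical class used only for this illustration."""

    def __init__(self, name):
        self.name = name

    def make_greeting(self):
        return f"Hello, {self.name}!"

stream = pickle.dumps(Greeter("Alice"))

has_class_name = b"Greeter" in stream         # class is stored by name
has_attribute = b"Alice" in stream            # instance state is stored
has_method_name = b"make_greeting" in stream  # method code is not stored

print(has_class_name, has_attribute, has_method_name)

# The method still works after loading because the class is looked up again
restored = pickle.loads(stream)
print(restored.make_greeting())
```

This is also why the class definition must be importable wherever the bytes are deserialized.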


##### 2-2 Class whose state includes non-Python-native objects
- Let's assume we have an Image object from the Pillow library
- We define a method that performs some operation based on this non-Python-native library
- We also define an attribute that holds an instance of the SimpleCalculator class above
- **More importantly, all of the above are part of the state of an object of this class**

How a Python-native object, such as a dictionary, is created in the pickling process:
- Customize what to serialize:
    - In Python's pickle module, the ```__getstate__``` method is called automatically when an object is being serialized with ```pickle.dumps(obj)```.
    - The ```self.__dict__``` in this method originally includes all 3 attributes defined in the ```__init__``` method.
    - The ```__getstate__``` method lets us define which attributes of the object are included in the serialized state and which are excluded.
    - If ```__getstate__``` is not defined, pickle will by default serialize the object's ```__dict__```.
- What to restore in deserialization:
    - The ```__setstate__``` method is called with the saved state when ```pickle.loads()``` runs, so it can rebuild anything that was excluded.
In [25]:
import pickle
from PIL import Image

# Define a class with non-Python-native operations
class ImageProcessor:
    def __init__(self, image_path):
        self.image_path = image_path
        self.image = Image.open(image_path)
        self.calculator = SimpleCalculator([1, 4, 9, 16])

    def process_image(self):
        # Example of a non-Python-native operation (image processing)
        return self.image.rotate(90)

    def __repr__(self):
        return f"ImageProcessor(image_path={self.image_path})"

    def __getstate__(self):
        # Customize what gets pickled: a plain dict without the Image object
        state = self.__dict__.copy()
        # Remove the image from the state
        state.pop('image', None)
        return state

    def __setstate__(self, state):
        # Restore the state
        self.__dict__.update(state)
        # Reinitialize the image
        self.image = Image.open(self.image_path)

# Create an instance of the class
processor = ImageProcessor('example.jpg')
print(f"Original object: {processor}")
print(f"Processing image: {processor.process_image()}")

# Serialize the instance to a byte stream
byte_stream = pickle.dumps(processor)
print(f"\n Serialized byte stream: {byte_stream}")

# Deserialize the byte stream back to an object
deserialized_processor = pickle.loads(byte_stream)
print(f"\n Deserialized object: {deserialized_processor}")
print(f" Processing image: {deserialized_processor.process_image()}")


Original object: ImageProcessor(image_path=example.jpg)
Processing image: <PIL.Image.Image image mode=RGB size=591x500 at 0x7FD440208100>

 Serialized byte stream: b'\x80\x04\x95\x80\x00\x00\x00\x00\x00\x00\x00\x8c\x08__main__\x94\x8c\x0eImageProcessor\x94\x93\x94)\x81\x94}\x94(\x8c\nimage_path\x94\x8c\x0bexample.jpg\x94\x8c\ncalculator\x94h\x00\x8c\x10SimpleCalculator\x94\x93\x94)\x81\x94}\x94\x8c\x04data\x94]\x94(K\x01K\x04K\tK\x10esbub.'

 Deserialized object: ImageProcessor(image_path=example.jpg)
 Processing image: <PIL.Image.Image image mode=RGB size=591x500 at 0x7FD4402606D0>


#### 3 Serialization into a JSON string, then into a byte string
- Here, we first create the JSON string of a Pydantic object.
- Then, we convert this string to a byte string. The code below uses ```pickle.dumps()``` as above, and also shows the ```.encode("utf-8")``` method, which yields the raw UTF-8 bytes of the JSON text.
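The two ways of turning a JSON string into bytes are not interchangeable. A minimal sketch, using the standard-library `json` module instead of Pydantic and a made-up payload:

```python
import json
import pickle

json_text = json.dumps({"name": "Alice", "city": "Wonderland"})  # illustrative payload

utf8_bytes = json_text.encode("utf-8")   # raw UTF-8 bytes of the JSON text
pickled_bytes = pickle.dumps(json_text)  # the same text wrapped in pickle framing

print(utf8_bytes)
print(pickled_bytes)

# Each form has its own inverse operation
from_utf8 = utf8_bytes.decode("utf-8")
from_pickle = pickle.loads(pickled_bytes)
```

The UTF-8 bytes can be read by any JSON parser in any language, whereas the pickled bytes need Python's pickle module to recover the string.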


In [26]:
from pydantic import BaseModel, Field
from typing import Dict, Any

# Define your custom class
class Address(BaseModel):
    street: str
    city: str
    zip_code: str
    def __init__(self, **data):
        # Initialize the BaseModel attributes using the parent __init__
        super().__init__(**data)

    def __repr__(self):
        return f"Address(street={self.street}, city={self.city}, zip_code={self.zip_code})"

# Define a Pydantic model that includes the custom class
class UserProfile(BaseModel):
    name: str
    age: int = Field(..., ge=0, le=120)
    email: str
    address: Address  # This is a custom class field

    def __init__(self, **data):
        super().__init__(**data)
        # You can add custom initialization logic if needed

    def __repr__(self):
        return f"UserProfile(name={self.name}, age={self.age}, email={self.email}, address={self.address})"


In [33]:
# Custom function to parse the Address data
def parse_address(address_data: Dict[str, Any]) -> Address:
    address = Address(**address_data)
    return address

# Sample data
address_data = {
    "street": "123 Main St",
    "city": "Wonderland",
    "zip_code": "12345"
}

user_data = {
    "name": "Alice",
    "age": 30,
    "email": "alice@example.com",
    "address": parse_address(address_data)  # Convert custom address data
}

# Create an instance of the Pydantic model
user = UserProfile(**user_data)
print("User profile created successfully:")
print(user)

# Serialize the Pydantic model to a JSON string
user_json = user.model_dump_json()
print("\nSerialized JSON string:")
print(user_json)

# Pickle the JSON string (not the model instance itself) into a byte stream
byte_stream = pickle.dumps(user_json)
print("\nbyte-string of above JSON-string serialization:")
print(byte_stream)

# Deserialize the JSON string back to a Pydantic model
user_from_json = UserProfile.model_validate_json(user_json)
print("\nDeserialized Pydantic model:")
print(user_from_json)

User profile created successfully:
name='Alice' age=30 email='alice@example.com' address=Address(street=123 Main St, city=Wonderland, zip_code=12345)

Serialized JSON string:
{"name":"Alice","age":30,"email":"alice@example.com","address":{"street":"123 Main St","city":"Wonderland","zip_code":"12345"}}

byte-string of above JSON-string serialization:
b'\x80\x04\x95\x83\x00\x00\x00\x00\x00\x00\x00\x8c\x7f{"name":"Alice","age":30,"email":"alice@example.com","address":{"street":"123 Main St","city":"Wonderland","zip_code":"12345"}}\x94.'

Deserialized Pydantic model:
name='Alice' age=30 email='alice@example.com' address=Address(street=123 Main St, city=Wonderland, zip_code=12345)


In [32]:
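import json
# user_json is already a JSON string, so json.dumps() encodes it a second time (note the escaped quotes)
json.dumps(user_json)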

'"{\\"name\\":\\"Alice\\",\\"age\\":30,\\"email\\":\\"alice@example.com\\",\\"address\\":{\\"street\\":\\"123 Main St\\",\\"city\\":\\"Wonderland\\",\\"zip_code\\":\\"12345\\"}}"'

In [34]:
json.dumps(user_json).encode("utf-8")

b'"{\\"name\\":\\"Alice\\",\\"age\\":30,\\"email\\":\\"alice@example.com\\",\\"address\\":{\\"street\\":\\"123 Main St\\",\\"city\\":\\"Wonderland\\",\\"zip_code\\":\\"12345\\"}}"'
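The escaped quotes in the output above come from encoding a string that is already JSON. A small sketch of the effect, with a shortened, made-up `user_json` value:

```python
import json

user_json = '{"name":"Alice","age":30}'  # already a JSON string (illustrative)

double_encoded = json.dumps(user_json)   # JSON string literal containing JSON text
print(double_encoded)

decoded_once = json.loads(double_encoded)  # back to the JSON text (a str)
decoded_twice = json.loads(decoded_once)   # finally the dictionary

print(type(decoded_once).__name__)
print(type(decoded_twice).__name__)
```

If the goal is only to get bytes, calling `user_json.encode("utf-8")` directly avoids the second layer of encoding.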

In [None]:
# Just out of curiosity: UTF-8 bytes do not always print as readable text when the original text contains non-ASCII characters

text = "Hello, 世界!"                   # This is a text string with non-ASCII characters
encoded_text = text.encode("utf-8")     # Encoding the string into bytes
print(encoded_text)                     # Output: b'Hello, \xe4\xb8\x96\xe7\x95\x8c!'
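The encoded bytes can be turned back into the original text with `.decode("utf-8")`. The sketch below also shows that the character count and the byte count differ, since each of the two Chinese characters takes three bytes in UTF-8.

```python
text = "Hello, 世界!"
encoded_text = text.encode("utf-8")
decoded_text = encoded_text.decode("utf-8")

print(decoded_text)       # Hello, 世界!
print(len(text))          # 10 characters
print(len(encoded_text))  # 14 bytes
```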