### Day 02: Date : 31 Jul 2022

**So, what is serialization?**

> Serialization is the process of converting an object into a stream of bytes to store the object or transmit it to memory, a database, or a file. Its main purpose is to save the state of an object in order to be able to recreate it when needed. - https://bit.ly/ms-defn-serialization

There are multiple ways to achieve this. Two of the most common are to write the object to a pickle file or a JSON file. Lesser-known options are the built-in `marshal` module and the third-party `msgpack` library. Most people don't use `marshal`, or never hear about it, because by design it isn't meant for general Python objects. From the official Python documentation: "The marshal module exists mainly to support reading and writing the “pseudo-compiled” code for Python modules of .pyc files."
For the purposes of this experiment, though, we will look at how it compares to pickling. For the most part the timings are comparable, and to quote the documentation again, this time in favor of `pickle` over `marshal`: "the performance is comparable, version independence is guaranteed, and pickle supports a substantially wider range of objects than marshal."
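
To make the "wider range of objects" point concrete, here is a minimal sketch using only the standard library. The `record` and `day` values and the `can_serialize` helper are made up for this illustration and are not part of the notebook. It round-trips a plain dict through all three modules, then checks which of them accept a `datetime.date` with their default settings.

```python
import datetime
import json
import marshal
import pickle

# Illustrative data only; not taken from the notebook.
record = {"name": "day02", "values": [1, 2, 3]}

# Plain built-in containers round-trip through all three modules.
assert pickle.loads(pickle.dumps(record)) == record
assert marshal.loads(marshal.dumps(record)) == record
assert json.loads(json.dumps(record)) == record

# A date object is outside what marshal and json handle by default.
day = datetime.date(2022, 7, 31)


def can_serialize(dumps, obj):
    """Return True if dumps(obj) succeeds, False if it rejects the object."""
    try:
        dumps(obj)
        return True
    except (TypeError, ValueError):
        return False


results = {
    "pickle": can_serialize(pickle.dumps, day),
    "marshal": can_serialize(marshal.dumps, day),
    "json": can_serialize(json.dumps, day),
}
print(results)  # {'pickle': True, 'marshal': False, 'json': False}
```

`marshal` rejects the date with a `ValueError` and `json` with a `TypeError`, while `pickle` serializes it without any extra work.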

Spoiler alert: JSON is the slowest of the lot, by roughly a factor of ten in the run below.

In [5]:
import os

import json
import pickle
import marshal
import msgpack

from utils import timing, get_file_size


@timing
def test_marshal_timing(filename, data):
    with open(f"outputs/{filename}", 'wb') as f:
        marshal.dump(data, f)
    # Report the size after the file is closed, so all buffered bytes are on disk.
    print(f'Filename : {filename}. Filesize: {get_file_size(f"outputs/{filename}")} MB')


@timing
def test_json_timing(filename, data):
    # json writes text, so the file is opened in text mode.
    with open(f"outputs/{filename}", "w") as f:
        json.dump(data, f)
    print(f'Filename : {filename}. Filesize: {get_file_size(f"outputs/{filename}")} MB')


@timing
def test_pickle_timing(filename, data):
    with open(f"outputs/{filename}", 'wb') as f:
        pickle.dump(data, f, protocol=pickle.HIGHEST_PROTOCOL)
    print(f'Filename : {filename}. Filesize: {get_file_size(f"outputs/{filename}")} MB')


@timing
def test_msgpack_timing(filename, data):
    with open(f"outputs/{filename}", "wb") as f:
        # packb() returns the serialized bytes, which are then written out.
        packed = msgpack.packb(data)
        f.write(packed)
    print(f'Filename : {filename}. Filesize: {get_file_size(f"outputs/{filename}")} MB')

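# Make sure the output directory exists before any test writes to it.
os.makedirs("outputs", exist_ok=True)

end_limit = 1000000
full_string = "a"*end_limit
full_list = list(range(end_limit))  # not used in the calls below
test_marshal_timing('datafile.dat', full_string)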
test_pickle_timing('filename.pickle', full_string)
test_msgpack_timing("filename.msgpack", full_string)
test_json_timing("filename.json", full_string)


Filename : datafile.dat. Filesize: 0.95 MB
func:test_marshal_timing took: 0.0030 sec

Filename : filename.pickle. Filesize: 0.95 MB
func:test_pickle_timing took: 0.0020 sec

Filename : filename.msgpack. Filesize: 0.95 MB
func:test_msgpack_timing took: 0.0020 sec

Filename : filename.json. Filesize: 0.95 MB
func:test_json_timing took: 0.0200 sec
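
The `utils` module imported above isn't shown in this post, so here is a hedged sketch of what `timing` and `get_file_size` might look like. Both bodies are assumptions reconstructed from the printed output: the message format follows the `func:... took: ... sec` lines, and the 1024 * 1024 divisor is a guess based on a 1,000,000-character string being reported as 0.95 MB. The actual helpers in the repository may differ.

```python
import os
import time
from functools import wraps


def timing(func):
    """Print how long each call to func takes, then return its result.

    Assumed implementation; the notebook's real decorator is not shown.
    """
    @wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        elapsed = time.perf_counter() - start
        print(f"func:{func.__name__} took: {elapsed:.4f} sec")
        return result
    return wrapper


def get_file_size(path):
    """Return the file size in MB (assumed 1 MB = 1024 * 1024 bytes), rounded to 2 places."""
    return round(os.path.getsize(path) / (1024 * 1024), 2)
```

Keep in mind that a decorator like this times a single call, disk write included. At the 2 to 3 millisecond scale seen here, run-to-run noise can easily be as large as the gap between `marshal`, `pickle`, and `msgpack`, so only the JSON difference is clearly significant. Averaging repeated runs with the standard `timeit` module would give steadier numbers.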



GitHub link: https://github.com/everythingpython/ndaysofpython/blob/main/Day02.ipynb