# Serialization 

### Topics

- serializing objects
- deserializing objects
- pickle
- JSON

## Serializing objects

- objects created in memory are not persistent
    - they hold important information (the state of each object) while the program runs, but that state is lost when the program exits
- to make an object persistent, we need to create a series of bytes that represents the state of the object, and write those bytes to a file
    - encoding an object into bytes is called **serializing**
    - decoding an object from a series of bytes is called **deserializing**
- web services often expose an interface described as **RESTful**
    - REST - REpresentational State Transfer
- the server and client exchange representations of objects using RESTful services
- there are several ways to serialize objects
    - Pickle, JSON, XML, etc.

## Serializing objects using pickle

- Python's `pickle` module stores an object's state directly in a Python-specific binary format
- it converts an object's state (and the state of all the objects it holds as attributes) into a series of bytes
    - these bytes can then be transported or stored however we see fit
- pickle files are best suited to temporary persistence
    - a file pickled by one Python version is not guaranteed to unpickle under another; for example, Python 3.7 cannot read a pickle written with protocol 5, which was added in Python 3.8
- NOTE - the `pickle` module is not secure
    - only unpickle data you trust!
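
To illustrate why untrusted pickles are dangerous, here is a minimal and deliberately harmless sketch. The `Sneaky` class is hypothetical and not part of these notes; its `__reduce__` method tells pickle which function to call when the data is loaded. The function used here is the harmless built-in `sorted`, but a malicious pickle could just as easily name something like `os.system`.

```python
import pickle


class Sneaky:
    """Hypothetical class used only to show that unpickling can call functions."""

    def __reduce__(self):
        # Instead of the object's state, pickle records "call sorted([3, 1, 2])".
        return (sorted, ([3, 1, 2],))


payload = pickle.dumps(Sneaky())

# Loading the bytes does not rebuild a Sneaky object; it calls sorted(...)
result = pickle.loads(payload)
print(type(result).__name__, result)   # list [1, 2, 3]
```

The loader never checks whether the named function is safe to call, which is why the rule above is to unpickle only data you trust.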

In [15]:
import pickle

In [16]:
help(pickle)

Help on module pickle:

NAME
    pickle - Create portable serialized representations of Python objects.

MODULE REFERENCE
    https://docs.python.org/3.10/library/pickle.html
    
    The following documentation is automatically generated from the Python
    source files.  It may be incomplete, incorrect or include features that
    are considered implementation detail and may vary between Python
    implementations.  When in doubt, consult the module reference at the
    location listed above.

DESCRIPTION
    See module copyreg for a mechanism for registering custom picklers.
    See module pickletools source for extensive comments.
    
    Classes:
    
        Pickler
        Unpickler
    
    Functions:
    
        dump(object, file)
        dumps(object) -> string
        load(file) -> object
        loads(bytes) -> object
    
    Misc variables:
    
        __version__
        format_version
        compatible_formats

CLASSES
    builtins.Exception(builtins.BaseException)


In [3]:
some_data = ['a list', 'containing', 5, 'items', {"including": ("str", "int", "dict", "list", "tuple")}]

In [4]:
some_data

['a list',
 'containing',
 5,
 'items',
 {'including': ('str', 'int', 'dict', 'list', 'tuple')}]

In [7]:
# let's pickle/serialize the list object to a binary file
with open("pickled_list.pkl", 'wb') as file:
    pickle.dump(some_data, file)

In [8]:
# let's deserialize/load the object from the file
with open("pickled_list.pkl", 'rb') as file:
    loaded_list = pickle.load(file)

In [9]:
type(loaded_list)

list

In [10]:
loaded_list

['a list',
 'containing',
 5,
 'items',
 {'including': ('str', 'int', 'dict', 'list', 'tuple')}]

In [12]:
some_data == loaded_list

True

In [13]:
loaded_list.append([1, 2, 3])

In [14]:
loaded_list

['a list',
 'containing',
 5,
 'items',
 {'including': ('str', 'int', 'dict', 'list', 'tuple')},
 [1, 2, 3]]
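
The cells above use `dump` and `load` with a file. As a complementary sketch, `dumps` and `loads` do the same round trip with bytes in memory. Here `types.SimpleNamespace` stands in for a user-defined class with attributes (an assumption made to keep the example short), and the `protocol` argument is shown only to make the version choice explicit.

```python
import pickle
from types import SimpleNamespace

# SimpleNamespace stands in for a user-defined class with attributes
engine = SimpleNamespace(horsepower=300)
car = SimpleNamespace(name="roadster", engine=engine, tags={"fast", "red"})

# dumps returns bytes; the nested engine object and the set are included
data = pickle.dumps(car, protocol=pickle.HIGHEST_PROTOCOL)
restored = pickle.loads(data)

print(type(data).__name__)           # bytes
print(restored.engine.horsepower)    # 300
print(restored == car)               # True: same state
print(restored is car)               # False: a new object was created
```

Note that the set survives the round trip unchanged, which is not the case for JSON.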

## Serializing objects using JSON

- there are many text-based formats for exchanging data
    - XML - Extensible Markup Language
    - YAML - YAML Ain't Markup Language (originally Yet Another Markup Language)
    - CSV - Comma-Separated Values
- many of these formats have obscure features that can be exploited from a security point of view
    - some allow arbitrary commands to be executed on the host machine

### JSON - JavaScript Object Notation

- human-readable text-based format for exchanging data
- one of the most popular techniques used by RESTful API services
- as the name suggests, JSON originated in JavaScript, where it is widely used to transfer data from the web server to a browser
- JSON, though popular, is not as flexible as the **pickle** module
- it can serialize only basic data:
    - integers, floats, strings, booleans, `None`, and simple containers such as lists and dictionaries
- the `json` module's functions cannot serialize arbitrary objects on their own; a common workaround is to serialize the value of the object's `__dict__` attribute
- `json` provides the **dump** and **dumps**, **load** and **loads** functions
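
As a quick sketch of the four functions and of the basic-data limitation, the example below round-trips a dictionary of built-in types. The sample data is made up for illustration, and an in-memory `io.StringIO` buffer stands in for a real file.

```python
import io
import json

data = {"name": "Ada", "age": 36, "scores": (9.5, 8), "active": True, "nickname": None}

# dumps/loads work with strings
text = json.dumps(data)
back = json.loads(text)
print(text)
print(back["scores"])      # [9.5, 8] -- the tuple came back as a list

# dump/load work with file-like objects (StringIO stands in for a file here)
buffer = io.StringIO()
json.dump(data, buffer)
buffer.seek(0)
from_file = json.load(buffer)
print(from_file == back)   # True

# types outside the basic set raise TypeError
try:
    json.dumps({"tags": {"a", "b"}})   # a set is not JSON serializable
except TypeError as err:
    print("TypeError:", err)
```

Unlike pickle, the result is plain text that any language with a JSON parser can read, at the cost of losing Python-specific types such as tuples and sets.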

In [17]:
import json

In [18]:
help(json)

Help on package json:

NAME
    json

MODULE REFERENCE
    https://docs.python.org/3.10/library/json.html
    
    The following documentation is automatically generated from the Python
    source files.  It may be incomplete, incorrect or include features that
    are considered implementation detail and may vary between Python
    implementations.  When in doubt, consult the module reference at the
    location listed above.

DESCRIPTION
    JSON (JavaScript Object Notation) <https://json.org> is a subset of
    JavaScript syntax (ECMA-262 3rd edition) used as a lightweight data
    interchange format.
    
    :mod:`json` exposes an API familiar to users of the standard library
    :mod:`marshal` and :mod:`pickle` modules.  It is derived from a
    version of the externally maintained simplejson library.
    
    Encoding basic Python object hierarchies::
    
        >>> import json
        >>> json.dumps(['foo', {'bar': ('baz', None, 1.0, 2)}])
        '["foo", {"bar": ["baz", nul

In [19]:
class Contact:
    def __init__(self, first, last):
        self.first = first
        self.last = last
        
    @property
    def full_name(self):
        return f'{self.first} {self.last}'

In [20]:
c = Contact("John", "Smith")

In [22]:
c.full_name

'John Smith'

In [23]:
json.dumps(c.__dict__)

'{"first": "John", "last": "Smith"}'
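
The `__dict__` trick has limits, which this short sketch tries to make visible: passing the object itself raises `TypeError`, and loading the text back yields a plain `dict`, not a `Contact`. The `default=vars` argument is one possible shortcut (an illustration, not the approach these notes recommend); `vars(c)` returns `c.__dict__`. The class is repeated so the example runs on its own.

```python
import json


class Contact:
    def __init__(self, first, last):
        self.first = first
        self.last = last


c = Contact("John", "Smith")

# json does not know how to encode a Contact by itself
try:
    json.dumps(c)
except TypeError as err:
    print("TypeError:", err)

# default= is called for objects json cannot encode; vars(c) is c.__dict__
text = json.dumps(c, default=vars)
print(text)

# the class information is lost: we get a dict back, not a Contact
restored = json.loads(text)
print(type(restored).__name__)   # dict
```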

In [25]:
# better approach: a custom encoder that also records the class name
from typing import Any
import json

class ContactEncoder(json.JSONEncoder):
    def default(self, obj: Any) -> Any:
        if isinstance(obj, Contact):
            return {
                "__class__": "Contact",
                "first": obj.first,
                "last": obj.last,
                "full_name": obj.full_name,
            }
        return super().default(obj)

In [26]:
c = Contact("John", "Smith")
text = json.dumps(c, cls=ContactEncoder)

In [27]:
text

'{"__class__": "Contact", "first": "John", "last": "Smith", "full_name": "John Smith"}'

In [28]:
def decode_contact(json_object: Any) -> Any:
    if json_object.get("__class__") == "Contact":
        return Contact(json_object["first"], json_object["last"])
    else:
        return json_object

In [29]:
c2 = json.loads(text, object_hook=decode_contact)

In [30]:
c2.full_name

'John Smith'

In [31]:
some_text = ('{"__class__": "Contact", "first": "Milli", "last": "Dale", '
            '"full_name": "Milli Dale"}')

In [32]:
c3 = json.loads(some_text, object_hook=decode_contact)

In [33]:
c3.full_name

'Milli Dale'

In [34]:
help(c3)

Help on Contact in module __main__ object:

class Contact(builtins.object)
 |  Contact(first, last)
 |  
 |  Methods defined here:
 |  
 |  __init__(self, first, last)
 |      Initialize self.  See help(type(self)) for accurate signature.
 |  
 |  ----------------------------------------------------------------------
 |  Readonly properties defined here:
 |  
 |  full_name
 |  
 |  ----------------------------------------------------------------------
 |  Data descriptors defined here:
 |  
 |  __dict__
 |      dictionary for instance variables (if defined)
 |  
 |  __weakref__
 |      list of weak references to the object (if defined)
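
The encoder and decoder above also work when contacts are nested inside other containers, because `object_hook` is called for every JSON object in the text, innermost first. The sketch below repeats slightly shortened versions of the definitions so it runs on its own; the `address_book` data is made up for illustration.

```python
import json
from typing import Any


class Contact:
    def __init__(self, first: str, last: str) -> None:
        self.first = first
        self.last = last


class ContactEncoder(json.JSONEncoder):
    def default(self, obj: Any) -> Any:
        if isinstance(obj, Contact):
            return {"__class__": "Contact", "first": obj.first, "last": obj.last}
        return super().default(obj)


def decode_contact(json_object: dict) -> Any:
    # called once per JSON object; only tagged dicts become Contact instances
    if json_object.get("__class__") == "Contact":
        return Contact(json_object["first"], json_object["last"])
    return json_object


address_book = {
    "owner": Contact("John", "Smith"),
    "friends": [Contact("Milli", "Dale"), Contact("Ada", "Byron")],
}

text = json.dumps(address_book, cls=ContactEncoder)
restored = json.loads(text, object_hook=decode_contact)

print(type(restored).__name__)                   # dict
print(type(restored["owner"]).__name__)          # Contact
print([f.first for f in restored["friends"]])   # ['Milli', 'Ada']
```

The outer dictionary has no `__class__` key, so the hook returns it unchanged while the nested contacts are rebuilt.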

