#### Introduction

Serialising and deserialising objects is useful for persisting data beyond the lifetime of the program that created it and/or for transmitting it (for example, over a network).

# 01 - Pickling

#### Lecture

Pickling is a Python-specific mechanism for serialising/deserialising objects, using a **binary** representation by default.
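As a minimal sketch of what "binary by default" means, the snippet below pickles the same string with the default protocol and with protocol 0, the original ASCII-based protocol. The exact bytes depend on the Python version, so only their general shape is checked here.

```python
import pickle

# the default protocol produces compact binary output
ser_default = pickle.dumps('abc')
# protocol 0 is the original, ASCII-based protocol
ser_text = pickle.dumps('abc', protocol=0)

print(ser_default[:1])     # b'\x80': the PROTO opcode that opens binary pickles
print(ser_text.isascii())  # True
print(pickle.loads(ser_default) == pickle.loads(ser_text) == 'abc')  # True
```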

Pickling works with far *more* than just Python dictionaries, but we will focus on dictionaries here because of JSON: dictionaries map naturally onto JSON objects, so they are easy to serialise to and deserialise from JSON.
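To illustrate that pickling is not limited to dictionaries, here is a sketch with an ordinary class instance. `Point` is a hypothetical class invented for this example; pickle stores a reference to the class by name, so the class must be importable (e.g. defined at module level) wherever the data is unpickled.

```python
import pickle


class Point:
    # hypothetical example class: pickle saves the instance's attributes
    # and a by-name reference to the class used to rebuild it
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __eq__(self, other):
        return isinstance(other, Point) and (self.x, self.y) == (other.x, other.y)


p = Point(1, 2)
p2 = pickle.loads(pickle.dumps(p))

print(p2.x, p2.y)        # 1 2
print(p == p2, p is p2)  # True False
```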

However, not every data type maps cleanly onto JSON; `datetime` objects, for example, cannot be serialised to JSON directly, and converting them to strings by hand loses their type. There are 3rd-party libraries (such as marshmallow) that solve these problems.
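A short sketch of the `datetime` problem using the standard `json` module (the dictionary and its `'created'` key are made up for illustration): `json.dumps` refuses the object outright, and after a manual ISO 8601 conversion `json.loads` hands back a plain `str` rather than a `datetime`.

```python
import json
from datetime import datetime

d = {'created': datetime(2024, 1, 28, 19, 58, 32)}

# json has no built-in encoding for datetime objects
try:
    json.dumps(d)
except TypeError as ex:
    error_message = str(ex)
    print(error_message)  # Object of type datetime is not JSON serializable

# one manual workaround: convert to an ISO 8601 string first...
ser = json.dumps({'created': d['created'].isoformat()})
print(ser)  # {"created": "2024-01-28T19:58:32"}

# ...but deserialising gives back a str, not a datetime
restored = json.loads(ser)
print(type(restored['created']))  # <class 'str'>
```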

**Object/Data Marshalling** is the process of serialising **and** deserialising objects/data:

`obj -- serialise --> 0101001110011... -- deserialise --> obj`

Unpickling data can be **dangerous** because a pickle can **execute arbitrary code** while it is being loaded, so never unpickle data from an untrusted source.
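The risk comes from the `__reduce__` protocol: an object can tell pickle to "rebuild" it by calling any importable callable with any arguments. The class below is a hypothetical, deliberately harmless illustration that makes `pickle.loads` call `eval`. Note that the receiving side never needs the class itself, only the crafted bytes, because the payload refers to `builtins.eval` directly.

```python
import pickle


class RunsCode:
    # hypothetical class for illustration only: __reduce__ returns
    # (callable, args), and pickle will call callable(*args) on load
    def __reduce__(self):
        return (eval, ("sum([1, 2, 3])",))


payload = pickle.dumps(RunsCode())

# unpickling calls eval("sum([1, 2, 3])"); a hostile payload could just as
# easily reference something like os.system instead
result = pickle.loads(payload)
print(result)  # 6
```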

##### Usage

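```python
import pickle
```

`dump` -> pickles an object and writes it to a file object opened in binary mode

`load` -> reads and unpickles an object from a binary file object

`dumps` -> returns the pickled representation as a `bytes` object that can be stored in a variable

`loads` -> unpickles an object from a `bytes` object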
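A minimal round trip with all four functions, assuming a throwaway temporary directory for the file (the `data.pkl` file name is arbitrary). The file is opened in binary mode (`'wb'`/`'rb'`) because pickles are `bytes`.

```python
import os
import pickle
import tempfile

data = {'a': 100, 'b': [1, 2, 3]}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'data.pkl')  # arbitrary file name

    # dump/load work with file objects opened in *binary* mode
    with open(path, 'wb') as f:
        pickle.dump(data, f)
    with open(path, 'rb') as f:
        loaded = pickle.load(f)

# dumps/loads work with in-memory bytes objects instead
ser = pickle.dumps(data)

print(loaded == data)             # True
print(type(ser))                  # <class 'bytes'>
print(pickle.loads(ser) == data)  # True
```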

##### Equality and Identity

A pickled object does not carry any information about its identity (its `id`). Therefore, if a dictionary `dict_1` is pickled and then unpickled, the resulting dictionary `dict_2` will have a different ID from the original.

`dict_1 == dict_2` but `dict_1 is not dict_2`

Serialising/deserialising data behaves very similarly to making deep copies. If we deep-copy an object that contains two references to the same object, the copy preserves that shared-reference relationship. For example:

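```python
from copy import deepcopy

my_list = [1, 2]
l1 = ['a', 'b', my_list, my_list]

print(l1[2] == l1[3])  # True
print(l1[2] is l1[3])  # True: both elements are the same list object

l2 = deepcopy(l1)
print(l2)              # ['a', 'b', [1, 2], [1, 2]]

print(l2[2] == l2[3])    # True
print(l2[2] is l2[3])    # True: the shared reference is preserved in the copy
print(l2[2] is my_list)  # False: but it is a new list, not my_list itself
```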

So Python sees that `l1[2]` and `l1[3]` both point to `my_list`, and it replicates that relationship in the copy.

#### Coding

##### `.dumps()` and `.loads()`

We can pickle **strings**:

```python
import pickle

ser = pickle.dumps('Python Pickle Peppers')
ser
```

```
b'\x80\x04\x95\x19\x00\x00\x00\x00\x00\x00\x00\x8c\x15Python Pickle Peppers\x94.'
```

```python
deser = pickle.loads(ser)
deser
```

```
'Python Pickle Peppers'
```

And **floats** (integers work the same way):

```python
ser = pickle.dumps(3.14)
ser
```

```
b'\x80\x04\x95\n\x00\x00\x00\x00\x00\x00\x00G@\t\x1e\xb8Q\xeb\x85\x1f.'
```

```python
deser = pickle.loads(ser)
deser
```

```
3.14
```

And **sets**:

```python
ser = pickle.dumps({'a', 'b', 10})
ser
```

```
b'\x80\x04\x95\x0f\x00\x00\x00\x00\x00\x00\x00\x8f\x94(\x8c\x01a\x94K\n\x8c\x01b\x94\x90.'
```

```python
deser = pickle.loads(ser)
deser
```

```
{10, 'a', 'b'}
```

And **lists/tuples**:

```python
l1 = [10, 20, ('a', 'b', 30)]
ser = pickle.dumps(l1)
ser
```

```
b'\x80\x04\x95\x15\x00\x00\x00\x00\x00\x00\x00]\x94(K\nK\x14\x8c\x01a\x94\x8c\x01b\x94K\x1e\x87\x94e.'
```

```python
l2 = pickle.loads(ser)
l2
```

```
[10, 20, ('a', 'b', 30)]
```

But remember that the IDs will **change**. They are **equal** but not **identical**.

```python
print(f"{l1 == l2 = }")
print(f"{l1 is l2 = }")
```

```
l1 == l2 = True
l1 is l2 = False
```


And **dictionaries**:

```python
from datetime import datetime

d = {
    'a': 100,
    'b': [1, 2, 3],
    'c': (1, 2, 3),
    # datetime.utcnow() is deprecated as of Python 3.12;
    # datetime.now(timezone.utc) is the timezone-aware replacement
    'd': {'x': 1 + 1j, 'y': datetime.utcnow()}
}

ser = pickle.dumps(d)
ser
```

```
b'\x80\x04\x95\x8b\x00\x00\x00\x00\x00\x00\x00}\x94(\x8c\x01a\x94Kd\x8c\x01b\x94]\x94(K\x01K\x02K\x03e\x8c\x01c\x94K\x01K\x02K\x03\x87\x94\x8c\x01d\x94}\x94(\x8c\x01x\x94\x8c\x08builtins\x94\x8c\x07complex\x94\x93\x94G?\xf0\x00\x00\x00\x00\x00\x00G?\xf0\x00\x00\x00\x00\x00\x00\x86\x94R\x94\x8c\x01y\x94\x8c\x08datetime\x94\x8c\x08datetime\x94\x93\x94C\n\x07\xe8\x01\x1c\x13: \x07\x94\xda\x94\x85\x94R\x94uu.'
```

```python
deser = pickle.loads(ser)
deser
```

```
{'a': 100,
 'b': [1, 2, 3],
 'c': (1, 2, 3),
 'd': {'x': (1+1j), 'y': datetime.datetime(2024, 1, 28, 19, 58, 32, 496858)}}
```

As mentioned in the lecture, shared-reference relationships are preserved by pickling/unpickling, just as they are by deep copies:

```python
my_dict = {'a': 10, 'b': 20}
d = {'x': 100, 'y': my_dict, 'z': my_dict}

print(d['y'] == d['z'])
print(d['y'] is d['z'])
```

```
True
True
```

```python
ser = pickle.dumps(d)
d2 = pickle.loads(ser)

print(d2['y'] == d2['z'])
print(d2['y'] is d2['z'])
```

```
True
True
```
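One caveat worth sketching: pickle only tracks shared references *within a single* `dumps` call, because each call uses a fresh pickler with its own memo of already-seen objects. Pickling the same list in two separate calls therefore yields two independent (equal but not identical) lists when they are loaded back.

```python
import pickle

shared = [1, 2]

# pickled together in one call: the shared reference survives
a, b = pickle.loads(pickle.dumps((shared, shared)))
print(a is b)  # True

# pickled in two separate calls: each pickle rebuilds its own list
c = pickle.loads(pickle.dumps(shared))
d = pickle.loads(pickle.dumps(shared))
print(c == d, c is d)  # True False
```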


# 02 - JSON Serialization

# 03 - Custom JSON Encoding

# 04 - Custom Encoding using JSONEncoder

# 05 - Custom JSON Decoding

# 06 - Using JSONDecoder

# 07 - JSONSchema

# 08 - Marshmallow

# 09 - YAML

# 10 - Serpy