# Pickling<br>
The pickle Module

A way to represent an object persistently -> store it on disk, transmit it<br>
Creating an object’s (binary) representation -> serializing<br>
Reloading the object from that representation -> deserializing<br>
The overall process is also known as `marshalling`.<br>

Obj -> ( serialize) 0110101.. ( deserialize)-> Obj<br>

Pickle is a binary serialization (by default)<br>
Focus on dictionaries -> Can be used for other object types<br>
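
A small sketch of what "binary by default" means. The sample `value` is made up for illustration; `protocol=0` selects the oldest, ASCII-based pickle format, while the default protocol depends on the Python version (the outputs further down start with `\x80\x03`, i.e. protocol 3).

```python
import pickle

# Hypothetical sample data, only for illustration
value = {"a": 1, "b": [2.5, "text"]}

binary_blob = pickle.dumps(value)            # default protocol (binary)
text_blob = pickle.dumps(value, protocol=0)  # oldest, ASCII-based protocol

print(pickle.DEFAULT_PROTOCOL)  # depends on the Python version
print(binary_blob[:2])          # protocol marker byte, then the protocol number
print(text_blob)                # readable ASCII opcodes

# Either form loads back to an equal object
restored_from_binary = pickle.loads(binary_blob)
restored_from_text = pickle.loads(text_blob)
```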

## Careful<br>
Unpickling (deserializing) can execute code -> not safe! <br>
Only unpickle data you trust<br>
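
A harmless sketch of why this matters. The class name `Sneaky` is made up for illustration; `__reduce__` is the real hook pickle uses to learn how to rebuild an object. Here it returns a callable (`eval`) plus its arguments, and `loads` calls it. A real attacker would put something far worse in place of `"6 * 7"`.

```python
import pickle

class Sneaky:
    # __reduce__ tells pickle how to rebuild the object:
    # "call eval with the argument '6 * 7'"
    def __reduce__(self):
        return (eval, ("6 * 7",))

payload = pickle.dumps(Sneaky())

# loads() does not rebuild a Sneaky instance;
# it calls eval() and returns whatever eval returns
result = pickle.loads(payload)
print(result)  # 42
```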



## Usage<br>

import pickle

dump -> pickle to file<br>
load -> unpickle from file <br>
dumps -> returns the pickled representation as a `bytes` object -> store in a variable<br>
loads -> Unpickle from supplied argument
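
A minimal sketch of the four functions side by side. The file name `data.pickle` and the sample dictionary are assumptions for illustration; the file lives in a temporary directory so nothing is left behind. Note that pickle files must be opened in binary mode (`"wb"` / `"rb"`).

```python
import os
import pickle
import tempfile

# Hypothetical sample data
data = {"a": 1, "b": [1, 2, 3]}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "data.pickle")

    # dump -> pickle straight to an open binary file
    with open(path, "wb") as f:
        pickle.dump(data, f)

    # load -> unpickle from an open binary file
    with open(path, "rb") as f:
        from_file = pickle.load(f)

# dumps -> pickled representation as bytes, kept in a variable
blob = pickle.dumps(data)

# loads -> unpickle from those bytes
from_bytes = pickle.loads(blob)

print(from_file == data, from_bytes == data)
```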

## Equality and Identity<br>

equality -> `==`<br>
identity -> `is`<br>

dict1 -> (pickle ) 0101 ( unpickle ) -> dict2<br>

dict1 and dict2 are stored at different memory addresses.<br>

dict1 == dict2 ->  True<br>
Custom objects will need to implement `__eq__` for `==` to compare their contents<br>

dict1 is dict2 ->  False<br>
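
A sketch of the `__eq__` point, using two made-up classes (`PlainPoint` and `ComparablePoint` are not part of these notes). Without `__eq__`, `==` falls back to identity, so an unpickled copy never compares equal to the original. The example assumes it runs as a script or notebook cell, so pickle can find the classes by name.

```python
import pickle

class PlainPoint:
    """No __eq__: == falls back to identity comparison."""
    def __init__(self, x, y):
        self.x = x
        self.y = y

class ComparablePoint(PlainPoint):
    """Defines __eq__ so that equal coordinates compare equal."""
    def __eq__(self, other):
        if not isinstance(other, ComparablePoint):
            return NotImplemented
        return (self.x, self.y) == (other.x, other.y)

plain = PlainPoint(1, 2)
plain_copy = pickle.loads(pickle.dumps(plain))

comparable = ComparablePoint(1, 2)
comparable_copy = pickle.loads(pickle.dumps(comparable))

print(plain == plain_copy)            # False: different objects, no __eq__
print(comparable == comparable_copy)  # True: __eq__ compares the contents
print(comparable is comparable_copy)  # False: still two distinct objects
```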

In [15]:
import os
import pickle
  

In [16]:
ser = pickle.dumps('Python Pickled Peppers')

In [17]:
ser

b'\x80\x03X\x16\x00\x00\x00Python Pickled Peppersq\x00.'

In [18]:
deser = pickle.loads(ser)

In [19]:
deser

'Python Pickled Peppers'

In [20]:
# same thing for numerics
ser = pickle.dumps(3.14)

In [21]:
ser

b'\x80\x03G@\t\x1e\xb8Q\xeb\x85\x1f.'

In [22]:
deser = pickle.loads(ser)

In [23]:
deser

3.14

In [24]:
d = [10, 20, ('a', 'b', 30)]

In [25]:
ser = pickle.dumps(d)

In [26]:
ser

b'\x80\x03]q\x00(K\nK\x14X\x01\x00\x00\x00aq\x01X\x01\x00\x00\x00bq\x02K\x1e\x87q\x03e.'

In [27]:
deser = pickle.loads(ser)

In [28]:
deser

[10, 20, ('a', 'b', 30)]

Note that the original and the deserialized objects are equal, but not identical:

In [29]:
d is deser, d == deser

(False, True)

This works the same way with sets too:

In [31]:
s = {'a', 'b', 'x', 10}

In [32]:
s

{10, 'a', 'b', 'x'}

In [33]:
ser = pickle.dumps(s)
print(ser)

b'\x80\x03cbuiltins\nset\nq\x00]q\x01(X\x01\x00\x00\x00xq\x02K\nX\x01\x00\x00\x00bq\x03X\x01\x00\x00\x00aq\x04e\x85q\x05Rq\x06.'


In [34]:
deser = pickle.loads(ser)
print(deser)

{'b', 10, 'x', 'a'}


### And finally, we can pickle dictionaries as well:

In [35]:
d = {'b': 1, 'a': 2, 'c': {'x': 10, 'y': 20}}

In [36]:
print(d)

{'b': 1, 'a': 2, 'c': {'x': 10, 'y': 20}}


In [37]:
ser = pickle.dumps(d)

In [38]:
ser

b'\x80\x03}q\x00(X\x01\x00\x00\x00bq\x01K\x01X\x01\x00\x00\x00aq\x02K\x02X\x01\x00\x00\x00cq\x03}q\x04(X\x01\x00\x00\x00xq\x05K\nX\x01\x00\x00\x00yq\x06K\x14uu.'

In [39]:
deser = pickle.loads(ser)

In [40]:
print(deser)

{'b': 1, 'a': 2, 'c': {'x': 10, 'y': 20}}


In [41]:
d == deser

True

What happens if we pickle a dictionary that has two of its values set to the same dictionary?

In [42]:
d1 = {'a': 10, 'b': 20}
d2 = {'x': 100, 'y': d1, 'z': d1}

In [43]:
print(d2)

{'x': 100, 'y': {'a': 10, 'b': 20}, 'z': {'a': 10, 'b': 20}}


Let's say we pickle `d2`

In [44]:
ser = pickle.dumps(d2)

Now let's unpickle that object:

In [45]:
d3 = pickle.loads(ser)

In [46]:
d3

{'x': 100, 'y': {'a': 10, 'b': 20}, 'z': {'a': 10, 'b': 20}}

Is that sub-dictionary still the same as the original one?

In [47]:
d3['y'] == d2['y']

True

In [48]:
d3['y'] is d2['y']

False

But consider the original dictionary `d2`: both the `y` and `z` keys referenced the same dictionary `d1`:

In [50]:
d2['y'] is d2['z']

True

In [51]:
d3['y'] is d3['z']

True

In [52]:
d1 = {'a': 1, 'b': 2}
d2 = {'x': 10, 'y': d1}
print(d1)
print(d2)
d1['c'] = 3
print(d1)
print(d2)

{'a': 1, 'b': 2}
{'x': 10, 'y': {'a': 1, 'b': 2}}
{'a': 1, 'b': 2, 'c': 3}
{'x': 10, 'y': {'a': 1, 'b': 2, 'c': 3}}


In [53]:
d1 = {'a': 1, 'b': 2}
d2 = {'x': 10, 'y': d1}
d1_ser = pickle.dumps(d1)
d2_ser = pickle.dumps(d2)

In [54]:
# simulate exiting the program, or maybe just restarting the notebook
del d1
del d2

In [55]:
d1

NameError: name 'd1' is not defined

In [56]:
# load the data back up
d1 = pickle.loads(d1_ser)
d2 = pickle.loads(d2_ser)

In [57]:
# and continue processing as before
print(d1)
print(d2)
d1['c'] = 3
print(d1)
print(d2)

{'a': 1, 'b': 2}
{'x': 10, 'y': {'a': 1, 'b': 2}}
{'a': 1, 'b': 2, 'c': 3}
{'x': 10, 'y': {'a': 1, 'b': 2}}


So just remember: as soon as you pickle a dictionary, any reference it held to an object outside that pickle is lost, because unpickling creates new objects.<br>
However, the pickle module is relatively intelligent and will not re-pickle an object it has already pickled within the same call - which means that **relative** references are preserved.
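
A sketch contrasting the two cases, reusing the `d1` / `d2` shapes from above. Bundling both dictionaries into one tuple before pickling is just one possible way to keep them linked; the variable names ending in `_copy` and `_new` are made up for illustration.

```python
import pickle

d1 = {'a': 1, 'b': 2}
d2 = {'x': 10, 'y': d1}

# Pickled separately: each loads() call builds its own copy of d1
d1_copy = pickle.loads(pickle.dumps(d1))
d2_copy = pickle.loads(pickle.dumps(d2))
separate_linked = d2_copy['y'] is d1_copy

# Pickled together in one call: d1 is stored once and referenced twice
d1_new, d2_new = pickle.loads(pickle.dumps((d1, d2)))
together_linked = d2_new['y'] is d1_new

print(separate_linked)  # False
print(together_linked)  # True

# The restored pair behaves like the original pair
d1_new['c'] = 3
print(d2_new)  # {'x': 10, 'y': {'a': 1, 'b': 2, 'c': 3}}
```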

In [58]:
# Let's explain more
class Person:
    def __init__(self, name, age):
        self.name = name
        self.age = age
    
    def __eq__(self, other):
        return self.name == other.name and self.age == other.age
    
    def __repr__(self):
        return f'Person(name={self.name}, age={self.age})'

In [59]:
harsha = Person('harsha', 79)
mulanji = Person('mulanji', 75)
vardhana = Person('vardhana', 75)

In [60]:
parrot_sketch = {
    "title": "Parrot Sketch",
    "actors": [harsha, mulanji]
}

ministry_sketch = {
    "title": "Ministry of Silly Walks",
    "actors": [harsha, mulanji]
}

joke_sketch = {
    "title": "Funniest Joke in the World",
    "actors": [vardhana, mulanji]
}

In [61]:
fan_favorites = {
    "user_1": [parrot_sketch, joke_sketch],
    "user_2": [parrot_sketch, ministry_sketch]
}

In [62]:
from pprint import pprint
pprint(fan_favorites)

{'user_1': [{'actors': [Person(name=harsha, age=79),
                        Person(name=mulanji, age=75)],
             'title': 'Parrot Sketch'},
            {'actors': [Person(name=vardhana, age=75),
                        Person(name=mulanji, age=75)],
             'title': 'Funniest Joke in the World'}],
 'user_2': [{'actors': [Person(name=harsha, age=79),
                        Person(name=mulanji, age=75)],
             'title': 'Parrot Sketch'},
            {'actors': [Person(name=harsha, age=79),
                        Person(name=mulanji, age=75)],
             'title': 'Ministry of Silly Walks'}]}


We have some shared references:

In [63]:
fan_favorites['user_1'][0] is fan_favorites['user_2'][0]

True

In [64]:
parrot_id_original = id(parrot_sketch)

In [65]:
ser = pickle.dumps(fan_favorites)

In [66]:
new_fan_favorites = pickle.loads(ser)

In [67]:
fan_favorites == new_fan_favorites

True

In [68]:
id(fan_favorites['user_1'][0]), id(new_fan_favorites['user_1'][0])

(140452512588712, 140452508462824)

The ids differ: the unpickled sketch is a new object, even though it is equal to the original.

In [69]:
fan_favorites['user_1'][0] == new_fan_favorites['user_1'][0]

True

In [70]:
fan_favorites['user_1'][0] is fan_favorites['user_2'][0]

True

In [71]:
new_fan_favorites['user_1'][0] is new_fan_favorites['user_2'][0]

True

We can see that the `relative` relationship between objects that were pickled together is preserved.