# Data serialization

A file opened in text mode only accepts strings (and one opened in binary mode only accepts bytes). If you want to write a dictionary to a file so that another script can read it later and recreate that dictionary, well... you're out of luck, unless you invent your own format. That's where data serialization comes in. It provides a way of recording this sort of stuff into a file automatically, so you won't need to worry about making up a format and implementing all of its rules.

In [1]:
with open('woops.txt', 'w') as opened_file:
    opened_file.write({'k1': 1, 'k2': 2})

TypeError: write() argument must be str, not dict
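
To see why a hand-made format gets annoying, here is a minimal sketch of the naive approach: writing the dictionary's text representation and reading it back. The file name `naive.txt` is just a placeholder, and `ast.literal_eval` is only one possible way to parse the text again (it works only for plain Python literals).

```python
import ast

original = {'k1': 1, 'k2': 2}

# Naive approach: write the dict's text representation.
with open('naive.txt', 'w') as opened_file:
    opened_file.write(str(original))

# Reading it back gives a plain string, not a dictionary.
with open('naive.txt') as opened_file:
    text = opened_file.read()

print(type(text))  # <class 'str'>

# Turning the text back into a dict is now our job.
restored = ast.literal_eval(text)
print(restored == original)  # True
```

This works for simple literals, but it breaks as soon as the data contains anything whose text representation is not a valid literal, which is the kind of rule-keeping a serialization module takes off your hands.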

# Pickle
Pickle is Python's built-in, Python-specific serialization module. It's got its drawbacks. First, pickled data can only be read by Python. Second, it's not secure: unpickling can execute arbitrary code, so a maliciously crafted pickle can run harmful commands while it is being loaded. Only unpickle data that comes from a source you trust.
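
The security issue can be shown with a harmless sketch. The class name `Sneaky` is made up for illustration; `__reduce__` is the standard hook that tells pickle how to rebuild an object, and here it is abused to make the loader call `print`. A real attack would call something far less friendly.

```python
import pickle

class Sneaky:
    def __reduce__(self):
        # Tell pickle: "to rebuild me, call print(...) with this argument".
        return (print, ("this ran during unpickling!",))

payload = pickle.dumps(Sneaky())

# Loading the payload calls print(...), so the message shows up here.
result = pickle.loads(payload)

# What we get back is just the return value of print, which is None.
print(result)
```

Nothing in `payload` looks dangerous from the outside, which is why loading pickles from unknown sources is a bad idea.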

Pickle saves data in binary format, therefore you must open files in binary mode.

In [3]:
import pickle

dog1 = {'name': 'Circó', 'age': 10, 'size': 'small'}
dog2 = {'name': 'Júpiter', 'age': 6, 'size': 'medium'}
dogs = (dog1, dog2)

with open('pickledData', 'wb') as o_f:
    pickle.dump(dogs, o_f)
    
with open('pickledData', 'rb') as o_f:
    obj = pickle.load(o_f)
    print(type(obj))
    print(obj)

<class 'tuple'>
({'name': 'Circó', 'age': 10, 'size': 'small'}, {'name': 'Júpiter', 'age': 6, 'size': 'medium'})
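
If you don't need a file at all, `pickle.dumps` and `pickle.loads` do the same job with a `bytes` object in memory. The sketch below reuses the dogs from the example above and also checks that the tuple comes back as a tuple.

```python
import pickle

dog1 = {'name': 'Circó', 'age': 10, 'size': 'small'}
dog2 = {'name': 'Júpiter', 'age': 6, 'size': 'medium'}
dogs = (dog1, dog2)

# Serialize to a bytes object instead of a file.
pickled_bytes = pickle.dumps(dogs)
print(type(pickled_bytes))  # <class 'bytes'>

# Deserialize from the bytes object.
restored = pickle.loads(pickled_bytes)
print(type(restored))    # <class 'tuple'>
print(restored == dogs)  # True
```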


# JSON
JSON stands for JavaScript Object Notation, and it has been widely adopted as a cross-platform data interchange format. It's lightweight and easy for both machines and humans to read.

Compared to Pickle, besides what has just been said, it's got the advantage of being safe to load, because no malicious code can be executed through it. It's got a disadvantage, though: it cannot represent every type of Python object, nor objects of user-defined classes.

In short, there are some situations in which you'll use Pickle and others where JSON will be your choice.

First let's talk about serializing data into JSON format. Below you can see two functions that do it: <code>json.dump</code> saves the JSON data into a file, and <code>json.dumps</code> returns a string with the JSON data. Notice that JSON files are <b>not</b> binary.

I'll set the <code>ensure_ascii</code> argument to <code>False</code> in the examples below because I don't want my dearest pets' names to get mangled: by default, non-ASCII characters such as ó are written as <code>\uXXXX</code> escape sequences.

In [11]:
import json

dog1 = {'name': 'Circó', 'age': 10, 'size': 'small'}
dog2 = {'name': 'Júpiter', 'age': 6, 'size': 'medium'}
dogs = (dog1, dog2)

# Saving to a file
with open('dogs.json', 'w') as o_f:
    json.dump(dogs, o_f, ensure_ascii=False)

# Getting a string
json_string = json.dumps(dogs, ensure_ascii=False)
print(json_string)

[{"name": "Circó", "age": 10, "size": "small"}, {"name": "Júpiter", "age": 6, "size": "medium"}]
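
To see what `ensure_ascii` actually changes, this small sketch serializes one of the names both ways. The variable names are just for illustration.

```python
import json

dog = {'name': 'Circó'}

# Default behaviour: non-ASCII characters are escaped.
escaped = json.dumps(dog)
print(escaped)   # {"name": "Circ\u00f3"}

# With ensure_ascii=False the character is kept as-is.
readable = json.dumps(dog, ensure_ascii=False)
print(readable)  # {"name": "Circó"}

# Both strings decode to the same Python object.
print(json.loads(escaped) == json.loads(readable))  # True
```

So the escaped form is not wrong, only harder on the eyes; any JSON parser turns it back into the original name.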


Want to make it friendlier to human eyes? Pass the <code>indent</code> argument.

In [32]:
import json

dog1 = {'name': 'Circó', 'age': 10, 'size': 'small'}
dog2 = {'name': 'Júpiter', 'age': 6, 'size': 'medium'}
dogs = (dog1, dog2)

# Saving to a file
with open('dogs_indent.json', 'w') as o_f:
    json.dump(dogs, o_f, ensure_ascii=False, indent=4)

# Getting a string
json_string = json.dumps(dogs, ensure_ascii=False, indent=4)
print(json_string)

[
    {
        "name": "Circó",
        "age": 10,
        "size": "small"
    },
    {
        "name": "Júpiter",
        "age": 6,
        "size": "medium"
    }
]


Now let's deserialize stuff. First from a file.

In [13]:
import json

with open('dogs_indent.json') as o_f:
    obj = json.load(o_f)
    print(type(obj))
    print(obj)

<class 'list'>
[{'name': 'Circó', 'age': 10, 'size': 'small'}, {'name': 'Júpiter', 'age': 6, 'size': 'medium'}]


Now let's deserialize data from a string that holds JSON.

In [20]:
import json

dogs = (
    {"name": "Circó", "age": 10, "size": "small"},
    {"name": "Júpiter", "age": 6, "size": "medium"},
)
jsoned_data = json.dumps(dogs, indent=4, ensure_ascii=False)
print(jsoned_data)
print('------------------')

python_obj = json.loads(jsoned_data)
print(type(python_obj))
print(python_obj)

[
    {
        "name": "Circó",
        "age": 10,
        "size": "small"
    },
    {
        "name": "Júpiter",
        "age": 6,
        "size": "medium"
    }
]
------------------
<class 'list'>
[{'name': 'Circó', 'age': 10, 'size': 'small'}, {'name': 'Júpiter', 'age': 6, 'size': 'medium'}]


Do you remember it was said that JSON does not support all types of Python objects? Well, if you paid attention you probably noticed that in the examples above the tuples came back as lists. In that case no exception was raised, only an undesirable type conversion. This is not always the case, though. Check what happens when you try to serialize a set.

In [22]:
import json

my_set = {1, 2, 3}
json_string = json.dumps(my_set)

TypeError: Object of type set is not JSON serializable
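
One possible way around this is the `default` argument of `json.dumps`, which is called for every object the encoder doesn't know how to handle. The helper name `encode_extra` below is made up, and turning a set into a sorted list is just one choice; keep in mind the data will come back as a list, not as a set.

```python
import json

def encode_extra(obj):
    # Called by json only for objects it cannot serialize itself.
    if isinstance(obj, set):
        return sorted(obj)
    # Anything else is still an error, as before.
    raise TypeError(f'Object of type {type(obj).__name__} is not JSON serializable')

my_set = {1, 2, 3}
json_string = json.dumps(my_set, default=encode_extra)
print(json_string)  # [1, 2, 3]

restored = json.loads(json_string)
print(type(restored))  # <class 'list'>
```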

What if we try to serialize a user defined class?

In [25]:
class MyClass():
    pass

my_obj = MyClass()
json_string = json.dumps(my_obj)

TypeError: Object of type MyClass is not JSON serializable
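
If all you need is the object's data, a rough workaround is to serialize its attribute dictionary and rebuild the object by hand. This is only a sketch: it assumes every attribute is itself JSON-friendly, and the attributes `a` and `b` are invented for the example.

```python
import json

class MyClass:
    def __init__(self, a, b):
        self.a = a
        self.b = b

my_obj = MyClass(1, 2)

# vars(obj) returns the object's attribute dictionary;
# json calls it because it can't serialize MyClass directly.
json_string = json.dumps(my_obj, default=vars)
print(json_string)  # {"a": 1, "b": 2}

# JSON knows nothing about MyClass, so rebuilding it is up to us.
restored = MyClass(**json.loads(json_string))
print(restored.a, restored.b)  # 1 2
```

The JSON text carries no hint of which class it came from, so the reading side has to know what to build.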

Out of the box, a user-defined class gets the same TypeError as the set did. Now let's check whether Pickle can handle it!

In [31]:
import pickle

class MyClass():
    def __init__(self, a, b):
        self.a = a
        self.b = b
        
    def __str__(self):
        return f'a is {self.a} and b is {self.b}'
        
my_obj = MyClass(1, 2)

with open('pickledData2', 'wb') as o_f:
    pickle.dump(my_obj, o_f)
    
with open('pickledData2', 'rb') as o_f:
    unpickled_obj = pickle.load(o_f)
    print(unpickled_obj)

a is 1 and b is 2
