# Serialization: Save it for later
*Serialization* refers to the process of outputting data (and occasionally functions) to a database or a regular file, for the purpose of using it later on. In the very early days of programming languages, this was normally done in regular text files. Python is excellent at text processing, and you probably already know enough to get started with this.

When accessing large amounts of data became important, people developed database software based around the Structured Query Language (SQL) standard. I'm not going to cover SQL here, but, if you're interested, I recommend using the [sqlite3](http://docs.python.org/2/library/sqlite3.html) module in the Python standard library.

As data interchange became important, the eXtensible Markup Language (XML) emerged. XML defines data formats that are easy to write parsers for, greatly reducing the ambiguity that sometimes arises in the process. Again, I'm not going to cover XML in depth here, but if you're interested in learning more, look into [Element Trees](http://docs.python.org/2/library/xml.etree.elementtree.html), now part of the Python standard library.

## pickle — Python object serialization

Python has a very general serialization format called **pickle** that can turn most Python objects into a representation that can be written to a file and read back later. Functions and classes can be pickled too, although they are stored by reference to their qualified name rather than by value. The `pickle` module implements binary protocols for serializing and de-serializing a Python object structure.

“Pickling” is the process whereby a Python object hierarchy is converted into a byte stream, and “unpickling” is the inverse operation, whereby a byte stream (from a binary file or bytes-like object) is converted back into an object hierarchy. Pickling (and unpickling) is alternatively known as “serialization”, “marshalling”, or “flattening”; however, to avoid confusion, the terms used here are “pickling” and “unpickling”.
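
The round trip described above can also be done entirely in memory with `pickle.dumps` and `pickle.loads`. The following is a minimal sketch; the sample dictionary is made up for illustration.

```python
import pickle

# A small object hierarchy: a dict holding a list, a set, and a tuple.
original = {
    "name": "lion",
    "colors": ["yellow", "brown"],
    "tags": {"cat", "wild"},
    "position": (1, 2),
}

# Pickling: object hierarchy -> byte stream.
payload = pickle.dumps(original)

# Unpickling: byte stream -> a new, equal object hierarchy.
restored = pickle.loads(payload)

print(type(payload).__name__)  # the byte stream is a bytes object
print(restored == original)    # same content
print(restored is original)    # but a separate copy
```

Unlike text-based formats, pickle keeps Python-specific types such as sets and tuples intact.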

Let's save a dictionary into a pickle file.

In [None]:
# Save a dictionary into a pickle file.
import pickle
favorite_color = { "lion": "yellow", "kitty": "red" }
# The with statement closes the file once the data is written.
with open("favorite_color.p", "wb") as f:
    pickle.dump(favorite_color, f)

In [None]:
# Load the dictionary back from the pickle file.
import pickle
del favorite_color # This command deletes the variable from memory.
with open("favorite_color.p", "rb") as f:
    favorite_color = pickle.load(f)
print(favorite_color)

Be aware that most, but not all, objects can be pickled. To learn more about `pickle`, check its [documentation](https://docs.python.org/3/library/pickle.html).
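
As a rough illustration of an object that cannot be pickled, the sketch below tries a generator, which holds live execution state. The helper `can_pickle` is a hypothetical name introduced here, not part of the `pickle` module.

```python
import pickle

def can_pickle(obj):
    """Hypothetical helper: True if pickle.dumps accepts obj, False otherwise."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, TypeError, AttributeError):
        return False

squares = (n * n for n in range(3))  # generators cannot be pickled

print(can_pickle({"lion": "yellow"}))
print(can_pickle(squares))
```

Open file handles, sockets, and similar objects tied to operating-system resources tend to fail in the same way.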


### Issues with pickle

<div class="alert alert-danger"> The pickle module is not secure. Only unpickle data you trust.

It is possible to construct malicious pickle data which will execute arbitrary code during unpickling. Never unpickle data that could have come from an untrusted source, or that could have been tampered with.

Consider signing data with [hmac](https://docs.python.org/3/library/hmac.html#module-hmac) if you need to ensure that it has not been tampered with.

Safer serialization formats such as [json](https://docs.python.org/3/library/json.html#module-json) may be more appropriate if you are processing untrusted data. See [Comparison with json](https://docs.python.org/3/library/pickle.html#comparison-with-json).
</div>
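
One possible way to apply the `hmac` suggestion is to prepend a signature to the pickled bytes and check it before unpickling. This is only a sketch: the key, the function names, and the "signature first, payload after" layout are assumptions made for this example, and it protects only against tampering by someone who does not know the key.

```python
import hashlib
import hmac
import pickle

SECRET_KEY = b"replace-with-a-real-secret"  # placeholder key (assumption)
DIGEST_SIZE = hashlib.sha256().digest_size  # 32 bytes for SHA-256

def sign_and_pickle(obj, key=SECRET_KEY):
    """Return the HMAC signature followed by the pickled bytes."""
    payload = pickle.dumps(obj)
    signature = hmac.new(key, payload, hashlib.sha256).digest()
    return signature + payload

def verify_and_unpickle(blob, key=SECRET_KEY):
    """Check the signature first; unpickle only if it matches."""
    signature, payload = blob[:DIGEST_SIZE], blob[DIGEST_SIZE:]
    expected = hmac.new(key, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(signature, expected):
        raise ValueError("signature mismatch: refusing to unpickle")
    return pickle.loads(payload)

blob = sign_and_pickle({"lion": "yellow"})
print(verify_and_unpickle(blob))

# Flip one bit in the payload to simulate tampering.
tampered = blob[:-1] + bytes([blob[-1] ^ 1])
try:
    verify_and_unpickle(tampered)
except ValueError as err:
    print(err)
```

`hmac.compare_digest` is used instead of `==` because it compares in constant time, which avoids leaking information through timing.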

## Other serialization methods:

### JSON

[JavaScript Object Notation](http://json.org/) (JSON) is a format that has become very popular over the past few years. [There's a module in the standard library](http://docs.python.org/3/library/json.html) for encoding and decoding JSON. The reason I like JSON so much is that it looks almost like Python, so that, unlike the other options, you can look at your data and edit it, use it in another program, etc.

Here's a little example:

In [None]:
# Data in a json format:
json_document = """\
{
    "a": [1,2,3],
    "b": [4,5,6],
    "greeting" : "Hello"
}"""
import json
json.loads(json_document)

Your data sits in something that looks like a Python dictionary, and in a single line of code, you can load it into a Python dictionary for use later.

In the same way, you can, with a single line of code, put a bunch of variables into a dictionary and serialize it to a JSON string with `json.dumps` (use `json.dump` to write directly to a file):

In [None]:
json.dumps({"a":[1,2,3],"b":[9,10,11],"greeting":"Hola"})
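
To write JSON to a file and read it back, `json.dump` and `json.load` take a file object. The sketch below uses a temporary directory and a made-up file name, `data.json`, so that it leaves nothing behind.

```python
import json
import os
import tempfile

data = {"a": [1, 2, 3], "b": [9, 10, 11], "greeting": "Hola"}

# A temporary directory keeps the sketch from leaving files behind.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "data.json")  # hypothetical file name

    # Serialize the dictionary to the file; indent=2 makes it easy to read.
    with open(path, "w", encoding="utf-8") as f:
        json.dump(data, f, indent=2)

    # Read the file back into a Python dictionary.
    with open(path, encoding="utf-8") as f:
        loaded = json.load(f)

print(loaded == data)
```

Keep in mind that JSON has fewer types than Python: a tuple, for example, is written as an array and comes back as a list.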

### YAML

YAML is a human-readable data-serialization language. It is designed as a superset of JSON, so most JSON documents can be parsed with a YAML parser. This works because JSON's semantic structure is equivalent to the optional "inline-style" of writing YAML. While extended hierarchies can be written in inline-style like JSON, this is not a recommended YAML style except when it aids clarity.

YAML has many additional features lacking in JSON, including comments, extensible data types, relational anchors, strings without quotation marks, and mapping types preserving key order.

<div class="alert alert-danger"> If I have to choose between YAML and JSON, YAML is the one to go for. </div>

YAML support is not part of the Python standard library, so we need to install a package (PyYAML) to use it.
```
conda install pyyaml
```

In [None]:
yaml_document = """
  a: 1
  b:
    c: 3
    d: 4
"""
import yaml
yaml.load(yaml_document, Loader=yaml.FullLoader)

In [None]:
yaml_document = """
  a: 1
  b:
    c: 3
    d: 4
"""
import yaml
yaml.dump(yaml.load(yaml_document, Loader=yaml.FullLoader))

For more information about YAML in Python, check the [PyYAML documentation](https://pyyaml.org/wiki/PyYAMLDocumentation#loading-yaml).

### XML

This section is a summary of the following [tutorial](https://www.tutorialspoint.com/python/python_xml_processing.htm)

XML is a portable, open format that lets applications exchange data with one another, regardless of operating system or development language.

The Extensible Markup Language (XML) is a markup language much like HTML or SGML. It is recommended by the World Wide Web Consortium and available as an open standard.

XML is extremely useful for keeping track of small to medium amounts of data without requiring a SQL-based backbone.

For the XML code examples, we will use a simple XML file, `data/movies.xml`, as input.
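
The contents of that file are not reproduced here. If you do not have it, the sketch below writes a small stand-in with the element and attribute names that the parsing code in this section looks for (`collection`, `shelf`, `movie`, `title`, `type`, `format`, `year`, `rating`, `stars`, `description`). All values are placeholders invented for illustration, not the original data.

```python
import os

# Placeholder data: the structure matches the tags used by the parsers
# in this section, but every value is made up.
sample_xml = """\
<collection shelf="Example Shelf">
  <movie title="Example Movie A">
    <type>Drama</type>
    <format>DVD</format>
    <year>2001</year>
    <rating>PG</rating>
    <stars>8</stars>
    <description>A placeholder description</description>
  </movie>
  <movie title="Example Movie B">
    <type>Comedy</type>
    <format>VHS</format>
    <year>1999</year>
    <rating>R</rating>
    <stars>6</stars>
    <description>Another placeholder description</description>
  </movie>
</collection>
"""

path = os.path.join("data", "movies.xml")
os.makedirs("data", exist_ok=True)

# Do not overwrite a real movies.xml if one is already there.
if not os.path.exists(path):
    with open(path, "w", encoding="utf-8") as f:
        f.write(sample_xml)
```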

#### Parsing XML with SAX APIs

SAX is a standard interface for event-driven XML parsing. Parsing XML with SAX generally requires you to create your own ContentHandler by subclassing xml.sax.ContentHandler.

Your ContentHandler handles the particular tags and attributes of your flavor(s) of XML. A ContentHandler object provides methods to handle various parsing events. Its owning parser calls ContentHandler methods as it parses the XML file.

In [None]:
import xml.sax

class MovieHandler( xml.sax.ContentHandler ):
    def __init__(self):
        super().__init__()
        self.CurrentData = ""
        self.type = ""
        self.format = ""
        self.year = ""
        self.rating = ""
        self.stars = ""
        self.description = ""

    # Call when an element starts
    def startElement(self, tag, attributes):
        self.CurrentData = tag
        if tag == "movie":
            print("*****Movie*****")
            title = attributes["title"]
            print("Title:", title)

    # Call when an element ends
    def endElement(self, tag):
        if self.CurrentData == "type":
            print("Type:", self.type)
        elif self.CurrentData == "format":
            print("Format:", self.format)
        elif self.CurrentData == "year":
            print("Year:", self.year)
        elif self.CurrentData == "rating":
            print("Rating:", self.rating)
        elif self.CurrentData == "stars":
            print("Stars:", self.stars)
        elif self.CurrentData == "description":
            print("Description:", self.description)
        self.CurrentData = ""

    # Call when a chunk of character data is read
    def characters(self, content):
        if self.CurrentData == "type":
            self.type = content
        elif self.CurrentData == "format":
            self.format = content
        elif self.CurrentData == "year":
            self.year = content
        elif self.CurrentData == "rating":
            self.rating = content
        elif self.CurrentData == "stars":
            self.stars = content
        elif self.CurrentData == "description":
            self.description = content

In [None]:
# create an XMLReader
parser = xml.sax.make_parser()
# turn off namespaces
parser.setFeature(xml.sax.handler.feature_namespaces, 0)

# override the default ContentHandler
handler = MovieHandler()
parser.setContentHandler(handler)

parser.parse("data/movies.xml")
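
One detail worth knowing: a SAX parser may deliver the text of a single element in several chunks, so `characters` can be called more than once per element. A more defensive pattern is to collect the chunks and join them in `endElement`. The sketch below shows this for a single tag; the class `YearHandler` and the inline document are made up for illustration.

```python
import xml.sax

class YearHandler(xml.sax.ContentHandler):
    """Hypothetical handler: collects the text of every <year> element."""

    def __init__(self):
        super().__init__()
        self.in_year = False
        self.chunks = []
        self.years = []

    def startElement(self, tag, attributes):
        if tag == "year":
            self.in_year = True
            self.chunks = []

    def characters(self, content):
        # May be called several times for one element, so append.
        if self.in_year:
            self.chunks.append(content)

    def endElement(self, tag):
        if tag == "year":
            self.years.append("".join(self.chunks).strip())
            self.in_year = False

document = (
    b"<collection>"
    b"<movie><year>2001</year></movie>"
    b"<movie><year>1999</year></movie>"
    b"</collection>"
)

year_handler = YearHandler()
xml.sax.parseString(document, year_handler)
print(year_handler.years)
```

`xml.sax.parseString` parses from memory, which is convenient for experimenting without a file on disk.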

For complete details on the SAX API, please refer to the standard [Python SAX APIs](http://docs.python.org/library/xml.sax.html) documentation.

#### Parsing XML with DOM APIs

The Document Object Model ("DOM") is a cross-language API from the World Wide Web Consortium (W3C) for accessing and modifying XML documents.

The DOM is extremely useful for random-access applications. SAX only allows you a view of one bit of the document at a time. If you are looking at one SAX element, you have no access to another.

Here is the easiest way to quickly load an XML document and to create a minidom object using the xml.dom module. The minidom object provides a simple parser method that quickly creates a DOM tree from the XML file.

The sample code calls the `parse(file[, parser])` function of the `xml.dom.minidom` module to parse the XML file designated by `file` into a DOM tree object.


In [None]:
import xml.dom.minidom

# Open XML document using minidom parser
DOMTree = xml.dom.minidom.parse("data/movies.xml")
collection = DOMTree.documentElement
if collection.hasAttribute("shelf"):
    print("Root element : %s" % collection.getAttribute("shelf"))

# Get all the movies in the collection
movies = collection.getElementsByTagName("movie")

# Print detail of each movie.
for movie in movies:
    print("*****Movie*****")
    if movie.hasAttribute("title"):
        print(f"Title: {movie.getAttribute('title')}")

    # Names chosen to avoid shadowing the built-ins type() and format()
    movie_type = movie.getElementsByTagName('type')[0]
    print(f"Type: {movie_type.childNodes[0].data}")
    movie_format = movie.getElementsByTagName('format')[0]
    print(f"Format: {movie_format.childNodes[0].data}")
    rating = movie.getElementsByTagName('rating')[0]
    print(f"Rating: {rating.childNodes[0].data}")
    description = movie.getElementsByTagName('description')[0]
    print(f"Description: {description.childNodes[0].data}")
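
The loop above indexes `[0]` directly, so it raises an `IndexError` if a movie lacks one of the tags. The sketch below shows one way to guard against that and illustrates the random access that DOM offers. The helper `child_text` and the inline document are assumptions made for this example.

```python
from xml.dom.minidom import parseString

# Made-up document: each movie is missing a different tag on purpose.
document = """<collection shelf="Example Shelf">
  <movie title="Example Movie A"><type>Drama</type></movie>
  <movie title="Example Movie B"><rating>PG</rating></movie>
</collection>"""

def child_text(element, tag, default=None):
    """Hypothetical helper: text of the first <tag> descendant, or default."""
    nodes = element.getElementsByTagName(tag)
    if not nodes or nodes[0].firstChild is None:
        return default
    return nodes[0].firstChild.data

dom = parseString(document)
movies = dom.documentElement.getElementsByTagName("movie")

# Random access: go straight to the second movie.
second = movies[1]
print(second.getAttribute("title"))
print(child_text(second, "rating"))
print(child_text(second, "type", default="unknown"))
```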

For complete details on the DOM API, please refer to the standard [Python DOM APIs](http://docs.python.org/library/xml.dom.html) documentation.