# Data Serialisation

Data serialisation is the process of flattening complex data structures into a format that can be easily stored, transferred, or shared with another program. Python provides several ways to do this.

Note: When using serialisation methods like Pickle, be aware that deserialising data from untrusted sources can be dangerous. Pickle can execute arbitrary code during deserialisation, so only deserialise data from trusted sources.
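To make the warning above concrete, here is a minimal sketch of how unpickling can trigger a function call. The `Payload` class is hypothetical and exists only for this illustration. It uses the standard `__reduce__` hook to tell Pickle to call the harmless built-in `print()` when the data is loaded. A malicious pickle could name a far more harmful callable in the same way.

```python
import pickle


class Payload:
    """Hypothetical class, used only to illustrate the risk."""

    def __reduce__(self):
        # Tell pickle: "to rebuild this object, call print(...)".
        return (print, ("This message was printed during unpickling",))


# Serialise an instance to bytes in memory.
blob = pickle.dumps(Payload())

# Loading the bytes calls print(...); the result is whatever that call returns.
result = pickle.loads(blob)
print("Value returned by pickle.loads():", result)
```

Note that `pickle.loads()` never needed the `Payload` class to run the call, which is why inspecting the receiving program's own code does not make untrusted pickle data safe.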

## Serialisation with JSON
The built-in `json` module provides an interface for converting Python objects into JavaScript Object Notation (JSON) format. 

In [None]:
import json

First, we look at a simple example of serialising a dictionary to disk with JSON.

In [None]:
capitals = {"Ireland":"Dublin", "France":"Paris", "Spain":"Madrid", "Italy": "Rome"} 

The code below saves the Python dictionary `capitals` to a file named *example1.json* in JSON format. The `json.dump()` function writes the object directly to a file, with two optional arguments specified:

- `indent=2`: Formats the JSON output with an indentation of 2 spaces, making it human-readable.
- `ensure_ascii=False`: Ensures that non-ASCII characters (e.g. accented letters) are preserved in their original form, rather than being escaped.
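The effect of these two arguments is easiest to see on a string in memory. The sketch below uses `json.dumps()`, the string-returning counterpart of `json.dump()`, on a small made-up dictionary (`sample` is not part of the original example) and compares the default output with the formatted one.

```python
import json

# Made-up sample data containing a non-ASCII character.
sample = {"city": "Málaga"}

# Default settings: a single line, with non-ASCII characters escaped.
compact = json.dumps(sample)

# Same arguments as used with json.dump() in this section.
readable = json.dumps(sample, indent=2, ensure_ascii=False)

print(compact)
print(readable)
```

Both strings decode back to the same dictionary, so the arguments only affect how the file looks, not what it contains.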

In [None]:
try:
    with open("example1.json", "w", encoding="utf-8") as fout:
        json.dump(capitals, fout, indent=2, ensure_ascii=False)
    print("Successfully saved data to example1.json")
except IOError as e:
    print(f"File Error: Could not write file - {e}")
except Exception as e:
    print(f"Unexpected error: {e}")

To deserialise, we open the file for text reading and then use the `json.load()` function to reconstruct a copy of the original dictionary:

In [None]:
try:
    # open the file, then load and parse the json data
    with open("example1.json", "r", encoding="utf-8") as fin:
        values = json.load(fin)
    print("Successfully loaded data from example1.json")
except IOError as e:
    print(f"Error: Could not read file - {e}")
except Exception as e:
    print(f"Unexpected error: {e}")

The reconstructed dictionary is equal to the one we originally created. Since Python 3.7, dictionaries preserve insertion order, and `json.load()` keeps the key order found in the file, so the pairs also appear in their original order.
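The round trip is exact here because the dictionary contains only strings. JSON has fewer types than Python, so some values come back in a different form. The following sketch, using made-up data, shows two common cases: tuples are written as JSON arrays and return as lists, and non-string keys are converted to strings.

```python
import json

# Made-up data with a tuple value and an integer key.
original = {"point": (1, 2), 1: "one"}

# Serialise to a JSON string and parse it straight back.
restored = json.loads(json.dumps(original))

print(restored)              # {'point': [1, 2], '1': 'one'}
print(restored == original)  # False
```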

In [None]:
print("Reconstructed data:", values)
print("Data type:", type(values))

## Serialisation with Pickle

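The built-in `pickle` module is Python's native mechanism for serialising data. Unlike JSON, Pickle can serialise almost any Python object, including nested structures and instances of custom classes. Classes and functions themselves are pickled by reference (their module and name), so the same definitions must be importable when the data is loaded. The resulting format is binary and specific to Python.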
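As a rough illustration of the difference, the sketch below builds a made-up `record` containing a set, a date, and a tuple. Pickle round-trips it unchanged, while `json.dumps()` raises a `TypeError` because JSON has no representation for a set.

```python
import datetime
import json
import pickle

# Made-up record with types that JSON cannot represent directly.
record = {
    "visited": {"Dublin", "Paris"},
    "when": datetime.date(2024, 5, 1),
    "pair": (1, 2),
}

# Pickle round trip in memory: the types survive unchanged.
restored = pickle.loads(pickle.dumps(record))
print("Pickle round trip equal:", restored == record)

# The same data is rejected by the json module.
try:
    json.dumps(record)
    json_error = None
except TypeError as e:
    json_error = e
    print(f"JSON Error: {e}")
```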

In [None]:
import pickle

As before, we start with a simple example of serialising a dictionary to disk with Pickle.

In [None]:
capitals = {"Ireland":"Dublin", "France":"Paris", "Spain":"Madrid", "Italy": "Rome"} 
capitals

We open a file for binary writing (`"wb"`). The simplest usage of Pickle involves calling the `pickle.dump()` function.  
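The file must be opened in binary mode because Pickle produces bytes rather than text. A small sketch with `pickle.dumps()`, which returns the serialised bytes instead of writing them to a file, makes this visible. The `protocol` argument is optional; passing `pickle.HIGHEST_PROTOCOL` here is just one possible choice.

```python
import pickle

capitals = {"Ireland": "Dublin", "France": "Paris", "Spain": "Madrid", "Italy": "Rome"}

# Serialise to a bytes object in memory instead of a file.
blob = pickle.dumps(capitals, protocol=pickle.HIGHEST_PROTOCOL)
print("Serialised type:", type(blob))
print("Size in bytes:", len(blob))

# pickle.loads() is the in-memory counterpart of pickle.load().
restored = pickle.loads(blob)
print("Round trip equal:", restored == capitals)
```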

In [None]:
try:
    # open the file and serialise the data with Pickle
    with open("example1.pkl", "wb") as fout:
        pickle.dump(capitals, fout)
    print("Successfully saved data to example1.pkl")
except IOError as e:
    print(f"File Error: Could not write to example1.pkl - {e}")
except Exception as e:
    print(f"Unexpected error: {e}")

To deserialise, we first open the file for binary reading (`"rb"`) and reconstruct the original dictionary using `pickle.load()`:

In [None]:
try:
    # open the file and load the serialised data
    with open("example1.pkl", "rb") as fin:
        pickle_values = pickle.load(fin)
    print("Successfully loaded data from example1.pkl")
except pickle.UnpicklingError as e:
    print(f"Pickle Error: Could not unpickle data - {e}")
    pickle_values = None
except IOError as e:
    print(f"File Error: Could not read example1.pkl - {e}")
    pickle_values = None
except Exception as e:
    print(f"Unexpected error: {e}")
    pickle_values = None

We can see that Pickle has reconstructed the original dictionary:

In [None]:
print("Pickle reconstructed data:", pickle_values)
print("Data type:", type(pickle_values))

We can apply the same process for more complex, nested data structures.

In [None]:
cities = ["London", "Paris", "Madrid"]
teams = ["Chelsea", "PSG", "Real Madrid"]
data = [cities, teams]

try:
    # open the file and serialise the nested data
    with open("example2.pkl", "wb") as fout:
        pickle.dump(data, fout)
    print("Successfully saved nested data to example2.pkl")
except IOError as e:
    print(f"Error saving data: {e}")
except Exception as e:
    print(f"Unexpected error: {e}")

In [None]:
# Load and verify the nested data
try:
    with open("example2.pkl", "rb") as fin:
        backup = pickle.load(fin)
    print("Successfully loaded nested data from example2.pkl")
except IOError as e:
    print(f"Error loading data: {e}")
except Exception as e:
    print(f"Unexpected error: {e}")

In [None]:
print("Complete backup data:", backup)
print("Cities data:", backup[0])
print("Teams data:", backup[1])
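It is worth being clear about what "reconstructed" means for nested data. The loaded structure is a new, independent copy that is equal to the original, and references shared inside a single pickle are preserved. The sketch below illustrates this in memory with a made-up list that refers to the same inner list twice.

```python
import pickle

cities = ["London", "Paris", "Madrid"]

# The outer list refers to the same inner list object twice.
shared = [cities, cities]

restored = pickle.loads(pickle.dumps(shared))

print(restored == shared)           # True: same contents
print(restored[0] is cities)        # False: a new object was created
print(restored[0] is restored[1])   # True: the shared reference is kept
```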

A further example, with a more complex data structure: two dictionaries inside a single list.

In [None]:
book1 = {
    "id" : "978-1933988177",
    "category" : ["book","paperback"],
    "name" : "Lucene in Action",
    "author" : "Michael McCandless",
    "genre" : "technology",
    "price" : 30.50,
    "pages" : 475 }

book2 = {
    "id" : "978-1857995879",
    "category" : ["book","paperback"],
    "name" : "Sophie's World",
    "author" : "Jostein Gaarder",
    "sequence_i" : 1,
    "genre" : "fiction",
    "price" : 3.07,
    "pages" : 64 }

In [None]:
books = [book1, book2]
books

In [None]:
# serialise the list of dictionaries
try:
    with open("example3.pkl", "wb") as fout:
        pickle.dump(books, fout)
    print(f"Successfully saved {len(books)} book records to example3.pkl")
except IOError as e:
    print(f"Error saving data: {e}")
except Exception as e:
    print(f"Unexpected error: {e}")

Again, Pickle can reconstruct the original data structure from the serialised version on disk:

In [None]:
# load and verify the book data
try:
    with open("example3.pkl", "rb") as fin:
        book_backup = pickle.load(fin)
    print(f"Successfully loaded {len(book_backup)} book records")
except IOError as e:
    print(f"Error loading data: {e}")
except Exception as e:
    print(f"Unexpected error: {e}")

In [None]:
print("All book data:", book_backup)

In [None]:
# show information for the first book
first_book = book_backup[0]
for key, value in first_book.items():
    print(f"{key}: {value}")