# Redis 

[Redis](https://redis.io) is an open source, in-memory data structure store. It stores values of several different types, each associated with a given key. In your stack, you will use Redis for two purposes. First, it will serve as a cache for persisting objects beyond the lifespan of a Python process or between Jupyter Notebooks. Second, you will use Redis as a message broker to perform delayed job processing from your notebooks using the Python library `rq`.


## Serialization

A central task in the workflow of any data scientist is the storage, transmission, and reconstruction of data structures and object states. This process is known as serialization. It is a well-solved problem, and you will have several tools at your disposal to manage it. In this chapter, you will look at serialization in two forms: converting objects in memory to a binary representation, and writing them out as text using the popular JSON format. Later, you will see a second text-based serialization format, YAML.

You will be serializing and deserializing primarily to share objects and data across processes. Here, you will cache objects in Redis so that they can be used in a separate notebook or process.

### Serialization Formats and Methods

This book places an emphasis on working in Python. As such, it will focus on two Python-specific methods for serializing data: pickling and serializing via bytestring. In addition, you will look at appropriate uses for two text-based approaches to serializing data: JSON and YAML. JSON (JavaScript Object Notation) is a text format derived from the object literal syntax of the JavaScript programming language. It has been adopted by the programming community as a human-readable, language-agnostic approach to serialization. YAML is an alternative solution to the same problem.

Both JSON and YAML support the standard primitive data types: integers, floating-point numbers, Booleans, null values, and strings. For building larger structures, both make use of the associative array, often called the dictionary, and the ordered list, also known as the array, the vector, or the sequence. A dictionary holds data using key-value pairs; a list holds data using a numerical index. The two formats differ mainly in syntax.

JSON makes use of nested braces and brackets to define data structures.

    {"this_json" : "is a JSON object",
     "a nested object" : {
      "obj_id" : 123,
      "object value" : "temperamental",
      "is_nested" : true
      },
     "a list": [1,2,3,4],
     "a list of strings": ["green eggs", "ham"],
     "last_used" : null
    }

YAML achieves the same purpose using indentation and whitespace.

    this_yaml: is a YAML object
    a_nested_object:
      obj_id: 123
      object_value: 'temperamental'
      is_nested: true
    a_list: 
      - 1 
      - 2
      - 3
      - 4 
    a_list_of_strings:
      - green eggs
      - ham
    last_used: null

Note that in each of these examples, none of the keys has any syntactic meaning; they are arbitrary names chosen for illustration.
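
As a minimal sketch, the JSON example above might be produced and read back in Python with the standard-library `json` module. The variable names `record`, `encoded`, and `decoded` are illustrative only and do not come from the text.

```python
import json

# A Python dictionary mirroring the JSON example in this section.
record = {
    "this_json": "is a JSON object",
    "a nested object": {
        "obj_id": 123,
        "object value": "temperamental",
        "is_nested": True,
    },
    "a list": [1, 2, 3, 4],
    "a list of strings": ["green eggs", "ham"],
    "last_used": None,
}

# Serialize to a JSON-formatted string (text, not bytes).
encoded = json.dumps(record)

# Deserialize the text back into Python objects.
decoded = json.loads(encoded)

# Python's True and None are written as true and null in the JSON text
# and are restored to True and None on load.
print(encoded)
print(decoded == record)
```

The round trip works here because the dictionary contains only strings, numbers, Booleans, null values, lists, and nested dictionaries, which are the types JSON can represent.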

### Binary Encoding in Python

The Python `pickle` module is the standard way to serialize Python objects and data to binary byte streams. There are a few fundamental differences between pickling data and serializing using JSON or YAML. As noted, both JSON and YAML are human readable. An object converted to a byte stream is not. JSON and YAML serialized objects can be read by a process written in virtually any language, while a pickled object can only be read in Python. Because pickling does not have to be concerned with interoperability, a wide variety of Python objects can be pickled, whereas JSON and YAML are limited to dictionaries, lists, and the primitive types described above. For the data scientist, the objects that can be pickled include, but are not limited to, the `numpy` array, the `pandas` DataFrame, and the `sklearn` model. Over the next chapter, you will explore a variety of methods for encoding data to a binary byte stream using Python.
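
The contrast described above can be sketched with the standard library alone. The `payload` value below is a hypothetical example, chosen because it contains types (a set, a tuple, and a complex number) that JSON has no direct representation for.

```python
import json
import pickle

# A value mixing types that JSON cannot represent directly.
payload = {"tags": {"a", "b"}, "shape": (3, 4), "z": 1 + 2j}

# pickle.dumps returns a bytes object, not human-readable text.
blob = pickle.dumps(payload)

# pickle.loads reconstructs an equal object with its types preserved.
restored = pickle.loads(blob)

# The json module rejects the same value with a TypeError.
try:
    json.dumps(payload)
    json_ok = True
except TypeError:
    json_ok = False

print(type(blob).__name__)
print(restored == payload)
print(json_ok)
```

The same `dumps` and `loads` pair is the sort of call one would use before storing an object in a cache and after retrieving it.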