# Working with JSON Files through Python

## Import Module

In [37]:
import json

## JSON File Example

JSON supports primitive types like strings and numbers, as well as nested lists and objects.

```json
{
    "firstName": "Jane",
    "lastName": "Doe",
    "hobbies": ["running", "sky diving", "singing"],
    "age": 35,
    "children": [
        {
            "firstName": "Alice",
            "age": 6
        },
        {
            "firstName": "Bob",
            "age": 8
        }
    ]
}
```

### JSON Vocabulary

The process of encoding JSON is usually called serialization. This term refers to the transformation of data into a series of bytes (hence serial) to be stored or transmitted across a network. You may also hear the term marshaling, but that’s a whole other discussion. Naturally, deserialization is the reciprocal process of decoding data that has been stored or delivered in the JSON standard.


### Serializing JSON

What happens after a computer processes lots of information? It needs to take a data dump. Accordingly, the json library exposes the dump() method for writing data to files. There is also a dumps() method (pronounced as “dump-s”) for writing to a Python string.

### Serialization Translations

| Python           | JSON   |
|------------------|--------|
| dict             | object |
| list, tuple      | array  |
| str              | string |
| int, float       | number |
| True             | true   |
| False            | false  |
| None             | null   |
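
As a quick check of the table above, the following sketch serializes one value of each listed Python type with `json.dumps()`. The `sample` dictionary and its keys are made up for illustration.

```python
import json

# One illustrative value per Python type from the table (keys are arbitrary).
sample = {
    "dict": {"nested": 1},
    "list": [1, 2],
    "tuple": (3, 4),   # tuples become JSON arrays, just like lists
    "str": "text",
    "int": 7,
    "float": 1.5,
    "true": True,
    "false": False,
    "none": None,
}

encoded = json.dumps(sample)
print(encoded)
```

Note how the tuple is written as an array and `True`, `False`, and `None` become `true`, `false`, and `null`.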

## Serialization

Suppose you have a Python object like the following:

In [38]:
data = {
    "pokemon" : {
        "name" : "charizard",
        "type1" : "fire",
        "type2" : "flying"
    }
}

You need to write the data to a file. Using Python's context manager, you can create a file called `save_file.json` and open it in write mode.

>`json.dump = dump(obj, fp, *, skipkeys=False, ensure_ascii=True, check_circular=True, allow_nan=True, cls=None, indent=None, separators=None, default=None, sort_keys=False, **kw)`
>
>Serialize ``obj`` as a JSON formatted stream to ``fp`` (a ``.write()``-supporting file-like object).

In [39]:
with open("save_file.json", "w") as write_file:
    json.dump(data, write_file, indent=2)

Or, if you want to continue using the serialized JSON data in the program, you can write it to a native Python `str` object.

>`json.dumps = dumps(obj, *, skipkeys=False, ensure_ascii=True, check_circular=True, allow_nan=True, cls=None, indent=None, separators=None, default=None, sort_keys=False, **kw)`
>
>Serialize ``obj`` to a JSON formatted ``str``.

Note that no file-like object is involved here, since you aren't writing to disk.

In [40]:
json_string = json.dumps(data)
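
To see how `dump()` and `dumps()` relate, here is a minimal sketch that writes to an in-memory `io.StringIO` buffer instead of a real file (the buffer stands in for the file object; that substitution is an assumption made to keep the example self-contained). With the same arguments, both calls should produce the same text.

```python
import io
import json

data = {
    "pokemon": {
        "name": "charizard",
        "type1": "fire",
        "type2": "flying",
    }
}

# StringIO supports .write(), so json.dump() accepts it like a file.
buffer = io.StringIO()
json.dump(data, buffer)

# dumps() returns the serialized text directly as a str.
json_string = json.dumps(data)

print(buffer.getvalue() == json_string)
```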

## Useful Keywords

`json.dumps(data, indent=4)` | The `indent` keyword argument is used to specify the indentation size for nested structures.

`json.dumps(data, separators=(", " , ": "))` | The `separators` keyword argument determines which strings separate items and which separate keys from values. It accepts a 2-tuple `(item_separator, key_separator)`. The default is `(", ", ": ")` when `indent` is `None` and `(",", ": ")` otherwise.
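
The effect of both keywords is easiest to see side by side. This sketch reuses a shortened version of the Pokémon data; the variable names `compact` and `pretty` are illustrative.

```python
import json

data = {"pokemon": {"name": "charizard", "type1": "fire"}}

# No whitespace at all: handy for minimizing payload size.
compact = json.dumps(data, separators=(",", ":"))
print(compact)

# Four-space indentation: handy for human readers.
pretty = json.dumps(data, indent=4)
print(pretty)
```

Both strings decode back to the same Python object; only the whitespace differs.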

More in the [JSON docs](https://docs.python.org/3/library/json.html#basic-usage)

## Deserialization

Technically, this conversion isn’t a perfect inverse to the serialization table. That basically means that if you encode an object now and then decode it again later, you may not get exactly the same object back.

The simplest example would be encoding a tuple and getting back a list after decoding like so:

```python
>>> blackjack_hand = (8, "Q")
>>> encoded_hand = json.dumps(blackjack_hand)
>>> decoded_hand = json.loads(encoded_hand)

>>> blackjack_hand == decoded_hand
False
>>> type(blackjack_hand)
<class 'tuple'>
>>> type(decoded_hand)
<class 'list'>
>>> blackjack_hand == tuple(decoded_hand)
True
```

### Deserialization Translations

| JSON          | Python |
|---------------|--------|
| object        | dict   |
| array         | list   |
| string        | str    |
| number (int)  | int    |
| number (real) | float  |
| true          | True   |
| false         | False  |
| null          | None   |

This time, imagine you’ve got some data stored on disk that you’d like to manipulate in memory. You’ll still use the context manager, but this time you’ll open up the existing `save_file.json` in read mode.

>`load(fp, *, cls=None, object_hook=None, parse_float=None, parse_int=None, parse_constant=None, object_pairs_hook=None, **kw)`
>
>Deserialize ``fp`` (a ``.read()``-supporting file-like object containing a JSON document) to a Python object.

In [41]:
with open("save_file.json", "r") as read_file:
    data = json.load(read_file)

Things are pretty straightforward here, but keep in mind that this method could return any of the allowed data types from the conversion table. This is only important if you’re loading in data you haven’t seen before. In most cases, the root object will be a dict or a list.
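
To illustrate that the root value is not always a dict or a list, this sketch decodes several small JSON documents with `json.loads()` and records the type of each result. The `documents` list is made up for the example.

```python
import json

# Each string is a complete, valid JSON document with a different root type.
documents = ['{"a": 1}', '[1, 2]', '"text"', '42', '3.14', 'true', 'null']

root_types = [type(json.loads(doc)).__name__ for doc in documents]
print(root_types)
# ['dict', 'list', 'str', 'int', 'float', 'bool', 'NoneType']
```

When the source of the data is unknown, an `isinstance()` check on the result before indexing into it can avoid surprises.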

If you’ve pulled JSON data in from another program or have otherwise obtained a string of JSON formatted data in Python, you can easily deserialize that with loads(), which naturally loads from a string:

>`json.loads = loads(s, *, cls=None, object_hook=None, parse_float=None, parse_int=None, parse_constant=None, object_pairs_hook=None, **kw)`
>
>Deserialize ``s`` (a ``str``, ``bytes`` or ``bytearray`` instance containing a JSON document) to a Python object.

In [42]:
json_string = """
{
    "researcher": {
        "name": "Ford Prefect",
        "species": "Betelgeusian",
        "relatives": [
            {
                "name": "Zaphod Beeblebrox",
                "species": "Betelgeusian"
            }
        ]
    }
}
"""
data = json.loads(json_string)

# Real World Example

For your introductory example, you’ll use [JSONPlaceholder](https://jsonplaceholder.typicode.com/), a great source of fake JSON data for practice purposes.

First create a script file called scratch.py, or whatever you want. I can’t really stop you.

You’ll need to make an API request to the JSONPlaceholder service, so just use the [requests](https://docs.python-requests.org/en/master/) package to do the heavy lifting. Add these imports at the top of your file:

In [43]:
import json
import requests

Go ahead and make a request to the JSONPlaceholder API for the /todos endpoint. If you’re unfamiliar with requests, there’s actually a handy json() method that will do all of the work for you, but you can practice using the json library to deserialize the text attribute of the response object. It should look something like this:

In [44]:
response = requests.get("https://jsonplaceholder.typicode.com/todos")
todos = json.loads(response.text)
# print(todos)

There are multiple users, each with a unique userId, and each task has a Boolean completed property. Can you determine which users have completed the most tasks?

In [45]:
# Map of userId to number of complete TODOs for that user
todos_by_user = {}

# Increment complete TODOs count for each user.
for todo in todos:
    if todo["completed"]:
        try:
            # Increment the existing user's count.
            todos_by_user[todo["userId"]] += 1
        except KeyError:
            # This user has not been seen. Set their count to 1.
            todos_by_user[todo["userId"]] = 1

# Create a sorted list of (userId, num_complete) pairs.
top_users = sorted(todos_by_user.items(), 
                   key=lambda x: x[1], reverse=True)

# Get the maximum number of complete TODOs.
max_complete = top_users[0][1]

# Create a list of all users who have completed the maximum number of TODOs.
users = []
for user, num_complete in top_users:
    if num_complete < max_complete:
        break
    users.append(str(user))

max_users = " and ".join(users)
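
The same counting can be written more compactly with `collections.Counter`. This is an alternative sketch, not the approach used above, and it runs on a small hand-written `sample_todos` list (hypothetical data shaped like the API response) so it works without network access. It assumes at least one completed TODO exists, otherwise `max()` would raise a `ValueError`.

```python
from collections import Counter

# Hypothetical sample shaped like the JSONPlaceholder /todos response.
sample_todos = [
    {"userId": 1, "id": 1, "completed": True},
    {"userId": 1, "id": 2, "completed": True},
    {"userId": 2, "id": 3, "completed": False},
    {"userId": 2, "id": 4, "completed": True},
    {"userId": 3, "id": 5, "completed": True},
    {"userId": 3, "id": 6, "completed": True},
]

# Count completed TODOs per userId.
completed_by_user = Counter(
    todo["userId"] for todo in sample_todos if todo["completed"]
)

# Find the highest count and every user who reached it.
top_count = max(completed_by_user.values())
top_user_ids = [
    str(user) for user, count in completed_by_user.items() if count == top_count
]

print(f"user(s) {' and '.join(top_user_ids)} completed {top_count} TODOs")
```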

Build a list of the **completed** TODOs that belong to the users with the maximum count, and write the resulting list to a file.

In [46]:
# Define a function that keeps only the completed TODOs of users with the max completed count.
def keep(todo):
    is_complete = todo["completed"]
    has_max_count = str(todo["userId"]) in users
    return is_complete and has_max_count

# Write filtered TODOs to file.
with open("filtered_data_file.json", "w") as data_file:
    filtered_todos = list(filter(keep, todos))
    json.dump(filtered_todos, data_file, indent=2)

# Encoding and Decoding Custom Python Objects

What happens when we try to serialize the Elf class from that Dungeons & Dragons app you’re working on?

In [47]:
class Elf:
    def __init__(self, level, ability_scores=None):
        self.level = level
        self.ability_scores = {
            "str": 11, "dex": 12, "con": 10,
            "int": 16, "wis": 14, "cha": 13
        } if ability_scores is None else ability_scores
        self.hp = 10 + self.ability_scores["con"]

Not so surprisingly, Python complains that Elf isn’t serializable.
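
The failure can be reproduced with a few lines. This sketch uses a stripped-down `Elf` with only a `level` attribute (a simplification of the class above) and catches the exception so the message can be inspected. The exact wording of the message may differ slightly between Python versions.

```python
import json


class Elf:
    """Stripped-down stand-in for the Elf class above."""

    def __init__(self, level):
        self.level = level


elf = Elf(level=4)

try:
    json.dumps(elf)
except TypeError as error:
    # json does not know how to encode arbitrary class instances.
    message = str(error)
    print(message)
```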

Although the json module can handle most built-in Python types, it doesn’t understand how to encode customized data types by default. It’s like trying to fit a square peg in a round hole.

## Simplifying Data Structures

Now, the question is how to deal with more complex data structures. Well, you could try to encode and decode the JSON by hand, but there’s a slightly more clever solution that’ll save you some work. Instead of going straight from the custom data type to JSON, you can throw in an intermediary step.

All you need to do is represent your data in terms of the built-in types json already understands. Essentially, you translate the more complex object into a simpler representation, which the json module then translates into JSON. It’s like the transitive property in mathematics: if A = B and B = C, then A = C.
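
For a plain class like `Elf`, one possible intermediary is the dictionary of instance attributes returned by the built-in `vars()`. The sketch below assumes every attribute is itself a built-in, JSON-friendly type, which holds for `Elf` but not for arbitrary classes.

```python
import json


class Elf:
    def __init__(self, level, ability_scores=None):
        self.level = level
        self.ability_scores = {
            "str": 11, "dex": 12, "con": 10,
            "int": 16, "wis": 14, "cha": 13
        } if ability_scores is None else ability_scores
        self.hp = 10 + self.ability_scores["con"]


elf = Elf(level=4)

# Step 1: custom object -> dict of built-in types.
simple = vars(elf)

# Step 2: dict -> JSON text.
encoded = json.dumps(simple)

# Going back: JSON text -> dict -> custom object.
decoded = json.loads(encoded)
restored = Elf(decoded["level"], decoded["ability_scores"])

print(restored.level, restored.hp)
```

Only `level` and `ability_scores` are passed back to the constructor, because `hp` is derived from the ability scores.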

## Complex Number Example

To get the hang of this, you’ll need a complex object to play with. You could use any custom class you like, but Python has a built-in type called complex for representing complex numbers, and it isn’t serializable by default. So, for the sake of these examples, your complex object is going to be a complex object. Confused yet?

```python
>>> z = 3 + 8j
>>> type(z)
# OUT: <class 'complex'>
>>> json.dumps(z)
# OUT: TypeError: Object of type 'complex' is not JSON serializable
```

A good question to ask yourself when working with custom types is: *What is the minimum amount of information necessary to recreate this object?* In the case of complex numbers, you only need to know the real and imaginary parts, both of which you can access as attributes on the complex object:

```python
>>> z.real
# OUT: 3.0
>>> z.imag
# OUT: 8.0
```

Passing the same numbers into a complex constructor is enough to satisfy the `__eq__` comparison operator:

```python
>>> complex(3, 8) == z
# OUT: True
```

Breaking custom data types down into their essential components is critical to both the serialization and deserialization processes.

## Encoding Custom Types

To translate a custom object into JSON, all you need to do is provide an encoding function to the dump() method’s default parameter. The json module will call this function on any objects that aren’t natively serializable. Here’s a simple encoding function you can use for practice:

In [48]:
def encode_complex(z):
    if isinstance(z, complex):
        return (z.real, z.imag)
    else:
        type_name = z.__class__.__name__
        raise TypeError(f"Object of type '{type_name}' is not JSON serializable")

Notice that you’re expected to raise a `TypeError` if you don’t get the kind of object you were expecting. This way, you avoid accidentally serializing any Elves. Now you can try encoding complex objects for yourself (in the second call, `elf` is assumed to be an instance of the `Elf` class from earlier):

```python
>>> json.dumps(9 + 5j, default=encode_complex)
# OUT: '[9.0, 5.0]'
>>> json.dumps(elf, default=encode_complex)
# OUT: TypeError: Object of type 'Elf' is not JSON serializable
```
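
The `default` function is also consulted for values nested inside lists and dictionaries, not only for the top-level object. A small sketch, repeating `encode_complex` so it runs on its own (the `payload` structure is made up for illustration):

```python
import json


def encode_complex(z):
    if isinstance(z, complex):
        return (z.real, z.imag)
    type_name = z.__class__.__name__
    raise TypeError(f"Object of type '{type_name}' is not JSON serializable")


# The complex number sits inside a list inside a dict.
payload = {"values": [1 + 2j, 3]}

encoded = json.dumps(payload, default=encode_complex)
print(encoded)
# {"values": [[1.0, 2.0], 3]}
```

The integer `3` is natively serializable, so `encode_complex` is never called for it.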

The other common approach is to subclass the standard JSONEncoder and override its default() method:

In [49]:
class ComplexEncoder(json.JSONEncoder):
    def default(self, z):
        if isinstance(z, complex):
            return (z.real, z.imag)
        else:
            return super().default(z)

Instead of raising the TypeError yourself, you can simply let the base class handle it. You can use this either directly in the dump() method via the cls parameter or by creating an instance of the encoder and calling its encode() method:

```python
>>> json.dumps(2 + 5j, cls=ComplexEncoder)
# OUT: '[2.0, 5.0]'

>>> encoder = ComplexEncoder()
>>> encoder.encode(3 + 6j)
# OUT: '[3.0, 6.0]'
```

## Decoding Custom Types

While the real and imaginary parts of a complex number are absolutely necessary, they are actually not quite sufficient to recreate the object. This is what happens when you try encoding a complex number with the ComplexEncoder and then decoding the result:

```python
>>> complex_json = json.dumps(4 + 17j, cls=ComplexEncoder)
>>> json.loads(complex_json)
# OUT: [4.0, 17.0]
```

All you get back is a list, and you’d have to pass the values into a complex constructor if you wanted that complex object again. What’s missing is metadata, or information about the type of data you’re encoding.

I suppose the question you really ought to ask yourself is: *What is the minimum amount of information that is both necessary and sufficient to recreate this object?*

The json module expects all custom types to be expressed as objects in the JSON standard. For variety, you can create a JSON file this time called complex_data.json and add the following object representing a complex number:

```json
{
    "__complex__": true,
    "real": 42,
    "imag": 36
}
```

See the clever bit? That "__complex__" key is the metadata we just talked about. It doesn’t really matter what the associated value is. To get this little hack to work, all you need to do is verify that the key exists:

In [50]:
def decode_complex(dct):
    if "__complex__" in dct:
        return complex(dct["real"], dct["imag"])
    return dct

If "`__complex__`" isn’t in the dictionary, you can just return the object and let the default decoder deal with it.

Every time the load() method attempts to parse an object, you are given the opportunity to intercede before the default decoder has its way with the data. You can do this by passing your decoding function to the object_hook parameter.
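
To make the "every time" part concrete, this sketch passes a hook that only records the keys of each object it is handed. The `record` function and the sample document are illustrative. Because nested objects finish parsing before the object that contains them, the hook sees the innermost objects first.

```python
import json

seen = []


def record(dct):
    # Called once per decoded JSON object; must return the value to use.
    seen.append(sorted(dct))
    return dct


document = '{"outer": {"inner": {"x": 1}}, "list": [{"y": 2}]}'
json.loads(document, object_hook=record)

print(seen)
# [['x'], ['inner'], ['y'], ['list', 'outer']]
```

The hook is not called for arrays, strings, or numbers, only for JSON objects.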

Now play the same kind of game as before:

```python
>>> with open("complex_data.json") as complex_data:
...     data = complex_data.read()
...     z = json.loads(data, object_hook=decode_complex)
... 
>>> type(z)
# OUT: <class 'complex'>

```

While object_hook might feel like the counterpart to the dump() method’s default parameter, the analogy really begins and ends there.

This doesn’t just work with one object either. Try putting this list of complex numbers into complex_data.json and running the script again:

```json
[
  {
    "__complex__":true,
    "real":42,
    "imag":36
  },
  {
    "__complex__":true,
    "real":64,
    "imag":11
  }
]
```

If all goes well, you’ll get a list of complex objects:

```python
>>> with open("complex_data.json") as complex_data:
...     data = complex_data.read()
...     numbers = json.loads(data, object_hook=decode_complex)
... 
>>> numbers
# OUT: [(42+36j), (64+11j)]
```
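
Putting both directions together, a full round trip might look like the sketch below. The function `encode_complex_tagged` is a hypothetical variant of the earlier encoder that emits the `"__complex__"` marker instead of a bare pair; `decode_complex` is repeated so the example is self-contained.

```python
import json


def encode_complex_tagged(z):
    # Hypothetical encoder: emits a tagged object instead of a bare pair.
    if isinstance(z, complex):
        return {"__complex__": True, "real": z.real, "imag": z.imag}
    type_name = z.__class__.__name__
    raise TypeError(f"Object of type '{type_name}' is not JSON serializable")


def decode_complex(dct):
    if "__complex__" in dct:
        return complex(dct["real"], dct["imag"])
    return dct


original = [42 + 36j, 64 + 11j]

encoded = json.dumps(original, default=encode_complex_tagged)
decoded = json.loads(encoded, object_hook=decode_complex)

print(decoded == original)
```

Because the marker travels with the data, the decoder can tell a complex number apart from any other two-element structure.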

You could also try subclassing JSONDecoder and overriding object_hook, but it’s better to stick with the lightweight solution whenever possible.
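
For comparison, a subclass-based version could look roughly like this. `ComplexDecoder` is a hypothetical name, and wiring the hook through `__init__` is one possible design, not the only one. It ends up doing the same work as passing `object_hook` directly, with more ceremony.

```python
import json


class ComplexDecoder(json.JSONDecoder):
    """Hypothetical decoder that always applies the complex-number hook."""

    def __init__(self, **kwargs):
        # Force our hook; any object_hook supplied by the caller is replaced.
        kwargs["object_hook"] = self.decode_complex
        super().__init__(**kwargs)

    @staticmethod
    def decode_complex(dct):
        if "__complex__" in dct:
            return complex(dct["real"], dct["imag"])
        return dct


text = '[{"__complex__": true, "real": 42, "imag": 36}, {"plain": 1}]'
numbers = json.loads(text, cls=ComplexDecoder)

print(numbers)
```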