# Working With JSON Data in Python
JSON allows us to represent the objects in our Python programs as human-readable text that can be sent over the internet.

Learn how to work with Python's built-in `json` module to serialize the data in programs into JSON format. Then deserialize some JSON from an online API and convert it into Python objects.

> Remember to install ipykernel so this notebook will recognize environment packages
>
> `source env/bin/activate`
>
> `python3 -m ipykernel install --name env`

## What is JSON?
**JavaScript Object Notation** (JSON) is a standardized text format commonly used to transfer data over a network. Many APIs and databases use it, and it's easy for both humans and machines to read.

JSON represents objects as name/value pairs, just like a Python dictionary.

Remember to import the `json` module.

In [1]:
import json

**Serialization** is the process of *encoding* data into JSON format (like converting a Python list to JSON).

**Deserialization** is the process of *decoding* JSON data back into native objects to work with (like reading JSON data into a Python list).
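As a minimal sketch of the two terms above, the following round trip serializes a Python list to a JSON string and then decodes it again. The variable names and data are illustrative only.

```python
import json

# A plain Python list (illustrative data)
scores = [1, 2.5, "three", None, True]

# Serialization: Python object -> JSON text
encoded = json.dumps(scores)
print(encoded)        # [1, 2.5, "three", null, true]
print(type(encoded))  # <class 'str'>

# Deserialization: JSON text -> Python object
decoded = json.loads(encoded)
print(decoded == scores)  # True
```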

### Who Uses JSON?
YouTube uses JSON to deliver information on accounts, videos, and video searches.

Twitter uses it to interact with tweets, Google apps use it to serve map data, and NASA uses it to deliver imagery.

## Serializing JSON Data
The `json` module exposes two methods for serializing Python objects into JSON format.

`dump()` writes Python data to a file-like object. We use this when we want to serialize our Python data to an external JSON file.

`dumps()` writes Python data to a string in JSON format. This is useful if we want to use the JSON elsewhere in our program, or if we just want to print it to the console to check that it's correct.

Python and JSON do not share all the same types. Serialization will convert your Python objects into JSON format according to this table:

| **Python**       | **JSON** |
| ---------------- | -------- |
| dict             | object   |
| list, tuple      | array    |
| str              | string   |
| int, float       | number   |
| True             | true     |
| False            | false    |
| None             | null     |
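
To see the conversions in the table in action, here is a small sketch that serializes one value of each kind. The dictionary and its key names are made up for illustration.

```python
import json

# One value per row of the table above (keys are illustrative)
sample = {
    "a_dict": {"nested": 1},
    "a_list": [1, 2],
    "a_tuple": (3, 4),
    "a_str": "text",
    "an_int": 7,
    "a_float": 1.5,
    "yes": True,
    "no": False,
    "nothing": None,
}

encoded = json.dumps(sample, indent=4)
print(encoded)

# The tuple becomes a JSON array, and True/False/None become
# true/false/null in the JSON text
compact = json.dumps(sample)
print('"a_tuple": [3, 4]' in compact)  # True
print('"nothing": null' in compact)    # True
```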

Both the `dump()` and `dumps()` methods allow us to specify an optional `indent` argument. This changes how many spaces are used for indentation, which can make our JSON easier to read.

```python
json_str = json.dumps(data, indent=4)
```

In [2]:
data = {
    "user": {
        "name": "William Williams",
        "age": 93
    }
}

with open("data_file.json", "w") as write_file:
    # We use dump() here because we are writing to a file-like obj
    # json.dump(data, write_file)
    
    # Make the json more readable
    json.dump(data, write_file, indent=4)
    
# Print a string representation of the json data
# json_str = json.dumps(data)

# Make json more readable
json_str = json.dumps(data, indent=4)
print(json_str)

{
    "user": {
        "name": "William Williams",
        "age": 93
    }
}


## Deserializing JSON Data
The `json` module also exposes two methods for deserializing JSON.

`load()` loads JSON data from a file-like object. We use this method when we're reading JSON from a file.

`loads()` loads JSON data from a string containing JSON-encoded data.

Unless your encoded data is something very simple, these methods will most likely return a Python dict or list containing your deserialized data.
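
The sketch below contrasts the two functions on the same JSON text. It assumes `io.StringIO` as a stand-in for an open file, since `load()` only needs an object with a `read()` method; the JSON text itself is illustrative.

```python
import io
import json

json_text = '{"user": {"name": "William Williams", "age": 93}}'

# loads() takes a string containing JSON
from_string = json.loads(json_text)

# load() takes a file-like object; io.StringIO stands in for a file here
from_file = json.load(io.StringIO(json_text))

print(from_string == from_file)    # True
print(type(from_string))           # <class 'dict'>
print(from_string["user"]["age"])  # 93

# Very simple JSON documents decode to simple Python values
print(json.loads("42"))    # 42
print(json.loads('"hi"'))  # hi
```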

Serialization and Deserialization are not perfectly inverse operations! This means that deserialization may not return to you the exact object you serialized.

> Tuples are serialized as JSON arrays, and JSON arrays are deserialized as Python lists. A tuple therefore comes back as a list holding the same data, which we can convert back with the `tuple()` constructor.

In [3]:
blackjack_hand = (8, "Q")

encoded_hand = json.dumps(blackjack_hand)

decoded_hand = json.loads(encoded_hand)

type(decoded_hand)  # list

decoded_hand

[8, 'Q']

In [4]:
blackjack_hand == tuple(decoded_hand)

True
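
Tuples are not the only case where the round trip changes the data. As a further sketch (the `seats` dictionary is a made-up example), JSON object names are always strings, so integer dictionary keys come back as strings and have to be converted by hand.

```python
import json

# Dictionary with integer keys (illustrative data)
seats = {1: "dealer", 2: "player"}

round_trip = json.loads(json.dumps(seats))
print(round_trip)           # {'1': 'dealer', '2': 'player'}
print(round_trip == seats)  # False

# Convert the keys back to int to recover the original dictionary
restored = {int(key): value for key, value in round_trip.items()}
print(restored == seats)    # True
```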

## Working with JSON Data
Get practice deserializing JSON data from a web API. Then manipulate the extracted data to derive meaning from it.

The JSON data is available at: https://jsonplaceholder.typicode.com/todos

To get the JSON data from the API above, we also have to import the `requests` module.

In [5]:
import requests

In [6]:
# Make the request to the API
response = requests.get(
    "https://jsonplaceholder.typicode.com/todos"
)

# Obtain a Python list from this json data
# Use response.text to get the content of the web request
todos = json.loads(response.text)

# NOTE: We use loads() because response.text returns a string
#       containing all the JSON data

# print the first 2 items in our list
print(todos[:2])

[{'userId': 1, 'id': 1, 'title': 'delectus aut autem', 'completed': False}, {'userId': 1, 'id': 2, 'title': 'quis ut nam facilis et officia qui', 'completed': False}]


### Which users have completed the most todo items?

In [7]:
todos_by_user = {}

for todo in todos:
    if todo["completed"]:
        # Our dictionary represents the users by their user id
        # The try assumes the user is already in our dictionary
        try:
            todos_by_user[todo["userId"]] += 1
        # If the user is not in our dictionary
        except KeyError:
            todos_by_user[todo["userId"]] = 1
            
# Determine what the highest number of completed items is
# Also determine who has completed that many items

# This is a sorted list of tuples
# - Each tuple contains:
#   - the person
#   - how many items they've completed
#
# The tuples are sorted in descending order, by the number of
# items completed
top_users = sorted(todos_by_user.items(),
                   key=lambda x: x[1], reverse=True)

max_complete = top_users[0][1]

# In order to determine which users have this completed count, we
# define a new list called users, which holds the users that we
# discover
users = []

# Iterate over our list of tuples
for user, num_complete in top_users:
    if num_complete < max_complete:
        break
    
    users.append(str(user))
    
max_users = " and ".join(users)

print(f"User(s) {max_users} completed {max_complete} TODOs")

User(s) 5 and 10 completed 12 TODOs
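
The counting step above could also be sketched with `collections.Counter`, which removes the need for the `try`/`except KeyError` block. This version runs on a small hard-coded list, `sample_todos`, that only mimics the shape of the API data; it is not the real response.

```python
from collections import Counter

# Hypothetical sample shaped like the API's todo items
sample_todos = [
    {"userId": 1, "id": 1, "completed": True},
    {"userId": 1, "id": 2, "completed": False},
    {"userId": 2, "id": 3, "completed": True},
    {"userId": 2, "id": 4, "completed": True},
    {"userId": 3, "id": 5, "completed": True},
    {"userId": 3, "id": 6, "completed": True},
]

# Count completed items per user id
completed_counts = Counter(
    todo["userId"] for todo in sample_todos if todo["completed"]
)

# Highest completed count, and every user who reached it
max_complete = max(completed_counts.values())
top = sorted(
    user for user, count in completed_counts.items()
    if count == max_complete
)

print(top, max_complete)  # [2, 3] 2
```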


### Create a JSON file that contains all completed TODOs for each of these users.

In [10]:
def keep(todo):
    '''
    Filters out todo items that were not completed by top users.
    
    Parameters
    ----------
    todo: dict
        A todo item
        
    Returns
    -------
    bool
        Returns True if todo is completed and assigned to top user;
        False otherwise
    '''
    
    # Figure out if item is completed
    is_complete = todo["completed"]
    
    # Check if item is assigned to top user
    # Convert the user id to str because that is how it's stored
    # in our users list
    has_max_count = str(todo["userId"]) in users
    
    return is_complete and has_max_count

In [11]:
# Write json data to file
with open("filtered_data_file.json", "w") as data_file:
    # Obtain a list of only the items completed by the top users
    filtered_todos = list(filter(keep, todos))
    
    json.dump(filtered_todos, data_file, indent=2)

## JSON for Custom Python Objects
Learn how to work with types that are non-serializable.

The `json` module is not capable of serializing all Python types. Non-serializable types include custom types created from classes, as well as the built-in `complex` type used to represent complex numbers.

```python
json_str = json.dumps(6 + 2j)  # Cannot serialize complex object
```

In order to serialize these types, we must extract the necessary data to recreate the object.

In [14]:
class Person:
    """
    A generic Person class.
    
    Attributes
    ----------
    name : str
        The person's name
    age : int
        The person's age
    """
    
    def __init__(self, name, age):
        '''
        Parameters
        ----------
        name : str
            The person's name
        age : int
            The person's age
        '''
        self.name = name
        self.age = age

### Serialize the Person object into a string.

In [13]:
json_str = json.dumps(Person("Will", 29))  # This fails
print(json_str)

TypeError: Object of type Person is not JSON serializable

### Simplifying Data Structures
- Break the object down into simpler parts that can be serialized
- Ask: What is the minimum amount of information necessary to recreate this object?
- Once we extract the data from the complex or custom type, we can recreate the object
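
Applied to a class like `Person`, the steps above might look like the sketch below. It assumes every attribute set in `__init__` is itself serializable, and it uses `vars()` to pull the attributes out as a dict; the class is redefined here only to keep the example self-contained.

```python
import json


class Person:
    """Minimal stand-in for the Person class defined earlier."""

    def __init__(self, name, age):
        self.name = name
        self.age = age


will = Person("Will", 29)

# Break the object down: vars() returns the instance's attribute dict
simple = vars(will)
json_str = json.dumps(simple)
print(json_str)  # {"name": "Will", "age": 29}

# Recreate the object from the extracted data
restored = Person(**json.loads(json_str))
print(restored.name, restored.age)  # Will 29
```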

### Example: Simplify the Complex Type
The complex type has 2 parts: real and imaginary. That's what we'll need to recreate the complex object.

The `complex` type stores both the real and imaginary parts of the complex number as `float`. Floats can be serialized by `dump()` and `dumps()`.
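
A short sketch of that idea: pull out the two float parts, serialize them as a list, and rebuild the complex number from the decoded values. The value `3 + 8j` and the variable names are arbitrary examples.

```python
import json

number = 3 + 8j

# Both parts are stored as floats
print(number.real, number.imag)  # 3.0 8.0

# Floats are serializable, so the two parts can be written as JSON
parts = json.dumps([number.real, number.imag])
print(parts)  # [3.0, 8.0]

# The decoded parts are enough to recreate the complex object
real, imag = json.loads(parts)
print(complex(real, imag) == number)  # True
```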

## Encoding Custom Types to JSON
Learn how to encode non-serializable types into JSON.

The `dump()` and `dumps()` methods allow us to include an optional argument: `default`. Here, we can specify a custom function that will break our non-serializable type down into a serializable object containing the data that's needed to reconstruct it later.

```python
json_str = json.dumps(4+6j, default=complex_encoder)
```

Now we need to define `complex_encoder()`, which converts our complex object into a tuple (which is serializable).

In [16]:
def complex_encoder(z):
    '''
    Converts complex object into a serializable object.

    Parameters
    ----------
    z : complex
        The complex object to encode

    Returns
    -------
    tuple
        The real and imaginary parts

    Raises
    ------
    TypeError
        If object passed in is not serializable.
    '''

    if isinstance(z, complex):
        return (z.real, z.imag)
    else:
        type_name = z.__class__.__name__
        raise TypeError(
            f"Object of type {type_name} is not serializable."
        )

In [17]:
json_str = json.dumps(4+6j, default=complex_encoder)
print(json_str)

[4.0, 6.0]


The same result can be achieved by subclassing `json.JSONEncoder`:

In [18]:
class ComplexEncoder(json.JSONEncoder):
    """
    JSON encoder that also serializes complex objects.

    Methods
    -------
    default(z)
        Encodes complex object into a serializable object
    """
    def default(self, z):
        '''
        Encodes complex object into a serializable object

        Parameters
        ----------
        z : complex
            The complex object to encode

        Returns
        -------
        tuple
            The real and imaginary parts
        '''
        if isinstance(z, complex):
            return (z.real, z.imag)
        else:
            return super().default(z)

With this approach, we pass the encoder class through the `cls` argument instead of passing a function through `default`:

In [19]:
json_str = json.dumps(4+6j, cls=ComplexEncoder)
print(json_str)

[4.0, 6.0]
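
As a further sketch, the same kind of encoder class also handles complex numbers nested inside other containers, and it still rejects types it does not know by deferring to `super().default()`. The class is repeated in condensed form so the example runs on its own, and the `readings` data is made up.

```python
import json


class ComplexEncoder(json.JSONEncoder):
    """Condensed copy of the encoder class shown above."""

    def default(self, z):
        if isinstance(z, complex):
            return (z.real, z.imag)
        # Let the base class raise TypeError for anything else
        return super().default(z)


# Complex value nested inside a dict and a list (illustrative data)
readings = {"label": "impedance", "values": [1 + 2j, 3.5]}
encoded = json.dumps(readings, cls=ComplexEncoder)
print(encoded)  # {"label": "impedance", "values": [[1.0, 2.0], 3.5]}

# A set is still not serializable, so the base class raises TypeError
message = None
try:
    json.dumps({"bad": {1, 2}}, cls=ComplexEncoder)
except TypeError as err:
    message = str(err)
    print(message)
```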


## Decoding Custom Types from JSON
Learn how to deserialize JSON that represents a non-serializable type.

We can represent a complex object in JSON like this:

> ```json
> {
>     "__complex__": true,
>     "real": 42,
>     "imaginary": 36
> }
> ```

If we let `load()` deserialize this, we'll get a Python dict instead of our desired complex object, because JSON objects deserialize to Python dicts. We can write a custom decoder function that will read this dictionary and return our desired complex object.

In [21]:
def decode_complex(dct):
    '''
    Decodes dictionary into a complex object.
    
    Parameters
    ----------
    dct : dict
        The dictionary to decode
        
    Returns
    -------
    complex
        Returns a complex object if input has complex properties,
        otherwise the input is returned
    '''
    if "__complex__" in dct:
        return complex(dct["real"], dct["imaginary"])
    else:
        return dct

Now we can read in our JSON file and deserialize it. We can use the optional `object_hook` argument to specify our decoding function.

In [22]:
with open("complex_data.json") as complex_data:
    z = json.load(complex_data, object_hook=decode_complex)

We can now see that we deserialized a complex object from a JSON file.

In [25]:
print(type(z))
print(z)

<class 'complex'>
(42+36j)
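
Encoding and decoding can be combined into a full round trip. The sketch below works on strings with `dumps()` and `loads()` rather than files, and it assumes a hypothetical `encode_complex()` helper that writes the tagged `__complex__` layout shown earlier; the decoder is a condensed copy of `decode_complex()`.

```python
import json


def encode_complex(z):
    # Hypothetical helper: emits the tagged-object layout shown earlier
    if isinstance(z, complex):
        return {"__complex__": True, "real": z.real, "imaginary": z.imag}
    type_name = type(z).__name__
    raise TypeError(f"Object of type {type_name} is not JSON serializable")


def decode_complex(dct):
    # Condensed copy of the decoder defined above
    if "__complex__" in dct:
        return complex(dct["real"], dct["imaginary"])
    return dct


payload = {"name": "z1", "value": 42 + 36j}

text = json.dumps(payload, default=encode_complex)
print(text)
# {"name": "z1", "value": {"__complex__": true, "real": 42.0, "imaginary": 36.0}}

# object_hook is called for every decoded JSON object, innermost first
restored = json.loads(text, object_hook=decode_complex)
print(restored == payload)  # True
```

Because the hook runs on every JSON object, dictionaries without the `__complex__` marker pass through unchanged.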
