# Read JSON and newline delimited JSON
By the end of this lecture you will be able to:
- read JSON
- read newline delimited JSON
- write newline delimited JSON
- do lazy scans of newline delimited JSON

A newline delimited JSON file has one valid JSON value (usually an object) per line, with no enclosing array and no commas between lines. You can learn more about newline delimited JSON here: https://medium.com/@kandros/newline-delimited-json-is-awesome-8f6259ed4b4b
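To make the format concrete, here is a minimal sketch that uses only Python's standard library (no Polars). The variable names `ndjson_text`, `records` and `json_text` are made up for this illustration. It shows that each line parses on its own, while the equivalent regular JSON document needs an enclosing array.

```python
import json

# Three records in newline delimited form: one complete JSON object per line
ndjson_text = '{"id":1,"values":"a"}\n{"id":2,"values":"b"}\n{"id":3,"values":null}\n'

# Each non-empty line can be parsed independently of the others
records = [json.loads(line) for line in ndjson_text.splitlines() if line.strip()]

# The same records as a single JSON document need an enclosing array
json_text = json.dumps(records)

print(records)
print(json_text)
```

Because every line stands alone, a reader can process such a file one line at a time without holding the whole document in memory.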


In [None]:
from pathlib import Path

import polars as pl

We read the following valid JSON string. 

We make the string a bytes literal with the `b` prefix so that it can be read by `pl.read_json`

In [None]:
jsonString = b"""
    [
        {"id":1,"values":"a"},
        {"id":2,"values":"b"},
        {"id":3,"values":null}
    ]
"""

In [None]:
pl.read_json(jsonString)

Note that if you receive JSON as a Python `str` (say from a `requests` library response), you can convert it to bytes with Python's built-in `bytes` function and the appropriate encoding (utf-8 in this example) so that Polars can read it
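As a small aside, the `str.encode` method is an equivalent way to get the bytes. The sketch below uses a made-up payload (`json_text` is a hypothetical stand-in for the text of a response) and plain Python only.

```python
# Hypothetical payload standing in for, e.g., the text of an HTTP response
json_text = '[{"id":1,"values":"a"},{"id":2,"values":null}]'

# Both spellings produce the same UTF-8 encoded bytes object
via_bytes = bytes(json_text, "utf-8")
via_encode = json_text.encode("utf-8")

print(type(via_encode))
print(via_bytes == via_encode)
```

Either result can be passed to `pl.read_json` in the same way as the bytes literal used earlier.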

In [None]:
pl.read_json(bytes("""
    [
        {"id":1,"values":"a"},
        {"id":2,"values":"b"},
        {"id":3,"values":null}
    ]
""","utf-8"))

### Writing JSON
We can write a `DataFrame` to JSON with `write_json`

In [None]:
df = pl.read_json(bytes("""
    [
        {"id":1,"values":"a"},
        {"id":2,"values":"b"},
        {"id":3,"values":null}
    ]
""","utf-8"))
df.write_json()

By default this JSON has a column orientation. We can make this easier to read with `pretty=True`

In [None]:
print(df.write_json(pretty=True))

We can instead write the output in a row-oriented format with `row_oriented=True`

In [None]:
df.write_json(row_oriented=True)

Writing in a row-oriented way is slower for large datasets as Polars must convert from its column-oriented data to row-oriented data.
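The conversion cost can be pictured with a simplified plain-Python sketch. This is only an illustration of the idea, not how Polars implements it internally, and the names `columns`, `names` and `rows` are made up for the example.

```python
# Column-oriented: one list of values per column
columns = {"id": [1, 2, 3], "values": ["a", "b", None]}

# Row-oriented: one dict per row, so every row must be assembled
# by picking the matching element out of each column
names = list(columns)
rows = [dict(zip(names, row)) for row in zip(*columns.values())]

print(rows)
```

Every row touches every column, so the amount of work grows with both the number of rows and the number of columns.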

## Nested data

JSON may contain arbitrarily nested structures. Polars tries to cast these nested structures to its own nested dtypes. 

### Nested key-value pairs
Polars converts the key-value pairs in `values` to a `pl.Struct` dtype *if the types in the nested structure are consistent*. Otherwise Polars raises an exception

In [None]:
nestedJsonString = b"""
    [
        {"id":1,"values":{"a":0,"b":1}},
        {"id":2,"values":{"a":0,"b":1}},
        {"id":3,"values":null}
    ]
"""

In [None]:
pl.read_json(nestedJsonString)

### Nested arrays
Polars attempts to convert arrays to a `pl.List` dtype

In [None]:
nestedArrayJsonString = b"""
    [
        {"id":1,"values":[0,1]},
        {"id":2,"values":[0,1.0]}
    ]
"""

In [None]:
pl.read_json(nestedArrayJsonString)

## Newline delimited JSON
In a similar way we read newline delimited JSON with `pl.read_ndjson`

In [None]:
newlineDelimitedJsonString = b"""{"id":1,"values":"a"}
{"id":2,"values":"b"}
{"id":3,"values":null}
"""

In [None]:
pl.read_ndjson(newlineDelimitedJsonString)

We can also do lazy scans of newline-delimited JSON. To show this we must first create a directory to hold an example newline-delimited JSON file

In [None]:
# Specify a directory to hold the ndJSON file
ndjson_dir = Path('data_files/ndjson')
ndjson_file = "example.json"
# Create the ndjson sub-directory if it doesn't exist already
ndjson_dir.mkdir(parents=True,exist_ok=True)
# Set the path to the ndJSON file
ndjson_path = ndjson_dir / ndjson_file

Now we will create a `DataFrame` from the example above and write it to the example file

In [None]:
df = pl.read_ndjson(newlineDelimitedJsonString)
df.write_ndjson(ndjson_path)

We can now start a lazy query by scanning the ndJSON file

In [None]:
print(
    pl.scan_ndjson(ndjson_path)
    .select("id")
    .explain()
)

At present `pl.scan_ndjson` does not work in streaming mode: the query plan printed by the next cell contains no `STREAMING` section

In [None]:
print(
    pl.scan_ndjson(ndjson_path)
    .select("id")
    .explain(streaming=True)
)

There are no exercises for this lecture