# Introduction to JSON Data Format

-----

When we first discussed reading and writing data from a Python program, we introduced the concept of a file format, and the Introduction to Text Format notebook demonstrated basic text format files. In this notebook, we extend that concept to another text-based file format: JavaScript Object Notation, or JSON. Specifically, we introduce the basic concepts behind JSON format files and demonstrate how to read and write data in this format by using standard, built-in Python tools.

-----

## Table of Contents


[Data Acquisition](#Data-Acquisition)

[JSON](#JSON)


-----
[[Back to TOC]](#Table-of-Contents)

## Data Acquisition

Before we begin, we need test data that we can read and write in JSON format. The next three code cells are identical to those in the [Introduction to Text Format][tdf] notebook and perform the following operations:
- define variables that indicate where data will be stored locally,
- use `wget` to download the airport data if necessary,
- read the 2009 airport data into a Python list by using the `csv` module, and
- display the first three rows.

-----

[tdf]: text-dataformat.ipynb

In [1]:
# Airport 2009 data from stat-computing.org

# First, find our HOME directory
# (an IPython shell escape returns a list of output lines)
home_dir = !echo $HOME

# Define the data directory below HOME
data_dir = home_dir[0] + '/data/'

# Construct the full path to the data file
data_file = data_dir + 'airports.csv'

In [2]:
%%bash -s "$data_file"

# Note, we passed in a Python variable above to the Bash script 
# which is then accessed via positional parameter, or $1 in this case.

# First test if the file of interest does not exist
if [ ! -f "$1" ] ; then

    # If it does not exist, create the data directory if needed, then
    # grab the file from the Internet and store it locally
    mkdir -p "$(dirname "$1")"
    wget -O "$1" http://stat-computing.org/dataexpo/2009/airports.csv

else

    echo "File already exists locally."
fi

File already exists locally.
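
Outside IPython, the `!echo $HOME` shell escape and the `%%bash` cell magic are unavailable. The following is a minimal plain-Python sketch of the same two steps, assuming only the standard-library `os` and `urllib.request` modules. The helper names `airport_data_path` and `fetch_if_missing` are hypothetical, and the download only succeeds if the URL used above is still reachable.

```python
import os
import urllib.request

AIRPORTS_URL = "http://stat-computing.org/dataexpo/2009/airports.csv"


def airport_data_path():
    """Hypothetical helper: return $HOME/data/airports.csv as a full path."""
    return os.path.join(os.path.expanduser("~"), "data", "airports.csv")


def fetch_if_missing(path, url=AIRPORTS_URL):
    """Hypothetical helper: download url to path only if path does not exist.

    Returns True if a download happened, False if the file was already there.
    """
    if os.path.isfile(path):
        print("File already exists locally.")
        return False
    # Create the data directory (and any parents) if needed, like `mkdir -p`
    os.makedirs(os.path.dirname(path), exist_ok=True)
    urllib.request.urlretrieve(url, path)
    return True
```

Calling `fetch_if_missing(airport_data_path())` mirrors the Bash cell: when the file is already present it prints the same message and skips the download.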


In [3]:
import csv

airports = []

# Open file and read each airport record as a list of strings
# (newline='' is the mode recommended by the csv module documentation)
with open(data_file, 'r', newline='') as csvfile:
    
    for row in csv.reader(csvfile, delimiter=','):
        airports.append(row)

# Display first three rows
print(airports[0:3])

[['iata', 'airport', 'city', 'state', 'country', 'lat', 'long'], ['00M', 'Thigpen ', 'Bay Springs', 'MS', 'USA', '31.95376472', '-89.23450472'], ['00R', 'Livingston Municipal', 'Livingston', 'TX', 'USA', '30.68586111', '-95.01792778']]


-----
[[Back to TOC]](#Table-of-Contents)

## JSON

[JavaScript Object Notation][json], or JSON, is a text-based data interchange format that is easy for both humans and programs to read and write. JSON is a [standard][st] published by [ECMA International][ecma], a standards organization that was originally known as the European Computer Manufacturers Association but now develops computer and electronics standards for a global audience. JSON is language independent, but it uses conventions that are familiar to programmers of the C family of languages, including Python. JSON is built on two structures: a collection of name/value pairs (an _object_, which maps naturally to a Python dictionary) and an ordered list of values (an _array_, which maps to a Python list). The standard dictates how data are represented with these structures.
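
To see how the two JSON structures and the JSON literal values map onto Python types, the sketch below deserializes a small JSON object with the standard-library `json.loads` function. The `iata`, `city`, and coordinate values come from the airport data used in this notebook; the `active` and `notes` fields are hypothetical, added only to show how `true` and `null` are converted.

```python
import json

# A JSON object (name/value pairs) containing an array (ordered list of values).
# "active" and "notes" are hypothetical fields used only to show true and null.
text = (
    '{"iata": "00M", "city": "Bay Springs", '
    '"location": [31.95376472, -89.23450472], '
    '"active": true, "notes": null}'
)

record = json.loads(text)  # deserialize JSON text into Python objects

# Show each key with the Python type its value was converted to
for key, value in record.items():
    print(f"{key:10} {type(value).__name__:10} {value!r}")
```

The object becomes a `dict`, the array a `list`, strings become `str`, numbers become `int` or `float`, `true` and `false` become `True` and `False`, and `null` becomes `None`.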

The JSON standard is fairly simple: it defines only a few forms, namely objects, arrays, values, strings, and numbers, upon which the rest of the standard is built. This makes reading and writing JSON data fairly straightforward. In Python, this functionality is provided by the built-in [`json`][jspy] module, which simplifies [reading and writing][jsd] Python data structures. The module can _serialize_ a data hierarchy into its JSON text representation by using the `dump` function (which writes to a file) or `dumps` (which returns a string), and can _deserialize_ JSON text back into Python objects by using the `load` function (or `loads` for a string). These processes are demonstrated in the next few code cells.
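
As a minimal sketch of these four functions, the cell below serializes the header and first record of the airport data (copied from the output above) to a string with `dumps`, reads it back with `loads`, and then repeats the round trip with `dump` and `load`, using an in-memory `io.StringIO` object as a stand-in for a real file.

```python
import io
import json

# Header and first record of the airport data, as read by the csv module
rows = [
    ["iata", "airport", "city", "state", "country", "lat", "long"],
    ["00M", "Thigpen ", "Bay Springs", "MS", "USA", "31.95376472", "-89.23450472"],
]

# dumps/loads work with Python strings
text = json.dumps(rows)        # serialize: list of lists -> JSON text
restored = json.loads(text)    # deserialize: JSON text -> list of lists

# dump/load work with file-like objects; StringIO stands in for an open file
buffer = io.StringIO()
json.dump(rows, buffer)        # write JSON text into the buffer
buffer.seek(0)                 # rewind before reading, as with a real file
from_file = json.load(buffer)  # read JSON text back into Python objects

print(type(text).__name__, restored == rows, from_file == rows)  # str True True
```

The round trip is exact here because the data contain only lists and strings; a Python tuple, for example, would be written as a JSON array and come back as a list.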

-----

[json]: http://www.json.org
[st]: http://www.ecma-international.org/publications/files/ECMA-ST/ECMA-404.pdf
[ecma]: http://www.ecma-international.org
[jspy]: https://docs.python.org/3/library/json.html
[jsd]: https://docs.python.org/3/tutorial/inputoutput.html#saving-structured-data-with-json

In [4]:
import json

# Dump data to JSON file
with open(data_dir + 'data.json', 'w') as fout:
    json.dump(airports, fout)

----- 

We display the contents of our new JSON file in the following code cell. By default, `json.dump` writes the entire data structure without any line breaks, so the whole file is a single line. This can complicate viewing the data, but it does not affect how easily programs can process the file. To display only the first three rows, we pass the `-c` flag to the `head` command, which limits the output to the first 235 bytes, enough to cover those rows.
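
If human readability matters more than file size, `json.dump` and `json.dumps` accept an `indent` parameter that places each array element on its own line. The sketch below, using a small excerpt of the airport rows, compares the default compact output with `indent=2`; both forms deserialize to the same data.

```python
import json

# Small excerpt of the airport rows
rows = [["iata", "airport"], ["00M", "Thigpen "]]

compact = json.dumps(rows)           # default: no line breaks at all
pretty = json.dumps(rows, indent=2)  # each element on its own, indented line

print(compact)
print(pretty)

# Whitespace between JSON tokens is not significant, so both hold the same data
print(json.loads(compact) == json.loads(pretty))
```

Passing `indent=2` to `json.dump` in the earlier cell would make the file easier to inspect line by line, at the cost of a larger file.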

-----

In [5]:
# Show first three rows
!head -c 235 $data_dir/data.json

[["iata", "airport", "city", "state", "country", "lat", "long"], ["00M", "Thigpen ", "Bay Springs", "MS", "USA", "31.95376472", "-89.23450472"], ["00R", "Livingston Municipal", "Livingston", "TX", "USA", "30.68586111", "-95.01792778"],

-----

The beauty of a self-describing data format like JSON is that reading and reconstructing data from this format is straightforward. As demonstrated in the next code cell, we simply open the file and load the JSON formatted data.

-----

In [6]:
# First, display the first five rows of the original data for comparison.
print(airports[:5], '\n', '-'*80)

# Use the pretty-print function to display the nested lists more readably
from pprint import pprint

# Open file and read the JSON formatted data
with open(data_dir + 'data.json', 'r') as fin:
    data = json.load(fin)

# Pretty-print the first few rows
pprint(data[:5])

[['iata', 'airport', 'city', 'state', 'country', 'lat', 'long'], ['00M', 'Thigpen ', 'Bay Springs', 'MS', 'USA', '31.95376472', '-89.23450472'], ['00R', 'Livingston Municipal', 'Livingston', 'TX', 'USA', '30.68586111', '-95.01792778'], ['00V', 'Meadow Lake', 'Colorado Springs', 'CO', 'USA', '38.94574889', '-104.5698933'], ['01G', 'Perry-Warsaw', 'Perry', 'NY', 'USA', '42.74134667', '-78.05208056']] 
 --------------------------------------------------------------------------------
[['iata', 'airport', 'city', 'state', 'country', 'lat', 'long'],
 ['00M', 'Thigpen ', 'Bay Springs', 'MS', 'USA', '31.95376472', '-89.23450472'],
 ['00R',
  'Livingston Municipal',
  'Livingston',
  'TX',
  'USA',
  '30.68586111',
  '-95.01792778'],
 ['00V',
  'Meadow Lake',
  'Colorado Springs',
  'CO',
  'USA',
  '38.94574889',
  '-104.5698933'],
 ['01G', 'Perry-Warsaw', 'Perry', 'NY', 'USA', '42.74134667', '-78.05208056']]


-----

In this notebook, we demonstrated how to read and write basic text data in JSON format. The JSON format can represent much richer data than the simple list of rows used here, however, including nested objects and arrays, and it is used in many settings, such as social media applications, to exchange complex data hierarchies between programs.
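
As one illustration of a richer hierarchy, the sketch below reorganizes the header and first two records shown earlier into a JSON object keyed by IATA code, where each airport is itself an object that maps column names to values. This layout is an illustrative choice, not one prescribed by the airport data or by the `json` module.

```python
import json

# Header plus the first two airport records shown earlier in this notebook
rows = [
    ["iata", "airport", "city", "state", "country", "lat", "long"],
    ["00M", "Thigpen ", "Bay Springs", "MS", "USA", "31.95376472", "-89.23450472"],
    ["00R", "Livingston Municipal", "Livingston", "TX", "USA", "30.68586111", "-95.01792778"],
]

header, records = rows[0], rows[1:]

# Outer object: IATA code -> inner object of the remaining column name/value pairs
by_code = {rec[0]: dict(zip(header[1:], rec[1:])) for rec in records}

text = json.dumps(by_code, indent=2)
print(text)

# Nested lookups work directly on the deserialized result
print(json.loads(text)["00R"]["city"])  # Livingston
```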

-----

<font color='red' size = '5'> Student Exercise </font>

Earlier in this notebook, we used the `json` module to read and write JSON format files. Now that you have run the cells in this notebook, go back to the relevant cells and make these changes. Be sure to understand how your changes impact the file input and output process.

1. Try writing only airports that are in the state of Illinois to the JSON file.
2. The examples reading JSON files treated the data as strings. Change the code to strip leading and trailing white space and to convert all numerical data to floating-point values in the generated list.

-----

## Ancillary Information

The following links are to additional documentation that you might find helpful in learning this material. Reading these web-accessible documents is completely optional.

1. [JSON Tutorial][4] by W3Schools.
2. The [JSON](http://json.org/) format.
3. Article from MongoDB on storing big data in a [JSON](http://smallworldbigdata.com/tag/json/)-like format.

-----

[4]: http://www.w3schools.com/json/default.asp

**&copy; 2017: Robert J. Brunner at the University of Illinois.**

This notebook is released under the [Creative Commons license CC BY-NC-SA 4.0][ll]. Any reproduction, adaptation, distribution, dissemination or making available of this notebook for commercial use is not allowed unless authorized in writing by the copyright holder.

[ll]: https://creativecommons.org/licenses/by-nc-sa/4.0/legalcode