# Reading JSON data
Not all data is tabular. For example, consider this index of files from the SEC's EDGAR archive: https://www.sec.gov/Archives/edgar/full-index/2019/QTR1/index.json

This notebook is based on the **JSON Data** section of Chapter 3 of "Data Wrangling with Python" (DWwP).

JSON is a widely used, lightweight data interchange format that is readable by both humans and machines. You can find the JSON spec at http://www.json.org/. Most programming languages have tools for reading, parsing, and writing JSON files. As we'll see, JSON data maps closely onto Python dictionaries (and lists). For example, here's an excerpt from the *data-text.json* file:

    [
      {
        "Indicator":"Life expectancy at birth (years)",
        "PUBLISH STATES":"Published",
        "Year":1990,
        "WHO region":"Europe",
        "World Bank income group":"High-income",
        "Country":"Andorra",
        "Sex":"Both sexes",
        "Display Value":77,
        "Numeric":77.00000,
        "Low":"",
        "High":"",
        "Comments":""
      },
      {
        "Indicator":"Life expectancy at birth (years)",
        "PUBLISH STATES":"Published",
        "Year":2000,
        "WHO region":"Europe",
        "World Bank income group":"High-income",
        "Country":"Andorra",
        "Sex":"Both sexes",
        "Display Value":80,
        "Numeric":80.00000,
        "Low":"",
        "High":"",
        "Comments":""
      },

    ... many more records ...

      {
        "Indicator":"Healthy life expectancy (HALE) at birth (years)",
        "PUBLISH STATES":"Published",
        "Year":2012,
        "WHO region":"Africa",
        "World Bank income group":"Low-income",
        "Country":"Zimbabwe",
        "Sex":"Female",
        "Display Value":51,
        "Numeric":51.00000,
        "Low":"",
        "High":"",
        "Comments":""
      }
    ]

In terms of Python data structures, how would you describe the data above?
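One way to check your answer is to hand a trimmed copy of the first record to `json.loads()` (used in the code below) and inspect the Python types it produces. This is a sketch: only a few of the record's fields are kept, and the variable names are chosen just for this example.

```python
import json

# A trimmed copy of the first record from the data-text.json excerpt above
# (only a few of its fields are kept, for brevity).
json_text = '''
[
  {
    "Country": "Andorra",
    "Year": 1990,
    "Numeric": 77.00000,
    "Low": ""
  }
]
'''

data = json.loads(json_text)

print(type(data))                # JSON array   -> list
print(type(data[0]))             # JSON object  -> dict
print(type(data[0]["Year"]))     # JSON integer -> int
print(type(data[0]["Numeric"]))  # JSON number with a decimal point -> float
print(type(data[0]["Low"]))      # JSON string (even an empty one) -> str
```

The same mapping applies to every record in the full file.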

### How to Import JSON Data
Here's the code for importing the JSON data. It's similar to the earlier CSV example, but with some important differences. Identifying these differences will help you as you learn to program in Python.

In [None]:
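import json

# Open the file and read its contents
json_data = open('data/data-text.json').read()

# Parse the contents with the json library
data = json.loads(json_data)
print(type(data))

# Print every record
for item in data:
    print(item)

# After the loop finishes, item still refers to the last record
print(type(item))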

Clearly the first line:

    import json
    
imports the `json` library (instead of the `csv` library).

However, the next two lines are quite different from the `csv` example. Let's
run the IPython `whos` magic command to see detailed type information for the
variables in this code. That should help us figure out what's going on.

In [None]:
%whos

From the last example, we know that the Python `open()` function returns a file object. So it seems likely that `read()` is a method of a file object that reads the file. What do you think this method returns? Check the output of the `whos` command above. To confirm, let's search Google for "python open file read". A few relevant links:

https://docs.python.org/3/tutorial/inputoutput.html

http://www.tutorialspoint.com/python/python_files_io.htm

http://learnpythonthehardway.org/book/ex15.html

So, to summarize:

* CSV example  --> used `open` to return a file object for the file we want to read
* JSON example --> used `open().read()` to return a string containing the contents of the file
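The `open(...).read()` line in the cell above never explicitly closes the file. Here is a sketch of the same two steps written with `with` blocks, which close the file automatically, and which make both types visible. A temporary file holding a trimmed record from the excerpt stands in for `data/data-text.json`, so the example runs anywhere; `sample.json` is a made-up name.

```python
import os
import tempfile

# Work in a temporary directory so the example runs anywhere; the notebook
# itself reads 'data/data-text.json' instead.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "sample.json")
    with open(path, "w", encoding="utf-8") as out:
        out.write('[{"Country": "Andorra", "Year": 1990}]')

    # open() on its own returns a file object...
    with open(path, encoding="utf-8") as f:
        file_type = type(f).__name__  # 'TextIOWrapper' for text-mode files
        # ...and .read() returns the entire contents as a single str.
        contents = f.read()

    # Leaving the inner with block closed the file automatically.
    closed_after = f.closed

print(file_type)       # TextIOWrapper
print(type(contents))  # <class 'str'>
print(closed_after)    # True
```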

Why the difference? It comes down to what each library's parsing function expects as its input:

* `csv.reader()` expects a file object (more precisely, any iterable that yields lines of text)
* `json.loads()` expects a string containing the JSON text
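The `json` module also provides `json.load()` (no trailing "s"), which reads from a file object, much as `csv.reader()` does. The sketch below, which uses `io.StringIO` to wrap a string so it behaves like an open file, shows that both routes produce the same Python objects, and that `csv.reader()` also accepts a plain list of lines. The sample text is a trimmed record from the excerpt above.

```python
import csv
import io
import json

json_text = '[{"Country": "Andorra", "Year": 1990, "Numeric": 77.0}]'

# json.loads() parses a string that is already in memory.
from_string = json.loads(json_text)

# json.load() reads from a file-like object instead; io.StringIO wraps the
# same text so it behaves like an open file.
from_file = json.load(io.StringIO(json_text))

print(from_string == from_file)  # True: the same list of dicts either way

# csv.reader() is not limited to file objects: any iterable of text lines works.
rows = list(csv.reader(["Country,Year", "Andorra,1990"]))
print(rows)  # [['Country', 'Year'], ['Andorra', '1990']]
```

With a real file, `json.load(f)` inside a `with open(...) as f:` block replaces the separate `read()` and `loads()` steps.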

Finally, we have the loop.

In [None]:
for item in data:
    print(item)

We are just doing this to look at the data and to get some loop practice.

What kind of collection is `data`?

As the loop runs, what does the loop variable `item` store, and what is its data type?
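Once you've answered these questions, here is a sketch of slightly more useful loop practice: because each `item` is a dictionary, you can pull out only the fields you care about by key. The three records are trimmed from the excerpt at the top of this notebook, keeping just a few fields; the `lines` list exists only so the output can be checked.

```python
import json

# Three records trimmed from the data-text.json excerpt above
# (only a few fields are kept).
json_text = '''
[
  {"Country": "Andorra", "Year": 1990,
   "Indicator": "Life expectancy at birth (years)", "Numeric": 77.0},
  {"Country": "Andorra", "Year": 2000,
   "Indicator": "Life expectancy at birth (years)", "Numeric": 80.0},
  {"Country": "Zimbabwe", "Year": 2012,
   "Indicator": "Healthy life expectancy (HALE) at birth (years)", "Numeric": 51.0}
]
'''
data = json.loads(json_text)

lines = []
for item in data:
    # Each item is a dict, so individual fields are looked up by key.
    line = f"{item['Country']} {item['Year']}: {item['Indicator']} = {item['Numeric']}"
    lines.append(line)
    print(line)
```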

## The bottom line and a look ahead
We've learned how to use Python's built-in `json` library to read a JSON file into a list of dictionaries. This is a common precursor to data cleanup and other data preparation tasks before moving on to data analysis. As you start working with more complex JSON files, you'll come to appreciate the flexibility of Python dictionaries. For example, the *value* in a *key-value* pair can itself be another dictionary! This makes it easy to store and manipulate hierarchical data. An example appears near the top of the JSON file for a congressional vote (https://www.govtrack.us/data/congress/113/votes/2013/h101/data.json):

    {
      "bill": {
        "congress": 113, 
        "number": 1120, 
        "type": "hr"
      }, 
      "category": "passage", 
      "chamber": "h", 
      "congress": 113,
      ...

CSV files, where every row is a flat list of the same columns, can't represent this kind of nesting directly.
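Here is a sketch of how the nested structure looks once parsed. It uses only the fields shown in the excerpt above (the real file has many more keys, omitted here), and `vote` and `bill` are names chosen for this example.

```python
import json

# Only the fields shown in the govtrack.us excerpt above; the real file
# contains many more keys, which are omitted here.
vote_text = '''
{
  "bill": {"congress": 113, "number": 1120, "type": "hr"},
  "category": "passage",
  "chamber": "h",
  "congress": 113
}
'''
vote = json.loads(vote_text)

# The value stored under "bill" is itself a dict, so lookups can be chained.
bill = vote["bill"]
print(type(bill))                         # <class 'dict'>
print(bill["type"], bill["number"])       # hr 1120
print(vote["category"], vote["chamber"])  # passage h
```

Chained lookups such as `vote["bill"]["number"]` are how you reach values at any depth of a parsed JSON document.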