# Working with JSON

[JSON](http://www.json.org/) (_JavaScript Object Notation_) is a popular data-interchange format. It is well suited for structured data that mixes numbers and strings. Python supports the JSON format through the `json` module in its standard library (that `json` is a _standard_ library means that it is always available).

To read a JSON file with Python, we
+ import the `json` library,
+ open the JSON file with the built-in `open` function,
+ read the file handle with the `json.load` function.

In [None]:
import json

with open('data/pop_growth.json', mode='r') as fid:
    population = json.load(fid)

population

The data file above was downloaded from [Statistics Norway](http://www.ssb.no/), and contains information about the population and the population growth in Norway. It is the dataset [Population change, Whole country, latest quarter](http://data.ssb.no/api/v0/dataset/1104.json?lang=en) found among the [Ready-made datasets](http://data.ssb.no/api/v0/dataset/?lang=en).

## Lists and dictionaries

Two of the basic, but very powerful, data structures in Python are _lists_ (`list`) and _dictionaries_ (`dict`). These are also the main building blocks of the JSON format (although in JSON they are called _arrays_ and _objects_).
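To see this correspondence directly, here is a minimal sketch that parses a small JSON text from a string. The JSON text is made up for illustration and is not part of the population data; `json.loads` (with an `s`) parses a string, whereas `json.load` reads from a file handle.

```python
import json

# A small, made-up JSON document (not taken from the population data)
text = '{"name": "Oslo", "population": [1, 2.5, "n/a"], "capital": true, "twin": null}'

data = json.loads(text)   # parse JSON from a string

print(type(data))                      # a JSON object becomes a dict
print(type(data['population']))        # a JSON array becomes a list
print(data['capital'], data['twin'])   # true becomes True, null becomes None
```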

A `list` is an ordered sequence of elements. These elements can be of different types, and can even be new lists. A `dict` is a collection of key-value pairs (since Python 3.7 it remembers the order in which the pairs were inserted). A value can be anything (even a new `dict`), while there are some restrictions on what can be a key: keys must be hashable, and most often strings are used.

In [None]:
# list of cities
cities = ['Wien', 'Oslo', 'London', 'Barcelona']
for city in cities:
    print(city)

In [None]:
# dict of countries with their capitals
capitals = {'Austria': 'Wien', 'Norway': 'Oslo', 'Portugal': 'Lisboa', 'Finland': 'Helsinki'}
for country, capital in capitals.items():
    print(country, capital)
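The nesting and the restrictions on keys can be illustrated with a short sketch. All names and numbers below are made up for the example. The restriction on keys is that they must be hashable, which in practice means that immutable objects such as strings, numbers and tuples work, while lists do not.

```python
# A list may mix types and can contain other lists
mixed = ['Oslo', 59.9, ['Wien', 'London']]
print(mixed[2])

# A dict value can be anything, including another dict
countries = {'Norway': {'capital': 'Oslo', 'nordic': True}}
print(countries['Norway'])

# Strings, numbers and tuples can be used as keys ...
coordinates = {('Oslo', 'Norway'): (59.9, 10.8)}
print(coordinates[('Oslo', 'Norway')])

# ... but a list can not, since it is mutable
list_key_failed = False
try:
    bad = {['Oslo', 'Norway']: (59.9, 10.8)}
except TypeError as err:
    list_key_failed = True
    print('TypeError:', err)
```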

## Indexing

To pick out one element from a list or dictionary we use _indexing_. This is denoted by square brackets. For lists we need to use a numerical index, the element number counting from 0:

In [None]:
cities[0]

In [None]:
cities[2]

For dictionaries the indexing is done by the keys. In our example above we only used string-keys, namely the names of countries.

In [None]:
capitals['Norway']
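Note that a dictionary can not be indexed by position. As a small sketch (using a shortened copy of the `capitals` dictionary), asking for a key that does not exist raises a `KeyError`, and the `in` operator can be used to check whether a key is present:

```python
capitals = {'Austria': 'Wien', 'Norway': 'Oslo'}

print('Norway' in capitals)    # membership test looks at the keys

position_failed = False
try:
    capitals[0]                # 0 is not a key in this dict
except KeyError as err:
    position_failed = True
    print('KeyError:', err)
```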

For numerical indices we can also use _slicing_ to pick out several elements at once (getting a sub-list from a list). In slicing we give both a start-index and an end-index separated by a colon. The start-index is inclusive, while the end-index is exclusive.

In [None]:
cities[0:2]      # Includes the elements 0 and 1, but not 2

Any of these numbers can be omitted. If the start-index is omitted it defaults to 0, while if the end-index is omitted all elements at the end of the sequence are included.

In [None]:
cities[:2]

In [None]:
cities[1:]

It is also possible to specify a third number, which will be the stride. For instance `[::2]` will pick out every second element of a list.

In [None]:
cities[::2]
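With only four cities it can be hard to see the pattern, so here is the same set of slicing rules applied to a list of the numbers 0 to 9 (a made-up list, chosen so that each element equals its own index):

```python
numbers = list(range(10))   # [0, 1, 2, ..., 9]

print(numbers[2:5])     # [2, 3, 4]
print(numbers[:3])      # [0, 1, 2]
print(numbers[7:])      # [7, 8, 9]
print(numbers[::2])     # [0, 2, 4, 6, 8]
print(numbers[1:8:3])   # [1, 4, 7]
```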

## Back to the population example

Now that we know a little more about how to deal with lists and dictionaries, let us try to make sense of the population data we loaded earlier. Recall,

In [None]:
population

First we might notice that the data are quite heavily nested. (The data are in a special form of JSON called JSON-stat, version 1.2.) Everything is in the sub-dictionary `dataset`. Let us look inside `dataset`. The `.keys()` method shows the keys of a dictionary.

In [None]:
population['dataset'].keys()

The `dataset` dictionary has 5 keys. Of these, `label`, `source` and `updated` are simple text strings, and not so interesting to us. The item `value` contains the actual data in the dataset. Let us store those in a new variable:

In [None]:
values = population['dataset']['value']
values

The `dimension` item contains information describing the data. For instance, deeply nested inside `dimension` we find `index`, which tells us at which position in the list of `values` each piece of data is stored. We also see `label`, which gives a somewhat more explanatory description of each value.

In [None]:
index = population['dataset']['dimension']['ContentsCode']['category']['index']
label = population['dataset']['dimension']['ContentsCode']['category']['label']
index, label

We can use `index` to connect `label` and `values`: both `index` and `label` have the same keys, and `index` gives the position in `values` for each key. A convenient way to do this is to use a _dict comprehension_. In Python, _list comprehensions_, _dict comprehensions_, _set comprehensions_ and _generator expressions_ are used to turn an _iterable_ (e.g. list, dictionary, set, tuple) into another. We'll look at these in more detail later.

In [None]:
{label[k]: values[v] for k, v in index.items()}
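To try this step without the data file, the following sketch builds a miniature dictionary with the same nesting as described above. The codes, labels and numbers are invented for illustration and are not values from Statistics Norway; the names are prefixed with `mini` so they do not overwrite the variables used earlier.

```python
# Hypothetical miniature of the nested structure; all contents are invented
mini = {
    'dataset': {
        'dimension': {
            'ContentsCode': {
                'category': {
                    'index': {'Births': 0, 'Deaths': 1},
                    'label': {'Births': 'Live births', 'Deaths': 'Deaths'},
                }
            }
        },
        'value': [100, 80],
    }
}

mini_category = mini['dataset']['dimension']['ContentsCode']['category']
mini_values = mini['dataset']['value']

# index maps code -> position in the value list, label maps code -> description
mini_result = {mini_category['label'][code]: mini_values[position]
               for code, position in mini_category['index'].items()}
print(mini_result)   # {'Live births': 100, 'Deaths': 80}
```

Using descriptive loop names such as `code` and `position` makes it easier to see which dictionary supplies which piece of the result.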