# Class 6 Notebook 2: JSON

Class 6 (5 Dec 2016) of [BS1804-1617 Fundamentals of Database Technologies](https://imperialbusiness.school/category/bs1804-1617/) by [Piotr Migdał](http://p.migdal.pl/)

[JSON](https://en.wikipedia.org/wiki/JSON) stands for **JavaScript Object Notation**.
JSON files traditionally end with `.json`.

References:

* [json — JSON encoder and decoder - The Python Standard Library](https://docs.python.org/3/library/json.html)
* [How to Work With JSON Data Using Python](https://code.tutsplus.com/tutorials/how-to-work-with-json-data-using-python--cms-25758)
* [Render JSON into collapsible HTML](http://caldwell.github.io/renderjson/) for a demo

* In Python, JSON works with:
    * dictionaries,
    * lists,
    * integers,
    * floats,
    * strings,
    * True/False/None.
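
As a rough illustration of the list above, the sketch below encodes one value of each supported type with `json.dumps` and prints its JSON form. The names `examples` and `encoded` are made up for this example.

```python
import json

# One sample value per supported Python type (illustrative names only).
examples = {
    "dictionary": {"key": "value"},
    "list": [1, 2, 3],
    "integer": 42,
    "float": 2.5,
    "string": "car",
    "true": True,
    "false": False,
    "none": None,
}

# Encode each value separately to see its JSON spelling.
encoded = {name: json.dumps(value) for name, value in examples.items()}

for name, text in encoded.items():
    print(name, "->", text)
```

Note that `True`, `False` and `None` are spelled `true`, `false` and `null` in JSON.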

In [1]:
# we need to import a library
import json

In [2]:
# creating a list of dictionaries
x = [{"a": 1, "b": 2.5, 1337: "car"},
     {"a": 99, "c": "train"},
     {"a": -4, "b": -0.1, "c": "plane"}]

In [3]:
x

[{'b': 2.5, 'a': 1, 1337: 'car'},
 {'a': 99, 'c': 'train'},
 {'a': -4, 'b': -0.1, 'c': 'plane'}]

In [4]:
# create a JSON string from a Python object
json.dumps(x)

'[{"b": 2.5, "a": 1, "1337": "car"}, {"c": "train", "a": 99}, {"b": -0.1, "c": "plane", "a": -4}]'

In [5]:
# with indentation
json.dumps(x, indent=2)

'[\n  {\n    "b": 2.5,\n    "a": 1,\n    "1337": "car"\n  },\n  {\n    "c": "train",\n    "a": 99\n  },\n  {\n    "b": -0.1,\n    "c": "plane",\n    "a": -4\n  }\n]'

In [6]:
# its clarity is not obvious until we print it:
print(json.dumps(x, indent=2))

[
  {
    "b": 2.5,
    "a": 1,
    "1337": "car"
  },
  {
    "c": "train",
    "a": 99
  },
  {
    "b": -0.1,
    "c": "plane",
    "a": -4
  }
]


In [7]:
# save a file
# (and open it in a text editor, or with Jupyter Notebook)
with open("file.json", "w") as f:
    json.dump(x, f)

In [8]:
# load
with open("file.json") as f:
    new_x = json.load(f)

In [9]:
new_x

[{'1337': 'car', 'a': 1, 'b': 2.5},
 {'a': 99, 'c': 'train'},
 {'a': -4, 'b': -0.1, 'c': 'plane'}]
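
The loaded data is not quite the same as what was saved: the integer key `1337` came back as the string `'1337'`, because JSON object keys are always strings. A minimal sketch of the same effect, using `json.dumps` and `json.loads` instead of a file; the `"pair"` entry is an extra assumption added here to show that tuples come back as lists.

```python
import json

# Hypothetical data: an integer key and a tuple value, neither of which
# survives a JSON round trip unchanged.
original = [{"a": 1, 1337: "car", "pair": (1, 2)}]

restored = json.loads(json.dumps(original))
print(restored)

# The round trip is lossy, so the two objects are no longer equal.
same = restored == original
print(same)
```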

## json library

* To/from file:
    * json.dump
    * json.load
* To/from string:
    * json.dumps
    * json.loads
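
A short sketch of all four functions side by side. The variable names and the temporary file `example.json` are assumptions for this example; the `with` blocks make sure the file is closed after writing and reading.

```python
import json
import os
import tempfile

data = {"a": 1, "b": [2.5, None]}

# To/from string: the trailing "s" stands for "string".
text = json.dumps(data)
from_text = json.loads(text)

# To/from file: same data, written to a temporary location.
path = os.path.join(tempfile.mkdtemp(), "example.json")
with open(path, "w") as f:
    json.dump(data, f)
with open(path) as f:
    from_file = json.load(f)

print(text)
print(from_text == from_file)
```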

## Pandas

We can import and export *some* JSON objects with Pandas.

In [10]:
# importing Pandas
import pandas as pd

In [11]:
x = [{"a": 1, "b": 2.5, "c": "car"},
     {"a": 99, "c": "train"},
     {"a": -4, "b": -0.1, "c": "plane"}]

In [12]:
df = pd.DataFrame(x)

In [13]:
df

|   | a  | b    | c     |
|---|----|------|-------|
| 0 | 1  | 2.5  | car   |
| 1 | 99 | NaN  | train |
| 2 | -4 | -0.1 | plane |


In [14]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 3 columns):
a    3 non-null int64
b    2 non-null float64
c    3 non-null object
dtypes: float64(1), int64(1), object(1)
memory usage: 152.0+ bytes


In [15]:
# warning: the default orient nests values by column and then by index label
df.to_json()

'{"a":{"0":1,"1":99,"2":-4},"b":{"0":2.5,"1":null,"2":-0.1},"c":{"0":"car","1":"train","2":"plane"}}'

In [16]:
df.to_json(orient="records")

'[{"a":1,"b":2.5,"c":"car"},{"a":99,"b":null,"c":"train"},{"a":-4,"b":-0.1,"c":"plane"}]'

In [17]:
df.to_dict("records")

[{'a': 1, 'b': 2.5, 'c': 'car'},
 {'a': 99, 'b': nan, 'c': 'train'},
 {'a': -4, 'b': -0.1, 'c': 'plane'}]

In [18]:
# to show that it works we can export and import it again
json.loads(json.dumps(df.to_dict("records")))

[{'a': 1, 'b': 2.5, 'c': 'car'},
 {'a': 99, 'b': nan, 'c': 'train'},
 {'a': -4, 'b': -0.1, 'c': 'plane'}]

In [19]:
df.to_dict("series")

{'a': 0     1
 1    99
 2    -4
 Name: a, dtype: int64, 'b': 0    2.5
 1    NaN
 2   -0.1
 Name: b, dtype: float64, 'c': 0      car
 1    train
 2    plane
 Name: c, dtype: object}

## Exercises

* Try other `df.to_dict` exports. What are the other forms?
* Try loading into Pandas a list of dictionaries, where:
    * types are different (e.g. containing both `12` and `"bridge"`),
    * there is a missing value among integers.

In [20]:
z = [{"a": 12, "b": 99, "c": 19},
     {"a": "bridge", "c": 19.5}]

In [21]:
df = pd.DataFrame(z)
df

|   | a      | b    | c    |
|---|--------|------|------|
| 0 | 12     | 99.0 | 19.0 |
| 1 | bridge | NaN  | 19.5 |


In [22]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2 entries, 0 to 1
Data columns (total 3 columns):
a    2 non-null object
b    1 non-null float64
c    2 non-null float64
dtypes: float64(2), object(1)
memory usage: 128.0+ bytes


## Beware of

* some objects can be exported to JSON in many different ways
* implicit format changes (e.g. 2 -> 2.0)
* the representation of missing values
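
The last two points can be checked with the `json` module alone. The sketch below is a hedged illustration, not a full treatment: it shows that `json` itself keeps `2` and `2.0` apart (the `99 -> 99.0` change seen above comes from Pandas storing a missing value in a numeric column), and that `NaN` is written as a bare token that strict JSON does not allow. All variable names are made up for this example.

```python
import json

# 2 and 2.0 are both JSON numbers, but Python loads them as int and float.
numbers = json.loads("[2, 2.0]")
types = [type(v).__name__ for v in numbers]
print(types)

# NaN is written as the bare token NaN, which is not part of strict JSON.
nan_text = json.dumps({"b": float("nan")})
print(nan_text)

# With allow_nan=False the encoder raises ValueError instead.
try:
    json.dumps({"b": float("nan")}, allow_nan=False)
    strict_error = None
except ValueError as err:
    strict_error = err
print(type(strict_error).__name__)

# None is the value that maps to the standard JSON null.
null_text = json.dumps({"b": None})
print(null_text)
```

This is why `df.to_json(orient="records")` above shows `null`, while `json.dumps(df.to_dict("records"))` would carry a `NaN` that other JSON parsers may reject.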

## Bonus

We live in the Matrix: a Jupyter Notebook is a JSON file!

Just open a `*.ipynb` file in a text editor (e.g. Atom, Notepad++, Sublime Text, Vim).

In [23]:
this_file = json.load(open("4_json.ipynb"))

In [24]:
this_file.keys()

dict_keys(['cells', 'nbformat', 'metadata', 'nbformat_minor'])

## Exercise

* What is the number of cells in this file?
* What Python version do we use in this file?
* Look at the content of a single cell.

In [25]:
len(this_file["cells"])

35

In [26]:
this_file["metadata"]

{'anaconda-cloud': {},
 'kernelspec': {'display_name': 'Python [default]',
  'language': 'python',
  'name': 'python3'},
 'language_info': {'codemirror_mode': {'name': 'ipython', 'version': 3},
  'file_extension': '.py',
  'mimetype': 'text/x-python',
  'name': 'python',
  'nbconvert_exporter': 'python',
  'pygments_lexer': 'ipython3',
  'version': '3.5.2'}}

In [27]:
print(json.dumps(this_file["metadata"], indent=2))

{
  "anaconda-cloud": {},
  "kernelspec": {
    "display_name": "Python [default]",
    "language": "python",
    "name": "python3"
  },
  "language_info": {
    "version": "3.5.2",
    "name": "python",
    "codemirror_mode": {
      "name": "ipython",
      "version": 3
    },
    "file_extension": ".py",
    "nbconvert_exporter": "python",
    "pygments_lexer": "ipython3",
    "mimetype": "text/x-python"
  }
}
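
For the last exercise, looking at a single cell, here is a self-contained sketch. It does not read the real `4_json.ipynb`; instead it parses a hypothetical two-cell notebook written inline, assuming the usual notebook layout with the same four top-level keys as above.

```python
import json

# Hypothetical minimal notebook, NOT the real 4_json.ipynb.
notebook_text = """
{
  "cells": [
    {"cell_type": "markdown", "metadata": {}, "source": ["## json library"]},
    {"cell_type": "code", "execution_count": 1, "metadata": {},
     "outputs": [], "source": ["import json"]}
  ],
  "metadata": {"language_info": {"name": "python", "version": "3.5.2"}},
  "nbformat": 4,
  "nbformat_minor": 1
}
"""

notebook = json.loads(notebook_text)

# Number of cells and the Python version, as in the exercise.
n_cells = len(notebook["cells"])
python_version = notebook["metadata"]["language_info"]["version"]
print(n_cells, python_version)

# Pick the first code cell and pretty-print its content.
code_cells = [c for c in notebook["cells"] if c["cell_type"] == "code"]
first_code_cell = code_cells[0]
print(json.dumps(first_code_cell, indent=2))
```

With the real file, `this_file["cells"][0]` gives the first cell in the same way.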
