# Jonathan Halverson
# Saturday, March 4, 2017
# Notes on JSON

We begin by creating a simple dictionary:

In [1]:
d = dict([('dog', 1), ('fish', 0), ('cat', 9)])
d

{'cat': 9, 'dog': 1, 'fish': 0}

In [2]:
d.keys()

['fish', 'dog', 'cat']

Let's write the dictionary to a file:

In [3]:
import json
help(json.dump)

Help on function dump in module json:

dump(obj, fp, skipkeys=False, ensure_ascii=True, check_circular=True, allow_nan=True, cls=None, indent=None, separators=None, encoding='utf-8', default=None, sort_keys=False, **kw)
    Serialize ``obj`` as a JSON formatted stream to ``fp`` (a
    ``.write()``-supporting file-like object).
    
    If ``skipkeys`` is true then ``dict`` keys that are not basic types
    (``str``, ``unicode``, ``int``, ``long``, ``float``, ``bool``, ``None``)
    will be skipped instead of raising a ``TypeError``.
    
    If ``ensure_ascii`` is true (the default), all non-ASCII characters in the
    output are escaped with ``\uXXXX`` sequences, and the result is a ``str``
    instance consisting of ASCII characters only.  If ``ensure_ascii`` is
    ``False``, some chunks written to ``fp`` may be ``unicode`` instances.
    This usually happens because the input contains unicode strings or the
    ``encoding`` parameter is used. Unless ``fp.write()`` explicitly
    ...
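The `ensure_ascii` parameter described in the help text is easiest to see with a non-ASCII string. A minimal sketch, assuming Python 3 (where `json.dumps` always returns a `str`); the sample word `'café'` is only an illustration:

```python
import json

word = 'café'
# Default (ensure_ascii=True): non-ASCII characters are escaped as \uXXXX
escaped = json.dumps(word)
# ensure_ascii=False leaves the characters as they are
raw = json.dumps(word, ensure_ascii=False)
print(escaped)  # "caf\u00e9"
print(raw)      # "café"
```

Both strings decode back to the same Python value; the escaped form is simply safe to write through ASCII-only channels.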

In [4]:
with open('test1.json', 'w') as fp:
    json.dump(d, fp)

Below we look at the file's contents from the shell. Note that the single quotes have been replaced by double quotes, which JSON requires for strings:

In [5]:
!cat test1.json

{"fish": 0, "dog": 1, "cat": 9}
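The key order in the file follows this Python 2 dictionary's internal order, not the order in which the pairs were given. The `indent` and `sort_keys` parameters listed in the help for `json.dump` give readable, reproducible output. A sketch using `json.dumps` (the string-returning counterpart, which takes the same keyword arguments), assuming Python 3:

```python
import json

d = dict([('dog', 1), ('fish', 0), ('cat', 9)])
# sort_keys=True fixes the key order; indent=2 puts each pair on its own line
pretty = json.dumps(d, indent=2, sort_keys=True)
print(pretty)
# {
#   "cat": 9,
#   "dog": 1,
#   "fish": 0
# }
```

Passing the same keywords to `json.dump(d, fp, indent=2, sort_keys=True)` would write this layout to the file instead.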

Let's read it back in:

In [6]:
help(json.load)

Help on function load in module json:

load(fp, encoding=None, cls=None, object_hook=None, parse_float=None, parse_int=None, parse_constant=None, object_pairs_hook=None, **kw)
    Deserialize ``fp`` (a ``.read()``-supporting file-like object containing
    a JSON document) to a Python object.
    
    If the contents of ``fp`` is encoded with an ASCII based encoding other
    than utf-8 (e.g. latin-1), then an appropriate ``encoding`` name must
    be specified. Encodings that are not ASCII based (such as UCS-2) are
    not allowed, and should be wrapped with
    ``codecs.getreader(fp)(encoding)``, or simply decoded to a ``unicode``
    object and passed to ``loads()``
    
    ``object_hook`` is an optional function that will be called with the
    result of any object literal decode (a ``dict``). The return value of
    ``object_hook`` will be used instead of the ``dict``. This feature
    can be used to implement custom decoders (e.g. JSON-RPC class hinting).
    
    ...
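As the help text says, `object_hook` receives each decoded JSON object as a `dict`, and its return value is used in place of that dict; `json.loads` accepts the same hook. A minimal sketch, assuming Python 3.3+ for `types.SimpleNamespace`, that turns objects into attribute-style records:

```python
import json
from types import SimpleNamespace

text = '{"test": {"animal": "cat", "location": "San Diego"}, "meta": {"retreat": 1, "castle": 2}}'

# object_hook is called for every JSON object, innermost first,
# so the nested objects are converted as well as the outer one
obj = json.loads(text, object_hook=lambda obj_dict: SimpleNamespace(**obj_dict))
print(obj.test.animal)  # cat
print(obj.meta.castle)  # 2
```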

In [7]:
with open('test1.json') as fp:
    data = json.load(fp)

In [8]:
data

{u'cat': 9, u'dog': 1, u'fish': 0}

In [9]:
data.keys()

[u'fish', u'dog', u'cat']

We see that the strings are now unicode objects (note the u prefix), and Python's display of them uses single quotes again.
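The `u` prefix is a Python 2 detail: in Python 3, `json.load` returns ordinary `str` objects. A minimal sketch of the same write/read round trip under Python 3, using a temporary directory (an assumption for the sketch) instead of `test1.json` so no file is left behind:

```python
import json
import os
import tempfile

d = dict([('dog', 1), ('fish', 0), ('cat', 9)])

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'test1.json')
    with open(path, 'w') as fp:
        json.dump(d, fp)      # Python object -> file
    with open(path) as fp:
        data = json.load(fp)  # file -> Python object

print(data == d)                              # True
print(all(isinstance(k, str) for k in data))  # True
```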

### Multiline JSON

In [10]:
!cat two.json

{"test":{"animal":"cat", "location":"San Diego"},
"meta":{"retreat":1, "castle":2}}


In [11]:
with open('two.json') as fp:
    data = json.load(fp)

In [12]:
data

{u'meta': {u'castle': 2, u'retreat': 1},
 u'test': {u'animal': u'cat', u'location': u'San Diego'}}

In [13]:
data.keys()

[u'test', u'meta']

In [14]:
data.values()

[{u'animal': u'cat', u'location': u'San Diego'}, {u'castle': 2, u'retreat': 1}]
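Newlines and other whitespace between JSON tokens are ignored, so the two-line file parses into a single object, and nested JSON objects become nested dictionaries. A short sketch (Python 3) using the same text as `two.json`, reaching inner values by chaining keys; the key `'moat'` is a made-up example of a missing key:

```python
import json

# Same content as two.json, including the line break
text = '''{"test":{"animal":"cat", "location":"San Diego"},
"meta":{"retreat":1, "castle":2}}'''

data = json.loads(text)
print(data['test']['animal'])       # cat
print(data['meta']['castle'])       # 2
# .get() returns a default instead of raising KeyError for a missing key
print(data['meta'].get('moat', 0))  # 0
```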

### Example of a lengthy JSON file

In [15]:
with open('../../hadoop/spark_sql/Nail_Salon_Permits.json') as fp:
    data = json.load(fp)

### One JSON object per line (read each line as a string, then use json.loads(s))

In [16]:
!cat per_line.json

{"animal":"cat", "location":"San Diego"}
{"place":"Denver", "food":"Ham Salad"}


As expected, the load method fails here: it deserializes a single JSON document, not one JSON object per line:

In [17]:
with open('per_line.json') as fp:
    data = json.load(fp)

ValueError: Extra data: line 2 column 1 - line 3 column 1 (char 41 - 80)
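Under Python 3.5+ the same failure raises `json.JSONDecodeError`, a subclass of `ValueError`, whose attributes locate the problem. A sketch reproducing it with the contents of `per_line.json` held in a string:

```python
import json

text = ('{"animal":"cat", "location":"San Diego"}\n'
        '{"place":"Denver", "food":"Ham Salad"}\n')

caught = None
try:
    json.loads(text)
except json.JSONDecodeError as err:
    caught = err  # keep a reference; 'err' is cleared when the except block ends

# The first object parses fine; the decoder then finds more data after it
print(caught.msg, caught.pos, caught.lineno, caught.colno)  # Extra data 41 2 1
```

The position matches the `char 41` in the Python 2 message above.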

The solution is to read each line as a string and then use the loads method:

In [20]:
with open('per_line.json') as fp:
    for line in fp:
        x = json.loads(line)
        print x.values()

[u'San Diego', u'cat']
[u'Ham Salad', u'Denver']
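This one-object-per-line layout is commonly called JSON Lines (or newline-delimited JSON). A slightly more defensive sketch of the loop above, assuming Python 3; `read_json_lines` is a hypothetical helper name, and `io.StringIO` stands in for the open file:

```python
import io
import json

def read_json_lines(fp):
    """Yield one decoded object per non-blank line of a file-like object."""
    for lineno, line in enumerate(fp, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines, e.g. at the end of the file
        try:
            yield json.loads(line)
        except ValueError as err:
            # Report which line failed, keeping the original error as the cause
            raise ValueError('line %d: %s' % (lineno, err)) from err

fp = io.StringIO('{"animal":"cat", "location":"San Diego"}\n'
                 '{"place":"Denver", "food":"Ham Salad"}\n\n')
records = list(read_json_lines(fp))
print(records[1]['place'])  # Denver
```

Iterating over the file object reads one line at a time, so the whole file never has to be held in memory.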


In [21]:
help(json.loads)

Help on function loads in module json:

loads(s, encoding=None, cls=None, object_hook=None, parse_float=None, parse_int=None, parse_constant=None, object_pairs_hook=None, **kw)
    Deserialize ``s`` (a ``str`` or ``unicode`` instance containing a JSON
    document) to a Python object.
    
    If ``s`` is a ``str`` instance and is encoded with an ASCII based encoding
    other than utf-8 (e.g. latin-1) then an appropriate ``encoding`` name
    must be specified. Encodings that are not ASCII based (such as UCS-2)
    are not allowed and should be decoded to ``unicode`` first.
    
    ``object_hook`` is an optional function that will be called with the
    result of any object literal decode (a ``dict``). The return value of
    ``object_hook`` will be used instead of the ``dict``. This feature
    can be used to implement custom decoders (e.g. JSON-RPC class hinting).
    
    ``object_pairs_hook`` is an optional function that will be called with the
    result of any object literal ...

In [22]:
s = '{"test":{"animal":"cat", "location":"San Diego"}, "meta":{"retreat":1, "castle":2}}'

In [23]:
d = json.loads(s)
d

{u'meta': {u'castle': 2, u'retreat': 1},
 u'test': {u'animal': u'cat', u'location': u'San Diego'}}
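The keys come back in a different order from the string (`meta` before `test`) because Python 2 dictionaries are unordered. The `object_pairs_hook` parameter mentioned in the help receives each object's pairs in document order, so passing `collections.OrderedDict` keeps that order. (In Python 3.7+ a plain `dict` already preserves insertion order.) A sketch:

```python
import json
from collections import OrderedDict

s = '{"test":{"animal":"cat", "location":"San Diego"}, "meta":{"retreat":1, "castle":2}}'
# The hook is called with a list of (key, value) pairs for each JSON object
d = json.loads(s, object_pairs_hook=OrderedDict)
print(list(d))          # ['test', 'meta']
print(list(d['meta']))  # ['retreat', 'castle']
```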

In [24]:
help(json.dumps)

Help on function dumps in module json:

dumps(obj, skipkeys=False, ensure_ascii=True, check_circular=True, allow_nan=True, cls=None, indent=None, separators=None, encoding='utf-8', default=None, sort_keys=False, **kw)
    Serialize ``obj`` to a JSON formatted ``str``.
    
    If ``skipkeys`` is true then ``dict`` keys that are not basic types
    (``str``, ``unicode``, ``int``, ``long``, ``float``, ``bool``, ``None``)
    will be skipped instead of raising a ``TypeError``.
    
    
    If ``ensure_ascii`` is false, all non-ASCII characters are not escaped, and
    the return value may be a ``unicode`` instance. See ``dump`` for details.
    
    If ``check_circular`` is false, then the circular reference check
    for container types will be skipped and a circular reference will
    result in an ``OverflowError`` (or worse).
    
    If ``allow_nan`` is false, then it will be a ``ValueError`` to
    serialize out of range ``float`` values (``nan``, ``inf``, ``-inf``) in
    strict ...

### json.dumps(d) to serialize a Python object to a JSON string

In [25]:
json.dumps(d)

'{"test": {"location": "San Diego", "animal": "cat"}, "meta": {"castle": 2, "retreat": 1}}'
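By default `json.dumps` puts a space after every `,` and `:`. The `separators` parameter from the help text produces the most compact form, and `sort_keys=True` makes the output deterministic (at every nesting level), which is handy when comparing serialized strings. A sketch, assuming Python 3:

```python
import json

d = {'test': {'animal': 'cat', 'location': 'San Diego'},
     'meta': {'retreat': 1, 'castle': 2}}
# No spaces after item or key separators; keys sorted in every object
compact = json.dumps(d, separators=(',', ':'), sort_keys=True)
print(compact)
# {"meta":{"castle":2,"retreat":1},"test":{"animal":"cat","location":"San Diego"}}
```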

### Summary

We see that loads and dumps are inverse operations, and likewise load and dump. The difference is the medium: dump and load convert between Python objects and files, while dumps and loads convert between Python objects and strings.
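The inverse relationship holds exactly for data built from JSON's own types (dicts with string keys, lists, strings, numbers, booleans, `None`). Other Python types are converted on the way out, so the round trip is not always exact. A sketch, assuming Python 3, showing both cases:

```python
import json

ok = {'animal': 'cat', 'counts': [1, 2, 3], 'flag': True, 'missing': None}
print(json.loads(json.dumps(ok)) == ok)  # True

# A tuple is written as a JSON array and read back as a list;
# a non-string key such as 1 is written as the string "1"
lossy = {1: ('a', 'b')}
print(json.loads(json.dumps(lossy)))  # {'1': ['a', 'b']}
```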