# Working with JSON Files

## Reading and Writing JSON Files

Since its inception, JSON has quickly become the de facto standard for information exchange. Chances are you’re here because you need to transport some data from here to there. Perhaps you’re gathering information through an API or storing your data in a document database. One way or another, you’re up to your neck in JSON, and you’ve got to Python your way out.

### A (Very) Brief History of JSON

Not so surprisingly, JavaScript Object Notation was inspired by a subset of the JavaScript programming language dealing with object literal syntax. There’s a nifty website, json.org, that explains the whole thing. Don’t worry though: JSON has long since become language agnostic and exists as its own standard, so we can thankfully avoid JavaScript for the sake of this discussion.

Ultimately, the community at large adopted JSON because it’s easy for both humans and machines to create and understand.

JSON supports primitive types, like strings and numbers, as well as nested lists and objects.
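To make the supported types concrete, here is a small sketch. The document below is made up for illustration (all field names and values are assumptions), and it is parsed with the standard-library json module only to show how the pieces nest.

```python
import json

# A made-up JSON document covering the value types JSON supports
sample = """
{
    "firstName": "Jane",
    "age": 30,
    "height": 1.68,
    "isStudent": false,
    "spouse": null,
    "hobbies": ["running", "chess"],
    "address": {"city": "Ljubljana", "zip": "1000"}
}
"""

parsed = json.loads(sample)
print(type(parsed).__name__)      # dict
print(parsed["hobbies"][0])       # running
print(parsed["address"]["city"])  # Ljubljana
```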

> Wait, that looks like a Python dictionary! I know, right? It’s pretty much universal object notation at this point, but I don’t think UON rolls off the tongue quite as nicely.

### Python Supports JSON Natively

Python comes with a built-in package called json for encoding and decoding JSON data.

Just throw this little guy up at the top of your file:

In [None]:
import json

The process of encoding JSON is usually called serialization. This term refers to the transformation of data into a series of bytes (hence serial) to be stored or transmitted across a network. You may also hear the term marshaling, but that’s a whole other discussion. Naturally, deserialization is the reciprocal process of decoding data that has been stored or delivered in the JSON standard.

Yikes! That sounds pretty technical. Definitely. But in reality, all we’re talking about here is reading and writing. Think of it like this: encoding is for writing data to disk, while decoding is for reading data into memory.

### Serializing JSON

What happens after a computer processes lots of information? It needs to take a data dump. Accordingly, the json library exposes the dump() function for writing data to files. There is also a dumps() function (pronounced “dump-s”) for writing to a Python string.

Simple Python objects are translated to JSON according to a fairly intuitive conversion.

<div class="table-responsive">
<table class="table table-hover">
<thead>
<tr>
<th>Python</th>
<th>JSON</th>
</tr>
</thead>
<tbody>
<tr>
<td><code>dict</code></td>
<td><code>object</code></td>
</tr>
<tr>
<td><code>list</code>, <code>tuple</code></td>
<td><code>array</code></td>
</tr>
<tr>
<td><code>str</code></td>
<td><code>string</code></td>
</tr>
<tr>
<td><code>int</code>, <code>float</code></td>
<td><code>number</code></td>
</tr>
<tr>
<td><code>True</code></td>
<td><code>true</code></td>
</tr>
<tr>
<td><code>False</code></td>
<td><code>false</code></td>
</tr>
<tr>
<td><code>None</code></td>
<td><code>null</code></td>
</tr>
</tbody>
</table>
</div>
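The table can be checked with a quick sketch. The dictionary below is invented for illustration; it simply puts one value of each Python type through json.dumps().

```python
import json

# One made-up value per row of the conversion table
value = {
    "a_tuple": (1, 2),
    "a_list": [3, 4],
    "a_float": 1.5,
    "flag": True,
    "nothing": None,
}

encoded = json.dumps(value)
print(encoded)
# {"a_tuple": [1, 2], "a_list": [3, 4], "a_float": 1.5, "flag": true, "nothing": null}
```

Note that the tuple and the list both end up as JSON arrays.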

Imagine you’re working with a Python object in memory that looks a little something like this:

In [None]:
data = {
    "president": {
        "name": "Zaphod Beeblebrox",
        "species": "Betelgeusian"
    }
}

It is critical that you save this information to disk, so your mission is to write it to a file.

Using Python’s context manager, you can create a file called data_file.json and open it in write mode. (JSON files conveniently end in a .json extension.)

In [None]:
with open("data/data_file.json", "w") as write_file:
    json.dump(data, write_file)

Note that dump() takes two positional arguments: (1) the data object to be serialized, and (2) the file-like object to which the serialized JSON text will be written.

Or, if you were so inclined as to continue using this serialized JSON data in your program, you could write it to a native Python str object.

In [None]:
json_string = json.dumps(data)
print(json_string)

{"president": {"name": "Zaphod Beeblebrox", "species": "Betelgeusian"}}


Notice that the file-like object is absent since you aren’t actually writing to disk. Other than that, dumps() is just like dump().
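One way to convince yourself of this is to hand dump() an in-memory text buffer instead of a real file. This is only a sketch: io.StringIO stands in for the file-like object, and the result is compared with what dumps() returns.

```python
import io
import json

data = {"president": {"name": "Zaphod Beeblebrox", "species": "Betelgeusian"}}

buffer = io.StringIO()   # an in-memory, file-like text object
json.dump(data, buffer)  # writes the JSON text into the buffer

same = buffer.getvalue() == json.dumps(data)
print(same)  # True
```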

### Deserializing JSON

Great, looks like you’ve captured yourself some wild JSON! Now it’s time to whip it into shape. In the json library, you’ll find load() and loads() for turning JSON encoded data into Python objects.

Just like serialization, there is a simple conversion table for deserialization, though you can probably guess what it looks like already.

<div class="table-responsive">
<table class="table table-hover">
<thead>
<tr>
<th>JSON</th>
<th>Python</th>
</tr>
</thead>
<tbody>
<tr>
<td><code>object</code></td>
<td><code>dict</code></td>
</tr>
<tr>
<td><code>array</code></td>
<td><code>list</code></td>
</tr>
<tr>
<td><code>string</code></td>
<td><code>str</code></td>
</tr>
<tr>
<td><code>number</code> (int)</td>
<td><code>int</code></td>
</tr>
<tr>
<td><code>number</code> (real)</td>
<td><code>float</code></td>
</tr>
<tr>
<td><code>true</code></td>
<td><code>True</code></td>
</tr>
<tr>
<td><code>false</code></td>
<td><code>False</code></td>
</tr>
<tr>
<td><code>null</code></td>
<td><code>None</code></td>
</tr>
</tbody>
</table>
</div>

Technically, this conversion isn’t a perfect inverse to the serialization table. That basically means that if you encode an object now and then decode it again later, you may not get exactly the same object back. I imagine it’s a bit like teleportation: break my molecules down over here and put them back together over there. Am I still the same person?

In reality, it’s probably more like getting one friend to translate something into Japanese and another friend to translate it back into English. Regardless, the simplest example would be encoding a tuple and getting back a list after decoding, like so:

In [None]:
blackjack_hand = (8, "Q")
encoded_hand = json.dumps(blackjack_hand)
decoded_hand = json.loads(encoded_hand)

In [None]:
blackjack_hand == decoded_hand

False

In [None]:
type(blackjack_hand)

tuple

In [None]:
type(decoded_hand)

list

In [None]:
blackjack_hand == tuple(decoded_hand)

True
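Tuples are not the only case. As a further sketch (the dictionary is made up), JSON object keys are always strings, so integer dictionary keys come back as strings after a round trip:

```python
import json

scores = {1: "one", 2: "two"}
round_tripped = json.loads(json.dumps(scores))

print(round_tripped)            # {'1': 'one', '2': 'two'}
print(scores == round_tripped)  # False
```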

This time, imagine you’ve got some data stored on disk that you’d like to manipulate in memory. You’ll still use the context manager, but this time you’ll open up the existing data_file.json in read mode.

In [None]:
with open("data/data_file.json", "r") as read_file:
    data = json.load(read_file)

In [None]:
data

{'president': {'name': 'Zaphod Beeblebrox', 'species': 'Betelgeusian'}}

Things are pretty straightforward here, but keep in mind that load() can return any of the Python types listed in the conversion table. This only matters if you’re loading data you haven’t seen before. In most cases, the root object will be a dict or a list.
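When the shape of the data is unknown, it can help to look at the root type before indexing into it. Here is a minimal sketch, where describe_root is a hypothetical helper name and loads() is used on strings so the example needs no file:

```python
import json


def describe_root(text):
    """Return the type name of the top-level value in a JSON document."""
    return type(json.loads(text)).__name__


print(describe_root('{"a": 1}'))  # dict
print(describe_root('[1, 2]'))    # list
print(describe_root('"hi"'))      # str
print(describe_root('3.5'))       # float
print(describe_root('null'))      # NoneType
```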

If you’ve pulled JSON data in from another program or have otherwise obtained a string of JSON formatted data in Python, you can easily deserialize that with loads(), which naturally loads from a string:

In [None]:
json_string = """
{
    "researcher": {
        "name": "Ford Prefect",
        "species": "Betelgeusian",
        "relatives": [
            {
                "name": "Zaphod Beeblebrox",
                "species": "Betelgeusian"
            }
        ]
    }
}
"""
data = json.loads(json_string)

In [None]:
data

{'researcher': {'name': 'Ford Prefect',
  'species': 'Betelgeusian',
  'relatives': [{'name': 'Zaphod Beeblebrox', 'species': 'Betelgeusian'}]}}

### Exercise: Parsing JSON Data

Objective: parse the JSON data file 'interface-data.json' and produce output that resembles the following.

    Interface Status
    ================================================================================
    DN                                                 Description           Speed    MTU  
    -------------------------------------------------- --------------------  ------  ------
    topology/pod-1/node-201/sys/phys-[eth1/33]                              inherit   9150 
    topology/pod-1/node-201/sys/phys-[eth1/34]                              inherit   9150 
    topology/pod-1/node-201/sys/phys-[eth1/35]                              inherit   9150 

In [None]:
# Header rows; the column widths match the format used for the data rows below
head = (
    "=" * 67 + "\n"
    + f"{'DN':50} {'Speed':8} {'MTU':7}\n"
    + f"{'-' * 50} {'-' * 8} {'-' * 7}\n"
)

In [None]:
import json

with open('data/exer1-interface-data.json') as f:
    #jsondata = f.read()
    json_object = json.load(f)

In [None]:
imdata = json_object["imdata"]

In [None]:
with open('data/interface_output.txt', 'w') as f:
    f.write(head)
    #print(head)
    for interface in imdata:
        attributes = interface["l1PhysIf"]["attributes"]
        dn = attributes["dn"]
        speed = attributes["speed"]
        mtu = attributes["mtu"]
        data_string = f"{dn:50} {speed:8} {mtu:7}\n"
        #print(data_string)
        f.write(data_string)
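The target output also has a Description column, which the solution above leaves out. The sketch below shows one way to add it. It runs on made-up inline data that mirrors the structure parsed above, and the attribute name "descr" is an assumption, not something confirmed by the exercise file. format_row is a hypothetical helper.

```python
# Made-up sample mirroring the structure parsed above.
# "descr" is an assumed attribute name, not confirmed by the exercise file.
imdata = [
    {"l1PhysIf": {"attributes": {
        "dn": "topology/pod-1/node-201/sys/phys-[eth1/33]",
        "descr": "", "speed": "inherit", "mtu": "9150"}}},
    {"l1PhysIf": {"attributes": {
        "dn": "topology/pod-1/node-201/sys/phys-[eth1/34]",
        "speed": "inherit", "mtu": "9150"}}},
]


def format_row(attributes):
    """Format one interface as a fixed-width row of 86 characters."""
    descr = attributes.get("descr", "")  # tolerate a missing description
    return (f"{attributes['dn']:50} {descr:20} "
            f"{attributes['speed']:>7} {attributes['mtu']:>6}")


header = f"{'DN':50} {'Description':20} {'Speed':>7} {'MTU':>6}"
rows = [format_row(item["l1PhysIf"]["attributes"]) for item in imdata]
print(header)
print("\n".join(rows))
```

Using .get() with a default keeps the loop from failing on interfaces that carry no description.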

### Parsing JSON Files With the pandas Library

In [None]:
import pandas as pd

[Example datasets](https://github.com/jdorfman/awesome-json-datasets#bitcoin)

Reading a JSON string into a pandas object can take a number of parameters. The parser will try to parse a DataFrame if typ is not supplied or is None. To explicitly force Series parsing, pass typ="series".

    pd.read_json(json)

- `dtype` : if True, infer dtypes; if a dict of column to dtype, use those; if False, don’t infer dtypes at all. Default is True. Applies only to the data.
- `convert_axes` : boolean, try to convert the axes to the proper dtypes. Default is True.
- `convert_dates` : a list of columns to parse for dates; if True, try to parse date-like columns. Default is True.
- `keep_default_dates` : boolean, default True. If parsing dates, parse the default date-like columns.
- `numpy` : direct decoding to NumPy arrays. Default is False. Supports numeric data only, although labels may be non-numeric. The JSON ordering MUST be the same for each term if numpy=True. This parameter has been deprecated since pandas 1.0 and is no longer available in recent releases.
- `precise_float` : boolean, default False. Set to True to use the higher-precision (strtod) function when decoding strings to double values. The default (False) uses fast but less precise built-in functionality.
- `date_unit` : string, the timestamp unit to detect when converting dates. Default is None, in which case the timestamp precision is detected automatically. Pass one of ‘s’, ‘ms’, ‘us’ or ‘ns’ to force seconds, milliseconds, microseconds or nanoseconds respectively.
- `lines` : read the file as one JSON object per line.
- `encoding` : the encoding to use to decode py3 bytes.
- `chunksize` : when used in combination with lines=True, return a JsonReader which reads in chunksize lines per iteration.
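As a brief sketch of the `lines` and `chunksize` parameters, the snippet below reads a made-up line-delimited JSON string. It is wrapped in io.StringIO so that it behaves like a file, and the field names are assumptions for illustration only.

```python
from io import StringIO

import pandas as pd

# Made-up line-delimited JSON: one object per line
text = '{"id": 1, "name": "a"}\n{"id": 2, "name": "b"}\n{"id": 3, "name": "c"}'

# lines=True parses every line as a separate record
df = pd.read_json(StringIO(text), lines=True)
print(df.shape)  # (3, 2)

# With chunksize, read_json returns a reader that yields DataFrames
reader = pd.read_json(StringIO(text), lines=True, chunksize=2)
sizes = [len(chunk) for chunk in reader]
print(sizes)  # [2, 1]
```

Reading in chunks keeps memory use bounded when the file is too large to load at once.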

#### Orient options

`orient` :
- Series:
    - default is index
    - allowed values are {split, records, index}
- DataFrame:
    - default is columns
    - allowed values are {split, records, index, columns, values, table}

In [None]:
dfjo = pd.DataFrame(dict(A=range(1, 4), B=range(4, 7), C=range(7, 10)), columns=list('ABC'), index=list('xyz'))

In [None]:
dfjo

       A  B  C
    x  1  4  7
    y  2  5  8
    z  3  6  9


The format of the JSON string:

<table class="colwidths-given table">
<colgroup>
<col style="width: 12%">
<col style="width: 88%">
</colgroup>
<tbody>
<tr class="row-odd"><td><p><code class="docutils literal notranslate"><span class="pre">split</span></code></p></td>
<td><p>dict like {index -&gt; [index], columns -&gt; [columns], data -&gt; [values]}</p></td>
</tr>
<tr class="row-even"><td><p><code class="docutils literal notranslate"><span class="pre">records</span></code></p></td>
<td><p>list like [{column -&gt; value}, … , {column -&gt; value}]</p></td>
</tr>
<tr class="row-odd"><td><p><code class="docutils literal notranslate"><span class="pre">index</span></code></p></td>
<td><p>dict like {index -&gt; {column -&gt; value}}</p></td>
</tr>
<tr class="row-even"><td><p><code class="docutils literal notranslate"><span class="pre">columns</span></code></p></td>
<td><p>dict like {column -&gt; {index -&gt; value}}</p></td>
</tr>
<tr class="row-odd"><td><p><code class="docutils literal notranslate"><span class="pre">values</span></code></p></td>
<td><p>just the values array</p></td>
</tr>
<tr class="row-even"><td><p><code class="docutils literal notranslate"><span class="pre">table</span></code></p></td>
<td><p>adhering to the JSON <a class="reference external" href="https://specs.frictionlessdata.io/json-table-schema/">Table Schema</a></p></td>
</tr>
</tbody>
</table>

- Column oriented (the default for DataFrame) serializes the data as nested JSON objects with column labels acting as the primary index:

In [None]:
dfjo.to_json(orient="columns")

'{"A":{"x":1,"y":2,"z":3},"B":{"x":4,"y":5,"z":6},"C":{"x":7,"y":8,"z":9}}'

- Index oriented (the default for Series) is similar to column oriented, but the index labels are now primary:

In [None]:
dfjo.to_json(orient="index")

'{"x":{"A":1,"B":4,"C":7},"y":{"A":2,"B":5,"C":8},"z":{"A":3,"B":6,"C":9}}'

- Record oriented serializes the data to a JSON array of column -> value records; index labels are not included. This is useful for passing DataFrame data to plotting libraries, for example the JavaScript library d3.js.

In [None]:
dfjo.to_json(orient="records")

'[{"A":1,"B":4,"C":7},{"A":2,"B":5,"C":8},{"A":3,"B":6,"C":9}]'

- Value oriented is a bare-bones option which serializes to nested JSON arrays of values only, column and index labels are not included:

In [None]:
dfjo.to_json(orient="values")

'[[1,4,7],[2,5,8],[3,6,9]]'

- Split oriented serializes to a JSON object containing separate entries for values, index and columns. Name is also included for Series:

In [None]:
dfjo.to_json(orient="split")

'{"columns":["A","B","C"],"index":["x","y","z"],"data":[[1,4,7],[2,5,8],[3,6,9]]}'

- Table oriented serializes to the JSON Table Schema, allowing for the preservation of metadata including but not limited to dtypes and index names.

In [None]:
dfjo.to_json(orient="table")

'{"schema":{"fields":[{"name":"index","type":"string"},{"name":"A","type":"integer"},{"name":"B","type":"integer"},{"name":"C","type":"integer"}],"primaryKey":["index"],"pandas_version":"0.20.0"},"data":[{"index":"x","A":1,"B":4,"C":7},{"index":"y","A":2,"B":5,"C":8},{"index":"z","A":3,"B":6,"C":9}]}'
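The choice of orient matters when the JSON has to be read back. The following sketch uses a small made-up frame and compares a round trip through "split", which keeps the index labels, with one through "records", which drops them. The strings are wrapped in io.StringIO so they are treated as file-like input.

```python
from io import StringIO

import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]}, index=list("xyz"))

# "split" stores index, columns and data as separate entries
restored = pd.read_json(StringIO(df.to_json(orient="split")), orient="split")
print(list(restored.index))  # ['x', 'y', 'z']

# "records" drops the index labels, so a default integer index comes back
from_records = pd.read_json(StringIO(df.to_json(orient="records")), orient="records")
print(list(from_records.index))  # [0, 1, 2]
```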

#### Example: ocenas.json

In [None]:
# Method 1: read the file directly with pandas
ocenas = pd.read_json('data/ocenas.json', orient='columns')
# Drop the metadata column and the metadata rows, keeping only the yearly values
ocenas.drop(columns='description', inplace=True)
ocenas.drop(['title', 'units', 'base_period', 'missing'], inplace=True)
ocenas.index.name = 'year'
ocenas.rename(columns={'data': 'temp_anomaly_celsius'}, inplace=True)
# Turn the index labels into integer years
ocenas.index = pd.to_datetime(ocenas.index).year
ocenas.head()

          temp_anomaly_celsius
    year
    1880                 -0.12
    1881                 -0.09
    1882                 -0.10
    1883                 -0.18
    1884                 -0.27


#### Example: temperatures.json

In [None]:
#!cat ./data/temperatures.json

In [None]:
import json
#load json object
with open('data/temperatures.json') as f:
    d = json.load(f)

In [None]:
# convert the data back to a JSON string in memory
temps_json = json.dumps(d['data'])

In [None]:
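from io import StringIO
# Wrap the string in StringIO: recent pandas versions deprecate passing a raw JSON string
temps = pd.read_json(StringIO(temps_json), orient='index')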
temps.head()

            value  anomaly
    189512  50.34    -1.68
    189612  51.99    -0.03
    189712  51.56    -0.46
    189812  51.43    -0.59
    189912  51.01    -1.01


#### Example: cities.json

In [None]:
# we can read the file this way, but the geolocation stays nested in a single column
cities = pd.read_json('data/cities.json', orient='records')
cities.head(2)

,name,id,nametype,recclass,mass,fall,year,reclat,reclong,geolocation,:@computed_region_cbhk_fwbd,:@computed_region_nnqa_25f4
0,Aachen,1,Valid,L5,21.0,Fell,1880-01-01T00:00:00.000,50.775,6.08333,"{'type': 'Point', 'coordinates': [6.08333, 50....",,
1,Aarhus,2,Valid,H6,720.0,Fell,1951-01-01T00:00:00.000,56.18333,10.23333,"{'type': 'Point', 'coordinates': [10.23333, 56...",,


In [None]:
with open('data/cities.json') as f:
    d = json.load(f)

In [None]:
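# json_normalize has been available at the top level of pandas since version 1.0;
# the older pandas.io.json import path is deprecated
from pandas import json_normalize
cities = json_normalize(d)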


In [None]:
cities['coordinates_x'] = cities['geolocation.coordinates'].str[0]
cities['coordinates_y'] = cities['geolocation.coordinates'].str[1]
cities.drop(columns=['geolocation.coordinates', ':@computed_region_cbhk_fwbd', ':@computed_region_nnqa_25f4'], inplace=True)
cities.set_index('name', inplace=True)
cities['mass'] = pd.to_numeric(cities['mass'])
# convert the remaining columns in a similar way
cities.head()

name,id,nametype,recclass,mass,fall,year,reclat,reclong,geolocation.type,coordinates_x,coordinates_y
Aachen,1,Valid,L5,21.0,Fell,1880-01-01T00:00:00.000,50.775,6.08333,Point,6.08333,50.775
Aarhus,2,Valid,H6,720.0,Fell,1951-01-01T00:00:00.000,56.18333,10.23333,Point,10.23333,56.18333
Abee,6,Valid,EH4,107000.0,Fell,1952-01-01T00:00:00.000,54.21667,-113.0,Point,-113.0,54.21667
Acapulco,10,Valid,Acapulcoite,1914.0,Fell,1976-01-01T00:00:00.000,16.88333,-99.9,Point,-99.9,16.88333
Achiras,370,Valid,L6,780.0,Fell,1902-01-01T00:00:00.000,-33.16667,-64.95,Point,-64.95,-33.16667


#### Example: transactions.json

In [None]:
!head -n 10 data/transactions.json

{
"txs":[

{
   "lock_time":0,
   "ver":1,
   "size":373,
   "inputs":[
      {
         "sequence":4294967295,


In [None]:
with open('data/transactions.json') as f:
    data = json.load(f)

[pandas.json_normalize](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.json_normalize.html): Normalize semi-structured JSON data into a flat table.
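To see what `record_path` and `meta` do before applying them to the real file, here is a minimal sketch on made-up records. The keys loosely imitate nested transaction data, but the values are invented. Each entry of the nested "out" list becomes one row, and the `meta` fields are repeated on every row that came from the same parent record.

```python
import pandas as pd

# Made-up records loosely shaped like nested transaction data
records = [
    {"hash": "aaa", "time": 1, "out": [{"value": 10, "n": 0}, {"value": 20, "n": 1}]},
    {"hash": "bbb", "time": 2, "out": [{"value": 30, "n": 0}]},
]

# One row per element of "out"; "hash" and "time" are copied from the parent
flat = pd.json_normalize(records, record_path=["out"], meta=["hash", "time"])
print(flat.columns.tolist())  # ['value', 'n', 'hash', 'time']
print(len(flat))              # 3
```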

In [None]:
# the pandas.io.json import path is deprecated; json_normalize lives at the top level since pandas 1.0
from pandas import json_normalize

#json_normalize(data['txs']).head()

In [None]:
trans = json_normalize(data['txs'], record_path=['out'], meta=['time', 'relayed_by', 'vout_sz', 'hash'])


In [None]:
trans.head()

,spent,tx_index,type,addr,value,n,script,time,relayed_by,vout_sz,hash
0,False,0,0,1H7r57SXAwaKs3Tf5ugbkRNxwfh9YaxC5b,7541,0,76a914b0cd787a7a879ac0a5277b0013ec7b11c145055d...,1586376721,0.0.0.0,2,0f06714015f334626a168ee3e0aa5e0d3866a33dad504b...
1,False,0,0,1BPULhbGfrojrknyD7aZYMtRVUu38Cn75j,1364400,1,76a91471f13b222426eb80b47d2413d21a8904ec1966b2...,1586376721,0.0.0.0,2,0f06714015f334626a168ee3e0aa5e0d3866a33dad504b...
2,False,0,0,1LQ6YURobx4EGZRp8bdEDHup6T56o5NGKN,3127836,0,76a914d4c895721d3a8cd74bb3ccbb699a3dbe342c0807...,1586376722,0.0.0.0,2,3684072a50d7389933210d7adf4f98640d3d53c8cb245e...
3,False,0,0,1HSLVVSSQmzaNG8sbakhFDrmpzUPZLnYCe,30036732,1,76a914b44cae99837337275d21d2c5c6ed6cddf7a7e9f7...,1586376722,0.0.0.0,2,3684072a50d7389933210d7adf4f98640d3d53c8cb245e...
4,False,0,0,3Lb2MJWbBE88BUHf6tAw8ZzhkR6H2cYRhR,206183,0,a914cf48401e3cf81080352f281ea859ccabd51a821487,1586376721,0.0.0.0,3,3d3cc141654170060a7e298a9e5298557970e8cd0051ab...


#### Example: all_hour_geo.json

In [None]:
# the pandas.io.json import path is deprecated; json_normalize lives at the top level since pandas 1.0
from pandas import json_normalize
import json

In [None]:
#load json object
with open('data/all_hour_geo.json') as f:
    d = json.load(f)

In [None]:
data = [element['properties'] for element in d['features']]
all_hour_geo = json_normalize(data)

In [None]:
all_hour_geo.head(3)

,mag,place,time,updated,tz,url,detail,felt,cdi,mmi,...,ids,sources,types,nst,dmin,rms,gap,magType,type,title
0,0.69,"16km ESE of Anza, CA",1586352802900,1586353032308,-480,https://earthquake.usgs.gov/earthquakes/eventp...,https://earthquake.usgs.gov/earthquakes/feed/v...,,,,...,",ci39143639,",",ci,",",geoserve,nearby-cities,origin,phase-data,scit...",12.0,0.05468,0.14,98.0,ml,earthquake,"M 0.7 - 16km ESE of Anza, CA"
1,2.34,"7km ENE of Pahala, Hawaii",1586352794640,1586353127910,-600,https://earthquake.usgs.gov/earthquakes/eventp...,https://earthquake.usgs.gov/earthquakes/feed/v...,,,,...,",hv71464377,",",hv,",",geoserve,origin,phase-data,",49.0,0.02127,0.13,136.0,ml,earthquake,"M 2.3 - 7km ENE of Pahala, Hawaii"
2,0.85,"15km ESE of Anza, CA",1586352704490,1586352926133,-480,https://earthquake.usgs.gov/earthquakes/eventp...,https://earthquake.usgs.gov/earthquakes/feed/v...,,,,...,",ci39143631,",",ci,",",geoserve,nearby-cities,origin,phase-data,scit...",30.0,0.04884,0.14,55.0,ml,earthquake,"M 0.9 - 15km ESE of Anza, CA"


#### Example: rates.json

In [None]:
#load json object
with open('data/rates.json') as f:
    d = json.load(f)

In [None]:
# json_normalize(d['rates'])

In [None]:
rates = json_normalize(d['rates'], record_path=['periods'], meta=['name', 'code', 'country_code'])
rates.head()

,effective_from,rates.super_reduced,rates.reduced,rates.standard,rates.reduced1,rates.reduced2,rates.parking,name,code,country_code
0,0000-01-01,4.0,10.0,21.0,,,,Spain,ES,ES
1,0000-01-01,,9.0,20.0,,,,Bulgaria,BG,BG
2,0000-01-01,,,27.0,5.0,18.0,,Hungary,HU,HU
3,0000-01-01,,12.0,21.0,,,,Latvia,LV,LV
4,0000-01-01,,,23.0,5.0,8.0,,Poland,PL,PL
