# JSON

JSON (JavaScript Object Notation, https://www.json.org/json-en.html) is derived from the object and array literal syntax of the JavaScript language. It is a text-based open standard designed for human-readable data interchange, and it is language independent. Its syntax is nearly identical to a combination of Python lists and dictionaries, which makes it feel familiar to Python users. Here is a JSON encoding that is roughly equivalent to the simple XML format:

    {"menu": {
      "id": "file",
      "value": "File",
      "popup": {
        "menuitem": [
          {"value": "New", "onclick": "CreateNewDoc()"},
          {"value": "Open", "onclick": "OpenDoc()"},
          {"value": "Close", "onclick": "CloseDoc()"}
        ]
      }
    }}
    


Using Python
https://docs.python.org/3.6/library/json.html
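
As a minimal sketch of what the standard-library `json` module does, the menu document shown above can be parsed from a string with `json.loads`. The variable names `menu_text`, `menu`, and `first_item` are chosen here for illustration only.

```python
import json

# The menu document from above, held as a JSON string.
menu_text = """
{"menu": {
  "id": "file",
  "value": "File",
  "popup": {
    "menuitem": [
      {"value": "New", "onclick": "CreateNewDoc()"},
      {"value": "Open", "onclick": "OpenDoc()"},
      {"value": "Close", "onclick": "CloseDoc()"}
    ]
  }
}}
"""

# json.loads parses JSON text into ordinary Python objects.
menu = json.loads(menu_text)

print(type(menu).__name__)   # dict
print(menu["menu"]["id"])    # file

# Nested values are reached by chaining keys and list indices.
first_item = menu["menu"]["popup"]["menuitem"][0]
print(first_item["value"])   # New
```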

A JSON object is a set of key/value pairs, such as `"id": "file"` or `"value": "File"`. A value can be a string, a number, an object, an array, `true`, `false`, or `null`. The `"popup"` key has another JSON object as its value:

    {
      "menuitem": [
        {"value": "New", "onclick": "CreateNewDoc()"},
        {"value": "Open", "onclick": "OpenDoc()"},
        {"value": "Close", "onclick": "CloseDoc()"}
      ]
    }
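
To make the value types concrete, here is a small sketch of how Python's `json` module maps each JSON type to a Python type. The record and its field names (`reviewed`, `obsolete`, `alias`, `tags`, `extra`) are made up for illustration and do not come from the sample data.

```python
import json

# Hypothetical record that uses every JSON value type once.
text = """
{
  "name": "IRS2",
  "score": 5.9,
  "taxid": 9606,
  "reviewed": true,
  "obsolete": false,
  "alias": null,
  "tags": ["a", "b"],
  "extra": {"k": "v"}
}
"""

record = json.loads(text)

# Show which Python type each JSON value became.
type_names = {key: type(value).__name__ for key, value in record.items()}
for key, name in type_names.items():
    print(f"{key}: {name}")

# Going the other way, True/False/None are written as true/false/null.
encoded = json.dumps({"reviewed": True, "alias": None})
print(encoded)
```

Note the spelling difference: JSON uses lowercase `true`, `false`, and `null`, while Python uses `True`, `False`, and `None`.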
  
A JSON object can be compared to a row in a dataframe with column names and values, except that a value can itself be a complete new dataframe, possibly with multiple records. Such an ordered collection of values is called an array and is written between `[ ]`. In the example above, the key `"menuitem"` has as its value an array of three JSON objects:

    [
      {"value": "New", "onclick": "CreateNewDoc()"},
      {"value": "Open", "onclick": "OpenDoc()"},
      {"value": "Close", "onclick": "CloseDoc()"}
    ]
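
Once parsed, a JSON array becomes a Python list, so it can be measured and looped over like any other list. The sketch below assumes the `popup` object is available as a string; the names `popup_text`, `items`, and `labels` are illustrative.

```python
import json

popup_text = """
{
  "menuitem": [
    {"value": "New", "onclick": "CreateNewDoc()"},
    {"value": "Open", "onclick": "OpenDoc()"},
    {"value": "Close", "onclick": "CloseDoc()"}
  ]
}
"""

popup = json.loads(popup_text)
items = popup["menuitem"]

# The array is a list, and the order of its elements is preserved.
print(type(items).__name__, len(items))

# Collect one field from every object in the array.
labels = [item["value"] for item in items]
print(labels)
```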

A JSON object therefore looks like a Python dictionary. A value can itself be a dictionary, which leads to a tree of dictionaries nested in dictionaries. The notation is derived from JavaScript, in which an object is written like a dictionary.
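
One way to explore such a tree of nested dictionaries and lists is to walk it recursively. The helper `walk` below is a hypothetical function written for this illustration, not part of the `json` module; it yields a slash-separated path for every leaf value.

```python
def walk(node, path=""):
    """Yield (path, value) pairs for every leaf in nested dicts and lists."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield from walk(value, f"{path}/{key}")
    elif isinstance(node, list):
        for index, value in enumerate(node):
            yield from walk(value, f"{path}/{index}")
    else:
        # Strings, numbers, booleans and None are the leaves of the tree.
        yield path, node


menu = {"menu": {
    "id": "file",
    "value": "File",
    "popup": {
        "menuitem": [
            {"value": "New", "onclick": "CreateNewDoc()"},
            {"value": "Open", "onclick": "OpenDoc()"},
            {"value": "Close", "onclick": "CloseDoc()"},
        ]
    },
}}

leaves = list(walk(menu))
for path, value in leaves:
    print(path, "=", value)
```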

In [1]:
import json
# Open the file in a context manager so it is closed after reading
with open('../data/sample.json') as f:
    data = json.load(f)
# What does this look like?
data

{'max_score': 5.9047804,
 'took': 47,
 'total': 288,
 'hits': [{'_id': '8660',
   '_score': 5.9047804,
   'entrezgene': '8660',
   'name': 'insulin receptor substrate 2',
   'symbol': 'IRS2',
   'taxid': 9606},
  {'_id': '3667',
   '_score': 5.812647,
   'entrezgene': '3667',
   'name': 'insulin receptor substrate 1',
   'symbol': 'IRS1',
   'taxid': 9606},
  {'_id': '3643',
   '_score': 5.759741,
   'entrezgene': '3643',
   'name': 'insulin receptor',
   'symbol': 'INSR',
   'taxid': 9606},
  {'_id': '10580',
   '_score': 5.6254673,
   'entrezgene': '10580',
   'name': 'sorbin and SH3 domain containing 1',
   'symbol': 'SORBS1',
   'taxid': 9606},
  {'_id': '8471',
   '_score': 5.5908923,
   'entrezgene': '8471',
   'name': 'insulin receptor substrate 4',
   'symbol': 'IRS4',
   'taxid': 9606},
  {'_id': '60676',
   '_score': 5.443571,
   'entrezgene': '60676',
   'name': 'pappalysin 2',
   'symbol': 'PAPPA2',
   'taxid': 9606},
  {'_id': '3630',
   '_score': 5.4403577,
   'entrezgene

In [2]:
# See which keys are available at the top level
print(data.keys())


dict_keys(['max_score', 'took', 'total', 'hits'])


The `hits` list can easily be loaded into a pandas dataframe.

In [15]:
# See which keys are available in a single hit
a = data['hits'][0]
print(a.keys())
    

dict_keys(['_id', '_score', 'entrezgene', 'name', 'symbol', 'taxid'])
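
Since every hit is a dictionary with the same keys, plain Python is already enough for simple questions. The sketch below uses an abridged copy of the response (only the first two hits shown above); the names `response`, `symbols`, and `best` are illustrative.

```python
# Abridged copy of the response: only the first two hits are included.
response = {
    "max_score": 5.9047804,
    "took": 47,
    "total": 288,
    "hits": [
        {"_id": "8660", "_score": 5.9047804, "entrezgene": "8660",
         "name": "insulin receptor substrate 2", "symbol": "IRS2", "taxid": 9606},
        {"_id": "3667", "_score": 5.812647, "entrezgene": "3667",
         "name": "insulin receptor substrate 1", "symbol": "IRS1", "taxid": 9606},
    ],
}

# Pull one field out of every hit.
symbols = [hit["symbol"] for hit in response["hits"]]
print(symbols)

# Find the hit with the highest score.
best = max(response["hits"], key=lambda hit: hit["_score"])
print(best["symbol"], best["_score"])
```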


In [16]:
import pandas as pd
# A list of dictionaries maps directly onto rows: one dictionary per row
df_data = pd.DataFrame(data['hits'])

In [17]:
df_data


| | _id | _score | entrezgene | name | symbol | taxid |
|---|---|---|---|---|---|---|
| 0 | 8660 | 5.90478 | 8660 | insulin receptor substrate 2 | IRS2 | 9606 |
| 1 | 3667 | 5.812647 | 3667 | insulin receptor substrate 1 | IRS1 | 9606 |
| 2 | 3643 | 5.759741 | 3643 | insulin receptor | INSR | 9606 |
| 3 | 10580 | 5.625467 | 10580 | sorbin and SH3 domain containing 1 | SORBS1 | 9606 |
| 4 | 8471 | 5.590892 | 8471 | insulin receptor substrate 4 | IRS4 | 9606 |
| 5 | 60676 | 5.443571 | 60676 | pappalysin 2 | PAPPA2 | 9606 |
| 6 | 3630 | 5.440358 | 3630 | insulin | INS | 9606 |
| 7 | 6517 | 5.343458 | 6517 | solute carrier family 2 member 4 | SLC2A4 | 9606 |
| 8 | 10000 | 5.316108 | 10000 | AKT serine/threonine kinase 3 | AKT3 | 9606 |
| 9 | 3651 | 5.288981 | 3651 | pancreatic and duodenal homeobox 1 | PDX1 | 9606 |


In [18]:
print(df_data.dtypes)
## outputting to json

_id            object
_score        float64
entrezgene     object
name           object
symbol         object
taxid           int64
dtype: object
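
The dtypes show that `_id` and `entrezgene` are stored as strings, because they are quoted in the JSON source. If numeric identifiers are preferred, one possible approach is to cast the column after loading. The sketch below assumes pandas is installed and uses only the first two hits.

```python
import pandas as pd

# First two hits from the sample response.
hits = [
    {"_id": "8660", "_score": 5.9047804, "entrezgene": "8660",
     "name": "insulin receptor substrate 2", "symbol": "IRS2", "taxid": 9606},
    {"_id": "3667", "_score": 5.812647, "entrezgene": "3667",
     "name": "insulin receptor substrate 1", "symbol": "IRS1", "taxid": 9606},
]

df = pd.DataFrame(hits)

# Convert the string identifiers to integers; this raises a ValueError
# if any value is not a valid integer.
df["entrezgene"] = df["entrezgene"].astype("int64")

print(df.dtypes)
```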


We can also define a Python dictionary ourselves and write it to a JSON file.

In [19]:
data = {"menu": {
  "id": "file",
  "value": "File",
  "popup": {
    "menuitem": [
      {"value": "New", "onclick": "CreateNewDoc()"},
      {"value": "Open", "onclick": "OpenDoc()"},
      {"value": "Close", "onclick": "CloseDoc()"}
    ]
  }
}}

with open('output.json', 'w') as f:
    json.dump(data, f)
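
A quick way to check that a file was written correctly is to read it back and compare. The sketch below also passes `indent=2`, which makes the file easier to read. It writes to a temporary directory (an assumption made here so that no existing `output.json` is overwritten) and uses a shortened dictionary.

```python
import json
import os
import tempfile

data = {"menu": {"id": "file", "value": "File"}}

# Write into a temporary directory instead of the working directory.
path = os.path.join(tempfile.mkdtemp(), "output.json")

with open(path, "w", encoding="utf-8") as f:
    # indent=2 pretty-prints the output; the default is a single line.
    json.dump(data, f, indent=2)

# Read the file back and confirm that nothing was lost.
with open(path, encoding="utf-8") as f:
    restored = json.load(f)

print(restored == data)   # True
```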

We can use https://jsonlint.com to validate the output.
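
Validation can also be done locally, because `json.loads` raises `json.JSONDecodeError` on malformed input. The helper `is_valid_json` below is a hypothetical function written for this illustration. Note that Python's parser is slightly more lenient than strict JSON (it accepts `NaN`, for example), so this is a rough check rather than a full linter.

```python
import json


def is_valid_json(text):
    """Return True if text parses as JSON, otherwise report why and return False."""
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        print(f"invalid JSON at line {err.lineno}, column {err.colno}: {err.msg}")
        return False
    return True


ok = is_valid_json('{"id": "file"}')
single_quotes = is_valid_json("{'id': 'file'}")      # JSON requires double quotes
trailing_comma = is_valid_json('{"id": "file",}')    # trailing commas are not allowed

print(ok, single_quotes, trailing_comma)
```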

When writing a dataframe to JSON with `DataFrame.to_json` (https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_json.html), keep in mind that the `orient` parameter determines the layout of the result.
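
To see what the orientation changes, the sketch below serialises a small two-row dataframe (made up for illustration, with values taken from the sample data) using several values of `orient`. It assumes pandas is installed.

```python
import json

import pandas as pd

df = pd.DataFrame({"symbol": ["IRS2", "IRS1"], "taxid": [9606, 9606]})

# Serialise the same dataframe in several layouts.
layouts = {orient: df.to_json(orient=orient)
           for orient in ("records", "columns", "index", "split")}

for orient, text in layouts.items():
    print(f"{orient}: {text}")

# 'records' gives a list with one object per row.
records = json.loads(layouts["records"])

# 'columns' gives one object per column, keyed by the row label.
columns = json.loads(layouts["columns"])
```

The `records` layout matches the structure of the `hits` list in the sample file, which is why it round-trips naturally with a list of dictionaries.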

In [20]:
data = df_data.to_json(orient='records')
data

'[{"_id":"8660","_score":5.9047804,"entrezgene":"8660","name":"insulin receptor substrate 2","symbol":"IRS2","taxid":9606},{"_id":"3667","_score":5.812647,"entrezgene":"3667","name":"insulin receptor substrate 1","symbol":"IRS1","taxid":9606},{"_id":"3643","_score":5.759741,"entrezgene":"3643","name":"insulin receptor","symbol":"INSR","taxid":9606},{"_id":"10580","_score":5.6254673,"entrezgene":"10580","name":"sorbin and SH3 domain containing 1","symbol":"SORBS1","taxid":9606},{"_id":"8471","_score":5.5908923,"entrezgene":"8471","name":"insulin receptor substrate 4","symbol":"IRS4","taxid":9606},{"_id":"60676","_score":5.443571,"entrezgene":"60676","name":"pappalysin 2","symbol":"PAPPA2","taxid":9606},{"_id":"3630","_score":5.4403577,"entrezgene":"3630","name":"insulin","symbol":"INS","taxid":9606},{"_id":"6517","_score":5.3434577,"entrezgene":"6517","name":"solute carrier family 2 member 4","symbol":"SLC2A4","taxid":9606},{"_id":"10000","_score":5.3161077,"entrezgene":"10000","name":"

In [9]:
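# `data` is already a JSON string (to_json returns text), so write it as-is;
# json.dump would encode the string a second time.
with open('output.json', 'w') as f:
    f.write(data)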
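
Because `to_json` returns a string that is already JSON text, passing that string through `json.dump` or `json.dumps` encodes it a second time. The sketch below shows the effect with a short hand-written string standing in for the `to_json` output.

```python
import json

# Stand-in for the string returned by df_data.to_json(orient='records').
records_text = '[{"symbol": "IRS2"}, {"symbol": "IRS1"}]'

# Encoding a string that is already JSON wraps it in quotes and escapes it.
double_encoded = json.dumps(records_text)
print(double_encoded)

# Decoding once only gives the original string back ...
once = json.loads(double_encoded)
print(type(once).__name__)

# ... and a second decode is needed to reach the actual list of records.
twice = json.loads(once)
print(twice)
```

Writing the string directly with `f.write`, or passing a file path to `to_json`, avoids the extra layer of encoding.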