# JSON
* JavaScript Object Notation
* Lightweight data interchange format for sharing data
* JSON data is written as name/value pairs. A name/value pair consists of a field name, followed by a colon, followed by a value. Example: `"name": "purvil"`
* A JSON value must be a string, a number, an object (JSON object), an array, a boolean, or null
* When JSON is stored in a file, it is saved as a string (plain text).
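To illustrate the value types listed above, the sketch below decodes one JSON object containing each kind of value and prints the Python type it maps to. The field names and values in `sample` are made up for this example.

```python
import json

# Hypothetical sample: one field per JSON value type
sample = '''{
    "name": "purvil",
    "count": 3,
    "ratio": 0.5,
    "address": {"city": "San Diego"},
    "tags": ["a", "b"],
    "active": true,
    "nickname": null
}'''

decoded = json.loads(sample)

# Show which Python type each JSON value becomes
for key, value in decoded.items():
    print(key, '->', type(value).__name__)
```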

## Reading JSON `loads`
* Converts a JSON string into a Python value.

In [36]:
import json

In [37]:
jsonData = '''{ "office": 
    {"medical": [
      { "room-number": 100,
        "use": "reception",
        "sq-ft": 50,
        "price": 75
      },
      { "room-number": 101,
        "use": "waiting",
        "sq-ft": 250,
        "price": 75
      },
      { "room-number": 102,
        "use": "examination",
        "sq-ft": 125,
        "price": 150
      },
      { "room-number": 103,
        "use": "examination",
        "sq-ft": 125,
        "price": 150
      },
      { "room-number": 104,
        "use": "office",
        "sq-ft": 150,
        "price": 100
      }
    ]},
    "parking": {
      "location": "premium",
      "style": "covered",
      "price": 750
    }
} '''

In [38]:
jsonToPython = json.loads(jsonData)

In [39]:
type(jsonToPython)

dict

In [40]:
jsonToPython

{'office': {'medical': [{'room-number': 100,
    'use': 'reception',
    'sq-ft': 50,
    'price': 75},
   {'room-number': 101, 'use': 'waiting', 'sq-ft': 250, 'price': 75},
   {'room-number': 102, 'use': 'examination', 'sq-ft': 125, 'price': 150},
   {'room-number': 103, 'use': 'examination', 'sq-ft': 125, 'price': 150},
   {'room-number': 104, 'use': 'office', 'sq-ft': 150, 'price': 100}]},
 'parking': {'location': 'premium', 'style': 'covered', 'price': 750}}

### Access components

In [41]:
jsonToPython['office']

{'medical': [{'room-number': 100,
   'use': 'reception',
   'sq-ft': 50,
   'price': 75},
  {'room-number': 101, 'use': 'waiting', 'sq-ft': 250, 'price': 75},
  {'room-number': 102, 'use': 'examination', 'sq-ft': 125, 'price': 150},
  {'room-number': 103, 'use': 'examination', 'sq-ft': 125, 'price': 150},
  {'room-number': 104, 'use': 'office', 'sq-ft': 150, 'price': 100}]}

In [42]:
jsonToPython['office']['medical']

[{'room-number': 100, 'use': 'reception', 'sq-ft': 50, 'price': 75},
 {'room-number': 101, 'use': 'waiting', 'sq-ft': 250, 'price': 75},
 {'room-number': 102, 'use': 'examination', 'sq-ft': 125, 'price': 150},
 {'room-number': 103, 'use': 'examination', 'sq-ft': 125, 'price': 150},
 {'room-number': 104, 'use': 'office', 'sq-ft': 150, 'price': 100}]

In [43]:
jsonToPython['parking']

{'location': 'premium', 'style': 'covered', 'price': 750}

In [44]:
jsonToPython['office']['medical'][0]

{'room-number': 100, 'use': 'reception', 'sq-ft': 50, 'price': 75}

In [45]:
jsonToPython['office']['medical'][0]['price']

75

### Writing JSON `dumps`
* Converts a Python dictionary into a JSON string

In [46]:
pythonDict = {'name':'Bob', 'age':44, 'isEmployed':True}

In [47]:
dictToJson = json.dumps(pythonDict)

In [48]:
dictToJson

'{"name": "Bob", "age": 44, "isEmployed": true}'

* JSON can only store these Python types: list, dict, boolean, numbers, strings, and `None`

In [65]:
class Employee(object):
    def __init__(self, name, salary, zipcode):
        self.name = name
        self.salary = salary
        self.zip = zipcode

In [66]:
purvil = Employee("Purvil", 150000, 92126)

In [67]:
# jsonPurvil = json.dumps(purvil)
# Raises TypeError: a generic Python object cannot be converted to JSON directly

* We have to convert the Python object to a dictionary and store that as JSON. This works for simple objects only.
* To convert, use
    - `object.__dict__`

In [68]:
jsonPurvil = json.dumps(purvil, default=lambda x: x.__dict__)

In [69]:
jsonPurvil

'{"name": "Purvil", "salary": 150000, "zip": 92126}'
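As an alternative to passing `default=`, the conversion can be placed in a `json.JSONEncoder` subclass. This is a minimal sketch, not the approach used in the notebook above; the class name `EmployeeEncoder` is an assumption made for illustration.

```python
import json

class Employee(object):
    def __init__(self, name, salary, zipcode):
        self.name = name
        self.salary = salary
        self.zip = zipcode

# Hypothetical encoder: knows how to turn an Employee into a dict
class EmployeeEncoder(json.JSONEncoder):
    def default(self, obj):
        if isinstance(obj, Employee):
            return {"name": obj.name, "salary": obj.salary, "zip": obj.zip}
        # Fall back to the base class, which raises TypeError
        return super().default(obj)

encoded = json.dumps(Employee("Purvil", 150000, 92126), cls=EmployeeEncoder)
print(encoded)
```

Unlike the `__dict__` lambda, this encoder still raises `TypeError` for types it does not recognise, so unsupported objects are not silently written out.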

#### Sorting

In [70]:
print(json.dumps({"c":0, "b":0, "a":0}, sort_keys=True))

{"a": 0, "b": 0, "c": 0}


In [71]:
print(json.dumps({"c":0, "b":0, "a":0}, sort_keys=True, indent=4))

{
    "a": 0,
    "b": 0,
    "c": 0
}


#### Compact encoding
* Instead of `, ` use `,`
* Instead of `: ` use `:`

In [72]:
print(json.dumps([1,2,3,{'4':5, "6":7}], separators=(',', ':')))

[1,2,3,{"4":5,"6":7}]


### Object_hook in loads
* `object_hook` is an optional function that is called with the result of every JSON object decoded (a `dict`). The return value of `object_hook` is used instead of the dictionary.

In [78]:
# Object JSON decoding
def as_complex(dct):
    if '__complex__' in dct:
        return complex(dct['real'], dct['imag'])
    return dct

In [79]:
var1 = json.loads('{"__complex__":true, "real": 1, "imag":2}', object_hook=as_complex)

In [80]:
var1

(1+2j)

In [81]:
type(var1)

complex
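The decoding hook above can be paired with a `default=` function on the encoding side to round-trip a complex number. This is a sketch under the assumption that the `__complex__` marker convention from the example is used in both directions; `encode_complex` is a hypothetical helper name.

```python
import json

# Hypothetical helper: encode a complex number using the same marker
# that as_complex looks for when decoding
def encode_complex(obj):
    if isinstance(obj, complex):
        return {'__complex__': True, 'real': obj.real, 'imag': obj.imag}
    raise TypeError('Cannot serialize object of type ' + type(obj).__name__)

def as_complex(dct):
    if '__complex__' in dct:
        return complex(dct['real'], dct['imag'])
    return dct

text = json.dumps(1 + 2j, default=encode_complex)
restored = json.loads(text, object_hook=as_complex)
print(text)
print(restored)
```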

### `parse_float` `parse_int`

* `parse_float` is called with the string of every JSON float to be decoded. By default a JSON float is decoded to the Python `float` type, but with `parse_float` we can decode it to another type such as `decimal.Decimal`.

In [82]:
var1 = json.loads('1.1')  # default behaviour: decoded as a Python float
var1

1.1

In [83]:
type(var1)

float

In [85]:
import decimal
var2 = json.loads('1.1', parse_float=decimal.Decimal)
type(var2)

decimal.Decimal

In [86]:
var2

Decimal('1.1')

* `parse_int` is called with the string of every JSON integer and decodes it to the specified type.
* For example, `parse_int=float` decodes every JSON integer as a Python `float`.
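A short sketch of `parse_int` in use: the integers in the JSON text are passed to `float`, while the value that is already a JSON float goes through the default float parsing.

```python
import json

# JSON integers are handed to float(); 3.5 is a JSON float and is unaffected
values = json.loads('[1, 2, 3.5]', parse_int=float)
print(values)                        # [1.0, 2.0, 3.5]
print([type(v).__name__ for v in values])
```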

## Serialization and De-serialization using `__dict__`
* Taking a Python object and storing it as JSON is called serialization. The opposite direction is deserialization.

In [89]:
# Serialize
class Person:
    def __init__(self, name = None):
        if name:
            self.name = name

people = [Person("purvil"), Person('Japan')]

s = json.dumps([p.__dict__ for p in people])

In [90]:
# Deserialize
clones = json.loads(s)
print(clones)

[{'name': 'purvil'}, {'name': 'Japan'}]


In [91]:
for clone in clones:
    p = Person()
    p.__dict__ = clone
    print(p)
    print(p.name)

<__main__.Person object at 0x0000021B5B7E1898>
purvil
<__main__.Person object at 0x0000021B5B7E17B8>
Japan


## Handling non string keys

* The JSON format expects the keys of an object to be strings.
* Keys of type `int`, `float`, `bool`, or `None` are converted to strings automatically; any other key type (for example a tuple) raises `TypeError` when encoding.
* One way to handle this is to skip such keys with the `skipkeys` argument.
* Rather than raising an exception, the unsupported keys are simply ignored.

In [98]:
data = [{'a':5, "b":3, 4:6}]

In [99]:
json.dumps(data)

'[{"a": 5, "b": 3, "4": 6}]'

In [100]:
data1 = [{'a':5, "b":3, ('b','x'):6}]

In [102]:
json.dumps(data1, skipkeys=True)

'[{"a": 5, "b": 3}]'
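For comparison, the sketch below shows what happens to the same data without `skipkeys`, and one possible alternative that keeps the entry by converting every key to a string first. The key conversion with `str()` is an illustrative choice, not something the `json` module does for tuple keys.

```python
import json

data1 = [{'a': 5, 'b': 3, ('b', 'x'): 6}]

# Without skipkeys, the tuple key makes encoding fail
try:
    json.dumps(data1)
    raised = False
except TypeError as exc:
    raised = True
    print('TypeError:', exc)

# Alternative: keep every entry by converting the keys to strings first
stringified = [{str(key): value for key, value in d.items()} for d in data1]
encoded = json.dumps(stringified)
print(encoded)
```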

## Reading and writing JSON files `load()` `dump()`

In [105]:
data = [{'a':'A', 'b':(2,4), 'c':3.0}]

In [106]:
filename = 'jsonData.txt'

In [107]:
with open(filename, 'w') as outfile:
    json.dump(data, outfile)

In [108]:
print(open(filename, 'r').read())

[{"a": "A", "b": [2, 4], "c": 3.0}]


In [109]:
infile = open(filename, 'r')

In [110]:
json.load(infile)

[{'a': 'A', 'b': [2, 4], 'c': 3.0}]

--------------------------------

# Pickle
* Why do we call `dump` and `load` instead of write and read?
     - Because we are dealing with binary data
* Pickle is used to serialize and deserialize Python object structures; this is also known as marshalling or flattening.
* Serialization is the process of converting an object in memory to a byte stream that can be stored on disk or sent over a network.
* Pickle is very useful for ML algorithms: a trained model can be saved and used for predictions later without having to rewrite or retrain it.

In [1]:
import pickle

In [3]:
emp = {1:'A', 2:'B', 3:'C', 4:'D', 5:'E'}

In [4]:
#pickle
pickling_on = open("emp.pickle", "wb") # make sure to open in binary mode
pickle.dump(emp,  pickling_on)
pickling_on.close()

In [5]:
# unpickle
pickle_off = open("emp.pickle", "rb")  # open in binary mode for reading
emp2 = pickle.load(pickle_off)
pickle_off.close()

In [6]:
print(emp2)

{1: 'A', 2: 'B', 3: 'C', 4: 'D', 5: 'E'}


* `pickle.PicklingError` is raised when trying to pickle an object that does not support pickling.
* `pickle.UnpicklingError` is raised when the file contains corrupted data.
* `EOFError` is raised when the end of the file is reached unexpectedly.
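The exceptions above can be triggered on purpose, as in this minimal sketch. The byte strings are made-up inputs, and the exception raised for an unpicklable object can differ between Python versions, which is why the first `except` clause accepts more than one class.

```python
import pickle

# Pickling a lambda fails; the exact exception class can vary with the Python version
try:
    pickle.dumps(lambda x: x)
    pickling_failed = False
except (pickle.PicklingError, AttributeError) as exc:
    pickling_failed = True
    print('could not pickle:', type(exc).__name__)

# A byte string that does not start with a valid pickle opcode
try:
    pickle.loads(b'\x00not a pickle')
    corrupt_error = None
except pickle.UnpicklingError as exc:
    corrupt_error = exc
    print('corrupted data:', exc)

# Empty input: the unpickler runs out of data immediately
try:
    pickle.loads(b'')
    empty_error = None
except EOFError as exc:
    empty_error = exc
    print('unexpected end of data:', exc)
```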

* Check the pickle protocol version to ensure compatibility

In [9]:
pickle.HIGHEST_PROTOCOL 

4
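When the reader may run an older Python, a lower protocol can be requested explicitly. The sketch below uses protocol 2 as an example; which protocol to choose depends on the Python versions that must read the data.

```python
import pickle

emp = {1: 'A', 2: 'B', 3: 'C'}

# Request protocol 2 explicitly instead of the default
data = pickle.dumps(emp, protocol=2)

# For protocol 2 and above, the stream starts with the PROTO opcode (0x80)
# followed by the protocol number
print(data[:2])

restored = pickle.loads(data)
print(restored == emp)
```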

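* Unlike JSON, pickle can store almost any Python object.
* Saved data is not human readable, but this does not provide security: unpickling can execute arbitrary code, so never unpickle data from an untrusted source.
* Non-Python programs may not be able to reconstruct pickled Python objects.
* The pure-Python implementation is slow.
* `cPickle` can be up to 1000 times faster, but it is a separate module in Python 2 only.
* Python 3's `pickle` uses the faster C implementation automatically when it is available.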

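* We can pickle booleans, ints, floats, complex numbers, strings, tuples, lists, sets, and dictionaries.
* Classes and functions defined at the top level of a module can also be pickled.
* Generators, inner classes, and lambda functions cannot be pickled easily. The `dill` package handles some of these cases, such as lambda functions.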

#### Create compressed file

In [15]:
import bz2
sfile = bz2.BZ2File('compressedFile', 'w')  # no 'b' needed: BZ2File always writes binary data
pickle.dump(emp, sfile)
sfile.close()  # flush the compressed stream to disk
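Reading the compressed pickle back works the same way in reverse. This sketch writes to a temporary directory so that it is self-contained; the use of `tempfile` is an addition for illustration only.

```python
import bz2
import os
import pickle
import tempfile

emp = {1: 'A', 2: 'B', 3: 'C', 4: 'D', 5: 'E'}
path = os.path.join(tempfile.mkdtemp(), 'compressedFile')

# Write the pickle through the bz2 compressor
with bz2.BZ2File(path, 'w') as sfile:
    pickle.dump(emp, sfile)

# Read it back: BZ2File decompresses, pickle.load rebuilds the object
with bz2.BZ2File(path, 'r') as sfile:
    restored = pickle.load(sfile)

print(restored == emp)
```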

* To unpickle a Python 2 object in Python 3, pass `encoding='latin1'` to the `load()` function.

### Pickle and multiprocessing

* When a task is divided among several processes, we might need to share data.
* Processes do not share memory space, so when they send information to each other they use serialization, which is done with the `pickle` module.

In [17]:
import multiprocessing as mp

In [20]:
from math import cos
p = mp.Pool(2) # 2 CPUs
p.map(cos, range(10))

[1.0,
 0.5403023058681398,
 -0.4161468365471424,
 -0.9899924966004454,
 -0.6536436208636119,
 0.28366218546322625,
 0.960170286650366,
 0.7539022543433046,
 -0.14550003380861354,
 -0.9111302618846769]

In [21]:
# p.map(lambda x:x**2, range(10)) # will cause error as pickle is used by mp and pickle does not support lambda functions

* The `dill` package is similar to `pickle` but can also serialize lambda functions and other objects. Its interface is otherwise the same as pickle's.

In [25]:
import dill
with open('filename', 'wb') as f:  # the with block closes the file after writing
    dill.dump(lambda x: x**2, f)

* To use lambda functions with multiprocessing, use the fork of multiprocessing called `pathos.multiprocessing`. It uses `dill` for serialization instead of `pickle`.

In [28]:
import pathos.multiprocessing as mp
p = mp.Pool(2)
p.map(lambda x: 2**x, range(10))

[1, 2, 4, 8, 16, 32, 64, 128, 256, 512]

------------------------------

# CSV module

In [2]:
import csv

* CSV is lightweight.
* Each line of text is a single row.
* Fields are separated by a comma character.
* It contains just the data itself.

#### csv.reader
* Returns a reader object that iterates over the lines of the given CSV file.

In [9]:
with open("beatles-discography.csv", 'r', newline='') as csvfile:  # newline='' lets the csv module handle line endings
    csvreader = csv.reader(csvfile, delimiter = ',')
    for row in csvreader:
        print(row)

['Title', 'Released', 'Label', 'UK Chart Position', 'US Chart Position', 'BPI Certification', 'RIAA Certification']
['Please Please Me', '22 March 1963', 'Parlophone(UK)', '1', '-', 'Gold', 'Platinum']
['With the Beatles', '22 November 1963', 'Parlophone(UK)', '1', '-', 'Platinum', 'Gold']
['Beatlemania! With the Beatles', '25 November 1963', 'Capitol(CAN)', '-', '-', '', '']
['Introducing... The Beatles', '10 January 1964', 'Vee-Jay(US)', '-', '2', '', '']
['Meet the Beatles!', '20 January 1964', 'Capitol(US)', '-', '1', '', '5xPlatinum']
['Twist and Shout', '3 February 1964', 'Capitol(CAN)', '-', '-', '', '']
["The Beatles' Second Album", '10 April 1964', 'Capitol(US)', '-', '1', '', '2xPlatinum']
["The Beatles' Long Tall Sally", '11 May 1964', 'Capitol(CAN)', '-', '-', '', '']
["A Hard Day's Night", '26 June 1964', 'United Artists(US)[C]', '-', '1', '', '4xPlatinum']
['', '10 July 1964', 'Parlophone(UK)', '1', '-', 'Gold', '']
['Something New', '20 July 1964', 'Capitol(US)', '-', '2

#### csv.writer
* Used to write CSV files
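A minimal sketch of `csv.writer`, using a few columns from the discography rows shown earlier. The output file name `albums.csv` and the temporary directory are assumptions made so the example is self-contained.

```python
import csv
import os
import tempfile

rows = [
    ['Title', 'Released', 'Label'],
    ['Please Please Me', '22 March 1963', 'Parlophone(UK)'],
    ['With the Beatles', '22 November 1963', 'Parlophone(UK)'],
]

# Hypothetical output location
path = os.path.join(tempfile.mkdtemp(), 'albums.csv')

# newline='' lets the csv module control the line endings
with open(path, 'w', newline='') as csvfile:
    csvwriter = csv.writer(csvfile, delimiter=',')
    csvwriter.writerow(rows[0])      # header row
    csvwriter.writerows(rows[1:])    # remaining rows in one call

# Read the file back to check what was written
with open(path, 'r', newline='') as csvfile:
    written = list(csv.reader(csvfile))
print(written)
```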

#### csv.DictReader

In [5]:
with open('beatles-discography.csv', 'r') as file:
    data = csv.DictReader(file)
    print(data)
    for line in data:
        print(line)

<csv.DictReader object at 0x000001EE472525F8>
OrderedDict([('Title', 'Please Please Me'), ('Released', '22 March 1963'), ('Label', 'Parlophone(UK)'), ('UK Chart Position', '1'), ('US Chart Position', '-'), ('BPI Certification', 'Gold'), ('RIAA Certification', 'Platinum')])
OrderedDict([('Title', 'With the Beatles'), ('Released', '22 November 1963'), ('Label', 'Parlophone(UK)'), ('UK Chart Position', '1'), ('US Chart Position', '-'), ('BPI Certification', 'Platinum'), ('RIAA Certification', 'Gold')])
OrderedDict([('Title', 'Beatlemania! With the Beatles'), ('Released', '25 November 1963'), ('Label', 'Capitol(CAN)'), ('UK Chart Position', '-'), ('US Chart Position', '-'), ('BPI Certification', ''), ('RIAA Certification', '')])
OrderedDict([('Title', 'Introducing... The Beatles'), ('Released', '10 January 1964'), ('Label', 'Vee-Jay(US)'), ('UK Chart Position', '-'), ('US Chart Position', '2'), ('BPI Certification', ''), ('RIAA Certification', '')])
OrderedDict([('Title', 'Meet the Beatles