# JSON dictionary

Date: August 2019 | Author: Hilary Goh | Team: Woodside, Intelligent Assets & Robotics

---

Only data from one pump is used, because it has:
* the cleanest signal
* ground truth available
* both ON and OFF states present

###### OPAM:1473 | FLOC: AU21.A6424AP7 | Thing ID: 00:80:00:00:04:01:21:C0 
###### PHD tag: PGP.64FI491.DACA.PV | Group: Membrane Biological Reactor | GTm: m3/hr

---

Test a random forest; aim for > 80% accuracy.

Can it select useful features?

Will it just rely on temperature (the easy feature)?

###### Data source:
* saved json from OnOff_c0.ipynb
* var timestart="2019-07-07 00:00:00"
* var timeend="2019-07-27 23:55:00"
* no filtering - full raw dataset


Test/Train/Validate
* need to set aside a test set (unseen data)
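
One way to set aside unseen data for a time series is to hold out the most recent readings rather than sampling at random, so the test set never overlaps the training period. The sketch below assumes that approach; `chronological_split`, the 20% fraction and the placeholder rows are illustrative and not taken from the notebook.

```python
from datetime import datetime


def chronological_split(rows, test_fraction=0.2):
    """Sort (timestamp, label) rows by time and hold out the latest fraction."""
    if not 0 < test_fraction < 1:
        raise ValueError("test_fraction must be between 0 and 1")
    ordered = sorted(rows, key=lambda row: row[0])
    n_test = max(1, round(len(ordered) * test_fraction))
    cut = len(ordered) - n_test
    return ordered[:cut], ordered[cut:]


# Ten placeholder rows at half-hour spacing; the label 1 is a dummy value.
rows = [(datetime(2019, 7, 7, h // 2, 30 * (h % 2)), 1) for h in range(10)]
train, test = chronological_split(rows, test_fraction=0.2)
print(len(train), len(test))
```

A validation set could be carved out of `train` the same way, again keeping time order.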

---

###### Import libraries and modules

In [3]:
%matplotlib inline
import pandas as pd
import pandas_profiling
import numpy as np
import matplotlib.pyplot as plt
import json

## Examine dicts and lists
* check timestamp conversion
* check array (correct columns, correct index)
* split pandas dataframe
* check keys

In [4]:
with open('c0.json', 'r') as fp:  # open the file for inspection; not using pandas just yet
    rawr = json.load(fp)

In [5]:
rawr.keys() #check for dictionaries and lists

dict_keys(['id', 'start', 'end', 'V1', 'V2', 'V3', 'T', 'gotData', 'groundtruth', 'sensorId', 'sensorType', 'QUUUUuery'])
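
Calling `.keys()` level by level works, but a small recursive helper can show the whole nesting in one pass. This is a minimal sketch; `describe` is a hypothetical helper, and `sample` is a trimmed stand-in for `rawr` built from values shown in this notebook.

```python
def describe(obj, depth=0, max_depth=3):
    """Return indented lines listing nested dict keys, value types and list lengths."""
    lines = []
    if not isinstance(obj, dict) or depth >= max_depth:
        return lines
    for key, value in obj.items():
        if isinstance(value, list):
            summary = f"list[{len(value)}]"
        else:
            summary = type(value).__name__
        lines.append("  " * depth + f"{key}: {summary}")
        # Recurse into nested dicts; lists and scalars return no extra lines.
        lines.extend(describe(value, depth + 1, max_depth))
    return lines


# Trimmed stand-in for rawr (the real file has many more keys).
sample = {
    "id": "00:80:00:00:04:01:21:c0",
    "gotData": False,
    "T": {"gotData": True, "data": {"OnOff": [1, 1, 1, 1, 1]}},
}
tree = describe(sample)
print("\n".join(tree))
```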

In [6]:
print (rawr['id']) #should only be one sensor ID
print (rawr['start'])
print (rawr['end'])
print (rawr['gotData'])
print (rawr['sensorId'])
print (rawr['sensorType'])

00:80:00:00:04:01:21:c0
2019-07-07 00:00:00
2019-07-27 23:55:00
False
0
T


In [7]:
rawr['groundtruth'].keys()

dict_keys(['PHD', 'start', 'end', 'url', 'PHDhits', 'OnOff', 'ts', 'value'])

In [8]:
print (rawr['groundtruth']['PHD'])
print (rawr['groundtruth']['start'])
print (rawr['groundtruth']['end'])
print (rawr['groundtruth']['url'])

PGP.64FI491.DACA.PV
2019-07-06 16:00:00
2019-07-27 15:55:00
https://api-timeseries.woodside.io/api/v1/tag?query={%22start_time%22:%222019-07-06 16:00:00%22,%22end_time%22:%222019-07-27 15:55:00%22,%22tags%22:[{%22name%22:%22PGP.64FI491.DACA.PV%22}],%22format%22:%22json%22}
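
The ground-truth window starts and ends 8 hours earlier than the sensor window printed above. The sketch below decodes the percent-encoded query in the URL and measures that gap. That the PHD times are in UTC is an assumption inferred from the offset, not something the file states.

```python
import json
from datetime import datetime, timedelta
from urllib.parse import unquote

url = ("https://api-timeseries.woodside.io/api/v1/tag?query="
       "{%22start_time%22:%222019-07-06 16:00:00%22,"
       "%22end_time%22:%222019-07-27 15:55:00%22,"
       "%22tags%22:[{%22name%22:%22PGP.64FI491.DACA.PV%22}],"
       "%22format%22:%22json%22}")

# Everything after "?query=" is percent-encoded JSON.
_, _, encoded = url.partition("?query=")
query = json.loads(unquote(encoded))

FMT = "%Y-%m-%d %H:%M:%S"
phd_start = datetime.strptime(query["start_time"], FMT)
sensor_start = datetime.strptime("2019-07-07 00:00:00", FMT)  # rawr['start']
offset = sensor_start - phd_start
print(query["tags"][0]["name"], offset)
```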


In [9]:
rawr['groundtruth']['PHDhits'][0:4]

[{'name': 'PGP.64FI491.DACA.PV',
  'value': 109.1254,
  'time': '2019-07-06 16:01:00'},
 {'name': 'PGP.64FI491.DACA.PV',
  'value': 108.9785,
  'time': '2019-07-06 16:02:00'},
 {'name': 'PGP.64FI491.DACA.PV',
  'value': 109.0653,
  'time': '2019-07-06 16:03:00'},
 {'name': 'PGP.64FI491.DACA.PV',
  'value': 108.9085,
  'time': '2019-07-06 16:04:00'}]

In [23]:
rawr['groundtruth']['OnOff'][0:4]

[1, 1, 1, 1]

In [11]:
rawr['groundtruth']['ts'][0:10]

['2019-7-7 00:01:00',
 '2019-7-7 00:02:00',
 '2019-7-7 00:03:00',
 '2019-7-7 00:04:00',
 '2019-7-7 00:05:00',
 '2019-7-7 00:06:00',
 '2019-7-7 00:07:00',
 '2019-7-7 00:08:00',
 '2019-7-7 00:09:00',
 '2019-7-7 00:10:00']
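
These timestamps are not zero-padded (`2019-7-7`), so sorting or comparing them as plain strings can give the wrong order. A minimal sketch of parsing them with the standard library follows; the `2019-7-10` value is an illustrative timestamp in the same format, not copied from the file.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"
ts = ['2019-7-7 00:01:00', '2019-7-7 00:02:00', '2019-7-7 00:03:00']

# strptime accepts single-digit months and days for %m and %d.
parsed = [datetime.strptime(t, FMT) for t in ts]
normalised = [p.strftime(FMT) for p in parsed]  # zero-padded form

# String order breaks once the day reaches two digits.
unpadded = ['2019-7-7 00:01:00', '2019-7-10 00:01:00']
by_string = sorted(unpadded)
by_time = sorted(unpadded, key=lambda t: datetime.strptime(t, FMT))
print(normalised[0], by_string[0], by_time[0])
```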

In [24]:
rawr['groundtruth']['value'][0:10] #flowrate, m3/hr

[109.1254,
 108.9785,
 109.0653,
 108.9085,
 108.9919,
 108.815,
 109.0453,
 109.1154,
 109.0953,
 109.1187]
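
The `ts` and `value` lists look like a flattened copy of `PHDhits` with the times shifted by 8 hours. A quick consistency check over the first four entries shown above could look like this; the fixed 8-hour `OFFSET` is an assumption inferred from the printed values.

```python
from datetime import datetime, timedelta

FMT = "%Y-%m-%d %H:%M:%S"
OFFSET = timedelta(hours=8)  # assumed shift between PHD time and local time

hits = [
    {'name': 'PGP.64FI491.DACA.PV', 'value': 109.1254, 'time': '2019-07-06 16:01:00'},
    {'name': 'PGP.64FI491.DACA.PV', 'value': 108.9785, 'time': '2019-07-06 16:02:00'},
    {'name': 'PGP.64FI491.DACA.PV', 'value': 109.0653, 'time': '2019-07-06 16:03:00'},
    {'name': 'PGP.64FI491.DACA.PV', 'value': 108.9085, 'time': '2019-07-06 16:04:00'},
]
ts = ['2019-7-7 00:01:00', '2019-7-7 00:02:00', '2019-7-7 00:03:00', '2019-7-7 00:04:00']
value = [109.1254, 108.9785, 109.0653, 108.9085]

# Collect the index of every entry where the time or the value disagrees.
mismatches = [
    i for i, (hit, t, v) in enumerate(zip(hits, ts, value))
    if datetime.strptime(hit['time'], FMT) + OFFSET != datetime.strptime(t, FMT)
    or hit['value'] != v
]
print(mismatches)
```

An empty list means the two representations agree for the entries checked.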

In [13]:
rawr['QUUUUuery'].keys()

dict_keys(['size', 'from', 'query', 'sort'])

In [14]:
print (rawr['QUUUUuery']['size'])
print (rawr['QUUUUuery']['from'])
print (rawr['QUUUUuery']['sort'])

2500
0
[{'Timestamp': 'asc'}]


In [15]:
rawr['QUUUUuery']['query'].keys()

dict_keys(['bool'])

In [16]:
rawr['QUUUUuery']['query']['bool'].keys()

dict_keys(['must', 'filter'])

In [17]:
rawr['QUUUUuery']['query']['bool']['must']

[{'match_phrase': {'TagName': '00:80:00:00:04:01:21:c0'}},
 {'match_phrase': {'SensorType': 'T'}}]

In [18]:
rawr['QUUUUuery']['query']['bool']['filter'].keys()

dict_keys(['range'])

In [19]:
rawr['QUUUUuery']['query']['bool']['filter']['range']

{'Timestamp': {'gte': '2019-07-07 00:00:00',
  'lte': '2019-07-27 23:55:00',
  'time_zone': '+08:00',
  'format': 'yyyy-MM-dd HH:mm:ss'}}

###### Temperature dictionary

In [6]:
rawr['T'].keys()

dict_keys(['gotData', 'hits', 'data'])

In [16]:
rawr['T']['gotData']

True

In [20]:
rawr['T']['hits'][0:1]

[{'_index': 'iottimeseries_temperature',
  '_type': '_doc',
  '_id': 'NmYGyGsB3zul1noaseq0',
  '_score': None,
  '_source': {'MoteId': '00:80:00:00:04:01:21:c0',
   'DeploymentId': -1,
   'TagName': '00:80:00:00:04:01:21:c0--1-T1u',
   'SensorId': 'T1u',
   'Timestamp': 1562429007,
   'TagValue': '23.5',
   'Rssi': -75,
   'Voltage': 3.08,
   'MsgNo': 873,
   'MsgNoPersist': 877,
   'SensorType': 'T'},
  'sort': [1562429007000]}]
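
In the raw hit, `Timestamp` is an integer and `TagValue` is a string. The sketch below assumes `Timestamp` is Unix epoch seconds (the `sort` value is the same number in milliseconds) and converts it to the `+08:00` zone used in the query.

```python
from datetime import datetime, timedelta, timezone

LOCAL_TZ = timezone(timedelta(hours=8))  # matches 'time_zone': '+08:00' in the query

# Fields copied from the first hit's '_source'.
source = {'Timestamp': 1562429007, 'TagValue': '23.5', 'SensorId': 'T1u'}
sort_ms = 1562429007000

reading_time = datetime.fromtimestamp(source['Timestamp'], tz=LOCAL_TZ)
temperature = float(source['TagValue'])
print(reading_time.strftime("%Y-%m-%d %H:%M:%S"), temperature)
```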

In [22]:
rawr['T']['data'].keys()

dict_keys(['T1', 'T2', 'T3', 'OnOff'])

In [23]:
rawr['T']['data']['T1'].keys() # NB: a nested 'T' key, distinct from the top-level rawr['T']

dict_keys(['ts', 'T'])

In [29]:
rawr['T']['data']['T1']['ts'][-2:] #check timestamp format for temperature
# readings every half hour

['2019-7-24 13:06:10', '2019-7-24 13:36:10']
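
The half-hour spacing can be confirmed from these two timestamps. The same arithmetic shows how far the last temperature reading falls short of the requested end time. This is a small sketch using only the values printed above.

```python
from datetime import datetime, timedelta

FMT = "%Y-%m-%d %H:%M:%S"
last_two = ['2019-7-24 13:06:10', '2019-7-24 13:36:10']
previous, last = (datetime.strptime(t, FMT) for t in last_two)

gap_minutes = (last - previous).total_seconds() / 60

# rawr['end'] is the requested end of the window.
query_end = datetime.strptime("2019-07-27 23:55:00", FMT)
missing_tail = query_end - last
print(gap_minutes, missing_tail)
```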

In [25]:
rawr['T']['data']['T1']['T'][0:10] #check temperature values (stored as strings)
# readings every half hour

['23.5', '23.1', '23', '22.9', '22.8', '22.7', '22.6', '22.5', '22.3', '22.3']
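
The temperatures come back as strings, so they need casting before any arithmetic or model fitting. Below is a minimal sketch; `to_float` is a hypothetical helper that returns `None` for anything non-numeric instead of raising.

```python
def to_float(text):
    """Return float(text), or None when the value is not numeric."""
    try:
        return float(text)
    except (TypeError, ValueError):
        return None


raw = ['23.5', '23.1', '23', '22.9', '22.8', '22.7', '22.6', '22.5', '22.3', '22.3']
temps = [to_float(x) for x in raw]

# Ignore unparseable entries when summarising.
valid = [t for t in temps if t is not None]
mean_temp = sum(valid) / len(valid)
print(round(mean_temp, 2))
```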

In [30]:
rawr['T']['data']['OnOff'][0:5] #Brad's on/off detector

[1, 1, 1, 1, 1]

###### Vibration dictionary

In [106]:
rawr['V1'].keys()

dict_keys(['gotData', 'hits', 'data'])

In [108]:
rawr['V1']['data'].keys()

dict_keys(['freq', 'amp', 'ts', 'OnOff'])

In [111]:
rawr['V1']['data']['freq'][0:10]

[320, 13, 5, 11, 213, 320, 5, 213, 11, 13]

In [25]:
rawr['V2']['data']['freq'][0:10]

[213, 13, 11, 533, 8, 16, 13, 213, 8, 533]

In [26]:
rawr['V3']['data']['freq'][0:10]

[18, 5, 30, 107, 7, 18, 30, 5, 7, 107]

In [113]:
rawr['V1']['data']['amp'][0:10]

[0.06952,
 0.07669,
 0.07902,
 0.0855,
 0.10278,
 0.05162,
 0.05633,
 0.07594,
 0.0872,
 0.08946]

In [11]:
rawr['V1']['data']['ts'][0:10] #check timestamp format for vib 1
# readings every half hour, top 5 peaks

['2019-7-7 00:03:56',
 '2019-7-7 00:03:56',
 '2019-7-7 00:03:56',
 '2019-7-7 00:03:56',
 '2019-7-7 00:03:56',
 '2019-7-7 00:33:55',
 '2019-7-7 00:33:55',
 '2019-7-7 00:33:55',
 '2019-7-7 00:33:55',
 '2019-7-7 00:33:55']

In [116]:
rawr['V1']['data']['OnOff'][0:10]

[1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
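
The vibration lists are flat: each timestamp repeats once per peak, five times here. To get one row per reading, the parallel lists can be grouped by timestamp. The sketch below uses the first ten V1 entries shown above; keeping only the largest-amplitude peak as a feature is an illustrative choice, not the notebook's method.

```python
from collections import defaultdict

freq = [320, 13, 5, 11, 213, 320, 5, 213, 11, 13]
amp = [0.06952, 0.07669, 0.07902, 0.0855, 0.10278,
       0.05162, 0.05633, 0.07594, 0.0872, 0.08946]
ts = ['2019-7-7 00:03:56'] * 5 + ['2019-7-7 00:33:55'] * 5
onoff = [1] * 10

# Collect the (freq, amp, OnOff) peaks that share a timestamp.
grouped = defaultdict(list)
for t, f, a, s in zip(ts, freq, amp, onoff):
    grouped[t].append((f, a, s))

rows = []
for t, peaks in grouped.items():
    top_freq, top_amp, _ = max(peaks, key=lambda p: p[1])  # largest amplitude
    rows.append({"ts": t, "n_peaks": len(peaks),
                 "top_freq": top_freq, "top_amp": top_amp,
                 "OnOff": peaks[0][2]})
print(rows)
```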

In [13]:
rawr['V3']['data']['ts'][0:10] #check timestamp format for vib 3
# readings every half hour, top 5 peaks

['2019-7-7 00:04:50',
 '2019-7-7 00:04:50',
 '2019-7-7 00:04:50',
 '2019-7-7 00:04:50',
 '2019-7-7 00:04:50',
 '2019-7-7 00:34:53',
 '2019-7-7 00:34:53',
 '2019-7-7 00:34:53',
 '2019-7-7 00:34:53',
 '2019-7-7 00:34:53']
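
V1 and V3 report about a minute apart (00:03:56 vs 00:04:50), so their rows cannot be joined on exact timestamps. One option, assumed here rather than taken from the notebook, is to floor each timestamp to its half-hour window and join on that; `half_hour_bin` is a hypothetical helper.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"


def half_hour_bin(text):
    """Floor a timestamp string to the start of its 30-minute window."""
    t = datetime.strptime(text, FMT)
    return t.replace(minute=30 * (t.minute // 30), second=0)


v1_ts = ['2019-7-7 00:03:56', '2019-7-7 00:33:55']
v3_ts = ['2019-7-7 00:04:50', '2019-7-7 00:34:53']

v1_bins = [half_hour_bin(t) for t in v1_ts]
v3_bins = [half_hour_bin(t) for t in v3_ts]
print(v1_bins == v3_bins)
```

Flooring would misalign readings that drift across a window boundary, so the bins are worth checking over the full dataset before relying on them.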

---