# Working with NewLab data set using SQL data
We have data from our Sept-Dec deployment in NL uploaded to our SQL DB.

I'd like to try: 
* pulling this data
* SQL query experimentation
* plotting 

## FYI sqlconfig
To ```import sqlconfig```, the file "sqlconfig.py" must be in this folder, or the import path adjusted accordingly.  
This file holds the user/password for the SQL connection and is listed in the .gitignore, so you will have to create it locally.

---
Create sqlconfig.py as:
```python
# .gitignore should include a reference to sqlconfig.py
passwd = "[password]"
user = "[username]"
```
---

In [2]:
import dash
import dash_core_components as dcc
import dash_html_components as html
import plotly.graph_objs as go
import plotly.figure_factory as FF
from datetime import datetime
import glob
import os.path
import pymysql
import sqlconfig # From sqlconfig.py
import pandas as pd
import sqlalchemy
import psycopg2
from tqdm import tqdm
print("Import Complete")

Import Complete


### SQL setup
create engine for CBAS db

In [3]:
passwd = sqlconfig.passwd  # From sqlconfig.py
user = sqlconfig.user  # From sqlconfig.py
DB = 'NewLab'  #name of databases to activate 
user

'sm'

In [4]:
engine = sqlalchemy.create_engine('postgresql+psycopg2://'+user+':'+passwd+'@35.221.58.17/'+DB)
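
A note on the connection string: plain concatenation breaks if the password contains characters such as `@`, `:` or `/`. Below is a minimal sketch of one way around that, percent-encoding the credentials with the standard library before building the same `postgresql+psycopg2://` URL. `build_db_url` is a hypothetical helper, and the host and credentials in the example are placeholders, not real values.

```python
from urllib.parse import quote_plus


def build_db_url(user, passwd, host, db):
    """Return a SQLAlchemy-style PostgreSQL URL with escaped credentials."""
    # quote_plus percent-encodes characters such as '@', ':' and '/'
    return (
        "postgresql+psycopg2://"
        f"{quote_plus(user)}:{quote_plus(passwd)}@{host}/{db}"
    )


# Placeholder values for illustration only
example_url = build_db_url("sm", "p@ss/word", "db.example.org", "NewLab")
print(example_url)
```

The resulting string can be passed to `sqlalchemy.create_engine` in the same way as the concatenated one.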

In [5]:
query= ''' 
SELECT * from cbasnl
-- where sensor = 'protoCBAS-G' AND
-- timestamp BETWEEN '2019-09-21 00:00:00' and '2019-09-30 11:59:00'
ORDER BY timestamp asc;
'''


In [17]:
# read the query results into the CBAS df

CBAS =  pd.read_sql(query,engine,
                        index_col=["timestamp"])

In [18]:
CBAS.head()

timestamp,battery,Tdb_BME680,RH_BME680,P_BME680,Alt_BME680,TVOC,ECO2,RCO2,Tdb_scd30,RH_scd30,...,PPD_fixed_air,Ta_adj_fixed_air,Cooling_effect_fixed_air,SET_fixed_air,TComf_fixed_air,TempDiff_fixed_air,TComfLower_fixed_air,TComfUpper_fixed_air,Acceptability_fixed_air,Condit_fixed_air
2019-09-06 15:59:00+00:00,,,,,,,,,,,...,,,,-999.0,25.076096,,21.576096,28.576096,False,-1
2019-09-06 15:59:00+00:00,,,,,,,,,,,...,,,,-999.0,25.076096,,21.576096,28.576096,False,-1
2019-09-06 15:59:00+00:00,,,,,,,,,,,...,,,,-999.0,25.076096,,21.576096,28.576096,False,-1
2019-09-06 15:59:00+00:00,,,,,,,,,,,...,,,,-999.0,25.076096,,21.576096,28.576096,False,-1
2019-09-06 15:59:00+00:00,,,,,,,,,,,...,,,,-999.0,25.076096,,21.576096,28.576096,False,-1


Check which sensors are in this data set

In [19]:
print(CBAS.sensor.unique())
# what unique values are in "sensor" column
print(type(CBAS.index)) # check timestamp is recognized as DatetimeIndex
print(CBAS.index.min())
print(CBAS.index.max())
# min/max index values (date range)

['protoCBAS-A' 'protoCBAS-B' 'protoCBAS-C' 'protoCBAS-D' 'protoCBAS-G']
<class 'pandas.core.indexes.datetimes.DatetimeIndex'>
2019-09-06 15:59:00+00:00
2020-01-08 19:05:00+00:00


### So "CBAS" is one dataframe 
* df date ranges from '2019-09-06 15:59:00' - '2020-01-08 19:05:00'
* Sensors in this df ['protoCBAS-A', 'protoCBAS-B', 'protoCBAS-C', 'protoCBAS-D','protoCBAS-G']

This differs from how we usually manage dataframes, where we keep a list of dataframes, one per sensor.  
### Options to manage (one df vs list of dfs):
* Group this df by the "sensor" column into separate dfs and place them in a list
* Use Pandas filtering/grouping as needed; the only situation where I see this being needed is when plotting different sensors as separate traces.
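
A rough sketch of both options, using a tiny made-up frame in place of `CBAS` (the values and the names `demo`, `by_sensor` and `sensor_a` are illustrative assumptions, not part of the real data set):

```python
import pandas as pd

# Tiny stand-in for the CBAS frame: a tz-aware index plus a "sensor" column
idx = pd.to_datetime(
    ["2019-09-06 15:59", "2019-09-06 16:00", "2019-09-06 16:00"], utc=True
)
demo = pd.DataFrame(
    {
        "sensor": ["protoCBAS-A", "protoCBAS-B", "protoCBAS-A"],
        "RCO2": [800.0, 1582.0, 810.0],
    },
    index=idx,
)

# Option 1: one DataFrame per sensor, keyed by sensor name
by_sensor = {name: group for name, group in demo.groupby("sensor")}

# Option 2: keep a single frame and filter only when a per-sensor trace is needed
sensor_a = demo[demo["sensor"] == "protoCBAS-A"]

print(sorted(by_sensor))
print(len(sensor_a))
```

A dict keyed by sensor name avoids relying on list position (such as `dfs[1]` meaning CBAS-B), which may be easier to read when plotting.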


### Further exploring the dataset...

In [20]:
print(CBAS.columns)
# what columns do we have

Index(['battery', 'Tdb_BME680', 'RH_BME680', 'P_BME680', 'Alt_BME680', 'TVOC',
       'ECO2', 'RCO2', 'Tdb_scd30', 'RH_scd30', 'Lux', 'PM1', 'PM25', 'PM10',
       'Air', 'sensor_SD', 'note', 'sensor_note', 'Coord_X_m', 'Coord_Y_m',
       'Coord_Z_m', 'Position_HumanReadable', 'Wkdy', 'Hour', 'Month', 'TOD',
       'DOY', 'sensor', 'UTCI_approx', 'UTCI_comfortable', 'UTCI_stressRange',
       'PMV', 'PPD', 'Ta_adj', 'Cooling_effect', 'SET', 'running_mean',
       'TComf', 'TempDiff', 'TComfLower', 'TComfUpper', 'Acceptability',
       'Condit', 'UTCI_approx_fixed_air', 'UTCI_comfortable_fixed_air',
       'UTCI_stressRange_fixed_air', 'PMV_fixed_air', 'PPD_fixed_air',
       'Ta_adj_fixed_air', 'Cooling_effect_fixed_air', 'SET_fixed_air',
       'TComf_fixed_air', 'TempDiff_fixed_air', 'TComfLower_fixed_air',
       'TComfUpper_fixed_air', 'Acceptability_fixed_air', 'Condit_fixed_air'],
      dtype='object')


In [21]:
print(CBAS.Position_HumanReadable.unique())
# unique values in Position_HumanReadable column

[None 'FA HQ (Gnd floor desks west of fishbowls)'
 '2nd floor Mezz Across from Quiet Area'
 'Cubicles (Gnd floor west of event space)' 'In Quiet Area, 2nd floor'
 '2nd floor Mezz Quiet Area' 'Bridge (middle of bridge)'
 'Lynq loft ( 2nd floor)' 'central corridor (ground floor) cubicle'
 '"Wind Tunnel"' 'desk near supply vent' 'Supply vent' 'Cubicles new desk'
 'bench under front staircase' 'welcome desk']
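
Since `None` shows up among the positions, it may help to count rows per position with missing values included. A small sketch on made-up values (`positions` is a hypothetical stand-in for `CBAS.Position_HumanReadable`):

```python
import pandas as pd

# Made-up stand-in for the Position_HumanReadable column
positions = pd.Series(
    ["welcome desk", None, "Supply vent", None, '"Wind Tunnel"', "welcome desk"],
    name="Position_HumanReadable",
)

# dropna=False keeps the missing positions in the tally
counts = positions.value_counts(dropna=False)
n_missing = int(positions.isna().sum())

print(counts)
print(n_missing)
```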


Thinking about how this will plot, and how this data differs from the .CSV files usually used,
looking at the extradata_to_static_dash.py script and how it manages dfs.

Some in-script modifications to this data for plotting are:
* Timezone conversion
    Convert to NYC tz; the index needs to be timezone-aware for this to work  
    Code:
    ```python
    def tz_NYC(d): 
        d.index = d.index.tz_convert('America/New_York')
        return d
    dfs = list(map(tz_NYC, dfs))    
    
    ```
    ---
* Adjust for "gremlins" in CBAS-B CO2 sensor  
```
RCO2 data is offset by +782ppm from "2019-09-05 "-"2019-11-10 "
after "2019-11-10 " CO2 seemed to report as it should.  
Made this tweak in January, not sure if anything has changed as of writing this (Mar-09) 
```
    ```python
    # use a single .loc call so the assignment writes back to dfs[1] (chained indexing may modify a copy)
    dfs[1].loc["2019-09-05":"2019-11-10", "RCO2"] -= 782  # adjust for gremlins in CBAS-B CO2 sensor

    ```
    ---
* Remove "Wind Tunnel" testing  
    Drop data referring to the wind tunnel, as it is not related to NewLab
```Python
dfs = [d.loc[d["Position_HumanReadable"] != '"Wind Tunnel"'] for d in dfs]
```
---
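
The three adjustments listed above assume a list of per-sensor dfs. Below is a hedged sketch of how they might look on a single frame with a "sensor" column. The function name `prepare` and the demo rows are made up for illustration; the date window and the 782 ppm offset come from the note above.

```python
import pandas as pd

# Made-up rows; timestamps repeat across sensors, as in the single-table layout
idx = pd.to_datetime(
    ["2019-09-10 12:00", "2019-09-10 12:00", "2019-11-20 12:00", "2019-10-01 12:00"],
    utc=True,
)
demo = pd.DataFrame(
    {
        "sensor": ["protoCBAS-A", "protoCBAS-B", "protoCBAS-B", "protoCBAS-G"],
        "RCO2": [800.0, 1582.0, 820.0, 790.0],
        "Position_HumanReadable": ["welcome desk", None, "Supply vent", '"Wind Tunnel"'],
    },
    index=idx,
)


def prepare(df):
    """Apply the RCO2 offset, drop wind tunnel rows, then convert to NYC time."""
    out = df.copy()  # leave the caller's UTC frame untouched

    # 1. RCO2 offset for CBAS-B only, inside the affected window
    start = pd.Timestamp("2019-09-05", tz="UTC")
    end = pd.Timestamp("2019-11-11", tz="UTC")  # exclusive, so all of 2019-11-10 is covered
    mask = (
        (out["sensor"] == "protoCBAS-B") & (out.index >= start) & (out.index < end)
    ).to_numpy()
    out.loc[mask, "RCO2"] = out.loc[mask, "RCO2"].to_numpy() - 782

    # 2. Drop wind tunnel rows; rows with a missing position are kept
    is_tunnel = out["Position_HumanReadable"].eq('"Wind Tunnel"').to_numpy()
    out = out[~is_tunnel]

    # 3. Convert to New York time last, for display only
    return out.tz_convert("America/New_York")


result = prepare(demo)
print(result)
```

Boolean masks are used instead of date-string slicing here because masks do not require a sorted or unique index.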

### Adjusting snippets for this data
My goal with this "NewLab" table is to have a dataset that is ready to pull and work with,  
requiring fewer redundant modifications.  
So things like pulling out the wind tunnel testing and offsetting RCO2 for CBAS-B should already be handled.

In [22]:
def tz_NYC(d):
    d = d.copy()  # work on a copy so the caller's UTC frame is not modified
    d.index = d.index.tz_convert('America/New_York')
    return d

In [23]:
CBASNYC = tz_NYC(CBAS) # data in db should stay as UTC; only convert just before displaying

In [24]:
CBASNYC.head()

timestamp,battery,Tdb_BME680,RH_BME680,P_BME680,Alt_BME680,TVOC,ECO2,RCO2,Tdb_scd30,RH_scd30,...,PPD_fixed_air,Ta_adj_fixed_air,Cooling_effect_fixed_air,SET_fixed_air,TComf_fixed_air,TempDiff_fixed_air,TComfLower_fixed_air,TComfUpper_fixed_air,Acceptability_fixed_air,Condit_fixed_air
2019-09-06 11:59:00-04:00,,,,,,,,,,,...,,,,-999.0,25.076096,,21.576096,28.576096,False,-1
2019-09-06 11:59:00-04:00,,,,,,,,,,,...,,,,-999.0,25.076096,,21.576096,28.576096,False,-1
2019-09-06 11:59:00-04:00,,,,,,,,,,,...,,,,-999.0,25.076096,,21.576096,28.576096,False,-1
2019-09-06 11:59:00-04:00,,,,,,,,,,,...,,,,-999.0,25.076096,,21.576096,28.576096,False,-1
2019-09-06 11:59:00-04:00,,,,,,,,,,,...,,,,-999.0,25.076096,,21.576096,28.576096,False,-1


In [26]:
CBASxwind = CBAS.loc[CBAS["Position_HumanReadable"] != '"Wind Tunnel"']

In [17]:
print(CBASxwind.Position_HumanReadable.unique())

[None '2nd floor Mezz Across from Quiet Area' '2nd floor Mezz Quiet Area'
 'Bridge (middle of bridge)' 'desk near supply vent' 'Supply vent'
 'In Quiet Area, 2nd floor' 'Cubicles (Gnd floor west of event space)'
 'Cubicles new desk' 'FA HQ (Gnd floor desks west of fishbowls)'
 'Lynq loft ( 2nd floor)' 'bench under front staircase' 'welcome desk'
 'central corridor (ground floor) cubicle']


## SELECT a single board

In [27]:
Aquery= ''' 
SELECT * 
FROM cbasnl
WHERE sensor = 'protoCBAS-A' 
-- AND timestamp BETWEEN '2019-09-06 00:00:00' and '2019-09-30 11:59:00'
'''


CBASA =  pd.read_sql(Aquery,engine,parse_dates=["timestamp"], index_col=["timestamp"])
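
Instead of editing the query string for each board or date range, the values could be passed as query parameters. The sketch below uses an in-memory SQLite table as a stand-in for `cbasnl`, so it runs without the real database. The rows, the helper `read_board` and the name `board_query` are made up. Note that the placeholder syntax depends on the driver: SQLite accepts `:name`, while with the PostgreSQL engine used here the query would likely need psycopg2's `%(name)s` style or wrapping in `sqlalchemy.text`.

```python
import sqlite3

import pandas as pd

# In-memory SQLite table standing in for cbasnl (made-up rows)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cbasnl (timestamp TEXT, sensor TEXT, RCO2 REAL)")
conn.executemany(
    "INSERT INTO cbasnl VALUES (?, ?, ?)",
    [
        ("2019-09-06 20:20:00", "protoCBAS-A", 800.0),
        ("2019-09-06 20:25:00", "protoCBAS-B", 1582.0),
        ("2019-10-02 09:00:00", "protoCBAS-A", 640.0),
    ],
)

board_query = """
SELECT *
FROM cbasnl
WHERE sensor = :sensor
  AND timestamp BETWEEN :start AND :end
ORDER BY timestamp ASC;
"""


def read_board(connection, sensor, start, end):
    """Read one board's rows for a time window, indexed by timestamp."""
    return pd.read_sql(
        board_query,
        connection,
        params={"sensor": sensor, "start": start, "end": end},
        parse_dates=["timestamp"],
        index_col="timestamp",
    )


board_a = read_board(conn, "protoCBAS-A", "2019-09-06 00:00:00", "2019-09-30 11:59:00")
print(board_a)
```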

In [28]:
CBASA.head()

timestamp,battery,Tdb_BME680,RH_BME680,P_BME680,Alt_BME680,TVOC,ECO2,RCO2,Tdb_scd30,RH_scd30,...,PPD_fixed_air,Ta_adj_fixed_air,Cooling_effect_fixed_air,SET_fixed_air,TComf_fixed_air,TempDiff_fixed_air,TComfLower_fixed_air,TComfUpper_fixed_air,Acceptability_fixed_air,Condit_fixed_air
2019-09-06 15:59:00+00:00,,,,,,,,,,,...,,,,-999.0,25.076096,,21.576096,28.576096,False,-1
2019-09-06 16:00:00+00:00,,,,,,,,,,,...,,,,-999.0,25.076096,,21.576096,28.576096,False,-1
2019-09-06 16:00:00+00:00,,,,,,,,,,,...,,,,-999.0,25.076096,,21.576096,28.576096,False,-1
2019-09-06 16:00:00+00:00,,,,,,,,,,,...,,,,-999.0,25.076096,,21.576096,28.576096,False,-1
2019-09-06 20:20:00+00:00,3.972174,24.16,55.79,100.95,,0.0,400.0,800.0,24.58,56.38,...,30.305161,21.500962,20.325461,3.834539,25.076096,-0.496096,21.576096,28.576096,True,0


In [30]:
CBASA.sensor

timestamp
2019-09-06 15:59:00+00:00    protoCBAS-A
2019-09-06 16:00:00+00:00    protoCBAS-A
2019-09-06 16:00:00+00:00    protoCBAS-A
2019-09-06 16:00:00+00:00    protoCBAS-A
2019-09-06 20:20:00+00:00    protoCBAS-A
                                ...     
2020-01-08 18:35:00+00:00    protoCBAS-A
2020-01-08 18:40:00+00:00    protoCBAS-A
2020-01-08 18:45:00+00:00    protoCBAS-A
2020-01-08 18:50:00+00:00    protoCBAS-A
2020-01-08 18:55:00+00:00    protoCBAS-A
Name: sensor, Length: 35835, dtype: object
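
The output above lists `2019-09-06 16:00:00+00:00` three times for the same board, so the timestamp index is not unique. A small sketch of how repeated timestamps could be counted and, if wanted, reduced to the first row per timestamp. The frame `demo_a` is made up, and whether dropping repeats is the right treatment for the real data is an open question.

```python
import pandas as pd

# Made-up stand-in for CBASA with the same repeated 16:00 timestamp
idx = pd.to_datetime(
    [
        "2019-09-06 15:59",
        "2019-09-06 16:00",
        "2019-09-06 16:00",
        "2019-09-06 16:00",
        "2019-09-06 20:20",
    ],
    utc=True,
)
nan = float("nan")
demo_a = pd.DataFrame(
    {"sensor": "protoCBAS-A", "RCO2": [nan, nan, nan, nan, 800.0]},
    index=idx,
)

# keep=False flags every row whose timestamp occurs more than once
dup_mask = demo_a.index.duplicated(keep=False)
n_dups = int(dup_mask.sum())

# keep="first" flags only the repeats, so negating it keeps one row per timestamp
deduped = demo_a[~demo_a.index.duplicated(keep="first")]

print(n_dups)
print(len(deduped))
```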