# Purpose

The purpose of this notebook is to read a HOBO sensor table "as-is" and:

 * Extract the HOBO ID into a **hoboid** variable.
 * Remove the redundant row-number (**#**) column.
 * Extract the time zone into a **timezone** variable.
 * Check for and remove any readings with **duplicated timestamps**.
 * Split the table, which holds the sensor readings as columns, into 3 separate tables, one for each sensor.
    * Add the unit mentioned in each column **name** as a new column, repeated in every row.

## Extract `hoboid`

In [324]:
import csv

with open('To_Insert/9790163-sample.csv', 'r') as f:
    rdr = csv.reader(f)
    # Consume the first row: it breaks the CSV format and contains the hoboid
    line1 = next(rdr)
    line1 = line1[0]
    # Extract the hoboid: take the text after ': ' and drop its last character
    hoboid = line1.split(': ')[1][0:-1]
    # Store the remaining rows (header + readings) in table
    table = list(rdr)
# Show the first 3 rows of the table, without the hoboid row
print(table[0:3])

[['#', 'Date Time, GMT-10:00', 'Temp, °F (LGR S/N: 9790163, SEN S/N: 9790163)', 'RH, % (LGR S/N: 9790163, SEN S/N: 9790163)', 'Intensity, lum/ft² (LGR S/N: 9790163, SEN S/N: 9790163)'], ['1', '02/03/17 04:00:00 PM', '76.375', '69.420', '1.8'], ['2', '02/03/17 04:01:00 PM', '76.332', '69.296', '1.8']]
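
The slice-based extraction above depends on the exact layout of the first row, which is not shown in this notebook. As a rough alternative sketch, a regular expression can pull out the first run of digits after the colon. The helper name `extract_hoboid` and the sample title string are assumptions made for illustration only, not taken from the real file.

```python
import re

def extract_hoboid(first_cell):
    """Return the first run of digits that follows a colon, or None.

    Hypothetical helper: assumes the ID is purely numeric and appears
    after a ':' in the first cell of the file.
    """
    match = re.search(r":\s*(\d+)", first_cell)
    return match.group(1) if match else None

# Made-up title line; the real first row may look different.
print(extract_hoboid("Plot Title: 9790163"))
```

Because the pattern only captures digits, a stray trailing character after the ID would not end up in `hoboid`.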


## Remove the redundant row-number `#` column

In [325]:
#Load table
import pandas
df = pandas.DataFrame(table[1:],columns=table[0])
df.head()

Unnamed: 0,#,"Date Time, GMT-10:00","Temp, °F (LGR S/N: 9790163, SEN S/N: 9790163)","RH, % (LGR S/N: 9790163, SEN S/N: 9790163)","Intensity, lum/ft² (LGR S/N: 9790163, SEN S/N: 9790163)"
0,1,02/03/17 04:00:00 PM,76.375,69.42,1.8
1,2,02/03/17 04:01:00 PM,76.332,69.296,1.8
2,3,02/03/17 04:02:00 PM,76.203,67.938,1.8
3,4,02/03/17 04:03:00 PM,75.942,68.361,1.8
4,5,02/03/17 04:04:00 PM,75.724,68.698,1.8


After removal of row number column:

In [326]:
df = df.iloc[:,1:]
df.head()

Unnamed: 0,"Date Time, GMT-10:00","Temp, °F (LGR S/N: 9790163, SEN S/N: 9790163)","RH, % (LGR S/N: 9790163, SEN S/N: 9790163)","Intensity, lum/ft² (LGR S/N: 9790163, SEN S/N: 9790163)"
0,02/03/17 04:00:00 PM,76.375,69.42,1.8
1,02/03/17 04:01:00 PM,76.332,69.296,1.8
2,02/03/17 04:02:00 PM,76.203,67.938,1.8
3,02/03/17 04:03:00 PM,75.942,68.361,1.8
4,02/03/17 04:04:00 PM,75.724,68.698,1.8
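
Dropping the column by position works as long as `#` is always the first column. A minimal sketch of dropping it by name instead, which does not depend on column order, could look like this. The small `df_demo` frame is a stand-in built from a few values shown above, with shortened headers, not the real table.

```python
import pandas as pd

# Stand-in frame with the same leading '#' column as the HOBO export.
df_demo = pd.DataFrame(
    [["1", "02/03/17 04:00:00 PM", "76.375"],
     ["2", "02/03/17 04:01:00 PM", "76.332"]],
    columns=["#", "Date Time, GMT-10:00", "Temp, °F"],
)

# Drop the row-number column by its label rather than by position.
df_demo = df_demo.drop(columns="#")
print(list(df_demo.columns))
```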


## Extract `timezone` 

In [327]:
# Extract the timezone and units from the column headers
timezone_units = df.columns
# First, separate each variable name from its timezone/unit description
# Names: the text before the first ', '
names = [x.split(', ')[0] for x in timezone_units]
df.columns = names
print('Extracted Sensor Names: ',names)
# Timezone/unit descriptions: the second ', '-separated field of each header
timezone_units = [x.split(', ')[1] for x in timezone_units]
# The timezone needs no further pre-processing
timezone=timezone_units[0]
# The units do: keep only the text before the first space
units = [x.split(' ')[0] for x in timezone_units[1:]]
print('Extracted Timezone: ',timezone,'\nExtracted Units: ',units)


Extracted Sensor Names:  ['Date Time', 'Temp', 'RH', 'Intensity']
Extracted Timezone:  GMT-10:00 
Extracted Units:  ['°F', '%', 'lum/ft²']
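
The same name/unit split can also be sketched with a single regular expression, which fails loudly when a header does not match the expected `name, unit ...` shape. The names `HEADER_RE` and `split_header` are hypothetical; the header strings are the ones printed earlier in this notebook.

```python
import re

# name = everything before the first comma; unit = next run of non-space characters
HEADER_RE = re.compile(r"^(?P<name>[^,]+),\s*(?P<unit>\S+)")

def split_header(header):
    """Split a HOBO column header into (name, unit_or_timezone)."""
    m = HEADER_RE.match(header)
    if m is None:
        raise ValueError(f"Unexpected header format: {header!r}")
    return m.group("name"), m.group("unit")

headers = [
    "Date Time, GMT-10:00",
    "Temp, °F (LGR S/N: 9790163, SEN S/N: 9790163)",
    "RH, % (LGR S/N: 9790163, SEN S/N: 9790163)",
    "Intensity, lum/ft² (LGR S/N: 9790163, SEN S/N: 9790163)",
]
parsed = [split_header(h) for h in headers]
print(parsed)
```

For the first header the second element is the timezone rather than a unit, matching how `timezone_units[0]` is treated in the cell above.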


## Remove any rows with a duplicated timestamp

In [328]:
# In the test file, row 9 was artificially duplicated for testing.
# Only the timestamp of row 9 is duplicated; all other values differ. This is by design, per Eileen's request.
# drop_duplicates keeps the first row of each duplicated timestamp by default.
print('Number of rows BEFORE removal: ',len(df.index))
df = df.drop_duplicates(subset='Date Time')
print('Number of rows AFTER removal: ',len(df.index))

Number of rows BEFORE removal:  2730
Number of rows AFTER removal:  2729
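
Before dropping anything, it can help to look at the rows that share a timestamp. The sketch below uses a small made-up frame (the duplicated timestamp is invented for the demo) to show `duplicated(keep=False)`, which flags every member of a duplicated group, next to `drop_duplicates`, which keeps the first occurrence.

```python
import pandas as pd

# Made-up demo data: the last two rows share a timestamp but differ in value.
demo = pd.DataFrame({
    "Date Time": ["02/03/17 04:00:00 PM",
                  "02/03/17 04:01:00 PM",
                  "02/03/17 04:01:00 PM"],
    "Temp": ["76.375", "76.332", "76.203"],
})

# keep=False marks all rows in each duplicated group, not only the later ones.
dupes = demo[demo.duplicated(subset="Date Time", keep=False)]
print(dupes)

# keep='first' (the default) retains the earliest row for each timestamp.
deduped = demo.drop_duplicates(subset="Date Time", keep="first")
print(len(demo), "->", len(deduped))
```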


## Create the `temperature sensor` table

In [329]:
#Format the temperature sensor table
temperature_sensor = df.loc[:,['Date Time','Temp']]
#Now add the units 
temperature_sensor['Unit'] = units[0]
temperature_sensor.head()
#temperature_sensor.to_csv('lala_land_path')

Unnamed: 0,Date Time,Temp,Unit
0,02/03/17 04:00:00 PM,76.375,°F
1,02/03/17 04:01:00 PM,76.332,°F
2,02/03/17 04:02:00 PM,76.203,°F
3,02/03/17 04:03:00 PM,75.942,°F
4,02/03/17 04:04:00 PM,75.724,°F


## Create the `relative humidity sensor` table

In [330]:
#Format the relative humidity sensor table
rh_sensor = df.loc[:,['Date Time','RH']]
#Now add the units 
rh_sensor['Unit'] = units[1]
rh_sensor.head()
#rh_sensor.to_csv('lala_land_path')

Unnamed: 0,Date Time,RH,Unit
0,02/03/17 04:00:00 PM,69.42,%
1,02/03/17 04:01:00 PM,69.296,%
2,02/03/17 04:02:00 PM,67.938,%
3,02/03/17 04:03:00 PM,68.361,%
4,02/03/17 04:04:00 PM,68.698,%


## Create the `intensity sensor` table

In [323]:
#Format the intensity sensor table
intensity_sensor = df.loc[:,['Date Time','Intensity']]
#Now add the units 
intensity_sensor['Unit'] = units[2]
intensity_sensor.head()
#intensity_sensor.to_csv('lala_land_path')

Unnamed: 0,Date Time,Intensity,Unit
0,02/03/17 04:00:00 PM,1.8,lum/ft²
1,02/03/17 04:01:00 PM,1.8,lum/ft²
2,02/03/17 04:02:00 PM,1.8,lum/ft²
3,02/03/17 04:03:00 PM,1.8,lum/ft²
4,02/03/17 04:04:00 PM,1.8,lum/ft²
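
The three cells above repeat the same pattern, so they could be folded into one loop. The following is a minimal sketch under the assumption that `units` is ordered like the sensor columns, as it is in this notebook. The function name `split_by_sensor` and the small demo frame are hypothetical.

```python
import pandas as pd

def split_by_sensor(df, units, time_col="Date Time"):
    """Return {sensor_name: DataFrame[time_col, sensor, 'Unit']}.

    Assumes units[i] belongs to the i-th non-time column of df.
    """
    sensor_cols = [c for c in df.columns if c != time_col]
    tables = {}
    for col, unit in zip(sensor_cols, units):
        table = df[[time_col, col]].copy()  # copy so df itself is not modified
        table["Unit"] = unit
        tables[col] = table
    return tables

# Demo frame using the first two readings shown above.
demo = pd.DataFrame({
    "Date Time": ["02/03/17 04:00:00 PM", "02/03/17 04:01:00 PM"],
    "Temp": ["76.375", "76.332"],
    "RH": ["69.420", "69.296"],
    "Intensity": ["1.8", "1.8"],
})
tables = split_by_sensor(demo, ["°F", "%", "lum/ft²"])
print(tables["RH"])
```

Each resulting table could then be written out with `to_csv`, as hinted at in the commented-out lines of the cells above.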
