##  <span style="color:darkred">Data extraction and visualization for indoor climate assessment</span>
The below scripts are made by Henriette Steenhoff, s134869, to help visualize the sampling of different indoor climate measures for the Master Thesis of Anne Sloth Bidstrup, s112862 at The Technical University of Denmark (DTU), May 2017.

The purpose of this Notebook is to give the reader an idea of how the code works and how it can be run. The Notebook does not go into details on climate related matters as this is described in the related report and is now viewed as a part of the scope of the coding task. This Notebook is solely the documentation of the code and descriptions of decisions made regarding the collected data to produce the correct output.

The reader of this Notebook is assumed to have a basic understanding of python programming, regular expressions, patterns, trees, simple data structures and related terminology. 

*In order to run this code you need to have Python 2.7 installed along with jupyter notebooks, plotly and pandas. Furthermore, you need to have the data available and structured as described in the section Structure of Data Output.*

----

##  <span style="color:darkred">Information about the different data sources</span>
This section introduces the different data sources and their content. This is not a thorough description but rather an overview to make it easier to follow along as the code is documented below.

### <span style="color:darkred">The different data sources</span>
**Netatmo, ``.xlsx``**

Data extracted weekly from Netatmo for each location, each monitored room is represented by its own file.

* CO2
* heat 
* humidity 

Together with this information from Netatmo a file, the ``have``-file contains information about outdoor temperature for the given week.

**Bygweb - wireless, ``.db``**

Data from Access database fetched on a weekly basis of all locations that need to be extracted and put into seprate sheets.
* ``PIR`` and ``reed``- for living room/kitchen and bedroom
* ``Compas`` and ``Access`` - files generated but note used any further.
* *Several of the variables in the data will be discarded as they are not used*



### <span style="color:darkred">Different measures, which to use</span>
#### Wireless, ``PIR/Reed`` 
* **PIR (movement censor)** The exciting part, indicates someone at home if between ``close`` and ``open``
    * ``Timed out`` - no more movement, **rows will be discarded**
* **Reed (door)** Present/home values ``open/close``

One file is create for each location with all ``PIR/Reed`` values to get an overview of when there are someone at home.

#### Wireless, ``Compas/Acc`` 
* ONLY ``moved``
* Sorted by room
* *Number of entities*
* % of time one has been at home

This information is extracted from the Access databass but not processed any further.

#### Netatmo, ``CO2``
In order to look at the CO2-levels when people are at home, the PIR/Reed values will be merged with the CO2-values.

#### Netatmo, ``Humidity``
Calculating relative humidity will be based on the equations provided by Anne for boundaries red, yellow and green ocmbined with the outdoor temperature from the ``have``-file.

----
### Documentation
Some references for help on different coding solutions

* [``pandas`` documentation](http://pandaproject.net/docs/importing-access-files.html)
* [``plotly`` documentation](https://plot.ly/python/ipython-notebook-tutorial/)

#### To-do:
- Look into possibilities for making transparent background on plots to save $\checkmark$ - *given as argument in plot* [see here](http://stackoverflow.com/questions/29968152/python-setting-background-color-to-transparent-in-plotly-plots)
- Find target group in need of feedback $\checkmark$ - Anne has list
- Separate processed datasets from raw data (in folders) $\checkmark$ - done ultimo March
- Make system to distinguish between weeks and different users $\checkmark$ - *weekly folders, alias and room identifiers*
- Find solution for merging Netatmo CO2 data with Wireless "when-people-are-at-home-data" $\checkmark$ - finalized medio April

----

## <span style="color:darkred">Structure of data output</span>

This section will give a brief introduction on how to
 * name the different files to be read
 * structure the raw data in order to be loaded into the program
 * find the different plots and output files
 
*As the code is made as modular as possible, it is easy to change the structures of the different paths to fit new configurations.*
 
### I/O description
* All input files (the Access file) must be put in the root of the directory. 
* The netatmo data should be placed in the netatmo path, which is shown with an example in the Data Processing section.

### Folder structure
This information is used for organizing files into right folders in the working directory. The description is made based on the structure that was used during the feedback period.

![Tree structure](tree.png)

* For each week there will be a folder named '``Data week x``', where ``x`` denotes the week number according to the Gregorian calendar. In the figure above, the weekly folder is seen as the root. 
* In the weekly folder there are two subfolders: ``Netatmo``, ``Wireless`` and ``ProcessedData``.
* The ``Netatmo`` folder has subfolders ``CO2``, ``HR`` and ``Visualization``. ``Netatmo`` contains all the raw data files extracted from Netatmo. The raw files are restructured by the program and the relevant data for each location is extracted. The proccessed files are put in ``ProcessedData``.
  * ``Visualization`` contains bar charts, pie chart and plots for each week generated by the program for the given week. Each home is uniquely identified in each folder by it's ``ID`` i.e: ``Hexxx``.
  * ``CO2`` contains the calculated home/awa values for each observation. This is the basis used when generating the plots in ``Visualization``.
  * ``HR`` contains the calculated boundaries for each humidity value and related timestamp and temperature in Celcius.
* ``ProcessedData`` contains all data used in that week from both Netatmo and Wireless sorted into files for each room in the house, this is the data extracted by the program from the ``Wireless/Bygweb files`` added to the directory each week. The different homes can be identified by the aliases made by Anne and a room identifier $\{stue, værelse\}$ etc. 

Since the data from Netatmo and Wireless will be interdependent, there will be no separate folder for the processed Wireless/Netatmo data. 

*Some data (unused fields and observations) has been removed to improve performance.*

### <span style="color:darkred">Naming convention</span>
### DATA FILES
All the data can be on Dropbox (but will be sent to involved parties for review) 
* There will be as many processed netatmo files for a home as there are monitored rooms (with a maximum of 3).

| Data type  |  Generic naming standard | Example |
|---|---|---|
| <span style="color:darkred">**PIR/Reed**</span>  | '``alias``-**PIRReed**``.xlsx``'   | ``he117-PIRReed.xlsx`` |
| <span style="color:darkred">**AccCompas**</span> | '``alias``-**AccCompas**``_room.xlsx``' | ``he117-AccCompas_kitchen.xlsx`` |
|<span style="color:darkred">**Netatmo**</span>| '**netatmo**``-alias-room.xlsx``' | ``netatmo_He61 fStue.xlsx`` |
|  |  |  |
|<span style="color:darkred">**Full PIR/Reed**</span>|**weekMerge**``xx.xlsx``| ``weekMerge15.xlsx``|

### PLOTS
As described above, all the visualizations are found in folder ``Netatmo`` in subfolder ``Visualizations``. File names will start with the ``alias`` for the home and be followed by an indication of which output we have $\{temp, co2, AccCompas, HR\}$ and what room the measurements belong to: $\{livingroom, bedroom, entrance\}$.

| Output type | Generic naming standard | Example |
|----|----|----|
| <span style="color:darkred">**CO2 plot** </span>        | '``alias``-**co2**-``room``.png'       | ``He201-co2-**room**.png`` |
| <span style="color:darkred">**Temperature plot**</span> | '``alias``-**temp**-``room``.png'      | ``He9f-temp-Livingroom.png`` |
| <span style="color:darkred">**Fresh air** </span>        | '``alias``-``room``-**fa**.png' | ``He13 f Liv-fa.png`` |
| <span style="color:darkred">**Humidity rate** </span>   | '``alias``-**HR**-``room``.png'        | ``He41f-hr-livingroom.png`` |

----

## <span style="color:darkred">Explanation of how to run the code</span>

Depending on what data you need, you can run different parts of the code. This section will give a brief introduction on how to:
* access data from different weeks
* toggle between different weeks
* change path/data source
* weekly plots
  * generate humidity plots 
  * generate co2 plots
  * generate temperature graphs

*Some basic parts of the code will need to be executed regardless of what data you need. Unless this is done, you will not have any data to work on.*


#### External functions
The majority of functionality has been moved from the script into a separate file containing all function logic. This has been done in order to simplify and shorten this Notebook and in order for the user to read the code more easily.

All external functions can be found in the python file ``externalFunctions.py`` which is included in this repository.

----

**References **
* [Pandas reference](http://www.shanelynn.ie/select-pandas-dataframe-rows-and-columns-using-iloc-loc-and-ix/)


## <span style="color:darkred">Data Preprocessing</span>
* Importing needed Python libraries
* Adding global plotting color values
* Importing external functions

In [101]:
# Imports - getting relevant libraries
import re
import pandas as pd
from collections import Counter 
import numpy as np
from operator import itemgetter
from scipy import linalg
from os import *
from os.path import isfile, join
import time
from datetime import date, timedelta
import datetime
from pylab import *

import plotly
import plotly.plotly as py
from IPython.display import Image 
import plotly.graph_objs as go
# API access to plotting tools
plotly.tools.set_credentials_file(username='frksteenhoff2', api_key ='duu8hsfRmuI5rF2EU8o5')

# Setting the colors for the different plots, 
# such that all plots get the same colors
homeGreen = 'rgb(0, 204, 0)'         # bright green 
hedeBlue  = 'rgb(0, 0, 255)'         # blue
pieGreen  = homeGreen                # bright green
pieOrange = 'rgb(246, 214, 56)'      # orange
pieRed    = 'rgb(204, 0, 0)'         # red
ticksAxes = 'rgb(107, 107, 107)'     # axes color - grey
bgBorder  = 'rgba(255, 255, 255, 0)' # white

# Suppress warnings
pd.options.mode.chained_assignment = None  # default='warn'

# Reading in external functions
import externalFunctions as ex

### Extracting basic information to use when processing data

**Get currect week of year ($weekOfYear - 1$)**
* Create folder for weekly data
* Set <span style="color:red">**week variable**, ``weekNumber``, used throughout as data source identifier</span> 
* Setting names for different data source paths **based on week variable**

### For initial data preparation
<span style="color:red">Only use this for preparing data for one specific week</span>

In [469]:
# Get week of year
# Minus one to indicate work on last week's data
#weekNumber = date.today().isocalendar()[1]-1

# Adding new folder Week_'x' to directory 
#dir_name = "Week_"+str(weekNumber)
#if not path.exists(dir_name):
#    makedirs(dir_name)
#    print "A new folder named '%s' have been created for data from week %d." % (dir_name, weekNumber)
#else:
#    print "An existing folder named '%s' was used" % dir_name
    
# Folder structure for different data
weekNumber = 15
print 'Week number:', weekNumber

# Using Anne's folder structure from Dropbox
base_path = "C:/Users/frksteenhoff/Dropbox/Data eksempel til Henriette/"
# Data locations
netpath   = base_path + "Data week " +str(weekNumber)+ "/Netatmo"
weekpath  = base_path + "Data week " + str(weekNumber)
PIRpath   = base_path + "Data week " + str(weekNumber) + "/ProcessedData/PIRReed/"
COMpath   = base_path + "Data week " + str(weekNumber) + "/ProcessedData/CompAcc/"

# Change back known folder structure
testpath  = base_path + "Program - extractWork/"
viz_path  = base_path + "Data week " + str(weekNumber) + "/Netatmo/Visualization"

# For reference weeks
#netpath  = base_path + "Data Reference weeks/Netatmo"
#weekpath  = base_path + "Data Reference weeks"
#PIRpath   = base_path + "Data Reference weeks/ProcessedData/PIRReed/"
#COMpath   = base_path + "Data Reference weeks/ProcessedData/CompAcc/"

# Change back known folder structure
#testpath  = base_path + "Program - extractWork/"
#viz_path  = base_path + "Data Reference weeks/Netatmo/Visualization"

# Change current directory according to week
print getcwd()
chdir(weekpath)
print "\nCurrent directory:\n", weekpath

Week number: 15
C:\Users\frksteenhoff\Dropbox\Data eksempel til Henriette\Data Reference weeks\Netatmo

Current directory:
C:/Users/frksteenhoff/Dropbox/Data eksempel til Henriette/Data week 15


### Patterns
#### Taking care of different spellings and (lacking) naming conventions for the different rooms

*Can be optimised but is not a priority right now*

In [322]:
# LISTS OF DIFFERENT NAMES GIVEN FOR EACH ROOM.....
# Entrance
entre_pattern = '|'.join(['Entre PIR', 'EntrÃ© PIR', 'upstairs', 
                          'entrance/kitchen', 'hallway', 'PIR entrance',
                          'entrance','PIR gang', 'PIR stairs',
                          'Reed main enterance', 'reed main enterance',
                          'Reed main entrance', 'PIR Entre', 'Reed Main door',
                          'PIR trapper', 'Pir stairs', 'main entrance reed north',
                          'pir entrance', 'pir stairs', 'main enterance', 'Reed entrance', 'Main door Reed',
                          'entrance', 'entre', 'main entrance reed', 'Reed'])
# Living room
livingroom_pattern = '|'.join(['Stue PIR', 'PIR stue', 'PIR livingroom','PIR upstairs office/livingroom',
                               'PIR 1st floor living room', 'PIR uostairs living room', '1st', 'living',
                               'living room', 'livingroom', 'stue', 'Stue', 'K\xf8kken', 'Living', 'Kitchen',
                               'kitchen south', 'kitchen', 'kitchen South', 'livingroom/bedroom/kitchen', 'kken',
                              'roomkitchen','frste sal','rstesal'])
# Bedroom
bedroom_pattern = '|'.join(['Bedroom', 'bedroom', 'bed room', 'sove', 'Guest room','Guestroom','sovevrlse', 
                            'loorlivingbedroom','ogsovevrelse','frstesal', 'Sove', 'uptairs', 'Uptairs', "Downstairs"])

----

## <span style="color:darkred">Wireless work</span>

### Merge ``PIR`` and ``Reed``

In [470]:
# TABLE WITH ALL HOMES (for current week)
# Initialize dataframe
chdir(weekpath)
pir_reed_merge = pd.DataFrame()
# For special weeks (week 16)
#fileNames = ['PIR', 'Reed', 'PIR-2', 'Reed-2']
# For normal weeks
fileNames = ['PIR', 'Reed']

# Read in files for PIR and Reed and merge to one 
for name in fileNames:
    data = pd.read_excel(name+".xlsx", name)
    pir_reed_merge = pir_reed_merge.append(data)

#### Remove all rows where code values $\in \{tempUpdate, TimeOut, ReEstablishedLink, LostLink\}$
Agreed upon with Anne -- data is not useful for the ouput.

In [471]:
# Remove unwanted temperature measuresold_obs = len(pir_reed_merge)
old_obs = len(pir_reed_merge)
pir_reed_merge = pir_reed_merge.loc[pir_reed_merge['code'].isin(['movement','open','closed'])]

# Remove unnecessary features (duration, lastContact, threshold, batVoltage, ID, time)
pir_reed_merge = pir_reed_merge.drop(['duration','lastContact','threshold','battVoltage','ID','time_', 'rh', 'temp'], axis=1)

new_obs = len(pir_reed_merge)  

# Print number of rows in all and relvant rows
print "Number of observations (temperature included):", old_obs
print "Number of observations (temperature excluded):", new_obs

Number of observations (temperature included): 14132
Number of observations (temperature excluded): 13865


### Saving Wireless data to files

The different naming conventions indicating the rooms in the homes are handled by the different patterns in the preprocessing section.

Be advised! Due to lacking consistency in naming convention **some devices may not be included here**. Please make sure that running the code in section *Merging files containing same type of information* include the same room names as the ones  commented out below. -- otherwise contact Henriette.

#### Create and save individual files for each home
Containing merged PIR/Reed samples sorted by timestamp. All files saved more than once are due to multiple monitored rooms in the house.

In [472]:
# Clock time spent on execution
start_time = time.time()

# If directory path does not exist - create it
if not path.exists(PIRpath):
    makedirs(PIRpath)
chdir(PIRpath)
    
# Create one file with all PIR/Reed values per home
for alias in pir_reed_merge['bolig'].unique():
    # Get data for specific home
    current_home = (pir_reed_merge.loc[pir_reed_merge['bolig'].isin([alias])]).sort_values(by=['submitDate','submitTime'])
    
    # Save merge file 
    writer = pd.ExcelWriter(alias.lower() + ' PIRReed.xlsx', engine='xlsxwriter')
    print "Current home", alias
    current_home.to_excel(writer)
    writer.save() 

print("\n--- Execution time: %s seconds ---" % (time.time() - start_time))

Current home He87
Current home Aalbrovej21
Current home He145
Current home He183
Current home he109
Current home he141
Current home He211
Current home he9
Current home He93
Current home He27
Current home Ho102
Current home He213
Current home he187
Current home he213
Current home He171
Current home he117
Current home He104
Current home He111
Current home Klakkebjerg-Tasnim
Current home He197
Current home ho72
Current home He41
Current home He61
Current home he61
Current home He69
Current home He13
Current home He107
Current home He175
Current home he143
Current home Ho70
Current home Ho46
Current home He59
Current home He221
Current home He115
Current home Ho92
Current home He99
Current home Ho48
Current home ho48
Current home he103
Current home ho24
Current home He131
Current home Ho20
Current home He25
Current home he57
Current home He35
Current home He109
Current home he87
Current home Aalbrovej21He59
Current home He103
Current home He57

--- Execution time: 5.80900001526 seconds ---

### Find files PIR/Reed 
Use for calculating time home/CO2 level

In [473]:
pirReed_files = []
# Change directory to folder of processed PIR/Reed files
chdir(PIRpath)

# Find all processed PIR/Reed files
pirReed_files += [file for file in listdir('.') if not file.startswith('netatmo') and not file.endswith('.png')]

#print "Files: ", pirReed_files
print "\nFiles in all: ", len(pirReed_files)



Files in all:  51


### Save Acc/Compas information to file

In [452]:
# Clock time spent on execution
start_time = time.time()

# Initialize dataframe
acc_compas_merge = pd.DataFrame()
fileNames        = ['Compas', 'Acc']

# If directory path does not exist - create it
if not path.exists(COMpath):
    makedirs(COMpath)
chdir(weekpath)

# Read in files for Compas / Acc - merge to one 
for name in fileNames:
    # Only read in needed columns
    data = pd.read_excel(name+".xlsx", name)# usecols=['tagName','tagID','code','bolig','temp','rh','submitDate','submitTime'])
    acc_compas_merge = acc_compas_merge.append(data)

# Count initial number of observations
#old_obs = len(acc_compas_merge)
#print old_obs

# Keep only code value 'moved'
acc_compas = acc_compas_merge.loc[acc_compas_merge['code'].isin(['moved'])]
# Removed unused columns (for a smaller file size to save)
acc_compas = acc_compas.drop(['temp','lastContact','threshold','battVoltage','ID','rh'], axis=1)

print "Number of observations (temperature included):", old_obs
print "Number of observations (temperature excluded):", len(acc_compas), "\n"

# Sort values from each room into separate files
chdir(COMpath)
for alias in list(set(acc_compas['bolig'])):
    # Get data for specific home
    current_home = (acc_compas.loc[acc_compas['bolig'].isin([alias])]).sort_values(by=['submitDate','submitTime'])
    # save bedroom info to file     #wayprettiercode (dir_, alias, string, df, dfcol, pattern)
    ex.saveDataToFile(alias.lower(), "-AccCompas_bedroom.xlsx", current_home, 'tagName', bedroom_pattern)
    # Save kitchen info to file
    ex.saveDataToFile(alias.lower(), "-AccCompas_livKitchen.xlsx", current_home, 'tagName', livingroom_pattern)

print("\n--- Execution time: %s seconds ---" % (time.time() - start_time))

Number of observations (temperature included): 100408
Number of observations (temperature excluded): 25253 

Saving values for 'ho46', in all: 21
Saving values for 'ho46', in all: 60
Saving values for 'he145', in all: 70
Saving values for 'he145', in all: 177
Saving values for 'he41', in all: 381
Saving values for 'he41', in all: 487
Saving values for 'he183', in all: 8
Saving values for 'he183', in all: 62
Saving values for 'he213', in all: 26
Saving values for 'he213', in all: 1129
Saving values for 'ho66', in all: 31
Saving values for 'ho66', in all: 42
Saving values for 'he213', in all: 27
Saving values for 'he213', in all: 238
Saving values for 'he211', in all: 167
Saving values for 'ho48', in all: 315
Saving values for 'he103', in all: 570
Saving values for 'he107', in all: 42
Saving values for 'he107', in all: 165
Saving values for 'ho24', in all: 4
Saving values for 'ho24', in all: 199
Saving values for 'he113', in all: 33
Saving values for 'he113', in all: 126
Saving values fo

----
## <span style="color:darkred">Netatmo arbejde</span> 

### Extracting hourly data from ``have_fil``
The ``have_fil`` file contains out door temperatures for the same week as the Wireless data. It is used to calculate the relative humidity indoor.

Since the Wireless and Netatmo data cannot be mapped $1:1$, the out door temperature is calculated on an hourly basis for each hour of each day.

#### Handling missing data - ``have_fil``
Missing both hours of a single day and entire days is one of the realities of the data.

The following algortihm was made to work around the problem.
* **To work around missing days** -- all days will be generated from start date and 7 days ahead, assuming that the first days is always in the data.
* **If missing hour** - the median of the hours for the remaining days is inserted. 


In [328]:
# Average temp per hour of day for week 
def haveCalculation(dataFrame):
    hour_cnt = {}
    daysInWeek = []
    # Extract all days in week from 1st day and seven days forward
    # due to missing data this has to be stated explicitly
    for days in pd.date_range(dataFrame['Timezone : Europe/Copenhagen'].unique().min(), periods=7):
        daysInWeek.append(days.day)
        
    # Extract temp for each hour of each day
    for day in daysInWeek:
        for hour in range(0,24):
            # Find relevant hour of day
            dayCombo   = str(day)+"-"+str(hour)
            hourlyTemp = dataFrame.loc[dataFrame['con'].isin([dayCombo])]
            
            # Summing the temperatures
            # If no entries for given hour - 
            # impute with median for remaining days at same hour
            if len(hourlyTemp) == 0:
                dailyTemp = dataFrame.loc[dataFrame['Hour'].isin([hour])]
                hour_cnt[dayCombo] = dailyTemp['Temperature'].median()
            else:
                # Else add average of hour for specific day
                hour_cnt[dayCombo] = hourlyTemp['Temperature'].sum(axis=0)/(len(hourlyTemp))
    return hour_cnt


### Create list of all Netatmo files
All room data and have fil (outdoor temperature) separately

In [474]:
# Find all files in folder
# Keep all files with extension .xls (the Netatmo files)
netatmo_files = []

chdir(netpath)
netatmo_files += [file for file in listdir('.') if not file.startswith('have') and file.endswith('.xls')]
# Print list of files
print "Data path", getcwd(), "\n"
#print "Files:\n", netatmo_files
print "\nNumber of files: %d" % len(netatmo_files)

# Get have fil - REMOVE LAST COLUMN BEFORE RUNNING
have_ = [file for file in listdir('.') if file.startswith('have')]
# Read netatmo outdoor temp
have_fil = pd.read_excel(have_[0],sheetname='Worksheet')
print "\nOut door temperatures from", have_[0], "read"

# Clean-up: remove header and first 2 rows
have_fil.columns = have_fil.iloc[1,:]
have_fil = have_fil.drop(have_fil.index[[0,1]])

# Add hour label 0-23, vector t_u
have_fil['Hour'] = pd.to_datetime(have_fil['Timezone : Europe/Copenhagen']).dt.hour
have_fil['Day']  = pd.to_datetime(have_fil['Timezone : Europe/Copenhagen']).dt.day
# Make individual date/hour IDs
have_fil["con"] = have_fil.Day.astype(str).str.cat(have_fil.Hour.astype(str), sep="-")

# Converting temperature from Celcius to Kelvin
hour_cnt_have    = haveCalculation(have_fil)



Data path C:\Users\frksteenhoff\Dropbox\Data eksempel til Henriette\Data week 15\Netatmo 


Number of files: 88

Out door temperatures from have_17_4_2017.xls read


### Saving netatmo files for CO2-work
Adding alias in file name in order to fetch file for CO2-work

Time and date formats saved correct after this step

In [454]:
# Read in netatmo data from the netatmo folder, 
# saves data in the PIR/Reed folder for easy localization 
# Use for CO2 calculations
for room_file in netatmo_files:
    ## If directory path does not exist - create it
    #if not path.exists(PIRpath):
    #    makedirs(PIRpath)
    chdir(netpath)
    # Read in file

    netatmo_data = pd.read_excel(room_file) 
    # Get name for output file
    location_name = netatmo_data.iloc[0,0]
    # Get room for output file
    room_name = netatmo_data.iloc[0,3]
    print "Reading file", location_name, room_name

        # Change range of data and give new column names
    netatmo_data.columns = netatmo_data.iloc[1,:]
    netatmo_data = netatmo_data.drop(netatmo_data.index[[0,1]])
    
    # Make Datetime value from Timestamp
    netatmo_data.iloc[:,1] = pd.to_datetime(netatmo_data.iloc[:,1])
    # Extract and add hour and day indicators
    netatmo_data['Hour'] = netatmo_data.iloc[:,1].dt.hour
    netatmo_data['Time'] = netatmo_data.iloc[:,1].dt.time
    netatmo_data['Date'] = netatmo_data.iloc[:,1].dt.date
    netatmo_data['Day']  = netatmo_data.iloc[:,1].dt.day
    # Make individual date/hour IDs
    netatmo_data['con']  = netatmo_data.Day.astype(str).str.cat(netatmo_data.Hour.astype(str), sep="-")

    # Converting temperatures from Celcius to Kelvin
    netatmo_data['Kelvin'] = netatmo_data['Temperature'] + 273.15
    
    # Save netatmo data for use with co2
    name = "netatmo_" + location_name + " " + room_name.encode("ascii", "ignore").replace("/", "")

    # Do not include timestamp and timezone 
    ex.saveDataframeToPath(netatmo_data, name, PIRpath)

print "\nAll files processed!"

Reading file He139 f 1st floor living room
Reading file Ho46 1st floor living/bedroom
Reading file He211 f Bedroom downstairs
Reading file Ho104 Bedroom
Reading file Ho92 Bedroom
Reading file He147 Bedroom
Reading file Ho102 Bedroom
Reading file He175 Bedroom
Reading file HE201 Bedroom
Reading file He145 Bedroom
Reading file He25 Bedroom
Reading file He131 Bedroom
Reading file He57 Bedroom
Reading file HO70 Bedroom
Reading file He35 Bedroom
Reading file He109 Bedroom
Reading file He103 Bedroom
Reading file He213 bedroom
Reading file He197 f bedroom
Reading file He131 Bedroom
Reading file He87 f Bedroom
Reading file He57 Bedroom
Reading file He117 f Bedroom
Reading file He143 f Bedroom
Reading file HO70 Bedroom
Reading file Ho66 f Bedroom
Reading file He35 Bedroom
Reading file He139 f Bedroom
Reading file He9 f Bedroom
Reading file He107 f bedroom
Reading file Ho24 f Bedroom
Reading file He109 Bedroom
Reading file He99 f Bedroom
Reading file He103 Bedroom
Reading file Ho92 Bedroom
Readi

In [455]:
# Get only alias!
str(location_name + room_name).split('f')[0].strip(" ")

'HE141'

### Read Netatmo files
#### Extract overall temperature for all homes - per room file
Used in the plots of the temperature during the week -- an average added to all locations' plot of indoor temperatures.

In [475]:
# make sure we are in the right folder
chdir(netpath)

# files for room data
all_time_liv = pd.DataFrame()
all_time_bed = pd.DataFrame()

livingroom_pattern = (re.compile(livingroom_pattern))
bed_pattern = (re.compile(bedroom_pattern))

# Merge all files for given room (living room (+ kitchen/upstairs), bedroom)
for room_file in netatmo_files:
    if livingroom_pattern.match(room_file):
        # read file
        liv_data = pd.read_excel(room_file) 
        # Change range of data and give new column names
        liv_data.columns = liv_data.iloc[1,:]
        liv_data = liv_data.drop(liv_data.index[[0,1]])
        
        # Extract only needed columns CO2, temperature and date/time values
        temp = liv_data[liv_data.columns[:5]]
        temp.columns = ['Timestamp', 'Timezone', 'Temperature', 'Humidity', 'CO2']
        
        # Add content to all time data file
        all_time_liv = all_time_liv.append([temp])
        
    elif bed_pattern.match(room_file):
        # read file
        bed_data = pd.read_excel(room_file) 
        # Change range of data and give new column names
        bed_data.columns = bed_data.iloc[1,:]
        bed_data = bed_data.drop(bed_data.index[[0,1]])

        # Extract only needed columns CO2, temperature and date/time values
        temp = liv_data[liv_data.columns[:5]]
        temp.columns = ['Timestamp', 'Timezone', 'Temperature', 'Humidity', 'CO2']
        
        # Add content to all time data file
        all_time_bed = all_time_bed.append([temp])
        
    else: 
        # It is okay that "have"-file is missing!
         print "Error! File: %s missing room classification!" % room_file

            
# Save all time temperatures
# Livingroom/kitchen temp
#ex.saveDataframeToPath(all_time_liv, 'LivingroomKitchen_tempAll_'+str(weekNumber), base_path)
#ex.saveDataframeToPath(all_time_bed, 'Bedroom_tempAll_'+str(weekNumber), base_path)
print "All time average temperatures calculated!"

All time average temperatures calculated!


### Calculate temperature per hour

In [476]:
# For households in list of households, plot temperature agains mean temperature
uniq_hour_liv = {}
uniq_hour_bed = {}
uniq_hour_liv = ex.createMeanTempForRoom(all_time_liv)
uniq_hour_bed = ex.createMeanTempForRoom(all_time_bed)
print "All mean temperatures per hour created"

All mean temperatures per hour created


### Find min and max temperature values 
For plotting all temperatures within equal ranges

In [477]:
# Calculate all time high and low from all temperatures
all_time_high = 0
all_time_low  = 0

# Find all time max value
if (max(uniq_hour_bed['Temperature']) > max(uniq_hour_liv['Temperature'])):
    all_time_high = max(uniq_hour_bed['Temperature'])+6
else:
    all_time_high = max(uniq_hour_liv['Temperature'])+6
               
# Find all time min value
if (min(uniq_hour_bed['Temperature']) < min(uniq_hour_liv['Temperature'])):
    all_time_low = min(uniq_hour_bed['Temperature'])-6
else:
    all_time_low = min(uniq_hour_liv['Temperature'])-6
                     
print "Min temp:", round(all_time_low, 2)+6
print "Max temp:", round(all_time_high, 2)-6

Min temp: 18.7
Max temp: 24.4


#### Check that all weekly temperatures are calculated

In [478]:
# Check that there are in fact 168 values (24x7)
168 == len(uniq_hour_bed)


True

#### <span style="color:darkred">Equations used for calculating boundaries</span>
The equations are assumed correct and will not be further discussed only presented as these are some of the basis for later work.
\begin{equation}rh_{red}   = 0.6*\frac{p_{mv}}{p_{mi}}\end{equation}
\begin{equation}rh_{gul}   = 0.75*\frac{p_{mv}}{p_{mi}}\end{equation}

\begin{equation}p_{mv} = \frac{e^{77.3450+0.0057*t_v}-\frac{7235}{t_v}}{t_{v}^{8.2}}\end{equation}
\begin{equation}p_{mi} = \frac{e^{77.3450+0.0057*T_i}-\frac{7235}{t_v}}{T_{i}^{8.2}}\end{equation}

$t_v = \frac{1}{3}t_{u} + \frac{2}{3}t_i$

where $t_i$ is the temperature from Netatmo<br></br>
and $t_u$ is the outdoor temperature taken from a weather station also provided by Netatmo.

These calculations are made using all data provided for each household for the current week. All temperatures are measured in Kelvin.

### Create figures (pie chart + temperatures) for all netatmo data

In [337]:
from fractions import Fraction
def createHumidityPlot(dataFrame, hour_cnt_netatmo, hour_cnt_have, room_n, location_n, col1, col2, col3, livpat, bedpat):
    # initialize arrays for limit values rh_gul and rh_roed
    # If directory path does not exist - create it
    livingroom_pattern = re.compile(livpat)
    bedroom_pattern = re.compile(bedpat)
    if not path.exists(viz_path):
        makedirs(viz_path)
    chdir(viz_path)
    
    room     = ""
    rh_gul   = []
    rh_roed  = []
    pmv_list = []
    pmi_list = []
    hr_data  = []
    rh_boundaries = pd.DataFrame()
    
    # Calculations for humidity equation
    # Constants
    for i in range(0,24):
        t_i   = np.asarray(hour_cnt_netatmo.values()) + 273.15  # converted to Kelvin
        t_ude = np.asarray(hour_cnt_have.values())    + 273.15  # converted to Kelvin
        t_v   = np.add(Fraction(1,3)*t_ude[i], Fraction(2,3)*t_i[i])

        # Limit equations
        p_mv  = math.exp(77.3450 + 0.0057*t_v     - 7235.0 / t_v)   / (t_v**8.2)
        p_mi  = math.exp(77.3450 + 0.0057*t_i[i] - 7235.0 / t_i[i]) / (t_i[i]**8.2)
        
        # Equation for upper and lower bound
        rh_gul.append(0.6 * p_mv / p_mi)
        rh_roed.append(0.75 * p_mv / p_mi)
        pmi_list.append(p_mi)
        pmv_list.append(p_mv)

    print "Boundaries calculated"
    # Humidity
    # Only for rooms with humidity measure
    hr_data = dataFrame[['Humidity', 'Kelvin', 'Hour', 'Temperature','Date','Time']] # Using Kelvin temperatures
    hr_data['rh_gul'] = np.zeros(len(hr_data))
    hr_data['rh_roed'] = np.zeros(len(hr_data))
    # Initialize rh-dict for value groups
    rh_dict = dict.fromkeys(['middleValue', 'lowValue', 'highValue'], 0)
    
    # Check netatmo data against humidity boundaries for each hour of day
    for i in range(0, len(hr_data['Humidity'])):
        # Only for testing - see values of red yellow and humidity
        #print hr_data.iloc[i,0], rh_gul[hr_data.iloc[i,2]] * 100, rh_roed[hr_data.iloc[i,2]] * 100
        # if humidity value are between rh_gul and rh_roed at given hour
        if hr_data.iloc[i,0] > rh_gul[hr_data.iloc[i,2]] * 100 and hr_data.iloc[i,0] < rh_roed[hr_data.iloc[i,2]] * 100:
            rh_dict['middleValue'] += 1
        # If humidity is less than rh_gul
        elif hr_data.iloc[i,0] < rh_gul[hr_data.iloc[i,2]] * 100:
            rh_dict['lowValue'] += 1
        # If humidity is greater than rh_roed 
        elif hr_data.iloc[i,0] > rh_roed[hr_data.iloc[i,2]] * 100:
            rh_dict['highValue'] += 1
        else:
            print 'Something fails'
        # For each value, set the  limits...
        hr_data.iloc[i,6] = rh_gul[hr_data.iloc[i,2]]
        hr_data.iloc[i,7] = rh_roed[hr_data.iloc[i,2]]
    
    room     = room_n.encode("ascii", "ignore").replace("/", "")
    print room
    room_id = ""
    if livingroom_pattern.match(room):
        room_id = "livingroom"
    elif bedroom_pattern.match(room):
        room_id = "bedroom"
    else:
        room_id = "other"

    #Save HR data for room to file - bedroom/livingroom as extension to match room in house
    ex.saveDataframeToPath(hr_data[['Date', 'Time', 'Humidity', 'Temperature', 'rh_gul', 'rh_roed']], location_n + '-RH-' + room_id, netpath+"/HR")
    # Save boundaries and temperatures to file
    #rh_boundaries['t_i']     = t_i  
    #rh_boundaries['t_ude']   = t_ude
    #rh_boundaries['rh_gul']  = rh_gul
    #rh_boundaries['rh_roed'] = rh_roed
    #rh_boundaries['p_mi']    = pmi_list
    #rh_boundaries['p_mv']    = pmv_list
    #ex.saveDataframeToPath(rh_boundaries, location_n + '-RH-boundaries', netpath+"/HR")
    #print "Humidity file saved for " + room_id
    #chdir(viz_path)

    # Save plot to proper location
    #plotType = '-RH-'           # type: relative humidity
    #room     = room_n.encode("ascii", "ignore").replace("/", "")
    #filen    = location_n + plotType + room + ".png"
    #filen    = filen.replace(" ", "")

    # Plot over fresh air 
#    fig = {
#        'data': [{'labels': ['Under anbefaling', 'Indenfor anbefaling', 'Over anbefaling'],
#                  'values': [rh_dict['lowValue'],
#                             rh_dict['middleValue'],
#                             rh_dict['highValue']],
#                  'type': 'pie', 
#                  'marker': {'colors': [col1,
#                                        col2,
#                                        col3]},
#                  'textinfo': 'none'}],
#        'layout': { 'autosize': False,
#                    'width': 350,
#                    'height': 350,
#                    "paper_bgcolor": "rgba(0, 0, 0, 0)",
#                    "plot_bgcolor": "rgba(0, 0, 0, 0)",
#                    'showlegend': False}
#         }
#    print rh_dict['lowValue'], rh_dict['middleValue'], rh_dict['highValue'] ,  len(hr_data)
#
#    # Save to folder
#    py.image.save_as(fig, filename=filen)
#    print "Plot created " + filen
    # Plot result
    #Image(fullPathToPlot) # Display a static image
    #py.iplot(fig)

In [254]:
# Plot temperatures for each room against overall temp
# Encoding documentation in Python, https://docs.python.org/2/howto/unicode.html
def createTempPlot(dataFrame, tempLiv, tempBed, location_n, room_n, pattern, tAxes, bgBorder, rangeMin, rangeMax):
    #Change back to correct working directory
    # If directory path does not exist - create it
    if not path.exists(viz_path):
        makedirs(viz_path)
    chdir(viz_path)
    
    # Save plot to proper location
    plotType = '-temp' # type: temperature
        
    # Plotting weekly temperature overview - single home
    trace1 = go.Scatter(
              x = pd.to_datetime(dataFrame.iloc[:,1]),
              y = list(dataFrame.Temperature),
        name = 'Din bolig', # Style name/legend entry with html tags
        connectgaps=False
    )

    # Choose whether to use living room/kitchen temp or bedroom temp - overall
    print room_n
    new_pat = re.compile(pattern)
    if new_pat.match(room_n):
        trace2 = go.Scatter(
              x = pd.to_datetime(tempLiv.Timezone),
              y = list(tempLiv.newTemp),
        name = 'Hedelyngen',
        connectgaps = False
        )
    else:
        trace2 = go.Scatter(
              x = pd.to_datetime(tempBed.Timezone),
              y = list(tempBed.newTemp),
        name = 'Hedelyngen',
        connectgaps = False
    )
        
    data = [trace1, trace2]

    # Setting layout details for plot
    layout = go.Layout(
        autosize = False,
        width = 600,
        height = 350,
        paper_bgcolor = 'rgba(0, 0, 0, 0)',
        plot_bgcolor = "rgba(0, 0, 0, 0)",
        showlegend = False,
        xaxis=dict(
            tickfont=dict(
                size=14,
                color=tAxes
            )
        ),

        yaxis=dict(
            range=[rangeMin,rangeMax],
            zeroline=True,
            titlefont=dict(
                size=16,
                color=tAxes
            ),
            tickfont=dict(
                size=16,
                color=tAxes
            )
        ),

        legend=dict(
            x=0,
            y=1.0,
            bgcolor=bgBorder,
            bordercolor=bgBorder
        )
    )

    # Give name for plot to be saved
    room = room_n.encode("ascii", "ignore").replace("/", "")
    filen = location_n + plotType + "-" + room + ".png"
    filen = filen.replace(" ", "")

    # Create and save figure
    fig = go.Figure(data=data, layout=layout)
    py.image.save_as(fig, filename=filen)
    #Image(fullPathToPlot) # Display a static image


In [489]:
# Clock time spent on execution
start_time = time.time()

chdir(PIRpath)
netatmo_process = []
netatmo_process += [file for file in listdir('.') if file.startswith('netatmo_')]

# For each netatmo file, except the "have" file, create humidity and temperature plot
#chdir(test)
for room_file in netatmo_process:
    # Read in file
    netatmo_data = pd.read_excel(room_file) 
    
    # Get name for output file
    fst = room_file.split("_")[1]
    location_name = fst.split(" ")[0]
    # Get room for output file
    rm = room_file.split(" ")
    
    # Getting the right part of name as room name
    if len(rm) > 2:
        room_name = rm[len(rm)-2] + rm[len(rm)-1][:-5]
    else:
        room_name = rm[len(rm)-1][:-5]
    # Remove leading "f"
    if room_name.startswith("f"):
        room_name = room_name[1:]
    print "Reading file", location_name, room_name
    
    # Ok using a subset of values for day and hour
    hour_cnt = ex.calculateHourlyTemp(netatmo_data, 'con', 'Temperature')
    
    # Generate plots
    # createTempPlot(dataFrame, tempLiv, tempBed, dir_, location_n, room_n, tAxes, bgBorder):
    #createTempPlot(netatmo_data, uniq_hour_liv, uniq_hour_bed, location_name, room_name, livingroom_pattern, ticksAxes, bgBorder, all_time_low, all_time_high)
    # createHumidityPlot(dataFrame, room_n, location_n):
    createHumidityPlot(netatmo_data, hour_cnt, hour_cnt_have, room_name, location_name, pieGreen, pieOrange, pieRed, livingroom_pattern, bedroom_pattern)
    chdir(PIRpath)
print("\n--- Execution time: %s seconds ---" % (time.time() - start_time))

Reading file He147 Bedroom
Boundaries calculated
Bedroom
Reading file He147 Kitchenlivingroom
Boundaries calculated
Kitchenlivingroom

--- Execution time: 4.80900001526 seconds ---


----
## Anne - further work
* Calculate number of times door has been opened per room ``while`` movement indoor
* Movement per week day
* Heat consumption
* Accesing the Netatmo API for data instead of manually downloading the needed files 

### <span style="color:darkred">Accessing Netatmo from their API</span>
* [API description on GitHub](https://github.com/philippelt/netatmo-api-python)

In [116]:
import lnetatmo