Explore and analyse your Home Assistant data
Introduction

The HASS-data-detective package (also referred to as 'detective' or 'data-detective') provides classes and functions to help you explore and analyse the data in your Home Assistant database. If you are using Hassio, it will automatically discover your sqlite database and, by default, collect information about the entities in it. The recommended workflow is then to load the database content into a Pandas dataframe using the fetch_all_data method: all of the work of formatting the data for analysis is done up front, although it can take a couple of minutes. However, if your database is too large to load into a Pandas dataframe within memory limits, you will have to adopt a different workflow where you query and process only the data you are interested in. The Usage of detective.ipynb notebook shows examples of both workflows.

Note that not all Python packages can be installed on Hassio yet; scipy is in this category, so packages which depend on scipy, notably Seaborn, are also unavailable there.

Try out detective online

You can try out detective online without installing anything. If you click on the 'launch binder' button above, detective will be started in a Docker container online using the BinderHub service. Run the Usage of detective notebook to explore detective; you can also use the Upload button to upload your own home-assistant_v2.db database file for analysis. Note that all data is deleted when the container shuts down, so this service is only for trying out detective.

Installation on your machine

You can either pip install HASS-data-detective for the latest released version from PyPI, or pip install git+https://github.com/robmarkcole/HASS-data-detective.git --upgrade for the bleeding-edge version from GitHub. Alternatively, if you wish to contribute to the development of detective, clone this repository and install it in editable mode with pip install -e .

Initialise HassDatabase

Detective first needs to know the location of your database in order to initialise the HassDatabase object which handles communication with your database. If you are using the default sqlite database and have the home-assistant_v2.db file locally just supply the path:

from detective.core import HassDatabase

db = HassDatabase('sqlite:////' + 'path_to/home-assistant_v2.db')
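As an aside, SQLAlchemy sqlite URLs are sqlite:/// followed by the database path, so an absolute path ends up with four slashes after the scheme. A minimal sketch (the path below is purely illustrative):

```python
# Build a SQLAlchemy sqlite URL from a database path.
# The path here is illustrative only; substitute your own.
db_path = "/home/homeassistant/.homeassistant/home-assistant_v2.db"
db_url = "sqlite:///" + db_path  # absolute path -> four slashes after 'sqlite:'

print(db_url)  # sqlite:////home/homeassistant/.homeassistant/home-assistant_v2.db
```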

If you are running a database server for Home Assistant (e.g. mysql) you need to initialise the HassDatabase directly with the correct connection string, for example:

db = HassDatabase("mysql://scott:tiger@localhost/test")

Alternatively, if you are using detective on Hassio, there are two possible ways to initialise the HassDatabase. The easiest is with db_from_hass_config. This will initialise a HassDatabase based on the information found in your Home Assistant config folder, which it will automatically discover:

from detective.core import db_from_hass_config

db = db_from_hass_config() # Auto detect

Alternatively it's possible to pass the path in:

db = db_from_hass_config("/home/homeassistant/config") # Pass in path to config

Initialisation of HassDatabase accepts keyword arguments to influence how the object is initialised:

Argument Description
fetch_entities Boolean indicating whether to fetch the entities when constructing the database. If False, you will have to call db.fetch_entities() at a later stage before being able to use the entities and domains attributes.

By default with fetch_entities=True, on initialisation HassDatabase will query the database and list the available domains and their entities in its domains and entities attributes:

db.domains
['persistent_notification',
 'remote',
 'script',
 'camera',
 'group',
 'light',
 'zone',
 'alarm_control_panel',
 'switch',
 'automation',
 'media_player',
 'device_tracker',
 'binary_sensor',
 'sensor',
 'input_select',
 'updater',
 'sun']

The entities attribute is a dictionary keyed by domain name:

db.entities['binary_sensor']
['binary_sensor.motion_at_home',
 'binary_sensor.living_room_motion_sensor',
 'binary_sensor.in_bed_bayesian',
 'binary_sensor.hall_motion_sensor',
 'binary_sensor.bedroom_motion_sensor',
 'binary_sensor.blink_armed_status',
 'binary_sensor.blink_blink_camera_percy_motion_enabled',
 'binary_sensor.workday_sensor',
 'binary_sensor.living_room_nest_protect_online',
 'binary_sensor.bayesianbinary']

Simple query

Note that at this point we still haven't downloaded any actual data. Let's query a single sensor using SQL and demonstrate the data formatting steps performed by detective, in order to convert raw data into a format suitable for plotting and analysis:

from sqlalchemy import text
import pandas as pd

query = text(
    """
    SELECT state, last_changed
    FROM states
    WHERE entity_id in ('sensor.hall_light_sensor')
    AND NOT state='unknown'
    """
)
response = db.perform_query(query)

df = pd.DataFrame(response.fetchall()) # Convert the response to a dataframe

df.columns = ['state', 'last_changed'] # Set the columns

df = df.set_index('last_changed') # Set the index on datetime

df.index = pd.to_datetime(df.index) # Convert string to datetime

df = df.mask(df.eq('None')).dropna().astype(float) #  Convert state strings to floats for plotting
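The formatting steps above can be wrapped into a small reusable helper. This is just a sketch, assuming rows of (state, last_changed) tuples as returned by response.fetchall(); format_sensor_rows is not part of detective:

```python
import pandas as pd

def format_sensor_rows(rows):
    """Turn raw (state, last_changed) rows into a float-valued,
    datetime-indexed dataframe ready for plotting."""
    df = pd.DataFrame(rows, columns=['state', 'last_changed'])
    df = df.set_index('last_changed')
    df.index = pd.to_datetime(df.index)
    # Drop non-numeric states such as 'None' before casting to float
    df = df.mask(df.eq('None')).dropna().astype(float)
    return df
```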

We can then plot the data:

df.plot(figsize=(16, 6));

(plot: sensor.hall_light_sensor state over time)

Fetch all raw data

Use fetch_all_data to cache all your raw database data into a Pandas dataframe in memory. It is useful to keep this raw data in case you mess up your processed data and don't want to go through the process of fetching the raw data all over again.

%%time
db.fetch_all_data()
Querying the database, this could take a while
master_df created successfully.
CPU times: user 11.7 s, sys: 12.8 s, total: 24.4 s
Wall time: 1min 1s

We now have the raw data in a Pandas dataframe on the master_df attribute. We must use another class to process this data into a format suitable for plotting and processing. There are separate classes for numerical and binary sensors, which allows them to both implement a plot method correctly.
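Since the fetch takes a while, it is worth keeping master_df pristine and working on a copy, so a processing mistake doesn't force a slow re-fetch. A sketch with a stand-in dataframe (raw_df stands in for db.master_df):

```python
import pandas as pd

raw_df = pd.DataFrame({'state': ['on', 'off']})  # stand-in for db.master_df

working_df = raw_df.copy()       # process this copy freely
working_df['state'] = 'oops'     # a processing mistake...
working_df = raw_df.copy()       # ...recovered instantly, no re-query needed
```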

NumericalSensors class

The NumericalSensors class is for formatting numerical data. Create a dataframe with formatted numerical data like so:

from detective.core import NumericalSensors

sensors_num_df = NumericalSensors(db.master_df)

We can access the list of sensor entities:

sensors_num_df.entities[0:10]
['sensor.next_train_to_wat',
 'sensor.next_bus_to_new_malden',
 'sensor.darksky_sensor_temperature',
 'sensor.darksky_sensor_precip_probability',
 'sensor.iphone_battery_level',
 'sensor.robins_iphone_battery_level',
 'sensor.blink_blink_camera_percy_temperature',
 'sensor.blink_blink_camera_percy_notifications',
 'sensor.next_train_in',
 'sensor.home_to_waterloo']

Now let's look at the Pandas dataframe which is on the data attribute:

sensors_num_df.data.head()
entity sensor.average_indoor_temp sensor.bedroom_light_sensor sensor.bedroom_temperature sensor.blink_blink_camera_percy_notifications sensor.blink_blink_camera_percy_temperature sensor.bme680air_qual sensor.bme680humidity sensor.bme680pressure sensor.bme680temperature sensor.breaches_fredallcardgmailcom ... sensor.next_train_to_wat sensor.next_train_to_wim sensor.remote_living_room_button sensor.robins_iphone_battery_level sensor.speedtest_download sensor.volume_used_volume_1 sensor.wipy_humidity sensor.wipy_memory sensor.wipy_temperature sensor.work_to_home
last_changed
2017-10-28 06:48:00.143377 20.2 15621.0 18.89 1.0 21.0 98.51 43.58 1033.93 21.07 0.0 ... 1125.0 87.0 1002.0 94.0 36.37 20.7 14.0 38112.0 32.0 25.0
2017-10-28 06:48:01.060922 20.2 15621.0 18.89 1.0 21.0 98.51 43.50 1033.93 21.07 0.0 ... 1125.0 87.0 1002.0 94.0 36.37 20.7 14.0 38112.0 32.0 25.0
2017-10-28 06:48:01.069416 20.2 15621.0 18.89 1.0 21.0 98.51 43.50 1033.93 21.06 0.0 ... 1125.0 87.0 1002.0 94.0 36.37 20.7 14.0 38112.0 32.0 25.0
2017-10-28 06:48:01.076784 20.2 15621.0 18.89 1.0 21.0 98.51 43.50 1033.95 21.06 0.0 ... 1125.0 87.0 1002.0 94.0 36.37 20.7 14.0 38112.0 32.0 25.0
2017-10-28 06:48:01.079950 20.2 15621.0 18.89 1.0 21.0 98.54 43.50 1033.95 21.06 0.0 ... 1125.0 87.0 1002.0 94.0 36.37 20.7 14.0 38112.0 32.0 25.0

5 rows × 52 columns

Let's check for correlations in the data:

corrs = sensors_num_df.correlations()

corrs[(corrs['value'] > 0.8) | (corrs['value'] < -0.8)]
value
sensor.next_train_in-sensor.next_train_to_wim 0.999961
sensor.iphone_battery_level-sensor.robins_iphone_battery_level 0.923446
sensor.bme680air_qual-sensor.bme680pressure 0.862630
sensor.mean_temperature-sensor.bedroom_temperature 0.814340
sensor.living_room_temperature-sensor.bme680temperature 0.801827
sensor.bme680pressure-sensor.darksky_sensor_temperature -0.810146
sensor.bme680humidity-sensor.bme680pressure -0.862619
sensor.memory_usage_real-sensor.volume_used_volume_1 -0.902779
sensor.bme680humidity-sensor.bme680air_qual -0.999989

Unsurprisingly the mean temperature is strongly correlated with the other temperature sensors. Interestingly, my iPhone battery level is somewhat inversely correlated with the travel time from home to Waterloo, which gets longer late at night when my battery level is more likely to be low.
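If you want to cross-check the correlations method, a similar strong-correlation filter can be reproduced with plain pandas; a sketch with toy data (the sensor names here are made up):

```python
import pandas as pd

df = pd.DataFrame({
    'sensor.a': [1.0, 2.0, 3.0, 4.0],
    'sensor.b': [2.0, 4.0, 6.0, 8.0],   # perfectly correlated with sensor.a
    'sensor.c': [4.0, 3.0, 2.0, 1.0],   # perfectly anti-correlated with both
})

# Pairwise Pearson correlations as a (sensor, sensor) -> value series
corrs = df.corr().stack()

# Keep strong pairs, excluding each sensor's trivial self-correlation
self_pairs = corrs.index.get_level_values(0) == corrs.index.get_level_values(1)
strong = corrs[(corrs.abs() > 0.8) & ~self_pairs]
print(strong)
```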

Plot numerical sensor data

We can pass a single entity to plot:

sensors_num_df.plot('sensor.darksky_sensor_temperature')

(plot: sensor.darksky_sensor_temperature over time)

We can pass a list of entities to plot:

to_plot = ['sensor.living_room_temperature',
           'sensor.bedroom_temperature',
           'sensor.darksky_sensor_temperature']

sensors_num_df.plot(to_plot)

(plot: the three temperature sensors over time)

BinarySensors class

The BinarySensors class is for binary sensor data with on/off states.

from detective.core import BinarySensors

sensors_binary_df = BinarySensors(db.master_df)

sensors_binary_df.entities
['binary_sensor.workday_sensor',
 'binary_sensor.blink_blink_camera_percy_motion_enabled',
 'binary_sensor.living_room_nest_protect_online',
 'binary_sensor.blink_armed_status',
 'binary_sensor.hall_motion_sensor',
 'binary_sensor.bedroom_motion_sensor',
 'binary_sensor.living_room_motion_sensor',
 'binary_sensor.motion_at_home',
 'binary_sensor.bayesianbinary',
 'binary_sensor.in_bed_bayesian']

We can plot a single binary sensor:

sensors_binary_df.plot('binary_sensor.motion_at_home')

(plot: binary_sensor.motion_at_home on/off states over time)

OK, now that we have demonstrated the basic classes and functionality of detective, let's move on to some analysis!

Day of week analysis

Let's analyse the motion_at_home binary sensor data. We first create day-of-the-week and time-category features from the raw data, then perform analysis on these features.

from detective.time import is_weekday, time_category

motion_df = sensors_binary_df.data[['binary_sensor.motion_at_home']] # Must pass a list to return correctly indexed df

motion_df['weekday'] = motion_df.index.weekday_name # get the weekday name

motion_df['is_weekday'] = motion_df.index.map(lambda x: is_weekday(x)) # determine if day is a weekday or not

motion_df = motion_df[motion_df['binary_sensor.motion_at_home'] == True] # Keep only true detection events

motion_df['time_category'] = motion_df.index.map(lambda x: time_category(x)) # Generate a time_category feature

motion_df.head()
entity binary_sensor.motion_at_home weekday is_weekday time_category
last_changed
2017-08-07 20:08:17.810800 True Monday True evening
2017-08-07 20:08:26.921077 True Monday True evening
2017-08-07 20:10:20.017217 True Monday True evening
2017-08-07 20:11:31.024414 True Monday True evening
2017-08-07 20:12:02.027471 True Monday True evening
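For reference, the weekday check amounts to something like the following sketch, using Python's Monday=0 convention (this is an illustration, not detective's actual implementation):

```python
from datetime import datetime

def is_weekday_sketch(timestamp):
    """True for Monday-Friday (weekday() returns 0-4), False for weekends."""
    return timestamp.weekday() < 5

# 2017-08-07 is a Monday, matching the is_weekday column in the table above
print(is_weekday_sketch(datetime(2017, 8, 7)))  # True
```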

Let's now do a groupby operation:

motion_df['binary_sensor.motion_at_home'].groupby(motion_df['is_weekday']).describe()['count']
is_weekday
False     4452
True     10862
Name: count, dtype: object
motion_df_gb = motion_df['binary_sensor.motion_at_home'].groupby([motion_df['weekday'], motion_df['time_category']]).sum().unstack()

motion_df_gb.fillna(value=0, inplace=True) # Replace NaN with 0

motion_df_gb = motion_df_gb.astype('int') # Ints rather than floats

motion_df_gb = motion_df_gb.T

motion_df_gb
weekday Friday Monday Saturday Sunday Thursday Tuesday Wednesday
time_category
daytime 1000 690 962 631 844 880 800
evening 394 599 239 496 453 532 545
morning 839 688 1047 833 664 655 619
night 92 93 131 113 163 149 163

We see that there is a lot of activity on Saturday mornings, when I hoover the house. We can also visualise this data using Seaborn.

Seaborn

Seaborn is a Python package for statistical plots. Unfortunately it is not yet supported on Hassio, but if you are on a Mac or PC you can use it as follows:

import matplotlib.pyplot as plt
import seaborn as sns

fig, ax = plt.subplots(figsize=(14, 6))
days_list = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']
times_list = ['morning', 'daytime', 'evening', 'night']
ax = sns.heatmap(motion_df_gb[days_list].loc[times_list], annot=True, linewidths=.5, fmt="d", ax=ax, cmap='Reds');
ax.set_title('Activity at home by day and time category')
#fig.savefig('heatmap.jpg')

(plot: heatmap of activity at home by day and time category)

Auth helpers

When querying the database, you might end up with user IDs and refresh token IDs. We've included helpers to load the auth data from Home Assistant and help you process it.

from detective.auth import auth_from_hass_config

auth = auth_from_hass_config()
auth.users
{
  "user-id": {
    "id": "id of user",
    "name": "Name of user",
  }
}
auth.refresh_tokens
{
  "refresh-token-id": {
    "id": "id of token",
    "user": "user object related to token",
    "client_name": "Name of client that created token",
    "client_id": "ID of client that created token",
  }
}
auth.user_name('some-user-id')
'Paulus'