Inconsistent InfluxDB data types. #4696

Closed
brentahughes opened this issue Dec 3, 2016 · 5 comments · Fixed by #5238

Comments

@brentahughes

In an ongoing effort to clean up the bugs in the InfluxDB history component, there are still some outstanding issues that I would like to discuss before writing up a solution.

Known Issues

  1. Home Assistant does not have strict data types for each entity state and attribute. The state of a sensor can be a string, int, float, or bool. This causes errors with InfluxDB because the first value written to a field determines that field's data type from then on. So if the first value was an int, every later update also has to be an int. If the next update is a string, the write fails and nothing goes to InfluxDB. This is currently happening for most updates, so very little data actually reaches InfluxDB.

  2. Home Assistant currently uses the unit_of_measurement attribute, if it exists, as the measurement name; otherwise it uses the entity name. This allows some common measurements like temperature to be grouped into the same series, but the majority of entities go to their own time series. This is only useful if you want to look at each entity individually. InfluxDB does not allow merging of different measurements. Instead, everything should go under the same measurement name, with tags used to query out the entities we want.

  3. Strings currently get sent to InfluxDB as field columns. InfluxDB uses tags and fields. Tags are indexed and used for identifying metrics in a series. Fields are the actual metric data. While InfluxDB allows strings to be sent as fields, it is not very common and has very little use: fields are not indexed, so any query that filters on a field value has to scan the data and is very slow.
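
To make issue 1 concrete, here is a toy sketch of the "first write pins the field type" behavior described above. It is only an in-memory simulation, not InfluxDB code: the names `ToySeries` and `FieldTypeConflict` are hypothetical, and real InfluxDB type handling has more detail than this.

```python
class FieldTypeConflict(Exception):
    """Raised when a value's type differs from the type first seen for a field."""


class ToySeries:
    """Toy stand-in for a measurement: remembers the first type seen per field."""

    def __init__(self):
        self.field_types = {}
        self.rows = []

    def write(self, fields):
        # Validate every field first, so a conflicting write stores nothing.
        for name, value in fields.items():
            expected = self.field_types.get(name, type(value))
            if type(value) is not expected:
                raise FieldTypeConflict(
                    f"field {name!r} is {expected.__name__}, "
                    f"got {type(value).__name__}"
                )
        # Only now record the types and the row.
        for name, value in fields.items():
            self.field_types.setdefault(name, type(value))
        self.rows.append(dict(fields))


series = ToySeries()
series.write({"value": 21})  # first write pins "value" to int
try:
    series.write({"value": "unknown"})  # a later string update is rejected
except FieldTypeConflict as err:
    print(err)  # field 'value' is int, got str
```

Note that in this toy model an int followed by a float is also a conflict, which is one reason to normalize numeric values to a single type before writing.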


Existing Attempts


Proposed Solution

  • Prevent any string or boolean from being sent to a field column.
    • Only numeric values should be sent to a field column.
  • Update all measurements to be sent to two different series.
    • Series (Measurements)
      • gauges: This will only include numeric value changes to the state and attributes.
      • counters: This will be written for every change, no matter what the data type. All tags will be the same, but the only field will be a boolean or some other value to indicate that a change occurred.
    • This would organize the InfluxDB data in a similar way to how the recorder component currently puts all states in a single table.

Impact On Existing Implementation

  • All existing measurements will be discontinued, requiring existing users to update their queries for the new layout.
    • This will likely not be a problem, as the component is so broken already that very few people likely use it.
  • For every update, a maximum of two measurements will be sent to InfluxDB.
    • InfluxDB allows multiple measurements per request, so this will not increase the number of requests. It will only increase the payload size slightly.
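
As a rough illustration of the "two measurements, one request" point, the sketch below renders two points as InfluxDB line protocol and joins them into a single payload. The helper names `to_line` and `_escape` and the field name `state` are hypothetical, the escaping is simplified (backslashes and string fields are not handled), and timestamps are omitted so the server would assign them.

```python
def _escape(text, chars):
    """Backslash-escape each of the given characters (simplified)."""
    for ch in chars:
        text = text.replace(ch, "\\" + ch)
    return text


def to_line(measurement, tags, fields):
    """Render one point as a line-protocol line, numeric fields only."""
    head = _escape(measurement, ", ")
    for key in sorted(tags):
        head += "," + _escape(key, ",= ") + "=" + _escape(str(tags[key]), ",= ")

    rendered = []
    for key in sorted(fields):
        value = fields[key]
        # bool is a subclass of int, so reject it before the int check.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            raise TypeError(f"field {key!r} must be numeric")
        # Integers carry an "i" suffix in line protocol; floats do not.
        text = f"{value}i" if isinstance(value, int) else repr(value)
        rendered.append(_escape(key, ",= ") + "=" + text)
    return head + " " + ",".join(rendered)


tags = {"domain": "sensor", "entity_id": "sensor.kitchen"}
payload = "\n".join([
    to_line("gauges", tags, {"state": 21.5}),
    to_line("counters", tags, {"count": 1}),
])
print(payload)
# gauges,domain=sensor,entity_id=sensor.kitchen state=21.5
# counters,domain=sensor,entity_id=sensor.kitchen count=1i
```

Both lines travel in one request body, so the extra cost is the second line's bytes rather than a second round trip.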

I have opened this issue to get feedback and ideas from other people that are using the influxdb component before I code it and open a PR.

@duecedriver

So is this what is causing my history timelines to be drawn with bad or no coloring and taking forever to load and causing database locked errors in my logs?

@brentahughes
Author

This is a metrics component, not the recorder used to show the logbook and graphs.

@brentahughes
Author

brentahughes commented Dec 5, 2016

Here are my proposed changes: https://github.com/bah2830/home-assistant/commit/fb9decbe81dc1bcbc4617a7817cc9605df42bed0

I am currently running this in production from my branch https://github.com/bah2830/home-assistant/tree/REWRITE_INFLUXDB_METRICS for extended testing.

This adds two measurements, hass.state and hass.state.count:

hass.state
Only state changes that have a state or attribute that is numeric.

  • Tags: domain, entity_id
  • Fields: all states or attributes that are numeric

hass.state.count
Every state change

  • Tags: domain, entity_id
  • Fields: count
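
The layout above can be sketched roughly as follows. This is not the code from the linked commit: the helper names `numeric_fields` and `build_points`, the field name `state`, the `count` value of 1, and the choice to convert every numeric value to float are all assumptions made for illustration. The point dicts use the `measurement` / `tags` / `fields` shape that the influxdb Python client accepts.

```python
from numbers import Real


def numeric_fields(state, attributes):
    """Keep only numeric values; bools and strings are dropped."""
    candidates = {**attributes, "state": state}
    return {
        key: float(value)  # assumption: one float type avoids int/float clashes
        for key, value in candidates.items()
        # bool is a subclass of int, so it must be excluded explicitly.
        if isinstance(value, Real) and not isinstance(value, bool)
    }


def build_points(entity_id, state, attributes):
    """Return at most two points for a single state change."""
    domain = entity_id.split(".", 1)[0]
    tags = {"domain": domain, "entity_id": entity_id}

    points = []
    fields = numeric_fields(state, attributes)
    if fields:
        # hass.state: only written when something numeric is present.
        points.append(
            {"measurement": "hass.state", "tags": dict(tags), "fields": fields}
        )
    # hass.state.count: written for every state change.
    points.append(
        {"measurement": "hass.state.count", "tags": dict(tags), "fields": {"count": 1}}
    )
    return points


for point in build_points(
    "sensor.kitchen", 21, {"battery": 87.5, "friendly_name": "Kitchen"}
):
    print(point["measurement"], point["fields"])
```

A state change with no numeric values, such as a light switching to "on", would produce only the hass.state.count point in this sketch.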

@bestlibre
Contributor

Can we also add the other attributes and states which are not numeric as tags?

@brentahughes
Author

brentahughes commented Dec 5, 2016

I thought about doing something like this, but tags are indexed and are meant for querying the metrics.

Because we don't know which attribute values are informational data and which are state data, it's hard to know what should be there and what shouldn't. Also, if we put attributes in tags, then every metric will have null tags, and traditionally nulls in an indexed column can have a negative impact on performance.

One thing I did think of doing was adding a third measurement for the strings. But the usefulness of strings for metric data is limited, and for the majority of users it will just be a lot of junk building up in their database.

titilambert added a commit to titilambert/home-assistant that referenced this issue Jan 10, 2017
balloob pushed a commit that referenced this issue Jan 14, 2017
* Revert #4791 and fixes #4696

* Update influxDB based on PR comments

* Add migration script

* Update influxdb_migrator based on PR comments

* Add override_measurement option to influxdb_migrator

* Rename value field to state when data is string type

* Fix influxdb cloning query
@home-assistant home-assistant locked and limited conversation to collaborators Apr 30, 2017