Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rewrite influxdb metrics to be more consistent #4791

Merged
merged 3 commits into from Jan 4, 2017
Merged

Rewrite influxdb metrics to be more consistent #4791

merged 3 commits into from Jan 4, 2017

Conversation

brentahughes
Copy link

Description:
This should be the final update required to influxDB to fix all the inconsistencies and errors. This moves all state changes to go two only 2 different measurements (hass.state and hass.state.count) instead of nearly every entity going to it's own measurement making it impossible to do merged queries of multiple entities.

Every state change no matter what the data type will go to hass.state.count with a value of just 1. This allows for counters for all state changes not just influxdb friendly values.

If a state or attribute contains a numeric value then those values will be sent to hass.state as well.

See #4696 for more details on the changes. I have been running this change for 2 days without a single error from influxdb or hass.

THIS IS A BREAKING CHANGE! All existing metrics will stop getting updates and all metrics will now start to go to hass.state.count and hass.state

This issue will not require any changes to the influxdb database for the user. It will just start sending metrics to a different series resulting in any existing queries to need updating.

Related issue (if applicable): addresses #4696

@mention-bot
Copy link

@bah2830, thanks for your PR! By analyzing the history of the files in this pull request, we identified @lwis, @kk7ds and @MartinHjelmare to be potential reviewers.

@brentahughes
Copy link
Author

@lwis this will revert the changes you made in #4632 but it achieves the same function with the addition of the count measurement as well.

@MartinHjelmare
Copy link
Member

Why do we not want to store text and boolean data? Isn't this discarding possibly useful information?

@brentahughes
Copy link
Author

I'm trying to avoid the issue with Home Assistant allowing mixed data types in the state and attributes. This causes the majority of the errors in influxdb currently.

@balloob
Copy link
Member

balloob commented Dec 17, 2016

Sorry that this PR has gone stale. I find it difficult to judge this PR. @MartinHjelmare, do you have better insight on this?

@brentahughes
Copy link
Author

@balloob No problem. I fully understand that this change could be controversial. If it's not like or an alternative isn't decided on I have considered just moving this into a custom component of my own until a better solution comes along.

@MartinHjelmare
Copy link
Member

I like that this change probably will make the component more robust. I don't like so much that it discards some values.

Since bool is a subclass of int, boolean values should also be converted to float and stored, if I'm not mistaken. So only strings will be discarded. So I'm thinking this could be ok.

Is it OK to handle this as usual for breaking changes and just have users adapt? We will for sure have comments/issues raised for this change. I don't like to have workarounds for some specific values like node id's etc.

@UltraSub
Copy link

UltraSub commented Dec 24, 2016

Really waiting for this fix! 👍
Influx is so broken at the moment. Can't upgrade because of previous changes to the component which was a good change btw, but it has broken the integration if you already had data in Influx because of existing field types (the float conversion which is happening in newer versions). I could work around that, but trying to avoid doing data movement two times with this one maybe coming up :)

I'm with @bah2830 on this. Yes it's a breaking change, but the work he has done ensures reliability in the future for this vital component. Do you guys think this is going to make the next release?

Copy link
Member

@MartinHjelmare MartinHjelmare left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should go ahead with this.

Storing string values in tags would be an option, but I don't think this is a good idea, due to reasons @bah2830 mentioned in #4696. You can read more about that here:
https://docs.influxdata.com/influxdb/v1.1/concepts/schema_and_data_layout/

I think one thing that would make this transition more easily accepted by our users, would be a script that would parse the user's existing influx db and update it according to this new schema. Do you think that is a good idea and doable?

try:
if len(whitelist) > 0 and state.entity_id not in whitelist:
return
if len(whitelist) > 0 and state.entity_id not in whitelist:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you check length of whitelist is more than zero? Just check if the list is true.

if whitelist and state.entity_id not in whitelist:

if isinstance(value, (int, float)):
state_fields[key] = float(value)

if len(state_fields) > 0:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See first comment above.

json_body[0]['tags'].update(tags)

state_fields = {}
if isinstance(_state, (int, float)):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI, this will return True for bool too, since that is a subclass of int. So eg True will be converted to 1.0. States will probably never be booleans but state attributes might.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm actually okay with this allowing booleans of others are okay with it. If needed I can block them though.

elif isinstance(value, int):
# Prevent column data errors in influxDB.
json_body[0]['fields'][key] = float(value)
if isinstance(value, (int, float)):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See comment above about booleans.

@lwis
Copy link
Member

lwis commented Dec 27, 2016

I'm with @MartinHjelmare, I think the biggest concern is porting the data over, if possible.

@balloob
Copy link
Member

balloob commented Jan 4, 2017

Alrighty, merging ! 🐬

@balloob balloob merged commit 4ef7e08 into home-assistant:dev Jan 4, 2017
@titilambert
Copy link
Contributor

titilambert commented Jan 8, 2017

Hello !
I just saw this PR and i'm disappointed because I missed it :/. I understand the goal: Fix #4696 (that I got also with the current version).
But, after reading this new version, I'm pretty sure that storing all data in only one measurement is not a good idea. I'm using (in my job) InfluxDB for 3 years and I'm a InfluxData Stack contributor for 1 year, and we're always trying to avoid that's kind of DB structure. With this, we can get an issue with the cardinality of the database (https://docs.influxdata.com/influxdb/v1.1/troubleshooting/frequently-asked-questions/#why-does-series-cardinality-matter)
I just made a test, with my very small HA installation:

  • with the current version I got 104 of cardinality
  • with this new version I got 201 of cardinality
    So we have twice of cardinality, this could be an issue on small systems...

Also, and this could be the more important issue, we lose the unit of each metric which is not really cool.

Of course, I understand the issue #4696, but I think we can try to find a better solution.

I'm hoping to propose you something in the near future.

@titilambert
Copy link
Contributor

titilambert commented Jan 8, 2017

This issue and workaround is here: http://docs.influxdata.com/influxdb/v1.1/troubleshooting/frequently-asked-questions/#can-i-change-a-field-s-data-type
So I will try to apply the first and simple workaround: Write the Data to a Different Field

@titilambert
Copy link
Contributor

titilambert commented Jan 8, 2017

In fact, I think the root issue seems the event structure.
For example, this is an event of waqi component:

INFO:homeassistant.core:Bus:Handling
<Event state_changed[L]:
    old_state=<state sensor.waqi_saintjeanbaptiste_montreal=129;
               temperature=5,
               unit_of_measurement=AQI,
               nitrogen_dioxide=26,
               friendly_name=WAQI Saint-Jean-Baptiste, Montreal,
               dominentpol=,
               ozone=5,
               pressure=6,
               attribution=Data provided by the World Air Quality Index project,
               humidity=-18,
               icon=mdi:cloud,
               time=2017-01-08T06:00:00+09:00,
               particle=129 @ 2017-01-08T07:28:49.445874-05:00>,
    entity_id=sensor.waqi_saintjeanbaptiste_montreal,
    new_state=<state sensor.waqi_saintjeanbaptiste_montreal=115;
               temperature=5,
               unit_of_measurement=AQI,
               nitrogen_dioxide=15,
               friendly_name=WAQI Saint-Jean-Baptiste, Montreal,
               dominentpol=,
               ozone=11,
               pressure=6,
               attribution=Data provided by the World Air Quality Index project,
               humidity=-19,
               icon=mdi:cloud,
               time=2017-01-08T07:00:00+09:00,
               particle=115 @ 2017-01-08T08:02:03.690199-05:00>>

We have only ONE unit_of_measurement, here AQI, but we have multiple data, like pressure, humidity, ozone, ...
So, for example. storing humidity in AQI measurement does not seems having sense... It seems missing unit for each metric ...
I also got the same issue for my climate component (honeywell).

In fact, each value (humity, pressure, ozone ...) should be more complex:
It should be something like: {"type": "humidity", name: "humidity", "unit": "%", "value": -19}
In that case:

  • type will be the measurement
  • name will be a tag
  • unit will be a tag
  • value will be a field
    (and we also add the domain and the entity_id as tag)

Of course, this reflection is based on influxDB structure, I don't know if this have any sense inside Home-Assistant. And I guess this is a HUGE refactoring ? @balloob

In any case, I will propose something adapted to the current event structure...

@titilambert titilambert mentioned this pull request Jan 9, 2017
@titilambert
Copy link
Contributor

Here my proposition: #5238

titilambert added a commit to titilambert/home-assistant that referenced this pull request Jan 10, 2017
balloob pushed a commit that referenced this pull request Jan 14, 2017
* Revert #4791 and fixes #4696

* Update influxDB based on PR comments

* Add migration script

* Update influxdb_migrator based on PR comments

* Add override_measurement option to influxdb_migrator

* Rename value field to state when data is string type

* Fix influxdb cloning query
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants