-
-
Notifications
You must be signed in to change notification settings - Fork 29.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve InfluxDB #5238
Improve InfluxDB #5238
Conversation
@titilambert, thanks for your PR! By analyzing the history of the files in this pull request, we identified @bah2830, @bestlibre and @fabaff to be potential reviewers. |
8c98f60
to
48ec0bf
Compare
I don't have too much insight on this. @bah2830, @MartinHjelmare please respond :-) |
I agree with @titilambert on the mix of data that can live in these fields is a problem. But by reverting we go back to the issue that nearly every state change goes into it's own series. Which means trying to sum up all motion sensors or averaging all temp sensors is impossible as they don't all use the same or even set the unit_of_measurement. This also has the issue that "unit_of_measurement" is wildly inconsistent. Climate platforms all set the unit_of_measurement in degrees while the state is actually a string (cool, heat, etc...). Also if the unit_of_measurement is in the attributes every attribute now goes into that same unit. The problem is each state object can have lots of different types of data in it, including a mix within the same state. So assuming "unit_of_measurement" covers everything means you now have to go searching through your series to figure out that your humidity level in the house is under the ℉ series because it was an attribute of a climate entity that used a string as the state but ℉ as the unit_of_measurement. |
|
||
measurement = state.attributes.get('unit_of_measurement') | ||
if measurement in (None, ''): | ||
if default_measurement: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure if this was the original or not. But this means you can only override the default if a unit_of_measurement is not set.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes this was the original. And you're right we can only override if there is no unit_of_measurement...
@bah2830 could you give me an example of influxQL query you want to do ? |
@titilambert Most common one for me would be counting every time a state changes and then grouping by domain. Gets an overall idea of the activity of things. Or my humidity sensors that use % as the unit_of_measurement. But the humidity for the nest is an attribute so it ends up in the degrees series. I can no longer group them by time(1m) and average it. You can't group by across series. |
@bah2830 I understand your point. In fact you want join series/measurements. |
So joining series would be nice. My problem is assuming the unit_of_measurement is actually a representation of the data when it's not currently. Ideally Home Assistant would be more strictly typed but that's not the case and we can't make it just for this. If we update the default_measurement to allow overriding the unit_of_measurement then I think this change could do everything people may need. Including allowing me to group up series that would normally be split. |
@bah2830 based on your last comment I updated the PR. I just add a new param: |
Thanks. I think adding that and the update to send '_str' for non numerics should work out just fine for most users. |
@bah2830 could you confirm that is working for you ? |
@titilambert I will merge your branch into mine tonight and let it run for a while to check for any errors and verify it will work as expected. |
@bah2830 run for a while -> we should merge this before Friday when it is the release day. |
I started getting the data type error immediately after startup.
Looks like the states that made it into influxdb are all setting the value to 0 causing the column to be an int. |
'tags': { | ||
'domain': state.domain, | ||
'entity_id': state.object_id, | ||
}, | ||
'time': event.time_fired, | ||
'fields': { | ||
'value': 1 | ||
_state_key: _state, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This needs to be cast to a float to prevent inconsistent values that cause data type errors
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The cast is done line 107:
_state = float(state_helper.state_as_number(state))
It would be interesting to know what the previous events for the PIR were where the values were converted to int. I'm going to go through the reported issues for Influxdb to see how wide range of type conversion problems we have seen. |
This was using the the override config so all were in the same measurement. But I've seen it a lot with just the nest that the nest API will return 71 and then later return 71.2 Many people in the gitter channels complain about this data type error. Which is where the original PR came from to convert them to float and fix the most common error I've seen with the influxdb component. |
Numbers will already have been converted to float via the state helper |
I think the problem is that state_as_number will return a float only if the state is not one of the const listed. If not you get a 1 or 0. So possibly because the nest and all other climates set the state as a string based on the current mode of the thermostat instead of the actual temp. But I can guarantee without anything custom on influxdb config and just letting it use the unit_of_measurement or the entity as the series name I get an insane number of date type errors. Even if I completely delete the database and start fresh I get the same result. By forcing everything to float I get zero errors but then its just down to getting the series in a more usable structure for looking at overall stats. |
@balloob Where can we document the migration ? |
@ArnoutVerbeken I just retested and this work for me and I don't have any error in HA logs anymore (about influxDB) You have to note:
@ArnoutVerbeken so maybe you will have to change your graph queries when you grab some string fields (to use |
@bah2830 BTW I fixed the progress bar issue... |
try: | ||
new_point["fields"][field] = float(point[field]) | ||
except (ValueError, TypeError): | ||
new_key = "{}_str".format(field) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For states the key should be "state"
for strings according to the component code. State attributes have the correct key now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@MartinHjelmare Thanks for your review ! I just updated the migration script !
@titilambert this looks good! Tests for the script would be icing on the cake, but I'm not sure if it's strictly needed. |
I found an issue in the migration script. I need more invstigation |
The cloning query is now fixed |
What is the state of this, is this ok to merge for 0.36? |
@balloob do we need tests for the migration script? We should document the breaking change in the release notes and explain how to run the migration script (somewhere). @titilambert I've tested the migration script on a very limited db and there it works. We should mention in the docs for the script that users should only run it on an old db, ie before starting hass after the upgrade. Otherwise, if running the script after already having started an upgraded hass, the keys in the db will be modified again by the script. Ie "FIELDNAME_str" will be turned into "FIELDNAME_str_str". Besides these points I think we can merge. |
THe only issue is the migration of waqi data (AQI measurement) which have a |
@titilambert the waqi time field has to get removed anyway, only JSON serializable fields should be sent over. |
So.... We got these instructions to run the migration script.
Should we run the script after upgrade to 0.36 but before first start of 0.36? This has been said here somewhere but it is not in the documentation page.... I just don't want to loose my DB data... Thanks |
I'll write up a PR with some docs now. Happy if @titilambert can review that when it's up.
Sorry that all docs wasn't ready at release. We'll try to make it right asap. |
Thank you. Do we need to override measurement or is there a default if we omit it? It seems we can leave it out, but it is in the example. |
You do not need to set the override_measurement. If that is not specified, the old measurement names will be retained. It's only if you want to change how the data is stored and instead store everything under a single measurement. |
OK, understood. What is the best practice to do with all the changes in this PR: Single measurement or leave like it is? Thanks. |
I'd recommend you to leave it like it is. If you don't know that you want to change this, I suggest you don't change it. The default way in 0.35 was to use the |
Description:
This PR revert #4791 to store data in InfluxDB using best practices (low cardinality).
See comment here: #4791 (comment)
It also fixes #4696.
Related issue: fixes #4696 , Revert #4791