Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Statistics takes longer than 1.111 seconds at startup #110433

Open
Mariusthvdb opened this issue Feb 12, 2024 · 8 comments
Open

Statistics takes longer than 1.111 seconds at startup #110433

Mariusthvdb opened this issue Feb 12, 2024 · 8 comments

Comments

@Mariusthvdb
Copy link
Contributor

The problem

running debugpy reveals several remarkable timings during setup.
writing this up per request.

What version of Home Assistant Core has the issue?

2024.2.x

What was the last working version of Home Assistant Core?

No response

What type of installation are you running?

Home Assistant OS

Integration causing the issue

statistics

Link to integration documentation on our website

https://www.home-assistant.io/integrations/statistics/

Diagnostics information

No response

Example YAML snippet

No response

Anything in the logs that might be useful for us?

2024-02-12 18:06:06.833 WARNING (MainThread) [asyncio] Executing <Task finished name='Task-13034' coro=<StatisticsSensor._initialize_from_database() done, defined at /usr/src/homeassistant/homeassistant/components/statistics/sensor.py:534> result=None created at /usr/src/homeassistant/homeassistant/core.py:633> took 1.111 seconds

Additional information

No response

@home-assistant
Copy link

Hey there @ThomDietrich, mind taking a look at this issue as it has been labeled with an integration (statistics) you are listed as a code owner for? Thanks!

Code owner commands

Code owners of statistics can trigger bot actions by commenting:

  • @home-assistant close Closes the issue.
  • @home-assistant rename Awesome new title Renames the issue.
  • @home-assistant reopen Reopen the issue.
  • @home-assistant unassign statistics Removes the current integration label and assignees on the issue, add the integration domain after the command.
  • @home-assistant add-label needs-more-information Add a label (needs-more-information, problem in dependency, problem in custom component) to the issue.
  • @home-assistant remove-label needs-more-information Remove a label (needs-more-information, problem in dependency, problem in custom component) on the issue.

(message by CodeOwnersMention)


statistics documentation
statistics source
(message by IssueLinks)

@bdraco
Copy link
Member

bdraco commented Feb 12, 2024

The problem is almost certainly in

self._add_state_to_queue(state)

Its calling _add_state_to_queue which for a power sensor that updates every 5s in a 24h period will be 17280 calls.

It likely needs an optimized version for restoring data

@bdraco
Copy link
Member

bdraco commented Feb 12, 2024

The whole integration is doing a lot of datetime math could likely be all floats instead since we don't need local time

@Mariusthvdb
Copy link
Contributor Author

Mariusthvdb commented Feb 12, 2024

forgot to post the yaml configs, sorry, do you need these too, or is it a generic issue

btw, the power sensor updates way more frequently than every 5 seconds... it is per state change of the smart meter, and that can be more than once per second so it seems.
guess you can safely multiply by 5 there

so, this is using a template sensor (UI) based on 2 integrations (reading smart meter (mqtt) and solar(mqtt)),

        state: >
         {{expand('sensor.zp_actuele_opbrengst','sensor.netto_verbruik')
           |selectattr('state','is_number')
           |map(attribute='state')
           |map('float')|sum}}

then fed into a filter:

  - unique_id: bruto_stroom_verbruik_filter
    platform: filter
    entity_id: sensor.bruto_stroom_verbruik
    filters:
      - filter: outlier
        window_size: 5
        radius: 1.5
      - filter: time_simple_moving_average
        window_size: '00:00:30'
        precision: 3
      - filter: range
        lower_bound: 0

and finally the statistics:

  - unique_id: standby_power_laatste_24_uur
    platform: statistics
    entity_id: sensor.bruto_stroom_verbruik_filter
    state_characteristic: percentile
    percentile: 30
    max_age:
      hours: 24
    precision: 3

@Mariusthvdb
Copy link
Contributor Author

Mariusthvdb commented Feb 12, 2024

also, maybe consider, add a polling/update frequency option. seems unnecessary to update this frequently, once a minute should suffice as absolute minimum, maybe even lower.

think trigger template, or scan_interval:

Aware this has been discussed before, maybe revisit that?

@bdraco
Copy link
Member

bdraco commented Mar 4, 2024

This one is still a problem even after #112127 in 2024.4

@ThomDietrich
Copy link
Contributor

Hey guys,
this is a difficult issue. Generally the statistics integration is NOT optimized for huge sampling settings. I did always try to highlight that to users, however I can see that the warning did not make it into docs. Of course we can agree that this is not a valid justification or solution.

Recently I didn't have the time to add further improvements but I had planned to implement optimizations per characteristic (e.g. for the max value we need to store only one value, the largest one so far). For the percentile an optimization is not trivial.

(
If I may go a bit on a tangent. Some time ago I did discuss the nature of the statistics component with one of the core developers. The component provides timeseries data processing capabilities the main HA database does not provide. Because of that the statistics component needs to maintain its own data storage and implements many typical arithmetic functions. It is my humble opinion that the HA database (a traditional relational database) should be replaced by one of the amazing open source timeseries databases out there. This would solve many core issues (and feature requests) and performance challenges, and the statistics component would be nearly unnecessary. My recommendation was InfluxDB but there might be an even better fit out there. Working on the statistics component always felt like building a house on muddy soil.
)

Anyhow. @bdraco back when you worked on #110438 did you compare performance/memory of an extreme use case?

@bdraco
Copy link
Member

bdraco commented Mar 4, 2024

Anyhow. @bdraco back when you worked on #110438 did you compare performance/memory of an extreme use case?

I was only looking at improving the performance of the 1 day of date with a sample rate of 5s ~17280 samples

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants