Use multiple indexed group-by queries to get start time states for MySQL#138786
Verified with the test data provided in the linked issue that compiling stats is now lightning fast with MySQL.
Description was updated to include the tested versions of MySQL.
emontnemery
left a comment
Code looks good. Some comments on readability and structuring of the code though.
Please take a look at the requested changes, and use the Ready for review button when you are done, thanks 👍
With the structuring changes I'll have to do some retesting. I won't be able to do that until Monday at the earliest unless I can figure out a good way to test it remotely.
MySQL 9 retest is good.
SQLite retest on production (2x) is good.
PostgreSQL startup is good as well. Verified LTS caught up.
MariaDB looks good as well.
…SQL (#138786)
* tweaks
* mysql
* mysql
* Update homeassistant/components/recorder/history/modern.py
* Update homeassistant/components/recorder/history/modern.py
* Update homeassistant/components/recorder/const.py
* Update homeassistant/components/recorder/statistics.py
* Apply suggestions from code review
* mysql
* mysql
* cover
* make sure db is fully init on old schema
* fixes
* fixes
* coverage
* coverage
* coverage
* s/slow_dependant_subquery/slow_dependent_subquery/g
* reword
* comment that callers are responsible for staying under the limit
* comment that callers are responsible for staying under the limit
* switch to kwargs
* reduce branching complexity
* split stats query
* preen
* split tests
* split tests
Thanks
Proposed change
Redo of #133397, but for MySQL only, since it's the only engine that will sometimes optimize a dependent subquery poorly but always optimizes an indexed group-by with < 1000 ids well. MariaDB, PostgreSQL, and SQLite don't have this issue.
I dislike having to maintain another query for MySQL, but since we officially support it, there are limited options.
For comparison, with MySQL 8.4 we get "Using where; Using index; Using filesort" on the DEPENDENT SUBQUERY, and it takes 36434232 microseconds (about 36 seconds). For the same data loaded in MariaDB we end up with "Using where; Using index; Using filesort", and it takes 632 microseconds. MySQL can't do it without a filesort.
With the test data provided by users, everything ran fast when restored to a comparable MariaDB system. Performance was also acceptable on MySQL 8.0 in testing. With MySQL 8.4 and 9.0, performance was significantly worse, as detailed above. The group-by query performed well on all tested MySQL versions, including 8.0, 8.4, and 9.0.
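To make the two query shapes discussed above concrete, here is a minimal sketch. It is not the recorder's actual code: the states table is reduced to three columns, the index name, the helper names, and MAX_IDS_PER_QUERY are all hypothetical, and it runs against an in-memory SQLite database only to show that both shapes return the same rows. It says nothing about MySQL's timings.

```python
import sqlite3
from itertools import islice

# Assumption: a chunk size chosen to stay under the "< 1000 ids" threshold.
MAX_IDS_PER_QUERY = 999


def chunked(ids, size):
    """Yield successive lists of at most `size` ids."""
    it = iter(ids)
    while chunk := list(islice(it, size)):
        yield chunk


def start_states_dependent_subquery(conn, metadata_ids, start_ts):
    """Single query; a correlated (dependent) subquery finds the newest row per id."""
    placeholders = ",".join("?" * len(metadata_ids))
    sql = f"""
        SELECT s.metadata_id, s.last_updated_ts, s.state
        FROM states AS s
        WHERE s.metadata_id IN ({placeholders})
          AND s.last_updated_ts = (
              SELECT MAX(i.last_updated_ts) FROM states AS i
              WHERE i.metadata_id = s.metadata_id AND i.last_updated_ts < ?
          )
    """
    return sorted(conn.execute(sql, [*metadata_ids, start_ts]).fetchall())


def start_states_group_by(conn, metadata_ids, start_ts):
    """Several queries; each groups one bounded chunk of ids, then joins back."""
    rows = []
    for chunk in chunked(metadata_ids, MAX_IDS_PER_QUERY):
        placeholders = ",".join("?" * len(chunk))
        sql = f"""
            SELECT s.metadata_id, s.last_updated_ts, s.state
            FROM states AS s
            JOIN (
                SELECT metadata_id, MAX(last_updated_ts) AS max_ts
                FROM states
                WHERE metadata_id IN ({placeholders}) AND last_updated_ts < ?
                GROUP BY metadata_id
            ) AS latest
              ON s.metadata_id = latest.metadata_id
             AND s.last_updated_ts = latest.max_ts
        """
        rows.extend(conn.execute(sql, [*chunk, start_ts]).fetchall())
    return sorted(rows)


# Tiny demo data set (made up for illustration).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE states (metadata_id INTEGER, last_updated_ts REAL, state TEXT)"
)
conn.execute("CREATE INDEX ix_states_demo ON states (metadata_id, last_updated_ts)")
conn.executemany(
    "INSERT INTO states VALUES (?, ?, ?)",
    [
        (1, 10.0, "a"), (1, 20.0, "b"), (1, 30.0, "c"),
        (2, 5.0, "x"), (2, 25.0, "y"),
        (3, 40.0, "late"),  # no row before the start time, so no start state
    ],
)

by_subquery = start_states_dependent_subquery(conn, [1, 2, 3], 28.0)
by_group_by = start_states_group_by(conn, [1, 2, 3], 28.0)
print(by_subquery == by_group_by)
```

In this sketch the chunking lives inside the group-by helper for brevity; the commit list above notes that, in the PR itself, callers are responsible for staying under the limit.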
Testing:
Stats catch up now takes a few seconds instead of minutes
Live runs for a few hours no longer showed any performance issues
Purged database to empty and everything went ok