Use multiple indexed group-by queries to get start time states for MySQL#138786

Merged
frenck merged 29 commits into dev from mysql_dependant_subquery_filesort
Mar 2, 2025

Conversation

@bdraco
Member

@bdraco bdraco commented Feb 18, 2025

Proposed change

Redo of #133397, but for MySQL only, since it is the only engine that sometimes optimizes a dependent subquery poorly while always optimizing an indexed group-by with < 1000 ids well. MariaDB, PostgreSQL, and SQLite don't have this issue.
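
To make the two query shapes concrete, here is a minimal sketch using an in-memory SQLite database. It is not the recorder's actual query: the table is a cut-down stand-in for `states`, and the sample rows, `START`, and `IDS` are made up for illustration. Only the index name is taken from the EXPLAIN output in this description. The sketch shows that a dependent (correlated) subquery and a group-by joined back to the table return the same "last state before the start time" rows.

```python
import sqlite3

# Cut-down stand-in for the recorder's states table (assumption, not the real schema).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE states (
        state_id INTEGER PRIMARY KEY,
        metadata_id INTEGER NOT NULL,
        state TEXT,
        last_updated_ts REAL NOT NULL
    );
    CREATE INDEX ix_states_metadata_id_last_updated_ts
        ON states (metadata_id, last_updated_ts);
    """
)
conn.executemany(
    "INSERT INTO states (metadata_id, state, last_updated_ts) VALUES (?, ?, ?)",
    [
        (1, "on", 10.0),
        (1, "off", 20.0),
        (1, "on", 30.0),
        (2, "5", 15.0),
        (2, "6", 40.0),
        (3, "idle", 5.0),
    ],
)

START = 25.0  # hypothetical start time
IDS = (1, 2, 3)  # hypothetical metadata ids
placeholders = ",".join("?" * len(IDS))

# Shape 1: dependent subquery, evaluated once per candidate metadata_id.
dependent_sql = f"""
    SELECT s.metadata_id, s.state, s.last_updated_ts
    FROM states AS s
    WHERE s.metadata_id IN ({placeholders})
      AND s.last_updated_ts = (
          SELECT inner_s.last_updated_ts
          FROM states AS inner_s
          WHERE inner_s.metadata_id = s.metadata_id
            AND inner_s.last_updated_ts < ?
          ORDER BY inner_s.last_updated_ts DESC
          LIMIT 1
      )
    ORDER BY s.metadata_id
"""

# Shape 2: one group-by over the (metadata_id, last_updated_ts) index,
# joined back to states to fetch the matching rows.
group_by_sql = f"""
    SELECT s.metadata_id, s.state, s.last_updated_ts
    FROM states AS s
    JOIN (
        SELECT metadata_id, MAX(last_updated_ts) AS max_ts
        FROM states
        WHERE metadata_id IN ({placeholders})
          AND last_updated_ts < ?
        GROUP BY metadata_id
    ) AS latest
      ON s.metadata_id = latest.metadata_id
     AND s.last_updated_ts = latest.max_ts
    ORDER BY s.metadata_id
"""

dependent_rows = conn.execute(dependent_sql, (*IDS, START)).fetchall()
group_by_rows = conn.execute(group_by_sql, (*IDS, START)).fetchall()
print(dependent_rows == group_by_rows)
```

SQLite handles both shapes without trouble, so this only demonstrates that the results are equivalent; the performance difference described here is specific to the MySQL optimizer.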

I dislike having to maintain another query for MySQL, but since we officially support it, there are limited options.
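
A rough sketch of what maintaining a second query path could look like is below. Everything here is an assumption for illustration: `OptimizerFlags`, `flags_for_dialect`, `chunk_ids`, `plan_queries`, and the `MAX_IDS_PER_GROUP_BY` value are hypothetical names, and only the flag name `slow_dependent_subquery` appears in this PR's commit log. The idea is that only MySQL takes the group-by path, and that the ids are split so each group-by query stays under the 1000-id threshold mentioned above.

```python
from collections.abc import Iterator, Sequence
from dataclasses import dataclass

# Assumed ceiling derived from the "< 1000 ids" remark; the real constant may differ.
MAX_IDS_PER_GROUP_BY = 999


@dataclass(frozen=True)
class OptimizerFlags:
    """Hypothetical per-engine flags; only the flag name comes from the commit log."""

    slow_dependent_subquery: bool = False


def flags_for_dialect(dialect: str, is_mariadb: bool = False) -> OptimizerFlags:
    """Mark only real MySQL (not MariaDB) as slow for dependent subqueries."""
    return OptimizerFlags(
        slow_dependent_subquery=(dialect == "mysql" and not is_mariadb)
    )


def chunk_ids(
    ids: Sequence[int], size: int = MAX_IDS_PER_GROUP_BY
) -> Iterator[Sequence[int]]:
    """Yield consecutive slices of ids, each at most `size` long."""
    for start in range(0, len(ids), size):
        yield ids[start : start + size]


def plan_queries(
    ids: Sequence[int], flags: OptimizerFlags
) -> list[tuple[str, Sequence[int]]]:
    """Return (query_shape, ids) pairs describing which queries to run."""
    if not flags.slow_dependent_subquery:
        # Engines that optimize the dependent subquery well keep one query.
        return [("dependent_subquery", ids)]
    # MySQL: several indexed group-by queries, each under the id ceiling.
    return [("group_by", chunk) for chunk in chunk_ids(ids)]


mysql_plan = plan_queries(list(range(2500)), flags_for_dialect("mysql"))
mariadb_plan = plan_queries(
    list(range(2500)), flags_for_dialect("mysql", is_mariadb=True)
)
print([len(chunk) for _, chunk in mysql_plan], len(mariadb_plan))
```

Keeping the engine check behind a single flag means the extra query only has to be selected in one place, which limits the maintenance cost of the MySQL-only path.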

For comparison, with MySQL 8.4 we get Using where; Using index; Using filesort on the DEPENDENT SUBQUERY, and it takes 36434232 microseconds (roughly 36 seconds). For the same data loaded in MariaDB, we end up with Using where; Using index; Using filesort, and it takes 632 microseconds. MySQL can't do it without a filesort.

With the test data provided by users, everything ran fast when restored to a comparable MariaDB system. Performance was also acceptable on MySQL 8.0 in testing. With MySQL 8.4 and 9.0, performance was significantly worse, as detailed above. The group-by query performed well on all tested MySQL versions, including 8.0, 8.4, and 9.0.

+----+--------------------+------------------+------------+--------+-----------------------------------------------------------------+---------------------------------------+---------+-----------------------------------------+------+----------+------------------------------------------+
| id | select_type        | table            | partitions | type   | possible_keys                                                   | key                                   | key_len | ref                                     | rows | filtered | Extra                                    |
+----+--------------------+------------------+------------+--------+-----------------------------------------------------------------+---------------------------------------+---------+-----------------------------------------+------+----------+------------------------------------------+
|  1 | PRIMARY            | states_meta      | NULL       | range  | PRIMARY                                                         | PRIMARY                               | 8       | NULL                                    |    8 |   100.00 | Using where; Using index                 |
|  1 | PRIMARY            | states           | NULL       | ref    | ix_states_last_updated_ts,ix_states_metadata_id_last_updated_ts | ix_states_last_updated_ts             | 9       | func                                    |    1 |    57.55 | Using where                              |
|  1 | PRIMARY            | state_attributes | NULL       | eq_ref | PRIMARY                                                         | PRIMARY                               | 8       | test_test_test_.states.attributes_id    |    1 |   100.00 | NULL                                     |
|  2 | DEPENDENT SUBQUERY | states           | NULL       | ref    | ix_states_last_updated_ts,ix_states_metadata_id_last_updated_ts | ix_states_metadata_id_last_updated_ts | 9       | test_test_test_.states_meta.metadata_id | 4621 |    50.00 | Using where; Using index; Using filesort |
+----+--------------------+------------------+------------+--------+-----------------------------------------------------------------+---------------------------------------+---------+-----------------------------------------+------+----------+------------------------------------------+
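
The plan above can also be checked mechanically. The following is a small sketch, assuming the EXPLAIN rows have already been fetched as dictionaries keyed by column name; `find_filesort_subqueries` is a hypothetical helper, and the sample rows are copied from the table above with only the relevant columns kept.

```python
from typing import Any

PlanRow = dict[str, Any]


def find_filesort_subqueries(plan_rows: list[PlanRow]) -> list[PlanRow]:
    """Return DEPENDENT SUBQUERY rows whose Extra column mentions a filesort."""
    return [
        row
        for row in plan_rows
        if row["select_type"] == "DEPENDENT SUBQUERY"
        # Extra is NULL (None) for some rows, so fall back to an empty string.
        and "Using filesort" in (row["Extra"] or "")
    ]


plan: list[PlanRow] = [
    {"id": 1, "select_type": "PRIMARY", "table": "states_meta",
     "key": "PRIMARY", "rows": 8, "Extra": "Using where; Using index"},
    {"id": 1, "select_type": "PRIMARY", "table": "states",
     "key": "ix_states_last_updated_ts", "rows": 1, "Extra": "Using where"},
    {"id": 1, "select_type": "PRIMARY", "table": "state_attributes",
     "key": "PRIMARY", "rows": 1, "Extra": None},
    {"id": 2, "select_type": "DEPENDENT SUBQUERY", "table": "states",
     "key": "ix_states_metadata_id_last_updated_ts", "rows": 4621,
     "Extra": "Using where; Using index; Using filesort"},
]

flagged = find_filesort_subqueries(plan)
for row in flagged:
    print(f"{row['table']} via {row['key']}: ~{row['rows']} rows, {row['Extra']}")
```

In this plan, the only flagged row is the `states` lookup in the dependent subquery, which is the step the description identifies as slow on MySQL.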

Testing:

  • Stats catch-up now takes a few seconds instead of minutes.
  • Live runs over a few hours no longer showed any performance issues.
  • Purged the database to empty, and everything went OK.

Type of change

  • Dependency upgrade
  • Bugfix (non-breaking change which fixes an issue)
  • New integration (thank you!)
  • New feature (which adds functionality to an existing integration)
  • Deprecation (breaking change to happen in the future)
  • Breaking change (fix/feature causing existing functionality to break)
  • Code quality improvements to existing code or addition of tests

Additional information

Checklist

  • The code change is tested and works locally.
  • Local tests pass. Your PR cannot be merged unless tests pass
  • There is no commented out code in this PR.
  • I have followed the development checklist
  • I have followed the perfect PR recommendations
  • The code has been formatted using Ruff (ruff format homeassistant tests)
  • Tests have been added to verify that the new code works.

If user exposed functionality or configuration variables are added/changed:

If the code communicates with devices, web services, or third-party tools:

  • The manifest file has all fields filled out correctly.
    Updated and included derived files by running: python3 -m script.hassfest.
  • New or updated dependencies have been added to requirements_all.txt.
    Updated by running python3 -m script.gen_requirements_all.
  • For the updated dependencies - a link to the changelog, or at minimum a diff between library versions is added to the PR description.

To help with the load of incoming pull requests:

@home-assistant
Contributor

Hey there @home-assistant/core, mind taking a look at this pull request as it has been labeled with an integration (recorder) you are listed as a code owner for? Thanks!

Code owner commands

Code owners of recorder can trigger bot actions by commenting:

  • @home-assistant close Closes the pull request.
  • @home-assistant rename Awesome new title Renames the pull request.
  • @home-assistant reopen Reopen the pull request.
  • @home-assistant unassign recorder Removes the current integration label and assignees on the pull request, add the integration domain after the command.
  • @home-assistant add-label needs-more-information Add a label (needs-more-information, problem in dependency, problem in custom component) to the pull request.
  • @home-assistant remove-label needs-more-information Remove a label (needs-more-information, problem in dependency, problem in custom component) on the pull request.

Comment thread homeassistant/components/recorder/history/modern.py Outdated
Comment thread homeassistant/components/recorder/history/modern.py Outdated
Comment thread homeassistant/components/recorder/const.py Outdated
Comment thread homeassistant/components/recorder/statistics.py Outdated
@bdraco bdraco added the bugfix label Feb 18, 2025
Comment thread homeassistant/components/recorder/history/modern.py Outdated
Comment thread homeassistant/components/recorder/statistics.py Outdated
@bdraco
Member Author

bdraco commented Feb 19, 2025

Verified with the test data provided in the linked issue that compiling stats was lightning fast with MySQL now

@bdraco bdraco marked this pull request as ready for review February 26, 2025 16:07
@bdraco
Member Author

bdraco commented Feb 26, 2025

Description was updated to include tested versions of MySQL

Contributor

@emontnemery emontnemery left a comment

Code looks good. Some comments on readability and structuring of the code though.

Comment thread homeassistant/components/recorder/models/database.py Outdated
Comment thread homeassistant/components/recorder/history/modern.py Outdated
Comment thread homeassistant/components/recorder/history/modern.py Outdated
Comment thread homeassistant/components/recorder/history/modern.py Outdated
Comment thread homeassistant/components/recorder/statistics.py Outdated
Comment thread homeassistant/components/recorder/statistics.py Outdated
@home-assistant
Contributor

Please take a look at the requested changes, and use the Ready for review button when you are done, thanks 👍

Learn more about our pull request process.

@home-assistant home-assistant Bot marked this pull request as draft February 27, 2025 09:43
@bdraco
Member Author

bdraco commented Feb 27, 2025

With the structuring changes, I'll have to do some retesting. I won't be able to do that until Monday at the earliest, unless I can figure out a good way to test it remotely.

@bdraco
Member Author

bdraco commented Feb 28, 2025

MySQL 9 retest is good

@bdraco
Member Author

bdraco commented Feb 28, 2025

SQLite retest on production (2x) is good

@bdraco
Member Author

bdraco commented Feb 28, 2025

PostgreSQL startup is good as well. Verified LTS caught up

@bdraco
Member Author

bdraco commented Feb 28, 2025

MariaDB looks good as well

@bdraco bdraco marked this pull request as ready for review February 28, 2025 17:10
@home-assistant home-assistant Bot requested a review from emontnemery February 28, 2025 17:10
@frenck frenck merged commit c9abe76 into dev Mar 2, 2025
@frenck frenck deleted the mysql_dependant_subquery_filesort branch March 2, 2025 14:13
bramkragten pushed a commit that referenced this pull request Mar 2, 2025
…SQL (#138786)

* tweaks

* mysql

* mysql

* Update homeassistant/components/recorder/history/modern.py

* Update homeassistant/components/recorder/history/modern.py

* Update homeassistant/components/recorder/const.py

* Update homeassistant/components/recorder/statistics.py

* Apply suggestions from code review

* mysql

* mysql

* cover

* make sure db is fully init on old schema

* fixes

* fixes

* coverage

* coverage

* coverage

* s/slow_dependant_subquery/slow_dependent_subquery/g

* reword

* comment that callers are responsible for staying under the limit

* comment that callers are responsible for staying under the limit

* switch to kwargs

* reduce branching complexity

* split stats query

* preen

* split tests

* split tests
@bdraco
Member Author

bdraco commented Mar 2, 2025

Thanks

@github-actions github-actions Bot locked and limited conversation to collaborators Mar 3, 2025
Development

Successfully merging this pull request may close these issues.

Home Assistant 2025.x Causes Long-Running SQL Queries on Percona/MySQL 8.0 (MariaDB not affected)

7 participants