Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

perf: optimize hash naming for MySQL storage #25309

Merged
merged 1 commit into from
Mar 10, 2024

Conversation

ankush
Copy link
Member

@ankush ankush commented Mar 10, 2024

InnoDB is "index organized"* and the primary key is that index. So random names can send rows all over the place.

Typically documents created closer in time should live closer on mysql pages too.

Test: I created a DocType with 300 documents and each containing 1000 child table rows.

operation before after difference
read a single document w/ 1000 child table rows; MySQL pages read 885 8 100x reduction
create a new hash name (microseconds) 1.65 3.94 2.3x increase

Pages read numbers are mostly meaningless as they depend on # of rows read. The general idea is previously number of pages read was proportional to the number of rows read because of random distribution of data. The proportionality factor being how many rows can fit in single 16KB page.

Notes:

  • This data is from bufferpool stats, not 100% accurate.
  • mariadb was restarted everytime before executing query. So data shouldn't be in pool by default.
  • Default page size in mariadb is 16kb, you need A LOT of data to trigger this. Make sure you hit at least >100mb table size before testing anything.

TODO:

  • verify test results with more reliable setup.
  • any side effects?

* - https://15445.courses.cs.cmu.edu/fall2023/slides/04-storage2.pdf (page 25)

@github-actions github-actions bot added the add-test-cases Add test case to validate fix or enhancement label Mar 10, 2024
@ankush ankush mentioned this pull request Mar 10, 2024
5 tasks
Random names can send rows all over the place, typically documents
created closer in time should live closer on mysql pages too.
@ankush ankush marked this pull request as ready for review March 10, 2024 15:57
@ankush ankush requested review from a team and akhilnarang and removed request for a team March 10, 2024 15:57
@ankush ankush enabled auto-merge March 10, 2024 15:57
@ankush
Copy link
Member Author

ankush commented Mar 10, 2024

Merging this for now, it's slightly better than random inserts. This PR helps up until ~1gb of table size by some rough math. (16 (0 to f) ^ 3 (hex from timestamp) * 16kb page size)

ULID is best long term fix.

@ankush ankush disabled auto-merge March 10, 2024 16:02
@ankush ankush enabled auto-merge March 10, 2024 16:04
@ankush ankush added the defer backport Backports for some PR are deferred for a week or two to test them properly before releasing label Mar 10, 2024
@ankush ankush merged commit 71a7305 into frappe:develop Mar 10, 2024
23 checks passed
@ankush ankush deleted the mysql_optimized_naming branch March 10, 2024 16:05
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Mar 26, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
add-test-cases Add test case to validate fix or enhancement defer backport Backports for some PR are deferred for a week or two to test them properly before releasing
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

1 participant