perf: optimize hash naming for MySQL storage #25309

ankush · 2024-03-10T13:16:50Z

InnoDB is "index organized"* and the primary key is that index, so random names can send rows all over the place.

Typically, documents created close together in time should also live close together on MySQL pages.
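
To make the idea concrete, here is a minimal sketch (not the code in this PR) contrasting a fully random name with one whose leading characters come from the clock. The function names, the 3-digit hex prefix, and the 10-character length are assumptions for illustration only.

```python
import secrets
import time
from typing import Optional


def random_name(length: int = 10) -> str:
    """Fully random hex name: consecutive inserts sort to unrelated index positions."""
    return secrets.token_hex(length)[:length]


def time_prefixed_name(
    length: int = 10, prefix_len: int = 3, now: Optional[float] = None
) -> str:
    """Hypothetical variant: a few hex digits from the clock, then random hex.

    Names created in the same second share a prefix, so they sort next to
    each other in the primary key index.
    """
    ts = int(time.time() if now is None else now)
    # Keep only the low-order digits; the prefix wraps around after 16**prefix_len seconds.
    prefix = format(ts % (16**prefix_len), f"0{prefix_len}x")
    suffix = secrets.token_hex(length)[: length - prefix_len]
    return prefix + suffix
```

Because the prefix wraps around, this only groups rows that were created close together in time; it does not give a globally sorted key.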

Test: I created a DocType with 300 documents, each containing 1000 child table rows.

| Operation | Before | After | Difference |
| --- | --- | --- | --- |
| Read a single document with 1000 child table rows (MySQL pages read) | 885 | 8 | ~100x reduction |
| Create a new hash name (microseconds) | 1.65 | 3.94 | ~2.4x increase |

The absolute page counts are mostly meaningless on their own, as they depend on the number of rows read. The general idea is that, previously, the number of pages read was proportional to the number of rows read, because the rows were randomly distributed across pages. With rows stored together, the count drops by a factor of roughly how many rows fit in a single 16 KB page.
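
A rough simulation of that argument, as a sketch only: it models the table as a flat run of leaf pages and ignores B-tree internal pages, fill factor, and secondary indexes. The 300,000 total rows come from the test setup above; `rows_per_page=125` is an assumed value, not a measurement.

```python
import random


def pages_touched(row_positions, rows_per_page):
    """Count distinct pages that hold the given row positions."""
    return len({pos // rows_per_page for pos in row_positions})


def simulate(total_rows=300_000, rows_per_doc=1000, rows_per_page=125, seed=0):
    """Return (scattered, clustered) page counts for reading one document's rows."""
    rng = random.Random(seed)
    # Random names: the document's rows land anywhere in key order.
    scattered = rng.sample(range(total_rows), rows_per_doc)
    # Time-ordered names: the document's rows sit next to each other.
    start = rng.randrange(total_rows - rows_per_doc)
    clustered = range(start, start + rows_per_doc)
    return (
        pages_touched(scattered, rows_per_page),
        pages_touched(clustered, rows_per_page),
    )


if __name__ == "__main__":
    scattered_pages, clustered_pages = simulate()
    print(f"scattered: {scattered_pages} pages, clustered: {clustered_pages} pages")
```

With these assumptions the scattered case touches several hundred pages, while the clustered case touches about `rows_per_doc / rows_per_page` pages.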

Notes:

This data is from buffer pool stats, so it is not 100% accurate.
MariaDB was restarted every time before executing the query, so the data shouldn't be in the buffer pool by default.
The default page size in MariaDB is 16 KB, so you need A LOT of data to trigger this. Make sure the table is at least 100 MB before testing anything.
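
One possible way to take such a measurement, sketched here under assumptions: the exact method used for the numbers above is not recorded in this PR. `Innodb_buffer_pool_reads` is a server-wide status counter, so any other activity on the server skews the result. The helper names are hypothetical, and the `%s` placeholders assume a PyMySQL/mysqlclient-style DB-API cursor that returns tuples.

```python
def buffer_pool_reads(cursor) -> int:
    """Read the server-wide count of reads that missed the buffer pool."""
    cursor.execute("SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_reads'")
    _name, value = cursor.fetchone()
    return int(value)


def pages_read_by(cursor, query, params=None) -> int:
    """Run a query and return how much the counter moved while it ran."""
    before = buffer_pool_reads(cursor)
    if params is None:
        cursor.execute(query)
    else:
        cursor.execute(query, params)
    cursor.fetchall()  # make sure every row is actually fetched
    return buffer_pool_reads(cursor) - before


def table_size_mb(cursor, schema, table) -> float:
    """Approximate table size (data + indexes) from information_schema."""
    cursor.execute(
        "SELECT data_length + index_length FROM information_schema.tables "
        "WHERE table_schema = %s AND table_name = %s",
        (schema, table),
    )
    (size,) = cursor.fetchone()
    return int(size) / (1024 * 1024)
```

Checking `table_size_mb(...)` first helps confirm the table is past the 100 MB mark before trusting any page counts.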

TODO:

Verify test results with a more reliable setup.
Any side effects?

* - https://15445.courses.cs.cmu.edu/fall2023/slides/04-storage2.pdf (page 25)

Random names can send rows all over the place. Typically, documents created close together in time should also live close together on MySQL pages.

ankush · 2024-03-10T15:59:05Z

Merging this for now; it's slightly better than random inserts. By some rough math, this PR helps up until ~1 GB of table size: 16 (0 to f) ^ 3 (hex digits from timestamp) * 16 KB page size.

ULID is the best long-term fix.
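
For context, a ULID is a 128-bit identifier: a 48-bit millisecond timestamp followed by 80 random bits, written as 26 Crockford base32 characters, so values sort by creation time. The sketch below follows that layout but is illustrative only; the function name and `now_ms` parameter are assumptions, and it omits the spec's optional monotonicity within a single millisecond.

```python
import secrets
import time
from typing import Optional

# Crockford base32 alphabet (no I, L, O, U); characters are in ASCII order.
CROCKFORD = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"


def ulid(now_ms: Optional[int] = None) -> str:
    """Build a ULID-style string: 10 timestamp characters + 16 random characters."""
    ts = int(time.time() * 1000) if now_ms is None else now_ms
    value = ((ts & ((1 << 48) - 1)) << 80) | secrets.randbits(80)
    chars = []
    for _ in range(26):  # 26 characters * 5 bits covers the 128-bit value
        chars.append(CROCKFORD[value & 31])
        value >>= 5
    return "".join(reversed(chars))
```

Since the timestamp occupies the leading characters, plain string comparison orders names by creation time, which is the property the primary key needs here.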

github-actions bot added the add-test-cases label (Add test case to validate fix or enhancement) Mar 10, 2024

ankush mentioned this pull request Mar 10, 2024

UUID for name and link fields #25310

Open

5 tasks

perf: optimize hash naming for MySQL storage

665f1fd

ankush force-pushed the mysql_optimized_naming branch from adebfbc to 665f1fd Compare March 10, 2024 15:52

ankush marked this pull request as ready for review March 10, 2024 15:57

ankush requested review from a team and akhilnarang and removed request for a team March 10, 2024 15:57

ankush enabled auto-merge March 10, 2024 15:57

ankush disabled auto-merge March 10, 2024 16:02

ankush enabled auto-merge March 10, 2024 16:04

ankush added the defer backport label (Backports for some PR are deferred for a week or two to test them properly before releasing) Mar 10, 2024

ankush merged commit 71a7305 into frappe:develop Mar 10, 2024
23 checks passed

ankush deleted the mysql_optimized_naming branch March 10, 2024 16:05

This was referenced Mar 14, 2024

Change naming for Raven Message to autoincrement The-Commit-Company/raven#751

Closed

perf: use base32 space for random names instead of base16 #25497

Merged

mergify bot mentioned this pull request Mar 19, 2024

perf: use base32 space for random names instead of base16 (backport #25497) #25546

Merged

github-actions bot locked as resolved and limited conversation to collaborators Mar 26, 2024
