Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

test: show relative order eviction with "fast growing tenant" #6377

Merged
merged 6 commits into from
Jan 25, 2024

Conversation

koivunej
Copy link
Member

@koivunej koivunej commented Jan 17, 2024

Refactor out test_disk_usage_eviction tenant creation and add a custom case with 4 tenants, 3 made with pgbench scale=1 and 1 made with pgbench scale=4.

Because the tenants are created in order of scales [1, 1, 1, 4] this is simple enough to demonstrate the problem with using absolute access times, because on a disk usage based eviction run we will disproportionally target the first scale=1 tenant(s), and the later larger tenant does not lose anything.

In hindsight even test_partial_evict_tenant might be enough to demonstrate this if only creation order is used as access order, I just failed to see it because of the original byte based comparisons and my earlier thinking of the test supporting selecting either scale=4 or scale=6 tenant as the warm tenant. The earlier PR #6375 already added the assertions to that test.

This test is not enough to show the difference between relative_equal and relative_spare (the fudge factor); much larger scale will be needed for "the large tenant", but that will make debug mode tests slower.

What is missing? Well, there is only testing for amount of layers, previously there was only testing for freed bytes. The layers could be evicted in any order, but I am not sure how to add checks for those: the eviction env constrains us to L0.

Cc: #5304

Copy link

github-actions bot commented Jan 17, 2024

2280 tests run: 2196 passed, 0 failed, 84 skipped (full report)


Flaky tests (2)

Postgres 14

  • test_fully_custom_config: debug
  • test_emergency_mode: release

Code coverage (full report)

  • functions: 54.5% (10593 of 19438 functions)
  • lines: 81.6% (60665 of 74374 lines)

The comment gets automatically updated with the latest test results
084b80c at 2024-01-19T19:15:16.550Z :recycle:

@koivunej
Copy link
Member Author

koivunej commented Jan 18, 2024

Weird that adding a test causes another test to fail. Secondary mode doesn't yet have the relative at all, please ignore for now. I'll either impl and fix it on a separate PR or, unsure. Ah, easy to fix.

@koivunej
Copy link
Member Author

koivunej commented Jan 18, 2024

Force pushed for splitting the last commit 087458a to forgotten script/reformat (which did not cause a workflow failure? no, it did fail, I just failed to spot it) and the actual change.

Base automatically changed from dube_rel_access_time_asserts to main January 18, 2024 10:39
Copy link
Contributor

@problame problame left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The assertions in the new test case test_dube_and_fast_growing_tenant aren't very strong.

The (IMO confusing) assertion for ABSOLUTE_ORDER doesn't clearly show the problem with ABSOLUTE_ORDER that's being solved by RELATIVE_ORDER.
I think that's the most important role of the if ABSOLUTE_ORDER branch in this test: demonstrate the problem clearly. I think unrolling the loop would help.


Your disk_usage_eviction_task.rs doc comment is out of date with this PR:

    /// This strategy will evict layers more fairly but is untested.

This PR is the first time I looked at the new eviction strategy. I think "fudge factor" is the wrong term, given that we subtract, not multiply it from tenant_candidates.len().

Also, it seems like we don't set relative_last_activity for secondary tenants layers. Assuming the default is 0, this means we always evict secondary first. That's probably not what we want; the secondary location can at any point in time become primary location, we should keep it equally active.
=> Let's create an issue for this and discuss in person some time.


test_runner/regress/test_disk_usage_eviction.py Outdated Show resolved Hide resolved
@koivunej
Copy link
Member Author

Also, it seems like we don't set relative_last_activity for secondary tenants layers. Assuming the default is 0, this means we always evict secondary first. That's probably not what we want; the secondary location can at any point in time become primary location, we should keep it equally active.
=> Let's create an issue for this and discuss in person some time.

It was on my list to do next. In vienna we discussed that the starting point for secondary evictions will be the fixed (oldest layer on the heatmap so we don't get incrementally more evictions); I had been planning to type that out and discuss on the PR. It might be that heatmap no longer resembles what was thought of back in vienna, but the rule is still very much the same.

However for testing on staging we don't have secondaries there yet this is fine. A test got added for secondary eviction which will need to be changed for these eviction orders as well.

@koivunej
Copy link
Member Author

Two force-pushes for:

  1. rebase was confusing, learning about the new sharding related APIs took me too many rounds
  2. endless reformatting

@koivunej koivunej enabled auto-merge (squash) January 25, 2024 13:38
@koivunej koivunej merged commit 463b6a2 into main Jan 25, 2024
46 checks passed
@koivunej koivunej deleted the dube_fast_growing_tenant branch January 25, 2024 13:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants