Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor (Hyper)ClockCache code for upcoming changes #11572

Closed
wants to merge 4 commits into from

Conversation

pdillinger
Copy link
Contributor

@pdillinger pdillinger commented Jun 28, 2023

Summary: Separate out some functionality that will be common to both static and dynamic HCC into BaseClockTable. Table::InsertState and GrowIfNeeded will be used by the dynamic HCC so don't make much sense right now.

Test Plan: existing tests. No functional changes intended.

Performance test in subsequent PR #11601

@facebook-github-bot
Copy link
Contributor

@pdillinger has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

Summary: Separate out some functionality that will be common to both
static and dynamic HCC into BaseClockTable. Table::InsertState and
GrowIfNeeded will be used by the dynamic HCC so don't make much sense
right now.

One small functional change in insertion has the process abort as soon
as it finds a pre-existing match (already_matches & FindSlot). This
wasn't strictly needed for performance but cleans up the calling of
TryInsert as we now don't try to call it after we've already seen a
match.

Test Plan: existing tests. TODO: performance tests
@facebook-github-bot
Copy link
Contributor

@pdillinger has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot
Copy link
Contributor

@pdillinger has updated the pull request. You must reimport the pull request before landing.

pdillinger added a commit to pdillinger/rocksdb that referenced this pull request Jul 11, 2023
Summary:
Stacked on facebook#11572
* Minimize use of std::function and lambdas to minimize chances of
compiler heap-allocating closures (unnecessary stress on allocator). It
appears that converting FindSlot to a template enables inlining the
lambda parameters, avoiding heap allocations.
* Clean up some logic with FindSlot
* Fix handling of rare case of probing all slots, with new unit test.
(Previously Insert would not roll back displacements in that case, which
would kill performance if it were to happen.)
* Add an -early_exit option to cache_bench for gathering memory stats
before deallocation.

Test Plan: unit test added for probing all slots

 ## Seeing heap allocations
Run `MALLOC_CONF="stats_print:true" ./cache_bench -cache_type=hyper_clock_cache`
before facebook#11572 vs. after this change. Before, we see this in the
interesting bin statistics:
```
size  nrequests
----  ---------
  32     578460
  64      24340
8192     578460
```
And after:
```
size  nrequests
----  ---------
  32  (insignificant)
  64      24370
8192     579130
```

 ## Performance test
Build with `make USE_CLANG=1 PORTABLE=0 DEBUG_LEVEL=0 -j32 cache_bench`

Run `./cache_bench -cache_type=hyper_clock_cache -ops_per_thread=5000000`
in before and after configurations, simultaneously:

Before: Complete in 33.244 s; Rough parallel ops/sec = 2406442
After:  Complete in 32.773 s; Rough parallel ops/sec = 2441019
@facebook-github-bot
Copy link
Contributor

@pdillinger has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@pdillinger has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot
Copy link
Contributor

@pdillinger has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

pdillinger added a commit to pdillinger/rocksdb that referenced this pull request Jul 12, 2023
Summary:
Stacked on facebook#11572
* Minimize use of std::function and lambdas to minimize chances of
compiler heap-allocating closures (unnecessary stress on allocator). It
appears that converting FindSlot to a template enables inlining the
lambda parameters, avoiding heap allocations.
* Clean up some logic with FindSlot
* Fix handling of rare case of probing all slots, with new unit test.
(Previously Insert would not roll back displacements in that case, which
would kill performance if it were to happen.)
* Add an -early_exit option to cache_bench for gathering memory stats
before deallocation.

Test Plan: unit test added for probing all slots

 ## Seeing heap allocations
Run `MALLOC_CONF="stats_print:true" ./cache_bench -cache_type=hyper_clock_cache`
before facebook#11572 vs. after this change. Before, we see this in the
interesting bin statistics:
```
size  nrequests
----  ---------
  32     578460
  64      24340
8192     578460
```
And after:
```
size  nrequests
----  ---------
  32  (insignificant)
  64      24370
8192     579130
```

 ## Performance test
Build with `make USE_CLANG=1 PORTABLE=0 DEBUG_LEVEL=0 -j32 cache_bench`

Run `./cache_bench -cache_type=hyper_clock_cache -ops_per_thread=5000000`
in before and after configurations, simultaneously:

Before: Complete in 33.244 s; Rough parallel ops/sec = 2406442
After:  Complete in 32.773 s; Rough parallel ops/sec = 2441019
pdillinger added a commit to pdillinger/rocksdb that referenced this pull request Jul 12, 2023
Summary: ... ahead of planned dynamic HCC variant. This changes
simplifies some logic while still enabling future code sharing between
implementation variants.

Stacked on facebook#11572

Test Plan: existing tests
@jowlyzhang jowlyzhang assigned jowlyzhang and unassigned jowlyzhang Jul 12, 2023
@jowlyzhang jowlyzhang self-requested a review July 12, 2023 17:56
// required, and the operation should fail if not possible.
// NOTE: Otherwise, occupancy_ is not managed in this function
template <class Table>
Status ChargeUsageMaybeEvictStrict(size_t total_charge, size_t capacity,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: this and below ChargeUsageMaybeEvictNonStrict were private before, should they continue to have more restricted access.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. I'll fix this in #11609

Copy link
Contributor

@jowlyzhang jowlyzhang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks for the PR!

@facebook-github-bot
Copy link
Contributor

@pdillinger merged this pull request in c3c84b3.

pdillinger added a commit to pdillinger/rocksdb that referenced this pull request Jul 12, 2023
Summary: ... ahead of planned dynamic HCC variant. This changes
simplifies some logic while still enabling future code sharing between
implementation variants.

Stacked on facebook#11572

Test Plan: existing tests
pdillinger added a commit to pdillinger/rocksdb that referenced this pull request Jul 12, 2023
Summary:
Stacked on facebook#11572
* Minimize use of std::function and lambdas to minimize chances of
compiler heap-allocating closures (unnecessary stress on allocator). It
appears that converting FindSlot to a template enables inlining the
lambda parameters, avoiding heap allocations.
* Clean up some logic with FindSlot
* Fix handling of rare case of probing all slots, with new unit test.
(Previously Insert would not roll back displacements in that case, which
would kill performance if it were to happen.)
* Add an -early_exit option to cache_bench for gathering memory stats
before deallocation.

Test Plan: unit test added for probing all slots

 ## Seeing heap allocations
Run `MALLOC_CONF="stats_print:true" ./cache_bench -cache_type=hyper_clock_cache`
before facebook#11572 vs. after this change. Before, we see this in the
interesting bin statistics:
```
size  nrequests
----  ---------
  32     578460
  64      24340
8192     578460
```
And after:
```
size  nrequests
----  ---------
  32  (insignificant)
  64      24370
8192     579130
```

 ## Performance test
Build with `make USE_CLANG=1 PORTABLE=0 DEBUG_LEVEL=0 -j32 cache_bench`

Run `./cache_bench -cache_type=hyper_clock_cache -ops_per_thread=5000000`
in before and after configurations, simultaneously:

Before: Complete in 33.244 s; Rough parallel ops/sec = 2406442
After:  Complete in 32.773 s; Rough parallel ops/sec = 2441019
facebook-github-bot pushed a commit that referenced this pull request Jul 14, 2023
Summary:
Stacked on #11572
* Minimize use of std::function and lambdas to minimize chances of
compiler heap-allocating closures (unnecessary stress on allocator). It
appears that converting FindSlot to a template enables inlining the
lambda parameters, avoiding heap allocations.
* Clean up some logic with FindSlot (FIXMEs from #11572)
* Fix handling of rare case of probing all slots, with new unit test.
(Previously Insert would not roll back displacements in that case, which
would kill performance if it were to happen.)
* Add an -early_exit option to cache_bench for gathering memory stats
before deallocation.

Pull Request resolved: #11601

Test Plan:
unit test added for probing all slots

## Seeing heap allocations
Run `MALLOC_CONF="stats_print:true" ./cache_bench -cache_type=hyper_clock_cache`
before #11572 vs. after this change. Before, we see this in the
interesting bin statistics:

```
size  nrequests
----  ---------
  32     578460
  64      24340
8192     578460
```
And after:
```
size  nrequests
----  ---------
  32  (insignificant)
  64      24370
8192     579130
```

## Performance test
Build with `make USE_CLANG=1 PORTABLE=0 DEBUG_LEVEL=0 -j32 cache_bench`

Run `./cache_bench -cache_type=hyper_clock_cache -ops_per_thread=5000000`
in before and after configurations, simultaneously:

```
Before: Complete in 33.244 s; Rough parallel ops/sec = 2406442
After:  Complete in 32.773 s; Rough parallel ops/sec = 2441019
```

Reviewed By: jowlyzhang

Differential Revision: D47375092

Pulled By: pdillinger

fbshipit-source-id: 46f0f57257ddb374290a0a38c651764ea60ba410
jowlyzhang pushed a commit to jowlyzhang/rocksdb that referenced this pull request Jul 18, 2023
Summary:
Separate out some functionality that will be common to both static and dynamic HCC into BaseClockTable. Table::InsertState and GrowIfNeeded will be used by the dynamic HCC so don't make much sense right now.

Pull Request resolved: facebook#11572

Test Plan:
existing tests. No functional changes intended.

Performance test in subsequent PR facebook#11601

Reviewed By: jowlyzhang

Differential Revision: D47110496

Pulled By: pdillinger

fbshipit-source-id: 379bd433322a42ea28c0043b41ec24956d21e7aa
pdillinger added a commit to pdillinger/rocksdb that referenced this pull request Jul 18, 2023
Summary: ... ahead of planned dynamic HCC variant. This changes
simplifies some logic while still enabling future code sharing between
implementation variants.

Stacked on facebook#11572

Test Plan: existing tests
facebook-github-bot pushed a commit that referenced this pull request Jul 18, 2023
Summary:
... ahead of planned dynamic HCC variant. This changes
simplifies some logic while still enabling future code sharing between
implementation variants.

Detail: For complicated reasons, using a std::function parameter to
`ConstApplyToEntriesRange` with a lambda argument does not play
nice with templated HandleImpl. An explicit conversion to std::function
would be needed for it to compile. Templating the function type is the
easy work-around.

Also made some functions from #11572 private as recommended

Pull Request resolved: #11609

Test Plan: existing tests

Reviewed By: jowlyzhang

Differential Revision: D47407415

Pulled By: pdillinger

fbshipit-source-id: 0f65954db16335999b78fb7d2563ec627624cef0
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants