Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a new int64 add merge operator, also with Java API #11202

Open
wants to merge 29 commits into
base: main
Choose a base branch
from

Conversation

alanpaxton
Copy link
Contributor

@alanpaxton alanpaxton commented Feb 8, 2023

Supercedes #6884 and #6877 (cc @adamretter )

This branch/PR is based on a copy of the code in #6884, which given the age of the PR we chose to restart, though the core of the merge operator is carried over.

This merge operator adds int64_t values. At a cursory glance, this may appear similar to UInt64AddOperator, but this merge operator handles signed numbers. It can also handle both Addition or Subtraction by merging either positive or negative integers.

Values are stored into the database using 8-bit variable-byte encoding (as suggested by @pdillinger). Because the length is known, we do not require a bit flag for continuation. So for smaller positive numbers the value stored has the advantage of also consuming less bytes than UInt64AddOperator. The most significant stored byte is a 2s complement value which is sign-extended on decoding.

Therefore the range -128..127 can be stored in 1 bytes, -32768..32767 in 2 bytes, etc.
Zig-zag encoded values are in turn encoded as varints in the data. Varints now implement the 8-bit variable-length encoding (known length) suggested by @pdillinger.

A Java API to this merge operator has been added. In addition, a small Java utility is provided which implements the same encoding as the merge operator.

Changing the UInt64MergeOperator to use a similar varint encoding would require that a "new" operator be created, in order to avoid backward compatibility breaking.

  • The Makefile now builds a single merge_operators_test binary which combines the GTEST for this merge operator with that of stringappend. This addresses your concerns re additional test binaries.
  • I have regenerated TARGETS as per instructions.

Copy link
Contributor

@pdillinger pdillinger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We would appreciate if reviewers could help decide whether this is a useful library for customer API or whether other facilities can do what this does. Can clients always merge() (i.e. never put()) and fetch a merged value back via get() ? I don't know the details of RocksDB well enough to answer this.

The RocksDB user needs to encode their data for going into Merge (or Put) and coming out from Get, so the user does need serialization / deserialization compatible with the operator.

utilities/merge_operators/int64add/int64_add.cc Outdated Show resolved Hide resolved
@alanpaxton alanpaxton force-pushed the eb-1879-int64add-merge-operator branch 4 times, most recently from b09dd98 to fbbb6f8 Compare March 2, 2023 16:12
@facebook-github-bot
Copy link
Contributor

@dannyhchen has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@anand1976 anand1976 requested a review from pdillinger May 17, 2023 22:30
@alanpaxton alanpaxton force-pushed the eb-1879-int64add-merge-operator branch from fbbb6f8 to 47d81d1 Compare July 6, 2023 11:40
@facebook-github-bot
Copy link
Contributor

@alanpaxton has updated the pull request. You must reimport the pull request before landing.

@alanpaxton
Copy link
Contributor Author

@anand1976 @adamretter rebased and once again ready for review/import.

@adamretter
Copy link
Collaborator

@anand1976 rebased and once again ready for review/import.

@adamretter adamretter force-pushed the eb-1879-int64add-merge-operator branch from 47d81d1 to f7abeed Compare July 28, 2023 14:34
@facebook-github-bot
Copy link
Contributor

@alanpaxton has updated the pull request. You must reimport the pull request before landing.

@alanpaxton alanpaxton force-pushed the eb-1879-int64add-merge-operator branch from f7abeed to cde1f97 Compare September 12, 2023 14:18
@facebook-github-bot
Copy link
Contributor

@alanpaxton has updated the pull request. You must reimport the pull request before landing.

@alanpaxton
Copy link
Contributor Author

@anand1976 I have once again rebased this and it is ready for review/import.

@alanpaxton alanpaxton force-pushed the eb-1879-int64add-merge-operator branch from cde1f97 to f5e7bd9 Compare October 25, 2023 14:26
@facebook-github-bot
Copy link
Contributor

@alanpaxton has updated the pull request. You must reimport the pull request before landing.

2 similar comments
@facebook-github-bot
Copy link
Contributor

@alanpaxton has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot
Copy link
Contributor

@alanpaxton has updated the pull request. You must reimport the pull request before landing.

@alanpaxton alanpaxton force-pushed the eb-1879-int64add-merge-operator branch from d228702 to 84424f4 Compare October 25, 2023 16:46
@facebook-github-bot
Copy link
Contributor

@alanpaxton has updated the pull request. You must reimport the pull request before landing.

@alanpaxton alanpaxton force-pushed the eb-1879-int64add-merge-operator branch from 84424f4 to cfad4ca Compare October 25, 2023 17:47
@facebook-github-bot
Copy link
Contributor

@alanpaxton has updated the pull request. You must reimport the pull request before landing.

@alanpaxton alanpaxton force-pushed the eb-1879-int64add-merge-operator branch from cfad4ca to f0e9a37 Compare October 30, 2023 08:41
@facebook-github-bot
Copy link
Contributor

@alanpaxton has updated the pull request. You must reimport the pull request before landing.

1 similar comment
@facebook-github-bot
Copy link
Contributor

@alanpaxton has updated the pull request. You must reimport the pull request before landing.

alanpaxton and others added 28 commits May 7, 2024 10:32
Build a Java API for the new Int64AddMergeOperator that looks like all the other merge operators. Add tests.

Individual operator tests which are part of merge_operators_test shouldn’t be in TEST_MAIN_SOURCES

Add int64 merge operator to the tests which mention the other merge operators.
Include the header that says `extern “C” { … `
Implement encoding and zigzag in Java so that we can test it more thoroughly.
Exporting the int64 merge operator to Java requires support for encoding to make it useful.

It also requires that support in order to test it from Java.
Accidentally re-introduced deleted loading code when rebasing.
Now removed that again, and added the loading of the new int64add merge operator into the new loading code.
Moved the new int64add operator into ROCKSDB_NAMESPACE because that’s where the other merge operators are. Split the class into .h and .cc
In the context in which we use them (values in RocksDB (key,value)-pairs, value lengths are always known, so they do not need a continuation bit and a 7-bit encoding, but can be encoded more efficiently as an 8-bit encoding.

This has the added advantage of being consistent with fixed size encodings, which can be seen as having trailing-zeros in the most significant bytes, which are truncated in this encoding.
$ python3 buckifier/buckify_rocksdb.py
The problem only shows up when ASSERT_STATUS_CHECKED=1,
i.e. running changed merge_operators_test under CI
make format doesn’t format this but does complain
Don’t need to turn them into zigzag first. There end up being 2 branches in encoding anyway, but there’s perhaps a little less actual code all told.
Changed format in core C++ 8 bit varint encoding, so the Java shadow implementation (used for testing) needs to work the same way.
stdout or stderr was causing a decoding error. Make the error mode “backslashreplace” instead of the default “strict” so we have a chance of seeing the output.
@alanpaxton alanpaxton force-pushed the eb-1879-int64add-merge-operator branch from 7e762e8 to 16502aa Compare May 7, 2024 09:32
@facebook-github-bot
Copy link
Contributor

@alanpaxton has updated the pull request. You must reimport the pull request before landing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants