Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RocksJava BytewiseComparator semantics? #2001

Closed
mikhail-antonov opened this issue Mar 17, 2017 · 9 comments
Closed

RocksJava BytewiseComparator semantics? #2001

mikhail-antonov opened this issue Mar 17, 2017 · 9 comments

Comments

@mikhail-antonov
Copy link
Contributor

mikhail-antonov commented Mar 17, 2017

Hi, I'm using RocksJava to generate SST file from my Java code, and have been seeing an issue where I'm writing out the data that are lexicographically sorted, but getting occasional error, that I've narrowed down to an attempt to append out-of-order key.

The error I'm getting:

Caused by: org.rocksdb.RocksDBException: � at org.rocksdb.SstFileWriter.add(Native Method) at org.rocksdb.SstFileWriter.add(SstFileWriter.java:24)

This is code to repro:

public class TestSstFileWriter {
  public static void main(String[] args) throws RocksDBException {
    EnvOptions envOpts = new EnvOptions();
    Options opts = new Options();
    SstFileWriter writer = new SstFileWriter(envOpts, opts,
      new BytewiseComparator(new ComparatorOptions()));
    writer.open("/tmp/test_file_with_weird_keys.sst");
    byte[] gKey = Bytes.fromHex("000000293030303030303030303030303030303030303032303736343730696E666F33");
    byte[] wKey = Bytes.fromHex("0000008d3030303030303030303030303030303030303030303437363433696e666f34");
    writer.add(new Slice(gKey), new Slice(dummyV));
    writer.add(new Slice(wKey), new Slice(dummyV));
    writer.finish();
  }
}

Where fromHex conversion is done as in http://grepcode.com/file/repo1.maven.org/maven2/org.apache.hbase/hbase-common/1.1.1/org/apache/hadoop/hbase/util/Bytes.java#2357

I think there's something weird with the comparison here. BytewiseComparator in Java relies on ByteBuffer comparator semantics, which is:

public int compareTo(ByteBuffer that) {
        int n = this.position() + Math.min(this.remaining(), that.remaining());
        for (int i = this.position(), j = that.position(); i < n; i++, j++) {
            int cmp = compare(this.get(i), that.get(j));
            if (cmp != 0)
                return cmp;
        }
        return this.remaining() - that.remaining();
    }
    private static int compare(byte x, byte y) {
        return Byte.compare(x, y);
    }

Whereas the comparator semantics in my Java code is as (essentially same as in the class I've linked):

public static int compareTo(byte[] buffer1, int offset1, int length1,
                         byte[] buffer2, int offset2, int length2) {
      // Short circuit equal case
      if (buffer1 == buffer2 &&
        offset1 == offset2 &&
        length1 == length2) {
        return 0;
      }
      // Bring WritableComparator code local
      int end1 = offset1 + length1;
      int end2 = offset2 + length2;
      for (int i = offset1, j = offset2; i < end1 && j < end2; i++, j++) {
        int a = (buffer1[i] & 0xff);
        int b = (buffer2[j] & 0xff);
        if (a != b) {
          return a - b;
        }
      }
      return length1 - length2;
    }

(to overcome byte signness in Java and make it more in line with memcmp).

Thoughts?

@mikhail-antonov
Copy link
Contributor Author

cc @adamretter @IslamAbdelRahman

@IslamAbdelRahman
Copy link
Contributor

Thanks @mikhail-antonov for reporting the issue, I am not sure what is the purpose of us having a BytewiseComparator implementation in java.
Maybe @adamretter will have more insight.

Is it possible to pass the native c++ comparator to SstFileWriter ?
Actually we recently updated the constructor so that we don't need to pass a comparator to SstFileWriter, but we did not update the jni interface
1ffbdfd

@mikhail-antonov
Copy link
Contributor Author

@IslamAbdelRahman I think conceptually being able to implement custom comparator in Java and pass it down to RocksDB is good; in this specific case it doesn't seem it's useful, I think best is to update JNI wrapper to allow param-less constructor as you mentioned, let me have a look at it.

For now I've working that around by implementing custom comparator and passing it to SstFileWriter, but having to implement it seems unnecessary burden.

@mikhail-antonov
Copy link
Contributor Author

So #2028 adds support for SstFileWriter constructor w/o explicit comparator to JNI api, I think with that current Bytewise comparator (both regular and reverse) should be obsolete? Having the only default comparator in Java that one is forced to use that's not compatibly with memcmp seems weird.

@IslamAbdelRahman
Copy link
Contributor

@mikhail-antonov, Yes I think we should remove this comparator. It's confusing and we should remove it from the code base

@adamretter
Copy link
Collaborator

The Java implementations of BytewiseComparator, ReverseBytewiseComparator and DirectBytewiseComparator serve two purposes:

  1. They prove that the ability to implement Comparators in Java is possible. Demonstrated by the BytewiseComparatorTest.
  2. They serve as code examples for others who want to implement Comparators in Java.

One of the important things about (1) is that they try to prove that they are directly compatible with the C++ equivalent comparators via the java_vs_cpp_ and cpp_vs_java tests in BytewiseComparatorTest.

If you want to just use the Native C++ BytewiseComparator from Java. That is entirely possible, and certainly should be preferred over the Java version for performance (as documented in the Javadoc). If you look at Options in the Java API, you can set either a BuiltinComparator (i.e. C++ native comparator) or a AbstractComparator which is a Comparator implemented in Java.

There should likely be another constructor added for SstFileWriter which allows you to specify a BuiltinComparator as opposed to forcing you to use a AbstractCompartator.

@mikhail-antonov The tests seem to show at present that the C++ and Java comparator implementations of Bytewise Comparator are equivalent. Do you believe you have found a bug in the Java BytewiseComparator implementation?

@mikhail-antonov
Copy link
Contributor Author

@adamretter

Thanks for the detailed reply. I totally see the value of being able to implement custom comparators in Java and use it.

Do you believe you have found a bug in the Java BytewiseComparator implementation?

Well, unless I'm missing something, in the first post in this issue I've described how Java and native implementations are different?

https://gist.github.com/mikhail-antonov/ef4fd053f8e47947fc59e6cd559aa77d - is a test that fails for me, Java comparator that is.

@mikhail-antonov
Copy link
Contributor Author

@adamretter any thoughts? I think that with JNI wrapper supporting comparator-less constructor Bytewise comparator in Java isn't needed for the most part, and it does seem to be incompatible with memcmp semantics.

@gfosco
Copy link
Contributor

gfosco commented Jan 10, 2018

Closing this via automation due to lack of activity. If discussion is still needed here, please re-open or create a new/updated issue.

@gfosco gfosco closed this as completed Jan 10, 2018
@adamretter adamretter reopened this Mar 15, 2018
alanpaxton pushed a commit to alanpaxton/rocksdb that referenced this issue Oct 12, 2021
closes facebook#5891
closes facebook#2001

Java BytewiseComparator is now unsigned compliant, consistent with the default C++ comparator, which has always been thus. Consequently 2 tickets reporting the previous broken state can be closed.

 This test confirms that the following issues were in fact resolved
 by a change made between 6.2.2 and 6.22.1,
 to wit facebook@7242dae7
which as part of its effect, changed the Java bytewise comparators.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants