Merge and Project - https://github.com/imprint-serde/imprint-java/issues/12#15
Merge and Project - https://github.com/imprint-serde/imprint-java/issues/12#15expanded-for-real merged 42 commits intomainfrom
Conversation
…mance tracking; add comprehensive String benchmark
…to remove a bunch of stuff
…ing deserialization a bit faster
Try to enhance string deserialization
A full list of enhancements can be found here - #3
…ome micro-optimizations added that were found along the way
* Full comprehensive comparison tests with a lot of other libraries + some micro-optimizations added that were found along the way * replace deprecated gradle methods with latest --------- Co-authored-by: expand3d <>
# Conflicts: # src/jmh/java/com/imprint/benchmark/ComparisonBenchmark.java # src/main/java/com/imprint/core/ImprintRecord.java # src/main/java/com/imprint/types/TypeHandler.java # src/main/java/com/imprint/types/Value.java
…with nested objects
…es in gradle file. Also fix permission issue
…Comparison tests iterations manually
# Conflicts: # .github/workflows/ci.yml # build.gradle # src/jmh/java/com/imprint/benchmark/ComparisonBenchmark.java # src/main/java/com/imprint/core/ImprintRecord.java # src/main/java/com/imprint/types/TypeHandler.java # src/main/java/com/imprint/types/Value.java
# Conflicts: # .github/workflows/ci.yml # build.gradle # src/jmh/java/com/imprint/benchmark/ComparisonBenchmark.java # src/main/java/com/imprint/core/ImprintRecord.java
There was a problem hiding this comment.
I'll update this later once we fix serialization slowness and add real merge/project comparison testing
There was a problem hiding this comment.
Just kinda copied the Rust impl on this one. note that it isn't zero copy either here
| * Takes ownership of the byte array. Caller must not modify after construction. | ||
| */ | ||
| public BytesValue(byte[] value) { | ||
| this.value = value.clone(); |
There was a problem hiding this comment.
very minor performance tweak
There was a problem hiding this comment.
added tests for merge and project
There was a problem hiding this comment.
added tests for merge and project
| // Lazy-loaded directory state | ||
| private List<DirectoryEntry> parsedDirectory; | ||
| // Lazy-loaded directory state. Needs to maintain ordering so that we can binary search the endOffset | ||
| private TreeMap<Integer, DirectoryEntry> parsedDirectory; |
There was a problem hiding this comment.
Had to use a tree map here. The abstraction between the writer and this class needs some work to optimize out all the excessive sorting I'm doing
| // Lazy-loaded directory state. Needs to maintain ordering so that we can binary search the endOffset | ||
| private TreeMap<Integer, DirectoryEntry> parsedDirectory; | ||
| private boolean directoryParsed = false; | ||
| private int directoryCount = -1; // Cached count to avoid repeated VarInt decoding |
There was a problem hiding this comment.
Got rid of directory count caching
There was a problem hiding this comment.
Removed a bunch of my javadoc comments from before since the private methods changed. Will add back after I fix the serialization hotpath and settle on a optimal abstraction here
| // Create header preserving first record's schema ID | ||
| var newHeader = new Header(first.getHeader().getFlags(), first.getHeader().getSchemaId(), mergedPayload.remaining()); |
There was a problem hiding this comment.
oops that's actually a bug in the rust implementation - this would generate a new schema ID (given the current spec of schema id). I'm not totally sure how I want to use schema IDs and there's some work still to figure that out (you can watch imprint-serde/imprint#8 for when we start working on that).
For now let's just make this a TODO so we don't lose track and hardcode something like 0xdeadbeef so it's "obviously" wrong.
There was a problem hiding this comment.
Yeah I saw that and wasn't sure but I am keeping a watch on imprint-serde/imprint#8 as well!
| * - ByteBuffer operations | ||
| */ | ||
| @Disabled("Enable manually for profiling") | ||
| //@Disabled("Enable manually for profiling") |
There was a problem hiding this comment.
probably meant to comment this back in before merging
Adds merge and project APIs - #12. Note that these are not yet zero-copy so imprint-serde/imprint#10 is still in play for the Java version as well
Also addresses comments from the previous PR but in doing so I discovered why serialization is not currently optimal. There are a lot of redundant sorting conversions back and forth that need to be reviewed and the abstraction will need to be changed slightly (also realizing I don't need a Builder class and a Writer so will consolidate). I'll create an issue describing the process since I wanted to get the merge and project API out there on this one and didn't want to load it up too much