Skip to content
This repository was archived by the owner on Jan 26, 2026. It is now read-only.

Merge and Project - https://github.com/imprint-serde/imprint-java/issues/12#15

Merged
expanded-for-real merged 42 commits intomainfrom
dev
Jun 9, 2025
Merged

Merge and Project - https://github.com/imprint-serde/imprint-java/issues/12#15
expanded-for-real merged 42 commits intomainfrom
dev

Conversation

@expanded-for-real
Copy link
Collaborator

Adds merge and project APIs - #12. Note that these are not yet zero-copy so imprint-serde/imprint#10 is still in play for the Java version as well

Also addresses comments from the previous PR but in doing so I discovered why serialization is not currently optimal. There are a lot of redundant sorting conversions back and forth that need to be reviewed and the abstraction will need to be changed slightly (also realizing I don't need a Builder class and a Writer so will consolidate). I'll create an issue describing the process since I wanted to get the merge and project API out there on this one and didn't want to load it up too much

expanded-for-real and others added 30 commits June 1, 2025 13:23
…mance tracking; add comprehensive String benchmark
Try to enhance string deserialization
A full list of enhancements can be found here - #3
…ome micro-optimizations added that were found along the way
* Full comprehensive comparison tests with a lot of other libraries + some micro-optimizations added that were found along the way

* replace deprecated gradle methods with latest

---------

Co-authored-by: expand3d <>
# Conflicts:
#	src/jmh/java/com/imprint/benchmark/ComparisonBenchmark.java
#	src/main/java/com/imprint/core/ImprintRecord.java
#	src/main/java/com/imprint/types/TypeHandler.java
#	src/main/java/com/imprint/types/Value.java
expand3d added 12 commits June 5, 2025 15:23
…es in gradle file. Also fix permission issue
# Conflicts:
#	.github/workflows/ci.yml
#	build.gradle
#	src/jmh/java/com/imprint/benchmark/ComparisonBenchmark.java
#	src/main/java/com/imprint/core/ImprintRecord.java
#	src/main/java/com/imprint/types/TypeHandler.java
#	src/main/java/com/imprint/types/Value.java
# Conflicts:
#	.github/workflows/ci.yml
#	build.gradle
#	src/jmh/java/com/imprint/benchmark/ComparisonBenchmark.java
#	src/main/java/com/imprint/core/ImprintRecord.java
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll update this later once we fix serialization slowness and add real merge/project comparison testing

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just kinda copied the Rust impl on this one. note that it isn't zero copy either here

* Takes ownership of the byte array. Caller must not modify after construction.
*/
public BytesValue(byte[] value) {
this.value = value.clone();
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

very minor performance tweak

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added tests for merge and project

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added tests for merge and project

// Lazy-loaded directory state
private List<DirectoryEntry> parsedDirectory;
// Lazy-loaded directory state. Needs to maintain ordering so that we can binary search the endOffset
private TreeMap<Integer, DirectoryEntry> parsedDirectory;
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Had to use a tree map here. The abstraction between the writer and this class needs some work to optimize out all the excessive sorting I'm doing

// Lazy-loaded directory state. Needs to maintain ordering so that we can binary search the endOffset
private TreeMap<Integer, DirectoryEntry> parsedDirectory;
private boolean directoryParsed = false;
private int directoryCount = -1; // Cached count to avoid repeated VarInt decoding
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got rid of directory count caching

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed a bunch of my javadoc comments from before since the private methods changed. Will add back after I fix the serialization hotpath and settle on a optimal abstraction here

Copy link
Contributor

@agavra agavra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

Comment on lines +147 to +148
// Create header preserving first record's schema ID
var newHeader = new Header(first.getHeader().getFlags(), first.getHeader().getSchemaId(), mergedPayload.remaining());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oops that's actually a bug in the rust implementation - this would generate a new schema ID (given the current spec of schema id). I'm not totally sure how I want to use schema IDs and there's some work still to figure that out (you can watch imprint-serde/imprint#8 for when we start working on that).

For now let's just make this a TODO so we don't lose track and hardcode something like 0xdeadbeef so it's "obviously" wrong.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I saw that and wasn't sure but I am keeping a watch on imprint-serde/imprint#8 as well!

* - ByteBuffer operations
*/
@Disabled("Enable manually for profiling")
//@Disabled("Enable manually for profiling")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

probably meant to comment this back in before merging

@expanded-for-real expanded-for-real merged commit e5340c5 into main Jun 9, 2025
3 checks passed
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants