Add DataKernel serializer (with ASCII optimization) #55
The DataKernel serializer is very limited and can be used in only a few situations.
That's all I know from its tutorial. I haven't tested its performance, but being fast is far from enough; it has a long way to go before it becomes a production-ready component. Tell me if I've got anything wrong here.
myshzzx, thank you for the feedback.
That's a big sacrifice for extreme speed. As far as I know, it fills a gap in jvm-serializers.
I think it is fine to discuss the various trade-offs of codecs on the mailing list, but I do not think there are any specific criteria for inclusion, whether regarding performance, implementation, or even limitations.
Well, actually, the DataKernel serializer supports UTF-8 strings (by default) as well as ASCII and UTF-16 strings (via annotations), so you can pick the best encoding for the data being serialized. According to the jvm-serializers wiki, there is a subcategory of manually optimized serializers, so using parameters that give the best result is allowed. The ASCII optimization is quite similar to the ‘Nullable’ optimization, which is common in other serializers. Nevertheless, according to our measurements, running the benchmarks without the ASCII annotation gives slightly slower, but still the fastest, ‘total’ times for the DataKernel serializer compared to the other serializers.
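To make the string-format trade-off concrete, here is a minimal Java sketch of the three encodings mentioned above. The `StringFormats` class and its methods are hypothetical and are not DataKernel's API (which selects the format through annotations); length prefixes are omitted. It shows that ASCII writes exactly one byte per char with no multi-byte handling, and that it silently corrupts non-ASCII input.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration of string formats; not DataKernel's implementation.
public class StringFormats {

    // ASCII: one byte per char, no multi-byte branch; only correct when every char is < 0x80.
    public static byte[] writeAscii(String s) {
        byte[] out = new byte[s.length()];
        for (int i = 0; i < s.length(); i++) {
            out[i] = (byte) s.charAt(i); // a non-ASCII char is silently truncated to its low byte
        }
        return out;
    }

    // Matching ASCII reader: each byte becomes one char.
    public static String readAscii(byte[] bytes) {
        char[] chars = new char[bytes.length];
        for (int i = 0; i < bytes.length; i++) {
            chars[i] = (char) bytes[i]; // only meaningful for bytes 0..127
        }
        return new String(chars);
    }

    // UTF-8: 1 to 4 bytes per code point; handles any string.
    public static byte[] writeUtf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // UTF-16: 2 bytes per char (big-endian, no byte-order mark).
    public static byte[] writeUtf16(String s) {
        return s.getBytes(StandardCharsets.UTF_16BE);
    }

    public static void main(String[] args) {
        String uri = "http://example.com/media/1";
        // prints 26 26 52
        System.out.println(writeAscii(uri).length + " " + writeUtf8(uri).length + " " + writeUtf16(uri).length);

        String title = "Caf\u00e9";
        // prints false: the ASCII path mangles the e-acute
        System.out.println(readAscii(writeAscii(title)).equals(title));
        // prints true: UTF-8 round-trips any string
        System.out.println(new String(writeUtf8(title), StandardCharsets.UTF_8).equals(title));
    }
}
```

For ASCII-only data, ASCII and UTF-8 produce the same bytes; the ASCII path simply skips the per-character check for multi-byte sequences, which is where its speed advantage would come from.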
A tangent regarding the ASCII optimization: basically, I think we shouldn't […]

I think it's helpful to differentiate between the schema definition and the […]

I think this reflects a common real-world scenario: most string fields […]

Similarly, we only allow the non-null optimization if the schema specifies […]

On Mon, Sep 28, 2015 at 10:26 AM, Vitaliy Mykhalko <notifications@github.com
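The non-null optimization referred to in this thread boils down to whether a presence marker is written for each field. A minimal Java sketch, assuming a hypothetical `NullableFields` class (not any serializer's real API) and a one-byte presence flag: a nullable field pays that extra byte, while a field the schema declares non-null can skip it, provided the writer rejects a null value.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical illustration of nullable vs non-null field encoding.
public class NullableFields {

    // Nullable encoding: one presence byte, then the value only if present.
    static void writeNullableInt(DataOutputStream out, Integer value) throws IOException {
        if (value == null) {
            out.writeByte(0);
        } else {
            out.writeByte(1);
            out.writeInt(value);
        }
    }

    // Non-null encoding: the schema guarantees a value, so no presence byte is written.
    static void writeNonNullInt(DataOutputStream out, int value) throws IOException {
        out.writeInt(value);
    }

    // Returns the number of bytes needed to encode the value under the given schema rule.
    public static int encodedSize(Integer value, boolean schemaSaysNonNull) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        try {
            if (schemaSaysNonNull) {
                if (value == null) {
                    throw new IllegalArgumentException("schema declares this field non-null");
                }
                writeNonNullInt(out, value);
            } else {
                writeNullableInt(out, value);
            }
            out.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e); // cannot happen with an in-memory stream
        }
        return bytes.size();
    }

    public static void main(String[] args) {
        System.out.println(encodedSize(42, false));   // prints 5
        System.out.println(encodedSize(null, false)); // prints 1
        System.out.println(encodedSize(42, true));    // prints 4
    }
}
```

The saving is only safe because the non-null path refuses a null value instead of writing something ambiguous, which is the same reasoning applied to ASCII-only string fields.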
I fully agree with @cakoose. I don't see a problem with a codec optimizing for likely cases (say, ASCII), but I think it must be capable of handling non-ASCII. This is one reason why we should perhaps force the use of a couple of non-ASCII characters; I know there are 4 different test files, and some of them contain such characters.
I agree that Unicode support is usually needed (and the DataKernel serializer still gives great results with Unicode, so we are OK with UTF-8). But I think the ASCII optimization can be really useful in many cases:
Maybe the ASCII optimization should be allowed for certain kinds of fields, like "uri" and "format", and disallowed for the remaining fields, which could potentially contain Unicode? Wouldn't that give a more balanced and realistic benchmark?
@vmykh If the schema (whatever its source: external, or based on the Java class definition) indicates that the datatype can NOT contain non-ASCII content, sure. I have no specific objection to denoting that for the fields you mention, while leaving others like "title" and "description" as full Unicode.
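The rule proposed here (ASCII only where the schema rules out non-ASCII content, full Unicode elsewhere) can be sketched in Java as a per-field encoding table. `SchemaDrivenStrings` and its field map are hypothetical and only mirror the field names discussed in this thread; they are not the benchmark's or DataKernel's code. Unlike a blind ASCII writer, this one rejects non-ASCII input in an ASCII-only field instead of corrupting it.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Hypothetical per-field string encoding driven by a schema.
public class SchemaDrivenStrings {

    enum Encoding { ASCII, UTF8 }

    // Only fields whose schema excludes non-ASCII content get the ASCII fast path.
    static final Map<String, Encoding> SCHEMA = new HashMap<>();
    static {
        SCHEMA.put("uri", Encoding.ASCII);
        SCHEMA.put("format", Encoding.ASCII);
        SCHEMA.put("title", Encoding.UTF8);
        SCHEMA.put("description", Encoding.UTF8);
    }

    public static byte[] encodeField(String field, String value) {
        Encoding encoding = SCHEMA.get(field);
        if (encoding == null) {
            throw new IllegalArgumentException("unknown field: " + field);
        }
        if (encoding == Encoding.ASCII) {
            byte[] out = new byte[value.length()];
            for (int i = 0; i < value.length(); i++) {
                char c = value.charAt(i);
                if (c >= 0x80) {
                    // Fail fast: the schema promised ASCII, so this is a data error.
                    throw new IllegalArgumentException("non-ASCII character in ASCII-only field " + field);
                }
                out[i] = (byte) c;
            }
            return out;
        }
        // Unicode-capable fields always go through UTF-8.
        return value.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(encodeField("uri", "http://example.com/a").length); // prints 20
        System.out.println(encodeField("title", "Caf\u00e9").length);         // prints 5
    }
}
```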
I think the ASCII optimization is a useful one to make. Many users will […]

Long story (again, sorry): Having been around this project for a long time has taught me how difficult […]

We've since made changes where we thought the benefit was unambiguous. For […]

But the fact remains that our test value is still very biased. For […]

One way to improve this project is to have multiple test values that try […]

But as it stands now, we're only using a single test schema/value. To me, […]

On Wed, Sep 30, 2015 at 1:13 PM, Tatu Saloranta notifications@github.com
Okay, I've updated this pull request, and now the ASCII optimization is used only for the "uri" and "format" fields. I've also created another pull request in which the serializer doesn't use the ASCII optimization at all. So you can decide which one better conforms to the requirements and merge it.
Favoured #56 over ASCII optimizations.
Hi, I'd like to add the DataKernel serializer.
It's an extremely fast and space-efficient serializer, built using bytecode engineering.
As you can see from the benchmark, it is ~1.5 times faster than the closest competitor (considering total serialization + deserialization time).
Here you can find more info and examples. The source code can be found here. The DataKernel serializer is also available on Maven Central.
The jar file was built with the Maven Shade Plugin, which bundles all dependencies (such as ObjectWeb ASM) into the jar and avoids conflicts if you're using the same dependency in a different version. Such a jar is called an uber-jar. More details.