Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

remove fastutil dependency #576

Merged
merged 5 commits into from
Apr 4, 2023
Merged

Conversation

ennerf
Copy link
Collaborator

@ennerf ennerf commented Mar 26, 2023

dataset currently depends on it.unimi.dsi:fastutil, which adds a ~19MB dependency for 3 classes that are actually being used. After icu4j the total size with dependencies (other than JavaFX) is about ~29MB, so removing it would reduce the total size by 2/3 to below 10MB.

mvn dependency:copy-dependencies -DincludeScope=runtime
ls -lhS chartfx-chart/target/dependency

 19M fastutil-8.3.1.jar
2.2M commons-math3-3.6.1.jar
1.4M controlsfx-11.1.1.jar
1.2M JTransforms-3.1.jar
574K commons-lang3-3.12.0.jar
433K math-master-SNAPSHOT.jar
387K dataset-master-SNAPSHOT.jar
348K ikonli-fontawesome5-pack-11.5.0.jar
228K JLargeArrays-1.5.jar
164K pngj-2.1.0.jar
156K ikonli-fontawesome-pack-11.5.0.jar
 51K slf4j-api-2.0.0-alpha0.jar
 37K ikonli-javafx-11.5.0.jar
 27K annotations-21.0.1.jar
 11K ikonli-core-11.5.0.jar

In this PR I shaded the fastutil classes and removed some parts that aren't being used (e.g. sorting). Fastutil uses the Apache v2 license, so it could be used as is.

Do you think it'd be worth using the generator to create more custom-fit implementations?

@ennerf ennerf temporarily deployed to coverage March 26, 2023 14:00 — with GitHub Actions Inactive
@ennerf ennerf temporarily deployed to coverage March 26, 2023 14:00 — with GitHub Actions Inactive
@codecov
Copy link

codecov bot commented Mar 26, 2023

Codecov Report

Patch coverage: 26.11% and project coverage change: -0.38 ⚠️

Comparison is base (8d349d9) 51.97% compared to head (4ba3974) 51.59%.

❗ Current head 4ba3974 differs from pull request most recent head 4d527f0. Consider uploading reports for the commit 4d527f0 to get more accurate results

Additional details and impacted files
@@             Coverage Diff              @@
##               main     #576      +/-   ##
============================================
- Coverage     51.97%   51.59%   -0.38%     
- Complexity     6401     6457      +56     
============================================
  Files           363      367       +4     
  Lines         37028    37586     +558     
  Branches       6063     6167     +104     
============================================
+ Hits          19244    19394     +150     
- Misses        16522    16901     +379     
- Partials       1262     1291      +29     
Impacted Files Coverage Δ
...in/java/io/fair_acc/dataset/spi/DoubleDataSet.java 99.36% <ø> (ø)
...va/io/fair_acc/dataset/spi/DoubleErrorDataSet.java 100.00% <ø> (ø)
...ain/java/io/fair_acc/dataset/spi/FloatDataSet.java 99.37% <ø> (ø)
...io/fair_acc/dataset/spi/MultiDimDoubleDataSet.java 98.98% <ø> (ø)
...io/fair_acc/dataset/spi/fastutil/IntArrayList.java 0.00% <0.00%> (ø)
...fair_acc/dataset/spi/utils/StringHashMapList2.java 0.00% <0.00%> (ø)
...va/io/fair_acc/dataset/spi/fastutil/ArrayUtil.java 20.00% <20.00%> (ø)
.../fair_acc/dataset/spi/fastutil/FloatArrayList.java 38.98% <38.98%> (ø)
...fair_acc/dataset/spi/fastutil/DoubleArrayList.java 41.08% <41.08%> (ø)

... and 5 files with indirect coverage changes

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report in Codecov by Sentry.
📢 Do you have feedback about the report comment? Let us know in this issue.

@wirew0rm
Copy link
Member

Thanks for giving this another go.

dataset currently depends on it.unimi.dsi:fastutil, which adds a ~19MB dependency for 3 classes that are actually being used. After icu4j the total size with dependencies (other than JavaFX) is about ~29MB, so removing it would reduce the total size by 2/3 to below 10MB.

I can see that this would be a huge improvement for applications where the size of the resulting jar matters 👍 .

In this PR I shaded the fastutil classes and removed some parts that aren't being used (e.g. sorting). Fastutil uses the Apache v2 license, so it could be used as is.

Any reason to use manual shading and not rely on the shading plugin? Benefits would be the ability to follow updates (fastutil seems reasonably stable, but it's still actively maintained), document/reduce code ownership, making it easy to use other classes from fastutil in the future in a consistent way. I haven't used that functionality myself before, so I might overlook some complications.

Another solution that gets mentioned in the fastutils issue is using proguard to produce a reduced jar, but I'd rather not make all users depend on a non-standard solution.

Do you think it'd be worth using the generator to create more custom-fit implementations?

That would essentially mean re-implementing fastutil (or the small subset of it that we are using) with our own templating (fastutil is generated using a C-preprocessor based code-generator), right? Do you mean custom-fit in terms of datatypes or in terms of features/performance characteristics? I suspect you have specific improvements in mind?
I think it's probably feasible with the infrastructure we have in place, but since this type of "custom templating" code is harder to read/modify/maintain I think it should only be done if it there are substantial benefits over vendoring fastutil (independent of if it is done via having an explicit copy in our tree or programatically from maven).

In general this looks good, and I wouldn't be opposed to merging this, either because there are aguments against sharding in the maven step that i'm not aware of or simply because noone has time to implement it and we have a working solution to a real problem here.

@ennerf
Copy link
Collaborator Author

ennerf commented Mar 27, 2023

Any reason to use manual shading and not rely on the shading plugin? Benefits would be the ability to follow updates (fastutil seems reasonably stable, but it's still actively maintained), document/reduce code ownership, making it easy to use other classes from fastutil in the future in a consistent way. I haven't used that functionality myself before, so I might overlook some complications.

Using the shading plugin would pull in a lot of other classes like DoubleCollection, DoubleList, and various utility classes for unused features like iterators and sorting. I think it'd be a decent amount of effort to find an appropriate subset of classes that still works.

It was fairly straight forward to get the lists in, so I don't think adding more classes (e.g. a primitive map) would be that difficult to add later if needed. The basic features of primitive lists are incredibly stable, so I don't expect any significant updates that would need to be backported.

that would essentially mean re-implementing fastutil (or the small subset of it that we are using) with our own templating (fastutil is generated using a C-preprocessor based code-generator), right?

Yes, I was mostly thinking of naming conventions for consistency. There is an argument for keeping it though to simplify pulling in more features if needed.

Copy link
Member

@wirew0rm wirew0rm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. I added a small notice to the top of the copied files linking to the original project and stating that the files where modified/simplified.

@wirew0rm wirew0rm merged commit ed41b3a into fair-acc:main Apr 4, 2023
1 of 3 checks passed
@ennerf ennerf deleted the ennerf/remove-fastutil branch April 4, 2023 12:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants