Guava JAR is HUGE #605

Closed
gissuebot opened this issue Oct 31, 2014 · 35 comments

@gissuebot gissuebot commented Oct 31, 2014

Original issue created by ceefour666 on 2011-04-14 at 08:03 PM


Google Collections 1.0 is ~600 KB

Guava r09 is 1.1 MB.

Please split the library into smaller pieces (at least two, -core and -extras)

The core should contain only the most frequently used classes and it should be around 100-300 KB. The extras contains the rest of the stuff.

@gissuebot gissuebot commented Oct 31, 2014

Original comment posted by cgdecker on 2011-04-14 at 08:25 PM


If you're worried about Guava increasing the size of your application, the suggestion is that you use ProGuard as part of your build process to create a jar that strips out everything you aren't using. See http://code.google.com/p/guava-libraries/wiki/UsingProGuardWithGuava

@gissuebot gissuebot commented Oct 31, 2014

Original comment posted by ceefour666 on 2011-04-14 at 08:31 PM


Thanks for the comment.

While ProGuard may be useful for applications, I think it's not applicable for framework/library type projects, like ModeShape.

We're also trying to reduce/eliminate the risk of classpath conflicts.

Here's the discussion: ModeShape/modeshape#69

@gissuebot gissuebot commented Oct 31, 2014

Original comment posted by cgdecker on 2011-04-14 at 09:25 PM


Yeah, the page I linked does recommend not doing this for libraries. It should be up to the applications using the libraries what they do with them. Is the worry that users will avoid your library because of the size of the Guava dependency? As long as you avoid @Beta APIs, I think you should be fine otherwise.

I know that for one release (r03) there was a separate jar for each package in addition to the jar with everything, but they decided not to continue doing that.

At any rate, I'm not someone who can make any sort of decision about this... maybe someone else will have some comments.

@gissuebot gissuebot commented Oct 31, 2014

Original comment posted by kevinb9n on 2011-04-19 at 02:07 AM


We considered this very carefully when we started Guava, and we believe that a single JAR, with a recommendation of ProGuard to size-sensitive applications, is the way to go. The key realization was that even if we split into 10 separate packages, most users are still going to use 15% of this package, 5% of that one, etc., and would still benefit from ProGuard just as much! We would add a lot of administrative overhead for no real benefit.

We have some more documentation that gives advice on how to depend on Guava from a library, which I believe Charles is planning on externalizing for you.


Status: WorkingAsIntended

@gissuebot gissuebot commented Nov 1, 2014

Original comment posted by wasserman.louis on 2012-07-28 at 08:31 AM


Issue #1087 has been merged into this issue.

@gissuebot gissuebot commented Nov 1, 2014

Original comment posted by m...@re-entry.ca on 2012-08-01 at 03:45 PM


It isn't the size of the jar but the visibility of unneeded packages that is concerning. We have a large development team, and one way we keep everyone on the right track is to control the jars they have access to. EventBus and the io classes are unhelpful in our product, but we really like collections. If Guava were decomposed a bit more intelligently, you would find more people interested in using what they need without worrying about API creep.

@gissuebot gissuebot commented Nov 1, 2014

Original comment posted by ceefour666 on 2012-08-01 at 03:50 PM


Agree with this, esp #6.

Collections is the most useful part of Guava.

@gissuebot gissuebot commented Nov 1, 2014

Original comment posted by wasserman.louis on 2012-08-01 at 04:14 PM


This got discussed in the Hangout with the whole team yesterday (http://youtu.be/rkjW-zwZhJQ?t=31m18s), which provides some more discussion. (tl;dw: Most Guava users end up using a few features from many different packages. Relatively few users only use c.g.c.collect, or only c.g.c.base.)

There's also the issue that c.g.c.collect depends on a bunch of the other Guava packages, too, especially base, math, and primitives.

If you're really that fussy about what's accessible from Guava, ProGuard can also strip out everything in Guava except common.collect and the classes (not just the packages) that it actually depends on. I whipped up a ProGuard configuration file to do that in about five minutes; it's attached.

@gissuebot gissuebot commented Nov 1, 2014

Original comment posted by wasserman.louis on 2012-08-01 at 04:16 PM


Issue #1087 has been merged into this issue.

@gissuebot gissuebot commented Nov 1, 2014

Original comment posted by kevinb@google.com on 2012-08-03 at 04:33 PM


If you look closely at what your company needs and doesn't need, I strongly doubt that it divides cleanly along package boundaries.

Also, anyone who wants to maintain a sliced-up version of Guava is always welcome to do so; I think they'll find out it's a lot more painful than it's worth, but I'm not stopping them!

@gissuebot gissuebot commented Nov 1, 2014

Original comment posted by cpovirk@google.com on 2013-03-11 at 03:34 PM


Issue #1329 has been merged into this issue.

@gissuebot gissuebot commented Nov 1, 2014

Original comment posted by jjzazuet on 2013-03-11 at 08:02 PM


Hello everyone.

Well I did a small proof of concept and managed to create a set of atomic jars for Guava 14. Take a look.

https://github.com/jjzazuet/seeds-libraries

As far as I can tell, I am not experiencing any kind of pain after a 3-hour process. The code divided cleanly along package boundaries. A testament to a good design, I guess :).

Before anyone rage jumps on me this is only the raw, unmodified, java source code of the core libraries.

Now, if I know Google, they won't be switching to Gradle as a build system any time soon. This approach works for me and my current project so I could in theory volunteer to maintain these atomic jars for Guava.

Question is, is anyone still interested in having these atomic jars at all in Maven Central?

So, oh mighty Google, art thou interested in such tribute? :P

Thanks again for your time and help.

@gissuebot gissuebot commented Nov 1, 2014

Original comment posted by cpovirk@google.com on 2013-03-11 at 08:08 PM


As noted on issue 1329, we've painted ourselves into a corner by releasing the monolithic jar: Since any new, non-monolithic jars would be separate artifacts, Maven wouldn't know that guava-14.0 and guava-base-15.0 are incompatible with one another, and users who inadvertently mix the two are likely to see runtime failures. That keeps me from offering an endorsement for the Seeds project (aside from the name ;)), but certainly users can accept the risks if they'd like, perhaps if they're part of a small dependency graph.

(As you noted, our packages currently have a dependency graph with no loops. We've considered changing this, but we haven't gone through with it yet. You're safe for at least a while.)

@gissuebot gissuebot commented Nov 1, 2014

Original comment posted by m...@thebishops.org on 2013-03-12 at 02:25 AM


I would think deployments will be able to sort out their mixed-version jar problem just fine. We have to do that anyway when considering what jars are in the dependency graph. It is not unusual to exclude old versions of jars in transitive dependencies.

In other words, you aren't painted into any kind of corner. Guava can be decomposed into smaller, more consumable units. Thank goodness you don't have circular dependencies!

@gissuebot gissuebot commented Nov 1, 2014

Original comment posted by cpovirk@google.com on 2013-03-12 at 02:49 AM


We do still hear from people who have both guava and the old google-collections on their classpath without realizing it. (We find out only because they post cryptic errors on StackOverflow and we recognize old versions of classes.) I wouldn't be surprised if we hear from someone with both guava and seeds-base somewhere down the line. The unfortunate thing in both cases is that tools can't identify the incompatibility for us (to the best of my knowledge -- if someone knows better, please enlighten me!). Once it's identified, certainly users can do as you suggest.

(No promises on the circular dependencies :) There would be some advantages to being able to use classes like ImmutableList and FluentIterable from common.base.)

@gissuebot gissuebot commented Nov 1, 2014

Original comment posted by cgruber@google.com on 2013-03-12 at 02:54 AM


I'm looking into conflicting versions detection, Chris. I know of no specific metadata that could make explicit an implicit conflict between different artifacts, but I'm following up.

And that's the key issue, as Chris said... the problem isn't guava-13.jar and guava-14.jar both showing up as deps - that's handled in the dependency graph analysis. It's that there is no signal in the Maven metadata that guava-base-14.0.jar and guava-14.0.jar are mutually exclusive (for the fraction of their class file contents that overlap).

So yes, it is definitely not a new problem - other teams have dealt with Maven dependencies and the care and feeding of their dependency graph - but making a change that doesn't force a build-time breakage, so that people are forced to see it and fix it, is something that goes a bit against the Guava team's grain. It may be worthwhile in some cases, but it really has to be worth the risk of our customers pushing erroneous binary packages.
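
To make the distinction in this comment concrete, here is a toy Java model of version mediation. It is only a sketch and not Maven's actual resolver: it keeps one version per `groupId:artifactId` key (first declaration wins, a rough stand-in for Maven's nearest-wins rule), and the `guava-base` coordinates are hypothetical, borrowed from the discussion above.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class MediationSketch {
    /** Keeps one version per "groupId:artifactId"; the first declaration wins. */
    static List<String> resolve(List<String> declared) {
        Map<String, String> chosen = new LinkedHashMap<>();
        for (String gav : declared) {
            int lastColon = gav.lastIndexOf(':');
            String key = gav.substring(0, lastColon);       // groupId:artifactId
            String version = gav.substring(lastColon + 1);  // e.g. 14.0
            chosen.putIfAbsent(key, version);
        }
        List<String> classpath = new ArrayList<>();
        chosen.forEach((key, version) -> classpath.add(key + ":" + version));
        return classpath;
    }

    public static void main(String[] args) {
        // Same artifact, two versions: only one survives mediation.
        System.out.println(resolve(List.of(
            "com.google.guava:guava:14.0", "com.google.guava:guava:13.0")));
        // Different artifact ids (guava-base is hypothetical): both stay on the classpath,
        // even though their class files would overlap.
        System.out.println(resolve(List.of(
            "com.google.guava:guava:14.0", "com.google.guava:guava-base:15.0")));
    }
}
```

In this model the second call returns both artifacts, which is exactly the situation where overlapping classes could end up on the classpath without any build-time warning.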

@gissuebot gissuebot commented Nov 1, 2014

Original comment posted by jjzazuet on 2013-03-12 at 06:08 PM


@cgruber Crazy idea here. If the real deal breaker for end developers is the inability to signal incompatible artifacts inside the classpath (e.g. guava-14 vs guava-base-15) then why not change the package names to something like com.google.seeds.base (or something you prefer) so that the compiler raises errors for code using the monolithic Guava?

I mean, as an end user/developer I'd certainly grumble a bit about having to fix the compiler errors after upgrading to Guava 15, but at least I'd know I have the option of choosing atomic packages.

In other words, that would introduce an API breaking change which would signal the start of atomic packaging.

Does that make sense?

Thanks for your time.

@gissuebot gissuebot commented Nov 1, 2014

Original comment posted by ian.b.robertson on 2013-04-03 at 10:03 PM


One option would be to make guava-15 just the base classes (as opposed to putting those in a guava-base-15 artifact). Clients would need to know to include the other jars as they needed them, but we wouldn't be seeing duplicate class issues.

Another option would be to make an "empty" guava-15 jar which the other guava jars could depend on; the big drawback here of course is that while an empty jar is lightweight, it's not free.

Another route that might work would be to have the pom for guava-15 include a relocation section rerouting to guava-base; http://maven.apache.org/guides/mini/guide-relocation.html has more information on this option.

Finally, it's worth noting that the maven-enforcer-plugin does allow a check for duplicate classes. However, since most people won't be using this check, it doesn't help much from a support point of view.
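
For readers who want to see what a duplicate-class check amounts to, here is a minimal Java sketch. It is not the maven-enforcer-plugin's implementation; the class and method names are made up for illustration. It opens each jar and reports every `.class` entry that appears in more than one of them.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

class DuplicateClassScan {
    /** Maps each .class entry found in more than one jar to the jars that contain it. */
    static Map<String, List<Path>> findDuplicates(List<Path> jars) throws IOException {
        Map<String, List<Path>> owners = new TreeMap<>();
        for (Path jar : jars) {
            try (JarFile jarFile = new JarFile(jar.toFile())) {
                jarFile.stream()
                    .map(JarEntry::getName)
                    .filter(name -> name.endsWith(".class"))
                    .forEach(name -> owners.computeIfAbsent(name, k -> new ArrayList<>()).add(jar));
            }
        }
        // Keep only the entries that occur in at least two jars.
        owners.values().removeIf(list -> list.size() < 2);
        return owners;
    }

    public static void main(String[] args) throws IOException {
        // Pass jar paths as arguments, e.g. a guava jar and a google-collections jar.
        List<Path> jars = new ArrayList<>();
        for (String arg : args) {
            jars.add(Path.of(arg));
        }
        findDuplicates(jars).forEach((name, where) -> System.out.println(name + " -> " + where));
    }
}
```

A check like this only helps the people who actually run it, which is the support concern raised above.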

@gissuebot gissuebot commented Nov 1, 2014

Original comment posted by jjzazuet on 2013-07-15 at 03:33 PM


Hi guys.

In case anyone's still interested, I took the plunge and uploaded atomic jars for release 14.0.1 to Maven Central. Here are the relevant links:

http://seeds.tribe7.net
http://search.maven.org/#search%7Cga%7C1%7Cseeds

I gave some thought to the potential classpath conflicts and in the end I decided to fork from Guava and rename the package structure. Hopefully this will not introduce issues but in any case, let me know.

It pretty much works for me at the moment and hopefully it will for someone else.

Anything else, let me know.

Thanks for your time and help!

@gissuebot gissuebot commented Nov 1, 2014

Original comment posted by kak@google.com on 2013-11-28 at 01:15 PM


Issue #1594 has been merged into this issue.

@gissuebot gissuebot commented Nov 1, 2014

Original comment posted by mauro...@tiscali.it on 2013-11-28 at 01:42 PM


I can't believe the main reason for not splitting up Guava into smaller JARs is the risk of having a "guava-base-15" artifact not matching "guava-14" during version conflict resolution!
There are other examples of libraries which were split over time (I can think of Spring, or Hibernate, for instance), and developers are used to handling such cases. Also, some suggestions on how to fix these problems were already mentioned. The best for me would be to provide an empty "guava" POM (with no JARs) which depends on all the other sub-packages (I think it's the opposite of what was suggested in #18), so that if I require "guava" all the other artifacts are automatically included, otherwise I can just choose the ones I need.

Maintaining a splitting by ourselves using ProGuard is not a viable way if you consider that we have to manage a codebase of 200+ external dependencies... if we had to follow this advice for any library that we also need to deploy to the client (which might be using slow connections) we would die... especially as soon as we need to upgrade one or more of the split up libraries.

@gissuebot gissuebot commented Nov 1, 2014

Original comment posted by davide.cavestro on 2013-11-28 at 03:15 PM


Let's suppose a lot of people split Guava themselves: you end up with a lot of projects that will tend to freeze their Guava dependency at an old custom-shrunk version, because upgrading to a new one would potentially be a PITA. Is that really what you want?

Now suppose at a certain point I have the shrunk jar published into my custom Maven repo with the original id (I use Gradle as a build system, but I guess the same holds for every build system that includes dependency management features). The dependency manager would resolve Guava dependencies with the shrunk jar even for 3rd party dependencies that could potentially need additional classes: that would be a problem.
OTOH if I publish the shrunk jar with a different id (a custom group-artifact-version) then I could hit some duplicate class issues (the ones you want to avoid): that would also be a problem.

So there are no feasibility problems in using ProGuard, yGuard and so on, but in this case it would be perceived as a workaround cause IT REALLY IS a poor workaround.

I think this is an issue where the safer approach is not necessarily the best over the time.
Guava is a great library, and it is really a pity continuing to limit its adoption on my projects just because it lacks some packaging refinement.

So please consider reopening this issue.

@gissuebot gissuebot commented Nov 1, 2014

Original comment posted by marcotrev on 2013-11-28 at 03:36 PM


+1 to comments #21 and #22.

Comment #19 talks about a fork created solely to better support developers embedding the library; this should make you question whether your decision is the right one.

One more question: is Guava going to increase or decrease in size over time? ;-)

Sooner or later you'll have to split it, isn't it better doing it now?

Regards,

@gissuebot gissuebot commented Nov 1, 2014

Original comment posted by cpovirk@google.com on 2013-12-02 at 10:00 PM


@mauromol@tiscali.it:
How much manual splitting is involved? How often does Proguard do the wrong thing when pointed at a project and told to include what is necessary? And is this something that needs to be done for each of the 200 dependencies? My understanding was that only one configuration and one Proguard run was required no matter how many dependencies there were.

@davide.cavestro:
I'm unclear on why a custom shrunk version of Guava would be put into a Maven repository. Isn't the idea that Proguard is run on the final project output, rather than on each of its input libraries?

@gissuebot gissuebot commented Nov 1, 2014

Original comment posted by davide.cavestro on 2013-12-03 at 08:21 AM


I have no doubt that ProGuard is a great obfuscator/shrinker, but - as every piece of software - it has some known limitations ( see http://proguard.sourceforge.net/index.html#manual/limitations.html ) and possibly some bugs. Also - as every obfuscator - it brings to the build system some additional complications, such as defining entry points, potential issues related to reflection and so on.

So - when possible - I prefer using explicit dependencies declarations in order to maintain control over the 3rd party code our developers may depend on, hence reflecting the code availability directly within their IDE (instead of removing from project output at build time unneeded code that they see as available when coding). It's simpler and safer.
I think shrinking is great when you need to further reduce the size of properly packaged libraries (in that case it makes no sense splitting them up further or even asking someone to do so).
IMHO so far Guava is packaged as a monolith and could be packaged in a better fashion. Hence I'm trying to make you aware of these scenarios :-)

Thanks for your consideration (whatever you decide)
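
The reflection caveat mentioned in this comment can be illustrated with a small Java sketch (class and method names are hypothetical). The class name is assembled at run time, so nothing in the code refers to the class statically. A shrinker that works from static references would generally need an explicit keep rule to know such a class is in use.

```java
import java.util.List;

class ReflectionSketch {
    /** Builds the class name at run time, so no static reference to the class appears here. */
    static Object instantiate(String packageName, String simpleName)
            throws ReflectiveOperationException {
        Class<?> type = Class.forName(packageName + "." + simpleName);
        return type.getDeclaredConstructor().newInstance();
    }

    public static void main(String[] args) throws ReflectiveOperationException {
        // The JDK's ArrayList stands in for any library class loaded by name.
        Object list = instantiate("java.util", "ArrayList");
        System.out.println(list instanceof List);
    }
}
```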

@gissuebot gissuebot commented Nov 1, 2014

Original comment posted by mauro...@tiscali.it on 2013-12-03 at 11:52 AM


@cpov...@google.com: I wouldn't even consider applying ProGuard to the whole codebase, as it consists of classes that are or may be called in a variety of ways (direct invocation, reflection, even remote class loading). When you have a complex project (and not a simple HelloWorld application), I do not think it's wise to force concepts like compile vs runtime dependencies just to apply workarounds to handle cases like this; it's too risky and hard to maintain.

This is what I wanted to say: it's not desirable to treat Guava as a special case, because there's no reason for which Guava is "better" than all the other 199 dependencies to justify such a special treatment. I still believe a better modularization for Guava would be desirable, especially if there are no strong reasons for not doing it.

@gissuebot gissuebot commented Nov 1, 2014

Original comment posted by cpovirk@google.com on 2013-12-03 at 01:21 PM


OK, thanks. Most of the team's knowledge of Proguard is secondhand, so we hear some good things and some bad things, and we don't know how to weigh them against one another. Additionally, most of that knowledge is with Android, which I suspect is more likely to have a single entry point (and perhaps less reflection in general) than a typical app.

Hearing the feedback here, my personal main reservation to splitting Guava (well, on top of the possibility of conflict between guava-n and guava-base-m) is that the bulk of the code is located in c.g.c.collect. Any app that uses collect (which, I suspect, is what most apps use) is going to get most of Guava along with it. I did some math on this at one point, but it looks like I never posted it externally:

"Basically everyone is using something from collect, and collect pulls in base+math+primitives. That's about 9000 methods that everyone would be stuck with. Splitting out the remaining 3000 into a separate jar is potentially helpful for teams right on the edge [of Android limits], of course."

@gissuebot gissuebot commented Nov 1, 2014

Original comment posted by davide.cavestro on 2013-12-03 at 04:07 PM


Some additional data on the issue.
Below is the weight (disk usage of uncompressed class files) of the potential subpackages of Guava 15.0:

195K com/google/common/reflect
341K com/google/common/cache
465K com/google/common/util
35K com/google/common/eventbus
33K com/google/common/escape
287K com/google/common/base
222K com/google/common/io
6.1K com/google/common/annotations
123K com/google/common/hash
5.9K com/google/common/xml
145K com/google/common/primitives
5.3K com/google/common/html
110K com/google/common/net
2.7M com/google/common/collect
47K com/google/common/math
4.7M com/google/common/

I'm also attaching the composition graph obtained launching Stan4j on guava 15.0.
Each edge's weight reflects the dependency's strength, which in turn (as I understand it) tells how many times a certain package refers to another one through imports, method calls and so on.
From that graph it seems the "collect" package depends only on "base".

So if the whole guava uncompressed weight is 4.7M, supposing "collect" weight is 2.7M and it depends only on "base" (287K), if we package them as collect.jar and base.jar then their cumulative weight would be ~3M.

@gissuebot gissuebot commented Nov 1, 2014

Original comment posted by davide.cavestro on 2013-12-03 at 04:15 PM


ERRATA: sorry, in my last post I mixed in wrong data and also left out the "math" and "primitives" packages, hence a client that depends on "collect" would really need 2.7M + 47K + 145K + 287K = ~3.2M (collect + math + primitives + base), saving ~1.5M
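
The corrected arithmetic can be reproduced with a short Java sketch. The sizes are copied from the listing in the previous comment (treating 2.7M as 2.7 * 1024 KB), and the direct dependencies of "collect" are the ones named in this thread. Treating base, math and primitives as leaves is an assumption made only to keep the example small.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class PackageWeights {
    // Uncompressed sizes in KB, from the listing above.
    static final Map<String, Double> SIZE_KB = Map.of(
        "collect", 2.7 * 1024, "base", 287.0, "math", 47.0, "primitives", 145.0);

    // collect's dependencies as stated in the thread; the empty lists are an assumption.
    static final Map<String, List<String>> DEPS = Map.of(
        "collect", List.of("base", "math", "primitives"),
        "base", List.of(), "math", List.of(), "primitives", List.of());

    /** Returns the root package plus everything reachable from it. */
    static Set<String> closure(String root) {
        Set<String> seen = new LinkedHashSet<>();
        Deque<String> pending = new ArrayDeque<>(List.of(root));
        while (!pending.isEmpty()) {
            String pkg = pending.pop();
            if (seen.add(pkg)) {
                pending.addAll(DEPS.getOrDefault(pkg, List.of()));
            }
        }
        return seen;
    }

    /** Sums the sizes of the root package and its transitive dependencies, in KB. */
    static double totalKb(String root) {
        return closure(root).stream().mapToDouble(SIZE_KB::get).sum();
    }

    public static void main(String[] args) {
        double wholeJarKb = 4.7 * 1024;  // total for com/google/common/ from the listing
        double neededKb = totalKb("collect");
        System.out.printf("needed: %.1fM, saved: %.1fM%n",
            neededKb / 1024, (wholeJarKb - neededKb) / 1024);
    }
}
```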

@gissuebot gissuebot commented Nov 1, 2014

Original comment posted by jjzazuet on 2013-12-25 at 04:00 AM


Well guys. I've now published version 15.0 of my atomic Guava port, in case it's useful to anyone.

http://seeds.tribe7.net
http://search.maven.org/#search%7Cga%7C1%7Cseeds

Thank you and happy holidays ;)

@gissuebot gissuebot commented Nov 1, 2014

Original comment posted by jjzazuet on 2014-03-23 at 09:57 PM


Well guys. I've now published version 16.0.1 of my atomic Guava port, in case it's useful to anyone.

Please, consider moving all String related functionality to a separate package 'common.base.strings' and also all functional base classes to a separate package as well. I think these are the two major fat sources for the base classes of Guava.

http://seeds.tribe7.net
http://search.maven.org/#search%7Cga%7C1%7Cseeds

Thank you :)

@gissuebot gissuebot commented Nov 1, 2014

Original comment posted by m...@dr-lanka.de on 2014-06-18 at 07:48 AM


Thanks a lot for the guava split. I want to use Guava in my Android app and reached Dalvik's 64k-method bound.

I want to use the Guava Caches and ListenableFuture, but unfortunately the latter is part of seeds-util, which references math, primitives, base, function and strings.

This results in 13k method signatures versus 14k for the original Guava package.

Is there any way to further reduce it?

Any hint is highly appreciated... :)

@gissuebot gissuebot commented Nov 1, 2014

Original comment posted by jjzazuet on 2014-06-19 at 03:50 PM


@32 I think it should be possible to shrink it even more. Last time I checked, there were at least two code packages which had only one shared class among them. I'll see if this is the case when I get to update my port to Guava 17 (I know I know, I'll hurry up). :P

Cheers!

@gissuebot gissuebot commented Nov 1, 2014

Original comment posted by m...@dr-lanka.de on 2014-06-25 at 09:08 AM


@33 Is there anything I could help with when you port it?

@gissuebot gissuebot commented Nov 1, 2014

Original comment posted by kevinb@google.com on 2014-06-25 at 01:47 PM


Please don't use the Guava issue tracker as a support forum for this other unsanctioned project.
