Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use String.intern() for Annotation and Class scanning [SPR-14862] #19428

Closed
spring-projects-issues opened this issue Oct 31, 2016 · 12 comments
Closed

Comments

@spring-projects-issues
Copy link
Collaborator

@spring-projects-issues spring-projects-issues commented Oct 31, 2016

Renier Roth opened SPR-14862 and commented

Consider Using String.intern() on the Type scanned by the Visitors. These Strings are always identical but are duplicated in Memory cause of new String() call.

Class: org.springframework.asm.Type
Line 565 & 580

example:
{CODE:linenumbers=true}
/**

  • Returns the binary name of the class corresponding to this type. This

  • method must not be used on method types.

  • @return the binary name of the class corresponding to this type.
    */
    public String getClassName() {
    switch (sort) {
    case VOID:
    return "void";
    case BOOLEAN:
    return "boolean";
    case CHAR:
    return "char";
    case BYTE:
    return "byte";
    case SHORT:
    return "short";
    case INT:
    return "int";
    case FLOAT:
    return "float";
    case LONG:
    return "long";
    case DOUBLE:
    return "double";
    case ARRAY:
    StringBuilder sb = new StringBuilder(getElementType().getClassName());
    for (int i = getDimensions(); i > 0; --i) {
    sb.append("[]");
    }
    return sb.toString();
    case OBJECT:
    return new String(buf, off, len).replace('/', '.');
    default:
    return null;
    }
    }

    /**

    • Returns the internal name of the class corresponding to this object or
    • array type. The internal name of a class is its fully qualified name (as
    • returned by Class.getName(), where '.' are replaced by '/'. This method
    • should only be used for an object or array type.
    • @return the internal name of the class corresponding to this object type.
      */
      public String getInternalName() {
      return new String(buf, off, len);
      }

{CODE}

Changed to:
{CODE:linenumbers=true}
/**

  • Returns the binary name of the class corresponding to this type. This

  • method must not be used on method types.

  • @return the binary name of the class corresponding to this type.
    */
    public String getClassName() {
    switch (sort) {
    case VOID:
    return "void";
    case BOOLEAN:
    return "boolean";
    case CHAR:
    return "char";
    case BYTE:
    return "byte";
    case SHORT:
    return "short";
    case INT:
    return "int";
    case FLOAT:
    return "float";
    case LONG:
    return "long";
    case DOUBLE:
    return "double";
    case ARRAY:
    StringBuilder sb = new StringBuilder(getElementType().getClassName());
    for (int i = getDimensions(); i > 0; --i) {
    sb.append("[]");
    }
    return sb.toString();
    case OBJECT:
    return new String(buf, off, len).replace('/', '.').intern();
    default:
    return null;
    }
    }

    /**

    • Returns the internal name of the class corresponding to this object or
    • array type. The internal name of a class is its fully qualified name (as
    • returned by Class.getName(), where '.' are replaced by '/'. This method
    • should only be used for an object or array type.
    • @return the internal name of the class corresponding to this object type.
      */
      public String getInternalName() {
      return new String(buf, off, len).intern();
      }

{CODE}
Lines difference in 34 & 49

This is used by several visitors on Class/Annotation scanning. The names of these Classes are then cached, but uses a new String Reference in Memory.

By Using String.intern() we can avoid duplicated Strings.
Memory Consumption and count of duplicated Strings depends on How many Annotations you have in your managed Beans.


Affects: 4.3.3

Issue Links:

  • #19637 Upgrade to ASM 5.2
  • #19452 Use String.intern() for Class reading

Referenced from: commits d859826, 61d7d16

@spring-projects-issues
Copy link
Collaborator Author

@spring-projects-issues spring-projects-issues commented Oct 31, 2016

Juergen Hoeller commented

Good point! Revised in our fork of ASM for Spring Framework 4.3.4, due towards the end of this week.

@spring-projects-issues
Copy link
Collaborator Author

@spring-projects-issues spring-projects-issues commented Nov 11, 2016

Ivan Sopov commented

This change looks really suspicious since it breaks simple rule of thumb "never-ever user String.intern() method".

More information can be found on - https://youtu.be/YgGAUGC9ksk?t=1739 (point on video just about String.intern() method) and on https://shipilev.net/talks/joker-Oct2014-string-catechism.pdf (slides 48-59). Let me quote it:

Q: I will use String.intern just on this tiny little location.
A: Excellent, you already know where your bottlenecks are going to be.

It seems that it is the first usage of String.intern() in spring-framework. How about removing it and banning such usages in future?

@spring-projects-issues
Copy link
Collaborator Author

@spring-projects-issues spring-projects-issues commented Nov 11, 2016

Juergen Hoeller commented

String.intern() is not allowed to be used anywhere else in our codebase. It's just for this particular purpose, interning the parsed class names seems to make sense since we are holding on to them in our metadata representations... and since they are likely to be in the string table already, or to end up there once the classes are being loaded through regular means. So we're literally just using it for ASM-parsed class names from bytecode; our follow-up #19452 adheres to that rule as well.

@spring-projects-issues
Copy link
Collaborator Author

@spring-projects-issues spring-projects-issues commented Nov 11, 2016

Oleg Poleshuk commented

Hi, I'm pretty sure you have made your conclusions based on a solid research: performance analysis before/after, bytecode analysis, tried different JVM versions.
Unfortunately this issue does not have a link to your research. Could you be so kind and add a link?

@spring-projects-issues
Copy link
Collaborator Author

@spring-projects-issues spring-projects-issues commented Nov 11, 2016

Juergen Hoeller commented

Oleg Poleshuk, in all fairness, that was a rather passive-aggressive comment that I don't consider justified here... neither to myself nor to the original reporter.

I'll nevertheless take the bait: Potential negative effects of String interning are indeed JVM-dependent. I am not concerned about increased perm-gen consumption since we're just interning class names which end up on perm-gen in any case. Based on educated armchair reasoning, I don't see an issue here. Are we really concerned about interning the names of a few classes which might end up not getting loaded eventually, despite being in the scanned packages of the deployment unit?

You do have a point that this is a tradeoff, so I'm happy to learn about any specific effects you are concerned about and reconsider the change for 4.3.5 accordingly. Let me turn the need for a proof around: Have you done solid research about the negative effects of String interning in the ASM class name parsing of Spring Framework 4.3.4, to rephrase your own words? Feel free to give 4.3.5.BUILD-SNAPSHOT a spin if you'd like to include the effects of the more recent #19452 as well.

Renier Roth, as the original reporter, could you elaborate on your specific motivation for String interning here? In particular if I'm not representing it accurately in my comment above?

@spring-projects-issues
Copy link
Collaborator Author

@spring-projects-issues spring-projects-issues commented Nov 11, 2016

Juergen Hoeller commented

With respect to the performance of the String.intern operation, that is generally a valid concern, of course. However, in our particular case, we're talking about classpath scanning... a very early step in the bootstrap phase. The String pool is not expected to be large at that point. That aside, the rather expensive I/O cost of classpath scanning will easily outweight any CPU cycles spent on String comparisons for interning. Based on that perspective, I see the need for a proof of a measurable negative performance effect if you really care. I'm happy to take that into account for a revision of the code in 4.3.5 as well.

Generally, if we only did such fine-tuning based on "solid research" along the lines above, we'd never get around to any fine-tuning at all. There has to be some pragmatism in this, along with the willingness to reconsider if negative effects show up. This is particularly the case for concurrency fine-tuning of which we had a lot in recent years: mostly based on assumptions, always a tradeoff, never ideal for anybody, essentially just about finding a fine-tuned compromise that is good enough for the mainstream case. The only way of getting there is releasing it and then iteratively interacting with our stakeholders.

@spring-projects-issues
Copy link
Collaborator Author

@spring-projects-issues spring-projects-issues commented Nov 11, 2016

Renier Roth commented

Hi,

It is like Juergen wrote. I see it like him.

I did research the fakt, that class names where hold multible times in memory as Strings.
Thats why i created this issue.

Done via JProfiler and duplicate String search on a heap dump. I do not have made Screenshots, but everyone can do it on a heap dump and search for duplicate class names.

It is also, the case (described in the issues) that every annotation you do on a managed bean (parsed by this scanner) are held in Memory as a String.

So if you have 1000 managed Beans and each have an Annotation "Tag" inside the package "com.somreallylongname.models.somthing.else.another" you have the String "com.somreallylongname.models.somthing.else.another.Tag" inside your Heap Memory 1000 times. And this is only one annotation.

All Classnames where held in Memory for managed Beans even the method return types.
I am glad that Juergen included this in Spring ASM code.
This reduces Memory consumption in our project for duplicated in memory CLASSNAMES and TYPES as String representations.

I checked after using Spring 4.3.4 again with the JProfile duplicate String feature, on a Heap dump. The classes as String are no longer duplicated anymore. Thats why i created the other ticket where some other Strings poped up, caused by ClassReader.

@spring-projects-issues
Copy link
Collaborator Author

@spring-projects-issues spring-projects-issues commented Nov 11, 2016

Oleg Poleshuk commented

Thanks for quick answers, I was surprised to get a response at all, usually it can take weeks to get a response in huge projects like Spring, I appreciate your responses.

The performance argument regarding "happens only during bootstrap" is only partially valid, unfortunately.
The intern table affects garbage collection https://twitter.com/shipilev/status/797050034248421376

If we apply the argument regarding CPU ("that aside, the rather expensive I/O cost of classpath scanning will easily outweight any CPU cycles spent on String comparisons for interning") to a memory consumption, I would say that memory effect from using intern() is almost invisible in comparison to the whole memory consumption by most enterprise Spring projects (JPA, Hibernate, REST, Tomcat consume a lot of memory).

String deduplication is available since Java 8 update 20 https://blog.codecentric.de/en/2014/08/string-deduplication-new-feature-java-8-update-20-2/
Enabling it for the whole application will have much greater memory saving effect with no penalties, making intern() useless.

@spring-projects-issues
Copy link
Collaborator Author

@spring-projects-issues spring-projects-issues commented Nov 11, 2016

Juergen Hoeller commented

Good point about the impact on garbage collection, and thanks for the specific pointers!

Given that we're interning class names only, I would not expect the String table to grow, at least not significantly... since it'll contain the class names eventually anyway. As a consequence, there shouldn't be noteworthy impact on GC either. If there turns out to be a measurable difference in GC, I'm happy to reconsider. Without that, I'm inclined to leave the arrangement as-is: It has a positive impact for the original reporter's scenario, at least, and no proven negative impact in other scenarios yet.

String deduplication is indeed a nice recent JVM feature, and I expect Compact Strings in JDK 9 to have quite an impact as well.

Worth a note: The author of the String deduplication article wrote about String interning as well, highlighting the tradeoff but clearly not arguing that it should never be used at all: https://blog.codecentric.de/en/2012/03/save-memory-by-using-string-intern-in-java/ ... From my perspective, it should never be used in common application code. Even in our framework case, we are using it for a very specific kind of String only, certainly not recommending it for general use across the codebase.

@spring-projects-issues
Copy link
Collaborator Author

@spring-projects-issues spring-projects-issues commented Nov 11, 2016

Renier Roth commented

As written above:
"I am not concerned about increased perm-gen consumption since we're just interning class names which end up on perm-gen in any case"
The garbage collector size will most likely not increase, cause this is only intern for Classnames and Types. No negative effect on GC

I would say that every byte you can save make a difference I could even messurte it. Saying that other Frameworks are worse in memory mangement, makes this tuning not invalid.

And yes I would not use intern() on every code line, cause of the negative effects discribed, but as commented before its during classpath scanning and the Strings are Classnames and Types that are most likely inside the StringTable or should be.

String duplicattion feature from Java8 does not conflict with this tuning and its for older JVMs as well.

@spring-projects-issues
Copy link
Collaborator Author

@spring-projects-issues spring-projects-issues commented Nov 11, 2016

Ivan Sopov commented

Just one more note: these two intern() calls place two different kinds of class-names to this table - with "." and "/" separators. It seems to me that only one of them is placed there by ClassLoaders.
And in any case - ClassLoader using intern() seems like implementation-specific detail, that may change. (Since interned strings are really for native code JVM may track Strings that are referenced only from native code and remove them when they are not needed - e.g. class unloading).

@spring-projects-issues
Copy link
Collaborator Author

@spring-projects-issues spring-projects-issues commented Nov 11, 2016

Juergen Hoeller commented

Good point: So effectively, we'll have two representations of every class name in the String pool then. Both variants might end up there in any case (since the slash variant corresponds to the internal resource name that the JVM tracks per class). Admittedly, this is making assumptions about the JVM's default interning... but then again, the JVM and the JDK standard libraries rather aggressively intern String literals and in particular reflection artifact names (see the java.lang.Class implementation itself).

FWIW, checking other common libraries, there is a lot of String.intern() going on out there: e.g. in AspectJ, Groovy, JRuby, Joda-Time, Jackson, Hibernate, EclipseLink, FreeMarker, Woodstox, Xerces, the JAXB RI (the latter two included in the JDK)... In most cases, they rather aggressively intern attribute names and other metadata. What we are doing is clearly not out of the ordinary. One could even argue that we are aligning with the JDK's own libaries and Javaassist etc in terms of core metadata interning.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
2 participants