New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Frozen string optimizations from MRI #3491

Closed
headius opened this Issue Nov 23, 2015 · 4 comments

Comments

Projects
None yet
1 participant
@headius
Member

headius commented Nov 23, 2015

There's a number of places in MRI where they pre-freeze and deduplicate strings as part of their compilation process. This lists all such places we know of and whether they're implemented in JRuby.

  • Literal hash string keys are always deduped: {"foo" => 1}
  • Literal string keys to [] and []= calls are deduped by compiler but the dedup version is only used if the receiver is a Hash and those methods have not been patched. YARV instructions: opt_aref_with and opt_aset_with. (2d9ff09)
  • All literal strings and string parts (dstr, etc) are frozen in the AST and duplicated when a normal string is needed. This may reduce duplication in AST and IR. See YARV instructions: putstring, concatstrings. (8425569) (3d81301)
  • Literal strings followed by .freeze are treated as pre-frozen and always return the same object.
  • When pragma "frozen-string" is used in a file, all literal strings are pre-frozen.
  • when with literal string can use a frozen string. (ae62abc)
@headius

This comment has been minimized.

Show comment
Hide comment
@headius

headius Nov 23, 2015

Member

Note that @kares had to exclude some MRI tests in 2577528 because I stopped deduping ALL string keys entering hashes. MRI implements h["str"] to use a frozen string if h is a hash.

Member

headius commented Nov 23, 2015

Note that @kares had to exclude some MRI tests in 2577528 because I stopped deduping ALL string keys entering hashes. MRI implements h["str"] to use a frozen string if h is a hash.

@headius

This comment has been minimized.

Show comment
Hide comment
@headius

headius Nov 23, 2015

Member

@enebo @subbuss Not sure if we want to do these, nor am I sure of the best way.

The "all literal strings are frozen" approach seems valid and would reduce duplication of same-content bytelists in AST and IR. I'm not sure what approach we'd take. Perhaps String operand would cache a frozen string but its normal interpret would dup that String? Specialized uses like DStr logic could then go directly after the frozen string but behaviorally nothing else would change.

The [] and []= optimizations are fairly specialized but would eliminate construction of literal string keys to all hashes. For [] we'd need a new Call instr that's smart about calling against Hash. `[]=' is currently an AttrAssignInstr so we'd need a specialized version of that too.

These are pretty specific one-off optimizations but I could see them making a big different for string-key-heavy code.

Member

headius commented Nov 23, 2015

@enebo @subbuss Not sure if we want to do these, nor am I sure of the best way.

The "all literal strings are frozen" approach seems valid and would reduce duplication of same-content bytelists in AST and IR. I'm not sure what approach we'd take. Perhaps String operand would cache a frozen string but its normal interpret would dup that String? Specialized uses like DStr logic could then go directly after the frozen string but behaviorally nothing else would change.

The [] and []= optimizations are fairly specialized but would eliminate construction of literal string keys to all hashes. For [] we'd need a new Call instr that's smart about calling against Hash. `[]=' is currently an AttrAssignInstr so we'd need a specialized version of that too.

These are pretty specific one-off optimizations but I could see them making a big different for string-key-heavy code.

@headius

This comment has been minimized.

Show comment
Hide comment
@headius

headius Nov 23, 2015

Member

It occurs to me now that freezing all literal strings in compiled Ruby would eliminate the need for FrozenString. All StringLiteral would be frozen/deduped internally with default operand behavior to dup that string. The logic of knowing whether to use the frozen version would then move outward to places where the logic actually happens, e.g. "str".freeze would need its own instr, like [] and []= above.

Here's a patch that just does the first part of this, always freezing StringLiteral internally and duping it rather than doing newStringShared:

diff --git a/core/src/main/java/org/jruby/RubyString.java b/core/src/main/java/org/jruby/RubyString.java
index 312685b..1278f64 100644
--- a/core/src/main/java/org/jruby/RubyString.java
+++ b/core/src/main/java/org/jruby/RubyString.java
@@ -783,7 +783,7 @@ public class RubyString extends RubyObject implements EncodingCapable, MarshalEn
         return strDup(runtime, getMetaClass().getRealClass());
     }

-    final RubyString strDup(Ruby runtime, RubyClass clazz) {
+    public final RubyString strDup(Ruby runtime, RubyClass clazz) {
         shareLevel = SHARE_LEVEL_BYTELIST;
         RubyString dup = new RubyString(runtime, clazz, value);
         dup.shareLevel = SHARE_LEVEL_BYTELIST;
diff --git a/core/src/main/java/org/jruby/ir/operands/StringLiteral.java b/core/src/main/java/org/jruby/ir/operands/StringLiteral.java
index 925b53a..305094d 100644
--- a/core/src/main/java/org/jruby/ir/operands/StringLiteral.java
+++ b/core/src/main/java/org/jruby/ir/operands/StringLiteral.java
@@ -1,5 +1,6 @@
 package org.jruby.ir.operands;

+import org.jruby.Ruby;
 import org.jruby.RubyString;
 import org.jruby.ir.IRVisitor;
 import org.jruby.ir.persistence.IRReaderDecoder;
@@ -31,6 +32,7 @@ public class StringLiteral extends Operand {
     final public ByteList bytelist;
     final public String   string;
     final public int      coderange;
+    private volatile RubyString frozenCache;

     public StringLiteral(ByteList val, int coderange) {
         this(internedStringFromByteList(val), val, coderange);
@@ -103,8 +105,20 @@ public class StringLiteral extends Operand {

     @Override
     public Object retrieve(ThreadContext context, IRubyObject self, StaticScope currScope, DynamicScope currDynScope, Object[] temp) {
-        // SSS FIXME: AST interpreter passes in a coderange argument.
-        return RubyString.newStringShared(context.runtime, bytelist, coderange);
+        Ruby runtime = context.runtime;
+        return getFrozenCache(this, runtime).strDup(runtime, runtime.getString());
+    }
+
+    private static RubyString getFrozenCache(StringLiteral stringLiteral, Ruby runtime) {
+        RubyString frozenCache = stringLiteral.frozenCache;
+        if (frozenCache == null) {
+            frozenCache = createAndDedupRubyString(stringLiteral, runtime);
+        }
+        return frozenCache;
+    }
+
+    private static RubyString createAndDedupRubyString(StringLiteral stringLiteral, Ruby runtime) {
+        return stringLiteral.frozenCache = runtime.freezeAndDedupString(RubyString.newStringShared(runtime, stringLiteral.bytelist, stringLiteral.coderange));
     }

     @Override
@@ -120,6 +134,10 @@ public class StringLiteral extends Operand {
         return string;
     }

+    public RubyString getFrozenCache(Ruby runtime) {
+        return getFrozenCache(this, runtime);
+    }
+
     @Override
     public void encode(IRWriterEncoder e) {
         super.encode(e);
Member

headius commented Nov 23, 2015

It occurs to me now that freezing all literal strings in compiled Ruby would eliminate the need for FrozenString. All StringLiteral would be frozen/deduped internally with default operand behavior to dup that string. The logic of knowing whether to use the frozen version would then move outward to places where the logic actually happens, e.g. "str".freeze would need its own instr, like [] and []= above.

Here's a patch that just does the first part of this, always freezing StringLiteral internally and duping it rather than doing newStringShared:

diff --git a/core/src/main/java/org/jruby/RubyString.java b/core/src/main/java/org/jruby/RubyString.java
index 312685b..1278f64 100644
--- a/core/src/main/java/org/jruby/RubyString.java
+++ b/core/src/main/java/org/jruby/RubyString.java
@@ -783,7 +783,7 @@ public class RubyString extends RubyObject implements EncodingCapable, MarshalEn
         return strDup(runtime, getMetaClass().getRealClass());
     }

-    final RubyString strDup(Ruby runtime, RubyClass clazz) {
+    public final RubyString strDup(Ruby runtime, RubyClass clazz) {
         shareLevel = SHARE_LEVEL_BYTELIST;
         RubyString dup = new RubyString(runtime, clazz, value);
         dup.shareLevel = SHARE_LEVEL_BYTELIST;
diff --git a/core/src/main/java/org/jruby/ir/operands/StringLiteral.java b/core/src/main/java/org/jruby/ir/operands/StringLiteral.java
index 925b53a..305094d 100644
--- a/core/src/main/java/org/jruby/ir/operands/StringLiteral.java
+++ b/core/src/main/java/org/jruby/ir/operands/StringLiteral.java
@@ -1,5 +1,6 @@
 package org.jruby.ir.operands;

+import org.jruby.Ruby;
 import org.jruby.RubyString;
 import org.jruby.ir.IRVisitor;
 import org.jruby.ir.persistence.IRReaderDecoder;
@@ -31,6 +32,7 @@ public class StringLiteral extends Operand {
     final public ByteList bytelist;
     final public String   string;
     final public int      coderange;
+    private volatile RubyString frozenCache;

     public StringLiteral(ByteList val, int coderange) {
         this(internedStringFromByteList(val), val, coderange);
@@ -103,8 +105,20 @@ public class StringLiteral extends Operand {

     @Override
     public Object retrieve(ThreadContext context, IRubyObject self, StaticScope currScope, DynamicScope currDynScope, Object[] temp) {
-        // SSS FIXME: AST interpreter passes in a coderange argument.
-        return RubyString.newStringShared(context.runtime, bytelist, coderange);
+        Ruby runtime = context.runtime;
+        return getFrozenCache(this, runtime).strDup(runtime, runtime.getString());
+    }
+
+    private static RubyString getFrozenCache(StringLiteral stringLiteral, Ruby runtime) {
+        RubyString frozenCache = stringLiteral.frozenCache;
+        if (frozenCache == null) {
+            frozenCache = createAndDedupRubyString(stringLiteral, runtime);
+        }
+        return frozenCache;
+    }
+
+    private static RubyString createAndDedupRubyString(StringLiteral stringLiteral, Ruby runtime) {
+        return stringLiteral.frozenCache = runtime.freezeAndDedupString(RubyString.newStringShared(runtime, stringLiteral.bytelist, stringLiteral.coderange));
     }

     @Override
@@ -120,6 +134,10 @@ public class StringLiteral extends Operand {
         return string;
     }

+    public RubyString getFrozenCache(Ruby runtime) {
+        return getFrozenCache(this, runtime);
+    }
+
     @Override
     public void encode(IRWriterEncoder e) {
         super.encode(e);

headius added a commit that referenced this issue Jan 26, 2016

Invert StringLitera/FrozenString relationship.
StringLiteral now aggregates a FrozenString, which is an
ImmutableLiteral caching its deduplicated result. StringLiteral's
logic is just to dup that string into a mutable form.

This is part of work on #3491.

headius added a commit that referenced this issue Jan 26, 2016

@headius

This comment has been minimized.

Show comment
Hide comment
@headius

headius Mar 3, 2016

Member

These are all implemented but we may want to test that they're being applied correctly. They could silently fail and we would not know it.

Member

headius commented Mar 3, 2016

These are all implemented but we may want to test that they're being applied correctly. They could silently fail and we would not know it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment