CC: added B::Stackobj::Aelem for arithmetic stack optimizations on aelemfast
commit edda0c5ca8cd8fd072e425977dd3a1f80d34857c 1 parent cc90753
Reini Urban authored
Showing with 62 additions and 2 deletions.
  1. +2 −1  lib/B/CC.pm
  2. +24 −0 lib/B/Stackobj.pm
  3. +36 −1 ramblings/blogs-optimizing-4.md
3  lib/B/CC.pm
@@ -1720,7 +1720,8 @@ sub pp_aelemfast {
my $ix = $op->private;
my $lval = $op->flags & OPf_MOD;
if (!$rmg and !autovivification()) {
- runtime("PUSHs(AvARRAY($av)[$ix]);\t/* no autovivification */");
+ push @stack, B::Stackobj::Aelem->new($av, $ix, $lval);
+ # runtime("PUSHs(AvARRAY($av)[$ix]);\t/* no autovivification */");
} else {
write_back_stack();
runtime(
24 lib/B/Stackobj.pm
@@ -380,6 +380,30 @@ sub B::Stackobj::Bool::invalidate { }
1;
+#
+# Stackobj::Aelem
+#
+
+@B::Stackobj::Aelem::ISA = 'B::Stackobj';
+
+sub B::Stackobj::Aelem::new {
+ my ( $class, $av, $ix, $lvalue ) = @_;
+ # TODO: check flags: OPf_MOD, DEFER, SVs_RMG
+ # check no autovivification
+ my $obj = bless {
+ type => T_UNKNOWN,
+ flags => VALID_INT | VALID_DOUBLE | VALID_SV,
+ iv => "SvIV(AvARRAY($av)[$ix])",
+ nv => $lvalue ? "SvNVX(AvARRAY($av)[$ix])" : "SvNV(AvARRAY($av)[$ix])",
+ sv => "AvARRAY($av)[$ix]"
+ }, $class;
+ return $obj;
+}
+
+sub B::Stackobj::Aelem::write_back { }
+
+sub B::Stackobj::Aelem::invalidate { }
+
__END__
=head1 NAME
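
To make the effect of the `$lvalue` argument easier to see, here is a minimal standalone sketch that mimics only the string-building part of the constructor above. It does not load `B` or `B::Stackobj`, and `aelem_exprs` is a hypothetical helper name used for illustration, not part of the commit. The pad slots `PL_curpad[25]` and `PL_curpad[28]` are borrowed from the generated code quoted in the blog hunk of this commit.

```perl
use strict;
use warnings;

# Hypothetical stand-in for B::Stackobj::Aelem::new. It only builds the
# C expression strings and leaves out the type and flags fields.
sub aelem_exprs {
    my ( $av, $ix, $lvalue ) = @_;
    my $elem = "AvARRAY($av)[$ix]";
    return {
        iv => "SvIV($elem)",
        # SvNVX reads the NV slot directly and can be assigned to;
        # SvNV coerces the value and is only usable as an rvalue.
        nv => $lvalue ? "SvNVX($elem)" : "SvNV($elem)",
        sv => $elem,
    };
}

# Rvalue access: element 1 of the array in pad slot 25.
my $rhs = aelem_exprs( '(AV*)PL_curpad[25]', 1, 0 );
# Lvalue access: element 0 of the array in pad slot 28.
my $lhs = aelem_exprs( '(AV*)PL_curpad[28]', 0, 1 );

print "lnv0 = $rhs->{nv};\n";
print "$lhs->{nv} = $lhs->{nv} - d29_tmp;\n";
```

The two printed lines have the same shape as the `lnv0 = SvNV(...)` and `SvNVX(...) = SvNVX(...) - d29_tmp` statements in the generated C, which is the point of storing expression strings on the stack object instead of emitting a `PUSHs` immediately.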
37 ramblings/blogs-optimizing-4.md
@@ -103,9 +103,44 @@ would have used the faster equivalent `SvNV(PL_curpad[4]) = SvNV(sv);` put on th
We can easily test this out by NOP'ing these code sections and see the costs.
-With 4m53.073s, without 4m23.265s. 30 seconds or ~10% faster. This is now in the typical
+With **4m53.073s**, without **4m23.265s**. 30 seconds or ~10% faster. This is now in the typical
range of p5p micro-optimizations and not considered high-priority for now.
Let's rather check out more stack optimizations.
+I added a new [`B::Stackobj::Aelem`]() object to B::Stackobj to track aelemfast accesses
+to array indices, and do the PUSH/POP optimizations on them.
+
+The generated code now looks like:
+
+ lab_116f270:
+ TAINT_NOT;
+ sp = PL_stack_base + cxstack[cxstack_ix].blk_oldsp;
+ FREETMPS;
+ rnv0 = d9_mag; lnv0 = SvNV(AvARRAY((AV*)PL_curpad[25])[1]); /* multiply */
+ d3_mm2 = lnv0 * rnv0;
+ lab_116be90:
+ TAINT_NOT;
+ sp = PL_stack_base + cxstack[cxstack_ix].blk_oldsp;
+ FREETMPS;
+ d5_dx = SvNV(PL_curpad[5]);
+ rnv0 = d3_mm2; lnv0 = d5_dx; /* multiply */
+ d29_tmp = lnv0 * rnv0;
+ SvNVX(AvARRAY((AV*)PL_curpad[28])[0]) = SvNVX(AvARRAY((AV*)PL_curpad[28])[0]) - d29_tmp;
+
+Lvalue assignments need SvNVX; rvalues can keep SvNV.
+The subtract op on `AvARRAY((AV*)PL_curpad[28])[0]` modifies its first arg, so that aelemfast has the OPf_MOD flag.
+nextstate with TAINT_NOT, FREETMPS and the sp reset is still not optimized.
+
+Performance went from **4m53.073s** to **3m58.249s**, 55s or 18.7% faster. Much better than
+with the nextstate optimizations. 30s less on top of this would be **3m30s**, still slower
+than Erlang, Racket or C#. And my goal was 2m30s.
+
+But there's still a lot to optimize, and adding the 'no
+autovivification' check was also costly. Several dependent packages
+were added, like autovivification, Tie::Hash::NamedCapture, mro,
+Fcntl, IO, Exporter, Cwd, File::Spec, Config, FileHandle, IO::Handle,
+IO::Seekable, IO::File, Symbol, Exporter::Heavy, ...
+But this cost shows up neither in the binary size nor in the run-time.
+
*TBC...*
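
The savings quoted in the blog text can be rechecked with a few lines of Perl. This is only a sketch for verifying the arithmetic: `to_seconds` and `saving` are hypothetical helper names, and the labels are shorthand for the two measurements in the text, not names used in the commit.

```perl
use strict;
use warnings;

# Convert a `time`-style string such as "4m53.073s" into seconds.
sub to_seconds {
    my ($t) = @_;
    my ( $min, $sec ) = $t =~ /^(\d+)m([\d.]+)s$/
        or die "unexpected time format: $t";
    return $min * 60 + $sec;
}

# Return the absolute saving in seconds and the saving relative to
# the "before" time in percent.
sub saving {
    my ( $before, $after ) = @_;
    my $base = to_seconds($before);
    my $diff = $base - to_seconds($after);
    return ( $diff, 100 * $diff / $base );
}

for my $case (
    [ 'nextstate NOP test',  '4m53.073s', '4m23.265s' ],
    [ 'Aelem stack objects', '4m53.073s', '3m58.249s' ],
  )
{
    my ( $label, $before, $after ) = @$case;
    my ( $diff, $pct ) = saving( $before, $after );
    printf "%-20s %.3fs saved, %.1f%% faster\n", $label, $diff, $pct;
}
```

Both percentages are taken relative to the original 4m53.073s run, which is how the roughly 10% and 18.7% figures in the text come out.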