CC: added B::Stackobj::Aelem for arithmetic stack optimizations on aelemfast
commit edda0c5ca8cd8fd072e425977dd3a1f80d34857c 1 parent cc90753
Reini Urban authored October 08, 2012
3  lib/B/CC.pm
@@ -1720,7 +1720,8 @@ sub pp_aelemfast {
   my $ix   = $op->private;
   my $lval = $op->flags & OPf_MOD;
   if (!$rmg and !autovivification()) {
-      runtime("PUSHs(AvARRAY($av)[$ix]);\t/* no autovivification */");
+    push @stack, B::Stackobj::Aelem->new($av, $ix, $lval);
+    # runtime("PUSHs(AvARRAY($av)[$ix]);\t/* no autovivification */");
   } else {
     write_back_stack();
     runtime(
24  lib/B/Stackobj.pm
@@ -380,6 +380,30 @@ sub B::Stackobj::Bool::invalidate { }
 
 1;
 
+#
+# Stackobj::Aelem
+#
+
+@B::Stackobj::Aelem::ISA = 'B::Stackobj';
+
+sub B::Stackobj::Aelem::new {
+  my ( $class, $av, $ix, $lvalue ) = @_;
+  # TODO: check flags: OPf_MOD, DEFER, SVs_RMG
+  #       check no autovivification
+  my $obj = bless {
+    type  => T_UNKNOWN,
+    flags => VALID_INT | VALID_DOUBLE | VALID_SV,
+    iv    => "SvIV(AvARRAY($av)[$ix])",
+    nv    => $lvalue ? "SvNVX(AvARRAY($av)[$ix])" : "SvNV(AvARRAY($av)[$ix])",
+    sv    => "AvARRAY($av)[$ix]"
+  }, $class;
+  return $obj;
+}
+
+sub B::Stackobj::Aelem::write_back { }
+
+sub B::Stackobj::Aelem::invalidate { }
+
 __END__
 
 =head1 NAME
37  ramblings/blogs-optimizing-4.md
@@ -103,9 +103,44 @@ would have used the faster equivalent `SvNV(PL_curpad[4]) = SvNV(sv);` put on th
 
 We can easily test this out by NOP'ing these code sections and see the costs.
 
-With 4m53.073s, without 4m23.265s. 30 seconds or ~10% faster. This is now in the typical
+With **4m53.073s**, without **4m23.265s**. 30 seconds or ~10% faster. This is now in the typical
 range of p5p micro-optimizations and not considered high-priority for now.
 
 Let's rather check out more stack optimizations.
 
+I added a new [`B::Stackobj::Aelem`]() object to B::Stackobj to track aelemfast accesses
+to array indices, and do the PUSH/POP optimizations on them.
+
+The generated code now looks like:
+
+      lab_116f270:
+    	TAINT_NOT;
+    	sp = PL_stack_base + cxstack[cxstack_ix].blk_oldsp;
+    	FREETMPS;
+    	rnv0 = d9_mag; lnv0 = SvNV(AvARRAY((AV*)PL_curpad[25])[1]);	/* multiply */
+    	d3_mm2 = lnv0 * rnv0;
+      lab_116be90:
+    	TAINT_NOT;
+    	sp = PL_stack_base + cxstack[cxstack_ix].blk_oldsp;
+    	FREETMPS;
+    	d5_dx = SvNV(PL_curpad[5]);
+    	rnv0 = d3_mm2; lnv0 = d5_dx;	/* multiply */
+    	d29_tmp = lnv0 * rnv0;
+    	SvNVX(AvARRAY((AV*)PL_curpad[28])[0]) = SvNVX(AvARRAY((AV*)PL_curpad[28])[0]) - d29_tmp;
+
+Lvalue assignments need SvNVX, right-value can keep SvNV.
+The multiply op for `PL_curpad[28])[0]` has the OPf_MOD flag since the first arg is modified.
+nextstate with TAINT, FREETMPS and sp reset is still not optimized.
+
+Performance went from **4m53.073s** to **3m58.249s**, 55s or 18.7% faster. Much better than
+with the nextstate optimizations. 30s less on top of this would be **3m30s**, still slower
+than Erlang, Racket or C#. And my goal was 2m30s.
+
+But there's still a lot to optimize and adding the 'no
+autovivification' check was also costly. Several dependant packages
+were added, like autovivification, Tie::Hash::NamedCapture, mro,
+Fcntl, IO, Exporter, Cwd, File::Spec, Config, FileHandle, IO::Handle,
+IO::Seekable, IO::File, Symbol, Exporter::Heavy, ...
+But you don't see this cost in the binary size, and neither in the run-time.
+
 *TBC...*
