Speed up set closure by using a fixed amount of memory. #45

ltratt · 2016-09-25T10:04:01Z

In a typical description of this algorithm -- and in the previous implementation
-- one has a todo set which contains pairs (prod_i, dot). Unfortunately this is
a slow way of doing things. Searching the set for the next item and removing it
is slow; and, since we don't know how many potential dots there are in a
production, the set is of potentially unbounded size, so we can end up resizing
memory. Since this function is the most expensive part of table generation,
using a HashSet (which is the "obvious" solution) is pretty slow.

However, we can reduce these costs through two observations:

1) The initial todo set is populated with (prod_i, dot) pairs that all come
   from self.items.keys(). There's no point copying these into a todo list.
2) All subsequent todo items are of the form (prod_off, 0). Since the dot in
   these cases is always 0, we don't need to store pairs: simply knowing which
   prod_off's we need to look at is sufficient. We can represent these with a
   fixed-size bitfield.

All we need to do is first iterate through the items in 1 and, when it's
exhausted, continually iterate over the bitfield from 2 until no new items have
been added.
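
To make the two phases concrete, here is a minimal sketch of the idea. It is not the code in this PR. It assumes a simplified LR(0)-style closure with no lookaheads, and the names `Sym`, `Prod`, `schedule`, `close` and `example_grammar` are all made up for illustration. The bitfields are sized once from the number of productions, so they never grow while closing.

```rust
use std::collections::BTreeSet;

/// A grammar symbol: a nonterminal (by index) or a terminal. Hypothetical type.
#[allow(dead_code)]
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Sym {
    Nt(usize),
    T(char),
}

/// One production `lhs -> rhs`. Hypothetical type.
struct Prod {
    lhs: usize,
    rhs: Vec<Sym>,
}

/// Set bit `i`; returns true if it was previously clear.
fn set_bit(bits: &mut [u64], i: usize) -> bool {
    let mask = 1u64 << (i % 64);
    let was_clear = (bits[i / 64] & mask) == 0;
    bits[i / 64] |= mask;
    was_clear
}

/// Clear bit `i`; returns true if it was previously set.
fn take_bit(bits: &mut [u64], i: usize) -> bool {
    let mask = 1u64 << (i % 64);
    let was_set = (bits[i / 64] & mask) != 0;
    bits[i / 64] &= !mask;
    was_set
}

/// Mark item (q, 0) as added and pending for every production q of `nt`
/// that has not been added before. Returns true if anything new was marked.
fn schedule(prods: &[Prod], nt: usize, added: &mut [u64], todo: &mut [u64]) -> bool {
    let mut any_new = false;
    for (q, prod) in prods.iter().enumerate() {
        if prod.lhs == nt && set_bit(added, q) {
            set_bit(todo, q);
            any_new = true;
        }
    }
    any_new
}

/// Close a set of (prod, dot) items without a todo set of pairs.
fn close(prods: &[Prod], kernel: &BTreeSet<(usize, usize)>) -> BTreeSet<(usize, usize)> {
    // One bit per production; the size depends only on the grammar.
    let words = (prods.len() + 63) / 64;
    let mut added = vec![0u64; words];
    let mut todo = vec![0u64; words];

    // Observation 1: walk the existing items directly instead of copying them.
    for &(p, dot) in kernel {
        if let Some(&Sym::Nt(nt)) = prods[p].rhs.get(dot) {
            schedule(prods, nt, &mut added, &mut todo);
        }
    }

    // Observation 2: everything else is (prod, 0), so a bit per production
    // suffices. Sweep the bitfield until a sweep adds no new items.
    loop {
        let mut any_new = false;
        for p in 0..prods.len() {
            if !take_bit(&mut todo, p) {
                continue;
            }
            if let Some(&Sym::Nt(nt)) = prods[p].rhs.first() {
                any_new |= schedule(prods, nt, &mut added, &mut todo);
            }
        }
        if !any_new {
            break;
        }
    }

    // The closed set is the original items plus (p, 0) for every added bit.
    let mut items = kernel.clone();
    for p in 0..prods.len() {
        if (added[p / 64] & (1u64 << (p % 64))) != 0 {
            items.insert((p, 0));
        }
    }
    items
}

/// Hypothetical toy grammar: S' -> E; E -> E + T | T; T -> i | ( E )
/// Nonterminal indices: S' = 0, E = 1, T = 2.
fn example_grammar() -> Vec<Prod> {
    use Sym::{Nt, T};
    vec![
        Prod { lhs: 0, rhs: vec![Nt(1)] },
        Prod { lhs: 1, rhs: vec![Nt(1), T('+'), Nt(2)] },
        Prod { lhs: 1, rhs: vec![Nt(2)] },
        Prod { lhs: 2, rhs: vec![T('i')] },
        Prod { lhs: 2, rhs: vec![T('('), Nt(1), T(')')] },
    ]
}
```

With this toy grammar, closing the single item (0, 0) should yield (0, 0) through (4, 0), and closing (1, 2) should add only (3, 0) and (4, 0). The second `added` bitfield is a choice made for this sketch so that no production is scheduled twice; the real implementation may track that differently.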

On my machine, this speeds up the time needed to build the PHP LR table by 10%,
from 0.33s to 0.30s.

ptersilie · 2016-10-03T15:05:07Z

src/lib/stategraph.rs

+
+        // In a typical description of this algorithm, one would have a todo set which contains
+        // pairs (prod_i, dot). Unfortunately this is a slow way of doing things. Searching the set
+        // for the next item and removing it is slow; and, since we don't know how many potential


dots there are in a production

ltratt · 2016-10-03T16:10:34Z

If the new commit fixes things for you, let me know, I'll squash, then we'll be in a state to merge (hopefully).

ptersilie · 2016-10-04T15:18:09Z

Yep, looks good.

ltratt · 2016-10-04T15:19:54Z

OK, ready for merging (with luck).

ptersilie · 2016-10-04T15:46:00Z

Merged.

ltratt assigned ptersilie Sep 25, 2016

ptersilie reviewed Oct 3, 2016

ltratt force-pushed the faster_closing branch from cc1bf72 to 0184cc7 Compare October 4, 2016 15:19

ptersilie merged commit bb171e8 into master Oct 4, 2016

ptersilie deleted the faster_closing branch October 4, 2016 15:45
