Skip to content
This repository has been archived by the owner on Aug 13, 2018. It is now read-only.

Speed up set closure by using a fixed amount of memory. #45

Merged
merged 1 commit into from Oct 4, 2016

Conversation

ltratt
Copy link
Member

@ltratt ltratt commented Sep 25, 2016

In a typical description of this algorithm -- and in the previous implementation
-- one has a todo set which contains pairs (prod_i, dot). Unfortunately this is
a slow way of doing things. Searching the set for the next item and removing it
is slow; and, since we don't know how many potential dots there are production,
the set is of potentially unbounded size, so we can end up resizing memory.
Since this function is the most expensive in the table generation, using a
HashSet (which is the "obvious" solution) is pretty slow.

However, we can reduce these costs through two observations:

  1. The initial todo set is populated with (prod_i, dot) pairs that all come
    from self.items.keys(). There's no point copying these into a todo list.
  2. All subsequent todo items are of the form (prod_off, 0). Since the dot in
    these cases is always 0, we don't need to store pairs: simply knowing which
    prod_off's we need to look at is sufficient. We can represent these with a
    fixed-size bitfield.
    All we need to do is first iterate through the items in 1 and, when it's
    exhausted, continually iterate over the bitfield from 2 until no new items have
    been added.

On my machine, this speeds up the time needed to build the PHP LR table by 10%,
from 0.33s to 0.30s.


// In a typical description of this algorithm, one would have a todo set which contains
// pairs (prod_i, dot). Unfortunately this is a slow way of doing things. Searching the set
// for the next item and removing it is slow; and, since we don't know how many potential
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dots there are in a production

@ltratt
Copy link
Member Author

ltratt commented Oct 3, 2016

If the new commit fixes things for you, let me know, I'll squash, then we'll be in a state to merge (hopefully).

@ptersilie
Copy link
Member

Yep, looks good.

In a typical description of this algorithm -- and in the previous implementation
-- one has a todo set which contains pairs (prod_i, dot). Unfortunately this is
a slow way of doing things. Searching the set for the next item and removing it
is slow; and, since we don't know how many potential dots there are production,
the set is of potentially unbounded size, so we can end up resizing memory.
Since this function is the most expensive in the table generation, using a
HashSet (which is the "obvious" solution) is pretty slow.

However, we can reduce these costs through two observations:
  1) The initial todo set is populated with (prod_i, dot) pairs that all come
     from self.items.keys(). There's no point copying these into a todo list.
  2) All subsequent todo items are of the form (prod_off, 0). Since the dot in
     these cases is always 0, we don't need to store pairs: simply knowing which
     prod_off's we need to look at is sufficient. We can represent these with a
     fixed-size bitfield.
All we need to do is first iterate through the items in 1 and, when it's
exhausted, continually iterate over the bitfield from 2 until no new items have
been added.

On my machine, this speeds up the time needed to build the PHP LR table by 10%,
from 0.33s to 0.30s.
@ltratt
Copy link
Member Author

ltratt commented Oct 4, 2016

OK, ready for merging (with luck).

@ptersilie ptersilie merged commit bb171e8 into master Oct 4, 2016
@ptersilie ptersilie deleted the faster_closing branch October 4, 2016 15:45
@ptersilie
Copy link
Member

Merged.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants