Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimized ranges (fix for issue #150) #173

Merged
merged 6 commits into from
Jan 9, 2022
Merged

Optimized ranges (fix for issue #150) #173

merged 6 commits into from
Jan 9, 2022

Conversation

glromeo
Copy link
Contributor

@glromeo glromeo commented Jan 1, 2022

@bcoe this PR focuses only on #159
please notice that I had to fix the shebang snapshot, I investigated the differences and it looks like before it was reporting 2 branches (wasn't it wrong?) while now there is only one
image

@glromeo
Copy link
Contributor Author

glromeo commented Jan 1, 2022

The other difference in the snapshots is the same I analized here
Also, running c8 tests I get the same results with this branch as with v8.1.0

@bcoe
Copy link
Member

bcoe commented Jan 1, 2022

Hey @glromeo, how are you testing against c8? I'm still seeing the same issue, I test via:

cd v8-to-istanbul
npm link .
cd c8
npm link v8-to-istanbul
./bin/c8.js --all=true --include=test/fixtures/all/ts-compiled/**/*.js --exclude="test/*.js" node ./test/fixtures/all/ts-compiled/main.js

On this branch you'll get:

loaded.ts      |   89.47 |       80 |     100 |   89.47 | 4-5               

On the main branch you'll get:

loaded.ts      |   73.68 |    66.66 |     100 |   73.68 | 4-5,16-18         

It seems like you're failing to find this range:

            {
              "startOffset": 351,
              "endOffset": 391,
              "count": 0
            }

Which is what includes the else {\n return 'wat?';\n }". I'm betting there's just an off by one error in the sliceRange logic.

What I would be tempted to do, is break out the tests for range.js into their own file, then we can test a bunch of edge cases, without having to recreate them in snapshots 😄

if (!lines.length) return {}

const start = originalPositionTryBoth(
sourceMap,
lines[0].line,
Math.max(0, startCol - lines[0].startCol)
)
if (!(start && start.source)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we might not need to change this logic, if we figure out what's going on with the range detection logic.

Copy link
Contributor Author

@glromeo glromeo Jan 3, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I rearranged those ifs to fail fast and avoid the costly originalEndPositionFor...but admittedly it's a nano-optimization...

Comment on lines 115 to 128
}
const lines = covSource.lines.filter(line => {
// Upstream tooling can provide a block with the functionName
// (empty-report), this will result in a report that has all
// lines zeroed out.
if (block.functionName === '(empty-report)') {

// (empty-report), this will result in a report that has all
// lines zeroed out.
if (block.functionName === '(empty-report)') {
this.all = true
covSource.lines.forEach(line => {
line.count = 0
this.all = true
return true
}
})
}

const lines = this.all ? covSource.lines : sliceRange(covSource.lines, startCol, endCol)
if (!lines.length) {
return
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this would need to be something closer to:

        let lines
        if (block.functionName === '(empty-report)') {
          lines = covSource.lines.filter((line) => {
            line.count = 0
            this.all = true
            return true
          })
        } else {
          lines = sliceRange(covSource.lines, startCol, endCol)
        }

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thank you...that is definitely better than what I had originally...
I just moved the assignment of this all out of the loop being careful to do
this.all = lines.length > 0
...sorry it's not premature optimisation it's OCD 😆

lib/range.js Outdated
* ...something resembling a binary search, to find the lowest line within the range.
* And then you could break as soon as the line is longer than the range...
*/
module.exports.sliceRange = (lines, startCol, endCol) => {
Copy link
Member

@bcoe bcoe Jan 1, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just wanted to write out a suggestion in pseudo code, I think you could do this:

Phase 1.

  1. Check lines[lines.length / 2]
  2. If line.startCol > endCol, look at bottom half of remaining lines. If line.startCol < endCol look at top half of remaining lines.
  3. Repeat step 2, until you find the first index where line.startCol > endCol, store this as upperIndex.

Phase 2.

const filteredLines = []
for (let i = upperIndex - 1; i >= 0; i--) {
  const line = lines[i]
  if (startCol <= line.endCol && endCol >= line.startCol) {
    filteredLines.unshift(line)
  } else if (line.startCol < startCol) {
    break;
  }
}

I think picking one of either the startCol or endCol for the search will work best, f I'm thinking of the problem properly, and it seems like startCol is the right choice.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@glromeo I'm not sure my mental model is right, so take what I say with a grain of salt ... but, I think once we've created the array of lines, for line 0 - N, I believe the start/end position should never overlap, and line[n].endPos should always be < line[n + 1].startPos.

Given this, I was thinking about the algorithm, a better approach than my first recommendation might be:

  • perform a binary search (starting at the half point, and going to the upper or lower half), until you find the index of the first line with endCol >= line.startCol -- store this as indexStart.

Then you can just do this:

const lines = []
for (let i = indexStart; i <= indexEnd; i++) {
  const line = lines[i]
  if (startCol < line.endCol && endCol >= line.startCol) {
    lines.push(line);
  } else {
    break;
  }
}

☝️ I think this should be functionally equivalent to the old approach, but perform way less comparisons.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@glromeo I tested this optimization which tests half the proposed algorithm:

    const lines = []
    for (const line of this.lines) {
      if (startCol <= line.endCol && endCol >= line.startCol) {
        lines.push(line)
      } else if (line.startCol > endCol) {
        break
      }
    }

And it works like a charm in c8's test suite, the other half of the algorithm is starting at the appropriate offset of lines, rather than at 0.

@glromeo
Copy link
Contributor Author

glromeo commented Jan 2, 2022

@bcoe I must have messed up the symlinks jumping back and forth between node 16 & 11... I am able to reproduce the issue now! sorry about that 🤦

@glromeo
Copy link
Contributor Author

glromeo commented Jan 2, 2022

I went back to the drawing board and I realised that endCol on the lines is treated differently here

return startCol < line.endCol && endCol >= line.startCol

vs here
return startCol <= line.endCol && endCol >= line.startCol

Why should the transpiled lines be treated differently than the source ones?
The way the source (1st) lines' endCol is treated makes more sense to me and I think the comparison should be the same.
To reinforce my idea If I try line.startCol < endCol in both cases it's just ts-only/loaded coverage that shrinks to 17-18 instead of 16-18 and that's wrong, but if I try the opposite... hell breaks loose

@glromeo
Copy link
Contributor Author

glromeo commented Jan 3, 2022

ok, I see there's a history... I came up with 2 different sliceRange for now because I want to take more time to investigate how to find a unified solution that doesn't fall in the "off by one" issue and I wanted to keep the PR small

Both all of v8-to-istanbul tests are OK as well as c8's

@bcoe
Copy link
Member

bcoe commented Jan 3, 2022

@glromeo with regards to the two different ranges, I think it had to do with <=, >= overselecting when incrementing line counts, which as we've seen leads to false positives.

In the case of source maps, on the other hand, it worked better to make sure we caught as many lines as possible to then apply remapping to -- I'm a little fuzzy on this, but I think basically it was a lot of trial and error with a few common transpilers.

What I would do, potentially, would be just add an option like Inclusive = true|false, which just toggles your the logical check you perform (I think that could work?). Then you could potentially still use a similar approach of finding the first index to check, and running until the inclusive or exclusive check fails.

@glromeo
Copy link
Contributor Author

glromeo commented Jan 3, 2022

@bcoe it's ready for you to review/test it if you have some time today

require('tap').mochaGlobals()
require('should')

describe('range', () => {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thaniks for starting to flesh out this suite 👌

@bcoe
Copy link
Member

bcoe commented Jan 4, 2022

@glromeo thanks, I'll review soon. One thing I'd like to figur out how to test is how many fewer itertations we perform with the new algorithm, I will just link both the original and this branch against a large codebase, and incremement a counter.

@glromeo
Copy link
Contributor Author

glromeo commented Jan 5, 2022

The script of the issue goes from 9.84s (6.01s) down to 4.67s (0.84s)

I got

image

given
image

then I did

image

and it was

image

commenting out applyCoverage

image

I get a baseline of

image

@glromeo
Copy link
Contributor Author

glromeo commented Jan 5, 2022

Out of curiosity I tried to botch a double binary search to validate my hunch that it's better to scan for the end of the range.
With that change it goes from 4.67s to 4.25s (from 0.84s to 0.42s) which seemed to contradict me!
But then I had a glance at the bundle and I noticed big blocks of inline css and that reassured me to stick with the search and scan approach. 😉

Copy link
Member

@bcoe bcoe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this contribution 👌 sounds like a really good performance improvement overall.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants