Optimized ranges (fix for issue #150) #173

glromeo · 2022-01-01T02:13:29Z

@bcoe this PR focuses only on #159
Please note that I had to fix the shebang snapshot. I investigated the differences, and it looks like it previously reported 2 branches (wasn't that wrong?), while now there is only one.

glromeo · 2022-01-01T02:23:05Z

~~The other difference in the snapshots is the same one I analyzed here~~
Also, running c8 tests I get the same results with this branch as with v8.1.0

bcoe · 2022-01-01T17:37:23Z

Hey @glromeo, how are you testing against c8? I'm still seeing the same issue, I test via:

cd v8-to-istanbul
npm link .
cd c8
npm link v8-to-istanbul
./bin/c8.js --all=true --include=test/fixtures/all/ts-compiled/**/*.js --exclude="test/*.js" node ./test/fixtures/all/ts-compiled/main.js

On this branch you'll get:

loaded.ts      |   89.47 |       80 |     100 |   89.47 | 4-5

On the main branch you'll get:

loaded.ts      |   73.68 |    66.66 |     100 |   73.68 | 4-5,16-18

It seems like you're failing to find this range:

            {
              "startOffset": 351,
              "endOffset": 391,
              "count": 0
            }

This is the range that includes the "else {\n return 'wat?';\n }". I'm betting there's just an off-by-one error in the sliceRange logic.

What I would be tempted to do, is break out the tests for range.js into their own file, then we can test a bunch of edge cases, without having to recreate them in snapshots 😄

bcoe · 2022-01-01T17:38:06Z

lib/source.js

    if (!lines.length) return {}

    const start = originalPositionTryBoth(
      sourceMap,
      lines[0].line,
      Math.max(0, startCol - lines[0].startCol)
    )
+    if (!(start && start.source)) {


I think we might not need to change this logic, if we figure out what's going on with the range detection logic.

I rearranged those ifs to fail fast and avoid the costly originalEndPositionFor...but admittedly it's a nano-optimization...

bcoe · 2022-01-01T17:41:52Z

lib/v8-to-istanbul.js

        }
-        const lines = covSource.lines.filter(line => {
-          // Upstream tooling can provide a block with the functionName
-          // (empty-report), this will result in a report that has all
-          // lines zeroed out.
-          if (block.functionName === '(empty-report)') {
+
+        // (empty-report), this will result in a report that has all
+        // lines zeroed out.
+        if (block.functionName === '(empty-report)') {
+          this.all = true
+          covSource.lines.forEach(line => {
            line.count = 0
-            this.all = true
-            return true
-          }
+          })
+        }
+
+        const lines = this.all ? covSource.lines : sliceRange(covSource.lines, startCol, endCol)
+        if (!lines.length) {
+          return


I think this would need to be something closer to:

    let lines
    if (block.functionName === '(empty-report)') {
      lines = covSource.lines.filter((line) => {
        line.count = 0
        this.all = true
        return true
      })
    } else {
      lines = sliceRange(covSource.lines, startCol, endCol)
    }

thank you...that is definitely better than what I had originally...
I just moved the assignment of this.all out of the loop, being careful to do
this.all = lines.length > 0
...sorry it's not premature optimisation it's OCD 😆

bcoe · 2022-01-01T18:02:33Z

lib/range.js

+ * ...something resembling a binary search, to find the lowest line within the range.
+ * And then you could break as soon as the line is longer than the range...
+ */
+module.exports.sliceRange = (lines, startCol, endCol) => {


Just wanted to write out a suggestion in pseudo code, I think you could do this:

Phase 1.

1. Check lines[lines.length / 2].

2. If line.startCol > endCol, look at the bottom half of the remaining lines. If line.startCol < endCol, look at the top half of the remaining lines.

3. Repeat step 2 until you find the first index where line.startCol > endCol; store this as upperIndex.

Phase 2.

    const filteredLines = []
    for (let i = upperIndex - 1; i >= 0; i--) {
      const line = lines[i]
      if (startCol <= line.endCol && endCol >= line.startCol) {
        filteredLines.unshift(line)
      } else if (line.startCol < startCol) {
        break;
      }
    }

I think picking either the startCol or the endCol for the search will work best, if I'm thinking of the problem properly, and it seems like startCol is the right choice.

@glromeo I'm not sure my mental model is right, so take what I say with a grain of salt ... but, I think once we've created the array of lines, for line 0 - N, I believe the start/end position should never overlap, and line[n].endPos should always be < line[n + 1].startPos.

Given this, I was thinking about the algorithm, a better approach than my first recommendation might be:

perform a binary search (starting at the half point, and going to the upper or lower half), until you find the index of the first line with endCol >= line.startCol -- store this as indexStart.

Then you can just do this:

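    // Collect into a separate array so the input `lines` is not shadowed.
    // indexEnd is the last index to consider (for example, lines.length - 1).
    const filteredLines = []
    for (let i = indexStart; i <= indexEnd; i++) {
      const line = lines[i]
      if (startCol < line.endCol && endCol >= line.startCol) {
        filteredLines.push(line);
      } else {
        break;
      }
    }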

☝️ I think this should be functionally equivalent to the old approach, but perform way less comparisons.

@glromeo I tested this optimization which tests half the proposed algorithm:

    const lines = []
    for (const line of this.lines) {
      if (startCol <= line.endCol && endCol >= line.startCol) {
        lines.push(line)
      } else if (line.startCol > endCol) {
        break
      }
    }

And it works like a charm in c8's test suite, the other half of the algorithm is starting at the appropriate offset of lines, rather than at 0.
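
For readers following the thread, the search-then-scan idea discussed above can be sketched roughly as follows. This is only an illustration, not the code that was merged into lib/range.js: the name sliceRangeSketch is hypothetical, and the sketch assumes that lines are sorted by startCol and never overlap, as suggested earlier in this comment. It uses the exclusive comparison (startCol < line.endCol).

```javascript
// Hypothetical sketch, not the implementation merged in this PR.
// Assumes `lines` is sorted by startCol and that lines never overlap.
function sliceRangeSketch (lines, startCol, endCol) {
  // Binary search for the first line whose endCol lies past startCol.
  let lo = 0
  let hi = lines.length
  while (lo < hi) {
    const mid = (lo + hi) >> 1
    if (lines[mid].endCol <= startCol) {
      lo = mid + 1
    } else {
      hi = mid
    }
  }
  // Linear scan from that index, stopping at the first line that
  // starts after the end of the range.
  const result = []
  for (let i = lo; i < lines.length; i++) {
    const line = lines[i]
    if (line.startCol > endCol) break
    result.push(line)
  }
  return result
}

// Usage with made-up offsets: selects the second and third lines.
const exampleLines = [
  { startCol: 0, endCol: 10 },
  { startCol: 11, endCol: 20 },
  { startCol: 21, endCol: 30 },
  { startCol: 31, endCol: 40 }
]
const selected = sliceRangeSketch(exampleLines, 12, 25)
```

Under the stated assumptions, this returns the same lines as filtering the whole array with startCol < line.endCol && endCol >= line.startCol, while only visiting lines near the range.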

glromeo · 2022-01-02T11:41:40Z

@bcoe I must have messed up the symlinks jumping back and forth between node 16 & 11... I am able to reproduce the issue now! sorry about that 🤦

glromeo · 2022-01-02T23:59:52Z

I went back to the drawing board and I realised that endCol on the lines is treated differently here

v8-to-istanbul/lib/v8-to-istanbul.js

Line 125 in 53c1cd8

return startCol < line.endCol && endCol >= line.startCol

vs here

v8-to-istanbul/lib/source.js

Line 79 in 53c1cd8

return startCol <= line.endCol && endCol >= line.startCol

Why should the transpiled lines be treated differently than the source ones?
The way the source (1st) lines' endCol is treated makes more sense to me and I think the comparison should be the same.
To reinforce my idea: if I try line.startCol < endCol in both cases, only the ts-only/loaded coverage changes, shrinking to 17-18 instead of 16-18, and that's wrong. But if I try the opposite... all hell breaks loose.

glromeo · 2022-01-03T12:48:13Z

ok, I see there's a history... I came up with 2 different sliceRange for now because I want to take more time to investigate how to find a unified solution that doesn't fall in the "off by one" issue and I wanted to keep the PR small

All of v8-to-istanbul's tests pass, as do c8's.

bcoe · 2022-01-03T15:42:45Z

@glromeo with regards to the two different ranges, I think it had to do with <=, >= overselecting when incrementing line counts, which as we've seen leads to false positives.

In the case of source maps, on the other hand, it worked better to make sure we caught as many lines as possible to then apply remapping to -- I'm a little fuzzy on this, but I think basically it was a lot of trial and error with a few common transpilers.

What I would do, potentially, would be to just add an option like Inclusive = true|false, which toggles the logical check you perform (I think that could work?). Then you could potentially still use a similar approach of finding the first index to check, and running until the inclusive or exclusive check fails.
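
To make the difference between the two checks concrete, here is a minimal sketch of such a toggle. The helper name overlaps and the sample offsets are hypothetical; the two comparisons are the ones quoted earlier in the thread from lib/v8-to-istanbul.js (exclusive) and lib/source.js (inclusive).

```javascript
// Hypothetical helper illustrating the inclusive/exclusive toggle.
function overlaps (line, startCol, endCol, inclusive) {
  // Inclusive: a range starting exactly at line.endCol still selects the line.
  // Exclusive: such a range is not considered to touch the line.
  return inclusive
    ? startCol <= line.endCol && endCol >= line.startCol
    : startCol < line.endCol && endCol >= line.startCol
}

// A range that begins exactly where the line ends (made-up offsets).
const boundaryLine = { startCol: 0, endCol: 10 }
const exclusiveHit = overlaps(boundaryLine, 10, 15, false) // false
const inclusiveHit = overlaps(boundaryLine, 10, 15, true) // true
```

The two checks only disagree when a range starts exactly at a line's endCol, which is the boundary case where an off-by-one error would show up.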

glromeo · 2022-01-03T20:50:07Z

@bcoe it's ready for you to review/test it if you have some time today

bcoe · 2022-01-04T15:49:28Z

test/range.js

+require('tap').mochaGlobals()
+require('should')
+
+describe('range', () => {


Thanks for starting to flesh out this suite 👌

bcoe · 2022-01-04T15:50:37Z

@glromeo thanks, I'll review soon. One thing I'd like to figure out how to test is how many fewer iterations we perform with the new algorithm. I will just link both the original and this branch against a large codebase, and increment a counter.
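
A rough sketch of what such a counter comparison could look like is shown below. It runs on synthetic, evenly spaced lines rather than a real codebase, and all names (makeLines, countFilter, countSearchAndScan) are hypothetical. The search-and-scan variant assumes sorted, non-overlapping lines and is an illustration, not the merged implementation.

```javascript
// Hypothetical instrumentation on synthetic data.
function makeLines (n, width) {
  const lines = []
  for (let i = 0; i < n; i++) {
    const startCol = i * (width + 1)
    lines.push({ startCol, endCol: startCol + width })
  }
  return lines
}

// Original approach: test every line against the range.
function countFilter (lines, startCol, endCol) {
  let comparisons = 0
  const hits = lines.filter(line => {
    comparisons++
    return startCol < line.endCol && endCol >= line.startCol
  })
  return { hits: hits.length, comparisons }
}

// Sketch of the new approach: binary search, then a short linear scan.
function countSearchAndScan (lines, startCol, endCol) {
  let comparisons = 0
  let lo = 0
  let hi = lines.length
  while (lo < hi) {
    const mid = (lo + hi) >> 1
    comparisons++
    if (lines[mid].endCol <= startCol) lo = mid + 1
    else hi = mid
  }
  let hits = 0
  for (let i = lo; i < lines.length; i++) {
    comparisons++
    if (lines[i].startCol > endCol) break
    hits++
  }
  return { hits, comparisons }
}

const syntheticLines = makeLines(1000, 40)
const naive = countFilter(syntheticLines, 20500, 20600)
const fast = countSearchAndScan(syntheticLines, 20500, 20600)
console.log(naive, fast)
```

On this synthetic input both variants select the same number of lines, while the filter performs one comparison per line and the search-and-scan variant performs on the order of log2(n) plus the number of selected lines.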

glromeo · 2022-01-05T02:45:25Z

The script of the issue goes from 9.84s (6.01s) down to 4.67s (0.84s)

I got

given

then I did

and it was

commenting out applyCoverage

I get a baseline of

glromeo · 2022-01-05T03:42:02Z

Out of curiosity I tried to botch a double binary search to validate my hunch that it's better to scan for the end of the range.
With that change it goes from 4.67s to 4.25s (from 0.84s to 0.42s) which seemed to contradict me!
But then I had a glance at the bundle and I noticed big blocks of inline css and that reassured me to stick with the search and scan approach. 😉

bcoe

Thanks for this contribution 👌 sounds like a really good performance improvement overall.

glromeo added 2 commits January 1, 2022 01:56

fix for issue #159

093374b

fixed shebang fixture incorrectly having 2 branches

a3c3926

glromeo mentioned this pull request Jan 1, 2022

Fixed startCol/endCol vs startOffset/endOffset mismatch due to greediness and whitespaces #172

Closed

removed trimRange (oops!)

825e320

bcoe reviewed Jan 1, 2022

Temporary solution with 2 variants of sliceRange

c10c8ac

Refactored into a single sliceRange with inclusive option

db4e653

improved line coverage

984557c

bcoe reviewed Jan 4, 2022

glromeo mentioned this pull request Jan 4, 2022

Null entries in sourcesContent #175

Closed

bcoe approved these changes Jan 9, 2022

bcoe merged commit 3f83226 into istanbuljs:master Jan 9, 2022

bcoe mentioned this pull request Mar 23, 2022

Recent changes to v8-to-istanbul cause issues for Node 10 bcoe/c8#385

Open
