-
-
Notifications
You must be signed in to change notification settings - Fork 41
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optimized ranges (fix for issue #150) #173
Conversation
|
Hey @glromeo, how are you testing against
On this branch you'll get:
On the main branch you'll get:
It seems like you're failing to find this range:
Which is what includes the What I would be tempted to do, is break out the tests for |
if (!lines.length) return {} | ||
|
||
const start = originalPositionTryBoth( | ||
sourceMap, | ||
lines[0].line, | ||
Math.max(0, startCol - lines[0].startCol) | ||
) | ||
if (!(start && start.source)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we might not need to change this logic, if we figure out what's going on with the range detection logic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I rearranged those ifs to fail fast and avoid the costly originalEndPositionFor...but admittedly it's a nano-optimization...
lib/v8-to-istanbul.js
Outdated
} | ||
const lines = covSource.lines.filter(line => { | ||
// Upstream tooling can provide a block with the functionName | ||
// (empty-report), this will result in a report that has all | ||
// lines zeroed out. | ||
if (block.functionName === '(empty-report)') { | ||
|
||
// (empty-report), this will result in a report that has all | ||
// lines zeroed out. | ||
if (block.functionName === '(empty-report)') { | ||
this.all = true | ||
covSource.lines.forEach(line => { | ||
line.count = 0 | ||
this.all = true | ||
return true | ||
} | ||
}) | ||
} | ||
|
||
const lines = this.all ? covSource.lines : sliceRange(covSource.lines, startCol, endCol) | ||
if (!lines.length) { | ||
return |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this would need to be something closer to:
let lines
if (block.functionName === '(empty-report)') {
lines = covSource.lines.filter((line) => {
line.count = 0
this.all = true
return true
})
} else {
lines = sliceRange(covSource.lines, startCol, endCol)
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thank you...that is definitely better than what I had originally...
I just moved the assignment of this all out of the loop being careful to do
this.all = lines.length > 0
...sorry it's not premature optimisation it's OCD 😆
lib/range.js
Outdated
* ...something resembling a binary search, to find the lowest line within the range. | ||
* And then you could break as soon as the line is longer than the range... | ||
*/ | ||
module.exports.sliceRange = (lines, startCol, endCol) => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just wanted to write out a suggestion in pseudo code, I think you could do this:
Phase 1.
- Check
lines[lines.length / 2]
- If
line.startCol
>endCol
, look at bottom half of remaining lines. Ifline.startCol < endCol
look at top half of remaining lines. - Repeat step
2
, until you find the first index whereline.startCol > endCol
, store this asupperIndex
.
Phase 2.
const filteredLines = []
for (let i = upperIndex - 1; i >= 0; i--) {
const line = lines[i]
if (startCol <= line.endCol && endCol >= line.startCol) {
filteredLines.unshift(line)
} else if (line.startCol < startCol) {
break;
}
}
I think picking one of either the startCol
or endCol
for the search will work best, f I'm thinking of the problem properly, and it seems like startCol
is the right choice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@glromeo I'm not sure my mental model is right, so take what I say with a grain of salt ... but, I think once we've created the array of lines
, for line 0 - N, I believe the start/end position should never overlap, and line[n].endPos
should always be < line[n + 1].startPos
.
Given this, I was thinking about the algorithm, a better approach than my first recommendation might be:
- perform a binary search (starting at the half point, and going to the upper or lower half), until you find the index of the first
line
withendCol
>=line.startCol
-- store this asindexStart
.
Then you can just do this:
const lines = []
for (let i = indexStart; i <= indexEnd; i++) {
const line = lines[i]
if (startCol < line.endCol && endCol >= line.startCol) {
lines.push(line);
} else {
break;
}
}
☝️ I think this should be functionally equivalent to the old approach, but perform way less comparisons.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@glromeo I tested this optimization which tests half the proposed algorithm:
const lines = []
for (const line of this.lines) {
if (startCol <= line.endCol && endCol >= line.startCol) {
lines.push(line)
} else if (line.startCol > endCol) {
break
}
}
And it works like a charm in c8
's test suite, the other half of the algorithm is starting at the appropriate offset of lines, rather than at 0
.
@bcoe I must have messed up the symlinks jumping back and forth between node 16 & 11... I am able to reproduce the issue now! sorry about that 🤦 |
I went back to the drawing board and I realised that v8-to-istanbul/lib/v8-to-istanbul.js Line 125 in 53c1cd8
vs here Line 79 in 53c1cd8
Why should the transpiled lines be treated differently than the source ones? The way the source (1st) lines' endCol is treated makes more sense to me and I think the comparison should be the same.To reinforce my idea If I try line.startCol < endCol in both cases it's just ts-only/loaded coverage that shrinks to 17-18 instead of 16-18 and that's wrong, but if I try the opposite... hell breaks loose
|
ok, I see there's a history... I came up with 2 different sliceRange for now because I want to take more time to investigate how to find a unified solution that doesn't fall in the "off by one" issue and I wanted to keep the PR small Both all of v8-to-istanbul tests are OK as well as c8's |
@glromeo with regards to the two different ranges, I think it had to do with In the case of source maps, on the other hand, it worked better to make sure we caught as many lines as possible to then apply remapping to -- I'm a little fuzzy on this, but I think basically it was a lot of trial and error with a few common transpilers. What I would do, potentially, would be just add an option like |
@bcoe it's ready for you to review/test it if you have some time today |
require('tap').mochaGlobals() | ||
require('should') | ||
|
||
describe('range', () => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thaniks for starting to flesh out this suite 👌
@glromeo thanks, I'll review soon. One thing I'd like to figur out how to test is how many fewer itertations we perform with the new algorithm, I will just link both the original and this branch against a large codebase, and incremement a counter. |
Out of curiosity I tried to botch a double binary search to validate my hunch that it's better to scan for the end of the range. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this contribution 👌 sounds like a really good performance improvement overall.
@bcoe this PR focuses only on #159
please notice that I had to fix the shebang snapshot, I investigated the differences and it looks like before it was reporting 2 branches (wasn't it wrong?) while now there is only one