patch_parse: populate line numbers while parsing diffs #4687

tiennou · 2018-06-18T20:45:51Z

This should fix #4672. Looks like providing the "parsed" line numbers in the line struct was never actually wired (!).

tiennou · 2018-06-18T20:46:27Z

tests/diff/parse.c

+
+void test_diff_parse__lineinfo(void)
+{
+	const char *text = PATCH_ORIGINAL_TO_CHANGE_MIDDLE;


#define PATCH_ORIGINAL_TO_CHANGE_MIDDLE \ "diff --git a/file.txt b/file.txt\n" \ "index 9432026..cd8fd12 100644\n" \ "--- a/file.txt\n" \ "+++ b/file.txt\n" \ "@@ -3,7 +3,7 @@ this is some context!\n" \ " around some lines\n" \ " that will change\n" \ " yes it is!\n" \ "-(this line is changed)\n" \ "+(THIS line is changed!)\n" \ " and this\n" \ " is additional context\n" \ " below it!\n"

✨

Also, I don't know what to make of the "this is some context!\n" line. Right now it's completely skipped.

That sounds correct. We should be surfacing that in the hunk header but not as a line in the diff.

tiennou · 2018-06-18T20:50:01Z

src/patch_parse.c

@@ -607,6 +611,9 @@ static int parse_hunk_body(
 		line->content_len = ctx->parse_ctx.line_len - prefix;
 		line->content_offset = ctx->parse_ctx.content_len - ctx->parse_ctx.remain_len;
 		line->origin = origin;
+		line->num_lines = 1;


I'm suspicious of this : is there actually a way of building a text-based diff with embedded newlines 😧 ?

I don't think that's a thing. I don't know how this could ever be anything but 1.

Looks like the xdiff code also doesn't set it. The only setters are in diff_print.c.

ethomson · 2018-06-19T08:17:32Z

Yeah, this is an omission. But note that in #4672, the problem is not with a parsed patch (since there's not a lot of API to produce those yet) but with a generated patch. So I probably broke what was there (or it never worked right, I'm not sure which).

So we should add some tests for both (or general tests).

ethomson · 2018-06-19T08:19:34Z

tests/diff/parse.c

+	cl_git_assert_lineinfo(-1, 6, 1, patch, 0, l++);
+	cl_git_assert_lineinfo(7, 7, 1, patch, 0, l++);
+	cl_git_assert_lineinfo(8, 8, 1, patch, 0, l++);
+	cl_git_assert_lineinfo(9, 9, 1, patch, 0, l++);


I don't love this. I realize that you're aiming for brevity here but it's really hard for me to read and understand what this at a glance. I would much rather you unrolled these into simple tests.

And it's not even complete… Here's an example of the expanded version. My one isn't as exhaustive (origin and offset are missing), but it tests each line, so it's about 5 lines of asserts * 9 lines of patch => 45 lines of test.

Would something like this be more acceptable ?

cl_git_pass(git_line_from_hunk_patch_diff(&line, patch, 0, l++)); cl_git_assert_lineinfo(line, origin, old, new, newlines, offset, "stuff\n");

Don't use an expanded version, but simply use an array :)

struct { size_t start; size_t end; size_t length; /* No idea what those fields really are, but you get what I mean */ } expected_lineinfos[] = { { 3, 3, 1 }, { 4, 4, 1 }, ... }; for (i = 0; i < ARRAY_SIZE(expected_lineinfos); i++) { git_line_from_hunk_patch_diff(&line, patch, 0, i); cl_assert_equal_i(old_lineno, line->old_lineno); cl_assert_equal_i(new_lineno, line->new_lineno); cl_assert_equal_i(num_lines, line->num_lines); }

⚠️ Incoming bikeshedding.

I'm not against going the array route (I contemplated doing that). What makes me prefer the macro route is that I can preserve "intent" (eg. cl_git_assert_lineinfo vs a bunch of equal_i), as well as __LINE__ numbers, the latter helps to make the failure context relevant straight from CI (which is something I'm personally striving for when writing tests (hence my propension to add macros on top of helper functions). There are quite many tests that check line info around, and that, as pointed out above, more would be quite welcome (eg. I haven't found anything that test hunk headers context lines behavior), and thus my preference goes to a tight one-liner.

The array route doesn't allow for that preservation, and even though it looks cleaner, it's messier to preserve the line info with them. I have a feeling I have already wrote one of these array-ed+macro tests somewhere, which, given my constraints, was 😒.

I'm torn. I appreciate that you're trying to report line numbers, though it's not my highest priority. I'd rather have readable tests that convey intent than to know the line number. eg:

cl_assert_equal_i(old_lineno, line->old_lineno); cl_assert_equal_i(new_lineno, line->new_lineno); cl_assert_equal_i(num_lines, line->num_lines);

makes it incredibly clear what we're doing when I look at this. Now, I concede that regressions would be a bit challenging -- we would have an accurate line number, but that wouldn't necessarily correspond to the failing test data. We would know, for example, that the new_lineno test failed, but not what the data was that caused it to fail, since it could have been any data from the array.

This is a pattern we have in a lot of places and I don't think that we see a lot of regressions around them. So I'm inclined to just shrug and say it's okay. I definitely have done big refactorings where this was a bit annoying - for example, when refactoring the iterators - but it didn't scar me so much that I want to avoid it.

Is your motivation borne out of past experiences with this pattern? Or are you trying to future proof things? Again, my suggestion is that we go for simplicity in tests over strict failure-reporting accuracy here, but we can probably spend some time working to find a more readable compromise if you disagree.

We would know, for example, that the new_lineno test failed, but not what the data was that caused it to fail, since it could have been any data from the array.

Exactly my point. Sometimes seeing the data (from CI, usually) makes it apparent at first glance what the failure is about.

Is your motivation borne out of past experiences with this pattern? Or are you trying to future proof things?

Depends on the pattern. I'm 😒 on the array pattern, because of the forced "trip to the debugger", and you'll usually have to do 2 attempts (since you don't know which array test index actually failed), but I can switch to that if that's the preference. The macro I'm more opinionated about, because it's DRY, and some things can get pervasive.

Some "prior" art :

cl_reflog_check_entry used 18 times in tests. For reference old_lineno gets tested 18 times, and the coverage isn't complete (some of this struct's members are not tested).

check_single_patch_stats, which does a high level count check.

Since I clearly don't mind writing crazy macro, does an integrated "pass an array of structs" approach to a one-shot test helper be acceptable, something like :

void test_patch_lines__correct_info(void) { struct linetests { int origin, old, new; /* more */ const char *content; } tests[] = { { CTX, 1, 1, "myline"}, { CTX, 2, 2, "otherline"}, { ADD, -1, 3, "newline"}, { DEL, 3, -1, "oldline"},, { CTX, 4, 4, "bigline"}, }; cl_git_patch_test_lines_in_hunk(patch, hunk_id, &tests); }

At least I might be able to report the failing test idx in there (as cl_reflog_check_entry does).

Well, in my opinion it doesn't help much to have a one-shot test helper that's only ever going to be used once. In that case, you can just as well inline it, as you'd be using line info anyway. I wanted to propose to just use cl_assert_equal_i_fmt, but it turns out fmt is not really a format but simply a string where the two ints should be printed into. Would've been nice to have a macro cl_assert_equal_i_fmt(tests[i].old, line->old_lineno, "old line numbers not matching: expected %i, got %i in line '%s'", tests[i].old, line->old_lineno, line->str). But yeah.

Dunno. I'll retreat from this discussion as I do not think it's important enough. Both styles have their advantages and disadvantages.

a one-shot test helper that's only ever going to be used once.

Actually, the plan was to have one helper that checks line_info and use that everywhere. That would reinforce the current "generated patchs" test as well (that's what I was hinting at above, since some checks are missing).

As an aside, sometimes I feel limited by clar's architecture, and long for something like rspec behaviors… That would allow reusing complete test suites against different "initial conditions" (think bare repo vs non-bare repo worktrees).

Yeah, I had the same feeling sometimes about clar. It' works reasonably well and is a breeze to use in most cases, but some kind of data/configuration/permutation driven interface would in some cases be really helpful. I mean you can easily set that up already, but you do loose information then like we see here in this example.

tiennou · 2018-06-19T19:32:36Z

But note that in #4672, the problem is not with a parsed patch (since there's not a lot of API to produce those yet) but with a generated patch.

Sorry, semantics. "parsed" means "from some file", and generated means "comes from xdiff", mostly ?(as in, at some point we had access to both A & B).

I'm pretty sure the OP is working with the first kind of patch, because the 2nd kind is tested (tests/diff/patch.c:260).

So I probably broke what was there (or it never worked right, I'm not sure which).

Never hooked up.

ethomson · 2018-06-22T09:27:13Z

I'm pretty sure the OP is working with the first kind of patch

Yes, you're absolutely right, thanks for the reminder.

pks-t

I'm fine with these changes, but leave it to @ethomson, as he is a bit more invested in this PR

ethomson · 2018-07-06T11:35:59Z

It's fine. I'd rather have the functionality than a test that I can reason about. If we ever have to revisit it, I'll either suffer or pull it apart.

tiennou commented Jun 18, 2018

View reviewed changes

patch_parse: populate line numbers while parsing diffs

f9e2802

tiennou force-pushed the fix/4672 branch from 54c6b5f to f9e2802 Compare June 18, 2018 22:13

ethomson reviewed Jun 19, 2018

View reviewed changes

pks-t approved these changes Jul 6, 2018

View reviewed changes

ethomson merged commit f463379 into libgit2:master Jul 6, 2018

pks-t added the backport-0.27.6 label Oct 11, 2018

tiennou deleted the fix/4672 branch November 1, 2018 03:56

snyk-bot mentioned this pull request Feb 23, 2020

[Snyk] Upgrade nodegit from 0.4.1 to 0.26.4 saurabharch/Breezeblocks#1

Open

snyk-bot mentioned this pull request Apr 22, 2020

[Snyk] Upgrade nodegit from 0.24.3 to 0.26.5 aminatakonate000/Graviton-App#4

Open

snyk-bot mentioned this pull request May 5, 2020

[Snyk] Upgrade nodegit from 0.24.3 to 0.26.5 Barnstorm-Online/ngp-openapi-generator#1

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

patch_parse: populate line numbers while parsing diffs #4687

patch_parse: populate line numbers while parsing diffs #4687

tiennou commented Jun 18, 2018

tiennou Jun 18, 2018

tiennou Jun 18, 2018

ethomson Jun 19, 2018

tiennou Jun 18, 2018

ethomson Jun 19, 2018

tiennou Jun 19, 2018

ethomson commented Jun 19, 2018

ethomson Jun 19, 2018

tiennou Jun 19, 2018

pks-t Jun 22, 2018

tiennou Jun 22, 2018

ethomson Jun 26, 2018

tiennou Jun 26, 2018

pks-t Jun 29, 2018

tiennou Jun 29, 2018

pks-t Jul 6, 2018

tiennou commented Jun 19, 2018

ethomson commented Jun 22, 2018

pks-t left a comment

ethomson commented Jul 6, 2018

patch_parse: populate line numbers while parsing diffs #4687

patch_parse: populate line numbers while parsing diffs #4687

Conversation

tiennou commented Jun 18, 2018

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

ethomson commented Jun 19, 2018

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

tiennou commented Jun 19, 2018

ethomson commented Jun 22, 2018

pks-t left a comment

Choose a reason for hiding this comment

ethomson commented Jul 6, 2018