Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

patch_parse: populate line numbers while parsing diffs #4687

Merged
merged 1 commit into from
Jul 6, 2018

Conversation

tiennou
Copy link
Contributor

@tiennou tiennou commented Jun 18, 2018

This should fix #4672. Looks like providing the "parsed" line numbers in the line struct was never actually wired (!).


void test_diff_parse__lineinfo(void)
{
const char *text = PATCH_ORIGINAL_TO_CHANGE_MIDDLE;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#define PATCH_ORIGINAL_TO_CHANGE_MIDDLE \
	"diff --git a/file.txt b/file.txt\n" \
	"index 9432026..cd8fd12 100644\n" \
	"--- a/file.txt\n" \
	"+++ b/file.txt\n" \
	"@@ -3,7 +3,7 @@ this is some context!\n" \
	" around some lines\n" \
	" that will change\n" \
	" yes it is!\n" \
	"-(this line is changed)\n" \
	"+(THIS line is changed!)\n" \
	" and this\n" \
	" is additional context\n" \
	" below it!\n"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, I don't know what to make of the "this is some context!\n" line. Right now it's completely skipped.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That sounds correct. We should be surfacing that in the hunk header but not as a line in the diff.

@@ -607,6 +611,9 @@ static int parse_hunk_body(
line->content_len = ctx->parse_ctx.line_len - prefix;
line->content_offset = ctx->parse_ctx.content_len - ctx->parse_ctx.remain_len;
line->origin = origin;
line->num_lines = 1;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm suspicious of this : is there actually a way of building a text-based diff with embedded newlines 😧 ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think that's a thing. I don't know how this could ever be anything but 1.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like the xdiff code also doesn't set it. The only setters are in diff_print.c.

@ethomson
Copy link
Member

Yeah, this is an omission. But note that in #4672, the problem is not with a parsed patch (since there's not a lot of API to produce those yet) but with a generated patch. So I probably broke what was there (or it never worked right, I'm not sure which).

So we should add some tests for both (or general tests).

cl_git_assert_lineinfo(-1, 6, 1, patch, 0, l++);
cl_git_assert_lineinfo(7, 7, 1, patch, 0, l++);
cl_git_assert_lineinfo(8, 8, 1, patch, 0, l++);
cl_git_assert_lineinfo(9, 9, 1, patch, 0, l++);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't love this. I realize that you're aiming for brevity here but it's really hard for me to read and understand what this at a glance. I would much rather you unrolled these into simple tests.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And it's not even complete… Here's an example of the expanded version. My one isn't as exhaustive (origin and offset are missing), but it tests each line, so it's about 5 lines of asserts * 9 lines of patch => 45 lines of test.

Would something like this be more acceptable ?

cl_git_pass(git_line_from_hunk_patch_diff(&line, patch, 0, l++));
cl_git_assert_lineinfo(line, origin, old, new, newlines, offset, "stuff\n");

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't use an expanded version, but simply use an array :)

struct {
    size_t start;
    size_t end;
    size_t length; /* No idea what those fields really are, but you get what I mean */
} expected_lineinfos[] = {
    { 3, 3, 1 },
    { 4, 4, 1 },
    ...
};

for (i = 0; i < ARRAY_SIZE(expected_lineinfos); i++) {
    git_line_from_hunk_patch_diff(&line, patch, 0, i);
    cl_assert_equal_i(old_lineno, line->old_lineno);
    cl_assert_equal_i(new_lineno, line->new_lineno);
    cl_assert_equal_i(num_lines, line->num_lines);
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Incoming bikeshedding.

I'm not against going the array route (I contemplated doing that). What makes me prefer the macro route is that I can preserve "intent" (eg. cl_git_assert_lineinfo vs a bunch of equal_i), as well as __LINE__ numbers, the latter helps to make the failure context relevant straight from CI (which is something I'm personally striving for when writing tests (hence my propension to add macros on top of helper functions). There are quite many tests that check line info around, and that, as pointed out above, more would be quite welcome (eg. I haven't found anything that test hunk headers context lines behavior), and thus my preference goes to a tight one-liner.

The array route doesn't allow for that preservation, and even though it looks cleaner, it's messier to preserve the line info with them. I have a feeling I have already wrote one of these array-ed+macro tests somewhere, which, given my constraints, was 😒.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm torn. I appreciate that you're trying to report line numbers, though it's not my highest priority. I'd rather have readable tests that convey intent than to know the line number. eg:

cl_assert_equal_i(old_lineno, line->old_lineno);
cl_assert_equal_i(new_lineno, line->new_lineno);
cl_assert_equal_i(num_lines, line->num_lines);

makes it incredibly clear what we're doing when I look at this. Now, I concede that regressions would be a bit challenging -- we would have an accurate line number, but that wouldn't necessarily correspond to the failing test data. We would know, for example, that the new_lineno test failed, but not what the data was that caused it to fail, since it could have been any data from the array.

This is a pattern we have in a lot of places and I don't think that we see a lot of regressions around them. So I'm inclined to just shrug and say it's okay. I definitely have done big refactorings where this was a bit annoying - for example, when refactoring the iterators - but it didn't scar me so much that I want to avoid it.

Is your motivation borne out of past experiences with this pattern? Or are you trying to future proof things? Again, my suggestion is that we go for simplicity in tests over strict failure-reporting accuracy here, but we can probably spend some time working to find a more readable compromise if you disagree.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We would know, for example, that the new_lineno test failed, but not what the data was that caused it to fail, since it could have been any data from the array.

Exactly my point. Sometimes seeing the data (from CI, usually) makes it apparent at first glance what the failure is about.

Is your motivation borne out of past experiences with this pattern? Or are you trying to future proof things?

Depends on the pattern. I'm 😒 on the array pattern, because of the forced "trip to the debugger", and you'll usually have to do 2 attempts (since you don't know which array test index actually failed), but I can switch to that if that's the preference. The macro I'm more opinionated about, because it's DRY, and some things can get pervasive.

Some "prior" art :

  • cl_reflog_check_entry used 18 times in tests. For reference old_lineno gets tested 18 times, and the coverage isn't complete (some of this struct's members are not tested).
  • check_single_patch_stats, which does a high level count check.

Since I clearly don't mind writing crazy macro, does an integrated "pass an array of structs" approach to a one-shot test helper be acceptable, something like :

void test_patch_lines__correct_info(void)
{
    struct linetests {
        int origin, old, new; /* more */
        const char *content;
    } tests[] = {
        { CTX, 1, 1, "myline"},
        { CTX, 2, 2, "otherline"},
        { ADD, -1, 3, "newline"},
        { DEL, 3, -1, "oldline"},,
        { CTX, 4, 4, "bigline"},
    };

    cl_git_patch_test_lines_in_hunk(patch, hunk_id, &tests);
}

At least I might be able to report the failing test idx in there (as cl_reflog_check_entry does).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, in my opinion it doesn't help much to have a one-shot test helper that's only ever going to be used once. In that case, you can just as well inline it, as you'd be using line info anyway. I wanted to propose to just use cl_assert_equal_i_fmt, but it turns out fmt is not really a format but simply a string where the two ints should be printed into. Would've been nice to have a macro cl_assert_equal_i_fmt(tests[i].old, line->old_lineno, "old line numbers not matching: expected %i, got %i in line '%s'", tests[i].old, line->old_lineno, line->str). But yeah.

Dunno. I'll retreat from this discussion as I do not think it's important enough. Both styles have their advantages and disadvantages.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a one-shot test helper that's only ever going to be used once.

Actually, the plan was to have one helper that checks line_info and use that everywhere. That would reinforce the current "generated patchs" test as well (that's what I was hinting at above, since some checks are missing).

As an aside, sometimes I feel limited by clar's architecture, and long for something like rspec behaviors… That would allow reusing complete test suites against different "initial conditions" (think bare repo vs non-bare repo worktrees).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I had the same feeling sometimes about clar. It' works reasonably well and is a breeze to use in most cases, but some kind of data/configuration/permutation driven interface would in some cases be really helpful. I mean you can easily set that up already, but you do loose information then like we see here in this example.

@tiennou
Copy link
Contributor Author

tiennou commented Jun 19, 2018

But note that in #4672, the problem is not with a parsed patch (since there's not a lot of API to produce those yet) but with a generated patch.

Sorry, semantics. "parsed" means "from some file", and generated means "comes from xdiff", mostly ?(as in, at some point we had access to both A & B).

I'm pretty sure the OP is working with the first kind of patch, because the 2nd kind is tested (tests/diff/patch.c:260).

So I probably broke what was there (or it never worked right, I'm not sure which).

Never hooked up.

@ethomson
Copy link
Member

I'm pretty sure the OP is working with the first kind of patch

Yes, you're absolutely right, thanks for the reminder.

Copy link
Member

@pks-t pks-t left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm fine with these changes, but leave it to @ethomson, as he is a bit more invested in this PR

@ethomson
Copy link
Member

ethomson commented Jul 6, 2018

It's fine. I'd rather have the functionality than a test that I can reason about. If we ever have to revisit it, I'll either suffer or pull it apart.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

git_diff_line.{new,old}_lineno is incorrect
3 participants