Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Normalize EOL to CRLF for text-mode signatures. #1263

Merged
merged 4 commits into from
Aug 24, 2020

Conversation

ni4
Copy link
Contributor

@ni4 ni4 commented Aug 17, 2020

Previously we didn't distinguish text-mode document signatures from binary ones, so for non-canonical line endings (i.e. non-CRLF) validation failed.
This PR is aimed to fix this misbehavior.

Fixes #1226
Closes #1228

@ni4 ni4 force-pushed the ni4-1228-normalize-eol-for-text-sigs branch from a04edbd to 74e393c Compare August 18, 2020 11:05
@ni4 ni4 marked this pull request as ready for review August 18, 2020 13:38
Copy link
Contributor

@ronaldtse ronaldtse left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM @ni4 !

Copy link
Contributor

@rrrooommmaaa rrrooommmaaa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my case, test_stream_signatures fails because src/tests/data/test_stream_signatures/source.txt received extra CR when cloning from github without git config --global core.autocrlf false.
Shouldn't we update test files in this PR so they no longer depend on github settings?

@ni4
Copy link
Contributor Author

ni4 commented Aug 19, 2020

Shouldn't we update test files in this PR so they no longer depend on github settings?

Actually, some signatures there are done in binary mode, so tests will actually fail if eol sequence is changed in source.txt
Text-mode signatures would (since this PR) work well once LF is changed to CRLF.

@rrrooommmaaa
Copy link
Contributor

@ni4 The question is -- the test will remain dependent on git settings?

@ni4
Copy link
Contributor Author

ni4 commented Aug 19, 2020

@rrrooommmaaa Please see the issue #1268

And, getting back to review, do you have any further comments/suggestions on this PR?

@rrrooommmaaa
Copy link
Contributor

@ni4 I expect that lastcr should also be handled in singed_src_finish in case it was the last character of the buffer? And then it should be replaced with CRLF? Is that what you meant by /* we support CR, LF and CRLF line endings */?

@ni4
Copy link
Contributor Author

ni4 commented Aug 19, 2020

I expect that lastcr should also be handled in singed_src_finish in case it was the last character of the buffer

@rrrooommmaaa Once CR character is reached, CRLF is written. I.e. the only purpose of this flag - to detect case where CR is the last character in buffer, and the possible following LF is sent in the next signed_src_update() call.

@rrrooommmaaa
Copy link
Contributor

Once CR character is reached, CRLF is written

got it now. Can you add a remark in the code describing the purpose of lastcr field, please?
The problem is that the name describes how it is set, and I have to understand how it is used from the code.
Also, why isn't variable en called end? Is there a standard for this?

@ni4 ni4 force-pushed the ni4-1228-normalize-eol-for-text-sigs branch from 74e393c to 78923d1 Compare August 20, 2020 10:06
@ni4
Copy link
Contributor Author

ni4 commented Aug 20, 2020

Can you add a remark in the code describing the purpose of lastcr field, please?
The problem is that the name describes how it is set, and I have to understand how it is used from the code.

Added comment and force-pushed.

Also, why isn't variable en called end? Is there a standard for this?

Just got used to this style, to use 2 chars for temporary variable name, where i,j,k are not applicable.
Maybe it is related to past Pascal experience, where end is a reserved word.

@dewyatt
Copy link
Contributor

dewyatt commented Aug 20, 2020

Also, why isn't variable en called end? Is there a standard for this?

Just got used to this style, to use 2 chars for temporary variable name, where i,j,k are not applicable.
Maybe it is related to past Pascal experience, where end is a reserved word.

I'd vote against this style personally. I like names to be as descriptive as possible, just not overly long.

@ni4
Copy link
Contributor Author

ni4 commented Aug 20, 2020

I'd vote against this style personally. I like names to be as descriptive as possible, just not overly long.

Ok, np. While it is clear how to rename en, what would be better to call ch, linebg? current_char, line_begin seems to be too long. This could also lead to clang-formatting single code line to 2-3 lines, making code less readable.

@dewyatt
Copy link
Contributor

dewyatt commented Aug 20, 2020

I'd vote against this style personally. I like names to be as descriptive as possible, just not overly long.

Ok, np. While it is clear how to rename en, what would be better to call ch, linebg? current_char, line_begin seems to be too long. This could also lead to clang-formatting single code line to 2-3 lines, making code less readable.

I would read ch as character and I believe that it's a well-known abbreviation so I don't see a problem there.
I would read bg as background. Typically beg would be used to abbreviate begin. So personally I might use linebeg, begline, line_beg, etc., those all seem pretty readable as a native.

I may be biased specifically on en because I don't see that used as an abbreviation for enable very often (usually only in firmware/embedded/kernel code).

EDIT: ena is a slightly clearer abbreviation for enable

@ni4 ni4 force-pushed the ni4-1228-normalize-eol-for-text-sigs branch from 78923d1 to 4342adf Compare August 20, 2020 11:36
@ni4
Copy link
Contributor Author

ni4 commented Aug 20, 2020

@dewyatt Thanks for the detailed explanation. Renamed/force-pushed.

@rrrooommmaaa
Copy link
Contributor

@ni4 As long as lastcr is used to keep the state (last character) between update calls, I think, it's worthwhile
creating edge case tests using callback input, so that we can emulate chunks ending with 'CR' followed by 'LF' or non-LF.
Do you agree?

@ni4
Copy link
Contributor Author

ni4 commented Aug 22, 2020

@rrrooommmaaa Please see the updated PR. Actually it is simpler then input_from_callback since we know that information for signed_src_update() is read by 32k chunks.

ch++;
}
uint8_t *linebeg = ch;
uint8_t *end = (uint8_t *) buf + len;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This piece of code will work in 99.9999% cases, I understand, still, it doesn't look clean, as in theory the buffer might be at the very end of virtual memory, so buf+len can be 0x00000 or an overflow exception can be thrown,
so ch<end and ch<end+1 may be false right away.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rrrooommmaaa Thanks for pointing at this. While it should not happen in real-life systems, this may be hit on low-memory embedded systems or so on. Added a commit.

Copy link
Contributor

@rrrooommmaaa rrrooommmaaa Aug 24, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest simply to replace ch < end and ch + 1 < end with !=
I know that < is generally safer than != but in this case it would be neater

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rrrooommmaaa That was the first idea, however then we would end up in need to check whether ch++ overflows, adding one more check on each byte processed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, merging then.

@ni4 ni4 force-pushed the ni4-1228-normalize-eol-for-text-sigs branch from b45c9df to 3fb08a9 Compare August 24, 2020 10:03
@antonsviridenko
Copy link
Contributor

EDIT: ena is a slightly clearer abbreviation for enable

I vote for "enbl" :)

ch++;
}
uint8_t *linebeg = ch;
uint8_t *end = (uint8_t *) buf + len;
Copy link
Contributor

@rrrooommmaaa rrrooommmaaa Aug 24, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest simply to replace ch < end and ch + 1 < end with !=
I know that < is generally safer than != but in this case it would be neater

@rrrooommmaaa rrrooommmaaa merged commit 5101979 into master Aug 24, 2020
@rrrooommmaaa rrrooommmaaa deleted the ni4-1228-normalize-eol-for-text-sigs branch August 24, 2020 12:42
@ni4 ni4 added this to the v0.14.0 milestone Jan 4, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
5 participants