Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adapt patch behaviour for LF/CRLF compatibility #3349

Merged
merged 1 commit into from
Jun 15, 2018

Conversation

dra27
Copy link
Member

@dra27 dra27 commented May 10, 2018

A rather more controversial part of #3260!

This relies on twothree bits of #3346, but the actual content of this PR still needs work/discussion anyhow.

Further comment to follow below.

Copy link
Member

@AltGr AltGr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow, thanks, that probably wasn't very fun. It makes me start wondering, though, wether at this stage we wouldn't be better off reimplementing patch rather than translating the patch file ? Would that actually be much more work ?

I'd also like to discuss the rewriting for calculating hashes. Maybe there is no other solution, but this somehow feels wrong... Is that how openssl does it ?

let probably_binary name =
List.mem (Filename.extension name) [".zip"; ".tar"; ".gz"; ".tgz"; ".tbz"; ".txz"; ".tlz"]
in
if OpamStd.Option.map_default ((=) primary) true target || probably_binary file then
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do like the Std.Option combinators, but target = None || target = Some primary would probably be easier to read here ^^.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh all right, then 🙂

@dra27 dra27 force-pushed the crlf-nonsense branch 2 times, most recently from 13b98af to 8332bf8 Compare June 10, 2018 12:29
@dra27
Copy link
Member Author

dra27 commented Jun 10, 2018

Finally, some notes on this!

The first issue is the hashing. openssl has nothing to solve - the files are fundamentally different here because (usually) of Git. Internet protocols typically use CRLF (email, http, etc.), so this doesn't come up since the standards already declare the data to be hashed as being supposed to have CRLF line endings.

I have to confess that it just feels tediously necessary, rather than wrong to me. All that is potentially happening is that we identify a simple and, at least I hope, non-exploitable transformation of the file which may have occurred. The alternative would be to force (presumably) Unix line endings, but this seems highly unsatisfactory to me, especially as the one likely to get pinged about the confusing bug reports 😉

@dra27
Copy link
Member Author

dra27 commented Jun 10, 2018

I guess we could re-implement patch, yes. There would be another benefit, in that I've certainly hit a problem with the patch command on macOS not being able to cope with a patch+rename operation (which Git will happily emit).

My only concern is how easy it is to replicate the same kind of fuzzy patching as standard patch, but in fairness I think that that's just a matter of seeing whether the patch applies near to the line numbers given when it doesn't apply exactly - I think the context still has to match exactly.

@dra27
Copy link
Member Author

dra27 commented Jun 10, 2018

Impact for each of these:

  • The hashing change I think should definitely be in 2.0.0 as it means that the whole of 2.x will behave the same way - it could be really annoying (i.e. result in more bug reports) if opam 2.0 Unix users have systems which can't cope with the odd CRLF submitted file in opam-repository, etc.
  • The patch change has the same argument - it's just annoying if those patches generated against one kind of repo can't be applied to another.

@rjbou
Copy link
Collaborator

rjbou commented Jun 13, 2018

Thanks for the work!
I'm wondering if it is the good choice to add it in 2.0.0 as there is still some work to be done: most TODO are for warning, but there is also TODO that are assert false that could happen.
As git can handle CRLF (with config option core.autocrlf), we can for the moment have a local config using it in 2.0.0, and add a further handling in future version.

@dra27
Copy link
Member Author

dra27 commented Jun 13, 2018

@rjbou - no, Git can't handle this, that's the point. You still don't know what's actually checked into the repo (see myriad guides on how to convert repos to use core.autocrlf which weren't using it correctly in the first place - the OCaml change was a nightmare)

@dra27
Copy link
Member Author

dra27 commented Jun 13, 2018

I'm extremely happy to polish this patch off properly for 2.0.0, it just needs the further discussion as to whether it's headed in the correct direction.

@dra27
Copy link
Member Author

dra27 commented Jun 13, 2018

Incidentally, for context diffs, one thought I had had was to eliminate the processing - i.e. just emit context diffs as they are, possibly with a warning. Surely no one is actually using them these days for real patches? (that would deal with the assert false branches)

|> OpamStd.Option.map_default f 1
in
`Processing (`Unified, orig, target, crlf, `Chunk (neg, pos))
with _ ->
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In order to keep global exception handling, especially C-c management, try [...] with _ -> [..] should never be used. See OpamStd.Exn.fatal.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, indeed - I feel the need for a compiler warning to help lazy programmers like myself with this one!

@dra27 dra27 changed the title Adapt hash and patch behaviour for LF/CRLF compatibility Adapt patch behaviour for LF/CRLF compatibility Jun 14, 2018
@dra27
Copy link
Member Author

dra27 commented Jun 14, 2018

OK, this needs some more testing (principally that I must rebase #3260 to use this version instead of the old one), but this revised version:

  • Removes various TODO items (including deleting the temporary patch file 😊)
  • Adds a missing case of ensuring CR is stripped if the target file uses LF-endings
  • Skips Context diffs with a warning - the remaining assert falses are genuinely unreachable cases, therefore.

There is a note about dealing with renaming files. This is something Git certainly does but, as noted in ocaml/opam-repository#11207 (comment), this is not portable, so its absence here is not (I think) critical - it should be the case that the patch will pass through unaltered since the code would incorrectly identify it as a new file.

@AltGr
Copy link
Member

AltGr commented Jun 14, 2018

Thanks!
I guess was @rjbou omitted to say about using git's core.autocrlf input is that it would solve client problems, on Windows, if we enforce that the repository itself is purely in Unix format, which is an option and should already be 99% the case. I am not completely sure even that holds, though. It also assumes that patch files are supplied from the repository, while in practice they could come from the package source itself (although I doubt that is used in practice ?)

About the patches, since we already restricted patches to apply with -p 1 only on opam 2 (without too much trouble, just a couple had to be fixed) — I guess it would be OK to restrict the patch format some more. I'd presonnally be fine as long as we handle the git format-patch format, but it'd be easy to check what we actually have in the repository.

And about the hashing function, does that mean that hashing a whole file is now equivalent to hashing the string as obtained through open_in, as opposed as open_in_bin ?

of operation - in the Lines mode, it returns a string list of the lines, in
CrLf mode, it returns true if every line ends "\r\n". *)
type _ read_return = Lines : string list read_return
| CrLf : bool read_return
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not just define two functions ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where's the fun in that 🙂 More seriously, they're very similar, but not quite identical functions

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i'd argue that you could factorise the core in read_lines_aux, and define the two functions with just the last lines. But well, doesn't really matter ^^.

sufficient documented detail of Context diffs to be able to parse them
without resorting to reverse-engineering code. It is unusual to see them
these days, so for now opam just emits a warning if a Context diff file is
encountered and does no processing to it.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, maybe just add a line to the doc where it explains the patch format (and that they must apply with -p 1) that they should be unified diffs.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I updated the sentence in the docs

The aim here is that patch files should never emit "stripping CR
warnings", but CR endings will be left/added as necessary if the target
file requires them.
@dra27
Copy link
Member Author

dra27 commented Jun 15, 2018

Docs updated and I also factored out the chunk processing code to process_chunk_header at @rjbou's suggestion (contrary to my previous doubt, I had clearly just been lazy - the handling is the same in both cases!)

@dra27
Copy link
Member Author

dra27 commented Jun 15, 2018

@AltGr - indeed, the problem is that you don't know what the patch files themselves may be. While we can enforce that opam-repository uses Unix line-endings, you can't in general enforce that, even by core.autocrlf - .gitattributes can override it, and if the files are already in the repo with \r\n endings, then neither core.autocrlf or .gitattributes will automatically change that (in fact, they cause problems as it means that the files will always be detected as having changed, even on normal checkout)

@dra27
Copy link
Member Author

dra27 commented Jun 15, 2018

For the hashing, sort of - it means that it will hash the file both as if it had been read with open_in_bin (old behaviour) and also as though it had been opened on Windows with open_in (where "\r\n" gets reduced to "\n") - but note that the "\r\n" translation is being done on all OSes here - normally, of course, open_in_bin and open_in are identical on Unix.

@rjbou rjbou merged commit c774b9f into ocaml:master Jun 15, 2018
@rjbou
Copy link
Collaborator

rjbou commented Jun 15, 2018

Thanks!

@dra27 dra27 mentioned this pull request Jun 13, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants