Adapt patch behaviour for LF/CRLF compatibility #3349

dra27 · 2018-05-10T13:32:02Z

A rather more controversial part of #3260!

~~This relies on ~~two~~three bits of #3346, but the actual content of this PR still needs work/discussion anyhow.~~

Further comment ~~to follow~~ below.

AltGr

Wow, thanks, that probably wasn't very fun. It makes me start wondering, though, whether at this stage we wouldn't be better off reimplementing patch rather than translating the patch file? Would that actually be much more work?

I'd also like to discuss the rewriting for calculating hashes. Maybe there is no other solution, but this somehow feels wrong... Is that how openssl does it?

AltGr · 2018-05-14T12:03:02Z

src/core/opamHash.ml

+  let probably_binary name =
+    List.mem (Filename.extension name) [".zip"; ".tar"; ".gz"; ".tgz"; ".tbz"; ".txz"; ".tlz"]
+  in
+  if OpamStd.Option.map_default ((=) primary) true target || probably_binary file then


I do like the Std.Option combinators, but target = None || target = Some primary would probably be easier to read here ^^.

Oh all right, then 🙂
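For readers following the review, here is a minimal sketch of why the two spellings of the condition are interchangeable. `map_default` below is a local stand-in written for this example (assumed to behave like `OpamStd.Option.map_default`, with the argument order used in the diff), and `combinator_form` and `explicit_form` are hypothetical names.

```ocaml
(* Local stand-in for OpamStd.Option.map_default, defined here so the
   sketch is self-contained: apply [f] to the contents, or return [default]. *)
let map_default f default = function
  | None -> default
  | Some x -> f x

(* The condition as written in the diff. *)
let combinator_form primary target = map_default (( = ) primary) true target

(* The spelling suggested in the review. *)
let explicit_form primary target = target = None || target = Some primary
```

Both forms are true when no target kind is requested or when the requested kind is the primary one, and false otherwise.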

dra27 · 2018-06-10T12:38:29Z

Finally, some notes on this!

The first issue is the hashing. openssl has nothing to solve - the files are fundamentally different here because (usually) of Git. Internet protocols typically use CRLF (email, http, etc.), so this doesn't come up since the standards already declare the data to be hashed as being supposed to have CRLF line endings.

I have to confess that it just feels tediously necessary, rather than wrong to me. All that is potentially happening is that we identify a simple and, at least I hope, non-exploitable transformation of the file which may have occurred. The alternative would be to force (presumably) Unix line endings, but this seems highly unsatisfactory to me, especially as the one likely to get pinged about the confusing bug reports 😉

dra27 · 2018-06-10T12:43:00Z

I guess we could re-implement patch, yes. There would be another benefit, in that I've certainly hit a problem with the patch command on macOS not being able to cope with a patch+rename operation (which Git will happily emit).

My only concern is how easy it is to replicate the same kind of fuzzy patching as standard patch, but in fairness I think that that's just a matter of seeing whether the patch applies near to the line numbers given when it doesn't apply exactly - I think the context still has to match exactly.
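A rough sketch of the offset search described above, under the stated assumption that the context must match exactly and only its position may drift. The names `matches_at`, `locate` and `max_offset` are hypothetical, and this is not a description of how the standard patch command implements fuzz.

```ocaml
(* Does [context] occur in [file] starting at 0-based index [pos]? *)
let matches_at file context pos =
  let n = Array.length context in
  pos >= 0
  && pos + n <= Array.length file
  && (let rec go i = i >= n || (file.(pos + i) = context.(i) && go (i + 1)) in
      go 0)

(* Try the position given in the hunk first, then positions 1, 2, ...
   lines away on either side, up to [max_offset]. *)
let locate ?(max_offset = 3) file context expected =
  let rec search d =
    if d > max_offset then None
    else if matches_at file context (expected - d) then Some (expected - d)
    else if matches_at file context (expected + d) then Some (expected + d)
    else search (d + 1)
  in
  search 0
```

For example, `locate [|"a"; "b"; "c"; "d"; "e"|] [|"c"; "d"|] 1` finds the context one line below the stated position.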

dra27 · 2018-06-10T12:45:35Z

Impact for each of these:

The hashing change I think should definitely be in 2.0.0 as it means that the whole of 2.x will behave the same way - it could be really annoying (i.e. result in more bug reports) if opam 2.0 Unix users have systems which can't cope with the odd CRLF submitted file in opam-repository, etc.
The patch change has the same argument - it's just annoying if those patches generated against one kind of repo can't be applied to another.

rjbou · 2018-06-13T10:43:27Z

Thanks for the work!
I'm wondering whether it is a good choice to add this in 2.0.0, as there is still some work to be done: most of the TODOs are for warnings, but there are also TODOs that are assert false branches which could actually be reached.
As Git can handle CRLF (with the config option core.autocrlf), we can for the moment have a local config using it in 2.0.0, and add further handling in a future version.

dra27 · 2018-06-13T10:55:59Z

@rjbou - no, Git can't handle this, that's the point. You still don't know what's actually checked into the repo (see myriad guides on how to convert repos to use core.autocrlf which weren't using it correctly in the first place - the OCaml change was a nightmare)

dra27 · 2018-06-13T10:56:23Z

I'm extremely happy to polish this patch off properly for 2.0.0, it just needs the further discussion as to whether it's headed in the correct direction.

dra27 · 2018-06-13T10:57:12Z

Incidentally, for context diffs, one thought I had had was to eliminate the processing - i.e. just emit context diffs as they are, possibly with a warning. Surely no one is actually using them these days for real patches? (that would deal with the assert false branches)

rjbou · 2018-06-13T10:14:46Z

src/core/opamSystem.ml

+                              |> OpamStd.Option.map_default f 1
+                    in
+                    `Processing (`Unified, orig, target, crlf, `Chunk (neg, pos))
+                  with _ ->


In order to keep global exception handling, especially C-c management, try [...] with _ -> [..] should never be used. See OpamStd.Exn.fatal.

Yes, indeed - I feel the need for a compiler warning to help lazy programmers like myself with this one!
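To illustrate the point of the review comment, here is a small sketch of a catch-all handler that still lets interrupts through. `fatal` below is a hypothetical stand-in playing the role the comment attributes to OpamStd.Exn.fatal; the exact set of exceptions it re-raises is an assumption made for this example, and `guarded` is an illustrative helper.

```ocaml
(* Hypothetical stand-in: re-raise exceptions that must never be swallowed
   (C-c arrives as Sys.Break when Sys.catch_break is enabled). *)
let fatal = function
  | (Sys.Break | Out_of_memory | Stack_overflow) as e -> raise e
  | _ -> ()

(* Run [f]; on an ordinary failure fall back to [default], but only after
   giving [fatal] the chance to propagate a fatal exception. *)
let guarded f default =
  try f () with e -> fatal e; default
```

With this shape, `guarded (fun () -> int_of_string "x") 1` falls back to the default, while a `Sys.Break` raised inside `f` still reaches the global handler.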

dra27 · 2018-06-14T09:33:56Z

OK, this needs some more testing (principally that I must rebase #3260 to use this version instead of the old one), but this revised version:

Removes various TODO items (including deleting the temporary patch file 😊)
Adds a missing case of ensuring CR is stripped if the target file uses LF-endings
Skips Context diffs with a warning - the remaining assert falses are genuinely unreachable cases, therefore.

There is a note about dealing with renaming files. This is something Git certainly does but, as noted in ocaml/opam-repository#11207 (comment), this is not portable, so its absence here is not (I think) critical - it should be the case that the patch will pass through unaltered since the code would incorrectly identify it as a new file.

AltGr · 2018-06-14T12:40:34Z

Thanks!
I guess what @rjbou omitted to say about using git's core.autocrlf input is that it would solve client problems on Windows, if we enforce that the repository itself is purely in Unix format, which is an option and should already be 99% the case. I am not completely sure that even that holds, though. It also assumes that patch files are supplied from the repository, while in practice they could come from the package source itself (although I doubt that is used in practice?)

About the patches: since we already restricted patches to apply with -p 1 only on opam 2 (without too much trouble, just a couple had to be fixed), I guess it would be OK to restrict the patch format some more. I'd personally be fine as long as we handle the git format-patch format, but it'd be easy to check what we actually have in the repository.

And about the hashing function, does that mean that hashing a whole file is now equivalent to hashing the string as obtained through open_in, as opposed to open_in_bin?

AltGr · 2018-06-14T12:45:20Z

src/core/opamSystem.ml

+   of operation - in the Lines mode, it returns a string list of the lines, in
+   CrLf mode, it returns true if every line ends "\r\n". *)
+type _ read_return = Lines : string list read_return
+                   | CrLf  : bool read_return


Why not just define two functions?

Where's the fun in that 🙂 More seriously, they're very similar, but not quite identical functions

I'd argue that you could factorise the core into read_lines_aux, and define the two functions with just the last lines. But well, it doesn't really matter ^^.
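A sketch of the design being discussed, assuming for simplicity that the input is a string already in memory rather than a file. The type mirrors the one in the diff; `raw_lines`, `ends_with_cr` and `read` are hypothetical names, and whether the real Lines mode keeps the trailing CR is not stated in the thread.

```ocaml
(* One entry point whose return type depends on the mode, as in the diff. *)
type _ read_return =
  | Lines : string list read_return
  | CrLf : bool read_return

(* Shared core: split on '\n', keeping any trailing '\r' on each line and
   dropping the empty piece that follows a final newline. *)
let raw_lines s =
  match List.rev (String.split_on_char '\n' s) with
  | "" :: rest -> List.rev rest
  | all -> List.rev all

let ends_with_cr l = l <> "" && l.[String.length l - 1] = '\r'

(* The locally abstract type lets each branch return a different type. *)
let read : type a. a read_return -> string -> a =
 fun mode s ->
  let lines = raw_lines s in
  match mode with
  | Lines -> lines
  | CrLf -> lines <> [] && List.for_all ends_with_cr lines
```

The two-function alternative from the review would expose `read Lines` and `read CrLf` under separate names, with `raw_lines` as the factored-out core.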

AltGr · 2018-06-14T12:47:22Z

src/core/opamSystem.ml

+     sufficient documented detail of Context diffs to be able to parse them
+     without resorting to reverse-engineering code. It is unusual to see them
+     these days, so for now opam just emits a warning if a Context diff file is
+     encountered and does no processing to it.


Agreed, maybe just add a line to the doc where it explains the patch format (and that they must apply with -p 1) that they should be unified diffs.

I updated the sentence in the docs

The aim here is that patch files should never emit "stripping CR warnings", but CR endings will be left/added as necessary if the target file requires them.
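As an illustration of that aim, a minimal sketch of normalising a single patch line to the target file's convention. It assumes the patch has already been split on '\n' and that `crlf` records whether the target file uses CRLF endings; `strip_trailing_cr` and `adapt_line` are hypothetical names, and the real preprocessing has more cases to handle (headers, new files) than are shown here.

```ocaml
(* Remove one trailing '\r', if present. *)
let strip_trailing_cr l =
  let n = String.length l in
  if n > 0 && l.[n - 1] = '\r' then String.sub l 0 (n - 1) else l

(* Give the line the ending the target file expects: a trailing '\r' when
   the target is CRLF, none when it is LF. *)
let adapt_line ~crlf l =
  let bare = strip_trailing_cr l in
  if crlf then bare ^ "\r" else bare
```

Applied to every line of a hunk body, this leaves already-matching patches unchanged and converts mismatched ones in either direction.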

dra27 · 2018-06-15T10:03:41Z

Docs updated and I also factored out the chunk processing code to process_chunk_header at @rjbou's suggestion (contrary to my previous doubt, I had clearly just been lazy - the handling is the same in both cases!)
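For context, a sketch of what parsing a unified-diff chunk header involves. This is not the code from the PR: `parse_range` and `parse_hunk_header` are hypothetical names, and the only behaviour taken from the diff is that a missing line count defaults to 1.

```ocaml
(* Parse "12,3" or "12"; in the unified format an omitted count means 1. *)
let parse_range s =
  match String.split_on_char ',' s with
  | [ start ] -> (int_of_string start, 1)
  | [ start; count ] -> (int_of_string start, int_of_string count)
  | _ -> invalid_arg "parse_range"

(* Parse "@@ -a,b +c,d @@ ..." into ((a, b), (c, d)), or None if the line
   is not a well-formed chunk header. Only the specific parse failures are
   caught, so interrupts are never swallowed. *)
let parse_hunk_header line =
  match String.split_on_char ' ' line with
  | "@@" :: neg :: pos :: "@@" :: _
    when String.length neg > 1 && neg.[0] = '-'
         && String.length pos > 1 && pos.[0] = '+' -> (
      let drop s = String.sub s 1 (String.length s - 1) in
      try Some (parse_range (drop neg), parse_range (drop pos))
      with Failure _ | Invalid_argument _ -> None)
  | _ -> None
```

The two pairs correspond to the removed-side and added-side ranges of the chunk, which the preprocessing needs in order to know how many lines belong to it.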

dra27 · 2018-06-15T10:05:20Z

@AltGr - indeed, the problem is that you don't know what the patch files themselves may be. While we can enforce that opam-repository uses Unix line-endings, you can't in general enforce that, even by core.autocrlf - .gitattributes can override it, and if the files are already in the repo with \r\n endings, then neither core.autocrlf nor .gitattributes will automatically change that (in fact, they cause problems as it means that the files will always be detected as having changed, even on normal checkout)

dra27 · 2018-06-15T10:07:12Z

For the hashing, sort of - it means that it will hash the file both as if it had been read with open_in_bin (old behaviour) and also as though it had been opened on Windows with open_in (where "\r\n" gets reduced to "\n") - but note that the "\r\n" translation is being done on all OSes here - normally, of course, open_in_bin and open_in are identical on Unix.
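A minimal sketch of that dual check, using the standard library's Digest module (MD5) purely as a stand-in for opam's own hashing. `strip_cr` and `matches_either` are hypothetical names, and the contents are assumed to have been read in binary mode.

```ocaml
(* Replace every "\r\n" with "\n", leaving lone '\r' characters untouched.
   This mimics what open_in does on Windows, but is applied on every OS. *)
let strip_cr s =
  let n = String.length s in
  let b = Buffer.create n in
  String.iteri
    (fun i c ->
      if c = '\r' && i + 1 < n && s.[i + 1] = '\n' then ()
      else Buffer.add_char b c)
    s;
  Buffer.contents b

(* Accept the file if either the raw bytes (the old behaviour) or the
   CRLF-normalised bytes hash to the expected digest. *)
let matches_either ~expected contents =
  Digest.string contents = expected
  || Digest.string (strip_cr contents) = expected
```

A file checked out with CRLF endings then still matches a checksum computed from its LF form, while unrelated content changes are rejected as before.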

rjbou · 2018-06-15T14:21:14Z

Thanks!

dra27 force-pushed the crlf-nonsense branch from c676d92 to 9722968 Compare May 10, 2018 14:26

AltGr reviewed May 14, 2018

dra27 force-pushed the crlf-nonsense branch 2 times, most recently from 13b98af to 8332bf8 Compare June 10, 2018 12:29

rjbou reviewed Jun 13, 2018

dra27 mentioned this pull request Jun 14, 2018

Make hash function LF/CRLF-agnostic #3407

Merged

dra27 force-pushed the crlf-nonsense branch from 8332bf8 to c4e86c8 Compare June 14, 2018 09:27

dra27 changed the title ~~Adapt hash and patch behaviour for LF/CRLF compatibility~~ Adapt patch behaviour for LF/CRLF compatibility Jun 14, 2018

dra27 force-pushed the crlf-nonsense branch from c4e86c8 to 3f9cdf7 Compare June 14, 2018 11:17

AltGr approved these changes Jun 14, 2018

Preprocess patch files to patch CRLF correctly

9ec2458

The aim here is that patch files should never emit "stripping CR warnings", but CR endings will be left/added as necessary if the target file requires them.

dra27 force-pushed the crlf-nonsense branch from 3f9cdf7 to 9ec2458 Compare June 15, 2018 10:00

rjbou approved these changes Jun 15, 2018

rjbou merged commit c774b9f into ocaml:master Jun 15, 2018

dra27 mentioned this pull request Jun 17, 2018

LF/CRLF-agnostic hashing take 2 #3414

Closed

dra27 deleted the crlf-nonsense branch June 20, 2019 09:15

dra27 mentioned this pull request Jun 30, 2019

Solaris 10 patch command doesn't get file to patch #2160

Closed

dra27 mentioned this pull request Jun 13, 2024

Does OPAM work on windows? #246

Closed
