Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixes to reversal of environment updates #5935

Open
wants to merge 17 commits into
base: master
Choose a base branch
from
Open

Conversation

dra27
Copy link
Member

@dra27 dra27 commented Apr 24, 2024

Fixes for #5838, #5925 and #5926. This PR has become quite an epic rabbithole, but the fixes and changes are very, very intertwined. It is possible to separate the final fix for #5926 to a subsequent PR, but the final fix itself is actually quite small.

The first commit adds a new sort operation to the reftest language. While working on this, the stability of the output of opam env and the stability of the output when using unordered made making the true effect of each commit tricky. I'm still not sure that unordered is behaving as well as it should when there's actually a change to the output (i.e. when one line changes, it's allowing some unchanged lines to be re-ordered noisily in the diff).

I then add a whole range of tests:

There have been a lot of issues and PRs in the past to do with the treatment of := and =: principally where MANPATH is concerned. I'm relatively confident that the approach for this before has not been strictly correctly (and is the cause of the issue being seen in #5926). For both PATH (as defined by Posix) and MANPATH (as defined by the tool itself), an unset value, an empty value and the value : are all equivalent. For PATH, that means the current directory and all of these entries are literally equivalent to PATH=.. For MANPATH, those three entries mean "use the default search path".

Now, as far as I can see the previous discussions have been about ensuring that MANPATH has a leading or trailing :, to ensure that opam never hides the default path. Thus, if MANPATH is not set, then we have MANPATH =: "%{man}%" giving us :/home/dra/.opam/default/man. However, the focus should not be on having a leading or trailing :, the focus should be on ensuring that a "blank" entry is created if needed (this is the rule for FOO := "a" and FOO =: "a" are a: or :a respectively if FOO is unset or empty) and but then for all updates separately ensuring that the blank entry is preserved.

i.e. if FOO is :a then both FOO += "b" and FOO := "b" should set FOO to b::a

The actual fixes then proceed as:

  • The CAML_LD_LIBRARY_PATH issue arises from what appears to be an incorrect comment in OpamEnv.apply_op_zip indicating that the argument is already transformed, which may suggest why OpamEnv.reverse_env_update was missing the transformation. Essentially, the transformation ensures that opam env --revert reverts the same value as it applied in opam env
  • In OpamEnv.split_var, the cases for splitting using the quoted rules were the wrong way around. Fixing that regresses Fix reverting additions to PATH-like variables #4861, though
  • If an environment variable is empty or unset, opam internally synthesises that as []. However, if setenv or build-env has FOO = "" then we end up with [""] as opam must ensure that this variable is overwritten. The easiest solution for Using the += opetator in build-env to prepend an empty (but set) variable adds a trailing seperator #5925 is simply to allow for that - [""] can only arise from FOO = "" as an update, so += and =+ handle that case, and remove the "" entry from the list. := and =: at this point of course work correctly, because the "" should be there.
  • The quoted splitting algorithm in OpamEnv.split_var is not quite right - it doesn't cope correctly with a separator quoted inside the double-quote marks. My initial approach therefore was to replace it with the PATH-splitter in OpamStd.Env.split_path. This, however, reveals a more fundamental problem - the PATH-splitter correctly splits the PATH, but it removes all the quote marks. The user is at liberty to add quotes in a PATH even if they're not needed. For example, it's a common mistake on Windows to quote directories with spaces in them (having, for example, Path set to C:\Windows\system32;"C:\Program Files\Something\bin"). With the corrected split function, PATH += "foo" causes Path to become foo;C:\Windows\system32;C:\Program Files\Something\bin with the superfluous quotes removed. This isn't correct, given that the effect of PATH += "foo" should morally be much closer to export PATH="foo;$PATH".
  • Similarly to this dequoting, with x-env-path-rewrite, if two packages disagree over the separator, then we have the slightly odd situation where the variable is split according to the first package's separator and re-joined according to the second package's. i.e. if Path is foo;bar and the first package to be evaluated has PATH += "baz" then Path internally is ["baz"; "foo"; "bar"] and will be exported as baz;foo;bar. However, if another package is then added which incorrectly has x-env-path-rewrite: PATH ":" "target" and executes PATH += "broken" then internally we have ["broken"; "baz"; "foo"; "bar"] but the variable will be reconstituted as broken:baz:foo:bar, so from an initial value of foo;bar, opam env changed the existing separator.
  • Combined with the previous issue with quoting, I've changed the internal representation. Instead of storing just the directory, we instead store the directory, the raw unquoted version of the directory and the separator. This has the benefit that concatentation simply uses the given separators and raw values. The cygpath translations, if used, happen in the raw version, with the native directories manipulated internally. This change of representation naturally fixes several of the issues in Unexpected behaviour when prepending/appending to initially empty variables multiple times #5926 and restores the superfluous quotes in environment variables.
  • The final part of the fix for Environment revert not working correctly for Unix-like variables on Windows #5838 is to correct the rule used in Opam.unzip_to to split the value in FOO += value. I've commented (hopefully clearly) the three cases in the commit. At this point, the Fix reverting additions to PATH-like variables #4861 test passes again and Environment revert not working correctly for Unix-like variables on Windows #5838 is fixed.
  • The next commit then handled Fixes: Env var & delimiters, unversionned opam file pin, doc #3404 in a slightly different way, creating :: as required (see the changes in the reftests in that commit)
  • Finally, there's a long-standing logic error OpamEnv.reverse_env_update for =+ and =: where one list was reversed which shouldn't be. Again, the effect is visible in the reftests! The point is that a reversed environment variable was unzipped, and needs to reversed, but there was one reversal too many, given how environment variable updates are stored.

TODO

  • Update master_changes.md in full
  • Update documentation (once semantics presented here fully agreed)

@dra27 dra27 added this to the 2.2.0~beta3 milestone Apr 24, 2024
@dra27 dra27 requested a review from rjbou April 24, 2024 12:48
@dra27 dra27 added this to For beta3 in Opam 2.2.0 Apr 24, 2024
@dra27 dra27 marked this pull request as ready for review April 24, 2024 17:05
@dra27 dra27 marked this pull request as draft April 25, 2024 07:19
@dra27
Copy link
Member Author

dra27 commented Apr 25, 2024

OK, the fix I was proposing I didn't like because it didn't preserve unnecessarily escaped parts of existing variables. I then further found it regressed #4861 and consequently broke the msvs-detect package.

I have a plan (which I'm currently implementing) - a failing test for the main issue (#5838) added and for the first bit of the fix, which I am certain is right. The change I think is required for #5838 without breaking #4861 I think will also address #5925 and #5926, so I've added tests showing the failure behaviour for them as well.

CAML_LD_LIBRARY_PATH should appear in opam env --revert
@dra27
Copy link
Member Author

dra27 commented Apr 29, 2024

The change in c4167f6 requires environment files to be reset - @rjbou... does that happen naturally with a format upgrade?

That said, it's making me wonder if we should ever be "putting" rewrite_default in the environment cache files anyway?

dra27 added 13 commits May 1, 2024 11:37
MANPATH should appear in opam env --revert
As in this commit, there should be no output captured from the
opam env --revert invocation.
NV_VARS5925_1 thru _4 should all be foo (i.e. without no colon)
- In `opam exec -- env` / `opam env`:
  - NV_VARS_5926_L_2 should be `b::a` (the _L_1 case is correct)
  - NV_VARS_5926_L_4 should be `:a:b` (the _L_3 case is correct)
  - NV_VARS_5926_L_5 and _L_6 should be `b::a` (neither is correct)
  - NV_VARS_5926_L_8 should be `:a:b` (the _L_7 case is correct)
  - All four NV_VARS_5926_M_ all contain `a1:a2` instead of `a1::a2`
  - NV_VARS_5926_T_2 should be `:b:a` (the _T_1 case is correct)
  - NV_VARS_5926_T_4 should be `a::b` (the _T_3 case is correct)
  - NV_VARS_5926_T_6 should be `:b:a` (the _T_5 case is correct)
  - NV_VARS_5926_T_7 and _T_8 should be `a::b` (neither is correct)
- In `opam exec -- opam env --revert`:
  - NV_VARS_5926_L_{2,4,6,8} revert to `a` instead of `:a`
  - NV_VARS_5926_M_{1,2,3,4} should all revert to `a1::a2` where
    - _M_1 and _M_3 only have a single colon (`a1:a2`)
    - _M_2 and _M_4 have reversed the two components! (`a2:a1`)
  - NV_VARS_5926_T_{2,4,6,8} revert to `a` instead of `a:`
  - Note NV_VARS_5926_S_{1,2,3,4} revert to empty rather than `:`
    (this is correct since both interpretations are equivalent and
    opam can't know which it was)
Missing transformation of the argument, which affected reverting updates
with forward slashes (e.g. "%{lib}%/stublibs")
The cases of this match were the wrong way around.

Note that this regresses ocaml#4861's quoted directory test.
Expose a generalised version of the Windows implementation of
PATH-splitting as OpamStd.String.split_quoted.
Existing implementation was not correct w.r.t. separators which were
themselves quoted. Use OpamStd.String.split_quoted instead. However,
this causes superfluous quotes in variables to be removed.

Note that this fixes the MANPATH issue of ocaml#5838 but the quoting-handling
is still broken.
Ensure that if opam has internally synthesised an empty environment
variable, then adding to it doesn't create a stray colon.
Change the internal representation of environment variables so that both
the original quoted version and the parsed version are kept, along with
the separator. This has the benefit that reconstituting the variable
value does not require knowledge of the separator (which deals
coherently with the parasitic case of two packages updating the same
variable with a different separator: the value is almost certainly
trashed, but it is at least trashed in a semantically coherent manner!)
When reverting [FOO += "value"], it is necessary to determine what to do
if "value" is of the form "<value1><sep><value2>". Because FOO itself
will have been split at <sep>, we will never match
"<value1><sep><value2>" so it is necessary to split value at <sep> as
well (this is the crux of ocaml#4861).

However, the interaction of this and x-env-path-rewrite is incorrect. If
x-env-path-rewrite has [FOO false] (the `norewrite case), then the value
must be split according to the default interpretation of FOO (which is
how it was added).

If x-env-path-rewrite has a rewrite rule for FOO, then opam has already
assumed that value refers to a single directory (the host/target and
quoting transformation options all being this way).

If x-env-path-rewrite does not have any rule (and does have have
[FOO false]) then we are in essence in the backwards-compatible case,
and must split value.

In order to stop poor interaction between this and opam itself
injecting Windows paths, opam's updates in OpamEnv (to MANPATH, etc.)
are explicitly expanded so that in unzip_to they are treated as though
they had an x-env-path-rewrite rule given (and are therefore not split).

Combined with the previous fixes, this addresses ocaml#5838 without
regressing ocaml#4861.
let split_quoted path sep =
let length = String.length path in
let rec f acc index current last normal =
if index = length then
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if index = length then
if Int.equal index length then

List.rev (if current <> "" then current::acc else acc)
else let c = path.[index]
and next = succ index in
if c = sep && normal || c = '"' then
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if c = sep && normal || c = '"' then
if Char.equal c sep && normal || c = '"' then

let split_path_variable path sep =
let length = String.length path in
let rec f acc index current current_raw last normal =
if index = length then
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if index = length then
if Int.equal index length then

else
let c = path.[index]
and next = succ index in
if c = sep && normal || c = '"' then
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if c = sep && normal || c = '"' then
if Char.equal c sep && normal || c = '"' then

let current_raw = current_raw ^ segment in
if c = '"' then
f acc next current (current_raw ^ "\"") next (not normal)
else if next = length then (* path ends with a separator *)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
else if next = length then (* path ends with a separator *)
else if Int.equal next length then (* path ends with a separator *)


let join_var ~(sepfmt:sep_path_format) var values =
let separator =
if value = String.make 1 sep then
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if value = String.make 1 sep then
if String.equal value (String.make 1 sep) then

@@ -267,11 +328,11 @@ let apply_op_zip ~sepfmt op arg (rl1,l2 as zip) =
or empty lists is returned if the variable should be unset or has an unknown
previous value. *)
let reverse_env_update ~sepfmt var op arg cur_value =
let arg = transform_format ~sepfmt arg in
if String.equal arg "" && op <> Eq then None else
let (arg, _, _) = transform_format ~sepfmt var arg in
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to make the code clearer, I feel like having a record instead of a triple sum type would make things clearer in the future and potentially better at avoiding any future bugs

@@ -85,13 +137,14 @@ setenv: [
[ RCTQ_ENVSET_ADD_WITH_COL += "a:/nother/gi;ven/path" ]
]
build-env: [
[ LC_ALL = "C" ]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like we'd want this all the time, like globally. Couldn't we add this in run.ml instead?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Opam 2.2.0
  
For beta3
2 participants