Skip to content

Conversation

@dra27
Copy link
Member

@dra27 dra27 commented Oct 3, 2023

This is a possible alternate implementation for extending the toplevel's use of XDG when searching for init.ml for better Windows support, originally outlined in #12364 (comment).

The fundamental problem here arises from a Windows configuration subtlety added a while ago in Windows 2000. For Windows users, there are two kinds of user configuration - roaming (typically stored in AppData\Roaming in a user's profile directory and intended to be synchronised between machines) and local (typically stored in AppData\Local in a user's profile directory and for that specific machine only). For the OCaml toplevel, it is not possible to define which one a user may want (indeed, it's quite plausible that a user might want to load configuration from both!).

This PR seeks to solve the problem on Windows by adopting more of the XDG Base Directory Specification on all platforms, and then doing an appropriate translation of the XDG defaults for Windows. At present, the toplevel only inspects $XDG_CONFIG_HOME, but XDG also provides $XDG_CONFIG_DIRS.

For Unix, the toplevel now reads the first startup file of:

  1. .ocamlinit in the current directory
  2. $XDG_CONFIG_HOME/ocaml/init.ml if $XDG_CONFIG_HOME is an absolute path
  3. $HOME/.config/ocaml/init.ml only if $XDG_CONFIG_HOME is unset, empty or a non-absolute path and $HOME is non-empty
  4. ocaml/init.ml under any of the absolute paths in the :-separated list $XDG_CONFIG_DIRS
  5. /etc/xdg/ocaml/init.ml only if $XDG_CONFIG_DIRS is unset, or contains no absolute paths
  6. $HOME/.ocamlinit, if $HOME is non-empty

For Windows, we use the XDG environment variables, but with the following tweaks:

There is a subtle tweak to the Windows default for $XDG_CONFIG_DIRS (noted in the code as well), which is intended to cater for the not inconceivable scenario where a Windows user has $XDG_CONFIG_HOME set in Cygwin (e.g. to /home/userid/.config) and then starts a native Windows ocaml.exe (which is likely not to find anything in \home\userid\.config on the current drive). The tweak is that FOLDERID_LocalAppData is both the default for $XDG_CONFIG_HOME and also the first entry in the default for $XDG_CONFIG_DIRS.

The hope is that what previously would have been Windows-specific workarounds are now merely platform-specific defaults for a cross-platform solution. Incidentally, various other libraries propose alternate defaults for macOS, but there seems to be even less agreement on what those should be than there is for Windows, so I've left grasping that nettle for another day by just leaving macOS treated as Unix.

For the OCaml toplevel, the first located init.ml is the only one which matters (cf. last paragraph of XDG BDS); naturally, nothing prevents a user from configuring their init.ml files such that init.ml from further locations is loaded. In particular, for Windows users, that means that a user could choose to call Toploop.use_silently from within init.ml in AppData\Local to source the init.ml in AppData\Roaming should they so desire!

Notes on the implementation:

  • Path-splitting on Windows is tedious because of the "-quote rules. Although it's technically only necessary to double-quote semicolon characters (which are not common in directory names!), it's really quite commonplace to see spaces double-quoted.
  • SHGetKnownFolderPath is a tricky function to wrap (the FOLDERID_ values are bulky and numerous, so not really wanted in a static array, etc.). I elected for what I hope is a simpler approach with a C primitive that just returns an empty list on Unix and returns the three values needed on Windows. It's much less code, at the cost of a slightly odd-looking primitive. Pedantically, the Unix version of it could construct ["/etc/xdg"] in C code 🤷‍♂
  • The diff of Toploop.find_ocamlinit is very noisy - it's probably easiest just to re-review the entire function in terms of the specification above.

TODO

Copy link
Contributor

@MisterDA MisterDA left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe Tag_cons can be used? Otherwise it's a nice to great support for xdg-dirs and a Windows equivalent.

@dra27
Copy link
Member Author

dra27 commented Oct 3, 2023

Updated - I also hopefully pushed a fix to the symbol checker which should make AppVeyor pass 🏌️‍♂️

@gasche gasche added the windows label Oct 7, 2023
@shindere
Copy link
Contributor

I went through all the code and everything semeed very reasonable to me but
I indeed think @nojb's review would (as often) be very useful on this PR.
Incidentally, @nojb is listed as a revier in the Changes entry whereas he
has not yet reviewed (I think!), whereas @MisterDA who has commentedw is
not listed yet.

Regarding the new libraries that need to be linked with, I was wondering
whether we could avoid repeating their names between the Mingw and
MSVC cases. I think it's certainly possible to achieve this but
it's probably not for this PR.

The comment in runtime/sys.c could perhaps be improved:

"On Unix, the computation of this list is more easily computed in OCaml"
looks like an hesitation between the computation is more easily
done and the list is more easily computed, or something like that.

In toplevel/toploop.mli, regarding the documentation comment for
split_path: I was wondering wether writing PATH rather than path
wouldn't make it clearer that we talk about the content of a PATH-style
environment variable, rather than just a file path. I do realise
PATH is not a valid identifier, so perhaps my suggestion is to
clarify the link with the PATH-style environment varialbe, no
matter how this clarification occurs.

@nojb
Copy link
Contributor

nojb commented Oct 12, 2023

I indeed think @nojb's review would (as often) be very useful on this PR.
Incidentally, @nojb is listed as a revier in the Changes entry whereas he
has not yet reviewed (I think!)

Actually I took a look at this PR before it was submitted. But I intend to do a more formal review soon-ish. Thanks for the ping!

@emillon
Copy link
Contributor

emillon commented Oct 13, 2023

Should we adopt that algorithm in dune's xdg library as well?

@shindere
Copy link
Contributor

shindere commented Oct 13, 2023 via email

@dra27
Copy link
Member Author

dra27 commented Oct 13, 2023

Should we adopt that algorithm in dune's xdg library as well?

Assuming it's approved, yes please 🙂 I think what Dune's doing right now is already a compatible subset of what's proposed here.

@dra27 dra27 force-pushed the xdg_config_dirs branch 2 times, most recently from a00b20e to 704bb47 Compare October 13, 2023 15:03
@dra27
Copy link
Member Author

dra27 commented Oct 13, 2023

I went through all the code and everything semeed very reasonable to me but I indeed think @nojb's review would (as often) be very useful on this PR.

I should have made that explicit in the comments that @nojb had a look at this branch before the PR was opened, but also that I didn't intend to merge it without his full review, too 🙂

Incidentally, @nojb is listed as a revier in the Changes entry whereas he has not yet reviewed (I think!), whereas @MisterDA who has commentedw is not listed yet.

Fixed

Regarding the new libraries that need to be linked with, I was wondering whether we could avoid repeating their names between the Mingw and MSVC cases. I think it's certainly possible to achieve this but it's probably not for this PR.

I've expressly split the two of them up - partly because the lists are different already and partly because there is no reason that they have to be the same.

The comment in runtime/sys.c could perhaps be improved:

"On Unix, the computation of this list is more easily computed in OCaml" looks like an hesitation between the computation is more easily done and the list is more easily computed, or something like that.

It is indeed a frankenstein mix of two sentences - fixed, thanks!

In toplevel/toploop.mli, regarding the documentation comment for split_path: I was wondering wether writing PATH rather than path wouldn't make it clearer that we talk about the content of a PATH-style environment variable, rather than just a file path. I do realise PATH is not a valid identifier, so perhaps my suggestion is to clarify the link with the PATH-style environment varialbe, no matter how this clarification occurs.

I pushed an alternate form, and added an additional clarifying example of the strangeness of the Windows splitting requirements.

And by the way: I find the explanations given in the PR description about the ordeer in which directories are searched fo very high quality and totally ready to be transferred in the manual as the associated task suggests.

Thank you :)

@shindere
Copy link
Contributor

shindere commented Oct 13, 2023 via email

@shindere
Copy link
Contributor

shindere commented Oct 13, 2023 via email

@dra27
Copy link
Member Author

dra27 commented Oct 19, 2023

So is that to say that the same algorithm will be implemented in two different places (the compiler and Dune)?

I think it's more that they're following the same convention/specification than algorithm - I'm not sure there's much actual code to share per se.

Copy link
Contributor

@nojb nojb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Documentation needs to be added though (manual & man pages).

let[@tail_mod_cons] rec parse segment_begin terminator i =
if i >= len then
(* Done - return the last entry *)
[get_contents (add_segment segment_begin i)]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to confirm: in the case of PATH=foo; we want the result of this function to be ["foo";""]? What are the semantics of "" in PATH, is it the current directory?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For XDG, it's moot - the first thing we do is filter the result for absolute paths (and Filename.is_relative "" = true, naturally). POSIX explicitly notes that an initial or final colon (or two consecutive colons) cause the current directory to be included in PATH, yes. On Windows, the CRT skips them, but that's an implementation detail - the interpretation of PATH is always that the current directory is searched first.

On Windows, entries are separated by semicolons. Sections of entries may be
double-quoted (which allows semicolons in filenames to be quoted). The
double-quote characters are stripped (i.e. [f"o"o = foo]; also
[split_path "foo\";\";bar" = ["foo;"; "bar"]) *)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For my own curiosity, where is this quoting behaviour documented?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It turns out I'd forgotten where I learned it from, so having had to search both the Internet and my memory, I added a comment! Amusingly, the comment in getpath.cpp is very clearly from OS/2 (i.e. 1988 or 1989)! 😂

let default =
if Sys.win32 then
match Lazy.force windows_xdg_defaults with
| dir::_ -> Some dir
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe add a comment that the first entry corresponds to %LOCALAPPDATA%?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea - done!

@dra27
Copy link
Member Author

dra27 commented Dec 8, 2023

Thanks, @nojb! I've pushed a rebase addressing hopefully all the technical suggestions - I will shortly push the documentation updates.

@dra27
Copy link
Member Author

dra27 commented Dec 8, 2023

Updates to the manual and man-page now pushed.

@nojb
Copy link
Contributor

nojb commented Dec 16, 2023

This looks good to go to me.

cc @jonahbeckford: could you take a quick look at the specification in the PR description and confirm that it looks OK to you too? Thanks!

@jonahbeckford
Copy link

Yes, the specification looks great. +1

@nojb
Copy link
Contributor

nojb commented Dec 19, 2023

@dra27 is this ready to merge from your side?

@dra27
Copy link
Member Author

dra27 commented Dec 19, 2023

All good to go, I think, yes - I even got to "learn" some troff 😈

@nojb nojb merged commit fb3a6d8 into ocaml:trunk Dec 19, 2023
@nojb
Copy link
Contributor

nojb commented Dec 19, 2023

Merged, thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants