Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add ?follow option to Unix.link #1061

Merged
merged 1 commit into from Jun 25, 2018

Conversation

Projects
None yet
9 participants
@madroach
Copy link
Contributor

commented Feb 24, 2017

This allows hardlinking symlinks.

Also fix a bug in win32 symlink stub code, which interpreted an
optional parameter (bool option) as simple bool. [EDIT: there was no bug, in fact]

@madroach madroach force-pushed the madroach:link_symlink branch from 3f8de51 to f0f8e1b Feb 24, 2017

DWORD flags =
Is_block(to_dir) && Bool_val(Field(to_dir, 0))
? SYMBOLIC_LINK_FLAG_DIRECTORY
: 0;

This comment has been minimized.

Copy link
@dra27

dra27 Feb 24, 2017

Contributor

I'll can't do a full review today, but I can say immediately that this is wrong - see otherlibs/win32unix/unix.ml

@xavierleroy

This comment has been minimized.

Copy link
Contributor

commented Feb 24, 2017

linkall is POSIX 2008 and I'm not sure it's available everywhere. You need to guard its use with

#if _XOPEN_SOURCE >= 700 || _POSIX_C_SOURCE >= 200809L

and provide an #else implementation that uses link and fails in the cases that link cannot handle.

@xavierleroy

This comment has been minimized.

Copy link
Contributor

commented Feb 24, 2017

Also, some motivation on why we really need to hardlink symlinks would be welcome.

@dsheets

This comment has been minimized.

Copy link
Member

commented Feb 24, 2017

If I'm reading the binding correctly, ?follow defaults to false which is the opposite of the previous default:

link() will resolve and follow symbolic links contained within both path1 and path2.

Additionally, at least on OS X,

On OS X, not assigning AT_SYMLINK_FOLLOW to flag may result in some filesystems returning an error.

@dra27

This comment has been minimized.

Copy link
Contributor

commented Feb 24, 2017

Why not instead add Unix.linkat (with relevant configure-based detection for the underlying C function)?

@shindere

This comment has been minimized.

Copy link
Contributor

commented Feb 24, 2017

@xavierleroy

This comment has been minimized.

Copy link
Contributor

commented Feb 25, 2017

Concerning the "bug fix" I confirm there is no bug and the fix would break the win32 code.

@madroach : if you want to keep this PR up for discussion, please remove the 'bug fix", correct the default behavior of Unix.link, and add a fallback implementation if linkat() is not available. Otherwise we'll close this PR.

@madroach madroach force-pushed the madroach:link_symlink branch 2 times, most recently from e458097 to 3e99f64 Feb 25, 2017

@madroach

This comment has been minimized.

Copy link
Contributor Author

commented Feb 25, 2017

I updated the pull request in the following ways:

  • removed my supposed bug fix in the win32 symlink code. I missed out on the option already being handled in unix.ml. There is no bug.
  • I also added checks for __POSIX_VERSION and _XOPEN_VERSION version to use simple link as fallback.
  • The default behavior if no ~follow parameter is supplied is now following symlinks.

On the behavior of link the linux manpage on link(2) says this:

POSIX.1-2001 says that link() should dereference oldpath if it is a symbolic link. However, since kernel 2.0, Linux does not do so: if oldpath is a symbolic link, then newpath is created as a (hard) link to the same symbolic link file (i.e., newpath becomes a symbolic link to the same file that oldpath refers to). Some other implementations behave in the same manner as Linux. POSIX.1-2008 changes the specification of link(), making it implementation-dependent whether or not oldpath is dereferenced if it is a symbolic link. For precise control over the treatment of symbolic links when creating a link, see linkat(2).

Also http://pubs.opengroup.org/onlinepubs/9699919799/functions/link.html may be of interest.

I have no strong preference, but the not dereferencing approach seems saner to me because one could still dereference the symlink with readlink(2) and then create a hardlink to the target file.

As to why this change is needed:

Creating hardlinks of symlinks is error prone:

  • when following the symlink and the symlink points into another filesystem, the hardlink will fail with EXDEV
  • when not following the symlink and the symlink is relative and the new hardlink resides on a different directory than the symlink, the new (hardlinked) symlink will probably be broken and trying to access it will fail with ENOENT

Therefore the programmer often will need some control on what behaviour he prefers.
I need this feature for implementing a hardlinking backup solution like storeBackup or Apple Time Machine.
I could work around it by recreating new symlinks, so I don't depend on it. Still the cleaner solution would be hardlinking.

@xavierleroy

This comment has been minimized.

Copy link
Contributor

commented Feb 25, 2017

Thanks for the revised code, much appreciated. I have two more suggestions based on the discussion above and on the code:

  • I would prefer that Unix.link behaves exactly like before if the ~follow: optional argument is not given. Since the follow/nofollow behavior of link() seems to depend on the target platform, the only way to achieve this compatibility is by calling link() and not linkat() when ~follow is not given.
  • If ~follow: is given and linkat() is not available, the current code silently falls back to using link(), while I think it should fail.

Taken together, this would give a different code logic such as:

   int ret;
   caml_unix_check_path(path1, "link");
   caml_unix_check_path(path2, "link");
   p1 = caml_strdup(String_val(path1));
   p2 = caml_strdup(String_val(path2));
   caml_enter_blocking_section();
   if (follow == Val_int(0)) {
     ret = link(p1, p2);
   } else {
#if _XOPEN_VERSION >= 700 || _POSIX_VERSION >= 200809L
     int flags = Bool_val(Field(follow, 0)) ? AT_SYMLINK_FOLLOW : 0;
     ret = linkat(AT_FDCWD, p1, AT_FDCWD, p2, flags);
#else
     errno = ENOSYS /*or something more appropriate*/; ret = -1;
#endif
   }

@madroach madroach force-pushed the madroach:link_symlink branch from 3e99f64 to c8ce8e2 Feb 25, 2017

@madroach

This comment has been minimized.

Copy link
Contributor Author

commented Feb 25, 2017

Thanks for your input. This is indeed the most sensible behaviour. I adapted the code as you suggest, but kept the parsing of the option out of the blocking section because I'm not sure whether access to the option is protected from interference by the garbage collector.

@dsheets

This comment has been minimized.

Copy link
Member

commented Feb 28, 2017

It would be great to have the documentation updated to describe the ?follow behavior including None ➡️ link() and Some (true | false) ➡️ linkat().

@madroach madroach force-pushed the madroach:link_symlink branch from c8ce8e2 to e4bf29f Mar 1, 2017

@madroach

This comment has been minimized.

Copy link
Contributor Author

commented Mar 1, 2017

documentation added.

@dra27
Copy link
Contributor

left a comment

Sorry to bang on about it, but I'm still unconvinced why this is being done as an optional parameter, and not simply a separate binding as Unix.linkat, given that the new behaviour can only be triggered by specifying the new optional argument.

It also seems potentially unwise, however unlikely it may be for this particular syscall to gain flags, to bind something that is int flags in C effectively with bool in OCaml.

char * p1;
char * p2;
int ret;
caml_unix_check_path(path1, "link");
caml_unix_check_path(path2, "link");
p1 = caml_strdup(String_val(path1));
p2 = caml_strdup(String_val(path2));
ret =

This comment has been minimized.

Copy link
@dra27

dra27 Mar 2, 2017

Contributor

Memory is cheap - can we risk having another int flags; rather than confusingly co-opting one called ret? That said, see below...

caml_enter_blocking_section();
ret = link(p1, p2);
if (follow == Val_int(0) /* None */)

This comment has been minimized.

Copy link
@dra27

dra27 Mar 2, 2017

Contributor

FWIW, if (!Is_block(follow)) /* None */?

ret = link(p1, p2);
else /* Some bool */
# if _XOPEN_VERSION >= 700 || _POSIX_VERSION >= 200809L
ret = linkat(AT_FDCWD, p1, AT_FDCWD, p2, ret);

This comment has been minimized.

Copy link
@dra27

dra27 Mar 2, 2017

Contributor

Why not, instead of ret, just do (Bool_val(Field(follow, 0)) ? AT_SYMLINK_FOLLOW : 0) here, and not need a variable (ret or otherwise)? You already know that follow is a block by this branch.

This comment has been minimized.

Copy link
@madroach

madroach Mar 2, 2017

Author Contributor

Can I safely dereference follow here in the nonblocking section?

This comment has been minimized.

Copy link
@dra27

dra27 Mar 2, 2017

Contributor

That's a very good point (I don't think the access in the if is correct, therefore, either) - it would be better to have two lots of enter_blocking_section around both link and linkat (which again, begs my previous question of why this isn't just a separate binding).

This comment has been minimized.

Copy link
@madroach

madroach Mar 2, 2017

Author Contributor

The if (follow == Val_int(0)) is correct. It does not dereference follow.

This comment has been minimized.

Copy link
@dra27

dra27 Mar 4, 2017

Contributor

Indeed. Assuming this version is kept, it's therefore up to you whether to wrap the two syscalls with their own leave/enter_blocking_section or declare a flags variable earlier (my comment about co-opting ret still stands!!)

# if _XOPEN_VERSION >= 700 || _POSIX_VERSION >= 200809L
ret = linkat(AT_FDCWD, p1, AT_FDCWD, p2, ret);
# else
{ ret = -1; errno = ENOSYS }

This comment has been minimized.

Copy link
@dra27

dra27 Mar 2, 2017

Contributor

To me, this looks so horribly like some kind of struct initialiser, I'd just move the braces outside the #if...

if (follow == Val_int(0) /* None */)
ret = link(p1, p2);
else /* Some bool */
# if _XOPEN_VERSION >= 700 || _POSIX_VERSION >= 200809L

This comment has been minimized.

Copy link
@dra27

dra27 Mar 2, 2017

Contributor

RHEL 5 is about to go out of support, but is it definitely OK to be ignoring the glibc < 2.10 version of this check?

This comment has been minimized.

Copy link
@madroach

madroach Mar 2, 2017

Author Contributor

How would that look like?

This comment has been minimized.

Copy link
@dra27

dra27 Mar 2, 2017

Contributor

Looking here, I think it just means adding || defined(_ATFILE_SOURCE) /* For 2.4 <= glibc < 2.10 */

This comment has been minimized.

Copy link
@madroach

madroach Mar 2, 2017

Author Contributor

ok. I included the change in my tree. But I have to admit I don't understand feature_test_macros(7). Aren't those macros used to request, not check for features?

This comment has been minimized.

Copy link
@dra27

dra27 Mar 4, 2017

Contributor

Indeed - I'm not certain how this is supposed to work with glibc, as you do not appear to be permitted to check the version of the glibc headers you are compiling. My (quite possibly incorrect) understanding is that if someone is using glibc < 2.10 then they should ensure that -D_ATFILE_SOURCE gets into CFLAGS at configure-time (it's possible that this is something configure itself should determine is required, that's not clear at all).

However, it does seem to me to be OK to test it here.

@madroach

This comment has been minimized.

Copy link
Contributor Author

commented Mar 2, 2017

What about this interface?

type link_flag =
|L_FOLLOW

val fd_cwd:dir_handle

val linkat ~flags:link_flag array ->
  ?src_dir:dir_handle -> ~src:string ->
  ?dst_dir:dir_handle -> ~dst:string -> unit

But on the other hand the ?follow:bool -> string -> string -> unit interface won't break anything and we can still add linkadd in case more flags are needed.

@dra27

This comment has been minimized.

Copy link
Contributor

commented Mar 4, 2017

I agree that the ~follow parameter feels an easy way of adding this, but it also (to me) encodes far too much important (user) consideration into the two constructors of bool option (even though you've documented it). Upon further reflection, I also dislike the Windows treatment - at present, hardlinking to a symbolic link on NTFS is a contradiction in terms, so this parameter should not be ignored on Windows, it really ought to be an exception if follow is false.

I would push for the separate interface (if a user - such as yourself - needs the version with optional argument, it is easy to define) but with the following changes:

  • No need to expose fd_cwd - it's well-encoded as the default for ?src_dir & ?dst_dir and can be compatibly added in future if some extra special values appear (as you would still want the default to be AT_FDCWD)
  • There are of course other *at functions (e.g. fstatat(2)) which might want binding in future. Given the way these flags work, this could well be a case to use a polymorphic variant. Alternatively, the type should be better named (e.g. at_flags with an AT_) prefix. The polymorphic variant would be superior as it would allow each at function to specify which flags are permitted and which aren't, obviously.
  • I assume that the labels would be for UnixLabels only (I think that's the "policy" even for new functions)
@xavierleroy

This comment has been minimized.

Copy link
Contributor

commented Mar 4, 2017

I think this PR is getting out of hand and turning into a design discussion for a much bigger extension than initially offered. In particular, if linkat is exposed in its full generality, the other *at functions should also, and that's not a decision to be taken lightly because the *at functions have no Win32 counterpart as far as I know.

@madroach madroach force-pushed the madroach:link_symlink branch from e4bf29f to 28c3cd9 Mar 4, 2017

@madroach

This comment has been minimized.

Copy link
Contributor Author

commented Mar 4, 2017

I updated the pull request to include most suggestions by dra27.

According to CreateHardLink the only behavior on win32 is following hardlinks.
Therefore the win32 port will now fail with NOSYS if ~follow:false is requested. The unix port will fail in the same way if linkat is unavailable and any specific ~follow behavior is requested.

@madroach madroach force-pushed the madroach:link_symlink branch from 28c3cd9 to cd95237 Mar 5, 2017

@diml

This comment has been minimized.

Copy link
Member

commented Mar 6, 2017

Since we are talking about the *at functions, last year I wrote a bindings or them. This will be released in the main opam repository this week

@madroach madroach force-pushed the madroach:link_symlink branch 2 times, most recently from 4cdc229 to e66d41f May 1, 2017

@damiendoligez damiendoligez added this to the 4.07-or-later milestone Sep 29, 2017

@damiendoligez damiendoligez removed this from the consider-for-4.07 milestone Jun 4, 2018

@damiendoligez

This comment has been minimized.

Copy link
Member

commented Jun 4, 2018

@dra27 @xavierleroy can we get consensus on whether to include this?

int flags =
Is_block(follow) && Bool_val(Field(follow, 0)) /* Some true */
? AT_SYMLINK_FOLLOW
: 0;
caml_unix_check_path(path1, "link");

This comment has been minimized.

Copy link
@xavierleroy

xavierleroy Jun 4, 2018

Contributor

This int flags computation should be moved to the else part below, and protected by the #if linkat_is_available, since AT_SYMLINK_FOLLOW may not be defined.

@xavierleroy

This comment has been minimized.

Copy link
Contributor

commented Jun 4, 2018

What can I say?

  1. The API is fine.
  2. The implementation is almost good, see my suggestion above.
  3. I'm still not sure how much we care about this issue. Hard links are very rarely used nowadays.
@damiendoligez

This comment has been minimized.

Copy link
Member

commented Jun 5, 2018

@madroach if you fix the int flags issue and rebase, we'll merge this PR.

@madroach madroach force-pushed the madroach:link_symlink branch 2 times, most recently from 41701de to 67e4f2f Jun 5, 2018

@madroach

This comment has been minimized.

Copy link
Contributor Author

commented Jun 16, 2018

done and rebased.

@damiendoligez
Copy link
Member

left a comment

Fix Changes and we're good.

Changes Outdated
@@ -242,6 +242,9 @@ OCaml 4.07
now uses nanosleep(1) for enabling better preemption.
(b) Thread.delay is now an alias for Unix.sleepf.

- Add ?follow parameter to Unix.link. This allows hardlinking symlinks.

This comment has been minimized.

Copy link
@damiendoligez

damiendoligez Jun 18, 2018

Member

You need to add the GPR number, move the entry to the right place (not under 4.07) add the real names of everyone involved, and cut your lines to 80 characters, as documented in https://github.com/ocaml/ocaml/blob/trunk/CONTRIBUTING.md#changelog

Changes Outdated
@@ -987,6 +990,8 @@ OCaml 4.06.0 (3 Nov 2017):
(Leo White, Xavier Clerc, report by Alex, review by Gabriel Scherer
and Pierre Chambart)

### Tools:

This comment has been minimized.

Copy link
@damiendoligez

damiendoligez Jun 18, 2018

Member

Looks like you added this by mistake.

Add ~follow option to Unix.link
This allows hardlinking symlinks.

@madroach madroach force-pushed the madroach:link_symlink branch from 67e4f2f to 28e6684 Jun 18, 2018

@madroach

This comment has been minimized.

Copy link
Contributor Author

commented Jun 24, 2018

fixed Changes

@damiendoligez damiendoligez merged commit cba4ca5 into ocaml:trunk Jun 25, 2018

2 checks passed

continuous-integration/appveyor/pr AppVeyor build succeeded
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
@gasche

This comment has been minimized.

Copy link
Member

commented Jun 25, 2018

The CI reports failing macos-10.3 and freebsd builds with the same error:

gcc -c -O2 -fno-strict-aliasing -fwrapv -Wall -Werror   -D_FILE_OFFSET_BITS=64 -D_REENTRANT -DCAML_NAME_SPACE  -I../../byterun -o link.o link.c
link.c:45:15: error: use of undeclared identifier 'errno'
    ret = -1; errno = ENOSYS;
              ^
link.c:45:23: error: use of undeclared identifier 'ENOSYS'
    ret = -1; errno = ENOSYS;
                      ^
2 errors generated.
@madroach

This comment has been minimized.

Copy link
Contributor Author

commented Jun 26, 2018

Seems like I used the feature test macros in the wrong way.

@nojb

This comment has been minimized.

Copy link
Contributor

commented Jun 26, 2018

It may be a missing #include <errno.h> in otherlibs/unix/link.c.

@madroach

This comment has been minimized.

Copy link
Contributor Author

commented Jun 26, 2018

A fix is proposed in #1862

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.