Racket file locking problem with AFS #2840

Open
ngsankha opened this issue Sep 30, 2019 · 1 comment

ngsankha commented Sep 30, 2019

We are teaching compilers with Racket at UMD. The students run DrRacket on a shared cluster. When they run a file in DrRacket, it throws this exception:

../../../../../../../../class/fall2019/cmsc/430/0101/public/racket-7.4/collects/racket/file.rkt:435:8: port-try-file-lock?: error getting file shared lock
  system error: Permission denied; errno=13

Tracing through the Racket code, the failure seems to end up at

int rktio_file_lock_try(rktio_t *rktio, rktio_fd_t *rfd, int excl)
{
#ifdef RKTIO_SYSTEM_UNIX
# ifdef USE_FLOCK_FOR_FILE_LOCKS
  {
    intptr_t fd = rktio_fd_system_fd(rktio, rfd);
    int ok;
    do {
      ok = flock(fd, (excl ? LOCK_EX : LOCK_SH) | LOCK_NB);
    } while ((ok == -1) && (errno == EINTR));

where the flock system call is apparently failing. The cluster here runs on the Andrew File System (AFS). There are well-documented issues with flock locking not working well on distributed file systems such as AFS and NFS, and the common advice is to use fcntl locks instead.
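
One way to check the flock behavior outside Racket is a small standalone probe. The following is a minimal sketch, not code from rktio: `probe_flock` is a hypothetical helper name, and it only mirrors the non-blocking shared-lock attempt shown in the excerpt above.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/file.h>
#include <unistd.h>

/* Hypothetical helper (not part of rktio): attempt a non-blocking shared
   flock() on `path`, the same kind of request as in the excerpt above.
   Returns 0 on success, or the errno value of the call that failed. */
int probe_flock(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd == -1)
        return errno;

    int ok;
    do {
        ok = flock(fd, LOCK_SH | LOCK_NB);
    } while (ok == -1 && errno == EINTR);

    /* Save errno before cleanup calls can overwrite it. */
    int err = (ok == -1) ? errno : 0;

    if (ok == 0)
        flock(fd, LOCK_UN);
    close(fd);
    return err;
}
```

If the diagnosis above is right, calling `probe_flock` on a file under the AFS-hosted course directory would be expected to return 13 (EACCES), while the same call on a file in a local file system should normally return 0.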

To further confirm that flock is the cause, I wrote a dummy C file in which the flock call always reports success:

int flock(int fd, int operation) {
  return 0;
}

Build a shared library from it with:

gcc -fPIC -shared -o flock.so flock.c

If we launch DrRacket with this flock via LD_PRELOAD, everything works and the error no longer appears.

LD_PRELOAD=./flock.so drracket

mflatt commented Oct 12, 2019

Thank you for the report and investigation (and apologies for the delay; I must have missed the notification from GitHub).

As you may have noticed in "rktio_flock.c", rktio can be compiled to use fcntl instead of flock. The last time I went down this path for an uncooperative NFS installation, fcntl didn't work, either. But the NFS case didn't report a permission error for flock.

Could you check whether it works for your AFS installation? Trying requires building from source. The simplest way to try the change is probably to edit "rktio_platform.h" to add

#  define RKTIO_USE_FCNTL_AND_FORK_FOR_FILE_LOCKS

at the top.

Even if this works, I'm not immediately sure of the way forward. Using fcntl requires forking a new process to manage the lock, so it's not a great choice when flock could work. Maybe the fcntl approach should be tried as a fallback if flock reports a permission error, although that feels heavyweight.
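
The fallback idea can be sketched roughly as follows. This is only an illustration under stated assumptions, not rktio's implementation: `try_lock_with_fallback` is a hypothetical name, and the sketch deliberately leaves out the forked helper process mentioned above. That matters because fcntl locks belong to the whole process and are dropped when the process closes any descriptor for the file.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/file.h>
#include <unistd.h>

/* Hypothetical helper (not rktio code): try flock() first and fall back to
   an fcntl() whole-file lock only when flock() reports a permission error.
   Returns 1 if a lock was acquired, 0 if someone else holds a conflicting
   lock, and -1 on any other error (errno is left set). */
int try_lock_with_fallback(int fd, int excl)
{
    int ok;
    do {
        ok = flock(fd, (excl ? LOCK_EX : LOCK_SH) | LOCK_NB);
    } while (ok == -1 && errno == EINTR);
    if (ok == 0)
        return 1;
    if (errno == EWOULDBLOCK)
        return 0;
    if (errno != EACCES)
        return -1;

    /* flock() was refused with EACCES, as in the AFS report: retry with
       fcntl(). A read lock needs fd open for reading, a write lock needs
       it open for writing. */
    struct flock fl = {0};
    fl.l_type = excl ? F_WRLCK : F_RDLCK;
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 0; /* 0 means "through end of file" */
    do {
        ok = fcntl(fd, F_SETLK, &fl);
    } while (ok == -1 && errno == EINTR);
    if (ok == 0)
        return 1;
    if (errno == EACCES || errno == EAGAIN)
        return 0; /* F_SETLK reports a conflicting lock with either value */
    return -1;
}
```

A caller would pass an already-open descriptor, for example `try_lock_with_fallback(fd, 1)` for an exclusive lock. Whether the fcntl path actually succeeds on AFS is exactly the open question in this thread, so the sketch shows the control flow only.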
