New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fcntl(F_SETLK) for setting byte range locks is broken leading to data corruption #1927
Comments
|
The issue referenced looks like lock files, not byte range locks. Another test in my test suite tests lock files and it passes on WSL. Still, the issues could be related. It should be trivially easy to convert the new ORD locks to LockFileEx. The older POSIX locks would require a little more work to emulate the broken semantics required by POSIX. In any case they should return ENOTSUPP or something so implementations can choose a backup locking system like lock files. |
|
FWIW, recent builds of QEMU require F_OFD_SETLK / F_OFD_GETLK for disk image files. It would be nice to have them implemented. |
|
I see this fcntl(F_SETLK) issue with libapr's apr_file_lock() API. A child process is able to run this sequence successfully while the parent holds the lock. |
|
A simple test case for fcntl-style file locking semantics can be found in "locky.c" linked from this blog post: https://apenwarr.ca/log/20101213 |
WSL (Windows Services for Linux) provides a Linux-kernel-compatible ABI for userspace processes, but the current version doesn't not implement fcntl() locks at all; it just always returns success. See microsoft/WSL#1927. This causes us three kinds of problem: 1. sqlite3 in WAL mode gives "OperationalError: locking protocol". 1b. Other sqlite3 journal modes also don't work when used by multiple processes. 2. redo parallelism doesn't work, because we can't prevent the same target from being build several times simultaneously. 3. "redo-log -f" doesn't work, since it can't tell whether the log file it's tailing is "done" or not. To fix #1, we switch the sqlite3 journal back to PERSIST instead of WAL. We originally changed to WAL in commit 5156fea to reduce deadlocks on MacOS. That was never adequately explained, but PERSIST still acts weird on MacOS, so we'll only switch to PERSIST when we detect that locking is definitely broken. Sigh. To (mostly) fix #2, we disable any -j value > 1 when locking is broken. This prevents basic forms of parallelism, but doesn't stop you from re-entrantly starting other instances of redo. To fix that properly, we need to switch to a different locking mechanism entirely, which is tough in python. flock() locks probably work, for example, but python's locks lie and just use fcntl locks for those. To fix #3, we always force --no-log mode when we find that locking is broken.
|
Still seeing this on Version 1803 (4.4.0-17134-Microsoft). Are there plans to fix this? |
touch /tmp/lock
zmodload zsh/system
zsystem flock /tmp/lock
( zsystem flock /tmp/lock ) # must hang but on wsl it doesn't
See:
- microsoft/WSL#1927
- microsoft/WSL#1712
|
So any update on this ? here are some codes in C not working (using lockf) /******************************************************
In this file, we implement a simple mutual exclusion
between processes accessing a shared file.
********************************************************/
#include <stdlib.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <string.h>
#include <setjmp.h>
#include <signal.h>
#include <sys/file.h>
void on_alrm(int sig_nb);
jmp_buf ctxt;
int open_ret;
int main (int argc, char *argv[])
{
int lock_ret, write_ret ;
char taped_word[256];
if (argc != 3)
{
printf("Wrong usage : %s file_name time_out! \n", argv[0]);
exit(2);
}
/*-----------------------------------------------------
To check that the program works correctly, display
outputs on shell windows for which we first obtained
the name using the tty command. Pass the file name
as an argument to the program.
----------------------------------------------------- */
int timeout = atoi(argv[2]);
if ((open_ret = open (argv[1], O_RDWR)) == -1)
{
perror ("open 2");
exit (1);
}
while (1)
{
sigsetjmp(ctxt, 1);
/* Avoid starvation */
sleep(1);
/* Implement a mutual exclusion */
lseek(open_ret, 0, SEEK_SET);
lock_ret = flock(open_ret, LOCK_EX);
printf ("Pid %d : entered the critical section\n", (int)getpid() );
signal(SIGALRM, on_alrm);
alarm(timeout);
/*----------------------------------------------
Begining of the critical section
----------------------------------------------*/
while ( fgets(taped_word, sizeof(taped_word), stdin) )
{
/*----------------------------------------------
The program stays in the read/write loop until
the user tape "end".
----------------------------------------------*/
if (!strncmp (taped_word, "end", 3)) break;
write_ret = write (open_ret, taped_word, strlen(taped_word));
if (write_ret < 0)
{
printf ("Error when writing in %s\n", argv[1]);
break;
}
}
/*----------------------------------------------
End of the critical section
----------------------------------------------*/
lseek(open_ret, 0, SEEK_SET);
/* Release the second lock */
lock_ret = flock(open_ret, LOCK_UN);
printf ("Pid %d : arret temporaire d'utilisation de %s\n",
(int)getpid(), argv[1]);
/* To avoid starvation */
sleep(1);
printf ("Finished critical section, lock returned = %d\n", lock_ret);
} /* fin de la boucle while */
}
void on_alrm(int sig_nb)
{
lseek(open_ret, 0, SEEK_SET);
/* Release the second lock */
flock(open_ret, LOCK_UN);
printf("Jump: re-execute before critical section\n");
siglongjmp(ctxt, sig_nb);
}/******************************************************
On UNIX, usage of locks when accessing a file verrou
is not mandatory (advisory locking). To make it
mandatory (mandatory locking), it is necessary
(if you have the wright to do so) to modify the access
wrights on the file (cf. man lockf et man chmod).
Depending on the operating systems, commands that follow
should do the job:
chmod 2644
chmod +l
********************************************************/
#include <stdio.h>
#include <stdlib.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <string.h>
int main (int argc, char *argv[]){
int i=0, sentence_size, sleep_index_in_sentence,
deadlock_nb=0, /* counter for the number of detected deadlocks */
correct_iteration_nb =0, /* counter for the number of correct iteration
* of the critical section */
lockf_ret =0, /* value returned by call to lockf */
file_1, file_2; /* shared resources */
char sentence[256];
if (argc != 3){
printf(" Wrong usage : %s file_1_name file_2_name ! \n", argv[0]);
exit(2);
}
sprintf(sentence, " -- Writen by pid %d -- \n", (int)getpid() );
sentence_size=strlen(sentence);
/* Open the two files */
if ((file_1 = open (argv[1], O_RDWR)) == -1) {
perror ("open file_1");
exit (1);
}
if ((file_2 = open (argv[2], O_RDWR)) == -1) {
perror ("open file_2");
exit (1);
}
while(1)
{
/* Take the first lock */
lockf_ret = lockf(file_1, F_LOCK, 0);
if (lockf_ret != 0) perror ("lockf file_1");
printf ("Pid %d : entered critical section 1 (%s), lockf_ret=%d\n", (int)getpid(), argv[1], lockf_ret );
// simulate execution
sleep(random() %4);
/* Take the second lock */
lockf_ret = lockf(file_2, F_LOCK, 0);
/* If a deadlock is detected, release the first lock
execute sleep, increment deadlock_nb, and try to
take it again by restarting the loop */
if (lockf_ret != 0) {
perror ("lockf file_2");
if (errno == EDEADLK){
printf ("Pid %d : ulock %s (deadlock_nb=%d)\n", (int)getpid(), argv[1], deadlock_nb);
deadlock_nb++;
/** Release the first lock */
lockf_ret = lockf(file_1, F_ULOCK, 0);
sleep(deadlock_nb);
/* continue the loop */
continue;
}
}
deadlock_nb=0;
correct_iteration_nb++;
printf ("Pid %d : entered critical section 2 (%s), lockf_ret=%d, correct_iteration_nb =%d\n",
(int)getpid(), argv[2], lockf_ret, correct_iteration_nb);
/*----------------------------------------------
Begining of the critical section
----------------------------------------------*/
sleep_index_in_sentence=random()%sentence_size;
for (i=0; i <sentence_size ;i++){
if (i == sleep_index_in_sentence)sleep(random()%4);
if (write (file_1, sentence+i, sizeof (char)) < 0){
printf ("ERROR: writing to %s failed\n", argv[1]);
break;
}
}
for (i=0; i <sentence_size;i++){
if (i == sleep_index_in_sentence)sleep(random()%4);
if (write (file_2, sentence+i, sizeof (char) ) < 0){
printf ("ERROR: writing to %s failed\n", argv[2]);
break;
}
}
/*----------------------------------------------
End of the critical section
----------------------------------------------*/
lseek(file_2, 0, SEEK_SET);
/* Release the second lock */
lockf_ret = lockf(file_2, F_ULOCK, 0);
if (lockf_ret != 0) perror ("ulockf file_2");
printf ("Pid %d : exit critical section 2 (%s), lockf_ret=%d\n", (int)getpid(), argv[2], lockf_ret );
lseek(file_1, 0, SEEK_SET);
/* Release the first lock */
lockf_ret = lockf(file_1, F_ULOCK, 0);
if (lockf_ret != 0) perror ("ulockf file_1");
printf ("Pid %d : exit critical section 1 (%s), lockf_ret=%d\n", (int)getpid(), argv[1], lockf_ret );
} /* end of for */
return 0;
} |
touch /tmp/lock
zmodload zsh/system
zsystem flock /tmp/lock
( zsystem flock /tmp/lock ) # must hang but on wsl it doesn't
See:
- microsoft/WSL#1927
- microsoft/WSL#1712
|
@benhillis, do you know if this applies to WSL2? In gittup/tup#394 they seem to have fixed the issue moving to WSL2, but com-lihaoyi/mill#874 seems to indicate this bug is still present in WSL2. |
|
Yes all the |
|
Was it fixed for WSL1? |
|
Why will this not be fixed in WSL-1. The WSL-2 environment cannot be used on older CPU-environments and WSL-1 is unusable for me when this is not fixed. Why does Microsoft not fix a bug in a product it still distributes on the newest Windows versions? WSL-2 is not a replacement for WSL-1, says Microsoft. They can be used as different products. This is really terrible, as it is a simple bug that exists already for years. |
|
I'm in the same situation as @gertvanantwerpentno I do not want to move to WSL-2 for various reasons and without this being fixed in WSL-1 I'm basically stuck. |
|
Me too, @StephenAtty . |
|
Will, there be a solution or a workaround for WSL1. It's annoying sometimes |
|
@saurav3199 = It's not been fixed. They never had any intention of fixing it. They want you to move onto WSL2 which is a full virtual machine. So using WSL as light weight linux development environment is a dead duck. |
|
Another WSL1 user here. Recently hit this issue with a systemd update on Debian sid. Really hoping this gets implemented or worked around at some point! https://superuser.com/a/1805742/1298503 |
My WSL is the Creator's Update 1703. uname -r reports 4.4.0-43-Microsoft
I'm seeing fcntl(F_SETLK) permit multiple exclusive locks on WSL which is bad, it's instant data corruption for anything relying on this to work. This test launches four child processes and uses a shared memory map to check if fcntl(F_SETLK) is working correctly. This test passes on real Linux, fails badly on WSL:
While you're at it, if you could implement the non-insane fcntl(F_OFD_SETLK) locks that would be very useful. The fcntl(F_SETLK) type locks are stupid.
The text was updated successfully, but these errors were encountered: