We have been seeing intermittent rose ana failures in our nightly tests, with a message of:
[FAIL] database disk image is malformed
After some investigation, it turns out our HPC only supports the file locking that SQLite uses within a single node (using Lustre's `localflock` option). This means that if multiple rose ana tasks start at the same time on different nodes, SQLite on one node will not see locks held on another, and the concurrent writes corrupt the database file.
It is still possible to obtain a lock on our system by attempting to open a file in exclusive mode (`open('foo.lock','x')`). Can a config option be added to use a file-based lock for database writes, or, as the database is non-essential, can this error be caught and ignored?
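To make the request concrete, here is a minimal sketch of both ideas: a lock file created with exclusive mode, and a database write that treats failures as non-fatal. The names `file_lock` and `record_result`, the `results` table, and the timeout values are hypothetical and are not taken from the rose ana code. It also assumes that exclusive creation is atomic on the filesystem in question, which would need checking per site.

```python
import os
import sqlite3
import time
from contextlib import contextmanager


@contextmanager
def file_lock(lock_path, timeout=30.0, poll=0.1):
    """Hold an exclusive-create lock file for the duration of the block.

    Hypothetical helper: relies on open(..., 'x') failing if the file exists.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # Mode 'x' raises FileExistsError if another task holds the lock.
            handle = open(lock_path, "x")
            break
        except FileExistsError:
            if time.monotonic() >= deadline:
                raise TimeoutError(f"could not acquire {lock_path}")
            time.sleep(poll)
    try:
        # Record the owner to help diagnose stale locks by hand.
        handle.write(str(os.getpid()))
        handle.close()
        yield
    finally:
        os.remove(lock_path)


def record_result(db_path, task, status):
    """Write one row under the lock; report database errors without failing."""
    try:
        with file_lock(db_path + ".lock"):
            conn = sqlite3.connect(db_path)
            try:
                with conn:  # commits on success, rolls back on error
                    conn.execute(
                        "CREATE TABLE IF NOT EXISTS results (task TEXT, status TEXT)"
                    )
                    conn.execute(
                        "INSERT INTO results VALUES (?, ?)", (task, status)
                    )
            finally:
                conn.close()
        return True
    except (sqlite3.DatabaseError, TimeoutError) as exc:
        # The database is non-essential, so warn and carry on.
        print(f"[WARN] skipping database update: {exc}")
        return False
```

A task that crashes while holding the lock leaves the lock file behind, so a real implementation would need some way to detect and clear stale locks.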
@stevewardle to check with partners (in particular @ScottWales) whether this problem still exists with the new rose ana introduced in #1996, once the new release is out.
This doesn't work across nodes at our site. Testing this on two different login nodes shows that they can both obtain the lock simultaneously. Same behavior on /short, /home and /g/data filesystems.
cc @MartinDix