Skip to content

Add error check to bgzf_seek_common()#1896

Merged
jkbonfield merged 3 commits intosamtools:developfrom
whitwham:bgzf_seek_error_checking
Apr 3, 2025
Merged

Add error check to bgzf_seek_common()#1896
jkbonfield merged 3 commits intosamtools:developfrom
whitwham:bgzf_seek_error_checking

Conversation

@whitwham
Copy link
Copy Markdown
Member

As per issue #1889, multithreaded seeking was not checking the error status on return.

As per issue samtools#1889, multithreaded seeking was not checking the error status on return.
@whitwham whitwham marked this pull request as draft March 13, 2025 14:23
Comment thread bgzf.c Outdated
} while (fp->mt->command != SEEK_DONE);

if (fp->mt->errcode)
return -1;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should also do fp->errcode |= BGZF_ERR_IO and it might be good if bgzf_mt_seek() set mt‑>errcode to errno (instead of BGZF_ERR_IO) so that this could also restore errno in the caller.

Also maybe this should come after …‑>command = NONE and unlocking the mutex, but perhaps continue to not do the resetting of fp‑>block_*.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @jmarshall, I don't suppose you have a decent bit of code to test a seek error?

Copy link
Copy Markdown
Member

@jmarshall jmarshall Mar 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm afraid not. As in, to make a seek error deterministically actually happen so that you can make a test case that exercises this handling code? I've always vaguely meant to write a LD_PRELOAD hack to inject I/O errors, but have never got around to it. But probably such things exist…

Possibly one could make an HTSlib plugin that did this without needing to have a test-only build of hfile.o… 🤔

@whitwham whitwham marked this pull request as ready for review March 25, 2025 15:36
Comment thread bgzf.c
fp->mt->command = NONE;

if (fp->mt->errcode) {
fp->errcode |= BGZF_ERR_IO;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also errno = fp->mt->errcode to propagate errno over to the calling thread (if this multithreading works the way I assume it does!).

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your probably right.

@jkbonfield
Copy link
Copy Markdown
Contributor

jkbonfield commented Apr 3, 2025

For purposes of repeating this experiment:

Testing by generating a set of ranges:

r=$(for i in `seq 1 46`;do echo 1:${i}m-${i}.01m;done)
eval  ./test/test_view -B -@8 ~/scratch/data/novaseq.10m.bam $r

I now have a simple LD_PRELOAD lseek implementation that can generate errors.

// gcc -fPIC -g libseek_io.c -shared -o libseek_io.so
// export LD_PRELOAD=`pwd`/libseek_io.so

#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>
#include <stdlib.h>
#include <errno.h>
#include <sys/syscall.h>

#if 0
/*
 * Instead of syscall, we could use dlsym to find the function and redirect
 * via a pointer.  This is useful for library calls instead of system calls.
 */
#include <dlfcn.h>
static off_t (*lseek_p)(int fd, off_t offset, int whence);
static void __attribute__((constructor)) init(void);
static void init(void) { 
    void *handle = dlopen("/usr/lib/x86_64-linux-gnu/libc.so.6", RTLD_NOW);
    lseek_p = dlsym(handle, "lseek");
    dlclose(handle);
}
#endif

static double err_chance = 0.01;
static void __attribute__((constructor)) load_env(void);
static void load_env(void) {
    srand48(0);

    char *p = getenv("SEEK_FAIL");
    if (p) {
	err_chance = atof(p);
	fprintf(stderr, "Setting error to %f\n", err_chance);
    }

    p = getenv("SEEK_SEED");
    if (p)
	srand48(atol(p));
}

off_t lseek(int fd, off_t offset, int whence) {
    double d = drand48();
    off_t ret;
    if (d < err_chance) {
	fprintf(stderr, "Fake error hit, %f < %f\n", d, err_chance);
	errno = EINVAL;
	ret = -1;
    } else {
	ret = syscall(SYS_lseek, fd, offset, whence);
    }
    fprintf(stderr, "lseek(%d, %ld, %d) = %ld\n", fd, offset, whence, ret);
    return ret;
}

We can run it with no errors and see the seeks happening:

SEEK_SEED=1 SEEK_FAIL=0 LD_PRELOAD=~/common/src/libseek_io.so eval  ./test/test_view -B -@8 ~/scratch/data/novaseq.10m.bam $r
Setting error to 0.000000
lseek(3, -28, 2) = 518154002
lseek(3, 3660, 0) = 3660
lseek(3, 11591919, 0) = 11591919
lseek(3, 24749712, 0) = 24749712
lseek(3, 38017666, 0) = 38017666
...

Or we can introduce seek failues:

SEEK_SEED=1 SEEK_FAIL=0.1 LD_PRELOAD=~/common/src/libseek_io.so eval  ./test/test_view -B -@8 ~/scratch/data/novaseq.10m.bam $r
Setting error to 0.100000
Fake error hit, 0.041630 < 0.100000
lseek(3, -28, 2) = -1
[W::bam_hdr_read] EOF marker is absent. The input is probably truncated
lseek(3, 11591919, 0) = 11591919
lseek(3, 24749712, 0) = 24749712
lseek(3, 38017666, 0) = 38017666
lseek(3, 48790523, 0) = 48790523
Fake error hit, 0.001767 < 0.100000
lseek(3, 59306281, 0) = -1
[E::hts_itr_next] Failed to seek to offset 3886696480779: Invalid argument
Error reading input.  
Error closing input.

The [E::hts_itr_next] Failed to seek to offset 3886696480779: Invalid argument is now the same for threads and unthreaded, although in both cases the offset reported is incorrect.

Edit: not incorrect, just highly misleading. It's a virtual offset which is a combination of compressed and uncompressed. It's (59306281<<16)+49163. It should probably say "failed to seek to virtual offset".

@jkbonfield jkbonfield merged commit 2f5a06c into samtools:develop Apr 3, 2025
9 checks passed
@jkbonfield
Copy link
Copy Markdown
Contributor

Small improvement to above, if we're using dlsym we don't need to find the C library. On GNU we can define -D_GNU_SOURCE to enable RTLD_NEXT and just do:

static void init(void) {                                                        
    lseek_p = dlsym(RTLD_NEXT, "lseek");                                        
}                                                                               

I remembered my horrifically scary "nitrous" command and saw this in there :). It replaced all open/close ,read/write/seek, etc with mmap and memcpys for super-fast I/O on any command, via LD_PRELOAD. Scary stuff, but also (mostly) worked amazingly well. lol. (Also scary to see that was back in 2008)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants