I am using pysam as a backend to serve genomic data, and I found it difficult to detect, from Python, an invalid value in the column a tabix index is built on.
Here are the steps to reproduce:
# data preparation
printf 'A\t1\nA\tb\nA\t3\nB\t1\n' | bgzip > test.gz
tabix -s 1 -b 2 test.gz
Now echo 'A' | tabix test.gz -R - prints the available records, stops at the broken row, prints an error message and exits with error code 1:
$ echo 'A' | tabix test.gz -R -
A 1
[E::get_intv] Failed to parse TBX_GENERIC: was wrong -p [type] used?
The offending line was: "A b"
Reading "test.gz" failed: No such file or directory
$ echo $?
1
In contrast, pysam.TabixFile.fetch() returns the available records up to the broken row and prints an error message, but does not raise any exception:
>>> t=pysam.TabixFile("/tmp/test.gz")
>>> list(t.fetch("A"))
[E::get_intv] Failed to parse TBX_GENERIC, was wrong -p [type] used?
The offending line was: "A b"
['A\t1']
I want to catch this error (to mark the file as malformed), but that doesn't seem to be easy in Python. The error message is printed directly to stderr, which also makes it hard to capture.
Since Python can't return results and raise an error at the same time (as tabix does in the shell), I wish fetch() could take an additional parameter that controls whether an exception is raised when such an error is encountered.
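For reference, the kind of workaround I would like to avoid looks roughly like the sketch below. It temporarily redirects file descriptor 2 to a temporary file, runs the fetch, and raises if anything resembling an htslib error line was written. The names capture_fd_stderr, fetch_strict and MalformedTabixError are hypothetical helpers of mine, not part of pysam, and the "[E::" marker is only an assumption based on the message shown above. It also assumes the message really reaches file descriptor 2, and it is not thread-safe.

```python
import os
import sys
import tempfile
from contextlib import contextmanager


class MalformedTabixError(Exception):
    """Hypothetical exception type; not part of pysam."""


@contextmanager
def capture_fd_stderr():
    """Redirect file descriptor 2 to a temp file; yield a list that
    receives the captured text once the block exits."""
    captured = []
    sys.stderr.flush()
    saved_fd = os.dup(2)  # keep a copy of the real stderr
    with tempfile.TemporaryFile(mode="w+b") as tmp:
        os.dup2(tmp.fileno(), 2)  # C-level writes to stderr now go to tmp
        try:
            yield captured
        finally:
            sys.stderr.flush()
            os.dup2(saved_fd, 2)  # restore the real stderr
            os.close(saved_fd)
            tmp.seek(0)
            captured.append(tmp.read().decode(errors="replace"))


def fetch_strict(fetch_rows, marker="[E::"):
    """Call fetch_rows(), consume its rows, and raise MalformedTabixError
    if text containing `marker` was written to stderr meanwhile.

    `marker` is an assumed prefix taken from the message in this report.
    """
    with capture_fd_stderr() as captured:
        rows = list(fetch_rows())
    message = captured[0]
    if marker in message:
        raise MalformedTabixError(message.strip())
    return rows
```

With the file from the steps above, this would be called as fetch_strict(lambda: pysam.TabixFile("/tmp/test.gz").fetch("A")). It works on text matching only, so it cannot tell which row failed, and the partial results are discarded when the exception is raised, which is why a parameter on fetch() itself would be preferable.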