Improve index file search and load #870

Merged
merged 9 commits into from Jun 28, 2019


valeriuo
Contributor

  1. First search for a local index file with any of the possible extensions (*.bam.csi, *.csi, *.bam.bai, *.bai).
  2. If a file is found, load it or report an error.
  3. If no index file is found and the alignment file is remote, try downloading it from the same server.
  4. Before beginning the full download, check if the format is correct and only proceed if it is a known index format.
  5. Load the downloaded file.
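
The local search order in step 1 can be sketched as below. This is a minimal illustration, not the code from this PR: `candidate_index_name` and `find_local_index` are hypothetical helpers, and the sketch assumes a candidate counts as "found" when it can be opened for reading.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not HTSlib code): write the n-th candidate index
 * name for alignment file `fn` into `out`.
 *   n = 0: fn + ".csi"            (c.bam -> c.bam.csi)
 *   n = 1: extension -> ".csi"    (c.bam -> c.csi)
 *   n = 2: fn + ".bai"            (c.bam -> c.bam.bai)
 *   n = 3: extension -> ".bai"    (c.bam -> c.bai)
 * Returns 0 on success, -1 if n is out of range or `out` is too small. */
int candidate_index_name(const char *fn, int n, char *out, size_t out_sz)
{
    static const char *suffix[] = { ".csi", ".csi", ".bai", ".bai" };
    if (n < 0 || n > 3) return -1;

    size_t len = strlen(fn);
    if (n % 2 == 1) {
        /* Replace form: drop the final extension, but only if the last
         * '.' belongs to the file name and not to a directory. */
        const char *slash = strrchr(fn, '/');
        const char *dot = strrchr(fn, '.');
        if (dot && (!slash || dot > slash)) len = (size_t)(dot - fn);
    }

    int w = snprintf(out, out_sz, "%.*s%s", (int)len, fn, suffix[n]);
    return (w < 0 || (size_t)w >= out_sz) ? -1 : 0;
}

/* Hypothetical helper: try each candidate in order. Returns the number of
 * the first candidate that can be opened (its name is left in `out`),
 * or -1 if none of them exists locally. */
int find_local_index(const char *fn, char *out, size_t out_sz)
{
    for (int n = 0; n < 4; n++) {
        if (candidate_index_name(fn, n, out, out_sz) != 0) continue;
        FILE *fp = fopen(out, "rb");
        if (fp) { fclose(fp); return n; }
    }
    return -1;
}
```

A return value of -1 from `find_local_index` corresponds to the point where step 3 would take over for a remote alignment file.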

Fixes samtools/samtools#1045

@valeriuo
Contributor Author

Add method hts_idx_stream in order to create an index without downloading the remote index file.
Add methods sam_index_load3 and tbx_index_load3 to allow the user not to download the index file when performing an indexed query.

valeriuo and others added 9 commits June 27, 2019 14:42
Add internal method `idx_check` to verify the existence of an
index file alongside the alignment file.
Change the names of static index methods to reflect their use.
… file is remote.

Also fix possible read past end of string in hopen_fd_fileuri()
Specifically range querying e.g. https://server/a/b/c.bam?param=val
will now retrieve https://server/a/b/c.bam.bai?param=val and locally
write this to c.bam.bai.
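
The URL handling described in this commit message can be illustrated with a small sketch. It is not the code from the commit: `remote_index_url` and `local_index_name` are hypothetical helpers, and the sketch assumes the query string starts at the first `?` and that the URL has no fragment.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: build the index URL by inserting `ext` between the
 * path and the query string of `url`.
 * Returns 0 on success, -1 if `out` is too small. */
int remote_index_url(const char *url, const char *ext,
                     char *out, size_t out_sz)
{
    size_t path_len = strcspn(url, "?");   /* length up to '?' or the end */
    int w = snprintf(out, out_sz, "%.*s%s%s",
                     (int)path_len, url, ext, url + path_len);
    return (w < 0 || (size_t)w >= out_sz) ? -1 : 0;
}

/* Hypothetical helper: build the local file name from the last path
 * component of `url` (query string dropped) plus `ext`.
 * Returns 0 on success, -1 if `out` is too small. */
int local_index_name(const char *url, const char *ext,
                     char *out, size_t out_sz)
{
    size_t path_len = strcspn(url, "?");
    size_t start = path_len;
    while (start > 0 && url[start - 1] != '/') start--;  /* after last '/' */
    int w = snprintf(out, out_sz, "%.*s%s",
                     (int)(path_len - start), url + start, ext);
    return (w < 0 || (size_t)w >= out_sz) ? -1 : 0;
}
```

With the example from the message, `remote_index_url("https://server/a/b/c.bam?param=val", ".bai", ...)` produces `https://server/a/b/c.bam.bai?param=val`, and `local_index_name` with the same arguments produces `c.bam.bai`.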

Fixes samtools#784
Allow missing .tbi indexes to be silently ignored and use this
capability in vcf_hdr_read() and synced_bcf_reader.  This
stops unexpected error messages about indexes from being printed
out when reading vcf files that otherwise don't need one.

To enable this, add new function hts_idx_load3() which takes
a flags field that can control both downloading and printing
of errors when the index file is missing.  The specialisations
sam_index_load3(), tbx_index_load3() and bcf_index_load3()
are updated (or added) to take the same flags and interpret them
in the same way so that the API is consistent over all formats.

Make other adjustments needed to ensure the flags get passed
around correctly.  Update functions to use the new API.  Add
documentation.
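
As a rough illustration of how a single flags field can control both downloading and error reporting, here is a toy model. The flag names, their values, and `demo_index_load` are all hypothetical; they are not the definitions added to HTSlib by this commit, and the real loader takes file names rather than booleans.

```c
#include <stdio.h>

/* Hypothetical flag bits for illustration; not HTSlib's names or values. */
#define DEMO_IDX_DOWNLOAD     1  /* allow fetching an index from the server */
#define DEMO_IDX_SILENT_FAIL  2  /* do not report a missing index */

typedef enum { DEMO_LOADED_LOCAL, DEMO_LOADED_REMOTE, DEMO_MISSING } demo_status;

/* Toy stand-in for an index loader. `have_local` and `have_remote` say
 * whether an index exists locally or on the server. `*reported` is set
 * to 1 if an error message was printed, 0 otherwise. */
demo_status demo_index_load(int have_local, int have_remote,
                            int flags, int *reported)
{
    *reported = 0;
    if (have_local)
        return DEMO_LOADED_LOCAL;
    if (have_remote && (flags & DEMO_IDX_DOWNLOAD))
        return DEMO_LOADED_REMOTE;
    if (!(flags & DEMO_IDX_SILENT_FAIL)) {
        fprintf(stderr, "could not load index\n");
        *reported = 1;
    }
    return DEMO_MISSING;
}
```

A caller that does not strictly need an index, such as a plain VCF reader, would pass the silent flag so that a missing index produces no message.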

There is one difference in behaviour.  This makes bcf_index_load()
only look for a .csi suffix (previously it would try .tbi as
well).  .csi is the only index format generated for BCF files,
and .tbi can't be used on them so the only way this would have
worked would be if a .csi index had been deliberately saved
with a .tbi suffix for some reason.  (.bai might work, but it
looks like HTSlib never made them for BCF and it certainly did
not look for them).
@daviesrob
Member

I've added a commit to stop vcf_hdr_read() from complaining about missing indexes whenever it gets called. Also some other minor fixes: file: URL handling, and a memory leak in idx_test_and_fetch() that appeared when the htsaddextension() commit was merged.

Successfully merging this pull request may close these issues.

unable to fetch regions from http/https url with local .bai file