how to use bgzipped indexed FASTA the right way ? #1281

Closed
FredericBGA opened this Issue Dec 11, 2018 · 7 comments

FredericBGA commented Dec 11, 2018

Hi,

I'm trying the latest version of JBrowse: 1.15.4

I wanted to try the support for bgzipped indexed FASTA.
My FASTA is bgzipped and indexed with samtools (and I can query it with samtools faidx reference.fasta.gz 1:1-100).

So I tried to set up a really simple browser, with only the reference sequence track.

{
  "category" : "Reference sequence",
  "chunkSize" : 8000,
  "compress" : 1,
  "key" : "Reference sequence indexed",
  "label" : "DNA",
  "seqType" : "dna",
  "storeClass" : "JBrowse/Store/Sequence/IndexedFasta",
  "type" : "SequenceTrack",
  "urlTemplate" : "../fasta/reference.fasta.gz"
}

JBrowse does not want to start:

Unable to load https://xxxxxx/jbrowse/names/meta.json status: 404

So I ran the name indexing step: generate-names.pl -out . --compress --incremental --verbose
But this step requires the reference to have been prepared with prepare-refseqs.pl; otherwise I get this error:

generate-names.pl -out . --compress --incremental --verbose
No reference sequences defined in configuration, nothing to do.

I generated the prepared sequences, then the names.

My tracks are now (I changed label and key for the indexed one):

"tracks" : [
  {
    "category" : "Reference sequence",
    "chunkSize" : 80000,
    "compress" : 1,
    "key" : "Reference sequence",
    "label" : "DNA",
    "seqType" : "dna",
    "storeClass" : "JBrowse/Store/Sequence/StaticChunked",
    "type" : "SequenceTrack",
    "urlTemplate" : "seq/{refseq_dirpath}/{refseq}-"
  },
  {
    "category" : "Reference sequence",
    "chunkSize" : 80000,
    "compress" : 1,
    "key" : "Reference sequence indexed",
    "label" : "DNAidx",
    "seqType" : "dna",
    "storeClass" : "JBrowse/Store/Sequence/IndexedFasta",
    "type" : "SequenceTrack",
    "urlTemplate" : "../fasta/reference.fasta.gz"
  }
]

After running prepare-refseqs.pl and generate-names.pl, I can reach the browser.
But the indexed FASTA track is not displayed correctly:

[image: screenshot of the indexed FASTA track]

And sometimes I see some network errors:

Error: "HTTP 416 when fetching https://xxxxxxxx/fasta/reference.fasta.gz bytes 1269563392-1269825535"
Unhandled promise rejection Error: "HTTP 416 when fetching https://xxxxx/fasta/reference.fasta.gz bytes 1269563392-1269825535"

So can a bgzipped indexed FASTA not be used as the reference sequence? I was hoping to avoid the prepare-refseqs.pl step (and the seq directory).
thank you
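
For reference, HTTP 416 (Range Not Satisfiable) is what a server returns when a requested byte range starts at or beyond the end of the file. Below is a minimal Python sketch of that check, using the byte range from the error above. The function name `range_is_satisfiable` and the example file size are hypothetical, and the thread does not confirm why JBrowse requested this range.

```python
def range_is_satisfiable(first_byte: int, last_byte: int, file_size: int) -> bool:
    """Return True if the byte range first_byte-last_byte overlaps a file of file_size bytes."""
    if first_byte < 0 or last_byte < first_byte:
        return False  # malformed range, cannot be served as written
    # A last_byte past the end is fine (servers clamp it); only the start matters.
    return first_byte < file_size


# Range taken from the error message in the post.
first, last = 1269563392, 1269825535

# Hypothetical size in bytes of reference.fasta.gz on the server (an assumption).
compressed_size = 400_000_000

print(range_is_satisfiable(first, last, compressed_size))
```

Comparing the requested offset with the actual size of reference.fasta.gz on the server would show whether the request runs past the end of the compressed file.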

Contributor

garrettjstevens commented Dec 11, 2018

Indexed FASTA should be able to be used for the reference sequence. What happens if you specify a very minimal config, something like:

{
  "refSeqs": "/path/to/genome.fa.gz.fai",
  "tracks": [
    {
      "label": "refseqs",
      "urlTemplate": "/path/to/genome.fa.gz"
    }
  ]
}

/path/to/genome.fa.gz.gzi should also exist (it should have been generated by samtools faidx).

(More about minimal configs here)
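
The suggestion above can be sketched in Python as a small helper that builds the minimal config and reports which index files are missing next to the bgzipped FASTA. The helper names `minimal_config` and `missing_index_files` are hypothetical, and the sketch assumes the `.fai` and `.gzi` files sit beside the FASTA with the same base name, as in the comment.

```python
import json
from pathlib import Path
from typing import List


def minimal_config(fasta_url: str) -> dict:
    """Build the minimal config from the comment above for a bgzipped FASTA URL."""
    return {
        "refSeqs": fasta_url + ".fai",
        "tracks": [{"label": "refseqs", "urlTemplate": fasta_url}],
    }


def missing_index_files(fasta_path: str) -> List[str]:
    """List the index files expected next to a bgzipped FASTA that are absent on disk."""
    expected = [fasta_path + ".fai", fasta_path + ".gzi"]
    return [p for p in expected if not Path(p).exists()]


if __name__ == "__main__":
    fasta = "/path/to/genome.fa.gz"  # placeholder path from the comment
    print(json.dumps(minimal_config(fasta), indent=2))
    print("missing:", missing_index_files(fasta))
```

Running the existence check before loading the browser is a cheap way to catch a FASTA that was compressed or indexed without producing both companion files.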

FredericBGA commented Dec 11, 2018

@garrettjstevens Thank you Garrett. It works.

Is there a way to use generate-names.pl without needing the seq directory created by prepare-refseqs.pl?
It is really convenient to use GFF3Tabix and now bgzipped FASTA files, but when we want to be able to search for features, we need generate-names.pl.

Contributor

garrettjstevens commented Dec 11, 2018

Pinging @cmdcolin to confirm, but I don't think it's possible to run generate-names without having used prepare-refseqs.

Also, make sure to add a nameAttributes entry to a GFF3Tabix config as noted at the bottom of this page if you want to pick those up when running generate-names

Contributor

cmdcolin commented Dec 11, 2018

@garrettjstevens You could use a slightly odd workaround to get generate-names to work with a bgzip FASTA track. Run prepare-refseqs like so:

prepare-refseqs.pl --indexed_fasta bgzfasta.fa.gz

This will create a refSeqs.json, but the track config will be wrong, so just delete that track config in trackList.json and replace it with the config that @garrettjstevens suggested.
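
The track swap in this workaround could be scripted roughly as follows. This is only a sketch under assumptions: the function name `replace_sequence_track` is hypothetical, the label of the track written by prepare-refseqs.pl is assumed to be "DNA" (as in the original post), and the generated refSeqs.json is left untouched so that generate-names can still find it.

```python
import json


def replace_sequence_track(track_list: dict, old_label: str, fasta_url: str) -> dict:
    """Return a copy of track_list in which the track labelled old_label is
    replaced by a minimal track pointing at the bgzipped FASTA."""
    new_track = {"label": "refseqs", "urlTemplate": fasta_url}
    # Drop the track generated by prepare-refseqs.pl, keep every other track.
    kept = [t for t in track_list.get("tracks", []) if t.get("label") != old_label]
    updated = dict(track_list)
    updated["tracks"] = [new_track] + kept
    return updated


if __name__ == "__main__":
    # Stand-in for the parsed contents of trackList.json (illustrative only).
    track_list = {
        "formatVersion": 1,
        "tracks": [{"label": "DNA", "type": "SequenceTrack"}],
    }
    updated = replace_sequence_track(track_list, "DNA", "bgzfasta.fa.gz")
    print(json.dumps(updated, indent=2))
```

The function returns a new dict rather than editing in place, so the original trackList.json contents can be kept as a backup before writing the result.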

Contributor

cmdcolin commented Dec 11, 2018

We can treat this as an issue though, and it should be fixed :)

FredericBGA commented Dec 11, 2018

OK, nice, thank you for all your comments and advice.
Indeed, it would be awesome if it could be fixed.

Contributor

cmdcolin commented Dec 12, 2018

I added new options for both:

  • indexing refSeqs if an FAI is specified in the refSeqs part of the config
  • adding bgzip FASTA via prepare-refseqs.pl --bgzip_fasta

Both are pending release. Thanks for the feature request!

@cmdcolin cmdcolin closed this Dec 12, 2018

@cmdcolin cmdcolin added this to the 1.16.0 milestone Dec 17, 2018
