References support needed. #374
Comments
Wrt (1):
Do we have any reason for picking one of these over the other? |
PageRank says this one https://pypi.python.org/pypi/pyfasta/ |
I'd try out pyfasta and see how it goes. We're looking for:
|
Ok, where should we get / how should we generate example FASTA data for the datadriven tests? |
- added pyfasta package to requirements.txt Issue ga4gh#374
Here's an example: ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/phase2_reference_assembly_sequence/hs37d5.fa.gz We can snip out a few bits of this as a place to start. It should be straightforward enough to get your hands on some other FASTAs at UCSC! |
That website seems to be down. Any other examples? |
It looks like we have an example FASTA file in the tree already, |
@jeromekelleher can you add a README to Also, should we prefer using FAI or GZI files to FA.GZ files? |
I can't remember to be honest @dcolligan --- I think I just manually snipped a bit out of GRCh37. I have no idea whether we should prefer FAI or GZI. @macieksmuga --- you must be dealing with a bunch of FASTA files for the graph reference stuff. Any ideas here on how to create some good testing examples? |
TODO: - more data for datadriven tests? - implement and test frontend methods - end-to-end test Issue ga4gh#374
TODO: - more data for datadriven tests? - test frontend and backend methods - end-to-end test Issue ga4gh#374
Fixed in #390 |
We currently do not support the references section of the API, which we should redress. This can act as a meta-issue for references support, and can be closed once all of the sub-issues have been resolved.
1. Choose an alternative (pip installable) FASTA file parsing library, for data driven tests.
2. Implement `toProtocolElement` for `ReferenceSet` and `Reference` in ga4gh/datamodel/references.py, and add data driven tests for this functionality (using the library chosen above).
3. Implement `getBases` (or whatever seems appropriate) in `datamodel/references.py` as the low-level equivalent of the `ListReferenceBasesRequest` method. This should have `start` and `end` parameters, and return a string. Create datadriven tests for this functionality, and check corner cases. Add lots of examples of FASTA files.
4. Add a `ReferenceSimulator` that generates random sequence in a reproducible manner.
5. Add support for the ListReferenceBases queries in `backend.py`. Add tests for this functionality in all the appropriate places.
6. Update the ga4gh-example-data to include the relevant subsets of the GRC references for the 1000G data (but see #312).
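
As a rough illustration of items 3 and 4, here is a minimal sketch of a simulator that produces a reproducible random sequence and exposes a `getBases(start, end)` method returning a string. It is not the actual implementation: the constructor arguments (`seed`, `length`), the `getLength` helper, the zero-based half-open interpretation of `start`/`end`, and the choice to raise `ValueError` on out-of-range requests are all assumptions made for the example.

```python
import random


class ReferenceSimulator(object):
    """Hypothetical simulator: one random sequence, fixed by the seed."""

    def __init__(self, seed=0, length=200):
        # A private Random instance avoids touching the global random
        # state, so the same seed gives the same bases on every run.
        rng = random.Random(seed)
        self._bases = "".join(rng.choice("ACGT") for _ in range(length))

    def getLength(self):
        # Assumed helper, not named in the issue.
        return len(self._bases)

    def getBases(self, start, end):
        """Return the bases in [start, end) as a string.

        Coordinates are assumed to be zero-based and half-open.
        """
        if not 0 <= start <= end <= len(self._bases):
            raise ValueError(
                "invalid range [{}, {}) for sequence of length {}".format(
                    start, end, len(self._bases)))
        return self._bases[start:end]


if __name__ == "__main__":
    sim = ReferenceSimulator(seed=1, length=50)
    print(sim.getBases(0, 10))
```

Corner cases worth covering in the data driven tests include an empty range (`start == end`), a range that ends exactly at the sequence length, and ranges that run past either end.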