support FeatureData[Sequences] (OTU Map) #92

Closed
jairideout opened this issue Dec 19, 2016 · 6 comments

Comments

@jairideout
Member

jairideout commented Dec 19, 2016

Improvement Description
It'd be useful to support FeatureData[Sequences], a type analogous to QIIME 1's "OTU Map". This type/format describes the sequences in each feature (e.g. the sequences that clustered into an OTU).

Comments
We had planned to add this type but deferred until we could come up with a reasonable file format (the QIIME 1 OTU Map format is un-parsable in Python when the lines are too long).

References
This type was requested on the QIIME 2 forum here.

@JTFouquier

The QIIME 2 semantic type page says "FeatureData[Sequence]: A single unaligned sequence associated with a feature identifier (e.g. a representative sequence)." So shouldn't FeatureData[Sequences] be a generic .fasta file (unaligned, multiple seqs)? I need that one in addition to an OTU map :) Let me know if that exists; if not, I'll register both on q2-ghost-tree for now. Thanks!

@jairideout
Member Author

The file format(s) used to implement FeatureData[Sequences] can be anything, as long as the file format(s) can encode this "mapping" of feature IDs to sequence IDs, and the sequences themselves. Your idea of storing the sequences in FASTA format should work -- I think DNAFASTAFormat is what you want. You could design a (second) file format to store the mapping of feature IDs to sequence IDs, and then create your "OTU Map" directory format that is composed of those two file formats (see PairedDNASequencesDirectoryFormat for an example). After that you can register a transformer that converts the "OTU Map" directory format into an appropriate data structure for your plugin.
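To make the two-file idea above concrete, here is a minimal sketch in plain Python. It is not QIIME 2 plugin code and does not use the QIIME 2 API. The two-column TSV layout (one feature ID / sequence ID pair per line) and the names `parse_fasta`, `parse_feature_map`, and `load_otu_map` are all hypothetical, chosen only for illustration. In a real plugin, the two parsers would roughly correspond to the two file formats and `load_otu_map` to the transformer.

```python
import csv
import io
from collections import defaultdict


def parse_fasta(handle):
    """Return {sequence_id: sequence} from FASTA-formatted text."""
    records = {}
    seq_id = None
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first whitespace-delimited token of the header as the ID.
            seq_id = line[1:].split()[0]
            records[seq_id] = ""
        elif seq_id is None:
            raise ValueError("sequence data found before the first FASTA header")
        else:
            records[seq_id] += line
    return records


def parse_feature_map(handle):
    """Return {feature_id: [sequence_id, ...]} from a two-column TSV.

    Hypothetical layout: one (feature ID, sequence ID) pair per line, so no
    line grows with the number of sequences in a feature.
    """
    mapping = defaultdict(list)
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comments
        feature_id, seq_id = row
        mapping[feature_id].append(seq_id)
    return dict(mapping)


def load_otu_map(map_handle, fasta_handle):
    """Combine both files into {feature_id: {sequence_id: sequence}}."""
    sequences = parse_fasta(fasta_handle)
    mapping = parse_feature_map(map_handle)
    referenced = {s for ids in mapping.values() for s in ids}
    missing = referenced - set(sequences)
    if missing:
        raise ValueError(f"IDs in the map but not in the FASTA: {sorted(missing)}")
    return {f: {s: sequences[s] for s in ids} for f, ids in mapping.items()}


# Tiny in-memory example standing in for the two files.
map_text = "otu1\tseqA\notu1\tseqB\notu2\tseqC\n"
fasta_text = ">seqA\nACGT\n>seqB\nACGA\n>seqC\nTTGG\n"
otu_map = load_otu_map(io.StringIO(map_text), io.StringIO(fasta_text))
```

The cross-check in `load_otu_map` is the kind of validation a directory format made of two files would probably need, since each file can be well-formed on its own while the pair is inconsistent.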

Get in touch if you have any issues with this -- the docs are currently very sparse on this topic and we'll be reworking the API to make it easier to create directory formats in the future.

@jairideout
Member Author

forum xref

@thermokarst
Contributor

This recently came up on the forum.

@ebolyen
Member

ebolyen commented Jul 20, 2018

Something I think might work better than FeatureData[Sequences] would be a FeatureData[Features] which would let you describe some hierarchical relationship of features.

In practice this would probably just look like a metadata file which would work with feature-table group. Your reads would be the feature-ids and the OTUs would be the column. This would compose easily with other stuff and you can kind of make it work right now.
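As a rough illustration of the grouping idea, the sketch below collapses a read-level table into an OTU-level table using a read-to-OTU column. It is plain Python and is not the implementation of feature-table group. The function name `group_features`, the nested-dict table layout, and the choice to sum counts are all assumptions made for the example.

```python
from collections import defaultdict


def group_features(table, feature_to_group):
    """Sum per-sample counts of all features that share a group label.

    table: {feature_id: {sample_id: count}}
    feature_to_group: {feature_id: group_label}, i.e. one metadata column
    """
    grouped = defaultdict(lambda: defaultdict(int))
    for feature_id, counts in table.items():
        group = feature_to_group[feature_id]
        for sample_id, count in counts.items():
            grouped[group][sample_id] += count
    # Convert the nested defaultdicts back to plain dicts.
    return {group: dict(counts) for group, counts in grouped.items()}


# Reads are the feature IDs; the OTU each read belongs to is the column.
read_to_otu = {"readA": "otu1", "readB": "otu1", "readC": "otu2"}
read_table = {
    "readA": {"s1": 3, "s2": 0},
    "readB": {"s1": 1, "s2": 4},
    "readC": {"s1": 0, "s2": 2},
}
otu_table = group_features(read_table, read_to_otu)
```

Because the mapping is just a column keyed by feature ID, the same structure could describe other hierarchies of features, not only reads grouped into OTUs.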

@lizgehret
Member

This is the intention of the new FeatureMap type that's been moved from q2-types-genomics. Closing out this issue as developers should figure out how this should be added to individual plugins.
