Support FASTA files which contain all reads per sample (i.e. not just representative seqs) #197

Open
cduvallet opened this issue Nov 5, 2018 · 0 comments

Contributor

cduvallet commented Nov 5, 2018

Improvement Description
There should be a way to import quality-filtered FASTA files that contain every read per sample (i.e. not just representative sequences), with each read associated with its sample ID, as brought up in our discussion on revamping the import tutorial (PR#358 in docs).

I imagine the trickiest part of this will be figuring out how to demultiplex samples (if they aren't demultiplexed already). It will probably require a specific format for each sequence header, with the sample ID in a fixed position (probably separated from the other info by a delimiter of some sort). This will be annoying/hard because, as far as I know, there isn't really a standard way for these files to be formatted (especially if they were acquired from previously published studies).
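To make the header idea concrete, here is a minimal sketch of pulling a sample ID out of a sequence header. It assumes a hypothetical convention of `>SAMPLEID_READINDEX` followed by optional whitespace-separated info; the function name `sample_id_from_header` and the `_` delimiter are illustrative only, not an existing QIIME 2 API or format.

```python
def sample_id_from_header(header: str, delimiter: str = "_") -> str:
    """Return the sample ID encoded in a FASTA header line.

    Assumed (hypothetical) layout: ">SAMPLEID<delimiter>READINDEX extra info".
    """
    # Only the first whitespace-separated token is treated as the label.
    fields = header.lstrip(">").split()
    if not fields:
        raise ValueError("empty FASTA header")
    label = fields[0]

    # Split on the LAST delimiter so sample IDs may contain the delimiter too.
    sample_id, sep, read_index = label.rpartition(delimiter)
    if not sep or not sample_id or not read_index:
        raise ValueError(f"no sample ID found in header: {header!r}")
    return sample_id


print(sample_id_from_header(">sampleA_12 orig_bc=ACGT"))  # sampleA
print(sample_id_from_header(">gut_day_3_7"))              # gut_day_3
```

Splitting on the last delimiter is a design choice: it tolerates sample IDs that themselves contain underscores, but it would misparse files whose headers put the sample ID somewhere else, which is exactly the lack-of-standard problem described above.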

Proposed Behavior
A potentially easier place to start would be to allow importing demultiplexed FASTA files (i.e. one FASTA file per sample). This is probably sufficient for most people's needs, actually: users would likely need to do some file manipulation and wrangling to make a non-demultiplexed file fit QIIME 2 format specifications anyway, so if they're doing that they might as well split it into separate files. Not sure, up for discussion!
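For reference, the user-side wrangling mentioned above could look roughly like the sketch below, which splits one multiplexed FASTA file into one file per sample. Everything here is an assumption for illustration: the `>SAMPLEID_READINDEX` header convention, the function name `split_fasta_by_sample`, and the `<sample-id>.fasta` output naming are hypothetical and not a QIIME 2 format specification.

```python
from collections import defaultdict
from pathlib import Path


def split_fasta_by_sample(fasta_path, out_dir, delimiter="_"):
    """Write one FASTA file per sample; return {sample_id: output path}.

    Assumes headers look like ">SAMPLEID<delimiter>READINDEX [extra info]".
    Records are held in memory, which is fine for a sketch but not for
    very large files.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    records = defaultdict(list)  # sample ID -> lines belonging to that sample
    sample_id = None
    with open(fasta_path) as handle:
        for raw in handle:
            line = raw.rstrip("\n")
            if not line:
                continue
            if line.startswith(">"):
                fields = line[1:].split()
                if not fields:
                    raise ValueError("empty FASTA header")
                # Split on the last delimiter to recover the sample ID.
                sample_id, sep, _ = fields[0].rpartition(delimiter)
                if not sep or not sample_id:
                    raise ValueError(f"no sample ID in header: {line!r}")
            elif sample_id is None:
                raise ValueError("sequence data found before any header")
            # Header and sequence lines both go to the current sample.
            records[sample_id].append(line)

    paths = {}
    for sid, lines in records.items():
        path = out_dir / f"{sid}.fasta"
        path.write_text("\n".join(lines) + "\n")
        paths[sid] = path
    return paths
```

Calling `split_fasta_by_sample("seqs.fna", "per_sample")` would leave a directory with one file per sample ID, which is the kind of input the proposed importer could then accept.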

References
discussion on revamping the import tutorial (PR#358 in docs)
