Issues with HeaderlessTSVTaxonomyFormat #127
@JTFouquier we intentionally didn't create a transformer to turn files with headers into headerless files, in order to discourage using/generating these types of files (it's generally considered a bad practice in data science). Thus, we support importing headerless taxonomy files in order to be compatible with popular reference databases (Greengenes, for example, doesn't have headers in its taxonomy files), but we don't support exporting into a headerless format. The feature classifier tutorial has an example of importing a Greengenes (headerless) file into an artifact. Internally, the artifact ( If you'd like to get your taxonomy file back into a headerless format, you can export the data from your Does that solve your issue? I might be misunderstanding what you're trying to do -- if you could provide more details about your use-case in q2-ghost-tree that would be helpful.
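For readers who want the headerless file back after exporting, here is a minimal sketch in plain Python. It assumes only that the exported taxonomy is a tab-separated text file whose first line is a header row; the function name `strip_header` is hypothetical and is not part of QIIME 2 or q2-types.

```python
def strip_header(tsv_text: str) -> str:
    """Return the TSV text without its first line (assumed to be the header row)."""
    lines = tsv_text.splitlines(keepends=True)
    # Everything after the first line is data; an empty input stays empty.
    return "".join(lines[1:])


if __name__ == "__main__":
    # Hypothetical two-column taxonomy content, used only for illustration.
    exported = "Feature ID\tTaxon\nseq1\tk__Fungi; g__Mortierella\n"
    print(strip_header(exported), end="")
```

This operates on the exported text only; it does not change what is stored inside the artifact.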
Thanks @jairideout. Yes, the UNITE ITS database's taxonomy files are headerless, so I originally selected the This command I did not realize that my .qza was not actually in And when I used I understand what's going on and why you guys chose to convert to a Thanks!
@JTFouquier that misunderstanding makes complete sense. I'm not sure we'd want to alert/say anything when it happens, as it's typically going to be the rule that the data is converted (and I don't want the user to mistake the alert for something they need to worry about). It's only in the case where you happen to already have it in the canonical format that it will leave it alone. We definitely need some better education/docs on this feature. We're still kind of feeling out how we teach these ideas.
Yeah, I understand... a warning would only be useful for developers and would be confusing for users. One more thing related to this: ghost-tree has stricter requirements for the taxonomy file than a typical taxonomy file. Would I want to just use the existing checks in ghost-tree itself or do I want to proceed with my
@JTFouquier you may need a custom semantic type (potentially). Could you remind us of the restrictions that ghost-tree has on the format/data?
It has to contain genera, so 'g__'. I will be adding a feature soon to use other graft points (as an option), so 'f__' or 'o__' etc., so maybe it wouldn't make sense to have a special semantic type. I would like to not have to require the 'g__'-like format, but I don't always trust taxonomy files without that designation, since taxonomy varies.
Thinking about this a bit, I think you should probably organize it such that you take a given taxonomic level (a number, as in q2-feature-classifier); then you aren't tied to the prefix, and it is the user's responsibility to identify at what level they would like the engraftment. Barring that, you'll need some kind of validation to assert that the format has the prefixes you expect, because our format and semantic type do not have an opinion (this is also why the level instead of the prefix is probably a better idea). If you were to keep the prefix you might want to require a property on your type, e.g.:
Seeing as you expect a specific ontology for the taxonomy (the Greengenes ontology). In short all
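To make the level-versus-prefix suggestion concrete, here is a minimal sketch in plain Python. It assumes lineages are semicolon-delimited strings such as `k__Fungi; p__Ascomycota; ...` and that levels are 1-based; the helper names `taxon_at_level` and `has_prefix_at_level` are hypothetical and do not come from q2-feature-classifier, q2-types, or ghost-tree.

```python
from typing import Optional


def taxon_at_level(lineage: str, level: int, sep: str = ";") -> Optional[str]:
    """Return the name at the 1-based `level`, or None if the lineage is shallower."""
    if level < 1:
        raise ValueError("level must be >= 1")
    ranks = [rank.strip() for rank in lineage.split(sep)]
    if level > len(ranks):
        return None
    # An empty field (e.g. a trailing separator) is treated as missing.
    return ranks[level - 1] or None


def has_prefix_at_level(lineage: str, level: int, prefix: str) -> bool:
    """Validation for the prefix-based alternative: is `prefix` present at `level`?"""
    name = taxon_at_level(lineage, level)
    return name is not None and name.startswith(prefix)


if __name__ == "__main__":
    lineage = "k__Fungi; p__Ascomycota; c__X; o__Y; f__Z; g__Candida; s__albicans"
    print(taxon_at_level(lineage, 6))
    print(has_prefix_at_level(lineage, 6, "g__"))
```

With the level-based helper the user picks the graft point by number, so the code never depends on a particular prefix convention; the prefix check is only needed if the prefix requirement is kept.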
Closing as it looks like the original questions about
@ebolyen, coming back to this, can you please point me to where in q2-feature-classifier is a good example of how to use the taxonomic level instead of the prefix? I'm not really sure how to apply that. I have something like
q2-taxa's
@thermokarst I think the issue I was having with the HeaderlessTSVTaxonomyFormat was possibly related to the wrong base class being used? I'm not sure but it is obviously not like the rest.
https://github.com/qiime2/q2-types/blob/master/q2_types/feature_data/_format.py#L69
I noticed this because I was getting this error and I didn't know why it was trying to do a transformation to a HeaderlessTSVTaxonomyFormat when I am positive I already gave it a HeaderlessTSVTaxonomyFormat.
Well, it's also not defined yet, so I'll just make my own for now. Thanks for all your help!