Handle whitespace in TSVTaxonomyFormat -> Metadata transformer #179

Closed
thermokarst opened this issue Apr 30, 2018 · 7 comments

Labels: bug-sev:3|medium; bug-type:2|workaround-req (Progress can't be made in the usual way.); diff:1|beginner (Only limited knowledge of the languages and platform is required.); good first issue (Good for newcomers); help wanted (Extra attention is needed); lang:python (Python 3); scope:0|this-project (No other repositories are impacted.); src:forum (From the QIIME 2 Forum.); time:0|unknown (No estimate yet made.); type:bug (Something is wrong.)

Comments

@thermokarst
Contributor

The relevant transformer: https://github.com/qiime2/q2-types/blob/master/q2_types/feature_data/_transformer.py#L192-L194

TSVTaxonomyFormat might include leading or trailing whitespace characters. The Metadata constructor doesn't strip whitespace, so this can lead to the following:

...
    metadata_column = self._metadata_column_factory(series)
  File "/mnt/research/germs/softwares/miniconda2/envs/qiime2-2018.2/lib/python3.5/site-packages/qiime2/metadata/metadata.py", line 230, in _metadata_column_factory
    column = CategoricalMetadataColumn(series)
  File "/mnt/research/germs/softwares/miniconda2/envs/qiime2-2018.2/lib/python3.5/site-packages/qiime2/metadata/metadata.py", line 632, in __init__
    self._series = self.normalize(series)
  File "/mnt/research/germs/softwares/miniconda2/envs/qiime2-2018.2/lib/python3.5/site-packages/qiime2/metadata/metadata.py", line 758, in normalize
    return series.apply(normalize, convert_dtype=False)
  File "/mnt/research/germs/softwares/miniconda2/envs/qiime2-2018.2/lib/python3.5/site-packages/pandas/core/series.py", line 2551, in apply
    mapped = lib.map_infer(values, f, convert=convert_dtype)
  File "pandas/_libs/src/inference.pyx", line 1521, in pandas._libs.lib.map_infer
  File "/mnt/research/germs/softwares/miniconda2/envs/qiime2-2018.2/lib/python3.5/site-packages/qiime2/metadata/metadata.py", line 748, in normalize
    "whitespace characters: %r" % (cls.__name__, value))
ValueError: CategoricalMetadataColumn does not support strings with leading or trailing whitespace characters: 'D_0__Bacteria;D_1__Chloroflexi;D_2__Gitt-GS-136;D_3__uncultured bacterium '

This recently came up on the forum: https://forum.qiime2.org/t/qiime-taxa-filter-table-error/3947
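
For anyone trying to reproduce this without a full QIIME 2 environment, here is a minimal sketch of the kind of check that appears to raise the error, based only on the message in the traceback above. The function name `check_no_surrounding_whitespace` and its `column_type` parameter are hypothetical; this is not the qiime2 implementation.

```python
def check_no_surrounding_whitespace(value, column_type="CategoricalMetadataColumn"):
    """Hypothetical stand-in for the validation seen in the traceback.

    Strings that differ from their stripped form are rejected; everything
    else is returned unchanged.
    """
    if isinstance(value, str) and value != value.strip():
        raise ValueError(
            "%s does not support strings with leading or trailing "
            "whitespace characters: %r" % (column_type, value))
    return value


# A taxonomy string with a trailing space, like the one reported above.
try:
    check_no_surrounding_whitespace("D_0__Bacteria;D_3__uncultured bacterium ")
except ValueError as err:
    print(err)
```

The point of the sketch is that a single trailing space in any one value is enough to reject the whole column.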

@d4straub

This has already happened to me twice. I continued my analysis by modifying the taxonomy.qza by hand, which is far from ideal. In my case, this bug prevents reproducible and fluid analysis.

@d4straub

d4straub commented Nov 16, 2018

I am pretty sure that this issue is related to the occurrence of # in the SILVA 132 taxonomy files. For example, # appears 74 times in /SILVA_132_QIIME_release/taxonomy/16S_only/99/taxonomy_7_levels.txt, and <whitespace># appears 61 times, which might lead to around 30 taxa with trailing whitespace when the taxonomy is truncated.
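
A rough way to check which entries in a taxonomy file are affected is sketched below. It assumes a headerless, tab-separated layout with the feature ID in the first column and the taxonomy string in the second; the function name `find_padded_taxa` and the example rows are made up for illustration.

```python
import io


def find_padded_taxa(handle):
    """Return (line_number, taxon) pairs whose taxon has surrounding whitespace.

    Assumes tab-separated lines: feature ID, then taxonomy string.
    """
    padded = []
    for line_number, line in enumerate(handle, start=1):
        # Drop only the newline so trailing spaces in the taxon survive.
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        taxon = fields[1]
        if taxon != taxon.strip():
            padded.append((line_number, taxon))
    return padded


# Made-up rows; the second taxon ends with a space.
example = io.StringIO(
    "seq1\tD_0__Bacteria;D_1__Chloroflexi\n"
    "seq2\tD_0__Bacteria;D_3__uncultured bacterium \n"
)
print(find_padded_taxa(example))
```

With a real file, passing an open file handle instead of the `io.StringIO` object should work the same way.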

Is this issue duplicated in #174 ?

@thermokarst
Contributor Author

Hey there @d4straub!

> This bug prohibits reproducible and fluid analysis in my case.

Sorry to hear that, PRs are welcome!

> I am pretty sure that this issue is related to the occurrence of # in the SILVA 132 taxonomy files

Sure, but that is probably a more specific case: the problem is that the Metadata constructor doesn't allow any leading or trailing whitespace on values. This applies to any form of Sample or Feature metadata, not just the taxonomy formats. I was on the fence about opening this issue here in this repo instead of on the framework, but decided to do it here since this taxonomy format issue was a concrete example.

> Is this issue duplicated in #174 ?

Not quite --- #174 has to do with the taxonomy IDs, specifically, and how they are consumed in various downstream steps (not necessarily anything to do with qiime2.Metadata --- I think #174 is actually a problem with blast exploding when it has whitespace in IDs).

This issue is referring to the taxonomy values themselves.

Both issues can probably be fixed at the same time by amending a few transformers:

It might also be possible to fix in the framework, since this is a general problem with the MetadataColumn constructor.
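
As a rough illustration of the transformer-side approach, the sketch below strips whitespace from every field while parsing the rows, so that both IDs and taxonomy values are clean before anything is handed to a Metadata constructor. The helper name `read_taxonomy_stripped` and the `Feature ID` / `Taxon` header are assumptions for the example; this is not the q2-types transformer, which would more likely apply the same stripping to a pandas DataFrame.

```python
import csv
import io


def read_taxonomy_stripped(handle):
    """Parse tab-separated taxonomy rows, stripping whitespace from every field.

    Hypothetical helper. Stripping the ID column as well as the taxonomy
    column covers stray whitespace in both places.
    """
    # QUOTE_NONE keeps any quote characters in the data as literal text.
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    return [[field.strip() for field in row] for row in reader if row]


# Made-up input with a padded ID and a padded taxonomy value.
raw = io.StringIO(
    "Feature ID\tTaxon\n"
    "seq1 \tD_0__Bacteria;D_1__Chloroflexi\n"
    "seq2\tD_0__Bacteria;D_3__uncultured bacterium \n"
)
rows = read_taxonomy_stripped(raw)
for row in rows:
    print(row)
```

Stripping at parse time keeps the cleanup in one place, rather than requiring every downstream consumer to handle padded values.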

If this is something you are interested in addressing we would love that, and would be happy to provide some guidance, if necessary. Thanks!

@MelissaUribe

Hello!
I am facing the same problem, and all the related threads on the QIIME 2 Forum are already closed. I apologize in advance if this is not the right place for my specific question; if so, I hope you can point me to where it would be appropriate:

After I got this error, I searched and found this possible solution: https://forum.qiime2.org/t/qiime-taxa-filter-table-error/3947/5

I don't fully understand this export function. Do you need to create a folder for the contents of taxonomy.qza to be exported to? In any case, when I try to run it I get this error message:

Error: no such option: --output-dir

I thought creating a folder called taxonomy-with-spaces would solve it, but I guess I simply don't understand this export function. I tried qiime tools export --help and saw that the option listed there is --output-path instead of --output-dir, but I just get the same error. Could you help me out, please?

Best regards!

@thermokarst
Contributor Author

Hi @DevsLilSis --- can you please post this question to the QIIME 2 Forum? You can open a new post there and provide links to the other (closed) posts you found. Thanks!

@ebolyen
Member

ebolyen commented Jul 9, 2019

It does seem like we should probably bump up the priority on this issue. I'll add it to the project backlog for now. It seems simple™.

@ebolyen ebolyen added this to Backlog (Automated) in 2019.7 via automation Jul 9, 2019
@ebolyen ebolyen added bug-sev:3|medium bug-type:2|workaround-req Progress can't be made in the usual way. diff:1|beginner Only limited knowledge of the languages and platform is required. good first issue Good for newcomers help wanted Extra attention is needed lang:python Python 3 scope:0|this-project No other repositories are impacted. src:forum From the QIIME 2 Forum. time:0|unknown No estimate yet made. type:bug Something is wrong. labels Jul 9, 2019
@thermokarst thermokarst removed this from Backlog (Automated) in 2019.7 Aug 2, 2019
@thermokarst thermokarst added this to In Progress - Supplanted Issues in 2019.10 Aug 2, 2019
@thermokarst
Contributor Author

Fixed in #219

2019.10 automation moved this from In Progress - Supplanted Issues to Changelog Needed Aug 8, 2019
@thermokarst thermokarst moved this from Changelog Needed to Completed in 2019.10 Oct 31, 2019