non-ascii character in PR2 5.0 #37

Closed
frederic-mahe opened this issue Apr 14, 2023 · 3 comments

Comments

@frederic-mahe

Hello,

in the entry MF423350 (Heterocapsa steinii), the 'space' in the species name is not an ASCII space. It is a non-breaking space (U+00A0), encoded in UTF-8 as the two-byte sequence $C2A0, where it should be the simple one-byte ASCII value $20. That causes an error when processing this release with cutadapt:

>MF423350.1.1769_U;tax=k:Eukaryota,d:TSAR,p:Alveolata-Dinoflagellata,c:Dinophyceae,o:Peridiniales,f:Heterocapsaceae,g:Heterocapsa,s:Heterocapsa steinii
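A minimal Python sketch of the encoding difference described above: U+00A0 (NO-BREAK SPACE) becomes the two bytes C2 A0 in UTF-8, while an ASCII space is the single byte 20. The helper `normalize_header` is hypothetical, not part of PR2 or cutadapt; it simply swaps non-breaking spaces for ASCII spaces.

```python
# Hypothetical helper for illustration; not part of PR2 or cutadapt.
NBSP = "\u00a0"  # NO-BREAK SPACE, encoded in UTF-8 as the bytes C2 A0


def normalize_header(header: str) -> str:
    """Replace every non-breaking space with a plain ASCII space (0x20)."""
    return header.replace(NBSP, " ")


bad = "s:Heterocapsa\u00a0steinii"
good = normalize_header(bad)

print(NBSP.encode("utf-8").hex())  # c2a0: two bytes in UTF-8
print(" ".encode("utf-8").hex())   # 20: one byte, plain ASCII
print(good.isascii())              # True
```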

It appears to be the only non-ASCII character in that release:

zgrep --color='auto' -P -n '[^\x00-\x7F]' pr2_version_5.0.0_SSU_UTAX.fasta.gz
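For environments without `zgrep`, the same check can be sketched in Python, assuming the release is a gzip-compressed text file. It reports the 1-based line number of every line containing a byte outside the 7-bit ASCII range, like `grep -n`. The function names are hypothetical, chosen for illustration only.

```python
import gzip
import re
import sys

# Any byte outside the 7-bit ASCII range, like grep -P '[^\x00-\x7F]'.
NON_ASCII = re.compile(rb"[^\x00-\x7f]")


def find_non_ascii_lines(lines):
    """Yield (line_number, text) for each bytes line containing a non-ASCII byte.

    Line numbers start at 1, as with grep -n. The text is decoded as UTF-8
    (undecodable bytes become U+FFFD) so that it can be printed.
    """
    for number, raw in enumerate(lines, start=1):
        if NON_ASCII.search(raw):
            yield number, raw.decode("utf-8", errors="replace").rstrip("\r\n")


def scan_gzip(path):
    """Scan a gzip-compressed text file, such as a FASTA release."""
    with gzip.open(path, "rb") as handle:  # binary mode: inspect raw bytes
        yield from find_non_ascii_lines(handle)


if __name__ == "__main__" and len(sys.argv) > 1:
    for number, text in scan_gzip(sys.argv[1]):
        print(f"{number}:{text}")
```

Matching raw bytes rather than decoded characters means the scan also flags bytes that are not valid UTF-8, not just well-formed non-ASCII characters.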
@vaulot
Collaborator

vaulot commented Apr 15, 2023

Thanks, Fred.

Super good catch... I will upload the updated files in a bit.

Cheers. Daniel

@vaulot
Collaborator

vaulot commented Apr 15, 2023

Update done...

vaulot closed this as completed Apr 15, 2023

@frederic-mahe
Author

Thanks Daniel! I am happy to help, though I am sorry I did not catch that before the release.
