Unsupported TNT file #50

Closed
saramortara opened this issue Mar 28, 2021 · 5 comments · Fixed by #51

Comments

@saramortara

Dear Martin,

I'm trying to read the TNT matrix from Mirande 2008 (Appendix S5 file characidae.tnt) using ReadTntCharacters() and I'm getting the following error:

Error in toupper (lines): invalid multibyte string 3842
In addition: Warning messages:
1: In grep ("'", lines, fixed = TRUE):
   input string 3857 is invalid at that locale
2: In grep (";", lines, fixed = TRUE):
   input string 3842 is invalid at that locale

Do you have any idea of what is happening? Can it be a problem of encoding? Is there a way to control it?

Best,

Sara

@ms609
Owner

ms609 commented Mar 29, 2021

Hi Sara, thanks for this report, and sorry that you're having trouble opening the file. I've made some small changes to the ReadTntCharacters() function and can now open the Mirande file successfully on my local installation, though I didn't encounter the encoding issue.

Could you try installing the development version of 'TreeTools' with devtools::install_github('ms609/TreeTools') and see whether that brings you success? If not, could you let me know the value reported when you type getOption('encoding') into the R console, and if native.enc, the value of Sys.getlocale("LC_CTYPE")?
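As a rough sketch of the diagnostics requested above, the snippet below prints the two settings and then locates lines that are not valid UTF-8. The temporary file and its contents are hypothetical stand-ins for characidae.tnt, and the Latin-1 byte is only an assumption about what kind of character might trigger the error.

```r
# Settings asked about in the comment above
print(getOption("encoding"))
print(Sys.getlocale("LC_CTYPE"))

# Hypothetical stand-in for the problem file: its second line
# contains a single Latin-1 byte (0xE9), which is not valid UTF-8
path <- tempfile(fileext = ".tnt")
writeBin(
  c(charToRaw("xread\nTaxon_"), as.raw(0xE9), charToRaw(" 0101\n;\n")),
  path
)

# Read the raw lines, then report which ones are not valid UTF-8
lines <- readLines(path, warn = FALSE)
bad <- which(!validUTF8(lines))
print(bad)
```

Running the same `which(!validUTF8(lines))` check on the real file should show whether the line numbers in the error message (3842, 3857) correspond to non-UTF-8 bytes.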

Thanks,

Martin

@saramortara
Author

saramortara commented Mar 29, 2021 via email

@ms609
Owner

ms609 commented Mar 29, 2021

Hi Sara,

It's strange that R is failing to autodetect the encoding of the file (which I suspect is the issue; I hate grappling with encoding errors!)

One thing to try would be to modify the encoding of the Mirande file, for example by opening with Notepad++ and using the "Encoding→UTF-8" menu option. (RStudio can do the same thing.)
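The same conversion can probably be done from within R. This is a minimal sketch, assuming the file is Latin-1 encoded (the thread does not confirm the actual encoding); the temporary file is a hypothetical stand-in for characidae.tnt.

```r
# Hypothetical stand-in for the TNT file, written with one Latin-1 byte
path <- tempfile(fileext = ".tnt")
writeBin(
  c(charToRaw("Taxon_"), as.raw(0xE9), charToRaw(" 0101;\n")),
  path
)

# Read the lines as-is, then convert from the assumed source
# encoding (Latin-1) to UTF-8
lines <- readLines(path, warn = FALSE)
lines_utf8 <- iconv(lines, from = "latin1", to = "UTF-8")

# Every line should now be valid UTF-8, so toupper() no longer
# fails with "invalid multibyte string"
stopifnot(all(validUTF8(lines_utf8)))
upper <- toupper(lines_utf8)

# Write a UTF-8 copy that can be passed to the reading function
utf8_path <- tempfile(fileext = ".tnt")
writeLines(lines_utf8, utf8_path, useBytes = TRUE)
```

If the source encoding is something other than Latin-1, the `from` argument would need to change accordingly; `iconvlist()` lists the names available on a given system.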

Failing that, there are a few other changes to the code that I can experiment with; of course, it's tricky to debug issues that can't be reproduced locally!

Let me know how you get on,

Martin

@saramortara
Author

saramortara commented Mar 29, 2021 via email

ms609 pushed a commit that referenced this issue Mar 30, 2021
@ms609
Owner

ms609 commented Mar 30, 2021

Thanks for the suggestion. readLines() already does its best at guessing the encoding (successfully in my case; I wonder why it's failing on your system) – perhaps 'readr' has a more sophisticated protocol?

In the first instance I've updated the function documentation; if you get the chance, perhaps you could check that the "Details" text makes sense and the new example works on your machine as well as mine?
Thanks,
Martin

ms609 pushed a commit that referenced this issue Apr 12, 2021
@ms609 ms609 closed this as completed in #51 Apr 12, 2021
ms609 added a commit that referenced this issue Apr 12, 2021
* Prepare 1.4.3 release

Fixes #50

* covr on Win only

* R 4.0.4; rm revdep checks

* 'phangorn' requires R3.6.0

* www.

* Restore `UnshiftTree()`, which retains names.

* ape built on R3.6.3

- On CRAN.