Warn if bgzf_getline() returned apparently UTF-16-encoded text
Text files badly transferred from Windows may occasionally be UTF-16-encoded, and this may not be easily noticed by the user. HTSlib should not accept such encoding (as other tools surely don't, hence doing so would cause interoperability problems), but it should ideally emit a warning or error message identifying the problem.

Reading text from a htsFile/samFile/vcfFile will already have failed with EFTYPE/ENOEXEC if the text file is UTF-16-encoded, as the encoding will not have been recognised by hts_detect_format(). OTOH bgzf_getline() will return a UTF-16-encoded text line.

Add a suitable context-dependent diagnostic to the BGZF-based bgzf_getline() calls in HTSlib: in hts_readlist()/hts_readlines(), emit a warning (once, on the first line); in tbx.c, emit a more specific error message if get_intv() parsing failure is due to UTF-16 encoding.

[TODO] If utf16_text_format were added to htsFormatCategory, the new is_utf16_text() function is suitable for detecting it.
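The kind of check described above can be illustrated with a small heuristic. This is a minimal sketch only, not the is_utf16_text() added by this commit, whose implementation is not shown here. The name looks_like_utf16() and the 8-byte sampling window are assumptions made for illustration. The idea is that mostly-ASCII text encoded as UTF-16 has a NUL in every other byte, optionally preceded by a byte order mark.

```c
#include <stddef.h>

/* Hypothetical helper for illustration; NOT HTSlib's is_utf16_text().
 * Returns 1 if the buffer looks like UTF-16-encoded ASCII-range text,
 * 0 otherwise. Non-ASCII UTF-16 text may go undetected by this sketch. */
static int looks_like_utf16(const unsigned char *s, size_t len)
{
    size_t i, n;
    int le = 1, be = 1;

    /* Skip a byte order mark (FF FE = little-endian, FE FF = big-endian). */
    if (len >= 2 && ((s[0] == 0xFF && s[1] == 0xFE) ||
                     (s[0] == 0xFE && s[1] == 0xFF))) {
        s += 2;
        len -= 2;
    }

    /* Require at least two 16-bit code units before deciding. */
    if (len < 4) return 0;

    /* Sample at most the first 8 bytes (assumed window), in whole pairs. */
    n = (len < 8 ? len : 8) & ~(size_t)1;

    for (i = 0; i + 1 < n; i += 2) {
        /* UTF-16LE ASCII: non-NUL low byte followed by NUL high byte. */
        if (!(s[i] != 0 && s[i + 1] == 0)) le = 0;
        /* UTF-16BE ASCII: NUL high byte followed by non-NUL low byte. */
        if (!(s[i] == 0 && s[i + 1] != 0)) be = 0;
    }
    return le || be;
}
```

A caller could apply such a check to the bytes of the first line returned by bgzf_getline() (the kstring_t's s and l fields) and emit a one-time warning when it returns 1. Note that because bgzf_getline() splits on the single byte '\n', a UTF-16LE line arrives without the trailing NUL of its newline code unit, which is why the sketch only inspects whole byte pairs.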
Showing 3 changed files with 39 additions and 2 deletions.