This repository was archived by the owner on Nov 5, 2022. It is now read-only.

UnicodeReader misdetects UTF-32LE as UTF-16LE #471

@tayloj

Description

UnicodeReader can't actually detect the UTF-32LE encoding. The constructor examines the first few bytes of the input stream in a long if/else-if chain. The blocks for detecting UTF-16LE and UTF-32LE are:

/* ... */
else if ((bom[0] == (byte) 0xFF) && (bom[1] == (byte) 0xFE)) {
  encoding = "UTF-16LE";
  unread = n - 2;
}
/* ...code for UTF-32BE ... */
else if ((bom[0] == (byte) 0xFF) && (bom[1] == (byte) 0xFE)
    && (bom[2] == (byte) 0x00) && (bom[3] == (byte) 0x00)) {
  encoding = "UTF-32LE";
  unread = n - 4;
} else /* ... */

The condition for the UTF-32LE case:

(bom[0] == (byte) 0xFF) && (bom[1] == (byte) 0xFE)
  && (bom[2] == (byte) 0x00) && (bom[3] == (byte) 0x00)

can never be reached, because any input that satisfies it also satisfies the earlier UTF-16LE condition:

(bom[0] == (byte) 0xFF) && (bom[1] == (byte) 0xFE)

So a stream that begins with a UTF-32LE BOM is always misdetected as UTF-16LE.
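A straightforward fix is to test the longer BOM prefix first, so the four-byte UTF-32LE BOM (FF FE 00 00) is checked before the two-byte UTF-16LE BOM (FF FE). The sketch below is a hypothetical standalone illustration of that ordering, not the actual UnicodeReader code; the class and method names are made up for the example:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;

public class BomDetector {
    // Hypothetical sketch: check the 4-byte UTF-32LE BOM before the
    // 2-byte UTF-16LE BOM, so the longer prefix wins.
    public static String detect(InputStream in) throws IOException {
        PushbackInputStream pb = new PushbackInputStream(in, 4);
        byte[] bom = new byte[4];
        int n = pb.read(bom, 0, 4);

        String encoding;
        int unread;
        if (n >= 4 && bom[0] == (byte) 0xFF && bom[1] == (byte) 0xFE
                && bom[2] == (byte) 0x00 && bom[3] == (byte) 0x00) {
            encoding = "UTF-32LE";   // tested first: longest match
            unread = n - 4;
        } else if (n >= 2 && bom[0] == (byte) 0xFF && bom[1] == (byte) 0xFE) {
            encoding = "UTF-16LE";
            unread = n - 2;
        } else {
            encoding = null;         // no BOM recognized
            unread = n;
        }
        // Push back any bytes that were not part of a BOM.
        if (unread > 0) {
            pb.unread(bom, n - unread, unread);
        }
        return encoding;
    }

    public static void main(String[] args) throws IOException {
        byte[] utf32le = {(byte) 0xFF, (byte) 0xFE, 0x00, 0x00};
        byte[] utf16le = {(byte) 0xFF, (byte) 0xFE, 0x61, 0x00};
        System.out.println(detect(new ByteArrayInputStream(utf32le))); // UTF-32LE
        System.out.println(detect(new ByteArrayInputStream(utf16le))); // UTF-16LE
    }
}
```

Note the residual ambiguity: FF FE 00 00 could in principle also be UTF-16LE text whose first character is U+0000, so longest-prefix-first is the conventional heuristic rather than a perfect oracle.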
