Doing things right, in the name of Sun / Oracle
Switch branches/tags
Nothing to show
Clone or download
Permalink
Type Name Latest commit message Commit time
Failed to load latest commit information.
.settings removed Eclipse compiler settings that enforced Java 1.4 compliance Sep 3, 2017
example/net/pempek/unicode initial commit Dec 22, 2014
src/net/pempek/unicode Add @override annotations Sep 3, 2017
test/net/pempek/unicode initial commit Dec 22, 2014
.classpath initial commit Dec 22, 2014
.gitignore initial commit Dec 22, 2014
.project initial commit Dec 22, 2014
LICENSE initial commit Dec 22, 2014
README.md initial commit Dec 22, 2014
offending_bom.txt initial commit Dec 22, 2014

README.md

UnicodeBOMInputStream

A helper class to skip Unicode BOMs at the beginning of input streams.

I initially released this class as a Stack Overflow answer and it apparently got copy-pasted into several Java projects already. However, code put as answers on Stack Overflow is licensed under CC-BY-SA 3.0 which may not suit everybody.


Why?

Many years have passed since I wrote this class and today Java still doesn't properly deal with UTF-8 Unicode Byte Order Marks (BOMs) at the beginning of data. In 2001, someone opened bug JDK-4508058 with the sound expectation Java should detect and skip UTF-8 BOMs at the beginning of UTF-8 streams, the same way it does for e.g. UTF-16.

Bug JDK-4508058 remained open for a while, then got fixed and ultimately reverted because some other great programmers relied on that exact same bug:

the Java EE 5 RI and SJSAS 9.0 has been relying on detecting a BOM, setting the appropriate encoding, and discarding the BOM bytes before reading the input

See, they're complaining because shipped code breaks if/when JDK behavior changes. And instead of fixing JDK-4508058 and accept this would be an annoyance only for Java EE 5 RI and SJSAS 9.0 users, people in charge at Sun decided we're all living in a better world if JDK-4508058 gets closed as "won't fix". Because fuck you, just skip the BOM yourself.


Usage

Wrap any InputStream with UnicodeBOMInputStream and use getBOM() and/or skipBOM() methods. See UnicodeBOMInputStreamUsage.java.


If you find this library useful and decide to use it in your own projects please drop me a line @gpakosz.

If you use it in a commercial project, consider using Gittip.