jeucreader


com.io7m.jeucreader

JVM | Platform | Status
OpenJDK (Temurin) Current | Linux | Build (OpenJDK (Temurin) Current, Linux)
OpenJDK (Temurin) LTS | Linux | Build (OpenJDK (Temurin) LTS, Linux)
OpenJDK (Temurin) Current | Windows | Build (OpenJDK (Temurin) Current, Windows)
OpenJDK (Temurin) LTS | Windows | Build (OpenJDK (Temurin) LTS, Windows)

jeucreader

The jeucreader package provides an interface for reading Unicode codepoints one at a time.

Features

  • Unicode codepoint reader interface.
  • High-coverage test suite.
  • Written in pure Java 17 with no dependencies.
  • OSGi-ready.
  • JPMS-ready.
  • ISC license.

Motivation

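The Java standard library does not expose any interface for reading individual Unicode codepoints from an I/O stream: java.io.Reader yields UTF-16 code units (char values), not codepoints. It does provide methods to, for example, read text into a String and then iterate over the codepoints of that String, but this requires buffering the text before any codepoint can be examined.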

The jeucreader package attempts to provide this missing functionality.
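For comparison, the String-based workaround mentioned above might look like the following. This is a minimal sketch using only the JDK; the class and method names (CodePointsViaString, readAllCodePoints) are made up for illustration and are not part of jeucreader.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

public final class CodePointsViaString {

  /**
   * Drains the reader into memory, then returns every codepoint of the
   * buffered text. Nothing is returned until the whole input has been read.
   */
  public static int[] readAllCodePoints(Reader reader) throws IOException {
    var text = new StringBuilder();
    var buffer = new char[1024];
    int n;
    while ((n = reader.read(buffer)) != -1) {
      text.append(buffer, 0, n);
    }
    // CharSequence.codePoints() combines valid surrogate pairs into one value.
    return text.codePoints().toArray();
  }

  public static void main(String[] args) throws IOException {
    // 'a', U+1F600 (written as a surrogate pair), 'b'
    var r = new StringReader("a\uD83D\uDE00b");
    for (int cp : readAllCodePoints(r)) {
      System.out.printf("U+%04X%n", cp);
    }
    // prints U+0061, U+1F600, U+0062 on separate lines
  }
}
```

Note that codePoints() passes unpaired surrogates through as-is rather than reporting them, so this approach also gives no signal when the input is malformed.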

Usage

Given a java.io.Reader r, instantiate a UnicodeCharacterReaderType and use it to read individual codepoints:

Reader r;

try (var u = UnicodeCharacterReader.newReader(r)) {
  int c0 = u.readCodePoint();
  int c1 = u.readCodePoint();
  int c2 = u.readCodePoint();
  ...
}

When it encounters malformed text, the reader may raise subtypes of IOException such as InvalidSurrogatePair, MissingLowSurrogate, and OrphanLowSurrogate.
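To illustrate the kinds of malformed input that such exceptions suggest, here is a rough JDK-only sketch of decoding one codepoint at a time from a Reader. It is not jeucreader's implementation: the class name SurrogateDecodingSketch is hypothetical, plain IOException stands in for the library's specific subtypes, and returning -1 at end of stream is an assumption borrowed from java.io.Reader.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

public final class SurrogateDecodingSketch {

  /**
   * Reads one codepoint from a Reader of UTF-16 code units.
   * Returns -1 at end of stream (an assumption of this sketch).
   */
  public static int readCodePoint(Reader reader) throws IOException {
    int first = reader.read();
    if (first == -1) {
      return -1;
    }
    char c0 = (char) first;
    if (Character.isLowSurrogate(c0)) {
      // A low surrogate with no high surrogate before it.
      throw new IOException(String.format("Orphan low surrogate U+%04X", first));
    }
    if (!Character.isHighSurrogate(c0)) {
      return first; // Ordinary BMP character: one code unit, one codepoint.
    }
    int second = reader.read();
    if (second == -1) {
      // The stream ended in the middle of a surrogate pair.
      throw new IOException(String.format("Missing low surrogate after U+%04X", first));
    }
    char c1 = (char) second;
    if (!Character.isLowSurrogate(c1)) {
      // A high surrogate followed by something other than a low surrogate.
      throw new IOException(String.format("Invalid surrogate pair U+%04X U+%04X", first, second));
    }
    return Character.toCodePoint(c0, c1);
  }

  public static void main(String[] args) {
    // Two valid characters followed by a lone low surrogate.
    try (var r = new StringReader("ok\uDE00")) {
      int cp;
      while ((cp = readCodePoint(r)) != -1) {
        System.out.printf("U+%04X%n", cp);
      }
    } catch (IOException e) {
      System.out.println("Malformed input: " + e.getMessage());
    }
  }
}
```

With jeucreader itself, the same try/catch shape would presumably apply around readCodePoint(), catching either IOException or the more specific subtypes named above.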