go-utf8n

Package utf8n implements functions and constants to support normalizing text encoded in UTF-8.

This package is similar to Go's standard "unicode/utf8" package, except that it also normalizes ‘line separator’ and ‘paragraph separator’ characters.

Specifically, it transforms:

	CR LF ⇒ LS

	LF    ⇒ LS

	CR    ⇒ LS

	NEL   ⇒ LS
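The first step can be sketched as a single pass over the runes. This is a minimal illustration, not this package's API: the helper `normalizeLineBreaks` and the lowercase constants are hypothetical names. The sketch assumes CR LF must be checked before a lone CR, so that the pair becomes one LS rather than two.

```go
package main

import (
	"fmt"
	"strings"
)

// Hypothetical rune constants used only by this sketch.
const (
	lf  = '\u000A' // line feed
	cr  = '\u000D' // carriage return
	nel = '\u0085' // next line
	ls  = '\u2028' // line separator
)

// normalizeLineBreaks (hypothetical) maps CR LF, LF, CR, and NEL to LS.
// Note: converting to []rune replaces invalid UTF-8 bytes with U+FFFD.
func normalizeLineBreaks(s string) string {
	runes := []rune(s)
	var b strings.Builder
	for i := 0; i < len(runes); i++ {
		switch r := runes[i]; r {
		case cr:
			// Swallow a directly following LF so CR LF yields a single LS.
			if i+1 < len(runes) && runes[i+1] == lf {
				i++
			}
			b.WriteRune(ls)
		case lf, nel:
			b.WriteRune(ls)
		default:
			b.WriteRune(r)
		}
	}
	return b.String()
}

func main() {
	fmt.Printf("%q\n", normalizeLineBreaks("a\r\nb\nc\rd\u0085e"))
	// prints "a\u2028b\u2028c\u2028d\u2028e"
}
```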

Then, (conceptually) after doing that, it transforms:

	LS LS ⇒ PS
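The second step can be sketched the same way. The helper name `collapseSeparators` is hypothetical, and how a run of three or more LS is handled is an assumption here: this sketch pairs them greedily from left to right, so LS LS LS becomes PS LS.

```go
package main

import (
	"fmt"
	"strings"
)

const (
	ls = '\u2028' // line separator
	ps = '\u2029' // paragraph separator
)

// collapseSeparators (hypothetical) replaces each adjacent LS LS pair
// with a single PS. Pairs are matched greedily, left to right.
func collapseSeparators(s string) string {
	runes := []rune(s)
	var b strings.Builder
	for i := 0; i < len(runes); i++ {
		if runes[i] == ls && i+1 < len(runes) && runes[i+1] == ls {
			b.WriteRune(ps)
			i++ // skip the second LS of the pair
			continue
		}
		b.WriteRune(runes[i])
	}
	return b.String()
}

func main() {
	fmt.Printf("%q\n", collapseSeparators("a\u2028\u2028b\u2028c"))
	// prints "a\u2029b\u2028c"
}
```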

The meanings of LF, CR, NEL, LS, and PS are:

	LF  = “line feed”            = U+000A = '\u000A' = '\n'

	CR  = “carriage return”      = U+000D = '\u000D' = '\r'

	NEL = “next line”            = U+0085 = '\u0085'

	LS  = “line separator”       = U+2028 = '\u2028'

	PS  = “paragraph separator”  = U+2029 = '\u2029'

The result of these transformations is that:

№1: ‘line separator’ and ‘paragraph separator’ characters are always represented by a single rune,

№2: ‘line separator’ and ‘paragraph separator’ characters are always represented by the same runes (LS and PS, respectively).
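Both results can be seen by running four different spellings of a blank line through the two steps. This sketch swaps in a different technique from a rune loop: it uses the standard library's `strings.NewReplacer` and `strings.ReplaceAll`, and the helper `normalize` is hypothetical, not this package's API. Each input ends up as the same single PS rune.

```go
package main

import (
	"fmt"
	"strings"
)

// lineBreaks maps every line-break form to LS (U+2028).
// "\r\n" is listed before "\r": a strings.Replacer compares old strings
// in argument order, so CR LF becomes one LS rather than two.
var lineBreaks = strings.NewReplacer(
	"\r\n", "\u2028",
	"\n", "\u2028",
	"\r", "\u2028",
	"\u0085", "\u2028",
)

// normalize (hypothetical) applies both steps.
func normalize(s string) string {
	s = lineBreaks.Replace(s)
	// ReplaceAll is non-overlapping and left to right: LS LS LS => PS LS.
	return strings.ReplaceAll(s, "\u2028\u2028", "\u2029")
}

func main() {
	inputs := []string{"a\r\n\r\nb", "a\n\nb", "a\r\rb", "a\u0085\u0085b"}
	for _, in := range inputs {
		// Every line prints ... => "a\u2029b".
		fmt.Printf("%q => %q\n", in, normalize(in))
	}
}
```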

Documentation

Online documentation, which includes examples, can be found at: http://godoc.org/github.com/reiver/go-utf8n
