Determine the Encoding of a HTML Byte Stream

This package implements the HTML Standard's encoding sniffing algorithm in all its glory. The most interesting part of this is how it pre-scans the first 1024 bytes in order to search for certain <meta charset>-related patterns.

const htmlEncodingSniffer = require("html-encoding-sniffer");
const fs = require("fs");

const htmlBuffer = fs.readFileSync("./html-page.html");
const sniffedEncoding = htmlEncodingSniffer(htmlBuffer);

The returned value will be a canonical encoding name (not a label). You might then combine this with the whatwg-encoding package to decode the result:

const whatwgEncoding = require("whatwg-encoding");
const htmlString = whatwgEncoding.decode(htmlBuffer, sniffedEncoding);

Options

You can pass two potential options to htmlEncodingSniffer:

const sniffedEncoding = htmlEncodingSniffer(htmlBuffer, {
  transportLayerEncodingLabel,
  defaultEncoding
});

These represent two possible inputs into the encoding sniffing algorithm:

transportLayerEncodingLabel is an encoding label that is obtained from the "transport layer" (probably a HTTP Content-Type header), which overrides everything but a BOM.
defaultEncoding is the ultimate fallback encoding used if no valid encoding is supplied by the transport layer, and no encoding is sniffed from the bytes. It defaults to "windows-1252", as recommended by the algorithm's table of suggested defaults for "All other locales" (including the en locale).

Credits

This package was originally based on the excellent work of @nicolashenry, in jsdom. It has since been pulled out into this separate package.

Name		Name	Last commit message	Last commit date
Latest commit History 10 Commits
lib		lib
test		test
.editorconfig		.editorconfig
.eslintrc.json		.eslintrc.json
.gitignore		.gitignore
.travis.yml		.travis.yml
LICENSE.txt		LICENSE.txt
README.md		README.md
package.json		package.json
yarn.lock		yarn.lock

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Determine the Encoding of a HTML Byte Stream

Options

Credits

About

Releases

Packages

Languages

License

openandclose/html-encoding-sniffer

Folders and files

Latest commit

History

Repository files navigation

Determine the Encoding of a HTML Byte Stream

Options

Credits

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages