Improve performance and allow Uint8Array inputs #15

Merged (1 commit) on Nov 16, 2023

dbrockman
Contributor

The input value can now be a string or a `Uint8Array`. A string is encoded to a `Uint8Array` of UTF-8 bytes using a cached `TextEncoder` instance.
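As a minimal sketch of what accepting both input types implies, the snippet below hashes a string and its pre-encoded UTF-8 bytes and checks that the results agree. The names `fnv1a32` and `hashBytes32` are hypothetical and exist only for this illustration. It hard-codes the 32-bit parameters and is not the package's implementation.

```javascript
// 32-bit FNV-1a parameters (same values as in the benchmark script in this PR).
const FNV_PRIME_32 = 16_777_619n;
const FNV_OFFSET_32 = 2_166_136_261n;

// Created once and reused, mirroring the "cached TextEncoder" idea.
const sharedEncoder = new TextEncoder();

// Hypothetical helper: FNV-1a over raw bytes.
function hashBytes32(bytes) {
	let hash = FNV_OFFSET_32;
	for (let index = 0; index < bytes.length; index++) {
		hash ^= BigInt(bytes[index]);
		hash = BigInt.asUintN(32, hash * FNV_PRIME_32);
	}

	return hash;
}

// Hypothetical wrapper: strings are encoded to UTF-8 first, byte arrays pass through.
function fnv1a32(value) {
	return hashBytes32(typeof value === 'string' ? sharedEncoder.encode(value) : value);
}

const text = 'Ā 文 🦄';
const fromString = fnv1a32(text);
const fromBytes = fnv1a32(new TextEncoder().encode(text));
console.log(fromString === fromBytes); // true
```

Callers that already hold UTF-8 bytes can skip the encoding step entirely, which is where the `Uint8Array` rows in the benchmarks gain their advantage.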

Added a new `utf8Buffer` option that lets the caller provide a pre-allocated `Uint8Array` buffer for encoding. When `utf8Buffer` is provided, the input string is encoded into that buffer using `TextEncoder.encodeInto()`, one buffer-sized chunk at a time, instead of allocating a new array for each call.
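To make the chunking behaviour concrete, here is a small sketch of how `encodeInto()` walks through a string with a buffer that is smaller than the encoded input. The generator name `utf8Chunks` is hypothetical. The `read === 0` guard is an addition in this sketch, assuming a caller might pass a buffer too small to hold the next character; a guard like this avoids looping forever in that case.

```javascript
const chunkEncoder = new TextEncoder();

// Hypothetical helper: yields views of `buffer` holding successive UTF-8 chunks of `string`.
function * utf8Chunks(string, buffer) {
	let remaining = string;

	while (remaining.length > 0) {
		// `read` counts UTF-16 code units consumed, `written` counts bytes produced.
		const {read, written} = chunkEncoder.encodeInto(remaining, buffer);

		if (read === 0) {
			// Sketch-only guard: nothing fit, so the loop could never make progress.
			throw new RangeError('Buffer is too small to hold the next character');
		}

		yield buffer.subarray(0, written);
		remaining = remaining.slice(read);
	}
}

const smallBuffer = new Uint8Array(4);
const chunkSizes = [];
for (const chunk of utf8Chunks('aĀ文🦄', smallBuffer)) {
	chunkSizes.push(chunk.length);
}

console.log(chunkSizes); // [ 3, 3, 4 ]
```

`encodeInto()` never splits a character across calls, so the 1-, 2-, 3-, and 4-byte characters above are grouped as 3 + 3 + 4 bytes rather than filling the 4-byte buffer every time.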

The new test cases use the same input strings and expected bigint values as the original test cases, so it's easy to see that the new implementation produces the same results.
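The idea of comparing against fixed expected values can be sketched with a few well-known FNV-1a vectors (the empty string hashes to the offset basis, and `'a'` has published 32-bit and 64-bit values). These vectors are chosen for illustration and are not copied from the repository's test file. `fnv1aSketch` is a hypothetical stand-in for the real function.

```javascript
const PRIMES = {32: 16_777_619n, 64: 1_099_511_628_211n};
const OFFSETS = {32: 2_166_136_261n, 64: 14_695_981_039_346_656_037n};
const vectorEncoder = new TextEncoder();

// Hypothetical stand-in for the function under test.
function fnv1aSketch(string, size) {
	let hash = OFFSETS[size];
	for (const byte of vectorEncoder.encode(string)) {
		hash ^= BigInt(byte);
		hash = BigInt.asUintN(size, hash * PRIMES[size]);
	}

	return hash;
}

// [input, size, expected bigint]
const vectors = [
	['', 32, 0x811c9dc5n],
	['a', 32, 0xe40c292cn],
	['', 64, 0xcbf29ce484222325n],
	['a', 64, 0xaf63dc4c8601ec8cn],
];

const failures = vectors.filter(([input, size, expected]) => fnv1aSketch(input, size) !== expected);
console.log(failures.length === 0 ? 'all vectors match' : failures);
```

Because the expected values are plain bigint literals, any change that alters the byte stream or the arithmetic shows up as a mismatch on the same inputs.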

Benchmarks

"Original fnv1a" is the implementation on the main branch.
The full benchmark script is included below in a details tag.

Original fnv1a with a small string          339,119 ops/sec ±0.37% (96 runs sampled)
fnv1a with a small string                   395,860 ops/sec ±1.23% (93 runs sampled)
fnv1a with a small string and a utf8Buffer  493,689 ops/sec ±0.70% (93 runs sampled)
fnv1a with a small Uint8Array               512,470 ops/sec ±0.63% (95 runs sampled)

Original fnv1a with a large string            4,423 ops/sec ±0.52% (95 runs sampled)
fnv1a with a large string                     6,923 ops/sec ±0.48% (96 runs sampled)
fnv1a with a large string and a utf8Buffer    6,940 ops/sec ±0.28% (96 runs sampled)
fnv1a with a large Uint8Array                13,447 ops/sec ±0.44% (97 runs sampled)
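For readers who prefer ratios to raw throughput, the following sketch divides each row by the corresponding "Original fnv1a" row. The figures are copied from the results above. The object layout and the `speedups` helper are hypothetical.

```javascript
// ops/sec figures copied from the benchmark output above.
const results = {
	small: {original: 339_119, string: 395_860, utf8Buffer: 493_689, uint8Array: 512_470},
	large: {original: 4_423, string: 6_923, utf8Buffer: 6_940, uint8Array: 13_447},
};

// Hypothetical helper: throughput of each variant relative to the original implementation.
function speedups({original, ...variants}) {
	return Object.fromEntries(
		Object.entries(variants).map(([name, ops]) => [name, (ops / original).toFixed(2)]),
	);
}

const smallSpeedups = speedups(results.small);
const largeSpeedups = speedups(results.large);
console.log(smallSpeedups); // { string: '1.17', utf8Buffer: '1.46', uint8Array: '1.51' }
console.log(largeSpeedups); // { string: '1.57', utf8Buffer: '1.57', uint8Array: '3.04' }
```

On these numbers, `utf8Buffer` helps mainly with the small string, while passing a `Uint8Array` directly is the fastest option in both groups.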
benchmark script

This is the script used to produce the benchmarks above. It uses benchmark@2.1.4.

/* eslint no-undef: "off" */

import benchmark from 'benchmark';

// FNV_PRIMES and FNV_OFFSETS from
// http://www.isthe.com/chongo/tech/comp/fnv/index.html#FNV-param

const FNV_PRIMES = {
	32: 16_777_619n,
	64: 1_099_511_628_211n,
	128: 309_485_009_821_345_068_724_781_371n,
	256: 374_144_419_156_711_147_060_143_317_175_368_453_031_918_731_002_211n,
	512: 35_835_915_874_844_867_368_919_076_489_095_108_449_946_327_955_754_392_558_399_825_615_420_669_938_882_575_126_094_039_892_345_713_852_759n,
	1024: 5_016_456_510_113_118_655_434_598_811_035_278_955_030_765_345_404_790_744_303_017_523_831_112_055_108_147_451_509_157_692_220_295_382_716_162_651_878_526_895_249_385_292_291_816_524_375_083_746_691_371_804_094_271_873_160_484_737_966_720_260_389_217_684_476_157_468_082_573n,
};

const FNV_OFFSETS = {
	32: 2_166_136_261n,
	64: 14_695_981_039_346_656_037n,
	128: 144_066_263_297_769_815_596_495_629_667_062_367_629n,
	256: 100_029_257_958_052_580_907_070_968_620_625_704_837_092_796_014_241_193_945_225_284_501_741_471_925_557n,
	512: 9_659_303_129_496_669_498_009_435_400_716_310_466_090_418_745_672_637_896_108_374_329_434_462_657_994_582_932_197_716_438_449_813_051_892_206_539_805_784_495_328_239_340_083_876_191_928_701_583_869_517_785n,
	1024: 14_197_795_064_947_621_068_722_070_641_403_218_320_880_622_795_441_933_960_878_474_914_617_582_723_252_296_732_303_717_722_150_864_096_521_202_355_549_365_628_174_669_108_571_814_760_471_015_076_148_029_755_969_804_077_320_157_692_458_563_003_215_304_957_150_157_403_644_460_363_550_505_412_711_285_966_361_610_267_868_082_893_823_963_790_439_336_411_086_884_584_107_735_010_676_915n,
};

const cachedEncoder = new globalThis.TextEncoder();

function fnv1aUint8Array(uint8Array, size) {
	const fnvPrime = FNV_PRIMES[size];
	let hash = FNV_OFFSETS[size];

	// eslint-disable-next-line unicorn/no-for-loop -- This is a performance-sensitive loop
	for (let index = 0; index < uint8Array.length; index++) {
		hash ^= BigInt(uint8Array[index]);
		hash = BigInt.asUintN(size, hash * fnvPrime);
	}

	return hash;
}

function fnv1aEncodeInto(string, size, utf8Buffer) {
	if (utf8Buffer.length === 0) {
		throw new Error('The `utf8Buffer` option must have a length greater than zero');
	}

	const fnvPrime = FNV_PRIMES[size];
	let hash = FNV_OFFSETS[size];
	let remaining = string;

	while (remaining.length > 0) {
		const result = cachedEncoder.encodeInto(remaining, utf8Buffer);
		remaining = remaining.slice(result.read);
		for (let index = 0; index < result.written; index++) {
			hash ^= BigInt(utf8Buffer[index]);
			hash = BigInt.asUintN(size, hash * fnvPrime);
		}
	}

	return hash;
}

function fnv1a(value, {size = 32, utf8Buffer} = {}) {
	if (!FNV_PRIMES[size]) {
		throw new Error('The `size` option must be one of 32, 64, 128, 256, 512, or 1024');
	}

	if (typeof value === 'string') {
		if (utf8Buffer) {
			return fnv1aEncodeInto(value, size, utf8Buffer);
		}

		value = cachedEncoder.encode(value);
	}

	return fnv1aUint8Array(value, size);
}

function fnv1aOriginal(string, {size = 32} = {}) {
	if (!FNV_PRIMES[size]) {
		throw new Error('The `size` option must be one of 32, 64, 128, 256, 512, or 1024');
	}

	let hash = FNV_OFFSETS[size];
	const fnvPrime = FNV_PRIMES[size];

	// Handle Unicode code points > 0x7f
	let isUnicoded = false;

	for (let index = 0; index < string.length; index++) {
		let characterCode = string.charCodeAt(index);

		// Non-ASCII characters trigger the Unicode escape logic
		if (characterCode > 0x7F && !isUnicoded) {
			string = unescape(encodeURIComponent(string));
			characterCode = string.charCodeAt(index);
			isUnicoded = true;
		}

		hash ^= BigInt(characterCode);
		hash = BigInt.asUintN(size, hash * fnvPrime);
	}

	return hash;
}

const largeUint8Array = globalThis.crypto.getRandomValues(new Uint8Array(2000));
const largeString = new globalThis.TextDecoder().decode(largeUint8Array);
const smallString = 'Short non-ascii test string. Ā 𐀀 文 🦄 🌈';
const smallUint8Array = new globalThis.TextEncoder().encode(smallString);
const utf8Buffer64 = new Uint8Array(64);

const suite = new benchmark.Suite();

suite.add('Original fnv1a with a small string', () => fnv1aOriginal(smallString, {}));
suite.add('fnv1a with a small string', () => fnv1a(smallString, {}));
suite.add('fnv1a with a small string and a utf8Buffer', () => fnv1a(smallString, {utf8Buffer: utf8Buffer64}));
suite.add('fnv1a with a small Uint8Array', () => fnv1a(smallUint8Array, {}));

suite.add('Original fnv1a with a large string', () => fnv1aOriginal(largeString, {}));
suite.add('fnv1a with a large string', () => fnv1a(largeString, {}));
suite.add('fnv1a with a large string and a utf8Buffer', () => fnv1a(largeString, {utf8Buffer: utf8Buffer64}));
suite.add('fnv1a with a large Uint8Array', () => fnv1a(largeUint8Array, {}));

suite.on('cycle', event => {
	console.log(event.target.toString());
});

suite.run({async: false});
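One caveat when reading the large-input rows, offered as an observation rather than a correction: `largeString` is produced by decoding random bytes, and `TextDecoder` replaces invalid UTF-8 sequences with U+FFFD, which re-encodes to three bytes. The large string cases therefore most likely hash more than the 2000 bytes in `largeUint8Array`. The sketch below uses a fixed three-byte input, chosen here for determinism, to show the effect.

```javascript
// 'A' followed by two bytes that are not valid UTF-8 on their own.
const rawBytes = new Uint8Array([0x41, 0xFF, 0x80]);

// Each invalid byte decodes to the replacement character U+FFFD.
const decoded = new TextDecoder().decode(rawBytes);

// Re-encoding yields 1 byte for 'A' plus 3 bytes per U+FFFD.
const reencoded = new TextEncoder().encode(decoded);

console.log(rawBytes.length, reencoded.length); // 3 7
```

Comparing the string and `Uint8Array` rows on identical bytes would require encoding a valid string first and hashing that same byte array.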

@sindresorhus sindresorhus merged commit edb1546 into sindresorhus:main Nov 16, 2023
1 check passed
@sindresorhus
Owner

Very nice! Thank you 🙏
