The scrypt password-based key derivation function (PBKDF) is an algorithm designed to be brute-force resistant. It converts human-readable passwords into fixed-length arrays of bytes, which can then be used as keys for symmetric block ciphers, private keys, et cetera.
- Non-blocking - Gives other events in the event loop opportunities to run (asynchronous)
- Cancellable - If the key is no longer required, the computation can be cancelled
- Progress Callback - Provides the current progress of key derivation as a percentage complete
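To make the three properties above concrete, here is a minimal sketch of the general pattern: split a long computation into chunks, yield to the event loop between chunks, report progress after each one, and let the caller cancel. This is not this library's implementation; `runInChunks`, its arguments, and the "return true to cancel" convention are hypothetical names chosen for illustration.

```javascript
// Hypothetical helper (not part of scrypt-js): runs `totalSteps` units of work
// in chunks so that other events get a chance to run between chunks.
function runInChunks(totalSteps, stepFn, onProgress, chunkSize = 1000) {
  return new Promise((resolve, reject) => {
    let done = 0;

    function nextChunk() {
      try {
        // Do one chunk of work synchronously.
        const end = Math.min(done + chunkSize, totalSteps);
        for (; done < end; done++) {
          stepFn(done);
        }

        // Report progress as a fraction in [0, 1]; in this sketch, returning
        // true from the callback asks for the computation to be cancelled.
        const wantsCancel = onProgress(done / totalSteps) === true;

        if (done >= totalSteps) {
          resolve({ cancelled: false, stepsDone: done });
        } else if (wantsCancel) {
          resolve({ cancelled: true, stepsDone: done });
        } else {
          // Yield to the event loop before continuing (non-blocking).
          setTimeout(nextChunk, 0);
        }
      } catch (error) {
        reject(error);
      }
    }

    nextChunk();
  });
}

// Usage: report progress as a percentage and never cancel.
runInChunks(5000, () => {}, (fraction) => {
  console.log(Math.round(fraction * 100) + '% complete');
  return false;
}).then((result) => console.log('finished:', result));
```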
The scrypt algorithm is, by design, expensive to execute, which increases the time an attacker needs to brute-force a password. The cost can be tuned with several parameters:
- N - The CPU/memory cost; increasing this increases the overall difficulty
- r - The block size; increasing this increases the dependency on memory latency and bandwidth
- p - The parallelization cost; increasing this increases the dependency on multi-processing
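As a rough guide to how these parameters interact, scrypt's working memory is approximately 128 * N * r bytes for its main lookup table, plus 128 * r * p bytes for the blocks being mixed. The sketch below turns that rule of thumb into a helper; `estimateScryptMemory` is a hypothetical name, and the result is an estimate rather than an exact measurement of any particular implementation.

```javascript
// Hypothetical helper: approximate scrypt working memory in bytes.
// Assumes roughly 128 * N * r bytes for the lookup table plus
// 128 * r * p bytes for the p blocks; real implementations add overhead.
function estimateScryptMemory(N, r, p) {
  // scrypt requires N to be a power of two greater than 1
  // (the bitwise check below is only valid for N up to 2^31).
  if (!Number.isInteger(N) || N < 2 || (N & (N - 1)) !== 0) {
    throw new RangeError('N must be a power of two greater than 1');
  }
  return 128 * N * r + 128 * r * p;
}

// Usage: a commonly seen parameter set needs about 16 MiB.
const bytes = estimateScryptMemory(16384, 8, 1);
console.log((bytes / (1024 * 1024)).toFixed(1) + ' MiB'); // prints "16.0 MiB"
```

Doubling N doubles both the memory and the CPU work, whereas increasing p adds CPU work with little extra memory.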
You should likely not use this module for node.js, as there are many faster alternatives, but if you wish to do so:
npm install scrypt-js
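For comparison, one such alternative on node.js is the scrypt implementation built into the `crypto` module. The following is a minimal sketch, assuming a reasonably recent node.js version; `deriveKey` and its default parameters are hypothetical choices for illustration, not recommendations from this library.

```javascript
const crypto = require('crypto');

// Hypothetical wrapper around node's built-in scrypt (not scrypt-js).
// passwordBytes and saltBytes should already be byte arrays (Buffer or
// Uint8Array) so that no implicit string encoding takes place.
function deriveKey(passwordBytes, saltBytes, { N = 16384, r = 8, p = 1, dkLen = 32 } = {}) {
  return crypto.scryptSync(passwordBytes, saltBytes, dkLen, {
    N,
    r,
    p,
    // node rejects parameters that need more memory than maxmem
    // (32 MiB by default), so leave generous headroom for the chosen N, r, p.
    maxmem: 256 * r * (N + p),
  });
}

// Usage: derive a 32-byte key and print it as hex.
const key = deriveKey(Buffer.from('password', 'utf8'), Buffer.from('salt', 'utf8'), { N: 1024 });
console.log(key.toString('hex'));
```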
TL;DR - either only allow ASCII characters in passwords, or use String.prototype.normalize() with the 'NFKC' form on every password
It is HIGHLY recommended that you do NOT pass strings into this library (or any password-based key derivation function) without careful consideration; you should convert your strings to a canonical format that you will use consistently across all platforms.
When encoding passwords with UTF-8, it is important to realize that a given string may have multiple Unicode representations, each with a different UTF-8 byte sequence. Since the key generated by a password-based key derivation function depends on the specific bytes, this matters a great deal.
Composed vs. Decomposed
Certain Unicode code points can be combined with other characters to create composed characters. For example, the letter u with the umlaut diacritic mark (two dots over it) can be expressed two ways: in its composed form, U+00FC, or in its decomposed form, which is the letter "u" followed by U+0308 (which basically means "modify the previous character by adding an umlaut to it").
    // In the following two cases, a "u" with an umlaut would be seen
    > '\u00fc'
    > 'u\u0308'

    // In its composed form, it is 2 bytes long
    > new Buffer('u\u0308'.normalize('NFKC'))
    <Buffer c3 bc>
    > new Buffer('\u00fc')
    <Buffer c3 bc>

    // Whereas the decomposed form is 3 bytes, the letter u followed by U+0308
    > new Buffer('\u00fc'.normalize('NFKD'))
    <Buffer 75 cc 88>
    > new Buffer('u\u0308')
    <Buffer 75 cc 88>
Compatibility equivalence mode
Certain strings are often displayed the same, even though they may have different semantic meanings. For example, Unicode provides a code point for the Roman numeral one, which in most fonts appears identical to the capital letter I. Compatibility equivalence will fold these two cases into simply the capital letter I.
    > '\u2160'
    'I'
    > 'I'
    'I'
    > '\u2160' === 'I'
    false
    > '\u2160'.normalize('NFKC') === 'I'
    true
The normalize() method of a string can be used to convert a string to a specific form. Without going into too much detail, I generally recommend NFKC; however, if you wish to dive deeper into this, a nice short summary can be found in the documentation for Python's unicodedata module.
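As a minimal sketch of that recommendation, the helper below normalizes a password to NFKC and then encodes it as UTF-8 bytes, so that equivalent inputs produce the same byte array before key derivation. `passwordToBytes` is a hypothetical name, and the sketch assumes an environment that provides both String.prototype.normalize and TextEncoder.

```javascript
// Hypothetical helper: canonicalize a password string into UTF-8 bytes.
function passwordToBytes(password) {
  // NFKC composes characters and folds compatibility-equivalent code points.
  const normalized = password.normalize('NFKC');
  // TextEncoder always produces UTF-8 and returns a Uint8Array.
  return new TextEncoder().encode(normalized);
}

// Usage: composed and decomposed forms yield identical bytes.
console.log(passwordToBytes('\u00fc'));  // bytes c3 bc
console.log(passwordToBytes('u\u0308')); // bytes c3 bc
```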
For browsers without normalize() support, the npm unorm module can be used as a polyfill.
Another example of encoding woes
One quick story I will share is of a project which used SHA256(encodeURI(password)) as a key. Ignoring rainbow table attacks, this had the unfortunate consequence that old web browsers replaced spaces with + while new web browsers replaced them with %20, causing issues for anyone who used spaces in their password.
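To illustrate why such a small difference is fatal, the sketch below hashes the two encodings of the same password and shows that the results do not match. The sample password and the `sha256Hex` helper are hypothetical, and the + form is produced with a plain string replacement to stand in for the older behaviour described above.

```javascript
const crypto = require('crypto');

// Hypothetical helper: SHA-256 of a string's UTF-8 bytes, as lowercase hex.
function sha256Hex(text) {
  return crypto.createHash('sha256').update(text, 'utf8').digest('hex');
}

const password = 'correct horse';
const percentEncoded = encodeURI(password);      // spaces become %20
const plusEncoded = password.replace(/ /g, '+'); // spaces become +

console.log(percentEncoded, sha256Hex(percentEncoded));
console.log(plusEncoded, sha256Hex(plusEncoded));
console.log(sha256Hex(percentEncoded) === sha256Hex(plusEncoded)); // prints false
```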
- While it may be inconvenient to many international users, one option is to restrict passwords to a safe subset of ASCII, for example:
- My personal recommendation is to normalize to the NFKC form; however, one could imagine setting a password to a Chinese phrase on one computer, then one day using a computer that lacks Chinese input capabilities and being unable to log in.
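A minimal sketch of the first option might look like the following. The choice of "printable ASCII, space through tilde" and the name `isAsciiPassword` are assumptions made for illustration; a real application may want a narrower or differently defined subset.

```javascript
// Hypothetical policy: accept only printable ASCII (U+0020 through U+007E).
const PRINTABLE_ASCII = /^[\x20-\x7e]+$/;

// Returns true if the password is non-empty and entirely printable ASCII.
function isAsciiPassword(password) {
  return PRINTABLE_ASCII.test(password);
}

// Usage
console.log(isAsciiPassword('Tr0ub4dor&3')); // prints true
console.log(isAsciiPassword('p\u00e4ssword')); // prints false
```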
See: Unicode Equivalence
The test cases from the scrypt whitepaper are included in test/test-vectors.json and can be run using:
I would like to thank @dchest for his scrypt-async library and for his assistance providing feedback and optimization suggestions.
Obviously, it's all licensed under the MIT license, so use it as you wish; but if you'd like to buy me a coffee, I won't complain. =)
- Bitcoin -
- Dogecoin -
- Testnet3 -