-
Notifications
You must be signed in to change notification settings - Fork 30
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: add utility functions for dealing with Uint8Arrays #55
Conversation
On my weird side-quest to replace node `Buffer`s with `Uint8Array`s I find myself reimplementing the same functions over and over again in each repo. In the interests of reuse I've made utility functions for them here.
src/uint8arrays/from-string.js
Outdated
|
||
string = `${base.code}${string}` | ||
|
||
return Uint8Array.from(multibase.decode(string)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can get rid of the copy when multiformats/js-multibase#63 is merged, though it depends on this PR. What larks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This whole circular dependence makes me feel uncomfortable:
- It is awkward that you have to prefix string to decode
- Non trivial chunk of code is being pulled in now where in most cases I think you only care about utf-8
- multibase also uses this function which could short circuit (if things are overlooked)
It also doesn't appear like other encodings are used anywhere, so is this really necessary ? If so I think it would be far better to factor out logic behind multibase.decode(string)
into separate library that just exposes underlying encoders / decoders and share it between multibase and this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can call .base to get the underlying base encoder without prefix
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Really cool to have this functionality factored out, I have also found myself needing this all over the place. I do however think that introducing circular dependency is unfortunate side effect that should could be addressed by factoring shared functionality into separate shared library. I think it would be best to do following:
- leverage web-encoding for encoding / decoding text into all codecs that
TextEncoder
/TextDecoder
support. - Factor out base codecs from multibase into another library e.g
base-encoding
. - Share above two between this and multibase which would:
- Untangle circular dependency
- Provide a good alternative for
Buffer
to libraries outside of IPFS.
src/uint8arrays/concat.js
Outdated
function concat (arrs, length) { | ||
if (!length) { | ||
length = arrs.reduce((acc, curr) => { | ||
if (ArrayBuffer.isView(curr)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
📝 The logic that verifies each array and takes it's size is happening here once and then again down below. Could you maybe factor out that logic to byteLength(arr)
function that returns length or throws if not a valid input ?
src/uint8arrays/concat.js
Outdated
const output = new Uint8Array(length) | ||
let offset = 0 | ||
|
||
arrs.forEach(arr => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
📝 I'd encourage for
here as I expect to be in hot code paths.
const TextEncoder = require('../text-encoder') | ||
const utf8Encoder = new TextEncoder('utf8') | ||
|
||
function fromString (string, encoding = 'utf8') { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you please add some comment what this supposed to do and a note what encoding this supposed to support.
src/uint8arrays/from-string.js
Outdated
|
||
string = `${base.code}${string}` | ||
|
||
return Uint8Array.from(multibase.decode(string)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This whole circular dependence makes me feel uncomfortable:
- It is awkward that you have to prefix string to decode
- Non trivial chunk of code is being pulled in now where in most cases I think you only care about utf-8
- multibase also uses this function which could short circuit (if things are overlooked)
It also doesn't appear like other encodings are used anywhere, so is this really necessary ? If so I think it would be far better to factor out logic behind multibase.decode(string)
into separate library that just exposes underlying encoders / decoders and share it between multibase and this.
|
||
function toString (buf, encoding = 'utf8') { | ||
if (encoding !== 'utf8') { | ||
buf = multibase.encode(encoding, buf).subarray(1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same sentiment as above.
https://github.com/mikeal/bytesish for inspiration |
Thats.. just multibase? We could probably reach deep into it's heart and pull out the decoding/encoding functions but in the interests of expediency we can solve this in a future PR.
multibase is 6.4kB or 2.4kB gzipped. It's not nothing but I wouldn't call it non-trivial.
My experience converting a few 10s of modules this week has been the opposite. We care about utf-8, hex and base64, more specifically base64urlpad though mostly in tests. I started adding
I think to break the circular dependency I might just copy the functions used into multibase as it's only a few lines of code.
These are good suggestions. As a rule I'm not a massive fan of multi-purpose utility libraries and would like to see this module broken up into smaller more focussed modules, but let's decouple that from this PR. |
@achingbrain Now that you have pulled this into |
No, I'm going to close it shortly - just updating all the dependent PRs to use the |
On my weird side-quest to replace node
Buffer
s withUint8Array
s I find myself reimplementing the same functions over and over again in each repo.In the interests of reuse I've made utility functions for them here.