feat: add utility functions for dealing with Uint8Arrays #55

achingbrain · 2020-07-30T10:37:26Z

On my weird side-quest to replace node `Buffer`s with `Uint8Array`s I find myself reimplementing the same functions over and over again in each repo.

In the interests of reuse I've made utility functions for them here.

achingbrain · 2020-07-30T16:53:13Z

src/uint8arrays/from-string.js

+
+    string = `${base.code}${string}`
+
+    return Uint8Array.from(multibase.decode(string))


We can get rid of the copy when multiformats/js-multibase#63 is merged, though it depends on this PR. What larks.

This whole circular dependency makes me feel uncomfortable:

- It is awkward that you have to prefix the string in order to decode it.
- A non-trivial chunk of code is now being pulled in, whereas in most cases I think you only care about utf-8.
- multibase also uses this function, which could short circuit (if things are overlooked).

It also doesn't appear that other encodings are used anywhere, so is this really necessary? If so, I think it would be far better to factor out the logic behind multibase.decode(string) into a separate library that just exposes the underlying encoders / decoders, and share it between multibase and this.

You can call `.base` to get the underlying base encoder without the prefix.

Gozala

Really cool to have this functionality factored out; I have also found myself needing it all over the place. I do, however, think that introducing a circular dependency is an unfortunate side effect that could be addressed by factoring the shared functionality into a separate shared library. I think it would be best to do the following:

- Leverage web-encoding for encoding / decoding text in all the codecs that TextEncoder / TextDecoder support.
- Factor out the base codecs from multibase into another library, e.g. base-encoding.
- Share the above two between this and multibase, which would:
  - Untangle the circular dependency
  - Provide a good alternative to Buffer for libraries outside of IPFS.

Gozala · 2020-07-30T18:19:10Z

src/uint8arrays/concat.js

+function concat (arrs, length) {
+  if (!length) {
+    length = arrs.reduce((acc, curr) => {
+      if (ArrayBuffer.isView(curr)) {


📝 The logic that verifies each array and takes its size runs once here and then again further down. Could you factor that logic out into a byteLength(arr) function that returns the length, or throws if the input is not valid?
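A minimal sketch of what such a helper might look like. The name `byteLength` comes from the comment above; `totalByteLength` is a hypothetical name, and the sketch assumes that only `ArrayBuffer` views count as valid input, which may not match what the PR actually accepts.

```javascript
/**
 * Returns the size in bytes of a single input, or throws if it is not
 * an ArrayBuffer view (Uint8Array, DataView, etc.).
 * Assumption: anything that is not a view is treated as invalid input.
 */
function byteLength (arr) {
  if (!ArrayBuffer.isView(arr)) {
    throw new TypeError('Expected an ArrayBuffer view such as a Uint8Array')
  }

  return arr.byteLength
}

/**
 * Hypothetical companion: validates every input and sums the lengths,
 * so the check lives in exactly one place.
 */
function totalByteLength (arrs) {
  let total = 0

  for (let i = 0; i < arrs.length; i++) {
    total += byteLength(arrs[i])
  }

  return total
}
```

With this shape, concat could call the helper once when computing the output size and would not need a second validation pass while copying.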

Gozala · 2020-07-30T18:24:00Z

src/uint8arrays/concat.js

+  const output = new Uint8Array(length)
+  let offset = 0
+
+  arrs.forEach(arr => {


📝 I'd encourage a plain `for` loop here, as I expect this to be in hot code paths.
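For illustration, a sketch of the copy step written with indexed `for` loops instead of `forEach` / `reduce`. It assumes every element of `arrs` is already a `Uint8Array` (validation is omitted) and is not the code from the PR.

```javascript
/**
 * Joins a list of Uint8Arrays into one new Uint8Array.
 * `length` is optional; when omitted it is computed from the inputs.
 * Assumption: all inputs are Uint8Arrays, so no validation is done here.
 */
function concat (arrs, length) {
  if (!length) {
    length = 0

    for (let i = 0; i < arrs.length; i++) {
      length += arrs[i].length
    }
  }

  const output = new Uint8Array(length)
  let offset = 0

  // Indexed loop avoids allocating and calling a closure per element
  for (let i = 0; i < arrs.length; i++) {
    output.set(arrs[i], offset)
    offset += arrs[i].length
  }

  return output
}
```

Whether this is measurably faster than `forEach` depends on the engine and the workload, so it would be worth benchmarking before relying on it.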

Gozala · 2020-07-30T19:55:46Z

src/uint8arrays/from-string.js

+const TextEncoder = require('../text-encoder')
+const utf8Encoder = new TextEncoder('utf8')
+
+function fromString (string, encoding = 'utf8') {


Can you please add a comment explaining what this is supposed to do, and a note on which encodings it is supposed to support?

Gozala · 2020-07-30T20:08:39Z

src/uint8arrays/from-string.js

+
+    string = `${base.code}${string}`
+
+    return Uint8Array.from(multibase.decode(string))


This whole circular dependency makes me feel uncomfortable:

- It is awkward that you have to prefix the string in order to decode it.
- A non-trivial chunk of code is now being pulled in, whereas in most cases I think you only care about utf-8.
- multibase also uses this function, which could short circuit (if things are overlooked).

It also doesn't appear that other encodings are used anywhere, so is this really necessary? If so, I think it would be far better to factor out the logic behind multibase.decode(string) into a separate library that just exposes the underlying encoders / decoders, and share it between multibase and this.
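One way the suggestion above could look, purely as a sketch: a small table of prefix-free codecs that both this module and multibase could depend on. The `codecs` map and the `toBytes` / `toText` names are hypothetical, only utf8 and hex are shown, and the global `TextEncoder` / `TextDecoder` are assumed to be available (browsers and recent Node versions).

```javascript
const utf8Encoder = new TextEncoder()
const utf8Decoder = new TextDecoder('utf-8')

// Hypothetical shared codec table: each entry converts text <-> bytes
// directly, with no multibase prefix character involved.
const codecs = new Map([
  ['utf8', {
    toBytes: (string) => utf8Encoder.encode(string),
    toText: (bytes) => utf8Decoder.decode(bytes)
  }],
  ['hex', {
    toBytes: (string) => {
      if (string.length % 2 !== 0 || /[^0-9a-f]/i.test(string)) {
        throw new Error('Invalid hex string')
      }

      const bytes = new Uint8Array(string.length / 2)

      for (let i = 0; i < bytes.length; i++) {
        bytes[i] = parseInt(string.slice(i * 2, i * 2 + 2), 16)
      }

      return bytes
    },
    toText: (bytes) => {
      let out = ''

      for (let i = 0; i < bytes.length; i++) {
        out += bytes[i].toString(16).padStart(2, '0')
      }

      return out
    }
  }]
])

function getCodec (encoding) {
  const codec = codecs.get(encoding)

  if (!codec) {
    throw new Error(`Unsupported encoding "${encoding}"`)
  }

  return codec
}

function fromString (string, encoding = 'utf8') {
  return getCodec(encoding).toBytes(string)
}

function toString (bytes, encoding = 'utf8') {
  return getCodec(encoding).toText(bytes)
}
```

Because neither function touches multibase, a module built this way would not need the prefix workaround or the circular dependency; the trade-off is that every extra base has to be added to the shared table.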

Gozala · 2020-07-30T20:09:48Z

src/uint8arrays/to-string.js

+
+function toString (buf, encoding = 'utf8') {
+  if (encoding !== 'utf8') {
+    buf = multibase.encode(encoding, buf).subarray(1)


Same sentiment as above.

hugomrdias · 2020-07-31T08:16:42Z

https://github.com/mikeal/bytesish for inspiration

achingbrain · 2020-07-31T09:55:34Z

> It is awkward that you have to prefix string to decode

That's... just multibase? We could probably reach deep into its heart and pull out the decoding/encoding functions, but in the interests of expediency we can solve this in a future PR.

> Non trivial chunk of code is being pulled in now

multibase is 6.4kB, or 2.4kB gzipped. It's not nothing but I wouldn't call it non-trivial.

> where in most cases I think you only care about utf-8

My experience converting a few 10s of modules this week has been the opposite. We care about utf-8, hex and base64 (more specifically base64urlpad), though mostly in tests. I started adding to-hex-string, to-base64-string etc. type functions but then thought 'erm, we've already solved this problem' and added multibase.

> multibase also uses this function which could short circuit (if things are overlooked)

I think to break the circular dependency I might just copy the functions used into multibase, as it's only a few lines of code.

> Factor out base codecs from multibase into another library e.g base-encoding

> leverage web-encoding for encoding / decoding text into all codecs that TextEncoder / TextDecoder support.

These are good suggestions. As a rule I'm not a massive fan of multi-purpose utility libraries and would like to see this module broken up into smaller, more focussed modules, but let's decouple that from this PR.

Gozala · 2020-07-31T14:20:27Z

@achingbrain Now that you have pulled this into the uint8arrays library, is this pull request still relevant?

achingbrain · 2020-07-31T14:38:59Z

No, I'm going to close it shortly - just updating all the dependent PRs to use the uint8arrays module. I'm leaving it open while I do that so they don't break in the meantime as the branch will get deleted.

feat: add utility methods for dealing with Uint8Arrays

cf9c5c6

On my weird side-quest to replace node `Buffer`s with `Uint8Array`s I find myself reimplementing the same functions over and over again in each repo. In the interests of reuse I've made utility functions for them here.

achingbrain changed the title ~~feat: add utility methods for dealing with Uint8Arrays~~ feat: add utility functions for dealing with Uint8Arrays Jul 30, 2020

achingbrain requested review from Gozala and hugomrdias July 30, 2020 10:38

achingbrain mentioned this pull request Jul 30, 2020

fix: replace node buffers with uint8arrays multiformats/js-cid#117

Merged

3 tasks

achingbrain added 5 commits July 30, 2020 13:11

chore: add from hex string and jsdoc comments

837bbb1

chore: rename for consistency

659b76c

chore: unsaved files

079f80f

chore: add to hex string method

371b6f4

chore: add encoding to to/from string

fb83a28

achingbrain mentioned this pull request Jul 30, 2020

fix: replace node Buffers with Uint8Arrays ipld/js-ipld-dag-pb#187

Merged

3 tasks

chore: rename sort to compare

bb5033e

hugomrdias approved these changes Jul 30, 2020

This was referenced Jul 30, 2020

fix: replace node buffers with uint8arrays ipfs/js-ipfs-repo-migrations#25

Merged

fix: swap node buffers for uint8arrays ipfs/js-ipfs-repo#249

Merged

fix: return Uint8Arrays multiformats/js-multibase#63

Merged

achingbrain commented Jul 30, 2020

Gozala requested changes Jul 30, 2020

chore: address PR comments

4030927

achingbrain marked this pull request as draft July 31, 2020 15:30

achingbrain closed this Jul 31, 2020

achingbrain deleted the feat/add-utility-uint8array-functions branch July 31, 2020 15:44
