Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add utility functions for dealing with Uint8Arrays #55

Closed
wants to merge 8 commits into from

Conversation

achingbrain
Copy link
Member

On my weird side-quest to replace node Buffers with Uint8Arrays I find myself reimplementing the same functions over and over again in each repo.

In the interests of reuse I've made utility functions for them here.

On my weird side-quest to replace node `Buffer`s with `Uint8Array`s
I find myself reimplementing the same functions over and over again
in each repo.

In the interests of reuse I've made utility functions for them here.
@achingbrain achingbrain changed the title feat: add utility methods for dealing with Uint8Arrays feat: add utility functions for dealing with Uint8Arrays Jul 30, 2020

string = `${base.code}${string}`

return Uint8Array.from(multibase.decode(string))
Copy link
Member Author

@achingbrain achingbrain Jul 30, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can get rid of the copy when multiformats/js-multibase#63 is merged, though it depends on this PR. What larks.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This whole circular dependence makes me feel uncomfortable:

  • It is awkward that you have to prefix string to decode
  • Non trivial chunk of code is being pulled in now where in most cases I think you only care about utf-8
  • multibase also uses this function which could short circuit (if things are overlooked)

It also doesn't appear like other encodings are used anywhere, so is this really necessary ? If so I think it would be far better to factor out logic behind multibase.decode(string) into separate library that just exposes underlying encoders / decoders and share it between multibase and this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can call .base to get the underlying base encoder without prefix

Copy link
Contributor

@Gozala Gozala left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really cool to have this functionality factored out, I have also found myself needing this all over the place. I do however think that introducing circular dependency is unfortunate side effect that should could be addressed by factoring shared functionality into separate shared library. I think it would be best to do following:

  • leverage web-encoding for encoding / decoding text into all codecs that TextEncoder / TextDecoder support.
  • Factor out base codecs from multibase into another library e.g base-encoding.
  • Share above two between this and multibase which would:
    • Untangle circular dependency
    • Provide a good alternative for Buffer to libraries outside of IPFS.

function concat (arrs, length) {
if (!length) {
length = arrs.reduce((acc, curr) => {
if (ArrayBuffer.isView(curr)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

📝 The logic that verifies each array and takes it's size is happening here once and then again down below. Could you maybe factor out that logic to byteLength(arr) function that returns length or throws if not a valid input ?

const output = new Uint8Array(length)
let offset = 0

arrs.forEach(arr => {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

📝 I'd encourage for here as I expect to be in hot code paths.

const TextEncoder = require('../text-encoder')
const utf8Encoder = new TextEncoder('utf8')

function fromString (string, encoding = 'utf8') {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please add some comment what this supposed to do and a note what encoding this supposed to support.


string = `${base.code}${string}`

return Uint8Array.from(multibase.decode(string))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This whole circular dependence makes me feel uncomfortable:

  • It is awkward that you have to prefix string to decode
  • Non trivial chunk of code is being pulled in now where in most cases I think you only care about utf-8
  • multibase also uses this function which could short circuit (if things are overlooked)

It also doesn't appear like other encodings are used anywhere, so is this really necessary ? If so I think it would be far better to factor out logic behind multibase.decode(string) into separate library that just exposes underlying encoders / decoders and share it between multibase and this.


function toString (buf, encoding = 'utf8') {
if (encoding !== 'utf8') {
buf = multibase.encode(encoding, buf).subarray(1)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same sentiment as above.

@hugomrdias
Copy link
Member

https://github.com/mikeal/bytesish for inspiration

@achingbrain
Copy link
Member Author

achingbrain commented Jul 31, 2020

It is awkward that you have to prefix string to decode

Thats.. just multibase? We could probably reach deep into it's heart and pull out the decoding/encoding functions but in the interests of expediency we can solve this in a future PR.

Non trivial chunk of code is being pulled in now

multibase is 6.4kB or 2.4kB gzipped. It's not nothing but I wouldn't call it non-trivial.

where in most cases I think you only care about utf-8

My experience converting a few 10s of modules this week has been the opposite. We care about utf-8, hex and base64, more specifically base64urlpad though mostly in tests. I started adding to-hex-string, to-base64-string etc type functions but then thought 'erm, we've already solved this problem' and added multibase.

multibase also uses this function which could short circuit (if things are overlooked)

I think to break the circular dependency I might just copy the functions used into multibase as it's only a few lines of code.

Factor out base codecs from multibase into another library e.g base-encoding
leverage web-encoding for encoding / decoding text into all codecs that TextEncoder / TextDecoder support.

These are good suggestions. As a rule I'm not a massive fan of multi-purpose utility libraries and would like to see this module broken up into smaller more focussed modules, but let's decouple that from this PR.

@Gozala
Copy link
Contributor

Gozala commented Jul 31, 2020

@achingbrain Now that you have pulled this into uint8arrays library is this pull still relevant ?

@achingbrain
Copy link
Member Author

achingbrain commented Jul 31, 2020

No, I'm going to close it shortly - just updating all the dependent PRs to use the uint8arrays module. I'm leaving it open while I do that so they don't break in the meantime as the branch will get deleted.

@achingbrain achingbrain marked this pull request as draft July 31, 2020 15:30
@achingbrain achingbrain deleted the feat/add-utility-uint8array-functions branch July 31, 2020 15:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants