Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add config option to control fanout size #356

Merged
merged 2 commits into from
Aug 25, 2023

Conversation

achingbrain
Copy link
Member

Adds a shardFanoutBytes option to the importer to allow configuring the number of bytes used for the HAMT prefix, also a test.

Adds a `shardFanoutBytes` option to the importer to allow configuring
the number of bytes used for the HAMT prefix, also a test.
super(props, options)

this._bucket = createHAMT({
hashFn: hamtHashFn,
bits: 8
bits: options.shardFanoutBytes ?? 8
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

jsdoc description says default is 256 bytes, but it seems to be 1 byte here?

Also, is bits really bytes or do we need to do some math here?

@@ -123,6 +123,11 @@ export interface ImporterOptions extends ProgressOptions<ImporterProgressEvents>
*/
shardSplitThresholdBytes?: number

/**
* The maximum number of bytes used as a HAMT prefix for shard entries. Default: 256
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

default seems to be 8 bits (1 byte) above? change this comment, or actual default?

@@ -241,6 +246,7 @@ export async function * importer (source: ImportCandidateStream, blockstore: Wri

const wrapWithDirectory = options.wrapWithDirectory ?? false
const shardSplitThresholdBytes = options.shardSplitThresholdBytes ?? 262144
const shardFanoutBytes = options.shardFanoutBytes ?? 8
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is 8 bytes the default, or 256?

@rvagg
Copy link
Member

rvagg commented Aug 25, 2023

Yeah, it shouldn't be "bytes". This is a typical problem with HAMT descriptions, we're dealing with different units and the UnixFS one makes it even harder by hex stringifying and then depending on the prefix length of the hex string!

  • bits is at the base, the number of bits in the hash to chew off at each level of the HAMT, could be less than a byte but is 8 by default in the UnixFS HAMT
  • fanout as encoded in the UnixFS data field is the arity of the HAMT, the maximum number of children for each node, calculated from bits as Math.pow(2, bits), the number of different values that numbers represented bits bits can give you.
  • bytes is trickier to fit in here, but it would be something like Math.ceil(bits / 8), but arguably not very meaningful and probably too confusing.
  • prefixLength, or padLength, etc. is more helpful for UnixFS HAMTs and is the number of characters to fit in a hex representation of the prefix bytes. Either like it's done in here: (fanout - 1).toString(16).length (where fanout is bucket.tableSize()), or Math.ceil(Math.log2(fanout) / 8 * 2), or Math.ceil(bits / 8 * 2).

@rvagg rvagg mentioned this pull request Aug 25, 2023
@rvagg
Copy link
Member

rvagg commented Aug 25, 2023

Made my suggested changes in #357 - going with "bits" - the nice thing about using bits is that you don't have to worry about divisibility. You can't really do a fanout that's not a power of 2, so start low and power up from the bit count.

@alanshaw alanshaw merged commit f42a40f into fix/pad-length Aug 25, 2023
1 check passed
@alanshaw alanshaw deleted the feat/add-fanout-config branch August 25, 2023 06:19
@achingbrain
Copy link
Member Author

It was bytes because the shard split threshold option next to it was bytes so it was a (flawed) attempt to keep the units consistent, but quite right bits is more correct. Thanks all.

rvagg added a commit that referenced this pull request Aug 25, 2023
* feat: add config option to control fanout size

Adds a `shardFanoutBytes` option to the importer to allow configuring
the number of bytes used for the HAMT prefix, also a test.

* fix: use fanout "bits" (#357)

---------

Co-authored-by: Rod Vagg <rod@vagg.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants