-
Notifications
You must be signed in to change notification settings - Fork 23
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Async indexing with concurrency limit (and progress callback) #11
Conversation
This demo times multiple requests to a zarr store containing (3, 5000, 5000) array. Toggling between my build and v0.1.3 (and playing with the concurrency) demonstrates ~2x increase in performance for these 2D slices (5000 x 5000) of the data. Make sure to turn off cache in your browser's network tab when trying this demo. <!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<meta http-equiv="X-UA-Compatible" content="ie=edge" />
<title>zarr.js demo</title>
</head>
<body>
<!-- <script src="https://unpkg.com/zarr@0.1.3/dist/zarr.umd.js"></script> -->
<script src="./dist/zarr.umd.js"></script>
<script>
const baseUrl =
'https://gist.githubusercontent.com/' +
'manzt/d16dbac0ea3adc3c7b9b61f54fa1f78d';
const config = {
store: `${baseUrl}/raw/`,
path: '95854058512862accb0182d4a02f86a55ad19139',
mode: 'r',
}; // (3, 5000, 5000) array
async function main() {
// connect to HTTPStore
const z = await zarr.openArray(config);
const times = [];
const iters = 5;
// repeat the same request for the (5000, 5000) slice
for (let i = 1; i <= iters; i++) {
console.time(`index [0, null, null] (${i} of ${iters})`);
const t0 = performance.now();
const slice = await z.get([0, null, null], {
concurrencyLimit: 10,
progressCallback: progress => console.log(progress),
});
const t1 = performance.now();
console.timeEnd(`index [0, null, null] (${i} of ${iters})`);
times.push(t1 - t0);
}
// summarize stats
const n = times.length;
const mean = times.reduce((a, b) => a + b) / n;
const s = Math.sqrt(
times.map(x => (x - mean) ** 2).reduce((a, b) => a + b) / n
);
console.log(`mean time ${mean}ms, sd ${s}ms`);
}
main();
</script>
</body>
</html> Try fetching the whole array, const slice = await z.get(null, {
concurrencyLimit: 10,
progressCallback: progress => console.log(progress)
} to see ~4x speed up (12s vs 3.5s). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for picking this up 👍
I left some comments that I think can improve it, let me know whether you agree with them
I love that you can use GitHub Gist as a backend! I'm so excited for how this library will be used in the future :) |
It was a bit of a hack since directories are not supported on GitHub Gist. I just created an empty gist, cloned it, dumped the contents of |
Ok! I think I found something really interesting. This seems perfect for our situation, and solves the callback issue. See most recent commits. |
However, I'm still a bit confused with how to supply the default args in public async getBasicSelection(selection: Slice | ":" | "..." | null | (Slice | null | ":" | "...")[]): Promise<NestedArray<TypedArray>>;
public async getBasicSelection(selection: ArraySelection): Promise<NestedArray<TypedArray> | number>;
public async getBasicSelection(selection: ArraySelection, { concurrencyLimit = 10 }: StoreGetOptions): Promise<number | NestedArray<TypedArray>> {
// Refresh metadata
if (!this.cacheMetadata) {
await this.reloadMetadata();
}
// Check fields (TODO?)
if (this.shape === []) {
throw new Error("Shape [] indexing is not supported yet");
} else {
return this.getBasicSelectionND(selection, concurrencyLimit, progressCallback);
//progressCallback isn't defined so how do I pass this explicitly?
}
} Then would we expect to see the subsequent definitions like: private getBasicSelectionND(selection: ArraySelection, concurrencyLimit: number, progressCallback: ((progressUpdate: { progress: number; queueSize: number }) => void) | undefined) {
const indexer = new BasicIndexer(selection, this);
return this.getSelection(indexer, concurrencyLimit, progressCallback);
} |
The way I see it is that this is actually not function overloading, but instead what different types of input (signatures) you can give and then what type that returns. The last one in the list is the "actual" one, the ones above are a subset of it. In the case of this getSelection function it strongly types that if you select with a slice of some sort you will always get back a NestedArray and not a number. Whereas if you pass in an integer it may return a single number. Hopefully that makes sense and helps |
I guess I'm confused because the defaults are ignored when the other signatures are used, and when I try to supply defaults (with destructuring) to the other signatures, the compiler complains
|
public async getBasicSelection(selection: Slice | ":" | "..." | null | (Slice | null | ":" | "...")[], opts?: GetOptions): Promise<NestedArray<TypedArray>>;
public async getBasicSelection(selection: ArraySelection, opts?: GetOptions): Promise<NestedArray<TypedArray> | number>;
public async getBasicSelection(selection: ArraySelection, {concurrencyLimit = 10}: GetOptions): Promise<number | NestedArray<TypedArray>> { Does something like this work perhaps? |
Alright, with a bit of head banging I think I figured it out! Thanks for the guidance & patience 🙏 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe it's worth adding a test case for the progressCallback, but we can do that another day. Thank you for the contribution!
Rationale:
Introduces:
options
object as a parameter toZarrArray.get
. Theoptions
object exposesconcurrencyLimit
(limit of number of concurrent requests), andprogressCallback
(function which takes the current progress of the indexing 0 -> 1 as an argument) fields.Usage:
I think there is something much cleaner to do with the
StoreGetOptions
interface, but I am fairly new to Typescript. Hoping you have some suggestions there!I decided to use
p-limit
as it seems to be the most popular tool in this space (used by 3.8M) and is maintained actively. It also provides a nice way to get the number of items remaining in the queue, which I use.For some reason optional chaining is was not working in typescript, so there are some rather ugly blocks performing checks for
null
andundefined
.