Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(loaders): Introduce more minimal required Zarr interface #700

Merged
merged 3 commits into from
Oct 13, 2023

Conversation

manzt
Copy link
Member

@manzt manzt commented Sep 30, 2023

An alternative to #698. This removes the code paths for importing from ZarrJS in the ZarrPixelSource. It should be straightforward to introduce an adapter for zarrita to support the ZarrSource interface.

import { FetchStore, open, get } from "zarrita";

function zarrSourceAdapter(arr) {
  return {
    dtype: convertToV2DataType(arr.dtype),
    shape: arr.shape,
    chunks: arr.chunks,
    getRaw: (selection, { storeOptions }) => get(arr, selection, { opts: storeOptions }),
  }
}

let store = new FetchStore("http://localhost:8080/data.zarr");
let arr = await open(store, { kind: "array" });
let pixelSource = new ZarrPixelSource(zarrSourceAdapter(arr), ...);

Aligning the TypeScript types for these libraries is really going to be a pain. Some of the typing is really wack in ZarrJS, and the type inference in Zarrita are much more precise so they don't align well. But we can work on that later. The more important thing is making sure ZarrJS can be treeshaken if zarrita is used.

Checklist

  • Update JSdoc types if there is any API change.
  • Make sure Avivator works as expected with your change.

@manzt manzt mentioned this pull request Sep 30, 2023
2 tasks
@manzt
Copy link
Member Author

manzt commented Sep 30, 2023

Here's what I came up with:

import type { Array as ZarrArray, DataType } from "zarrita";
import type { Readable } from "@zarrita/storage";

export function zarrJSAdapter(
  arr: ZarrArray<DataType, Readable>
): Pick<import("zarr").ZarrArray, "dtype" | "shape" | "chunks" | "getRaw"> {
  const dtypes = {
    "uint8": "|u1",
    "uint16": "<u2",
    "uint32": "<u4",
    "int8": "|i1",
    "int16": "<i2",
    "int32": "<i4",
    "float32": "<f4",
    "float64": "<f8",
  } as const;
  const dtype = dtypes[arr.dtype as keyof typeof dtypes];
  if (!dtype) {
    throw Error(`Unsupported dtype: ${arr.dtype}`);
  }
  return {
    dtype: dtype,
    shape: arr.shape,
    chunks: arr.chunks,
    getRaw(selection, getOptions) {
      return zarrGet(arr, selection, { opts: getOptions.storeOptions }) as any;
    }
  };
}

@keller-mark
Copy link
Member

It looks good, I cant really figure out how to test it locally in Vitessce though. I tried checking out this branch and then

pnpm install
pnpm run build
mkdir consumer
pnpm -r exec pnpm pack --pack-destination $(pwd)/consumer/
cd ../vitessce
cd packages/gl
pnpm install ../../../viv/consumer/hms-dbmi-viv-0.13.8.tgz
pnpm install ../../../viv/consumer/vivjs-loaders-0.13.8.tgz

but it still seems to be using the old copy of ZarrPixelSource.

Maybe we should make a pre-release to be able to test it out?

I also see some remaining imports from ZarrJS in packages/loaders/dist/index.mjs - I assume bundlers like Vite are smart enough to tree-shake these if i am not importing any of the functions/classes that use them, right?
Screenshot 2023-10-02 at 8 38 03 AM

@manzt
Copy link
Member Author

manzt commented Oct 9, 2023

Hmm, let me have a closer look.

Maybe we should make a pre-release to be able to test it out?

I think that would be a great idea when this gets merged.

I also see some remaining imports from ZarrJS in packages/loaders/dist/index.mjs - I assume bundlers like Vite are smart enough to tree-shake these if i am not importing any of the functions/classes that use them, right?

Yes, that should work with sideEffects: false in the package.json of ZarrJS. We can also test out bundling from a pre-release.

import { ZarrPixelSource } from "@vivjs/loaders";
console.log(ZarrPixelSource);

in a Vite project.

@manzt
Copy link
Member Author

manzt commented Oct 9, 2023

but it still seems to be using the old copy of ZarrPixelSource.

Ah, you know what. I bet @hms-dbmi/viv points to a version of @vivjs/loaders on npm (and doesn't know to link to the local version).

@manzt
Copy link
Member Author

manzt commented Oct 9, 2023

I can confirm that the above works with Vite (without adding any extra ZarrJS or Zarrita dependencies). However, I inspecting the output reveals that:

  • Blosc/Gzip/Zlib end up in the final bundle (We could import from zarr/core instead)
  • addDecoder calls from geotiff end up in final bundle (We could dynamically se addDecoder in any tiff path rather than top level)
  • all of fast-xml-parser ends up in final bundle (switch this dependency)

I can address each of these in separate PRs. The ZarrJS codecs comes from the import from the top-level "zarr" importing blosc/gzip/zlib (rather than using a dynamic import). This was a choice that was made at a time dynamic imports weren't well supported, but causes this issue with bundling. I added the zarr/core entry to allow avoiding this code path, so for right now the workaround to avoid any ZarrJS code ending up in the final bundle would be to use the following vite config:

import { defineConfig } from 'vite';

export default defineConfig({
  resolve: {
    alias: {
      zarr: 'zarr/core',
    }
  },
  build: {
    lib: {
      entry: './main.js',
      formats: ['es']
    },
    minify: false
  }
});

I think we can just import from zarr/core in viv to avoid this issue for library consumers.

@manzt
Copy link
Member Author

manzt commented Oct 9, 2023

If I add sideEffects: false to geotiff and fast-xml-parser the final bundle of the above is 128 lines of code:

// main.js
import { ZarrPixelSource } from "@vivjs/loaders";
console.log(ZarrPixelSource);
// bundle.js
function isInterleaved(shape) {
  const lastDimSize = shape[shape.length - 1];
  return lastDimSize === 3 || lastDimSize === 4;
}
function getDims(labels) {
  const lookup = new Map(labels.map((name, i) => [name, i]));
  if (lookup.size !== labels.length) {
    throw Error("Labels must be unique, found duplicated label.");
  }
  return (name) => {
    const index = lookup.get(name);
    if (index === void 0) {
      throw Error("Invalid dimension.");
    }
    return index;
  };
}
function getImageSize(source) {
  const interleaved = isInterleaved(source.shape);
  const [height, width] = source.shape.slice(interleaved ? -3 : -2);
  return { height, width };
}
function getIndexer(labels) {
  const size = labels.length;
  const dims = getDims(labels);
  return (sel) => {
    if (Array.isArray(sel)) {
      return [...sel];
    }
    const selection = Array(size).fill(0);
    for (const [key, value] of Object.entries(sel)) {
      selection[dims(key)] = value;
    }
    return selection;
  };
}
const DTYPE_LOOKUP = {
  u1: "Uint8",
  u2: "Uint16",
  u4: "Uint32",
  f4: "Float32",
  f8: "Float64",
  i1: "Int8",
  i2: "Int16",
  i4: "Int32"
};
function slice(start, stop) {
  return { start, stop, step: 1, _slice: true };
}
class BoundsCheckError extends Error {
}
class ZarrPixelSource {
  constructor(data, labels, tileSize) {
    this.labels = labels;
    this.tileSize = tileSize;
    this._indexer = getIndexer(labels);
    this._data = data;
  }
  get shape() {
    return this._data.shape;
  }
  get dtype() {
    const suffix = this._data.dtype.slice(1);
    if (!(suffix in DTYPE_LOOKUP)) {
      throw Error(`Zarr dtype not supported, got ${suffix}.`);
    }
    return DTYPE_LOOKUP[suffix];
  }
  get _xIndex() {
    const interleave = isInterleaved(this._data.shape);
    return this._data.shape.length - (interleave ? 2 : 1);
  }
  _chunkIndex(selection, { x, y }) {
    const sel = this._indexer(selection);
    sel[this._xIndex] = x;
    sel[this._xIndex - 1] = y;
    return sel;
  }
  _getSlices(x, y) {
    const { height, width } = getImageSize(this);
    const [xStart, xStop] = [
      x * this.tileSize,
      Math.min((x + 1) * this.tileSize, width)
    ];
    const [yStart, yStop] = [
      y * this.tileSize,
      Math.min((y + 1) * this.tileSize, height)
    ];
    if (xStart === xStop || yStart === yStop) {
      throw new BoundsCheckError("Tile slice is zero-sized.");
    } else if (xStart < 0 || yStart < 0 || xStop > width || yStop > height) {
      throw new BoundsCheckError("Tile slice is out of bounds.");
    }
    return [slice(xStart, xStop), slice(yStart, yStop)];
  }
  async _getRaw(selection, getOptions) {
    const result = await this._data.getRaw(selection, getOptions);
    if (typeof result !== "object") {
      throw new Error("Expected object from getRaw");
    }
    return result;
  }
  async getRaster({
    selection,
    signal
  }) {
    const sel = this._chunkIndex(selection, { x: null, y: null });
    const result = await this._getRaw(sel, { storageOptions: { signal } });
    const {
      data,
      shape: [height, width]
    } = result;
    return { data, width, height };
  }
  async getTile(props) {
    const { x, y, selection, signal } = props;
    const [xSlice, ySlice] = this._getSlices(x, y);
    const sel = this._chunkIndex(selection, { x: xSlice, y: ySlice });
    const tile = await this._getRaw(sel, { storageOptions: { signal } });
    const {
      data,
      shape: [height, width]
    } = tile;
    return { data, height, width };
  }
  onTileError(err) {
    if (!(err instanceof BoundsCheckError)) {
      throw err;
    }
  }
}
console.log(ZarrPixelSource);

@manzt
Copy link
Member Author

manzt commented Oct 9, 2023

Side note: I wonder if we could just use the web DOMParser: https://developer.mozilla.org/en-US/docs/Web/API/DOMParser instead of fast-xml-parser. It's a really large dep at 50kb uncompressed and we just use it to parse some fields from the omxml. At the least, we could use txml, like geotiff.

https://observablehq.com/d/4ee6fadace757b0f

Update: The implementation on Observable is able to reproduce the result of our omexml.fromString API for an example API ~50lines of code. We'll need to test with more OME-XML files, but I think we should really consider dropping fast-xml-parser.

@manzt manzt merged commit 7d5de72 into master Oct 13, 2023
5 checks passed
@manzt manzt deleted the manzt/abstract-zarr branch October 13, 2023 16:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants