No functioning example #489

Closed
v4lue4dded opened this issue Apr 7, 2024 · 11 comments

Comments

@v4lue4dded

I just spent an entire day trying to get parquet-wasm to read a Parquet file and console.log() the result, and couldn't get it done.
Admittedly, I'm a Python programmer and new to JavaScript.

However, as far as I could tell, none of the examples currently in the README.md work out of the box.

This is very unfortunate: since this is a JavaScript library, it should be possible to run a functioning example right in the GitHub Pages of a repo. (Not necessarily this repo, but just some example repo with code that already runs.)

Something similar to https://hyparam.github.io/hyparquet/ would go a long way toward making this library a lot more user-friendly to people like me.

For now I will be giving up on this library, since I cannot get it to work in a reasonable amount of time.

@kylebarron
Owner

These two Observable examples are online, reproducible examples: https://github.com/kylebarron/parquet-wasm#published-examples

@v4lue4dded
Author

Thank you for the reply :)

I did see the Observable examples, but I admittedly found the platform very clunky and unintuitive to use, and it felt like it was hiding a lot of code from me.

By now I have figured out how to download the code from the example. From what I can tell, though, it is not plain JavaScript at work but some proprietary wrapper around the JavaScript that I can't really replicate.

//...



function _d3(require){return(
require("https://d3js.org/d3.v5.min.js")
)}

function _mapboxgl(require){return(
require("mapbox-gl@1.6.0/dist/mapbox-gl.js")
)}

function _arrow(require){return(
require("apache-arrow")
)}

function _deck(require){return(
require.alias({
  h3: {}
})("deck.gl@8.9/dist.min.js")
)}

function _deckgl(mapContainer,deck,mapboxgl)
{
  // This is an Observable hack: clear previously generated content
  mapContainer.innerHTML = "";

  return new deck.DeckGL({
    // The HTML container to render into
    container: mapContainer,
    map: mapboxgl,
    mapStyle:
      "https://basemaps.cartocdn.com/gl/positron-nolabels-gl-style/style.json",

    // Viewport settings
    initialViewState: {
      longitude: 0,
      latitude: 15,
      zoom: 1,
      pitch: 0,
      bearing: 0
    },
    controller: true
  });
}


export default function define(runtime, observer) {
  const main = runtime.module();
  function toString() { return this.url; }
  const fileAttachments = new Map([
    ["2019-01-01_performance_mobile_tiles_centroids_brotli@2.parquet", {url: new URL("./files/ad0a1f0e7e5cc8290068443d99bbd1307877e1ba631e30622bbd5fd8adca660d2644fe8181db5dbd8d41be0c2eae868304deeb0efc8690d373553dcb859bc767.bin", import.meta.url), mimeType: "application/octet-stream", toString}]
  ]);
  main.builtin("FileAttachment", runtime.fileAttachments(name => fileAttachments.get(name)));
  main.variable(observer()).define(["md"], _1);
  main.variable(observer()).define(["md"], _2);
  main.variable(observer()).define(["md"], _3);
  main.variable(observer()).define(["md"], _4);
  main.variable(observer()).define(["md"], _5);
  main.variable(observer()).define(["md"], _6);
  main.variable(observer()).define(["md"], _7);
  main.variable(observer()).define(["md"], _8);
  main.variable(observer("viewof form")).define("viewof form", ["Inputs"], _form);
  main.variable(observer("form")).define("form", ["Generators", "viewof form"], (G, _) => G.input(_));
  main.variable(observer("mapContainer")).define("mapContainer", ["html"], _mapContainer);
  main.variable(observer("metricMapping")).define("metricMapping", _metricMapping);
  main.variable(observer("readParquet")).define("readParquet", _readParquet);
  main.variable(observer("arrowTable")).define("arrowTable", ["parquetFile","readParquet","arrow"], _arrowTable);
  main.variable(observer("parquetFile")).define("parquetFile", ["FileAttachment"], _parquetFile);
  main.variable(observer("geometryColumn")).define("geometryColumn", ["arrowTable"], _geometryColumn);
  main.variable(observer("flatCoordinateArray")).define("flatCoordinateArray", ["geometryColumn"], _flatCoordinateArray);
  main.variable(observer("layer")).define("layer", ["arrowTable","flatCoordinateArray","colorAttribute","deck","deckgl"], _layer);
  main.variable(observer("colorAttribute")).define("colorAttribute", ["metricMapping","form","arrowTable","colorScale"], _colorAttribute);
  main.variable(observer("colorScale")).define("colorScale", ["d3","form"], _colorScale);
  main.variable(observer("d3")).define("d3", ["require"], _d3);
  main.variable(observer("mapboxgl")).define("mapboxgl", ["require"], _mapboxgl);
  main.variable(observer("arrow")).define("arrow", ["require"], _arrow);
  main.variable(observer("deck")).define("deck", ["require"], _deck);
  main.variable(observer("deckgl")).define("deckgl", ["mapContainer","deck","mapboxgl"], _deckgl);
  return main;
}

I'll probably try again next weekend to unwrap that code to see if I can get it working for my project.

Both examples do seem to use outdated versions of the library, though:
https://observablehq.com/@bmschmidt/hello-parquet-wasm uses https://unpkg.com/parquet-wasm@0.1.1/web.js
which seems like a very early version,
and
https://observablehq.com/@kylebarron/geoparquet-on-the-web uses https://unpkg.com/parquet-wasm@0.4.0-beta.5/esm/arrow2.js
which is no longer recommended since it is the arrow2 build, if I understand things correctly.

It would have been very useful to a JavaScript beginner like me to have a very simple example on GitHub Pages that uses the currently recommended version of the library to read a complete Parquet file (either a small example from the GitHub repo or a drop-in file) and display the result on screen.
That would be a lot easier for me to iterate from.

@kylebarron
Owner

> which is no longer recommended since it is the arrow2 build, if I understand things correctly

The arrow2 API is deprecated and won't receive updates, but it should still work. The API of the latest beta is very similar to the previous API though.

> It would have been very useful to a JavaScript beginner like me to have a very simple example on GitHub Pages that uses the currently recommended version of the library to read a complete Parquet file (either a small example from the GitHub repo or a drop-in file) and display the result on screen.
> That would be a lot easier for me to iterate from.

I agree that would be nice, but I don't have time to create a standalone example at this point. Contributions (from you or someone else) would be welcome.

In general, I find the easiest way to get started is to use the type hints on each function as a guide for how to fetch data.

@kylebarron
Owner

In case it's useful to you, I'm using this in production here: https://github.com/developmentseed/lonboard/blob/dca942da9b5bd40769068a76c45e76c9b1c9c49c/src/parquet.ts

@kylebarron
Owner

kylebarron commented Apr 21, 2024

I published 0.6.0, added new content to the README, and updated https://observablehq.com/@kylebarron/geoparquet-on-the-web to use parquet-wasm 0.6. Hopefully this is easier to follow.

@mbostock

This should work in vanilla JavaScript:

import initParquetWasm, {readParquet} from "https://cdn.jsdelivr.net/npm/parquet-wasm@0.6.0/+esm";

await initParquetWasm("https://cdn.jsdelivr.net/npm/parquet-wasm@0.6.0/esm/parquet_wasm_bg.wasm");

(Unfortunately the default path to parquet_wasm_bg.wasm doesn’t work when using /+esm because it resolves to the wrong directory. I think it’s possible that it would work if you used import.meta.resolve instead of new URL(…, import.meta.url), but I’m not sure whether jsDelivr will rewrite import.meta.resolve calls to fix the relative path when using /+esm.)

@kylebarron
Owner

It does work for me (at least in Deno) with

import initParquetWasm, {readParquet} from "https://cdn.jsdelivr.net/npm/parquet-wasm@0.6.0/esm/parquet_wasm.js";
await initParquetWasm();

I don't know if it's possible to rewrite the import with +esm. I specifically enabled that path as a known entry point so that import "parquet-wasm/esm/parquet_wasm.js" would work both in an application and from a browser.

"./esm/parquet_wasm.js": {
  "types": "./esm/parquet_wasm.d.ts",
  "default": "./esm/parquet_wasm.js"
},

> I think it’s possible that it would work if you used import.meta.resolve instead of new URL(…, import.meta.url)

That part is auto-generated by wasm-bindgen, so it's not something easy for me to change.

@mbostock

Yes, that would work too. The /+esm is nice because it bundles and minifies local imports, so the module publisher (you) typically doesn’t have to build and publish the bundle; the CDN does it.

It also works if you do this:

import initParquetWasm, {readParquet} from "https://cdn.jsdelivr.net/npm/parquet-wasm@0.6.0/esm/+esm";

await initParquetWasm();

This uses your ./esm entry point, and because it’s in the same folder as the source file, the relative path to the .wasm file works.
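
To see why the base URL matters, here is a small sketch of how a relative .wasm path resolves against each module URL discussed above. It assumes the generated loader looks up the file name parquet_wasm_bg.wasm (taken from the explicit URL earlier in this thread) with new URL(name, import.meta.url); the helper name resolveWasmUrl is made up for illustration.

```javascript
// Hypothetical helper: mimics `new URL(fileName, import.meta.url)` for a
// module that was loaded from `moduleUrl`.
function resolveWasmUrl(moduleUrl, fileName = "parquet_wasm_bg.wasm") {
  return new URL(fileName, moduleUrl).href;
}

const base = "https://cdn.jsdelivr.net/npm/parquet-wasm@0.6.0";

// Root-level /+esm: the relative path lands in the package root,
// one level above the esm/ folder that actually holds the .wasm file.
const fromRoot = resolveWasmUrl(`${base}/+esm`);

// esm/+esm and esm/parquet_wasm.js both resolve inside the esm/ folder.
const fromEsm = resolveWasmUrl(`${base}/esm/+esm`);
const fromFile = resolveWasmUrl(`${base}/esm/parquet_wasm.js`);

console.log(fromRoot); // https://cdn.jsdelivr.net/npm/parquet-wasm@0.6.0/parquet_wasm_bg.wasm
console.log(fromEsm);  // https://cdn.jsdelivr.net/npm/parquet-wasm@0.6.0/esm/parquet_wasm_bg.wasm
console.log(fromFile === fromEsm); // true
```

Only the last path segment of the module URL is replaced, which is why the two esm/ variants find the file and the root-level /+esm variant needs the explicit URL passed to the init function.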

I would consider using import.meta.resolve instead of import.meta.url though, as it’s the more semantic way of resolving a relative resource.

Also, I think you’ll want to add the .wasm to your exports map in the package.json because these files are part of your module’s public API and you expect people to load them.
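
As a rough illustration of that suggestion: tools that honor the exports field, Node.js among them, refuse subpaths that are not listed in it. The sketch below models the map as a plain object with an exact-match lookup. The .wasm entry is hypothetical (it is not copied from the published package.json), and resolveExport is an invented helper that ignores "*" patterns and condition ordering.

```javascript
// Exports map modeled on the snippet quoted earlier in this thread.
// The ".wasm" entry is an assumption showing what exposing the binary could look like.
const exportsMap = {
  "./esm/parquet_wasm.js": {
    types: "./esm/parquet_wasm.d.ts",
    default: "./esm/parquet_wasm.js",
  },
  "./esm/parquet_wasm_bg.wasm": "./esm/parquet_wasm_bg.wasm",
};

// Simplified exact-match lookup; returns null when the subpath is not exported.
function resolveExport(map, subpath) {
  if (!Object.prototype.hasOwnProperty.call(map, subpath)) {
    return null; // not part of the package's public API
  }
  const target = map[subpath];
  // A string target is used directly; a conditions object falls back to "default".
  return typeof target === "string" ? target : target.default ?? null;
}

console.log(resolveExport(exportsMap, "./esm/parquet_wasm_bg.wasm"));
console.log(resolveExport(exportsMap, "./esm/not_listed.wasm"));
```

With the entry present, the .wasm subpath resolves like any other export; without it, a strict resolver would treat the file as private.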

@kylebarron
Owner

Thanks for the tips!

> Yes, that would work too. The /+esm is nice because it bundles and minifies local imports, so the module publisher (you) typically doesn’t have to build and publish the bundle; the CDN does it.

Oh very cool. I probably should suggest that from the README.

> I would consider using import.meta.resolve instead of import.meta.url though, as it’s the more semantic way of resolving a relative resource.

I see. That makes sense. MDN does say

> you should use import.meta.resolve(moduleName) instead of new URL(moduleName, import.meta.url) for these use cases wherever possible

I'll make an issue in wasm-bindgen tomorrow.

> Also, I think you’ll want to add the .wasm to your exports map in the package.json because these files are part of your module’s public API and you expect people to load them.

Thanks for pointing this out. I see duckdb-wasm does this too. https://github.com/duckdb/duckdb-wasm/blob/58fcb9a46b73eac1abb9b0dee9d7c46d1a84f628/packages/duckdb-wasm/package.json#L99-L101

@v4lue4dded
Author

> In case it's useful to you, I'm using this in production here: https://github.com/developmentseed/lonboard/blob/dca942da9b5bd40769068a76c45e76c9b1c9c49c/src/parquet.ts

@kylebarron FYI: I got it working a week ago with that code snippet. Sorry that I hadn't answered yet!! Thanks for that!!
I had to use the bundler webpack, though, which was a bit of a step for me. ^^

Do I understand correctly that (#489 (comment)) means it would work without a bundler, just with a cdn.jsdelivr.net import? :)

That would be really cool!!

@kylebarron
Owner

> it would work without a bundler, just with a cdn.jsdelivr.net import? :)

Yes. But you need to make sure you manually initialize the wasm code, whereas with the bundler entry point the wasm should be initialized behind the scenes, I think.
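
A minimal sketch of what that bundler-free flow could look like end to end, including the manual initialization step. Several things here are assumptions rather than anything confirmed in this thread: the unpinned apache-arrow CDN URL, the intoIPCStream() call (based on my reading of the 0.6 README), and the helper names loadLibraries and parquetUrlToArrowTable, which are invented for illustration.

```javascript
// The parquet-wasm URL comes from this thread; the apache-arrow URL is an
// assumption (unpinned here, so pin a version for real use).
const PARQUET_WASM_URL = "https://cdn.jsdelivr.net/npm/parquet-wasm@0.6.0/esm/+esm";
const ARROW_URL = "https://cdn.jsdelivr.net/npm/apache-arrow/+esm";

// Load both libraries from the CDN and initialize the wasm binary by hand.
async function loadLibraries() {
  const [parquet, arrow] = await Promise.all([
    import(PARQUET_WASM_URL),
    import(ARROW_URL),
  ]);
  await parquet.default(); // the default export is the init function
  return { readParquet: parquet.readParquet, tableFromIPC: arrow.tableFromIPC };
}

// Fetch a Parquet file and decode it into an Arrow JS table.
// `libs` and `fetchImpl` are parameters so they can be swapped out in tests.
async function parquetUrlToArrowTable(url, libs, fetchImpl = fetch) {
  const response = await fetchImpl(url);
  if (!response.ok) {
    throw new Error(`Failed to fetch ${url}: ${response.status}`);
  }
  const bytes = new Uint8Array(await response.arrayBuffer());
  const wasmTable = libs.readParquet(bytes); // table held in wasm memory
  return libs.tableFromIPC(wasmTable.intoIPCStream()); // convert to an Arrow JS table
}
```

In a browser module script, usage would be along the lines of `const libs = await loadLibraries();` followed by `console.log((await parquetUrlToArrowTable("./example.parquet", libs)).toArray());`, where ./example.parquet is a placeholder path. Calling readParquet before the init promise resolves is the failure mode to watch for when there is no bundler doing the setup.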

I made a PR to update the jsdelivr link in the readme, and made new issues for the other comments above. So I think this issue can be closed.
