
[Feat] support GeoArrow format #2385

Merged
merged 43 commits into keplergl:master on Nov 9, 2023

@lixun910 (Collaborator) commented Oct 20, 2023

Description

GeoArrow support in kepler.gl will enable efficient loading of big data. For example, loading 1 million polygons takes ~2 seconds in Arrow format vs. ~20 seconds in GeoJSON format:

[Image: arrow-polygons; screenshot 2023-10-20 at 10:59:59 AM]

GeoParquet is a file format, while GeoArrow is an in-memory format; both can be saved to files, e.g. .parquet and .arrow. Arrow memory is zero-copy and supports constant-time random access, so it can be a very efficient format for exchanging data between programs (JavaScript, C++, WebAssembly, Rust, Python).
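A minimal sketch of what zero-copy, constant-time access means in practice, assuming a fixed-width column backed by one contiguous typed array (as Arrow's numeric columns are). `ColumnView` is a hypothetical class for illustration, not part of apache-arrow or kepler.gl.

```typescript
// Hypothetical column wrapper (not the apache-arrow API): a column is a
// contiguous typed array, so row access is O(1) and slicing shares memory.
class ColumnView {
  private readonly values: Float64Array;

  constructor(values: Float64Array) {
    this.values = values;
  }

  get length(): number {
    return this.values.length;
  }

  // Constant-time random access: a direct index into the backing buffer.
  get(i: number): number {
    if (i < 0 || i >= this.values.length) {
      throw new RangeError(`row ${i} out of bounds`);
    }
    return this.values[i];
  }

  // Zero-copy slice: subarray() returns a new view over the same memory.
  slice(begin: number, end: number): ColumnView {
    return new ColumnView(this.values.subarray(begin, end));
  }

  get buffer(): ArrayBufferLike {
    return this.values.buffer;
  }
}

// One shared buffer, e.g. handed over by a worker or a WebAssembly module.
const shared = new Float64Array([1.5, 2.5, 3.5, 4.5]);
const column = new ColumnView(shared);
const tail = column.slice(2, 4);

console.log(column.get(1)); // 2.5
console.log(tail.get(0)); // 3.5
console.log(tail.buffer === column.buffer); // true: same memory, nothing copied
```

Because the slice is only a view, another program holding the same buffer sees the same bytes without serialization.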

Details

  • A new GeoArrow layer

deck.gl's GeoJsonLayer can already load binary geometries directly, so this pull request derives the new ArrowLayer from kepler.gl's GeoJsonLayer.

Supported geometry types (see https://github.com/geoarrow/geoarrow/blob/main/extension-types.md#extension-names):

  • geoarrow.point
  • geoarrow.multipoint
  • geoarrow.linestring
  • geoarrow.multilinestring
  • geoarrow.polygon
  • geoarrow.multipolygon
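For a sense of how these types map to flat binary buffers, here is a sketch of reading one polygon out of a `geoarrow.polygon` column, assuming the interleaved (x, y) coordinate encoding: nested lists whose offsets point from polygons to rings and from rings to vertices. `PolygonColumn`, `numPolygons` and `getPolygon` are hypothetical names; apache-arrow exposes the same buffers through its nested Data/Vector objects.

```typescript
// Hypothetical plain view of a geoarrow.polygon column with interleaved
// coordinates, i.e. List<List<FixedSizeList<Float64, 2>>>.
interface PolygonColumn {
  geomOffsets: Int32Array; // polygon i -> rings [geomOffsets[i], geomOffsets[i + 1])
  ringOffsets: Int32Array; // ring r -> vertices [ringOffsets[r], ringOffsets[r + 1])
  coords: Float64Array; // x0, y0, x1, y1, ...
}

type Ring = Array<[number, number]>;

function numPolygons(col: PolygonColumn): number {
  return col.geomOffsets.length - 1;
}

// Finding polygon i is O(1); only that polygon's own vertices are read.
function getPolygon(col: PolygonColumn, i: number): Ring[] {
  if (i < 0 || i >= numPolygons(col)) {
    throw new RangeError(`polygon ${i} out of bounds`);
  }
  const rings: Ring[] = [];
  for (let r = col.geomOffsets[i]; r < col.geomOffsets[i + 1]; r++) {
    const ring: Ring = [];
    for (let v = col.ringOffsets[r]; v < col.ringOffsets[r + 1]; v++) {
      ring.push([col.coords[2 * v], col.coords[2 * v + 1]]);
    }
    rings.push(ring);
  }
  return rings;
}

// Two polygons: a unit square (5 closed vertices) and a triangle (4 closed vertices).
const polygons: PolygonColumn = {
  geomOffsets: new Int32Array([0, 1, 2]),
  ringOffsets: new Int32Array([0, 5, 9]),
  coords: new Float64Array([
    0, 0, 1, 0, 1, 1, 0, 1, 0, 0, // square
    2, 2, 3, 2, 2, 3, 2, 2, // triangle
  ]),
};

console.log(numPolygons(polygons)); // 2
console.log(JSON.stringify(getPolygon(polygons, 1))); // [[[2,2],[3,2],[2,3],[2,2]]]
```

The multi* types add one more offsets level in the same way, which is why the data can be handed to binary-capable layers without building per-feature GeoJSON objects.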

Picking and the table view are supported.

  • A column-wise data container

This PR adds a new column-wise data container, ArrowDataContainer, which implements the DataContainerInterface and is designed to make efficient use of the Arrow format's columnar data structures.
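To illustrate the column-wise idea, here is a sketch of a container that keeps columns as-is and materializes rows only on demand. `SimpleDataContainer` and `ColumnarContainer` are hypothetical; kepler.gl's real DataContainerInterface has more methods and its signatures may differ.

```typescript
// Hypothetical, trimmed-down interface for illustration only.
interface SimpleDataContainer {
  numRows(): number;
  numColumns(): number;
  valueAt(rowIndex: number, columnIndex: number): unknown;
  rowAsArray(rowIndex: number): unknown[];
}

// Columns can be plain arrays or typed arrays (e.g. Arrow value buffers).
type Column = ArrayLike<unknown>;

class ColumnarContainer implements SimpleDataContainer {
  private readonly columns: Column[];
  private readonly rowCount: number;

  constructor(columns: Column[]) {
    const rowCount = columns.length > 0 ? columns[0].length : 0;
    if (columns.some(col => col.length !== rowCount)) {
      throw new Error('all columns must have the same length');
    }
    this.columns = columns; // stored by reference: no copy, no per-row objects
    this.rowCount = rowCount;
  }

  numRows(): number {
    return this.rowCount;
  }

  numColumns(): number {
    return this.columns.length;
  }

  // Cell access is two lookups: pick the column, then the row.
  valueAt(rowIndex: number, columnIndex: number): unknown {
    return this.columns[columnIndex][rowIndex];
  }

  // Rows are assembled only when asked for, e.g. for a tooltip or table view.
  rowAsArray(rowIndex: number): unknown[] {
    return this.columns.map(col => col[rowIndex]);
  }

  // Whole-column access returns the original array, useful for GPU attributes.
  column(columnIndex: number): Column {
    return this.columns[columnIndex];
  }
}

const lng = new Float64Array([-122.4, -73.9]);
const container = new ColumnarContainer([lng, ['SFO', 'JFK']]);

console.log(container.numRows(), container.numColumns()); // 2 2
console.log(container.valueAt(1, 1)); // JFK
console.log(container.rowAsArray(0)); // [ -122.4, 'SFO' ]
```

The trade-off is that row-oriented operations pay a small assembly cost, while column-oriented ones (filtering, binning, GPU upload) read the buffers directly.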

  • CPU Filtering

GPU filtering in GeoJsonLayer works with ArrowLayer as-is. For CPU filtering, rather than filtering the raw Arrow table and making a partial copy of it, this PR adds a simple deck.gl layer extension that filters the GeoArrow layer based on the CPU-computed filteredIndex. This could affect other functions that rely on a filtered dataset; please help check. Thanks!
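One way to picture the CPU-filtering approach, assuming the extension consumes a 0/1 visibility value per vertex: turn `filteredIndex` (the row indices that pass the filters) into a row mask, then expand it to vertices using per-vertex feature ids such as those in loaders.gl's binary geometry format. The helper names are hypothetical, the shader side is omitted, and the PR's actual extension may work differently.

```typescript
// Hypothetical helpers: derive visibility from a CPU filter result without
// copying or re-slicing the Arrow table.

// 1 = row passes the filters, 0 = row is filtered out.
function buildRowMask(numRows: number, filteredIndex: ArrayLike<number>): Uint8Array {
  const mask = new Uint8Array(numRows); // zero-initialized: everything hidden
  for (let k = 0; k < filteredIndex.length; k++) {
    const row = filteredIndex[k];
    if (row >= 0 && row < numRows) mask[row] = 1;
  }
  return mask;
}

// Binary geometries store one entry per vertex; featureIds maps each
// vertex back to the row (feature) it belongs to.
function expandMaskToVertices(rowMask: Uint8Array, featureIds: ArrayLike<number>): Uint8Array {
  const vertexMask = new Uint8Array(featureIds.length);
  for (let v = 0; v < featureIds.length; v++) {
    vertexMask[v] = rowMask[featureIds[v]];
  }
  return vertexMask;
}

const rowMask = buildRowMask(4, [0, 2]);
const vertexMask = expandMaskToVertices(rowMask, new Uint32Array([0, 0, 1, 1, 2, 3]));

console.log(Array.from(rowMask)); // [ 1, 0, 1, 0 ]
console.log(Array.from(vertexMask)); // [ 1, 1, 0, 0, 1, 0 ]
```

Only two small typed arrays are rebuilt when the filter changes; the geometry buffers themselves stay untouched.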

Other: support drag-and-drop of GeoParquet/GeoArrow files in kepler.gl

kepler.gl currently uses loaders.gl/arrow v3. However, in loaders.gl/arrow versions < 4.0.0, the batched Arrow loader does not return correct data: it returns the raw data of each Arrow column, which kepler.gl then stores directly without the metadata. This is fixed in the latest loaders.gl/arrow v4 (see: https://github.com/visgl/loaders.gl/blob/2577ca735878b521f07a556f26ce8ee457a7ad9f/modules/arrow/src/lib/parse-arrow-in-batches.ts#L29).

One can call processArrowTable() directly to add Arrow data to kepler.gl, e.g.:

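```js
// Package paths assume kepler.gl's modular @kepler.gl/* packages.
import {Table as ApacheArrowTable} from 'apache-arrow';
import {fetchFile} from '@loaders.gl/core';
import {addDataToMap} from '@kepler.gl/actions';
import {processArrowTable} from '@kepler.gl/processors';

// Inside a React component connected to the kepler.gl store:
fetchFile(arrowUrl)
  .then(response => response.arrayBuffer())
  .then(buffer => {
    // Table.from() is the apache-arrow < 9 API; newer versions use tableFromIPC(buffer).
    const arrowTable = ApacheArrowTable.from([buffer]);
    // Convert the Arrow table into kepler.gl's dataset format.
    const parsedData = processArrowTable(arrowTable);
    this.props.dispatch(
      addDataToMap({
        datasets: {data: parsedData}
      })
    );
  });
```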

Two more tasks are needed to support drag-and-drop of GeoParquet/GeoArrow files in kepler.gl:

  • upgrade loaders.gl to v4
  • support drag-and-drop of GeoParquet/GeoArrow files in file-handler.ts
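A sketch of the kind of check a drag-and-drop handler could use to recognize these files. It sniffs the magic bytes defined by the Arrow IPC file format (`ARROW1`) and by Parquet (`PAR1`), and falls back to the file extension because Arrow IPC streams carry no magic bytes. `detectColumnarFormat` is a hypothetical helper, not the API of kepler.gl's file-handler.ts or of loaders.gl.

```typescript
type ColumnarFormat = 'arrow' | 'parquet';

const ARROW_MAGIC = 'ARROW1'; // start of an Arrow IPC *file* (streams have none)
const PARQUET_MAGIC = 'PAR1'; // start (and end) of a Parquet file

function startsWithAscii(bytes: Uint8Array, magic: string): boolean {
  if (bytes.length < magic.length) return false;
  for (let i = 0; i < magic.length; i++) {
    if (bytes[i] !== magic.charCodeAt(i)) return false;
  }
  return true;
}

// header: the first few bytes of the dropped file (e.g. from File.slice()).
function detectColumnarFormat(fileName: string, header: Uint8Array): ColumnarFormat | null {
  if (startsWithAscii(header, ARROW_MAGIC)) return 'arrow';
  if (startsWithAscii(header, PARQUET_MAGIC)) return 'parquet';
  // Fallback for Arrow IPC streams or very short headers.
  const ext = fileName.toLowerCase().split('.').pop();
  if (ext === 'arrow' || ext === 'feather') return 'arrow';
  if (ext === 'parquet') return 'parquet';
  return null;
}

const encoder = new TextEncoder();
console.log(detectColumnarFormat('flights.arrow', encoder.encode('ARROW1'))); // arrow
console.log(detectColumnarFormat('upload.bin', encoder.encode('PAR1'))); // parquet
console.log(detectColumnarFormat('points.csv', new Uint8Array([0x6c, 0x6e]))); // null
```

Checking content first means a mislabeled file is still routed to the right loader.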

Test Arrow files:

flights.arrow.zip
polygons.arrow.zip

ogr2ogr (from a GDAL build that includes the Arrow driver) can convert, e.g., a GeoJSON file to an Arrow file:

```shell
ogr2ogr -f Arrow test.arrow test.geojson
```

@lixun910 (Collaborator, Author) commented:

@kylebarron Hi Kyle, could you also help to take a look? Thanks!

src/deckgl-layers/src/geojson-layer/filter-arrow-layer.ts (outdated)
src/layers/src/geojson-layer/geojson-layer.ts (outdated)
src/processors/src/data-processor.ts (outdated)
src/processors/src/file-handler.ts
src/types/actions.d.ts (outdated)
src/utils/src/arrow-utils.ts (outdated)
src/utils/src/dataset-utils.ts (outdated)
src/utils/src/dataset-utils.ts (outdated)
src/utils/src/dataset-utils.ts (outdated)
test/node/utils/arrow-utils-test.js (outdated)
Signed-off-by: Xun Li <lixun910@gmail.com>
@lixun910 (Collaborator, Author) commented Nov 1, 2023

Now using the GeoArrow utils from loaders.gl/arrow v4.0.1, and drag-and-drop of GeoArrow files is supported in kepler.gl; see the screen recording:

[Screen recording: arrow-kepler]

examples/webpack.config.local.js
examples/webpack.config.local.js (outdated)
jest.setup.js (outdated)
src/deckgl-layers/src/geojson-layer/filter-arrow-layer.ts (outdated)
src/layers/src/arrow-layer/arrow-layer.ts (outdated)
src/processors/src/file-handler.ts (outdated)
src/processors/src/file-handler.ts (outdated)
src/processors/src/file-handler.ts (outdated)
src/effects/package.json (outdated)
src/layers/src/arrow-layer/arrow-layer.ts (outdated)
webpack/umd.js
@lixun910 (Collaborator, Author) commented Nov 3, 2023

@ibgreen @heshan0131 See a test of progressive rendering in kepler.gl using GeoArrow and loaders.gl v4:
GeoDaCenter#5

```ts
  return newBounds;
}

export default class GeoArrowLayer extends GeoJsonLayer {
```
Contributor commented:

Is there a reason we can't use the existing GeoJSON layer instead of creating a new layer type to support GeoArrow? All the configuration options are the same.

Collaborator (Author) replied:

Thanks, Shan! Now that the Arrow utils have moved to loaders.gl, I think we could handle the Arrow format as a special case in kepler.gl's GeoJsonLayer (or make GeoJsonLayer Arrow-compatible). Let me think about it.

@ibgreen (Collaborator) left a review:

Just re-read the code and it is looking great. Added some thoughts, but they can be addressed later or not at all.

src/layers/src/base-layer.ts (outdated)
src/layers/src/geojson-layer/geojson-layer.ts (outdated)
src/layers/src/geojson-layer/geojson-layer.ts
src/layers/src/geojson-layer/geojson-layer.ts (outdated)
src/layers/src/geojson-layer/geojson-layer.ts (outdated)
src/layers/src/layer-utils.ts

```ts
// parse fields
arrowTable.schema.fields.forEach((field: arrow.Field, index: number) => {
  const isGeometryColumn = field.metadata.get('ARROW:extension:name')?.startsWith('geoarrow');
```
Collaborator commented:

Nit: Does this mean knowledge about the GeoArrow extensions now lives both here and in loaders.gl/utils? Duplication may be hard to avoid, but it is always good to centralize knowledge about a given aspect in one part of the code, e.g. by reusing one of the utils from loaders.gl.

Collaborator (Author) replied:

Yes, agree. Will expose this in loaders.gl/utils and call it from here.
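A sketch of what a single, shared helper could look like, assuming field metadata is exposed as a `Map<string, string>` (as apache-arrow's `Field.metadata` is). Unlike `startsWith('geoarrow')`, it accepts only the six encodings this PR renders, so e.g. `geoarrow.wkb` columns are not mistaken for supported geometry. `getGeoArrowEncoding` is a hypothetical name, not the helper loaders.gl actually exposes.

```typescript
// The extension names listed in this PR's description.
const GEOARROW_EXTENSION_NAMES = [
  'geoarrow.point',
  'geoarrow.multipoint',
  'geoarrow.linestring',
  'geoarrow.multilinestring',
  'geoarrow.polygon',
  'geoarrow.multipolygon',
] as const;

type GeoArrowEncoding = (typeof GEOARROW_EXTENSION_NAMES)[number];

function isGeoArrowEncoding(name: string | undefined): name is GeoArrowEncoding {
  return name !== undefined && (GEOARROW_EXTENSION_NAMES as readonly string[]).includes(name);
}

// Returns the supported GeoArrow encoding of a column, or null.
// e.g. arrowTable.schema.fields.filter(f => getGeoArrowEncoding(f.metadata) !== null)
function getGeoArrowEncoding(metadata: Map<string, string>): GeoArrowEncoding | null {
  const name = metadata.get('ARROW:extension:name');
  return isGeoArrowEncoding(name) ? name : null;
}

console.log(getGeoArrowEncoding(new Map([['ARROW:extension:name', 'geoarrow.polygon']]))); // geoarrow.polygon
console.log(getGeoArrowEncoding(new Map([['ARROW:extension:name', 'geoarrow.wkb']]))); // null
console.log(getGeoArrowEncoding(new Map())); // null
```

Keeping the list in one module means adding a new encoding touches one place, which is the centralization the review asks for.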

src/processors/src/file-handler.ts (outdated)
src/utils/src/data-container-interface.ts (outdated)
src/utils/src/dataset-utils.ts (outdated)
@lixun910 merged commit d975ea1 into keplergl:master on Nov 9, 2023
7 checks passed
4 participants