Use hexWKB in JSON diff/show output #71

Merged: craigds merged 10 commits into master from hex-wkb-geometries on May 7, 2020

@craigds craigds commented May 4, 2020

This replaces GeoJSON geometries with (little-endian) hex-encoded WKB
in show --json and diff --json commands. --geojson output is
unaffected.

This:

  • reduces patch/diff size by quite a lot, especially for large
    geometries
  • makes patch generation/consumption more efficient; no pass via OGR is
    required in most circumstances.
  • improves support for exotic geometries (GeoJSON doesn't support
    curved geometry types)
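
For reference, a minimal sketch of what little-endian hex WKB looks like for a 2D point, using only the standard library. The function names are illustrative and not part of sno, and it only handles plain points (no SRID, Z or M values).

```python
import struct


def point_to_hex_wkb(x, y):
    """Encode a 2D point as little-endian hex WKB (illustrative helper)."""
    # Byte order flag 1 = little-endian, geometry type 1 = Point,
    # followed by the two coordinates as 8-byte doubles.
    return struct.pack("<BIdd", 1, 1, x, y).hex().upper()


def hex_wkb_to_point(hex_wkb):
    """Decode a hex WKB point back to an (x, y) tuple (illustrative helper)."""
    raw = bytes.fromhex(hex_wkb)
    # The first byte says which byte order the rest of the blob uses.
    fmt = "<" if raw[0] == 1 else ">"
    (geom_type,) = struct.unpack_from(fmt + "I", raw, 1)
    if geom_type != 1:
        raise ValueError(f"not a plain 2D point: geometry type {geom_type}")
    return struct.unpack_from(fmt + "dd", raw, 5)


print(point_to_hex_wkb(1.0, 2.0))
```

A point is 21 bytes of WKB, so 42 hex characters, which is where the size saving over a GeoJSON `{"type": "Point", "coordinates": [...]}` object comes from.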

fixes #62

Size of output

# points, with GeoJSON
$ sno show --json | wc -c
5405


# points, with hexWKB:
$ sno show --json | wc -c
4205

If you combine this with --json-style=extracompact from #70:

# points, with GeoJSON
$ sno show --json --json-style=extracompact | wc -c
2576


# points, with hexWKB:
$ sno show --json --json-style=extracompact | wc -c
2316
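
To put the byte counts above in relative terms, a quick calculation (the helper name is just for illustration):

```python
def reduction_pct(before, after):
    """Percentage size reduction going from `before` bytes to `after` bytes."""
    return round(100 * (before - after) / before, 1)


# Default JSON style: GeoJSON vs hexWKB
print(reduction_pct(5405, 4205))  # 22.2
# With --json-style=extracompact
print(reduction_pct(2576, 2316))  # 10.1
```

These figures are for a points dataset only; the description above expects larger savings for large geometries.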

@craigds craigds requested review from olsen232 and rcoup and removed request for olsen232 May 4, 2020 04:35
@rcoup rcoup left a comment

sno/diff_output.py
-g = gpkg.geom_to_ogr(v)
-f["geometry"] = json.loads(g.ExportToJson())
+g = gpkg.gpkg_geom_to_ogr(v)
+f['geometry'] = json.loads(g.ExportToJson())
Member

OT: I feel like I've written this code 1000 times. Maybe it's time to fix this in OGR?

(not right this second)

Member Author

@craigds craigds May 5, 2020

i.e. some kind of ogr.ExportToDeserializedJson() method?

One problem is that different bindings would have to implement it differently. I'm completely ignorant of how Java handles deserialized JSON but I'm not sure I want to know either...

Member

In principle, yes (ExportToJsonStruct()?). It's possible to add language-specific methods in SWIG, so could ignore Java in the meantime.
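
Until something like that exists upstream, the repeated pattern can at least live in one place. A minimal sketch: the function name borrows the `ExportToJsonStruct` idea floated above and is hypothetical, not an OGR API. It only relies on the geometry object having an `ExportToJson()` method that returns a JSON string, as `ogr.Geometry` does.

```python
import json


def export_to_json_struct(geom):
    """Return a geometry's GeoJSON as a Python dict instead of a string.

    `geom` is anything with an ExportToJson() method returning a JSON string,
    e.g. an osgeo.ogr.Geometry. The helper name is hypothetical.
    """
    return json.loads(geom.ExportToJson())
```

Call sites would then read `f['geometry'] = export_to_json_struct(g)`.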

@craigds craigds requested a review from rcoup May 6, 2020 02:14
Including endianness and envelopes
They're not GeoJSON anymore anyway, because we're using hexWKB
geometries, so best not to pretend they are.
rcoup commented May 7, 2020

Can you summarize/include/link examples of whatever formats we have finalised on?

@@ -125,7 +125,7 @@
 }).addTo(map)
 
 var layerGroup = L.featureGroup()
-for (let [dataset, diff] of Object.entries(DATA['sno.diff/v1'])) {
+for (let [dataset, diff] of Object.entries(DATA['sno.diff/v1+hexwkb'])) {
Member

how's it going to view wkb in the browser? Surely needs the GeoJSON output?

Member Author

I did this by accident, so good catch. I'm surprised this is using the json diff and not the geojson diff. That throws a bit of a spanner in the works, because I can't just switch to the geojson one; it's quite different. Will attempt to, though.

Member

It's using the JSON because the geojson ends up in multiple files

Member

Alternatively, it could serialise all the geojson into an array in the HTML or something. No problem doing that, but it needs to work :)

Member Author

The --geojson diff doesn't actually differentiate multiple datasets, so it's not enough for this.

So I guess to make this work, I need to add a format variant for --json diffs with GeoJSON geometries?

Member

@rcoup rcoup May 7, 2020

Don’t understand... the geojson gets written to multiple files in a directory.

Member Author

aha, so it does (if you have >1 dataset). I had only seen it with one dataset and couldn't work out how multiple datasets would be told apart; I didn't realise it used multiple files.

Member

otherwise there's no meaningful way to view them in QGIS/etc

Member Author

fixed in cb28c3b - it now writes geojson files to a temp dir and then reads them back in again. Could have done it some other way, e.g. by passing an empty dict to resolve_object_path and making it create a StringIO for each dataset, but it felt like a Weird Hack so I didn't
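
Roughly, the temp-dir round trip looks like the sketch below. This is not the code from cb28c3b: both function names are made up, the writer is a stand-in for the real `--geojson` diff output, and the one-file-per-dataset `<dataset>.geojson` naming is an assumption.

```python
import json
import tempfile
from pathlib import Path


def write_geojson_files(collections, out_dir):
    """Stand-in for the real writer: one <dataset>.geojson file per dataset."""
    for dataset, feature_collection in collections.items():
        path = Path(out_dir) / f"{dataset}.geojson"
        path.write_text(json.dumps(feature_collection))


def load_geojson_by_dataset(collections):
    """Write the GeoJSON files to a temp dir, then read them back keyed by dataset."""
    with tempfile.TemporaryDirectory() as tmp:
        write_geojson_files(collections, tmp)
        return {
            path.stem: json.loads(path.read_text())
            for path in sorted(Path(tmp).glob("*.geojson"))
        }
```

The resulting dict can then be serialised into the HTML template in one go, and the temp dir is cleaned up when the `with` block exits.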

Possibly commented out by accident in b5f7c7c
Generate it from the `geojson` diff rather than the `json` one,
because the `json` one now uses hexWKB geometries.

Because the layout's a bit different, this means the JS in the
HTML template needs to do a bit more work, but it's quite achievable.
Yay for ES7
@craigds craigds requested a review from rcoup May 7, 2020 20:50
@craigds craigds merged commit 25fa954 into master May 7, 2020
@craigds craigds deleted the hex-wkb-geometries branch May 7, 2020 21:42
craigds added a commit that referenced this pull request Jun 13, 2020
Since #71 we haven't used GeoJSON for diffs or patches,
however we still have some GeoJSON artifacts that are now unnecessary
and unused.

This moves all feature attributes into the feature object, and
removes the `properties`, `geometry` and `id` keys.
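
As an illustration of that change, a sketch of flattening a GeoJSON-style feature into a plain object. The function name and the `geom` key are assumptions (the real geometry key would be the dataset's geometry column name), and it assumes the primary key is already present among the properties, so the `id` can be dropped without losing information.

```python
def flatten_feature(feature, geom_key="geom"):
    """Move all attributes into one flat dict (illustrative, not sno's code).

    Drops the GeoJSON wrapper keys `type`, `id`, `properties` and `geometry`.
    `geom_key` is an assumed name for the geometry column.
    """
    flat = dict(feature.get("properties") or {})
    if feature.get("geometry") is not None:
        # After #71 the geometry value is a hex WKB string rather than GeoJSON.
        flat[geom_key] = feature["geometry"]
    return flat
```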
Development

Successfully merging this pull request may close these issues.

Don't use GeoJSON for geometry representations in diffs
3 participants