Skip to content
This repository has been archived by the owner on Aug 2, 2022. It is now read-only.

Datasets API support #80

Merged
merged 19 commits into from
Dec 16, 2015
Merged

Datasets API support #80

merged 19 commits into from
Dec 16, 2015

Conversation

sgillies
Copy link
Contributor

@sgillies sgillies commented Dec 3, 2015

This PR provides a relatively low-level interface much like mapbox-sdk-js's.

resolves #15

Sean Gillies added 3 commits December 2, 2015 16:33
Copied over from defunct branch.
This is not a resource-oriented interface
@perrygeo
Copy link
Contributor

perrygeo commented Dec 3, 2015

I like this approach, one class with a couple methods, geojson all around, simple.

A few thoughts:

  1. This logic in iter_features seems to be a common theme in every python tool that speaks geojson. Ditto for its CLI equivalent. I don't have a clear proposal yet but it seems like pulling out a small "Geojson feature abstraction" library might be useful so we have a canonical way of going from a feature collection, an input stream, an iterable of geo_interface objects, etc, ⇢ generator of features. (related Input waypoints mapbox-cli-py#11)
  2. This supersedes Add a Datasets class and tests #27, correct?
  3. I'm not clear on how the batch function fits here. Is intended to be used internally for batch updates or as an end-user utility function?
  4. In terms of covering the supported operations of the API, let's use this table to track progress. Even if the SDK doesn't support the operation directly, we can document how one might achieve the same thing through other means.
Operation SDK Support Notes
List the datasets associated with an account list_datasets
Create a new dataset create_dataset
Retrieve information about an existing dataset
Update the properties of an existing dataset
Delete an existing dataset
List the features in a dataset list_features
Insert or update a feature in a dataset update_features
Retrieve an existing feature from a dataset
Remove an existing feature from a dataset
Add, update, or delete features in bulk

@sgillies
Copy link
Contributor Author

sgillies commented Dec 3, 2015

@perrygeo right, I copied iter_features from Fiona. @geowurster and I started this https://github.com/geojson/py-feature-sequences project that could take over some day. I'm looking forward to getting back to it as the GeoJSON WG wraps up the format spec and moves on to feature sequences.

Table looks good to me.

Batching is there to support both paged responses from feature reads and paged writes from streams of features.

@geowurster
Copy link

@perrygeo FYI I did a bit of exploration on py-feature-sequences in geojson/py-feature-sequences#1 and have an open to-do in fio-buffer that provides a good real world use case for reading and writing feature streams. Need to find time to work on it.

@perrygeo
Copy link
Contributor

I'm still fuzzy on why the iter_features construct belongs in the SDK. All the other services take iterables of GeoJSON-like feature mappings and that would seem to fit perfectly here as well. Handling sequences and stream delimiters feels like a detail for the CLI to handle as it's effectively shell argument handling.

@perrygeo
Copy link
Contributor

I did some additional testing this morning and found a few things we might need to address.

  • usernames can be inferred from the access token and don't need to be in the constructor (we can follow the lead of the Uploads API on this: get username from session token #85)
  • If the user's token doesn't support datasets:* scope, the 404 response and corresponding "Not found" message are entirely unhelpful in resolving the problem. We should handle this status code for the user and give them a more relevant error message as we do here
  • Features without string ids are rejected. If the feature id is not present (possible as it's not required by GeoJSON spec) or if id is an integer (another likely scenario), you get a HTTP 422 with a message that "Ids are required for GeoJSON features put in bulk". This seems like an odd quirk of the Datasets API that we should either fix in the API itself or validate and warn the user about.

@perrygeo
Copy link
Contributor

Re: tokens without proper scope. Proposed upstream to change make api-core return a 403 (forbidden) status code with a better error message.

@sgillies
Copy link
Contributor Author

@perrygeo ready for review and merge. there are two issues i'd like to tackle after this is merged:

  1. feature collection paging (and whether to change list_features() to return a list of features like @rclark has suggested elsewhere)
  2. a weirdness possibly involving caching or eventual consistency around batch_update_features(), see my "should return" comments in docs/datasets.md.

>>> len(updated_collection['features'])
1
>>> updated_first = updated_collection['features'][0]
>>> updated_first['id'] # should return '2'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if this is an eventual consistency issue, isn't it also possible that list_features could return feature 2 under certain circumstances? Maybe we should just note it but not test it until we find a way to poll the service.

@perrygeo
Copy link
Contributor

  • list_features(new_id).json() is consistent with our "return raw requests" principle - not sure why that method in particular would need an exception.
  • the eventually consistency thing worries me. If you wait a few seconds does it eventually return the correct features? Is there a way to poll the service for readiness? For now, maybe we can just talk about it in the doc but not doctest it until we have a better understanding of the behavior.

Other than the list_features consistency, it looks great to me 👍

@sgillies
Copy link
Contributor Author

@perrygeo yeah, further down in the doctest we are able to access a feature with id 2. Because I'm not sure that users should count on the list features endpoint to be instantly consistent with the update endoint, I'm going to remove those lines of the doctest.

perrygeo added a commit that referenced this pull request Dec 16, 2015
@perrygeo perrygeo merged commit 6a264dd into master Dec 16, 2015
@perrygeo perrygeo deleted the datasets branch December 16, 2015 17:25
@sgillies sgillies added this to the 0.6 milestone Dec 17, 2015
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add Datasets API
3 participants