[wip] high-level dataset api methods #39
Why have the CLI assign feature ids and not let the API that already does that handle it?
@rclark This is sweet. I'm taking a more in-depth look this morning but some initial thoughts:
@mick Last I checked, when I attempted to put features without ids (or features with non-string ids), they were rejected by the datasets api. I think autogenerating ids (and accepting integer ids) could ultimately be handled at the API level. In the meantime, I'm torn if handling it in the CLI, the SDK or an external tool would be more appropriate. I'd lean towards just making the change in the API but don't have a sense of how much of lift that would be.
I like the idea but I think it could wait for a future release, maybe living beside but not replacing the existing functions.
Definitely. Longer term, attribute indexing/filtering at the API level might be the best way to achieve this. Not sure if anything like that is currently planned? In the meantime, something like |
service = ctx.obj.get('service')
to_put = []

def id():
`id` is a Python builtin. While Python allows you to override builtins, we should probably just rename it to `fid` to avoid conflicts.
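A minimal sketch of the rename, assuming the helper simply hands out a fresh random id per feature. The body of the original `id()` helper is not shown in this diff, so `uuid4` and the `with_fid` wrapper below are illustrative choices, not the PR's code:

```python
import uuid


def fid():
    """Return a new feature id as a string (assumption: random UUID hex)."""
    return uuid.uuid4().hex


def with_fid(feature):
    """Return a copy of a GeoJSON feature that is guaranteed a string id.

    Hypothetical helper: keeps an existing id (cast to str), else assigns one.
    """
    out = dict(feature)
    existing = out.get("id")
    out["id"] = str(existing) if existing is not None else fid()
    return out


# The builtin id() is left untouched and stays usable alongside fid().
feature = {"type": "Feature", "geometry": None, "properties": {}}
tagged = with_fid(feature)
```

Casting integer ids to strings here mirrors the observation earlier in the thread that non-string ids were rejected; whether that belongs in the CLI, the SDK or the API is the open question.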
@rclark What about
def put_features(ctx, dataset, features):
"""Insert or update features in a dataset.

$ mapbox dataset put-features features
mapbox dataset put-features DATASET features
@perrygeo I think we should keep the basic, API-mirrored commands. Though I'm not sure that it is warranted in our CLI at this point, I'll just say that I'm fond of the aws cli's approach to S3 commands where...
... contains the set of commands which are straight-up, single API calls, while
... are higher-level functions that may involve multiple API calls.
Coming back to this after the break, and I think that if we can agree on some useful unixy command style, it makes sense to do that now, next to the functions that already exist as one-to-one API calls.
dataset uri scheme
In order to do things like
proposed commands
What I'm not sure about is how to differentiate an "append to dataset" command in a unixy kind of way. Any good ideas? @sgillies does this align at all with what you were thinking about when you originally made this suggestion? |
@rclark I like it, though it feels like there's some repetition in OTOH, it's a lot like Appending... maybe that's an |
dataset urls IOW you could write commands
In terms of the other commands, are these more unixy variants intended to replace the more API-centric commands entirely? Or live side-by-side? |
In terms of URI, I'm really mirroring aws-sdk here, where you have two ways (at least) to refer to an object:
Following that model,
So I'm inclined to only support full
100% side-by-side. No intention of removing the functions that are straight mirrors of individual API calls.
Yes, but if somewhere down the road we were to allow for cross-account dataset reads, having built the username into the URI would allow updating the mapbox cli to be a non-breaking change. Finally, for additional commands that don't require multiple API calls or any trickery, we still have the API mirror functions:
@rclark @perrygeo let's definitely aim for more of an I'm concerned that our semantics aren't enough like the semantics of |
Okay, I've sketched out one vision of the future here:
Next up: Does this make sense to @perrygeo @sgillies? The logic is getting more complex, and want to make sure that this is a path we're all comfortable with. Then, tests. |
... another simple addition that might help round out these URI-based higher-level routines: mapbox datasets cat URI: just print a dataset's JSON or a single GeoJSON feature to stdout |
@rclark - back to this after a long hiatus. Apologies for the silence on this front. Per voice with @sgillies yesterday, it seems like keeping the old commands alongside the higher-level commands could lead to much confusion. I spent some time going through the current set of The proposed idea below would not be too much work: Mostly removing/renaming with a few minor tweaks to functionality plus two new commands (metadata and rm) which are largely just adjusting existing commands to the URI way of specifying datasets. The overarching goals that I'm aiming for with this proposed interface:
@rclark, @sgillies How does this look to you? I can start sketching out an implementation if we agree that this is a good path forward. I'm not married to this exact form so if you have any other ideas on how to clean up the command list, I'm flexible. |
/cc @lyzidiamond We should make sure that the commands make sense in the context of the CLI examples we are planning to add to our api-documentation. This is where it might make sense to retain a baseline low-level interface so there is a consistent API surface across the CLI and all SDKs.
Good point re: keeping the interfaces 1:1 with the API. Spending the morning working with the new functions introduced in the PR (
So I see the value in the low level commands (consistency) and I see the value in the high-level commands (better UI) but I just don't see how they can live side-by-side in the same command without being painfully confusing. Options?
👍 Seems to me to be the path of least resistance. The uri-based, higher-level commands are generally easier to use and should be more straightforward to type. |
I rebased this branch to catch up with master, then split low/high-level functions between
@rclark The Python 3 failures are due to some backwards-incompatible refactoring of the standard library (see http://python3porting.com/stdlib.html#urllib-urllib2-and-urlparse). A try/except block is the way to support both Python 2 and 3 in the same codebase. A bit clunky but we're stuck with Python 2 for a while.
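For reference, the try/except pattern looks roughly like this. It is a sketch of the general idiom rather than the exact lines in this branch, and the example URL is a placeholder, not the dataset URI scheme under discussion:

```python
try:
    # Python 3: urlparse lives in urllib.parse
    from urllib.parse import urlparse
except ImportError:
    # Python 2: urlparse is its own top-level module
    from urlparse import urlparse

# Placeholder URL, used only to show that the imported name behaves the same.
parts = urlparse("https://example.com/some/path?limit=10")
```

Trying the Python 3 location first keeps the common path free of exception handling as Python 2 usage fades.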
Speaking of backwards compatibility, if we're going to Alternatively, we could keep the low-level I'm fine with it either way - I hate to instigate more bikeshedding on names but it seems worth a few minutes of consideration, to make sure everyone's on the same page. |
Thanks for the fix and lint 👍 I'm game to coordinate the doc/example updates. For my part, I'm not very worried about breaking backwards compatibility for an API that's not even in beta yet. |
@rclark @perrygeo I like everything about this except the
That said, we definitely do have a technical need for a datasets URI scheme. It's just that A solution for the second (not addressing the first) would be to rename these higher level commands. Ultimately, what I'd like to convince you of is that |
Back here again, visiting my black sheep of a PR. I've rebased this branch over master, written another test, and removed some dead code from private functions in datasets.py. I think that this is ready to go aside from: |
assert result.exit_code == 0
assert result.output.strip() == datasets.strip()
body = json.loads(responses.calls[0].request.body)
@rclark here's the root of the Python 3 failure: `.body` is type `bytes`. We want `.text`, which is type `str`. I think these assertions on the mock response objects can be eliminated anyway; this stuff is already tested by Responses. All we need to assert is `result.exit_code` and `result.output`.
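If an assertion on the body were kept for debugging, one way to sidestep the bytes/str mismatch is to decode before parsing. This is a sketch that assumes the body is UTF-8 encoded JSON; `load_body` and `raw_body` are made-up names for illustration, not part of the test suite:

```python
import json

# Stand-in for a captured request body; on Python 3 this arrives as bytes.
raw_body = b'{"type": "Feature", "id": "a"}'


def load_body(body):
    """Decode bytes to text before parsing; pass text through unchanged."""
    if isinstance(body, bytes):
        body = body.decode("utf-8")
    return json.loads(body)


parsed = load_body(raw_body)
```

Python 3.6 and later accept bytes in `json.loads` directly, so the explicit decode only matters on older Python 3 releases.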
@rclark I found the Python 3 problem and suggested a bunch of assert deletions of one kind: we don't need to test the mock responses (except maybe while debugging), just our own API and CLI. Want to make the Python 3 fix? I'll merge after that and then we'll revise the CLI from there. |
This PR has gotten stale. A lot of good ideas here but nothing moving. Feel free to re-open if there is a need.
Begins work adding higher-level functionality to dataset api support, including:
I started by writing a `put-features` function and a `put-dataset` function. These are conceptually pretty simple, but I really like @sgillies' suggestion to make more "unixy" commands like `ls`, `cp`, etc. Some thoughts:
- `ls`: list features in a dataset, paginates for you unless you supply `limit` and/or `start` arguments
- `rm`: delete a dataset (indicated by dataset id), or delete a feature (indicated by datasetid.featureid).
- `cp`: copy features from one dataset to another, from a file to a dataset, or from a dataset to a file.

Other ideas? These kinds of commands start to beg for "query" parameters that would let you `cp` a subset of features, but I'm not sure we want to go there....

cc @mick @willwhite @perrygeo
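To make the "paginates for you" behaviour of `ls` concrete, here is a rough sketch of the loop. It assumes each page request takes a `start` id and a `limit`, and returns the features that sort after `start`; `fetch_page` is a hypothetical callable standing in for the real API call, and the in-memory dataset exists only to exercise the loop:

```python
def list_all_features(fetch_page, page_size=2):
    """Collect every feature by requesting pages until a short page returns.

    fetch_page(start, limit) is a hypothetical callable returning a list of
    GeoJSON features whose ids sort after `start` (None means "from the top").
    """
    features = []
    start = None
    while True:
        page = fetch_page(start, page_size)
        features.extend(page)
        if len(page) < page_size:
            # A short (or empty) page means there is nothing left to fetch.
            return features
        # Resume after the last feature id seen so far.
        start = page[-1]["id"]


# In-memory stand-in for a dataset, sorted by id.
_FAKE_DATASET = [{"type": "Feature", "id": i} for i in "abcde"]


def fake_fetch_page(start, limit):
    """Return up to `limit` features with ids greater than `start`."""
    remaining = [f for f in _FAKE_DATASET if start is None or f["id"] > start]
    return remaining[:limit]


all_ids = [f["id"] for f in list_all_features(fake_fetch_page)]
```

When the caller supplies `limit` and/or `start` explicitly, the command would skip this loop and make a single request instead.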