Add streamflow API spec #88

rod-glover · 2018-08-14T23:32:24Z

Resolves #86

Updated from initial draft circulated a few weeks ago. Updates:

Base for streamflow-related endpoints: services.pacificclimate.org/streamflow/ (not routed_streamflows).
All resource names plural (e.g., /streamflow/orders, not /streamflow/order).
All response bodies are fully fleshed out (or should be).
Consistent use of hyperlinks in all response bodies.
Hyperlink ref properties are abbreviated (e.g., 'cancel') instead of long-form hyperlink.
Full factoring of commonalities in schemas, parameters, responses, etc.

Notes:

The Streamflow Architecture document is a key resource to understanding the lifecycles of Orders and Results.
See also Streamflow API Design discussion.

rod-glover · 2018-08-14T23:45:29Z

streamflow-api.yaml

+      description: Defines a new order for a streamflow result set.
+      type: object
+      properties:
+        hydromodel_output_id:


I think, but am not sure, that the way to indicate the hydromodel output desired as input to this computation is with such an id. This has the virtue of simplicity but it exposes an internal implementation detail, and that is regarded as undesirable in resource-oriented architectures (RESTful architecture).

The alternative would be the hyperlink to the hydromodel output. This does not expose an internal, but it does require the backend to translate the hyperlink to an actual hydromodel output record, which is more roundout.

I'm not sure which to choose.

rod-glover · 2018-08-14T23:49:39Z

streamflow-api.yaml

+          properties:
+            method:
+              type: string
+              enum: [email]


Add none option?

I'm having a hard time thinking of a circumstance where we'd want the none option.

Yeah, and although it is easily circumvented, it is a way to attribute orders to users.

rod-glover · 2018-08-14T23:51:24Z

streamflow-api.yaml

+        latitude:
+          type: number
+          example: 48.5
+        notification:


Should make this notifications: [ ... list ... ] ? Is there a plausible use case for multiple notifications?

rod-glover · 2018-08-14T23:59:00Z

An overarching question: Should the list resources (e.g., /streamflow/orders) return less metadata for each listed item than do the individual item resources (e.g., /streamflow/orders/4). Alternately stated: Should the individual item resources contain more info?

I'd like to review all such resources with an eye to this question and the general question of whether the representations for each type of resource is yet complete.

corviday · 2018-08-15T16:37:06Z

My approach has been to assume that the resource collection endpoints provide all information needed to answer the question: which resource am I looking for? And that the resource documents return all information needed to answer the question: I've now selected this resource, what information is available about it?

Maybe this is the backwards approach, though, since what information is needed to select the resource of interest varies based on how one expects the information to be used or what one expects the front end to be like.

One could, for example, imagine multiple UIs allowing a user to select a hydromodel output dataset:

Possible UI 1: the user selects which GCM and emissions scenario they are interested in from dropdown menus, the spatial coverage of that hydromodel output dataset is displayed on a map, then the user picks a point they are interested in from a map. For this scenario, you'd want the hydromodel output resource collection to return GCM and emissions scenario information, but since you don't display the hydromodel output on a map until after the user picks one, you could have grid info on the resource document, and not in the resource collection.
Possible UI 2: the user picks a point on a map, and then gets a list of all the available GCM/scenario combinations of hydromodel output that could provide data at that map point. In this case, you'd want the hydromodel output collections resource to contain spatial coverage information, but individual resource documents would have the GCM and scenario.

(I've been assuming our UI will look more like the former, because I think it would be easier to design a UI like that, but the latter might be more in line with how the engineers want to approach things; they probably start out knowing exactly what point in space they want to get information about, and don't care as much about available models.)

I'm not particularly attached to this approach and wouldn't lose anything if we decided that the resource collections endpoints should have as much information as the resource document endpoints.

corviday · 2018-08-15T17:08:46Z

streamflow-api.yaml

+              example: rcp85
+            etc:
+              type: string
+              example: ... more tbd ...


Spatial coverage info is needed, I think, so that it's possible to tell whether a result for a given point can be generated from this hydromodel output.

Something like minimum and maximum latitude and longitude, as well as latitude and longitude cell size would be enough to provide context to the cell_x and cell_y values in the result endpoint.

Yuck, it looks like hydromodel output files have irregular spatial extents:

So in order to provide enough information to determine whether a result file for given point can be made from this hydromodel output, we may need to return a geoJSON polygon or something as the spatial output., not just a bounding box. That's a bit of a headache.

Hm, maybe if we need to provide actual spatial coverage, it could be a separate resource, like maybe, like /streamflow/hydromodel_output/4/bounds? We probably don't want to calculate it until a client is specifically looking for it, though caching could be a big help, it's not like the coverage of a netCDF file is going to change.

I agree completely. Caching will make a very big difference here, but it still may be better to provide the bounds at a separate endpoint. At this stage, I prefer that.

(BTW, the bounds would likely be a kind of information not supplied in the list resource that would be present in the hypothetical item resources. But partitioning it as a separate endpoint eliminates that question.)

Will we want to add filtering (to the list resource) by location? E.g., list all hydromodels that contain this location? Or some spatial extent?

Hm. Do you think it's better to do that sort of filtering in the back end or the front end?

Good question. If the unfiltered list is large or the filtering is compute-intensive, then backend, because there is some hope that caching can speed up the response for common or repeated searches. Otherwise (short list, cheap filtering) doing it on the frontend is neutral to better, since it simplifies the backend.

In this case, we have a short list. I don't know how compute-intensive filtering on spatial criteria is, but I'd guess it's not prohibitive. So we could probably do it just as easily on the frontend, assuming we can find a JS library with the required spatial testing method(s).

corviday · 2018-08-15T18:03:45Z

streamflow-api.yaml

+
+    # Individual streamflow results
+
+    StreamflowResultState:


Temporal coverage information would be useful. A start date, a stop date, and a resolution, maybe? Temporal coverage info would mean you could actually figure out what dates to request for the annual cycle resource.

Would that temporal coverage also (or alternatively) be part of the hydromodel_output, which I believe determines the temporal coverage of the computed result?

(There's no reason we can't add temporal metadata to both. I tend to want to not duplicate information, but this isn't a database that should be normalized, but information resources that are meant to be useful!)

Also, what do you recommend as the resolution datum?

A string enum like daily, monthly, yearly?

An exact interval specified either in seconds or by an ISO8601 duration string?

I think putting temporal coverage in either the result metadata or the hydromodel output metadata would be fine, as long as it's accessible somewhere. It is theoretically possible to generate a result with a shorter timespan than the hydromodel output, but I can't imagine why we'd want to, and don't see any reason to build in support for that possibility out of the gate.

Upon further reflection, I think I was wrong. The result metadata does not need a time resolution attribute because I can't see any reason a client would need that information. A client never works with the raw result data and doesn't need details of its representation in the datafile; the client only interacts with the data in the form of the aggregated annual-mean, annual-max, and annual-cycle endpoints.

Agree on both points. I think that the temporal metadata should go into hydromodel outputs as well.

corviday · 2018-08-15T18:31:56Z

streamflow-api.yaml

+                - rel: 'results/metadata'
+                  uri: 'http://apis.paccifcclimate.org/streamflow/results/6'
+                - rel: 'results/annual-max'
+                  uri: 'http://apis.paccifcclimate.org/streamflow/results/6/annual-max'


So it's the RESTy thing to do to include these links to annual cycle, annual mean, and annual max even though they don't exist when the order is created and will be 404s?

Great observation. REST has nothing to say about this either way. A link to an absent resource isn't really non-RESTful, but it's not really very logical either. A good rule is don't have links that you can know in advance are invalid.

corviday · 2018-08-15T18:32:30Z

streamflow-api.yaml

+                - rel: 'cancel'
+                  uri: 'http://apis.paccifcclimate.org/streamflow/orders/5'
+                - rel: 'hydromodel_output'
+                  uri: 'http://apis.paccifcclimate.org/hydromodel_outputs/4'


URI should probably be http://apis.pacificclimate.org/streamflow/hydromodel_outputs/4

I intentionally named the hydromodel resources as /hydromodel_outputs and not /streamflow/hydromodel_outputs because they aren't necessarily associated only with streamflow. Streamflow uses them, but so could other things.

Alternative viewpoint is that these resources are tuned to streamflow's needs, and we'd not want other concerns changing the API and breaking streamflow app(s).

What do you think?

I guess either way seems justifiable to me. No preference.

rod-glover · 2018-08-15T18:58:30Z

@corviday, I think your reasoning (first 2 paragraphs) is right on. It matches what I understand of resource-oriented/RESTful design, and my own opinions.

I think the "support multiple UIs" principle is also important. You want UIs to be able to slice and dice information in various ways (like your example). It's an anti-pattern to have endpoints that are tuned for specific views of the information; instead you want endpoints that deliver broader information that the UI can present in various ways.

This does argue for collection endpoints that deliver as much metadata as the individual endpoints. Given that our endpoints are pretty much 100% metadata, aside from the explicitly data endpoints, I think that clinches it.

corviday · 2018-08-16T19:30:28Z

streamflow-api.yaml

+
+
+  /streamflow/results/{id}/annual-cycle/{startYear}-{endYear}:
+    get:


A different way to handle the annual cycle resource that just occurred to me, if one of the REST design goals is to not make the client construct arbitrary URLs:

/streamflow/result/0 includes a link to /streamflow/result/0/annual-cycle (along with other metadata)

/streamflow/result/0/annual-cycle is solely a collection of links to annual cycles for valid climatologies present in this data, so it would have links to, eg:

/streamflow/result/0/annual-cycle/1961-1999

/streamflow/result/0/annual-cycle/1971-2000

/streamflow/result/0/annual-cycle/1981-2010

/streamflow/result/0/annual-cycle/2010-2039

/streamflow/result/0/annual-cycle/2040-2069

/streamflow/result/0/annual-cycle/2070-2099

Pages like /streamflow/result/0/annual-cycle/2070-2099 would actually return a timeseries.

Or there could be a climatology resource type, like /streamflow/result/0/climatology/1960-1989/annual-cycle or something, but I don't know of any other data analysis we'd like broken out by climatology, so I don't think that buys us anything.

You rock. This is an excellent idea.

This presumes that one knows in advance what climatologies are of interest. But it can't hurt to have a standard set listed.

corviday · 2018-08-17T19:06:22Z

I did a final readthrough of the API just now, and I think it looks great!

The only thing I noticed is that I think we've made the cell_x_index and cell_y_index attributes of result superfluous: a client doesn't need to care about cell indices in the netCDF file - a client cares about:

whether the point it is interested in corresponds to an existing result, which it can determine with cell_x, cell_y, cell_x_size, and cell_y_size on result
if not, whether the point it is interested in lies inside the spatial coverage of a particular hydromodel_output, which it can determine with hydromodel_output/{id}/bounds

There's no longer much use for cell_x_index or cell_y_index, though it doesn't do any harm to have them.

(I can, however, imagine scenarios in which a client might still want to know cell_x_size and cell_y_size on hydromodel_output; I can imagine a client that cares about the resolution of a dataset even if it doesn't have to calculate an exact cell itself.)

rod-glover · 2018-08-17T20:46:55Z

Good point. I'll remove the cell indices; leaner is better.

rod-glover · 2018-08-18T00:34:53Z

I think this is it for the API design ... except for security (user authentication and authorization). But that's a big enough issue that I'll create a separate issue and PR (if needed) for it.

rod-glover · 2018-08-18T01:05:15Z

Ack. API versioning. Which is a fraught subject. But important to consider and possibly do regardless.

corviday

Looks great to me.

rod-glover · 2018-08-21T18:49:57Z

Re. API versioning. Definitely worth reading the article I mentioned above, Your API versioning is wrong.

And it's even more worth reading the often very well-considered responses to that article. The general sentiment there seems to be that putting versioning in the resource URIs (e.g., /api/v2/...), however "wrong" it may be, is the most pragmatic approach and arguably not all that wrong anyway. The article's author agrees. I agree too.

So assuming we all agree on that, the question is whether to introduce versioning (in the URIs) now, or wait until there is actually another version. If we do that, then URIs sans version should default to v1 (to preserve compatibility) and anyone who wants v2 or later must specify so in the URIs.

I vote for the latter: initially without explicit version. I think that we will have a little more iteration before v1 can be considered stable, but that's OK. If there ever is a genuine v2, then we start versioning then.

corviday · 2018-08-22T17:57:47Z

That all seems reasonable to me.

rod-glover · 2018-08-23T20:25:58Z

The failing tests have nothing to do with the API spec. Gonna merge.

Add streamflow API spec

a20bbb7

rod-glover requested a review from corviday August 14, 2018 23:32

rod-glover commented Aug 14, 2018

View reviewed changes

Correct 409 Conflict message

6971a3c

rod-glover commented Aug 14, 2018

View reviewed changes

corviday reviewed Aug 15, 2018

View reviewed changes

rod-glover added 8 commits August 15, 2018 13:10

Add grid metadata and bounds endpoint for hydromodel_outputs resources

0d247a3

Add temporal metadata to streamflow results resources

4b5cf32

Fix syntax error

a74a671

Document and show variant links in streamflow orders reps

da72246

Correct all URIs to https://services.pacificclimate.org/...

584f46c

Add temporal metadata to hydromodel resources

01035c8

Add grid cell size and location metadata to results

912ede8

Make annual-cycle hyperlink a template (startYear - endYear)

a69e504

corviday reviewed Aug 16, 2018

View reviewed changes

rod-glover added 2 commits August 16, 2018 18:13

Clean up coding

114d1e7

Add endpoint for annual cycle standard climo periods

f2162d8

Remove cell indices from result representation

de83b5b

Improve descriptions

704a8fe

corviday approved these changes Aug 20, 2018

View reviewed changes

Rename Order status 'processing' to 'accepted'

acc6530

rod-glover merged commit 8fb7e35 into master Aug 23, 2018

rod-glover deleted the i86-streamflow-API branch August 23, 2018 20:26

rod-glover mentioned this pull request Aug 23, 2018

Travis CI tests are failing on master #91

Closed



		/streamflow/results/{id}/annual-cycle/{startYear}-{endYear}:
		get:

Add streamflow API spec #88

Add streamflow API spec #88

Conversation

rod-glover commented Aug 14, 2018 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

rod-glover commented Aug 14, 2018 • edited Loading

corviday commented Aug 15, 2018 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

rod-glover Aug 15, 2018 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

corviday Aug 16, 2018 • edited Loading

Choose a reason for hiding this comment

rod-glover commented Aug 15, 2018

corviday Aug 16, 2018 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

corviday commented Aug 17, 2018 • edited Loading

rod-glover commented Aug 17, 2018

rod-glover commented Aug 18, 2018 • edited Loading

rod-glover commented Aug 18, 2018

corviday left a comment

Choose a reason for hiding this comment

rod-glover commented Aug 21, 2018 • edited Loading

corviday commented Aug 22, 2018

rod-glover commented Aug 23, 2018

rod-glover commented Aug 14, 2018 •

edited

Loading

rod-glover commented Aug 14, 2018 •

edited

Loading

corviday commented Aug 15, 2018 •

edited

Loading

rod-glover Aug 15, 2018 •

edited

Loading

corviday Aug 16, 2018 •

edited

Loading

corviday Aug 16, 2018 •

edited

Loading

corviday commented Aug 17, 2018 •

edited

Loading

rod-glover commented Aug 18, 2018 •

edited

Loading

rod-glover commented Aug 21, 2018 •

edited

Loading