Data API call doesn't handle split climatology files #57

corviday · 2017-07-19T22:51:13Z

The data API call collects and returns data from all files that match a set of parameters (model, experiment, ensemble, variable, etc).

Previously, it took a time index argument between 0 and 16 to indicate time resolution and position sought, and would return the nth time slice in the file, which worked when all the files were 17-point chronologies.

Using split data files, with separate monthly (0-11), seasonal (0-3) and yearly(0) files, this API call returns a mix of monthly, seasonal, and annual data when you call it with &time=0, and an IndexError if you call it with any other time index.

Here's the result for GET /api/data?ensemble_name=downscaled&model=bcc-csm1-1-m&variable=tasmax&emission=historical,+rcp45&area=&time=0:

{
  "r1i1p1": {
    "units": "degC",
    "data": {
      "2085-07-02T00:00:00Z": -0.5808153910727394,
      "2025-07-02T00:00:00Z": -0.8622292459716103,
      "1977-01-16T00:00:00Z": -18.78877913696885,
      "1997-01-16T00:00:00Z": -17.92588300616977,
      "2055-07-02T00:00:00Z": -0.6485861922775469,
      "2055-01-16T00:00:00Z": -15.875643052269846,
      "1986-07-02T00:00:00Z": -2.4333102893767307,
      "1997-07-02T00:00:00Z": -2.077599634237314,
      "2025-01-16T00:00:00Z": -16.226023053533392,
      "2025-01-15T00:00:00Z": -18.02569051912951,
      "2085-01-16T00:00:00Z": -15.60062807695439,
      "1986-01-15T00:00:00Z": -19.950983475060585,
      "1977-07-02T00:00:00Z": -2.671051067797724,
      "1977-01-15T00:00:00Z": -20.599000150601793,
      "2055-01-15T00:00:00Z": -17.825752320828578,
      "1986-01-16T00:00:00Z": -18.487454919078537,
      "1997-01-15T00:00:00Z": -19.534196834187902,
      "2085-01-15T00:00:00Z": -17.498223073165622
    }
  }
}

This appears to be a mix of annual data (July 2 dates), monthly data (January 15), and seasonal data (January 16).

I don't have strong preferences about how exactly this query should behave in the split-file context. It doesn't need to function identically to the old query as long as it's possible to get the data in some reasonably straightforward way. For example, if given the new file structure it makes sense to expect the front end to pass a time resolution (monthly, seasonal, annual) along with the time index, that would be no trouble.

The text was updated successfully, but these errors were encountered:

rod-glover · 2017-07-20T18:32:05Z

The problem arises from the fact that this endpoint spans data files. Now that climatological data files are split into monthly, seasonal, and yearly subsets instead of being concatenated, this endpoint accesses data in all 3 of the split files, instead of from just one. Naturally enough, with a time index of 0, it pulls the first item from each of the 3 split files; when time index is > 0, it encounters an error because the yearly split file has only one (index 0) element.

Approaches to fixing this:

Map the existing 0-17 index to the correct (monthly, seasonal, or yearly) data file and adjusted index for that file.
Add a query parameter specifying the climatological averaging period of interest (monthly, seasonal, yearly).
Both of the above, since they are not mutually exclusive.

Option 2 is the cleanest and most flexible. Since Climate Explorer is the only client of this API, there is no reason to preserve the earlier 0-17 model, so there is no need to accommodate option 1. If that decision changes, option 1 can be added with minimal effort.

corviday · 2017-07-20T18:35:08Z

I like your second option better, it seems more flexible for future development, and it won't be much trouble to add the second parameter in the front end.

jameshiebert · 2017-07-20T18:41:11Z

I'd agree with both of you that option two is the cleanest and most explicit. Option 1 is more "magical" and requires specific knowledge about how we have historically represented climatologies. Option 2 is more sensible for someone coming in to the project fresh.

rod-glover · 2017-07-20T18:46:29Z

Hah, there is already such a query parameter and it appears that all should work as planned.

However, experiments on @corviday's dev database seem to indicate that the wrong values are returned even with timescale=monthly specified in the API request.

Will adjust the test framework to use split datasets and investigate further.

rod-glover mentioned this issue Jul 25, 2017

Handle climo files with separate chronology #60

Merged

rod-glover closed this as completed in #60 Aug 15, 2017

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Data API call doesn't handle split climatology files #57

Data API call doesn't handle split climatology files #57

corviday commented Jul 19, 2017 •

edited

Loading

rod-glover commented Jul 20, 2017 •

edited

Loading

corviday commented Jul 20, 2017

jameshiebert commented Jul 20, 2017

rod-glover commented Jul 20, 2017 •

edited

Loading

Data API call doesn't handle split climatology files #57

Data API call doesn't handle split climatology files #57

Comments

corviday commented Jul 19, 2017 • edited Loading

rod-glover commented Jul 20, 2017 • edited Loading

corviday commented Jul 20, 2017

jameshiebert commented Jul 20, 2017

rod-glover commented Jul 20, 2017 • edited Loading

corviday commented Jul 19, 2017 •

edited

Loading

rod-glover commented Jul 20, 2017 •

edited

Loading

rod-glover commented Jul 20, 2017 •

edited

Loading