Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Data API call doesn't handle split climatology files #57

Closed
corviday opened this issue Jul 19, 2017 · 4 comments · Fixed by #60
Closed

Data API call doesn't handle split climatology files #57

corviday opened this issue Jul 19, 2017 · 4 comments · Fixed by #60

Comments

@corviday
Copy link
Contributor

corviday commented Jul 19, 2017

The data API call collects and returns data from all files that match a set of parameters (model, experiment, ensemble, variable, etc).

Previously, it took a time index argument between 0 and 16 to indicate time resolution and position sought, and would return the nth time slice in the file, which worked when all the files were 17-point chronologies.

Using split data files, with separate monthly (0-11), seasonal (0-3) and yearly(0) files, this API call returns a mix of monthly, seasonal, and annual data when you call it with &time=0, and an IndexError if you call it with any other time index.

Here's the result for GET /api/data?ensemble_name=downscaled&model=bcc-csm1-1-m&variable=tasmax&emission=historical,+rcp45&area=&time=0:

{
  "r1i1p1": {
    "units": "degC",
    "data": {
      "2085-07-02T00:00:00Z": -0.5808153910727394,
      "2025-07-02T00:00:00Z": -0.8622292459716103,
      "1977-01-16T00:00:00Z": -18.78877913696885,
      "1997-01-16T00:00:00Z": -17.92588300616977,
      "2055-07-02T00:00:00Z": -0.6485861922775469,
      "2055-01-16T00:00:00Z": -15.875643052269846,
      "1986-07-02T00:00:00Z": -2.4333102893767307,
      "1997-07-02T00:00:00Z": -2.077599634237314,
      "2025-01-16T00:00:00Z": -16.226023053533392,
      "2025-01-15T00:00:00Z": -18.02569051912951,
      "2085-01-16T00:00:00Z": -15.60062807695439,
      "1986-01-15T00:00:00Z": -19.950983475060585,
      "1977-07-02T00:00:00Z": -2.671051067797724,
      "1977-01-15T00:00:00Z": -20.599000150601793,
      "2055-01-15T00:00:00Z": -17.825752320828578,
      "1986-01-16T00:00:00Z": -18.487454919078537,
      "1997-01-15T00:00:00Z": -19.534196834187902,
      "2085-01-15T00:00:00Z": -17.498223073165622
    }
  }
}

This appears to be a mix of annual data (July 2 dates), monthly data (January 15), and seasonal data (January 16).
screenshot from 2017-07-19 15-49-44

I don't have strong preferences about how exactly this query should behave in the split-file context. It doesn't need to function identically to the old query as long as it's possible to get the data in some reasonably straightforward way. For example, if given the new file structure it makes sense to expect the front end to pass a time resolution (monthly, seasonal, annual) along with the time index, that would be no trouble.

@rod-glover
Copy link
Contributor

rod-glover commented Jul 20, 2017

The problem arises from the fact that this endpoint spans data files. Now that climatological data files are split into monthly, seasonal, and yearly subsets instead of being concatenated, this endpoint accesses data in all 3 of the split files, instead of from just one. Naturally enough, with a time index of 0, it pulls the first item from each of the 3 split files; when time index is > 0, it encounters an error because the yearly split file has only one (index 0) element.

Approaches to fixing this:

  1. Map the existing 0-17 index to the correct (monthly, seasonal, or yearly) data file and adjusted index for that file.
  2. Add a query parameter specifying the climatological averaging period of interest (monthly, seasonal, yearly).
  3. Both of the above, since they are not mutually exclusive.

Option 2 is the cleanest and most flexible. Since Climate Explorer is the only client of this API, there is no reason to preserve the earlier 0-17 model, so there is no need to accommodate option 1. If that decision changes, option 1 can be added with minimal effort.

@corviday
Copy link
Contributor Author

I like your second option better, it seems more flexible for future development, and it won't be much trouble to add the second parameter in the front end.

@jameshiebert
Copy link
Contributor

I'd agree with both of you that option two is the cleanest and most explicit. Option 1 is more "magical" and requires specific knowledge about how we have historically represented climatologies. Option 2 is more sensible for someone coming in to the project fresh.

@rod-glover
Copy link
Contributor

rod-glover commented Jul 20, 2017

Hah, there is already such a query parameter and it appears that all should work as planned.

However, experiments on @corviday's dev database seem to indicate that the wrong values are returned even with timescale=monthly specified in the API request.

Will adjust the test framework to use split datasets and investigate further.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants