Multi-model after time average #41

Closed
ledm opened this issue Mar 12, 2019 · 3 comments · Fixed by #1808
Labels
preprocessor Related to the preprocessor

Comments

@ledm
Contributor

ledm commented Mar 12, 2019

Hey,

I'm hoping that someone can help me figure out what's going wrong here. I'm trying to produce a multi-model mean of a 2D (x-z dimensional) field. It's a fairly complex preprocessor, several of the stages can be quite slow, and I'll need to run it over lots (dozens?) of model datasets. With that in mind, I'm trying to keep it lightweight:

  prep_transect: # For extracting a transect
    custom_order: true
    time_average:
    regrid:
      target_grid: 1x1
      scheme: linear
    zonal_means:
      coordinate: longitude
      mean_type: mean
    extract_levels:
      levels: [0.1, 0.5, 1, 10, 20, 40, 80, 120, 160, 200, 240, 280, 320, 360, 400, 440, 480, 520, 560, 600, 640, 680, 720, 760, 800, 840, 880, 920, 960, 1000, 1200, 1400, 1600, 1800, 2000, 2200, 2400, 2600, 2800, 3000, 3200, 3400, 3600, 3800, 4000, 4200, 4400, 4600, 4800, 5000, 5200, 5400, 5600, 5800]
      scheme: linear
    multi_model_statistics:
      span: full
      statistics: [mean, ]

(The extract_levels field is a bit silly, please don't worry about it too much.)

The problem that I'm seeing now is that the multi_model_statistics part doesn't produce any results. I think that this is because it can't find a time overlap between the files:

2019-03-12 15:56:35,921 UTC [29013] DEBUG   esmvaltool.preprocessor._multimodel:304 Multimodel statistics: computing: ['mean']
2019-03-12 15:56:35,923 UTC [29013] INFO    esmvaltool.preprocessor._multimodel:313 Time overlap between cubes is none or a single point.check datasets: will not compute statistics.

The first step of the preprocessor is to take a time average, as this reduces the workload of the function by an order of magnitude or more. However, I suspect that this is the reason why it can't find any overlap in the time range between the models.

Perhaps people can suggest a better way to do this - or perhaps a way to get the multi-model mean function to ignore the time overlap?

Cheers!

@valeriupredoi
Contributor

@ledm _get_overlap() is your enemy here:

    all_times = []
    for cube in cubes:
        span = _datetime_to_int_days(cube)
        start, stop = span[0], span[-1]
        all_times.append([start, stop])
    bounds = [range(b[0], b[-1] + 1) for b in all_times]
    time_pts = reduce(np.intersect1d, bounds)
    if len(time_pts) > 1:

If the cube has a single time point, then start = stop = single_cube_time_point, and if single_cube_time_point differs from cube to cube by more than 1, then bounds will have no overlapping interval. You have two options: either make the cubes' single time points the same, or introduce a condition in this function like if start == stop: time_pts = range(start - 1, stop + 1), return time_pts. The second option will work fine but may introduce significant errors if the time points of the cubes are far apart in time. E.g. you compute a multimodel statistic between two models, one with its time point in 1971 and the other with its time point in 2019: that works functionally but has no statistical meaning.
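To make the failure mode concrete, here is a minimal standalone sketch of the intersection logic quoted above. It works on plain integer day numbers rather than cubes, and the helper names (overlap_days, overlap_days_widened) are hypothetical, not part of esmvaltool. The widened variant is only one possible reading of the suggested start == stop condition; it pads single-point spans by one day on each side, which is an assumption of this sketch.

```python
from functools import reduce

import numpy as np


def overlap_days(spans):
    """Return the integer days common to every inclusive (start, stop) span."""
    bounds = [np.arange(start, stop + 1) for start, stop in spans]
    return reduce(np.intersect1d, bounds)


def overlap_days_widened(spans):
    """Hypothetical variant: pad single-point spans by one day on each side."""
    widened = [
        (start - 1, stop + 1) if start == stop else (start, stop)
        for start, stop in spans
    ]
    return overlap_days(widened)


# Two time-averaged cubes, each reduced to a single time point
# (integer days since an arbitrary epoch), 30 days apart.
far_points = [(7300, 7300), (7330, 7330)]
print(len(overlap_days(far_points)))  # 0: no common day, so no statistics

# Single points one day apart overlap once each span is padded.
near_points = [(7300, 7300), (7301, 7301)]
print(len(overlap_days_widened(near_points)))  # 2
```

Padding only rescues time points that are nearly coincident; points that are months or years apart still produce an empty intersection.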

@valeriupredoi
Contributor

So a minimal temporal check has to be done, but that one is up to you, i.e. how far apart in time the cubes can be while you still compute the multimodel stats.
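Such a check could look roughly like the sketch below. It assumes each time-averaged cube has already been reduced to one integer day number; the function name within_time_tolerance and the max_spread_days parameter are made up for illustration and do not exist in esmvaltool.

```python
def within_time_tolerance(time_points, max_spread_days):
    """Return True if all time points lie within max_spread_days of each other.

    time_points: iterable of integer day numbers, one per dataset.
    max_spread_days: largest acceptable gap between earliest and latest point.
    """
    points = list(time_points)
    if not points:
        raise ValueError("need at least one time point")
    return max(points) - min(points) <= max_spread_days


# Points 30 days apart pass a one-year tolerance ...
print(within_time_tolerance([7300, 7330], max_spread_days=365))  # True
# ... but points roughly 48 years apart do not.
print(within_time_tolerance([365, 17885], max_spread_days=365))  # False
```

The tolerance itself is a scientific choice, so it is left as an explicit argument rather than a hard-coded default.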

@mattiarighi mattiarighi added the preprocessor Related to the preprocessor label Jun 11, 2019
@mattiarighi mattiarighi transferred this issue from ESMValGroup/ESMValTool Jun 11, 2019
@ledm
Contributor Author

ledm commented Jun 25, 2019

I found a bug related to this today. The multi_model fails when the template_cube has no latitude or longitude component. The solution is to move the lines:

        lats = template_cube.coord('latitude')
        lons = template_cube.coord('longitude')

inside the shape statements:

    if len(template_cube.shape) == 3:
        lats = template_cube.coord('latitude')
        lons = template_cube.coord('longitude')
        ...

The problem with this whole part of multimodel is that it relies on an arbitrary set of coordinates - it should be generalised.

For instance, this line

plev = template_cube.coord('depth')
has depth hardwired into it! Not useful for atmospheric people!
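One way the lookup might be generalised is to ask the cube for whatever coordinate sits on a given axis instead of naming latitude, longitude or depth directly. The sketch below assumes the cube object offers a coords(axis=...) method returning a list, as iris cubes do; the helper names find_axis_coord and template_axes are hypothetical. Whether iris recognises a given coordinate as 'Z' depends on its metadata, so this is a starting point rather than a drop-in fix.

```python
def find_axis_coord(cube, axis):
    """Return the first coordinate on the given axis, or None if absent.

    cube: any object with a coords(axis=...) method returning a list
          (assumed here to behave like an iris cube).
    axis: one of 'T', 'Z', 'Y', 'X'.
    """
    matches = cube.coords(axis=axis)
    return matches[0] if matches else None


def template_axes(cube):
    """Map each axis letter to its coordinate, with None for missing axes."""
    return {axis: find_axis_coord(cube, axis) for axis in ("T", "Z", "Y", "X")}
```

A transect cube with no longitude would then report None for 'X' instead of raising, and the vertical coordinate would be found whether it is called depth, air_pressure or something else.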


3 participants