Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Order or grid_cells and grid_coordinates() #81

Closed
coroa opened this issue Jun 4, 2020 · 2 comments
Closed

Order or grid_cells and grid_coordinates() #81

coroa opened this issue Jun 4, 2020 · 2 comments

Comments

@coroa
Copy link
Member

coroa commented Jun 4, 2020

Originally posted by @euronion in #20 (comment)

Order or grid_cells and grid_coordinates()

I am thinking more along the line of: Are people expecting the order in the lists returned by cutout.grid_cells and cutout.grid_coordinates() to be reliable or not.
Maybe it does cancel out in PyPSA-EUR or one codes it independent of this order (I just did so).
But maybe people are expecting the order to be consistent between version, i.e. then we are changing the API behaviour in an unexpected way -> we should at lteast add this to the release notes.

The bigger problem

The code with a more hideous and less obvious problem I am refering to is this one [1]:

    profile, capacities = func(
        # Redindex, stack index and conv. to CSR to be consistent with the format expected by atlite
        matrix=layoutmatrix.reindex_like(
            cutout.data).stack(spatial=('y', 'x')).data.tocsr(),
        index=buses,
        per_unit=True,
        return_capacity=True,
        show_progress='Compute profiles: ',
        **resource)

More specifically how the matrix passed to atlite has to be structured. This matrix is 2D, where one dimension is reserved for the index=buses and the other dimension is a .stack()-ed version of a 2D array.
Now the behaviour of .stack() depends on the order of the index, that's why the .reindex_like(...) is relevant here. In the version before it looked like this [2]:

    profile, capacities = func(matrix=layoutmatrix, index=buses, per_unit=True,
                               return_capacity=True, show_progress='Compute profiles: ',
                               **resource)

where the matrix=layoutmatrix as implicitly assuming the 2D array of values refering to the coordinates in the cutout to be stacked with descending y and ascending x. That was the old atlite structure.
If I use the same code with the new atlite=v0.2 the result is different, because we assume (inside atlite!) that this matrix is stacked with ascending y and x.

It boils down to

cutout.sortby('y', ascending=True).stack(spatial=('y','x'))

not being the same as

cutout.sortby('y', ascending=False).stack(spatial=('y','x'))

but we use

cutout.stack(spatial=('y','x')

here

da = da.stack(spatial=('y', 'x')).transpose('spatial', 'time')

Possible solution

The easiest solution could be to guarantee the old behaviour by fixing the index order of created cutouts during their creation

cutout.data = cutout.data.sortby('y', ascending=False)
cutout.data = cutout.data.sortby('x', ascending=True)

This would allow us to make the change in the .Cutout(...) signature while ensuring backwards compatability.

Links

[1] Source: New code in PyPSA-EUR for the upcoming atlite version
https://github.com/PyPSA/pypsa-eur/blob/283042d1dd3cd11b5313479c7ca03cbe58f778c9/scripts/build_renewable_profiles.py#L390-L398
[2] Source: https://github.com/PyPSA/pypsa-eur/blob/bb3477cd693d1c8e77e75e61a1a7e1a4647e6a3c/scripts/build_renewable_profiles.py#L303-L305

@coroa coroa changed the title ## Order or grid_cells and grid_coordinates() Order or grid_cells and grid_coordinates() Jun 4, 2020
@coroa
Copy link
Member Author

coroa commented Jun 5, 2020

I never addressed this properly, because I was extremely busy and unhappy with personal matters; and it's quite subtle.

The short answer is: Yes, of course, if we change the ordering of the y dimension of the coordinate system, then it is necessary that we change the order of the columns in the matrix and, since grid_coordinates/grid_cells returns the basis of the columns it must change accordingly as well.

Do people expect this? I don't know. I thought it was obvious. Quite obviously, I'm wrong.

The code you are referring to is hideous because you are not following the convention that y is ascending in the layoutmatrix. And yes, it seems like sparse does not help you too much to compensate that.

What you could do, for example, is to switch to this convention early on, ie. use

    availabilities = xr.DataArray(
        np.array(availabilities),
        coords=[
            buses,
            y[::-1], # GDAL uses a north to south latitude
            x
        ],
        dims=['bus', 'y', 'x']
    ).isel(y=slice(None, None, -1))

for https://github.com/PyPSA/pypsa-eur/blob/f5382021a42dcd3fb55cfe18d3648972857d653f/scripts/build_renewable_profiles.py#L343-L350 .

Or embrace it even earlier just after calling extractMatrix ie use

    return gk.raster.extractMatrix(availability)[::-1] # GDAL uses a north to south latitude

Or just don't use sparse and stick with the final matrix structure as csr by following my initial advise #20 (comment) .

I don't understand the possible solution, you mention. The index order is fixed now as ascending y and x, we do not need any sort operations for this. I do not think it is less good than the other possible convention. I can come up with pros and cons for each, if we want to compile a list.

There will always be different conventions, it is important to communicate them clearly, but ultimately it is up to the users to follow them!

And I would even go a bit further and say that any mechanisms that try to predict and compensate for user errors automatically will on the long run lead to further and then less obvious mistakes.

@euronion
Copy link
Collaborator

euronion commented Jun 8, 2020

Adressed in 12a86b3 in #20 .

A few comments:

Do people expect this? I don't know. I thought it was obvious. Quite obviously, I'm wrong.

I doubt it. One would have to care about the index order in cutouts, know that it changes and know that the functions are internally designed that way (no doc string mentioning this).

What you could do, for example, is to switch to this convention early on, ie. use
...

Thanks for the advice. Since the changes affect more than just this piece it is insufficient.
I prefer to keep the meta data and data close together with sparse, making the conversion step more obvious via reindex_like suits my style more than multiple list reversals with [::-1].

There will always be different conventions, it is important to communicate them clearly, but ultimately it is up to the users to follow them!

Of course. But we never exposed or documented them.

And I would even go a bit further and say that any mechanisms that try to predict and compensate for user errors automatically will on the long run lead to further and then less obvious mistakes.

I fully agree!

@euronion euronion closed this as completed Jun 8, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants