Odd errors from using pixel_overlaps with a weights option #28

Closed
jrising opened this issue Feb 6, 2022 · 15 comments
jrising commented Feb 6, 2022

This is really three issues in one, all encountered while trying to solve a single problem. A fix to any of them would work for me.

I'm trying to use xagg with some fairly large files including a weights file, and I was getting an error during the regridding process:

>>> weightmap = xa.pixel_overlaps(ds_tas, gdf_regions, weights=ds_pop.Population, subset_bbox=False)
creating polygons for each pixel...
lat/lon bounds not found in dataset; they will be created.
regridding weights to data grid...
Create weight file: bilinear_1800x3600_1080x2160.nc
zsh: illegal hardware instruction  python

(at which point, python crashes)

I decided to do the regridding myself and save the result. Here are what the data file (ds_tas) and weights file (ds_pop) look like:

>>> ds_tas
<xarray.Dataset>
Dimensions:      (band: 12, x: 2160, y: 1080)
Coordinates:
  * band         (band) int64 1 2 3 4 5 6 7 8 9 10 11 12
  * x            (x) float64 -179.9 -179.8 -179.6 -179.4 ... 179.6 179.7 179.9
  * y            (y) float64 89.92 89.75 89.58 89.42 ... -89.58 -89.75 -89.92
    spatial_ref  int64 ...
Data variables:
    band_data    (band, y, x) float32 ...

>>> ds_pop
<xarray.Dataset>
Dimensions:     (longitude: 2160, latitude: 1080)
Coordinates:
  * longitude   (longitude) float64 -179.9 -179.8 -179.6 ... 179.6 179.8 179.9
  * latitude    (latitude) float64 89.92 89.75 89.58 ... -89.58 -89.75 -89.92
Data variables:
    crs         int32 ...
    Population  (latitude, longitude) float32 ...
Attributes:
    Conventions:  CF-1.4
    created_by:   R, packages ncdf4 and raster (version 3.4-13)
    date:         2022-02-05 22:14:16

The dimensions line up exactly. But xagg still wanted to regrid my weights file. My guess is that this is because the dimensions are labeled differently (and so an np.allclose fails because taking a difference between the coordinates results in a 2-D matrix).
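The mismatch is easy to reproduce in a few lines. This is a minimal sketch (the coordinate values below just mimic the datasets shown above):

```python
import numpy as np
import xarray as xr

# Two coordinate arrays with identical values but different dimension
# names, mimicking ds_tas ('y') and ds_pop ('latitude') above.
lat_vals = np.linspace(89.92, -89.92, 1080)
a = xr.DataArray(lat_vals, dims=['y'], coords={'y': lat_vals})
b = xr.DataArray(lat_vals, dims=['latitude'], coords={'latitude': lat_vals})

# Because the dimension names differ, xarray broadcasts instead of
# subtracting element-wise: the result is a 2-D (1080, 1080) matrix of
# pairwise differences, so a check like np.allclose(a - b, 0) fails
# even though the coordinate values are identical.
diff = a - b
print(diff.shape)  # (1080, 1080)
```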

So I relabeled my coordinates and dimensions. This results in a new error:

>>> weightmap = xa.pixel_overlaps(ds_tas, gdf_regions, weights=ds_pop.Population, subset_bbox=False)
creating polygons for each pixel...
lat/lon bounds not found in dataset; they will be created.
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/admin/opt/anaconda3/envs/ccenv2/lib/python3.7/site-packages/xagg/wrappers.py", line 50, in pixel_overlaps
    pix_agg = create_raster_polygons(ds,subset_bbox=None,weights=weights)
  File "/Users/admin/opt/anaconda3/envs/ccenv2/lib/python3.7/site-packages/xagg/core.py", line 127, in create_raster_polygons
    ds = get_bnds(ds)
  File "/Users/admin/opt/anaconda3/envs/ccenv2/lib/python3.7/site-packages/xagg/aux.py", line 190, in get_bnds
    bnds_tmp[1:,:] = xr.concat([ds[var]-0.5*ds[var].diff(var),
  File "/Users/admin/opt/anaconda3/envs/ccenv2/lib/python3.7/site-packages/xarray/core/_typed_ops.py", line 209, in __sub__
    return self._binary_op(other, operator.sub)
  File "/Users/admin/opt/anaconda3/envs/ccenv2/lib/python3.7/site-packages/xarray/core/dataarray.py", line 3081, in _binary_op
    self, other = align(self, other, join=align_type, copy=False)
  File "/Users/admin/opt/anaconda3/envs/ccenv2/lib/python3.7/site-packages/xarray/core/alignment.py", line 349, in align
    f"arguments without labels along dimension {dim!r} cannot be "
ValueError: arguments without labels along dimension 'lat' cannot be aligned because they have different dimension sizes: {1080, 1079}

To be clear, neither of my datasets has a dimension of size 1079.
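The size-1079 array comes from the .diff() call inside get_bnds: differencing a coordinate shortens it by one element, and subtracting the result from the original coordinate mixes the two sizes along the same unlabeled dimension. A minimal sketch of the mechanism:

```python
import numpy as np
import xarray as xr

# A 1080-element coordinate with no index labels, standing in for the
# 'lat' dimension inside get_bnds.
lat = xr.DataArray(np.linspace(89.92, -89.92, 1080), dims=['lat'])
print(lat.sizes['lat'])              # 1080
print(lat.diff('lat').sizes['lat'])  # 1079

# Subtracting the shortened array reproduces the alignment error:
try:
    lat - 0.5 * lat.diff('lat')
except ValueError as err:
    print(err)  # ...different dimension sizes: {1080, 1079}
```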

jrising commented Feb 6, 2022

I was able to get past the regridding step by running fix_ds and get_bnds on the datasets myself, and by changing the first test in

if ((not ((ds.sizes['lat'] is weights.sizes['lat']) & (ds.sizes['lon'] == weights.sizes['lon']))) or

from is to == (which then matches the second test).
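For reference, is tests object identity rather than value equality. CPython interns small integers, so identity checks on sizes happen to pass for small test grids, which can hide this bug:

```python
# `is` checks whether two names point at the same object. CPython caches
# small ints (-5..256), so `is` on sizes in that range happens to work,
# but dimension sizes like 1080 are generally distinct objects.
size_a = 1080
size_b = int('1080')  # a distinct int object with the same value

print(size_a == size_b)  # True
print(size_a is size_b)  # False in CPython: different objects
```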

ks905383 commented Feb 7, 2022

Oof, yeah, that's not great.

Making sure the code runs fix_ds() on every dataset that gets passed in (including weights, etc.) is probably the way to go.

More broadly, this is further motivation to implement #1, since cf_xarray should take care of all of these lat/lon convention issues.

I'll take a look.

ks905383 commented Feb 17, 2022

I changed that equality check as you suggested in the main branch (it will be published with v0.3, which will likely also bring a large general performance boost); I'm also bumping the adoption of cf_xarray up the priority list.

jrising commented Feb 18, 2022

@ks905383 I ran into a few more issues (most importantly, the aggregation process was extremely slow) and fixed them in my own copy of the code. But now I can't figure out which version of the code conda installed for me. I installed it fairly recently (a month ago?), but it doesn't seem to match 0.2.4 through 0.2.6. Any clues?

ks905383 commented:

That's odd - anything installed between about six months ago and yesterday should be 0.2.5.

I'd be curious to hear about your performance improvements in any case.

jrising commented Feb 21, 2022

First, on the versioning: if I install xagg from pip (which was the only way I eventually got it to work with the right constellation of package versions), I get an old version. I just confirmed this. It's older than anything in the xagg repository; as an indication, the docstring of the normalize function in aux.py starts with "Normalizes the vector a", but you can't find that version even in GitHub's blame.

My main speed-up was to remove a bunch of the lines from the loop here:

for loc_idx in np.arange(0,ds_bnds.dims['loc']):

The only line that I think needs to stay in the loop is gdf_pixels.loc[loc_idx,'geometry'] = Polygon(pix_poly_coords[loc_idx]). The other lines can be moved out of the loop, replacing, e.g., gdf_pixels['lat'] = [None]*ds_bnds.dims['loc'] with gdf_pixels['lat'] = ds_bnds.lat.values.
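A minimal sketch of the pattern (hypothetical column names; a plain pandas DataFrame stands in for xagg's GeoDataFrame):

```python
import numpy as np
import pandas as pd

n = 1000
lats = np.linspace(89.92, -89.92, n)

# Slow pattern: pre-fill the column with None, then set one cell per
# loop iteration via .loc.
slow = pd.DataFrame({'lat': [None] * n})
for i in range(n):
    slow.loc[i, 'lat'] = lats[i]

# Fast pattern: assign the whole column in one vectorized step.
fast = pd.DataFrame(index=range(n))
fast['lat'] = lats

print(fast['lat'].tolist() == [float(v) for v in slow['lat']])  # True
```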

ks905383 commented:

Hm, truly odd - it's all up to date on PyPI. Maybe there are some conflicting dependencies. Have you tried installing it in a fresh environment? It would be good to see whether that reproduces the issue.

Re: the speed-up - yes, that would be a huge improvement; I don't know what I was thinking when I wrote that for-loop. Do you want to start a PR? Otherwise I can just do it.

jrising commented Feb 23, 2022

On a new environment I get:

% conda create --name testxagg
...
% conda activate testxagg 
...
% conda install pip
...
% pip install xagg
Collecting xagg
  Using cached xagg-0.2.6-py3-none-any.whl (34 kB)
Collecting geopandas
  Using cached geopandas-0.10.2-py2.py3-none-any.whl (1.0 MB)
Collecting pandas
  Using cached pandas-1.4.1-cp39-cp39-macosx_10_9_x86_64.whl (11.5 MB)
Collecting xagg
  Using cached xagg-0.2.5-py3-none-any.whl (33 kB)
  Using cached xagg-0.2.4-py3-none-any.whl (33 kB)
  Using cached xagg-0.1.4-py3-none-any.whl (60 kB)
Installing collected packages: xagg
Successfully installed xagg-0.1.4

So it wants to install xagg-0.2.6, and the versions of geopandas and pandas it grabs are the most recent on PyPI, but then pip's resolver backtracks all the way down to xagg-0.1.4. pip install -U xagg doesn't change anything. If I do pip install xagg==0.2.6, I get the error:

ERROR: Could not find a version that satisfies the requirement esmpy>=8.1.0 (from xagg) (from versions: none)
ERROR: No matching distribution found for esmpy>=8.1.0

I would make a PR, but I don't see a 0.1.4 tag in the current xagg repo. I guess I could set up xagg-archive as a separate remote and then try to branch off of that and update.

ks905383 commented:

Ah, the problem seems to be that esmpy isn't updated on pip anymore - it's now only available via conda or a source install. We had to force esmpy>=8.1.0 because of an unrelated issue with a dependency of xesmf (not even related to the core functionality; something involving cf_xarray compatibility).

I was able to replicate the issue on my machine. There are a few options I'm trying. One would just be a source install of the latest esmpy, which isn't ideal, but should work. There's also an unofficial port of esmpy back onto pip as pyESMF (here's a link to the project), but the installation doesn't seem to be working on a clean environment, due to some issue with the wheel.

I'll keep you updated.

I'll just add the speedup fix myself, then - it will fit nicely alongside the other speed fixes coming in v0.3.0.

ODOU commented Feb 24, 2022

I also got this error while installing the xagg package. I got the message ModuleNotFoundError: No module named 'xesmf', and when I tried to install xesmf I got this error message: ERROR: Cannot install xesmf==0.1.1, xesmf==0.1.2, xesmf==0.2.0, xesmf==0.2.1, xesmf==0.2.2 and xesmf==0.3.0 because these package versions have conflicting dependencies.

I would like to know if anyone has been able to install xagg recently. I have tried many ways to install it, but they all failed.

ks905383 commented Feb 26, 2022

Hi - installing xagg from conda should work fine (I just did it in a new environment) and get you v0.2.6. However, numba, which is a dependency of xesmf, may be running into issues because the latest version of numba needs NumPy <=1.21.0.

Here is a sequence that just worked for me:

  1. Install xagg through conda-forge (conda install -c conda-forge xagg).
  2. Downgrade NumPy to 1.21.0, which currently only works through pip (pip install numpy==1.21.0).

This should allow you to use xagg.

This doesn't yet solve @jrising's problem if you can't use conda, but it does mean you can install it. I'm honestly not sure when we can get the issues relating to xesmf's dependencies fixed - it's a very complex package that depends on things that tend to break often (esmpy, etc.) - but we'll keep you updated!

ks905383 added a commit that referenced this issue Feb 28, 2022
Until the current installation issues with `conda`, `pip` are fixed, the workaround mentioned in #28 should be posted in an easier spot to see
ks905383 commented Jul 3, 2023

@jrising are these issues still occurring? (If so, can you try with xagg==0.3.1?)

jrising commented Jul 4, 2023

@ks905383 I'm not going to have a chance to check for a couple weeks. But I can look after that.

jrising commented Jul 28, 2023

I can't find the code that produced these errors, but I have since been able to successfully use xagg under similar conditions. So, I think it's fixed. Feel free to close.

ks905383 commented Aug 7, 2023

Great, thanks for checking!

ks905383 closed this as completed Aug 7, 2023