Add basic adaptive masking to solve #29 #65
…o adaptative_masking
Salut Stephane, the reason why the tests are failing is that in tests/test_frontend.py the
A solution is for example to prevent
def test_build_regridder_with_masks():
    ds_in_test = ds_in.copy()
    ds_in_test['mask'] = xr.DataArray(np.random.randint(2, size=ds_in_test['data'].shape), dims=('y', 'x'))
    # 'patch' is too slow to test
    for method in [
        'bilinear',
        'conservative',
        'conservative_normed',
        'nearest_s2d',
        'nearest_d2s',
    ]:
        regridder = xe.Regridder(ds_in_test, ds_out, method)
        # check screen output
        assert repr(regridder) == str(regridder)
        assert 'xESMF Regridder' in str(regridder)
        assert method in str(regridder)
@pytest.mark.parametrize(
    "method, adaptative_masking, nvalid", [
        ("bilinear", False, 380),
        ("bilinear", True, 395),
        ("conservative", False, 385),
        ("conservative", True, 394)
    ])
def test_adaptative_masking(method, adaptative_masking, nvalid):
    dai = ds_in["data4D"].copy()
    dai[0, 0, 4:6, 4:6] = np.nan
    rg = xe.Regridder(ds_in, ds_out, method)
    dao = rg(dai, adaptative_masking=adaptative_masking)
    assert int(dao[0, 0, 1:-1, 1:-1].notnull().sum()) == nvalid
In the following, a few comments:
This is how these two suggestions could look in your function:
@staticmethod
def _regrid_array(
        indata, *,
        weights, shape_in, shape_out,
        sequence_in,
        adaptative_masking):
    [ ... ]
    # interpreting adaptative_masking
    if isinstance(adaptative_masking, bool):
        mask_threshold = float(not adaptative_masking)
    elif adaptative_masking >= 1 or adaptative_masking < 0:
        adaptative_masking = False
        mask_threshold = 1.
    else:
        # read the threshold before overwriting the argument with a bool
        mask_threshold = float(adaptative_masking)
        adaptative_masking = True
    [ ... ]
    # allow adaptive masking not only for non-permanent masks
    # and allow it for only 2 dimensions
    # Are there any non-permanent missing values?
    ndim = np.ndim(indata)
    if ndim > 2:
        inmask = np.isnan(indata)
        # np.ptp() is used because ndarray.ptp() was removed in NumPy 2.0
        has_non_perm_mask = np.ptp(
            np.apply_over_axes(np.sum, inmask, [-2, -1])) != 0
        if not adaptative_masking and has_non_perm_mask:
            warnings.warn(
                "Your data has transient missing values. "
                "You should set adaptative_masking to True, "
                "which will be the default in future versions.")
    [ ... ]
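For reference, the argument interpretation suggested above can be pulled out into a small standalone helper. This is only a sketch: the function name `interpret_adaptative_masking` is hypothetical and not part of xESMF, and it simply mirrors the branches of the snippet above (a bool switches the feature on or off, a number in [0, 1) is taken as the threshold, anything else disables it).

```python
def interpret_adaptative_masking(adaptative_masking):
    """Return (enabled, mask_threshold) for a bool or numeric argument.

    Hypothetical helper mirroring the branches suggested in the review.
    """
    if isinstance(adaptative_masking, bool):
        # True -> threshold 0.0, False -> threshold 1.0
        return adaptative_masking, float(not adaptative_masking)
    value = float(adaptative_masking)
    if value >= 1 or value < 0:
        # out-of-range numbers disable adaptive masking
        return False, 1.0
    # a number in [0, 1) enables it and is used as the threshold
    return True, value


for arg in (True, False, 0.5, 1.5, -0.1):
    print(arg, interpret_adaptative_masking(arg))
```

Returning both values at once avoids the ordering problem where the argument is overwritten with a bool before the threshold is read from it.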
Thanks a lot, Martin!
Merge branch 'master' into adaptative_masking
@sol1105 can you test it before I rebase and squash?
@stefraynaud I will test it soon and then get back to you.
@stefraynaud The adaptive masking works nicely. Some feedback follows:
import numpy as np
indata = np.zeros((2,3,3,4))
ndim=indata.ndim
# 1 # for one timestep and one level, a gridpoint is masked
# has_non_perm_mask should return True
indata[0,0,1,1]=np.nan
inmask = np.isnan(indata)
many = np.apply_over_axes(np.any, inmask, range(ndim-2)) # Applying over all axes but lat-lon
mall = np.apply_over_axes(np.all, inmask, range(ndim-2)) # assuming lat-lon are the rightmost axes
has_non_perm_mask = not (many == mall).all() # If both are the same, there is either none or a permanent mask
print("#1 should be True:", has_non_perm_mask)
# 2 # for one timestep and all levels, a gridpoint is masked
# has_non_perm_mask should return True
indata[0,:,1,1]=np.nan
inmask = np.isnan(indata)
many = np.apply_over_axes(np.any, inmask, range(ndim-2))
mall = np.apply_over_axes(np.all, inmask, range(ndim-2))
has_non_perm_mask = not (many == mall).all()
print("#2 should be True:", has_non_perm_mask)
# 3 # for all timesteps and all levels, a gridpoint is masked
# has_non_perm_mask should return False
indata[:,:,1,1]=np.nan
inmask = np.isnan(indata)
many = np.apply_over_axes(np.any, inmask, range(ndim-2))
mall = np.apply_over_axes(np.all, inmask, range(ndim-2))
has_non_perm_mask = not (many == mall).all()
print("#3 should be False:", has_non_perm_mask)
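The three cases above repeat the same check, so it could be wrapped in one function. The following is a minimal sketch under the same assumption as the snippet (lat-lon are the two rightmost axes); the name `has_non_permanent_mask` is hypothetical and it uses plain `any`/`all` reductions instead of `np.apply_over_axes`.

```python
import numpy as np


def has_non_permanent_mask(indata):
    """True if some grid point is NaN for some, but not all, leading indices.

    Assumes the two rightmost axes are the horizontal (lat-lon) axes.
    """
    inmask = np.isnan(indata)
    lead_axes = tuple(range(inmask.ndim - 2))
    if not lead_axes:
        # a purely 2D field has no leading axis along which a mask could vary
        return False
    m_any = inmask.any(axis=lead_axes)  # masked at least once
    m_all = inmask.all(axis=lead_axes)  # masked everywhere (permanent)
    return bool((m_any != m_all).any())


indata = np.zeros((2, 3, 3, 4))
indata[0, 0, 1, 1] = np.nan
print("one timestep, one level:", has_non_permanent_mask(indata))
indata[0, :, 1, 1] = np.nan
print("one timestep, all levels:", has_non_permanent_mask(indata))
indata[:, :, 1, 1] = np.nan
print("all timesteps, all levels:", has_non_permanent_mask(indata))
```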
Questions:
- Does it work with dask arrays?
- Does it significantly impact performance?
Otherwise good for me.
I like this!
I am also interested in knowing how it impacts performance.
This needs to be tested thoroughly, but I believe it's already dask-proof because _regrid_array is wrapped in apply_ufunc. Thus, it always receives numpy arrays, no matter what __call__ receives.
Well, just tried it and it works really well. Even more so, it (almost) fixes #77! In fact, if we follow the comment of @sol1105 and allow it for Proposition, rename it Personally, I don't see the advantage of checking for transient nan values when it is not activated. In the new perspective that this is a I am ok with passing
Thank you all for these reviews, which culminate with @aulemahal's review.
Super! At the risk of seeming too nitpicky, I would lean towards
You are not nitpicky! In fact, I wanted to say
Fix linting
41663eb to 342a6c2
Hi guys, you can have a look at this new version.
It will be nice to have some docs accompanying this new functionality, but it could be in another PR.
Co-authored-by: David Huard <huard.david@ouranos.ca>
@huard yes, I'm more in favor of creating another PR to add this feature to the notebook dedicated to masking.
@raphaeldussin I'll let you merge this one since you requested changes. Good on my end.
Found 1 typo, but very happy with the PR!
As explained in #29, the regridding leads to unexpected results when the input dataset has transient missing values, i.e. when the number of missing values varies along dimensions other than the horizontal ones.
This PR fixes #29 by normalising the results with the regridded transient mask.
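The idea of normalising by the regridded mask can be illustrated with plain NumPy. This is a sketch under stated assumptions, not the xESMF implementation: the regridding is modelled as a dense weight matrix of shape (n_out, n_in) acting on a flattened horizontal axis, and both the function name `regrid_with_adaptive_mask` and the parameter `min_valid_fraction` are hypothetical.

```python
import numpy as np


def regrid_with_adaptive_mask(weights, data, min_valid_fraction=0.5):
    """Regrid data of shape (..., n_in) with weights of shape (n_out, n_in).

    NaNs are filled with zero, and the result is divided by the regridded
    validity mask so that missing inputs do not drag the output towards zero.
    Output cells whose valid weight fraction is below min_valid_fraction
    (a hypothetical parameter) are set to NaN.
    """
    valid = ~np.isnan(data)
    filled = np.where(valid, data, 0.0)
    raw = filled @ weights.T                       # regridded zero-filled data
    valid_frac = valid.astype(float) @ weights.T   # regridded transient mask
    with np.errstate(divide="ignore", invalid="ignore"):
        out = raw / valid_frac
    out[valid_frac < min_valid_fraction] = np.nan
    return out


# two output cells, each averaging two of the four input cells
weights = np.array([[0.5, 0.5, 0.0, 0.0],
                    [0.0, 0.0, 0.5, 0.5]])
# second "timestep" has one transient missing value
data = np.array([[1.0, 3.0, 5.0, 7.0],
                 [1.0, np.nan, 5.0, 7.0]])
print(regrid_with_adaptive_mask(weights, data))
print(regrid_with_adaptive_mask(weights, data, min_valid_fraction=0.75))
```

With the lower threshold the half-valid output cell is kept and takes the value of its remaining valid input; with the higher threshold it is masked instead.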