Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

extra dimension gets added when not using lat and lon in xdim and ydim #31

Open
rwegener2 opened this issue Apr 18, 2022 · 0 comments
Open

Comments

@rwegener2
Copy link

rwegener2 commented Apr 18, 2022

What's happening

When I use Ocetrac with a dataset that has dimension names that aren't lat or lon (for example, y and x) I get the error TypeError: Only 2-D and 3-D images supported.. I specified the different dimension names using xdim and ydim.

ocetrac.Tracker(hot_water, mask, radius=2, min_size_quartile=0.75, timedim = 'time', xdim = 'x', ydim='y', positive=True)

Guess about the problem

When I traced this back it seems like what is going on is that when you specify an xdim or ydim that aren't lat or lon, lat and lon still get added as new dimensions and the user specified xdim and ydim don't get removed. That is why the resulting error is scolding you for not having 2 or 3 dimensions, because you now have 5 (time, x, y, lat, lon).

Example

This example uses a subset of the MUR SST dataset from AWS. I manually changed the dimension names to x and y to demonstrate the error.
The example takes about 1 minute to run. You need s3fs installed to open the data.

Example Code
import fsspec
import xarray as xr
import ocetrac


# Load zarr SST (NASA MUR)
ikey = fsspec.get_mapper('s3://mur-sst/zarr', anon=True)

mur_full = xr.open_zarr(ikey, consolidated=True)
mur = mur_full['analysed_sst']

# Get a small area for this test
mur_subset = mur.sel(lat=slice(32, 32.5), lon=slice(121.4, 122.2))

# Compute the climatology
climatology = mur_subset.groupby(mur_subset.time.dt.month).mean()

# Calculate the anomaly (with only a month's worth of data)
mur_2018_subset = mur_subset.sel(time='2018-06')
anomaly = mur_2018_subset.groupby(mur_2018_subset.time.dt.month) - climatology
anomaly = anomaly.load()

# Create a land mask
mask = xr.where(mur_2018_subset.isel(time=0) <= 270, 0, 1)
mask = mask.load()

# Calculate the 90th percentile values
percentile = 0.9
threshold = mur_subset.groupby(mur_subset.time.dt.month).quantile(percentile, dim='time', keep_attrs=True, skipna=True, )

# Use the threhold values to mask out values that are not marine heatwave
hot_water = anomaly.where(mur_2018_subset.groupby(mur_2018_subset.time.dt.month)>threshold)
hot_water = hot_water.load()

# Change the dimensions from lat, lon to x, y (in order to show the error)
hot_water = hot_water.rename({'lon': 'x', 'lat': 'y'})
# Run Ocetrac
Tracker = ocetrac.Tracker(hot_water, mask, radius=2, min_size_quartile=0.75, timedim = 'time', xdim = 'x', ydim='y', positive=True)
blobs = Tracker.track()

Removing the line hot_water = hot_water.rename({'lon': 'x', 'lat': 'y'}) and using Tracker = ocetrac.Tracker(hot_water, mask, radius=2, min_size_quartile=0.75, timedim = 'time', xdim = 'lon', ydim='lat', positive=True) instead works fine. (No objects are found, but we get past the dimension error)

Full Traceback
TypeError                                 Traceback (most recent call last)
<ipython-input-5-fea20ce37126> in <module>
     44 Tracker = ocetrac.Tracker(hot_water, mask, radius=2, min_size_quartile=0.75, timedim = 'time', xdim = 'x', ydim='y', positive=True)
     45 # works: Tracker = ocetrac.Tracker(hot_water, mask, radius=2, min_size_quartile=0.75, timedim = 'time', xdim = 'lon', ydim='lat', positive=True)
---> 46 blobs = Tracker.track()

~/.conda/envs/mhw/lib/python3.8/site-packages/ocetrac/tracker.py in track(self)
     78 
     79         # Filter area
---> 80         area, min_area, binary_labels, N_initial = self._filter_area(binary_images_with_mask)
     81 
     82         # Label objects

~/.conda/envs/mhw/lib/python3.8/site-packages/ocetrac/tracker.py in _filter_area(self, binary_images)
    199 
    200         # Calculate Area of each object and keep objects larger than threshold
--> 201         props = regionprops(labels_wrapped.astype('int'))
    202 
    203         labelprops = [p.label for p in props]

~/.conda/envs/mhw/lib/python3.8/site-packages/skimage/measure/_regionprops.py in regionprops(label_image, intensity_image, cache, coordinates)
    851 
    852     if label_image.ndim not in (2, 3):
--> 853         raise TypeError('Only 2-D and 3-D images supported.')
    854 
    855     if not np.issubdtype(label_image.dtype, np.integer):

TypeError: Only 2-D and 3-D images supported.

Status

I am happy to work on a fix for this. I think the dimension is getting added either in the _apply_mask() or _filter_area() area functions. If anyone has additional thoughts on where to start I definitely welcome those, otherwise I'll post updates here. I am assuming the behavior we want is to use the user specified xdim and ydim and avoid the behavior where lat and lon get added to the Dataset.
Also, if I am just using Ocetrac wrong and this isn't actually a bug please let me know!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant