Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

31 file part3advanced ufunc #91

Merged
merged 7 commits into from
Nov 6, 2023

Conversation

tinaok
Copy link
Collaborator

@tinaok tinaok commented Nov 4, 2023

It should work on EOSC, but I only tested on local environment.

It is on purpose that it does not do 'compute' or 'persist' after apply_ufunc. (We can remind about 'lazyness' of Xarray-Dask there too)

@jdries I hope this update is fine for connecting to your section.

Setted 3 Gateway cells as 'Raw' so that users will activate during the lesson.  This should make jupyterbook rendering possible for this notebook
Copy link

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

@@ -2,9 +2,7 @@
"cells": [
Copy link
Collaborator

@pl-marasco pl-marasco Nov 4, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe would be nice to use the same approach I used in the notebook for warnings just to be more consistent


Reply via ReviewNB

@@ -2,9 +2,7 @@
"cells": [
Copy link
Collaborator

@pl-marasco pl-marasco Nov 4, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you didn't test the time and have numbers I would suggest to add a statement that where you at least make users aware that in some cases apply Ufunc can be slower that the usual approach.


Reply via ReviewNB

@@ -2,9 +2,7 @@
"cells": [
Copy link
Collaborator

@pl-marasco pl-marasco Nov 4, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here I would suggest to be more clear about what the UFunc does. Below the description from the documentation.

"Apply a vectorized function for unlabeled arrays on xarray objects.

The function will be mapped over the data variable(s) of the input arguments using

xarray’s standard rules for labeled computation, including alignment, broadcasting,

looping over GroupBy/Dataset variables, and merging of coordinates. "

It's not only a matter of mapping over the chunks


Reply via ReviewNB

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree it would be beneficial to add further explanation.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this notebook, we 'introduce' apply_ufunc, and in next OpenEO part it comes back to it. So I left it in 'light'. Please go ahead for adding explanation if you have one to make users understand better.

Copy link
Collaborator

@keewis keewis Nov 5, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To explain apply_ufunc, it is important to note that it takes a function that works with numpy-like arrays and wraps it to support xarray objects. In case of a reduction, this means that the original function takes axis numbers instead of dimension names. "vectorized" itself is not that important (though it does make a difference performance-wise), that's what the vectorize parameter takes care of.

If you're looking for a more comprehensible explanation, https://tutorial.xarray.dev/advanced/apply_ufunc/simple_numpy_apply_ufunc.html is the place to look.

Copy link
Collaborator

@keewis keewis Nov 5, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe this?

* here we use xarray.apply_ufunc to apply the function written for numpy-like arrays to the xarray.DataArray object. Note that in this case the function is composed of element-wise functions and thus can be directly applied to the underlying daskarrays.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also, I would use np.isin instead of comparing all array values with 8,9 and 3 separately for the mask

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To explain apply_ufunc, it is important to note that it takes a function that works with numpy-like arrays and wraps it to support xarray objects. In case of a reduction, this means that the original function takes axis numbers instead of dimension names. "vectorized" itself is not that important (though it does make a difference performance-wise), that's what the vectorize parameter takes care of.

for that reason I'm pushing to use a different example but I don't think that we have time

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@keewis I will add the resource you suggested at the top

@@ -2,9 +2,7 @@
"cells": [
Copy link
Collaborator

@pl-marasco pl-marasco Nov 4, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would also change a little bit the scope otherwise could be boring like hell see all the time the same stuff.


Reply via ReviewNB

Copy link
Member

@acocac acocac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tinaok all ok running the notebook in Pangeo-EOSC, however it seems it fails to build in the Jupyter book.

I'd suggest add a if/else statement that detects if the environment has dask.gateaway installed, otherwise run the dask.distributed.

@acocac acocac self-requested a review November 6, 2023 07:25
@annefou annefou merged commit b1a7912 into pangeo-data:main Nov 6, 2023
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants