remove for loops #16

malmans2 · 2021-06-07T17:51:53Z

@oceandie
There's a couple of things a think xarray can simplify:

No need to initialize variables. You can write the code without even knowing the shape of your variables (e.g., 1D z, 2D sigma, ..) and xarray will automatically broadcast whenever it's needed.
Because of that, you could get rid of a for loop you had in __call__

codecov · 2021-06-07T18:28:47Z

Codecov Report

❗ No coverage uploaded for pull request base (zco_dev@ce734d8). Click here to learn what that means.
The diff coverage is n/a.

❗ Current head 92ebee3 differs from pull request most recent head adc8fb1. Consider uploading reports for the commit adc8fb1 to get more accurate results

@@            Coverage Diff             @@
##             zco_dev      #16   +/-   ##
==========================================
  Coverage           ?   92.30%           
==========================================
  Files              ?        2           
  Lines              ?       39           
  Branches           ?        0           
==========================================
  Hits               ?       36           
  Misses             ?        3           
  Partials           ?        0

Flag	Coverage Δ
unittests	`92.30% <0.00%> (?)`

Flags with carried forward coverage won't be shown. Click here to find out more.

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update ce734d8...adc8fb1. Read the comment docs.

jdha · 2021-06-07T19:48:23Z

pydomcfg/domzgr/zgr.py

-            range(1, self._jpk + 1), coords={"z": range(self._jpk)}, dims="z"
-        ).astype(float)
+        kindx = DataArray(range(self._jpk), coords={"z": range(self._jpk)}, dims="z")
+        kindx = kindx + 1.0  # to deal with python convention


to be consistent with ln 60:
kindx += 1.0 # Fortran indexing

Can't do it because int += float is not allowed with DataArrays.
I don't like it either, so many options that I can't decide what would be the best to clarify what's going on:

kindx = DataArray(np.arange(1., self._jpk+1), coords={"z": range(self._jpk)}, dims="z") # or kindx = DataArray(range(1, self._jpk+1), coords={"z": range(self._jpk)}, dims="z").astype("float") # or kindx = DataArray(range(self._jpk), coords={"z": range(self._jpk)}, dims="z").astype("float") kindx += 1.

Thoughts?

My vote is for: kindx = DataArray(np.arange(1., self._jpk+1), coords={"z": range(self._jpk)}, dims="z")

oceandie · 2021-06-08T07:23:19Z

Hi @pyNEMO/pydomcfg guys,

That's very interesting, thanks a lot. Getting rid of loops and writing a more efficient pythonic code is surely our ultimate goal!
However I am not so sure the solution you are proposing here would work in general. Let me explain.

While in z-coordinates z3 is effectively only 1D, i.e. z3(k), in s- and MEs-coordinates z3 is completely 3D z3(x,y,z). In addition, in s- and MEs-coordinates the coordinate transformation will have some parameters that will vary with the 2D bathymetry (e.g., the transitioning from sigma to stretched s-coordinates in sh94 or sf12). I personally think that using loop structures will help me writing faster and a more cleaner code that I could debug and check more easily. Of course the price would be the less efficiency , but I think we could worry about this at the end of this first phase, when we will a more comprehensive picture of all the cases we will have to deal with.

So my suggestion would be to leave all these developments in a branch, continuing with the looping code and once we will have a base version of the code that we will be sure is able to correctly replicate the Fortran90 version then we could massively invest our energies in optimise the code and make it more efficient.

What do you think?

jdha · 2021-06-08T07:44:40Z

Hi @pyNEMO/pydomcfg guys,

That's very interesting, thanks a lot. Getting rid of loops and writing a more efficient pythonic code is surely our ultimate goal!
However I am not so sure the solution you are proposing here would work in general. Let me explain.

While in z-coordinates z3 is effectively only 1D, i.e. z3(k), in s- and MEs-coordinates z3 is completely 3D z3(x,y,z). In addition, in s- and MEs-coordinates the coordinate transformation will have some parameters that will vary with the 2D bathymetry (e.g., the transitioning from sigma to stretched s-coordinates in sh94 or sf12). I personally think that using loop structures will help me writing faster and a more cleaner code that I could debug and check more easily. Of course the price would be the less efficiency , but I think we could worry about this at the end of this first phase, when we will a more comprehensive picture of all the cases we will have to deal with.

So my suggestion would be to leave all these developments in a branch, continuing with the looping code and once we will have a base version of the code that we will be sure is able to correctly replicate the Fortran90 version then we could massively invest our energies in optimise the code and make it more efficient.

What do you think?

Hi @oceandie, sorry I was busy with other things yesterday - I'll try and catch up with this thread a little later. I would agree with you that to get the base code down is a priority. @malmans2 is introducing me to quite a few new aspects to python - which will take a while to permeate the 'old grey matter'. I'm not too sure it's necessary to keep these devs in a branch. The task at hand is to lay down the base code. In my mind once that is done, tested and approved it can be merged. We can open an 'efficiency' branch later. I was also thinking that Friday coffee/tea meet-ups are great way to review and discuss pythonic approaches... I think that way some of your/@malmans2 ideas/practices will slowly creep into the way I code too ;)

oceandie · 2021-06-08T07:58:17Z

I defenitely agree with you on everything @jdha. If you agree I will setup a recursive 30 min python-tea/coffee-chat for Fridays: what time do you generally prefer guys?

malmans2 · 2021-06-08T08:01:25Z

This PR is not really meant to get merged but just to showcase how the final version could look like (lemme move it to draft). Would have not been easy to do it with inline comments. We can introduce these kind of more aggressive changes once you have a test that makes sure we are getting consistent results, but I think it's better sooner rather than later. So 1. conversion from Fortran -> 2. test -> 3. refactor

The point I wanted to make is that your code is pretty much already written in a way that with a little change it would work with both for loops or multidimensional variables. The only thing you have to get rid of is basically the few times you use float(var), that's the only thing that assumes that you are using numbers and not ND variables (and also I don't think you need it most of the times with Python 3). I hardcoded kindx in sigma to keep it easy, but that can be still an input of sigma. The point is that whether that input is 0, 1, 2, ND, it doesn't matter with xarray. You'll end up with a 0, 1, 2, ND variable, then it's up to you (e.g., fill the variable with a for loop or broadcast to 3D).

malmans2 · 2021-06-08T08:05:47Z

Fridays work for me! My afternoons are usually clear...

jdha · 2021-06-08T08:19:31Z

Fridays work for me! My afternoons are usually clear...

me too - preferably between 14h00 and 15h00. I usually have to be out the door at 15h15 for the school run.

jdha · 2021-06-08T08:23:07Z

This PR is not really meant to get merged but just to showcase how the final version could look like (lemme move it to draft). Would have not been easy to do it with inline comments. We can introduce these kind of more aggressive changes once you have a test that makes sure we are getting consistent results, but I think it's better sooner rather than later. So 1. conversion from Fortran -> 2. test -> 3. refactor

The point I wanted to make is that your code is pretty much already written in a way that with a little change it would work with both for loops or multidimensional variables. The only thing you have to get rid of is basically the few times you use float(var), that's the only thing that assumes that you are using numbers and not ND variables (and also I don't think you need it most of the times with Python 3). I hardcoded kindx in sigma to keep it easy, but that can be still an input of sigma. The point is that whether that input is 0, 1, 2, ND, it doesn't matter with xarray. You'll end up with a 0, 1, 2, ND variable, then it's up to you (e.g., fill the variable with a for loop or broadcast to 3D).

.... oh, just realised that this is a second PR - sorry I had it in my head we were talking about this one. Happy to keep it as a PR as I try and digest the message ;)

oceandie · 2021-06-08T09:04:33Z

This PR is not really meant to get merged but just to showcase how the final version could look like (lemme move it to draft). Would have not been easy to do it with inline comments. We can introduce these kind of more aggressive changes once you have a test that makes sure we are getting consistent results, but I think it's better sooner rather than later. So 1. conversion from Fortran -> 2. test -> 3. refactor

The point I wanted to make is that your code is pretty much already written in a way that with a little change it would work with both for loops or multidimensional variables. The only thing you have to get rid of is basically the few times you use float(var), that's the only thing that assumes that you are using numbers and not ND variables (and also I don't think you need it most of the times with Python 3). I hardcoded kindx in sigma to keep it easy, but that can be still an input of sigma. The point is that whether that input is 0, 1, 2, ND, it doesn't matter with xarray. You'll end up with a 0, 1, 2, ND variable, then it's up to you (e.g., fill the variable with a for loop or broadcast to 3D).

I think I like what you are saying (not sure if I am getting correctly :) ): basically you are saying that all methods returning float could be converted to return Dataset and then doesn't matter if you call them to return just one scalar (so behaving effectively as a float(var)) or to return an N-dim array. In this way we could keep the loops for the moment but having a code which is almost ready for a more massive optimisation in the future? ... Am I getting right? ... because if it is like this, if you agree I think I will implement your suggestions for modifying sigma, compute_z3 etc from float to Dataarray ...

malmans2 · 2021-06-08T09:19:18Z

I think it might be even easier than that... Most of the time you wrap variables in float() I don't think you need it.
In Python 3:

>>> type(1/1)
<class 'float'>
>>> type(1//1)
<class 'int'>
>>> type(1/float(1))
<class 'float'>
>>> type(1/int(1))
<class 'float'>

As you can't do float(DataArray) or float(Dataset), if float(var) is not needed just replace with var

malmans2 added 5 commits June 7, 2021 18:47

remove for loops

73008a4

actually, better indexing than fillna I think

4857f37

fix coordinates and shapes

48aa1ba

restore indexing instead of where

fa6de68

restore comment

92ebee3

dim order consistent with Bathymetry

adc8fb1

jdha reviewed Jun 7, 2021

View reviewed changes

malmans2 marked this pull request as draft June 8, 2021 08:01

malmans2 closed this Jun 11, 2021

malmans2 deleted the no_loops branch June 13, 2021 11:19

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

remove for loops #16

remove for loops #16

malmans2 commented Jun 7, 2021

codecov bot commented Jun 7, 2021 •

edited

jdha Jun 7, 2021

malmans2 Jun 7, 2021

jdha Jun 7, 2021 •

edited

oceandie commented Jun 8, 2021

jdha commented Jun 8, 2021

oceandie commented Jun 8, 2021 •

edited

malmans2 commented Jun 8, 2021 •

edited

malmans2 commented Jun 8, 2021

jdha commented Jun 8, 2021

jdha commented Jun 8, 2021

oceandie commented Jun 8, 2021

malmans2 commented Jun 8, 2021

remove for loops #16

remove for loops #16

Conversation

malmans2 commented Jun 7, 2021

codecov bot commented Jun 7, 2021 • edited

Codecov Report

jdha Jun 7, 2021

Choose a reason for hiding this comment

malmans2 Jun 7, 2021

Choose a reason for hiding this comment

jdha Jun 7, 2021 • edited

Choose a reason for hiding this comment

oceandie commented Jun 8, 2021

jdha commented Jun 8, 2021

oceandie commented Jun 8, 2021 • edited

malmans2 commented Jun 8, 2021 • edited

malmans2 commented Jun 8, 2021

jdha commented Jun 8, 2021

jdha commented Jun 8, 2021

oceandie commented Jun 8, 2021

malmans2 commented Jun 8, 2021

codecov bot commented Jun 7, 2021 •

edited

jdha Jun 7, 2021 •

edited

oceandie commented Jun 8, 2021 •

edited

malmans2 commented Jun 8, 2021 •

edited