Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

remove for loops #16

Closed
wants to merge 6 commits into from
Closed

remove for loops #16

wants to merge 6 commits into from

Conversation

malmans2
Copy link
Member

@malmans2 malmans2 commented Jun 7, 2021

@oceandie
There's a couple of things a think xarray can simplify:

  1. No need to initialize variables. You can write the code without even knowing the shape of your variables (e.g., 1D z, 2D sigma, ..) and xarray will automatically broadcast whenever it's needed.
  2. Because of that, you could get rid of a for loop you had in __call__

@codecov
Copy link

codecov bot commented Jun 7, 2021

Codecov Report

❗ No coverage uploaded for pull request base (zco_dev@ce734d8). Click here to learn what that means.
The diff coverage is n/a.

❗ Current head 92ebee3 differs from pull request most recent head adc8fb1. Consider uploading reports for the commit adc8fb1 to get more accurate results
Impacted file tree graph

@@            Coverage Diff             @@
##             zco_dev      #16   +/-   ##
==========================================
  Coverage           ?   92.30%           
==========================================
  Files              ?        2           
  Lines              ?       39           
  Branches           ?        0           
==========================================
  Hits               ?       36           
  Misses             ?        3           
  Partials           ?        0           
Flag Coverage Δ
unittests 92.30% <0.00%> (?)

Flags with carried forward coverage won't be shown. Click here to find out more.


Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update ce734d8...adc8fb1. Read the comment docs.

range(1, self._jpk + 1), coords={"z": range(self._jpk)}, dims="z"
).astype(float)
kindx = DataArray(range(self._jpk), coords={"z": range(self._jpk)}, dims="z")
kindx = kindx + 1.0 # to deal with python convention
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to be consistent with ln 60:
kindx += 1.0 # Fortran indexing

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't do it because int += float is not allowed with DataArrays.
I don't like it either, so many options that I can't decide what would be the best to clarify what's going on:

kindx = DataArray(np.arange(1., self._jpk+1), coords={"z": range(self._jpk)}, dims="z")
# or
kindx = DataArray(range(1, self._jpk+1), coords={"z": range(self._jpk)}, dims="z").astype("float")
# or
kindx = DataArray(range(self._jpk), coords={"z": range(self._jpk)}, dims="z").astype("float")
kindx += 1.

Thoughts?

Copy link
Contributor

@jdha jdha Jun 7, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My vote is for: kindx = DataArray(np.arange(1., self._jpk+1), coords={"z": range(self._jpk)}, dims="z")

@oceandie
Copy link
Contributor

oceandie commented Jun 8, 2021

Hi @pyNEMO/pydomcfg guys,

That's very interesting, thanks a lot. Getting rid of loops and writing a more efficient pythonic code is surely our ultimate goal!
However I am not so sure the solution you are proposing here would work in general. Let me explain.

While in z-coordinates z3 is effectively only 1D, i.e. z3(k), in s- and MEs-coordinates z3 is completely 3D z3(x,y,z). In addition, in s- and MEs-coordinates the coordinate transformation will have some parameters that will vary with the 2D bathymetry (e.g., the transitioning from sigma to stretched s-coordinates in sh94 or sf12). I personally think that using loop structures will help me writing faster and a more cleaner code that I could debug and check more easily. Of course the price would be the less efficiency , but I think we could worry about this at the end of this first phase, when we will a more comprehensive picture of all the cases we will have to deal with.

So my suggestion would be to leave all these developments in a branch, continuing with the looping code and once we will have a base version of the code that we will be sure is able to correctly replicate the Fortran90 version then we could massively invest our energies in optimise the code and make it more efficient.

What do you think?

@jdha
Copy link
Contributor

jdha commented Jun 8, 2021

Hi @pyNEMO/pydomcfg guys,

That's very interesting, thanks a lot. Getting rid of loops and writing a more efficient pythonic code is surely our ultimate goal!
However I am not so sure the solution you are proposing here would work in general. Let me explain.

While in z-coordinates z3 is effectively only 1D, i.e. z3(k), in s- and MEs-coordinates z3 is completely 3D z3(x,y,z). In addition, in s- and MEs-coordinates the coordinate transformation will have some parameters that will vary with the 2D bathymetry (e.g., the transitioning from sigma to stretched s-coordinates in sh94 or sf12). I personally think that using loop structures will help me writing faster and a more cleaner code that I could debug and check more easily. Of course the price would be the less efficiency , but I think we could worry about this at the end of this first phase, when we will a more comprehensive picture of all the cases we will have to deal with.

So my suggestion would be to leave all these developments in a branch, continuing with the looping code and once we will have a base version of the code that we will be sure is able to correctly replicate the Fortran90 version then we could massively invest our energies in optimise the code and make it more efficient.

What do you think?

Hi @oceandie, sorry I was busy with other things yesterday - I'll try and catch up with this thread a little later. I would agree with you that to get the base code down is a priority. @malmans2 is introducing me to quite a few new aspects to python - which will take a while to permeate the 'old grey matter'. I'm not too sure it's necessary to keep these devs in a branch. The task at hand is to lay down the base code. In my mind once that is done, tested and approved it can be merged. We can open an 'efficiency' branch later. I was also thinking that Friday coffee/tea meet-ups are great way to review and discuss pythonic approaches... I think that way some of your/@malmans2 ideas/practices will slowly creep into the way I code too ;)

@oceandie
Copy link
Contributor

oceandie commented Jun 8, 2021

I defenitely agree with you on everything @jdha. If you agree I will setup a recursive 30 min python-tea/coffee-chat for Fridays: what time do you generally prefer guys?

@malmans2
Copy link
Member Author

malmans2 commented Jun 8, 2021

This PR is not really meant to get merged but just to showcase how the final version could look like (lemme move it to draft). Would have not been easy to do it with inline comments. We can introduce these kind of more aggressive changes once you have a test that makes sure we are getting consistent results, but I think it's better sooner rather than later. So 1. conversion from Fortran -> 2. test -> 3. refactor

The point I wanted to make is that your code is pretty much already written in a way that with a little change it would work with both for loops or multidimensional variables. The only thing you have to get rid of is basically the few times you use float(var), that's the only thing that assumes that you are using numbers and not ND variables (and also I don't think you need it most of the times with Python 3). I hardcoded kindx in sigma to keep it easy, but that can be still an input of sigma. The point is that whether that input is 0, 1, 2, ND, it doesn't matter with xarray. You'll end up with a 0, 1, 2, ND variable, then it's up to you (e.g., fill the variable with a for loop or broadcast to 3D).

@malmans2 malmans2 marked this pull request as draft June 8, 2021 08:01
@malmans2
Copy link
Member Author

malmans2 commented Jun 8, 2021

Fridays work for me! My afternoons are usually clear...

@jdha
Copy link
Contributor

jdha commented Jun 8, 2021

Fridays work for me! My afternoons are usually clear...

me too - preferably between 14h00 and 15h00. I usually have to be out the door at 15h15 for the school run.

@jdha
Copy link
Contributor

jdha commented Jun 8, 2021

This PR is not really meant to get merged but just to showcase how the final version could look like (lemme move it to draft). Would have not been easy to do it with inline comments. We can introduce these kind of more aggressive changes once you have a test that makes sure we are getting consistent results, but I think it's better sooner rather than later. So 1. conversion from Fortran -> 2. test -> 3. refactor

The point I wanted to make is that your code is pretty much already written in a way that with a little change it would work with both for loops or multidimensional variables. The only thing you have to get rid of is basically the few times you use float(var), that's the only thing that assumes that you are using numbers and not ND variables (and also I don't think you need it most of the times with Python 3). I hardcoded kindx in sigma to keep it easy, but that can be still an input of sigma. The point is that whether that input is 0, 1, 2, ND, it doesn't matter with xarray. You'll end up with a 0, 1, 2, ND variable, then it's up to you (e.g., fill the variable with a for loop or broadcast to 3D).

.... oh, just realised that this is a second PR - sorry I had it in my head we were talking about this one. Happy to keep it as a PR as I try and digest the message ;)

@oceandie
Copy link
Contributor

oceandie commented Jun 8, 2021

This PR is not really meant to get merged but just to showcase how the final version could look like (lemme move it to draft). Would have not been easy to do it with inline comments. We can introduce these kind of more aggressive changes once you have a test that makes sure we are getting consistent results, but I think it's better sooner rather than later. So 1. conversion from Fortran -> 2. test -> 3. refactor

The point I wanted to make is that your code is pretty much already written in a way that with a little change it would work with both for loops or multidimensional variables. The only thing you have to get rid of is basically the few times you use float(var), that's the only thing that assumes that you are using numbers and not ND variables (and also I don't think you need it most of the times with Python 3). I hardcoded kindx in sigma to keep it easy, but that can be still an input of sigma. The point is that whether that input is 0, 1, 2, ND, it doesn't matter with xarray. You'll end up with a 0, 1, 2, ND variable, then it's up to you (e.g., fill the variable with a for loop or broadcast to 3D).

I think I like what you are saying (not sure if I am getting correctly :) ): basically you are saying that all methods returning float could be converted to return Dataset and then doesn't matter if you call them to return just one scalar (so behaving effectively as a float(var)) or to return an N-dim array. In this way we could keep the loops for the moment but having a code which is almost ready for a more massive optimisation in the future? ... Am I getting right? ... because if it is like this, if you agree I think I will implement your suggestions for modifying sigma, compute_z3 etc from float to Dataarray ...

@malmans2
Copy link
Member Author

malmans2 commented Jun 8, 2021

I think it might be even easier than that... Most of the time you wrap variables in float() I don't think you need it.
In Python 3:

>>> type(1/1)
<class 'float'>
>>> type(1//1)
<class 'int'>
>>> type(1/float(1))
<class 'float'>
>>> type(1/int(1))
<class 'float'>

As you can't do float(DataArray) or float(Dataset), if float(var) is not needed just replace with var

@malmans2 malmans2 closed this Jun 11, 2021
@malmans2 malmans2 deleted the no_loops branch June 13, 2021 11:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants