Final tweaks to binDensity.py #19

Closed
9 of 10 tasks
durack1 opened this issue Sep 22, 2014 · 63 comments

@durack1
Collaborator

durack1 commented Sep 22, 2014

So there are a few things to clean up in the code before it can run across all the historical simulations:

  • Fix mask issue [numpy 1.9.0/UV-CDAT 2.0 issue to be checked: "Variable regridding returning NaN values" CDAT/cdat#628]
  • Add more comments - explain code subsections and what they're doing [~DONE]
  • Chase down memory usage - can del(var) ; gc.collect() reduce the memory footprint? [more to do - see the sketch at the end of this comment]
  • Output annual mean binned fields (4D: time, rhon, lat, lon). Which grid depends on the space available: either keep 4D data on both grids (> 3TB!) or 150 yrs of annual means on the WOA grid (1TB) - save the first 12 months of native-grid vertically binned data for validating the WOA-interpolated output fields - Estimate for WOA-grid 4D output (61 rhon levels) is 1TB (single precision) for 190 simulations and 4 variables (depth, thickness, T and S)
    ==> current version on my github outputs 4x4D monthly variables on WOA grid
  • Modify axis definition for proper interpolation [Eric, did I write this? What does it mean?]
    ==> I think you said you needed this to make sure the regrid works fine for distorted grids (issue 4 resolved below)?

Completed:

  • Write 4D dimension first (so all coords are written at the top of the file header) [DONE]
  • Add timeint argument so that debugging can run through a single timestep quickly (rather than ingesting large time chunks) [DONE]
  • Modify all output variables to float32 precision (currently are float64 - will halve file sizes) [DONE https://github.com/durack1/Density_bining/commit/8a41dd66dcee24eb7b58135c3466506651f5cd6b]
  • Rewrite cdm.createVariable calls to use correct axes descriptors (solving issue with non lon/lat grids) [DONE https://github.com/durack1/Density_bining/commit/10cc948e545b8aaaa0991201eeac063fc4d280d1#diff-af81de396d909c42d04f33bd9832d633]
  • Rename variables to be more descriptive (a->Atl, p->Pac etc) - collapse each basin zonal mean into a single variable [basin,time, ...] [DONE https://github.com/durack1/Density_bining/commit/0a016255197f440b162659f6d7c336d188165f32]
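On the memory point above, here is a minimal sketch of the del/gc.collect pattern (the function and variable names are made up, not from binDensity.py):

import gc
import numpy as npy

def processChunk(thetaoChunk):
    # build a large intermediate array (stand-in for the real binning work)
    work = npy.array(thetaoChunk) * 2.
    result = work.sum(axis=0)
    # explicitly drop the large intermediate and ask the collector to reclaim
    # the memory before the next chunk is read
    del work
    gc.collect()
    return result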
@durack1
Collaborator Author

durack1 commented Sep 22, 2014

This mask issue is related to this difference between pre-function and post-function code:
screen shot 2014-09-22 at 09 39 28

@eguil
Owner

eguil commented Sep 23, 2014

actually it comes from the first time step, i.e. K=1 in

plot/over/thick=1 ptopsigmap[K=1:20@ave]
plot/over/thick=1 ptopsigmap[K=131:150@ave]

As you can see, the red line is ok.

@eguil
Owner

eguil commented Sep 23, 2014

ok I had a first go at points 4 and 6 below. You can start from this
version (on my github and currently in test for IPSL) for the other points.

@durack1
Collaborator Author

durack1 commented Sep 23, 2014

Ok great, I've just pulled this across into my repo and will start where you finished up..

@durack1
Collaborator Author

durack1 commented Sep 23, 2014

I'm guessing you've had trouble getting this to run as you redeclare the so/thetao variables as npy arrays, thereby breaking their cdms2 transient variable status and then leading to calls like x1Bin.long_name = thetao.long_name failing.. I'll try to put this all back in..
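For reference, a minimal sketch of what I mean (assuming cdms2 is available; the ids and data are made up): keep the cdms2 handle for the metadata and do the number crunching on a separate numpy copy, rather than rebinding the same name to a bare array:

import numpy as npy
import cdms2 as cdm

thetao = cdm.createVariable(npy.zeros((2, 3)), id='thetao')
thetao.long_name = 'Sea Water Potential Temperature'

thetaoData = npy.array(thetao)          # plain numpy copy for computations
x1Bin = cdm.createVariable(thetaoData, id='x1Bin')
x1Bin.long_name = thetao.long_name      # still works: thetao kept its attributes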

@durack1
Collaborator Author

durack1 commented Sep 24, 2014

Is there a reason why arrays are preallocated with 'NaN' values rather than valmask?
https://github.com/eguil/Density_bining/blob/master/binDensity.py#L534-537
https://github.com/eguil/Density_bining/blob/master/binDensity.py#L760

We should really avoid using NaNs within code, cdms2 doesn't play nicely with them.
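For reference, a minimal sketch of the alternative (the sizes here are made up): preallocate with valmask so the missing value stays explicit, e.g.

import numpy as npy

valmask = 1.e20
N_s, N_i = 61, 10                                             # hypothetical sizes
z_s = npy.ones((N_s + 1, N_i), dtype=npy.float32) * valmask   # rather than npy.nan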

@durack1
Collaborator Author

durack1 commented Sep 24, 2014

Ok so my latest durack1@8e6d581 should (I've just kicked off a run for testing) solve problems 2 and 5.

I'll still need your help with the mask issue (1) and will also have to get back to renaming the netcdf file variables (3), which will also probably lead me to addressing (4) as I learn what is going on. I've also replaced the newAttribute = so.longname with a new variable handle (not loaded, just referenced, so we can then pick off variable attributes, shapes etc, without loading the variable into memory).

https://github.com/durack1/Density_bining/blob/master/binDensity.py#L321-324
https://github.com/durack1/Density_bining/blob/master/binDensity.py#L959-992
https://github.com/durack1/Density_bining/blob/master/binDensity.py#L1055-1058

@durack1
Collaborator Author

durack1 commented Sep 24, 2014

I was just thinking.. This mask issue could be due to different behaviours with NaNs from numpy 1.9.0 (mine) to 1.7.1 (yours).. And is a motivator to address some of the NaN declarations noted above

@durack1
Collaborator Author

durack1 commented Sep 24, 2014

Ok so seems with durack1@9d654dc I'm now getting this error, which only occurs into the second timestep of the analysis:

Traceback (most recent call last):
  File ".//drive_IPSL.py", line 29, in <module>
    densityBin(modelThetao,modelSo,modelAreacello,outfileDensity)
  File "/export/durack1/git/Density_bining/binDensity.py", line 662, in densityBin
    depthBini[t,ks,:,:]         = regridObj(dy [t,ks,:,:])
UnboundLocalError: local variable 'depthBini' referenced before assignment

I've got a feeling that purging this is causing the issue:
https://github.com/durack1/Density_bining/blob/master/binDensity.py#L749

What else shouldn't be purged (or what code can we comment out)?
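For what it's worth, here is a minimal (made-up) reproduction of how an over-eager del inside the time loop produces exactly this error when the array is only allocated on the first pass:

import numpy as npy

def purgeRepro():
    for t in range(2):
        if t == 0:
            depthBini = npy.zeros((3, 3))   # allocated on the first timestep only
        depthBini[:] = t                    # UnboundLocalError on t == 1, after the del below
        del depthBini                       # the "purge"

purgeRepro()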

@eguil
Owner

eguil commented Sep 24, 2014

yep, too violent a purge...

@eguil
Owner

eguil commented Sep 24, 2014

well, these arrays are used in a numpy interpolation below (npy.interp) and we only want the interpolation to happen on the part of the profile that is defined (i.e. not NaN, see loop below). If we can find a way to tell numpy to ignore the valmask points then we can init the arrays with valmask.
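For reference, one way to do that is to slice the defined part of the profile out before calling npy.interp, so the arrays can be initialised with valmask (a single made-up column, just to sketch the idea):

import numpy as npy

valmask = 1.e20
s_s = npy.linspace(24., 28., 61)                    # target density bins
szm = npy.array([24.5, 25.0, 26.2, 27.1, valmask])  # column density profile
zzm = npy.array([10., 50., 200., 800., valmask])    # column depth profile

defined = szm < valmask                             # drop the valmask points
z_s = npy.interp(s_s, szm[defined], zzm[defined], right=valmask)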

@eguil
Owner

eguil commented Sep 24, 2014

Indeed! Because I cannot use the test mode, it takes a while to see obvious bugs...

@eguil
Owner

eguil commented Sep 24, 2014

hum... why would this happen only for the first year ?

@eguil
Owner

eguil commented Sep 24, 2014

ok, my last commit (76dfac1) works.

I changed drive_IPSL.py on crunchy to have the timeint='1,12' flag and that speeds up testing

@eguil
Owner

eguil commented Sep 25, 2014

IPSL run using last commit:
[This is still 8 GB more than before changes to independent proc.]
[ ratio of 15 is not great - can reach 6]

 thetao: units corrected
 CPU of density bining      = 267.8
 CPU of annual mean compute = 171.79
 CPU of interpolation       = 112.2
 CPU of zonal mean          = 171.66
 CPU of persistence compute = 930.28
 CPU of chunk               = 1654.09
 [ Time stamp 24/09/2014 16:17:43 ]
 Max memory use 25.881148 GB
 Ratio to grid*nyears 0.197351116473 kB/unit(size*nyears)
 CPU use, elapsed 24254.94 24350.408776
 Ratio to grid*nyears 15.4125681525 1.e-6 sec/unit(size*nyears)
 Wrote file:  test/cmip5.IPSL-CM5A-LR.historical.r1i1p1.an.ocn.Omon.density.ver-v20111119-compressed.nc

@eguil
Owner

eguil commented Sep 25, 2014

just tested the compressed file done on crunchy and the mask issue for year 1 is gone. So it must indeed be due to the new numpy. Let me know if replacing NaN with valmask would work with npy.interp (but I still don't understand why it would have this specific behaviour...)

@durack1
Collaborator Author

durack1 commented Sep 25, 2014

@eguil I have a feeling that it's the NaNs that are causing this issue.. Although it highlights a numpy1.9.0 vs numpy1.7.1 behaviour change that we need to get to the bottom of.

@eguil
Owner

eguil commented Sep 29, 2014

ok - I now have a version without NaNs that looks ok (commit ff5781d). Can you please try it with your version of numpy?

@eguil eguil closed this as completed Sep 29, 2014
@eguil eguil reopened this Sep 29, 2014
@eguil
Owner

eguil commented Sep 29, 2014

Actually, here is the latest version to use: 81d17f8

@eguil
Owner

eguil commented Sep 29, 2014

there was a bug in the thickness calculation. Now correct with commit 189a5ce

@durack1
Collaborator Author

durack1 commented Sep 29, 2014

Ok so you seem to have things working with b655466 - should I pull this across and run again?

@durack1
Collaborator Author

durack1 commented Oct 1, 2014

@eguil so I'm now starting to test across the suite, and have hit a problem with non-standard (lat/lon) grids:

Debug - File names:
thetao: /work/cmip5/historical/ocn/mo/thetao/cmip5.ACCESS1-0.historical.r1i1p1.mo.ocn.Omon.thetao.ver-1.latestX.xml
so    : /work/cmip5/historical/ocn/mo/so/cmip5.ACCESS1-0.historical.r1i1p1.mo.ocn.Omon.so.ver-1.latestX.xml
...
Traceback (most recent call last):
  File ".//drive_density.py", line 195, in <module>
    densityBin(model[3],model[1],model[5],outfileDensity)
  File "/export/durack1/git/Density_bining/binDensity.py", line 671, in densityBin
    dy   = cdm.createVariable(dy  , axes = [timeyr, s_axis, ingrid], id = 'isondy')
  File "/usr/local/uvcdat/2014-09-22/lib/python2.7/site-packages/cdms2/tvariable.py", line 826, in createVariable
    return TransientVariable(*args,**kargs)
  File "/usr/local/uvcdat/2014-09-22/lib/python2.7/site-packages/cdms2/tvariable.py", line 172, in __init__
    grid = grid.reconcile(axes) # Make sure grid and axes are consistent
  File "/usr/local/uvcdat/2014-09-22/lib/python2.7/site-packages/cdms2/hgrid.py", line 681, in reconcile
    if (item not in used) and len(selfaxes[i])==len(item) and allclose(selfaxes[i], item):
TypeError: object of type 'DatasetCurveGrid' has no len()

https://github.com/durack1/Density_bining/blob/master/binDensity.py#L671 I might have to engage with @doutriaux1 to determine a way to work around this issue..

@doutriaux1
Contributor

@durack1 if ingrid is a grid, even a regular lat/lon one, it wouldn't work

I can't remember how to make a curvilinear grid out of thin air, I'll look, but in the meantime if you're going to save to file, do:

axesList = original_var.getAxisList()

replace the depth axes

axesList[1] = newDepthAxes

and then

dy   = cdm.createVariable(dy  , axes = axesList, id = 'isondy')

then don't forget to add the coordinates attribute on dy and save the "bound variables" into the file.

Then, reading the file back in, you should be good to go. I will look for curvilinear calls as well and will post here when I find them.
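For reference, a self-contained sketch of the above (assuming cdms2; the shapes, ids and axis values are made up, not the real model grid):

import numpy as npy
import cdms2 as cdm

orig = cdm.createVariable(npy.zeros((2, 3, 4, 5)), id='thetao')

axesList = orig.getAxisList()                        # reuse the variable's own axes
rhonAxis = cdm.createAxis(npy.arange(3, dtype=npy.float64), id='rhon')
axesList[1] = rhonAxis                               # swap in the new vertical axis

dy = cdm.createVariable(npy.zeros((2, 3, 4, 5)), axes=axesList, id='isondy')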

@eguil
Owner

eguil commented Oct 2, 2014

Hi Paul,

your message on grids is a bit cryptic! Not sure what you want me to do and why it would not work with regular lat/lon grids. We can do a quick skype today (9 am for you?) if you want (send me a text message if possible/when ready).

I was also thinking of writing out the annual mean 3D lat/lon/rhon on the WOA grid as just looking at zonal means can be quickly restrictive.

@durack1
Collaborator Author

durack1 commented Oct 2, 2014

@eguil the issue here is that the cdm.createVariable syntax is not standard, and when you pass this a curvilinear grid it falls over.. Which is half the CMIP5 suite.. I did hit this issue earlier, but wanted to validate results with the IPSL-CM5A-LR model first before attempting to run across the archive..

I have some work to do to get things working across all the models, ACCESS1-0 was the first in the list to process, and this model is what led to the error above #19 (comment)

Skype at 9am sounds fine, I'll try from the office when I get in..

@durack1
Collaborator Author

durack1 commented Oct 2, 2014

This is what I see for current files:

[durack1@oceanonly Density_bining]$ ncdump -h test/cmip5.IPSL-CM5A-LR.historical.r1i1p1.an.ocn.Omon.density.ver-v20111119-compressed.nc | grep '(time,'
        double bounds_time(time, bound) ;
        double persim(time, latitude, longitude) ; [ 40 MB per such field out of 220 MB for the file]
        double ptopdepth2(time, latitude, longitude) ;
        double ptoptemp2(time, latitude, longitude) ;
        double ptopsalt2(time, latitude, longitude) ;
        double isonpers(time, rhon, latitude) ;
        double isonpersa(time, rhon, latitude) ;
        double isonpersp(time, rhon, latitude) ;
        double isonpersi(time, rhon, latitude) ;
        double ptopdepth(time, latitude) ;
        double ptopsigma(time, latitude) ;
        double ptoptemp(time, latitude) ;
        double ptopsalt(time, latitude) ;
        double ptopdeptha(time, latitude) ;
        double ptopsigmaa(time, latitude) ;
        double ptoptempa(time, latitude) ;
        double ptopsalta(time, latitude) ;
        double ptopdepthp(time, latitude) ;
        double ptopsigmap(time, latitude) ;
        double ptoptempp(time, latitude) ;
        double ptopsaltp(time, latitude) ;
        double ptopdepthi(time, latitude) ;
        double ptopsigmai(time, latitude) ;
        double ptoptempi(time, latitude) ;
        double ptopsalti(time, latitude) ;
        double isondepth(time, rhon, latitude) ;
        double isonthick(time, rhon, latitude) ;
        double isonvol(time, rhon, latitude) ;
        double thetao(time, rhon, latitude) ;
        double so(time, rhon, latitude) ;
        double isondeptha(time, rhon, latitude) ;
        double isonthicka(time, rhon, latitude) ;
        double isonvola(time, rhon, latitude) ;
        double thetaoa(time, rhon, latitude) ;
        double soa(time, rhon, latitude) ;
        double isondepthp(time, rhon, latitude) ;
        double isonthickp(time, rhon, latitude) ;
        double isonvolp(time, rhon, latitude) ;
        double thetaop(time, rhon, latitude) ;
        double sop(time, rhon, latitude) ;
        double isondepthi(time, rhon, latitude) ;
        double isonthicki(time, rhon, latitude) ;
        double isonvoli(time, rhon, latitude) ;
        double thetaoi(time, rhon, latitude) ;
        double soi(time, rhon, latitude) ;

@durack1
Collaborator Author

durack1 commented Oct 3, 2014

@eguil I got sidetracked today and made little progress on this.. I'll get back to it tomorrow..

@eguil
Owner

eguil commented Oct 7, 2014

@durack1 - ok let me know when you've made progress. We need to start to do some science!

@durack1
Collaborator Author

durack1 commented Oct 21, 2014

@eguil what's the z_s variable and is it needed? https://github.com/durack1/Density_bining/blob/master/binDensity.py#L520 I've been trying to figure out what the indexing is (it's a weird combo of boolean and -1's).. I've just commented it out and seems to be running across the models now.. Will see how far it gets, and also what the outputs look like tomoz.. Looks like it's got to the C*'s..

@eguil
Owner

eguil commented Oct 23, 2014

@durack1 z_s is the depth of the bin. The line you commented sets the max depth of each column to the highest density (index N_s). It is most of the time rewritten by the interpolation line below :
z_s [0:N_s,i] = npy.interp(s_s[:,i], szm[:,i], zzm[:,i], right = valmask) ; # consider spline
It is strange this is creating problems...

@durack1
Collaborator Author

durack1 commented Oct 23, 2014

@eguil there are some weird cases where the input data trips over your code, so I think I've worked out most of these quirks.. I've managed to get it running across all models (alphabetically) up to MIROC4h.. And have just tweaked things to deal with that model, so assuming it runs across the remaining inputs we should have the code working.. I'm now rewriting the outputs to clean them up, and then will kick it off for the whole historical archive..

@eguil
Owner

eguil commented Oct 24, 2014

@durack1 ok let me know how it goes. Have you integrated my latest version of binDensity.py ?

@durack1
Collaborator Author

durack1 commented Oct 27, 2014

@eguil I've hit a problem with the code that is going to need your intervention to sort out - I'm still not completely following how you're getting the result. My repo durack1@0a01625 should have a working version that is returning all masked values - I'm not sure what I've tangled up..

If you use drive_density:
[durack1@oceanonly Density_bining]$ drive_density.py cmip5 historical test it should work as advertised, and I've got the 5 models (which I had to fiddle code to get it working) only set to run for the 24 months, so get this test case working and I think we're off.. You may need to update your version of durolib - https://raw.githubusercontent.com/durack1/pylib/master/durolib.py - I've dropped in some more dob functions to let us know which version of code was used to write the outfiles..

I've also collapsed all the zonal means into a single variable (and done some renaming), however the data in the arrays is rubbish due to what I think is a mask propagation issue..?

If you can get this back working, it's good to go (I think!?!).

It could still be further optimized - converting all code to pure numpy and removing use of npy.ma where we can (see the sketch below).. But I think a result is more important than optimization at this stage, so we can leave it as is.. The output files are not fully CF-compliant, but that again is another less important issue, as these files won't be redistributed..

Probably a good idea to skype about this early this week if you're able..
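For reference, the kind of npy.ma-to-pure-numpy swap I have in mind (the values are made up): carry a plain array plus an explicit boolean mask and reduce over the valid subset only:

import numpy as npy

valmask = 1.e20
so = npy.array([34.5, 35.1, valmask, 34.9])

valid = so < valmask            # explicit boolean mask instead of a numpy.ma mask
soMean = so[valid].mean()       # plain-numpy reduction over valid points only
print soMean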

@eguil
Owner

eguil commented Oct 27, 2014

@durack1 ok I'll have a look tomorrow. We can skype tomorrow (Tuesday) at 5 or 6 pm for me, 9 or 10 am for you.

@durack1
Collaborator Author

durack1 commented Oct 27, 2014

@eguil sounds good.. We're close.. I also checked out the CF-compliance of the output files (even though the data is rubbish at the moment) and it's not as bad as I thought, so once this masking issue is resolved should be good to start running across the archive and doing some science!

9am tomorrow sounds great..

The new output format looks like:

[durack1@oceanonly Density_bining]$ ncdump -h test/cmip5.IPSL-CM5A-LR.historical.r5i1p1.an.ocn.Omon.density.ver-v20111119.nc 
netcdf cmip5.IPSL-CM5A-LR.historical.r5i1p1.an.ocn.Omon.density.ver-v20111119 {                                              
dimensions:                                                                                                                  
        time = UNLIMITED ; // (2 currently)                                                                                  
        bound = 2 ;                                                                                                          
        latitude = 180 ;                                                                                                     
        longitude = 360 ;                                                                                                    
        basin = 4 ;                                                                                                          
        lev = 61 ;                                                                                                           
variables:                                                                                                                   
        double time(time) ;                                                                                                  
                time:bounds = "bounds_time" ;                                                                                
                time:units = "days since 1850-01-01 00:00:00" ;                                                              
                time:calendar = "gregorian" ;                                                                                
                time:axis = "T" ;                                                                                            
        double bounds_time(time, bound) ;                                                                                    
        float latitude(latitude) ;                                                                                           
                latitude:bounds = "bounds_latitude" ;                                                                        
                latitude:units = "" ;                                                                                        
                latitude:axis = "Y" ;                                                                                        
        double bounds_latitude(latitude, bound) ;                                                                            
        float longitude(longitude) ;                                                                                         
                longitude:bounds = "bounds_longitude" ;                                                                      
                longitude:units = "" ;                                                                                       
                longitude:axis = "X" ;                                                                                       
                longitude:modulo = 360. ;                                                                                    
                longitude:topology = "circular" ;                                                                            
        double bounds_longitude(longitude, bound) ;                                                                          
        float persistmxy(time, latitude, longitude) ;                                                                        
                persistmxy:long_name = "Fraction of persistence on isopycnal bins" ;                                         
                persistmxy:units = "% of column" ;                                                                           
                persistmxy:missing_value = 1.e+20f ;                                                                         
        float ptopdepthxy(time, latitude, longitude) ;                                                                       
                ptopdepthxy:long_name = "Depth of shallowest persistent ocean on ison" ;                                     
                ptopdepthxy:units = "m" ;                                                                                    
                ptopdepthxy:missing_value = 1.e+20f ;                                                                        
        float ptopthetaoxy(time, latitude, longitude) ;                                                                      
                ptopthetaoxy:long_name = "Temp. of shallowest persistent ocean on ison" ;                                    
                ptopthetaoxy:units = "degrees_C" ;                                                                           
                ptopthetaoxy:missing_value = 1.e+20f ;                                                                       
        float ptopsoxy(time, latitude, longitude) ;                                                                          
                ptopsoxy:long_name = "Salinity of shallowest persistent ocean on ison" ;                                     
                ptopsoxy:units = "psu" ;                                                                                     
                ptopsoxy:missing_value = 1.e+20f ;                                                                           
        int basin(basin) ;                                                                                                   
                basin:bounds = "bounds_basin" ;                                                                              
                basin:units_long = "0: global_ocean 1: atlantic_ocean; 2: pacific_ocean; 3: indian_ocean" ;
                basin:long_name = "ocean basin index" ;
                basin:standard_name = "basin" ;
                basin:units = "basin index" ;
                basin:axis = "B" ;
        double bounds_basin(basin, bound) ;
        double lev(lev) ;
                lev:bounds = "bounds_lev" ;
                lev:units_long = "kg m-3 (anomaly, minus 1000)" ;
                lev:positive = "down" ;
                lev:long_name = "ocean neutral density coordinate" ;
                lev:standard_name = "lev" ;
                lev:units = "kg m-3" ;
                lev:axis = "Z" ;
        double bounds_lev(lev, bound) ;
        float isonpers(time, basin, lev, latitude) ;
                isonpers:long_name = "zonal persistence of isopycnal bins" ;
                isonpers:units = "% of time" ;
                isonpers:missing_value = 1.e+20f ;
        float ptopdepth(time, basin, latitude) ;
                ptopdepth:long_name = "Zonal depth of shallowest persistent ocean on ison" ;
                ptopdepth:units = "m" ;
                ptopdepth:missing_value = 1.e+20f ;
        float ptopsigma(time, basin, latitude) ;
                ptopsigma:long_name = "Zonal rhon of shallowest persistent ocean on ison" ;
                ptopsigma:units = "sigma_n" ;
                ptopsigma:missing_value = 1.e+20f ;
        float ptopthetao(time, basin, latitude) ;
                ptopthetao:long_name = "Zonal Temp. of shallowest persistent ocean on ison" ;
                ptopthetao:units = "degrees_C" ;
                ptopthetao:missing_value = 1.e+20f ;
        float ptopso(time, basin, latitude) ;
                ptopso:long_name = "Zonal Salinity of shallowest persistent ocean on ison" ;
                ptopso:units = "psu" ;
                ptopso:missing_value = 1.e+20f ;
        float isondepth(time, basin, lev, latitude) ;
                isondepth:long_name = "Global zonal depth of isopycnal" ;
                isondepth:units = "m" ;
                isondepth:missing_value = 1.e+20f ;
        float isonthick(time, basin, lev, latitude) ;
                isonthick:long_name = "Global zonal thickness of isopycnal" ;
                isonthick:units = "m" ;
                isonthick:missing_value = 1.e+20f ;
        float isonvol(time, basin, lev, latitude) ;
                isonvol:long_name = "Volume of isopycnal" ;
                isonvol:units = "10.e12 m^3" ;
                isonvol:missing_value = 1.e+20f ;
        float isonthetao(time, basin, lev, latitude) ;
                isonthetao:long_name = "Sea Water Potential Temperature" ;
                isonthetao:units = "degrees_C" ;
                isonthetao:missing_value = 1.e+20f ;
        float isonso(time, basin, lev, latitude) ;
                isonso:long_name = "Sea Water Salinity" ;
                isonso:units = "psu" ;
                isonso:missing_value = 1.e+20f ;

// global attributes:
                :Conventions = "CF-1.0" ;
                :institution = "Program for Climate Model Diagnosis and Intercomparison (LLNL)" ;
                :data_contact = "Paul J. Durack; pauldurack@llnl.gov; +1 925 422 5208" ;
                :history = "File processed: 26-10-2014 23:05:51 PM UTC; San Francisco, CA, USA" ;
                :host = "oceanonly.llnl.gov; UVCDAT version: 2.0.0-3-g38214e5; Python version: 2.7.7 (default, Oct 15 2014, 10:21:57); [GCC 4.4.7 20120313 (Red Hat 4.4.7-11)]" ;
                :binDensity_version = "commit c44e26d892c017f996c4233e06afb8bcb25691fe Author: Paul J. Durack <durack1@llnl.gov> Date: Thu Oct 23 18:42:58 2014 -0700" ;
}

@eguil
Owner

eguil commented Oct 27, 2014

@durack1 can you remind me how to pull your version across to my repo for a merge?

@durack1
Collaborator Author

durack1 commented Oct 27, 2014

@eguil I can issue a pull request.. But that will wipe out your changes since my last pull from your repo..

You could try:
git pull https://github.com/durack1/Density_bining master

@durack1
Collaborator Author

durack1 commented Oct 27, 2014

@eguil October 1st was my last merge/pull from your repo: https://github.com/durack1/Density_bining/commits/master?page=2

@eguil
Owner

eguil commented Oct 27, 2014

@durack1 but how can I merge the changes made since Oct 1 by you and by me?

@durack1
Collaborator Author

durack1 commented Oct 27, 2014

It'll require some intervention.. Here's what git recommends:

Step 1: From your project repository, check out a new branch and test the changes.

git checkout -b durack1-master master
git pull git://github.com/durack1/Density_bining.git master

Step 2: Merge the changes and update on GitHub.

git checkout master
git merge --no-ff durack1-master
git push origin master

@durack1
Collaborator Author

durack1 commented Oct 29, 2014

@eguil if your 5 model test was still running you've probably found that it failed.. I've managed to make some headway toward getting this running, but am still struggling with it.. If you have another go, pull my version across.. I've commented out your new variables as they're particularly problematic..

If you're able to get it running across the 5 test models then let me know.. If you achieve this I think we then have a functioning prototype and I'll kick this off on the whole archive..

@eguil
Owner

eguil commented Oct 29, 2014

@durack1 I am making progress. ACCESS and BNU are ok (although for some reason the basin zonal means for ptopthetao/so/sigma are all equal to the global mean, but ok for ptopdepth - I am investigating). I propose to drop EC-EARTH as the so and thetao have no valmask value (only 0!). The other option is to change the files themselves to add back the right valmask, or to create a proc that would rebuild it from the zero salinity values - I suggest we keep this one for later.

@durack1
Collaborator Author

durack1 commented Oct 29, 2014

@eguil I have code which should sort out EC-EARTH - it's at https://github.com/durack1/Density_bining/blob/master/binDensity.py#L515-520

@eguil
Owner

eguil commented Oct 29, 2014

@durack1 - ok I will try

@eguil
Owner

eguil commented Oct 29, 2014

@durack1 the bug where the zonal means are all equal to the global mean only happens for the first year. The other years are ok. This is due to the new numpy and we still have to understand what is going on. Probably some init somewhere that needs the first year to happen...

@durack1
Collaborator Author

durack1 commented Oct 30, 2014

@eguil I get the impression from continuing the harvest through the code today that numpy indexing is an unknown entity.. This link should provide some insight:
http://docs.scipy.org/doc/numpy/reference/arrays.indexing.html
And this link too:
http://docs.scipy.org/doc/numpy/reference/arrays.ndarray.html#item-selection-and-manipulation
The take home to me is:

>>> import numpy as np
>>> np.array([[1,2,3],[4,5,6],[7,8,9]])
array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])
>>> a = np.array([[1,2,3],[4,5,6],[7,8,9]])
>>> b = a.flat
>>> for count,x in enumerate(b):
...  print count,x
... 
0 1
1 2
2 3
3 4
4 5
5 6
6 7
7 8
8 9
>>>

@eguil
Owner

eguil commented Oct 30, 2014

@durack1 can we do a skype at 9am/5pm today? I am confused by why EC-EARTH still fails even though we now have modified thetao and so to have a mask...

@durack1
Collaborator Author

durack1 commented Oct 30, 2014

Ok so example:

>>> a = cdm.createVariable([1,2,3,4,5],id='a',mask=[True,False,True,False,True])
>>> a
a
masked_array(data = [-- 2 -- 4 --],
             mask = [ True False  True False  True],
       fill_value = -9223372036854775808)
>>> a.data
array([1, 2, 3, 4, 5])
>>> a.filled(0)
array([0, 2, 0, 4, 0])

@durack1
Collaborator Author

durack1 commented Oct 30, 2014

@eguil ok so try this, it works for me on the single EC-EARTH test case, will run again over 5 models:

        # Check for missing_value/mask
        print 'valmask',valmask
        if ( 'missing_value' not in thetao.attributes.keys() and modeln == 'EC-EARTH' ) \
           or (modeln == 'MIROC4h' ):
            print 'trigger mask fix - EC-EARTH/MIROC4h'
            #maskvalt = so.data[0,0,0,0]
            #maskvalt = 0.0000000000000000000000000000001
            #print 'maskvalt:',maskvalt
            #thetao = mv.masked_equal(thetao,maskvalt) ; # Convert all 0. values to masked
            #thetao.filled(valmask) ; # Convert all masked values to valmask
            #so = mv.masked_equal(so,maskvalt)
            #so.filled(valmask)
            #testval = 0. ; # to define masked value as above did not replace thetato.data by valmask
            # Use so to mask
            #so = mv.masked_less_equal(so,maskvalt)
            #if 0. < maskvalt:
            #    print 'test = True'
            so = mv.masked_equal(so,0.)
            print so.count()
            so.data[:] = so.filled(valmask)
            thetao.mask = so.mask
            thetao.data[:] = thetao.filled(valmask)
        # Define rho output axis
        rhoAxesList[0]  = time ; # replace time axis

@durack1
Collaborator Author

durack1 commented Oct 30, 2014

@eguil we need to be careful with this masking stuff: in the test case it works, but as you're indexing the data directly (so.data[t]) and bypassing the mask, we're getting computations that are no longer mask-aware, and consequently the missing values (1.e20) are being included in calculations:

x1_content (42, 105704)                                                                                                                                
 x1_content.mean 5.01751e+19                                                                                                                           
 x1_content.min -2.02946                                                                                                                               
 x1_content.max 1e+20                                                                                                                                  
 x2_content.mean 5.01751e+19                                                                                                                           
 x2_content.min 6.80319                                                                                                                                
 x2_content.max 1e+20                                                                                                                                  
vmask_3D: (42, 105704)
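A tiny (made-up) illustration of the difference:

import numpy as npy
import numpy.ma as ma

valmask = 1.e20
so = ma.masked_equal(npy.array([34.5, 35.1, valmask, 34.9]), valmask)

print so.mean()        # mask-aware: ~34.8
print so.data.mean()   # mask bypassed: ~2.5e+19, valmask leaks into the mean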

@durack1
Collaborator Author

durack1 commented Oct 30, 2014

@eguil I've just updated the block two comments up #19 (comment) this also required a gotcha check for MIROC4h as it also has 0/incorrectly specified missing data.. This now works as advertised for the 5 model test case, and appears to return reasonable values..

If you can pull that across and check your versions, once you've then given me the all clear I'll pull back your updated (confirmed and committed) code from your repo and start processing over the archive..

@durack1
Collaborator Author

durack1 commented Nov 3, 2014

@eguil strike that, I need to re-sync to your master.. My code was missing the MIROC4h mask fudge..

@eguil
Owner

eguil commented Nov 3, 2014

@durack1 nope, it was a memory error further down the code (when trying to do the annual mean). When it breaks down is a bit random (classical for a memory problem). Here is what I got from one attempt at 1,24:

CPU of density bining      = 4022.43
CPU of annual mean compute = 298.26
CPU of interpolation       = 11.45
CPU of zonal mean          = 2.66
CPU of persistence compute = 1428.81
CPU of chunk               = 5763.74

[ Time stamp 01/11/2014 14:40:29 ]
Max memory use 97.142024 GB
Ratio to grid*nyears 1.73364871733 kB/unit(size*nyears)
CPU use, elapsed 5802.74 6252.32117391
Ratio to grid*nyears 8.62990113494 1.e-6 sec/unit(size*nyears)
Traceback (most recent call last):
  File "drive_density.py", line 224, in <module>
    densityBin(model[3],model[1],model[5],outfileDensity,debug=True,timeint='1,12')
  File "/work/guilyardi/Density_bining/binDensity.py", line 1348, in densityBin
    outFile_f.binDensity_version = ' '.join(getGitInfo(eosNeutralPath)[0:3])
  File "/work/guilyardi/Density_bining/durolib.py", line 271, in getGitInfo
    p = subprocess.Popen(['git','log','-n1','--',filePath],stdout=subprocess.PIPE,stderr=subprocess.PIPE,cwd='/'.join(filePath.split('/')[0:-1]))
  File "/usr/local/uvcdat/latest/lib/python2.7/subprocess.py", line 710, in __init__
    errread, errwrite)
  File "/usr/local/uvcdat/latest/lib/python2.7/subprocess.py", line 1223, in _execute_child
    self.pid = os.fork()
OSError: [Errno 12] Cannot allocate memory

On 3/11/14 07:08, Paul J. Durack wrote:

@eguil is this your MIROC4h error?

FileCount: 4
outPath: test
outfile: cmip5.MIROC4h.historical.r1i1p1.mo.ocn.Omon.density.ver-1.nc
so: cmip5.MIROC4h.historical.r1i1p1.mo.ocn.Omon.so.ver-1.latestX.xml
thetao: cmip5.MIROC4h.historical.r1i1p1.mo.ocn.Omon.thetao.ver-1.latestX.xml
areacello: cmip5.MIROC4h.rcp45.r0i0p0.fx.ocn.fx.areacello.ver-1.latestX.xml
valmask = 1e+20
grdsize: 56033280
==> model: MIROC4h (grid size: 56033280 )
==> time interval: 0 23
==> size of time chunk, number of time chunks (memory optimization) : 12 2
--> time chunk (bounds) = 0 / 2 ( 0 11 ) MIROC4h
valmask 1e+20
thetao : [ 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 0.]
so : [ 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 0.]
(12, 48, 1167360)
thetao : [ 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 0.]
so : [ 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 0.]
1
x1_content (48, 1167360)
x1_content.mean 146.755
x1_content.min 0.0
x1_content.max 308.896
x2_content.mean 18.0481
x2_content.min 0.0
x2_content.max 44.0615
vmask_3D ()
Traceback (most recent call last):
  File ".//drive_density.py", line 224, in <module>
    densityBin(model[3],model[1],model[5],outfileDensity,debug=True,timeint='1,24')
  File "/export/durack1/git/Density_bining/binDensity.py", line 586, in densityBin
    nomask = npy.equal(vmask_3D[0],0) ; # Returns boolean
IndexError: invalid index to scalar variable.
[durack1@oceanonly Density_bining]$

@durack1
Collaborator Author

durack1 commented Nov 24, 2014

@eguil it seems that MIROC4h has caused another memory error on crunchy and the job has bombed.. I'll figure out which run fell over and restart from a point which excludes MIROC4h, I'll do this tomorrow..

@durack1
Collaborator Author

durack1 commented Nov 24, 2014

@eguil ok, so running again from MIROC5 onward, we should hopefully have the "full" archive complete by the end of the next week..

@eguil
Owner

eguil commented Nov 24, 2014

@durack1 ok great !

@durack1
Collaborator Author

durack1 commented Nov 24, 2014

@eguil FYI, it seems that we still have mask issues for MIROC5 as per below:

FileCount:  0
outPath:    /work/durack1/Shared/141103_Density
outfile:    cmip5.MIROC5.historical.r5i1p1.mo.ocn.Omon.density.ver-1.nc
so:         cmip5.MIROC5.historical.r5i1p1.mo.ocn.Omon.so.ver-1.latestX.xml
thetao:     cmip5.MIROC5.historical.r5i1p1.mo.ocn.Omon.thetao.ver-1.latestX.xml
areacello:  cmip5.MIROC5.rcp85.r0i0p0.fx.ocn.fx.areacello.ver-1.latestX.xml
grdsize: 2867200
 ==> model: MIROC5  (grid size: 2867200 )
 ==> time interval:  0 1955
 ==> size of time chunk, number of time chunks (memory optimization) : 12 163
/usr/local/uvcdat/2014-10-27/lib/python2.7/site-packages/ESMP/src/ESMP_API.py:1241: FutureWarning: comparison to `None` will result in an elementwise object comparison in the future.
  if (srcMaskValues != None):
/usr/local/uvcdat/2014-10-27/lib/python2.7/site-packages/ESMP/src/ESMP_API.py:1248: FutureWarning: comparison to `None` will result in an elementwise object comparison in the future.
  if (dstMaskValues != None):
 --> time chunk (bounds) =  0 / 163  ( 0 11 ) MIROC5
Traceback (most recent call last):
  File "/export/durack1/git/Density_bining/drive_density.py", line 231, in <module>
    densityBin(model[3],model[1],model[5],outfileDensity)
  File "/export/durack1/git/Density_bining/binDensity.py", line 565, in densityBin
    nomask      = npy.equal(vmask_3D[0],0) ; # Returns boolean
IndexError: invalid index to scalar variable.

We'll need to sort this mask issue out during the rewrite of the function.. I've kicked things off after skipping all the MIROC5 simulations, so from cmip5.MPI-ESM-LR.historical.r1i1p1.mo.ocn.Omon.density.ver-1.nc to the end of the alphabet
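For reference, that IndexError is just what numpy raises when vmask_3D has collapsed to a scalar (shape ()) and is then indexed - a minimal illustration with a made-up stand-in value:

import numpy as npy

vmask_3D = npy.float32(0.)      # stand-in for a mask that collapsed to a scalar
try:
    nomask = npy.equal(vmask_3D[0], 0)
except IndexError as err:
    print err                    # invalid index to scalar variable.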

@durack1 durack1 closed this as completed Jul 26, 2017