Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Issue Running with Sflux generated from HAFS #121

Closed
SorooshMani-NOAA opened this issue Jan 9, 2024 · 10 comments
Closed

Issue Running with Sflux generated from HAFS #121

SorooshMani-NOAA opened this issue Jan 9, 2024 · 10 comments

Comments

@SorooshMani-NOAA
Copy link

I'm running into this fatal error when trying to use sflux I generated from HAFS outputs. This is how my sflux header looks like:

Dimensions:  (ny_grid: 1361, nx_grid: 1826, time: 5)
Coordinates:
  * time     (time) datetime64[ns] 2023-08-17 2023-08-17T06:00:00 ... 2023-08-18
Dimensions without coordinates: ny_grid, nx_grid
Data variables:
    lon      (ny_grid, nx_grid) float32 ...
    lat      (ny_grid, nx_grid) float32 ...
    prmsl    (time, ny_grid, nx_grid) float32 ...
    spfh     (time, ny_grid, nx_grid) float32 ...
    stmp     (time, ny_grid, nx_grid) float32 ...
    uwind    (time, ny_grid, nx_grid) float32 ...
    vwind    (time, ny_grid, nx_grid) float32 ...
Attributes:
    Conventions:  CF-1.0

And this is the error:

 23: ABORT:  QUICKSEARCH: Cannot find a vert. level:  -249.56895039521086        6.9526567718092286E-310   6.9526567718102167E-310  -258.64307520892299                             NaN

This is my JCG:

********CG Solve at timestep        1
Itn, 2Norm, Rnorm:      0           NaN           NaN
...

Itn, 2Norm, Rnorm:   1400           NaN           NaN
Itn, 2Norm, Rnorm:   1450           NaN           NaN
Itn, 2Norm, Rnorm:   1500           NaN           NaN
 JCG did not converge in         1500  iterations

and this is from my mirror file:

 netCDF dataset and existence: sflux_air_1 T

 getting additional info for: sflux_air_1
 num_files =            1
 num_times =            5
 first time =    2460174.0000000000
 last  time =    2460175.0000000000
 Longitude range in hgrid.ll=  -97.897365530000002       -59.999332850000002
 Longitude range in sflux =   245.25000000000000        354.75000000000000

 netCDF dataset and existence: sflux_air_2 F

Done initializing time history...
 done computing initial vgrid...
 done computing initial nodal vel...
 done computing initial density...
 time stepping begins...           1         576
 done adjusting wind stress ...
 done flow b.c.
 done hvis...
 done backtracking
 done 1st preparation
 done 2nd preparation
 done solver; etatot=                       NaN ; average |eta|=                       NaN
 done solving momentum eq...
 done solving w
 done solving transport equation
 done recomputing levels...
 done density and flux calculation...
TIME STEP=            1;  TIME=           150.000000
 done adjusting wind stress ...
 done flow b.c.
 done hvis...

I'm trying a minimal tidal + atmospheric setup with only sflux1 air files. I tried running this setup with tidal only (by setting nws=0) and it worked fine (no errors), but with nws=2 I got this error. I am also trying with ipre=1 and am still waiting to see what happens (30 min so far). For the run with sflux I tried both with and without ramp. This is my param.nml:

&CORE
  ipre=1
  ibc=1
  ibtp=0
  rnday=1.0
  dt=150.0
  nspool=24
  ihfskip=576
/

&OPT
  start_year=2023
  start_month=8
  start_day=17
  start_hour=0.0
  utc_start=-0.0
  ics=2
  ihot=0
  dramp=1.0
  dramp_ss=1.0
  nchi=-1
  hmin_man=1.0
  ncor=1
  nws=2
  wtiminc=150.0
  drampwind=1.0
  ihconsv=0
/

&SCHOUT
  nhot=1
  nhot_write=576
  iof_hydro(1)=1
  iof_hydro(16)=1
/

Do you have any suggestions where I should look to find the issue?

@josephzhang8
Copy link
Member

@SorooshMani-NOAA ipre=1 may hang under new IO but the std outputs should see 'Pre-processing completed...' and you can kill the job. In any case, since you have success with nws=0, ipre=1 won't shd any new info.

The NaN is likely coming from sflux. You can use ncview to see which variable has that.

@SorooshMani-NOAA
Copy link
Author

@josephzhang8 thank you for your quick response. I submitted the ipre=1 job to slurm, but I don't see anything in the std out log file. In any case if ipre=1 won't give me more info I'll just kill it.

I tried plotting the values for a couple of time steps by exporting as tiff and viewing in QGIS. So far I haven't seen any strange nan regions close to mesh, e.g.:
image

So based on the error message, is there any way I can narrow down where I should expect the invalid nans to be? Should I look at all the variables?

Thanks again

@josephzhang8
Copy link
Member

You have to look at all.

@SorooshMani-NOAA
Copy link
Author

SorooshMani-NOAA commented Jan 9, 2024

A preliminary check seems to suggest there are no nans in the relevant region:

In [145]: grd = Hgrid.open('./cmd_4/hgrid.gr3', crs=4326)

In [146]: sflux_1 = sflux.rio.clip_box(*grd.bbox.extents)

In [147]: sflux_1.isnull().any()
Out[147]:
<xarray.Dataset>
Dimensions:      ()
Coordinates:
    spatial_ref  int64 0
Data variables:
    prmsl        bool False
    spfh         bool False
    stmp         bool False
    uwind        bool False
    vwind        bool False

This is an example of the prmsl clipped to mesh box (in epsg:4326). The boundaries of the clipped region is larger than mesh extent on all four sides and there are no nans within the clipped box (as shown above):
image

Is there anywhere else to look for values causing nan? There are the value bounds:

In [148]: sflux_1.max()
Out[148]:
<xarray.Dataset>
Dimensions:      ()
Coordinates:
    spatial_ref  int64 0
Data variables:
    prmsl        float32 1.037e+05
    spfh         float32 0.023
    stmp         float32 316.7
    uwind        float32 14.0
    vwind        float32 18.05

In [149]: sflux_1.min()
Out[149]:
<xarray.Dataset>
Dimensions:      ()
Coordinates:
    spatial_ref  int64 0
Data variables:
    prmsl        float32 9.875e+04
    spfh         float32 0.005323
    stmp         float32 277.4
    uwind        float32 -17.17
    vwind        float32 -15.14

@SorooshMani-NOAA
Copy link
Author

SorooshMani-NOAA commented Jan 9, 2024

Is there anyway I can check if my final sflux_air_1.0001.nc sflux format is correct? Any type of preprocessing, etc.? I might have made mistakes in the script that generates the .nc files that I feed to pyschism's SfluxDataset

btw. the above dataset is the final sflux, not the intermediate one I generate from HAFS. I do
HAFS --> intermediate --pyschism--> final sflux

@josephzhang8
Copy link
Member

sflux files are in CF1.0 convention and there are many tools to check but I'm not expert in that.

@SorooshMani-NOAA
Copy link
Author

Does it matter if the part of the sflux grid which is not within the mesh has nan values?

@josephzhang8
Copy link
Member

It might. The min/max of each variable seem reasonable.

@jreniel
Copy link
Member

jreniel commented Jan 9, 2024

It looks like your input grid is curvilinear. It needs to be rectilinear. If you crop it to the grid bounds with a buffer before giving it to pyschism it should work. Note also that schism requires the xgris and ygrid vars to be 1D (ergo rectilinear). While I'm not 100% sure of these statements I think that's what the problem is. pyschism won't check for this because I didn't know about that requirement when I put the SfluxDataset together.

Essentially, since your input grid is curvilinear, you'll have to build a new grid with linspace and interpolate to the np.meshgrid, while keeping xgrid and ygrid 1D in the nc file.

@SorooshMani-NOAA
Copy link
Author

@josephzhang8 @jreniel thank you for your replies!

I think I've found the culprit! Maybe it's worth discussing it in more details and also I should probably do more testing, but after the first successful run, it seems to me that the issue was in fact nan values outside the meshed area in the sflux file.

@jreniel the HAFS output (at least the ones I'm using) is on a rectiliniear grid, however it has lot of nan values - I read HAFS grib2 files using xarray (at multiple typeOfLevel values) and then create a new dataset from all variables and write it to .nc files. I believe the reason behind the curvilinear-looking plot is that HAFS input might be curvilinear but they output to rectilinear with a lot of nans.

In any case, I just clipped the sflux (the same one I have used above) by the mesh box (as also pointed out by @jreniel) and wrote the clipped version to .nc file and used this new clipped version. SCHISM didn't complain! I haven't looked at the results yet, but at least I've progressed one step!

I'll close the ticket for now since the original issue is addressed. Thanks again!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants