
Bayesian analysis for calibration of input parameters #88

Open · SorooshMani-NOAA opened this issue Sep 13, 2022 · 67 comments
Labels: enhancement (New feature or request)

@SorooshMani-NOAA (Collaborator) commented Sep 13, 2022:

There's a package called PyMC3 that can be used for Bayesian analysis.

There's a good "hacker book" available at: https://nbviewer.org/github/CamDavidsonPilon/Probabilistic-Programming-and-Bayesian-Methods-for-Hackers/blob/master/Prologue/Prologue.ipynb

Initial discussion:

  • The Bayesian approach is for parameter calibration.
  • Using NN for surrogate generation vs. polynomial chaos (we want the most accurate results for parameter calibration).
  • All the steps up to NN surrogate generation are the same as before.
    • We still run ensembles because we want to try it on SCHISM vs. ADCIRC from before.
  • Priors for MCMC can be the same as the ensemble generation probabilities.
  • How to obtain the right likelihood function is the main question (a minimal sketch follows this list).
    • One can just assume i.i.d. Gaussian noise to start with (measurements = data + Gaussian noise), but this won't give us the right results.
    • The Gaussian i.i.d. assumption is similar to an LSE fit in the deterministic world.
  • There's a custom MCMC package that Khachik can provide.
    • We can switch to PyMC whenever we need to, as it is a well-known, widely used package.
    • Using the internal package has the benefit that Khachik can help us with the issue details.
  • We'll be using best track forcing (no forecast) for hindcasts of several storms.
    • We can try adding NWM forcing.
    • We can also try adding in rainfall by using PaHM to generate best track winds and then use ERA5 (?)
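A minimal sketch of this kind of calibration in PyMC, assuming a toy quadratic surrogate and the i.i.d. Gaussian likelihood mentioned above (all names and values are illustrative stand-ins, not the project's actual code):

import numpy as np
import pymc as pm

rng = np.random.default_rng(0)

# hypothetical stand-in for a trained surrogate: a quadratic in one parameter,
# with coefficients assumed to have been fit to the ensemble beforehand
c0, c1, c2 = 1.0, 0.5, -0.2
true_theta = 0.3
observed = c0 + c1 * true_theta + c2 * true_theta**2 + rng.normal(0.0, 0.05, size=50)

with pm.Model():
    theta = pm.Uniform('theta', lower=-1.0, upper=1.0)       # prior = ensemble sampling range
    sigma = pm.HalfNormal('sigma', sigma=0.5)                # measurement noise scale
    mu = c0 + c1 * theta + c2 * theta**2                     # surrogate prediction
    pm.Normal('obs', mu=mu, sigma=sigma, observed=observed)  # i.i.d. Gaussian likelihood
    idata = pm.sample(2000, tune=1000, chains=2)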
@SorooshMani-NOAA SorooshMani-NOAA changed the title Bayesian analysis for calibration of input perturbation Bayesian analysis for calibration of input parameters Sep 13, 2022
@SorooshMani-NOAA (Collaborator Author):

@WPringle based on the discussion we had today, we only need to run best track for a couple of storms, with a couple of different ensemble sizes using Korobov sampling, right?

@WPringle (Contributor):

@SorooshMani-NOAA because it is best track, we should change how we do the ensemble perturbation. I think we need to work out that strategy first, then run the ensemble, then check that the ensemble results encompass the observations.

This paper, which also looks at best track, is relevant:
Sochala, P., Chen, C., Dawson, C., & Iskandarani, M. (2020). A polynomial chaos framework for probabilistic predictions of storm surge events. Computational Geosciences, 24, 109–128. https://doi.org/10.1007/s10596-019-09898-5

See section 4.2. There they treat the errors as uniform random variables with an amplification factor that we can tune (a sketch follows below).
Check out Figures 5 and 6.

I think we should follow their methodology for doing the track and intensity perturbations.
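A rough sketch of that idea, uniform error variables scaled by a tunable amplification factor (the error magnitudes and names below are illustrative stand-ins, not values from the paper):

import numpy as np

rng = np.random.default_rng(0)

alpha = 1.0                                          # tunable amplification factor
mean_abs_error = np.array([10.0, 20.0, 35.0, 55.0])  # stand-in error scale per forecast lead time

xi = rng.uniform(-1.0, 1.0, size=(30, 4))            # 30 ensemble members, 4 lead times
perturbations = alpha * mean_abs_error * xi          # additive perturbation for each member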

@WPringle (Contributor):

@SorooshMani-NOAA You can see they also perturb Manning's roughness.

@SorooshMani-NOAA (Collaborator Author):

Just for the sake of completeness of this discussion:
We discussed how to generate ensembles and eventually decided to just run ensembles of different storms with different sizes for now, due to time limitations on the available computational resources.

Later we'll explore other parameter perturbations.

@SorooshMani-NOAA (Collaborator Author):

@WPringle have you tried running the KL-NN code on the sample data? During our meeting (and also in the slides) it says the data is time dependent, but looking at ydata.txt and the code that was shared with us, it seems to be using data of shape ensemble x mesh-node. Do you know if the plots in the slides were generated using the same code or not?

I haven't tried running it yet; I was just reviewing the slides and the code.

@WPringle (Contributor) commented Oct 3, 2022:

I did run the code using the sample data, and the plots on the slides are the result of that execution.

I wouldn't worry about whether it is "x" or "t"; it is somewhat arbitrary, and the methodology can handle spatiotemporal data in theory. But I am pretty sure "x" refers to a time dimension here, based on how it looks.

@SorooshMani-NOAA (Collaborator Author):

I see, thank you

@SorooshMani-NOAA (Collaborator Author):

@WPringle I forgot to also ask: have you tried it on an ensemble too? Is there any issue I need to be aware of if I want to run this on the combined maxelev ensemble dataset?

@WPringle (Contributor) commented Oct 4, 2022:

No, I did not. Just be aware of the dimensions of the data (number of parameters, number of nodes, and number of ensemble members) and how they fit into that code. It should be able to run through to the part where it requires the observations.

@SorooshMani-NOAA (Collaborator Author):

OK, thank you

@SorooshMani-NOAA (Collaborator Author):

@WPringle I'm testing the KLNN (without the MCMC part) on the same ensemble I used with your KLPC. I subsetted the maxelev dataset using the subsetting part of your KLPC script (examples/karhunenloeve_polynomialchaos/klpc_wetonly.py). Although the dataset now fits into memory, KLE.build still takes a long time. Would it be OK if we replace the KL part of Khachik's script with the same PCA that you were using in the following code?

pca_obj = PCA(n_components=neig, random_state=666, whiten=True)

Out of curiosity, is there any reason behind using KLE/PCA for reducing dimensions? Why not use other methods?
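For context, a minimal self-contained sketch of that PCA-based reduction and its inverse (the ensemble array and neig value are stand-ins):

import numpy as np
from sklearn.decomposition import PCA

elev = np.random.default_rng(0).normal(size=(30, 5000))  # stand-in: members x mesh nodes
neig = 10                                                # number of retained modes

pca_obj = PCA(n_components=neig, random_state=666, whiten=True)
scores = pca_obj.fit_transform(elev)                     # members x neig, reduced space
reconstructed = pca_obj.inverse_transform(scores)        # back to members x mesh nodes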

@WPringle (Contributor):

Yes, please use the PCA method we use in EnsemblePerturbation.

What other methods are there? For the KLPC we have an analytical solution to convert the KL-fit surrogate back to the full-dimension surrogate, which is at least one advantage I can think of...

@SorooshMani-NOAA (Collaborator Author):

Yes, please use the PCA method we use in EnsemblePerturbation

OK thanks!

What other methods are there?

I was speaking generally; I know there are a bunch of dimensionality reduction techniques that might be better than PCA for certain applications, so I was wondering if there's any specific reason, e.g.: https://towardsdatascience.com/11-dimensionality-reduction-techniques-you-should-know-in-2021-dcb9500d388b

... to convert the KL fit surrogate back to the full-dimension surrogate...

I see, I hadn't really thought about this part. It would still be interesting to see what other methods have to offer if we get the chance to explore.

@WPringle (Contributor):

Yeah, that's interesting! For sure explore them if you want to. We can have a meeting with Khachik sometime to discuss this; he might be able to say why we would prefer one type of method over another.

@SorooshMani-NOAA (Collaborator Author):

Copying the email for future reference:

As I was going over the KL-NN-MCMC code, thinking about how to use USGS data within the model (and later, how to use the temporal data), I decided to focus more on the PCA part of the code and read more about it. I stumbled upon this not-so-new paper:

https://www.researchgate.net/publication/263138615_Principal_Component_Analysis_on_Spatial_Data_An_Overview

This is an overview of PCA use for spatial data. For the standard PCA that we are using, it defines different modes of running, and none of them incorporates time, location, and attributes all at the same time.

Based on my understanding, in our case the node index is "location", each ensemble member is an "attribute", and the elevation output time in the time series is "time" (for later). We could also consider both the ensemble-member and time-series dimensions as "time", but I'm not sure that makes a lot of sense. So right now for the max-elev analysis we are actually in Q-mode, but I don't know which mode we should use for (later) time series analysis.

Another question is whether we can find a way to "train" the KLNN on all the nodes but use USGS data at ~270 HWM locations for MCMC tuning. The obvious alternative approach is to train the KLNN on the station locations and then do MCMC on the same station locations (noting that since we don't have a lot of computational time, we interpolate field outputs to the HWM locations for now!).

@SorooshMani-NOAA (Collaborator Author):

I tried using the sklearn PCA, and the KLNN part now goes through successfully. I downloaded USGS data and selected all the locations that are within the mesh polygon. Since the stations were not available at the time of simulation, I'm interpolating results to the station locations for the analysis. Now I'm trying to fix NaN data and/or dimension mismatches to run the full KLNN-MCMC.

@WPringle (Contributor):

@SorooshMani-NOAA Could you please share the code you have so far for the results we looked at yesterday with Khachik?

@SorooshMani-NOAA (Collaborator Author):

@WPringle I uploaded the code to a private repo for now and added you and Saeed as collaborators. Note that the code must be run on an already-subsetted dataset; I tried to get it to work with the minimum changes possible.

@SorooshMani-NOAA (Collaborator Author) commented Nov 16, 2022:

This is the plot I get for the MCMC results, if I discard the first half and pick every 10th sample:
[image: MCMC parameter trace plots]

Each plot shows one of the parameter values (y-axis) against the sample index (x-axis). It seems that gamma in the code plays a major role in determining how white-noise-like the parameters are; this is for gamma=0.01.
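The burn-in/thinning described here, as a minimal sketch (the chain array is a stand-in):

import numpy as np

chain = np.random.default_rng(0).normal(size=(20000, 5))  # stand-in: raw samples x params
kept = chain[chain.shape[0] // 2 :: 10]                   # drop first half, keep every 10th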

Update
These plots are not in the repo code yet!

@SorooshMani-NOAA (Collaborator Author):

I also increased the 2000 epochs to 20,000 and it definitely improved the NN:
[image: improved NN fit]
and
[image: improved NN fit, second plot]

@SorooshMani-NOAA (Collaborator Author):

@WPringle if I want to train on the whole model but run MCMC only on the USGS HWM stations, what should the non-station dimension values be when transforming observations using PCA? Should they be 0?

@WPringle (Contributor) commented Nov 16, 2022:

@WPringle if I want to train on the whole model but only MCMC on the USGS HWM stations, what should the non-station dimension value be when transforming observations using PCA? Should it be 0?

@SorooshMani-NOAA That's a good question. I suppose 0 makes sense, because in the matrix multiplication zeros will add nothing. If that doesn't work, then we probably need to do the MCMC on the full result rather than the dimensionally reduced one. First let's try zero and see...

Just to be clear, you are asking about what dummy value to put in obs_data at non-station locations right?
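A minimal sketch of that zero-fill idea (all names and values are illustrative stand-ins):

import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
elev = rng.normal(size=(30, 5000))            # stand-in ensemble used to fit the PCA
pca_obj = PCA(n_components=10).fit(elev)

station_idx = np.array([3, 42, 1007])         # nodes that have USGS HWM data
obs_data = np.zeros((1, elev.shape[1]))       # dummy 0 at non-station nodes
obs_data[0, station_idx] = [1.2, 0.8, 2.1]    # stand-in observed HWM values
obs_scores = pca_obj.transform(obs_data)      # observation in reduced space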

@SorooshMani-NOAA (Collaborator Author):

Just to be clear, you are asking about what dummy value to put in obs_data at non-station locations right?

@WPringle, sorry for the late reply. Yes, that's what I meant. I'll try it and see how it goes.

@SorooshMani-NOAA (Collaborator Author):

On a separate note, I plotted the MCMC results vs. obs as well as model vs. obs, and it doesn't look good. Note that I've already cut off HWM locations that are more than 2 meters above, or more than 1 meter below, the min/max of the HWM station results in the model, and it still doesn't look good. I think I need to actually rerun the simulations:
[image: MCMC results vs. observations]
where each thin bar is a station and the limits indicate max/min results from all 30 ensemble members, and
[image: model vs. observations]

@SorooshMani-NOAA (Collaborator Author) commented Nov 21, 2022:

Next

  • Drop the medium-to-low quality data before plotting
  • Plot both ADCIRC and SCHISM results on the scatter plot against the observations
  • See how the peak of the observation time series compares against the model

@SorooshMani-NOAA (Collaborator Author) commented Dec 22, 2022:

The result is not good for sure, but this time there seems to be a method to this madness: there seems to be a bias in the MCMC error-bar results. I'm still not sure how to interpret it; what do you think @WPringle ? Also, please let me know how to share the .nc files. The combined file for the 50-member ensemble (42-43 members went through) is around 36 GB; the subset is ~10 GB.

Updated sizes

@WPringle (Contributor):

I guess the bias mostly comes from the fact that the model results are generally underpredicted. But it's a bit weird: we would think the MAP crosses should be closest to the observations, yet instead they are lower down on the error bars.
Could you add the MCMC error bars to the ensemble-vs-observation plot, so we can compare the original ensemble to the MCMC?

@WPringle (Contributor) commented Jan 3, 2023:

@SorooshMani-NOAA I think we can use this function to transform Gaussian to uniform: https://docs.scipy.org/doc/scipy/reference/generated/scipy.special.ndtr.html#scipy.special.ndtr

This will give you values in [0, 1], which can then be rescaled to [-1, 1] by multiplying by 2 and subtracting 1.
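As a minimal sketch:

import numpy as np
from scipy.special import ndtr

z = np.random.default_rng(0).standard_normal(1000)  # Gaussian samples
u01 = ndtr(z)             # standard-normal CDF maps them to [0, 1]
u11 = 2.0 * u01 - 1.0     # rescale to [-1, 1]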

@WPringle (Contributor):

@SorooshMani-NOAA I put my Jupyter notebook and some modifications to mcmc.py and utils.py for using PCs on the Globus Ensemble research collection under KLPC_MCMC_William.

@WPringle (Contributor) commented Jan 18, 2023:

@SorooshMani-NOAA Also check out https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.GridSearchCV.html for figuring out the best hyperparameters for the ANN model (used by this paper: https://agupubs.onlinelibrary.wiley.com/doi/10.1029/2022JD037617)
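A minimal sketch of that kind of search over a scikit-learn MLP (the grid values and the stand-in data are illustrative; the actual NN in the project code may differ):

import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 4))        # stand-in: ensemble parameters
y = rng.normal(size=30)             # stand-in: one reduced-space output

search = GridSearchCV(
    MLPRegressor(max_iter=5000, random_state=666),
    param_grid={
        'hidden_layer_sizes': [(20,), (50,), (20, 20)],
        'alpha': [1e-4, 1e-3, 1e-2],
    },
    cv=5,
)
search.fit(X, y)
print(search.best_params_)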

@SorooshMani-NOAA (Collaborator Author):

@WPringle I think we missed one point during our discussion about physical vs. KL space:

When using polynomial chaos in physical space, it's clear what the input to the physical surrogate is; but in the NN case, even if MCMC takes care of transforming back to physical space, we still don't know what to pass to the NN in the first place.

The NN only accepts KL-space input, so we still run into the same issue of transforming data from observation-points-only to KL space in order to pass it to the NN. So we should find a way to "transform" the neural network to accept physical inputs. Does that make sense?

@WPringle (Contributor) commented Jan 31, 2023 via email

@SorooshMani-NOAA (Collaborator Author):

“input” to the surrogate is just the value of the set of parameters

Oh, I see my mistake ... thanks! I'll just need to inverse_transform the KL-space predictions from the NN and then take the ones at the obs locations for MCMC to use!
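A minimal sketch of that step, with PCA standing in for the KL transform (all arrays are stand-ins):

import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
elev = rng.normal(size=(30, 5000))               # stand-in ensemble: members x nodes
pca_obj = PCA(n_components=10).fit(elev)

kl_pred = rng.normal(size=(1, 10))               # stand-in NN output in KL/PCA space
physical = pca_obj.inverse_transform(kl_pred)    # back to all mesh nodes
station_idx = np.array([3, 42, 1007])            # nodes with observations
at_stations = physical[:, station_idx]           # what MCMC compares to the obs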

Apart from that, I'm not sure if I'm using the term cross-validation correctly. Based on the description at https://scikit-learn.org/stable/modules/grid_search.html, this term is used for hyperparameter tuning (e.g., the number of hidden layers or neurons in the case of a NN), not model parameter selection. What we're really doing is looking at train/test splits for tuning parameters.

In the case of the PC model, the existing code has:

from sklearn.linear_model import ElasticNetCV
from sklearn.model_selection import LeaveOneOut

cross_validator = LeaveOneOut()
regression_model = ElasticNetCV(
    fit_intercept=False,
    cv=cross_validator,
    l1_ratio=0.5,
    selection='random',
    random_state=666,
)

Based on https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.ElasticNetCV.html, my understanding is that cross-validation is actually not used here, since l1_ratio is not a list of possible values.

@WPringle (Contributor):

Based on https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.ElasticNetCV.html, my understanding is that cross-validation is actually not used here, since l1_ratio is not a list of possible values.

Yep, it is still doing the CV for the alpha parameter in the cost function, just not for l1_ratio; but yes, that could also be added as a list.

@SorooshMani-NOAA (Collaborator Author):

doing the CV for the alpha

Oh, yeah, I missed it. In any case, what I'm trying to say is that unless we really want to find a good number of neurons or layers, a CV approach might not make much sense for the NN, and we can just use the internal validation it already has for choosing parameters.

@WPringle (Contributor) commented Feb 1, 2023:

@SorooshMani-NOAA Hmm, I don't know about that. Maybe those CV functions aren't appropriate, but I think it's important to do CV so we reduce the overtraining problem.

@SorooshMani-NOAA (Collaborator Author):

There seem to be other methods to avoid overfitting, e.g.:
https://www.kdnuggets.com/2019/12/5-techniques-prevent-overfitting-neural-networks.html
In any case, these are the results using a validation input but without any further change to the NN code:
[image: loss_history_log]
The best validation loss happens early in the epochs, so doing more and more epochs doesn't really help, based on this graph.

And this is the observation vs. ensemble (with dry-node extrapolation):
[image: model_vs_obs]

The results from the SCHISM ensemble also improved by using the physical-space surrogate:
[image: fit]
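One of those remedies, early stopping on a validation split, as a minimal scikit-learn sketch (the actual NN code here is Khachik's custom model, so this is only illustrative):

from sklearn.neural_network import MLPRegressor

nn = MLPRegressor(
    hidden_layer_sizes=(50,),
    early_stopping=True,       # hold out part of the training data as validation
    validation_fraction=0.1,   # size of that hold-out
    n_iter_no_change=20,       # stop when validation score stops improving for 20 epochs
    max_iter=20000,
)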

@SorooshMani-NOAA (Collaborator Author) commented Feb 21, 2023:

Action items as of the 2/21/23 meeting:

Soroosh:

  • Try running a "nice" mesh ensemble (30 members), best track
  • Generate plots
    • model vs. observed for MCMC
  • SCHISM internal wave model (update in parameters and compiled code)
  • Forecast track run instead of best track (needs preprocessing to become like a best track for passing to SCHISM/PaHM)
    • Tropycal might have some tools (for advisories, etc.)
    • Need scripts from William
    • Maybe stormevents is already doing the preprocessing

William:

  • GAHM: make everything Gaussian errors; 34-, 50-, 64-kt radii, TC velocities, etc.
  • Bayesian optimization of the NN
  • Uncertainty maps

What do we need for finalizing/publishing?

  • Ideas of what's new compared to William's previous paper
    • e.g., hydrology, GAHM, BT vs. forecast, MCMC, NN vs. PC

@WPringle (Contributor):

@SorooshMani-NOAA See these scripts and the best_track and advisory files here for Florence:
florence_tracks_scripts.zip

plot_nhc_adv.py uses the tropycal package and get_advisory.py uses StormEvents; I'm not sure the latter works properly for getting the advisory, but it's supposed to be able to.

@WPringle (Contributor) commented Mar 7, 2023:

Attachments:
Plumlee et al. - 2021 - High-fidelity hurricane surge forecasting using em.pdf
Sochala et al. - 2020 - A.polynomial.chaos.framework.for.probabilistic.predictions.of.storm.surge.events.pdf

@SorooshMani-NOAA (Collaborator Author) commented Mar 7, 2023:

Todo list:
Soroosh:

William:

  • How to perturb/analyze GAHM?

@SorooshMani-NOAA (Collaborator Author):

@WPringle when I download data for all the past storms from stormevents, it looks like there are only two advisory_numbers, 01 and 03, which I think is different from the actual advisory ID that is based on the forecast start time. This comes from the ATCF track files. Based on this, here's how I'd like to get the 48-hr forecast data for past storms using stormevents (a sketch follows the list):

  1. First get all the OFCL tracks for the storm using VortexTrack.from_storm_name, passing advisories=['OFCL'] and file_deck='a'
  2. Next get the underlying dataframe using either .data or .unfiltered_data
  3. Find the intersection of the acquired dataframe with the shape of the US (from NaturalEarth_Lowres)
  4. For each track_start_date (eqv. to date from ATCF), get the first datetime (eqv. to date from ATCF + forecast hour) at which the track point falls on land
  5. For the result of the above operation, take the difference of datetime and track_start_date
  6. Take the last entry whose time difference is equal to or immediately greater than 2 days as the "48-hour forecast"
  7. Use the track with the start time found in step 6 for perturbing.

What do you think of this approach?
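A hedged sketch of steps 1-6 under these assumptions: stormevents returns a GeoDataFrame from .data (as the geometry column suggests), and an older geopandas with the bundled naturalearth_lowres dataset is available; exact API details may differ:

from datetime import timedelta

import geopandas as gpd
from stormevents.nhc.track import VortexTrack

track = VortexTrack.from_storm_name(
    'florence', 2018, file_deck='a', advisories=['OFCL']
)
df = track.data  # one row per forecast point, with track_start_time and geometry

# step 3: intersect forecast points with the US landmass
world = gpd.read_file(gpd.datasets.get_path('naturalearth_lowres'))
us_shape = world[world.name == 'United States of America'].unary_union
on_land = df[df.geometry.intersects(us_shape)]

# steps 4-5: first landfall time and lead time per forecast issuance
landfall = on_land.groupby('track_start_time')['datetime'].min()
lead_time = landfall - landfall.index

# step 6: last issuance whose lead time to landfall is at least 48 hours
start_48hr = lead_time[lead_time >= timedelta(days=2)].index[-1]
forecast_48hr = df[df['track_start_time'] == start_48hr]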

@WPringle (Contributor) commented Mar 8, 2023:

That sounds good to me @SorooshMani-NOAA. What do you mean by only two advisory_numbers?

@SorooshMani-NOAA (Collaborator Author):

If I get the tracks using:

import stormevents
tracks = stormevents.nhc.track.VortexTrack.from_storm_name('florence', 2018, file_deck='a')

then this would be the result of tracks.data

      basin storm_number            datetime advisory_number advisory  ... isowave_radius_for_NWQ  isowave_radius_for_SWQ  extra_values                    geometry    track_start_time
0        AL           06 2018-08-29 06:00:00              01     CARQ  ...                    NaN                     NaN          <NA>  POINT (-10.90000 12.90000) 2018-08-30 06:00:00
1        AL           06 2018-08-29 12:00:00              01     CARQ  ...                    NaN                     NaN          <NA>  POINT (-12.30000 12.90000) 2018-08-30 06:00:00
2        AL           06 2018-08-29 18:00:00              01     CARQ  ...                    NaN                     NaN          <NA>  POINT (-13.80000 12.90000) 2018-08-30 06:00:00
3        AL           06 2018-08-30 00:00:00              01     CARQ  ...                    NaN                     NaN          <NA>  POINT (-15.40000 12.90000) 2018-08-30 06:00:00
4        AL           06 2018-08-30 06:00:00              01     CARQ  ...                    NaN                     NaN          <NA>  POINT (-17.00000 12.90000) 2018-08-30 06:00:00
...     ...          ...                 ...             ...      ...  ...                    ...                     ...           ...                         ...                 ...
10229    AL           06 2018-09-18 12:00:00              01     CARQ  ...                    NaN                     NaN          <NA>  POINT (-73.30000 42.20000) 2018-09-18 12:00:00
10230    AL           06 2018-09-18 12:00:00              01     CARQ  ...                    NaN                     NaN          <NA>  POINT (-73.30000 42.20000) 2018-09-18 12:00:00
10231    AL           06 2018-09-18 12:00:00              03     OFCL  ...                    NaN                     NaN          <NA>  POINT (-73.30000 42.20000) 2018-09-18 12:00:00
10232    AL           06 2018-09-18 15:00:00              03     OFCL  ...                    NaN                     NaN          <NA>  POINT (-71.90000 42.60000) 2018-09-18 12:00:00
10233    AL           06 2018-09-19 00:00:00              03     OFCL  ...                    NaN                     NaN          <NA>  POINT (-67.70000 43.50000) 2018-09-18 12:00:00

There's an advisory_number column. Before looking at the values, I thought it was just a number counting advisories as they are issued, but it seems to mean something different.

@WPringle (Contributor) commented Mar 8, 2023:

Yeah, maybe that is being parsed incorrectly, or the column in ATCF means something else.

@SorooshMani-NOAA (Collaborator Author) commented Mar 10, 2023:

@WPringle following our discussion about getting the forecast track, this is where I am:

  • I'm using the approach described in this comment to get the OFCL forecast track at ~48 hr prior to landfall.
  • I realized that when one tries to fetch the data from the VortexTrack object, it automatically runs the correction function if OFCL is among the requested advisories.
  • PySCHISM and ADCIRCPy rely on BEST track for their forcing type, so if I use a forecast OFCL track, I need to pass it as a fake BEST. Due to how StormEvents triggers the correction, I also need to do some gymnastics (see the sketch below):
    • I get the OFCL and CARQ tracks with track_start_time at ~48 hr before landfall.
    • Then I write them to a file to force running the correction algorithm for OFCL, and I request only the OFCL advisory so that after the correction CARQ is automatically removed (some of this just relates to StormEvents implementation details).
    • Then I load this new file as a VortexTrack, change the advisory name from OFCL to BEST, and set forecast_hours all to 0 (since I noticed that's how it is in an actual best track as written by StormEvents).
    • Then I write this as my fake best track to be used by the perturbation and SCHISM.
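A sketch of those gymnastics; the VortexTrack method names (to_file, from_file) and the dataframe round-trip are assumptions about the StormEvents API, so treat this as pseudocode-level illustration:

from stormevents.nhc.track import VortexTrack

# fetch only OFCL so the correction runs and CARQ is dropped afterwards
track = VortexTrack.from_storm_name(
    'florence', 2018, file_deck='a', advisories=['OFCL']
)
track.to_file('ofcl_corrected.dat')   # assumed method: writing triggers the correction

# reload, then relabel as a fake best track
fake_best = VortexTrack.from_file('ofcl_corrected.dat')  # assumed constructor
df = fake_best.unfiltered_data
df['advisory'] = 'BEST'        # OFCL -> BEST
df['forecast_hours'] = 0       # actual best tracks carry 0 forecast hours
fake_best.to_file('fake_best_track.dat')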

After this, I still get a wrong central pressure and Rmax for some of the entries; for example, this is one such output (before I set forecast hours to 0) for Florence 2018:

AL, 06, 2018091306, 03, BEST,   0, 325N,  743W,   95,    0, HU,  34, NEQ,  170,  150,  110,  140,    1,    0,   0, 115,   0,    ,   0, DPB,   0,   0,           ,,,,,,,,
AL, 06, 2018091306, 03, BEST,   0, 325N,  743W,   95,    0, HU,  50, NEQ,  100,   90,   70,   80,    1,    0,   0, 115,   0,    ,   0, DPB,   0,   0,           ,,,,,,,,
AL, 06, 2018091306, 03, BEST,   0, 325N,  743W,   95,    0, HU,  64, NEQ,   70,   60,   50,   60,    1,    0,   0, 115,   0,    ,   0, DPB,   0,   0,           ,,,,,,,,
AL, 06, 2018091309, 03, BEST,   3, 328N,  747W,   95,  956, HU,  34, NEQ,  170,  150,  110,  140,  957,    0,   0, 115,   0,    ,   0, DPB, 312,   5,           ,,,,,,,,
AL, 06, 2018091309, 03, BEST,   3, 328N,  747W,   95,  956, HU,  50, NEQ,  100,   90,   70,   80,  957,    0,   0, 115,   0,    ,   0, DPB, 312,   5,           ,,,,,,,,
AL, 06, 2018091309, 03, BEST,   3, 328N,  747W,   95,  956, HU,  64, NEQ,   70,   60,   50,   60,  957,    0,   0, 115,   0,    ,   0, DPB, 312,   5,           ,,,,,,,,
AL, 06, 2018091318, 03, BEST,  12, 337N,  761W,   95,    0, HU,  34, NEQ,  170,  150,  110,  130,    1,    0,   0, 115,   0,    ,   0, DPB, 308,   5,           ,,,,,,,,
AL, 06, 2018091318, 03, BEST,  12, 337N,  761W,   95,    0, HU,  50, NEQ,  100,   90,   70,   80,    1,    0,   0, 115,   0,    ,   0, DPB, 308,   5,           ,,,,,,,,
AL, 06, 2018091318, 03, BEST,  12, 337N,  761W,   95,    0, HU,  64, NEQ,   70,   60,   50,   60,    1,    0,   0, 115,   0,    ,   0, DPB, 308,   5,           ,,,,,,,,
AL, 06, 2018091406, 03, BEST,  24, 342N,  774W,   90,    0, HU,  34, NEQ,  160,  150,  100,   90,    1,    0,   0, 110,   0,    ,   0, DPB, 295,   3,           ,,,,,,,,
AL, 06, 2018091406, 03, BEST,  24, 342N,  774W,   90,    0, HU,  50, NEQ,   90,   80,   60,   50,    1,    0,   0, 110,   0,    ,   0, DPB, 295,   3,           ,,,,,,,,
AL, 06, 2018091406, 03, BEST,  24, 342N,  774W,   90,    0, HU,  64, NEQ,   60,   60,   40,   35,    1,    0,   0, 110,   0,    ,   0, DPB, 295,   3,           ,,,,,,,,
AL, 06, 2018091418, 03, BEST,  36, 343N,  784W,   70,    0, HU,  34, NEQ,  150,  150,  100,   80,    1,    0,   0,  85,   0,    ,   0, DPB, 277,   2,           ,,,,,,,,
AL, 06, 2018091418, 03, BEST,  36, 343N,  784W,   70,    0, HU,  50, NEQ,   80,   80,   50,   40,    1,    0,   0,  85,   0,    ,   0, DPB, 277,   2,           ,,,,,,,,
AL, 06, 2018091418, 03, BEST,  36, 343N,  784W,   70,    0, HU,  64, NEQ,   50,   50,   30,   20,    1,    0,   0,  85,   0,    ,   0, DPB, 277,   2,           ,,,,,,,,
AL, 06, 2018091506, 03, BEST,  48, 341N,  792W,   50,    0, TS,  34, NEQ,  140,  140,   90,   70,    1,    0,   0,  60,   0,    ,   0, DPB, 253,   2,           ,,,,,,,,
AL, 06, 2018091506, 03, BEST,  48, 341N,  792W,   50,    0, TS,  50, NEQ,   60,   70,   40,    0,    1,    0,   0,  60,   0,    ,   0, DPB, 253,   2,           ,,,,,,,,
AL, 06, 2018091606, 03, BEST,  72, 339N,  812W,   30,    0, TD,  34, NEQ,    0,    0,    0,    0,    1,    0,   0,  40,   0,    ,   0, DPB, 264,   2,           ,,,,,,,,
AL, 06, 2018091706, 03, BEST,  96, 354N,  833W,   25,    0, TD,  34, NEQ,    0,    0,    0,    0,    1,    0,   0,  35,   0,    ,   0, DPB, 311,   3,           ,,,,,,,,
AL, 06, 2018091806, 03, BEST, 120, 395N,  810W,   20,    0, LO,  34, NEQ,    0,    0,    0,    0,    1,    0,   0,  30,   0,    ,   0, DPB,  23,   6,           ,,,,,,,,

for reference, this is part of an actual best track from a previous successfully run ensemble:

AL, 06, 2018091306,   , BEST,   0, 324N,  742W,  100,  955, HU,  34, NEQ,  170,  150,  110,  140, 1010,  200,  20, 115,   0,   L,   0,    , 317,   6,   FLORENCE,,,,,,,,
AL, 06, 2018091306,   , BEST,   0, 324N,  742W,  100,  955, HU,  50, NEQ,  100,   90,   70,   80, 1010,  200,  20, 115,   0,   L,   0,    , 317,   6,   FLORENCE,,,,,,,,
AL, 06, 2018091306,   , BEST,   0, 324N,  742W,  100,  955, HU,  64, NEQ,   70,   60,   50,   60, 1010,  200,  20, 115,   0,   L,   0,    , 317,   6,   FLORENCE,,,,,,,,
AL, 06, 2018091312,   , BEST,   0, 331N,  751W,   95,  954, HU,  34, NEQ,  170,  150,  120,  140, 1011,  200,  20, 115,   0,   L,   0,    , 313,   5,   FLORENCE,,,,,,,,
AL, 06, 2018091312,   , BEST,   0, 331N,  751W,   95,  954, HU,  50, NEQ,  100,   90,   80,   80, 1011,  200,  20, 115,   0,   L,   0,    , 313,   5,   FLORENCE,,,,,,,,
AL, 06, 2018091312,   , BEST,   0, 331N,  751W,   95,  954, HU,  64, NEQ,   70,   60,   50,   60, 1011,  200,  20, 115,   0,   L,   0,    , 313,   5,   FLORENCE,,,,,,,,
AL, 06, 2018091318,   , BEST,   0, 336N,  760W,   90,  953, HU,  34, NEQ,  170,  150,  120,  140, 1011,  200,  20, 110,   0,   L,   0,    , 304,   5,   FLORENCE,,,,,,,,
AL, 06, 2018091318,   , BEST,   0, 336N,  760W,   90,  953, HU,  50, NEQ,  100,   90,   80,   80, 1011,  200,  20, 110,   0,   L,   0,    , 304,   5,   FLORENCE,,,,,,,,
AL, 06, 2018091318,   , BEST,   0, 336N,  760W,   90,  953, HU,  64, NEQ,   70,   60,   50,   60, 1011,  200,  20, 110,   0,   L,   0,    , 304,   5,   FLORENCE,,,,,,,,
AL, 06, 2018091400,   , BEST,   0, 340N,  765W,   90,  952, HU,  34, NEQ,  170,  150,  130,  100, 1012,  200,  20, 105,   0,   L,   0,    , 314,   3,   FLORENCE,,,,,,,,
AL, 06, 2018091400,   , BEST,   0, 340N,  765W,   90,  952, HU,  50, NEQ,  100,   80,   80,   70, 1012,  200,  20, 105,   0,   L,   0,    , 314,   3,   FLORENCE,,,,,,,,
AL, 06, 2018091400,   , BEST,   0, 340N,  765W,   90,  952, HU,  64, NEQ,   70,   60,   50,   50, 1012,  200,  20, 105,   0,   L,   0,    , 314,   3,   FLORENCE,,,,,,,,
AL, 06, 2018091406,   , BEST,   0, 342N,  772W,   85,  952, HU,  34, NEQ,  170,  150,  130,  100, 1012,  200,  20, 100,   0,   L,   0,    , 289,   3,   FLORENCE,,,,,,,,
AL, 06, 2018091406,   , BEST,   0, 342N,  772W,   85,  952, HU,  50, NEQ,  100,   80,   80,   70, 1012,  200,  20, 100,   0,   L,   0,    , 289,   3,   FLORENCE,,,,,,,,
AL, 06, 2018091406,   , BEST,   0, 342N,  772W,   85,  952, HU,  64, NEQ,   70,   60,   60,   50, 1012,  200,  20, 100,   0,   L,   0,    , 289,   3,   FLORENCE,,,,,,,,
AL, 06, 2018091411, 15, BEST,   0, 342N,  778W,   80,  956, HU,  34, NEQ,  170,  150,  140,   90, 1012,  200,  25,   0,   0,    ,   0,    , 270,   3,   FLORENCE,,,,,,,,
AL, 06, 2018091411, 15, BEST,   0, 342N,  778W,   80,  956, HU,  50, NEQ,  100,   80,   80,   60, 1012,  200,  25,   0,   0,    ,   0,    , 270,   3,   FLORENCE,,,,,,,,
AL, 06, 2018091411, 15, BEST,   0, 342N,  778W,   80,  956, HU,  64, NEQ,   70,   60,   60,   40, 1012,  200,  25,   0,   0,    ,   0,    , 270,   3,   FLORENCE,,,,,,,,
AL, 06, 2018091412,   , BEST,   0, 341N,  779W,   80,  957, HU,  34, NEQ,  170,  150,  140,   80, 1012,  200,  25,  90,   0,   L,   0,    , 220,   5,   FLORENCE,,,,,,,,
AL, 06, 2018091412,   , BEST,   0, 341N,  779W,   80,  957, HU,  50, NEQ,  100,   80,   80,   40, 1012,  200,  25,  90,   0,   L,   0,    , 220,   5,   FLORENCE,,,,,,,,
AL, 06, 2018091412,   , BEST,   0, 341N,  779W,   80,  957, HU,  64, NEQ,   60,   60,   60,   20, 1012,  200,  25,  90,   0,   L,   0,    , 220,   5,   FLORENCE,,,,,,,,
AL, 06, 2018091418,   , BEST,   0, 340N,  784W,   65,  969, HU,  34, NEQ,  150,  130,  120,   70, 1012,  200,  30,  80,   0,   L,   0,    , 257,   2,   FLORENCE,,,,,,,,
AL, 06, 2018091418,   , BEST,   0, 340N,  784W,   65,  969, HU,  50, NEQ,   90,   70,   60,   30, 1012,  200,  30,  80,   0,   L,   0,    , 257,   2,   FLORENCE,,,,,,,,
AL, 06, 2018091418,   , BEST,   0, 340N,  784W,   65,  969, HU,  64, NEQ,    0,   30,   30,    0, 1012,  200,  30,  80,   0,   L,   0,    , 257,   2,   FLORENCE,,,,,,,,
AL, 06, 2018091500,   , BEST,   0, 339N,  788W,   60,  978, TS,  34, NEQ,  150,  150,  100,   60, 1013,  210,  30,  65,   0,   L,   0,    , 253,   2,   FLORENCE,,,,,,,,
AL, 06, 2018091500,   , BEST,   0, 339N,  788W,   60,  978, TS,  50, NEQ,   70,   70,   50,   30, 1013,  210,  30,  65,   0,   L,   0,    , 253,   2,   FLORENCE,,,,,,,,
AL, 06, 2018091506,   , BEST,   0, 337N,  793W,   55,  986, TS,  34, NEQ,  150,  150,   90,   50, 1013,  210,  50,  60,   0,   L,   0,    , 245,   2,   FLORENCE,,,,,,,,
AL, 06, 2018091506,   , BEST,   0, 337N,  793W,   55,  986, TS,  50, NEQ,   70,   70,    0,    0, 1013,  210,  50,  60,   0,   L,   0,    , 245,   2,   FLORENCE,,,,,,,,
AL, 06, 2018091512,   , BEST,   0, 336N,  795W,   55,  992, TS,  34, NEQ,  150,  130,   80,   40, 1013,  220,  60,  55,   0,   L,   0,    , 239,   1,   FLORENCE,,,,,,,,
AL, 06, 2018091512,   , BEST,   0, 336N,  795W,   55,  992, TS,  50, NEQ,   60,  100,    0,    0, 1013,  220,  60,  55,   0,   L,   0,    , 239,   1,   FLORENCE,,,,,,,,
AL, 06, 2018091518,   , BEST,   0, 336N,  798W,   50,  997, TS,  34, NEQ,  140,  130,    0,    0, 1013,  220, 110,  50,   0,   L,   0,    , 270,   1,   FLORENCE,,,,,,,,
AL, 06, 2018091518,   , BEST,   0, 336N,  798W,   50,  997, TS,  50, NEQ,    0,  110,    0,    0, 1013,  220, 110,  50,   0,   L,   0,    , 270,   1,   FLORENCE,,,,,,,,
AL, 06, 2018091600,   , BEST,   0, 336N,  802W,   45,  998, TS,  34, NEQ,  130,  130,    0,    0, 1013,  240, 110,  50,   0,   L,   0,    , 270,   2,   FLORENCE,,,,,,,,
AL, 06, 2018091606,   , BEST,   0, 336N,  808W,   40,  999, TS,  34, NEQ,  130,  130,    0,    0, 1013,  260, 110,  40,   0,   L,   0,    , 270,   3,   FLORENCE,,,,,,,,
AL, 06, 2018091612,   , BEST,   0, 336N,  815W,   35, 1002, TS,  34, NEQ,    0,  140,    0,    0, 1013,  280, 140,  40,   0,   L,   0,    , 270,   3,   FLORENCE,,,,,,,,
AL, 06, 2018091618,   , BEST,   0, 341N,  821W,   30, 1006, TD,   0,    ,    0,    0,    0,    0, 1013,  300, 140,  40,   0,   L,   0,    , 315,   4,   FLORENCE,,,,,,,,
AL, 06, 2018091700,   , BEST,   0, 350N,  822W,   25, 1007, TD,   0,    ,    0,    0,    0,    0, 1013,  320, 150,  35,   0,   L,   0,    , 355,   5,   FLORENCE,,,,,,,,
AL, 06, 2018091706,   , BEST,   0, 364N,  826W,   25, 1008, TD,   0,    ,    0,    0,    0,    0, 1013,  340, 160,  35,   0,   L,   0,    , 347,   7,   FLORENCE,,,,,,,,
AL, 06, 2018091712,   , BEST,   0, 378N,  822W,   25, 1008, EX,   0,    ,    0,    0,    0,    0, 1013,  360, 160,  30,   0,   L,   0,    ,  13,   7,   FLORENCE,,,,,,,,
AL, 06, 2018091718,   , BEST,   0, 388N,  820W,   25, 1008, EX,   0,    ,    0,    0,    0,    0, 1013,  360, 160,  30,   0,   L,   0,    ,   9,   5,   FLORENCE,,,,,,,,
AL, 06, 2018091800,   , BEST,   0, 395N,  805W,   25, 1008, EX,   0,    ,    0,    0,    0,    0, 1013,  360, 160,   0,   0,    ,   0,    ,  59,   7,   FLORENCE,,,,,,,,
AL, 06, 2018091806,   , BEST,   0, 413N,  768W,   25, 1007, EX,   0,    ,    0,    0,    0,    0, 1013,  360, 170,   0,   0,    ,   0,    ,  56,  17,   FLORENCE,,,,,,,,
AL, 06, 2018091812,   , BEST,   0, 422N,  733W,   25, 1006, EX,  34, NEQ,    0,    0,    0,    0, 1013,  360, 180,  30,   0,   L,   0,    ,  70,  14,   FLORENCE,,,,,,,,

I'm still investigating what is actually causing the error in SCHISM, but is there anything that you see in the above example? I'm assuming the wrong pressures alone would maybe be sufficient to throw the simulation off!

When you get a chance, can you please review the correction done to the OFCL using CARQ here and see if it makes sense? Maybe we should review the StormEvents correction scheme together soon. If you notice anything obvious there, can you please create an issue on that repo to follow up?

UPDATE

I just saw the issue in the logs of SCHISM from PAHM:

InitLogging not called :: ProcessAsymmetricVortexData: Central pressure set to zero on first line/record when processing the best track file: hurricane-track.dat

@WPringle (Contributor):

@SorooshMani-NOAA This looks very promising, nice job!
The problem you are encountering is that the advisories only provide the pressures at the 3-hr forecast, and I think the Rmax also needs to be taken from the initial time!

In my methodology the initial Rmax (maybe this is from CARQ) is assumed for all forecast times in the advisory.
The 0-hr pressure is also taken from the initial CARQ value.
The >3-hr forecast pressures are computed as a function of Vmax by keeping Holland B constant, i.e., calculate Holland B for the 3-hr forecast and use that to get the pressures for all later forecast times.
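A minimal sketch of that constant-Holland-B pressure fill (the air density, ambient pressure, and unit handling are illustrative assumptions; the Vmax/pressure pair comes from the 3-hr record shown above):

import math

RHO_AIR = 1.15       # air density, kg/m^3 (assumed)
PN = 1013.0          # ambient pressure, mbar (assumed)
KT_TO_MS = 0.514444  # knots to m/s

def holland_b(vmax_kt, pc_mbar):
    # B = rho * e * Vmax^2 / (Pn - Pc), with the pressure deficit in Pa
    vmax = vmax_kt * KT_TO_MS
    return RHO_AIR * math.e * vmax**2 / ((PN - pc_mbar) * 100.0)

def central_pressure(vmax_kt, b):
    # invert the same relation to fill a forecast hour's missing pressure
    vmax = vmax_kt * KT_TO_MS
    return PN - RHO_AIR * math.e * vmax**2 / b / 100.0

b3 = holland_b(vmax_kt=95.0, pc_mbar=956.0)   # calibrate B at the 3-hr forecast
pc_24 = central_pressure(vmax_kt=90.0, b=b3)  # reuse B at the 24-hr forecast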

@WPringle (Contributor):

@SorooshMani-NOAA This is one of the big reasons why the Rmax methodology sucks: although we have a lot of information on Vmax and r34, r50, r64, we have no estimate of Rmax in the future. So we are better off using r34, r50, r64 in a GAHM implementation.

@SorooshMani-NOAA (Collaborator Author):

@WPringle, thanks. Do you have any suggestions for how to resolve this issue based on what we get above? I looked at the CARQ for the same track_start_time, and the pressure is 0 there too.

@WPringle (Contributor) commented Mar 10, 2023:

@SorooshMani-NOAA Do you get Rmax in the CARQ?

I think I remember now that for the 0-hr pressure I actually used the 3-hr pressure from the previous advisory (3 hours prior). If that is too complicated, we can fill in the 0-hr using the same methodology as the forecast times.

@SorooshMani-NOAA (Collaborator Author):

@WPringle I believe that should be what's happening. I'm just using StormEvents, and I know that the correction code referred to above is being triggered. Based on the current StormEvents logic, I believe the pressure is indeed taken from the 0-hr (the first entry of CARQ for the same track start time).

I can manually modify some of the values in my workflow script to try different things, e.g., I can try setting it to the 3-hr forecast pressure. Do you want us to change the logic in the StormEvents repo as well, if it works when I try it here?

@SorooshMani-NOAA (Collaborator Author):

Interesting ... I just noticed this in my dataframe, looking only at certain columns:

8909     CARQ            -24 2018-09-13 06:00:00 2018-09-12 06:00:00               0.0
8910     CARQ            -18 2018-09-13 06:00:00 2018-09-12 12:00:00               0.0
8911     CARQ            -12 2018-09-13 06:00:00 2018-09-12 18:00:00               0.0
8912     CARQ             -6 2018-09-13 06:00:00 2018-09-13 00:00:00               0.0
8913     CARQ              0 2018-09-13 06:00:00 2018-09-13 06:00:00             956.0
8914     CARQ              0 2018-09-13 06:00:00 2018-09-13 06:00:00             956.0
8915     CARQ              0 2018-09-13 06:00:00 2018-09-13 06:00:00             956.0
9020     OFCL              0 2018-09-13 06:00:00 2018-09-13 06:00:00               0.0
9021     OFCL              0 2018-09-13 06:00:00 2018-09-13 06:00:00               0.0
9022     OFCL              0 2018-09-13 06:00:00 2018-09-13 06:00:00               0.0
9023     OFCL              3 2018-09-13 06:00:00 2018-09-13 09:00:00             956.0
9024     OFCL              3 2018-09-13 06:00:00 2018-09-13 09:00:00             956.0

I think the logic in StormEvents takes the first entry, meaning the -24 hour one. It needs to be changed to take the hour-0 forecast instead ...

@WPringle (Contributor) commented Mar 10, 2023:

@SorooshMani-NOAA Yes, what you just showed sounds good, if that can be corrected.

@SorooshMani-NOAA (Collaborator Author) commented Mar 10, 2023:

@WPringle I just realized that in the original unprocessed ATCF files, the pressure fields are not NA or empty, they are just 0. That's why StormEvents doesn't actually fix them, since it only updates fields that are NA! I'll create a ticket on the StormEvents side to keep track of this issue.

Update
These are the tickets:

Update 2
This is another culprit:

The correction is calculated, but never stored.
