# Estimation of Vs30 Using the Low-Dimensional Models

## License Information

This file is part of _mHVSR-Vs30_, a collection of data-driven models
for predicting Vs30 from mHVSR.

    Copyright (C) 2025 Sharma Wagle, Rodriguez-Marek, Vantassel (joseph.p.vantassel@gmail.com)

    This program is free software: you can redistribute it and/or modify
    it under the terms of the GNU General Public License as published by
    the Free Software Foundation, either version 3 of the License, or
    (at your option) any later version.

    This program is distributed in the hope that it will be useful,
    but WITHOUT ANY WARRANTY; without even the implied warranty of
    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
    GNU General Public License for more details.

    You should have received a copy of the GNU General Public License
    along with this program.  If not, see <https: //www.gnu.org/licenses/>.
    
## About _mHVSR-Vs30_

`mHVSR-Vs30` is a collection of data-driven models to predict the
time-averaged shear wave velocity in the upper 30 m (Vs30), from 
microtremor horizontal-to-vertical spectral ratio (mHVSR). The developed
models developed include both low-dimensional (`low_dim_models.ipynb`) and
high-dimensional (`high_dim_models.ipynb`). The details of the model's
development and performance are presented in the reference below.

## Citation

If you use `mHVSR-Vs30` in your research or consulting, we ask you please cite the following:

> Sharma Wagle, K., Vantassel, J.P., and Rodriguez-Marek, A. (2025). "A Set of Data-Driven Models to Predict VS30 from the
> Horizontal-to-Vertical Spectral Ratio of Microtremors". Bulletin of the Seismological Society of America. [In-Review]


## About this notebook

This notebook illustrates how `mHVSR-Vs30` estimates Vs30 from mHVSR measurements.

The processing has been done following the SESAME (2004) guidelines.
If you use this notebook, please also cite SESAME (2004) to recognize their original work.

> SESAME (2004). Guidelines for the Implementation of the H/V Spectral Ratio Technique on Ambient Vibrations
> Measurements, Processing, and Interpretation. European Commission - Research General Directorate, 62,
> European Commission - Research General Directorate.

To use this notebook, you need at least one clear resonance peak in the mHVSR curve. For more information regarding the clear resonance peak, see the SESAME (2004) guidelines. If your mHVSR peak fails the clarity, it is __not recommended__ to proceed further with the prediction.

In addition to a measurement of ambient noise (from which an mHVSR measurement can be made), you need two topographic features: station elevation and the average elevation around the station across a 1500 m diameter circle centered on the station. For the topographic features use the 1 arc-second Digital Elevation Model (DEM) for consistency with the models development. If you do not have topographic features for your data, the notebook `high_dim_models.ipynb` can be used with only the microtremor recording (no topographic features required).

## Getting Started

1. Follow the instructions in the software's [README.md](https://github.com/geoimaging/mhvsr-vs30?tab=readme-ov-file#getting-started) to get started by downloading the software and installing the dependencies.
2. To run the default example, open this notebook in JupyterLab and select `Kernel > Restart Kernel and Run All Cells`.
3. To try the other examples, uncomment the corresponding example number in the cell labeled __Data Input and Recording Check__ below before selecting `Kernel > Restart Kernel and Run All Cells`.
4. Once you are comfortable running the examples provided, try supplying your own data.

In [1]:
import pathlib
import os

import numpy as np
import pandas as pd
import hvsrpy
from hvsrpy import sesame
import obspy
import matplotlib
matplotlib.use("TkAgg")
import matplotlib.pyplot as plt
from scipy.stats import skew
import joblib

plt.style.use(hvsrpy.postprocessing.HVSRPY_MPL_STYLE)

### Data Input and Recording Check

In [2]:
# Data Input

# example 1
fname = "./data/AR.STN01.A1.2019.006.mseed"
elevation_in_m =  2246
elevation_1500m_avg_in_m =  2237.903656

# # example 2
# fname = "./data/CI.FUR-Rec288-Sen450.mseed"
# elevation_in_m =  -44 #in meters.
# elevation_1500m_avg_in_m =  -43.80603662 # in meters.

# # example 3
# fname = "./data/UT.STN09_20130320_020000.mseed"
# elevation_in_m =  35 #in meters.
# elevation_1500m_avg_in_m =  35.33800078 # in meters.

stream = obspy.read(fname)

# Check trace count
if len(stream) != 3:
    print("Recording should have exactly three traces.")
else:
    # Find latest start time and earliest end time
    start_times = [tr.stats.starttime for tr in stream]
    end_times = [tr.stats.endtime for tr in stream]

    common_start = max(start_times)
    common_end = min(end_times)

    # Check duration
    common_duration = common_end - common_start
    if common_duration <= 0:
        print("No overlapping time window between traces.")
    else:
        # Trim all traces to common time window
        stream.trim(starttime=common_start, endtime=common_end)

        # Save to MiniSEED file
        fpath = pathlib.Path(fname)
        fname_updated = f"{fname[:len(fpath.suffix)]}_updated{fpath.suffix}"
        stream.write(fname_updated, format="MSEED")

        # Check if duration exceeds 30 minutes
        if common_duration >= 1800:  # 1800 seconds = 30 minutes
            print(f"Trimmed duration is {common_duration/60 :.2f} minutes.")
        else:
            print("Trimmed duration is less than 30 minutes.")

Trimmed duration is 60.00 minutes.


In [3]:
# HVSR Preprocessing Settings
preprocessing_settings = hvsrpy.settings.HvsrPreProcessingSettings()
preprocessing_settings.detrend = "linear"
significant_cycles = 15  # require 15 significant cycles.
time_windows = 35  # require 35 time windows.
duration_in_seconds = common_duration  # window length (s)
windowlength_in_seconds = duration_in_seconds / time_windows
preprocessing_settings.window_length_in_seconds = windowlength_in_seconds

print("Preprocessing Summary")
print("-"*60)
preprocessing_settings.psummary()

# HVSR Processing Settings
processing_settings = hvsrpy.settings.HvsrTraditionalProcessingSettings()
processing_settings.window_type_and_width = ("tukey", 0.2)
processing_settings.smoothing=dict(operator="konno_and_ohmachi",
                                   bandwidth=40,
                                   center_frequencies_in_hz=np.geomspace(0.05, 50, 256))
processing_settings.method_to_combine_horizontals = "geometric_mean"
processing_settings.handle_dissimilar_time_steps_by = "frequency_domain_resampling"

desired_frequency_vector_in_hz = np.geomspace(0.05, 50, 256)
minimum_frequency = significant_cycles / windowlength_in_seconds
fids = desired_frequency_vector_in_hz > minimum_frequency
frequency_resampling_in_hz = desired_frequency_vector_in_hz[fids]
processing_settings.smoothing["center_frequencies_in_hz"] = frequency_resampling_in_hz

print("Processing Summary")
print("-"*60)
processing_settings.psummary()

Preprocessing Summary
------------------------------------------------------------
hvsrpy_version                           : 2.0.0
orient_to_degrees_from_north             : 0.0
filter_corner_frequencies_in_hz          : [None, None]
window_length_in_seconds                 : 102.85714285714286
detrend                                  : linear
preprocessing_method                     : hvsr
Processing Summary
------------------------------------------------------------
hvsrpy_version                           : 2.0.0
window_type_and_width                    : ('tukey', 0.2)
smoothing                                :
     operator                            : konno_and_ohmachi
     bandwidth                           : 40
     center_frequencies_in_hz            : [0.14776046176014435 ... 6371930790751, 50.0]
fft_settings                             : None
handle_dissimilar_time_steps_by          : frequency_domain_resampling
processing_method                        : traditional
metho

### HVSR Processing and Manual Window Rejection

In [4]:
# Compute HVSR
srecords = hvsrpy.read([fname_updated])
srecords = hvsrpy.preprocess(srecords, preprocessing_settings)
hvsr = hvsrpy.process(srecords, processing_settings)

In [5]:
# Create HvsrTraditional object
mhvsr = hvsrpy.HvsrTraditional(frequency=hvsr.frequency, amplitude=hvsr.amplitude)

# Perform manual window rejection
hvsrpy.window_rejection.manual_window_rejection(
    mhvsr, y_limit=15, plot_frequency_std=False, fig=None, ax=None  #Change y_limit as required in the plot.
)
plt.close("all")

In [6]:
search_range_in_hz = (None, None)
verbose = 1

print("\nSESAME (2004) Clarity and Reliability Criteria:")
print("-"*47)
hvsrpy.sesame.reliability(
    windowlength=preprocessing_settings.window_length_in_seconds,
    passing_window_count=np.sum(mhvsr.valid_window_boolean_mask),
    frequency=mhvsr.frequency,
    mean_curve=mhvsr.mean_curve(distribution="lognormal"),
    std_curve=mhvsr.std_curve(distribution="lognormal"),
    search_range_in_hz=search_range_in_hz,
    verbose=verbose,
)
hvsrpy.sesame.clarity(
    frequency=mhvsr.frequency,
    mean_curve=mhvsr.mean_curve(distribution="lognormal"),
    std_curve=mhvsr.std_curve(distribution="lognormal"),
    fn_std=mhvsr.std_fn_frequency(distribution="normal"),
    search_range_in_hz=search_range_in_hz,
    verbose=verbose,
)

fig, axs = hvsrpy.plot_single_panel_hvsr_curves(mhvsr)
plt.show()


SESAME (2004) Clarity and Reliability Criteria:
-----------------------------------------------
[1mAssessing SESAME (2004) reliability criteria ... [0m
  Criteria i): [32mPass[0m
  Criteria ii): [32mPass[0m
  Criteria iii): [32mPass[0m
  The chosen peak [32mPASSES[0m the peak reliability criteria, with 3 of 3.
[1mAssessing SESAME (2004) clarity criteria ... [0m
  Criteria i): [32mPass[0m
  Criteria ii): [32mPass[0m
  Criteria iii): [32mPass[0m
  Criteria iv): [32mPass[0m
  Criteria v): [32mPass[0m
  Criteria vi): [32mPass[0m
  The chosen peak [32mPASSES[0m the peak clarity criteria, with 6 of 6.


### Parameters for low dim model.

In [7]:
fn_mean = mhvsr.mean_fn_frequency(distribution="lognormal")
an_mean = mhvsr.mean_fn_amplitude(distribution="lognormal")
TPI = elevation_in_m - elevation_1500m_avg_in_m

# Function to compute skewness ignoring leading NaNs and filtering to 0.2–50 Hz
def compute_skew_ignore_leading_nans(freqs, amps):
    # Filter by frequency range
    mask = (freqs > 0.199) & (freqs < 50)
    filtered_amps = amps[mask]

    # Ignore leading NaNs
    first_valid_index = np.argmax(~np.isnan(filtered_amps))
    trimmed = filtered_amps[first_valid_index:]
    valid = trimmed[~np.isnan(trimmed)]

    return skew(valid, bias=False) if len(valid) >= 3 else np.nan


mhvsr_mean_curve = mhvsr.mean_curve(distribution="lognormal")  #Taking the lognormal mean curve among the accepted windows.
skewness = compute_skew_ignore_leading_nans(mhvsr.frequency, mhvsr_mean_curve)

# Assemble into DataFrame
X = pd.DataFrame([{
    "fn_mean": fn_mean,
    "an_mean": an_mean,
    "TPI": TPI,
    "Skewness": skewness,
    "elevation": elevation_in_m
}])

# Elevation Binning
elevation_bins = [-500, 0, 500, 1000, 1500, 2000, 2500, 3000]
elevation_labels = [
    'Elevation_Bin_[-500.0, 0.0)',
    'Elevation_Bin_[0.0, 500.0)',
    'Elevation_Bin_[500.0, 1000.0)',
    'Elevation_Bin_[1000.0, 1500.0)',
    'Elevation_Bin_[1500.0, 2000.0)',
    'Elevation_Bin_[2000.0, 2500.0)',
    'Elevation_Bin_[2500.0, 3000.0)'
]

# Bin the elevation into categories
X = X.copy()
X['Elevation_Bin'] = pd.cut(X['elevation'], bins=elevation_bins, labels=elevation_labels, right=False)

# One-hot encode the elevation bin
elevation_dummies = pd.get_dummies(X['Elevation_Bin']).astype(int)

# Concatenate back to original DataFrame
X = pd.concat([X, elevation_dummies], axis=1)

# Optional: Drop 'Elevation_Bin' column if you only need one-hot encoded version
X = X.drop(columns=['elevation','Elevation_Bin'])

In [8]:
X

Unnamed: 0,fn_mean,an_mean,TPI,Skewness,"Elevation_Bin_[-500.0, 0.0)","Elevation_Bin_[0.0, 500.0)","Elevation_Bin_[500.0, 1000.0)","Elevation_Bin_[1000.0, 1500.0)","Elevation_Bin_[1500.0, 2000.0)","Elevation_Bin_[2000.0, 2500.0)","Elevation_Bin_[2500.0, 3000.0)"
0,0.460113,11.509159,8.096344,2.209668,0,0,0,0,0,1,0


### Load low-dimensional models

In [9]:
# Load Models
linear_model = joblib.load("./models/linear_model.joblib")
decisiontree_model = joblib.load("./models/decision_tree_model.joblib")
randomforest_model = joblib.load("./models/random_forest_model.joblib")
xgboost_model = joblib.load("./models/xgboost_model.joblib")

# Feature Scaling for linear model
X_linear = X.copy().to_numpy()

boxcox_scaler = joblib.load("./models/linear_f0hvsr_scaler.pkl") # fn_mean: Box-Cox Transform
X_linear[:, 0] = boxcox_scaler.transform(X_linear[:, 0].reshape(-1, 1)).flatten()

X_linear[:, 1] = np.log(X_linear[:, 1]) #an_mean: log transformation

std_scaler_tpi = joblib.load("./models/linear_topo_scaler.pkl") #TPI: Standard transformation
X_linear[:, 2] = std_scaler_tpi.transform(X_linear[:, 2].reshape(-1, 1)).flatten()

std_scaler_skew = joblib.load("./models/linear_skewness_scaler.pkl") #Skewness: standard transformation
X_linear[:, 3] = std_scaler_tpi.transform(X_linear[:, 3].reshape(-1, 1)).flatten()

configuration generated by an older version of XGBoost, please export the model by calling
`Booster.save_model` from that version first, then load it back in current version. See:

    https://xgboost.readthedocs.io/en/stable/tutorials/saving_model.html

for more details about differences between saving model and serializing.

  setstate(state)


### Predictions

In [10]:
Vs30_pred_linear = linear_model.predict(X_linear)
Vs30_pred_decisiontree = decisiontree_model.predict(X.to_numpy())
Vs30_pred_randomforest = randomforest_model.predict(X.to_numpy())
Vs30_pred_xgboost = xgboost_model.predict(X.to_numpy())

In [11]:
# Stack predictions and model labels
models = ["Linear", "Decision Tree", "Random Forest", "XGBoost"]
predictions = [
    np.exp(Vs30_pred_linear),
    np.exp(Vs30_pred_decisiontree),
    np.exp(Vs30_pred_randomforest),
    np.exp(Vs30_pred_xgboost)
]

# Create dataframe
# Create dataframe with 0 decimal precision
vs30_modelwise_df = pd.DataFrame({
    "Model": np.repeat(models, [len(p) for p in predictions]),
    "Vs30 (m/s)": np.round(np.concatenate(predictions), 0).astype(int)
})

#Show Results
vs30_modelwise_df

Unnamed: 0,Model,Vs30 (m/s)
0,Linear,113
1,Decision Tree,102
2,Random Forest,81
3,XGBoost,120
