# Estimation of Vs30 Using the High-Dimensional Models

## License Information

This file is part of _mHVSR-Vs30_, a collection of data-driven models
for predicting Vs30 from mHVSR.

    Copyright (C) 2025 Sharma Wagle, Rodriguez-Marek, Vantassel (joseph.p.vantassel@gmail.com)

    This program is free software: you can redistribute it and/or modify
    it under the terms of the GNU General Public License as published by
    the Free Software Foundation, either version 3 of the License, or
    (at your option) any later version.

    This program is distributed in the hope that it will be useful,
    but WITHOUT ANY WARRANTY; without even the implied warranty of
    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
    GNU General Public License for more details.

    You should have received a copy of the GNU General Public License
    along with this program.  If not, see <https://www.gnu.org/licenses/>.
    
## About _mHVSR-Vs30_

`mHVSR-Vs30` is a collection of data-driven models to predict the
time-averaged shear-wave velocity in the upper 30 m (Vs30) from the
microtremor horizontal-to-vertical spectral ratio (mHVSR). The models
include both low-dimensional (`low_dim_models.ipynb`) and
high-dimensional (`high_dim_models.ipynb`) variants. Details of the models'
development and performance are presented in the reference below.

## Citation

If you use `mHVSR-Vs30` in your research or consulting, please cite the following:

> Sharma Wagle, K., Vantassel, J.P., and Rodriguez-Marek, A. (2025). "A Set of Data-Driven Models to Predict VS30 from the
> Horizontal-to-Vertical Spectral Ratio of Microtremors". Bulletin of the Seismological Society of America. [In-Review]


## About this notebook

This notebook illustrates how `mHVSR-Vs30` estimates Vs30 from mHVSR measurements.

The mHVSR processing follows the SESAME (2004) guidelines.
If you use this notebook, please also cite SESAME (2004) to recognize their original work.

> SESAME (2004). Guidelines for the Implementation of the H/V Spectral Ratio Technique on Ambient Vibrations:
> Measurements, Processing, and Interpretation. European Commission - Research General Directorate, 62 pp.

To use this notebook, you need at least 30 minutes of ambient noise recorded on a three-component broadband sensor and sampled at 100 Hz or higher.

This notebook predicts Vs30 with two separate models: the Single-Mode ANN and the Dual-Mode ANN. The Single-Mode ANN requires only the microtremor recording described above. The Dual-Mode ANN also requires two topographic features: the station elevation and the average elevation within a 1500 m diameter circle centered on the station. For consistency with the models' development, obtain both from the 1 arc-second Digital Elevation Model (DEM). If you do not have topographic features for your site, you can still use the Single-Mode ANN by entering "NA" for the topographic inputs.
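The station-centered average elevation can be computed from a gridded DEM. The sketch below is one possible way to do it, not necessarily the procedure used to build the models. It assumes the DEM is already loaded as a NumPy array on a projected grid with square pixels of known size in meters (a 1 arc-second grid is only roughly 30 m and its cells are not square away from the equator, so reproject first), and the helper name `circular_mean_elevation` is hypothetical. The topographic position index (TPI) used later in this notebook is the station elevation minus this average.

```python
import numpy as np


def circular_mean_elevation(dem, row, col, pixel_size_m, diameter_m=1500.0):
    """Hypothetical helper: mean elevation of the DEM cells whose centers lie
    within a circle of `diameter_m` centered on cell (row, col).

    Assumes `dem` is a 2-D array on a projected grid with square pixels of
    `pixel_size_m` meters; NaN cells (no data) are ignored.
    """
    radius_m = diameter_m / 2.0
    rows, cols = np.indices(dem.shape)
    dist_m = np.hypot((rows - row) * pixel_size_m, (cols - col) * pixel_size_m)
    inside = (dist_m <= radius_m) & np.isfinite(dem)
    return float(dem[inside].mean())


# Usage on a synthetic, gently tilted 101 x 101 DEM with 30 m pixels.
dem = np.fromfunction(lambda r, c: 100.0 + 0.5 * r - 0.2 * c, (101, 101))
station_row, station_col = 50, 50
elevation_in_m = float(dem[station_row, station_col])
elevation_1500m_avg_in_m = circular_mean_elevation(dem, station_row, station_col, 30.0)
tpi = elevation_in_m - elevation_1500m_avg_in_m  # near zero on a planar slope
print(elevation_in_m, round(elevation_1500m_avg_in_m, 6), round(tpi, 6))
```

On a planar slope the circle is symmetric about the station, so the average equals the station elevation and the TPI is zero; a station on a local high yields a positive TPI.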

## Getting Started

1. Follow the instructions in the software's [README.md](https://github.com/geoimaging/mhvsr-vs30?tab=readme-ov-file#getting-started) to download the software and install its dependencies.
2. To run the default example, open this notebook in JupyterLab and select `Kernel > Restart Kernel and Run All Cells`.
3. To try the other examples, uncomment the corresponding example block in the cell labeled __Data Input and Recording Check__ below before selecting `Kernel > Restart Kernel and Run All Cells`.
4. Once you are comfortable running the examples provided, try supplying your own data.

```python
import os
import pathlib

# Reduce TensorFlow log output and disable oneDNN custom operations
# (must be set before TensorFlow is imported).
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'
os.environ['TF_ENABLE_ONEDNN_OPTS'] = '0'

import numpy as np
import pandas as pd
import hvsrpy
from hvsrpy import sesame
import obspy
import tensorflow as tf
tf.get_logger().setLevel('ERROR')
import matplotlib
matplotlib.use("TkAgg")  # interactive backend, used for manual window rejection
import matplotlib.pyplot as plt
from scipy.interpolate import interp1d
import joblib

plt.style.use(hvsrpy.postprocessing.HVSRPY_MPL_STYLE)
```


### Data Input and Recording Check

```python
# Data Input

# example 1
fname = "./data/AR.STN01.A1.2019.006.mseed"
elevation_in_m = 2246  # Enter "NA" if unknown.
elevation_1500m_avg_in_m = 2237.903656  # Enter "NA" if unknown.

# example 2
# fname = "./data/CI.FUR-Rec288-Sen450.mseed"
# elevation_in_m = -44  # Enter "NA" if unknown.
# elevation_1500m_avg_in_m = -43.80603662  # Enter "NA" if unknown.

# example 3
# fname = "./data/UT.STN09_20130320_020000.mseed"
# elevation_in_m = 35  # Enter "NA" if unknown.
# elevation_1500m_avg_in_m = 35.33800078  # Enter "NA" if unknown.

stream = obspy.read(fname)

# Check trace count
if len(stream) != 3:
    raise ValueError(f"Recording should have exactly three traces; found {len(stream)}.")

# Check sampling rates
sampling_rates = [tr.stats.sampling_rate for tr in stream]
if any(sr < 100 for sr in sampling_rates):
    raise ValueError(f"At least 100 Hz sampling rate is required. Found rates: {sampling_rates}")

# Find the common window: latest start time and earliest end time
common_start = max(tr.stats.starttime for tr in stream)
common_end = min(tr.stats.endtime for tr in stream)
common_duration = common_end - common_start  # seconds (UTCDateTime difference)
if common_duration <= 0:
    raise ValueError("No overlapping time window between traces.")

# Trim all traces to the common time window
stream.trim(starttime=common_start, endtime=common_end)

# Save the trimmed record next to the original as "<name>_updated.mseed"
fpath = pathlib.Path(fname)
fname_updated = str(fpath.with_name(f"{fpath.stem}_updated{fpath.suffix}"))
stream.write(fname_updated, format="MSEED")

# Check that the trimmed duration is at least 30 minutes (1800 s)
if common_duration >= 1800:
    print(f"Trimmed duration is {common_duration/60:.2f} minutes.")
else:
    print(f"Warning: trimmed duration is {common_duration/60:.2f} minutes, less than the required 30 minutes.")
```

Trimmed duration is 60.00 minutes.


```python
# HVSR Preprocessing Settings
preprocessing_settings = hvsrpy.settings.HvsrPreProcessingSettings()
preprocessing_settings.detrend = "linear"
significant_cycles = 15  # require 15 significant cycles at the lowest frequency.
time_windows = 35  # split the record into 35 time windows.
duration_in_seconds = common_duration  # total (trimmed) record duration (s)
windowlength_in_seconds = duration_in_seconds / time_windows  # window length (s)
preprocessing_settings.window_length_in_seconds = windowlength_in_seconds

print("Preprocessing Summary")
print("-"*60)
preprocessing_settings.psummary()

# Keep only frequencies with at least `significant_cycles` cycles per window
desired_frequency_vector_in_hz = np.geomspace(0.05, 50, 256)
minimum_frequency = significant_cycles / windowlength_in_seconds
fids = desired_frequency_vector_in_hz > minimum_frequency
frequency_resampling_in_hz = desired_frequency_vector_in_hz[fids]

# HVSR Processing Settings
processing_settings = hvsrpy.settings.HvsrTraditionalProcessingSettings()
processing_settings.window_type_and_width = ("tukey", 0.2)
processing_settings.smoothing = dict(operator="konno_and_ohmachi",
                                     bandwidth=40,
                                     center_frequencies_in_hz=frequency_resampling_in_hz)
processing_settings.method_to_combine_horizontals = "geometric_mean"
processing_settings.handle_dissimilar_time_steps_by = "frequency_domain_resampling"

print("Processing Summary")
print("-"*60)
processing_settings.psummary()
```

```
Preprocessing Summary
------------------------------------------------------------
hvsrpy_version                           : 2.0.0
orient_to_degrees_from_north             : 0.0
filter_corner_frequencies_in_hz          : [None, None]
window_length_in_seconds                 : 102.85714285714286
detrend                                  : linear
preprocessing_method                     : hvsr
Processing Summary
------------------------------------------------------------
hvsrpy_version                           : 2.0.0
window_type_and_width                    : ('tukey', 0.2)
smoothing                                :
     operator                            : konno_and_ohmachi
     bandwidth                           : 40
     center_frequencies_in_hz            : [0.14776046176014435 ... 6371930790751, 50.0]
fft_settings                             : None
handle_dissimilar_time_steps_by          : frequency_domain_resampling
processing_method                        : traditional
metho
```
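The window length and the lowest usable frequency follow from simple arithmetic on the settings above: the trimmed record of duration T is split into N_w = `time_windows` windows, and a center frequency is kept only if at least N_c = `significant_cycles` cycles fit in one window, so f_min = N_c N_w / T. A standalone NumPy check, assuming the 60-minute duration of Example 1:

```python
import numpy as np

# Example 1: the three traces overlap for 60 minutes.
duration_in_seconds = 3600.0
time_windows = 35          # number of windows the record is split into
significant_cycles = 15    # cycles required per window at the lowest frequency

windowlength_in_seconds = duration_in_seconds / time_windows
minimum_frequency = significant_cycles / windowlength_in_seconds  # Hz

# Candidate smoothing center frequencies; keep only those above the minimum.
desired_frequency_vector_in_hz = np.geomspace(0.05, 50, 256)
kept = desired_frequency_vector_in_hz[desired_frequency_vector_in_hz > minimum_frequency]

print(f"window length     : {windowlength_in_seconds:.3f} s")
print(f"minimum frequency : {minimum_frequency:.4f} Hz")
print(f"first kept center : {kept[0]:.5f} Hz ({kept.size} of 256 kept)")
```

With 3600 s this gives a 102.857 s window (the `window_length_in_seconds` printed above) and a 0.1458 Hz minimum, so the first retained center frequency is about 0.14776 Hz and 216 of the 256 candidates are kept. Because f_min scales inversely with T, longer recordings extend the usable band to lower frequencies.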

### HVSR Processing and Manual Window Rejection

```python
# Compute HVSR
srecords = hvsrpy.read([fname_updated])
srecords = hvsrpy.preprocess(srecords, preprocessing_settings)
hvsr = hvsrpy.process(srecords, processing_settings)
```

```python
# Create HvsrTraditional object
mhvsr = hvsrpy.HvsrTraditional(frequency=hvsr.frequency, amplitude=hvsr.amplitude)

# Perform manual window rejection (change y_limit as required for the plot)
hvsrpy.window_rejection.manual_window_rejection(
    mhvsr, y_limit=15, plot_frequency_std=False, fig=None, ax=None
)
plt.close("all")
```

### Features for the High-Dimensional Models

```python
# HVSR Features
# Original frequency and amplitude arrays
freqs = mhvsr.frequency
amps = mhvsr.mean_curve(distribution="lognormal")  # lognormal mean curve of the accepted windows

# Step 1: Trim to the 0.3–50 Hz range
mask = (freqs >= 0.3) & (freqs <= 50)
freqs_trimmed = freqs[mask]
amps_trimmed = amps[mask]

# Step 2: Create 35 log-spaced frequencies between 0.3 and 50 Hz
resampled_freqs = np.logspace(np.log10(0.3), np.log10(50), 35)

# Step 3: Interpolate amplitudes linearly
interp_func = interp1d(freqs_trimmed, amps_trimmed, kind='linear', bounds_error=False, fill_value="extrapolate")
resampled_amps = interp_func(resampled_freqs)

X_hvsr = pd.DataFrame([resampled_amps], columns=resampled_freqs)

if elevation_in_m != "NA" and elevation_1500m_avg_in_m != "NA":
    dual_ANN = True
    # Topographic Features: topographic position index (TPI)
    TPI = elevation_in_m - elevation_1500m_avg_in_m

    # Assemble into DataFrame
    X_topo = pd.DataFrame([{"TPI": TPI, "elevation": elevation_in_m}])

    # Elevation Binning (bins are closed on the left, e.g. [2000, 2500))
    elevation_bins = [-500, 0, 500, 1000, 1500, 2000, 2500, 3000]
    elevation_labels = [
        'Elevation_Bin_[-500.0, 0.0)',
        'Elevation_Bin_[0.0, 500.0)',
        'Elevation_Bin_[500.0, 1000.0)',
        'Elevation_Bin_[1000.0, 1500.0)',
        'Elevation_Bin_[1500.0, 2000.0)',
        'Elevation_Bin_[2000.0, 2500.0)',
        'Elevation_Bin_[2500.0, 3000.0)'
    ]

    # Bin the elevation into categories; an elevation outside [-500, 3000) m
    # becomes NaN and therefore an all-zero one-hot row.
    X_topo['Elevation_Bin'] = pd.cut(X_topo['elevation'], bins=elevation_bins, labels=elevation_labels, right=False)

    # One-hot encode the elevation bin
    elevation_dummies = pd.get_dummies(X_topo['Elevation_Bin']).astype(int)

    # Concatenate back to the original DataFrame
    X_topo = pd.concat([X_topo, elevation_dummies], axis=1)

    # Keep only the model inputs: TPI followed by the seven one-hot bin columns
    X_topo = X_topo.drop(columns=['elevation', 'Elevation_Bin'])
else:
    dual_ANN = False
    print("Topographic features unavailable; only the Single-Mode ANN model can be used.")
```
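The topographic branch reduces the two elevations to eight numbers: the TPI followed by a one-hot elevation bin. The sketch below restates that encoding for Example 1's inputs using only pandas. Unlike the notebook cell, it raises an error for elevations outside [-500, 3000) m instead of silently producing an all-zero bin row; that guard is an addition here, not part of the original models.

```python
import pandas as pd

# Example 1 inputs from the notebook.
elevation_in_m = 2246
elevation_1500m_avg_in_m = 2237.903656

tpi = elevation_in_m - elevation_1500m_avg_in_m  # topographic position index (m)

# Same edges and label strings as the notebook, e.g. "Elevation_Bin_[-500.0, 0.0)".
edges = [-500, 0, 500, 1000, 1500, 2000, 2500, 3000]
labels = [f"Elevation_Bin_[{lo:.1f}, {hi:.1f})" for lo, hi in zip(edges[:-1], edges[1:])]

# right=False closes each bin on the left, so 2246 m falls in [2000, 2500).
binned = pd.cut(pd.Series([elevation_in_m]), bins=edges, labels=labels, right=False)
if binned.isna().any():
    # Guard added in this sketch only.
    raise ValueError("Elevation outside [-500, 3000) m; every bin would be zero.")

# A categorical Series yields one column per category, used or not.
one_hot = pd.get_dummies(binned).astype(int)
features = pd.concat([pd.DataFrame({"TPI": [tpi]}), one_hot], axis=1)
print(features.T)
```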

```python
X_hvsr
```

| | 0.300000 | 0.348714 | 0.405339 | 0.471158 | 0.547665 | 0.636596 | 0.739967 | 0.860123 | 0.999791 | 1.162138 | ... | 12.907246 | 15.003137 | 17.439361 | 20.271181 | 23.562835 | 27.388991 | 31.836442 | 37.006075 | 43.015157 | 50.000000 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1.70023 | 2.746539 | 6.006496 | 10.087151 | 7.861836 | 5.539016 | 3.996405 | 2.770711 | 1.665763 | 0.798002 | ... | 0.420818 | 0.531571 | 0.566392 | 0.641381 | 0.707136 | 0.801803 | 0.863421 | 0.983277 | 1.234071 | 1.266722 |
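The HVSR branch of both models sees 35 numbers: the lognormal mean curve resampled to 35 log-spaced frequencies between 0.3 and 50 Hz, then log-transformed. The sketch below reproduces those steps on synthetic windows so the shapes are easy to check. It rests on two assumptions: that hvsrpy's `mean_curve(distribution="lognormal")` is the exponential of the mean log amplitude across accepted windows, and that `np.interp` (used here in place of SciPy's `interp1d`) is an adequate stand-in, since 0.3 to 50 Hz lies inside the computed band and no extrapolation is needed.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for the accepted windows: 30 windows x 256 frequencies,
# a single peak near 0.5 Hz with lognormal scatter between windows.
freqs = np.geomspace(0.05, 50, 256)
true_curve = 1.0 + 5.0 * np.exp(-0.5 * (np.log(freqs / 0.5) / 0.3) ** 2)
windows = true_curve * rng.lognormal(mean=0.0, sigma=0.2, size=(30, freqs.size))

# Lognormal mean curve: exponential of the mean log amplitude across windows
# (assumed equivalent to hvsrpy's mean_curve(distribution="lognormal")).
mean_curve = np.exp(np.log(windows).mean(axis=0))

# 35 log-spaced frequencies from 0.3 to 50 Hz; np.geomspace gives the same grid
# as np.logspace(np.log10(0.3), np.log10(50), 35) up to rounding.
target_freqs = np.geomspace(0.3, 50, 35)
resampled_amps = np.interp(target_freqs, freqs, mean_curve)  # linear interpolation

# Model input for the HVSR branch: natural log, one row of 35 features.
x_hvsr = np.log(resampled_amps.astype(np.float32)).reshape(1, -1)
print(x_hvsr.shape)
```

The printed shape is `(1, 35)`, one row per station, as in the `X_hvsr` table above.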


### Load the High-Dimensional Models and Scale Features

```python
# Load the Single-Mode ANN; its HVSR input is the natural log of the resampled amplitudes
single_mode_ann_model = tf.keras.models.load_model("./models/log_ANN_model.keras")
X_hvsr_scaled = np.log(X_hvsr.to_numpy().astype(np.float32))

if dual_ANN:
    # Load the Dual-Mode ANN and the scaler applied to the topographic features
    dual_mode_ann_model = tf.keras.models.load_model("./models/Multi-input_log_ANN.keras")
    topo_scaler = joblib.load("./models/log_model_standard_scaler_metadata.pkl")
    X_topo_scaled = topo_scaler.transform(X_topo.to_numpy().astype(np.float32))
```

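"Scaling" differs between the two inputs: the HVSR amplitudes are only log-transformed, while the topographic features pass through a pickled scaler. Judging by its file name, that object is a scikit-learn `StandardScaler` (an assumption), whose `transform` subtracts a per-column `mean_` and divides by `scale_`. The NumPy sketch below uses made-up statistics, not the fitted values, and the helper name `standardize` is hypothetical.

```python
import numpy as np


def standardize(x, mean, scale):
    """Z-score each column: (x - mean) / scale.

    Mirrors StandardScaler.transform with centering and scaling enabled,
    which is assumed (not confirmed) for the pickled scaler.
    """
    return (np.asarray(x, dtype=np.float64) - mean) / scale


# Hypothetical statistics for [TPI, 7 one-hot elevation bins]; NOT the fitted values.
mean = np.array([0.5, 0.05, 0.40, 0.20, 0.15, 0.10, 0.07, 0.03])
scale = np.array([10.0, 0.22, 0.49, 0.40, 0.36, 0.30, 0.26, 0.17])

# Example 1 features: TPI of about 8.1 m and the [2000, 2500) m bin set.
x_topo = np.array([[8.096344, 0, 0, 0, 0, 0, 1, 0]])
x_topo_scaled = standardize(x_topo, mean, scale)
print(x_topo_scaled.round(3))

# The HVSR branch uses the natural log only: amplitude 1.0 maps to 0.0.
print(np.log(np.array([1.0, np.e])))
```

The log convention appears to apply on the output side as well: the prediction cell exponentiates the model outputs, which suggests both models predict ln(Vs30).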


### Predictions

```python
# Always predict with the Single-Mode ANN; the models output ln(Vs30)
Vs30_pred_single_ann = single_mode_ann_model.predict(X_hvsr_scaled).flatten()

models = ["Single-Mode ANN"]
predictions = [np.exp(Vs30_pred_single_ann)]

# Also predict with the Dual-Mode ANN if topographic features are available
if dual_ANN:
    Vs30_pred_dual_ann = dual_mode_ann_model.predict([X_hvsr_scaled, X_topo_scaled]).flatten()
    models.append("Dual-Mode ANN")
    predictions.append(np.exp(Vs30_pred_dual_ann))

# Create DataFrame
vs30_modelwise_df = pd.DataFrame({
    "Model": np.repeat(models, [len(p) for p in predictions]),
    "Vs30 (m/s)": np.round(np.concatenate(predictions), 0).astype(int)
})

# Show the result
vs30_modelwise_df
```

| | Model | Vs30 (m/s) |
|---|---|---|
| 0 | Single-Mode ANN | 75 |
| 1 | Dual-Mode ANN | 57 |
