# Estimation of the Vs30 (m/s) with the mHVSR using high-dimensional models.

## Information
This file is free software, distributed as part of:
>{citation}.

Copyright (C) 2025 Kushal Sharma Wagle (wagleyk@vt.edu).


## Citation

If you use _hvsrpy_ in your research or consulting, we ask that you please cite the following:

>Joseph Vantassel. (2020). jpvantassel/hvsrpy: latest (Concept). Zenodo.
[http://doi.org/10.5281/zenodo.3666956](http://doi.org/10.5281/zenodo.3666956)


>Vantassel, J.P. (2025). hvsrpy: An Open‐Source Python Package for Microtremor and Earthquake Horizontal‐to‐Vertical Spectral Ratio Processing. Seismological Research Letters. [https://doi.org/10.1785/0220240395](https://doi.org/10.1785/0220240395)

_Note: For software, version specific citations should be preferred to
general concept citations, such as that listed above. To generate a version
specific citation for hvsrpy, please use the citation tool on the hvsrpy
[archive](http://doi.org/10.5281/zenodo.3666956)._

## About this notebook

This notebook illustrates how mHVSR can be leveraged to estimate the Vs30 for the site. The processing has been done following the SESAME (2004) guidelines.
If you use this notebook, please also cite SESAME (2004) to recognize their original work.

> SESAME (2004). Guidelines for the Implementation of the H/V Spectral Ratio Technique on Ambient Vibrations
> Measurements, Processing, and Interpretation. European Commission - Research General Directorate, 62,
> European Commission - Research General Directorate.

To use this notebook, you need a three-component microtremor recording at least 30 minutes long, acquired with a broadband sensor at a sampling rate of at least 100 Hz.

This notebook predicts Vs30 with two separate models: the Single mode ANN and the Dual mode ANN. For the Single mode ANN, the microtremor recording described above is sufficient. The Dual mode ANN additionally requires two topographic features: the station elevation and the average elevation within a 1500 m diameter area around the station. These must be computed from a 1 arcsec Digital Elevation Model (DEM) to remain consistent with the model development. If you don't have these data, you can use another notebook with the "Single mode ANN" model, which needs only the microtremor recording.
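
One way to obtain the two topographic inputs is sketched below on a synthetic grid. This is a minimal illustration and not necessarily the procedure used during model development: it assumes the DEM is already loaded as a 2-D array with square cells of known size (`pixel_size_m`, here a hypothetical 30 m, roughly 1 arcsec), and the function name `mean_elevation_in_circle` is made up for this example.

```python
import numpy as np


def mean_elevation_in_circle(dem, row, col, pixel_size_m, diameter_m=1500.0):
    """Mean of the DEM cells whose centres lie within diameter_m / 2 of cell (row, col)."""
    radius_px = (diameter_m / 2.0) / pixel_size_m
    rows, cols = np.indices(dem.shape)
    inside = (rows - row) ** 2 + (cols - col) ** 2 <= radius_px ** 2
    return float(np.nanmean(dem[inside]))


# Synthetic 101 x 101 DEM (assumed 30 m cells): a plane rising 1 m per cell eastward.
dem = np.tile(np.arange(101, dtype=float), (101, 1))
station_row, station_col = 50, 50  # hypothetical station location in grid indices

elevation = float(dem[station_row, station_col])
elevation_1500m_avg = mean_elevation_in_circle(
    dem, station_row, station_col, pixel_size_m=30.0
)
tpi = elevation - elevation_1500m_avg  # difference used later in the notebook as TPI
print(elevation, elevation_1500m_avg, tpi)
```

On a uniformly sloping surface the neighbourhood mean equals the station elevation, so the difference is zero. A station on a ridge would give a positive value and one in a valley a negative value.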

Steps to use:
1. Download the zipped folder named "Models Public" and do not change the file structure inside it.
2. In the input section, enter the required parameters: the input recording (a full or relative path is fine) and the topographic features.
3. Restart Kernel and run all cells.

In [None]:
import numpy as np
import pandas as pd
import os
import hvsrpy
from hvsrpy import sesame
import obspy
import tensorflow as tf
import matplotlib
matplotlib.use("TkAgg")
import matplotlib.pyplot as plt
from scipy.interpolate import interp1d
import joblib

plt.style.use(hvsrpy.postprocessing.HVSRPY_MPL_STYLE)

### Data Input and Recording Check

In [None]:
# Data Input
input_recording = "CI.FUR-Rec288-Sen450.mseed"  # Actual elevation = -44 m and elevation_1500m_avg = -43.80603662 m.
# input_recording = "AR.STN01.A1.2019.006"  # Actual elevation = 2246 m and elevation_1500m_avg = 2237.903656 m.
# input_recording = "UT.STN09_20130320_020000.miniseed"  # Actual elevation = 35 m and elevation_1500m_avg = 35.33800078 m.
# The topographic values below must belong to the recording selected above.
elevation = -44  # in meters. Enter "NA" if unknown.
elevation_1500m_avg = -43.80603662  # in meters. Enter "NA" if unknown.

stream = obspy.read(input_recording)

# Check trace count
if len(stream) != 3:
    print("Recording should have exactly three traces.")
else:
    # Check sampling rates
    sampling_rates = [tr.stats.sampling_rate for tr in stream]
    if any(sr < 100 for sr in sampling_rates):
        print("At least 100 Hz sampling rate is required. Found rates:", sampling_rates)
    else:
        # Find latest start time and earliest end time
        start_times = [tr.stats.starttime for tr in stream]
        end_times = [tr.stats.endtime for tr in stream]

        common_start = max(start_times)
        common_end = min(end_times)

        # Check duration
        common_duration = common_end - common_start
        if common_duration <= 0:
            print("No overlapping time window between traces.")
        else:
            # Trim all traces to common time window
            stream.trim(starttime=common_start, endtime=common_end)

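            # Save to MiniSEED file next to the input recording
            # (prefix the file name, not the whole path, so directories in the path still work)
            input_dir, input_name = os.path.split(input_recording)
            stream.write(os.path.join(input_dir, f"updated_{input_name}"), format="MSEED")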

            # Check if duration exceeds 30 minutes
            if common_duration >= 1800:  # 1800 seconds = 30 minutes
                print(f"Trimmed duration is {common_duration/60:.2f} minutes.")
            else:
                print("Trimmed duration is less than 30 minutes.")
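
The overlap logic in the check above can be tried without any recording. The sketch below uses plain floats (seconds) in place of ObsPy time stamps; `common_window` is a hypothetical helper written for this illustration, and raising an error instead of printing is a design choice of the sketch, not of the notebook.

```python
def common_window(start_times, end_times, minimum_duration_s=1800.0):
    """Return the window shared by all traces and whether it is long enough."""
    common_start = max(start_times)  # latest start among the traces
    common_end = min(end_times)      # earliest end among the traces
    duration = common_end - common_start
    if duration <= 0:
        raise ValueError("No overlapping time window between traces.")
    return common_start, common_end, duration, duration >= minimum_duration_s


# Three traces that start and stop at slightly different times (seconds).
starts = [0.0, 2.5, 1.0]
ends = [3600.0, 3605.0, 3598.0]

common_start, common_end, duration, long_enough = common_window(starts, ends)
print(common_start, common_end, duration, long_enough)
```

Raising on failure stops a "Run all" execution at the failing check, whereas a printed message lets later cells run and fail with a less informative error.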

In [None]:
# HVSR Preprocessing Settings
preprocessing_settings = hvsrpy.settings.HvsrPreProcessingSettings()
preprocessing_settings.detrend = "linear"
significant_cycles = 15  # require at least 15 significant cycles per window.
time_windows = 35  # split the recording into 35 time windows.
duration_in_seconds = common_duration  # total duration of the trimmed recording (s)
windowlength_in_seconds = duration_in_seconds / time_windows  # window length (s)
preprocessing_settings.window_length_in_seconds = windowlength_in_seconds

print("Preprocessing Summary")
print("-"*60)
preprocessing_settings.psummary()

# HVSR Processing Settings
processing_settings = hvsrpy.settings.HvsrTraditionalProcessingSettings()
processing_settings.window_type_and_width = ("tukey", 0.2)
processing_settings.smoothing=dict(operator="konno_and_ohmachi",
                                   bandwidth=40,
                                   center_frequencies_in_hz=np.geomspace(0.05, 50, 256))
processing_settings.method_to_combine_horizontals = "geometric_mean"
processing_settings.handle_dissimilar_time_steps_by = "frequency_domain_resampling"

desired_frequency_vector_in_hz = np.geomspace(0.05, 50, 256)
minimum_frequency = significant_cycles / windowlength_in_seconds
fids = desired_frequency_vector_in_hz > minimum_frequency
frequency_resampling_in_hz = desired_frequency_vector_in_hz[fids]
processing_settings.smoothing["center_frequencies_in_hz"] = frequency_resampling_in_hz

print("Processing Summary")
print("-"*60)
processing_settings.psummary()
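
To make the window arithmetic above concrete, the sketch below repeats it for a recording of exactly 30 minutes (an assumed duration chosen for illustration). It uses only NumPy and the same constants as the settings cell.

```python
import numpy as np

duration_in_seconds = 1800.0  # assumed: exactly 30 minutes of usable data
time_windows = 35
significant_cycles = 15

window_length = duration_in_seconds / time_windows    # about 51.4 s per window
minimum_frequency = significant_cycles / window_length  # about 0.29 Hz

# Same frequency grid as the settings cell; keep only frequencies that
# fit at least `significant_cycles` cycles inside one window.
frequencies = np.geomspace(0.05, 50, 256)
kept = frequencies[frequencies > minimum_frequency]

print(f"window length: {window_length:.2f} s")
print(f"minimum frequency: {minimum_frequency:.4f} Hz")
print(f"kept {kept.size} of {frequencies.size} frequencies")
```

With 30 minutes of data the minimum frequency falls just below the 0.3 Hz lower bound of the feature band used later in this notebook. A shorter recording raises the minimum frequency above 0.3 Hz, so the lowest feature frequencies would have to be extrapolated.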

### HVSR Processing and Manual Window Rejection

In [None]:
# Compute HVSR

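# Read the trimmed file written by the recording check ("updated_" is prefixed to the file name)
input_dir, input_name = os.path.split(input_recording)
srecords = hvsrpy.read([os.path.join(input_dir, f"updated_{input_name}")])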
srecords = hvsrpy.preprocess(srecords, preprocessing_settings)
hvsr = hvsrpy.process(srecords, processing_settings)

In [None]:
# Create HvsrTraditional object
mhvsr = hvsrpy.HvsrTraditional(frequency=hvsr.frequency, amplitude=hvsr.amplitude)

# Perform manual window rejection
hvsrpy.window_rejection.manual_window_rejection(
    mhvsr, y_limit=15, plot_frequency_std=False, fig=None, ax=None  #Change y_limit as required in the plot.
)
plt.close("all")
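
After rejection, the next cell takes the lognormal mean curve over the accepted windows. The sketch below shows what that statistic amounts to on synthetic numbers, assuming it is the exponential of the mean log amplitude (the geometric mean across windows). The arrays `amplitude` and `valid_window` are stand-ins invented for this example, not hvsrpy attributes.

```python
import numpy as np

# Rows are time windows, columns are frequencies (synthetic HVSR amplitudes).
amplitude = np.array([
    [1.0, 2.0, 4.0],
    [1.0, 8.0, 4.0],
    [9.0, 9.0, 9.0],  # pretend this window was rejected manually
])
valid_window = np.array([True, True, False])

accepted = amplitude[valid_window]

# Lognormal mean: average in log space, then transform back.
lognormal_mean = np.exp(np.mean(np.log(accepted), axis=0))
# Arithmetic mean for comparison; it is pulled up more by large amplitudes.
arithmetic_mean = np.mean(accepted, axis=0)

print(lognormal_mean)
print(arithmetic_mean)
```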

### Input Features for the High-Dimensional Models

In [None]:
# HVSR Features
# Original frequency and amplitude arrays
freqs = mhvsr.frequency  # original frequency array
amps = mhvsr.mean_curve(distribution="lognormal")  #Taking the lognormal mean curve among the accepted windows.

# Step 1: Trim to 0.3–50 Hz range
mask = (freqs >= 0.3) & (freqs <= 50)
freqs_trimmed = freqs[mask]
amps_trimmed = amps[mask]

# Step 2: Create 35 log-spaced frequencies between 0.3 and 50 Hz
resampled_freqs = np.logspace(np.log10(0.3), np.log10(50), 35)

# Step 3: Interpolate amplitudes linearly
interp_func = interp1d(freqs_trimmed, amps_trimmed, kind='linear', bounds_error=False, fill_value="extrapolate")
resampled_amps = interp_func(resampled_freqs)

X_hvsr = pd.DataFrame([resampled_amps], columns=[f for f in resampled_freqs])

if elevation!="NA" and elevation_1500m_avg!="NA":
    dual_ANN = True
    # Topographic Features
    # Topographic Position Index (TPI): station elevation minus the mean elevation within the 1500 m diameter
    TPI = elevation - elevation_1500m_avg
    
    # Assemble into DataFrame
    X_topo = pd.DataFrame([{
        "TPI": TPI,
        "elevation": elevation
    }])
    
    # Elevation Binning
    elevation_bins = [-500, 0, 500, 1000, 1500, 2000, 2500, 3000]
    elevation_labels = [
        'Elevation_Bin_[-500.0, 0.0)',
        'Elevation_Bin_[0.0, 500.0)',
        'Elevation_Bin_[500.0, 1000.0)',
        'Elevation_Bin_[1000.0, 1500.0)',
        'Elevation_Bin_[1500.0, 2000.0)',
        'Elevation_Bin_[2000.0, 2500.0)',
        'Elevation_Bin_[2500.0, 3000.0)'
    ]
    
    # Bin the elevation into categories
    X_topo = X_topo.copy()
    X_topo['Elevation_Bin'] = pd.cut(X_topo['elevation'], bins=elevation_bins, labels=elevation_labels, right=False)
    
    # One-hot encode the elevation bin
    elevation_dummies = pd.get_dummies(X_topo['Elevation_Bin']).astype(int)
    
    # Concatenate back to original DataFrame
    X_topo = pd.concat([X_topo, elevation_dummies], axis=1)
    
    # Drop the raw 'elevation' and categorical 'Elevation_Bin' columns; TPI and the one-hot bins remain
    X_topo = X_topo.drop(columns=['elevation','Elevation_Bin'])
else:
    dual_ANN = False
    print("Only Single mode ANN model can be used.")
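
The elevation binning above can be checked in isolation. The sketch below reuses the same bin edges and label format with pandas; `one_hot_elevation` is a hypothetical helper for this illustration. It also shows what happens for an elevation outside the -500 m to 3000 m range covered by the bins.

```python
import pandas as pd

elevation_bins = [-500, 0, 500, 1000, 1500, 2000, 2500, 3000]
elevation_labels = [
    f"Elevation_Bin_[{float(low)}, {float(high)})"
    for low, high in zip(elevation_bins[:-1], elevation_bins[1:])
]


def one_hot_elevation(elevation):
    """One-hot encode a single elevation using left-closed, right-open bins."""
    binned = pd.cut(
        pd.Series([elevation]), bins=elevation_bins,
        labels=elevation_labels, right=False,
    )
    return pd.get_dummies(binned).astype(int)


inside = one_hot_elevation(-44)    # falls in [-500.0, 0.0)
outside = one_hot_elevation(3500)  # above the last bin edge

print(inside.iloc[0].to_dict())
print(outside.iloc[0].sum())
```

An out-of-range elevation produces a row of zeros rather than an error, so it may be worth checking that exactly one bin is set before relying on the Dual mode ANN prediction.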

In [None]:
X_hvsr
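
The 35 feature columns shown above come from resampling the mean curve onto log-spaced frequencies. The sketch below reproduces that step on a synthetic curve. It swaps in `np.interp` for `scipy.interpolate.interp1d` to stay NumPy-only; the two give the same linear interpolation inside the data range, but `np.interp` holds the end values constant instead of extrapolating. The straight-line `amps` curve is made up so that the result can be verified exactly.

```python
import numpy as np

# Synthetic "mean curve": amplitude is a straight line in frequency,
# so linear interpolation should recover it exactly.
freqs = np.geomspace(0.05, 50, 256)
amps = 2.0 + 0.1 * freqs

# 35 log-spaced feature frequencies between 0.3 and 50 Hz, as in the notebook.
resampled_freqs = np.logspace(np.log10(0.3), np.log10(50), 35)
resampled_amps = np.interp(resampled_freqs, freqs, amps)

# Ratio between neighbouring feature frequencies is constant on a log grid.
step_ratio = resampled_freqs[1:] / resampled_freqs[:-1]

print(resampled_freqs[:3])
print(resampled_amps[:3])
```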

### Load high-dimensional models and Scale Features

In [None]:
# Load Models

single_mode_ann_model = tf.keras.models.load_model("../Models Public/Models/log_ANN_model.keras")
# Natural-log transform of the HVSR amplitudes; no further scaling is applied to them
X_hvsr_scaled = np.log(X_hvsr.to_numpy().astype(np.float32))

if dual_ANN:
    dual_mode_ann_model = tf.keras.models.load_model("../Models Public/Models/Multi-input_log_ANN.keras")
    topo_scaler = joblib.load("../Models Public/Models/log_model_standard_scaler_metadata.pkl")
    X_topo_scaled = topo_scaler.transform(X_topo.to_numpy().astype(np.float32))
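
The two feature groups are scaled differently: the HVSR amplitudes are log-transformed, while the topographic features go through a fitted scaler. The sketch below illustrates both on toy numbers, assuming the saved scaler is a scikit-learn `StandardScaler`, whose `transform` computes `(x - mean_) / scale_`. The values of `feature_mean` and `feature_scale` are hypothetical and are not the statistics stored in the model folder.

```python
import numpy as np

# HVSR branch: natural log of the amplitudes.
hvsr_amplitudes = np.array([[1.0, 2.0, 4.0]], dtype=np.float32)
hvsr_log = np.log(hvsr_amplitudes)

# Topographic branch: standardise each column with the training statistics.
feature_mean = np.array([10.0, 0.2])   # hypothetical per-feature training means
feature_scale = np.array([5.0, 0.4])   # hypothetical per-feature training standard deviations
topo_features = np.array([[0.0, 1.0]])
topo_scaled = (topo_features - feature_mean) / feature_scale

print(hvsr_log)
print(topo_scaled)
```

The column order of the array passed to the scaler has to match the order used when the scaler was fitted, since the statistics are applied by position.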

### Predictions

In [None]:
# Always predict with single mode
Vs30_pred_single_ann = single_mode_ann_model.predict(X_hvsr_scaled).flatten()

models = ["Single Mode ANN"]
predictions = [np.exp(Vs30_pred_single_ann)]

# If topo features are available.
if dual_ANN:
    Vs30_pred_dual_ann = dual_mode_ann_model.predict([X_hvsr_scaled, X_topo_scaled]).flatten()
    models.append("Dual mode ANN")
    predictions.append(np.exp(Vs30_pred_dual_ann))

# Create DataFrame
vs30_modelwise_df = pd.DataFrame({
    "Model": np.repeat(models, [len(p) for p in predictions]),
    "Vs30 (m/s)": np.round(np.concatenate(predictions), 0).astype(int)
})

# Show the result
vs30_modelwise_df
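
The predictions above are passed through `np.exp` before being reported, which indicates that the models output the natural log of Vs30. The sketch below shows the back-transformation and rounding on placeholder values; `log_predictions` stands in for a model output and is not a real prediction.

```python
import numpy as np

# Placeholder model output: ln(Vs30) with shape (n_samples, 1).
log_predictions = np.log(np.array([[360.0], [760.0]]))

# Flatten to 1-D, undo the log, and round to whole metres per second.
vs30 = np.round(np.exp(log_predictions.flatten()), 0).astype(int)
print(vs30.tolist())
```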