# A model workflow for LaharML

### LaharML is a machine-learning classifier for the detection of floods and debris flows from seismic signals.

This notebook defines the inputs needed to run the `extract_from_directory_general_model` module, which in turn depends on functions in the `laharml` module. The `XX.XXXX.XX.XXX_modelscaler.py` file builds the machine-learning model and data scaler that LaharML relies on.

In [None]:
from extract_from_directory_general_model import run_detection

### Set variables

* `stream_filepath` (str, required): Full path to the stream file containing seismic data in MSEED format.
* `network` (str, optional): Network code of the station that recorded the stream. If `None`, it is read from the MSEED headers. Defaults to `None`.
* `station` (str, optional): Station code of the station that recorded the stream. If `None`, it is read from the MSEED headers. Defaults to `None`.
* `location` (str, optional): Location code of the station that recorded the stream. If `None`, it is read from the MSEED headers. Defaults to `None`.
* `channel` (str, optional): Channel code of the station that recorded the stream. The vertical channel (component) of the sensor is recommended. If `None`, it is read from the MSEED headers. Defaults to `None`.
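* `sensitivity` (float): Sensitivity value passed to `run_detection` between `channel` and `model_label`. The example in this notebook uses `3.0172e8`.
* `model_label` (str, required): Detection model name. Choose either `'50Hz01min25'` or `'50Hz05min25'`.
* `minimum_frequency` (float, optional): Cutoff frequency for high-pass filtering the signal; content below this frequency is attenuated. If `None`, no filter is applied. Defaults to `None`.
* `minimum_duration_to_alert` (float, required): Lahar events shorter than this duration are not reported.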
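
The constraints in the list above can be checked before the detector is run. The sketch below is one possible way to do that. `check_inputs` is a hypothetical helper, not part of LaharML, and the rule that durations and frequencies must be positive is an assumption rather than something the package is known to enforce.

```python
# Hypothetical pre-flight check; not part of the LaharML package.
VALID_MODEL_LABELS = ("50Hz01min25", "50Hz05min25")  # the two labels listed above


def check_inputs(stream_filepath, model_label, minimum_duration_to_alert,
                 minimum_frequency=None):
    """Raise ValueError if a setting contradicts the parameter list."""
    if not stream_filepath:
        raise ValueError("stream_filepath is required")
    if model_label not in VALID_MODEL_LABELS:
        raise ValueError(f"model_label must be one of {VALID_MODEL_LABELS}")
    # Assumption: a duration threshold and a cutoff frequency must be positive.
    if minimum_duration_to_alert is None or minimum_duration_to_alert <= 0:
        raise ValueError("minimum_duration_to_alert must be a positive number")
    if minimum_frequency is not None and minimum_frequency <= 0:
        raise ValueError("minimum_frequency must be positive or None")
    return True
```

Calling `check_inputs(stream_filepath, model_label, minimum_duration_to_alert, minimum_frequency)` right before `run_detection` surfaces a mistyped model label early instead of partway through processing.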


### A quick explanation

The stream files you are working with contain seismic data recorded at specific stations, capturing ground vibrations over time. These files are used to detect lahars (volcanic mudflows) by analyzing patterns in the seismic signals. Each file is a continuous record from a particular location and channel, and provides the raw data needed for automated event detection with the LaharML classifier.

For reference, the FEC4 and FEC1 stations, where the data streams in the "Sample data" folder come from, are located in section B of the figure below, which focuses on the lahar channel known as Ceniza.

<img src="/Users/gustavo/Developer/sample/LaharML/LocationMap.jpeg" alt="Location map of the FEC1 and FEC4 stations" width="400"/>

In the snippet below, set the `stream_filepath` variable to the path of any file in the "Sample data" folder. The 2022 file comes from the larger dataset, spanning 2018 to 2022, on which this detector was trained. The 2023 data, in contrast, is new to the classifier, which still performs well on such unobserved data.

In [None]:
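# Set your variables here
stream_filepath = ""  # Replace with the path to an MSEED file
network = None  # None: read from the MSEED headers
station = None  # None: read from the MSEED headers
location = None  # None: read from the MSEED headers
channel = None  # None: read from the MSEED headers
sensitivity = 3.0172e8
model_label = '50Hz01min25'  # or '50Hz05min25'
minimum_frequency = None  # None: no high-pass filter
minimum_duration_to_alert = 10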
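
To illustrate what leaving a code as `None` means, the sketch below fills missing codes from a header mapping. `resolve_codes` is a hypothetical helper written for this notebook, and LaharML's own handling may differ. The header is a plain dictionary with placeholder values (only `FEC1` comes from the sample data); with ObsPy installed, `obspy.read(stream_filepath)[0].stats` exposes the same four keys.

```python
def resolve_codes(header, network=None, station=None, location=None, channel=None):
    """Return the four codes, taking any that are None from `header`."""
    given = {"network": network, "station": station,
             "location": location, "channel": channel}
    return {key: header[key] if value is None else value
            for key, value in given.items()}


# Placeholder header standing in for the MSEED metadata of one trace.
example_header = {"network": "XX", "station": "FEC1", "location": "", "channel": "HHZ"}

# Only `channel` is given explicitly; the other codes come from the header.
codes = resolve_codes(example_header, channel="HHN")
print(codes)
# {'network': 'XX', 'station': 'FEC1', 'location': '', 'channel': 'HHN'}
```

An empty location code (`""`) is a legitimate value in MSEED headers, which is why the helper tests for `None` rather than for falsiness.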

### What's happening?

When you run the code below, the `run_detection` function processes the seismic data file specified in `stream_filepath` using the LaharML classifier. It applies the selected detection model (`model_label`) and uses the provided parameters to filter and analyze the signal. The models are stored in the "Models" folder. The function identifies potential lahar events based on the seismic patterns, only reporting events that exceed the minimum duration set by `minimum_duration_to_alert`. The output will summarize detected events, helping you assess flood and debris flow activity at the chosen station and channel.
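
The duration threshold can be pictured as a simple filter over candidate events. The sketch below is not LaharML's code: `filter_by_duration` is a hypothetical function, events are assumed to be `(start, end)` pairs in the same unit as the threshold, and an event exactly as long as the threshold is assumed to be kept.

```python
def filter_by_duration(events, minimum_duration):
    """Keep (start, end) pairs whose length is at least `minimum_duration`."""
    return [(start, end) for start, end in events
            if end - start >= minimum_duration]


# Three candidate events lasting 4, 25 and 10 units.
events = [(0, 4), (20, 45), (60, 70)]
kept = filter_by_duration(events, 10)
print(kept)  # [(20, 45), (60, 70)]
```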

The output consists of a .csv file whose name starts with the `detections` prefix and a figure with three panes. The upper pane shows the raw waveform of the file you selected. The middle pane shows the raw classifications produced by the k-nearest neighbors algorithm. The lower pane shows the post-processed detections, which include a routine for noise reduction and interval delineation. The resulting intervals give the start and end time of each lahar event and are saved in the .csv output file.
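
As a rough picture of how raw classifications become intervals, the sketch below groups consecutive positive labels and merges runs separated by short gaps. This is a generic technique, not the routine LaharML uses; `delineate_intervals` and its `max_gap` parameter are hypothetical, and the labels are assumed to be a sequence of 0s and 1s, one per analysis window.

```python
def delineate_intervals(labels, max_gap=0):
    """Group consecutive 1s in `labels` into (first_index, last_index) pairs.

    Runs separated by at most `max_gap` zeros are merged into one interval.
    """
    intervals = []
    for i, label in enumerate(labels):
        if not label:
            continue
        # Extend the current interval if the gap since its end is small enough.
        if intervals and i - intervals[-1][1] - 1 <= max_gap:
            intervals[-1][1] = i
        else:
            intervals.append([i, i])
    return [tuple(pair) for pair in intervals]


labels = [0, 1, 1, 0, 1, 1, 1, 0, 0, 0, 1]
print(delineate_intervals(labels))             # [(1, 2), (4, 6), (10, 10)]
print(delineate_intervals(labels, max_gap=1))  # [(1, 6), (10, 10)]
```

Allowing a small gap bridges isolated misclassifications inside an event, which is one simple form of noise reduction.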

In [None]:
run_detection(
    stream_filepath,
    network,
    station,
    location,
    channel,
    sensitivity,
    model_label,
    minimum_frequency,
    minimum_duration_to_alert
)
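
After the run, the .csv output can be located by its `detections` prefix. The sketch below uses only the standard library. `find_detection_files` is a hypothetical helper, and searching the current working directory is an assumption, since the folder where LaharML writes its output is not stated here.

```python
from pathlib import Path


def find_detection_files(directory="."):
    """Return CSV files in `directory` whose names start with 'detections'."""
    return sorted(Path(directory).glob("detections*.csv"))


# Assumption: the output was written to the current working directory.
for path in find_detection_files():
    print(path)
```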