<figure>
  <IMG SRC="logoeost.png" WIDTH=100 ALIGN="right">
</figure>

# Classification of Seismic Sources - Random Forest Classifier


Based on, and courtesy of, the "*IA in geosciences*" practical by C. Hibert (28 January 2020).

Adapted for the Skience2024 workshop by Thomas Lecocq.

---------

In this tutorial you will see how to implement a machine learning algorithm for a discrimination/classification problem using the Python library `scikit-learn`. This library is very comprehensive and is one of the most widely used for machine learning.

You will be working on seismological data, with the aim of achieving the best rate of correct identification among several types of sources: signals generated by volcano-tectonic earthquakes, other types of volcano-generated signals, and noise samples. An algorithm that can make this discrimination on continuous data makes it possible to reconstruct chronicles of events on a volcano. These chronicles can potentially provide a better understanding of the volcano's dynamics.
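To give an idea of what such a classification looks like in `scikit-learn`, here is a minimal sketch on synthetic data. The three classes, the five features, and all parameter values are assumptions made for illustration only; they are not the Merapi dataset, nor necessarily the settings used later in this practical.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-in for real attributes: 3 classes, 100 events each, 5 features.
# Each class is drawn from a Gaussian with a different mean (illustrative only).
n_per_class, n_features = 100, 5
X = np.vstack([
    rng.normal(loc=3.0 * k, scale=1.0, size=(n_per_class, n_features))
    for k in range(3)
])
y = np.repeat(["class_a", "class_b", "class_c"], n_per_class)

# Hold out 30% of the events to estimate the rate of correct identification
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Train a random forest and score it on the held-out events
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)
accuracy = clf.score(X_test, y_test)
print(f"Test accuracy: {accuracy:.2f}")
```

The synthetic classes are deliberately well separated, so the score here says nothing about what to expect on real seismic signals.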

## Extract Features

The dataset we use here includes a (very) small number of labelled "events" recorded by a temporary deployment on Mount Merapi, Indonesia.

This notebook computes 58 attributes for each seismic trace.
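To illustrate what an "attribute" of a seismic trace can be, the sketch below computes four simple quantities from a waveform using NumPy only. The helper `example_attributes` is hypothetical and the four quantities are chosen for illustration; they are not claimed to match the 58 attributes computed by `ComputeAttributesV_MAT`.

```python
import numpy as np

def example_attributes(data, sampling_rate):
    """Return a few illustrative attributes of a 1-D waveform (hypothetical helper)."""
    data = np.asarray(data, dtype=float)
    data = data - data.mean()  # remove the mean before computing statistics
    duration = len(data) / sampling_rate  # length of the signal in seconds
    rms = np.sqrt(np.mean(data ** 2))  # root-mean-square amplitude
    # Kurtosis (fourth standardized moment): sensitive to impulsive signals
    kurtosis = np.mean(data ** 4) / np.mean(data ** 2) ** 2
    # Dominant frequency: location of the peak of the amplitude spectrum
    spectrum = np.abs(np.fft.rfft(data))
    freqs = np.fft.rfftfreq(len(data), d=1.0 / sampling_rate)
    dominant_frequency = freqs[np.argmax(spectrum)]
    return np.array([duration, rms, kurtosis, dominant_frequency])

# Synthetic test signal: a 5 Hz sine wave sampled at 100 Hz for 10 s
sampling_rate = 100.0
t = np.arange(1000) / sampling_rate
signal = np.sin(2 * np.pi * 5.0 * t)

attrs = example_attributes(signal, sampling_rate)
print(attrs)
```

Each event is thus reduced to a fixed-length vector of numbers, which is the form a classifier needs regardless of the original signal length.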

In [None]:
from IPython.display import display, HTML
display(HTML("<style>.container { width:90% !important; }; .cell {width:100%} ; .code_cell{width:100%}</style>"))

In [None]:
%matplotlib inline
# Standard library
import datetime
import glob
import os
import sys
import traceback
import warnings
from collections import defaultdict

# Third-party libraries
import matplotlib
import matplotlib as mpl
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
# import tqdm
from obspy import UTCDateTime, Stream, read
from obspy.core.util import AttribDict
from obspy.geodetics.base import gps2dist_azimuth
from obspy.signal.cross_correlation import correlate, xcorr_max, xcorr_pick_correction

# Plot styling
new_style = {'grid': False}
mpl.rc('axes', **new_style)
# mpl.rcParams['font.family'] = 'Helvetica'
sns.set_style("whitegrid")
sns.set_palette("dark")

# Local module that computes the attributes (expected in the current folder)
sys.path.append(".")
from ComputeAttributesV_MAT import calculate_all_attributes, get_attribute_names


In [None]:
station = "GRW0"
channel = "BHZ"

outfolder = os.path.join("attributes", "%s.%s"%(station, channel))
os.makedirs(outfolder, exist_ok=True)

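attributes = {}
for typ in ["VTB", "MP", "gugu_long", "gugu_short", "NN", "ND"]:
    outfile = os.path.join(outfolder, "%s.npy" % typ)
    # Reuse cached attributes if they were already computed for this class
    if os.path.isfile(outfile):
        attributes[typ] = np.load(outfile)
        continue
    # Read all events of this class, keeping only the selected station/channel
    st = read("events/%s/*.mseed" % typ).select(station=station, channel=channel)
    attributes[typ] = []
    for tr in st:
        # Use each trace's own sampling rate rather than that of the first trace
        attributes[typ].append(calculate_all_attributes(tr.data, tr.stats.sampling_rate, 0)[0])
    # One row per event, one column per attribute
    attributes[typ] = np.asarray(attributes[typ])
    np.save(outfile, attributes[typ])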


In [None]:
names = get_attribute_names()
# One figure per attribute, with the events of each class plotted side by side
for i in range(attributes["VTB"].shape[1]):
    previous = 0  # running offset so that classes do not overlap on the x-axis
    for typ in ["VTB", "MP", "gugu_long", "gugu_short", "NN", "ND"]:
        values = attributes[typ][:, i]
        x = np.arange(len(values)) + previous
        plt.scatter(x, values, label=typ)
        previous += len(x)
    plt.legend()
    plt.title(names[i])
    plt.xlabel("Event ID")
    plt.ylabel("Attribute value")
    plt.savefig(os.path.join(outfolder, "attr_%02i.png" % i))
    plt.close()
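
A classifier expects a single feature matrix and a matching vector of labels rather than one array per class. The sketch below shows one possible way to stack a dictionary laid out like `attributes` into such a pair. The dictionary `demo_attributes`, its random values, and the event counts are assumptions standing in for the real data so that the example runs on its own.

```python
import numpy as np

# Synthetic stand-in with the same layout as `attributes`:
# one (n_events, n_attributes) array per class label (values are random).
rng = np.random.default_rng(0)
demo_attributes = {
    "VTB": rng.normal(size=(4, 58)),
    "MP": rng.normal(size=(3, 58)),
    "NN": rng.normal(size=(5, 58)),
}

# Stack all classes into one feature matrix: one row per event
X = np.vstack(list(demo_attributes.values()))

# Build the label vector by repeating each class name once per event
y = np.concatenate([
    np.full(len(values), typ) for typ, values in demo_attributes.items()
])

print(X.shape, y.shape)
```

Because both arrays are built by iterating over the same dictionary in the same order, row `k` of `X` always corresponds to label `y[k]`.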