# MSNoise 101 - Fast track to Cross-Correlation Functions from continuous seismic data

# Setup and Imports

This cell sets up the Python environment for MSNoise analysis by:
- Enabling inline matplotlib plotting
- Importing required libraries for data processing and visualization
- Configuring plot styling to use "ggplot" theme
- Setting up pandas datetime handling
- Importing all functions from MSNoise API

Key imports:
- obspy: For seismic data processing
- pandas & numpy: For data manipulation
- matplotlib: For visualization
- msnoise.api: Core MSNoise functionality

In [None]:
%matplotlib inline
import datetime
import matplotlib.pyplot as plt
import glob
import os
from obspy import read, UTCDateTime, read_inventory
from obspy.signal import PPSD
import warnings
import pandas as pd
import numpy as np
import matplotlib.gridspec as gridspec
from matplotlib.dates import DateFormatter
from pandas.plotting import register_matplotlib_converters
register_matplotlib_converters()
plt.style.use("ggplot")

from msnoise.api import *

In [None]:
# Path to where you have copied the DATA/ folder (that contains the SDS, RESP etc folders)
DATA_PATH = "DATA"

# Initialize MSNoise Database

Initializes the MSNoise database with:
- Command: `msnoise db init --tech 1`
- Parameters:
  - `--tech 1`: Sets up the database with technology = 1 (SQLite)

This is typically the first step in setting up a new MSNoise project.

In [None]:
! msnoise db init --tech 1

# Database Connection

Establishes connection to the MSNoise database using the `connect()` function.
This connection will be used throughout the notebook for database operations.

In [None]:
db = connect()

# Configure Processing Filters

Sets up two different processing filters for the cross-correlation analysis:

Filter 1 (Broadband):
- Frequency band: 0.05-30.0 Hz
- MWCS parameters:
  - Window length: 12 s
  - Step size: 4 s

Filter 2 (Specific band):
- Frequency band: 4-8 Hz
- MWCS parameters:
  - Window length: 2 s
  - Step size: 1 s

Both filters are enabled (`used=1`).

In [None]:
update_filter(db, ref=1, low=0.05, mwcs_low=0.05, high=30.0, mwcs_high=30.0, mwcs_wlen=12, mwcs_step=4, used=1)
update_filter(db, ref=2, low=4, mwcs_low=4, high=8.0, mwcs_high=8.0, mwcs_wlen=2, mwcs_step=1, used=1)
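Before going further, it can be worth sanity-checking filter values like the ones above. The sketch below is not part of MSNoise: `check_filter` and its three rules are assumptions made for illustration, and the 200 Hz sampling rate is the `cc_sampling_rate` value this notebook configures.

```python
# Filter settings copied from the update_filter() calls above.
filters = {
    1: {"low": 0.05, "high": 30.0, "mwcs_low": 0.05, "mwcs_high": 30.0},
    2: {"low": 4.0, "high": 8.0, "mwcs_low": 4.0, "mwcs_high": 8.0},
}


def check_filter(f, sampling_rate):
    """Return a list of human-readable problems (empty if none).

    Hypothetical helper: these checks are illustrative, not MSNoise's own.
    """
    problems = []
    nyquist = sampling_rate / 2.0
    if not f["low"] < f["high"]:
        problems.append("low must be below high")
    if f["high"] >= nyquist:
        problems.append("high must be below Nyquist")
    if f["mwcs_low"] < f["low"] or f["mwcs_high"] > f["high"]:
        problems.append("MWCS band must lie inside the filter band")
    return problems


# 200 Hz is the cc_sampling_rate used in this notebook.
for ref, f in filters.items():
    print(ref, check_filter(f, sampling_rate=200.0) or "ok")
```

Both filters pass these checks at 200 Hz; lowering the sampling rate to 50 Hz would flag filter 1, whose 30 Hz upper corner exceeds the 25 Hz Nyquist frequency.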

# MSNoise Configuration Settings

Updates core MSNoise configuration parameters:
- Data paths:
  - `data_folder`: Location of seismic data in SDS format
  - `response_path`: Location of instrument response files
- Processing parameters:
  - `maxlag`: Maximum lag time (60 seconds)
  - `components_to_compute`: Components for cross-correlation (ZZ,EE,NN)
  - `components_to_compute_single_station`: Additional single-station components (full tensor)
  - `cc_sampling_rate`: Sampling rate for cross-correlations (200 Hz)
  - `preprocess_lowpass`: Lowpass filter cutoff (99 Hz)

In [None]:
update_config(db, name="data_folder", value=os.path.join(DATA_PATH, "SDS"))
update_config(db, name="response_path", value=os.path.join(DATA_PATH, "RESP"))

update_config(db, name="maxlag", value="60")
update_config(db, name="components_to_compute_single_station", value="ZZ,EE,NN,EN,EZ,NZ")
update_config(db, name="components_to_compute", value="ZZ,EE,NN")
update_config(db, name="cc_sampling_rate", value="200")
update_config(db, name="preprocess_lowpass", value="99")
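To see what `maxlag` and `cc_sampling_rate` imply for the size of each CCF, here is a small sketch. It assumes a symmetric lag axis that includes zero lag; the axis actually used by MSNoise comes from `get_t_axis`, which may differ in detail.

```python
maxlag = 60.0              # seconds, as configured above
cc_sampling_rate = 200.0   # Hz, as configured above
preprocess_lowpass = 99.0  # Hz, as configured above

# Assumption: lags run from -maxlag to +maxlag inclusive, one sample per 1/fs.
n_samples = int(2 * maxlag * cc_sampling_rate) + 1
lag_axis = [-maxlag + i / cc_sampling_rate for i in range(n_samples)]

# The lowpass corner should sit below the Nyquist frequency of the CCFs.
nyquist = cc_sampling_rate / 2.0

print(n_samples, lag_axis[0], lag_axis[-1])
print("lowpass below Nyquist:", preprocess_lowpass < nyquist)
```

With these settings each CCF holds 24001 samples, and the 99 Hz lowpass sits just under the 100 Hz Nyquist frequency.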

# Display MSNoise Configuration

Runs `msnoise info` to display:
- Current configuration settings
- Database status
- Processing parameters
- Station information

This command is useful for verifying the setup and current state of the MSNoise environment.

In [None]:
! msnoise info

# Populate Database

Executes `msnoise populate` with verbose output (-v flag) to:
- Scan for available data files
- Add station information to the database
- Set up initial database structure
- Prepare for data processing

This is a key step that must be run after initial setup and whenever new data is added.

In [None]:
! msnoise -v populate
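The station information comes from the SDS archive, whose layout is `YEAR/NET/STA/CHAN.TYPE/NET.STA.LOC.CHAN.TYPE.YEAR.DAY`. The sketch below only illustrates that naming convention; `parse_sds_path` is a hypothetical helper, not the way `populate` is implemented, and the year and day in the example path are made up.

```python
from pathlib import PurePosixPath


def parse_sds_path(path):
    """Split an SDS file name into its naming components (illustrative helper)."""
    name = PurePosixPath(path).name
    net, sta, loc, chan, dtype, year, day = name.split(".")
    return {
        "net": net,
        "sta": sta,
        "loc": loc,
        "chan": chan,
        "type": dtype,
        "year": int(year),
        "day": int(day),
    }


# Hypothetical file: the year (2020) and day of year (001) are assumptions.
example = "DATA/SDS/2020/8N/HB01/EHZ.D/8N.HB01.00.EHZ.D.2020.001"
info = parse_sds_path(example)
print(info["net"], info["sta"], info["loc"], info["chan"])
```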

# List All Stations

Retrieves and displays all stations in the database:
- Uses `get_stations()` function with all=True
- Returns raw format output
- Prints network code (net) and station name (sta) for each station

Parameters:
- `all=True`: Include all stations, even if not currently active
- `format="raw"`: Return raw database objects

In [None]:
for station in get_stations(db, all=True, format="raw"):
    print(station.net, station.sta)

# Scan Data Archive

Initializes the archive scanning process:
- Command: `msnoise scan_archive --init`
- Parameters:
  - `-v`: Verbose output
  - `-t 4`: Uses 4 threads for parallel processing
  
This step catalogs all available seismic data files in the configured data directory.

In [None]:
! msnoise -v -t 4 scan_archive --init

# Update Location and Channel Codes

Updates the database with current location and channel information:
- Command: `msnoise db update_loc_chan`
- Synchronizes database with actual data structure
- Updates any changed station metadata

In [None]:
! msnoise db update_loc_chan

# Data Availability Plot Setup and Execution

Imports and runs the data availability plotting function:
1. Import the plotting function from MSNoise
2. Create the plot for EH channels with `plot_DA(chan="EH?")`

Parameters:
- `chan="EH?"`: Filter to show only EH channels (where ? matches any character)

This visualization helps identify data gaps and coverage periods.

In [None]:
from msnoise.plots.data_availability import main as plot_DA

In [None]:
plot_DA(chan="EH?")
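The `?` in `"EH?"` stands for exactly one character. A quick way to preview which channel codes such a pattern selects is the standard library's `fnmatch`. This assumes shell-style wildcard semantics, and the channel list is hypothetical.

```python
from fnmatch import fnmatch

# Hypothetical channel codes; only the EH? pattern comes from the notebook.
channels = ["EHZ", "EHN", "EHE", "HHZ", "BHZ"]

# Keep the codes that match "EH" followed by exactly one character.
selected = [c for c in channels if fnmatch(c, "EH?")]
print(selected)
```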

# Initialize Processing Jobs

Creates new processing jobs:
- Command: `msnoise new_jobs --init`
- Sets up the initial batch of cross-correlation jobs
- Prepares system for processing

This command should be run after data has been added and before starting computations.

In [None]:
! msnoise -v new_jobs --init

# Display Job Status

Shows current job statistics:
- Command: `msnoise info -j`
- Displays:
  - Total number of jobs
  - Jobs completed/remaining
  - Job types and their status

In [None]:
! msnoise info -j
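To get a feel for how the job count scales, the sketch below enumerates station pairs for a small network. It is a rough upper bound that assumes every pair has data on every day; MSNoise's actual job bookkeeping may differ. `8N.HB02` and the number of days are hypothetical.

```python
from itertools import combinations

# 8N.HB01 and 8N.HB04 appear in this notebook; 8N.HB02 is hypothetical.
stations = ["8N.HB01", "8N.HB02", "8N.HB04"]

# Cross-station pairs: every unordered pair of distinct stations.
cross_pairs = list(combinations(stations, 2))

# Single-station "pairs": each station with itself.
single_station = [(s, s) for s in stations]

n_days = 3  # hypothetical number of days with data
n_pair_days = (len(cross_pairs) + len(single_station)) * n_days

print(len(cross_pairs), len(single_station), n_pair_days)
```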

# Compute Cross-Correlations

Runs the cross-correlation computation:
- Command: `msnoise cc compute_cc`
- Parameters:
  - `-t 5`: Use 5 threads
  - `-d 5`: Delay the start of each subsequent thread by 5 seconds
  - `-v`: Verbose output

This is the main processing step that computes cross-correlations between station pairs.

NOTE: this will NOT output in real time; it's best to run it in the console.

In [None]:
! msnoise -t 5 -d 5 -v cc compute_cc
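If a console is not at hand, one possible workaround is to launch the command from Python and echo its output line by line. This is only a sketch: `run_streaming` is a hypothetical helper, and how promptly lines arrive still depends on the child process's own output buffering.

```python
import subprocess
import sys


def run_streaming(cmd):
    """Run cmd, echo each output line as it arrives, return (exit_code, lines)."""
    lines = []
    with subprocess.Popen(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr into the same stream
        text=True,
        bufsize=1,                 # line-buffered on our side
    ) as proc:
        for line in proc.stdout:
            print(line, end="", flush=True)
            lines.append(line.rstrip("\n"))
    return proc.returncode, lines


# Harmless demonstration using the current Python interpreter:
code, out = run_streaming([sys.executable, "-c", "print('hello'); print('world')"])

# For the real run, one would pass the same command as the cell above:
# run_streaming(["msnoise", "-t", "5", "-d", "5", "-v", "cc", "compute_cc"])
```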

# Prepare Data for Analysis

Sets up the analysis environment and loads results:
1. Build date list between start and end dates
2. Load parameters from database
3. Get time axis for CCF plotting
4. Load cross-correlation results for:
   - Station pair: 8N.HB01.00 - 8N.HB04.00
   - Filter ID: 1
   - Component: ZZ
   - Format: xarray

This cell prepares data for visualization and analysis.

In [None]:
# Obtain the list of dates between ``startdate`` and ``enddate``
start, end, datelist = build_movstack_datelist(db)

# Get the list of parameters from the DB:
params = get_params(db)

# Get the time axis for plotting the CCF:
taxis = get_t_axis(db)

filter_id = 1

# Get the results for one station pair, filter id=1, ZZ component, returned as an xarray object:
ccfs = get_results_all(db, "8N.HB01.00", "8N.HB04.00", filter_id, "ZZ", datelist, format="xarray")
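For intuition, a date list like the one passed to `get_results_all` can be pictured as one entry per calendar day between two bounds. The sketch below is a plain-Python stand-in, not the implementation of `build_movstack_datelist`; `build_datelist` and the example dates are hypothetical.

```python
import datetime


def build_datelist(first_day, last_day):
    """Return every calendar day from first_day to last_day, inclusive."""
    n_days = (last_day - first_day).days + 1
    return [first_day + datetime.timedelta(days=i) for i in range(n_days)]


# Hypothetical bounds; the real ones come from the MSNoise configuration.
demo_start = datetime.date(2020, 1, 1)
demo_end = datetime.date(2020, 1, 7)
demo_datelist = build_datelist(demo_start, demo_end)

print(len(demo_datelist), demo_datelist[0], demo_datelist[-1])
```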


# Cross-Correlation Function (CCF) Visualization Series

A sequence of plots showing the CCF data in different views:
1. Full CCF plot using xarray plotting (ccfs.CCF.plot)
2. Zoomed CCF view (±20 seconds in the coda)
3. 12-hour resampled mean of zoomed CCF
4. 12-hour resampled median of zoomed CCF

Parameters:
- `robust=True`: Uses robust scaling for better visualization
- Time window: Controlled by `zoom` variable (set to ±20 seconds)
- Resampling period: 12 hours

These visualizations help examine the stability and quality of the cross-correlations at different temporal scales.

In [None]:
ccfs.CCF.plot(robust=True)

In [None]:
zoom = 20 # +-seconds in the coda

In [None]:
ccfs.CCF.loc[:,-zoom:zoom].plot(robust=True)

In [None]:
ccfs.CCF.loc[:,-zoom:zoom].resample(times='12h').mean().plot(robust=True)

In [None]:
ccfs.CCF.loc[:,-zoom:zoom].resample(times='12h').median().plot(robust=True)
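The `resample(times='12h')` calls group the CCFs into 12-hour bins before averaging. A minimal plain-Python sketch of that binning idea follows, assuming bins aligned to midnight and a bin length that divides 24 hours. `resample_mean` and the sample values are hypothetical, and xarray's own bin alignment and labelling may differ.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import mean


def resample_mean(samples, hours=12):
    """Average (timestamp, value) samples in fixed bins of `hours`, aligned to midnight."""
    bins = defaultdict(list)
    for t, v in samples:
        # Floor the timestamp to the start of its bin.
        bin_start = t.replace(
            hour=(t.hour // hours) * hours, minute=0, second=0, microsecond=0
        )
        bins[bin_start].append(v)
    return {k: mean(vs) for k, vs in sorted(bins.items())}


# Hypothetical values sampled every 6 hours on a made-up day.
t0 = datetime(2020, 1, 1)
samples = [(t0 + timedelta(hours=6 * i), float(i)) for i in range(4)]
binned = resample_mean(samples)
print(binned)
```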

# Compare Mean and Median CCFs

Creates a comparison plot of the mean and median CCFs:
- Figure size: 12x5 inches
- Shows both statistics on same axes
- Includes legend for identification
- Uses the same zoom window as previous plots

This comparison helps identify potential data quality issues and the stability of the correlations.

In [None]:
fig, ax = plt.subplots(1,1, figsize=(12,5))
print(ax)
ccfs.CCF.loc[:,-zoom:zoom].mean(axis=0).plot(ax=ax,label="mean")
ccfs.CCF.loc[:,-zoom:zoom].median(axis=0).plot(ax=ax, label="median")
plt.legend()
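The reason for comparing the two statistics is that the median is much less sensitive to occasional bad days than the mean. A tiny sketch with made-up amplitudes at a single lag (one value per day, with one glitch) shows the effect:

```python
from statistics import mean, median

# Hypothetical CCF amplitudes at one lag, one value per day; 5.00 is a glitch.
daily_values = [0.10, 0.12, 0.11, 5.00, 0.09]

stack_mean = mean(daily_values)
stack_median = median(daily_values)

# The single outlier pulls the mean far from the typical value,
# while the median stays close to it.
print(stack_mean, stack_median)
```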


# Reference Stack Computation and Distance Plot

Two-step process to create and visualize reference stacks:
1. Compute reference stack:
   - Syncs configuration
   - Runs stack computation with -r flag

2. Create distance plot:
   - Uses MSNoise distance plotting function
   - Parameters:
     - filterid=1
     - components="ZZ"
     - show=False (for custom display)

This visualization helps understand the spatial relationship between stations and their correlations.

In [None]:
! msnoise config sync
! msnoise cc stack -r

In [None]:
from msnoise.plots.distance import main as plot_distance
plot_distance(filterid=1, components="ZZ", show=False)
# plt.xlim(-20,20)
plt.show()
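A distance plot orders the CCFs by interstation distance. As a rough illustration of how such a distance can be obtained from station coordinates, here is a haversine sketch on a spherical Earth. This is an assumption for illustration: MSNoise may compute distances differently, and the coordinates below are hypothetical.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (
        math.sin(dphi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    )
    return 2 * radius_km * math.asin(math.sqrt(a))


# Hypothetical coordinates: two points one degree of latitude apart.
d = haversine_km(0.0, 0.0, 1.0, 0.0)
print(round(d, 2))
```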

# Moving Stack Configuration and Processing

Series of commands to set up and process moving stacks:
1. Reset stack jobs: `msnoise reset STACK`
2. Configure moving stack parameters:
   - Stack configurations: ('1d','1d'), ('2d','1d')
3. Process moving stacks:
   - Uses 4 threads (-t 4)
4. Check job status

This sequence creates shorter-term averages for monitoring temporal changes in the correlations.

In [None]:
! msnoise reset STACK
! msnoise config set mov_stack="(('1d','1d'),('2d','1d'))"


In [None]:
! msnoise -t 4 cc stack -m
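To picture what a configuration such as `('2d','1d')` produces, the sketch below assumes the first element is the averaging window and the second the step between outputs, and applies a trailing mean to made-up daily values. `moving_stack` is a hypothetical helper; MSNoise's handling of incomplete windows at the start of the series may differ.

```python
def moving_stack(daily, window, step):
    """Trailing mean over `window` days, emitted every `step` days."""
    out = []
    for end in range(window - 1, len(daily), step):
        chunk = daily[end - window + 1 : end + 1]
        out.append(sum(chunk) / len(chunk))
    return out


daily = [1.0, 3.0, 5.0, 7.0]  # hypothetical per-day values

stack_1d = moving_stack(daily, window=1, step=1)  # analogue of ('1d','1d')
stack_2d = moving_stack(daily, window=2, step=1)  # analogue of ('2d','1d')
print(stack_1d)
print(stack_2d)
```

The 1-day stack reproduces the daily values, while the 2-day stack smooths them at the cost of temporal resolution.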

In [None]:
! msnoise info -j