# Getting Data via Serial Download

This tutorial demonstrates a common method of acquiring data that is useful for data exploration. This method involves the following:

1. Download one or several miniseed files from a data provider. We will use EarthScope's FDSN service to request files.
2. Read each stream and extract its metadata.
3. Process the data by detrending (removing the linear trend and the mean), tapering, and applying a bandpass filter. This process normalizes the data for comparison.
4. Visualize the data before and after processing.

## Setup

We will use packages from the Python standard library together with pandas, NumPy, requests, and ObsPy. These packages are already included in GeoLab; you do not need to install them. We start the script by importing the packages.

In [2]:
from __future__ import annotations

import pandas as pd
import io
import os
import sys
from dataclasses import dataclass
from pathlib import Path
from typing import Dict, Iterable, List, Optional, Tuple

import numpy as np
import requests
from obspy import UTCDateTime, read as obspy_read
from obspy.clients.fdsn import Client, URL_MAPPINGS
from obspy.clients.fdsn.header import FDSNNoDataException


## IMUSH Data

A listing for miniseed data is available for Mt. St. Helens from IMUSH (Imaging Magma Under St. Helens). The [web page](https://ds.iris.edu/mda/XD/?starttime=2014-01-01T00%3A00%3A00&endtime=2016-12-31T23%3A59%3A59#XD_2014-01-01_2016-12-31) lists the stations that recorded activity from 2014 to 2016.

The stations that have miniseed data have been saved in the IMUSH.csv file.

**Protip**
> The listing is dynamically generated by JavaScript, which makes scraping the stations we want more complicated. A simple solution is to copy the stations of interest, paste them into a spreadsheet such as Google Sheets, and save the sheet as a CSV file.
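
A CSV file built this way might look like the excerpt in the sketch below. The column names are the ones read later in this tutorial and the values come from its printed output, but the date format and the exact layout of IMUSH.csv are assumptions.

```python
import csv
import io

# Hypothetical two-row excerpt; the real IMUSH.csv may format dates differently.
sample_csv = """Station,DataCenter,Start,End,Site,Latitude,Longitude,Elevation
KRES,IRISDMC,2014-05-01,2016-12-31,KRES,47.758739,-122.29097,52.0
MA05,IRISDMC,2014-05-01,2016-12-31,MA05,46.754669,-122.226189,488.0
"""

# DictReader uses the header line as keys, so each row becomes a dictionary.
rows = list(csv.DictReader(io.StringIO(sample_csv)))

for row in rows:
    print(row["Station"], row["Start"], row["End"], row["Latitude"], row["Longitude"])
```

Checking a few rows like this after exporting from the spreadsheet is a quick way to confirm that the header names survived the copy and paste.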

## Getting Stations Data

The IMUSH.csv file provides the station names and the start and end times of the recorded data. We can use this information to request the data by reading the CSV file. As we read each row of the CSV file, we store its fields in a `@dataclass`.

In [3]:
@dataclass(frozen=True)
class StationRow:
    station: str
    datacenter: str
    start: UTCDateTime
    end: UTCDateTime
    site: str
    latitude: float
    longitude: float
    elevation_m: float
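
The `frozen=True` argument makes instances read-only once they are created. The sketch below illustrates this with `StationSketch`, a hypothetical, trimmed-down stand-in for `StationRow` that avoids the ObsPy dependency.

```python
from dataclasses import dataclass, FrozenInstanceError


@dataclass(frozen=True)
class StationSketch:  # hypothetical stand-in for StationRow, with fewer fields
    station: str
    latitude: float
    longitude: float


row = StationSketch(station="MA05", latitude=46.754669, longitude=-122.226189)
print(row.station, row.latitude)

try:
    row.station = "XXXX"  # frozen dataclasses reject attribute assignment
    modified = True
except FrozenInstanceError:
    modified = False
    print("StationSketch is read-only after creation")
```

Read-only rows guard against accidentally editing station metadata while looping over the list later on.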

We need a function that reads the CSV file and returns its rows as a list. The function takes three parameters: the path to the file, a start row, and an end row. The function uses the `pandas` package to read the file. Pandas treats the data as a table, so we can select all the rows or a specific range of rows.

Note that the function uses the `StationRow` data class and returns a Python list.

In [4]:
def read_csv(
    csv_path: str | Path,
    *,
    start_row: Optional[int] = None,
    end_row: Optional[int] = None,
) -> List[StationRow]:
    """
    Read a station CSV using pandas and return StationRow dataclass objects.

    Parameters
    ----------
    csv_path : str or Path
        Path to the CSV file.

    start_row : int, optional
        Zero-based index of the first row to read (inclusive).
        If None, starts from the beginning.

    end_row : int, optional
        Zero-based index of the last row to read (exclusive).
        If None, reads through the end of the file.

    Behavior
    --------
    - If start_row and end_row are both None, all rows are read.
    - Rows are selected using df.iloc[start_row:end_row].
    """
    df = pd.read_csv(csv_path)

    # Slice rows (pandas handles None cleanly)
    df_sel = df.iloc[start_row:end_row]

    station_rows: List[StationRow] = []

    for _, r in df_sel.iterrows():
        station_rows.append(
            StationRow(
                station=str(r["Station"]).strip(),
                datacenter=str(r["DataCenter"]).strip(),
                start=UTCDateTime(str(r["Start"])),
                end=UTCDateTime(str(r["End"])) + 86400,  # inclusive end date
                site=str(r["Site"]).strip(),
                latitude=float(r["Latitude"]),
                longitude=float(r["Longitude"]),
                elevation_m=float(r["Elevation"]),
            )
        )

    return station_rows

Let's try out the `read_csv` function and print out the rows.

In [5]:
stations = read_csv("IMUSH.csv", start_row=0, end_row=5)

for s in stations:
    print(s)

StationRow(station='KRES', datacenter='IRISDMC', start=UTCDateTime(2014, 5, 1, 0, 0), end=UTCDateTime(2017, 1, 1, 0, 0), site='KRES', latitude=47.758739, longitude=-122.29097, elevation_m=52.0)
StationRow(station='MA05', datacenter='IRISDMC', start=UTCDateTime(2014, 5, 1, 0, 0), end=UTCDateTime(2017, 1, 1, 0, 0), site='MA05', latitude=46.754669, longitude=-122.226189, elevation_m=488.0)
StationRow(station='MB05', datacenter='IRISDMC', start=UTCDateTime(2014, 5, 1, 0, 0), end=UTCDateTime(2017, 1, 1, 0, 0), site='MB05', latitude=46.620869, longitude=-122.281021, elevation_m=641.0)
StationRow(station='MB07', datacenter='IRISDMC', start=UTCDateTime(2014, 5, 1, 0, 0), end=UTCDateTime(2017, 1, 1, 0, 0), site='MB07', latitude=46.623779, longitude=-122.042389, elevation_m=878.0)
StationRow(station='MC06', datacenter='IRISDMC', start=UTCDateTime(2014, 5, 1, 0, 0), end=UTCDateTime(2017, 1, 1, 0, 0), site='MC06', latitude=46.552021, longitude=-122.157204, elevation_m=770.0)


Note that row indexing in a pandas table (DataFrame) starts at 0. Note which station appears in the first row, then change the `start_row` parameter to 1 and compare the result.
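
The effect of `start_row` and `end_row` can be sketched on a small stand-in table. The station codes are taken from the outputs in this tutorial; the one-column layout is a simplification for illustration only.

```python
import io

import pandas as pd

# One-column stand-in for the station table, in the order the outputs suggest.
df = pd.read_csv(io.StringIO("Station\nKRES\nMA05\nMB05\nMB07\nMC06\nMC08\n"))

first_five = df.iloc[0:5]["Station"].tolist()        # rows 0, 1, 2, 3, 4
next_five = df.iloc[1:6]["Station"].tolist()         # rows 1, 2, 3, 4, 5
everything = df.iloc[None:None]["Station"].tolist()  # None means "no bound"

print(first_five)
print(next_five)
print(everything)
```

The start index is inclusive and the end index is exclusive, so `0:5` and `1:6` both select five rows, shifted by one station.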

## Downloading MiniSeed files

Next we will download miniseed files and save them in GeoLab. Using `obspy`, we'll write a function that downloads a file for each station in a list we provide, requesting the data from EarthScope's FDSN service.

In [29]:
def download_miniseed(station_rows, *, starttime=None, endtime=None, output_dir="./seismic_data"):
    """
    Download miniseed files from EarthScope's FDSN service.

    Parameters:
    -----------
    station_rows : list of StationRow
        Stations to download, e.g. the list returned by read_csv()
    starttime : str or UTCDateTime, optional
        Start of the request window in format 'YYYY-MM-DD' or 'YYYY-MM-DDTHH:MM:SS'.
        If None, each station's own start time is used.
    endtime : str or UTCDateTime, optional
        End of the request window in format 'YYYY-MM-DD' or 'YYYY-MM-DDTHH:MM:SS'.
        If None, each station's own end time is used.
    output_dir : str, optional
        Directory to save the miniseed files (default: './seismic_data')

    Returns:
    --------
    str or None : Path to the last miniseed file saved, or None if nothing was saved

    Example:
    --------
    >>> download_miniseed(stations, starttime='2015-06-01', endtime='2015-06-30')
    """

    # default values
    network = "XD"
    location = "*"
    channel = "BH?"

    # Create output directory if it doesn't exist
    os.makedirs(output_dir, exist_ok=True)

    # Initialize EarthScope FDSN client
    client = Client('IRIS')  # IRIS is part of EarthScope

    filepath = None  # stays None if no request succeeds

    # Download waveform data
    for row in station_rows:
        # Use the override window if given, otherwise this station's own window.
        # Separate names keep the starttime/endtime arguments unchanged between iterations.
        t_start = UTCDateTime(starttime if starttime is not None else row.start)
        t_end = UTCDateTime(endtime if endtime is not None else row.end)

        try:
            st = client.get_waveforms(
                network=network,
                station=row.station,
                location=location,
                channel=channel,
                starttime=t_start,
                endtime=t_end
            )
        except FDSNNoDataException:
            print(f"No data for {network}.{row.station}; skipping")
            continue
        except Exception as exc:
            # Skip this station on any other request error, but report it
            print(f"Request failed for {network}.{row.station}: {exc}")
            continue

        # Create filename
        filename = f"{network}_{row.station}_{t_start.strftime('%Y%m%d')}_{t_end.strftime('%Y%m%d')}.mseed"
        filepath = os.path.join(output_dir, filename)

        # Save to miniseed file
        st.write(filepath, format='MSEED')
        print(f"Successfully saved to: {filepath}")

    return filepath
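
The intended choice of time window for each station can be sketched in isolation. `resolve_window` is a hypothetical helper, not part of the tutorial's code, and the sketch uses the standard library's `datetime` in place of `UTCDateTime` so that it runs without ObsPy.

```python
from datetime import datetime


def resolve_window(row_start, row_end, starttime=None, endtime=None):
    """Return the (start, end) to request: the override if given, else the row's own value."""
    start = starttime if starttime is not None else row_start
    end = endtime if endtime is not None else row_end
    return start, end


# Station window as printed earlier in this tutorial, and a one-month override.
row_start = datetime(2014, 5, 1)
row_end = datetime(2017, 1, 1)
june_start = datetime(2015, 6, 1)
june_end = datetime(2015, 6, 30)

full_window = resolve_window(row_start, row_end)
june_window = resolve_window(row_start, row_end, june_start, june_end)

print(full_window)
print(june_window)
```

Requesting a short override window rather than a station's full multi-year span keeps each download small, which suits data exploration.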

In [30]:
#---debug---#

# import logging

# # Configure logging to show HTTP request/response details
# try:
#     # Python 3
#     import http.client as http_client
# except ImportError:
#     # Python 2
#     import httplib as http_client

# http_client.HTTPConnection.debuglevel = 1
# logging.basicConfig() # you need to initialize logging, otherwise you will not see anything from requests
# logging.getLogger().setLevel(logging.DEBUG)
# requests_log = logging.getLogger("requests.packages.urllib3")
# requests_log.setLevel(logging.DEBUG)
# requests_log.propagate = True

stations = read_csv("IMUSH.csv", start_row=1, end_row=6)

filepath = download_miniseed(stations, starttime="2015-06-01T00:00:00", endtime="2015-06-30T00:00:00")

# http_client.HTTPConnection.debuglevel = 0  # uncomment together with the debug block above



Successfully saved to: ./seismic_data/XD_MA05_20150601_20150630.mseed
Successfully saved to: ./seismic_data/XD_MB05_20150601_20150630.mseed
Successfully saved to: ./seismic_data/XD_MB07_20150601_20150630.mseed
Successfully saved to: ./seismic_data/XD_MC06_20150601_20150630.mseed
Successfully saved to: ./seismic_data/XD_MC08_20150601_20150630.mseed
