# Accessing Data from EarthScope Web Services

![](images/Web_Services_Data_Flow.png)


## Getting Seismic Data from SAGE Web Services

The fdsnws-dataselect service provides access to time series data for specified channels and time ranges. Dataselect implements the [FDSN web service specification](https://www.fdsn.org/webservices/).

Data queries use SEED time series identifiers (network, station, location, and channel) together with a time range. Data can be returned in miniSEED, SAC zip, or GeoCSV format.

To create a request, the dataselect API takes these parameters at a minimum:

| parameter | example | discussion | default | type |
| --------- | ------- | ---------- | ------- | ---- |
| start[time] | 2010-02-27T06:30:00 | Specifies the desired start time for the miniSEED data. | | day/time |
| end[time] | 2010-02-27T10:30:00 | Specifies the desired end time for the miniSEED data. | | day/time |
| net[work] | IU | Select one or more network codes. Accepts wildcards and lists. Can be SEED codes or data center defined codes. | any | string |
| sta[tion] | ANMO | Select one or more SEED station codes. Accepts wildcards and lists. | any | string |
| loc[ation] | 00 | Select one or more SEED location identifiers. Accepts wildcards and lists. Use `--` for “Blank” location IDs (IDs containing 2 spaces). | any | string |
| cha[nnel] | BHZ | Select one or more SEED channel codes. Accepts wildcards and lists. | any | string |
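
The parameters in the table map directly onto a query string. The following is a minimal sketch of how such a URL could be assembled with the standard library; the helper name `build_query_url` is hypothetical, and the endpoint is assumed to be the SAGE dataselect service used in this section.

```python
from urllib.parse import urlencode

# Assumed endpoint: the SAGE dataselect service used in this section
DATASELECT_URL = "https://service.iris.edu/fdsnws/dataselect/1/query"


def build_query_url(net, sta, loc, cha, start, end):
    """Return a dataselect query URL for one channel and time window.

    Hypothetical helper for illustration only.
    """
    params = {
        "net": net,
        "sta": sta,
        # a blank location ID (two spaces) is requested as "--"
        "loc": loc if loc.strip() else "--",
        "cha": cha,
        "start": start,
        "end": end,
    }
    # urlencode percent-encodes reserved characters such as ":" in the times
    return f"{DATASELECT_URL}?{urlencode(params)}"


url = build_query_url("IU", "ANMO", "00", "BHZ",
                      "2010-02-27T06:30:00", "2010-02-27T10:30:00")
print(url)
```

Passing the same dictionary to `requests.get(..., params=params)` has the same effect; building the URL by hand is mainly useful for checking a query in a browser.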

To download a file, we can use the `requests` package to send the HTTP request to the dataselect web service. As discussed in the previous section, the request must include an authorization token, which can be acquired with the `get_token` function.

The `download_data` function takes the query parameters required by the FDSN web service specification and the directory where the data should be written. The function does three things. First, it requests an authorization token. Next, it creates a file name for the data. Finally, it sends the request to dataselect and writes the returned data to a file in that directory.

In [12]:
import requests, os
from pathlib import Path
from datetime import datetime
from earthscope_sdk import EarthScopeClient

# SAGE archive
# use HTTPS so the Bearer token is not sent in clear text
URL = "https://service.iris.edu/fdsnws/dataselect/1/query"

# function to get authorization token 
def get_token(token_path='./'):
    
    # refresh the token if it has expired
    client.ctx.auth_flow.refresh_if_necessary()

    token = client.ctx.auth_flow.access_token
    
    return token

def download_data(params, data_directory):

    # get authorization Bearer token
    token = get_token()

    # get year and day of year from the start time string
    start_date = datetime.strptime(params['start'], '%Y-%m-%dT%H:%M:%S')
    year = start_date.year
    day_of_year = start_date.timetuple().tm_yday


    # file name format: STATION.NETWORK.LOCATION.CHANNEL.YEAR.DAYOFYEAR.mseed
    file_name = ".".join([params["sta"], params["net"], params['loc'], params['cha'], str(year), "{:03d}".format(day_of_year), 'mseed'])
    
    
    r = requests.get(URL, params=params, headers={"authorization": f"Bearer {token}"}, stream=True)
    if r.status_code == requests.codes.ok:
        # save the file, writing the response body in chunks
        with open(Path(data_directory) / file_name, 'wb') as f:
            for data in r.iter_content(chunk_size=8192):
                f.write(data)
    else:
        # problem occurred
        print(f"failure: {r.status_code}, {r.reason}")
        

# create client to get token
client = EarthScopeClient()

# create directory for data
data_directory = "./miniseed_data"
os.makedirs(data_directory, exist_ok=True)

# parameters specifying the miniSEED file
params = {"net" : 'IU',
          "sta" : 'ANMO',
          "loc" : '00',
          "cha" : 'BHZ',
          "start": '2010-02-27T06:30:00',
          "end": '2010-02-27T10:30:00'}

download_data(params, data_directory)
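
Day of year is not the same as day of month: 27 February is day 058 of the year. The file-naming step can be checked on its own, without a network request. This is a sketch; `make_file_name` is a hypothetical helper name, and the `.mseed` suffix follows the download example.

```python
from datetime import datetime


def make_file_name(params):
    """Build STATION.NETWORK.LOCATION.CHANNEL.YEAR.DAYOFYEAR.mseed from query parameters."""
    start = datetime.strptime(params["start"], "%Y-%m-%dT%H:%M:%S")
    # %Y is the four-digit year; %j is the zero-padded day of year (001-366)
    return ".".join([params["sta"], params["net"], params["loc"], params["cha"],
                     start.strftime("%Y"), start.strftime("%j"), "mseed"])


example_params = {"net": "IU", "sta": "ANMO", "loc": "00", "cha": "BHZ",
                  "start": "2010-02-27T06:30:00", "end": "2010-02-27T10:30:00"}

print(make_file_name(example_params))  # ANMO.IU.00.BHZ.2010.058.mseed
```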


## Getting Geodetic Data from GAGE Web Services

The GAGE archive holds many types of data, ranging from GPS/GNSS data to borehole strain data. We will focus on GPS/GNSS data. Each type of data has its own API. Unlike dataselect, these API calls return information about the data or processed data products. The collected data are distributed by a file server and can be downloaded programmatically if you know the URL of the file.

In this example, we will download GNSS data in RINEX format. The script builds the URL for a station's file from the parameters that make up the URL and then downloads the file.

The GAGE base URL for GNSS data in RINEX format is `https://gage-data.earthscope.org/archive/gnss/rinex/obs/`.

Files are organized by year and the day of the year, e.g., `/2025/001/`. File names use this pattern: 

| station | day of year | 0. | two digit year | o.Z or d.Z |
|---------|-------------|----|----------------|-----|
| p034 | 001 |0. | 25 | d.Z |
| p034 | 001 |0. | 25 | o.Z |

The complete URL for this RINEX file:

`https://gage-data.earthscope.org/archive/gnss/rinex/obs/2025/001/p0340010.25d.Z`

> Note: files ending with `d.Z` are [hatanaka compressed files](https://www.unavco.org/data/gps-gnss/hatanaka/hatanaka.html) and files ending with `o.Z` are not hatanaka compressed. Hatanaka compressed files are much smaller but require software to read the data.

The same method used for downloading SAGE data can be used to download GAGE data once the URL is properly constructed.


In [32]:
import requests, os
from pathlib import Path
from earthscope_sdk import EarthScopeClient

client = EarthScopeClient()

BASE_URL= 'https://gage-data.earthscope.org/archive/gnss/rinex/obs/'

# function to get authorization token 
def get_token(token_path='./'):
    
    # refresh the token if it has expired
    client.ctx.auth_flow.refresh_if_necessary()

    token = client.ctx.auth_flow.access_token
    
    return token

# function to download data from GAGE archive
def download_file(url, data_directory):
    
    # get authorization Bearer token
    token = get_token()

    # the pathlib package (https://docs.python.org/3/library/pathlib.html#accessing-individual-parts) 
    # supports extracting the file name from the end of a path
    file_name = Path(url).name
    
    # request a file and provide the token in the Authorization header
    r = requests.get(url, headers={"authorization": f"Bearer {token}"}, stream=True)
    if r.status_code == requests.codes.ok:
        # save the file, writing the response body in chunks
        with open(Path(data_directory) / file_name, 'wb') as f:
            for data in r.iter_content(chunk_size=8192):
                f.write(data)
    else:
        # problem occurred
        print(f"failure: {r.status_code}, {r.reason}")

# function to create the URL used to download data
def create_url(year, day, station, compression):
    # using Python string formatting and slicing
    doy = '%03d' % (day) # converts day to a three-character zero-padded string, e.g. '001'
    two_digit_year = str(year)[2:] # converts integer to string and keeps the last two characters

    # using the Python join method to concatenate an array or list of strings
    file_path = '/'.join([str(year), doy]) # integer year converted to string for string join
    file_name = ''.join(['/', station, doy, '0.', two_digit_year, compression])
    url = ''.join([BASE_URL,file_path,file_name])

    return url

# create a directory for rinex data
directory_path = "./rinex_data"
os.makedirs(directory_path, exist_ok=True)

# request Hatanaka-compressed data from station p034 for January 1, 2025
year = 2025
day = 1
station = 'p034'
compression = 'd.Z'

# download the RINEX file
url = create_url(year, day, station, compression)
download_file(url, directory_path)

In this example, we've added a function to create the URL to the data. While this can be done more succinctly in a single line, the example demonstrates how to format parameters using Python string functions and how to join strings to form a URL.

A more succinct way to form the URL is to use string formatting. The following code could be added to the `download_file` function, along with the required parameters. However, this is less explicit.

```
doy = '{:03d}'.format(day)
two_digit_year = str(year)[2:]
url = 'https://gage-data.earthscope.org/archive/gnss/rinex/obs/{}/{}/{}{}0.{}d.Z'.format(year, doy, station, doy, two_digit_year)
```
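
When the input is a calendar date rather than a year and a day of year, the same URL pattern can be produced with `strftime` format codes. This is a sketch under that assumption; `rinex_url` and its `hatanaka` flag are hypothetical names, and the base URL and file name pattern are the ones described above.

```python
from datetime import date

BASE_URL = "https://gage-data.earthscope.org/archive/gnss/rinex/obs/"


def rinex_url(station, day, hatanaka=True):
    """Build the RINEX observation file URL for a station and calendar date."""
    suffix = "d.Z" if hatanaka else "o.Z"
    # %Y = four-digit year, %j = zero-padded day of year, %y = two-digit year
    directory = day.strftime("%Y/%j")
    file_name = f"{station}{day.strftime('%j')}0.{day.strftime('%y')}{suffix}"
    return f"{BASE_URL}{directory}/{file_name}"


print(rinex_url("p034", date(2025, 1, 1)))
# https://gage-data.earthscope.org/archive/gnss/rinex/obs/2025/001/p0340010.25d.Z
```

Working from a `date` object avoids counting days by hand, for example when looping over a range of dates.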

## Data Access to AWS S3

![](images/cloud_native_data_access.png)

Object storage in the cloud is a cost-effective way to hold and distribute large collections of data. An object consists of the data, metadata, and a unique identifier. Objects are accessed through an application programming interface (API). EarthScope uses Amazon Web Services (AWS) Simple Storage Service (S3) to store and distribute seismic and geodetic data.

AWS S3 supports streaming data directly into memory. This is a major advantage when analyzing large amounts of data because writing data to a drive and reading it back can consume a large share of the analysis time. When data is streamed directly into memory, it is immediately available for processing.

### Buckets and Keys

Objects in S3 are stored in containers called `buckets`. Each object is identified by a unique identifier called a `key`. Objects are addressed by a combination of the web service endpoint, a bucket name, and a key. Unlike a hierarchical file system on your computer, S3 doesn't have directories. Instead, it has prefixes, which act as filters that logically group data. Consider the following example, whose parts we can decipher:

> s3://ncedc-pds/continuous_waveforms/BK/2022/2022.231/MERC.BK.HNZ.00.D.2022.231

- s3 - service name
- ncedc-pds - bucket name
- continuous_waveforms - prefix
- BK - (prefix) seismic network name 
- 2022 - (prefix) year 
- 2022.231 - (prefix) year and day of year
- MERC.BK.HNZ.00.D.2022.231 - (key) station.network.channel.location.data quality.year.day of year

Like a web service file URL, the object key is used to request the data.
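
The naming pattern above can be turned into code in both directions: building a key from its parts, and splitting a full address into bucket and key. This is a sketch that assumes the NCEDC layout shown in the example, written in the common `s3://bucket/key` form; both function names are hypothetical, and the `D` field is assumed to be fixed as in the example.

```python
from urllib.parse import urlparse


def build_waveform_key(network, station, channel, location, year, day_of_year, quality="D"):
    """Build an object key following the layout of the example above."""
    year_doy = f"{year}.{day_of_year:03d}"
    file_name = ".".join([station, network, channel, location, quality, year_doy])
    # the prefixes act like folders: continuous_waveforms/NETWORK/YEAR/YEAR.DOY/
    return "/".join(["continuous_waveforms", network, str(year), year_doy, file_name])


def split_s3_uri(uri):
    """Split 's3://bucket/key' into (bucket, key)."""
    parsed = urlparse(uri)
    return parsed.netloc, parsed.path.lstrip("/")


key = build_waveform_key("BK", "MERC", "HNZ", "00", 2022, 231)
print(key)
print(split_s3_uri("s3://ncedc-pds/" + key))
```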

### S3 Buckets with Public Read Access

S3 buckets can be configured for public read access, which lets you access objects without providing credentials. The [`boto3`](https://boto3.amazonaws.com/v1/documentation/api/latest/index.html) Python package provides libraries for working with AWS services, including S3. Boto3 provides two interfaces for interacting with AWS services. The `client` interface is a low-level, fine-grained interface that closely follows the AWS API for a service. The `resource` interface is a high-level interface that wraps the `client` interface. AWS stopped development on the resource interface in `boto3` in 2023, so we will use the `client` interface when working with S3.

The following example reads a miniSEED file from the Northern California Earthquake Data Center (NCEDC). The trace data is for the 2014 Napa earthquake. GeoLab's default environment includes both `boto3` and `obspy` packages and we can import them without installation. We establish the connection to S3 by creating a client that specifies that requests are unsigned. This means that the S3 bucket allows public access and does not require credentials. The client calls the `get_object` method with the bucket name and key for the miniSEED object. 

In [19]:
import boto3
from botocore import UNSIGNED
from botocore.config import Config
import io
from obspy import read

s3 = boto3.client('s3', config = Config(signature_version = UNSIGNED), region_name='us-west-2')

BUCKET_NAME = 'ncedc-pds'
KEY = 'continuous_waveforms/BK/2014/2014.236/PACP.BK.HHN.00.D.2014.236'

response = s3.get_object(Bucket=BUCKET_NAME, Key=KEY)
data_stream = io.BytesIO(response['Body'].read())

# Parse with ObsPy
st = read(data_stream)

# Print the ObsPy Streams
print(st)

1 Trace(s) in Stream:
BK.PACP.00.HHN | 2014-08-24T00:00:00.008393Z - 2014-08-24T23:59:59.998393Z | 100.0 Hz, 8640000 samples
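
Because prefixes act as filters, they can also be used to discover which keys exist before requesting an object. The sketch below uses the `list_objects_v2` paginator of the `boto3` S3 client; the helper name `list_keys` and its `limit` argument are hypothetical, and the function expects a client like the unsigned `s3` client created above.

```python
def list_keys(s3_client, bucket, prefix, limit=10):
    """Return up to `limit` object keys in `bucket` that start with `prefix`."""
    keys = []
    # S3 returns listings in pages; the paginator requests pages as needed
    paginator = s3_client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # "Contents" is absent when a page has no matching objects
        for obj in page.get("Contents", []):
            keys.append(obj["Key"])
            if len(keys) >= limit:
                return keys
    return keys


# Usage with the unsigned client from the example above (requires network access):
# list_keys(s3, "ncedc-pds", "continuous_waveforms/BK/2014/2014.236/")
```

Stopping at `limit` keeps the listing small, which matters for prefixes that hold many objects.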


### AWS Temporary Credentials (to be implemented)

As with EarthScope web services, you will need an EarthScope login to request an authorization token. The token can be used to request temporary AWS credentials. Outside of public buckets, AWS requires credentials to interact with S3 or any other AWS service. Requests are cryptographically signed using AWS credentials (access key ID, secret access key, and optionally a session token). In the near future, EarthScope will issue temporary credentials that will allow you to access SAGE and GAGE data.

