Generate an ASDFStackStore from a S3-hosted NumpyStackStore #19

Closed
mdenolle opened this issue Oct 17, 2023 · 3 comments

@mdenolle
Contributor

Write a script or function that downloads an S3-hosted NumpyStackStore and converts it locally, on the user's machine, to an ASDFStackStore.

@carlosgjs
Collaborator

carlosgjs commented Oct 17, 2023

@mdenolle here's a small script that reads from s3/numpy and writes to our ASDFStackStore:

Feb 22, 2024: Updated by Yiyu so that the imports use the new noisepy.seis.io package.

import os

from noisepy.seis.io.asdfstore import ASDFStackStore
from noisepy.seis.io.numpystore import NumpyStackStore

# S3 location of the numpy stack store; anon=False means AWS credentials are required
stack_data_path = "s3://scoped-noise/scedc_CI_2022_stack/"
S3_STORAGE_OPTIONS = {"s3": {"anon": False}}
stack_store = NumpyStackStore(stack_data_path, storage_options=S3_STORAGE_OPTIONS)

# Get the list of station pairs (~47k pairs)
pairs = stack_store.get_station_pairs()
# Get the first timespan available for the first pair
ts = stack_store.get_timespans(*pairs[0])[0]
print(f"Timespan: {ts}")

# Read the stacks for the first 10 pairs from S3/numpy
stacks_10 = stack_store.read_bulk(ts, pairs[0:10])

# Write them to ASDF
output = "./asdf_data"
os.makedirs(output, exist_ok=True)
asdf_store = ASDFStackStore(output)
for (src, rec), stacks in stacks_10:
    asdf_store.append(ts, src, rec, stacks)
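
The script above converts only the first 10 pairs. To convert all ~47k pairs without requesting everything in one call, one option is to loop over the pair list in fixed-size batches. This is a minimal sketch; the helper name `batched` and the batch size are illustrative and not part of noisepy:

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batched(items: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield successive lists of at most `size` items (hypothetical helper)."""
    if size < 1:
        raise ValueError("size must be at least 1")
    it = iter(items)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


# Possible use with the stores from the script above (not executed here):
# for batch in batched(pairs, 100):
#     for (src, rec), stacks in stack_store.read_bulk(ts, batch):
#         asdf_store.append(ts, src, rec, stacks)
```

A smaller batch size means more round trips to S3 but less data held in memory at once; the right value depends on the stack sizes and the network.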

Note that the ASDFStackStore implementation creates one file per pair (that's what the original implementation did):

asdf_data
└── CI.ABL
    ├── CI.ABL_CI.ABL.h5
    ├── CI.ABL_CI.ACP.h5
     ....
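
To check which pairs have already been written, the output path for a pair can be rebuilt from the station names. This sketch assumes the layout is exactly `<root>/<src>/<src>_<rec>.h5`, as the listing above suggests; that pattern is inferred from the listing rather than taken from the ASDFStackStore code, and `pair_file` is a hypothetical helper:

```python
from pathlib import Path


def pair_file(root: str, src: str, rec: str) -> Path:
    """Build the expected per-pair file path (hypothetical helper).

    Assumed layout, inferred from the directory listing:
    <root>/<src>/<src>_<rec>.h5
    """
    return Path(root) / src / f"{src}_{rec}.h5"


path = pair_file("./asdf_data", "CI.ABL", "CI.ACP")
print(path.as_posix())
```

Calling `path.exists()` then tells you whether that pair's file is already on disk, which can be handy when resuming an interrupted conversion.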

@mdenolle
Contributor Author

mdenolle commented Dec 1, 2023

@carlosgjs could you help us debug this? We are trying the snippet above, but NumpyStackStore no longer has read_bulk; it has "read" instead. Could you guide us in fixing that and update the tutorials?
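
One possible stopgap, sketched under assumptions: emulate the bulk read by calling a per-pair reader in a loop and returning the same `((src, rec), stacks)` tuples that the write loop in the snippet expects. The exact signature of `read` is not confirmed in this thread, so the helper below takes the reader as a callable rather than calling the store directly; `read_pairs` is a hypothetical name:

```python
from typing import Callable, Iterable, List, Tuple, TypeVar

S = TypeVar("S")  # station type
D = TypeVar("D")  # whatever the per-pair reader returns (e.g. a list of stacks)


def read_pairs(
    read_one: Callable[[S, S], D],
    pairs: Iterable[Tuple[S, S]],
) -> List[Tuple[Tuple[S, S], D]]:
    """Call `read_one(src, rec)` for each pair and keep the pair with its data."""
    return [((src, rec), read_one(src, rec)) for src, rec in pairs]


# Possible use (assumption: `read` accepts the timespan, source, and receiver;
# drop `ts` from the lambda if the installed version does not take it):
# stacks_10 = read_pairs(lambda s, r: stack_store.read(ts, s, r), pairs[0:10])
```

Passing the reader as a callable keeps the helper independent of whichever `read` signature the installed version exposes.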

@mdenolle
Contributor Author

@niyiyu and I think this function should live in the io package once it's done.

@niyiyu niyiyu transferred this issue from noisepy/NoisePy Feb 22, 2024
@niyiyu niyiyu closed this as completed Mar 5, 2024