The aim of this Jupyter Notebook is to figure out how to organize and convert data and metadata from Alan Linde.

The waveform data should all end up in the DAYS directory in a BUD archive.

So far I have only looked at the early-data directory, whose files are in REFTEK/PASSCAL SEG-Y format: a 240-byte trace header followed by the trace samples, according to Bruce Beaudoin. ObsPy does not seem able to read these trace files, but a help request to the listserv might lead to success. I think I also tried converting them with rt2ms, and that failed; I may need a much older version of rt2ms.

The new directory is basically a place to play with data, e.g. extract *tar.Z files.

Hopefully a lot more data are in the Q330 directories.

In [None]:
import os
import glob
from obspy import read  # used by the SEG-Y reading attempts in section 2.2
import libseisGT

In [None]:
TOPDIR = '/Users/thompsong/calipso_data'
NEWDIR = os.path.join(TOPDIR, 'new')       

# 1. READ DATA FROM THE early-data/bh? DIRECTORIES

## 1.1 Process early-data/bh?/*.sac
I accidentally moved or deleted these files after they were combined into the BUD archive.

In [None]:
wfdirs = glob.glob(os.path.join(TOPDIR, 'early-data', 'bh?'))
filematch = '*.sac'

libseisGT.process_wfdirs(wfdirs, filematch)

## 1.2 Process early-data/bh?/DT*.???

In [None]:
wfdirs = glob.glob(os.path.join(TOPDIR, 'early-data', 'bh?'))
filematch = 'DT*.???'

libseisGT.process_wfdirs(wfdirs, filematch)

## 1.3 Process early-data/bh?/*LH?.

In [None]:
wfdirs = glob.glob(os.path.join(TOPDIR, 'early-data', 'bh?'))
filematch = '*LH?.'

libseisGT.process_wfdirs(wfdirs, filematch)


## 1.4 Process early-data/bh5/*_folder/DT*.???

In [None]:
wfdirs = glob.glob(os.path.join(TOPDIR, 'early-data', 'bh5', '*_folder'))
filematch = 'DT*.???'

libseisGT.process_wfdirs(wfdirs, filematch)    

# 2. PROCESSING *TAR.Z FILES IN early-data/bh5

## 2.1 Try to uncompress *tar.Z files (JUST 2 FOR NOW)

In [None]:
wfdirs = glob.glob(os.path.join(TOPDIR, 'early-data', 'bh?'))
for wfdir in wfdirs:
    print('Processing %s' % wfdir)
    compressedfiles = glob.glob(os.path.join(wfdir, '*.tar.Z'))
    for compressedfile in compressedfiles[0:2]:
        compressedfileroot = os.path.basename(compressedfile).replace('.tar.Z','')
        decompressDir = os.path.join(NEWDIR, compressedfileroot)
        decompressFile(compressedfile, decompressDir)
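
The decompressFile helper called above is not defined or imported anywhere in this notebook. As a rough stand-in, the sketch below shells out to the system `tar`. It assumes the local `tar` can read compress (.Z) archives directly, which depends on the machine. The names `build_untar_command` and `decompress_tar_z` are hypothetical and not part of libseisGT.

```python
import os
import subprocess


def build_untar_command(compressedfile, decompress_dir):
    """Return the tar argument list that extracts compressedfile into decompress_dir."""
    # Assumption: the local tar auto-detects compress (.Z) input when reading.
    return ['tar', '-xf', compressedfile, '-C', decompress_dir]


def decompress_tar_z(compressedfile, decompress_dir):
    """Hypothetical stand-in for decompressFile: extract one *.tar.Z archive."""
    # tar -C needs the target directory to exist already
    os.makedirs(decompress_dir, exist_ok=True)
    cmd = build_untar_command(compressedfile, decompress_dir)
    # check=True raises CalledProcessError if tar exits with a non-zero status
    subprocess.run(cmd, check=True)
    return decompress_dir
```

Keeping the command construction separate from the call makes it easy to print the command and try it by hand in a terminal if extraction fails.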

## 2.2 Try to read a REFTEK/PASSCAL/SEG-Y file from an uncompressed *TAR.Z file

### 2.2.1 Using ObsPy blindly

TypeError: Unknown format for file /Users/thompsong/calipso_data/new/gerstr.161/03.161.09.04.50.7089.1

In [None]:
SEGYFILE = '/Users/thompsong/calipso_data/new/gerstr.161/03.161.09.04.50.7089.1'
#SEGYFILE2 = '/Users/thompsong/calipso_data/new/gerstr.119/03.119.14.13.50.7089.1'
st = read(SEGYFILE)

### 2.2.2 Force ObsPy to treat it as SEGY format with big-endian

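SEGYError: Unable to determine the endianness of the file. Please specify it.

Note: this error was raised with endian='big'. The SEG-Y plugin behind read expects the byte order as byteorder ('>' for big-endian), so the setting was probably never applied. The cell below uses byteorder instead.

In [None]:
st = read(SEGYFILE, format='SEGY', byteorder='>')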

### 2.2.3 Use _read_segy

SEGYError: Unable to determine the endianness of the file. Please specify it.

In [None]:
from obspy.io.segy.segy import _read_segy
segy = _read_segy(SEGYFILE)

### 2.2.4 Use iread_segy

SEGYError: Unable to determine the endianness of the file. Please specify it.

In [None]:
from obspy.io.segy.segy import iread_segy
for tr in iread_segy(SEGYFILE):
    # Each Trace's stats attribute will have references to the file
    # headers and some more information.
    tf = tr.stats.segy.textual_file_header
    bf = tr.stats.segy.binary_file_header
    tfe = tr.stats.segy.textual_file_header_encoding
    de = tr.stats.segy.data_encoding
    e = tr.stats.segy.endian
    # Also do something meaningful with each Trace.
    print(int(tr.data.sum() * 1E9))

### 2.2.5 Read as an SEGYTrace object with ObsPy

AttributeError: 'str' object has no attribute 'fileno'

In [None]:
from obspy.io.segy.segy import SEGYTrace
sgtrace = SEGYTrace(file=SEGYFILE)
sgtrace._read_trace()

### 2.2.6 Read as a Seismic Unix object with ObsPy

ValueError: hour must be in 0..23

In [None]:
from obspy.io.segy.segy import iread_su
for tr in iread_su(SEGYFILE, endian='big'):
    print(tr)

### 2.2.7 Reading as 4-byte integer

SEG-Y is a big-endian format. In Python's struct module, '>i' reads a big-endian 4-byte signed integer and '<i' a little-endian one. However, SEG-Y allows several data sample formats:
- 4-byte IBM float
- 4-byte 2s-complement integer
- 2-byte 2s-complement integer
- 4-byte fixed-point with gain
- 4-byte IEEE float
- 1-byte 2s-complement integer

See page 6 of https://seg.org/Portals/0/SEG/News%20and%20Resources/Technical%20Standards/seg_y_rev1.pdf
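
The formats listed above correspond to SEG-Y rev 1 data sample format codes 1, 2, 3, 4, 5 and 8. A minimal sketch of how those codes might map onto big-endian `struct` formats follows. `SEGY_SAMPLE_FORMATS` and `unpack_samples` are hypothetical names, and codes 1 (IBM float) and 4 (fixed-point with gain) have no `struct` equivalent, so they are mapped to `None`. In standard SEG-Y the code is stored in the binary file header; if these REFTEK/PASSCAL files really have only a trace header, the code would have to be guessed.

```python
import struct

# SEG-Y rev 1 data sample format code -> big-endian struct format for one sample
SEGY_SAMPLE_FORMATS = {
    1: None,   # 4-byte IBM float: no struct equivalent, needs custom conversion
    2: '>i',   # 4-byte two's complement integer
    3: '>h',   # 2-byte two's complement integer
    4: None,   # 4-byte fixed-point with gain: no struct equivalent
    5: '>f',   # 4-byte IEEE float
    8: '>b',   # 1-byte two's complement integer
}


def unpack_samples(raw, format_code):
    """Decode raw trace bytes into a list of samples for the given format code."""
    fmt = SEGY_SAMPLE_FORMATS.get(format_code)
    if fmt is None:
        raise ValueError('No struct format for SEG-Y sample format code %r' % format_code)
    size = struct.calcsize(fmt)
    nsamples = len(raw) // size  # drop any trailing partial sample
    return [value for (value,) in struct.iter_unpack(fmt, raw[:nsamples * size])]
```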

In [None]:
import struct
import numpy as np
from obspy.core import Trace
import matplotlib.pyplot as plt

SEGYFILE2 = "/Users/thompsong/calipso_data/new/gerstr.119/03.119.14.13.50.7089.1"
with open(SEGYFILE2, "rb") as fin:
    # Selected fields of the 240-byte trace header; comments give 0-based byte offsets
    linetraceno, = struct.unpack('>i', fin.read(4)) # 0-3
    reeltraceno, = struct.unpack('>i', fin.read(4)) # 4-7
    fieldrecordno, = struct.unpack('>i', fin.read(4)) # 8-11
    fieldtraceno, = struct.unpack('>i', fin.read(4)) # 12-15
    energysourcepointno, = struct.unpack('>i', fin.read(4)) # 16-19
    ensembleno, = struct.unpack('>i', fin.read(4)) # 20-23
    traceno, = struct.unpack('>i', fin.read(4)) # 24-27
    traceidcode, = struct.unpack('>h', fin.read(2)) # 28-29
    crap = fin.read(4) # 30-33
    datause, = struct.unpack('>h', fin.read(2)) # 34-35
    crap2 = fin.read(120) # 36-155
    year, = struct.unpack('>h', fin.read(2)) # 156-157
    doy, = struct.unpack('>h', fin.read(2)) # 158-159
    crap3 = fin.read(80) # 160-239, the rest of the 240-byte header
    raw = fin.read() # everything after the header is sample data
print(year, doy)
print(datause)
print(linetraceno, reeltraceno, fieldrecordno)

# '>i4' is a big-endian 4-byte integer ('<i4' would be little-endian, as on Intel chips).
# count drops any trailing partial sample.
y = np.frombuffer(raw, dtype='>i4', count=len(raw) // 4)

plt.plot(y)
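
Several of the ObsPy attempts above stopped at "Unable to determine the endianness". One rough way to check by hand is to decode the year and day-of-year fields (0-based byte offsets 156-159, as in the cell above) under both byte orders and see which gives plausible values. The name `guess_byteorder` and the plausibility limits are assumptions for illustration (a two-digit year would need different limits), and the sketch presumes the REFTEK/PASSCAL header keeps these two fields at the standard SEG-Y offsets.

```python
import struct


def guess_byteorder(header, min_year=1970, max_year=2100):
    """Return '>' or '<' if exactly one byte order gives a plausible date, else None."""
    plausible = []
    for byteorder in ('>', '<'):
        # Two consecutive 2-byte integers: year at offset 156, day of year at 158
        year, doy = struct.unpack_from(byteorder + 'hh', header, 156)
        if min_year <= year <= max_year and 1 <= doy <= 366:
            plausible.append(byteorder)
    return plausible[0] if len(plausible) == 1 else None
```

It could be called on the first 240 bytes of a file opened in binary mode, for example on the result of fin.read(240). A return value of None means neither or both byte orders looked plausible, so the header layout itself may differ from what is assumed here.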

# 3. PROCESSING Q330 DATA

## 3.1 Process data from Q330data/TR

The current problem here is that the Q330data directories contain data spanning not just a few days but years. I need to somehow create a wfdisc-like dataframe index that works day by day.

In [None]:
wfdirs = glob.glob(os.path.join(TOPDIR, 'Q330data', 'TR'))
filematch = 'TR*.???'

libseisGT.process_wfdirs(wfdirs, filematch) 

## 3.2 Process data from Q330data/O1

In [None]:
station = 'O1'
put_away = False
#wfdirs = glob.glob(os.path.join(TOPDIR, 'Q330data', station, '2005'))
wfdirs = glob.glob(os.path.join(TOPDIR, 'Q330data', station, '2003'))                                
filematch = '%s_*' % station

libseisGT.process_wfdirs(wfdirs, filematch, put_away) 

I got the following trace from the code above:

    QT.429.O1.BHZ | 2005-01-02T15:38:44.388396Z - 2006-01-01T05:35:36.268396Z | 50.0 Hz, 1570670595 samples

That is a single trace spanning almost a full year, so I need to walk through the wfdisc dataframe and split it into days.
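
As a starting point for the day splitting, here is a minimal sketch that cuts a start/end time range at midnight boundaries using only the standard library. `day_windows` is a hypothetical name, and the datetimes are assumed to be naive UTC values. Each window could then be used to cut the long trace, for instance with ObsPy's `Stream.slice` after converting the datetimes to `UTCDateTime`.

```python
from datetime import datetime, timedelta


def day_windows(start, end):
    """Yield (window_start, window_end) pairs covering start..end, split at midnights."""
    window_start = start
    while window_start < end:
        # Midnight at the start of the following day
        next_midnight = datetime(window_start.year, window_start.month,
                                 window_start.day) + timedelta(days=1)
        window_end = min(next_midnight, end)
        yield window_start, window_end
        window_start = window_end
```

The first and last windows are usually partial days, which matters when deciding whether to append to an existing day file in the BUD archive or create a new one.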

# 4. LOADING AND PLOTTING A BUD FILE

In [None]:
BUDDIR = os.path.join(TOPDIR, 'DAYS')
st = libseisGT.BUD_load_day(BUDDIR, 2005, 167)
if st:
    st.plot(equal_scale=False);
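
libseisGT.BUD_load_day is not shown in this notebook, so how it locates files is not known here. For orientation, the sketch below builds the path of one day file under the assumption that the DAYS directory follows the common BUD layout NET/STA/STA.NET.LOC.CHAN.YEAR.DOY. The function name `bud_day_path` and the example codes in the comment are hypothetical.

```python
import os


def bud_day_path(buddir, net, sta, loc, chan, year, doy):
    """Build the path of one day file, assuming a NET/STA/STA.NET.LOC.CHAN.YEAR.DOY layout."""
    # Year is zero-padded to 4 digits and day of year to 3 digits
    filename = '%s.%s.%s.%s.%04d.%03d' % (sta, net, loc, chan, year, doy)
    return os.path.join(buddir, net, sta, filename)


# Example with made-up codes: bud_day_path('DAYS', 'XX', 'STA1', '', 'BHZ', 2005, 167)
```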