# Vibroseis data

**[Download the data from source](http://www.geofizyka.pl/2D_Land_vibro_data_2ms.tgz) or from [Agile's S3 bucket](https://s3.amazonaws.com/agilegeo/2D_Land_vibro_data_2ms.tgz).**

This prestack 2D land Vibroseis dataset was donated to the public domain by [Geofizyka Torun, Poland](http://www.geofizyka.pl/).

More info about this line:

- Info about this line [on SEG Wiki](http://wiki.seg.org/wiki/2D_Vibroseis_Line_001). 
- A [Madagascar tutorial](http://ahay.org/wikilocal/docs/school10.pdf) using this line, by Yang Liu.
- A [FreeUSP tutorial](http://www.freeusp.org/RaceCarWebsite/TechTransfer/Tutorials/Processing_2D/Processing_2D.html) using this line, by Paul Garossino.

In [None]:
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

In [None]:
import obspy
obspy.__version__

In [None]:
ls -l ../data/poland

We'll use this helper function later.

In [None]:
def view_header(string, width=80):
    try:
        # Make sure we don't have a ``bytes`` object.
        string = string.decode()
    except AttributeError:
        # String is already a string, carry on.
        pass
    lines = int(np.ceil(len(string) / width))
    result = ''
    for i in range(lines):
        line = string[i*width:i*width+width]
        result += line + (width-len(line))*' ' + '\n'
    print(result)
    return
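
To see what the helper does with a concrete input, here is a minimal sketch. `wrap_header` is a hypothetical variant that returns the lines instead of printing them, and the two-card header is made up for illustration, not taken from the real file.

```python
def wrap_header(string, width=80):
    """Split a header into fixed-width lines, padding the last one."""
    if isinstance(string, bytes):
        string = string.decode()
    lines = [string[i:i + width] for i in range(0, len(string), width)]
    return [line.ljust(width) for line in lines]

# Hypothetical two-card header, not taken from the real file.
fake = 'C 1 CLIENT'.ljust(80) + 'C 2 LINE'.ljust(80)
cards = wrap_header(fake.encode())

for card in cards:
    print(repr(card.rstrip()))
```

Returning the lines makes it easy to pick out a single card, such as `cards[1]`, without reading the whole printout.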

## Load data

In [None]:
filename = '../data/poland/Line_001.sgy'

In [None]:
from obspy.io.segy.segy import _read_segy
section = _read_segy(filename)

The file-wide header:

In [None]:
view_header(section.textual_file_header)

The important line is this:

    C 5 DATA TRACES/RECORD: 282  AUXILIARY TRACES/RECORD:  2    CDP FOLD            

There are 282 data traces, plus 2 auxiliary traces, so a total of **284 traces in each record**.
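
Rather than reading the count off by eye, the two numbers can be pulled out of that header line with a regular expression. This is only a sketch: the pattern assumes the labels appear exactly as in the line quoted above.

```python
import re

line = 'C 5 DATA TRACES/RECORD: 282  AUXILIARY TRACES/RECORD:  2    CDP FOLD'

# Capture the integer that follows each of the two labels.
pattern = r'DATA TRACES/RECORD:\s*(\d+)\s+AUXILIARY TRACES/RECORD:\s*(\d+)'
match = re.search(pattern, line)
n_data, n_aux = (int(g) for g in match.groups())

traces_per_record = n_data + n_aux
print(n_data, n_aux, traces_per_record)
```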

Let's also check a trace header:

In [None]:
section.traces[3].header

There's also a readme file:

In [None]:
!cat ../data/poland/Line_001.TXT

This might be useful, but remember not to believe anything you read.

## Explore and organize the data

First we'll collect the traces and reshape them into a volume.

In [None]:
raw = np.vstack([t.data for t in section.traces])

In [None]:
raw.shape

First 1000 traces:

In [None]:
plt.figure(figsize=(18,8))
plt.imshow(raw[:1000, :].T, cmap="Greys", vmin=-.1, vmax=.1, aspect=0.25, interpolation='none')
plt.colorbar(shrink=0.5)
plt.show()

Recall that there are 284 traces (282 data traces plus 2 auxiliary traces) per ensemble. We can use the `reshape` trick of passing `-1` as one of the dimensions, so that NumPy computes that axis on the fly from the other two. We'll pass the last dimension of the input data so the shape doesn't change in that dimension.

In [None]:
data = raw.reshape((-1, 284, raw.shape[-1]))
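
The same reshape can be checked on a small synthetic array. The sizes below (3 records of 4 traces with 5 samples each) are made up so the result is easy to verify by hand.

```python
import numpy as np

# Synthetic stand-in: 3 records of 4 traces, 5 samples per trace.
n_records, n_traces, n_samples = 3, 4, 5
flat = np.arange(n_records * n_traces * n_samples).reshape(-1, n_samples)

# -1 lets NumPy infer the number of records from the other two axes.
volume = flat.reshape((-1, n_traces, flat.shape[-1]))
print(flat.shape, volume.shape)

# Trace 1 of record 2 is row 2*4 + 1 of the flat array.
same = np.array_equal(volume[2, 1], flat[2 * n_traces + 1])
print(same)
```

If the total number of traces is not a multiple of the traces per record, `reshape` raises a `ValueError`, which is a useful sanity check on the count taken from the header.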

In [None]:
plt.figure(figsize=(18,8))
plt.imshow(data[90, :, :].T, cmap="Greys", vmin=-1, vmax=1, aspect=0.1, interpolation='none')
plt.colorbar(shrink=0.5)
plt.show()

The two auxiliary traces sit at the start of each ensemble. Let's pull those out so we have 'pure' gathers.

In [None]:
gathers = data[:, 2:, :]

vm = np.percentile(gathers, 99)

plt.figure(figsize=(18,8))
plt.imshow(gathers[0, :, :].T, cmap="Greys", vmin=-vm, vmax=vm, aspect=0.1, interpolation='none')
plt.colorbar(shrink=0.5)
plt.show()
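
The display limit `vm` above comes from a percentile rather than the maximum amplitude. As a rough illustration of why, the sketch below uses random numbers with one artificial spike (synthetic values, not the survey data) and compares the two choices.

```python
import numpy as np

rng = np.random.default_rng(0)
amps = rng.normal(0, 1, size=10_000)
amps[0] = 1000.0  # One artificial spike.

# The absolute maximum is set entirely by the spike.
vm_max = np.abs(amps).max()

# The 99th percentile ignores it and reflects the bulk of the data.
vm_99 = np.percentile(amps, 99)

print(vm_max, vm_99)
```

With the maximum as the limit, almost every sample would map to mid-grey. The percentile keeps the bulk of the amplitudes visible.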

Let's go back and look at that zeroth trace. We'll just look at the one in the 91st gather (index 90):

In [None]:
t90 = data[90,:,:]

In [None]:
plt.figure(figsize=(16,3))
plt.plot(t90[0,:])
plt.show()

In [None]:
#np.savetxt("../data/poland_wavelet.txt", t90[0,:])

## Source and receiver positions

Let's look at the source and receiver data.

In [None]:
!head -25 ../data/poland/Line_001.RPS

The obvious way to load this sort of data is `pandas`...

In [None]:
names = ['Record', 'Point', 'Static', 'Easting', 'Northing', 'Elevation']
cols = [0, 1, 2, 7, 8, 9]

In [None]:
import pandas

rcv = pandas.read_csv('../data/poland/Line_001.RPS',
                      sep=r'\s+',
                      skiprows=20,
                      usecols=cols,
                      names=names,
                      )

In [None]:
rcv.head()

In [None]:
rcv.describe()
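
To show how `usecols` and `names` pair up, here is a small self-contained sketch. The rows are made-up, whitespace-delimited text with ten columns and are not taken from `Line_001.RPS`. The two header lines stand in for the 20 skipped in the real file.

```python
import io
import pandas as pd

# Made-up rows: illustrative values only, not the real survey file.
text = """H00 header line to skip
H01 another header line
R 701 1 0 0 0 0 684590.2 3837867.6 38.2
R 702 1 0 0 0 0 684593.1 3837864.9 38.5
"""

names = ['Record', 'Point', 'Static', 'Easting', 'Northing', 'Elevation']
cols = [0, 1, 2, 7, 8, 9]

# Each name is assigned to the column at the matching position in cols.
df = pd.read_csv(io.StringIO(text), sep=r'\s+', skiprows=2,
                 usecols=cols, names=names)
print(df)
```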

Hopefully the source file has the same layout...

In [None]:
!head -25 ../data/poland/Line_001.SPS

It does!

In [None]:
src = pandas.read_csv('../data/poland/Line_001.SPS',
                      sep=r'\s+',
                      skiprows=20,
                      usecols=cols,
                      names=names,
                      )

In [None]:
src.head()

Now plot them together.

In [None]:
plt.scatter(src.Easting, src.Northing, c='r', lw=0, s=3, alpha=0.5, label='src')
plt.scatter(rcv.Easting, rcv.Northing, c='b', lw=0, s=2, alpha=0.4, label='rcv')
plt.legend(loc=2)
plt.show()

In [None]:
!head -25 ../data/poland/Line_001.XPS

## Brute stack

We can stack the traces as they are, without any noise suppression, NMO correction, etc.

In [None]:
gathers.shape

In [None]:
brute = np.mean(gathers, axis=1)
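
Averaging over the trace axis collapses each gather to a single trace. The sketch below shows the effect on synthetic data. It assumes an idealised case in which every trace carries the same flat signal plus random noise, which real gathers without NMO correction do not satisfy.

```python
import numpy as np

rng = np.random.default_rng(42)
n_gathers, n_traces, n_samples = 4, 100, 50

# The same signal on every trace, plus unit-variance random noise.
signal = np.sin(np.linspace(0, 4 * np.pi, n_samples))
noise = rng.normal(0, 1, size=(n_gathers, n_traces, n_samples))
synthetic = signal + noise  # signal broadcasts over the first two axes

# Stack: average over the trace axis, as in the cell above.
stack = np.mean(synthetic, axis=1)

# What is left after removing the known signal is the stacked noise.
residual = stack - signal
print(stack.shape, noise.std(), residual.std())
```

In this idealised case the noise level falls by roughly the square root of the number of traces stacked.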

In [None]:
vm = np.percentile(brute, 99)

In [None]:
plt.figure(figsize=(18,8))
plt.imshow(brute.T, cmap="Greys", vmin=-vm, vmax=vm, aspect=0.1, interpolation='none')
plt.colorbar(shrink=0.5)
plt.show()

### Write this out to SEG-Y

In [None]:
from obspy.core import Trace, Stream, UTCDateTime
from obspy.io.segy.segy import SEGYTraceHeader

stream = Stream()

for i, trace in enumerate(brute):

    # Make the trace.
    tr = Trace(trace)

    # Add required data.
    tr.stats.delta = 0.004
    tr.stats.starttime = 0  # Not strictly required.

    # Add yet more to the header (optional).
    tr.stats.segy = {'trace_header': SEGYTraceHeader()}
    tr.stats.segy.trace_header.trace_sequence_number_within_line = i + 1
    tr.stats.segy.trace_header.receiver_group_elevation = 0

    # Append the trace to the stream.
    stream.append(tr)
    
from obspy.core import AttribDict
from obspy.io.segy.segy import SEGYBinaryFileHeader

# Text header.
stream.stats = AttribDict()
stream.stats.textual_file_header = '{:80s}'.format('This is the textual header.').encode()
stream.stats.textual_file_header += '{:80s}'.format('This file contains a brute stack.').encode()
stream.stats.textual_file_header += '{:80s}'.format('The original file header and trace headers disagree on sample interval.').encode()
stream.stats.textual_file_header += '{:80s}'.format('I think the header is probably right, it is 4 ms so records are 6 s.').encode()
stream.stats.textual_file_header += '{:80s}'.format('Only useful lines from original file header:').encode()
stream.stats.textual_file_header += '{:80s}'.format('C 2 LINE:  LINE_001           AREA                        MAP ID                ').encode()
stream.stats.textual_file_header += '{:80s}'.format('C 4 INSTRUMENT: MFG            MODEL            SERIAL NO                       ').encode()
stream.stats.textual_file_header += '{:80s}'.format('C 5 DATA TRACES/RECORD: 282  AUXILIARY TRACES/RECORD:  2    CDP FOLD            ').encode()
stream.stats.textual_file_header += '{:80s}'.format('C 6 SAMPLE INTERNAL:  4MS     SAMPLES/TRACE: 750  BITS/IN      BYTES/SAMPLE 4   ').encode()

# Binary header.
stream.stats.binary_file_header = SEGYBinaryFileHeader()
stream.stats.binary_file_header.trace_sorting_code = 4
stream.stats.binary_file_header.seg_y_format_revision_number = 0x0100

import sys
stream.write('../data/poland_brute_stack.sgy', format='SEGY', data_encoding=5, byteorder=sys.byteorder)
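
The textual header above is built by concatenating 80-character cards one at a time. The SEG-Y standard defines this header as 3200 bytes, that is, 40 cards of 80 characters. The sketch below builds a full-length block from a list of lines. `make_textual_header` is a hypothetical helper, and the sketch does not rely on whether the writer pads a short header itself.

```python
def make_textual_header(lines, width=80, n_lines=40):
    """Pad lines to fixed-width cards and fill out a full header block."""
    if len(lines) > n_lines:
        raise ValueError('Too many lines for the textual header.')
    # Truncate over-long lines, then pad each one to the card width.
    cards = [line[:width].ljust(width) for line in lines]
    # Fill the remaining cards with spaces.
    cards += [' ' * width] * (n_lines - len(cards))
    return ''.join(cards).encode()

header = make_textual_header([
    'This is the textual header.',
    'This file contains a brute stack.',
])
print(len(header))
```

The result could be assigned to `stream.stats.textual_file_header` in place of the repeated `+=` lines.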

## NMO velocity

From Madagascar velocity scan: https://www.dropbox.com/s/alski0p047ylwu0/Screenshot%202016-09-14%2009.28.40.png?raw=1

Min velocity (blue): 2200 m/s, max velocity (red): 4250 m/s

In [None]:
import numpy as np
velocity = np.load('../data/poland/Velocity.npy')
plt.imshow(velocity, cmap='viridis')
plt.colorbar()

In [None]:
mi, ma = np.amin(velocity), np.amax(velocity)
mi, ma

In [None]:
# There are 251 gathers, and 1501 time samples
# So we need this array to be 1501 rows by 251 columns.

In [None]:
# Note: scipy.misc.imresize was removed in SciPy 1.3, so this import needs an older SciPy.
from scipy.misc import imresize

In [None]:
v = imresize(velocity,(1501, 251))

We lost the scaling:

In [None]:
np.amin(v), np.amax(v)

Let's restore the scaling and fix the orientation at the same time; we want traces in the first dimension.

In [None]:
v = ((v/255).T * (ma - mi) + mi).astype(np.int16)

In [None]:
np.amin(v), np.amax(v)
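
Since `imresize` rescales to 8-bit values and is not available in recent SciPy releases, one alternative is to interpolate the velocities directly so the units are never lost. The sketch below is a swapped-in technique, not the method used above. `resize_linear` is a hypothetical NumPy-only helper that does separable linear interpolation, and the 2 × 2 input is made up from the velocity range quoted earlier.

```python
import numpy as np

def resize_linear(arr, shape):
    """Resize a 2D array to `shape` by separable linear interpolation."""
    rows, cols = arr.shape
    new_rows, new_cols = shape

    # Interpolate along axis 1 (within each row).
    x_old = np.linspace(0, 1, cols)
    x_new = np.linspace(0, 1, new_cols)
    tmp = np.array([np.interp(x_new, x_old, row) for row in arr])

    # Interpolate along axis 0 (within each column).
    y_old = np.linspace(0, 1, rows)
    y_new = np.linspace(0, 1, new_rows)
    return np.array([np.interp(y_new, y_old, col) for col in tmp.T]).T

# Made-up coarse velocity grid in m/s (time down, gathers across).
coarse = np.array([[2200., 3000.],
                   [3000., 4250.]])
fine = resize_linear(coarse, (5, 3))

# Transpose so traces come first, then cast for 16-bit storage.
v_demo = fine.T.astype(np.int16)
print(fine.shape, v_demo.shape, v_demo.min(), v_demo.max())
```

Because the values stay in m/s throughout, no rescaling from 0 to 255 is needed afterwards.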

In [None]:
plt.plot(v[40])

In [None]:
plt.imshow(v.T, aspect=0.1)
plt.colorbar()

In [None]:
stream = Stream()

for i, trace in enumerate(v):

    # Make the trace.
    tr = Trace(trace)

    # Add required data.
    tr.stats.delta = 0.004
    tr.stats.starttime = 0  # Not strictly required.

    # Add yet more to the header (optional).
    tr.stats.segy = {'trace_header': SEGYTraceHeader()}
    tr.stats.segy.trace_header.trace_sequence_number_within_line = i + 1
    tr.stats.segy.trace_header.receiver_group_elevation = 0

    # Append the trace to the stream.
    stream.append(tr)
    
# Text header.
stream.stats = AttribDict()
stream.stats.textual_file_header = '{:80s}'.format('This is the textual header.').encode()
stream.stats.textual_file_header += '{:80s}'.format('This file contains velocity data.').encode()

# Binary header.
stream.stats.binary_file_header = SEGYBinaryFileHeader()
stream.stats.binary_file_header.trace_sorting_code = 4
stream.stats.binary_file_header.seg_y_format_revision_number = 0x0100

# Encodings:
# 1: IBM, 32-bit float
# 2: 32-bit int
# 3: 16-bit int
# 4: obsolete
# 5: IEEE, 32-bit float
# 8: 8-bit int
stream.write('../data/poland/poland_velocity.sgy', format='SEGY', data_encoding=3, byteorder=sys.byteorder)
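
Encoding 3 stores 16-bit integers, so it is worth confirming that the velocities fit before writing. This is a minimal check using the velocity range quoted above; the 40000 value is a made-up out-of-range example.

```python
import numpy as np

# Velocity range quoted above for the Madagascar scan, in m/s.
v_min, v_max = 2200, 4250

info = np.iinfo(np.int16)
fits = info.min <= v_min and v_max <= info.max
print(info.min, info.max, fits)

# A made-up integer beyond the int16 range wraps around when cast.
wrapped = np.array([40000], dtype=np.int32).astype(np.int16)[0]
print(wrapped)
```

Values outside the range are not clipped by the cast, so a check like this catches silent corruption before the file is written.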

<hr />

<div>
<img src="https://avatars1.githubusercontent.com/u/1692321?s=50"><p style="text-align:center">© Agile Geoscience 2016</p>
</div>