# TileDB-Segy tutorial

This notebook introduces [TileDB-Segy](https://github.com/TileDB-Inc/TileDB-Segy) for reading and inspecting SEG-Y data. It uses the [Kerry3D data](https://wiki.seg.org/wiki/Kerry-3D) as the input SEG-Y file.

## Converting to TileDB-Segy

Installing TileDB-Segy also installs a command-line interface (CLI) called `segy2tiledb`, which converts
SEG-Y and Seismic Unix formatted files to TileDB formatted arrays.

In [1]:
!segy2tiledb --help

usage: segy2tiledb [-h] [-o] [-g {auto,structured,unstructured}] [--su] [--iline ILINE] [--xline XLINE] [--endian {big,msb,little,lsb}] [-s TILE_SIZE]
                   input [output]

Convert a SEG-Y file to tiledb-segy format

positional arguments:
  input                 Input SEG-Y file path
  output                Output directory path (default: None)

optional arguments:
  -h, --help            show this help message and exit
  -o, --overwrite       Overwrite the output directory if it already exists (default: False)
  -g {auto,structured,unstructured}, --geometry {auto,structured,unstructured}
                        Output geometry:
                        - auto: same as the input SEG-Y.
                        - structured: same as `auto` but abort if a geometry cannot be inferred.
                        - unstructured: opt out on building geometry information.
                         (default: auto)

segyio options:
  --su                  Open a seism

***

`Kerry3D` stores its inline and crossline numbers in non-default trace header fields, so we need to pass their byte positions via `--iline` and `--xline` to preserve the geometry:

In [2]:
!segy2tiledb --iline=223 --xline=21 --overwrite ./Kerry3D.segy

TileDB-Segy uses [TileDB's compression filters](https://docs.tiledb.com/main/basic-concepts/data-format#compression-filters) to perform a lossless compression of the input data.

In [3]:
!du -sh Kerry3D.*

1,1G	Kerry3D.segy
612M	Kerry3D.tsgy
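
The same comparison can be made from Python. The following is a minimal sketch using only the standard library; `tree_size` and `compression_ratio` are hypothetical helper names, and the temporary files are throwaway stand-ins so the snippet runs without the Kerry3D data. It sums apparent file sizes, so the result may differ slightly from the disk usage reported by `du`.

```python
import os
import tempfile


def tree_size(path):
    """Total size in bytes of a file, or of all regular files under a directory."""
    if os.path.isfile(path):
        return os.path.getsize(path)
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            total += os.path.getsize(os.path.join(root, name))
    return total


def compression_ratio(original, converted):
    """Size of the original divided by the size of the converted output."""
    return tree_size(original) / tree_size(converted)


# Demo on throwaway stand-ins: a 1000-byte "SEG-Y" file and a directory
# holding 400 bytes, mimicking a .segy file next to a .tsgy directory.
with tempfile.TemporaryDirectory() as tmp:
    segy_path = os.path.join(tmp, "demo.segy")
    tsgy_path = os.path.join(tmp, "demo.tsgy")
    os.mkdir(tsgy_path)
    with open(segy_path, "wb") as fh:
        fh.write(b"\0" * 1000)
    with open(os.path.join(tsgy_path, "fragment"), "wb") as fh:
        fh.write(b"\0" * 400)
    demo_ratio = compression_ratio(segy_path, tsgy_path)

print(demo_ratio)
```

With the real data, the call would be `compression_ratio("Kerry3D.segy", "Kerry3D.tsgy")`.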


## Using the TileDB-Segy API

First, import the `tiledb.segy` package. We also import `numpy` and set some printing options.

In [4]:
import tiledb.segy
import numpy as np

np.set_printoptions(precision=4, threshold=10, suppress=True)

### Opening and closing

Opening a `tsgy` directory is done with the `tiledb.segy.open` function, idiomatically used as a context manager:

In [5]:
with tiledb.segy.open("./Kerry3D.tsgy") as f:
    ...

Alternatively, we can call `open` directly, in which case we should `close` the returned object explicitly:

In [6]:
f = tiledb.segy.open("./Kerry3D.tsgy")
# ...
# Remember to close f when no longer needed
# f.close()
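
When the object is opened outside a `with` block, a `try`/`finally` block or `contextlib.closing` guarantees that `close` is called even if an exception is raised in between. The sketch below uses a hypothetical stand-in class, `FakeSegy`, so that it runs without any data; the only assumption about the real object is that it has a `close()` method, as listed in the API above.

```python
from contextlib import closing


class FakeSegy:
    """Hypothetical stand-in for the object returned by tiledb.segy.open."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True


# Pattern 1: explicit try/finally
f_demo = FakeSegy()
try:
    pass  # work with f_demo here
finally:
    f_demo.close()

# Pattern 2: contextlib.closing calls close() when the block exits
with closing(FakeSegy()) as g_demo:
    pass  # work with g_demo here

print(f_demo.closed, g_demo.closed)
```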

In this case `f` is a `StructuredSegy` instance, which means the data have an established geometry (inline numbers, crossline numbers, etc.). The alternative would be a `Segy` instance, which is unstructured. `StructuredSegy` extends `Segy`, so its API is a superset of the `Segy` API:

In [7]:
f

StructuredSegy('Kerry3D.tsgy')

In [8]:
f.__class__.mro()

[tiledb.segy.StructuredSegy, tiledb.segy.Segy, object]

In [9]:
# Segy API
print(list(a for a in dir(tiledb.segy.Segy) if a[0] != '_'))

['attributes', 'bin', 'close', 'depth_slice', 'dt', 'header', 'samples', 'sorting', 'text', 'trace', 'uri']


In [10]:
# StructuredSegy API
print(list(a for a in dir(tiledb.segy.StructuredSegy) if a[0] != '_'))

['attributes', 'bin', 'close', 'cube', 'depth_slice', 'dt', 'fast', 'gather', 'header', 'iline', 'ilines', 'offsets', 'samples', 'slow', 'sorting', 'text', 'trace', 'uri', 'xline', 'xlines']


In the following sections `f` is an open `StructuredSegy` instance.

### Public attributes

`StructuredSegy` objects have the following public attributes:
- `f.uri`: Uniform resource identifier (usually local file path) to the underlying data
- `f.sorting`: Whether the data are sorted by inline, crossline or neither (unsorted)
- `f.ilines`: Inferred inline numbers
- `f.xlines`: Inferred crossline numbers
- `f.offsets`: Inferred offset numbers
- `f.samples`: Inferred sample offsets (frequency and recording time delay)


In [11]:
f.uri

URL('Kerry3D.tsgy')

In [12]:
f.sorting

INLINE_SORTING

In [13]:
f.ilines.size, f.ilines

(287, array([510, 511, 512, ..., 794, 795, 796], dtype=int32))

In [14]:
f.xlines.size, f.xlines

(735, array([ 58,  59,  60, ..., 790, 791, 792], dtype=int32))

In [15]:
f.offsets

array([0], dtype=int32)

In [16]:
f.samples.size, f.samples

(1252, array([   0.,    4.,    8., ..., 4996., 5000., 5004.]))
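
The sample axis above is consistent with a sample count of 1252, a sample interval of 4000 microseconds and a recording delay of 0, which are the values of `TRACE_SAMPLE_COUNT`, `TRACE_SAMPLE_INTERVAL` and `DelayRecordingTime` in the trace header printed later in this notebook. The following sketch reproduces it in plain Python; `sample_times` is a hypothetical helper, and the formula is an assumption based on the usual SEG-Y convention rather than a statement about how TileDB-Segy computes `f.samples` internally.

```python
def sample_times(n_samples, interval_us, delay_ms=0.0):
    """Sample times in milliseconds: delay + i * interval.

    interval_us is the sample interval in microseconds, as stored in SEG-Y headers.
    """
    dt_ms = interval_us / 1000.0
    return [delay_ms + i * dt_ms for i in range(n_samples)]


times = sample_times(n_samples=1252, interval_us=4000, delay_ms=0)
print(len(times), times[:3], times[-3:])

# The time of a given sample index, e.g. the index used for depth_slice below
print(times[123])
```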

### Modes

TileDB-Segy supports all the [segyio](https://github.com/equinor/segyio#modes) _addressing modes_ with the same semantics. There are two main differences:
- Indexing a `tiledb.segy` mode returns a single numpy array of higher dimension in cases where the respective `segyio` operation returns a generator of numpy arrays.
- The mappings returned by `bin`, `header` and `attributes(name)` have string keys instead of `segyio.TraceField` enums or integers.

#### trace

In [17]:
f.trace

<tiledb.segy.indexables.Trace at 0xa4880e0c>

In [18]:
len(f.trace)

210945

In [19]:
t = f.trace[12345]
t.shape, t

((1252,),
 array([0.    , 0.    , 0.    , ..., 0.0487, 0.0487, 0.    ], dtype=float32))

In [20]:
t5 = f.trace[12345:12350]
assert np.array_equal(t5[0], t)
t5.shape, t5

((5, 1252),
 array([[0.    , 0.    , 0.    , ..., 0.0487, 0.0487, 0.    ],
        [0.    , 0.    , 0.    , ..., 0.0417, 0.0417, 0.    ],
        [0.    , 0.    , 0.    , ..., 0.    , 0.0394, 0.    ],
        [0.    , 0.    , 0.    , ..., 0.0405, 0.0405, 0.    ],
        [0.    , 0.    , 0.    , ..., 0.    , 0.042 , 0.    ]],
       dtype=float32))
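
For this file the trace count equals the number of inlines times the number of crosslines (287 × 735 = 210945). A trace index can therefore be related to an (inline, crossline) pair with simple arithmetic. The sketch below assumes inline sorting, a single offset, consecutive line numbers and no missing traces, all of which match the attribute values shown above; `trace_to_lines` and `lines_to_trace` are hypothetical helpers, not part of the TileDB-Segy API.

```python
FIRST_ILINE, N_ILINES = 510, 287   # f.ilines: 510 .. 796
FIRST_XLINE, N_XLINES = 58, 735    # f.xlines: 58 .. 792


def trace_to_lines(index):
    """Map a trace index to (inline number, crossline number)."""
    i, x = divmod(index, N_XLINES)
    return FIRST_ILINE + i, FIRST_XLINE + x


def lines_to_trace(iline, xline):
    """Map (inline number, crossline number) back to a trace index."""
    return (iline - FIRST_ILINE) * N_XLINES + (xline - FIRST_XLINE)


n_traces = N_ILINES * N_XLINES
pair = trace_to_lines(12345)
index = lines_to_trace(525, 159)
print(n_traces, pair, index)
```

Under these assumptions trace 12345 corresponds to inline 526 and crossline 643. This agrees with the `CDP` value of 643 in the header of the same trace printed in the next section, since byte 21 (the `--xline` value used during conversion) is the CDP field in the SEG-Y trace header.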

#### header

In [21]:
f.header

<tiledb.segy.indexables.Header at 0xa47a260c>

In [22]:
len(f.header)

210945

In [23]:
h = f.header[12345]
print(h)
type(h)

{'TRACE_SEQUENCE_LINE': 12346, 'TRACE_SEQUENCE_FILE': 12346, 'FieldRecord': 49, 'TraceNumber': 55966956, 'EnergySourcePoint': 288, 'CDP': 643, 'CDP_TRACE': 1, 'TraceIdentificationCode': 1, 'NSummedTraces': 0, 'NStackedTraces': 0, 'DataUse': 1, 'offset': 0, 'ReceiverGroupElevation': 0, 'SourceSurfaceElevation': 0, 'SourceDepth': 0, 'ReceiverDatumElevation': 0, 'SourceDatumElevation': 0, 'SourceWaterDepth': 0, 'GroupWaterDepth': 0, 'ElevationScalar': 1, 'SourceGroupScalar': 1, 'SourceX': 1703234, 'SourceY': 5600913, 'GroupX': 1703234, 'GroupY': 5600913, 'CoordinateUnits': 0, 'WeatheringVelocity': 0, 'SubWeatheringVelocity': 0, 'SourceUpholeTime': 0, 'GroupUpholeTime': 0, 'SourceStaticCorrection': 0, 'GroupStaticCorrection': 0, 'TotalStaticApplied': 0, 'LagTimeA': 0, 'LagTimeB': 0, 'DelayRecordingTime': 0, 'MuteTimeStart': 0, 'MuteTimeEND': 0, 'TRACE_SAMPLE_COUNT': 1252, 'TRACE_SAMPLE_INTERVAL': 4000, 'GainType': 0, 'InstrumentGainConstant': 0, 'InstrumentInitialGain': 0, 'Correlated': 0,

dict

In [24]:
h5 = f.header[12345:12350]
assert h5[0] == h
type(h5), len(h5)


(list, 5)
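
Because headers are plain dictionaries with string keys, derived quantities are easy to compute. As an illustration, the SEG-Y standard defines `SourceGroupScalar` as a multiplier when positive and a divisor when negative for the coordinate fields. The sketch below applies it to a few values copied from the header printed above; `apply_scalar` is a hypothetical helper, and treating a scalar of 0 as "no scaling" is an assumption.

```python
def apply_scalar(value, scalar):
    """Apply a SEG-Y coordinate scalar: multiply if positive, divide if negative."""
    if scalar > 0:
        return value * scalar
    if scalar < 0:
        return value / abs(scalar)
    return value  # assumption: a scalar of 0 means no scaling


# Values copied from the header of trace 12345 shown above
header_demo = {"SourceX": 1703234, "SourceY": 5600913, "SourceGroupScalar": 1}

src_x = apply_scalar(header_demo["SourceX"], header_demo["SourceGroupScalar"])
src_y = apply_scalar(header_demo["SourceY"], header_demo["SourceGroupScalar"])
print(src_x, src_y)
```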

#### attributes(header)

In [25]:
a = f.attributes("SourceX")
a

<tiledb.segy.indexables.Attributes at 0xa47a2dcc>

In [26]:
len(a)

210945

In [27]:
a[12345]

array([1703234])

In [28]:
a[12345:12350]

array([1703234, 1703235, 1703236, 1703236, 1703237])

#### iline, xline

In [29]:
f.iline, f.xline

(<tiledb.segy.indexables.Line at 0xa47c062c>,
 <tiledb.segy.indexables.Line at 0xa47c066c>)

In [30]:
len(f.iline), len(f.xline)

(287, 735)

In [31]:
i = f.iline[515]
i.shape

(735, 1252)

In [32]:
i2 = f.iline[515:517]
assert np.array_equal(i2[0], i)
i2.shape, i2

((2, 735, 1252),
 array([[[0., 0., 0., ..., 0., 0., 0.],
         [0., 0., 0., ..., 0., 0., 0.],
         [0., 0., 0., ..., 0., 0., 0.],
         ...,
         [0., 0., 0., ..., 0., 0., 0.],
         [0., 0., 0., ..., 0., 0., 0.],
         [0., 0., 0., ..., 0., 0., 0.]],
 
        [[0., 0., 0., ..., 0., 0., 0.],
         [0., 0., 0., ..., 0., 0., 0.],
         [0., 0., 0., ..., 0., 0., 0.],
         ...,
         [0., 0., 0., ..., 0., 0., 0.],
         [0., 0., 0., ..., 0., 0., 0.],
         [0., 0., 0., ..., 0., 0., 0.]]], dtype=float32))

In [33]:
x = f.xline[60]
x.shape, x

((287, 1252),
 array([[0., 0., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.],
        ...,
        [0., 0., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.]], dtype=float32))

In [34]:
x2 = f.xline[60:62]
assert np.array_equal(x2[0], x)
x2.shape, x2

((2, 287, 1252),
 array([[[0., 0., 0., ..., 0., 0., 0.],
         [0., 0., 0., ..., 0., 0., 0.],
         [0., 0., 0., ..., 0., 0., 0.],
         ...,
         [0., 0., 0., ..., 0., 0., 0.],
         [0., 0., 0., ..., 0., 0., 0.],
         [0., 0., 0., ..., 0., 0., 0.]],
 
        [[0., 0., 0., ..., 0., 0., 0.],
         [0., 0., 0., ..., 0., 0., 0.],
         [0., 0., 0., ..., 0., 0., 0.],
         ...,
         [0., 0., 0., ..., 0., 0., 0.],
         [0., 0., 0., ..., 0., 0., 0.],
         [0., 0., 0., ..., 0., 0., 0.]]], dtype=float32))

#### fast, slow

In [35]:
f.fast is f.iline

True

In [36]:
f.slow is f.xline

True

#### depth_slice


In [37]:
f.depth_slice

<tiledb.segy.indexables.Depth at 0xa47c058c>

In [38]:
len(f.depth_slice)

1252

In [39]:
d = f.depth_slice[123]
d.shape, d

((287, 735),
 array([[0., 0., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.],
        ...,
        [0., 0., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.]], dtype=float32))

In [40]:
d2 = f.depth_slice[123:125]
assert np.array_equal(d2[0], d)
d2.shape, d2

((2, 287, 735),
 array([[[0., 0., 0., ..., 0., 0., 0.],
         [0., 0., 0., ..., 0., 0., 0.],
         [0., 0., 0., ..., 0., 0., 0.],
         ...,
         [0., 0., 0., ..., 0., 0., 0.],
         [0., 0., 0., ..., 0., 0., 0.],
         [0., 0., 0., ..., 0., 0., 0.]],
 
        [[0., 0., 0., ..., 0., 0., 0.],
         [0., 0., 0., ..., 0., 0., 0.],
         [0., 0., 0., ..., 0., 0., 0.],
         ...,
         [0., 0., 0., ..., 0., 0., 0.],
         [0., 0., 0., ..., 0., 0., 0.],
         [0., 0., 0., ..., 0., 0., 0.]]], dtype=float32))

#### gather


In [41]:
f.gather

<tiledb.segy.indexables.Gather at 0xa47c04ac>

In [42]:
len(f.gather)

TypeError: object of type 'Gather' has no len()

In [43]:
g = f.gather[525, 159]
g.shape, g

((1252,),
 array([0.    , 0.    , 0.    , ..., 0.6014, 0.5011, 0.    ], dtype=float32))

In [44]:
g5 = f.gather[525:530, 159]
assert np.array_equal(g5[0], g)
g5.shape, g5

((5, 1252),
 array([[ 0.    ,  0.    ,  0.    , ...,  0.6014,  0.5011,  0.    ],
        [ 0.    ,  0.    ,  0.    , ...,  0.6067,  0.56  ,  0.    ],
        [ 0.    ,  0.    ,  0.    , ..., -0.0999,  0.    ,  0.    ],
        [ 0.    ,  0.    ,  0.    , ...,  0.    , -0.0442,  0.    ],
        [ 0.    ,  0.    ,  0.    , ...,  0.    ,  0.    ,  0.    ]],
       dtype=float32))

In [45]:
g4 = f.gather[525, 159:163]
assert np.array_equal(g4[0], g)
g4.shape, g4

((4, 1252),
 array([[0.    , 0.    , 0.    , ..., 0.6014, 0.5011, 0.    ],
        [0.    , 0.    , 0.    , ..., 0.4984, 0.3987, 0.    ],
        [0.    , 0.    , 0.    , ..., 0.3552, 0.3108, 0.    ],
        [0.    , 0.    , 0.    , ..., 0.2803, 0.2402, 0.    ]],
       dtype=float32))

In [46]:
g54 = f.gather[525:530, 159:163]
assert np.array_equal(g54[0, 0], g)
g54.shape, g54

((5, 4, 1252),
 array([[[ 0.    ,  0.    ,  0.    , ...,  0.6014,  0.5011,  0.    ],
         [ 0.    ,  0.    ,  0.    , ...,  0.4984,  0.3987,  0.    ],
         [ 0.    ,  0.    ,  0.    , ...,  0.3552,  0.3108,  0.    ],
         [ 0.    ,  0.    ,  0.    , ...,  0.2803,  0.2402,  0.    ]],
 
        [[ 0.    ,  0.    ,  0.    , ...,  0.6067,  0.56  ,  0.    ],
         [ 0.    ,  0.    ,  0.    , ...,  0.4748,  0.4748,  0.    ],
         [ 0.    ,  0.    ,  0.    , ...,  0.4932,  0.4484,  0.    ],
         [ 0.    ,  0.    ,  0.    , ...,  0.4675,  0.425 ,  0.    ]],
 
        [[ 0.    ,  0.    ,  0.    , ..., -0.0999,  0.    ,  0.    ],
         [ 0.    ,  0.    ,  0.    , ...,  0.    ,  0.0478,  0.    ],
         [ 0.    ,  0.    ,  0.    , ...,  0.0792,  0.1188,  0.    ],
         [ 0.    ,  0.    ,  0.    , ...,  0.0399,  0.0799,  0.    ]],
 
        [[ 0.    ,  0.    ,  0.    , ...,  0.    , -0.0442,  0.    ],
         [ 0.    ,  0.    ,  0.    , ...,  0.0456,  0.0456,  0.   

#### text


In [47]:
type(f.text), len(f.text), type(f.text[0])

(tuple, 1, bytes)

In [48]:
f.text[0]
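
A SEG-Y textual header is 3200 bytes, conventionally laid out as 40 lines of 80 characters, so the raw bytes are easier to read once wrapped. The following is a minimal sketch; `format_text_header` is a hypothetical helper, and the `ascii` default is an assumption (if the output looks garbled, the header may still be EBCDIC-encoded, in which case a codec such as `cp037` may be worth trying). A synthetic header is used so the snippet runs without the data.

```python
def format_text_header(raw, encoding="ascii", width=80):
    """Decode a textual header and wrap it into fixed-width lines."""
    text = raw.decode(encoding, errors="replace")
    return "\n".join(text[i:i + width] for i in range(0, len(text), width))


# Synthetic 3200-byte header: 40 lines of 80 characters each
demo_raw = "".join(
    f"C{n:2d} demo line".ljust(80) for n in range(1, 41)
).encode("ascii")

demo_lines = format_text_header(demo_raw).split("\n")
print(len(demo_raw), len(demo_lines))
print(demo_lines[0].rstrip())
```

With the real file, the call would be `print(format_text_header(f.text[0]))`.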



#### bin


In [49]:
b = f.bin
print(b)
type(b)

{'AmplitudeRecovery': 0, 'AuxTraces': 0, 'BinaryGainRecovery': 0, 'CorrelatedTraces': 0, 'EnsembleFold': 1, 'ExtAuxTraces': 0, 'ExtEnsembleFold': 0, 'ExtSamples': 0, 'ExtSamplesOriginal': 0, 'ExtendedHeaders': 0, 'Format': 1, 'ImpulseSignalPolarity': 0, 'Interval': 4000, 'IntervalOriginal': 0, 'JobID': 0, 'LineNumber': 510, 'MeasurementSystem': 0, 'ReelNumber': 0, 'SEGYRevision': 0, 'SEGYRevisionMinor': 0, 'Samples': 1252, 'SamplesOriginal': 0, 'SortingCode': 4, 'Sweep': 0, 'SweepChannel': 0, 'SweepFrequencyEnd': 0, 'SweepFrequencyStart': 0, 'SweepLength': 0, 'SweepTaperEnd': 0, 'SweepTaperStart': 0, 'Taper': 0, 'TraceFlag': 0, 'Traces': 1, 'VerticalSum': 1, 'VibratoryPolarity': 0}


dict
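
The binary header is enough to estimate the size of the original SEG-Y file. In the SEG-Y standard, `Format` 1 denotes 4-byte IBM floating point samples, the file starts with a 3200-byte textual header and a 400-byte binary header, and each trace carries a 240-byte header. The sketch below combines these with the `Samples` value and the trace count reported by `len(f.trace)`. `expected_segy_size` is a hypothetical helper, and it assumes that all traces have the same length and that `ExtendedHeaders` is non-negative.

```python
# Bytes per sample for some SEG-Y data sample format codes
BYTES_PER_SAMPLE = {1: 4, 2: 4, 3: 2, 5: 4, 8: 1}


def expected_segy_size(bin_header, n_traces):
    """Estimate the SEG-Y file size in bytes from binary header values."""
    sample_bytes = BYTES_PER_SAMPLE[bin_header["Format"]]
    trace_bytes = 240 + bin_header["Samples"] * sample_bytes
    extended = bin_header.get("ExtendedHeaders", 0)
    return 3200 + 400 + 3200 * extended + n_traces * trace_bytes


# Values copied from the binary header printed above
bin_demo = {"Format": 1, "Samples": 1252, "ExtendedHeaders": 0}

size_bytes = expected_segy_size(bin_demo, n_traces=210945)
print(size_bytes, round(size_bytes / 2**30, 2))
```

The estimate is about 1.03 GiB, which is in line with the `1,1G` that `du` reported for `Kerry3D.segy` earlier.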