Copyright (c) Microsoft Corporation.

Licensed under the MIT License.

# Converting SEG-Y files for training or validation

This notebook describes how to prepare your own SEG-Y files for training.

To use your own SEG-Y volumes to train models in the DeepSeismic repo, you need at least one pair of SEG-Y files: a seismic data file and its matching label file. The two files must have identical shapes. The seismic data file contains typical post-stack SEG-Y data traces, and the label file should contain an integer class label at every sample of every trace.
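Before converting, it can help to check that a seismic/label pair actually lines up. The sketch below assumes both volumes are already loaded as NumPy arrays. The helper name check_pair is hypothetical and is not part of the repo. The helper checks that the shapes are identical and that every label is a whole number, then returns the class ids it finds. The synthetic arrays are sized like the example volume converted later in this notebook (40 inlines, 200 crosslines, 10 samples per trace, per the conversion log), with an assumed axis order of (inline, crossline, sample).

```python
import numpy as np


def check_pair(seismic: np.ndarray, labels: np.ndarray) -> np.ndarray:
    """Hypothetical helper: validate a seismic/label pair and return its class ids."""
    # The label volume must line up with the seismic volume sample for sample.
    if seismic.shape != labels.shape:
        raise ValueError(f"shape mismatch: {seismic.shape} vs {labels.shape}")
    # Labels read from SEG-Y are often stored as floats, so accept any dtype
    # as long as every value is a whole number (NaNs fail this test too).
    if not np.all(labels == np.round(labels)):
        raise ValueError("label volume contains non-integer values")
    return np.unique(labels).astype(int)


# Synthetic pair shaped (inline, crossline, sample) = (40, 200, 10).
rng = np.random.default_rng(0)
seismic = rng.normal(size=(40, 200, 10)).astype(np.float32)
labels = np.zeros((40, 200, 10), dtype=np.float32)
labels[:, :, 5:] = 1.0  # class 0 in the upper 5 samples, class 1 below

print(check_pair(seismic, labels))  # prints [0 1]
```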

For each SEG-Y file, run the convert_segy.py script to create an npy file. Optionally, you can clip and/or normalize the data in the SEG-Y file as it is converted to npy.
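The script's help text says that --normalize clips and normalizes the data, while --clip only clips it. This notebook does not show the exact thresholds convert_segy.py uses. The sketch below therefore assumes one common scheme. First, it clips amplitudes to the mean plus or minus three standard deviations. Then, for normalization, it maps the clipped range linearly onto [-1, 1]. The function names and the n_std parameter are illustrative only.

```python
import numpy as np


def clip_volume(data: np.ndarray, n_std: float = 3.0) -> np.ndarray:
    """Clip amplitudes to mean +/- n_std standard deviations (assumed scheme)."""
    mean, std = float(data.mean()), float(data.std())
    return np.clip(data, mean - n_std * std, mean + n_std * std)


def normalize_volume(data: np.ndarray, n_std: float = 3.0) -> np.ndarray:
    """Clip as above, then rescale the clipped range linearly onto [-1, 1]."""
    mean, std = float(data.mean()), float(data.std())
    lo, hi = mean - n_std * std, mean + n_std * std
    if hi == lo:  # constant volume: nothing to scale
        return np.zeros_like(data, dtype=np.float64)
    clipped = np.clip(data, lo, hi)
    return 2.0 * (clipped - lo) / (hi - lo) - 1.0


rng = np.random.default_rng(0)
traces = np.concatenate([rng.normal(size=1000), [50.0, -50.0]])  # two spikes
clipped = clip_volume(traces)
scaled = normalize_volume(traces)
# The spikes are clipped, and the normalized values span [-1, 1].
print(clipped.max() < 50.0, scaled.min(), scaled.max())
```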

Once you have a pair of converted seismic data and label npy files, you can edit one of the training scripts in the repo to use them. One example is the [dutchf3 train.py](../../experiments/interpretation/dutchf3_patch/local/train.py) script.


In [1]:
from itkwidgets import view
import numpy as np

SEGYFILE = './normalsegy.segy'  # input SEG-Y file to convert
PREFIX = 'normalsegy'           # prefix used to name the output npy file(s)

## convert_segy.py usage

In [2]:
!python ./convert_segy.py --help

usage: train [-h] --prefix PREFIX --input_file INPUT_FILE
             [--output_dir OUTPUT_DIR] [--metadata_only] [--iline ILINE]
             [--xline XLINE] [--cube_size CUBE_SIZE] [--stride STRIDE]
             [--normalize] [--clip] [--input INPUT] [--output OUTPUT]

optional arguments:
  -h, --help            show this help message and exit
  --prefix PREFIX       prefix label for output files
  --input_file INPUT_FILE
                        segy file path
  --output_dir OUTPUT_DIR
                        Output files are written to this directory
  --metadata_only       Only produce inline,xline metadata
  --iline ILINE         segy file path
  --xline XLINE         segy file path
  --cube_size CUBE_SIZE
                        cube dimensions
  --stride STRIDE       stride
  --normalize           Normalization flag - clip and normalize the data
  --clip                Clipping flag - only clip the data
  --input INPUT         Used when running in Azure ML S

## Example run

Convert the SEG-Y file to a single npy file in the local directory. This run clips the data but does not normalize it.

In [3]:
!python ./convert_segy.py --prefix {PREFIX} --input_file {SEGYFILE} --output_dir . --clip

	Fast Lines: 10 to 49 (40 lines)
	Slow Lines: 100 to 299 (200 lines)
	Sample Size: 10
	Trace Count: 8000
	First five distinct Fast Line Indexes: [10, 11, 12, 13, 14]
	First five distinct Slow Line Indexes: [100 101 102 103 104]
	First five fast trace ids: [10 10 10 10 10]
	First five slow trace ids: [100 101 102 103 104]
Npy files written: 1
Completed SEG-Y converstion in: 1.0972433060014737
Clipping File
Completed clipping in 0.21636191600191523 seconds


## Post-processing instructions

There should now be one npy file in the local directory, named normalsegy_10_100_00000.npy. The numbers in the file name give the anchor point of the array. In this case, inline 10, crossline 100, and depth 0 form the origin [0,0,0] of the array.
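The loading cell below opens {PREFIX}_10_100_00000.npy. Following that naming, you can recover the anchor from a file name and map array indices back to survey coordinates. This is a sketch under several assumptions:

- the name pattern is <prefix>_<inline>_<crossline>_<depth>.npy;
- the array axes are ordered (inline, crossline, depth);
- line numbers increase by one per array row;
- depth is a sample index.

The helpers parse_anchor and to_survey are hypothetical.

```python
import os
import re
from typing import NamedTuple, Tuple


class Anchor(NamedTuple):
    prefix: str
    inline: int
    crossline: int
    depth: int


# <prefix>_<inline>_<crossline>_<depth>.npy; the greedy prefix group lets the
# prefix itself contain underscores, because the last three numbers are taken.
_NAME = re.compile(r"^(?P<prefix>.+)_(?P<il>\d+)_(?P<xl>\d+)_(?P<z>\d+)\.npy$")


def parse_anchor(filename: str) -> Anchor:
    """Hypothetical helper: read the array origin out of a converted file name."""
    match = _NAME.match(os.path.basename(filename))
    if match is None:
        raise ValueError(f"unexpected file name: {filename}")
    return Anchor(match["prefix"], int(match["il"]), int(match["xl"]), int(match["z"]))


def to_survey(anchor: Anchor, i: int, j: int, k: int) -> Tuple[int, int, int]:
    """Map array index [i, j, k] to (inline, crossline, depth sample).

    Assumes axis order (inline, crossline, depth) and unit line increments.
    """
    return anchor.inline + i, anchor.crossline + j, anchor.depth + k


anchor = parse_anchor("normalsegy_10_100_00000.npy")
print(anchor)                         # Anchor(prefix='normalsegy', inline=10, crossline=100, depth=0)
print(to_survey(anchor, 39, 199, 9))  # (49, 299, 9)
```

The conversion log above reports fast lines 10 to 49 and slow lines 100 to 299. Under these assumptions, the last array index [39, 199, 9] maps back to inline 49 and crossline 299. This matches the 40 x 200 = 8000 traces the log reports.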

Rerun the convert_segy.py script on the related label SEG-Y file.
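The label file goes through the same script. The sketch below builds both command lines using only flags listed in the --help output above. The label file name normalsegy_labels.segy and the helper convert_segy_cmd are hypothetical. The sketch leaves --clip and --normalize off for the labels, on the assumption that clipping or rescaling would alter the integer class ids.

```python
import shlex
from typing import List


def convert_segy_cmd(prefix: str, input_file: str, output_dir: str = ".",
                     clip: bool = False, normalize: bool = False) -> List[str]:
    """Hypothetical helper: assemble a convert_segy.py command line."""
    cmd = ["python", "./convert_segy.py",
           "--prefix", prefix,
           "--input_file", input_file,
           "--output_dir", output_dir]
    if normalize:  # per the help text, --normalize already clips
        cmd.append("--normalize")
    elif clip:
        cmd.append("--clip")
    return cmd


# Seismic amplitudes: same options as the example run above.
data_cmd = convert_segy_cmd("normalsegy", "./normalsegy.segy", clip=True)
# Labels (hypothetical file name): no clipping or normalization.
label_cmd = convert_segy_cmd("normalsegy_labels", "./normalsegy_labels.segy")

for cmd in (data_cmd, label_cmd):
    print(shlex.join(cmd))
    # To actually run it (requires convert_segy.py and the SEG-Y files):
    # import subprocess; subprocess.run(cmd, check=True)
```

If the label volume has the same geometry as the seismic volume, the resulting label file would presumably be named normalsegy_labels_10_100_00000.npy.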

In [4]:
npydata = np.load(f"{PREFIX}_10_100_00000.npy")
view(npydata, slicing_planes=True)


### Prepare train/test splits file

Once the data and label SEG-Y files are converted to npy, run the prepare_dutchf3.py script on the resulting npy files to generate the list of patches used as input to the train script.
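This notebook does not show the exact list format that prepare_dutchf3.py produces. As a rough illustration only, the sketch below does two things:

- it enumerates 2D patch origins on each inline section of a converted volume;
- it splits those origins into train and validation lists by contiguous blocks of inlines.

The id format i_<inline>_<crossline>_<depth> (using array indices, not survey line numbers) is an assumption. So are the patch size, stride, and validation fraction.

```python
from typing import List, Tuple


def patch_origins(n: int, patch: int, stride: int) -> List[int]:
    """Start indices of length-`patch` windows along an axis of length n."""
    if patch > n:
        return []
    starts = list(range(0, n - patch + 1, stride))
    # Add one window flush with the end so the last samples are covered.
    if starts[-1] != n - patch:
        starts.append(n - patch)
    return starts


def split_patches(shape: Tuple[int, int, int], patch: int, stride: int,
                  val_fraction: float = 0.2) -> Tuple[List[str], List[str]]:
    """Return (train, val) patch ids from inline sections, using array indices."""
    n_il, n_xl, n_z = shape
    cut = int(round(n_il * (1.0 - val_fraction)))  # inlines below `cut` train
    train: List[str] = []
    val: List[str] = []
    for il in range(n_il):
        target = train if il < cut else val
        for xl in patch_origins(n_xl, patch, stride):
            for z in patch_origins(n_z, patch, stride):
                target.append(f"i_{il}_{xl}_{z}")
    return train, val


train, val = split_patches((40, 200, 10), patch=10, stride=5)
print(len(train), len(val))  # 1248 312
```

The sketch splits by contiguous inline ranges rather than shuffling individual patches. This is one way to keep neighboring, highly correlated sections from appearing in both lists.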