# Introduction

The main seismic data file formats we see in our processing center today are SEGY and SEGD. Both are standards developed by the Society of Exploration Geophysicists (SEG) for storing and exchanging geophysical data. Both are open standards, published on the [SEG website](https://library.seg.org/seg-technical-standards).

SEGY is up to revision 2.0, and SEGD is up to revision 3.1. In practice, however, we mostly see SEGY rev 1.0, and only recently have we started to see SEGD rev 2.1.

[SEGY rev 1.0](https://library.seg.org/pb-assets/technical-standards/seg_y_rev1-1686080991247.pdf)

[SEGD rev 2.1](https://library.seg.org/pb-assets/technical-standards/seg_d_rev2.1-1686080991997.pdf)


This is partly because both standards were originally written for magnetic tape storage in the 1970s. Both formats are now regularly stored on disk, but magnetic tape is still commonly used in the industry as it's robust and relatively cheap. The code that writes to tape is often legacy, and there is little urgency to update existing hardware, so these old revisions tend to hang on. If you're ever in doubt about which version of SEGY or SEGD to write, an older version is safer (and simpler!).

The age of these standards causes another issue: the data types used within the file formats are often over 40 years old as well. For example, [BCD](https://en.wikipedia.org/wiki/Binary-coded_decimal) (4-bit binary-coded decimal) is used to store values in SEGD headers.
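To make the BCD idea concrete, here is a minimal sketch of decoding packed BCD, where each byte holds two decimal digits (one per 4-bit nibble). The function name `bcd_to_int` is hypothetical, and the sketch assumes plain unsigned packed BCD; it simply rejects any nibble above 9, which a real SEGD reader may need to treat differently.

```python
def bcd_to_int(data: bytes) -> int:
    """Decode unsigned packed BCD (two decimal digits per byte)."""
    value = 0
    for byte in data:
        high, low = byte >> 4, byte & 0x0F  # split the byte into two nibbles
        if high > 9 or low > 9:
            # Assumption: non-decimal nibbles are treated as an error here.
            raise ValueError(f"byte 0x{byte:02x} is not valid packed BCD")
        value = value * 100 + high * 10 + low
    return value


# The two bytes 0x80 0x58 read as the decimal number 8058.
example = bcd_to_int(b"\x80\x58")
```

The hex dump of a BCD field reads the same as its decimal value, which is handy when eyeballing headers in a hex editor.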

A more problematic issue is that the trace data in SEGY files is commonly stored as IBM floats instead of IEEE floats, and Python/NumPy has no native support for IBM floating point numbers. Conversions are available, e.g. https://pypi.org/project/ibm2ieee/
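For a feel of what such a conversion involves, here is a rough NumPy sketch of decoding big-endian 32-bit IBM floats. It assumes the usual IBM single-precision layout (1 sign bit, a 7-bit base-16 exponent biased by 64, and a 24-bit fraction). The function name `ibm32_to_float64` is made up for this illustration and is not the API of the ibm2ieee package.

```python
import numpy as np


def ibm32_to_float64(raw: bytes) -> np.ndarray:
    """Decode big-endian 32-bit IBM floats into an array of float64."""
    words = np.frombuffer(raw, dtype=">u4")
    # Top bit is the sign.
    sign = np.where((words >> 31) == 1, -1.0, 1.0)
    # Next 7 bits are a base-16 exponent with a bias of 64.
    exponent = ((words >> 24) & 0x7F).astype(np.int64) - 64
    # Low 24 bits are the fraction, interpreted as fraction / 2**24.
    fraction = (words & 0x00FFFFFF).astype(np.float64) / float(1 << 24)
    return sign * fraction * np.power(16.0, exponent)


# Three IBM floats, written as hex for readability.
samples = bytes.fromhex("42640000" "c276a000" "41100000")
values = ibm32_to_float64(samples)  # 100.0, -118.625, 1.0
```

Decoding to float64 rather than float32 sidesteps range problems, since the base-16 exponent gives IBM floats a wider range than IEEE single precision.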

There's a good paper on some of the issues [here](https://www.crewes.org/Documents/ResearchReports/2017/CRR201725.pdf).

Although the SEGY standard now supports IEEE floats, it is still common practice to write IBM floats, which requires an ieee2ibm algorithm, and that is not so easily found in Python.
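As a rough illustration of the ieee2ibm direction, the scalar sketch below encodes one Python float as a big-endian 32-bit IBM float. It is not the Pyseis implementation: the name `float_to_ibm32` is hypothetical, it rounds to the nearest representable fraction, and it raises an error for values outside the IBM range rather than clamping them.

```python
import math
import struct


def float_to_ibm32(value: float) -> bytes:
    """Encode a finite Python float as a big-endian 32-bit IBM float."""
    if not math.isfinite(value):
        raise ValueError("IBM floats cannot represent inf or nan")
    if value == 0.0:
        return b"\x00\x00\x00\x00"
    sign = 0x80000000 if value < 0 else 0
    # abs(value) == mantissa * 2**exp2 with 0.5 <= mantissa < 1
    mantissa, exp2 = math.frexp(abs(value))
    # Re-express as frac * 16**exp16 with 1/16 <= frac < 1
    exp16 = -(-exp2 // 4)  # ceiling of exp2 / 4
    frac = math.ldexp(mantissa, exp2 - 4 * exp16)
    fraction = int(round(frac * (1 << 24)))
    if fraction == 1 << 24:  # rounding carried into the next hex digit
        fraction >>= 4
        exp16 += 1
    biased = exp16 + 64
    if not 0 <= biased <= 127:
        raise ValueError("value is outside the IBM float range")
    return struct.pack(">I", sign | (biased << 24) | fraction)


encoded = float_to_ibm32(100.0).hex()  # "42640000"
```

A vectorised version follows the same steps on whole arrays, which is where the bit shifting gets fiddly.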

Pyseis has native Python implementations of both, but unless you're into vectorised bit shifting, admire and move on.

The downside of all of this is that Python in general, and NumPy specifically, cannot work directly with the SEGY and SEGD formats. Instead, these formats need to be converted to a more NumPy-friendly format, e.g. the Seismic Unix (SU) format.






# The SEGY Format

The SEGY format is relatively simple, except for the presence of IBM floats.

Each SEGY file begins with a 3200-byte text header encoded in EBCDIC, an 8-bit non-ASCII text encoding used by IBM in the 1960s. Go figure. Luckily, Python can handle EBCDIC strings, and the function to read the EBCDIC header is simply the following:


In [None]:
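def readEBCDIC(_file):
    """Return the 3200-byte SEGY text header decoded from EBCDIC."""
    # Read raw bytes so newline translation cannot change the byte count.
    with open(_file, mode="rb") as f:
        raw = f.read(3200)
    # cp500 is one of Python's built-in EBCDIC code pages.
    return raw.decode("cp500")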
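The 3200 characters are conventionally laid out as 40 lines of 80 characters each, usually with no newline characters in between, so printing the decoded string as-is gives one long line. A small sketch of splitting it for display follows. The helper name `split_text_header` is hypothetical, and the header here is synthetic, built in memory purely for illustration.

```python
def split_text_header(text: str, width: int = 80) -> list[str]:
    """Split a decoded text header into fixed-width lines."""
    return [text[i:i + width] for i in range(0, len(text), width)]


# Build a synthetic 3200-byte EBCDIC header: 40 lines, each padded to 80 chars.
cards = [f"C{n:2d} EXAMPLE LINE".ljust(80) for n in range(1, 41)]
raw = "".join(cards).encode("cp500")

# Decode and split it, as you would with a header read from a real file.
lines = split_text_header(raw.decode("cp500"))
for line in lines[:3]:
    print(line.rstrip())
```

With a real file, the decoded header string would be passed to the helper in place of the synthetic one.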
