# Importing and Exporting GSLIB (GEO-EAS) Files
- categories: [Jupyter, GSLIB, Pandas, Geostatistics, Python]
- comments: true

Though a bit dated, GSLIB remains the standard in many geostatistical workflows. Unfortunately, the GSLIB data format can be a bit of a hassle to work with.
The standard GSLIB (aka GEO-EAS) data format, as described on [gslib.com](http://www.gslib.com/gslib_help/format.html):

> * The first line in the file is taken as a title and is possibly transferred to output files.
> * The second line should be a numerical value specifying the number of numerical variables nvar in the data file.
> * The next nvar lines contain character identification labels and additional text (optional) that describe each variable.
> * The following lines, from nvar+3 until the end of the file, are considered as data points and must have nvar numerical values per line. Missing values are typically considered as large negative or positive numbers (e.g., less than -1.0e21 or greater than 1.0e21). The number of data will be the number of lines in the file minus nvar+2 minus the number of missing values. The programs read numerical values and not alphanumeric characters; alphanumeric variables may be transformed to integers or the source code modified. 





The header is informative, but a bit of a hassle when importing into a strictly tabular structure like a Pandas DataFrame.
Note that line 2 of the header often contains grid definition information in addition to the variable count, and in the case of multiple simulations nsim is commonly given after the grid definition. The read/write functions that follow ignore this extra information.
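Since the functions that follow ignore anything after the variable count on line 2, here is a minimal sketch of a header reader that keeps it. The name `read_gslib_header` and its return layout are my own assumptions for illustration, not part of GSLIB or Pandas. The extra tokens are handed back as raw strings, since their exact layout is not pinned down here.

```python
from typing import List, Tuple


def read_gslib_header(filename: str) -> Tuple[str, int, List[str], List[str]]:
    """Return (title, nvar, extra_tokens, column_names) from a GSLIB file.

    Hypothetical helper: extra_tokens holds whatever follows nvar on line 2
    (for example grid definition values), uninterpreted.
    """
    with open(filename, "r") as f:
        title = f.readline().strip()          # line 1: title
        tokens = f.readline().split()         # line 2: nvar plus optional extras
        nvar = int(tokens[0])
        extra_tokens = tokens[1:]             # kept as raw strings
        # next nvar lines: one variable label per line
        column_names = [f.readline().strip() for _ in range(nvar)]
    return title, nvar, extra_tokens, column_names
```

How you interpret `extra_tokens` depends on the program that wrote the file, so the sketch deliberately stops short of parsing them.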

The goal here is just to provide a couple of simple functions to save a little time for anyone who needs to do this.

## Reading GSLIB data

Importing GSLIB data really happens in two steps:

1. Read the header.
2. Read all the data into a DataFrame.

> Side note: I've found `skiprows` and `delim_whitespace` useful when reading ASCII data from other scientific software (MODFLOW, PEST, TOUGH2, etc.). In pandas 2.2 and later, `delim_whitespace` is deprecated in favor of `sep=r"\s+"`.

In [7]:
#hide
import pandas as pd

In [2]:
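def read_gslib(filename: str) -> pd.DataFrame:
    """Read a GSLIB (GEO-EAS) file into a DataFrame."""
    with open(filename, "r") as f:
        f.readline()  # line 1: title (not used)
        # line 2: number of variables; anything after it is ignored
        ncols = int(f.readline().split()[0])
        # next ncols lines: one variable name per line
        col_names = [f.readline().strip() for _ in range(ncols)]
    # sep=r"\s+" splits on any run of whitespace; it replaces
    # delim_whitespace=True, which is deprecated in pandas 2.2 and later
    return pd.read_csv(filename, skiprows=ncols + 2, sep=r"\s+", names=col_names)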

In [3]:
df = read_gslib(filename="data/example.dat")
df.head()

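|   | x | y | z | var |
|---|-------|-------|-------|-------|
| 0 | 0.723 | 0.564 | 0.785 | 2.853 |
| 1 | 0.915 | 0.317 | 0.357 | 0.749 |
| 2 | 0.346 | 0.484 | 0.690 | 0.786 |
| 3 | 0.591 | 0.150 | 0.669 | 0.290 |
| 4 | 0.157 | 0.332 | 0.006 | 1.777 |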


Now go about your business analyzing data, making plots, and doing all the other things Python does well until you need to re-export to GSLIB to run a specific geostatistical algorithm.
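One thing worth doing before any analysis: the format description quoted earlier says missing values are typically flagged as very large negative or positive numbers (less than -1.0e21 or greater than 1.0e21). Here is a minimal sketch for turning those into NaN, assuming that threshold matches your file. `mask_missing` is a hypothetical helper name, not a Pandas function.

```python
import pandas as pd

# Threshold from the GSLIB format description; adjust if your file differs.
MISSING_LIMIT = 1.0e21


def mask_missing(df: pd.DataFrame, limit: float = MISSING_LIMIT) -> pd.DataFrame:
    """Return a copy of df with values beyond +/- limit replaced by NaN."""
    return df.mask((df < -limit) | (df > limit))


# Small made-up example: the second "var" value is a missing-value flag.
example = pd.DataFrame({"x": [0.5, 1.5], "var": [2.25, -1.0e22]})
cleaned = mask_missing(example)
```

Remember to fill the NaN values back in with a flag value before re-exporting, since the GSLIB programs read numerical values only.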

## Writing a Pandas DataFrame to GSLIB Format

As with reading in the data, I'm sure there are a number of ways this can be done. Below is one rather simple approach: write the header, then iterate over each row as a tuple.

> If speed is a consideration when iterating over a pandas DataFrame, use `.itertuples`; it's noticeably faster than `.iterrows`.

In [4]:
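def write_gslib(df: pd.DataFrame, filename: str):
    with open(filename, "w") as f:
        # header: title, number of variables, then one variable name per line
        f.write("GSLIB Example Data\n")
        f.write(f"{len(df.columns)}\n")
        f.write("\n".join(str(c) for c in df.columns) + "\n")
        # data: one tab-separated row per record, three decimal places
        for row in df.itertuples(index=False):
            row_data = "\t".join(f"{value:.3f}" for value in row)
            f.write(f"{row_data}\n")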

In [5]:
write_gslib(df, "data/exported_data.dat")

Now, just have a quick look at the file to be sure it's correct:

In [6]:
with open("data/exported_data.dat","r") as f:
    for i in range(10):
        print(f.readline().strip())

GSLIB Example Data
4
x
y
z
var
0.723	0.564	0.785	2.853
0.915	0.317	0.357	0.749
0.346	0.484	0.690	0.786
0.591	0.150	0.669	0.290


Really the whole purpose here is to have these functions readily available to copy/paste when you need them.
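In the same copy/paste spirit, here is a sketch of one alternative writer that skips the Python-level row loop by handing the data to `numpy.savetxt`. This is a swapped-in technique, not the approach used above. It assumes every column is numeric (which GSLIB expects anyway), and the function name `write_gslib_savetxt` and its `title` parameter are hypothetical.

```python
import numpy as np
import pandas as pd


def write_gslib_savetxt(df: pd.DataFrame, filename: str,
                        title: str = "GSLIB Example Data") -> None:
    """Write df in GSLIB layout using numpy.savetxt for the data rows."""
    # header: title, number of variables, then one variable name per line
    header = "\n".join([title, str(len(df.columns)), *map(str, df.columns)])
    np.savetxt(
        filename,
        df.to_numpy(dtype=float),  # raises if a column is not numeric
        fmt="%.3f",                # three decimals, matching write_gslib
        delimiter="\t",
        header=header,
        comments="",               # suppress the default "# " header prefix
    )
```

For small files the difference will not matter; this is mainly worth considering when the DataFrame is large.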