# Pandas DataFrames with GSLIB I/O Methods
- categories: [Python, Jupyter, Pandas, GSLIB, OOP]
- comments: true


Pandas is everywhere, for good reason. If you are using Python to manipulate data, chances are you're using Pandas at some point in your workflow. If you are using Python with some kind of geoscience data, chances are you've come across some weird FORTRAN-generated file format that was a great idea in the 80s but is a bit of a hassle for I/O operations. Pandas offers a large number of [read/write methods](https://pandas.pydata.org/pandas-docs/stable/reference/io.html), but occasionally some archaic file format comes along that can be a challenge (MODFLOW anyone?). Don't limit yourself to I/O operations, though: if there is additional functionality you want from Pandas, you can extend DataFrame, Series or Index functionality to do just that. If extending existing Pandas objects isn't enough, with a little extra effort and a few important details you can subclass `pd.DataFrame` and make your own DataFrame class.

## Step 1:  Reusable Function

In the last post about reading/writing GSLIB files, I shared a couple of short snippets I use for reading/writing GEO-EAS (GSLIB) files to/from Pandas. As an example, I'll extend Pandas to include these convenient read/write operations:

In [21]:
#collapse
import pandas as pd
import numpy as np

In [None]:
def write_gslib(df, filename:str):
    with open(filename, "w") as f:
        f.write("GSLIB Example Data\n")  # title line
        f.write(f"{len(df.columns)}\n")  # number of variables
        f.write("\n".join(df.columns)+"\n")  # one variable name per line
        for row in df.itertuples():
            # row[0] is the index, so skip it
            row_data = "\t".join([f"{i:.3f}" for i in row[1:]])
            f.write(f"{row_data}\n")

def read_gslib(filename:str):
    with open(filename, "r") as f:
        lines = f.readlines()
        ncols = int(lines[1].split()[0])
        col_names = [lines[i+2].strip() for i in range(ncols)]
    # sep=r"\s+" splits on any whitespace (replaces the deprecated delim_whitespace=True)
    df = pd.read_csv(filename, skiprows=ncols+2, sep=r"\s+", names=col_names)
    return df
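
A quick round trip is an easy way to check that the two functions agree with each other. The sketch below is a standalone variant, assuming every column is numeric and that both functions work on a plain DataFrame `df`; the temporary file name and the sample values are made up for illustration.

```python
import os
import tempfile

import pandas as pd


def write_gslib(df: pd.DataFrame, filename: str) -> None:
    """Write a numeric DataFrame as a GEO-EAS (GSLIB) text file."""
    with open(filename, "w") as f:
        f.write("GSLIB Example Data\n")        # title line
        f.write(f"{len(df.columns)}\n")        # number of variables
        f.write("\n".join(df.columns) + "\n")  # one variable name per line
        for row in df.itertuples(index=False):
            f.write("\t".join(f"{v:.3f}" for v in row) + "\n")


def read_gslib(filename: str) -> pd.DataFrame:
    """Read a GEO-EAS (GSLIB) text file into a DataFrame."""
    with open(filename, "r") as f:
        lines = f.readlines()
    ncols = int(lines[1].split()[0])
    col_names = [lines[i + 2].strip() for i in range(ncols)]
    return pd.read_csv(filename, skiprows=ncols + 2, sep=r"\s+", names=col_names)


# Made-up sample data, only for the round trip.
original = pd.DataFrame({"x": [0.5, 1.25], "y": [2.0, 3.75], "var": [10.125, 20.5]})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "roundtrip.dat")
    write_gslib(original, path)
    restored = read_gslib(path)

# True only if values, dtypes, columns and index all match.
print(restored.equals(original))
```

Keep in mind the writer rounds to three decimals, so data with more precision than that will not survive the round trip unchanged.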

## Step 2: pd.DataFrame Accessor

I'm a big fan of this Pandas functionality. Just a simple decorator opens up the ability to add your own methods, properties etc. Here are the steps:

1. Make a class out of your function(s). I'll call mine `GSLIBAccessor`:

In [74]:
class GSLIBAccessor:
    def __init__(self, pandas_obj):
        self._obj = pandas_obj

    def write_gslib(self, filename:str):
        with open(filename, "w") as f:
            f.write("GSLIB Example Data\n")
            f.write(f"{len(self._obj.columns)}\n")
            f.write("\n".join(self._obj.columns)+"\n")
            # iterate over the wrapped DataFrame, not a global `df`
            for row in self._obj.itertuples():
                row_data = "\t".join([f"{i:.3f}" for i in row[1:]])
                f.write(f"{row_data}\n")

    # reading does not need an existing DataFrame, so make it a static method
    @staticmethod
    def read_gslib(filename:str):
        with open(filename, "r") as f:
            lines = f.readlines()
            ncols = int(lines[1].split()[0])
            col_names = [lines[i+2].strip() for i in range(ncols)]
        df = pd.read_csv(filename,
                    skiprows=ncols+2,
                    sep=r"\s+",  # replaces the deprecated delim_whitespace=True
                    names=col_names)
        return df

2. In the `__init__`, store the DataFrame that the methods will operate on (`pandas_obj` in the snippet above).
3. Add the decorator (I also removed the `_gslib` suffix from the method names):
    

In [75]:
@pd.api.extensions.register_dataframe_accessor("gslib")
class GSLIBAccessor:
    def __init__(self, pandas_obj):
        self._obj = pandas_obj

    def write(self, filename:str):
        with open(filename, "w") as f:
            f.write("GSLIB Example Data\n")
            f.write(f"{len(self._obj.columns)}\n")
            f.write("\n".join(self._obj.columns)+"\n")
            # iterate over the wrapped DataFrame, not a global `df`
            for row in self._obj.itertuples():
                row_data = "\t".join([f"{i:.3f}" for i in row[1:]])
                f.write(f"{row_data}\n")

    @staticmethod
    def read(filename:str):
        with open(filename, "r") as f:
            lines = f.readlines()
            ncols = int(lines[1].split()[0])
            col_names = [lines[i+2].strip() for i in range(ncols)]
        df = pd.read_csv(filename,
                    skiprows=ncols+2,
                    sep=r"\s+",  # replaces the deprecated delim_whitespace=True
                    names=col_names)
        return df

Boom! Done! That's it. Fantastic, right?

Now, the function `write_gslib` is available as a DataFrame method at `df.gslib.write()`. There are plenty of GSLIB-specific details to manage here (fill null values with -999.00, add a grid definition to the file header, etc.), but the point is that this is a fast, easy, flexible way to add whatever functionality you need to the DataFrame class.
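
As one example of those GSLIB-specific details, null handling could be folded into the accessor. This is only a sketch, not the implementation used in this post: the accessor name `gslib_demo`, the `null_value` parameter and the sample data are all hypothetical, and -999.0 is simply the fill value mentioned above.

```python
import os
import tempfile

import numpy as np
import pandas as pd


@pd.api.extensions.register_dataframe_accessor("gslib_demo")  # hypothetical name
class GSLIBDemoAccessor:
    def __init__(self, pandas_obj):
        self._obj = pandas_obj

    def write(self, filename: str, null_value: float = -999.0) -> None:
        # Replace NaN with the null flag before formatting the rows.
        data = self._obj.fillna(null_value)
        with open(filename, "w") as f:
            f.write("GSLIB Example Data\n")
            f.write(f"{len(data.columns)}\n")
            f.write("\n".join(data.columns) + "\n")
            for row in data.itertuples(index=False):
                f.write("\t".join(f"{v:.3f}" for v in row) + "\n")


# Made-up data with one missing value.
df_demo = pd.DataFrame({"x": [0.1, 0.2], "var": [1.5, np.nan]})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "nulls.dat")
    df_demo.gslib_demo.write(path)
    with open(path) as f:
        written = f.read().splitlines()

# The last data row carries the null flag in place of NaN.
print(written[-1])
```

Filling on a copy inside `write` leaves the original DataFrame untouched, so the NaN values are still there for any analysis done afterwards.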

Reading in a DataFrame with this approach works just fine as well. Keep in mind that the method is associated with the DataFrame, so it won't be accessible in the same place as the other Pandas I/O operations like `pd.read_csv`; instead it will be at `pd.DataFrame.gslib.read`.

In [76]:
df = pd.DataFrame.gslib.read("data/example.dat")
df.head()

|   | x | y | z | var |
|---|---|---|---|---|
| 0 | 0.723 | 0.564 | 0.785 | 2.853 |
| 1 | 0.915 | 0.317 | 0.357 | 0.749 |
| 2 | 0.346 | 0.484 | 0.69 | 0.786 |
| 3 | 0.591 | 0.15 | 0.669 | 0.29 |
| 4 | 0.157 | 0.332 | 0.006 | 1.777 |


To write out the file:

In [77]:
df.gslib.write("data/export_data.dat")

In [78]:
with open("data/export_data.dat", "r") as f:
    for i in range(10):
        print(f.readline().strip())

GSLIB Example Data
4
x
y
z
var
0.723	0.564	0.785	2.853
0.915	0.317	0.357	0.749
0.346	0.484	0.690	0.786
0.591	0.150	0.669	0.290
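
The Pandas documentation on extending also suggests validating the object in the accessor's `__init__` and raising `AttributeError` when it does not fit. A minimal sketch of that idea, assuming GSLIB output needs every column to be numeric; the accessor name `gslib_checked`, the `_validate` helper and the `n_variables` property are hypothetical:

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype


@pd.api.extensions.register_dataframe_accessor("gslib_checked")  # hypothetical name
class CheckedGSLIBAccessor:
    def __init__(self, pandas_obj):
        self._validate(pandas_obj)
        self._obj = pandas_obj

    @staticmethod
    def _validate(obj):
        # The ".3f" formatting used when writing only works for numbers.
        bad = [str(c) for c in obj.columns if not is_numeric_dtype(obj[c])]
        if bad:
            raise AttributeError(f"Non-numeric columns: {bad}")

    @property
    def n_variables(self) -> int:
        # Value that goes on the second line of the GSLIB header.
        return len(self._obj.columns)


numeric = pd.DataFrame({"x": [0.1, 0.2], "var": [1.0, 2.0]})
print(numeric.gslib_checked.n_variables)

mixed = pd.DataFrame({"x": [0.1], "rock": ["sandstone"]})
try:
    mixed.gslib_checked
except AttributeError as err:
    print(f"rejected: {err}")
```

Failing at access time means a DataFrame with text columns is rejected before anything is written to disk, rather than halfway through a file.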


## Step 3: Subclassing Pandas DataFrame

If for whatever reason the decorator accessor approach isn't enough, you can always create your own class entirely and inherit from `pd.DataFrame`. This is a bit more work and there are a couple of important details not to be missed.

### Inheritance
Inheritance is a common aspect of OOP (object-oriented programming) and is a topic that warrants a discussion all its own. Rather than open that can of worms, if you want more details I'd suggest [Real Python: Inheritance and Composition](https://realpython.com/inheritance-composition-python/). In this example, create a new class that inherits all the good things `pd.DataFrame` does, but adds a few properties, methods etc.

In [79]:
class GSLIBDataFrame(pd.DataFrame):
    def __init__(self, data=None, *args, **kwargs):
        # pass data positionally so extra positional args don't collide with it
        super().__init__(data, *args, **kwargs)

    def write(self, filename:str):
        with open(filename, "w") as f:
            f.write("GSLIB Example Data\n")
            # self is the DataFrame here, so there is no self._obj
            f.write(f"{len(self.columns)}\n")
            f.write("\n".join(self.columns)+"\n")
            for row in self.itertuples():
                row_data = "\t".join([f"{i:.3f}" for i in row[1:]])
                f.write(f"{row_data}\n")

    @classmethod
    def read(cls, filename:str):
        with open(filename, "r") as f:
            lines = f.readlines()
            ncols = int(lines[1].split()[0])
            col_names = [lines[i+2].strip() for i in range(ncols)]
        df = pd.read_csv(filename,
                    skiprows=ncols+2,
                    sep=r"\s+",  # replaces the deprecated delim_whitespace=True
                    names=col_names)
        return cls(df)

This works and is quite similar to the accessor example, but I do think this approach would scale better should you have big plans for your DIY DataFrames. Let's consider a couple of issues, shown below.

In [80]:
df = GSLIBDataFrame.read("data/example.dat")
returned_df = df.applymap(lambda x: x*2)

type(returned_df)

pandas.core.frame.DataFrame



1. If you use a standard Pandas operation, it returns a regular `pd.DataFrame`, not a `GSLIBDataFrame`. To manage this, `_constructor` must be defined to override the one inherited from Pandas.

2. Any other properties you create must be added to the `_metadata` list so that they are passed on to the results of manipulation.

To demonstrate, in addition to the previously defined methods, add a property `favorite_column`. The name is nonsense, but this can be a useful approach for marking a specific column that defines categories, domains or coordinates.


In [81]:
class GSLIBDataFrame(pd.DataFrame):
    # keyword-only favorite_column keeps pandas' own positional args (index, columns, ...) intact
    def __init__(self, data=None, *args, favorite_column=None, **kwargs):
        super().__init__(data, *args, **kwargs)
        self.favorite_column = favorite_column

    # attributes listed here are carried over to the results of operations
    _metadata = ["favorite_column"]

    @property
    def _constructor(self):
        return GSLIBDataFrame

    def write(self, filename:str):
        with open(filename, "w") as f:
            f.write("GSLIB Example Data\n")
            # self is the DataFrame here, so there is no self._obj
            f.write(f"{len(self.columns)}\n")
            f.write("\n".join(self.columns)+"\n")
            for row in self.itertuples():
                row_data = "\t".join([f"{i:.3f}" for i in row[1:]])
                f.write(f"{row_data}\n")

    @classmethod
    def read(cls, filename:str, favorite_column:str):
        with open(filename, "r") as f:
            lines = f.readlines()
            ncols = int(lines[1].split()[0])
            col_names = [lines[i+2].strip() for i in range(ncols)]
        df = pd.read_csv(filename,
                    skiprows=ncols+2,
                    sep=r"\s+",  # replaces the deprecated delim_whitespace=True
                    names=col_names)
        return cls(data=df, favorite_column=favorite_column)



Now, instead of returning a `pd.DataFrame`, a `GSLIBDataFrame` is returned as the result of manipulation.

In [82]:
df = GSLIBDataFrame.read("data/example.dat", favorite_column="var")
returned_df = df.applymap(lambda x: x*2)

type(returned_df)

__main__.GSLIBDataFrame
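
The `_metadata` entry is what carries `favorite_column` through operations. Here is a small self-contained sketch of that behavior, using a hypothetical stand-in class `TaggedFrame` and made-up data instead of the GSLIB file; which operations propagate metadata can vary with the Pandas version, so treat it as an illustration rather than a guarantee.

```python
import pandas as pd


class TaggedFrame(pd.DataFrame):
    """Hypothetical stand-in for GSLIBDataFrame, without the file I/O."""

    # Attribute names listed here are copied onto results of operations.
    _metadata = ["favorite_column"]

    def __init__(self, data=None, *args, favorite_column=None, **kwargs):
        super().__init__(data, *args, **kwargs)
        self.favorite_column = favorite_column

    @property
    def _constructor(self):
        # Tells pandas which class to build when an operation returns a frame.
        return TaggedFrame


tagged = TaggedFrame({"x": [1.0, 2.0], "var": [3.0, 4.0]}, favorite_column="var")
doubled = tagged * 2  # same effect as the element-wise lambda x: x*2

print(type(doubled).__name__)
print(doubled.favorite_column)
```

Without the `_metadata` entry the result would still be a `TaggedFrame`, but `favorite_column` would fall back to the `None` default from `__init__`.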

Subclassing `pd.DataFrame` requires a bit more effort and has some quirks, but in the long run it might be worthwhile in cases where more than just a few methods and/or properties are going to be added to the class. If you're going down this road, have a look at the [Pandas Documentation: Extending Pandas](https://pandas.pydata.org/pandas-docs/stable/development/extending.html); much of what I shared here is paraphrased from their fantastic documentation. Code snippets are posted as [gists on GitHub](https://gist.github.com/ericbdaniels).