# gINT data import in Python for Linux and Mac

gINT is a geodatabase application that is widely used in the geotechnical community. However, its file-based data storage does not facilitate the use of data across projects: data is often locked in a database file and not used further once the project finishes.

Combining data from different projects can lead to improved insights, and having past experience readily available can help geotechnical engineers make better decisions.

To unlock gINT data for engineers, importing the data into Python can be very useful. Once the data is available in Python, it can be processed further or used in calculations.

gINT stores geotechnical data in Microsoft Access databases. The Microsoft Windows drivers for reading Access databases do not work on Linux and Mac, but a workaround is possible using [MDBTools](https://github.com/mdbtools/mdbtools) and the [```pandas_access```](https://github.com/jbn/pandas_access) Python library.

## Installation of ```mdbtools```

### Linux

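Install ```mdbtools``` on Debian-based Linux distributions such as Ubuntu by running the following command in a terminal window (administrator rights are usually needed, so you may have to prefix it with ```sudo```):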

```sh
$ apt install mdbtools
```

### Mac

Install ```mdbtools``` on Mac using [Homebrew](http://brew.sh/). Run the following command in a terminal window:

```sh
$ brew install mdbtools
```



## Installation of ```pandas_access```

The ```pandas_access``` library can be installed using pip:

```sh
$ pip install pandas_access
```

Once installed, you can import the library in the notebook (note that you may need to restart the Jupyter kernel for the changes to take effect).

In [None]:
import pandas_access as mdb
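
Before going further, it can help to confirm that the ```mdbtools``` command-line programs are visible from Python, since a missing installation may otherwise only show up as an error on the first read. The sketch below uses the standard-library ```shutil.which```. The helper name ```missing_commands``` is hypothetical, and the list of programs (```mdb-tables```, ```mdb-schema``` and ```mdb-export```, all shipped with ```mdbtools```) is an assumption about what is needed here.

```python
import shutil

# Command-line programs shipped with mdbtools. Which of them pandas_access
# actually calls is an assumption here; adjust the tuple as needed.
MDBTOOLS_COMMANDS = ("mdb-tables", "mdb-schema", "mdb-export")


def missing_commands(commands=MDBTOOLS_COMMANDS):
    """Return the commands that cannot be found on the PATH."""
    return [cmd for cmd in commands if shutil.which(cmd) is None]


missing = missing_commands()
if missing:
    print("Not found on PATH:", ", ".join(missing))
else:
    print("All mdbtools commands were found.")
```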

## Reading database tables

A gINT project file (```.gpj``` extension) contains a number of tables with data. The names of these tables can be read using the ```.list_tables``` function. The path to the gINT file needs to be supplied as an argument.

A project file from a highway repair operation in the US is used as an example.

In [None]:
db_file = "Data/9724000.gpj"

In [None]:
for tbl in mdb.list_tables(db_file):
    print(tbl)
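
If the call above fails, running the underlying ```mdbtools``` program directly can help to tell a problem with the file apart from a problem with the Python wrapper. The following is a minimal sketch, not a description of how ```pandas_access``` works internally: it calls ```mdb-tables``` with the ```-1``` flag (one table name per line) through ```subprocess```. The helper names ```parse_table_names``` and ```list_tables_cli``` are hypothetical.

```python
import subprocess


def parse_table_names(output):
    """Split the output of `mdb-tables -1` into a list of table names."""
    return [line.strip() for line in output.splitlines() if line.strip()]


def list_tables_cli(path):
    """List the tables in an Access file by calling mdb-tables directly."""
    result = subprocess.run(
        ["mdb-tables", "-1", path],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if mdb-tables reports a failure
    )
    return parse_table_names(result.stdout)
```

Calling ```list_tables_cli(db_file)``` should then give a list of names comparable to the one printed by the loop above.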

Not all tables contain data. We can select the ones containing data using the ```.read_schema``` function. The database schema is returned as a dictionary keyed by table name:

In [None]:
mdb.read_schema(db_file)

We can store a list with the names of the tables containing data:

In [None]:
data_tables = list(mdb.read_schema(db_file).keys())
data_tables

We can see that several table names are returned. Some (e.g. ```PROJECT```) speak for themselves, whereas others require additional inspection to understand what data they contain.

## Reading table data

Reading the data from the tables is straightforward using the ```.read_table``` function. This function returns a Pandas dataframe which can be used for further filtering of the data. The path to the database file and the name of the table need to be supplied.

In [None]:
df = mdb.read_table(db_file, "PROJECT")
df

We can thus loop over all the tables containing data and print the contents:

In [None]:
for tbl in data_tables:
    _df = mdb.read_table(db_file, tbl)
    print('-------%s-------' % tbl)
    print(_df.head())
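
For a database with many tables, printing the head of every table can be hard to scan. As a complement, the sketch below collects only the number of rows per table. The helpers ```row_counts``` and ```non_empty``` are hypothetical; the reading function is passed in as an argument, so the only assumption is that it returns an object supporting ```len()``` (which a Pandas dataframe does).

```python
def row_counts(table_names, read):
    """Map each table name to the number of rows returned by read(name)."""
    return {name: len(read(name)) for name in table_names}


def non_empty(counts):
    """Keep only the names of tables that hold at least one row."""
    return [name for name, n_rows in counts.items() if n_rows > 0]
```

With the variables defined earlier, ```counts = row_counts(data_tables, lambda tbl: mdb.read_table(db_file, tbl))``` followed by ```non_empty(counts)``` would give the tables worth inspecting first.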

## Exporting gINT data to Excel 

Once gINT data is available in dataframes, it can easily be exported to Excel as a workbook with multiple sheets. Although Excel is not recommended for data processing tasks (Python is much better at this), exporting can be useful for visually inspecting the data.

In [None]:
import pandas as pd

# Write each dataframe to a different worksheet.
# The context manager saves and closes the workbook on exit
# (writer.save() was removed in pandas 2.0).
# The 'xlsxwriter' engine requires the XlsxWriter package to be installed.
with pd.ExcelWriter('Output/9724000.xlsx', engine='xlsxwriter') as writer:
    for tbl in data_tables:
        _df = mdb.read_table(db_file, tbl)
        _df.to_excel(writer, sheet_name=tbl, index=False)
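
One practical caveat: Excel limits worksheet names to 31 characters and does not allow the characters ```[ ] : * ? / \```, so a table with a long or unusual name can make the export fail. The hypothetical helper ```safe_sheet_name``` below is a minimal sketch of one way to clean a name before passing it as ```sheet_name```. Truncation could make two different table names collide, which this sketch does not handle.

```python
import re

# Characters that Excel does not accept in a worksheet name.
_INVALID_SHEET_CHARS = re.compile(r"[\[\]:*?/\\]")
_MAX_SHEET_NAME_LENGTH = 31


def safe_sheet_name(name):
    """Replace invalid characters with '_' and truncate to 31 characters."""
    cleaned = _INVALID_SHEET_CHARS.sub("_", name)
    return cleaned[:_MAX_SHEET_NAME_LENGTH]
```

In the export loop, this would be used as ```sheet_name=safe_sheet_name(tbl)```.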

## Case study: SPT data for a selected location

Reading gINT data is illustrated here by reporting the SPT blowcounts for a given location.

### Retrieval of test locations

The locations of tests are given in the ```POINT``` table:

In [None]:
point_df = mdb.read_table(db_file, 'POINT')
point_df

As an example, the SPT blowcount will be reported for PointID 6, with a depth of 36.5 ft.

### Retrieval of SPT data

The SPT data is included in the ```SAMPLE``` table. We can import all the data first:

In [None]:
sample_df = mdb.read_table(db_file, 'SAMPLE')
sample_df.head()

A common problem with data imported from external files is that the data does not have the correct data type. We can check this with the ```.dtypes``` attribute of a Pandas dataframe.

In [None]:
sample_df.dtypes

We can see that SPT numbers are indeed not numeric (```object``` data type). We can convert these columns as follows:

In [None]:
for key in ['SPT 1', 'SPT 2', 'SPT 3']:
    sample_df[key] = pd.to_numeric(sample_df[key], errors='coerce')
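
Because ```errors='coerce'``` silently turns every value that cannot be parsed into ```NaN```, it is worth checking which entries were affected. The sketch below uses made-up values (not taken from the example file) and a hypothetical helper, ```coercion_losses```, which returns the original entries that were present but could not be converted.

```python
import pandas as pd


def coercion_losses(series):
    """Return the non-missing entries that pd.to_numeric turns into NaN."""
    numeric = pd.to_numeric(series, errors="coerce")
    lost = series.notna() & numeric.isna()
    return series[lost]


# Made-up blowcount entries, including a refusal-style text value.
raw = pd.Series(["4", "7", "50/3in", None])
print(coercion_losses(raw).tolist())
```

Applied to the example, a call such as ```coercion_losses(sample_df['SPT 1'])``` would have to be made before the conversion loop, since afterwards the columns are already numeric.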

In [None]:
sample_df.head()

We can see that three SPT numbers are available. ```SPT 1``` is the blowcount for the seating drive, while ```SPT 2``` and ```SPT 3``` can be added to obtain the SPT $N$ number.

In [None]:
sample_df["SPT N"] = sample_df["SPT 2"] + sample_df["SPT 3"]

The SPT data for PointID 6 can be filtered using conventional Pandas syntax. Since the ```PointID``` field is an ```object```, we need to specify the PointID for filtering as a string.

In [None]:
sample_6_df = sample_df[sample_df['PointID'] == '6']
sample_6_df

These numbers can be used for further processing using the ```SPTProcessing``` class in [```groundhog```](https://github.com/snakesonabrain/groundhog).

## Closing remarks: Using gINT library files for data interpretation

This article shows how gINT project files can be read using ```mdbtools``` and ```pandas_access```. The Microsoft Access database files are readily imported into Python, which can greatly facilitate further processing.

Certain gINT project files are also connected to a gINT library file (```.glb``` extension). These files contain, for example, the possible choices for soil types and hammer types. In the example above, such a file would contain a table which says that ```GRAB``` stands for a grab sample and ```SS``` for a split-spoon sample. The methods described above can just as easily be used to import these ```.glb``` files.
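
As an illustration of how such a lookup table could be applied once it has been read, the sketch below maps sample type codes to descriptions with ```Series.map```. All dataframes and column names here (```Type```, ```Code```, ```Description```) are made up for the example; only the meanings of ```GRAB``` and ```SS``` come from the text above, and the actual table layout in a ```.glb``` file may well differ.

```python
import pandas as pd

# Made-up stand-in for a lookup table read from a library file.
lookup_df = pd.DataFrame({
    "Code": ["GRAB", "SS"],
    "Description": ["Grab sample", "Split-spoon sample"],
})

# Made-up stand-in for a few rows of sample data.
samples = pd.DataFrame({"Type": ["SS", "GRAB", "SS", "UNKNOWN"]})

# Build a code -> description mapping and apply it; codes that are missing
# from the lookup table become NaN.
code_to_description = lookup_df.set_index("Code")["Description"]
samples["Type description"] = samples["Type"].map(code_to_description)
print(samples)
```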