# Tutorial - Loading AGS data using ```groundhog```

This tutorial outlines how an AGS 4.0 file can be converted to Python-compatible data structures (Pandas Dataframes) using ```groundhog```.

A file with geotechnical data from the Borssele I offshore wind farm is used to demonstrate the principles. This file is used under a Creative Commons 4.0 license.

## 1. Importing libraries

We can import the ```AGSConverter``` class. This class will convert the AGS data to Pandas dataframes which can be used for further data processing.

In [1]:
from groundhog.general.agsconversion import AGSConverter

## 2. Reading data

We can read data from an AGS file by creating an ```AGSConverter``` object. We will use the file ```N6016_BH-WFS1-2A_AGS4_150703.AGS```. ```groundhog``` performs some initial processing on the file. The AGS groupnames are extracted and any double-quotes which would hinder import (e.g. in latitude or longitude values) are removed.

In [2]:
agsdata = AGSConverter(path='Data/N6016_BH-WFS1-2A_AGS4_150703.AGS')
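
To give a feel for what extracting groupnames involves, the following is a minimal sketch using only the standard library. It is not how ```groundhog``` does it internally. It assumes the AGS 4 convention that each group opens with a row of the form `"GROUP","<name>"`. The sample text and the function name `extract_groupnames` are made up for illustration.

```python
import csv
import io

# Tiny hand-written AGS4-style snippet (illustrative only, not taken from the Borssele file)
SAMPLE_AGS = '''"GROUP","PROJ"
"HEADING","PROJ_ID","PROJ_NAME"
"UNIT","",""
"TYPE","ID","X"
"DATA","P001","Example project"

"GROUP","LOCA"
"HEADING","LOCA_ID","LOCA_TYPE"
"UNIT","",""
"TYPE","ID","PA"
"DATA","BH-01","CP"
'''


def extract_groupnames(ags_text):
    """Return the group names in the order they appear in the AGS text."""
    names = []
    for row in csv.reader(io.StringIO(ags_text)):
        # A group starts with a row whose first field is the GROUP descriptor;
        # blank lines come back as empty rows and are skipped by the length check
        if len(row) >= 2 and row[0] == "GROUP":
            names.append(row[1])
    return names


print(extract_groupnames(SAMPLE_AGS))
```

Reading the rows with the `csv` module rather than splitting on commas keeps quoted fields that contain commas intact.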

We can check which groupnames are available by printing the ```groupnames``` property of the ```AGSConverter``` object.

In [3]:
agsdata.groupnames

['PROJ',
 'UNIT',
 'TYPE',
 'ABBR',
 'User-defined data group',
 'DICT',
 'LOCA',
 'GEOL',
 'DETL',
 'SAMP',
 'CONG',
 'GCHM',
 'GRAG',
 'GRAT',
 'LDEN',
 'LLPL',
 'LNMC',
 'LPDN',
 'LPEN',
 'TREG',
 'TRIG',
 'TRIT']

The groupnames are four-character abbreviations which define the data that each group contains. We can convert these groupnames to a more verbose format using the ```GROUP_NAMES``` dictionary.

In [4]:
from groundhog.general.agsconversion import GROUP_NAMES

Currently, only the most common group names for geotechnical tests in the AGS 4.0 standard are encoded. Different group names can easily be added in the future.
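
Because not every group name is encoded, a lookup that falls back to the code itself avoids a `KeyError` for missing entries. The sketch below assumes a small stand-in dictionary (`group_names_subset`, with entries copied from this tutorial) rather than the real ```GROUP_NAMES``` object. The helper `describe_group` is hypothetical.

```python
# Hypothetical subset of the mapping, with entries copied from this tutorial
group_names_subset = {
    "PROJ": "Project Information",
    "LOCA": "Location Details",
    "SAMP": "Sample Information",
    "LDEN": "Density tests",
}


def describe_group(code, mapping):
    """Return the verbose description, or the code itself if it is not in the mapping."""
    return mapping.get(code, code)


# TREG is deliberately absent from the subset above to show the fallback
for code in ["PROJ", "SAMP", "TREG"]:
    print(code, "->", describe_group(code, group_names_subset))
```

The same pattern should work with the imported dictionary by passing it as the `mapping` argument.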

In [5]:
GROUP_NAMES

{'PROJ': 'Project Information',
 'ABBR': 'Abbreviation Definitions',
 'DICT': 'User Defined Groups and Headings',
 'FILE': 'Associated Files',
 'TRAN': 'Data File Transmission Information / Data Status',
 'TYPE': 'Definition of Data Types',
 'UNIT': 'Definition of Units',
 'CLSS': 'Classification tests',
 'CONG': 'Consolidation Tests - General',
 'CONS': 'Consolidation Tests - Data',
 'CORE': 'Coring Information',
 'GEOL': 'Field Geological Descriptions',
 'GRAG': 'Particle Size Distribution Analysis - General',
 'GRAT': 'Particle Size Distribution Analysis - Data',
 'SCPG': 'Static Cone Penetration Tests - General',
 'SCPT': 'Static Cone Penetration Tests - Data',
 'SCPP': 'Static Cone Penetration Tests - Derived Parameters',
 'LOCA': 'Location Details',
 'DETL': 'Stratum Detail Descriptions',
 'SAMP': 'Sample Information',
 'GCHM': 'Geotechnical Chemistry Testing',
 'LDEN': 'Density tests',
 'LLPL': 'Liquid and Plastic Limit Tests',
 'LNMC': 'Water/Moisture Content Tests',
 'LPDN': '

## 3. Converting AGS data to Pandas dataframes

### 3.1. Converting all groups

Converting the AGS data to Pandas dataframes is a matter of running the ```create_dataframes``` method. This creates a dictionary of dataframes for all group names. If a group cannot be converted, a warning is raised and dataframe creation for that group is skipped, but the code continues with the remaining groups.

In [13]:
agsdata.create_dataframes()
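
The warn-and-continue behaviour described above can be sketched in plain Python as follows. This is an illustration of the pattern only, not the ```groundhog``` source. The functions `convert_group` and `convert_all`, and the input dictionary `raw`, are hypothetical.

```python
import warnings


def convert_group(name, rows):
    """Hypothetical converter: fails when a group has no data rows."""
    if not rows:
        raise ValueError(f"group {name} contains no data rows")
    return {"group": name, "n_rows": len(rows)}


def convert_all(raw_groups):
    """Convert every group, warning about and skipping the ones that fail."""
    converted = {}
    for name, rows in raw_groups.items():
        try:
            converted[name] = convert_group(name, rows)
        except Exception as err:
            # Report the problem but carry on with the remaining groups
            warnings.warn(f"Group {name} could not be converted: {err}")
    return converted


# The middle group is empty on purpose so that it triggers the warning
raw = {"SAMP": [["W1"], ["W2"]], "XXXX": [], "LDEN": [["6"]]}
result = convert_all(raw)
print(list(result))
```

The failing group is simply missing from the resulting dictionary, so downstream code can test for a key before using it.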



We now have a dictionary of dataframes in the ```data``` attribute. The keys of this dictionary are the groupnames.

In [14]:
agsdata.data.keys()

dict_keys(['PROJ', 'UNIT', 'TYPE', 'ABBR', 'DICT', 'LOCA', 'GEOL', 'DETL', 'SAMP', 'CONG', 'GCHM', 'GRAG', 'GRAT', 'LDEN', 'LLPL', 'LNMC', 'LPDN', 'LPEN', 'TREG', 'TRIG', 'TRIT'])

We can check the data for the density tests (we print only the first five rows using the ```head()``` method).

In [16]:
agsdata.data['LDEN'].head()

Unnamed: 0,LOCA_ID,SAMP_TOP [m],SAMP_REF,SAMP_TYPE,SAMP_ID,SPEC_REF,SPEC_DPTH [m],LDEN_MC [%],LDEN_BDEN [kN/m3],LDEN_DDEN [kN/m3],LDEN_LAB
0,BH-WFS1-2A,1.0,W2,W,,6,1.15,24,19.4,15.7,
1,BH-WFS1-2A,1.0,W2,W,,7,1.5,23,,,
2,BH-WFS1-2A,2.0,W3,W,,10,2.15,25,19.3,15.4,
3,BH-WFS1-2A,2.0,W3,W,,11,2.45,24,19.2,15.5,
4,BH-WFS1-2A,3.0,W4,W,,19,3.15,24,20.3,16.4,


The resulting dataframe has AGS codes as column headers, with the accompanying units in square brackets. These AGS codes are terse, but the ```create_dataframes``` method can replace them with more readable names, as explained in Section 3.2.
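
Since each header combines a code and an optional unit, it can be handy to split the two for further processing. The sketch below assumes the `CODE [unit]` layout seen in the output above. The pattern `HEADER_PATTERN` and the helper `split_header` are illustrative names, not part of ```groundhog```.

```python
import re

# Matches "CODE [unit]" as well as plain "CODE"; the unit part is optional
HEADER_PATTERN = re.compile(r"^(?P<code>[^\[\]]+?)(?:\s*\[(?P<unit>[^\]]*)\])?$")


def split_header(header):
    """Split a column header into (code, unit); unit is None when absent."""
    match = HEADER_PATTERN.match(header.strip())
    if match is None:
        raise ValueError(f"unexpected header format: {header!r}")
    return match.group("code"), match.group("unit")


for header in ["LOCA_ID", "SAMP_TOP [m]", "LDEN_BDEN [kN/m3]"]:
    print(split_header(header))
```

With a dataframe, the same function could be applied to each entry of its `columns` attribute.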

### 3.2. Converting selected groups

#### 3.2.1. Using AGS column headers

We often don't need all groups in the AGS file. We can import only selected groups by passing a list of the groupnames we want to convert to the keyword argument ```selectedgroups```.

As an example, we will convert only the sample information (```SAMP``` group). The resulting dictionary only contains one element.

In [17]:
agsdata.create_dataframes(selectedgroups=['SAMP',])
agsdata.data.keys()

dict_keys(['SAMP'])

We can visualise the content of the resulting dataframe. This dataframe has the AGS codes as the column headers.

In [18]:
agsdata.data['SAMP'].head()

Unnamed: 0,LOCA_ID,SAMP_TOP [m],SAMP_REF,SAMP_TYPE,SAMP_ID,SAMP_BASE [m],SAMP_DTIM [yyyy-mm-dd],SAMP_UBLO,SAMP_CONT,SAMP_SDIA [mm],SAMP_RECV [%],SAMP_TECH,SAMP_WHY,SAMP_DESD [yyyy-mm-dd],SAMP_LOG,SAMP_COND
0,BH-WFS1-2A,0.0,W1,W,,0.32,2015-04-10,,,,43.0,,,2015-04-10,TAD,Undisturbed
1,BH-WFS1-2A,1.0,W2,W,,1.65,2015-04-10,,,,70.0,,,2015-04-10,TAD,Undisturbed
2,BH-WFS1-2A,2.0,W3,W,,2.6,2015-04-10,,,,67.0,,,2015-04-10,TAD,Undisturbed
3,BH-WFS1-2A,3.0,W4,W,,3.65,2015-04-10,,,,68.0,,,2015-04-10,TAD,Undisturbed
4,BH-WFS1-2A,4.0,W5,W,,4.6,2015-04-10,,,,63.0,,,2015-04-10,TAD,Undisturbed


#### 3.2.2. Long verbose column headers

We can automatically convert AGS column headers to readable names by setting the ```verbose_keys``` keyword argument to True. The ```AGSConverter``` class will make use of the dictionary ```AGS_TABLES``` to perform the conversion. Currently, not all AGS groups are encoded in ```groundhog```, but coverage is expanded with each release.

In [19]:
agsdata.create_dataframes(selectedgroups=['SAMP',], verbose_keys=True)

The resulting dataframe now has readable column headers. The downside for further coding is that these headers are rather long. 

In [21]:
agsdata.data['SAMP'].head()

Unnamed: 0,Location identifier,Depth to top of sample [m],Sample reference,Sample type,Sample unique identifier,Depth to base of sample [m],Date and time sample taken [yyyy-mm-dd],Number of blows required to drive sampler,Sample container,Sample diameter [mm],Percentage of sample recovered [%],Sampling technique/method,Reason for sampling,Date sample described [yyyy-mm-dd],Person responsible for sample/specimen description,Condition and representativeness of sample
0,BH-WFS1-2A,0.0,W1,W,,0.32,2015-04-10,,,,43.0,,,2015-04-10,TAD,Undisturbed
1,BH-WFS1-2A,1.0,W2,W,,1.65,2015-04-10,,,,70.0,,,2015-04-10,TAD,Undisturbed
2,BH-WFS1-2A,2.0,W3,W,,2.6,2015-04-10,,,,67.0,,,2015-04-10,TAD,Undisturbed
3,BH-WFS1-2A,3.0,W4,W,,3.65,2015-04-10,,,,68.0,,,2015-04-10,TAD,Undisturbed
4,BH-WFS1-2A,4.0,W5,W,,4.6,2015-04-10,,,,63.0,,,2015-04-10,TAD,Undisturbed


#### 3.2.3. Short verbose headers

Column headers can also be converted to short readable names by setting the ```use_shorthands``` boolean to True in addition to ```verbose_keys```. This keeps the headers short while still making them readable. The dictionary ```AGS_TABLES_SHORTHANDS``` contains the conversion keys.

In [22]:
agsdata.create_dataframes(selectedgroups=['SAMP',], verbose_keys=True, use_shorthands=True)

The resulting dataframe now has shorter readable column headers.

In [23]:
agsdata.data['SAMP'].head()

Unnamed: 0,Location identifier,Depth from [m],Sample reference,Sample type,Sample ID,Depth to [m],Date and time sample taken [yyyy-mm-dd],No blows,Sample container,Sample diameter [mm],Recovery [%],Sample technique,Reason for sampling,Date described [yyyy-mm-dd],Logged by,Condition and representativeness of sample
0,BH-WFS1-2A,0.0,W1,W,,0.32,2015-04-10,,,,43.0,,,2015-04-10,TAD,Undisturbed
1,BH-WFS1-2A,1.0,W2,W,,1.65,2015-04-10,,,,70.0,,,2015-04-10,TAD,Undisturbed
2,BH-WFS1-2A,2.0,W3,W,,2.6,2015-04-10,,,,67.0,,,2015-04-10,TAD,Undisturbed
3,BH-WFS1-2A,3.0,W4,W,,3.65,2015-04-10,,,,68.0,,,2015-04-10,TAD,Undisturbed
4,BH-WFS1-2A,4.0,W5,W,,4.6,2015-04-10,,,,63.0,,,2015-04-10,TAD,Undisturbed
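
The header conversions of Sections 3.2.2 and 3.2.3 boil down to a lookup from AGS code to a readable name while the unit is kept. The sketch below illustrates that idea with two small stand-in tables. `LONG_NAMES`, `SHORT_NAMES` and `rename_header` are hypothetical and hold only a few entries copied from the outputs above. The ```AGS_TABLES``` and ```AGS_TABLES_SHORTHANDS``` dictionaries in ```groundhog``` may be structured differently.

```python
# Hypothetical lookup tables with a few SAMP entries copied from the tutorial outputs
LONG_NAMES = {
    "LOCA_ID": "Location identifier",
    "SAMP_TOP": "Depth to top of sample",
    "SAMP_BASE": "Depth to base of sample",
    "SAMP_RECV": "Percentage of sample recovered",
}
SHORT_NAMES = {
    "LOCA_ID": "Location identifier",
    "SAMP_TOP": "Depth from",
    "SAMP_BASE": "Depth to",
    "SAMP_RECV": "Recovery",
}


def rename_header(header, use_shorthands=False):
    """Replace the AGS code in 'CODE [unit]' by a readable name, keeping the unit."""
    # partition leaves sep and unit empty when the header carries no unit
    code, sep, unit = header.partition(" [")
    table = SHORT_NAMES if use_shorthands else LONG_NAMES
    # Codes missing from the table are left unchanged
    return table.get(code, code) + sep + unit


headers = ["LOCA_ID", "SAMP_TOP [m]", "SAMP_BASE [m]", "SAMP_RECV [%]"]
print([rename_header(h) for h in headers])
print([rename_header(h, use_shorthands=True) for h in headers])
```

Keeping the two tables keyed by the same AGS codes means a single flag is enough to switch between long and short names.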
