In [1]:
from pathlib import Path
repo = Path("..")

# 1 - Quick start

In [17]:
import aseg_gdf2

Read in a simple example GDF2 file:

In [39]:
gdf = aseg_gdf2.read(repo / r"tests/example_datasets/3bcfc711/GA1286_Waveforms")
gdf

<aseg_gdf2.gdf2.GDF2 object at 0x00000281330D72C8 nrecords=?>

How big is the data table? `aseg_gdf2` doesn't know yet (hence `nrecords=?` above), because it is "lazy": it only calculates things or retrieves data when asked. This makes it possible to work with very large files.

You can find out the number of records in the data table by accessing the `nrecords` attribute:

In [40]:
gdf.nrecords

23039

In [41]:
gdf

<aseg_gdf2.gdf2.GDF2 object at 0x00000281330D72C8 nrecords=23039>
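The lazy behaviour described above can be illustrated with a toy class. This is only a sketch of the general idea, not how `aseg_gdf2` is implemented: the `LazyTable` class and the temporary file are hypothetical, and the record count is cached with the standard library's `functools.cached_property`.

```python
import tempfile
from functools import cached_property
from pathlib import Path


class LazyTable:
    """Toy stand-in for a lazily-read data table (not part of aseg_gdf2)."""

    def __init__(self, path):
        self.path = Path(path)

    @cached_property
    def nrecords(self):
        # The file is only scanned the first time this attribute is read;
        # cached_property stores the result for later accesses.
        with self.path.open() as f:
            return sum(1 for _ in f)

    def __repr__(self):
        # Show "?" until the count has actually been computed.
        n = self.__dict__.get("nrecords", "?")
        return f"<LazyTable nrecords={n}>"


with tempfile.TemporaryDirectory() as tmp:
    dat = Path(tmp) / "example.dat"
    dat.write_text("1 2 3\n4 5 6\n7 8 9\n")
    table = LazyTable(dat)
    before = repr(table)    # count not yet known
    count = table.nrecords  # triggers the scan
    after = repr(table)     # count is now cached

print(before, count, after)
```

The representation changes from `nrecords=?` to a number only after the attribute has been accessed, which mirrors what the two `gdf` outputs above show.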

In [42]:
gdf.field_names()

['FLTNUM', 'Rx_Voltage', 'Flight', 'Time', 'Tx_Current']

You can iterate over rows in the data table:

In [43]:
# Print only the first six rows (indices 0 to 5), then stop.
for i, row in enumerate(gdf.iterrows()):
    print(row)
    if i >= 5:
        break

{'Index': 0, 'FLTNUM': 1.0, 'Rx_Voltage': -0.0, 'Flight': 1, 'Time': 0.0052, 'Tx_Current': 0.00176}
{'Index': 1, 'FLTNUM': 1.0, 'Rx_Voltage': -0.0, 'Flight': 1, 'Time': 0.0104, 'Tx_Current': 0.00176}
{'Index': 2, 'FLTNUM': 1.0, 'Rx_Voltage': -0.0, 'Flight': 1, 'Time': 0.0156, 'Tx_Current': 0.00176}
{'Index': 3, 'FLTNUM': 1.0, 'Rx_Voltage': -0.0, 'Flight': 1, 'Time': 0.0208, 'Tx_Current': 0.00176}
{'Index': 4, 'FLTNUM': 1.0, 'Rx_Voltage': -0.0, 'Flight': 1, 'Time': 0.026, 'Tx_Current': 0.00176}
{'Index': 5, 'FLTNUM': 1.0, 'Rx_Voltage': -0.0, 'Flight': 1, 'Time': 0.0312, 'Tx_Current': 0.00176}
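If `iterrows()` yields rows one at a time, as the loop above suggests, the standard library's `itertools.islice` is another way to take just the first few rows without consuming the rest. The sketch below uses a hypothetical generator, `iterrows_demo`, in place of `gdf.iterrows()` so that it runs on its own.

```python
from itertools import islice


def iterrows_demo():
    """Hypothetical endless row generator standing in for gdf.iterrows()."""
    i = 0
    while True:
        yield {"Index": i, "Time": round(0.0052 * (i + 1), 4)}
        i += 1


# islice stops pulling from the generator after six rows.
first_rows = list(islice(iterrows_demo(), 6))
for row in first_rows:
    print(row)
```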


You can also get the data table as a `pandas.DataFrame`:

In [44]:
df = gdf.df()
df.head()

,FLTNUM,Rx_Voltage,Flight,Time,Tx_Current
0,1.0,-0.0,1,0.0052,0.00176
1,1.0,-0.0,1,0.0104,0.00176
2,1.0,-0.0,1,0.0156,0.00176
3,1.0,-0.0,1,0.0208,0.00176
4,1.0,-0.0,1,0.026,0.00176


In [45]:
print(gdf.df().head())

   FLTNUM  Rx_Voltage  Flight    Time  Tx_Current
0     1.0        -0.0       1  0.0052     0.00176
1     1.0        -0.0       1  0.0104     0.00176
2     1.0        -0.0       1  0.0156     0.00176
3     1.0        -0.0       1  0.0208     0.00176
4     1.0        -0.0       1  0.0260     0.00176


If the file is too big to fit in memory, you can also use dask in exactly the same way; see [Example 3](3%20-%20Use%20dask%20to%20read%20a%20DAT%20file%20too%20big%20to%20fit%20in%20memory.ipynb).

The metadata from the definition file is there too:

In [24]:
gdf.record_types

{'': {'fields': [{'name': 'FLTNUM',
    'format': 'F10.1',
    'unit': '',
    'null': None,
    'long_name': 'FlightNumber',
    'comment': '',
    'cols': 1,
    'width': 10},
   {'name': 'Rx_Voltage',
    'format': 'F10.5',
    'unit': 'Volt',
    'null': '-99.99999',
    'long_name': 'Rx_Voltage',
    'comment': '',
    'cols': 1,
    'width': 10},
   {'name': 'Flight',
    'format': 'I6',
    'unit': '',
    'null': '-9999',
    'long_name': 'Flight',
    'comment': '',
    'cols': 1,
    'width': 6},
   {'name': 'Time',
    'format': 'F10.4',
    'unit': 'msec',
    'null': '-999.9999',
    'long_name': 'Time',
    'comment': '',
    'cols': 1,
    'width': 10},
   {'name': 'Tx_Current',
    'format': 'F13.5',
    'unit': 'Amp',
    'null': '-99999.99999',
    'long_name': 'Tx_Current',
    'comment': '',
    'cols': 1,
    'width': 13}],
  'format': None}}

You can also get this metadata as a DataFrame:

In [25]:
gdf.record_types.df()

,record_type,name,format,unit,null,long_name,comment,cols,width
0,,FLTNUM,F10.1,,,FlightNumber,,1,10
1,,Rx_Voltage,F10.5,Volt,-99.99999,Rx_Voltage,,1,10
2,,Flight,I6,,-9999.0,Flight,,1,6
3,,Time,F10.4,msec,-999.9999,Time,,1,10
4,,Tx_Current,F13.5,Amp,-99999.99999,Tx_Current,,1,13
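The `null` entries in the metadata are the sentinel values that mark missing data for each field. Whether `aseg_gdf2` masks them for you is not shown in this notebook, so the following is only a sketch of doing it by hand. The `raw` values and the `mask_nulls` helper are hypothetical; only the field metadata is copied from the output above.

```python
import math

# Field metadata copied from gdf.record_types above.
field = {"name": "Rx_Voltage", "format": "F10.5", "null": "-99.99999"}

# Hypothetical raw values, as they might be parsed from the data table.
raw = [-0.0, 0.00123, -99.99999, 0.00456]


def mask_nulls(values, null):
    """Replace the field's null sentinel with NaN; leave other values as-is."""
    if null is None:
        # Fields such as FLTNUM declare no null value at all.
        return list(values)
    sentinel = float(null)
    return [math.nan if math.isclose(v, sentinel) else v for v in values]


masked = mask_nulls(raw, field["null"])
print(masked)
```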


You can get the data for a single field (column) as either a pandas Series or an ndarray:

In [26]:
gdf.get_field_data('Time')

array([5.20000e-03, 1.04000e-02, 1.56000e-02, ..., 5.99844e+01,
       5.99896e+01, 5.99948e+01])

What about fields which are 2D arrays? 

Some GDF2 data files have fields with more than one value per row/record. 

Let's load a different example where this is the case.

In [46]:
gdf = aseg_gdf2.read(repo / r"tests/example_datasets/9a13704a/Mugrave_WB_MGA52.dfn")

In [47]:
gdf.record_types.df()

,record_type,name,format,unit,null,long_name,comment,cols,width
0,COMM,RT,A4,,,,,1,4
1,COMM,COMMENTS,A76,,,,,1,76
0,,GA_Project,I10,,,,Geoscience Australia airborne survey project n...,1,10
1,,Job_No,I10,,,,SkyTEM Australia Job Number,1,10
2,,Fiducial,F15.2,,,,Fiducial,1,15
3,,DATETIME,F18.10,days,,,Decimal days since midnight December 31st 1899,1,18
4,,LINE,I10,,,,Line number,1,10
5,,Easting,F12.2,m,-9999999.99,,Easting (GDA94 MGA Zone 52),1,12
6,,NORTH,F15.2,m,-9999999999.99,,Northing (GDA 94 MGA Zone 52),1,15
7,,DTM_AHD,F10.2,,-99999.99,,Digital terrain model (AUSGeoid09 datum),1,10


In [49]:
print(gdf.record_types.df()[["name", "unit", "format", "cols"]])

          name  unit   format  cols
0           RT             A4     1
1     COMMENTS            A76     1
0   GA_Project            I10     1
1       Job_No            I10     1
2     Fiducial          F15.2     1
3     DATETIME  days   F18.10     1
4         LINE            I10     1
5      Easting     m    F12.2     1
6        NORTH     m    F15.2     1
7      DTM_AHD          F10.2     1
8        RESI1          F10.3     1
9       HEIGHT     m    F10.2     1
10      INVHEI     m    F10.2     1
11         DOI     m    F10.2     1
12        Elev     m  30F12.2    30
13         Con  mS/m  30F15.5    30
14     Con_doi  mS/m  30F15.5    30
15        RUnc        30F12.3    30


See those last four fields? They have 30 columns each, so each one is a 2D array (one row per record, 30 values per row).
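The `format`, `cols` and `width` values in the tables above are related: a leading repeat count in the format string (the `30` in `30F12.2`) corresponds to `cols`, and the number after the type letter corresponds to `width`. The sketch below parses such strings with a regular expression. The `parse_format` helper is hypothetical, not part of `aseg_gdf2`, and it only handles the simple descriptors that appear in this notebook.

```python
import re

# Matches Fortran-style descriptors such as "I6", "A76", "F10.5" or "30F12.2".
FORMAT_RE = re.compile(r"^(\d*)([AIFED])(\d+)(?:\.(\d+))?$", re.IGNORECASE)


def parse_format(fmt):
    """Split a format string into repeat count, type code, width and decimals."""
    m = FORMAT_RE.match(fmt.strip())
    if m is None:
        raise ValueError(f"unrecognised format: {fmt!r}")
    repeat, code, width, decimals = m.groups()
    return {
        "cols": int(repeat) if repeat else 1,  # no repeat count means 1 column
        "type": code.upper(),
        "width": int(width),
        "decimals": int(decimals) if decimals else None,
    }


for fmt in ["I10", "F15.2", "30F12.2"]:
    print(fmt, parse_format(fmt))
```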

They are still normal GDF fields:

In [29]:
gdf.field_names()

['GA_Project',
 'Job_No',
 'Fiducial',
 'DATETIME',
 'LINE',
 'Easting',
 'NORTH',
 'DTM_AHD',
 'RESI1',
 'HEIGHT',
 'INVHEI',
 'DOI',
 'Elev',
 'Con',
 'Con_doi',
 'RUnc']

In the data table file, however, each of them is stored as 30 separate columns. You can see this by explicitly requesting a listing of the column names:

In [30]:
gdf.column_names()

['GA_Project',
 'Job_No',
 'Fiducial',
 'DATETIME',
 'LINE',
 'Easting',
 'NORTH',
 'DTM_AHD',
 'RESI1',
 'HEIGHT',
 'INVHEI',
 'DOI',
 'Elev[0]',
 'Elev[1]',
 'Elev[2]',
 'Elev[3]',
 'Elev[4]',
 'Elev[5]',
 'Elev[6]',
 'Elev[7]',
 'Elev[8]',
 'Elev[9]',
 'Elev[10]',
 'Elev[11]',
 'Elev[12]',
 'Elev[13]',
 'Elev[14]',
 'Elev[15]',
 'Elev[16]',
 'Elev[17]',
 'Elev[18]',
 'Elev[19]',
 'Elev[20]',
 'Elev[21]',
 'Elev[22]',
 'Elev[23]',
 'Elev[24]',
 'Elev[25]',
 'Elev[26]',
 'Elev[27]',
 'Elev[28]',
 'Elev[29]',
 'Con[0]',
 'Con[1]',
 'Con[2]',
 'Con[3]',
 'Con[4]',
 'Con[5]',
 'Con[6]',
 'Con[7]',
 'Con[8]',
 'Con[9]',
 'Con[10]',
 'Con[11]',
 'Con[12]',
 'Con[13]',
 'Con[14]',
 'Con[15]',
 'Con[16]',
 'Con[17]',
 'Con[18]',
 'Con[19]',
 'Con[20]',
 'Con[21]',
 'Con[22]',
 'Con[23]',
 'Con[24]',
 'Con[25]',
 'Con[26]',
 'Con[27]',
 'Con[28]',
 'Con[29]',
 'Con_doi[0]',
 'Con_doi[1]',
 'Con_doi[2]',
 'Con_doi[3]',
 'Con_doi[4]',
 'Con_doi[5]',
 'Con_doi[6]',
 'Con_doi[7]',
 'Con_doi[8]',
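The naming pattern in this listing (a field with more than one column becomes `name[0]`, `name[1]`, and so on) can be reproduced with a few lines of Python. This is a sketch of the convention as it appears in the output above; the `expand_column_names` helper is hypothetical and is not the library's own code.

```python
def expand_column_names(fields):
    """Return one column name per physical column.

    `fields` is a list of (name, cols) pairs. Fields with a single column keep
    their name; multi-column fields get an index suffix such as "Elev[0]".
    """
    names = []
    for name, cols in fields:
        if cols == 1:
            names.append(name)
        else:
            names.extend(f"{name}[{i}]" for i in range(cols))
    return names


# A subset of the fields from the metadata table above.
fields = [("DOI", 1), ("Elev", 30), ("Con", 30)]
columns = expand_column_names(fields)
print(len(columns), columns[:3], columns[-1])
```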


These are represented as you'd expect in the data table's DataFrame object:

In [31]:
gdf.df().head()

,GA_Project,Job_No,Fiducial,DATETIME,LINE,Easting,NORTH,DTM_AHD,RESI1,HEIGHT,...,RUnc[20],RUnc[21],RUnc[22],RUnc[23],RUnc[24],RUnc[25],RUnc[26],RUnc[27],RUnc[28],RUnc[29]
0,1288,10013,3621109.0,42655.910984,112601,948001.6,7035223.1,354.1,1.091,40.98,...,1.39,1.76,2.35,3.26,4.45,5.74,6.94,8.0,8.99,98.0
1,1288,10013,3621110.0,42655.910995,112601,948001.9,7035196.8,353.8,1.101,41.08,...,1.43,1.84,2.47,3.41,4.62,5.9,7.09,8.15,9.15,98.0
2,1288,10013,3621111.0,42655.911007,112601,948001.5,7035169.5,353.7,0.813,41.03,...,1.45,1.88,2.53,3.48,4.7,5.97,7.16,8.22,9.21,98.0
3,1288,10013,3621112.0,42655.911019,112601,948000.6,7035141.6,353.9,0.567,40.79,...,1.45,1.87,2.53,3.49,4.71,5.98,7.16,8.21,9.2,98.0
4,1288,10013,3621113.0,42655.91103,112601,947999.1,7035113.6,354.2,0.522,40.37,...,1.45,1.88,2.54,3.52,4.74,6.01,7.18,8.23,9.22,98.0


We can get the data in exactly the same way as for a normal "column" field:

In [32]:
gdf.get_field_data("Elev")

array([[ 354.1,  352.1,  349.8, ..., -105.8, -171.2, -245.7],
       [ 353.8,  351.8,  349.5, ..., -106.1, -171.5, -246. ],
       [ 353.7,  351.7,  349.4, ..., -106.2, -171.6, -246.1],
       ...,
       [ 510.5,  508.5,  506.2, ...,   50.6,  -14.8,  -89.3],
       [ 510.5,  508.5,  506.2, ...,   50.6,  -14.8,  -89.3],
       [ 510.6,  508.6,  506.3, ...,   50.7,  -14.7,  -89.2]])
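To make the link between the flattened columns and the 2D result concrete, here is a small sketch that regroups `Elev[0]`, `Elev[1]`, ... back into one list per record. The `rows` data is a hypothetical three-column excerpt (using the first values shown above) and `field_as_2d` is an illustrative helper, not the method `aseg_gdf2` uses internally.

```python
import re

# Hypothetical rows with a 3-column "Elev" field, flattened the same way as
# the column names listed earlier.
rows = [
    {"LINE": 112601, "Elev[0]": 354.1, "Elev[1]": 352.1, "Elev[2]": 349.8},
    {"LINE": 112601, "Elev[0]": 353.8, "Elev[1]": 351.8, "Elev[2]": 349.5},
]


def field_as_2d(rows, field):
    """Collect the columns "field[0]", "field[1]", ... into one list per row."""
    pattern = re.compile(rf"^{re.escape(field)}\[(\d+)\]$")
    result = []
    for row in rows:
        indexed = []
        for key, value in row.items():
            m = pattern.match(key)
            if m:
                indexed.append((int(m.group(1)), value))
        # Sort by the numeric index so that e.g. [10] does not sort before [2].
        result.append([value for _, value in sorted(indexed)])
    return result


elev = field_as_2d(rows, "Elev")
print(elev)
```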

We can also get a combination of ordinary column fields and 2D fields:

In [34]:
data = gdf.get_fields_data(["Easting", "NORTH", "Elev"])
data

(array([948001.6, 948001.9, 948001.5, 948000.6, 947999.1, 947997.2,
        947995.1, 947993.4, 947992.5, 947992.5, 947993.3, 947994.7,
        947996. , 947997.1, 947997.8, 947997.9, 800001.6, 800002.4,
        800003. , 800003.5, 800003.5, 800003.3, 800002.9, 800002.8,
        800002.8, 800003.1, 800003.7, 800004.1, 800004.3, 800004.5,
        800004.4, 800004.2, 800004.1, 800004.1, 800003.9, 800003.7,
        800003.3, 800002.6]),
 array([7035223.1, 7035196.8, 7035169.5, 7035141.6, 7035113.6, 7035085.9,
        7035058.5, 7035031.3, 7035004.2, 7034976.6, 7034948.3, 7034919.2,
        7034889.4, 7034859. , 7034828.4, 7034797.9, 7029884.1, 7029855.3,
        7029826.9, 7029798.6, 7029770.1, 7029741.5, 7029712.8, 7029684.3,
        7029656.1, 7029628.1, 7029600.1, 7029572. , 7029543.8, 7029515.5,
        7029487.4, 7029459.7, 7029432.1, 7029404.5, 7029376.8, 7029348.7,
        7029320.2, 7029291.4]),
 array([[ 354.1,  352.1,  349.8, ..., -105.8, -171.2, -245.7],
        [ 353.8,  351.8

Under the hood, this works by reading only the requested columns from the file, using pandas' [`usecols` keyword argument](https://pandas.pydata.org/pandas-docs/version/0.22/generated/pandas.read_fwf.html).
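The idea behind selecting columns from a fixed-width file can be sketched without pandas. The example below is a pure-Python imitation, not pandas' implementation: it turns the field widths from the first example's metadata into character offsets and slices only the requested fields out of a line. The `read_columns` helper and the sample `line` are hypothetical.

```python
from itertools import accumulate

# Field names and widths copied from the first example's metadata.
names = ["FLTNUM", "Rx_Voltage", "Flight", "Time", "Tx_Current"]
widths = [10, 10, 6, 10, 13]

# Turn the widths into (start, stop) character offsets for each column.
stops = list(accumulate(widths))
starts = [0] + stops[:-1]
colspecs = dict(zip(names, zip(starts, stops)))


def read_columns(line, usecols):
    """Slice only the requested columns out of one fixed-width line."""
    return {name: line[slice(*colspecs[name])].strip() for name in usecols}


# A hypothetical record laid out with the widths above.
line = f"{1.0:10.1f}{-0.0:10.5f}{1:6d}{0.0052:10.4f}{0.00176:13.5f}"
selected = read_columns(line, ["Time", "Tx_Current"])
print(selected)
```

Columns that are not requested are never converted or stored, which is what keeps memory use down when only a few fields of a wide table are needed.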