# GPS from raw files

## pyEcholab

The [`pyEcholab`](https://github.com/CI-CMG/pyEcholab.git) toolkit, developed by NOAA for reading echosounder data, can pull NMEA data from EK60 raw files. You can install it directly from the repository:

```python
!pip install git+https://github.com/CI-CMG/pyEcholab.git
```

This notebook is set up to use pyEcholab rather than [`echopype`](https://echopype.readthedocs.io/en/latest/index.html), which goes through an intermediate step of building a netCDF structure. An example of pulling the lat/lon with `echopype` will be added at the bottom of the notebook at some point.

The downside of the setup below is that it doesn't write the file until the loop has finished, but that is the trade-off for keeping each iteration under 2 s. Building the file inside the loop makes each iteration progressively slower, because the existing file has to be loaded and the dataframe is essentially read twice at every step.
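
If having partial output on disk matters, one possible alternative (not what this notebook does) is to append each file's rows to the CSV as they are produced, which avoids re-reading what was already written. The sketch below uses only the standard-library `csv` module; `append_rows`, the temporary output path, and the sample rows are all hypothetical.

```python
import csv
import os
import tempfile


def append_rows(path, rows, fieldnames):
    """Append a list of dict rows to a CSV, writing the header only once."""
    # The header is needed only when the file is missing or still empty.
    write_header = not os.path.exists(path) or os.path.getsize(path) == 0
    with open(path, 'a', newline='') as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        if write_header:
            writer.writeheader()
        writer.writerows(rows)


fields = ['ping_time', 'latitude', 'longitude']
out_path = os.path.join(tempfile.mkdtemp(), 'gps_partial.csv')

# Two made-up batches standing in for the positions from two raw files.
batches = [
    [{'ping_time': '2019-08-01T00:00:00.000', 'latitude': 47.0, 'longitude': -122.0}],
    [{'ping_time': '2019-08-01T00:00:01.000', 'latitude': 47.1, 'longitude': -122.1},
     {'ping_time': '2019-08-01T00:00:02.000', 'latitude': 47.2, 'longitude': -122.2}],
]
for batch in batches:
    append_rows(out_path, batch, fields)

# Read the file back to confirm that every batch was kept.
with open(out_path, newline='') as f:
    saved = list(csv.DictReader(f))
print(len(saved))
```

Because the file is opened in append mode, each write only adds the new rows and the cost per iteration does not grow with the size of the file.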

In [25]:
from glob import glob
from echolab2.instruments import EK60
import pandas as pd
import matplotlib.pyplot as plt
from tqdm.notebook import trange, tqdm
%matplotlib inline
import numpy as np
import csv
import os

An example of what the `positions` dictionary produced by pyEcholab looks like:

In [16]:
positions

{'latitude': array([nan, nan, nan, ..., nan, nan, nan]),
 'longitude': array([nan, nan, nan, ..., nan, nan, nan]),
 'ping_time': array(['2019-07-02T19:31:41.346', '2019-07-02T19:31:43.395',
        '2019-07-02T19:31:44.424', ..., '2019-07-21T17:52:34.230',
        '2019-07-21T17:52:34.408', '2019-07-21T17:52:34.588'],
       dtype='datetime64[ms]')}
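
One plausible reason for `nan` entries is that some ping times fall outside the span of the available GPS fixes. The following is a minimal sketch of that idea using `np.interp` on made-up data; it is not pyEcholab's implementation, and the fix times, ping times, and coordinates are invented for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical GPS fixes, one every 2 s.
fix_time = np.array(['2019-07-02T19:31:42', '2019-07-02T19:31:44',
                     '2019-07-02T19:31:46'], dtype='datetime64[ms]')
fix_lat = np.array([47.000, 47.002, 47.004])
fix_lon = np.array([-122.000, -122.002, -122.004])

# Hypothetical ping times, one per second, starting before the first fix
# and ending after the last one.
ping_time = np.arange(np.datetime64('2019-07-02T19:31:41', 'ms'),
                      np.datetime64('2019-07-02T19:31:48', 'ms'),
                      np.timedelta64(1000, 'ms'))

# Interpolate on integer milliseconds; pings outside the fixes become NaN.
t_fix = fix_time.astype('int64')
t_ping = ping_time.astype('int64')
positions = {
    'latitude': np.interp(t_ping, t_fix, fix_lat, left=np.nan, right=np.nan),
    'longitude': np.interp(t_ping, t_fix, fix_lon, left=np.nan, right=np.nan),
    'ping_time': ping_time,
}

# A dictionary of equal-length arrays converts directly to a dataframe,
# and dropna() removes the pings that have no position.
track = pd.DataFrame(positions).dropna()
print(track)
```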

One way to do this uses only pandas: build a list of dataframes, one per file, and combine them at the end.

In [7]:
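# Raw string so the backslashes in the Windows path are not treated as escapes
rawfiles = sorted(glob(r'D:\AIESII\OS201901\EK60_Data\*.raw'))
df_list = []
for i in trange(10):  # first 10 files only
    ek60 = EK60.EK60()
    ek60.read_raw(rawfiles[i])
    rawd = ek60.get_raw_data(channel_number=1)
    sv = rawd.get_Sv()
    # Interpolate the NMEA position fixes onto the ping times of sv
    positions = ek60.nmea_data.interpolate(sv, 'position')
    dfGPS = pd.DataFrame(positions)
    df_list.append(dfGPS)
df = pd.concat(df_list)  # combine once, after the loop
df.to_csv('test.csv')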

A slightly faster method uses `dask.delayed` to defer building each `pd.DataFrame`, rather than converting the `positions` dictionary to a dataframe inside the loop:

In [11]:
import dask
# Raw strings keep the backslashes in the Windows paths literal
rawfiles = sorted(glob(r'D:\AIESII\OS201901\EK60_Data\*.raw'))
df_list = []
for i in trange(2000, len(rawfiles)):
    try:
        ek60 = EK60.EK60()
        ek60.read_raw(rawfiles[i])
        rawd = ek60.get_raw_data(channel_number=1)
        sv = rawd.get_Sv()
        positions = ek60.nmea_data.interpolate(sv, 'position')
        # Defer building the dataframe until compute() is called
        dfGPS = dask.delayed(pd.DataFrame)(positions)
        df_list.append(dfGPS)
    except Exception:
        # Report the file that could not be read and carry on
        print(rawfiles[i])
df = dask.delayed(pd.concat)(df_list).compute()
df.to_csv(r'D:\AIESII\OS201901\gps\gps3.csv')

D:\AIESII\OS201901\EK60_Data\OceanStarr_2019-D20190825-T104936.raw
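
Printing the failed path works for a single file. A variation, sketched here with a hypothetical `reader` function standing in for the pyEcholab calls, is to collect the failures in a list so they can be reprocessed or exported by hand afterwards. The file names and the error message are made up.

```python
def collect_with_failures(paths, reader):
    """Apply reader to each path, keeping results and failures separately."""
    results, failed = [], []
    for path in paths:
        try:
            results.append(reader(path))
        except Exception as exc:
            # Keep the path and the reason so the file can be revisited later.
            failed.append((path, repr(exc)))
    return results, failed


def fake_reader(path):
    """Hypothetical stand-in for reading one raw file."""
    if 'bad' in path:
        raise ValueError('unreadable datagram')
    return {'file': path, 'n_pings': 3}


results, failed = collect_with_failures(['a.raw', 'bad.raw', 'c.raw'], fake_reader)
print(len(results), failed)
```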



When running this, I split the work into three parts because I wasn't sure how long it would take and was originally worried it would fail. One file (printed above by the exception handler) could not be read, so I exported its GPS data manually from Echoview and formatted the spreadsheet to match the headers. I then read all the parts back into one big dataframe, dropped the points with missing lat/lon, dropped the index column, and sorted by ping time. The result is saved in the final data format.

In [9]:
# Raw strings keep the backslashes in the Windows paths literal
df1 = pd.read_csv(r'D:\AIESII\OS201901\gps\gps1.csv', parse_dates=['ping_time']).dropna()
df2 = pd.read_csv(r'D:\AIESII\OS201901\gps\gps2.csv', parse_dates=['ping_time']).dropna()
df3 = pd.read_csv(r'D:\AIESII\OS201901\gps\gps3.csv', parse_dates=['ping_time']).dropna()
df4 = pd.read_csv(r'D:\AIESII\OS201901\gps\gpsFail.csv', parse_dates=['ping_time']).dropna()
dft = pd.concat([df1, df2, df3, df4], axis=0, sort=True)
# Keep only the three data columns, which drops the saved index column
dft = dft[['ping_time', 'latitude', 'longitude']]
dft = dft.sort_values('ping_time')
dft = dft[dft.ping_time > '2019-08-01']
display(dft.head())
dft.to_csv(r'D:\AIESII\OS201901\gps\OS201901_EK60GPS.csv', index=False, float_format="%.6f")

| Unnamed: 0 | ping_time | latitude | longitude |
|---|---|---|---|
| 4169 | 2019-07-19 19:36:56.394 | 47.666748 | -122.391758 |
| 4170 | 2019-07-19 19:36:57.414 | 47.666748 | -122.391758 |
| 4171 | 2019-07-19 19:36:58.434 | 47.666749 | -122.391758 |
| 4172 | 2019-07-19 19:36:59.454 | 47.666751 | -122.391759 |
| 4173 | 2019-07-19 19:37:00.475 | 47.666751 | -122.39176 |


Further resampling can be done before saving the file.
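
As a rough sketch of what such resampling could look like, the example below averages a made-up 1 Hz track into 1-minute bins with `DataFrame.resample`. The track values and the 1-minute interval are assumptions chosen for illustration; the column names and the `float_format` follow the CSV written above.

```python
import numpy as np
import pandas as pd

# Made-up 1 Hz track covering two minutes.
ping_time = pd.date_range('2019-08-01 00:00:00', periods=120,
                          freq=pd.Timedelta(seconds=1))
track = pd.DataFrame({
    'ping_time': ping_time,
    'latitude': np.linspace(47.0, 47.0119, 120),
    'longitude': np.linspace(-122.0, -122.0119, 120),
})

# Average the positions within each 1-minute bin.
resampled = (track.set_index('ping_time')
                  .resample('1min')
                  .mean()
                  .reset_index())

# With no path given, to_csv returns the CSV text instead of writing a file.
csv_text = resampled.to_csv(index=False, float_format='%.6f')
print(csv_text)
```

Averaging is only one choice; taking the first fix in each bin (`.first()`) would keep real recorded positions instead of means.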