# Sonde3 
##  Reads and converts binary water quality environmental instrument data to a DataFrame


#### I.  Example Usage
Import the packages we need to interact with `sonde3`:

In [1]:
import sonde3
import pandas     

Let's dive in!

We have an example water quality instrument binary file, `"tests/ysi_test_files/SA08.dat"`.  This file was generated by a YSI 600LS instrument and is in a proprietary binary format.

#### Using the `sonde()` function we:

1.  `autodetect()` the file type and pass the file to the correct parser function
2.  `read_ysi()` the binary file and convert it to a pandas DataFrame
3.  Transform all datetimes to the UTC timezone
4.  Standardize the units to metric and rename the columns to standard naming conventions
5.  Pass the DataFrame to `calculate_salinity_psu()` and `calculate_do_mgl()` to apply standard formulas that generate the salinity and dissolved oxygen columns
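
The first step, routing a detected format to its parser, can be pictured as a lookup table. The sketch below is only an illustration of that idea, not `sonde3`'s actual code: the parser functions are placeholders, and the `"ysi_binary"` format string is an assumed name (`"greenspan_xls"` and the `unsupported_` prefix are format strings that appear in the autodetect output later in this document).

```python
def parse_ysi_binary(path):
    """Placeholder parser; a real one would decode the binary file."""
    return {"parser": "ysi_binary", "path": path}


def parse_greenspan_xls(path):
    """Placeholder parser for a spreadsheet export."""
    return {"parser": "greenspan_xls", "path": path}


# Hypothetical mapping from a detected format string to its parser function.
PARSERS = {
    "ysi_binary": parse_ysi_binary,
    "greenspan_xls": parse_greenspan_xls,
}


def dispatch(path, detected_format):
    """Call the parser registered for detected_format, or fail loudly."""
    if detected_format.startswith("unsupported"):
        raise ValueError(f"{path}: format {detected_format!r} is not supported")
    parser = PARSERS.get(detected_format)
    if parser is None:
        raise ValueError(f"{path}: no parser registered for {detected_format!r}")
    return parser(path)


print(dispatch("tests/ysi_test_files/SA08.dat", "ysi_binary"))
```

Keeping the mapping in one dictionary means a new file format only needs one new entry rather than another branch in an `if` chain.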

In [2]:
metadata, df = sonde3.sonde("tests/ysi_test_files/SA08.dat")


  metadata, df = formats.read_ysi(filename, tzinfo)
  Rtx = (rt) ** 0.5


#### Why the runtime warnings?

1.  The YSI instrument files don't contain any timezone information.  The function therefore has to assume a timezone for the file in order to make the UTC conversion.

2.  Raw instrument files often contain impossible or incorrect values at the beginning and end of the file, for example negative values for salinity or dissolved oxygen percentage.  `sonde3` does not trim the raw file or perform any QA analysis; it passes the values through exactly as the instrument recorded them.
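
Because `sonde3` leaves quality control to the user, a first pass might simply separate out rows with physically impossible values. The following is a minimal plain-Python sketch of that idea, not part of `sonde3`: the function name and the sample rows are hypothetical, and the column names are taken from the DataFrame shown in this document.

```python
import math


def flag_impossible(rows, columns=("water_salinity_PSU", "water_DO_%")):
    """Split rows into (kept, flagged).

    A row is kept only if every listed column is present, is not NaN,
    and is non-negative; everything else is flagged for review.
    """
    kept, flagged = [], []
    for row in rows:
        values = [row.get(name) for name in columns]
        ok = all(v is not None and not math.isnan(v) and v >= 0 for v in values)
        (kept if ok else flagged).append(row)
    return kept, flagged


# Hypothetical readings: one plausible, one negative, one missing (NaN).
rows = [
    {"water_salinity_PSU": 11.6, "water_DO_%": 93.1},
    {"water_salinity_PSU": -0.2, "water_DO_%": 96.8},
    {"water_salinity_PSU": float("nan"), "water_DO_%": 103.5},
]
kept, flagged = flag_impossible(rows)
print(len(kept), len(flagged))
```

Flagging rather than deleting keeps the raw record intact, which matches the way `sonde3` itself passes values through untouched.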

##### We can now interact with the two dataframes produced by `sonde3`:


In [3]:
metadata

| | Instrument_Type | Manufacturer | System_Signal | Program_Version | Instrument_Serial_Number | Site | Logging_Interval | Begin_Log_Time_(UTC) | First_Sample_Time_(UTC) |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 600 | YSI | 870489733 | 306 | 1012 | SANT_CDT | 3600 | 2008-07-16 12:00:00+00:00 | 2008-07-16 12:00:31+00:00 |


In [4]:
df.info()  

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 700 entries, 0 to 699
Data columns (total 8 columns):
datetime_(UTC)                700 non-null datetime64[ns, UTC]
water_temp_c                  700 non-null float64
water_conductivity_mS/cm      700 non-null float64
water_depth_m_nonvented       700 non-null float64
water_DO_%                    700 non-null float64
instrument_battery_voltage    700 non-null float64
water_salinity_PSU            678 non-null float64
water_DO_mgl                  678 non-null float64
dtypes: datetime64[ns, UTC](1), float64(7)
memory usage: 43.8 KB


In [5]:
df.head() 

| | datetime_(UTC) | water_temp_c | water_conductivity_mS/cm | water_depth_m_nonvented | water_DO_% | instrument_battery_voltage | water_salinity_PSU | water_DO_mgl |
|---|---|---|---|---|---|---|---|---|
| 0 | 2008-07-16 12:00:31+00:00 | 28.998718 | 3.7e-05 | 0.010862 | 93.391418 | 6.09375 | 0.013536 | 7.18342 |
| 1 | 2008-07-16 13:00:31+00:00 | 28.482361 | 5.9e-05 | 0.016358 | 96.765137 | 6.09375 | 0.013326 | 7.510631 |
| 2 | 2008-07-16 14:00:31+00:00 | 27.257385 | 0.000546 | 0.017263 | 103.529358 | 6.09375 | 0.012655 | 8.212117 |
| 3 | 2008-07-16 15:00:31+00:00 | 29.507751 | 21.301758 | 0.542648 | 93.055725 | 6.09375 | 11.601472 | 6.655432 |
| 4 | 2008-07-16 16:00:31+00:00 | 29.762268 | 21.454102 | 0.557098 | 94.18869 | 6.09375 | 11.631321 | 6.706414 |


#### II.  Working with time zones


What if the data were collected outside of the US/Central time zone?  Pass the timezone to `sonde3.sonde` as the second argument:

In [6]:
import pytz


metadata, df = sonde3.sonde("tests/ysi_test_files/SA08.dat", pytz.timezone('US/Eastern'))


  Rtx = (rt) ** 0.5


In [7]:
df.head()

| | datetime_(UTC) | water_temp_c | water_conductivity_mS/cm | water_depth_m_nonvented | water_DO_% | instrument_battery_voltage | water_salinity_PSU | water_DO_mgl |
|---|---|---|---|---|---|---|---|---|
| 0 | 2008-07-16 11:05:31+00:00 | 28.998718 | 3.7e-05 | 0.010862 | 93.391418 | 6.09375 | 0.013536 | 7.18342 |
| 1 | 2008-07-16 12:05:31+00:00 | 28.482361 | 5.9e-05 | 0.016358 | 96.765137 | 6.09375 | 0.013326 | 7.510631 |
| 2 | 2008-07-16 13:05:31+00:00 | 27.257385 | 0.000546 | 0.017263 | 103.529358 | 6.09375 | 0.012655 | 8.212117 |
| 3 | 2008-07-16 14:05:31+00:00 | 29.507751 | 21.301758 | 0.542648 | 93.055725 | 6.09375 | 11.601472 | 6.655432 |
| 4 | 2008-07-16 15:05:31+00:00 | 29.762268 | 21.454102 | 0.557098 | 94.18869 | 6.09375 | 11.631321 | 6.706414 |
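
To illustrate the conversion itself, here is a minimal standard-library sketch of attaching an assumed zone to a naive timestamp and converting it to UTC. It is not `sonde3`'s implementation: the fixed UTC-5 offset (Central Daylight Time), the function name, and the sample local reading are all assumptions chosen for illustration.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the logger clock was set to Central Daylight Time (UTC-5).
ASSUMED_ZONE = timezone(timedelta(hours=-5), name="CDT")


def to_utc(naive_local, assumed_zone=ASSUMED_ZONE):
    """Attach the assumed zone to a naive timestamp, then convert to UTC."""
    if naive_local.tzinfo is not None:
        raise ValueError("expected a naive timestamp without tzinfo")
    return naive_local.replace(tzinfo=assumed_zone).astimezone(timezone.utc)


sample = datetime(2008, 7, 16, 7, 0, 31)  # hypothetical local reading
print(to_utc(sample).isoformat())  # 2008-07-16T12:00:31+00:00
```

A fixed offset keeps the sketch dependency-free, but it ignores daylight saving transitions; a named zone such as `US/Central` handles those.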


### Autodetecting files

Curious about what kind of instrument files you have in a directory?  Call the `sonde3.autodetect` function on each file:

In [8]:

sonde3.autodetect("tests/greenspan_test_files/RIOA_20060718_CDT_GS7837.xls") 

'greenspan_xls'

In [9]:
# Walk through all of the test examples and record the autodetect result for each file
import os

root_dir = 'tests'
results = []
for directory, subdirectories, files in os.walk(root_dir):
    for file in files:
        # Skip the text fixtures that are not instrument files
        if "_test.txt" in file:
            continue
        # Build the full path once and reuse it
        path = os.path.join(directory, file)
        results.append(path + ' ' + sonde3.autodetect(path))

results

['tests\\format_tests.py unsupported_ascii',
 'tests\\sonde_tests.py unsupported_ascii',
 'tests\\test_file_example.txt unsupported_ascii',
 'tests\\ysi_tests.py unsupported_ascii',
 'tests\\__init__.py unsupported_ascii',
 'tests\\espey_test_files\\BZ3L_ALL.csv espey_csv',
 'tests\\eureka_test_files\\JARD_20070222_CST_EU7396.xls eureka_xls',
 'tests\\eureka_test_files\\JARD_20070404_CDT_EU7396.xls unsupported_xls',
 'tests\\eureka_test_files\\JARD_20070425_CDT_EU7396.xls unsupported_xls',
 'tests\\eureka_test_files\\JDM2_20060808_CDT_EU0312.csv unsupported_ascii',
 'tests\\eureka_test_files\\JDM2_20060919_CDT_EU0312.csv unsupported_csv',
 'tests\\eureka_test_files\\JDM2_20070410_CST_EU0312.csv unsupported_ascii',
 'tests\\eureka_test_files\\JDM4_20060919_CDT_EU0313.csv unsupported_csv',
 'tests\\eureka_test_files\\MCF1_20060322_CST_EU0096.csv unsupported_ascii',
 'tests\\eureka_test_files\\MCF1_20060807_CDT_EU0097.csv eureka_csv',
 'tests\\eureka_test_files\\MCF1_20061101_CDT_EU0097.c
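
A long result list like this is easier to read as a tally per detected format. The sketch below is one way to do that with the standard library; the helper name is hypothetical, and the sample strings are copied from the output above rather than produced by running `sonde3`.

```python
from collections import Counter


def count_formats(results):
    """Tally detected formats from 'path format' strings.

    The format is the last space-separated token, so paths that
    contain spaces are still handled correctly.
    """
    return Counter(line.rsplit(" ", 1)[1] for line in results)


sample = [
    "tests\\espey_test_files\\BZ3L_ALL.csv espey_csv",
    "tests\\eureka_test_files\\JARD_20070222_CST_EU7396.xls eureka_xls",
    "tests\\eureka_test_files\\JARD_20070404_CDT_EU7396.xls unsupported_xls",
    "tests\\eureka_test_files\\JARD_20070425_CDT_EU7396.xls unsupported_xls",
]

for fmt, n in count_formats(sample).most_common():
    print(fmt, n)
```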

### Generating Salinity and Dissolved Oxygen

Deployed water quality instruments typically do not compute every parameter internally.  Instead, the derived columns are calculated by the program used to read the file back at the lab.  For example, YSI instruments do not compute salinity or dissolved oxygen concentration.

Let's read the raw binary example file `"tests/ysi_test_files/SA08.dat"` and see what it contains:
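
As an illustration of what such a derived calculation looks like, here is a sketch of the PSS-78 practical salinity equation, which computes salinity from conductivity and temperature. Whether `sonde3` uses exactly this form is an assumption; the function name is hypothetical, and the pressure correction is ignored, which is only reasonable for shallow deployments. The square root of the conductivity ratio is presumably also why negative raw readings trigger the `Rtx = (rt) ** 0.5` warning, so the sketch returns NaN in that case.

```python
import math

# PSS-78 coefficients (UNESCO 1983), listed in order of increasing power.
A = (0.0080, -0.1692, 25.3851, 14.0941, -7.0261, 2.7081)
B = (0.0005, -0.0056, -0.0066, -0.0375, 0.0636, -0.0144)
C = (0.6766097, 2.00564e-2, 1.104259e-4, -6.9698e-7, 1.0031e-9)
K = 0.0162
C_STD = 42.914  # mS/cm: conductivity of standard seawater (S=35, 15 degC, surface)


def practical_salinity(cond_ms_cm, temp_c):
    """Practical salinity from conductivity (mS/cm) and temperature (degC).

    Assumes surface pressure, so the pressure term of PSS-78 is taken as 1.
    """
    # Temperature dependence of standard seawater conductivity
    rt = sum(c * temp_c ** i for i, c in enumerate(C))
    ratio = cond_ms_cm / (C_STD * rt)
    if ratio < 0:
        return float("nan")  # negative conductivity has no real square root
    root = math.sqrt(ratio)
    d_t = temp_c - 15.0
    base = sum(a * root ** i for i, a in enumerate(A))
    corr = d_t / (1.0 + K * d_t) * sum(b * root ** i for i, b in enumerate(B))
    return base + corr


# Conductivity and temperature from one of the rows shown in this document
print(practical_salinity(21.301758, 29.507751))
```

For that row the result lands close to the roughly 11.6 PSU reported in the `water_salinity_PSU` column. PSS-78 is only defined for salinities between 2 and 42, so values computed for near-zero conductivity should not be trusted.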

In [10]:
metadata, SA08_BIN = sonde3.read_ysi("tests/ysi_test_files/SA08.dat", pytz.timezone('US/Central'))
SA08_BIN.head()

| | datetime_(UTC) | water_temp_c | water_conductivity_mS/cm | water_depth_m_nonvented | water_DO_% | instrument_battery_voltage |
|---|---|---|---|---|---|---|
| 0 | 2008-07-16 12:00:31+00:00 | 28.998718 | 3.7e-05 | 0.010862 | 93.391418 | 6.09375 |
| 1 | 2008-07-16 13:00:31+00:00 | 28.482361 | 5.9e-05 | 0.016358 | 96.765137 | 6.09375 |
| 2 | 2008-07-16 14:00:31+00:00 | 27.257385 | 0.000546 | 0.017263 | 103.529358 | 6.09375 |
| 3 | 2008-07-16 15:00:31+00:00 | 29.507751 | 21.301758 | 0.542648 | 93.055725 | 6.09375 |
| 4 | 2008-07-16 16:00:31+00:00 | 29.762268 | 21.454102 | 0.557098 | 94.18869 | 6.09375 |


For comparison, let's read the comma-separated version of this file that was produced by the proprietary YSI Ecowin program:

In [11]:
# Read the comma-separated version of the same file generated by YSI Ecowin
metadata, SA08_CSV = sonde3.read_ysi_ascii("tests/ysi_test_files/SA08.CDF", pytz.timezone('US/Central'), delim=",")
SA08_CSV.head()

| | Datetime_(UTC) | water_temp_c | water_specific_conductivity_mS/cm | water_depth_m_nonvented | water_DO_% | instrument_battery_voltage | water_conductivity_mS/cm | water_DO_mgl | water_resistivity_KOhm/cm | water_salinity_psu | water_tds_g/L |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2008-07-16 12:00:31+00:00 | 29.0 | 0.0 | 0.011 | 93.4 | 6.1 | 0.0 | 7.18 | 26854.29 | -0.0 | 0.0 |
| 1 | 2008-07-16 13:00:31+00:00 | 28.48 | 0.0 | 0.016 | 96.8 | 6.1 | 0.0 | 7.51 | 16950.16 | -0.0 | 0.0 |
| 2 | 2008-07-16 14:00:31+00:00 | 27.26 | 0.001 | 0.017 | 103.5 | 6.1 | 0.001 | 8.21 | 1830.82 | -0.0 | 0.0 |
| 3 | 2008-07-16 15:00:31+00:00 | 29.51 | 19.613 | 0.543 | 93.1 | 6.1 | 21.302 | 6.66 | 0.05 | 11.6 | 12.749 |
| 4 | 2008-07-16 16:00:31+00:00 | 29.76 | 19.665 | 0.557 | 94.2 | 6.1 | 21.454 | 6.71 | 0.05 | 11.63 | 12.782 |


Let's pass the `SA08_BIN` DataFrame to `calculate_salinity_psu()` and `calculate_do_mgl()` to compute salinity (PSU) and dissolved oxygen (mg/L).

We can then compare our computed results to the ECOwatch program results:

In [12]:
SA08_BIN = sonde3.calculate_salinity_psu(SA08_BIN)
SA08_BIN = sonde3.calculate_do_mgl(SA08_BIN)

  Rtx = (rt) ** 0.5
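
For the dissolved oxygen step, one common approach is to compute the oxygen solubility at the measured temperature and salinity and scale it by the percent saturation. The sketch below uses the Weiss (1970) solubility equation at one atmosphere; this is an assumed formulation for illustration, since the source does not state which formula `calculate_do_mgl()` applies, and the function names here are hypothetical.

```python
import math

# Weiss (1970) coefficients for oxygen solubility in mL/L at 1 atm
A1, A2, A3, A4 = -173.4292, 249.6339, 143.3483, -21.8492
B1, B2, B3 = -0.033096, 0.014259, -0.0017000
ML_TO_MG = 1.4276  # approximate mg of O2 per mL of O2 at STP


def do_saturation_mgl(temp_c, salinity):
    """Oxygen concentration (mg/L) at 100 % saturation."""
    t = (temp_c + 273.15) / 100.0  # absolute temperature divided by 100
    ln_c = (A1 + A2 / t + A3 * math.log(t) + A4 * t
            + salinity * (B1 + B2 * t + B3 * t * t))
    return math.exp(ln_c) * ML_TO_MG


def do_mgl(do_percent, temp_c, salinity):
    """Scale the saturation concentration by the measured percent saturation."""
    return do_percent / 100.0 * do_saturation_mgl(temp_c, salinity)


# DO %, temperature and salinity from one of the rows shown in this document
print(do_mgl(93.055725, 29.507751, 11.601472))
```

For that row the sketch gives a value within a few hundredths of a mg/L of the 6.66 mg/L reported in the comma-separated file; small differences are expected because other solubility formulations exist.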
