# FRF_Currituck Sound Water Quality Data

This notebook reads netCDF data from the FRF THREDDS server for several water quality variables recorded in Currituck Sound and writes these data, or a subset thereof, to a comma-separated values (CSV) file.

What data?

- time (converted to ISO 8601 format)
- waterTemperature (in degrees Celsius)
- salinity (in ppt)
- pH
- turbidity (units not noted here)
- chlorophyll (units not noted here)
- DOsat (dissolved oxygen saturation)
- DOmassConc (dissolved oxygen mass concentration)

Python netCDF4 documentation: http://unidata.github.io/netcdf4-python

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import netCDF4
import time
import datetime

%matplotlib inline

### Load source file location information and load copy of data to local container:

In [2]:
f='FRF-ocean_waterquality_CS01-EXO_201612.nc'
url_base='https://chlthredds.erdc.dren.mil/thredds/dodsC/frf/oceanography/waterquality/'
url='CS02-EXO/CS02-EXO.ncml' #'CS01-EXO/CS01-EXO.ncml'

urls=['CS01-EXO/2017/FRF-ocean_waterquality_CS01-EXO_201701.nc',
 'CS01-EXO/2017/FRF-ocean_waterquality_CS01-EXO_201704.nc',
 'CS01-EXO/2017/FRF-ocean_waterquality_CS01-EXO_201705.nc',
 'CS01-EXO/2017/FRF-ocean_waterquality_CS01-EXO_201706.nc',
 'CS01-EXO/2017/FRF-ocean_waterquality_CS01-EXO_201707.nc',
 'CS01-EXO/2017/FRF-ocean_waterquality_CS01-EXO_201708.nc',
 'CS01-EXO/2017/FRF-ocean_waterquality_CS01-EXO_201711.nc',
 'CS01-EXO/2017/FRF-ocean_waterquality_CS01-EXO_201712.nc',
 'CS01-EXO/2018/FRF-ocean_waterquality_CS01-EXO_201801.nc',
]

nc=netCDF4.Dataset(url_base+url)

### Validate Variables:

Check which variables are present in the data, along with their lengths. If a variable's length differs from that of time (indicating missing data), it is probably best to drop it from further consideration. If you really need it, you might be able to pad it to match the size of the other variables.

In [3]:
print('Variables present in the file, and their length (based on np.shape):')
for var in nc.variables:
    print(var, np.shape(nc.variables[var]))

Variables present in the file, and their length (based on np.shape):
latitude ()
longitude ()
station_name (64,)
time (145947,)
gaugeDepth (145947,)
gaugeDepth_raw (145947,)
gaugeDepthQCFlag (145947,)
waterTemperature (145947,)
waterTemperature_raw (145947,)
waterTemperatureQCFlag (145947,)
salinity (145947,)
salinity_raw (145947,)
salinityQCFlag (145947,)
pH (145947,)
pH_raw (145947,)
pHQCFlag (145947,)
turbidity (145947,)
turbidity_raw (145947,)
turbidityQCFlag (145947,)
chlorophyll (145947,)
chlorophyll_raw (145947,)
chlorophyllQCFlag (145947,)
blueGreenAlgae (145947,)
blueGreenAlgae_raw (145947,)
blueGreenAlgaeQCFlag (145947,)
fDOM (145947,)
fDOM_raw (145947,)
fDOMQCFlag (145947,)
DOsat (145947,)
DOsat_raw (145947,)
DOsatQCFlag (145947,)
DOmassConc (145947,)
DOmassConc_raw (145947,)
DOmassConcQCFlag (145947,)
batteryV (145947,)
externalV (145947,)
wiperCurrent (145947,)
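
The length check described above can be automated. The following is a minimal sketch, not part of the original notebook: `split_by_length` is a hypothetical helper, and the `demo` dictionary is synthetic data standing in for `nc.variables`, so the example runs without a connection to the THREDDS server.

```python
import numpy as np

def split_by_length(variables, reference="time"):
    """Separate variables whose shape matches the reference variable from those that do not."""
    ref_shape = np.shape(variables[reference])
    keep, drop = {}, {}
    for name, values in variables.items():
        target = keep if np.shape(values) == ref_shape else drop
        target[name] = values
    return keep, drop

# Synthetic stand-in for nc.variables (all values are made up for illustration)
demo = {
    "time": np.arange(5),
    "salinity": np.linspace(1.0, 2.0, 5),
    "pH": np.full(5, 8.0),
    "station_name": np.zeros(64),   # different length, like the 64-element station name
    "latitude": np.float64(36.0),   # scalar, shape ()
}

keep, drop = split_by_length(demo)
print("keep:", sorted(keep))
print("drop:", sorted(drop))
```

In the listing above every time-series variable has the same length as time, so only the scalar coordinates and the station name would land in the drop group.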


#### Reading the data stream and writing a subset to a comma-separated values file - CS01 and CS02 EXO - Station Data by Month

In [167]:
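for month_url in urls:
    # Use separate names inside the loop so the url and nc defined in the
    # setup cell (the combined .ncml dataset) are not overwritten.
    with netCDF4.Dataset(url_base+month_url) as nc_month:
        times=nc_month.variables['time']

        # Subset of variables to export; the keys become the CSV column names.
        csdict={
        'water_temp' : nc_month.variables['waterTemperature'],
        'salinity' : nc_month.variables['salinity'],
        'pH' : nc_month.variables['pH'],
        'turbidity' : nc_month.variables['turbidity'],
        'chlorophyll' : nc_month.variables['chlorophyll'],
        'DOsat' : nc_month.variables['DOsat'],
        'DOmass' : nc_month.variables['DOmassConc'],
        'bV' : nc_month.variables['batteryV']
        }

        # Convert the numeric time values to date/time objects using the units attribute.
        dt = netCDF4.num2date(times[:],times.units)
        df = pd.DataFrame(csdict,index=dt)

    # Strip the 'CS01-EXO/2017/' prefix and the '.nc' suffix to build the file name.
    df.to_csv('./'+month_url[14:-3]+'.csv', index_label='datetime')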



#### Reading the data stream and writing a subset to a comma-separated values file - CS01 and CS02 EXO - All Station Data

In [3]:
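times=nc.variables['time']

# Same subset of variables as the monthly export, minus the battery voltage.
csdict={
'water_temp' : nc.variables['waterTemperature'],
'salinity' : nc.variables['salinity'],
'pH' : nc.variables['pH'],
'turbidity' : nc.variables['turbidity'],
'chlorophyll' : nc.variables['chlorophyll'],
'DOsat' : nc.variables['DOsat'],
'DOmass' : nc.variables['DOmassConc'],
}

# Convert the numeric time values to date/time objects using the units attribute.
dt = netCDF4.num2date(times[:],times.units)
df = pd.DataFrame(csdict,index=dt)
# With url='CS02-EXO/CS02-EXO.ncml' from the setup cell, url[9:-5] is 'CS02-EXO'.
# Name the index column so the file can be read back with index_col='datetime'.
df.to_csv('./FRF_'+url[9:-5]+'_ALL.csv', index_label='datetime')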

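
The parsing step further down reads the combined files with `index_col='datetime'`, which only works if the timestamp column in the CSV carries that name. The round trip below is a small sketch of that idea using an in-memory buffer. The values are made up, and `pd.to_datetime(..., unit='s')` is used here in place of `netCDF4.num2date` purely so the example has no netCDF dependency. It assumes the time values are epoch seconds.

```python
import io
import numpy as np
import pandas as pd

# Made-up epoch seconds and readings standing in for the netCDF variables
epoch = np.array([1480550418, 1480551018, 1480551618])
index = pd.to_datetime(epoch, unit="s")
df_demo = pd.DataFrame(
    {"water_temp": [10.1, 10.0, 9.9], "salinity": [2.5, 2.5, 2.6]},
    index=index,
)

# Write with a named index column, then read it back using that name
buf = io.StringIO()
df_demo.to_csv(buf, index_label="datetime")
buf.seek(0)
back = pd.read_csv(buf, index_col="datetime", parse_dates=True)

print(back.index.name)
print(back.head())
```

Without `index_label` (or a named index), the first CSV column has an empty header and `index_col='datetime'` raises an error on read.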


#### Parse the combined station data into files by year and month

The water quality data collected by the FRF for Currituck Sound span three calendar years, from 2016 through early 2018. Initially, we captured these data from the FRF THREDDS server by station. Here, we split each station's data by year and month and save each month to its own comma-separated values file.

In [52]:
file_path='/Users/paulp/GoogleDrive/projects/CurrituckSnd/CS_Stations/'
stations=['FRF_CS01-EXO_ALL.csv','FRF_CS02-EXO_ALL.csv']
years=['2016','2017','2018']
months=['01','02','03','04','05','06','07','08','09','10','11','12']

for station in stations:
    print('for station:', station)
    # station[4:8] is the station ID (e.g. 'CS01'), which is also the subdirectory name.
    dfw=pd.read_csv(file_path+station[4:8]+'/'+station, index_col='datetime')
    for year in years:
        print('   Processing year:', year)
        for month in months:
            # The index holds timestamp strings such as '2016-12-01 00:00:18', so a
            # 'YYYY-MM' prefix match selects one month without needing an end date
            # or the number of days in that month.
            dfo=dfw.loc[dfw.index.str.startswith(year+'-'+month)].copy()
            if dfo.empty:
                print('      No data for', month, year)
            else:
                dfo.to_csv(file_path+station[4:8]+'/'+'FRF_'+station[4:8]+'_EXO_'+year+month+'.csv')

print('All done!')

for station: FRF_CS01-EXO_ALL.csv
   Processing year: 2016
      No data for 01 2016
   Processing year: 2017
      No data for 02 2017
      No data for 03 2017
      No data for 10 2017
   Processing year: 2018
      No data for 02 2018
      No data for 03 2018
      No data for 04 2018
      No data for 05 2018
      No data for 06 2018
      No data for 07 2018
      No data for 08 2018
      No data for 09 2018
      No data for 10 2018
      No data for 11 2018
      No data for 12 2018
for station: FRF_CS02-EXO_ALL.csv
   Processing year: 2016
      No data for 01 2016
      No data for 02 2016
   Processing year: 2017
      No data for 02 2017
      No data for 03 2017
      No data for 11 2017
      No data for 12 2017
   Processing year: 2018
      No data for 01 2018
      No data for 02 2018
      No data for 03 2018
      No data for 04 2018
      No data for 05 2018
      No data for 06 2018
      No data for 07 2018
      No data for 08 2018
      No data for 09 2018
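
The year-and-month split can also be expressed with a pandas groupby, which visits only the months that actually contain data. This is a sketch of an alternative, not the method used in this notebook: `split_by_month` is a hypothetical helper, the data are synthetic, and the frame is assumed to have a parsed `DatetimeIndex` (for example from `read_csv(..., parse_dates=True)`).

```python
import numpy as np
import pandas as pd

def split_by_month(df, station):
    """Return {file_name: frame}, one entry per calendar month present in df."""
    out = {}
    for (year, month), chunk in df.groupby([df.index.year, df.index.month]):
        # Same naming pattern as the cell above: FRF_<station>_EXO_<YYYYMM>.csv
        out[f"FRF_{station}_EXO_{int(year)}{int(month):02d}.csv"] = chunk
    return out

# Synthetic daily data spanning a month boundary (values are made up)
idx = pd.date_range("2017-01-30", "2017-02-02", freq="D")
demo_df = pd.DataFrame({"salinity": np.linspace(2.0, 2.3, len(idx))}, index=idx)
demo_df.index.name = "datetime"

files = split_by_month(demo_df, "CS01")
for name, chunk in files.items():
    print(name, len(chunk))
    # chunk.to_csv(name) would write the monthly file
```

Because empty months never appear as groups, this version produces no "No data for" messages. The explicit loop above is the better fit when that report of gaps is wanted.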
  

## Spare Parts...

#### There are many ways to convert epoch time to a Python date/time. Since the time values are in UTC, only the third and fourth options below are useful, unless we add the +5 hour offset to adjust local time (EST) to UTC...

In [80]:
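# Option 1: local time via the time module
#print( time.strftime('%Y-%m-%d %H:%M:%S', time.localtime(nc.variables['time'][0]) ) )
# Option 2: local time via datetime
print( datetime.datetime.fromtimestamp(nc.variables['time'][0]).strftime('%c') )
# Option 3: UTC via datetime (this is the value echoed by the cell)
datetime.datetime.utcfromtimestamp(nc.variables['time'][0]).strftime("%Y-%m-%d %H:%M:%S")
# Option 4: UTC via the time module
#print( time.strftime("%Y-%m-%d %H:%M:%S", time.gmtime(nc.variables['time'][0])) )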

Wed Nov 30 19:00:18 2016


'2016-12-01 00:00:18'
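
A timezone-aware conversion makes the UTC versus local distinction explicit instead of relying on the machine's local settings. The sketch below uses only the standard library. The epoch value `1480550418` is an assumption worked out from the UTC output shown above rather than read from the file, and the fixed -5 hour offset stands in for EST.

```python
from datetime import datetime, timedelta, timezone

ts = 1480550418  # epoch seconds; assumed value corresponding to 2016-12-01 00:00:18 UTC

# Interpret the epoch value as UTC, then express the same instant in EST (UTC-5)
utc_time = datetime.fromtimestamp(ts, tz=timezone.utc)
est_time = utc_time.astimezone(timezone(timedelta(hours=-5), "EST"))

print(utc_time.isoformat())                        # 2016-12-01T00:00:18+00:00
print(est_time.strftime("%Y-%m-%d %H:%M:%S %Z"))   # 2016-11-30 19:00:18 EST
```

Both objects describe the same instant; only the displayed offset differs, which matches the local and UTC outputs of the cell above.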