## Sensor Location Review Process


**In this notebook:**

1. The pressure array in the data file is compared to the deployment information contained in an external file provided by the Data Operators. The sensor depth location in the deployment file corresponds to the depth value recorded at the time the sensor was placed in the water.


> **Pressure Comparison Method:**
<blockquote> Compares the sensor deployment depth to the average pressure calculated from the data file after erroneous data have been removed (e.g., data outside the global ranges and data more than 3 standard deviations from the mean).</blockquote>


2. The latitude and longitude values in the data file are compared to the sensor latitude and longitude in the deployment information file. The sensor coordinates in the deployment file correspond to the coordinates recorded at the time the sensor was put in the water.


> **Watch Circle Limit Method:**
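<blockquote> Calculates the distance, in km, between the deployment coordinates and the data file Lat/Lon values. The method is described as a great-circle distance; the code in this notebook computes it with geopy's <code>geodesic</code> function, which measures the distance along the WGS-84 ellipsoid rather than along a sphere.</blockquote>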



### Notebook Outline:

- [Python Packages.](#1)
- [Load Data File.](#2)
    - [Extract the Deployment Number.](#21)
    - [Deployment Validation Check.](#22)
- [Load Deployment File.](#3)
- [Extract the Pressure Array.](#4)
- [Pressure Comparison.](#5)
    - [Review the Pressure Array.](#6)
    - [Calculate Basic Statistics.](#7)
    - [Compare Pressure to the Deployment Depth.](#8)
- [Watch Circle Limit.](#9)
    - [Calculate the Lon/Lat Difference in km.](#10)
- [Review Process Finding Summary.](#11)
- [Sensor Location Review.](#12)

<span style='color:Orange' size=20 > **Attention:** </span> 
- To run the notebook, you need to follow the steps in order.
- Run each code cell before you move on to the next one. 
    - **Remember**: The output of a cell may be an input in the next cell.

<a id="1"></a>
### Python Packages.

In [4]:
import xarray as xr
import pandas as pd
import numpy as np
from geopy.distance import geodesic

<a id="2"></a>
###  Load Data File.

In [9]:
# Use the path to your data file to change your directory.
%cd '/Users/leilabelabassi/Desktop/TAMU/online-class/612-DataQuality4theGeosciences/class_material/Module3_DataFiles_telemetered-GP03FLMB-RIM01-02-CTDMOG060/'
# List content of the current directory.
%ls

/Users/leilabelabassi/Desktop/TAMU/online-class/612-DataQuality4theGeosciences/class_material/Module3_DataFiles_telemetered-GP03FLMB-RIM01-02-CTDMOG060
deployment0001_GP03FLMB-RIM01-02-CTDMOG060-telemetered-ctdmo_ghqr_sio_mule_instrument_20130724T100001-20140227T140001.nc
deployment0002_GP03FLMB-RIM01-02-CTDMOG060-telemetered-ctdmo_ghqr_sio_mule_instrument_20140620T040001-20141109T000001.nc
deployment0003_GP03FLMB-RIM01-02-CTDMOG060-telemetered-ctdmo_ghqr_sio_mule_instrument_20150609T000001-20160209T220001.nc
deployment0004_GP03FLMB-RIM01-02-CTDMOG060-telemetered-ctdmo_ghqr_sio_mule_instrument_20161008T080001-20161219T000001.nc
deployment0007_GP03FLMB-RIM01-02-CTDMOG060-telemetered-ctdmo_ghqr_sio_mule_instrument_20190928T000001-20200118T200001.nc


In [10]:
# Load data
file_content = xr.open_dataset('deployment0004_GP03FLMB-RIM01-02-CTDMOG060-telemetered-ctdmo_ghqr_sio_mule_instrument_20161008T080001-20161219T000001.nc', mask_and_scale=False) 

# Print content
file_content

<a id="21"></a>
#### Extract the Deployment Number.

In [11]:
deployment_num = np.unique(file_content['deployment'])[0]
deployment_num

4

<a id="22"></a>
#### Deployment Validation Check.
- The deployment number should match the deployment number in the file name.

'deployment<span style='color:Purple'>0004</span>_GP03FLMB-RIM01-02-CTDMOG060-telemetered-ctdmo_ghqr_sio_mule_instrument_20161008T080001-20161219T000001.nc'
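
The visual check above can also be automated. The following is a minimal sketch, assuming the file name always starts with `deployment` followed by a zero-padded number and an underscore; the helper name `deployment_from_filename` is hypothetical and not part of the original notebook.

```python
import re


def deployment_from_filename(file_name):
    """Return the deployment number encoded in a file name such as
    'deployment0004_...nc', or None when the pattern is absent."""
    match = re.match(r"deployment(\d+)_", file_name)
    return int(match.group(1)) if match else None


file_name = (
    "deployment0004_GP03FLMB-RIM01-02-CTDMOG060-telemetered-"
    "ctdmo_ghqr_sio_mule_instrument_20161008T080001-20161219T000001.nc"
)
deployment_in_name = deployment_from_filename(file_name)
print(deployment_in_name)

# In the notebook, the check would then be:
# deployment_in_name == int(deployment_num)
```

Returning `None` instead of raising keeps the check usable on files that do not follow the naming convention; the caller decides how to report the mismatch.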

<a id="3"></a>
### Load Deployment File.

In [12]:
# Use the path to the deployment file to change your directory.
%cd '/Users/leilabelabassi/Desktop/TAMU/online-class/612-DataQuality4theGeosciences/class_material/Module3_cruise_info_GP03FLMB-RIM01-02-CTDMOG060/'

# Load data
deployment_file = pd.read_csv('GP03FLMB-RIM01-02-CTDMOG060_info.csv')

# Print content
deployment_file

/Users/leilabelabassi/Desktop/TAMU/online-class/612-DataQuality4theGeosciences/class_material/Module3_cruise_info_GP03FLMB-RIM01-02-CTDMOG060


Unnamed: 0,Deployment,Cruise,Start Date,Stop Date,Mooring Asset,Node Asset,Sensor Asset,Latitude,Longitude,Deployment Depth,Water Depth
0,1,MV-1309,2013-07-24,2014-06-18,CGMGP-03FLMB-00001,,CGINS-CTDMOG-10255,50.3317,-144.401,30.0,4145
1,2,MV-1404,2014-06-20,2015-06-07,CGMGP-03FLMB-00002,,CGINS-CTDMOG-11646,50.3313,-144.398,31.0,4145
2,3,TN-323,2015-06-08,2016-07-03,CGMGP-03FLMB-00003,,CGINS-CTDMOG-12638,50.3303,-144.398,47.0,4145
3,4,RB-16-05,2016-07-04,2017-07-17,CGMGP-03FLMB-00004,,CGINS-CTDMOG-11638,50.3293,-144.398,,4146
4,5,SR17-10,2017-07-14,2018-07-25,CGMGP-03FLMB-00005,,CGINS-CTDMOG-13422,50.3777,-144.515,,4169
5,6,SR1811,2018-07-24,2019-09-27,CGMGP-03FLMB-00006,,CGINS-CTDMOG-10225,50.3295,-144.398,,4145
6,7,SKQ201920S,2019-09-27,,CGMGP-03FLMB-00007,,CGINS-CTDMOG-10218,50.3755,-144.514,,4176


In [13]:
# Extract the deployment information using the deployment column
# and the deployment_num variable defined in the previous cell. 
deployment_x = deployment_file[deployment_file['Deployment'] == deployment_num]

# Print row
deployment_x

Unnamed: 0,Deployment,Cruise,Start Date,Stop Date,Mooring Asset,Node Asset,Sensor Asset,Latitude,Longitude,Deployment Depth,Water Depth
3,4,RB-16-05,2016-07-04,2017-07-17,CGMGP-03FLMB-00004,,CGINS-CTDMOG-11638,50.3293,-144.398,,4146


<a id="4"></a>
### Extract the Pressure Array.
- What variable to use to check the pressure array?
- What pressure variable has the science unit dbar?
- Get the pressure array.
- Get and display the pressure attributes. 

In [14]:
# What variable to use to check the pressure array?
# List variable names.
list_variables = file_content.variables.keys()

# Select the variables with the keyword pressure.
pressure_name = [x for x in tuple(list_variables) if 'pressure' in x]
print(pressure_name)

['ctdmo_seawater_pressure_qc_executed', 'ctdmo_seawater_pressure_qc_results', 'pressure', 'ctdmo_seawater_pressure']


In [15]:
# What pressure variable has the science unit dbar?
# Select variable with the unit dbar.
for x in pressure_name:
    try: 
        x_unit = file_content[x].attrs['units']
        if x_unit == 'dbar':
            print('Pass:', x)
    except KeyError:
        print('Fail:', x)

Fail: ctdmo_seawater_pressure_qc_executed
Fail: ctdmo_seawater_pressure_qc_results
Pass: ctdmo_seawater_pressure
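
Note that the loop above prints nothing for a variable that has a `units` attribute with a value other than `dbar`. A minimal sketch of a variant that reports every variable is shown below. It works on a plain mapping of variable name to attribute dictionary; the helper name `variables_with_unit` and the example attributes are illustrative assumptions, not part of the notebook.

```python
def variables_with_unit(attrs_by_name, unit="dbar"):
    """Split variable names into those whose 'units' attribute equals
    `unit` and those that lack the attribute or carry another unit."""
    passed, failed = [], []
    for name, attrs in attrs_by_name.items():
        if attrs.get("units") == unit:
            passed.append(name)
        else:
            failed.append(name)
    return passed, failed


# Illustrative attributes only. In the notebook the mapping could be built as
# {name: file_content[name].attrs for name in pressure_name}.
example_attrs = {
    "ctdmo_seawater_pressure_qc_executed": {},
    "ctdmo_seawater_pressure": {"units": "dbar"},
}
passed, failed = variables_with_unit(example_attrs)
print("Pass:", passed)
print("Fail:", failed)
```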


In [16]:
# Get the pressure array.
# Use the name of the variable that Passed the unit test
pressure = file_content['ctdmo_seawater_pressure']

In [17]:
# Get the pressure attributes.
# Print the attribute names.
pressure.attrs.keys()

dict_keys(['_FillValue', 'comment', 'long_name', 'coordinates', 'data_product_identifier', 'standard_name', 'units', 'ancillary_variables'])

In [19]:
# Collect the pressure attributes in a DataFrame to inspect their content.
df = pd.DataFrame()
df0 = pd.DataFrame({
                    'Long Name':[pressure.long_name],
                    'Standard Name': [pressure.standard_name],
                    'Comment': [pressure.comment],
                    'Coordinates': [pressure.coordinates],                    
                    'Units': [pressure.units],
                    'Fill_values': [pressure._FillValue],
                    'Ancillary Variables': [pressure.ancillary_variables],
                    'Data Product Identifier': [pressure.data_product_identifier]
    
                    }, index=['Pressure'])

# DataFrame.append was removed in pandas 2.0; pd.concat is the replacement.
df = pd.concat([df, df0])
pd.set_option('display.max_colwidth', None)
df.T

Unnamed: 0,Pressure
Long Name,Seawater Pressure
Standard Name,sea_water_pressure
Comment,Seawater Pressure refers to the pressure exerted on a sensor in situ by the weight of the column of seawater above it. It is calculated by subtracting one standard atmosphere from the absolute pressure at the sensor to remove the weight of the atmosphere on top of the water column. The pressure at a sensor in situ provides a metric of the depth of that sensor.
Coordinates,time lat lon pressure
Units,dbar
Fill_values,-9999
Ancillary Variables,pressure
Data Product Identifier,PRESWAT_L1


<a id="5"></a>
### Pressure Comparison.
<a id="6"></a>
#### Review the Pressure Array.
- Reject NaNs.
- Reject fill values.
- Reject values outside the global ranges.
- Reject extreme values (outside [-1e7, 1e7]).
- Reject outliers beyond 3 standard deviations of the mean.
- Add a note when the pressure variable needs review.

In [20]:
# Reject NaNs.
# Use function: ~np.isnan()
p_nonan = pressure.values[~np.isnan(pressure.values)]

# Calculate the number of data points that are NaNs.
len_nan = len(pressure) - len(p_nonan)

In [21]:
# Reject fill values. 
# Use operator: !=
# Use pressure._FillValue: returns the data fill value (-9999, see previous output).
p_nonan_nofv = p_nonan[p_nonan != pressure._FillValue]

# Count the data points rejected so far (NaNs and fill values).
len_nan_fv = len(pressure) - len(p_nonan_nofv)

In [22]:
# Reject data outside global ranges.
# Use operators: ( >= )  & (  <= )
# Use pressure global ranges: [0, 6000] dbar
p_nonan_nofv_gr = p_nonan_nofv[(p_nonan_nofv >= 0) & (p_nonan_nofv <= 6000)]

# Count the data points rejected so far (NaNs, fill values, and values outside [0, 6000]).
len_nan_fv_gr = len(pressure) - len(p_nonan_nofv_gr)

In [23]:
# Reject extreme values.
# Use operators: ( > )  & (  < )
# Use extreme values: [-1e7, 1e7]
p_nonan_nofv_gr_ev = p_nonan_nofv_gr[(p_nonan_nofv_gr > -1e7) & (p_nonan_nofv_gr < 1e7)]

# Count the data points rejected so far, including values outside [-1e7, 1e7].
len_nan_fv_gr_ev = len(pressure) - len(p_nonan_nofv_gr_ev)

In [24]:
# Reject outliers beyond 3 standard deviations of the mean.
# Use standard deviation function: np.nanstd
stdev = np.nanstd(p_nonan_nofv_gr_ev)

# Use function to calculate the mean: np.nanmean()
mean_pressure = np.nanmean(p_nonan_nofv_gr_ev)

# Use formula: abs(data - np.nanmean(data)) < 3 * stdev 
p_nonan_nofv_gr_ev_std = p_nonan_nofv_gr_ev[abs(p_nonan_nofv_gr_ev - mean_pressure) < 3 * stdev]

# Count the data points rejected so far, including values beyond 3 standard deviations of the mean.
len_nan_fv_gr_ev_std = len(pressure) - len(p_nonan_nofv_gr_ev_std)
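
The five rejection steps above can also be chained in a single helper. The following is a minimal sketch, not the notebook's own implementation: the function name `clean_pressure` and its parameter names are hypothetical, and the input values are synthetic. It returns the cleaned array together with the cumulative number of rejected points after each step, mirroring the `len_*` counters.

```python
import numpy as np


def clean_pressure(values, fill_value, global_range=(0, 6000),
                   extreme=1e7, n_std=3):
    """Apply the rejection steps in order; return the cleaned array and
    the cumulative count of rejected points after each step."""
    values = np.asarray(values, dtype=float)
    total = values.size
    rejected = {}

    kept = values[~np.isnan(values)]                  # NaNs
    rejected["nan"] = total - kept.size

    kept = kept[kept != fill_value]                   # fill values
    rejected["fill_value"] = total - kept.size

    low, high = global_range                          # global ranges
    kept = kept[(kept >= low) & (kept <= high)]
    rejected["global_range"] = total - kept.size

    kept = kept[np.abs(kept) < extreme]               # extreme values
    rejected["extreme"] = total - kept.size

    # Outliers: skipped for an empty or constant array, where the
    # strict "<" comparison would otherwise reject every point.
    if kept.size > 0 and kept.std() > 0:
        kept = kept[np.abs(kept - kept.mean()) < n_std * kept.std()]
    rejected["n_std"] = total - kept.size

    return kept, rejected


# Synthetic input: one NaN, one fill value, one value above 6000 dbar.
sample = [37.0, 38.0, np.nan, -9999.0, 7000.0, 37.5, 38.5]
cleaned, counts = clean_pressure(sample, fill_value=-9999)
print(cleaned)
print(counts)
```

Keeping the counts cumulative makes each one directly comparable to the total array length, at the cost of having to subtract neighbouring entries to get the number rejected by a single step.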

In [25]:
# Add a note to report on when the pressure array is not valid 
# Not valid:  all Nans 
#          or all fill values 
#          or all values outside of global ranges 
#          or all values are extreme values.

notes = ['']
# Each array is a subset of the previous one, so the first empty array
# identifies the step that rejected all remaining data.
if len(pressure) > 0:
    if len(p_nonan) == 0: # NaNs
        notes.append('Pressure variable all NaNs')
    elif len(p_nonan_nofv) == 0: # fill values
        notes.append('Pressure variable all fill values')
    elif len(p_nonan_nofv_gr) == 0: # outside of global ranges
        notes.append('Pressure variable outside of global ranges')
    elif len(p_nonan_nofv_gr_ev) == 0: # extreme values
        notes.append('Pressure variable beyond (+/-)1e7')

    
print(notes)

['']


<a id="7"></a>
#### Calculate Basic Statistics.
- Mean Pressure: np.nanmean()
- Maximum Pressure: np.nanmax()
- Minimum Pressure: np.nanmin()
- Pressure Standard Deviation: np.nanstd()


1. Basic statistics can only be calculated after erroneous values have been removed from the data. 
2. The cleaned pressure array to use for the basic statistics is: **p_nonan_nofv_gr_ev_std**

In [26]:
pressure_mean = round(np.nanmean(p_nonan_nofv_gr_ev_std), 2)
pressure_max = round(np.nanmax(p_nonan_nofv_gr_ev_std), 2)
pressure_min = round(np.nanmin(p_nonan_nofv_gr_ev_std), 2)
pressure_std = round(np.nanstd(p_nonan_nofv_gr_ev_std), 2)
print('mean: ', pressure_mean)
print('max: ', pressure_max)
print('min: ', pressure_min)
print('std: ', pressure_std)

mean:  37.76
max:  40.04
min:  35.22
std:  1.1


<a id="8"></a>
#### Compare Pressure to the Deployment Depth.
- Extract the deployment depth defined in the deployment file.
- If the deployment depth is known, compare it to the mean pressure. 

In [27]:
# Extract the deployment depth from the deployment_x dataframe defined previously.
Deployment_Depth = deployment_x['Deployment Depth'].values[0]

# Print the value of Deployment_Depth
print(Deployment_Depth)

nan


In [28]:
# Compare the mean pressure to the deployment depth if not nan.
if not np.isnan(Deployment_Depth):
    depth_diff = pressure_mean - Deployment_Depth
    pressure_comparison_note = 'Pass: Deployment Depth Available'
else:
    depth_diff = np.nan
    pressure_comparison_note = 'Fail: Deployment Depth Missing'

# Print the comparison note and the depth difference.
print(pressure_comparison_note, ' (', depth_diff, ')')

Fail: Deployment Depth Missing  ( nan )
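
The cell above only records whether a deployment depth is available; it does not judge the size of the difference. A minimal sketch of such a judgment follows. It assumes that 1 dbar of seawater pressure corresponds to roughly 1 m of depth, and the `tolerance_m` value of 5 m is a hypothetical threshold chosen for illustration, not a value from the deployment documentation. The function name and the input numbers are also illustrative.

```python
import math


def compare_depth(mean_pressure_dbar, deployment_depth_m, tolerance_m=5.0):
    """Return (difference, note) for a mean pressure and a deployment depth.

    Assumes 1 dbar ~ 1 m; tolerance_m is a hypothetical threshold.
    """
    if deployment_depth_m is None or math.isnan(deployment_depth_m):
        return math.nan, "Fail: Deployment Depth Missing"
    diff = mean_pressure_dbar - deployment_depth_m
    if abs(diff) <= tolerance_m:
        return diff, "Pass: Within Tolerance"
    return diff, "Fail: Outside Tolerance"


# Synthetic values for illustration only.
print(compare_depth(30.8, 30.0))
print(compare_depth(40.0, 30.0))
print(compare_depth(37.76, float("nan")))
```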


<a id="9"></a>
### Watch Circle Limit.
<a id="10"></a>
#### Calculate the Lon/Lat Difference in km.

In [29]:
# Get the lat and lon from the metadata stored under the file attributes.
loc1 = [file_content.attrs['lat'], file_content.attrs['lon']]
print('Data File Info: ', loc1)

# Get the lat and lon from the deployment file (Use deployment_x).
loc0 = [deployment_x['Latitude'].values[0], deployment_x['Longitude'].values[0]]
print('Deployment File Info: ', loc0)

# Calculate the distance between coordinates. geopy's geodesic() uses an ellipsoidal Earth model;
# geopy.distance.great_circle() is the spherical (great-circle) alternative.
diff_loc = round(geodesic(loc0, loc1).kilometers, 3)
print('The great-circle distance: ', diff_loc, ' km')

Data File Info:  [50.32925, -144.398]
Deployment File Info:  [50.3293, -144.398]
The great-circle distance:  0.006  km
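
For reference, a spherical great-circle distance can be computed with the haversine formula using only the standard library. The sketch below is an illustration, not the notebook's method (which calls geopy). The function names and the 1 km watch circle limit are hypothetical; the actual limit is not given in the data file.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def great_circle_km(loc_a, loc_b):
    """Haversine distance in km between two [lat, lon] pairs in degrees."""
    lat1, lon1 = map(math.radians, loc_a)
    lat2, lon2 = map(math.radians, loc_b)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def within_watch_circle(distance_km, limit_km):
    """True when the offset does not exceed the watch circle limit."""
    return distance_km <= limit_km


loc0 = [50.3293, -144.398]    # deployment file
loc1 = [50.32925, -144.398]   # data file
distance = great_circle_km(loc0, loc1)
print(round(distance, 3), "km")

# limit_km=1.0 is a placeholder, not a documented watch circle limit.
print(within_watch_circle(distance, limit_km=1.0))
```

Over a separation this small, the spherical and ellipsoidal results agree to the three decimals reported above.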


<a id="11"></a>
### Review Process Finding Summary. 
- Summarize the pressure review process findings to report on the sensor location evaluation.

In [31]:
df = pd.DataFrame()
df0 = pd.DataFrame({
                    'Mean Pressure':[pressure_mean],
                    'Maximum Pressure': [pressure_max],
                    'Minimum Pressure': [pressure_min],
                    'Pressure Comparison':[pressure_comparison_note],
                    'STD': [pressure_std],                    
                    'NaN': [len_nan],
                     pressure._FillValue:[len_nan_fv],
                    '0, 6000 dbar': [len_nan_fv_gr],
                    '1e7':[len_nan_fv_gr_ev],
                    'Mean +/- 3 STD': [len_nan_fv_gr_ev_std],
                    'Notes': [notes],
                    'Great-Circle Distance (km)': [diff_loc]              
                    }, index=['Results'])

# DataFrame.append was removed in pandas 2.0; pd.concat is the replacement.
df = pd.concat([df, df0])
pd.set_option('display.max_colwidth', None)
df.T

Unnamed: 0,Results
Mean Pressure,37.76
Maximum Pressure,40.04
Minimum Pressure,35.22
Pressure Comparison,Fail: Deployment Depth Missing
STD,1.1
NaN,0
-9999.0,0
"0, 6000 dbar",0
1e7,0
Mean +/- 3 STD,0


<a id="12"></a>
## Sensor Location Review. 
- The pressure data in the file appear to be of good quality. 
- No erroneous values were found that raise concerns. 
- The deployment depth is missing from the deployment file. This should be reported so that the value can be added or its status communicated to users. 
- The great-circle distance of 0.006 km should be compared to the acceptable limit for the current sensor deployment conditions. 
- That limit is not included in the file metadata, which means the "Watch Circle Limit" information has to be located elsewhere. 

### END