# GOA-ON Inventory Assessment
Identifying gaps (empty attributes), where they come from, and how they are distributed.  
Emilio Mayorga, 4/18/2016

In [1]:
from collections import OrderedDict
import pandas as pd

In [2]:
# Use an OrderedDict to preserve column order
columns_rename = OrderedDict([
    ('#', 'id'),
    ('Overlaps with', 'overlaps_with'),
    ('Source_Doc', 'Source_Doc'),
    ('Platform_type', 'Platform_type'),
    ('Platform Name', 'platform_name'),
    ('Type', 'Type'),
    ('Organization_structured', 'Organization'),
    ('Organization abbreviation', 'organization_abbreviation'),
    ('Department', 'Department'),
    ('City', 'City'),
    ('Country', 'Country'),
    ('Project', 'Project'),
    ('Additional Organization(s)', 'additional_organizations'),
    ('Agency_structured', 'agency'),
    ('URL', 'url'),
    ('Contact_name', 'Contact_name'),
    ('Email', 'Email'),
    ('Data link (structured)', 'data_link'),
    ('Deploy Date.1', 'deploy_date'),
    ('Frequency', 'Frequency'),
    ('Duration', 'Duration'),
    ('Sensors.1', 'Sensors'),
    ('Parameters', 'Parameters'),
    ('Parameter planed to be added', 'parameter_planned'),
    ('Method', 'Method'),
    ('Depth_range', 'Depth_range'),
    ('Comments', 'Comments'),
    ('Longitude', 'Longitude'),
    ('Latitude', 'Latitude'),
    ('Location.1', 'Location'),
    ('Method Documentation.1', 'Method_Documentation')
])

In [13]:
# !cd <hidden file path>

In [4]:
df_xls = pd.read_excel('./GOA-ON_structering.xls', 'original')

In [5]:
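# Retain only the pre-defined subset of columns, then rename them.
# list(...) is needed because a dict keys view is not a valid column indexer.
df_xls_cols = df_xls[list(columns_rename.keys())].copy()
# rename(columns=...) maps old labels to new ones; rename_axis is meant for axis names.
df = df_xls_cols.rename(columns=columns_rename)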

In [6]:
# Recode NaNs (nulls) in Platform_type and Type as the string 'NA'
df = df.fillna({'Platform_type': 'NA', 'Type': 'NA'})
len(df)

513

This is the number of records in the inventory (after removing one bad record).
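Since the goal is to find empty attributes, a per-column gap count is a natural first step. The sketch below is only illustrative: `sample` is a hypothetical stand-in for `df` with made-up values, and counting the recoded `'NA'` string as a gap is an assumption about how gaps should be defined.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the inventory `df`; the values are made up.
sample = pd.DataFrame({
    'Platform_type': ['STS', 'NA', 'M', 'NA'],
    'Latitude': [47.6, np.nan, 12.0, np.nan],
    'Longitude': [-122.3, np.nan, np.nan, np.nan],
    'Email': [None, None, 'a@example.org', None],
})

# Assumption: both true nulls and the recoded 'NA' string count as gaps.
is_gap = sample.isnull() | sample.isin(['NA'])

# One row per column: number and percentage of missing entries.
gap_summary = pd.DataFrame({
    'n_missing': is_gap.sum(),
    'pct_missing': (100 * is_gap.mean()).round(1),
}).sort_values('n_missing', ascending=False)
print(gap_summary)
```

Applied to the real `df`, the same two steps would rank every attribute by how incomplete it is.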

### Distribution by Platform_type and "Source_Doc" (the origin of each record)

Distribution of `Platform_type` codes. "NA" marks the recoded null entries. About half of the records (262 of 513) are not coded ("NA"/null)!

In [7]:
df.Platform_type.value_counts()

NA        262
STS       148
M          67
VOS        28
FOTS        6
OP          1
M; STS      1
Name: Platform_type, dtype: int64
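To make the "about half" statement concrete, the counts can be converted to percentages. This sketch re-enters the counts printed above by hand so that it runs on its own; on the real data, `df.Platform_type.value_counts(normalize=True)` should give the same shares as fractions.

```python
import pandas as pd

# Counts copied from the value_counts() output above.
counts = pd.Series(
    {'NA': 262, 'STS': 148, 'M': 67, 'VOS': 28, 'FOTS': 6, 'OP': 1, 'M; STS': 1},
    name='Platform_type',
)

# Share of each code as a percentage of all records.
shares = (100 * counts / counts.sum()).round(1)
print(shares)
```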

Uncoded records occur in every source except VOS. Records from the original GOA-ON inventory may be easier to code, because they come from spreadsheets that are organized essentially by platform type.

In [8]:
df.groupby(['Source_Doc', 'Platform_type']).size()

Source_Doc               Platform_type
Bjoern_Times_Series_OCB  M                  3
                         NA               105
                         STS              138
FixedTimeSeries          FOTS               6
                         NA                73
                         OP                 1
                         STS                9
Mooring                  M                 64
                         M; STS             1
                         NA                46
ShipbasedTime_Series     NA                38
                         STS                1
VOS                      VOS               28
dtype: int64
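The grouped counts are easier to compare as a source-by-code table with an uncoded percentage per source. The sketch below re-enters the counts shown above so that it is self-contained; the names `table` and `pct_uncoded` are illustrative. On the real data, `df.groupby(['Source_Doc', 'Platform_type']).size().unstack(fill_value=0)` would presumably produce the same table.

```python
import pandas as pd

# Counts copied from the groupby output above.
sizes = pd.Series({
    ('Bjoern_Times_Series_OCB', 'M'): 3,
    ('Bjoern_Times_Series_OCB', 'NA'): 105,
    ('Bjoern_Times_Series_OCB', 'STS'): 138,
    ('FixedTimeSeries', 'FOTS'): 6,
    ('FixedTimeSeries', 'NA'): 73,
    ('FixedTimeSeries', 'OP'): 1,
    ('FixedTimeSeries', 'STS'): 9,
    ('Mooring', 'M'): 64,
    ('Mooring', 'M; STS'): 1,
    ('Mooring', 'NA'): 46,
    ('ShipbasedTime_Series', 'NA'): 38,
    ('ShipbasedTime_Series', 'STS'): 1,
    ('VOS', 'VOS'): 28,
})
sizes.index.names = ['Source_Doc', 'Platform_type']

# Pivot to one row per source and one column per code; absent pairs become 0.
table = sizes.unstack(fill_value=0)

# Percentage of each source's records that carry no platform code.
pct_uncoded = (100 * table['NA'] / table.sum(axis=1)).round(1)
print(pct_uncoded)
```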

`Type` is the category used in the original GOA-ON inventory. Note that about two-thirds of the records (331 of 513) have no `Type` entry ("NA"). Most of those (246) are from the "Bjoern_Times_Series_OCB" `Source_Doc`.

In [9]:
df.Type.value_counts().head(20)

NA                                                331
NOAA OA Coral Reef Monitoring Site                 20
NOAA OA Mooring (Coastal)                          17
Intertidal (surf-zone) station                     15
ChloroGIN - ANTARES Station                        12
NOAA OA Mooring (Open Ocean)                        9
Inner-shelf mooring (15-20 meter depth)             8
Open Ocean Mooring                                  8
Ship-based Time Series                              6
IOLR Hydrographic cruise station                    6
SmartBuoy                                           5
Fixed time series.                                  4
Shelf Station                                       4
NOAA OA Mooring (Coral Reef)                        4
IOLR Coastal Beach Rock Monitoring                  4
Mooring in tropical Atlantic Brazilian islands      3
Seafloor, surface, and mid-water sampling.          3
Coastal Observatory                                 2
Float                       
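The breakdown of missing `Type` entries by source can be checked with a filter followed by a group count. The frame below is a hypothetical miniature of `df` with made-up rows, so the numbers it prints say nothing about the real inventory.

```python
import pandas as pd

# Hypothetical miniature of `df`; the rows are made up for illustration.
sample = pd.DataFrame({
    'Source_Doc': ['Bjoern_Times_Series_OCB', 'Bjoern_Times_Series_OCB',
                   'Mooring', 'Mooring', 'VOS'],
    'Type': ['NA', 'NA', 'Open Ocean Mooring', 'NA', 'SmartBuoy'],
})

# Keep only records with no Type entry, then count them per source.
no_type_by_source = sample[sample.Type == 'NA'].groupby('Source_Doc').size()
print(no_type_by_source)
```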

### Distribution of records with no latitude & longitude

In [10]:
df_nolatlon = df[(df.Latitude.isnull()) & (df.Longitude.isnull())]

In [11]:
len(df_nolatlon)

96

28 of these records are from the "Bjoern_Times_Series_OCB" source. Nearly all of the remaining ones, which come from the GOA-ON inventory, are ship-based records, for which the latitude and longitude of a representative point can be obtained from the ship track.

In [12]:
df_nolatlon.groupby(['Source_Doc', 'Platform_type']).size()

Source_Doc               Platform_type
Bjoern_Times_Series_OCB  NA               27
                         STS               1
FixedTimeSeries          STS               1
ShipbasedTime_Series     NA               38
                         STS               1
VOS                      VOS              28
dtype: int64
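The notebook does not say how a representative point would be derived from a ship track. One simple option, offered here only as an assumption, is to take the middle position fix, which is guaranteed to lie on the track. The `track` frame and the `representative_point` helper are hypothetical.

```python
import pandas as pd

# Hypothetical ship track: one row per position fix, ordered in time.
track = pd.DataFrame({
    'Latitude': [10.0, 12.5, 15.0, 17.5, 20.0],
    'Longitude': [-150.0, -148.0, -146.0, -144.0, -142.0],
})


def representative_point(track):
    """Return the middle position fix of a track as (latitude, longitude)."""
    mid = track.iloc[len(track) // 2]
    return float(mid['Latitude']), float(mid['Longitude'])


lat, lon = representative_point(track)
print(lat, lon)
```

Picking an actual fix avoids the pitfalls of averaging coordinates, such as a mean that falls on land or is distorted when a track crosses the antimeridian.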