# Deep Sea Corals
## Coral Records from NOAA’s Deep-Sea Coral Research and Technology Program

<table>
<tr>
    <td> <img src="images/NOAA_Flag.svg " alt="Photo by Q.U.I on Unsplash" style="height:300px"> </td>
    <td> <img src="images/vlad-tchompalov-LsIXVKThAG0-unsplash.jpg" style="height:300px"> </td>
</tr>
</table>

### Context

This dataset contains information about deep-sea corals and sponges collected by NOAA and its partners, including the geographic location of each observation. The records cover azooxanthellate corals (a subset of all corals that do not host symbiotic zooxanthellae, the photosynthetic algae many shallow-water corals rely on) and all sponge species. Additionally, the records only include observations deeper than 50 meters, to keep the focus on deep-sea corals and sponges.

### Content

Column descriptions:

- CatalogNumber: Unique record identifier assigned by the Deep-Sea Coral Research and Technology Program.
- DataProvider: The institution, publication, or individual who ultimately deserves credit for acquiring or aggregating the data and making it available.
- ScientificName: Taxonomic identification of the sample as a Latin binomial.
- VernacularNameCategory: Common (vernacular) name category of the organism.
- TaxonRank: Identifies the level in the taxonomic hierarchy of the ScientificName term.
- ObservationDate: Date (YYYY-MM-DD) when the sample/observation occurred.
- Latitude (degrees North): Latitude in decimal degrees where the sample or observation was collected.
- Longitude (degrees East): Longitude in decimal degrees where the sample or observation was collected.
- DepthInMeters: Best single depth value for sample as a positive value in meters.
- DepthMethod: Method by which best singular depth in meters (DepthInMeters) was determined. "Averaged" when start and stop depths were averaged. "Assigned" when depth was derived from bathymetry at the location. "Reported" when depth was reported based on instrumentation or described in literature.
- Locality: A specific named place or named feature of origin for the specimen or observation (e.g., Dixon Entrance, Diaphus Bank, or Sur Ridge). Multiple locality names can be separated by a semicolon, arranged in a list from largest to smallest area (e.g., Gulf of Mexico; West Florida Shelf, Pulley Ridge).
- IdentificationQualifier: Taxonomic identification method and level of expertise. Examples: “genetic ID”; “morphological ID from sample by taxonomic expert”; “ID by expert from image”; “ID by non-expert from video”; etc.
- SamplingEquipment: Method of data collection. Examples: ROV, submersible, towed camera, SCUBA, etc.
- RecordType: Denotes the origin and type of record: published literature ("literature"), a collected specimen ("specimen"), observation from a still image ("still image"), observation from video ("video observation"), notation without a specimen or image ("notation"), or observation from trawl surveys, longline surveys, and/or observer records ("catch record").

## Business Understanding

**Main Goal**: Create a coastal research resort for marine science.

**Guiding Questions**:

1. Which part of the world has the most coral research activities?
2. How diverse are corals in certain areas of the world?
3. What kinds of instruments are needed for coral research?
4. Which institutions or organizations would be willing to partner?

## Data Understanding

In [1]:
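import os

import numpy as np
import pandas as pd
import chart_studio
import chart_studio.plotly as py
import plotly.graph_objects as go

# Read Chart Studio credentials from environment variables instead of
# hard-coding the API key (the variable names are placeholders)
chart_studio.tools.set_credentials_file(
    username=os.environ.get("CHART_STUDIO_USERNAME", ""),
    api_key=os.environ.get("CHART_STUDIO_API_KEY", ""),
)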

### Load Data

In [2]:
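df = pd.read_csv("../deep_sea_corals.csv")
# Drop the first row, which appears to hold units rather than data
# (it is what makes latitude/longitude load as text)
df = df.iloc[1:]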


Columns (5,7,8,13) have mixed types. Specify dtype option on import or set low_memory=False.
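The `df.iloc[1:]` step and the mixed-type warning (columns 7 and 8 are `latitude` and `longitude`) suggest the CSV's first data row holds units rather than values. Assuming that is the case, the row can be skipped at read time so numeric columns parse directly; this would also explain why `CatalogNumber` loads as float64 (its units cell is empty). A minimal sketch on a tiny in-memory CSV with made-up values:

```python
import io

import pandas as pd

# Toy CSV mimicking the assumed layout: header row, units row, then data rows.
# Values and unit strings are illustrative, not read from the real file.
raw = io.StringIO(
    "CatalogNumber,latitude,longitude,DepthInMeters\n"
    ",degrees_north,degrees_east,\n"
    "625366,18.30817,-158.45392,959.0\n"
    "625373,18.30864,-158.45393,953.0\n"
)

# skiprows=[1] drops the second physical line (the assumed units row), so
# pandas infers numeric dtypes instead of object columns.
toy = pd.read_csv(raw, skiprows=[1], low_memory=False)
print(toy.dtypes)
```

Unlike `df.iloc[1:]`, this keeps a default index starting at 0.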



### Explore Data

In [3]:
df.head()

Unnamed: 0,CatalogNumber,DataProvider,ScientificName,VernacularNameCategory,TaxonRank,Station,ObservationDate,latitude,longitude,DepthInMeters,DepthMethod,Locality,LocationAccuracy,SurveyID,Repository,IdentificationQualifier,EventID,SamplingEquipment,RecordType,SampleID
1,625366.0,"NOAA, Deep Sea Coral Research & Technology Pro...",Madrepora oculata,stony coral (branching),species,D2-EX1504L3-05,2015-09-02,18.30817,-158.45392,959.0,reported,"Hawaiian Archipelago, Swordfish Seamount",50m,Hohonu Moana: Exploring Deep Waters off Hawai'i,University of Hawaii,ID by expert from video,D2-EX1504L3-05,ROV,video observation,EX1504L3_05_20150901T181522Z.mp4_05:45:26:28
2,625373.0,"NOAA, Deep Sea Coral Research & Technology Pro...",Madrepora oculata,stony coral (branching),species,D2-EX1504L3-05,2015-09-01,18.30864,-158.45393,953.0,reported,"Hawaiian Archipelago, Swordfish Seamount",50m,Hohonu Moana: Exploring Deep Waters off Hawai'i,University of Hawaii,ID by expert from video,D2-EX1504L3-05,ROV,video observation,EX1504L3_05_20150901T181522Z.mp4_05:24:35:53
3,625386.0,"NOAA, Deep Sea Coral Research & Technology Pro...",Madrepora oculata,stony coral (branching),species,D2-EX1504L3-05,2015-09-01,18.30877,-158.45384,955.0,reported,"Hawaiian Archipelago, Swordfish Seamount",50m,Hohonu Moana: Exploring Deep Waters off Hawai'i,University of Hawaii,ID by expert from video,D2-EX1504L3-05,ROV,video observation,EX1504L3_05_20150901T181522Z.mp4_05:15:22:09
4,625382.0,"NOAA, Deep Sea Coral Research & Technology Pro...",Madrepora oculata,stony coral (branching),species,D2-EX1504L3-05,2015-09-01,18.30875,-158.45384,955.0,reported,"Hawaiian Archipelago, Swordfish Seamount",50m,Hohonu Moana: Exploring Deep Waters off Hawai'i,University of Hawaii,ID by expert from video,D2-EX1504L3-05,ROV,video observation,EX1504L3_05_20150901T181522Z.mp4_05:13:29:50
5,625384.0,"NOAA, Deep Sea Coral Research & Technology Pro...",Madrepora oculata,stony coral (branching),species,D2-EX1504L3-05,2015-09-01,18.30902,-158.45425,968.0,reported,"Hawaiian Archipelago, Swordfish Seamount",50m,Hohonu Moana: Exploring Deep Waters off Hawai'i,University of Hawaii,ID by expert from video,D2-EX1504L3-05,ROV,video observation,EX1504L3_05_20150901T181522Z.mp4_04:24:44:48


In [4]:
df.shape

(513372, 20)

In [5]:
df.columns

Index(['CatalogNumber', 'DataProvider', 'ScientificName',
       'VernacularNameCategory', 'TaxonRank', 'Station', 'ObservationDate',
       'latitude', 'longitude', 'DepthInMeters', 'DepthMethod', 'Locality',
       'LocationAccuracy', 'SurveyID', 'Repository', 'IdentificationQualifier',
       'EventID', 'SamplingEquipment', 'RecordType', 'SampleID'],
      dtype='object')

In [6]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 513372 entries, 1 to 513372
Data columns (total 20 columns):
CatalogNumber              513372 non-null float64
DataProvider               513372 non-null object
ScientificName             513372 non-null object
VernacularNameCategory     513197 non-null object
TaxonRank                  513364 non-null object
Station                    253590 non-null object
ObservationDate            513367 non-null object
latitude                   513372 non-null object
longitude                  513372 non-null object
DepthInMeters              513372 non-null float64
DepthMethod                496845 non-null object
Locality                   389645 non-null object
LocationAccuracy           484662 non-null object
SurveyID                   306228 non-null object
Repository                 496584 non-null object
IdentificationQualifier    488591 non-null object
EventID                    472141 non-null object
SamplingEquipment          485883 non

In [7]:
df['longitude'] = pd.to_numeric(df['longitude'])
df['latitude'] = pd.to_numeric(df['latitude'])
df['ObservationDate'] = pd.to_datetime(df['ObservationDate'])

In [8]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 513372 entries, 1 to 513372
Data columns (total 20 columns):
CatalogNumber              513372 non-null float64
DataProvider               513372 non-null object
ScientificName             513372 non-null object
VernacularNameCategory     513197 non-null object
TaxonRank                  513364 non-null object
Station                    253590 non-null object
ObservationDate            513367 non-null datetime64[ns]
latitude                   513372 non-null float64
longitude                  513372 non-null float64
DepthInMeters              513372 non-null float64
DepthMethod                496845 non-null object
Locality                   389645 non-null object
LocationAccuracy           484662 non-null object
SurveyID                   306228 non-null object
Repository                 496584 non-null object
IdentificationQualifier    488591 non-null object
EventID                    472141 non-null object
SamplingEquipment          

In [9]:
df.describe()

| | CatalogNumber | latitude | longitude | DepthInMeters |
|---|---|---|---|---|
| count | 513372.0 | 513372.0 | 513372.0 | 513372.0 |
| mean | 426607.263715 | 36.498871 | -120.292148 | 798.589769 |
| std | 206162.54248 | 13.232359 | 51.570693 | 805.991501 |
| min | 1.0 | -78.9167 | -179.99358 | -999.0 |
| 25% | 222081.75 | 32.843575 | -130.337028 | 218.0 |
| 50% | 469248.5 | 36.69471 | -122.72345 | 539.0 |
| 75% | 604061.25 | 42.907495 | -120.412837 | 1137.0 |
| max | 740097.0 | 74.35 | 179.994 | 6369.0 |


In [10]:
df[df.DepthInMeters < 0].shape

(3997, 20)
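`describe()` reports a minimum depth of -999.0, and 3,997 rows have negative depths, which looks like a missing-value placeholder (an assumption; the source does not document it). A minimal sketch that marks non-positive depths as `NaN` instead of dropping the rows, so the other columns of those records stay usable; the values are toy data:

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({"DepthInMeters": [959.0, -999.0, 0.0, 218.0]})

# Treat negative and zero depths as missing; the threshold mirrors the
# `DepthInMeters > 0` filter used later in this notebook.
toy["DepthClean"] = toy["DepthInMeters"].where(toy["DepthInMeters"] > 0, np.nan)

# Share of records without a usable depth, as a percentage
missing_pct = toy["DepthClean"].isna().mean() * 100
print(f"{missing_pct:.2f}% of records lack a positive depth")
```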

In [11]:
df.isna().sum()

CatalogNumber                   0
DataProvider                    0
ScientificName                  0
VernacularNameCategory        175
TaxonRank                       8
Station                    259782
ObservationDate                 5
latitude                        0
longitude                       0
DepthInMeters                   0
DepthMethod                 16527
Locality                   123727
LocationAccuracy            28710
SurveyID                   207144
Repository                  16788
IdentificationQualifier     24781
EventID                     41231
SamplingEquipment           27489
RecordType                  12295
SampleID                   111078
dtype: int64

In [12]:
df.Repository.isna().sum()

16788

### 1. Which part of the world has the most coral research activities?

In [13]:
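def general_location(location):
    """Return the broadest place name in a Locality string.

    Locality names run from largest to smallest area, so the text before the
    first ';' (or, if there is none, before the first ',') is the general area.
    """
    if ";" in location:
        return location.split(";")[0]
    elif "," in location:
        return location.split(",")[0]
    else:
        return location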
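`general_location` is applied row by row after converting missing values to the string `'nan'`. The same rule can be expressed with pandas string methods, which keeps missing localities as real `NaN` values. A sketch (the original helper is repeated only to check that both agree on toy inputs):

```python
import pandas as pd


def general_location(location):
    # Original notebook helper, repeated here for comparison
    if ";" in location:
        return location.split(";")[0]
    elif "," in location:
        return location.split(",")[0]
    return location


def general_location_vectorized(localities: pd.Series) -> pd.Series:
    """Same rule as general_location, but missing values stay NaN."""
    before_semicolon = localities.str.split(";", n=1).str[0]
    before_comma = localities.str.split(",", n=1).str[0]
    has_semicolon = localities.str.contains(";", regex=False, na=False)
    # Prefer the text before ';' when present, otherwise the text before ','
    return before_semicolon.where(has_semicolon, before_comma)


localities = pd.Series([
    "Hawaiian Archipelago, Swordfish Seamount",
    "Gulf of Mexico; West Florida Shelf, Pulley Ridge",
    "Davidson Seamount",
    None,
])
result = general_location_vectorized(localities)
print(result.tolist())
```

With this version, `df.GeneralLocality.value_counts(dropna=False)` would report missing localities as `NaN` rather than the string `'nan'`.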

In [14]:
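from collections import Counter

# astype(str) turns missing Locality values into the string 'nan',
# so records without a locality are counted as their own group
all_locations = df.Locality.astype(str).values.tolist()
all_locations = list(map(general_location, all_locations))

all_locations_count = Counter(all_locations)
all_locations_count.most_common()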

[('nan', 123727),
 ('Davidson Seamount', 40114),
 ('Northwestern Hawaiian Islands', 26766),
 ('Southern California Bight', 24965),
 ('Alaska', 24143),
 ('Pioneer Seamount', 23972),
 ('OLYMPIC COAST', 22478),
 ('Main Hawaiian Islands', 19300),
 ('Rodriguez Seamount', 18702),
 ('Central Aleutian Islands', 15094),
 ('Olympic Coast National Marine Sanctuary', 14042),
 ('Viosca Knoll', 9982),
 ('Shutter ridge', 8351),
 ('Florida', 8323),
 ('Continental slope south of Point St. George', 7005),
 ('Continental slope north of Point St. George', 6428),
 ('Hawaiian Archipelago', 5764),
 ('Aleutians', 5229),
 ('Cordell Bank National Marine Sanctuary', 4704),
 ('Monterey Bay', 3338),
 ('Eureka_W', 2604),
 ('The Footprint', 2549),
 ('Piggy_Bank', 2246),
 ('Piggy Bank', 1788),
 ('South Santa Rosa', 1660),
 ('Western Gulf of Alaska', 1619),
 ('Guide Seamount', 1599),
 ('San Juan Seamount', 1594),
 ('Monterey Canyon', 1453),
 ('Off Florida', 1428),
 ('off California', 1363),
 ('Santa Monica Cyn', 1282)

In [15]:
df['GeneralLocality'] = all_locations

In [16]:
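# Share of each general locality, skipping the first entry ('nan', i.e. missing Locality)
values = df.GeneralLocality.value_counts(normalize=True).values.tolist()[1:]

# Shares are sorted in descending order, so the first share below 1% marks the cut-off
value_list = [value for value in values if value < 0.01]

value_first_index = values.index(value_list[0])

# Localities with at least a 1% share (again skipping 'nan')
counts = df.GeneralLocality.value_counts().values.tolist()[1:][:value_first_index]
locations = df.GeneralLocality.value_counts().index.tolist()[1:][:value_first_index]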
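The cut-off logic above (normalize, find the first share under 1%, slice) is repeated later for coral categories, sampling equipment, and data providers. A small helper can express it once by filtering on the share directly. This is a sketch; `top_categories` and its `exclude` parameter are hypothetical names, and the denominator counts every non-null value, including an excluded `'nan'` string, just as the cell above does:

```python
import pandas as pd


def top_categories(series: pd.Series, threshold: float = 0.01, exclude=()) -> pd.Series:
    """Counts of categories whose share of all non-null values is >= threshold."""
    counts = series.value_counts()  # sorted in descending order
    counts = counts.drop(labels=[c for c in exclude if c in counts.index])
    shares = counts / series.notna().sum()
    return counts[shares >= threshold]


# Toy data: 100 values, half of them the placeholder string 'nan'
series = pd.Series(["nan"] * 50 + ["A"] * 30 + ["B"] * 15 + ["C"] * 4 + ["D"] * 1)
top = top_categories(series, threshold=0.05, exclude=["nan"])
print(top)
```

For the pie chart, `top.index.tolist()` and `top.tolist()` would play the roles of `locations` and `counts`.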

In [17]:
# UNCOMMENT THE CODE BELOW TO GENERATE
# PLOT ON PLOTLY

# fig = go.Figure(data=[go.Pie(labels=locations, values=counts)])
# fig.update_layout(
#         title = 'Coral Reef Observation Locations',
#     )
# py.plot(fig, filename = 'coral-reef-location-pie-chart', auto_open=True)

# if you wish to display the chart in the notebook
# comment the line above and uncomment below
# fig.show()

In [18]:
location_df = df[df.GeneralLocality.isin(locations)]
location_df.head()

Unnamed: 0,CatalogNumber,DataProvider,ScientificName,VernacularNameCategory,TaxonRank,Station,ObservationDate,latitude,longitude,DepthInMeters,...,Locality,LocationAccuracy,SurveyID,Repository,IdentificationQualifier,EventID,SamplingEquipment,RecordType,SampleID,GeneralLocality
1,625366.0,"NOAA, Deep Sea Coral Research & Technology Pro...",Madrepora oculata,stony coral (branching),species,D2-EX1504L3-05,2015-09-02,18.30817,-158.45392,959.0,...,"Hawaiian Archipelago, Swordfish Seamount",50m,Hohonu Moana: Exploring Deep Waters off Hawai'i,University of Hawaii,ID by expert from video,D2-EX1504L3-05,ROV,video observation,EX1504L3_05_20150901T181522Z.mp4_05:45:26:28,Hawaiian Archipelago
2,625373.0,"NOAA, Deep Sea Coral Research & Technology Pro...",Madrepora oculata,stony coral (branching),species,D2-EX1504L3-05,2015-09-01,18.30864,-158.45393,953.0,...,"Hawaiian Archipelago, Swordfish Seamount",50m,Hohonu Moana: Exploring Deep Waters off Hawai'i,University of Hawaii,ID by expert from video,D2-EX1504L3-05,ROV,video observation,EX1504L3_05_20150901T181522Z.mp4_05:24:35:53,Hawaiian Archipelago
3,625386.0,"NOAA, Deep Sea Coral Research & Technology Pro...",Madrepora oculata,stony coral (branching),species,D2-EX1504L3-05,2015-09-01,18.30877,-158.45384,955.0,...,"Hawaiian Archipelago, Swordfish Seamount",50m,Hohonu Moana: Exploring Deep Waters off Hawai'i,University of Hawaii,ID by expert from video,D2-EX1504L3-05,ROV,video observation,EX1504L3_05_20150901T181522Z.mp4_05:15:22:09,Hawaiian Archipelago
4,625382.0,"NOAA, Deep Sea Coral Research & Technology Pro...",Madrepora oculata,stony coral (branching),species,D2-EX1504L3-05,2015-09-01,18.30875,-158.45384,955.0,...,"Hawaiian Archipelago, Swordfish Seamount",50m,Hohonu Moana: Exploring Deep Waters off Hawai'i,University of Hawaii,ID by expert from video,D2-EX1504L3-05,ROV,video observation,EX1504L3_05_20150901T181522Z.mp4_05:13:29:50,Hawaiian Archipelago
5,625384.0,"NOAA, Deep Sea Coral Research & Technology Pro...",Madrepora oculata,stony coral (branching),species,D2-EX1504L3-05,2015-09-01,18.30902,-158.45425,968.0,...,"Hawaiian Archipelago, Swordfish Seamount",50m,Hohonu Moana: Exploring Deep Waters off Hawai'i,University of Hawaii,ID by expert from video,D2-EX1504L3-05,ROV,video observation,EX1504L3_05_20150901T181522Z.mp4_04:24:44:48,Hawaiian Archipelago


In [19]:
location_df.shape

(280658, 21)

In [20]:
# UNCOMMENT HERE IF YOU WISH TO DISPLAY
# THE PLOT DIRECTLY IN THE NOTEBOOK.

# fig = go.Figure(data=go.Scattergeo(
#         lon = location_df.longitude,
#         lat = location_df.latitude,
#         text = location_df.Locality,
#         mode = 'markers',
#         ))

# fig.update_layout(
#         title = 'Coral Reef Observations in North America',
#         geo_scope='north america',
#     )
# fig.show()

<img src="images/visualizations/Coral_Reef_Observations_in_North_America.png ">

In [21]:
nan_loc_df = df[df.GeneralLocality == 'nan']
nan_loc_df.shape

(123727, 21)

In [22]:
# UNCOMMENT HERE IF YOU WISH TO DISPLAY
# THE PLOT DIRECTLY IN THE NOTEBOOK.

# fig = go.Figure(data=go.Scattergeo(
#         lon = nan_loc_df.longitude,
#         lat = nan_loc_df.latitude,
#         text = nan_loc_df.Locality,
#         mode = 'markers',
#         ))

# fig.update_layout(
#         title = 'Coral Reef Observations in Unknown Locations',
#         geo_scope='world',
#     )
# fig.show()

<img src="images/visualizations/Coral_Reef_Observations_in_Unknown_Locations.png">
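Locality text is missing for 123,727 records, but latitude and longitude have no missing values, so binning coordinates gives a view of where activity concentrates that does not depend on place names. A minimal sketch that counts observations per grid cell; the 10-degree cell size is an arbitrary choice and the coordinates are toy values:

```python
import numpy as np
import pandas as pd


def grid_counts(df: pd.DataFrame, cell_deg: float = 10.0) -> pd.Series:
    """Count observations per lat/lon grid cell, labelled by the cell's lower-left corner."""
    lat_bin = np.floor(df["latitude"] / cell_deg) * cell_deg
    lon_bin = np.floor(df["longitude"] / cell_deg) * cell_deg
    return (
        df.assign(lat_bin=lat_bin, lon_bin=lon_bin)
        .groupby(["lat_bin", "lon_bin"])
        .size()
        .sort_values(ascending=False)
    )


toy = pd.DataFrame({
    "latitude": [18.31, 18.30, 36.69, 36.70, 36.71],
    "longitude": [-158.45, -158.46, -122.72, -122.40, -122.90],
})
cells = grid_counts(toy)
print(cells)
```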

### 2. How diverse are corals in certain areas of the world?

In [23]:
df.VernacularNameCategory.value_counts(normalize=True)

gorgonian coral               0.277198
sponge (unspecified)          0.150168
sea pen                       0.134405
glass sponge                  0.107701
soft coral                    0.075441
demosponge                    0.074613
black coral                   0.050805
stony coral (branching)       0.048525
lace coral                    0.041760
stony coral (cup coral)       0.020507
stony coral (unspecified)     0.008535
gold coral                    0.005183
stoloniferan coral            0.002340
calcareous sponge             0.002011
scleromorph sponge            0.000528
other coral-like hydrozoan    0.000275
lithotelestid coral           0.000006
Name: VernacularNameCategory, dtype: float64
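These shares describe the dataset as a whole, not any single area. One simple per-area proxy is the number of distinct vernacular categories recorded in each `GeneralLocality`. A sketch on toy rows; note that this counts categories, not species, and areas with more records will tend to show more categories simply because they were sampled more:

```python
import pandas as pd


def diversity_by_region(df, region_col="GeneralLocality", category_col="VernacularNameCategory"):
    """Distinct categories and records with a category per region, most diverse first."""
    summary = df.groupby(region_col).agg(
        n_categories=(category_col, "nunique"),
        n_records=(category_col, "count"),
    )
    return summary.sort_values(["n_categories", "n_records"], ascending=False)


toy = pd.DataFrame({
    "GeneralLocality": ["Davidson Seamount"] * 3 + ["Alaska"] * 3,
    "VernacularNameCategory": [
        "sea pen", "glass sponge", "black coral",
        "gorgonian coral", "gorgonian coral", None,
    ],
})
summary = diversity_by_region(toy)
print(summary)
```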

In [24]:
# Recompute the 1% cut-off for coral categories; otherwise value_first_index
# from the locality cell above would be reused here
values = df.VernacularNameCategory.value_counts(normalize=True).values.tolist()

value_list = [value for value in values if value < 0.01]

value_first_index = values.index(value_list[0])

category_counts = df.VernacularNameCategory.value_counts().values.tolist()[:value_first_index]
category_names = df.VernacularNameCategory.value_counts().index.tolist()[:value_first_index]

In [25]:
# UNCOMMENT THE CODE BELOW TO GENERATE
# PLOT ON PLOTLY

# fig = go.Figure(data=[go.Pie(labels=category_names, values=category_counts)])

# fig.update_layout(
#         title = 'Coral Type Percentages',
#     )
# py.plot(fig, filename = 'coral-type-percentages', auto_open=True)

# fig.show()

In [26]:
df.shape

(513372, 21)

In [27]:
copy_df = df.copy()
copy_df = copy_df[copy_df.VernacularNameCategory.notnull()]
copy_df.isna().sum()

CatalogNumber                   0
DataProvider                    0
ScientificName                  0
VernacularNameCategory          0
TaxonRank                       2
Station                    259664
ObservationDate                 5
latitude                        0
longitude                       0
DepthInMeters                   0
DepthMethod                 16527
Locality                   123609
LocationAccuracy            28675
SurveyID                   206982
Repository                  16788
IdentificationQualifier     24739
EventID                     41225
SamplingEquipment           27484
RecordType                  12295
SampleID                   111078
GeneralLocality                 0
dtype: int64

In [28]:
copy_df.shape

(513197, 21)

In [29]:
coral_types = copy_df.VernacularNameCategory.value_counts().index.tolist()

color_dict = {coral_type: num+1 for num, coral_type in enumerate(coral_types)}
copy_df["ColorNum"] = [color_dict[coral] for coral in copy_df.VernacularNameCategory]

In [30]:
# UNCOMMENT THE CODE BELOW TO GENERATE
# PLOT ON PLOTLY

# fig = go.Figure()

# for coral_type, num, in color_dict.items():
#     coral_sample_df = copy_df[copy_df.VernacularNameCategory == coral_type]
    
#     fig.add_trace(go.Scattergeo(
#         lon = coral_sample_df.longitude,
#         lat = coral_sample_df.latitude,
#         text = coral_sample_df.GeneralLocality,
#         name = coral_type, 
#         mode = 'markers',
#         marker = dict(
#             color = num,
#             size = 4
#         ),
#     ))

# fig.update_layout(
#         title = 'Coral Type Diversity in The World',
#         geo_scope='world',
#         showlegend=True
#     )
# fig.show()

<img src="images/visualizations/Coral_Type_Diversity_in_The_World.png">

In [31]:
# UNCOMMENT THE CODE BELOW TO GENERATE
# PLOT ON PLOTLY

# fig = go.Figure()

# for coral_type, num, in color_dict.items():
#     coral_sample_df = copy_df[copy_df.VernacularNameCategory == coral_type]
    
#     fig.add_trace(go.Scattergeo(
#         lon = coral_sample_df.longitude,
#         lat = coral_sample_df.latitude,
#         text = coral_sample_df.GeneralLocality,
#         name = coral_type, 
#         mode = 'markers',
#         marker = dict(
#             color = num,
#             size = 4
#         ),
#     ))

# fig.update_layout(
#         title = 'Coral Type Diversity in Asia',
#         geo_scope='asia',
#         showlegend=True
#     )
# fig.show()

<img src="images/visualizations/Coral_Type_Diversity_in_Asia.png">

In [32]:
# UNCOMMENT THE CODE BELOW TO GENERATE
# PLOT ON PLOTLY

# fig = go.Figure()

# for coral_type, num, in color_dict.items():
#     coral_sample_df = copy_df[copy_df.VernacularNameCategory == coral_type]
    
#     fig.add_trace(go.Scattergeo(
#         lon = coral_sample_df.longitude,
#         lat = coral_sample_df.latitude,
#         text = coral_sample_df.GeneralLocality,
#         name = coral_type, 
#         mode = 'markers',
#         marker = dict(
#             color = num,
#             size = 4
#         ),
#     ))

# fig.update_layout(
#         title = 'Coral Type Diversity in North America',
#         geo_scope='north america',
#         showlegend=True
#     )
# fig.show()

<img src="images/visualizations/Coral_Type_Diversity_in_North_America.png">

### 3. What kinds of instruments are needed for coral research?

In [33]:
df.SamplingEquipment.value_counts()

ROV                    326289
submersible             70268
trawl                   51899
towed camera            19626
longline                 9481
dredge                   2840
AUV                      2535
drop camera              1262
grab                      621
net                       504
corer                     212
SCUBA                     174
multiple gears             86
trap                       41
other                      20
hook and line              12
pot                         5
Cp                          2
camera - drop               1
GMT                         1
South Pacific Ocean         1
Jsl-I-3905                  1
trawl-otter                 1
GMST                        1
Name: SamplingEquipment, dtype: int64
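The tail of this listing mixes spelling variants ("camera - drop" vs. "drop camera", "trawl-otter" vs. "trawl") with values that do not look like equipment at all ("GMT", "GMST", "South Pacific Ocean", and identifiers such as "Jsl-I-3905" and "Cp"). The cleanup rules below are hypothetical, inferred only from this listing; a sketch that merges the variants and marks the other values as missing:

```python
import pandas as pd

# Hypothetical rules based only on the value_counts() output above
EQUIPMENT_ALIASES = {"camera - drop": "drop camera", "trawl-otter": "trawl"}
NOT_EQUIPMENT = {"GMT", "GMST", "South Pacific Ocean", "Cp", "Jsl-I-3905"}


def clean_equipment(series: pd.Series) -> pd.Series:
    """Merge spelling variants and set values that are not equipment to NaN."""
    cleaned = series.replace(EQUIPMENT_ALIASES)
    return cleaned.mask(cleaned.isin(NOT_EQUIPMENT))


s = pd.Series(["ROV", "camera - drop", "trawl-otter", "GMT", "submersible"])
out = clean_equipment(s)
print(out.tolist())
```

These labels cover only a handful of records, so the cleanup matters mainly for the accuracy of the long tail rather than the overall ranking.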

In [3]:
# Keep only records with a positive depth (negative values such as -999 look like placeholders)
depth_df = df[df.DepthInMeters > 0]

print(f"Number of observations: {df.shape[0]}")

print(f"Number of observations with a positive depth: {depth_df.shape[0]}")

no_record_percentage = round(100 * (df.shape[0] - depth_df.shape[0]) / df.shape[0], 2)

print(f"Percentage of observations without a valid depth: {no_record_percentage}%")

Number of observations: 513372
Number of observations with a positive depth: 508991
Percentage of observations without a valid depth: 0.85%


In [14]:
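# Equipment shares in descending order. value_counts() drops missing values,
# so unlike the locality cell there is no 'nan' entry to skip here.
values = depth_df.SamplingEquipment.value_counts(normalize=True).values.tolist()

value_list = [value for value in values if value < 0.01]

value_first_index = values.index(value_list[0])

# Devices with at least a 1% share of records
counts = depth_df.SamplingEquipment.value_counts().values.tolist()[:value_first_index]
devices = depth_df.SamplingEquipment.value_counts().index.tolist()[:value_first_index]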

In [None]:
# fig = go.Figure(data=[go.Pie(labels=devices, values=counts)])
# fig.update_layout(
#         title = 'Uses of Sampling Equipments Percentage',
#     )
# py.plot(fig, filename = 'sampling-equipment-pie-chart', auto_open=True)

# if you wish to display the chart in the notebook
# comment the line above and uncomment below
# fig.show()

<img src="images/visualizations/Uses_of_Sampling_Equipments_Percentage.png">

In [35]:
import plotly.figure_factory as ff

In [36]:
# UNCOMMENT THE CODE BELOW TO GENERATE
# PLOT ON PLOTLY

# fig = ff.create_distplot([depth_df.DepthInMeters.values.tolist()], ["DepthInMeters"])

# fig.update_layout(
#         title = 'Distribution of Coral Depth (in Meters)',
#     )
# fig.show()

<img src="images/visualizations/Distribution_of_Coral_Depth.png">

In [37]:
rov_df = depth_df[depth_df.SamplingEquipment == "ROV"]
rov_df.shape

(325751, 21)

In [38]:
# UNCOMMENT THE CODE BELOW TO GENERATE
# PLOT ON PLOTLY

# fig = ff.create_distplot([rov_df.DepthInMeters.values.tolist()], ["DepthInMeters"])

# fig.update_layout(
#         title = 'Distribution of Coral Depth using ROV (in Meters)',
#     )
# fig.show()

<img src="images/visualizations/Distribution_of_Coral_Depth_using_ROV.png">

In [39]:
device_names = depth_df.SamplingEquipment.value_counts().index[:8].tolist()
device_df = depth_df[depth_df.SamplingEquipment.isin(device_names)]
device_df.shape

(479978, 21)

In [40]:
depth_data = []

for device in device_names:
    device_depth = device_df[device_df.SamplingEquipment == device].DepthInMeters.values.tolist()
    depth_data.append(device_depth)

In [41]:
# fig = ff.create_distplot(depth_data, device_names)

# fig.update_layout(
#         title = 'Distribution of Coral Depth using Most Equipments (in Meters)',
#     )
# fig.show()

<img src="images/visualizations/Distribution_of_Coral_Depth_using_Most_Equipments.png">
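The distribution plots show the shape of each device's depth range; a table of record counts and depth statistics per device gives numbers to compare when asking which instruments reach the depths of interest. A sketch, where `devices` stands in for `device_names` above and the rows are toy data:

```python
import pandas as pd


def depth_summary(df: pd.DataFrame, devices) -> pd.DataFrame:
    """Record count and depth statistics (m) per sampling device, in the given order."""
    subset = df[df["SamplingEquipment"].isin(devices)]
    return (
        subset.groupby("SamplingEquipment")["DepthInMeters"]
        .agg(n="count", median="median", p90=lambda d: d.quantile(0.9), max="max")
        .reindex(devices)
    )


toy = pd.DataFrame({
    "SamplingEquipment": ["ROV", "ROV", "ROV", "trawl", "trawl"],
    "DepthInMeters": [100.0, 500.0, 900.0, 200.0, 400.0],
})
summary = depth_summary(toy, ["ROV", "trawl"])
print(summary)
```

The 90th percentile is included because the upper tail, more than the median, shows how deep a device is routinely used.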

In [42]:
# Get the year of each data point
# (copy first so adding a column doesn't raise SettingWithCopyWarning)
device_df = device_df.copy()
device_df['ObservationYear'] = device_df.ObservationDate.dt.year

# Last recorded year: one year before the latest year in the data (2015 here)
last_recorded_year = int(device_df.ObservationYear.max()) - 1

# Year range: the n_years_past years ending at last_recorded_year
n_years_past = 26
wanted_years = list(range(last_recorded_year + 1 - n_years_past, last_recorded_year + 1))

# Keep only observations within the desired year range
date_obs_df = device_df[device_df.ObservationYear.isin(wanted_years)]

In [43]:
# fig = go.Figure()

# all_values = []

# for device in device_names:
#     wanted_year_df = date_obs_df[date_obs_df.SamplingEquipment == device]
#     wanted_devices_df = wanted_year_df[wanted_year_df.ObservationYear.isin(wanted_years)]
#     year_freq_dict = wanted_year_df.ObservationYear.value_counts().to_dict()
    
#     value_list = []
    
#     for year in wanted_years:
        
#         if year in year_freq_dict.keys():
#             value_list.append(year_freq_dict[year])
#         else:
#             value_list.append(0)
#     all_values.append(value_list)
    
# for i in range(len(device_names)):
#     fig.add_trace(go.Bar(x=wanted_years, y=all_values[i], name=device_names[i]))
    
# fig.update_layout(
#     barmode='stack', 
#     xaxis={'categoryorder':'category ascending'},
#     title = f'Uses of Most Common Equipments from {str(wanted_years[0])}-{str(wanted_years[-1])}'
# )    
# fig.show()

<img src="images/visualizations/Uses_of_Most_Common_Equipments.png">
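The commented cell above builds the per-year counts with nested loops and a dictionary lookup. `pd.crosstab` followed by `reindex(..., fill_value=0)` produces the same year-by-device table in one step. A sketch, where `years` and `devices` stand in for `wanted_years` and `device_names`, and the rows are toy data:

```python
import pandas as pd


def yearly_equipment_counts(df: pd.DataFrame, years, devices) -> pd.DataFrame:
    """Observations per year (rows) and device (columns), with missing combinations as 0."""
    table = pd.crosstab(df["ObservationYear"], df["SamplingEquipment"])
    return table.reindex(index=years, columns=devices, fill_value=0)


toy = pd.DataFrame({
    "ObservationYear": [2009, 2010, 2010, 2010],
    "SamplingEquipment": ["ROV", "ROV", "trawl", "ROV"],
})
table = yearly_equipment_counts(toy, [2008, 2009, 2010], ["ROV", "trawl", "AUV"])
print(table)
```

The stacked bar chart then needs one `go.Bar(x=table.index, y=table[device], name=device)` trace per column.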

In [44]:
date_obs_df[date_obs_df.ObservationYear == 2010].SamplingEquipment.value_counts()

ROV            27165
submersible     5472
trawl           3545
AUV             1493
longline         676
dredge            21
Name: SamplingEquipment, dtype: int64

### 4. Which institutions or organizations would be willing to partner?

In [45]:
df.DataProvider.value_counts(normalize=True)

Monterey Bay Aquarium Research Institute                                                           0.380773
NOAA, Alaska Fisheries Science Center                                                              0.144879
NOAA, Southwest Fisheries Science Center, Santa Cruz                                               0.084882
NOAA, Olympic Coast National Marine Sanctuary                                                      0.070668
Hawaii Undersea Research Laboratory                                                                0.068373
Smithsonian Institution, National Museum of Natural History                                        0.046446
NOAA, Office of Ocean Exploration and Research                                                     0.033880
Bureau of Ocean Energy Management                                                                  0.025297
Temple University                                                                                  0.021275
NOAA, Southwest Fisheries Sc

In [46]:
copy_df = df.copy()

orgs = copy_df.DataProvider.value_counts().index.tolist()

color_dict = {org: num+1 for num, org in enumerate(orgs)}
copy_df["ColorNum"] = [color_dict[org] for org in copy_df.DataProvider]

In [47]:
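values = df.DataProvider.value_counts(normalize=True).values.tolist()[1:]

value_list = [value for value in values if value < 0.01]

# Note: `values` skips the largest provider, so this index is one smaller than
# the position of the first provider below 1% in the full ranking; the slices
# below therefore keep one provider fewer than a strict 1% cut-off would
value_first_index = values.index(value_list[0])

counts = df.DataProvider.value_counts().values.tolist()[:value_first_index]
organizations = df.DataProvider.value_counts().index.tolist()[:value_first_index]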

In [48]:
# Copy so that adding ColorNum below doesn't raise SettingWithCopyWarning
org_df = df[df.DataProvider.isin(organizations)].copy()

orgs = org_df.DataProvider.value_counts().index.tolist()

color_dict = {org: num+1 for num, org in enumerate(orgs)}
org_df["ColorNum"] = [color_dict[org] for org in org_df.DataProvider]

In [49]:
color_dict

{'Monterey Bay Aquarium Research Institute': 1,
 'NOAA, Alaska Fisheries Science Center': 2,
 'NOAA, Southwest Fisheries Science Center, Santa Cruz': 3,
 'NOAA, Olympic Coast National Marine Sanctuary': 4,
 'Hawaii Undersea Research Laboratory': 5,
 'Smithsonian Institution, National Museum of Natural History': 6,
 'NOAA, Office of Ocean Exploration and Research': 7,
 'Bureau of Ocean Energy Management': 8,
 'Temple University': 9,
 'NOAA, Southwest Fisheries Science Center, La Jolla': 10,
 'Harbor Branch Oceanographic Institute': 11,
 'NOAA, Northwest Fisheries Science Center': 12,
 'NOAA, Deep Sea Coral Research & Technology Program and Office of Ocean Exploration and Research': 13,
 'NOAA, Channel Islands National Marine Sanctuary': 14}

In [50]:
# # UNCOMMENT THE CODE BELOW TO GENERATE
# # PLOT ON PLOTLY

# fig = go.Figure()

# initials = {}

# for org, num, in color_dict.items():
#     data_prov_df = org_df[org_df.DataProvider == org]
    
#     if len(org.split()) > 2:
#         if "NOAA" in org:
#             name = org.replace("NOOA,", "")
#             words = name.split()
#             first_chars = [word[0] for word in words]
#             new_name = "NOOA, " + "".join(first_chars)
#         else:
#             words = org.split()
#             first_chars = [word[0] for word in words]
#             new_name = "".join(first_chars)
#     else:
#         new_name = org
        
#     if new_name != org:
#         initials[new_name] = org
    
#     fig.add_trace(go.Scattergeo(
#         lon = data_prov_df.longitude,
#         lat = data_prov_df.latitude,
#         text = data_prov_df.DataProvider,
#         name = new_name, 
#         mode = 'markers',
#         marker = dict(
#             color = num,
#             size = 4
#         ),
#     ))

# fig.update_layout(
#         title = 'Organization Coral Research Activities in The World',
#         geo_scope='world',
#         showlegend=True
#     )
# fig.show()

<img src="images/visualizations/Organization_Coral_Research_Activities_in_The_World.png">

Abbreviations used in the map legend:

- `MBARI`: Monterey Bay Aquarium Research Institute
- `NOOA, NAFSC`: NOAA, Alaska Fisheries Science Center
- `NOOA, NSFSCSC`: NOAA, Southwest Fisheries Science Center, Santa Cruz
- `NOOA, NOCNMS`: NOAA, Olympic Coast National Marine Sanctuary
- `HURL`: Hawaii Undersea Research Laboratory
- `SINMoNH`: Smithsonian Institution, National Museum of Natural History
- `NOOA, NOoOEaR`: NOAA, Office of Ocean Exploration and Research
- `BoOEM`: Bureau of Ocean Energy Management
- `NOOA, NSFSCLJ`: NOAA, Southwest Fisheries Science Center, La Jolla
- `HBOI`: Harbor Branch Oceanographic Institute
- `NOOA, NNFSC`: NOAA, Northwest Fisheries Science Center
- `NOOA, NDSCR&TPaOoOEaR`: NOAA, Deep Sea Coral Research & Technology Program and Office of Ocean Exploration and Research
- `NOOA, NCINMS`: NOAA, Channel Islands National Marine Sanctuary
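The commented map code strips the prefix `"NOOA,"`, which never matches the `"NOAA,"` prefix in the provider names. As a result, "NOAA" contributes an extra leading "N" to the initials, and the label is prefixed with "NOOA" (hence `NOOA, NAFSC`). A sketch of the same abbreviation rule that strips the actual prefix; `abbreviate` is a hypothetical helper name:

```python
def abbreviate(org: str) -> str:
    """Initials for long provider names; 'NOAA, ...' names keep a NOAA prefix."""
    words = org.split()
    if len(words) <= 2:
        return org  # short names such as 'Temple University' stay as-is
    prefix = "NOAA,"
    if org.startswith(prefix):
        rest = org[len(prefix):].split()
        return "NOAA, " + "".join(word[0] for word in rest)
    return "".join(word[0] for word in words)


for name in [
    "Monterey Bay Aquarium Research Institute",
    "NOAA, Alaska Fisheries Science Center",
    "Temple University",
]:
    print(f"{abbreviate(name)!r}: {name!r}")
```

Regenerating the map with this rule would change the legend labels, for example to `NOAA, AFSC` for the Alaska Fisheries Science Center.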

In [51]:
org_df.DataProvider.value_counts().index.tolist()

['Monterey Bay Aquarium Research Institute',
 'NOAA, Alaska Fisheries Science Center',
 'NOAA, Southwest Fisheries Science Center, Santa Cruz',
 'NOAA, Olympic Coast National Marine Sanctuary',
 'Hawaii Undersea Research Laboratory',
 'Smithsonian Institution, National Museum of Natural History',
 'NOAA, Office of Ocean Exploration and Research',
 'Bureau of Ocean Energy Management',
 'Temple University',
 'NOAA, Southwest Fisheries Science Center, La Jolla',
 'Harbor Branch Oceanographic Institute',
 'NOAA, Northwest Fisheries Science Center',
 'NOAA, Deep Sea Coral Research & Technology Program and Office of Ocean Exploration and Research',
 'NOAA, Channel Islands National Marine Sanctuary']

In [52]:
df[df.DataProvider == 'NOAA, Southwest Fisheries Science Center, La Jolla']

| Unnamed: 0 | CatalogNumber | DataProvider | ScientificName | VernacularNameCategory | TaxonRank | Station | ObservationDate | latitude | longitude | DepthInMeters | ... | Locality | LocationAccuracy | SurveyID | Repository | IdentificationQualifier | EventID | SamplingEquipment | RecordType | SampleID | GeneralLocality |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 431 | 534242.0 | NOAA, Southwest Fisheries Science Center, La J... | Acanthogorgia sp. | gorgonian coral | genus | | 2011-12-04 | 33.95954 | -119.47534 | 170.0 | ... | The Footprint | 20m | SWFSC_045 | NOAA, Southwest Fisheries Science Center | field ID | 11-338B | ROV | still image | 43142 | The Footprint |
| 432 | 535915.0 | NOAA, Southwest Fisheries Science Center, La J... | Acanthogorgia sp. | gorgonian coral | genus | | 2011-12-04 | 33.96086 | -119.47995 | 164.0 | ... | The Footprint | 20m | SWFSC_045 | NOAA, Southwest Fisheries Science Center | field ID | 11-338B | ROV | still image | 43221 | The Footprint |
| 433 | 531981.0 | NOAA, Southwest Fisheries Science Center, La J... | Acanthogorgia sp. | gorgonian coral | genus | | 2011-12-04 | 33.95747 | -119.47461 | 221.0 | ... | The Footprint | 20m | SWFSC_045 | NOAA, Southwest Fisheries Science Center | field ID | 11-338D | ROV | still image | 43297 | The Footprint |
| 434 | 535685.0 | NOAA, Southwest Fisheries Science Center, La J... | Acanthogorgia sp. | gorgonian coral | genus | | 2011-12-07 | 33.95587 | -119.46760 | 189.0 | ... | The Footprint | 20m | SWFSC_045 | NOAA, Southwest Fisheries Science Center | field ID | 11-341F | ROV | still image | 44509 | The Footprint |
| 435 | 533989.0 | NOAA, Southwest Fisheries Science Center, La J... | Acanthogorgia sp. | gorgonian coral | genus | | 2011-12-07 | 33.95650 | -119.46863 | 183.0 | ... | The Footprint | 20m | SWFSC_045 | NOAA, Southwest Fisheries Science Center | field ID | 11-341F | ROV | still image | 44481 | The Footprint |
| 436 | 531983.0 | NOAA, Southwest Fisheries Science Center, La J... | Acanthogorgia sp. | gorgonian coral | genus | | 2011-12-07 | 33.95716 | -119.47145 | 194.0 | ... | The Footprint | 20m | SWFSC_045 | NOAA, Southwest Fisheries Science Center | field ID | 11-341E | ROV | still image | 44448 | The Footprint |
| 437 | 531980.0 | NOAA, Southwest Fisheries Science Center, La J... | Acanthogorgia sp. | gorgonian coral | genus | | 2011-12-04 | 33.95977 | -119.47607 | 164.0 | ... | The Footprint | 20m | SWFSC_045 | NOAA, Southwest Fisheries Science Center | field ID | 11-338B | ROV | still image | 43164 | The Footprint |
| 438 | 535790.0 | NOAA, Southwest Fisheries Science Center, La J... | Acanthogorgia sp. | gorgonian coral | genus | | 2011-12-04 | 33.95672 | -119.47338 | 229.0 | ... | The Footprint | 20m | SWFSC_045 | NOAA, Southwest Fisheries Science Center | field ID | 11-338D | ROV | still image | 43327 | The Footprint |
| 439 | 533985.0 | NOAA, Southwest Fisheries Science Center, La J... | Acanthogorgia sp. | gorgonian coral | genus | | 2011-12-04 | 33.96001 | -119.47701 | 168.0 | ... | The Footprint | 20m | SWFSC_045 | NOAA, Southwest Fisheries Science Center | field ID | 11-338B | ROV | still image | 43184 | The Footprint |
| 440 | 531982.0 | NOAA, Southwest Fisheries Science Center, La J... | Acanthogorgia sp. | gorgonian coral | genus | | 2011-12-04 | 33.95728 | -119.47428 | 220.0 | ... | The Footprint | 20m | SWFSC_045 | NOAA, Southwest Fisheries Science Center | field ID | 11-338D | ROV | still image | 43306 | The Footprint |


In [53]:
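org_names = org_df.DataProvider.value_counts().index.tolist()
copy_df = df[df.ObservationDate.notnull()].copy()

# Keep records from the listed providers; .copy() lets us add a column later without a SettingWithCopyWarning
new_org_df = copy_df[copy_df.DataProvider.isin(org_names)].copy()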

In [54]:
# Get the observation year of each record
year_list = new_org_df.ObservationDate.values.astype('datetime64[s]').tolist()
obsrv_years = [date.year for date in year_list]

# End the range one year before the latest observation year (2015 for this data)
last_recorded_year = max(obsrv_years) - 1

# Year range: the n_years_past years ending at last_recorded_year
n_years_past = 26
wanted_years = list(range(last_recorded_year + 1 - n_years_past, last_recorded_year + 1))

# Create a new column called ObservationYear
new_org_df['ObservationYear'] = obsrv_years

# Keep only records observed within the desired year range
date_prov_df = new_org_df[new_org_df.ObservationYear.isin(wanted_years)]
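The cell above picks its year range in two steps: take the year before the latest observation as the end point, then count back `n_years_past` years. A small pure-Python sketch of that arithmetic, useful for checking the bounds before plotting; `year_window` and its `drop_latest` flag are hypothetical names, not part of the notebook.

```python
def year_window(observed_years, n_years_past=26, drop_latest=True):
    """Return n_years_past consecutive years ending at the chosen end year.

    The end year is the latest observed year, minus one when drop_latest
    is True (the notebook subtracts one from the latest year).
    """
    end_year = max(observed_years) - (1 if drop_latest else 0)
    return list(range(end_year - n_years_past + 1, end_year + 1))


# Example with made-up years: the latest is 2016, so the window ends at 2015.
window = year_window([1988, 2003, 2016])
print(window[0], window[-1], len(window))  # 1990 2015 26
```

If the end year is 2015, as the notebook's comment indicates, a 26-year window runs from 1990 to 2015 inclusive.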

In [55]:
# fig = go.Figure()

# all_values = []

# for org in org_names:
#     wanted_year_df = date_prov_df[date_prov_df.DataProvider == org]
#     wanted_orgs_df = wanted_year_df[wanted_year_df.ObservationYear.isin(wanted_years)]
#     year_freq_dict = wanted_orgs_df.ObservationYear.value_counts().to_dict()
    
#     value_list = []
    
#     for year in wanted_years:
        
#         if year in year_freq_dict.keys():
#             value_list.append(year_freq_dict[year])
#         else:
#             value_list.append(0)
#     all_values.append(value_list)
    
# for i in range(len(org_names)):
    
#     if len(org_names[i].split()) > 2:
        
#         if "NOAA" in org_names[i]:
#             name = org_names[i].replace("NOOA,", "")
#             words = name.split()
#             first_chars = [word[0] for word in words]
#             new_name = "NOOA, " + "".join(first_chars)
#         else:
#             words = org_names[i].split()
#             first_chars = [word[0] for word in words]
#             new_name = "".join(first_chars)
#     else:
#         new_name = org_names[i]
    
#     fig.add_trace(go.Bar(x=wanted_years, y=all_values[i], name=new_name))
    
# fig.update_layout(
#     barmode='stack', 
#     xaxis={'categoryorder':'category ascending'},
#     title = f'Organization Coral Research Activities from {str(wanted_years[0])}-{str(wanted_years[-1])}'
# )    
# fig.show()

<img src="images/visualizations/Organization_Coral_Research_Activities.png">
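The plotting cell above builds, for each organization, a list of counts aligned to a fixed set of x-axis categories (the years), writing 0 wherever a year is missing; the later cell that counts sampling equipment per partner follows the same pattern. That zero-filled matrix can be built in one pass. Here is a stdlib sketch that assumes records are plain dicts (for example from `df.to_dict("records")`); `zero_filled_counts` is a hypothetical helper name.

```python
from collections import Counter


def zero_filled_counts(records, group_key, category_key, groups, categories):
    """Count records per (group, category) pair.

    Returns one list per entry in `groups`, ordered like `categories`,
    with 0 wherever a group has no records in that category.
    """
    group_set, category_set = set(groups), set(categories)
    counts = Counter(
        (rec[group_key], rec[category_key])
        for rec in records
        if rec[group_key] in group_set and rec[category_key] in category_set
    )
    # Counter returns 0 for missing keys, so no explicit else-branch is needed.
    return [[counts[(g, c)] for c in categories] for g in groups]


# Toy records standing in for df.to_dict("records").
records = [
    {"DataProvider": "Org A", "ObservationYear": 2000},
    {"DataProvider": "Org A", "ObservationYear": 2000},
    {"DataProvider": "Org B", "ObservationYear": 2001},
    {"DataProvider": "Org C", "ObservationYear": 2000},  # not in groups
    {"DataProvider": "Org A", "ObservationYear": 1999},  # not in categories
]
print(zero_filled_counts(records, "DataProvider", "ObservationYear",
                         ["Org A", "Org B"], [2000, 2001]))  # [[2, 0], [0, 1]]
```

Each inner list can be passed as `y` to one `go.Bar` trace. In pandas, `pd.crosstab(date_prov_df.DataProvider, date_prov_df.ObservationYear).reindex(index=org_names, columns=wanted_years, fill_value=0)` should produce the same table.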

In [56]:
new_org_df.SamplingEquipment.value_counts()

ROV                    314594
submersible             65037
trawl                   49913
towed camera            16040
longline                 9480
dredge                   2564
AUV                      2535
drop camera              1183
net                       491
corer                     210
grab                      148
multiple gears             81
trap                       39
SCUBA                      27
other                      13
hook and line              10
pot                         5
Cp                          2
camera - drop               1
GMT                         1
South Pacific Ocean         1
Jsl-I-3905                  1
GMST                        1
Name: SamplingEquipment, dtype: int64
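The tail of these counts mixes gear types with labels that do not look like equipment at all ("GMT", "South Pacific Ocean", "Jsl-I-3905") and a likely variant spelling ("camera - drop" alongside "drop camera"). Before choosing which devices to compare, a sketch like the one below can merge known variants and set rare labels aside for review. The `SYNONYMS` table, the helper name `clean_equipment_counts`, and the `min_count=100` cutoff are illustrative assumptions, not values from the dataset documentation.

```python
from collections import Counter

# Assumption: these spellings refer to the same gear type.
SYNONYMS = {"camera - drop": "drop camera"}


def clean_equipment_counts(counts, min_count=100):
    """Merge variant labels, then split them into common and rare ones.

    `counts` maps a SamplingEquipment label to its number of records.
    Labels with fewer than `min_count` records (an arbitrary cutoff)
    are returned separately so they can be inspected by hand.
    """
    merged = Counter()
    for label, n in counts.items():
        merged[SYNONYMS.get(label, label)] += n
    common = {label: n for label, n in merged.items() if n >= min_count}
    rare = {label: n for label, n in merged.items() if n < min_count}
    return common, rare


# A few of the counts listed above.
common, rare = clean_equipment_counts(
    {"ROV": 314594, "drop camera": 1183, "camera - drop": 1, "trap": 39, "GMT": 1}
)
print(common)  # {'ROV': 314594, 'drop camera': 1184}
print(rare)    # {'trap': 39, 'GMT': 1}
```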

In [57]:
wanted_partners = [
    'Smithsonian Institution, National Museum of Natural History',
    'NOAA, Office of Ocean Exploration and Research',
    'NOAA, Southwest Fisheries Science Center, Santa Cruz'
]

wanted_devices = [
    'ROV',
    'submersible',
    'towed camera',
    'AUV',
    'trawl'
]

partners_df = new_org_df[new_org_df.DataProvider.isin(wanted_partners)]

In [58]:
# fig = go.Figure()

# all_values = []

# for partner in wanted_partners:
    
#     # Get the rows contributed by this partner
#     wanted_df = partners_df[partners_df.DataProvider == partner]
    
#     # Keep only rows whose sampling equipment is in wanted_devices
#     wanted_df = wanted_df[wanted_df.SamplingEquipment.isin(wanted_devices)]
    
#     # Dictionary of devices
#     devices_dict = wanted_df.SamplingEquipment.value_counts().to_dict()
    
#     value_list = []
    
#     for device in wanted_devices:
        
#         if device in devices_dict.keys():
#             value_list.append(devices_dict[device])
#         else:
#             value_list.append(0)
#     all_values.append(value_list)

# for i in range(len(wanted_partners)):
#     if len(wanted_partners[i].split()) > 2:
        
#         if "NOAA" in wanted_partners[i]:
#             name = wanted_partners[i].replace("NOOA,", "")
#             words = name.split()
#             first_chars = [word[0] for word in words]
#             new_name = "NOOA, " + "".join(first_chars)
#         else:
#             words = wanted_partners[i].split()
#             first_chars = [word[0] for word in words]
#             new_name = "".join(first_chars)
#     else:
#         new_name = wanted_partners[i]
#     fig.add_trace(go.Bar(x=wanted_devices, y=all_values[i], name=new_name))
    
# fig.update_layout(
#     title = "Use of Common Sampling Equipments by Potential Partners"
# )    
# fig.show()

<img src="images/visualizations/Uses_of_Common_Sampling_Equipments_by_Potential_Partners.png">