# Deep Sea Corals
## Coral Records from NOAA’s Deep-Sea Coral Research and Technology Program

## Motivation

While many people consider space the "final frontier", I consider the ocean the "forgotten frontier" that we have yet to explore here on Earth. More people have been to the Moon than to the bottom of the Mariana Trench, the deepest part of the ocean. There is so much we don't know about the ocean. Although the surface area of the Pacific Ocean alone is larger than that of the Moon, according to NOAA, more than 80% of the world's oceans are still unmapped, unobserved, and unexplored.

Knowing all of that, I think it only makes sense for us to get to know more about our own ocean. Why is it so important? I think it's better to let the National Oceanic and Atmospheric Administration (NOAA) explain that on this page; but for me personally, it's about the technological aspect. There are so many things we've invented thanks to the scientists and engineers who helped send people to the Moon: for example, the portable computer, wireless headphones, LEDs, and much more. If we could develop such wonderful things from space exploration, imagine what we could discover from exploring the ocean.

In this project, we are going to use the NOAA deep-sea coral dataset to... wait for it... build a coastal research resort. The long-term goal of the research resort is to provide a suitable location for Mariana Trench related missions; the short-term goal is to provide a suitable location and facilities for deep-sea coral research. Why the Mariana Trench? And what does it have to do with corals? Well, I think it would be better for you to read my Medium blog post about this project. It will come out soon, so make sure you stay tuned.

Blog link: Coming soon!

<table>
<tr>
    <td> <img src="images/NOAA_Flag.png " style="height:300px"> </td>
    <td> <img src="images/vlad-tchompalov-LsIXVKThAG0-unsplash.jpg" alt="Photo by Q.U.I on Unsplash" style="height:300px"> </td>
</tr>
</table>

### Context

This dataset contains information about deep-sea corals and sponges collected by NOAA and NOAA’s partners. Among the data are the geolocations of deep-sea corals and sponges, and the whole thing is tailored to occurrences of azooxanthellates, a subset of all coral and sponge species (i.e., they don't have symbiotic relationships with certain microbes). Additionally, these records only consist of observations deeper than 50 meters, to truly focus on deep-sea corals and sponges.

### Content

Column descriptions:

- CatalogNumber: Unique record identifier assigned by the Deep-Sea Coral Research and Technology Program.
- DataProvider: The institution, publication, or individual who ultimately deserves credit for acquiring or aggregating the data and making it available.
- ScientificName: Taxonomic identification of the sample as a Latin binomial.
- VernacularNameCategory: Common (vernacular) name category of the organism.
- TaxonRank: Identifies the level in the taxonomic hierarchy of the ScientificName term.
- ObservationDate: Date (UTC) on which the sample/observation occurred; the values in the data look like 2015-09-02.
- Latitude (degrees North): Latitude in decimal degrees where the sample or observation was collected.
- Longitude (degrees East): Longitude in decimal degrees where the sample or observation was collected.
- DepthInMeters: Best single depth value for sample as a positive value in meters.
- DepthMethod: Method by which best singular depth in meters (DepthInMeters) was determined. "Averaged" when start and stop depths were averaged. "Assigned" when depth was derived from bathymetry at the location. "Reported" when depth was reported based on instrumentation or described in literature.
- Locality: A specific named place or named feature of origin for the specimen or observation (e.g., Dixon Entrance, Diaphus Bank, or Sur Ridge). Multiple locality names can be separated by a semicolon, arranged in a list from largest to smallest area (e.g., Gulf of Mexico; West Florida Shelf, Pulley Ridge).
- IdentificationQualifier: Taxonomic identification method and level of expertise. Examples: “genetic ID”; “morphological ID from sample by taxonomic expert”; “ID by expert from image”; “ID by non-expert from video”; etc.
- SamplingEquipment: Method of data collection. Examples: ROV, submersible, towed camera, SCUBA, etc.
- RecordType: Denotes the origin and type of record: published literature ("literature"); a collected specimen ("specimen"); observation from a still image ("still image"); observation from video ("video observation"); notation without a specimen or image ("notation"); or observation from trawl surveys, longline surveys, and/or observer records ("catch record").


### Note

I did not include the actual visualizations generated by plotly in this notebook, since doing so would make the notebook exceed the maximum file size that can be pushed to GitHub. Instead, I saved all the visualizations in a folder named **visualizations**, which you can find inside the folder **images**.

Then, I embedded the images in a markdown cell using html image tag like so...

**img src="path/to/images.png"**

## Business Understanding

Before we start, we need to set our goals straight. Then, we want to ask questions that lead us one step closer to fulfilling those goals. By doing this, we get some context about what kind of insight we want from our data.

**Long-term goal**: Creating a coastal research resort for ocean exploration and Mariana Trench related missions.

**Short-term goal**: Creating a coastal research resort for deep-sea coral related research.

**Guiding Questions**: 

1. Which part of the world has the most coral research activities?
2. How diverse are corals in certain areas of the world?
3. What kind of instrument is needed for doing coral research?
4. Which institution/organization would be willing to be partners?

## Data Understanding

Now that we know what our goals are, we can start to explore the data. For those who do not have the dataset, you can get it from [this page](https://www.kaggle.com/noaa/deep-sea-corals).

Let's make sure that we have all of our dependencies.

In [1]:
import numpy as np
import pandas as pd
import chart_studio
import chart_studio.plotly as py
import plotly.graph_objects as go

# Using plotly's chart studio API is optional.
# If you wish to do so, uncomment the code below
# and include corresponding credentials.

# USERNAME = ""
# API_KEY = ""

# chart_studio.tools.set_credentials_file(username=USERNAME, api_key=API_KEY)

### Load Data

The NOAA deep-sea coral dataset has various variables that will be useful for answering our questions. However, the dataset doesn't come clean. There is some data cleaning that we need to do to make sure that the data can be analyzed. 

You'll see what I mean, for now let's import our dataset.

In [2]:
df = pd.read_csv("../deep_sea_corals.csv")
df = df.iloc[1:]


Columns (5,7,8,13) have mixed types. Specify dtype option on import or set low_memory=False.



The first time you import the dataset, a warning like this will likely appear.
```
DtypeWarning:

Columns (5,7,8,13) have mixed types. Specify dtype option on import or set low_memory=False.
```

Look at that, it looks like columns 5, 7, 8, and 13 all have mixed data types. This means that each of these columns holds more than one data type. Usually, this happens when values that were supposed to be saved as integers or floats were instead saved as strings. However, this is not a big deal since we can easily handle them later.
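
If you would rather avoid the warning up front, here is a minimal sketch of one way to do it. It assumes the row that `df.iloc[1:]` drops is a non-numeric row sitting right under the header (units, for instance); the tiny CSV below is made up for illustration and is not taken from the dataset.

```python
import io

import pandas as pd

# Hypothetical miniature of the file: a header, one non-numeric row
# (assumed here to hold units), then the actual records.
csv_text = """CatalogNumber,latitude,longitude
,degrees_north,degrees_east
625366,18.30817,-158.45392
625373,18.30864,-158.45393
"""

# Reading everything keeps the text row, so the coordinate columns
# end up with a text (non-numeric) dtype.
raw = pd.read_csv(io.StringIO(csv_text), low_memory=False)

# Skipping file line 1 (the row right after the header) lets pandas
# infer numeric dtypes directly.
clean = pd.read_csv(io.StringIO(csv_text), skiprows=[1])

print(raw.dtypes["latitude"], clean.dtypes["latitude"])
```

With the real file, the same `skiprows=[1]` idea would only help if that extra row is indeed the cause of the mixed types, which I have not verified here.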

I think it's a good idea to take a closer look at our dataset.

### Explore Data

Let's see the first five rows to catch a glimpse of what our dataset looks like.

In [3]:
df.head()

Unnamed: 0,CatalogNumber,DataProvider,ScientificName,VernacularNameCategory,TaxonRank,Station,ObservationDate,latitude,longitude,DepthInMeters,DepthMethod,Locality,LocationAccuracy,SurveyID,Repository,IdentificationQualifier,EventID,SamplingEquipment,RecordType,SampleID
1,625366.0,"NOAA, Deep Sea Coral Research & Technology Pro...",Madrepora oculata,stony coral (branching),species,D2-EX1504L3-05,2015-09-02,18.30817,-158.45392,959.0,reported,"Hawaiian Archipelago, Swordfish Seamount",50m,Hohonu Moana: Exploring Deep Waters off Hawai'i,University of Hawaii,ID by expert from video,D2-EX1504L3-05,ROV,video observation,EX1504L3_05_20150901T181522Z.mp4_05:45:26:28
2,625373.0,"NOAA, Deep Sea Coral Research & Technology Pro...",Madrepora oculata,stony coral (branching),species,D2-EX1504L3-05,2015-09-01,18.30864,-158.45393,953.0,reported,"Hawaiian Archipelago, Swordfish Seamount",50m,Hohonu Moana: Exploring Deep Waters off Hawai'i,University of Hawaii,ID by expert from video,D2-EX1504L3-05,ROV,video observation,EX1504L3_05_20150901T181522Z.mp4_05:24:35:53
3,625386.0,"NOAA, Deep Sea Coral Research & Technology Pro...",Madrepora oculata,stony coral (branching),species,D2-EX1504L3-05,2015-09-01,18.30877,-158.45384,955.0,reported,"Hawaiian Archipelago, Swordfish Seamount",50m,Hohonu Moana: Exploring Deep Waters off Hawai'i,University of Hawaii,ID by expert from video,D2-EX1504L3-05,ROV,video observation,EX1504L3_05_20150901T181522Z.mp4_05:15:22:09
4,625382.0,"NOAA, Deep Sea Coral Research & Technology Pro...",Madrepora oculata,stony coral (branching),species,D2-EX1504L3-05,2015-09-01,18.30875,-158.45384,955.0,reported,"Hawaiian Archipelago, Swordfish Seamount",50m,Hohonu Moana: Exploring Deep Waters off Hawai'i,University of Hawaii,ID by expert from video,D2-EX1504L3-05,ROV,video observation,EX1504L3_05_20150901T181522Z.mp4_05:13:29:50
5,625384.0,"NOAA, Deep Sea Coral Research & Technology Pro...",Madrepora oculata,stony coral (branching),species,D2-EX1504L3-05,2015-09-01,18.30902,-158.45425,968.0,reported,"Hawaiian Archipelago, Swordfish Seamount",50m,Hohonu Moana: Exploring Deep Waters off Hawai'i,University of Hawaii,ID by expert from video,D2-EX1504L3-05,ROV,video observation,EX1504L3_05_20150901T181522Z.mp4_04:24:44:48


Okay, the dataset looks interesting already. We have `longitude` and `latitude`, which we can use to plot coordinates on the map later. I'm not sure why the `Locality` column has multiple location names if `latitude` and `longitude` are already provided. There is also the `DataProvider` column, which we can use to find out which institutions or organizations might be interested in becoming partners. The `SamplingEquipment` column will also be useful to know what kind of equipment researchers are using to discover the corals.

Now, let's take a look at the number of data points and the names of all the columns.

In [4]:
print(f"The dataset has {df.shape[0]} data points and {df.shape[1]} columns.")
print("The columns are: \n{}.".format(", ".join(list(df.columns))))

The dataset has 513372 data points and 20 columns.
The columns are: 
CatalogNumber, DataProvider, ScientificName, VernacularNameCategory, TaxonRank, Station, ObservationDate, latitude, longitude, DepthInMeters, DepthMethod, Locality, LocationAccuracy, SurveyID, Repository, IdentificationQualifier, EventID, SamplingEquipment, RecordType, SampleID.


We have over 500K data points, and the name of each column matches the table that contains the first 5 rows. However, as you might remember, we have some mixed types in our dataset. Let's try to handle the ones that will be useful for our analysis.

In [5]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 513372 entries, 1 to 513372
Data columns (total 20 columns):
CatalogNumber              513372 non-null float64
DataProvider               513372 non-null object
ScientificName             513372 non-null object
VernacularNameCategory     513197 non-null object
TaxonRank                  513364 non-null object
Station                    253590 non-null object
ObservationDate            513367 non-null object
latitude                   513372 non-null object
longitude                  513372 non-null object
DepthInMeters              513372 non-null float64
DepthMethod                496845 non-null object
Locality                   389645 non-null object
LocationAccuracy           484662 non-null object
SurveyID                   306228 non-null object
Repository                 496584 non-null object
IdentificationQualifier    488591 non-null object
EventID                    472141 non-null object
SamplingEquipment          485883 non

Hmm, something's weird here. The `latitude`, `longitude`, and `ObservationDate` columns are all stored as `object` instead of numeric or datetime types. This might cause an issue later on when we're trying to visualize our data.

We want to change that. Two pandas functions, `to_numeric` and `to_datetime`, can make our job much easier.

In [6]:
df['longitude'] = pd.to_numeric(df['longitude'])
df['latitude'] = pd.to_numeric(df['latitude'])
df['ObservationDate'] = pd.to_datetime(df['ObservationDate'])
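
The conversion above raises an error if a value cannot be parsed. As a hedged sketch, assuming some entries might be unparseable text, `errors="coerce"` turns those entries into missing values instead. The sample values below are made up for illustration.

```python
import pandas as pd

# Made-up values for illustration; "n/a" and "unknown" stand in for
# entries that cannot be parsed.
sample = pd.DataFrame({
    "latitude": ["18.30817", "n/a"],
    "ObservationDate": ["2015-09-02", "unknown"],
})

# errors="coerce" converts unparseable entries to NaN / NaT instead of raising.
sample["latitude"] = pd.to_numeric(sample["latitude"], errors="coerce")
sample["ObservationDate"] = pd.to_datetime(sample["ObservationDate"], errors="coerce")

print(sample.dtypes)
print(sample.isna().sum())
```

The trade-off is that bad values disappear silently, so it is worth checking `isna().sum()` before and after the conversion.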

Let's make sure that everything works well.

In [7]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 513372 entries, 1 to 513372
Data columns (total 20 columns):
CatalogNumber              513372 non-null float64
DataProvider               513372 non-null object
ScientificName             513372 non-null object
VernacularNameCategory     513197 non-null object
TaxonRank                  513364 non-null object
Station                    253590 non-null object
ObservationDate            513367 non-null datetime64[ns]
latitude                   513372 non-null float64
longitude                  513372 non-null float64
DepthInMeters              513372 non-null float64
DepthMethod                496845 non-null object
Locality                   389645 non-null object
LocationAccuracy           484662 non-null object
SurveyID                   306228 non-null object
Repository                 496584 non-null object
IdentificationQualifier    488591 non-null object
EventID                    472141 non-null object
SamplingEquipment          

Looks like everything is set. Now, let's see a brief description of our numerical columns.

In [8]:
df.describe()

Unnamed: 0,CatalogNumber,latitude,longitude,DepthInMeters
count,513372.0,513372.0,513372.0,513372.0
mean,426607.263715,36.498871,-120.292148,798.589769
std,206162.54248,13.232359,51.570693,805.991501
min,1.0,-78.9167,-179.99358,-999.0
25%,222081.75,32.843575,-130.337028,218.0
50%,469248.5,36.69471,-122.72345,539.0
75%,604061.25,42.907495,-120.412837,1137.0
max,740097.0,74.35,179.994,6369.0


Okay, I get that `latitude` and `longitude` have negative numbers; but our dataset involves **deep-sea** corals, and I'm not sure that a negative number is a good sign when we talk about depth. I'm pretty sure that a value like -999 just means there is no recorded depth for certain corals; it is pretty normal for a dataset to have empty values. How many data points do not have a depth value?

In [9]:
df[df.DepthInMeters < 0].shape

(3997, 20)

Wow, nearly 4,000. That's not as bad as I thought. I mean, that's less than 1% of the whole dataset (3,997 of 513,372 rows, or about 0.78%). However, noticing this makes me wonder about the empty values in the dataset.
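
To put that number in perspective, here is a small sketch that computes the share of negative depths from the counts printed above and shows one way to keep the placeholder out of later statistics. Treating -999 as "no depth recorded" is my assumption, based on the minimum reported by `describe()`; the depth values in the example are made up.

```python
import numpy as np
import pandas as pd

# Share of negative depths, using the counts printed above.
negative_share = 3997 / 513372
print(f"{negative_share:.2%}")

# Made-up depths; -999 is treated as a "no depth recorded" placeholder.
depths = pd.Series([959.0, -999.0, 218.0, -999.0], name="DepthInMeters")

# Replace negative placeholders with NaN so summary statistics ignore them.
cleaned = depths.where(depths >= 0, np.nan)
print(cleaned.mean())
```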

In [10]:
df.isna().sum()

CatalogNumber                   0
DataProvider                    0
ScientificName                  0
VernacularNameCategory        175
TaxonRank                       8
Station                    259782
ObservationDate                 5
latitude                        0
longitude                       0
DepthInMeters                   0
DepthMethod                 16527
Locality                   123727
LocationAccuracy            28710
SurveyID                   207144
Repository                  16788
IdentificationQualifier     24781
EventID                     41231
SamplingEquipment           27489
RecordType                  12295
SampleID                   111078
dtype: int64

Wow, that's quite a lot of empty values in some columns, although most of those columns will not be useful for our analysis. There are, however, important columns like `Locality` and `SamplingEquipment` that we will be using to get some insight. Anyway, I think we will need to handle that later to minimize confusion during the analysis.
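
Raw counts are hard to compare at a glance, so a share per column can help. For instance, `Locality` is missing in 123727 of 513372 rows, which is roughly 24%. A minimal sketch on made-up rows:

```python
import numpy as np
import pandas as pd

# Made-up rows for illustration only.
sample = pd.DataFrame({
    "Locality": ["Hawaiian Archipelago", np.nan, "Alaska", np.nan],
    "SamplingEquipment": ["ROV", "ROV", np.nan, "trawl"],
    "DepthInMeters": [959.0, 218.0, 539.0, 1137.0],
})

# isna().mean() gives the fraction of missing values per column.
missing_share = sample.isna().mean().sort_values(ascending=False)
print(missing_share)
```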

Now that we know a little bit about our dataset, we know what to expect when we create the visualizations. As for the missing data, we can handle that as we make our way through the analytic process.

Moving on to question number one...

### 1. Which part of the world has the most coral research activities?

In this part, we want to see if we can identify locations where there are active research activities. For that, we are going to use the `latitude` and `longitude` variables to visualize the location of each coral observation using [plotly's scattergeo](https://plot.ly/python/scatter-plots-on-maps/). Then, we will use `Locality` to name each coordinate.

But before that, if you remember, the `Locality` column has rows with multiple location names. We need to handle this so that each data point has only one location name. For example, the first row of the dataset has the location name "Hawaiian Archipelago, Swordfish Seamount". Both names refer to places within the Hawaiian Archipelago. Therefore, we want to make sure that this data point only uses the name "Hawaiian Archipelago."

Example: Hawaiian Archipelago, Swordfish Seamount ---> Hawaiian Archipelago

To do that, I created a function named `general_location` that takes the name of the location. If the name contains more than one location, it outputs the 'general location', which means the part before the first `;` or, if there is no `;`, before the first `,`.


In [11]:
def general_location(location):
    if ";" in location:
        general_loc = location.split(";")[0]
        return general_loc
    elif "," in location:
        general_loc = location.split(",")[0]
        return general_loc
    else:
        return location
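
As an alternative, here is a sketch of a vectorized version using pandas string methods. It is not identical to `general_location`: it cuts at whichever of `;` or `,` comes first, so a name with a comma before a semicolon would be shortened further, and missing values stay missing instead of becoming the string `"nan"`.

```python
import pandas as pd

# A few Locality strings in the styles described above, plus a missing value.
locality = pd.Series([
    "Hawaiian Archipelago, Swordfish Seamount",
    "Gulf of Mexico; West Florida Shelf, Pulley Ridge",
    "Davidson Seamount",
    None,
])

# Split once on the first ";" or "," and keep the leading part.
general = locality.str.split(r"[;,]", n=1, regex=True).str[0].str.strip()
print(general.tolist())
```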

To make sure that the function works, we can use it to find the frequency of the locations of coral observations. After that, we'll create a new column named `GeneralLocality` that contains the general location.

In [12]:
from collections import Counter 

all_locations = df.Locality.astype(str).values.tolist()
all_locations = list(map(general_location, all_locations))

all_locations_count = Counter(all_locations)
all_locations_count.most_common()

[('nan', 123727),
 ('Davidson Seamount', 40114),
 ('Northwestern Hawaiian Islands', 26766),
 ('Southern California Bight', 24965),
 ('Alaska', 24143),
 ('Pioneer Seamount', 23972),
 ('OLYMPIC COAST', 22478),
 ('Main Hawaiian Islands', 19300),
 ('Rodriguez Seamount', 18702),
 ('Central Aleutian Islands', 15094),
 ('Olympic Coast National Marine Sanctuary', 14042),
 ('Viosca Knoll', 9982),
 ('Shutter ridge', 8351),
 ('Florida', 8323),
 ('Continental slope south of Point St. George', 7005),
 ('Continental slope north of Point St. George', 6428),
 ('Hawaiian Archipelago', 5764),
 ('Aleutians', 5229),
 ('Cordell Bank National Marine Sanctuary', 4704),
 ('Monterey Bay', 3338),
 ('Eureka_W', 2604),
 ('The Footprint', 2549),
 ('Piggy_Bank', 2246),
 ('Piggy Bank', 1788),
 ('South Santa Rosa', 1660),
 ('Western Gulf of Alaska', 1619),
 ('Guide Seamount', 1599),
 ('San Juan Seamount', 1594),
 ('Monterey Canyon', 1453),
 ('Off Florida', 1428),
 ('off California', 1363),
 ('Santa Monica Cyn', 1282)

In [13]:
df['GeneralLocality'] = all_locations
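
The frequency list above contains spelling variants such as `Piggy_Bank` and `Piggy Bank`, or `OLYMPIC COAST` in upper case. I did not merge them in this analysis, but a rough sketch of how that could be done looks like this. Title-casing is a blunt tool and would also change names like "Continental slope south of Point St. George", so treat this as a starting point only.

```python
import pandas as pd

# Labels taken from the frequency list above.
names = pd.Series(["Piggy_Bank", "Piggy Bank", "OLYMPIC COAST", "Shutter ridge"])

# Replace underscores with spaces, collapse repeated whitespace, and
# apply title case so spelling variants share one label.
normalized = (
    names.str.replace("_", " ", regex=False)
         .str.replace(r"\s+", " ", regex=True)
         .str.strip()
         .str.title()
)
print(normalized.value_counts())
```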

Great, we got the general locations. I think we can do some visualization now. What we are going to do now is basically plotting the latitude and longitude on the map to visually understand which locations have active research activities.

Before going to the map, however, let's create a [plotly pie](https://plot.ly/python/pie-charts/) chart to understand the percentage of corals observed at each location. In this case, I excluded the locations that have less than 1% (a share of 0.01) of the observations in the whole dataset, since we want to know which locations have a significant number of corals.

In [14]:
# Share of observations per location. The first entry is skipped
# because it is the 'nan' label (rows without a Locality).
values = df.GeneralLocality.value_counts(normalize=True).values.tolist()[1:]

# Shares below 0.01 (1%) of the whole dataset. The position of the
# first one marks where the locations we keep end.
value_list = [value for value in values if value < 0.01]

value_first_index = values.index(value_list[0])

counts = df.GeneralLocality.value_counts().values.tolist()[1:][:value_first_index]
locations = df.GeneralLocality.value_counts().index.tolist()[1:][:value_first_index]
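
The same filtering can be written with a boolean mask, which avoids index arithmetic. This is a sketch on made-up labels, where `"nan"` plays the role of rows without a locality; the threshold of 0.01 matches the one used above.

```python
import pandas as pd

# Made-up labels: the counts are chosen so that "D" falls below 1%.
labels = pd.Series(
    ["A"] * 120 + ["B"] * 60 + ["nan"] * 50 + ["C"] * 19 + ["D"] * 1
)

# Shares are computed over all rows, then the 'nan' label is dropped.
shares = labels.value_counts(normalize=True).drop("nan", errors="ignore")

# Keep only the categories whose share is at least 1% (0.01 as a fraction).
kept = shares[shares >= 0.01]
counts = labels.value_counts()[kept.index]

print(kept.index.tolist())
print(counts.tolist())
```

Dropping `"nan"` by name also removes the assumption that it is always the most frequent label.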

In [15]:
# UNCOMMENT THE CODE BELOW TO GENERATE
# PLOT ON PLOTLY

# fig = go.Figure(data=[go.Pie(labels=locations, values=counts)])
# fig.update_layout(
#         title = 'Coral Reef Observation Locations',
#     )
# py.plot(fig, filename = 'coral-reef-location-pie-chart', auto_open=True)

# if you wish to display the chart in the notebook
# comment the line above and uncomment below
# fig.show()

<img src="images/visualizations/Coral_Reef_Observation_Locations.png">

Looks like we have a couple of potential places like the [Davidson Seamount](https://en.wikipedia.org/wiki/Davidson_Seamount), Hawaii, Alaska, and even Florida. However, I wonder if the observations are well spread out. I'm trying to avoid places where they are restricted to a single seamount; in other words, where they are clustered in one small area.

Let's see how all of these places look on the map.

In [16]:
location_df = df[df.GeneralLocality.isin(locations)]
location_df.head()

Unnamed: 0,CatalogNumber,DataProvider,ScientificName,VernacularNameCategory,TaxonRank,Station,ObservationDate,latitude,longitude,DepthInMeters,...,Locality,LocationAccuracy,SurveyID,Repository,IdentificationQualifier,EventID,SamplingEquipment,RecordType,SampleID,GeneralLocality
1,625366.0,"NOAA, Deep Sea Coral Research & Technology Pro...",Madrepora oculata,stony coral (branching),species,D2-EX1504L3-05,2015-09-02,18.30817,-158.45392,959.0,...,"Hawaiian Archipelago, Swordfish Seamount",50m,Hohonu Moana: Exploring Deep Waters off Hawai'i,University of Hawaii,ID by expert from video,D2-EX1504L3-05,ROV,video observation,EX1504L3_05_20150901T181522Z.mp4_05:45:26:28,Hawaiian Archipelago
2,625373.0,"NOAA, Deep Sea Coral Research & Technology Pro...",Madrepora oculata,stony coral (branching),species,D2-EX1504L3-05,2015-09-01,18.30864,-158.45393,953.0,...,"Hawaiian Archipelago, Swordfish Seamount",50m,Hohonu Moana: Exploring Deep Waters off Hawai'i,University of Hawaii,ID by expert from video,D2-EX1504L3-05,ROV,video observation,EX1504L3_05_20150901T181522Z.mp4_05:24:35:53,Hawaiian Archipelago
3,625386.0,"NOAA, Deep Sea Coral Research & Technology Pro...",Madrepora oculata,stony coral (branching),species,D2-EX1504L3-05,2015-09-01,18.30877,-158.45384,955.0,...,"Hawaiian Archipelago, Swordfish Seamount",50m,Hohonu Moana: Exploring Deep Waters off Hawai'i,University of Hawaii,ID by expert from video,D2-EX1504L3-05,ROV,video observation,EX1504L3_05_20150901T181522Z.mp4_05:15:22:09,Hawaiian Archipelago
4,625382.0,"NOAA, Deep Sea Coral Research & Technology Pro...",Madrepora oculata,stony coral (branching),species,D2-EX1504L3-05,2015-09-01,18.30875,-158.45384,955.0,...,"Hawaiian Archipelago, Swordfish Seamount",50m,Hohonu Moana: Exploring Deep Waters off Hawai'i,University of Hawaii,ID by expert from video,D2-EX1504L3-05,ROV,video observation,EX1504L3_05_20150901T181522Z.mp4_05:13:29:50,Hawaiian Archipelago
5,625384.0,"NOAA, Deep Sea Coral Research & Technology Pro...",Madrepora oculata,stony coral (branching),species,D2-EX1504L3-05,2015-09-01,18.30902,-158.45425,968.0,...,"Hawaiian Archipelago, Swordfish Seamount",50m,Hohonu Moana: Exploring Deep Waters off Hawai'i,University of Hawaii,ID by expert from video,D2-EX1504L3-05,ROV,video observation,EX1504L3_05_20150901T181522Z.mp4_04:24:44:48,Hawaiian Archipelago


In [17]:
location_df.shape

(280658, 21)

In [18]:
# UNCOMMENT HERE IF YOU WISH TO DISPLAY
# THE PLOT DIRECTLY IN THE NOTEBOOK.

# fig = go.Figure(data=go.Scattergeo(
#         lon = location_df.longitude,
#         lat = location_df.latitude,
#         text = location_df.Locality,
#         mode = 'markers',
#         ))

# fig.update_layout(
#         title = 'Coral Reef Observations in North America',
#         geo_scope='north america',
#     )
# fig.show()

<img src="images/visualizations/Coral_Reef_Observations_in_North_America.png ">

Looks like most of the corals in Alaska, the Hawaiian Islands, and Florida are quite spread out. Actually, it makes sense for Florida to have quite a diverse ecosystem of corals, since it lies close to the Caribbean, home of the [Mesoamerican Reef](https://en.wikipedia.org/wiki/Mesoamerican_Barrier_Reef_System). But remembering our long-term goal, we need the location to be near the Mariana Trench. In other words, we want it to be somewhere in the middle or on the west side of the Pacific Ocean. Based on the map above, it looks like there's only Hawaii for now.
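
"Near the Mariana Trench" can be made a bit more concrete with a great-circle distance. The sketch below uses the haversine formula with approximate coordinates that I picked by hand: the Challenger Deep stands in for the trench, and one city stands in for each region. These reference points are assumptions for illustration, not values from the dataset.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two points given in degrees."""
    earth_radius_km = 6371.0  # mean Earth radius
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


# Approximate coordinates, chosen by hand for illustration.
challenger_deep = (11.37, 142.59)  # deepest part of the Mariana Trench
reference_points = {
    "Hawaii (Honolulu)": (21.31, -157.86),
    "Alaska (Anchorage)": (61.22, -149.90),
    "Florida (Miami)": (25.76, -80.19),
}

distances = {
    name: haversine_km(lat, lon, *challenger_deep)
    for name, (lat, lon) in reference_points.items()
}

# Print the candidates from nearest to farthest.
for name, km in sorted(distances.items(), key=lambda item: item[1]):
    print(f"{name}: {km:,.0f} km")
```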

Note that we still have some locations that have no names (NaN values). Let's plot them to make things clear.

In [21]:
nan_loc_df = df[df.GeneralLocality == 'nan']
nan_loc_df.shape

(123727, 21)

In [22]:
# UNCOMMENT HERE IF YOU WISH TO DISPLAY
# THE PLOT DIRECTLY IN THE NOTEBOOK.

# fig = go.Figure(data=go.Scattergeo(
#         lon = nan_loc_df.longitude,
#         lat = nan_loc_df.latitude,
#         text = nan_loc_df.Locality,
#         mode = 'markers',
#         ))

# fig.update_layout(
#         title = 'Coral Reef Observations in Unknown Locations',
#         geo_scope='world',
#     )
# fig.show()

<img src="images/visualizations/Coral_Reef_Observations_in_Unknown_Locations.png">

Wow, look at that, there are actually more corals around North America than expected. We can also see that it's quite dense just off the eastern coast of Australia, where the Great Barrier Reef is located. Again, although North America is quite dense with corals, it is still too far from the Mariana Trench. Based on the map, I think Hawaii is the potential candidate for now.

However, I think we are jumping to conclusions here; we still need to make sure that the location has a diverse ecosystem of corals. That way, it will be much more attractive for researchers who want to learn more about different kinds of corals. The diversity of corals is the topic of the next section, so let's get a move on.

### 2. How diverse are corals in certain areas of the world?

Like we talked about before, we want to find a location that has high diversity in corals. Why? Basically, we also want to attract different kinds of research to our resort.

With that said, let's see what we can do with our dataset to gain some insights for our second question. First of all, let's take a look at the percentage of each coral type. This tells us which corals are the most common to find.

In [19]:
df.VernacularNameCategory.value_counts(normalize=True)

gorgonian coral               0.277198
sponge (unspecified)          0.150168
sea pen                       0.134405
glass sponge                  0.107701
soft coral                    0.075441
demosponge                    0.074613
black coral                   0.050805
stony coral (branching)       0.048525
lace coral                    0.041760
stony coral (cup coral)       0.020507
stony coral (unspecified)     0.008535
gold coral                    0.005183
stoloniferan coral            0.002340
calcareous sponge             0.002011
scleromorph sponge            0.000528
other coral-like hydrozoan    0.000275
lithotelestid coral           0.000006
Name: VernacularNameCategory, dtype: float64

Now let's take a look again in the form of a pie chart.

In [20]:
# All coral types are kept for this chart, so no share threshold
# is applied and the full value_counts() result is used.
category_counts = df.VernacularNameCategory.value_counts().values.tolist()
category_names = df.VernacularNameCategory.value_counts().index.tolist()

In [21]:
# UNCOMMENT THE CODE BELOW TO GENERATE
# PLOT ON PLOTLY

# fig = go.Figure(data=[go.Pie(labels=category_names, values=category_counts)])

# fig.update_layout(
#         title = 'Coral Type Percentages',
#     )
# py.plot(fig, filename = 'coral-type-percentages', auto_open=True)

# fig.show()

<img src="images/visualizations/Coral_Type_Percenteges.png">

In this case, I included all the coral types instead of just including those that make up more than 1% of the whole dataset. 

Anyway, going back to the chart, it seems that gorgonian corals and sponges are quite common here, while stony corals and lace corals are much less common. There are some benefits to these corals in terms of medicine and tourism. For example, gorgonian corals have the potential to be used in developing new drugs. Also, rare corals are one of the main attractions for tourists, precisely because they are hard to find.

Now that we know which corals are rare and common, we can try to display them on the map to visually understand which locations have a diverse coral ecosystem. For starters, we need to make sure that we exclude any data points that have null values in the `VernacularNameCategory` column.

In [23]:
copy_df = df.copy()
copy_df = copy_df[copy_df.VernacularNameCategory.notnull()]
copy_df.isna().sum()

CatalogNumber                   0
DataProvider                    0
ScientificName                  0
VernacularNameCategory          0
TaxonRank                       2
Station                    259664
ObservationDate                 5
latitude                        0
longitude                       0
DepthInMeters                   0
DepthMethod                 16527
Locality                   123609
LocationAccuracy            28675
SurveyID                   206982
Repository                  16788
IdentificationQualifier     24739
EventID                     41225
SamplingEquipment           27484
RecordType                  12295
SampleID                   111078
GeneralLocality                 0
dtype: int64

Then, we need to create an additional column named `ColorNum` that assigns a color number to each coral type, so that the coordinates on the map can be colored by type.

In [24]:
coral_types = copy_df.VernacularNameCategory.value_counts().index.tolist()

color_dict = {coral_type: num+1 for num, coral_type in enumerate(coral_types)}
copy_df["ColorNum"] = [color_dict[coral] for coral in copy_df.VernacularNameCategory]
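
The list comprehension above works fine; as a side note, `Series.map` does the same lookup in one vectorized step. A small sketch on made-up rows:

```python
import pandas as pd

# Made-up rows for illustration.
sample = pd.DataFrame({
    "VernacularNameCategory": [
        "sea pen", "gorgonian coral", "sea pen", "black coral", "sea pen",
    ],
})

# Number the categories from most to least frequent, starting at 1.
coral_types = sample.VernacularNameCategory.value_counts().index.tolist()
color_dict = {coral_type: num + 1 for num, coral_type in enumerate(coral_types)}

# Series.map looks up every value in the dictionary.
sample["ColorNum"] = sample.VernacularNameCategory.map(color_dict)
print(sample)
```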

In [25]:
# UNCOMMENT THE CODE BELOW TO GENERATE
# PLOT ON PLOTLY

# fig = go.Figure()

# for coral_type, num, in color_dict.items():
#     coral_sample_df = copy_df[copy_df.VernacularNameCategory == coral_type]
    
#     fig.add_trace(go.Scattergeo(
#         lon = coral_sample_df.longitude,
#         lat = coral_sample_df.latitude,
#         text = coral_sample_df.GeneralLocality,
#         name = coral_type, 
#         mode = 'markers',
#         marker = dict(
#             color = num,
#             size = 4
#         ),
#     ))

# fig.update_layout(
#         title = 'Coral Type Diversity in The World',
#         geo_scope='world',
#         showlegend=True
#     )
# fig.show()

<img src="images/visualizations/Coral_Type_Diversity_in_The_World.png">

Wow, what a colorful map that is. As we can see, North America has a diverse set of coral ecosystems, specifically Alaska and the East and West coasts. I expected no less from the richness of corals in the Caribbean. An interesting area here is Southeast Asia (just above Australia). It looks like there are various types of corals there between the Philippines and Indonesia. This is actually where the [Coral Triangle](https://en.wikipedia.org/wiki/Coral_Triangle) is located. Let's take a closer look at the Asian region.

In [31]:
# UNCOMMENT THE CODE BELOW TO GENERATE
# PLOT ON PLOTLY

# fig = go.Figure()

# for coral_type, num, in color_dict.items():
#     coral_sample_df = copy_df[copy_df.VernacularNameCategory == coral_type]
    
#     fig.add_trace(go.Scattergeo(
#         lon = coral_sample_df.longitude,
#         lat = coral_sample_df.latitude,
#         text = coral_sample_df.GeneralLocality,
#         name = coral_type, 
#         mode = 'markers',
#         marker = dict(
#             color = num,
#             size = 4
#         ),
#     ))

# fig.update_layout(
#         title = 'Coral Type Diversity in Asia',
#         geo_scope='asia',
#         showlegend=True
#     )
# fig.show()

<img src="images/visualizations/Coral_Type_Diversity_in_Asia.png">

Looks like we've just found another potential location here other than Hawaii. While the density is not as high as in North America, the corals in the Coral Triangle are quite diverse. Furthermore, it's not too far from the Mariana Trench! While we're at it, let's check out Hawaii's coral diversity.

In [26]:
# UNCOMMENT THE CODE BELOW TO GENERATE
# PLOT ON PLOTLY

# fig = go.Figure()

# for coral_type, num, in color_dict.items():
#     coral_sample_df = copy_df[copy_df.VernacularNameCategory == coral_type]
    
#     fig.add_trace(go.Scattergeo(
#         lon = coral_sample_df.longitude,
#         lat = coral_sample_df.latitude,
#         text = coral_sample_df.GeneralLocality,
#         name = coral_type, 
#         mode = 'markers',
#         marker = dict(
#             color = num,
#             size = 4
#         ),
#     ))

# fig.update_layout(
#         title = 'Coral Type Diversity in North America',
#         geo_scope='north america',
#         showlegend=True
#     )
# fig.show()

<img src="images/visualizations/Coral_Type_Diversity_in_North_America.png">

Looks like Hawaii also has a rich collection of corals, and most of them are [demosponges](https://en.wikipedia.org/wiki/Demosponge). However, compared to South East Asia, Hawaii is still a bit too far from the Mariana Trench. Also, I think there are many more corals that we could explore around the Coral Triangle. I have made up my mind here: I think South East Asia is a much better choice.
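The three map cells above differ only in their title and `geo_scope`, so the trace-building loop could live in one helper. The sketch below is an assumption on my part, not the code that produced the figures: `build_geo_traces` is a hypothetical name, the toy coordinates are made up, and the traces are built as plain dicts so the example runs without plotly installed. As far as I know, `go.Figure(data=...)` accepts trace dicts with a `type` key.

```python
import pandas as pd

def build_geo_traces(df, color_dict, category_col="VernacularNameCategory"):
    """Return one scattergeo trace (as a plain dict) per category in color_dict."""
    traces = []
    for category, num in color_dict.items():
        subset = df[df[category_col] == category]
        traces.append({
            "type": "scattergeo",
            "lon": subset["longitude"].tolist(),
            "lat": subset["latitude"].tolist(),
            "text": subset["GeneralLocality"].tolist(),
            "name": category,
            "mode": "markers",
            "marker": {"color": num, "size": 4},
        })
    return traces

# Toy rows, invented for illustration only.
toy_df = pd.DataFrame({
    "VernacularNameCategory": ["stony coral", "sea pen", "stony coral"],
    "longitude": [120.5, -155.6, 125.1],
    "latitude": [10.2, 19.9, -2.3],
    "GeneralLocality": ["Philippines", "Hawaii", "Indonesia"],
})
traces = build_geo_traces(toy_df, {"stony coral": 1, "sea pen": 2})

# With plotly installed, a regional map would then be (not run here):
# fig = go.Figure(data=traces)
# fig.update_layout(title="Coral Type Diversity in Asia", geo_scope="asia", showlegend=True)
# fig.show()
```

Only the `title` and `geo_scope` arguments would change between the world, Asia, and North America maps.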

Now that we've got the place, let's see if we can find out what kind of equipment researchers use to observe the corals.

### What kind of instrument is needed for doing coral research?

The researchers and engineers coming to our research resort might need tools to prepare the equipment they bring for research, and there might be engineering challenges to overcome in order to reach certain locations in the deep sea. We need to prepare for this so that the researchers and engineers can focus on the task at hand. The technology developed at our research resort could also help us face the engineering challenges of Mariana Trench exploration.

Like the previous questions, let's take a look at the percentage of the sampling equipment used for coral observation. Note that when we took a glimpse of our data earlier, we found that some observations have a negative depth. We want to make sure those data points are not included, because we want to analyze the equipment based on depth.

In [29]:
df.SamplingEquipment.value_counts()

ROV                    326289
submersible             70268
trawl                   51899
towed camera            19626
longline                 9481
dredge                   2840
AUV                      2535
drop camera              1262
grab                      621
net                       504
corer                     212
SCUBA                     174
multiple gears             86
trap                       41
other                      20
hook and line              12
pot                         5
Cp                          2
GMST                        1
South Pacific Ocean         1
trawl-otter                 1
Jsl-I-3905                  1
GMT                         1
camera - drop               1
Name: SamplingEquipment, dtype: int64

In [30]:
depth_df = df[df.DepthInMeters > 0]

print(f"Number of observations: {df.shape[0]}")

print(f"Number of observations after correcting the depth: {depth_df.shape[0]}")

no_record_percentage = round((df.shape[0] - depth_df.shape[0]) / df.shape[0] * 100, 2)

print(f"Percentage of observations without recorded depth: {no_record_percentage}%")

Number of observations: 513372
Number of observations after correcting the depth: 508991
Percentage of observations without recorded depth: 0.85%


Would you look at that, they actually make up only around 0.85% of the whole dataset. That means we can still get significant insights here.
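The share of dropped rows can also be read straight off the boolean mask, since the mean of a boolean Series is the fraction of `True` values. A minimal sketch on made-up depths (the values in `toy_df` are not from the dataset):

```python
import pandas as pd

# Toy depths: two non-positive values and one missing value, invented for illustration.
toy_df = pd.DataFrame(
    {"DepthInMeters": [120.0, -5.0, 35.0, 0.0, float("nan"), 2400.0, 810.0, 56.0]}
)

# The comparison is False for negatives, zeros, and missing values alike.
valid_depth = toy_df["DepthInMeters"] > 0

# Mean of the inverted mask = fraction of rows that the filter drops.
dropped_percent = round((~valid_depth).mean() * 100, 2)
print(f"Dropped: {dropped_percent}%")
```

Note that a filter like `DepthInMeters > 0` also removes rows whose depth is missing, not only the negative ones.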

With that out of the way, let's look at how often each type of sampling equipment is used.

In [31]:
values = depth_df.SamplingEquipment.value_counts(normalize=True).values.tolist()[1:]

value_list = [value for value in values if value < 0.01]

value_first_index = values.index(value_list[0])

counts = depth_df.SamplingEquipment.value_counts().values.tolist()[:value_first_index]
devices = depth_df.SamplingEquipment.value_counts().index.tolist()[:value_first_index]
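For comparison, here is a minimal sketch of the same "keep only the common categories" idea using a boolean filter on the normalized counts. The toy Series and the `min_share` name are my own assumptions for illustration.

```python
import pandas as pd

# Toy equipment column: 700 ROV, 200 submersible, 95 trawl, 5 grab (invented numbers).
equipment = pd.Series(
    ["ROV"] * 700 + ["submersible"] * 200 + ["trawl"] * 95 + ["grab"] * 5,
    name="SamplingEquipment",
)

min_share = 0.01  # keep categories with at least 1% of the observations

shares = equipment.value_counts(normalize=True)
major = shares[shares >= min_share]

devices = major.index.tolist()
counts = equipment.value_counts().loc[devices].tolist()
print(devices, counts)
```

One difference worth knowing: the cell above finds the cut-off position in a list that skips the first entry, so it ends up keeping one category fewer than this filter does. On this toy data it would keep only `ROV` and `submersible`.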

In [32]:
# fig = go.Figure(data=[go.Pie(labels=devices, values=counts)])
# fig.update_layout(
#         title = 'Uses of Sampling Equipments Percentage',
#     )
# py.plot(fig, filename = 'coral-reef-location-pie-chart', auto_open=True)

# if you wish to display the chart in the notebook
# comment the line above and uncomment below
# fig.show()

<img src="images/visualizations/Uses_of_Sampling_Equipments_Percentage.png">

Wow, ROVs are actually quite popular, followed by submersibles. I wonder why trawls are still common since they can cause some harm to the environment. 

I wonder why AUVs (not included in the chart because they account for less than 1% of the observations) are not that popular. There seems to be a lot of talk about using autonomous drones these days.

Interesting. I wonder about the depth of the corals here.

In [33]:
import plotly.figure_factory as ff

In [34]:
# UNCOMMENT THE CODE BELOW TO GENERATE
# PLOT ON PLOTLY

# fig = ff.create_distplot([depth_df.DepthInMeters.values.tolist()], ["DepthInMeters"])

# fig.update_layout(
#         title = 'Distribution of Coral Depth (in Meters)',
#     )
# fig.show()

<img src="images/visualizations/Distribution_of_Coral_Depth.png">

Interesting, look at how deep these corals are. There are even corals that live 6000 meters under the ocean surface. Does that mean we have ROVs that go that deep?

In [35]:
rov_df = depth_df[depth_df.SamplingEquipment == "ROV"]
rov_df.shape

(325751, 21)

In [38]:
# UNCOMMENT THE CODE BELOW TO GENERATE
# PLOT ON PLOTLY

# fig = ff.create_distplot([rov_df.DepthInMeters.values.tolist()], ["DepthInMeters"])

# fig.update_layout(
#         title = 'Distribution of Coral Depth using ROV (in Meters)',
#     )
# fig.show()

<img src="images/visualizations/Distribution_of_Coral_Depth_using_ROV.png">

Looks like there are no ROVs that go that deep. Then which device does?
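One quick way to answer this kind of question without a plot is a per-device summary table. The sketch below uses made-up depths, so the numbers say nothing about the real dataset. It only shows the `groupby` pattern.

```python
import pandas as pd

# Toy observations; devices and depths are invented for illustration.
toy_df = pd.DataFrame({
    "SamplingEquipment": ["ROV", "ROV", "trawl", "trawl", "AUV", "submersible"],
    "DepthInMeters": [850.0, 3900.0, 1200.0, 6100.0, 640.0, 1500.0],
})

# Count, median, and maximum depth per device, deepest device first.
depth_summary = (
    toy_df.groupby("SamplingEquipment")["DepthInMeters"]
    .agg(["count", "median", "max"])
    .sort_values("max", ascending=False)
)
print(depth_summary)

deepest_device = depth_summary.index[0]
```

On the real data, the same call on `depth_df` would list the maximum recorded depth for each sampling device.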

In [36]:
device_names = depth_df.SamplingEquipment.value_counts().index[:8].tolist()
device_df = depth_df[depth_df.SamplingEquipment.isin(device_names)]
device_df.shape

(479978, 21)

In [37]:
depth_data = []

for device in device_names:
    device_depth = device_df[device_df.SamplingEquipment == device].DepthInMeters.values.tolist()
    depth_data.append(device_depth)

In [38]:
# fig = ff.create_distplot(depth_data, device_names)

# fig.update_layout(
#         title = 'Distribution of Coral Depth using Most Equipments (in Meters)',
#     )
# fig.show()

<img src="images/visualizations/Distribution_of_Coral_Depth_using_Most_Equipments.png">

I see now. For corals at depths of over 5000 meters, most observations use trawls instead. It sort of makes sense to me, since [the deeper you go under the ocean, the more the water pressure increases, and it will most likely crush any object that goes too deep](https://oceanservice.noaa.gov/facts/pressure.html).
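To get a feel for the numbers, here is a rough back-of-the-envelope sketch of the hydrostatic pressure formula, pressure = surface pressure + density × gravity × depth. The constant seawater density of 1025 kg/m³ is an assumption (real density changes with depth, temperature, and salinity), so treat the output as an approximation only.

```python
ATM_IN_PASCAL = 101_325.0  # one standard atmosphere

def absolute_pressure_atm(depth_m, density=1025.0, gravity=9.81):
    """Approximate absolute pressure (in atm) at a given depth, assuming constant density."""
    pressure_pa = ATM_IN_PASCAL + density * gravity * depth_m
    return pressure_pa / ATM_IN_PASCAL

for depth in (0, 1000, 5000, 6000):
    print(f"{depth:>5} m: about {absolute_pressure_atm(depth):.0f} atm")
```

This agrees with the common rule of thumb of roughly one extra atmosphere for every 10 meters of water, which puts equipment at 5000 meters under about 500 times the pressure at the surface.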

Having ROVs that go over 5000 meters under the ocean is quite impressive. I can also see why AUVs are not that popular here: they only go down to nearly 1000 meters. But I'm curious about the use of this equipment over time. Better take a look.

For this purpose, we are going to use [plotly's bar chart](https://plot.ly/python/bar-charts/).

What we need to do first is extract the year from the date each coral was observed, and then store it in a new column named `ObservationYear`.

In [39]:
# Get the years of each datapoint
year_list = device_df.ObservationDate.values.astype('datetime64[s]').tolist()
obsrv_years = [date.year for date in year_list]

# Last recorded year
last_recorded_year = sorted(list(set(obsrv_years)))[-1] - 1 # from 2015

# Year range
n_years_past = 26
wanted_years = list(range(last_recorded_year + 1  - n_years_past, last_recorded_year + 1))

# Create a new column called ObservationYear
device_df['ObservationYear'] = obsrv_years


# Get devices with the desired year range
date_obs_df = device_df[device_df.ObservationYear.isin(wanted_years)]
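An alternative way to get the year is `pd.to_datetime` followed by the `.dt.year` accessor. The sketch below is an assumption-laden example: the toy dates, the `%Y-%m-%d` format, and the `dated_df` name are mine, and the real `ObservationDate` column may be formatted differently.

```python
import pandas as pd

# Toy rows with one missing and one malformed date, invented for illustration.
toy_df = pd.DataFrame({
    "ObservationDate": ["2014-06-01", "2015-03-12", None, "1990-11-30", "not a date"],
    "SamplingEquipment": ["ROV", "towed camera", "ROV", "trawl", "trawl"],
})

# errors="coerce" turns anything that cannot be parsed into NaT instead of raising.
dates = pd.to_datetime(toy_df["ObservationDate"], format="%Y-%m-%d", errors="coerce")
valid = dates.notna()

# .copy() gives an independent DataFrame, so adding a column does not
# trigger pandas' SettingWithCopyWarning.
dated_df = toy_df[valid].copy()
dated_df["ObservationYear"] = dates[valid].dt.year
print(dated_df)
```

Rows with missing or unparseable dates are dropped before the year column is added, so the new column stays integer-valued.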

In [40]:
# fig = go.Figure()

# all_values = []

# for device in device_names:
#     wanted_year_df = date_obs_df[date_obs_df.SamplingEquipment == device]
#     wanted_devices_df = wanted_year_df[wanted_year_df.ObservationYear.isin(wanted_years)]
#     year_freq_dict = wanted_year_df.ObservationYear.value_counts().to_dict()
    
#     value_list = []
    
#     for year in wanted_years:
        
#         if year in year_freq_dict.keys():
#             value_list.append(year_freq_dict[year])
#         else:
#             value_list.append(0)
#     all_values.append(value_list)
    
# for i in range(len(device_names)):
#     fig.add_trace(go.Bar(x=wanted_years, y=all_values[i], name=device_names[i]))
    
# fig.update_layout(
#     barmode='stack', 
#     xaxis={'categoryorder':'category ascending'},
#     title = f'Uses of Most Common Equipments from {str(wanted_years[0])}-{str(wanted_years[-1])}'
# )    
# fig.show()

<img src="images/visualizations/Uses_of_Most_Common_Equipments.png">

Okay, there is some stuff we need to talk about here.

First of all, it seems that ROVs have actually been the preferred choice for observing deep-sea corals over the years. However, I'm interested to know why there is a sudden spike in the use of ROVs in the year 2000, followed by a downturn the year after. This is also the case for submersibles in 2005 and 2006. Could there be a trend here?

Second, we can see that the use of trawls is actually quite steady. It seems there are times when certain corals are just too deep for ROVs or submersibles to reach.

Third, I'm surprised that AUVs are rarely used. It sort of makes sense, since over 80% of the ocean is unmapped. Maybe this is one of the engineering challenges that needs to be explored further.

Finally, it looks like towed cameras have only come into use after 2010. But we also have a spike in 2014, when towed cameras were used much more than ROVs.

Wow, that was really interesting. We ended up with more questions than answers, but we got what we need. Looks like we will need to support ROVs, submersibles, towed cameras, and some trawls. I'm also going to include AUVs, because I think there is potential for them to be developed further. Maybe as we gain more scientists and engineers at our research resort, we can find a way to make AUVs more mainstream. If we have autonomous vehicles on land and in the air, why not in the ocean too, right?
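As an aside, the per-year counts that the bar chart cell builds with nested loops can also be produced with `pd.crosstab`. This is a minimal sketch on invented rows, not the code behind the figure.

```python
import pandas as pd

# Toy observations; years and devices are invented for illustration.
toy_df = pd.DataFrame({
    "ObservationYear": [2012, 2012, 2013, 2015, 2015, 2015],
    "SamplingEquipment": ["ROV", "trawl", "ROV", "ROV", "ROV", "towed camera"],
})
wanted_years = list(range(2012, 2016))

# Rows are years, columns are devices; reindex adds years with no observations as zeros.
year_counts = (
    pd.crosstab(toy_df["ObservationYear"], toy_df["SamplingEquipment"])
    .reindex(wanted_years, fill_value=0)
)
print(year_counts)

# Each column could then feed one bar trace, e.g. (not run here):
# go.Bar(x=wanted_years, y=year_counts["ROV"].tolist(), name="ROV")
```

The `reindex` step plays the role of the `else: value_list.append(0)` branch in the loop version.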

And now we come to the last of our questions: finding partners.

### Which institution/organization would be willing to be partners?

We've got our location and we know what kind of equipment we will support; now we just need to find which organizations/institutions would be willing to partner with us. Some of the organizations are just divisions of the NOAA with different focuses, while others are independent institutes, universities, and even single individuals.

As always, let's check out the percentage of organizations/institutions that are doing coral-related research, without the pie chart this time.

In [41]:
df.DataProvider.value_counts(normalize=True)

Monterey Bay Aquarium Research Institute                                                           0.380773
NOAA, Alaska Fisheries Science Center                                                              0.144879
NOAA, Southwest Fisheries Science Center, Santa Cruz                                               0.084882
NOAA, Olympic Coast National Marine Sanctuary                                                      0.070668
Hawaii Undersea Research Laboratory                                                                0.068373
Smithsonian Institution, National Museum of Natural History                                        0.046446
NOAA, Office of Ocean Exploration and Research                                                     0.033880
Bureau of Ocean Energy Management                                                                  0.025297
Temple University                                                                                  0.021275
NOAA, Southwest Fisheries Sc

Look at that, we're seeing potential already. It seems that the Monterey Bay Aquarium Research Institute has the most active research activity here, with over 38% of the whole dataset. But I think most of that research is actually on corals discovered at the Davidson Seamount, since it is just southwest of Monterey.

Since we know where we want to put our research resort, we want to know which institutions are doing research around South East Asia or, at least, around the Pacific Ocean.

The code below lists all the organizations/institutions that have contributed at least about 1% of all the coral research activities, enumerates them to color the coordinates on the map, and plots the locations on the map.

In [42]:
values = df.DataProvider.value_counts(normalize=True).values.tolist()[1:]

value_list = [value for value in values if value < 0.01]

value_first_index = values.index(value_list[0])

counts = df.DataProvider.value_counts().values.tolist()[:value_first_index]
organizations = df.DataProvider.value_counts().index.tolist()[:value_first_index]

In [43]:
org_df = df[df.DataProvider.isin(organizations)]

orgs = org_df.DataProvider.value_counts().index.tolist()

color_dict = {org: num+1 for num, org in enumerate(orgs)}
org_df["ColorNum"] = [color_dict[org] for org in org_df.DataProvider]

In [44]:
color_dict

{'Monterey Bay Aquarium Research Institute': 1,
 'NOAA, Alaska Fisheries Science Center': 2,
 'NOAA, Southwest Fisheries Science Center, Santa Cruz': 3,
 'NOAA, Olympic Coast National Marine Sanctuary': 4,
 'Hawaii Undersea Research Laboratory': 5,
 'Smithsonian Institution, National Museum of Natural History': 6,
 'NOAA, Office of Ocean Exploration and Research': 7,
 'Bureau of Ocean Energy Management': 8,
 'Temple University': 9,
 'NOAA, Southwest Fisheries Science Center, La Jolla': 10,
 'Harbor Branch Oceanographic Institute': 11,
 'NOAA, Northwest Fisheries Science Center': 12,
 'NOAA, Deep Sea Coral Research & Technology Program and Office of Ocean Exploration and Research': 13,
 'NOAA, Channel Islands National Marine Sanctuary': 14}

In [50]:
# # UNCOMMENT THE CODE BELOW TO GENERATE
# # PLOT ON PLOTLY

# fig = go.Figure()

# initials = {}

# for org, num, in color_dict.items():
#     data_prov_df = org_df[org_df.DataProvider == org]
    
#     if len(org.split()) > 2:
#         if "NOAA" in org:
#             name = org.replace("NOOA,", "")
#             words = name.split()
#             first_chars = [word[0] for word in words]
#             new_name = "NOOA, " + "".join(first_chars)
#         else:
#             words = org.split()
#             first_chars = [word[0] for word in words]
#             new_name = "".join(first_chars)
#     else:
#         new_name = org
        
#     if new_name != org:
#         initials[new_name] = org
    
#     fig.add_trace(go.Scattergeo(
#         lon = data_prov_df.longitude,
#         lat = data_prov_df.latitude,
#         text = data_prov_df.DataProvider,
#         name = new_name, 
#         mode = 'markers',
#         marker = dict(
#             color = num,
#             size = 4
#         ),
#     ))

# fig.update_layout(
#         title = 'Organization Coral Research Activities in The World',
#         geo_scope='world',
#         showlegend=True
#     )
# fig.show()

<img src="images/visualizations/Organization_Coral_Research_Activities_in_The_World.png">

**Abbreviation Meanings**

'MBARI': 'Monterey Bay Aquarium Research Institute'

'NOOA, NAFSC': 'NOAA, Alaska Fisheries Science Center'

'NOOA, NSFSCSC': 'NOAA, Southwest Fisheries Science Center, Santa Cruz'

'NOOA, NOCNMS': 'NOAA, Olympic Coast National Marine Sanctuary'

'HURL': 'Hawaii Undersea Research Laboratory'

'SINMoNH': 'Smithsonian Institution, National Museum of Natural History'

'NOOA, NOoOEaR': 'NOAA, Office of Ocean Exploration and Research'

'BoOEM': 'Bureau of Ocean Energy Management'

'NOOA, NSFSCLJ': 'NOAA, Southwest Fisheries Science Center, La Jolla'

'HBOI': 'Harbor Branch Oceanographic Institute'

'NOOA, NNFSC': 'NOAA, Northwest Fisheries Science Center'

'NOOA, NDSCR&TPaOoOEaR': 'NOAA, Deep Sea Coral Research & Technology Program and Office of Ocean Exploration and Research'

'NOOA, NCINMS': 'NOAA, Channel Islands National Marine Sanctuary'

Note that I shortened the names of the organizations/institutions because some of them are really long, and if we plot them in the legend as they are, they will cover up the whole map.
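The shortening logic could also be pulled out into a small helper. The sketch below is a hypothetical variant (`abbreviate` is my own name): it spells the prefix `NOAA` and strips it before taking initials, so its labels differ slightly from the ones in the legend above (for example `NOAA, AFSC` rather than `NOOA, NAFSC`).

```python
def abbreviate(org_name, prefix="NOAA,"):
    """Shorten a long organization name to its initials, keeping a NOAA prefix readable."""
    words = org_name.split()
    if len(words) <= 2:
        # Short names such as "Temple University" are left untouched.
        return org_name
    if org_name.startswith(prefix):
        # Drop the prefix first so its "N" does not end up in the initials.
        rest = org_name[len(prefix):].split()
        return "NOAA, " + "".join(word[0] for word in rest)
    return "".join(word[0] for word in words)

for name in [
    "Monterey Bay Aquarium Research Institute",
    "NOAA, Alaska Fisheries Science Center",
    "Temple University",
]:
    print(f"{abbreviate(name)!r}: {name!r}")
```

A single helper like this would replace the three copies of the same `if`/`else` block in the plotting cells.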

Going back to the actual map, we can see that the only organization/institution that has done research around South East Asia is the Smithsonian Institution. However, we do have some who are doing research in Hawaii: the Southwest Fisheries Science Center, Santa Cruz and the Office of Ocean Exploration and Research, both of which are part of the NOAA.

Hmm, but have they been active in recent years? We want to make sure our partners are active in their coral research. If not, they probably have other priorities.

In [45]:
org_names = org_df.DataProvider.value_counts().index.tolist()
copy_df = df[df.ObservationDate.notnull()].copy()

new_org_df = copy_df[copy_df.DataProvider.isin(org_names)]

In [46]:
# Get the years of each datapoint
year_list = new_org_df.ObservationDate.values.astype('datetime64[s]').tolist()
obsrv_years = [date.year for date in year_list]

# Last recorded year
last_recorded_year = sorted(list(set(obsrv_years)))[-1] - 1 # from 2015

# Year range
n_years_past = 26
wanted_years = list(range(last_recorded_year + 1  - n_years_past, last_recorded_year + 1))

# Create a new column called ObservationYear
new_org_df['ObservationYear'] = obsrv_years


# Get organizations with the desired year range
date_prov_df = new_org_df[new_org_df.ObservationYear.isin(wanted_years)]

In [47]:
# fig = go.Figure()

# all_values = []

# for org in org_names:
#     wanted_year_df = date_prov_df[date_prov_df.DataProvider == org]
#     wanted_orgs_df = wanted_year_df[wanted_year_df.ObservationYear.isin(wanted_years)]
#     year_freq_dict = wanted_orgs_df.ObservationYear.value_counts().to_dict()
    
#     value_list = []
    
#     for year in wanted_years:
        
#         if year in year_freq_dict.keys():
#             value_list.append(year_freq_dict[year])
#         else:
#             value_list.append(0)
#     all_values.append(value_list)
    
# for i in range(len(org_names)):
    
#     if len(org_names[i].split()) > 2:
        
#         if "NOAA" in org_names[i]:
#             name = org_names[i].replace("NOOA,", "")
#             words = name.split()
#             first_chars = [word[0] for word in words]
#             new_name = "NOOA, " + "".join(first_chars)
#         else:
#             words = org_names[i].split()
#             first_chars = [word[0] for word in words]
#             new_name = "".join(first_chars)
#     else:
#         new_name = org_names[i]
    
#     fig.add_trace(go.Bar(x=wanted_years, y=all_values[i], name=new_name))
    
# fig.update_layout(
#     barmode='stack', 
#     xaxis={'categoryorder':'category ascending'},
#     title = f'Organization Coral Research Activities from {str(wanted_years[0])}-{str(wanted_years[-1])}'
# )    
# fig.show()

<img src="images/visualizations/Organization_Coral_Research_Activities.png">

Alright, we know for sure that the Southwest Fisheries Science Center and the Office of Ocean Exploration and Research have been quite active in recent years. Both of them did a number of coral research activities in 2014 and 2015. The Smithsonian Institution, however, has not been active lately. In fact, its most active year of coral research was 2005.

Well, I think we already know which of them are going to be our potential partners. The Southwest Fisheries Science Center and the Office of Ocean Exploration and Research are definitely in. I'm also going to add the Smithsonian Institution anyway, because although they have not been active in coral research lately, they do have some knowledge about doing coral research in South East Asia. Having them on board will surely help the researchers and engineers at our research resort explore corals in the Coral Triangle.

For our final analysis, I think it's a good idea to check whether these organizations/partners are using the equipment that we support.

Before that, however, I want to mention one interesting insight. We can see that in the year 2000 there is a spike in the number of coral research activities done by the Monterey Bay Aquarium Research Institute. If you remember, that is the same year there was a spike in ROV use. What's going on here? A question to be answered another time.

In [48]:
wanted_partners = [
    'Smithsonian Institution, National Museum of Natural History',
    'NOAA, Office of Ocean Exploration and Research',
    'NOAA, Southwest Fisheries Science Center, Santa Cruz'
]

wanted_devices = [
    'ROV',
    'submersible',
    'towed camera',
    'AUV',
    'trawl'
]

partners_df = new_org_df[new_org_df.DataProvider.isin(wanted_partners)]

In [49]:
# fig = go.Figure()

# all_values = []

# for partner in wanted_partners:
    
#     # Get data that match the following partner
#     wanted_df = partners_df[partners_df.DataProvider == partner]
    
#     # Get data which sampling equipment is desired
#     wanted_df = wanted_df[wanted_df.SamplingEquipment.isin(wanted_devices)]
    
#     # Dictionary of devices
#     devices_dict = wanted_df.SamplingEquipment.value_counts().to_dict()
    
#     value_list = []
    
#     for device in wanted_devices:
        
#         if device in devices_dict.keys():
#             value_list.append(devices_dict[device])
#         else:
#             value_list.append(0)
#     all_values.append(value_list)

# for i in range(len(wanted_partners)):
#     if len(wanted_partners[i].split()) > 2:
        
#         if "NOAA" in wanted_partners[i]:
#             name = wanted_partners[i].replace("NOOA,", "")
#             words = name.split()
#             first_chars = [word[0] for word in words]
#             new_name = "NOOA, " + "".join(first_chars)
#         else:
#             words = wanted_partners[i].split()
#             first_chars = [word[0] for word in words]
#             new_name = "".join(first_chars)
#     else:
#         new_name = wanted_partners[i]
#     fig.add_trace(go.Bar(x=wanted_devices, y=all_values[i], name=new_name))
    
# fig.update_layout(
#     title = "Use of Common Sampling Equipments by Potential Partners"
# )    
# fig.show()

<img src="images/visualizations/Uses_of_Common_Sampling_Equipments_by_Potential_Partners.png">

Everything looks good. All of them seem to be actively using the equipment we support.

However, it seems that the Smithsonian Institution prefers trawls over the other equipment. I can only think of two possibilities here: either the trawls were used in past research activities, when ROVs or submersibles were too expensive to use, or they need to collect real samples since they also run a museum. Maybe both? Again, a question worth answering another time.
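For readers who prefer a table over a chart, the partner-by-equipment counts can be sketched with `groupby` and `unstack`. The partner names `A`, `B`, `C` and all rows below are placeholders I made up, not values from the dataset.

```python
import pandas as pd

# Toy rows; providers and devices are placeholders for illustration.
toy_df = pd.DataFrame({
    "DataProvider": ["A", "A", "A", "B", "B", "C"],
    "SamplingEquipment": ["ROV", "ROV", "trawl", "trawl", "grab", "AUV"],
})
wanted_partners = ["A", "B"]
wanted_devices = ["ROV", "trawl", "AUV"]

# Keep only the partners and devices we care about.
mask = (
    toy_df["DataProvider"].isin(wanted_partners)
    & toy_df["SamplingEquipment"].isin(wanted_devices)
)

# One row per partner, one column per device; missing combinations become 0.
usage = (
    toy_df[mask]
    .groupby(["DataProvider", "SamplingEquipment"])
    .size()
    .unstack(fill_value=0)
    .reindex(index=wanted_partners, columns=wanted_devices, fill_value=0)
)
print(usage)
```

Each row of `usage` corresponds to one `value_list` from the loop in the cell above, in the same device order.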

## Conclusion


In this notebook, we took a look at the deep-sea coral dataset provided by the NOAA for the purpose of building a coastal research resort, which has a long-term goal of supporting Mariana Trench related missions and a short-term goal of supporting coral research activities.

In order to do this, we asked four questions to determine the location of the research resort, the kind of equipment to support, and the potential partners.

We decided that the location of our research resort will be South East Asia, where the Coral Triangle is located and which is not too far from the Mariana Trench. For the equipment, we want to support five types: ROVs, submersibles, towed cameras, trawls, and AUVs. Finally, for our partners, we want to invite the Smithsonian Institution, the Office of Ocean Exploration and Research, and the Southwest Fisheries Science Center.