# Deep Sea Corals
## Coral Records from NOAA’s Deep-Sea Coral Research and Technology Program

<table>
<tr>
    <td> <img src="images/NOAA_Flag.svg" alt="NOAA flag" style="height:300px"> </td>
    <td> <img src="images/q-u-i-0G01UI1MQhg-unsplash.jpg" alt="Photo by Q.U.I on Unsplash" style="height:300px"> </td>
</tr>
</table>

### Context

This dataset contains records of deep-sea corals and sponges collected by NOAA and NOAA’s partners, including the geographic location of each observation. It is tailored to occurrences of azooxanthellate corals, a subset of all corals that do not host symbiotic algae (zooxanthellae), together with all sponge species. The records consist only of observations deeper than 50 meters, so that the data truly focus on deep-sea corals and sponges.

### Content

Column descriptions:

- CatalogNumber: Unique record identifier assigned by the Deep-Sea Coral Research and Technology Program.
- DataProvider: The institution, publication, or individual who ultimately deserves credit for acquiring or aggregating the data and making it available.
- ScientificName: Taxonomic identification of the sample as a Latin binomial.
- VernacularNameCategory: Common (vernacular) name category of the organism.
- TaxonRank: Identifies the level in the taxonomic hierarchy of the ScientificName term.
- ObservationDate: Date on which the sample or observation occurred (UTC).
- Latitude (degrees North): Latitude in decimal degrees where the sample or observation was collected.
- Longitude (degrees East): Longitude in decimal degrees where the sample or observation was collected.
- DepthInMeters: Best single depth value for sample as a positive value in meters.
- DepthMethod: Method by which best singular depth in meters (DepthInMeters) was determined. "Averaged" when start and stop depths were averaged. "Assigned" when depth was derived from bathymetry at the location. "Reported" when depth was reported based on instrumentation or described in literature.
- Locality: A specific named place or named feature of origin for the specimen or observation (e.g., Dixon Entrance, Diaphus Bank, or Sur Ridge). Multiple locality names can be separated by a semicolon, arranged in a list from largest to smallest area (e.g., Gulf of Mexico; West Florida Shelf, Pulley Ridge).
- IdentificationQualifier: Taxonomic identification method and level of expertise. Examples: “genetic ID”; “morphological ID from sample by taxonomic expert”; “ID by expert from image”; “ID by non-expert from video”; etc.
- SamplingEquipment: Method of data collection. Examples: ROV, submersible, towed camera, SCUBA, etc.
- RecordType: Denotes the origin and type of record: published literature ("literature"), a collected specimen ("specimen"), an observation from a still image ("still image"), an observation from video ("video observation"), a notation without a specimen or image ("notation"), or an observation from trawl surveys, longline surveys, and/or observer records ("catch record").
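
The Locality description above implies a small hierarchy inside a single string. As a minimal sketch, assuming that both `;` and `,` act as separators (as in the Gulf of Mexico example), a hypothetical helper `split_locality` could break a value into its place names, largest area first:

```python
import re

def split_locality(locality):
    """Split a Locality string into place names, largest area first.

    Hypothetical helper: treats both ';' and ',' as separators, which is an
    assumption based on the example in the column description.
    """
    parts = re.split(r"[;,]", locality)
    # Drop surrounding whitespace and any empty fragments
    return [part.strip() for part in parts if part.strip()]

print(split_locality("Gulf of Mexico; West Florida Shelf, Pulley Ridge"))
```

A value with a single place name, such as "Sur Ridge", comes back as a one-element list.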

## Business Understanding

**Main Goal**: Creating a coastal research resort for marine science.

**Guiding Questions**: 

1. Which part of the world has the most coral research activities?
2. How diverse are coral reefs in certain areas?
3. What kind of instrument is needed for doing coral research?
4. Which institution/organization would be willing to be partners?
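
Questions 3 and 4 map naturally onto the SamplingEquipment and DataProvider columns. The sketch below shows one possible way to approach them with simple frequency counts. It runs on a small made-up table (`toy`), not on the real dataset, so the provider names and numbers are purely illustrative.

```python
import pandas as pd

# Toy records standing in for the real dataset; all values are made up
toy = pd.DataFrame({
    "SamplingEquipment": ["ROV", "ROV", "towed camera", "submersible", "ROV"],
    "DataProvider": ["Provider A", "Provider B", "Provider A", "Provider A", "Provider C"],
})

# Question 3: which sampling equipment appears most often?
equipment_counts = toy["SamplingEquipment"].value_counts()

# Question 4: which data providers contribute the most records?
provider_counts = toy["DataProvider"].value_counts()

print(equipment_counts.head())
print(provider_counts.head())
```

Record counts are only a rough proxy: a provider with many records is not necessarily open to a partnership, so the result would be a starting list rather than an answer.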

## Data Understanding

In [None]:
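import os
import numpy as np
import pandas as pd
import chart_studio
import chart_studio.plotly as py
import plotly.graph_objects as go
# Read Chart Studio credentials from the environment instead of hard-coding them.
# CHART_STUDIO_USERNAME and CHART_STUDIO_API_KEY are assumed variable names.
chart_studio.tools.set_credentials_file(
    username=os.environ['CHART_STUDIO_USERNAME'],
    api_key=os.environ['CHART_STUDIO_API_KEY'],
)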

### Load Data

In [None]:
df = pd.read_csv("deep_sea_corals.csv")
# Drop the first data row, which presumably holds units or metadata rather than an observation
df = df.iloc[1:]

### Explore Data

In [None]:
df.head()

In [None]:
df.shape

In [None]:
df.columns

In [None]:
df.info()

In [None]:
df['longitude'] = pd.to_numeric(df['longitude'])
df['latitude'] = pd.to_numeric(df['latitude'])
df['ObservationDate'] = pd.to_datetime(df['ObservationDate'])

In [None]:
df.info()

In [None]:
df.describe()

In [None]:
df[df.DepthInMeters < 0].shape
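
The Context section states that depths are positive and deeper than 50 meters, so the check above can be extended into a small sanity summary. This is a minimal sketch on made-up values. It assumes the depth column may have been read as text, hence the `pd.to_numeric(..., errors="coerce")` step.

```python
import pandas as pd

# Made-up depth values, stored as strings to mimic a column read as text
toy = pd.DataFrame({"DepthInMeters": ["120", "-999", "35", "2400", None]})

# Convert to numbers; anything unparseable or missing becomes NaN
depth = pd.to_numeric(toy["DepthInMeters"], errors="coerce")

summary = {
    "negative": int((depth < 0).sum()),
    "shallower_than_50m": int(((depth >= 0) & (depth < 50)).sum()),
    "missing": int(depth.isna().sum()),
}
print(summary)
```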

In [None]:
df.isna().sum()

In [None]:
df.Repository.isna().sum()
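
Raw missing-value counts are easier to compare as shares of all rows. A minimal sketch, using a made-up table rather than the real data:

```python
import numpy as np
import pandas as pd

# Made-up table with gaps in every column
toy = pd.DataFrame({
    "Locality": ["Sur Ridge", None, None, "Diaphus Bank"],
    "Repository": [None, None, None, "Museum X"],
    "DepthInMeters": [120.0, 300.0, np.nan, 85.0],
})

# Fraction of missing values per column, worst column first
missing_share = toy.isna().mean().sort_values(ascending=False)
print(missing_share)
```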

### 1. Which part of the world has the most coral research activities?

In [None]:
def general_location(location):
    """Return the broadest place name in a Locality string.

    Names are listed from largest to smallest area, separated by ';' or ','.
    """
    if ";" in location:
        return location.split(";")[0]
    if "," in location:
        return location.split(",")[0]
    return location

In [None]:
from collections import Counter 

all_locations = df.Locality.astype(str).values.tolist()
all_locations = list(map(general_location, all_locations))

all_locations_count = Counter(all_locations)
all_locations_count.most_common()

In [None]:
df['GeneralLocality'] = all_locations

In [None]:
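# Share of records per general locality, most common first. The top entry is
# skipped on the assumption that it is the 'nan' placeholder for missing localities.
values = df.GeneralLocality.value_counts(normalize=True).values.tolist()[1:]

# Index of the first locality holding less than 1% of the records
# (falls back to keeping every locality if none is that small)
value_first_index = next((i for i, value in enumerate(values) if value < 0.01), len(values))

# Keep only the localities above the 1% cutoff
locality_counts = df.GeneralLocality.value_counts().iloc[1:].iloc[:value_first_index]
counts = locality_counts.values.tolist()
locations = locality_counts.index.tolist()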

In [None]:
fig = go.Figure(data=[go.Pie(labels=locations, values=counts)])
fig.update_layout(
        title = 'Coral Reef Observation Locations',
    )
py.plot(fig, filename = 'coral-reef-location-pie-chart', auto_open=True)

# if you wish to display the chart in the notebook
# comment the line above and uncomment below
# fig.show()
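
The pie chart above drops every locality below the 1% cutoff, so its slices no longer add up to the full dataset. One alternative is to fold the small localities into a single "Other" slice. The sketch below illustrates that idea on made-up labels. `top_shares_with_other` is a hypothetical helper, not part of the original analysis.

```python
import pandas as pd

def top_shares_with_other(series, min_share=0.01):
    """Count values, merging those below min_share into one 'Other' entry."""
    counts = series.value_counts()
    shares = counts / counts.sum()
    keep = counts[shares >= min_share]
    other = counts[shares < min_share].sum()
    if other > 0:
        keep = pd.concat([keep, pd.Series({"Other": other})])
    return keep

# Made-up labels: two large groups and two tiny ones
toy = pd.Series(["A"] * 60 + ["B"] * 38 + ["C", "D"])
result = top_shares_with_other(toy, min_share=0.05)
print(result)
```

The resulting index and values could be passed to `go.Pie` as `labels` and `values` in the same way as `locations` and `counts`.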

In [None]:
location_df = df[df.GeneralLocality.isin(locations)]
location_df.head()

In [None]:
location_df.shape

In [None]:
fig = go.Figure(data=go.Scattergeo(
        lon = location_df.longitude,
        lat = location_df.latitude,
        text = location_df.Locality,
        mode = 'markers',
        ))

fig.update_layout(
        title = 'Coral Reef Observations in North America',
        geo_scope='north america',
    )
fig.show()

<img src="images/visualizations/Coral_Reef_Observations_in_North_America.png">

In [None]:
nan_loc_df = df[df.GeneralLocality == 'nan']
nan_loc_df.shape

In [None]:
fig = go.Figure(data=go.Scattergeo(
        lon = nan_loc_df.longitude,
        lat = nan_loc_df.latitude,
        text = nan_loc_df.Locality,
        mode = 'markers',
        ))

fig.update_layout(
        title = 'Coral Reef Observations in Unknown Locations',
        geo_scope='world',
    )
fig.show()

<img src="images/visualizations/Coral_Reef_Observations_in_Unknown_Locations.png">

### 2. How diverse are coral reefs in certain areas?

In [None]:
df.VernacularNameCategory.value_counts(normalize=True)

In [None]:
# Share of records per vernacular category, most common first
values = df.VernacularNameCategory.value_counts(normalize=True).values.tolist()

# Index of the first category holding less than 1% of the records
# (falls back to keeping every category if none is that small)
value_first_index = next((i for i, value in enumerate(values) if value < 0.01), len(values))

category_counts = df.VernacularNameCategory.value_counts().values.tolist()[:value_first_index]
category_names = df.VernacularNameCategory.value_counts().index.tolist()[:value_first_index]

In [None]:
fig = go.Figure(data=[go.Pie(labels=category_names, values=category_counts)])
fig.show()
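
The category shares above describe the dataset as a whole, while the question asks about diversity in particular areas. One possible way to compare areas is to count distinct categories per locality and compute a Shannon diversity index from the category shares. The sketch below is a hedged illustration on made-up records. The helper `shannon_index` and the toy values are assumptions, and it treats VernacularNameCategory as a coarse stand-in for species.

```python
import numpy as np
import pandas as pd

def shannon_index(categories):
    """Shannon diversity H = sum(p * ln(1 / p)) over category shares p."""
    shares = categories.value_counts(normalize=True)
    return float((shares * np.log(1 / shares)).sum())

# Made-up records: one uniform locality and one mixed locality
toy = pd.DataFrame({
    "GeneralLocality": ["Gulf of Mexico"] * 4 + ["Sur Ridge"] * 4,
    "VernacularNameCategory": [
        "sponge", "sponge", "sponge", "sponge",
        "sponge", "sea pen", "black coral", "sea pen",
    ],
})

# Per locality: number of distinct categories and Shannon index
diversity = toy.groupby("GeneralLocality")["VernacularNameCategory"].agg(
    n_categories="nunique", shannon=shannon_index
)
print(diversity)
```

A locality with a single category scores 0, and the index grows as categories become more numerous and more evenly represented. Sampling effort differs between areas, so the scores would need cautious interpretation.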

In [None]:
fig.__dict__.keys()

In [None]:
fig.layout.template

In [None]:
fig = go.Figure(data=go.Scattergeo(
        lon = df.longitude,
        lat = df.latitude,
        text = df.VernacularNameCategory,
        mode = 'markers',
        # Color markers by category (integer codes) rather than by row position
        marker = dict(color=df.VernacularNameCategory.astype('category').cat.codes),
        ))

fig.update_layout(
        title = 'Coral Reef Diversity in The World',
        geo_scope='world',
    )
fig.show()

In [None]:
import plotly.express as px

iris = px.data.iris()

In [None]:
iris