#Welcome!
This walkthrough shows how to refine the original database of Lionfish sightings into a more research-applicable subset: sightings from the US Gulf Coast states in 2013.

#Prepare Workspace and Dataset

**Import and Mount Google Drive**<br>
Mounting Drive lets the notebook read the dataset if it is saved in Drive

In [2]:
from google.colab import drive
drive.mount('/content/gdrive')

Mounted at /content/gdrive


**Import Packages**<br>
These packages are used to load and manipulate the dataset

In [3]:
import numpy as np
import pandas as pd

**Import Dataset**

In [4]:
df=pd.read_csv('gdrive/My Drive/Colab Notebooks/LionfishData(in).csv', encoding='latin-1')

**Review Data** <br>
Use the `.shape`, `.size`, and `.columns` attributes to assess the size and features of the original dataset

In [5]:
df.shape

(12088, 67)

In [6]:
df.size

809896

In [7]:
df.columns

Index(['Specimen Number', 'Species ID', 'Group', 'Family', 'Scientific Name',
       'Common Name', 'Country', 'State', 'County', 'Locality', 'Latitude',
       'Longitude', 'Source', 'Accuracy', 'Drainage Name', 'HUC 8 Number',
       'Year', 'Month', 'Day', 'Status', 'Comments', 'record_type', 'disposal',
       'Museum_Cat_No', 'fresh_marine_intro', 'Reference 1', 'Type 1',
       'Date 1', 'Author 1', 'Title 1', 'Publisher 1', 'Location 1',
       'Reference 2', 'Type 2', 'Date 2', 'Author 2', 'Title 2', 'Publisher 2',
       'Location 2', 'Reference 3', 'Type 3', 'Date 3', 'Author 3', 'Title 3',
       'Publisher 3', 'Location 3', 'Reference 4', 'Type 4', 'Date 4',
       'Author 4', 'Title 4', 'Publisher 4', 'Location 4', 'Reference 5',
       'Type 5', 'Date 5', 'Author 5', 'Title 5', 'Publisher 5', 'Location 5',
       'Reference 6', 'Type 6', 'Date 6', 'Author 6', 'Title 6', 'Publisher 6',
       'Location 6'],
      dtype='object')
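The three values above are related: `.shape` is a `(rows, columns)` tuple and `.size` is their product, which matches the output here (12088 × 67 = 809896). A minimal sketch on a toy DataFrame follows; the `toy` table and its values are made up for illustration and are not part of the Lionfish data.

```python
import pandas as pd

# Toy stand-in for the Lionfish table (illustrative values only)
toy = pd.DataFrame({
    "State": ["FL", "TX", "LA"],
    "Year": [2013, 2012, 2013],
})

n_rows, n_cols = toy.shape          # (rows, columns) tuple
total_cells = toy.size              # number of cells: rows * columns
column_names = list(toy.columns)    # column labels as a plain list

print(n_rows, n_cols, total_cells, column_names)
```

Note that all three are accessed without parentheses, since they are attributes rather than methods.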

#Refine Dataset


**Filter by Feature: State** <br>
To assess Lionfish sightings along the US Gulf Coast in particular, select only the Gulf Coast states from the dataset <br><br>
1. Create a subset of USA sightings
2. Use `.value_counts` to see which Gulf Coast states appear in the data
3. Create an individual subset for each Gulf Coast state (FL, LA, AL, TX, MS)
4. Use `pd.concat` to combine these subsets into a single Gulf Coast state dataset


In [8]:
USA_data = df[df["Country"] == "United States of America"]
USA_data.value_counts("State")

State,count
FL,6944
TX,841
LA,349
AL,310
VI,253
MS,247
PR,198
NC,135
SC,51
GA,32


In [9]:
FL_subset = df[df["State"] == "FL"]

In [10]:
LA_subset = df[df["State"] == "LA"]

In [11]:
AL_subset = df[df["State"] == "AL"]

In [12]:
TX_subset = df[df["State"] == "TX"]

In [13]:
MS_subset = df[df["State"] == "MS"]

In [14]:
GULF = pd.concat([FL_subset,LA_subset,AL_subset,TX_subset, MS_subset],axis=0,ignore_index=True,sort=False)
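As an alternative sketch, the five per-state subsets and the `pd.concat` call can be collapsed into one boolean mask with `Series.isin`. The example below runs on a small made-up table (`toy`, `gulf_states`, and `gulf_toy` are hypothetical names, not from the notebook) and filters on the USA subset first, as step 1 describes.

```python
import pandas as pd

# Toy stand-in for the full dataset (illustrative values only)
toy = pd.DataFrame({
    "Country": ["United States of America"] * 5 + ["Bahamas"],
    "State": ["FL", "TX", "NC", "LA", "MS", None],
    "Year": [2013, 2013, 2013, 2012, 2013, 2013],
})

# Gulf Coast state codes used in the cells above
gulf_states = ["FL", "LA", "AL", "TX", "MS"]

# Keep USA rows, then keep rows whose State is one of the Gulf codes
usa = toy[toy["Country"] == "United States of America"]
gulf_toy = usa[usa["State"].isin(gulf_states)].reset_index(drop=True)

print(gulf_toy)
```

Unlike the `pd.concat` approach, this keeps rows in their original order rather than grouping them state by state; the set of rows selected is the same.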

**Filter by Feature: Year**<br>
To assess Lionfish sightings from 2013 in particular, create a subset from the Gulf Coast state dataset with sightings from that year <br><br>


In [15]:
GULFA = GULF[GULF["Year"] == 2013]

#Review and Save Dataset

**View the Dataset**<br>
Display the filtered dataset (pandas abbreviates long output) to check that the filters were applied correctly

In [16]:
GULFA

,Specimen Number,Species ID,Group,Family,Scientific Name,Common Name,Country,State,County,Locality,...,Title 5,Publisher 5,Location 5,Reference 6,Type 6,Date 6,Author 6,Title 6,Publisher 6,Location 6
2,1339724,963,Marine Fishes,Scorpaenidae,Pterois volitans/miles,lionfish,United States of America,FL,,"Atlantic Ocean, Florida Keys, Key Largo, Eagle...",...,,,,,,,,,,
3,1339710,963,Marine Fishes,Scorpaenidae,Pterois volitans/miles,lionfish,United States of America,FL,,"Atlantic Ocean, Florida Keys, south of America...",...,,,,,,,,,,
4,1339512,963,Marine Fishes,Scorpaenidae,Pterois volitans/miles,lionfish,United States of America,FL,,"Gulf of Mexico, Florida, Cutter Rock",...,,,,,,,,,,
71,1339469,963,Marine Fishes,Scorpaenidae,Pterois volitans/miles,lionfish,United States of America,FL,,"Atlantic Ocean, Florida Keys, Dixie Shoals",...,,,,,,,,,,
72,1339484,963,Marine Fishes,Scorpaenidae,Pterois volitans/miles,lionfish,United States of America,FL,,"Atlantic Ocean, Florida Keys, Eagle Ray II",...,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
8672,1414389,963,Marine Fishes,Scorpaenidae,Pterois volitans/miles,lionfish,United States of America,MS,,"Gulf of Mexico, Mississippi, VK-384",...,,,,,,,,,,
8673,1414390,963,Marine Fishes,Scorpaenidae,Pterois volitans/miles,lionfish,United States of America,MS,,"Gulf of Mexico, Mississippi, VK-384",...,,,,,,,,,,
8674,1414391,963,Marine Fishes,Scorpaenidae,Pterois volitans/miles,lionfish,United States of America,MS,,"Gulf of Mexico, Mississippi, VK-384",...,,,,,,,,,,
8675,1414392,963,Marine Fishes,Scorpaenidae,Pterois volitans/miles,lionfish,United States of America,MS,,"Gulf of Mexico, Mississippi, VK-384",...,,,,,,,,,,
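Because the displayed table is truncated, a visual check only covers the first and last few rows. One possible way to check every row is sketched below on a made-up table; `toy_gulfa` stands in for `GULFA`, and the variable names are hypothetical.

```python
import pandas as pd

# Toy stand-in for the filtered dataset (illustrative values only)
toy_gulfa = pd.DataFrame({
    "State": ["FL", "FL", "MS", "TX"],
    "Year": [2013, 2013, 2013, 2013],
})
gulf_states = {"FL", "LA", "AL", "TX", "MS"}

# Every state present must be one of the Gulf Coast codes
states_ok = set(toy_gulfa["State"].unique()) <= gulf_states

# Every row must be from 2013
years_ok = (toy_gulfa["Year"] == 2013).all()

# Row count per state, useful for spotting an unexpectedly empty state
per_state = toy_gulfa["State"].value_counts()

print(states_ok, years_ok)
print(per_state)
```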


**Save the Dataset**<br>
Export the final dataset as a `.csv` file. Because the path is relative, the file is written to the current working directory

In [18]:
GULFA.to_csv("USA_Gulf_subset.csv", index=False)
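To confirm the export worked, the file can be read back and compared with the DataFrame that was written. The sketch below uses a made-up table and a temporary directory (`toy_gulfa` and `toy_subset.csv` are hypothetical names) so that it runs anywhere without touching real files.

```python
import os
import tempfile

import pandas as pd

# Toy stand-in for GULFA, with non-consecutive row labels like the real subset
toy_gulfa = pd.DataFrame(
    {"State": ["FL", "MS"], "Year": [2013, 2013]},
    index=[2, 8672],
)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "toy_subset.csv")
    # index=False leaves the row labels out of the file
    toy_gulfa.to_csv(path, index=False)
    reloaded = pd.read_csv(path)

# Same rows and columns; the row labels restart at 0 after reloading
print(reloaded.shape == toy_gulfa.shape)
print(list(reloaded.index))
```

With `index=False`, the original row labels are not saved, so the reloaded table is numbered from 0.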