# General

---

In [1]:
# imports
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns 
from datetime import datetime
import descartes 
import geopandas as gpd
from shapely.geometry import Point, Polygon

ModuleNotFoundError: No module named 'geopandas'

# Data Notes  

Files (by fishing gear types):  
1. Drifting longline vessels
2. Fixed gear vessels
3. Pole and line vessels
4. Purse seine vessels
5. Trawlers
6. Trollers
7. Vessels with unknown gear types  

CSV Table Schema:
* mmsi: anonymized vessel identifier
* timestamp: unix timestamp 
* distance_from_shore: distance from shore in meters 
* distance_from_port: distance from port in meters
* speed: vessel speed in knots
* course: vessel's course over ground (represented in degrees)
* lat: latitude in decimal degrees 
* lon: longitude in decimal degrees
* is_fishing: label indicating fishing activity
  * 0 = not fishing
  * \>0 = fishing; data values between 0 and 1 indicate the average score for the position if scored by multiple people 
  * -1 = no data  
* source: the training data batch; data was prepared by GFW, Dalhousie, and a crowdsourcing campaign (false positives are marked as false_positives)
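
The `is_fishing` encoding described in the schema can be sketched as a small mapping from raw scores to categories. This is a minimal illustration, not part of the original analysis: the helper `label_category` and the sample values are hypothetical, and treating every score above 0 as "fishing" simply follows the schema wording (a stricter threshold such as 0.5 would be a separate modeling choice).

```python
import pandas as pd


def label_category(score: float) -> str:
    """Map an is_fishing value to a coarse category, following the schema notes."""
    if score == -1:
        return "no_data"
    if score == 0:
        return "not_fishing"
    if 0 < score <= 1:
        return "fishing"
    return "unexpected"


# Hypothetical sample values that mirror the documented label range.
sample = pd.DataFrame({"is_fishing": [-1.0, 0.0, 1.0, 0.666667, 0.25]})
sample["label_category"] = sample["is_fishing"].map(label_category)

# Count how many rows fall into each category.
category_counts = sample["label_category"].value_counts()
print(category_counts)
```

The same `map` call could be applied to the `is_fishing` column of any of the gear-type files once loaded.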

---

In [16]:
# filepaths 
drifting_longlines_file = './data/drifting_longlines.csv'
fixed_gear_file = './data/fixed_gear.csv'
pole_and_line_file = './data/pole_and_line.csv'
purse_seines_file = './data/purse_seines.csv'
trawlers_file = './data/trawlers.csv'
trollers_file = './data/trollers.csv'
unknown_file = './data/unknown.csv'

# Exploratory Analysis 
---

## Drifting Longlines    
A drifting longline consists of a mainline kept near the surface or at a certain depth by means of regularly spaced floats and with relatively long snoods with baited hooks, evenly spaced on the mainline. Drifting longlines may be of considerable length. Some drifting longlines are set vertically, each line hanging from a float at the surface. (Source: [FAO Drifting Longlines](https://www.fao.org/fishery/en/geartype/233/en))


In [57]:
# read in data from the drifting longline vessels
driftingLongLinesDF = pd.read_csv(drifting_longlines_file)

# peek at the data 
driftingLongLinesDF.head()

Unnamed: 0,mmsi,timestamp,distance_from_shore,distance_from_port,speed,course,lat,lon,is_fishing,source
0,12639560000000.0,1327137000.0,232994.28125,311748.65625,8.2,230.5,14.865583,-26.853662,-1.0,dalhousie_longliner
1,12639560000000.0,1327137000.0,233994.265625,312410.34375,7.3,238.399994,14.86387,-26.8568,-1.0,dalhousie_longliner
2,12639560000000.0,1327137000.0,233994.265625,312410.34375,6.8,238.899994,14.861551,-26.860649,-1.0,dalhousie_longliner
3,12639560000000.0,1327143000.0,233994.265625,315417.375,6.9,251.800003,14.822686,-26.865898,-1.0,dalhousie_longliner
4,12639560000000.0,1327143000.0,233996.390625,316172.5625,6.1,231.100006,14.821825,-26.867579,-1.0,dalhousie_longliner


In [40]:
# data shape 
driftingLongLinesDF.shape

(13968727, 10)

In [14]:
# check label distribution
driftingLongLinesDF['is_fishing'].value_counts()

-1.000000    13748986
 1.000000      138163
 0.000000       79574
 0.666667        1076
 0.333333         809
 0.750000         110
 0.250000           9
Name: is_fishing, dtype: int64
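
The distribution above is dominated by the `-1` (no data) label. As a rough sketch of how much of the file is actually labeled, the counts can be copied from that output into a Series and summed; the variable names here are illustrative only.

```python
import pandas as pd

# Label counts copied from the value_counts() output above.
label_counts = pd.Series({
    -1.0: 13748986,
    1.0: 138163,
    0.0: 79574,
    0.666667: 1076,
    0.333333: 809,
    0.75: 110,
    0.25: 9,
})

total = int(label_counts.sum())
# Rows with a label of 0 or higher were scored by at least one person.
labeled = int(label_counts[label_counts.index >= 0].sum())
labeled_share = labeled / total

print(f"{labeled:,} of {total:,} rows are labeled ({labeled_share:.2%})")
```

On the full DataFrame, `(driftingLongLinesDF['is_fishing'] >= 0).mean()` should give the same share without copying the counts by hand.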

In [18]:
# look at data characteristics 
driftingLongLinesDF.describe()

Unnamed: 0,mmsi,timestamp,distance_from_shore,distance_from_port,speed,course,lat,lon,is_fishing
count,13968730.0,13968730.0,13968730.0,13968730.0,13968630.0,13968630.0,13968730.0,13968730.0,13968730.0
mean,129385000000000.0,1434290000.0,584531.1,789750.5,5.464779,181.4876,-8.997629,3.758693,-0.9743015
std,78873570000000.0,39842750.0,542006.8,691543.8,4.043567,105.0503,24.39311,109.5971,0.2119947
min,5601266000000.0,1325376000.0,0.0,0.0,0.0,0.0,-75.19017,-180.0,-1.0
25%,62603840000000.0,1410706000.0,101909.2,213020.6,2.1,90.7,-26.0155,-88.08668,-1.0
50%,118485900000000.0,1447302000.0,457639.3,637524.9,5.5,181.1,-14.97954,-1.716495,-1.0
75%,198075800000000.0,1466506000.0,960366.4,1210432.0,8.5,271.1,4.48579,100.9811,-1.0
max,281205800000000.0,1480032000.0,4430996.0,7181037.0,102.3,511.0,83.33266,179.9938,1.0


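Many records have a course value of 511, which exceeds the valid maximum of 360 degrees. According to the [US Coast Guard Class A AIS Position Report Documentation](https://www.navcen.uscg.gov/?pageName=AISMessagesA), 511 marks data that is not available (the documentation lists it as the not-available value for true heading), so these values should be treated as missing rather than as real courses.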
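
One way to handle the 511 sentinel is to blank out any course outside the valid compass range before computing statistics. The sketch below uses a small hypothetical sample rather than the real file, and the column name `course_clean` is an assumption made for illustration.

```python
import pandas as pd

# Hypothetical course values, including the 511 "not available" sentinel.
sample = pd.DataFrame({"course": [230.5, 511.0, 0.0, 359.8, 511.0]})

# Keep courses in [0, 360); everything else becomes NaN.
is_valid = (sample["course"] >= 0) & (sample["course"] < 360)
sample["course_clean"] = sample["course"].where(is_valid)

n_invalid = int(sample["course_clean"].isna().sum())
print(f"{n_invalid} of {len(sample)} course values were out of range")
```

Masking by range rather than matching 511 exactly also catches any other out-of-range values, at the cost of hiding which sentinel produced them.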

In [33]:
# check for null values
driftingLongLinesDF.isnull().sum()

mmsi                    0
timestamp               0
distance_from_shore     0
distance_from_port      0
speed                  98
course                 98
lat                     0
lon                     0
is_fishing              0
source                  0
dtype: int64
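
The null counts above show 98 missing values in both `speed` and `course`. A quick check, sketched here on hypothetical rows, is whether the two columns are missing in the same records, in which case dropping those rows removes only 98 records in total.

```python
import numpy as np
import pandas as pd

# Hypothetical rows in which speed and course are missing together.
sample = pd.DataFrame({
    "speed": [8.2, np.nan, 6.8, np.nan],
    "course": [230.5, np.nan, 238.9, np.nan],
    "lat": [14.86, 14.85, 14.84, 14.83],
})

missing_speed = sample["speed"].isna()
missing_course = sample["course"].isna()

# True when every row is either missing both values or missing neither.
same_rows = bool((missing_speed == missing_course).all())

# Drop rows that lack either measurement.
cleaned = sample.dropna(subset=["speed", "course"])
print(same_rows, len(cleaned))
```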

In [58]:
# reformat unix timestamps into datetime
driftingLongLinesFormattedTimestamps = pd.to_datetime(driftingLongLinesDF['timestamp'])
driftingLongLinesDF.insert(2, 'timestamp_reformat', driftingLongLinesFormattedTimestamps)
driftingLongLinesDF.head() 

Unnamed: 0,mmsi,timestamp,timestamp_reformat,distance_from_shore,distance_from_port,speed,course,lat,lon,is_fishing,source
0,12639560000000.0,1327137000.0,1970-01-01 00:00:01.327136504,232994.28125,311748.65625,8.2,230.5,14.865583,-26.853662,-1.0,dalhousie_longliner
1,12639560000000.0,1327137000.0,1970-01-01 00:00:01.327136605,233994.265625,312410.34375,7.3,238.399994,14.86387,-26.8568,-1.0,dalhousie_longliner
2,12639560000000.0,1327137000.0,1970-01-01 00:00:01.327136734,233994.265625,312410.34375,6.8,238.899994,14.861551,-26.860649,-1.0,dalhousie_longliner
3,12639560000000.0,1327143000.0,1970-01-01 00:00:01.327143281,233994.265625,315417.375,6.9,251.800003,14.822686,-26.865898,-1.0,dalhousie_longliner
4,12639560000000.0,1327143000.0,1970-01-01 00:00:01.327143341,233996.390625,316172.5625,6.1,231.100006,14.821825,-26.867579,-1.0,dalhousie_longliner
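
Every `timestamp_reformat` value above falls on 1970-01-01 because `pd.to_datetime` interprets bare numbers as nanoseconds since the epoch by default. A minimal sketch of the likely intended conversion, assuming the `timestamp` column holds Unix seconds as the schema states, passes `unit='s'`. The sample values are read off the nanosecond digits in the output above, and the column name `timestamp_utc` is illustrative.

```python
import pandas as pd

# Unix timestamps in seconds, taken from the first rows shown above.
sample = pd.DataFrame({"timestamp": [1327136504.0, 1327136605.0, 1327143281.0]})

# unit="s" tells pandas the numbers are seconds, not nanoseconds.
sample["timestamp_utc"] = pd.to_datetime(sample["timestamp"], unit="s")

print(sample)
```

With the unit set, the converted values land in 2012, consistent with the minimum timestamp reported by `describe()`.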


In [59]:
# count the unique MMSIs
len(pd.unique(driftingLongLinesDF['mmsi']))

110

In [60]:
# group the drifting longline vessels by their MMSIs
driftingLongLinesMMSIGroups = driftingLongLinesDF.groupby('mmsi')

# peek at the first 3 rows in each resulting group 
driftingLongLinesMMSIGroups.head(3)

Unnamed: 0,mmsi,timestamp,timestamp_reformat,distance_from_shore,distance_from_port,speed,course,lat,lon,is_fishing,source
0,1.263956e+13,1.327137e+09,1970-01-01 00:00:01.327136504,2.329943e+05,3.117487e+05,8.2,230.500000,14.865583,-26.853662,-1.0,dalhousie_longliner
1,1.263956e+13,1.327137e+09,1970-01-01 00:00:01.327136605,2.339943e+05,3.124103e+05,7.3,238.399994,14.863870,-26.856800,-1.0,dalhousie_longliner
2,1.263956e+13,1.327137e+09,1970-01-01 00:00:01.327136734,2.339943e+05,3.124103e+05,6.8,238.899994,14.861551,-26.860649,-1.0,dalhousie_longliner
11846,5.139444e+13,1.328869e+09,1970-01-01 00:00:01.328868563,6.529687e+05,9.430994e+05,7.8,359.799988,59.139278,-178.281891,-1.0,dalhousie_longliner
11847,5.139444e+13,1.328869e+09,1970-01-01 00:00:01.328868584,6.529687e+05,9.430994e+05,7.9,359.600006,59.140053,-178.281921,-1.0,dalhousie_longliner
...,...,...,...,...,...,...,...,...,...,...,...
13725304,2.787982e+14,1.343593e+09,1970-01-01 00:00:01.343592535,6.170302e+05,7.961821e+05,6.8,224.899994,-19.311558,6.435753,-1.0,crowd_sourced
13725305,2.787982e+14,1.343593e+09,1970-01-01 00:00:01.343592554,6.170302e+05,7.961821e+05,6.0,221.600006,-19.312006,6.435373,-1.0,crowd_sourced
13830860,2.812058e+14,1.325376e+09,1970-01-01 00:00:01.325376202,1.650215e+06,2.245168e+06,1.3,244.100006,-11.898817,-118.598770,-1.0,crowd_sourced
13830861,2.812058e+14,1.325397e+09,1970-01-01 00:00:01.325397311,1.606711e+06,2.217462e+06,4.4,189.300003,-12.321533,-118.603546,-1.0,crowd_sourced
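
Beyond peeking at the first rows of each group, grouping by `mmsi` lends itself to a per-vessel summary. The sketch below runs on a hypothetical five-row sample; the output column names (`n_records`, `mean_speed`, `n_labeled`) and the helper column `is_labeled` are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical records for two vessels.
sample = pd.DataFrame({
    "mmsi": [1.0, 1.0, 2.0, 2.0, 2.0],
    "speed": [8.2, 7.3, 7.8, 7.9, 6.0],
    "is_fishing": [-1.0, 1.0, -1.0, -1.0, 0.0],
})

# A row counts as labeled when is_fishing is not the -1 "no data" marker.
sample["is_labeled"] = sample["is_fishing"] >= 0

# Named aggregation: one output column per (source column, function) pair.
per_vessel = sample.groupby("mmsi").agg(
    n_records=("speed", "size"),
    mean_speed=("speed", "mean"),
    n_labeled=("is_labeled", "sum"),
)
print(per_vessel)
```

A table like this makes it easy to see which vessels contribute most of the labeled positions.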


## Fixed Gear  
Fixed gear generally means trapping or potting, and gillnetting, where the catching implement is set, soaked and later retrieved. (Source: [FAO Gillnets](https://www.fao.org/fishery/en/geartype/247/en))

## Pole and Lines  
A pole and line consists of a hooked line attached to a pole. This method is common to sport fisheries (angling) but it is also used in commercial fisheries. Fishing rods/poles are made of wood (including bamboo, also constructed of split cane) and increasingly of fiberglass. (Source: [FAO Pole and Lines](https://www.fao.org/fishery/en/geartype/314/en))

## Purse Seines  
A purse seine is made of a long wall of netting framed with a floatline and a leadline (the leadline usually of equal or greater length than the floatline), with purse rings hanging from the lower edge of the gear. A purse line made of steel wire or rope runs through the rings and allows the net to be pursed. In most situations, it is the most efficient gear for catching large and small pelagic species that shoal. (Source: [FAO Purse Seines](https://www.fao.org/fishery/en/geartype/249/en))


## Trawlers  
Trawls are cone-shaped nets (made from two, four or more panels) that are towed by one or two boats on the bottom or in midwater (pelagic). The cone-shaped body ends in a bag or codend. The horizontal opening of the gear while it is towed is maintained by beams, otter boards or by the distance between the two towing vessels (pair trawling). Floats and weights and/or hydrodynamic devices provide the vertical opening. Two parallel trawls might be rigged between two otter boards (twin trawls). The mesh size in the codend, or specially designed devices, is used to regulate the size and species captured. (Source: [FAO Trawls](https://www.fao.org/fishery/en/geartype/103/en))



## Trollers  
A trolling line consists of a line with natural or artificial baited hooks that is trailed by a vessel near the surface or at a certain depth. Several lines are often towed at the same time, using outriggers to keep the lines away from the wake of the vessel. The lines are hauled by hand or with small winches. A piece of rubber is often included in each line as a shock absorber. (Source: [FAO Trolling Lines](https://www.fao.org/fishery/en/geartype/235/en))


## Unknown  
Vessels with unknown fishing gear types. 