### Challenge questions

Easy questions:

 1. How many total pings are in the Ocearch shark data?
 2. How many unique species of sharks are in the data set?
 3. What are the name, weight, and species of the heaviest shark(s)?
 4. When and where was the very first ping?
 5. Excluding results with 0 distance traveled: what are the minimum, average, and maximum travel distances?
 
Intermediate questions:

 1. Which shark had the most pings?
 2. Which shark has been pinging the longest, and how long has that been?
 3. Which shark species has the most individual sharks tagged?
 4. What is the average length and weight of each shark species?
 5. Which shark has the biggest geographic box (largest distance from min lat/lon to max lat/lon, not dist_traveled)?
 
Hard questions:

 1. Use folium to plot the first ping, last ping, and a line connecting each ping for the Tiger shark Emma.  Make the first ping marker a 'play' icon, and last ping icon a 'stop' icon.
 2. Resample Emma data to have a daily lat/lon average, and interpolate missing results.  Plot a marker for each day, and color them blue for hard data, green for interpolated lat/lons
 3. Resample all shark data for daily lat/lon averages, and interpolate missing results
 4. Calculate distance between Emma and other sharks on a daily basis
 5. Identify the shark that has the shortest average distance to Emma per day (minimum 50 days of pings with Emma)
 6. Plot Emma and her closest buddy: interpolated results for each in green, Emma as circle icons and her buddy as square icons

### Load data

In [2]:
import pandas as pd
import datetime as dt
df = pd.read_csv('data/sharks.csv')
df.shape

(65793, 12)

#### Clean

In [3]:
#cleans datetime

df['datetime'] = pd.to_datetime(df['datetime'])
df.datetime[0]

#cleans weight

def clean_weight(value):
    if not value:
        return value
    # most values are like "123 lb"
    value = str(value)
    for character in 'lbs,+':
        value = value.replace(character, '')
    return float(value)

#cleans length

def clean_length(value):
    if not value:
        return value
    # most length values are like '3 ft 4 in.'
    value = str(value)
    total = 0
    if 'ft' in value:
        ft, inches = value.split('ft')
        total += int(ft.strip()) * 12
    else:
        inches = value
    if inches.strip():
        total += float(inches.strip().split()[0])
    return total

df['weight'] = df.weight.apply(clean_weight)
df['length'] = df.length.apply(clean_length)

numeric_cols = ['latitude', 'longitude', 'dist_total', 'weight', 'length']
# convert column by column (the default axis); axis=1 would call to_numeric once per row
df[numeric_cols] = df[numeric_cols].apply(pd.to_numeric)
df.head()

Unnamed: 0,active,datetime,id,latitude,longitude,name,gender,species,weight,length,tagDate,dist_total
0,1,2014-07-06 04:57:28,3,-34.60661,21.15244,Oprah,Female,White Shark (Carcharodon carcharias),686.0,118.0,7 March 2012,2816.662
1,1,2014-06-23 02:40:09,3,-34.78752,19.42479,Oprah,Female,White Shark (Carcharodon carcharias),686.0,118.0,7 March 2012,2816.662
2,1,2014-06-15 13:15:44,3,-34.42487,21.09754,Oprah,Female,White Shark (Carcharodon carcharias),686.0,118.0,7 March 2012,2816.662
3,1,2014-06-03 02:23:57,3,-34.704323,20.210134,Oprah,Female,White Shark (Carcharodon carcharias),686.0,118.0,7 March 2012,2816.662
4,1,2014-05-28 19:53:57,3,-34.65556,19.37459,Oprah,Female,White Shark (Carcharodon carcharias),686.0,118.0,7 March 2012,2816.662


#### Query Ocearch API

In [4]:
import requests
url = 'http://www.ocearch.org/tracker/ajax/filter-sharks'

resp = requests.get(url)
resp

<Response [200]>

#### Transform data

### Explore data
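
A possible starting point for exploration is `groupby` on the shark name. The sketch below runs on a tiny made-up frame: the names, timestamps, and coordinates are invented for illustration, and only the column names follow `df`. It shows the pattern behind questions such as "most pings", "pinging the longest", and the per-shark bounding box; it is not a worked answer on the real data.

```python
import pandas as pd

# Toy stand-in for df: values are invented, column names match the shark data.
toy = pd.DataFrame({
    'name': ['Ann', 'Ann', 'Ann', 'Bob', 'Bob'],
    'datetime': pd.to_datetime(['2014-01-01', '2014-01-05', '2014-03-01',
                                '2014-02-01', '2014-02-03']),
    'latitude': [10.0, 11.0, 12.5, -5.0, -4.0],
    'longitude': [-80.0, -81.0, -79.0, 20.0, 22.0],
})

# Pings per shark, and the shark with the most pings.
ping_counts = toy.groupby('name').size()
most_pings = ping_counts.idxmax()

# Time between each shark's first and last ping.
span = toy.groupby('name')['datetime'].agg(['min', 'max'])
span['duration'] = span['max'] - span['min']
longest = span['duration'].idxmax()

# Bounding box per shark: min/max latitude and longitude.
box = toy.groupby('name').agg(lat_min=('latitude', 'min'),
                              lat_max=('latitude', 'max'),
                              lon_min=('longitude', 'min'),
                              lon_max=('longitude', 'max'))

print(most_pings, longest, span.loc[longest, 'duration'])
```

Swapping `toy` for `df` should give the corresponding tables for the real data, assuming the cleaning cell above has already converted `datetime`.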

### Challenge questions

#### Hard questions

 1. Use folium to plot the first ping, last ping, and a line connecting each ping for the Tiger shark Emma.  Make the first ping marker a 'play' icon, and last ping icon a 'stop' icon.
 2. Resample Emma data to have a daily lat/lon average, and interpolate missing results.  Plot a marker for each day, and color them blue for hard data, green for interpolated lat/lons
 3. Resample all shark data for daily lat/lon averages, and interpolate missing results
 4. Calculate distance between Emma and other sharks on a daily basis
 5. Identify the shark that has the shortest average distance to Emma per day (minimum 50 days of pings with Emma)
 6. Plot Emma and her closest buddy: interpolated results for each in green, Emma as circle icons and her buddy as square icons

##### Plot Emma locations
Plot the ping locations for the shark named Emma as a `PolyLine` in folium.  Include the first and last ping location as markers.

In [5]:
import folium as fm

In [6]:
emma = df[df.name == 'Emma'].copy().sort_values(by='datetime')
emma.head()

Unnamed: 0,active,datetime,id,latitude,longitude,name,gender,species,weight,length,tagDate,dist_total
34075,1,2014-01-31 22:10:18,102,-0.466747,-90.30005,Emma,Female,Tiger Shark (Galeocerdo cuvier),,99.0,20 January 2014,4368.906
34074,1,2014-01-31 22:51:31,102,-0.41101,-90.32783,Emma,Female,Tiger Shark (Galeocerdo cuvier),,99.0,20 January 2014,4368.906
34073,1,2014-01-31 23:49:34,102,-0.47808,-90.36889,Emma,Female,Tiger Shark (Galeocerdo cuvier),,99.0,20 January 2014,4368.906
34072,1,2014-02-01 00:25:07,102,-0.24096,-89.920682,Emma,Female,Tiger Shark (Galeocerdo cuvier),,99.0,20 January 2014,4368.906
34071,1,2014-02-01 08:31:34,102,-0.42935,-89.64942,Emma,Female,Tiger Shark (Galeocerdo cuvier),,99.0,20 January 2014,4368.906


In [7]:
avg_lat = emma.latitude.mean()
avg_long = emma.longitude.mean()

mymap = fm.Map(tiles='stamenwatercolor',
              location=(avg_lat,avg_long),
              zoom_start=5)

latlong = list(zip(emma.latitude.values, emma.longitude.values))
latlong[:5]

fm.PolyLine(latlong,color='black').add_to(mymap)
fm.Marker(latlong[0],
          icon=fm.Icon(color='darkgreen',
                      icon='play')).add_to(mymap)
fm.Marker(latlong[-1],
          icon=fm.Icon(color='darkred',
                      icon='stop')).add_to(mymap)

mymap

##### Plot interpolated locs
Resample the Emma locations on a per-day basis and interpolate missing locations.  Then, plot the daily markers in folium along with a `PolyLine`.

In [8]:
# calendar day of each ping
emma['day'] = emma['datetime'].dt.date
# average position per day
avlang = emma.groupby('day').agg({'latitude':'mean','longitude':'mean'}).reset_index()

def daygap(day1,day2):
    """Absolute number of days between two dates."""
    return abs((day2 - day1).days)

# report every gap of more than one day between consecutive daily averages
for ind, row in avlang.iterrows():
    if ind<1:
        print("skipping first line")
    else:
        day1 = row.day
        last_row = avlang.iloc[ind-1]
        day2 = last_row.day
        gap = daygap(day1,day2)
        if gap>1:
            print("measured gap of {} days between {} and {}".format(gap, day1, day2))

skipping first line
measured gap of 27 days between 2014-04-30 and 2014-04-03
measured gap of 2 days between 2014-05-09 and 2014-05-07
measured gap of 4 days between 2014-05-13 and 2014-05-09
measured gap of 3 days between 2014-05-17 and 2014-05-14
measured gap of 3 days between 2014-05-21 and 2014-05-18
measured gap of 7 days between 2014-05-28 and 2014-05-21
measured gap of 2 days between 2014-05-31 and 2014-05-29
measured gap of 27 days between 2014-06-27 and 2014-05-31
measured gap of 14 days between 2014-07-11 and 2014-06-27
measured gap of 5 days between 2014-07-17 and 2014-07-12
measured gap of 2 days between 2014-07-24 and 2014-07-22
measured gap of 2 days between 2014-07-31 and 2014-07-29
measured gap of 14 days between 2014-08-16 and 2014-08-02
measured gap of 5 days between 2014-08-21 and 2014-08-16
measured gap of 2 days between 2014-08-23 and 2014-08-21
measured gap of 2 days between 2014-08-25 and 2014-08-23
measured gap of 3 days between 2014-08-29 and 2014-08-26


In [9]:
interpolatedemma = emma.set_index('datetime').resample('1D')[['latitude','longitude']].mean()
interpolatedemma['interpolated'] = interpolatedemma['latitude'].isnull()
fullyinterpolatedemma = interpolatedemma.interpolate(method ='linear')
fullyinterpolatedemma.tail()

Unnamed: 0_level_0,latitude,longitude,interpolated
datetime,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
2014-08-25,0.18386,-80.49963,False
2014-08-26,0.81609,-81.22973,False
2014-08-27,1.245737,-81.090107,True
2014-08-28,1.675383,-80.950483,True
2014-08-29,2.10503,-80.81086,False


In [12]:
emmainterpolatedmap = fm.Map(tiles='cartodbpositron',
                             location=(fullyinterpolatedemma.latitude.mean(),
                                       fullyinterpolatedemma.longitude.mean()),
                             zoom_start=5)

interpolatedvalues = list(zip(fullyinterpolatedemma.latitude.values,
                   fullyinterpolatedemma.longitude.values))

fm.PolyLine(interpolatedvalues,color='black').add_to(emmainterpolatedmap)
# blue markers for measured daily averages, green for interpolated days
for value, interpolated in zip(interpolatedvalues, fullyinterpolatedemma['interpolated']):
    fm.Marker(value,
              icon=fm.Icon(color='green' if interpolated else 'blue')).add_to(emmainterpolatedmap)

emmainterpolatedmap

##### Resample all shark data
Resample all shark data for daily lat/lon averages, and interpolate missing results

In [None]:
#interpolatedsharks = df.set_index('datetime').resample('1D')[['latitude','longitude']].mean()
#interpolatedsharks['interpolated'] = interpolatedsharks['latitude'].isnull()
#fullyinterpolatedsharks = interpolatedsharks.interpolate(method ='linear')
#fullyinterpolatedsharks.head()

In [14]:
dtgroup = df.set_index('datetime').groupby('name').resample('1D')[['latitude','longitude']].mean()
dtgroup['interpolated'] = dtgroup['latitude'].isnull()
interpolatedsharks = dtgroup.interpolate(method ='linear').reset_index()


#dtgroup = df.groupby('datetime').agg({'latitude':'mean','longitude':'mean'}).head()
#dtgroup['name'] = df['name']

interpolatedsharks

Unnamed: 0,name,datetime,latitude,longitude,interpolated
0,AB,2016-03-30,30.493530,-80.375390,False
1,AB,2016-03-31,30.495155,-80.328785,False
2,AB,2016-04-01,30.466347,-80.129087,False
3,AB,2016-04-02,30.362820,-80.235310,False
4,AB,2016-04-03,30.313510,-80.223723,False
5,AB,2016-04-04,30.327662,-80.333310,True
6,AB,2016-04-05,30.341813,-80.442897,False
7,AB,2016-04-06,30.441282,-80.342625,False
8,AB,2016-04-07,30.760415,-80.484415,False
9,AB,2016-04-08,30.943278,-80.021762,False


##### Distance to Emma
Calculate the distance between Emma and each of the other sharks on a daily basis.

In [15]:
fullyinterpolatedemma.tail()
# latitudedistance = fullyinterpolatedemma['latitude'] - othersharklatitude
# longitudedistance = fullyinterpolatedemma['longitude'] - othersharklongitude
# def haversine(lat1,lon1,lat2,lon2):

Unnamed: 0_level_0,latitude,longitude,interpolated
datetime,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
2014-08-25,0.18386,-80.49963,False
2014-08-26,0.81609,-81.22973,False
2014-08-27,1.245737,-81.090107,True
2014-08-28,1.675383,-80.950483,True
2014-08-29,2.10503,-80.81086,False


##### Emma's buddy
Identify the shark that has the shortest average distance to Emma per day (minimum 50 days of pings with Emma)
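
One way this could be approached is sketched below: pair each shark's daily position with Emma's position on the same day, compute the distance per day, then average per shark and keep only sharks with enough shared days. The helper names `haversine_km` and `closest_buddy` are hypothetical, the toy frame is invented, and counting interpolated days as shared days is an assumption about what "days of pings with Emma" means.

```python
import numpy as np
import pandas as pd

def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance in km (decimal degrees in, assumed Earth radius)."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * radius_km * np.arcsin(np.sqrt(a))

def closest_buddy(daily, target='Emma', min_days=50):
    """Return (name, mean distance in km) of the shark closest to `target`.

    `daily` needs columns name, datetime, latitude, longitude with one row
    per shark per day, like `interpolatedsharks`. Returns None when no shark
    shares at least `min_days` days with the target.
    """
    cols = ['datetime', 'latitude', 'longitude']
    target_days = daily.loc[daily['name'] == target, cols]
    others = daily[daily['name'] != target]
    # Pair every other shark's day with the target's position on that day.
    paired = others.merge(target_days, on='datetime', suffixes=('', '_target'))
    paired['dist_km'] = haversine_km(paired['latitude'], paired['longitude'],
                                     paired['latitude_target'],
                                     paired['longitude_target'])
    stats = paired.groupby('name')['dist_km'].agg(['mean', 'count'])
    stats = stats[stats['count'] >= min_days]
    if stats.empty:
        return None
    best = stats['mean'].idxmin()
    return best, stats.loc[best, 'mean']

# Invented toy data: Di is nearest but shares only one day with Emma.
days = pd.date_range('2014-02-01', periods=3, freq='D')
toy = pd.concat([
    pd.DataFrame({'name': 'Emma', 'datetime': days, 'latitude': 0.0, 'longitude': -90.0}),
    pd.DataFrame({'name': 'Bo', 'datetime': days, 'latitude': 0.0, 'longitude': -89.0}),
    pd.DataFrame({'name': 'Cy', 'datetime': days, 'latitude': 10.0, 'longitude': -90.0}),
    pd.DataFrame({'name': 'Di', 'datetime': days[:1], 'latitude': 0.0, 'longitude': -90.1}),
], ignore_index=True)

print(closest_buddy(toy, min_days=2))
```

On the real data the call would presumably be `closest_buddy(interpolatedsharks)` with the default `min_days=50`. Grouping by `name` assumes shark names are unique; grouping by `id` would be safer if they are not.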

##### Plot Emma and Buddy
Plot Emma and her closest buddy on folium.  Emma should be blue/green (known/interpolated) and her buddy should be red/black (known/interpolated).
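
A rough sketch of how such a map might be put together is shown below. The function names `marker_color` and `plot_pair` are hypothetical, and both input frames are assumed to carry `latitude`, `longitude`, and a boolean `interpolated` column, as `fullyinterpolatedemma` does. The color scheme follows the description above; marker shapes are left at the folium default.

```python
def marker_color(is_emma, interpolated):
    """Emma is blue/green and the buddy red/black (known/interpolated)."""
    if is_emma:
        return 'green' if interpolated else 'blue'
    return 'black' if interpolated else 'red'

def plot_pair(emma_days, buddy_days):
    """Draw both daily tracks, with one marker per day, on a single folium map."""
    import folium as fm  # imported here so marker_color stays usable without folium

    # Center the map on the mean position of all points from both sharks.
    lats = list(emma_days['latitude']) + list(buddy_days['latitude'])
    lons = list(emma_days['longitude']) + list(buddy_days['longitude'])
    center = (sum(lats) / len(lats), sum(lons) / len(lons))
    pair_map = fm.Map(location=center, zoom_start=5, tiles='cartodbpositron')

    for days, is_emma in ((emma_days, True), (buddy_days, False)):
        points = list(zip(days['latitude'], days['longitude']))
        fm.PolyLine(points, color='black').add_to(pair_map)
        for point, interpolated in zip(points, days['interpolated']):
            color = marker_color(is_emma, interpolated)
            fm.Marker(point, icon=fm.Icon(color=color)).add_to(pair_map)
    return pair_map
```

A call might look like `plot_pair(fullyinterpolatedemma, buddy_days)`, where `buddy_days` is a placeholder for the buddy's rows selected from `interpolatedsharks`.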