# Exploring movement and diving behaviour of seabirds

## Table of contents
* [General Explorations](#1)
* [Overview of Tracks](#2)
* [Track individual Birds](#3)

### Data description from RTF file:
lat - latitude, with colony location removed (so colony is at 0)

lon - longitude, with colony location removed (so colony is at 0)

bird - bird ID (1 to N)

species - species ID (tRAZO = Razorbill, tCOGU = Common Guillemot, tEUSH = European Shag)

year - Year of recording

date_time - Date and time of sample

max_depth.m - maximum dive depth in a 100 second window

colony2 - colony ID (1 to N)

coverage_ratio - the proportion of available fixes recorded in a 10 sample window centred on this sample

is_dive - whether or not this location contains a dive classified as > 3 m (used in main analyses)

is_dive_1m - whether or not this location contains a dive classified as > 1 m (used in SI)

is_dive_2m - ... a dive classified as > 2 m (used in SI)

is_dive_4m - ... a dive classified as > 4 m (used in SI)

is_dive_5m - ... a dive classified as > 5 m (used in SI)

is_dive_0m - ... a dive classified as > 0 m (used in SI)
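
The `is_dive*` flags read like depth thresholds applied to `max_depth.m`. The following is a minimal sketch of that reading, assuming (the description does not confirm this) that each flag is a strict "greater than" comparison against the stated depth. The toy depths and the `_check` column names are made up for illustration.

```python
import pandas as pd

# Toy depths in metres (made-up values, not taken from the dataset)
toy = pd.DataFrame({'max_depth.m': [0.0, 0.5, 1.5, 2.5, 3.5, 6.0]})

# Assumption: each flag is a strict threshold on max_depth.m
thresholds = {'is_dive_0m': 0, 'is_dive_1m': 1, 'is_dive_2m': 2,
              'is_dive': 3, 'is_dive_4m': 4, 'is_dive_5m': 5}

# Recompute one boolean column per threshold
for name, depth in thresholds.items():
    toy[name + '_check'] = toy['max_depth.m'] > depth

print(toy)
```

Comparing such recomputed columns with the flags shipped in the data would show whether the assumption holds.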

In [None]:
# packages

# standard
import numpy as np
import pandas as pd
import time

# plots
import matplotlib.pyplot as plt
import plotly.express as px
import seaborn as sns

In [None]:
# import and preview
df = pd.read_csv('../input/predicting-animal-behavior-using-gps/gps/anon_gps_tracks_with_dive.csv')
df = df.drop(columns=['Unnamed: 0'])  # drop the index column saved with the CSV
df.head()

<a id='1'></a>
# General Explorations

### Numerical Features - Basic Stats

In [None]:
features_num = ['lat','lon','alt','max_depth.m','coverage_ratio']
df[features_num].describe()

### Categorical Features

In [None]:
# we have 3 species: Razorbill, Common Guillemot, European Shag
print(df.species.value_counts())
# plot
df.species.value_counts().sort_index().plot(kind='bar')
plt.grid()
plt.title('Number of records by species')
plt.show()

In [None]:
# we have 10 colonies
print(df.colony2.value_counts())
# plot
df.colony2.value_counts().sort_index().plot(kind='bar')
plt.grid()
plt.title('Number of records by colony')
plt.show()

In [None]:
# we have 108 individual birds
df.bird.value_counts()

In [None]:
# data comprises 4 years
print(df.year.value_counts())
# plot
df.year.value_counts().sort_index().plot(kind='bar')
plt.grid()
plt.title('Number of records by year')
plt.show()

### Grouped counts

#### Birds by colony:

In [None]:
birds_per_colony = df.groupby(['colony2'])['bird'].nunique()
print(birds_per_colony)
birds_per_colony.plot(kind='bar')
plt.title('Birds per Colony')
plt.grid()
plt.show()

Colony 8 consists of only 1 bird!
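
Rather than reading this off the chart, colonies represented by a single bird can be picked out programmatically. This is a small sketch on made-up toy records, where `single_bird_colonies` is a name introduced here for illustration:

```python
import pandas as pd

# Toy records (made-up): one row per GPS fix
toy = pd.DataFrame({
    'colony2': [1, 1, 1, 2, 2, 3],
    'bird':    [1, 1, 2, 3, 4, 5],
})

# Distinct birds per colony, same grouping as in the cell above
birds_per_colony = toy.groupby('colony2')['bird'].nunique()

# Keep only colonies represented by exactly one bird
single_bird_colonies = birds_per_colony[birds_per_colony == 1].index.tolist()
print(single_bird_colonies)
```

With the real `birds_per_colony` series, the same filter should list every colony for which conclusions rest on one individual.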

#### Further drilldown - split by species:

In [None]:
df.groupby(['colony2','species'])['bird'].nunique()

#### Birds by species

In [None]:
birds_per_species = df.groupby(['species'])['bird'].nunique()
print(birds_per_species)
birds_per_species.plot(kind='bar')
plt.title('Birds per species')
plt.grid()
plt.show()

<a id='2'></a>
# Overview of Tracks

In [None]:
# plot all tracks
plt.figure(figsize=(8,8))
plt.scatter(df.lon, df.lat, c=df.bird, s=1)
plt.title('All tracks in one picture')
plt.grid()
plt.show()

In [None]:
# plot all tracks - colored by colony
plt.figure(figsize=(8,8))
plt.scatter(df.lon, df.lat, c=df.colony2, s=1)
plt.title('All tracks - Colored by colony')
plt.grid()
plt.show()

In [None]:
# plot tracks by colony - colored by individual bird
for selected_colony in sorted(df.colony2.unique()):  # avoids hard-coding the 10 colony IDs
    df_col = df[df.colony2==selected_colony]

    plt.figure(figsize=(6,6))
    plt.scatter(df_col.lon, df_col.lat, 
                c=df_col.bird, s=1)
    plt.title('Tracks of colony ' + str(selected_colony))
    plt.grid()
    plt.show()
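
Because `lat` and `lon` are offsets from the colony (the colony sits at 0, 0), a rough foraging-range proxy is each fix's distance from the origin. The sketch below assumes the offsets are in degrees and treats them as planar. Since the true colony latitude has been removed, the result is a distance in degrees, not kilometres. The toy coordinates and the `dist_deg` column are made up for illustration.

```python
import numpy as np
import pandas as pd

# Toy tracks (made-up offsets from the colony at 0, 0)
toy = pd.DataFrame({
    'bird': [1, 1, 1, 2, 2],
    'lat':  [0.0, 0.03, 0.0, 0.0, -0.06],
    'lon':  [0.0, 0.04, 0.0, 0.0, 0.08],
})

# Planar distance from the colony, in degrees (assumption: no projection)
toy['dist_deg'] = np.hypot(toy['lat'], toy['lon'])

# Farthest point reached by each bird
max_range = toy.groupby('bird')['dist_deg'].max()
print(max_range)
```

This only ranks fixes and birds by how far they strayed from the colony. Comparing ranges across colonies at different latitudes would need the original coordinates.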

<a id='3'></a>
# Track individual Birds

In [None]:
selected_bird = 2

df_ex = df[df.bird==selected_bird].reset_index(drop=True)
df_ex

In [None]:
# dives?
df_ex.is_dive.value_counts()
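
Raw counts depend on track length, so a dive fraction is often easier to compare between birds. This is a minimal sketch on made-up data, assuming `is_dive` is boolean (or 0/1) so that its mean equals the share of dive records. `dive_fraction` is a name introduced here.

```python
import pandas as pd

# Toy records (made-up): dive flag per GPS fix
toy = pd.DataFrame({
    'bird':    [1, 1, 1, 1, 2, 2],
    'is_dive': [True, False, False, True, False, False],
})

# Mean of a boolean column = fraction of True values
dive_fraction = toy.groupby('bird')['is_dive'].mean()
print(dive_fraction)
```

For a single bird, `df_ex.is_dive.value_counts(normalize=True)` gives the same proportions directly.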

In [None]:
# plot track (2D)
plt.figure(figsize=(8,8))
plt.scatter(df_ex.lon, df_ex.lat, c=df_ex.index)
plt.grid()
plt.title('Individual Track 2D (Color ~ Time)')
plt.show()

# show color encoding
plt.figure(figsize=(9,2))
plt.scatter(df_ex.index, np.ones(df_ex.shape[0]), 
            c=df_ex.index, s=150)
plt.title('Color encoding')
plt.xticks(rotation=90)
ax = plt.gca()
ax.axes.yaxis.set_visible(False) # hide y-axis
plt.grid()
plt.show()

In [None]:
# plot track (2D) - dives only
df_ex_dives = df_ex[df_ex.is_dive==True]
plt.figure(figsize=(8,8))
plt.scatter(df_ex_dives.lon, df_ex_dives.lat, 
            c=df_ex_dives.index)
plt.grid()
plt.title('Individual Track 2D - Dives (Color ~ Time)')
plt.show()

In [None]:
# different visualization - highlight dives using color
plt.figure(figsize=(8,8))
plt.scatter(df_ex.lon, df_ex.lat, c=df_ex.is_dive)
plt.grid()
plt.title('Individual Track 2D (Color ~ Dive [yellow])')
plt.show()

In [None]:
plt.figure(figsize=(14,6))
plt.scatter(df_ex.index, df_ex.alt)
ax = plt.gca() # axes of the current figure (not the one from an earlier cell)
ax.xaxis.set_major_locator(plt.MaxNLocator(20)) # reduce number of x-labels
plt.xticks(rotation=90)
plt.xlabel('Index')
plt.ylabel('Altitude')
plt.grid()
plt.title('Individual Track - Altitude Profile')
plt.show()

In [None]:
# interactive 3d plot of track
df_ex['size']=1
fig = px.scatter_3d(df_ex, x='lon', y='lat', z='alt',
                    color=df_ex.index,
                    size='size',                    
                    size_max=10,
                    hover_data=['date_time'],
                    opacity=1)
fig.update_layout(title='Individual Track - Color~Time')
fig.show()

### Same plot, but use altitude for coloring:

In [None]:
# interactive 3d plot of track
df_ex['size']=1
fig = px.scatter_3d(df_ex, x='lon', y='lat', z='alt',
                    color=df_ex.alt,
                    size='size',                    
                    size_max=10,
                    hover_data=['date_time'],
                    opacity=1)
fig.update_layout(title='Individual Track - Color~Altitude')
fig.show()