# Movebank Data Analysis Attempt

After downloading the data, I found that the package came with two CSV files: a reference table and an Argos table. To import them, we first need to import pandas (and numpy, for later use).

In [1]:
import pandas as pd
import numpy as np

Read the reference table

In [2]:
whale_ref = pd.read_csv("../data/Blue_whales_reference.csv")
whale_ref.head()

Unnamed: 0,tag-id,animal-id,animal-taxon,deploy-on-date,deploy-off-date,animal-life-stage,animal-sex,attachment-type,deploy-on-latitude,deploy-on-longitude,deployment-id,manipulation-type,study-site,tag-manufacturer-name,tag-model,tag-readout-method
0,1993CA-ST6-10823,1993CA-Bmu-10823,Balaenoptera musculus,1993-08-28 18:20:00.000,1993-09-01 22:35:12.000,adult,,implant,37.012,-122.412,1993CA-10823,none,"Pt. Ano Nuevo, CA","Telonics, Inc",ST6,satellite
1,1993CA-ST6-10833,1993CA-Bmu-10833,Balaenoptera musculus,1993-08-28 18:54:00.000,1993-08-28 21:45:56.000,adult,,implant,37.022,-122.415,1993CA-10833,none,"Pt. Ano Nuevo, CA","Telonics, Inc",ST6,satellite
2,1993CA-ST6-00834,1993CA-Bmu-00834,Balaenoptera musculus,1993-08-29 01:13:00.000,1993-09-05 13:41:06.000,adult,,implant,37.058,-122.433,1993CA-00834,none,"Pt. Ano Nuevo, CA","Telonics, Inc",ST6,satellite
3,1993CA-ST6-10836,1993CA-Bmu-10836,Balaenoptera musculus,1993-08-31 23:40:00.000,1993-09-04 16:35:40.000,adult,,implant,37.198,-122.773,1993CA-10836,none,"Pigeon Point, CA","Telonics, Inc",ST6,satellite
4,1994CA-ST10-10821,1994CA-Bmu-10821,Balaenoptera musculus,1994-09-13 20:31:00.000,1994-09-14 00:23:34.000,adult,,implant,37.6,-123.0,1994CA-10821,none,"Farallon Islands, CA","Telonics, Inc",ST10,satellite


In [3]:
whale_ref.shape

(143, 16)

The reference table records information about each blue whale monitored. It has 143 rows and 16 columns. Let's figure out what each column is about.

In [4]:
whale_ref.columns

Index(['tag-id', 'animal-id', 'animal-taxon', 'deploy-on-date',
       'deploy-off-date', 'animal-life-stage', 'animal-sex', 'attachment-type',
       'deploy-on-latitude', 'deploy-on-longitude', 'deployment-id',
       'manipulation-type', 'study-site', 'tag-manufacturer-name', 'tag-model',
       'tag-readout-method'],
      dtype='object')

Combined with the column information provided in the README.txt file, a few columns provide information that will be useful to us:
- *animal-id*: an individual identifier for each subject monitored
- *deploy-on-date* & *deploy-off-date*: the start and end time of monitoring
- *deploy-on-latitude* & *deploy-on-longitude*: the start location of monitoring
- *study-site*: name of the deployment site/facility

Before we downsize the table to a more concise version, let's check whether some standard measures (i.e. the other, less interesting columns) are uniform across all subjects recorded.

In [5]:
whale_ref.groupby(['animal-life-stage','animal-taxon','attachment-type','manipulation-type','tag-readout-method']).count()

animal-life-stage,animal-taxon,attachment-type,manipulation-type,tag-readout-method,tag-id,animal-id,deploy-on-date,deploy-off-date,animal-sex,deploy-on-latitude,deploy-on-longitude,deployment-id,study-site,tag-manufacturer-name,tag-model
adult,Balaenoptera musculus,implant,none,satellite,143,143,143,143,143,143,143,143,143,143,143
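
A more direct way to run the same kind of check is to count the distinct values in each column: a column with exactly one distinct value is uniform. The sketch below is only an illustration. It uses a tiny stand-in table with made-up rows rather than the real file, and `constant_columns` is a hypothetical helper name.

```python
import pandas as pd

# Tiny stand-in for whale_ref; the rows are illustrative, not read from the real file.
demo_ref = pd.DataFrame({
    "animal-taxon": ["Balaenoptera musculus"] * 3,
    "attachment-type": ["implant"] * 3,
    "tag-model": ["ST6", "ST6", "ST10"],
})

def constant_columns(df):
    """Return the names of columns holding a single distinct value (NaN is ignored)."""
    n_distinct = df.nunique()  # number of distinct non-null values per column
    return n_distinct[n_distinct == 1].index.tolist()

uniform = constant_columns(demo_ref)
print(uniform)
```

On the real table, the equivalent call would be `constant_columns(whale_ref)`, assuming the table has been loaded as in the cells above.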


In [6]:
whale_ref.groupby(['tag-manufacturer-name','tag-model']).count()

tag-manufacturer-name,tag-model,tag-id,animal-id,animal-taxon,deploy-on-date,deploy-off-date,animal-life-stage,animal-sex,attachment-type,deploy-on-latitude,deploy-on-longitude,deployment-id,manipulation-type,study-site,tag-readout-method
"Telonics, Inc",ST10,15,15,15,15,15,15,15,15,15,15,15,15,15,15
"Telonics, Inc",ST15,109,109,109,109,109,109,109,109,109,109,109,109,109,109
"Telonics, Inc",ST21,6,6,6,6,6,6,6,6,6,6,6,6,6,6
"Telonics, Inc",ST6,12,12,12,12,12,12,12,12,12,12,12,12,12,12
"Wildlife Computers, Inc",MK10,1,1,1,1,1,1,1,1,1,1,1,1,1,1


Although most of the housekeeping columns hold a single value for all rows (subjects), __the tag model is not the same for every subject__: 109 of the 143 tags are ST15, and the rest are other models, including one MK10 from a different manufacturer. __Will this create problems for the data collected?__ We will find out later!
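
To put numbers on how uneven the tag models are, the counts from the grouped output can be turned into shares. This is a minimal sketch that types the counts in by hand instead of reading the file. The names `model_counts`, `share` and `summary` are introduced here for illustration only.

```python
import pandas as pd

# Counts per tag model, copied by hand from the grouped output in this notebook.
model_counts = pd.Series(
    {"ST10": 15, "ST15": 109, "ST21": 6, "ST6": 12, "MK10": 1},
    name="n_tags",
)

total = model_counts.sum()               # total number of deployments
share = (model_counts / total).round(3)  # fraction of deployments per model

summary = pd.DataFrame({"n_tags": model_counts, "share": share})
summary = summary.sort_values("n_tags", ascending=False)
print(summary)
```

On the loaded table, `whale_ref['tag-model'].value_counts(normalize=True)` should give the same shares without typing the counts by hand.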

Now let's clean up our reference table for further use. 

In [21]:
whale_ref_clean = whale_ref[["animal-id","deploy-on-date","deploy-off-date","deploy-on-latitude",
                             "deploy-on-longitude","study-site","tag-manufacturer-name","tag-model"]]
whale_ref_clean.head(3)

Unnamed: 0,animal-id,deploy-on-date,deploy-off-date,deploy-on-latitude,deploy-on-longitude,study-site,tag-manufacturer-name,tag-model
0,1993CA-Bmu-10823,1993-08-28 18:20:00.000,1993-09-01 22:35:12.000,37.012,-122.412,"Pt. Ano Nuevo, CA","Telonics, Inc",ST6
1,1993CA-Bmu-10833,1993-08-28 18:54:00.000,1993-08-28 21:45:56.000,37.022,-122.415,"Pt. Ano Nuevo, CA","Telonics, Inc",ST6
2,1993CA-Bmu-00834,1993-08-29 01:13:00.000,1993-09-05 13:41:06.000,37.058,-122.433,"Pt. Ano Nuevo, CA","Telonics, Inc",ST6


In [20]:
whale_ref_clean.dtypes

animal-id                 object
deploy-on-date            object
deploy-off-date           object
deploy-on-latitude       float64
deploy-on-longitude      float64
study-site                object
tag-manufacturer-name     object
tag-model                 object
dtype: object

In [24]:
# Work on an explicit copy so the assignments below do not trigger SettingWithCopyWarning
whale_ref_clean = whale_ref_clean.copy()
whale_ref_clean['deploy-on-date'] = pd.to_datetime(whale_ref_clean['deploy-on-date'])
whale_ref_clean['deploy-off-date'] = pd.to_datetime(whale_ref_clean['deploy-off-date'])
whale_ref_clean.dtypes

animal-id                        object
deploy-on-date           datetime64[ns]
deploy-off-date          datetime64[ns]
deploy-on-latitude              float64
deploy-on-longitude             float64
study-site                       object
tag-manufacturer-name            object
tag-model                        object
dtype: object
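
With the two date columns stored as `datetime64[ns]`, date arithmetic becomes possible, for example computing how long each tag stayed deployed. The sketch below rebuilds the first two deployments from the reference table preview so that it runs on its own. The column name `deploy-duration` is a hypothetical addition, not part of the dataset.

```python
import pandas as pd

# First two deployments, copied from the reference table preview.
demo = pd.DataFrame({
    "animal-id": ["1993CA-Bmu-10823", "1993CA-Bmu-10833"],
    "deploy-on-date": ["1993-08-28 18:20:00.000", "1993-08-28 18:54:00.000"],
    "deploy-off-date": ["1993-09-01 22:35:12.000", "1993-08-28 21:45:56.000"],
})

# Convert the text columns to datetimes, as done for whale_ref_clean.
for col in ["deploy-on-date", "deploy-off-date"]:
    demo[col] = pd.to_datetime(demo[col])

# Subtracting two datetime columns yields a timedelta column.
# "deploy-duration" is a hypothetical column name introduced for this example.
demo["deploy-duration"] = demo["deploy-off-date"] - demo["deploy-on-date"]
print(demo[["animal-id", "deploy-duration"]])
```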

Now let's move on to the juicier table! We repeat the same procedure we applied to the reference table.

In [None]:
whale_data = pd.read_csv("../data/Blue_whales_Argos.csv")
whale_data.head(3)

In [None]:
whale_data.shape

In [None]:
whale_data.columns

Combined with the column information provided in the README.txt file, a few columns provide information that will be useful to us:
- *timestamp*: the time at which a sensor measurement was taken
- *location-long* & *location-lat*: the longitude and latitude of the recorded location
- *individual-local-identifier*: same as "animal-id" in the reference table
- *manually-marked-outlier*: marked TRUE if "visible" is marked FALSE
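
Because *individual-local-identifier* matches *animal-id*, the two tables can be joined on that key. The sketch below shows one possible way to do this with small stand-in frames. The IDs and tag models come from the reference table preview, while the timestamps and coordinates are made up for illustration.

```python
import pandas as pd

# Stand-in for the reference table (IDs and tag models taken from the preview).
demo_ref = pd.DataFrame({
    "animal-id": ["1993CA-Bmu-10823", "1993CA-Bmu-10833"],
    "tag-model": ["ST6", "ST6"],
})

# Stand-in for the Argos table; timestamps and coordinates are made up.
demo_argos = pd.DataFrame({
    "individual-local-identifier": [
        "1993CA-Bmu-10823", "1993CA-Bmu-10823", "1993CA-Bmu-10833",
    ],
    "timestamp": pd.to_datetime(
        ["1993-08-29 00:00:00", "1993-08-30 00:00:00", "1993-08-28 20:00:00"]
    ),
    "location-lat": [37.0, 36.9, 37.0],
    "location-long": [-122.4, -122.5, -122.4],
})

# A left join keeps every location record and attaches the deployment metadata.
merged = demo_argos.merge(
    demo_ref,
    left_on="individual-local-identifier",
    right_on="animal-id",
    how="left",
    validate="many_to_one",  # raise if animal-id is duplicated in the reference frame
)
print(merged[["animal-id", "timestamp", "tag-model"]])
```

The `validate="many_to_one"` argument is a safety check: if an animal appears more than once in the real reference table, the merge raises an error instead of silently duplicating location rows.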

Some other technical measures associated with the satellite signal reception process may help us gauge and visualize the reliability of the data:
- *argos:best-level*: Best signal strength
- *argos:calcul-freq*: Calculated frequency
- *argos:iq*: indicates transmitter oscillator frequency drift between two satellite passes
- *argos:nb-mes-120*: The number of messages received by the satellite at a signal strength greater than -120 decibels
- *sensor-type*: type of tracking sensor used (also appeared in reference table)
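
One way these columns could be used is to screen out doubtful records before plotting. The sketch below is a rough illustration on made-up rows. It assumes that the outlier flag is read as `True` or left missing, and the message-count threshold of 2 is arbitrary, not a recommendation from the README.

```python
import numpy as np
import pandas as pd

# Made-up records; storing the flag as True/missing is an assumption about the CSV.
demo_argos = pd.DataFrame({
    "location-lat": [37.0, 12.3, 36.9],
    "location-long": [-122.4, 45.6, -122.5],
    "manually-marked-outlier": [np.nan, True, np.nan],
    "argos:nb-mes": [4, 1, 6],
})

# .eq(True) treats a missing flag as "not an outlier".
is_outlier = demo_argos["manually-marked-outlier"].eq(True)
kept = demo_argos.loc[~is_outlier]

# Optional extra screen on the number of messages (the threshold is arbitrary).
kept = kept.loc[kept["argos:nb-mes"] >= 2]
print(kept)
```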



After identifying what we need and do not need, let us do some cleanup and renaming.

In [None]:
whale_data_clean = whale_data[['individual-local-identifier','timestamp','location-lat', 'location-long', 
                               'manually-marked-outlier','argos:best-level', 'argos:calcul-freq', 'argos:iq',
                               'argos:nb-mes','argos:nb-mes-120', 'sensor-type']]
whale_data_clean.columns = ['animal-id','timestamp','location-lat', 'location-long', 
                               'outlier','argos:best-level', 'argos:calcul-freq', 'argos:iq',
                               'argos:no-mes-rec','argos:no-mes-rec-120', 'sensor-type']
whale_data_clean.head()
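
Assigning a full list to `.columns` works, but it silently mislabels data if the list order ever drifts from the selection order. An alternative is `rename` with an explicit old-to-new mapping, sketched below on a one-row stand-in with made-up values. The new names are the ones used in the cell above.

```python
import pandas as pd

# One-row stand-in with a subset of the Argos columns; values are made up.
demo_argos = pd.DataFrame({
    "individual-local-identifier": ["1993CA-Bmu-10823"],
    "timestamp": ["1993-08-29 00:00:00"],
    "manually-marked-outlier": [None],
    "argos:nb-mes": [4],
    "argos:nb-mes-120": [0],
})

# Old name -> new name; columns not listed here keep their original names.
new_names = {
    "individual-local-identifier": "animal-id",
    "manually-marked-outlier": "outlier",
    "argos:nb-mes": "argos:no-mes-rec",
    "argos:nb-mes-120": "argos:no-mes-rec-120",
}

# rename returns a new DataFrame, so the original frame is left untouched.
demo_clean = demo_argos.rename(columns=new_names)
print(demo_clean.columns.tolist())
```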

In [None]:
whale_data.groupby(["manually-marked-outlier","individual-local-identifier"]).count()

In [None]:
whale_data.dtypes

In [None]:
whale_data[[]]