# MAST 
## Movement Analysis Software for Telemetry Data

This project notebook guides the end user through a complete telemetry project, from setup to data import, false positive reduction, and 1D movement analysis.  The notebook and software were designed so that the end user can complete a telemetry project in multiple sessions.  Some cells need to be re-run every session, while others are run only once.  Please read and understand all directions before proceeding.

# Part 1: Project Setup

The steps in Part 1 need to be re-run every session.

## Import Modules

In [None]:
import os
import pandas as pd
import sys

Identify MAST software directory

In [None]:
sys.path.append(r"C:\Users\knebiolo\OneDrive - Kleinschmidt Associates, Inc\Software\mast\pymast")

Import MAST

In [None]:
from pymast.radio_project import radio_project
from pymast import formatter as formatter
import pymast

# Create a MAST Project

We designed MAST so that the end user can complete a telemetry project in multiple sessions.  After a work session is complete, there is no need to save the data or close the project.  Just shut down the notebook; the data has already been saved to the background HDF file.  When you start a new session, re-run this cell; MAST will not save over your previous session.  Please see the project ReadMe for instructions on creating the input data files.

In [None]:
project_dir = r"J:\1871\201\Calcs\York Haven"
db_name = 'york_haven'
detection_count = 5
duration = 1
tag_data = pd.read_csv(os.path.join(project_dir,'tblMasterTag.csv'))
receiver_data = pd.read_csv(os.path.join(project_dir,'tblMasterReceiver.csv'))
nodes_data = pd.read_csv(os.path.join(project_dir,'tblNodes.csv'))

# create a project
project = radio_project(project_dir,
                        db_name,
                        detection_count,
                        duration,
                        tag_data,
                        receiver_data,
                        nodes_data)
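Before creating the project, it can help to confirm that the three input files read above are actually present in the project directory. The following is a minimal sketch using only the standard library; `missing_inputs` is a hypothetical helper written for this notebook and is not part of MAST.

```python
import os
import tempfile

# input files read by the project cell above
REQUIRED_FILES = ["tblMasterTag.csv", "tblMasterReceiver.csv", "tblNodes.csv"]

def missing_inputs(project_dir, required=REQUIRED_FILES):
    """Return the required file names that are not present in project_dir."""
    return [name for name in required
            if not os.path.isfile(os.path.join(project_dir, name))]

# demonstration in a throwaway directory that holds only one of the three files
with tempfile.TemporaryDirectory() as demo_dir:
    open(os.path.join(demo_dir, "tblMasterTag.csv"), "w").close()
    demo_missing = missing_inputs(demo_dir)

print(demo_missing)
```

In a real session you would call `missing_inputs(project_dir)` and fix any file it reports before running the project cell.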

# Part 2: Data management and False Positive Reduction

## Import Raw Telemetry Data

This cell **does not** need to be rerun every session.  

To import raw telemetry data, update the parameters and run the cell once for every receiver in your project.

1. rec_id: the Receiver ID as written in the receiver data input file.
2. rec_type: the Receiver Type.  We currently have parsers for 'orion', 'ares', 'srx400', 'srx600', 'srx800', 'srx1200', and 'VR2'.
3. scan_time: if the Receiver Type is 'orion' or 'ares', enter the channel scan time in seconds, if any; otherwise keep 1.
4. channels: if the Receiver Type is 'orion' or 'ares', enter the number of channels, if any; otherwise keep 1.
5. antenna_to_rec_dict: both SigmaEight and Lotek associate one or more antennas with a single receiver.  This dictionary makes that association.

In [None]:
rec_id = 'T3'
rec_type = 'srx800'
training_dir = os.path.join(project_dir,'Data','Training_Files')
db_dir = os.path.join(project_dir,'%s.h5'%(db_name))
scan_time = 1.         
channels = 1
antenna_to_rec_dict = {'A0':rec_id}

project.telem_data_import(rec_id,
                          rec_type,
                          training_dir,
                          db_dir,
                          scan_time,
                          channels,
                          antenna_to_rec_dict)
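The parameter rules listed above can be checked before an import is attempted. This is a minimal sketch, not part of MAST: `check_import_params` is a hypothetical helper that encodes only what the list states (the supported receiver types, and that scan_time and channels stay at 1 unless the type is 'orion' or 'ares').

```python
SUPPORTED_TYPES = ('orion', 'ares', 'srx400', 'srx600', 'srx800', 'srx1200', 'VR2')
SCANNING_TYPES = ('orion', 'ares')   # the only types that use scan_time and channels

def check_import_params(rec_type, scan_time=1.0, channels=1):
    """Return a list of problems found in the import parameters (empty if none)."""
    problems = []
    if rec_type not in SUPPORTED_TYPES:
        problems.append("no parser for receiver type %r" % rec_type)
    elif rec_type not in SCANNING_TYPES and (scan_time != 1 or channels != 1):
        problems.append("scan_time and channels should stay at 1 for %r" % rec_type)
    return problems

ok = check_import_params('srx800', 1.0, 1)     # parameters used in the cell above
bad = check_import_params('srx800', 2.5, 4)    # scan settings on a non-scanning type
unknown = check_import_params('srx900')        # hypothetical unsupported type
print(ok, bad, unknown)
```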

### Undo Import

Sometimes things go wrong, and sometimes the parameters you entered are incorrect.  Use this cell to undo the import you just ran.

Note: **run this cell only when you need to.**

In [None]:
project.undo_import(rec_id)

## Create Training Data

This cell does **not** need to be run every work session.

To train data, update the following parameters for your telemetry project.  Repeat this cell until data from all receivers have been trained.

1. rec_id: the Receiver ID as written in the receiver data input file.
2. rec_type: the Receiver Type.  We currently have parsers for, and can train and classify, 'orion', 'ares', 'srx400', 'srx600', 'srx800', 'srx1200', and 'VR2'.

In [None]:
# set parameters and get a list of fish to iterate over
rec_id = 'T3'
rec_type = 'srx800'
fishes = project.get_fish(rec_id = rec_id)

# iterate over fish and train
for fish in fishes:
    project.train(fish, rec_id)

# generate summary statistics
project.training_summary(rec_type, site = [rec_id])

### Undo Training

**Run the following cell only when you need to.**

In [None]:
project.undo_training(rec_id)

## Classify a Receiver's Data

This cell can be run as many times as needed.  

To classify data, update the following parameters and run the cell.
1. rec_id: the Receiver ID as written in the receiver data input file.
2. rec_type: the Receiver Type.  We currently have parsers for, and can train and classify, 'orion', 'ares', 'srx400', 'srx600', 'srx800', 'srx1200', and 'VR2'.
3. class_iter: the Classification Iteration.  It is possible to reclassify a receiver's data and iterate until convergence.  Leave it as None (null) for the first iteration, then start with 1 and number sequentially by 1 until convergence.
4. threshold_ratio: the default threshold ratio is 1.0, which corresponds to the maximum a posteriori hypothesis.  A threshold ratio > 1.0 requires more weight of evidence for a record to be classified as true.  Likewise, a threshold ratio < 1.0 is less strict and may accept marginal detections as true.
5. fields: the likelihood function is à la carte.  It is possible to build a model with the following predictors: 'cons_length', 'hit_ratio', 'noise_ratio', 'series_hit', 'power', and 'lag_diff'.  Note that MAST requires at least 1 predictor to classify data.


In [None]:
# set parameters for the receiver to classify
rec_id = 'R020'
rec_type = 'orion'
class_iter = None  # None for the first iteration, then 1, 2, ... until convergence
threshold_ratio = 1.0  # 1.0 = MAP Hypothesis
# a-la carte likelihood, standard fields: ['hit_ratio', 'cons_length', 'noise_ratio', 'power', 'lag_diff']
likelihood = ['hit_ratio', 'cons_length', 'noise_ratio', 'power', 'lag_diff'] 

project.reclassify(project, rec_id, rec_type, threshold_ratio,likelihood)
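To make the role of the threshold ratio concrete, here is a minimal sketch of a posterior-ratio decision rule. It is an illustration under stated assumptions, not MAST's implementation: it assumes the predictors are treated as independent so that their likelihoods multiply, it assumes a strict "greater than" comparison, and all probabilities below are made-up numbers. `classify_detection` is a hypothetical name.

```python
import math

def classify_detection(prior_true, lik_true, lik_false, threshold_ratio=1.0):
    """Return True when the evidence for 'true detection' outweighs 'false positive'.

    lik_true / lik_false map each predictor name to P(observed value | class).
    """
    posterior_true = prior_true * math.prod(lik_true.values())            # unnormalised
    posterior_false = (1.0 - prior_true) * math.prod(lik_false.values())  # unnormalised
    # equivalent to posterior_true / posterior_false > threshold_ratio,
    # written without a division so a zero denominator cannot occur
    return posterior_true > threshold_ratio * posterior_false

# hypothetical likelihoods for one detection and two predictors
lik_true = {'hit_ratio': 0.6, 'power': 0.5}     # product = 0.30
lik_false = {'hit_ratio': 0.4, 'power': 0.5}    # product = 0.20

map_call = classify_detection(0.5, lik_true, lik_false, threshold_ratio=1.0)
strict_call = classify_detection(0.5, lik_true, lik_false, threshold_ratio=2.0)
print(map_call, strict_call)
```

With equal priors the posterior ratio here is about 1.5, so the detection is accepted at the default ratio of 1.0 but rejected when the ratio is raised to 2.0.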


### Undo Classification 

A lot can go wrong during classification: the likelihood model may have included conflicting predictors, the threshold ratio may have been too strict, or the iteration number may have been wrong.  In any case, run the following cell when you need a redo.

In [None]:
project.undo_classification(rec_id, class_iter = class_iter)

## Identify Bouts

The following steps (Bouts and Presences) are not required for a MAST project, but they are powerful tools that assist with modeling movement.
To identify bouts at one of the nodes in your project, update the following parameters.  We advise identifying bouts one node at a time because model fitting requires user interaction.  MAST will ask the researcher to identify the number of knots that may be present in the data.  The presence method can either use the result of the threshold method or accept a user-specified threshold value (float).

1. node: A Node in your project that may consist of one or more receivers.


In [None]:
# get nodes
node = 'R020'

# create a bout object
bout = pymast.bout(project, node, 2, 21600)  # the package was imported as pymast above
    
# Find the threshold
threshold = bout.fit_processes()

# calculate presences - or pass float
bout.presence(threshold)
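The idea behind a presence can be sketched without MAST: once a threshold (in seconds) is known, consecutive detections separated by no more than the threshold belong to the same presence, and a longer gap starts a new one. The sketch below is an illustration of that idea only, with made-up detection times; `split_into_presences` is a hypothetical helper and does not reproduce how MAST fits the threshold.

```python
def split_into_presences(epochs, threshold):
    """Group detection times (seconds) into presences.

    A new presence starts whenever the lag since the previous detection
    exceeds the threshold. Returns (first_time, last_time, n_detections) tuples.
    """
    groups = []
    for t in sorted(epochs):
        if groups and t - groups[-1][-1] <= threshold:
            groups[-1].append(t)      # same presence as the previous detection
        else:
            groups.append([t])        # gap too long: start a new presence
    return [(g[0], g[-1], len(g)) for g in groups]

# hypothetical detection times for one fish at one node, in seconds
detections = [0, 10, 25, 40, 4000, 4010, 4030, 9000]
presences = split_into_presences(detections, threshold=600)
print(presences)
```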

### Undo Bouts and Presence

The bout process involves trial and error.  To undo, run the following cell only when you need to.

In [None]:
project.undo_bouts(node)

## Reduce Overlap

With presences at receivers, it is possible to reduce overlap between receivers and put a fish in an exact place and time.  For example, it is possible to place a dipole receiver so its detection range is completely within the area covered by a large aerial Yagi.  When a fish is present at the dipole receiver and the Yagi receiver at the same time, we can remove those overlapping detections at the Yagi receiver.  This is useful for modeling movement from a large area into a discrete location, such as from a tailrace to an upstream passage entrance.

The overlap function requires the end user to identify the following parameters:
1. edges: List of tuples (network edges) that represent parent:child or Yagi:dipole relationships in your data
2. nodes: List of nodes in your project, note nodes may be made up of one or more receivers

In [None]:
# create edges showing parent:child relationships for nodes in network
edges = [('R010','R013'),('R010','R014'),('R010','R015'),('R010','R016'),('R010','R017'),('R010','R018'),
          ('R019','R013'),('R019','R014'),('R019','R015'),('R019','R016'),('R019','R017'),('R019','R018'),
          ('R020','R013'),('R020','R014'),('R020','R015'),('R020','R016'),('R020','R017'),('R020','R018')]

nodes = ['R010','R019','R020','R013','R014','R015','R016','R017','R018']
    
# create an overlap object and apply nested doll algorithm
doll = pymast.overlap_reduction(nodes, edges, project)  # the package was imported as pymast above
doll.nested_doll()
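The parent:child logic described above can be illustrated with plain Python. This is a simplified sketch with made-up times, not the nested doll algorithm itself: it drops every detection at the parent (Yagi) that falls inside a presence at the child (dipole). `remove_overlap` is a hypothetical name.

```python
def remove_overlap(parent_detections, child_presences):
    """Keep only the parent detections that fall outside every child presence.

    child_presences is a list of (start, end) times, inclusive on both ends.
    """
    return [t for t in parent_detections
            if not any(start <= t <= end for start, end in child_presences)]

# hypothetical detection times (seconds) at a Yagi and one presence at a nested dipole
yagi_detections = [100, 200, 300, 400, 500]
dipole_presences = [(250, 450)]

kept = remove_overlap(yagi_detections, dipole_presences)
print(kept)
```

The detections at 300 and 400 seconds are removed because the fish was known to be at the dipole during that interval.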


### Undo Overlap

In [None]:
project.undo_overlap()

## Make Recaptures Table

The last step in the data management section is to aggregate data into a recaptures table.

In [None]:
project.make_recaptures_table()

### Undo Recaptures Table

In [None]:
project.undo_recaptures()

# Part 3: Analysis of Movement

The following cells assist researchers with analyzing movement between receivers.  It is useful to reconstruct the receivers in your project into a network schematic that describes the possible movement pathways between receivers; movement is therefore treated as 1D.  MAST has functions that can prepare data for Time to Event Analysis with Competing Risks, Multi-State Markov Models, Cormack-Jolly-Seber Mark Recapture, and Live Recapture Dead Recovery Mark Recapture.

## Model 1D Movement with Competing Risks and Multi-State Markov Models

The first step in modeling multi-state models with a Time to Event framework is to associate project nodes with states in the model.  This is done with the node_to_state dictionary.

In [None]:
#%% create models using a Time to Event Framework
    
# what is the Node to State relationship - use Python dictionary
node_to_state = {'R001':1,'R002':1,                   # upstream
                  'R012':2,                            # forebay
                  'R013':3,'R015':3,'R016':3,'R017':3, # powerhouse
                  'R018':4,                            # sluice
                  'R003':5,                            # east channel up
                  'R007':6,                            # east channel down
                  'R008':7,                            # east channel dam
                  'R009':8,                            # NLF
                  'R010':9,'R019':9,                   # tailrace
                  'R011':10,                           # downstream
                  'R004':11,'R005':11}                 # downstream 2
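A quick way to proofread a node-to-state dictionary is to invert it and list the nodes that belong to each state, which makes a mistyped state number easy to spot. The sketch below is a hypothetical helper (`nodes_by_state` is not part of MAST), shown on a small subset of the dictionary above.

```python
from collections import defaultdict

def nodes_by_state(node_to_state):
    """Invert a node -> state dictionary into state -> sorted list of nodes."""
    grouped = defaultdict(list)
    for node, state in node_to_state.items():
        grouped[state].append(node)
    return {state: sorted(nodes) for state, nodes in sorted(grouped.items())}

# subset of the dictionary above
example = {'R001': 1, 'R002': 1,   # upstream
           'R012': 2,              # forebay
           'R013': 3, 'R015': 3}   # powerhouse

grouped = nodes_by_state(example)
for state, nodes in grouped.items():
    print(state, nodes)
```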

Then we create a Time to Event data object.  Note that there are a number of optional arguments that can be passed to the time to event data object.  If initial_state_release is set to True, a release state (state: 0) is added to the model, which makes it possible to model movement from the release location as well as determine fall back.  If last_presence_time0 is set to True, the last detection at the initial receiver is used as the starting time for the analysis of movement.  When modeling migratory movement of American Shad, for example, adult fish can survive spawning, so a fish can be recaptured at the same receiver on its way up and on its way down.  If you are modeling downstream movement, you want to model movement from when the fish was last at the most upstream receiver.  cap_loc and rel_loc are optional arguments that filter the data so the model only looks at specimens from specific capture and release locations.  Finally, the species argument restricts model creation to a single species if more than one was tagged in your study.

In [None]:
tte = formatter.time_to_event(node_to_state,
                              project,
                              initial_state_release = False, 
                              last_presence_time0 = False, 
                              cap_loc = None,
                              rel_loc = None, 
                              species = None)

Then we perform data preparation.  When time_dependent_covariates = True, MAST creates an output file that can be joined to time series data.  The bucket_length_min argument specifies the number of minutes between each time series observation.  unknown_state places fish into a new 'unknown' state if they went missing before reaching their goal by the study's completion.  Overlap may still exist between receivers, and adjacency_filter removes the movements that result from it.  This commonly happens when forebay receivers pick up fish in the tailrace.  When looking at transitions, it appears that a fish has moved from the tailrace to the forebay.  To the algorithm, the forebay detections look like real detections, but they must be removed when we model movement.  The relationships in the filter specify the parent:child relationships, or to:from movements, that are illegal.

In [None]:
tte.data_prep(project,
              time_dependent_covariates = True,
              unknown_state = None,
              bucket_length_min = 15,
              adjacency_filter = [('R010','R013'),('R010','R014'),('R010','R015'),('R010','R016'),('R010','R017'),('R010','R018'),
                                  ('R019','R013'),('R019','R014'),('R019','R015'),('R019','R016'),('R019','R017'),('R019','R018'),
                                  ('R020','R013'),('R020','R014'),('R020','R015'),('R020','R016'),('R020','R017'),('R020','R018')])
# generate a summary
tte.summary()
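To picture what a bucket length of 15 minutes means, the sketch below splits a time interval into evenly spaced observation times, which is the kind of index that time series covariates can be joined to. It is an illustration only: `time_buckets` is a hypothetical helper, the dates are made up, and the layout of the file MAST actually writes may differ.

```python
from datetime import datetime, timedelta

def time_buckets(start, end, bucket_length_min=15):
    """Return the start time of every bucket between start (inclusive) and end (exclusive)."""
    step = timedelta(minutes=bucket_length_min)
    buckets = []
    t = start
    while t < end:
        buckets.append(t)
        t += step
    return buckets

# hypothetical one-hour interval spent in a state
buckets = time_buckets(datetime(2020, 5, 1, 12, 0), datetime(2020, 5, 1, 13, 0))
for b in buckets:
    print(b.isoformat())
```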

The next cell creates a time to event dataset without covariates.  If all you're after is a Kaplan-Meier estimate, this is the one to run.

In [None]:
tte.data_prep(project,
              adjacency_filter = [('R010','R013'),('R010','R014'),('R010','R015'),('R010','R016'),('R010','R017'),('R010','R018'),
                                  ('R019','R013'),('R019','R014'),('R019','R015'),('R019','R016'),('R019','R017'),('R019','R018'),
                                  ('R020','R013'),('R020','R014'),('R020','R015'),('R020','R016'),('R020','R017'),('R020','R018')])
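For readers unfamiliar with the Kaplan-Meier estimator, here is a minimal pure-Python sketch of the calculation on made-up data. It is not part of MAST and does not read MAST's output files; in practice you would pass the prepared dataset to a survival analysis package. `kaplan_meier` is a hypothetical name, durations are in arbitrary time units, and an event flag of 1 means the movement was observed while 0 means the fish was censored.

```python
def kaplan_meier(durations, events):
    """Return [(time, survival_probability)] at each time where an event occurred."""
    data = sorted(zip(durations, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_leaving = 0
        # gather every fish whose duration ends at time t
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_leaving += 1
            i += 1
        if n_events:
            survival *= 1 - n_events / at_risk
            curve.append((t, survival))
        at_risk -= n_leaving   # events and censored fish both leave the risk set
    return curve

# hypothetical data: five fish, two of them censored
durations = [2, 3, 3, 5, 8]
events = [1, 1, 0, 1, 0]
km_curve = kaplan_meier(durations, events)
print(km_curve)
```

Here the estimated probability of not yet having moved drops to roughly 0.8, 0.6, and 0.3 at times 2, 3, and 5.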


## Cormack Jolly Seber Mark Recapture

The Cormack-Jolly-Seber (CJS) model is appropriate for a simple analysis of fish ladder effectiveness.  The following cells produce the detection history INP file that can be used with MARK or RMark.  The first step is to identify some parameters and associate study receivers with recapture occasions in the model.  We do that with a dictionary.

In [None]:
model_name = "york_haven"
output_ws = os.path.join(project_dir, 'Output')

# what is the Receiver to Recapture Occasion relationship - use Python dictionary
receiver_to_recap = {'R001':'R01','R002':'R01',
                     'R003':'R02','R004':'R04','R005':'R04','R006':'R02',
                     'R007':'R02','R008':'R02','R009':'R02','R010':'R02',
                     'R011':'R03','R012':'R02','R013':'R02','R014':'R02',
                     'R015':'R02','R016':'R02','R017':'R02',#'R018':'R02',
                     'R019':'R03','R020':'R03',}
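The dictionary above determines which recapture occasion each receiver contributes to. The sketch below shows, on a subset of that dictionary, how one fish's detections could collapse into a string of 1s and 0s with one character per occasion. It is an illustration only: `detection_history` is a hypothetical helper, the fish's detections are made up, and the trailing ` 1;` follows the encounter-history convention commonly used in MARK input files rather than anything confirmed about MAST's own output.

```python
def detection_history(detected_receivers, receiver_to_recap, occasions):
    """Return a string with '1' for each occasion where the fish was detected, else '0'."""
    seen = {receiver_to_recap[r] for r in detected_receivers if r in receiver_to_recap}
    return "".join("1" if occasion in seen else "0" for occasion in occasions)

# subset of the dictionary above
receiver_to_recap = {'R001': 'R01', 'R002': 'R01',
                     'R003': 'R02',
                     'R011': 'R03',
                     'R004': 'R04'}
occasions = ['R01', 'R02', 'R03', 'R04']

# hypothetical fish detected at three receivers, never at an 'R02' receiver
history = detection_history(['R001', 'R011', 'R004'], receiver_to_recap, occasions)
inp_line = history + " 1;"   # history, then a frequency of one fish
print(inp_line)
```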

The next cell produces a CJS data object and input files for analysis with Mark Recapture software.

In [None]:
# Step 1, create CJS data object
cjs = formatter.cjs_data_prep(receiver_to_recap, project, initial_recap_release = False)

# Step 2, Create input file for MARK
cjs.input_file(model_name,output_ws)
cjs.inp.to_csv(os.path.join(output_ws,model_name + '.csv'), index = False)