# stryke

Individual-based Monte Carlo model simulating fish entrainment through a hydroelectric facility. For information on setting up a project spreadsheet, please refer to the [README](https://github.com/knebiolo/stryke/blob/master/README.md).  This project notebook guides the end user through the analytical phases of an entrainment impact assessment.  If you're in JupyterLab, use the table of contents for navigation.

# Connect to software
The first step is to connect this notebook to stryke, which can be found in the directory you previously cloned from GitHub.  Don't know how to clone with GitHub Desktop?  [The folks over at GitHub have you covered](https://docs.github.com/en/desktop/adding-and-cloning-repositories/cloning-and-forking-repositories-from-github-desktop).

In [1]:
directory = r"C:\Users\knebiolo\Desktop\Stryke\stryke\Stryke"

import sys
sys.path.append(directory)
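
A mistyped path in the cell above only shows up later, when `import stryke` fails. The following is a minimal sketch of a guard that fails early instead; `add_to_sys_path` is a hypothetical helper written for this notebook, not part of stryke, and the example call uses the current working directory as a stand-in for your cloned stryke folder.

```python
import sys
from pathlib import Path


def add_to_sys_path(directory):
    """Append a directory to sys.path once, failing early if it does not exist.

    Hypothetical helper for illustration; stryke does not provide it.
    """
    path = Path(directory)
    if not path.is_dir():
        raise FileNotFoundError(f"Not a directory: {path}")
    resolved = str(path.resolve())
    # Avoid adding duplicate entries when the cell is re-run.
    if resolved not in sys.path:
        sys.path.append(resolved)
    return resolved


# Stand-in for the cloned stryke directory (assumption: replace with your own path).
added = add_to_sys_path(Path.cwd())
print(f"Added to sys.path: {added}")
```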

Now that we have connected to stryke, let's import it and set up the remaining notebook environment.

In [2]:
import stryke
import os
%matplotlib inline

#### Fit Entrainment Rates
If you do not have existing empirical data for your facility of interest, stryke can query the EPRI entrainment database and develop entrainment rates for you.  To fit a distribution, pass a list of arguments to stryke.  The arguments, their data types, and explanations are listed below.  The following example queries the EPRI database for a sample of entrainment observations of Catostomidae in winter within the Great Lakes watershed, leaving Potato Rapids out of the sample:

`Family = 'Catostomidae', Month = [1,2,12], HUC02= [4], NIDID= 'WI00757'`

| Parameter                  | Data Type | Comment                                                                               |
|----------------------------|-----------|---------------------------------------------------------------------------------------|
| states                     | String    | (not required) State abbreviations to filter the data                                 |
| plant_cap                  | String    | (not required) Plant capacity (cfs) with a direction for filtering (> or <=)          |
| Family, Genus, Species     | String    | (at least one required) Taxonomic classifications                                     |
| HUC02, HUC04, HUC06, HUC08 | String    | (not required) Hydrologic Unit Codes for geographic filtering, leading zeros required |
| NIDID                      | String    | (not required) National Inventory of Dams identifier, used to filter out a facility   |
| River                      | String    | (not required) River name for filtering                                               |
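
To make the filter semantics concrete, here is a small sketch of how the example query could select records. It is an illustration only: the record layout, the `filter_records` helper, the placeholder NIDIDs (`XX...`), and the rate values are all made up for this example and do not reflect the EPRI database schema or stryke's internals. Note that `NIDID` excludes a facility rather than selecting it.

```python
# Synthetic records for illustration; not real EPRI observations.
records = [
    {"Family": "Catostomidae", "Month": 1, "HUC02": 4, "NIDID": "WI00757", "rate": 0.8},
    {"Family": "Catostomidae", "Month": 2, "HUC02": 4, "NIDID": "XX00001", "rate": 0.3},
    {"Family": "Catostomidae", "Month": 6, "HUC02": 4, "NIDID": "XX00001", "rate": 1.1},
    {"Family": "Centrarchidae", "Month": 12, "HUC02": 4, "NIDID": "XX00001", "rate": 0.2},
    {"Family": "Catostomidae", "Month": 12, "HUC02": 2, "NIDID": "XX00002", "rate": 0.5},
    {"Family": "Catostomidae", "Month": 12, "HUC02": 4, "NIDID": "XX00003", "rate": 0.6},
]


def filter_records(records, Family=None, Month=None, HUC02=None, NIDID=None):
    """Hypothetical filter mirroring the example query's arguments."""
    selected = []
    for rec in records:
        if Family is not None and rec["Family"] != Family:
            continue  # wrong taxonomic family
        if Month is not None and rec["Month"] not in Month:
            continue  # outside the requested months
        if HUC02 is not None and rec["HUC02"] not in HUC02:
            continue  # outside the requested watershed
        if NIDID is not None and rec["NIDID"] == NIDID:
            continue  # this facility is left out of the sample
        selected.append(rec)
    return selected


sample = filter_records(
    records, Family="Catostomidae", Month=[1, 2, 12], HUC02=[4], NIDID="WI00757"
)
print([rec["NIDID"] for rec in sample])
```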

When the next cell is run, stryke will return a figure with four histograms that depict natural-log-transformed entrainment rates (one observed, three simulated).  Stryke fits a Log Normal, Weibull, and Pareto distribution to the returned data and produces a p-value from a Kolmogorov-Smirnov test for each, where H0 = no difference between the observed and simulated histograms.  The distribution with the largest p-value best describes trends in the observed data.  The query above produced the figure below.  In this instance, the Log Normal had the highest p-value and is most like the observed data.  For most queries, the Log Normal will be the best distribution.  The Weibull works when there are fewer observations with low rates, and the Pareto only works in special cases when observations are monotonically decreasing after log transforming them.

<img src="https://github.com/knebiolo/stryke/assets/61742537/1b57783c-0913-40d9-913a-4f45ee2ab8a0" width="400" height="auto"/>
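
For readers who want to see the comparison spelled out, the sketch below fits a Log Normal to a sample, simulates from the fit, and compares observed and simulated values with a two-sample Kolmogorov-Smirnov test. It is not stryke's implementation: the observed data are synthetic, `ks_two_sample` is a hypothetical helper, and the p-value uses the standard asymptotic approximation, so treat it as an illustration of the idea only.

```python
import math
import random
import statistics


def ks_two_sample(a, b):
    """Return (D, approximate p-value) for a two-sample Kolmogorov-Smirnov test."""
    a, b = sorted(a), sorted(b)
    n, m = len(a), len(b)
    i = j = 0
    d = 0.0
    # Walk both sorted samples and track the largest gap between the two ECDFs.
    while i < n and j < m:
        x = min(a[i], b[j])
        while i < n and a[i] <= x:
            i += 1
        while j < m and b[j] <= x:
            j += 1
        d = max(d, abs(i / n - j / m))
    en = math.sqrt(n * m / (n + m))
    lam = (en + 0.12 + 0.11 / en) * d
    if lam < 0.3:
        # The alternating series converges poorly here; the p-value is effectively 1.
        return d, 1.0
    p = 2.0 * sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
                  for k in range(1, 101))
    return d, min(max(p, 0.0), 1.0)


random.seed(42)
# Synthetic stand-in for observed entrainment rates (assumption, not EPRI data).
observed = [random.lognormvariate(-1.0, 1.2) for _ in range(200)]

# Fit a Log Normal by taking the mean and standard deviation of the log rates.
logs = [math.log(x) for x in observed]
mu, sigma = statistics.fmean(logs), statistics.stdev(logs)

# Simulate from the fitted distribution and compare with the observations.
simulated = [random.lognormvariate(mu, sigma) for _ in range(1000)]
d, p = ks_two_sample(observed, simulated)
print(f"mu={mu:.3f} sigma={sigma:.3f} D={d:.3f} p={p:.3f}")
```

A large p-value means the test found no evidence that the observed and simulated samples differ, which is the sense in which the text above treats the highest p-value as the best fit.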



In [None]:
#%% Pass EPRI filter, fit distributions
fish = stryke.epri(Genus = 'Micropterus', Month = [3,4,5], HUC02= [2])
fish.ParetoFit()
fish.LogNormalFit()
fish.WeibullMinFit()
fish.plot()
fish.LengthSummary()

If, and only if, you are satisfied with the distribution's fit, run the next cell.  This will arrange the parameters so that you can copy and paste them directly onto the **Population tab**.

In [None]:
fish.summary_output(directory, dist = 'Log Normal')

# Running a Simulation

Once the spreadsheet interface is complete, we can run a simulation.  First identify the name of the spreadsheet interface and its directory, then run the next cell to start stryke.

In [3]:
# identify the project directory
proj_dir = r"C:\Users\knebiolo\Desktop\Stryke\stryke\Spreadsheet Interface"

# Identify spreadsheet interface name
wks = 'Cabot_Beta_Test.xlsx'
# Identify spreadsheet interface directory
wks_dir = os.path.join(proj_dir,wks)
# Create and Run a simulation.
simulation = stryke.simulation(proj_dir,wks, output_name = 'Cabot_Beta_Test')

simulation.run()
simulation.summary()

Requested data from https://waterservices.usgs.gov/nwis/dv/?format=json%2C1.1&sites=01170500&startDT=1995-01-01&endDT=1995-12-31
Scenario Spring Iteration 0 for Species Micropterus complete
Scenario Spring Iteration 1 for Species Micropterus complete
Scenario Spring Iteration 2 for Species Micropterus complete
Scenario Spring Iteration 3 for Species Micropterus complete
Scenario Spring Iteration 4 for Species Micropterus complete
Scenario Spring Iteration 5 for Species Micropterus complete
Scenario Spring Iteration 6 for Species Micropterus complete
Scenario Spring Iteration 7 for Species Micropterus complete
Scenario Spring Iteration 8 for Species Micropterus complete
Scenario Spring Iteration 9 for Species Micropterus complete
Completed Scenario Micropterus Spring
Completed Simulations - view results
iterate through species and scenarios and summarize
summarized length by season, state, and survival
Fit beta distributions to states
Yearly summary complete
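
The first line of the output above shows a request for daily values from the USGS water services endpoint. As a sketch of how such a URL is assembled, the snippet below rebuilds it from its query parameters with the standard library. The `build_daily_values_url` helper is hypothetical and is not taken from stryke; the site number and dates are copied from the logged URL, and no network request is made.

```python
from urllib.parse import urlencode

# Endpoint as it appears in the logged request.
NWIS_DV_ENDPOINT = "https://waterservices.usgs.gov/nwis/dv/"


def build_daily_values_url(site, start, end):
    """Hypothetical helper: assemble a daily-values request URL."""
    params = {
        "format": "json,1.1",  # the comma is percent-encoded as %2C
        "sites": site,
        "startDT": start,
        "endDT": end,
    }
    return NWIS_DV_ENDPOINT + "?" + urlencode(params)


url = build_daily_values_url("01170500", "1995-01-01", "1995-12-31")
print(url)
```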
