### to do: 
#### fix issues with non-spawners in the SS seasonal estimate
#### using the VBGF from PSC yields estimates of TL and FL that seem quite low (~75 cm for age 5 chinook?)

## Process age 2+ Salish Sea Chinook data
**Created** Jul - Aug 2019 <br>
**Modified** - May 2021 <br>
**Author:** Greig Oldford <br>

This is a Jupyter Python 3 Notebook which runs in a browser <br>

Note: I copied this in May 2021 from the notebook I made for Fanny Couture. <br>
To do: insert section for calculation of biomass <br>
     
**"Data" In** <br> 
1) Pacific Salmon Commission TCCHINOOK v1-27 AC6 model data (C Parken, Pers Comm, 2020) <br>
  (includes catch at age and abundance estimates by year and stock) <br>
2) seasonal distribution estimates for fall run fish (Shelton et al., 2019) <br>
3) seasonal distribution estimates for summer / spring run fish (various sources - see write-up) <br>
4) priority stocks for SRKW (from Fanny) <br>
5) stock codes and supplemental data (various sources) <br>
6) stock specific fork length at-age VBGF equations provided by C Parken (see accompanying email for error Fanny found) <br>
7) general FL:TL conversion from Fishbase
8) TL:mass(kg) equations from Jones, Petrell, & Pauly (1999)
9) L:mass equation from Schneider et al (2000) provided by Fanny
10) natural mortality at age from PSC (2018, p. 16)

**Data Out** <br>
1) table of estimated seasonal abundance by stock, age and season

**Process**<br>
1) Define stocks <br>
2) Extract cohort abundance, terminal run, escapements <br>
3) Extract and match total fisheries catches for fisheries inside Salish Sea <br>
4) Prepare seasonal estimates of proportion of cohorts inside Salish Sea <br>
5) Calculate length and mass
6) Join all together and export <br>
    
    
**Notes:** <br>
To-do: In final step I am not exporting catch - add catch (and other trimmed fields for Fanny?) - GO 2021-02-11

The fall run stocks seem to spend more time in the SS while maturing, although this is highly uncertain (Brown et al., 2019). <br>
Used Shelton et al., 2019, marine distribution estimates for fall run and then used another method for spring and summer run. <br>



#### Old notes

##### process notes
        Define stocks with codes, run timing (following PSC)
        Enter the seasonal stock proportion estimates inside Salish Sea
        Read the age-structured abundance estimates
          pivot data, add stock descriptors
        Read CWT recoveries
          query only Salish Sea recoveries
          aggregate length-at-age by stock and season
          mean and count of len-at-age by stock
          query and group by other methods... 
        Join tables 
          join abundance and stock proportion estimates
          calculate abundance in SS (weighted abun)
          join with CWT recoveries w/ length-at-age
          calculate mean length-at-age
        Pivot and aggregate to final population tables 
        
        

## TOC: 
<a class="anchor" id="top"></a>
* [1. Define Stocks](#section-1)
* [2. PSC Stock Cohort Estimates](#section-2)
* [3. PSC Catch estimates inside SS](#section-3)
* [4. Seasonal distribution](#section-4)
* [5. Length at Age and Weight at age calculations](#section-5)
* [6a. Putting it all together - SRKW model](#section-6a)
* [6b. Putting it all together - EwE model](#section-6b)
* [experiments etc](#section-8)

In [2]:
# required libraries
import pandas as pd
import numpy as np
import math

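# 'display.precision' is the full option name; the short form 'precision' fails in pandas 2.x
pd.set_option('display.precision', 3)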

## 1. Define stocks <a class="anchor" id="section-1"></a>
- stocks by season of year, season of run (runtype), and Fraser, Puget Sound, or SoG

[BACK TO TOP](#top)

In [3]:
stocks_df = pd.DataFrame()

# stock codes from PSC TCCHINOOK Model
stocks_df['Stock'] = ["FS2","FS3","FSO","FSS","FHF","FCF","NKF","PSF","PSN",
                      "PSY","NKS","SKG","STL","SNO","MGS","LGS"]

# stock long names (Brown et al., 2019, table 2 / 5)
stocks_df['Name'] = ["Fraser Spring 1.2","Fraser Spring 1.3","Fraser Summer Ocean-type 0.3",
                     "Fraser Summer Stream-type 1.3","Harrison Fall","Chilliwack Fall Hatchery",
                     "Nooksack Fall","Pgt Sd Fing","Pgt Sd NatF","Pgt Sd Year","Nooksack Spring",
                     "Skagit Wild","Stillaguamish Wild","Snohomish Wild","Middle Georgia Strait Fall (Nanaimo / Chemainus)","Lower Georgia Strait (Cowichan)"]

# Canadian Conservation Unit ID's (Brown et al., 2019, table 2 / 5)
stocks_df['Cons Units'] = ["CK-16, CK-17","CK-10, CK-12, CK-18","CK-13, CK-15",
                           "CK-09, CK-11, CK-14, CK-19","CK-03","CK-9008/ CK-03",
                          "na","na","na","na","na","na","na","na","CK-25","CK-22"
                          ] 

stocks_df['area'] = ["fraser","fraser","fraser","fraser","fraser","fraser",
                     "PS","PS","PS","PS","PS","PS","PS","PS","ECVI","ECVI"]

# add run type (PSC Model Doc, TCCHINOOK 18.1.2, Fig 29)
stocks_df['runtype'] = ["spring","spring","summer","summer","fall","fall","fall","fall",
                        "fall","fall","spring","fall","fall","summer","fall","fall"]

# add dominant juvenile behaviour
# (TChinook 18.1.2 Fig 29), table 2, 5 from Brown et al. 2019
stocks_df['juve_behav'] = ["stream","stream","ocean","stream","ocean","ocean","ocean","ocean",
                          "ocean","stream","both","ocean","ocean","ocean","ocean","ocean"]
stocks_df

Unnamed: 0,Stock,Name,Cons Units,area,runtype,juve_behav
0,FS2,Fraser Spring 1.2,"CK-16, CK-17",fraser,spring,stream
1,FS3,Fraser Spring 1.3,"CK-10, CK-12, CK-18",fraser,spring,stream
2,FSO,Fraser Summer Ocean-type 0.3,"CK-13, CK-15",fraser,summer,ocean
3,FSS,Fraser Summer Stream-type 1.3,"CK-09, CK-11, CK-14, CK-19",fraser,summer,stream
4,FHF,Harrison Fall,CK-03,fraser,fall,ocean
5,FCF,Chilliwack Fall Hatchery,CK-9008/ CK-03,fraser,fall,ocean
6,NKF,Nooksack Fall,na,PS,fall,ocean
7,PSF,Pgt Sd Fing,na,PS,fall,ocean
8,PSN,Pgt Sd NatF,na,PS,fall,ocean
9,PSY,Pgt Sd Year,na,PS,fall,stream


## 2. PSC Stock Abundance Data<a class="anchor" id="section-2"></a>
- Pacific Salmon Commission TCCHINOOK v1-27 AC6 model data (C Parken, Pers Comm, 2020)<br>
[BACK TO TOP](#top)

In [6]:
PSC_df = pd.read_csv("C://Users//Greig//Sync//6. SSMSP Model//Model Greig//Data//1. Salmon//Chinook Abundance//ChinookAbundance_PSC_2020//MODIFIED//CTC_Model_Outputs.csv")

# fix blank spaces
# stock codes in the PSC files carry a leading space (e.g. " FS2");
# strip it and label anything that is not a Salish Sea stock as 'Other'
SS_STOCK_CODES = ["FS2","FS3","FSO","FSS","FHF","FCF",              # Fraser
                  "NKF","PSF","PSN","PSY","NKS","SKG","STL","SNO",  # Puget Sound
                  "LGS","MGS"]                                      # SoG

def fix_code(row):
    # str() guards against missing values; strip() also makes re-running the cell safe
    code = str(row['Stock']).strip()
    if code in SS_STOCK_CODES:
        return code
    return 'Other'

# Apply the function above
PSC_df['Stock'] = PSC_df.apply (lambda row: fix_code(row), axis=1)
PSC_df[0:60]


Unnamed: 0,Year,StockNum,Stock,Age,AEQCohort,Cohort,Terminal Run,Escapement
0,1979,7,FS2,2,26381.184,37964,612.67,453.11
1,1979,7,FS2,3,24832.051,25344,12833.14,10534.2
2,1979,7,FS2,4,1483.0,1483,1035.05,696.21
3,1979,7,FS2,5,0.0,0,0.0,0.0
4,1980,7,FS2,2,18201.516,26193,424.27,315.38
5,1980,7,FS2,3,20171.143,20587,10456.0,8596.11
6,1980,7,FS2,4,1442.0,1442,1007.22,677.54
7,1980,7,FS2,5,0.0,0,0.0,0.0
8,1981,7,FS2,2,24760.677,35632,577.11,428.8
9,1981,7,FS2,3,13968.029,14256,7238.73,5950.34


## 3) Add catch and escapement data for Salish Sea (PSC CTC model)<a class="anchor" id="section-3"></a>
- calculate catch for each cohort for fisheries within the Salish Sea <br>
- add this to the escapement estimate for pre-spawning abundance in the Salish Sea <br>
- expand based on natural mortality estimates <br>
[BACK TO TOP](#top)
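
A minimal worked sketch of the expansion step used in this section, assuming (as the code below does) that the observed pre-spawners are divided by exp(-M/6), i.e. two months of the annual natural mortality rate. The helper name `expand_for_mortality` and the dictionary `M_AT_AGE` are hypothetical; the M values are those entered in `predmort_df`, and the inputs are the FS2, 2019, age 3 escapement and in-SS catch shown in the section 4 output.

```python
import numpy as np

# natural mortality at age, as entered in predmort_df (PSC 2018, p. 16)
M_AT_AGE = {2: 0.51, 3: 0.36, 4: 0.22, 5: 0.1, 6: 0.1}

def expand_for_mortality(n_observed, age, fraction_of_year=1/6):
    """Back-calculate abundance before `fraction_of_year` of natural mortality.

    Equivalent to n_observed / exp(-M * fraction_of_year).
    """
    return n_observed * np.exp(M_AT_AGE[age] * fraction_of_year)

# FS2, 2019, age 3: Escapement + TotalCatchEst_in
n_obs = 2593.92 + 7030.115
n_expanded = expand_for_mortality(n_obs, age=3)
print(round(n_expanded, 1))
```

With these inputs the result should land close to the `SS_spawners` value reported for that row (10219.153), with small differences from rounding of the displayed inputs.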

In [7]:
# same original source as above
Catch_df = pd.read_csv("C://Users//Greig//Sync//6. SSMSP Model//Model Greig//Data//1. Salmon//Chinook Abundance//ChinookAbundance_PSC_2020//MODIFIED//CTC_Catch.csv")

# fix the spaces in the stock names (as above)
Catch_df['Stock'] = Catch_df.apply (lambda row: fix_code(row), axis=1)

# catch versus cohort crosscheck: all fisheries, whole cohort
# (no Salish Sea filter is applied here; the in / out split is made further down)
Catch_all_df = Catch_df[['Year','FishNum','Stock','Age','Catch', 'Shakers']]


# sum catches across all fisheries
Catch_all_grouped = (Catch_all_df.groupby(['Year', 'Stock', 'Age'])
                     .agg({'Catch': 'sum','Shakers':'sum'})
                     .assign(TotalCatchEst=lambda x: x['Catch'] + x['Shakers'])).reset_index()

# drop unwanted columns
Catch_all_df_grp2 = Catch_all_grouped.drop(columns=['Shakers', 'Catch'])


CatchAbund_allfisheries_df = pd.merge(PSC_df,Catch_all_df_grp2, on=['Year','Stock','Age'], how='inner')
CatchAbund_allfisheries_df2 = CatchAbund_allfisheries_df.drop(columns=['AEQCohort','Terminal Run'])

#print("columns:")
#print(Catch_df.dtypes)

# only select records from Salish Sea, excluding terminal fisheries 
#  (these are selected based on codes in the original CTC model sheet)
SS_fishnums = [8, 12, 13, 14, 15, 21, 22, 23, 25, 26, 35, 36, 38, 39, 45, 46, 47]
in_SS = Catch_df['FishNum'].isin(SS_fishnums)

Catch_in_df = Catch_df.loc[in_SS]     # fisheries inside the Salish Sea
Catch_out_df = Catch_df.loc[~in_SS]   # all other fisheries
# only select records from Salish Sea, excluding terminal fisheries ('TN')
#  (following 'FishNum' (fishery #) codes: 8,13,14,15,25,26,35,36,38,39)
#  (these are selected based on codes in the model files (C. Parken / CTC)
# Catch_df = Catch_df.loc[(Catch_df['FishNum'] == 8) | (Catch_df['FishNum'] == 13) |
#                        (Catch_df['FishNum'] == 14) | (Catch_df['FishNum'] == 15) |
#                        (Catch_df['FishNum'] == 25) | (Catch_df['FishNum'] == 26) |
#                        (Catch_df['FishNum'] == 35) | (Catch_df['FishNum'] == 36) |
#                        (Catch_df['FishNum'] == 38) | (Catch_df['FishNum'] == 39)]

# 8,12,13,14,15,21,22,23,25,26,35,36,38,39,45,46,47

Catch_in_df = Catch_in_df[['Year','FishNum','Stock','Age','Catch', 'Shakers']]
Catch_out_df = Catch_out_df[['Year','FishNum','Stock','Age','Catch', 'Shakers']]

Catch_in_df_grouped = (Catch_in_df.groupby(['Year', 'Stock', 'Age'])
   .agg({'Catch': 'sum','Shakers':'sum'})
   .assign(TotalCatchEst_in=lambda x: x['Catch'] + x['Shakers'])).reset_index()

Catch_out_df_grouped = (Catch_out_df.groupby(['Year', 'Stock', 'Age'])
   .agg({'Catch': 'sum','Shakers':'sum'})
   .assign(TotalCatchEst_out=lambda x: x['Catch'] + x['Shakers'])).reset_index()

Catch_in_df_grp2 = Catch_in_df_grouped.drop(columns=['Shakers', 'Catch'])
Catch_out_df_grp2 = Catch_out_df_grouped.drop(columns=['Shakers', 'Catch'])

# join the catch and abundance tables
CatchAbund_in_df = pd.merge(PSC_df,Catch_in_df_grp2, on=['Year','Stock','Age'], how='inner')
CatchAbund_df = pd.merge(CatchAbund_in_df,Catch_out_df_grp2, on=['Year','Stock','Age'], how='inner')
CatchAbund_df2 = CatchAbund_df.drop(columns=['AEQCohort','Terminal Run'])

# The abundance calculated as SS_spawners is a MINIMUM that must have been present to account for observations
# note: this is technically not correct since not all catch is of would-be spawners, but I would need
#       a way to divide catch between spawners and non-spawners (maturation schedules). 
#       The two (spawners and non-spawners) get combined in the end for biomass estimates, so it doesn't matter. 
CatchAbund_df2['SS_spawners'] = CatchAbund_df2['Escapement'] + CatchAbund_df2['TotalCatchEst_in']

# predation mortalities based on the natural mortality estimates used in PSC 2018 (p. 16)
predmort_df = pd.DataFrame()
predmort_df["Age"] = [2,3,4,5,6]
predmort_df["M"] = [0.51, 0.36, 0.22, 0.1, 0.1]

CatchAbund_df2 = pd.merge(CatchAbund_df2, predmort_df, on=['Age'], how='inner')
#CatchAbund_df2["M"] = CatchAbund_df2["M_y"] 
#CatchAbund_df2["M"] = CatchAbund_df2.drop(columns=["M_y","M_x"])

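# expand back for natural mortality over M/6 (two months of the annual rate):
#   N_before = N_observed / exp(-M/6)
CatchAbund_df2['SS_spawners_p'] = CatchAbund_df2['SS_spawners'] / np.exp(-CatchAbund_df2['M'] / 6)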
CatchAbund_df2['nonspawners'] = CatchAbund_df2['Cohort'] - CatchAbund_df2['SS_spawners_p']
CatchAbund_df2['SS_spawners'] = CatchAbund_df2['SS_spawners_p']

In [77]:
# in some years catch > cohort (find those years and export)
CatchAbund_allfisheries_df2['crosscheck'] = CatchAbund_allfisheries_df2['Cohort'] - CatchAbund_allfisheries_df2['TotalCatchEst']
weird = CatchAbund_allfisheries_df2.loc[(CatchAbund_allfisheries_df2['crosscheck']<0) & (CatchAbund_allfisheries_df2['Stock']!="Other")].sort_values(['Year','Stock','Age'])
weird = weird.drop(columns=["Escapement"])
weird.to_csv (r'Cohort_minus_catch_error.csv', index = True, header=True)

In [49]:
# CatchAbund_allfisheries_df2.loc[(CatchAbund_allfisheries_df2['Stock']=='FSO') & (CatchAbund_allfisheries_df2['Year']==2019)]
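# numeric_only=True keeps the text column 'Stock' out of the sum (required in pandas 2.x)
CatchAbund_df2.groupby(["Year"]).sum(numeric_only=True)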

Year,StockNum,Age,Cohort,Escapement,TotalCatchEst_in,TotalCatchEst_out,SS_spawners,M,SS_spawners_p,nonspawners
1979,1212,252,6194257,394822.84,1266000.0,577800.0,1758000.0,21.42,1758000.0,4436000.0
1980,1212,252,6104676,379707.15,1169000.0,532100.0,1640000.0,21.42,1640000.0,4464000.0
1981,1212,252,5738191,335905.0,974800.0,449500.0,1387000.0,21.42,1387000.0,4351000.0
1982,1212,252,5946307,353607.37,793300.0,542400.0,1213000.0,21.42,1213000.0,4733000.0
1983,1212,252,6785547,374305.45,901100.0,412300.0,1351000.0,21.42,1351000.0,5435000.0
1984,1212,252,6928163,387146.49,1050000.0,425200.0,1522000.0,21.42,1522000.0,5406000.0
1985,1212,252,5713969,445795.4,752000.0,369600.0,1261000.0,21.42,1261000.0,4453000.0
1986,1212,252,5200842,494665.97,647100.0,343600.0,1198000.0,21.42,1198000.0,4003000.0
1987,1212,252,4619444,396629.46,451600.0,291700.0,889500.0,21.42,889500.0,3730000.0
1988,1212,252,6005106,379525.38,494700.0,316500.0,920000.0,21.42,920000.0,5085000.0


## 4) Estimate Marine Distribution & Abundance in SS Seasonally
in / out Salish Sea by season and 'stock' <br>
For fall Chinook, seasonal distribution is taken from Shelton et al. (2019; digitized from Fig. 3). <br>
Spring / summer run stock estimates are based on the ocean rearing behaviour reported in Brown et al. (2019) <br>
  (these stocks generally rear outside the Salish Sea). <br>
The estimate of abundance within the Salish Sea is based simply on: <br>
Nss = Css + T  (where Css = catch by fisheries in Salish Sea, T = terminal run size estimate) <br>
This represents a minimum abundance in the months when the fish are spawning (and the several weeks prior as they transit). The formula itself does not account for predation (the code in section 3 uses `Escapement` for the T term and applies a separate expansion for natural mortality). <br>


  
  <a class="anchor" id="section-4"></a>
[BACK TO TOP](#top)
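
A quick sanity-check sketch for the `spawner_seasonality` values entered in the next cell, assuming they are meant to sum to 1 across the four seasons for each run type / area group. The values are copied from that cell; the group labels and the 0.005 tolerance are arbitrary choices for illustration.

```python
import pandas as pd

# spawner_seasonality by group; columns follow the season order used in SM_df
props = pd.DataFrame(
    {
        "spring": [0.5, 0.66, 0.3, 0.25, 0, 0, 0],
        "summer": [0.5, 0.33, 0.5, 0.5, 0.3, 0.3, 0.3],
        "fall":   [0, 0, 0.2, 0.25, 0.7, 0.7, 0.7],
        "winter": [0, 0, 0, 0, 0, 0, 0],
    },
    index=["spring / fraser", "spring / PS", "summer / fraser", "summer / PS",
           "fall / fraser", "fall / PS", "fall / ECVI"],
)

totals = props.sum(axis=1)

# groups whose seasonal shares do not add up to 1 (within tolerance)
off = totals[(totals - 1).abs() > 0.005]
print(off)
```

Any group listed in `off` would lose (or gain) a small share of its spawners when the seasonal estimates are summed over the year.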

In [9]:
SM_df = pd.DataFrame()

# define stocks
SM_df['area'] = ["fraser","fraser","fraser","fraser",
                 "PS","PS","PS","PS",
                 "fraser","fraser","fraser","fraser",
                 "PS","PS","PS","PS",
                 "fraser","fraser","fraser","fraser",
                 "PS","PS","PS","PS",
                  "ECVI","ECVI","ECVI","ECVI"]

SM_df['runtype'] = ["spring","spring","spring","spring",
                   "spring","spring","spring","spring",
                   "summer","summer","summer","summer",
                   "summer","summer","summer","summer",
                   "fall","fall","fall","fall",
                   "fall","fall","fall","fall",
                   "fall","fall","fall","fall"]

# season definitions match Shelton et al. (2019)
# winter: Nov - Mar
# spring: Apr - May
# summer: Jun - Jul
# fall:   Aug - Oct
SM_df['season'] =["spring", "summer", "fall", "winter",
                 "spring", "summer", "fall", "winter",
                 "spring", "summer", "fall", "winter",
                 "spring", "summer", "fall", "winter",
                 "spring","summer","fall", "winter",
                 "spring","summer","fall", "winter",
                 "spring","summer","fall", "winter"]


# distribution of spawners across seasons estimated from several sources(see write up)
# (subtracted two weeks to account for transit)
SM_df['spawner_seasonality'] = [0.5, 0.5, 0, 0,     # spring / fraser
                                0.66, 0.33, 0, 0,   # spring / PS
                                0.3, 0.5, 0.2, 0,   # summer / fraser
                                0.25, 0.5, 0.25, 0, # summer / PS
                                0, 0.3, 0.7, 0,     # fall /  fraser
                                0, 0.3, 0.7, 0,     # fall /  PS
                                0, 0.3, 0.7, 0      # fall /  SG
                               ]

# seasonal proportion in salish sea of non-spawners (and pre-spawners)
SM_df['nonspawners_inSS'] = [0.05, 0.05, 0.05, 0.05,
                             0.05, 0.05, 0.05, 0.05,
                             0.05, 0.05, 0.05, 0.05,
                             0.05, 0.05, 0.05, 0.05,
                             0.35, 0.35, 0.4, 0.4,
                             0.3, 0.25, 0.35, 0.35, 
                             0.35, 0.35, 0.4, 0.4
                            ]
# join tables
stocktiming_df = pd.merge(stocks_df, SM_df, on=['area','runtype'], how='inner')


# drop some columns and pivot to keep it simple
stocktiming_df2 = stocktiming_df.drop(columns=["Name","Cons Units","juve_behav","area"])

# pivot
stockprop_piv = pd.pivot_table(stocktiming_df2, values = ['spawner_seasonality','nonspawners_inSS'], index=['Stock','runtype'], columns = 'season')
#stockprop_piv = stockprop_piv.rename(columns={"fall": "fall_prop", "spring": "spring_prop", "summer": "summer_prop", "winter": "winter_prop"})

# join to catch + abundance table
CatchAbunProp1 = pd.merge(CatchAbund_df2, stockprop_piv, on=['Stock'], how='outer')
CatchAbunProp2 = pd.merge(CatchAbunProp1, stocktiming_df, on =['Stock'], how = 'inner')

CatchAbunProp2 = CatchAbunProp2.drop(columns=['spawner_seasonality','nonspawners_inSS','juve_behav', 'Cons Units', 'Name'])
CatchAbunProp2[CatchAbunProp2['Stock']=='FCF']

# calculate seasonal numbers inside SS
# the same two terms are computed for every stock:
#  non-spawners: nonspawners * seasonal proportion inside SS ('nonspawners_inSS';
#                fall run values after Shelton et al. 2019)
#  spawners:     SS_spawners * seasonal share of the run ('spawner_seasonality';
#                run timing minus two weeks)

df = CatchAbunProp2

# estimate seasonal abundance of non-spawners
conditions = [(df['season'] == 'spring'),
              (df['season'] == 'summer'),
              (df['season'] == 'fall'),
              (df['season'] == 'winter')
             ]

choices = [df['nonspawners_inSS', 'spring'] * df['nonspawners'], 
           df['nonspawners_inSS', 'summer'] * df['nonspawners'],   
           df['nonspawners_inSS', 'fall'] * df['nonspawners'],   
           df['nonspawners_inSS', 'winter'] * df['nonspawners']
          ]

df['nonspawners_season'] = np.select(conditions, choices, default=0.0)

# estimate seasonal abundance of spawners
conditions = [(df['season'] == 'spring'),
              (df['season'] == 'summer'),
              (df['season'] == 'fall'),
              (df['season'] == 'winter')
             ]

choices = [df['spawner_seasonality', 'spring'] * df['SS_spawners'], 
           df['spawner_seasonality', 'summer'] * df['SS_spawners'],   
           df['spawner_seasonality', 'fall'] * df['SS_spawners'],   
           df['spawner_seasonality', 'winter'] * df['SS_spawners']
          ]

df['spawners_season'] = np.select(conditions, choices, default=0.0)
df['SS_seasonal_est'] = (df['spawners_season'].round(1) + df['nonspawners_season'].round(1))
df['SS_seasonal_est'] = df['SS_seasonal_est'].round(1)

df.loc[df['SS_seasonal_est'] <= -0.1]




Unnamed: 0,Year,StockNum,Stock,Age,Cohort,Escapement,TotalCatchEst_in,TotalCatchEst_out,SS_spawners,M,...,"(spawner_seasonality, fall)","(spawner_seasonality, spring)","(spawner_seasonality, summer)","(spawner_seasonality, winter)",area,runtype,season,nonspawners_season,spawners_season,SS_seasonal_est
330,2019,7,FS2,3,5789,2593.92,7030.115,2094.094,10219.153,0.36,...,0.00,0.50,0.5,0.0,fraser,spring,fall,-221.508,0.0,-221.5
331,2019,7,FS2,3,5789,2593.92,7030.115,2094.094,10219.153,0.36,...,0.00,0.50,0.5,0.0,fraser,spring,winter,-221.508,0.0,-221.5
498,2019,7,FS2,4,354,126.95,500.038,117.643,650.404,0.22,...,0.00,0.50,0.5,0.0,fraser,spring,fall,-14.820,0.0,-14.8
499,2019,7,FS2,4,354,126.95,500.038,117.643,650.404,0.22,...,0.00,0.50,0.5,0.0,fraser,spring,winter,-14.820,0.0,-14.8
1170,2019,8,FS3,4,10871,6008.21,18054.392,1068.502,24961.272,0.22,...,0.00,0.50,0.5,0.0,fraser,spring,fall,-704.514,0.0,-704.5
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
10072,2019,25,STL,5,23,16.68,30.591,7.207,48.065,0.10,...,0.70,0.00,0.3,0.0,PS,fall,spring,-7.520,0.0,-7.5
10075,2019,25,STL,5,23,16.68,30.591,7.207,48.065,0.10,...,0.70,0.00,0.3,0.0,PS,fall,winter,-8.773,0.0,-8.8
10411,2019,26,SNO,3,5996,791.71,5692.353,5571.536,6885.015,0.36,...,0.25,0.25,0.5,0.0,PS,summer,winter,-44.451,0.0,-44.5
10579,2019,26,SNO,4,2800,1785.44,4839.791,797.524,6872.665,0.22,...,0.25,0.25,0.5,0.0,PS,summer,winter,-203.633,0.0,-203.6
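
The negative rows above arise where the expanded `SS_spawners` exceeds `Cohort`, which makes `nonspawners` negative before the seasonal proportion is applied. One possible stopgap, which is an assumption and not part of the original method, is to floor `nonspawners` at zero and keep a flag so the affected cohorts can still be reviewed. The first toy row uses the FS2 2019 age 3 values from the output above; the second row is hypothetical.

```python
import pandas as pd

toy = pd.DataFrame({
    "Stock": ["FS2", "FCF"],
    "Cohort": [5789.0, 20000.0],            # FCF values are made up
    "SS_spawners": [10219.153, 5000.0],
})

# raw difference, as computed in section 3
toy["nonspawners_raw"] = toy["Cohort"] - toy["SS_spawners"]

# flag cohorts where catch + escapement exceed the cohort estimate
toy["spawners_gt_cohort"] = toy["nonspawners_raw"] < 0

# floor at zero so seasonal estimates cannot go negative
toy["nonspawners"] = toy["nonspawners_raw"].clip(lower=0)

print(toy)
```

Flooring hides the inconsistency rather than resolving it, so the flagged rows still point to cohorts where the catch and cohort estimates disagree.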


In [52]:
df[['nonspawners_season','Year']].groupby(["Year"]).sum()

Year,nonspawners_season
1979,5192000.0
1980,5277000.0
1981,5013000.0
1982,5434000.0
1983,6333000.0
1984,6167000.0
1985,4746000.0
1986,4002000.0
1987,3579000.0
1988,5505000.0


## 5) Length and weight calculations
 <a class="anchor" id="section-5"></a>
[BACK TO TOP](#top)

In [10]:
# VBGF length-at-age function params for Fraser River
#  -- taken from table for Fraser River fish provided by Chuck Parken (avgs used for Chilliwack, Harrison)
#  -- lengths in mm
#  -- standard deviations available but not used
#  -- stocks without their own parameters (LGS, MGS and the Puget Sound stocks) are assigned the mean VBGF parameters
VBGF_df = pd.DataFrame()

# define stocks
VBGF_df['ModelStock'] = ["FS2", "FS3", "FSS", "FSO", "FCF", "FHF"]
VBGF_df['Linf']       = [981.8, 998.4, 1014.2, 906.3, 982.7, 1030.3]
VBGF_df['K']          = [0.427, 0.96,  0.52,   0.899, 0.864, 0.767]
VBGF_df['t0']         = [1.09,  1.48,  1.12,   1.25,   1.11, 1.04]

# L_t = L_inf * (1 - e^(-K * (t - t_0)))
# where 
# L_t is length at age t
# L_inf is the asymptotic length
# K is the growth coefficient (year^-1)
# t_0 is the theoretical age at which length is zero (x-axis intercept)

# stocks without their own parameters get the mean of the six Fraser stocks above
mean_Linf = VBGF_df['Linf'].mean()
mean_K    = VBGF_df['K'].mean()
mean_t0   = VBGF_df['t0'].mean()

avg_stocks = ["LGS", "MGS", "NKF", "PSF", "PSN", "PSY", "NKS", "SKG", "STL", "SNO"]
avgs_df = pd.DataFrame({"ModelStock": avg_stocks, "Linf": mean_Linf,
                        "K": mean_K, "t0": mean_t0})

# Insert rows (DataFrame.append was removed in pandas 2.0, so concatenate instead)
VBGF_df = pd.concat([VBGF_df, avgs_df], ignore_index=True)

VBGF_df["Stock"] = VBGF_df["ModelStock"]
VBGF_df["e"] = math.exp(1)

df2 = pd.merge(df, VBGF_df, on=['Stock'], how='left')
# L_t = L_inf * (1 - e^(-K * (t - t_0)))
df2["FL"] = df2["Linf"] * (1 - np.exp(-df2["K"] * (df2["Age"] - df2["t0"])))
# Fork length to total length ratio from FishBase
# https://www.fishbase.de/popdyn/LLRelationshipList.php?ID=244&GenusName=Oncorhynchus&SpeciesName=tshawytscha&fc=76
FL_to_TL = 1.034
df2["TL"] = df2["FL"] * FL_to_TL

# Mass estimated from Jones, Petrell, & Pauly (1999, p. 268, Tab. 3)
# M = B*L^2*H
# where M is mass (kg), B is a regression param, H is fish 'height' (m), L is total length (m)
H_L_ratio = 0.24 # H = L * 0.24
B = 62
df2["mass_kg"] = (B * np.power((df2["TL"] * 0.001), 2) * ((df2["TL"] * 0.001) * H_L_ratio))

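# Alternative equation for length-weight conversion from Schneider et al (2000, Tab 17.1) 
# (for Atlantic Salmon, used by Fanny)
#  log10(W) = -5.113913 + (3.113913 * log10(length))
# note: the intercept coded below (-5.3134) differs from the one quoted above (-5.113913); check against the source table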
df2["mass_kg2"] = np.power(10,-5.3134 + (3.113913*np.log10(df2["TL"]))) * 0.001
#df2["mass_kg2"] = -5.3134 + (3.113913*np.log10(df2["FL"]*0.001)

df = df2

df2[["Age","TL","mass_kg","mass_kg2"]]


Unnamed: 0,Age,TL,mass_kg,mass_kg2
0,2,326.862,0.520,0.328
1,2,326.862,0.520,0.328
2,2,326.862,0.520,0.328
3,2,326.862,0.520,0.328
4,2,326.862,0.520,0.328
...,...,...,...,...
10747,5,958.603,13.107,9.358
10748,5,958.603,13.107,9.358
10749,5,958.603,13.107,9.358
10750,5,958.603,13.107,9.358
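
As a hand check of the length and mass chain above, here is a minimal sketch for a single fish, assuming the FS2 parameters from `VBGF_df` and age 2. The function name `vbgf_fork_length` is hypothetical; the constants are the ones defined in the cell.

```python
import numpy as np

def vbgf_fork_length(age, Linf, K, t0):
    """Von Bertalanffy fork length (mm): Linf * (1 - exp(-K * (age - t0)))."""
    return Linf * (1.0 - np.exp(-K * (age - t0)))

FL_TO_TL = 1.034    # fork length to total length ratio (FishBase)
B = 62              # regression parameter (Jones, Petrell & Pauly 1999)
H_L_RATIO = 0.24    # fish height as a fraction of total length

# FS2, age 2
fl_mm = vbgf_fork_length(2, Linf=981.8, K=0.427, t0=1.09)
tl_mm = fl_mm * FL_TO_TL

# mass (kg) = B * L^2 * H, with L and H in metres
tl_m = tl_mm * 0.001
mass_kg = B * tl_m**2 * (tl_m * H_L_RATIO)

print(round(tl_mm, 3), round(mass_kg, 3))
```

These values should agree with the age 2 rows at the top of the output (TL 326.862 mm, 0.520 kg), which suggests those rows belong to FS2.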


## 6a) Summarize and export for the SRKW model
 <a class="anchor" id="section-6a"></a>
[BACK TO TOP](#top)

In [11]:
# get only some columns and pivot
df['SS_seasonal_est'] = df['SS_seasonal_est'].astype(str).astype(float)

PSC2_df = df[['Stock','Age','Cohort','Year','SS_seasonal_est', 'season']]
final_piv = pd.pivot_table(PSC2_df, values = 'SS_seasonal_est', index=['Stock','Year','season'], columns = 'Age').reset_index()

final_piv2 = pd.merge(final_piv, stocks_df, on =['Stock'], how = 'inner')
final_piv2.to_csv (r'final_SS_Cohort.csv', index = True, header=True)

final_piv2

Unnamed: 0,Stock,Year,season,2,3,4,5,Name,Cons Units,area,runtype,juve_behav
0,FCF,1979,fall,5892.6,3458.6,994.6,73.6,Chilliwack Fall Hatchery,CK-9008/ CK-03,fraser,fall,ocean
1,FCF,1979,spring,3857.9,1130.4,151.3,-0.9,Chilliwack Fall Hatchery,CK-9008/ CK-03,fraser,fall,ocean
2,FCF,1979,summer,4493.7,2059.0,503.5,31.1,Chilliwack Fall Hatchery,CK-9008/ CK-03,fraser,fall,ocean
3,FCF,1979,winter,4409.0,1291.8,172.9,-1.0,Chilliwack Fall Hatchery,CK-9008/ CK-03,fraser,fall,ocean
4,FCF,1980,fall,5899.4,3481.2,984.0,73.3,Chilliwack Fall Hatchery,CK-9008/ CK-03,fraser,fall,ocean
...,...,...,...,...,...,...,...,...,...,...,...,...
2683,STL,2019,winter,560.9,-277.4,-622.2,-8.8,Stillaguamish Wild,na,PS,fall,ocean
2684,STL,2020,fall,1067.7,906.6,235.8,7.7,Stillaguamish Wild,na,PS,fall,ocean
2685,STL,2020,spring,870.4,500.3,33.1,0.6,Stillaguamish Wild,na,PS,fall,ocean
2686,STL,2020,summer,747.7,555.3,112.1,3.5,Stillaguamish Wild,na,PS,fall,ocean


## 6b) Summarize and export for my EwE model
 <a class="anchor" id="section-6b"></a>
[BACK TO TOP](#top)

In [29]:
# summarize catch, escapement, and estimate of stock biomass at beginning of year
outpath = "C://Users//Greig//Sync//6. SSMSP Model//Model Greig//Data//1. Salmon/"
study_area = 11200 #km^2

# biomass (mt) from seasonal
df3 = df
df3["Chinook_B_mt_SS"] = (df3["SS_seasonal_est"] * df3["mass_kg"] * 0.001).round(1)
df3["Chinook_B_dens_mt_SS"] = (df3["Chinook_B_mt_SS"] / study_area).round(2)
df3 = df3[['Year','season','Stock','SS_seasonal_est','Chinook_B_mt_SS','Chinook_B_dens_mt_SS']]
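# numeric_only=True keeps the text column 'Stock' out of the sum (required in pandas 2.x)
df4 = df3.groupby(['Year','season']).sum(numeric_only=True).reset_index()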

# rename cols prior to averaging
df5 = df4
df5["Chin_B_dens_mt_SS_avg"] = df5["Chinook_B_dens_mt_SS"]
df5["Chin_B_mt_SS_avg"] = df5["Chinook_B_mt_SS"]
df5["Chin_N_SS_avg"] = df5["SS_seasonal_est"]
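# numeric_only=True skips the text column 'season' (a plain .mean() raises in pandas 2.x)
df6 = df5.groupby(['Year']).mean(numeric_only=True).reset_index()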
df6 = df6[["Year","Chin_N_SS_avg","Chin_B_dens_mt_SS_avg","Chin_B_mt_SS_avg"]]


# catch (mt)
# total catch estimate from fisheries inside and outside Salish Sea
# repeating the calculations here, but on the annual catch rather than the seasonal biomass
# to do: streamline this code - the calculations should not be repeated
VBGF_df["Stock"] = VBGF_df["ModelStock"]

CatchAbun_df = pd.merge(CatchAbund_df2,VBGF_df, on=['Stock'], how='left')
CatchAbun_df["FL"] = CatchAbun_df["Linf"]*(1 - (np.power(CatchAbun_df["e"], (CatchAbun_df["K"]*-1)*(CatchAbun_df["Age"] - CatchAbun_df["t0"]))))

# See above for FL:TL and L-W info
CatchAbun_df["TL"] = CatchAbun_df["FL"] * FL_to_TL
CatchAbun_df["mass_kg"] = (B * np.power((CatchAbun_df["TL"] * 0.001), 2) * ((CatchAbun_df["TL"] * 0.001) * H_L_ratio))
CatchAbun_df["catch_kg_in"] = CatchAbun_df["mass_kg"] * CatchAbun_df["TotalCatchEst_in"]
CatchAbun_df["catch_kg_out"] = CatchAbun_df["mass_kg"] * CatchAbun_df["TotalCatchEst_out"]
CatchAbun_df["Chin_catch_B_mt_in_PSC"] = (CatchAbun_df["catch_kg_in"] * 0.001).round(1)
CatchAbun_df["Chin_catch_B_mt_out_PSC"] = (CatchAbun_df["catch_kg_out"] * 0.001).round(1)
CatchAbun_df["Chinook_B_mt_Total_PSC"] = ((CatchAbun_df["mass_kg"] * CatchAbun_df["Cohort"]) * 0.001).round(2)

# escapement for SS stocks
CatchAbun_df["mass_esc_kg"] = CatchAbun_df["mass_kg"] * CatchAbun_df["Escapement"]
CatchAbun_df["Chin_B_Escpmnt_mt_PSC"] = (CatchAbun_df["mass_esc_kg"] * 0.001).round(1)
CatchAbun_df["Chin_N_Escpmnt_SS"] = CatchAbun_df["Escapement"]

# estimate escapement # for just IFR, LFR, SoG, to compare to CW's estimates
# stocks_SG_df['Stock'] = ["FS2","FS3","FSO","FSS","FHF","FCF","MGS","LGS"]
# stock long names (Brown et al., 2019, table 2 / 5)
# stocks_SG_df['Name'] = ["Fraser Spring 1.2","Fraser Spring 1.3","Fraser Summer Ocean-type 0.3",
#                      "Fraser Summer Stream-type 1.3","Harrison Fall","Chilliwack Fall Hatchery",
#                      "Middle Georgia Strait Fall (Nanaimo / Chemainus)","Lower Georgia Strait (Cowichan)"]

SG_Fr_stocks = ["FS2", "FS3", "FSO", "FSS", "FHF", "FCF", "MGS", "LGS"]
# escapement for the Fraser and SoG stocks only (rows for other stocks are left as NaN)
CatchAbun_df['Chin_escpmnt_N_SG-Fr_PSC'] = CatchAbun_df.loc[CatchAbun_df['Stock'].isin(SG_Fr_stocks), "Escapement"]
# rename cols and select
CatchAbun_df['Chinook_N_Total_PSC'] = CatchAbun_df['Cohort']
CatchAbun_df["Chin_Catch_N_outside_PSC"] = CatchAbun_df["TotalCatchEst_out"]
CatchAbun_df["Chin_Catch_N_inSS_PSC"] = CatchAbun_df["TotalCatchEst_in"]
CatchAbun_df2 = CatchAbun_df[['Year','Stock',
                              'Chin_catch_B_mt_in_PSC','Chin_catch_B_mt_out_PSC',
                              'Chin_Catch_N_outside_PSC', 'Chin_Catch_N_inSS_PSC',
                              'Chinook_B_mt_Total_PSC','Chinook_N_Total_PSC',
                              'Chin_escpmnt_N_SG-Fr_PSC','Chin_N_Escpmnt_SS','Chin_B_Escpmnt_mt_PSC'
                             ]]
CatchAbun_df3 = CatchAbun_df2.groupby(['Year']).sum().reset_index()

final_df = pd.merge(df6,CatchAbun_df3, on=['Year'])
final_df["Chin_F_PSC"] = (np.abs(np.log((final_df["Chin_catch_B_mt_in_PSC"] + final_df["Chin_catch_B_mt_out_PSC"]) / final_df["Chinook_B_mt_Total_PSC"]))).round(2)
final_df["Chin_Bdens_total_PSC"] = (final_df["Chinook_B_mt_Total_PSC"] / study_area).round(2)
final_df.to_csv (outpath + r'Chinook_PSC_1979to2018.csv', index = True, header=True)
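
The length-at-age step above applies the von Bertalanffy growth function, FL(a) = Linf * (1 - exp(-K * (a - t0))). The sketch below is a minimal sanity check of that formula. The stock codes and parameter values are hypothetical (they are not the PSC values), lengths are assumed to be in mm, and `np.exp` is used in place of the `e` column on the assumption that the column holds Euler's number.

```python
import numpy as np
import pandas as pd

# Hypothetical VBGF parameters for two made-up stocks (NOT the PSC values)
vbgf_demo = pd.DataFrame({
    "Stock": ["AAA", "AAA", "BBB", "BBB"],
    "Age":   [2, 5, 2, 5],
    "Linf":  [1000.0, 1000.0, 950.0, 950.0],  # asymptotic fork length (mm, assumed)
    "K":     [0.35, 0.35, 0.40, 0.40],        # growth coefficient (1/yr)
    "t0":    [0.0, 0.0, -0.1, -0.1],          # theoretical age at zero length (yr)
})

# FL(a) = Linf * (1 - exp(-K * (a - t0)))
vbgf_demo["FL"] = vbgf_demo["Linf"] * (
    1.0 - np.exp(-vbgf_demo["K"] * (vbgf_demo["Age"] - vbgf_demo["t0"]))
)

# Basic plausibility checks: lengths are positive and below Linf
assert (vbgf_demo["FL"] > 0).all()
assert (vbgf_demo["FL"] < vbgf_demo["Linf"]).all()
print(vbgf_demo[["Stock", "Age", "FL"]].round(1))
```

Running the same checks on the real parameter table may help trace the low age-5 lengths noted in the to-do list, for example by comparing the estimated FL with Linf for each stock.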

In [45]:
df5.groupby(['Year']).sum()

Unnamed: 0_level_0,SS_seasonal_est,Chinook_B_mt_SS,Chinook_B_dens_mt_SS,Chin_B_dens_mt_SS_avg,Chin_B_mt_SS_avg,Chin_N_SS_avg
Year,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
1979,6917000.0,31635.9,2.71,2.71,31635.9,6917000.0
1980,6890000.0,30520.0,2.64,2.64,30520.0,6890000.0
1981,6370000.0,29579.8,2.57,2.57,29579.8,6370000.0
1982,6617000.0,28852.9,2.53,2.53,28852.9,6617000.0
1983,7653000.0,31107.6,2.68,2.68,31107.6,7653000.0
1984,7653000.0,34288.2,2.95,2.95,34288.2,7653000.0
1985,5975000.0,31438.0,2.77,2.77,31438.0,5975000.0
1986,5164000.0,26463.7,2.2,2.2,26463.7,5164000.0
1987,4432000.0,21328.4,1.82,1.82,21328.4,4432000.0
1988,6387000.0,24638.4,2.08,2.08,24638.4,6387000.0


In [38]:
Catch_out_df_grp2.groupby(["Year"]).sum()

Unnamed: 0_level_0,Age,TotalCatchEst_out
Year,Unnamed: 1_level_1,Unnamed: 2_level_1
1979,238,548200.0
1980,238,506400.0
1981,238,424700.0
1982,238,507900.0
1983,238,388400.0
1984,238,401100.0
1985,238,346300.0
1986,238,316000.0
1987,238,266600.0
1988,238,293600.0


In [46]:
df5

Unnamed: 0,Year,season,SS_seasonal_est,Chinook_B_mt_SS,Chinook_B_dens_mt_SS,Chin_B_dens_mt_SS_avg,Chin_B_mt_SS_avg,Chin_N_SS_avg
0,1979,fall,2.524e+06,13239.2,1.17,1.17,13239.2,2.524e+06
1,1979,spring,1.293e+06,4995.0,0.39,0.39,4995.0,1.293e+06
2,1979,summer,1.686e+06,8223.5,0.70,0.70,8223.5,1.686e+06
3,1979,winter,1.414e+06,5178.2,0.45,0.45,5178.2,1.414e+06
4,1980,fall,2.487e+06,12701.3,1.13,1.13,12701.3,2.487e+06
...,...,...,...,...,...,...,...,...
163,2019,winter,-3.126e+05,-5814.8,-0.54,-0.54,-5814.8,-3.126e+05
164,2020,fall,1.341e+06,7164.1,0.64,0.64,7164.1,1.341e+06
165,2020,spring,8.413e+05,3404.6,0.25,0.25,3404.6,8.413e+05
166,2020,summer,9.785e+05,4885.5,0.40,0.40,4885.5,9.785e+05


### OLD CODE

In [14]:
# get only some columns and pivot
PSC2_df = PSC_df[['stock_desc','Stock','Age','Cohort','Year']]
pivPSC = pd.pivot_table(PSC2_df, values = 'Cohort', index=['Stock','stock_desc','Year'], columns = 'Age').reset_index()
pivPSC[:10]

Age,Stock,stock_desc,Year,2,3,4,5
0,FCF,Chilliwack Fall Hatchery,1979,13142.0,6325.0,1606.0,104.0
1,FCF,Chilliwack Fall Hatchery,1980,13059.0,6284.0,1563.0,102.0
2,FCF,Chilliwack Fall Hatchery,1981,20058.0,6259.0,1562.0,100.0
3,FCF,Chilliwack Fall Hatchery,1982,11519.0,9633.0,1556.0,100.0
4,FCF,Chilliwack Fall Hatchery,1983,43599.0,5538.0,2389.0,99.0
5,FCF,Chilliwack Fall Hatchery,1984,59601.0,20532.0,948.0,48.0
6,FCF,Chilliwack Fall Hatchery,1985,184987.0,30440.0,4135.0,18.0
7,FCF,Chilliwack Fall Hatchery,1986,204820.0,93906.0,11463.0,722.0
8,FCF,Chilliwack Fall Hatchery,1987,5557.0,103682.0,35959.0,1569.0
9,FCF,Chilliwack Fall Hatchery,1988,28958.0,2672.0,41604.0,2538.0


In [136]:
# export
pivPSC.to_csv (r'CTCPivoted.csv', index = True, header=True)

### (3) Get the length-at-age data from CWT / RMIS 

In [323]:
CWT_df = pd.read_csv("CWTRecoveries_NewFields.csv")
# get only required fields
# note weight is usually empty
CWT_df = CWT_df[['PSC','PSC_desc','recovery_year','brood_year','recovery_age','rec_season','SalishSea','weight','length']]

# get only salish sea recoveries
CWTSS_df = CWT_df.loc[(CWT_df['SalishSea'] == 'in') & (pd.to_numeric(CWT_df['recovery_year']) >= 1979)
                     & (pd.to_numeric(CWT_df['recovery_age']) >= 2)]
CWTSS_df = CWTSS_df.sort_values(['PSC','recovery_year','rec_season'])

# pivot so ages become columns
pivCWT = pd.pivot_table(CWTSS_df, values = 'length', aggfunc=[np.mean, 'count'], index=['PSC','PSC_desc','recovery_year','rec_season'], columns = 'recovery_age').reset_index()

pd.set_option('display.precision', 0)  # full option name; the short 'precision' is ambiguous in newer pandas
pivCWT = pivCWT.sort_values(['PSC','rec_season', 'recovery_year'])

# validate the results (check a year)
validate = CWTSS_df.loc[(CWTSS_df['PSC'] == 'FCF') & (pd.to_numeric(CWTSS_df['recovery_year']) == 1984)
                     & (pd.to_numeric(CWTSS_df['recovery_age']) == 2) & (CWTSS_df['rec_season'] == 'fall')]

# simplify hierarchical column names for join later
for age in range(2, 7):
    pivCWT[f'SS_mean_{age}'] = pivCWT['mean', age]
for age in range(2, 7):
    pivCWT[f'SS_count_{age}'] = pivCWT['count', age]

# remove redundant columns 
pivCWT_SS = pivCWT[['PSC', 'PSC_desc','recovery_year','rec_season','SS_mean_2','SS_mean_3','SS_mean_4','SS_mean_5','SS_mean_6',
                   'SS_count_2','SS_count_3','SS_count_4','SS_count_5','SS_count_6']]

# export
pivCWT_SS.to_csv (r'CWTRecovSS_Lengths.csv', index = True, header=True)
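
The pivot above produces two-level column names such as `('mean', 2)`, which are then copied one by one into flat names. A minimal sketch of an alternative is shown here: flatten all pivoted columns in a single step before resetting the index. The data frame `recov` and its stock codes are made up for illustration; only the column naming pattern (`SS_mean_2`, `SS_count_2`, ...) follows the notebook.

```python
import pandas as pd

# Made-up recoveries for illustration only
recov = pd.DataFrame({
    "PSC": ["AAA", "AAA", "AAA", "BBB"],
    "rec_season": ["fall", "fall", "fall", "fall"],
    "recovery_age": [2, 2, 3, 2],
    "length": [400.0, 500.0, 650.0, 420.0],
})

# Same pivot pattern as above: mean and count of length, ages as columns
piv = pd.pivot_table(recov, values="length", aggfunc=["mean", "count"],
                     index=["PSC", "rec_season"], columns="recovery_age")

# Flatten ('mean', 2) -> 'SS_mean_2', ('count', 3) -> 'SS_count_3', etc.
piv.columns = [f"SS_{stat}_{age}" for stat, age in piv.columns]
piv = piv.reset_index()
print(piv)
```

Flattening before `reset_index()` avoids the leftover empty column level, so no `droplevel` call is needed before later joins.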


#### (3.1) Group length-at-age differently if few recoveries

In [324]:
# sample size for length-at-age is often very low
# select records including those OUTSIDE the Salish Sea, to use when sample size is < 20
CWTALL_df = CWT_df.loc[(pd.to_numeric(CWT_df['recovery_year']) >= 1979)
                     & (pd.to_numeric(CWT_df['recovery_age']) >= 2)]
CWTALL_df = CWTALL_df.sort_values(['PSC','recovery_year','rec_season'])
pivCWT_AllAreas = pd.pivot_table(CWTALL_df, values = 'length', aggfunc=[np.mean, 'count'], index=['PSC','PSC_desc','recovery_year','rec_season'], columns = 'recovery_age').reset_index()
pivCWT_AllAreas = pivCWT_AllAreas.sort_values(['PSC','rec_season', 'recovery_year'])
pivCWT_AllAreas[:10]

Unnamed: 0_level_0,PSC,PSC_desc,recovery_year,rec_season,mean,mean,mean,mean,mean,count,count,count,count,count
recovery_age,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,2,3,4,5,6,2,3,4,5,6
0,FCF,Chilliwack Fall Hatchery,1983,fall,472,,,,,169,,,,
4,FCF,Chilliwack Fall Hatchery,1984,fall,423,657.0,,,,25,173.0,,,
8,FCF,Chilliwack Fall Hatchery,1985,fall,428,605.0,792.0,,,30,37.0,68.0,,
12,FCF,Chilliwack Fall Hatchery,1986,fall,452,591.0,805.0,909.0,,60,47.0,15.0,11.0,
16,FCF,Chilliwack Fall Hatchery,1987,fall,463,633.0,869.0,820.0,,7,92.0,20.0,1.0,
20,FCF,Chilliwack Fall Hatchery,1988,fall,455,583.0,885.0,935.0,,30,14.0,43.0,1.0,
24,FCF,Chilliwack Fall Hatchery,1989,fall,397,691.0,883.0,1000.0,,11,26.0,3.0,4.0,
28,FCF,Chilliwack Fall Hatchery,1990,fall,403,592.0,886.0,,,59,14.0,8.0,,
32,FCF,Chilliwack Fall Hatchery,1991,fall,461,637.0,834.0,,,39,74.0,6.0,,
36,FCF,Chilliwack Fall Hatchery,1992,fall,426,683.0,835.0,655.0,,17,83.0,38.0,1.0,


In [325]:
# simplify hierarchical column names for join later
for age in range(2, 7):
    pivCWT_AllAreas[f'alla_mean_{age}'] = pivCWT_AllAreas['mean', age]
for age in range(2, 7):
    pivCWT_AllAreas[f'alla_count_{age}'] = pivCWT_AllAreas['count', age]

# trim redundant fields: drop every column whose top level is 'mean' or 'count'
pivCWT_AllAreas = pivCWT_AllAreas.drop(columns=['mean', 'count'], level=0)

# get rid of empty hierarchy (messes up join later)
pivCWT_AllAreas.columns = pivCWT_AllAreas.columns.droplevel('recovery_age')
pivCWT_SS.columns = pivCWT_SS.columns.droplevel('recovery_age')

In [326]:
# get mean length-at-age by stock and season taken from all years
pivCWT_AllYears = pd.pivot_table(CWTALL_df, values = 'length', aggfunc=[np.mean, 'count'], index=['PSC','PSC_desc','rec_season'], columns = 'recovery_age').reset_index()
pivCWT_AllYears = pivCWT_AllYears.sort_values(['PSC','rec_season'])


# simplify hierarchical column names for join later
for age in range(2, 7):
    pivCWT_AllYears[f'ally_mean_{age}'] = pivCWT_AllYears['mean', age]
for age in range(2, 7):
    pivCWT_AllYears[f'ally_count_{age}'] = pivCWT_AllYears['count', age]

# trim redundant fields: drop every column whose top level is 'mean' or 'count'
pivCWT_AllYears = pivCWT_AllYears.drop(columns=['mean', 'count'], level=0)

# drop empty hierarchical column
pivCWT_AllYears.columns = pivCWT_AllYears.columns.droplevel('recovery_age')

pivCWT_AllYears[:5]

Unnamed: 0,PSC,PSC_desc,rec_season,ally_mean_2,ally_mean_3,ally_mean_4,ally_mean_5,ally_mean_6,ally_count_2,ally_count_3,ally_count_4,ally_count_5,ally_count_6
0,FCF,Chilliwack Fall Hatchery,fall,468,691,853,921,,646,1858,599,34,
1,FCF,Chilliwack Fall Hatchery,spring,69,649,784,869,,2,986,708,26,
2,FCF,Chilliwack Fall Hatchery,summer,454,669,810,862,,134,3074,1171,38,
3,FCF,Chilliwack Fall Hatchery,winter,482,536,681,630,,120,277,68,1,
4,FHF,Harrison Fall,fall,463,653,821,858,,335,795,300,12,
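
The three tables built in this section (Salish Sea only, all areas by year, all areas pooled over years) are meant to back each other up when sample sizes are small. A minimal sketch of one way to apply that fallback for a single age class is given below. The threshold of 20 comes from the comment above; the tiered order (Salish Sea, then all areas, then all years) and the output columns `len_3` and `len_3_source` are assumptions made for illustration, and the values are made up.

```python
import numpy as np
import pandas as pd

MIN_N = 20  # minimum sample size, taken from the comment above

# Made-up age-3 summaries using the notebook's column naming
demo = pd.DataFrame({
    "SS_mean_3":    [650.0, 585.0, np.nan],
    "SS_count_3":   [25.0, 2.0, np.nan],
    "alla_mean_3":  [653.5, 632.7, 499.5],
    "alla_count_3": [52.0, 52.0, 4.0],
    "ally_mean_3":  [649.3, 649.3, 649.3],
    "ally_count_3": [986.0, 986.0, 986.0],
})

# Conditions are checked in order; a missing count (NaN) compares as False
conditions = [demo["SS_count_3"] >= MIN_N, demo["alla_count_3"] >= MIN_N]

# Use the most specific mean that has enough samples, else the all-years mean
demo["len_3"] = np.select(conditions,
                          [demo["SS_mean_3"], demo["alla_mean_3"]],
                          default=demo["ally_mean_3"])
# Record which table each value came from
demo["len_3_source"] = np.select(conditions, ["SS", "all_areas"], default="all_years")
print(demo[["len_3", "len_3_source"]])
```

Keeping a source column makes it easy to report how often the Salish Sea estimate had to be replaced by a pooled one.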


### (4) Merge abundance and stock distribution estimates

In [None]:
# merge the CWT data and the abundance data

In [148]:
# Merge abundance and stock dist
pd.set_option('display.precision', 2)  # full option name; the short 'precision' is ambiguous in newer pandas
AS_df = pd.merge(pivPSC, stocktiming_df, on='Stock', how='inner')
AS_df[:10]

Unnamed: 0,Stock,stock_desc,Year,2,3,4,5,area,runtype,season,proportion
0,FCF,Chilliwack Fall Hatchery,1979,13142.0,6325.0,1606.0,104.0,fraser,fall,spring,0.35
1,FCF,Chilliwack Fall Hatchery,1979,13142.0,6325.0,1606.0,104.0,fraser,fall,summer,0.35
2,FCF,Chilliwack Fall Hatchery,1979,13142.0,6325.0,1606.0,104.0,fraser,fall,fall,0.4
3,FCF,Chilliwack Fall Hatchery,1980,13059.0,6284.0,1563.0,102.0,fraser,fall,spring,0.35
4,FCF,Chilliwack Fall Hatchery,1980,13059.0,6284.0,1563.0,102.0,fraser,fall,summer,0.35
5,FCF,Chilliwack Fall Hatchery,1980,13059.0,6284.0,1563.0,102.0,fraser,fall,fall,0.4
6,FCF,Chilliwack Fall Hatchery,1981,20058.0,6259.0,1562.0,100.0,fraser,fall,spring,0.35
7,FCF,Chilliwack Fall Hatchery,1981,20058.0,6259.0,1562.0,100.0,fraser,fall,summer,0.35
8,FCF,Chilliwack Fall Hatchery,1981,20058.0,6259.0,1562.0,100.0,fraser,fall,fall,0.4
9,FCF,Chilliwack Fall Hatchery,1982,11519.0,9633.0,1556.0,100.0,fraser,fall,spring,0.35


In [149]:
# create columns for abundance weighted by proportion in SS 
AS_df['2_w'] = (AS_df[2] * AS_df['proportion'])
AS_df['3_w'] = (AS_df[3] * AS_df['proportion'])
AS_df['4_w'] = (AS_df[4] * AS_df['proportion'])
AS_df['5_w'] = (AS_df[5] * AS_df['proportion'])

# to do joins the column names have to match
AS_df['PSC'] = AS_df['Stock']
AS_df['recovery_year'] = AS_df['Year']
AS_df['rec_season'] = AS_df['season']

AS_df[:10]

Unnamed: 0,Stock,stock_desc,Year,2,3,4,5,area,runtype,season,proportion,2_w,3_w,4_w,5_w,PSC,recovery_year,rec_season
0,FCF,Chilliwack Fall Hatchery,1979,13142.0,6325.0,1606.0,104.0,fraser,fall,spring,0.35,4599.7,2213.75,562.1,36.4,FCF,1979,spring
1,FCF,Chilliwack Fall Hatchery,1979,13142.0,6325.0,1606.0,104.0,fraser,fall,summer,0.35,4599.7,2213.75,562.1,36.4,FCF,1979,summer
2,FCF,Chilliwack Fall Hatchery,1979,13142.0,6325.0,1606.0,104.0,fraser,fall,fall,0.4,5256.8,2530.0,642.4,41.6,FCF,1979,fall
3,FCF,Chilliwack Fall Hatchery,1980,13059.0,6284.0,1563.0,102.0,fraser,fall,spring,0.35,4570.65,2199.4,547.05,35.7,FCF,1980,spring
4,FCF,Chilliwack Fall Hatchery,1980,13059.0,6284.0,1563.0,102.0,fraser,fall,summer,0.35,4570.65,2199.4,547.05,35.7,FCF,1980,summer
5,FCF,Chilliwack Fall Hatchery,1980,13059.0,6284.0,1563.0,102.0,fraser,fall,fall,0.4,5223.6,2513.6,625.2,40.8,FCF,1980,fall
6,FCF,Chilliwack Fall Hatchery,1981,20058.0,6259.0,1562.0,100.0,fraser,fall,spring,0.35,7020.3,2190.65,546.7,35.0,FCF,1981,spring
7,FCF,Chilliwack Fall Hatchery,1981,20058.0,6259.0,1562.0,100.0,fraser,fall,summer,0.35,7020.3,2190.65,546.7,35.0,FCF,1981,summer
8,FCF,Chilliwack Fall Hatchery,1981,20058.0,6259.0,1562.0,100.0,fraser,fall,fall,0.4,8023.2,2503.6,624.8,40.0,FCF,1981,fall
9,FCF,Chilliwack Fall Hatchery,1982,11519.0,9633.0,1556.0,100.0,fraser,fall,spring,0.35,4031.65,3371.55,544.6,35.0,FCF,1982,spring


### (5) Join the abundance, stock distr, and CWT recovery data

In [329]:
# join the above table to the length-at-age observations from CWT
# start with length-at-age from only salish sea recovs
pd.set_option('display.precision', 2)  # full option name; the short 'precision' is ambiguous in newer pandas

ASL_df = pd.merge(AS_df, pivCWT_SS, on=['PSC','recovery_year','rec_season'], how='inner')
# join second data frame containing stats for all areas by stock, year, recovery season
ASL_df2 = pd.merge(ASL_df, pivCWT_AllAreas, on=['PSC','recovery_year','rec_season'], how='inner')
# join third data frame containing stats for all areas by stock, recovery season
ASL_df3 = pd.merge(ASL_df2, pivCWT_AllYears, on=['PSC','rec_season'], how='inner')
ASL_df3[0:10]

Unnamed: 0,Stock,stock_desc,Year,2,3,4,5,area,runtype,season,proportion,2_w,3_w,4_w,5_w,PSC,recovery_year,rec_season,PSC_desc_x,SS_mean_2,SS_mean_3,SS_mean_4,SS_mean_5,SS_mean_6,SS_count_2,SS_count_3,SS_count_4,SS_count_5,SS_count_6,PSC_desc_y,alla_mean_2,alla_mean_3,alla_mean_4,alla_mean_5,alla_mean_6,alla_count_2,alla_count_3,alla_count_4,alla_count_5,alla_count_6,PSC_desc,ally_mean_2,ally_mean_3,ally_mean_4,ally_mean_5,ally_mean_6,ally_count_2,ally_count_3,ally_count_4,ally_count_5,ally_count_6
0,FCF,Chilliwack Fall Hatchery,1983,43599.0,5538.0,2389.0,99.0,fraser,fall,spring,0.35,15259.65,1938.3,836.15,34.65,FCF,1983,spring,Chilliwack Fall Hatchery,68.0,,,,,1.0,,,,,Chilliwack Fall Hatchery,68.0,,,,,1.0,,,,,Chilliwack Fall Hatchery,69.0,649.31,784.04,869.08,,2.0,986.0,708.0,26.0,
1,FCF,Chilliwack Fall Hatchery,1984,59601.0,20532.0,948.0,48.0,fraser,fall,spring,0.35,20860.35,7186.2,331.8,16.8,FCF,1984,spring,Chilliwack Fall Hatchery,,650.0,,,,,1.0,,,,Chilliwack Fall Hatchery,,653.5,,,,,52.0,,,,Chilliwack Fall Hatchery,69.0,649.31,784.04,869.08,,2.0,986.0,708.0,26.0,
2,FCF,Chilliwack Fall Hatchery,1985,184987.0,30440.0,4135.0,18.0,fraser,fall,spring,0.35,64745.45,10654.0,1447.25,6.3,FCF,1985,spring,Chilliwack Fall Hatchery,,585.0,676.67,,,,2.0,3.0,,,Chilliwack Fall Hatchery,,632.67,741.1,,,,6.0,40.0,,,Chilliwack Fall Hatchery,69.0,649.31,784.04,869.08,,2.0,986.0,708.0,26.0,
3,FCF,Chilliwack Fall Hatchery,1986,204820.0,93906.0,11463.0,722.0,fraser,fall,spring,0.35,71687.0,32867.1,4012.05,252.7,FCF,1986,spring,Chilliwack Fall Hatchery,,331.5,,,,,2.0,0.0,0.0,,Chilliwack Fall Hatchery,,499.5,,,,,4.0,0.0,0.0,,Chilliwack Fall Hatchery,69.0,649.31,784.04,869.08,,2.0,986.0,708.0,26.0,
4,FCF,Chilliwack Fall Hatchery,1987,5557.0,103682.0,35959.0,1569.0,fraser,fall,spring,0.35,1944.95,36288.7,12585.65,549.15,FCF,1987,spring,Chilliwack Fall Hatchery,,640.0,760.0,,,,2.0,2.0,,,Chilliwack Fall Hatchery,,630.0,755.0,,,,3.0,4.0,0.0,,Chilliwack Fall Hatchery,69.0,649.31,784.04,869.08,,2.0,986.0,708.0,26.0,
5,FCF,Chilliwack Fall Hatchery,1988,28958.0,2672.0,41604.0,2538.0,fraser,fall,spring,0.35,10135.3,935.2,14561.4,888.3,FCF,1988,spring,Chilliwack Fall Hatchery,,559.0,788.57,,,,1.0,7.0,,,Chilliwack Fall Hatchery,,559.0,792.5,,,,1.0,8.0,,,Chilliwack Fall Hatchery,69.0,649.31,784.04,869.08,,2.0,986.0,708.0,26.0,
6,FCF,Chilliwack Fall Hatchery,1989,64664.0,14606.0,1028.0,1732.0,fraser,fall,spring,0.35,22632.4,5112.1,359.8,606.2,FCF,1989,spring,Chilliwack Fall Hatchery,,600.0,,,,,1.0,,,,Chilliwack Fall Hatchery,,683.33,,,,,3.0,,,,Chilliwack Fall Hatchery,69.0,649.31,784.04,869.08,,2.0,986.0,708.0,26.0,
7,FCF,Chilliwack Fall Hatchery,1990,377137.0,31260.0,4639.0,2.0,fraser,fall,spring,0.35,131997.95,10941.0,1623.65,0.7,FCF,1990,spring,Chilliwack Fall Hatchery,,116.0,720.0,,,0.0,1.0,1.0,,,Chilliwack Fall Hatchery,,393.0,757.5,,,0.0,2.0,4.0,,,Chilliwack Fall Hatchery,69.0,649.31,784.04,869.08,,2.0,986.0,708.0,26.0,
8,FCF,Chilliwack Fall Hatchery,1991,222230.0,198126.0,11390.0,190.0,fraser,fall,spring,0.35,77780.5,69344.1,3986.5,66.5,FCF,1991,spring,Chilliwack Fall Hatchery,70.0,595.55,790.0,,,1.0,22.0,4.0,,,Chilliwack Fall Hatchery,70.0,595.55,790.0,,,1.0,22.0,4.0,,,Chilliwack Fall Hatchery,69.0,649.31,784.04,869.08,,2.0,986.0,708.0,26.0,
9,FCF,Chilliwack Fall Hatchery,1992,73629.0,116212.0,86136.0,647.0,fraser,fall,spring,0.35,25770.15,40674.2,30147.6,226.45,FCF,1992,spring,Chilliwack Fall Hatchery,,573.69,757.33,,,,13.0,15.0,,,Chilliwack Fall Hatchery,,541.07,778.33,,,,14.0,18.0,,,Chilliwack Fall Hatchery,69.0,649.31,784.04,869.08,,2.0,986.0,708.0,26.0,


In [330]:
len(ASL_df3)


1076
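
Because the joins in this section use `how='inner'`, any stock, year and season combination without CWT recoveries is dropped silently (the abundance table starts in 1979, while the joined table shown above starts in 1983). A minimal sketch of how the dropped rows could be listed is shown below, using a left join with pandas' `indicator=True` option. The two small data frames and their values are made up for illustration.

```python
import pandas as pd

keys = ["PSC", "recovery_year", "rec_season"]

# Made-up abundance rows (left side of the join)
abund = pd.DataFrame({
    "PSC": ["AAA", "AAA", "BBB"],
    "recovery_year": [1979, 1980, 1979],
    "rec_season": ["fall", "fall", "spring"],
    "abund_w": [100.0, 120.0, 80.0],
})

# Made-up length-at-age rows (right side of the join); AAA 1979 is missing
lengths = pd.DataFrame({
    "PSC": ["AAA", "BBB"],
    "recovery_year": [1980, 1979],
    "rec_season": ["fall", "spring"],
    "SS_mean_3": [640.0, 600.0],
})

# Left join keeps every abundance row; '_merge' flags rows with no match
check = pd.merge(abund, lengths, on=keys, how="left", indicator=True)
unmatched = check.loc[check["_merge"] == "left_only", keys]
print(f"{len(unmatched)} of {len(check)} abundance rows have no length data")
print(unmatched)
```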

### (6) Get Catch Data from PSC CTC Model

In [153]:
# CSV below was extracted from CTC spreadsheet from C. Parken
Catch_df = pd.read_csv("CTC_Catch.csv")
Catch_df.dtypes
#len(Catch_df)

Year                                   int64  
FishNum                                int64  
Fishery                                object 
StockNum                               int64  
Stock                                  object 
Age                                    int64  
Scaled to Observed Catch               object 
Model Catch to Observed Catch Ratio    float64
Catch                                  float64
Shakers                                float64
CNR Legals                             float64
CNR Sublegals                          float64
AEQ Catch                              float64
AEQ Shakers                            float64
AEQ CNRLeg                             float64
AEQ CNRSubLeg                          float64
dtype: object

In [154]:
# only select records from Salish Sea, excluding terminal fisheries
#  (following 'FishNum' codes: 8,13,14,15,25,26,35,36,38,39)
#  (these are selected based on codes provided by C. Parken / CTC)
# keep only catch records from Salish Sea fisheries
SS_fish_nums = [8, 13, 14, 15, 25, 26, 35, 36, 38, 39]
Catch_df = Catch_df.loc[Catch_df['FishNum'].isin(SS_fish_nums)]
len(Catch_df)

30240

In [1]:
# pivot so age and fishery are columns
# get only some columns and pivot
Catch_df = Catch_df[['Year','FishNum','Stock','Age','Catch', 'Shakers']]
pivCatch = pd.pivot_table(Catch_df, values = ['Catch','Shakers'], index=['Stock','Year'], columns = ['Age','FishNum']).reset_index()



In [102]:
#pd.set_option('display.max_columns', None)  
#pivCatch[0:10]


In [156]:
# sum the catch from all fisheries for each age, stock and year
#  note on iloc slicing: the first number is the position of the start column (inclusive),
#  the second number is the position of the stop column (exclusive), e.g. 2:12 selects columns 2 through 11
pivCatch['TotCatch_Age2'] = pivCatch.iloc[:,2:12].sum(1)
pivCatch['TotCatch_Age3'] = pivCatch.iloc[:,12:22].sum(1)
pivCatch['TotCatch_Age4'] = pivCatch.iloc[:,22:32].sum(1)
pivCatch['TotCatch_Age5'] = pivCatch.iloc[:,32:42].sum(1)
pivCatch['TotShakers_Age2'] = pivCatch.iloc[:,42:52].sum(1)
pivCatch['TotShakers_Age3'] = pivCatch.iloc[:,52:62].sum(1)
pivCatch['TotShakers_Age4'] = pivCatch.iloc[:,62:72].sum(1)
pivCatch['TotShakers_Age5'] = pivCatch.iloc[:,72:82].sum(1)
pivCatch['TotalCatch'] = pivCatch.iloc[:,2:42].sum(1)
pivCatch['TotalShakers'] = pivCatch.iloc[:,42:82].sum(1)
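
The positional `iloc` sums above depend on the exact number and order of fisheries and ages in the pivot. A minimal sketch of a label-based alternative is shown below: sum over fisheries with `groupby` on the long table, then spread ages into columns. The input values are made up for illustration; the output column names follow the ones used above.

```python
import pandas as pd

# Made-up long-format catch records for illustration only
catch_long = pd.DataFrame({
    "Stock":   ["AAA"] * 4,
    "Year":    [1979] * 4,
    "Age":     [2, 2, 3, 3],
    "FishNum": [8, 13, 8, 13],
    "Catch":   [10.0, 5.0, 7.0, 3.0],
    "Shakers": [1.0, 0.5, 0.2, 0.3],
})

# Sum over fisheries for each stock, year and age
by_age = catch_long.groupby(["Stock", "Year", "Age"])[["Catch", "Shakers"]].sum()

# Spread ages into columns and flatten names, e.g. ('Catch', 2) -> 'TotCatch_Age2'
wide = by_age.unstack("Age")
wide.columns = [f"Tot{name}_Age{age}" for name, age in wide.columns]
wide = wide.reset_index()

# Totals across ages, selected by column name rather than position
wide["TotalCatch"] = wide.filter(like="TotCatch_Age").sum(axis=1)
wide["TotalShakers"] = wide.filter(like="TotShakers_Age").sum(axis=1)
print(wide)
```

This version keeps working if a fishery code is added or removed, since no column positions are hard-coded.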

In [157]:
pivCatch[0:10]

Unnamed: 0_level_0,Stock,Year,Catch,Catch,Catch,Catch,Catch,Catch,Catch,Catch,Catch,Catch,Catch,Catch,Catch,Catch,Catch,Catch,Catch,Catch,Catch,Catch,Catch,Catch,Catch,Catch,Catch,Catch,Catch,Catch,Catch,Catch,Catch,Catch,Catch,Catch,Catch,Catch,Catch,Catch,Catch,Catch,Shakers,Shakers,Shakers,Shakers,Shakers,Shakers,Shakers,Shakers,Shakers,Shakers,Shakers,Shakers,Shakers,Shakers,Shakers,Shakers,Shakers,Shakers,Shakers,Shakers,Shakers,Shakers,Shakers,Shakers,Shakers,Shakers,Shakers,Shakers,Shakers,Shakers,Shakers,Shakers,Shakers,Shakers,Shakers,Shakers,Shakers,Shakers,Shakers,Shakers,TotCatch_Age2,TotCatch_Age3,TotCatch_Age4,TotCatch_Age5,TotShakers_Age2,TotShakers_Age3,TotShakers_Age4,TotShakers_Age5,TotalCatch,TotalShakers
Age,Unnamed: 1_level_1,Unnamed: 2_level_1,2,2,2,2,2,2,2,2,2,2,3,3,3,3,3,3,3,3,3,3,4,4,4,4,4,4,4,4,4,4,5,5,5,5,5,5,5,5,5,5,2,2,2,2,2,2,2,2,2,2,3,3,3,3,3,3,3,3,3,3,4,4,4,4,4,4,4,4,4,4,5,5,5,5,5,5,5,5,5,5,Unnamed: 83_level_1,Unnamed: 84_level_1,Unnamed: 85_level_1,Unnamed: 86_level_1,Unnamed: 87_level_1,Unnamed: 88_level_1,Unnamed: 89_level_1,Unnamed: 90_level_1,Unnamed: 91_level_1,Unnamed: 92_level_1
FishNum,Unnamed: 1_level_2,Unnamed: 2_level_2,8,13,14,15,25,26,35,36,38,39,8,13,14,15,25,26,35,36,38,39,8,13,14,15,25,26,35,36,38,39,8,13,14,15,25,26,35,36,38,39,8,13,14,15,25,26,35,36,38,39,8,13,14,15,25,26,35,36,38,39,8,13,14,15,25,26,35,36,38,39,8,13,14,15,25,26,35,36,38,39,Unnamed: 83_level_2,Unnamed: 84_level_2,Unnamed: 85_level_2,Unnamed: 86_level_2,Unnamed: 87_level_2,Unnamed: 88_level_2,Unnamed: 89_level_2,Unnamed: 90_level_2,Unnamed: 91_level_2,Unnamed: 92_level_2
0,FCF,1979,72.73,442.86,123.41,187.66,40.17,5.54,9.38,64.4,312.58,1.6,1310.21,70.95,29.2,113.54,21.86,3.66,31.79,282.99,642.32,12.63,434.16,25.67,26.49,0.0,6.06,95.01,17.1,8.19,230.81,4.13,25.51,0.0,3.13,0.0,0.24,1.3,0.82,0.0,14.58,0.0,42.1,22.28,28.8,30.81,5.12,4.46,26.27,94.63,34.69,1.77,23.43,0.66,2.5,2.68,0.43,0.95,4.77,41.61,44.69,0.92,7.41,0.0,0.11,0.0,0.01,0.07,2.49,1.22,15.95,0.29,0.45,0.0,0.01,0.0,0.01,0.04,0.11,0.0,1.01,0.0,1260.32,2519.15,847.61,45.57,290.94,122.66,27.56,1.63,4672.65,442.79
1,FCF,1980,92.11,582.43,181.69,186.47,54.73,5.52,8.08,61.76,197.39,1.76,1658.78,93.29,43.0,112.8,29.77,3.66,27.36,271.37,405.56,13.88,538.4,33.18,38.27,0.0,8.1,92.68,14.42,7.68,142.74,4.44,32.11,0.0,4.59,0.0,0.31,1.29,0.71,0.0,9.15,0.0,46.63,30.1,39.41,42.3,5.79,3.04,23.03,94.76,21.35,1.76,29.46,0.88,3.43,3.68,0.49,0.65,4.12,39.94,28.2,1.0,9.18,0.0,0.14,0.0,0.01,0.05,2.11,1.16,9.86,0.31,0.55,0.0,0.02,0.0,0.01,0.03,0.1,0.0,0.63,0.0,1371.95,2659.46,879.92,48.15,308.17,111.86,22.81,1.34,4959.48,444.18
2,FCF,1981,109.81,790.54,205.53,286.42,92.59,8.5,11.15,57.03,327.64,2.25,1282.48,82.13,31.54,112.36,32.66,3.65,24.48,162.52,436.53,11.47,417.62,29.32,28.19,0.0,8.92,92.67,12.94,4.62,154.15,3.69,24.31,0.0,3.3,0.0,0.34,1.26,0.62,0.0,9.64,0.0,61.49,37.72,44.07,43.47,8.18,4.87,31.41,91.37,34.01,2.09,22.89,0.78,2.48,2.45,0.44,0.68,3.69,23.93,30.33,0.83,7.12,0.0,0.11,0.0,0.02,0.05,1.88,0.7,10.66,0.25,0.43,0.0,0.01,0.0,0.02,0.03,0.09,0.0,0.67,0.0,1891.46,2179.82,752.12,39.48,358.68,88.5,20.79,1.24,4862.88,469.21
3,FCF,1982,58.84,451.88,121.36,164.48,48.63,4.88,3.98,24.77,128.76,1.31,1841.6,125.8,49.91,172.92,45.98,5.6,23.46,189.13,459.76,18.07,388.26,29.03,28.88,0.0,8.13,92.33,8.03,3.48,105.11,3.76,22.68,0.0,3.39,0.0,0.31,1.26,0.39,0.0,6.6,0.0,25.63,21.75,21.28,25.71,3.75,2.67,10.42,41.47,10.85,1.18,32.51,1.16,3.21,3.88,0.55,0.99,3.52,27.87,31.82,1.3,6.61,0.0,0.12,0.0,0.01,0.05,1.17,0.53,7.26,0.26,0.4,0.0,0.01,0.0,0.01,0.03,0.06,0.0,0.46,0.0,1008.89,2932.23,667.01,34.63,164.71,106.82,16.01,0.97,4642.75,288.51
4,FCF,1983,118.16,1468.57,210.3,558.43,196.08,64.63,0.0,0.0,707.13,4.44,624.54,62.09,13.13,89.17,28.15,18.06,30.46,222.27,383.5,9.24,414.37,48.63,25.79,0.0,16.88,428.86,29.22,11.46,234.12,5.14,15.89,0.0,1.64,0.0,0.36,3.17,0.91,0.0,9.55,0.0,86.16,73.73,41.4,124.61,20.52,14.01,126.31,317.91,60.48,4.29,14.49,0.58,0.95,2.86,0.46,1.27,5.48,34.91,26.55,0.67,7.17,0.04,0.09,0.0,0.03,0.09,4.26,1.72,16.16,0.36,0.27,0.0,0.0,0.0,0.02,0.03,0.14,0.02,0.65,0.0,3327.73,1480.62,1214.48,31.51,869.43,88.2,29.91,1.13,6054.33,988.67
5,FCF,1984,101.22,2599.44,257.81,851.04,170.23,4.4,0.0,0.0,1491.49,4.9,1450.93,298.1,43.68,368.54,66.3,7.53,101.16,708.79,2193.69,27.63,102.98,23.17,8.52,0.0,3.95,41.29,10.39,3.92,143.25,1.64,4.82,0.0,0.67,0.0,0.11,0.38,0.39,0.0,7.12,0.0,95.69,132.86,53.47,228.06,17.81,3.48,167.49,322.6,131.05,4.58,36.4,2.88,3.33,14.19,1.09,1.93,18.48,110.12,151.94,2.0,1.79,0.0,0.03,0.0,0.0,0.03,1.52,0.59,9.89,0.12,0.08,0.0,0.0,0.0,0.0,0.01,0.07,0.0,0.49,0.0,5480.53,5266.34,339.11,13.49,1157.09,342.37,13.98,0.65,11099.47,1514.09
6,FCF,1985,186.12,2113.41,2206.93,2781.44,804.41,4.29,0.0,0.0,3568.46,19.61,1274.42,115.76,178.61,575.36,149.63,1.4,111.79,788.01,2507.09,52.82,266.21,27.56,106.58,0.0,27.28,56.55,33.76,12.81,481.85,9.24,1.05,0.0,0.83,0.0,0.07,0.05,0.1,0.0,2.03,0.0,188.94,100.24,412.46,516.64,62.95,11.36,364.67,705.34,302.18,17.05,32.75,1.05,12.26,15.36,1.83,1.2,20.17,121.94,173.53,3.8,4.64,0.03,0.52,0.0,0.04,0.15,4.92,1.9,33.27,0.65,0.02,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.14,0.0,11684.65,5754.89,1021.85,4.12,2681.83,383.9,46.11,0.18,18465.51,3112.02
7,FCF,1986,371.62,2493.54,2519.06,2872.0,451.23,25.98,0.0,0.0,3818.81,26.04,3736.3,380.53,568.01,1655.28,233.87,17.28,560.04,2343.51,7475.36,195.5,1307.94,84.87,317.43,0.0,39.93,534.26,152.0,34.21,1291.19,30.73,79.37,0.0,32.83,0.0,1.37,6.42,7.11,0.0,79.61,0.0,159.63,115.07,246.27,449.57,24.39,20.17,587.87,695.58,301.35,20.31,140.63,3.34,20.39,37.23,1.97,4.34,99.03,360.9,516.82,13.99,22.37,0.09,1.4,0.0,0.07,0.4,22.15,5.1,89.13,2.14,1.32,0.04,0.06,0.0,0.07,0.21,1.07,0.04,5.49,0.0,12578.3,17165.66,3792.55,206.71,2620.21,1198.65,142.84,8.3,33743.22,3969.99
8,FCF,1987,10.97,16.38,77.74,72.6,11.0,0.67,0.0,0.0,83.95,0.55,4484.79,101.74,713.32,1702.9,232.09,12.76,439.17,1662.72,6687.97,167.23,4460.62,83.08,1459.59,0.0,145.09,529.31,338.65,68.97,3282.18,74.7,187.62,0.0,82.57,0.0,2.71,3.48,10.97,0.0,140.27,0.0,4.72,0.67,7.29,5.86,0.63,2.29,8.98,10.41,6.56,0.45,173.45,0.79,24.58,19.74,2.09,14.09,74.77,253.95,462.31,11.99,80.77,0.1,7.39,0.0,0.18,1.75,49.32,10.25,226.56,5.21,3.29,0.03,0.19,0.0,0.11,0.51,1.63,0.04,9.68,0.0,273.86,16204.69,10442.19,427.64,47.85,1037.77,381.54,15.47,27348.38,1482.63
9,FCF,1988,37.29,85.91,416.27,362.21,26.38,5.99,0.0,0.0,256.03,2.12,75.45,2.64,18.88,42.01,2.75,1.38,10.5,58.84,111.85,3.55,3369.74,88.81,1593.34,0.0,70.91,1573.36,363.46,109.6,5510.43,159.16,198.14,0.0,133.75,0.0,1.96,15.33,16.48,0.0,344.78,0.0,11.83,4.21,48.16,102.89,1.82,5.61,44.56,91.28,65.42,32.08,2.47,0.03,0.8,1.71,0.03,0.42,1.79,9.08,12.27,3.28,59.99,0.09,6.42,0.0,0.09,1.43,52.94,16.36,390.63,17.95,3.44,0.03,0.25,0.0,0.08,0.61,2.45,0.12,24.08,0.19,1192.19,327.85,12838.81,710.45,407.86,31.88,545.9,31.25,15069.3,1016.9


In [158]:
pivCatch['TotCatchShake_Age2'] = pivCatch.loc[:,['TotCatch_Age2','TotShakers_Age2']].sum(1)
pivCatch['TotCatchShake_Age3'] = pivCatch.loc[:,['TotCatch_Age3','TotShakers_Age3']].sum(1)
pivCatch['TotCatchShake_Age4'] = pivCatch.loc[:,['TotCatch_Age4','TotShakers_Age4']].sum(1)
pivCatch['TotCatchShake_Age5'] = pivCatch.loc[:,['TotCatch_Age5','TotShakers_Age5']].sum(1)

In [159]:
pivCatch.iloc[:,[0,1,82,83,84,85,86,87,88,89,90,91,92,93,94,95]]

Unnamed: 0_level_0,Stock,Year,TotCatch_Age2,TotCatch_Age3,TotCatch_Age4,TotCatch_Age5,TotShakers_Age2,TotShakers_Age3,TotShakers_Age4,TotShakers_Age5,TotalCatch,TotalShakers,TotCatchShake_Age2,TotCatchShake_Age3,TotCatchShake_Age4,TotCatchShake_Age5
Age,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1
FishNum,Unnamed: 1_level_2,Unnamed: 2_level_2,Unnamed: 3_level_2,Unnamed: 4_level_2,Unnamed: 5_level_2,Unnamed: 6_level_2,Unnamed: 7_level_2,Unnamed: 8_level_2,Unnamed: 9_level_2,Unnamed: 10_level_2,Unnamed: 11_level_2,Unnamed: 12_level_2,Unnamed: 13_level_2,Unnamed: 14_level_2,Unnamed: 15_level_2,Unnamed: 16_level_2
0,FCF,1979,1260.32,2519.15,847.61,45.57,290.94,122.66,27.56,1.63,4672.65,442.79,1551.26,2641.80,875.17,47.20
1,FCF,1980,1371.95,2659.46,879.92,48.15,308.17,111.86,22.81,1.34,4959.48,444.18,1680.12,2771.32,902.73,49.49
2,FCF,1981,1891.46,2179.82,752.12,39.48,358.68,88.50,20.79,1.24,4862.88,469.21,2250.15,2268.32,772.90,40.72
3,FCF,1982,1008.89,2932.23,667.01,34.63,164.71,106.82,16.01,0.97,4642.75,288.51,1173.60,3039.05,683.02,35.60
4,FCF,1983,3327.73,1480.62,1214.48,31.51,869.43,88.20,29.91,1.13,6054.33,988.67,4197.16,1568.82,1244.39,32.64
5,FCF,1984,5480.53,5266.34,339.11,13.49,1157.09,342.37,13.98,0.65,11099.47,1514.09,6637.62,5608.71,353.09,14.14
6,FCF,1985,11684.65,5754.89,1021.85,4.12,2681.83,383.90,46.11,0.18,18465.51,3112.02,14366.49,6138.78,1067.96,4.30
7,FCF,1986,12578.30,17165.66,3792.55,206.71,2620.21,1198.65,142.84,8.30,33743.22,3969.99,15198.51,18364.30,3935.40,215.01
8,FCF,1987,273.86,16204.69,10442.19,427.64,47.85,1037.77,381.54,15.47,27348.38,1482.63,321.72,17242.46,10823.73,443.11
9,FCF,1988,1192.19,327.85,12838.81,710.45,407.86,31.88,545.90,31.25,15069.30,1016.90,1600.06,359.73,13384.70,741.70


In [165]:
# merge df here with the dataframe from section 4 cohort table
# add the catch to the cohort estimates based on seasonal estimates
# simplify hierarchical column names for join later


In [171]:
# take a copy so later column assignments do not act on a view of pivCatch
pivCatch2 = pivCatch.iloc[:,[0,1,82,83,84,85,86,87,88,89,90,91,92,93,94,95]].copy()

# drop empty levels
pivCatch2.columns = pivCatch2.columns.droplevel('Age')
pivCatch2.columns = pivCatch2.columns.droplevel('FishNum')

In [245]:
# need to remove preceding space from 'Stock' codes
pivCatch2['Stock'] = pivCatch2['Stock'].str.lstrip()



### Final Joins, Export, Data Summary

In [331]:
Final_df = pd.merge(ASL_df3, pivCatch2, on=['Stock','Year'], how='inner')

In [332]:
len(ASL_df2)

1076

In [333]:
Final_df[0:10]

Unnamed: 0,Stock,stock_desc,Year,2,3,4,5,area,runtype,season,proportion,2_w,3_w,4_w,5_w,PSC,recovery_year,rec_season,PSC_desc_x,SS_mean_2,SS_mean_3,SS_mean_4,SS_mean_5,SS_mean_6,SS_count_2,SS_count_3,SS_count_4,SS_count_5,SS_count_6,PSC_desc_y,alla_mean_2,alla_mean_3,alla_mean_4,alla_mean_5,alla_mean_6,alla_count_2,alla_count_3,alla_count_4,alla_count_5,alla_count_6,PSC_desc,ally_mean_2,ally_mean_3,ally_mean_4,ally_mean_5,ally_mean_6,ally_count_2,ally_count_3,ally_count_4,ally_count_5,ally_count_6,TotCatch_Age2,TotCatch_Age3,TotCatch_Age4,TotCatch_Age5,TotShakers_Age2,TotShakers_Age3,TotShakers_Age4,TotShakers_Age5,TotalCatch,TotalShakers,TotCatchShake_Age2,TotCatchShake_Age3,TotCatchShake_Age4,TotCatchShake_Age5
0,FCF,Chilliwack Fall Hatchery,1983,43599.0,5538.0,2389.0,99.0,fraser,fall,spring,0.35,15259.65,1938.3,836.15,34.65,FCF,1983,spring,Chilliwack Fall Hatchery,68.0,,,,,1.0,,,,,Chilliwack Fall Hatchery,68.0,,,,,1.0,,,,,Chilliwack Fall Hatchery,69.0,649.31,784.04,869.08,,2.0,986.0,708.0,26.0,,3327.73,1480.62,1214.48,31.51,869.43,88.2,29.91,1.13,6054.33,988.67,4197.16,1568.82,1244.39,32.64
1,FCF,Chilliwack Fall Hatchery,1983,43599.0,5538.0,2389.0,99.0,fraser,fall,summer,0.35,15259.65,1938.3,836.15,34.65,FCF,1983,summer,Chilliwack Fall Hatchery,395.5,,,,,6.0,,,,,Chilliwack Fall Hatchery,454.88,,,,,43.0,,,,,Chilliwack Fall Hatchery,454.49,669.47,810.12,861.84,,134.0,3074.0,1171.0,38.0,0.0,3327.73,1480.62,1214.48,31.51,869.43,88.2,29.91,1.13,6054.33,988.67,4197.16,1568.82,1244.39,32.64
2,FCF,Chilliwack Fall Hatchery,1983,43599.0,5538.0,2389.0,99.0,fraser,fall,fall,0.4,17439.6,2215.2,955.6,39.6,FCF,1983,fall,Chilliwack Fall Hatchery,455.21,,,,,48.0,,,,,Chilliwack Fall Hatchery,471.69,,,,,169.0,,,,,Chilliwack Fall Hatchery,467.96,690.96,853.38,920.62,,646.0,1858.0,599.0,34.0,,3327.73,1480.62,1214.48,31.51,869.43,88.2,29.91,1.13,6054.33,988.67,4197.16,1568.82,1244.39,32.64
3,FCF,Chilliwack Fall Hatchery,1984,59601.0,20532.0,948.0,48.0,fraser,fall,spring,0.35,20860.35,7186.2,331.8,16.8,FCF,1984,spring,Chilliwack Fall Hatchery,,650.0,,,,,1.0,,,,Chilliwack Fall Hatchery,,653.5,,,,,52.0,,,,Chilliwack Fall Hatchery,69.0,649.31,784.04,869.08,,2.0,986.0,708.0,26.0,,5480.53,5266.34,339.11,13.49,1157.09,342.37,13.98,0.65,11099.47,1514.09,6637.62,5608.71,353.09,14.14
4,FCF,Chilliwack Fall Hatchery,1984,59601.0,20532.0,948.0,48.0,fraser,fall,summer,0.35,20860.35,7186.2,331.8,16.8,FCF,1984,summer,Chilliwack Fall Hatchery,,372.25,,,,0.0,8.0,,,,Chilliwack Fall Hatchery,340.0,630.0,,,,1.0,247.0,,,,Chilliwack Fall Hatchery,454.49,669.47,810.12,861.84,,134.0,3074.0,1171.0,38.0,0.0,5480.53,5266.34,339.11,13.49,1157.09,342.37,13.98,0.65,11099.47,1514.09,6637.62,5608.71,353.09,14.14
5,FCF,Chilliwack Fall Hatchery,1984,59601.0,20532.0,948.0,48.0,fraser,fall,fall,0.4,23840.4,8212.8,379.2,19.2,FCF,1984,fall,Chilliwack Fall Hatchery,402.2,579.59,,,,5.0,17.0,,,,Chilliwack Fall Hatchery,423.36,656.94,,,,25.0,173.0,,,,Chilliwack Fall Hatchery,467.96,690.96,853.38,920.62,,646.0,1858.0,599.0,34.0,,5480.53,5266.34,339.11,13.49,1157.09,342.37,13.98,0.65,11099.47,1514.09,6637.62,5608.71,353.09,14.14
6,FCF,Chilliwack Fall Hatchery,1985,184987.0,30440.0,4135.0,18.0,fraser,fall,spring,0.35,64745.45,10654.0,1447.25,6.3,FCF,1985,spring,Chilliwack Fall Hatchery,,585.0,676.67,,,,2.0,3.0,,,Chilliwack Fall Hatchery,,632.67,741.1,,,,6.0,40.0,,,Chilliwack Fall Hatchery,69.0,649.31,784.04,869.08,,2.0,986.0,708.0,26.0,,11684.65,5754.89,1021.85,4.12,2681.83,383.9,46.11,0.18,18465.51,3112.02,14366.49,6138.78,1067.96,4.3
7,FCF,Chilliwack Fall Hatchery,1985,184987.0,30440.0,4135.0,18.0,fraser,fall,summer,0.35,64745.45,10654.0,1447.25,6.3,FCF,1985,summer,Chilliwack Fall Hatchery,,347.33,710.0,,,0.0,3.0,1.0,,,Chilliwack Fall Hatchery,466.5,619.02,780.76,,,4.0,40.0,96.0,,,Chilliwack Fall Hatchery,454.49,669.47,810.12,861.84,,134.0,3074.0,1171.0,38.0,0.0,11684.65,5754.89,1021.85,4.12,2681.83,383.9,46.11,0.18,18465.51,3112.02,14366.49,6138.78,1067.96,4.3
8,FCF,Chilliwack Fall Hatchery,1985,184987.0,30440.0,4135.0,18.0,fraser,fall,fall,0.4,73994.8,12176.0,1654.0,7.2,FCF,1985,fall,Chilliwack Fall Hatchery,410.5,340.14,753.9,,,6.0,7.0,10.0,,,Chilliwack Fall Hatchery,427.87,604.78,792.37,,,30.0,37.0,68.0,,,Chilliwack Fall Hatchery,467.96,690.96,853.38,920.62,,646.0,1858.0,599.0,34.0,,11684.65,5754.89,1021.85,4.12,2681.83,383.9,46.11,0.18,18465.51,3112.02,14366.49,6138.78,1067.96,4.3
9,FCF,Chilliwack Fall Hatchery,1986,204820.0,93906.0,11463.0,722.0,fraser,fall,spring,0.35,71687.0,32867.1,4012.05,252.7,FCF,1986,spring,Chilliwack Fall Hatchery,,331.5,,,,,2.0,0.0,0.0,,Chilliwack Fall Hatchery,,499.5,,,,,4.0,0.0,0.0,,Chilliwack Fall Hatchery,69.0,649.31,784.04,869.08,,2.0,986.0,708.0,26.0,,12578.3,17165.66,3792.55,206.71,2620.21,1198.65,142.84,8.3,33743.22,3969.99,15198.51,18364.3,3935.4,215.01


In [334]:
# export
Final_df.to_csv(r'Step2FinalOut.csv', index=True, header=True)

### A Bit of Data Exploration

In [345]:
# purpose is to create a table with the percentage of records, by season and age,
# that have more than 20 length observations

def pct_over_threshold(df, prefix, label, ages=range(2, 7),
                       seasons=("spring", "summer", "fall"), threshold=20):
    """Return one (label, age, spring %, summer %, fall %) tuple per age."""
    rows = []
    for age in ages:
        percents = []
        for season in seasons:
            df1 = df.loc[df['season'] == season]
            # NaN counts compare as False, so they count as 'not over threshold'
            numOfRows = int((df1[f'{prefix}_count_{age}'] > threshold).sum())
            percents.append(numOfRows / len(df1) * 100)
        rows.append((label, age, *percents))
    return rows

# recoveries inside the Salish Sea
recovsSS = pct_over_threshold(Final_df, 'SS', "Salish Sea Recoveries")

#***********************************
# for recovery df including outside SS
recovsALL = pct_over_threshold(Final_df, 'alla', "All Areas")

In [351]:
# fall records with more than 20 Salish Sea length observations in every age class
df1 = Final_df.loc[Final_df['season'] == "fall"]
ss_count_cols = [f'SS_count_{age}' for age in range(2, 7)]
seriesObj2 = (df1[ss_count_cols] > 20).all(axis=1)
seriesObj2

2       False
5       False
8       False
11      False
14      False
17      False
20      False
23      False
26      False
29      False
32      False
35      False
38      False
41      False
44      False
47      False
50      False
53      False
56      False
59      False
62      False
65      False
68      False
71      False
74      False
77      False
80      False
83      False
86      False
89      False
        ...  
993     False
995     False
997     False
999     False
1002    False
1005    False
1008    False
1011    False
1014    False
1017    False
1020    False
1023    False
1026    False
1029    False
1032    False
1035    False
1038    False
1041    False
1044    False
1047    False
1050    False
1053    False
1056    False
1059    False
1062    False
1065    False
1068    False
1071    False
1074    False
1075    False
Length: 363, dtype: bool
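
Rather than scanning a long Boolean Series by eye, it can be reduced to a few summary values. A small sketch on a made-up Series (the real `seriesObj2` is not reproduced here; the names `n_true`, `any_true`, and `true_rows` are illustrative):

```python
import pandas as pd

# made-up Boolean flags indexed like rows of a filtered DataFrame
flags = pd.Series([False, False, True, False], index=[2, 5, 8, 11])

n_true = int(flags.sum())             # how many rows pass the test
any_true = bool(flags.any())          # does any row pass at all?
true_rows = flags[flags].index.tolist()  # index labels of the passing rows

print(n_true, any_true, true_rows)
```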

In [343]:
print(recovsALL)

[('All Areas', 2, 0.0, 1.5189873417721518, 12.396694214876034), ('All Areas', 3, 18.238993710691823, 38.9873417721519, 48.760330578512395), ('All Areas', 4, 25.78616352201258, 41.51898734177215, 32.78236914600551), ('All Areas', 5, 0.0, 1.7721518987341773, 2.203856749311295), ('All Areas', 6, 0.0, 0.0, 0.0)]
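
The printed list of tuples is hard to read. One option is to load it into a labelled DataFrame. The sketch below copies the values from the output above; the column names are hypothetical, chosen only for this illustration.

```python
import pandas as pd

# values copied from the printed recovsALL output
recovs_all = [
    ('All Areas', 2, 0.0, 1.5189873417721518, 12.396694214876034),
    ('All Areas', 3, 18.238993710691823, 38.9873417721519, 48.760330578512395),
    ('All Areas', 4, 25.78616352201258, 41.51898734177215, 32.78236914600551),
    ('All Areas', 5, 0.0, 1.7721518987341773, 2.203856749311295),
    ('All Areas', 6, 0.0, 0.0, 0.0),
]

# hypothetical column names: percent of records with > 20 observations, by season
cols = ['source', 'age', 'pct_spring', 'pct_summer', 'pct_fall']
recovs_tbl = pd.DataFrame(recovs_all, columns=cols).round(1)
print(recovs_tbl)
```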


In [262]:
# create field for the length observation that will be used for each age class

In [None]:
# Remove all fields except year, stock, season, and length observation count (from all areas),
# then pivot so that season becomes a column.
# Then, for each year, stock, and age group, check whether there are more than 20 observations
# in at least one season (in all areas).
pivCWT_AllAreas = pd.pivot_table(CWTALL_df, values='length', aggfunc=[np.mean, 'count'],
                                 index=['PSC', 'PSC_desc', 'recovery_year', 'rec_season'],
                                 columns='recovery_age').reset_index()
# get only some fields

In [356]:
stats_df1 = Final_df[['Stock','Year','season','alla_count_2','alla_count_3','alla_count_4','alla_count_5','alla_count_6']]

In [358]:
stats_df2 = pd.pivot_table(stats_df1, values = ['alla_count_2','alla_count_3','alla_count_4','alla_count_5','alla_count_6'], index=['Stock','Year'], columns = ['season']).reset_index()

In [376]:
# flag columns (1 / 0): for each stock, year, and age, is there at least one season
# with > 20 observations? (despite the 'allseasons' name, one season is enough)
for age in range(2, 7):
    over20_any_season = ((stats_df2[f'alla_count_{age}', 'spring'] > 20) |
                         (stats_df2[f'alla_count_{age}', 'summer'] > 20) |
                         (stats_df2[f'alla_count_{age}', 'fall'] > 20))
    stats_df2[f'age{age}_allseasons'] = over20_any_season.astype(int)

In [378]:
# column for sum of ages w/ at least one season w/ > 20 obs
stats_df2['sum_allages'] = stats_df2['age2_allseasons'] + stats_df2['age3_allseasons'] + stats_df2['age4_allseasons'] + stats_df2['age5_allseasons'] + stats_df2['age6_allseasons']


In [383]:
stats_df2[['Year','Stock','sum_allages']]

Unnamed: 0_level_0,Year,Stock,sum_allages
season,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
0,1983,FCF,1
1,1984,FCF,2
2,1985,FCF,3
3,1986,FCF,2
4,1987,FCF,2
5,1988,FCF,2
6,1989,FCF,1
7,1990,FCF,2
8,1991,FCF,3
9,1992,FCF,2


In [384]:
stats_df3 = pd.pivot_table(stats_df2, values = ['sum_allages'], index=['Year'], columns = ['Stock']).reset_index()

In [385]:
# do pivot so we can generate a graphic
stats_df3

Unnamed: 0_level_0,Year,sum_allages,sum_allages,sum_allages,sum_allages,sum_allages,sum_allages,sum_allages,sum_allages,sum_allages,sum_allages,sum_allages,sum_allages,sum_allages
season,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1
Stock,Unnamed: 1_level_2,FCF,FHF,FS2,FS3,FSO,FSS,NKF,NKS,PSF,PSN,PSY,SKG,SNO
0,1979,,,0.0,0.0,0.0,,2.0,,1.0,3.0,0.0,0.0,1.0
1,1980,,,0.0,0.0,0.0,0.0,2.0,,1.0,3.0,0.0,1.0,1.0
2,1981,,,0.0,0.0,0.0,0.0,2.0,,2.0,2.0,1.0,0.0,0.0
3,1982,,,,,1.0,0.0,1.0,0.0,3.0,2.0,1.0,0.0,
4,1983,1.0,1.0,,0.0,0.0,0.0,1.0,0.0,4.0,3.0,2.0,,
5,1984,2.0,2.0,0.0,,0.0,0.0,3.0,1.0,2.0,3.0,2.0,,
6,1985,3.0,2.0,0.0,0.0,,1.0,0.0,0.0,1.0,1.0,1.0,,
7,1986,2.0,1.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,2.0,1.0,,
8,1987,2.0,1.0,0.0,0.0,1.0,1.0,0.0,1.0,1.0,1.0,0.0,,
9,1988,2.0,3.0,0.0,0.0,2.0,1.0,2.0,1.0,2.0,1.0,2.0,,0.0


In [386]:
stats_df3.to_csv(r'Step2DataVis.csv', index=True, header=True)

In [393]:
pivCWT_AllAreas.loc[(pivCWT_AllAreas['PSC'] == 'FS2')]

Unnamed: 0,PSC,PSC_desc,recovery_year,rec_season,alla_mean_2,alla_mean_3,alla_mean_4,alla_mean_5,alla_mean_6,alla_count_2,alla_count_3,alla_count_4,alla_count_5,alla_count_6
258,FS2,Fraser Spring 1.2,1979,fall,,,,,,0.0,,,,
265,FS2,Fraser Spring 1.2,1982,fall,,,,960.0,,,,,1.0,
268,FS2,Fraser Spring 1.2,1984,fall,,500.00,,,,,1.0,,,
273,FS2,Fraser Spring 1.2,1986,fall,600.0,,,,,2.0,,,,
276,FS2,Fraser Spring 1.2,1987,fall,,621.00,,,,,5.0,,0.0,
280,FS2,Fraser Spring 1.2,1988,fall,,534.00,,,,,4.0,,,
283,FS2,Fraser Spring 1.2,1989,fall,,612.00,698.00,,,,1.0,2.0,,
287,FS2,Fraser Spring 1.2,1990,fall,505.0,,,,,1.0,,,,
290,FS2,Fraser Spring 1.2,1991,fall,,514.50,656.00,,,,2.0,1.0,,
294,FS2,Fraser Spring 1.2,1992,fall,,543.50,705.00,,,,2.0,1.0,,
