# Logging Reptilia Workflow

import pandas as pd

## MCMC Commands

Key:
- "*" = Done
- "**" = In progress


File Name Notes:
- All Gamma models (-mG) have _G in the file name
- Gibbs models have Gibbs in the name

- First date = when the .pkl and sum.txt files are output
- Second date = when the ex_rates, sp_rates, per_species_rates, and mcmc files are output

CoVar Model (not BDNN): 8/15, 8/19
- *python PyRate.py reptilia/Reptilia_cleaned_pyrate_input_PyRate.py -trait_file data/reptilia/Reptilia_species_traits.txt -mCov 5 -logT 1 -pC 0 -fixShift data/Time_bins_CrossStage.txt -qShift data/Time_bins_ByStages.txt -mG -A 0 -n 20000000 -s 2000
    - This is: a Covar BD model with fixed times of rate shifts, log transformed traits, TPP and Gamma preservation model, parameter estimation MCMC

BDNN run 1: 8/15, 8/19
- *python PyRate.py reptilia/Reptilia_cleaned_pyrate_input_PyRate.py -j 1 -fixShift data/Time_bins_CrossStage.txt -BDNNmodel 1 -trait_file data/reptilia/Reptilia_species_traits.txt -qShift data/Time_bins_ByStages.txt -mG -A 0 -n 20000000 -s 2000
    - The traits file needed to contain normalized continuous variables with no nulls and consistent data types, saved as a tab-separated .txt
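
A minimal sketch of preparing a traits file that meets those requirements, using only the Python standard library. It assumes "normalized" means z-scoring each trait (mean 0, standard deviation 1) and that each input row is a dict with a "species" key plus one key per trait; the helper name `write_bdnn_traits` is hypothetical and not part of PyRate.

```python
import csv
import math
import statistics


def write_bdnn_traits(rows, trait_names, out_path):
    """Z-score each continuous trait and write a tab-separated traits file.

    Hypothetical helper: rows is a list of dicts with a "species" key and
    one key per trait. Returns the scaled columns for inspection.
    """
    # Coerce every trait to float so all columns share one data type,
    # and reject missing values (the traits file must contain no nulls).
    columns = {name: [] for name in trait_names}
    for row in rows:
        for name in trait_names:
            raw = row.get(name)
            if raw is None or str(raw).strip() == "":
                raise ValueError(f"missing {name} for {row['species']}")
            value = float(raw)
            if math.isnan(value):
                raise ValueError(f"NaN {name} for {row['species']}")
            columns[name].append(value)

    # Z-score each trait (assumed normalization: mean 0, sd 1).
    scaled = {}
    for name, values in columns.items():
        mean = statistics.mean(values)
        sd = statistics.pstdev(values)
        if sd == 0:
            raise ValueError(f"trait {name} is constant and cannot be scaled")
        scaled[name] = [(v - mean) / sd for v in values]

    # Write a header row plus one tab-separated row per species.
    with open(out_path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t")
        writer.writerow(["species", *trait_names])
        for i, row in enumerate(rows):
            writer.writerow([row["species"], *(scaled[n][i] for n in trait_names)])
    return scaled
```

Raising on missing values instead of silently dropping species keeps the species list in the traits file aligned with the occurrence input.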

BDNN run 2: 8/27, 8/28
- *python PyRate.py reptilia/Reptilia_cleaned_pyrate_input_PyRate.py -j 1 -fixShift data/Time_bins_ByStages.txt -BDNNmodel 1 -trait_file data/reptilia/Reptilia_species_traits.txt -qShift data/Time_bins_ByStages.txt -A 0 -n 20000000 -s 2000 -BDNNnodes 8 4 -BDNNupdate_f 0.05 0.05 0.25 -singleton 1
    - Removed -mG flag
    - Removed singletons using -singleton 1
    - Reduced network complexity:
        - -BDNNnodes 8 4
        - -BDNNupdate_f 0.05 0.05 0.25
    - Shifted dates toward the present to remove the empty interval between the LAD and the present day: -translate 175.0

BDNN run 3: Torsten Reduced Complexity + no -mG, 8/28, 8/29, 8/30, 9/4
- *python ../PyRate/PyRate.py reptilia/Reptilia_cleaned_pyrate_input_PyRate.py -j 1 -fixShift data/Time_bins_CrossStage.txt -BDNNmodel 1 -trait_file data/reptilia/Reptilia_species_traits.txt -qShift data/Time_bins_CrossStage.txt -n 50000000 -s 50000 -BDNNnodes 8 4 -translate -175
    - **Result**: low ESS for prior and BD_lik. Burn-in ~15%
- *python ../PyRate/PyRate.py reptilia/Reptilia_cleaned_pyrate_input_PyRate.py -j 1 -fixShift data/Time_bins_ByStages.txt -BDNNmodel 1 -trait_file data/reptilia/Reptilia_species_traits.txt -qShift data/Time_bins_ByStages.txt -n 50000000 -s 50000 -BDNNnodes 8 4 -translate -175
    - Started using PyRate from the PyRate repo instead of the Arielli repo
    - Removed -mG flag
    - Reduced network complexity: -BDNNnodes 8 4
    - **Result**: low ESS for prior and BD_lik; burn-in very high for those two. Going forward with 10% burn-in
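
To check ESS outside Tracer, here is a rough sketch. It reads one column of an *_mcmc.log, drops a burn-in fraction (analogous to -b), and estimates ESS as n / (1 + 2 × sum of autocorrelations), truncating the sum at the first non-positive lag. It assumes the log is tab-separated with a header row whose names match what Tracer shows, such as prior and BD_lik. This simple estimator is not Tracer's exact algorithm, so its values may differ somewhat. The helper names are hypothetical.

```python
import csv
import statistics


def read_log_column(path, column, burnin=0.1):
    """Read one column of a tab-separated MCMC log, dropping a burn-in fraction."""
    with open(path, newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        values = [float(row[column]) for row in reader]
    return values[int(len(values) * burnin):]


def effective_sample_size(samples):
    """Estimate ESS = n / (1 + 2 * sum of autocorrelations).

    The sum stops at the first lag whose autocorrelation is <= 0, a simple
    and common truncation rule. Cost grows with n * (lags used).
    """
    n = len(samples)
    mean = statistics.fmean(samples)
    centered = [x - mean for x in samples]
    var = sum(c * c for c in centered) / n
    if var == 0:
        return float(n)
    rho_sum = 0.0
    for lag in range(1, n):
        acov = sum(centered[i] * centered[i + lag] for i in range(n - lag)) / n
        rho = acov / var
        if rho <= 0:
            break
        rho_sum += rho
    return n / (1 + 2 * rho_sum)
```

For example, `effective_sample_size(read_log_column(path, "BD_lik", burnin=0.15))` would give a rough check against the 15% burn-in used for run 3.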

BDNN run 3 RESTORED:
- python ../PyRate/PyRate.py reptilia/Reptilia_cleaned_pyrate_input_PyRate.py -restore_mcmc ..../pyrate_mcmc_logs/*_mcmc.log -BDNNmodel 1 -trait_file .../Traits.txt -BDNNtimevar …/Paleotemperature.txt -mG -n 200001 -p 20000 -s 5000


BDNN run 4: Gibbs + no -mG
- Did not do Cross Stages   
- *python ../PyRate/PyRate.py reptilia/Reptilia_cleaned_pyrate_input_PyRate.py -j 1 -fixShift data/Time_bins_ByStages.txt -BDNNmodel 1 -trait_file data/reptilia/Reptilia_species_traits.txt -qShift data/Time_bins_ByStages.txt -n 50000000 -s 50000 -se_gibbs -translate -175
    - Removed -mG flag
    - Uses Gibbs sampler: -se_gibbs True
    - Getting a line-spacing issue when loading the mcmc file into Tracer. Also can't move files because mcmc.log is "still open in Python"
    - Ended 9/11, 4:09 pm
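
If the Tracer problem comes from blank lines, mixed line endings, or a half-written last row, one workaround is to write a cleaned copy of the log and load the copy instead. That cause is an assumption, since the exact problem was not recorded. Reading the log and writing a copy also avoids moving the original while it may still be open. The helper name `clean_log_for_tracer` is hypothetical.

```python
def clean_log_for_tracer(src, dst):
    """Copy a tab-separated MCMC log, keeping only complete, non-blank rows.

    Returns the number of data rows written (header excluded).
    """
    # newline="" disables translation; splitlines() then treats \n, \r\n
    # and a lone \r all as line breaks.
    with open(src, newline="") as handle:
        lines = handle.read().splitlines()
    rows = [line for line in lines if line.strip()]
    if not rows:
        raise ValueError(f"{src} contains no data")
    n_fields = len(rows[0].split("\t"))
    # Drop rows whose field count differs from the header, e.g. a partially
    # written last line from a run that is still going.
    kept = [rows[0]] + [r for r in rows[1:] if len(r.split("\t")) == n_fields]
    # Write with plain \n line endings on every platform.
    with open(dst, "w", newline="\n") as handle:
        handle.write("\n".join(kept) + "\n")
    return len(kept) - 1
```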

BDNN run 5: BDNN 3 w/ more generations
- *python ../PyRate/PyRate.py reptilia/Reptilia_cleaned_pyrate_input_PyRate.py -j 1 -fixShift data/Time_bins_ByStages.txt -BDNNmodel 1 -trait_file data/reptilia/Reptilia_species_traits.txt -qShift data/Time_bins_ByStages.txt -n 200000000 -s 20000 -BDNNnodes 8 4 -translate -175 -out "_bdnn5_run"
    - Only doing By Stages now, since both By and Cross Stages had similar results, and By Stages makes more conceptual sense
    - Run the above 4 times to check whether the independent runs reach the same values (convergence)
    - Added -out with the run number for organization's sake
    - **Result**: all runs stopped early, at 180 million iterations, because the computer shut off
    - **Restored**: python ../PyRate/PyRate.py test_reptilia\reptilia\Reptilia_cleaned_pyrate_input_PyRate.py -restore_mcmc reptilia/pyrate_mcmc_logs/bdnn5_by/Reptilia_cleaned_pyrate_input_1_bdnn5_BDS_BDNN_8_4Tc_mcmc.log -j 1 -fixShift test_reptilia/data/Time_bins_ByStages.txt -BDNNmodel 1 -trait_file data/reptilia/Reptilia_species_traits.txt -qShift test_reptilia/data/Time_bins_ByStages.txt -n 200000000 -s 20000 -BDNNnodes 8 4 -translate -175 -out "_bdnn5_run1_restored"
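
One standard way to compare the 4 independent runs is the Gelman-Rubin potential scale reduction factor (R-hat), computed per parameter across runs. Below is a minimal sketch. It assumes each chain is the post-burn-in column of the same parameter (e.g. BD_lik) from one replicate's mcmc.log, trimmed to equal length. `gelman_rubin` is a hypothetical helper, not a PyRate command.

```python
import statistics


def gelman_rubin(chains):
    """Potential scale reduction factor (R-hat) for one parameter.

    chains: list of equal-length lists of post-burn-in samples, one per run.
    """
    m = len(chains)
    n = len(chains[0])
    if m < 2 or any(len(c) != n for c in chains):
        raise ValueError("need at least 2 chains of equal length")
    chain_means = [statistics.fmean(c) for c in chains]
    grand_mean = statistics.fmean(chain_means)
    # Between-chain variance (B) and mean within-chain variance (W).
    b = n * sum((mu - grand_mean) ** 2 for mu in chain_means) / (m - 1)
    w = statistics.fmean(statistics.variance(c) for c in chains)
    if w == 0:
        raise ValueError("chains have zero within-chain variance")
    # Pooled estimate of the posterior variance.
    var_hat = (n - 1) / n * w + b / n
    return (var_hat / w) ** 0.5
```

Values near 1 suggest the runs agree. A common rule of thumb treats R-hat above about 1.1 as a sign that the runs have not converged.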

COVAR run 2: more gens 
- *python ../PyRate/PyRate.py reptilia/Reptilia_cleaned_pyrate_input_PyRate.py -trait_file data/reptilia/Reptilia_species_traits.txt -mCov 5 -logT 1 -pC 0 -fixShift data/Time_bins_ByStages.txt -qShift data/Time_bins_ByStages.txt -n 200000000 -s 20000
    - This is: a Covar BD model with fixed times of rate shifts, log transformed traits, TPP model, parameter estimation MCMC (used -A 4, not realizing that -fixShift overrides it)
    - Removed -mG
    - Run the above 4 times to check whether the independent runs reach the same values (convergence)
    - Added -out with the run number for organization's sake
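
The replicate runs can be generated from one base command so that only the -out suffix differs. Below is a sketch that uses the COVAR run 2 command above as the base. The `_covar2_runN` naming scheme and the `replicate_commands` helper are hypothetical.

```python
# COVAR run 2 command, split into an argument list.
BASE_ARGS = [
    "python", "../PyRate/PyRate.py",
    "reptilia/Reptilia_cleaned_pyrate_input_PyRate.py",
    "-trait_file", "data/reptilia/Reptilia_species_traits.txt",
    "-mCov", "5", "-logT", "1", "-pC", "0",
    "-fixShift", "data/Time_bins_ByStages.txt",
    "-qShift", "data/Time_bins_ByStages.txt",
    "-n", "200000000", "-s", "20000",
]


def replicate_commands(base_args, tag, n_runs=4):
    """Return one argument list per replicate, each with its own -out suffix."""
    # base_args + [...] builds a new list, so BASE_ARGS is never modified.
    return [base_args + ["-out", f"_{tag}_run{i}"] for i in range(1, n_runs + 1)]


for args in replicate_commands(BASE_ARGS, "covar2"):
    print(" ".join(args))
```

You can pass each list to `subprocess.Popen(args)` to start the runs in parallel, or paste the printed commands into separate terminals.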


COVAR run 3: no -fixShift in order to see RJMCMC times of rate shift
- **python ../PyRate/PyRate.py .\test_reptilia\reptilia\Reptilia_cleaned_pyrate_input_PyRate.py -trait_file .\test_reptilia\data\reptilia\Reptilia_species_traits.txt -qShift .\test_reptilia\data\Time_bins_ByStages.txt -n 200000000 -s 20000 -mCov 5 1 -pC 0 -A 4
    - Removed -logT (so traits are not log transformed)

RJMCMC (BDS): no -fixShift, TPP
- *python ../PyRate/PyRate.py .\test_reptilia\reptilia\Reptilia_cleaned_pyrate_input_PyRate.py -qShift .\test_reptilia\data\Time_bins_ByStages.txt -mG -n 200000000 -s 20000 -A 4

NOTE:
- To-Do: run 3 more COVAR run 2 replicates
- Waiting on all 4 BDNN run 5 replicates
- waiting on 

## Post-Processing Commands

### First Steps:
- Move MCMC files into a descriptively named folder
- Check Tracer to decide on burn-in percentage
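
Here is a sketch of the first step. It assumes the outputs for a run share a recognizable substring, such as the -out tag, that a glob pattern can match. `move_run_outputs` and the pattern are hypothetical.

```python
import shutil
from pathlib import Path


def move_run_outputs(log_dir, pattern, folder_name):
    """Move files in log_dir matching a glob pattern into log_dir/folder_name.

    Returns the names of the moved files, in sorted order.
    """
    log_dir = Path(log_dir)
    target = log_dir / folder_name
    target.mkdir(exist_ok=True)
    moved = []
    # sorted() collects all matches before anything moves; directories
    # (including the target folder itself) are skipped.
    for path in sorted(log_dir.glob(pattern)):
        if path.is_file():
            shutil.move(str(path), str(target / path.name))
            moved.append(path.name)
    return moved
```

For example: `move_run_outputs("reptilia/pyrate_mcmc_logs", "*bdnn5*", "bdnn5_by")`. On Windows, moving a log that a running PyRate process still has open is likely to fail with a `PermissionError`, so wait until the run finishes.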

### BD Sampling Freq (RJMCMC)
RJMCMC 1:
- *python ..\PyRate\PyRate.py -mProb .\test_reptilia\reptilia\pyrate_mcmc_logs\rjmcmc\Reptilia_cleaned_pyrate_input_1_Grj_mcmc.log -b 0.1 | tee .\test_reptilia\reptilia\pyrate_mcmc_logs\rjmcmc\rjmcmc_bd_sampling_freq.txt

### Marginal RTT Plot
**Output in pyrate_mcmc_logs/bdnn...**: _RTT.pdf, _RTT.r
BDNN run 3:
- *python ../PyRate/PyRate.py -plotBDNN reptilia/pyrate_mcmc_logs/bdnn3_cross/Reptilia_cleaned_pyrate_input_1_BDS_BDNN_8_4Tc_mcmc.log -b 0.15 -translate -175
- *python ../PyRate/PyRate.py -plotBDNN reptilia/pyrate_mcmc_logs/bdnn3_by/Reptilia_cleaned_pyrate_input_1_BDS_BDNN_8_4Tc_mcmc.log -b 0.15 -translate -175

BDNN run 4 (Gibbs):
- python ../PyRate/PyRate.py -plotBDNN reptilia/pyrate_mcmc_logs/bdnn4_cross/  _mcmc.log -b 0.1 -translate -175
- python ../PyRate/PyRate.py -plotBDNN reptilia/pyrate_mcmc_logs/bdnn4_by/  _mcmc.log -b 0.1 -translate -175

RJMCMC 1:
- *python ..\PyRate\PyRate.py -plotRJ .\test_reptilia\reptilia\pyrate_mcmc_logs\rjmcmc\

### Preservation Rates Through Time Graph
RJMCMC 1:
- *python ..\PyRate\PyRate.py -plotQ .\test_reptilia\reptilia\pyrate_mcmc_logs\rjmcmc\Reptilia_cleaned_pyrate_input_1_Grj_mcmc.log -qShift .\test_reptilia\data\Time_bins_ByStages.txt

### Partial Dependence Plots (PDP)
**Output in pyrate_mcmc_logs/bdnn...**: _PDP.pdf, _PDP.r
BDNN run 3:
- *python ../PyRate/PyRate.py -plotBDNN_effects reptilia/pyrate_mcmc_logs/bdnn3_cross/Reptilia_cleaned_pyrate_input_1_BDS_BDNN_8_4Tc_mcmc.log -plotBDNN_transf_features data/reptilia/reptilia_backscale.txt -translate -175 -b 0.15 -resample 100
- *python ../PyRate/PyRate.py -plotBDNN_effects reptilia/pyrate_mcmc_logs/bdnn3_by/Reptilia_cleaned_pyrate_input_1_BDS_BDNN_8_4Tc_mcmc.log -plotBDNN_transf_features data/reptilia/reptilia_backscale.txt -translate -175 -b 0.15 -resample 100

BDNN run 4:
- python ../PyRate/PyRate.py -plotBDNN_effects reptilia/pyrate_mcmc_logs/bdnn4_cross      mcmc.log -plotBDNN_transf_features data/reptilia/reptilia_backscale.txt
- python ../PyRate/PyRate.py -plotBDNN_effects reptilia/pyrate_mcmc_logs/bdnn4_by      mcmc.log -plotBDNN_transf_features data/reptilia/reptilia_backscale.txt

### Partial Dependence Rates: DEFUNCT
*According to Hauffe: only needed if you want n-way interactions where n>3*
BDNN run 3:
- *python ../PyRate/PyRate.py -BDNN_interaction reptilia/pyrate_mcmc_logs/bdnn3_cross/Reptilia_cleaned_pyrate_input_1_BDS_BDNN_8_4Tc_mcmc.log -plotBDNN_transf_features data/reptilia/reptilia_backscale.txt -b 0.15 -resample 100

BDNN run 4:
- python ../PyRate/PyRate.py -BDNN_interaction reptilia/pyrate_mcmc_logs/bdnn4      mcmc.log -plotBDNN_transf_features data/reptilia/reptilia_backscale.txt -b 0.1 -resample 100

### Predictor Importance
**Output in pyrate_mcmc_logs/bdnn...**: 
- _contribution_per_species_rates.r
- _contribution_per_species_rates.pdf
- ex_predictor_influence.csv
- ex_shap_per_species.csv
- sp_predictor_influence.csv
- sp_shap_per_species.csv

BDNN run 3:
- *python ../PyRate/PyRate.py -BDNN_pred_importance reptilia/pyrate_mcmc_logs/bdnn3_cross/Reptilia_cleaned_pyrate_input_1_BDS_BDNN_8_4Tc_mcmc.log -plotBDNN_transf_features data/reptilia/reptilia_backscale.txt -b 0.15 -resample 100 -BDNN_nsim_expected_cv 0 -BDNN_pred_importance_interaction
    - -BDNN_pred_importance_interaction: ranks 2-way interactions in addition to individual predictors
    - Notes from the run: Different bin sizes detected due to using -fixShift. Time windows resampled to a resolution of 5.0. 
        - (Because CrossStage's smallest bin size is 5)
- *python ../PyRate/PyRate.py -BDNN_pred_importance reptilia/pyrate_mcmc_logs/bdnn3_by/Reptilia_cleaned_pyrate_input_1_BDS_BDNN_8_4Tc_mcmc.log -plotBDNN_transf_features data/reptilia/reptilia_backscale.txt -b 0.15 -resample 100 -BDNN_nsim_expected_cv 0 -BDNN_pred_importance_interaction
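
You can check the resampling note by computing the smallest interval in the shift-times file. Below is a sketch. It assumes the file (e.g. data/Time_bins_CrossStage.txt) lists one shift time per line with no header, and it only considers intervals between the listed times. `smallest_bin_width` is a hypothetical helper.

```python
def smallest_bin_width(path):
    """Smallest interval between consecutive shift times listed in a file.

    Assumes one numeric time per line; blank lines are ignored.
    """
    with open(path) as handle:
        times = sorted(float(line) for line in handle if line.strip())
    if len(times) < 2:
        raise ValueError("need at least two shift times")
    # Widths of the bins bounded by consecutive sorted shift times.
    return min(later - earlier for earlier, later in zip(times, times[1:]))
```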

BDNN run 4:
- python ../PyRate/PyRate.py -BDNN_pred_importance reptilia/pyrate_mcmc_logs/bdnn4_cross       _mcmc.log -plotBDNN_transf_features data/reptilia/reptilia_backscale.txt -b 0.1 -resample 100 -BDNN_nsim_expected_cv 0 -BDNN_pred_importance_interaction
- python ../PyRate/PyRate.py -BDNN_pred_importance reptilia/pyrate_mcmc_logs/bdnn4_by       _mcmc.log -plotBDNN_transf_features data/reptilia/reptilia_backscale.txt -b 0.1 -resample 100 -BDNN_nsim_expected_cv 0 -BDNN_pred_importance_interaction