# Analyze Study (Binary Classification)
- version 0.1
- 3.2.2026

## ToDo
- [done] Study details 
- [done] Target metric performance on all tasks 
- [done] Selected features summary
- [done] Model performance on test dataset for a given task
- [done] AUCROC plots
- [done] Confusion matrix
- Individual test feature importances (table + plot)
- [done] Merged test feature importances (table + plot)
- Summary confusion matrix
- create tests for notebook utils
- beeswarm plot (individual + merged!)


## Imports

In [None]:
from octopus.predict import OctoPredict
from octopus.predict.notebook_utils import (
    show_selected_features,
    show_study_details,
    show_target_metric_performance,
    testset_performance_overview,
    plot_aucroc,
    show_confusionmatrix,
    show_overall_fi_table,
    show_overall_fi_plot,
)

## Input

In [None]:
# INPUT: Select study
study_directory = "../studies/wf_octo_mrmr_octo/"

## Study Details

In [None]:
# Call the utility function to display and validate study details
study_info = show_study_details(study_directory)

# Extract key variables for use in subsequent cells
# path_study = study_info["path"]
# config = study_info["config"]
# ml_type = study_info["ml_type"]
# n_folds_outer = study_info["n_folds_outer"]
# workflow_tasks = study_info["workflow_tasks"]
# outersplit = study_info["outersplit_dirs"]
# expected_task_ids = study_info["expected_task_ids"]
# octo_workflow_lst = study_info["octo_workflow_tasks"]

## Target Metric Performance for all Tasks

In [None]:
# Display performance (target metric) for all workflow tasks
performance_tables = show_target_metric_performance(study_info, details=False)

## Selected Features Summary

In [None]:
# Display the number of selected features across outer splits and tasks
# Returns three tables: feature count table, feature frequency table, and raw feature dataframe
# sort_task and sort_key parameters sort the frequency table by the specified task-key combination
sort_by_task = None
sort_by_key = None
feature_table, feature_frequency_table, raw_feature_table = show_selected_features(
    study_info, sort_task=sort_by_task, sort_key=sort_by_key
)

## Model Performance on Test Dataset for a given Task


In [None]:
# Input: selected metrics for performance overview
metrics = ["AUCROC", "ACCBAL", "ACC", "F1", "AUCPR", "NEGBRIERSCORE"]
print("Selected metrics: ", metrics)

### Test performance for given task and selected metrics

In [None]:
# Load the predictor object for the selected task (here task_id=0, results_key="best")
task_predictor = OctoPredict(study_path=study_info["path"], task_id=0, results_key="best")
testset_performance = testset_performance_overview(predictor=task_predictor, metrics=metrics)

### AUCROC Plots

In [None]:
plot_aucroc(task_predictor, show_individual=True)

### Confusion Matrix

In [None]:
show_confusionmatrix(task_predictor, threshold=0.5, metrics=metrics)
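
The `threshold=0.5` argument above decides which predicted probabilities count as the positive class. The following is a minimal, library-independent sketch of that idea, not the `show_confusionmatrix` implementation: the helper `confusion_counts`, the toy data, and the choice of `>=` at the threshold are all assumptions made for illustration.

```python
def confusion_counts(y_true, y_prob, threshold=0.5):
    """Return (tn, fp, fn, tp) for binary labels and positive-class probabilities."""
    tn = fp = fn = tp = 0
    for label, prob in zip(y_true, y_prob):
        # Assumption: a probability equal to the threshold counts as positive
        predicted = 1 if prob >= threshold else 0
        if label == 1 and predicted == 1:
            tp += 1
        elif label == 1:
            fn += 1
        elif predicted == 1:
            fp += 1
        else:
            tn += 1
    return tn, fp, fn, tp


# Hypothetical toy data, not taken from any study
y_true = [0, 0, 1, 1, 1, 0]
y_prob = [0.1, 0.6, 0.4, 0.8, 0.55, 0.3]

for threshold in (0.5, 0.35):
    tn, fp, fn, tp = confusion_counts(y_true, y_prob, threshold)
    print(f"threshold={threshold}: TN={tn} FP={fp} FN={fn} TP={tp}")
```

On this toy data, lowering the threshold from 0.5 to 0.35 turns the one false negative into a true positive while the false positive count stays the same.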

### Test Feature Importances

#### Calculate Permutation Feature Importances

In [None]:
# (A) Permutation feature importances on test data using final models
# - FI tables are saved in the study.results dictionary
# - PDF plots are saved in the results directory of the sequence item
#
# Calculate PFI for a single experiment only:
# task_predictor.calculate_fi_test(fi_type="group_permutation", n_repeat=5, experiment_id=4)
#
# Calculate PFI for all available experiments:
# - n_repeat has a major impact on the p-values
# - a high n_repeat leads to long compute times
print("PFI calculation running.....")
task_predictor.calculate_fi_test(fi_type="group_permutation", n_repeat=3)

In [None]:
fi_table_overall = show_overall_fi_table(task_predictor, fi_type="group_permutation")
fi_table_overall.head(10)

In [None]:
show_overall_fi_plot(task_predictor, fi_type="group_permutation")
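
To make the role of `n_repeat` concrete, here is a minimal sketch of plain single-feature permutation importance on toy data. It is not the `group_permutation` method used by `calculate_fi_test` and it computes no p-values. The helper names (`accuracy`, `permutation_drops`, `toy_model`) and the data are hypothetical. Each repeat shuffles one column and records the drop in accuracy, so `n_repeat` is the number of samples behind every importance estimate, and the compute cost grows linearly with it.

```python
import random
from statistics import mean


def accuracy(model, rows, labels):
    """Fraction of rows for which the model predicts the true label."""
    return mean(1.0 if model(row) == label else 0.0 for row, label in zip(rows, labels))


def permutation_drops(model, rows, labels, column, n_repeat, seed=0):
    """Return one accuracy drop per repeat after shuffling a single column."""
    rng = random.Random(seed)
    baseline = accuracy(model, rows, labels)
    drops = []
    for _ in range(n_repeat):
        shuffled = [row[column] for row in rows]
        rng.shuffle(shuffled)
        # Rebuild the rows with only the chosen column replaced by shuffled values
        permuted = [row[:column] + [value] + row[column + 1:] for row, value in zip(rows, shuffled)]
        drops.append(baseline - accuracy(model, permuted, labels))
    return drops


# Hypothetical toy data: column 0 determines the label, column 1 is pure noise
data_rng = random.Random(42)
rows = [[data_rng.random(), data_rng.random()] for _ in range(200)]
labels = [1 if row[0] > 0.5 else 0 for row in rows]


def toy_model(row):
    """Stand-in for a fitted model: it looks at column 0 only."""
    return 1 if row[0] > 0.5 else 0


for n_repeat in (3, 30):
    signal = permutation_drops(toy_model, rows, labels, column=0, n_repeat=n_repeat)
    noise = permutation_drops(toy_model, rows, labels, column=1, n_repeat=n_repeat)
    print(f"n_repeat={n_repeat}: signal mean drop={mean(signal):.3f}, noise mean drop={mean(noise):.3f}")
```

Because the toy model ignores column 1, every drop for that column is exactly zero, while shuffling column 0 costs roughly half of the accuracy.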

#### Calculate SHAP Feature Importances

In [None]:
# (D) SHAP feature importances on test data using final models
#
# - for highest quality use "exact" or "kernel"
# - shap_type can be one of ["kernel", "permutation", "exact"]
# - shap_type "exact" does not scale well with the number of features
# - shap_type "permutation" scales better than "exact" but
#   takes longer for a small number of features
# - shap_type "kernel" scales better than "exact" but is slower than "permutation"
# - FI tables are saved in the study.results dictionary
# - PDF plots are saved in the results directory
task_predictor.calculate_fi_test(fi_type="shap", shap_type="kernel")

In [None]:
fi_table_overall = show_overall_fi_table(task_predictor, fi_type="shap")
fi_table_overall.head(10)

In [None]:
show_overall_fi_plot(task_predictor, fi_type="shap")
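
The note above that `shap_type="exact"` does not scale well with the number of features can be illustrated with a textbook Shapley value computation. This is a minimal sketch under stated assumptions, not how the SHAP library or `calculate_fi_test` is implemented: `exact_shapley`, `toy_value`, and the additive toy weights are hypothetical. For each feature the loops visit every coalition of the remaining features, which is 2^(n-1) coalitions per feature.

```python
from itertools import combinations
from math import factorial


def exact_shapley(value_fn, n_features):
    """Exact Shapley values by enumerating all coalitions of the other features."""
    phi = [0.0] * n_features
    for i in range(n_features):
        others = [j for j in range(n_features) if j != i]
        for size in range(len(others) + 1):
            # Shapley weight for coalitions of this size
            weight = factorial(size) * factorial(n_features - size - 1) / factorial(n_features)
            for coalition in combinations(others, size):
                members = frozenset(coalition)
                # Marginal contribution of feature i when it joins the coalition
                phi[i] += weight * (value_fn(members | {i}) - value_fn(members))
    return phi


# Hypothetical additive toy model: the value of a coalition is the sum of its weights
toy_weights = [2.0, 1.0, 0.0]


def toy_value(coalition):
    return sum(toy_weights[j] for j in coalition)


shapley_values = exact_shapley(toy_value, n_features=len(toy_weights))
print(shapley_values)
```

For an additive value function like this one, each Shapley value equals the feature's own weight up to floating-point rounding, and the values sum to the value of the full coalition.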