#### Pipeline for labeling ROIs with ROInet
Note: this notebook must be run in a ROICaT environment that also has vrAnalysis installed, not the typical ROICaT environment.
(The two are currently incompatible because of numpy. A workaround that works fine: install `roicat[all]`, then install vrAnalysis with `--no-deps` and manually install only the dependencies required for database importing.)
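
Because the environment is assembled by hand, a quick check before running the rest of the notebook can save time. The following is a minimal sketch, not part of the pipeline: `check_environment` is a hypothetical helper, and the import names `roicat` and `vrAnalysis` are assumed to match the installed packages.

```python
from importlib import metadata
from importlib.util import find_spec


def check_environment(packages=("roicat", "vrAnalysis", "numpy")):
    """Report, for each package, its version, "unknown", or None if not importable.

    The default package names are assumptions about how the packages are
    imported; adjust them if your installation differs.
    """
    report = {}
    for name in packages:
        if find_spec(name) is None:
            # Not importable from this environment at all
            report[name] = None
            continue
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Importable, but the distribution is registered under another name
            report[name] = "unknown"
    return report


for name, version in check_environment().items():
    print(f"{name}: {'MISSING' if version is None else version}")
```

Any `MISSING` entry means the environment is not the combined ROICaT + vrAnalysis one described above.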

#### Progress and ToDo List:
- **<<<< DONE >>>>** Create training and testing sets that span all the mice and the different imaging planes
- **<<<< DONE >>>>** Process the training / testing sets to generate the ROInet latents, UMAP embeddings, images, and UMAP model (UMAP model fit on the training data only)
- **<<<< DONE >>>>** Label a large number of ROIs in both sets
- **<<<< DONE >>>>** Train and save an sklearn model on the training data and print reports on the testing data
- **<<<< DONE >>>>** Run all data through the model and save the results
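
The train-then-report step above can be sketched with scikit-learn directly. This is an illustration under stated assumptions, not the implementation behind `train_classifier` or `evaluate_classifier`: the latents are synthetic Gaussian blobs, the two class names are placeholders, and `make_fake_latents` is a hypothetical helper.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report

rng = np.random.default_rng(0)


def make_fake_latents(n_per_class, n_features=16):
    """Synthetic stand-in for ROInet latents: two well-separated Gaussian blobs."""
    good = rng.normal(loc=2.0, size=(n_per_class, n_features))
    bad = rng.normal(loc=-2.0, size=(n_per_class, n_features))
    X = np.vstack([good, bad])
    y = np.array([1] * n_per_class + [0] * n_per_class)  # 1 = good, 0 = bad
    return X, y


# Separate training and testing sets, mirroring the split used in this notebook
X_train, y_train = make_fake_latents(200)
X_test, y_test = make_fake_latents(50)

# Fit on the training set only, then report on the held-out testing set
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
y_pred = clf.predict(X_test)
print(classification_report(y_test, y_pred, target_names=["bad", "good"]))
```

Keeping the testing set out of the fit is what makes the printed report an honest estimate of how the model behaves on unlabeled sessions.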

In [3]:
%reload_ext autoreload
%autoreload 2

from roicat_support import get_classifier_files
from roicat_support.classifier import (
    choose_sessions,
    define_classification_set, 
    load_classification_set, 
    prepare_suite2p_paths, 
    roi_should_be_ignored, 
    generate_latents_and_embeddings, 
    load_latents_and_embeddings, 
    read_labels,
    labels_to_df,
    save_labels,
    labels_df_to_dict,
    save_classifier,
    load_classifier,
    detect_local_concavities,
    run_integrated_labeler,
    update_labels,
    execute_label_updates,
    visualize_counts,
    visualize_examples,
    train_classifier,
    evaluate_classifier,
    visualize_predictions,
    process_sessions,
    classify_and_save,
)

files = get_classifier_files()
for k, v in files.items():
    print(k, v)

train_sessions D:\localData\analysis\roicat_classification\train_sessions.json
train_latents D:\localData\analysis\roicat_classification\train_latents.npy
train_embeddings D:\localData\analysis\roicat_classification\train_embeddings.npy
train_images D:\localData\analysis\roicat_classification\train_images.npy
train_umap D:\localData\analysis\roicat_classification\train_umap.joblib
train_labels D:\localData\analysis\roicat_classification\train_labels.csv
train_classifier D:\localData\analysis\roicat_classification\train_classifier.joblib
test_sessions D:\localData\analysis\roicat_classification\test_sessions.json
test_latents D:\localData\analysis\roicat_classification\test_latents.npy
test_embeddings D:\localData\analysis\roicat_classification\test_embeddings.npy
test_images D:\localData\analysis\roicat_classification\test_images.npy
test_labels D:\localData\analysis\roicat_classification\test_labels.csv


In [2]:
# Choose training vs testing data
use_training_data = True
use_train_model_for_embeddings = True

# Load saved data from roinet and umap to do labeling
data = load_latents_and_embeddings(use_training_data)
latents = data["latents"]
embeddings = data["embeddings"]
images = data["images"]
model = data["model_umap"]
label_path = files["train_labels"] if use_training_data else files["test_labels"]

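# For the test set, optionally re-embed the latents with the UMAP model fit on the
# training data so that both sets share the same embedding space
if not use_training_data and use_train_model_for_embeddings:
    train_model = load_latents_and_embeddings(True)["model_umap"]
    embeddings = train_model.transform(latents)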

In [None]:
do_labeling = False
if do_labeling:
    labeler = run_integrated_labeler(embeddings, images, label_path, overwrite=False)

In [None]:
run_label_update = False
show_updates = True
execute_updates = False
if run_label_update:
    labels_to_change, labels_to_clear = update_labels(embeddings, images, label_path)
    execute_label_updates(label_path, labels_to_change, labels_to_clear, show_updates=show_updates, execute_updates=execute_updates)

In [8]:
show_counts = False
if show_counts:
    visualize_counts(label_path)

In [None]:
# Visualize some examples
show_examples = True
if show_examples:  
    visualize_examples(images, label_path, max_images_per_label=10, shuffle=True)

In [None]:
# Train a logistic regression model on the training data
train_new_classifier = False
if train_new_classifier:
    train_classifier()

In [None]:
# Check whether the model from the training labels does well on the test labels
show_evaluation_on_test_data = False
if show_evaluation_on_test_data:
    evaluate_classifier(convert_to_goodvsbad=True, show_confusion_matrix=True, checkout_bad_to_good=True)
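
As a rough illustration of what a good-vs-bad evaluation involves, the sketch below collapses multi-class labels to two classes, counts the confusion-matrix cells, and lists the ROIs labeled bad but predicted good. The label names in `GOOD_LABELS` are hypothetical, and this is not the code behind `evaluate_classifier`.

```python
from collections import Counter

# Hypothetical label names; the real label set lives in the labels CSV
GOOD_LABELS = {"soma", "dendrite"}


def to_good_vs_bad(labels, good_labels=GOOD_LABELS):
    """Collapse multi-class labels into a binary good/bad labeling."""
    return ["good" if label in good_labels else "bad" for label in labels]


def confusion_counts(y_true, y_pred):
    """Count (true, predicted) pairs, i.e. the cells of a confusion matrix."""
    return Counter(zip(y_true, y_pred))


y_true = to_good_vs_bad(["soma", "noise", "dendrite", "blob", "soma"])
y_pred = to_good_vs_bad(["soma", "soma", "noise", "blob", "dendrite"])

counts = confusion_counts(y_true, y_pred)
# ROIs labeled bad but predicted good are the costly errors worth inspecting
bad_to_good = [i for i, (t, p) in enumerate(zip(y_true, y_pred)) if t == "bad" and p == "good"]
print(dict(counts), bad_to_good)
```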

In [None]:
# Visualize the saved classifier's predictions on the loaded latents and embeddings
show_predictions = False
if show_predictions:
    classifier = load_classifier()
    model = classifier["model"]
    id_to_label = classifier["id_to_label"]
    visualize_predictions(model, latents, embeddings, id_to_label)

In [5]:
# Run all sessions through the model and save the results
process_data = True
if process_data:
    process_sessions()

Processing and classifying session ATL076/2025-07-29/702, (134/149)
100%|██████████| 5/5 [00:03<00:00,  1.59it/s]
Using device: cuda:0
starting: running data through network
  0%|          | 0/2633 [00:01<?, ?it/s]
completed: running data through network
Processing and classifying session ATL076/2025-07-31/701, (135/149)
100%|██████████| 5/5 [00:03<00:00,  1.66it/s]
Using device: cuda:0
starting: running data through network
  0%|          | 0/2591 [00:01<?, ?it/s]
completed: running data through network
Processing and classifying session ATL076/2025-08-04/701, (136/149)
100%|██████████| 5/5 [00:02<00:00,  1.79it/s]
Using device: cuda:0
starting: running data through network
  0%|          | 0/2459 [00:01<?, ?it/s]
completed: running data through network
Processing and classifying session ATL076/2025-08-05/701, (137/149)
100%|██████████| 5/5 [00:02<00:00,  1.73it/s]
Using device: cuda:0
starting: running data through network
  0%|          | 0/2588 [00:01<?, ?it/s]
completed: running data through network
Processing and classifying session ATL076/2025-08-06/701, (138/149)
100%|██████████| 5/5 [00:04<00:00,  1.04it/s]
Using device: cuda:0
starting: running data through network
  0%|          | 0/2653 [00:04<?, ?it/s]
completed: running data through network
Processing and classifying session ATL076/2025-08-07/703, (139/149)
100%|██████████| 5/5 [00:03<00:00,  1.55it/s]
Using device: cuda:0
starting: running data through network
  0%|          | 0/2647 [00:01<?, ?it/s]
completed: running data through network
Processing and classifying session ATL076/2025-08-08/702, (140/149)
100%|██████████| 5/5 [00:03<00:00,  1.57it/s]
Using device: cuda:0
starting: running data through network
  0%|          | 0/2584 [00:01<?, ?it/s]
completed: running data through network
Processing and classifying session ATL076/2025-08-13/701, (141/149)
100%|██████████| 5/5 [00:08<00:00,  1.64s/it]
Using device: cuda:0
starting: running data through network
  0%|          | 0/2530 [00:05<?, ?it/s]
completed: running data through network
Processing and classifying session ATL076/2025-08-14/703, (142/149)
100%|██████████| 5/5 [00:03<00:00,  1.51it/s]
Using device: cuda:0
starting: running data through network
  0%|          | 0/2648 [00:01<?, ?it/s]
completed: running data through network
Processing and classifying session ATL076/2025-08-19/704, (143/149)
100%|██████████| 5/5 [00:10<00:00,  2.08s/it]
Using device: cuda:0
starting: running data through network
  0%|          | 0/2612 [00:04<?, ?it/s]
completed: running data through network
Processing and classifying session ATL076/2025-08-20/702, (144/149)
100%|██████████| 5/5 [00:03<00:00,  1.58it/s]
Using device: cuda:0
starting: running data through network
  0%|          | 0/2521 [00:01<?, ?it/s]
completed: running data through network
Processing and classifying session ATL076/2025-08-21/703, (145/149)
100%|██████████| 5/5 [00:10<00:00,  2.08s/it]
Using device: cuda:0
starting: running data through network
  0%|          | 0/2552 [00:05<?, ?it/s]
completed: running data through network
Processing and classifying session ATL076/2025-08-22/702, (146/149)
100%|██████████| 5/5 [00:03<00:00,  1.65it/s]
Using device: cuda:0
starting: running data through network
  0%|          | 0/2536 [00:01<?, ?it/s]
completed: running data through network
Processing and classifying session ATL076/2025-08-26/702, (147/149)
100%|██████████| 5/5 [00:10<00:00,  2.01s/it]
Using device: cuda:0
starting: running data through network
  0%|          | 0/2600 [00:05<?, ?it/s]
completed: running data through network
Processing and classifying session ATL076/2025-08-27/702, (148/149)
100%|██████████| 5/5 [00:03<00:00,  1.54it/s]
Using device: cuda:0
starting: running data through network
  0%|          | 0/2585 [00:01<?, ?it/s]
completed: running data through network
Processing and classifying session ATL076/2025-08-28/702, (149/149)
100%|██████████| 5/5 [00:09<00:00,  1.84s/it]
Using device: cuda:0
starting: running data through network
  0%|          | 0/2587 [00:01<?, ?it/s]
completed: running data through network
