# Bombcell Post-Run Analysis (Open Ephys + Kilosort4)

Assumes Bombcell has already been run and you exported per-probe CSV/JSON summaries.

**Expected folder convention**
- `{NP_recording_name}/bombcell_DEFAULT/`
  - `DUPLICATED_KILOSORT4_FILES/`
  - `batch_DEFAULT_results/`
- `{NP_recording_name}/bombcell_NP2.0/`
  - `DUPLICATED_KILOSORT4_FILES_ACD/`
  - `NP2_ReRun_results/`

In [75]:
NP_recording_name = 'Reach15_20260201_session007_NP_Recording_Number02_2026-02-01_18-25-00'

In [76]:
# New Code
from pathlib import Path

RECORDING_ROOT = Path(r"H:\Grant\Neuropixels\Kilosort_Recordings") / NP_recording_name

BOMBCELL_DEFAULT_ROOT = RECORDING_ROOT / 'bombcell' / "bombcell_DEFAULT"
BOMBCELL_NP20_ROOT = RECORDING_ROOT / 'bombcell'  / "bombcell_NP2.0"
BOMBCELL_SINGLEPROBE_ROOT = RECORDING_ROOT / "bombcell" / "bombcell_single_probe"


DEFAULT_KS_STAGING_ROOT = BOMBCELL_DEFAULT_ROOT 
NP20_KS_STAGING_ROOT = BOMBCELL_NP20_ROOT 
BOMBCELL_KS_SINGLEPROBE_STAGING_ROOT = BOMBCELL_SINGLEPROBE_ROOT 

DEFAULT_EXPORT_ROOT = BOMBCELL_DEFAULT_ROOT / "batch_DEFAULT_results"
NP20_EXPORT_ROOT = BOMBCELL_NP20_ROOT / "NP2_ReRun_results"
SINGLE_EXPORT_ROOT = BOMBCELL_SINGLEPROBE_ROOT / "single_probe_results"

# Make sure they exist
for p in [
    DEFAULT_KS_STAGING_ROOT,
    NP20_KS_STAGING_ROOT,
    BOMBCELL_KS_SINGLEPROBE_STAGING_ROOT,
    DEFAULT_EXPORT_ROOT,
    NP20_EXPORT_ROOT,
    SINGLE_EXPORT_ROOT

]:
    p.mkdir(parents=True, exist_ok=True)
    print('ALL Paths Exist')


ALL Paths Exist
ALL Paths Exist
ALL Paths Exist
ALL Paths Exist
ALL Paths Exist
ALL Paths Exist


## ==================================
## Load in Data
## ==================================

In [None]:
%load_ext autoreload
%autoreload 2

import bombcell as bc
import os
import pandas as pd

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


## Set TARGET_PROBE

In [85]:
TARGET_PROBE = "B"  # New Code  <-- set to "A","B","C","D","E","F"

#### Load in all the needed data 
- kilosort4 data
- bombcell data
- phy manual curation data

In [None]:
probeA_Dir = fr'H:\Grant\Neuropixels\Kilosort_Recordings\{NP_recording_name}\Record Node 103\experiment1\recording1\continuous\Neuropix-PXI-100.ProbeA'
probeB_Dir = fr'H:\Grant\Neuropixels\Kilosort_Recordings\{NP_recording_name}\Record Node 103\experiment1\recording1\continuous\Neuropix-PXI-100.ProbeB-AP'
probeC_Dir = fr'H:\Grant\Neuropixels\Kilosort_Recordings\{NP_recording_name}\Record Node 103\experiment1\recording1\continuous\Neuropix-PXI-100.ProbeC'
probeD_Dir = fr'H:\Grant\Neuropixels\Kilosort_Recordings\{NP_recording_name}\Record Node 103\experiment1\recording1\continuous\Neuropix-PXI-100.ProbeD'
probeE_Dir = fr'H:\Grant\Neuropixels\Kilosort_Recordings\{NP_recording_name}\Record Node 103\experiment1\recording1\continuous\Neuropix-PXI-100.ProbeE-AP'
probeF_Dir = fr'H:\Grant\Neuropixels\Kilosort_Recordings\{NP_recording_name}\Record Node 103\experiment1\recording1\continuous\Neuropix-PXI-100.ProbeF-AP'

structur_oebin = fr'H:\Grant\Neuropixels\Kilosort_Recordings\{NP_recording_name}\Record Node 103\experiment1\recording1\structure.oebin'
probeA_kilosort4Dir =  probeA_Dir + r'\kilosort4'
probeB_kilosort4Dir =  probeB_Dir + r'\kilosort4'
probeC_kilosort4Dir =  probeC_Dir + r'\kilosort4'
probeD_kilosort4Dir =  probeD_Dir + r'\kilosort4'
probeE_kilosort4Dir =  probeE_Dir + r'\kilosort4'
probeF_kilosort4Dir =  probeF_Dir + r'\kilosort4'

probeA_continousDir = probeA_Dir + r'\continuous.dat'
probeB_continousDir = probeB_Dir + r'\continuous.dat'
probeC_continousDir = probeC_Dir + r'\continuous.dat'
probeD_continousDir = probeD_Dir + r'\continuous.dat'
probeE_continousDir = probeE_Dir + r'\continuous.dat'
probeF_continousDir = probeF_Dir + r'\continuous.dat'

kilosort_Dirs = [probeA_kilosort4Dir,probeB_kilosort4Dir,probeC_kilosort4Dir,probeD_kilosort4Dir,probeE_kilosort4Dir,probeF_kilosort4Dir]
continousDir = [probeA_continousDir,probeB_continousDir,probeC_continousDir,probeD_continousDir,probeE_continousDir,probeF_continousDir]
probeLetters = ['A','B','C','D','E','F']

probeA_Dir = fr'H:\Grant\Neuropixels\Kilosort_Recordings\{NP_recording_name}\Record Node 103\experiment1\recording1\continuous\Neuropix-PXI-100.ProbeA'
probeB_Dir = fr'H:\Grant\Neuropixels\Kilosort_Recordings\{NP_recording_name}\Record Node 103\experiment1\recording1\continuous\Neuropix-PXI-100.ProbeB-AP'
probeC_Dir = fr'H:\Grant\Neuropixels\Kilosort_Recordings\{NP_recording_name}\Record Node 103\experiment1\recording1\continuous\Neuropix-PXI-100.ProbeC'
probeD_Dir = fr'H:\Grant\Neuropixels\Kilosort_Recordings\{NP_recording_name}\Record Node 103\experiment1\recording1\continuous\Neuropix-PXI-100.ProbeD'
probeE_Dir = fr'H:\Grant\Neuropixels\Kilosort_Recordings\{NP_recording_name}\Record Node 103\experiment1\recording1\continuous\Neuropix-PXI-100.ProbeE-AP'
probeF_Dir = fr'H:\Grant\Neuropixels\Kilosort_Recordings\{NP_recording_name}\Record Node 103\experiment1\recording1\continuous\Neuropix-PXI-100.ProbeF-AP'

structur_oebin = fr'H:\Grant\Neuropixels\Kilosort_Recordings\{NP_recording_name}\Record Node 103\experiment1\recording1\structure.oebin'
probeA_kilosort4Dir =  probeA_Dir + r'\kilosort4'
probeB_kilosort4Dir =  probeB_Dir + r'\kilosort4'
probeC_kilosort4Dir =  probeC_Dir + r'\kilosort4'
probeD_kilosort4Dir =  probeD_Dir + r'\kilosort4'
probeE_kilosort4Dir =  probeE_Dir + r'\kilosort4'
probeF_kilosort4Dir =  probeF_Dir + r'\kilosort4'

probeA_continousDir = probeA_Dir + r'\continuous.dat'
probeB_continousDir = probeB_Dir + r'\continuous.dat'
probeC_continousDir = probeC_Dir + r'\continuous.dat'
probeD_continousDir = probeD_Dir + r'\continuous.dat'
probeE_continousDir = probeE_Dir + r'\continuous.dat'
probeF_continousDir = probeF_Dir + r'\continuous.dat'

kilosort_Dirs = [probeA_kilosort4Dir,probeB_kilosort4Dir,probeC_kilosort4Dir,probeD_kilosort4Dir,probeE_kilosort4Dir,probeF_kilosort4Dir]
continousDir = [probeA_continousDir,probeB_continousDir,probeC_continousDir,probeD_continousDir,probeE_continousDir,probeF_continousDir]
probeLetters = ['A','B','C','D','E','F']

print('----- Verifying structue Obein ------')
print(structur_oebin)
if not os.path.exists(structur_oebin):
    raise FileNotFoundError(f'structure.obien file NOT found for {structur_oebin}')
else:
    print(f'structure.obien  Path Found')
print('')

print('----- Verifying Kilosort Dir ------')
for path in kilosort_Dirs:
    path_kilosort = path
    print(path_kilosort)
    if not os.path.exists(path_kilosort):
        raise FileNotFoundError(f'kilosort4 folder NOT found for {path_kilosort}')
    else:
        print(f'Kilosort Path Found')

print('')
print('')
print('----- Verifying Continous.dat Files ------')
for paths in continousDir:
    path_continous = paths
    print(path_continous)
    if not os.path.exists(path_continous):
        raise FileNotFoundError(f'Continous.dat file NOT found for {path_continous}')
    else:
        print(f'Continous.dat file Found')

# =========================
# Old code
# if dst.exists():
#     if overwrite:
#         shutil.rmtree(dst)
#     else:
#         print(...)
# (but then it STILL continues and copytree crashes)
# =========================


# =========================
# New Code
# =========================
from pathlib import Path  # New Code
import shutil  # New Code

def stage_kilosort4_for_bombcell(
    NP_recording_name: str,
    kilosort_dirs: list[str],
    probe_letters: list[str],
    base_root: str = r"H:\Grant\Neuropixels\Kilosort_Recordings",
    dst_prefix: str = "kilosort4_",
    overwrite: bool = False,
    use_hardlinks_when_possible: bool = False,
) -> dict[str, str]:
    """
    Copies each probe's kilosort4 folder into:
      SINGLEPROBE_KS_ROOT\kilosort4_A, kilosort4_B, ...

    If dst already exists and overwrite=False:
      - SKIP copying
      - still return the existing dst path in output mapping
    """
    if len(kilosort_dirs) != len(probe_letters):
        raise ValueError("kilosort_dirs and probe_letters must have same length.")

    recording_root = Path(base_root) / NP_recording_name
    bombcell_root = SINGLEPROBE_KS_ROOT
    bombcell_root.mkdir(parents=True, exist_ok=True)

    out: dict[str, str] = {}

    for src_str, letter in zip(kilosort_dirs, probe_letters):
        src = Path(src_str)
        if not src.exists():
            raise FileNotFoundError(f"Missing source kilosort4 folder for probe {letter}: {src}")

        dst = bombcell_root / f"{dst_prefix}{letter}"

        # =========================
        # Key change: skip if exists
        # =========================
        if dst.exists():
            if overwrite:
                shutil.rmtree(dst)
            else:
                print(f"[SKIP] Probe {letter}: destination already exists: {dst}")
                out[letter] = str(dst)  # New Code
                continue  # New Code  <-- THIS is the fix

        # Fast-ish copy. Hardlink option can save space on same volume, but may fail on some FS.
        if use_hardlinks_when_possible:
            def _copy_function(src_file: str, dst_file: str) -> str:
                try:
                    os.link(src_file, dst_file)  # hardlink
                    return dst_file
                except Exception:
                    shutil.copy2(src_file, dst_file)
                    return dst_file
            import os
            shutil.copytree(src, dst, copy_function=_copy_function)
        else:
            shutil.copytree(src, dst)

        out[letter] = str(dst)

    return out


# ---- Run it ----
dst_kilosort_dirs = stage_kilosort4_for_bombcell(  # New Code
    NP_recording_name=NP_recording_name,           # New Code
    kilosort_dirs=kilosort_Dirs,                   # New Code
    probe_letters=probeLetters,                    # New Code
    overwrite=False,                               # New Code
    use_hardlinks_when_possible=False,             # New Code
)                                                  # New Code

print("Created/verified bombcell staging folders:")  # New Code
for k, v in dst_kilosort_dirs.items():               # New Code
    print(f"Probe {k}: {v}")                         # New Code



# These are the NEW ks_dir paths you should use for Bombcell
probeA_kilosort4Dir_BC = dst_kilosort_dirs["A"]  # New Code
probeB_kilosort4Dir_BC = dst_kilosort_dirs["B"]  # New Code
probeC_kilosort4Dir_BC = dst_kilosort_dirs["C"]  # New Code
probeD_kilosort4Dir_BC = dst_kilosort_dirs["D"]  # New Code
probeE_kilosort4Dir_BC = dst_kilosort_dirs["E"]  # New Code
probeF_kilosort4Dir_BC = dst_kilosort_dirs["F"]  # New Code

# Convenience list in probe-letter order
kilosort_Dirs_BC = [  # New Code
    probeA_kilosort4Dir_BC,  # New Code
    probeB_kilosort4Dir_BC,  # New Code
    probeC_kilosort4Dir_BC,  # New Code
    probeD_kilosort4Dir_BC,  # New Code
    probeE_kilosort4Dir_BC,  # New Code
    probeF_kilosort4Dir_BC,  # New Code
]  # New Code

# Bombcell output folders (where it writes .npy, .parquet, plots, etc.)
probeA_savePath_BC = probeA_kilosort4Dir_BC + r"\bombcell"  # New Code
probeB_savePath_BC = probeB_kilosort4Dir_BC + r"\bombcell"  # New Code
probeC_savePath_BC = probeC_kilosort4Dir_BC + r"\bombcell"  # New Code
probeD_savePath_BC = probeD_kilosort4Dir_BC + r"\bombcell"  # New Code
probeE_savePath_BC = probeE_kilosort4Dir_BC + r"\bombcell"  # New Code
probeF_savePath_BC = probeF_kilosort4Dir_BC + r"\bombcell"  # New Code

savePaths_BC = [  # New Code
    probeA_savePath_BC,  # New Code
    probeB_savePath_BC,  # New Code
    probeC_savePath_BC,  # New Code
    probeD_savePath_BC,  # New Code
    probeE_savePath_BC,  # New Code
    probeF_savePath_BC,  # New Code
]  # New Code
for path in savePaths_BC:
    print(path)
# =========================
# New Code (place near the top, after NP_recording_name + probe dirs are defined)
# =========================
from pathlib import Path  # New Code


RECORDING_ROOT = Path(r"H:\Grant\Neuropixels\Kilosort_Recordings") / NP_recording_name  # New Code
SINGLEPROBE_ROOT = RECORDING_ROOT / "bombcell"/ "bombcell_single_probe"  # New Code

SINGLEPROBE_KS_ROOT = SINGLEPROBE_ROOT # New Code
SINGLEPROBE_EXPORT_ROOT = SINGLEPROBE_ROOT / "single_probe_results"  # New Code

SINGLEPROBE_KS_ROOT.mkdir(parents=True, exist_ok=True)  # New Code
SINGLEPROBE_EXPORT_ROOT.mkdir(parents=True, exist_ok=True)  # New Code

# Map probe -> source kilosort4 dir + raw file
KS_SRC_MAP = {  # New Code
    "A": probeA_kilosort4Dir,
    "B": probeB_kilosort4Dir,
    "C": probeC_kilosort4Dir,
    "D": probeD_kilosort4Dir,
    "E": probeE_kilosort4Dir,
    "F": probeF_kilosort4Dir,
}
RAW_SRC_MAP = {  # New Code
    "A": probeA_continousDir,
    "B": probeB_continousDir,
    "C": probeC_continousDir,
    "D": probeD_continousDir,
    "E": probeE_continousDir,
    "F": probeF_continousDir,
}

if TARGET_PROBE == 'A':
    idx = 0
if TARGET_PROBE == 'B':
    idx = 1
if TARGET_PROBE == 'C':
    idx = 2
if TARGET_PROBE == 'D':
    idx = 3
if TARGET_PROBE == 'E':
    idx = 4
if TARGET_PROBE == 'F':
    idx = 5

ks_src = Path(KS_SRC_MAP[TARGET_PROBE])  # New Code
raw_file_path = RAW_SRC_MAP[TARGET_PROBE]  # New Code
meta_file_path = structur_oebin  # New Code

# Destination duplicated kilosort folder
ks_dir = str(SINGLEPROBE_KS_ROOT / f"kilosort4_{TARGET_PROBE}")  # New Code

# # Bombcell output folder (inside duplicated ks_dir)
# save_path = Path(ks_dir) / "bombcell" / f"SINGLEPROBE_{TARGET_PROBE}"  # New Code

# # Define export root (single-probe)
# probe_out = SINGLEPROBE_EXPORT_ROOT / f"Probe_{TARGET_PROBE}"  # New Code
save_path = ks_dir + '/bombcell'

# save_path.mkdir(parents=True, exist_ok=True)  # New Code

# load CSV
quality_metrics_path = Path(fr"H:\Grant\Neuropixels\Kilosort_Recordings\{NP_recording_name}\bombcell\bombcell_single_probe\single_probe_results\Probe_{TARGET_PROBE}\Probe_{TARGET_PROBE}_quality_metrics.csv")
df = pd.read_csv(quality_metrics_path)
quality_metrics = df

print("TARGET_PROBE:", TARGET_PROBE)  # New Code
print("ks_src:", ks_src)  # New Code
print("ks_dir:", ks_dir)  # New Code
print("raw_file_path:", raw_file_path)  # New Code
print("meta_file_path:", meta_file_path)  # New Code
print("save_path:", save_path)  # New Code


----- Verifying structue Obein ------
H:\Grant\Neuropixels\Kilosort_Recordings\Reach15_20260201_session007_NP_Recording_Number02_2026-02-01_18-25-00\Record Node 103\experiment1\recording1\structure.oebin
structure.obien  Path Found

----- Verifying Kilosort Dir ------
H:\Grant\Neuropixels\Kilosort_Recordings\Reach15_20260201_session007_NP_Recording_Number02_2026-02-01_18-25-00\Record Node 103\experiment1\recording1\continuous\Neuropix-PXI-100.ProbeA\kilosort4
Kilosort Path Found
H:\Grant\Neuropixels\Kilosort_Recordings\Reach15_20260201_session007_NP_Recording_Number02_2026-02-01_18-25-00\Record Node 103\experiment1\recording1\continuous\Neuropix-PXI-100.ProbeB-AP\kilosort4
Kilosort Path Found
H:\Grant\Neuropixels\Kilosort_Recordings\Reach15_20260201_session007_NP_Recording_Number02_2026-02-01_18-25-00\Record Node 103\experiment1\recording1\continuous\Neuropix-PXI-100.ProbeC\kilosort4
Kilosort Path Found
H:\Grant\Neuropixels\Kilosort_Recordings\Reach15_20260201_session007_NP_Recording_N

In [110]:
# quality metric values
quality_metrics_table = pd.DataFrame(quality_metrics)
# quality_metrics_table.insert(0, 'Bombcell_unit_type', unit_type_string)
quality_metrics_table    

Unnamed: 0,Bombcell_unit_type,cluster_id,row_index,phy_clusterID,nSpikes,nPeaks,nTroughs,waveformDuration_peakTrough,spatialDecaySlope,waveformBaselineFlatness,...,maxDriftEstimate,cumDriftEstimate,rawAmplitude,signalToNoiseRatio,isolationDistance,Lratio,silhouetteScore,useTheseTimesStart,useTheseTimesStop,maxChannels
0,GOOD,0,0,0,311892.0,1.0,1.0,166.666667,0.013970,0.020126,...,11.119194,55.112373,72.875717,14.510937,,,,0.000067,8640.000067,2
1,MUA,1,1,1,52805.0,2.0,1.0,133.333333,0.021148,0.050982,...,5.897190,73.563652,23.371343,2.061966,,,,3960.000067,8280.000067,0
2,NOISE,2,2,2,6251.0,1.0,1.0,700.000000,0.017437,0.126556,...,5.049137,49.258495,51.590074,13.997928,,,,5400.000067,7560.000067,1
3,MUA,3,3,3,10056.0,1.0,1.0,200.000000,0.020088,0.015577,...,21.583885,221.071777,65.607953,8.144144,,,,0.000067,3240.000067,1
4,GOOD,4,4,4,15214.0,1.0,1.0,233.333333,0.030410,0.021121,...,21.328331,265.586670,47.304186,7.611712,,,,720.000067,8640.000067,2
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1311,MUA,1311,1311,1311,72314.0,1.0,1.0,133.333333,0.018073,0.024039,...,50.013672,317.257568,20.537167,2.396758,,,,3600.000067,8640.000067,357
1312,MUA,1312,1312,1312,1414.0,1.0,1.0,300.000000,0.025199,0.036436,...,3.634766,7.037598,13.312992,3.542415,,,,0.000067,360.000067,357
1313,NON-SOMA,1313,1313,1313,723.0,1.0,1.0,233.333333,0.022037,0.047541,...,11.469971,40.971191,46.155959,5.259838,,,,360.000067,2160.000067,357
1314,MUA,1314,1314,1314,2348.0,1.0,1.0,166.666667,0.024749,0.053390,...,13.382080,13.866455,11.809050,2.343361,,,,0.000067,720.000067,357


In [111]:
quality_metrics_table.describe()

Unnamed: 0,cluster_id,row_index,phy_clusterID,nSpikes,nPeaks,nTroughs,waveformDuration_peakTrough,spatialDecaySlope,waveformBaselineFlatness,scndPeakToTroughRatio,...,maxDriftEstimate,cumDriftEstimate,rawAmplitude,signalToNoiseRatio,isolationDistance,Lratio,silhouetteScore,useTheseTimesStart,useTheseTimesStop,maxChannels
count,1316.0,1316.0,1316.0,1316.0,1313.0,1313.0,1313.0,1313.0,1313.0,1313.0,...,1313.0,1313.0,1313.0,1313.0,0.0,0.0,0.0,1313.0,1313.0,1316.0
mean,657.5,657.5,657.5,22049.085106,1.3115,1.058644,261.665397,0.027831,0.049206,0.48826,...,243.200767,1669.955587,143.184949,33.9548,,,,1719.939137,4819.832511,165.417173
std,380.040787,380.040787,380.040787,43714.317009,0.48104,0.250737,156.318596,0.013551,0.042699,0.564762,...,699.197615,8733.265991,212.881879,60.974169,,,,2119.352178,3184.151345,109.866833
min,0.0,0.0,0.0,1.0,1.0,1.0,66.666667,0.001021,0.005695,0.002565,...,0.0,0.0,1.247137,0.210699,,,,6.7e-05,360.000067,0.0
25%,328.75,328.75,328.75,1098.0,1.0,1.0,166.666667,0.018742,0.023185,0.158641,...,5.737915,15.878418,42.28909,8.907994,,,,6.7e-05,1800.000067,66.0
50%,657.5,657.5,657.5,5304.5,1.0,1.0,233.333333,0.025146,0.036892,0.377218,...,18.480469,102.230957,78.155584,18.315633,,,,720.000067,4680.000067,147.5
75%,986.25,986.25,986.25,21131.0,2.0,1.0,333.333333,0.034503,0.057542,0.609524,...,43.171509,312.976074,145.931045,36.659258,,,,2520.000067,8280.000067,265.25
max,1315.0,1315.0,1315.0,391911.0,3.0,3.0,866.666667,0.121548,0.298735,7.241085,...,3814.299561,175132.667462,1335.572946,1291.409175,,,,8640.000067,9000.000067,378.0


## ==================================
## Compare BC --> Kilosort4
## ==================================

#### Compares KS4 > BC > PHY_manual
- this simply loads in and shows the 'GOOD', 'MUA' lables for each of these 3 

In [147]:
from pathlib import Path
import pandas as pd
import numpy as np

probe = "B"

ks_base = Path(ks_dir)
probe_token = f"probe_{probe}".lower()

# --------------------------
# Find files
cluster_info_candidates = list(ks_base.rglob("cluster_info.tsv")) + list(ks_base.rglob("cluster_group.tsv"))
kslabel_candidates = list(ks_base.rglob("cluster_KSLabel.tsv")) + list(ks_base.rglob("cluster_KSlabel.tsv"))

if len(cluster_info_candidates) == 0:
    raise FileNotFoundError(f"No cluster_info.tsv/cluster_group.tsv found under {ks_base}")

if len(kslabel_candidates) == 0:
    raise FileNotFoundError(
        f"No cluster_KSLabel.tsv found under {ks_base}. "
        "If your pipeline uses a different filename, search the folder for 'KSLabel'."
    )

# Prefer paths that include probe token
cluster_info_path = next((p for p in cluster_info_candidates if probe_token in str(p).lower()), cluster_info_candidates[0])
kslabel_path = next((p for p in kslabel_candidates if probe_token in str(p).lower()), kslabel_candidates[0])

print("Using cluster info (manual phy labels):", cluster_info_path)
print("Using cluster_KSLabel (original KS labels):", kslabel_path)

# --------------------------
# Load
phy_info = pd.read_csv(cluster_info_path, sep="\t")
ks_auto = pd.read_csv(kslabel_path, sep="\t")

# Columns in KS auto file are typically: cluster_id, KSLabel
ks_auto_id_col = next((c for c in ["cluster_id", "clusterId", "id", "cluster"] if c in ks_auto.columns), None)
ks_auto_label_col = next((c for c in ["KSLabel", "kslabel", "label", "group"] if c in ks_auto.columns), None)
if ks_auto_id_col is None or ks_auto_label_col is None:
    raise KeyError(f"cluster_KSLabel columns unexpected: {ks_auto.columns.tolist()}")

# Manual phy labels (group)
phy_id_col = next((c for c in ["cluster_id", "clusterId", "id", "cluster"] if c in phy_info.columns), None)
phy_label_col = next((c for c in ["group", "label", "KSLabel", "unit_type"] if c in phy_info.columns), None)
if phy_id_col is None or phy_label_col is None:
    raise KeyError(f"cluster_info columns unexpected: {phy_info.columns.tolist()}")

# --------------------------
# BC columns
bc_id_col = "phy_clusterID"
bc_label_col = "Bombcell_unit_type"
if bc_id_col not in quality_metrics.columns or bc_label_col not in quality_metrics.columns:
    raise KeyError(f"quality_metrics must contain {bc_id_col}, {bc_label_col}. Columns:\n{quality_metrics.columns.tolist()}")

# --------------------------
# Build df with BOTH KS auto + KS manual
df = (
    quality_metrics[[bc_id_col, bc_label_col]]
    .merge(ks_auto[[ks_auto_id_col, ks_auto_label_col]], left_on=bc_id_col, right_on=ks_auto_id_col, how="inner")
    .merge(phy_info[[phy_id_col, phy_label_col]], left_on=bc_id_col, right_on=phy_id_col, how="left")
    .rename(columns={
        ks_auto_label_col: "KS_auto_label",
        phy_label_col: "KS_manual_label"
    })
)

# Normalize
df["BC_unit_type"] = df[bc_label_col].astype(str).str.strip().str.upper()
df["KS_auto_unit_type"] = df["KS_auto_label"].astype(str).str.strip().str.lower()
df["KS_manual_unit_type"] = df["KS_manual_label"].astype(str).str.strip().str.lower()

print("df shape:", df.shape)
display(df[[bc_id_col, "BC_unit_type", "KS_auto_unit_type", "KS_manual_unit_type"]].head())

print("\nBC counts:")
print(df["BC_unit_type"].value_counts(dropna=False))

print("\nKS auto counts (original Kilosort):")
print(df["KS_auto_unit_type"].value_counts(dropna=False))

print("\nKS manual counts (your phy curation):")
print(df["KS_manual_unit_type"].value_counts(dropna=False))

df_BC = pd.DataFrame(df["BC_unit_type"])
df_KS4 = pd.DataFrame(df["KS_auto_unit_type"])
df_PHY_manual = pd.DataFrame(df["KS_manual_unit_type"])
df_PHY_manual_good = df_PHY_manual[df_PHY_manual["KS_manual_unit_type"] == "good"]


Using cluster info (manual phy labels): H:\Grant\Neuropixels\Kilosort_Recordings\Reach15_20260201_session007_NP_Recording_Number02_2026-02-01_18-25-00\bombcell\bombcell_single_probe\kilosort4_B\cluster_info.tsv
Using cluster_KSLabel (original KS labels): H:\Grant\Neuropixels\Kilosort_Recordings\Reach15_20260201_session007_NP_Recording_Number02_2026-02-01_18-25-00\bombcell\bombcell_single_probe\kilosort4_B\cluster_KSLabel.tsv
df shape: (1314, 9)


Unnamed: 0,phy_clusterID,BC_unit_type,KS_auto_unit_type,KS_manual_unit_type
0,0,GOOD,mua,good
1,1,MUA,good,
2,2,NOISE,mua,
3,3,MUA,mua,
4,4,GOOD,good,good



BC counts:
BC_unit_type
GOOD        423
MUA         358
NON-SOMA    286
NOISE       247
Name: count, dtype: int64

KS auto counts (original Kilosort):
KS_auto_unit_type
mua     783
good    531
Name: count, dtype: int64

KS manual counts (your phy curation):
KS_manual_unit_type
NaN      1273
good       36
mua         4
noise       1
Name: count, dtype: int64


### dataframe for each proccess

In [153]:
df_BC.head(), df_KS4.head(), df_PHY_manual.head(), df_PHY_manual_good.head()

(  BC_unit_type
 0         GOOD
 1          MUA
 2        NOISE
 3          MUA
 4         GOOD,
   KS_auto_unit_type
 0               mua
 1              good
 2               mua
 3               mua
 4              good,
   KS_manual_unit_type
 0                good
 1                 NaN
 2                 NaN
 3                 NaN
 4                good,
    KS_manual_unit_type
 0                 good
 4                 good
 5                 good
 15                good
 16                good)

### Directly comparing unitIDs of Kilosort4 Vs. Bombcell

In [155]:
# --------------------------
# Compare Bombcell vs Kilosort4 ORIGINAL labels
# Assumes df has: phy_clusterID, BC_unit_type (GOOD/MUA/NOISE), KS_auto_unit_type (good/mua/noise)

id_col = "phy_clusterID"
bc_col = "BC_unit_type"
ks_col = "KS_auto_unit_type"

for c in [id_col, bc_col, ks_col]:
    if c not in df.columns:
        raise KeyError(f"Missing {c}. df columns:\n{df.columns.tolist()}")

# GOOD agreement / mismatches
both_good = df[(df[bc_col] == "GOOD") & (df[ks_col] == "good")].copy()
bc_good_ks_not = df[(df[bc_col] == "GOOD") & (df[ks_col] != "good")].copy()
ks_good_bc_not = df[(df[bc_col] != "GOOD") & (df[ks_col] == "good")].copy()

# “Dropped” lists (as IDs)
ids_both_good = both_good[id_col].tolist()
ids_bc_dropped_vs_ks_good = ks_good_bc_not[id_col].tolist()   # KS called good, BC did NOT -> BC dropped relative to KS
ids_ks_dropped_vs_bc_good = bc_good_ks_not[id_col].tolist()   # BC called GOOD, KS did NOT -> KS dropped relative to BC

print('Good Counts:')
print("Total KS4 good:", (df["KS_auto_unit_type"] == "good").sum())
print("Total BC GOOD:", (df["BC_unit_type"] == "GOOD").sum())
print('')

print("Counts:")
print("  both GOOD:", len(ids_both_good))
print("  BC dropped vs KS GOOD (KS good, BC not GOOD):", len(ids_bc_dropped_vs_ks_good))
print("  KS dropped vs BC GOOD (BC GOOD, KS not good):", len(ids_ks_dropped_vs_bc_good))

# Optional: inspect small samples
print("\nExample IDs (first 20):")
print("  both GOOD:", ids_both_good[:20])
print("  BC dropped vs KS GOOD:", ids_bc_dropped_vs_ks_good[:20])
print("  KS dropped vs BC GOOD:", ids_ks_dropped_vs_bc_good[:20])

# Optional: save full tables for inspection
both_good_table = both_good[[id_col, bc_col, ks_col]]
bc_dropped_table = ks_good_bc_not[[id_col, bc_col, ks_col]]
ks_dropped_table = bc_good_ks_not[[id_col, bc_col, ks_col]]

both_good_table, bc_dropped_table, ks_dropped_table


Good Counts:
Total KS4 good: 531
Total BC GOOD: 423

Counts:
  both GOOD: 199
  BC dropped vs KS GOOD (KS good, BC not GOOD): 332
  KS dropped vs BC GOOD (BC GOOD, KS not good): 224

Example IDs (first 20):
  both GOOD: [4, 5, 32, 94, 97, 114, 128, 136, 140, 145, 165, 178, 182, 184, 190, 209, 266, 281, 292, 322]
  BC dropped vs KS GOOD: [1, 7, 10, 12, 14, 21, 28, 53, 56, 69, 70, 72, 79, 81, 86, 89, 95, 96, 98, 108]
  KS dropped vs BC GOOD: [0, 11, 15, 16, 17, 20, 22, 23, 37, 39, 40, 46, 66, 71, 74, 76, 77, 82, 84, 85]


(      phy_clusterID BC_unit_type KS_auto_unit_type
 4                 4         GOOD              good
 5                 5         GOOD              good
 32               32         GOOD              good
 92               94         GOOD              good
 95               97         GOOD              good
 ...             ...          ...               ...
 1269           1271         GOOD              good
 1291           1293         GOOD              good
 1303           1305         GOOD              good
 1305           1307         GOOD              good
 1308           1310         GOOD              good
 
 [199 rows x 3 columns],
       phy_clusterID BC_unit_type KS_auto_unit_type
 1                 1          MUA              good
 7                 7        NOISE              good
 10               10          MUA              good
 12               12          MUA              good
 14               14          MUA              good
 ...             ...          ...    

# ==============================================
# NEXT
# ==============================================

In [133]:
from pathlib import Path
import pandas as pd

probe = "B"

# --------------------------
# 1) Locate Kilosort/Phy cluster_info.tsv (contains KS labels: good/mua/noise)
ks_base = Path(ks_dir)
candidates = list(ks_base.rglob("cluster_info.tsv")) + list(ks_base.rglob("cluster_group.tsv"))

if len(candidates) == 0:
    raise FileNotFoundError(
        f"No cluster_info.tsv or cluster_group.tsv found under:\n{ks_base}\n"
        "Point cluster_info_path directly to the KS output folder that contains it."
    )

# If multiple found, prefer one that mentions the probe (Probe_B / probeB / etc.), else take the first
probe_token = f"probe_{probe}".lower()
cluster_info_path = next((p for p in candidates if probe_token in str(p).lower()), candidates[0])

print("Using cluster info:", cluster_info_path)

ks_info = pd.read_csv(cluster_info_path, sep="\t")

# --------------------------
# 2) Pick the KS label + cluster id columns robustly
# cluster id column (usually cluster_id)
ks_id_col = next((c for c in ["cluster_id", "clusterId", "id", "cluster"] if c in ks_info.columns), None)
if ks_id_col is None:
    raise KeyError(f"Could not find KS cluster id column in {cluster_info_path.name}. Columns:\n{ks_info.columns.tolist()}")

# KS label column: usually 'group' (good/mua/noise) or 'KSLabel'
ks_label_col = next((c for c in ["group", "KSLabel", "label", "unit_type"] if c in ks_info.columns), None)
if ks_label_col is None:
    raise KeyError(f"Could not find KS label column in {cluster_info_path.name}. Columns:\n{ks_info.columns.tolist()}")

# --------------------------
# 3) Build the comparison DF: one row per unit with BC label + KS label
# Bombcell label column you already have in quality_metrics
bc_label_col = "Bombcell_unit_type"
bc_id_col = "phy_clusterID"

if bc_label_col not in quality_metrics.columns or bc_id_col not in quality_metrics.columns:
    raise KeyError(f"`quality_metrics` must contain {bc_id_col} and {bc_label_col}. Columns:\n{quality_metrics.columns.tolist()}")

df = (
    quality_metrics[[bc_id_col, bc_label_col]]
    .merge(
        ks_info[[ks_id_col, ks_label_col]],
        left_on=bc_id_col,
        right_on=ks_id_col,
        how="inner",
        validate="one_to_one"
    )
    .rename(columns={ks_label_col: "KS_unit_type"})
)

# Optional: normalize label strings for later comparisons
df["BC_unit_type"] = df[bc_label_col].astype(str).str.strip().str.upper()
df["KS_unit_type"] = df["KS_unit_type"].astype(str).str.strip().str.lower()

print("df shape:", df.shape)
display(df.head())

print("\nBC counts:")
print(df["BC_unit_type"].value_counts(dropna=False))

print("\nKS counts:")
print(df["KS_unit_type"].value_counts(dropna=False))


Using cluster info: H:\Grant\Neuropixels\Kilosort_Recordings\Reach15_20260201_session007_NP_Recording_Number02_2026-02-01_18-25-00\bombcell\bombcell_single_probe\kilosort4_B\cluster_info.tsv
df shape: (1314, 5)


Unnamed: 0,phy_clusterID,Bombcell_unit_type,cluster_id,KS_unit_type,BC_unit_type
0,0,GOOD,0,good,GOOD
1,1,MUA,1,,MUA
2,2,NOISE,2,,NOISE
3,3,MUA,3,,MUA
4,4,GOOD,4,good,GOOD



BC counts:
BC_unit_type
GOOD        423
MUA         358
NON-SOMA    286
NOISE       247
Name: count, dtype: int64

KS counts:
KS_unit_type
NaN      1273
good       36
mua         4
noise       1
Name: count, dtype: int64


In [127]:
# --------------------------
# Cell 1: identify label columns + normalize to {GOOD, MUA, NOISE}
# BC label column (you showed this exists)
bc_col = "bc_unitType"

# Try common KS4 label column names; edit this list if yours differs
ks_candidates = [
    "bc_unitType", "cluster_id"
]
ks_col = next((c for c in ks_candidates if c in df.columns), None)

print('Options for ks_candidates: ', df.columns.tolist())

if ks_col is None:
    raise KeyError(f"Could not find a Kilosort label column. Available columns:\n{df.columns.tolist()}")

# Cluster ID column used to match units
id_candidates = ["phy_clusterID", "cluster_id", "clusterId", "cluster"]
id_col = next((c for c in id_candidates if c in df.columns), None)
if id_col is None:
    raise KeyError(f"Could not find a cluster id column. Available columns:\n{df.columns.tolist()}")

# Normalize labels to uppercase canonical set
def norm_label(x):
    if x is None:
        return None
    s = str(x).strip()
    if s == "" or s.lower() == "nan":
        return None
    sU = s.upper()
    # Map common variants
    if sU in {"GOOD", "G"}:
        return "GOOD"
    if sU in {"MUA", "MULTIUNIT", "MULTI-UNIT"}:
        return "MUA"
    if sU in {"NOISE", "N"}:
        return "NOISE"
    # Kilosort often uses "good"/"mua"/"noise" already; upper() handles that.
    return sU

work = df[[id_col, bc_col, ks_col]].copy()
work["BC"] = work[bc_col].map(norm_label)
work["KS"] = work[ks_col].map(norm_label)

print("Using columns:", {"id_col": id_col, "bc_col": bc_col, "ks_col": ks_col})
print("BC label counts:\n", work["BC"].value_counts(dropna=False))
print("KS label counts:\n", work["KS"].value_counts(dropna=False))


Options for ks_candidates:  ['Bombcell_unit_type', 'cluster_id', 'row_index', 'phy_clusterID', 'nSpikes', 'nPeaks', 'nTroughs', 'waveformDuration_peakTrough', 'spatialDecaySlope', 'waveformBaselineFlatness', 'scndPeakToTroughRatio', 'mainPeakToTroughRatio', 'peak1ToPeak2Ratio', 'troughToPeak2Ratio', 'mainPeak_before_width', 'mainTrough_width', 'percentageSpikesMissing_gaussian', 'percentageSpikesMissing_symmetric', 'RPV_window_index', 'fractionRPVs_estimatedTauR', 'presenceRatio', 'maxDriftEstimate', 'cumDriftEstimate', 'rawAmplitude', 'signalToNoiseRatio', 'isolationDistance', 'Lratio', 'silhouetteScore', 'useTheseTimesStart', 'useTheseTimesStop', 'maxChannels']


KeyError: "['bc_unitType'] not in index"

In [129]:
# --------------------------
# Cell 2: comparisons (GOOD and MUA), confusion matrix, agreement, and ID lists

import pandas as pd

# Confusion matrix restricted to GOOD/MUA/NOISE (drop other/unexpected labels if any)
keep = {"GOOD", "MUA", "NOISE"}
cm_data = work[work["BC"].isin(keep) & work["KS"].isin(keep)].copy()

conf = pd.crosstab(cm_data["BC"], cm_data["KS"], dropna=False)
conf

# Agreement metrics for GOOD and for MUA (treat each as "positive" vs all else)
def binary_metrics(pos_label):
    tp = ((cm_data["BC"] == pos_label) & (cm_data["KS"] == pos_label)).sum()
    fp = ((cm_data["BC"] != pos_label) & (cm_data["KS"] == pos_label)).sum()
    fn = ((cm_data["BC"] == pos_label) & (cm_data["KS"] != pos_label)).sum()
    tn = ((cm_data["BC"] != pos_label) & (cm_data["KS"] != pos_label)).sum()

    precision = tp / (tp + fp) if (tp + fp) else float("nan")
    recall    = tp / (tp + fn) if (tp + fn) else float("nan")
    f1        = 2*precision*recall/(precision+recall) if (precision+recall) else float("nan")
    return pd.Series(
        {"TP": tp, "FP": fp, "FN": fn, "TN": tn,
         "precision": precision, "recall": recall, "f1": f1}
    )

metrics = pd.DataFrame({"GOOD": binary_metrics("GOOD"), "MUA": binary_metrics("MUA")}).T
metrics

# Concrete ID lists you’ll actually inspect
ids_good_agree = cm_data.loc[(cm_data["BC"]=="GOOD") & (cm_data["KS"]=="GOOD"), id_col].tolist()
ids_good_disagree_bc_good = cm_data.loc[(cm_data["BC"]=="GOOD") & (cm_data["KS"]!="GOOD"), id_col].tolist()
ids_good_disagree_ks_good = cm_data.loc[(cm_data["BC"]!="GOOD") & (cm_data["KS"]=="GOOD"), id_col].tolist()

ids_mua_agree = cm_data.loc[(cm_data["BC"]=="MUA") & (cm_data["KS"]=="MUA"), id_col].tolist()
ids_mua_disagree_bc_mua = cm_data.loc[(cm_data["BC"]=="MUA") & (cm_data["KS"]!="MUA"), id_col].tolist()
ids_mua_disagree_ks_mua = cm_data.loc[(cm_data["BC"]!="MUA") & (cm_data["KS"]=="MUA"), id_col].tolist()

print("GOOD agree:", len(ids_good_agree))
print("BC GOOD, KS not GOOD:", len(ids_good_disagree_bc_good))
print("KS GOOD, BC not GOOD:", len(ids_good_disagree_ks_good))

print("MUA agree:", len(ids_mua_agree))
print("BC MUA, KS not MUA:", len(ids_mua_disagree_bc_mua))
print("KS MUA, BC not MUA:", len(ids_mua_disagree_ks_mua))


KeyError: 'BC'

## ==================================
## Cell Classification
## ==================================

#### Begin Cell Classification Comparision
- save_path = bombcells DIRECT save location (same location you use to launch phy)
    - Example save_path: H:\Grant\Neuropixels\Kilosort_Recordings\Reach15_20260201_session007_NP_Recording_Number02_2026-02-01_18-25-00\bombcell\bombcell_single_probe\kilosort4_B/bombcell

In [112]:
# Compare manual vs BombCell classifications (only requires save_path)
bc.compare_manual_vs_bombcell(save_path)

📊 Comparing manual vs BombCell classifications from: H:\Grant\Neuropixels\Kilosort_Recordings\Reach15_20260201_session007_NP_Recording_Number02_2026-02-01_18-25-00\bombcell\bombcell_single_probe\kilosort4_B\bombcell
✅ Loaded BombCell results: 1316 units
❌ No manual classifications found. Please use the GUI to manually classify some units first.
Expected file: H:\Grant\Neuropixels\Kilosort_Recordings\Reach15_20260201_session007_NP_Recording_Number02_2026-02-01_18-25-00\bombcell\bombcell_single_probe\kilosort4_B\bombcell\manual_unit_classifications.csv
❌ No manual classifications found.
   Use the GUI to manually classify some units first:
   bc.unit_quality_gui(ks_dir, quality_metrics, unit_types, param, save_path)


In [113]:
# Compute ephys properties for cell type classification
ephys_param = bc.get_ephys_parameters(ks_dir)

# Compute all ephys properties - now defaults to ks_dir/bombcell
ephys_properties, ephys_param = bc.run_all_ephys_properties(ks_dir, ephys_param, save_path=save_path)

Computing ephys properties for 1318 units ...


Computing ephys properties:   0%|          | 0/1318 [00:00<?, ?it/s]

Ephys properties computation complete!
Ephys properties saved to: H:\Grant\Neuropixels\Kilosort_Recordings\Reach15_20260201_session007_NP_Recording_Number02_2026-02-01_18-25-00\bombcell\bombcell_single_probe\kilosort4_B/bombcell\templates._bc_ephysProperties.parquet
Parameters saved to: H:\Grant\Neuropixels\Kilosort_Recordings\Reach15_20260201_session007_NP_Recording_Number02_2026-02-01_18-25-00\bombcell\bombcell_single_probe\kilosort4_B/bombcell\_bc_ephysParameters.parquet


In [115]:
# Cell type classification with automatic plot generation
# Specify brain region: 'cortex' or 'striatum'
brain_region = 'Pontine Nuclei'  # Change this to 'cortex' for cortical data. Striatum and cortex are the only two options for now. 

print(f"Classifying {brain_region} neurons...")
cell_types = bc.classify_and_plot_brain_region(ephys_properties, ephys_param, brain_region)

Classifying Pontine Nuclei neurons...
We cannot do that yet! 'pontine nuclei' is not supported.
Currently supported regions: 'cortex' and 'striatum'
