**GitHub (mirroring the repo into the Drive-style folders)**

1. Open **`00_bootstrap_paths.ipynb`** in Google Colab.
2. Run **all** cells.  
   - This installs dependencies, clones (or confirms) this repo, and **mirrors** it into  
     `/content/drive/MyDrive/msc_final_dataset/…` so that *all* notebooks with hardcoded paths run unchanged.
3. Open **`notebooks/04_results_analysis_2.ipynb`** and run it end-to-end to reproduce the human–AI evaluation:
   - Generates similarity scores (`sim_*`), AI labels, confusion matrices, per-class F1 bars, stratified metrics, etc.
4. If desired, run the earlier notebooks in order:
   - `data_cleaning_1.ipynb` → `data_cleaning_2.ipynb` → `dataset_twitter_communities.ipynb` →  
     `analysis_clustering_annotation.ipynb` → `04_results_analysis.ipynb`

In [None]:
import sys
IN_COLAB = 'google.colab' in sys.modules

if IN_COLAB:
    # Minimal set used across your notebooks
    !pip -q install -U sentence-transformers umap-learn scikit-learn openpyxl wordcloud tqdm
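
Before installing, it can be useful to see which of these packages are already importable in the current environment. The following is a minimal sketch, not part of the repo: the helper `missing_modules` is hypothetical, and the mapping from pip names to import names is an assumption based on the usual import names of these packages.

```python
from importlib.util import find_spec

# pip distribution name -> import name (assumed standard import names)
PIP_TO_IMPORT = {
    "sentence-transformers": "sentence_transformers",
    "umap-learn": "umap",
    "scikit-learn": "sklearn",
    "openpyxl": "openpyxl",
    "wordcloud": "wordcloud",
    "tqdm": "tqdm",
}

def missing_modules(pip_to_import):
    """Return the pip names whose import name cannot be found in this environment."""
    return [pip for pip, mod in pip_to_import.items() if find_spec(mod) is None]

print("Not yet installed:", missing_modules(PIP_TO_IMPORT))
```

An empty list suggests the install cell has nothing new to fetch, although `-U` may still upgrade packages that are already present.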

In [None]:
from pathlib import Path
import os, sys, json

REPO_URL  = "https://github.com/lbouz16/msc_project_emotion_sensitivity.git"
REPO_NAME = "msc_project_emotion_sensitivity"

if 'google.colab' in sys.modules:
    ROOT = Path('/content')/REPO_NAME
    if not ROOT.exists():
        # Clone to an explicit target so the result does not depend on the current directory
        !git clone "{REPO_URL}" "{ROOT}"
else:
    ROOT = Path.cwd()

print("Repo root ->", ROOT)
assert ROOT.exists(), "Could not locate repo root."
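
Outside Colab, `ROOT = Path.cwd()` is only correct when the notebook is launched from the repository root. If it is started from a subfolder such as `notebooks/`, one option is to walk upward until a marker is found. This is a sketch under assumptions: the helper `find_repo_root` is hypothetical, and it assumes the checkout contains a `.git` directory.

```python
from pathlib import Path

def find_repo_root(start, marker=".git"):
    """Walk upward from `start` and return the first directory containing `marker`."""
    start = Path(start).resolve()
    for candidate in (start, *start.parents):
        if (candidate / marker).exists():
            return candidate
    raise FileNotFoundError(f"No '{marker}' found at or above {start}")
```

With this helper, the non-Colab branch could read `ROOT = find_repo_root(Path.cwd())` instead of taking the working directory at face value.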

In [None]:
from pathlib import Path
import os, sys

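if 'google.colab' in sys.modules:
    from google.colab import drive
    drive.mount('/content/drive')
    DRIVE_BASE = Path('/content/drive/MyDrive/msc_final_dataset')
    DRIVE_BASE.mkdir(parents=True, exist_ok=True)
else:
    # No Drive mount outside Colab: fall back to the repo root so PATHS still resolve
    DRIVE_BASE = ROOT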

# Mirror the repo tree into the Drive base so hardcoded /content/drive/... paths still work
if 'google.colab' in sys.modules:
    print("Syncing repo → Drive mirror…")
    # '--delete' removes anything in the mirror that is not in the repo (including files
    # written there by earlier runs), so point it only at this dedicated folder
    !rsync -a --delete "{ROOT}/" "{DRIVE_BASE}/"
    print("Done.")
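
Because `rsync --delete` removes files from the destination that are absent from the source, it can be worth previewing what would be lost before syncing. The sketch below lists files that exist only in the mirror. The helpers `relative_files` and `extra_in_mirror` are hypothetical, not part of the repo. rsync's own `--dry-run` flag gives a similar preview from the shell.

```python
from pathlib import Path

def relative_files(base):
    """Return every file under `base` as a POSIX-style path relative to `base`."""
    base = Path(base)
    return {p.relative_to(base).as_posix() for p in base.rglob("*") if p.is_file()}

def extra_in_mirror(repo_root, mirror_root):
    """Files present in the mirror but not in the repo, i.e. what '--delete' would remove."""
    return sorted(relative_files(mirror_root) - relative_files(repo_root))
```

Calling `extra_in_mirror(ROOT, DRIVE_BASE)` before the sync shows, for example, any result files that a previous run wrote into the Drive folder.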

In [None]:
from pathlib import Path
import json, os

PATHS = {
    "REPO_ROOT":             str(ROOT),
    "DRIVE_BASE":            str(DRIVE_BASE),
    "DATA_CSV":              str(DRIVE_BASE/'data'/'final_with_polarity_sbert.csv'),
    "FORMS_DIR":             str(DRIVE_BASE/'annotation_forms'),
    "EMBED_DIR":             str(DRIVE_BASE/'embeddings'),
    "RESULTS_DIR":           str(DRIVE_BASE/'chapter4_results'),
    "CHAP4_OUTPUTS_DIR":     str(DRIVE_BASE/'chapter4_outputs'),
    "BERTOPIC_OUTPUTS_DIR":  str(DRIVE_BASE/'bertopic_outputs'),
}

# Save a small helper JSON in the repo root and in the Drive mirror
# (the sync above ran before this file existed, so it is written to both places)
for base in {Path(ROOT), DRIVE_BASE}:
    with open(base/"paths_bootstrap.json", "w") as f:
        json.dump(PATHS, f, indent=2)

# Environment variables (if you prefer os.environ lookups later)
os.environ.update({
    "MSC_REPO_ROOT": PATHS["REPO_ROOT"],
    "MSC_DRIVE_BASE": PATHS["DRIVE_BASE"],
    "MSC_DATA_CSV": PATHS["DATA_CSV"],
    "MSC_RESULTS_DIR": PATHS["RESULTS_DIR"],
})

# Sanity print
for k, v in PATHS.items():
    print(f"{k:22} -> {v}")

# Quick existence checks
print("\nExists? DATA_CSV:", Path(PATHS["DATA_CSV"]).exists())
print("Exists? FORMS_DIR:", Path(PATHS["FORMS_DIR"]).exists())
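
A downstream notebook could pick these locations up again roughly as follows. This is a sketch, assuming the JSON file and the `MSC_*` variable names written above; the helper `load_paths` is hypothetical. Changes to `os.environ` are visible only in the process that made them (and its children), so the JSON file is the more portable of the two sources and is tried first.

```python
import json
import os
from pathlib import Path

# PATHS key -> environment variable name, mirroring the bootstrap cell
ENV_KEYS = {
    "REPO_ROOT": "MSC_REPO_ROOT",
    "DRIVE_BASE": "MSC_DRIVE_BASE",
    "DATA_CSV": "MSC_DATA_CSV",
    "RESULTS_DIR": "MSC_RESULTS_DIR",
}

def load_paths(repo_root):
    """Read paths_bootstrap.json if present, else fall back to MSC_* environment variables."""
    json_file = Path(repo_root) / "paths_bootstrap.json"
    if json_file.exists():
        raw = json.loads(json_file.read_text())
    else:
        raw = {key: os.environ[env] for key, env in ENV_KEYS.items() if env in os.environ}
    # Convert every value to a Path for convenient joining downstream
    return {key: Path(value) for key, value in raw.items()}
```

A notebook might then call `paths = load_paths(Path.cwd())` and read the dataset from `paths["DATA_CSV"]`. An empty result means neither source was available and the bootstrap should be re-run.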

In [None]:
print("""
✅ Setup complete.

Next:
1) Open any notebook under /notebooks (e.g., 04_results_analysis_2.ipynb).
2) Run cells as-is. Hardcoded Drive paths will work (we mirrored the repo to /content/drive/...).
3) New notebooks can also read 'paths_bootstrap.json' for ROOT/RESULTS/DATA; the MSC_* environment variables exist only in the kernel that ran this bootstrap.

Tip: re-run this bootstrap if you start a fresh Colab session.
""")