# Mantis / Histopathology: Safe Colab Deploy Notebook

This notebook **mounts Google Drive**, **finds or creates a GitHub repo and pushes a starter codebase kept in Drive**, sets your **Git identity**, and creates a **single source of truth** for project paths.

## Security: do not leak secrets
- **Never** paste tokens/keys directly into notebook cells.
- If you need a GitHub token for a private repo, use **Colab Secrets** (`GITHUB_TOKEN`) or an interactive prompt (`getpass`) that does **not** print.
- This notebook **does not read or print** any secret files from your Drive.
- Keep secrets in Drive only, under `/content/drive/MyDrive/mit/histopathology_mantis_20260115/secrets/`, and keep them gitignored.
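As an extra check before committing, a small scan for strings that look like GitHub tokens can catch a secret that was pasted into a file by mistake. This is a minimal sketch, not a complete secret scanner: it only looks for the well-known GitHub token prefixes (`ghp_`, `gho_`, `ghu_`, `ghs_`, `ghr_`, `github_pat_`), the 20-character minimum is an assumption, and the helper name `find_token_like_strings` is hypothetical.

```python
import re
from pathlib import Path

# Well-known GitHub token prefixes followed by a run of token characters.
# The 20-character floor is an assumption to limit false positives.
TOKEN_RE = re.compile(r"\b(?:gh[pousr]_[A-Za-z0-9]{20,}|github_pat_[A-Za-z0-9_]{20,})")


def find_token_like_strings(root: Path) -> list[tuple[str, int]]:
    """Return (relative path, line number) for lines that look like they hold a GitHub token.

    Only locations are returned; the matched text is never returned or printed.
    """
    hits = []
    for path in sorted(root.rglob("*")):
        rel = path.relative_to(root)
        if not path.is_file() or ".git" in rel.parts:
            continue  # skip directories and git internals
        try:
            text = path.read_text(encoding="utf-8", errors="ignore")
        except OSError:
            continue  # unreadable file: skip it rather than abort the scan
        for lineno, line in enumerate(text.splitlines(), start=1):
            if TOKEN_RE.search(line):
                hits.append((rel.as_posix(), lineno))
    return hits
```

For example, calling `find_token_like_strings(PROJECT_DIR / "code")` before `git commit` and stopping if the list is non-empty keeps the check quiet about the secret itself. It reads every file under `root`, so pointing it at the code folder rather than large data folders keeps it fast.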

**Pre-filled identity and paths**
- GitHub org/user: `mithridatemelik`
- Git author email: `djamshedelikov@gmail.com`
- Git author name: `Djamshed Melikov`
- Drive project root: `/content/drive/MyDrive/mit/histopathology_mantis_20260115`


## 1) Mount Google Drive


In [None]:
from google.colab import drive
drive.mount("/content/drive")


Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


## 2) Define your project layout (ONE source of truth)

Persistent layout in Drive:
- `code/` → cloned repos
- `data/` → datasets and intermediate files
- `outputs/` → exports/logs/Mantis-compliant CSVs
- `.cache/` → caches
- `secrets/` → Drive-only; **never** committed; never printed


In [None]:
from pathlib import Path
import os

DRIVE_ROOT = Path(r"/content/drive/MyDrive/mit/histopathology_mantis_20260115")
CODE_ROOT  = DRIVE_ROOT / "code"
DATA_ROOT  = DRIVE_ROOT / "data"
OUT_ROOT   = DRIVE_ROOT / "outputs"
CACHE_DIR  = DRIVE_ROOT / ".cache"
SECRETS_DIR = DRIVE_ROOT / "secrets"   # Drive-only; never committed

for p in [CODE_ROOT, DATA_ROOT, OUT_ROOT, CACHE_DIR, SECRETS_DIR]:
    p.mkdir(parents=True, exist_ok=True)

os.environ["MANTIS_PROJECT_ROOT"] = str(DRIVE_ROOT)

print("✅ Ready")
print("DRIVE_ROOT:", DRIVE_ROOT)
print("CODE_ROOT :", CODE_ROOT)
print("DATA_ROOT :", DATA_ROOT)
print("OUT_ROOT  :", OUT_ROOT)


✅ Ready
DRIVE_ROOT: /content/drive/MyDrive/mit/histopathology_mantis_20260115
CODE_ROOT : /content/drive/MyDrive/mit/histopathology_mantis_20260115/code
DATA_ROOT : /content/drive/MyDrive/mit/histopathology_mantis_20260115/data
OUT_ROOT  : /content/drive/MyDrive/mit/histopathology_mantis_20260115/outputs


## 3) Configure Git identity (safe; no secrets)


In [None]:
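import subprocess

def run(cmd, cwd=None):
    """Echo a shell command, run it, and show its stdout in the notebook."""
    print("▶", cmd)
    # capture_output so the command's output reliably appears in the cell output
    result = subprocess.run(cmd, shell=True, check=True, cwd=cwd, capture_output=True, text=True)
    print(result.stdout, end="")

run('git config --global user.email "djamshedelikov@gmail.com"')
run('git config --global user.name "Djamshed Melikov"')
run('git config --global --list | grep -E "user.email|user.name" || true')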


▶ git config --global user.email "djamshedelikov@gmail.com"
▶ git config --global user.name "Djamshed Melikov"
▶ git config --global --list | egrep "user.email|user.name" || true


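## 4) Find or create your GitHub repo (code kept persistently in Drive)

### What you must set
Edit the `CONFIG` block in the second cell below:
- `OWNER` = your GitHub username or org (`mithridatemelik`)
- `REPO_NAME` = the repository name under `https://github.com/mithridatemelik/`
  - Example: `mantis_histopathology` (replace with the correct one)
- `PRIVATE` = whether a newly created repo should be private

### Private repo?
Store a token in **Colab Secrets** as `GITHUB_TOKEN`:
- Colab UI → (🔑) Secrets → add `GITHUB_TOKEN`, then enable **Notebook access** for it

This notebook reads it **without printing** and falls back to a hidden `getpass` prompt if the secret is unavailable. To let the notebook create the repo, a fine-grained token needs the repository permission **Administration: Read & write**.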


In [None]:
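import os, getpass, requests

def get_token():
    # Prefer the Colab Secret (needs "Notebook access" enabled), then an env var, then a hidden prompt
    try:
        from google.colab import userdata
        tok = userdata.get("GITHUB_TOKEN")
        if tok:
            return tok
    except Exception:
        pass
    return os.environ.get("GITHUB_TOKEN") or getpass.getpass("GitHub token (hidden): ")

token = get_token()
headers = {"Authorization": f"Bearer {token}", "Accept": "application/vnd.github+json"}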

repos = []
page = 1
while True:
    r = requests.get(
        "https://api.github.com/user/repos",
        headers=headers,
        params={"per_page": 100, "page": page}
    )
    r.raise_for_status()
    data = r.json()
    if not data:
        break
    repos.extend(data)
    page += 1

matches = sorted(
    {repo["full_name"] for repo in repos
     if "histopathology" in repo["name"].lower() or "mantis" in repo["name"].lower()}
)

print("Matches:")
for m in matches[:50]:
    print(" -", m)

if not matches:
    print("\nNo matching repos visible to this token/user.")
    print("Likely: wrong owner/org, repo name differs, or you don't have access.")


GitHub token (hidden): ··········
Matches:

No matching repos visible to this token/user.
Likely: wrong owner/org, repo name differs, or you don't have access.
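The loop above stops when a page comes back empty, which costs one extra request. GitHub's REST API also sends a `Link` header with a `next` URL, and `requests` exposes the parsed header as `Response.links`. A minimal sketch of following it, assuming a `requests.Session`-like object is passed in (the helper name `list_all_pages` is hypothetical):

```python
from __future__ import annotations

from typing import Any


def list_all_pages(session: Any, url: str, per_page: int = 100, timeout: float = 30.0) -> list[dict]:
    """Collect every item from a paginated GitHub REST endpoint by following Link: rel="next"."""
    items: list[dict] = []
    params: dict | None = {"per_page": per_page}
    next_url: str | None = url
    while next_url:
        r = session.get(next_url, params=params, timeout=timeout)
        r.raise_for_status()
        items.extend(r.json())
        # requests parses the Link header into r.links; no "next" entry means this was the last page
        next_url = r.links.get("next", {}).get("url")
        params = None  # the "next" URL already carries per_page/page in its query string
    return items
```

In the cell above this would be used as `list_all_pages(s, "https://api.github.com/user/repos")`, where `s` is a `requests.Session` whose `headers` were updated with the `headers` dict.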


In [None]:
from pathlib import Path
import os, textwrap, subprocess, getpass
import requests

# -----------------------------
# CONFIG (edit these)
# -----------------------------
OWNER = "mithridatemelik"          # your username OR org name
REPO_NAME = "mantis_histopathology"  # repo you want to create
PRIVATE = True

PROJECT_DIR = Path("/content/drive/MyDrive/mit") / REPO_NAME  # where to build the codebase locally
NOTEBOOK_PATH = None  # OPTIONAL: set to a real .ipynb path if you want to copy it in (recommended)

GIT_USER_NAME = "Djamshed Melikov"
GIT_USER_EMAIL = "djamshedelikov@gmail.com"  # same author email as section 3

# -----------------------------
# Helpers
# -----------------------------
def sh(cmd, cwd=None):
    subprocess.run(cmd, cwd=cwd, check=True)

def get_token():
    # Prefer Colab secret if present; fallback to hidden prompt
    try:
        from google.colab import userdata
        tok = userdata.get("GITHUB_TOKEN")
        if tok:
            return tok
    except Exception:
        pass
    return getpass.getpass("GitHub token (hidden): ")

def gh_headers(token: str):
    return {
        "Authorization": f"Bearer {token}",
        "Accept": "application/vnd.github+json",
    }

def api_get(url, token):
    r = requests.get(url, headers=gh_headers(token))
    return r

def api_post(url, token, payload):
    r = requests.post(url, headers=gh_headers(token), json=payload)
    return r

def ensure_repo_exists(owner, repo, token, private=True):
    # 1) Check repo existence
    r = api_get(f"https://api.github.com/repos/{owner}/{repo}", token)
    if r.status_code == 200:
        print(f"✅ Repo exists: {owner}/{repo}")
        return

    if r.status_code != 404:
        raise RuntimeError(f"Repo check failed: {r.status_code} {r.text}")

    print(f"ℹ️ Repo not found: {owner}/{repo} → will try to create it...")

    # 2) Determine authenticated user login
    me = api_get("https://api.github.com/user", token)
    me.raise_for_status()
    login = me.json()["login"]

    payload = {"name": repo, "private": private}

    # 3) Create repo either under user or org
    if owner.lower() == login.lower():
        create_url = "https://api.github.com/user/repos"
    else:
        create_url = f"https://api.github.com/orgs/{owner}/repos"

    cr = api_post(create_url, token, payload)

    if cr.status_code == 201:
        print(f"✅ Created repo: {owner}/{repo}")
        return

    if cr.status_code == 403:
        print("❌ GitHub API refused repo creation (403).")
        print("Likely missing token permission: Administration (write) OR org policy blocks creation.")
        print("Fix: regenerate token with Repository permissions ‚Üí Administration: Read & write.")
        raise RuntimeError(cr.text)

    raise RuntimeError(f"Repo creation failed: {cr.status_code} {cr.text}")

def write_netrc(login, token):
    # Git will read credentials from ~/.netrc (avoids token in URL)
    netrc = Path.home() / ".netrc"
    netrc.write_text(textwrap.dedent(f"""\
    machine github.com
    login {login}
    password {token}
    """))
    netrc.chmod(0o600)
    return netrc

# -----------------------------
# Main
# -----------------------------
token = get_token()

# Verify token & get login
me = api_get("https://api.github.com/user", token)
me.raise_for_status()
login = me.json()["login"]
print(f"✅ Token works for user: {login}")

# Ensure remote repo exists (create if missing)
ensure_repo_exists(OWNER, REPO_NAME, token, private=PRIVATE)

# Build a basic codebase layout
PROJECT_DIR.mkdir(parents=True, exist_ok=True)
(PROJECT_DIR / "src").mkdir(exist_ok=True)
(PROJECT_DIR / "notebooks").mkdir(exist_ok=True)
(PROJECT_DIR / "data").mkdir(exist_ok=True)

# Add starter files if missing
readme = PROJECT_DIR / "README.md"
if not readme.exists():
    readme.write_text(f"# {REPO_NAME}\n\nCodebase created from Colab notebook.\n")

gitignore = PROJECT_DIR / ".gitignore"
if not gitignore.exists():
    gitignore.write_text(textwrap.dedent("""\
    __pycache__/
    *.pyc
    .ipynb_checkpoints/
    .DS_Store
    data/
    *.pt
    *.pth
    *.ckpt
    """))

# Optionally copy notebook into repo
if NOTEBOOK_PATH:
    nb = Path(NOTEBOOK_PATH)
    if nb.exists() and nb.suffix == ".ipynb":
        dest = PROJECT_DIR / "notebooks" / nb.name
        dest.write_bytes(nb.read_bytes())
        print(f"✅ Copied notebook → {dest}")
    else:
        print(f"⚠️ NOTEBOOK_PATH not found or not .ipynb: {NOTEBOOK_PATH}")

# Initialize git repo if needed
if not (PROJECT_DIR / ".git").exists():
    sh(["git", "init", "-b", "main"], cwd=PROJECT_DIR)

# Set git identity (needed for commits in Colab)
sh(["git", "config", "user.name", GIT_USER_NAME], cwd=PROJECT_DIR)
sh(["git", "config", "user.email", GIT_USER_EMAIL], cwd=PROJECT_DIR)

# Set remote
remote_url = f"https://github.com/{OWNER}/{REPO_NAME}.git"

# Add remote (or update)
try:
    sh(["git", "remote", "add", "origin", remote_url], cwd=PROJECT_DIR)
except subprocess.CalledProcessError:
    sh(["git", "remote", "set-url", "origin", remote_url], cwd=PROJECT_DIR)

# Commit + push
sh(["git", "add", "-A"], cwd=PROJECT_DIR)
# Commit only if there is something to commit
status = subprocess.run(["git", "status", "--porcelain"], cwd=PROJECT_DIR, capture_output=True, text=True)
if status.stdout.strip():
    sh(["git", "commit", "-m", "Initial codebase from notebook"], cwd=PROJECT_DIR)
else:
    print("ℹ️ Nothing new to commit.")

# Use ~/.netrc for auth only while pushing (keeps the token out of the remote URL);
# try/finally removes the credential file even if the push fails
netrc = write_netrc(login, token)
try:
    # Push (works for an empty remote too)
    sh(["git", "push", "-u", "origin", "main"], cwd=PROJECT_DIR)
finally:
    netrc.unlink(missing_ok=True)

print(f"✅ Done. Repo: {remote_url}")
print(f"📁 Local: {PROJECT_DIR}")


✅ Token works for user: mithridatemelik
ℹ️ Repo not found: mithridatemelik/mantis_histopathology → will try to create it...
✅ Created repo: mithridatemelik/mantis_histopathology
✅ Done. Repo: https://github.com/mithridatemelik/mantis_histopathology.git
📁 Local: /content/drive/MyDrive/mit/mantis_histopathology
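`write_netrc` above creates `~/.netrc` with default permissions and only then calls `chmod(0o600)`, so for a moment the token sits in a file other users could read. A minimal defence-in-depth sketch that creates the file with mode `0o600` from the start via `os.open` (the name `write_netrc_private` and its optional `path` argument are hypothetical):

```python
from __future__ import annotations

import os
from pathlib import Path


def write_netrc_private(login: str, token: str, path: Path | None = None) -> Path:
    """Write a github.com entry to a netrc file that never exists with wider permissions than 0o600."""
    netrc = path or Path.home() / ".netrc"
    content = f"machine github.com\nlogin {login}\npassword {token}\n"
    # O_CREAT with mode 0o600 creates the file private (modulo umask, which can only narrow it)
    fd = os.open(netrc, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        os.fchmod(fd, 0o600)  # also tighten a pre-existing file before the token is written
    except OSError:
        os.close(fd)
        raise
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        f.write(content)
    return netrc
```

It can stand in for `write_netrc(login, token)`; removing the file with `unlink()` after pushing, as the cell already does, still applies.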


## 5) Add a strong `.gitignore` (protect secrets by default)

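This appends safe defaults that make it much harder to commit tokens/keys by accident.
It **does not** read or print any secret file contents.
Broad patterns such as `*token*` and `*private*` also match ordinary files (for example `tokenizer.py`), so review `git status` afterwards. Ignore rules do not untrack files that are already committed, and section 4 has already pushed, so commit the updated `.gitignore` yourself.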


In [None]:
DEFAULT_GITIGNORE_LINES = [
    "# --- Security / secrets ---",
    ".env",
    ".env.*",
    "secrets/",
    "**/secrets/",
    "*.key",
    "*.pem",
    "*.p12",
    "*.pfx",
    "*token*",
    "*secret*",
    "*apikey*",
    "*api_key*",
    "*credentials*",
    "*passwd*",
    "*password*",
    "*private*",
    "# --- Colab / notebooks ---",
    ".ipynb_checkpoints/",
    "# --- Caches ---",
    "__pycache__/",
    "*.pyc",
    ".cache/",
    ".DS_Store",
]

gitignore_path = PROJECT_DIR / ".gitignore"
existing = set()
if gitignore_path.exists():
    existing = set(line.rstrip("\n") for line in gitignore_path.read_text(errors="ignore").splitlines())

new_lines = [ln for ln in DEFAULT_GITIGNORE_LINES if ln not in existing]
if new_lines:
    with gitignore_path.open("a", encoding="utf-8") as f:
        f.write("\n" + "\n".join(new_lines) + "\n")
    print("✅ Updated .gitignore with safe defaults.")
else:
    print("✅ .gitignore already contains safe defaults.")

✅ Updated .gitignore with safe defaults.
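Because broad patterns can hide real source files, it helps to preview what a pattern list would catch. The authoritative check is `git check-ignore -v <path>` run inside the repo; the sketch below is only a rough pure-Python approximation (hypothetical `approx_ignored` helper) that handles blank and comment lines, trailing-slash directory patterns, a leading `**/`, and basename globs, and deliberately skips negations (`!`) and other patterns containing a slash.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath


def approx_ignored(path: str, patterns: list[str]) -> list[str]:
    """Return the patterns that would (roughly) cause `path` to be ignored.

    Approximation of .gitignore matching: only basename-style globs are modelled.
    """
    parts = PurePosixPath(path).parts
    hits = []
    for raw in patterns:
        pat = raw.strip()
        if not pat or pat.startswith(("#", "!")):
            continue  # blank lines and comments; negations are not modelled
        dir_only = pat.endswith("/")
        pat = pat.rstrip("/")
        if pat.startswith("**/"):
            pat = pat[3:]
        if "/" in pat:
            continue  # anchored or nested patterns are out of scope for this sketch
        # A trailing-slash pattern can only match a parent directory, not the file itself
        candidates = parts[:-1] if dir_only else parts
        if any(fnmatchcase(part, pat) for part in candidates):
            hits.append(raw)
    return hits
```

With the cell's list, `approx_ignored("src/tokenizer.py", DEFAULT_GITIGNORE_LINES)` flags `*token*`, which is the kind of surprise worth confirming with `git check-ignore -v`.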


## 6) Install dependencies (auto-detect)

Tries:
- `requirements.txt` ‚Üí `pip install -r`
- `pyproject.toml` or `setup.py` ‚Üí `pip install -e .`


In [None]:
import subprocess

def run(cmd, cwd=None):
    print("▶", cmd)
    subprocess.run(cmd, shell=True, check=True, cwd=cwd)

run("pip -q install -U pip")

req = PROJECT_DIR / "requirements.txt"
pyproject = PROJECT_DIR / "pyproject.toml"
setup_py = PROJECT_DIR / "setup.py"

if req.exists():
    run(f'pip -q install -r "{req}"', cwd=str(PROJECT_DIR))
    print("✅ Installed from requirements.txt")
elif pyproject.exists() or setup_py.exists():
    run("pip -q install -e .", cwd=str(PROJECT_DIR))
    print("✅ Installed editable package")
else:
    print("ℹ️ No requirements.txt / pyproject.toml / setup.py found. Skipping install.")

▶ pip -q install -U pip
ℹ️ No requirements.txt / pyproject.toml / setup.py found. Skipping install.
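The detection order in this step can be sketched as a pure function that returns the install command instead of running it, which makes the choice easy to inspect. This is a sketch under assumptions: it invokes pip as `sys.executable -m pip` so packages land in the interpreter running the notebook (the cell above calls a bare `pip`), and the helper name `choose_install_cmd` is hypothetical.

```python
from __future__ import annotations

import sys
from pathlib import Path


def choose_install_cmd(project_dir: Path) -> list[str] | None:
    """Return the pip argv the auto-detect step would run, or None if there is nothing to install.

    Order mirrors the cell above: requirements.txt first, then pyproject.toml / setup.py.
    """
    pip = [sys.executable, "-m", "pip", "install", "-q"]
    req = project_dir / "requirements.txt"
    if req.exists():
        return pip + ["-r", str(req)]
    if (project_dir / "pyproject.toml").exists() or (project_dir / "setup.py").exists():
        return pip + ["-e", str(project_dir)]
    return None
```

Usage: `cmd = choose_install_cmd(PROJECT_DIR)`, then `subprocess.run(cmd, check=True)` when `cmd` is not `None`. Passing an argument list means paths with spaces need no shell quoting.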


## 7) Create `config/paths.py` inside the repo (Drive paths in one place)

Use anywhere in your code:
```python
from config.paths import DATA_ROOT, OUT_ROOT, SECRETS_DIR
```


In [None]:
config_dir = PROJECT_DIR / "config"
config_dir.mkdir(parents=True, exist_ok=True)

paths_py = config_dir / "paths.py"
init_py = config_dir / "__init__.py"
init_py.touch(exist_ok=True)

paths_py.write_text(
'''"""Centralized paths for the Mantis / histopathology project.

Security note:
- Do NOT store tokens/keys in the repo.
- Keep secrets in Drive-only: ${MANTIS_PROJECT_ROOT}/secrets/
"""
from pathlib import Path
import os

PROJECT_ROOT = Path(os.environ.get(
    "MANTIS_PROJECT_ROOT",
    "/content/drive/MyDrive/mit/histopathology_mantis_20260115"
)).resolve()

CODE_ROOT   = PROJECT_ROOT / "code"
DATA_ROOT   = PROJECT_ROOT / "data"
OUT_ROOT    = PROJECT_ROOT / "outputs"
CACHE_DIR   = PROJECT_ROOT / ".cache"
SECRETS_DIR = PROJECT_ROOT / "secrets"  # Drive-only

for p in [CODE_ROOT, DATA_ROOT, OUT_ROOT, CACHE_DIR, SECRETS_DIR]:
    p.mkdir(parents=True, exist_ok=True)
''',
    encoding="utf-8"
)

print("✅ Wrote:", paths_py)

✅ Wrote: /content/drive/MyDrive/mit/mantis_histopathology/config/paths.py
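`config/paths.py` resolves everything from the `MANTIS_PROJECT_ROOT` environment variable and falls back to the Drive root, creating folders at import time. The same resolution logic as a side-effect-free function makes the override easy to check, for example by pointing the project at a scratch folder in a test. This is a sketch: `resolve_project_paths` and `SUBDIRS` are hypothetical names, and unlike `paths.py` it does not create any directories.

```python
from __future__ import annotations

import os
from collections.abc import Mapping
from pathlib import Path

# Same fallback root and sub-folders as config/paths.py
DEFAULT_ROOT = "/content/drive/MyDrive/mit/histopathology_mantis_20260115"
SUBDIRS = {
    "CODE_ROOT": "code",
    "DATA_ROOT": "data",
    "OUT_ROOT": "outputs",
    "CACHE_DIR": ".cache",
    "SECRETS_DIR": "secrets",
}


def resolve_project_paths(env: Mapping[str, str] | None = None) -> dict[str, Path]:
    """Resolve project paths like config/paths.py: env var first, Drive default second; no mkdir."""
    env = os.environ if env is None else env
    root = Path(env.get("MANTIS_PROJECT_ROOT", DEFAULT_ROOT)).resolve()
    paths = {"PROJECT_ROOT": root}
    paths.update({name: root / sub for name, sub in SUBDIRS.items()})
    return paths
```

Passing an explicit mapping such as `{"MANTIS_PROJECT_ROOT": some_tmp_dir}` avoids mutating `os.environ` in the running notebook.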


## 8) Smoke test (no secrets printed)


In [None]:
import sys
sys.path.insert(0, str(PROJECT_DIR))

from config.paths import PROJECT_ROOT, DATA_ROOT, OUT_ROOT, SECRETS_DIR
print("PROJECT_ROOT:", PROJECT_ROOT)
print("DATA_ROOT   :", DATA_ROOT)
print("OUT_ROOT    :", OUT_ROOT)
print("SECRETS_DIR :", SECRETS_DIR)

PROJECT_ROOT: /content/drive/MyDrive/mit/histopathology_mantis_20260115
DATA_ROOT   : /content/drive/MyDrive/mit/histopathology_mantis_20260115/data
OUT_ROOT    : /content/drive/MyDrive/mit/histopathology_mantis_20260115/outputs
SECRETS_DIR : /content/drive/MyDrive/mit/histopathology_mantis_20260115/secrets


## 9) Safe pull/update workflow


In [None]:
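import subprocess

def run(cmd, cwd=None):
    """Echo a shell command, run it, and show its stdout/stderr in the notebook."""
    print("▶", cmd)
    result = subprocess.run(cmd, shell=True, cwd=cwd, capture_output=True, text=True)
    print(result.stdout + result.stderr, end="")
    if result.returncode != 0:
        raise subprocess.CalledProcessError(result.returncode, cmd, result.stdout, result.stderr)

print("Checking status...")
try:
    run("git status", cwd=str(PROJECT_DIR))
except Exception as e:
    print(f"Error checking status: {e}")

print("\nAttempting to pull updates...")
try:
    run("git pull", cwd=str(PROJECT_DIR))
except subprocess.CalledProcessError:
    print("\n⚠️ `git pull` failed (see the git error above).")
    print("Common reasons:")
    print("1. Local uncommitted changes that the incoming commits would overwrite (see 'git status' above).")
    print("2. No upstream branch set (e.g. the remote is still empty).")
    print("3. No credentials: the section 4 cell deletes ~/.netrc after pushing, so a private repo needs the token again.")
    print("Suggestion: commit your local changes with the commands below, then pull/push.")
    print("  git add .")
    print("  git commit -m 'Update config'")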

Checking status...
▶ git status

Attempting to pull updates...
▶ git pull

⚠️ `git pull` failed.
Common reasons:
1. Uncommitted local changes (see 'git status' above).
2. No upstream branch set (if the repo is empty).
Suggestion: Commit your changes locally using the commands below, then pull/push.
  git add .
  git commit -m 'Update config'
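Before pulling, it can help to know whether an upstream is configured and how far the branch has diverged. `git status --porcelain=v2 --branch` prints machine-readable header lines such as `# branch.upstream origin/main` and `# branch.ab +1 -0` (ahead/behind counts). A minimal parsing sketch; `parse_branch_headers` and `branch_info` are hypothetical helpers that read only these header lines:

```python
from __future__ import annotations

import subprocess


def parse_branch_headers(porcelain_v2: str) -> dict[str, object]:
    """Extract head, upstream and ahead/behind from `git status --porcelain=v2 --branch` output."""
    info: dict[str, object] = {"head": None, "upstream": None, "ahead": None, "behind": None}
    for line in porcelain_v2.splitlines():
        if line.startswith("# branch.head "):
            info["head"] = line.split(" ", 2)[2]
        elif line.startswith("# branch.upstream "):
            info["upstream"] = line.split(" ", 2)[2]
        elif line.startswith("# branch.ab "):
            ahead, behind = line.split(" ")[2:4]  # e.g. "+1", "-0"
            info["ahead"], info["behind"] = int(ahead), -int(behind)
    return info


def branch_info(repo_dir: str) -> dict[str, object]:
    """Run git in repo_dir and parse its branch headers (requires git on PATH)."""
    out = subprocess.run(
        ["git", "status", "--porcelain=v2", "--branch"],
        cwd=repo_dir, capture_output=True, text=True, check=True,
    ).stdout
    return parse_branch_headers(out)
```

If `upstream` is `None`, a one-time `git push -u origin main` sets it; if `behind` is `0`, there is nothing to pull.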


In [16]:
from pathlib import Path
import subprocess, shlex

# Existing project root on Drive (this already matches your notebook)
DRIVE_ROOT = Path("/content/drive/MyDrive/mit/histopathology_mantis_20260115")
CODE_ROOT  = DRIVE_ROOT / "code"

# The local repo folder you already created + pushed
PROJECT_DIR = Path("/content/drive/MyDrive/mit/mantis_histopathology")

SRC = CODE_ROOT
DST = PROJECT_DIR / "code"

print("SRC:", SRC)
print("DST:", DST)

# sanity check
assert SRC.exists(), f"SRC not found: {SRC}"
PROJECT_DIR.mkdir(parents=True, exist_ok=True)
DST.mkdir(parents=True, exist_ok=True)

# Exclusions (prevent committing junk/large outputs)
EXCLUDES = [
    ".git", ".ipynb_checkpoints", "__pycache__",
    "data", "outputs", "secrets", ".cache",
    "*.ckpt", "*.pth", "*.pt", "*.onnx",
    "*.tif", "*.tiff", "*.svs"
]

exclude_args = " ".join([f"--exclude={shlex.quote(x)}" for x in EXCLUDES])

cmd = f'rsync -av {exclude_args} "{SRC}/" "{DST}/"'
print("Running:", cmd)
subprocess.run(["bash", "-lc", cmd], check=True)

# show git status after sync
subprocess.run(["git", "-C", str(PROJECT_DIR), "status"], check=False)


SRC: /content/drive/MyDrive/mit/histopathology_mantis_20260115/code
DST: /content/drive/MyDrive/mit/mantis_histopathology/code
Running: rsync -av --exclude=.git --exclude=.ipynb_checkpoints --exclude=__pycache__ --exclude=data --exclude=outputs --exclude=secrets --exclude=.cache --exclude='*.ckpt' --exclude='*.pth' --exclude='*.pt' --exclude='*.onnx' --exclude='*.tif' --exclude='*.tiff' --exclude='*.svs' "/content/drive/MyDrive/mit/histopathology_mantis_20260115/code/" "/content/drive/MyDrive/mit/mantis_histopathology/code/"


CompletedProcess(args=['git', '-C', '/content/drive/MyDrive/mit/mantis_histopathology', 'status'], returncode=0)
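The cell builds one shell string and runs it through `bash -lc`, which is why the exclude patterns need `shlex.quote`. Passing an argument list straight to `subprocess.run` avoids the shell entirely, so the glob patterns reach rsync untouched. A sketch under that assumption, keeping the same `-av` flags and excludes (the helpers `rsync_args` and `sync` are hypothetical):

```python
from __future__ import annotations

import subprocess
from pathlib import Path


def rsync_args(src: Path, dst: Path, excludes: list[str]) -> list[str]:
    """Build an rsync argv; the trailing slashes copy the *contents* of src into dst."""
    args = ["rsync", "-av"]
    args += [f"--exclude={pattern}" for pattern in excludes]  # no shell, so no quoting needed
    args += [f"{src}/", f"{dst}/"]
    return args


def sync(src: Path, dst: Path, excludes: list[str]) -> int:
    """Run rsync without a shell and return its exit status (0 = success)."""
    return subprocess.run(rsync_args(src, dst, excludes)).returncode
```

`sync(SRC, DST, EXCLUDES)` would replace the `cmd` string and the `bash -lc` call.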

In [18]:
from pathlib import Path
import subprocess, shlex

DRIVE_ROOT = Path("/content/drive/MyDrive/mit/histopathology_mantis_20260115")
PROJECT_DIR = Path("/content/drive/MyDrive/mit/mantis_histopathology")

SRC = DRIVE_ROOT
DST = PROJECT_DIR

EXCLUDES = [
    ".git", ".ipynb_checkpoints", "__pycache__",
    "data", "outputs", "secrets", ".cache",
    "*.ckpt", "*.pth", "*.pt", "*.onnx",
    "*.tif", "*.tiff", "*.svs"
]

exclude_args = " ".join([f"--exclude={shlex.quote(x)}" for x in EXCLUDES])
cmd = f'rsync -av {exclude_args} "{SRC}/" "{DST}/"'
print("Running:", cmd)

try:
    subprocess.run(["bash", "-lc", cmd], check=True)
    print("✅ rsync completed successfully.")
except subprocess.CalledProcessError as e:
    if e.returncode == 24:
        print("\n⚠️ rsync exited with status 24: partial transfer due to vanished source files.")
        print("Some source files disappeared while rsync was copying them; on a Google Drive mount this can happen when files are replaced or removed mid-transfer.")
        print("Suggestion: this is often transient, so try re-running the cell. If it persists, look in the source directory for files that keep appearing and disappearing.")
    else:
        print(f"\n‚ùå rsync failed with unexpected exit status {e.returncode}.")
        print(f"Error details: {e}")
        # Re-raise the exception to make sure the user knows there's a problem
        raise
finally:
    subprocess.run(["git","-C",str(PROJECT_DIR),"status"], check=False)

Running: rsync -av --exclude=.git --exclude=.ipynb_checkpoints --exclude=__pycache__ --exclude=data --exclude=outputs --exclude=secrets --exclude=.cache --exclude='*.ckpt' --exclude='*.pth' --exclude='*.pt' --exclude='*.onnx' --exclude='*.tif' --exclude='*.tiff' --exclude='*.svs' "/content/drive/MyDrive/mit/histopathology_mantis_20260115/" "/content/drive/MyDrive/mit/mantis_histopathology/"


CalledProcessError: Command '['bash', '-lc', 'rsync -av --exclude=.git --exclude=.ipynb_checkpoints --exclude=__pycache__ --exclude=data --exclude=outputs --exclude=secrets --exclude=.cache --exclude=\'*.ckpt\' --exclude=\'*.pth\' --exclude=\'*.pt\' --exclude=\'*.onnx\' --exclude=\'*.tif\' --exclude=\'*.tiff\' --exclude=\'*.svs\' "/content/drive/MyDrive/mit/histopathology_mantis_20260115/" "/content/drive/MyDrive/mit/mantis_histopathology/"']' returned non-zero exit status 24.
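Since exit status 24 is often transient, one option is to retry a few times before giving up. A minimal sketch (hypothetical `run_with_retry`; the attempt count and delay are arbitrary assumptions) that retries only on the listed exit codes and returns the final status, so the caller still decides whether 24 is acceptable:

```python
from __future__ import annotations

import subprocess
import time


def run_with_retry(argv: list[str], retry_codes: frozenset[int] = frozenset({24}),
                   attempts: int = 3, delay_s: float = 5.0) -> int:
    """Run argv, re-running while it exits with a code in retry_codes. Returns the last exit code."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    code = subprocess.run(argv).returncode
    for attempt in range(2, attempts + 1):
        if code not in retry_codes:
            break  # success or a non-retryable failure
        print(f"⚠️ exit {code}; retry {attempt}/{attempts} in {delay_s:g}s")
        time.sleep(delay_s)
        code = subprocess.run(argv).returncode
    return code
```

With the cell's existing command string, `run_with_retry(["bash", "-lc", cmd])` retries it as is.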

## 10) Recommended usage
- Put datasets in: `DATA_ROOT`
- Put exports/logs/Mantis CSVs in: `OUT_ROOT`
- Put secrets in: `SECRETS_DIR` (Drive-only; never committed)
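To keep the last rule enforceable in code, anything that writes a credential can first check that the target lies under `SECRETS_DIR` and outside the git working tree. A minimal sketch (hypothetical `check_secret_path`; `Path.is_relative_to` needs Python 3.9+); its error messages include only paths, never file contents:

```python
from pathlib import Path


def check_secret_path(target: Path, secrets_dir: Path, repo_dir: Path) -> Path:
    """Return the resolved target, or raise ValueError unless it is in secrets_dir and outside repo_dir."""
    t = target.resolve()
    if not t.is_relative_to(secrets_dir.resolve()):
        raise ValueError(f"{t} is not under SECRETS_DIR")
    if t.is_relative_to(repo_dir.resolve()):
        raise ValueError(f"{t} is inside the git repo and could be committed")
    return t
```

For example, `check_secret_path(SECRETS_DIR / "github_token.txt", SECRETS_DIR, PROJECT_DIR)` passes with this notebook's layout, because `SECRETS_DIR` lives under the Drive project root rather than inside the repo folder.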
