# Assignment 0 — Colab Workflow (GitLab + Pre-commit + Submission Validation)

This notebook teaches the standard workflow used throughout the course:

1. Authenticate to GitLab (PAT)
2. Clone your team repo
3. Install dependencies
4. Install **pre-commit** and enable a hook to strip notebook outputs
5. Run `notebooks/submission.ipynb` end-to-end
6. Validate `predictions.csv`
7. Commit + push + tag

> **Do not** print your token. Once the token is embedded in the remote URL, avoid commands that echo remote URLs, such as `git remote -v`.
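
One way to follow this rule is to mask credentials before a URL ever reaches the output. The sketch below is illustrative only: `redact_url` is a hypothetical helper, not part of the course tooling, and it assumes the token sits in the userinfo part of an HTTPS URL.

```python
from urllib.parse import urlsplit, urlunsplit


def redact_url(url: str) -> str:
    """Return url with any `user:password@` part replaced by `***@`."""
    parts = urlsplit(url)
    if parts.username is None and parts.password is None:
        return url  # no credentials embedded, nothing to hide
    host = parts.hostname or ""
    if parts.port is not None:
        host = f"{host}:{parts.port}"
    # Rebuild the URL with a masked userinfo section.
    return urlunsplit(
        (parts.scheme, f"***@{host}", parts.path, parts.query, parts.fragment)
    )


# Example URL with a made-up token, for illustration only.
print(redact_url("https://tkline:secret-token@gitlab.example.edu/course/team-a.git"))
```

If you ever need to log a remote URL for debugging, passing it through a helper like this first keeps the token out of saved notebook outputs.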


In [None]:
# (Colab) show python and system info
import sys, platform
print(sys.version)
print(platform.platform())


## 1) Enter GitLab repo info

Clone over HTTPS with a **Personal Access Token (PAT)**.  
Give the token only the scopes needed for repository read/write (on GitLab, `read_repository` and `write_repository`), and set an expiration date.

You will be prompted for:
- GitLab username
- PAT (hidden input)
- Repo HTTPS URL (e.g., `https://gitlab.example.edu/course/team-a.git`)


In [None]:
import getpass, re, os, subprocess
from pathlib import Path

GITLAB_USER = input("GitLab username (e.g., tkline): ").strip()
GITLAB_TOKEN = getpass.getpass("GitLab PAT (input hidden): ").strip()
REPO_URL = input("Repo HTTPS URL: ").strip()

assert REPO_URL.startswith("https://"), "Please provide an https:// clone URL"

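from urllib.parse import quote

# Build an authenticated URL without printing it. Percent-encode the credentials
# so that characters such as "@" or "/" in them cannot corrupt the URL.
creds = f"{quote(GITLAB_USER, safe='')}:{quote(GITLAB_TOKEN, safe='')}"
AUTHED_URL = re.sub(r"^https://", f"https://{creds}@", REPO_URL, count=1)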

print("Ready to clone (token not displayed).")


In [None]:
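# Clone repo
# IMPORTANT: do not print AUTHED_URL. subprocess.check_call would put the full
# command (token included) in its error message, so check the exit code instead.
repo_dir = Path("student_repo")

if repo_dir.exists():
    print("Repo directory already exists; remove it or pick a new folder.")
else:
    result = subprocess.run(["git", "clone", AUTHED_URL, str(repo_dir)])
    if result.returncode == 0:
        print("Cloned into", repo_dir)
    else:
        print("git clone failed with exit code", result.returncode,
              "(check the repo URL, username, and token).")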


In [None]:
# Enter repo
%cd student_repo
!git status


## 2) Install dependencies

This installs whatever is in `requirements.txt`.
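
After the install finishes, you may want to confirm that the listed packages are actually present. The following is a minimal sketch, assuming `requirements.txt` contains simple lines such as `pandas` or `numpy>=1.24`; `missing_requirements` is a hypothetical helper, and it only checks that a distribution is installed, not that its version satisfies the pin.

```python
import re
from importlib import metadata
from pathlib import Path


def missing_requirements(req_file: str = "requirements.txt") -> list[str]:
    """List distribution names from req_file that are not installed.

    Option lines (-r, -e), URLs, and blank/comment lines are skipped;
    version specifiers and environment markers are ignored.
    """
    missing = []
    for raw in Path(req_file).read_text().splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if not line or line.startswith("-") or "://" in line:
            continue
        match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
        if not match:
            continue
        name = match.group(0)
        try:
            metadata.version(name)  # raises if the distribution is absent
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


if Path("requirements.txt").exists():
    print("Missing:", missing_requirements() or "none")
```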


In [None]:
!pip -q install -r requirements.txt


## 3) Enable pre-commit hook to strip notebook outputs

This prevents giant notebooks and reduces merge/diff pain.

One-time per clone:
- `pre-commit install`

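After that, every `git commit` runs the hooks configured in the repo's `.pre-commit-config.yaml`, which strip outputs from staged `*.ipynb` files. If a hook modifies a file, that commit is aborted; stage the changes again and repeat the commit.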
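
To make "stripping outputs" concrete, here is a rough sketch of the transformation applied to a notebook's JSON. It is not the hook's actual implementation; `strip_outputs` is a hypothetical function, and the notebook dictionary is a made-up minimal example in nbformat 4 layout.

```python
import json


def strip_outputs(notebook: dict) -> dict:
    """Clear outputs and execution counts from every code cell, in place."""
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return notebook


# A tiny in-memory notebook: one markdown cell, one executed code cell.
nb = {
    "cells": [
        {"cell_type": "markdown", "source": ["# Title"], "metadata": {}},
        {
            "cell_type": "code",
            "source": ["print(1)"],
            "metadata": {},
            "execution_count": 3,
            "outputs": [
                {"output_type": "stream", "name": "stdout", "text": ["1\n"]}
            ],
        },
    ],
    "metadata": {},
    "nbformat": 4,
    "nbformat_minor": 5,
}

print(json.dumps(strip_outputs(nb)["cells"][1], indent=2))
```

The source code of each cell is untouched; only the stored results disappear, which is why diffs become much smaller.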


In [None]:
!pip -q install pre-commit
!pre-commit install


## 4) Create your submission notebook from the template (first time only)

If your repo already has `notebooks/submission.ipynb`, skip this.


In [None]:
from pathlib import Path
template = Path("notebooks/submission_template.ipynb")
target = Path("notebooks/submission.ipynb")

if target.exists():
    print("submission.ipynb already exists ✅")
else:
    if not template.exists():
        print("Template not found at notebooks/submission_template.ipynb")
        print("Ask the instructor or pull latest course template.")
    else:
        target.write_bytes(template.read_bytes())
        print("Created notebooks/submission.ipynb from template.")


## 5) Run the submission notebook end-to-end (local)

In Colab, you can open `notebooks/submission.ipynb` and run *Runtime → Run all*.

For a **quick format check**, you can run only the notebook's final cells, or run the validation step in the next section once `predictions.csv` exists.
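
If you would rather execute the notebook without clicking through the UI, one option is the `jupyter nbconvert` command line. This is a sketch, assuming `jupyter` and `nbconvert` are installed in the runtime; `nbconvert_command` and `run_notebook` are hypothetical helper names, and the 600-second per-cell timeout is an arbitrary choice.

```python
import subprocess


def nbconvert_command(notebook: str, timeout_s: int = 600) -> list[str]:
    """Build a command that executes a notebook and saves the result in place."""
    return [
        "jupyter", "nbconvert",
        "--to", "notebook",
        "--execute",
        "--inplace",
        f"--ExecutePreprocessor.timeout={timeout_s}",
        notebook,
    ]


def run_notebook(notebook: str = "notebooks/submission.ipynb") -> bool:
    """Run the notebook top to bottom; return True if nbconvert exits cleanly."""
    result = subprocess.run(nbconvert_command(notebook))
    return result.returncode == 0


# Show the command that would be run, without running it.
print(" ".join(nbconvert_command("notebooks/submission.ipynb")))
```

Call `run_notebook()` from the repo root. nbconvert typically executes a notebook with the notebook's own folder as the working directory, so check where `predictions.csv` ends up before validating.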


In [None]:
# OPTIONAL: open notebook in Colab's notebook UI (click in file browser on left):
# notebooks/submission.ipynb


## 6) Validate the predictions file format

This checks:
- required columns
- probabilities in [0, 1]
- row_ids match the test file

It assumes the submission notebook wrote `predictions.csv` in the repo root.
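
The three checks can be sketched in a few lines of standard-library Python. This is not the contents of `scripts/validate_submission.py`; the column names `row_id` and `prob` are assumptions made for illustration, and `validate_predictions` is a hypothetical function that works on CSV text rather than files.

```python
import csv
import io

REQUIRED_COLUMNS = ("row_id", "prob")  # hypothetical column names


def validate_predictions(pred_csv: str, test_csv: str) -> list[str]:
    """Return a list of problems; an empty list means all checks passed."""
    reader = csv.DictReader(io.StringIO(pred_csv))
    pred_rows = list(reader)
    columns = reader.fieldnames or []

    # Check 1: required columns are present.
    missing = [c for c in REQUIRED_COLUMNS if c not in columns]
    if missing:
        return [f"missing columns: {missing}"]

    # Check 2: every probability parses as a number in [0, 1].
    errors = []
    for i, row in enumerate(pred_rows):
        try:
            p = float(row["prob"])
        except (TypeError, ValueError):
            errors.append(f"row {i}: prob is not a number")
            continue
        if not 0.0 <= p <= 1.0:
            errors.append(f"row {i}: prob {p} outside [0, 1]")

    # Check 3: the set of row_ids equals the one in the test file.
    test_ids = [r.get("row_id") for r in csv.DictReader(io.StringIO(test_csv))]
    pred_ids = [r["row_id"] for r in pred_rows]
    if sorted(map(str, pred_ids)) != sorted(map(str, test_ids)):
        errors.append("row_id values do not match the test file")
    return errors


test_csv = "row_id,feature\n1,0.3\n2,0.9\n"
good = "row_id,prob\n1,0.25\n2,0.75\n"
bad = "row_id,prob\n1,1.5\n3,0.1\n"
print(validate_predictions(good, test_csv))
print(validate_predictions(bad, test_csv))
```

The second call reports two problems: a probability above 1 and a `row_id` that does not appear in the test data.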


In [None]:
from pathlib import Path
pred_path = Path("predictions.csv")
test_path = Path("data/public/public_test.csv")

if not pred_path.exists():
    print("predictions.csv not found. Run notebooks/submission.ipynb first.")
elif not test_path.exists():
    print(f"{test_path} not found. Check that data/public/ exists in your repo.")
else:
    !python scripts/validate_submission.py --pred {pred_path} --test {test_path}


## 7) Commit + push + tag

You will:
- add changes
- commit (pre-commit hook runs here)
- push
- tag a milestone (example: `milestone_wk1`) and push the tag

> **Note:** because we cloned with a token in the remote URL, avoid printing remotes (for example with `git remote -v`).
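
When a pre-commit hook rewrites files (for example by stripping notebook outputs), the first commit attempt is rejected and the modified files must be staged again. The sketch below shows one way to automate that single retry; `commit_with_retry` is a hypothetical helper, and the `run` parameter exists only so the function can be exercised without a real repository.

```python
import subprocess


def commit_with_retry(message: str, run=subprocess.run) -> bool:
    """Stage and commit; if the commit is rejected, restage and retry once."""
    for _attempt in (1, 2):
        # Stage everything, including files a hook may have just rewritten.
        run(["git", "add", "-A"], check=True)
        result = run(["git", "commit", "-m", message])
        if result.returncode == 0:
            return True
    # Two failures in a row: read git's output (it also fails if there is nothing to commit).
    return False
```

From the repo root, `commit_with_retry("Assignment 0: workflow + initial submission notebook")` would stand in for the separate add and commit commands; pushing remains a separate step.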


In [None]:
!git add -A
!git commit -m "Assignment 0: workflow + initial submission notebook"
!git push


In [None]:
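TAG = "milestone_wk1"
!git tag -a {TAG} -m "Course milestone submission"
# Push only this tag; `git push --tags` would push every local tag.
!git push origin {TAG}
print("Tag and push commands finished for:", TAG, "(check the output above for errors)")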


## Done ✅

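If you hit issues:
- If files are missing, make sure you pulled the latest course template.
- Make sure `data/public/*` exists in your repo (or that your instructor provided it separately).
- If `git commit` fails with "Please tell me who you are", set `git config user.name` and `git config user.email` in the repo, then commit again.
- If `git push` fails, your token may have expired or may lack write permission.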
