# AI-Scientist-v2 Colab Notebook (Gemini-only)

This notebook mirrors the structure of `shinka_tutorial.ipynb` and runs the full pipeline:
**Ideation → BFTS Experiments → Debugging → Writeup → Review**.

Notes:
- The repo defaults are patched for `gemini-3-pro-preview`.
- This notebook stores outputs in Google Drive.
- Running LLM-generated code can be unsafe; use a sandboxed environment.

## 0. Colab + Drive setup
Mount Drive and choose where to store the repo and outputs.

In [None]:
from pathlib import Path
from google.colab import drive

drive.mount('/content/drive')

DRIVE_ROOT = Path('/content/drive/MyDrive')
WORK_ROOT = DRIVE_ROOT / 'ai-scientist-v2'
REPO_URL = 'https://github.com/immanuelk1m/AI-Scientist-v2.git'
REPO_DIR = WORK_ROOT / 'AI-Scientist-v2'

WORK_ROOT.mkdir(parents=True, exist_ok=True)
print('Drive root:', DRIVE_ROOT)
print('Work root:', WORK_ROOT)

## 1. Clone repo (or update)
Clones the repo into Drive so that `experiments/` persists across sessions. If the repo is already there, the cell pulls the latest changes instead.

In [None]:
import subprocess

if not REPO_DIR.exists():
    subprocess.run(["git", "clone", REPO_URL, str(REPO_DIR)], check=True)
else:
    subprocess.run(["git", "-C", str(REPO_DIR), "pull"], check=True)

print('Repo:', REPO_DIR)
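
The clone-or-pull decision above can also be written as a small function that only builds the command, which makes it easy to inspect before running. This is a minimal sketch: `git_sync_command` is a hypothetical helper, not part of the repo, and it checks for a `.git` folder and uses `git pull --ff-only`, both of which are stricter choices than the cell above makes.

```python
import tempfile
from pathlib import Path
from typing import List


def git_sync_command(repo_url: str, repo_dir: Path) -> List[str]:
    """Return a clone command for a fresh checkout, or a fast-forward-only pull."""
    repo_dir = Path(repo_dir)
    if (repo_dir / ".git").exists():
        # --ff-only refuses to create a merge commit if local history diverged.
        return ["git", "-C", str(repo_dir), "pull", "--ff-only"]
    return ["git", "clone", repo_url, str(repo_dir)]


# Demo against a throwaway directory; nothing is cloned or pulled here.
with tempfile.TemporaryDirectory() as tmp:
    fresh = Path(tmp) / "repo"
    clone_cmd = git_sync_command("https://example.com/repo.git", fresh)
    (fresh / ".git").mkdir(parents=True)
    pull_cmd = git_sync_command("https://example.com/repo.git", fresh)

print(clone_cmd[:2], pull_cmd[-2:])
```

In the notebook, the returned list could be passed to `subprocess.run(..., check=True)` exactly as in the cell above.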

## 2. Install Python dependencies

In [None]:
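import subprocess
import sys

# Invoke pip through the notebook's own interpreter so packages land in the active environment.
subprocess.run([sys.executable, "-m", "pip", "install", "-r", str(REPO_DIR / "requirements.txt")], check=True)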

## 2b. (Optional) Install LaTeX for PDF writeup
Installs `pdflatex`, which the writeup stage needs to produce PDF outputs. The install can take several minutes.

In [None]:
# Uncomment if you need LaTeX in Colab
# !apt-get update -y
# !apt-get install -y texlive-latex-extra texlive-fonts-recommended texlive-fonts-extra texlive-science latexmk
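
Before starting a long run, it may help to confirm that the LaTeX tools are actually on `PATH`. The sketch below uses only the standard library; `missing_tools` is a hypothetical helper, and checking `latexmk` alongside `pdflatex` is an assumption based on the package list above.

```python
import shutil
from typing import Iterable, List


def missing_tools(names: Iterable[str]) -> List[str]:
    """Return the executables from `names` that are not found on PATH."""
    return [name for name in names if shutil.which(name) is None]


# `pdflatex` is the tool named above; `latexmk` is installed by the same apt line.
print("Missing LaTeX tools:", missing_tools(["pdflatex", "latexmk"]))
```

An empty list means both tools were found; otherwise, uncomment and run the install cell.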

## 3. API keys (Gemini)
Set `GEMINI_API_KEY`. Optional: `S2_API_KEY` for Semantic Scholar.

In [None]:
import os
from getpass import getpass

os.environ["GEMINI_API_KEY"] = getpass("GEMINI_API_KEY: ")

# Optional (press Enter to skip)
_s2 = getpass("S2_API_KEY (optional): ")
if _s2:
    os.environ["S2_API_KEY"] = _s2
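
`getpass` accepts an empty string, so a blank `GEMINI_API_KEY` is possible and would likely only surface later as an authentication error. A minimal pre-flight check might look like the following; `missing_keys` is a hypothetical helper, not part of the repo.

```python
import os
from typing import Iterable, List, Mapping


def missing_keys(required: Iterable[str], env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the names in `required` that are unset or blank in `env`."""
    return [name for name in required if not env.get(name, "").strip()]


# Example with an explicit mapping, so no real secrets are read or printed.
print(missing_keys(["GEMINI_API_KEY"], {"GEMINI_API_KEY": "  "}))
```

Calling `missing_keys(["GEMINI_API_KEY"])` without a mapping checks the live environment and reports only key names, never values.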

## 4. GPU check

In [None]:
import torch
print('Torch:', torch.__version__)
print('CUDA available:', torch.cuda.is_available())
if torch.cuda.is_available():
    print('GPU:', torch.cuda.get_device_name(0))

## 5. Write your ideation topic
Edit the fields below, then run the cell to save the topic as a markdown file under `ai_scientist/ideas/`.

In [None]:
from textwrap import dedent

TOPIC_NAME = "my_research_topic"
TOPIC_MD = dedent('''
# Title
Your project title here

# Keywords
- keyword1
- keyword2
- keyword3

# TL;DR
One-sentence summary of the research idea.

# Abstract
A short abstract describing the task, approach, and expected insights.

# Short Hypothesis
A concise hypothesis you want to test.

# Experiments
- Experiment 1: ...
- Experiment 2: ...

# Risk Factors and Limitations
- Limitation 1: ...
- Limitation 2: ...
''').strip() + "\n"

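idea_md_path = REPO_DIR / "ai_scientist" / "ideas" / f"{TOPIC_NAME}.md"
# Create the ideas folder if it is missing, and write with an explicit encoding.
idea_md_path.parent.mkdir(parents=True, exist_ok=True)
idea_md_path.write_text(TOPIC_MD, encoding="utf-8")
print('Wrote:', idea_md_path)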
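
When editing the template it is easy to delete a heading by accident. The sketch below checks a topic string for the headings used in the template above. `missing_sections` is a hypothetical helper, and treating every heading as required is an assumption; the ideation script may be more lenient.

```python
import re
from typing import List

# Headings copied from the template above; treating all of them as required
# is an assumption, not something the ideation script is known to enforce.
EXPECTED_SECTIONS = [
    "Title",
    "Keywords",
    "TL;DR",
    "Abstract",
    "Short Hypothesis",
    "Experiments",
    "Risk Factors and Limitations",
]


def missing_sections(markdown: str, expected: List[str] = EXPECTED_SECTIONS) -> List[str]:
    """Return expected level-1 headings that do not appear in `markdown`."""
    headings = re.finditer(r"^#[ \t]+(.+)$", markdown, flags=re.MULTILINE)
    found = {match.group(1).strip() for match in headings}
    return [name for name in expected if name not in found]


sample = "# Title\nDemo\n\n# Abstract\nText\n"
print(missing_sections(sample))
```

In the notebook, `missing_sections(TOPIC_MD)` should return an empty list for an unmodified template.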

## 6. Ideation (Gemini-only)
Generates `<topic>.json` next to your topic file.

In [None]:
MODEL = "gemini-3-pro-preview"

import sys

cmd = [
    sys.executable,  # the notebook's own Python interpreter
    str(REPO_DIR / "ai_scientist" / "perform_ideation_temp_free.py"),
    "--workshop-file",
    str(idea_md_path),
    "--model",
    MODEL,
    "--max-num-generations",
    "20",
    "--num-reflections",
    "5",
]

print('Running:', ' '.join(cmd))
subprocess.run(cmd, check=True)
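
Once ideation finishes, a quick look at the generated file helps confirm that it produced usable ideas before launching experiments. The sketch below assumes the file holds a JSON list of objects. The `"Title"` and `"Name"` keys are guesses at the schema rather than a documented format, so a placeholder label is used when neither is present. `list_idea_labels` is a hypothetical helper.

```python
import json
import tempfile
from pathlib import Path
from typing import List


def list_idea_labels(json_path: Path) -> List[str]:
    """Return one short label per idea in the file.

    Assumes a JSON list of objects; "Title" and "Name" are guessed keys,
    so a placeholder label is used when neither is present.
    """
    ideas = json.loads(Path(json_path).read_text(encoding="utf-8"))
    if not isinstance(ideas, list):
        raise ValueError(f"Expected a JSON list, got {type(ideas).__name__}")
    labels = []
    for i, idea in enumerate(ideas):
        label = None
        if isinstance(idea, dict):
            label = idea.get("Title") or idea.get("Name")
        labels.append(str(label) if label else f"<idea {i}>")
    return labels


# Demo on a throwaway file with made-up content.
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "demo.json"
    demo.write_text(json.dumps([{"Title": "A"}, {"Name": "b"}, {}]), encoding="utf-8")
    demo_labels = list_idea_labels(demo)

print(demo_labels)
```

In the notebook, the path to pass would be `idea_md_path.with_suffix(".json")`.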

## 7. Run BFTS experiments (default params)
Uses defaults in `bfts_config.yaml` and Gemini for all stages.

In [None]:
idea_json_path = idea_md_path.with_suffix('.json')

import sys

cmd = [
    sys.executable,  # the notebook's own Python interpreter
    str(REPO_DIR / "launch_scientist_bfts.py"),
    "--load_ideas",
    str(idea_json_path),
    "--model_writeup",
    MODEL,
    "--model_writeup_small",
    MODEL,
    "--model_citation",
    MODEL,
    "--model_review",
    MODEL,
    "--model_agg_plots",
    MODEL,
]

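print('Running:', ' '.join(cmd))
# Run from the repo root so that relative paths such as bfts_config.yaml and
# experiments/ resolve inside REPO_DIR (and therefore inside Drive).
subprocess.run(cmd, check=True, cwd=str(REPO_DIR))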
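
A Colab session can disconnect during a long run, which loses the cell output. One option is to stream the subprocess output to the screen and to a log file at the same time. The sketch below is one way to do that with the standard library; `run_and_log` and the log path are hypothetical and not part of the repo.

```python
import subprocess
import sys
import tempfile
from pathlib import Path
from typing import List, Optional


def run_and_log(cmd: List[str], log_path: Path, cwd: Optional[str] = None) -> int:
    """Run `cmd`, echo its combined stdout/stderr, and append it to `log_path`."""
    log_path = Path(log_path)
    log_path.parent.mkdir(parents=True, exist_ok=True)
    with log_path.open("a", encoding="utf-8") as log:
        proc = subprocess.Popen(
            cmd,
            cwd=cwd,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # merge stderr into the same stream
            text=True,
        )
        for line in proc.stdout:
            print(line, end="")
            log.write(line)
        return proc.wait()


# Demo with a trivial command and a throwaway log file.
with tempfile.TemporaryDirectory() as tmp:
    demo_log = Path(tmp) / "run.log"
    demo_code = run_and_log([sys.executable, "-c", "print('hello')"], demo_log)
    demo_text = demo_log.read_text(encoding="utf-8")
```

In the notebook, `cmd` from the cell above could be passed with a log path under `WORK_ROOT`. Unlike `subprocess.run(..., check=True)`, this helper returns the exit code instead of raising, so the caller has to check it.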

## 8. Inspect latest run
Finds the most recently modified experiment folder and prints where its logs, figures, and PDFs are stored.

In [None]:
from pathlib import Path

exp_root = REPO_DIR / "experiments"
if not exp_root.exists():
    raise FileNotFoundError("No experiments directory found yet.")

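runs = [p for p in exp_root.iterdir() if p.is_dir()]
if not runs:
    # max() on an empty list raises an unhelpful ValueError, so fail with a clear message.
    raise FileNotFoundError(f"No run folders found in {exp_root}.")
latest_run = max(runs, key=lambda p: p.stat().st_mtime)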
print('Latest run:', latest_run)

print('Log root:', latest_run / 'logs')
print('Figures:', latest_run / 'figures')
print('PDFs:', list(latest_run.glob('*.pdf')))

## 9. View tree plot (HTML)
Displays the tree plot from the last `stage_*` log folder (in sorted order) of the latest run, if one exists.

In [None]:
from IPython.display import HTML, display
import glob

html_candidates = sorted(glob.glob(str(latest_run / "logs" / "stage_*" / "tree_plot.html")))
if html_candidates:
    # Colab does not serve local Drive paths as an IFrame src, so embed the file contents instead.
    display(HTML(filename=html_candidates[-1]))
else:
    print('No tree_plot.html found yet.')

## 10. (Optional) Re-run writeup
Useful if the writeup stage failed or you want to regenerate the PDFs without re-running the experiments.

In [None]:
# Example: re-run writeup
# cmd = [
#     "python",
#     str(REPO_DIR / "ai_scientist" / "perform_writeup.py"),
#     "--folder",
#     str(latest_run),
#     "--model",
#     MODEL,
#     "--big-model",
#     MODEL,
# ]
# subprocess.run(cmd, check=True)

## 11. Ensure results are stored in Drive
If the repo is in Drive, outputs are already persisted and no copy step is needed. The cell below only prints the paths so you can confirm where they are.

In [None]:
print('Repo path:', REPO_DIR)
print('Experiments path:', REPO_DIR / 'experiments')
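
To turn the visual check above into an explicit one, the experiments path can be tested against the Drive root. The sketch below compares paths lexically, without touching the filesystem. `is_under` is a hypothetical helper, and the example paths mirror the defaults from step 0.

```python
from pathlib import PurePosixPath


def is_under(path: str, root: str) -> bool:
    """Return True if `path` equals `root` or lies inside it (lexical comparison only)."""
    try:
        PurePosixPath(path).relative_to(PurePosixPath(root))
        return True
    except ValueError:
        return False


drive_root = "/content/drive/MyDrive"
in_drive = is_under("/content/drive/MyDrive/ai-scientist-v2/AI-Scientist-v2/experiments", drive_root)
outside = is_under("/content/experiments", drive_root)
print(in_drive, outside)
```

In the notebook, `is_under(str(REPO_DIR / "experiments"), str(DRIVE_ROOT))` should be `True` when the defaults from step 0 are unchanged.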