# Instructions for KramaBench Human Experts

Thank you for helping us build KramaBench by providing high-quality human solutions to selected data-science tasks.

This document describes:

1. How to fork and clone the KramaBench repository.
2. Where and how to save your solutions.
3. How to structure each notebook (draft vs. final solution).
4. How to record your active time per task in the provided Google Sheet.

Please read this document **before** starting your first task.

**Important note about LLM usage**: Code-completion assistants such as Copilot are fine, but feeding the task directly to an LLM to obtain a solution is strictly forbidden.

---

## 1. Your assignment

You should have received a list of **task IDs** (e.g., `archeology-hard-1`, `environment-easy-3`, …) and an **expert ID** (e.g., `expert01`).

For each assigned `task_ID`, you will:

1. Implement a complete solution in a Jupyter notebook.
2. Save the notebook in a designated directory inside the repository.
3. Track your **active time** on the task and log it in the Google Sheet.

---

## 2. Time tracking and Google Sheet

We are interested not only in whether you solve the task, but also **how much human effort** it takes and where that effort goes.

For **each task**:

1. Use a timer (phone/computer) to track **active time only**.  
   - Count time when you are actively reading, exploring data, thinking about the pipeline, writing code, or debugging.
   - Do **not** count long interruptions (meetings, lunch, unrelated work, etc.).

2. Break down your active time into the following categories (rough guidelines):
   - **Data exploration**  
     Inspecting files, reading metadata/column names, plotting, basic profiling, etc.
   - **Pipeline design / planning**  
     Deciding how to structure the pipeline, which files to join, what transformations to apply, etc. (this often overlaps with exploration; make your best judgment).
   - **Coding / implementation**  
     Writing the actual code that loads data, cleans/transforms it, and computes the answer.
   - **Testing / debugging**  
     Fixing errors, handling edge cases, validating outputs, re-running parts of the pipeline.

3. Record your time in the shared Google Sheet:  
   **[Time Tracking Sheet](https://docs.google.com/spreadsheets/d/1RK4M9nxqNQoe4OXEh5yz25w0iGk-90jlSvHPoFPMhS4/edit?usp=sharing)**

   For each `task_ID`, fill in the row with at least:
   - Your **expert ID**
   - The **task ID**
   - **Total active time in minutes** (e.g., `55 min`)
   - Approximate breakdown (e.g., `exploration: 15 min`, `design: 10 min`, `coding: 20 min`, `testing: 10 min`)
   - Optional notes (e.g., anything confusing, especially hard steps, etc.)
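
Before entering a row, it can help to confirm that the per-category minutes add up to the total you report. The sketch below is a hypothetical helper, not part of the benchmark tooling; the category keys and the dictionary layout are assumptions that only mirror the four categories listed above.

```python
# Hypothetical helper: the keys mirror the four categories above, but this
# dictionary layout is an assumption, not the actual schema of the sheet.
CATEGORIES = ("exploration", "design", "coding", "testing")


def total_active_minutes(breakdown: dict[str, int]) -> int:
    """Sum the per-category minutes, rejecting unknown or negative entries."""
    for name, minutes in breakdown.items():
        if name not in CATEGORIES:
            raise ValueError(f"unknown category: {name!r}")
        if minutes < 0:
            raise ValueError(f"negative minutes for {name!r}")
    return sum(breakdown.values())


example = {"exploration": 15, "design": 10, "coding": 20, "testing": 10}
print(f"Total active time: {total_active_minutes(example)} min")
```

If the printed total differs from what your timer shows, adjust the breakdown or add a note in the optional notes column.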

---

## 3. Fork and clone the repository

You will work in your own fork of the KramaBench repository.

1. Go to the GitHub repo:  
   <https://github.com/mitdbg/LLMBenchmark>

2. **Fork the repo**:
   - Make sure you are logged into your GitHub account.
   - Click the **“Fork”** button in the top-right.
   - Create a fork under your personal GitHub account (or the organization specified to you).

3. **Clone your fork locally**:

   ```bash
   git clone https://github.com/<your-username>/LLMBenchmark.git
   cd LLMBenchmark
   ```


---

## 4. Where to put your solutions

All expert solutions should be placed under a new directory inside the repository:

    LLMBenchmark/
      experts/
        <expert_ID>/
          <task_ID>.ipynb
          ...

- Replace `<expert_ID>` with the ID assigned to you (e.g., `expert01`).
- If the directory does not exist yet, create it from the repo root:

      mkdir -p experts/<expert_ID>

For each assigned `task_ID`, create a **single Jupyter notebook** named:

    experts/<expert_ID>/<task_ID>.ipynb

**Examples:**

- `experts/expert01/archeology-hard-1.ipynb`
- `experts/expert01/environment-easy-3.ipynb`

All your work should stay inside your expert directory.  
Please **do not modify** benchmark code, ground-truth solutions, or other experts’ folders.
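
If you prefer to set up the layout from Python rather than the shell, the naming convention above can be sketched as follows. The function names `notebook_path` and `ensure_expert_dir` are hypothetical and exist only for this illustration; the directory layout itself is the one described in this section.

```python
from pathlib import Path


def notebook_path(expert_id: str, task_id: str, repo_root: Path = Path(".")) -> Path:
    """Return experts/<expert_ID>/<task_ID>.ipynb relative to the repo root."""
    return repo_root / "experts" / expert_id / f"{task_id}.ipynb"


def ensure_expert_dir(expert_id: str, repo_root: Path = Path(".")) -> Path:
    """Create experts/<expert_ID>/ if it is missing (like `mkdir -p`)."""
    expert_dir = repo_root / "experts" / expert_id
    expert_dir.mkdir(parents=True, exist_ok=True)
    return expert_dir


# Only builds the path; nothing is created on disk by this line.
print(notebook_path("expert01", "archeology-hard-1").as_posix())
```

Run `ensure_expert_dir` from the repository root so that the directory is created inside your clone and not elsewhere.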

---

## 5. Notebook structure and expectations

Each notebook must contain **two clearly labeled parts**.

### 5.1 Part 1 — Drafting & Exploration

This part should capture your natural workflow as you work toward the solution.

Add a heading like:

    # Part 1 — Drafting and Exploration

Include in this section:

- Data inspection and exploration (reading files, printing heads, plotting, etc.)
- Quick experiments and partial ideas
- Dead ends or discarded approaches
- Sanity checks and notes

You do **not** need to clean this up. We are explicitly interested in your process.

---

### 5.2 Part 2 — Final executable solution

This part should contain a **clean, reproducible, end-to-end solution**.

Add a heading like:

    # Part 2 — Final Solution

It should include:

- A short explanation of your final approach (in Markdown).
- Any helper functions or setup cells that your solution needs.
- **One stand-alone solution cell** that:
  - Imports all required libraries.
  - Loads all necessary data using **relative paths** (e.g., `data/environment/input/...`).
  - Performs all transformations and analysis.
  - Computes and **prints the final answer**.

Label that cell clearly, for example:

    # === FINAL SOLUTION CELL ===

We will use this cell to automatically run and check your solution in a fresh environment.
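
As a rough illustration of the shape such a cell might take, here is a minimal sketch. Everything task-specific in it is an assumption made for the example: the input file `data/example_domain/input/measurements.csv`, the `value` column, and the mean as the "answer" are hypothetical placeholders, and a real solution would typically use whatever libraries the task calls for.

```python
# === FINAL SOLUTION CELL ===
import csv
from pathlib import Path

# Hypothetical relative input file; substitute the file(s) your task uses.
INPUT_FILE = Path("data") / "example_domain" / "input" / "measurements.csv"


def compute_answer(csv_path: Path) -> float:
    """Hypothetical analysis: mean of the `value` column, skipping blanks."""
    with csv_path.open(newline="") as handle:
        values = [
            float(row["value"])
            for row in csv.DictReader(handle)
            if (row.get("value") or "").strip()
        ]
    if not values:
        raise ValueError(f"no usable rows in {csv_path}")
    return sum(values) / len(values)


# The guard only keeps this sketch runnable without the placeholder file.
if INPUT_FILE.exists():
    print("FINAL ANSWER:", compute_answer(INPUT_FILE))
else:
    print(f"Input file not found: {INPUT_FILE}")
```

Note that the imports, the data loading, the computation, and the final `print` all live in the same cell, so it does not rely on anything defined earlier in the notebook.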

---

## 6. Reproducibility checklist

Before you consider a notebook finished, please verify:

- You can **restart the kernel and run all cells** from top to bottom without errors.
- No cell depends on hidden state from a previous run.
- All file paths are **relative** (no absolute local paths like `/Users/...`).
- Any randomness is controlled (e.g., set a random seed) so results are stable.
- The **final solution cell** runs successfully on its own and prints the final answer.
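
Two of these checks can be partly automated, because an `.ipynb` file is plain JSON. The sketch below is a hypothetical self-check, not the script used to evaluate submissions: it flags code cells that look like they contain absolute local paths and confirms that exactly one cell carries the `# === FINAL SOLUTION CELL ===` marker. The path patterns are assumptions and will miss some cases or flag harmless ones.

```python
import json
import re
from pathlib import Path

# Assumed patterns for absolute local paths; extend them for your own system.
ABSOLUTE_PATH = re.compile(r"(/Users/|/home/|[A-Za-z]:\\)")
# Marker text taken from the labeling example in section 5.2.
FINAL_MARKER = "# === FINAL SOLUTION CELL ==="


def code_cells(notebook: dict) -> list[str]:
    """Return the source of every code cell as a single string."""
    sources = []
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") == "code":
            source = cell.get("source", "")
            # The notebook format stores source as a string or a list of lines.
            sources.append(source if isinstance(source, str) else "".join(source))
    return sources


def check_notebook(notebook: dict) -> list[str]:
    """Return a list of human-readable problems (empty if none were found)."""
    problems = []
    cells = code_cells(notebook)
    for index, source in enumerate(cells):
        if ABSOLUTE_PATH.search(source):
            problems.append(f"code cell {index}: possible absolute path")
    marked = [source for source in cells if FINAL_MARKER in source]
    if len(marked) != 1:
        problems.append(f"expected exactly 1 final solution cell, found {len(marked)}")
    return problems


def check_notebook_file(path: Path) -> list[str]:
    """Load an .ipynb file (plain JSON) and run the checks on it."""
    return check_notebook(json.loads(path.read_text(encoding="utf-8")))


demo = {"cells": [{"cell_type": "code", "source": ["open('/Users/me/data.csv')\n"]}]}
print(check_notebook(demo))
```

A static scan like this does not replace restarting the kernel and running all cells; it only catches the mistakes that are visible in the notebook text.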

---

## 7. Submitting your work

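When you have finished your assigned tasks, commit your notebooks, push them to your fork, and open a pull request against the upstream repository (<https://github.com/mitdbg/LLMBenchmark>). The pull request should only add files under `experts/<expert_ID>/`.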

---

Thank you for contributing your expertise to KramaBench!
