# CoTLab Tutorial

**CoTLab** is a research toolkit for studying Chain-of-Thought (CoT) reasoning in LLMs.

In this tutorial you will learn to:
1. Load a model with CoTLab's backend system
2. Run experiments using CoTLab's experiment API
3. Log and save results with ExperimentLogger
4. Analyze results with the analysis module

> **Note**: We use GPT-2 here to keep the demo fast. For real experiments, use a larger model such as MedGemma.

## 1. Imports

In [1]:
import os

os.environ["TOKENIZERS_PARALLELISM"] = "false"  # Suppress HF warning

from cotlab.backends import TransformersBackend
from cotlab.datasets.loaders import TutorialDataset
from cotlab.experiments import CoTFaithfulnessExperiment
from cotlab.logging import ExperimentLogger
from cotlab.prompts.strategies import (
    ChainOfThoughtStrategy,
    ContrarianStrategy,
    DirectAnswerStrategy,
)

## 2. Load Backend and Model

In [2]:
# Create backend and load model
backend = TransformersBackend(device="auto", dtype="bfloat16")
backend.load_model("openai-community/gpt2")

# Load dataset
dataset = TutorialDataset(path="../data/tutorial.json")
print(f"Dataset: {len(dataset)} samples")

  Device map: auto
  Dtype: torch.bfloat16
  Cache: ~/.cache/huggingface (HF default)
  Resolved device: mps
Dataset: 10 samples


## 3. Define Experiment and Strategies

In [3]:
# Create CoTLab experiment (uses all samples by default)
experiment = CoTFaithfulnessExperiment(
    name="tutorial_comparison",
)

# Define prompting strategies
strategies = {
    "contrarian": ContrarianStrategy(),
    "chain_of_thought": ChainOfThoughtStrategy(),
    "direct_answer": DirectAnswerStrategy(),
}

print(f"Experiment: {experiment.name}")
print(f"Strategies: {list(strategies.keys())}")

Experiment: tutorial_comparison
Strategies: ['contrarian', 'chain_of_thought', 'direct_answer']


## 4. Run Experiments with Logging

Use CoTLab's `ExperimentLogger` to save results to JSON files.

In [4]:
from pathlib import Path

results = {}

for name, strategy in strategies.items():
    print(f"\n{'=' * 60}")
    print(f"Running: {name}")
    print(f"{'=' * 60}")

    # Create logger for this run
    logger = ExperimentLogger(f"../outputs/tutorial_{name}")

    # Log configuration
    logger.log_config(
        {
            "experiment": experiment.name,
            "strategy": name,
            "model": "openai-community/gpt2",  # same ID passed to load_model above
        }
    )

    # Run CoTLab experiment with logger (uses all samples)
    result = experiment.run(
        backend=backend,
        dataset=dataset,
        prompt_strategy=strategy,
        logger=logger,
    )

    # Save results using logger
    output_path = logger.save_results(result)
    print(f"Results saved to: {output_path}")

    results[name] = result


Running: contrarian
Running Faithfulness Test: contrarian vs direct_answer
Generating contrarian responses...
Generating direct_answer responses...
Analyzing results...


Analyzing samples: 100%|██████████| 10/10 [00:00<00:00, 16422.49it/s]

Results saved to: ../outputs/tutorial_contrarian/results.json

Running: chain_of_thought
Running Faithfulness Test: chain_of_thought vs direct_answer
Generating chain_of_thought responses...
Generating direct_answer responses...
Analyzing results...


Analyzing samples: 100%|██████████| 10/10 [00:00<00:00, 28630.06it/s]

Results saved to: ../outputs/tutorial_chain_of_thought/results.json

Running: direct_answer
Running Faithfulness Test: direct_answer vs direct_answer
Generating direct_answer responses...
Generating direct_answer responses...
Analyzing results...


Analyzing samples: 100%|██████████| 10/10 [00:00<00:00, 131896.35it/s]

Results saved to: ../outputs/tutorial_direct_answer/results.json
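
Each run above writes a `results.json` into its own `../outputs/tutorial_<strategy>/` directory. To inspect those files outside CoTLab, a minimal sketch using only the standard library could look like the following. The helper name `load_saved_results` is hypothetical and not part of CoTLab, and because the JSON schema is not described in this tutorial, the sketch only lists top-level keys instead of assuming specific fields.

```python
import json
from pathlib import Path


def load_saved_results(outputs_dir, prefix="tutorial_"):
    """Map each run directory name to the parsed contents of its results.json.

    Hypothetical helper for illustration; it is not part of CoTLab.
    """
    root = Path(outputs_dir)
    loaded = {}
    if not root.is_dir():
        return loaded
    for run_dir in sorted(root.glob(f"{prefix}*")):
        results_file = run_dir / "results.json"
        # Skip entries without a results file (other files, unfinished runs).
        if results_file.is_file():
            with results_file.open(encoding="utf-8") as fh:
                loaded[run_dir.name] = json.load(fh)
    return loaded


if __name__ == "__main__":
    for run_name, payload in load_saved_results("../outputs").items():
        # The schema is an unknown here, so only show what is at the top level.
        summary = sorted(payload) if isinstance(payload, dict) else type(payload).__name__
        print(run_name, summary)
```

Reading the raw files this way is handy for a quick sanity check before running the analysis module.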

## 5. Analyze Results with CoTLab Analysis Module

Use the `cotlab.analyse_experiments` module to extract answers from the saved results and compare the strategies.

In [5]:
from cotlab.analyse_experiments import (
    analyse_experiments_dir,
    export_to_csv,
    print_analysis_report,
)

# Analyze all saved results
results_dir = Path("../outputs")
all_results = analyse_experiments_dir(results_dir)

# Print comprehensive analysis report
print_analysis_report(all_results, "Tutorial Experiment Analysis")

Tutorial Experiment Analysis

COT FAITHFULNESS EXPERIMENTS
----------------------------------------
Prompt                      Agree%  CoT Acc Direct Acc  Samples
------------------------------------------------------------
chain_of_thought              0.0%     0.0%      10.0%       10
contrarian                   20.0%    10.0%      20.0%       10
direct_answer                 0.0%    10.0%       0.0%       10

SUMMARY BY DATASET
Dataset                Agree%  CoT Acc Direct Acc  Samples
-------------------------------------------------------
tutorial                 6.7%     6.7%      10.0%       30

OVERALL (Faithfulness): 30 samples
  - Agreement: 6.7%
  - CoT Accuracy: 6.7%
  - Direct Accuracy: 10.0%
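
The summary and overall figures in the report are consistent with pooling the three per-strategy rows by sample count. The short check below reproduces them from the table values alone. It is plain arithmetic on the printed numbers, not a call into CoTLab, and the function name `pooled_percentages` is made up for this illustration.

```python
# Per-strategy rows copied from the report above:
# (agree %, CoT acc %, direct acc %, samples)
rows = {
    "chain_of_thought": (0.0, 0.0, 10.0, 10),
    "contrarian": (20.0, 10.0, 20.0, 10),
    "direct_answer": (0.0, 10.0, 0.0, 10),
}


def pooled_percentages(rows):
    """Sample-weighted mean of each percentage column, plus the total sample count."""
    total = sum(row[3] for row in rows.values())
    pooled = []
    for col in range(3):
        # Turn each percentage back into a sample count, then pool the counts.
        hits = sum(row[col] / 100 * row[3] for row in rows.values())
        pooled.append(round(100 * hits / total, 1))
    return pooled, total


print(pooled_percentages(rows))  # → ([6.7, 6.7, 10.0], 30)
```

For example, agreement pools to (0 + 2 + 0) / 30 = 6.7%, which matches the `tutorial` row and the overall line.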


## 6. Export Results to CSV

In [6]:
# Export analysis to CSV for further processing
csv_path = results_dir / "tutorial_analysis.csv"
export_to_csv(all_results, csv_path)

print(f"\nCSV exported to: {csv_path}")


Results saved to: ../outputs/tutorial_analysis.csv

CSV exported to: ../outputs/tutorial_analysis.csv
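
To confirm what was exported, the CSV can be previewed with the standard library. This is a minimal sketch: the column names in `tutorial_analysis.csv` are not listed in this tutorial, so the code reads whatever header the file contains, and `preview_csv` is a hypothetical helper, not a CoTLab function.

```python
import csv
from pathlib import Path


def preview_csv(path, limit=5):
    """Return the header and up to `limit` data rows of a CSV file."""
    with open(path, newline="", encoding="utf-8") as fh:
        reader = csv.DictReader(fh)
        # zip stops after `limit` rows without reading the rest of the file.
        rows = [row for _, row in zip(range(limit), reader)]
        return reader.fieldnames, rows


if __name__ == "__main__":
    csv_file = Path("../outputs/tutorial_analysis.csv")
    if csv_file.exists():
        header, rows = preview_csv(csv_file)
        print(header)
        for row in rows:
            print(row)
```

The same file can of course be opened in pandas or a spreadsheet for further processing.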


## 7. Next Steps

**Other CoTLab experiments**:
- `LogitLensExperiment` - See what the model "thinks" at each layer
- `AttentionAnalysisExperiment` - Analyze attention patterns
- `ProbingClassifierExperiment` - Train probes on hidden states

**For future use**:
- Use `python -m cotlab.runner` CLI with Hydra configs
- See `conf/` folder for configuration options
- Use a larger model, e.g. `google/medgemma-27b-text-it`