This project investigates the internal reasoning mechanisms of Large Reasoning Models (LRMs), such as DeepSeek-R1, by tracing attention patterns from the final answer back to the intermediate Chain-of-Thought (CoT) steps. By employing both token-level and segment-level attention analysis, we construct hierarchical "Reasoning Trees" that reveal which specific thoughts most heavily influence the model's conclusion. Our analysis highlights a "bookending" effect, where models tend to prioritize early framing and final deductions over intermediate reasoning steps. This framework provides a method to visualize redundancy and localize errors within the model's thought process.
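To make the core idea concrete, here is a minimal sketch of the segment-level aggregation step, assuming a Hugging Face causal LM; the model id, prompt, and segment boundaries below are illustrative placeholders rather than the project's actual configuration.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"  # assumption: any R1-style model
tokenizer = AutoTokenizer.from_pretrained(model_id)
# "eager" attention so that the attention weights are actually returned.
model = AutoModelForCausalLM.from_pretrained(model_id, attn_implementation="eager")

text = "Question: ... <think> step 1 ... step 2 ... </think> Answer: ..."
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    out = model(**inputs, output_attentions=True)

# Stack per-layer attentions -> (layers, batch, heads, seq, seq), then average
# over layers and heads to obtain a single (seq, seq) matrix.
attn = torch.stack(out.attentions).mean(dim=(0, 2))[0]
seq_len = attn.size(0)

# Hypothetical split into two CoT segments and the answer span; a real analysis
# would locate the segment boundaries (e.g., the <think>/</think> tokens) instead.
segments = {"thought_0": (0, int(0.4 * seq_len)),
            "thought_1": (int(0.4 * seq_len), int(0.8 * seq_len))}
answer_span = (int(0.8 * seq_len), seq_len)

# How strongly do answer tokens attend back to each CoT segment?
answer_rows = attn[answer_span[0]:answer_span[1]]
scores = {name: answer_rows[:, s:e].sum(dim=1).mean().item()
          for name, (s, e) in segments.items()}
print(sorted(scores.items(), key=lambda kv: -kv[1]))
```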
---

Follow these steps to set up the environment from scratch.
Ensure you have Python 3.11+ and a CUDA-compatible GPU available (the code defaults to `cuda`).
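The scripts assume a GPU is present; a quick way to confirm that PyTorch can actually see one before you start is:

```python
import torch

# Should print True (and a GPU name) on a correctly configured CUDA machine.
print(torch.cuda.is_available())
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))
```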
Create a virtual environment and install the required dependencies.
```bash
# Create a virtual environment
python -m venv venv
source venv/bin/activate  # On Windows use: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt
```

To understand how the visualization works on a single prompt (e.g., "What is the color of the sky?"), run `example.py`. This script performs inference, aggregates attention scores, and generates reasoning trees and heatmaps for one sample.
```bash
python example.py
```

Outputs:
- Console: Prints the reasoning tree and top-K attended tokens.
- `images/`: Saves attention heatmaps (token-level and segment-level) for the single sample.
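For reference, a segment-level heatmap of the kind written to `images/` can be rendered from an aggregated attention matrix with a few lines of matplotlib; the array, labels, and file name below are illustrative assumptions, not the script's exact output.

```python
import os
import numpy as np
import matplotlib.pyplot as plt

seg_attn = np.random.rand(3, 6)   # placeholder: (answer segments x CoT segments)
cot_labels = [f"thought_{i}" for i in range(seg_attn.shape[1])]

os.makedirs("images", exist_ok=True)
fig, ax = plt.subplots(figsize=(6, 3))
im = ax.imshow(seg_attn, cmap="viridis", aspect="auto")
ax.set_xticks(range(len(cot_labels)), labels=cot_labels, rotation=45, ha="right")
ax.set_xlabel("Chain-of-Thought segment")
ax.set_ylabel("Answer segment")
fig.colorbar(im, ax=ax, label="mean attention")
fig.tight_layout()
fig.savefig("images/segment_level_heatmap_example.png", dpi=200)
```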
To perform the analysis on the ARC (AI2 Reasoning Challenge) dataset, run `main_reasoning_tree.py`. This script automatically downloads the data, iterates through the samples, and generates an aggregate visualization of attention patterns across all inputs.

```bash
python main_reasoning_tree.py
```

Dataset: The project uses the ARC split of the MechanisticProbe dataset.
- HuggingFace Link: Dataset
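`main_reasoning_tree.py` downloads the data automatically, but if you want to browse the samples yourself you can load them with the `datasets` library; the identifier below is a placeholder for the dataset behind the HuggingFace link above.

```python
from datasets import load_dataset

# Placeholder id -- substitute the dataset id from the HuggingFace link above.
ds = load_dataset("<mechanistic-probe-arc-dataset-id>", split="train")
print(ds)       # schema and number of rows
print(ds[0])    # first question/answer sample
```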
Outputs:
- `samples.txt`: A text file containing the parsed questions and answers.
- `IMAGES/`: Directory containing individual heatmaps for every sample processed.
- `IMAGES/Final_Average_Pooled_Attention_All_Samples.png`: A summary heatmap showing how the model attends to different reasoning segments across the entire dataset.
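The dataset-wide summary image is conceptually an average of the per-sample segment-attention matrices; a minimal sketch of that pooling step (variable names are assumptions) looks like this:

```python
import numpy as np

# One (answer_segments x cot_segments) matrix per processed sample,
# already padded or truncated to a common shape.
per_sample_attn = [np.random.rand(3, 6) for _ in range(25)]   # placeholder data

pooled = np.stack(per_sample_attn, axis=0).mean(axis=0)       # average over samples
print(pooled.shape)   # (3, 6): the matrix behind the summary heatmap
```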
