## Discovering Skills from Activations and Semantic Interpretations

In [1]:
import platform
platform.python_version()

'3.11.13'

In [None]:
import pickle

import numpy as np
import pandas as pd
import torch

In [9]:
with open("./gradmat.pkl", "rb") as f:
    gradmat = pickle.load(f)
gradmat.shape

(50000, 30720)
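A matrix of this shape is large enough that its memory footprint matters before running PCA. The stored dtype is not shown above, so the quick check below assumes either float32 or float64 storage. `matrix_gib` is a hypothetical helper written for this check:

```python
def matrix_gib(n_rows: int, n_cols: int, bytes_per_item: int) -> float:
    """Memory footprint of a dense n_rows x n_cols matrix, in GiB."""
    return n_rows * n_cols * bytes_per_item / 2**30

# Footprint of a 50,000 x 30,720 matrix under the two common float dtypes
for name, itemsize in [("float32", 4), ("float64", 8)]:
    print(name, round(matrix_gib(50000, 30720, itemsize), 2), "GiB")
```

With the default `copy=True`, scikit-learn's `PCA` centers a copy of the input rather than modifying it in place. Peak memory during fitting can therefore be roughly twice the figure above.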

**PCA Decomposition into 10 Orthogonal Directions**

In [10]:
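import numpy as np
import torch
from sklearn.decomposition import PCA

# sklearn expects a NumPy array; convert if gradmat was saved as a torch tensor
if isinstance(gradmat, torch.Tensor):
    X_np = gradmat.detach().cpu().numpy()
else:
    X_np = np.asarray(gradmat)

# Fit PCA and keep the top 10 principal directions
pca = PCA(n_components=10)
X_pca_np = pca.fit_transform(X_np)  # (n_examples, 10): projections onto the directions

print(f"Original shape: {X_np.shape}")
print(f"Reduced shape: {X_pca_np.shape}")

print("Principal Directions (pca.components_):")
print(pca.components_)
print(f"\nShape of components: {pca.components_.shape}")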

Original shape: torch.Size([12500, 32768])
Reduced shape: (12500, 10)
Principal Directions (pca.components_):
[[ 4.69944300e-06 -1.13567160e-05  7.10994597e-05 ...  4.08888326e-04
  -3.83800094e-02  7.94116746e-03]
 [ 7.28379574e-05  1.45898830e-04 -7.27479631e-04 ... -1.28157783e-02
   2.32922317e-02 -2.24465037e-03]
 [ 6.02579515e-05 -2.92843436e-05  6.06779827e-04 ...  3.44569078e-02
  -2.04270578e-02  6.06610686e-03]
 ...
 [-1.01330149e-04 -1.86528636e-04  3.86041290e-04 ...  1.22737055e-02
  -9.44702018e-03  6.30161204e-03]
 [ 9.78446612e-06 -5.70646281e-05  9.71906754e-06 ...  3.82302043e-02
   6.43567992e-02  1.25881113e-02]
 [ 5.52588856e-05 -8.53770435e-05 -2.82943565e-04 ... -1.35599174e-02
  -1.72868871e-02  1.63748735e-02]]

Shape of components: (10, 32768)
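The 10 directions in `pca.components_` are the top right-singular vectors of the mean-centered data matrix. They are orthonormal, which makes them orthogonal but not statistically independent (finding independent directions would require ICA, not PCA). The minimal NumPy sketch below shows this on a small random matrix. `X_toy` is a stand-in for `gradmat`, not the real data, and `pca_directions` is a hypothetical helper:

```python
import numpy as np

def pca_directions(X: np.ndarray, k: int):
    """Return the top-k principal directions (as rows) and the projected data."""
    mu = X.mean(axis=0)                      # per-feature mean
    Xc = X - mu                              # PCA operates on centered data
    # Thin SVD: Xc = U @ diag(S) @ Vt; the rows of Vt are the principal directions
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:k]                      # (k, d), analogous to pca.components_
    projected = Xc @ components.T            # (n, k), analogous to pca.fit_transform(X)
    return components, projected

rng = np.random.default_rng(0)
X_toy = rng.standard_normal((200, 50))       # toy stand-in for gradmat
comps, proj = pca_directions(X_toy, k=10)
print(comps.shape, proj.shape)               # (10, 50) (200, 10)
# Orthonormal rows: comps @ comps.T is the 10 x 10 identity
print(np.allclose(comps @ comps.T, np.eye(10)))  # True
```

scikit-learn applies its own sign convention to each component, so individual rows may differ from this sketch by a factor of -1. This sign ambiguity also means the "positive" and "negative" ends of a direction are arbitrary labels. The sketch computes a full thin SVD, which is fine at toy sizes. For a matrix the size of `gradmat`, scikit-learn's default `svd_solver="auto"` should already select its randomized solver.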


In [11]:
p5dir = pca.components_  # (10, d): one principal direction per row

In [None]:
with open("pca-nemo-50k-d10.pkl", "wb") as f:
    pickle.dump(pca.components_, f, protocol=4)

**Load 50k Reasoning Examples for the Activations**

In [None]:
from datasets import load_dataset
from itertools import islice

# Stream the math split of the SFT subset to get an IterableDataset without downloading everything
dataset = load_dataset("nvidia/Llama-Nemotron-Post-Training-Dataset", "SFT", split="math", streaming=True)

# Take the first 50k samples; these are assumed to line up row-for-row with gradmat
num_rows_to_take = 50000
samples = list(islice(dataset, num_rows_to_take))

# One string per example: the first input message (a role/content dict) followed by the model output
qa = [str(row["input"][0]) + row["output"] for row in samples]
len(qa)
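Each `qa` entry is the string form of the first `input` message followed directly by the `output` text. The printed examples further down show this shape. The sketch below makes the format explicit using a hand-made record that only mimics the two fields the notebook accesses. The record is hypothetical toy data, not a real dataset row, and `format_example` is a hypothetical helper:

```python
def format_example(row: dict) -> str:
    """Flatten one record into the prompt+response string stored in `qa`.

    Assumes row["input"] is a list of chat messages (dicts) and
    row["output"] is the response text, mirroring the notebook's indexing.
    """
    return str(row["input"][0]) + row["output"]

# Hypothetical toy record shaped like the fields accessed above
toy_row = {
    "input": [{"role": "user", "content": "What is 2+2?"}],
    "output": "<think>\n2+2=4\n</think>\n\\boxed{4}",
}
print(format_example(toy_row))
```

Because `str()` stringifies the message dict, the prompt keeps its Python-literal quoting (`{'role': 'user', ...}`). The ranked examples shown later carry exactly this prefix.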

**For Each Principal Direction:**

- compute cosine similarity between the skill direction and each reasoning example's activation vector

- rank the similarities to identify examples most positively/negatively correlated with the skill direction
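These two steps can be run for all 10 directions at once. First L2-normalize the rows of the activation matrix and of the directions. Then a single matrix product gives an (n_examples × n_directions) cosine-similarity matrix, and each column can be sorted independently. The minimal NumPy sketch below illustrates this. `top_bottom_k` is a hypothetical helper, and the random arrays are stand-ins for `gradmat` and `pca.components_`:

```python
import numpy as np

def top_bottom_k(acts: np.ndarray, directions: np.ndarray, k: int = 10, eps: float = 1e-12):
    """Cosine similarity of every row of `acts` with every direction.

    Returns the (n, m) similarity matrix and, per direction, the indices of the
    k most positive rows (strongest first) and k most negative rows (strongest first).
    """
    a = acts / (np.linalg.norm(acts, axis=1, keepdims=True) + eps)               # unit-norm rows
    d = directions / (np.linalg.norm(directions, axis=1, keepdims=True) + eps)   # unit-norm directions
    sims = a @ d.T                               # (n, m) cosine similarities
    order = np.argsort(sims, axis=0)             # ascending within each column
    top = order[::-1][:k].T                      # (m, k): most positive first
    bottom = order[:k].T                         # (m, k): most negative first
    return sims, top, bottom

rng = np.random.default_rng(0)
acts_toy = rng.standard_normal((1000, 64))       # stand-in for gradmat
dirs_toy = rng.standard_normal((10, 64))         # stand-in for pca.components_
S_toy, top_idx, bottom_idx = top_bottom_k(acts_toy, dirs_toy, k=10)
print(S_toy.shape, top_idx.shape, bottom_idx.shape)  # (1000, 10) (10, 10) (10, 10)
```

With the real data, `top_idx[j]` would index into `qa` to collect the most positively aligned traces for direction `j`. This approach computes each row norm once rather than once per direction. The trade-off is that it needs a normalized copy the same size as the input.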

In [16]:
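import torch
import torch.nn.functional as F

indi = 0  # index of the principal direction to inspect
acts = torch.as_tensor(gradmat)                            # (n_examples, d); no copy warning
direction = torch.as_tensor(p5dir[indi]).to(acts.dtype)    # (d,)

# Cosine similarity between the direction and every example's vector
sims = F.cosine_similarity(direction.unsqueeze(0), acts, dim=1)  # (n_examples,)

# Ascending order: order[-1] is the most positive example, order[0] the most negative
order = torch.argsort(sims)
sims[order[-1]], qa[int(order[-1])], sims[order[0]], qa[int(order[0])]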



(tensor(0.4898, dtype=torch.float64),
 '{\'role\': \'user\', \'content\': \'Solve the following math problem. Make sure to put the answer (and only answer) inside \\\\boxed{}.\\n\\nIs there a necklace with 2016 pearls, half black and half green, that can be transformed into a necklace of all blue pearls through the described replacement process?\'}<think>\nOkay, let\'s try to figure out this necklace problem. So we have a necklace with 2016 pearls, half are black and half are green. The question is whether we can turn all of them into blue pearls using the replacement process described. Wait, the problem mentions the "described replacement process," but I don\'t see the description here. Hmm, maybe it\'s from a previous problem or part of a standard set of rules? Since it\'s not here, I might need to make assumptions. \n\nWait, maybe the replacement process is a common one in such problems. Often, these problems involve replacing adjacent pairs of pearls according to some rules. For ex

### Using OpenAI GPT-5 API:

- feed the k most positive and k most negative examples (e.g., k=10) to the model for contrastive analysis

- obtain a semantic label for each skill direction for human interpretation
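The contrastive prompt below interleaves fixed instructions with the two example groups. A hypothetical helper such as `build_contrastive_messages` keeps that assembly in one place and separates the examples with blank lines. It only builds the `messages` list and makes no API call. The instruction string in the usage example is a placeholder, not the notebook's full prompt:

```python
from typing import Dict, List

def build_contrastive_messages(group1: List[str], group2: List[str],
                               instruction: str) -> List[Dict[str, str]]:
    """Assemble a chat `messages` list contrasting two groups of examples."""
    def block(examples: List[str]) -> str:
        # Number each example and separate them with blank lines
        return "\n\n".join(f"Example {i}: {ex}" for i, ex in enumerate(examples))

    return [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Group 1 examples: [" + block(group1) + "]"},
        {"role": "user", "content": "Group 2 examples: [" + block(group2) + "]"},
        {"role": "user", "content": instruction},
    ]

msgs = build_contrastive_messages(
    ["trace A", "trace B"], ["trace C"],
    instruction="Identify the main contrastive axes between the two groups.",  # placeholder
)
for m in msgs:
    print(m["role"], "|", m["content"])
```

The returned list has the same shape as the `messages` argument that `client.chat.completions.create` accepts.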

In [17]:
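import os
from openai import OpenAI

# Provide the key via the environment: replace the placeholder or export it before starting Jupyter.
# Avoid ending a cell with os.environ.get("OPENAI_API_KEY"), which prints the key into the notebook.
os.environ.setdefault("OPENAI_API_KEY", "***<OpenAI API Key>***")

# The client automatically picks up the OPENAI_API_KEY environment variable
client = OpenAI()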

In [19]:
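# Sort by similarity (ascending) and take the 10 most positive and 10 most negative examples.
# Index the list directly: np.array(qa) builds a fixed-width string array sized by the longest trace.
order = np.argsort(np.asarray(sims))
d0p10 = [qa[i] for i in order[-10:]]  # most positive example last
d0n10 = [qa[i] for i in order[:10]]   # most negative example first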

In [20]:
str(d0p10[-1])

'{\'role\': \'user\', \'content\': \'Solve the following math problem. Make sure to put the answer (and only answer) inside \\\\boxed{}.\\n\\nIs there a necklace with 2016 pearls, half black and half green, that can be transformed into a necklace of all blue pearls through the described replacement process?\'}<think>\nOkay, let\'s try to figure out this necklace problem. So we have a necklace with 2016 pearls, half are black and half are green. The question is whether we can turn all of them into blue pearls using the replacement process described. Wait, the problem mentions the "described replacement process," but I don\'t see the description here. Hmm, maybe it\'s from a previous problem or part of a standard set of rules? Since it\'s not here, I might need to make assumptions. \n\nWait, maybe the replacement process is a common one in such problems. Often, these problems involve replacing adjacent pairs of pearls according to some rules. For example, maybe replacing two pearls of ce

In [22]:
# Number each example and separate them with blank lines so the model can tell them apart
gp1 = "\n\n".join(f"Example {i}: {ex}" for i, ex in enumerate(d0p10))
gp2 = "\n\n".join(f"Example {i}: {ex}" for i, ex in enumerate(d0n10))

g5resnano3 = []
messages=[
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "We are analyzing two contrastive groups of solution traces on math problems for human interpretation. Below are the group 1 examples. Group 1 examples: ["},
    {"role": "user", "content": gp1},
    {"role": "user", "content": "]. Below are the group 2 examples. Group 2 examples: ["},
    {"role": "user", "content": gp2},
    {"role": "user", "content": "]. Analyze the differences in the attributes of the examples and identify the main contrastive axes that differentiate these two groups of examples. Then, for each group of examples, summarize how its attributes are located on these axes. || Analyze carefully and consider all the issues. Finally, summarize the most prominent/obvious distinctions into one pair of the most concise/straightforward keywords (such as Natural language analysis vs. symbolic derivations <or> Step-by-step derivations vs. advanced/abstract operations <or>heavy reasoning vs. straightforward, etc.). Output format: <your analysis>. [**Contrastive Axes**]: <contrastive axes>. [**Group 1 Attributes**]: <group 1 attributes>. [**Group 2 Attributes**]: <group 2 attributes>. [**Final summary keywords pair (3 words vs. 3 words)**]: <final summary keywords pair  (3 words vs. 3 words)>||"},
]



# Create a chat completion request
completion = client.chat.completions.create(
    model="gpt-5",
    messages=messages
)
# Print the model's response
print(completion.choices[0].message.content)

g5resnano3.append(completion.choices[0].message.content)

Group 1 consists largely of open-ended, strategy/game/combinatorics puzzles answered via long, exploratory, often speculative chains of reasoning with frequent hypothesis testing, backtracking, and qualitative argumentation, culminating in short boxed conclusions (often yes/no or small integers). Group 2 consists of standard quantitative math (calculus, algebra, optimization, probability moments) solved by structured, symbolic derivations (substitutions, Lagrange multipliers, MGFs, series/eta/zeta, trigonometric/hyperbolic substitutions), with clear formula-driven steps and exact closed-form results.

[**Contrastive Axes**]: 
- Problem type: qualitative combinatorial/strategy puzzles vs. quantitative calculus/algebra problems
- Reasoning style: exploratory/verbal/heuristic vs. structured/symbolic/formulaic
- Tooling: minimal formalism vs. heavy use of standard techniques (substitutions, LMs, MGFs, series)
- Answer form: simple yes/no or small integers vs. exact expressions/constants (√