<a href="https://colab.research.google.com/github/mrohith29/PhysiSolve/blob/main/Copy_of_main.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Step 1
### Splitting the dataset into training, testing, and evaluation sets

In [1]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [None]:
import json
import random
from sklearn.model_selection import train_test_split
from collections import Counter

# Set random seed for reproducibility
RANDOM_SEED = 42
random.seed(RANDOM_SEED)

# Load the dataset
with open(r"/content/drive/MyDrive/dataset/high_school_physics.json", "r", encoding="utf-8") as f:
    data = json.load(f)

# Extract the 'subject' field for stratification
subjects = [item["subject"] for item in data]

# Verify initial distribution
print("Original subject distribution:", Counter(subjects))

# First split: 70% train, 30% temp (test + eval)
train_data, temp_data = train_test_split(
    data,
    train_size=0.7,
    stratify=subjects,
    random_state=RANDOM_SEED
)

# Second split: Split the 30% temp into 15% test and 15% eval (50/50 of temp)
test_data, eval_data = train_test_split(
    temp_data,
    test_size=0.5,  # 50% of 30% = 15% of original
    stratify=[item["subject"] for item in temp_data],
    random_state=RANDOM_SEED
)

# Save train, test, and evaluation sets
with open(r"/content/drive/MyDrive/dataset/train.json", "w", encoding="utf-8") as f:
    json.dump(train_data, f, indent=4)
with open(r"/content/drive/MyDrive/dataset/test.json", "w", encoding="utf-8") as f:
    json.dump(test_data, f, indent=4)
with open(r"/content/drive/MyDrive/dataset/eval.json", "w", encoding="utf-8") as f:
    json.dump(eval_data, f, indent=4)

# Print split sizes and subject distribution
print(f"\nDataset split into {len(train_data)} training, {len(test_data)} testing, and {len(eval_data)} evaluation samples.")
print("\nSubject distribution in each split:")
print("Train:", Counter([item["subject"] for item in train_data]))
print("Test:", Counter([item["subject"] for item in test_data]))
print("Eval:", Counter([item["subject"] for item in eval_data]))

Original subject distribution: Counter({'Electrostatics and Current Electricity': 76, 'Mechanics': 60, 'Kinematics': 55, 'Electromagnetism': 45, 'Thermodynamics': 44, 'Optics': 38, 'Atomic and Modern Physics': 30, 'Electronic Devices': 29, 'Periodic Motion': 13, 'Waves and Oscillations': 10})

Dataset split into 280 training, 60 testing, and 60 evaluation samples.

Subject distribution in each split:
Train: Counter({'Electrostatics and Current Electricity': 53, 'Mechanics': 42, 'Kinematics': 39, 'Thermodynamics': 31, 'Electromagnetism': 31, 'Optics': 27, 'Atomic and Modern Physics': 21, 'Electronic Devices': 20, 'Periodic Motion': 9, 'Waves and Oscillations': 7})
Test: Counter({'Electrostatics and Current Electricity': 11, 'Mechanics': 9, 'Kinematics': 8, 'Electromagnetism': 7, 'Thermodynamics': 6, 'Optics': 5, 'Atomic and Modern Physics': 5, 'Electronic Devices': 5, 'Periodic Motion': 2, 'Waves and Oscillations': 2})
Eval: Counter({'Electrostatics and Current Electricity': 12, 'Mechan

### Analysis of the Output
#### Original Distribution
- **Total samples**: 400
- **Subjects**:
  - Electrostatics and Current Electricity: 76
  - Mechanics: 60
  - Kinematics: 55
  - Electromagnetism: 45
  - Thermodynamics: 44
  - Optics: 38
  - Atomic and Modern Physics: 30
  - Electronic Devices: 29
  - Periodic Motion: 13
  - Waves and Oscillations: 10

#### Split Results
- **Train**: 280 samples (70%)
- **Test**: 60 samples (15%)
- **Eval**: 60 samples (15%)

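#### Subject Distribution Across Splits
Each split cell shows the actual count, with the expected count (original count × split fraction) in parentheses.

| Subject                          | Original | Train (70%) | Test (15%) | Eval (15%) |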
|----------------------------------|----------|-------------|------------|------------|
| Electrostatics and Current Elec. | 76       | 53 (53.2)   | 11 (11.4)  | 12 (11.4)  |
| Mechanics                        | 60       | 42 (42)     | 9 (9)      | 9 (9)      |
| Kinematics                       | 55       | 39 (38.5)   | 8 (8.25)   | 8 (8.25)   |
| Electromagnetism                 | 45       | 31 (31.5)   | 7 (6.75)   | 7 (6.75)   |
| Thermodynamics                   | 44       | 31 (30.8)   | 6 (6.6)    | 7 (6.6)    |
| Optics                           | 38       | 27 (26.6)   | 5 (5.7)    | 6 (5.7)    |
| Atomic and Modern Physics        | 30       | 21 (21)     | 5 (4.5)    | 4 (4.5)    |
| Electronic Devices               | 29       | 20 (20.3)   | 5 (4.35)   | 4 (4.35)   |
| Periodic Motion                  | 13       | 9 (9.1)     | 2 (1.95)   | 2 (1.95)   |
| Waves and Oscillations           | 10       | 7 (7)       | 2 (1.5)    | 1 (1.5)    |
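
The expected counts in parentheses can be reproduced by multiplying each subject's original count by the split fraction. A minimal sketch, using the counts from the output above (`expected_counts` is an illustrative helper name, not part of the notebook):

```python
from collections import Counter

# Original subject counts, copied from the notebook output.
original = Counter({
    "Electrostatics and Current Electricity": 76,
    "Mechanics": 60,
    "Kinematics": 55,
    "Electromagnetism": 45,
    "Thermodynamics": 44,
    "Optics": 38,
    "Atomic and Modern Physics": 30,
    "Electronic Devices": 29,
    "Periodic Motion": 13,
    "Waves and Oscillations": 10,
})

# Split fractions used in the notebook: 70% train, 15% test, 15% eval.
FRACTIONS = {"train": 0.70, "test": 0.15, "eval": 0.15}

def expected_counts(counts, fractions):
    """Return {subject: {split: expected count}} for a perfectly proportional split."""
    return {
        subject: {split: round(n * frac, 2) for split, frac in fractions.items()}
        for subject, n in counts.items()
    }

expected = expected_counts(original, FRACTIONS)
for subject, per_split in expected.items():
    print(f"{subject}: {per_split}")
```

Because sample counts must be integers, the stratified split rounds these values, which is why a small class such as Waves and Oscillations lands as 2 and 1 in test and eval rather than 1.5 each.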


In [None]:
print(test_data[0])
print(test_data[3])

{'id': 298, 'question': 'What is the change in internal energy of an ideal gas during an isochoric process?', 'subject': 'Thermodynamics', 'choices': ['Zero', 'Positive', 'Negative', 'Depends on the process'], 'answer': 'D', 'explanation': 'In an isochoric process, the change in internal energy depends on the heat added to or removed from the gas.', 'dataset': 'high_school_physics'}
{'id': 206, 'question': 'An ideal gas is heated at constant volume. What happens to its pressure?', 'subject': 'Thermodynamics', 'choices': ['Increases', 'Decreases', 'Remains constant', 'Doubles'], 'answer': 'A', 'explanation': "According to Gay-Lussac's Law, P1/T1 = P2/T2. If temperature increases, pressure increases.", 'dataset': 'high_school_physics'}
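
Each record stores the correct option as a letter in `answer`, and the evaluation code in the following cells converts it to the choice text with `ord(...) - ord("A")`. A small sketch of that lookup, with a bounds check added as an assumption (`answer_text` is an illustrative helper name, not part of the notebook):

```python
def answer_text(item):
    """Map the answer letter ('A', 'B', ...) to the corresponding choice text."""
    letter = item["answer"].strip()[0].upper()
    idx = ord(letter) - ord("A")
    if not 0 <= idx < len(item["choices"]):
        raise ValueError(f"answer {item['answer']!r} does not match any choice")
    return item["choices"][idx]

# Record copied from the notebook output (id 206), trimmed to the fields used here.
sample = {
    "id": 206,
    "choices": ["Increases", "Decreases", "Remains constant", "Doubles"],
    "answer": "A",
}
print(answer_text(sample))  # Increases
```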


### Zero-Shot Evaluation of Flan-T5 from Hugging Face

In [None]:
from transformers import pipeline

qa_pipeline = pipeline("text2text-generation", model="google/flan-t5-base")

def evaluate_model(model, dataset):
    correct = 0
    total = len(dataset)

    for item in dataset:
        question = item["question"]
        idx = ord(item["answer"])-ord("A")
        correct_answer = item["choices"][idx]

        prompt = f"Give me the final answer without any explaination, just the couple of words with units that directly show the answer for the Question: {question} with Choices: {', '.join(item['choices'])} Answer:"
        prediction = model(prompt, max_length=20, truncation=True)[0]["generated_text"]

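        # print(correct_answer, prediction, item["id"])
        # Lenient scoring: counts a hit whenever the correct choice text
        # appears anywhere in the generated text (substring match).
        if correct_answer in prediction:
            correct += 1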

    accuracy = (correct / total) * 100
    return accuracy

zero_shot_accuracy = evaluate_model(qa_pipeline, test_data)
print(f"Zero-Shot Accuracy: {zero_shot_accuracy:.2f}%")


The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


Device set to use cpu


Zero-Shot Accuracy: 18.33%
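
The substring test `correct_answer in prediction` used above can count a prediction as correct when it merely contains the target text. The following sketch contrasts it with exact matching after light normalization. `normalize` and `exact_match` are hypothetical helpers, not part of the notebook, and the example strings are made up for illustration:

```python
def normalize(text):
    """Lowercase and collapse whitespace so trivial formatting differences are ignored."""
    return " ".join(text.lower().split())

def substring_match(correct_answer, prediction):
    """The notebook's rule: the target appears anywhere in the output."""
    return correct_answer in prediction

def exact_match(correct_answer, prediction):
    """Stricter rule: the whole normalized output must equal the target."""
    return normalize(correct_answer) == normalize(prediction)

# Hypothetical model outputs for a question whose correct choice is "2 A".
print(substring_match("2 A", "12 A"))  # True (a false positive)
print(exact_match("2 A", "12 A"))      # False
print(exact_match("2 A", " 2 a "))     # True
```

The later cells in this notebook compare predictions with `==`, so the zero-shot figure and the later accuracies are not measured with the same rule.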


### Supervised Fine-Tuning Flan-T5-Base on Training Dataset

In [None]:
# Mount Google Drive
from google.colab import drive
drive.mount('/content/drive', force_remount=True)  # Force remount for fresh connection

# Imports
from transformers import T5ForConditionalGeneration, T5Tokenizer, Trainer, TrainingArguments
from torch.utils.data import Dataset
import json
import os
import time
import torch

# Disable W&B logging
os.environ["WANDB_DISABLED"] = "true"

# Use CPU explicitly
device = torch.device("cpu")
print(f"Using device: {device}")

# Load model and tokenizer
model_name = "google/flan-t5-base"
model_path = "/content/drive/MyDrive/dataset/trained_model"
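# Check for a saved config file rather than the bare directory: os.makedirs()
# further down creates the directory even before any model is saved into it.
if os.path.exists(os.path.join(model_path, "config.json")):
    print("Loading previously fine-tuned model...")
    model = T5ForConditionalGeneration.from_pretrained(model_path).to(device)
    tokenizer = T5Tokenizer.from_pretrained(model_path, legacy=False)
else:
    # The message below says "Large", but model_name is google/flan-t5-base.
    print("Initializing fresh FLAN-T5-Large model...")
    model = T5ForConditionalGeneration.from_pretrained(model_name).to(device)
    tokenizer = T5Tokenizer.from_pretrained(model_name, legacy=False)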

# Custom dataset class
class PhysicsDataset(Dataset):
    def __init__(self, data, tokenizer, max_length=128):  # Reduced max_length
        self.data = data
        self.tokenizer = tokenizer
        self.max_length = max_length

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        item = self.data[idx]
        prompt = f"Question: {item['question']} Choices: {', '.join(item['choices'])} Answer:"
        idx = ord(item["answer"][0]) - ord("A")
        target = item["choices"][idx]

        encodings = self.tokenizer(prompt, truncation=True, padding="max_length", max_length=self.max_length, return_tensors="pt")
        target_encodings = self.tokenizer(target, truncation=True, padding="max_length", max_length=self.max_length, return_tensors="pt")

        return {
            "input_ids": encodings["input_ids"].squeeze(),
            "attention_mask": encodings["attention_mask"].squeeze(),
            "labels": target_encodings["input_ids"].squeeze(),
        }

# Load datasets
train_file = "/content/drive/MyDrive/dataset/train.json"
eval_file = "/content/drive/MyDrive/dataset/eval.json"
test_file = "/content/drive/MyDrive/dataset/test.json"

with open(train_file, "r", encoding="utf-8") as f:
    train_data = json.load(f)
with open(eval_file, "r", encoding="utf-8") as f:
    eval_data = json.load(f)
with open(test_file, "r", encoding="utf-8") as f:
    test_data = json.load(f)

train_dataset = PhysicsDataset(train_data, tokenizer)
eval_dataset = PhysicsDataset(eval_data, tokenizer)

# Define directories
model_save_dir = "/content/drive/MyDrive/dataset/trained_model"
results_dir = "/content/drive/MyDrive/dataset/results"
os.makedirs(model_save_dir, exist_ok=True)
os.makedirs(results_dir, exist_ok=True)

# Training arguments optimized for CPU
training_args = TrainingArguments(
    output_dir=results_dir,
    per_device_train_batch_size=1,  # Reduced to minimize memory
    per_device_eval_batch_size=1,
    num_train_epochs=3,
    save_steps=70,  # Save about once per epoch: 280 samples / (batch size 1 x 4 accumulation steps) = 70 optimizer steps
    eval_strategy="epoch",  # Updated from evaluation_strategy
    logging_dir="./logs",
    run_name=f"flan-t5-finetune-{time.strftime('%Y%m%d-%H%M%S')}",
    report_to="none",
    learning_rate=1e-5,  # Lower LR for stability
    gradient_accumulation_steps=4,  # Effective batch size = 1 * 4 = 4
    fp16=False,  # Disabled for CPU
    save_total_limit=2,  # Keep only 2 latest checkpoints to save space
)

# Initialize trainer
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
)

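# Train the model with manual checkpoint saving.
# Note: each trainer.train() call runs all num_train_epochs epochs, so this
# loop performs 3 full training runs (3 x 3 = 9 epochs), not one epoch per pass.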
for epoch in range(int(training_args.num_train_epochs)):
    print(f"Starting epoch {epoch + 1}/{int(training_args.num_train_epochs)}")
    trainer.train()
    # Save model after each epoch
    model.save_pretrained(f"{model_save_dir}_epoch_{epoch + 1}")
    tokenizer.save_pretrained(f"{model_save_dir}_epoch_{epoch + 1}")
    print(f"Model saved after epoch {epoch + 1} to {model_save_dir}_epoch_{epoch + 1}")

# Final save
model.save_pretrained(model_save_dir)
tokenizer.save_pretrained(model_save_dir)
print(f"Fine-tuned model saved to {model_save_dir}")

# Evaluate on test set
correct = 0
predictions = []
for item in test_data:
    prompt = f"Question: {item['question']} Choices: {', '.join(item['choices'])} Answer:"
    inputs = tokenizer(prompt, return_tensors="pt").to(device)
    outputs = model.generate(**inputs, max_length=10)
    prediction = tokenizer.decode(outputs[0], skip_special_tokens=True).strip()
    idx = ord(item["answer"][0]) - ord("A")
    target = item["choices"][idx]
    predictions.append({"question": item["question"], "predicted": prediction, "target": target})
    if prediction == target:
        correct += 1

accuracy = correct / len(test_data) * 100
print(f"Fine-tuned accuracy on test set: {accuracy:.2f}% ({correct}/{len(test_data)})")

# Save predictions for analysis
with open("/content/drive/MyDrive/dataset/finetune_test_results.json", "w") as f:
    json.dump({"accuracy": accuracy, "predictions": predictions}, f, indent=4)
print("Test results saved to /content/drive/MyDrive/dataset/finetune_test_results.json")

Mounted at /content/drive
Using device: cpu
Initializing fresh FLAN-T5-Large model...
Starting epoch 1/3


Passing a tuple of `past_key_values` is deprecated and will be removed in Transformers v4.48.0. You should pass an instance of `EncoderDecoderCache` instead, e.g. `past_key_values=EncoderDecoderCache.from_legacy_cache(past_key_values)`.


| Epoch | Training Loss | Validation Loss |
|-------|---------------|-----------------|
| 1     | No log        | 28.344961       |
| 2     | No log        | 18.623024       |
| 3     | No log        | 14.014906       |
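
The `No log` entries in the Training Loss column above are consistent with the step arithmetic of this run. A rough sketch of that arithmetic, assuming the Hugging Face `Trainer` default of `logging_steps=500` (the notebook does not set it) and one optimizer step per `gradient_accumulation_steps` batches:

```python
# Values taken from the notebook's TrainingArguments and dataset split.
num_samples = 280
per_device_train_batch_size = 1
gradient_accumulation_steps = 4
num_train_epochs = 3

# Assumption: default logging interval of the Hugging Face Trainer.
logging_steps = 500

batches_per_epoch = num_samples // per_device_train_batch_size
optimizer_steps_per_epoch = batches_per_epoch // gradient_accumulation_steps
total_optimizer_steps = optimizer_steps_per_epoch * num_train_epochs

print(optimizer_steps_per_epoch)               # 70
print(total_optimizer_steps)                   # 210
print(total_optimizer_steps >= logging_steps)  # False: the logging interval is never reached
```

Under these assumptions a single `trainer.train()` call finishes after 210 optimizer steps, before the first logging point. Passing a smaller `logging_steps` (for example 10) to `TrainingArguments` should make the training loss appear in the table.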


Model saved after epoch 1 to /content/drive/MyDrive/dataset/trained_model_epoch_1
Starting epoch 2/3


Epoch,Training Loss,Validation Loss
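
In `PhysicsDataset`, the labels are padded to `max_length` with the tokenizer's pad token, so the loss is also computed over padding positions. A common remedy is to replace pad ids in the labels with `-100`, which the cross-entropy loss used by `T5ForConditionalGeneration` ignores. A minimal sketch on plain Python lists follows. `mask_label_padding` is a hypothetical helper, and the pad id of 0 matches the T5 tokenizer but is hard-coded here as an assumption:

```python
IGNORE_INDEX = -100  # label value skipped by PyTorch's CrossEntropyLoss by default

def mask_label_padding(label_ids, pad_token_id=0):
    """Replace padding ids with IGNORE_INDEX so they do not contribute to the loss."""
    return [IGNORE_INDEX if tok == pad_token_id else tok for tok in label_ids]

# Made-up token ids for a short target padded to length 8 (1 is T5's end-of-sequence id).
padded_labels = [8, 3155, 1, 0, 0, 0, 0, 0]
print(mask_label_padding(padded_labels))
# [8, 3155, 1, -100, -100, -100, -100, -100]
```

With tensors, the same effect can be obtained inside `__getitem__` with `labels[labels == tokenizer.pad_token_id] = -100` before returning the item.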


### Testing the trained model on test.json

In [2]:
from transformers import T5ForConditionalGeneration, T5Tokenizer
import json
import torch
import os
from tqdm import tqdm

# Paths
model_save_dir = "/content/drive/MyDrive/dataset/trained_model"
test_file_path = "/content/drive/MyDrive/dataset/test.json"  # Adjust if your test file has a different name

# First, let's check if the model files exist
print("Checking model directory contents:")
for root, dirs, files in os.walk(model_save_dir):
    for file in files:
        print(os.path.join(root, file))

# Load the model and tokenizer
try:
    model = T5ForConditionalGeneration.from_pretrained(model_save_dir)
    tokenizer = T5Tokenizer.from_pretrained(model_save_dir)
    print("Model and tokenizer loaded successfully!")
except Exception as e:
    print(f"Error loading model: {e}")
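    # If model loading fails, fall back to a pretrained model.
    # Note: this loads google/flan-t5-large, not the flan-t5-base checkpoint used
    # for fine-tuning, so results from this branch reflect the untuned large model.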
    print("Loading the base model instead...")
    model = T5ForConditionalGeneration.from_pretrained("google/flan-t5-large")
    tokenizer = T5Tokenizer.from_pretrained("google/flan-t5-large", legacy=False)

# Load test data
with open(test_file_path, "r", encoding="utf-8") as f:
    test_data = json.load(f)

print(f"Loaded {len(test_data)} test examples")

# Set device
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)
model.eval()

# Testing function
def evaluate_model(model, tokenizer, test_data, device, correct, total):

    for item in tqdm(test_data):
        prompt = f"Question: {item['question']} Choices: {', '.join(item['choices'])} Answer:"
        correct_idx = ord(item["answer"][0]) - ord("A")
        correct_answer = item["choices"][correct_idx]

        # Tokenize input
        input_ids = tokenizer(prompt, return_tensors="pt", truncation=True, max_length=128).input_ids.to(device)

        # Generate output
        with torch.no_grad():
            outputs = model.generate(
                input_ids=input_ids,
                max_length=128,
                num_beams=4,
                early_stopping=True
            )

        # Decode output
        predicted_text = tokenizer.decode(outputs[0], skip_special_tokens=True)

        # Simple string match for evaluation
        if predicted_text.strip() == correct_answer.strip():
            correct += 1
        else:
            # Print some examples of wrong predictions for debugging
            if total < 5:  # Limit to just a few examples
                print(f"\nQuestion: {item['question']}")
                print(f"Choices: {', '.join(item['choices'])}")
                print(f"Correct Answer: {correct_answer}")
                print(f"Predicted: {predicted_text}")

        total += 1

    accuracy = correct / total if total > 0 else 0
    return accuracy

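# Evaluate model
# Note: `correct` and `total` are ints, so the increments inside evaluate_model
# do not reach these outer variables; the final print therefore shows (0/0).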
print("Evaluating model on test set...")
correct = 0
total = 0
accuracy = evaluate_model(model, tokenizer, test_data, device, correct, total)
print(f"Test Accuracy: {accuracy:.4f} ({correct}/{total})")

Checking model directory contents:
Error loading model: Error no file named pytorch_model.bin, model.safetensors, tf_model.h5, model.ckpt.index or flax_model.msgpack found in directory /content/drive/MyDrive/dataset/trained_model.
Loading the base model instead...


Loaded 60 test examples
Evaluating model on test set...


Question: What is the change in internal energy of an ideal gas during an isochoric process?
Choices: Zero, Positive, Negative, Depends on the process
Correct Answer: Depends on the process
Predicted: Negative

Question: A convex lens has a focal length of 20 cm. An object is placed 10 cm away. What is the image distance?
Choices: 10 cm, 20 cm, 30 cm, 40 cm
Correct Answer: 40 cm
Predicted: 10 cm

Question: A pendulum completes one oscillation in 2 seconds. What is its frequency?
Choices: 0.25 Hz, 0.5 Hz, 1 Hz, 2 Hz
Correct Answer: 0.5 Hz
Predicted: 2 Hz

Question: A 15 Ω resistor is connected across a 45 V battery. What is the current flowing through the resistor?
Choices: 1 A, 2 A, 3 A, 4 A
Correct Answer: 3 A
Predicted: 4 A


100%|██████████| 60/60 [03:33<00:00,  3.56s/it]

Test Accuracy: 0.2667 (0/0)





In [3]:
print(accuracy*100)

26.666666666666668
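
The `(0/0)` in the test output above arises because `correct` and `total` are integers passed into `evaluate_model`; the increments inside the function rebind local names only. Below is a sketch of one way to report the counts, returning them alongside the accuracy. The `predict` callable stands in for the tokenize, generate, and decode steps and is a placeholder, as are the two toy records:

```python
def evaluate(predict, test_data):
    """Return (accuracy, correct, total) using exact string match."""
    correct = 0
    total = 0
    for item in test_data:
        idx = ord(item["answer"][0]) - ord("A")
        target = item["choices"][idx]
        if predict(item).strip() == target.strip():
            correct += 1
        total += 1
    accuracy = correct / total if total > 0 else 0.0
    return accuracy, correct, total

# Toy data and a placeholder predictor that always picks the first choice.
toy_data = [
    {"choices": ["1 A", "2 A", "3 A", "4 A"], "answer": "C"},
    {"choices": ["Increases", "Decreases"], "answer": "A"},
]
accuracy, correct, total = evaluate(lambda item: item["choices"][0], toy_data)
print(f"Test Accuracy: {accuracy:.4f} ({correct}/{total})")
# Test Accuracy: 0.5000 (1/2)
```

Returning the counts keeps the function free of caller-side state and makes the printed fraction match the accuracy.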
