<a href="https://colab.research.google.com/github/shreeragkh/Classification-ML-model/blob/main/Untitled2.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [360]:
import zipfile
import os

In [361]:
zip_path = "dataset.zip"
extract_path = "/mnt/data/math_dataset"

with zipfile.ZipFile(zip_path, 'r') as zip_ref:
    zip_ref.extractall(extract_path)

In [362]:
os.listdir(extract_path)

['MATH']

In [363]:
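import json

# Path to the MATH test split; each subdirectory is one topic.
# (Despite the name, `df` is a directory path, not a DataFrame.)
df = os.path.join(extract_path, "MATH", "test")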

In [364]:
texts = []
labels = []

In [365]:
# Walk each topic directory and collect (problem text, topic) pairs.
for topic in os.listdir(df):
    topic_path = os.path.join(df, topic)
    if os.path.isdir(topic_path):
        for file in os.listdir(topic_path):
            if file.endswith(".json"):
                file_path = os.path.join(topic_path, file)
                with open(file_path, "r", encoding="utf-8") as f:
                    data = json.load(f)
                    problem_text = data.get("problem", "")

                    # Skip records with a missing or empty problem statement.
                    if problem_text.strip():
                        texts.append(problem_text)
                        labels.append(topic)

In [366]:
print("Total samples:", len(texts))
print("Sample label:", labels[0])
print("Sample question:\n", texts[0][:300])

Total samples: 5000
Sample label: algebra
Sample question:
 Find $x$ if $\displaystyle \frac{2}{x} - \frac{3}{5} + \frac{1}{x} = \frac{1}{5}$.
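
Before vectorizing, it can help to check how evenly the topics are represented, since class imbalance affects how the per-class scores should be read. The sketch below is an illustration only: `topic_counts` and `demo_labels` are hypothetical names, and in the notebook the `labels` list built above would be passed instead.

```python
from collections import Counter


def topic_counts(labels):
    """Return (topic, count) pairs sorted from most to least frequent."""
    return Counter(labels).most_common()


# Hypothetical labels for illustration; in the notebook, pass `labels`.
demo_labels = ["algebra", "geometry", "algebra", "prealgebra", "algebra", "geometry"]

for topic, count in topic_counts(demo_labels):
    print(f"{topic:<12} {count}")
```
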


In [367]:
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import FeatureUnion

In [368]:
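# Word-level TF-IDF over unigrams and bigrams.
word_vec = TfidfVectorizer(
    ngram_range=(1, 2),
    max_features=10000,
    stop_words=None
)

# Character-level TF-IDF over 3- to 5-character n-grams.
char_vec = TfidfVectorizer(
    analyzer='char',
    ngram_range=(3, 5),
    max_features=5000
)

# Concatenate both feature sets into a single sparse matrix.
vectorizer = FeatureUnion([
    ('word', word_vec),
    ('char', char_vec)
])

# Note: this fits the vocabulary and IDF weights on all samples, including
# those later held out for testing, so the reported test accuracy may be
# slightly optimistic.
X = vectorizer.fit_transform(texts)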

In [369]:
from sklearn.preprocessing import LabelEncoder

In [370]:
label_encoder = LabelEncoder()
y = label_encoder.fit_transform(labels)
print("Classes:", label_encoder.classes_)
print(y[:5])

Classes: ['algebra' 'counting_and_probability' 'geometry' 'intermediate_algebra'
 'number_theory' 'prealgebra' 'precalculus']
[0 0 0 0 0]


In [371]:
from sklearn.model_selection import train_test_split

In [372]:
X_train, X_test, y_train, y_test = train_test_split(
    X, y,
    test_size=0.2,
    random_state=42,
    stratify=y
)
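
One possible way to keep the held-out questions out of the TF-IDF fit is to split the raw texts first and wrap the vectorizers and the classifier in a single `Pipeline`. This is a minimal sketch under stated assumptions, not the notebook's method: `build_topic_classifier` and the toy `demo_texts` / `demo_labels` are hypothetical, and in the notebook the real `texts` and `labels` lists would take their place. Results on the real data may differ from the numbers reported here.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.svm import LinearSVC


def build_topic_classifier():
    """Word + character TF-IDF features feeding a linear SVM, as one estimator."""
    features = FeatureUnion([
        ("word", TfidfVectorizer(ngram_range=(1, 2), max_features=10000)),
        ("char", TfidfVectorizer(analyzer="char", ngram_range=(3, 5), max_features=5000)),
    ])
    return Pipeline([("features", features), ("svm", LinearSVC())])


# Toy data for illustration only (hypothetical, not from the MATH dataset).
demo_texts = [
    "Solve for x: 2x + 3 = 7",
    "Factor x^2 - 5x + 6",
    "Find x if 3x - 1 = 8",
    "Simplify (x + 1)(x - 1)",
    "Find the area of a triangle with base 4 and height 3",
    "A circle has radius 5, find its area",
    "What is the perimeter of a square with side 6",
    "Find the largest angle of the triangle",
]
demo_labels = ["algebra"] * 4 + ["geometry"] * 4

# Split the raw strings first, so the vectorizers only ever see training text.
train_texts, test_texts, train_labels, test_labels = train_test_split(
    demo_texts, demo_labels, test_size=0.25, random_state=42, stratify=demo_labels
)

clf = build_topic_classifier()
clf.fit(train_texts, train_labels)   # fits TF-IDF and the SVM on training data only
predictions = clf.predict(test_texts)  # the pipeline accepts raw strings directly
print(predictions)
```

A side benefit of this layout is that the fitted pipeline can classify a new question string without a separate `transform` step.
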

In [373]:
from sklearn.svm import LinearSVC

In [374]:
svm_model = LinearSVC()
svm_model.fit(X_train, y_train)

In [375]:
from sklearn.metrics import accuracy_score, classification_report

In [376]:
svm_pred = svm_model.predict(X_test)

In [377]:
print("Accuracy:", accuracy_score(y_test, svm_pred))
print("\nClassification Report:\n")
print(classification_report(
    y_test,
    svm_pred,
    target_names=label_encoder.classes_
))

Accuracy: 0.769

Classification Report:

                          precision    recall  f1-score   support

                 algebra       0.78      0.81      0.79       237
counting_and_probability       0.73      0.68      0.71        95
                geometry       0.69      0.76      0.72        96
    intermediate_algebra       0.87      0.86      0.86       181
           number_theory       0.86      0.80      0.83       108
              prealgebra       0.58      0.56      0.57       174
             precalculus       0.90      0.93      0.91       109

                accuracy                           0.77      1000
               macro avg       0.77      0.77      0.77      1000
            weighted avg       0.77      0.77      0.77      1000
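
The report shows `prealgebra` as the weakest class (F1 of 0.57), but it does not say which topics those questions are mistaken for. A confusion matrix is one way to look at that. The sketch below uses small hypothetical arrays (`demo_true`, `demo_pred`, `demo_classes`) so it runs on its own; in the notebook, `y_test`, `svm_pred`, and `label_encoder.classes_` would be passed instead.

```python
from sklearn.metrics import confusion_matrix


def confusion_table(y_true, y_pred, class_names):
    """Confusion matrix with rows = true class and columns = predicted class."""
    label_ids = list(range(len(class_names)))
    return confusion_matrix(y_true, y_pred, labels=label_ids)


# Hypothetical encoded labels for illustration only.
demo_classes = ["algebra", "prealgebra"]
demo_true = [0, 0, 1, 1, 1]
demo_pred = [0, 1, 1, 0, 1]

cm = confusion_table(demo_true, demo_pred, demo_classes)
for name, row in zip(demo_classes, cm):
    print(f"{name:<12}", row)
```

Off-diagonal entries in a row show where that true class is being misassigned.
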



In [378]:
sample_questions = texts[:5]

In [379]:
import random
sample_questions = random.sample(texts, 5)

In [380]:
import google.generativeai as genai

In [381]:
from google.colab import userdata
genai.configure(api_key=userdata.get('Gemini_APi_key'))

In [382]:
model = genai.GenerativeModel("gemini-2.5-flash")

In [383]:
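def generate_solution(question):
    """Ask the Gemini model for a step-by-step, student-friendly solution.

    Returns the model's reply as plain text (Markdown with LaTeX math).
    """
    prompt = f"""
    You are a high school mathematics teacher.
    Solve the following math question step by step.
    Explain each step clearly in simple language that a high school student can understand.
    Question:
    {question}
    """
    # Send the prompt to the model configured above and return only the text.
    response = model.generate_content(prompt)
    return response.text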
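
The prompt above is an indented triple-quoted f-string, so every line is sent to the model with leading spaces. That is usually harmless, but a cleaner prompt can be built by dedenting a template and then filling in the question. This is a sketch only: `PROMPT_TEMPLATE` and `build_prompt` are hypothetical helpers, not part of the original notebook. The template is dedented before formatting so that multi-line questions do not interfere with the dedent, and braces inside the question (common in LaTeX) are left untouched because `str.format` only interprets braces in the template.

```python
import textwrap

# Dedent once, at import time, so the stored template has no leading spaces.
PROMPT_TEMPLATE = textwrap.dedent("""\
    You are a high school mathematics teacher.
    Solve the following math question step by step.
    Explain each step clearly in simple language that a high school student can understand.
    Question:
    {question}
    """)


def build_prompt(question):
    """Fill the question into the dedented template."""
    return PROMPT_TEMPLATE.format(question=question.strip())


# Example with a LaTeX-style question containing braces.
demo_prompt = build_prompt(r"Find $x$ if $\frac{2}{x} = 1$.")
print(demo_prompt)
```
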

In [384]:
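# Use the first five questions so the run is repeatable; this overrides the
# random sample drawn in the earlier cell.
sample_questions = texts[:5]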
for i, q in enumerate(sample_questions, 1):
    print(f"\nQuestion {i}:")
    print(q)
    print("\nGenerated Student-Friendly Solution:")
    print(generate_solution(q))
    print("="*80)


Question 1:
Find $x$ if $\displaystyle \frac{2}{x} - \frac{3}{5} + \frac{1}{x} = \frac{1}{5}$.

Generated Student-Friendly Solution:
Hello class! Let's tackle this equation step by step, just like solving a puzzle. Our goal is to find the value of $x$ that makes this equation true.

Here's our equation:
$\displaystyle \frac{2}{x} - \frac{3}{5} + \frac{1}{x} = \frac{1}{5}$

---

**Step 1: Combine the terms that have 'x' in them.**

Look at the left side of the equation. We have two terms that involve $x$: $\frac{2}{x}$ and $\frac{1}{x}$. Since they both have the same denominator ($x$), we can easily add them together.

*   Think of it like adding 2 apples and 1 apple – you get 3 apples! Here, our "apples" are "1/x".
*   So, $\frac{2}{x} + \frac{1}{x} = \frac{2+1}{x} = \frac{3}{x}$.

Now, our equation looks a lot simpler:
$\displaystyle \frac{3}{x} - \frac{3}{5} = \frac{1}{5}$

---

**Step 2: Get all the terms with 'x' on one side and all the numbers on the other side.**

Right now, we 