# SHSAT AI: Question Generator and Explainer

# SHSAT AI - Question Generation Model

This notebook demonstrates how to use a fine-tuned GPT-based model to generate SHSAT-style math questions.
The model is fine-tuned on a JSONL dataset, and the notebook calls it through the `openai` Python library.


## 1. Library Imports

Import all required libraries for model interaction and file handling.


## 2. Load Dataset

Load and inspect the dataset used to fine-tune the GPT model. The dataset should be in `.jsonl` format, containing prompt-completion pairs.
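
As a minimal sketch of the inspection step, the helpers below parse a `.jsonl` file and report which top-level keys its records use. The function names `load_jsonl` and `summarize_keys` are illustrative and do not appear elsewhere in this notebook. Chat models such as `gpt-3.5-turbo` generally expect one `messages` list per line rather than `prompt`/`completion` keys, so it is worth confirming which shape the file actually has before uploading it.

```python
import json
from collections import Counter
from pathlib import Path


def load_jsonl(path):
    """Parse a JSONL file into a list of dicts, one per non-empty line."""
    records = []
    with Path(path).open("r", encoding="utf-8") as handle:
        for line_number, line in enumerate(handle, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines between records
            try:
                records.append(json.loads(line))
            except json.JSONDecodeError as err:
                raise ValueError(f"Line {line_number} is not valid JSON: {err}") from err
    return records


def summarize_keys(records):
    """Count how often each combination of top-level keys appears."""
    return Counter(tuple(sorted(record)) for record in records)
```

Calling `summarize_keys(load_jsonl("/content/drive/MyDrive/shsat_finetune_data.jsonl"))` should return a single key combination if every record follows the same schema.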


## 3. Model Configuration and Initialization

Set up the OpenAI API key and specify the fine-tuned model name.
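
One possible way to keep this configuration in one place is sketched below. The helper `read_config` and the `SHSAT_MODEL` environment variable are assumptions made for illustration. The default model ID is the one used in the generation cell later in this notebook.

```python
import os

# Model ID taken from the generation cell later in this notebook.
DEFAULT_MODEL = "ft:gpt-3.5-turbo-0125:personal::BDKqehOH"


def read_config(env=os.environ):
    """Return (api_key, model_name), failing early if the key is missing."""
    api_key = env.get("OPENAI_API_KEY")
    if not api_key:
        raise RuntimeError("OPENAI_API_KEY is not set; load the .env file first.")
    # SHSAT_MODEL is a hypothetical override, not something the notebook defines.
    model_name = env.get("SHSAT_MODEL", DEFAULT_MODEL)
    return api_key, model_name
```

Failing early on a missing key gives a clearer error than the authentication failure that would otherwise surface on the first API call.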


## 4. Generate SHSAT Math Questions

Define a function to generate questions by passing a custom prompt to the fine-tuned model.
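
A minimal sketch of such a function follows, assuming `client` is an `openai.OpenAI` instance from `openai>=1.0`. The name `generate_questions` and the `temperature` default are illustrative choices, not values taken from this notebook.

```python
def generate_questions(client, model, prompt, temperature=0.7):
    """Send one user prompt to a chat model and return the reply text.

    `client` is expected to be an `openai.OpenAI` instance (openai>=1.0).
    """
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        temperature=temperature,
    )
    # The first choice carries the generated questions as plain text.
    return response.choices[0].message.content
```

Passing the client in as an argument, rather than building it inside the function, keeps the API key handling in one cell and makes the function easy to exercise with a stub.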


## 5. Example Usage

Use a sample prompt and display the model's response.


## 6. Current Limitations

This model is trained on a limited dataset and may occasionally generate incorrect, unclear, or repetitive content.
Future versions can benefit from a larger and more diverse dataset.


## 7. Future Improvements

- Expand dataset coverage for different SHSAT math topics.
- Improve formatting and answer-explanation clarity.
- Build a web interface to input prompts and get real-time questions.


In [1]:
from google.colab import drive
drive.mount('/content/drive')


Mounted at /content/drive


## Installing Dependencies

In [4]:
!pip install openai




In [5]:
!pip install python-dotenv


Collecting python-dotenv
  Downloading python_dotenv-1.0.1-py3-none-any.whl.metadata (23 kB)
Downloading python_dotenv-1.0.1-py3-none-any.whl (19 kB)
Installing collected packages: python-dotenv
Successfully installed python-dotenv-1.0.1


## Local Variables and Required Libraries

In [20]:
import os
import openai
from dotenv import load_dotenv
import time

# Load environment variables once
env_path = "/content/drive/MyDrive/.env"
load_dotenv(dotenv_path=env_path)

api_key = os.getenv("OPENAI_API_KEY")


## Dataset Uploaded

In [7]:
# Check that the dataset file exists on the mounted drive
file_path = "/content/drive/MyDrive/shsat_finetune_data.jsonl"
if os.path.exists(file_path):
    print("Dataset uploaded successfully!")
else:
    print("Please upload the dataset manually in Colab.")


Dataset uploaded successfully!


In [None]:
!pip install --upgrade openai




## File Upload


In [16]:


from openai import OpenAI

client = OpenAI(api_key=api_key)

file_path = "/content/drive/MyDrive/shsat_finetune_data.jsonl"

with open(file_path, "rb") as file:
    uploaded_file = client.files.create(
        file=file,
        purpose="fine-tune"
    )

print("File uploaded:")
print("ID:", uploaded_file.id)


File uploaded:
ID: file-PM2oP8NwSknofhChbYqi2n


## Fine-Tuning GPT-3.5 Turbo with Custom SHSAT Dataset

In [17]:

# The module-level client reads its key from `openai.api_key`
openai.api_key = os.getenv("OPENAI_API_KEY")

# Path to the training data
file_path = "/content/drive/MyDrive/shsat_finetune_data.jsonl"

# Upload the training file for fine-tuning
with open(file_path, "rb") as file:
    upload_response = openai.files.create(
        file=file,
        purpose="fine-tune"
    )


file_id = upload_response.id
print("Uploaded File ID:", file_id)

# Start the fine-tuning job on the uploaded file
fine_tune_response = openai.fine_tuning.jobs.create(
    training_file=file_id,
    model="gpt-3.5-turbo"
)


fine_tune_id = fine_tune_response.id
print("Fine-tuning Job ID:", fine_tune_id)


Uploaded File ID: file-68DvUozqv97vGEgMkKR8i6
Fine-tuning Job ID: ftjob-S1TUPNANOQubLwCRmzRo4LdL


## Checking Accessible Models from OpenAI API

In [18]:


openai.api_key = os.getenv("OPENAI_API_KEY")

# List every model this API key can access
models = openai.models.list()
model_ids = [model.id for model in models.data]

print("Available models:", model_ids)


Available models: ['gpt-4o-mini-transcribe', 'gpt-4o-audio-preview-2024-12-17', 'dall-e-3', 'dall-e-2', 'gpt-4o-audio-preview-2024-10-01', 'omni-moderation-latest', 'omni-moderation-2024-09-26', 'gpt-4o-realtime-preview-2024-10-01', 'babbage-002', 'tts-1-hd-1106', 'text-embedding-3-large', 'gpt-4', 'gpt-4o-mini-2024-07-18', 'gpt-4o-2024-05-13', 'gpt-4o-realtime-preview-2024-12-17', 'tts-1-hd', 'gpt-4o-mini-audio-preview', 'gpt-4o-audio-preview', 'o1-preview-2024-09-12', 'gpt-4o-realtime-preview', 'gpt-3.5-turbo-instruct-0914', 'gpt-4o-mini-search-preview', 'tts-1-1106', 'davinci-002', 'gpt-3.5-turbo-1106', 'gpt-4o-search-preview', 'gpt-4-turbo', 'gpt-3.5-turbo-instruct', 'gpt-3.5-turbo', 'gpt-4-turbo-preview', 'gpt-4o-mini-search-preview-2025-03-11', 'gpt-4o-mini-realtime-preview', 'chatgpt-4o-latest', 'whisper-1', 'gpt-3.5-turbo-0125', 'gpt-4-turbo-2024-04-09', 'gpt-3.5-turbo-16k', 'gpt-4o', 'gpt-4o-mini-realtime-preview-2024-12-17', 'gpt-4-1106-preview', 'text-embedding-ada-002', 'o1

## Fine-Tune Execution Pipeline

In [21]:
# Reuses the `client` created in the File Upload cell

# Upload the JSONL training file, closing the handle once the upload finishes
with open("/content/drive/MyDrive/shsat_finetune_data.jsonl", "rb") as file:
    upload_response = client.files.create(file=file, purpose="fine-tune")
file_id = upload_response.id
print(f"File uploaded. File ID: {file_id}")

# Start the fine-tuning job
fine_tune_response = client.fine_tuning.jobs.create(
    training_file=file_id,
    model="gpt-3.5-turbo"
)
job_id = fine_tune_response.id
print(f"Fine-tuning started. Job ID: {job_id}")

# Monitoring job status
while True:
    job = client.fine_tuning.jobs.retrieve(job_id)
    print(f"📡 Status: {job.status}")
    if job.status in ["succeeded", "failed", "cancelled"]:
        break
    time.sleep(15)

# On success, report the fine-tuned model ID
if job.status == "succeeded":
    print(f"\nFine-tuned model ready: {job.fine_tuned_model}")
else:
    print(f"\nFine-tuning ended with status '{job.status}'. Check your dataset and try again.")


File uploaded. File ID: file-ThqD1sD1ovxaZpfhpVhFQD
Fine-tuning started. Job ID: {job_id}
📡 Status: validating_files
📡 Status: validating_files
📡 Status: validating_files
📡 Status: validating_files
📡 Status: validating_files
📡 Status: running
📡 Status: running
📡 Status: running
📡 Status: running
📡 Status: running
📡 Status: running
📡 Status: running
📡 Status: running
📡 Status: running
📡 Status: running
📡 Status: running
📡 Status: running
📡 Status: running
📡 Status: running
📡 Status: running
📡 Status: running
📡 Status: running
📡 Status: running
📡 Status: running
📡 Status: running
📡 Status: running
📡 Status: running
📡 Status: running
📡 Status: running
📡 Status: running
📡 Status: running
📡 Status: running
📡 Status: running
📡 Status: running
📡 Status: running
📡 Status: running
📡 Status: running
📡 Status: running
📡 Status: running
📡 Status: running
📡 Status: running
📡 Status: running
📡 Status: running
📡 Status: running
📡 Status: running
📡 Status: running
📡 Status: running
📡 Status: running
📡
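
The polling loop above prints one line every 15 seconds, which is why the log repeats the same status many times. A hedged variant is sketched below: it logs only when the status changes and gives up after a timeout. The function name `wait_for_job`, the one-hour default timeout, and the injectable `sleep` parameter are assumptions added for illustration. The `client.fine_tuning.jobs.retrieve` call is the same one used in the cell above.

```python
import time

TERMINAL_STATES = {"succeeded", "failed", "cancelled"}


def wait_for_job(client, job_id, poll_seconds=15, timeout_seconds=3600, sleep=time.sleep):
    """Poll a fine-tuning job until it reaches a terminal state or times out."""
    waited = 0
    last_status = None
    while True:
        job = client.fine_tuning.jobs.retrieve(job_id)
        if job.status != last_status:
            print(f"Status: {job.status}")  # log only when the status changes
            last_status = job.status
        if job.status in TERMINAL_STATES:
            return job
        if waited >= timeout_seconds:
            raise TimeoutError(f"Job {job_id} still {job.status} after {waited}s")
        sleep(poll_seconds)
        waited += poll_seconds
```

With this helper, the monitoring step reduces to `job = wait_for_job(client, job_id)` followed by the same `job.status == "succeeded"` check.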

## Automated Math Question Creation

In [None]:

# Query the fine-tuned model through the existing `client`

response = client.chat.completions.create(
    model="ft:gpt-3.5-turbo-0125:personal::BDKqehOH",
    messages=[
        {"role": "user", "content": "Generate 5 math question by algebra and give answer students."}
    ]
)

print(response.choices[0].message.content)


1. What is the solution to the equation 3x + 5 = 17?
Answer: 4.0
2. What is the solution to the equation 8x - 3 = 17?
Answer: 2.5
3. What is the solution to the equation 4*x=20?
Answer: 5.0
4. What is the solution to the equation 9*x+3=15?
Answer: 1.0
5. What is the solution to the equation 5*x-8=7?
Answer: 3.0
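
Because the model can return wrong answers, generated output like the sample above can be spot-checked before it reaches students. The sketch below handles only single-variable linear equations of the form `a*x + b = c`, which covers the five questions shown. The helper names `solve_linear` and `answer_is_correct` are illustrative, and any other question type is reported as unsupported rather than judged.

```python
import re
from fractions import Fraction

# Matches forms such as "3x + 5 = 17", "4*x=20", or "5*x-8=7".
EQUATION = re.compile(r"(\d+)\s*\*?\s*x\s*(?:([+-])\s*(\d+))?\s*=\s*(-?\d+)")


def solve_linear(equation):
    """Solve a*x (+|-) b = c for x; return None if the text is not in that form."""
    match = EQUATION.search(equation)
    if match is None:
        return None
    a, sign, b, c = match.groups()
    if int(a) == 0:
        return None  # no unique solution when the coefficient is zero
    offset = int(b) if b else 0
    if sign == "-":
        offset = -offset
    return Fraction(int(c) - offset, int(a))


def answer_is_correct(equation, claimed, tolerance=1e-9):
    """Compare a claimed numeric answer against the exact solution."""
    exact = solve_linear(equation)
    if exact is None:
        return None  # unsupported format, so no verdict
    return abs(float(exact) - float(claimed)) <= tolerance
```

Applied to the sample output, this check accepts questions 1, 2, 3, and 5 and rejects question 4: `9*x+3=15` has the solution 4/3, not 1.0.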
