# SHAT AI - Question Generation Model

This notebook demonstrates how to use a fine-tuned GPT-based model to generate SHSAT-style math questions.
The model is fine-tuned from `gpt-3.5-turbo` on a JSONL dataset, and the notebook calls it through the `openai` Python library.


## 1. Library Imports

Import all required libraries for model interaction and file handling.


## 2. Load Dataset

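Load and inspect the dataset used to fine-tune the GPT model. The dataset should be in `.jsonl` format, with one training example per line. For `gpt-3.5-turbo`, each example is a chat-format `messages` list that pairs a user prompt with the assistant's completion.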
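
Before uploading, it can help to confirm that every line parses. The following is a minimal sketch, assuming each line is a JSON object with a non-empty `messages` list (the chat format used for `gpt-3.5-turbo` fine-tuning). The helper name `load_jsonl_examples` is hypothetical and not part of the original notebook.

```python
import json
from pathlib import Path


def load_jsonl_examples(path):
    """Read a JSONL file and return a list of parsed training examples.

    Raises ValueError naming the offending line if a line is not valid
    JSON or lacks a non-empty "messages" list (assumed chat format).
    """
    examples = []
    with Path(path).open("r", encoding="utf-8") as handle:
        for line_number, line in enumerate(handle, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines between records
            try:
                record = json.loads(line)
            except json.JSONDecodeError as exc:
                raise ValueError(f"line {line_number}: invalid JSON ({exc})") from exc
            messages = record.get("messages") if isinstance(record, dict) else None
            if not isinstance(messages, list) or not messages:
                raise ValueError(f"line {line_number}: missing 'messages' list")
            examples.append(record)
    return examples
```

Calling `load_jsonl_examples("/content/drive/MyDrive/shsat_finetune_data.jsonl")` and printing the first element is usually enough to spot formatting problems early.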


## 3. Model Configuration and Initialization

Set up the OpenAI API key and specify the fine-tuned model name.
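
One way to organise this step is sketched below. It assumes the key is stored in the `OPENAI_API_KEY` environment variable (for example, loaded from a `.env` file) and uses the fine-tuned model ID that appears in this notebook. The helper names `read_api_key` and `make_client` are hypothetical.

```python
import os


def read_api_key(env=None, name="OPENAI_API_KEY"):
    """Return the API key from a mapping (defaults to os.environ)."""
    env = os.environ if env is None else env
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set; add it to your .env file")
    return key


def make_client(api_key):
    """Create an OpenAI client (requires `pip install openai`, version 1 or later)."""
    from openai import OpenAI  # imported lazily so the helpers above load without it

    return OpenAI(api_key=api_key)


# Model ID reported when the fine-tuning job in this notebook succeeded.
FINE_TUNED_MODEL = "ft:gpt-3.5-turbo-0125:personal::BDKqehOH"
```

Failing early on a missing key gives a clearer error than the authentication failure that would otherwise surface on the first API call.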


## 4. Generate SHSAT Math Questions

Define a function to generate questions by passing a custom prompt to the fine-tuned model.
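
A minimal sketch of such a function follows. It assumes `client` is an `OpenAI` instance from the `openai` package (version 1 or later). The function name `generate_questions` and the default `temperature` value are illustrative choices, not taken from the original notebook.

```python
def generate_questions(client, model, prompt, temperature=0.7):
    """Send one user prompt to a chat model and return the reply text."""
    if not prompt or not prompt.strip():
        raise ValueError("prompt must be a non-empty string")
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt.strip()}],
        temperature=temperature,
    )
    # The first choice holds the generated text for a single-completion request.
    return response.choices[0].message.content
```

Passing the client in as an argument keeps the function easy to test with a stub object and avoids relying on a global variable defined in another cell.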


## 5. Example Usage

Use a sample prompt and display the model's response.
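
To display the response in a structured way, the reply text can be split into question and answer pairs. The sketch below assumes the model replies in the numbered `N. question` / `Answer: value` layout seen in this notebook's sample output; other layouts would need a different parser. The name `parse_questions` is hypothetical.

```python
import re

_QUESTION = re.compile(r"^\s*(\d+)\.\s*(.+?)\s*$")
_ANSWER = re.compile(r"^\s*Answer:\s*(.+?)\s*$", re.IGNORECASE)


def parse_questions(text):
    """Split a reply into (number, question, answer) tuples."""
    items = []
    current = None  # the most recent question still waiting for its answer
    for line in text.splitlines():
        answer = _ANSWER.match(line)
        question = _QUESTION.match(line)
        if answer and current is not None:
            items.append((current[0], current[1], answer.group(1)))
            current = None
        elif question:
            current = (int(question.group(1)), question.group(2))
    return items
```

Questions without a matching `Answer:` line are dropped, so comparing the number of parsed items with the number requested is a quick sanity check.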


## 6. Limitations

This model is trained on a limited dataset and may occasionally generate incorrect, unclear, or repetitive content.
Future versions can benefit from a larger and more diverse dataset.
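
Incorrect answers can be caught automatically for simple question types. In the sample output recorded in this notebook, the equation `9*x+3=15` is reported with the answer `1.0`, while the exact solution is 4/3. The sketch below checks single-variable linear equations of the form `a*x + b = c` with non-negative integer coefficients. The helper names are hypothetical and the parser deliberately covers only this narrow form.

```python
import re
from fractions import Fraction

_LINEAR = re.compile(
    r"(?P<a>\d+)\s*\*?\s*x\s*(?:(?P<sign>[+-])\s*(?P<b>\d+))?\s*=\s*(?P<c>\d+)"
)


def solve_linear(equation):
    """Solve a*x + b = c for x; return a Fraction, or None if unrecognised."""
    match = _LINEAR.search(equation)
    if match is None:
        return None
    a = int(match.group("a"))
    if a == 0:
        return None  # no unique solution
    b = int(match.group("b") or 0)
    if match.group("sign") == "-":
        b = -b
    c = int(match.group("c"))
    return Fraction(c - b, a)


def answer_is_correct(equation, claimed, tolerance=1e-9):
    """Compare a claimed numeric answer with the exact solution."""
    exact = solve_linear(equation)
    if exact is None:
        return None  # equation form not supported by this sketch
    return abs(float(exact) - float(claimed)) <= tolerance
```

A check like this could be run over generated questions before they are shown to students, with unsupported forms (a `None` result) routed to manual review.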


## 7. Future Improvements

- Expand dataset coverage for different SHSAT math topics.
- Improve formatting and answer-explanation clarity.
- Build a web interface to input prompts and get real-time questions.


In [None]:
!pip install openai




In [None]:
!pip install python-dotenv




In [None]:
from google.colab import drive
drive.mount('/content/drive')


Mounted at /content/drive


In [None]:


import os

# Check that the dataset exists on the mounted Drive
file_path = "/content/drive/MyDrive/shsat_finetune_data.jsonl"
if os.path.exists(file_path):
    print("Dataset uploaded successfully!")
else:
    print("Please upload the dataset manually in Colab.")


Dataset uploaded successfully!


In [None]:
pip install --upgrade openai




In [None]:
import os
from dotenv import load_dotenv

# 🔥 Explicitly load .env from Google Drive
env_path = "/content/drive/MyDrive/.env"
load_dotenv(dotenv_path=env_path)

api_key = os.getenv("OPENAI_API_KEY")

# Test
print("API Key found:", api_key is not None)


API Key found: True


In [None]:
from openai import OpenAI

client = OpenAI(api_key=api_key)

file_path = "/content/drive/MyDrive/shsat_finetune_data.jsonl"

with open(file_path, "rb") as file:
    uploaded_file = client.files.create(
        file=file,
        purpose="fine-tune"
    )

print("File uploaded:")
print("ID:", uploaded_file.id)


File uploaded:
ID: file-YGH6MpywCGFqBpV3YRNBsr


In [None]:
!ls -a


.  ..  .config	drive  sample_data


In [None]:

import os
from dotenv import load_dotenv
from openai import OpenAI

# 🔥 Explicitly load .env from Google Drive
env_path = "/content/drive/MyDrive/.env"
load_dotenv(dotenv_path=env_path)

api_key = os.getenv("OPENAI_API_KEY")
client = OpenAI(api_key=api_key)

# Define the file path
file_path = "/content/drive/MyDrive/shsat_finetune_data.jsonl"

# Upload the file for fine-tuning
with open(file_path, "rb") as file:
    upload_response = client.files.create(
        file=file,
        purpose="fine-tune"
    )

file_id = upload_response.id
print("Uploaded File ID:", file_id)

# Start fine-tuning after successful file upload
fine_tune_response = client.fine_tuning.jobs.create(
    training_file=file_id,
    model="gpt-3.5-turbo"
)

fine_tune_id = fine_tune_response.id
print("Fine-tuning Job ID:", fine_tune_id)


Uploaded File ID: file-8BzzDjh1EwPwrhJoxpSSDH
Fine-tuning Job ID: ftjob-E62QUdllEwKYonNFbdFn8ZnP


In [None]:

import os
from dotenv import load_dotenv
from openai import OpenAI

# 🔥 Explicitly load .env from Google Drive
env_path = "/content/drive/MyDrive/.env"
load_dotenv(dotenv_path=env_path)

api_key = os.getenv("OPENAI_API_KEY")
client = OpenAI(api_key=api_key)

# List the models available to this API key
models = client.models.list()
model_ids = [model.id for model in models.data]

# Print available models
print("Available models:", model_ids)


Available models: ['gpt-4o-mini-transcribe', 'gpt-4o-audio-preview-2024-12-17', 'dall-e-3', 'dall-e-2', 'gpt-4o-audio-preview-2024-10-01', 'o1-mini-2024-09-12', 'omni-moderation-latest', 'omni-moderation-2024-09-26', 'gpt-4o-realtime-preview-2024-10-01', 'babbage-002', 'o1-mini', 'tts-1-hd-1106', 'gpt-4o-audio-preview', 'text-embedding-3-large', 'gpt-4', 'gpt-4o-2024-05-13', 'gpt-4o-realtime-preview', 'tts-1-hd', 'gpt-4o-mini-audio-preview', 'o1-preview-2024-09-12', 'gpt-3.5-turbo-instruct-0914', 'gpt-4o-mini-search-preview', 'tts-1-1106', 'davinci-002', 'gpt-3.5-turbo-1106', 'gpt-4o-search-preview', 'gpt-4-turbo', 'gpt-4o-realtime-preview-2024-12-17', 'gpt-3.5-turbo-instruct', 'gpt-3.5-turbo', 'gpt-4-turbo-preview', 'gpt-4o-mini-search-preview-2025-03-11', 'gpt-4o-mini-realtime-preview', 'gpt-4o-mini', 'chatgpt-4o-latest', 'whisper-1', 'gpt-3.5-turbo-0125', 'gpt-4o-2024-08-06', 'gpt-4-turbo-2024-04-09', 'gpt-3.5-turbo-16k', 'gpt-4o', 'gpt-4o-mini-realtime-preview-2024-12-17', 'gpt-4-1

In [None]:
import os
import time

from dotenv import load_dotenv
from openai import OpenAI

# 🔥 Explicitly load .env from Google Drive
env_path = "/content/drive/MyDrive/.env"
load_dotenv(dotenv_path=env_path)

api_key = os.getenv("OPENAI_API_KEY")
client = OpenAI(api_key=api_key)

# Upload JSONL (the context manager closes the file after the upload)
with open("/content/drive/MyDrive/shsat_finetune_data.jsonl", "rb") as file:
    upload_response = client.files.create(
        file=file,
        purpose="fine-tune"
    )
file_id = upload_response.id
print(f"File uploaded. File ID: {file_id}")

# Start the fine-tuning job
fine_tune_response = client.fine_tuning.jobs.create(
    training_file=file_id,
    model="gpt-3.5-turbo"
)
job_id = fine_tune_response.id
print(f"🚀 Fine-tuning started. Job ID: {job_id}")

# Monitor job status until it reaches a terminal state
while True:
    job = client.fine_tuning.jobs.retrieve(job_id)
    print(f"📡 Status: {job.status}")
    if job.status in ["succeeded", "failed", "cancelled"]:
        break
    time.sleep(15)

# If successful, get the model ID
if job.status == "succeeded":
    print(f"\n Fine-tuned model ready: {job.fine_tuned_model}")
else:
    print(f"\n❌ Fine-tuning ended with status '{job.status}'. Check your dataset and try again.")


File uploaded. File ID: file-UzSiEPgVZcSyiiw93Suxtv
🚀 Fine-tuning started. Job ID: ftjob-jAppyEr4puCDNcTYxy549Jcq
📡 Status: validating_files
📡 Status: validating_files
📡 Status: validating_files
📡 Status: validating_files
📡 Status: validating_files
📡 Status: running
📡 Status: running
📡 Status: running
📡 Status: running
📡 Status: running
📡 Status: running
📡 Status: running
📡 Status: running
📡 Status: running
📡 Status: running
📡 Status: running
📡 Status: running
📡 Status: running
📡 Status: running
📡 Status: running
📡 Status: running
📡 Status: running
📡 Status: running
📡 Status: running
📡 Status: running
📡 Status: running
📡 Status: running
📡 Status: running
📡 Status: running
📡 Status: running
📡 Status: running
📡 Status: running
📡 Status: running
📡 Status: running
📡 Status: running
📡 Status: running
📡 Status: running
📡 Status: running
📡 Status: running
📡 Status: running
📡 Status: running
📡 Status: running
📡 Status: running
📡 Status: running
📡 Status: running
📡 Status: running
📡 Status: run
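
The polling loop above runs until the job reaches a terminal state, which can keep a Colab cell busy indefinitely if the job stalls. A bounded variant is sketched below, assuming the same `client.fine_tuning.jobs.retrieve` call. The function name `wait_for_job` and the default timeout are illustrative.

```python
import time

TERMINAL_STATES = {"succeeded", "failed", "cancelled"}


def wait_for_job(client, job_id, poll_seconds=15, timeout_seconds=3600, sleep=time.sleep):
    """Poll a fine-tuning job until it finishes or the timeout elapses."""
    waited = 0
    while True:
        job = client.fine_tuning.jobs.retrieve(job_id)
        if job.status in TERMINAL_STATES:
            return job
        if waited >= timeout_seconds:
            raise TimeoutError(f"job {job_id} still {job.status} after {waited}s")
        sleep(poll_seconds)
        waited += poll_seconds
```

The `sleep` parameter exists only so the function can be exercised without real delays. In normal use the default `time.sleep` applies.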

In [None]:
!pip install --upgrade openai


Collecting openai
  Downloading openai-1.68.0-py3-none-any.whl.metadata (25 kB)
Collecting sounddevice>=0.5.1 (from openai)
  Downloading sounddevice-0.5.1-py3-none-any.whl.metadata (1.4 kB)
Downloading openai-1.68.0-py3-none-any.whl (605 kB)
Downloading sounddevice-0.5.1-py3-none-any.whl (32 kB)
Installing collected packages: sounddevice, openai
  Attempting uninstall: openai
    Found existing installation: openai 1.66.3
    Uninstalling openai-1.66.3:
      Successfully uninstalled openai-1.66.3
Successfully installed openai-1.68.0 sounddevice-0.5.1


In [None]:
import os
from dotenv import load_dotenv
from openai import OpenAI

# 🔥 Explicitly load .env from Google Drive
env_path = "/content/drive/MyDrive/.env"
load_dotenv(dotenv_path=env_path)

api_key = os.getenv("OPENAI_API_KEY")
client = OpenAI(api_key=api_key)

# Send a single user prompt to the fine-tuned model
response = client.chat.completions.create(
    model="ft:gpt-3.5-turbo-0125:personal::BDKqehOH",
    messages=[
        {"role": "user", "content": "Generate 5 math question by algebra and give answer students."}
    ]
)

print(response.choices[0].message.content)


1. What is the solution to the equation 3x + 5 = 17?
Answer: 4.0
2. What is the solution to the equation 8x - 3 = 17?
Answer: 2.5
3. What is the solution to the equation 4*x=20?
Answer: 5.0
4. What is the solution to the equation 9*x+3=15?
Answer: 1.0
5. What is the solution to the equation 5*x-8=7?
Answer: 3.0


In [2]:
!git init


hint: Using 'master' as the name for the initial branch. This default branch name
hint: is subject to change. To configure the initial branch name to use in all
hint: 
hint: 	git config --global init.defaultBranch <name>
hint: 
hint: Names commonly chosen instead of 'master' are 'main', 'trunk' and
hint: 'development'. The just-created branch can be renamed via this command:
hint: 
hint: 	git branch -m <name>
Initialized empty Git repository in /content/.git/


In [3]:
!apt-get install git


Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
git is already the newest version (1:2.34.1-1ubuntu1.12).
0 upgraded, 0 newly installed, 0 to remove and 29 not upgraded.


In [None]:
!git remote add origin

fatal: not a git repository (or any of the parent directories): .git


In [6]:
!git config --global user.email "mirrahatfiverr@example.com"
!git config --global user.name "mirrahat"


In [7]:
!git checkout -b devb

Switched to a new branch 'devb'


In [9]:
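import os

# Read the GitHub token from the environment instead of hard-coding it.
# GITHUB_TOKEN is an assumed variable name; it could live in the same .env file.
token = os.environ["GITHUB_TOKEN"]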
repo = 'mirrahat/SHAT-AI-MATH'
branch = "devb"

!git add .
!git commit -m "Updated project files"
!git push https://{token}@github.com/{repo}.git {branch} --force

On branch devb
nothing to commit, working tree clean
Enumerating objects: 28, done.
Counting objects: 100% (28/28), done.
Delta compression using up to 2 threads
Compressing objects: 100% (21/21), done.
Writing objects: 100% (28/28), 8.42 MiB | 1.55 MiB/s, done.
Total 28 (delta 5), reused 0 (delta 0), pack-reused 0
remote: Resolving deltas: 100% (5/5), done.
remote: 
remote: Create a pull request for 'devb' on GitHub by visiting:
remote:      https://github.com/mirrahat/SHAT-AI-MATH/pull/new/devb
remote: 
To https://github.com/mirrahat/SHAT-AI-MATH.git
 * [new branch]      devb -> devb


In [5]:
!git add .
!git push  origin dev

error: src refspec dev does not match any
error: failed to push some refs to 'https://github.com/mirrahat/SHAT-AI-MATH.git'

In [None]:
!git commit -m "uploaded"

Author identity unknown

*** Please tell me who you are.

Run

  git config --global user.email "you@example.com"
  git config --global user.name "Your Name"

to set your account's default identity.
Omit --global to set the identity only in this repository.

fatal: unable to auto-detect email address (got 'root@0eebb413227d.(none)')


In [None]:
!git config --global user.name "mirrahat"
!git config --global user.email "mirrahatfiverr@example.com"


In [None]:
!git checkout -b devb

Switched to a new branch 'devb'


In [None]:
!git push origin devb


fatal: could not read Password for 'https://ghp_XYZ1234567890@github.com': No such device or address


In [1]:
!git remote -v


fatal: not a git repository (or any of the parent directories): .git
