**Fine-Tuning babbage-002 for Stock Price Prediction**

We'll walk through fine-tuning OpenAI's `babbage-002` base model for stock price prediction step by step, from preparing the data to querying the fine-tuned model, with code for each stage.

**Step 1: Load and Explore the Data**

First, let's load the provided stock data and examine its structure.

In [None]:
!pip install --upgrade openai --quiet

from google.colab import userdata
OPEN_AI_KEY=userdata.get('opeaikey4o')

from openai import OpenAI
import matplotlib.pyplot as plt
import time


client = OpenAI(api_key=OPEN_AI_KEY)

In [8]:
import pandas as pd

# Load the CSV file
file_path = 'GOOGLE_stock.csv'
data = pd.read_csv(file_path)

# Display the first few rows of the data
print(data.head())


In [None]:
len(data)

In [4]:
training_data_df=data[:int(len(data)*0.8)]
validation_data_df=data[int(len(data)*0.8):]

In [None]:
len(training_data_df)

In [None]:
len(validation_data_df)

In [8]:
training_data_df.head()

In [None]:
validation_data_df.head()

The cells above load the CSV file, display the first few rows so we can see what the data looks like, and split it chronologically: the first 80% of rows for training and the last 20% for validation. Typically, the columns include Date, Open, High, Low, Close, and Volume.

**Step 2: Preprocess the Data**

We need to preprocess the data, converting it into a format that GPT models can work with. Since these models are designed to handle text, we'll create a textual representation of each row.
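
As a rough illustration of that textual representation, the sketch below turns one day's values into a prompt string and the next day's close into a completion string. The helper names `row_to_prompt` and `close_to_completion`, the rounding to two decimals, and the sample numbers are all assumptions made for this example; they are not taken from the CSV or from the cells that follow.

```python
def row_to_prompt(row, decimals=2):
    """Format one day's Open/High/Low/Volume as a single line of text."""
    fields = ("Open", "High", "Low")
    parts = [f"{name}: {round(float(row[name]), decimals)}" for name in fields]
    parts.append(f"Volume: {int(row['Volume'])}")
    return ", ".join(parts)


def close_to_completion(close, decimals=2):
    """Format the target closing price; the leading space separates it from the prompt."""
    return f" Close: {round(float(close), decimals)}"


# Made-up values, used only to show the shape of the text
previous_day = {"Open": 101.234, "High": 103.5, "Low": 100.111, "Volume": 1200300}
prompt_text = row_to_prompt(previous_day)
completion_text = close_to_completion(102.456)
print(prompt_text + completion_text)
```

Rounding is optional; it simply keeps the strings short and consistent from row to row.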

**Scaling the Data**

Scaling the numerical features can make the training process smoother and keep the model's outputs within a reasonable range. The cells below pass the raw values through unchanged, so treat scaling as an optional step.
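
One way this could look is min-max scaling, sketched below under the assumption that the minimum and maximum are taken from the training split only and then reused for the validation split. The helpers `fit_min_max` and `apply_min_max` and the sample prices are hypothetical and are not part of this notebook's pipeline.

```python
def fit_min_max(values):
    """Return the (min, max) of the training values."""
    values = list(values)
    return min(values), max(values)


def apply_min_max(values, lo, hi):
    """Scale values with a previously fitted (lo, hi) pair."""
    span = (hi - lo) or 1.0  # avoid dividing by zero for a constant column
    return [(v - lo) / span for v in values]


# Made-up prices, used only to show the mechanics
train_close = [100.0, 110.0, 120.0]
valid_close = [115.0, 130.0]

lo, hi = fit_min_max(train_close)
scaled_train = apply_min_max(train_close, lo, hi)
scaled_valid = apply_min_max(valid_close, lo, hi)  # may exceed 1.0 for unseen highs
print(scaled_train, scaled_valid)
```

With the notebook's DataFrames, the inputs would be columns such as `training_data_df['Close']`. A model's scaled prediction would then need to be mapped back with `prediction * (hi - lo) + lo`.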

# Preprocessing data

In [8]:
def create_text_dataset(data, look_back=5):
    """Pair one day's Open/High/Low/Volume with the following day's Close."""
    X, y = [], []
    for i in range(len(data) - look_back):
        input_sequence = data.iloc[i:i + look_back]  # Window of look_back rows (positional indexing)
        target = data.iloc[i + look_back]['Close']  # Close of the day right after the window
        last_day = input_sequence.iloc[-1]  # Only the window's last row goes into the prompt
        input_text = f"Open: {last_day['Open']}, High: {last_day['High']}, Low: {last_day['Low']}, Volume: {last_day['Volume']}"
        output_text = f" Close: {target}"
        X.append(input_text)
        y.append(output_text)
    return X, y

look_back = 2  # Window length; only the last row of each window is used in the prompt
X_text, y_text = create_text_dataset(training_data_df, look_back)

# Combine input and output text for fine-tuning
train_texts = [input_text + output_text for input_text, output_text in zip(X_text, y_text)]

# Validation data

In [None]:
# Reuse create_text_dataset from the training cell above
look_back = 5  # Window length (the training cell used 2); only each window's last row reaches the prompt
X_text, y_text = create_text_dataset(validation_data_df, look_back)

# Combine input and output text for fine-tuning
Validation_texts = [input_text + output_text for input_text, output_text in zip(X_text, y_text)]

# Prepare the data in the format required for fine-tuning


In [None]:

training_data = [{"prompt": x.split(" Close:")[0], "completion": x.split(" Close:")[1]} for x in train_texts]


In [None]:

validation_data = [{"prompt": x.split(" Close:")[0], "completion": x.split(" Close:")[1]} for x in Validation_texts]


### We then need to save our data as .jsonl files, with each line holding one prompt/completion training example.



In [None]:
import json
import openai
import os
import pandas as pd
from pprint import pprint
import time
import matplotlib.pyplot as plt

In [None]:
def write_jsonl(data_list:list,filename:str):
  with open(filename,"w") as out:
    for data in data_list:
      jout=json.dumps(data)+"\n"
      out.write(jout)



In [None]:
training_file_name="google_stock_training.jsonl"
write_jsonl(training_data,training_file_name)

validation_file_name="google_stock_validation.jsonl"
write_jsonl(validation_data,validation_file_name)

In [None]:
!head -n 5 google_stock_training.jsonl

In [None]:
!head -n 5 google_stock_validation.jsonl
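
Before sending the files anywhere, it can help to confirm that every line is a standalone JSON object with both a `prompt` and a `completion` key. The sketch below is one way to do that; `check_jsonl_lines` is a hypothetical helper and the sample lines are made up for illustration.

```python
import json


def check_jsonl_lines(lines, required_keys=("prompt", "completion")):
    """Return (line_number, problem) tuples for lines that are not valid records."""
    problems = []
    for number, line in enumerate(lines, start=1):
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            problems.append((number, "not valid JSON"))
            continue
        if not isinstance(record, dict):
            problems.append((number, "not a JSON object"))
            continue
        missing = [key for key in required_keys if key not in record]
        if missing:
            problems.append((number, f"missing keys: {missing}"))
    return problems


# Made-up lines: one good record, one missing a key, one that is not JSON
sample_lines = [
    json.dumps({"prompt": "Open: 1.0, High: 2.0, Low: 0.5, Volume: 100", "completion": " 1.5"}),
    '{"prompt": "Open: 1.0"}',
    "not json",
]
problems = check_jsonl_lines(sample_lines)
print(problems)
```

To run it on the real files, pass an open file handle, for example `check_jsonl_lines(open(training_file_name))`, since iterating over a file yields one line at a time.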

In [None]:
!openai tools fine_tunes.prepare_data -f google_stock_training.jsonl  -q

In [None]:
!openai tools fine_tunes.prepare_data -f google_stock_validation.jsonl  -q

In [None]:
train_file=client.files.create(file=open("google_stock_training_prepared.jsonl", "rb"),purpose='fine-tune')
valid_file=client.files.create(file=open("google_stock_validation_prepared.jsonl", "rb"),purpose='fine-tune')


# Upload files
The cell above uploads the prepared files to the Files endpoint so that the fine-tuning job can reference them by ID.



**Step 3: Fine-Tune the babbage-002 Model**

Fine-tuning `babbage-002` requires access to OpenAI's fine-tuning API. The process has two parts:

Prepare the Dataset: Ensure your data is in the prompt/completion format that `babbage-002` fine-tuning expects.

Use OpenAI's API for Fine-Tuning: Create a fine-tuning job that references the uploaded training and validation files.

**Creating the Fine-Tuning Job**

The dataset was already structured into prompts and completions in Step 2, so we can create the job directly from the uploaded file IDs.

In [None]:
fine_tuning_job=client.fine_tuning.jobs.create(training_file=train_file.id,validation_file=valid_file.id,model="babbage-002")
print(fine_tuning_job)

FineTuningJob(id='ftjob-oSloWqk2PdQcQkxlD91ShgcC', created_at=1724350987, error=Error(code=None, message=None, param=None), fine_tuned_model=None, finished_at=None, hyperparameters=Hyperparameters(n_epochs='auto', batch_size='auto', learning_rate_multiplier='auto'), model='babbage-002', object='fine_tuning.job', organization_id='org-PRq3tYU2rFVBkygKcRdWRjwB', result_files=[], seed=1157125374, status='validating_files', trained_tokens=None, training_file='file-ySLbv8wO8ZuPjATSBzA80aXN', validation_file='file-zH1yjtOZuOD58pgPk8qprRaW', estimated_finish=None, integrations=[], user_provided_suffix=None)


In [None]:
retrieved_jobs=client.fine_tuning.jobs.retrieve(fine_tuning_job.id)
status=retrieved_jobs.status
print(status)

validating_files


In [None]:
fine_tuning_job_id=fine_tuning_job.id

In [None]:
fine_tuning_job_id='ftjob-oSloWqk2PdQcQkxlD91ShgcC'

In [None]:
while True:
  time.sleep(5)
  retrieved_jobs=client.fine_tuning.jobs.retrieve(fine_tuning_job_id)
  status=retrieved_jobs.status
  print(status)
  # Stop on any terminal status; without 'cancelled' the loop would never exit for a cancelled job
  if status in ('succeeded','failed','cancelled'):
    break

In [None]:
fine_tune_results = client.fine_tuning.jobs.retrieve(fine_tuning_job_id)
ft_stock_model = fine_tune_results.fine_tuned_model
ft_stock_model

In [None]:
response=client.fine_tuning.jobs.list_events(fine_tuning_job_id)
events=response.data
events.reverse()

for event in events:
  print(event)

In [None]:
steps=[]
train_loss=[]
for e in events:
  # Only metrics events carry a step and a training loss
  if e.data and 'train_loss' in e.data:
    steps.append(e.data['step'])
    train_loss.append(e.data['train_loss'])
print(steps)
print(train_loss)

In [None]:
plt.plot(steps,train_loss,marker='o',linestyle='-')

In [None]:
response = client.fine_tuning.jobs.retrieve(fine_tuning_job_id)
fine_tuned_model_id = response.fine_tuned_model

if fine_tuned_model_id is None:
    raise RuntimeError("Fine-tuned model ID not found. Your job has likely not been completed yet.")

print("Fine-tuned model ID:", fine_tuned_model_id)

In [None]:
system_message="You are a helpful stock market price prediction assistant. You have to predict the completion value for the given stock data"

In [None]:
!head -n 5 google_stock_validation_prepared.jsonl

In [None]:
test = pd.read_json('google_stock_validation_prepared.jsonl', lines=True)
test.head()

In [None]:
test['prompt'][0]

We need to use the same separator after the prompt that we used during fine-tuning. In this case it is ` ->`, the suffix that the `prepare_data` tool added (its output is shown below):

Based on the analysis we will perform the following actions:
- [Recommended] Add a suffix separator ` ->` to all prompts [Y/n]: Y
- [Recommended] Add a suffix ending `\n` to all completions [Y/n]: Y

Since we're predicting a numeric closing price rather than a class label, we set the temperature to 0 for deterministic output, allow enough tokens for the full number, and stop at the `\n` suffix that ends every training completion.



In [None]:
fine_tune_job = client.fine_tuning.jobs.retrieve(fine_tuning_job_id)


In [None]:
ft_model = fine_tune_job.fine_tuned_model
# note that this calls the legacy completions api - https://platform.openai.com/docs/api-reference/completions

res=client.completions.create(
    model=ft_model,
    prompt=test['prompt'][0],  # prompts in the prepared file already end with the ` ->` separator
    max_tokens=10,  # room for all digits of a price; a single token would truncate it
    temperature=0,
    stop=["\n"],  # completions were trained to end with a newline
)
res.choices[0].text

In [None]:
print(res)
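
The completion comes back as text, so it has to be converted to a number before it can be compared with the actual close. The sketch below shows one possible approach: it pulls the first number out of each completion and computes the mean absolute error. The helpers `parse_predicted_close` and `mean_absolute_error`, along with the sample strings and prices, are assumptions for illustration and are not outputs of the fine-tuned model.

```python
import re


def parse_predicted_close(completion_text):
    """Return the first number found in the completion, or None if there is none."""
    match = re.search(r"-?\d+(?:\.\d+)?", completion_text)
    return float(match.group()) if match else None


def mean_absolute_error(actual, predicted):
    """Average absolute difference, skipping completions that could not be parsed."""
    pairs = [(a, p) for a, p in zip(actual, predicted) if p is not None]
    if not pairs:
        return None
    return sum(abs(a - p) for a, p in pairs) / len(pairs)


# Made-up completions and prices, used only to show the mechanics
sample_outputs = [" 101.5\n", " 99.0", "n/a"]
sample_actual = [100.0, 100.0, 100.0]

predictions = [parse_predicted_close(text) for text in sample_outputs]
mae = mean_absolute_error(sample_actual, predictions)
print(predictions, mae)
```

In the notebook, the text to parse would be `res.choices[0].text`, and the actual value would come from the matching `completion` entry in `test`.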