# Reading & Preprocessing the dataset


In [6]:
prompt = """
you are an IELTS examiner. your task is to evaluate a writing section in an IELTS academic
exam. you have to provide overall band score in <BAND_SCORE> </BAND_SCORE> tags and detailed evaluation in <EVALUATION></EVALUATION> tags . I will provide you the grading
criteria in <CRITERIA> </CRITERIA> tags. The user will send you the task and his answer and you should respond with a feedback on how well does the user follow the grading criteria and his score. Provide his score in this format <Score>Score</Score>.
<CRITERIA>
TASK RESPONSE (TR)
For Task 2 of both AC and GT Writing tests, candidates are required to formulate and
develop a position in relation to a given prompt in the form of a question or
statement, using a minimum of 250 words. Ideas should be supported by evidence,
and examples may be drawn from a candidate’s own experience.
The TR criterion assesses:
▪ how fully the candidate responds to the task.
▪ how adequately the main ideas are extended and supported.
▪ how relevant the candidate’s ideas are to the task.
▪ how clearly the candidate opens the discourse, establishes their position and
formulates conclusions.
▪ how appropriate the format of the response is to the task.
COHERENCE AND COHESION (CC)
This criterion is concerned with the overall organisation and logical development of
the message: how the response organises and links information, ideas and language.
Coherence refers to the linking of ideas through logical sequencing, while cohesion
refers to the varied and appropriate use of cohesive devices (e.g. logical connectors,
conjunctions and pronouns) to assist in making clear the relationships between and
within sentences.
The CC criterion assesses:
▪ the coherence of the response via the logical organisation of information
and/or ideas, or the logical progression of the argument.
▪ the appropriate use of paragraphing for topic organisation and presentation.
▪ the logical sequencing of ideas and/or information within and across
paragraphs.
▪ the flexible use of reference and substitution (e.g. definite articles, pronouns).
▪ the appropriate use of discourse markers to clearly mark the stages in a
response, e.g. [First of all | In conclusion], and to signal the relationship between ideas and/or information, e.g. [as a result | similarly].

LEXICAL RESOURCE (LR)
This criterion refers to the range of vocabulary the candidate has used and the
accuracy and appropriacy of that use in terms of the specific task.
The LR criterion assesses:
▪ the range of general words used (e.g. the use of synonyms to avoid repetition).
▪ the adequacy and appropriacy of the vocabulary (e.g. topic-specific items,
indicators of writer’s attitude).
▪ the precision of word choice and expression.
▪ the control and use of collocations, idiomatic expressions and sophisticated
phrasing.
▪ the density and communicative effect of errors in spelling.
▪ the density and communicative effect of errors in word formation.
GRAMMATICAL RANGE AND ACCURACY (GRA)
This criterion refers to the range and accurate use of the candidate’s grammatical
resource via the candidate’s writing at sentence level.
The GRA criterion assesses:
▪ the range and appropriacy of structures used in a given response (e.g. simple,
compound and complex sentences).
▪ the accuracy of simple, compound and complex sentences.
▪ the density and communicative effect of grammatical errors.
▪ the accurate and appropriate use of punctuation.
</CRITERIA>
"""

In [4]:
import pandas as pd
import json

df = pd.read_csv('datasets/ielts_buddy_dataset.csv')
df.head()

Unnamed: 0,Question,Answer,Feedback,Final Score
0,As countries have developed there has been a t...,Many countries around the world are becoming r...,The family size essay is well-organized - the ...,9.0
1,Many men and women are making the decision to ...,"In the past, it was a natural step that a coup...",This essay on having children later in life w...,8.5
2,"Today, people in many countries have to spend ...","In today's world, many individuals find that t...",The essay would achieve a high score for IELTS...,8.5
3,"Nowadays, families are not as close as in the ...","In today's world, many individuals find that t...",This family values essay would merit a high IE...,9.0
4,"In recent times, many people are making the de...",There has been a tendency in many countries ov...,This living alone essay would receive a high I...,9.0


In [11]:
# Converting the dataset to the format needed for fine-tuning
dataset = []
for index, row in df.iterrows():
    message = {
        "messages": [
            {"role": "system", "content": prompt},
            {"role": "user", "content": f"Here is the task:\n <Task>{row['Question']}</Task> \n And here is my answer: \n <Answer>{row['Answer']}</Answer>"},
            {"role": "assistant", "content": f"Feedback: {row['Feedback']}\n <Score>{row['Final Score']}</Score>."}
        ]
    }
    dataset.append(message)

print(dataset[0])

{'messages': [{'role': 'system', 'content': '\nyou are an IELTS examiner. your task is to evaluate a writing section in an IELTS academic\nexam. you have to provide overall band score in <BAND_SCORE> </BAND_SCORE> tags and detailed evaluation in <EVALUATION></EVALUATION> tags . I will provide you the grading\ncriteria in <CRITERIA> </CRITERIA> tags. The user will send you the task and his answer and you should respond with a feedback on how well does the user follow the grading criteria and his score. Provide his score in this format <Score>Score</Score>.\n<CRITERIA>\nTASK RESPONSE (TR)\nFor Task 2 of both AC and GT Writing tests, candidates are required to formulate and\ndevelop a position in relation to a given prompt in the form of a question or\nstatement, using a minimum of 250 words. Ideas should be supported by evidence,\nand examples may be drawn from a candidate’s own experience.\nThe TR criterion assesses:\n▪ how fully the candidate responds to the task.\n▪ how adequately the

In [12]:
# Saving the dataset in jsonl format
with open('datasets/ielts_buddy_dataset.jsonl', 'w') as outfile:
    for entry in dataset:
        json.dump(entry, outfile)
        outfile.write('\n')

## Fine-tuning

In [18]:
import os
import openai
from utils import *

api_key = open_file('../openapikey.txt')

openai.api_key = api_key

# Uploading the file
with open('datasets/ielts_buddy_dataset.jsonl', 'rb') as file:
  response = openai.File.create(file=file, purpose='fine-tune')
  file_id = response['id']
  print("File Uploaded Successfully with ID: ", file_id)

File Uploaded Successfully with ID:  file-vFZOLs5IxtUWMiFD1oO8jGQS


In [20]:
model_name = "gpt-3.5-turbo"

# Creating Finetuning job
response = openai.FineTuningJob.create(
    training_file=file_id,
    model=model_name
)

job_id = response['id']
print("Job Created Successfully with ID: ", job_id)

Job Created Successfully with ID:  ftjob-1lmYuHosNzjxv113ho6cvfE6
