# Development of a Practice System for Job Interview (PSJI) using LLM

## Contents

>[Development of a Practice System for Job Interview (PSJI) using LLM](#scrollTo=VycP4gsjHmCp)

>>[Contents](#scrollTo=h1bAmKuR8KKj)

>>[Installing required libraries](#scrollTo=9lODnqus6-8U)

>>[GPT-3.5 Turbo](#scrollTo=2iI1aqS67H9w)

>>[Fine-tuning GPT-3.5 Turbo](#scrollTo=rfk6ySB67ME_)

>>>[Prepare Data](#scrollTo=fIyTVBR5I66Q)

>>>[Upload files](#scrollTo=MO5HkqpuJTYe)

>>>[Create a fine-tuning job](#scrollTo=0pBGW_OlJKyH)

>>>[Use the fine-tuned model](#scrollTo=aeZOaacmKMxT)

>>[GPT-4](#scrollTo=LofpL_gu7QFB)

>>[Evaluations](#scrollTo=rEu04CGUqRjQ)



## Installing required libraries

In [None]:
# This notebook uses the legacy (pre-1.0) openai interface, e.g. openai.ChatCompletion
!pip install "openai<1.0" tensorrt
!pip install spacy scikit-learn
!python -m spacy download en_core_web_md
!pip install --upgrade urllib3
!pip install bert-score

In [None]:
import openai
import pandas as pd
import json
import spacy
from sklearn.metrics.pairwise import cosine_similarity
import warnings
warnings.filterwarnings('ignore')

## GPT-3.5 Turbo

In [None]:
system_prompt = '''You are a job interviewer.
                You will provide feedback on user's answers to your questions.
                Provide the feedback in two parts:
                1. Good feedback 2. To improve '''

assistant_prompt = "How do you handle receiving constructive criticism?"
user_response = "I view constructive criticism as an opportunity to grow and improve. I always listen attentively to feedback, reflect on it, and work on implementing the suggested improvements. I also appreciate open communication and believe it fosters a collaborative and productive work environment."
reference = "Good feedback: The answer demonstrates a positive and receptive attitude towards constructive criticism. Shows a proactive approach to personal growth and development. To improve: Could mention a specific example of a time when they received and successfully implemented constructive criticism. Might discuss how they encourage and facilitate feedback from colleagues or superiors."

In [None]:
results = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    temperature=0.2,
    top_p=0.1,
    messages=[
        {"role": "system", "content": system_prompt},
        # The interviewer's question is sent as the assistant turn
        {"role": "assistant", "content": assistant_prompt},
        # The candidate's answer is sent as the user turn
        {"role": "user", "content": user_response},
    ]
)

feedback_hypothesis_non_tuned = results["choices"][0]["message"]["content"]
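
The request above sends three turns: the system instructions, the interviewer's question as the assistant turn, and the candidate's answer as the user turn. A minimal sketch of a helper that builds this list is shown here; `build_messages` is a hypothetical name introduced for illustration and is not part of the original notebook.

```python
def build_messages(system_prompt, question, answer):
    """Return the three-turn message list used for one feedback request."""
    return [
        {"role": "system", "content": system_prompt},
        {"role": "assistant", "content": question},  # interviewer's question
        {"role": "user", "content": answer},         # candidate's answer
    ]


# Example values; any question/answer pair could be used here.
messages = build_messages(
    "You are a job interviewer.",
    "How do you handle receiving constructive criticism?",
    "I view it as an opportunity to grow.",
)
print([m["role"] for m in messages])
```

The resulting list could be passed as the `messages` argument of the chat completion call, which keeps the base-model and fine-tuned requests structurally identical.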

## Fine-tuning GPT-3.5 Turbo

### 1. Prepare Data

In [None]:
# Load the CSV dataset into a pandas DataFrame
df = pd.read_csv('interview_dataset.csv', encoding='utf-8')

# Initialize an empty list to store the JSON objects
json_objects = []

# Iterate through each row in the DataFrame
for index, row in df.iterrows():
    # Create a dictionary in the desired format
    json_dict = {
        "messages": [
            { "role": "system", "content": f"You are a job interviewer. You will provide feedback on user's answers to your questions. You have asked this question: {row['Question']}. Provide the feedback in two parts: 1. Good feedback 2. To improve " },
            { "role": "user", "content": row['Answer'] },
            { "role": "assistant", "content": row['Feedback'] }
        ]
    }

    # Append the dictionary to the list
    json_objects.append(json_dict)

# Specify the output JSONL file
jsonl_file = 'processed_data.jsonl'

# Write the list of JSON objects to the JSONL file
with open(jsonl_file, 'w', encoding='utf-8') as jsonlfile:
    for json_obj in json_objects:
        # Serialize the dictionary to a JSON string and write it to the JSONL file
        jsonlfile.write(json.dumps(json_obj, ensure_ascii=False) + '\n')

print(f"Conversion completed. JSONL file saved as '{jsonl_file}'.")


Conversion completed. JSONL file saved as 'processed_data.jsonl'.
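
Before uploading, it can help to check that every line of the JSONL file parses and contains the system, user, and assistant turns in that order. The following is a minimal sketch using only the standard library; `validate_jsonl` and the sample records are hypothetical and written for illustration. It would, for instance, flag a row whose CSV cell was empty, since pandas reads that as NaN and it is then written as a non-string value.

```python
import json
import os
import tempfile

EXPECTED_ROLES = ["system", "user", "assistant"]


def validate_jsonl(path):
    """Return a list of (line_number, problem) tuples; an empty list means no problems were found."""
    problems = []
    with open(path, "r", encoding="utf-8") as f:
        for line_number, line in enumerate(f, start=1):
            try:
                record = json.loads(line)
            except json.JSONDecodeError as exc:
                problems.append((line_number, f"invalid JSON: {exc.msg}"))
                continue
            messages = record.get("messages") if isinstance(record, dict) else None
            if not isinstance(messages, list) or not all(isinstance(m, dict) for m in messages):
                problems.append((line_number, "missing or malformed 'messages' list"))
                continue
            roles = [m.get("role") for m in messages]
            if roles != EXPECTED_ROLES:
                problems.append((line_number, f"unexpected roles: {roles}"))
            elif not all(isinstance(m.get("content"), str) and m["content"].strip() for m in messages):
                problems.append((line_number, "empty or non-string content"))
    return problems


# Build a small sample file (one valid record, one invalid) so the sketch runs on its own.
sample_path = os.path.join(tempfile.mkdtemp(), "sample.jsonl")
good = {"messages": [
    {"role": "system", "content": "You are a job interviewer."},
    {"role": "user", "content": "I listen and act on feedback."},
    {"role": "assistant", "content": "Good feedback: Clear. To improve: Add an example."},
]}
bad = {"messages": [{"role": "user", "content": "No system or assistant turn."}]}
with open(sample_path, "w", encoding="utf-8") as f:
    f.write(json.dumps(good) + "\n")
    f.write(json.dumps(bad) + "\n")

problems = validate_jsonl(sample_path)
print(problems)
```

Calling `validate_jsonl('processed_data.jsonl')` on the real file would apply the same checks to the converted dataset.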


### 2. Upload files

In [None]:
def open_file(filepath):
  with open(filepath, 'r', encoding='utf-8') as infile:
    return infile.read()

def save_file(filepath, content):
  with open(filepath, 'a', encoding='utf-8') as outfile:
    outfile.write(content)

In [None]:
# Strip whitespace so a trailing newline in the key file does not end up in the key
api_key = open_file('apikey.txt').strip()
openai.api_key = api_key

In [None]:
with open('processed_data.jsonl', 'rb') as file:
  response = openai.File.create(
      file=file,
      purpose='fine-tune'
  )
file_id = response['id']

In [None]:
print(f'File uploaded with ID: {file_id}')

File uploaded with ID: file-t4tDVbPBaecptDAcRj0uplYO


### 3. Create a fine-tuning job

In [None]:
response = openai.FineTuningJob.create(
    training_file=file_id,
    model='gpt-3.5-turbo'
)

job_id = response['id']
print(f'Fine-tuning job created successfully with ID: {job_id}')

In [None]:
job_id = 'ftjob-8hsMvJ0JczaXVnIXakWNSYw2'
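
A fine-tuning job runs asynchronously, so the fine-tuned model name is only available once the job has finished. The sketch below polls a status function until it reports a terminal state. `wait_for_job` and `fetch_status` are hypothetical names introduced here; the status strings are assumed from the OpenAI fine-tuning documentation. With the legacy library, `fetch_status` might be `lambda: openai.FineTuningJob.retrieve(job_id)["status"]`, which is likewise an assumption about how the notebook would be wired up.

```python
import time

# Assumed terminal status values for a fine-tuning job.
TERMINAL_STATES = {"succeeded", "failed", "cancelled"}


def wait_for_job(fetch_status, poll_seconds=30, max_polls=120):
    """Call fetch_status() until it returns a terminal state or max_polls is reached."""
    status = None
    for _ in range(max_polls):
        status = fetch_status()
        if status in TERMINAL_STATES:
            return status
        time.sleep(poll_seconds)
    return status  # still not finished after max_polls checks


# Stand-in for the real API so the sketch runs offline.
fake_statuses = iter(["validating_files", "running", "succeeded"])
final_status = wait_for_job(lambda: next(fake_statuses), poll_seconds=0)
print(final_status)
```

Passing the status lookup as a callable keeps the loop independent of the client library version and makes it easy to test without network access.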

### 4. Use the fine-tuned model

In [None]:
results = openai.ChatCompletion.create(
    model="ft:gpt-3.5-turbo-0613:personal::7xER4tHy",
    temperature=0.2,
    top_p=0.1,
    messages=[
        {"role": "system", "content": system_prompt},
        # Same question and answer as in the non-tuned request, for a like-for-like comparison
        {"role": "assistant", "content": assistant_prompt},
        {"role": "user", "content": user_response},
    ]
)

feedback_hypothesis_fine_tuned = results["choices"][0]["message"]["content"]

## Analyzing Feedback

In [None]:
def format_feedback(text):
    # Split the feedback into sentences; drop empty pieces (e.g. after the final period)
    sentences = [s.strip() for s in text.split('.') if s.strip()]
    feeds = ""
    for feed in sentences:
        for label in ('Good feedback:', 'To improve:'):
            if label in feed:
                # Print the section label on its own line, then the sentence that follows it
                feeds += label + ' \n'
                feed = feed.split(label, 1)[1].strip()
                break
        if feed:
            feeds += '・' + feed + '\n'
    return feeds

In [None]:
formatted_feedback = format_feedback(results["choices"][0]["message"]["content"])
print(formatted_feedback)

Good feedback: 
・The answer demonstrates a positive and proactive approach to receiving constructive criticism
・Highlights a willingness to learn and grow from feedback
To improve: 
・Could mention a specific example of how they have applied constructive criticism in the past
・Might add a brief mention of how they provide feedback to others, showcasing a well-rounded approach to communication
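
The formatted text above is convenient to read, but a structured form is easier to reuse, for example to display each section in a separate UI panel. The following sketch splits the same kind of feedback into two lists. `parse_feedback` and the sample string are hypothetical, and the sketch assumes the model keeps the "Good feedback:" and "To improve:" labels requested in the system prompt.

```python
def parse_feedback(text):
    """Split feedback text into {'good': [...], 'improve': [...]}, one entry per sentence."""
    sections = {"good": [], "improve": []}
    current = None
    for sentence in text.split("."):
        sentence = sentence.strip()
        if not sentence:
            continue
        if sentence.startswith("Good feedback:"):
            current = "good"
            sentence = sentence[len("Good feedback:"):].strip()
        elif sentence.startswith("To improve:"):
            current = "improve"
            sentence = sentence[len("To improve:"):].strip()
        # Sentences that appear before any label are ignored.
        if current is not None and sentence:
            sections[current].append(sentence)
    return sections


sample = ("Good feedback: Shows a positive attitude. Mentions growth. "
          "To improve: Could give a specific example.")
parsed = parse_feedback(sample)
print(parsed)
```

Like the formatting function, this splits on periods, so a sentence containing an abbreviation such as "e.g." would be broken into several entries.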



## Evaluations

In [None]:
# Load the spaCy model
nlp = spacy.load("en_core_web_md")

# Process the sentences using spaCy
doc_reference = nlp(reference)
doc_gpt35 = nlp(feedback_hypothesis_non_tuned)
doc_gpt35tuned = nlp(feedback_hypothesis_fine_tuned)
doc_gpt4 = nlp(feedback_hypothesis_gpt4)  # GPT-4 response; must be defined before this cell runs

# Calculate the cosine similarity between sentence vectors
similarity_score_gpt35 = cosine_similarity([doc_reference.vector], [doc_gpt35.vector])[0][0]
similarity_score_gpt35tuned = cosine_similarity([doc_reference.vector], [doc_gpt35tuned.vector])[0][0]
similarity_score_gpt4 = cosine_similarity([doc_reference.vector], [doc_gpt4.vector])[0][0]

print(f"Similarity Score (GPT-3.5 Turbo): {similarity_score_gpt35}")
print(f"Similarity Score (fine-tuned GPT-3.5 Turbo): {similarity_score_gpt35tuned}")
print(f"Similarity Score (GPT-4): {similarity_score_gpt4}")

Similarity Score (GPT-3.5 Turbo): 0.9717057347297668
Similarity Score (fine-tuned GPT-3.5 Turbo): 0.9781246185302734
Similarity Score (GPT-4): 0.9602669477462769
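
The three scores above are all close to 1, which is typical for this method: by default a spaCy document vector is the average of its token vectors, so two texts on the same topic tend to produce similar averages even when their wording differs. The pure-Python sketch below illustrates the computation on made-up three-dimensional vectors; the numbers are toy values, not spaCy output, and `cosine` and `mean_vector` are hypothetical helper names.

```python
import math


def cosine(u, v):
    """Cosine similarity: dot product divided by the product of the vector lengths."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def mean_vector(vectors):
    """Average a list of equal-length vectors component by component."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


# Toy "word vectors" for two short texts (made-up numbers).
text_a = [[1.0, 0.0, 1.0], [0.0, 1.0, 1.0]]
text_b = [[1.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

score = cosine(mean_vector(text_a), mean_vector(text_b))
print(round(score, 4))  # 0.9428
```

No individual word vector in `text_a` matches one in `text_b`, yet the averaged vectors are still highly similar, which is why small differences between such scores should be interpreted with care.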


In [None]:
!bert-score -r /content/data/reference.txt -c /content/data/feedback_hypothesis_fine_tuned.txt --lang en

Some weights of RobertaModel were not initialized from the model checkpoint at roberta-large and are newly initialized: ['roberta.pooler.dense.bias', 'roberta.pooler.dense.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
roberta-large_L17_no-idf_version=0.3.12(hug_trans=4.33.1)_fast-tokenizer P: 0.943326 R: 0.951863 F1: 0.947575


In [None]:
!bert-score -r /content/data/reference.txt -c /content/data/feedback_hypothesis_non_tuned.txt --lang en

roberta-large_L17_no-idf_version=0.3.12(hug_trans=4.33.1)_fast-tokenizer P: 0.893636 R: 0.898526 F1: 0.896074


In [None]:
!bert-score -r /content/data/reference.txt -c /content/data/feedback_gpt4.txt --lang en

roberta-large_L17_no-idf_version=0.3.12(hug_trans=4.33.1)_fast-tokenizer P: 0.875962 R: 0.923553 F1: 0.899128
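
The `bert-score` commands above read the reference and candidate texts from files, and the command-line tool pairs them line by line, so each text should occupy exactly one line. The notebook does not show how those files were created; the sketch below is one possible way, with `write_single_line` as a hypothetical helper and a temporary directory standing in for `/content/data`.

```python
import os
import tempfile


def write_single_line(path, text):
    """Collapse all whitespace so the text occupies exactly one line, then save it."""
    one_line = " ".join(text.split())
    with open(path, "w", encoding="utf-8") as f:
        f.write(one_line + "\n")
    return one_line


# A temporary directory stands in for /content/data so the sketch runs anywhere.
data_dir = tempfile.mkdtemp()
reference_text = "Good feedback: Clear answer.\nTo improve: Add an example."
candidate_text = "Good feedback: Positive tone.\n\nTo improve: Be specific."

write_single_line(os.path.join(data_dir, "reference.txt"), reference_text)
write_single_line(os.path.join(data_dir, "feedback_hypothesis_fine_tuned.txt"), candidate_text)

# Read the reference back to confirm it was stored as a single line.
with open(os.path.join(data_dir, "reference.txt"), encoding="utf-8") as f:
    saved_lines = f.read().splitlines()
print(saved_lines)
```

In the notebook, the same helper could be called with `reference` and each model's feedback string before running the three commands.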
