## LLM APIs - HAVE A SEPARATE BRANCH TO RUN ALL OF THIS

First we need to investigate the metrics we will be using. We can do this while taking an initial look at the data for this section: translating between French and English.

In [1]:
from llm_utils import *



In [2]:
# df = pd.read_csv('data/translation/en-fr-filtered.csv')
df = pd.read_csv('data/translation/english_french_filtered.csv', index_col=0)
testing_df = df[:50] # This is what we will be using for the majority of our testing.

The cell below loads the model we will be using; it may take some time.

In [3]:
model_path = download_model("Unbabel/wmt22-comet-da")
model = load_from_checkpoint(model_path)

Fetching 5 files: 100%|██████████| 5/5 [00:00<00:00, 52038.51it/s]
Lightning automatically upgraded your loaded checkpoint from v1.8.3.post1 to v2.5.0.post0. To apply the upgrade to your files permanently, run `python -m pytorch_lightning.utilities.upgrade_checkpoint ../../../../.cache/huggingface/hub/models--Unbabel--wmt22-comet-da/snapshots/f49d328952c3470eff6bb6f545d62bfdb6e66304/checkpoints/model.ckpt`
Encoder model frozen.
/Users/olinewcombe/Documents/GitHub/LLMWorkshop/.venv/lib/python3.12/site-packages/pytorch_lightning/core/saving.py:195: Found keys that are not in the model state dict but in the checkpoint: ['encoder.model.embeddings.position_ids']


In [6]:
data = [
    {
        "src": df.en.iloc[0], # Source text - this is what is to be translated
        "mt": df.fr.iloc[0], # Machine translation - this is what has been translated by our AI
        "ref": df.fr.iloc[0] # Reference text - this is the ground truth translation, or 'gold standard', what we are assuming is perfect.
    },
    {
        "src": df.en.iloc[0],
        "mt": "Hiii girlie",
        "ref": "Hiii girlie",
    },
]

model_output = model.predict(data)

GPU available: True (mps), used: True
TPU available: False, using: 0 TPU cores
HPU available: False, using: 0 HPUs


Predicting: 0it [00:00, ?it/s]huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
Predicting DataLoader 0: 100%|██████████| 1/1 [00:00<00:00,  1.39it/s]


In [7]:
model_output.scores

[0.9269278645515442, 0.9839427471160889]

Note how the second score is higher - why?

## Exercises

Play around with examples in languages you know. How good is it (genuine question)?

Let's now move on to the main event: the OpenAI API! You should have set up the API key as described in the email sent out previously; if not, find the API key.

We can do a quick test to see if you are connected.
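
Before creating the client, it can help to confirm that the key is visible to Python. The sketch below assumes the key was exported as the `OPENAI_API_KEY` environment variable, which is where `OpenAI()` looks when no `api_key` argument is passed. `api_key_is_configured` is a hypothetical helper name used only for illustration; it is not part of `llm_utils`.

```python
import os

def api_key_is_configured(env=None):
    """Return True if a non-empty OPENAI_API_KEY is present in the environment mapping."""
    env = os.environ if env is None else env
    return bool(env.get("OPENAI_API_KEY", "").strip())

if not api_key_is_configured():
    print("OPENAI_API_KEY is not set; OpenAI() will not be able to authenticate.")
```

Keeping the key in the environment rather than in the notebook also avoids the hard-coded stopgap shown in the commented-out line of the next cell.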

In [4]:
client = OpenAI()
# client = OpenAI(api_key="sk-xxxxxxxxxxxxxxxx") # This is a stopgap solution, do not use this in production!!

completion = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "user", "content": "write a haiku about ai"}
    ]
)

HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"


In [9]:
print(completion.choices[0].message.content)

Silent lines of code,  
Dreams woven in algorithms,  
Life in bits and bytes.


Let's look at the structure of this. We initialise a client object, which lets us access the OpenAI API and make requests through it. One of its methods creates a completion, that is, asks the LLM to complete a response to our prompt. The client has many other attributes and methods:

In [9]:
dir(client)

['__annotations__',
 '__class__',
 '__class_getitem__',
 '__delattr__',
 '__dict__',
 '__dir__',
 '__doc__',
 '__enter__',
 '__eq__',
 '__exit__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__getstate__',
 '__gt__',
 '__hash__',
 '__init__',
 '__init_subclass__',
 '__le__',
 '__lt__',
 '__module__',
 '__ne__',
 '__new__',
 '__orig_bases__',
 '__parameters__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__setattr__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 '__weakref__',
 '_base_url',
 '_build_headers',
 '_build_request',
 '_calculate_retry_timeout',
 '_client',
 '_custom_headers',
 '_custom_query',
 '_default_stream_cls',
 '_enforce_trailing_slash',
 '_idempotency_header',
 '_idempotency_key',
 '_limits',
 '_make_sse_decoder',
 '_make_status_error',
 '_make_status_error_from_response',
 '_maybe_override_cast_to',
 '_parse_retry_after_header',
 '_platform',
 '_prepare_options',
 '_prepare_request',
 '_prepare_url',
 '_process_response',
 '_process_response_data',
 '_p

We then specify the model along with our message to the API. This is familiar: the message 'content' here is exactly the sort of thing we might ask ChatGPT. However, we have the extra argument 'role', and note that messages is a list! Here we only use the 'user' role, so this is in essence what one would do on ChatGPT.com. The API offers more flexibility, however.

In [10]:
completion = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": "you are a dog and can only answer with 'woof'"}, # Technically the 'system' prompt for OpenAI is now called 'developer'.
        {"role": "user", "content": "write a haiku about ai"}
    ]
)

HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"


In [11]:
print(completion.choices[0].message.content)

Woof, woof, woof, woof,  
Whispers of code and data,  
Woof, woof, dreams emerge.


As we see, this can have a lot of impact and adds an extra level of complexity to prompt design. It can be shown that simply telling the LLM it is an expert in something, or giving it additional contextual clues even if they are not directly relevant to the task, can improve performance.

Another type of role is that of the assistant. It can also be used to show the model what to do, but isn't so forceful.

In [12]:
completion = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "user", "content": "write a haiku about ai"},
        {"role": "assistant", "content": "woof"},
        {"role": "user", "content": "write a haiku about data science"},
        {"role": "assistant", "content": "woof"},
        {"role": "user", "content": "write a haiku about machine learning"},
        {"role": "assistant", "content": "woof"},
        {"role": "user", "content": "write a haiku about deep learning"},
    ]
)

HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"


In [13]:
print(completion.choices[0].message.content)

Layers intertwine,  
Neurons spark with thought and dreams,  
Data whispers truth.


This indicates that assistant messages have a more subtle effect. It is 'show', not 'tell': the prior beliefs of gpt-4o-mini about what a good response looks like are overriding our examples in this case. LLMs are probabilistic models, so this is almost Bayesian. We shall quantify the effect of this <b>few-shot prompting</b> later. It is very effective in the right use cases; this one is rather forced.
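
For a task where few-shot prompting is a more natural fit, such as the translation task in this section, the user/assistant pairs can be built from worked examples. The following is a minimal sketch, not the workshop's own helper: `build_few_shot_messages` is a hypothetical name, and the example sentence pairs are made up for illustration.

```python
def build_few_shot_messages(system_prompt, examples, query):
    """Build a chat `messages` list: system prompt, worked examples, then the real query.

    examples: list of (french, english) pairs used as demonstrations.
    """
    template = "Translate the following French text into English: {}"
    messages = [{"role": "system", "content": system_prompt}]
    for fr, en in examples:
        # Each demonstration is a user request followed by the assistant reply we want to see
        messages.append({"role": "user", "content": template.format(fr)})
        messages.append({"role": "assistant", "content": en})
    # The final user message is the one the model actually answers
    messages.append({"role": "user", "content": template.format(query)})
    return messages

messages = build_few_shot_messages(
    "You are an expert translator. Reply with only the translated text.",
    [("Bonjour.", "Hello."), ("Merci beaucoup.", "Thank you very much.")],
    "Bonne nuit.",
)
```

The resulting list can be passed as the `messages` argument of `client.chat.completions.create`.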

The methods so far tinker with the prompt via natural language. This is hard to quantify: how do you test different prompts in a principled way? Two parameters that are easier to reason about are temperature and top p.

Temperature takes values in [0, 2] and defaults to 1. It is a measure of how 'random' our output will be: 0 gives close to deterministic behaviour, whereas 2 is more creative. Top p takes values in [0, 1] and defaults to 1; it restricts sampling to the most likely tokens whose combined probability reaches p, so lower values cut off the unlikely tail.
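
To build some intuition, here is a toy sketch of how the two parameters reshape a next-token distribution. It follows the standard textbook description (softmax with temperature, then nucleus filtering) and is not a description of how the OpenAI service is implemented internally; the logits are made up, and a temperature of exactly 0 would need special handling as a greedy pick.

```python
import math

def apply_temperature(logits, temperature):
    """Softmax over logits divided by temperature (temperature must be > 0)."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def top_p_filter(probs, top_p):
    """Keep the most likely tokens until their cumulative probability reaches top_p, then renormalise."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    total = sum(probs[i] for i in kept)
    return [probs[i] / total if i in kept else 0.0 for i in range(len(probs))]

logits = [2.0, 1.0, 0.0]  # made-up scores for three candidate tokens
for t in (0.2, 1.0, 2.0):
    print(t, [round(p, 3) for p in apply_temperature(logits, t)])
print(top_p_filter(apply_temperature(logits, 1.0), 0.9))
```

Low temperatures concentrate almost all probability on the top token, high temperatures flatten the distribution, and a top p below 1 removes the least likely tokens altogether.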

## Exercise

Repeat the same prompts with varying temperature and top p to assess the effect they have. Only vary one at a time.

In [16]:
prompt = ''
temperature = 1
top_p = 1

completion = client.chat.completions.create(
    model="gpt-4o-mini",
    temperature=temperature,
    top_p=top_p,
    messages=[
        {"role": "user", "content": prompt} # Throughout please try to keep token utilisation low
    ]
)

HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"


In [17]:
print(completion.choices[0].message.content)

Hello! How can I assist you today?


## Exercise

Question: For our use case, translation, what sort of temperature/top p should we have?

Now we can move on to getting a baseline for our dataset. We can do this simply by looping over the rows.

## Exercise 

1) Use BasicAPICall() from llm_utils to write a function that loops over our dataset, translates the <b>French text into English</b>, and stores the results in a dataframe. Below is a structure for the code.

In [5]:
system_prompt =''

def BaselineEval(df, system_prompt):
    # Setup output structure here
    for en, fr in zip(df.en, df.fr):
        # Loop logic here including BasicAPICall() and adding to output structure in a way that makes it easy to process
        # f strings or .format() could be useful here for the prompt
        pass
    new_df = '' # I suggest putting this into a dataframe, potentially combined with the original df for ease. Not essential, look at pd.concat
    return new_df

In [6]:
def BaselineEval(df, system_prompt, model="gpt-4o-mini"):
    output_list = []
    for fr in df.fr:
        prompt = f"Translate the following French text into English: {fr}"
        output_text = BasicAPICall(prompt, model=model, system_prompt=system_prompt)
        output_list.append(output_text)
    # assign() adds the column by position, so it stays aligned even if df's index is not 0..n-1
    new_df = df.assign(translation=output_list)
    return new_df

Then we can run it:

In [13]:
system_prompt = """You are an expert in language translation, especially translating French into English. 
                    Reply with the only translated text, with no other information."""
evals_df = BaselineEval(testing_df, system_prompt)

HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"

## Exercise 

Write functions to, given this new df, compute COMET scores for each translation. Add this as a new column.

This can be done either individually, like this:

In [8]:
def SingleComet(src, mt, ref):
    data = [
        {
            "src": src, # Source text - this is what is to be translated
            "mt": mt, # Machine translation - this is what has been translated by our AI
            "ref": ref # Reference text - this is the ground truth translation, or 'gold standard', what we are assuming is perfect.
        }
    ]
    model_output = model.predict(data)
    return model_output.scores[0]

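def DataframeComet(df):
    scores = []
    for en, fr, translation in zip(df.en, df.fr, df.translation):
        # French is the source, our translation is the MT output, English is the reference
        scores.append(SingleComet(fr, translation, en))
    # assign() adds the column by position, avoiding index-alignment surprises from pd.concat
    new_df = df.assign(scores=scores)
    return new_df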

Or (Ask ChatGPT to) use batch processing for performance improvements.

In [9]:
def SingleCometBatch(data):
    """
    Process a batch of data through the COMET model.
    Args:
        data (list of dicts): Each dict contains 'src', 'mt', and 'ref' keys.
    Returns:
        list of float: COMET scores for each input in the batch.
    """
    model_output = model.predict(data)
    return model_output.scores

def DataframeCometBatch(df, batch_size=32):
    """
    Process a dataframe with COMET scoring using batching for efficiency.
    Args:
        df (pd.DataFrame): DataFrame containing 'en', 'fr', and 'translation' columns.
        batch_size (int): Number of examples to process in a single batch.
    Returns:
        pd.DataFrame: DataFrame with an added 'scores' column.
    """
    data = [
        {
            "src": fr,  # Source text (fr)
            "mt": translation,  # Machine translation
            "ref": en  # Reference text (en)
        }
        for en, fr, translation in zip(df.en, df.fr, df.translation)
    ]
    
    scores = []
    for i in range(0, len(data), batch_size):
        batch = data[i:i+batch_size]
        scores.extend(SingleCometBatch(batch))
    
    # Return a copy with the scores added, leaving the input DataFrame unchanged
    return df.assign(scores=scores)

In [27]:
scores_df = DataframeCometBatch(evals_df, batch_size=64)

GPU available: True (mps), used: True
TPU available: False, using: 0 TPU cores
HPU available: False, using: 0 HPUs


Predicting DataLoader 0: 100%|██████████| 4/4 [00:02<00:00,  1.40it/s]


In [17]:
scores_df.loc[26].en, scores_df.loc[26].translation

('This is why we call upon the Commission and the Council to support this process by joining us in making representations to the Tunisian authorities.',
 'That is why we are asking the Commission and the Council to support this process, and we are also appealing to the Tunisian authorities.')

We have our scores, can we improve them? Let's try changing the temperature first:

In [18]:
def CustomTempEval(df, system_prompt, temperature=1):
    output_list = []
    for fr in df.fr:
        prompt = f"Translate the following French text into English: {fr}"

        # Same request as the baseline, but with an explicit temperature
        completion = client.chat.completions.create(
            model="gpt-4o-mini",
            temperature=temperature,
            messages=[
                {"role": "system", "content": system_prompt},
                {"role": "user", "content": prompt}
            ]
        )

        output_list.append(completion.choices[0].message.content)
    # assign() adds the column by position, so it stays aligned with df's rows
    new_df = df.assign(translation=output_list)
    return new_df

In [19]:
system_prompt = """You are an expert in language translation, especially between English and French. 
                    Reply with the only translated text, with no other information."""
temp_evals_df = CustomTempEval(testing_df, system_prompt, temperature = 0)

HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"

In [20]:
temp_scores_df = DataframeCometBatch(temp_evals_df)
temp_scores_df

GPU available: True (mps), used: True
TPU available: False, using: 0 TPU cores
HPU available: False, using: 0 HPUs


Predicting DataLoader 0: 100%|██████████| 2/2 [00:03<00:00,  1.58s/it]
GPU available: True (mps), used: True
TPU available: False, using: 0 TPU cores
HPU available: False, using: 0 HPUs
Predicting DataLoader 0: 100%|██████████| 2/2 [00:01<00:00,  1.41it/s]


| Unnamed: 0.1 | Unnamed: 0 | en | fr | translation | scores |
|---|---|---|---|---|---|
| 0 | 0 | Although, as you will have seen, the dreaded '... | Comme vous avez pu le constater, le grand "bog... | As you may have noticed, the great "Y2K bug" d... | 0.761 |
| 1 | 1 | It is nevertheless the case that not a penny m... | Il n'en demeure pas moins que pas un centime d... | Nevertheless, not a single cent more than what... | 0.84127 |
| 2 | 2 | Furthermore, although there might be internati... | En outre, quand bien même on parviendrait à un... | Furthermore, even if a global agreement were r... | 0.851953 |
| 3 | 3 | (NL) That the Florenz report has caused such a... | (NL) L'ampleur des réactions suscitées par le ... | The scale of the reactions triggered by the Fl... | 0.859365 |
| 4 | 4 | However, these isolated elements which make se... | Néanmoins, ces paragraphes, marqués au coin du... | Nevertheless, these paragraphs, marked by comm... | 0.832945 |
| 5 | 5 | It is good for us that we have the power to do... | C'est une bonne chose pour nous que nous ayons... | It is a good thing for us that we have the pow... | 0.895085 |
| 6 | 6 | Mr President, the agreement being promoted wil... | Monsieur le Président, l'accord proposé consti... | Mr. President, the proposed agreement will be ... | 0.880732 |
| 7 | 7 | Moreover, Europe is still the main destination... | De plus l'Europe demeure la principale destina... | Moreover, Europe remains the main tourist dest... | 0.877902 |
| 8 | 8 | Fish escapes from aquaculture premises have be... | On a signalé par le passé la fuite de poissons... | In the past, the escape of fish from aquacultu... | 0.882934 |
| 9 | 9 | Mr President of the Commission, when you enter... | Monsieur le Président de la Commission, lorsqu... | Mr. President of the Commission, when you took... | 0.884391 |


In [21]:
temp_scores_df.scores.mean(), scores_df.scores.mean()

(0.853339056968689, 0.8526142728328705)

(Hopefully!) we see a small increase in performance. The increase is small partly because French is a fairly popular language, so performance on it is already very well optimised. If you would like, please do play around with less common languages, where you may see a larger increase!
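
With a difference this small, it is worth checking whether it is distinguishable from noise. One possible approach, sketched below, is to compare the two runs row by row and bootstrap the mean difference. This is an illustrative addition rather than part of the workshop code: `paired_bootstrap` is a hypothetical helper and the scores in the example are made up.

```python
import random
from statistics import mean

def paired_bootstrap(scores_a, scores_b, n_resamples=2000, seed=0):
    """Mean per-row difference (a - b) and a rough 95% bootstrap interval for it."""
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    rng = random.Random(seed)
    # Resample the per-row differences with replacement and record each resample's mean
    resampled = sorted(
        mean(rng.choices(diffs, k=len(diffs))) for _ in range(n_resamples)
    )
    lower = resampled[int(0.025 * n_resamples)]
    upper = resampled[int(0.975 * n_resamples) - 1]
    return mean(diffs), (lower, upper)

# Made-up scores for illustration; in the notebook these would be
# temp_scores_df.scores and scores_df.scores.
a = [0.86, 0.84, 0.88, 0.85, 0.90]
b = [0.85, 0.84, 0.87, 0.86, 0.89]
diff, (lo, hi) = paired_bootstrap(a, b)
print(f"mean difference {diff:.4f}, 95% interval [{lo:.4f}, {hi:.4f}]")
```

If the interval comfortably includes zero, the two settings are hard to tell apart on this sample, and a larger test set would be needed to say more.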

In practice, it is rare to make individual calls to the API one at a time: it is inefficient, as the same overhead is incurred on every request. It is also less cost-effective than submitting requests in bulk (roughly 2x the cost). As such, we turn to batch processing for our translations.

Rather than writing requests in the form we have used so far, we need to create a .jsonl file containing them. Here is an example of the format we want for our .jsonl files: one request per line, each targeting the chat completions endpoint and allowing for different user/system/assistant messages and parameters. <b>The custom_id must be unique for each request.</b>

In [54]:
{"custom_id": "request-1", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "gpt-4o-mini", "messages": [{"role": "system", "content": "You are a helpful assistant."},{"role": "user", "content": "Hello world!"}],"max_tokens": 1000}}
{"custom_id": "request-2", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "gpt-4o-mini", "messages": [{"role": "system", "content": "You are an unhelpful assistant."},{"role": "user", "content": "Hello world!"}],"max_tokens": 1000}}

{'custom_id': 'request-2',
 'method': 'POST',
 'url': '/v1/chat/completions',
 'body': {'model': 'gpt-4o-mini',
  'messages': [{'role': 'system',
    'content': 'You are an unhelpful assistant.'},
   {'role': 'user', 'content': 'Hello world!'}],
  'max_tokens': 1000}}
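
A file in this format can be produced with the standard `json` module. The sketch below is a simplified stand-in for the workshop's `create_jsonl_data_with_custom_id` (whose internals are not shown here): `make_batch_request` and `write_batch_file` are hypothetical names, and the file name `example_batch.jsonl` is chosen only for the example.

```python
import json

def make_batch_request(custom_id, system_prompt, user_prompt, model="gpt-4o-mini", max_tokens=1000):
    """One Batch API request in the format shown above."""
    return {
        "custom_id": custom_id,
        "method": "POST",
        "url": "/v1/chat/completions",
        "body": {
            "model": model,
            "messages": [
                {"role": "system", "content": system_prompt},
                {"role": "user", "content": user_prompt},
            ],
            "max_tokens": max_tokens,
        },
    }

def write_batch_file(prompts, system_prompt, path):
    """Write one request per line, with custom_ids request-1, request-2, ..."""
    with open(path, "w", encoding="utf-8") as f:
        for i, prompt in enumerate(prompts, start=1):
            request = make_batch_request(f"request-{i}", system_prompt, prompt)
            # ensure_ascii=False keeps accented characters readable in the file
            f.write(json.dumps(request, ensure_ascii=False) + "\n")

write_batch_file(
    ["Hello world!", "Bonjour le monde !"],
    "You are a helpful assistant.",
    "example_batch.jsonl",
)
```

Numbering the `custom_id` values from the row position makes them unique and makes it straightforward to match results back to the original rows afterwards.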

## Exercise

Decide on your system and user prompts and use the function below to generate the jsonl file. Have a look and see if you want to change anything, e.g. temperature, top_p, or the number of few-shot examples. If you would like to, change the functions in llm_utils.py and restart the notebook to apply the changes.

In [22]:
system_prompt = ''
few_shot_df = df[-3:] # See if you want to change the number, or where you sample from, or include your own!

In [39]:
system_prompt = """You are an expert in language translation, especially between English and French. Reply with the only translated text, with no other information.""" # Change this
output_file = "few_shot_batch.jsonl"
    
create_jsonl_data_with_custom_id(testing_df, few_shot_df, system_prompt, output_file)

JSONL file saved to few_shot_batch.jsonl


As an aside, these functions were generated entirely by the free chats available for GPT-4o:

<em>"My task is evaluating machine translations, and for that I need to get the translations. as such,I want to create jsonl files for batch processing on the OpenAI API. I have 3 few-shot prompts stored in a dataframe, called few_shot_df, with columns fr for a french sentence and en for english. I also have a dataframe called df which has all the other translations I want to perform, with the same 2 columns of fr and en. My system prompt is 

system_prompt = """You are an expert in language translation, especially between English and French. Reply with the only translated text, with no other information."""

My prompt structure is prompt = f"Translate the following French text into English: {df.fr.iloc[idx]}", where the input of the f-string is the french text I want to translate into english. Write me code to create a jsonl file that calls gpt-4o-mini with the given system prompt, 3-shot prompting and then the given prompt."</em>

Then:

<em>"Can you refactor these into useful functions that are not too long - nice and modular"</em>

Now we can ask the API to do the batch processing!

## Exercise

Use https://platform.openai.com/docs/guides/batch (can absolutely just copy/paste code) to begin the batch job - put your name or an identifier in the metadata description! Use client.batches.retrieve to take a look at the progress.

In [40]:
client = OpenAI()

# Upload the batch file to OpenAI
batch_input_file = client.files.create(
    file=open(output_file, "rb"),
    purpose="batch"
)

HTTP Request: POST https://api.openai.com/v1/files "HTTP/1.1 200 OK"


In [25]:
batch_input_file_id = batch_input_file.id
client.batches.create(
    input_file_id=batch_input_file_id,
    endpoint="/v1/chat/completions",
    completion_window="24h",
    metadata={
        "description": "workshop job - Oli's Batch"
    }
)

HTTP Request: POST https://api.openai.com/v1/batches "HTTP/1.1 200 OK"


Batch(id='batch_6797e1f1b4808190af5687b96c1d51ea', completion_window='24h', created_at=1738007025, endpoint='/v1/chat/completions', input_file_id='file-Mo8C6ffvYBdvfZ17dPP5p3', object='batch', status='validating', cancelled_at=None, cancelling_at=None, completed_at=None, error_file_id=None, errors=None, expired_at=None, expires_at=1738093425, failed_at=None, finalizing_at=None, in_progress_at=None, metadata={'description': "workshop job - Oli's Batch"}, output_file_id=None, request_counts=BatchRequestCounts(completed=0, failed=0, total=0))

In [26]:
batch = client.batches.retrieve("batch_6797e1f1b4808190af5687b96c1d51ea")
print(batch)

HTTP Request: GET https://api.openai.com/v1/batches/batch_6797e1f1b4808190af5687b96c1d51ea "HTTP/1.1 200 OK"


Batch(id='batch_6797e1f1b4808190af5687b96c1d51ea', completion_window='24h', created_at=1738007025, endpoint='/v1/chat/completions', input_file_id='file-Mo8C6ffvYBdvfZ17dPP5p3', object='batch', status='in_progress', cancelled_at=None, cancelling_at=None, completed_at=None, error_file_id=None, errors=None, expired_at=None, expires_at=1738093425, failed_at=None, finalizing_at=None, in_progress_at=1738007027, metadata={'description': "workshop job - Oli's Batch"}, output_file_id=None, request_counts=BatchRequestCounts(completed=0, failed=0, total=50))
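
Rather than re-running the retrieve cell by hand, the status can be polled in a loop. The following is a minimal sketch under stated assumptions: `wait_for_batch` is a hypothetical helper, the polling interval and limit are arbitrary, and the `retrieve` function is passed in so that the loop can be tried without touching the API.

```python
import time

# Statuses after which a batch will not change any further
TERMINAL_STATUSES = {"completed", "failed", "expired", "cancelled"}

def wait_for_batch(retrieve, batch_id, poll_seconds=30, max_polls=120, sleep=time.sleep):
    """Call retrieve(batch_id) until the batch reaches a terminal status or max_polls is hit."""
    for _ in range(max_polls):
        batch = retrieve(batch_id)
        if batch.status in TERMINAL_STATUSES:
            return batch
        sleep(poll_seconds)
    raise TimeoutError(f"Batch {batch_id} still running after {max_polls} polls")

# Usage with the real client (not run here):
# batch = wait_for_batch(client.batches.retrieve, batch_id)
# print(batch.status, batch.output_file_id)
```

Once the returned batch has status `completed`, its `output_file_id` is the file to download in the next step.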


In [48]:
file_response = client.files.content("file-")  # Replace "file-" with the output_file_id of your completed batch

output_file = 'few_shot_batch_results.jsonl'
with open(output_file, "w", encoding="utf-8") as f:
    f.write(file_response.text)

HTTP Request: GET https://api.openai.com/v1/files/file-HNsiZeKifsjezZX7iXUFbg/content "HTTP/1.1 200 OK"


In [49]:
few_shot_results_df = add_translations_to_df(testing_df, output_file, translation_column='translation')

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df["custom_id"] = [f"request-{idx + 1}" for idx in range(len(df))]
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df[translation_column] = df["custom_id"].map(translation_map)


In [52]:
few_shot_scores_df = DataframeCometBatch(few_shot_results_df)
few_shot_scores_df.scores.mean()

GPU available: True (mps), used: True
TPU available: False, using: 0 TPU cores
HPU available: False, using: 0 HPUs


Predicting DataLoader 0: 100%|██████████| 2/2 [00:01<00:00,  1.16it/s]
GPU available: True (mps), used: True
TPU available: False, using: 0 TPU cores
HPU available: False, using: 0 HPUs
Predicting DataLoader 0: 100%|██████████| 2/2 [00:00<00:00,  2.19it/s]


0.8110618817806244

This will take some time, so let's come back to the results later. The process to fine-tune a model is very similar.

In [42]:
def CreateFineTuneJsonl(df, system_prompt, output_file):
    content = []
    for _, row in df.iterrows():
        # One training example: system prompt, French input, and the English reference as the target reply
        messages = [
            create_system_message(system_prompt),
            {"role": "user", "content": f"Translate the following French text into English: {row.fr}"},
            {"role": "assistant", "content": f"{row.en}"},
        ]
        content.append({"messages": messages})
    save_to_jsonl(content, output_file)
    return None

In [43]:
finetune_df = df[-100:]
output_file = "finetune_batch.jsonl"
system_prompt = """You are an expert in language translation, especially between English and French. Reply with the only translated text, with no other information."""

In [44]:
CreateFineTuneJsonl(finetune_df, system_prompt, output_file)

JSONL file saved to finetune_batch.jsonl
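
Before uploading, a quick sanity check of the file can catch formatting slips early. The sketch below only verifies the structure produced above (valid JSON per line, a non-empty `messages` list, and an assistant message last); it is an illustrative helper with a hypothetical name, `check_finetune_lines`, and not an official validator.

```python
import json

def check_finetune_lines(lines):
    """Return a list of problems found in chat-format fine-tuning JSONL lines (empty if none)."""
    problems = []
    for n, line in enumerate(lines, start=1):
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            problems.append(f"line {n}: not valid JSON")
            continue
        messages = record.get("messages") if isinstance(record, dict) else None
        if not isinstance(messages, list) or not messages:
            problems.append(f"line {n}: missing 'messages' list")
            continue
        last = messages[-1]
        # The final message is the target the model is trained to produce
        if not isinstance(last, dict) or last.get("role") != "assistant":
            problems.append(f"line {n}: last message is not from the assistant")
    return problems

good = json.dumps({"messages": [{"role": "user", "content": "Bonjour"},
                                {"role": "assistant", "content": "Hello"}]})
bad = json.dumps({"messages": [{"role": "user", "content": "Bonjour"}]})
print(check_finetune_lines([good, bad]))
```

To check the real file, pass an open file handle, for example `check_finetune_lines(open("finetune_batch.jsonl", encoding="utf-8"))`.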


In [45]:
client.files.create(
  file=open(output_file, "rb"),
  purpose="fine-tune"
)

HTTP Request: POST https://api.openai.com/v1/files "HTTP/1.1 200 OK"


FileObject(id='file-KzegkpfUWB9Lb58dyvF78K', bytes=64369, created_at=1738009502, filename='finetune_batch.jsonl', object='file', purpose='fine-tune', status='processed', status_details=None)

In [46]:
client.fine_tuning.jobs.create(
    training_file="file-KzegkpfUWB9Lb58dyvF78K",
    model="gpt-4o-mini-2024-07-18"
)

HTTP Request: POST https://api.openai.com/v1/fine_tuning/jobs "HTTP/1.1 200 OK"


FineTuningJob(id='ftjob-2C2pXoYsYbf3Ll0yf1w6a6s4', created_at=1738009567, error=Error(code=None, message=None, param=None), fine_tuned_model=None, finished_at=None, hyperparameters=Hyperparameters(batch_size='auto', learning_rate_multiplier='auto', n_epochs='auto'), model='gpt-4o-mini-2024-07-18', object='fine_tuning.job', organization_id='org-Dm9H5NobPutslK2NG4vsyAm6', result_files=[], seed=1681402647, status='validating_files', trained_tokens=None, training_file='file-KzegkpfUWB9Lb58dyvF78K', validation_file=None, estimated_finish=None, integrations=[], method=Method(dpo=None, supervised=MethodSupervised(hyperparameters=MethodSupervisedHyperparameters(batch_size='auto', learning_rate_multiplier='auto', n_epochs='auto')), type='supervised'), user_provided_suffix=None)

In [7]:
client.fine_tuning.jobs.retrieve("ftjob-2C2pXoYsYbf3Ll0yf1w6a6s4").fine_tuned_model

HTTP Request: GET https://api.openai.com/v1/fine_tuning/jobs/ftjob-2C2pXoYsYbf3Ll0yf1w6a6s4 "HTTP/1.1 200 OK"


'ft:gpt-4o-mini-2024-07-18:oliprojects::AuQ6VO4e'

In [10]:
system_prompt = """You are an expert in language translation, especially translating French into English. 
                    Reply with the only translated text, with no other information."""
finetune_evals_df = BaselineEval(testing_df, system_prompt, model = 'ft:gpt-4o-mini-2024-07-18:oliprojects::AuQ6VO4e')

HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"

In [12]:
finetune_scores_df = DataframeCometBatch(finetune_evals_df)

GPU available: True (mps), used: True
TPU available: False, using: 0 TPU cores
HPU available: False, using: 0 HPUs


Predicting DataLoader 0: 100%|██████████| 2/2 [00:01<00:00,  1.26it/s]
GPU available: True (mps), used: True
TPU available: False, using: 0 TPU cores
HPU available: False, using: 0 HPUs
Predicting DataLoader 0: 100%|██████████| 2/2 [00:00<00:00,  2.37it/s]


In [13]:
finetune_scores_df.scores.mean()

0.8390222835540772

# HERES ONE I FINETUNED EARLIER!! - whilst we wait I can talk about UKHSA RESULTS