# In the Loop Labeling

In-the-loop AI Assisted Labeling: adds functionality for users to review and correct the initial AI’s labels – the user will review and correct the labels and these corrections will be collected and used as new training data

New training data: used to re-train the initial model (to fine-tune the initial AI model, can be a basic retraining loop or a more complex ML)

Iterative Improvement: for this part, we just repeat the process above of generating the labels, gathering the corrections as new training data, retraining the model on it, etc. and the model should become more accurate each time)

Visualizations: we need to visualize the improvement in accuracy over each iteration, which we can save to a csv file when we collect the new training data – but we also need to tell the user the accuracy each time the model is used so we can generate the accuracy data and print it to user and simultaneously export it to csv file

Export functionality: to allow users to export the newly labeled data back to csv

In [7]:
# Importing necessary libraries 
import os
import getpass
import pandas as pd
from openai import OpenAI

In [8]:
os.environ["OPENAI_API_KEY"] = getpass.getpass("OpenAI API Key:")

In [9]:
import time

In [10]:
from openai import OpenAI
client = OpenAI()

In [11]:
from ai_assisted_coding_final import assistant

In [12]:
assistant_manager = assistant.OpenAIAssistantManager(client)

# Create an assistant
#assistant_manager.create_assistant()
assistant_manager.create_custom_assistant()

asst_Y0SLlsCDqyv2TTqVMKPYo4sA


Assistant(id='asst_Y0SLlsCDqyv2TTqVMKPYo4sA', created_at=1702108882, description='A custom tool for classifying teacher utterances using gpt-3.5-turbo.', file_ids=[], instructions='You are the co-founder of an ed-tech startup training an automated teacher feedback tool to classify utterances made. I am going to provide several sentences. \n                                            Please classify each sentence as one of the following: OTR (opportunity to respond), PRS (praise), REP (reprimand), or NEU (neutral)\n                                            Only answer with the following labels: OTR, PRS, REP, NEU', metadata={}, model='gpt-3.5-turbo', name='Custom Teacher Utterances Classifier', object='assistant', tools=[])

In [13]:
def read_csv(file_path):
    df = pd.read_csv(file_path)
    return df['Text'].tolist()


In [14]:
def process_lines(lines, assistant_manager):
    data = []
    for line in lines:
        thread, completed_run = assistant_manager.create_thread_and_run(line)
        
        # Wait for the response and get it
        response_page = assistant_manager.get_response()

        # Collect all messages from the response page
        messages = [msg for msg in response_page]  # Iterating over the response page to collect messages

        # Extract the user message and the assistant's label
        if messages:
            user_message = line
            # Assuming the last message is from the assistant and contains the label
            assistant_message = messages[-1].content[0].text.value
            label = assistant_message.split()[-1]  # Extract label
            data.append((user_message, label))

    return data




In [15]:
# Assuming you have an instance of OpenAIAssistantManager as assistant_manager
lines = read_csv('data/009-1.csv')

In [29]:
lines

['Good morning class, today we are going to learn about nouns.',
 'A noun is a word that represents a person, place, thing, or idea.',
 'Can anyone give me an example of a noun?',
 "That's right, 'dog' is a noun because it is a thing.",
 "Let's write down some nouns in our notebooks.",
 "Now, let's talk about verbs. Does anyone know what a verb is?",
 'A verb is a word that describes an action, occurrence, or state of being.',
 'Can someone give me an example of a verb?',
 "Great example, 'run' is a verb because it is an action.",
 "Now, let's write down some verbs in our notebooks.",
 'Next, we are going to learn about adjectives.',
 'An adjective is a word that describes a noun.',
 'Can someone give me an example of an adjective?',
 "Exactly, 'beautiful' is an adjective because it describes a noun.",
 "Let's write down some adjectives in our notebooks.",
 'Now we are going to form sentences using nouns, verbs, and adjectives.',
 'A sentence is a group of words that expresses a comple

In [17]:
# need to make this a group and feed in x number of lines at a time increasing the beatch size along with the accuracy
messages = process_lines(lines[0:4], assistant_manager)


In [18]:
messages

[('Good morning class, today we are going to learn about nouns.', 'nouns.'),
 ('A noun is a word that represents a person, place, thing, or idea.',
  'idea.'),
 ('Can anyone give me an example of a noun?', 'noun?'),
 ("That's right, 'dog' is a noun because it is a thing.", 'thing.')]

In [19]:
df = pd.DataFrame(messages, columns=["Text", "Label"])

In [20]:
# need to make this interactive and feed in the label and then make it available for download as a csv
df

Unnamed: 0,Text,Label
0,"Good morning class, today we are going to lear...",nouns.
1,"A noun is a word that represents a person, pla...",idea.
2,Can anyone give me an example of a noun?,noun?
3,"That's right, 'dog' is a noun because it is a ...",thing.


In [21]:
# after labels are fed in send message like this WITH THE new labels 

""" 
Great. Here are some more examples of how to classify utterances::

user: Can someone give me an example of a pronoun?
assistant: OTR
user: That's right, 'he' is a pronoun because it can take the place of a noun.
assistant: PRS
user: "You need to keep quiet while someone else is reading."
assistant: REP
user: A pronoun is a word that can take the place of a noun.
assistant: NEU

I am going to provide several more sentences. Only answer with the following labels: OTR, PRS, REP, NEU
"""

# feed in next batch of labels and iterate 

' \nGreat. Here are some more examples of how to classify utterances::\n\nuser: Can someone give me an example of a pronoun?\nassistant: OTR\nuser: That\'s right, \'he\' is a pronoun because it can take the place of a noun.\nassistant: PRS\nuser: "You need to keep quiet while someone else is reading."\nassistant: REP\nuser: A pronoun is a word that can take the place of a noun.\nassistant: NEU\n\nI am going to provide several more sentences. Only answer with the following labels: OTR, PRS, REP, NEU\n'

In [22]:
# function to process the data in batches
def process_lines_in_batches(lines, assistant_manager, batch_size=10):
    batched_data = []
    for i in range(0, len(lines), batch_size):
        batch = lines[i:i + batch_size]
        batched_data.extend(process_lines(batch, assistant_manager))
    return batched_data

lines = read_csv('data/009-1.csv')
batched_messages = process_lines_in_batches(lines, assistant_manager, batch_size=10)
print(batched_messages[:20])

KeyboardInterrupt: 

In [28]:
# file path test file
csv_file_path = 'data/009-1.csv'

# reading in data
lines = read_csv(csv_file_path)

# process data in batches of size 10
batched_messages = process_lines_in_batches(lines, assistant_manager, batch_size=10)

# Display the first few results for testing
print(batched_messages[:5])  # Adjust as needed to check the results

# function to export the data to csv
#def export_to_csv(data, output_file_path):
   # data.to_csv(output_file_path, index=False)


[('Good morning class, today we are going to learn about nouns.', 'nouns.'), ('A noun is a word that represents a person, place, thing, or idea.', 'idea.'), ('Can anyone give me an example of a noun?', 'noun?'), ("That's right, 'dog' is a noun because it is a thing.", 'thing.'), ("Let's write down some nouns in our notebooks.", 'notebooks.')]


In [27]:
def label_data(user_input):
    # This function should take user input and return the labeled data
    # For simplicity, assume the user inputs the correct label directly
    return user_input

def get_predictions_with_context(context, new_data, assistant_manager):
    predictions = []
    for data in new_data:
        prompt = context + f"\n\n{data}"
        print(f"Sending prompt: {prompt}")  # Debugging line
        thread, completed_run = assistant_manager.create_thread_and_run(prompt)
        response_page = assistant_manager.get_response()
        messages = [msg for msg in response_page]
        print(f"Received messages: {messages}")  # Debugging line
        if messages:
            prediction = messages[-1].content[0].text.value.split()[-1]
            predictions.append(prediction)
    return predictions


def interactive_labeling_loop(initial_data, unlabeled_data, assistant_manager):
    context = ""
    batch_size = 5  # Start with 5
    all_labeled_data = []

    # Start with initial data for context
    for data in initial_data:
        label = label_data(data)
        context += f"\n{data}: {label}"
        all_labeled_data.append((data, label))

    # Loop through unlabeled data in batches
    for i in range(0, len(unlabeled_data), batch_size):
        batch = unlabeled_data[i:i + batch_size]
        predictions = get_predictions_with_context(context, batch, assistant_manager)

        # User reviews and corrects predictions
        for data, prediction in zip(batch, predictions):
            print(f"Predicted for '{data}': {prediction}")
            correct_label = label_data(data)
            context += f"\n{data}: {correct_label}"
            all_labeled_data.append((data, correct_label))

        # function to increase batch size based on accuracy or other criteria
        # def increase_batch_size(batch_size, accuracy):
        # if accuracy > 0.9:
        # batch_size += 5
        # if accuracy > 0.95:
        # batch_size += 5
        # return batch_size

        # Increase batch size based on accuracy
        # batch_size = increase_batch_size(batch_size, accuracy)

        # Re-train the model
        # ...

        # Get the accuracy
        # accuracy = get_accuracy()
        # print(f"Accuracy: {accuracy}")

        # Export the labeled data
        # export_to_csv(all_labeled_data, "labeled_data.csv")
        # ...

    return all_labeled_data

# Example usage
# initial_data = [...]  # Your initial 5 sentences
# unlabeled_data = [...]  # The rest of your data
# all_labeled_data = interactive_labeling_loop(initial_data, unlabeled_data, assistant_manager)


In [None]:
# file path test file
csv_file_path = 'data/009-1.csv'

# reading in data
lines = read_csv(csv_file_path)

# process data in batches of size 10
batched_messages = process_lines_in_batches(lines, assistant_manager, batch_size=10)

# Display the first few results for testing
print(batched_messages[:5])  # Adjust as needed to check the results

# function to export the data to csv
#def export_to_csv(data, output_file_path):
# data.to_csv(output_file_path, index=False)
