# Question Answering

In [None]:
%load_ext autoreload
%autoreload 2

import sys
import os
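# `__file__` is not defined inside a notebook, so resolve the repo root
# relative to the working directory (assumes the notebook runs from its own folder).
root_path = os.path.abspath(os.path.join(os.getcwd(), ".."))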
sys.path.insert(0, root_path)

This approach is built using the same logic as the other modules: favouring composition, and trying to make things easy to refactor. Again, a single class does most of the work, and again we have a little folder full of `.txt` files which we can use to instruct the LLMs we work with. For the example case, I've written one for complaints detection. As always, this is an example; a good prompt would be one developed with Lee.
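To make the "folder of `.txt` prompts" idea concrete, here is a minimal sketch of looking a prompt up by name. The helper `load_pre_prompt` and the prompt text are hypothetical, written only for illustration; the real loading logic lives inside the classifier class and may differ.

```python
import tempfile
from pathlib import Path


def load_pre_prompt(prompt_dir: Path, pre_prompt_name: str) -> str:
    """Return the text of `<prompt_dir>/<pre_prompt_name>.txt` (hypothetical helper)."""
    prompt_path = prompt_dir / f"{pre_prompt_name}.txt"
    if not prompt_path.is_file():
        raise FileNotFoundError(f"No prompt file found at {prompt_path}")
    return prompt_path.read_text(encoding="utf-8").strip()


# Demo with a throwaway folder so the sketch runs on its own.
# The prompt wording below is a placeholder, not the project's real prompt.
with tempfile.TemporaryDirectory() as tmp:
    demo_dir = Path(tmp)
    (demo_dir / "Complaints_detector_1.txt").write_text(
        "Answer 1 if the review is a complaint, otherwise 0.\n", encoding="utf-8"
    )
    demo_prompt = load_pre_prompt(demo_dir, "Complaints_detector_1")

print(demo_prompt)
```

Keeping prompts as plain files means they can be edited and versioned without touching any Python.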

Let's spin up an instance of our question answering classifier object, and look at that prompt. 

In [None]:
from src.question_answering_approach import question_answering

## Claude:

In [None]:
anth = question_answering.AnthropicClassifier(pre_prompt_name="Complaints_detector_1")
print(anth.pre_prompt)

OK. So now let's test this out by giving it a review to classify.

In [None]:
review_to_classify = "I had a horrific experience at the Greenfield Medical Centre recently. I went in for a routine check-up and left feeling violated and unsafe. During my appointment, the receptionist stole my purse from my bag. I only realized it when I got home and found my bag open and my money missing. I couldn't believe that a member of the clinic's staff would stoop so low as to steal from patients. It was a clear case of criminality, and I felt utterly disgusted and violated. This incident has shattered my trust in this practice, and I strongly advise everyone to stay away from Greenfield Medical Centre."
ans = anth.classify_single_review(review=review_to_classify)

print(ans)

So far so good. Importantly, we're getting back a response in the form we want: a single binary digit.
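Because everything downstream relies on that single binary digit, it can be worth checking the response before trusting it. The sketch below assumes the model's answer arrives as a raw string; `parse_binary_label` is a hypothetical helper, not part of the `question_answering` module.

```python
def parse_binary_label(raw_response: str) -> int:
    """Convert a raw model response into 0 or 1, rejecting anything else."""
    cleaned = raw_response.strip()
    if cleaned not in {"0", "1"}:
        # Fail loudly rather than silently coercing a chatty answer.
        raise ValueError(f"Expected '0' or '1', got {raw_response!r}")
    return int(cleaned)


print(parse_binary_label(" 1\n"))
```

Raising on anything other than `0` or `1` makes it obvious when a prompt change causes the model to start replying in full sentences.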

Now let's try this on some more data. I'm going to load in some generated data and run it through the model. Note again the housekeeping we need to do: adding the `train test val` split to the df.

In [None]:
from src.data_ingestion import data_ingestion

dataset_name = "complaints_gen_35turbo_context1_v6_3600"

In [None]:
data_ingestion.add_train_test_val_labels_to_df(dataset_name)

In [None]:
df = data_ingestion.DataRetrieverDatastore(dataset_name).dataset

df.head()

OK, now we want to run the analysis on this.

Again, we've parameterised this as far as possible. Give it lists of the datasets you want analysed, and say whether you want the data balanced or not. You can also specify `n_to_sample` if you don't want to classify the entire dataset, as we do here!

Note that, unlike the embeddings approach, we're not re-registering dataframes here. That's because we're not really doing anything computationally intensive and repeatable; we're just getting a classification given a prompt and a review.

For the call below, every time it gets a new response for a review it'll print a `.`, as a kind of progress bar.

In [None]:
anth.classify_datasets(
    positive_label_dataset_name_list=[dataset_name],
    negative_label_dataset_name_list=[],
    y_column_name="Is Complaint",
    name_of_column_to_classify="Comment Text",
    train_test_val_label="test",
    n_to_sample=20,
    balance_data=False,
)
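
The sampling and progress-dot behaviour described above can be sketched roughly as follows. This is not the actual body of `classify_datasets`: `classify_sample`, `keyword_stub` and the demo reviews are all made up for illustration, and the stub stands in for a real LLM call.

```python
import random
from typing import Callable, List, Optional, Sequence


def classify_sample(
    reviews: Sequence[str],
    classify_one: Callable[[str], int],
    n_to_sample: Optional[int] = None,
    seed: int = 0,
) -> List[int]:
    """Classify all reviews, or a random subset of size n_to_sample."""
    rng = random.Random(seed)
    if n_to_sample is not None and n_to_sample < len(reviews):
        sampled = rng.sample(list(reviews), n_to_sample)
    else:
        sampled = list(reviews)

    preds = []
    for review in sampled:
        preds.append(classify_one(review))
        print(".", end="", flush=True)  # one dot per completed review
    print()  # finish the progress line
    return preds


def keyword_stub(review: str) -> int:
    """Stand-in for the LLM call: flags reviews mentioning theft."""
    return 1 if "stole" in review.lower() else 0


demo_reviews = [
    "The receptionist stole my purse.",
    "Lovely staff, quick appointment.",
    "Great service.",
    "Someone stole my coat from the waiting room.",
]
demo_preds = classify_sample(demo_reviews, keyword_stub, n_to_sample=3)
print(demo_preds)
```

Fixing the seed keeps the sampled subset the same between runs, which helps when comparing prompts.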

In [None]:
print(anth.preds)

Given that we gave it entirely generated complaints, we would have been hoping for a list of all 1s here!
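
Since the sample contains only positive examples, every prediction is either a true positive or a false negative, so the result boils down to recall. Here is a minimal sketch of that bookkeeping; the prediction list is invented for illustration and `confusion_counts` is a hypothetical helper, not the assessor's own method.

```python
from collections import Counter
from typing import Dict, Sequence


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, int]:
    """Count true/false positives and negatives for binary labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    counts = Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == 1:
            counts["tp" if pred == 1 else "fn"] += 1
        else:
            counts["fp" if pred == 1 else "tn"] += 1
    return {key: counts[key] for key in ("tp", "fp", "tn", "fn")}


# Illustrative values only: five true complaints, three of them detected.
y_true = [1, 1, 1, 1, 1]
y_pred = [1, 0, 1, 1, 0]

counts = confusion_counts(y_true, y_pred)
recall = counts["tp"] / (counts["tp"] + counts["fn"])
print(counts, recall)
```

With no negative examples in the sample, precision and the false-positive rate can't be meaningfully assessed; that would need a negative-label dataset too.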


In [None]:
anth.get_assessor()
anth.assessor.get_and_display_confusion_matrix()

Not looking too hot. Remember that:
- this isn't real data
- we have not engineered the prompt *at all* with Lee.

In [None]:
# TODO: add logging here.