# Data Generation
We've built some tools for automating the generation of comments and reviews. This particular workflow uses OpenAI. 

In [5]:
%load_ext autoreload
%autoreload 2

import sys
import os
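# Notebooks do not normally define __file__, so resolve the project root from the
# working directory (assumes the notebook is run from its own folder).
root_path = os.path.abspath(os.path.join(os.getcwd(), ".."))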
sys.path.insert(0, root_path)



In [6]:
from src.data_augmentation import data_augmentation

In [8]:
reviews = data_augmentation.generate_openai_review(
    context="You are Dorothy from the wizard of Oz. You are reviewing shoes, and comparing them to your sparkling red ones.",
    n=1,
)
print(reviews)

["Title: A Whirlwind of Comparison: The Ruby Slippers Reign Supreme\n\nReview:\nOh, my! It's Dorothy from the Land of Oz here to share my thoughts on shoes. But let's be honest, none can hold a candle to my iconic sparkling red ruby slippers. So, let's take a whirlwind journey through some of the shoes I've come across, and I'll show you just why my pair is simply unbeatable.\n\nFirst up, we have the classic black pumps. While they may be sleek and versatile, they lack the enchanting allure that my ruby slippers possess. There's just something about the way the light bounces off those sparkling gems that captures the attention of all who see them. They are truly a magical accessory that adds that extra touch of fairy tale wonder.\n\nNext, we venture into the realm of sneakers. Sure, they're comfortable and great for long walks on the yellow brick road, but let's face it, they lack the pizzazz that my ruby slippers bring to the table. There's nothing quite as captivating as the gleaming

Now we want to use this to write reviews for NHS services, which means prompting the model accordingly. Below is a generic template prompt, which I have written and saved as a `.txt` file:

> Assistant is a designed to help generate typical, realistic reviews which are similar to those which patients would leave on an NHS website. 
> Assistant is to emulate a reading age of {reading_age}. 
> The review ought to have a {sentiment} sentiment. 
> The review ought to be about a {location}. 
> The review should be sure to mention {topic}.
> The review should be roughly {word_count} words long.
> The review should be sure to describe an experience. 


This is saved in `data_augmentation/prompts_and_text/comment_generation_context_1.txt`.

Notice that we have parameterised various elements of this. This will allow us to generate more varied and specific reviews. 
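To make the parameterisation concrete, here is a minimal sketch of how a template with `{placeholder}` fields can be filled from a dictionary using Python's built-in `str.format`. The helper name `fill_template` and the shortened template are illustrative assumptions; the actual `data_augmentation` implementation may work differently.

```python
# Illustrative sketch only: `fill_template` is a hypothetical helper, not part
# of the `data_augmentation` module.
TEMPLATE = (
    "Assistant is to emulate a reading age of {reading_age}.\n"
    "The review ought to have a {sentiment} sentiment.\n"
    "The review ought to be about a {location}."
)


def fill_template(template: str, parameters: dict) -> str:
    """Substitute each {name} placeholder with the matching dictionary value."""
    # str.format raises KeyError if the template names a parameter that is
    # missing from the dictionary, which surfaces typos early.
    return template.format(**parameters)


filled = fill_template(
    TEMPLATE,
    {"reading_age": "13", "sentiment": "negative", "location": "GP Practice"},
)
print(filled)
```

Because every variable part of the prompt is a named field, changing a single dictionary value is enough to produce a different context.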

We can then fill in the template from a dictionary of parameter values to produce a context prompt:

In [11]:
prompt_parameters = {
    "reading_age": "13",
    "sentiment": "negative",
    "location": "GP Practice",
    "topic": "medication",
    "word_count": "200",
}

context = data_augmentation.generate_parameterised_context(
    "comment_generation_context_1", prompt_parameters=prompt_parameters
)
print(context)

Assistant is a designed to help generate typical, realistic reviews which are similar to those which patients would leave on an NHS website. 
Assistant is to emulate a reading age of 13. 
The review ought to have a negative sentiment. 
The review ought to be about a GP Practice. 
The review should be sure to mention medication.
The review should be roughly 200 words long.
The review should be sure to describe an experience. 


Let's use this context to generate a few reviews:

In [14]:
reviews = data_augmentation.generate_openai_review(context=context, n=3)
for r in reviews:
    print(r)
    print("-" * 50)

I recently had a pretty disappointing experience at my local GP practice and I wanted to share it with others. It was my first visit to this particular practice, and I have to say, I was not impressed. 

Firstly, the receptionist seemed incredibly disinterested in helping me. She barely made eye contact and could hardly be bothered to answer my questions. It made me feel unwelcome and unheard, which is not the kind of service you expect when you're visiting a doctor.

Once I finally got in to see the GP, things didn't improve. The doctor seemed rushed and barely gave me a chance to explain my symptoms. Instead, they quickly glanced at my records and prescribed a medication without even bothering to ask if I had any allergies or previous bad reactions. It felt like a one-size-fits-all approach, without any consideration for my personal health history.

To make matters worse, when I went to collect my prescription from the pharmacy, they had made a mistake and given me the wrong medicati

If we change `n` above, we can get more reviews for this given context. However, what we really want to do is to vary the context as well. 

In the `data_augmentation` module, we've provided some sample lists of values to populate the context dictionaries. Here's an example:

In [15]:
data_augmentation.LOCATION_VALUE_LIST

['GP Practice', 'hospital', 'dentist', 'care centre']

We can use these value lists to generate contexts combinatorially, one for every combination of values:

In [18]:
import random

paras = data_augmentation.generate_parameter_dictionaries_combinatorically(
    dictionary_of_parameter_names_to_value_lists={
        "reading_age": data_augmentation.READING_AGE_VALUE_LIST,
        "sentiment": data_augmentation.SENTIMENT_VALUE_LIST,
        "location": data_augmentation.LOCATION_VALUE_LIST,
        "topic": data_augmentation.TOPIC_VALUE_LIST,
        "word_count": data_augmentation.WORD_COUNT_VALUE_LIST,
    }
)

paras_sample = random.sample(list(paras), 5)
for p in paras_sample:
    print(p)

{'reading_age': 'adult', 'sentiment': 'very negative', 'location': 'dentist', 'topic': '', 'word_count': '200'}
{'reading_age': '12', 'sentiment': 'slightly negative', 'location': 'care centre', 'topic': '', 'word_count': '50'}
{'reading_age': '8', 'sentiment': 'very positive', 'location': 'dentist', 'topic': '', 'word_count': '200'}
{'reading_age': 'adult', 'sentiment': 'neutral', 'location': 'care centre', 'topic': '', 'word_count': '400'}
{'reading_age': '12', 'sentiment': 'very positive', 'location': 'dentist', 'topic': '', 'word_count': '50'}

OK! So we've printed five of the parameter dictionaries. Next, let's look at an example of how we would actually use this.
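Before moving on, it may help to see how this kind of combinatorial expansion can work. The sketch below uses `itertools.product` to take one value from each list and build a dictionary per combination. The function name `expand_combinations` is hypothetical, and this is only an assumption about the general approach, not the module's actual code. The cell above wraps the result in `list(...)`, which suggests the real function also returns an iterable rather than a list, so the sketch is written as a generator.

```python
from itertools import product


def expand_combinations(value_lists: dict):
    """Yield one dict per combination, taking one value from each list."""
    names = list(value_lists)
    # product(...) walks the Cartesian product of the value lists in order.
    for values in product(*(value_lists[name] for name in names)):
        yield dict(zip(names, values))


# Small hypothetical input: 2 sentiments x 3 locations = 6 combinations.
combos = list(
    expand_combinations(
        {
            "sentiment": ["negative", "neutral"],
            "location": ["GP Practice", "hospital", "dentist"],
        }
    )
)
print(len(combos))
print(combos[0])
```

The number of dictionaries grows multiplicatively with each added list, so sampling from the result is a practical way to keep a demo small.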

## Generation for specific problems
Let's say I want to generate reviews to help with the complaints model.

For this I want to supply a specific list of relevant topics. I also want to remove the positive sentiments from the sentiment list.
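As an aside, the reduced sentiment list could also be derived by filtering rather than typed out by hand. The sketch below assumes the contents of the sentiment list: four of the values appear in the sampled output earlier, while `"slightly positive"` is a guess at the fifth.

```python
# Assumed contents; check data_augmentation.SENTIMENT_VALUE_LIST for the real
# values before relying on this.
sentiment_values = [
    "very negative",
    "slightly negative",
    "neutral",
    "slightly positive",
    "very positive",
]

# Keep every sentiment whose label does not mention "positive".
non_positive = [s for s in sentiment_values if "positive" not in s]
print(non_positive)
```

Filtering has the advantage of staying in step with the module if new negative sentiments are added later. The cell below simply writes the three remaining values out explicitly.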


In [23]:
complaints_params = data_augmentation.generate_parameter_dictionaries_combinatorically(
    dictionary_of_parameter_names_to_value_lists={
        "reading_age": data_augmentation.READING_AGE_VALUE_LIST,
        "sentiment": ["very negative", "slightly negative", "neutral"],
        "location": data_augmentation.LOCATION_VALUE_LIST,
        "topic": [
            "a complaint",
            "fraud",
            "malpractice",
            "harassment",
            "theft",
            "racism",
        ],
        "word_count": data_augmentation.WORD_COUNT_VALUE_LIST,
    }
)
complaints_params = list(complaints_params)
print(len(complaints_params))

864
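The count is the product of the lengths of the five value lists. Three of those lengths are visible in this notebook: 3 sentiments, 4 locations and 6 topics. The lengths of the reading-age and word-count lists are not shown here, so the short check below only derives their combined contribution.

```python
import math

# Lengths taken from the lists shown in this notebook.
known_lengths = {"sentiment": 3, "location": 4, "topic": 6}
known_product = math.prod(known_lengths.values())

total = 864  # the count printed above

# The remaining factor must come from the reading-age and word-count lists.
assert total % known_product == 0
remaining_factor = total // known_product

print(known_product)     # 72
print(remaining_factor)  # 12
```

So the reading-age and word-count lists together account for a factor of 12, for example 3 values in one and 4 in the other.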


OK! So now we have 864 different relevant context prompts. We can then use the `n` parameter of the comment generation function to reach the number of comments that we want. For the sake of this demo, let's just generate five different reviews from five different contexts.

In [24]:
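# Note: `reviews` still holds the three reviews generated earlier, so the new
# ones are appended to that list; set `reviews = []` first to start afresh.
for param_set in random.sample(complaints_params, 5):
    context = data_augmentation.generate_parameterised_context(
        base_context_filename="comment_generation_context_1",
        prompt_parameters=param_set,
    )
    reviews += data_augmentation.generate_openai_review(context=context, n=1)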

for r in reviews:
    print(r)

I recently had a pretty disappointing experience at my local GP practice and I wanted to share it with others. It was my first visit to this particular practice, and I have to say, I was not impressed. 

Firstly, the receptionist seemed incredibly disinterested in helping me. She barely made eye contact and could hardly be bothered to answer my questions. It made me feel unwelcome and unheard, which is not the kind of service you expect when you're visiting a doctor.

Once I finally got in to see the GP, things didn't improve. The doctor seemed rushed and barely gave me a chance to explain my symptoms. Instead, they quickly glanced at my records and prescribed a medication without even bothering to ask if I had any allergies or previous bad reactions. It felt like a one-size-fits-all approach, without any consideration for my personal health history.

To make matters worse, when I went to collect my prescription from the pharmacy, they had made a mistake and given me the wrong medicati

Clearly, the context prompt on its own does not sufficiently lower the reading level of the generated text. This is a point to improve.
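One possible way to put a number on this, offered as a suggestion rather than something the module provides, is to estimate a readability score for each generated review. The sketch below implements the Flesch-Kincaid grade level formula with a deliberately crude syllable counter (runs of vowels), so the scores are only rough and are best used to compare texts with each other. Both function names are hypothetical.

```python
import re


def count_syllables(word: str) -> int:
    """Crude syllable estimate: count runs of vowels, with a minimum of one."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def flesch_kincaid_grade(text: str) -> float:
    """Approximate grade level using the Flesch-Kincaid formula."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        return 0.0
    syllables = sum(count_syllables(w) for w in words)
    return (
        0.39 * len(words) / len(sentences)
        + 11.8 * syllables / len(words)
        - 15.59
    )


simple = "The cat sat on the mat. It was a big cat."
complex_text = (
    "Unfortunately, the pharmaceutical administration demonstrated "
    "considerable incompetence."
)

# Longer sentences and longer words both push the score up.
print(flesch_kincaid_grade(simple) < flesch_kincaid_grade(complex_text))
```

Scoring a batch of reviews generated with different `reading_age` values would show whether the prompt is having any measurable effect.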

When working on a new problem, remember that you can create a new prompt context template in the appropriate folder, with whatever parameters you wish. 
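When writing a new template, it can be handy to list the parameter names it expects so that the value dictionary matches. The sketch below does this with the standard library's `string.Formatter`; the helper name `template_parameters` and the example template are hypothetical.

```python
from string import Formatter
from typing import List


def template_parameters(template: str) -> List[str]:
    """Return the distinct {placeholder} names in a template, in order."""
    names: List[str] = []
    # Formatter().parse yields (literal_text, field_name, format_spec, conversion).
    for _, field_name, _, _ in Formatter().parse(template):
        if field_name and field_name not in names:
            names.append(field_name)
    return names


# Hypothetical new template with its own set of parameters.
new_template = (
    "The review ought to have a {sentiment} sentiment. "
    "The review ought to be about a {location} and mention {topic}."
)
print(template_parameters(new_template))
```

Comparing this list with the keys of a parameter dictionary is a quick check to run before generating a large batch.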

Also remember to register your generated data when you've created it!