---
### Objective: experiment with the OpenAI API to extract the subjects of the issues reported in each review.
1. Use GPT-4 to generate a list of sample reviews.
2. For each review in the sample dataset, extract a list of issues with a prompt.
3. We also have a curated list of reference issues.
4. Using embeddings, map each review's extracted issues to the curated list. This links every review to its curated issues.
---

In [1]:
import openai  # this notebook uses the pre-1.0 SDK interface (openai.ChatCompletion, openai.Embedding)
import os
import pandas as pd
from tqdm import tqdm

In [2]:
from dotenv import load_dotenv, find_dotenv
_ = load_dotenv(find_dotenv()) # read local .env file

openai.api_key = os.environ['OPENAI_API_KEY']

In [3]:
# helper function: send a single-turn prompt and return the model's reply
def get_completion(prompt, model='gpt-3.5-turbo'):
    messages = [{"role": "user", "content": prompt}]
    response = openai.ChatCompletion.create(
        model=model,
        messages=messages,
        temperature=0,  # keep outputs as repeatable as possible
    )

    return response.choices[0].message["content"]
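The helper above relies on `openai.ChatCompletion`, which was removed in version 1.0 of the `openai` Python package. A minimal sketch of an equivalent helper for `openai>=1.0`, assuming the `OpenAI` client class and its `chat.completions.create` method; the extra `client` parameter is a hypothetical addition so the function can be exercised with a stub instead of a live API call:

```python
def get_completion(prompt, model="gpt-3.5-turbo", client=None):
    """Single-turn chat completion using the openai>=1.0 client interface (sketch)."""
    if client is None:
        # Imported lazily so the function can also be used with a stub client.
        from openai import OpenAI
        client = OpenAI()  # reads OPENAI_API_KEY from the environment
    messages = [{"role": "user", "content": prompt}]
    response = client.chat.completions.create(
        model=model,
        messages=messages,
        temperature=0,
    )
    # In openai>=1.0 the response is an object, so use attribute access.
    return response.choices[0].message.content
```

Because it keeps the same name and default arguments, it can replace the original helper without changing `get_issue`.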

### Step 1: Generate 20 sample reviews with ChatGPT:

The generated reviews:

In [4]:
generated_reviews = ["Enhance cleanliness in the patient rooms and common areas.",
"Provide more diverse and nutritious meal options for patients.",
"Ensure that all medical staff consistently wear identification badges.",
"Improve the efficiency of the billing process to avoid unnecessary delays.",
"Increase the frequency of nurse rounds, especially during the night.",
"Offer more comprehensive and clear discharge instructions.",
"Upgrade the in-room entertainment and Wi-Fi services for patients.",
"Ensure that all staff adhere to noise reduction protocols, particularly during sleeping hours.",
"Provide more comfortable bedding and pillows for patients.",
"Enhance patient privacy during consultations and examinations.",
"Improve the clarity and frequency of communication regarding treatment plans.",
"Ensure timely and accurate updating of patient records.",
"Increase the availability of specialists for consultations.",
"Provide additional seating and amenities for visitors in patient rooms.",
"Offer more accessible and clear channels for providing patient feedback.",
"Ensure that all patient concerns and complaints are addressed promptly.",
"Improve the availability and accessibility of patient transport within the facility.",
"Enhance the training of staff in handling patient queries and concerns.",
"Improve the signage and wayfinding tools within the hospital to assist patients and visitors.",
"Ensure that all medical equipment is in good working order and is regularly maintained."]

Store the reviews in a single DataFrame:

In [5]:
df_reviews = pd.DataFrame(generated_reviews, columns=['review'])

In [6]:
df_reviews.head()

| | review |
|---|---|
| 0 | Enhance cleanliness in the patient rooms and c... |
| 1 | Provide more diverse and nutritious meal optio... |
| 2 | Ensure that all medical staff consistently wea... |
| 3 | Improve the efficiency of the billing process ... |
| 4 | Increase the frequency of nurse rounds, especi... |


### Test a sample prompt

In [7]:
def get_issue(review_i):
    # Few-shot prompt: one worked example, then the review to summarise
    prompt = """
            I have reviews; each review reports an area of improvement.
            Summarise the review to focus only on the issue details,
            and return only the 4-7 words that indicate the issue.
            If the review has multiple issues, report all of them, comma-separated.
            Below is an example.
            Review: "The staff at the reception who took appointments were rude and their attitude needs to be improved."
            issue: rude attitude
            ###
            Review: "{}"
            issue:""".format(review_i)
    return get_completion(prompt)
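The prompt above contains a single worked example. If more examples are wanted, one option is to build the few-shot prompt from a list of `(review, issue)` pairs. A minimal sketch; `build_issue_prompt` is a hypothetical helper, and its instruction text simply mirrors the prompt above:

```python
def build_issue_prompt(review, examples):
    """Build a few-shot issue-extraction prompt from (review, issue) example pairs."""
    header = (
        "I have reviews; each review reports an area of improvement.\n"
        "Summarise the review to focus only on the issue details,\n"
        "and return only the 4-7 words that indicate the issue.\n"
        "If the review has multiple issues, report all of them, comma-separated.\n"
        "Below are examples.\n"
    )
    # Each example becomes a Review/issue block; blocks are separated by ###.
    shots = "\n###\n".join(f'Review: "{r}"\nissue: {i}' for r, i in examples)
    # The target review ends with an open "issue:" for the model to complete.
    return f'{header}{shots}\n###\nReview: "{review}"\nissue:'
```

`get_issue` could then call `get_completion(build_issue_prompt(review_i, examples))` with whatever curated examples are available.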

In [8]:
df_reviews["issue"] = df_reviews["review"].apply(get_issue)

In [9]:
df_reviews.head(25)

| | review | issue |
|---|---|---|
| 0 | Enhance cleanliness in the patient rooms and c... | cleanliness |
| 1 | Provide more diverse and nutritious meal optio... | diverse, nutritious meal options |
| 2 | Ensure that all medical staff consistently wea... | identification badges |
| 3 | Improve the efficiency of the billing process ... | efficiency, billing process, unnecessary delays |
| 4 | Increase the frequency of nurse rounds, especi... | increase frequency nurse rounds |
| 5 | Offer more comprehensive and clear discharge i... | comprehensive, clear discharge instructions |
| 6 | Upgrade the in-room entertainment and Wi-Fi se... | in-room entertainment, Wi-Fi services |
| 7 | Ensure that all staff adhere to noise reductio... | noise reduction protocols |
| 8 | Provide more comfortable bedding and pillows f... | comfortable bedding, pillows |
| 9 | Enhance patient privacy during consultations a... | patient privacy |
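The prompt asks for multiple issues as a comma-separated string, so a review such as the billing one yields several issues in one cell. To match each issue separately against the curated list, the string can be split into a list first. A minimal sketch; `split_issues` is a hypothetical helper:

```python
def split_issues(issue_text):
    """Split a comma-separated model answer into a list of cleaned issue phrases."""
    parts = (part.strip().strip('"') for part in str(issue_text).split(","))
    # Drop empty fragments left by stray or trailing commas.
    return [p for p in parts if p]
```

Applied as `df_reviews["issue"].apply(split_issues)` and followed by pandas `DataFrame.explode`, this gives one row per extracted issue. Note that the model's commas do not always separate distinct issues: `"diverse, nutritious meal options"` would split into `"diverse"` and `"nutritious meal options"`.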


### Find the most similar curated issue using embeddings

In [10]:
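# get_embedding is defined below and cosine_similarity is imported from scikit-learn later,
# so openai.embeddings_utils (removed in openai>=1.0) and tiktoken (unused) are not needed.
import numpy as np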

In [12]:
reference_df = pd.read_csv("reference_issues.csv")
reference_df.head()

| | review |
|---|---|
| 0 | communication barrier (unable to speak patient... |
| 1 | unclear explanation of condition/treatment |
| 2 | unclear explanation of medication/side effects |
| 3 | healthcare team/department does not communicat... |
| 4 | improper handover between shifts |


### Embedding model: `text-similarity-davinci-001`

In [13]:
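# First-generation similarity embedding model; OpenAI has since retired these models,
# with text-embedding-ada-002 suggested as the replacement.
embedding_model = "text-similarity-davinci-001"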

In [14]:
def get_embedding(text, model):
    # Embed a single string; newlines are replaced with spaces before the call
    text = text.replace("\n", " ")
    return openai.Embedding.create(input=[text], model=model)['data'][0]['embedding']

In [15]:
# This may take a few minutes
df_reviews["issues_embedding"] = df_reviews["issue"].apply(lambda x: get_embedding(str(x), model=embedding_model))
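The cell above makes one API request per row, which is why it takes minutes. The embeddings endpoint also accepts a list of strings as `input`, so requests can be batched. A minimal sketch; `embed_in_batches` is a hypothetical helper, `embed_batch` is any callable you supply, and `batch_size=100` is an arbitrary choice rather than an API limit:

```python
def embed_in_batches(texts, embed_batch, batch_size=100):
    """Embed texts in chunks, calling embed_batch once per chunk.

    embed_batch must take a list of strings and return one embedding per
    string, in the same order.
    """
    texts = list(texts)
    embeddings = []
    for start in range(0, len(texts), batch_size):
        # Replace newlines, mirroring get_embedding above.
        batch = [str(t).replace("\n", " ") for t in texts[start:start + batch_size]]
        vectors = embed_batch(batch)
        if len(vectors) != len(batch):
            raise ValueError("embed_batch returned a different number of vectors")
        embeddings.extend(vectors)
    return embeddings
```

With the pre-1.0 SDK used here, `embed_batch` could wrap `openai.Embedding.create(input=batch, model=embedding_model)` and return each item's `"embedding"`, ordered by its `"index"` field.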


In [16]:
df_reviews.to_csv("generated_reviews_issues_embeddings_reviews.csv",index=False)

### Reference issue embeddings:


In [17]:
reference_df["issues_embedding"] = reference_df["review"].apply(lambda x: get_embedding(str(x), model=embedding_model))

In [18]:
reference_df.to_csv("reference_issues_embeddings_reviews.csv",index=False)
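If the saved CSV files are reloaded in a later session (for example, to avoid re-embedding), the embedding columns come back as strings such as `"[0.0035, 0.0027, ...]"`, and `np.stack` on them will not produce a numeric matrix. A minimal sketch of a converter; `parse_embedding` is a hypothetical helper:

```python
import ast

def parse_embedding(cell):
    """Turn a list that pandas wrote to CSV as text back into a list of floats."""
    if isinstance(cell, str):
        # ast.literal_eval safely parses a Python list literal without eval().
        return [float(x) for x in ast.literal_eval(cell)]
    return list(cell)
```

It can be passed to pandas when reading, e.g. `pd.read_csv("reference_issues_embeddings_reviews.csv", converters={"issues_embedding": parse_embedding})`.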

### Use cosine similarity 

In [20]:
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity
# Convert embedding columns to numpy arrays
issue_embeddings = np.stack(df_reviews['issues_embedding'].values)
issue_embeddings_2 = np.stack(reference_df['issues_embedding'].values)

# Compute cosine similarity
similarity_matrix = cosine_similarity(issue_embeddings, issue_embeddings_2)
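For reference, cosine similarity between two vectors is their dot product divided by the product of their norms. A minimal NumPy sketch of the pairwise matrix that scikit-learn's `cosine_similarity` returns here; `cosine_similarity_matrix` is a hypothetical name, and it assumes no all-zero rows (those would divide by zero):

```python
import numpy as np

def cosine_similarity_matrix(a, b):
    """Pairwise cosine similarity between rows of a (n x d) and rows of b (m x d)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # Normalise each row to unit length; the dot product of unit vectors is the cosine.
    a_unit = a / np.linalg.norm(a, axis=1, keepdims=True)
    b_unit = b / np.linalg.norm(b, axis=1, keepdims=True)
    # Entry [i, j] compares row i of a with row j of b.
    return a_unit @ b_unit.T
```

The result has one row per review issue and one column per reference issue, which is why `np.argmax(..., axis=1)` in the next cell picks a reference issue per review.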



In [23]:
# Index of the best-matching reference issue for each review's extracted issue
best_match_indices = np.argmax(similarity_matrix, axis=1)

# Assign the best-matching reference issue to each review
df_reviews['best_matching_issue'] = reference_df.iloc[best_match_indices]['review'].values
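`np.argmax` always returns some match, even when the best score is weak; in the results below, "cleanliness" is mapped to "Lack of variety". Keeping the top few candidates with their scores makes weak matches easier to spot. A minimal sketch; `top_k_matches` is a hypothetical helper, and a suitable `min_score` would have to be chosen by inspecting the data:

```python
import numpy as np

def top_k_matches(similarity_matrix, labels, k=3, min_score=None):
    """For each row, return up to k most similar labels as (label, score) pairs."""
    sims = np.asarray(similarity_matrix, dtype=float)
    # Sort each row by descending similarity and keep the first k column indices.
    order = np.argsort(-sims, axis=1)[:, :k]
    results = []
    for row, idx in enumerate(order):
        matches = [(labels[j], float(sims[row, j])) for j in idx]
        if min_score is not None:
            # Drop candidates below the threshold (a row may end up empty).
            matches = [(label, score) for label, score in matches if score >= min_score]
        results.append(matches)
    return results
```

Here `labels` would be `reference_df["review"].tolist()`.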

In [24]:
df_reviews

| | review | issue | issues_embedding | best_matching_issue |
|---|---|---|---|---|
| 0 | Enhance cleanliness in the patient rooms and c... | cleanliness | [0.003511534770950675, 0.002744081662967801, -... | Lack of variety |
| 1 | Provide more diverse and nutritious meal optio... | diverse, nutritious meal options | [-0.00585941132158041, 0.012729832902550697, -... | To provide vegetarian options |
| 2 | Ensure that all medical staff consistently wea... | identification badges | [-0.002421565819531679, 0.00709309009835124, -... | irregular appointments |
| 3 | Improve the efficiency of the billing process ... | efficiency, billing process, unnecessary delays | [0.005677062552422285, 0.0036361219827085733, ... | None or unclear explanation of bill breakdown/... |
| 4 | Increase the frequency of nurse rounds, especi... | increase frequency nurse rounds | [-0.0063506001606583595, 0.005575826857239008,... | had to request nurses for help repeatedly |
| 5 | Offer more comprehensive and clear discharge i... | comprehensive, clear discharge instructions | [-0.008813797496259212, 0.005621498916298151, ... | unclear explanation of medication/side effects |
| 6 | Upgrade the in-room entertainment and Wi-Fi se... | in-room entertainment, Wi-Fi services | [-0.010869619436562061, 0.0025473684072494507,... | Provide books and entertainment |
| 7 | Ensure that all staff adhere to noise reductio... | noise reduction protocols | [0.005337875336408615, 0.006341904401779175, -... | Insufficient wards |
| 8 | Provide more comfortable bedding and pillows f... | comfortable bedding, pillows | [-0.0017200494185090065, 0.0029350428376346827... | Beds/pillows were uncomfortable or not clean |
| 9 | Enhance patient privacy during consultations a... | patient privacy | [-0.005347792990505695, 0.0005724677466787398,... | Patients not shared what to expect during the ... |


In [25]:
df_reviews.to_csv("reviews_with_issues_similarity.csv", index=False)

**Similarity: full review text vs. reference issue**:

In [26]:
# This may take a few minutes
df_reviews["review_embedding"] = df_reviews["review"].apply(lambda x: get_embedding(str(x), model=embedding_model))


In [27]:
# Convert embedding columns to numpy arrays
review_embeddings = np.stack(df_reviews['review_embedding'].values)
issue_embeddings_2 = np.stack(reference_df['issues_embedding'].values)

# Compute cosine similarity
similarity_matrix = cosine_similarity(review_embeddings, issue_embeddings_2)
# Index of the best-matching reference issue for each full review text
best_match_indices = np.argmax(similarity_matrix, axis=1)

# Assign the best-matching reference issue to each review
df_reviews['best_matching_review_issue'] = reference_df.iloc[best_match_indices]['review'].values

In [30]:
df_reviews[['review', 'issue','best_matching_issue','best_matching_review_issue']]

| | review | issue | best_matching_issue | best_matching_review_issue |
|---|---|---|---|---|
| 0 | Enhance cleanliness in the patient rooms and c... | cleanliness | Lack of variety | Limit number of visitors to allow patients to ... |
| 1 | Provide more diverse and nutritious meal optio... | diverse, nutritious meal options | To provide vegetarian options | To provide vegetarian options |
| 2 | Ensure that all medical staff consistently wea... | identification badges | irregular appointments | Should not ask patients to make payment before... |
| 3 | Improve the efficiency of the billing process ... | efficiency, billing process, unnecessary delays | None or unclear explanation of bill breakdown/... | To provide update on payment process (eg. Appl... |
| 4 | Increase the frequency of nurse rounds, especi... | increase frequency nurse rounds | had to request nurses for help repeatedly | Lengthen service of shuttle bus and increase f... |
| 5 | Offer more comprehensive and clear discharge i... | comprehensive, clear discharge instructions | unclear explanation of medication/side effects | More outings and increase variety of activitie... |
| 6 | Upgrade the in-room entertainment and Wi-Fi se... | in-room entertainment, Wi-Fi services | Provide books and entertainment | Lengthen service of shuttle bus and increase f... |
| 7 | Ensure that all staff adhere to noise reductio... | noise reduction protocols | Insufficient wards | Limit number of visitors to allow patients to ... |
| 8 | Provide more comfortable bedding and pillows f... | comfortable bedding, pillows | Beds/pillows were uncomfortable or not clean | Limit number of visitors to allow patients to ... |
| 9 | Enhance patient privacy during consultations a... | patient privacy | Patients not shared what to expect during the ... | Limit number of visitors to allow patients to ... |
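In the rows shown, the two strategies (matching the extracted issue vs. matching the full review) pick the same reference issue only for row 1. A quick way to quantify this over the whole DataFrame is an agreement rate. A minimal sketch; `agreement_rate` is a hypothetical helper:

```python
def agreement_rate(matches_a, matches_b):
    """Fraction of rows where two matching strategies picked the same reference issue."""
    matches_a, matches_b = list(matches_a), list(matches_b)
    if len(matches_a) != len(matches_b):
        raise ValueError("both strategies must cover the same rows")
    if not matches_a:
        return 0.0
    same = sum(a == b for a, b in zip(matches_a, matches_b))
    return same / len(matches_a)
```

Usage: `agreement_rate(df_reviews["best_matching_issue"], df_reviews["best_matching_review_issue"])`.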
