---
### The main objective is to experiment with the OpenAI API for sentiment analysis and topic modelling.
1. For each Google review in our sample dataset, detect its sentiment.
2. Generate topics for each review.
3. **Aggregating topics and sentiments:** Topics and sentiments are both attached to individual reviews. We aggregate them at clinic level by building a composite key of topic and sentiment for each review (for example `Long queue/wait time_negative`) and counting how often each key occurs per clinic. This final step shows which topics dominate each clinic's reviews, and with which sentiment, so clinics can be compared and grouped by topic.


---

In [49]:
import openai
import os
import pandas as pd
from tqdm import tqdm

In [50]:
api_key = os.getenv('gpt_api')

In [51]:
openai.api_key=api_key

In [52]:
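# Helper: send a single-turn prompt to the chat model and return the reply text.
# This uses the pre-1.0 `openai` package interface (openai.ChatCompletion).
def get_completion(prompt, model='gpt-3.5-turbo'):
    messages = [{"role": "user", "content": prompt}]
    response = openai.ChatCompletion.create(
        model=model,
        messages=messages,
        temperature=0,  # minimise randomness in the output
    )
    return response.choices[0].message["content"]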

#### Sentiment Analysis test prompts:

In [11]:
review = """
        Thank you all the nurses in duty in ward 921..Since \
        the 1st day of my hospitalised 14Apr-17Apr was attended \
        with patient and courteous, helpful,friendly, caring and \
        attend to my needs fast…Really Appreciated… \
        I would like to compliment on their good job. Thumbs up
"""

In [15]:
prompt = f"""
What is the sentiment of the following product review?
Return only one word that indicates the sentiment.
Review text : '''{review}'''
"""
response = get_completion(prompt)
print(response)

Positive


In [13]:
prompt_emotion = f"""
Identify a list of emotions that the writer of the following review is expressing. \
Include no more than five items in the list. \
Format your answer as a list of lower-case words separated by commas.
Review text : '''{review}'''
"""
response = get_completion(prompt_emotion)
print(response)

appreciative, grateful, impressed, satisfied, content


#### Sentiment Analysis for Raffles Clinic Reviews:

In [8]:
df_reviews = pd.read_csv("data/raffles_top5_reviews.csv")

In [9]:
df_reviews.head()

|   | place_id | reviews | lat | lng |
|---|---|---|---|---|
| 0 | ChIJbZHV_a892jERNhrm-bgNZXE | The staff is very kind and patient. Unfortunat... | 1.372692 | 103.94959 |
| 1 | ChIJbZHV_a892jERNhrm-bgNZXE | Tested positive for dengue, and the first two ... | 1.372692 | 103.94959 |
| 2 | ChIJbZHV_a892jERNhrm-bgNZXE | I wasn't very impressed with the doctor during... | 1.372692 | 103.94959 |
| 3 | ChIJbZHV_a892jERNhrm-bgNZXE | I called to only ask if I can get a medicine b... | 1.372692 | 103.94959 |
| 4 | ChIJbZHV_a892jERNhrm-bgNZXE | I wanted to scan for registration but the staf... | 1.372692 | 103.94959 |


Iterating over every review:

In [21]:
# Classify every review with the same one-word sentiment prompt as above.
sentiment_res = []
for review in tqdm(df_reviews.reviews.to_list()):
    prompt_i = f"""
            What is the sentiment of the following product review?
            Return only one word that indicates the sentiment.
            Review text : '''{review}'''
            """
    response_i = get_completion(prompt_i)
    sentiment_res.append(response_i)

100%|██████████| 87/87 [01:31<00:00,  1.05s/it]


In [36]:
len(sentiment_res)

87

In [42]:
# Normalise the replies: lower-case and drop periods ("Negative." -> "negative").
sentiment_res_processed = [val.lower().replace('.', '') for val in sentiment_res]

In [45]:
df_reviews['sentiment'] = sentiment_res_processed
df_reviews.head()

|   | place_id | reviews | lat | lng | sentiment |
|---|---|---|---|---|---|
| 0 | ChIJbZHV_a892jERNhrm-bgNZXE | The staff is very kind and patient. Unfortunat... | 1.372692 | 103.94959 | neutral |
| 1 | ChIJbZHV_a892jERNhrm-bgNZXE | Tested positive for dengue, and the first two ... | 1.372692 | 103.94959 | negative |
| 2 | ChIJbZHV_a892jERNhrm-bgNZXE | I wasn't very impressed with the doctor during... | 1.372692 | 103.94959 | negative |
| 3 | ChIJbZHV_a892jERNhrm-bgNZXE | I called to only ask if I can get a medicine b... | 1.372692 | 103.94959 | negative |
| 4 | ChIJbZHV_a892jERNhrm-bgNZXE | I wanted to scan for registration but the staf... | 1.372692 | 103.94959 | negative |


In [46]:
df_reviews.sentiment.value_counts()

negative         51
positive         25
neutral           5
horrible          2
disgusted         1
unpleasant        1
abysmal           1
disappointing     1
Name: sentiment, dtype: int64
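The prompt did not restrict the model to a fixed label set, so `value_counts()` shows free-form labels such as `horrible` and `abysmal` alongside `positive`, `negative` and `neutral`. One way to fold them into three classes before saving is a small lookup table. This is a sketch under assumptions, not part of the original notebook: `LABEL_MAP` and `normalize_sentiment` are hypothetical names, and mapping these five words to `negative` is a judgment call.

```python
# Hypothetical mapping from the extra labels seen in value_counts() to a
# three-class scheme; treating all five as "negative" is an assumption.
LABEL_MAP = {
    "horrible": "negative",
    "disgusted": "negative",
    "unpleasant": "negative",
    "abysmal": "negative",
    "disappointing": "negative",
}
ALLOWED_LABELS = {"positive", "negative", "neutral"}


def normalize_sentiment(label: str) -> str:
    """Map a free-form model label to positive/negative/neutral, else 'unknown'."""
    # Lower-case, trim whitespace, and drop trailing punctuation such as "Negative."
    cleaned = label.strip().lower().rstrip(".!")
    cleaned = LABEL_MAP.get(cleaned, cleaned)
    # Flag anything outside the allowed set instead of guessing a class.
    return cleaned if cleaned in ALLOWED_LABELS else "unknown"
```

It could be applied with `df_reviews['sentiment'] = df_reviews['sentiment'].map(normalize_sentiment)`. Listing the allowed labels directly in the prompt is another way to reduce this variation.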

In [48]:
df_reviews.to_csv("data/raffles_top5_reviews_sentiments.csv", index=False)

---
### Topic Modelling:

In [53]:
df_reviews = pd.read_csv("data/raffles_top5_reviews_sentiments.csv")

In [55]:
df_reviews.head(3)

|   | place_id | reviews | lat | lng | sentiment |
|---|---|---|---|---|---|
| 0 | ChIJbZHV_a892jERNhrm-bgNZXE | The staff is very kind and patient. Unfortunat... | 1.372692 | 103.94959 | neutral |
| 1 | ChIJbZHV_a892jERNhrm-bgNZXE | Tested positive for dengue, and the first two ... | 1.372692 | 103.94959 | negative |
| 2 | ChIJbZHV_a892jERNhrm-bgNZXE | I wasn't very impressed with the doctor during... | 1.372692 | 103.94959 | negative |


In [86]:
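# Cast to str so that ' '.join does not fail on missing (NaN) reviews.
df_reviews.reviews = df_reviews.reviews.astype(str)
# Split the reviews into two batches so each request fits the model's context window.
# Note: reviews 80-86 are not included in either batch.
all_reviews_text_part1 = ' '.join(df_reviews.reviews.to_list()[:44])
all_reviews_text_part2 = ' '.join(df_reviews.reviews.to_list()[44:80])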


In [87]:
len(all_reviews_text_part1)

17532

In [88]:
len(df_reviews.reviews.to_list())

87

### The model has a maximum context length
1. The context length of gpt-3.5-turbo is 4097 tokens, which is too small for all reviews at once. I therefore split the reviews into two batches and call the API separately for each batch to get its topics.
2. I review the returned topics manually, because some of them are very similar to each other (e.g. "Long queue/wait time" and "Long wait times").
3. From these, I manually compile a single list of relevant topics.
4. Finally, for each review, I call the API again to check which topics from that list appear in the review.
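The batching step can be automated instead of hard-coding the split points. The sketch below greedily packs reviews into batches under a token budget. It is an illustration under stated assumptions: `batch_by_token_budget` and `rough_token_count` are hypothetical helpers, and the 4-characters-per-token estimate is only a rule of thumb for English text. The budget should stay well below 4097, because the instruction text and the model's reply share the same context window.

```python
from typing import Callable, List


def rough_token_count(text: str) -> int:
    # Assumption: roughly 4 characters per token for English text.
    # A real tokenizer gives exact counts; this is only an estimate.
    return max(1, len(text) // 4)


def batch_by_token_budget(
    texts: List[str],
    max_tokens: int,
    count_tokens: Callable[[str], int] = rough_token_count,
) -> List[List[str]]:
    """Greedily pack texts into batches whose estimated token sum stays <= max_tokens.

    A single text larger than max_tokens still gets its own (oversized) batch.
    """
    batches: List[List[str]] = []
    current: List[str] = []
    used = 0
    for text in texts:
        n = count_tokens(text)
        # Close the current batch if adding this text would exceed the budget.
        if current and used + n > max_tokens:
            batches.append(current)
            current, used = [], 0
        current.append(text)
        used += n
    if current:
        batches.append(current)
    return batches
```

For example, `batch_by_token_budget(df_reviews.reviews.to_list(), max_tokens=3000)` followed by `' '.join(batch)` for each batch (3000 is an arbitrary choice). If the `tiktoken` package is installed, `enc = tiktoken.encoding_for_model("gpt-3.5-turbo")` with `count_tokens=lambda t: len(enc.encode(t))` gives exact token counts instead of the estimate.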

In [89]:
prompt = f"""
Determine 12 topics that are being discussed in the \
following text, which is delimited by triple quotes.

Make each item one or two words long.

Format your response as a list of items separated by commas.

Text sample: '''{all_reviews_text_part1}'''
"""
response = get_completion(prompt)
print(response)
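# The model answered with a numbered, newline-separated list instead of
# comma-separated items, so this split returns a single string (see output).
response.split(sep=',')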

1. Staff kindness and patience
2. Long queue/wait time
3. Health screening/check-up
4. Positive dengue test
5. Doctor and receptionist behavior
6. Evening consults
7. Rude customer service
8. Personal information privacy
9. Service quality decline
10. Dental treatment
11. Antibiotics prescription
12. Understaffing and long wait times


['1. Staff kindness and patience\n2. Long queue/wait time\n3. Health screening/check-up\n4. Positive dengue test\n5. Doctor and receptionist behavior\n6. Evening consults\n7. Rude customer service\n8. Personal information privacy\n9. Service quality decline\n10. Dental treatment\n11. Antibiotics prescription\n12. Understaffing and long wait times']

In [90]:
prompt = f"""
Determine 12 topics that are being discussed in the \
following text, which is delimited by triple quotes.

Make each item one or two words long.

Format your response as a list of items separated by commas.

Text sample: '''{all_reviews_text_part2}'''
"""
response = get_completion(prompt)
print(response)
response.split(sep=',')

1. Lack of doctors at Raffles Medical Changi T3
2. Unpleasant experience with flu vaccination appointment
3. Inefficient registration process
4. Long wait times
5. Slow medication dispensing
6. Rude staff
7. Inexperienced staff at the counter
8. Issues with online appointment system
9. Misdiagnosis
10. Positive experiences with certain doctors
11. Lack of phone call response
12. Varying wait times and experiences at different Raffles Medical clinics


['1. Lack of doctors at Raffles Medical Changi T3\n2. Unpleasant experience with flu vaccination appointment\n3. Inefficient registration process\n4. Long wait times\n5. Slow medication dispensing\n6. Rude staff\n7. Inexperienced staff at the counter\n8. Issues with online appointment system\n9. Misdiagnosis\n10. Positive experiences with certain doctors\n11. Lack of phone call response\n12. Varying wait times and experiences at different Raffles Medical clinics']
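In both calls the model ignored the "separated by commas" instruction and returned a numbered list, so `response.split(sep=',')` produced a single string. A small parser that accepts either format makes the next, manual step easier. This is a hypothetical helper (`parse_topic_list`), and detecting the format by counting non-empty lines is a heuristic.

```python
import re
from typing import List


def parse_topic_list(response: str) -> List[str]:
    """Split a model reply into topic strings.

    Handles a numbered list ("1. Foo\\n2. Bar") or a comma-separated line
    ("Foo, Bar"). Choosing the format by line count is an assumption.
    """
    lines = [line.strip() for line in response.splitlines() if line.strip()]
    # Several non-empty lines -> treat each line as one item; otherwise split on commas.
    items = lines if len(lines) > 1 else response.split(",")
    # Drop leading enumerations such as "1." or "12)" and surrounding spaces.
    topics = [re.sub(r"^\s*\d+[.)]\s*", "", item).strip() for item in items]
    return [t for t in topics if t]
```

With this, `parse_topic_list(response)` would replace `response.split(sep=',')` in the two cells above.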

In [91]:
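# Compile the topics list manually from the two batches above,
# merging near-duplicates such as "Long queue/wait time" and "Long wait times".
topics = [
    "Long queue/wait time",
    "Doctor/Staff/Receptionist behavior",
    "Rude Customer service",
    "Misdiagnosis",
    "vaccination",
    "Dental treatment",
    "Health screening/check-up",
    "Service quality",
]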
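Merging near-duplicate topics was done by hand here. As a first pass before that manual review, candidate topics could be grouped by string similarity with the standard library's `difflib.SequenceMatcher`. This is a swapped-in technique, not what the notebook does: `group_similar_topics` and the `0.8` threshold are assumptions, and character similarity only catches spelling variants and plurals, not synonyms, so a manual check is still required.

```python
from difflib import SequenceMatcher
from typing import List


def group_similar_topics(candidates: List[str], threshold: float = 0.8) -> List[List[str]]:
    """Greedy single-pass grouping of topic strings by character similarity.

    SequenceMatcher.ratio() returns a score in [0, 1]; the threshold is an assumed cut-off.
    """
    groups: List[List[str]] = []
    for topic in candidates:
        key = topic.lower().strip()
        for group in groups:
            # Compare against the group's first member as its representative.
            if SequenceMatcher(None, key, group[0].lower().strip()).ratio() >= threshold:
                group.append(topic)
                break
        else:
            # No sufficiently similar group found: start a new one.
            groups.append([topic])
    return groups
```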

In [93]:
review_i = df_reviews.reviews.to_list()[0]
prompt = f"""Determine whether each item in the following list of \
topics is a topic in the text below, which
is delimited with triple quotes.
Give your answer as a list with 0 or 1 for each topic.
List of topics: '''{", ".join(topics)}'''
Text sample: '''{review_i}'''
"""


In [94]:
response = get_completion(prompt)
print(response)

[1, 0, 0, 0, 0, 0, 1, 0]


In [101]:
list_topics = []
topics_str = ", ".join(topics)
for review in tqdm(df_reviews.reviews.to_list()):
    # One request per review: ask for a 0/1 flag for every topic in `topics`.
    prompt_i = f"""
            Determine whether each item in the following list of \
            topics is a topic in the text below, which
            is delimited with triple quotes.

            Give your answer as a list with 0 or 1 for each topic.

            List of topics: '''{topics_str}'''
            Text sample: '''{review}'''
            """
    response_i = get_completion(prompt_i)
    list_topics.append(response_i)

100%|██████████| 87/87 [04:45<00:00,  3.28s/it]


In [103]:
len(list_topics)

87

In [102]:
list_topics[:5]

['[1, 0, 0, 0, 0, 0, 1, 1]',
 '[1, 1, 1, 0, 0, 0, 0, 1]',
 'Long queue/wait time: 1\nDoctor/Staff/Receptionist behavior: 1\nRude Customer service: 0\nMisdiagnosis: 1\nVaccination: 0\nDental treatment: 0\nHealth screening/check-up: 0\nService quality: 1',
 '[1, 1, 1, 0, 0, 0, 0, 1]',
 'Long queue/wait time: 0\nDoctor/Staff/Receptionist behavior: 1\nRude Customer service: 0\nMisdiagnosis: 0\nVaccination: 0\nDental treatment: 0\nHealth screening/check-up: 0\nService quality: 0']

In [116]:
# The response format varies (a bracketed list or "Topic: 0/1" lines),
# so extract the digits only and keep them as a list of ints.
list_topics_int = []
for val in list_topics:
    val = val.replace('[', '').replace(']', '').replace(',', '')
    num_ids = [int(x) for x in val.split() if x.isdigit()]
    # Fall back to "no topic" when the reply does not contain one flag per topic.
    if len(num_ids) != len(topics):
        num_ids = [0] * len(topics)
    list_topics_int.append(num_ids)
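The digit-extraction loop above treats some unparseable replies as "no topic", which hides them among reviews that genuinely match no topic. A stricter parser could match `Topic: 1` lines by topic name and return `None` when a reply cannot be mapped, so those reviews can be re-queried. `parse_topic_flags` is a hypothetical helper sketched under that assumption; it covers the two formats visible in `list_topics[:5]`.

```python
import re
from typing import Dict, List, Optional


def parse_topic_flags(response: str, topics: List[str]) -> Optional[List[int]]:
    """Return one 0/1 flag per topic, or None if the reply cannot be mapped."""
    # Format 1: "Topic name: 0/1" lines, matched to known topics case-insensitively.
    flags: Dict[str, int] = {}
    for line in response.splitlines():
        name, sep, value = line.rpartition(":")
        if sep and value.strip() in {"0", "1"}:
            flags[name.strip().lower()] = int(value.strip())
    if flags and all(t.lower() in flags for t in topics):
        return [flags[t.lower()] for t in topics]
    # Format 2: a bare list of 0/1 values, with or without brackets and commas.
    digits = re.findall(r"\b[01]\b", response)
    if len(digits) == len(topics):
        return [int(d) for d in digits]
    # Anything else is reported as unparseable rather than padded with zeros.
    return None
```

For example, `parsed = [parse_topic_flags(r, topics) for r in list_topics]`, then `[i for i, v in enumerate(parsed) if v is None]` lists the reviews to send again.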

In [118]:
list_topics_int[:10]

[[1, 0, 0, 0, 0, 0, 1, 1],
 [1, 1, 1, 0, 0, 0, 0, 1],
 [1, 1, 0, 1, 0, 0, 0, 1],
 [1, 1, 1, 0, 0, 0, 0, 1],
 [0, 1, 0, 0, 0, 0, 0, 0],
 [1, 1, 1, 0, 0, 0, 0, 1],
 [0, 1, 0, 0, 0, 0, 0, 1],
 [0, 1, 1, 0, 0, 0, 0, 1],
 [0, 1, 0, 0, 0, 0, 1, 1],
 [1, 1, 1, 0, 0, 0, 0, 1]]

#### Aggregate topics and sentiments at clinic level

In [131]:
# 1. Build composite "topic_sentiment" keys, e.g. "Long queue/wait time_negative",
# because we count how often each topic/sentiment combination occurs.
col_topics = list()
review_sentiment = df_reviews.sentiment.to_list()
for val, sentiment in zip(list_topics_int, review_sentiment):
    # Keep only the topics flagged with 1 for this review.
    topics_col = [f"{topic_i}_{sentiment}" for topic_i, ids in zip(topics, val) if ids == 1]
    col_topics.append(topics_col)

df_reviews['TopicSentiment'] = col_topics
df_reviews.head(3)

|   | place_id | reviews | lat | lng | sentiment | TopicSentiment |
|---|---|---|---|---|---|---|
| 0 | ChIJbZHV_a892jERNhrm-bgNZXE | The staff is very kind and patient. Unfortunat... | 1.372692 | 103.94959 | neutral | [Long queue/wait time_neutral, Health screenin... |
| 1 | ChIJbZHV_a892jERNhrm-bgNZXE | Tested positive for dengue, and the first two ... | 1.372692 | 103.94959 | negative | [Long queue/wait time_negative, Doctor/Staff/R... |
| 2 | ChIJbZHV_a892jERNhrm-bgNZXE | I wasn't very impressed with the doctor during... | 1.372692 | 103.94959 | negative | [Long queue/wait time_negative, Doctor/Staff/R... |


In [140]:
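# Group reviews by clinic: 'sum' concatenates the per-review lists of
# topic-sentiment keys, and the review texts are joined into one string.
place_id_DF = (
    df_reviews.groupby('place_id')
    .agg({'TopicSentiment': 'sum', 'reviews': lambda x: ' '.join(x)})
    .reset_index()
)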

In [141]:
place_id_DF.head()

|   | place_id | TopicSentiment | reviews |
|---|---|---|---|
| 0 | ChIJDQ5i_q892jEREshja_0JOeU | [Doctor/Staff/Receptionist behavior_positive, ... | Came here twice. Staff at the dental side, if ... |
| 1 | ChIJL7ZNaFQ82jERG0nE6vQksRM | [Long queue/wait time_positive, Doctor/Staff/R... | Dr George is one of the best Dr that listened ... |
| 2 | ChIJL_NVReMV2jERUxYmGmVoKOQ | [Long queue/wait time_negative, Rude Customer ... | Their queue system sucks . Bottleneck is not r... |
| 3 | ChIJNbUBTyI92jERJ9Cx-rO_VPU | [Doctor/Staff/Receptionist behavior_positive, ... | I came for my vaccination and it was a breeze.... |
| 4 | ChIJSetFSDYZ2jERvlUSx0ed-gQ | [Long queue/wait time_negative, Doctor/Staff/R... | I went to the clinic at 8:30 when it is suppos... |


In [142]:
# Count occurrences of each topic-sentiment combination per clinic
from collections import Counter

place_id_DF['TopTopics'] = place_id_DF['TopicSentiment'].apply(Counter)
place_id_DF.head()

|   | place_id | TopicSentiment | reviews | TopTopics |
|---|---|---|---|---|
| 0 | ChIJDQ5i_q892jEREshja_0JOeU | [Doctor/Staff/Receptionist behavior_positive, ... | Came here twice. Staff at the dental side, if ... | {'Doctor/Staff/Receptionist behavior_positive'... |
| 1 | ChIJL7ZNaFQ82jERG0nE6vQksRM | [Long queue/wait time_positive, Doctor/Staff/R... | Dr George is one of the best Dr that listened ... | {'Long queue/wait time_positive': 1, 'Doctor/S... |
| 2 | ChIJL_NVReMV2jERUxYmGmVoKOQ | [Long queue/wait time_negative, Rude Customer ... | Their queue system sucks . Bottleneck is not r... | {'Long queue/wait time_negative': 4, 'Rude Cus... |
| 3 | ChIJNbUBTyI92jERJ9Cx-rO_VPU | [Doctor/Staff/Receptionist behavior_positive, ... | I came for my vaccination and it was a breeze.... | {'Doctor/Staff/Receptionist behavior_positive'... |
| 4 | ChIJSetFSDYZ2jERvlUSx0ed-gQ | [Long queue/wait time_negative, Doctor/Staff/R... | I went to the clinic at 8:30 when it is suppos... | {'Long queue/wait time_negative': 1, 'Doctor/S... |


In [144]:
TOP_TOPICS = 3
# Sort topic-sentiment combinations by how often they appear (most frequent first)
place_id_DF['TopTopics'] = place_id_DF['TopTopics'].apply(lambda x: sorted(x, key=x.get, reverse=True))

# Keep the top k topic-sentiment combinations for each clinic
place_id_DF['TopTopics'] = place_id_DF['TopTopics'].apply(lambda x: x[:TOP_TOPICS])

# Sneak peek
place_id_DF.head()

|   | place_id | TopicSentiment | reviews | TopTopics |
|---|---|---|---|---|
| 0 | ChIJDQ5i_q892jEREshja_0JOeU | [Doctor/Staff/Receptionist behavior_positive, ... | Came here twice. Staff at the dental side, if ... | [Doctor/Staff/Receptionist behavior_positive, ... |
| 1 | ChIJL7ZNaFQ82jERG0nE6vQksRM | [Long queue/wait time_positive, Doctor/Staff/R... | Dr George is one of the best Dr that listened ... | [Doctor/Staff/Receptionist behavior_positive, ... |
| 2 | ChIJL_NVReMV2jERUxYmGmVoKOQ | [Long queue/wait time_negative, Rude Customer ... | Their queue system sucks . Bottleneck is not r... | [Long queue/wait time_negative, Service qualit... |
| 3 | ChIJNbUBTyI92jERJ9Cx-rO_VPU | [Doctor/Staff/Receptionist behavior_positive, ... | I came for my vaccination and it was a breeze.... | [Doctor/Staff/Receptionist behavior_positive, ... |
| 4 | ChIJSetFSDYZ2jERvlUSx0ed-gQ | [Long queue/wait time_negative, Doctor/Staff/R... | I went to the clinic at 8:30 when it is suppos... | [Long queue/wait time_positive, Service qualit... |
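`Counter.most_common(k)` returns the `k` most frequent elements directly, so the counting, sorting and slicing steps can be collapsed into one function. Ties are ordered by first occurrence, which matches the stable `sorted(..., reverse=True)` call above. `top_topic_sentiments` is a hypothetical name used only for this sketch.

```python
from collections import Counter
from typing import List

TOP_TOPICS = 3


def top_topic_sentiments(keys: List[str], k: int = TOP_TOPICS) -> List[str]:
    """Return the k most frequent topic-sentiment keys for one clinic."""
    # most_common() sorts by count and keeps first-seen order among ties.
    return [key for key, _count in Counter(keys).most_common(k)]
```

Applied per clinic, this would be `place_id_DF['TopTopics'] = place_id_DF['TopicSentiment'].apply(top_topic_sentiments)`.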


In [146]:
for clinic in place_id_DF.place_id.tolist():
    print(f'QUESTION: {clinic}')
    print('TOPICS:')
    for val in place_id_DF[place_id_DF.place_id == clinic].TopTopics.tolist()[0]:
        print(val)
    print('---')

QUESTION: ChIJDQ5i_q892jEREshja_0JOeU
TOPICS:
Doctor/Staff/Receptionist behavior_positive
Dental treatment_positive
Service quality_positive
---
QUESTION: ChIJL7ZNaFQ82jERG0nE6vQksRM
TOPICS:
Doctor/Staff/Receptionist behavior_positive
Service quality_positive
Long queue/wait time_positive
---
QUESTION: ChIJL_NVReMV2jERUxYmGmVoKOQ
TOPICS:
Long queue/wait time_negative
Service quality_negative
Rude Customer service_negative
---
QUESTION: ChIJNbUBTyI92jERJ9Cx-rO_VPU
TOPICS:
Doctor/Staff/Receptionist behavior_positive
Service quality_positive
vaccination_positive
---
QUESTION: ChIJSetFSDYZ2jERvlUSx0ed-gQ
TOPICS:
Long queue/wait time_positive
Service quality_positive
Long queue/wait time_negative
---
QUESTION: ChIJWfa9h6Qi2jERlrgVLn4JAe8
TOPICS:
Rude Customer service_negative
Service quality_negative
Long queue/wait time_negative
---
QUESTION: ChIJXTiNyrIi2jER1vY4SDJ1gCo
TOPICS:
Doctor/Staff/Receptionist behavior_positive
Service quality_positive
Doctor/Staff/Receptionist behavior_negative


Join the aggregated topic-sentiments with the clinic locations:

In [150]:
place_id_DF.shape

(18, 4)

In [152]:
df_reviews.columns

Index(['place_id', 'reviews', 'lat', 'lng'], dtype='object')

In [154]:
# Reload the raw reviews only to get one (lat, lng) row per clinic,
# without overwriting df_reviews.
df_locations = pd.read_csv("data/raffles_top5_reviews.csv")[['place_id', 'lat', 'lng']]
df_locations = df_locations.drop_duplicates(subset='place_id')
df_join = pd.merge(place_id_DF, df_locations, on='place_id', how='left')

### Save results:

In [155]:
df_join.head()

|   | place_id | TopicSentiment | reviews | TopTopics | lat | lng |
|---|---|---|---|---|---|---|
| 0 | ChIJDQ5i_q892jEREshja_0JOeU | [Doctor/Staff/Receptionist behavior_positive, ... | Came here twice. Staff at the dental side, if ... | [Doctor/Staff/Receptionist behavior_positive, ... | 1.372618 | 103.949543 |
| 1 | ChIJL7ZNaFQ82jERG0nE6vQksRM | [Long queue/wait time_positive, Doctor/Staff/R... | Dr George is one of the best Dr that listened ... | [Doctor/Staff/Receptionist behavior_positive, ... | 1.36648 | 103.964491 |
| 2 | ChIJL_NVReMV2jERUxYmGmVoKOQ | [Long queue/wait time_negative, Rude Customer ... | Their queue system sucks . Bottleneck is not r... | [Long queue/wait time_negative, Service qualit... | 1.406028 | 103.902254 |
| 3 | ChIJNbUBTyI92jERJ9Cx-rO_VPU | [Doctor/Staff/Receptionist behavior_positive, ... | I came for my vaccination and it was a breeze.... | [Doctor/Staff/Receptionist behavior_positive, ... | 1.343062 | 103.953104 |
| 4 | ChIJSetFSDYZ2jERvlUSx0ed-gQ | [Long queue/wait time_negative, Doctor/Staff/R... | I went to the clinic at 8:30 when it is suppos... | [Long queue/wait time_positive, Service qualit... | 1.306457 | 103.90465 |


In [156]:
df_join.shape

(18, 6)

In [157]:
df_join.to_csv("data/review_sentiment_topics.csv", index=False)