<div align="right">Wanchana Ponthongmak<br>
6136168 RADS/D<br>
RADS611 Advance Modeling</div>

# <center> Sentiment Analysis by Deep Learning

## <center><b>Introduction

<p style="text-indent: 2.5em;">
    One of the most important tasks for a business is to understand what its customers or clients think about the products or services it offers (the voice of the customer). With that understanding, business owners can improve their products and services more cost-effectively. Online social networks have made customers' opinions about products and services widely available, and sentiment analysis is one way to extract those opinions.
<p style="text-indent: 2.5em;">
    Sentiment analysis, also known as opinion mining, is a field within Natural Language Processing (NLP) that builds systems to identify and extract opinions from text. It is widely applied to voice-of-the-customer materials such as reviews and survey responses, online and social media, and healthcare materials, for applications that range from marketing to customer service to clinical medicine.
<p style="text-indent: 2.5em;"> 
    The objective of this study is to understand the opinions of patients at three large university hospitals (Ramathibodi Hospital, Siriraj Hospital, and Chulalongkorn Hospital) and to use that knowledge to improve the hospital business by creating a development plan. Patients' comments are retrieved from social media and grouped into two classes, positive attitude and negative attitude. Deep learning approaches are then used to build a model that predicts a patient's opinion.

## <center><b>Methodology

>1.  Patient's comment retrieval from internet
<p style="text-indent: 2.5em;">
    Patients' comments on the three hospitals were retrieved from www.honestdocs.co, together with satisfaction scores ranging from one (low satisfaction) to five (high satisfaction).

>2.  Translate the comments into English
<p style="text-indent: 2.5em;">
    Because www.honestdocs.co reviews Thai hospitals, most of the comments are written in Thai. This study analyzes sentences in English, so we used the Google Cloud Translation API to translate the Thai comments into English.

>3.  Preprocessing data
<p style="text-indent: 2.5em;">
    The data are prepared for the deep learning model in the following steps:
    1) merging database
    2) deduplication
    3) class labeling
    4) class balancing
    5) data preprocessing
        a) lower case
        b) negation handling
        c) lemmatization
        d) punctuation removing
        e) stopword removing
        f) sequence padding

    6) data splitting
    7) pre-trained network preparation
    8) model experiment
    9) model evaluation
    10) visualization
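The text-cleaning sub-steps listed under step 5 can be sketched in plain Python. This is only an illustration under stated assumptions: the tiny `CONTRACTIONS` table and `STOPWORDS` set are hypothetical stand-ins for the `contractions` package and the NLTK stopword corpus, lemmatization is left out because it needs the WordNet data, and `pad` works on tokens rather than on the integer sequences that Keras `pad_sequences` expects. Negation handling is shown here as contraction expansion, which may differ from what the notebook actually does.

```python
import string

# Hypothetical stand-ins for the `contractions` package and NLTK stopwords.
CONTRACTIONS = {"don't": "do not", "can't": "can not",
                "wasn't": "was not", "isn't": "is not"}
STOPWORDS = {"the", "a", "an", "is", "was", "to", "and", "of", "i", "do"}


def clean_comment(text):
    """Return the cleaned tokens of one comment."""
    text = text.lower()                                # a) lower case
    for short, full in CONTRACTIONS.items():           # b) negation handling
        text = text.replace(short, full)
    # d) punctuation removal
    text = text.translate(str.maketrans("", "", string.punctuation))
    # e) stopword removal ("not" is deliberately kept)
    return [tok for tok in text.split() if tok not in STOPWORDS]


def pad(tokens, length, pad_token="<pad>"):
    """f) Left-pad, or keep only the last `length` tokens."""
    tokens = tokens[-length:]
    return [pad_token] * (length - len(tokens)) + tokens


tokens = clean_comment("The doctor wasn't rude, and I don't regret it!")
padded = pad(tokens, 8)
```

Keeping "not" out of the stopword set matters in this sketch: if it were removed, "was not rude" and "was rude" would produce the same tokens.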

## <center><b>Getting Started

### Required Libraries

In [7]:
import pandas as pd
import numpy as np
from matplotlib import pyplot as plt
import seaborn as sns
import time
import contractions
from nltk.stem import WordNetLemmatizer
from nltk.corpus import stopwords
from keras.preprocessing.sequence import pad_sequences
from keras.preprocessing.text import Tokenizer
from keras.models import Sequential
from keras.layers import Dense, Dropout, Embedding, LSTM, GRU
from keras.wrappers.scikit_learn import KerasClassifier
from keras.initializers import Constant
from sklearn.model_selection import train_test_split
from sklearn.model_selection import RandomizedSearchCV, GridSearchCV
from sklearn.metrics import confusion_matrix
from sklearn.metrics import roc_curve
from sklearn.metrics import roc_auc_score
from sklearn.metrics import precision_recall_curve
from sklearn.metrics import average_precision_score
import joblib  # sklearn.externals.joblib was removed in scikit-learn 0.23
from bs4 import BeautifulSoup as bs
import requests
from google.cloud import translate_v2 as translate  # the v2 module provides translate.Client()
import os
cwd = os.getcwd()
print(cwd)

C:\Users\GL63\OneDrive\RADS611


## 1) Patient's comment retrieval from internet
## 2) Translate the comments into English

In [8]:
# point the Google Cloud client libraries at the service-account key file
credential_path = r"data\ID.json"
os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = credential_path

In [9]:
# define url to download
url_rama = 'https://www.honestdocs.co/hospitals/ramathibodi-hospital'
url_siri = 'https://www.honestdocs.co/hospitals/siriraj-hospital'
url_chula = 'https://www.honestdocs.co/hospitals/king-chulalongkorn-memorial-hospital'

In [None]:
# download the patients' comments from the website and translate them to English
def tran2eng(url):
    # lists to store the comments and their scores
    comment = []
    score = []
    for page in range(100):
        r = requests.get(url, params=dict(query="web scraping", page=page))
        soup = bs(r.text, 'html.parser')
        contents = soup.find_all('div', {'class': 'comments__content'})
        stars = soup.find_all('span', {'class': 'stars star-rating'})
        if len(contents) == 0:  # stop at the first page without comments
            break
        # pair each comment with its star rating (assumes both appear in the same order)
        for content, star in zip(contents, stars):
            comment.append(content.get_text())
            score.append(star.attrs['data-score'])
    df = pd.DataFrame({'comment': comment, 'score': score})
    # instantiate a translation client
    translator = translate.Client()
    # translate each comment and keep only the translated text
    df['en_com'] = (df['comment']
                    .apply(translator.translate, target_language='en')
                    .apply(lambda x: x['translatedText']))
    return df
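The extraction logic in `tran2eng` can be tried offline, without network access or Google credentials. The sketch below is a rough illustration, not the notebook's method: it uses the standard library's `HTMLParser` as a dependency-free stand-in for BeautifulSoup, and `SAMPLE_PAGE` is hypothetical markup. Only the class names `comments__content` and `stars star-rating` and the `data-score` attribute come from the function above; the real page layout may differ.

```python
from html.parser import HTMLParser

# Hypothetical markup that mirrors the class names used in tran2eng.
SAMPLE_PAGE = """
<div class="review">
  <span class="stars star-rating" data-score="5"></span>
  <div class="comments__content">Doctors were kind.</div>
</div>
<div class="review">
  <span class="stars star-rating" data-score="2"></span>
  <div class="comments__content">Waited <b>four</b> hours.</div>
</div>
"""


class ReviewParser(HTMLParser):
    """Collect comment texts and star scores from one page of reviews."""

    def __init__(self):
        super().__init__()
        self.comments = []
        self.scores = []
        self._depth = 0    # > 0 while inside a comments__content div
        self._buffer = []  # text pieces of the comment being read

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if self._depth:
            if tag == "div":  # track nested divs inside a comment
                self._depth += 1
        elif tag == "div" and "comments__content" in classes:
            self._depth = 1
            self._buffer = []
        elif tag == "span" and {"stars", "star-rating"} <= set(classes):
            self.scores.append(attrs.get("data-score"))

    def handle_endtag(self, tag):
        if self._depth and tag == "div":
            self._depth -= 1
            if self._depth == 0:  # the comment div has closed
                self.comments.append("".join(self._buffer).strip())

    def handle_data(self, data):
        if self._depth:
            self._buffer.append(data)


def parse_reviews(html):
    """Return (comment, score) pairs found in one page of HTML."""
    parser = ReviewParser()
    parser.feed(html)
    return list(zip(parser.comments, parser.scores))


reviews = parse_reviews(SAMPLE_PAGE)
no_reviews = parse_reviews("<html><body></body></html>")
```

An empty result, as in `no_reviews`, corresponds to the condition that ends the page loop in `tran2eng`. Scores come back as strings, so they need converting to numbers before class labeling.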

In [None]:
# download and translate comment to dataset
rama = tran2eng(url_rama)
siri = tran2eng(url_siri)
chula = tran2eng(url_chula)

In [None]:
# export data set
rama.to_csv(r"data\rama.csv", sep=';', index=False, encoding='utf-8', chunksize=100)
siri.to_csv(r"data\siri.csv", sep=';', index=False, encoding='utf-8', chunksize=100)
chula.to_csv(r"data\chula.csv", sep=';', index=False, encoding='utf-8', chunksize=100)

In [27]:
# import data set
# (quick way to recreate the initial data set from the exported files)
rama = pd.read_csv(r"data\rama.csv", sep = ';')
siri = pd.read_csv(r"data\siri.csv", sep = ';')
chula = pd.read_csv(r"data\chula.csv", sep = ';')