# Service Quality Monitoring in Confined Spaces Through Mining Twitter Data
The proposed method comprises two main tasks.


## Task 1: Aspect Extraction Using a Fine-Tuned BERT Language Model
In this notebook, we fine-tune BERT to encode each tweet as a contextual vector representation. A multi-label classification layer (one sigmoid output per label, i.e., one binary classifier per aspect) then assigns each tweet to one or more semantically related groups, which in our application are service quality aspects.
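
The prediction logs further down report `num_labels:7` and a `Sigmoid` probability tensor, which is the usual multi-label setup: each aspect receives an independent probability, and a tweet is tagged with every aspect whose probability clears a threshold. The following is a minimal stdlib sketch of that decoding step. The 0.5 threshold, the hypothetical `decode_multilabel` helper, and the assumption that label indices follow the class order printed by `report_on_classes()` are illustrative choices; the notebook's `BERT` class may decode its outputs differently.

```python
import math

# Assumed label order: the class order printed by report_on_classes().
ASPECTS = ["Safety", "CleanlinessView", "Information", "Service",
           "Comfort", "PersonnelCard", "Additional"]

def sigmoid(x):
    """Logistic function mapping a logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def decode_multilabel(logits, threshold=0.5):
    """Return every aspect whose independent sigmoid probability >= threshold.

    Unlike softmax, the probabilities do not compete with each other, so a
    tweet can receive zero, one, or several aspects.
    """
    probs = [sigmoid(z) for z in logits]
    return [name for name, p in zip(ASPECTS, probs) if p >= threshold]

# Hypothetical logits for a single tweet (one value per aspect).
example_logits = [2.1, -3.0, -1.2, 1.4, -0.3, -2.5, -4.0]
print(decode_multilabel(example_logits))  # ['Safety', 'Service']
```

Because every label is thresholded independently, the threshold can also be tuned per aspect. That may help rare aspects such as `PersonnelCard`, whose recall is low in the reports below.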

# Setup

In [20]:
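import os
import warnings

import pandas as pd

warnings.filterwarnings("ignore")

# Project-local helper modules: CSV loading, train/test preparation,
# BERT input structures, and the BERT training/prediction wrapper.
from data_from_csv import *
from getting_data_ready import *
from PaddingInputExample import *
from InputExample import *
from InputFeatures import *
from BERT import *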

# Training Phase

## Loading Tweets

In [2]:
csv_file = data_from_csv(os.path.join("Data",'SouthernCross.csv'))
data = csv_file.read_csv()
csv_file.report_on_classes()

Safety:  134
CleanlinessView:  133
Information:  146
Service:  739
Comfort:  196
PersonnelCard:  66
Additional:  205


In [3]:
data_ready = getting_data_ready(data, 0.1)
X_train, y_train, X_test, y_test = data_ready.splitting_data()

## Preparing the Training Data

In [4]:
X_resampled, y_resampled = data_ready.resampling_data(X_train, y_train)
x_train_resampled = data_ready.resampled_to_table(X_resampled, y_resampled)
data_ready.report_on_resampled_classes()

Safety:  4200
CleanlinessView:  4200
Information:  4200
Service:  6000
Comfort:  4200
PersonnelCard:  4800
Additional:  2400
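
The resampled counts are much larger than the original ones but are not equal across aspects. The notebook's `resampling_data` method is not shown, so its exact strategy is unknown. As a hedged illustration of one relevant effect, the sketch below implements naive random oversampling with a hypothetical `oversample` helper (not the notebook's method). In multi-label data, duplicating a tweet to boost one aspect also boosts every other aspect that tweet carries, so per-aspect counts can overshoot the target.

```python
import random
from collections import Counter

def label_counts(samples):
    """Count how many samples carry each label (a sample may carry several)."""
    counts = Counter()
    for _, labels in samples:
        counts.update(labels)
    return counts

def oversample(samples, target, seed=0):
    """Naive random oversampling for multi-label data (illustrative sketch).

    For each label in turn, duplicate randomly chosen samples that carry it
    until its count reaches `target`. A duplicated sample also carries its
    other labels, so those counts grow as well and can overshoot `target`.
    """
    rng = random.Random(seed)
    result = list(samples)
    counts = label_counts(result)
    for label in sorted(counts):
        pool = [s for s in samples if label in s[1]]
        while counts[label] < target:
            pick = rng.choice(pool)
            result.append(pick)
            counts.update(pick[1])  # every label of the copy is incremented
    return result

# Hypothetical toy tweets: (tweet id, set of aspect labels).
toy = [("t1", {"Service"}), ("t2", {"Service", "Comfort"}),
       ("t3", {"Safety"}), ("t4", {"Service"})]
balanced = oversample(toy, target=4)
print(dict(sorted(label_counts(balanced).items())))
# {'Comfort': 4, 'Safety': 4, 'Service': 6}
```

Boosting `Comfort` duplicates the only tweet that carries it, and that tweet is also labelled `Service`. As a result, `Service` ends above the target even though it was never oversampled directly.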


## Preparing the Test Data

In [5]:
test_data = data_ready.to_table(X_test, y_test)

## BERT Model

In [6]:
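# Pre-trained BERT-Base, Uncased checkpoint (12 layers, hidden size 768, 12 attention heads).
# os.path.join(..., "") keeps the trailing separator of the original Windows path portably.
bert_path = os.path.join("uncased_L-12_H-768_A-12", "")
output_path = 'bert-output'  # fine-tuned checkpoints are written here
bert_class = BERT(bert_path, output_path)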




### Training the Model

In [7]:
# Dependency versions pinned for this TensorFlow environment (install if needed):
# pip install google-pasta==0.1.6
# pip install gast==0.2.2

In [8]:
bert_class.train_bert(x_train_resampled)



INFO:tensorflow:***** Running training *****
INFO:tensorflow:  Num examples = 16800
INFO:tensorflow:  Batch size = 32
INFO:tensorflow:  Num steps = 525

INFO:tensorflow:Using config: {'_model_dir': 'bert-output', '_tf_random_seed': None, '_save_summary_steps': 500, '_save_checkpoints_steps': 1000, '_save_checkpoints_secs': None, '_session_config': allow_soft_placement: true
graph_options {
  rewrite_options {
    meta_optimizer_iterations: ONE
  }
}
, '_keep_checkpoint_max': 1, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 100, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_experimental_max_worker_delay_secs': None, '_service': None, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x000001EE226C99C8>, '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_

INFO:tensorflow:global_step/sec: 0.0382175
INFO:tensorflow:loss = 0.06770197, step = 201 (2616.603 sec)
INFO:tensorflow:global_step/sec: 0.0382161
INFO:tensorflow:loss = 0.056423012, step = 301 (2616.699 sec)
INFO:tensorflow:global_step/sec: 0.0382181
INFO:tensorflow:loss = 0.07670855, step = 401 (2616.560 sec)
INFO:tensorflow:global_step/sec: 0.0381699
INFO:tensorflow:loss = 0.05395896, step = 501 (2619.862 sec)
INFO:tensorflow:Saving checkpoints for 525 into bert-output\model.ckpt.
Instructions for updating:
Use standard file APIs to delete files with this prefix.
INFO:tensorflow:Loss for final step: 0.035924464.
Training took time  3:56:16.682455
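
The figures in this log are internally consistent, and a short calculation shows what they imply. This assumes the step count follows the usual BERT fine-tuning convention `num_train_steps = int(num_examples / batch_size * num_epochs)`, as in Google's reference `run_classifier.py`; the notebook's `BERT` class itself is not shown. Under that assumption, 16,800 examples at batch size 32 give exactly the logged 525 steps, i.e. one epoch. At the logged rate of about 0.0382 steps/s, the step time alone comes to roughly 3 h 49 min. The helper names below are hypothetical.

```python
from datetime import timedelta

def train_steps(num_examples, batch_size, num_epochs):
    """Optimizer steps under the int(examples / batch * epochs) convention."""
    return int(num_examples / batch_size * num_epochs)

def estimated_duration(steps, steps_per_sec):
    """Wall-clock estimate from the global_step/sec rate logged by TensorFlow."""
    return timedelta(seconds=steps / steps_per_sec)

steps = train_steps(16800, 32, num_epochs=1.0)
print(steps)                                   # 525
print(estimated_duration(steps, 0.0382175))    # about 3 h 49 min of step time
print(estimated_duration(100, 0.0382175))      # about 2616.6 s per 100 steps
```

The remaining few minutes up to the reported 3:56:16 are plausibly set-up and checkpointing overhead.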


### Testing the Model

In [9]:
test_labels = bert_class.testing_model(test_data)

Beginning Predictions!
Prediction took time  0:00:00.000995
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:num_labels:7;logits:Tensor("loss/BiasAdd:0", shape=(?, 7), dtype=float32);labels:Tensor("loss/Cast:0", shape=(?, 7), dtype=float32)
INFO:tensorflow:**** Trainable Variables ****
mode: infer probabilities: Tensor("loss/Sigmoid:0", shape=(?, 7), dtype=float32)
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Restoring parameters from bert-output\model.ckpt-525
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
Classification report: 
               precision    recall  f1-score   support

           0       0.75      0.69      0.72        13
           1       0.67      0.46      0.55        13
           2       0.60      0.60      0.60        15
           3       0.87      0.89      0.88        74
           4       0.64      0.45      0.53        20
           5       0.50      0.14      0.22         7
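
The report identifies labels only by numeric index, and the row for index 6 and the average rows are not shown in this output. Comparing the supports with the class counts printed when loading `SouthernCross.csv` suggests that the indices follow the same aspect order. With the 0.1 fraction passed to `getting_data_ready`, 10% of 134, 133, 146, 739, 196 and 66 rounds to 13, 13, 15, 74, 20 and 7. The check below assumes that ordering and simple rounding; the notebook's actual split procedure may differ.

```python
# Per-aspect counts printed by report_on_classes() for SouthernCross.csv
# (index 6, "Additional", is omitted because its report row is not shown).
class_counts = {"Safety": 134, "CleanlinessView": 133, "Information": 146,
                "Service": 739, "Comfort": 196, "PersonnelCard": 66}
# Supports for label indices 0-5 in the classification report above.
report_supports = [13, 13, 15, 74, 20, 7]

TEST_FRACTION = 0.1  # second argument passed to getting_data_ready

expected = [round(c * TEST_FRACTION) for c in class_counts.values()]
print(expected)                     # [13, 13, 15, 74, 20, 7]
print(expected == report_supports)  # True

# Index -> aspect name, assuming label order matches the CSV class order.
index_to_aspect = dict(enumerate(class_counts))
print(index_to_aspect[3])           # Service
```

Read this way, Service (index 3) is the best-recognised aspect, and PersonnelCard (index 5), with only 7 test tweets, is the weakest.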

# Evaluation Phase

In [None]:
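def get_classification_report_as_df(report):
    """Parse the text output of sklearn's classification_report into a
    DataFrame with one row per label/average line and columns P, R, F."""
    df = pd.DataFrame(columns=['P', 'R', 'F'])
    for line in report.splitlines():
        tokens = line.split()
        # Data rows look like "<label> P R F support"; average labels can be two
        # words (e.g. "micro avg"), so read the metrics from the end of the line.
        # The header and blank lines have fewer than 5 tokens and are skipped,
        # so this does not depend on a hard-coded number of labels.
        if len(tokens) < 5:
            continue
        df.loc[len(df)] = [float(x) for x in tokens[-4:-1]]
    return df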


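def mean_std_from_results(reports_dic):
    """Aggregate per-fold report tables into per-row mean and standard deviation.

    reports_dic maps a fold identifier to the DataFrame returned by
    get_classification_report_as_df; every fold must have the same rows.
    """
    mean_result = pd.DataFrame(columns=['P', 'R', 'F'])
    std_result = pd.DataFrame(columns=['P', 'R', 'F'])
    # Rows per report, taken from any fold (fold keys need not start at 1).
    n_rows = len(next(iter(reports_dic.values())))
    for j in range(n_rows):  # for each aspect / average row
        df = pd.DataFrame(columns=['P', 'R', 'F'])
        for fold in reports_dic:  # for each fold
            df.loc[len(df)] = list(reports_dic[fold].loc[j])
        df = df.apply(pd.to_numeric)
        mean_result.loc[len(mean_result)] = df.mean()
        std_result.loc[len(std_result)] = df.std()  # sample std (ddof=1), the pandas default
    return mean_result, std_result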
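
`mean_std_from_results` takes one report table per fold and returns, for every label row, the mean and standard deviation of P, R and F across folds. pandas' `DataFrame.std()` computes the sample standard deviation (`ddof=1`) by default. The sketch below performs the same aggregation on plain lists with the standard library. The `mean_std_per_label` helper and its `{fold: [score per label]}` layout are hypothetical, and the fold scores are made up purely to show the computation.

```python
import statistics

def mean_std_per_label(fold_scores):
    """fold_scores: {fold_id: [score_label0, score_label1, ...]}.

    Returns (means, stdevs), one entry per label, aggregated across folds.
    statistics.stdev is the sample standard deviation (n - 1 denominator),
    matching pandas' default DataFrame.std(ddof=1).
    """
    folds = list(fold_scores.values())
    means, stdevs = [], []
    for j in range(len(folds[0])):
        values = [fold[j] for fold in folds]
        means.append(statistics.mean(values))
        stdevs.append(statistics.stdev(values))
    return means, stdevs

# Made-up F1 scores for two labels over three folds (illustration only).
toy_folds = {1: [0.70, 0.50], 2: [0.74, 0.56], 3: [0.72, 0.53]}
means, stdevs = mean_std_per_label(toy_folds)
print([round(m, 3) for m in means])   # [0.72, 0.53]
print([round(s, 3) for s in stdevs])  # [0.02, 0.03]
```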

## Loading Tweets

In [17]:
eval_csv_file = data_from_csv(os.path.join("Data",'Flinders.csv'))
eval_data = eval_csv_file.read_csv()
eval_csv_file.report_on_classes()

Safety:  75
CleanlinessView:  21
Information:  194
Service:  948
Comfort:  117
PersonnelCard:  107
Additional:  16


## Evaluating Predictions

In [24]:
eval_data = eval_data.reset_index(drop=True)
eval_labels = bert_class.testing_model(eval_data)

Beginning Predictions!
Prediction took time  0:00:00.000997
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:num_labels:7;logits:Tensor("loss/BiasAdd:0", shape=(?, 7), dtype=float32);labels:Tensor("loss/Cast:0", shape=(?, 7), dtype=float32)
INFO:tensorflow:**** Trainable Variables ****
mode: infer probabilities: Tensor("loss/Sigmoid:0", shape=(?, 7), dtype=float32)
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Restoring parameters from bert-output\model.ckpt-525
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
Classification report: 
               precision    recall  f1-score   support

           0       0.86      0.41      0.56        75
           1       0.43      0.29      0.34        21
           2       0.68      0.83      0.75       194
           3       0.92      0.92      0.92       948
           4       0.63      0.43      0.51       117
           5       0.74      0.50      0.59       107
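
In both prediction cells, `Prediction took time` is printed before TensorFlow logs `Calling model_fn`. This suggests that the sub-millisecond figure only measures the creation of the prediction generator: `tf.estimator.Estimator.predict` returns a lazy generator, and the model only runs while its results are consumed. The stdlib sketch below reproduces the pitfall, using a hypothetical `slow_predictions` generator in place of the estimator.

```python
import time

def slow_predictions(n, delay=0.05):
    """Hypothetical stand-in for Estimator.predict: a lazy generator."""
    for i in range(n):
        time.sleep(delay)  # simulated per-example inference cost
        yield i

start = time.perf_counter()
preds = slow_predictions(4)   # nothing has run yet
created = time.perf_counter() - start

start = time.perf_counter()
results = list(preds)         # the work happens here, during consumption
consumed = time.perf_counter() - start

print(f"creation: {created:.4f}s, consumption: {consumed:.2f}s")
```

To obtain a meaningful prediction time, place the timer around the loop or `list(...)` call that consumes the predictions, not around the `predict` call itself.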