# Sentiment Analysis in Mandarin on Food Delivery Reviews

---

[Article](https://news.machinelearning.sg/posts/sentiment_analysis_in_mandarin_with_xlnet) | [Github](https://github.com/eugenesiow/practical-ml/blob/master/notebooks/Sentiment_Analysis_Mandarin_Food_Reviews.ipynb) | More Notebooks @ [eugenesiow/practical-ml](https://github.com/eugenesiow/practical-ml)

---



This notebook trains a Mandarin XLNet model to perform sentiment analysis. The [dataset](https://github.com/SophonPlus/ChineseNlpCorpus#%E6%83%85%E6%84%9F%E8%A7%82%E7%82%B9%E8%AF%84%E8%AE%BA-%E5%80%BE%E5%90%91%E6%80%A7%E5%88%86%E6%9E%90) used is the unbalanced WAIMAI_10K: food delivery reviews from a food delivery platform in China (nominally 10,000; the copy loaded below has 11,987 rows). The dataset has binary labels: **`positive`** or **`negative`**. We know of no published state-of-the-art model on this dataset, but there have been attempts using [BERT](https://github.com/BruceJust/Sentiment-classification-by-BERT) and sklearn's [SVM-SVC](https://www.programmersought.com/article/48933926195/), which report accuracies of about 89% and 85% respectively. We will train a state-of-the-art model with an accuracy of 91.5% and an F1-score of 87.1%. F1-score is the better measure because the dataset is unbalanced, but since the two previous attempts report accuracy, we report it as well for comparison.

The notebook is structured as follows:
* Setting up the GPU Environment
* Getting Data
* Training and Testing the Model
* Using the Model (Running Inference)

## Task Description

> Sentiment analysis is the task of classifying the polarity of a given text.

# Setting up the GPU Environment

#### Ensure we have a GPU runtime

If you're running this notebook in Google Colab, select `Runtime` > `Change Runtime Type` from the menubar. Ensure that `GPU` is selected as the `Hardware accelerator`. This will allow us to use the GPU to train the model subsequently.
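
Before installing anything, it can help to confirm that the runtime really has a GPU attached. The sketch below is one way to do that using only the standard library. It assumes an NVIDIA GPU whose driver provides the `nvidia-smi` command-line tool, and `gpu_available` is a hypothetical helper name, not part of any library used in this notebook.

```python
import shutil
import subprocess

def gpu_available():
    """Return True if `nvidia-smi -L` runs successfully and lists at least one GPU."""
    # Assumption: an NVIDIA driver is what exposes the GPU, so `nvidia-smi` is on PATH.
    if shutil.which("nvidia-smi") is None:
        return False
    try:
        result = subprocess.run(
            ["nvidia-smi", "-L"], capture_output=True, text=True, timeout=30
        )
    except (OSError, subprocess.SubprocessError):
        return False
    return result.returncode == 0 and "GPU" in result.stdout

print("GPU detected" if gpu_available() else "No GPU detected, check the runtime type")
```

If this reports no GPU, revisit the `Hardware accelerator` setting before moving on to training.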

#### Install Dependencies and Restart Runtime

In [1]:
!pip install -q transformers
!pip install -q simpletransformers


You might see the error `ERROR: google-colab X.X.X has requirement ipykernel~=X.X, but you'll have ipykernel X.X.X which is incompatible` after installing the dependencies. **This is normal** and caused by the `simpletransformers` library.

The **solution** to this will be to **reset the execution environment** now. Go to the menu `Runtime` > `Restart runtime` then continue on from the next section to download and process the data.

# Getting Data

#### Pulling the data from Github

We load the data from the [ChineseNlpCorpus](https://github.com/SophonPlus/ChineseNlpCorpus#%E6%83%85%E6%84%9F%E8%A7%82%E7%82%B9%E8%AF%84%E8%AE%BA-%E5%80%BE%E5%90%91%E6%80%A7%E5%88%86%E6%9E%90) GitHub repository into a `pandas` dataframe, then display the first few rows with `.head()` to check that it downloaded correctly.

In [2]:
import pandas as pd
data_df = pd.read_csv('https://raw.githubusercontent.com/SophonPlus/ChineseNlpCorpus/master/datasets/waimai_10k/waimai_10k.csv', usecols=['label','review'])
data_df = data_df.rename(columns={'review': 'text', 'label': 'labels'})
data_df.head()

|   | labels | text |
|---|--------|------|
| 0 | 1 | 很快，好吃，味道足，量大 |
| 1 | 1 | 没有送水没有送水没有送水 |
| 2 | 1 | 非常快，态度好。 |
| 3 | 1 | 方便，快捷，味道可口，快递给力 |
| 4 | 1 | 菜味道很棒！送餐很及时！ |


We split the dataset into a training set (80% of the samples) and a test set (20% of the samples). We also choose a fixed value for `fixed_random_state` so that this split is deterministic (always the same samples). 
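
To see why a fixed seed makes the split reproducible, here is a toy sketch using only the standard library's `random.Random`. It is not how `train_test_split` is implemented, and `toy_split` is a hypothetical helper written for this illustration.

```python
import random

def toy_split(items, test_size, seed):
    """Shuffle a copy of `items` with a seeded generator, then cut off the last `test_size` fraction."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # same seed -> same shuffle order
    cut = len(shuffled) - round(len(shuffled) * test_size)
    return shuffled[:cut], shuffled[cut:]

reviews = [f"review_{i}" for i in range(10)]  # placeholder items, not real data
train_a, test_a = toy_split(reviews, test_size=0.2, seed=5)
train_b, test_b = toy_split(reviews, test_size=0.2, seed=5)

print(train_a == train_b and test_a == test_b)  # True: the same seed gives the same split
print(len(train_a), len(test_a))                # 8 2
```

Changing the seed would generally give a different, but equally valid, 80/20 split.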

We can then check the dataset properties (6,387 train negative, 3,202 train positive, 1,600 test negative and 798 test positive, an unbalanced dataset). The label **`0`** is the **`negative`** polarity class while **`1`** is the **`positive`** polarity class.

In [4]:
from sklearn.model_selection import train_test_split
fixed_random_state = 5
train_df, test_df = train_test_split(data_df, test_size=0.2, random_state=fixed_random_state)

data = [[train_df.labels.value_counts()[0], test_df.labels.value_counts()[0]], 
        [train_df.labels.value_counts()[1], test_df.labels.value_counts()[1]]]

# Prints out the dataset sizes of train and test sets per label.
pd.DataFrame(data, columns=["Train", "Test"])

|   | Train | Test |
|---|-------|------|
| 0 | 6387 | 1600 |
| 1 | 3202 | 798 |
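
To make the imbalance concrete, the short sketch below turns the counts from the table above into class shares. It also computes the accuracy of a trivial classifier that always predicts the majority class, which is a useful floor when reading the accuracy figures later on. The helper name `class_shares` is hypothetical.

```python
# Per-class counts copied from the table above.
counts = {
    "train": {"negative": 6387, "positive": 3202},
    "test": {"negative": 1600, "positive": 798},
}

def class_shares(split_counts):
    """Return each label's fraction of the samples in one split."""
    total = sum(split_counts.values())
    return {label: n / total for label, n in split_counts.items()}

for split, split_counts in counts.items():
    shares = class_shares(split_counts)
    print(split, {label: round(share, 3) for label, share in shares.items()})

# Accuracy of a trivial classifier that always predicts the majority class.
test_counts = counts["test"]
majority_baseline = max(test_counts.values()) / sum(test_counts.values())
print(f"majority-class test accuracy: {majority_baseline:.3f}")
```

Roughly two thirds of both splits are negative, so always answering `negative` would already score about 67% accuracy on the test set.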


# Training and Testing the Model

#### Set up the Training Arguments

We set up the training arguments. We train for only 2 epochs to keep the training time short; the BERT article on this dataset trained for 10 epochs but did not see much gain in overall accuracy. It is also possible to split out a development set and use it to select a better model. This dataset is quite small, though, and we are confident we can get good accuracy with just 2 epochs (we are impatient).

In [12]:
train_args = {
    'reprocess_input_data': True,
    'overwrite_output_dir': True,
    'sliding_window': True,
    'max_seq_length': 64,
    'num_train_epochs': 2,
    'train_batch_size': 128,
    'fp16': True,
    'output_dir': '/outputs/',
}
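
As a rough sanity check on these settings, the sketch below estimates how many training batches to expect per epoch. It uses the feature count that the training log further down reports ("10105 features created from 9589 samples", more features than samples because `sliding_window` splits long reviews). It assumes one optimizer step per batch (no gradient accumulation) and that the final partial batch is kept.

```python
import math

train_batch_size = 128  # from train_args
num_train_epochs = 2    # from train_args
num_features = 10105    # from the training log: "10105 features created from 9589 samples"

# Assumption: the last, smaller batch is not dropped, hence the ceiling.
steps_per_epoch = math.ceil(num_features / train_batch_size)
total_steps = steps_per_epoch * num_train_epochs
print(steps_per_epoch, total_steps)  # 79 158
```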

#### Train the Model

Once we have set up the `train_args` dictionary, the next step is to train the model. We use the pre-trained Mandarin XLNet model, [`hfl/chinese-xlnet-mid`](https://huggingface.co/hfl/chinese-xlnet-mid), from the awesome [Hugging Face Transformers](https://github.com/huggingface/transformers) library and model repository as the base. On top of it we use the [Simple Transformers library](https://simpletransformers.ai/docs/classification-models/), which lets us train the classification model with just 2 lines of code. The pre-trained Mandarin base model is by [HFL](https://huggingface.co/hfl), with more details at this [repository](https://github.com/ymcui/Chinese-XLNet).

[XLNet](https://arxiv.org/pdf/1906.08237.pdf) is an auto-regressive language model which outputs the joint probability of a sequence of tokens, based on the transformer architecture with recurrence. Although it is also bigger than BERT and has a (slightly) different architecture, its change in training objective is probably its biggest contribution. Its training objective is to predict each word in a sequence using any combination of the other words in that sequence, which seems to perform better on ambiguous contexts.
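
The idea of "any combination of other words" can be pictured with a toy enumeration of prediction orders. The sketch below is only an illustration of that idea, not XLNet's implementation: the real model samples orders and realizes them through attention masks rather than enumerating them, and `visible_context` is a hypothetical helper.

```python
from itertools import permutations

def visible_context(order):
    """Map each position to the positions already predicted before it in this order."""
    context = {}
    seen = []
    for position in order:
        context[position] = sorted(seen)
        seen.append(position)
    return context

tokens = ["很", "好", "吃"]  # toy three-token sequence
for order in permutations(range(len(tokens))):
    context = visible_context(order)
    parts = []
    for position in order:
        visible = "".join(tokens[i] for i in context[position]) or "(nothing)"
        parts.append(f"{tokens[position]} given {visible}")
    print(order, "->", "; ".join(parts))
```

Across the six orders, every token ends up being predicted from every possible subset of the other tokens, including context to its right, which a plain left-to-right language model never sees.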

In [13]:
from simpletransformers.classification import ClassificationModel
import pandas as pd
import logging
import sklearn.metrics  # import the submodule explicitly so sklearn.metrics.* resolves

logging.basicConfig(level=logging.DEBUG)
transformers_logger = logging.getLogger('transformers')
transformers_logger.setLevel(logging.WARNING)

# We use the pre-trained Chinese XLNet (mid) model from HFL as the base.
model = ClassificationModel('xlnet', 'hfl/chinese-xlnet-mid', num_labels=2, args=train_args) 

# Train the model, there is no development or validation set for this dataset 
# https://simpletransformers.ai/docs/tips-and-tricks/#using-early-stopping
model.train_model(train_df)

# Evaluate the model in terms of F1-score (reported under the 'acc' key of the results)
result, model_outputs, wrong_predictions = model.eval_model(test_df, acc=sklearn.metrics.f1_score)

DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): huggingface.co:443
DEBUG:urllib3.connectionpool:https://huggingface.co:443 "HEAD /hfl/chinese-xlnet-mid/resolve/main/config.json HTTP/1.1" 200 0
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): huggingface.co:443
DEBUG:urllib3.connectionpool:https://huggingface.co:443 "HEAD /hfl/chinese-xlnet-mid/resolve/main/pytorch_model.bin HTTP/1.1" 302 0
Some weights of the model checkpoint at hfl/chinese-xlnet-mid were not used when initializing XLNetForSequenceClassification: ['lm_loss.weight', 'lm_loss.bias']
- This IS expected if you are initializing XLNetForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing XLNetForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a B


INFO:simpletransformers.classification.classification_model: 10105 features created from 9589 samples.










INFO:simpletransformers.classification.classification_model: Training of xlnet model complete. Saved to /outputs/.
INFO:simpletransformers.classification.classification_model: Converting to features started. Cache is not used.
INFO:simpletransformers.classification.classification_model: Sliding window enabled



INFO:simpletransformers.classification.classification_model: 2398 features created from 2398 samples.









INFO:simpletransformers.classification.classification_model:{'mcc': 0.8071975326521607, 'tp': 693, 'tn': 1500, 'fp': 100, 'fn': 105, 'acc': 0.8711502199874293, 'eval_loss': 0.23408312606919982}


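The F1-score for the model is **87.1%**. It appears under the `'acc'` key in the log above because `acc` is the keyword name under which we passed `sklearn.metrics.f1_score` to `eval_model`.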

As mentioned earlier, the class distribution (the number of **`positive`** vs **`negative`** samples) is not balanced, so [F1-score is a more informative measure than accuracy](https://sebastianraschka.com/faq/docs/computing-the-f1-score.html).
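
The F1-score can be checked by hand from the confusion counts in the evaluation log (`tp`, `fp`, `fn`). A minimal sketch, treating **`positive`** (label `1`) as the class of interest:

```python
# Counts taken from the evaluation log above.
tp, fp, fn = 693, 100, 105

precision = tp / (tp + fp)  # of the reviews predicted positive, how many really are
recall = tp / (tp + fn)     # of the truly positive reviews, how many were found
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two

print(f"precision={precision:.4f} recall={recall:.4f} f1={f1:.4f}")
```

The resulting F1 agrees with the value the log reports. Unlike accuracy, it ignores the 1,500 true negatives, so it is not inflated by the majority class.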

Previous articles, however, published accuracy on the test/validation set. Hence, we will also calculate the accuracy score of our model.

In [14]:
result, model_outputs, wrong_predictions = model.eval_model(test_df, acc=sklearn.metrics.accuracy_score)

INFO:simpletransformers.classification.classification_model: Converting to features started. Cache is not used.
INFO:simpletransformers.classification.classification_model: Sliding window enabled



INFO:simpletransformers.classification.classification_model: 2398 features created from 2398 samples.









INFO:simpletransformers.classification.classification_model:{'mcc': 0.8071975326521607, 'tp': 693, 'tn': 1500, 'fp': 100, 'fn': 105, 'acc': 0.914512093411176, 'eval_loss': 0.23408312606919982}


We see that the accuracy score from the model after training for 2 epochs is **91.5%** ('acc': 0.914512093411176).
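
Both the accuracy and the Matthews correlation coefficient (`mcc`) in the log can be reproduced from the four confusion counts. A short worked sketch:

```python
import math

# Counts taken from the evaluation log above.
tp, tn, fp, fn = 693, 1500, 100, 105

# Accuracy: the share of all test reviews that were classified correctly.
accuracy = (tp + tn) / (tp + tn + fp + fn)

# Matthews correlation coefficient: uses all four cells of the confusion matrix.
mcc = (tp * tn - fp * fn) / math.sqrt(
    (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
)

print(f"accuracy={accuracy:.4f} mcc={mcc:.4f}")
```

Accuracy comes out higher than the F1-score because it also credits the many correctly classified negative reviews.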

> We've just trained a new state-of-the-art mandarin sentiment analysis model on the WAIMAI_10K dataset of food delivery reviews!

# Using the Model (Running Inference)

Running the model to do some predictions/inference is as simple as calling `model.predict(input_list)`.

In [18]:
samples = ['送错地方了，态度还不好，豆腐脑撒的哪都是，本次用餐体验很不好', # food was sent to the wrong place and the attitude was bad...
           '很不错，服务非常好，很认真'] # really quite good, service was very good, very sincere
predictions, _ = model.predict(samples)
label_dict = {0: 'negative', 1: 'positive'}
for idx, sample in enumerate(samples):
  print('{} - {}: {}'.format(idx, label_dict[predictions[idx]], sample))

INFO:simpletransformers.classification.classification_model: Converting to features started. Cache is not used.
INFO:simpletransformers.classification.classification_model: Sliding window enabled



INFO:simpletransformers.classification.classification_model: 2 features created from 2 samples.







0 - negative: 送错地方了，态度还不好，豆腐脑撒的哪都是，本次用餐体验很不好
1 - positive: 很不错，服务非常好，很认真
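
The second value returned by `model.predict` (discarded as `_` above) holds the raw model outputs. As a sketch of how such scores could be turned into a confidence value, the example below applies a softmax to a pair of made-up logits. The numbers are hypothetical, and the exact shape of the raw outputs is an assumption: with `sliding_window` enabled, the library may return one set of scores per window rather than one per review.

```python
import math

def softmax(logits):
    """Convert a list of raw scores into probabilities that sum to 1."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]  # subtract the max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]

label_dict = {0: 'negative', 1: 'positive'}
hypothetical_logits = [-2.1, 1.7]  # made-up scores for one review, not real model output

probs = softmax(hypothetical_logits)
best = max(range(len(probs)), key=probs.__getitem__)  # index of the most likely label
print(label_dict[best], round(probs[best], 3))
```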


You can connect to Google Drive with the following code to save any files you want to persist. You can also click the `Files` icon on the left panel and click `Mount Drive` to mount your Google Drive.

The root of your Google Drive will be mounted to `/content/drive/My Drive/`. If you have problems mounting the drive, you can check out this [tutorial](https://towardsdatascience.com/downloading-datasets-into-google-drive-via-google-colab-bcb1b30b0166).

In [None]:
from google.colab import drive
drive.mount('/content/drive/')

You can move the model checkpoint files, which are saved in the `/outputs/` directory, to your Google Drive.

In [None]:
import shutil
shutil.move('/outputs/', "/content/drive/My Drive/outputs/")

More Notebooks @ [eugenesiow/practical-ml](https://github.com/eugenesiow/practical-ml) and do drop us some feedback on how to improve the notebooks on the [Github repo](https://github.com/eugenesiow/practical-ml/).