## Custom Text Classification using TensorFlow Lite

#### **Install TF Lite Model Maker**
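Install the **TensorFlow Lite Model Maker** library (`pip install tflite-model-maker`), then import it along with the other libraries used in this notebook. Model Maker makes it easy to train models on a custom dataset, and it reduces training time by applying transfer learning to pre-trained models.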

In [5]:
import numpy as np
from numpy.random import RandomState
import pandas as pd
import os

from tflite_model_maker import model_spec
from tflite_model_maker import text_classifier
from tflite_model_maker.config import ExportFormat
from tflite_model_maker.config import QuantizationConfig
from tflite_model_maker.text_classifier import AverageWordVecSpec
from tflite_model_maker.text_classifier import DataLoader

import tensorflow as tf
assert tf.__version__.startswith('2')
tf.get_logger().setLevel('ERROR')





#### **Import dataset**
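Read the training and test datasets from CSV files into pandas DataFrames. These DataFrames are only used to inspect the data; Model Maker reads the CSV files directly when the training and test data are loaded later.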

In [None]:
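# on_bad_lines='skip' replaces error_bad_lines=False, which was deprecated in pandas 1.3 and removed in 2.0.
# On pandas < 1.3, keep error_bad_lines=False instead.
# encoding='utf-8-sig' strips a UTF-8 byte-order mark (BOM) from the start of the file if one is present.
df_train = pd.read_csv('processed_train_simple.csv', on_bad_lines='skip', engine="python", encoding='utf-8-sig')

df_test = pd.read_csv('processed_test_simple.csv', on_bad_lines='skip', engine="python", encoding='utf-8-sig')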
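The `encoding='utf-8-sig'` argument matters when a CSV was saved with a UTF-8 byte-order mark (BOM), which Excel commonly adds. The stdlib-only sketch below uses hypothetical in-memory data to show what the BOM does to the header, and it also skips malformed rows as a simplified stand-in for `on_bad_lines='skip'` (pandas only flags rows with too many fields; this sketch skips any row whose field count differs from the header).

```python
import csv
import io

# Hypothetical CSV bytes: a UTF-8 BOM, a header, two well-formed rows and one malformed row.
raw = (
    "\ufeffsentence,label\n"
    "first sample sentence,E1\n"
    "malformed,row,with,too,many,fields\n"
    "second sample sentence,E6\n"
).encode("utf-8")


def read_rows(data, encoding):
    """Decode and parse CSV bytes, skipping rows whose field count differs from the header."""
    reader = csv.reader(io.StringIO(data.decode(encoding)))
    header = next(reader)
    rows = [row for row in reader if len(row) == len(header)]
    return header, rows


header_plain, _ = read_rows(raw, "utf-8")       # plain UTF-8 keeps the BOM character
header_sig, rows = read_rows(raw, "utf-8-sig")  # utf-8-sig strips a leading BOM

print(repr(header_plain[0]), repr(header_sig[0]), len(rows))
```

With plain `utf-8`, the first column name carries an invisible `\ufeff` prefix, so a lookup such as `df['sentence']` would fail if that column came first.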


#### **View dataset**
Check that each dataset was imported correctly.

In [None]:
df_train.head()

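|   | Unnamed: 0.1 | Unnamed: 0 | sentence | label | sentence (English translation) |
|---|---|---|---|---|---|
| 0 | 0 | 0 | 아내가 드디어 출산하게 되어서 정말 신이 나. | E6 | My wife is finally giving birth, so I'm really excited. |
| 1 | 1 | 1 | 당뇨랑 합병증 때문에 먹어야 할 약이 열 가지가 넘어가니까 스트레스야. | E3 | I have to take more than ten kinds of medicine for diabetes and its complications, and it's stressing me out. |
| 2 | 2 | 2 | 고등학교에 올라오니 중학교 때보다 수업이 갑자기 어려워져서 당황스러워. | E5 | Now that I'm in high school, classes have suddenly become harder than in middle school, and I'm flustered. |
| 3 | 3 | 3 | 재취업이 돼서 받게 된 첫 월급으로 온 가족이 외식을 할 예정이야. 너무 행복해. | E6 | I got re-employed, and the whole family is going out to eat with my first paycheck. I'm so happy. |
| 4 | 4 | 4 | 빚을 드디어 다 갚게 되어서 이제야 안도감이 들어. | E6 | I've finally paid off all my debt, and now I feel relieved. |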


In [None]:
df_test.head()

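|   | Unnamed: 0.1 | Unnamed: 0 | sentence | label | sentence (English translation) |
|---|---|---|---|---|---|
| 0 | 0 | 0 | 요즘 부모님과 많이 부딪혀. | E1 | I've been clashing with my parents a lot lately. |
| 1 | 1 | 1 | 엄마가 결국 집을 나갔어. 너무 너무 슬퍼. | E2 | Mom ended up leaving home. I'm so, so sad. |
| 2 | 2 | 2 | 학교에서 한 친구를 괴롭히는 무리에게 그만하라고 했어. | E3 | At school, I told a group that was bullying a classmate to stop. |
| 3 | 3 | 3 | 이번에 팀장님이 간단한 조사 업무를 부탁하셨는데 내가 잘못 처리했어. 너무 절망적이야. | E5 | My team leader asked me to do a simple research task, and I handled it wrong. I feel so hopeless. |
| 4 | 4 | 4 | 남편이 이혼할 때 위자료를 주지 않으려고 변호사를 고용했어. | E1 | My husband hired a lawyer so he wouldn't have to pay alimony in the divorce. |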
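Beyond eyeballing `head()`, it helps to confirm that every label belongs to the expected set of classes and to look at how the classes are distributed; with pandas this is `df_train['label'].value_counts()`. The stdlib sketch below runs the same check on the five training labels previewed above. The expected label set (E1 through E6) is an assumption inferred from the previews, not a documented property of the dataset.

```python
from collections import Counter

# Labels from the five df_train.head() rows shown above.
preview_labels = ["E6", "E3", "E5", "E6", "E6"]

# Assumed label set, inferred from the previews (E1 through E6); adjust it to match your data.
expected = {f"E{i}" for i in range(1, 7)}

unexpected = [lab for lab in preview_labels if lab not in expected]
counts = Counter(preview_labels)

print("unexpected labels:", unexpected)
print("class counts:", dict(sorted(counts.items())))
```

A strongly skewed distribution is worth knowing about before training, because overall accuracy can then look good while rare classes are mostly misclassified.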


#### **Choose a model architecture**
Choose one model architecture and comment out the others. Each architecture is different and will give different results. `average_word_vec` is the fastest to train, while `mobilebert_classifier` and `bert_classifier` are much larger transformer models and take considerably longer. Experiment with different architectures to find the one that gives the best results on your data.

In [None]:
# spec = model_spec.get('average_word_vec')
spec = model_spec.get('mobilebert_classifier')
# spec = model_spec.get('bert_classifier')
# spec = AverageWordVecSpec(wordvec_dim=32)
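To see why `average_word_vec` trains much faster than the transformer-based options, here is a toy sketch of the idea behind an averaged word-embedding classifier: look up a vector for each token and average them into one fixed-size sentence vector, which a small classifier then maps to a label. The 4-dimensional vectors and the whitespace tokenizer are made up for illustration; Model Maker learns its embeddings (of size `wordvec_dim`) during training and applies its own preprocessing.

```python
# Toy embedding table (hypothetical values); a real model learns these during training.
embeddings = {
    "so": [0.1, 0.0, 0.2, 0.1],
    "happy": [0.9, 0.1, 0.0, 0.3],
    "sad": [-0.8, 0.2, 0.1, 0.0],
}
UNKNOWN = [0.0, 0.0, 0.0, 0.0]  # vector used for out-of-vocabulary tokens


def sentence_vector(text):
    """Average the embeddings of whitespace-separated tokens into one fixed-size vector."""
    tokens = text.lower().split()
    vectors = [embeddings.get(tok, UNKNOWN) for tok in tokens]
    dim = len(UNKNOWN)
    if not vectors:
        return [0.0] * dim
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


print(sentence_vector("so happy"))
```

Because the sentence representation is a single average, the model ignores word order; this is what makes it cheap to train compared with attention-based models such as MobileBERT.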


#### **Customize the MobileBERT model hyperparameters**

**Note:** Run this cell only if you have chosen the `MobileBERT Classifier` model architecture.

The model parameters you can adjust are:

* `seq_len`: Length of the sequence to feed into the model.
* `initializer_range`: The standard deviation of the `truncated_normal_initializer` for initializing all weight matrices.
* `trainable`: Boolean that specifies whether the pre-trained layer is trainable.
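`initializer_range` is the standard deviation of a truncated normal distribution. In TensorFlow's truncated normal initializer, values more than two standard deviations from the mean are discarded and re-drawn. The stdlib sketch below reproduces that sampling rule (it is not TensorFlow's implementation); the standard deviation 0.02 is only an example value.

```python
import random


def truncated_normal(n, stddev, mean=0.0, seed=0):
    """Draw n samples from N(mean, stddev^2), re-drawing any sample more than 2 stddev from the mean."""
    rng = random.Random(seed)
    samples = []
    while len(samples) < n:
        x = rng.gauss(mean, stddev)
        if abs(x - mean) <= 2 * stddev:  # keep only values inside the +/- 2 stddev band
            samples.append(x)
    return samples


weights = truncated_normal(1000, stddev=0.02)
print(min(weights), max(weights))
```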

The training pipeline parameters you can adjust are:

* `model_dir`: The location of the model checkpoint files. If not set, a temporary directory will be used.
* `dropout_rate`: The dropout rate.
* `learning_rate`: The initial learning rate for the Adam optimizer.
* `tpu`: TPU address to connect to.
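`dropout_rate` controls dropout regularization. The sketch below shows inverted dropout, the variant used by the Keras `Dropout` layer: during training each activation is zeroed with probability `rate` and the surviving values are scaled by `1 / (1 - rate)` so the expected sum is unchanged, while at inference the input passes through untouched. It is a plain-Python illustration, not the Keras code.

```python
import random


def dropout(activations, rate, training, seed=0):
    """Inverted dropout: zero each value with probability `rate` during training and rescale the rest."""
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    if not training or rate == 0.0:
        return list(activations)  # inference: identity
    rng = random.Random(seed)
    scale = 1.0 / (1.0 - rate)
    return [0.0 if rng.random() < rate else a * scale for a in activations]


x = [1.0] * 10
print(dropout(x, rate=0.5, training=True))
print(dropout(x, rate=0.5, training=False))
```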

For example, set `seq_len` to 256 (the default is 128) so the model can handle longer text. Longer sequences also make each training step slower.

In [None]:
spec.seq_len = 256
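`seq_len` fixes the number of tokens each example is converted to: shorter inputs are padded and longer ones are truncated, so any text beyond `seq_len` tokens is never seen by the model. The sketch below shows that pad-or-truncate step on token IDs, together with an attention mask. The pad ID 0 is an assumption (it matches BERT-style vocabularies, where `[PAD]` is 0), the token IDs are arbitrary, and real BERT preprocessing also inserts `[CLS]` and `[SEP]` tokens, which this sketch leaves out.

```python
def pad_or_truncate(token_ids, seq_len, pad_id=0):
    """Return (ids, mask) of exactly seq_len items; mask is 1 for real tokens and 0 for padding."""
    ids = list(token_ids[:seq_len])  # drop tokens beyond seq_len
    mask = [1] * len(ids)
    padding = seq_len - len(ids)
    return ids + [pad_id] * padding, mask + [0] * padding


ids, mask = pad_or_truncate([7, 42, 13, 5], seq_len=8)
print(ids)
print(mask)
```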

#### **Load training and test data**
Load the training and test CSV files with Model Maker's `DataLoader`, which preprocesses the text for the chosen `model_spec`. Make sure the `is_training` parameter for `test_data` is set to `False`.

In [None]:
train_data = DataLoader.from_csv(
      filename='processed_train_simple.csv',
      text_column='sentence',
      label_column='label',
      model_spec=spec,
      is_training=True)

test_data = DataLoader.from_csv(
      filename='processed_test_simple.csv',
      text_column='sentence',
      label_column='label',
      model_spec=spec,
      is_training=False)
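`DataLoader.from_csv` converts the string labels in `label_column` into integer class indices for training; the mapping it used should be available afterwards as `train_data.index_to_label`. The stdlib sketch below builds one such mapping from the labels seen in the previews, assuming sorted label order purely for illustration; inspect `index_to_label` rather than relying on this order.

```python
# Labels as they appear in the label column of the previews above.
csv_labels = ["E6", "E3", "E5", "E6", "E6", "E1", "E2", "E3", "E5", "E1"]

# Assumed ordering for illustration: sorted unique label names.
index_to_label = sorted(set(csv_labels))
label_to_index = {name: i for i, name in enumerate(index_to_label)}

encoded = [label_to_index[name] for name in csv_labels]
print(index_to_label)
print(encoded)
```

Note that the mapping only contains labels that actually occur in the data: `E4` does not appear in these previews, so it gets no index. If a class is missing from the training file, the model cannot learn to predict it.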

#### **Train model**
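Train the model on the training dataset. Experiment with the number of epochs to find the value that gives the best results; ideally, judge this on a validation split (passed to `text_classifier.create` as `validation_data`) rather than on the test set. Training time grows linearly with `epochs`.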

In [None]:
model = text_classifier.create(train_data, model_spec=spec, epochs=100)

Epoch 1/100
 564/4694 [==>...........................] - ETA: 47:40:06 - loss: 6.6130 - test_accuracy: 0.1732
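The progress line above contains enough information to estimate the cost of `epochs=100`. The arithmetic below uses only the numbers in that line (564 of 4,694 steps done, ETA 47:40:06 for the rest of epoch 1) and assumes the time per step stays constant.

```python
# Numbers read off the "Epoch 1/100" progress line above.
steps_done, steps_per_epoch = 564, 4694
eta_h, eta_m, eta_s = 47, 40, 6
epochs = 100

eta_seconds = eta_h * 3600 + eta_m * 60 + eta_s  # time left in epoch 1
sec_per_step = eta_seconds / (steps_per_epoch - steps_done)
hours_per_epoch = sec_per_step * steps_per_epoch / 3600
days_for_all_epochs = hours_per_epoch * epochs / 24

print(f"{sec_per_step:.1f} s/step, {hours_per_epoch:.1f} h/epoch, "
      f"{days_for_all_epochs:.0f} days for {epochs} epochs")
```

At roughly 54 hours per epoch, 100 epochs would take more than 200 days. Numbers like these suggest reducing `epochs`, returning `seq_len` to its default, or switching to a faster model spec before committing to a full run.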

#### **Examine the model structure: layers of the neural network**

In [None]:
model.summary()

#### **Evaluate the model**
Evaluate the model's accuracy on the test data, and use the result to decide whether the model needs more training data or hyperparameter tuning to improve accuracy.

In [None]:
loss, acc = model.evaluate(test_data)
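Accuracy is the fraction of examples whose predicted class matches the true label. The sketch below computes it on hypothetical labels and predictions, along with per-class accuracy (recall), which a single overall number can hide. It is a plain-Python illustration of the metric, not how `model.evaluate` computes it internally.

```python
from collections import defaultdict

# Hypothetical true labels and model predictions for six test sentences.
y_true = ["E1", "E2", "E3", "E5", "E1", "E3"]
y_pred = ["E1", "E2", "E5", "E5", "E3", "E3"]

correct = sum(t == p for t, p in zip(y_true, y_pred))
accuracy = correct / len(y_true)

# Per-class accuracy (recall): how often each true class is predicted correctly.
hits, totals = defaultdict(int), defaultdict(int)
for t, p in zip(y_true, y_pred):
    totals[t] += 1
    hits[t] += int(t == p)
per_class = {label: hits[label] / totals[label] for label in sorted(totals)}

print(f"accuracy = {accuracy:.3f}")
print(per_class)
```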

#### **Export TF Lite model**
Export the final model as a TensorFlow Lite model (saved as `model.tflite` in `export_dir` by default), which you can download and deploy directly in an Android app.

In [None]:
model.export(export_dir='test')
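On the device, the classifier returns one score per class, and the app then picks the highest-scoring labels. The sketch below shows that post-processing step, assuming the scores are ordered the same way as the model's label list; the labels and scores are hypothetical. If a model outputs raw logits rather than probabilities, `from_logits=True` applies softmax first.

```python
import math


def softmax(logits):
    """Convert raw scores to probabilities; subtracting the max keeps exp() numerically stable."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k(scores, labels, k=3, from_logits=False):
    """Return the k (label, probability) pairs with the highest probability."""
    probs = softmax(scores) if from_logits else list(scores)
    ranked = sorted(zip(labels, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# Hypothetical label order and model output for one sentence.
labels = ["E1", "E2", "E3", "E4", "E5", "E6"]
scores = [0.05, 0.02, 0.08, 0.05, 0.10, 0.70]
print(top_k(scores, labels, k=2))
```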