## MXNet Logistic Regression

Example of multi class Logistic Regression implementation on MXNet with Embedding layer. Run on Python 3.10 with these libraries.

```
mxnet-cu112==1.9.1
scikit-learn==1.1.2
onnx==1.12.0
```

### Preparation

In [1]:
import re

import mxnet as mx
from mxnet import nd, gluon, symbol
from mxnet.gluon import nn
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import CountVectorizer

In [2]:
MAX_LEN = 200
BATCH_SIZE = 32
LR = 0.001
EPOCH = 10

mx.random.seed(42)

### Preprocessing

* Use sklearn to get 20 newsgroups dataset and create dictionary (word/ID pair).
* Since sklearn doesn't have function to turn word into ID, it's done manually.
* Don't forget to pad or trim to match Logistic Regression input length.
* Value of `len(vectorizer.vocabulary_)` is used as padding since `0` to `len(vectorizer.vocabulary_)-1` already used by dictionary. Another approach where `0` used as padding and increment dictionary ID by 1 also works.

In [3]:
# Get dataset
ng_train = fetch_20newsgroups(subset='train', remove=('headers', 'footers', 'quote'))
ng_test  = fetch_20newsgroups(subset='test',  remove=('headers', 'footers', 'quote'))

# Use vectorizer to create dictionary
vectorizer = CountVectorizer(stop_words='english', max_df=0.6, min_df=25)
vectorizer.fit(ng_train.data)

# Process text
X_train = nd.empty((len(ng_train.data), MAX_LEN ), dtype='int')
X_test  = nd.empty((len(ng_test.data),  MAX_LEN ), dtype='int')
for idx, X in enumerate(ng_train.data + ng_test.data):
    # Split words using vectorizer regex and convert word to word ID (e.g. "amd" to 529)
    words = re.findall(vectorizer.token_pattern, X)
    words_id = [vectorizer.vocabulary_[w] for w in words if w in vectorizer.vocabulary_]
    
    # Padding or remove left over
    if len(words_id) > MAX_LEN:
        words_id = words_id[:MAX_LEN]
    else:
        words_id.extend([len(vectorizer.vocabulary_)] * (MAX_LEN - len(words_id)) )

    words_id = nd.array(words_id)
    if idx < len(ng_train.data):
        X_train[idx] = words_id
    else:
        X_test[idx-len(ng_train.data)] = words_id


# Process label
y_train, y_test = nd.array(ng_train.target), nd.array(ng_test.target)
label = ng_train.target_names

### Prepare train

* Use hybrid model to improve training/inference speed.
* "Vanilla" logistic regresion use Sigmoid and L2 loss, but this example use Softmax and Softmax Cross Entropy Loss.
* `gluon.loss.SoftmaxCrossEntropyLoss` could handle integer/one-hot-encode as true label and logit/softmax value as predicted label.
* Freeze `count` embedding so MXNet will only focus to find weight of each word for each class.

In [4]:
class Net(nn.HybridBlock):
    def __init__(self, **kwargs):
        super(Net, self).__init__(**kwargs)
        self.weight = nn.Embedding(
            len(vectorizer.vocabulary_), 20,
            weight_initializer=mx.initializer.Xavier()
        )
        self.count  = nn.Embedding(
            len(vectorizer.vocabulary_), 1,
            weight_initializer=mx.initializer.One()
        )
        self.flatten = nn.Flatten()

        
    def hybrid_forward(self, F, x):
        weight = self.weight(x)
        count  = self.count(x)
        # x = nd.linalg_gemm2(weight, count, transpose_a=True)
        x = symbol.linalg_gemm2(weight, count, transpose_a=True)
        x = self.flatten(x)
        # x = nd.softmax(x)

        return x

In [5]:
train_data = gluon.data.DataLoader(
    gluon.data.dataset.ArrayDataset(X_train, y_train),
    batch_size=BATCH_SIZE, shuffle=True)
test_data  = gluon.data.DataLoader(
    gluon.data.dataset.ArrayDataset(X_test,  y_test),
    batch_size=BATCH_SIZE, shuffle=False)

net = Net()
net.initialize()
net.hybridize()

softmax_cross_entropy = gluon.loss.SoftmaxCrossEntropyLoss()
# embedding0_weight means only train self.weight on Net()
trainer = gluon.Trainer(net.collect_params('embedding0_weight'), 'adam', {'learning_rate': LR})

In [6]:
def acc(output, label):
    # output: (batch, num_output) float32 ndarray
    # label: (batch, ) int32 ndarray
    return (output.argmax(axis=1) ==
            label.astype('float32')).mean().asscalar()

### Train

In [7]:
for epoch in range(EPOCH):
    train_loss, train_acc, valid_acc = 0., 0., 0.
    for data, label in train_data:
        # forward + backward
        with mx.autograd.record():
            output = net(data)
            loss = softmax_cross_entropy(output, label)
        loss.backward()

        # update parameters
        trainer.step(BATCH_SIZE)

        # calculate training metrics
        train_loss += loss.mean().asscalar()
        train_acc += acc(output, label)

    # calculate test accuracy, then show training progress
    for data, label in test_data:
        valid_acc += acc(net(data), label)
    print("Epoch %d: loss %.3f, train acc %.3f, test acc %.3f" % (
            epoch, train_loss/len(train_data), train_acc/len(train_data),
            valid_acc/len(test_data)))

Epoch 0: loss 2.142, train acc 0.509, test acc 0.563
Epoch 1: loss 1.229, train acc 0.788, test acc 0.628
Epoch 2: loss 0.927, train acc 0.844, test acc 0.644
Epoch 3: loss 0.752, train acc 0.882, test acc 0.645
Epoch 4: loss 0.635, train acc 0.901, test acc 0.646
Epoch 5: loss 0.552, train acc 0.914, test acc 0.664
Epoch 6: loss 0.483, train acc 0.926, test acc 0.664
Epoch 7: loss 0.432, train acc 0.933, test acc 0.676
Epoch 8: loss 0.387, train acc 0.941, test acc 0.667
Epoch 9: loss 0.350, train acc 0.947, test acc 0.667


### Export to ONNX

* `net.export` only works when hybrid model is used.
* `net.save` use work both Gluon/imperative and Symbol/symbolic based model.
* Export model to ONNX currently only works with hybrid model.

In [8]:
net.export('logreg')

sym = './logreg-symbol.json'
params = f'./logreg-0000.params'
in_shapes = [(1, MAX_LEN)]
in_types = ['int']
onnx_file_path = './logreg-model.onnx'

mx.onnx.export_model(sym, params, in_shapes, in_types, onnx_file_path)

'./logreg-model.onnx'