<a href="https://colab.research.google.com/github/sagorbrur/bangla-bert/blob/master/notebook/bangla-bert-evaluation-classification-task.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Measuring performance of Bangla-Electra, bangla-bert-base, and Multilingual BERT on classification tasks

## Dependencies

In [None]:
! pip install tensorboardX pandas simpletransformers transformers

Collecting tensorboardX
  Downloading https://files.pythonhosted.org/packages/af/0c/4f41bcd45db376e6fe5c619c01100e9b7531c55791b7244815bac6eac32c/tensorboardX-2.1-py2.py3-none-any.whl (308kB)
Collecting simpletransformers
  Downloading https://files.pythonhosted.org/packages/56/35/31022262786f4aa070fe472677cea66fade8d221181a86825096af021e2c/simpletransformers-0.48.14-py3-none-any.whl (214kB)
Collecting transformers
  Downloading https://files.pythonhosted.org/packages/19/22/aff234f4a841f8999e68a7a94bdd4b60b4cebcfeca5d67d61cd08c9179de/transformers-3.3.1-py3-none-any.whl (1.1MB)
Collecting tqdm>=4.47.0
  Downloading https://files.pythonhosted.org/packages/bd/cf/f91813073e4135c1183cadf968256764a6fe4e35c351d596d527c0540461/tqdm-4.50.2-py2.py3-none-any.whl (70kB)

In [None]:
!nvcc --version

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Sun_Jul_28_19:07:16_PDT_2019
Cuda compilation tools, release 10.1, V10.1.243


**RESTART RUNTIME** after the install finishes, so that the newly installed package versions are loaded.

## Import Data

The sentiment analysis and hate speech benchmark datasets both come from https://github.com/rezacsedu/BengFastText/ and are described in this preprint: https://arxiv.org/abs/2004.07807

In [None]:
! cp ./drive/My\ Drive/mlin/bengalicorpus/tasks/Hate_Speech_*.csv ./
! cp ./drive/My\ Drive/mlin/bengalicorpus/tasks/train/bangla.pos ./train_pos.txt
! cp ./drive/My\ Drive/mlin/bengalicorpus/tasks/train/bangla.neg ./train_neg.txt
! cp ./drive/My\ Drive/mlin/bengalicorpus/tasks/test/bangla.pos ./test_pos.txt
! cp ./drive/My\ Drive/mlin/bengalicorpus/tasks/test/bangla.neg ./test_neg.txt

### Sentiment analysis, training data



In [None]:
! head -n 3 train_pos.txt

 আমার খুব প্রিয় মডেল আমি খুব ভালো বাসি মিম আপু 
 ভাই সব আপনাদের খুব ভাল লাগছে 
 আপু তুমি অনেক ন্যচারাল সুন্দর 


In [None]:
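import pandas as pd

# Positive examples: one comment per line, labeled 1
with open('./train_pos.txt', 'r', encoding='utf-8') as f:
    pos_arr = f.read().split("\n")
df1 = pd.DataFrame(data={'text': pos_arr})
df1['labels'] = 1

# Negative examples: one comment per line, labeled 0
with open('./train_neg.txt', 'r', encoding='utf-8') as f:
    neg_arr = f.read().split("\n")
df2 = pd.DataFrame(data={'text': neg_arr})
df2['labels'] = 0

df = pd.concat([df1, df2])
df = df.dropna()
df.sample(5)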

Unnamed: 0,text,labels
135,তারুন্যের অহংকার বাংলাদেশের আলোকিত সন্তান জনা...,1
809,মোদি ভারতের হওয়াটাই একটা ভূল ।। যে কালো টাকার...,0
1030,বিশ্ব একজন সংগ্রামি নেতাকে হারাল,1
3241,"হ্য ভাই,,অরা অন্যর সমলোচনা করতে পারে,,খুব করে,...",0
2698,আরো কি আপনার জন্য অনুরোধ করতে পারেন?,1
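
The loading cell above repeats the same read-and-label pattern for each file. As a minimal sketch (the helper names `read_labeled_lines` and `load_sentiment_rows` are hypothetical and not part of the original notebook), the pattern could be wrapped in a function that also skips blank lines, which `dropna()` does not remove because an empty string is not a missing value:

```python
def read_labeled_lines(path, label):
    """Return (text, label) pairs for every non-blank line in `path`."""
    with open(path, encoding="utf-8") as handle:
        return [(line.strip(), label) for line in handle if line.strip()]


def load_sentiment_rows(pos_path, neg_path):
    """Combine positive (label 1) and negative (label 0) files into one list."""
    return read_labeled_lines(pos_path, 1) + read_labeled_lines(neg_path, 0)
```

The rows can then be turned into a DataFrame with `pd.DataFrame(load_sentiment_rows('./train_pos.txt', './train_neg.txt'), columns=['text', 'labels'])`. Dropping blank rows may change the row counts slightly compared with the outputs shown in this notebook.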


### Sentiment analysis, test set

In [None]:
# Positive test examples, labeled 1
with open('./test_pos.txt', 'r', encoding='utf-8') as f:
    tpos_arr = f.read().split("\n")
tdf1 = pd.DataFrame(data={'text': tpos_arr})
tdf1['labels'] = 1

# Negative test examples, labeled 0
with open('./test_neg.txt', 'r', encoding='utf-8') as f:
    tneg_arr = f.read().split("\n")
tdf2 = pd.DataFrame(data={'text': tneg_arr})
tdf2['labels'] = 0

test = pd.concat([tdf1, tdf2])
test = test.dropna()
test.sample(5)

Unnamed: 0,text,labels
11,সুজন ভাউ ১ম টেস্টের পরে বলেছিলেন উনাদের মনে হয়...,0
27,নারকেল চাল কিন্তু ভাল ছিল।,1
413,তবে এই জায়গাটিতে জয়িয়া এবং থাই খাবারের দুর...,1
597,"মসলাযুক্ত খাবারের জাতিগত একটি ফ্যান হচ্ছে, ভা...",1
425,"আমরা কি ভাবনার মধ্যে গিয়েছিলাম,আমাদের একটি d...",0


## Training sentiment analysis model

In [None]:
from simpletransformers.classification import ClassificationModel
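

The model cells in this notebook hard-code `use_cuda=True`, with a comment to switch it off on CPU-only platforms. A small sketch of detecting this automatically, assuming PyTorch is the backend (the names `cuda_available` and `USE_CUDA` are hypothetical):

```python
def cuda_available():
    """Return True only if PyTorch is installed and can see a CUDA device."""
    try:
        import torch
    except ImportError:
        return False
    return bool(torch.cuda.is_available())


USE_CUDA = cuda_available()
print("use_cuda =", USE_CUDA)
```

`USE_CUDA` could then be passed as `use_cuda=USE_CUDA` when constructing each `ClassificationModel`.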



### Bangla-Electra

In [None]:
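# set use_cuda=False on CPU-only platforms
# NOTE: the checkpoint is ELECTRA but the model type passed here is 'bert'; the warning
# below lists the electra.* weights as not used, so the results are for this setup
model = ClassificationModel('bert', 'monsoon-nlp/bangla-electra', num_labels=2, use_cuda=True, args={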
    'reprocess_input_data': True,
    'use_cached_eval_features': False,
    'overwrite_output_dir': True,
    'num_train_epochs': 3,
    'silent': True
}) # , weight=[2.5, 1.0]
model.train_model(df.sample(frac=1))

Some weights of the model checkpoint at monsoon-nlp/bangla-electra were not used when initializing BertForSequenceClassification: ['electra.embeddings.word_embeddings.weight', 'electra.embeddings.position_embeddings.weight', 'electra.embeddings.token_type_embeddings.weight', 'electra.embeddings.LayerNorm.weight', 'electra.embeddings.LayerNorm.bias', 'electra.embeddings_project.weight', 'electra.embeddings_project.bias', 'electra.encoder.layer.0.attention.self.query.weight', 'electra.encoder.layer.0.attention.self.query.bias', 'electra.encoder.layer.0.attention.self.key.weight', 'electra.encoder.layer.0.attention.self.key.bias', 'electra.encoder.layer.0.attention.self.value.weight', 'electra.encoder.layer.0.attention.self.value.bias', 'electra.encoder.layer.0.attention.output.dense.weight', 'electra.encoder.layer.0.attention.output.dense.bias', 'electra.encoder.layer.0.attention.output.LayerNorm.weight', 'electra.encoder.layer.0.attention.output.LayerNorm.bias', 'electra.encoder.layer.0

(2586, 0.4163653976244352)

Does it work on the test set?

In [None]:
result, model_outputs, wrong_predictions = model.eval_model(test)
bads = {}
for pred in wrong_predictions:
    if pred.label in bads:
        bads[pred.label] += 1
    else:
        bads[pred.label] = 1
print("wrong predictions:")
print(str(len(wrong_predictions)) + ' wrong out of ' + str(len(test)))
bads

wrong predictions:
472 wrong out of 1532


{0: 433, 1: 39}

In [None]:
print("Accuracy %")
(1532-472)/1532*100

Accuracy %


69.19060052219321
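
The tally loop and the accuracy arithmetic above are repeated for every model in this notebook. As a sketch, they could be folded into one helper; `summarize_errors` and the `Example` stand-in are hypothetical names, and the only assumption about the items in `wrong_predictions` is that each has a `.label` attribute, as the cells above use:

```python
from collections import Counter, namedtuple


def summarize_errors(wrong_predictions, total):
    """Count misclassified examples per true label and compute accuracy (%)."""
    per_label = Counter(pred.label for pred in wrong_predictions)
    accuracy = (total - len(wrong_predictions)) / total * 100
    return dict(per_label), accuracy


# Hypothetical stand-in for the objects in `wrong_predictions`;
# only the `.label` attribute is read.
Example = namedtuple("Example", ["text", "label"])
demo = [Example("a", 0), Example("b", 0), Example("c", 1)]
per_label, accuracy = summarize_errors(demo, total=10)
print(per_label, accuracy)
```

In the notebook this would be called as `summarize_errors(wrong_predictions, len(test))`. Dividing by the size of the evaluated set, rather than by a hand-typed number, avoids arithmetic slips.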

### Compare to bangla-bert-base

In [None]:
bert2 = ClassificationModel('bert', 'sagorsarker/bangla-bert-base', num_labels=2, use_cuda=True, args={
    'reprocess_input_data': True,
    'use_cached_eval_features': False,
    'overwrite_output_dir': True,
    'num_train_epochs': 3,
    'silent': True
})
bert2.train_model(df.sample(frac=1))

Some weights of the model checkpoint at sagorsarker/bangla-bert-base were not used when initializing BertForSequenceClassification: ['cls.predictions.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.decoder.weight', 'cls.seq_relationship.weight', 'cls.seq_relationship.bias']
- This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPretraining model).
- This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForSequenceClassification were not initialized from the model ch

(2586, 0.28165751674515216)

In [None]:
result, model_outputs, wrong_predictions = bert2.eval_model(test)
bads = {}
for pred in wrong_predictions:
    if pred.label in bads:
        bads[pred.label] += 1
    else:
        bads[pred.label] = 1
print("wrong predictions:")
print(str(len(wrong_predictions)) + ' wrong out of ' + str(len(test)))
bads

wrong predictions:
454 wrong out of 1532


{0: 429, 1: 25}

In [None]:
(1532-454)/1532*100

70.36553524804178

### Compare to Multilingual BERT

In [None]:
bert = ClassificationModel('bert', 'bert-base-multilingual-uncased', num_labels=2, use_cuda=True, args={
    'reprocess_input_data': True,
    'use_cached_eval_features': False,
    'overwrite_output_dir': True,
    'num_train_epochs': 3,
    'silent': True
})
bert.train_model(df)

Some weights of the model checkpoint at bert-base-multilingual-uncased were not used when initializing BertForSequenceClassification: ['cls.predictions.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.decoder.weight', 'cls.seq_relationship.weight', 'cls.seq_relationship.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.LayerNorm.bias']
- This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPretraining model).
- This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForSequenceClassification were not initialized from the model 

Selected optimization level O1:  Insert automatic casts around Pytorch functions and Tensor methods.

Defaults for this optimization level are:
enabled                : True
opt_level              : O1
cast_model_type        : None
patch_torch_functions  : True
keep_batchnorm_fp32    : None
master_weights         : None
loss_scale             : dynamic
Processing user overrides (additional kwargs that are not None)...
After processing overrides, optimization options are:
enabled                : True
opt_level              : O1
cast_model_type        : None
patch_torch_functions  : True
keep_batchnorm_fp32    : None
master_weights         : None
loss_scale             : dynamic




Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 32768.0
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 16384.0
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 8192.0




Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 8192.0


In [None]:
result, model_outputs, wrong_predictions = bert.eval_model(test)
bads = {}
for pred in wrong_predictions:
    if pred.label in bads:
        bads[pred.label] += 1
    else:
        bads[pred.label] = 1
print("wrong predictions:")
print(str(len(wrong_predictions)) + ' wrong out of ' + str(len(test)))
bads

wrong predictions:
488 wrong out of 1532


{0: 402, 1: 86}

In [None]:
print("Accuracy %")
(1532-488)/1532*100

Accuracy %


68.1462140992167

## Hate Speech Task

In [None]:
train = pd.read_csv("./Hate_Speech_Train.csv", sep='\t', names=['labels', 'text'])
train = train[['text', 'labels']]
train.labels = pd.Categorical(train.labels)
train['labels'] = train['labels'].cat.codes
train = train.dropna()

In [None]:
len(train.labels.unique())

5

In [None]:
# pd.Categorical assigns codes by sorted category order, so this encoding matches
# the training set only if Train and Test contain the same set of labels
# if this isn't the case for you, convert differently
test_hate = pd.read_csv("./Hate_Speech_Test.csv", sep='\t', names=['labels', 'text'])
test_hate = test_hate[['text', 'labels']]
test_hate.labels = pd.Categorical(test_hate.labels)
test_hate['labels'] = test_hate['labels'].cat.codes
test_hate = test_hate.dropna()
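
Encoding the train and test labels independently is fragile: if one split is missing a class, the integer codes shift and no longer agree. A minimal sketch of building one mapping from the training labels and reusing it for the test set follows. The label names are hypothetical placeholders, not the dataset's actual categories, and `build_label_map` and `encode_labels` are illustrative helpers:

```python
def build_label_map(train_labels):
    """Map each distinct training label to an integer id, in sorted order."""
    return {label: idx for idx, label in enumerate(sorted(set(train_labels)))}


def encode_labels(labels, label_map):
    """Encode labels with a fixed map; unknown labels raise KeyError."""
    return [label_map[label] for label in labels]


# Hypothetical label names, for illustration only.
train_labels = ["political", "personal", "religious", "personal"]
test_labels = ["religious", "personal"]  # this split has no "political" rows

label_map = build_label_map(train_labels)
print(encode_labels(train_labels, label_map))
print(encode_labels(test_labels, label_map))
```

With pandas, the same map can be applied to both frames via `train['labels'].map(label_map)` and `test_hate['labels'].map(label_map)`, so "religious" receives the same id in both splits even though the test split lacks one class.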

### Bangla-Electra

In [None]:
# set use_cuda=False on CPU-only platforms
model = ClassificationModel('bert', 'monsoon-nlp/bangla-electra', num_labels=5, use_cuda=True, args={
    'reprocess_input_data': True,
    'use_cached_eval_features': False,
    'overwrite_output_dir': True,
    'num_train_epochs': 3,
    'silent': True
})
model.train_model(train)

Some weights of the model checkpoint at monsoon-nlp/bangla-electra were not used when initializing BertForSequenceClassification: ['electra.embeddings.word_embeddings.weight', 'electra.embeddings.position_embeddings.weight', 'electra.embeddings.token_type_embeddings.weight', 'electra.embeddings.LayerNorm.weight', 'electra.embeddings.LayerNorm.bias', 'electra.embeddings_project.weight', 'electra.embeddings_project.bias', 'electra.encoder.layer.0.attention.self.query.weight', 'electra.encoder.layer.0.attention.self.query.bias', 'electra.encoder.layer.0.attention.self.key.weight', 'electra.encoder.layer.0.attention.self.key.bias', 'electra.encoder.layer.0.attention.self.value.weight', 'electra.encoder.layer.0.attention.self.value.bias', 'electra.encoder.layer.0.attention.output.dense.weight', 'electra.encoder.layer.0.attention.output.dense.bias', 'electra.encoder.layer.0.attention.output.LayerNorm.weight', 'electra.encoder.layer.0.attention.output.LayerNorm.bias', 'electra.encoder.layer.0

In [None]:
result, model_outputs, wrong_predictions = model.eval_model(test_hate)
bads = {}
for pred in wrong_predictions:
    if pred.label in bads:
        bads[pred.label] += 1
    else:
        bads[pred.label] = 1
print("wrong predictions:")
print(str(len(wrong_predictions)) + ' wrong out of ' + str(len(test_hate)))
bads

wrong predictions:
223 wrong out of 323


  mcc = cov_ytyp / np.sqrt(cov_ytyt * cov_ypyp)


{0: 47, 1: 75, 3: 51, 4: 50}

In [1]:
# accuracy = correct / total, so divide by 323 (the test set size), not 223
round((323-223)/323*100, 2)

30.96

### Test on bangla-bert

In [None]:
model2 = ClassificationModel('bert', 'sagorsarker/bangla-bert-base', num_labels=5, use_cuda=True, args={
    'reprocess_input_data': True,
    'use_cached_eval_features': False,
    'overwrite_output_dir': True,
    'num_train_epochs': 3,
    'silent': True
})
model2.train_model(train)

Some weights of the model checkpoint at sagorsarker/bangla-bert-base were not used when initializing BertForSequenceClassification: ['cls.predictions.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.decoder.weight', 'cls.seq_relationship.weight', 'cls.seq_relationship.bias']
- This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPretraining model).
- This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForSequenceClassification were not initialized from the model ch

(330, 0.5708351454217777)

In [None]:
result, model_outputs, wrong_predictions = model2.eval_model(test_hate)
bads = {}
for pred in wrong_predictions:
    if pred.label in bads:
        bads[pred.label] += 1
    else:
        bads[pred.label] = 1
print("wrong predictions:")
print(str(len(wrong_predictions)) + ' wrong out of ' + str(len(test_hate)))
bads

wrong predictions:
91 wrong out of 323


{0: 13, 1: 23, 2: 26, 3: 20, 4: 9}

In [None]:
(323-91)/323*100

71.8266253869969

### mBERT

In [None]:
model3 = ClassificationModel('bert', 'bert-base-multilingual-uncased', num_labels=5, use_cuda=True, args={
    'reprocess_input_data': True,
    'use_cached_eval_features': False,
    'overwrite_output_dir': True,
    'num_train_epochs': 3,
    'silent': True
})
model3.train_model(train)
result, model_outputs, wrong_predictions = model3.eval_model(test_hate)
bads = {}
for pred in wrong_predictions:
    if pred.label in bads:
        bads[pred.label] += 1
    else:
        bads[pred.label] = 1
print("wrong predictions:")
print(str(len(wrong_predictions)) + ' wrong out of ' + str(len(test_hate)))
bads

Some weights of the model checkpoint at bert-base-multilingual-uncased were not used when initializing BertForSequenceClassification: ['cls.predictions.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.decoder.weight', 'cls.seq_relationship.weight', 'cls.seq_relationship.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.LayerNorm.bias']
- This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPretraining model).
- This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForSequenceClassification were not initialized from the model 

wrong predictions:
154 wrong out of 323


{0: 18, 1: 40, 2: 36, 3: 35, 4: 25}

In [None]:
(323-154)/323*100

52.32198142414861

## News Topic Task

News headlines with topic labels, from https://github.com/soham96/Bangla2Vec

In [None]:
! git clone https://github.com/soham96/Bangla2Vec.git

Cloning into 'Bangla2Vec'...
remote: Enumerating objects: 70, done.
remote: Total 70 (delta 0), reused 0 (delta 0), pack-reused 70
Unpacking objects: 100% (70/70), done.


In [None]:
! unzip Bangla2Vec/data/Archive.zip

Archive:  Bangla2Vec/data/Archive.zip
  inflating: classification.txt      
   creating: __MACOSX/
  inflating: __MACOSX/._classification.txt  
  inflating: ebala_classification.txt  
  inflating: anandabazar_classification.txt  


In [None]:
! head -n 5 ebala_classification.txt

entertainment||খুব শিগগিরই বিয়ে? ‘প্রমাণ’ নিয়ে তোলপাড় বলিউড||https://ebela.in/entertainment/are-deepika-and-ranveer-going-to-marry-soon-dgtl-1.340745
state||কত সার্জ নিতে পারবে ওলা, উবের। জানিয়ে দিল সরকার।||https://ebela.in/state/state-government-gives-new-guidelines-to-ola-and-uber-dgtl-1.827280
sports||বাগানে বসন্ত হাইতিয়ানের পায়ে, খালিদের সংসারে টানা দুই জয়||https://ebela.in/sports/sony-norde-magic-helps-khalid-earn-three-points-dgtl-1.931469
national||বিয়েবাড়িতে খাবার শেষ! তারপরেই অতিথিরা ঘটালেন মারাত্মক কাণ্ড||https://ebela.in/national/one-killed-in-uttar-pradesh-s-wedding-party-after-runs-out-of-plates-dgtl-1.821870
national||মোদীর রাজ্যে পদ্ম-আংটিতে হাজার-হাজার হিরে! বিশ্বজয় ভারতীয়র, দাম আকাশছোঁয়া||https://ebela.in/national/indian-jewellers-set-guinness-world-record-with-a-ring-containing-6690-diamonds-dgtl-1.824156


In [None]:
# fields are separated by '||'; with sep='|' each '||' yields an empty column (blank1, blank2)
ebala = pd.read_csv("./ebala_classification.txt", sep='|', names=['labels','blank1', 'text','blank2', 'url'])
ebala = ebala[['text', 'labels']]
ebala.labels = pd.Categorical(ebala.labels)
ebala['labels'] = ebala['labels'].cat.codes
ebala = ebala.dropna()
ebala.head()

Unnamed: 0,text,labels
0,খুব শিগগিরই বিয়ে? ‘প্রমাণ’ নিয়ে তোলপাড় বলিউড,0
1,"কত সার্জ নিতে পারবে ওলা, উবের। জানিয়ে দিল সরকার।",4
2,"বাগানে বসন্ত হাইতিয়ানের পায়ে, খালিদের সংসারে ট...",3
3,বিয়েবাড়িতে খাবার শেষ! তারপরেই অতিথিরা ঘটালেন ...,2
4,মোদীর রাজ্যে পদ্ম-আংটিতে হাজার-হাজার হিরে! বিশ...,2
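
The `sep='|'` approach above relies on placeholder columns and would misalign any headline that itself contains a `|`. As an alternative sketch (the `parse_headline` helper is hypothetical), each record can be split on the full `||` delimiter, taking the label from the left and the URL from the right:

```python
def parse_headline(line):
    """Split one `label||text||url` record into its three fields."""
    label, rest = line.rstrip("\n").split("||", 1)
    text, url = rest.rsplit("||", 1)
    return label, text, url


# Sample record taken from the `head` output above.
sample = (
    "state||কত সার্জ নিতে পারবে ওলা, উবের। জানিয়ে দিল সরকার।"
    "||https://ebela.in/state/state-government-gives-new-guidelines-to-ola-and-uber-dgtl-1.827280"
)
label, text, url = parse_headline(sample)
print(label, url)
```

A list of such tuples can be passed to `pd.DataFrame(rows, columns=['labels', 'text', 'url'])` to get the same columns without the blank placeholders.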


In [None]:
# random_state=880 (Bangladesh country code)
from sklearn.model_selection import train_test_split
train, test = train_test_split(ebala, random_state=880)

In [None]:
len(train.labels.unique())

6
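
`train_test_split` is called above without `test_size`, so scikit-learn holds out 25% of the rows by default, and without `stratify`, so the share of each class in the test set is left to chance. The sketch below illustrates the idea of a stratified split in plain Python. It is not scikit-learn's implementation, and `stratified_split` is a hypothetical helper:

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.25, seed=880):
    """Return (train_idx, test_idx) with about `test_fraction` of each label in test."""
    by_label = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)

    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for label in sorted(by_label):
        indices = by_label[label]
        rng.shuffle(indices)  # seeded shuffle within each class
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


labels = [0] * 80 + [1] * 20
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))
```

With scikit-learn itself, the equivalent would be `train_test_split(ebala, random_state=880, stratify=ebala['labels'])`. That produces a different split from the one used here, so the numbers in this notebook would not carry over.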

### Bangla-Electra

In [None]:
# set use_cuda=False on CPU-only platforms
model = ClassificationModel('bert', 'monsoon-nlp/bangla-electra', num_labels=6, use_cuda=True, args={
    'reprocess_input_data': True,
    'use_cached_eval_features': False,
    'overwrite_output_dir': True,
    'num_train_epochs': 3,
    'silent': True
})
model.train_model(train)

Some weights of the model checkpoint at monsoon-nlp/bangla-electra were not used when initializing BertForSequenceClassification: ['electra.embeddings.word_embeddings.weight', 'electra.embeddings.position_embeddings.weight', 'electra.embeddings.token_type_embeddings.weight', 'electra.embeddings.LayerNorm.weight', 'electra.embeddings.LayerNorm.bias', 'electra.embeddings_project.weight', 'electra.embeddings_project.bias', 'electra.encoder.layer.0.attention.self.query.weight', 'electra.encoder.layer.0.attention.self.query.bias', 'electra.encoder.layer.0.attention.self.key.weight', 'electra.encoder.layer.0.attention.self.key.bias', 'electra.encoder.layer.0.attention.self.value.weight', 'electra.encoder.layer.0.attention.self.value.bias', 'electra.encoder.layer.0.attention.output.dense.weight', 'electra.encoder.layer.0.attention.output.dense.bias', 'electra.encoder.layer.0.attention.output.LayerNorm.weight', 'electra.encoder.layer.0.attention.output.LayerNorm.bias', 'electra.encoder.layer.0

Selected optimization level O1:  Insert automatic casts around Pytorch functions and Tensor methods.

Defaults for this optimization level are:
enabled                : True
opt_level              : O1
cast_model_type        : None
patch_torch_functions  : True
keep_batchnorm_fp32    : None
master_weights         : None
loss_scale             : dynamic
Processing user overrides (additional kwargs that are not None)...
After processing overrides, optimization options are:
enabled                : True
opt_level              : O1
cast_model_type        : None
patch_torch_functions  : True
keep_batchnorm_fp32    : None
master_weights         : None
loss_scale             : dynamic




Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 131072.0
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 65536.0
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 65536.0
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 65536.0
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 32768.0
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 65536.0
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 65536.0


In [None]:
result, model_outputs, wrong_predictions = model.eval_model(test)
bads = {}
for pred in wrong_predictions:
    if pred.label in bads:
        bads[pred.label] += 1
    else:
        bads[pred.label] = 1
print("wrong predictions:")
print(str(len(wrong_predictions)) + ' wrong out of ' + str(len(test)))
bads

wrong predictions:
2490 wrong out of 14092


{0: 378, 1: 498, 2: 725, 3: 253, 4: 630, 5: 6}

In [None]:
(14092-2490)/14092*100

82.3304002270792

### bangla-bert-base

In [None]:
bert2 = ClassificationModel('bert', 'sagorsarker/bangla-bert-base', num_labels=6, use_cuda=True, args={
    'reprocess_input_data': True,
    'use_cached_eval_features': False,
    'overwrite_output_dir': True,
    'num_train_epochs': 3,
    'silent': True
})
bert2.train_model(train)
result, model_outputs, wrong_predictions = bert2.eval_model(test)
bads = {}
for pred in wrong_predictions:
    if pred.label in bads:
        bads[pred.label] += 1
    else:
        bads[pred.label] = 1
print("wrong predictions:")
print(str(len(wrong_predictions)) + ' wrong out of ' + str(len(test)))
bads

Some weights of the model checkpoint at sagorsarker/bangla-bert-base were not used when initializing BertForSequenceClassification: ['cls.predictions.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.decoder.weight', 'cls.seq_relationship.weight', 'cls.seq_relationship.bias']
- This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPretraining model).
- This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForSequenceClassification were not initialized from the model ch

wrong predictions:
1523 wrong out of 14092


{0: 245, 1: 232, 2: 476, 3: 150, 4: 415, 5: 5}

In [None]:
(14092-1523)/14092*100

89.19244961680386

### mBERT performance

In [None]:
bert = ClassificationModel('bert', 'bert-base-multilingual-uncased', num_labels=6, use_cuda=True, args={
    'reprocess_input_data': True,
    'use_cached_eval_features': False,
    'overwrite_output_dir': True,
    'num_train_epochs': 3,
    'silent': True
})
bert.train_model(train)

Some weights of the model checkpoint at bert-base-multilingual-uncased were not used when initializing BertForSequenceClassification: ['cls.predictions.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.decoder.weight', 'cls.seq_relationship.weight', 'cls.seq_relationship.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.LayerNorm.bias']
- This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPretraining model).
- This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForSequenceClassification were not initialized from the model 

Selected optimization level O1:  Insert automatic casts around Pytorch functions and Tensor methods.

Defaults for this optimization level are:
enabled                : True
opt_level              : O1
cast_model_type        : None
patch_torch_functions  : True
keep_batchnorm_fp32    : None
master_weights         : None
loss_scale             : dynamic
Processing user overrides (additional kwargs that are not None)...
After processing overrides, optimization options are:
enabled                : True
opt_level              : O1
cast_model_type        : None
patch_torch_functions  : True
keep_batchnorm_fp32    : None
master_weights         : None
loss_scale             : dynamic




Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 32768.0
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 16384.0




Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 8192.0
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 4096.0




Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 8192.0
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 16384.0
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 8192.0
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 8192.0


In [None]:
result, model_outputs, wrong_predictions = bert.eval_model(test)
bads = {}
for pred in wrong_predictions:
    if pred.label in bads:
        bads[pred.label] += 1
    else:
        bads[pred.label] = 1
print("wrong predictions:")
print(str(len(wrong_predictions)) + ' wrong out of ' + str(len(test)))
bads

wrong predictions:
3907 wrong out of 14092


{0: 675, 1: 505, 2: 1118, 3: 751, 4: 852, 5: 6}

In [None]:
(14092-3907)/14092*100

72.27504967357365
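
To compare the three models side by side, the wrong/total counts printed by the evaluation cells can be collected in one place and converted with the same formula for every row, accuracy = (total - wrong) / total. This is only a convenience sketch: the `RESULTS` dictionary and `accuracy` helper are hypothetical, and the counts are copied from the outputs above.

```python
# (wrong, total) counts copied from the evaluation outputs in this notebook.
RESULTS = {
    "sentiment": {
        "bangla-electra": (472, 1532),
        "bangla-bert-base": (454, 1532),
        "mBERT": (488, 1532),
    },
    "hate speech": {
        "bangla-electra": (223, 323),
        "bangla-bert-base": (91, 323),
        "mBERT": (154, 323),
    },
    "news topic": {
        "bangla-electra": (2490, 14092),
        "bangla-bert-base": (1523, 14092),
        "mBERT": (3907, 14092),
    },
}


def accuracy(wrong, total):
    """Percentage of correctly classified examples."""
    return (total - wrong) / total * 100


for task, models in RESULTS.items():
    for name, (wrong, total) in models.items():
        print(f"{task:12s} {name:18s} {accuracy(wrong, total):6.2f}%")
```

On these counts, bangla-bert-base has the fewest errors on all three tasks. For Bangla-Electra on the hate speech task the formula gives roughly 30.96%.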