# Test NERDA with Google Colab

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/smaakage85/nerda-colab/blob/main/test.ipynb)

**Make sure that the Google Colab runtime is set to GPU** (Runtime > Change runtime type).
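One way to confirm the runtime setting from inside the notebook is to ask PyTorch whether it can see a CUDA device. The following is a minimal sketch rather than part of NERDA: `describe_device` is a hypothetical helper, and it assumes `torch` is importable (it usually is on Colab) while falling back to `'cpu'` if it is not.

```python
def describe_device() -> str:
    """Return 'cuda' if PyTorch can see a GPU, otherwise 'cpu'."""
    try:
        import torch  # guarded import so the check never crashes the notebook
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


device = describe_device()
print(f"Device: {device}")
```

If this prints `cpu` on Colab, the runtime type has most likely not been switched to GPU yet.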

Install NERDA from PyPI.

In [2]:
!pip install NERDA

Collecting NERDA
  Downloading https://files.pythonhosted.org/packages/c4/93/c0b71e6473181bf16c820cf38a997525dacd40a55d749ac5db7f73fe6781/NERDA-0.8.6-py3-none-any.whl
Collecting progressbar
  Downloading https://files.pythonhosted.org/packages/a3/a6/b8e451f6cff1c99b4747a2f7235aa904d2d49e8e1464e0b798272aa84358/progressbar-2.5.tar.gz
Collecting transformers<=3.5.1
  Downloading https://files.pythonhosted.org/packages/3a/83/e74092e7f24a08d751aa59b37a9fc572b2e4af3918cb66f7766c3affb1b4/transformers-3.5.1-py3-none-any.whl (1.3MB)
Collecting pyconll
  Downloading https://files.pythonhosted.org/packages/39/6f/86bd5d0eaa6821ba9193bbed16b660ea6f342fe63ec2e4fa2c61377bb44b/pyconll-2.3.3-py3-none-any.whl
Collecting sentencepiece==0.1.91
  Downloading https://files.pythonhosted.org/packages/d4/a4/d0a884c4300004a78cca907a6ff9a5e9fe4f090f5d95ab341c53d28cbc58/sentencepiece-0.1.91-cp36-cp36m-manylinux1_x86_64.whl (1.1MB)

Import dependencies and download resources.

In [3]:
import nltk
nltk.download('punkt')
from NERDA.datasets import download_dane_data, get_dane_data
from NERDA.models import NERDA
# download Danish NER data set, DaNE
download_dane_data()

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.
Reading http://danlp-downloads.alexandra.dk/datasets/ddt.zip


'archive extracted to /root/.dane'
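Before training, it can help to check that every sentence has exactly one tag per token. The sketch below assumes the data set is a dictionary with `'sentences'` and `'tags'` keys, each holding a list of token lists; that layout is an assumption about what `get_dane_data` returns, and `check_alignment` is a hypothetical helper. The toy record is hand-written for illustration and is not taken from DaNE.

```python
def check_alignment(dataset: dict) -> int:
    """Verify one tag per token in every sentence; return the sentence count."""
    sentences, tags = dataset["sentences"], dataset["tags"]
    if len(sentences) != len(tags):
        raise ValueError("number of sentences and tag sequences differ")
    for i, (tokens, labels) in enumerate(zip(sentences, tags)):
        if len(tokens) != len(labels):
            raise ValueError(f"sentence {i}: {len(tokens)} tokens, {len(labels)} tags")
    return len(sentences)


# Hand-written toy data in the assumed layout (not real DaNE content).
toy = {
    "sentences": [["Cristiano", "Ronaldo", "spiller", "for", "Juventus", "FC"]],
    "tags": [["B-PER", "I-PER", "O", "O", "B-ORG", "I-ORG"]],
}
n_sentences = check_alignment(toy)
print(f"{n_sentences} sentence(s) aligned")
```

In the notebook, the same check could be run on `get_dane_data('train')`, provided the assumed layout holds.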

## Train ELECTRA model for NER in Danish

Set the model configuration. Remember to set the Google Colab runtime to GPU.

In [6]:
tag_scheme = ['B-PER',
              'I-PER', 
              'B-ORG', 
              'I-ORG', 
              'B-LOC', 
              'I-LOC', 
              'B-MISC', 
              'I-MISC']
model = NERDA(dataset_training = get_dane_data('train'),
              dataset_validation = get_dane_data('dev'),
              tag_scheme = tag_scheme,
              tag_outside = 'O',
              transformer = 'Maltehb/-l-ctra-danish-electra-small-uncased',
              hyperparameters = {'epochs' : 5,
                                 'warmup_steps' : 500,
                                 'train_batch_size': 13,
                                 'learning_rate': 0.0001})

Device automatically set to: cuda
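To put the hyperparameters in perspective, the arithmetic below relates `warmup_steps` to the total number of training steps. It is a rough sketch: the figure of 338 batches per epoch is read off the training progress bar later in this notebook, and the code assumes one optimizer step per batch and that a final partial batch is kept rather than dropped.

```python
epochs = 5
warmup_steps = 500
train_batch_size = 13
batches_per_epoch = 338  # taken from the training progress bar in this notebook

# Assumption: one optimizer step per batch.
total_steps = epochs * batches_per_epoch
warmup_fraction = warmup_steps / total_steps

# Assumption: the last batch may be partial, so the training set size
# is only known to lie within this range.
min_sentences = train_batch_size * (batches_per_epoch - 1) + 1
max_sentences = train_batch_size * batches_per_epoch

print(f"Total steps: {total_steps}")
print(f"Warmup covers {warmup_fraction:.1%} of training")
print(f"Training sentences: between {min_sentences} and {max_sentences}")
```

Under these assumptions, warmup spans roughly the first 30% of training, which is worth keeping in mind when changing `epochs` or `train_batch_size`.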


Train the model.

In [7]:
model.train()

  0%|          | 0/338 [00:00<?, ?it/s]


 Epoch 1 / 5


100%|██████████| 338/338 [00:28<00:00, 11.89it/s]
100%|██████████| 71/71 [00:02<00:00, 35.19it/s]
  0%|          | 0/338 [00:00<?, ?it/s]

Train Loss = 0.6901343162639959 Valid Loss = 0.2215158384240849

 Epoch 2 / 5


100%|██████████| 338/338 [00:28<00:00, 11.79it/s]
100%|██████████| 71/71 [00:02<00:00, 32.00it/s]
  0%|          | 0/338 [00:00<?, ?it/s]

Train Loss = 0.18718081671391895 Valid Loss = 0.14077660121338467

 Epoch 3 / 5


100%|██████████| 338/338 [00:28<00:00, 12.00it/s]
100%|██████████| 71/71 [00:02<00:00, 32.09it/s]
  0%|          | 0/338 [00:00<?, ?it/s]

Train Loss = 0.10380902324290671 Valid Loss = 0.12002295927262642

 Epoch 4 / 5


100%|██████████| 338/338 [00:29<00:00, 11.59it/s]
100%|██████████| 71/71 [00:02<00:00, 32.31it/s]
  0%|          | 0/338 [00:00<?, ?it/s]

Train Loss = 0.06557886774056541 Valid Loss = 0.10949709867192826

 Epoch 5 / 5


100%|██████████| 338/338 [00:28<00:00, 11.81it/s]
100%|██████████| 71/71 [00:01<00:00, 36.57it/s]


Train Loss = 0.04763972320312843 Valid Loss = 0.11310201950213859


'Model trained successfully'
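The losses printed above can be compared across epochs to see where validation loss bottoms out. The snippet below is a small sketch that copies the logged values by hand; it is not a NERDA feature, and the variable names are illustrative.

```python
# (epoch, train loss, validation loss), copied from the training log above
history = [
    (1, 0.6901343162639959, 0.2215158384240849),
    (2, 0.18718081671391895, 0.14077660121338467),
    (3, 0.10380902324290671, 0.12002295927262642),
    (4, 0.06557886774056541, 0.10949709867192826),
    (5, 0.04763972320312843, 0.11310201950213859),
]

# Epoch with the lowest validation loss.
best_epoch, _, best_valid = min(history, key=lambda row: row[2])

# Epochs where validation loss went up compared with the previous epoch.
rising = [
    cur[0] for prev, cur in zip(history, history[1:]) if cur[2] > prev[2]
]

print(f"Lowest validation loss {best_valid:.4f} at epoch {best_epoch}")
print(f"Validation loss increased at epoch(s): {rising}")
```

Training loss keeps falling while validation loss rises slightly in the last epoch, which may be an early sign of overfitting.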

Evaluate the model's performance on the test set.

In [9]:
model.evaluate_performance(get_dane_data('test'))

| Level | F1-Score |
| --- | --- |
| B-PER | 0.918033 |
| I-PER | 0.974729 |
| B-ORG | 0.662069 |
| I-ORG | 0.722689 |
| B-LOC | 0.779817 |
| I-LOC | 0.615385 |
| B-MISC | 0.666667 |
| I-MISC | 0.71875 |
| AVG_MICRO | 0.798246 |
| AVG_MACRO | 0.757267 |
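The macro average is the unweighted mean of the per-tag F1 scores, which can be verified directly from the reported figures. This is a quick sanity check with the values copied by hand from the output above, not a call into NERDA.

```python
# Per-tag F1 scores copied from the evaluation output above
f1_scores = {
    "B-PER": 0.918033,
    "I-PER": 0.974729,
    "B-ORG": 0.662069,
    "I-ORG": 0.722689,
    "B-LOC": 0.779817,
    "I-LOC": 0.615385,
    "B-MISC": 0.666667,
    "I-MISC": 0.71875,
}

# Macro average: plain mean over tags, ignoring how often each tag occurs.
macro_f1 = sum(f1_scores.values()) / len(f1_scores)

# Tag with the lowest F1 score.
weakest_tag = min(f1_scores, key=f1_scores.get)

print(f"Macro F1: {macro_f1:.6f}")
print(f"Weakest tag: {weakest_tag} ({f1_scores[weakest_tag]})")
```

The result agrees with the reported `AVG_MACRO`. The micro average cannot be recomputed this way, because it depends on per-tag counts that the output does not include.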


Predict named entities in new text.

In [10]:
text = "Cristiano Ronaldo spiller for Juventus FC" 
model.predict_text(text)

([['Cristiano', 'Ronaldo', 'spiller', 'for', 'Juventus', 'FC']],
 [['B-PER', 'I-PER', 'O', 'O', 'B-ORG', 'I-ORG']])
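The prediction is a pair of token lists and tag lists, so a little post-processing is needed to get whole entities. The sketch below groups `B-`/`I-` tags into spans; `bio_to_entities` is a hypothetical helper, not part of NERDA, and it makes the simplifying choice of ignoring an `I-` tag that does not continue an entity of the same type.

```python
def bio_to_entities(tokens, tags):
    """Group B-/I- tagged tokens into (entity text, entity type) pairs."""
    entities = []
    current_tokens, current_type = [], None

    def flush():
        # Store the entity collected so far, if any.
        if current_tokens:
            entities.append((" ".join(current_tokens), current_type))

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            flush()
            current_tokens, current_type = [token], tag[2:]
        elif tag.startswith("I-") and current_type == tag[2:]:
            current_tokens.append(token)
        else:
            # 'O' or an I- tag that does not continue the current entity
            flush()
            current_tokens, current_type = [], None
    flush()
    return entities


# Output of model.predict_text(text), copied from above
sentences = [["Cristiano", "Ronaldo", "spiller", "for", "Juventus", "FC"]]
predictions = [["B-PER", "I-PER", "O", "O", "B-ORG", "I-ORG"]]

entities = [bio_to_entities(s, p) for s, p in zip(sentences, predictions)]
print(entities)
# [[('Cristiano Ronaldo', 'PER'), ('Juventus FC', 'ORG')]]
```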