# Crosslingual Transfer for Danish NER



### Reading data from disk

All tokens are marked with BIO tags and the data contains four entity types: `LOC` (locations), `MISC` (miscellaneous), `ORG` (organizations) and `PER` (person names):

```
Berlingske      B-ORG
Tidendes        O
afslag          O
kom             O
først           O
seks            O
uger            O
senere          O
og              O
lignede         O
til             O
forveksling     O
afslaget        O
fra             O
Jyllands-Posten B-ORG
.               O
```

Sentences are separated by blank lines.

Define a function `read_data` which takes a file as input and returns a list of sentences. Each sentence is a list of pairs `(token, bio_tag)`, for example `("Jyllands-Posten", "B-ORG")`.  

In [139]:
import os.path as path
from collections import defaultdict

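def read_data(file):
    '''
    Takes an open file and returns a list of sentences.
    Each sentence is a list of (token, bio_tag) pairs.
    '''
    sent_dict = defaultdict(list)
    i = 0
    for line in file:
        if line.startswith("-DOCSTART-"):
            continue

        # A blank or whitespace-only line ends the current sentence.
        if not line.strip():
            i += 1
            continue

        # Skip malformed lines without a tab-separated tag instead of
        # silently reusing the previous token.
        if '\t' not in line:
            continue

        word, tag = line.rstrip("\n").split('\t')
        sent_dict[i].append((word, tag))

    # Only indices that received tokens exist, so no empty sentences are returned.
    return list(sent_dict.values())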
                

import io

# Tests
test_string = "Berlingske\tB-ORG"
assert(read_data(io.StringIO(test_string)) == [[("Berlingske","B-ORG")]])


test_string = ""
assert not read_data(io.StringIO(test_string))

test_string = '''hi\tO
nv\tO
.\tO
 \t 
hey\tO
!\tO
 \t 
'''
assert len(read_data(io.StringIO(test_string))) == 2


test_string = '''hi\tO
nv\tO
.\tO
 \t 
hey\tO
!\tO
 \t
 \t 
'''
assert len(read_data(io.StringIO(test_string))) == 2
assert [] not in read_data(io.StringIO(test_string))


In [140]:
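# Use context managers so that the files are closed after reading.
with open(path.join("data","danish-train.conll")) as f:
    danish_train = read_data(f)
with open(path.join("data","danish-dev.conll")) as f:
    danish_dev = read_data(f)
with open(path.join("data","danish-test.conll")) as f:
    danish_test = read_data(f)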

print(danish_train[0])
print(danish_dev[0])

[('På', 'O'), ('fredag', 'O'), ('har', 'O'), ('SID', 'B-ORG'), ('inviteret', 'O'), ('til', 'O'), ('reception', 'O'), ('i', 'O'), ('SID-huset', 'O'), ('i', 'O'), ('anledning', 'O'), ('af', 'O'), ('at', 'O'), ('formanden', 'O'), ('Kjeld', 'B-PER'), ('Christensen', 'I-PER'), ('går', 'O'), ('ind', 'O'), ('i', 'O'), ('de', 'O'), ('glade', 'O'), ('tressere', 'O'), ('.', 'O')]
[('Hvor', 'O'), ('kommer', 'O'), ('julemanden', 'O'), ('fra', 'O'), ('?', 'O')]
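
As a quick sanity check on the entity types present in the data, the entities per type can be counted from the `B-` tags, since every entity starts with exactly one of them. The sketch below is self-contained and uses a small hand-written sample rather than the real files; `count_entity_types` is a hypothetical helper that is not part of the original notebook.

```python
from collections import Counter

def count_entity_types(sentences):
    """Count entities per type; each entity starts with exactly one B- tag."""
    counts = Counter()
    for sentence in sentences:
        for _token, tag in sentence:
            if tag.startswith("B-"):
                counts[tag[2:]] += 1
    return counts

# Hand-written sample in the format returned by read_data (not real corpus data).
sample = [
    [("Kjeld", "B-PER"), ("Christensen", "I-PER"), ("fra", "O"), ("SID", "B-ORG")],
    [("Hvor", "O"), ("kommer", "O"), ("julemanden", "O"), ("fra", "O"), ("?", "O")],
]
print(count_entity_types(sample))
```

Calling the same function on `danish_train` should then list the four types `LOC`, `MISC`, `ORG` and `PER`.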


### Converting data into spaCy format


The NER models in spaCy take training data in a specific format which differs from the BIO annotation in our files:

```
('På fredag har SID inviteret til reception i SID-huset i anledning af at formanden Kjeld Christensen går ind i de glade tressere .', {'entities': [[14, 17, 'ORG'], [82, 99, 'PER']]})
```

Each example is a pair whose first member is a sentence string like `'På fredag har SID inviteret til reception i SID-huset i anledning af at formanden Kjeld Christensen går ind i de glade tressere .'`. Note that all tokens in the sentence, including punctuation, are separated by single spaces. The second member is a dictionary that lists all named entities: `{'entities': [[14, 17, 'ORG'], [82, 99, 'PER']]}`.

Each entity like `[14, 17, 'ORG']` gives the start index of the entity (`14` in this case), its end index (`17`, exclusive, as in Python slicing) and the type of the entity (`ORG`). The organization here is:

```
print('På fredag har SID inviteret til reception i SID-huset i anledning af at formanden Kjeld Christensen går ind i de glade tressere .'[14:17])

SID
```
and the person is:
```
print('På fredag har SID inviteret til reception i SID-huset i anledning af at formanden Kjeld Christensen går ind i de glade tressere .'[82:99])

Kjeld Christensen
```
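
The two slices above can be generalized into a small check that recovers the surface string of every annotated span. This is only a sketch: `entity_texts` is a hypothetical helper name, and it assumes that the end offsets are exclusive, as in the example.

```python
def entity_texts(example):
    """Return (surface string, label) for every entity span in a spaCy-style example."""
    text, annotations = example
    return [(text[start:end], label) for start, end, label in annotations["entities"]]

example = (
    "På fredag har SID inviteret til reception i SID-huset i anledning af at "
    "formanden Kjeld Christensen går ind i de glade tressere .",
    {"entities": [[14, 17, "ORG"], [82, 99, "PER"]]},
)
print(entity_texts(example))  # [('SID', 'ORG'), ('Kjeld Christensen', 'PER')]
```

A check like this is a cheap way to catch off-by-one errors in the character offsets before training.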

The function below takes a dataset in BIO-format and returns it in spaCy format.


In [141]:
def get_entity(tag):
    # Strip the "B-" / "I-" prefix, e.g. "B-ORG" -> "ORG".
    return tag[2:]

def get_spacy_ner_data(data):
    result = []

    for sent in data:
        entities = []      # [start, end, type] character spans
        curr_type = None   # type of the entity that is currently open, if any
        curr_idx = 0       # character offset of the current token
        tokens = []

        for token, tag in sent:
            tokens.append(token)
            end_idx = curr_idx + len(token)

            if tag.startswith("B-"):
                # Start a new entity at this token.
                curr_type = get_entity(tag)
                entities.append([curr_idx, end_idx, curr_type])
            elif tag.startswith("I-") and get_entity(tag) == curr_type:
                # Extend the open entity to the end of this token.
                entities[-1][1] = end_idx
            else:
                # "O" or an inconsistent "I-" tag closes the open entity.
                curr_type = None

            # Tokens are joined by single spaces.
            curr_idx = end_idx + 1

        result.append((" ".join(tokens), {'entities': entities}))

    return result

# Tests
test_data = [[("The","O"),
              ("dog","O"),
              ("slept","O"),
              (".","O")]]
assert(get_spacy_ner_data(test_data) == [('The dog slept .',{'entities':[]})])

test_data = [[("The","O"),
              ("dog","B-ORG"),
              ("slept","O"),
              (".","O")]]
assert(get_spacy_ner_data(test_data) == [('The dog slept .',{'entities':[[4, 7, 'ORG']]})])


test_data = [[("The","O"),
              ("dog","B-ORG"),
              ("slept","B-ORG"),
              ("Henry", "B-ORG"),
              (".","O")]]

assert(get_spacy_ner_data(test_data) == [('The dog slept Henry .',{'entities':[[4, 7, 'ORG'], [8, 13, 'ORG'], [14, 19, 'ORG']]})])


test_data = [[("The","O"),
              ("dog","B-ORG"),
              ("slept","I-ORG"),
              ("Henry", "B-ORG"),
              (".","O")]]

assert(get_spacy_ner_data(test_data) == [('The dog slept Henry .',{'entities':[[4, 13, 'ORG'], [14, 19, 'ORG']]})])


test_data = [[('Anne', 'B-PER'),
              ('showed', 'O'),
              ('Sue', 'B-PER'),
              ('Mengqiu Huang', 'B-PER'),
              ("'s", 'O'),
              ('new', 'O'),
              ('painting', 'O')]]

assert get_spacy_ner_data(test_data)[0][1] == {'entities': [[0, 4, 'PER'], [12, 15, 'PER'], [16, 29, 'PER']]}


test_data = [[("The","O"),
              ("dog","B-ORG"),
              ("slept","I-ORG"),
              ("Henry", "I-ORG"),
              (".","O")]]

assert get_spacy_ner_data(test_data)[0][1] == {'entities': [[4, 19, 'ORG']]}


In [143]:
danish_spacy_train = get_spacy_ner_data(danish_train)
danish_spacy_dev = get_spacy_ner_data(danish_dev)
danish_spacy_test = get_spacy_ner_data(danish_test)

print(danish_spacy_train[0])

('På fredag har SID inviteret til reception i SID-huset i anledning af at formanden Kjeld Christensen går ind i de glade tressere .', {'entities': [[14, 17, 'ORG'], [82, 99, 'PER']]})


### Evaluation for NER
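
The evaluation cell below counts exact matches: a predicted entity is correct only if its start, end and type all agree with a gold entity. Writing $S$ for the set of predicted entities and $G$ for the set of gold entities over the whole dataset (this notation is an assumption introduced here, not taken from the original notebook), the scores are:

$$
P = \frac{|S \cap G|}{|S|}, \qquad R = \frac{|S \cap G|}{|G|}, \qquad F_1 = \frac{2PR}{P + R}
$$

In the test case of that cell, $|S| = 3$, $|G| = 2$ and $|S \cap G| = 1$, so $P = 1/3$, $R = 1/2$ and $F_1 = \frac{2 \cdot \frac{1}{3} \cdot \frac{1}{2}}{\frac{1}{3} + \frac{1}{2}} = 0.4$, i.e. 40%.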

In [144]:
def evaluate(sys_spacy_data,gold_spacy_data):
    p_total, r_total = 0, 0
    correct = 0

    for sys_sent, gold_sent in zip(sys_spacy_data, gold_spacy_data):
        sys_ent = sys_sent[1]['entities']
        gold_ent = gold_sent[1]['entities']

        p_total += len(sys_ent)
        r_total += len(gold_ent)

        # An entity counts as correct only if start, end and type all match.
        for ent in sys_ent:
            if ent in gold_ent:
                correct += 1

    # Guard against division by zero, e.g. when the model predicts no entities.
    precision = correct / p_total if p_total else 0.0
    recall = correct / r_total if r_total else 0.0

    if precision == 0 and recall == 0:
        fscore = 0.0
    else:
        fscore = 2 * precision * recall / (precision + recall)

    return precision * 100, recall * 100, fscore * 100

# tests
sys_data = [("word1 word2 word3 word4",{"entities":[(0,5,"PER"),(12,17,"LOC")]}),
            ("word1 word2 word3 word4",{"entities":[(6,11,"ORG")]})]

gold_data = [("word1 word2 word3 word4",{"entities":[(0,6,"PER"),(12,17,"LOC")]}),
             ("word1 word2 word3 word4",{"entities":[]})]

precision, recall, fscore = evaluate(sys_data,gold_data)
assert(precision == 1.0/3 * 100)
assert(recall    == 1.0/2 * 100)
assert(fscore    == 2*1.0/3*1.0/2 / (1.0/3 + 1.0/2) * 100)

## Training a NER model on the Danish data

### Initializing the NER model


In [128]:
import spacy 
from spacy.util import minibatch, compounding
from random import shuffle, seed
import numpy as np
import torch

def init_model(spacy_train_data, language):
    model = spacy.blank(language)

    seed(0)
    np.random.seed(0)
    spacy.util.fix_random_seed(0)
    torch.manual_seed(0)
    
    if "ner" not in model.pipe_names:
        ner = model.create_pipe("ner")
        model.add_pipe(ner, last=True)
    else:
        ner = model.get_pipe("ner")
    
    for _, annotations in spacy_train_data:
        for ent in annotations.get("entities"):
            ner.add_label(ent[2])

    # Make sure we're only training the NER component of the pipeline
    pipe_exceptions = ["ner"]
    other_pipes = [pipe for pipe in model.pipe_names if pipe not in pipe_exceptions]

    # Start training so that we can use the model to annotate data
    model.disable_pipes(*other_pipes)
    optimizer = model.begin_training()
    return model, optimizer

danish_untrained_model, _ = init_model(danish_spacy_train,"da")

### Annotating the development set


In [122]:
def annotate(spacy_data, model):
    result = []
    
    for sent in spacy_data:
        entities_dict = {'entities': []}
        annotated = model(sent[0])
        ents = annotated.ents
        if ents:
            for ent in ents:
                entities_dict['entities'].append([ent.start_char, ent.end_char, ent.label_])
        
        result.append((sent[0], entities_dict))    
    return result


### Training the NER model


In [126]:
from copy import deepcopy
import random

def train(spacy_train_data, spacy_dev_data, epochs,language):
    # Initialize model and get optimizer
    model, optimizer = init_model(spacy_train_data,language)
    
    # Make sure we don't permute the original training data.
    spacy_train_data = deepcopy(spacy_train_data)
    
    for itn in range(epochs):
        losses = {}
        
        random.shuffle(spacy_train_data)
        batches = minibatch(spacy_train_data, size=compounding(4.0, 32.0, 1.001))
        for batch in batches:
            texts, annotations = zip(*batch)
            model.update(
                texts,  # batch of texts
                annotations,  # batch of annotations
                drop=0.1,  # dropout - make it harder to memorise data
                losses=losses,
            )
           
        # Evaluate model
        print("Loss for epoch %u: %.4f" % (itn+1, losses["ner"]))
        spacy_dev_sys = annotate(spacy_dev_data, model)
        p, r, f = evaluate(spacy_dev_sys,spacy_dev_data)
        print("  PRECISION: %.2f%%, RECALL: %.2f%%, F-SCORE: %.2f%%" % (p,r,f))
    return model

Train a NER model on the Danish training data for 20 epochs.

In [127]:
danish_model = train(danish_spacy_train,danish_spacy_dev,20,"da")
print()
print("Evaluating model on development set:")
danish_spacy_dev_sys = annotate(danish_spacy_dev, danish_model)
p, r, f = evaluate(danish_spacy_dev_sys,danish_spacy_dev)
print("  PRECISION: %.2f%%, RECALL: %.2f%%, F-SCORE: %.2f%%" % (p,r,f))

Loss for epoch 1: 678.1348
  PRECISION: 27.27%, RECALL: 20.75%, F-SCORE: 23.57%
Loss for epoch 2: 210.5326
  PRECISION: 31.90%, RECALL: 29.97%, F-SCORE: 30.91%
Loss for epoch 3: 125.7944
  PRECISION: 51.61%, RECALL: 41.50%, F-SCORE: 46.01%
Loss for epoch 4: 81.3938
  PRECISION: 45.30%, RECALL: 47.26%, F-SCORE: 46.26%
Loss for epoch 5: 59.5731
  PRECISION: 44.92%, RECALL: 39.48%, F-SCORE: 42.02%
Loss for epoch 6: 43.0145
  PRECISION: 40.79%, RECALL: 41.50%, F-SCORE: 41.14%
Loss for epoch 7: 46.8049
  PRECISION: 40.58%, RECALL: 40.35%, F-SCORE: 40.46%
Loss for epoch 8: 30.4313
  PRECISION: 42.06%, RECALL: 43.52%, F-SCORE: 42.78%
Loss for epoch 9: 24.2765
  PRECISION: 40.91%, RECALL: 41.50%, F-SCORE: 41.20%
Loss for epoch 10: 15.3588
  PRECISION: 45.86%, RECALL: 44.67%, F-SCORE: 45.26%
Loss for epoch 11: 13.2599
  PRECISION: 49.43%, RECALL: 49.57%, F-SCORE: 49.50%
Loss for epoch 12: 8.2948
  PRECISION: 41.99%, RECALL: 46.11%, F-SCORE: 43.96%
Loss for epoch 13: 13.0894
  PRECISION: 42.64%,

### Pocket learning

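Rewrite the training algorithm so that it ["pockets"](https://en.wikipedia.org/wiki/Perceptron#Variants) the best model seen so far, that is, keeps a copy of the model with the highest F-score on the development set.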
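
The idea is independent of spaCy and can be sketched with toy stand-ins. In the sketch below, `pocket_train`, `toy_epoch`, `toy_score` and the score curve are all hypothetical and made up for illustration; only the pattern of snapshotting the model when the development score improves carries over to the real training loop.

```python
from copy import deepcopy

def pocket_train(model, train_one_epoch, score, epochs):
    """Train for `epochs` epochs and return a copy of the best-scoring model."""
    best_model, best_score = None, float("-inf")
    for _ in range(epochs):
        train_one_epoch(model)      # updates `model` in place
        current = score(model)      # e.g. F-score on the development set
        if current > best_score:    # strictly better: put a snapshot in the pocket
            best_model, best_score = deepcopy(model), current
    return best_model, best_score

# Toy stand-ins: the "model" is a dict and the scores follow a fixed, made-up curve.
toy_scores = [10.0, 50.0, 40.0, 45.0]

def toy_epoch(model):
    model["epoch"] += 1

def toy_score(model):
    return toy_scores[model["epoch"] - 1]

best, best_f = pocket_train({"epoch": 0}, toy_epoch, toy_score, epochs=4)
print(best, best_f)  # {'epoch': 2} 50.0
```

The `deepcopy` matters: without it, the pocketed reference would keep changing as training continues.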


In [166]:
from copy import deepcopy

def train(spacy_train_data, spacy_dev_data, epochs,language):
    # Initialize model and get optimizer
    model, optimizer = init_model(spacy_train_data,language)
    
    # Make sure we don't permute the original training data.
    spacy_train_data = deepcopy(spacy_train_data)
    
    best_model = None
    current_fscore = 0
    
    
    for itn in range(epochs):
        losses = {}

        random.shuffle(spacy_train_data)
        batches = minibatch(spacy_train_data, size=compounding(4.0, 32.0, 1.001))
        for batch in batches:
            texts, annotations = zip(*batch)
            model.update(
                texts,  # batch of texts
                annotations,  # batch of annotations
                drop=0.1,  # dropout - make it harder to memorise data
                losses=losses,
            )

        print("Loss for epoch %u: %.4f" % (itn+1, losses["ner"]))
        spacy_dev_sys = annotate(spacy_dev_data, model)
        p, r, f = evaluate(spacy_dev_sys,spacy_dev_data)
        print("  PRECISION: %.2f%%, RECALL: %.2f%%, F-SCORE: %.2f%%" % (p,r,f))
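        # Pocket the model whenever the dev F-score improves. The first
        # epoch's model is always kept so that the function never returns None.
        if best_model is None or f > current_fscore:
            current_fscore = f
            best_model = deepcopy(model)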
    
    return best_model

Re-train the Danish model for 20 epochs.

In [167]:
danish_model = train(danish_spacy_train,danish_spacy_dev,20,"da")
print()
print("Evaluating model on development set:")
danish_spacy_dev_sys = annotate(danish_spacy_dev, danish_model)
p, r, f = evaluate(danish_spacy_dev_sys,danish_spacy_dev)
print("  PRECISION: %.2f%%, RECALL: %.2f%%, F-SCORE: %.2f%%" % (p,r,f))

Loss for epoch 1: 677.7490
  PRECISION: 23.47%, RECALL: 6.63%, F-SCORE: 10.34%
Loss for epoch 2: 225.6457
  PRECISION: 32.17%, RECALL: 31.99%, F-SCORE: 32.08%
Loss for epoch 3: 115.5140
  PRECISION: 53.77%, RECALL: 45.24%, F-SCORE: 49.14%
Loss for epoch 4: 84.2517
  PRECISION: 49.27%, RECALL: 48.70%, F-SCORE: 48.99%
Loss for epoch 5: 69.0883
  PRECISION: 48.68%, RECALL: 53.31%, F-SCORE: 50.89%
Loss for epoch 6: 52.4823
  PRECISION: 52.07%, RECALL: 50.72%, F-SCORE: 51.39%
Loss for epoch 7: 28.4619
  PRECISION: 56.39%, RECALL: 52.16%, F-SCORE: 54.19%
Loss for epoch 8: 26.9459
  PRECISION: 49.72%, RECALL: 51.01%, F-SCORE: 50.36%
Loss for epoch 9: 23.1341
  PRECISION: 52.23%, RECALL: 53.89%, F-SCORE: 53.05%
Loss for epoch 10: 13.9116
  PRECISION: 51.46%, RECALL: 55.91%, F-SCORE: 53.59%
Loss for epoch 11: 14.2523
  PRECISION: 54.43%, RECALL: 51.30%, F-SCORE: 52.82%
Loss for epoch 12: 4.4521
  PRECISION: 52.99%, RECALL: 51.01%, F-SCORE: 51.98%
Loss for epoch 13: 3.0246
  PRECISION: 50.79%, R

### Adding pretrained bilingual embeddings

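The pretrained embeddings are loaded into the model's vocabulary from `data/vocab` with `Vocab.from_disk`; see the [spaCy documentation](https://spacy.io/api/vocab#from_disk).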

In [131]:
def init_model(spacy_train_data, language):
    model = spacy.blank(language)

    seed(0)
    np.random.seed(0)
    spacy.util.fix_random_seed(0)
    torch.manual_seed(0)
    
    if "ner" not in model.pipe_names:
        ner = model.create_pipe("ner")
        model.add_pipe(ner, last=True)
    else:
        ner = model.get_pipe("ner")
    
    for _, annotations in spacy_train_data:
        for ent in annotations.get("entities"):
            ner.add_label(ent[2])
    
    model.vocab.from_disk("data/vocab")

    # Make sure we're only training the NER component of the pipeline
    pipe_exceptions = ["ner"]
    other_pipes = [pipe for pipe in model.pipe_names if pipe not in pipe_exceptions]

    # Start training so that we can use the model to annotate data
    model.disable_pipes(*other_pipes)
    optimizer = model.begin_training()

    return model, optimizer

In [132]:
danish_model = train(danish_spacy_train,danish_spacy_dev,20,"da")
print()
print("Evaluating model on development set:")
danish_spacy_dev_sys = annotate(danish_spacy_dev, danish_model)
p, r, f = evaluate(danish_spacy_dev_sys,danish_spacy_dev)
print("  PRECISION: %.2f%%, RECALL: %.2f%%, F-SCORE: %.2f%%" % (p,r,f))

Loss for epoch 1: 652.7799
  PRECISION: 28.57%, RECALL: 22.48%, F-SCORE: 25.16%
Loss for epoch 2: 183.7652
  PRECISION: 32.90%, RECALL: 29.11%, F-SCORE: 30.89%
Loss for epoch 3: 127.5094
  PRECISION: 47.60%, RECALL: 45.82%, F-SCORE: 46.70%
Loss for epoch 4: 86.4522
  PRECISION: 41.41%, RECALL: 50.72%, F-SCORE: 45.60%
Loss for epoch 5: 84.4125
  PRECISION: 51.46%, RECALL: 50.72%, F-SCORE: 51.09%
Loss for epoch 6: 50.0252
  PRECISION: 53.95%, RECALL: 47.26%, F-SCORE: 50.38%
Loss for epoch 7: 35.0164
  PRECISION: 53.17%, RECALL: 55.62%, F-SCORE: 54.37%
Loss for epoch 8: 30.1722
  PRECISION: 48.25%, RECALL: 47.55%, F-SCORE: 47.90%
Loss for epoch 9: 26.7410
  PRECISION: 56.29%, RECALL: 51.59%, F-SCORE: 53.83%
Loss for epoch 10: 13.7209
  PRECISION: 52.94%, RECALL: 49.28%, F-SCORE: 51.04%
Loss for epoch 11: 12.2926
  PRECISION: 56.33%, RECALL: 53.89%, F-SCORE: 55.08%
Loss for epoch 12: 13.2249
  PRECISION: 54.50%, RECALL: 57.64%, F-SCORE: 56.02%
Loss for epoch 13: 6.0137
  PRECISION: 50.89%,

## Training an English NER model and fine-tuning on Danish data

Train an English NER system on the [CoNLL 2003 dataset](https://www.aclweb.org/anthology/W03-0419.pdf).

In [148]:
# Use context managers so that the files are closed after reading.
with open(path.join("data","english-train.conll")) as f:
    english_train = read_data(f)
with open(path.join("data","english-dev.conll")) as f:
    english_dev = read_data(f)

english_spacy_train = get_spacy_ner_data(english_train)
english_spacy_dev = get_spacy_ner_data(english_dev)

english_model = train(english_spacy_train,english_spacy_dev,5,"en")

Loss for epoch 1: 11086.7898
  PRECISION: 88.80%, RECALL: 88.10%, F-SCORE: 88.45%
Loss for epoch 2: 6077.8644
  PRECISION: 89.70%, RECALL: 89.58%, F-SCORE: 89.64%
Loss for epoch 3: 4595.8670
  PRECISION: 89.50%, RECALL: 89.70%, F-SCORE: 89.60%
Loss for epoch 4: 3642.7214
  PRECISION: 90.34%, RECALL: 89.99%, F-SCORE: 90.16%
Loss for epoch 5: 2964.1685
  PRECISION: 90.76%, RECALL: 90.41%, F-SCORE: 90.58%


### Fine-tuning the model on Danish data



In [149]:
from copy import deepcopy

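def retrain(spacy_train_data, spacy_dev_data, epochs,model,pretrained_fn):
    # Note: pretrained_fn is accepted but not used in this function.
    # Make sure we don't permute the original training data.
    spacy_train_data = deepcopy(spacy_train_data)
    
    # Fine-tune a copy so that the model passed in stays unchanged.
    model = deepcopy(model)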
    
    for itn in range(epochs):
        losses = {}

        random.shuffle(spacy_train_data)
        batches = minibatch(spacy_train_data, size=compounding(4.0, 32.0, 1.001))
        for batch in batches:
            texts, annotations = zip(*batch)
            model.update(
                texts,  # batch of texts
                annotations,  # batch of annotations
                drop=0.1,  # dropout - make it harder to memorise data
                losses=losses,
            )

        
        # Evaluate model
        print("Loss for epoch %u: %.4f" % (itn+1, losses["ner"]))
        spacy_dev_sys = annotate(spacy_dev_data, model)
        p, r, f = evaluate(spacy_dev_sys,spacy_dev_data)
        print("  PRECISION: %.2f%%, RECALL: %.2f%%, F-SCORE: %.2f%%" % (p,r,f))
    return model

transfer_model = retrain(danish_spacy_train, danish_spacy_dev, 20,english_model,"data/vocab")

Loss for epoch 1: 224.5756
  PRECISION: 58.59%, RECALL: 59.94%, F-SCORE: 59.26%
Loss for epoch 2: 74.6681
  PRECISION: 61.94%, RECALL: 64.27%, F-SCORE: 63.08%
Loss for epoch 3: 38.5355
  PRECISION: 60.38%, RECALL: 63.69%, F-SCORE: 61.99%
Loss for epoch 4: 11.4595
  PRECISION: 58.27%, RECALL: 61.96%, F-SCORE: 60.06%
Loss for epoch 5: 4.2907
  PRECISION: 59.62%, RECALL: 62.54%, F-SCORE: 61.04%
Loss for epoch 6: 1.7737
  PRECISION: 60.11%, RECALL: 61.67%, F-SCORE: 60.88%
Loss for epoch 7: 1.3902
  PRECISION: 59.34%, RECALL: 62.25%, F-SCORE: 60.76%
Loss for epoch 8: 0.0069
  PRECISION: 61.03%, RECALL: 61.38%, F-SCORE: 61.21%
Loss for epoch 9: 0.0001
  PRECISION: 61.82%, RECALL: 62.54%, F-SCORE: 62.18%
Loss for epoch 10: 0.0018
  PRECISION: 61.05%, RECALL: 63.69%, F-SCORE: 62.34%
Loss for epoch 11: 0.0003
  PRECISION: 62.25%, RECALL: 63.69%, F-SCORE: 62.96%
Loss for epoch 12: 0.0077
  PRECISION: 61.94%, RECALL: 64.27%, F-SCORE: 63.08%
Loss for epoch 13: 0.0007
  PRECISION: 61.43%, RECALL: 6

In [150]:
print("Evaluating basic Danish model on test set:")
danish_spacy_test_sys_basic = annotate(danish_spacy_test, danish_model)
p, r, f = evaluate(danish_spacy_test_sys_basic,danish_spacy_test)
print("  PRECISION: %.2f%%, RECALL: %.2f%%, F-SCORE: %.2f%%" % (p,r,f))
print()

print("Evaluating basic transfer model on test set:")
danish_spacy_test_sys_transfer = annotate(danish_spacy_test, transfer_model)
p, r, f = evaluate(danish_spacy_test_sys_transfer,danish_spacy_test)
print("  PRECISION: %.2f%%, RECALL: %.2f%%, F-SCORE: %.2f%%" % (p,r,f))

Evaluating basic Danish model on test set:
  PRECISION: 58.67%, RECALL: 52.05%, F-SCORE: 55.16%

Evaluating basic transfer model on test set:
  PRECISION: 62.37%, RECALL: 63.33%, F-SCORE: 62.85%


### Analyzing the results

* Does the transfer model get better at identifying names which are similar in the Danish and English NER data?
* Does the transfer model identify more purely Danish names which are not found by the basic model?

In [171]:

for i in range(len(danish_spacy_test_sys_basic)):
    basic_entities = danish_spacy_test_sys_basic[i][1]['entities']
    transfer_entities = danish_spacy_test_sys_transfer[i][1]['entities']
    gold_entities = danish_spacy_test[i][1]['entities']
    
    if len(basic_entities) == len(transfer_entities) and len(basic_entities) == len(gold_entities) and len(basic_entities) != 0:
        for basic, transfer, gold in zip(basic_entities, transfer_entities, gold_entities):
            if basic != transfer:
            
                print("Basic:")
                print(danish_spacy_test_sys_basic[i][0][basic[0]:basic[1]], "|", basic[2])
                print("Transfer:")
                print(danish_spacy_test_sys_transfer[i][0][transfer[0]:transfer[1]],"|",  transfer[2])
                print("Gold:")
                print(danish_spacy_test[i][0][gold[0]:gold[1]], "|", gold[2])
                print("-----")
        
        

Basic:
Rusland | ORG
Transfer:
Rusland | LOC
Gold:
Rusland | LOC
-----
Basic:
Ruslands | ORG
Transfer:
Ruslands | MISC
Gold:
Ruslands | LOC
-----
Basic:
Skjern | LOC
Transfer:
Skjern | PER
Gold:
Skjern | LOC
-----
Basic:
Skjerns | ORG
Transfer:
Skjerns | MISC
Gold:
Skjerns | LOC
-----
Basic:
Max | ORG
Transfer:
Max | PER
Gold:
Max | PER
-----
Basic:
Henning Dyremose | PER
Transfer:
Finansmininister Henning Dyremose | PER
Gold:
Henning Dyremose | PER
-----
Basic:
Europa | LOC
Transfer:
Europa Cup | MISC
Gold:
Europa Cup | MISC
-----
Basic:
Atansa Stokus | MISC
Transfer:
Atansa Stokus | PER
Gold:
Atansa Stokus | PER
-----
Basic:
Eduardas Potashinskas | MISC
Transfer:
Eduardas Potashinskas | PER
Gold:
Eduardas Potashinskas | PER
-----
Basic:
Københavns Kommune | PER
Transfer:
Københavns | MISC
Gold:
Københavns Kommune | ORG
-----
Basic:
Brøndbys | LOC
Transfer:
Brøndbys | MISC
Gold:
Brøndbys | ORG
-----
Basic:
Lyngby | LOC
Transfer:
Lyngby | ORG
Gold:
Lyngby | ORG
-----
Basic:
Steen Uno |

Discussion:

The examples printed above suggest that the transfer model does make better predictions for named entities that resemble English words. The following names are predicted correctly by the transfer model but incorrectly by the basic model:
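- Rusland: it contains the elements "Rus-" and "land", which also occur in English (the possessive form "Ruslands", however, is labelled MISC by the transfer model, whereas the gold label is LOC)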
- Europa Cup: these are the same word forms as in English
- Eddie: the same as the English name "Eddie"
- Panasonic: the same as the English brand name

In these cases, the transfer model predicts the labels more accurately. However, for purely Danish named entities, the transfer model does not necessarily outperform the basic model. The following names are predicted incorrectly by the transfer model but correctly by the basic model:

- Skjern: the name of a Danish town, which is purely Danish
- Ejby: again the name of a Danish town
- Schlüter: a purely Danish name

In other words, the transfer model handles named entities that contain English elements or structures well, but for purely Danish entities it may not be able to transfer its knowledge from English. From my inspection, however, many Danish named entities do contain English elements, which could explain why the transfer model performs better overall than the basic model.
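
One way to back these observations with numbers would be to break precision and recall down by entity type. The following is a minimal sketch under the same exact-match criterion as `evaluate`; `per_label_scores` is a hypothetical helper and the toy data are made up, so nothing here reports results from the actual models.

```python
from collections import defaultdict

def per_label_scores(sys_data, gold_data):
    """Return {label: (precision, recall)} in percent, using exact span matches."""
    correct = defaultdict(int)
    predicted = defaultdict(int)
    annotated = defaultdict(int)

    for (_, sys_ann), (_, gold_ann) in zip(sys_data, gold_data):
        gold_spans = {tuple(ent) for ent in gold_ann["entities"]}
        for ent in gold_spans:
            annotated[ent[2]] += 1
        for ent in {tuple(ent) for ent in sys_ann["entities"]}:
            predicted[ent[2]] += 1
            if ent in gold_spans:
                correct[ent[2]] += 1

    scores = {}
    for label in set(predicted) | set(annotated):
        # Report 0.0 when a label was never predicted or never annotated.
        p = 100 * correct[label] / predicted[label] if predicted[label] else 0.0
        r = 100 * correct[label] / annotated[label] if annotated[label] else 0.0
        scores[label] = (p, r)
    return scores

# Made-up toy data in the same (text, {"entities": [...]}) format as above.
gold = [("Max bor i Skjern", {"entities": [[0, 3, "PER"], [10, 16, "LOC"]]})]
system = [("Max bor i Skjern", {"entities": [[0, 3, "PER"], [10, 16, "PER"]]})]
print(per_label_scores(system, gold))
```

Applied to `danish_spacy_test_sys_basic` and `danish_spacy_test_sys_transfer` against `danish_spacy_test`, such a breakdown would show for which entity types the transfer model gains or loses.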