# chunker: default program

In [None]:
from default import *
import os

## Run the default solution on dev

In [None]:
chunker = LSTMTagger(os.path.join('data', 'train.txt.gz'), os.path.join('data', 'chunker'), '.tar')
decoder_output = chunker.decode('data/input/dev.txt')

100%|██████████| 1027/1027 [00:02<00:00, 459.66it/s]


```
processed 23663 tokens with 11896 phrases; found: 11672 phrases; correct: 8568.
accuracy:  84.35%; (non-O)
accuracy:  85.65%; precision:  73.41%; recall:  72.02%; FB1:  72.71
             ADJP: precision:  36.49%; recall:  11.95%; FB1:  18.00  74
             ADVP: precision:  71.36%; recall:  39.45%; FB1:  50.81  220
            CONJP: precision:   0.00%; recall:   0.00%; FB1:   0.00  0
             INTJ: precision:   0.00%; recall:   0.00%; FB1:   0.00  0
               NP: precision:  70.33%; recall:  76.80%; FB1:  73.42  6811
               PP: precision:  92.40%; recall:  87.14%; FB1:  89.69  2302
              PRT: precision:  65.00%; recall:  57.78%; FB1:  61.18  40
             SBAR: precision:  84.62%; recall:  41.77%; FB1:  55.93  117
               VP: precision:  63.66%; recall:  58.25%; FB1:  60.83  2108
(73.40644276901988, 72.02420981842637, 72.70875763747455)
```

## Documentation

The biggest change we made was adding a semi-character RNN representation. As instructed for the baseline solution, we implemented this model to deal with noisy inputs. `character_level_representation()` is the baseline solution, which creates three 100-dimensional vectors per word (one dimension for each character in `string.printable`). The first vector one-hot encodes the first character, the last vector one-hot encodes the last character, and the middle vector stores the counts of all the characters in between. Our second, experimental implementation, `character_level_representation_v2()`, extends that idea by also encoding the second and second-to-last characters in their own vectors.
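To make the layout concrete, here is a minimal pure-Python sketch of the three-vector encoding for a single word, using plain lists instead of tensors. The helper name `semi_character_vector` is hypothetical, and the sketch assumes every character of the word appears in `string.printable`. It also shows why the representation helps with noisy input: swapping internal letters leaves the encoding unchanged.

```python
import string

ALPHABET = string.printable  # 100 characters


def semi_character_vector(word):
    """Return [first one-hot | internal counts | last one-hot] as a flat list."""
    size = len(ALPHABET)
    first, inner, last = [0] * size, [0] * size, [0] * size
    first[ALPHABET.find(word[0])] = 1
    last[ALPHABET.find(word[-1])] = 1
    # Characters strictly between the first and the last are only counted,
    # so their order does not matter.
    for ch in word[1:-1]:
        inner[ALPHABET.find(ch)] += 1
    return first + inner + last


clean = semi_character_vector("chunker")
noisy = semi_character_vector("chnuker")   # two internal letters swapped
other = semi_character_vector("chunked")   # different last letter

print(len(clean))        # 300
print(clean == noisy)    # True
print(clean == other)    # False
```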

Both of these functions also implement an idea from the paper "Combating Adversarial Misspellings with Robust Word Recognition". In this paper, the authors suggest several backoff strategies for unrecognized words: passing the word through unchanged, backing off to a neutral word, or backing off to a background model. We decided to implement the backoff to a neutral word, and we chose "a" as that word, so any `[UNK]` token is encoded as if it were "a". We hope that this will make the model more robust to the misspellings in the test set. We also min-max normalize the internal character counts.
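The two ideas in this paragraph can be sketched on their own, again with plain lists. The helper names `back_off` and `min_max_normalize` are hypothetical, and the guard for a constant vector is an assumption added for this sketch (a word with no internal characters would otherwise produce a 0/0 division).

```python
def back_off(word, neutral_word="a", unk_token="[UNK]"):
    """Replace the unknown-word placeholder with a fixed neutral word."""
    return neutral_word if word == unk_token else word


def min_max_normalize(counts):
    """Scale counts to [0, 1]; return them unchanged if all entries are equal."""
    low, high = min(counts), max(counts)
    if high == low:
        # Assumption for this sketch: skip normalization instead of dividing by zero.
        return list(counts)
    return [(c - low) / (high - low) for c in counts]


print(back_off("[UNK]"))                 # a
print(back_off("chunker"))               # chunker
print(min_max_normalize([0, 2, 1, 0]))   # [0.0, 1.0, 0.5, 0.0]
print(min_max_normalize([0, 0, 0]))      # [0, 0, 0]
```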

Note that these functions required a few other small changes in the codebase. In the training function, we create the encoded tensor and pass it into the forward function. In the forward function, it is concatenated to the embedding vector. Although these are single-line changes, we note them here for your reference.
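The concatenation step might look roughly like the sketch below. This is not the code from `default.py`: the class name `TaggerWithCharFeatures`, the layer names, and all dimensions are assumptions chosen for illustration. The point it shows is that the LSTM input size has to grow by the width of the character features.

```python
import torch
import torch.nn as nn


class TaggerWithCharFeatures(nn.Module):
    """Hypothetical tagger: word embeddings concatenated with per-word features."""

    def __init__(self, vocab_size, tagset_size, embedding_dim=128,
                 char_dim=300, hidden_dim=64):
        super().__init__()
        self.word_embeddings = nn.Embedding(vocab_size, embedding_dim)
        # The LSTM input grows by char_dim because of the concatenation.
        self.lstm = nn.LSTM(embedding_dim + char_dim, hidden_dim)
        self.hidden2tag = nn.Linear(hidden_dim, tagset_size)

    def forward(self, word_ids, char_features):
        embeds = self.word_embeddings(word_ids)               # (seq_len, embedding_dim)
        combined = torch.cat([embeds, char_features], dim=1)  # (seq_len, embedding_dim + char_dim)
        lstm_out, _ = self.lstm(combined.unsqueeze(1))        # (seq_len, 1, hidden_dim)
        return self.hidden2tag(lstm_out.squeeze(1))           # (seq_len, tagset_size)


model = TaggerWithCharFeatures(vocab_size=50, tagset_size=5)
word_ids = torch.tensor([3, 17, 8, 42])
char_features = torch.zeros(4, 300)  # stand-in for the per-word character encoding
scores = model(word_ids, char_features)
print(scores.shape)  # torch.Size([4, 5])
```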

In [None]:
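import string

import torch


def character_level_representation(sentence):
    """
    Encode each word as [first-character one-hot | internal character counts |
    last-character one-hot].

    Returns a tensor of shape (len(sentence), 3 * len(string.printable)).
    """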
    
    first_characters = torch.zeros((len(sentence), len(string.printable)))
    last_characters = torch.zeros((len(sentence), len(string.printable)))
    other_characters = torch.zeros((len(sentence), len(string.printable)))

    for i, word in enumerate(sentence):
        if word == '[UNK]':
            word = 'a'
        
        # First and last characters
        first_characters[i][string.printable.find(word[0])] = 1
        last_characters[i][string.printable.find(word[-1])] = 1

        # Non Edge Characters
        for j in range(1, len(word)-1):
            other_characters[i][string.printable.find(word[j])] += 1

            # Min-max normalize the internal count vector (runs after every increment)
            tmax = torch.max(other_characters[i])
            tmin = torch.min(other_characters[i])
            other_characters[i] = (other_characters[i]-tmin) / (tmax - tmin)
            
    return torch.cat([first_characters, other_characters, last_characters], dim=1)

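def character_level_representation_v2(sentence):
    """
    Encode each word as [first | second | internal counts | second-to-last | last],
    where all parts except the internal counts are one-hot vectors.

    Returns a tensor of shape (len(sentence), 5 * len(string.printable)).
    """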
    
    first_characters = torch.zeros((len(sentence), len(string.printable)))
    second_characters = torch.zeros((len(sentence), len(string.printable)))
    last_characters = torch.zeros((len(sentence), len(string.printable)))
    second_last_characters = torch.zeros((len(sentence), len(string.printable)))
    other_characters = torch.zeros((len(sentence), len(string.printable)))

    for i, word in enumerate(sentence):
        if word == '[UNK]':
            word = 'a'
        
        # First and last characters
        first_characters[i][string.printable.find(word[0])] = 1
        last_characters[i][string.printable.find(word[-1])] = 1

        # Second and second-to-last characters (only for words longer than three characters)
        if len(word) > 3:
            second_characters[i][string.printable.find(word[1])] = 1
            second_last_characters[i][string.printable.find(word[-2])] = 1

        # Non Edge Characters
        for j in range(2, len(word)-2):
            other_characters[i][string.printable.find(word[j])] += 1

            # Min-max normalize the internal count vector (runs after every increment)
            tmax = torch.max(other_characters[i])
            tmin = torch.min(other_characters[i])
            other_characters[i] = (other_characters[i]-tmin) / (tmax - tmin)
            
    return torch.cat([first_characters, second_characters, other_characters, second_last_characters, last_characters], dim=1)

## Analysis

The first iteration of the model was the default code, which produced the following scores. One notable place where this model underperformed was the `ADJP` tag, with an FB1 of only 18.00.

```
processed 23663 tokens with 11896 phrases; found: 11672 phrases; correct: 8568.
accuracy:  84.35%; (non-O)
accuracy:  85.65%; precision:  73.41%; recall:  72.02%; FB1:  72.71
             ADJP: precision:  36.49%; recall:  11.95%; FB1:  18.00  74
             ADVP: precision:  71.36%; recall:  39.45%; FB1:  50.81  220
            CONJP: precision:   0.00%; recall:   0.00%; FB1:   0.00  0
             INTJ: precision:   0.00%; recall:   0.00%; FB1:   0.00  0
               NP: precision:  70.33%; recall:  76.80%; FB1:  73.42  6811
               PP: precision:  92.40%; recall:  87.14%; FB1:  89.69  2302
              PRT: precision:  65.00%; recall:  57.78%; FB1:  61.18  40
             SBAR: precision:  84.62%; recall:  41.77%; FB1:  55.93  117
               VP: precision:  63.66%; recall:  58.25%; FB1:  60.83  2108
(73.40644276901988, 72.02420981842637, 72.70875763747455)
```
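The summary line and the final tuple of such a report are tied together by the standard precision, recall, and F1 definitions. As a quick check, the sketch below recomputes them from the phrase counts in the first line of the report above; the function name `chunk_scores` is hypothetical.

```python
def chunk_scores(correct, found, gold):
    """Return (precision, recall, FB1) as percentages."""
    precision = 100.0 * correct / found if found else 0.0
    recall = 100.0 * correct / gold if gold else 0.0
    total = precision + recall
    fb1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, fb1


# Counts from the default run: 11896 gold phrases, 11672 found, 8568 correct.
precision, recall, fb1 = chunk_scores(correct=8568, found=11672, gold=11896)
print(f"{precision:.2f} {recall:.2f} {fb1:.2f}")  # 73.41 72.02 72.71
```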

The next iteration of our model was the baseline solution, which implements a semi-character RNN to deal with noisy inputs. This is the `character_level_representation()` function above. One important thing to note is that we initially did not normalize the internal character counts, but normalizing them resulted in a small gain in the FB1 score. We can see that our correct count went from 8568 to 9270 and that the FB1 score increased by about 4.4 points (72.71 to 77.13). This iteration was our highest-scoring solution until the last-minute bidirectional change described at the end of this section. One interesting point is that this model scored a slightly lower FB1 than the default model on the PP and PRT tags (89.23 vs. 89.69 and 60.67 vs. 61.18), and its SBAR precision dropped (82.44% vs. 84.62%) even though its SBAR FB1 improved.

```
processed 23663 tokens with 11896 phrases; found: 12141 phrases; correct: 9270.
accuracy:  86.68%; (non-O)
accuracy:  87.87%; precision:  76.35%; recall:  77.93%; FB1:  77.13
             ADJP: precision:  46.00%; recall:  20.35%; FB1:  28.22  100
             ADVP: precision:  66.03%; recall:  43.47%; FB1:  52.42  262
            CONJP: precision:   0.00%; recall:   0.00%; FB1:   0.00  0
             INTJ: precision:   0.00%; recall:   0.00%; FB1:   0.00  0
               NP: precision:  75.81%; recall:  81.64%; FB1:  78.62  6717
               PP: precision:  91.30%; recall:  87.26%; FB1:  89.23  2333
              PRT: precision:  61.36%; recall:  60.00%; FB1:  60.67  44
             SBAR: precision:  82.44%; recall:  45.57%; FB1:  58.70  131
               VP: precision:  66.33%; recall:  73.52%; FB1:  69.74  2554
(76.35285396590066, 77.92535305985206, 77.13108957024588)
```

The final and best model iteration was a last-minute improvement. After speaking with the professor in class, we found out that we were allowed to tune the LSTM's `bidirectional` parameter. We expected this to outperform our previous iterations, and it did not disappoint: the FB1 score improved by over 3 points (77.13 to 80.30). We hypothesize that this is because the model can use context from both directions rather than just one, and as a result is more accurate.

```
processed 23663 tokens with 11896 phrases; found: 12036 phrases; correct: 9609.
accuracy:  88.63%; (non-O)
accuracy:  89.42%; precision:  79.84%; recall:  80.78%; FB1:  80.30
             ADJP: precision:  49.00%; recall:  21.68%; FB1:  30.06
             ADVP: precision:  69.23%; recall:  47.49%; FB1:  56.33
            CONJP: precision:   0.00%; recall:   0.00%; FB1:   0.00
             INTJ: precision:   0.00%; recall:   0.00%; FB1:   0.00
               NP: precision:  79.48%; recall:  83.21%; FB1:  81.30
               PP: precision:  93.81%; recall:  90.58%; FB1:  92.16
              PRT: precision:  71.43%; recall:  55.56%; FB1:  62.50
             SBAR: precision:  83.33%; recall:  56.96%; FB1:  67.67
               VP: precision:  70.18%; recall:  78.56%; FB1:  74.13
(79.83549351944168, 80.77505043712172, 80.30252381748286)
```
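For reference, turning on `bidirectional` in PyTorch's `nn.LSTM` doubles the size of the output features, so the layer that maps LSTM outputs to tag scores has to accept `2 * hidden_dim` inputs. The sketch below only illustrates that shape change; the dimensions and variable names are arbitrary assumptions, not the values used in our model.

```python
import torch
import torch.nn as nn

input_dim, hidden_dim, tagset_size, seq_len = 428, 64, 5, 4
inputs = torch.zeros(seq_len, 1, input_dim)  # (seq_len, batch, features)

uni = nn.LSTM(input_dim, hidden_dim)
bi = nn.LSTM(input_dim, hidden_dim, bidirectional=True)

uni_out, _ = uni(inputs)
bi_out, _ = bi(inputs)
print(uni_out.shape)  # torch.Size([4, 1, 64])
print(bi_out.shape)   # torch.Size([4, 1, 128])

# Forward and backward states are concatenated, so the tag projection
# needs 2 * hidden_dim input features.
hidden2tag = nn.Linear(2 * hidden_dim, tagset_size)
scores = hidden2tag(bi_out.squeeze(1))
print(scores.shape)   # torch.Size([4, 5])
```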