# Recognize named entities on Twitter with LSTMs

In this assignment, you will use a recurrent neural network to solve Named Entity Recognition (NER) problem. NER is a common task in natural language processing systems. It serves for extraction such entities from the text as persons, organizations, locations, etc. In this task you will experiment to recognize named entities from Twitter.

For example, we want to extract persons' and organizations' names from the text. Than for the input text:

    Ian Goodfellow works for Google Brain

a NER model needs to provide the following sequence of tags:

    B-PER I-PER    O     O   B-ORG  I-ORG

Where *B-* and *I-* prefixes stand for the beginning and inside of the entity, while *O* stands for out of tag or no tag. Markup with the prefix scheme is called *BIO markup*. This markup is introduced for distinguishing of consequent entities with similar types.

A solution of the task will be based on neural networks, particularly, on Bi-Directional Long Short-Term Memory Networks (Bi-LSTMs).

### Libraries

For this task you will need the following libraries:
 - [Tensorflow](https://www.tensorflow.org) — an open-source software library for Machine Intelligence.
 
In this assignment, we use Tensorflow 1.15.0. You can install it with pip:

    !pip install tensorflow==1.15.0
     
 - [Numpy](http://www.numpy.org) — a package for scientific computing.
 
If you have never worked with Tensorflow, you would probably need to read some tutorials during your work on this assignment, e.g. [this one](https://www.tensorflow.org/tutorials/recurrent) could be a good starting point. 

### Data

The following cell will download all data required for this assignment into the folder `week2/data`.

In [1]:
try:
    import google.colab
    IN_COLAB = True
except:
    IN_COLAB = False

if IN_COLAB:
    ! wget https://raw.githubusercontent.com/hse-aml/natural-language-processing/master/setup_google_colab.py -O setup_google_colab.py
    import setup_google_colab
    setup_google_colab.setup_week2()

import sys
sys.path.append("..")
from common.download_utils import download_week2_resources

download_week2_resources()

File data/train.txt is already downloaded.
File data/validation.txt is already downloaded.
File data/test.txt is already downloaded.


### Load the Twitter Named Entity Recognition corpus

We will work with a corpus, which contains tweets with NE tags. Every line of a file contains a pair of a token (word/punctuation symbol) and a tag, separated by a whitespace. Different tweets are separated by an empty line.

The function *read_data* reads a corpus from the *file_path* and returns two lists: one with tokens and one with the corresponding tags. You need to complete this function by adding a code, which will replace a user's nickname to `<USR>` token and any URL to `<URL>` token. You could think that a URL and a nickname are just strings which start with *http://* or *https://* in case of URLs and a *@* symbol for nicknames.

In [2]:
import re
URL_REGEX = "(https?|ftp|file):\/\/[\-A-Za-z0-9+&@#\/%?=~_|!:,.;]*[\-A-Za-z0-9+&@#\/%=~_|]"
USER_REGEX = '@([A-Za-z0-9_]+)'

In [3]:
def replace_urls(token):
    return re. sub(URL_REGEX, '<URL>', token)

def replace_username(token):
    return re.sub(USER_REGEX, '<USR>', token)

In [4]:
assert replace_urls('http://t.co/eQmogqqABK this is a tweet with url') == '<URL> this is a tweet with url'

In [5]:
assert replace_username('this is a tweet from a user @espoir_mur') == 'this is a tweet from a user <USR>'

In [6]:
def read_data(file_path):
    tokens = []
    tags = []
    
    tweet_tokens = []
    tweet_tags = []
    for line in open(file_path, encoding='utf-8'):
        line = line.strip()
        if not line:
            if tweet_tokens:
                tokens.append(tweet_tokens)
                tags.append(tweet_tags)
            tweet_tokens = []
            tweet_tags = []
        else:
            token, tag = line.split()
            token = replace_urls(token)
            token = replace_username(token)
            tweet_tokens.append(token)
            tweet_tags.append(tag)
    return tokens, tags

And now we can load three separate parts of the dataset:
 - *train* data for training the model;
 - *validation* data for evaluation and hyperparameters tuning;
 - *test* data for final evaluation of the model.

In [7]:
train_tokens, train_tags = read_data('data/train.txt')
validation_tokens, validation_tags = read_data('data/validation.txt')
test_tokens, test_tags = read_data('data/test.txt')

You should always understand what kind of data you deal with. For this purpose, you can print the data running the following cell:

In [8]:
for i in range(3):
    for token, tag in zip(train_tokens[i], train_tags[i]):
        print('%s\t%s' % (token, tag))
    print()

RT	O
<USR>	O
:	O
Online	O
ticket	O
sales	O
for	O
Ghostland	B-musicartist
Observatory	I-musicartist
extended	O
until	O
6	O
PM	O
EST	O
due	O
to	O
high	O
demand	O
.	O
Get	O
them	O
before	O
they	O
sell	O
out	O
...	O

Apple	B-product
MacBook	I-product
Pro	I-product
A1278	I-product
13.3	I-product
"	I-product
Laptop	I-product
-	I-product
MD101LL/A	I-product
(	O
June	O
,	O
2012	O
)	O
-	O
Full	O
read	O
by	O
eBay	B-company
<URL>	O
<URL>	O

Happy	O
Birthday	O
<USR>	O
!	O
May	O
Allah	B-person
s.w.t	O
bless	O
you	O
with	O
goodness	O
and	O
happiness	O
.	O



### Prepare dictionaries

To train a neural network, we will use two mappings: 
- {token}$\to${token id}: address the row in embeddings matrix for the current token;
- {tag}$\to${tag id}: one-hot ground truth probability distribution vectors for computing the loss at the output of the network.

Now you need to implement the function *build_dict* which will return {token or tag}$\to${index} and vice versa. 

In [9]:
from collections import defaultdict

In [12]:
def build_dict(tokens_or_tags, special_tokens):
    """
        tokens_or_tags: a list of lists of tokens or tags
        special_tokens: some special tokens
    """
    # Create a dictionary with default value 0
    token_to_index = defaultdict(lambda: 0)
    index_to_token = list()
    index_to_token_set = set()
    
    
    # Create mappings from tokens (or tags) to indices and vice versa.
    # At first, add special tokens (or tags) to the dictionaries.
    # The first special token must have index 0.
    
    # Mapping tok2idx should contain each token or tag only once. 
    # To do so, you should:
    # 1. extract unique tokens/tags from the tokens_or_tags variable, which is not
    #    occur in special_tokens (because they could have non-empty intersection)
    # 2. index them (for example, you can add them into the list idx2tok
    # 3. for each token/tag save the index into tok2idx).
    
    for index, special_token in enumerate(special_tokens):
        index_to_token_set.add(special_token)
        index_to_token.append(special_token)
        
    for sentence_index, token_or_tag in enumerate(tokens_or_tags):
        for word_index, word_or_tag in enumerate(token_or_tag, start=len(special_tokens)):
            if word_or_tag not in index_to_token_set:
                index_to_token_set.add(word_or_tag)
                index_to_token.append(word_or_tag)
    token_to_index  = {token : index for index, token in enumerate(index_to_token)}
    return token_to_index, index_to_token

In [13]:
actual_token_to_index, actual_index_to_token = build_dict([['espoir', 'is', 'working ', 'on', 'an', 'nlp', 'exercice']] , ['<UNK>', '<PAD>'])

expected_token_to_index, expected_index_to_token  = {'<UNK>': 0, 
                                                     '<PAD>': 1, 
                                                     'espoir': 2, 
                                                     'is': 3, 
                                                     'working ': 4, 
                                                     'on': 5, 
                                                     'an': 6, 
                                                     'nlp': 7, 
                                                     'exercice': 8},['<UNK>', '<PAD>', 'espoir', 'is', 'working ', 'on', 'an', 'nlp', 'exercice']
assert actual_index_to_token == expected_index_to_token
assert actual_token_to_index == expected_token_to_index

After implementing the function *build_dict* you can make dictionaries for tokens and tags. Special tokens in our case will be:
 - `<UNK>` token for out of vocabulary tokens;
 - `<PAD>` token for padding sentence to the same length when we create batches of sentences.

In [14]:
special_tokens = ['<UNK>', '<PAD>']
special_tags = ['O']

# Create dictionaries 
token_to_index, index_to_token = build_dict(train_tokens + validation_tokens, special_tokens)
tag_to_index, index_to_tag = build_dict(train_tags, special_tags)

In [15]:
assert token_to_index['<UNK>'] == 0 
assert index_to_token[0] == '<UNK>'
assert list(tag_to_index.keys()) == list(index_to_tag)
for index in tag_to_index.values():
    assert index in range(0, len(index_to_tag))

In [16]:
def words_to_indexes(tokens_list):
    return [token_to_index.get(word) for word in tokens_list]

def tags_to_indexes(tags_list):
    return [tag_to_index.get(tag) for tag in tags_list]

def indexes_to_tokens(indexes):
    return [index_to_token[index] for index in indexes]

def indexes_to_tags(indexes):
    return [index_to_tag[index] for index in indexes]

### Generate batches

Neural Networks are usually trained with batches. It means that weight updates of the network are based on several sequences at every single time. The tricky part is that all sequences within a batch need to have the same length. So we will pad them with a special `<PAD>` token. It is also a good practice to provide RNN with sequence lengths, so it can skip computations for padding parts. We provide the batching function *batches_generator* readily available for you to save time. 

In [17]:
import numpy as np

In [18]:
def batches_generator(batch_size, tokens, tags,
                      shuffle=True, allow_smaller_last_batch=True):
    """Generates padded batches of tokens and tags."""
    
    n_samples = len(tokens)
    if shuffle:
        order = np.random.permutation(n_samples)
    else:
        order = np.arange(n_samples)

    n_batches = n_samples // batch_size
    if allow_smaller_last_batch and n_samples % batch_size:
        n_batches += 1

    for k in range(n_batches):
        batch_start = k * batch_size
        batch_end = min((k + 1) * batch_size, n_samples)
        current_batch_size = batch_end - batch_start
        x_list = []
        y_list = []
        max_len_token = 0
        for index in order[batch_start: batch_end]:
            #print(f'the token I am getting are {tokens[index]}')
            x_list.append(words_to_indexes(tokens[index]))
            y_list.append(tags_to_indexes(tags[index]))
            #print(f'the tags am getting are {tags[index]}')
            max_len_token = max(max_len_token, len(tags[index]))
            
        # Fill in the data into numpy nd-arrays filled with padding indices.
        x = np.ones([current_batch_size, max_len_token], dtype=np.int32) * token_to_index['<PAD>']
        y = np.ones([current_batch_size, max_len_token], dtype=np.int32) * tag_to_index['O']
        lengths = np.zeros(current_batch_size, dtype=np.int32)
        for n in range(current_batch_size):
            utt_len = len(x_list[n])
            x[n, :utt_len] = x_list[n]
            lengths[n] = utt_len
            y[n, :utt_len] = y_list[n]
        #print(f'for the following tokens  {tokens[index]}, the tags am getting are {tags[index]}, and their indexes are, {tags_to_indexes(tags[index])}')
        yield x, y, lengths

some checking;.. 

In [19]:
for x_batch, y_batch, lengths in batches_generator(1,  train_tokens , train_tags,): 
    for tokens, tags in zip(x_batch, y_batch):
        print(f"The sentence is : {' '.join(indexes_to_tokens(tokens))}") 
        print("***" * 5)
        print(f"The tags are {' '.join(indexes_to_tags(tags))}")
    print("===" * 4)

The sentence is : So happy moving into my new place on Tuesday !
***************
The tags are O O O O O O O O O O
The sentence is : 63,000 people affected by UCF #data #breach : [ welive# security.com ] The University of Central Florida ( UCF ) has… <URL>
***************
The tags are O O O O B-other O O O O O O O B-other I-other I-other I-other I-other O B-other O O O
The sentence is : Middlesex Hospital suffers patient data security breach - <URL> ( blog ) <URL> #datasecurity
***************
The tags are B-other I-other O O O O O O O O O O O O
The sentence is : RT <USR> : When the sun smiles , the birds smile back ! <URL>
***************
The tags are O O O O O O O O O O O O O O
The sentence is : you are not bringing tthe $ 10.000 a week for life on the 30th
***************
The tags are O O O O O O O O O O O O O O
The sentence is : RT <USR> : Our goal is 1,000 supporters ( of any pledge level ) by next Tuesday . Can you help ? <URL> #noenemies
***************
The tags are O O O O O O O

The tags are O O O O B-geo-loc O O O
The sentence is : linda <URL>
***************
The tags are B-person O
The sentence is : just whooped st . francis preps asssss . and i scored a goal :) and its friday . and i have no homeworkkk . SICK LIFEEEE
***************
The tags are O O O O O O O O O O O O O O O O O O O O O O O O O O
The sentence is : RT <USR> #TEAMBOMADE everywhere ! today from 7-10pm Golf Mixer at Trophy Club Gwinnett 3254 Clu ... <URL> &lt; -- MAP
***************
The tags are O O O O O O O O O O O B-facility I-facility I-facility O O O O O O O
The sentence is : <USR> I have chance with you ? webcam enters Sunday 7:00 pm ? please ? to talk right ! kisses on the chin
***************
The tags are O O O O O O O O O O O O O O O O O O O O O O O
The sentence is : tos <URL> July 02 , 2015 at 12:10 PM
***************
The tags are O O O O O O O O O
The sentence is : AI : DEMAND RELEASE FOR SYRIAN BLOGGER : 17-9-2010Tal al-Mallohi , a 19-year-old Syrian woman , has been held incommun .

***************
The tags are O B-company O O O O O O B-company O O B-product O O O B-company O
The sentence is : <USR> <USR> 23rd
***************
The tags are O O O
The sentence is : Looking for something to do tonight ? Samuel Savoirfaire Williams at Potbelly Lincoln Park in Chicago , IL <URL>
***************
The tags are O O O O O O O B-person I-person I-person O B-facility I-facility I-facility O B-geo-loc O B-geo-loc O
The sentence is : RT <USR> : Who 's birthday is it tomorrow ? <URL>
***************
The tags are O O O O O O O O O O O
The sentence is : nulinhos " by ISABEL KERSHNER via NYT <URL> August 17 , 2015 at 01:00 AM
***************
The tags are O O O B-person I-person O B-company O O O O O O O O
The sentence is : Got my phone taken up today . Guess why . <USR> . That 's why .
***************
The tags are O O O O O O O O O O O O O O O O
The sentence is : Tonight is Theeeeeeeeeee niiiiiiiiiight ;) . *Gwen Stefani voice* :] .
***************
The tags are O O O O O O B-person 

The sentence is : Look out for me on VATSpy as Conn Taggart so you may get a guess on what the next pics will be ! ( 2/2 )
***************
The tags are O O O O O B-product O B-person I-person O O O O O O O O O O O O O O O O O
The sentence is : <USR> <USR> Well . just beg <USR> .. PLEASE come to the #SBLeurope Next year !. We need you there ! *puss in boots eyes*
***************
The tags are O O O O O O O O O O O O O O O O O O O O O O O O O
The sentence is : RT <USR> : The case for deadline-triggered sanctions #Iran | <USR> <USR> <URL>
***************
The tags are O O O O O O O O O O O O O
The sentence is : NY Congressman Michael Grimm , who pleaded guilty to tax evasion , says he'll resign Jan . 5 : <URL>
***************
The tags are B-geo-loc O B-person I-person O O O O O O O O O O O O O O O O
The sentence is : RT <USR> : I love waking up in the middle of the night knowing i have more time to sleep . #teenthings #comedownwithlove
***************
The tags are O O O O O O O O O O O O O 

***************
The tags are O O O O O O O O O O O O O O O O O O
The sentence is : Jupiter : Closest Approach in Nearly 50 years . Catch it nxt week . It wnt B that big or bright again til 2022 <URL>
***************
The tags are B-geo-loc O O O O O O O O O O O O O O O O O O O O O O O O
The sentence is : Don't know if I should actually go to school tomorrow
***************
The tags are O O O O O O O O O O
The sentence is : Uhmm . Looks like I'm not leaving my house tonight !
***************
The tags are O O O O O O O O O O O
The sentence is : Rock your #Monday Tweeps ! :) <USR> <USR> <USR> <USR> <USR> <USR> <USR> <URL>
***************
The tags are O O O O O O O O O O O O O O
The sentence is : <USR> hahahahahah t'es mon gars
***************
The tags are O O O O O
The sentence is : Sunday bloody Sunday .
***************
The tags are O O O O
The sentence is : Damn c'mon Wisconsin
***************
The tags are O O B-geo-loc
The sentence is : <USR> delicioso . hows your week ?
***************

***************
The tags are O O O O O O O O O O O O O O O O
The sentence is : Santa Monica , CA - At least seven dead in #shooting near LA-area community college <URL> via <USR>
***************
The tags are B-geo-loc I-geo-loc O B-geo-loc O O O O O O O O O O O O O O
The sentence is : Why is it that when we KNOW the batteries in a remote are dead , we push the buttons harder ?
***************
The tags are O O O O O O O O O O O O O O O O O O O O O
The sentence is : Sexy Saturday Round-Up <URL>
***************
The tags are O O O O
The sentence is : RT <USR> : Surround yourself with only people who are going to lift you higher ! Happy Sunday !
***************
The tags are O O O O O O O O O O O O O O O O O O O
The sentence is : RT <USR> : North Korea official says not interested in Iran-style deal <URL> #News #Breaking #sun
***************
The tags are O O O B-geo-loc I-geo-loc O O O O O O O O O O O
The sentence is : RT <USR> : #StarWars fans are preparing for Monday 's #TheForceAwakens pr

The tags are O O O O O O B-geo-loc O O B-geo-loc I-geo-loc O O O O O O O O O O O O
The sentence is : #uknouugly when #TeamFollowBack dnt follow you back
***************
The tags are O O O O O O O
The sentence is : <USR> <USR> police failed to act on disciplining an officer for shooting a black 17 y/o 16 times until a journalist won a FOIA
***************
The tags are O O O O O O O O O O O O O O O O O O O O O O O B-other
The sentence is : Gonna love b days , don't have to go to school till 7th period
***************
The tags are O O O O O O O O O O O O O O
The sentence is : RT <USR> : Richard may need some help . <URL>
***************
The tags are O O O B-person O O O O O O
The sentence is : I love to see a child 's face when they think of the fairies . Like a fairy their little faces light up :)
***************
The tags are O O O O O O O O O O O O O O O O O O O O O O O O
The sentence is : <USR> I think that 's when I'm gonna be there
***************
The tags are O O O O O O O O O O
The

The sentence is : meet and greet penulis <USR> tgl 27 april 2014 .
***************
The tags are O O O O O O O O O O
The sentence is : 'Listen to my new freestyle " September Freestyle ".. <URL> make sure you give me feedback on that ...'
***************
The tags are O O O O O O O O O O O O O O O O O O O
The sentence is : Well had the weigh in today . I've lost a pound !!! I've had no alcohol , no puddings , no crisps , no chocolate &amp; I've only lost a pound !!!!!
***************
The tags are O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O
The sentence is : <USR> see you Friday at 630 ... good ?
***************
The tags are O O O O O O O O O
The sentence is : <USR> anek kicu i dstw :-( patuakhali science and technology university bt admit howe thakbo 2nd tym dibo abar
***************
The tags are O O O O O O O O O O O O O O O O O O O
The sentence is : RT <USR> : Helloooo ! Good Friday morning ! Live <USR> now until 10am <USR> <USR> <USR> # <URL>
***************
The ta

The sentence is : Parliament to continue debate on refugee crisis : Ljubljana , 2 November - The parliamentary Home Affairs Commit ... <URL>
***************
The tags are O O O O O O O O B-geo-loc O O O O O O B-other I-other I-other O O
The sentence is : Mon amour pour cette terre n'est pas plus grand que Sarkozy #punchline
***************
The tags are O O O O O O O O O O O O
The sentence is : Movies Tonight ? ( :
***************
The tags are O O O O O
The sentence is : #GetParadox #ATLPromoTour April 13-15 Brought to you by <USR> Follow on IG &amp; amp ; Twitter is <USR> <URL>
***************
The tags are O O O O O O O O O O O B-company O O O B-company O O O
The sentence is : I ain't necessary listenin ' to the lyrics but the beat got head rockin'. &quot; It 's Gucci Time !&quot; #nodisrespect I fucks with every now &amp; then ...
***************
The tags are O O O O O O O O O O O O O O O B-other I-other I-other I-other O O O O O O O O O O
The sentence is : Google may return to China w

***************
The tags are O O O O O O O O O O O O O O O O O O O O
The sentence is : TheEconomist : China 's fertility fell so steeply that it may be impossible to reverse <URL> <URL>
***************
The tags are B-company O B-geo-loc O O O O O O O O O O O O O O
The sentence is : SPC Jan 7 , 2016 0600 UTC Day 1 Convective Outlook <URL>
***************
The tags are O O O O O O O O O O O O
The sentence is : No Good Punk : Thug knocks 84-year-old to the ground , steals her purse <URL>
***************
The tags are O O O O O O O O O O O O O O O
The sentence is : Our thoughts and prayers go out to those in DC who have been affected by the Washington Navy Yard shooting .
***************
The tags are O O O O O O O O O B-geo-loc O O O O O O B-facility I-facility I-facility O O
The sentence is : RT <USR> : Your #Cochrane Eagle front for April 23 , 2015 . <URL>
***************
The tags are O O O O B-product I-product O O O O O O O O
The sentence is : <USR> January 25 , 2015 at 01:19 AM <URL>
**

The sentence is : RT <USR> : Not feeling school tomorrow .
***************
The tags are O O O O O O O O
The sentence is : RT <USR> : Not looking forward to tomorrow
***************
The tags are O O O O O O O O
The sentence is : . #HyukBestMaknae September 16 , 2015 at 09:53 PM <URL>
***************
The tags are O O O O O O O O O O
The sentence is : You may need to confront an uncomfortable situation at work to ... More for Capricorn <URL>
***************
The tags are O O O O O O O O O O O O O O B-other O
The sentence is : I cooked a heart shaped pancake of tofu as a Sunday 's breakfast . <URL>
***************
The tags are O O O O O O O O O O O O O O O
The sentence is : i kep thjnking it Thursday ?
***************
The tags are O O O O O O
The sentence is : RT <USR> : Apr 9 , 2015 DISCOVERY THURSDAY 9pm <USR> CURVE ft . #Armon #IsaacHo #Sick <USR> <URL>
***************
The tags are O O O O O O O O O O O O O O O O O O O
The sentence is : Starwood Reports Payment Information Data Breach : 

The sentence is : RT <USR> : spirit lead me where my trust is without borders , let me walk upon the waters , wherever you may call me .
***************
The tags are O O O O O O O O O O O O O O O O O O O O O O O O O O
The sentence is : tonight at The Lodge : braised bison shortribs , kabocha squash risotto <URL>
***************
The tags are O O B-facility I-facility O O O O O O O O O
The sentence is : 4m TalkTalk customers’ details may have been stolen in latest breach <URL> pic.twitter.com/ge7SS1brsV
***************
The tags are O B-company O O O O O O O O O O O
The sentence is : By tomorrow
***************
The tags are O O
The sentence is : Eurozone Consumer Confidence Strongest Since July 2007 : Consumers in the eurozone were more upbeat in March th ... <URL>
***************
The tags are B-other O O O O O O O O O O B-other O O O O O O O O
The sentence is : RT <USR> : And <USR> is launching an online forum tomorrow . We'll have that address later . #transit
***************
The tags a

The tags are O O O O O O O O O O O O O O O O O O O O O O O
The sentence is : Swiss #Skin Care Product K . Diamond Now Available in #China <URL> #Scott_Walker : LAUSANNE , Switzerland , July 13 , 20 ...
***************
The tags are O O O O B-product I-product I-product O O O B-geo-loc O O O O O O O O O O O O
The sentence is : RT <USR> : C'Mon just eat one more disgusting thing ! -the wee hours
***************
The tags are O O O O O O O O O O O O O O
The sentence is : <USR> Monday Morning Church
***************
The tags are O O O O
The sentence is : Looks like #king county #Metro may have overestimated the #vanpool demand ( they just sit in this lot ) <USR> <URL>
***************
The tags are O O B-other I-other O O O O O O O O O O O O O O O O O
The sentence is : Apple MacBook Pro A1278 13.3 " Laptop - MD101LL/A ( June , 2012 ) - Full read by eBay <URL> <URL>
***************
The tags are B-product I-product I-product I-product I-product I-product I-product I-product I-product O O O O O O 

***************
The tags are O O O O O O O O O O O
The sentence is : I posted reasons for the shooting at Marysville Pilchuck High School : <URL> … The cause is NO different than that of most ...
***************
The tags are O O O O O O O B-facility I-facility I-facility I-facility O O O O O O O O O O O O O
The sentence is : 4th street " <USR> : Uphuzani ? RT <USR> : That thing gives me chest pains hey " MsGuxx : A guareezy would be lovely rn ""
***************
The tags are O O O O O O O O O O O O O O O O O O O O O O O O O O O
The sentence is : RT <USR> : august , september , halloween , thanksgiving , christmas
***************
The tags are O O O O O O O B-other O B-other O B-other
The sentence is : RT <USR> : 81 Days Till Last Sacrifice
***************
The tags are O O O O O O B-other I-other
The sentence is : HAVE YOU HEARD DJ STRATEGY IS NOW AT VISIONS LOUNGE IN HICKORY , NC ON WEDNESDAY ; S NIGHTS ... WOW !! DONT MISS IT !
***************
The tags are O O O B-musicartist I-musicart

The sentence is : #Russia believed to be behind #Pentagon 's Joint Staff email breach <URL> … pic.twitter.com/4rDMGYbSUl
***************
The tags are B-geo-loc O O O O O O O O O O O O O
The sentence is : Friday <URL>
***************
The tags are O O
The sentence is : Friday Lunch Out with #Grazildas #northpark #InstaMagAndroid @ Next Door Noodles North Park Makati <URL>
***************
The tags are O O O O O O O O B-facility I-facility I-facility B-geo-loc I-geo-loc I-geo-loc O
The sentence is : RT <USR> : FREESTYLE FRIDAY !! Chat picks topics !! <URL> #Twitch
***************
The tags are O O O O O O O O O O O O
The sentence is : Adulting has been so difficult today , but tomorrow is Thursday 's and Tomorrows are always better .
***************
The tags are O O O O O O O O O O O O O O O O O O
The sentence is : man my twin wanna act fake today but its okay bcuz i still love you
***************
The tags are O O O O O O O O O O O O O O O
The sentence is : ​Why you shouldn't be scared by t

The sentence is : RT <USR> : <USR> ' Honeymooners ' to Come to Charlestown on March 6 and 7 | Catonsville , MD Patch <URL>
***************
The tags are O O O O O B-movie O O O O B-geo-loc O O O O O O B-geo-loc O B-geo-loc O O
The sentence is : Federal lawsuit filed over police shooting <URL> #chicago
***************
The tags are O O O O O O O O
The sentence is : Another good turn out at our juniors #getintogolf session at 11.00 - 12.00 on a Saturday morning <USR> <URL>
***************
The tags are O O O O O O O O O O O O O O O O O O O
The sentence is : I want it NOW ! ;) RT : <USR> Pumpkin Moonshine has arrived in Nashville -- Give it a week to get it in stores . KY ships next week !
***************
The tags are O O O O O O O O O B-product I-product O O O B-geo-loc O O O O O O O O O O O B-geo-loc O O O O
The sentence is : Dah september 2015 , Ada lagi manusia yang berebut ikan keli . Aih , Manusia .
***************
The tags are O O O O O O O O O O O O O O O O
The sentence is : RT <USR>

## Build a recurrent neural network

This is the most important part of the assignment. Here we will specify the network architecture based on TensorFlow building blocks. It's fun and easy as a lego constructor! We will create an LSTM network which will produce probability distribution over tags for each token in a sentence. To take into account both right and left contexts of the token, we will use Bi-Directional LSTM (Bi-LSTM). Dense layer will be used on top to perform tag classification.  

In [20]:
import numpy as np
import tensorflow.compat.v1 as tf
tf.disable_v2_behavior()



Instructions for updating:
non-resource variables are not supported in the long term


In [21]:
from tensorflow import keras
from tensorflow.keras import Sequential, layers, callbacks, Model
from tensorflow.keras.layers import Dense, LSTM, Dropout, GRU, Bidirectional, Embedding

In [22]:
tf.random.set_random_seed(42)

In [23]:
n_words = len(token_to_index)

In [24]:
class BiLSTMModel():
    pass

First, we need to create [placeholders](https://www.tensorflow.org/api_docs/python/tf/compat/v1/placeholder) to specify what data we are going to feed into the network during the execution time.  For this task we will need the following placeholders:
 - *input_batch* — sequences of words (the shape equals to [batch_size, sequence_len]);
 - *ground_truth_tags* — sequences of tags (the shape equals to [batch_size, sequence_len]);
 - *lengths* — lengths of not padded sequences (the shape equals to [batch_size]);
 - *dropout_ph* — dropout keep probability; this placeholder has a predefined value 1;
 - *learning_rate_ph* — learning rate; we need this placeholder because we want to change the value during training.

It could be noticed that we use *None* in the shapes in the declaration, which means that data of any size can be feeded. 

You need to complete the function *declare_placeholders*.

In [25]:
BATCH_SIZE = 32
EPOCHS = 5
MAX_LEN = 75
EMBEDDING = 200
UNITS  = 200

In [26]:
def declare_placeholders(self):
    """Specifies placeholders for the model."""

    # Placeholders for input and ground truth output.
    self.input_batch = tf.placeholder(dtype=tf.int32, shape=[None, None], name='input_batch') 
    self.ground_truth_tags = tf.placeholder(dtype=tf.int32, shape=[None, None ], name='ground_truth_shape')
    # Placeholder for lengths of the sequences.
    self.lengths = tf.placeholder(dtype=tf.int32, shape=[None], name='lengths') 
    
    # Placeholder for a dropout keep probability. If we don't feed
    # a value for this placeholder, it will be equal to 1.0.
    self.dropout_ph = tf.placeholder_with_default(tf.cast(1.0, tf.float32), shape=[])
    
    # Placeholder for a learning rate (tf.float32).
    self.learning_rate_ph = tf.placeholder(dtype=tf.float32, shape=[], name='learning_rate') 

In [27]:
BiLSTMModel.__declare_placeholders = classmethod(declare_placeholders)

Now, let us specify the layers of the neural network. First, we need to perform some preparatory steps: 
 
- Create embeddings matrix with [tf.Variable](https://www.tensorflow.org/api_docs/python/tf/Variable). Specify its name (*embeddings_matrix*), type  (*tf.float32*), and initialize with random values.
- Create forward and backward LSTM cells. TensorFlow provides a number of RNN cells ready for you. We suggest that you use *LSTMCell*, but you can also experiment with other types, e.g. GRU cells. [This](http://colah.github.io/posts/2015-08-Understanding-LSTMs/) blogpost could be interesting if you want to learn more about the differences.
- Wrap your cells with [DropoutWrapper](https://www.tensorflow.org/api_docs/python/tf/contrib/rnn/DropoutWrapper). Dropout is an important regularization technique for neural networks. Specify all keep probabilities using the dropout placeholder that we created before.
 
After that, you can build the computation graph that transforms an input_batch:

- [Look up](https://www.tensorflow.org/api_docs/python/tf/nn/embedding_lookup) embeddings for an *input_batch* in the prepared *embedding_matrix*.
- Pass the embeddings through [Bidirectional Dynamic RNN](https://www.tensorflow.org/api_docs/python/tf/nn/bidirectional_dynamic_rnn) with the specified forward and backward cells. Use the lengths placeholder here to avoid computations for padding tokens inside the RNN.
- Create a dense layer on top. Its output will be used directly in loss function.  
 
Fill in the code below. In case you need to debug something, the easiest way is to check that tensor shapes of each step match the expected ones. 
 

In [28]:
def build_layers(self, vocabulary_size, embedding_dim, n_hidden_rnn, n_tags):
    """Specifies bi-LSTM architecture and computes logits for inputs."""
    
    # Create embedding variable (tf.Variable) with dtype tf.float32
    initial_embedding_matrix = np.random.randn(vocabulary_size, embedding_dim) / np.sqrt(embedding_dim)
    embedding_matrix_variable = tf.Variable(initial_value = initial_embedding_matrix, dtype= tf.float32)
    
    # Create RNN cells (for example, tf.nn.rnn_cell.BasicLSTMCell) with n_hidden_rnn number of units 
    # and dropout (tf.nn.rnn_cell.DropoutWrapper), initializing all *_keep_prob with dropout placeholder.
    
    forward_cell = tf.nn.rnn_cell.LSTMCell(n_hidden_rnn)
    forward_cell = tf.nn.rnn_cell.DropoutWrapper(forward_cell, self.dropout_ph, self.dropout_ph)
    
    backward_cell = tf.nn.rnn_cell.LSTMCell(n_hidden_rnn)
    backward_cell = tf.nn.rnn_cell.DropoutWrapper(backward_cell, self.dropout_ph, self.dropout_ph)
    

    # Look up embeddings for self.input_batch (tf.nn.embedding_lookup).
    # Shape: [batch_size, sequence_len, embedding_dim].
    embedings = tf.nn.embedding_lookup(embedding_matrix_variable, self.input_batch)
    
    
    # Pass them through Bidirectional Dynamic RNN (tf.nn.bidirectional_dynamic_rnn).
    # Shape: [batch_size, sequence_len, 2 * n_hidden_rnn]. 
    # Also don't forget to initialize sequence_length as self.lengths and dtype as tf.float32.
    outputs, states = tf.nn.bidirectional_dynamic_rnn(cell_bw=backward_cell, 
                                                      cell_fw=forward_cell, 
                                                      dtype=tf.float32,
                                                      inputs=embedings,
                                                      sequence_length=self.lengths)
    
    output_fw, output_bw = outputs
    states_fw, states_bw = states
    rnn_output = tf.concat([output_fw, output_bw], axis=2)
    

    # Dense layer on top.
    # Shape: [batch_size, sequence_len, n_tags].   
    self.logits = tf.compat.v1.layers.dense(rnn_output, n_tags, activation=None)
    

In [29]:
BiLSTMModel.__build_layers = classmethod(build_layers)

To compute the actual predictions of the neural network, you need to apply [softmax](https://www.tensorflow.org/api_docs/python/tf/nn/softmax) to the last layer and find the most probable tags with [argmax](https://www.tensorflow.org/api_docs/python/tf/argmax).

In [30]:
def compute_predictions(self):
    """Transforms logits to probabilities and finds the most probable tags."""
    
    # Create softmax (tf.nn.softmax) function
    self. softmax_output = tf.nn.softmax(self.logits, name='softmax')
    
    # Use argmax (tf.argmax) to get the most probable tags
    # Don't forget to set axis=-1
    # otherwise argmax will be calculated in a wrong way
    predicitons = tf.math.argmax(self.softmax_output, axis=-1)
    self.predictions = predicitons

In [31]:
BiLSTMModel.__compute_predictions = classmethod(compute_predictions)

During training we do not need predictions of the network, but we need a loss function. We will use [cross-entropy loss](http://ml-cheatsheet.readthedocs.io/en/latest/loss_functions.html#cross-entropy), efficiently implemented in TF as 
[cross entropy with logits](https://www.tensorflow.org/api_docs/python/tf/nn/softmax_cross_entropy_with_logits_v2). Note that it should be applied to logits of the model (not to softmax probabilities!). Also note,  that we do not want to take into account loss terms coming from `<PAD>` tokens. So we need to mask them out, before computing [mean](https://www.tensorflow.org/api_docs/python/tf/reduce_mean).

In [32]:
def compute_loss(self, n_tags, PAD_index):
    """Computes masked cross-entopy loss with logits."""
    
    # Create cross entropy function function (tf.nn.softmax_cross_entropy_with_logits_v2)
    ground_truth_tags_one_hot = tf.one_hot(self.ground_truth_tags, n_tags)
    loss_tensor =  tf.nn.softmax_cross_entropy_with_logits(labels=ground_truth_tags_one_hot, logits=self.logits)
    
    mask = tf.cast(tf.not_equal(self.input_batch, PAD_index), tf.float32)
    # Create loss function which doesn't operate with <PAD> tokens (tf.reduce_mean)
    # Be careful that the argument of tf.reduce_mean should be
    # multiplication of mask and loss_tensor.
    self.loss = tf.reduce_mean(loss_tensor*mask)

In [33]:
BiLSTMModel.__compute_loss = classmethod(compute_loss)

The last thing to specify is how we want to optimize the loss. 
We suggest that you use [Adam](https://www.tensorflow.org/api_docs/python/tf/train/AdamOptimizer) optimizer with a learning rate from the corresponding placeholder. 
You will also need to apply clipping to eliminate exploding gradients. It can be easily done with [clip_by_norm](https://www.tensorflow.org/api_docs/python/tf/clip_by_norm) function. 

In [34]:
def perform_optimization(self):
    """Specifies the optimizer and train_op for the model."""
    
    # Create an optimizer (tf.train.AdamOptimizer)
    self.optimizer =  tf.train.AdamOptimizer(learning_rate=self.learning_rate_ph)
    self.grads_and_vars = self.optimizer.compute_gradients(self.loss)
    
    # Gradient clipping (tf.clip_by_norm) for self.grads_and_vars
    # Pay attention that you need to apply this operation only for gradients 
    # because self.grads_and_vars also contains variables.
    # list comprehension might be useful in this case.
    clip_norm = tf.cast(1.0, tf.float32)
    self.grads_and_vars = [ (tf.clip_by_norm(gv[0], clip_norm),gv[1]) for gv in self.grads_and_vars]
    
    self.train_op = self.optimizer.apply_gradients(self.grads_and_vars)

In [35]:
BiLSTMModel.__perform_optimization = classmethod(perform_optimization)

Congratulations! You have specified all the parts of your network. You may have noticed, that we didn't deal with any real data yet, so what you have written is just recipes on how the network should function.
Now we will put them to the constructor of our Bi-LSTM class to use it in the next section. 

In [36]:
def init_model(self, vocabulary_size, n_tags, embedding_dim, n_hidden_rnn, PAD_index):
    self.__declare_placeholders()
    self.__build_layers(vocabulary_size, embedding_dim, n_hidden_rnn, n_tags)
    self.__compute_predictions()
    self.__compute_loss(n_tags, PAD_index)
    self.__perform_optimization()

In [37]:
BiLSTMModel.__init__ = classmethod(init_model)

## Train the network and predict tags

[Session.run](https://www.tensorflow.org/api_docs/python/tf/Session#run) is a point which initiates computations in the graph that we have defined. To train the network, we need to compute *self.train_op*, which was declared in *perform_optimization*. To predict tags, we just need to compute *self.predictions*. Anyway, we need to feed actual data through the placeholders that we defined before. 

In [38]:
def train_on_batch(self, session, x_batch, y_batch, lengths, learning_rate, dropout_keep_probability):
    #print("***"*4, y_batch.shape)
    feed_dict = {self.input_batch: x_batch,
                 self.ground_truth_tags: y_batch,
                 self.learning_rate_ph: learning_rate,
                 self.dropout_ph: dropout_keep_probability,
                 self.lengths: lengths}
    
    session.run(self.train_op, feed_dict=feed_dict)

In [39]:
BiLSTMModel.train_on_batch = classmethod(train_on_batch)

Implement the function *predict_for_batch* by initializing *feed_dict* with input *x_batch* and *lengths* and running the *session* for *self.predictions*.

In [40]:
def predict_for_batch(self, session, x_batch, lengths):
    feed_dict = {self.input_batch: x_batch,
                 self.lengths: lengths}
    predictions=session.run(self.predictions, feed_dict=feed_dict)
    softmax_output = session.run(self.softmax_output, feed_dict=feed_dict)
    return predictions, softmax_output

In [41]:
BiLSTMModel.predict_for_batch = classmethod(predict_for_batch)

We finished with necessary methods of our BiLSTMModel model and almost ready to start experimenting.

### Evaluation 
To simplify the evaluation process we provide two functions for you:
 - *predict_tags*: uses a model to get predictions and transforms indices to tokens and tags;
 - *eval_conll*: calculates precision, recall and F1 for the results.

In [42]:
from evaluation import precision_recall_f1

In [43]:
def predict_tags(model, session, token_idxs_batch, lengths):
    """Performs predictions and transforms indices to tokens and tags."""
    
    tag_indexes_batch, softmax_batch = model.predict_for_batch(session, token_idxs_batch, lengths)
    
    tags_batch, tokens_batch, probs_batch = [], [], []
    for tag_idxs, token_idxs, softmax_probs in zip(tag_indexes_batch, token_idxs_batch, softmax_batch):
        tags, tokens, probs = [], [], []
        for tag_idx, token_idx, softmax_prob in zip(tag_idxs, token_idxs, softmax_probs):
            if not index_to_tag[tag_idx]:
                print("following token index was not found.. *** " ,  token_idx)
            tags.append(index_to_tag[tag_idx])
            tokens.append(index_to_token[token_idx])
            probs.append(softmax_prob)
        tags_batch.append(tags)
        tokens_batch.append(tokens)
        probs_batch.append(probs)
    return tags_batch, tokens_batch, probs_batch
    
    
def eval_conll(model, session, tokens, tags, short_report=True):
    """Computes NER quality measures using CONLL shared task script."""
    
    y_true, y_pred = [], []
    for x_batch, y_batch, lengths in batches_generator(1, tokens, tags):
        tags_batch, tokens_batch, probs_batch = predict_tags(model, session, x_batch, lengths)
        if len(x_batch[0]) != len(tags_batch[0]):
            raise Exception("Incorrect length of prediction for the input, "
                            "expected length: %i, got: %i" % (len(x_batch[0]), len(tags_batch[0])))
        predicted_tags = []
        ground_truth_tags = []
        for gt_tag_idx, pred_tag, token in zip(y_batch[0], tags_batch[0], tokens_batch[0]): 
            if token != '&amp;amp;lt;PAD&amp;amp;gt;':
                ground_truth_tags.append(index_to_tag[gt_tag_idx])
                predicted_tags.append(pred_tag)

        # We extend every prediction and ground truth sequence with 'O' tag
        # to indicate a possible end of entity.
        y_true.extend(ground_truth_tags + ['O'])
        y_pred.extend(predicted_tags + ['O'])
    #print("do we have any empty tags? ", all(y_pred))
    results = precision_recall_f1(y_true=y_true, y_pred=y_pred, print_results=True, short_report=short_report)
    return results

## Run your experiment

Create *BiLSTMModel* model with the following parameters:
 - *vocabulary_size* — number of tokens;
 - *n_tags* — number of tags;
 - *embedding_dim* — dimension of embeddings, recommended value: 200;
 - *n_hidden_rnn* — size of hidden layers for RNN, recommended value: 200;
 - *PAD_index* — an index of the padding token (`<PAD>`).

Set hyperparameters. You might want to start with the following recommended values:
- *batch_size*: 32;
- 4 epochs;
- starting value of *learning_rate*: 0.005
- *learning_rate_decay*: a square root of 2;
- *dropout_keep_probability*: try several values: 0.1, 0.5, 0.9.

However, feel free to conduct more experiments to tune hyperparameters and earn extra points for the assignment.

In [44]:
BATCH_SIZE = 32
EPOCHS = 5
MAX_LEN = 75
EMBEDDING = 200
UNITS  = 200

In [45]:
"""class BiLSTMModel(Model):
    def __init__(self, **kwargs):
        super(BiLSTMModel, self).__init__(**kwargs)
        self.embedding = Embedding(input_dim=n_words + 2 ,
                                   output_dim=EMBEDDING, 
                                   input_length=MAX_LEN,
                                   mask_zero=True)
        self.bidirectional = Bidirectional(LSTM(units=50, return_sequences=True, recurrent_dropout=.1))
        self.dense = Dense(1)
    def call(self, inputs):
        x = self.embedding(inputs)
        x = self.bidirectional(x)
        x = self.dense(x)
        return x"""

'class BiLSTMModel(Model):\n    def __init__(self, **kwargs):\n        super(BiLSTMModel, self).__init__(**kwargs)\n        self.embedding = Embedding(input_dim=n_words + 2 ,\n                                   output_dim=EMBEDDING, \n                                   input_length=MAX_LEN,\n                                   mask_zero=True)\n        self.bidirectional = Bidirectional(LSTM(units=50, return_sequences=True, recurrent_dropout=.1))\n        self.dense = Dense(1)\n    def call(self, inputs):\n        x = self.embedding(inputs)\n        x = self.bidirectional(x)\n        x = self.dense(x)\n        return x'

In [46]:
"""bi_lstm_model = BiLSTMModel(name='ner_bi_directional_model')
bi_lstm_model.compile(optimizer='rmsprop',
                      loss='categorical_cross_entropy',
                      metrics=['accuracy'])
bi_lstm_model.summary()"""

"bi_lstm_model = BiLSTMModel(name='ner_bi_directional_model')\nbi_lstm_model.compile(optimizer='rmsprop',\n                      loss='categorical_cross_entropy',\n                      metrics=['accuracy'])\nbi_lstm_model.summary()"

In [47]:
len(index_to_token)

20495

In [56]:

tf.compat.v1.reset_default_graph()
model = BiLSTMModel(vocabulary_size=len(token_to_index),
                    n_tags=len(tag_to_index),
                    embedding_dim=200
                    ,n_hidden_rnn=200,
                    PAD_index=token_to_index.get('<PAD>'))

batch_size = 32
n_epochs = 10
learning_rate = 0.005
learning_rate_decay = np.sqrt(2)
dropout_keep_probability = 0.75

If you got an error *"Tensor conversion requested dtype float64 for Tensor with dtype float32"* in this point, check if there are variables without dtype initialised. Set the value of dtype equals to *tf.float32* for such variables.

Finally, we are ready to run the training!

In [57]:
n_epochs

10

In [58]:
sess = tf.Session()
sess.run(tf.global_variables_initializer())

print('Start training... \n')
for epoch in range(n_epochs):
    # For each epoch evaluate the model on train and validation data
    print('-' * 20 + ' Epoch {} '.format(epoch+1) + 'of {} '.format(n_epochs) + '-' * 20)
    print('Train data evaluation:')
    eval_conll(model, sess, train_tokens, train_tags, short_report=True)
    print('Validation data evaluation:')
    eval_conll(model, sess, validation_tokens, validation_tags, short_report=True)
    
    # Train the model
    for x_batch, y_batch, lengths in batches_generator(batch_size, train_tokens, train_tags):
        model.train_on_batch(sess, x_batch, y_batch, lengths, learning_rate, dropout_keep_probability)
        
    # Decaying the learning rate
    learning_rate = learning_rate / learning_rate_decay
    
print('...training finished.')

Start training... 

-------------------- Epoch 1 of 10 --------------------
Train data evaluation:
processed 105778 tokens with 4489 phrases; found: 76253 phrases; correct: 213.

precision:  0.28%; recall:  4.74%; F1:  0.53

Validation data evaluation:
processed 12836 tokens with 537 phrases; found: 9357 phrases; correct: 25.

precision:  0.27%; recall:  4.66%; F1:  0.51

-------------------- Epoch 2 of 10 --------------------
Train data evaluation:
processed 105778 tokens with 4489 phrases; found: 3122 phrases; correct: 870.

precision:  27.87%; recall:  19.38%; F1:  22.86

Validation data evaluation:
processed 12836 tokens with 537 phrases; found: 231 phrases; correct: 73.

precision:  31.60%; recall:  13.59%; F1:  19.01

-------------------- Epoch 3 of 10 --------------------
Train data evaluation:
processed 105778 tokens with 4489 phrases; found: 4922 phrases; correct: 2649.

precision:  53.82%; recall:  59.01%; F1:  56.30

Validation data evaluation:
processed 12836 tokens with 53

Now let us see full quality reports for the final model on train, validation, and test sets. To give you a hint whether you have implemented everything correctly, you might expect F-score about 40% on the validation set.

**The output of the cell below (as well as the output of all the other cells) should be present in the notebook for peer2peer review!**

In [51]:
print('-' * 20 + ' Train set quality: ' + '-' * 20)
train_results = eval_conll(model, sess, train_tokens, train_tags, short_report=False)

print('-' * 20 + ' Validation set quality: ' + '-' * 20)
validation_results = eval_conll(model, sess, validation_tokens, validation_tags, short_report=False)

print('-' * 20 + ' Test set quality: ' + '-' * 20)
test_results = eval_conll(model, sess, test_tokens, test_tags, short_report=False)

-------------------- Train set quality: --------------------
processed 105778 tokens with 4489 phrases; found: 4518 phrases; correct: 4447.

precision:  98.43%; recall:  99.06%; F1:  98.75

	     company: precision:   98.46%; recall:   99.53%; F1:   98.99; predicted:   650

	    facility: precision:   96.26%; recall:   98.41%; F1:   97.32; predicted:   321

	     geo-loc: precision:   99.60%; recall:   99.80%; F1:   99.70; predicted:   998

	       movie: precision:   98.48%; recall:   95.59%; F1:   97.01; predicted:    66

	 musicartist: precision:   98.28%; recall:   98.28%; F1:   98.28; predicted:   232

	       other: precision:   96.49%; recall:   98.15%; F1:   97.31; predicted:   770

	      person: precision:   99.66%; recall:   99.77%; F1:   99.72; predicted:   887

	     product: precision:   98.75%; recall:   99.06%; F1:   98.90; predicted:   319

	  sportsteam: precision:   97.72%; recall:   98.62%; F1:   98.17; predicted:   219

	      tvshow: precision:   98.21%; recall:  

TypeError: int() argument must be a string, a bytes-like object or a number, not 'NoneType'

### Conclusions

Could we say that our model is state of the art and the results are acceptable for the task? Definately, we can say so. Nowadays, Bi-LSTM is one of the state of the art approaches for solving NER problem and it outperforms other classical methods. Despite the fact that we used small training corpora (in comparison with usual sizes of corpora in Deep Learning), our results are quite good. In addition, in this task there are many possible named entities and for some of them we have only several dozens of trainig examples, which is definately small. However, the implemented model outperforms classical CRFs for this task. Even better results could be obtained by some combinations of several types of methods, e.g. see [this](https://arxiv.org/abs/1603.01354) paper if you are interested.