# What is Named Entity Recognition?

It is a *sequence labeling* problem. Instead of predicting a single class for a whole text, every token in the text must be tagged.

### A bit of theory

#### Sequence Labeling & Structured Output

Since the task is to label sequences, which is better: labeling the whole sequence at once, or going word by word?

With what we have seen so far, in both Machine Learning and Deep Learning, there is no way to label everything at once, because our datasets are normally built so that each *y* depends only on a set of features represented by *x*. In NLP that assumption often does not hold: as we saw in Language Modeling, there are cases where the current word depends both on what has already been said (the past) and on what is going to be said (the future). In fact, when we speak or write, we *usually, and not everyone does,* think first about what we are going to say, that is, we make a prediction about the future, and then we articulate that idea step by step (the past). For these cases there are probabilistic models such as Hidden Markov Models ([HMM](https://en.wikipedia.org/wiki/Hidden_Markov_model)), which are outside the scope of this course, and Conditional Random Fields ([CRF](https://en.wikipedia.org/wiki/Conditional_random_field)), the most widely used alternative for problems of this kind.

The latter, just so that the name rings a bell, falls into the category of *structured prediction* ([wiki](https://en.wikipedia.org/wiki/Structured_prediction)). Structured prediction is exactly what we would want for this kind of problem: instead of *y* being conditioned only on a set of *features x*, it is conditioned on the whole output. Everything is interconnected, so that, for example, a later tag can affect an earlier one and vice versa.

![](https://i.imgur.com/ukAr3Uh.jpg)

So, if we do not cover this in the course, is the class over because we cannot solve the problem? Not quite.

As always, there are alternatives that may not meet every requirement we would like but still give reasonable results, especially as baselines.

So, if we cannot predict something as a function of time, we turn time into a feature. That is, the features of a token will also include the features from *t-n* to *t+n*. In other words, the feature vector *x_t* is the concatenation of past and future features, so that the classifier has enough information in each *(x, y)* pair. We will see shortly how these features are built.

![](https://i.imgur.com/Wail7yI.jpg =400x)
![](https://i.imgur.com/irjqooW.jpg =400x)

This way we obtain features that are "independent" of time, and we can train as usual.
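To make the idea of "time as a feature" concrete, here is a minimal sketch that builds one feature dictionary per token from a window of *t-n* to *t+n*. The function name `window_features`, the `__PAD__` placeholder and the key format are assumptions made for this illustration; the notebook's own feature function comes later.

```python
def window_features(tokens, index, n=2, pad="__PAD__"):
    """Collect the tokens from index-n to index+n into a single feature dict."""
    # Pad both ends so that the window never falls outside the sentence
    padded = [pad] * n + list(tokens) + [pad] * n
    center = index + n  # position of the target token after padding
    return {
        f"w[{offset:+d}]": padded[center + offset]
        for offset in range(-n, n + 1)
    }


sentence = ["Thousands", "marched", "through", "London", "."]
features = window_features(sentence, 3, n=1)
print(features)
```

Each token now carries its own context, so every *(x, y)* pair can be treated as an independent training example.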

### Named Entity Recognition

The goal is to assign each token a double tag: one part is the entity class (person, organization, whatever we need) and the other is the scheme known as BIO.

*   Begin: start of an entity
*   Inside: inside an entity
*   Outside: not an entity
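As an illustration of how the two parts of the tag combine, the sketch below groups a BIO-tagged sequence back into entity spans. The helper `bio_to_spans` and the example sentence are hypothetical, written only for this explanation.

```python
def bio_to_spans(tags):
    """Group BIO tags into (entity_type, start, end) spans; `end` is exclusive."""
    spans = []
    start, current = None, None
    for i, tag in enumerate(tags):
        prefix, _, label = tag.partition("-")
        # Close the open span when the entity ends or a different one begins
        if current is not None and (prefix != "I" or label != current):
            spans.append((current, start, i))
            start, current = None, None
        # Open a new span on B-, or on a stray I- with no open span
        if prefix == "B" or (prefix == "I" and current is None):
            start, current = i, label
    if current is not None:
        spans.append((current, start, len(tags)))
    return spans


words = ["President", "Barack", "Obama", "visited", "New", "York"]
tags = ["O", "B-per", "I-per", "O", "B-geo", "I-geo"]
spans = bio_to_spans(tags)
entities = [(label, " ".join(words[s:e])) for label, s, e in spans]
print(entities)
```

The `B-` prefix is what separates two adjacent entities of the same class, which a plain class label could not do.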

Nowadays there are quite a few decent taggers, but in general they have to be adapted to the domain. A clear example is taggers that, like the one we will build, rely among other features on whether a word is capitalized. In a more casual domain, such as Twitter or any other informal internet setting, that cue is much less reliable.

![](https://i.imgur.com/a8Zregf.jpg)

What is the idea for training? To classify a whole text we normally use n-grams, among other things, as features. Here the idea is to decide, word by word, which tag each one gets. For that we define a set of features that are not n-grams; we will now look at a subset of them. Everything else works like any other model: represent the features as input, prepare an output for each input, choose a classifier and an objective function, and train!
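To show what "representing the features as input" can look like, here is a small pure-Python sketch of one-hot encoding for feature dictionaries. It is roughly the idea behind scikit-learn's `DictVectorizer` for string-valued features, not its actual implementation; `fit_vocabulary` and `one_hot` are names invented for this example.

```python
def fit_vocabulary(feature_dicts):
    """Map every observed 'name=value' pair to a column index."""
    pairs = sorted({f"{k}={v}" for d in feature_dicts for k, v in d.items()})
    return {pair: i for i, pair in enumerate(pairs)}


def one_hot(feature_dict, vocabulary):
    """Encode one feature dict as a 0/1 row over the fitted vocabulary."""
    row = [0] * len(vocabulary)
    for k, v in feature_dict.items():
        col = vocabulary.get(f"{k}={v}")
        if col is not None:  # pairs never seen while fitting are dropped
            row[col] = 1
    return row


X = [
    {"word": "London", "shape": "capitalized"},
    {"word": "to", "shape": "lowercase"},
]
vocab = fit_vocabulary(X)
rows = [one_hot(x, vocab) for x in X]
unseen = one_hot({"word": "Paris", "shape": "capitalized"}, vocab)
print(sorted(vocab, key=vocab.get))
print(rows, unseen)
```

Note that a value that never appeared while fitting the vocabulary contributes nothing to the vector, which matters when the vocabulary is fitted on only part of the data.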

In [0]:
import csv
from tqdm import tqdm
import spacy
nlp = spacy.load('en_core_web_sm', disable=['parser', 'ner', 'textcat'])

In [1]:
# Load the dataset (in Colab, upload ner_dataset.csv first)

dataset = []

with open('ner_dataset.csv', encoding='mac_roman') as f:
    dataset_ner = csv.reader(f)
    next(dataset_ner, None)  # skip the header row
    sentence = []
    for row in dataset_ner:
        # A non-empty first column marks the start of a new sentence
        if row[0] != '':
            if sentence:
                dataset.append(sentence)
            sentence = [(row[1], row[2], row[3])]
        else:
            sentence.append((row[1], row[2], row[3]))
    if sentence:
        dataset.append(sentence)  # do not drop the last sentence
len(dataset)


In [0]:
dataset[0]

[('Thousands', 'NNS', 'O'),
 ('of', 'IN', 'O'),
 ('demonstrators', 'NNS', 'O'),
 ('have', 'VBP', 'O'),
 ('marched', 'VBN', 'O'),
 ('through', 'IN', 'O'),
 ('London', 'NNP', 'B-geo'),
 ('to', 'TO', 'O'),
 ('protest', 'VB', 'O'),
 ('the', 'DT', 'O'),
 ('war', 'NN', 'O'),
 ('in', 'IN', 'O'),
 ('Iraq', 'NNP', 'B-geo'),
 ('and', 'CC', 'O'),
 ('demand', 'VB', 'O'),
 ('the', 'DT', 'O'),
 ('withdrawal', 'NN', 'O'),
 ('of', 'IN', 'O'),
 ('British', 'JJ', 'B-gpe'),
 ('troops', 'NNS', 'O'),
 ('from', 'IN', 'O'),
 ('that', 'DT', 'O'),
 ('country', 'NN', 'O'),
 ('.', '.', 'O')]

In [0]:
classes = list(set([c for sentence in dataset for (_, _, c) in sentence]))
classes

['B-org',
 'I-per',
 'B-geo',
 'I-org',
 'I-nat',
 'B-eve',
 'B-gpe',
 'B-nat',
 'I-tim',
 'I-eve',
 'I-art',
 'O',
 'I-geo',
 'B-per',
 'B-art',
 'I-gpe',
 'B-tim']

# Building sets of features

So far we have used only a tiny fraction of the linguistic potential of text as features. For this exercise we will train with new features, some very obvious, others perhaps less so. The image shows features for building a Part of Speech tagger; we will adapt them to a Named Entity Recognizer.

![alt text](https://i.imgur.com/1SAC0TU.jpg =500x)

From Neural Network Methods for NLP by Yoav Goldberg


Remember that features have to be adapted not only to the problem but also to the dataset. The problem gives us an idea of which features to use, but as always we can generate, remove or combine features as we wish. For example, for entity recognizers, whether a word is capitalized may be very relevant, but in an internet context, where people do not follow a fixed writing convention, it is less so.

Here is an example of how complex the task is.




> Wrong
![](https://i.imgur.com/wCOsdsk.png)
![](https://i.imgur.com/EO89nLZ.png)

> Right
![](https://i.imgur.com/amsQr9T.png)

In [0]:
import re


def shape(word):
    """Return a coarse orthographic class for `word` (first matching pattern wins)."""
    word_shape = 'other'
    # Raw strings avoid invalid-escape warnings; the number alternatives are
    # grouped so that `$` anchors both of them (otherwise '3rd' is a number).
    if re.match(r'(?:[0-9]+(?:\.[0-9]*)?|[0-9]*\.[0-9]+)$', word):
        word_shape = 'number'
    elif re.match(r'\W+$', word):
        word_shape = 'punct'
    elif re.match(r'[A-Z][a-z]+$', word):
        word_shape = 'capitalized'
    elif re.match(r'[A-Z]+$', word):
        word_shape = 'uppercase'
    elif re.match(r'[a-z]+$', word):
        word_shape = 'lowercase'
    elif re.match(r'[A-Z][a-z]+[A-Z][a-z]+[A-Za-z]*$', word):
        word_shape = 'camelcase'
    elif re.match(r'[A-Za-z]+$', word):
        word_shape = 'mixedcase'
    elif re.match(r'__.+__$', word):
        word_shape = 'wildcard'
    elif re.match(r'[A-Za-z0-9]+\.$', word):
        word_shape = 'ending-dot'
    elif re.match(r'[A-Za-z0-9]+\.[A-Za-z0-9\.]+\.$', word):
        word_shape = 'abbreviation'
    elif re.match(r'[A-Za-z0-9]+\-[A-Za-z0-9\-]+.*$', word):
        word_shape = 'contains-hyphen'

    return word_shape

In [0]:
def ner_features(tokens, index, history):
    """
    `tokens`  = a sentence as (word, POS tag, lemma) triples [(w1, t1, l1), ...]
    `index`   = the index of the token we want to extract features for
    `history` = the IOB tags already assigned to the tokens before `index`
    """
    # Pad the sequence with placeholders
    tokens = ([('__START2__',) * 3, ('__START1__',) * 3]
              + list(tokens)
              + [('__END1__',) * 3, ('__END2__',) * 3])
    history = ['__START2__', '__START1__'] + list(history)
    # Shift the index by 2 to account for the padding
    index += 2
    word, pos, lemma = tokens[index]
    prevword, prevpos, prevlemma = tokens[index - 1]
    prevprevword, prevprevpos, prevprevlemma = tokens[index - 2]
    nextword, nextpos, nextlemma = tokens[index + 1]
    nextnextword, nextnextpos, nextnextlemma = tokens[index + 2]
    previob = history[-1]
    prevpreviob = history[-2]
 
    feat_dict = {
        'word': word,
        'lemma': lemma,
        'pos': pos,
        'shape': shape(word),
 
        'next-word': nextword,
        'next-pos': nextpos,
        'next-lemma': nextlemma,
        'next-shape': shape(nextword),
 
        'next-next-word': nextnextword,
        'next-next-pos': nextnextpos,
        'next-next-lemma': nextnextlemma,
        'next-next-shape': shape(nextnextword),
 
        'prev-word': prevword,
        'prev-pos': prevpos,
        'prev-lemma': prevlemma,
        'prev-iob': previob,
        'prev-shape': shape(prevword),
 
        'prev-prev-word': prevprevword,
        'prev-prev-pos': prevprevpos,
        'prev-prev-lemma': prevprevlemma,
        'prev-prev-iob': prevpreviob,
        'prev-prev-shape': shape(prevprevword),
    }
 
    return feat_dict
    

In [0]:
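from spacy.tokens import Doc


def to_dataset(parsed_sentences, feature_detector):
    """
    Transform a list of tagged sentences into a scikit-learn compatible NER dataset.

    :param parsed_sentences: sentences as lists of (word, POS tag, IOB tag) triples
    :param feature_detector: function (tokens, index, history) -> feature dict
    :return: (X, y), the feature dicts and the gold IOB tag of every token
    """
    X, y = [], []
    for parsed in tqdm(parsed_sentences):
        words, tags, iob_tags = zip(*parsed)
        # Build the Doc from the dataset's own tokens so that lemmas stay
        # aligned with `words`; re-tokenizing " ".join(words) can split tokens.
        doc = Doc(nlp.vocab, words=list(words))
        for _, component in nlp.pipeline:
            doc = component(doc)
        lemmas = [t.lemma_ for t in doc]
        tagged = list(zip(words, tags, lemmas))
        for index in range(len(parsed)):
            X.append(feature_detector(tagged, index, history=iob_tags[:index]))
            y.append(iob_tags[index])
    return X, y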
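# During training, `history` is taken from the gold IOB tags. At prediction
# time those are not available, so the tags predicted so far have to be fed
# back in. Below is a minimal greedy-decoding sketch of that idea.
# `greedy_tag`, `toy_features` and `toy_predict` are hypothetical names for
# this illustration; in practice `predict_one` would wrap the trained
# vectorizer and classifier.

```python
def greedy_tag(tokens, feature_detector, predict_one):
    """Tag a sentence left to right, feeding predicted tags back in as history."""
    history = []
    for index in range(len(tokens)):
        features = feature_detector(tokens, index, history)
        history.append(predict_one(features))
    return history


def toy_features(tokens, index, history):
    # Stand-in feature function with the same (tokens, index, history) signature
    return {
        "word": tokens[index],
        "prev-iob": history[-1] if history else "__START1__",
    }


def toy_predict(features):
    # Stand-in classifier: capitalized words are entities, continuing if possible
    if not features["word"][0].isupper():
        return "O"
    return "I-geo" if features["prev-iob"] in ("B-geo", "I-geo") else "B-geo"


predicted = greedy_tag(["visit", "New", "York", "today"], toy_features, toy_predict)
print(predicted)
```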

In [0]:
X, y = to_dataset(dataset[0:5000], ner_features)

100%|██████████| 5000/5000 [01:06<00:00, 75.08it/s]


In [0]:
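import itertools
def get_minibatch(parsed_sentences, feature_detector, batch_size=500):
    # `parsed_sentences` must be an iterator (e.g. iter(dataset)): on a plain list,
    # islice would return the same first `batch_size` sentences on every call.
    batch = list(itertools.islice(parsed_sentences, batch_size))
    X, y = to_dataset(batch, feature_detector)
    return X, y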
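# A detail of `itertools.islice` matters for `get_minibatch`: on a list it
# restarts from the beginning on every call, while on an iterator it consumes
# the next items. The short sketch below, with made-up placeholder sentences,
# shows the difference.

```python
import itertools

sentences = ["s0", "s1", "s2", "s3", "s4"]

# On a list, every call sees the same first items
first_call = list(itertools.islice(sentences, 2))
second_call = list(itertools.islice(sentences, 2))

# On an iterator, each call picks up where the previous one stopped
stream = iter(sentences)
batches = []
while True:
    batch = list(itertools.islice(stream, 2))
    if not batch:  # an empty batch means the data is exhausted
        break
    batches.append(batch)

print(first_call, second_call)
print(batches)
```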

In [0]:
X[0]

{'lemma': 'thousand',
 'next-lemma': 'of',
 'next-next-lemma': 'demonstrator',
 'next-next-pos': 'NNS',
 'next-next-shape': 'lowercase',
 'next-next-word': 'demonstrators',
 'next-pos': 'IN',
 'next-shape': 'lowercase',
 'next-word': 'of',
 'pos': 'NNS',
 'prev-iob': '__START1__',
 'prev-lemma': '__START1__',
 'prev-pos': '__START1__',
 'prev-prev-iob': '__START2__',
 'prev-prev-lemma': '__START2__',
 'prev-prev-pos': '__START2__',
 'prev-prev-shape': 'wildcard',
 'prev-prev-word': '__START2__',
 'prev-shape': 'wildcard',
 'prev-word': '__START1__',
 'shape': 'capitalized',
 'word': 'Thousands'}

In [0]:
from sklearn.linear_model import Perceptron
from sklearn.feature_extraction import DictVectorizer
from sklearn.pipeline import Pipeline

In [0]:
vectorizer = DictVectorizer(sparse=False)


In [0]:
ner_clf = Pipeline([
    ('vectorizer', vectorizer),
    # `n_iter` was removed from scikit-learn; `max_iter` is its replacement
    ('classifier', Perceptron(verbose=10, n_jobs=-1, max_iter=5))
])


In [0]:
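sentences = iter(dataset)  # iterator, so each call yields the next 500 sentences
clf = ner_clf.named_steps['classifier']  # the Pipeline itself has no partial_fit

X, y = get_minibatch(sentences, ner_features, 500)
vectorizer.fit(X)  # the feature vocabulary is fixed by this first batch
while len(X):
    X = vectorizer.transform(X)
    clf.partial_fit(X, y, classes)
    X, y = get_minibatch(sentences, ner_features, 500)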

100%|██████████| 500/500 [00:06<00:00, 75.05it/s]

-- Epoch 1
Norm: 16.91, NNZs: 231, Bias: -4.000000, T: 10976, Avg. loss: 0.009293
Total training time: 0.53 seconds.
[...]
[Parallel(n_jobs=-1)]: Done  17 out of  17 | elapsed:    2.7s finished

[Verbose output truncated: the same block repeats for each of the 17 one-vs-rest classifiers and for every mini-batch.]

KeyboardInterrupt: 

add results visualization
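
As one possible starting point for inspecting results, the sketch below computes token-level precision, recall and F1 per tag in plain Python. The function `per_tag_scores` and the label lists are made up for illustration; they are not outputs of the model trained above.

```python
from collections import Counter


def per_tag_scores(y_true, y_pred):
    """Return {tag: (precision, recall, f1)} computed token by token."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for gold, pred in zip(y_true, y_pred):
        if gold == pred:
            tp[gold] += 1
        else:
            fp[pred] += 1  # predicted this tag, but it was wrong
            fn[gold] += 1  # missed the gold tag
    scores = {}
    for tag in sorted(set(y_true) | set(y_pred)):
        p_den, r_den = tp[tag] + fp[tag], tp[tag] + fn[tag]
        precision = tp[tag] / p_den if p_den else 0.0
        recall = tp[tag] / r_den if r_den else 0.0
        total = precision + recall
        f1 = 2 * precision * recall / total if total else 0.0
        scores[tag] = (precision, recall, f1)
    return scores


# Toy gold and predicted tags, invented for this example
y_true = ["O", "B-geo", "O", "B-geo", "I-geo"]
y_pred = ["O", "B-geo", "B-geo", "O", "I-geo"]
scores = per_tag_scores(y_true, y_pred)
for tag, (p, r, f1) in scores.items():
    print(f"{tag:<6} precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
```

Because the `O` tag dominates NER data, per-tag scores like these tend to be more informative than overall accuracy.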