# State of the art in NLP
### ImageNet for language
#### Rafal Pronko

### Rafal Pronko
rafalpronko @ LinkedIn, Telegram, Twitter, Tungsten


| <img src="czmis.png" width="100">  | <img src="webinterpret.png" width="100">  |  <img src="pm.jpg" width="100"> |
|---|---|---|
| <img src="ynd.png" width="100">  | <img src="cleanride.svg" width="100">  | <img src="cvtimeline.jpg" width="100">  |






### Text classification

![](text_classification.png)

1. Categorizing listings on marketplaces (eBay / Amazon / Allegro ...)
2. Detecting unwanted text: spam / hate speech ...
3. Classifying articles: assigning categories / detecting false information
4. Classifying occupations ...

### Google: choosing a model
<img src="google_text_classification.png" width="500">

https://developers.google.com/machine-learning/guides/text-classification/step-2-5
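
The flowchart in the linked guide comes down to one ratio: the number of samples divided by the median number of words per sample. Below is a minimal sketch of that rule of thumb, assuming whitespace tokenization and the guide's threshold of 1500. The helper names `samples_to_words_ratio` and `choose_model_family` are made up for this illustration and are not part of any library.

```python
from statistics import median


def samples_to_words_ratio(texts):
    """Number of samples divided by the median word count per sample."""
    words_per_sample = [len(t.split()) for t in texts]
    return len(texts) / median(words_per_sample)


def choose_model_family(texts, threshold=1500):
    """Coarse model suggestion based on the samples/words-per-sample ratio.

    Below the threshold the rule points to an n-gram model (for example an
    MLP on TF-IDF features), otherwise to a sequence model such as sepCNN.
    """
    ratio = samples_to_words_ratio(texts)
    return "n-gram model" if ratio < threshold else "sequence model"


# Toy corpus: 4 samples with 3, 3, 2 and 4 words, so the median is 3.
toy = ["why is that", "what is this", "no holidays", "why are there none"]
print(samples_to_words_ratio(toy), choose_model_family(toy))
```

On a real dataset the ratio should be computed on the training texts only, after the same cleaning steps that the model will see.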

In [43]:
import pandas as pd
from imblearn.under_sampling import RandomUnderSampler

In [40]:
df = pd.read_csv("train-2.csv")

In [49]:
df.shape

(1306122, 3)

In [50]:
rus = RandomUnderSampler()

In [51]:
X_sample, y_sample = rus.fit_resample(df, df["target"])

In [52]:
df = pd.DataFrame(X_sample)

In [55]:
df.columns = ["id", "question_text", "target"]

In [56]:
df.shape

(161620, 3)

In [57]:
df.head()

| | id | question_text | target |
|---|---|---|---|
| 0 | f61330815ecb292abbac | Why is there a quota in education when educati... | 0 |
| 1 | dd2f5b1bf1a9c7a04764 | What is the reason that media is so under thre... | 0 |
| 2 | edfbf901bd9cd28aedb0 | Why did iron smelting in ancient times require... | 0 |
| 3 | fca7888ce077fb047f87 | Why are there no holidays in June? | 0 |
| 4 | c5ff781cb1a9441049c9 | My dad is visiting and I am jobless now. I fee... | 0 |


In [58]:
df[df["target"] == 0].shape[0] / df.shape[0]

0.5
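
The 0.5 ratio is what random undersampling is expected to produce: every class is cut down to the size of the smallest one. The sketch below shows the idea in plain Python. It is an illustration of the technique, not the `imblearn` implementation, and `random_undersample` is a hypothetical helper.

```python
import random
from collections import defaultdict


def random_undersample(rows, label_of, seed=0):
    """Downsample every class to the size of the smallest class."""
    by_class = defaultdict(list)
    for row in rows:
        by_class[label_of(row)].append(row)

    smallest = min(len(group) for group in by_class.values())
    rng = random.Random(seed)

    balanced = []
    for label in sorted(by_class):
        # Sample without replacement so no row is duplicated.
        balanced.extend(rng.sample(by_class[label], smallest))
    return balanced


# Toy data: 8 rows of class 0 and 2 rows of class 1.
rows = [("q%d" % i, 0) for i in range(8)] + [("q8", 1), ("q9", 1)]
balanced = random_undersample(rows, label_of=lambda r: r[1])
share_of_zeros = sum(1 for _, y in balanced if y == 0) / len(balanced)
print(len(balanced), share_of_zeros)
```

Note that the result of this sketch is grouped by class, so it should be shuffled before any split that slices rows by position.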

In [59]:
df["question_text"] = df.question_text.str.lower()

In [60]:
df.head()

| | id | question_text | target |
|---|---|---|---|
| 0 | f61330815ecb292abbac | why is there a quota in education when educati... | 0 |
| 1 | dd2f5b1bf1a9c7a04764 | what is the reason that media is so under thre... | 0 |
| 2 | edfbf901bd9cd28aedb0 | why did iron smelting in ancient times require... | 0 |
| 3 | fca7888ce077fb047f87 | why are there no holidays in june? | 0 |
| 4 | c5ff781cb1a9441049c9 | my dad is visiting and i am jobless now. i fee... | 0 |


In [61]:
# Use the public tf.keras API; tensorflow.python.* is a private namespace.
from sklearn.model_selection import train_test_split
from tensorflow import keras
from tensorflow.keras import initializers, models, regularizers
from tensorflow.keras.layers import (Bidirectional, Convolution1D, Dense, Dropout,
                                     Embedding, GlobalAveragePooling1D,
                                     GlobalMaxPooling1D, GRU, MaxPooling1D,
                                     SeparableConv1D)
from tensorflow.keras.preprocessing import sequence, text

In [62]:
X_train, X_test, y_train, y_test = train_test_split(df["question_text"], df["target"])

In [63]:
TOP_K = 20001

In [64]:
MAX_SEQUENCE_LENGTH = 60
MAX_VECTOR_EMD = 200

In [65]:
def sequence_vectorize(train_texts, val_texts):
    """Vectorizes texts as sequence vectors.

    1 text = 1 sequence vector with fixed length.

    # Arguments
        train_texts: list, training text strings.
        val_texts: list, validation text strings.

    # Returns
        x_train, x_val, word_index, tokenizer: vectorized training and
            validation texts, word index dictionary and the fitted tokenizer.
    """
    # Create vocabulary with training texts.
    tokenizer = text.Tokenizer(num_words=TOP_K)
    tokenizer.fit_on_texts(train_texts)

    # Vectorize training and validation texts.
    x_train = tokenizer.texts_to_sequences(train_texts)
    x_val = tokenizer.texts_to_sequences(val_texts)

    # Get max sequence length.
    max_length = len(max(x_train, key=len))
    if max_length > MAX_SEQUENCE_LENGTH:
        max_length = MAX_SEQUENCE_LENGTH

    # Fix sequence length to max value. Sequences shorter than the length are
    # padded in the beginning and sequences longer are truncated
    # at the beginning.
    x_train = sequence.pad_sequences(x_train, maxlen=max_length)
    x_val = sequence.pad_sequences(x_val, maxlen=max_length)
    return x_train, x_val, tokenizer.word_index, tokenizer

In [66]:
X_train_seq, X_test_seq, idxs, tok = sequence_vectorize(X_train, X_test)
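
To make the vectorization step less of a black box, here is a simplified pure-Python sketch of what the tokenizer and `pad_sequences` do together: rank words by frequency, replace each word with its rank, then left-pad or left-truncate to a fixed length. It assumes whitespace tokenization (the Keras tokenizer also strips punctuation), and `fit_word_index` and `texts_to_padded` are hypothetical helper names.

```python
from collections import Counter


def fit_word_index(texts):
    """Map each word to an integer rank; 1 is the most frequent word."""
    counts = Counter(word for t in texts for word in t.lower().split())
    return {word: rank
            for rank, (word, _) in enumerate(counts.most_common(), start=1)}


def texts_to_padded(texts, word_index, num_words, maxlen):
    """Encode texts as fixed-length integer sequences.

    Words ranked at or beyond num_words are dropped. Short sequences are
    left-padded with 0 and long ones keep only their last maxlen tokens.
    """
    padded = []
    for t in texts:
        ids = [word_index[w] for w in t.lower().split()
               if w in word_index and word_index[w] < num_words]
        ids = ids[-maxlen:]
        padded.append([0] * (maxlen - len(ids)) + ids)
    return padded


train = ["a b a", "a c b d e"]
word_index = fit_word_index(train)
print(texts_to_padded(train, word_index, num_words=10, maxlen=4))
```

Index 0 is reserved for padding, which is why the ranks start at 1.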

In [67]:
def _get_last_layer_units_and_activation(num_classes):
    """Gets the # units and activation function for the last network layer.

    # Arguments
        num_classes: int, number of classes.

    # Returns
        units, activation values.
    """
    if num_classes == 2:
        activation = 'sigmoid'
        units = 1
    else:
        activation = 'softmax'
        units = num_classes
    return units, activation
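
The binary case gets a single sigmoid unit because a two-way softmax carries no extra information: the probability of class 1 depends only on the difference between the two logits. A small numeric check of that identity, using only the standard library:

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def softmax(logits):
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


z0, z1 = -0.3, 1.2
p_softmax = softmax([z0, z1])[1]   # probability of class 1 from two logits
p_sigmoid = sigmoid(z1 - z0)       # the same probability from one logit
print(p_softmax, p_sigmoid)
```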

In [68]:
def sepcnn_model(blocks,
                 filters,
                 kernel_size,
                 embedding_dim,
                 dropout_rate,
                 pool_size,
                 input_shape,
                 num_classes,
                 num_features,
                 use_pretrained_embedding=False,
                 is_embedding_trainable=False,
                 embedding_matrix=None):
    """Creates an instance of a separable CNN model.

    # Arguments
        blocks: int, number of pairs of sepCNN and pooling blocks in the model.
        filters: int, output dimension of the layers.
        kernel_size: int, length of the convolution window.
        embedding_dim: int, dimension of the embedding vectors.
        dropout_rate: float, percentage of input to drop at Dropout layers.
        pool_size: int, factor by which to downscale input at MaxPooling layer.
        input_shape: tuple, shape of input to the model.
        num_classes: int, number of output classes.
        num_features: int, number of words (embedding input dimension).
        use_pretrained_embedding: bool, true if pre-trained embedding is on.
        is_embedding_trainable: bool, true if embedding layer is trainable.
        embedding_matrix: dict, dictionary with embedding coefficients.

    # Returns
        A sepCNN model instance.
    """
    op_units, op_activation = _get_last_layer_units_and_activation(num_classes)
    model = models.Sequential()

    # Add embedding layer. If pre-trained embedding is used add weights to the
    # embeddings layer and set trainable to input is_embedding_trainable flag.
    if use_pretrained_embedding:
        model.add(Embedding(input_dim=num_features,
                            output_dim=embedding_dim,
                            input_length=input_shape[0],
                            weights=[embedding_matrix],
                            trainable=is_embedding_trainable))
    else:
        model.add(Embedding(input_dim=num_features,
                            output_dim=embedding_dim,
                            input_length=input_shape[0]))
    
    for _ in range(blocks-1):
        model.add(Dropout(dropout_rate))
        model.add(SeparableConv1D(filters=filters,
                                  kernel_size=kernel_size,
                                  activation='relu',
                                  bias_initializer='random_uniform',
                                  depthwise_initializer='random_uniform',
                                  padding='same'))
        model.add(SeparableConv1D(filters=filters,
                                  kernel_size=kernel_size,
                                  activation='relu',
                                  bias_initializer='random_uniform',
                                  depthwise_initializer='random_uniform',
                                  padding='same'))
        model.add(MaxPooling1D(pool_size=pool_size))

    model.add(SeparableConv1D(filters=filters * 2,
                              kernel_size=kernel_size,
                              activation='relu',
                              bias_initializer='random_uniform',
                              depthwise_initializer='random_uniform',
                              padding='same'))
    model.add(SeparableConv1D(filters=filters * 2,
                              kernel_size=kernel_size,
                              activation='relu',
                              bias_initializer='random_uniform',
                              depthwise_initializer='random_uniform',
                              padding='same'))
    model.add(GlobalMaxPooling1D())
    model.add(Dense(16, activation='relu'))
    model.add(Dense(op_units, activation=op_activation))
    return model

In [76]:
model = sepcnn_model(blocks=4,
                     filters=32,
                     kernel_size=5,
                     embedding_dim=MAX_VECTOR_EMD,
                     dropout_rate=0.2,
                     pool_size=3,
                     input_shape=(MAX_SEQUENCE_LENGTH,),
                     num_classes=2,
                     num_features=TOP_K,
                     use_pretrained_embedding=False,
                     is_embedding_trainable=True,
                     )

In [77]:
model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_5 (Embedding)      (None, 60, 200)           4000200   
_________________________________________________________________
dropout_13 (Dropout)         (None, 60, 200)           0         
_________________________________________________________________
separable_conv1d_33 (Separab (None, 60, 32)            7432      
_________________________________________________________________
separable_conv1d_34 (Separab (None, 60, 32)            1216      
_________________________________________________________________
max_pooling1d_13 (MaxPooling (None, 20, 32)            0         
_________________________________________________________________
dropout_14 (Dropout)         (None, 20, 32)            0         
_________________________________________________________________
separable_conv1d_35 (Separab (None, 20, 32)            1216      
__________
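
The parameter counts in this summary can be reproduced by hand, which also shows why separable convolutions are cheap. The sketch below assumes a depth multiplier of 1 and one bias per output filter; the function names are hypothetical helpers, not Keras API.

```python
def separable_conv1d_params(in_channels, filters, kernel_size):
    depthwise = kernel_size * in_channels   # one 1D kernel per input channel
    pointwise = in_channels * filters       # 1x1 convolution that mixes channels
    bias = filters
    return depthwise + pointwise + bias


def conv1d_params(in_channels, filters, kernel_size):
    # A regular convolution learns a full kernel for every (input, output) pair.
    return kernel_size * in_channels * filters + filters


embedding = 20001 * 200                              # TOP_K * MAX_VECTOR_EMD
first_sepconv = separable_conv1d_params(200, 32, 5)  # on the embedding output
next_sepconv = separable_conv1d_params(32, 32, 5)    # on a 32-channel input
regular_conv = conv1d_params(200, 32, 5)             # for comparison only

print(embedding, first_sepconv, next_sepconv, regular_conv)
```

The first three values match the summary (4000200, 7432 and 1216), while a regular convolution with the same shape would need 32032 parameters.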

In [78]:
model.compile(optimizer=keras.optimizers.Adam(lr=0.001), loss="binary_crossentropy",
              metrics=["accuracy"])

In [79]:
model.fit(X_train_seq, y_train, validation_data=(X_test_seq, y_test), epochs=3, batch_size=512)

Train on 121215 samples, validate on 40405 samples
Epoch 1/3

Epoch 2/3

Epoch 3/3



<tensorflow.python.keras._impl.keras.callbacks.History at 0x1577a6080>

### Bidirectional RNN (GRU)
<img src="bidirectional.png" width="800">


https://hackernoon.com/what-kagglers-are-using-for-text-classification-c695b58b5709

In [80]:
def gru_model(num_features, embedding_dim, input_shape, num_classes):
    op_units, op_activation = _get_last_layer_units_and_activation(num_classes)
    model = models.Sequential()
    model.add(Embedding(input_dim=num_features,
                        output_dim=embedding_dim,
                        input_length=input_shape[0]))
    model.add(Bidirectional(GRU(128, return_sequences=True)))
    model.add(Bidirectional(GRU(64, return_sequences=False)))
    model.add(Dense(op_units, activation=op_activation))
    return model

In [81]:
model_gru = gru_model(embedding_dim=MAX_VECTOR_EMD,
                      input_shape=(MAX_SEQUENCE_LENGTH,),
                      num_features=TOP_K,
                      num_classes=2)

In [82]:
model_gru.compile(optimizer=keras.optimizers.Adam(lr=0.001), loss="binary_crossentropy",
              metrics=["accuracy"])

In [83]:
model_gru.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_6 (Embedding)      (None, 60, 200)           4000200   
_________________________________________________________________
bidirectional_1 (Bidirection (None, None, 256)         252672    
_________________________________________________________________
bidirectional_2 (Bidirection (None, 128)               123264    
_________________________________________________________________
dense_11 (Dense)             (None, 1)                 129       
Total params: 4,376,265
Trainable params: 4,376,265
Non-trainable params: 0
_________________________________________________________________


In [85]:
model_gru.fit(X_train_seq, y_train, validation_data=(X_test_seq, y_test), epochs=3, batch_size=512)

Train on 121215 samples, validate on 40405 samples
Epoch 1/3

Epoch 2/3

Epoch 3/3



<tensorflow.python.keras._impl.keras.callbacks.History at 0x167905f28>

### Attention
<img src="attention.png" width="800">


https://hackernoon.com/what-kagglers-are-using-for-text-classification-c695b58b5709

In [86]:
# Take everything from the standalone keras package so that this layer and
# the keras.layers / keras.models objects used with it share one backend.
from keras import backend as K
from keras import initializers, regularizers, constraints
from keras.layers import Layer


class Attention(Layer):
    def __init__(self, step_dim,
                 W_regularizer=None, b_regularizer=None,
                 W_constraint=None, b_constraint=None,
                 bias=True, **kwargs):
        """
        Keras Layer that implements an Attention mechanism for temporal data.
        Supports Masking.
        Follows the work of Raffel et al. [https://arxiv.org/abs/1512.08756]
        # Arguments
            step_dim: int, number of time steps in the input sequence.
        # Input shape
            3D tensor with shape: `(samples, steps, features)`.
        # Output shape
            2D tensor with shape: `(samples, features)`.
        Put it on top of an RNN Layer (GRU/LSTM/SimpleRNN) with return_sequences=True.
        The feature dimension is inferred from the output shape of the RNN;
        the number of steps must be passed as step_dim.
        Example:
            model.add(LSTM(64, return_sequences=True))
            model.add(Attention(step_dim))
        """
        self.supports_masking = True
        #self.init = initializations.get('glorot_uniform')
        self.init = initializers.get('glorot_uniform')

        self.W_regularizer = regularizers.get(W_regularizer)
        self.b_regularizer = regularizers.get(b_regularizer)

        self.W_constraint = constraints.get(W_constraint)
        self.b_constraint = constraints.get(b_constraint)

        self.bias = bias
        self.step_dim = step_dim
        self.features_dim = 0
        super(Attention, self).__init__(**kwargs)

    def build(self, input_shape):
        assert len(input_shape) == 3

        # Pass shape by keyword: in newer Keras versions the first positional
        # argument of add_weight is the weight name, not its shape.
        self.W = self.add_weight(shape=(input_shape[-1],),
                                 initializer=self.init,
                                 name='{}_W'.format(self.name),
                                 regularizer=self.W_regularizer,
                                 constraint=self.W_constraint)
        self.features_dim = input_shape[-1]

        if self.bias:
            self.b = self.add_weight(shape=(input_shape[1],),
                                     initializer='zeros',
                                     name='{}_b'.format(self.name),
                                     regularizer=self.b_regularizer,
                                     constraint=self.b_constraint)
        else:
            self.b = None

        self.built = True

    def compute_mask(self, input, input_mask=None):
        # do not pass the mask to the next layers
        return None

    def call(self, x, mask=None):
        # eij = K.dot(x, self.W) TF backend doesn't support it

        # features_dim = self.W.shape[0]
        # step_dim = x._keras_shape[1]

        features_dim = self.features_dim
        step_dim = self.step_dim

        eij = K.reshape(K.dot(K.reshape(x, (-1, features_dim)), K.reshape(self.W, (features_dim, 1))), (-1, step_dim))

        if self.bias:
            eij += self.b

        eij = K.tanh(eij)

        a = K.exp(eij)

        # apply mask after the exp. will be re-normalized next
        if mask is not None:
            # Cast the mask to floatX to avoid float64 upcasting in theano
            a *= K.cast(mask, K.floatx())

        # in some cases especially in the early stages of training the sum may be almost zero
        a /= K.cast(K.sum(a, axis=1, keepdims=True) + K.epsilon(), K.floatx())

        a = K.expand_dims(a)
        weighted_input = x * a
        return K.sum(weighted_input, axis=1)

    def compute_output_shape(self, input_shape):
        #return input_shape[0], input_shape[-1]
        return input_shape[0],  self.features_dim

Using TensorFlow backend.
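
Stripped of the Keras plumbing, the layer does three things for each sample: score every time step, normalize the scores so they sum to 1, and return the weighted sum of the step vectors. The sketch below restates that for a single sample in plain Python. It leaves out masking and the epsilon term, and `attention_pool` is a hypothetical name.

```python
import math


def attention_pool(x, w, b):
    """Collapse a (steps, features) sequence into one feature vector.

    x: list of steps, each a list of feature values.
    w: one weight per feature; b: one bias per step.
    """
    # Score each time step: e_t = tanh(x_t . w + b_t)
    scores = [math.tanh(sum(xi * wi for xi, wi in zip(x_t, w)) + b_t)
              for x_t, b_t in zip(x, b)]
    # Normalize exp(e_t) so that the attention weights sum to 1.
    exps = [math.exp(e) for e in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Weighted sum over time steps, feature by feature.
    n_features = len(x[0])
    pooled = [sum(a_t * x_t[j] for a_t, x_t in zip(weights, x))
              for j in range(n_features)]
    return pooled, weights


x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # 3 steps, 2 features
pooled, weights = attention_pool(x, w=[0.5, -0.5], b=[0.0, 0.0, 0.0])
print(weights, pooled)
```

The trainable parameters are one weight per feature and one bias per step, so a layer placed on 60 steps of 128 features has 128 + 60 = 188 of them.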


In [87]:
from keras.layers import Dense, Input, LSTM, Embedding, Dropout, Activation
from keras.models import Model

In [88]:
embedding_layer = Embedding(TOP_K,
                            MAX_VECTOR_EMD,
                            input_length=MAX_SEQUENCE_LENGTH,
                            trainable=True)

In [89]:
lstm_layer = LSTM(128, dropout=0.2, recurrent_dropout=0.2, return_sequences=True)

In [90]:
comment_input = Input(shape=(MAX_SEQUENCE_LENGTH,), dtype='int32')
embedded_sequences= embedding_layer(comment_input)
x = lstm_layer(embedded_sequences)
x = Dropout(0.2)(x)
merged = Attention(MAX_SEQUENCE_LENGTH)(x)
merged = Dense(128, activation="relu")(merged)
merged = Dropout(0.2)(merged)
preds = Dense(1, activation='sigmoid')(merged)

In [91]:
model_att = Model(inputs=[comment_input], outputs=preds)

In [92]:
model_att.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input_1 (InputLayer)         (None, 60)                0         
_________________________________________________________________
embedding_1 (Embedding)      (None, 60, 200)           4000200   
_________________________________________________________________
lstm_1 (LSTM)                (None, 60, 128)           168448    
_________________________________________________________________
dropout_1 (Dropout)          (None, 60, 128)           0         
_________________________________________________________________
attention_1 (Attention)      (None, 128)               188       
_________________________________________________________________
dense_1 (Dense)              (None, 128)               16512     
_________________________________________________________________
dropout_2 (Dropout)          (None, 128)               0         
__________

In [93]:
model_att.compile(optimizer='adam', loss="binary_crossentropy",
              metrics=["accuracy"])

In [95]:
model_att.fit(X_train_seq, y_train, validation_data=(X_test_seq, y_test), epochs=3, batch_size=512)

Train on 121215 samples, validate on 40405 samples
Epoch 1/3
Epoch 2/3
Epoch 3/3


<keras.callbacks.History at 0x173bbb748>

### Results summary

CNN - 0.8846

GRU - 0.8932

ATT - 0.8920


### Word Embedding

1. BoW
2. Word2Vec (CBOW and skip-gram models) - local context
3. GloVe - tries to take global context into account
4. FastText - built on Word2Vec, but tries to capture subword context (trained like Word2Vec plus the character n-grams inside each word, which are averaged at the end)
5. ELMo (Embeddings from Language Models) - makes it possible to recognize the sense of a word in a given context
6. BERT - similar to ELMo, but trained differently
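
To illustrate the subword idea behind FastText, the sketch below extracts character n-grams from a word wrapped in boundary markers. The `<` and `>` markers and the 3 to 6 character range follow the FastText paper; `char_ngrams` itself is a hypothetical helper, and the real library additionally hashes the n-grams into buckets.

```python
def char_ngrams(word, min_n=3, max_n=6):
    """Return the character n-grams of a word wrapped in boundary markers."""
    wrapped = "<" + word + ">"
    grams = []
    for n in range(min_n, max_n + 1):
        for start in range(len(wrapped) - n + 1):
            grams.append(wrapped[start:start + n])
    return grams


trigrams = char_ngrams("stick", min_n=3, max_n=3)
shared = set(char_ngrams("stick")) & set(char_ngrams("sticky"))
print(trigrams)
print(sorted(shared))
```

Because related words such as "stick" and "sticky" share many n-grams, their vectors end up close together, and a vector can be assembled even for a word that never appeared in training.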



In [162]:
from flair.embeddings import WordEmbeddings
from flair.data import Sentence

In [163]:
glove_embedding = WordEmbeddings('glove')

In [164]:
sentence = Sentence('Stick to the marked trails.')
sentence2 = Sentence('I intend to stick to my promise.')

In [165]:
print(sentence)
print(sentence2)

Sentence: "Stick to the marked trails." - 5 Tokens
Sentence: "I intend to stick to my promise." - 7 Tokens


In [166]:
glove_embedding.embed(sentence)
for token in sentence:
    print(token)
    print(token.embedding)

Token: 1 Stick
tensor([-0.6643,  0.1528, -0.2895, -0.4068, -0.4737,  0.2733, -0.2304,  0.0374,
        -0.2698, -0.2610, -0.3525,  0.3428,  0.4750,  0.6637, -0.5283,  0.3865,
        -0.0985,  0.2492, -0.2734, -0.5586,  0.0603,  0.5039, -0.5368,  0.0660,
         0.3239,  0.5892, -0.6452, -0.1834,  0.0057,  0.2798, -0.0065,  1.2641,
         0.0600,  0.1529,  0.1453, -0.1880,  0.2369, -0.2504,  0.6144, -0.9344,
        -0.0688, -0.5705,  0.4045, -0.4111, -1.3672, -0.3098, -0.3376, -0.0909,
        -0.6633, -0.6661, -0.0061,  0.0937, -0.4293,  1.4003,  0.0140, -1.6013,
         0.2068,  0.3280,  1.0206,  0.0566,  0.5846,  0.4398, -0.8114,  0.4312,
        -0.0174, -0.0137,  0.0159,  0.4674, -0.6148, -0.5592, -0.0948, -0.1868,
         0.3186,  0.0247, -0.0717, -0.2578, -0.3763,  0.2470,  0.4026,  0.2124,
         0.0916,  0.0474, -0.6611, -0.3974, -0.6042, -0.0321,  0.1920,  0.6081,
         0.0280,  0.2510, -0.5960,  0.2214, -0.5980, -0.5555, -0.5342, -1.0906,
        -0.4165, -0.2219,

In [167]:
glove_embedding.embed(sentence2)
for token in sentence2:
    print(token)
    print(token.embedding)

Token: 1 I
tensor([-0.0465,  0.6197,  0.5665, -0.4658, -1.1890,  0.4460,  0.0660,  0.3191,
         0.1468, -0.2212,  0.7924,  0.2991,  0.1607,  0.0253,  0.1868, -0.3100,
        -0.2811,  0.6051, -1.0654,  0.5248,  0.0642,  1.0358, -0.4078, -0.3801,
         0.3080,  0.5996, -0.2699, -0.7603,  0.9422, -0.4692, -0.1828,  0.9065,
         0.7967,  0.2482,  0.2571,  0.6232, -0.4477,  0.6536,  0.7690, -0.5123,
        -0.4433, -0.2187,  0.3837, -1.1483, -0.9440, -0.1506,  0.3001, -0.5781,
         0.2017, -1.6591, -0.0792,  0.0264,  0.2205,  0.9971, -0.5754, -2.7266,
         0.3145,  0.7052,  1.4381,  0.9913,  0.1398,  1.3474, -1.1753,  0.0040,
         1.0298,  0.0646,  0.9089,  0.8287, -0.4700, -0.1058,  0.5916, -0.4221,
         0.5733, -0.5411,  0.1077,  0.3978, -0.0487,  0.0646, -0.6144, -0.2860,
         0.5067, -0.4976, -0.8157,  0.1641, -1.9630, -0.2669, -0.3759, -0.9585,
        -0.8584, -0.7158, -0.3234, -0.4312,  0.4139,  0.2837, -0.7093,  0.1500,
        -0.2154, -0.3762, -0.

In [168]:
s1_stick = 0
for token in sentence:
    if token.text.lower() == 'stick':
        s1_stick = token.embedding

In [169]:
s2_stick = 0
for token in sentence2:
    if token.text.lower() == 'stick':
        s2_stick = token.embedding

In [170]:
s1_stick == s2_stick

tensor([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
        1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
        1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
        1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
        1, 1, 1, 1], dtype=torch.uint8)

In [171]:
from flair.embeddings import BertEmbeddings

In [172]:
embedding = BertEmbeddings()

In [173]:
sentence = Sentence('Stick to the marked trails.')
sentence2 = Sentence('I intend to stick to my promise.')

In [174]:
embedding.embed(sentence)
for token in sentence:
    if token.text.lower() == 'stick':
        print(token)
        print(token.embedding)

Token: 1 Stick
tensor([ 0.3480,  0.2575,  0.0822,  ...,  0.5876, -0.8315, -0.3655])


In [175]:
embedding.embed(sentence2)
for token in sentence2:
    if token.text.lower() == 'stick':
        print(token)
        print(token.embedding)

Token: 4 stick
tensor([ 0.5259,  0.4349,  0.5928,  ...,  0.9525, -0.0109, -0.3207])
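
With GloVe the two "stick" vectors are identical, while BERT returns a different vector for each sentence. A single similarity score is easier to read than two long tensors, so here is a minimal cosine similarity sketch in plain Python. It works on any two equal-length sequences of floats.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


same = cosine_similarity([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])
print(same, orthogonal)
```

For Flair tokens it could be called as `cosine_similarity(token_a.embedding.tolist(), token_b.embedding.tolist())`, where `token_a` and `token_b` are placeholder names for the two matched tokens.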


In [176]:
from flair.models import TextClassifier

```
__label__<class_1> <text>
__label__<class_2> <text>
```

In [180]:
df["target"] = "__label__"+df["target"].astype(str)

In [211]:
df["drop"] = df.question_text.apply(lambda x: 1 if len(x.split()) > 60 else 0)

In [212]:
df.shape

(161620, 3)

In [213]:
df.head()

| | target | question_text | drop |
|---|---|---|---|
| 0 | __label__0 | why is there a quota in education when educati... | 0 |
| 1 | __label__0 | what is the reason that media is so under thre... | 0 |
| 2 | __label__0 | why did iron smelting in ancient times require... | 0 |
| 3 | __label__0 | why are there no holidays in june? | 0 |
| 4 | __label__0 | my dad is visiting and i am jobless now. i fee... | 0 |


In [215]:
df = df[df["drop"] == 0]

In [216]:
df = df[["target", "question_text"]]

In [217]:
# https://towardsdatascience.com/text-classification-with-state-of-the-art-nlp-library-flair-b541d7add21f
# Shuffle first: after undersampling, the rows may be grouped by class.
df = df.sample(frac=1, random_state=42)
# 80% train, 10% test, 10% dev; tab-separated, no header.
df.iloc[0:int(len(df)*0.8)].to_csv('train.csv', sep='\t', index=False, header=False)
df.iloc[int(len(df)*0.8):int(len(df)*0.9)].to_csv('test.csv', sep='\t', index=False, header=False)
df.iloc[int(len(df)*0.9):].to_csv('dev.csv', sep='\t', index=False, header=False)
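
Before training it is worth checking that every split file contains every label, because a positional split over rows grouped by class can silently leave one class out. The sketch below formats and counts examples in the `__label__` convention; both function names are hypothetical helpers.

```python
from collections import Counter


def to_fasttext_line(label, text):
    """Format one example as '__label__<class><TAB><text>'."""
    return "__label__{}\t{}".format(label, text)


def label_counts(lines):
    """Count examples per label in FastText-formatted lines."""
    counts = Counter()
    for line in lines:
        label, _, _text = line.rstrip("\n").partition("\t")
        counts[label] += 1
    return counts


lines = [
    to_fasttext_line(0, "why are there no holidays in june?"),
    to_fasttext_line(1, "an example of an insincere question"),
    to_fasttext_line(0, "why is there a quota in education"),
]
print(label_counts(lines))
```

The same function can be pointed at a file handle, for example `label_counts(open('test.csv'))`, since iterating over a text file yields one line at a time.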

In [218]:
from flair.data_fetcher import NLPTaskDataFetcher
from flair.embeddings import WordEmbeddings, FlairEmbeddings, DocumentLSTMEmbeddings, DocumentPoolEmbeddings
from flair.models import TextClassifier
from flair.trainers import ModelTrainer
from pathlib import Path


In [219]:
corpus = NLPTaskDataFetcher.load_classification_corpus(Path('./'), test_file='test.csv',
                                                       dev_file='dev.csv', train_file='train.csv')

word_embeddings = [embedding]

document_embeddings = DocumentPoolEmbeddings(word_embeddings)

2019-03-27 12:58:37,309 Reading data from .
2019-03-27 12:58:37,311 Train: train.csv
2019-03-27 12:58:37,313 Dev: dev.csv
2019-03-27 12:58:37,314 Test: test.csv


In [None]:
classifier = TextClassifier(document_embeddings,
                            label_dictionary=corpus.make_label_dictionary(),
                            multi_label=False)

trainer = ModelTrainer(classifier, corpus)

trainer.train('', max_epochs=3)

2019-03-27 12:59:17,659 ----------------------------------------------------------------------------------------------------
2019-03-27 12:59:17,660 Evaluation method: MICRO_F1_SCORE
2019-03-27 12:59:17,665 ----------------------------------------------------------------------------------------------------
2019-03-27 12:59:21,698 epoch 1 - iter 0/4041 - loss 0.02329264


### Summary
CNN - 0.8846

GRU - 0.8932

ATT - 0.8920

Flair - 


![](bert_banch.png)

### ULMFiT: Universal Language Model Fine-tuning for Text Classification

![](ulm.png)

http://nlp.fast.ai/classification/2018/05/15/introducting-ulmfit.html

![](result.png)

## Q&A


rafalpronko @ LinkedIn, Telegram, Twitter, Tungsten