# PART 3: Advanced word and document embeddings and neural classification.

Here we use only the *Flair* package, which is probably one of the easiest Python libraries for working with state-of-the-art pretrained neural language models. Pretrained models are a major advantage because they let us transfer information learned from huge datasets to a smaller NLP problem (your problem). We only need to fine-tune a pretrained model for it to perform well on the new problem.

We take the preprocessed data from PART 1 and load it here from a pickle file.

For full documentation and examples, see  
https://github.com/zalandoresearch/flair/tree/master/resources/docs  
https://www.analyticsvidhya.com/blog/2019/02/flair-nlp-library-python  
https://heartbeat.fritz.ai/using-transfer-learning-and-pre-trained-language-models-to-classify-spam-549fc0f56c20  

In [1]:
# root folder of the data, containing the pickle file
DATA_ROOT = r'C:\Users\h01928\Documents\GIT_codes\NLP_kickstart_tutorial' + r'\\'

We start with simple word vectors, which are also included in Flair (fastText type). These embeddings are not context-dependent: each word always gets the same fixed vector.
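A fixed embedding behaves like a lookup table: the same token always maps to the same vector, whatever words surround it. The toy sketch below illustrates this with a hypothetical three-word table and random 4-dimensional vectors. It is only an illustration of the idea, not how Flair stores its fastText vectors.

```python
import numpy as np

# Hypothetical toy table; the real fastText vectors used below have 300 dimensions.
rng = np.random.default_rng(0)
vocab = ["hiilijalanjälki", "on", "suuri"]
table = {word: rng.normal(size=4) for word in vocab}

def embed_static(tokens):
    """Look up each token independently; the context is ignored."""
    return [table[t] for t in tokens]

a = embed_static(["hiilijalanjälki", "on", "suuri"])
b = embed_static(["suuri", "hiilijalanjälki"])

# The vector for 'hiilijalanjälki' is identical in both token sequences.
same = np.array_equal(a[0], b[1])
print(same)
```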

In [2]:
from flair.embeddings import WordEmbeddings
word_embedding = WordEmbeddings('fi') # fasttext word embeddings for Finnish

from flair.data import Sentence
sentence = Sentence('hiilijalanjälki')
word_embedding.embed(sentence)
# now check out the embedded token
vector = sentence[0].embedding.cpu().detach().numpy() # the result is a tensor; convert it to a plain NumPy vector
print('Simple word embedding for "%s": %s' % (sentence[0],str(vector)))

Simple word embedding for "Token: 1 hiilijalanjälki": [ 0.28942   -0.25729    0.2793    -0.14168    0.21692    0.33281
 -0.076142   0.21425   -0.14987    0.50143    0.38442   -0.030661
 -0.54667   -0.06934    0.1375    -0.40508    0.58006   -0.10255
  0.13013    0.20505    0.0084429  0.18679   -0.17214   -0.015876
  0.16071    0.012619  -0.26008    0.5827     0.13461    0.38794
  0.27849    0.31263   -0.28229   -0.29986   -0.36067    0.57393
  0.45562    0.25721   -0.16588   -0.34081   -0.029271  -0.053188
 -0.28379   -0.31579   -0.16162   -0.044539   0.11141   -0.56292
  0.042089  -0.17313   -0.10631   -0.046749  -0.37972    0.12351
 -0.14223   -0.55344   -0.4255     0.21749    0.56593   -0.30287
 -0.4045     0.28351    0.14293   -0.15708    0.56132    0.8697
 -0.48887    0.1861     0.092133  -0.0092559  0.50473   -0.090265
 -0.60152    0.0038192 -0.12302   -0.11521   -0.25384    0.27161
 -0.18558   -0.12193    0.44237   -0.017731   0.10056   -0.10506
  0.33185    0.029694   0.25036  

Next, we get embeddings that depend on context. These come from models rather than from a big fixed lookup table like the simple embeddings. In Flair, we can easily combine multiple models, such as BERT and Flair.
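Combining models in this way amounts to concatenating the per-token vectors that each model produces, so the dimensions add up. The sketch below uses random vectors and assumed sizes (2048 for each Flair direction, 3072 for BERT). These sizes are an assumption chosen to be consistent with the 7168-dimensional output of the cell below, not values read from the library; on real embedding objects the `embedding_length` property reports the actual size.

```python
import numpy as np

# Assumed per-model sizes (hypothetical; verify against the real embedding objects).
sizes = {"flair_forward": 2048, "flair_backward": 2048, "bert": 3072}

rng = np.random.default_rng(0)
# One random vector per model stands in for that model's embedding of a single token.
parts = [rng.normal(size=n) for n in sizes.values()]

# Stacking = concatenation along the feature axis.
stacked = np.concatenate(parts)
print(stacked.shape)
```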

In [3]:
from flair.embeddings import FlairEmbeddings,WordEmbeddings,BertEmbeddings

# https://github.com/stefan-it/flair-lms#multilingual-flair-embeddings

# init Flair embeddings
flair_forward_embedding = FlairEmbeddings('fi-forward')
flair_backward_embedding = FlairEmbeddings('fi-backward')

# init multilingual BERT
bert_embedding = BertEmbeddings('bert-base-multilingual-cased')

from flair.embeddings import StackedEmbeddings

# now create the StackedEmbedding object that combines all embeddings
stacked_embeddings = StackedEmbeddings(
    embeddings=[flair_forward_embedding, flair_backward_embedding, bert_embedding])

# Next we test the embeddings for example sentences
from flair.data import Sentence
import numpy as np

# make two highly similar sentences
sentence=[None]*2
sentence[0] = Sentence('Suomalaisen hiilijalanjälki on vuodessa keskimäärin noin 11 tonnia hiilidioksidiksi muutettuna .')
sentence[1] = Sentence('Japanilaisen hiilijalanjälki on vuodessa enintään noin 11 tonnia hiilidioksidiksi muutettuna .')
# NOTE: All tokens should be separated by space, no "smart" tokenizer is used here!

# get the embedding of the second word, which is the same in both sentences
vectors=[None]*2
for i,s in enumerate(sentence):
    # just embed a sentence using the StackedEmbedding as you would with any single embedding.
    stacked_embeddings.embed(s)
    # now check out the embedded tokens.
    token=s[1]
    vectors[i] = token.embedding.cpu().detach().numpy()
    print('%i-dimensional contextual word embedding for "%s": %s' % (len(vectors[i]),token,str(vectors[i])))
print("\nCorrelation between token vectors %f" % (np.corrcoef(vectors)[0,1]))

7168-dimensional contextual word embedding for "Token: 2 hiilijalanjälki": [ 0.0010662   0.01355947  0.17441215 ...  0.54320973  0.31850675
 -0.39369926]
7168-dimensional contextual word embedding for "Token: 2 hiilijalanjälki": [ 0.00110434  0.01864425  0.16161357 ...  0.29572856  0.14452124
 -0.46072665]

Correlation between token vectors 0.978887
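The correlation printed above is the Pearson coefficient from `np.corrcoef`. As a rough illustration of what that number measures, the sketch below computes it by hand as the cosine similarity of mean-centred vectors and compares it with the plain cosine similarity that is also commonly used for embeddings. The vectors are random stand-ins, not real embeddings.

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def pearson(u, v):
    """Pearson correlation: cosine similarity of the mean-centred vectors."""
    return cosine(u - u.mean(), v - v.mean())

rng = np.random.default_rng(0)
u = rng.normal(size=1000)
v = u + 0.2 * rng.normal(size=1000)   # noisy copy of u, a stand-in for a similar context

r_manual = pearson(u, v)
r_numpy = float(np.corrcoef([u, v])[0, 1])
cos_uv = cosine(u, v)
print(r_manual, r_numpy, cos_uv)
```

For vectors whose mean is close to zero, the two measures give nearly the same value.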


Finally, we take the above idea a step further: we build document embeddings on top of the pretrained embeddings and train them together with a classifier. Training takes about 10 minutes with PyTorch on a GPU.
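To make the idea of a document embedding concrete, here is a minimal NumPy sketch of the forward pass: word vectors are re-projected, fed through a GRU one token at a time, and the final hidden state serves as the document vector for a linear classifier. The layer sizes follow the model summary printed in the training log further down (300 → 150 → 512 → 2). Everything else is an assumption made for illustration: the weights are random and untrained, dropout is omitted, each gate has a single bias, and this is not Flair's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
D_WORD, D_PROJ, D_HID, N_CLASSES = 300, 150, 512, 2   # sizes from the model summary

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def init(*shape):
    """Small random weights; training would adjust all of these."""
    return rng.normal(scale=0.05, size=shape)

W_proj, b_proj = init(D_PROJ, D_WORD), np.zeros(D_PROJ)
W = {g: init(D_HID, D_PROJ) for g in "zrn"}   # input-to-hidden weights per gate
U = {g: init(D_HID, D_HID) for g in "zrn"}    # hidden-to-hidden weights per gate
b = {g: np.zeros(D_HID) for g in "zrn"}
W_out, b_out = init(N_CLASSES, D_HID), np.zeros(N_CLASSES)

def document_embedding(word_vectors):
    """Run a GRU over the re-projected word vectors; return the last hidden state."""
    h = np.zeros(D_HID)
    for x in word_vectors:
        x = W_proj @ x + b_proj                              # re-project 300 -> 150
        z = sigmoid(W["z"] @ x + U["z"] @ h + b["z"])        # update gate
        r = sigmoid(W["r"] @ x + U["r"] @ h + b["r"])        # reset gate
        n = np.tanh(W["n"] @ x + r * (U["n"] @ h) + b["n"])  # candidate state
        h = (1.0 - z) * n + z * h
    return h

doc = rng.normal(size=(12, D_WORD))     # 12 random stand-ins for word vectors
doc_vec = document_embedding(doc)

# Linear decoder followed by a softmax over the two classes.
logits = W_out @ doc_vec + b_out
probs = np.exp(logits - logits.max())
probs /= probs.sum()
print(doc_vec.shape, probs)
```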

In [4]:
from flair.data import Corpus
from flair.embeddings import WordEmbeddings, FlairEmbeddings, DocumentRNNEmbeddings
from flair.models import TextClassifier
from flair.trainers import ModelTrainer
from flair.datasets import CSVClassificationCorpus

# Step 1: Create .CSV files of our data
import pickle
import random
import pandas
import csv

# open the pickle with a context manager so the file handle is closed afterwards
with open(DATA_ROOT + 'turkuNLP_preprocessed_data.pickle', 'rb') as f:
    data = pickle.load(f)
# create shuffled indices
ind = list(range(len(data)))
random.seed(1)
random.shuffle(ind)

# create training, development and testing splits using ratios 8:1:1
# save each split into own file
MAX_TOKENS = 1000 # limit the document size, or else we run out of memory :(
# the first token is just a repeat, so skip it; the slice also caps the length at MAX_TOKENS
cutter = lambda x: x[1:MAX_TOKENS]
low = 0
for frac,file in zip([0.8,0.9,1.0],['train.csv','dev.csv','test.csv']):
    up = round(frac*len(data))
    frame = pandas.DataFrame()
    frame['label'] = [data[ind[i]]['label'] for i in range(low,up)]
    frame['text'] = [" ".join(cutter(data[ind[i]]['tokens_raw'])) for i in range(low,up)]
    # use pandas for easy file saving
    frame.to_csv(DATA_ROOT + file,encoding="utf-8",sep="\t",index=False,quoting=csv.QUOTE_NONE)
    low = up

# this is the folder in which train, test and dev files reside
data_folder = DATA_ROOT

# column format indicating which columns hold the text and label(s)
column_name_map = {0: "label_topic",1: "text"} # note: 0 = first column!

# load the corpus with the training, dev and test data; skip_header=True skips the header row of each CSV file
print('Loading corpus')
corpus: Corpus = CSVClassificationCorpus(data_folder,
                                         column_name_map,
                                         skip_header=True,
                                         delimiter='\t',    # tab-separated files
                                         in_memory=True
)
# 2. create the label dictionary
print('Creating dictionary')
label_dict = corpus.make_label_dictionary()

# 3. make a list of word embeddings
word_embeddings = [WordEmbeddings('fi'),
                   #FlairEmbeddings('fi-forward'), # can add, if enough memory
                   #FlairEmbeddings('fi-backward'),
                   ]

# 4. initialize document embedding by passing list of word embeddings
# Can choose between many RNN types (GRU by default, to change use rnn_type parameter)
document_embeddings: DocumentRNNEmbeddings = DocumentRNNEmbeddings(word_embeddings,
                                                                     hidden_size=512,
                                                                     reproject_words=True,
                                                                     reproject_words_dimension=150,
                                                                     )

# 5. create the text classifier
classifier = TextClassifier(document_embeddings, label_dictionary=label_dict)

# 6. initialize the text classifier trainer
trainer = ModelTrainer(classifier, corpus)

# 7. start the training
import torch
torch.cuda.empty_cache()

print('Training classifier (fine-tuning)')
trainer.train(data_folder+"flair_temp", # Main path to which all output during training is logged and models are saved
              learning_rate=0.1,
              mini_batch_size=40,
              anneal_factor=0.5,
              patience=4,
              max_epochs=100)
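The `learning_rate`, `anneal_factor` and `patience` arguments above describe a reduce-on-plateau schedule: when the dev score has not improved for more than `patience` epochs, the learning rate is multiplied by `anneal_factor`. The sketch below is a simplified stand-alone version of that rule with hypothetical dev scores. It is an assumption about the general behaviour, modelled loosely on PyTorch's `ReduceLROnPlateau`, and not the trainer's actual code.

```python
def anneal(scores, lr=0.1, factor=0.5, patience=4):
    """Return the learning rate used in each epoch, given the dev score of each epoch."""
    best, bad, used = float("-inf"), 0, []
    for score in scores:
        used.append(lr)
        if score > best:
            best, bad = score, 0
        else:
            bad += 1
        if bad > patience:      # more than `patience` epochs without improvement
            lr *= factor
            bad = 0
    return used

# Hypothetical dev scores: two improving epochs, then a long plateau.
scores = [0.5, 0.8] + [0.7] * 10
lrs = anneal(scores)
print(lrs)
```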

Loading corpus
2019-10-25 13:07:12,423 Reading data from C:\Users\h01928\Documents\GIT_codes\NLP_kickstart_tutorial
2019-10-25 13:07:12,424 Train: C:\Users\h01928\Documents\GIT_codes\NLP_kickstart_tutorial\train.csv
2019-10-25 13:07:12,426 Dev: C:\Users\h01928\Documents\GIT_codes\NLP_kickstart_tutorial\dev.csv
2019-10-25 13:07:12,427 Test: C:\Users\h01928\Documents\GIT_codes\NLP_kickstart_tutorial\test.csv
Creating dictionary
2019-10-25 13:07:13,456 Computing label dictionary. Progress:


100%|██████████| 333/333 [00:00<00:00, 326790.65it/s]


2019-10-25 13:07:13,463 [b'TALOUS', b'TERVEYS']
Training classifier (fine-tuning)
2019-10-25 13:07:15,018 ----------------------------------------------------------------------------------------------------
2019-10-25 13:07:15,019 Model: "TextClassifier(
  (document_embeddings): DocumentRNNEmbeddings(
    (embeddings): StackedEmbeddings(
      (list_embedding_0): WordEmbeddings('fi')
    )
    (word_reprojection_map): Linear(in_features=300, out_features=150, bias=True)
    (rnn): GRU(150, 512)
    (dropout): Dropout(p=0.5)
  )
  (decoder): Linear(in_features=512, out_features=2, bias=True)
  (loss_function): CrossEntropyLoss()
)"
2019-10-25 13:07:15,020 ----------------------------------------------------------------------------------------------------
2019-10-25 13:07:15,021 Corpus: "Corpus: 333 train + 41 dev + 42 test sentences"
2019-10-25 13:07:15,022 ----------------------------------------------------------------------------------------------------
2019-10-25 13:07:15,024 Parame

2019-10-25 13:08:29,841 epoch 6 - iter 4/9 - loss 0.50405911 - samples/sec: 60.03
2019-10-25 13:08:30,531 epoch 6 - iter 5/9 - loss 0.50798772 - samples/sec: 60.24
2019-10-25 13:08:31,209 epoch 6 - iter 6/9 - loss 0.49879275 - samples/sec: 61.50
2019-10-25 13:08:31,895 epoch 6 - iter 7/9 - loss 0.50467001 - samples/sec: 60.54
2019-10-25 13:08:32,174 epoch 6 - iter 8/9 - loss 0.49969876 - samples/sec: 157.48
2019-10-25 13:08:32,204 ----------------------------------------------------------------------------------------------------
2019-10-25 13:08:32,205 EPOCH 6 done: loss 0.4997 - lr 0.1000
2019-10-25 13:08:32,448 DEV : loss 0.4289723038673401 - score 0.7561
2019-10-25 13:08:32,461 BAD EPOCHS (no improvement): 2
2019-10-25 13:08:32,463 ----------------------------------------------------------------------------------------------------
2019-10-25 13:08:33,137 epoch 7 - iter 0/9 - loss 0.49897307 - samples/sec: 59.44
2019-10-25 13:08:33,850 epoch 7 - iter 1/9 - loss 0.50505728 - samples/

2019-10-25 13:09:24,896 epoch 13 - iter 4/9 - loss 0.49311431 - samples/sec: 57.09
2019-10-25 13:09:25,530 epoch 13 - iter 5/9 - loss 0.46066243 - samples/sec: 65.62
2019-10-25 13:09:26,207 epoch 13 - iter 6/9 - loss 0.41822258 - samples/sec: 61.55
2019-10-25 13:09:26,901 epoch 13 - iter 7/9 - loss 0.40503795 - samples/sec: 59.79
2019-10-25 13:09:27,197 epoch 13 - iter 8/9 - loss 0.37544496 - samples/sec: 148.36
2019-10-25 13:09:27,223 ----------------------------------------------------------------------------------------------------
2019-10-25 13:09:27,223 EPOCH 13 done: loss 0.3754 - lr 0.1000
2019-10-25 13:09:27,470 DEV : loss 0.2953166663646698 - score 0.7561
2019-10-25 13:09:27,482 BAD EPOCHS (no improvement): 4
2019-10-25 13:09:27,484 ----------------------------------------------------------------------------------------------------
2019-10-25 13:09:28,143 epoch 14 - iter 0/9 - loss 0.27337730 - samples/sec: 60.78
2019-10-25 13:09:28,869 epoch 14 - iter 1/9 - loss 0.24398260 - 

2019-10-25 13:10:04,613 epoch 20 - iter 2/9 - loss 0.11702063 - samples/sec: 58.92
2019-10-25 13:10:05,233 epoch 20 - iter 3/9 - loss 0.12053785 - samples/sec: 67.23
2019-10-25 13:10:05,900 epoch 20 - iter 4/9 - loss 0.11256878 - samples/sec: 62.39
2019-10-25 13:10:06,598 epoch 20 - iter 5/9 - loss 0.10987902 - samples/sec: 59.50
2019-10-25 13:10:07,288 epoch 20 - iter 6/9 - loss 0.10032185 - samples/sec: 60.24
2019-10-25 13:10:07,849 epoch 20 - iter 7/9 - loss 0.09908028 - samples/sec: 74.77
2019-10-25 13:10:08,112 epoch 20 - iter 8/9 - loss 0.08957504 - samples/sec: 168.75
2019-10-25 13:10:08,139 ----------------------------------------------------------------------------------------------------
2019-10-25 13:10:08,140 EPOCH 20 done: loss 0.0896 - lr 0.0250
2019-10-25 13:10:08,388 DEV : loss 0.18724888563156128 - score 0.9024
2019-10-25 13:10:08,401 BAD EPOCHS (no improvement): 0
2019-10-25 13:10:13,718 ---------------------------------------------------------------------------------

2019-10-25 13:11:05,113 epoch 27 - iter 1/9 - loss 0.09252873 - samples/sec: 77.66
2019-10-25 13:11:05,737 epoch 27 - iter 2/9 - loss 0.06432386 - samples/sec: 66.93
2019-10-25 13:11:06,381 epoch 27 - iter 3/9 - loss 0.04924791 - samples/sec: 64.72
2019-10-25 13:11:07,093 epoch 27 - iter 4/9 - loss 0.05949672 - samples/sec: 58.76
2019-10-25 13:11:07,820 epoch 27 - iter 5/9 - loss 0.06594195 - samples/sec: 57.22
2019-10-25 13:11:08,518 epoch 27 - iter 6/9 - loss 0.06152598 - samples/sec: 59.56
2019-10-25 13:11:09,208 epoch 27 - iter 7/9 - loss 0.06193924 - samples/sec: 60.56
2019-10-25 13:11:09,534 epoch 27 - iter 8/9 - loss 0.06374809 - samples/sec: 133.78
2019-10-25 13:11:09,564 ----------------------------------------------------------------------------------------------------
2019-10-25 13:11:09,566 EPOCH 27 done: loss 0.0637 - lr 0.0250
2019-10-25 13:11:09,818 DEV : loss 0.029960786923766136 - score 0.9756
2019-10-25 13:11:09,833 BAD EPOCHS (no improvement): 3
2019-10-25 13:11:14,5

2019-10-25 13:12:10,405 epoch 34 - iter 0/9 - loss 0.02037274 - samples/sec: 55.56
2019-10-25 13:12:11,086 epoch 34 - iter 1/9 - loss 0.02851568 - samples/sec: 61.22
2019-10-25 13:12:11,801 epoch 34 - iter 2/9 - loss 0.02552303 - samples/sec: 58.02
2019-10-25 13:12:12,473 epoch 34 - iter 3/9 - loss 0.02156254 - samples/sec: 61.83
2019-10-25 13:12:13,135 epoch 34 - iter 4/9 - loss 0.01847777 - samples/sec: 62.89
2019-10-25 13:12:13,786 epoch 34 - iter 5/9 - loss 0.01844611 - samples/sec: 64.19
2019-10-25 13:12:14,463 epoch 34 - iter 6/9 - loss 0.01690558 - samples/sec: 61.53
2019-10-25 13:12:14,948 epoch 34 - iter 7/9 - loss 0.01723024 - samples/sec: 87.52
2019-10-25 13:12:15,245 epoch 34 - iter 8/9 - loss 0.01571802 - samples/sec: 147.62
2019-10-25 13:12:15,270 ----------------------------------------------------------------------------------------------------
2019-10-25 13:12:15,271 EPOCH 34 done: loss 0.0157 - lr 0.0125
2019-10-25 13:12:15,525 DEV : loss 0.04408678412437439 - score 0

2019-10-25 13:13:16,608 BAD EPOCHS (no improvement): 1
2019-10-25 13:13:21,713 ----------------------------------------------------------------------------------------------------
2019-10-25 13:13:22,424 epoch 41 - iter 0/9 - loss 0.02838828 - samples/sec: 56.48
2019-10-25 13:13:23,143 epoch 41 - iter 1/9 - loss 0.02269856 - samples/sec: 58.14
2019-10-25 13:13:23,879 epoch 41 - iter 2/9 - loss 0.01980653 - samples/sec: 56.62
2019-10-25 13:13:24,542 epoch 41 - iter 3/9 - loss 0.01571384 - samples/sec: 63.15
2019-10-25 13:13:25,157 epoch 41 - iter 4/9 - loss 0.01836243 - samples/sec: 67.75
2019-10-25 13:13:25,882 epoch 41 - iter 5/9 - loss 0.01771280 - samples/sec: 57.53
2019-10-25 13:13:26,604 epoch 41 - iter 6/9 - loss 0.02124314 - samples/sec: 57.72
2019-10-25 13:13:27,241 epoch 41 - iter 7/9 - loss 0.02016001 - samples/sec: 65.67
2019-10-25 13:13:27,534 epoch 41 - iter 8/9 - loss 0.01796059 - samples/sec: 150.41
2019-10-25 13:13:27,562 ------------------------------------------------

2019-10-25 13:14:32,226 EPOCH 47 done: loss 0.0185 - lr 0.0016
2019-10-25 13:14:32,480 DEV : loss 0.047021497040987015 - score 0.9756
2019-10-25 13:14:32,493 BAD EPOCHS (no improvement): 3
2019-10-25 13:14:37,420 ----------------------------------------------------------------------------------------------------
2019-10-25 13:14:37,972 epoch 48 - iter 0/9 - loss 0.00509428 - samples/sec: 72.70
2019-10-25 13:14:38,620 epoch 48 - iter 1/9 - loss 0.00721476 - samples/sec: 64.72
2019-10-25 13:14:39,257 epoch 48 - iter 2/9 - loss 0.01003294 - samples/sec: 65.73
2019-10-25 13:14:39,984 epoch 48 - iter 3/9 - loss 0.01292774 - samples/sec: 57.06
2019-10-25 13:14:40,669 epoch 48 - iter 4/9 - loss 0.01175103 - samples/sec: 60.96
2019-10-25 13:14:41,358 epoch 48 - iter 5/9 - loss 0.01141263 - samples/sec: 60.34
2019-10-25 13:14:42,043 epoch 48 - iter 6/9 - loss 0.02173261 - samples/sec: 61.12
2019-10-25 13:14:42,746 epoch 48 - iter 7/9 - loss 0.01949102 - samples/sec: 59.22
2019-10-25 13:14:42,91

2019-10-25 13:15:47,324 ----------------------------------------------------------------------------------------------------
2019-10-25 13:15:47,325 EPOCH 54 done: loss 0.0118 - lr 0.0008
2019-10-25 13:15:47,578 DEV : loss 0.04184605926275253 - score 0.9756
Epoch    53: reducing learning rate of group 0 to 3.9063e-04.
2019-10-25 13:15:47,592 BAD EPOCHS (no improvement): 5
2019-10-25 13:15:52,645 ----------------------------------------------------------------------------------------------------
2019-10-25 13:15:53,337 epoch 55 - iter 0/9 - loss 0.02762778 - samples/sec: 57.96
2019-10-25 13:15:53,805 epoch 55 - iter 1/9 - loss 0.02983178 - samples/sec: 90.70
2019-10-25 13:15:54,472 epoch 55 - iter 2/9 - loss 0.02090170 - samples/sec: 62.50
2019-10-25 13:15:55,183 epoch 55 - iter 3/9 - loss 0.01870543 - samples/sec: 58.73
2019-10-25 13:15:55,950 epoch 55 - iter 4/9 - loss 0.01580970 - samples/sec: 54.49
2019-10-25 13:15:56,496 epoch 55 - iter 5/9 - loss 0.01505946 - samples/sec: 76.77
20

2019-10-25 13:17:02,932 epoch 61 - iter 7/9 - loss 0.01356996 - samples/sec: 65.69
2019-10-25 13:17:03,227 epoch 61 - iter 8/9 - loss 0.01488375 - samples/sec: 150.94
2019-10-25 13:17:03,254 ----------------------------------------------------------------------------------------------------
2019-10-25 13:17:03,255 EPOCH 61 done: loss 0.0149 - lr 0.0002
2019-10-25 13:17:03,521 DEV : loss 0.04100845009088516 - score 0.9756
2019-10-25 13:17:03,534 BAD EPOCHS (no improvement): 2
2019-10-25 13:17:08,661 ----------------------------------------------------------------------------------------------------
2019-10-25 13:17:09,418 epoch 62 - iter 0/9 - loss 0.00514495 - samples/sec: 53.03
2019-10-25 13:17:10,078 epoch 62 - iter 1/9 - loss 0.01043281 - samples/sec: 63.29
2019-10-25 13:17:10,760 epoch 62 - iter 2/9 - loss 0.01103168 - samples/sec: 61.02
2019-10-25 13:17:11,356 epoch 62 - iter 3/9 - loss 0.00909348 - samples/sec: 70.99
2019-10-25 13:17:12,071 epoch 62 - iter 4/9 - loss 0.00983108 -

{'test_score': 0.9762,
 'dev_score_history': [0.439,
  0.5366,
  0.5122,
  0.7805,
  0.6829,
  0.7561,
  0.6829,
  0.7805,
  0.8537,
  0.7317,
  0.8537,
  0.7073,
  0.7561,
  0.7561,
  0.7805,
  0.5854,
  0.7317,
  0.8293,
  0.6829,
  0.9024,
  0.6829,
  0.9024,
  0.9024,
  0.9756,
  0.9268,
  0.6341,
  0.9756,
  0.9756,
  0.9756,
  0.9756,
  0.9756,
  0.8293,
  0.9024,
  0.9756,
  0.9756,
  0.9756,
  0.9756,
  0.878,
  0.9756,
  0.9756,
  0.9756,
  0.9756,
  0.9756,
  0.9756,
  0.9756,
  0.9756,
  0.9756,
  0.9756,
  0.9756,
  0.9756,
  0.9756,
  0.9756,
  0.9756,
  0.9756,
  0.9756,
  0.9756,
  0.9756,
  0.9756,
  0.9756,
  0.9756,
  0.9756,
  0.9756,
  0.9756,
  0.9756],
 'train_loss_history': [0.6774369478225708,
  0.6550139453676012,
  0.6157231264644198,
  0.5907341837882996,
  0.5360442863570319,
  0.4996987647480435,
  0.4695712625980377,
  0.5170105596383413,
  0.4881509128544066,
  0.35877754622035557,
  0.4238038924005296,
  0.31994590825504726,
  0.3754449569516712,
  0.205

Here we got an F1 score of 0.976 on the test set, which is better than the scores of the classic methods (PART 1). However, here we evaluated on only a single split instead of all 10 folds.
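Because only one small split is evaluated, the score is coarse. The sketch below reproduces the 8:1:1 split boundaries for 416 documents (the total implied by the "333 train + 41 dev + 42 test" line in the log) and shows how much a single test document moves the score, assuming the reported score behaves like accuracy for this single-label task.

```python
n_docs = 416    # 333 + 41 + 42, taken from the training log

# Same boundary rule as in the split loop: round(frac * number_of_documents).
bounds = [round(f * n_docs) for f in (0.8, 0.9, 1.0)]
sizes = [hi - lo for lo, hi in zip([0] + bounds[:-1], bounds)]
print(sizes)    # train, dev and test sizes

# With so few test documents, each one changes the score noticeably.
n_test = sizes[2]
step = 1 / n_test
one_error = (n_test - 1) / n_test
print(round(step, 4), round(one_error, 4))
```

Under that assumption, a test score of 0.9762 is what a single misclassified document out of 42 would give, so differences of a few points between methods should be read with care.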

After training, we can load the model and use it to make predictions for new texts.

In [5]:
# Finally, we test the model with a couple of text segments
classifier = TextClassifier.load(data_folder+r"flair_temp\final-model.pt")

# create and test example sentences
from flair.data import Sentence
sentence = Sentence('Jälleenhankintahinnalla tarkoitetaan omaisuuden uushankintahintaa . Tällöin voidaan viitata siihen , mihin hintaan esimerkiksi tietty kone tai laite voitaisiin korvata hankkimalla se markkinoilta tänä päivänä .')
classifier.predict(sentence)
print('Predicted class for text "%s"\n%s' % (sentence.to_plain_string(),sentence.labels))

sentence = Sentence('Elinluovutuskortti antaa luvan käyttää kortin täyttäneen henkilön elimiä ja kudoksia kuoleman jälkeen toisten henkilöiden hengen pelastamiseksi tai terveyden parantamiseksi . Vuoden 2010 lakimuutoksen jälkeen kortti ei enää ole välttämätön , sillä nykyisin oletetaan vainajan suostuneen elintensä luovutukseen , ellei hänen tiedetä sitä elinaikanaan erityisesti kieltäneen . Elinluovutuskortin mukana kantaminen on edelleen hyvä varmistaa tahtonsa toteutuminen . Suomalaisista noin 18 prosenttia on allekirjoittanut elinluovutuskortin .')
classifier.predict(sentence)
print('Predicted class for text "%s"\n%s' % (sentence.to_plain_string(),sentence.labels))

2019-10-25 13:17:51,078 loading file C:\Users\h01928\Documents\GIT_codes\NLP_kickstart_tutorial\\flair_temp\final-model.pt
Predicted class for text "Jälleenhankintahinnalla tarkoitetaan omaisuuden uushankintahintaa . Tällöin voidaan viitata siihen , mihin hintaan esimerkiksi tietty kone tai laite voitaisiin korvata hankkimalla se markkinoilta tänä päivänä ."
[TALOUS (0.999728262424469)]
Predicted class for text "Elinluovutuskortti antaa luvan käyttää kortin täyttäneen henkilön elimiä ja kudoksia kuoleman jälkeen toisten henkilöiden hengen pelastamiseksi tai terveyden parantamiseksi . Vuoden 2010 lakimuutoksen jälkeen kortti ei enää ole välttämätön , sillä nykyisin oletetaan vainajan suostuneen elintensä luovutukseen , ellei hänen tiedetä sitä elinaikanaan erityisesti kieltäneen . Elinluovutuskortin mukana kantaminen on edelleen hyvä varmistaa tahtonsa toteutuminen . Suomalaisista noin 18 prosenttia on allekirjoittanut elinluovutuskortin ."
[TERVEYS (0.9936586022377014)]
