<a href="https://colab.research.google.com/github/sauravsingla/General/blob/master/Policyholders_Trade_Classification.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
#Import packages
import csv
import pandas as pd
import numpy as np

%matplotlib inline
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import matplotlib as mpl

In [None]:
#Assign column names
colnames=['id', 'home', 'about', 'what','services','strategy','trade'] 

In [None]:
#Reading CSV file
train_df = pd.read_csv('training_dataset.csv', sep=',',  names=colnames, skiprows=0, nrows=24838)

In [None]:
#Reading CSV file
test_df = pd.read_csv('test_dataset.csv', sep=',',  names=colnames, skiprows=0, nrows=6203)

In [None]:
#Drop row 0: the file's header line, which was read as data because names= was supplied
train_df = train_df.drop(0)

In [None]:
#Drop row 0: the file's header line, which was read as data because names= was supplied
test_df = test_df.drop(0)

In [None]:
#Checking top 5 rows
test_df.head(5)

Unnamed: 0,id,home,about,what,services,strategy,trade
1,51564,It appears that your cart is currently empty!....,,,,,Beauty
2,42840,Helping You Feel Beautiful. We provide you wit...,,,,,Beauty
3,42943,SALE - Up to 35% off Selected Collections.* Le...,SALE - Up to 35% off Selected Collections.* Le...,,,,Beauty
4,42881,Discover L'Occitane's most loved products made...,"At L'OCCITANE, we show off the very best Prove...",,,,Beauty
5,43083,You have no items in your shopping basket.. FR...,You have no items in your shopping basket.. FR...,,,,Beauty


In [None]:
#Checking the quality of the dataset
test_df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 6202 entries, 1 to 6202
Data columns (total 7 columns):
id          6202 non-null object
home        6202 non-null object
about       2246 non-null object
what        99 non-null object
services    565 non-null object
strategy    23 non-null object
trade       6202 non-null object
dtypes: object(7)
memory usage: 387.6+ KB


In [None]:
#Checking the dimension of the dataset
train_df.shape

(24837, 7)

In [None]:
#Checking the count of the attributes
display(train_df.describe())

Unnamed: 0,id,home,about,what,services,strategy,trade
count,24837,24837,8775,425,2085,75,24837
unique,24837,23560,8153,399,1860,66,11
top,14678,Never be without your favourite Boots products...,See what the NHS offers. Choose the right serv...,The Trussell Trust partners with local communi...,Never be without your favourite Boots products...,Learn about our specialist programmes for supp...,Tradesmen
freq,1,89,96,5,70,3,5895


*Tradesmen is the majority class (5,895 training rows). Film And Video Production (125 rows) and Beauty (164 rows) are the smallest classes. I do not undersample the majority class or oversample the minority classes here.*

In [None]:
#checking the distribution of the Target variable in training dataset
train_df['trade'].value_counts()

Tradesmen                                     5895
Health, Beauty And Complementary Therapies    5059
Investment And Financial Services             4642
Real Estate Agent                             3645
It & Technology                               2262
Hr And Recruitment                             966
Marketing                                      780
Insurance Brokers                              743
Photography                                    556
Beauty                                         164
Film And Video Production                      125
Name: trade, dtype: int64
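
*The counts above show a strong class imbalance. No resampling is performed in this notebook, but the sketch below shows the weight each class would receive under the common "balanced" heuristic: the number of samples divided by the product of the number of classes and the class count. It is an illustration only; these weights are not used by any model in this notebook.*

```python
# Class counts copied from the training value_counts() output above
class_counts = {
    'Tradesmen': 5895,
    'Health, Beauty And Complementary Therapies': 5059,
    'Investment And Financial Services': 4642,
    'Real Estate Agent': 3645,
    'It & Technology': 2262,
    'Hr And Recruitment': 966,
    'Marketing': 780,
    'Insurance Brokers': 743,
    'Photography': 556,
    'Beauty': 164,
    'Film And Video Production': 125,
}

n_samples = sum(class_counts.values())
n_classes = len(class_counts)

# "Balanced" heuristic: rare classes get weights above 1, frequent ones below 1
balanced_weights = {
    label: n_samples / (n_classes * count)
    for label, count in class_counts.items()
}

for label, weight in sorted(balanced_weights.items(), key=lambda kv: kv[1]):
    print(f"{label:45s} {weight:6.2f}")
```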

In [None]:
#checking the distribution of the Target variable in test dataset
test_df['trade'].value_counts()

Tradesmen                                     1472
Health, Beauty And Complementary Therapies    1264
Investment And Financial Services             1160
Real Estate Agent                              911
It & Technology                                565
Hr And Recruitment                             241
Marketing                                      194
Insurance Brokers                              185
Photography                                    139
Beauty                                          40
Film And Video Production                       31
Name: trade, dtype: int64

*The about, what, services, and strategy columns are mostly missing (for example, about is present in only 8,775 of 24,837 training rows), so I drop them. The id column is dropped as well.*

In [None]:
train_df = train_df.drop(train_df.columns[[0, 2, 3,4,5]], axis=1)  

In [None]:
test_df = test_df.drop(test_df.columns[[0, 2, 3,4,5]], axis=1)  
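
*Dropping columns by position gives the wrong result without any error if the column order ever changes. A minimal sketch of the same step by column name, using a made-up one-row frame (`toy_df`, hypothetical) with the notebook's column names:*

```python
import pandas as pd

# Hypothetical one-row frame with the same columns as the notebook's data
toy_df = pd.DataFrame([{
    'id': 1,
    'home': 'Custom carpentry and joinery.',
    'about': None,
    'what': None,
    'services': None,
    'strategy': None,
    'trade': 'Tradesmen',
}])

# Positions [0, 2, 3, 4, 5] in the notebook correspond to these names
columns_to_drop = ['id', 'about', 'what', 'services', 'strategy']
toy_df = toy_df.drop(columns=columns_to_drop)

print(list(toy_df.columns))
```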

*Home is the independent variable and trade is the dependent variable*

In [None]:
display(train_df.describe())

Unnamed: 0,home,trade
count,24837,24837
unique,23560,11
top,Never be without your favourite Boots products...,Tradesmen
freq,89,5895


In [None]:
display(test_df.describe())

Unnamed: 0,home,trade
count,6202,6202
unique,6016,11
top,Never be without your favourite Boots products...,Tradesmen
freq,25,1472


In [None]:
train_df.head()

Unnamed: 0,home,trade
1,The home also provides specialist care for old...,"Health, Beauty And Complementary Therapies"
2,We believe in forming a long term partnership ...,Investment And Financial Services
3,Did you know that 4 in 5 people prefer website...,"Health, Beauty And Complementary Therapies"
4,Call Reporting & Sales Management Systems. Off...,It & Technology
5,"For more information on custom carpentry, and ...",Tradesmen


In [None]:
#Feature extraction: convert the home text into a token-count matrix
from sklearn.feature_extraction.text import CountVectorizer
count_vect = CountVectorizer()
X_train_counts = count_vect.fit_transform(train_df.home)
X_train_counts.shape

(24837, 87906)

In [None]:
#converting vector to TFIDF
from sklearn.feature_extraction.text import TfidfTransformer
tfidf_transformer = TfidfTransformer()
X_train_tfidf = tfidf_transformer.fit_transform(X_train_counts)
X_train_tfidf.shape

(24837, 87906)
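
*To make the TF-IDF step concrete, the sketch below recomputes it by hand on three made-up sentences and compares the result with `TfidfTransformer`. It assumes scikit-learn's default settings: smoothed IDF, computed as ln((1 + n) / (1 + df)) + 1, followed by L2 normalization of each row.*

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer

# Made-up documents, used only for illustration
docs = [
    "roof repair and roof tiles",
    "hair and beauty salon",
    "roof and loft insulation",
]

counts_sparse = CountVectorizer().fit_transform(docs)
counts = counts_sparse.toarray()

n_docs = counts.shape[0]
doc_freq = (counts > 0).sum(axis=0)            # documents containing each term
idf = np.log((1 + n_docs) / (1 + doc_freq)) + 1  # smoothed IDF

weighted = counts * idf
manual_tfidf = weighted / np.linalg.norm(weighted, axis=1, keepdims=True)

sklearn_tfidf = TfidfTransformer().fit_transform(counts_sparse).toarray()
print(np.allclose(manual_tfidf, sklearn_tfidf))
```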

*Multinomial Naive Bayes reaches 64% accuracy on the test dataset. Accuracy improves slightly to 66% after removing English stop words and to 69% after adding stemming (the stemmed model also sets fit_prior=False).*

In [None]:
from sklearn.naive_bayes import MultinomialNB
clf = MultinomialNB().fit(X_train_tfidf, train_df.trade)

In [None]:
from sklearn.pipeline import Pipeline
text_clf = Pipeline([('vect', CountVectorizer()), ('tfidf', TfidfTransformer()), ('clf', MultinomialNB()),])
text_clf = text_clf.fit(train_df.home, train_df.trade)

In [None]:
import numpy as np
predicted = text_clf.predict(test_df.home)
np.mean(predicted == test_df.trade)

0.6439858110287005

In [None]:
from sklearn.pipeline import Pipeline
text_clf = Pipeline([('vect', CountVectorizer(stop_words='english')), ('tfidf', TfidfTransformer()),('clf', MultinomialNB()), ])

In [None]:
text_clf = text_clf.fit(train_df.home, train_df.trade)

In [None]:
import numpy as np
predicted = text_clf.predict(test_df.home)
np.mean(predicted == test_df.trade)

0.6636568848758465

In [None]:
#Importing nltk for stopwords
import nltk
nltk.download('stopwords')

[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Unzipping corpora/stopwords.zip.


True

In [None]:
from nltk.stem.snowball import SnowballStemmer
stemmer = SnowballStemmer("english", ignore_stopwords=True)
class StemmedCountVectorizer(CountVectorizer):
    def build_analyzer(self):
        analyzer = super(StemmedCountVectorizer, self).build_analyzer()
        return lambda doc: ([stemmer.stem(w) for w in analyzer(doc)])
stemmed_count_vect = StemmedCountVectorizer(stop_words='english')
text_mnb_stemmed = Pipeline([('vect', stemmed_count_vect), ('tfidf', TfidfTransformer()), ('mnb', MultinomialNB(fit_prior=False)), ])
text_mnb_stemmed = text_mnb_stemmed.fit(train_df.home, train_df.trade)
predicted_mnb_stemmed = text_mnb_stemmed.predict(test_df.home)
np.mean(predicted_mnb_stemmed == test_df.trade)

0.6891325378910029
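
*The stemming step works by overriding `build_analyzer`, so every token produced by the standard analyzer passes through a stemmer before it is counted. The sketch below shows the same pattern with a deliberately trivial stand-in stemmer (`toy_stem`, hypothetical, strips a trailing "s") so that it runs without NLTK data. It is not a replacement for the Snowball stemmer used above.*

```python
from sklearn.feature_extraction.text import CountVectorizer

def toy_stem(word):
    # Hypothetical stand-in for a real stemmer: strips one trailing "s"
    return word[:-1] if word.endswith("s") and len(word) > 3 else word

class ToyStemmedCountVectorizer(CountVectorizer):
    def build_analyzer(self):
        # Reuse the standard analyzer (lowercasing, tokenizing, stop words)
        analyzer = super().build_analyzer()
        return lambda doc: [toy_stem(token) for token in analyzer(doc)]

vect = ToyStemmedCountVectorizer(stop_words='english')
X = vect.fit_transform(["roofing services", "roofing service"])

# Both documents map to the same two features after stemming
vocab = sorted(vect.vocabulary_)
print(vocab)
print(X.toarray())
```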

*A linear Support Vector Machine (SGDClassifier with hinge loss) reaches 76.8% accuracy on the test dataset. Removing stop words improves this slightly to 77.2%, and adding stemming leaves it essentially unchanged at 77.1%.*

In [None]:
from sklearn.linear_model import SGDClassifier
import numpy as np1
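#Newer scikit-learn releases no longer accept n_iter; max_iter=5 with tol=None runs the same 5 epochs
text_clf_svm = Pipeline([('vect', CountVectorizer()), ('tfidf', TfidfTransformer()), ('clf-svm', SGDClassifier(loss='hinge', penalty='l2', alpha=1e-3, max_iter=5, tol=None, random_state=42)),])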
text_clf_svm= text_clf_svm.fit(train_df.home, train_df.trade)
predicted_svm = text_clf_svm.predict(test_df.home)
np1.mean(predicted_svm == test_df.trade)



0.7679780715898097

In [None]:
from sklearn.linear_model import SGDClassifier
import numpy as np1
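#Newer scikit-learn releases no longer accept n_iter; max_iter=5 with tol=None runs the same 5 epochs
text_clf_svm = Pipeline([('vect', CountVectorizer(stop_words='english')), ('tfidf', TfidfTransformer()), ('clf-svm', SGDClassifier(loss='hinge', penalty='l2', alpha=1e-3, max_iter=5, tol=None, random_state=42)),])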
text_clf_svm= text_clf_svm.fit(train_df.home, train_df.trade)
predicted_svm = text_clf_svm.predict(test_df.home)
np1.mean(predicted_svm == test_df.trade)



0.77184779103515

In [None]:
from nltk.stem.snowball import SnowballStemmer
stemmer = SnowballStemmer("english", ignore_stopwords=True)
class StemmedCountVectorizer(CountVectorizer):
    def build_analyzer(self):
        analyzer = super(StemmedCountVectorizer, self).build_analyzer()
        return lambda doc: ([stemmer.stem(w) for w in analyzer(doc)])
stemmed_count_vect = StemmedCountVectorizer(stop_words='english')
#Stemmed pipeline with the linear SVM (max_iter=5, tol=None replaces n_iter=5, which newer scikit-learn releases no longer accept)
text_svm_stemmed = Pipeline([('vect', stemmed_count_vect), ('tfidf', TfidfTransformer()), ('clf-svm', SGDClassifier(loss='hinge', penalty='l2', alpha=1e-3, max_iter=5, tol=None, random_state=42)), ])
text_svm_stemmed = text_svm_stemmed.fit(train_df.home, train_df.trade)
predicted_svm_stemmed = text_svm_stemmed.predict(test_df.home)
np.mean(predicted_svm_stemmed == test_df.trade)



0.7712028377942599
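
*All scores above are plain accuracy. With classes as unbalanced as these, accuracy can look reasonable even when a small class is never predicted. The sketch below uses made-up labels (not the notebook's predictions) to show how balanced accuracy, the mean of per-class recall, exposes that case.*

```python
from sklearn.metrics import accuracy_score, balanced_accuracy_score

# Made-up labels: a classifier that always predicts the majority class
y_true = ['Tradesmen'] * 8 + ['Beauty'] * 2
y_pred = ['Tradesmen'] * 10

acc = accuracy_score(y_true, y_pred)
bal_acc = balanced_accuracy_score(y_true, y_pred)

# Accuracy rewards the majority class; balanced accuracy averages recall per class
print(f"accuracy: {acc:.2f}, balanced accuracy: {bal_acc:.2f}")
```

*The same two calls could be applied to `predicted_svm` and `test_df.trade` to check how the small classes fare.*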

*Using Glove embedding with CNN for text classification*

In [None]:
!pip3 install deeppavlov



In [None]:
from deeppavlov.dataset_readers.basic_classification_reader import BasicClassificationDatasetReader

In [None]:
#Read the test dataset; the model is trained on it instead of the full training set because of RAM constraints
dr = BasicClassificationDatasetReader().read(
    data_path='/content/',
    train='test_dataset.csv',
    x = 'home',
    y = 'trade'
)



In [None]:
#Checking train/valid/test sizes
[(k, len(dr[k])) for k in dr.keys()]

[('train', 6203), ('valid', 0), ('test', 0)]

In [None]:
from deeppavlov.dataset_iterators.basic_classification_iterator import BasicClassificationDatasetIterator

In [None]:
#Data iterator that splits the train field into train and valid in proportion 0.8/0.2
train_iterator = BasicClassificationDatasetIterator(
    data=dr,
    field_to_split='train',  # field that will be split
    split_fields=['train', 'valid'],   # fields into which the field above will be split
    split_proportions=[0.8, 0.2],  # proportions for splitting
    split_seed=23,  # seed for splitting dataset
    seed=42)  # seed for iteration over dataset

2019-04-15 10:17:27.450 INFO in 'deeppavlov.dataset_iterators.basic_classification_iterator'['basic_classification_iterator'] at line 73: Splitting field <<train>> to new fields <<['train', 'valid']>>
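
*As a rough picture of what a seeded 0.8/0.2 split does, here is a plain-Python sketch. It is not DeepPavlov's implementation, and its shuffling and rounding may differ from the iterator above, so the exact rows in each part will not match.*

```python
import random

n_samples = 6203  # size of the 'train' field reported by the reader above
indices = list(range(n_samples))

# A fixed seed makes the shuffle, and therefore the split, reproducible
random.Random(23).shuffle(indices)

n_train = int(0.8 * n_samples)
train_idx = indices[:n_train]
valid_idx = indices[n_train:]

print(len(train_idx), len(valid_idx))
```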


In [None]:
#Checking train instances 
x_train, y_train = train_iterator.get_instances(data_type='train')
for x, y in list(zip(x_train, y_train))[:5]:
    print('x:', x)
    print('y:', y)
    print('=================')

x: Quality Meat Scotland. Anuga 2013 - Cologne. 2 Ballyoran Lane, Belfast, BT16 1XJ. Phone: +44 (0) 028 9048 4999. Fax: +44 (0) 028 9048 0777
y: ['Marketing']
x: With 35 years experience in the Roofing Industry, D J Mackay is one of Cardiff’s oldest Roofing Companies.. Over the years I have learnt a vast range of skills and knowledge which I now pass onto my team.. All members of our team keep their skills up to date by regularly attending courses on the most recent roofing developments. We all take great pride in our work, this can be seen in the continuously high standards we produce with our work.. Safety is of importance to D J Mackay Roofing and the firm is fully insured with public liability insurance.. If you are searching for a company with over three decades of experience and a professional approach to Roofing – then look no further.. Don't panic- pick up the phone and give us a call… 0800 056 0743. © 2017 mackay. All Rights Reserved. Proudly made by Digital NRG Ltd
y: ['Trade

In [None]:
from deeppavlov.models.preprocessors.str_lower import StrLower

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.
[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Unzipping corpora/stopwords.zip.
[nltk_data] Downloading package perluniprops to /root/nltk_data...
[nltk_data]   Unzipping misc/perluniprops.zip.
[nltk_data] Downloading package nonbreaking_prefixes to
[nltk_data]     /root/nltk_data...
[nltk_data]   Unzipping corpora/nonbreaking_prefixes.zip.


In [None]:
#Lowercase the text
str_lower = StrLower()

In [None]:
from deeppavlov.models.tokenizers.nltk_moses_tokenizer import NLTKMosesTokenizer

In [None]:
#Initialize the tokenizer
tokenizer = NLTKMosesTokenizer()

In [None]:
#Tokenize and lowercase the training texts (the independent variable)
train_x_lower_tokenized = str_lower(tokenizer(train_iterator.get_instances(data_type='train')[0]))

In [None]:
from deeppavlov.core.data.simple_vocab import SimpleVocabulary

In [None]:
#Initialize a vocabulary to collect all classes that appear in the dataset
classes_vocab = SimpleVocabulary(
    save_path='./snips/classes.dict',
    load_path='./snips/classes.dict')

In [None]:
#Fit the class vocabulary on the training labels and save it
classes_vocab.fit((train_iterator.get_instances(data_type='train')[1]))
classes_vocab.save()

2019-04-15 10:17:44.798 INFO in 'deeppavlov.core.data.simple_vocab'['simple_vocab'] at line 89: [saving vocabulary to /content/snips/classes.dict]


In [None]:
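#The vocabulary has 12 entries for 11 target classes: 'Health, Beauty And Complementary Therapies' was split at its comma into two labels when the data was read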
list(classes_vocab.items())

[('Tradesmen', 0),
 ('Health', 1),
 (' Beauty And Complementary Therapies', 2),
 ('Investment And Financial Services', 3),
 ('Real Estate Agent', 4),
 ('It & Technology', 5),
 ('Hr And Recruitment', 6),
 ('Marketing', 7),
 ('Insurance Brokers', 8),
 ('Photography', 9),
 ('Beauty', 10),
 ('Film And Video Production', 11)]
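
*The vocabulary above lists 'Health' and ' Beauty And Complementary Therapies' (with a leading space) as separate entries, which is what splitting each label string on commas would produce. A plain-Python illustration of that effect, not DeepPavlov's actual code:*

```python
# Three of the labels from the dataset
labels = [
    "Tradesmen",
    "Health, Beauty And Complementary Therapies",
    "Beauty",
]

# Splitting on "," turns the single Health label into two labels
split_labels = [label.split(",") for label in labels]

# Collect unique labels in order of first appearance
vocab = []
for parts in split_labels:
    for part in parts:
        if part not in vocab:
            vocab.append(part)

print(split_labels[1])
print(len(labels), "original labels ->", len(vocab), "vocabulary entries")
```

*A consequence is that every sample of that class carries two labels, so the task is treated as multi-label further down.*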

In [None]:
#Vocabulary of tokens that appear at least twice in the dataset
token_vocab = SimpleVocabulary(
    save_path='./snips/tokens.dict',
    load_path='./snips/tokens.dict',
    min_freq=2,
    special_tokens=('<PAD>', '<UNK>',),
    unk_token='<UNK>')

In [None]:
token_vocab.fit(train_x_lower_tokenized)
token_vocab.save()

2019-04-15 10:17:47.390 INFO in 'deeppavlov.core.data.simple_vocab'['simple_vocab'] at line 89: [saving vocabulary to /content/snips/tokens.dict]


In [None]:
#Number of tokens in dictionary
len(token_vocab)

22205

In [None]:
#10 most common tokens and the number of times they appear
token_vocab.freqs.most_common()[:10]

[(',', 50996),
 ('.', 49078),
 ('and', 45842),
 ('the', 43201),
 ('to', 36893),
 ('of', 26755),
 ('a', 25600),
 ('..', 23927),
 ('in', 18527),
 ('we', 17874)]

In [None]:
token_ids = token_vocab(str_lower(tokenizer(['how are you'])))
token_ids

[[82, 18, 12]]

In [None]:
tokenizer(token_vocab(token_ids))

['how are you']
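
*The round trip above (tokens to ids and back) can be pictured with a small frequency-based vocabulary. The sketch below is a simplified stand-in for `SimpleVocabulary`, assuming it keeps tokens seen at least `min_freq` times and maps everything else to the unknown token. The toy corpus and the resulting ids are made up.*

```python
from collections import Counter

# Toy tokenized corpus (made up)
tokenized = [
    ["how", "are", "you"],
    ["how", "are", "things"],
    ["you", "are", "here"],
]

min_freq = 2
special_tokens = ["<PAD>", "<UNK>"]

freqs = Counter(token for doc in tokenized for token in doc)
kept = [token for token, count in freqs.most_common() if count >= min_freq]

index_to_token = special_tokens + kept
token_to_index = {token: i for i, token in enumerate(index_to_token)}

def encode(tokens):
    # Tokens below min_freq fall back to the <UNK> id
    unk_id = token_to_index["<UNK>"]
    return [token_to_index.get(token, unk_id) for token in tokens]

def decode(ids):
    return [index_to_token[i] for i in ids]

ids = encode(["how", "are", "things"])
print(ids, decode(ids))
```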

In [None]:
import numpy as np
from deeppavlov.models.embedders.bow_embedder import BoWEmbedder

In [None]:
#Initialize bag-of-words embedder giving total number of tokens
bow = BoWEmbedder(depth=token_vocab.len)
#Indexed tokenized samples
bow(token_vocab(str_lower(tokenizer(['how are you']))))

[array([0, 0, 0, ..., 0, 0, 0], dtype=int32)]

In [None]:
#All 3 tokens are in the vocabulary
sum(bow(token_vocab(str_lower(tokenizer(['how are you']))))[0])

3
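
*A bag-of-words vector has one position per vocabulary entry. The NumPy sketch below builds a count-based one for the ids `[82, 18, 12]` returned earlier for 'how are you', with the vocabulary size of 22205 taken from the output above. Whether `BoWEmbedder` stores counts or 0/1 flags is not shown in this notebook, so the counting here is an assumption.*

```python
import numpy as np

depth = 22205             # vocabulary size reported by len(token_vocab)
token_ids = [82, 18, 12]  # ids of 'how', 'are', 'you' from the cell above

bow_vector = np.zeros(depth, dtype=np.int32)
for token_id in token_ids:
    bow_vector[token_id] += 1  # count each occurrence of the token

# Three tokens in, three counts out; every other position stays 0
print(bow_vector.sum(), np.flatnonzero(bow_vector))
```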

In [None]:
from deeppavlov.models.sklearn import SklearnComponent

In [None]:
#Initialize TF-IDF vectorizer sklearn component with `transform` as infer method
tfidf = SklearnComponent(
    model_class="sklearn.feature_extraction.text:TfidfVectorizer",
    infer_method="transform",
    save_path='./tfidf_v0.pkl',
    load_path='./tfidf_v0.pkl',
    mode='train')

2019-04-15 10:17:47.590 INFO in 'deeppavlov.models.sklearn.sklearn_component'['sklearn_component'] at line 165: Initializing model sklearn.feature_extraction.text:TfidfVectorizer from scratch


In [None]:
#Fit on textual train instances and save it
tfidf.fit(str_lower(train_iterator.get_instances(data_type='train')[0]))
tfidf.save()

2019-04-15 10:17:54.748 INFO in 'deeppavlov.models.sklearn.sklearn_component'['sklearn_component'] at line 108: Fitting model sklearn.feature_extraction.text:TfidfVectorizer
2019-04-15 10:17:58.986 INFO in 'deeppavlov.models.sklearn.sklearn_component'['sklearn_component'] at line 240: Saving model to /content/tfidf_v0.pkl


In [None]:
#Number of tokens in the TF-IDF vocabulary
len(tfidf.model.vocabulary_)

35208

In [None]:
from deeppavlov.models.embedders.glove_embedder import GloVeEmbedder

paramiko missing, opening SSH/SCP/SFTP paths will be disabled.  `pip install paramiko` to suppress


In [None]:
from deeppavlov.core.data.utils import simple_download

In [None]:
#Downloading pretrained embedding Glove
simple_download(url="http://files.deeppavlov.ai/embeddings/glove.6B.100d.txt",destination="./glove.6B.100d.txt")

2019-04-15 10:18:00.384 INFO in 'deeppavlov.core.data.utils'['utils'] at line 63: Downloading from http://files.deeppavlov.ai/embeddings/glove.6B.100d.txt to glove.6B.100d.txt
347MB [00:28, 12.1MB/s]


In [None]:
embedder = GloVeEmbedder(load_path='./glove.6B.100d.txt', dim=100, pad_zero=True)

2019-04-15 10:18:29.161 INFO in 'deeppavlov.models.embedders.glove_embedder'['glove_embedder'] at line 52: [loading GloVe embeddings from `/content/glove.6B.100d.txt`]


In [None]:
# output shape is (batch_size x max_num_tokens_in_the_batch x embedding_dim)
embedded_batch = embedder(str_lower(tokenizer(['how are you']))) 
len(embedded_batch), len(embedded_batch[0]), embedded_batch[0][0].shape

(1, 3, (100,))

In [None]:
# output shape is (batch_size x embedding_dim)
embedded_batch = embedder(str_lower(tokenizer(['how are you'])), mean=True) 
len(embedded_batch), embedded_batch[0].shape

(1, (100,))

In [None]:
from deeppavlov.models.embedders.tfidf_weighted_embedder import TfidfWeightedEmbedder

In [None]:
weighted_embedder = TfidfWeightedEmbedder(
    embedder=embedder,  # our GloVe embedder instance
    tokenizer=tokenizer,  # our tokenizer instance
    mean=True,  # to return one vector per sample
    vectorizer=tfidf  # our TF-IDF vectorizer
)

In [None]:
# output shape is (batch_size x  embedding_dim)
embedded_batch = weighted_embedder(str_lower(tokenizer(['how are you']))) 
len(embedded_batch), embedded_batch[0].shape

(1, (100,))
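
*The difference between the plain mean and a TF-IDF-weighted mean of token vectors can be sketched in NumPy. The vectors and weights below are made up, and the normalization (dividing by the sum of the weights) is an assumption; `TfidfWeightedEmbedder` may normalize differently.*

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for the 100-dimensional GloVe vectors of 'how', 'are', 'you'
token_vectors = rng.normal(size=(3, 100))

# Hypothetical TF-IDF weights, one per token
tfidf_weights = np.array([0.2, 0.1, 0.7])

plain_mean = token_vectors.mean(axis=0)
weighted_mean = (tfidf_weights[:, None] * token_vectors).sum(axis=0) / tfidf_weights.sum()

# Either way, one vector per sample comes out
print(plain_mean.shape, weighted_mean.shape)
```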

*KerasClassificationModel on GloVe embeddings*

In [None]:
from deeppavlov.models.classifiers.keras_classification_model import KerasClassificationModel
from deeppavlov.models.preprocessors.one_hotter import OneHotter
from deeppavlov.models.classifiers.proba2labels import Proba2Labels

Using TensorFlow backend.


In [None]:
#Initialize `KerasClassificationModel` with a shallow-and-wide CNN (named here `cnn_model`)
cls = KerasClassificationModel(save_path="./cnn_model_v0", 
                               load_path="./cnn_model_v0", 
                               embedding_size=embedder.dim,
                               n_classes=classes_vocab.len,
                               model_name="cnn_model",
                               text_size=100, # number of tokens
                               kernel_sizes_cnn=[3, 5, 7],
                               filters_cnn=128,
                               dense_size=100,
                               optimizer="Adam",
                               learning_rate=0.1,
                               learning_rate_decay=0.01,
                               loss="categorical_crossentropy")

2019-04-15 10:19:16.89 INFO in 'deeppavlov.models.classifiers.keras_classification_model'['keras_classification_model'] at line 244: [initializing `KerasClassificationModel` from scratch as cnn_model]


Instructions for updating:
Colocations handled automatically by placer.


2019-04-15 10:19:16.868 INFO in 'deeppavlov.models.classifiers.keras_classification_model'['keras_classification_model'] at line 134: Model was successfully initialized!
Model summary:
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
input_1 (InputLayer)            (None, 100, 100)     0                                            
__________________________________________________________________________________________________
conv1d_1 (Conv1D)               (None, 100, 128)     38528       input_1[0][0]                    
__________________________________________________________________________________________________
conv1d_2 (Conv1D)               (None, 100, 128)     64128       input_1[0][0]                    
__________________________________________________________________________________________________
conv1d_3 (Conv1D)      
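
*The two parameter counts visible in the model summary above can be checked by hand. A Conv1D layer has kernel size times input channels times filters weights, plus one bias per filter. The input has 100 channels (the embedding size) and each layer has 128 filters.*

```python
def conv1d_params(kernel_size, in_channels, filters):
    # One weight per (kernel position, input channel, filter) plus one bias per filter
    return kernel_size * in_channels * filters + filters

embedding_size = 100  # embedder.dim
filters_cnn = 128

for kernel_size in [3, 5, 7]:
    n_params = conv1d_params(kernel_size, embedding_size, filters_cnn)
    print(f"kernel size {kernel_size}: {n_params} parameters")
```

*Kernel sizes 3 and 5 reproduce the 38528 and 64128 shown for conv1d_1 and conv1d_2. The same formula gives 89728 for kernel size 7.*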

In [None]:
#KerasClassificationModel expects a one-hot class distribution per sample.
#OneHotter converts class indices to one-hot vectors.
#To obtain the indices we can use the `classes_vocab` initialized and fitted above
#Feature Engineering
onehotter = OneHotter(depth=classes_vocab.len, single_vector=True)
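
*With `single_vector=True`, each sample ends up as a single vector even when it carries several labels. This matters here because samples of the split Health class have two class indices (1 and 2 in the vocabulary above). A NumPy sketch of that encoding, written independently of DeepPavlov's `OneHotter`:*

```python
import numpy as np

def multi_hot(batch_indices, depth):
    # One row per sample, with a 1.0 at every class index the sample carries
    encoded = np.zeros((len(batch_indices), depth), dtype=np.float32)
    for row, indices in enumerate(batch_indices):
        encoded[row, indices] = 1.0
    return encoded

depth = 12  # number of entries in classes_vocab
batch_indices = [[0], [1, 2], [7]]  # Tradesmen; Health + Beauty And ...; Marketing

encoded = multi_hot(batch_indices, depth)
print(encoded)
```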

In [None]:
#Train for 10 epochs
for ep in range(10):
    for x, y in train_iterator.gen_batches(batch_size=64, 
                                           data_type="train"):
        x_embed = embedder(tokenizer(str_lower(x)))
        y_onehot = onehotter(classes_vocab(y))
        cls.train_on_batch(x_embed, y_onehot)

Instructions for updating:
Use tf.cast instead.
Instructions for updating:
Deprecated in favor of operator or tf.math.divide.


In [None]:
#Save model weights and parameters
cls.save()

2019-04-15 10:23:34.399 INFO in 'deeppavlov.models.classifiers.keras_classification_model'['keras_classification_model'] at line 373: [saving model to /content/cnn_model_v0_opt.json]


In [None]:
from deeppavlov.metrics.accuracy import sets_accuracy

In [None]:
#Get all train and valid data from iterator
x_train, y_train = train_iterator.get_instances(data_type="train")
x_valid, y_valid = train_iterator.get_instances(data_type="valid")

In [None]:
#Inference on the validation data returns a probability distribution over classes for each sample.
y_valid_pred = cls(embedder(tokenizer(str_lower(x_valid))))

In [None]:
#To convert a probability distribution to labels,
#we first convert probabilities to indices,
#and then use the vocabulary `classes_vocab` to convert indices to labels.
#`Proba2Labels` converts probabilities to indices and supports three different modes:
#if `max_proba` is true, returns the indices of the highest probabilities
#if `confident_threshold` is given, returns the indices with probabilities higher than the threshold
#if `top_n` is given, returns the `top_n` indices with the highest probabilities
prob2labels = Proba2Labels(max_proba=True)
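
*The three modes described in the comments above can be sketched in NumPy on a made-up batch of two probability rows. This illustrates the described behavior and is not DeepPavlov's implementation.*

```python
import numpy as np

# Made-up probabilities for 2 samples and 3 classes
probs = np.array([
    [0.10, 0.70, 0.20],
    [0.60, 0.30, 0.55],
])

# max_proba: index of the single most probable class per sample
max_indices = probs.argmax(axis=1).tolist()

# confident_threshold: every index whose probability exceeds the threshold
threshold = 0.5
threshold_indices = [np.flatnonzero(row > threshold).tolist() for row in probs]

# top_n: the n most probable indices per sample, most probable first
top_n = 2
top_n_indices = np.argsort(probs, axis=1)[:, ::-1][:, :top_n].tolist()

print(max_indices, threshold_indices, top_n_indices)
```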

In [None]:
#Looking into obtained result
print("Text sample: {}".format(x_valid[0]))
print("True label: {}".format(y_valid[0]))
print("Predicted probability distribution: {}".format(dict(zip(classes_vocab.keys(), 
                                                               y_valid_pred[0]))))
print("Predicted label: {}".format(classes_vocab(prob2labels(y_valid_pred))[0]))

Text sample: Just another WordPress site. Welcome to WordPress. This is your first post. Edit or delete it, then start writing!
True label: ['Tradesmen']
Predicted probability distribution: {'Tradesmen': 0.03927648067474365, 'Health': 0.0009163916110992432, ' Beauty And Complementary Therapies': 0.00106048583984375, 'Investment And Financial Services': 0.024354279041290283, 'Real Estate Agent': 0.08466535806655884, 'It & Technology': 0.9965453147888184, 'Hr And Recruitment': 0.0005005896091461182, 'Marketing': 0.004743725061416626, 'Insurance Brokers': 0.0004550516605377197, 'Photography': 0.01750969886779785, 'Beauty': 0.021246731281280518, 'Film And Video Production': 0.0035310983657836914}
Predicted label: ['It & Technology']


*Because of RAM constraints, I trained the CNN on the test dataset only, split 80/20 into training and validation sets. The model reaches about 45% accuracy on the validation split. The small number of training observations is one likely reason for the low accuracy; another is that only 100 tokens per text are fed to the CNN.*

In [None]:
#Calculate sets accuracy
sets_accuracy(y_valid, classes_vocab(prob2labels(y_valid_pred)))

0.44883158742949236
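
*The score above compares label sets rather than single labels. The sketch below assumes a sample counts as correct only when its predicted set of labels equals the true set exactly, and the labels are made up. Under that assumption, predicting only one of the two Health labels counts as an error.*

```python
def set_accuracy(y_true, y_pred):
    # Fraction of samples whose predicted label set equals the true label set
    matches = [set(true) == set(pred) for true, pred in zip(y_true, y_pred)]
    return sum(matches) / len(matches) if matches else 0.0

# Made-up labels in the same list-of-lists form used by the iterator
y_true = [['Tradesmen'], ['Health', ' Beauty And Complementary Therapies'], ['Marketing']]
y_pred = [['Tradesmen'], ['Health'], ['It & Technology']]

score = set_accuracy(y_true, y_pred)
print(round(score, 3))
```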