# Deep Learning using TensorFlow

## Data Preparation and Environment Setup

A second round of data preparation was needed, and the number of classes was reduced.

Details are in this [notebook](https://github.com/niteeshhegde/classified-ad-demand/blob/master/data-preprocessing/dataprep-pandas.ipynb).

Google Colab offers free GPUs, and users can run two sessions at a time.

This notebook was run on Colab with Python 3 and a GPU runtime with 36 GiB of RAM.

Text data from the title and description fields is also taken into consideration here.

In this TensorFlow model, Russian word2vec vectors are used for the text embeddings.

Authenticate to GCP

In [None]:
from google.colab import auth

In [None]:
auth.authenticate_user()

In [None]:
!gcloud config set project skilful-orb-255314

Updated property [core/project].


Copy the dataset 

In [None]:
!gsutil cp gs://dataproc-e3bd1f7b-2e29-4da6-a5c4-077c164fd32a-us-central1/avito/test/three_class_model_train_param_title_desc_params.csv /train.csv

Copying gs://dataproc-e3bd1f7b-2e29-4da6-a5c4-077c164fd32a-us-central1/avito/test/three_class_model_train_param_title_desc_params.csv...
/ [1 files][712.8 MiB/712.8 MiB]                                                
Operation completed over 1 objects/712.8 MiB.                                    


Copy Russian Word2Vec Vectors

In [None]:
!gsutil cp gs://dataproc-e3bd1f7b-2e29-4da6-a5c4-077c164fd32a-us-central1/avito/wiki.ru.vec /wiki.ru.vec

Copying gs://dataproc-e3bd1f7b-2e29-4da6-a5c4-077c164fd32a-us-central1/avito/wiki.ru.vec...
/ [1 files][  4.6 GiB/  4.6 GiB]   50.2 MiB/s                                   
Operation completed over 1 objects/4.6 GiB.                                      


In [None]:
!pip install gcsfs

Collecting gcsfs
  Downloading https://files.pythonhosted.org/packages/ce/5c/bc61dbd2e5b61d84486a96a64ca43512c9ac085487464562182f58406290/gcsfs-0.6.2-py2.py3-none-any.whl
Installing collected packages: gcsfs
Successfully installed gcsfs-0.6.2


In [None]:
import pandas as pd
import numpy as np
import gcsfs

In [None]:
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn import preprocessing
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import classification_report
from sklearn.metrics import f1_score
from sklearn.utils import resample
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import confusion_matrix 
import time



Get embeddings from word vectors

In [None]:
def get_coefs(word, *arr): 
    return word, np.asarray(arr, dtype='float32')

embeddings_index = dict(get_coefs(*o.rstrip().rsplit(' ')) for o in open('/wiki.ru.vec'))
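The loading step above can be sketched on a tiny in-memory sample. This is a minimal illustration, not the notebook's code: the helper name `load_vectors`, the `expected_dim` parameter and the three-dimensional sample are assumptions, and the sketch assumes the vector file may start with a `<vocab_size> <dim>` header line that should be skipped.

```python
import io
import numpy as np

def load_vectors(lines, expected_dim=None):
    """Parse word-vector text lines into a {word: vector} dict (hypothetical helper)."""
    index = {}
    for line_no, line in enumerate(lines):
        parts = line.rstrip().split(' ')
        # Assumption: an optional first line holds "<vocab_size> <dim>"; skip it.
        if line_no == 0 and len(parts) == 2:
            continue
        word, values = parts[0], parts[1:]
        # Skip rows whose length does not match the expected dimensionality.
        if expected_dim is not None and len(values) != expected_dim:
            continue
        index[word] = np.asarray(values, dtype='float32')
    return index

# Hypothetical three-dimensional sample standing in for /wiki.ru.vec.
sample = io.StringIO("2 3\nдом 0.1 0.2 0.3\nстол 0.4 0.5 0.6\n")
vectors = load_vectors(sample, expected_dim=3)
print(sorted(vectors), vectors["дом"].shape)
```

Passing `expected_dim=300` would match the 300-dimensional vectors used later in this notebook.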

In [None]:
df_train = pd.read_csv('/train.csv')
df_train.head()

Unnamed: 0,region_en,category_name_en,parent_category_name_en,user_type,weekend,price,description,title,param_1,param_2,param_3,description_len,title_len,param_1_len,param_2_len,param_3_len,item_seq_number,image_present,image_top_1,deal_class_5
0,Sverdlovsk oblast,Children's products and toys,Personal belongings,Private,0,400.0,"Кокон для сна малыша,пользовались меньше месяц...",Кокоби(кокон для сна),Постельные принадлежности,,,7,3,2,0,0,2,False,1008.0,Poor
1,Samara oblast,Furniture and interior,For the home and garden,Private,1,3000.0,"Стойка для одежды, под вешалки. С бутика.",Стойка для Одежды,Другое,,,7,3,1,0,0,19,False,692.0,Poor
2,Rostov oblast,Audio and video,Consumer electronics,Private,1,4000.0,"В хорошем состоянии, домашний кинотеатр с blu ...",Philips bluray,"Видео, DVD и Blu-ray плееры",,,17,2,5,0,0,9,False,3032.0,Okay
3,Tatarstan,Children's products and toys,Personal belongings,Company,0,2200.0,Продам кресло от0-25кг,Автокресло,Автомобильные кресла,,,3,1,2,0,0,286,False,796.0,Good
4,Volgograd oblast,Cars,Transport,Private,0,40000.0,Все вопросы по телефону.,"ВАЗ 2110, 2003",С пробегом,ВАЗ (LADA),2110.0,4,3,2,2,1,3,False,2264.0,Poor


In [None]:
X = df_train[['region_en','category_name_en','user_type','weekend','price','description','description_len','title','title_len','param_1_len','param_2_len','param_3_len','param_1','param_2','param_3']]
y = df_train[['deal_class_5']]
X_enc = pd.get_dummies(X, columns=['region_en','user_type','category_name_en'], drop_first = True)
X_enc.head()

Unnamed: 0,weekend,price,description,description_len,title,title_len,param_1_len,param_2_len,param_3_len,param_1,param_2,param_3,region_en_Bashkortostan,region_en_Belgorod oblast,region_en_Chelyabinsk oblast,region_en_Irkutsk oblast,region_en_Kaliningrad oblast,region_en_Kemerovo oblast,region_en_Khanty-Mansi Autonomous Okrug,region_en_Krasnodar Krai,region_en_Krasnoyarsk Krai,region_en_Nizhny Novgorod oblast,region_en_Novosibirsk oblast,region_en_Omsk oblast,region_en_Orenburg oblast,region_en_Perm Krai,region_en_Rostov oblast,region_en_Samara oblast,region_en_Saratov oblast,region_en_Stavropol Krai,region_en_Sverdlovsk oblast,region_en_Tatarstan,region_en_Tula oblast,region_en_Tyumen oblast,region_en_Udmurtia,region_en_Vladimir oblast,region_en_Volgograd oblast,region_en_Voronezh oblast,region_en_Yaroslavl oblast,user_type_Private,...,category_name_en_Cars,category_name_en_Cats,category_name_en_Children's clothing and shoes,category_name_en_Children's products and toys,"category_name_en_Clothing, shoes, accessories",category_name_en_Collecting,category_name_en_Commercial property,category_name_en_Desktop computers,category_name_en_Dogs,category_name_en_Equipment for business,category_name_en_Food,category_name_en_Furniture and interior,"category_name_en_Games, consoles and software",category_name_en_Garages and Parking spaces,category_name_en_Health and beauty,"category_name_en_Houses, villas, cottages",category_name_en_Hunting and fishing,category_name_en_Land,category_name_en_Laptops,category_name_en_Motorcycles and bikes,category_name_en_Musical instruments,category_name_en_Offer services,category_name_en_Office equipment and consumables,category_name_en_Other animals,category_name_en_Pet products,category_name_en_Phones,category_name_en_Photo,category_name_en_Plants,category_name_en_Products for computer,category_name_en_Property abroad,category_name_en_Ready business,category_name_en_Repair and construction,category_name_en_Room,category_name_en_Sports and recreation,category_name_en_Tablets and e-books,category_name_en_Tableware and goods for kitchen,category_name_en_Tickets and travel,category_name_en_Trucks and buses,category_name_en_Watches and jewelry,category_name_en_Water transport
0,0,400.0,"Кокон для сна малыша,пользовались меньше месяц...",7,Кокоби(кокон для сна),3,2,0,0,Постельные принадлежности,,,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,1,...,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,1,3000.0,"Стойка для одежды, под вешалки. С бутика.",7,Стойка для Одежды,3,1,0,0,Другое,,,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,1,...,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,1,4000.0,"В хорошем состоянии, домашний кинотеатр с blu ...",17,Philips bluray,2,5,0,0,"Видео, DVD и Blu-ray плееры",,,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,1,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,0,2200.0,Продам кресло от0-25кг,3,Автокресло,1,2,0,0,Автомобильные кресла,,,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,...,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,0,40000.0,Все вопросы по телефону.,4,"ВАЗ 2110, 2003",3,2,2,1,С пробегом,ВАЗ (LADA),2110.0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,1,...,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


## Stopwords Removal

In [None]:
import nltk
from nltk.corpus import stopwords
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
import re, string, timeit

In [None]:
nltk.download("stopwords")
stopWords = stopwords.words('russian')

[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Unzipping corpora/stopwords.zip.


Remove special characters, numbers, stopwords, and words shorter than 3 characters from the title and description

In [None]:
X_enc['description_non_stop'] = X_enc['description'].str.replace(r'\d+','', regex=True)
X_enc['description_non_stop'] = X_enc['description_non_stop'].apply(lambda x: ' '.join([re.sub(r'[^\w\s]',' ',word) for word in x.split()]))
X_enc['description_non_stop'] = X_enc['description_non_stop'].apply(lambda x: ' '.join([word.lower().strip() for word in x.split() if word.lower().strip() not in (stopWords) and len(word)>=3 ]))

In [None]:
X_enc['title_non_stop'] = X_enc['title'].str.replace(r'\d+','', regex=True)
X_enc['title_non_stop'] = X_enc['title_non_stop'].apply(lambda x: ' '.join([re.sub(r'[^\w\s]',' ',word) for word in x.split()]))
X_enc['title_non_stop'] = X_enc['title_non_stop'].apply(lambda x: ' '.join([word.lower().strip() for word in x.split() if word.lower().strip() not in (stopWords) and len(word)>=3 ]))
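The three cleaning steps applied to both columns can be folded into one function. The sketch below is an illustration using only the standard library; the name `clean_text`, the `min_len` parameter and the two-word stopword set are assumptions made for the example (the notebook uses the full NLTK Russian list).

```python
import re

def clean_text(text, stop_words, min_len=3):
    """Drop digits, punctuation, stopwords and short words (hypothetical helper)."""
    text = re.sub(r'\d+', '', text)        # remove digit runs
    text = re.sub(r'[^\w\s]', ' ', text)   # replace punctuation with spaces
    tokens = [word.lower() for word in text.split()]
    kept = [w for w in tokens if w not in stop_words and len(w) >= min_len]
    return ' '.join(kept)

# Tiny stand-in for stopwords.words('russian').
demo_stop_words = {"в", "от"}
cleaned = clean_text("Продам кресло от0-25кг, в хорошем состоянии!", demo_stop_words)
print(cleaned)
```

A function like this could be applied with `X_enc['title'].apply(lambda t: clean_text(t, set(stopWords)))`; using a `set` for the stopwords makes each membership test constant-time instead of scanning a list.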

## Resampling the data

In [None]:
from sklearn.utils import resample
from keras.preprocessing import text, sequence

Using TensorFlow backend.


In [None]:
X_train, X_test, y_train, y_test = train_test_split(X_enc, y, test_size = 0.20, random_state = 42, stratify=y)

In [None]:
LE = LabelEncoder()
# Fit the encoder on the training labels only, then reuse the same mapping for the test labels
y_train['deal_class_5'] = LE.fit_transform(y_train.deal_class_5)
y_test['deal_class_5'] = LE.transform(y_test.deal_class_5)
X_train['deal_class_5'] = y_train['deal_class_5']

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  


In [None]:
df_2 = X_train[X_train['deal_class_5']==2]
df_1= X_train[X_train['deal_class_5']==1]
df_0= X_train[X_train['deal_class_5']==0]
# Downsample the majority class (2) without replacement
df_2 = resample(df_2,
                replace=False,     # sample without replacement
                n_samples=700000,  # target size for every class
                random_state=123)  # reproducible results
# Upsample the two smaller classes with replacement to the same size
df_1 = resample(df_1,
                replace=True,      # sample with replacement
                n_samples=700000,
                random_state=123)
df_0 = resample(df_0,
                replace=True,      # sample with replacement
                n_samples=700000,
                random_state=123)
# Combine the three resampled classes into one balanced training set
df_downsampled = pd.concat([df_2, df_1, df_0])
 
# Display new class counts
df_downsampled.deal_class_5.value_counts()

2    700000
1    700000
0    700000
Name: deal_class_5, dtype: int64

In [None]:
y_train = df_downsampled['deal_class_5']
X_train = df_downsampled.drop(columns=['deal_class_5'])
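The balancing idea, downsampling large classes and upsampling small ones to a common size, can be sketched on a toy frame. Note that this example swaps in pandas `DataFrame.sample` instead of `sklearn.utils.resample`, and that `balance_classes`, the toy data and the target of 3 rows per class are assumptions made for the illustration.

```python
import pandas as pd

def balance_classes(df, label_col, n_per_class, seed=123):
    """Resample every class to exactly n_per_class rows (hypothetical helper)."""
    parts = []
    for _, group in df.groupby(label_col):
        # Sample with replacement only when the class has too few rows.
        replace = len(group) < n_per_class
        parts.append(group.sample(n=n_per_class, replace=replace, random_state=seed))
    return pd.concat(parts)

# Toy data: class 2 dominates, classes 1 and 0 are rare.
toy = pd.DataFrame({"feature": range(10), "label": [2] * 7 + [1] * 2 + [0] * 1})
balanced = balance_classes(toy, "label", n_per_class=3)
print(balanced["label"].value_counts())
```

Resampling only the training split, as the notebook does, keeps duplicated rows out of the test set.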

## Tokenizing and word embeddings

In [None]:
max_features = 100000
maxlen_title = 20
maxlen_desc = 60
embed_size = 300

In [None]:
X_train_title = X_train['title_non_stop'].values
X_test_title = X_test['title_non_stop'].values
tokenizer_title = text.Tokenizer(num_words=max_features)
tokenizer_title.fit_on_texts(list(X_test_title)+list(X_train_title))

X_train_title = tokenizer_title.texts_to_sequences(X_train_title)
X_train_title = sequence.pad_sequences(X_train_title, maxlen=maxlen_title)

In [None]:
X_test_title = tokenizer_title.texts_to_sequences(X_test_title)
X_test_title = sequence.pad_sequences(X_test_title, maxlen=maxlen_title)
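What the tokenizer and `pad_sequences` produce can be shown with a small pure-Python sketch. It is a simplified stand-in, not the Keras implementation: the function names are hypothetical, ties between equally frequent words are broken alphabetically here, and it mirrors the Keras defaults of padding and truncating at the front of the sequence.

```python
from collections import Counter

def fit_word_index(texts):
    """Map words to 1-based ranks by frequency; 0 is reserved for padding."""
    counts = Counter(word for t in texts for word in t.split())
    ordered = sorted(counts, key=lambda w: (-counts[w], w))
    return {word: rank + 1 for rank, word in enumerate(ordered)}

def to_padded_sequences(texts, word_index, num_words, maxlen):
    """Convert texts to fixed-length integer rows."""
    rows = []
    for t in texts:
        # Keep only known words whose rank is below the vocabulary cap.
        seq = [word_index[w] for w in t.split()
               if w in word_index and word_index[w] < num_words]
        seq = seq[-maxlen:]                            # truncate from the front
        rows.append([0] * (maxlen - len(seq)) + seq)   # pad at the front
    return rows

word_index = fit_word_index(["стол стол кресло", "кресло стол диван шкаф"])
padded = to_padded_sequences(["диван", "кресло стол диван шкаф"],
                             word_index, num_words=10, maxlen=3)
print(word_index, padded)
```

With `maxlen_title = 20`, every title becomes a row of 20 integers, which is the shape the title input of the model expects.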

In [None]:
# Reuse embeddings_index, already loaded from /wiki.ru.vec above
word_index_title = tokenizer_title.word_index
# Keras word indices start at 1, so the matrix needs at most len(word_index) + 1 rows
nb_words = min(max_features, len(word_index_title) + 1)
embedding_matrix_title = np.zeros((nb_words, embed_size))
for word, i in word_index_title.items():
    if i >= nb_words:
        continue  # word falls outside the vocabulary cap
    # Words missing from the pre-trained vectors keep an all-zero row
    embedding_vector_title = embeddings_index.get(word)
    if embedding_vector_title is not None:
        embedding_matrix_title[i] = embedding_vector_title

In [None]:
X_train_desc = X_train['description_non_stop'].values
X_test_desc = X_test['description_non_stop'].values
tokenizer_desc = text.Tokenizer(num_words=max_features)
tokenizer_desc.fit_on_texts(list(X_test_desc)+list(X_train_desc))

X_train_desc = tokenizer_desc.texts_to_sequences(X_train_desc)
X_train_desc = sequence.pad_sequences(X_train_desc, maxlen=maxlen_desc)

In [None]:
X_test_desc = tokenizer_desc.texts_to_sequences(X_test_desc)
X_test_desc = sequence.pad_sequences(X_test_desc, maxlen=maxlen_desc)

In [None]:
# Reuse embeddings_index, already loaded from /wiki.ru.vec above
word_index_desc = tokenizer_desc.word_index
# Keras word indices start at 1, so the matrix needs at most len(word_index) + 1 rows
nb_words = min(max_features, len(word_index_desc) + 1)
embedding_matrix_desc = np.zeros((nb_words, embed_size))
for word, i in word_index_desc.items():
    if i >= nb_words:
        continue  # word falls outside the vocabulary cap
    # Words missing from the pre-trained vectors keep an all-zero row
    embedding_vector_desc = embeddings_index.get(word)
    if embedding_vector_desc is not None:
        embedding_matrix_desc[i] = embedding_vector_desc
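The embedding-matrix construction can be checked on a toy vocabulary. This is a minimal sketch under stated assumptions: `build_embedding_matrix` is a hypothetical helper, the vectors are four-dimensional placeholders rather than real word vectors, and row 0 is left for the padding index.

```python
import numpy as np

def build_embedding_matrix(word_index, vectors, max_features, embed_size):
    """Row i holds the vector of the word with index i; unknown words stay zero."""
    n_rows = min(max_features, len(word_index) + 1)  # +1 for padding index 0
    matrix = np.zeros((n_rows, embed_size), dtype='float32')
    for word, i in word_index.items():
        if i >= n_rows:
            continue  # outside the vocabulary cap
        vec = vectors.get(word)
        if vec is not None:
            matrix[i] = vec
    return matrix

toy_index = {"стол": 1, "кресло": 2, "диван": 3}
toy_vectors = {"стол": np.ones(4, dtype='float32'),
               "диван": np.full(4, 2.0, dtype='float32')}
matrix = build_embedding_matrix(toy_index, toy_vectors, max_features=100, embed_size=4)
print(matrix)
```

Here "кресло" has no pre-trained vector, so its row stays zero, which is also what happens to out-of-vocabulary words in the notebook.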

In [None]:
X_enc = X_enc.drop(columns=['description', 'title','description_non_stop', 'title_non_stop','param_1', 'param_2','param_3'])
X_train = X_train.drop(columns=['description', 'title','description_non_stop', 'title_non_stop','param_1', 'param_2','param_3'])
X_test = X_test.drop(columns=['description', 'title','description_non_stop', 'title_non_stop','param_1', 'param_2','param_3'])

In [None]:
sc = StandardScaler()
X_train.loc[:,["price","description_len","title_len","param_1_len","param_2_len","param_3_len"]] = sc.fit_transform(X_train[["price","description_len","title_len","param_1_len","param_2_len","param_3_len"]])
X_test.loc[:,["price","description_len","title_len","param_1_len","param_2_len","param_3_len"]] = sc.transform(X_test[["price","description_len","title_len","param_1_len","param_2_len","param_3_len"]])
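The scaling step fits the mean and standard deviation on the training split and reuses them for the test split. The NumPy sketch below illustrates that arithmetic on made-up numbers; the arrays are assumptions, and `std` with the default `ddof=0` matches the population standard deviation that `StandardScaler` uses.

```python
import numpy as np

# Made-up values for two numeric columns (for example price and title_len).
train = np.array([[100.0, 3.0], [300.0, 5.0], [500.0, 7.0]])
test = np.array([[300.0, 9.0]])

mean = train.mean(axis=0)   # statistics come from the training split only
std = train.std(axis=0)     # population standard deviation (ddof=0)

train_scaled = (train - mean) / std
test_scaled = (test - mean) / std   # the test split reuses the training statistics
print(train_scaled, test_scaled)
```

Fitting on the training split alone avoids leaking test-set statistics into the features.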


In [None]:
X_train = X_train.to_numpy()
X_test = X_test.to_numpy()
# One-hot encode the integer class labels for categorical_crossentropy
y_test = pd.get_dummies(y_test, columns=['deal_class_5']).to_numpy()
y_train = pd.get_dummies(y_train).to_numpy()
y_train

array([[0, 0, 1],
       [0, 0, 1],
       [0, 0, 1],
       ...,
       [1, 0, 0],
       [1, 0, 0],
       [1, 0, 0]], dtype=uint8)
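The one-hot layout shown above can be reproduced with plain NumPy, which also makes it clear how `np.argmax` recovers the class index later. The label array is a made-up example.

```python
import numpy as np

labels = np.array([2, 2, 0, 1])             # integer class ids
one_hot = np.eye(3, dtype='uint8')[labels]  # row k of the identity encodes class k
recovered = one_hot.argmax(axis=1)          # inverse mapping back to class ids
print(one_hot)
```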

## TensorFlow Model Environment Setup

In [None]:
!pip install tensorflow-addons



In [None]:
!pip install git+https://github.com/tensorflow/docs

Collecting git+https://github.com/tensorflow/docs
  Cloning https://github.com/tensorflow/docs to /tmp/pip-req-build-5uy3q04m
  Running command git clone -q https://github.com/tensorflow/docs /tmp/pip-req-build-5uy3q04m
Building wheels for collected packages: tensorflow-docs
  Building wheel for tensorflow-docs (setup.py) ... done
  Created wheel for tensorflow-docs: filename=tensorflow_docs-0.0.0d41eeb858e80108db8d13ba25867757d7de0fcf9_-cp36-none-any.whl size=124709 sha256=c901297211b4b003f735707231e39fe1e30659c44ecb1f7a894e9f35b119a266
  Stored in directory: /tmp/pip-ephem-wheel-cache-fc_dydyn/wheels/eb/1b/35/fce87697be00d2fc63e0b4b395b0d9c7e391a10e98d9a0d97f
Successfully built tensorflow-docs
Installing collected packages: tensorflow-docs
Successfully installed tensorflow-docs-0.0.0d41eeb858e80108db8d13ba25867757d7de0fcf9-


In [None]:
from sklearn import metrics
import tensorflow as tf

from tensorflow import keras as k1
from tensorflow.keras import layers
from tensorflow.keras import Model
print(tf.__version__)

2.2.0


In [None]:
import tensorflow_docs as tfdocs
import tensorflow_docs.plots
import tensorflow_docs.modeling
from tensorflow.keras import backend as K
import tensorflow_addons as tfa


In [None]:
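# Import from tensorflow.keras throughout so the model, optimizer, callbacks
# and tensorflow-addons metrics all share one Keras implementation
from tensorflow.keras import backend as K
from tensorflow.keras.layers import Input
from tensorflow.keras.layers import Embedding, concatenate,GlobalAveragePooling1D,Dense,Dropout,SpatialDropout1D,Reshape,Flatten
from tensorflow.keras import Model
from tensorflow.keras.callbacks import ModelCheckpoint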

## TensorFlow Model

In [None]:
embedding_dim=300
seq_length_title = 20
seq_length_desc = 60

Description and title tokens are passed through separate embedding layers.

Both embeddings, along with the remaining tabular features, are fed to a feed-forward neural network.
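The size of the concatenated input follows directly from the sequence lengths and the embedding dimension. The arithmetic below is a quick check rather than part of the model; the value 82 is the width of the tabular input used in the model definition, and 100000 is the vocabulary cap `max_features`.

```python
embedding_dim = 300
seq_length_title = 20
seq_length_desc = 60
n_meta = 82          # width of the tabular (meta) input
vocab_size = 100000  # max_features

flat_title = seq_length_title * embedding_dim   # flattened title embedding
flat_desc = seq_length_desc * embedding_dim     # flattened description embedding
concat_width = flat_title + flat_desc + n_meta  # input width of the first Dense layer

embedding_params = vocab_size * embedding_dim   # weights per embedding layer
dense_512_params = concat_width * 512 + 512     # weights plus biases
print(concat_width, embedding_params, dense_512_params)
```

Each embedding layer therefore holds 30,000,000 weights, far more than the dense layers combined, which is why pre-trained vectors are used to initialise them.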

In [None]:
# Each text input is sized to its padded sequences and paired with the
# embedding matrix built from the same tokenizer
nlp_input_title = Input(shape=(seq_length_title,), name='nlp_input_title')
nlp_input_desc = Input(shape=(seq_length_desc,), name='nlp_input_desc')
emb1 = Embedding(input_dim=embedding_matrix_title.shape[0],output_dim=300,weights=[embedding_matrix_title])(nlp_input_title)
emb1 = SpatialDropout1D(0.3)(emb1)
emb1 = Flatten()(emb1)
emb2 = Embedding(input_dim=embedding_matrix_desc.shape[0],output_dim=300,weights=[embedding_matrix_desc])(nlp_input_desc)
emb2 = SpatialDropout1D(0.3)(emb2)
emb2 = Flatten()(emb2)
meta_input = Input(shape=(82,), name='meta_input')
# Concatenate the flattened text embeddings with the tabular features
x = concatenate([emb1,emb2, meta_input])
x = Dense(512, activation='relu')(x)
x = Dropout(0.05)(x)
x = Dense(256, activation='relu')(x)
x = Dense(64, activation='relu')(x)
x = Dense(32, activation='relu')(x)
# Three-way softmax over the deal classes
x = Dense(3, activation='softmax')(x)
# Input order matches the list passed to model.fit: [title, description, meta]
model =  Model(inputs=[nlp_input_title,nlp_input_desc, meta_input], outputs=[x])




In [None]:
early_stopping = k1.callbacks.EarlyStopping(
    monitor='accuracy', 
    verbose=1,
    patience=30,
    mode='max',
    restore_best_weights=True)
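The patience rule behind early stopping can be illustrated without Keras. This is a simplified sketch, not the library implementation: `early_stopping_epoch` is a hypothetical helper, the metric history is made up, and an epoch counts as an improvement only when the monitored value is strictly better than the best seen so far.

```python
def early_stopping_epoch(history, patience, mode='max'):
    """Return (stop_epoch, best_epoch), both 1-based; stop_epoch is None if never triggered."""
    best, best_epoch, wait = None, None, 0
    for epoch, value in enumerate(history, start=1):
        if best is None:
            improved = True
        elif mode == 'max':
            improved = value > best
        else:
            improved = value < best
        if improved:
            best, best_epoch, wait = value, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    return None, best_epoch

# Made-up accuracy values for five epochs.
stop, best = early_stopping_epoch([0.70, 0.74, 0.73, 0.74, 0.72], patience=3)
print(stop, best)
```

With `restore_best_weights=True`, the weights from the best epoch are the ones kept after stopping.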

In [None]:
model.compile(optimizer=k1.optimizers.Adam(learning_rate=2e-4),
              loss="categorical_crossentropy",
              metrics=[tfa.metrics.F1Score(num_classes=3,average="macro",threshold=None),"accuracy" ])

In [None]:
checkpoint_path = "gs://dataproc-e3bd1f7b-2e29-4da6-a5c4-077c164fd32a-us-central1/avito/w2v/11/cp-{epoch:04d}.ckpt"

In [None]:
cp_callback = ModelCheckpoint(
    filepath=checkpoint_path, 
    verbose=1, 
    save_weights_only=True,
    period=3)
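The `{epoch:04d}` placeholder in the checkpoint path is ordinary Python string formatting, so the file names for a saving period of 3 can be previewed directly. This sketch only formats names and assumes a save at every third epoch; it does not call Keras.

```python
checkpoint_template = "cp-{epoch:04d}.ckpt"  # file-name part of checkpoint_path
period = 3

# Epochs are 1-based in the file names.
saved = [checkpoint_template.format(epoch=e) for e in range(1, 13) if e % period == 0]
print(saved)
```

Zero-padding the epoch number keeps the checkpoint files in training order when they are listed alphabetically.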

In [None]:
model.load_weights("gs://dataproc-e3bd1f7b-2e29-4da6-a5c4-077c164fd32a-us-central1/avito/w2v/3/cp-0039.ckpt")

In [None]:
history = model.fit([X_train_title,X_train_desc,X_train],y_train, epochs=100, callbacks=[cp_callback,early_stopping], validation_split=0.2, shuffle= True,batch_size=2048)

Train on 1680000 samples, validate on 420000 samples
Epoch 1/30
Epoch 2/30
Epoch 3/30

Epoch 00003: saving model to gs://dataproc-e3bd1f7b-2e29-4da6-a5c4-077c164fd32a-us-central1/avito/w2v/11/cp-0003.ckpt
Epoch 4/30
Epoch 5/30
Epoch 6/30

Epoch 00006: saving model to gs://dataproc-e3bd1f7b-2e29-4da6-a5c4-077c164fd32a-us-central1/avito/w2v/11/cp-0006.ckpt
Epoch 7/30
Epoch 8/30
Epoch 9/30

Epoch 00009: saving model to gs://dataproc-e3bd1f7b-2e29-4da6-a5c4-077c164fd32a-us-central1/avito/w2v/11/cp-0009.ckpt
Epoch 10/30
Epoch 11/30
Epoch 12/30

Epoch 00012: saving model to gs://dataproc-e3bd1f7b-2e29-4da6-a5c4-077c164fd32a-us-central1/avito/w2v/11/cp-0012.ckpt
Epoch 13/30
Epoch 14/30
Epoch 15/30

Epoch 00015: saving model to gs://dataproc-e3bd1f7b-2e29-4da6-a5c4-077c164fd32a-us-central1/avito/w2v/11/cp-0015.ckpt
Epoch 16/30
Epoch 17/30
Epoch 18/30

Epoch 00018: saving model to gs://dataproc-e3bd1f7b-2e29-4da6-a5c4-077c164fd32a-us-central1/avito/w2v/11/cp-0018.ckpt
Epoch 19/30
Epoch 20/30
Ep

In [None]:
model.summary()

Model: "model_1"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
nlp_input_title (InputLayer)    (None, 60)           0                                            
__________________________________________________________________________________________________
nlp_input_desc (InputLayer)     (None, 20)           0                                            
__________________________________________________________________________________________________
embedding_1 (Embedding)         (None, 60, 300)      30000000    nlp_input_title[0][0]            
__________________________________________________________________________________________________
embedding_2 (Embedding)         (None, 20, 300)      30000000    nlp_input_desc[0][0]             
____________________________________________________________________________________________

## Results

In [None]:
print('\n# Evaluate on test data')
results = model.evaluate([X_test_title,X_test_desc,X_test], y_test, batch_size=1024)
print('test loss, test acc:', results)


# Evaluate on test data
test loss, test acc: [1.5194234940135756, 0.0, 0.8016495704650879]


In [None]:
actual = np.argmax(y_test, axis=1) 
print('\n# Generate predictions for 3 samples')
predictions = model.predict([X_test_title,X_test_desc,X_test])
result = np.argmax(predictions, axis=1) 


# Generate predictions for 3 samples


In [None]:
metrics.confusion_matrix(actual, result)

array([[  9882,   3032,  19695],
       [  3453,   3092,   9988],
       [ 21894,  10102, 219547]])

In [None]:
print(classification_report(actual,result))

              precision    recall  f1-score   support

           0       0.28      0.30      0.29     32609
           1       0.19      0.19      0.19     16533
           2       0.88      0.87      0.88    251543

    accuracy                           0.77    300685
   macro avg       0.45      0.45      0.45    300685
weighted avg       0.78      0.77      0.78    300685
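The per-class figures in the report can be re-derived from the confusion matrix above, where rows are actual classes and columns are predicted classes. The NumPy sketch below is a cross-check of that arithmetic, not a replacement for `classification_report`.

```python
import numpy as np

# Confusion matrix reported above: rows = actual class, columns = predicted class.
cm = np.array([[9882, 3032, 19695],
               [3453, 3092, 9988],
               [21894, 10102, 219547]])

tp = np.diag(cm)              # correctly classified samples per class
support = cm.sum(axis=1)      # actual samples per class
predicted = cm.sum(axis=0)    # predicted samples per class

precision = tp / predicted
recall = tp / support
f1 = 2 * precision * recall / (precision + recall)
accuracy = tp.sum() / cm.sum()
macro_f1 = f1.mean()
print(precision.round(2), recall.round(2), round(float(accuracy), 2), round(float(macro_f1), 2))
```

The gap between the accuracy and the macro-averaged F1 shows that the majority class (2) dominates the accuracy, while classes 0 and 1 are predicted poorly.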

