# Homework: Word Embedding

In this exercise, you will work on the skip-gram neural network architecture for Word2Vec. You will be using Keras to train your model. 

You must complete the following tasks:
1. Read/clean text files
2. Indexing (Assign a number to each word)
3. Create skip-grams (inputs for your model)
4. Create the skip-gram neural network model
5. Visualization
6. Evaluation (classify each text into one of 4 categories, with and without pre-trained embeddings)
    
This notebook assumes you have already installed TensorFlow and Keras with Python 3 and have a GPU enabled. If you run this exercise on GCloud using the provided disk image, you are all set.



In [None]:
%tensorflow_version 2.x
%matplotlib inline
import numpy as np
import pandas as pd
import math
import glob
import re
import random
import collections
import os
import sys
import tensorflow as tf
from tensorflow.keras.preprocessing import sequence
from tensorflow.keras.models import Sequential, Model
from tensorflow.keras.layers import GRU, Dropout
from tensorflow.keras.models import load_model
from tensorflow.keras.layers import Embedding, Reshape, Activation, Input, Dense, Masking, Conv1D, Bidirectional
from tensorflow.keras.layers import Dot
from tensorflow.keras.utils import get_file, to_categorical
from tensorflow.keras.preprocessing.sequence import skipgrams
from tensorflow.keras import backend as K
from tensorflow.keras.optimizers import Adam

random.seed(42)

# Step 1: Read/clean text files

The given code can be used to process the pre-tokenized text file from the Wikipedia corpus. In your homework, you must replace those text files with raw text files and use your own tokenizer to process them.

In [None]:
!wget https://www.dropbox.com/s/eexden7246sgfzf/BEST-TrainingSet.zip
!wget https://www.dropbox.com/s/n87fiy25f2yc3gt/wiki.zip
!unzip wiki.zip
!unzip BEST-TrainingSet.zip


In [None]:
#Step 1: read the wikipedia text file
with open("wiki/thwiki_chk.txt", encoding="utf-8") as f:
    #the delimiter is one or more whitespace characters
    input_text = re.compile(r"\s+").split(f.read()) 
    #exclude an empty string from our input
    input_text = [word for word in input_text if word != ''] 

In [None]:
tokens = input_text
print(tokens[:10])
print("total word count:", len(tokens))

['หน้า', 'หลัก', 'วิกิพีเดีย', 'ดำเนินการ', 'โดย', 'มูลนิธิ', 'วิกิ', 'มีเดีย', 'องค์กร', 'ไม่']
total word count: 36349066


# Step 2: Indexing (Assign a number to each word)

The code below generates an indexed dataset (each word is represented by a number), a dictionary, and a reversed dictionary.

## <font color='blue'>Homework Question 1:</font>
<font color='blue'>“UNK” is often used to represent an unknown word (a word which does not exist in your dictionary/training set). You can also represent a rare word with this token as well.  How do you define a rare word in your program? Explain in your own words and capture the screenshot of your code segment that is a part of this process</font>

 + <font color='blue'>edit or replace create_index with your own code to set a threshold for rare words and replace them with "UNK"</font>
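<div>word_count is sorted from the highest to the lowest count. For min_thres_unk, the code walks back from the end of the list to find the index of the last word whose count is at least min_thres_unk. For max_word_count, a value of None is replaced by the length of word_count. The smaller of these two limits becomes the cutoff: words before it are kept, and every word after it is removed from word_count and later mapped to "UNK". A rare word is therefore any word that occurs fewer than min_thres_unk times or that ranks below the max_word_count most frequent words.</div>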
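
The cutoff described above can be tried on a toy corpus before running it on the full text. The sketch below is a minimal illustration and not the homework solution: the helper name `rare_word_cutoff` and the sample tokens are made up for this example, and it assumes a word counts as rare when it occurs fewer than `min_count` times.

```python
import collections

def rare_word_cutoff(tokens, min_count=0, max_vocab=None):
    """Return the words to keep; everything else would be mapped to "UNK".

    Hypothetical helper, for illustration only.
    """
    # most_common() sorts words from the highest to the lowest count
    counts = collections.Counter(tokens).most_common()
    # keep only words seen at least min_count times
    kept = [word for word, count in counts if count >= min_count]
    # optionally cap the vocabulary size as well
    if max_vocab is not None:
        kept = kept[:max_vocab]
    return kept

toy_tokens = ["a", "b", "a", "c", "a", "b", "d"]
kept_words = rare_word_cutoff(toy_tokens, min_count=2)
indexed = [w if w in kept_words else "UNK" for w in toy_tokens]
print(kept_words)  # ['a', 'b']
print(indexed)     # ['a', 'b', 'a', 'UNK', 'a', 'b', 'UNK']
```

Here "c" and "d" each occur once, so both fall below the threshold of 2 and are replaced by "UNK".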

In [None]:
#step 2:Build dictionary and build a dataset(replace each word with its index)
def create_index(input_text, min_thres_unk=0, max_word_count=None):
    # TODO#1 : edit or replace this function
    # count every unique word; most_common() sorts from most to least frequent
    word_count = collections.Counter(input_text).most_common()
    print(len(word_count))

    # walk back from the rarest word to the last word seen at least min_thres_unk times
    mtu = len(word_count) - 1
    while mtu >= 0 and word_count[mtu][1] < min_thres_unk:
        mtu -= 1

    if max_word_count is None:
        max_word_count = len(word_count)
    # keep whichever cutoff is stricter; every word past it becomes "UNK"
    final = min(max_word_count, mtu + 1)
    print(final)
    word_count = word_count[:final]

    #include a token for unknown word
    word_count.append(("UNK", 0))

    #print out 10 most frequent words
    print(word_count[:10])

    dictionary = dict()
    dictionary["for_keras_zero_padding"] = 0
    for word, _ in word_count:
        dictionary[word] = len(dictionary)
    reverse_dictionary = dict(zip(dictionary.values(), dictionary.keys()))

    # replace each word with its index; out-of-vocabulary words map to "UNK"
    unk_index = dictionary["UNK"]
    data = [dictionary.get(word, unk_index) for word in input_text]

    return data, dictionary, reverse_dictionary

# call method with min_thres_unk=5
dataset, dictionary, reverse_dictionary = create_index(tokens, 5)
print(len(dataset))
print(len(dictionary))


701355
127222
[('ที่', 950006), ('ใน', 897329), ('เป็น', 726847), ('และ', 668116), ('การ', 619128), ('มี', 536738), ('ของ', 532237), ('ได้', 508117), (')', 359576), ('"', 357830)]
36349066
127224


In [None]:
print("output sample (dataset):",dataset[:10])
for i in dataset[:10]:
  print(reverse_dictionary[i])
print("output sample (dictionary):",{k: dictionary[k] for k in list(dictionary)[:10]})
print("output sample (dictionary):",{k: dictionary[k] for k in list(dictionary)[-5:]})
print("output sample (reverse dictionary):",{k: reverse_dictionary[k] for k in list(reverse_dictionary)[:10]})
print(dictionary["UNK"])
print(len(dictionary))


output sample (dataset): [229, 208, 2453, 573, 15, 1829, 7149, 3124, 681, 24]
หน้า
หลัก
วิกิพีเดีย
ดำเนินการ
โดย
มูลนิธิ
วิกิ
มีเดีย
องค์กร
ไม่
output sample (dictionary): {'for_keras_zero_padding': 0, 'ที่': 1, 'ใน': 2, 'เป็น': 3, 'และ': 4, 'การ': 5, 'มี': 6, 'ของ': 7, 'ได้': 8, ')': 9}
output sample (dictionary): {'ค่ายมูล': 127219, 'HFE': 127220, 'ปก์': 127221, 'คาร์ลอฟ': 127222, 'UNK': 127223}
output sample (reverse dictionary): {0: 'for_keras_zero_padding', 1: 'ที่', 2: 'ใน', 3: 'เป็น', 4: 'และ', 5: 'การ', 6: 'มี', 7: 'ของ', 8: 'ได้', 9: ')'}
127223
127224


# Step 3: Create skip-grams (inputs for your model)
Keras has a skip-gram generator. The cell below shows how it generates skip-grams.

## <font color='blue'>Homework Question 2:</font>
<font color='blue'>The negative samples are sampled from sampling_table.  Look through Keras source code to find out how they sample negative samples. Discuss the sampling technique taught in class and compare it to the Keras source code.</font>



<font color='red'>Q2: PUT YOUR ANSWER HERE!!!</font>
<div>Ans:</div>
<div>
Keras uses a different technique. In class, negative samples are drawn from the unigram distribution raised to the power 3/4, which lowers the relative probability of very frequent words and raises that of rare words.
</div>
<div>
In the Keras source, the sampling table is not used to draw the negative samples. make_sampling_table assumes that word indices follow Zipf's law and estimates each word's frequency from its rank, "frequency(rank) ~ 1/(rank * (log(rank) + gamma) + 1/2 - 1/(12*rank))", where gamma is the Euler-Mascheroni constant. It then stores the probability min(1, sqrt(word_frequency / sampling_factor) / (word_frequency / sampling_factor)) of keeping a word of that rank. skipgrams uses this table only to subsample target words, so frequent words are skipped more often. Each negative sample pairs an existing target word with a context index drawn uniformly at random from 1 to vocabulary_size - 1. The outcome is similar in spirit, because frequent words are down-weighted, but the mechanism differs from the 3/4-power unigram sampling taught in class.
</div>
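
The two formulas discussed above can be compared numerically. The sketch below is a NumPy re-implementation written for this comparison, not the Keras code itself: the function names are made up, `sampling_factor=1e-5` is assumed to match the Keras default, and the toy counts are invented.

```python
import numpy as np

def zipf_keep_probability(size, sampling_factor=1e-5):
    """Keep probability per rank, following the formula quoted from the Keras docstring."""
    gamma = 0.577  # approximation of the Euler-Mascheroni constant
    rank = np.arange(size, dtype=np.float64)
    rank[0] = 1  # index 0 is the padding slot; avoid log(0)
    # estimated inverse frequency of the word at each rank under Zipf's law
    inv_freq = rank * (np.log(rank) + gamma) + 0.5 - 1.0 / (12.0 * rank)
    f = sampling_factor * inv_freq  # equals sampling_factor / word_frequency
    # min(1, sqrt(freq / factor) / (freq / factor)) simplifies to min(1, sqrt(f))
    return np.minimum(1.0, np.sqrt(f))

def unigram_three_quarters(counts):
    """Negative-sampling distribution: unigram counts raised to 3/4, renormalized."""
    weights = np.asarray(counts, dtype=np.float64) ** 0.75
    return weights / weights.sum()

keep = zipf_keep_probability(100000)
print(keep[1], keep[10], keep[50000])  # frequent ranks are kept less often

counts = [900, 90, 10]
print(np.asarray(counts) / sum(counts))  # raw unigram probabilities
print(unigram_three_quarters(counts))    # rare words gain probability mass
```

Both approaches reduce the influence of very frequent words, but the first one decides whether a target word is kept at all, while the second one decides which word is drawn as a negative.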

In [None]:
# Step 3: Create data samples
vocab_size = len(dictionary)
skip_window = 1       # How many words to consider left and right.

# TODO#2 check out keras source code and find out how their sampling technique works. Describe it in your own words.
sample_set= dataset[:10]
sampling_table = sequence.make_sampling_table(vocab_size)
couples, labels = skipgrams(sample_set, vocab_size, window_size=skip_window, sampling_table=sampling_table)
word_target, word_context = zip(*couples)
word_target = np.array(word_target, dtype="int32")
word_context = np.array(word_context, dtype="int32")

print(couples, labels)

for i in range(8):
    print(reverse_dictionary[couples[i][0]],reverse_dictionary[couples[i][1]])



[[3124, 7149], [3124, 93851], [3124, 54988], [2453, 85182], [208, 229], [3124, 681], [2453, 573], [24, 28894], [24, 681], [208, 2453], [2453, 91925], [208, 58879], [208, 71427], [2453, 208]] [1, 0, 0, 0, 1, 1, 1, 0, 1, 1, 0, 0, 0, 1]
มีเดีย วิกิ
มีเดีย พระสาโรชรั
มีเดีย ไชลด์
วิกิพีเดีย ควิเบกแอร์เพล
หลัก หน้า
มีเดีย องค์กร
วิกิพีเดีย ดำเนินการ
ไม่ เกิดพันธะ
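
The pairs printed above mix label 1 (a word and a real neighbour) with label 0 (a word and a random index). The sketch below is a simplified pure-Python version of that idea, written as an assumption-laden illustration rather than a copy of the Keras implementation: `toy_skipgrams` is a made-up name, the sampling table and shuffling are left out, and negatives are drawn uniformly.

```python
import random

def toy_skipgrams(sequence, vocabulary_size, window_size=1, negative_samples=1.0, seed=0):
    """Simplified sketch of skip-gram pair generation (not the Keras implementation)."""
    rng = random.Random(seed)
    couples, labels = [], []
    # positive pairs: each word with every neighbour inside the window
    for i, target in enumerate(sequence):
        start = max(0, i - window_size)
        end = min(len(sequence), i + window_size + 1)
        for j in range(start, end):
            if j != i:
                couples.append([target, sequence[j]])
                labels.append(1)
    # negative pairs: a real target word with a uniformly random word index
    num_negative = int(len(labels) * negative_samples)
    targets = [c[0] for c in couples]
    for k in range(num_negative):
        couples.append([targets[k % len(targets)], rng.randint(1, vocabulary_size - 1)])
        labels.append(0)
    return couples, labels

pairs, pair_labels = toy_skipgrams([229, 208, 2453], vocabulary_size=127224)
for pair, label in zip(pairs, pair_labels):
    print(pair, label)
```

With three words and a window of 1 there are four positive pairs, and `negative_samples=1.0` adds four negative pairs.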


# Step 4: Create the skip-gram model
## <font color='blue'>Homework Question 3:</font>
 <font color='blue'>Q3:  In your own words, discuss why Sigmoid is chosen as the activation function in the  skip-gram model.</font>

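<font color='red'>Q3: PUT YOUR ANSWER HERE!!!</font>
<div>Ans: Sigmoid squashes the scalar dot product of the target and context embeddings into a value between 0 and 1, which can be read as the probability that the pair is a real (target, context) pair rather than a negative sample. Training then becomes binary classification with binary cross-entropy: each pair is scored independently, so no softmax over the whole vocabulary (whose outputs sum to 1) is needed.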
</div>
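
To make the role of sigmoid concrete, the sketch below scores a single (target, context) pair in NumPy. It is only an illustration: the two vectors are made-up numbers, not embeddings from the trained model, and `pair_loss` is a hypothetical helper.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def pair_loss(target_vec, context_vec, label):
    """Binary cross-entropy for a single (target, context) pair."""
    score = np.dot(target_vec, context_vec)  # unbounded real number
    p = sigmoid(score)                       # probability that the pair is a true pair
    return -(label * np.log(p) + (1 - label) * np.log(1 - p))

w = np.array([0.5, -0.2, 0.1])  # made-up target embedding
c = np.array([0.4, -0.1, 0.3])  # made-up context embedding
print(sigmoid(np.dot(w, c)))    # a value strictly between 0 and 1
print(pair_loss(w, c, 1), pair_loss(w, c, 0))
```

Because the dot product here is positive, the loss is lower when the pair is labelled 1 than when it is labelled 0.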

In [None]:
#reference: https://github.com/nzw0301/keras-examples/blob/master/Skip-gram-with-NS.ipynb
dim_embedddings = 32
V= len(dictionary)

#step1: select the embedding of the target word from W
w_inputs = Input(shape=(1, ), dtype='int32')
w = Embedding(V+1, dim_embedddings)(w_inputs)

#step2: select the embedding of the context word from C
c_inputs = Input(shape=(1, ), dtype='int32')
c  = Embedding(V+1, dim_embedddings)(c_inputs)

#step3: compute the dot product:c_k*v_j
o = Dot(axes=2)([w, c])
o = Reshape((1,), input_shape=(1, 1))(o)

#step4: normalize dot products into probability
o = Activation('sigmoid')(o)
#TO DO#4 Question: Why sigmoid?

SkipGram = Model(inputs=[w_inputs, c_inputs], outputs=o)
SkipGram.summary()
opt=Adam(learning_rate=0.01)
SkipGram.compile(loss='binary_crossentropy', optimizer=opt)

Model: "model"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
input_1 (InputLayer)            [(None, 1)]          0                                            
__________________________________________________________________________________________________
input_2 (InputLayer)            [(None, 1)]          0                                            
__________________________________________________________________________________________________
embedding (Embedding)           (None, 1, 32)        4071200     input_1[0][0]                    
__________________________________________________________________________________________________
embedding_1 (Embedding)         (None, 1, 32)        4071200     input_2[0][0]                    
______________________________________________________________________________________________

In [None]:
from google.colab import drive
drive.mount('/content/gdrive')


Mounted at /content/gdrive


In [None]:
# you don't have to spend too much time training for your homework, you are allowed to do it on a smaller corpus
# currently the dataset is 1/20 of the full text file.
SkipGram.load_weights('/content/gdrive/MyDrive/my_skipgram32_weights-hw.h5')
chunk_size = 100000
for _ in range(5):
  # it is likely that your GPU won't be able to handle large input
  # just do it 100000 words at a time
  for i in range(len(dataset)//chunk_size):
    # generate skipgrams for the i-th chunk only, so no chunk is trained on twice
    data, labels = skipgrams(sequence=dataset[i*chunk_size:(i+1)*chunk_size], vocabulary_size=V, window_size=2, negative_samples=4.)
    x = [np.array(col) for col in zip(*data)]
    y = np.array(labels, dtype=np.int32)
    if x:
      loss = SkipGram.train_on_batch(x, y)
      print(loss, i*chunk_size)

  # save after every pass, locally and to Google Drive
  SkipGram.save_weights('my_skipgram32_weights-hw.h5')
  SkipGram.save_weights('/content/gdrive/MyDrive/my_skipgram32_weights-hw.h5')


0.0988653302192688 0
0.10142631083726883 100000
0.10059817880392075 200000
0.09497367590665817 300000
0.09488150477409363 400000
0.09738943725824356 500000
0.09548914432525635 600000
0.09712303429841995 700000
0.10001739859580994 800000
0.10326787084341049 900000
0.10384557396173477 1000000
0.09873879700899124 1100000
0.1023455560207367 1200000
0.09986156970262527 1300000
0.10066931694746017 1400000
0.10755951702594757 1500000
0.10618967562913895 1600000
0.10470178723335266 1700000
0.1046474277973175 1800000
0.10527408123016357 1900000
0.10705284029245377 2000000
0.103509321808815 2100000
0.10080412030220032 2200000
0.1024264395236969 2300000
0.10145581513643265 2400000
0.10260861366987228 2500000
0.10193868726491928 2600000
0.10963566601276398 2700000
0.11665118485689163 2800000
0.11252246797084808 2900000
0.10630818456411362 3000000
0.10755328088998795 3100000
0.11252421885728836 3200000
0.11415430903434753 3300000
0.11291901022195816 3400000
0.11073683202266693 3500000
0.10936640202

In [None]:
SkipGram.load_weights('/content/gdrive/MyDrive/my_skipgram32_weights-hw.h5')

In [None]:
#Get weight of the embedding layer
final_embeddings=SkipGram.get_weights()[0]
print(final_embeddings)
print(final_embeddings.shape)

[[ 4.9895022e-02  3.6874186e-02 -1.5272655e-02 ... -2.0491375e-02
   4.8222687e-02  1.3819542e-02]
 [ 7.6262289e-01  2.5289753e-01  1.5278080e+00 ... -9.0818113e-01
  -2.7133682e-01 -6.1320853e-01]
 [-3.3188692e-01 -8.4968430e-01  2.1781044e-01 ... -3.4764460e-01
  -2.1680659e-01 -1.8166181e-01]
 ...
 [-4.2230379e-02  9.6028559e-03 -1.3267864e-02 ... -4.6134412e-02
  -2.4837995e-02 -4.4472303e-02]
 [ 8.9145416e-01 -5.9044063e-02  7.6835647e-02 ... -5.7215178e-01
  -7.5376970e-01 -1.9977589e-01]
 [ 1.9089136e-02  4.3877218e-02 -3.0381596e-02 ... -1.9369472e-02
   1.5525464e-02  1.0363236e-03]]
(127225, 32)


# Step 5: Intrinsic Evaluation: Word Vector Analogies
## <font color='blue'>Homework Question 4: </font>
<font color='blue'> Read section 2.1 and 2.3 in this [lecture note](http://web.stanford.edu/class/cs224n/readings/cs224n-2019-notes02-wordvecs2.pdf). Come up with 10 semantic analogy examples and report results produced by your word embeddings </font>


In [None]:
# TODO#4:Come up with 10 semantic analogy examples and report results produced by your word embeddings 
#and tell us what you observe 
def W2V(wordInput):
    # report words that are missing from the dictionary; they fall back to the "UNK" vector
    if wordInput not in dictionary:
        print(wordInput)
    return final_embeddings[dictionary.get(wordInput, dictionary["UNK"])]

def analogy(wordArray):
    # wordArray = [b, a, c] answers "a : b :: c : ?" with the vector b - a + c
    temp = (W2V(wordArray[0])-W2V(wordArray[1]))+W2V(wordArray[2])
    # score every word by its raw (unnormalized) dot product with that vector
    ans=np.dot(final_embeddings,temp)
    ans_index = np.argmax(ans)
    print(wordArray[1],":",wordArray[0]," :: ",wordArray[2],":",reverse_dictionary[ans_index])
    print("")

wt0=['พระราชา',"ชาย","หญิง"]
analogy(wt0)
wt1=["แล้ง","ร้อน","ชื้น"]
analogy(wt1)
wt2=["ไก่","ย่าง","ทอด"]
analogy(wt2)
wt3=["นาง","นาย","เรา"]
analogy(wt3)
wt4=["พ่อ","บิดา","มารดา"]
analogy(wt4)
wt5=["หนาว","หิมะ","ลาวา"]
analogy(wt5)
wt6=["บาท","ไทย","อเมริกา"]
analogy(wt6)
wt7=["บ่าว","สาว","นาง"]
analogy(wt7)
wt8=["เทวดา","สวรรค์","นรก"]
analogy(wt8)
wt9=["หิน","ภูเขา","ทะเล"]
analogy(wt9)

ชาย : พระราชา  ::  หญิง : หม่อม

ร้อน : แล้ง  ::  ชื้น : ป่า

ย่าง : ไก่  ::  ทอด : ปลา

นาย : นาง  ::  เรา : You

บิดา : พ่อ  ::  มารดา : สาว

หิมะ : หนาว  ::  ลาวา : ร้อน

ไทย : บาท  ::  อเมริกา : ดอลลา

สาว : บ่าว  ::  นาง : พระยา

สวรรค์ : เทวดา  ::  นรก : ร้าย

ภูเขา : หิน  ::  ทะเล : น้ำ
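
The results above come from a raw dot product, which favours vectors with a large norm and can return one of the query words. A common alternative is to rank candidates by cosine similarity and exclude the three input words. The sketch below shows that variant on made-up 2-d vectors; `cosine_analogy` and the toy vocabulary are assumptions for illustration, not part of the notebook.

```python
import numpy as np

def cosine_analogy(embeddings, index_of, a, b, c):
    """Answer "a : b :: c : ?" with cosine similarity, skipping the query words."""
    query = embeddings[index_of[b]] - embeddings[index_of[a]] + embeddings[index_of[c]]
    # dividing by both norms turns the dot product into cosine similarity
    norms = np.linalg.norm(embeddings, axis=1) * np.linalg.norm(query)
    scores = embeddings @ query / np.maximum(norms, 1e-12)
    # never return one of the three input words
    for word in (a, b, c):
        scores[index_of[word]] = -np.inf
    return int(np.argmax(scores))

# toy 2-d embeddings, made up so that the analogy holds by construction
toy_words = ["man", "woman", "king", "queen", "apple"]
toy_index = {w: i for i, w in enumerate(toy_words)}
toy_vectors = np.array([
    [1.0, 0.0],   # man
    [1.0, 1.0],   # woman
    [3.0, 0.0],   # king
    [3.0, 1.0],   # queen
    [0.0, -5.0],  # apple
])
best = cosine_analogy(toy_vectors, toy_index, "man", "woman", "king")
print(toy_words[best])  # queen
```

With the notebook's variables, a call would look like `cosine_analogy(final_embeddings, dictionary, "ชาย", "พระราชา", "หญิง")`, and the returned index can be looked up in `reverse_dictionary`.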



# Step 6: Extrinsic Evaluation

## <font color='blue'>Homework Question5:</font>
<font color='blue'>
Use the word embeddings from the skip-gram model as pre-trained weights in a classification model. Compare the result with the same classification model that does not use the pre-trained weights. 
</font>


In [None]:
all_news_filepath = glob.glob('BEST-TrainingSet/news/*.txt')
all_novel_filepath = glob.glob('BEST-TrainingSet/novel/*.txt')
all_article_filepath = glob.glob('BEST-TrainingSet/article/*.txt')
all_encyclopedia_filepath = glob.glob('BEST-TrainingSet/encyclopedia/*.txt')

In [None]:
#preparing data for the classification model
#In your homework, we will only use the first 2000 words in each text file
#any text file that has fewer than 2000 words will be padded
#reason: just to make this homework feasible under limited time and resources
max_length = 2000
def word_to_index(word):
    if word in dictionary:
        return dictionary[word]
    else:#if unknown
        return dictionary["UNK"]


def prep_data():
    input_text = list()
    #the position of each file list is its class label: 0=news, 1=novel, 2=article, 3=encyclopedia
    for label, textfile_path in enumerate([all_news_filepath, all_novel_filepath, all_article_filepath, all_encyclopedia_filepath]):
        for input_file in textfile_path:
            with open(input_file, "r", encoding="utf-8") as f: #open file with name of "*.txt"
                text = re.sub(r'\|', ' ', f.read()) # replace separation symbol with white space
            text = re.sub(r'<\W?\w+>', '', text)# remove <NE> </NE> <AB> </AB> tags
            text = text.split() #split() method without an argument splits on whitespace
            indexed_text = [word_to_index(word) for word in text[:max_length]] #map raw word string to its index
            input_text.append([indexed_text, label])
    random.shuffle(input_text)
    return input_text

input_data = prep_data()
train_data = input_data[:int(len(input_data)*0.6)]
val_data = input_data[int(len(input_data)*0.6):int(len(input_data)*0.8)]
test_data = input_data[int(len(input_data)*0.8):]

train_input = [data[0] for data in train_data]
train_input = sequence.pad_sequences(train_input, maxlen=max_length) #padding
train_target = [data[1] for data in train_data]
train_target=to_categorical(train_target, num_classes=4)

val_input = [data[0] for data in val_data]
val_input = sequence.pad_sequences(val_input, maxlen=max_length) #padding
val_target = [data[1] for data in val_data]
val_target=to_categorical(val_target, num_classes=4)

test_input = [data[0] for data in test_data]
test_input = sequence.pad_sequences(test_input, maxlen=max_length) #padding
test_target = [data[1] for data in test_data]
test_target=to_categorical(test_target, num_classes=4)

del input_data, val_data,train_data, test_data
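
The padding and one-hot steps above can be pictured with a small NumPy sketch. It mimics what `pad_sequences` does with its default arguments (zeros are added at the front, and overlong sequences lose their first items) and what `to_categorical` produces. The helper names `pad_front` and `one_hot` are made up for this illustration. In this notebook the truncation branch never triggers, because each text was already cut to its first `max_length` words.

```python
import numpy as np

def pad_front(sequences, maxlen, value=0):
    """Left-pad (and, if needed, left-truncate) integer sequences to maxlen."""
    out = np.full((len(sequences), maxlen), value, dtype=np.int32)
    for row, seq in enumerate(sequences):
        seq = seq[-maxlen:]              # keep the last maxlen items
        if len(seq) > 0:
            out[row, -len(seq):] = seq   # right-align, padding goes in front
    return out

def one_hot(labels, num_classes):
    """Turn integer class labels into one-hot rows."""
    return np.eye(num_classes, dtype=np.float32)[labels]

padded = pad_front([[5, 6], [1, 2, 3, 4]], maxlen=3)
print(padded)
# [[0 5 6]
#  [2 3 4]]
encoded = one_hot([0, 3], num_classes=4)
print(encoded)
```

Index 0 is reserved for padding in the dictionary, which is why the classifier's embedding layer can use `mask_zero=True` to ignore those positions.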

In [None]:
#the classification model
#TODO#5 find out how to initialize your embedding layer with pre-trained weights, evaluate and observe
#don't forget to compare it with the same model that does not use pre-trained weights
#you can use your own model too! and feel free to customize this model as you wish
cls_model = Sequential()
cls_model.add(Embedding(len(dictionary)+1, 32, input_length=max_length,mask_zero=True))
cls_model.add(GRU(32))
cls_model.add(Dropout(0.5))
cls_model.add(Dense(4, activation='softmax'))
opt=Adam(learning_rate=0.01)
cls_model.compile(loss='categorical_crossentropy', optimizer=opt, metrics=['accuracy'])
cls_model.summary()
print('Train...')
cls_model.fit(train_input, train_target,
          epochs=10,
          validation_data=(val_input, val_target))

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_2 (Embedding)      (None, 2000, 32)          4071200   
_________________________________________________________________
gru (GRU)                    (None, 32)                6336      
_________________________________________________________________
dropout (Dropout)            (None, 32)                0         
_________________________________________________________________
dense (Dense)                (None, 4)                 132       
Total params: 4,077,668
Trainable params: 4,077,668
Non-trainable params: 0
_________________________________________________________________
Train...
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<tensorflow.python.keras.callbacks.History at 0x7f65d1c5e7f0>

In [None]:
results = cls_model.evaluate(test_input, test_target)
print("test loss, test acc:", results)

test loss, test acc: [3.2379207611083984, 0.4313725531101227]


In [None]:
#the same classification model, but with the embedding layer initialized from the skip-gram weights
#alternative initialization: Embedding(len(dictionary)+1, 32, embeddings_initializer=tf.keras.initializers.Constant(final_embeddings), input_length=max_length, trainable=True, mask_zero=True)
cls_weight_model = Sequential()
cls_weight_model.add(Embedding(len(dictionary)+1, 32, weights=[final_embeddings], input_length=max_length,mask_zero=True))
cls_weight_model.add(GRU(32))
cls_weight_model.add(Dropout(0.5))
cls_weight_model.add(Dense(4, activation='softmax'))
opt=Adam(learning_rate=0.01)
cls_weight_model.compile(loss='categorical_crossentropy', optimizer=opt, metrics=['accuracy'])
cls_weight_model.summary()
print('Train...')
cls_weight_model.fit(train_input, train_target,
          epochs=10,
          validation_data=(val_input, val_target))

Model: "sequential_3"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_4 (Embedding)      (None, 2000, 32)          4071200   
_________________________________________________________________
gru_1 (GRU)                  (None, 32)                6336      
_________________________________________________________________
dropout_1 (Dropout)          (None, 32)                0         
_________________________________________________________________
dense_1 (Dense)              (None, 4)                 132       
Total params: 4,077,668
Trainable params: 4,077,668
Non-trainable params: 0
_________________________________________________________________
Train...
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<tensorflow.python.keras.callbacks.History at 0x7f653dc6b6a0>

In [None]:
results = cls_weight_model.evaluate(test_input, test_target)
print("test loss, test acc:", results)

test loss, test acc: [1.5284396409988403, 0.656862735748291]


In [None]:
print("From the evaluation, we can see that the model using pre-trained weights reaches a higher test accuracy (about 0.66) than the same model without pre-trained weights (about 0.43).")

From the evaluation, we can see that the model using pre-trained weights reaches a higher test accuracy (about 0.66) than the same model without pre-trained weights (about 0.43).
