# Prediction

This notebook loads the saved model and tokenizer, pre-processes the input (a CV) for prediction, and finally makes the prediction.

### Load model and tokenizer

In [5]:
import os
import tensorflow as tf
from tensorflow import keras

In [6]:
# Recreate the exact same model, including its weights and the optimizer
new_model = tf.keras.models.load_model('model/model1.h5')

# Show the model architecture
new_model.summary()

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
embedding (Embedding)        (None, None, 64)          512000    
_________________________________________________________________
bidirectional (Bidirectional (None, 128)               66048     
_________________________________________________________________
dense (Dense)                (None, 64)                8256      
_________________________________________________________________
dense_1 (Dense)              (None, 4)                 260       
=================================================================
Total params: 586,564
Trainable params: 586,564
Non-trainable params: 0
_________________________________________________________________
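
The parameter counts in the summary can be checked by hand. The sketch below assumes a vocabulary of 8,000 tokens (512000 / 64) and that the bidirectional layer wraps an LSTM with 64 units per direction. Neither is stated in the summary, but both are consistent with the printed numbers.

```python
# Reproduce the parameter counts from the model summary.
# Assumptions (not stated in the summary): vocab_size = 8000, and the
# Bidirectional layer wraps an LSTM with 64 units per direction.
vocab_size, embed_dim, lstm_units = 8000, 64, 64

embedding = vocab_size * embed_dim
# An LSTM has 4 gates, each with input weights, recurrent weights and a bias.
lstm_one_direction = 4 * (lstm_units * (embed_dim + lstm_units) + lstm_units)
bidirectional = 2 * lstm_one_direction
dense = (2 * lstm_units) * 64 + 64   # 128 inputs -> 64 units, plus biases
dense_1 = 64 * 4 + 4                 # 64 inputs -> 4 classes, plus biases

total = embedding + bidirectional + dense + dense_1
print(embedding, bidirectional, dense, dense_1, total)
# prints: 512000 66048 8256 260 586564
```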


In [68]:
import pickle

# Load the tokenizer saved alongside the model
with open('model/tokenizer.pickle', 'rb') as handle:
    tokenizer = pickle.load(handle)

# Peek at the first ten entries of the vocabulary
word_index = tokenizer.word_index
dict(list(word_index.items())[0:10])

{'<OOV>': 1,
 'experience': 2,
 'data': 3,
 'work': 4,
 'sales': 5,
 'team': 6,
 'skills': 7,
 'business': 8,
 'ability': 9,
 'the': 10}
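
To illustrate how this index is used, here is a minimal sketch of a word-to-index lookup with an out-of-vocabulary fallback. `to_sequence` is a hypothetical helper written for this example, not part of Keras. The real `texts_to_sequences` also strips punctuation and honours the tokenizer's own settings.

```python
# Minimal sketch of a word-index lookup with an out-of-vocabulary fallback.
# `to_sequence` is a hypothetical helper, not part of Keras.
word_index = {'<OOV>': 1, 'experience': 2, 'data': 3, 'work': 4, 'sales': 5,
              'team': 6, 'skills': 7, 'business': 8, 'ability': 9, 'the': 10}

def to_sequence(text, word_index, oov_token='<OOV>'):
    """Lower-case, split on whitespace, and map each word to its index."""
    oov_index = word_index[oov_token]
    return [word_index.get(word, oov_index) for word in text.lower().split()]

print(to_sequence('Sales team experience in Berlin', word_index))
# prints: [5, 6, 2, 1, 1]
```

Words missing from the index ("in" and "berlin" in this small example) fall back to the `<OOV>` index 1.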

### Pre-process input (cv)

In [98]:
import docx

doc = docx.Document('data/monster-cv-template-sales-manager.docx')

In [99]:
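def getText(filename):
    """Return the document's paragraph text as one string, plus the Document object."""
    doc = docx.Document(filename)
    fullText = []
    for para in doc.paragraphs:
        # Replace tabs with spaces so tab-separated words stay separate tokens
        fullText.append(para.text.replace('\t', ' '))
    return ' '.join(fullText), doc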

In [100]:
text, doc = getText('data/monster-cv-template-sales-manager.docx')

In [101]:
doc

<docx.document.Document at 0x7fadef0cd550>

In [94]:
import nltk
from nltk.corpus import stopwords
nltk.download('stopwords')

STOPWORDS = set(stopwords.words('english'))

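job_list = []

# Replace each ' word ' stopword token with a single space.
# Note: every pass starts from `text` again, so `job` ends up with
# only the last stopword visited removed.
for word in STOPWORDS:
    token = ' ' + word + ' '
    job = text.replace(token, ' ')
    job = job.replace('  ', ' ')  # collapse double spaces

job_list.append(job)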

[nltk_data] Downloading package stopwords to /home/robin/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!
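
In the loop above, `job` is rebuilt from `text` on every pass, so at most one stopword is actually removed. This is consistent with the decoded input at the end of the notebook, which still contains words such as "and" and "the". A token-based variant that removes every stopword could look like the sketch below. `remove_stopwords` and `STOPWORDS_DEMO` are hypothetical names, and the small set stands in for the NLTK list.

```python
# Sketch of stopword removal that filters every token.
# STOPWORDS_DEMO is a small hypothetical stand-in for the NLTK list.
STOPWORDS_DEMO = {'a', 'and', 'the', 'in', 'of'}

def remove_stopwords(text, stopwords):
    """Drop stopwords token by token and normalise whitespace."""
    kept = [word for word in text.split() if word.lower() not in stopwords]
    return ' '.join(kept)

sample = 'A hard working and target oriented sales manager in the company'
print(remove_stopwords(sample, STOPWORDS_DEMO))
# prints: hard working target oriented sales manager company
```

Whether to switch to this depends on how the training text was prepared: the model should see inputs pre-processed the same way as its training data, and the presence of 'the' in the tokenizer's vocabulary suggests stopwords may have survived there too.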


In [95]:
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

import numpy as np

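# These settings should match the ones used when the model was trained
max_length = 1000
trunc_type = 'post'
padding_type = 'post'

# Convert the text to word indices, pad to a fixed length, and predict
seq = tokenizer.texts_to_sequences(job_list)
padded = pad_sequences(seq, maxlen=max_length, padding=padding_type, truncating=trunc_type)
pred = new_model.predict(padded)
# The label order has to follow the class indices used during training
labels = ['data scientist', 'sales manager', 'front-office manager', 'front-end developer']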
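
As a rough illustration of what `padding='post'` and `truncating='post'` do, the pure-Python sketch below cuts long sequences at the end and fills short ones with zeros at the end. `pad_post` is a hypothetical helper that mimics these two settings; it is not the Keras implementation.

```python
# Pure-Python illustration of 'post' padding and 'post' truncation.
# `pad_post` is a hypothetical helper, not the Keras implementation.
def pad_post(sequences, maxlen, value=0):
    padded = []
    for seq in sequences:
        seq = list(seq)[:maxlen]                              # truncate at the end
        padded.append(seq + [value] * (maxlen - len(seq)))    # pad at the end
    return padded

print(pad_post([[5, 6, 2], [1, 2, 3, 4, 5, 6]], maxlen=4))
# prints: [[5, 6, 2, 0], [1, 2, 3, 4]]
```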

In [96]:
print(pred, labels[np.argmax(pred)])

[[0.05724354 0.35578763 0.28760016 0.2993687 ]] sales manager
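
`np.argmax` reports only the top class. Ranking all four probabilities shows how close the alternatives are. The sketch below uses the values printed above and assumes the label order matches the model's output order.

```python
# Rank the labels by the probabilities printed above.
probs = [0.05724354, 0.35578763, 0.28760016, 0.2993687]
labels = ['data scientist', 'sales manager', 'front-office manager', 'front-end developer']

ranked = sorted(zip(labels, probs), key=lambda pair: pair[1], reverse=True)
for label, p in ranked:
    print(f'{label:22s} {p:.3f}')

# Gap between the winning class and the runner-up
best_label, best_p = ranked[0]
margin = best_p - ranked[1][1]
print(f'margin over runner-up: {margin:.3f}')
```

Here 'sales manager' leads 'front-end developer' by roughly 0.06, so the prediction is correct for this CV but not a confident one.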


In [102]:
labels[np.argmax(pred)]

'sales manager'

In [97]:
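# Invert the vocabulary so indices can be mapped back to words
reverse_word_index = {value: key for (key, value) in word_index.items()}

def decode_job(sequence):
    # Indices missing from the vocabulary (such as the padding value 0) are shown as '?'
    return ' '.join([reverse_word_index.get(i, '?') for i in sequence])

print(decode_job(padded[0]))
print('---')
print(job_list[0])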

<OOV> wood address flat 0 any road any town <OOV> email telephone <OOV> 000 000 000 personal statement a hard working knowledgeable and target oriented sales manager an extensive successful sales record builds and maintains a loyal client base through strong relationship building skills and excels at devising strategies for increased sales skilled in bringing out the best in staff able to manage effectively and recruit talent strong <OOV> and time management ability skilled in planning scheduling and meeting deadlines driven to succeed a valuable addition to a forward thinking company strong opportunities for progression key achievements company achieved area sales of <OOV> <OOV> in one year company <OOV> item sales up from 400 to 1000 a week company <OOV> the rising star award date company achieved 1 adviser in eight out of 12 months and <OOV> ranked in the in top four every month company achieved record breaking sales of <OOV> consistently brought in half of the overall monthly sales
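
The Keras tokenizer numbers words from 1, and `pad_sequences` fills with 0 by default, so padding positions have no entry in `reverse_word_index` and decode as '?'. A variant that skips padding could look like the sketch below. `decode_sequence` is a hypothetical helper, and the small index stands in for the real `word_index`.

```python
# Sketch: skip padding (index 0) when decoding a padded sequence.
# The small index below is a hypothetical stand-in for `word_index`.
word_index = {'<OOV>': 1, 'experience': 2, 'data': 3, 'sales': 5}
reverse_word_index = {index: word for word, index in word_index.items()}

def decode_sequence(sequence, pad_value=0):
    """Map indices back to words, ignoring padding positions."""
    return ' '.join(reverse_word_index.get(i, '?') for i in sequence if i != pad_value)

print(decode_sequence([5, 2, 1, 3, 0, 0, 0]))
# prints: sales experience <OOV> data
```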