## Ingredient Phrase Model

This notebook trains a model that labels each word in an ingredient phrase with a Named Entity Recognition (NER) tag. This lets the food name, quantity, and other information be identified separately.
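Here is what a single tagged ingredient phrase might look like once the B-/I- prefixes are removed. The phrase and tag names are illustrative and assume the NYT QTY/UNIT/NAME/COMMENT scheme. `group_entities` is a hypothetical helper, not part of the notebook.

```python
# Hypothetical training row: one NER tag per token of the ingredient phrase.
sent = ["2", "cups", "all-purpose", "flour", "sifted"]
tags = ["QTY", "UNIT", "NAME", "NAME", "COMMENT"]


def group_entities(tokens, labels):
    """Merge runs of consecutive tokens that share a tag into (tag, text) spans."""
    spans = []
    for token, label in zip(tokens, labels):
        if spans and spans[-1][0] == label:
            spans[-1] = (label, spans[-1][1] + " " + token)
        else:
            spans.append((label, token))
    return spans


print(group_entities(sent, tags))
# [('QTY', '2'), ('UNIT', 'cups'), ('NAME', 'all-purpose flour'), ('COMMENT', 'sifted')]
```

Merging runs of identical tags is only a heuristic once the B-/I- prefixes are dropped: two adjacent entities of the same type would be merged into one span.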

In [1]:
%autosave 0

Autosave disabled


In [2]:
import pandas as pd
import numpy as np
import os
import pickle
import re

# Generate training data using NY Times ingredient phrase tagger
from ingredient_phrase_tagger.training.cli import Cli
from ingredient_phrase_tagger.training import utils, reshape

from sklearn.model_selection import train_test_split

# Model libraries
from tagger_model import *

from IPython.core.debugger import set_trace

Using TensorFlow backend.


In [3]:
# Some default parameters
n_word_embedding_nodes=300
n_tag_embedding_nodes=150
n_RNN_nodes=400
n_dense_nodes=200

dataPath = '../data'

In [4]:
filename = 'cleaned_nyt_ingred_data.pkl'

# reshape.read_and_save_raw_data(dataPath, filename)
cleaned_dat = pd.read_pickle(os.path.join(dataPath, filename))

# Strip the B-/I- prefixes so that there are fewer tags to predict
cleaned_dat['tags'] = [[re.sub(r'^[BI]-', '', tag) for tag in tags] for tags in cleaned_dat.tags]
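Inside a character class, `|` is a literal character and not an alternation. This means `r'[B|I]-'` also matches `|-`, and without an anchor it can match in the middle of a tag. The anchored class `r'^[BI]-'` strips only a leading prefix. The check below compares the two patterns. `MULTI-UNIT` and `|-X` are made-up tags chosen only to show where the patterns differ.

```python
import re

tags = ["B-QTY", "I-QTY", "B-NAME", "I-NAME", "OTHER"]
loose = [re.sub(r'[B|I]-', '', t) for t in tags]    # unanchored, '|' is literal
strict = [re.sub(r'^[BI]-', '', t) for t in tags]   # only a leading B- or I-
print(strict)           # ['QTY', 'QTY', 'NAME', 'NAME', 'OTHER']
print(loose == strict)  # True: both agree on ordinary prefixed tags

# Hypothetical tag names where the unanchored pattern misbehaves:
print(re.sub(r'[B|I]-', '', "MULTI-UNIT"))  # MULTUNIT ('I-' matched mid-tag)
print(re.sub(r'^[BI]-', '', "MULTI-UNIT"))  # MULTI-UNIT
print(re.sub(r'[B|I]-', '', "|-X"))         # X ('|' matched literally)
```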

In [5]:
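train, test = train_test_split(cleaned_dat, test_size = .2, random_state=10)
# Copy the splits so that adding columns to them later does not raise SettingWithCopyWarning
train, test = train.copy(), test.copy()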

In [10]:
# Create lexicon
lexicon = lexiconTransformer(words_min_freq=2, unknown_tag_token='OTHER', saveNamePrefix='Ingred_mod')

lexicon.fit(train.sents, train.tags)

train['sent_indx'], train['tag_indx'] = lexicon.transform(train.sents, train.tags)

# Get length of longest sequence
max_seq_len = get_max_seq_len(train['sent_indx'])

# Add one to the max length to leave room for shifting the tag sequences by 1
train_padded_words = pad_idx_seqs(train['sent_indx'],
                                  max_seq_len + 1)

train_padded_tags = pad_idx_seqs(train['tag_indx'],
                                 max_seq_len + 1, value=0)

# Shift tags right by 1 (prepend the start value 1 and drop the last column)
# so that the tag input at step t is the tag from step t-1
shifted_train_padded_tags = np.insert(train_padded_tags, 0, 1, axis=1)[:, :-1]
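The internals of `lexiconTransformer`, `get_max_seq_len`, and `pad_idx_seqs` are in `tagger_model`, which is not shown. The sketch below is a minimal pure-Python stand-in built on several assumptions. Words seen fewer than `words_min_freq` times map to an unknown id. Index 0 is padding and index 1 is the unknown word. Padding goes on the right. `build_word_index`, `to_indices`, `pad_right`, `PAD`, and `UNK` are all hypothetical names.

```python
from collections import Counter

PAD, UNK = 0, 1  # hypothetical reserved word ids


def build_word_index(sents, min_freq=2):
    """Keep words seen at least min_freq times; ids start after PAD and UNK."""
    counts = Counter(w for sent in sents for w in sent)
    kept = sorted(w for w, c in counts.items() if c >= min_freq)
    return {w: i + 2 for i, w in enumerate(kept)}


def to_indices(sents, word_index):
    """Replace each word with its id, falling back to UNK for rare words."""
    return [[word_index.get(w, UNK) for w in sent] for sent in sents]


def pad_right(seqs, length, value=PAD):
    """Right-pad (truncating if needed) every sequence to the same length."""
    return [list(s[:length]) + [value] * (length - len(s[:length])) for s in seqs]


sents = [["2", "cups", "flour"], ["3", "cup", "flour"], ["2", "eggs"]]
word_index = build_word_index(sents, min_freq=2)
idx = to_indices(sents, word_index)
max_len = max(len(s) for s in idx)
padded = pad_right(idx, max_len + 1)  # one spare slot, as in the notebook
print(word_index)  # {'2': 2, 'flour': 3}
print(padded)      # [[2, 1, 3, 0], [1, 1, 3, 0], [2, 1, 0, 0]]
```

Right-padding matters for the shift. With one spare slot at the end of every row, the column that `[:, :-1]` drops is always padding, so no real tag is lost. Keras `pad_sequences` pads at the front by default (`padding='pre'`), so if `pad_idx_seqs` wraps it, the padding side is worth checking.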



In [7]:
# Convert to one-hot vector encoding for y
# train_y = [to_categorical(i, num_classes=len(lexicon.tags_lexicon) + 1) for i in train_padded_tags]
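Two tag-array operations in these cells can be checked on a toy array. The first is the right shift by one in cell [10]. The second is the one-hot encoding in the commented-out line above. The tag ids below are made up. `num_classes` here is derived from the toy array rather than from `lexicon.tags_lexicon`. Indexing an identity matrix by integer labels should give the same one-hot layout as Keras `to_categorical`.

```python
import numpy as np

# Toy padded tag ids (0 = padding); each row has one spare slot at the end.
tags = np.array([[4, 5, 2, 0],
                 [4, 2, 0, 0]])

# Shift right: insert the start value 1 at column 0, then drop the last column.
# Position t of `shifted` now holds the tag from position t-1.
shifted = np.insert(tags, 0, 1, axis=1)[:, :-1]
print(shifted.tolist())  # [[1, 4, 5, 2], [1, 4, 2, 0]]

# One-hot: pick rows of an identity matrix by tag id.
num_classes = int(tags.max()) + 1  # hypothetical; the notebook uses len(lexicon.tags_lexicon) + 1
one_hot = np.eye(num_classes, dtype=np.float32)[tags]
print(one_hot.shape)  # (2, 4, 6)
```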

In [11]:
mod_save_name = 'ingredient_model_clean_tags_crf_wordOnly'
crf_mod = True
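`crf_mod = True` presumably makes `run_training_model` put a linear-chain CRF on top of the network. The training checkpoints are named `..._crf_tmp1.hdf5`, but the layer itself is defined in `tagger_model` and is not shown. A CRF scores whole tag sequences, using per-token emission scores plus tag-to-tag transition scores. Decoding then picks the jointly best path with the Viterbi algorithm instead of an independent argmax at each token. The pure-Python sketch below uses made-up scores, with tags 0 = QTY, 1 = UNIT, 2 = NAME.

```python
def viterbi_decode(emissions, transitions):
    """Return the highest-scoring tag path for one sequence.

    emissions[t][j]:   score of tag j at position t
    transitions[i][j]: score of moving from tag i to tag j
    """
    n_tags = len(transitions)
    scores = list(emissions[0])  # best path score ending in each tag so far
    backptrs = []
    for emit in emissions[1:]:
        new_scores, ptrs = [], []
        for j in range(n_tags):
            best_i = max(range(n_tags), key=lambda i: scores[i] + transitions[i][j])
            new_scores.append(scores[best_i] + transitions[best_i][j] + emit[j])
            ptrs.append(best_i)
        scores = new_scores
        backptrs.append(ptrs)
    # Trace back from the best final tag.
    best = max(range(n_tags), key=lambda j: scores[j])
    path = [best]
    for ptrs in reversed(backptrs):
        best = ptrs[best]
        path.append(best)
    path.reverse()
    return path


emissions = [[2.0, 0.0, 0.5], [0.0, 1.0, 1.1], [0.0, 0.0, 2.0]]
transitions = [[-1.0, 1.0, 0.0],   # from QTY
               [-1.0, -1.0, 1.0],  # from UNIT
               [-1.0, -1.0, 0.5]]  # from NAME
greedy = [max(range(len(e)), key=lambda j: e[j]) for e in emissions]
print(greedy)                                   # [0, 2, 2]
print(viterbi_decode(emissions, transitions))   # [0, 1, 2]
```

Per-token argmax tags the second word as NAME. The transition scores, which favour QTY → UNIT → NAME, make the CRF choose UNIT instead. This kind of sequence-level consistency is what the CRF output layer is meant to add.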

In [12]:
# Tag input is the shifted tags (previous step); the target is the unshifted tags
ingredient_model = run_training_model(train_padded_words, shifted_train_padded_tags,
                                      train_padded_tags, mod_save_name, lexicon, crf=crf_mod,
                                      print_summary=True, batch_size=256, epochs=200,
                                      n_word_embedding_nodes=n_word_embedding_nodes,
                                      n_tag_embedding_nodes=n_tag_embedding_nodes,
                                      n_RNN_nodes=n_RNN_nodes, 
                                      n_dense_nodes=n_dense_nodes)

__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
word_input_layer (InputLayer)   (None, 56)           0                                            
__________________________________________________________________________________________________
tag_input_layer (InputLayer)    (None, 56)           0                                            
__________________________________________________________________________________________________
word_embedding_layer (Embedding (None, 56, 300)      1690800     word_input_layer[0][0]           
__________________________________________________________________________________________________
tag_embedding_layer (Embedding) (None, 56, 150)      1500        tag_input_layer[0][0]            
__________________________________________________________________________________________________
concat_emb

Epoch 20/200
Epoch 00020: saving model to models\ingredient_model_crf_tmp1.hdf5
Epoch 21/200
Epoch 00021: saving model to models\ingredient_model_crf_tmp1.hdf5
...
Epoch 115/200
Epoch 00115: saving model to models\ingredient_model_crf_tmp1.hdf5
Epoch 116/200

KeyboardInterrupt: 

In [None]:
test['sent_indx'], test['tag_indx'] = lexicon.transform(test.sents, test.tags)

In [None]:
test_mod = create_test_model(mod_save_name, lexicon, crf=crf_mod, 
                             n_word_embedding_nodes=n_word_embedding_nodes,
                             n_tag_embedding_nodes=n_tag_embedding_nodes,
                             n_RNN_nodes=n_RNN_nodes, 
                             n_dense_nodes=n_dense_nodes)

In [None]:
preds = predict_new_tag(test_mod, test, lexicon)
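`predict_new_tag` is defined in `tagger_model` and is not shown. At some point, predicted tag ids must be mapped back to tag strings and trimmed to each sentence's real length. The sketch below assumes `lexicon.tags_lexicon` is a tag → id dict with 0 reserved for padding. `decode_tags` and the example lexicon are hypothetical.

```python
def decode_tags(pred_idx, sent_lengths, tags_lexicon, default="OTHER"):
    """Map predicted tag ids back to tag strings, dropping padded positions."""
    idx_to_tag = {i: t for t, i in tags_lexicon.items()}
    return [[idx_to_tag.get(i, default) for i in seq[:n]]
            for seq, n in zip(pred_idx, sent_lengths)]


# Hypothetical tag lexicon (tag -> id); id 0 is padding and has no entry.
tags_lexicon = {"NAME": 2, "OTHER": 3, "QTY": 4, "UNIT": 5}
pred_idx = [[4, 5, 2, 0], [4, 2, 0, 0]]
print(decode_tags(pred_idx, [3, 2], tags_lexicon))
# [['QTY', 'UNIT', 'NAME'], ['QTY', 'NAME']]
```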

In [None]:
evaluate_model(preds, test, print_sample=True)
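`evaluate_model` is also defined in `tagger_model`, and its metrics are not shown here. One common way to score a tagger like this is token-level accuracy plus per-tag precision, recall, and F1, computed over real (unpadded) tokens only. The sketch below shows that approach on made-up gold and predicted sequences. `token_scores` is a hypothetical name.

```python
from collections import Counter


def token_scores(gold_seqs, pred_seqs):
    """Token-level accuracy plus per-tag (precision, recall, F1)."""
    tp, fp, fn = Counter(), Counter(), Counter()
    correct = total = 0
    for gold, pred in zip(gold_seqs, pred_seqs):
        for g, p in zip(gold, pred):
            total += 1
            if g == p:
                correct += 1
                tp[g] += 1
            else:
                fp[p] += 1  # predicted p where it was wrong
                fn[g] += 1  # missed the true tag g
    per_tag = {}
    for tag in set(tp) | set(fp) | set(fn):
        prec = tp[tag] / (tp[tag] + fp[tag]) if tp[tag] + fp[tag] else 0.0
        rec = tp[tag] / (tp[tag] + fn[tag]) if tp[tag] + fn[tag] else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        per_tag[tag] = (prec, rec, f1)
    return correct / total, per_tag


gold = [["QTY", "UNIT", "NAME"], ["QTY", "NAME", "COMMENT"]]
pred = [["QTY", "UNIT", "NAME"], ["QTY", "NAME", "NAME"]]
accuracy, per_tag = token_scores(gold, pred)
print(round(accuracy, 3))  # 0.833
print(per_tag["COMMENT"])  # (0.0, 0.0, 0.0)
```

Per-tag scores are worth reporting next to accuracy because frequent tags such as NAME and QTY can dominate the accuracy, while rarer tags like COMMENT are missed entirely.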