
Is it possible to use AnchorText with Tokenizer instead of CountVectorizer? #74

Closed · Enantiodromis opened this issue Aug 1, 2021 · 2 comments


Enantiodromis commented Aug 1, 2021

Good afternoon. Thank you for such a great package!

Is it possible to implement the AnchorText explainer with a model which takes in Tokenizer.texts_to_sequences data?

My current implementation:

import numpy as np
import spacy
from anchor.anchor_text import AnchorText
from tensorflow.keras.preprocessing.sequence import pad_sequences

# tokenizer, model, word_index and X_test_encoded are defined elsewhere.

# Creating a reverse dictionary to map token indices back to words
reverse_word_map = dict(map(reversed, word_index.items()))

# Function takes a tokenized sentence and returns the words
def sequence_to_text(list_of_indices):
    # Looking up words in the dictionary
    words = [reverse_word_map.get(index) for index in list_of_indices]
    return words

my_texts = np.array(list(map(sequence_to_text, X_test_encoded)))
test_text = ' '.join(my_texts[4])

def wrapped_predict(strings):
    print(strings)
    cnn_rep = tokenizer.texts_to_sequences(strings)
    text_data = pad_sequences(cnn_rep, maxlen=30)
    print(text_data)
    prediction = model.predict(text_data)
    print(prediction)
    return prediction

nlp = spacy.load('en_core_web_sm')
explainer = AnchorText(nlp, ['negative', 'positive'], use_unk_distribution=True)
exp = explainer.explain_instance(test_text, wrapped_predict, threshold=0.95)

And the current output is:

['war clan versus clan touch zenith final showdown bridge body count countless demons magic swords priests versus buddhist monks beautiful visions provided maestro rest good japanese flick rainy summer night']
[[  181  6818  3962  6818  1039 19084   332  4277  2956   519  1415  3404
   2136  1193  8736  8834  3962 14769  8249   197  5440  1925 15445   245
      5   766   356  6073  1320   195]]
[[0.50682825]]
['UNK UNK UNK clan touch UNK final showdown bridge UNK UNK countless UNK UNK UNK priests UNK UNK monks beautiful UNK provided UNK rest UNK japanese UNK rainy UNK UNK']
[[   0    0    0    0    0    0    0    0    0    0    0    0    0    0
      0    0    0 6818 1039  332 4277 2956 3404 8834 8249  197 1925  245
    766 6073]]
[[0.50716233]]

Error being thrown:

ValueError: all the input arrays must have the same number of dimensions, but the array at index 0 has 2 dimension(s) and the array at index 1 has 1 dimension(s)

It appears to be working, sort of, but I am not sure how to work around this error, or whether that is even possible. Any help would be greatly appreciated.

marcotcr (Owner) commented Aug 9, 2021

Your wrapper is fine. The problem is that we expect wrapped_predict to return a 1D array of integer predictions, while yours returns a 2D (n, 1) array with the probability of class 1 (I'm guessing). Just make sure wrapped_predict(['a', 'b', 'c']) returns something that looks like np.array([1, 0, 1]).
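
For example (a minimal sketch, not from the thread; probs stands in for the (n, 1) sigmoid output of the model above), the conversion amounts to thresholding and flattening:

    import numpy as np

    # Hypothetical class-1 probabilities with shape (n, 1), as a Keras
    # model with a single sigmoid output unit would return
    probs = np.array([[0.91], [0.23], [0.67]])

    # Threshold at 0.5 and flatten into the expected 1D integer array
    labels = np.where(probs > 0.5, 1, 0).flatten()
    print(labels)  # [1 0 1]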

marcotcr closed this as completed Aug 9, 2021
Enantiodromis (Author) commented

> Your wrapper is fine. The problem is that we expect wrapped_predict to return a 1D array of integer predictions, while yours returns a 2D (n, 1) array with the probability of class 1 (I'm guessing). Just make sure wrapped_predict(['a', 'b', 'c']) returns something that looks like np.array([1, 0, 1]).

Thanks for the reply! Changing the wrapper implementation per your suggestion worked great.

The wrapper I am using now (posting it for anyone who might encounter similar trouble):

    def wrapped_predict(strings):
        cnn_rep = tokenizer.texts_to_sequences(strings)
        text_data = pad_sequences(cnn_rep, maxlen=30)
        prediction = model.predict(text_data)
        # Threshold the sigmoid probabilities and flatten (n, 1) -> (n,)
        # so AnchorText receives one integer label per input string
        predicted_class = np.where(prediction > 0.5, 1, 0).flatten()
        return predicted_class
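
A quick sanity check (hypothetical input strings; tokenizer and model as above) shows the shape AnchorText expects:

    print(wrapped_predict(['great movie', 'terrible movie', 'loved it']))
    # e.g. array([1, 0, 1]) -- a 1D integer array, one label per string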
