
Is it possible to use AnchorText with Tokenizer instead of CountVectorizer? #74

Closed · Enantiodromis opened this issue Aug 1, 2021 · 2 comments


Enantiodromis commented Aug 1, 2021

Good afternoon. Thank you for such a great package!

Is it possible to implement the AnchorText explainer with a model which takes in Tokenizer.texts_to_sequences data?

My current implementation:

import numpy as np
import spacy
from anchor.anchor_text import AnchorText
from tensorflow.keras.preprocessing.sequence import pad_sequences

# tokenizer, model, word_index and X_test_encoded are defined elsewhere.

# Creating a reverse dictionary to map token indices back to words
reverse_word_map = dict(map(reversed, word_index.items()))

# Function takes a tokenized sentence and returns the words
def sequence_to_text(list_of_indices):
    # Looking up words in the dictionary
    words = [reverse_word_map.get(index) for index in list_of_indices]
    return words

my_texts = np.array(list(map(sequence_to_text, X_test_encoded)))
test_text = ' '.join(my_texts[4])

def wrapped_predict(strings):
    print(strings)
    cnn_rep = tokenizer.texts_to_sequences(strings)
    text_data = pad_sequences(cnn_rep, maxlen=30)
    print(text_data)
    prediction = model.predict(text_data)
    print(prediction)
    return prediction

nlp = spacy.load('en_core_web_sm')
explainer = AnchorText(nlp, ['negative', 'positive'], use_unk_distribution=True)
exp = explainer.explain_instance(test_text, wrapped_predict, threshold=0.95)

And the current output is:

['war clan versus clan touch zenith final showdown bridge body count countless demons magic swords priests versus buddhist monks beautiful visions provided maestro rest good japanese flick rainy summer night']
[[  181  6818  3962  6818  1039 19084   332  4277  2956   519  1415  3404
   2136  1193  8736  8834  3962 14769  8249   197  5440  1925 15445   245
      5   766   356  6073  1320   195]]
[[0.50682825]]
['UNK UNK UNK clan touch UNK final showdown bridge UNK UNK countless UNK UNK UNK priests UNK UNK monks beautiful UNK provided UNK rest UNK japanese UNK rainy UNK UNK']
[[   0    0    0    0    0    0    0    0    0    0    0    0    0    0
      0    0    0 6818 1039  332 4277 2956 3404 8834 8249  197 1925  245
    766 6073]]
[[0.50716233]]

Error being thrown:

ValueError: all the input arrays must have the same number of dimensions, but the array at index 0 has 2 dimension(s) and the array at index 1 has 1 dimension(s)

It appears to be working, sort of, but I am not sure how to work around this error, or whether that is even possible. Any help would be greatly appreciated.

marcotcr (Owner) commented Aug 9, 2021

Your wrapper is fine. The problem is that we expect wrapped_predict to return a 1D array of integer predictions, while yours returns a 2D (n, 1) array with the probability of class 1 (I'm guessing). Just make sure wrapped_predict(['a', 'b', 'c']) returns something that looks like np.array([1, 0, 1]).
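
For example (a minimal sketch, not from the thread; probs stands in for the (n, 1) sigmoid output of the model above), the conversion amounts to thresholding and flattening:

    import numpy as np

    # Hypothetical class-1 probabilities with shape (n, 1), as a Keras
    # model with a single sigmoid output unit would return
    probs = np.array([[0.91], [0.23], [0.67]])

    # Threshold at 0.5 and flatten into the expected 1D integer array
    labels = np.where(probs > 0.5, 1, 0).flatten()
    print(labels)  # [1 0 1]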

marcotcr closed this as completed Aug 9, 2021
Enantiodromis (Author) commented

> Your wrapper is fine. The problem is that we expect wrapped_predict to return a 1D array of integer predictions, while yours returns a 2D (n, 1) array with the probability of class 1 (I'm guessing). Just make sure wrapped_predict(['a', 'b', 'c']) returns something that looks like np.array([1, 0, 1]).

Thanks for the reply! Changing the wrapper implementation per your suggestion worked great.

The wrapper I am using now (posting it for anyone who might encounter similar trouble):

    def wrapped_predict(strings):
        cnn_rep = tokenizer.texts_to_sequences(strings)
        text_data = pad_sequences(cnn_rep, maxlen=30)
        prediction = model.predict(text_data)
        # Threshold the sigmoid probabilities and flatten (n, 1) -> (n,)
        # so AnchorText receives one integer label per input string
        predicted_class = np.where(prediction > 0.5, 1, 0).flatten()
        return predicted_class
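
A quick sanity check (hypothetical input strings; tokenizer and model as above) shows the shape AnchorText expects:

    print(wrapped_predict(['great movie', 'terrible movie', 'loved it']))
    # e.g. array([1, 0, 1]) -- a 1D integer array, one label per string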
