
Found input variables with inconsistent numbers of samples: [5000, 1] #35

Closed
courageon opened this issue Nov 2, 2016 · 16 comments

@courageon

Not sure if a new version of scikit-learn is causing this, but I get this error when trying to run an explanation:

Found input variables with inconsistent numbers of samples: [5000, 1]

The outer-error occurs in lime_base.py here:
https://github.com/marcotcr/lime/blob/master/lime/lime_base.py#L75

The inner error is thrown in scikit-learn here:
https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/utils/validation.py#L180

I have tried to follow the multi-class notebook example as closely as I could but I do not see anything I could change to make my data look any more like the one in the example. That is, all of my classifier outputs look exactly like what's given in the example.

Any suggestions?

Thanks!

@marcotcr
Owner

marcotcr commented Nov 3, 2016

Hello,
You're getting this error running the multi-class notebook? If so, a few questions:

  • what line in the notebook?
  • what's your version of python?
  • what's your version of sklearn?
  • what's your operating system?

Thanks!

@marcotcr

I just tried all of the notebooks with the newest version of sklearn, and they work, so it's probably not related to sklearn.

@nikodrum

Same problem. @courageon how did you solve it?

@courageon
Author

Sorry for such a long delay in response. I got pulled off the task to look at something else for a while. Thank you for looking into it @marcotcr.

LIME works fine; I had made some wrong assumptions about how the predict callback was supposed to work. @nikodrum, take a look at the predict callback in your explain_instance call. explain_instance sends a list of items to be predicted by the predict callback. I was expecting a single item at a time and was therefore only returning a single prediction. The error appears because I was returning one prediction instead of the requested 5000.

@marcotcr Would it be possible to update the docstring of lime_text's explain_instance function to point this out? Not being very familiar with scikit-learn's predict_proba function, it wasn't clear (to me at least) that this was the expected behavior.

Once that was fixed though, everything else fell into place and I started getting some interesting results from my model. Very cool!
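To make the fix described above concrete, here is a minimal sketch of a classifier_fn that satisfies LIME's shape contract. predict_one is a hypothetical stand-in for a real model that scores one string at a time; the key point is that the function LIME calls must accept the full list of perturbed texts and return one row of class probabilities per text:

```python
import numpy as np

# Hypothetical stand-in for a real model that scores ONE string at a time.
def predict_one(text):
    return 0.5  # P(positive) for this text

# explain_instance calls this with a list of num_samples perturbed texts,
# so it must return an array of shape (len(texts), num_classes).
def classifier_fn(texts):
    probs_pos = np.array([predict_one(t) for t in texts])
    return np.column_stack([1 - probs_pos, probs_pos])
```

With that shape contract satisfied, the [5000, 1] mismatch disappears, because each of the 5000 perturbed inputs gets its own prediction.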

@marcotcr

Yeah, the comments were definitely wrong, thanks for pointing it out.

@DianeBouchacourt

@courageon did you feed multiple samples in the end?

I am having the same error in my own code, and I don't get it when I use num_samples = 1 in the call to explain_instance. But then I get this warning from python3.6/site-packages/sklearn/linear_model/ridge.py:

Singular matrix in solving dual problem. Using least-squares solution instead.

which comes from the fact that I have only 1 sample, I guess? Any idea why the notebook runs fine in comparison, even though it seems to use 1 sample too?

Thanks!

@DianeBouchacourt

OK, I think I understand now: the 5000 samples are the perturbed inputs that explain_instance creates.

@whyisyoung

@DianeBouchacourt Hi, how did you solve the following warning?

Singular matrix in solving dual problem. Using least-squares solution instead.

I changed num_samples in the explain_instance() call to a smaller number (e.g., 32 or 500) and saw the warning, but num_samples = 1000 didn't produce it. I'm using ~2,500 samples for training and ~1,300 samples for testing, so I guess the default num_samples = 5000 is too big for my case.

@craigmassie

Similar issue here. With num_samples=100 I get the inconsistency [100, 10]; with num_samples=1000, the inconsistency is [1000, 100]. Any ideas? Code is here: https://github.com/craigmassie/MachineLearningParadigm/blob/master/VisAndExplain.ipynb

@sheisjw

sheisjw commented Jan 14, 2020

@courageon I have exactly the same issue. I defined a prediction function because model.predict only returns one value per data point. I encountered Found input variables with inconsistent numbers of samples: [5000, 1] when using explain_instance. Can you help here?

```python
def pred_f():
    def func2(text_sample):
        proba_yes = pipeline.predict(text_sample)[0][0]  # probability of "yes"
        proba_no = 1 - proba_yes
        print(proba_no, proba_yes)
        return np.array([[proba_no, proba_yes]])
    return func2

F = pred_f()
cnn_features = explainer.explain_instance(preprocess_text_minimal(text_sample), F, num_features=2)
```

@elliottash

elliottash commented Feb 3, 2020

Hi all,
I am also having this issue, using a binary classification. My pipeline is a text encoder followed by sklearn logit. And I get the listed error.
Any help would be appreciated and thanks!


EDIT: I fixed it. The problem was that my text encoder wouldn't work on lists of inputs.
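For anyone hitting the same wall with an encoder-plus-sklearn pipeline, one way to adapt an encoder that only handles a single string is to encode each perturbed text in a loop and stack the results before calling the classifier. Here encode_one and clf are hypothetical stand-ins for the actual encoder and fitted model:

```python
import numpy as np

def make_classifier_fn(encode_one, clf):
    """Wrap a single-string encoder and a fitted classifier into the
    batched callable that explain_instance expects."""
    def classifier_fn(texts):
        # Encode each perturbed text individually, then stack into a
        # (n_samples, n_features) matrix for the sklearn step.
        X = np.vstack([encode_one(t) for t in texts])
        return clf.predict_proba(X)
    return classifier_fn
```

Passing the resulting classifier_fn to explain_instance keeps the encoder single-input while still returning one probability row per perturbed sample.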

@tharix

tharix commented Sep 22, 2020

Found input variables with inconsistent number of samples: [123, 491]

I need help with this please

@13Ashu

13Ashu commented Dec 3, 2020

I faced the same issue.
It arises because our custom predict function often accepts input either as a str or as a list.
To solve it, you need to write a predict function that handles both types of input and returns the probabilities the same way predict_proba does.

My model always needs the input as a list. I have designed it so that passing the "predict_proba" hyperparameter gives the same output as model.predict_proba in sklearn models.

```python
def predict_prob(sent):
    if isinstance(sent, list):
        return model.predict(sent, predict_proba=True)
    elif isinstance(sent, str):
        return model.predict([sent], predict_proba=True)
    else:
        return "Some ERRORRRR"
```

@GladiatorX

Hello,
I too am facing a similar issue.
I am working on a text classification setup where my function predictSentiment outputs a probability distribution over 3 classes.

A sample output value of predictSentiment is [[7.97884561e-01 1.07981933e-01 2.67660452e-04]]; it is a numpy array, similar to the output of scikit-learn's predict_proba() function.

```python
def predictSentiment(sentence):
    # code to predict sentiment score
    return output

explainer = lime.lime_text.LimeTextExplainer(
    split_expression=lambda s: re.split(r'\W+', s),
    class_names=["NEGATIVE", "NEUTRAL", "POSITIVE"]
)

exp = explainer.explain_instance(
    sentence,  # the review to explain
    classifier_fn=predictSentiment,
    top_labels=1,
    num_features=4,
    num_samples=1000
)
```

My error: Found input variables with inconsistent numbers of samples: [1000, 1]
Kindly help. Thanks!

@marcotcr

predictSentiment should take as input a list of sentences rather than a single sentence.
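A batched version of predictSentiment might look like the sketch below, where score_one is a hypothetical stand-in for the model call that returns the three class probabilities for a single sentence:

```python
import numpy as np

def score_one(sentence):
    # Hypothetical single-sentence model call; returns
    # [P(NEGATIVE), P(NEUTRAL), P(POSITIVE)].
    return np.array([0.8, 0.15, 0.05])

def predictSentiment(sentences):
    # LIME passes all num_samples perturbed sentences at once, so return
    # one row of class probabilities per sentence: shape (n, 3).
    return np.vstack([score_one(s) for s in sentences])
```

With num_samples=1000, LIME will call this once with 1000 sentences and expect a (1000, 3) array back, which resolves the [1000, 1] mismatch.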

@deweihu96

Got a similar issue. I have a binary text classification model in PyTorch. When I set num_samples to 1, everything is OK, except that the model learns nothing and the weights of all features are zero. With num_samples at any other value I get this error: "ValueError: Found input variables with inconsistent numbers of samples", no matter how I tune it (e.g., 2, 32, 500, 1000, ...).

My code is here, could anyone help me? Thanks in advance: )
https://github.com/DeweiHu66666/Share/blob/main/caml_lime.ipynb
