ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all() #5

Closed
neerajnj10 opened this issue Aug 10, 2020 · 14 comments

Comments

@neerajnj10

I keep getting this error for every kind of model I run. I even tried the example Titanic dataset, and that threw the same error.
Is there an update in progress that could be causing this? explainerdashboard was working fine until two days ago. I would appreciate a quick response. :)
This is a really cool implementation and very useful in my current work environment. Thank you very much for working on this.

@oegedijk
Owner

Hi Neeraj,

Do you have the full stack trace? Where exactly does it throw the error? From the error itself it seems that somewhere in the code where I expect a type that resolves to a bool, I get a dataframe instead.

I didn't make any recent releases, so I would guess it's probably an incompatibility with your environment. Did you update any packages in the last two days? What OS are you using? What kind of model are you using, and which version? Which version of shap?

But happy that overall you find it useful! Let's get this error fixed! :)

@neerajnj10
Author

neerajnj10 commented Aug 10, 2020

Hi Oege,
Wow, thank you for replying so fast, I was not expecting it.
[image: screenshot of the error from the Titanic example]

Attached is the error I get when I run the titanic example.
I have run the individual components of the explainer in the Jupyter notebook and they work fine; the error is only thrown when I call ExplainerDashboard.



Below is the error when I run explainer on my dataset. It is a binary classification model and I am using lightgbm for that purpose.

the explainer object has no decision_trees property. so setting decision_trees=False...:
ValueError Traceback (most recent call last)
in
----> 1 ExplainerDashboard(explainer, mode='inline').run(8052)

~\Anaconda3\lib\site-packages\explainerdashboard\dashboards.py in __init__(self, explainer, tabs, title, hide_header, header_hide_title, header_hide_selector, block_selector_callbacks, pos_label, fluid, mode, width, height, external_stylesheets, server, url_base_pathname, importances, model_summary, contributions, shap_dependence, shap_interaction, decision_trees, **kwargs)
364 block_selector_callbacks=block_selector_callbacks,
365 pos_label=pos_label,
--> 366 fluid=fluid)
367 else:
368 tabs = self._convert_str_tabs(tabs)

~\Anaconda3\lib\site-packages\explainerdashboard\dashboards.py in __init__(self, explainer, tabs, title, hide_title, hide_selector, block_selector_callbacks, pos_label, fluid, **kwargs)
104
105 self.selector = PosLabelSelector(explainer, pos_label=pos_label)
--> 106 self.tabs = [instantiate_component(tab, explainer, **kwargs) for tab in tabs]
107 assert len(self.tabs) > 0, 'When passing a list to tabs, need to pass at least one valid tab!'
108

~\Anaconda3\lib\site-packages\explainerdashboard\dashboards.py in (.0)
104
105 self.selector = PosLabelSelector(explainer, pos_label=pos_label)
--> 106 self.tabs = [instantiate_component(tab, explainer, **kwargs) for tab in tabs]
107 assert len(self.tabs) > 0, 'When passing a list to tabs, need to pass at least one valid tab!'
108

~\Anaconda3\lib\site-packages\explainerdashboard\dashboards.py in instantiate_component(component, explainer, **kwargs)
48
49 if inspect.isclass(component) and issubclass(component, ExplainerComponent):
---> 50 return component(explainer, **kwargs)
51 elif isinstance(component, ExplainerComponent):
52 return component

~\Anaconda3\lib\site-packages\explainerdashboard\dashboard_tabs.py in __init__(self, explainer, title, name, hide_selector, importance_type, depth, cats)
38
39 self.importances = ImportancesComponent(explainer, hide_selector=hide_selector,
---> 40 importance_type=importance_type, depth=depth, cats=cats)
41
42 self.register_components(self.importances)

~\Anaconda3\lib\site-packages\explainerdashboard\dashboard_components\overview_components.py in __init__(self, explainer, title, name, hide_type, hide_depth, hide_cats, hide_title, hide_selector, pos_label, importance_type, depth, cats)
140 self.hide_title = hide_title
141 self.hide_selector = hide_selector
--> 142 if self.explainer.cats is None or not self.explainer.cats:
143 self.hide_cats = True
144

~\Anaconda3\lib\site-packages\pandas\core\indexes\base.py in __nonzero__(self)
2148 def __nonzero__(self):
2149 raise ValueError(
-> 2150 f"The truth value of a {type(self).__name__} is ambiguous. "
2151 "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
2152 )

ValueError: The truth value of a Index is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().


I am working in a Jupyter notebook with Python 3.7 on a Windows machine. I installed the package on Monday of last week, ran a sample test, and it worked fine, but yesterday it stopped working. I uninstalled and reinstalled it, which did not fix it.

@oegedijk
Owner

Hi Neeraj,

The first error could be related to running an old version of dash (the line self.server.register_blueprint( is line 402 in the current dash version: https://github.com/plotly/dash/blob/dev/dash/dash.py, but seems to be line 154 in your version)

When the server parameter is kept to the default (server=True) then dash.Dash() should instantiate a flask app in line 279: self.server = flask.Flask(name) if server else None, in which case self.server is no longer bool.

So I'm not sure what's going on in your case, but I guess it's an old version. Could you pip install -U dash and see if it helps?

How did you construct the explainer for the second error? explainer.cats should be a list of strings that you passed to the constructor. But from the error it seems that in your example explainer.cats is either a pd.Series or a pd.DataFrame?
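
A minimal sketch of why a pandas object in cats trips the `not self.explainer.cats` check shown in the traceback, and how converting to a plain list avoids it. The feature names are hypothetical, and this only mirrors the condition from the traceback rather than the package's actual code path:

```python
import pandas as pd

# Hypothetical feature names, stored as a pandas Index instead of a list.
cats_as_index = pd.Index(["sex", "deck"])

# Same shape of condition as in the traceback: `x is None or not x`.
# `not` asks pandas for a single truth value, which it refuses to give.
raised = False
message = ""
try:
    if cats_as_index is None or not cats_as_index:
        pass
except ValueError as err:
    raised = True
    message = str(err)

# A plain list of strings has an unambiguous truth value (empty or not).
cats = list(cats_as_index)
hide_cats = cats is None or not cats

print(raised, hide_cats)
```

Converting with list(...) before constructing the explainer keeps the check well defined regardless of where the names came from.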

@neerajnj10
Author

Hi Oege,

Thanks again!
Indeed, upgrading dash resolved the issue for the first example. And you were right about the second one: I had passed a pd.Series without checking the exact format that was needed. Passing cats as a list of strings resolved it. Thank you so very much!

I have one more question, though. The LIME explainer needs the data in a certain format: for example, when the categories are label-encoded, it needs a dictionary of those label encodings so that it can determine that "sex" is a categorical feature and that 1 means Male.
Do we need something like that here as well? Do the train or test sets need to be in np.array format, or does it not matter?

In the Titanic example, the categorical features seem to be one-hot encoded. Do we need to do that in all cases?

PS: do you also know how to share the dashboard quickly? Is it supposed to be deployed on Heroku or something?
Thank you for responding so fast!

Best,
Neeraj

@oegedijk
Owner

Yeah, the cats parameter assumes that you have already one-hot encoded your variables with underscores (varname_category), e.g. sex_male, sex_female, etc., and then autodetects the categories.
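
As an illustration of the varname_category naming described above, here is a small sketch using pd.get_dummies, whose default separator is an underscore. The toy data is made up, and passing cats=["sex"] afterwards is an assumption based on how cats is used elsewhere in this thread:

```python
import pandas as pd

# Toy data (made up) with one categorical column.
df = pd.DataFrame({
    "age": [22, 38, 26],
    "sex": ["male", "female", "female"],
})

# get_dummies names the new columns "<column>_<category>" by default.
X = pd.get_dummies(df, columns=["sex"])
onehot_columns = list(X.columns)

# The base variable name is what would then be listed in cats.
cats = ["sex"]

print(onehot_columns)
```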

In order to share the dashboard you need to deploy it somewhere. You should talk to IT within your organization to see if they have a server available to host it. The deployment section of the docs gives some info on how to do it. Otherwise, see the dash deployment documentation.

You'd probably also want to think about adding authentication of some kind (I will probably add this to the package in the near future as well): https://dash.plotly.com/authentication

@hkoppen

hkoppen commented Jan 27, 2021

I have a simple dataset & a SVR. However, explainer = RegressionExplainer(model, X_test, pd.Series(y_test)) yields

ValueError                                Traceback (most recent call last)
<ipython-input-14-d5cec15fc8f7> in <module>
----> 1 explainer = RegressionExplainer(model, X_test, pd.Series(y_test))

c:\users\...\pandas\core\series.py in __init__(self, data, index, dtype, name, copy, fastpath)
    237             name = ibase.maybe_extract_name(name, data, type(self))
    238 
--> 239             if is_empty_data(data) and dtype is None:
    240                 # gh-17261
    241                 warnings.warn(

c:\users\...\pandas\core\construction.py in is_empty_data(data)
    626     is_none = data is None
    627     is_list_like_without_dtype = is_list_like(data) and not hasattr(data, "dtype")
--> 628     is_simple_empty = is_list_like_without_dtype and not data
    629     return is_none or is_simple_empty
    630 

c:\users\...\pandas\core\generic.py in __nonzero__(self)
   1438     def __nonzero__(self):
   1439         raise ValueError(
-> 1440             f"The truth value of a {type(self).__name__} is ambiguous. "
   1441             "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
   1442         )

ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

What does it mean, and what could be the problem?

@oegedijk
Owner

oegedijk commented Jan 27, 2021

Do you have a deeper stacktrace? Not clear to me where this error actually originates... does it happen when you just wrap y_test in a pd.Series: pd.Series(y_test)?

@oegedijk oegedijk reopened this Jan 27, 2021
@hkoppen

hkoppen commented Jan 27, 2021

That's everything. Without the wrap (X and y come from pd.read_csv()) it is

ValueError                                Traceback (most recent call last)
<ipython-input-4-29b3018c6d0d> in <module>
      1 # Generate explainer object
----> 2 explainer = RegressionExplainer(model, X_test, y_test, cats=['Kurs', 'FTF', 'Wochentag'])

c:\users\...\explainerdashboard\explainers.py in __init__(self, model, X, y, permutation_metric, shap, X_background, model_output, cats, idxs, index_name, target, descriptions, n_jobs, permutation_cv, na_fill, precision, units)
   2451                             shap, X_background, model_output,
   2452                             cats, idxs, index_name, target, descriptions,
-> 2453                             n_jobs, permutation_cv, na_fill, precision)
   2454 
   2455         self._params_dict = {**self._params_dict, **dict(units=units)}

c:\users\...\explainerdashboard\explainers.py in __init__(self, model, X, y, permutation_metric, shap, X_background, model_output, cats, idxs, index_name, target, descriptions, n_jobs, permutation_cv, na_fill, precision)
    160 
    161         if y is not None:
--> 162             self.y = pd.Series(y).astype(precision)
    163             self.y_missing = False
    164         else:

c:\users\...\pandas\core\series.py in __init__(self, data, index, dtype, name, copy, fastpath)
    229             name = ibase.maybe_extract_name(name, data, type(self))
    230 
--> 231             if is_empty_data(data) and dtype is None:
    232                 # gh-17261
    233                 warnings.warn(

c:\users\...\pandas\core\construction.py in is_empty_data(data)
    589     is_none = data is None
    590     is_list_like_without_dtype = is_list_like(data) and not hasattr(data, "dtype")
--> 591     is_simple_empty = is_list_like_without_dtype and not data
    592     return is_none or is_simple_empty
    593 

c:\users\...\pandas\core\generic.py in __nonzero__(self)
   1325     def __nonzero__(self):
   1326         raise ValueError(
-> 1327             f"The truth value of a {type(self).__name__} is ambiguous. "
   1328             "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
   1329         )

ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

@oegedijk
Owner

ah, so the issue is in the line self.y = pd.Series(y).astype(precision). The precision parameter is set to 'float64' by default, but can be set to 'float32' to save on memory use. (The 0.3 release is all about saving on memory usage in production). But clearly there is something about your y_test that does not allow it to be cast to a float64 dtype.

Are there any nan's in your y? What is the dtype?
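
For reference, a rough sketch of what the precision cast does to memory for a well-formed, one-dimensional y. The values are synthetic and this only reproduces the pd.Series(y).astype(precision) line from the traceback, not the rest of the explainer:

```python
import numpy as np
import pandas as pd

# Synthetic one-dimensional target with 1000 values.
y = np.linspace(0.0, 1.0, 1000)

# Same cast as in the traceback, once per precision setting.
y64 = pd.Series(y).astype("float64")
y32 = pd.Series(y).astype("float32")

# nbytes counts only the values, not the index.
bytes64 = y64.nbytes
bytes32 = y32.nbytes

print(bytes64, bytes32)
```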

@hkoppen

hkoppen commented Jan 27, 2021

No nan's.

y_test.dtypes returns float64 only. pd.Series(y_test).dtypes throws the same ambiguity error...

@hkoppen

hkoppen commented Jan 27, 2021

... ah, hence I have to use np.array(y_test)[:,0].

@oegedijk
Owner

Ah, so your y_test was not one-dimensional? Usually you would get a dimensionality error though:

pd.Series(np.ones((1, 10))).astype('float32')

---------------------------------------------------------------------------
ValueError: Data must be 1-dimensional

Is there something about your input data that I could autodetect and then correct for?

@hkoppen

hkoppen commented Jan 28, 2021

It's a dataframe of shape (1000, 1), i.e. np.array interprets it as a 1000x1 matrix. Maybe pandas.DataFrame.squeeze is the way to go here?
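
A short sketch of the squeeze idea on a synthetic (1000, 1) DataFrame, compared with the np.array(y_test)[:,0] workaround from above. The column name "target" is a placeholder:

```python
import numpy as np
import pandas as pd

# Synthetic single-column DataFrame, like a y read with pd.read_csv().
y_df = pd.DataFrame({"target": np.arange(1000, dtype="float64")})

# Squeeze only the column axis, so the result is always a Series,
# even if the frame happens to contain a single row.
y_squeezed = y_df.squeeze("columns")

# The workaround from earlier in the thread gives the same values.
y_sliced = pd.Series(np.array(y_df)[:, 0])

print(y_df.shape, y_squeezed.shape)
```

Passing the axis explicitly matters: a bare squeeze() on a one-row, one-column frame returns a scalar rather than a Series.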

@oegedijk
Owner

hmm. I could just put in an assertion assert isinstance(y, pd.Series) or isinstance(y, np.ndarray) or isinstance(y, list)

Or just wrap it in a try ... except block and give a more useful error message...
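
One possible shape for such a check, as a sketch only. coerce_target is a hypothetical helper, not something that exists in explainerdashboard, and the error wording is made up:

```python
import numpy as np
import pandas as pd


def coerce_target(y, precision="float64"):
    """Hypothetical helper: return y as a 1-D Series or fail with a clear message."""
    if isinstance(y, pd.DataFrame):
        # Accept a single-column DataFrame by taking that column.
        if y.shape[1] != 1:
            raise ValueError(
                f"y must be one-dimensional, got a DataFrame with {y.shape[1]} columns"
            )
        y = y.iloc[:, 0]
    elif isinstance(y, np.ndarray):
        # Accept an (n, 1) array by taking its only column.
        if y.ndim == 2 and y.shape[1] == 1:
            y = y[:, 0]
        elif y.ndim != 1:
            raise ValueError(f"y must be one-dimensional, got shape {y.shape}")
    elif not isinstance(y, (pd.Series, list)):
        raise TypeError(
            f"y must be a pd.Series, np.ndarray or list, got {type(y).__name__}"
        )
    return pd.Series(y).astype(precision)


y = coerce_target(pd.DataFrame({"target": [1, 2, 3]}))
print(y.dtype, y.shape)
```

Whether to silently accept a single-column DataFrame or reject it with a message is a design choice; the sketch accepts it because that is the case reported here.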

@oegedijk oegedijk closed this as completed Feb 3, 2021