Index Dropdown has a bug for large no. of values #254

Closed
AkshayRShiraguppi opened this issue Feb 10, 2023 · 10 comments

@AkshayRShiraguppi

I am not able to type and search for a specific value in the search box on the 'whatif' and 'Individual prediction' tabs to choose an index. (When I type a specific number, the dropdown just shows the wrong values and clears what I typed.) The Random button seems to work fine.

The dropdown doesn't work if there are a lot of index values (~2k). Selecting an index in it doesn't accurately filter to the right id.

AkshayRShiraguppi changed the title from "Index Dropdown has a big for large no. of values" to "Index Dropdown has a bug for large no. of values" on Feb 10, 2023
@oegedijk
Owner

oegedijk commented Feb 12, 2023

Hi @AkshayRShiraguppi, when there are a large number of indexes (by default >1000) the dashboard defaults to server side index filtering, as it would become too heavy to store all the indexes client side inside the browser for each index dropdown.

But apparently this is not working well for you? You can configure the behaviour with the parameter max_idxs_in_dropdown:int=1000, which you can pass e.g. as ExplainerDashboard(explainer, max_idx_in_dropdown=10_000).run().

What do your indexes look like though that the server side filtering doesn't seem to work? Do you have some examples?

Does it work with the titanic examples (such as in the README) if you set e.g. max_idx_in_dropdown=10?
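
For illustration, server side filtering along the lines described above might look roughly like the following. This is a minimal sketch under stated assumptions, not explainerdashboard's actual implementation: the function name `filter_index_options` and its `limit` parameter are hypothetical, and the option format (a list of label/value dicts) is simply the shape a Dash dropdown accepts.

```python
from typing import Any, Dict, Iterable, List


def filter_index_options(
    all_idxs: Iterable[Any], search_value: str, limit: int = 100
) -> List[Dict[str, Any]]:
    """Return at most `limit` dropdown options whose label contains the search text.

    Hypothetical helper: the full index list stays on the server and only
    the matching options are sent to the browser.
    """
    if not search_value:
        # Nothing typed yet: send no options instead of the full list.
        return []
    needle = str(search_value).lower()
    matches: List[Dict[str, Any]] = []
    for idx in all_idxs:
        label = str(idx)
        if needle in label.lower():
            matches.append({"label": label, "value": idx})
            if len(matches) >= limit:
                break
    return matches


# Example: 2000 default integer indexes, user types "199" in the search box.
options = filter_index_options(range(2000), "199", limit=5)
print([option["value"] for option in options])
```

The point of the design is that the browser only ever receives a handful of matching options, however many indexes the explainer holds.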

@AkshayRShiraguppi
Author

AkshayRShiraguppi commented Feb 13, 2023

Hi @oegedijk ,
Thanks for getting back. Here is the gif screenshot. The index is just the default index of the dataframe, yet you can see it doesn't behave as expected. I hosted this dashboard by passing max_idxs_in_dropdown:int=10000 to ExplainerDashboard().
It doesn't work with or without server side filtering.

Titanic example works fine. I presumed it would since it was tested on it.

Code:


explainer = ClassifierExplainer(loaded_model, X_test2, y_test_pred, cats=['catvariables'], labels=['Not Cancelling', 'Cancelling'])

explainer.dump("explainer.joblib")

explainer = ClassifierExplainer.from_file("explainer.joblib")


db = ExplainerDashboard(
    explainer,
    max_idx_in_dropdown=10000,
    bootstrap=SLATE,
    header_hide_selector=True,
    hide_poweredby=1,
    # mode='inline',
    target=['Cancelled'],
    model_summary=False,
).run(use_waitress=True)

@oegedijk
Owner

Hmm, that is really strange. First I thought it was maybe due to the numeric indexes, but e.g. this seems to work just fine:

from sklearn.ensemble import RandomForestClassifier
from explainerdashboard import ClassifierExplainer, ExplainerDashboard
from explainerdashboard.datasets import titanic_survive

X_train, y_train, X_test, y_test = titanic_survive()
model = RandomForestClassifier().fit(X_train, y_train)
explainer = ClassifierExplainer(
    model, X_test.reset_index(drop=True), y_test, 
    cats=['Sex', 'Deck', 'Embarked'],
    labels=['Not survived', 'Survived']
)

ExplainerDashboard(explainer, max_idx_in_dropdown=10).run()

Can you reproduce it with a dataset that I could use as well?

@AkshayRShiraguppi
Author

AkshayRShiraguppi commented Feb 14, 2023

I assume it has something to do with server side filtering, as the threshold is 1000 records.
The dropdown and selection work fine with 1000 records. Anything more than 1000 causes the issue.

I think the parameter max_idx_in_dropdown isn't really working to stop server side filtering.
I had 1005 records and used

ExplainerDashboard(explainer, max_idx_in_dropdown=1005).run()

But I still have the same issue.

@AkshayRShiraguppi
Author

AkshayRShiraguppi commented Feb 14, 2023

Here is the code that can replicate the issue. It's the same titanic code; I just duplicated the rows to get 2000 records. @oegedijk

from sklearn.ensemble import RandomForestClassifier
from explainerdashboard import ClassifierExplainer, ExplainerDashboard
from explainerdashboard.datasets import titanic_survive, titanic_names

feature_descriptions = {
    "Sex": "Gender of passenger",
    "Gender": "Gender of passenger",
    "Deck": "The deck the passenger had their cabin on",
    "PassengerClass": "The class of the ticket: 1st, 2nd or 3rd class",
    "Fare": "The amount of money people paid", 
    "Embarked": "the port where the passenger boarded the Titanic. Either Southampton, Cherbourg or Queenstown",
    "Age": "Age of the passenger",
    "No_of_siblings_plus_spouses_on_board": "The sum of the number of siblings plus the number of spouses on board",
    "No_of_parents_plus_children_on_board" : "The sum of the number of parents plus the number of children on board",
}

X_train, y_train, X_test, y_test = titanic_survive()
train_names, test_names = titanic_names()
model = RandomForestClassifier(n_estimators=50, max_depth=5)
model.fit(X_train, y_train)

import pandas as pd
import numpy as np
X_test1 = pd.DataFrame(np.repeat(X_test.values, 10, axis=0))
X_test1.columns = X_test.columns
y_test1=y_test.repeat(10)


explainer = ClassifierExplainer(model, X_test1, y_test1, 
                                cats=['Deck', 'Embarked',
                                    {'Gender': ['Sex_male', 'Sex_female', 'Sex_nan']}],
                                cats_notencoded={'Embarked': 'Stowaway'}, # defaults to 'NOT_ENCODED'
                                descriptions=feature_descriptions, # adds a table and hover labels to dashboard
                                labels=['Not survived', 'Survived'], # defaults to ['0', '1', etc]
                                #idxs = test_names, # defaults to X.index
                                index_name = "Passenger", # defaults to X.index.name
                                target = "Survival", # defaults to y.name
                                )

db = ExplainerDashboard(explainer, max_idx_in_dropdown=2000,
                        title="Titanic Explainer", # defaults to "Model Explainer"
                        shap_interaction=False # you can switch off tabs with bools
                        #, mode='inline'
                       ).run()


@oegedijk
Owner

Okay, I haven't solved it yet, but I have at least isolated it: updating the dropdown options based on the search value triggers the random index button callback. That's why you get the weird behaviour.

Why it triggers it is still very mysterious to me though. Will investigate more later.

@AkshayRShiraguppi
Author

Thanks. By the way, awesome job on creating this explainer dashboard. It's helping us a lot to study and showcase model explainability. It would be even better if there were lightgbm trees and new shapley plots.

For now, I will have to create two dashboards, with 1k records in one and the remaining 1k in the other, to make it work.
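
The split described above can be sketched as follows. This is only an illustration in plain Python: the helper name `split_into_chunks` is hypothetical, and the resulting row positions would still need to be used to slice the test set (for example with `DataFrame.iloc`) before building one explainer and one dashboard per chunk.

```python
from typing import List, Sequence


def split_into_chunks(row_positions: Sequence[int], chunk_size: int = 1000) -> List[List[int]]:
    """Split row positions into consecutive chunks of at most `chunk_size` rows.

    Hypothetical helper: each chunk stays at or below the size at which the
    dropdown was reported to work.
    """
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    rows = list(row_positions)
    return [rows[start:start + chunk_size] for start in range(0, len(rows), chunk_size)]


# Example: 2000 records split into two groups of 1000 row positions each.
chunks = split_into_chunks(range(2000), chunk_size=1000)
for number, chunk in enumerate(chunks, start=1):
    print(f"dashboard {number}: rows {chunk[0]} to {chunk[-1]}")
```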

@oegedijk
Owner

Update: I managed to fix the problem with the dropdown search (also opened an issue with dash here: plotly/dash#2428), but I'm still running into problems properly propagating the index to other connected components. This is weirdly stochastic: sometimes it works, sometimes it doesn't.

I think it is probably related to the first issue, but haven't been able to fix it yet...

@oegedijk
Owner

So this seems to be a dash issue that got introduced in a recent version (see e.g. plotly/dash#2411), but I think running pip install dash==2.6.2 should fix the issue. Could you try and confirm?
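
To confirm which dash version is actually installed in the environment that runs the dashboard, a quick check along these lines may help. This is a minimal sketch using only the standard library; the helper names `installed_version` and `matches_pin` are hypothetical and not part of dash or explainerdashboard.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version string of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def matches_pin(installed: Optional[str], pinned: str = "2.6.2") -> bool:
    """Return True only if the installed version equals the pinned version exactly."""
    return installed is not None and installed == pinned


# Example: report whether the suggested pin is the version in use.
dash_version = installed_version("dash")
print(f"dash version: {dash_version}, matches pin: {matches_pin(dash_version)}")
```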

@AkshayRShiraguppi
Author

Yes, thank you. That fixes the ids getting picked at random.

There is still the issue of the dropdown not listing all ids, but the parameter max_idx_in_dropdown helped to fix that.

Thank you for your valuable time in debugging this. Appreciate it.
