Index Dropdown has a bug for large no. of values #254

AkshayRShiraguppi · 2023-02-10T01:21:16Z

I am not able to type and search for a specific value in the search box in the 'whatif' and 'Individual prediction' tabs to choose an index. (When I type a specific number, the dropdown just shows wrong values and clears what I typed.) The Random button seems to work fine.

The dropdown doesn't work when there are a lot of index values (~2k). Selecting an index in it doesn't reliably filter to the right id.
@oegedijk

oegedijk · 2023-02-12T13:25:46Z

Hi @AkshayRShiraguppi, when there is a large number of indexes (by default >1000) the dashboard defaults to server-side index filtering, as it would become too heavy to store all the indexes client-side in the browser for each index dropdown.

But apparently this is not working well for you? You can configure the behaviour with the parameter max_idxs_in_dropdown:int=1000, which you can e.g. pass to ExplainerDashboard(explainer, max_idx_in_dropdown=10_000).run().

What do your indexes look like though that the server side filtering doesn't seem to work? Do you have some examples?

Does it work with the titanic examples (such as in the README) if you set e.g. max_idx_in_dropdown=10?
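
To make the server-side filtering described above concrete, here is a minimal sketch in plain Python. It is not explainerdashboard's actual implementation: the function names `use_server_side_filtering` and `filter_index_options`, and the `threshold` and `limit` parameters, are hypothetical. The idea is that above the threshold the browser only receives the options that match the current search string, and the currently selected value is kept among the options so the dropdown does not discard it.

```python
from typing import Dict, List, Optional, Sequence


def use_server_side_filtering(n_idxs: int, threshold: int = 1000) -> bool:
    """Return True when there are too many indexes to ship to the browser."""
    return n_idxs > threshold


def filter_index_options(
    idxs: Sequence[str],
    search_value: Optional[str],
    current_value: Optional[str] = None,
    limit: int = 50,
) -> List[Dict[str, str]]:
    """Build dropdown options for the indexes that contain the typed text."""
    matches: List[str] = []
    if search_value:
        needle = search_value.lower()
        # Case-insensitive substring match, capped at `limit` results.
        matches = [idx for idx in idxs if needle in idx.lower()][:limit]
    # Keep the selected value among the options so it is not dropped.
    if current_value is not None and current_value not in matches:
        matches.insert(0, current_value)
    return [{"label": idx, "value": idx} for idx in matches]
```

In a Dash app a function like this would typically sit inside a callback that takes the dropdown's `search_value` as input and writes to its `options`; that wiring is omitted here.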

AkshayRShiraguppi · 2023-02-13T21:48:10Z

Hi @oegedijk ,
Thanks for getting back to me. Here is the gif screenshot. The index is just the default index of the dataframe, yet you can see it doesn't behave as expected. I hosted this dashboard by passing max_idxs_in_dropdown:int=10000 to ExplainerDashboard().
It doesn't work with or without server-side filtering.

Titanic example works fine. I presumed it would since it was tested on it.

Code:


explainer = ClassifierExplainer(loaded_model, X_test2, y_test_pred,cats=['catvariables'],labels=['Not Cancelling', 'Cancelling'])

explainer.dump("explainer.joblib")

explainer = ClassifierExplainer.from_file("explainer.joblib")


db = ExplainerDashboard(
    explainer,
    max_idx_in_dropdown=10000,
    bootstrap=SLATE,
    header_hide_selector=True,
    hide_poweredby=1,
    # mode='inline',
    target=['Cancelled'],
    model_summary=False,
).run(use_waitress=True)

oegedijk · 2023-02-14T19:14:44Z

Hmm, that is really strange. First I thought it was maybe due to the numeric indexes, but e.g. this seems to work just fine:

from sklearn.ensemble import RandomForestClassifier
from explainerdashboard import ClassifierExplainer, ExplainerDashboard
from explainerdashboard.datasets import titanic_survive

X_train, y_train, X_test, y_test = titanic_survive()
model = RandomForestClassifier().fit(X_train, y_train)
explainer = ClassifierExplainer(
    model, X_test.reset_index(drop=True), y_test, 
    cats=['Sex', 'Deck', 'Embarked'],
    labels=['Not survived', 'Survived']
)

ExplainerDashboard(explainer, max_idx_in_dropdown=10).run()

Can you reproduce it with a dataset that I could use as well?

AkshayRShiraguppi · 2023-02-14T19:50:39Z

I assume it definitely has something to do with server-side filtering, as the sweet spot is 1000 records.
The dropdown and selection work fine with 1000 records. Anything more than 1000 causes the issue.

I think the parameter max_idx_in_dropdown isn't really working to stop server side filtering.
I had 1005 records and used

ExplainerDashboard(explainer, max_idx_in_dropdown=1005).run()

But I still have the same issue.

AkshayRShiraguppi · 2023-02-14T20:08:46Z

Here is code that replicates the issue. It's the same titanic code; I just duplicated the rows to get to 2000 records. @oegedijk

from sklearn.ensemble import RandomForestClassifier
from explainerdashboard import ClassifierExplainer, ExplainerDashboard
from explainerdashboard.datasets import titanic_survive, titanic_names

feature_descriptions = {
    "Sex": "Gender of passenger",
    "Gender": "Gender of passenger",
    "Deck": "The deck the passenger had their cabin on",
    "PassengerClass": "The class of the ticket: 1st, 2nd or 3rd class",
    "Fare": "The amount of money people paid", 
    "Embarked": "the port where the passenger boarded the Titanic. Either Southampton, Cherbourg or Queenstown",
    "Age": "Age of the passenger",
    "No_of_siblings_plus_spouses_on_board": "The sum of the number of siblings plus the number of spouses on board",
    "No_of_parents_plus_children_on_board" : "The sum of the number of parents plus the number of children on board",
}

X_train, y_train, X_test, y_test = titanic_survive()
train_names, test_names = titanic_names()
model = RandomForestClassifier(n_estimators=50, max_depth=5)
model.fit(X_train, y_train)

import pandas as pd
import numpy as np
X_test1 = pd.DataFrame(np.repeat(X_test.values, 10, axis=0))
X_test1.columns = X_test.columns
y_test1=y_test.repeat(10)


explainer = ClassifierExplainer(model, X_test1, y_test1, 
                                cats=['Deck', 'Embarked',
                                    {'Gender': ['Sex_male', 'Sex_female', 'Sex_nan']}],
                                cats_notencoded={'Embarked': 'Stowaway'}, # defaults to 'NOT_ENCODED'
                                descriptions=feature_descriptions, # adds a table and hover labels to dashboard
                                labels=['Not survived', 'Survived'], # defaults to ['0', '1', etc]
                                #idxs = test_names, # defaults to X.index
                                index_name = "Passenger", # defaults to X.index.name
                                target = "Survival", # defaults to y.name
                                )

db = ExplainerDashboard(explainer, max_idx_in_dropdown=2000,
                        title="Titanic Explainer", # defaults to "Model Explainer"
                        shap_interaction=False # you can switch off tabs with bools
                        #, mode='inline'
                       ).run()

oegedijk · 2023-02-15T22:29:37Z

Okay, I haven't solved it yet, but at least I've isolated it: updating the dropdown options based on the search value triggers the random index button callback for some unknown reason. That's why you get the weird behaviour.

Why it triggers it is still very mysterious to me though. Will investigate more later
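
One common way to guard against this kind of spurious trigger in Dash is to check which input actually fired before doing any work. The sketch below is not the fix that was eventually applied, and the thread does not establish whether it would have helped with this particular problem. It mirrors the shape of `dash.callback_context.triggered` (a list of dicts with a `prop_id` key of the form `component_id.property`) in plain Python so it can be run without a Dash app. The component id `random-index-button` and the helper name `random_button_fired` are hypothetical.

```python
from typing import Dict, List, Optional


def random_button_fired(
    triggered: List[Dict[str, object]],
    n_clicks: Optional[int],
    button_id: str = "random-index-button",
) -> bool:
    """Return True only when the random-index button was really clicked."""
    if not n_clicks:  # None or 0: the button has never been pressed
        return False
    expected = f"{button_id}.n_clicks"
    # Only accept the call if the button's n_clicks is among the triggers.
    return any(entry.get("prop_id") == expected for entry in triggered)
```

Inside a real callback, a False result would usually be followed by `raise dash.exceptions.PreventUpdate` so that the current index is left untouched.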

AkshayRShiraguppi · 2023-02-15T23:05:10Z

Thanks. By the way awesome job on creating this explainer dashboard. It's helping us a lot to study and showcase model explainability. Would be better if there were lightgbm trees and new shapley plots.

For now, I will have to create two dashboards, with 1k records in one and the remaining 1k in the other, to make it work.
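
The splitting workaround amounts to partitioning the rows into batches that stay at or below the dropdown threshold. A minimal sketch, assuming a threshold of 1000 rows per dashboard; the helper name `split_into_batches` is hypothetical, and turning each batch into its own explainer and dashboard is left out.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def split_into_batches(items: Sequence[T], batch_size: int = 1000) -> List[Sequence[T]]:
    """Split items into consecutive batches of at most batch_size elements."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]


# Row positions for a 2000-row test set end up in two batches of 1000.
row_positions = list(range(2000))
batches = split_into_batches(row_positions)
```

Each batch of positions could then be used to select rows from the test set, e.g. with `X_test1.iloc[batch]` and the matching slice of `y_test1`.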

oegedijk · 2023-02-17T12:22:51Z

Update: I managed to fix the problem with the dropdown search (and also opened an issue with dash here: plotly/dash#2428), but I am still running into problems properly propagating the index to other connected components. This is weirdly stochastic: sometimes it works, sometimes it doesn't.

I think it is probably related to the first issue, but haven't been able to fix it yet...

oegedijk · 2023-02-17T19:54:30Z

So this seems to be a dash issue that got introduced in a recent version (see e.g. plotly/dash#2411), but I think running pip install dash==2.6.2 should fix the issue. Could you try and confirm?
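
To confirm that an environment really picked up the pinned version, the installed package version can be checked from Python. This is a minimal sketch using only the standard library (`importlib.metadata`, Python 3.8+); the helper names `installed_version` and `matches_pin` are hypothetical, and the only version assumed here is the 2.6.2 pin suggested above.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a package, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def matches_pin(package: str, pinned: str) -> bool:
    """Return True when the installed version equals the pinned version string."""
    return installed_version(package) == pinned


if __name__ == "__main__":
    if not matches_pin("dash", "2.6.2"):
        print(f"dash version is {installed_version('dash')}, expected 2.6.2")
```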

AkshayRShiraguppi · 2023-02-17T20:29:10Z

Yes. Thank you. That fixes the ids getting picked at random.

It still has the issue with the dropdown not listing all ids, but the parameter max_idx_in_dropdown helped to fix that.

Thank you for your valuable time in debugging this. Appreciate it.

AkshayRShiraguppi changed the title ~~Index Dropdown has a big for large no. of values~~ Index Dropdown has a bug for large no. of values Feb 10, 2023

AkshayRShiraguppi closed this as completed Feb 17, 2023
