Adding plot functionality to model_performance #34

Closed
patri01u opened this issue Jul 29, 2018 · 4 comments

@patri01u
commented Jul 29, 2018

When plotting the output of the model_performance function, would it be possible to limit the x-axis values, and to facet by selected model factors, in order to drill down into the specific factors that drive the overall residuals?

Apologies in advance if this functionality already exists. #Beginnerhere

@pbiecek

Owner

commented Jul 31, 2018

The result of the plot function is a ggplot2 object, so you can use xlim(), ylim(), or other functions to zoom in on part of the plot.

But I see your point: for model_performance it would be nice to have annotations for the top k residuals. I will add this to the TODO list.

@pbiecek pbiecek added the enhancement label Jul 31, 2018

pbiecek added a commit that referenced this issue Aug 5, 2018

@pbiecek

Owner

commented Aug 5, 2018

I've added a show_outliers parameter to plot.model_performance(); you can now plot the names of the points with the largest residuals.
See an example here: https://pbiecek.github.io/DALEX/reference/plot.model_performance_explainer.html

@12tafran

Contributor

commented Oct 1, 2018

Would it be more meaningful for the names of the points with the largest residuals to correspond to the row index of the observed data? This would make it easier for users to identify which observation has the worst prediction.

Take https://pbiecek.github.io/DALEX/reference/plot.model_performance_explainer.html as an example. We gave the largest residual in the glm model a name of 100,110. This is very confusing to users, since the validation dataset has only 14,999 rows.

Would it be possible to add an index column to the model_performance() output, so that the boxplot can use that index number when identifying the largest residuals instead of the number it uses now?
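
To make the distinction concrete, here is a minimal sketch in plain Python of the two labelling schemes. It is not DALEX code and does not reflect how DALEX is implemented; the function name `top_k_residuals` and the sample data are hypothetical, chosen only to show how a row name inherited from a larger dataset can differ from the row's position in the validation subset.

```python
def top_k_residuals(observed, predicted, row_names, k=3):
    """Return (position, row_name, residual) for the k largest absolute residuals.

    position is the 1-based row index within the data passed in;
    row_name is whatever label the row carried over from the original data.
    """
    residuals = [y - y_hat for y, y_hat in zip(observed, predicted)]
    # Rank row positions by absolute residual, largest first.
    order = sorted(range(len(residuals)),
                   key=lambda i: abs(residuals[i]),
                   reverse=True)
    return [(i + 1, row_names[i], residuals[i]) for i in order[:k]]


# Hypothetical validation subset: the row names come from a larger dataset,
# so they can exceed the number of rows in the subset itself.
observed = [1.0, 0.0, 1.0, 0.0, 1.0]
predicted = [0.9, 0.2, 0.1, 0.05, 0.6]
row_names = ["20457", "311", "100110", "98002", "15"]

for position, name, residual in top_k_residuals(observed, predicted, row_names, k=2):
    print(f"row index {position}, row name {name}, residual {residual:+.2f}")
```

Returning both the position and the row name leaves the choice of label to the plotting step, which is the kind of option requested here.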

@pbiecek

Owner

commented Oct 1, 2018

Makes sense. I will add an option to select whether row names or row indexes should be shown.

12tafran pushed a commit to 12tafran/DALEX that referenced this issue Oct 1, 2018

12tafran pushed a commit to 12tafran/DALEX that referenced this issue Oct 1, 2018

pbiecek added a commit that referenced this issue Oct 7, 2018

Merge pull request #47 from 12tafran/develop
Adding plot functionality to model_performance #34

@pbiecek pbiecek closed this Dec 18, 2018
