Partial dependence plots with tidymodels and DALEX for #TidyTuesday Mario Kart world records | Julia Silge #32
Comments
Thanks for this. Do the bootstraps replicate the training data in this case? |
@csetzkorn Yes, that's right. Since we use |
Thanks for the reply. So does this mean that you replicate data to train on? I guess this cannot bias the model? I always thought that may be a great way of tackling the curse of dimensionality ... |
@csetzkorn I'm not totally clear on your question, but you might check this chapter of our book, and especially pay attention to the rsample-to-resample effect; it may be related to your thoughts. |
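For readers following the bootstrap question above, here is a minimal base R sketch of what a single bootstrap resample looks like. It is only an illustration under stated assumptions: the row count `n` and the object names are hypothetical, and it does not call `rsample` itself. It shows that a bootstrap sample is drawn with replacement and has the same size as the original data, so some rows repeat while others are left out for assessment.

```r
# Base R sketch of one bootstrap resample (illustration only; object names
# and the row count are hypothetical, and rsample is not used here).
set.seed(123)
n <- 1000
train_ids <- seq_len(n)

# "Analysis" set: n row ids drawn with replacement, so some ids repeat.
analysis_ids <- sample(train_ids, size = n, replace = TRUE)

# "Assessment" (out-of-bag) set: row ids that were never drawn.
assessment_ids <- setdiff(train_ids, analysis_ids)

length(analysis_ids)                # same size as the original data
length(unique(analysis_ids)) / n    # share of distinct rows, typically near 0.632
length(assessment_ids) / n          # share of held-out rows, typically near 0.368
```

The repeated rows add no new information, which is why each resample is paired with its held-out rows when performance is estimated.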
Hi Julia, I'm learning a lot. Thanks. That should be pretty interesting to see. Thanks! |
@Ji-square That sounds like it might be an interesting Tidy Tuesday dataset; you can suggest it as a possible option here! |
@jcragy You can visit the GitHub issue and unsubscribe; I don't believe it will unsubscribe you if I do anything like delete your comment. Have a good weekend! 🙌 |
Hi Julia, thank you for your Tidy Tuesdays. I have learnt so much already. Could you demonstrate how to create your own step functions in one of your next Tidy Tuesday videos? |
@SebastianBehrens That's an interesting idea! You may have already seen this, but I want to make sure you know about this guide we have posted. |
Hi Julia, Can the ggeffects package be used with cases similar to your example here – except I am working on a regression problem with a Random Forest? My dream is to show marginal effect as "predictions generated by a model when one holds the non-focal variables constant and varies the focal variable(s)." |
@kamaulindhardt I would think so since it uses |
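The idea quoted in the question (vary the focal variable, keep the others as they are, and look at the model's predictions) can be sketched by hand in base R. This is an assumption-laden illustration, not the ggeffects or DALEX implementation: it uses `lm()` on the built-in `mtcars` data instead of a random forest so that it runs without extra packages, and `partial_dependence()` is a hypothetical helper. Note that a partial dependence profile averages predictions over the observed values of the non-focal variables; fixing them at one typical value, as the quote describes, is a closely related variant.

```r
# Hand-rolled partial dependence for one focal variable (illustration only).
# lm() stands in for any regression model that has a predict() method.
fit <- lm(mpg ~ wt + hp + disp, data = mtcars)

# Hypothetical helper: for each grid value, overwrite the focal column in
# every row, predict, and average the predictions.
partial_dependence <- function(model, data, focal, grid) {
  vapply(grid, function(value) {
    data_mod <- data
    data_mod[[focal]] <- value
    mean(predict(model, newdata = data_mod))
  }, numeric(1))
}

wt_grid <- seq(min(mtcars$wt), max(mtcars$wt), length.out = 20)
pd_wt <- partial_dependence(fit, mtcars, "wt", wt_grid)

# One row per grid point: focal value and average prediction.
head(data.frame(wt = wt_grid, mean_prediction = pd_wt))
```

For a linear model the profile is a straight line whose slope is the `wt` coefficient; for a random forest the same procedure would trace out whatever nonlinear shape the model has learned.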
Hi Julia, Thanks |
...continuation |
@venkatpgi The |
Thanks, Julia, for your usual prompt reply. I could understand it, but I am somewhat stuck again...
Step 1: Created model spec: rf_tune_spec_full <- rand_forest(
Step 2: Created workflow: rf_tune_wf_full <-
Step 3: Tuned hyperparameters using tune_grid(): tune_res <- tune_grid(
Step 4: Picked the best AUC: best_auc_rf <- select_best(tune_res, "roc_auc")
Step 5: Finalised the model: final_rf_model <-
Step 6: Finalised the workflow with the final model: final_rf_wf <- workflow() %>%
Step 7: Last fit: final_rf_res <- final_rf_wf %>%
Step 8: Extracted the workflow (as advised by you): rf_fitted <- final_rf_res %>% extract_workflow()
Step 9: Created model explainer: library(DALEXtra) #nupe_train is my training data frame and "mort_24h" is the binary outcome variable rf_explainer <- explain_tidymodels( rf_breakdown <- predict_parts(
GETTING AN ERROR: Error: Can't subset columns that don't exist. x Column
I checked the data frame once again, and this variable is in it. I don't know why I get the error. Any thoughts?
PS: I don't know how to embed code as you have done... apologies for that. |
@venkatpgi Can you create a reprex (a minimal reproducible example) for this? The goal of a reprex is to make it easier for us to recreate your problem so that we can understand it and/or fix it. Once you have a reprex, it is best to post on a more public forum like RStudio Community so more folks can see and respond to your problem. If you've never heard of a reprex before, you may want to start with the tidyverse.org help page. You may already have reprex installed (it comes with the tidyverse package), but if not you can install it with: install.packages("reprex") Thanks! 🙌 |
Hi Julia! Thanks for a thorough explanation on PDP with tidymodels! If you were to run multiple predictors in the plot, how would you go about it? Can you input more values to the 'variable'-argument in the model_profile function? |
@guarvid Yep, you can pass in more than one variable to the |
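As a rough sketch of passing several predictors at once, the example below gives a character vector to the `variables` argument of `DALEX::model_profile()`. The setup is an assumption made to keep it small: it builds the explainer with `DALEX::explain()` on an `lm()` fit to the built-in `mtcars` data, whereas the post builds its explainer with `explain_tidymodels()` from a tidymodels workflow.

```r
# Sketch: partial dependence profiles for two predictors in one call.
# Assumes the DALEX package is installed; the lm() model and the mtcars
# data are stand-ins, not the model from the post.
library(DALEX)

fit <- lm(mpg ~ wt + hp + disp, data = mtcars)

explainer <- explain(
  fit,
  data = mtcars[, c("wt", "hp", "disp")],  # predictors only
  y = mtcars$mpg,
  label = "linear model",
  verbose = FALSE
)

# `variables` accepts a character vector; N = NULL uses all observations.
pdp <- model_profile(explainer, variables = c("wt", "hp"), N = NULL)

# Aggregated profiles for both variables live in one data frame.
unique(as.character(pdp$agr_profiles$`_vname_`))
head(pdp$agr_profiles)
```

Calling `plot(pdp)` on the result should draw one panel per variable.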
Dear Julia. Thank you for your many great resources! I have followed your approach in this post, using an XGBoost model. I am able to create PDPs for my categorical predictors and some of the integer predictors in my data. pdp_21a <- model_profile(explainer, variables = "TC3G21A")
It works fine with a similar predictor, see the summary of the one giving the error, and another that works fine below here:
|
@AndersAstrup Can you create a reprex (a minimal reproducible example) for this? The goal of a reprex is to make it easier for people to recreate your problem so that they can understand it and/or fix it. If you've never heard of a reprex before, you may want to start with the tidyverse.org help page. Once you have a reprex, I recommend posting on RStudio Community, which is a great forum for getting help with these kinds of modeling questions. Thanks! 🙌 |
@juliasilge |
@conlelevn You can read about this topic here in Ch 5 of Feature Engineering and Selection. A very short summary is that using dummy factors with a tree-based model usually gets you the same result, but it takes longer to train. |
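To make the "dummy factors" wording above concrete, here is a small base R sketch of what indicator encoding does to a factor column. The data frame and its values are hypothetical, and `model.matrix()` is used only for illustration; a tidymodels recipe would typically do this with a step such as `step_dummy()`.

```r
# Illustration only: a factor column versus its indicator (dummy) encoding.
# The data frame and its values are made up for this example.
df <- data.frame(
  track = factor(c("Rainbow Road", "Luigi Raceway", "Rainbow Road", "Wario Stadium"))
)

# As a factor: one column that a tree can split on directly.
str(df$track)

# As indicators: one 0/1 column per level ("- 1" drops the intercept so
# every level gets its own column).
dummies <- model.matrix(~ track - 1, data = df)
dummies
```

One factor column becomes one column per level, so a tree-based model has more columns to search at every split, which is consistent with the longer training time mentioned in the reply.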
I'm killing myself trying to get model_profile to work, but I keep getting the following error:
Any thoughts? I've made sure all my packages are updated, etc. Thanks in advance! |
@ashenkin It looks like you are trying to predict on a If that's not enough to help, I recommend that you create a reprex (a minimal reproducible example) for your problem and post on RStudio Community. It's a great forum for getting help with these kinds of modeling questions. |
Hi Julia, could you explain why we have to use the training data (i.e. mario_train) with the explain_tidymodels, and not the test data? On a separate note, could you please recommend one of your posts with an example using random forest for regression, if you have one? Thank you so much! |
|
Partial dependence plots with tidymodels and DALEX for #TidyTuesday Mario Kart world records | Julia Silge
Tune a decision tree model to predict whether a Mario Kart world record used a shortcut, and explore partial dependence profiles for the world record times.
https://juliasilge.com/blog/mario-kart/