Tune XGBoost with tidymodels and #TidyTuesday beach volleyball | Julia Silge #9
Comments
Hi, Julia! Thank you so much for your wonderful tidymodels series. It is very informative and impressive. Nice job! For this XGBoost tuning blog, I found a weird result in the ROC curve part. Everything except the ROC curve works well: I got the same accuracy and AUC as yours, but my ROC curve is flipped along the diagonal. It is really weird. Since my curve is below the diagonal, the AUC should be less than 1/2 by definition; however, my AUC is the same as yours. Is it possible that something is wrong with the roc_curve() function? The version of yardstick I am using is 0.0.7. Thank you in advance.
Yes, since I published this blog post, there was a change in yardstick (in version 0.0.7) that changed how to choose which level (win or lose) is the "event". You can change this with the `event_level` argument.
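For anyone hitting the same flipped curve, here is a minimal sketch of the fix, assuming predictions collected as in the post (an outcome column `win` whose second factor level is the event, and a probability column `.pred_win`):

```r
library(tune)
library(yardstick)
library(ggplot2)

final_res %>%
  collect_predictions() %>%
  roc_curve(win, .pred_win, event_level = "second") %>%  # treat the second level as the event
  autoplot()
```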
Got it. Thank you.
Hi Julia, great tutorial. Thank you for your support. I am facing two problems:

Error: The number of levels in …

Appreciate your time.
@Mr-Hadoop-Hotshot it sounds like something has gone a bit wrong somewhere in predictions, maybe some … The output of …
Hi Julia, thank you for your reply. Your other tutorials are also excellent, as always. But the first problem remains the same.

a. Actually my original …

Any suggestions on this? Note, all …

Appreciate your time.
@Mr-Hadoop-Hotshot Ah gotcha, I would go back to the very beginning and make sure that your initial data set only has two levels in your outcome; this sounds like however you are trying to filter and remove a level is not working. If you would like somewhere to ask for help, I recommend RStudio Community; be sure to create a reprex showing the problem.
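In case it helps others who land here: filtering rows does not drop an unused factor level, so the outcome can still report the old number of levels. A minimal sketch, assuming a hypothetical data frame `df` with outcome column `win` and an unwanted class `"tie"`:

```r
library(dplyr)
library(forcats)

df <- df %>%
  filter(win != "tie") %>%      # remove rows for the unwanted class
  mutate(win = fct_drop(win))   # filtering alone leaves the empty level behind

levels(df$win)                  # should now show exactly two levels
```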
Hi @juliasilge, yeah sure, I tried that. Just wanted to let you know, your blog is full of quality information. Thank you once again.
Check out this chapter of my book on text mining for info on sentiment analysis.
Hey, this book was recommended by UT Austin when I was doing my PG program in data science and business analytics.
Hello Julia, thank you very much for sharing your work, it's very good. I'm a follower of yours and I really like the pauses you take when explaining every detail of the code. Excellent, and you're very pretty.
Thanks for the tutorial! I wonder why we create …
@graco-roza I think I discuss this in the video, but the main idea there is to demonstrate how to …
Hi @juliasilge, I recently started to encounter a problem with executing …

ERROR MESSAGE: R Session Aborted. R encountered a fatal error.

I tried running that code line directly in the console window and R throws the same error back. Any suggestions on this issue? Appreciate your time.
@Mr-Hadoop-Hotshot Hmmm, most things are working well on R 4.1.0 but we have run into a few small issues so far that we've needed to fix. I can't tell from just this what the problem might be. Can you create a reprex and post it with the details of your problem on RStudio Community? I think that will be the best way to find the solution.
Hey Julia, thank you very much for the amazing work! I am a new big data student and I want to use this code in my project; however, I already split and balanced my data for the other models I built. For the purposes of the project I want to continue with the same split. Is there any way I can put my prepared data into those split functions?
@canlikala Yes, you can use existing training/testing splits in tidymodels; you will need to create your own split object. This case study shows how we treat a validation set as a single resample.
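A minimal sketch of rebuilding a split from pre-existing pieces, assuming data frames `my_train` and `my_test` from your earlier work:

```r
library(rsample)

my_split <- make_splits(my_train, my_test)  # reassemble an rsplit object

training(my_split)  # returns my_train
testing(my_split)   # returns my_test
```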
Hi Julia, …
@martinocrippa We don't currently make use of the linear booster in parsnip but we are tracking interest in that feature here. If you would like to either add a 👍 or add any helpful context for your use case there, that would be great.
OK, thank you very much, have a nice day!
Dear Julia, I get the following error: "Error: The provided …"
Sorry, I found the reason: I forgot to set `trees = 1000`. Now it works. However, I get this error in my XGBoost tuning:

Fold01, Repeat1: preprocessor 1/1, model 30/30: Error: The option …
! Fold01, Repeat1: internal: A correlation computation is required, but …
x Fold02, Repeat1: preprocessor 1/1, model 2/30: Error: The option …

Anyone have experience with this?
Thanks for this great example. I have a question: in this example you use XGBoost in a classification model, and you naturally evaluate model performance at the end with a ROC curve. My question is: what kind of performance metric would you use when XGBoost is used for regression?
@kamaulindhardt You can check out metrics that are appropriate for regression, and see some example ways to evaluate regression models in this chapter.
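A minimal sketch of computing regression metrics with yardstick, assuming a data frame `preds` holding a `truth` column and the model's `.pred` column (names are illustrative):

```r
library(yardstick)

reg_metrics <- metric_set(rmse, rsq, mae)   # common regression metrics
reg_metrics(preds, truth = truth, estimate = .pred)
```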
Dear Julia and all, I have one problem which I could not solve: I need to get the variable importance values as exact numbers, not only in the plot. Can you please be so kind as to guide me on this issue? Kind regards
@TotorosForest You can use the `vi()` function from the vip package, which returns the importance values themselves rather than a plot.
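A minimal sketch, assuming the fitted workflow from the post (here called `final_fitted`, e.g. pulled from the `last_fit()` result):

```r
library(vip)
library(workflows)

final_fitted %>%
  extract_fit_parsnip() %>%  # pull the parsnip fit out of the workflow
  vi()                       # tibble of Variable / Importance values
```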
Dear Julia!

mm_final_xgb %>% …

I hope I have not written "hubble bubble" code. :) My goal is to select some variables from the 10 variables that are examined (8 variables are ordinal, 2 variables are binary). What would you recommend as a cutoff coefficient if you wanted to select only a few of these 10? Moreover, what is this importance value? Is it an information gain value, a Gini index, or regression coefficients? What would I call them in the report? Thank you.
@TotorosForest You can look here at the …
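For what it's worth, for a fitted xgboost booster the available importance measures are Gain, Cover, and Frequency (documented in `?xgboost::xgb.importance`), and Gain is what tree-booster importance plots typically report. A minimal sketch, reusing the assumed `final_fitted` workflow from above:

```r
library(xgboost)
library(workflows)

booster <- extract_fit_engine(final_fitted)  # the raw xgb.Booster
xgb.importance(model = booster)              # Feature / Gain / Cover / Frequency
```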
Dear all, "It’s time to go back to the testing set! Let’s use last_fit() to fit our model one last time on the training data and evaluate our model one last time on the testing set. Notice that this is the first time we have used the testing data during this whole modeling analysis. final_res <- last_fit(final_xgb, vb_split)" My question: as we aim is to test the results in the testing set, should not the data file be "vb_test" instead of "vb_split"? As i understand vb_split is the result of initial partition of the data 75 % / 25 %. and if we want to test on the test set, should not we choose "vb_test" ? Thank you for understanding of my confusion. Kind regards, |
@TotorosForest You can check out the documentation for `last_fit()`: it takes the initial split object so that it can fit one final time on the training set and evaluate on the testing set for you.
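A minimal sketch of why the split, not the test set, is passed, using the post's object names:

```r
library(tune)
library(rsample)

# last_fit() receives vb_split so it knows both halves of the data:
final_res <- last_fit(final_xgb, vb_split)
# internally it fits on training(vb_split) and evaluates on testing(vb_split),
# so the metrics below are genuinely test-set metrics:
collect_metrics(final_res)
```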
Hi Julia, …

I had many 0s in the data. It's running now, but tune_grid is taking so long (~12 hours and still running); I am wondering if this is normal?
@SamiFarashi I would say generally no, but it's hard to say without other information. If you are looking at a very long-running model, I recommend starting out with very few tuning parameters, few resamples or a subset of your data, and then scaling up to achieve the best model in a reasonable timeframe. If you can describe your situation in more detail, I recommend posting on RStudio Community, which is a great forum for getting help with these kinds of modeling questions.
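One concrete way to scale down first, as a sketch (the object names `xgb_wf` and `vb_folds` are assumed from the post; registering a parallel backend is optional but often helps):

```r
library(tune)
library(doParallel)

registerDoParallel(cores = 4)  # run resamples in parallel

xgb_res <- tune_grid(
  xgb_wf,
  resamples = vb_folds,
  grid = 5,                                 # a deliberately small grid to start
  control = control_grid(save_pred = FALSE) # skip saving predictions to reduce overhead
)
```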
Great post and package! Thanks so much!
Hi Julia, …
Several of the models in tidymodels support multiclass classification! You can see some of them here, but also some models support this natively, like ranger.
Thank you. Does that mean that xgboost as included in tidymodels does not support multiclass classification? I have seen examples where num_class is set along with other params, e.g. with objective = "multi:softprob".
@MonkeyCousin xgboost does support multiclass, yep. You can see an example here.
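A minimal sketch of a multiclass xgboost spec, assuming a hypothetical data frame `my_df` whose outcome factor `class` has more than two levels (parsnip chooses the multiclass objective for you based on the outcome, so you don't set num_class by hand):

```r
library(parsnip)

xgb_multi <- boost_tree(trees = 500) %>%
  set_engine("xgboost") %>%
  set_mode("classification")

xgb_multi_fit <- fit(xgb_multi, class ~ ., data = my_df)
```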
Hi Julia, thanks for this tutorial! When I run this with an XGBoost regression on my own data, everything works! However, the default model (setting …

Any idea if this is common? I'm wondering because I plan to implement this tuning step in many other areas of my code. If relevant, I did choose the best parameters based on "rsq" rather than "RMSE" (which seem to be the choices for a regression-based xgb, compared to "auc" in the classification version).
@wcwr Take a look at this chapter to understand what might be happening by optimizing …
Hi Julia, in the … Looked for this info in the … Thanks for the wonderful tutorial!
@wcwr The …
Hi Julia, thanks for the blog post and all your videos! How can you assess accuracy comparisons between train and test sets from the …
@jlecornu3 We don't recommend measuring model performance using the training set as a whole, for the reasons outlined in this section, and there purposefully isn't fluent tooling in tidymodels to do so using a final tuned model. However, if you look at this blog post, the metrics you see with …
So do you feel this …
@jlecornu3 Ah, maybe I misunderstood what you were asking. In this blog post: …

You might want to check out this chapter on "spending your data budget" and how to use the training set vs. test set, as well as how …
Thanks Julia -- super clear!
Hi Julia, I know this is not the appropriate place to ask this question, but I am trying to use mlflow in RStudio and I always face this error and have not found any solution: …
@mohamedelhilaltek I recommend that you create a reprex (a minimal reproducible example) showing what you want to do and any problems you run into with it, then post it on Posit Community. I know there aren't a ton of mlflow users, but generally it's a great forum for getting help with these kinds of questions. Good luck! 🙌
Hi Julia, I have a regression problem where the target variable is more than 50 percent zeroes. How can I handle this with xgboost? Is there any step …
@Hamza-Gouaref Hmmmm, if you have counts with a lot of zeroes, I would suggest that you use zero-inflated Poisson, like in this post. Can you formulate it as a Poisson problem? That would be my main suggestion.
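A minimal sketch of a zero-inflated Poisson spec via the poissonreg package (the `zeroinfl` engine comes from pscl; the outcome `y` and data frame `my_counts` are hypothetical):

```r
library(poissonreg)

zip_spec <- poisson_reg() %>%
  set_engine("zeroinfl")

# pscl's two-part formula: count model | zero-inflation model
zip_fit <- fit(zip_spec, y ~ . | ., data = my_counts)
```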
Hi Julia, thanks for the great info! Very useful! One question: I've been trying, unsuccessfully, to create a couple of partial dependence plots for your example (for both numeric and categorical predictors). I think it's because I'm very unfamiliar with the tidyverse approach to predictive modeling and how/where objects are located. Could you direct me to a source that might be helpful (or a short code example)? I've been trying to use the pdp and DALEXtra packages. Thanks very much, Joe
@retzerjj Check out this chapter of our book that shows how to make partial dependence plots with DALEXtra. If you are wanting to figure out how to pull out various components of a tidymodels workflow, check out these methods, which can help you extract out the workflow, the parsnip model, the underlying engine model, and so forth.
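A minimal sketch with DALEXtra, assuming the fitted workflow and training data from the post (`final_fitted` and `vb_train` are assumed names; `kills` stands in for one of the post's numeric predictors):

```r
library(DALEXtra)
library(dplyr)

explainer <- explain_tidymodels(
  final_fitted,                      # fitted tidymodels workflow
  data = vb_train %>% select(-win),  # predictors only
  y = as.integer(vb_train$win == "win")
)

pdp <- model_profile(explainer, variables = "kills")  # partial dependence profile
plot(pdp)
```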
Thank you for the great video and help. My question is about the vip package to see the variable importance. When I try to install the package, I get the error message "package 'vip' is not available for this version of R". I'm using 4.2.2. Has vip been replaced by another package? Thanks.
@mjwera Ooooof, looks like it was archived from CRAN. You can read about their plans here and in the meantime you can install from GitHub.
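The GitHub install mentioned above, as a one-liner (assumes you have the remotes package):

```r
# install.packages("remotes")
remotes::install_github("koalaverse/vip")
```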
@mjwera Apologies, looks like vip was orphaned for some failed tests from some of the last changes we made, but we never got the warning! Should be back up and running soon!
Thank you!
Hi Julia, thank you for another great video! … Best wishes,
@HanLum I am not aware of how to do that, but you might want to create a reprex (a minimal reproducible example) showing what you want to do and any problems you run into with it, then post it on Posit Community. It's a great forum for getting help with these kinds of modeling questions. Good luck! 🙌
@juliasilge Thank you for getting back to me so fast! I will give Posit a try :) Thank you!
Tune XGBoost with tidymodels and #TidyTuesday beach volleyball | Julia Silge
Learn how to tune hyperparameters for an XGBoost classification model to predict wins and losses.
https://juliasilge.com/blog/xgboost-tune-volleyball/