Feedback from Sebastian on ML notebook #8
@rasbt, I pinged you on here so you can see how I respond to each point as I work on it. Thank you again for your feedback!
Regarding the images: I pulled them from another repo that was Public Domain. However, looking at the original sources, it seems that they are not attribution-free. I will have to fix that. https://commons.wikimedia.org/wiki/File:Petal-sepal.jpg
Oh, I see that I was a little bit sloppy yesterday night ... Seems like the sentence "On a side note, but you probably already know this: Most gradient-based optimization algos" got cut off. What I wanted to say is: even if features are on the same scale (e.g., cm), you still want to standardize them prior to, e.g., gradient descent; it makes the learning easier because you have more balanced weight updates. Going into this would be way too much detail for the tutorial, but I would at least mention that people should check their features prior to using ML algos other than tree-based ones.

In "When you plot the cross-val error, I could also print the standard deviation," I meant "would", not "could" :P But estimating the variance is actually not that trivial, FYI; look at the papers:
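A minimal sketch of the standardization point above (toy data, not from the notebook): even features that share a unit benefit from being centered and scaled before a gradient-based learner, since the weight updates become more balanced.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Toy feature matrix: both columns could even share a unit (e.g., cm)
# and standardization would still help gradient-based learners.
X = np.array([[5.1, 140.0],
              [4.9, 155.0],
              [6.2, 130.0]])

X_std = StandardScaler().fit_transform(X)  # zero mean, unit variance per column
print(X_std.mean(axis=0), X_std.std(axis=0))
```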
Isn't it better to plot the distribution? I showed the mean in the first couple of examples; perhaps I'll just replace those with a distplot.
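For reference, a plot like the one suggested could look roughly like this (dataset and model are stand-ins; `distplot` has since been renamed in seaborn, so newer versions use `histplot`):

```python
import seaborn as sns
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
scores = cross_val_score(DecisionTreeClassifier(), X, y, cv=10)

# Show the full distribution of fold scores instead of just the mean.
sns.distplot(scores)  # newer seaborn: sns.histplot(scores, kde=True)
```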
I agree that that's a bit more detail than I'd like to go into for this tutorial; I'll leave it to your book to explain that. :-)
This ties in nicely with #7. I'll add a note to that issue and check this one off.
Yes, that's probably even better in this context. I suggested the stddev because the info is basically already contained in the plot, but it would maybe be a nice summary statistic. And it is useful in practice too when you are tuning parameters, e.g., via k-fold CV or in nested CV using grid search, e.g., as some sort of tie-breaker.
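A quick sketch of using the stddev as a summary statistic and tie-breaker during tuning (model, data, and depth values are placeholders, not from the notebook):

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
for depth in (2, 4):
    scores = cross_val_score(DecisionTreeClassifier(max_depth=depth), X, y, cv=10)
    # Given similar means, the setting with the smaller stddev may be preferable.
    print("max_depth=%d: %.3f +/- %.3f" % (depth, scores.mean(), scores.std()))
```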
Sure, but I think that it would maybe be more worthwhile for the reader to use a basic decision tree instead of the Random Forest ... the hyper-parameter tuning (tree depth) would be more intuitive, I guess. You could print an unpruned tree with good training acc. but bad generalization performance, and then show how you can address this with pruning (max_depth). But this is just a thought :)
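Roughly what that demonstration could look like (iris as a stand-in dataset; the size of the train/test gap is only illustrative):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

for depth in (None, 3):  # None grows the tree fully (unpruned)
    tree = DecisionTreeClassifier(max_depth=depth, random_state=1).fit(X_train, y_train)
    print("max_depth=%s  train acc: %.3f  test acc: %.3f"
          % (depth, tree.score(X_train, y_train), tree.score(X_test, y_test)))
```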
Now we start with a decision tree classifier and build up to a random forest classifier.
Alright, check it out now. It starts with a decision tree classifier and then builds up to a random forest. I think this last commit addresses the rest of your points. Please let me know if I missed anything. :-)
Wow, you seem really determined to turn this IPython notebook into an IPython book :) Haha, if you are not busy enough, I have another batch for you!
And then, you could place a little "arrow" or so under each section header to jump back to the overview.
Sorry, that's technically not correct: you use shallow trees (aka decision stumps) in boosting, not in bagging & Random Forests. I would maybe introduce it as (of course with nicer wording):
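The suggested wording itself appears to be missing here, but a small sketch of the distinction in scikit-learn terms might look like this (note that newer scikit-learn versions call the parameter `estimator` rather than `base_estimator`):

```python
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier

# Boosting typically uses shallow trees, e.g., decision stumps (depth 1) ...
boosted = AdaBoostClassifier(
    base_estimator=DecisionTreeClassifier(max_depth=1), n_estimators=100)

# ... while bagging (and random forests) grow deep, low-bias trees.
bagged = BaggingClassifier(
    base_estimator=DecisionTreeClassifier(max_depth=None), n_estimators=100)
```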
Haha... oh dear, what have I gotten myself into? ;-) Good suggestions; I addressed a couple with some quick fixes and will leave the rest for the weekend.
With great resources (for the next-gen data scientists) come great responsibilities! :D
Alrighty, finally got around to most of these! Thanks again for the feedback. |
Wow, looks awesome, and no prob, you are always welcome! Ah, one unfortunate caveat with how the GitHub IPython Nb rendering is implemented is that it doesn't support jumping between sections via internal links (yet) -- but the TOC is still useful anyways :). Haha, I may call you Random F. Olson from now on, but there is maybe one little phrase you can add to make it technically unambiguous: instead of "-- each trained on a random subset of the features", sth. like "-- each trained on a random subset of the training samples (drawn with replacement) and features (drawn without replacement)". Otherwise people may think that they'd use the "original" training set for each decision tree in the forest.
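A small numpy sketch of the sampling scheme being described (the sizes are made up for illustration):

```python
import numpy as np

rng = np.random.RandomState(0)
n_samples, n_features, max_features = 150, 4, 2

# Each tree gets a bootstrap sample of the training rows (with replacement) ...
row_idx = rng.choice(n_samples, size=n_samples, replace=True)

# ... and considers a random subset of the features (without replacement).
feat_idx = rng.choice(n_features, size=max_features, replace=False)
```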