
Do you have an XGBoost classifier but not regression? #68

Open
Sandy4321 opened this issue Dec 5, 2019 · 4 comments

Comments

Sandy4321 commented Dec 5, 2019

Do you have an XGBoost classifier in
https://github.com/eriklindernoren/ML-From-Scratch/blob/master/mlfromscratch/supervised_learning/xgboost.py
but not a regressor?


hcho3 commented Dec 18, 2019

@Sandy4321 It should suffice to replace LogisticLoss() with a squared-error loss:

import numpy as np


class SquaredError():
    def __init__(self):
        pass

    def loss(self, y, y_pred):
        return 0.5 * ((y - y_pred) ** 2)

    # gradient w.r.t. y_pred
    def gradient(self, y, y_pred):
        return -(y - y_pred)

    # hessian w.r.t. y_pred (constant 1; returned as an array so .sum() works)
    def hess(self, y, y_pred):
        return np.ones_like(y_pred)
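
For completeness, a minimal usage sketch (not taken from the repo) of one Newton boosting step with this loss, assuming the SquaredError class above is in scope:

import numpy as np

# Minimal sketch, assuming SquaredError from the comment above is defined.
y = np.array([1.0, 2.0, 3.0])
y_pred = np.zeros_like(y)        # initial prediction
loss = SquaredError()

g = loss.gradient(y, y_pred)     # -(y - y_pred), i.e. negative residuals
h = loss.hess(y, y_pred)         # all ones for squared error
w = -g.sum() / h.sum()           # Newton step for a single leaf (no lambda)
y_pred = y_pred + w
print(y_pred)                    # [2. 2. 2.] -- the mean of y, as expected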

Sandy4321 (Author)

Great, thanks.
Did you also code the regularization? The original XGBoost derives the regularization term explicitly, as the documentation says:
Model Complexity
We have introduced the training step, but wait, there is one important thing, the regularization term! We need to define the complexity of the tree Ω(f). In order to do so, let us first refine the definition of the tree f(x) as

f_t(x) = w_{q(x)}, \quad w \in \mathbb{R}^T, \quad q: \mathbb{R}^d \to \{1, 2, \cdots, T\}.
Here w is the vector of scores on leaves, q is a function assigning each data point to the corresponding leaf, and T is the number of leaves. In XGBoost, we define the complexity as

\Omega(f) = \gamma T + \frac{1}{2} \lambda \sum_{j=1}^{T} w_j^2
Of course, there is more than one way to define the complexity, but this one works well in practice. The regularization is one part most tree packages treat less carefully, or simply ignore. This was because the traditional treatment of tree learning only emphasized improving impurity, while the complexity control was left to heuristics. By defining it formally, we can get a better idea of what we are learning and obtain models that perform well in the wild.
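
For reference, the same tutorial continues the derivation: plugging f_t(x) = w_{q(x)} and \Omega(f) into the second-order objective gives a quadratic per leaf, whose minimizer is the regularized leaf weight (here G_j and H_j are the sums of the gradients g_i and hessians h_i of the points assigned to leaf j):

w_j^* = -\frac{G_j}{H_j + \lambda}, \qquad
\text{obj}^* = -\frac{1}{2} \sum_{j=1}^{T} \frac{G_j^2}{H_j + \lambda} + \gamma T,
\qquad G_j = \sum_{i \in I_j} g_i, \quad H_j = \sum_{i \in I_j} h_i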


hcho3 commented Dec 19, 2019

@Sandy4321 I don't think this example has all of the regularization mechanisms that XGBoost does, since the example is quite simplified. There are min_samples_split, min_impurity, and max_depth.
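
As a hedged sketch of how the lambda/gamma terms from the tutorial could be bolted onto such a simplified implementation (the function name and parameters below are hypothetical, not something in the repo):

import numpy as np

# Hypothetical sketch: XGBoost-style split gain with explicit regularization.
# g, h are per-sample gradients/hessians; `left` is a boolean mask for the
# samples that fall into the left child. reg_lambda and gamma correspond to
# the lambda/gamma constants in the XGBoost docs (not present in ML-From-Scratch).
def regularized_split_gain(g, h, left, reg_lambda=1.0, gamma=0.0):
    def leaf_score(G, H):
        return G ** 2 / (H + reg_lambda)

    GL, HL = g[left].sum(), h[left].sum()
    GR, HR = g[~left].sum(), h[~left].sum()
    return 0.5 * (leaf_score(GL, HL) + leaf_score(GR, HR)
                  - leaf_score(GL + GR, HL + HR)) - gamma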


shorey commented Mar 18, 2020

@hcho3 Hi, sorry to interrupt. I am trying to learn XGBoost from this project, and I have run into a problem with the function "def _gain(self, y, y_pred):" in supervised_learning/decision_tree.py.

def _gain(self, y, y_pred):
    nominator = np.power((y * self.loss.gradient(y, y_pred)).sum(), 2)
    denominator = self.loss.hess(y, y_pred).sum()
    return 0.5 * (nominator / denominator)

The variable nominator computes ((y * self.loss.gradient(y, y_pred)).sum())^2, but according to the XGBoost doc https://xgboost.readthedocs.io/en/latest/tutorials/model.html, shouldn't it be (self.loss.gradient(y, y_pred).sum())^2? I know that changing the line to what I expected is wrong, because after the change the example gives the wrong result, but I still don't understand why it is written this way. Could you explain it to me? Thanks.
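
For reference, the split gain in the linked doc is written as (with G and H the sums of the per-sample gradients and hessians on each side of the split):

\text{Gain} = \frac{1}{2} \left[ \frac{G_L^2}{H_L + \lambda} + \frac{G_R^2}{H_R + \lambda} - \frac{(G_L + G_R)^2}{H_L + H_R + \lambda} \right] - \gamma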
