Commit 169a271

add gamma and tweedie regression.
guolinke committed Jan 21, 2018
1 parent 3dc5716 commit 169a271
Showing 9 changed files with 195 additions and 8 deletions.
4 changes: 4 additions & 0 deletions docs/Features.rst
@@ -225,6 +225,10 @@ Support following metrics:

- Kullback-Leibler

- Gamma

- Tweedie

For more details, please refer to `Parameters <./Parameters.rst#metric-parameters>`__.

Other Features
27 changes: 24 additions & 3 deletions docs/Parameters.rst
@@ -55,7 +55,7 @@ Core Parameters

- ``application``, default=\ ``regression``, type=enum,
options=\ ``regression``, ``regression_l1``, ``huber``, ``fair``, ``poisson``, ``quantile``, ``mape``,
``binary``, ``multiclass``, ``multiclassova``, ``xentropy``, ``xentlambda``, ``lambdarank``,
``binary``, ``multiclass``, ``multiclassova``, ``xentropy``, ``xentlambda``, ``lambdarank``, ``gamma``, ``tweedie``,
alias=\ ``objective``, ``app``

- regression application
@@ -74,6 +74,10 @@ Core Parameters

- ``mape``, `MAPE loss`_, alias=\ ``mean_absolute_percentage_error``

- ``gamma``, Gamma regression with log-link. It might be useful, e.g., for modeling insurance claim severity, or for any target that might be `gamma-distributed`_.

- ``tweedie``, Tweedie regression with log-link. It might be useful, e.g., for modeling total loss in insurance, or for any target that might be `tweedie-distributed`_.

- ``binary``, binary `log loss`_ classification application

- multi-class classification application
@@ -557,10 +561,17 @@ Objective Parameters

- will fit ``sqrt(label)`` instead, and the prediction result will also be automatically converted back via ``pow2(prediction)``

- ``tweedie_variance_power``, default=\ ``1.5``, type=\ ``double``, range=\ ``[1,2)``

- parameter that controls the variance of the Tweedie distribution (see the variance function sketched below)

- set closer to 2 to shift towards a gamma distribution

- set closer to 1 to shift towards a Poisson distribution
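
For reference, the Tweedie family is defined through its variance function; the power interpolates between the Poisson and gamma special cases, which is why values of ``tweedie_variance_power`` near the ends of ``[1,2)`` behave like those distributions:

$$
\mathrm{Var}(Y) = \phi\,\mu^{\rho}, \qquad \rho = 1\ \text{(Poisson)}, \quad 1 < \rho < 2\ \text{(compound Poisson-gamma)}, \quad \rho = 2\ \text{(gamma)}
$$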

Metric Parameters
-----------------

- ``metric``, default={``l2`` for regression, ``binary_logloss`` for binary classification, ``ndcg`` for lambdarank}, type=multi-enum
- ``metric``, default=\ ``None``, type=multi-enum

- ``l1``, absolute loss, alias=\ ``mean_absolute_error``, ``mae``

@@ -576,7 +587,7 @@

- ``fair``, `Fair loss`_

- ``poisson``, `Poisson regression`_
- ``poisson``, negative log-likelihood for Poisson regression

- ``ndcg``, `NDCG`_

@@ -598,6 +609,12 @@

- ``kldiv``, `Kullback-Leibler divergence`_, alias=\ ``kullback_leibler``

- ``gamma``, negative log-likelihood for Gamma regression

- ``gamma_deviance``, residual deviance for Gamma regression, alias=\ ``gamma-deviance``

- ``tweedie``, negative log-likelihood for Tweedie regression

- support multi metrics, separated by ``,``

- ``metric_freq``, default=\ ``1``, type=int, alias=\ ``output_freq``
@@ -769,3 +786,7 @@ You can specify query/group id in data file now. Please refer to parameter ``gr
.. _One-vs-All: https://en.wikipedia.org/wiki/Multiclass_classification#One-vs.-rest

.. _Kullback-Leibler divergence: https://en.wikipedia.org/wiki/Kullback%E2%80%93Leibler_divergence

.. _gamma-distributed: https://en.wikipedia.org/wiki/Gamma_distribution#Applications

.. _tweedie-distributed: https://en.wikipedia.org/wiki/Tweedie_distribution#Applications
12 changes: 10 additions & 2 deletions docs/Quick-Start.rst
@@ -69,7 +69,7 @@ Some important parameters:

- ``application``, default=\ ``regression``, type=enum,
options=\ ``regression``, ``regression_l1``, ``huber``, ``fair``, ``poisson``, ``quantile``, ``mape``,
``binary``, ``multiclass``, ``multiclassova``, ``xentropy``, ``xentlambda``, ``lambdarank``,
``binary``, ``multiclass``, ``multiclassova``, ``xentropy``, ``xentlambda``, ``lambdarank``, ``gamma``, ``tweedie``,
alias=\ ``objective``, ``app``

- regression application
@@ -86,7 +86,11 @@ Some important parameters:

- ``quantile``, `Quantile regression`_

- ``mape``, `MAPE loss`_
- ``mape``, `MAPE loss`_, alias=\ ``mean_absolute_percentage_error``

- ``gamma``, Gamma regression with log-link. It might be useful, e.g., for modeling insurance claim severity, or for any target that might be `gamma-distributed`_.

- ``tweedie``, Tweedie regression with log-link. It might be useful, e.g., for modeling total loss in insurance, or for any target that might be `tweedie-distributed`_.

- ``binary``, binary `log loss`_ classification application

@@ -247,3 +251,7 @@ Examples
.. _Dropouts meet Multiple Additive Regression Trees: https://arxiv.org/abs/1505.01866

.. _hyper-threading: https://en.wikipedia.org/wiki/Hyper-threading

.. _gamma-distributed: https://en.wikipedia.org/wiki/Gamma_distribution#Applications

.. _tweedie-distributed: https://en.wikipedia.org/wiki/Tweedie_distribution#Applications
4 changes: 3 additions & 1 deletion include/LightGBM/config.h
@@ -178,6 +178,7 @@ struct ObjectiveConfig: public ConfigBase {
// if true, will fit sqrt(label) instead of label
bool reg_sqrt = false;
double alpha = 0.9f;
double tweedie_variance_power = 1.5f;
LIGHTGBM_EXPORT void Set(const std::unordered_map<std::string, std::string>& params) override;
};

@@ -189,6 +190,7 @@ struct MetricConfig: public ConfigBase {
double sigmoid = 1.0f;
double fair_c = 1.0f;
double alpha = 0.9f;
double tweedie_variance_power = 1.5f;
std::vector<double> label_gain;
std::vector<int> eval_at;
LIGHTGBM_EXPORT void Set(const std::unordered_map<std::string, std::string>& params) override;
@@ -475,7 +477,7 @@ struct ParameterAlias {
"histogram_pool_size", "is_provide_training_metric", "machine_list_filename", "machines",
"zero_as_missing", "init_score_file", "valid_init_score_file", "is_predict_contrib",
"max_cat_threshold", "cat_smooth", "min_data_per_group", "cat_l2", "max_cat_to_onehot",
"alpha", "reg_sqrt"
"alpha", "reg_sqrt", "tweedie_variance_power"
});
std::unordered_map<std::string, std::string> tmp_map;
for (const auto& pair : *params) {
4 changes: 4 additions & 0 deletions src/io/config.cpp
@@ -321,6 +321,8 @@ void ObjectiveConfig::Set(const std::unordered_map<std::string, std::string>& pa
GetDouble(params, "alpha", &alpha);
CHECK(alpha > 0 && alpha < 1);
GetBool(params, "reg_sqrt", &reg_sqrt);
GetDouble(params, "tweedie_variance_power", &tweedie_variance_power);
CHECK(tweedie_variance_power >= 1 && tweedie_variance_power < 2);
std::string tmp_str = "";
if (GetString(params, "label_gain", &tmp_str)) {
label_gain = Common::StringToArray<double>(tmp_str, ',');
@@ -345,6 +347,8 @@ void MetricConfig::Set(const std::unordered_map<std::string, std::string>& param
CHECK(num_class > 0);
GetDouble(params, "alpha", &alpha);
CHECK(alpha > 0 && alpha < 1);
GetDouble(params, "tweedie_variance_power", &tweedie_variance_power);
CHECK(tweedie_variance_power >= 1 && tweedie_variance_power < 2);
std::string tmp_str = "";
if (GetString(params, "label_gain", &tmp_str)) {
label_gain = Common::StringToArray<double>(tmp_str, ',');
6 changes: 6 additions & 0 deletions src/metric/metric.cpp
@@ -45,6 +45,12 @@ Metric* Metric::CreateMetric(const std::string& type, const MetricConfig& config
return new KullbackLeiblerDivergence(config);
} else if (type == std::string("mean_absolute_percentage_error") || type == std::string("mape")) {
return new MAPEMetric(config);
} else if (type == std::string("gamma")) {
return new GammaMetric(config);
} else if (type == std::string("gamma-deviance") || type == std::string("gamma_deviance")) {
return new GammaDevianceMetric(config);
} else if (type == std::string("tweedie")) {
return new TweedieMetric(config);
}
return nullptr;
}
54 changes: 54 additions & 0 deletions src/metric/regression_metric.hpp
@@ -242,5 +242,59 @@ class MAPEMetric : public RegressionMetric<MAPEMetric> {
}
};

class GammaMetric : public RegressionMetric<GammaMetric> {
public:
explicit GammaMetric(const MetricConfig& config) :RegressionMetric<GammaMetric>(config) {
}

inline static double LossOnPoint(label_t label, double score, const MetricConfig&) {
const double psi = 1.0;
const double theta = -1.0 / score;
const double a = psi;
const double b = -std::log(-theta);
const double c = 1. / psi * std::log(label / psi) - std::log(label) - std::lgamma(1.0 / psi);
return -((label * theta - b) / a + c);
}
inline static const char* Name() {
return "gamma";
}
};
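
Substituting psi = 1 above collapses the exponential-dispersion form exactly (the constant c vanishes since lgamma(1) = 0), giving the gamma negative log-likelihood with mean μ; as with the Poisson metric, ``score`` is assumed here to already be on the mean scale, i.e. converted through the log-link:

$$
\mathrm{LossOnPoint}(y, \mu) = \frac{y}{\mu} + \log \mu
$$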


class GammaDevianceMetric : public RegressionMetric<GammaDevianceMetric> {
public:
explicit GammaDevianceMetric(const MetricConfig& config) :RegressionMetric<GammaDevianceMetric>(config) {
}

inline static double LossOnPoint(label_t label, double score, const MetricConfig&) {
const double epsilon = 1.0e-9;
const double tmp = label / (score + epsilon);
return tmp - std::log(tmp) - 1;
}
inline static const char* Name() {
return "gamma-deviance";
}
  inline static double AverageLoss(double sum_loss, double sum_weights) {
    // deviance is conventionally reported with a factor of 2;
    // note this returns the total, not the weight-averaged, deviance
    return sum_loss * 2;
  }
};
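
The per-point term here is half of a unit gamma deviance, and the overridden ``AverageLoss`` restores the conventional factor of 2, so the reported metric is the total residual deviance rather than a weighted mean:

$$
D = 2 \sum_i \left( \frac{y_i}{\hat\mu_i} - \log \frac{y_i}{\hat\mu_i} - 1 \right)
$$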

class TweedieMetric : public RegressionMetric<TweedieMetric> {
public:
explicit TweedieMetric(const MetricConfig& config) :RegressionMetric<TweedieMetric>(config) {
}

inline static double LossOnPoint(label_t label, double score, const MetricConfig& config) {
const double rho = config.tweedie_variance_power;
const double a = label * std::exp((1 - rho) * std::log(score)) / (1 - rho);
const double b = std::exp((2 - rho) * std::log(score)) / (2 - rho);
return -a + b;
}
inline static const char* Name() {
return "tweedie";
}
};
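
This keeps only the μ-dependent terms of the Tweedie negative log-likelihood (the code evaluates the powers as exp((1 - ρ)·log μ) and exp((2 - ρ)·log μ)):

$$
-\ell(y, \mu) \doteq -\frac{y\,\mu^{1-\rho}}{1-\rho} + \frac{\mu^{2-\rho}}{2-\rho}
$$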


} // namespace LightGBM
#endif // LightGBM_METRIC_REGRESSION_METRIC_HPP_
8 changes: 8 additions & 0 deletions src/objective/objective_function.cpp
@@ -35,6 +35,10 @@ ObjectiveFunction* ObjectiveFunction::CreateObjectiveFunction(const std::string&
return new CrossEntropyLambda(config);
} else if (type == std::string("mean_absolute_percentage_error") || type == std::string("mape")) {
return new RegressionMAPELOSS(config);
} else if (type == std::string("gamma")) {
return new RegressionGammaLoss(config);
} else if (type == std::string("tweedie")) {
return new RegressionTweedieLoss(config);
}
return nullptr;
}
@@ -66,6 +70,10 @@ ObjectiveFunction* ObjectiveFunction::CreateObjectiveFunction(const std::string&
return new CrossEntropy(strs);
} else if (type == std::string("xentlambda") || type == std::string("cross_entropy_lambda")) {
return new CrossEntropyLambda(strs);
} else if (type == std::string("gamma")) {
return new RegressionGammaLoss(strs);
} else if (type == std::string("tweedie")) {
return new RegressionTweedieLoss(strs);
}
return nullptr;
}
84 changes: 82 additions & 2 deletions src/objective/regression_objective.hpp
@@ -366,7 +366,7 @@ class RegressionPoissonLoss: public RegressionL2loss {
explicit RegressionPoissonLoss(const ObjectiveConfig& config): RegressionL2loss(config) {
max_delta_step_ = static_cast<double>(config.poisson_max_delta_step);
if (sqrt_) {
Log::Warning("cannot use sqrt transform in Poisson Regression, will auto disable it.");
Log::Warning("cannot use sqrt transform in %s Regression, will auto disable it.", GetName());
sqrt_ = false;
}
}
@@ -379,7 +379,7 @@

void Init(const Metadata& metadata, data_size_t num_data) override {
if (sqrt_) {
Log::Warning("cannot use sqrt transform in Poisson Regression, will auto disable it.");
Log::Warning("cannot use sqrt transform in %s Regression, will auto disable it.", GetName());
sqrt_ = false;
}
RegressionL2loss::Init(metadata, num_data);
@@ -636,6 +636,86 @@ class RegressionMAPELOSS : public RegressionL1loss {

};



/*!
* \brief Objective function for Gamma regression
*/
class RegressionGammaLoss : public RegressionPoissonLoss {
public:
explicit RegressionGammaLoss(const ObjectiveConfig& config) : RegressionPoissonLoss(config) {
}

explicit RegressionGammaLoss(const std::vector<std::string>& strs) : RegressionPoissonLoss(strs) {

}

~RegressionGammaLoss() {}

void GetGradients(const double* score, score_t* gradients,
score_t* hessians) const override {
if (weights_ == nullptr) {
#pragma omp parallel for schedule(static)
for (data_size_t i = 0; i < num_data_; ++i) {
gradients[i] = static_cast<score_t>(1.0 - label_[i] / std::exp(score[i]));
hessians[i] = static_cast<score_t>(label_[i] / std::exp(score[i]));
}
} else {
#pragma omp parallel for schedule(static)
for (data_size_t i = 0; i < num_data_; ++i) {
        // weight multiplies the whole gradient: (1 - y / exp(s)) * w
        gradients[i] = static_cast<score_t>((1.0 - label_[i] / std::exp(score[i])) * weights_[i]);
hessians[i] = static_cast<score_t>(label_[i] / std::exp(score[i]) * weights_[i]);
}
}
}

const char* GetName() const override {
return "gamma";
}

};
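
As a sanity check on the loop bodies above: with the log-link s = log μ, the per-point gamma loss y·exp(-s) + s differentiates to

$$
\frac{\partial \ell}{\partial s} = 1 - y\,e^{-s}, \qquad \frac{\partial^2 \ell}{\partial s^2} = y\,e^{-s},
$$

which matches ``GetGradients``, with any per-row weight multiplying both the gradient and the hessian.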

/*!
* \brief Objective function for Tweedie regression
*/
class RegressionTweedieLoss: public RegressionPoissonLoss {
public:
explicit RegressionTweedieLoss(const ObjectiveConfig& config) : RegressionPoissonLoss(config) {
rho_ = config.tweedie_variance_power;
}

explicit RegressionTweedieLoss(const std::vector<std::string>& strs) : RegressionPoissonLoss(strs) {

}

~RegressionTweedieLoss() {}

void GetGradients(const double* score, score_t* gradients,
score_t* hessians) const override {
if (weights_ == nullptr) {
#pragma omp parallel for schedule(static)
for (data_size_t i = 0; i < num_data_; ++i) {
gradients[i] = static_cast<score_t>(-label_[i] * std::exp((1 - rho_) * score[i]) + std::exp((2 - rho_) * score[i]));
hessians[i] = static_cast<score_t>(-label_[i] * (1 - rho_) * std::exp((1 - rho_) * score[i]) +
(2 - rho_) * std::exp((2 - rho_) * score[i]));
}
} else {
#pragma omp parallel for schedule(static)
for (data_size_t i = 0; i < num_data_; ++i) {
gradients[i] = static_cast<score_t>((-label_[i] * std::exp((1 - rho_) * score[i]) + std::exp((2 - rho_) * score[i])) * weights_[i]);
hessians[i] = static_cast<score_t>((-label_[i] * (1 - rho_) * std::exp((1 - rho_) * score[i]) +
(2 - rho_) * std::exp((2 - rho_) * score[i])) * weights_[i]);
}
}
}

const char* GetName() const override {
return "tweedie";
}
private:
double rho_;
};
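
The same check for the Tweedie objective: with s = log μ the kept loss terms are ℓ(s) = -y·exp((1-ρ)s)/(1-ρ) + exp((2-ρ)s)/(2-ρ), so

$$
\frac{\partial \ell}{\partial s} = -y\,e^{(1-\rho)s} + e^{(2-\rho)s}, \qquad
\frac{\partial^2 \ell}{\partial s^2} = -(1-\rho)\,y\,e^{(1-\rho)s} + (2-\rho)\,e^{(2-\rho)s},
$$

exactly the expressions computed above (weights, when present, scale both).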

#undef PercentileFun
#undef WeightedPercentileFun

