[python] [docs] fixed objective in sklearn wrapper; added missed objectives & metrics to docs (#1059)

* added missed aliases for task parameter

* fixed indents

* added missed aliases and options for tree_learner parameter

* added missed objectives to docs

* fixed typo in Poisson parameter and its description

* fixed model_format parameter description

* added missed metrics to docs

* fixed sklearn objective

* fixed set_params

* fixed docs

* added missed options to objectives

* added note about ignore_column (#1061)
StrikerRUS authored and guolinke committed Nov 16, 2017
1 parent 3d65d06 commit e5eb856
Showing 4 changed files with 57 additions and 29 deletions.
65 changes: 46 additions & 19 deletions docs/Parameters.rst
@@ -39,22 +39,22 @@ Core Parameters
 
   - path of config file
 
-- ``task``, default=\ ``train``, type=enum, options=\ ``train``, ``prediction``
+- ``task``, default=\ ``train``, type=enum, options=\ ``train``, ``predict``, ``convert_model``
 
-  - ``train`` for training
+  - ``train``, alias=\ ``training``, for training
 
-  - ``prediction`` for prediction.
+  - ``predict``, alias=\ ``prediction``, ``test``, for prediction.
 
-  - ``convert_model`` for converting model file into if-else format, see more information in `Convert model parameters <#convert-model-parameters>`__
+  - ``convert_model``, for converting model file into if-else format, see more information in `Convert model parameters <#convert-model-parameters>`__
 
 - ``application``, default=\ ``regression``, type=enum,
-  options=\ ``regression``, ``regression_l2``, ``regression_l1``, ``huber``, ``fair``, ``poisson``, ``quantile``, ``quantile_l2``,
-  ``binary``, ``lambdarank``, ``multiclass``,
+  options=\ ``regression``, ``regression_l1``, ``huber``, ``fair``, ``poisson``, ``quantile``, ``quantile_l2``,
+  ``binary``, ``multiclass``, ``multiclassova``, ``xentropy``, ``xentlambda``, ``lambdarank``,
   alias=\ ``objective``, ``app``
 
-  - ``regression``, regression application
+  - regression application
 
-  - ``regression_l2``, L2 loss, alias=\ ``mean_squared_error``, ``mse``
+  - ``regression_l2``, L2 loss, alias=\ ``regression``, ``mean_squared_error``, ``mse``
 
   - ``regression_l1``, L1 loss, alias=\ ``mean_absolute_error``, ``mae``
 
@@ -68,16 +68,28 @@ Core Parameters
 
   - ``quantile_l2``, like the ``quantile``, but L2 loss is used instead
 
-  - ``binary``, binary classification application
+  - ``binary``, binary `log loss`_ classification application
+
+  - multi-class classification application
+
+    - ``multiclass``, `softmax`_ objective function, ``num_class`` should be set as well
+
+    - ``multiclassova``, `One-vs-All`_ binary objective function, ``num_class`` should be set as well
+
+  - cross-entropy application
+
+    - ``xentropy``, objective function for cross-entropy (with optional linear weights), alias=\ ``cross_entropy``
+
+    - ``xentlambda``, alternative parameterization of cross-entropy, alias=\ ``cross_entropy_lambda``
+
+    - the label is anything in interval [0, 1]
 
   - ``lambdarank``, `lambdarank`_ application
 
     - the label should be ``int`` type in lambdarank tasks, and larger numbers represent higher relevance (e.g. 0:bad, 1:fair, 2:good, 3:perfect)
 
     - ``label_gain`` can be used to set the gain (weight) of ``int`` label
 
-  - ``multiclass``, multi-class classification application, ``num_class`` should be set as well
-
 - ``boosting``, default=\ ``gbdt``, type=enum,
   options=\ ``gbdt``, ``rf``, ``dart``, ``goss``,
   alias=\ ``boost``, ``boosting_type``
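
The objectives documented above can be exercised from the Python package. A minimal sketch, assuming toy random data and an illustrative 10-round budget (neither is part of this commit):

    import numpy as np
    import lightgbm as lgb

    # Toy 3-class data: 100 rows, 5 features.
    X = np.random.rand(100, 5)
    y = np.random.randint(0, 3, size=100)
    train_set = lgb.Dataset(X, label=y)

    # 'multiclassova' is the newly documented One-vs-All objective; like
    # 'multiclass', it requires 'num_class' to be set explicitly.
    params = {
        'objective': 'multiclassova',
        'num_class': 3,
        'verbose': -1,
    }
    booster = lgb.train(params, train_set, num_boost_round=10)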
@@ -120,13 +132,15 @@ Core Parameters
 
   - number of leaves in one tree
 
-- ``tree_learner``, default=\ ``serial``, type=enum, options=\ ``serial``, ``feature``, ``data``, alias=\ ``tree``
+- ``tree_learner``, default=\ ``serial``, type=enum, options=\ ``serial``, ``feature``, ``data``, ``voting``, alias=\ ``tree``
 
   - ``serial``, single machine tree learner
 
-  - ``feature``, feature parallel tree learner
+  - ``feature``, alias=\ ``feature_parallel``, feature parallel tree learner
 
-  - ``data``, data parallel tree learner
+  - ``data``, alias=\ ``data_parallel``, data parallel tree learner
+
+  - ``voting``, alias=\ ``voting_parallel``, voting parallel tree learner
 
   - refer to `Parallel Learning Guide <./Parallel-Learning-Guide.rst>`__ to get more details
 
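In the CLI, the new ``voting`` learner would be selected through the config file. A hypothetical ``train.conf`` sketch — the file names and machine-list setup here are placeholders, not from this commit:

    task = train
    data = train.txt
    objective = binary
    # select the voting parallel learner (alias: voting_parallel)
    tree_learner = voting
    num_machines = 2
    machine_list_file = mlist.txt
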
@@ -321,7 +335,7 @@ IO Parameters
 
   - file name of prediction result in ``prediction`` task
 
-- ``model_format``, default=\ ``text``, type=string
+- ``model_format``, default=\ ``text``, type=multi-enum, options=\ ``text``, ``proto``
 
   - format to save and load model
 
@@ -406,6 +420,8 @@ IO Parameters
 
   - add a prefix ``name:`` for column name, e.g. ``ignore_column=name:c1,c2,c3`` means c1, c2 and c3 will be ignored
 
+  - **Note**: works only in CLI-version
+
   - **Note**: index starts from ``0``. And it doesn't count the label column
 
 - ``categorical_feature``, default=\ ``""``, type=string, alias=\ ``categorical_column``, ``cat_feature``, ``cat_column``
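
Because the new note restricts ``ignore_column`` to the CLI version, the natural place for it is a config file rather than the Python API. A hypothetical sketch (the data file and column names are placeholders):

    task = train
    data = train.csv
    header = true
    # drop these named columns when constructing the dataset;
    # the name: prefix selects columns by name instead of index
    ignore_column = name:c1,c2,c3
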
@@ -507,9 +523,9 @@ Objective Parameters
 
   - parameter to control the width of Gaussian function. Will be used in ``regression_l1`` and ``huber`` losses
 
-- ``poission_max_delta_step``, default=\ ``0.7``, type=double
+- ``poisson_max_delta_step``, default=\ ``0.7``, type=double
 
-  - parameter used to safeguard optimization
+  - parameter for `Poisson regression`_ to safeguard optimization
 
 - ``scale_pos_weight``, default=\ ``1.0``, type=double
 
@@ -579,13 +595,18 @@ Metric Parameters
 
   - ``binary_logloss``, `log loss`_
 
-  - ``binary_error``.
-    For one sample: ``0`` for correct classification, ``1`` for error classification
+  - ``binary_error``, for one sample: ``0`` for correct classification, ``1`` for error classification
 
   - ``multi_logloss``, log loss for multi-class classification
 
   - ``multi_error``, error rate for multi-class classification
 
+  - ``xentropy``, cross-entropy (with optional linear weights), alias=\ ``cross_entropy``
+
+  - ``xentlambda``, "intensity-weighted" cross-entropy, alias=\ ``cross_entropy_lambda``
+
+  - ``kldiv``, `Kullback-Leibler divergence`_, alias=\ ``kullback_leibler``
+
   - support multi metrics, separated by ``,``
 
 - ``metric_freq``, default=\ ``1``, type=int
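
The newly documented metrics can be requested together from the Python package. A minimal sketch, assuming toy random data and illustrative round counts:

    import numpy as np
    import lightgbm as lgb

    X = np.random.rand(200, 5)
    y = np.random.randint(0, 2, size=200)   # labels in [0, 1]
    train_set = lgb.Dataset(X, label=y)
    valid_set = lgb.Dataset(np.random.rand(50, 5),
                            label=np.random.randint(0, 2, size=50),
                            reference=train_set)

    # several metrics at once, including the cross-entropy family
    params = {
        'objective': 'xentropy',
        'metric': ['xentropy', 'xentlambda', 'kldiv'],
        'verbose': -1,
    }
    booster = lgb.train(params, train_set, num_boost_round=10,
                        valid_sets=[valid_set])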
@@ -749,3 +770,9 @@ You can specify query/group id in data file now. Please refer to parameter ``gr
 
 .. _AUC: https://en.wikipedia.org/wiki/Receiver_operating_characteristic#Area_under_the_curve
 
 .. _log loss: https://www.kaggle.com/wiki/LogLoss
+
+.. _softmax: https://en.wikipedia.org/wiki/Softmax_function
+
+.. _One-vs-All: https://en.wikipedia.org/wiki/Multiclass_classification#One-vs.-rest
+
+.. _Kullback-Leibler divergence: https://en.wikipedia.org/wiki/Kullback%E2%80%93Leibler_divergence
8 changes: 4 additions & 4 deletions include/LightGBM/config.h
@@ -363,7 +363,7 @@ struct ParameterAlias {
   {
     { "config", "config_file" },
     { "nthread", "num_threads" },
-    { "num_thread", "num_threads" },
+    { "num_thread", "num_threads" },
     { "random_seed", "seed" },
     { "boosting", "boosting_type" },
     { "boost", "boosting_type" },
@@ -402,7 +402,7 @@ struct ParameterAlias {
     { "num_round", "num_iterations" },
     { "num_trees", "num_iterations" },
     { "num_rounds", "num_iterations" },
-    { "num_boost_round", "num_iterations" },
+    { "num_boost_round", "num_iterations" },
     { "sub_row", "bagging_fraction" },
     { "subsample", "bagging_fraction" },
     { "subsample_freq", "bagging_freq" },
@@ -432,7 +432,7 @@ struct ParameterAlias {
     { "predict_raw_score", "is_predict_raw_score" },
     { "raw_score", "is_predict_raw_score" },
     { "leaf_index", "is_predict_leaf_index" },
-    { "predict_leaf_index", "is_predict_leaf_index" },
+    { "predict_leaf_index", "is_predict_leaf_index" },
     { "contrib", "is_predict_contrib" },
     { "predict_contrib", "is_predict_contrib" },
     { "min_split_gain", "min_gain_to_split" },
@@ -444,7 +444,7 @@ struct ParameterAlias {
     { "bagging_fraction_seed", "bagging_seed" },
     { "workers", "machines" },
     { "nodes", "machines" },
-    { "subsample_for_bin", "bin_construct_sample_cnt" },
+    { "subsample_for_bin", "bin_construct_sample_cnt" },
   });
   const std::unordered_set<std::string> parameter_set({
     "config", "config_file", "task", "device",
11 changes: 6 additions & 5 deletions python-package/lightgbm/sklearn.py
@@ -163,7 +163,7 @@ def __init__(self, boosting_type="gbdt", num_leaves=31, max_depth=-1,
         objective : string, callable or None, optional (default=None)
             Specify the learning task and the corresponding learning objective or
             a custom objective function to be used (see note below).
-            default: 'binary' for LGBMClassifier, 'lambdarank' for LGBMRanker.
+            default: 'regression' for LGBMRegressor, 'binary' or 'multiclass' for LGBMClassifier, 'lambdarank' for LGBMRanker.
         min_split_gain : float, optional (default=0.)
             Minimum loss reduction required to make a further partition on a leaf node of the tree.
         min_child_weight : float, optional (default=1e-3)
@@ -264,7 +264,7 @@ def __init__(self, boosting_type="gbdt", num_leaves=31, max_depth=-1,
         self._best_score = None
         self._best_iteration = None
         self._other_params = {}
-        self._objective = None
+        self._objective = objective
         self._n_features = None
         self._classes = None
         self._n_classes = None
@@ -285,6 +285,8 @@ def get_params(self, deep=True):
     def set_params(self, **params):
         for key, value in params.items():
             setattr(self, key, value)
+            if hasattr(self, '_' + key):
+                setattr(self, '_' + key, value)
             self._other_params[key] = value
         return self
 
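This keeps the private shadow attribute (e.g. ``_objective``) in sync with the public one whenever a parameter is updated. A minimal sketch of the restored behavior (the particular objective value is illustrative):

    from lightgbm import LGBMClassifier

    clf = LGBMClassifier(objective='binary')

    # set_params now also updates the matching private attribute
    # ('_objective'), which is what fit() actually consults, so the
    # new value is no longer silently ignored.
    clf.set_params(objective='xentropy')
    print(clf.objective)     # 'xentropy'
    print(clf._objective)    # 'xentropy' as well, after this commit
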
@@ -370,8 +372,6 @@ def fit(self, X, y,
             For multi-class task, the y_pred is group by class_id first, then group by row_id.
             If you want to get i-th row y_pred in j-th class, the access way is y_pred[j * num_data + i].
         """
-        if not hasattr(self, '_objective'):
-            self._objective = self.objective
         if self._objective is None:
             if isinstance(self, LGBMRegressor):
                 self._objective = "regression"
@@ -633,7 +633,8 @@ def fit(self, X, y,
         self._n_classes = len(self._classes)
         if self._n_classes > 2:
             # Switch to using a multiclass objective in the underlying LGBM instance
-            self._objective = "multiclass"
+            if self._objective != "multiclassova" and not callable(self._objective):
+                self._objective = "multiclass"
             if eval_metric == 'logloss' or eval_metric == 'binary_logloss':
                 eval_metric = "multi_logloss"
             elif eval_metric == 'error' or eval_metric == 'binary_error':
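
With the guard above, a One-vs-All (or callable) objective now survives multi-class fitting instead of being overwritten with 'multiclass'. A sketch on toy data (shapes and class count are illustrative):

    import numpy as np
    from lightgbm import LGBMClassifier

    X = np.random.rand(120, 4)
    y = np.random.randint(0, 3, size=120)   # three classes

    clf = LGBMClassifier(objective='multiclassova')
    clf.fit(X, y)
    # Before this commit the wrapper reset the objective to 'multiclass'
    # whenever more than two classes were detected; the user's choice is
    # now kept.
    print(clf._objective)   # 'multiclassova'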
2 changes: 1 addition & 1 deletion src/metric/metric.cpp
@@ -39,7 +39,7 @@ Metric* Metric::CreateMetric(const std::string& type, const MetricConfig& config
     return new MultiErrorMetric(config);
   } else if (type == std::string("xentropy") || type == std::string("cross_entropy")) {
     return new CrossEntropyMetric(config);
-  } else if (type == std::string("xentlambda")) {
+  } else if (type == std::string("xentlambda") || type == std::string("cross_entropy_lambda")) {
     return new CrossEntropyLambdaMetric(config);
   } else if (type == std::string("kldiv") || type == std::string("kullback_leibler")) {
     return new KullbackLeiblerDivergence(config);
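
This one-line change makes the long-form alias valid anywhere the metric name is parsed. A Python sketch (toy data and round count are illustrative assumptions):

    import numpy as np
    import lightgbm as lgb

    X = np.random.rand(100, 3)
    y = np.random.rand(100)   # xentlambda expects labels in [0, 1]
    train_set = lgb.Dataset(X, label=y)

    # 'cross_entropy_lambda' now resolves to the same metric as 'xentlambda'
    params = {'objective': 'xentlambda', 'metric': 'cross_entropy_lambda'}
    booster = lgb.train(params, train_set, num_boost_round=5)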
