# The LightGBM Algorithm
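Principle: LightGBM is a gradient boosting framework built on decision trees. It buckets continuous feature values into discrete bins (the histogram algorithm) and applies further optimizations, so training is faster and uses less memory than scanning pre-sorted feature values, particularly on large datasets.
Advantages: fast training, low memory usage, and support for large datasets and distributed training.
Use cases: classification and regression on large datasets, especially when compute or memory is limited.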
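
To make the histogram idea concrete, here is a minimal pure-Python sketch of a split search over binned values for a single feature. It is an illustration under simplifying assumptions, not LightGBM's implementation: the function names `build_histogram` and `best_split` are hypothetical, the bins are equal-width (LightGBM's real bin construction is more elaborate), and the gain assumes squared loss with unit hessians and no regularization.

```python
# Illustrative sketch only: simplified histogram-based split search for one feature.
# build_histogram and best_split are hypothetical helpers, not LightGBM functions.

def build_histogram(values, gradients, n_bins):
    """Bucket feature values into equal-width bins and sum gradients per bin."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # fall back to 1.0 for a constant feature
    grad_sum = [0.0] * n_bins
    count = [0] * n_bins
    for v, g in zip(values, gradients):
        b = min(int((v - lo) / width), n_bins - 1)  # clamp the maximum into the last bin
        grad_sum[b] += g
        count[b] += 1
    return grad_sum, count


def best_split(grad_sum, count):
    """Scan bin boundaries and return (bin_index, gain) of the best split.

    Gain = G_left^2 / n_left + G_right^2 / n_right - G_total^2 / n_total,
    where samples in bins <= bin_index go to the left child.
    """
    total_g, total_n = sum(grad_sum), sum(count)
    best = (None, 0.0)
    left_g, left_n = 0.0, 0
    for b in range(len(grad_sum) - 1):
        left_g += grad_sum[b]
        left_n += count[b]
        right_g, right_n = total_g - left_g, total_n - left_n
        if left_n == 0 or right_n == 0:
            continue  # a split must leave samples on both sides
        gain = left_g ** 2 / left_n + right_g ** 2 / right_n - total_g ** 2 / total_n
        if gain > best[1]:
            best = (b, gain)
    return best


# Toy data: low feature values have negative gradients, high values positive ones.
values = [1.0, 2.0, 3.0, 4.0, 11.0, 12.0, 13.0, 14.0]
gradients = [-1.0, -1.0, -1.0, -1.0, 1.0, 1.0, 1.0, 1.0]

grad_sum, count = build_histogram(values, gradients, n_bins=4)
split_bin, gain = best_split(grad_sum, count)
print(count)            # [4, 0, 0, 4]
print(split_bin, gain)  # 0 8.0
```

The point of the sketch is the cost profile: once the histogram is built, the split search loops over bins rather than over individual samples, which is where the speed and memory savings come from.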

## Classification Task

In [2]:
import lightgbm as lgb
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

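# Generate a synthetic binary classification dataset
X, y = make_classification(n_samples=1000, n_features=10, n_informative=5, n_classes=2, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Wrap the arrays in LightGBM Datasets; the validation set reuses the training set's bin boundaries
train_data = lgb.Dataset(X_train, label=y_train)
test_data = lgb.Dataset(X_test, label=y_test, reference=train_data)

# Set parameters
params = {
    'objective': 'binary',
    'metric': 'binary_logloss',
    'num_leaves': 31,
    'learning_rate': 0.05,
    'feature_fraction': 0.9
}

# Train the model; valid_sets supplies the data on which the metric is evaluated
lgb_model = lgb.train(params, train_data, num_boost_round=100, valid_sets=[test_data])

# Predict: with the binary objective, predict() returns positive-class probabilities
y_pred = lgb_model.predict(X_test)
y_pred_binary = (y_pred > 0.5).astype(int)  # threshold at 0.5 to get class labels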
print(f"LightGBM classifier accuracy: {accuracy_score(y_test, y_pred_binary):.2f}")

[LightGBM] [Info] Number of positive: 415, number of negative: 385
[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.000379 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 2550
[LightGBM] [Info] Number of data points in the train set: 800, number of used features: 10
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.518750 -> initscore=0.075035
[LightGBM] [Info] Start training from score 0.075035
LightGBM classifier accuracy: 0.95
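
The `BoostFromScore` line in the log above can be checked by hand. For the binary objective, boosting starts from the log-odds of the positive-class share in the training labels, and the sigmoid maps a raw score back to a probability. The short sketch below uses only the counts printed in the log; the `sigmoid` helper is defined here for illustration.

```python
import math

# Counts taken from the training log above.
n_pos, n_neg = 415, 385

pavg = n_pos / (n_pos + n_neg)            # share of positive labels
init_score = math.log(pavg / (1 - pavg))  # log-odds of the positive class


def sigmoid(raw_score):
    """Map a raw (log-odds) score to a probability."""
    return 1.0 / (1.0 + math.exp(-raw_score))


print(f"pavg={pavg:.6f} -> initscore={init_score:.6f}")
# pavg=0.518750 -> initscore=0.075035
print(f"sigmoid(initscore)={sigmoid(init_score):.6f}")
# sigmoid(initscore)=0.518750
```

This also shows why 0.5 is the natural threshold for the predicted probabilities: a probability above 0.5 corresponds to a raw score above 0.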


## Regression Task

In [5]:
import lightgbm as lgb
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error


# Generate a synthetic regression dataset
X, y = make_regression(n_samples=1000, n_features=10, n_informative=5, noise=0.1, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Fit the scikit-learn style regressor; feature_fraction is LightGBM's alias for colsample_bytree
lgb_model = lgb.LGBMRegressor(n_estimators=100, learning_rate=0.05, num_leaves=31, feature_fraction=0.9)
lgb_model.fit(X_train, y_train, eval_set=[(X_test, y_test)])

# Predict on the held-out set
y_pred = lgb_model.predict(X_test)
print(f"LightGBM regressor MSE: {mean_squared_error(y_test, y_pred):.2f}")

[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.000108 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 2550
[LightGBM] [Info] Number of data points in the train set: 800, number of used features: 10
[LightGBM] [Info] Start training from score 0.440965
LightGBM regressor MSE: 128.66
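
MSE is expressed in squared target units, which makes 128.66 hard to judge on its own. Taking the square root gives the RMSE, which is in the same units as `y`. A small sketch using only the value printed above:

```python
import math

mse = 128.66           # value printed by the regression cell above
rmse = math.sqrt(mse)  # root mean squared error, in the units of the target
print(f"RMSE: {rmse:.2f}")
# RMSE: 11.34
```

Whether an error of about 11.34 is large depends on the spread of the targets, so in practice it helps to compare it with the standard deviation of `y_test`.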


