# Multiple Linear Regression / Log-Linear Regression (Choose One)

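## Part 1: Multiple Linear Regression
In this part you are asked to fit a multiple linear regression. We first walk through 10-fold cross-validation of a simple (single-feature) linear regression with sklearn; you can follow the same pattern for the multiple-feature case.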

### 1. Load the data

In [24]:
import pandas as pd
import itertools

In [25]:
# Load the data
data = pd.read_csv('C:/Users/苍山沐雪/Desktop/Jupyter/data//advertising/advertising.csv')


In [26]:
data.head()

Unnamed: 0|TV|Radio|Newspaper|Sales
-|-|-|-|-
0|230.1|37.8|69.2|22.1
1|44.5|39.3|45.1|10.4
2|17.2|45.9|69.3|12.0
3|151.5|41.3|58.5|16.5
4|180.8|10.8|58.4|17.9


### 2. Import the models

In [27]:
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import cross_val_predict
from sklearn.preprocessing import PolynomialFeatures

### 3. 10-fold cross-validation of simple linear regression with sklearn

#### Create the model

In [28]:
model = LinearRegression()

#### Select the data

In [29]:
features = ['TV']
x = data[features]
y = data['Sales']

#### Generate the 10-fold cross-validated predictions

In [30]:
prediction = cross_val_predict(model, x, y, cv = 10)

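With `cv = 10` and a regressor, the folds are taken in row order without shuffling: the first 10% of the rows serve as the test set, then the rows from 10% to 20%, and so on. The predictions from the ten folds are then concatenated in that order and returned, so every row gets exactly one out-of-fold prediction.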
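The fold layout described above can be checked directly. The sketch below assumes that `cross_val_predict` with an integer `cv` and a regressor behaves like an unshuffled `KFold`; the placeholder array `rows` is hypothetical and only stands in for the 200 rows of the dataset.

```python
import numpy as np
from sklearn.model_selection import KFold

n_samples = 200  # same number of rows as the prediction above
rows = np.zeros((n_samples, 1))  # placeholder features, only the length matters

# KFold without shuffling cuts the rows into consecutive blocks
kfold = KFold(n_splits=10)
test_folds = [test_idx for _, test_idx in kfold.split(rows)]

# Show which rows are held out in the first three folds
for fold_number, test_idx in enumerate(test_folds[:3], start=1):
    print(f"fold {fold_number}: rows {test_idx[0]}..{test_idx[-1]}")
```

Each fold holds out 20 consecutive rows, which is why the concatenated predictions line up with the original row order.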

In [31]:
prediction.shape

(200,)

### 4. 计算评价指标

#### MAE

In [32]:
mean_absolute_error(data['Sales'], prediction)

1.8484560436845014

#### RMSE

In [33]:
mean_squared_error(data['Sales'], prediction) ** 0.5

2.311891648438941
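To make the two metrics concrete, here is a minimal hand-written sketch of MAE and RMSE. The helper names `mae` and `rmse` and the toy values are hypothetical and are not taken from the advertising data.

```python
import math

def mae(y_true, y_pred):
    """Mean absolute error: the average of |true - predicted|."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean squared error: the square root of the average squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

# Toy values: three perfect predictions and one that is off by 4
y_true = [10.0, 12.0, 14.0, 16.0]
y_pred = [10.0, 12.0, 14.0, 12.0]

print(mae(y_true, y_pred))   # 1.0
print(rmse(y_true, y_pred))  # 2.0
```

A single large error raises RMSE more than MAE, which is one reason RMSE is never smaller than MAE on the same predictions.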

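### 5. Choose several feature combinations, fit multiple linear regressions, and compare the 10-fold cross-validated MAE and RMSE of the resulting models (at least 3 combinations)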

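###### Extension: polynomial regression (an extension of simple linear regression). Try transforming some of the features, for example feeding their squares and cubes into the model, and observe how the predictive performance changes
###### Hint: for multiple linear regression, just add the names of the other features to the `features` list above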
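For the polynomial extension, it may help to see what `PolynomialFeatures` produces before passing its output to the model. This is a small sketch on a hypothetical two-row array, not on the advertising data; `include_bias=False` is chosen here only to keep the output compact.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# Toy single-feature input: two samples with values 2 and 3
x = np.array([[2.0], [3.0]])

# degree=3 appends the square and the cube of the feature;
# include_bias=False drops the constant column of ones
poly = PolynomialFeatures(degree=3, include_bias=False)
x_poly = poly.fit_transform(x)

print(x_poly)  # columns are x, x^2, x^3
```

With more than one input feature, the transformer also generates cross terms such as the product of two features, so the number of columns grows quickly with the degree.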

In [34]:
# Helper that runs a 10-fold cross-validated linear regression on one feature combination
def perform_multivariate_regression(features, data):
    model = LinearRegression()
    x = data[features]
    y = data['Sales']
    prediction = cross_val_predict(model, x, y, cv=10)
    # sklearn metrics take (y_true, y_pred)
    mae = mean_absolute_error(y, prediction)
    rmse = mean_squared_error(y, prediction) ** 0.5
    return mae, rmse

# Build every non-empty combination of the three features
all_features = ['TV', 'Radio', 'Newspaper']
feature_combinations = []

for r in range(1, len(all_features) + 1):
    feature_combinations.extend(list(itertools.combinations(all_features, r)))

# Fit a linear regression for each combination and collect the metrics
results = []

for combo in feature_combinations:
    mae, rmse = perform_multivariate_regression(list(combo), data)
    results.append((combo, mae, rmse))

# Print the metrics for each feature combination
for combo, mae, rmse in results:
    print(f"Features: {combo}, MAE: {mae}, RMSE: {rmse}")

# Extension: polynomial regression
poly_results = []

for combo in feature_combinations:
    x = data[list(combo)]
    y = data['Sales']

    for degree in range(1, 4):  # try degrees 1, 2 and 3
        poly = PolynomialFeatures(degree=degree)
        x_poly = poly.fit_transform(x)

        model = LinearRegression()
        prediction = cross_val_predict(model, x_poly, y, cv=10)
        # sklearn metrics take (y_true, y_pred)
        mae = mean_absolute_error(y, prediction)
        rmse = mean_squared_error(y, prediction) ** 0.5

        poly_results.append((combo, degree, mae, rmse))

# Print the metrics for each polynomial model
for combo, degree, mae, rmse in poly_results:
    print(f"Features: {combo}, Degree: {degree}, MAE: {mae}, RMSE: {rmse}")






Features: ('TV',), MAE: 1.8484560436845014, RMSE: 2.311891648438941
Features: ('Radio',), MAE: 4.298916799089579, RMSE: 5.018042892623836
Features: ('Newspaper',), MAE: 4.409197242713417, RMSE: 5.251605412934672
Features: ('TV', 'Radio'), MAE: 1.2621936422519533, RMSE: 1.6818990518213202
Features: ('TV', 'Newspaper'), MAE: 1.776707605359968, RMSE: 2.2492610419402657
Features: ('Radio', 'Newspaper'), MAE: 4.3163332911165995, RMSE: 5.042150499491994
Features: ('TV', 'Radio', 'Newspaper'), MAE: 1.2644541807760776, RMSE: 1.685774006914065
Features: ('TV',), Degree: 1, MAE: 1.848456043684502, RMSE: 2.3118916484389405
Features: ('TV',), Degree: 2, MAE: 1.8002792724446706, RMSE: 2.2571888068471626
Features: ('TV',), Degree: 3, MAE: 1.8125967310178075, RMSE: 2.273762346296598
Features: ('Radio',), Degree: 1, MAE: 4.298916799089579, RMSE: 5.018042892623835
Features: ('Radio',), Degree: 2, MAE: 4.32327527255341, RMSE: 5.041740516911242
Features: ('Radio',), Degree: 3, MAE: 4.32449000213333, RMSE
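As an alternative to transforming the features before cross-validation, the polynomial expansion and the regression can be chained in a single estimator. The sketch below uses `make_pipeline` on synthetic data (a hypothetical quadratic relationship with noise), so its numbers say nothing about the advertising dataset.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error
from sklearn.model_selection import cross_val_predict
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic data: y depends on the square of a single feature, plus noise
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=(200, 1))
y = 1.0 + 0.5 * x[:, 0] ** 2 + rng.normal(0, 1, size=200)

rmse_by_degree = {}
for degree in (1, 2):
    # The pipeline re-applies the expansion inside every fold
    model = make_pipeline(PolynomialFeatures(degree=degree), LinearRegression())
    prediction = cross_val_predict(model, x, y, cv=10)
    mae = mean_absolute_error(y, prediction)
    rmse = mean_squared_error(y, prediction) ** 0.5
    rmse_by_degree[degree] = rmse
    print(f"degree={degree}: MAE={mae:.3f}, RMSE={rmse:.3f}")
```

For `PolynomialFeatures` the two approaches give the same features, since the transform learns nothing from the data. The pipeline form matters more once a fitted step such as scaling is added.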

###### Double-click here to fill in
1. Features used by model 1: 'TV'
2. Features used by model 2: 'Radio'
3. Features used by model 3: 'Newspaper'

Model|MAE|RMSE
-|-|-
Model 1 | 1.848 | 2.312
Model 2 | 4.299 | 5.018
Model 3 | 4.409 | 5.252
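To compare all feature combinations at a glance, the printed results can be collected into a sorted table. This is one possible sketch: the MAE and RMSE values are copied from the linear-regression output earlier in this notebook, and the names `summary` and `rows` are hypothetical.

```python
import pandas as pd

# (feature combination, MAE, RMSE) copied from the printed output
results = [
    (('TV',), 1.8484560436845014, 2.311891648438941),
    (('Radio',), 4.298916799089579, 5.018042892623836),
    (('Newspaper',), 4.409197242713417, 5.251605412934672),
    (('TV', 'Radio'), 1.2621936422519533, 1.6818990518213202),
    (('TV', 'Newspaper'), 1.776707605359968, 2.2492610419402657),
    (('Radio', 'Newspaper'), 4.3163332911165995, 5.042150499491994),
    (('TV', 'Radio', 'Newspaper'), 1.2644541807760776, 1.685774006914065),
]

# Join each combination into a readable label, then sort by RMSE (best first)
rows = [(' + '.join(combo), mae, rmse) for combo, mae, rmse in results]
summary = (
    pd.DataFrame(rows, columns=['features', 'MAE', 'RMSE'])
    .sort_values('RMSE')
    .reset_index(drop=True)
    .round(3)
)
print(summary)
```

Sorted this way, 'TV + Radio' comes first, and adding 'Newspaper' to it does not improve either metric.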