# Linear regression in scikit-learn: Cheat sheet


**Workflow**  
1. Load and explore the data  
2. Train/test split  
3. (Optional) Scaling  
4. Train the model  
5. Evaluate and interpret


In [1]:
import pandas as pd
import numpy as np

# Load the data (change the path if needed)
df = pd.read_csv("../../data/Advertising.csv", index_col=0)
df.head()

      TV  radio  newspaper  sales
1  230.1   37.8       69.2   22.1
2   44.5   39.3       45.1   10.4
3   17.2   45.9       69.3    9.3
4  151.5   41.3       58.5   18.5
5  180.8   10.8       58.4   12.9


In [2]:
df.shape, df.describe()

((200, 4),
                TV       radio   newspaper       sales
 count  200.000000  200.000000  200.000000  200.000000
 mean   147.042500   23.264000   30.554000   14.022500
 std     85.854236   14.846809   21.778621    5.217457
 min      0.700000    0.000000    0.300000    1.600000
 25%     74.375000    9.975000   12.750000   10.375000
 50%    149.750000   22.900000   25.750000   12.900000
 75%    218.825000   36.525000   45.100000   17.400000
 max    296.400000   49.600000  114.000000   27.000000)

In [3]:
# Features/target
X = df.drop("sales", axis=1)
y = df["sales"]
X.head()

      TV  radio  newspaper
1  230.1   37.8       69.2
2   44.5   39.3       45.1
3   17.2   45.9       69.3
4  151.5   41.3       58.5
5  180.8   10.8       58.4


In [4]:
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=42
)

X_train.shape, X_test.shape, y_train.shape, y_test.shape

((134, 3), (66, 3), (134,), (66,))

### (Optional) Scaling

In [8]:
from sklearn.preprocessing import MinMaxScaler

scaler = MinMaxScaler().fit(X_train)
scaled_X_train = scaler.transform(X_train)
scaled_X_test = scaler.transform(X_test)

scaled_X_train[:3]

array([[0.99053094, 0.55846774, 0.01491054],
       [0.06087251, 0.24395161, 0.22962227],
       [0.45180927, 0.09879032, 0.08946322]])

### Train the linear regression

In [9]:
from sklearn.linear_model import LinearRegression

model = LinearRegression().fit(scaled_X_train, y_train)

print("Intercept:", model.intercept_)
for feature, coef in zip(X.columns, model.coef_):
    print(f"{feature}: {coef:.4f}")

Intercept: 2.7911595196243653
TV: 13.2075
radio: 9.7529
newspaper: 0.6111
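
Because the model was fitted on min-max-scaled features, each coefficient above is the change in predicted sales when that feature moves from its training minimum to its training maximum (about 13.2 for TV), not the change per unit of budget. The sketch below shows one way to convert such coefficients back to the original units. It uses synthetic stand-in data, and the names `X_demo`, `y_demo`, `demo_scaler`, `scaled_model`, and `raw_model` are assumptions made for this illustration, not part of the notebook.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in data (assumption): three features on different scales.
rng = np.random.default_rng(0)
X_demo = rng.uniform([0, 0, 0], [300, 50, 100], size=(200, 3))
y_demo = 3.0 + 0.05 * X_demo[:, 0] + 0.2 * X_demo[:, 1] + rng.normal(0, 1, 200)

# Fit on scaled features, as in the cells above.
demo_scaler = MinMaxScaler().fit(X_demo)
scaled_model = LinearRegression().fit(demo_scaler.transform(X_demo), y_demo)

# MinMaxScaler computes X_scaled = X * scale_ + min_, so substituting that
# into the fitted equation gives per-unit coefficients and a raw intercept.
coef_per_unit = scaled_model.coef_ * demo_scaler.scale_
intercept_raw = scaled_model.intercept_ + scaled_model.coef_ @ demo_scaler.min_

# Cross-check against a model fitted directly on the unscaled features.
raw_model = LinearRegression().fit(X_demo, y_demo)
print("Per-unit coefficients:", coef_per_unit)
print("Unscaled-fit coefficients:", raw_model.coef_)
```

The two sets of coefficients agree up to floating-point error: for ordinary least squares, scaling changes how the coefficients are read but not the fitted predictions.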


### Prediction

In [10]:
# On the test data
y_pred = model.predict(scaled_X_test)
y_pred[:5]

array([16.58673085, 21.18622524, 21.66752973, 10.81086512, 22.25210881])

In [11]:
# New data (adjust the values to match your features)
new_data = pd.DataFrame([[100, 20, 30]], columns=X.columns)
scaled_new = scaler.transform(new_data)
model.predict(scaled_new)

array([11.33941654])
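
New data must go through the same fitted scaler before `predict`, which is easy to forget. One possible way to avoid the manual step is to bundle the scaler and the model with scikit-learn's `make_pipeline`. The sketch below uses synthetic stand-in data. The names `X_demo`, `y_demo`, `pipe`, and `new_demo` are assumptions for illustration only.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in data (assumption) with the same column names as above.
rng = np.random.default_rng(1)
X_demo = pd.DataFrame(
    rng.uniform([0, 0, 0], [300, 50, 100], size=(200, 3)),
    columns=["TV", "radio", "newspaper"],
)
y_demo = 3.0 + 0.05 * X_demo["TV"] + 0.2 * X_demo["radio"] + rng.normal(0, 1, 200)

# fit() fits the scaler on the given data, then the regression on the scaled data.
pipe = make_pipeline(MinMaxScaler(), LinearRegression()).fit(X_demo, y_demo)

# predict() applies the fitted scaler automatically before the regression.
new_demo = pd.DataFrame([[100, 20, 30]], columns=X_demo.columns)
pipe_pred = pipe.predict(new_demo)
print(pipe_pred)
```

In the notebook's setting, the pipeline would be fitted on `X_train` and `y_train` only, so the test data never influences the scaler.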

### Evaluation

In [13]:
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

mae = mean_absolute_error(y_test, y_pred)
mse = mean_squared_error(y_test, y_pred)
rmse = np.sqrt(mse)
r2 = r2_score(y_test, y_pred)

print(f"MAE: {mae:.4f}")
print(f"MSE: {mse:.4f}")
print(f"RMSE: {rmse:.4f}")
print(f"R²: {r2:.4f}")

MAE: 1.4938
MSE: 3.7279
RMSE: 1.9308
R²: 0.8556
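
To make the four metrics concrete, the sketch below recomputes them by hand with NumPy and compares the results with the scikit-learn functions. The arrays `y_true_demo` and `y_pred_demo` are hypothetical values chosen only for illustration. They are not output from the model above.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Hypothetical true values and predictions (assumption, for illustration only).
y_true_demo = np.array([22.1, 10.4, 9.3, 18.5, 12.9])
y_pred_demo = np.array([20.5, 12.3, 12.3, 17.6, 13.2])

residuals = y_true_demo - y_pred_demo

mae_manual = np.mean(np.abs(residuals))   # mean absolute error
mse_manual = np.mean(residuals ** 2)      # mean squared error
rmse_manual = np.sqrt(mse_manual)         # same units as the target
# R²: 1 minus residual sum of squares over total sum of squares.
ss_total = np.sum((y_true_demo - y_true_demo.mean()) ** 2)
r2_manual = 1 - np.sum(residuals ** 2) / ss_total

# The manual values should match scikit-learn's implementations.
print(np.isclose(mae_manual, mean_absolute_error(y_true_demo, y_pred_demo)))
print(np.isclose(mse_manual, mean_squared_error(y_true_demo, y_pred_demo)))
print(np.isclose(r2_manual, r2_score(y_true_demo, y_pred_demo)))
```

MAE and RMSE are in the same units as `sales`, which makes them easier to read than MSE. RMSE penalizes large errors more heavily than MAE does.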


### Tips & pitfalls
- Outliers can strongly affect the model: inspect the residuals.
- Multicollinearity: consider regularization (Ridge/Lasso).
- Check the model assumptions (linearity, homoscedasticity, normally distributed residuals).
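
The checks in the list above can be sketched roughly as follows. The example builds synthetic data in which two features are nearly identical, computes a variance inflation factor (VIF) by hand, and compares ordinary least squares with `Ridge`. The data, the helper `vif`, and the choice `alpha=1.0` are all assumptions made for this illustration. A VIF above about 10 is a common rule of thumb for problematic multicollinearity, not a hard limit.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

# Synthetic data (assumption): x2 is nearly a copy of x1, x3 is independent.
rng = np.random.default_rng(2)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.05, size=n)
x3 = rng.normal(size=n)
X_demo = np.column_stack([x1, x2, x3])
y_demo = 2.0 * x1 + 1.0 * x3 + rng.normal(scale=0.5, size=n)


def vif(X, j):
    """Hypothetical helper: 1 / (1 - R²) from regressing column j on the rest."""
    others = np.delete(X, j, axis=1)
    r2 = LinearRegression().fit(others, X[:, j]).score(others, X[:, j])
    return 1.0 / (1.0 - r2)


vifs = [vif(X_demo, j) for j in range(X_demo.shape[1])]

ols = LinearRegression().fit(X_demo, y_demo)
ridge = Ridge(alpha=1.0).fit(X_demo, y_demo)

# Residuals on the training data; plot them against predictions to look for
# outliers, curvature (non-linearity), or a funnel shape (heteroscedasticity).
residuals = y_demo - ols.predict(X_demo)

print("VIF per feature:", np.round(vifs, 1))
print("OLS coefficients:", ols.coef_)
print("Ridge coefficients:", ridge.coef_)
print("Mean residual:", residuals.mean())
```

The two correlated features get large VIF values, and Ridge shrinks the coefficient vector relative to OLS. A histogram or Q-Q plot of `residuals` is a quick visual check of the normality assumption.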
