## Linear Regression Example

#### In this example:

- We import `LinearRegression` from `sklearn.linear_model`.
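- We load a dataset from OpenML with `fetch_openml(data_id=506)`, impute missing values with `SimpleImputer`, split the data into training and test sets, and then create an instance of `LinearRegression`. (`load_boston()` was removed from scikit-learn in version 1.2, and the column names in the output below show that dataset 506 is not the Boston housing data.)
- We fit the model to the training data using `model.fit(X_train, y_train)`. `LinearRegression` solves the ordinary least squares problem directly with a least-squares solver; it does not use gradient descent (scikit-learn's `SGDRegressor` is the gradient-based option).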
- We make predictions on the test set with `model.predict(X_test)`.
- We evaluate the model's performance using mean squared error (`mean_squared_error` from `sklearn.metrics`).
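
To make the fitting step concrete, the sketch below compares `LinearRegression` with a direct least-squares solve from NumPy. The data is synthetic and the names `true_coef`, `X_aug`, `coef_direct`, and `intercept_direct` are illustrative assumptions, not part of the notebook.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data (an assumption for illustration, not the notebook's dataset)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
true_coef = np.array([1.5, -2.0, 0.5])
y = X @ true_coef + 4.0 + rng.normal(scale=0.1, size=200)

# scikit-learn's ordinary least squares fit
model = LinearRegression().fit(X, y)

# The same problem solved directly: append a column of ones for the intercept
X_aug = np.column_stack([X, np.ones(len(X))])
solution, *_ = np.linalg.lstsq(X_aug, y, rcond=None)
coef_direct, intercept_direct = solution[:-1], solution[-1]

# Both approaches should agree up to floating-point error
print(np.allclose(model.coef_, coef_direct))
print(np.isclose(model.intercept_, intercept_direct))
```

No learning rate or iteration count appears anywhere, which is the practical difference from a gradient-descent fit.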

In [5]:
import numpy as np
from sklearn.datasets import fetch_openml
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.impute import SimpleImputer

# Load OpenML dataset 506 as a DataFrame
# (the columns printed below show this is not the Boston housing data)
dataset = fetch_openml(data_id=506, as_frame=True)

# Extract features (X) and target variable (y)
X = dataset.data
y = dataset.target

# Check for NaN values in X
print(X.isnull().sum())

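# Impute NaN values in X with the column mean.
# Note: fitting the imputer before the split lets test rows influence the
# imputed mean; fitting it on the training split only would avoid that leakage.
imputer = SimpleImputer(strategy='mean')
X_imputed = imputer.fit_transform(X)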

# Split data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X_imputed, y, test_size=0.2, random_state=42)

# Create a linear regression model
model = LinearRegression()

# Fit the model to the training data
model.fit(X_train, y_train)

# Predict on the test data
y_pred = model.predict(X_test)

# Evaluate model performance
mse = mean_squared_error(y_test, y_pred)
print(f"Mean Squared Error: {mse}")


Married               0
Age                   0
Years_of_education    0
Male                  0
Religious             0
Sex_partners          0
Income                6
Drug_use              0
Same_sex_relations    0
dtype: int64
Mean Squared Error: 1.25593006990914
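
The cell above imputes missing values before splitting, so the imputed mean is computed from rows that later land in the test set. One possible alternative, sketched here on synthetic stand-in data, is to split first and wrap the imputer and the regressor in a pipeline. The names `X_demo`, `y_demo`, `pipe`, and `mse_demo` are hypothetical, and the numbers it prints are not comparable to the output above.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Synthetic stand-in data with a few missing values (hypothetical, for illustration)
rng = np.random.default_rng(42)
X_demo = rng.normal(size=(300, 4))
y_demo = X_demo @ np.array([2.0, -1.0, 0.5, 0.0]) + rng.normal(scale=0.5, size=300)
X_demo[rng.choice(300, size=6, replace=False), 2] = np.nan  # knock out 6 entries

# Split first, so the test rows never influence the imputation statistics
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)

# The pipeline fits the imputer and the regressor on the training rows only
pipe = make_pipeline(SimpleImputer(strategy="mean"), LinearRegression())
pipe.fit(X_tr, y_tr)

# At prediction time the stored training means fill any gaps in X_te
mse_demo = mean_squared_error(y_te, pipe.predict(X_te))
print(f"Mean Squared Error: {mse_demo:.3f}")
```

With only a handful of missing values the difference in MSE is usually small, but the pipeline keeps the evaluation honest and carries over unchanged to cross-validation.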


* The MSE computed by `mean_squared_error(y_test, y_pred)` is the average squared difference between the predicted and actual target values on the test set. It is expressed in squared units of the target.
* Lower MSE indicates better predictive performance, but the value is only meaningful relative to the scale of the target or to a baseline such as always predicting the training mean. If the MSE is high by that standard, consider model improvements, more data, or better features.
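
As a rough illustration of the points above, the following sketch computes the MSE by hand, converts it to RMSE (which is in the target's own units), and compares it with a mean-predicting baseline. It runs on synthetic data, and `X_demo`, `y_demo`, `mse_manual`, and `mse_baseline` are illustrative names rather than anything from the notebook.

```python
import numpy as np
from sklearn.dummy import DummyRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic data (an assumption for illustration)
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 3))
y_demo = X_demo @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.3, size=200)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=0
)

y_hat = LinearRegression().fit(X_tr, y_tr).predict(X_te)

# MSE is the mean of the squared residuals
mse_sklearn = mean_squared_error(y_te, y_hat)
mse_manual = np.mean((y_te - y_hat) ** 2)

# RMSE is in the same units as the target, which is easier to interpret
rmse = np.sqrt(mse_manual)

# Baseline: always predict the training-set mean of the target
baseline = DummyRegressor(strategy="mean").fit(X_tr, y_tr)
mse_baseline = mean_squared_error(y_te, baseline.predict(X_te))

print(f"MSE (sklearn):  {mse_sklearn:.4f}")
print(f"MSE (manual):   {mse_manual:.4f}")
print(f"RMSE:           {rmse:.4f}")
print(f"Baseline MSE:   {mse_baseline:.4f}")
```

A model whose MSE is not clearly below the baseline has learned little from the features, whatever the absolute number looks like.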