# Regression models with scikit-learn

Scikit-learn offers a wide variety of regression models, each suited to different types of data and problems. The relationship in the data may be linear (linear regression) or may involve complex non-linear interactions (decision trees, neural networks), and the data may contain outliers that call for a robust method (Theil-Sen). In each case, Scikit-learn provides an easy-to-use implementation. Every estimator shares the same `fit`/`predict` API, so building, evaluating, and tuning different regression models follows the same workflow.

In [1]:
import numpy as np
import pandas as pd
from sklearn.datasets import fetch_california_housing
from sklearn.linear_model import LinearRegression, Ridge, Lasso, ElasticNet, TheilSenRegressor
from sklearn.tree import DecisionTreeRegressor
from sklearn.svm import SVR
from sklearn.neighbors import KNeighborsRegressor
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score, mean_absolute_error
import pickle

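We will use the California housing dataset that ships with Scikit-learn. `fetch_california_housing` downloads it on first use. Each of its 20,640 rows describes a census block group from the 1990 U.S. census. The eight features include median income, median house age, and the average number of rooms per household, and the target is the median house value in units of $100,000.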

In [2]:
# Load the California housing dataset
california = fetch_california_housing()
X = pd.DataFrame(california.data, columns=california.feature_names)
y = california.target

# Display the dataset
print("California housing dataset (First 5 rows):")
display(X.head())

# Prepare the data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

California housing dataset (First 5 rows):

|   | MedInc | HouseAge | AveRooms | AveBedrms | Population | AveOccup | Latitude | Longitude |
|---|--------|----------|----------|-----------|------------|----------|----------|-----------|
| 0 | 8.3252 | 41.0 | 6.984127 | 1.02381 | 322.0 | 2.555556 | 37.88 | -122.23 |
| 1 | 8.3014 | 21.0 | 6.238137 | 0.97188 | 2401.0 | 2.109842 | 37.86 | -122.22 |
| 2 | 7.2574 | 52.0 | 8.288136 | 1.073446 | 496.0 | 2.80226 | 37.85 | -122.24 |
| 3 | 5.6431 | 52.0 | 5.817352 | 1.073059 | 558.0 | 2.547945 | 37.85 | -122.25 |
| 4 | 3.8462 | 52.0 | 6.281853 | 1.081081 | 565.0 | 2.181467 | 37.85 | -122.25 |


## Linear regression models
Linear regression is a statistical method for modeling the relationship between a dependent variable (the target) and one or more independent variables (the features). It assumes that the target is a linear function of the features plus random error.

### Ordinary least squares (OLS)
OLS is the most commonly used method for estimating the parameters of a linear regression model. It works by minimizing the sum of the squared differences between the observed values (actual data points) and the values predicted by the linear model. By the Gauss-Markov theorem, OLS gives the best linear unbiased estimates of the coefficients when the model is linear in its parameters and the errors have zero mean given the features, are uncorrelated, and have constant variance (homoscedasticity). Normality of the errors is not required for this result. It is needed for exact confidence intervals and hypothesis tests on the coefficients.
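To make the least-squares objective concrete, here is a minimal NumPy sketch that solves it directly on a small synthetic dataset. The helper `ols_fit` and the data are hypothetical, chosen only for illustration. Prepending a column of ones plays the role of `fit_intercept=True`, and `np.linalg.lstsq` minimizes the sum of squared residuals, which is the same objective `LinearRegression` minimizes. On full-rank data like this, the result should agree with `LinearRegression`'s `intercept_` and `coef_` up to floating-point error.

```python
import numpy as np

def ols_fit(X, y):
    """Return (intercept, coefficients) minimizing the sum of squared residuals.

    Hypothetical helper for illustration: a column of ones is prepended
    so that the first solved parameter is the intercept.
    """
    X_design = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(X_design, y, rcond=None)
    return beta[0], beta[1:]

# Synthetic data generated from y = 1.5 + 2*x1 - 3*x2 plus small Gaussian noise
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = 1.5 + X @ np.array([2.0, -3.0]) + rng.normal(scale=0.1, size=200)

intercept, coef = ols_fit(X, y)
print(f"intercept={intercept:.2f}, coef={np.round(coef, 2)}")
```

At the least-squares solution, the residuals are orthogonal to every column of the design matrix (the normal equations $X^\top(y - X\beta) = 0$), which gives a quick sanity check for any OLS fit.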

In [3]:
# Build and fit the linear regression model
lr_model = LinearRegression()
lr_model.fit(X_train, y_train)

# Save the model
with open('linear_regression_model.pkl', 'wb') as f:
    pickle.dump(lr_model, f)

# Load the model
with open('linear_regression_model.pkl', 'rb') as f:
    loaded_lr_model = pickle.load(f)

# Make predictions with the reloaded model
y_pred_lr = loaded_lr_model.predict(X_test)

# Display the results
mse = mean_squared_error(y_test, y_pred_lr)
mae = mean_absolute_error(y_test, y_pred_lr)
r2 = r2_score(y_test, y_pred_lr)
print(f"Linear regression performance metrics:")
print(f"{'-'*40}")
print(f"Mean squared error (MSE): {mse:.4f}")
print(f"Mean absolute error (MAE): {mae:.4f}")
print(f"R-squared score: {r2:.4f}\n")

Linear regression performance metrics:
----------------------------------------
Mean squared error (MSE): 0.5559
Mean absolute error (MAE): 0.5332
R-squared score: 0.5758



***Explanation***

**Syntax**: The `LinearRegression` class in Scikit-learn is used to implement the OLS regression model. 
```python
LinearRegression(*, fit_intercept=True, copy_X=True, n_jobs=None, positive=False)
```
- `*`: Indicates that all parameters following `*` must be passed as keyword arguments, not positional arguments.
- **`fit_intercept`** (`bool`, default=`True`): Determines whether to calculate the intercept (`b0`) for this model. If `True`, the model includes an intercept term in the regression equation. If `False`, no intercept is fitted, which forces the regression line (or hyperplane, with several features) through the origin. Most regression models need an intercept to capture the baseline value of the dependent variable when all independent variables are zero. Set this to `False` only when we know for certain that the dependent variable is zero whenever all independent variables are zero, or when the data have already been centered.
- **`copy_X`** (`bool`, default=`True`): If `True`, the input data `X` will be copied. Otherwise, it may be overwritten. Use `True` to ensure the original dataset remains unchanged, which is important if we are using the same dataset for multiple operations or if data integrity is critical. Use `False` to save memory and computation time if we are working with very large datasets.
- **`n_jobs`** (`int`, default=`None`): The number of CPU cores to use for the computation. `None` means one core unless a `joblib.parallel_backend` context specifies otherwise, and `-1` means using all available cores. For `LinearRegression`, parallelism only speeds things up on sufficiently large problems: there must be multiple targets and, in addition, `X` must be sparse or `positive` must be `True`. For a single dense target like the one in this example, it has little effect.
- **`positive`** (`bool`, default=`False`): When set to `True`, forces the coefficients to be non-negative. Use it when domain knowledge rules out negative effects, for example when each feature can only increase the target.


---

## Regularization
Regularization is a technique used to prevent overfitting by adding a penalty term to the loss function, discouraging the model from fitting too closely to the training data and thereby improving its generalization performance. Scikit-learn provides three popular regularization methods: ridge, lasso, and elastic net.

### Ridge regression
Ridge regression adds an L2 penalty to the least-squares loss. The penalty is proportional to the sum of the squared coefficients. It discourages large coefficients, shrinking them toward zero and reducing the model's complexity, which can prevent overfitting. Unlike lasso, ridge does not set coefficients exactly to zero, so all features keep contributing to the model, albeit with smaller weights.
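Ridge's penalty has a closed-form solution. Scikit-learn documents the `Ridge` objective as $\|y - Xw\|_2^2 + \alpha \|w\|_2^2$. With the intercept handled by centering the data, its minimizer is $w = (X^\top X + \alpha I)^{-1} X^\top y$. The sketch below uses a hypothetical helper `ridge_fit` and synthetic data to solve this with NumPy. It shows that the coefficients shrink as `alpha` grows without reaching exactly zero. The sketch illustrates the math only and is not how `Ridge` is implemented internally, since that depends on the chosen `solver`.

```python
import numpy as np

def ridge_fit(X, y, alpha):
    """Closed-form ridge solution minimizing ||y - Xw||^2 + alpha * ||w||^2.

    Hypothetical helper: X and y are centered first so that the intercept
    is not penalized, then the intercept is recovered from the means.
    """
    X_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - X_mean, y - y_mean
    n_features = X.shape[1]
    w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(n_features), Xc.T @ yc)
    intercept = y_mean - X_mean @ w
    return intercept, w

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([3.0, -2.0, 0.5]) + rng.normal(scale=0.5, size=100)

# The coefficient norm shrinks as alpha grows, but no coefficient becomes exactly zero
for alpha in [0.0, 10.0, 100.0, 1000.0]:
    _, w = ridge_fit(X, y, alpha)
    print(f"alpha={alpha:>7}: coef={np.round(w, 3)}, norm={np.linalg.norm(w):.3f}")
```

With `alpha=0.0`, the sketch reduces to ordinary least squares with an intercept.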

In [4]:
# Build and fit the ridge regression model
ridge_model = Ridge(alpha=1.0)
ridge_model.fit(X_train, y_train)

# Save the model
with open('ridge_regression_model.pkl', 'wb') as f:
    pickle.dump(ridge_model, f)

# Load the model
with open('ridge_regression_model.pkl', 'rb') as f:
    loaded_ridge_model = pickle.load(f)

# Make predictions
y_pred_ridge = loaded_ridge_model.predict(X_test)

# Display the results
mse = mean_squared_error(y_test, y_pred_ridge)
mae = mean_absolute_error(y_test, y_pred_ridge)
r2 = r2_score(y_test, y_pred_ridge)
print(f"Ridge regression performance metrics:")
print(f"{'-'*40}")
print(f"Mean squared error (MSE): {mse:.4f}")
print(f"Mean absolute error (MAE): {mae:.4f}")
print(f"R-squared score: {r2:.4f}\n")

Ridge regression performance metrics:
----------------------------------------
Mean squared error (MSE): 0.5558
Mean absolute error (MAE): 0.5332
R-squared score: 0.5759



***Explanation***

**Syntax**: 
```python
Ridge(alpha=1.0, *, fit_intercept=True, copy_X=True, max_iter=None, tol=0.0001, solver='auto', positive=False, random_state=None)
```
- `*`: Indicates that all parameters following `*` must be passed as keyword arguments. `alpha` comes before `*`, so it can also be passed positionally, as in `Ridge(1.0)`.
- **`alpha`** (`float`, default=`1.0`): Regularization strength; must be a non-negative float. Larger values specify stronger regularization. Adjust `alpha` to control the trade-off between bias and variance: increase it if the model is overfitting, and reduce it if the model underfits.
- **`fit_intercept`** (`bool`, default=`True`): If `True`, the model calculates the intercept for this model. If set to `False`, no intercept will be used in calculations.
- **`copy_X`** (`bool`, default=`True`): If `True`, the input data `X` will be copied. Otherwise, it may be overwritten. Use `True` to avoid altering the original data. Set to `False` for memory efficiency when dealing with large datasets.
- **`max_iter`** (`int`, default=`None`): Maximum number of iterations for the solver. Increase this if the solver doesn’t converge, especially for large or complex datasets.
- **`tol`** (`float`, default=`0.0001`): Precision of the solution. Lower `tol` for a more precise solution, or increase it for faster convergence.
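- **`solver`** (`{'auto', 'svd', 'cholesky', 'lsqr', 'sparse_cg', 'sag', 'saga', 'lbfgs'}`, default=`'auto'`): Solver to use in the computational routines.
    - `auto`: Chooses the solver automatically based on the type of data.
    - `svd`: Uses a singular value decomposition of `X`. It is the most stable option, especially for singular or nearly singular matrices, but it is slower than `cholesky`.
    - `cholesky`: Computes the closed-form solution with a direct linear solve. It is fast on dense data but less stable.
    - `lsqr`, `sparse_cg`: Iterative solvers that also work well on sparse data.
    - `sag`, `saga`: Stochastic average gradient variants. They are often fast when both the number of samples and the number of features are large, but fast convergence is only guaranteed when features are on approximately the same scale.
    - `lbfgs`: Uses the L-BFGS-B algorithm (limited-memory Broyden-Fletcher-Goldfarb-Shanno with bound constraints). It is only supported when `positive=True`.
- **`positive`** (`bool`, default=`False`): When set to `True`, forces the coefficients to be non-negative. Only the `lbfgs` solver supports this option.
- **`random_state`** (`int`, `RandomState` instance, default=`None`): Controls the randomness for data shuffling. It is used only when `solver` is `'sag'` or `'saga'`.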


### Lasso regression
Lasso regression, or L1 regularization, adds a penalty proportional to the absolute value of the coefficients to the loss function. This can lead to sparse solutions where some coefficients are exactly zero, effectively performing feature selection. Lasso is useful when we have a large number of features and expect that only a small subset is actually relevant to predicting the target.
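Scikit-learn fits `Lasso` with coordinate descent on the objective $\frac{1}{2n}\|y - Xw\|_2^2 + \alpha\|w\|_1$. The minimal sketch below implements plain cyclic coordinate descent for that objective to show where the exact zeros come from. Each update passes a coefficient through a soft-thresholding step, which sets the coefficient to zero whenever the magnitude of its partial-residual correlation falls below `alpha`. The helpers `soft_threshold` and `lasso_cd` are hypothetical names. The sketch assumes centered data (no intercept) and runs a fixed number of sweeps instead of checking convergence. It uses synthetic data in which only the first two features matter.

```python
import numpy as np

def soft_threshold(rho, alpha):
    """Shrink rho toward zero by alpha; values in [-alpha, alpha] become 0."""
    return np.sign(rho) * max(abs(rho) - alpha, 0.0)

def lasso_cd(X, y, alpha, n_iter=200):
    """Cyclic coordinate descent for (1/(2n))*||y - Xw||^2 + alpha*||w||_1.

    Sketch only: assumes centered X and y (no intercept), no all-zero
    columns, and runs a fixed number of sweeps without a convergence check.
    """
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    col_sq = (X ** 2).sum(axis=0) / n_samples   # (1/n) * ||X_j||^2 for each column
    residual = y.copy()                         # equals y - Xw while w = 0
    for _ in range(n_iter):
        for j in range(n_features):
            # Correlation of feature j with the residual that excludes feature j
            rho = X[:, j] @ (residual + X[:, j] * w[j]) / n_samples
            new_wj = soft_threshold(rho, alpha) / col_sq[j]
            residual += X[:, j] * (w[j] - new_wj)   # keep residual equal to y - Xw
            w[j] = new_wj
    return w

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
X -= X.mean(axis=0)
# Only the first two features influence the target
y = X @ np.array([4.0, -2.0, 0.0, 0.0, 0.0]) + rng.normal(scale=0.5, size=200)
y -= y.mean()

print("alpha=0.1: ", np.round(lasso_cd(X, y, alpha=0.1), 3))
print("alpha=10.0:", np.round(lasso_cd(X, y, alpha=10.0), 3))
```

With a small `alpha`, the two relevant coefficients stay close to their true values and the irrelevant ones end up at or near zero. With a large `alpha`, every coefficient is thresholded to exactly zero.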

In [5]:
# Build and fit the lasso regression model
lasso_model = Lasso(alpha=0.1)
lasso_model.fit(X_train, y_train)

# Save the model
with open('lasso_regression_model.pkl', 'wb') as f:
    pickle.dump(lasso_model, f)

# Load the model
with open('lasso_regression_model.pkl', 'rb') as f:
    loaded_lasso_model = pickle.load(f)

# Make predictions
y_pred_lasso = loaded_lasso_model.predict(X_test)

# Display the results
mse = mean_squared_error(y_test, y_pred_lasso)
mae = mean_absolute_error(y_test, y_pred_lasso)
r2 = r2_score(y_test, y_pred_lasso)
print(f"Lasso regression performance metrics:")
print(f"{'-'*40}")
print(f"Mean squared error (MSE): {mse:.4f}")
print(f"Mean absolute error (MAE): {mae:.4f}")
print(f"R-squared score: {r2:.4f}\n")

Lasso regression performance metrics:
----------------------------------------
Mean squared error (MSE): 0.6135
Mean absolute error (MAE): 0.5816
R-squared score: 0.5318



***Explanation***

**Syntax**: 
```python
Lasso(alpha=1.0, *, fit_intercept=True, precompute=False, copy_X=True, max_iter=1000, tol=0.0001, warm_start=False, positive=False, random_state=None, selection='cyclic')
```
- `*`: Indicates that all parameters following `*` must be passed as keyword arguments. `alpha` comes before `*`, so it can also be passed positionally.
- **`alpha`** (`float`, default=`1.0`): Constant that multiplies the L1 term; must be a non-negative float. Larger values specify stronger regularization and zero out more coefficients. `alpha=0` is equivalent to ordinary least squares, for which `LinearRegression` is the better choice.
- **`fit_intercept`** (`bool`, default=`True`): If `True`, the model calculates the intercept for this model. If set to `False`, no intercept will be used in calculations.
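- **`precompute`** (`bool` or `array-like`, default=`False`): Whether to use a precomputed Gram matrix (`X.T @ X`) to speed up calculations. This tends to help when the number of samples is much larger than the number of features. For sparse input, it is always treated as `False` to preserve sparsity.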
- **`copy_X`** (`bool`, default=`True`): If `True`, the input data `X` will be copied. Otherwise, it may be overwritten. Typically `True` unless working with very large datasets where memory is a concern.
- **`max_iter`** (`int`, default=`1000`): Maximum number of iterations for the solver. Increase if the solver does not converge, especially with larger datasets or stricter `tol`.
- **`tol`** (`float`, default=`0.0001`): Tolerance for the optimization. Lower it for more precise solutions or increase for faster convergence.
- **`warm_start`** (`bool`, default=`False`): If set to `True`, reuses the solution of the previous call to `fit` as initialization. Useful when running Lasso in a loop where we iteratively adjust `alpha` or other parameters.
- **`positive`** (`bool`, default=`False`): When set to `True`, forces the coefficients to be positive. Use when the application logically requires non-negative coefficients, such as in economic modeling.
- **`random_state`** (`int`, `RandomState` instance, default=`None`): The seed of the pseudo-random number generator that selects a random feature to update. It is used only when `selection='random'`. Set it for reproducible experiments.
- **`selection`** (`{'cyclic', 'random'}`, default=`'cyclic'`): Determines the order of coefficient updates.
    - `cyclic`: Standard choice, where coefficients are updated in order, one feature after another.
    - `random`: A random coefficient is updated at every iteration instead of looping over features sequentially. This often converges significantly faster, especially when `tol` is higher than `1e-4`.


### Elastic net regression
Elastic net regression is a hybrid method that combines the penalties of ridge (L2) and lasso (L1) regression. It adds both the absolute value and square of the magnitude of coefficients as penalties. This allows elastic net to both perform variable selection (like lasso) and shrinkage (like ridge), making it suitable for datasets with highly correlated features or when feature selection is necessary.
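The elastic net penalty can be written out explicitly. Scikit-learn's `ElasticNet` documentation gives the objective as $\frac{1}{2n}\|y - Xw\|_2^2 + \alpha\,\rho\,\|w\|_1 + \frac{\alpha(1-\rho)}{2}\|w\|_2^2$, where $\rho$ is `l1_ratio`. The sketch below evaluates that expression on toy numbers using a hypothetical helper, `elastic_net_objective`. It makes it easy to see that `l1_ratio=1` reduces to the lasso objective and `l1_ratio=0` leaves a pure L2 penalty.

```python
import numpy as np

def elastic_net_objective(X, y, w, alpha, l1_ratio):
    """Elastic net loss in the form given in scikit-learn's ElasticNet docs:
    (1/(2n))*||y - Xw||^2 + alpha*l1_ratio*||w||_1 + 0.5*alpha*(1 - l1_ratio)*||w||^2
    """
    n_samples = X.shape[0]
    data_fit = np.sum((y - X @ w) ** 2) / (2 * n_samples)
    l1 = np.sum(np.abs(w))   # lasso part
    l2 = np.sum(w ** 2)      # ridge part
    return data_fit + alpha * l1_ratio * l1 + 0.5 * alpha * (1 - l1_ratio) * l2

X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
y = np.array([1.0, 2.0, 3.0])
w = np.array([1.0, -2.0])

# l1_ratio=0: pure L2 penalty; l1_ratio=1: pure L1 (lasso) penalty
for ratio in (0.0, 0.5, 1.0):
    print(ratio, elastic_net_objective(X, y, w, alpha=0.1, l1_ratio=ratio))
```

`Ridge` does not include the $\frac{1}{2n}$ factor, so `ElasticNet(l1_ratio=0)` and `Ridge` with the same `alpha` are not equivalent. Their `alpha` values differ by a factor of the number of samples.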

In [6]:
# Build and fit the elastic net regression model
elastic_net_model = ElasticNet(alpha=0.1, l1_ratio=0.5)
elastic_net_model.fit(X_train, y_train)

# Save the model
with open('elastic_net_regression_model.pkl', 'wb') as f:
    pickle.dump(elastic_net_model, f)

# Load the model
with open('elastic_net_regression_model.pkl', 'rb') as f:
    loaded_elastic_net_model = pickle.load(f)

# Make predictions
y_pred_elastic_net = loaded_elastic_net_model.predict(X_test)

# Display the results
mse = mean_squared_error(y_test, y_pred_elastic_net)
mae = mean_absolute_error(y_test, y_pred_elastic_net)
r2 = r2_score(y_test, y_pred_elastic_net)
print(f"Elastic net regression performance metrics:")
print(f"{'-'*40}")
print(f"Mean squared error (MSE): {mse:.4f}")
print(f"Mean absolute error (MAE): {mae:.4f}")
print(f"R-squared score: {r2:.4f}\n")

Elastic net regression performance metrics:
----------------------------------------
Mean squared error (MSE): 0.5731
Mean absolute error (MAE): 0.5565
R-squared score: 0.5627



***Explanation***

**Syntax**: 
```python
ElasticNet(alpha=1.0, *, l1_ratio=0.5, fit_intercept=True, precompute=False, max_iter=1000, copy_X=True, tol=0.0001, warm_start=False, positive=False, random_state=None, selection='cyclic')
```
- `*`: Indicates that all parameters following `*` must be passed as keyword arguments. `alpha` comes before `*`, so it can also be passed positionally.
- **`alpha`** (`float`, default=`1.0`): Constant that multiplies the penalty terms, i.e., the overall regularization strength; must be a non-negative float. As with ridge and lasso, higher values increase regularization.
- **`l1_ratio`** (`float`, default=`0.5`): The ElasticNet mixing parameter, with `0 <= l1_ratio <= 1`. Controls the balance between L1 and L2 regularization. For `l1_ratio = 0`, the penalty is an L2 penalty (pure ridge regression). For `l1_ratio = 1`, it is an L1 penalty (pure lasso regression).
- **`fit_intercept`** (`bool`, default=`True`): If `True`, the model calculates the intercept for this model. If set to `False`, no intercept will be used in calculations.
- **`precompute`** (`bool` or `array-like`, default=`False`): Whether to use a precomputed Gram matrix (`X.T @ X`) to speed up calculations. This tends to help when the number of samples is much larger than the number of features, or when the same data are fitted repeatedly.
- **`max_iter`** (`int`, default=`1000`): Maximum number of iterations for the solver. Increase if the algorithm does not converge within the default iterations.
- **`copy_X`** (`bool`, default=`True`): If `True`, the input data `X` will be copied. Otherwise, it may be overwritten.
- **`tol`** (`float`, default=`0.0001`): Precision of the solution. Adjust for more precise solutions or faster convergence depending on the needs of the model.
- **`warm_start`** (`bool`, default=`False`): If set to `True`, reuses the solution of the previous call to `fit` as initialization. Useful in scenarios requiring iterative refinement of the model.
- **`positive`** (`bool`, default=`False`): When set to `True`, forces the coefficients to be non-negative. Apply it when the problem domain requires non-negative coefficients.
- **`random_state`** (`int`, `RandomState` instance, default=`None`): The seed of the pseudo-random number generator that selects a random feature to update. It is used only when `selection='random'`. Set it for consistent results across runs.
- **`selection`** (`{'cyclic', 'random'}`, default=`'cyclic'`): The order in which coefficients are updated. With `'random'`, a random coefficient is updated every iteration rather than looping over features sequentially, which often converges faster. Use `'cyclic'` for the standard, deterministic order.

---

## Non-parametric regression
Unlike parametric models, which assume a fixed form for the relationship between input features and the target variable, non-parametric models do not make strong assumptions about the data’s underlying structure. This flexibility allows them to adapt to more complex and varied patterns in the data.

### Decision tree regressor
The decision tree regressor is a non-parametric model that uses a tree structure to predict continuous outcomes. It works by recursively splitting the data into subsets based on feature values, creating decision nodes and leaf nodes, where leaf nodes represent the predicted value. Decision trees are easy to interpret and can capture non-linear relationships in the data.
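With the default `criterion='squared_error'`, each split is chosen to minimize the squared error of the two child nodes, where each child predicts the mean of its targets. The sketch below shows a single such split on one feature. The helper `best_split` is hypothetical, and placing the threshold at the midpoint between neighboring values is an illustrative choice. A full tree repeats this search over all candidate features (or `max_features` of them). It then recurses into the children until a stopping rule such as `max_depth` or `min_samples_split` applies.

```python
import numpy as np

def best_split(x, y):
    """Find the threshold on a single feature that minimizes the summed
    squared error of the two child nodes (each child predicts its mean).
    """
    order = np.argsort(x)
    x_sorted, y_sorted = x[order], y[order]
    best_threshold, best_sse = None, np.inf
    for i in range(1, len(x_sorted)):
        if x_sorted[i] == x_sorted[i - 1]:
            continue  # cannot separate identical feature values
        left, right = y_sorted[:i], y_sorted[i:]
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if sse < best_sse:
            # Place the threshold halfway between the neighboring feature values
            best_threshold = (x_sorted[i - 1] + x_sorted[i]) / 2
            best_sse = sse
    return best_threshold, best_sse

x = np.array([1.0, 2.0, 3.0, 10.0, 11.0, 12.0])
y = np.array([5.0, 6.0, 5.5, 20.0, 21.0, 19.5])
threshold, sse = best_split(x, y)
print(threshold, sse)
```

Here the best split separates the two clusters at 6.5. The left child predicts 5.5, and the right child predicts the mean of 20.0, 21.0, and 19.5.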

In [7]:
# Build and fit the decision tree regressor model
decision_tree_model = DecisionTreeRegressor(random_state=42)
decision_tree_model.fit(X_train, y_train)

# Save the model
with open('decision_tree_regression_model.pkl', 'wb') as f:
    pickle.dump(decision_tree_model, f)

# Load the model
with open('decision_tree_regression_model.pkl', 'rb') as f:
    loaded_decision_tree_model = pickle.load(f)

# Make predictions
y_pred_decision_tree = loaded_decision_tree_model.predict(X_test)

# Display the results
mse = mean_squared_error(y_test, y_pred_decision_tree)
mae = mean_absolute_error(y_test, y_pred_decision_tree)
r2 = r2_score(y_test, y_pred_decision_tree)
print(f"Decision tree regressor performance metrics:")
print(f"{'-'*40}")
print(f"Mean squared error (MSE): {mse:.4f}")
print(f"Mean absolute error (MAE): {mae:.4f}")
print(f"R-squared score: {r2:.4f}\n")

Decision tree regressor performance metrics:
----------------------------------------
Mean squared error (MSE): 0.4952
Mean absolute error (MAE): 0.4547
R-squared score: 0.6221



**Explanation:**

**Syntax:**
```python
DecisionTreeRegressor(*, criterion='squared_error', splitter='best', max_depth=None, min_samples_split=2, min_samples_leaf=1, min_weight_fraction_leaf=0.0, max_features=None, random_state=None, max_leaf_nodes=None, min_impurity_decrease=0.0, ccp_alpha=0.0)
```

- **`criterion`** (`{'squared_error', 'friedman_mse', 'absolute_error', 'poisson'}`, default=`'squared_error'`): The function to measure the quality of a split. `'squared_error'` is the default for regression and measures variance.
- **`splitter`** (`{'best', 'random'}`, default=`'best'`): The strategy used to choose the split at each node. `'best'` selects the best split, while `'random'` selects a random split.
- **`max_depth`** (`int`, default=`None`): The maximum depth of the tree. If `None`, nodes are expanded until all leaves are pure or contain fewer than `min_samples_split` samples. Limiting the depth helps prevent overfitting.
- **`min_samples_split`** (`int` or `float`, default=`2`): The minimum number of samples required to split an internal node. Higher values help prevent overfitting.
- **`min_samples_leaf`** (`int` or `float`, default=`1`): The minimum number of samples required to be at a leaf node.
- **`max_features`** (`int`, `float`, `{'sqrt', 'log2'}`, default=`None`): The number of features to consider when looking for the best split. `None` uses all features.
- **`random_state`** (`int`, `RandomState` instance, default=`None`): Controls the randomness of the estimator. Setting this ensures reproducibility.
- **`ccp_alpha`** (`non-negative float`, default=`0.0`): Complexity parameter used for Minimal Cost-Complexity Pruning.


### Support vector regressor (SVR)
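SVR is an extension of support vector machines (SVMs) to regression tasks. It looks for a function that deviates from each observed target by no more than a margin (`epsilon`) while being as flat as possible. Errors larger than `epsilon` are allowed but penalized, and `C` controls the trade-off. With kernels such as RBF, SVR can model non-linear relationships, and it can work well in high-dimensional feature spaces. It is, however, sensitive to feature scaling. The features here are not standardized (for example, `Population` is in the thousands while `MedInc` is in single digits), which is the likely reason for the negative R-squared score below. Kernel SVR also scales poorly to large training sets, because fitting time grows more than quadratically with the number of samples.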
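SVR's tolerance for small errors comes from the epsilon-insensitive loss, $\max(0, |y - f(x)| - \varepsilon)$. Errors inside the epsilon-tube cost nothing, and larger ones grow linearly. In the standard primal formulation, SVR minimizes $\frac{1}{2}\|w\|^2 + C\sum_i \max(0, |y_i - f(x_i)| - \varepsilon)$, which is where the `C` and `epsilon` parameters enter. The hypothetical helper below only evaluates the loss on a few toy predictions and does not fit a model.

```python
import numpy as np

def epsilon_insensitive_loss(y_true, y_pred, epsilon=0.1):
    """Per-sample loss max(0, |y_true - y_pred| - epsilon).

    Errors inside the epsilon-tube cost nothing; larger errors grow linearly.
    """
    return np.maximum(0.0, np.abs(y_true - y_pred) - epsilon)

y_true = np.array([1.0, 1.0, 1.0, 1.0])
y_pred = np.array([1.05, 0.92, 1.3, 0.5])
print(epsilon_insensitive_loss(y_true, y_pred, epsilon=0.1))
```

The first two predictions fall inside the tube and incur zero loss. The last two are penalized by how far they lie beyond `epsilon` (about 0.2 and 0.4).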

In [8]:
# Build and fit the support vector regressor model
svr_model = SVR(kernel='rbf', C=1.0, epsilon=0.1)
svr_model.fit(X_train, y_train)

# Save the model
with open('svr_regression_model.pkl', 'wb') as f:
    pickle.dump(svr_model, f)

# Load the model
with open('svr_regression_model.pkl', 'rb') as f:
    loaded_svr_model = pickle.load(f)

# Make predictions
y_pred_svr = loaded_svr_model.predict(X_test)

# Display the results
mse = mean_squared_error(y_test, y_pred_svr)
mae = mean_absolute_error(y_test, y_pred_svr)
r2 = r2_score(y_test, y_pred_svr)
print(f"Support vector regressor performance metrics:")
print(f"{'-'*40}")
print(f"Mean squared error (MSE): {mse:.4f}")
print(f"Mean absolute error (MAE): {mae:.4f}")
print(f"R-squared score: {r2:.4f}\n")

Support vector regressor performance metrics:
----------------------------------------
Mean squared error (MSE): 1.3320
Mean absolute error (MAE): 0.8600
R-squared score: -0.0165



**Explanation:**

**Syntax:**
```python
SVR(*, kernel='rbf', degree=3, gamma='scale', coef0=0.0, tol=0.001, C=1.0, epsilon=0.1, shrinking=True, cache_size=200, verbose=False, max_iter=-1)
```

- **`kernel`** (`{'linear', 'poly', 'rbf', 'sigmoid', 'precomputed'}`, default=`'rbf'`): The kernel function defines the transformation of data. `'rbf'` (Radial Basis Function) is the default and works well in most cases.
- **`degree`** (`int`, default=`3`): Degree of the polynomial kernel function ('poly'). Ignored by all other kernels.
- **`gamma`** (`{'scale', 'auto'}` or `float`, default=`'scale'`): Kernel coefficient for `'rbf'`, `'poly'`, and `'sigmoid'`. `'scale'` uses `1 / (n_features * X.var())`, and `'auto'` uses `1 / n_features`.
- **`C`** (`float`, default=`1.0`): Regularization parameter. The strength of the regularization is inversely proportional to `C`. A smaller `C` increases regularization.
- **`epsilon`** (`float`, default=`0.1`): Defines a margin of tolerance where no penalty is given to errors.
- **`shrinking`** (`bool`, default=`True`): Whether to use the shrinking heuristic, which can speed up computation.
- **`tol`** (`float`, default=`0.001`): Tolerance for stopping criterion.
- **`max_iter`** (`int`, default=`-1`): The maximum number of iterations. `-1` means no limit.
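The SVR scores above were computed on unscaled features. SVR with an RBF kernel is sensitive to feature scale, and so are the KNN and MLP models later in this notebook. A common remedy is to standardize each feature using statistics computed on the training set only, so that no information from the test set leaks into training. The NumPy sketch below illustrates the idea with a hypothetical helper, `standardize`, and synthetic data whose two features loosely mimic the very different ranges of `MedInc` and `Population`.

```python
import numpy as np

def standardize(X_train, X_test):
    """Scale features to zero mean and unit variance using training statistics only.

    Computing mean/std on X_train alone keeps test-set information out of training.
    """
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0)
    std[std == 0] = 1.0  # leave constant features unscaled instead of dividing by zero
    return (X_train - mean) / std, (X_test - mean) / std

# Two features on very different scales
rng = np.random.default_rng(0)
X_train = np.column_stack([rng.normal(4, 2, 100), rng.normal(1500, 1000, 100)])
X_test = np.column_stack([rng.normal(4, 2, 20), rng.normal(1500, 1000, 20)])

X_train_s, X_test_s = standardize(X_train, X_test)
print(X_train_s.mean(axis=0).round(6), X_train_s.std(axis=0).round(6))
```

In Scikit-learn, the usual equivalent is `make_pipeline(StandardScaler(), SVR())`, with `make_pipeline` from `sklearn.pipeline` and `StandardScaler` from `sklearn.preprocessing`. Calling `fit` on the pipeline fits the scaler on the training data only. The models in this notebook would need to be rerun with such a pipeline to confirm how much their scores change.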


### K-nearest neighbors regressor (KNN)
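The KNN regressor is a simple, non-parametric model that predicts the target of a new point as the average of the targets of its `k` nearest neighbors in the feature space, optionally weighted by distance. KNN is intuitive and easy to implement, but its performance depends on the choice of `k` and the distance metric. Because it relies on distances, features with large numeric ranges (such as `Population` here) dominate the neighbor search unless the data are scaled.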
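The KNN prediction rule is short enough to write out directly. The brute-force sketch below uses a hypothetical helper, `knn_predict`. It measures Minkowski distances with power `p`, takes the `k` closest training points, and then either averages their targets (`weights='uniform'`) or weights them by inverse distance (`weights='distance'`). These options mirror the meaning of the corresponding `KNeighborsRegressor` parameters. Scikit-learn may use tree structures depending on `algorithm`, whereas this sketch always uses brute force. Its handling of zero distances, which gives exact matches all the weight, is an assumption.

```python
import numpy as np

def knn_predict(X_train, y_train, X_query, k=5, p=2, weights="uniform"):
    """Predict each query point from its k nearest training points.

    Distances use the Minkowski metric with power p (p=2 Euclidean, p=1 Manhattan).
    weights="uniform" averages the neighbors' targets; weights="distance"
    uses inverse-distance weights.
    """
    preds = []
    for q in X_query:
        dist = (np.abs(X_train - q) ** p).sum(axis=1) ** (1.0 / p)
        idx = np.argsort(dist)[:k]          # indices of the k nearest points
        if weights == "uniform":
            preds.append(y_train[idx].mean())
        else:
            d = dist[idx]
            if np.any(d == 0):
                # Exact matches get all the weight to avoid dividing by zero
                preds.append(y_train[idx][d == 0].mean())
            else:
                w = 1.0 / d
                preds.append(np.sum(w * y_train[idx]) / w.sum())
    return np.array(preds)

X_train = np.array([[0.0], [1.0], [2.0], [3.0], [10.0]])
y_train = np.array([0.0, 1.0, 2.0, 3.0, 10.0])
print(knn_predict(X_train, y_train, np.array([[1.4]]), k=3))
print(knn_predict(X_train, y_train, np.array([[1.4]]), k=3, weights="distance"))
```

For the query at 1.4, the three nearest neighbors have targets 1, 2, and 0, so the uniform prediction is their mean, 1.0. Distance weighting pulls the prediction toward the two closest neighbors, giving about 1.195.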

In [9]:
# Build and fit the K-nearest neighbors regressor model
knn_model = KNeighborsRegressor(n_neighbors=5)
knn_model.fit(X_train, y_train)

# Save the model
with open('knn_regression_model.pkl', 'wb') as f:
    pickle.dump(knn_model, f)

# Load the model
with open('knn_regression_model.pkl', 'rb') as f:
    loaded_knn_model = pickle.load(f)

# Make predictions
y_pred_knn = loaded_knn_model.predict(X_test)

# Display the results
mse = mean_squared_error(y_test, y_pred_knn)
mae = mean_absolute_error(y_test, y_pred_knn)
r2 = r2_score(y_test, y_pred_knn)
print(f"K-nearest neighbors regressor performance metrics:")
print(f"{'-'*40}")
print(f"Mean squared error (MSE): {mse:.4f}")
print(f"Mean absolute error (MAE): {mae:.4f}")
print(f"R-squared score: {r2:.4f}\n")

K-nearest neighbors regressor performance metrics:
----------------------------------------
Mean squared error (MSE): 1.1187
Mean absolute error (MAE): 0.8128
R-squared score: 0.1463



**Explanation:**

**Syntax:**
```python
KNeighborsRegressor(*, n_neighbors=5, weights='uniform', algorithm='auto', leaf_size=30, p=2, metric='minkowski', metric_params=None, n_jobs=None)
```

- **`n_neighbors`** (`int`, default=`5`): Number of neighbors to use for prediction. Larger values smooth the predictions but may reduce precision.
- **`weights`** (`{'uniform', 'distance'}`, default=`'uniform'`): Weight function used in prediction. `'uniform'` assigns equal weight to all neighbors, while `'distance'` gives more weight to closer neighbors.
- **`algorithm`** (`{'auto', 'ball_tree', 'kd_tree', 'brute'}`, default=`'auto'`): Algorithm used to compute the nearest neighbors. `'auto'` automatically selects the best option based on the input data.
- **`leaf_size`** (`int`, default=`30`): Leaf size passed to `BallTree` or `KDTree`. It affects the speed of tree construction and queries, as well as the memory needed to store the tree. The best value depends on the problem.
- **`p`** (`int`, default=`2`): Power parameter for the Minkowski metric. `p=1` corresponds to the Manhattan distance and `p=2` to the Euclidean distance.
- **`metric`** (`string` or callable, default=`'minkowski'`): The distance metric to use. `'minkowski'` is a generalization of both the Euclidean and Manhattan distances.
- **`n_jobs`** (`int`, default=`None`): The number of parallel jobs to run for neighbors search. `None` means 1, and `-1` means using all processors.

---


## Robust regression
In robust regression, the goal is to create a model that can perform well even when the data contains outliers or noise. 

### Theil-Sen regressor
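The Theil-Sen regressor is a robust linear model. In simple (one-feature) regression, it estimates the slope as the median of the slopes between all pairs of points. Scikit-learn's `TheilSenRegressor` generalizes this to several features by computing least-squares solutions on many subsets of the samples and taking their spatial median. A median is insensitive to a minority of extreme values, so Theil-Sen is less affected by outliers than ordinary least squares regression, making it a good choice when the data contain significant outliers. It is, however, more expensive to fit than OLS, since the number of possible subsets grows rapidly with the number of samples.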
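For a single feature, the Theil-Sen estimate is easy to compute by hand. The sketch below uses a hypothetical helper, `theil_sen_1d`, which takes the median of all pairwise slopes. For the intercept, it uses the median of $y - \text{slope}\cdot x$, which is one common convention and not necessarily what `TheilSenRegressor` does. The sketch compares the result with an OLS line from `np.polyfit` on data containing one gross outlier.

```python
import numpy as np
from itertools import combinations

def theil_sen_1d(x, y):
    """Simple-regression Theil-Sen: median of pairwise slopes.

    The intercept is the median of y - slope * x (one common convention).
    Pairs with equal x values are skipped to avoid division by zero.
    """
    slopes = [(y[j] - y[i]) / (x[j] - x[i])
              for i, j in combinations(range(len(x)), 2) if x[j] != x[i]]
    slope = np.median(slopes)
    intercept = np.median(y - slope * x)
    return slope, intercept

# Points on y = 2x + 1, with one gross outlier
x = np.arange(10, dtype=float)
y = 2 * x + 1
y[9] = 100.0  # the true value would be 19

ts_slope, ts_intercept = theil_sen_1d(x, y)
ols_slope, ols_intercept = np.polyfit(x, y, deg=1)  # returns [slope, intercept]
print(f"Theil-Sen: slope={ts_slope:.2f}, intercept={ts_intercept:.2f}")
print(f"OLS:       slope={ols_slope:.2f}, intercept={ols_intercept:.2f}")
```

Only 9 of the 45 pairwise slopes involve the outlier, so the median slope stays at exactly 2 and the intercept at 1. The single corrupted point pulls the OLS slope far above 2.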

In [10]:
# Build and fit the Theil-Sen regressor model
theil_sen_model = TheilSenRegressor(random_state=42)
theil_sen_model.fit(X_train, y_train)

# Save the model
with open('theil_sen_regression_model.pkl', 'wb') as f:
    pickle.dump(theil_sen_model, f)

# Load the model
with open('theil_sen_regression_model.pkl', 'rb') as f:
    loaded_theil_sen_model = pickle.load(f)

# Make predictions
y_pred_theil_sen = loaded_theil_sen_model.predict(X_test)

# Display the results
mse = mean_squared_error(y_test, y_pred_theil_sen)
mae = mean_absolute_error(y_test, y_pred_theil_sen)
r2 = r2_score(y_test, y_pred_theil_sen)
print(f"Theil-Sen regressor performance metrics:")
print(f"{'-'*40}")
print(f"Mean squared error (MSE): {mse:.4f}")
print(f"Mean absolute error (MAE): {mae:.4f}")
print(f"R-squared score: {r2:.4f}\n")

Theil-Sen regressor performance metrics:
----------------------------------------
Mean squared error (MSE): 1.0773
Mean absolute error (MAE): 0.5292
R-squared score: 0.1779



**Explanation:**

**Syntax:**
```python
TheilSenRegressor(*, fit_intercept=True, copy_X=True, max_subpopulation=10000, n_subsamples=None, max_iter=300, tol=1e-3, random_state=None, n_jobs=None, verbose=False)
```

- **`fit_intercept`** (`bool`, default=`True`): Whether to calculate the intercept for this model. If `False`, no intercept will be used.
- **`copy_X`** (`bool`, default=`True`): If `True`, the data `X` will be copied; otherwise, it may be overwritten.
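- **`max_subpopulation`** (`int`, default=`10000`): Fitting on every subset of size `n_subsamples` can require an extremely large number of least-squares solutions. When that number exceeds `max_subpopulation`, only a random subpopulation of at most this size is used.
- **`n_subsamples`** (`int`, default=`None`): Number of samples used to compute each least-squares solution. It must be at least the number of features (plus 1 if `fit_intercept=True`) and at most the number of samples. Lower values give a higher breakdown point (more robustness) but lower efficiency, and higher values do the opposite. `None` uses the minimum number, which gives maximal robustness.
- **`max_iter`** (`int`, default=`300`): Maximum number of iterations when computing the spatial median.
- **`tol`** (`float`, default=`1e-3`): Tolerance when computing the spatial median.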
- **`random_state`** (`int`, `RandomState` instance, default=`None`): Seed for reproducibility.
- **`n_jobs`** (`int`, default=`None`): The number of parallel jobs to run. `None` means 1, and `-1` means using all processors.


---


## Multi-layer perceptron regressor
The MLP regressor is a type of feedforward artificial neural network designed for regression tasks. It consists of multiple layers of interconnected nodes (neurons), where each node in a layer is connected to every node in the previous and next layers. By learning non-linear functions from the data, the MLP regressor can model complex relationships between inputs and outputs. Like SVR and KNN, it is sensitive to feature scaling, and its gradient-based training usually converges better on standardized inputs.
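To show what training an MLP involves, here is a minimal NumPy sketch of a network with one ReLU hidden layer and a linear output unit, trained with full-batch gradient descent on mean squared error. It is not `MLPRegressor`'s implementation. It uses plain gradient descent instead of the default `adam` solver and has no L2 penalty (`alpha`), minibatches, or early stopping. The helper `train_mlp`, the layer size, learning rate, epoch count, and the toy target $y = x^2$ (which no linear model can fit) are all illustrative choices.

```python
import numpy as np

def train_mlp(X, y, n_hidden=16, lr=0.01, n_epochs=2000, seed=0):
    """Train a one-hidden-layer MLP (ReLU hidden units, linear output) with
    full-batch gradient descent on mean squared error; returns per-epoch losses.
    """
    rng = np.random.default_rng(seed)
    n_samples, n_features = X.shape
    # He-style initialization for the ReLU layer, smaller weights for the output layer
    W1 = rng.normal(scale=np.sqrt(2.0 / n_features), size=(n_features, n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(scale=np.sqrt(1.0 / n_hidden), size=(n_hidden, 1))
    b2 = np.zeros(1)
    y = y.reshape(-1, 1)
    losses = []
    for _ in range(n_epochs):
        # Forward pass
        z1 = X @ W1 + b1
        h = np.maximum(z1, 0.0)        # ReLU activation
        y_hat = h @ W2 + b2            # identity output for regression
        err = y_hat - y
        losses.append(float(np.mean(err ** 2)))
        # Backward pass: gradients of the mean squared error
        d_yhat = 2.0 * err / n_samples
        dW2 = h.T @ d_yhat
        db2 = d_yhat.sum(axis=0)
        d_z1 = (d_yhat @ W2.T) * (z1 > 0)   # ReLU passes gradient only where z1 > 0
        dW1 = X.T @ d_z1
        db1 = d_z1.sum(axis=0)
        # Gradient descent step
        W1 -= lr * dW1
        b1 -= lr * db1
        W2 -= lr * dW2
        b2 -= lr * db2
    return losses

# Toy non-linear target: y = x^2 on [-2, 2]
rng = np.random.default_rng(1)
X = rng.uniform(-2, 2, size=(200, 1))
y = X[:, 0] ** 2
losses = train_mlp(X, y)
print(f"MSE before training: {losses[0]:.3f}, after training: {losses[-1]:.3f}")
```

The hidden ReLU units let the network bend its output into a piecewise-linear approximation of the parabola, which a single linear layer could not do.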

In [11]:
# Build and fit the MLP regressor model
mlp_model = MLPRegressor(hidden_layer_sizes=(100,), max_iter=1000, random_state=42)
mlp_model.fit(X_train, y_train)

# Save the model
with open('mlp_regressor_model.pkl', 'wb') as f:
    pickle.dump(mlp_model, f)

# Load the model
with open('mlp_regressor_model.pkl', 'rb') as f:
    loaded_mlp_model = pickle.load(f)

# Make predictions
y_pred_mlp = loaded_mlp_model.predict(X_test)

# Display the results
mse = mean_squared_error(y_test, y_pred_mlp)
mae = mean_absolute_error(y_test, y_pred_mlp)
r2 = r2_score(y_test, y_pred_mlp)
print(f"MLP regression performance metrics:")
print(f"{'-'*40}")
print(f"Mean squared error (MSE): {mse:.4f}")
print(f"Mean absolute error (MAE): {mae:.4f}")
print(f"R-squared score: {r2:.4f}\n")

MLP regression performance metrics:
----------------------------------------
Mean squared error (MSE): 0.7129
Mean absolute error (MAE): 0.6506
R-squared score: 0.4560



***Explanation***

**Syntax**: 
```python
MLPRegressor(hidden_layer_sizes=(100,), activation='relu', *, solver='adam', alpha=0.0001, batch_size='auto', learning_rate='constant', learning_rate_init=0.001, power_t=0.5, max_iter=200, shuffle=True, random_state=None, tol=0.0001, verbose=False, warm_start=False, momentum=0.9, nesterovs_momentum=True, early_stopping=False, validation_fraction=0.1, beta_1=0.9, beta_2=0.999, epsilon=1e-08, n_iter_no_change=10, max_fun=15000)
```
- `*`: Indicates that all parameters following `*` must be passed as keyword arguments. `hidden_layer_sizes` and `activation` come before `*`, so they can also be passed positionally.
- **`hidden_layer_sizes`** (`tuple`, default=`(100,)`): Defines the number of neurons in each hidden layer. The default is a single hidden layer with 100 neurons. Adjust based on the complexity of the problem. More layers and neurons allow the model to capture more complex patterns, but may require more data and computational power.
- **`activation`** (`{'identity', 'logistic', 'tanh', 'relu'}`, default=`'relu'`): Activation function for the hidden layer.
    - `identity`: No-op activation, `f(x) = x`. With it, the whole network can only represent a linear function of the inputs.
    - `logistic`: Sigmoid function, maps input to [0, 1], useful for probabilistic interpretation.
    - `tanh`: Hyperbolic tangent function, maps input to [-1, 1], useful for data centered around zero.
    - `relu`: Rectified Linear Unit, outputs zero for negative inputs and linear for positive. Commonly used due to its simplicity. 
- **`solver`** (`{'lbfgs', 'sgd', 'adam'}`, default=`'adam'`): The algorithm used for optimizing the weights.
    - `lbfgs`: An optimizer in the family of quasi-Newton methods, suitable for smaller datasets or when convergence is a priority.
    - `sgd`: Stochastic gradient descent, good for large datasets where speed is critical.
    - `adam`: An adaptive learning rate optimizer, efficient and widely used.
- **`alpha`** (`float`, default=`0.0001`): L2 penalty (regularization term) parameter to prevent overfitting by discouraging large weights. Increase `alpha` if the model is overfitting. Adjust based on the model's performance on validation data.
- **`batch_size`** (`int` or `'auto'`, default=`'auto'`): Size of minibatches for stochastic optimizers (number of samples per gradient update). `'auto'` uses `min(200, n_samples)` as the batch size. It is not used when `solver='lbfgs'`, which works on the full dataset. Set a value manually to trade off gradient noise against the cost of each update.
- **`learning_rate`** (`{'constant', 'invscaling', 'adaptive'}`, default=`'constant'`): Learning rate schedule for weight updates. It is used only when `solver='sgd'`.
    - `constant`: The learning rate stays fixed at `learning_rate_init`.
    - `invscaling`: The learning rate decreases at each time step `t` as `learning_rate_init / t**power_t`.
    - `adaptive`: The learning rate stays at `learning_rate_init` as long as the training loss keeps decreasing. Each time two consecutive epochs fail to improve by at least `tol`, the current learning rate is divided by 5.
- **`learning_rate_init`** (`float`, default=`0.001`): The initial learning rate used for weight updates; only used when `solver` is `'sgd'` or `'adam'`. Increase it if the model learns too slowly, and decrease it if training is unstable or the loss diverges.
- **`power_t`** (`float`, default=`0.5`): The exponent for the inverse scaling learning rate. It is used only when `solver='sgd'` and `learning_rate='invscaling'`.
- **`max_iter`** (`int`, default=`200`): Maximum number of iterations. For the stochastic solvers (`'sgd'`, `'adam'`), this is the number of epochs (passes over the data), not the number of gradient steps. Increase it if the model hasn't converged.
- **`shuffle`** (`bool`, default=`True`): Whether to shuffle the training data before each iteration; only used when `solver` is `'sgd'` or `'adam'`. Shuffling generally helps prevent cycles during training.
- **`random_state`** (`int`, `RandomState` instance, default=`None`): The seed of the pseudo-random number generator to use when shuffling the data. Set this for reproducible results, especially important when comparing different models or running experiments.
- **`tol`** (`float`, default=`0.0001`): Tolerance for the stopping criterion. Decrease `tol` for higher precision or increase for faster training.
- **`verbose`** (`bool`, default=`False`): If `True`, prints progress messages during training. Set `True` if we want to monitor the training process, particularly useful during debugging or long training sessions.
- **`warm_start`** (`bool`, default=`False`): Reuse the solution of the previous call to `fit` as initialization, otherwise, just erase the previous solution. Set `True` when performing iterative updates to the model without starting from scratch.
- **`momentum`** (`float`, default=`0.9`): Momentum for gradient descent updates, between 0 and 1; only used when `solver='sgd'`. Higher values can speed up convergence but may cause overshooting.
- **`nesterovs_momentum`** (`bool`, default=`True`): Whether to use Nesterov's momentum; only used when `solver='sgd'` and `momentum > 0`. It often improves convergence.
- **`early_stopping`** (`bool`, default=`False`): Whether to stop training when the validation score stops improving. If `True`, a fraction `validation_fraction` of the training data is set aside as a validation set. Training stops when the validation score fails to improve by at least `tol` for `n_iter_no_change` consecutive epochs. Only effective when `solver` is `'sgd'` or `'adam'`. Useful for avoiding overfitting and reducing training time.
- **`validation_fraction`** (`float`, default=`0.1`): The proportion of training data to set aside as the validation set for early stopping; must be between 0 and 1. Only used if `early_stopping=True`.
- **`beta_1`** (`float`, default=`0.9`): Exponential decay rate for estimates of first-moment vector in `adam`, should be in [0, 1). Typically left at default, but can be tuned if `adam` convergence is an issue.
- **`beta_2`** (`float`, default=`0.999`): Exponential decay rate for estimates of second-moment vector in `adam`, should be in [0, 1). Like `beta_1`, usually left at default unless specific issues arise with `adam`.
- **`epsilon`** (`float`, default=`1e-08`): Value for numerical stability in `adam`. Adjust if `adam` has convergence issues, but generally leave it at default.
- **`n_iter_no_change`** (`int`, default=`10`): Maximum number of epochs without at least `tol` improvement before training stops. Only effective when `solver` is `'sgd'` or `'adam'`.
- **`max_fun`** (`int`, default=`15000`): Only used when `solver='lbfgs'`. Maximum number of function calls. Increase if `lbfgs` does not converge.
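With `solver='sgd'` and `learning_rate='invscaling'`, the Scikit-learn documentation describes the effective learning rate at time step $t$ as `learning_rate_init / t**power_t`. The hypothetical helper below evaluates that formula for a few steps with the defaults `learning_rate_init=0.001` and `power_t=0.5`. With these defaults, the step size shrinks by a factor of 10 every time $t$ grows by a factor of 100.

```python
import numpy as np

def invscaling_lr(learning_rate_init, t, power_t=0.5):
    """Effective learning rate under the 'invscaling' schedule:
    learning_rate_init / t ** power_t, for time steps t >= 1."""
    return learning_rate_init / np.power(t, power_t)

for t in [1, 4, 100, 10_000]:
    print(t, invscaling_lr(0.001, t))
```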