Two popular Python libraries for linear regression are Scikit-learn and Statsmodels. Each has its own strengths and weaknesses, and this article compares the two specifically for ordinary least squares (OLS) regression.

### What is OLS Regression?

Before diving into the comparison, let’s first define what OLS regression is. OLS regression is a method for estimating the relationship between a dependent variable and one or more independent variables. It assumes that the relationship is linear in the coefficients and that the errors have constant variance. The further assumption that the errors are normally distributed is what justifies the usual t-tests and p-values in small samples.

The goal of OLS regression is to find the coefficients that minimize the sum of the squared residuals, that is, the squared differences between the actual values and the predicted values. Once these coefficients are found, they can be used to make predictions on new data.
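To make the minimization concrete, here is a minimal sketch using NumPy on synthetic data. The data, the "true" coefficients, and the variable names are illustrative assumptions and are not taken from either library's workflow. `np.linalg.lstsq` returns the coefficient vector that minimizes the sum of squared residuals.

```python
import numpy as np

# Synthetic data (illustrative assumption): y = 2 + 3*x1 - 1*x2 + noise
rng = np.random.default_rng(0)
n_samples = 500
X = rng.normal(size=(n_samples, 2))
true_beta = np.array([2.0, 3.0, -1.0])  # intercept, slope for x1, slope for x2
X_design = np.column_stack([np.ones(n_samples), X])  # prepend an intercept column
y = X_design @ true_beta + rng.normal(scale=0.1, size=n_samples)

# Least-squares solution: the beta minimizing ||y - X_design @ beta||^2
beta_hat, _, _, _ = np.linalg.lstsq(X_design, y, rcond=None)

residuals = y - X_design @ beta_hat
sse = float(np.sum(residuals ** 2))  # the quantity OLS minimizes

print("estimated coefficients:", beta_hat)
print("sum of squared residuals:", sse)
```

Because the noise is small, the estimated coefficients should land close to the values used to generate the data. Both libraries discussed below solve this same problem and differ mainly in what they report afterwards.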



### Scikit-learn

Scikit-learn is a popular machine learning library in Python that provides a wide range of tools for data analysis and modeling. It offers a variety of regression models, including OLS regression through its LinearRegression class.

One of the main benefits of using Scikit-learn for OLS regression is its simplicity. The library provides a simple API that makes it easy to fit a linear regression model and make predictions. Here’s an example of how to perform OLS regression using Scikit-learn:

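```python
from sklearn.datasets import fetch_california_housing
from sklearn.linear_model import LinearRegression

# Load the California housing data as NumPy arrays
california = fetch_california_housing()
X = california.data    # feature matrix
y = california.target  # median house value, in units of $100,000

# LinearRegression fits an intercept by default
model = LinearRegression()
model.fit(X, y)

# Predict on the same rows the model was trained on
predictions = model.predict(X)
```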

In this example, we load the California housing dataset and separate it into the feature matrix X and the target variable y. We then create an instance of the LinearRegression class, fit the model to the data, and make predictions on the same data the model was trained on.
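The example above predicts on the same rows it was fitted on, which tends to overstate how well the model generalizes. The sketch below shows one common alternative: hold out part of the data and score on it. It uses a synthetic dataset from `make_regression` as a stand-in (an assumption, chosen so the snippet runs without downloading anything), and the split size and random seeds are arbitrary.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumption): 8 features, like the housing example
X, y = make_regression(n_samples=1000, n_features=8, noise=10.0, random_state=0)

# Hold out 20% of the rows so the score reflects unseen data
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

model = LinearRegression()  # fit_intercept=True by default
model.fit(X_train, y_train)

# Fitted parameters are exposed as attributes after fitting
print("intercept:", model.intercept_)
print("coefficients:", model.coef_)

# R-squared on the held-out rows
test_r2 = r2_score(y_test, model.predict(X_test))
print("test R^2:", test_r2)
```

The same pattern should carry over to the housing data by swapping in `california.data` and `california.target` for the synthetic arrays.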

While Scikit-learn makes it easy to perform OLS regression, it does have some limitations. For example, it doesn’t provide as much statistical information about the model as Statsmodels does: a fitted LinearRegression model exposes its coefficients and an R-squared score, but not standard errors or p-values. Additionally, it has no built-in diagnostic tests for problems such as heteroscedasticity or multicollinearity.



### Statsmodels

Statsmodels is a Python library that provides a wide range of statistical tools for data analysis. It offers a variety of regression models, including OLS regression through its OLS class.

One of the main benefits of using Statsmodels for OLS regression is the amount of statistical information it provides. For example, it provides a summary of the model that includes information such as the R-squared value, standard errors of the coefficients, and p-values. Here’s an example of how to perform OLS regression using Statsmodels:

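```python
import statsmodels.api as sm
from sklearn.datasets import fetch_california_housing

california = fetch_california_housing()
X = california.data
y = california.target

# sm.OLS does not add an intercept on its own, so add a constant column
X = sm.add_constant(X)

# Note the argument order: response first, then the design matrix
model = sm.OLS(y, X)
results = model.fit()

print(results.summary())
```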

                            OLS Regression Results                            
Dep. Variable:                      y   R-squared:                       0.606
Model:                            OLS   Adj. R-squared:                  0.606
Method:                 Least Squares   F-statistic:                     3970.
Date:                Fri, 06 Oct 2023   Prob (F-statistic):               0.00
Time:                        11:35:15   Log-Likelihood:                -22624.
No. Observations:               20640   AIC:                         4.527e+04
Df Residuals:                   20631   BIC:                         4.534e+04
Df Model:                           8                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const        -36.9419      0.659    -56.067      0.0