1. A minimal, interview-focused implementation of **Linear Regression** in Python using scikit-learn. It uses the California Housing dataset (the Boston Housing dataset was removed from scikit-learn in version 1.2) and covers loading, standardization, splitting, training, evaluation, and optional cross-validation.

In [1]:
import pandas as pd
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import r2_score, mean_squared_error

# 1. load dataset
data = fetch_california_housing()
X = pd.DataFrame(data.data, columns=data.feature_names)
y = pd.Series(data.target)

# 2. Standardize features
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# 3. Split the data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X_scaled, y, test_size=0.2, random_state=42)

# 4. Train the model
model = LinearRegression()
model.fit(X_train, y_train)

# 5. Evaluate the model
y_pred = model.predict(X_test)
print("MSE: ", mean_squared_error(y_test, y_pred))
print("R2 score: ", r2_score(y_test, y_pred))

# 6. Cross-validation
scores = cross_val_score(model, X_scaled, y, cv=5, scoring='r2')
print("Cross Validation R2:", scores.mean())

MSE:  0.555891598695244
R2 score:  0.5757877060324511
Cross Validation R2: 0.5530311140279559
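In the cell above the scaler is fit on the full dataset before splitting, so statistics from the test rows leak into training. For plain least squares this does not change the predictions (an affine rescaling of the features is absorbed by the coefficients and intercept), but for regularized models it can. One common fix is to put the scaler inside a Pipeline so it is refit on each training split. This is a minimal sketch that assumes synthetic data from make_regression in place of California Housing, so it runs offline.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Assumption: synthetic stand-in for California Housing (no download needed)
X, y = make_regression(n_samples=500, n_features=8, noise=10.0, random_state=0)

# Split first, so the test rows never influence the scaler
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# The pipeline fits StandardScaler on X_train only, then fits the regression
pipe = make_pipeline(StandardScaler(), LinearRegression())
pipe.fit(X_train, y_train)
test_r2 = r2_score(y_test, pipe.predict(X_test))

# cross_val_score refits the whole pipeline (scaler included) on each training fold
cv_r2 = cross_val_score(pipe, X, y, cv=5, scoring="r2").mean()

print("Test R2:", test_r2)
print("Cross-validated R2:", cv_r2)
```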


2. A minimal yet complete implementation of **Polynomial Regression** using scikit-learn, with concise code suited for quick interview practice.

In [7]:
import numpy as np
import pandas as pd
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.preprocessing import PolynomialFeatures, StandardScaler
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score, mean_squared_error

# 1. Generate synthetic dataset suitable for polynomial regression
X, y = make_regression(n_samples=200, n_features=1, noise=20, random_state=1)
y = y + 0.5 * (X[:, 0] ** 2)  # add a quadratic term so a degree-2 fit has something to capture

# 2. Standardize the feature
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# 3. Transform to polynomial features
poly = PolynomialFeatures(degree=2)
X_poly = poly.fit_transform(X_scaled)

# 4. Split the polynomial features (not the raw X) into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X_poly, y, test_size=0.2, random_state=42)

# 5. Train the model
model = LinearRegression()
model.fit(X_train, y_train)

# 6. Evaluate the model
y_pred = model.predict(X_test)
print("MSE: ", mean_squared_error(y_test, y_pred))
print("R2 score: ", r2_score(y_test, y_pred))

# 7. Cross-validation
cv_score = cross_val_score(model, X_poly, y, cv=5, scoring='r2')
print("Cross-validated R2: ", cv_score.mean())



Cross-validated R2:  0.9279007481463681
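Two things the cell above relies on can be checked directly: what PolynomialFeatures produces for a single feature, and whether the extra degree actually helps. This sketch uses hypothetical data with a strong quadratic term (not the make_regression data above) and places the expansion inside a pipeline, one common way to apply it consistently in every cross-validation fold.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# For one feature x, degree=2 expands each row to [1, x, x^2]
demo = PolynomialFeatures(degree=2).fit_transform(np.array([[2.0], [3.0]]))
print(demo)  # rows: [1, 2, 4] and [1, 3, 9]

# Hypothetical data with a strong quadratic term: y = 3x^2 + x + noise
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = 3 * X[:, 0] ** 2 + X[:, 0] + rng.normal(0, 1, size=200)

# Compare cross-validated R2 for several degrees
scores = {}
for degree in (1, 2, 3):
    pipe = make_pipeline(PolynomialFeatures(degree=degree), LinearRegression())
    scores[degree] = cross_val_score(pipe, X, y, cv=5, scoring="r2").mean()
    print(f"degree={degree}: CV R2 = {scores[degree]:.3f}")
```

On data like this, a straight line explains little of the variance, while degree 2 should capture almost all of it; degree 3 adds a term the data does not need.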


3. A minimal yet complete implementation of **Ridge Regression** for quick interview practice. It uses the California Housing dataset and covers model training, evaluation, and cross-validation, followed by optional hyperparameter tuning with GridSearchCV.

In [11]:
import pandas as pd
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error, r2_score

# 1. Load dataset
data = fetch_california_housing()
X = pd.DataFrame(data.data, columns=data.feature_names)
y = pd.Series(data.target)

# 2. Standardize features
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# 3. Split the scaled features into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X_scaled, y, test_size=0.2, random_state=42)

# 4. Train model
model = Ridge(alpha=1.0)
model.fit(X_train, y_train)

# 5. Evaluate the model
y_pred = model.predict(X_test)
print("MSE: ", mean_squared_error(y_test, y_pred))
print("R2 score: ", r2_score(y_test, y_pred))

# 6. Cross-validation
cv_scores = cross_val_score(model, X_scaled, y, cv=5, scoring='r2')
print('Cross-validated R2: ', cv_scores.mean())


Cross-validated R2:  0.5530382161908947
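Ridge adds a penalty alpha * ||w||^2 to the least-squares loss, which for a model without an intercept has the closed-form solution w = (X^T X + alpha * I)^(-1) X^T y. This sketch, on hypothetical random data, computes that formula with NumPy and compares it to scikit-learn's Ridge with fit_intercept=False; the solver scikit-learn uses internally may differ, but the coefficients should agree.

```python
import numpy as np
from sklearn.linear_model import Ridge

# Hypothetical data: 100 samples, 3 features, known true weights
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.1, size=100)

alpha = 10.0
# Closed-form ridge: solve (X^T X + alpha * I) w = X^T y (no intercept)
w_closed = np.linalg.solve(X.T @ X + alpha * np.eye(X.shape[1]), X.T @ y)

# Same problem via scikit-learn, intercept disabled to match the formula
model = Ridge(alpha=alpha, fit_intercept=False).fit(X, y)

# Ordinary least squares (alpha = 0) for comparison
w_ols = np.linalg.lstsq(X, y, rcond=None)[0]

print("closed form:", w_closed)
print("sklearn:    ", model.coef_)
print("OLS:        ", w_ols)
```

The penalty shrinks the coefficient vector toward zero, so its norm is smaller than the OLS norm for any alpha > 0.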


In [12]:
from sklearn.model_selection import GridSearchCV

params = {'alpha': [0.01, 0.1, 1, 10, 100]}
grid = GridSearchCV(Ridge(), params, cv=5, scoring='r2')
grid.fit(X_scaled, y)

print("Best alpha: ", grid.best_params_['alpha'])
print("Best Cross-Validated R2: ", grid.best_score_)

Best alpha:  10
Best Cross-Validated R2:  0.5530925208131402
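The grid search above runs on X_scaled, which was standardized using all rows before cross-validation. A sketch of the leakage-free variant puts the scaler and the model in one pipeline and tunes the pipeline. It assumes synthetic make_regression data so it runs offline; make_pipeline names each step after its class in lowercase, so the Ridge parameter is addressed as "ridge__alpha".

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Assumption: synthetic stand-in data
X, y = make_regression(n_samples=300, n_features=8, noise=15.0, random_state=0)

# Step names come from the class names: "standardscaler", "ridge"
pipe = make_pipeline(StandardScaler(), Ridge())
params = {"ridge__alpha": [0.01, 0.1, 1, 10, 100]}

grid = GridSearchCV(pipe, params, cv=5, scoring="r2")
grid.fit(X, y)  # the scaler is refit inside every training fold

print("Best alpha:", grid.best_params_["ridge__alpha"])
print("Best cross-validated R2:", grid.best_score_)
```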


4. A minimal, interview-ready implementation of **Lasso Regression** that follows the same structure:

In [16]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.datasets import fetch_california_housing
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import Lasso
from sklearn.metrics import r2_score, mean_squared_error

# 1. Load dataset
data = fetch_california_housing()
X = pd.DataFrame(data.data, columns=data.feature_names)
y = pd.Series(data.target)

# 2. Scale the data
X_scaled = StandardScaler().fit_transform(X)

# 3. Split the data
X_train, X_test, y_train, y_test = train_test_split(X_scaled, y, test_size=0.2, random_state=42)

# 4. Train the model
# On these standardized features, alpha=1 is strong enough to shrink every
# coefficient to zero, so the model predicts the training mean (R2 about 0);
# the tuning cell below finds a much smaller alpha
model = Lasso(alpha=1)
model.fit(X_train, y_train)

# 5. Evaluate the model
y_pred = model.predict(X_test)
print('MSE: ', mean_squared_error(y_test, y_pred))
print('R2 score: ', r2_score(y_test, y_pred))

# 6. Cross-validation
cv_score = cross_val_score(model, X_scaled, y, cv=5, scoring='r2')
print("Cross-validated R2: ", cv_score.mean())

MSE:  1.3106960720039365
R2 score:  -0.00021908714592466794
Cross-validated R2:  -0.0891728361025553
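The R2 of about 0 above follows from Lasso's optimality condition. scikit-learn's Lasso minimizes (1 / (2n)) * ||y - Xw - b||^2 + alpha * ||w||_1, and with centered data every coefficient is exactly zero once alpha >= max_j |x_j^T (y - mean(y))| / n. For standardized California Housing features this threshold appears to be below 1 (it is driven by the strongest single correlation, MedInc), which is consistent with alpha=1 zeroing everything. This sketch checks the rule on hypothetical random data.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.preprocessing import StandardScaler

# Hypothetical data: 5 standardized features, target depends on the first two
rng = np.random.default_rng(0)
X = StandardScaler().fit_transform(rng.normal(size=(200, 5)))
y = 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(scale=0.5, size=200)

# Smallest alpha at which all Lasso coefficients are zero
# (X is already centered because it was standardized)
n = X.shape[0]
alpha_max = np.max(np.abs(X.T @ (y - y.mean()))) / n

above = Lasso(alpha=1.01 * alpha_max).fit(X, y)  # just above the threshold
below = Lasso(alpha=0.5 * alpha_max).fit(X, y)   # well below it

print("alpha_max:", alpha_max)
print("coef just above alpha_max:", above.coef_)
print("coef at half alpha_max:   ", below.coef_)
```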


In [17]:
from sklearn.model_selection import GridSearchCV

params = {'alpha': [0.001, 0.01, 0.1, 1, 10]}
grid = GridSearchCV(Lasso(max_iter=1000), params, cv=5, scoring='r2')
grid.fit(X_scaled, y)

print("Best alpha: ", grid.best_params_['alpha'])
print("Best cross-validated r2: ", grid.best_score_)

Best alpha:  0.001
Best cross-validated r2:  0.553257237003969
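scikit-learn also provides LassoCV, which searches a list of alphas with internal K-fold cross-validation in a single fit. One difference from the grid search above: LassoCV picks the alpha with the lowest mean squared error across folds rather than the highest R2, so the two can disagree. This is a sketch on synthetic make_regression data (an assumption, for offline use); as in the grid search above, the scaler here is fit on all rows before LassoCV's internal folds.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LassoCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Assumption: synthetic data with only 3 of 8 features informative
X, y = make_regression(n_samples=300, n_features=8, n_informative=3,
                       noise=10.0, random_state=0)

alphas = [0.001, 0.01, 0.1, 1, 10]
pipe = make_pipeline(StandardScaler(), LassoCV(alphas=alphas, cv=5))
pipe.fit(X, y)

lasso = pipe[-1]  # the fitted LassoCV step
print("Chosen alpha:", lasso.alpha_)
print("Non-zero coefficients:", np.count_nonzero(lasso.coef_))
```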


5. A minimal code example that shows feature selection using Lasso Regression, followed by a step-by-step explanation.

In [27]:
import pandas as pd
from sklearn.datasets import fetch_openml
from sklearn.linear_model import Lasso
from sklearn.preprocessing import StandardScaler

# Load Boston Housing from OpenML by name (the dataset has 506 rows)
bunch = fetch_openml(name="boston", version=1, as_frame=True)
boston = bunch.frame.dropna()

# drop() needs columns=...; without it pandas looks for row labels and raises KeyError
# astype(float) converts any columns that load as categoricals (e.g. CHAS, RAD)
X = boston.drop(columns=bunch.target_names).astype(float)
y = boston[bunch.target_names[0]]

X_scaled = StandardScaler().fit_transform(X)

lasso = Lasso()
lasso.fit(X_scaled, y)

# A DataFrame has no feature_names attribute; use the column labels of X
feature_coeffs = pd.Series(lasso.coef_, index=X.columns)
print("Feature Coefficients: \n", feature_coeffs)

selected_features = feature_coeffs[feature_coeffs != 0].index.tolist()
print('Selected Features: ', selected_features)
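scikit-learn wraps the same idea (keep the features whose Lasso coefficient is non-zero) in SelectFromModel. For L1-penalized estimators such as Lasso, its default threshold is 1e-5, so it keeps features whose absolute coefficient is at least that value. This sketch uses hypothetical data where only features 0 and 3 drive the target; alpha=0.1 is an illustrative choice, not a tuned value.

```python
import numpy as np
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import Lasso
from sklearn.preprocessing import StandardScaler

# Hypothetical data: 8 standardized features, only features 0 and 3 matter
rng = np.random.default_rng(0)
X = StandardScaler().fit_transform(rng.normal(size=(200, 8)))
y = 3.0 * X[:, 0] - 2.0 * X[:, 3] + rng.normal(scale=0.3, size=200)

# Fit Lasso inside the selector and keep features with non-negligible coefficients
selector = SelectFromModel(Lasso(alpha=0.1)).fit(X, y)
selected = selector.get_support(indices=True)
print("Selected feature indices:", selected)
print("Lasso coefficients:", selector.estimator_.coef_)

# transform() keeps only the selected columns
X_selected = selector.transform(X)
```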