1. What is Simple Linear Regression?
A statistical method that models the linear relationship between one continuous predictor (X) and one continuous outcome/response (Y):

Y = mX + c + ε
where m is the slope, c is the intercept, and ε is the random error.

2. Key assumptions of Simple Linear Regression
Linearity: The expected value of Y is a straight‑line function of X.

Independence of errors: Residuals are independent across observations.

Homoscedasticity: Constant variance of errors across X.

Normality of errors (for inference): Residuals are normally distributed (mainly needed for valid t/F tests & CIs).

No high‑leverage outliers distorting the line.

X measured without (or with negligible) error in classical OLS settings.



3. What does the coefficient m represent in Y = mX + c?
The expected change in Y for a one‑unit increase in X.



4. What does the intercept c represent in Y = mX + c?
The expected value of Y when X = 0 (if 0 lies within the data range; otherwise it is an extrapolated baseline).

5. How do we calculate the slope m in Simple Linear Regression?
Using the least squares estimate:

$$ m = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2} $$
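The slope formula can be checked numerically. The sketch below uses a small made-up dataset (the values are illustrative, not from any real study) and also computes the intercept as c = ȳ − m·x̄, which follows from the fitted line passing through the point of means.

```python
import numpy as np

# Small made-up dataset, used only for illustration
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

x_bar, y_bar = x.mean(), y.mean()

# Slope: sum of cross-deviations divided by sum of squared x-deviations
m = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
# Intercept: the fitted line passes through (x_bar, y_bar)
c = y_bar - m * x_bar

# Cross-check against NumPy's degree-1 polynomial fit (returns slope, then intercept)
m_np, c_np = np.polyfit(x, y, 1)
print(m, c)
```

Both routes should agree up to floating-point error, since np.polyfit with degree 1 solves the same least squares problem.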



6. Purpose of the least squares method
Choose m and c that minimize the sum of squared residuals:

$$ \min_{m,c} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \min_{m,c} \sum_{i=1}^{n} \bigl(y_i - (m x_i + c)\bigr)^2 $$

This gives the “best‑fitting” line (in the squared‑error sense) and leads to nice statistical properties.
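As a rough illustration of what "minimize" means here, the sketch below fits a line to synthetic data (the true slope 2 and intercept 1 are assumptions chosen for the demo) and confirms that moving either parameter away from the least squares estimate increases the sum of squared residuals.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=50)
y = 2.0 * x + 1.0 + rng.normal(scale=1.5, size=50)  # synthetic data, true m=2, c=1

def ssr(m, c):
    """Sum of squared residuals for the line y_hat = m*x + c."""
    return np.sum((y - (m * x + c)) ** 2)

# Design matrix: one column for x and a column of ones for the intercept
A = np.column_stack([x, np.ones_like(x)])
(m_hat, c_hat), *_ = np.linalg.lstsq(A, y, rcond=None)

best = ssr(m_hat, c_hat)
# Nudging either parameter away from the estimate increases the SSR
perturbed = [ssr(m_hat + dm, c_hat + dc)
             for dm, dc in [(0.1, 0.0), (-0.1, 0.0), (0.0, 0.5), (0.0, -0.5)]]
print(best, min(perturbed))
```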

7. How is the coefficient of determination (R²) interpreted?
The proportion of the total variation in Y (relative to its mean) that is explained by the regression model.

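$$ R^2 = 1 - \frac{SS_{res}}{SS_{tot}} $$

where SS_res = Σ(yᵢ − ŷᵢ)² is the residual sum of squares and SS_tot = Σ(yᵢ − ȳ)² is the total sum of squares.

Ranges 0–1 (can be negative in some adjusted or forced‑through‑origin contexts); higher means the model reduces more variance compared with using just ȳ.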
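A minimal sketch of the R² calculation on made-up numbers follows. It also checks a known property of simple regression with an intercept: R² equals the squared Pearson correlation between X and Y.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])  # made-up values

m, c = np.polyfit(x, y, 1)
y_hat = m * x + c

ss_res = np.sum((y - y_hat) ** 2)      # variation left unexplained by the line
ss_tot = np.sum((y - y.mean()) ** 2)   # variation around the mean of y
r2 = 1 - ss_res / ss_tot

# For simple regression with an intercept, R^2 equals the squared correlation
r = np.corrcoef(x, y)[0, 1]
print(r2, r ** 2)
```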


8. What is Multiple Linear Regression?
Multiple Linear Regression (MLR) is a statistical method used to model the relationship between one dependent variable (Y) and two or more independent variables (X₁, X₂, …, Xₚ). It assumes a linear relationship between the dependent variable and the predictors.

9. Main difference: Simple vs Multiple Linear Regression
Simple: One predictor (X).

Multiple: Two or more predictors; each coefficient reflects the effect of its predictor holding the others constant.

10. Key assumptions of Multiple Linear Regression
The assumptions of simple regression carry over to several predictors, and the last two items below are added:

Linearity & additivity in predictors.

Independence of errors.

Homoscedasticity of errors.

Normality of errors (for inference).

No perfect multicollinearity (predictors not exact linear combos).

Correct model specification (important omitted variables can bias coefficients).

11. What is heteroscedasticity, and why does it matter?
Heteroscedasticity = non‑constant error variance across fitted values or levels of predictors.
Consequences under OLS: coefficient estimates remain unbiased (if the other assumptions hold) but are no longer efficient, and the usual standard errors are biased, which makes hypothesis tests, confidence intervals, and prediction intervals unreliable.

12. How to improve a model with high multicollinearity
Remove or combine highly correlated predictors.

Use domain knowledge to pick the most meaningful variable.

Center or standardize predictors (helps interpretation; doesn’t “fix” collinearity but reduces numerical issues).

Principal Component Regression (PCR) or Partial Least Squares (PLS).

Regularization: Ridge (shrinks correlated coefficients), Lasso (variable selection), Elastic Net (hybrid).

Collect more data with greater variation in predictors.
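Before choosing a remedy it helps to quantify the problem. One common diagnostic, not named in the list above, is the variance inflation factor: VIF_j = 1 / (1 − R_j²), where R_j² comes from regressing predictor j on the other predictors. The sketch below is a plain NumPy version on synthetic data; the helper name `vif` and the data are assumptions for illustration.

```python
import numpy as np

def vif(X):
    """Variance inflation factor for each column of X (n_samples x n_features)."""
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        target = X[:, j]
        # Regress column j on an intercept plus all remaining columns
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef = np.linalg.lstsq(others, target, rcond=None)[0]
        resid = target - others @ coef
        r2 = 1 - resid @ resid / np.sum((target - target.mean()) ** 2)
        out[j] = 1.0 / (1.0 - r2)
    return out

rng = np.random.default_rng(1)
x1 = rng.normal(size=200)
x2 = x1 + rng.normal(scale=0.1, size=200)   # nearly a copy of x1
x3 = rng.normal(size=200)                   # unrelated predictor

vifs = vif(np.column_stack([x1, x2, x3]))
vifs_reduced = vif(np.column_stack([x1, x3]))  # after removing x2
print(vifs, vifs_reduced)
```

In this setup x1 and x2 get very large VIFs while x3 stays near 1, and removing one of the correlated pair brings the remaining values back toward 1.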



13. Transforming categorical variables for regression
Common encodings:

Dummy / One‑Hot Encoding: 0/1 indicator variables (k‑1 dummies for k categories to avoid the dummy trap).

Effect / Deviation Coding: Compare each level to the overall mean instead of a baseline level.

Ordinal Coding: Use meaningful numeric scores when category order matters.

Target / Mean Encoding: Replace each category with mean Y (risk of leakage; use regularization / cross‑fold).

Binary encoding / Hashing (high‑cardinality cases).

Embeddings (in ML frameworks with large, complex categorical spaces).
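The k‑1 rule for dummy variables can be seen directly in a small example. This sketch builds the indicators by hand with NumPy on a made-up `colors` column (pandas users would typically reach for `pd.get_dummies(..., drop_first=True)` instead) and shows that keeping all k indicators alongside an intercept makes the design matrix rank-deficient.

```python
import numpy as np

colors = np.array(["red", "green", "blue", "green", "red", "blue", "blue"])

# levels is sorted alphabetically; codes maps each row to its level index
levels, codes = np.unique(colors, return_inverse=True)

one_hot = np.eye(len(levels), dtype=int)[codes]  # k indicator columns
dummies = one_hot[:, 1:]                         # drop the first level -> k-1 columns

ones = np.ones((len(colors), 1))
rank_full = np.linalg.matrix_rank(np.hstack([ones, one_hot]))     # intercept + k dummies
rank_reduced = np.linalg.matrix_rank(np.hstack([ones, dummies]))  # intercept + k-1 dummies
print(levels, rank_full, rank_reduced)
```

The dropped level ("blue" here, because it sorts first) becomes the reference category, and each remaining coefficient is read as a difference from it.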





14. Role of interaction terms in Multiple Linear Regression
Interaction terms (e.g., X₁ × X₂) let the effect of one predictor depend on the level of another. Without them, the model assumes additive, independent effects.
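To make this concrete, the sketch below simulates data in which the slope of x1 is 1 + 2·x2 (an assumed ground truth for the demo) and fits a model containing the product term. The estimated effect of x1 is then b1 + b12·x2 rather than a single number.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
# Synthetic truth: the slope of x1 is 1 + 2*x2, so it changes with x2
y = 0.5 + 1.0 * x1 - 1.0 * x2 + 2.0 * x1 * x2 + rng.normal(scale=0.5, size=n)

# Design matrix: intercept, main effects, and the interaction column
X = np.column_stack([np.ones(n), x1, x2, x1 * x2])
b0, b1, b2, b12 = np.linalg.lstsq(X, y, rcond=None)[0]

def slope_of_x1(x2_value):
    """Estimated change in Y per unit of x1 at a given level of x2."""
    return b1 + b12 * x2_value

print(slope_of_x1(0.0), slope_of_x1(1.0))
```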



15. How can the interpretation of the intercept differ: Simple vs Multiple?
Simple: Expected Y when X=0.

Multiple: Expected Y when all predictors are 0 (and at reference levels for categorical variables). This combination may not be realistic; centering predictors (e.g., subtracting their means) can make the intercept represent the expected Y at “average” predictor values.
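The effect of centering can be verified on synthetic data. In the sketch below (the predictors `age` and `income` and their coefficients are hypothetical), centering leaves the slopes unchanged and moves the intercept to the mean of Y, which is the prediction at average predictor values.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200
age = rng.uniform(20, 60, size=n)       # hypothetical predictors
income = rng.uniform(30, 120, size=n)
y = 5.0 + 0.3 * age + 0.1 * income + rng.normal(scale=2.0, size=n)

def fit(X, y):
    """OLS with an intercept; returns (intercept, slopes)."""
    A = np.column_stack([np.ones(len(y)), X])
    beta = np.linalg.lstsq(A, y, rcond=None)[0]
    return beta[0], beta[1:]

X_raw = np.column_stack([age, income])
X_centered = X_raw - X_raw.mean(axis=0)   # subtract each column's mean

b0_raw, slopes_raw = fit(X_raw, y)
b0_cen, slopes_cen = fit(X_centered, y)
print(b0_raw, b0_cen, y.mean())
```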



16. Significance of the slope in regression analysis & its effect on predictions
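A slope coefficient tells how much Y is expected to change per unit change in that predictor (holding others constant in multiple regression), so it directly determines how predictions move as the predictor changes. A statistically significant slope (small p‑value; CI not containing 0) suggests the data are hard to reconcile with a slope of zero; whether the effect is practically meaningful depends on its size relative to the scale of X and Y.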



17. How does the intercept provide context?
It anchors the regression surface: the predicted baseline level of Y when predictors are at their reference (or zero/centered) values. It helps interpret whether predictions for realistic ranges are shifted up or down overall, and is essential for making predictions at any combination of predictors.



18. Limitations of R² as a sole performance measure
Never decreases when predictors are added to an OLS model fit on the same data, even if the new predictors are pure noise.

Doesn’t indicate whether coefficients are unbiased or meaningful.

Doesn’t assess model correctness, residual structure, or predictive accuracy on new data.

Can be high for non‑causal associations.

Sensitive to outliers and to range of Y.

Use it alongside Adjusted R², RMSE, MAE, cross‑validation error, residual diagnostics, etc.

19. Interpreting a large standard error for a coefficient
The estimate of that coefficient is imprecise: the data do not strongly support a specific value. Confidence intervals will be wide, and hypothesis tests may fail to reject 0 even if the point estimate is large. Often caused by multicollinearity, small sample size, or high noise.
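For reference, the classical standard errors are the square roots of the diagonal of σ̂²(XᵀX)⁻¹. The sketch below computes them with NumPy and compares a small, noisy sample with a large, clean one; the function names and simulated data are assumptions for illustration.

```python
import numpy as np

def ols_with_se(X, y):
    """Return OLS coefficients and their classical standard errors.

    X is the design matrix including a column of ones for the intercept.
    """
    n, k = X.shape
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - k)        # unbiased estimate of the error variance
    cov = sigma2 * np.linalg.inv(X.T @ X)   # Var(beta_hat) under the OLS assumptions
    return beta, np.sqrt(np.diag(cov))

rng = np.random.default_rng(4)

def simulate(n, noise):
    """Fit y = 1 + 2x + error on n simulated points with the given noise level."""
    x = rng.uniform(0, 10, size=n)
    y = 1.0 + 2.0 * x + rng.normal(scale=noise, size=n)
    return ols_with_se(np.column_stack([np.ones(n), x]), y)

_, se_small_noisy = simulate(n=20, noise=5.0)
_, se_large_clean = simulate(n=2000, noise=1.0)
print(se_small_noisy[1], se_large_clean[1])   # slope standard errors
```

The slope's standard error shrinks as the sample grows and the noise falls, which is why the same point estimate can be inconclusive in one dataset and precise in another.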



20. Detecting heteroscedasticity in residual plots & why it matters
Visual clues:

Residuals vs fitted values show a funnel (widening) shape, bow‑tie, or pattern in spread.

Residuals vs a predictor show variance changing with predictor level.

Scale‑Location (Spread‑Location) plot: trend in the square root of |standardized residuals|.

Formal tests: Breusch–Pagan, White, Goldfeld–Quandt.

Why address it: Inference (SEs, p‑values, CIs) becomes unreliable; prediction intervals may be too narrow or too wide. Remedies: use heteroscedasticity‑robust (Huber–White) standard errors, transform Y, model variance structure (e.g., weighted least squares), or use generalized least squares.
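As a rough sketch of how a formal test works, the code below implements the studentized (Koenker) form of the Breusch–Pagan statistic by hand: regress the squared OLS residuals on the predictors and take n·R², which is compared with a chi-square distribution whose degrees of freedom equal the number of predictors. The data are simulated and the function name is an assumption; statistical packages provide tested implementations.

```python
import numpy as np

def breusch_pagan_lm(X, y):
    """Studentized (Koenker) Breusch-Pagan LM statistic.

    X is the design matrix including an intercept column. The statistic is
    n * R^2 from regressing the squared OLS residuals on X.
    """
    n = len(y)
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    e2 = (y - X @ beta) ** 2
    gamma = np.linalg.lstsq(X, e2, rcond=None)[0]
    fitted = X @ gamma
    r2 = 1 - np.sum((e2 - fitted) ** 2) / np.sum((e2 - e2.mean()) ** 2)
    return n * r2

rng = np.random.default_rng(5)
n = 500
x = rng.uniform(1, 10, size=n)
X = np.column_stack([np.ones(n), x])

y_homo = 1 + 2 * x + rng.normal(scale=2.0, size=n)        # constant spread
y_hetero = 1 + 2 * x + rng.normal(scale=0.5 * x, size=n)  # spread grows with x

lm_homo = breusch_pagan_lm(X, y_homo)
lm_hetero = breusch_pagan_lm(X, y_hetero)
CHI2_95_DF1 = 3.841  # 95th percentile of chi-square with 1 degree of freedom
print(lm_homo, lm_hetero, CHI2_95_DF1)
```

A statistic well above the critical value, as expected for the funnel-shaped data, is evidence against constant variance.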

21. What does it mean if a Multiple Linear Regression model has a high R² but low Adjusted R²?
High R²: The model explains a large proportion of variance in Y.

Low Adjusted R²: Suggests that the increase in R² is mostly due to adding predictors that do not actually improve model fit significantly. Adjusted R² penalizes unnecessary complexity, so if it's low relative to R², the model likely suffers from overfitting or has many irrelevant predictors.
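The gap between the two measures can be reproduced with Adjusted R² = 1 − (1 − R²)(n − 1)/(n − p − 1), where n is the sample size and p the number of predictors. The sketch below adds 20 pure-noise predictors to a small synthetic dataset (all values are simulated for illustration).

```python
import numpy as np

def r2_and_adjusted(X, y):
    """R^2 and adjusted R^2 for OLS of y on X (X excludes the intercept column)."""
    n, p = X.shape
    A = np.column_stack([np.ones(n), X])
    beta = np.linalg.lstsq(A, y, rcond=None)[0]
    ss_res = np.sum((y - A @ beta) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    r2 = 1 - ss_res / ss_tot
    adj = 1 - (1 - r2) * (n - 1) / (n - p - 1)
    return r2, adj

rng = np.random.default_rng(6)
n = 40
x = rng.normal(size=(n, 1))
y = 1.0 + 0.5 * x[:, 0] + rng.normal(size=n)

noise = rng.normal(size=(n, 20))   # 20 predictors unrelated to y
r2_base, adj_base = r2_and_adjusted(x, y)
r2_big, adj_big = r2_and_adjusted(np.column_stack([x, noise]), y)
print(r2_base, adj_base, r2_big, adj_big)
```

R² rises when the noise columns are added, while the penalty term widens the distance between R² and Adjusted R².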



22. Why is it important to scale variables in Multiple Linear Regression?
Reason: Different predictors can have different scales (e.g., age in years vs. income in thousands). Scaling:

Improves numerical stability during optimization.

Makes coefficients comparable in magnitude (for standardized interpretation).

Important when using regularization (Ridge, Lasso), because penalties depend on coefficient size, which is influenced by variable scale.

Common methods: Standardization (z-score) or Min-Max scaling.
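One point worth checking numerically: for plain OLS, standardizing predictors rescales the coefficients but leaves the fitted values unchanged, so the benefit lies in comparability, conditioning, and regularized models rather than in fit. The sketch below uses hypothetical `age` and `income` columns on very different scales.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 300
age = rng.uniform(20, 60, size=n)               # years
income = rng.uniform(30_000, 120_000, size=n)   # currency units, much larger scale
y = 10 + 0.5 * age + 0.0002 * income + rng.normal(size=n)

X = np.column_stack([age, income])
X_std = (X - X.mean(axis=0)) / X.std(axis=0)    # z-score: mean 0, sd 1 per column

def fit_predict(X, y):
    """OLS with an intercept; returns (coefficients, fitted values)."""
    A = np.column_stack([np.ones(len(y)), X])
    beta = np.linalg.lstsq(A, y, rcond=None)[0]
    return beta, A @ beta

beta_raw, pred_raw = fit_predict(X, y)
beta_std, pred_std = fit_predict(X_std, y)
print(beta_raw[1:], beta_std[1:])
```

Each standardized slope equals the raw slope times that predictor's standard deviation, so it reads as the expected change in Y per one standard deviation of the predictor.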

23. What is Polynomial Regression?
A regression technique where the relationship between X and Y is modeled as an nth-degree polynomial:

$$ Y = \beta_0 + \beta_1 X + \beta_2 X^2 + \dots + \beta_n X^n + \varepsilon $$

24. How does polynomial regression differ from linear regression?
Linear Regression: Model is linear in predictors (straight line).

Polynomial Regression: Adds higher-order terms (X², X³, …) to capture non-linear relationships between X and Y.

Note: Despite the curve, the model is linear in parameters (βs), so it's still solved by OLS.
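The "linear in parameters" point can be shown in a few lines: build the columns 1, x, x² by hand and solve an ordinary least squares problem. The data below are synthetic, with an assumed quadratic ground truth.

```python
import numpy as np

rng = np.random.default_rng(9)
x = rng.uniform(-2, 2, size=100)
y = 1.0 + 0.5 * x - 1.5 * x**2 + rng.normal(scale=0.3, size=100)  # synthetic quadratic

# Columns 1, x, x^2: non-linear in x, but y is a linear combination of the columns
A = np.column_stack([np.ones_like(x), x, x**2])
beta = np.linalg.lstsq(A, y, rcond=None)[0]

# np.polyfit returns the highest-degree coefficient first, so reverse it to compare
beta_polyfit = np.polyfit(x, y, 2)[::-1]
print(beta, beta_polyfit)
```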

25. When is polynomial regression used?
When the relationship between X and Y is non-linear but can be approximated by a polynomial curve.

Common in physics, growth curves, economics, and when residual plots suggest curvature.

Can polynomial regression be applied to multiple variables?
Yes. For example, with two predictors the terms up to degree 2 are:

$$ Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_1^2 + \beta_4 X_1 X_2 + \beta_5 X_2^2 + \dots $$

This is called Multivariate Polynomial Regression. It grows quickly in complexity because of interaction terms.

26. Limitations of polynomial regression
Overfitting risk: High-degree polynomials fit noise.

Extrapolation issues: Predictions outside observed X range can be wildly inaccurate.

Multicollinearity: Higher-order terms correlate strongly with each other.

Interpretability: Harder as degree increases.

Computationally expensive for high-degree and multiple variables.



27. Methods to select polynomial degree (evaluate model fit)
Cross-validation (CV): k-fold CV to pick the degree with lowest error on validation data.

Information Criteria: AIC, BIC penalize complexity.

Adjusted R²: Prefers simpler models if extra terms add little value.

Validation curves (plot error vs. degree).

Regularization: Ridge/Lasso with polynomial features to prevent overfitting.
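A minimal sketch of the cross-validation approach follows, written with NumPy only. The helper `cv_mse`, the fold count, and the synthetic cubic data are assumptions for illustration; scikit-learn offers equivalent tools.

```python
import numpy as np

def cv_mse(x, y, degree, k=5, seed=0):
    """Mean squared validation error of a polynomial of the given degree under k-fold CV."""
    idx = np.random.default_rng(seed).permutation(len(x))
    folds = np.array_split(idx, k)
    errors = []
    for fold in folds:
        train = np.setdiff1d(idx, fold)                # indices not in the held-out fold
        coefs = np.polyfit(x[train], y[train], degree)
        pred = np.polyval(coefs, x[fold])
        errors.append(np.mean((y[fold] - pred) ** 2))
    return float(np.mean(errors))

rng = np.random.default_rng(8)
x = rng.uniform(-3, 3, size=120)
y = 1 - 2 * x + 0.5 * x**3 + rng.normal(scale=1.0, size=120)  # synthetic cubic signal

# Validation error for each candidate degree; the same folds are reused for fairness
scores = {d: cv_mse(x, y, d) for d in range(1, 9)}
best_degree = min(scores, key=scores.get)
print(scores, best_degree)
```

Plotting `scores` against degree gives the validation curve mentioned above: error drops sharply once the degree is high enough to capture the signal, then flattens or rises as extra terms start fitting noise.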

28. Why is visualization important in polynomial regression?
Visualization helps detect if:

The curve fits data well (not underfitting/overfitting).

Residuals show no clear pattern (model adequacy).

A scatterplot with the fitted curve, residual plots, and learning curves are essential for interpreting performance and the bias-variance trade-off.



29. How is polynomial regression implemented in Python?
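import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline

# Illustrative data (an assumption; substitute your own 1-D arrays X and y)
rng = np.random.default_rng(0)
X = np.linspace(-3, 3, 50)
y = 0.5 * X**3 - X + rng.normal(scale=1.0, size=X.shape)

# Example: Polynomial Regression of degree 3
# PolynomialFeatures expands X into [1, X, X^2, X^3]; LinearRegression fits them by OLS
model = Pipeline([
    ('poly', PolynomialFeatures(degree=3)),
    ('linear', LinearRegression())
])

# scikit-learn expects a 2D feature matrix, so reshape 1-D X to (n_samples, 1)
model.fit(X.reshape(-1, 1), y)
y_pred = model.predict(X.reshape(-1, 1))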