#1. What is Simple Linear Regression
Simple Linear Regression (SLR) is a statistical method used to model the relationship between two variables: one independent variable (X) and one dependent variable (Y). The idea is to fit a straight line that best describes how changes in X predict changes in Y. The equation is:
Y = mX + c + ε, where

Y = dependent variable (output)

X = independent variable (input)

m = slope (rate of change of Y per unit change in X)

c = intercept (value of Y when X = 0)

ε = error term (unexplained variation)

#2. What are the key assumptions of Simple Linear Regression

The assumptions are:

Linearity – The relationship between X and Y is linear.

Independence – Observations are independent of each other.

Homoscedasticity – Constant variance of residuals (errors).

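Normality of errors – The residuals should be approximately normally distributed. This matters mainly for confidence intervals and hypothesis tests.

No multicollinearity – Does not apply to SLR because there is only one X variable; it becomes relevant in multiple regression.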

#3. What does the coefficient m represent in the equation Y=mX+c

The coefficient m (slope) represents how much Y changes for a one-unit increase in X.

If m > 0, Y increases with X (positive relationship).

If m < 0, Y decreases with X (negative relationship).

If m = 0, there is no linear relationship.

#4. What does the intercept c represent in the equation Y=mX+c

The intercept c is the value of Y when X = 0.

It provides the baseline starting point of the regression line.

It may or may not have practical meaning depending on context (e.g., "height when age = 0" might not be meaningful).

Including it lets the line shift up or down to fit the observed data instead of being forced through the origin.

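#5. How do we calculate the slope m in Simple Linear Regression

The least squares slope is the covariance of X and Y divided by the variance of X:

m = Σ(Xᵢ − X̄)(Yᵢ − Ȳ) / Σ(Xᵢ − X̄)²

The intercept then follows as c = Ȳ − mX̄.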
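
A minimal NumPy sketch of the closed-form calculation, in which the slope is the sum of cross-deviations of X and Y divided by the sum of squared deviations of X. The function name `fit_simple_linear` and the toy data are illustrative assumptions, not part of any library.

```python
import numpy as np

def fit_simple_linear(x, y):
    """Return (m, c) for the least squares line y = m*x + c."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = x.mean(), y.mean()
    # Slope: sum of cross-deviations over sum of squared deviations of x.
    m = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
    # Intercept: the fitted line passes through the point (x_mean, y_mean).
    c = y_mean - m * x_mean
    return m, c

# Hypothetical toy data lying exactly on y = 2x + 1.
x = [1, 2, 3, 4, 5]
y = [3, 5, 7, 9, 11]
m, c = fit_simple_linear(x, y)
print("slope:", m, "intercept:", c)
```

On noisy data the same function should agree with `np.polyfit(x, y, 1)` up to floating-point error.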

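#6. What is the purpose of the least squares method in Simple Linear Regression

The least squares method chooses m and c so that the sum of squared residuals, Σ(Yᵢ − (mXᵢ + c))², is as small as possible.

Squaring the residuals prevents positive and negative errors from cancelling out and penalizes large errors more heavily.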

#7. How is the coefficient of determination (R²) interpreted in Simple Linear Regression

R² measures the proportion of variance in Y explained by X.

For a least squares fit with an intercept, evaluated on the data it was fitted to, the value ranges between 0 and 1.

R² = 0 → X explains none of the variation.

R² = 1 → X perfectly explains Y.

Higher R² indicates better fit, but it doesn’t imply causation.
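
As a rough illustration, R² can be computed directly from its definition, one minus the ratio of the residual sum of squares to the total sum of squares. The helper name `r_squared` and the five data points are assumptions made for this sketch.

```python
import numpy as np

def r_squared(y_true, y_pred):
    """R² = 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)          # unexplained variation
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total variation in Y
    return 1.0 - ss_res / ss_tot

# Hypothetical toy data.
x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([2, 4, 5, 4, 5], dtype=float)

m, c = np.polyfit(x, y, 1)        # least squares line
r2 = r_squared(y, m * x + c)
print("R²:", r2)                  # about 0.6 for this data
```

In simple linear regression this value equals the squared Pearson correlation between X and Y.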

#8. What is Multiple Linear Regression
Multiple Linear Regression (MLR) models the relationship between one dependent variable (Y) and two or more independent variables (X₁, X₂, …, Xₙ):

Y = b₀ + b₁X₁ + b₂X₂ + … + bₙXₙ + ε

It helps capture more complex relationships than simple regression.
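
A small sketch of fitting a model with two predictors by ordinary least squares, using `np.linalg.lstsq` on a design matrix with a leading column of ones for the intercept. The simulated data and the "true" coefficients (1, 2, -3) are assumptions chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 2))                      # two predictors
# Assumed data-generating process, for illustration only.
y = 1.0 + 2.0 * X[:, 0] - 3.0 * X[:, 1] + rng.normal(scale=0.1, size=n)

# Design matrix: a column of ones (intercept) followed by the predictors.
A = np.column_stack([np.ones(n), X])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
intercept, b1, b2 = coef
print("intercept:", intercept, "b1:", b1, "b2:", b2)
```

The estimates should land close to the assumed values because the noise is small relative to the signal.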

#9. What is the main difference between Simple and Multiple Linear Regression

Simple Linear Regression → One independent variable (X).

Multiple Linear Regression → Two or more independent variables (X₁, X₂, …).

MLR provides more explanatory power but also increases the risk of multicollinearity and overfitting.

#10. What are the key assumptions of Multiple Linear Regression

Linearity between predictors and response.

Independence of observations.

Homoscedasticity (equal variance of errors).

Normal distribution of residuals.

No multicollinearity among predictors.

No autocorrelation of residuals (important in time-series data).

#11. What is heteroscedasticity, and how does it affect results

Heteroscedasticity occurs when residuals have non-constant variance across levels of X.

It violates the constant-variance (homoscedasticity) assumption.

The coefficient estimates remain unbiased but become inefficient, and the usual standard errors are biased.

This makes hypothesis tests (t-tests, p-values) and confidence intervals unreliable.

#12. How can you improve a Multiple Linear Regression model with high multicollinearity

Remove highly correlated predictors.

Use Principal Component Analysis (PCA) or dimensionality reduction.

Apply regularization techniques (Ridge, Lasso).

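Center or standardize variables. This mainly reduces the multicollinearity created by interaction or polynomial terms; it does not remove correlation between distinct predictors.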
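
The first remedy, removing highly correlated predictors, might be sketched as below. The function `drop_correlated`, its greedy keep-the-first rule, and the 0.9 threshold are assumptions for illustration rather than a standard API.

```python
import numpy as np

def drop_correlated(X, threshold=0.9):
    """Return indices of columns to keep, greedily dropping any column whose
    absolute correlation with an already kept column exceeds `threshold`."""
    corr = np.abs(np.corrcoef(X, rowvar=False))   # (n_features, n_features)
    keep = []
    for j in range(X.shape[1]):
        if all(corr[j, k] <= threshold for k in keep):
            keep.append(j)
    return keep

rng = np.random.default_rng(0)
x1 = rng.normal(size=500)
x2 = x1 + rng.normal(scale=0.1, size=500)   # nearly a copy of x1
x3 = rng.normal(size=500)                   # unrelated predictor
X = np.column_stack([x1, x2, x3])

keep = drop_correlated(X)
print("columns kept:", keep)
```

Which member of a correlated pair to keep is a modelling decision; this sketch simply keeps whichever column appears first.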

#13. What are some common techniques for transforming categorical variables

One-Hot Encoding (Dummy variables) – for nominal categories.

Label Encoding – when categories are ordinal.

Target Encoding – replaces category with mean target value.

Frequency Encoding – replaces category with frequency counts.
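
The four encodings above can be sketched with plain NumPy. The column values, the target vector, and the size ordering are hypothetical; in practice a target encoding would be computed on training data only to avoid leaking the target.

```python
import numpy as np

colors = np.array(["red", "green", "blue", "green", "red"])   # nominal feature
y = np.array([10.0, 20.0, 30.0, 40.0, 60.0])                  # hypothetical target

# One-hot encoding: one 0/1 column per category (sorted: blue, green, red).
categories = np.unique(colors)
one_hot = (colors[:, None] == categories[None, :]).astype(int)

# Frequency encoding: replace each category with its count.
values, counts = np.unique(colors, return_counts=True)
freq_map = dict(zip(values, counts))
freq_encoded = np.array([freq_map[c] for c in colors])

# Target encoding: replace each category with the mean target for that category.
target_map = {c: y[colors == c].mean() for c in categories}
target_encoded = np.array([target_map[c] for c in colors])

# Label (ordinal) encoding: integers that follow an explicit, assumed order.
sizes = ["small", "large", "medium", "small"]
order = {"small": 0, "medium": 1, "large": 2}
label_encoded = [order[s] for s in sizes]

print(one_hot)
print(freq_encoded, target_encoded, label_encoded)
```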

#14. What is the role of interaction terms in Multiple Linear Regression

Interaction terms model the combined effect of two predictors on Y.
They capture relationships where the effect of one variable depends on another.
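
One way to see this is to add the product x1·x2 as an extra column of the design matrix. In the sketch below, the simulated data and the coefficient values are assumptions; with an interaction term, the slope of x1 is no longer a single number but `b1 + b12 * x2`.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 300
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
# Assumed data-generating process: the effect of x1 depends on x2.
y = 1.0 + 2.0 * x1 + 0.5 * x2 + 1.5 * x1 * x2 + rng.normal(scale=0.1, size=n)

# Design matrix with intercept, main effects, and the interaction column.
A = np.column_stack([np.ones(n), x1, x2, x1 * x2])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
b0, b1, b2, b12 = coef

# Estimated effect of a one-unit increase in x1 at two different values of x2.
slope_at_x2_0 = b1 + b12 * 0.0
slope_at_x2_1 = b1 + b12 * 1.0
print(slope_at_x2_0, slope_at_x2_1)
```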

#15. How can the interpretation of intercept differ between Simple and Multiple Regression

In SLR: Intercept = predicted value of Y when X = 0.

In MLR: Intercept = predicted Y when all predictors are 0.

Sometimes this is meaningful (e.g., income when years of experience = 0), sometimes not.

#16. What is the significance of the slope in regression analysis

Each slope (coefficient) represents the effect of one predictor on Y, keeping others constant.

It quantifies the marginal impact of an independent variable.

Helps in prediction and understanding variable importance.

#17. What are the limitations of using R² as a sole measure of model performance

R² never decreases when more variables are added, even irrelevant ones (overfitting risk).

It doesn’t indicate causality.

A high R² does not guarantee a good model; the fit can still be biased or misspecified.

Better to use Adjusted R², AIC, BIC, or RMSE for evaluation.
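
The first limitation can be checked numerically: adding predictors that are pure noise does not lower the training R² of a least squares fit. The helper `ols_r2` and the simulated data are assumptions for this sketch.

```python
import numpy as np

def ols_r2(X, y):
    """Training R² of an ordinary least squares fit with an intercept."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)

rng = np.random.default_rng(42)
n = 50
x = rng.normal(size=(n, 1))
y = 3.0 * x[:, 0] + rng.normal(size=n)     # assumed relationship
noise = rng.normal(size=(n, 5))            # predictors unrelated to y

r2_base = ols_r2(x, y)
r2_with_noise = ols_r2(np.column_stack([x, noise]), y)
print(r2_base, r2_with_noise)   # the second value is never smaller
```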

#18. How would you interpret a large standard error for a regression coefficient

Indicates uncertainty about the estimated coefficient.

Suggests that the coefficient may not be statistically distinguishable from zero, because the confidence interval around it is wide.

Could be due to multicollinearity, small sample size, or noisy data.
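
A sketch of how coefficient standard errors are usually obtained in ordinary least squares, as the square roots of the diagonal of σ²(AᵀA)⁻¹, where A is the design matrix and σ² is estimated from the residuals. The helper `ols_with_se` and the simulated samples are assumptions; the comparison illustrates the sample-size effect mentioned above.

```python
import numpy as np

def ols_with_se(X, y):
    """Return (coefficients, standard errors) for OLS with an intercept."""
    n = len(y)
    A = np.column_stack([np.ones(n), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    dof = n - A.shape[1]                      # residual degrees of freedom
    sigma2 = (resid @ resid) / dof            # estimated error variance
    cov = sigma2 * np.linalg.inv(A.T @ A)     # covariance of the estimates
    return coef, np.sqrt(np.diag(cov))

rng = np.random.default_rng(0)
x_small = rng.normal(size=10)
x_large = rng.normal(size=1000)
y_small = 2.0 * x_small + rng.normal(size=10)      # same assumed model,
y_large = 2.0 * x_large + rng.normal(size=1000)    # different sample sizes

_, se_small = ols_with_se(x_small, y_small)
_, se_large = ols_with_se(x_large, y_large)
print("slope SE, n=10:", se_small[1], " n=1000:", se_large[1])
```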

#19. What is polynomial regression

Polynomial regression is a type of regression where the relationship between X and Y is modeled as an nth-degree polynomial:

Y = b₀ + b₁X + b₂X² + … + bₙXⁿ + ε

It is useful when the data shows curvature instead of a straight line.

#20. When is polynomial regression used

When the relationship between X and Y is nonlinear.

In trend analysis (e.g., growth curves).

To model diminishing or increasing returns.

Beware of overfitting with high-degree polynomials.

#21. How does the intercept in regression provide context

It anchors the regression line to the Y-axis.

Provides baseline prediction when all X = 0.

Helps interpret predictions relative to starting conditions.

#22. How can heteroscedasticity be identified in residual plots

Plot residuals vs predicted values.

If variance increases/decreases systematically (fan shape), it indicates heteroscedasticity.

Important to fix because it biases standard errors.

Common remedies include transforming Y (e.g., a log transformation), weighted least squares, or heteroscedasticity-robust standard errors.
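
The fan shape can also be checked numerically, without a plot, by comparing the residual spread at low and high fitted values. In this sketch the data are simulated with noise that grows with X (an assumption), and the half-split comparison is a rough diagnostic rather than a formal test.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
x = rng.uniform(1, 10, size=n)
# Assumed process: the noise standard deviation is proportional to x.
y = 2.0 * x + rng.normal(scale=0.5 * x)

m, c = np.polyfit(x, y, 1)
fitted = m * x + c
resid = y - fitted

# Split the residuals into the lower and upper half of the fitted values.
order = np.argsort(fitted)
spread_low = resid[order[: n // 2]].std()
spread_high = resid[order[n // 2:]].std()
print("residual std (low fitted):", spread_low)
print("residual std (high fitted):", spread_high)
```

Plotting `resid` against `fitted` with any plotting library would show the same widening pattern visually.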

#23. What does it mean if a Multiple Linear Regression model has high R² but low Adjusted R²

High R² → The model explains much of the variation in the data it was fitted to.

Low Adjusted R² → Extra predictors are not improving the model; they may be irrelevant or adding noise.

Together, this signals overfitting.
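
The gap between the two measures comes from the penalty in the usual adjusted R² formula, 1 − (1 − R²)(n − 1)/(n − p − 1), where n is the number of observations and p the number of predictors. The function name and the example numbers below are hypothetical.

```python
def adjusted_r2(r2, n_samples, n_predictors):
    """Adjusted R² = 1 - (1 - R²) * (n - 1) / (n - p - 1)."""
    dof = n_samples - n_predictors - 1
    if dof <= 0:
        raise ValueError("need n_samples > n_predictors + 1")
    return 1.0 - (1.0 - r2) * (n_samples - 1) / dof

# Same R² of 0.90 from 20 observations, with few versus many predictors.
few = adjusted_r2(0.90, n_samples=20, n_predictors=2)     # about 0.888
many = adjusted_r2(0.90, n_samples=20, n_predictors=15)   # 0.525
print(few, many)
```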

#24. Why is it important to scale variables in Multiple Linear Regression

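Scaling puts predictors on comparable ranges (e.g., mean 0 and standard deviation 1).

Prevents predictors with large numeric ranges from dominating penalty terms or gradient-based optimization. Predictions from plain least squares are unaffected by rescaling.

Improves numerical stability in optimization and lets regularization (Ridge, Lasso) treat all predictors evenly.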
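
A minimal sketch of standardization (z-scoring), assuming no column is constant. The helper `standardize` is illustrative; the mean and standard deviation should be computed on the training data and reused for any new data.

```python
import numpy as np

def standardize(X):
    """Return (X_scaled, mean, std) with each column at mean 0 and std 1."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    std = X.std(axis=0)          # assumes no constant columns (std > 0)
    return (X - mean) / std, mean, std

# Hypothetical predictors on very different scales.
X_train = np.array([[1.0, 1000.0],
                    [2.0, 3000.0],
                    [3.0, 5000.0]])
X_scaled, mean, std = standardize(X_train)

# New data is scaled with the training statistics, not its own.
X_new = np.array([[2.0, 3000.0]])
X_new_scaled = (X_new - mean) / std
print(X_scaled)
print(X_new_scaled)
```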

#25. How does polynomial regression differ from linear regression

Linear regression → straight line fit.

Polynomial regression → curve fit using powers of X.

Both are linear in the coefficients, but polynomial regression can model a nonlinear relationship between X and Y.
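
The "linear in coefficients" point can be sketched by solving both models with the same least squares routine; only the columns of the design matrix differ. The noise-free quadratic data below is an assumption chosen so the result is easy to check.

```python
import numpy as np

x = np.arange(10, dtype=float)
y = 1.0 + 0.5 * x + 2.0 * x ** 2          # assumed noise-free quadratic

# Same solver, different design matrices.
A_lin = np.column_stack([np.ones_like(x), x])             # [1, x]
A_quad = np.column_stack([np.ones_like(x), x, x ** 2])    # [1, x, x^2]

coef_lin, *_ = np.linalg.lstsq(A_lin, y, rcond=None)
coef_quad, *_ = np.linalg.lstsq(A_quad, y, rcond=None)

sse_lin = np.sum((y - A_lin @ coef_lin) ** 2)
sse_quad = np.sum((y - A_quad @ coef_quad) ** 2)
print("SSE straight line:", sse_lin)
print("SSE quadratic:", sse_quad)
```

The quadratic fit recovers the assumed coefficients, while the straight line leaves a large residual sum of squares.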

#27. Can polynomial regression be applied to multiple variables

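Yes. With several predictors it becomes polynomial multiple regression, which adds powers of each predictor and, typically, their cross-product terms.

It models curved surfaces instead of just curved lines.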
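
For two predictors and degree 2, the expanded feature set is 1, x1, x2, x1², x1·x2, x2². The sketch below builds these columns by hand; the helper `quadratic_features` and the noise-free surface are assumptions for illustration.

```python
import numpy as np

def quadratic_features(x1, x2):
    """Degree-2 polynomial features for two predictors, including the bias."""
    return np.column_stack([np.ones_like(x1), x1, x2,
                            x1 ** 2, x1 * x2, x2 ** 2])

rng = np.random.default_rng(0)
x1 = rng.uniform(-2, 2, size=100)
x2 = rng.uniform(-2, 2, size=100)
# Assumed noise-free curved surface (no x2^2 term).
y = 1.0 + x1 - 2.0 * x2 + 0.5 * x1 ** 2 + 3.0 * x1 * x2

A = quadratic_features(x1, x2)
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
print(np.round(coef, 6))   # order: 1, x1, x2, x1^2, x1*x2, x2^2
```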

#28. What are the limitations of polynomial regression

Overfitting with high degree.

Sensitive to outliers.

Poor extrapolation beyond data range.

High powers of X can produce very large feature values and numerical instability, so scaling is often needed.

#29. What methods can be used to evaluate model fit when selecting degree of polynomial

Cross-validation.

Adjusted R².

AIC/BIC (penalize complexity).

RMSE/MSE on test data.

Residual plots.
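
As one possible way to apply the first method, the sketch below scores each candidate degree by k-fold cross-validated RMSE using `np.polyfit` and `np.polyval`. The helper `cv_rmse`, the fold count, and the simulated quadratic data are assumptions.

```python
import numpy as np

def cv_rmse(x, y, degree, k=5, seed=0):
    """k-fold cross-validated RMSE of a polynomial fit of the given degree."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(x)), k)
    fold_mse = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        coefs = np.polyfit(x[train], y[train], degree)   # fit on training folds
        pred = np.polyval(coefs, x[test])                # predict held-out fold
        fold_mse.append(np.mean((y[test] - pred) ** 2))
    return float(np.sqrt(np.mean(fold_mse)))

rng = np.random.default_rng(0)
x = np.linspace(-3, 3, 60)
# Assumed true relationship: quadratic plus noise.
y = 1.0 - 2.0 * x + 0.5 * x ** 2 + rng.normal(scale=0.5, size=x.size)

scores = {d: cv_rmse(x, y, d) for d in range(1, 7)}
best_degree = min(scores, key=scores.get)
print(scores)
print("lowest CV RMSE at degree", best_degree)
```

Degrees above the true one often score almost as well, so a common choice is the smallest degree whose error is close to the minimum.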

#30. Why is visualization important in polynomial regression

Helps check if the curve fits the data well.

Detects underfitting/overfitting visually.

Makes interpretation easier for stakeholders.

Residual plots can confirm assumption validity.

#31. How is polynomial regression implemented in Python

Using scikit-learn:

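    import numpy as np
    from sklearn.linear_model import LinearRegression
    from sklearn.preprocessing import PolynomialFeatures

    # Toy data: one feature, reshaped to the 2-D (n_samples, n_features) layout scikit-learn expects
    X = np.array([1, 2, 3, 4, 5]).reshape(-1, 1)
    y = np.array([2, 6, 14, 28, 45])

    # Expand X into [X, X^2]; include_bias=False because LinearRegression fits its own intercept
    poly = PolynomialFeatures(degree=2, include_bias=False)
    X_poly = poly.fit_transform(X)

    # Ordinary least squares on the expanded features
    model = LinearRegression()
    model.fit(X_poly, y)

    # Coefficients are ordered as [X, X^2]
    print("Coefficients:", model.coef_)
    print("Intercept:", model.intercept_)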


