<a href="https://colab.research.google.com/github/namankathuria21/REGRESSION/blob/main/REGRESSION.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Regression Assignment

Q1. What is Simple Linear Regression?

Simple Linear Regression is a statistical method used to model the relationship between one independent variable (X) and one dependent variable (Y).

Equation: Y = mX + c

Purpose: To predict Y using X and determine how X influences Y.
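
As a minimal sketch of the idea above, the following fits a line to a small made-up dataset and predicts Y for a new X. The data values and the "hours studied / exam score" framing are hypothetical and chosen only for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical data: hours studied (X) and exam score (Y).
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0]).reshape(-1, 1)
y = np.array([52.0, 55.0, 61.0, 64.0, 68.0])

# Fit Y = mX + c by ordinary least squares.
model = LinearRegression().fit(X, y)
m = model.coef_[0]       # slope
c = model.intercept_     # intercept

# Predict Y for a new, unseen X value.
y_new = model.predict(np.array([[6.0]]))[0]

print(f"m = {m:.2f}, c = {c:.2f}, prediction at X = 6: {y_new:.2f}")
```

Here the fitted slope says how many score points each extra hour is associated with, and the intercept is the predicted score at zero hours.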

Q2. What are the key assumptions of Simple Linear Regression?

Linearity: Relationship between X and Y is linear.

Independence: Observations are independent.

Homoscedasticity: Constant variance of residuals.

Normality: Residuals follow a normal distribution.

No strong outliers or high-leverage points, which can pull the fitted line away from the bulk of the data.

Q3. What does the coefficient m represent in the equation Y = mX + c?

The slope (m) represents the rate of change in Y for a one-unit change in X.

Example: If m = 2, then each increase of 1 unit in X increases Y by 2.

Q4. What does the intercept c represent in the equation Y = mX + c?

The intercept (c) is the expected value of Y when X = 0. It provides the baseline starting point of the regression line.

Q5. How do we calculate the slope m in Simple Linear Regression?

$$m = \frac{\sum_{i} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i} (X_i - \bar{X})^2}$$

Together with the intercept c = Ȳ − mX̄, this slope minimizes the sum of squared errors between the predicted and actual Y values.
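
The slope formula can be checked numerically. The sketch below computes m and c directly from the sums and compares them with `np.polyfit`. The data are made up so that Y = 2X + 1 holds exactly.

```python
import numpy as np

# Made-up data lying exactly on Y = 2X + 1.
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([3.0, 5.0, 7.0, 9.0, 11.0])

x_bar, y_bar = X.mean(), Y.mean()

# Slope: sum of cross-deviations divided by sum of squared X deviations.
m = np.sum((X - x_bar) * (Y - y_bar)) / np.sum((X - x_bar) ** 2)

# Intercept: the fitted line passes through the point of means.
c = y_bar - m * x_bar

# Cross-check against NumPy's least-squares polynomial fit (degree 1).
m_ref, c_ref = np.polyfit(X, Y, deg=1)

print(m, c)
```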

Q6. What is the purpose of the least squares method in Simple Linear Regression?

Least Squares minimizes the sum of squared differences between observed and predicted Y values, ensuring the best-fitting regression line.
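
One way to see what "best-fitting" means here is to compare the sum of squared errors (SSE) of the least-squares line with the SSE of nearby lines. This is only an illustrative sketch with made-up data. The helper `sse` and the perturbation sizes are arbitrary choices.

```python
import numpy as np

# Made-up, slightly noisy data.
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

def sse(m, c):
    """Sum of squared differences between observed Y and the line mX + c."""
    return float(np.sum((Y - (m * X + c)) ** 2))

# Least-squares slope and intercept.
m_hat, c_hat = np.polyfit(X, Y, deg=1)
best = sse(m_hat, c_hat)

# Nudge the slope and intercept away from the least-squares solution.
perturbed = [
    sse(m_hat + dm, c_hat + dc)
    for dm in (-0.1, 0.1)
    for dc in (-0.5, 0.5)
]

print("SSE of least-squares line:", best)
print("SSE of perturbed lines:   ", perturbed)
```

Every perturbed line has a larger SSE than the least-squares line.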

Q7. How is the coefficient of determination (R²) interpreted in Simple Linear Regression?

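R² measures the proportion of the variance in Y that is explained by X.

Range: 0 to 1 (for a least-squares fit with an intercept, evaluated on the data used to fit it).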

Example: R² = 0.80 → 80% of variation in Y is explained by X.
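
A short sketch of how R² can be computed by hand, assuming made-up data. It uses R² = 1 − SS_res / SS_tot and also checks the known fact that, in simple linear regression, R² equals the squared Pearson correlation between X and Y.

```python
import numpy as np

# Made-up data.
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
Y = np.array([1.8, 4.1, 5.9, 8.3, 9.7, 12.4])

# Fit the line and compute predictions.
m, c = np.polyfit(X, Y, deg=1)
Y_pred = m * X + c

ss_res = np.sum((Y - Y_pred) ** 2)    # unexplained variation
ss_tot = np.sum((Y - Y.mean()) ** 2)  # total variation around the mean
r2 = 1.0 - ss_res / ss_tot

# In simple linear regression, R² is the squared correlation of X and Y.
corr = np.corrcoef(X, Y)[0, 1]

print(f"R² = {r2:.4f}, corr² = {corr ** 2:.4f}")
```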

Q8. What is Multiple Linear Regression?

Multiple Linear Regression models the relationship between one dependent variable and multiple independent variables.

Equation: Y = b₀ + b₁X₁ + b₂X₂ + ... + bₙXₙ + ε
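
A minimal sketch of this equation with two predictors. The data are synthetic and noiseless (ε = 0), generated from assumed coefficients b₀ = 1, b₁ = 2, b₂ = −3, so the fit should recover them.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

# Two synthetic predictors X1 and X2.
X = rng.uniform(0, 10, size=(50, 2))

# Noiseless outcome from assumed coefficients b0 = 1, b1 = 2, b2 = -3.
y = 1.0 + 2.0 * X[:, 0] - 3.0 * X[:, 1]

model = LinearRegression().fit(X, y)
b0 = model.intercept_
b1, b2 = model.coef_

print(f"b0 = {b0:.3f}, b1 = {b1:.3f}, b2 = {b2:.3f}")
```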

Q9. What is the main difference between Simple and Multiple Linear Regression?

Simple Linear Regression → One predictor (X).

Multiple Linear Regression → Two or more predictors (X₁, X₂, …).

Q10. What are the key assumptions of Multiple Linear Regression?

Linearity between predictors and outcome.

Independence of errors.

Homoscedasticity.

Normal distribution of residuals.

No multicollinearity (independent variables shouldn’t be highly correlated).

Q11. What is heteroscedasticity, and how does it affect the results of a Multiple Linear Regression model?

Heteroscedasticity: Residuals have unequal variance.

Effect: Coefficient estimates remain unbiased but become inefficient; the usual standard errors are biased, so hypothesis tests and confidence intervals are unreliable.

Q12. How can you improve a Multiple Linear Regression model with high multicollinearity?

Remove highly correlated variables.

Use Principal Component Analysis (PCA).

Use Ridge or Lasso Regression (regularization).
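
Before applying any of these remedies, it helps to measure the problem. A common diagnostic is the variance inflation factor, VIF_j = 1 / (1 − R_j²), where R_j² comes from regressing predictor j on the other predictors. The sketch below uses synthetic data. The helper `vif` is written here for illustration and is not a library function.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + 0.1 * rng.normal(size=n)  # nearly a copy of x1
x3 = rng.normal(size=n)             # unrelated predictor
X = np.column_stack([x1, x2, x3])

def vif(X):
    """Return VIF_j = 1 / (1 - R_j^2) for each column j of X."""
    scores = []
    for j in range(X.shape[1]):
        others = np.delete(X, j, axis=1)
        # R_j^2 from regressing column j on the remaining columns.
        r2 = LinearRegression().fit(others, X[:, j]).score(others, X[:, j])
        scores.append(1.0 / (1.0 - r2))
    return np.array(scores)

vif_all = vif(X)                 # x1 and x2 should have large VIFs
vif_reduced = vif(X[:, [0, 2]])  # after dropping x2

print("VIF with x1, x2, x3:", vif_all)
print("VIF with x1, x3:    ", vif_reduced)
```

Dropping one of the two near-duplicate predictors brings the remaining VIFs back close to 1.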

Q13. What are some common techniques for transforming categorical variables for use in regression models?

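One-Hot Encoding: one binary column per category.

Dummy Variables: one-hot encoding with one category dropped as the reference level, which avoids perfect collinearity with the intercept.

Label Encoding: one integer per category; this implies an ordering, so it suits ordinal variables.

Target Encoding: each category is replaced by the mean of the target for that category; prone to target leakage unless computed out-of-fold.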
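
As a small sketch of the first of these techniques, the following uses scikit-learn's `OneHotEncoder` on a made-up `colors` column, once with all categories kept and once with the first category dropped as a reference level (the dummy-variable form).

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Made-up categorical column with three categories.
colors = np.array([["red"], ["green"], ["blue"], ["green"]])

# Full one-hot encoding: one column per category (sorted: blue, green, red).
one_hot = OneHotEncoder().fit_transform(colors).toarray()

# Dummy-variable encoding: drop the first category ("blue") as the reference.
dummy = OneHotEncoder(drop="first").fit_transform(colors).toarray()

print(one_hot)
print(dummy)
```

In the dummy form, the row for the reference category is all zeros, so its effect is absorbed by the intercept.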

Q14. What is the role of interaction terms in Multiple Linear Regression?

Interaction terms capture the case where the effect of one predictor on Y depends on the value of another predictor.
Example: Experience × Education may explain salary better than the two individual effects alone.
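
A sketch of how an interaction term enters the model, using synthetic noiseless data generated with an assumed product term (coefficient 4). The predictor names `x1` and `x2` are placeholders.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
x1 = rng.uniform(0, 5, size=100)
x2 = rng.uniform(0, 5, size=100)

# Noiseless outcome that includes a product (interaction) term.
y = 1.0 + 2.0 * x1 + 3.0 * x2 + 4.0 * x1 * x2

# Model A: main effects only.
X_main = np.column_stack([x1, x2])
main = LinearRegression().fit(X_main, y)
r2_main = main.score(X_main, y)

# Model B: main effects plus the interaction column x1 * x2.
X_inter = np.column_stack([x1, x2, x1 * x2])
inter = LinearRegression().fit(X_inter, y)
r2_inter = inter.score(X_inter, y)

print(f"R² without interaction: {r2_main:.4f}")
print(f"R² with interaction:    {r2_inter:.4f}")
print("Coefficients with interaction:", inter.coef_)
```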

Q15. How can the interpretation of intercept differ between Simple and Multiple Linear Regression?

Simple: Intercept = Expected Y when X = 0.

Multiple: Intercept = Expected Y when all predictors = 0 (may not always be meaningful).

Q16. What is the significance of the slope in regression analysis, and how does it affect predictions?

Slope shows how much Y changes with a one-unit change in X (holding other variables constant in multiple regression).

Q17. How does the intercept in a regression model provide context for the relationship between variables?

It provides the baseline value of Y when all predictors equal zero (X = 0); the slopes then describe how Y moves away from that baseline.

Q18. What are the limitations of using R² as a sole measure of model performance?

High R² doesn’t guarantee causation.

Adding more predictors never decreases R², even when they are irrelevant (can be misleading).

Doesn’t detect overfitting, because it is computed on the same data used to fit the model.
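
The following sketch illustrates the point about extra predictors. It adds ten pure-noise columns to a one-predictor model and compares R² on the training data and on fresh data. All data are synthetic, and the sample size and number of noise columns are arbitrary choices.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n = 40

def make_data():
    """One real predictor, ten irrelevant noise predictors, noisy outcome."""
    x = rng.normal(size=(n, 1))
    noise_cols = rng.normal(size=(n, 10))
    y = 2.0 * x[:, 0] + rng.normal(size=n)
    return x, np.hstack([x, noise_cols]), y

x_train, X_big_train, y_train = make_data()
x_test, X_big_test, y_test = make_data()

base = LinearRegression().fit(x_train, y_train)
big = LinearRegression().fit(X_big_train, y_train)

# Training R²: the larger model cannot do worse on the data it was fitted to.
r2_base = base.score(x_train, y_train)
r2_big = big.score(X_big_train, y_train)

# R² on fresh data gives a more honest picture.
r2_base_new = base.score(x_test, y_test)
r2_big_new = big.score(X_big_test, y_test)

print(f"train R²: base = {r2_base:.3f}, with noise columns = {r2_big:.3f}")
print(f"fresh R²: base = {r2_base_new:.3f}, with noise columns = {r2_big_new:.3f}")
```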

Q19. How would you interpret a large standard error for a regression coefficient?

Large SE → Coefficient is unstable and unreliable.

Suggests multicollinearity or insufficient data.
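
To make the "insufficient data" point concrete, the sketch below computes the standard error of a slope from the usual OLS formula, SE = sqrt(diag(σ̂² (XᵀX)⁻¹)), for a small and a large synthetic sample. The helper names and the data-generating values are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def slope_standard_error(x, y):
    """Standard error of the slope in a simple OLS fit with intercept."""
    X = np.column_stack([np.ones_like(x), x])        # design matrix [1, x]
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)     # OLS coefficients
    resid = y - X @ beta
    dof = len(y) - X.shape[1]                        # n minus number of coefficients
    sigma2 = resid @ resid / dof                     # residual variance estimate
    cov = sigma2 * np.linalg.inv(X.T @ X)            # coefficient covariance
    return float(np.sqrt(cov[1, 1]))

def simulate(n):
    """Draw n points from an assumed line with unit-variance noise."""
    x = rng.uniform(0, 10, size=n)
    y = 3.0 + 0.5 * x + rng.normal(scale=1.0, size=n)
    return slope_standard_error(x, y)

se_small = simulate(10)
se_large = simulate(1000)

print(f"SE of slope with n = 10:   {se_small:.4f}")
print(f"SE of slope with n = 1000: {se_large:.4f}")
```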

Q20. How can heteroscedasticity be identified in residual plots, and why is it important to address it?

Identified when residuals form a funnel or fan shape in a plot of residuals against fitted values (the spread grows or shrinks as the fitted value changes).

Important to fix → prevents invalid p-values and confidence intervals.
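
A rough numerical counterpart to the funnel shape, as a sketch only: simulate data whose noise grows with X, fit a line, and compare the residual spread for low and high fitted values. The data are synthetic, and splitting at the median is an informal check, not a formal test.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 400
x = rng.uniform(1, 10, size=n)

# Noise standard deviation grows with x, so the data are heteroscedastic.
y = 2.0 + 3.0 * x + rng.normal(scale=0.5 * x)

# Fit a straight line and compute residuals.
m, c = np.polyfit(x, y, deg=1)
fitted = m * x + c
resid = y - fitted

# Compare residual spread below and above the median fitted value.
median_fit = np.median(fitted)
spread_low = resid[fitted <= median_fit].std()
spread_high = resid[fitted > median_fit].std()

print(f"residual std, low fitted values:  {spread_low:.3f}")
print(f"residual std, high fitted values: {spread_high:.3f}")
```

A clearly larger spread in one half than in the other is the numerical signature of the funnel seen in a residual plot.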

Q21. What does it mean if a Multiple Linear Regression model has a high R² but low adjusted R²?

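High R²: The model explains a large share of the variance in the training data.

Low Adjusted R²: After penalizing for the number of predictors, little of that fit remains, so many predictors add no real explanatory power (a sign of overfitting).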
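
The gap can be seen from the standard formula, adjusted R² = 1 − (1 − R²)(n − 1)/(n − p − 1), where n is the number of observations and p the number of predictors. The numbers below are hypothetical: the same R² of 0.90 on 20 observations, once with 2 predictors and once with 15.

```python
def adjusted_r2(r2, n_samples, n_predictors):
    """Adjusted R² = 1 - (1 - R²) * (n - 1) / (n - p - 1)."""
    return 1.0 - (1.0 - r2) * (n_samples - 1) / (n_samples - n_predictors - 1)

# Same R², same sample size, different numbers of predictors.
lean = adjusted_r2(0.90, n_samples=20, n_predictors=2)
bloated = adjusted_r2(0.90, n_samples=20, n_predictors=15)

print(f"adjusted R² with 2 predictors:  {lean:.3f}")
print(f"adjusted R² with 15 predictors: {bloated:.3f}")
```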

Q22. Why is it important to scale variables in Multiple Linear Regression?

Ensures comparability of coefficients.

Improves numerical stability.

Essential for models with regularization (Ridge/Lasso).
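
A minimal sketch of scaling before a regularized fit, assuming two synthetic predictors on very different scales (the names `rooms` and `area` are made up). Putting `StandardScaler` in a pipeline with `Ridge` means the scaling is learned from the training data only.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n = 100
rooms = rng.uniform(1, 6, size=n)       # small-scale predictor
area = rng.uniform(500, 5000, size=n)   # large-scale predictor
X = np.column_stack([rooms, area])
y = 10.0 * rooms + 0.02 * area + rng.normal(size=n)

# Standardize each column to mean 0 and standard deviation 1.
X_scaled = StandardScaler().fit_transform(X)

# Scale and fit in one pipeline so the Ridge penalty treats both predictors alike.
model = make_pipeline(StandardScaler(), Ridge(alpha=1.0)).fit(X, y)
coefs = model.named_steps["ridge"].coef_

print("column means after scaling:", X_scaled.mean(axis=0))
print("column stds after scaling: ", X_scaled.std(axis=0))
print("Ridge coefficients on the standardized scale:", coefs)
```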

Q23. What is polynomial regression?

Polynomial Regression models non-linear relationships by including polynomial terms of predictors.

Q24. How does polynomial regression differ from linear regression?

Linear regression: Straight-line fit.

Polynomial regression: Curved fit using squared, cubic, etc. terms; the model is still linear in its coefficients, so it is fitted by ordinary least squares.

Q25. When is polynomial regression used?

When the relationship between variables is non-linear but can be approximated by a polynomial curve.

Q26. What is the general equation for polynomial regression?

$$Y = b_0 + b_1 X + b_2 X^2 + b_3 X^3 + \dots + b_n X^n + \varepsilon$$

Q27. Can polynomial regression be applied to multiple variables?

Yes. Multivariate polynomial regression includes polynomial terms of multiple predictors.
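
As a small illustration, scikit-learn's `PolynomialFeatures` expands two predictors into all terms up to the chosen degree, including the cross term. The single input row below is made up.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# One made-up observation with two predictors: x1 = 2, x2 = 3.
X = np.array([[2.0, 3.0]])

# Degree 2 produces the columns: 1, x1, x2, x1^2, x1*x2, x2^2.
poly = PolynomialFeatures(degree=2)
X_poly = poly.fit_transform(X)

print(X_poly)
```

The expanded matrix can then be passed to `LinearRegression` in the same way as in the single-variable case.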

Q28. What are the limitations of polynomial regression?

High degree → overfitting.

Sensitive to outliers.

Hard to interpret.

Q29. What methods can be used to evaluate model fit when selecting the degree of a polynomial?

Adjusted R²

Cross-validation

AIC/BIC (Information Criteria)

Residual analysis
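
Cross-validation from the list above can be sketched as follows. The data are synthetic, generated from an assumed quadratic with noise, and the range of candidate degrees (1 to 6) and the 5-fold split are arbitrary choices.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)

# Synthetic data from an assumed quadratic relationship plus noise.
X = rng.uniform(-3, 3, size=(60, 1))
y = 1.0 + 2.0 * X[:, 0] + 3.0 * X[:, 0] ** 2 + rng.normal(scale=1.0, size=60)

cv = KFold(n_splits=5, shuffle=True, random_state=0)

# Mean cross-validated MSE for each candidate degree.
cv_mse = {}
for degree in range(1, 7):
    pipeline = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    scores = cross_val_score(
        pipeline, X, y, cv=cv, scoring="neg_mean_squared_error"
    )
    cv_mse[degree] = -scores.mean()  # scores are negated MSE values

best_degree = min(cv_mse, key=cv_mse.get)

for degree, mse in cv_mse.items():
    print(f"degree {degree}: CV MSE = {mse:.3f}")
print("degree with lowest CV MSE:", best_degree)
```

The degree with the lowest cross-validated error is a reasonable candidate, though nearby degrees with similar error are often preferred when they are simpler.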

Q30. Why is visualization important in polynomial regression?

Helps confirm non-linearity.

Shows whether polynomial degree is appropriate.



In [None]:
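# Q31. How is polynomial regression implemented in Python?
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

# Sample data: y = x^2 + 1 exactly, so a degree-2 fit should match it
X = np.array([1, 2, 3, 4, 5]).reshape(-1, 1)
y = np.array([2, 5, 10, 17, 26])

# Polynomial transformation: expands X into the columns [1, x, x^2]
poly = PolynomialFeatures(degree=2)
X_poly = poly.fit_transform(X)

# Fit an ordinary linear regression on the expanded features
model = LinearRegression()
model.fit(X_poly, y)

# Predict on a dense grid so the fitted curve is drawn smoothly
X_grid = np.linspace(X.min(), X.max(), 100).reshape(-1, 1)
y_grid = model.predict(poly.transform(X_grid))

plt.scatter(X, y, color="blue", label="Data")
plt.plot(X_grid, y_grid, color="red", label="Degree-2 fit")
plt.title("Polynomial Regression")
plt.xlabel("X")
plt.ylabel("y")
plt.legend()
plt.show()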