# ASSIGNMENT


1. What is Simple Linear Regression?
  - Simple Linear Regression is a statistical method used to model the relationship between two variables: one independent (predictor) variable and one dependent (response) variable, assuming a linear relationship.

2. What are the key assumptions of Simple Linear Regression?
  - The key assumptions are:

    - Linearity: The relationship between X and Y is linear.
    - Independence: Observations are independent.
    - Homoscedasticity: The variance of residuals is constant across all levels of X.
    - Normality: Residuals are normally distributed.
3. What does the coefficient m represent in the equation Y = mX + c?
  - The coefficient m represents the slope of the regression line, indicating the change in the dependent variable (Y) for a one-unit increase in the independent variable (X).

4. What does the intercept c represent in the equation Y = mX + c?
  - The intercept c represents the value of Y when X is 0. It is the point where the regression line crosses the Y-axis.

5. How do we calculate the slope m in Simple Linear Regression?
  - The slope m is calculated using the formula:
  - $m = \frac{\sum{(X_i - \bar{X})(Y_i - \bar{Y})}}{\sum{(X_i - \bar{X})^2}}$
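
The slope formula can be checked by hand on a small dataset. The following is a minimal sketch using NumPy; the data values are made up for illustration and follow Y = 2X + 1 exactly, and the intercept uses the standard companion result c = Ȳ − m·X̄.

```python
import numpy as np

# Illustrative data (hypothetical): Y = 2X + 1 exactly
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([3.0, 5.0, 7.0, 9.0, 11.0])

x_mean, y_mean = X.mean(), Y.mean()

# Slope: sum of cross-deviations divided by sum of squared X deviations
m = np.sum((X - x_mean) * (Y - y_mean)) / np.sum((X - x_mean) ** 2)

# Intercept: the least squares line passes through (x_mean, y_mean)
c = y_mean - m * x_mean

print(m, c)
```

Because the data lie exactly on a line, the computed slope and intercept recover the values used to generate them.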

6. What is the purpose of the least squares method in Simple Linear Regression?
  - The least squares method minimizes the sum of squared differences between observed and predicted values, providing the best-fit line for the data.

7. How is the coefficient of determination (R²) interpreted in Simple Linear Regression?
  - R² represents the proportion of variance in the dependent variable explained by the independent variable. For a least squares fit with an intercept, it ranges from 0 to 1, where higher values indicate that the model explains more of the variance.
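
As a rough illustration, R² can be computed directly from the residual and total sums of squares. This sketch assumes a small made-up dataset and uses `np.polyfit` for the line fit; the variable names are illustrative only.

```python
import numpy as np

# Illustrative data (hypothetical), roughly following Y = 2X
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([2.0, 4.1, 5.9, 8.2, 9.8])

# np.polyfit returns coefficients from the highest power down: slope, then intercept
m, c = np.polyfit(X, Y, deg=1)
Y_pred = m * X + c

ss_res = np.sum((Y - Y_pred) ** 2)    # variation left unexplained by the line
ss_tot = np.sum((Y - Y.mean()) ** 2)  # total variation around the mean
r_squared = 1 - ss_res / ss_tot

print(r_squared)
```

In simple linear regression this value equals the squared correlation between X and Y, which gives a quick cross-check.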

8. What is Multiple Linear Regression?
  - Multiple Linear Regression extends simple linear regression by using two or more independent variables to predict the dependent variable.

9. What is the main difference between Simple and Multiple Linear Regression?
  - Simple Linear Regression uses one independent variable, while Multiple Linear Regression uses two or more independent variables.

10. What are the key assumptions of Multiple Linear Regression?
  - Assumptions include:

    - Linearity
    - Independence
    - Homoscedasticity
    - Normality of residuals
    - No multicollinearity
    - No significant outliers or influential points
11. What is heteroscedasticity, and how does it affect the results of a Multiple Linear Regression model?
  - Heteroscedasticity refers to non-constant variance of residuals. It leads to inefficient estimates and biased standard errors, making hypothesis tests unreliable.

12. How can you improve a Multiple Linear Regression model with high multicollinearity?
  - Use techniques like:
    - Removing highly correlated variables
    - Principal Component Analysis (PCA)
    - Ridge or Lasso regression
    - Combining correlated variables
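
Before removing or combining variables, it helps to measure how collinear each predictor is. One common diagnostic is the variance inflation factor (VIF), which is 1 / (1 − R²) from regressing one predictor on the others. The sketch below is a NumPy-only illustration with simulated data; the helper name `vif` and the data are assumptions, not part of the original answer.

```python
import numpy as np

def vif(X):
    """Variance inflation factor for each column of a 2-D predictor array."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    factors = []
    for j in range(p):
        target = X[:, j]
        others = np.delete(X, j, axis=1)
        # Regress column j on the remaining columns plus an intercept
        A = np.column_stack([np.ones(n), others])
        coef, *_ = np.linalg.lstsq(A, target, rcond=None)
        resid = target - A @ coef
        r2 = 1 - resid @ resid / np.sum((target - target.mean()) ** 2)
        factors.append(1.0 / (1.0 - r2))
    return np.array(factors)

# Simulated predictors (hypothetical): x2 is nearly a copy of x1, x3 is unrelated
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = x1 + rng.normal(scale=0.1, size=200)
x3 = rng.normal(size=200)

vifs = vif(np.column_stack([x1, x2, x3]))
print(vifs)
```

A frequently quoted rule of thumb treats VIF values above roughly 5 to 10 as a sign of problematic multicollinearity, though the threshold is a convention rather than a strict rule.
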
13. What are some common techniques for transforming categorical variables for use in regression models?
  - Common techniques include:
    - One-hot encoding
    - Label encoding (best suited to ordinal categories, since the integer codes imply an order)
    - Target encoding
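
The first two techniques can be sketched with NumPy alone. The `colors` column below is hypothetical, and target encoding is left out because it also needs the response variable.

```python
import numpy as np

colors = np.array(["red", "green", "blue", "green", "red"])  # hypothetical column

# Label encoding: each category becomes an integer code (sorted order here)
categories, labels = np.unique(colors, return_inverse=True)

# One-hot encoding: one indicator column per category
one_hot = np.eye(len(categories), dtype=int)[labels]

# Dropping one column avoids perfect collinearity with the intercept
one_hot_dropped = one_hot[:, 1:]

print(categories)
print(labels)
print(one_hot)
```

Dropping one indicator column is the usual choice when the regression includes an intercept, since the full set of indicators always sums to 1.
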
14. What is the role of interaction terms in Multiple Linear Regression?
  - Interaction terms capture how the effect of one independent variable on the dependent variable depends on the level of another independent variable.
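
An interaction term is usually added as the product of two predictors. The sketch below simulates data under an assumed model in which the effect of `x1` on `y` is 2 + 1.5·`x2`, then recovers the coefficients with ordinary least squares; all names and values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(42)
x1 = rng.uniform(0, 5, size=300)
x2 = rng.uniform(0, 5, size=300)

# Assumed true model: y = 1 + 2*x1 + 0.5*x2 + 1.5*x1*x2 + noise
y = 1.0 + 2.0 * x1 + 0.5 * x2 + 1.5 * x1 * x2 + rng.normal(scale=0.1, size=300)

# Design matrix: intercept, main effects, and the product (interaction) column
A = np.column_stack([np.ones_like(x1), x1, x2, x1 * x2])
beta, *_ = np.linalg.lstsq(A, y, rcond=None)

print(beta)  # order: intercept, x1, x2, x1*x2
```

With the interaction included, the coefficient on `x1` alone is the effect of `x1` when `x2` is 0, not its overall effect.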

15. How can the interpretation of intercept differ between Simple and Multiple Linear Regression?
  - In simple regression, the intercept is the value of Y when X = 0. In multiple regression, the intercept is the expected value of Y when all independent variables are 0, which may not be meaningful if 0 lies outside the range of the data.

16. What is the significance of the slope in regression analysis, and how does it affect predictions?
  - The slope indicates the rate of change in Y for a one-unit change in X. It directly influences the direction and magnitude of predictions.

17. How does the intercept in a regression model provide context for the relationship between variables?
  - The intercept gives the baseline value of Y when all predictors are zero, helping interpret the model's starting point.

18. What are the limitations of using R² as a sole measure of model performance?
  - Limitations include:
    - R² never decreases when more variables are added, even irrelevant ones
    - It doesn't indicate model bias or overfitting
    - It doesn't assess predictive accuracy on new data
19. How would you interpret a large standard error for a regression coefficient?
  - A large standard error suggests high variability in the coefficient estimate, indicating uncertainty about its true value. This may lead to non-significant results.
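
For simple linear regression, the standard error of the slope is the square root of the residual variance divided by the sum of squared X deviations. The following sketch computes it on a small made-up dataset; the names `se_m` and `t_stat` are illustrative.

```python
import numpy as np

# Illustrative data (hypothetical)
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([2.0, 4.1, 5.9, 8.2, 9.8])
n = len(X)

m, c = np.polyfit(X, Y, deg=1)
residuals = Y - (m * X + c)

# Residual variance uses n - 2 degrees of freedom (slope and intercept are estimated)
s2 = np.sum(residuals ** 2) / (n - 2)

# Standard error of the slope and the corresponding t-statistic
se_m = np.sqrt(s2 / np.sum((X - X.mean()) ** 2))
t_stat = m / se_m

print(se_m, t_stat)
```

A standard error that is large relative to the coefficient gives a small t-statistic, which is why such coefficients often fail significance tests.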

20. How can heteroscedasticity be identified in residual plots, and why is it important to address it?
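  - Heteroscedasticity typically appears as a funnel shape in residual plots (for example, residuals spreading out as X increases). It must be addressed because it violates the constant-variance assumption of OLS, which makes standard errors, confidence intervals, and hypothesis tests unreliable.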
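
Alongside a residual plot, a quick numerical check is to compare the residual variance in the lower and upper halves of X. This is an informal sketch on simulated data where the noise is assumed to grow with X; it is not a formal test such as Breusch-Pagan.

```python
import numpy as np

rng = np.random.default_rng(1)
X = np.linspace(1, 10, 400)

# Assumed data: noise standard deviation grows in proportion to X
Y = 3.0 + 2.0 * X + rng.normal(scale=0.5 * X)

m, c = np.polyfit(X, Y, deg=1)
residuals = Y - (m * X + c)

# Compare residual spread in the lower and upper halves of X (X is already sorted)
half = len(X) // 2
var_low = residuals[:half].var(ddof=1)
var_high = residuals[half:].var(ddof=1)
ratio = var_high / var_low

print(ratio)
```

A ratio far from 1 suggests the spread of the residuals changes with X, which matches the funnel shape seen in a plot.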

21. What does it mean if a Multiple Linear Regression model has a high R² but low adjusted R²?
  - It indicates that many predictors were added, but only a few are useful. Adjusted R² penalizes unnecessary variables, suggesting overfitting.
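
The gap between the two measures comes from the penalty in adjusted R² = 1 − (1 − R²)(n − 1)/(n − p − 1), where n is the number of observations and p the number of predictors. The sketch below adds 15 simulated noise predictors to a one-predictor model; the helper name and data are assumptions for illustration.

```python
import numpy as np

def r2_and_adjusted(X, y):
    """Fit OLS with an intercept and return (R^2, adjusted R^2)."""
    n, p = X.shape
    A = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    r2 = 1 - resid @ resid / np.sum((y - y.mean()) ** 2)
    adj = 1 - (1 - r2) * (n - 1) / (n - p - 1)
    return r2, adj

rng = np.random.default_rng(7)
n = 30
x = rng.normal(size=(n, 1))
y = 2.0 * x[:, 0] + rng.normal(size=n)   # only x is related to y

noise = rng.normal(size=(n, 15))         # 15 predictors unrelated to y
r2_small, adj_small = r2_and_adjusted(x, y)
r2_big, adj_big = r2_and_adjusted(np.column_stack([x, noise]), y)

print(r2_small, adj_small)
print(r2_big, adj_big)
```

R² cannot go down when the noise columns are added, while the adjusted value is pulled well below it because p is large relative to n.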

22. Why is it important to scale variables in Multiple Linear Regression?
  - Scaling ensures that:

    - Coefficients are comparable
    - Algorithms (like gradient descent) converge faster
    - No variable dominates due to scale differences
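
Standardization (subtracting the mean and dividing by the standard deviation) is one common way to scale. The sketch below uses hypothetical income and age columns; in practice the mean and standard deviation would be computed on the training data and reused for new data.

```python
import numpy as np

# Hypothetical predictors on very different scales: income (dollars) and age (years)
X = np.array([[30000.0, 25.0],
              [52000.0, 40.0],
              [75000.0, 33.0],
              [98000.0, 58.0]])

mean = X.mean(axis=0)
std = X.std(axis=0)          # population standard deviation (ddof=0)
X_scaled = (X - mean) / std  # each column now has mean 0 and unit variance

print(X_scaled)
```

After scaling, each coefficient describes the change in Y per one standard deviation of its predictor, which is what makes the coefficients comparable.
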
23. What is polynomial regression?
  - Polynomial regression models the relationship between X and Y as an nth-degree polynomial, allowing for curved relationships.

24. How does polynomial regression differ from linear regression?
  - Linear regression assumes a straight-line relationship, while polynomial regression allows for curved relationships by including powers of X.

25. When is polynomial regression used?
  - When the relationship between X and Y is non-linear but can be approximated by a smooth curve, such as U-shaped patterns or growth and decay trends over a limited range of X.

26. What is the general equation for polynomial regression?
  - $Y = \beta_0 + \beta_1X + \beta_2X^2 + \dots + \beta_nX^n + \epsilon$

27. Can polynomial regression be applied to multiple variables?
  - Yes, multivariate polynomial regression can be used, where interactions and higher-order terms are included across multiple variables.

28. What are the limitations of polynomial regression?
  - Limitations include:

    - Overfitting with high-degree polynomials
    - Poor extrapolation
    - Interpretability issues
    - Sensitive to outliers
29. What methods can be used to evaluate model fit when selecting the degree of a polynomial?
  - Use:

    - Cross-validation
    - AIC/BIC
    - Adjusted R²
    - Residual plots
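
Cross-validation for degree selection can be sketched with NumPy alone. The example assumes simulated data with a quadratic true relationship; the helper `cv_mse` and its parameters are illustrative, not a standard API.

```python
import numpy as np

def cv_mse(x, y, degree, k=5, seed=0):
    """Mean squared error of a polynomial fit, averaged over k folds."""
    idx = np.random.default_rng(seed).permutation(len(x))
    folds = np.array_split(idx, k)
    errors = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        # Fit on the training folds, evaluate on the held-out fold
        coefs = np.polyfit(x[train], y[train], deg=degree)
        pred = np.polyval(coefs, x[test])
        errors.append(np.mean((y[test] - pred) ** 2))
    return float(np.mean(errors))

# Simulated data (hypothetical): quadratic relationship plus noise
rng = np.random.default_rng(3)
x = np.linspace(-3, 3, 120)
y = 1.0 - 2.0 * x + 0.5 * x ** 2 + rng.normal(scale=0.5, size=x.size)

scores = {d: cv_mse(x, y, d) for d in range(1, 7)}
best_degree = min(scores, key=scores.get)

print(scores)
print(best_degree)
```

The held-out error typically drops sharply once the degree is high enough to capture the curvature and then flattens or rises, so a common choice is the lowest degree near the minimum.
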
30. Why is visualization important in polynomial regression?
  - Visualization helps:
    - Assess model fit
    - Detect overfitting
    - Understand curvature
    - Validate assumptions

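# 31. How is polynomial regression implemented in Python?
# Use PolynomialFeatures from sklearn.preprocessing to generate polynomial terms,
# then fit a linear regression model on the expanded features.
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

# Illustrative data following y = 1 + 2x + 3x^2; replace with your own X and y
X = np.arange(6, dtype=float).reshape(-1, 1)
y = 1 + 2 * X.ravel() + 3 * X.ravel() ** 2

# Example
poly = PolynomialFeatures(degree=2)
X_poly = poly.fit_transform(X)  # columns: 1, x, x^2
model = LinearRegression().fit(X_poly, y)

# New inputs must go through the same transform before predicting
print(model.predict(poly.transform([[6.0]])))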