Question 1: What is Simple Linear Regression?

- Answer- Simple Linear Regression is a statistical method used to model the relationship between two variables:

 One independent variable (predictor, input, or feature), typically denoted as X.

 One dependent variable (response or output), typically denoted as Y.

 The model fits a straight line, Y = β₀ + β₁X + ε, where β₀ is the intercept, β₁ is the slope, and ε is the error term.
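
As a rough illustration of how the straight line is obtained, the sketch below computes the least-squares slope and intercept directly from the textbook formulas (slope = Sxy / Sxx, intercept = mean(Y) - slope × mean(X)). The function name `fit_simple_linear_regression` and the small data set are assumptions made for this example only.

```python
def fit_simple_linear_regression(x, y):
    """Return (intercept, slope) of the least-squares line through (x, y)."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    # Slope = sum of cross-deviations / sum of squared deviations of x.
    s_xy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_mean) ** 2 for xi in x)
    slope = s_xy / s_xx
    # The fitted line always passes through the point (x_mean, y_mean).
    intercept = y_mean - slope * x_mean
    return intercept, slope


# Illustrative data (assumed for this sketch)
x = [1, 2, 3, 4, 5]
y = [2.1, 4.3, 6.1, 7.9, 10.2]

intercept, slope = fit_simple_linear_regression(x, y)
print(f"intercept = {intercept:.2f}, slope = {slope:.2f}")
```

A library such as scikit-learn should give the same two numbers for this data; the hand-written version only makes the arithmetic visible.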

Question 2: What are the key assumptions of Simple Linear Regression?

- Answer- Linearity – Relationship between X and Y is linear.

 Independence – Observations are independent.

 Homoscedasticity – Constant variance of residuals.

 Normality – Residuals are normally distributed.

 No Multicollinearity – Not applicable in simple regression (only one predictor).

Question 3: What is heteroscedasticity, and why is it important to address in regression
models?

- Answer- Heteroscedasticity means that the variance of the residuals (errors) is not constant across all values of the independent variable.

 In simple terms:

 The spread of the errors changes as the value of X changes.

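 This violates one of the key assumptions of linear regression (homoscedasticity). The coefficient estimates remain unbiased, but their standard errors become unreliable, so confidence intervals and hypothesis tests can be misleading.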
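
To make the idea concrete, here is a minimal sketch that builds a small synthetic data set whose error size grows with X, fits a straight line, and compares the residual variance for small and large X. The data, the split at X = 10, and all variable names are assumptions chosen for illustration; this is an informal check, not a formal test.

```python
import numpy as np

# Synthetic data (an assumption for illustration): the error magnitude
# grows in proportion to x, so the data are heteroscedastic by construction.
x = np.arange(1, 21, dtype=float)
signs = np.where(np.arange(1, 21) % 2 == 0, 1.0, -1.0)
errors = 0.5 * x * signs
y = 2.0 + 3.0 * x + errors

# Ordinary least squares fit of a straight line.
# np.polyfit returns the highest power first: [slope, intercept].
slope, intercept = np.polyfit(x, y, deg=1)
residuals = y - (intercept + slope * x)

# Compare the residual spread for small x against large x.
low_var = residuals[:10].var()
high_var = residuals[10:].var()
print(f"residual variance, x <= 10: {low_var:.2f}")
print(f"residual variance, x >  10: {high_var:.2f}")
print(f"ratio (high / low): {high_var / low_var:.1f}")
```

A ratio well above 1 indicates that the spread of the errors changes with X, which is the pattern a residuals plot would show as a funnel shape.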

Question 4: What is Multiple Linear Regression?
- Answer- Multiple Linear Regression (MLR) is a statistical method used to model the relationship between one dependent variable (Y) and two or more independent variables (X₁, X₂, X₃, ...). The model has the form Y = β₀ + β₁X₁ + β₂X₂ + ⋯ + βₙXₙ + ε.

Question 5: What is polynomial regression, and how does it differ from linear
regression?
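- Answer- Polynomial Regression is a type of regression where the relationship between the independent variable X and the dependent variable Y is modeled as an nth-degree polynomial.

 Equation:
Y = β₀ + β₁X + β₂X² + β₃X³ + ⋯ + βₙXⁿ + ε

 Difference from linear regression: simple linear regression fits a straight line (the degree-1 case), while polynomial regression adds powers of X to capture curvature. The model is still linear in the coefficients β, so it is fitted with the same least-squares method.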
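
Although the curve is nonlinear in X, the polynomial model is linear in its coefficients, so it can be fitted by ordinary least squares on the columns 1, X, X². The sketch below shows this for a degree-2 fit and cross-checks the result against `np.polyfit`. The data values and variable names are assumptions for illustration.

```python
import numpy as np

# Illustrative data (assumed for this sketch): roughly quadratic in x.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.2, 4.8, 7.5, 11.2, 14.7])

# Design matrix with columns 1, X, X^2: the model is nonlinear in X
# but linear in the coefficients, so ordinary least squares applies.
design = np.column_stack([np.ones_like(x), x, x**2])
beta, *_ = np.linalg.lstsq(design, y, rcond=None)

# np.polyfit solves the same problem; it returns the highest power first,
# so the result is reversed to get the order beta0, beta1, beta2.
beta_polyfit = np.polyfit(x, y, deg=2)[::-1]

print("beta from design matrix:", np.round(beta, 4))
print("beta from np.polyfit:   ", np.round(beta_polyfit, 4))
```

Both routes should agree up to floating-point error, which is why polynomial regression is usually treated as a special case of multiple linear regression.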





In [2]:
#Question 6: Implement a Python program to fit a Simple Linear Regression model to the following sample data:
#● X = [1, 2, 3, 4, 5]
#● Y = [2.1, 4.3, 6.1, 7.9, 10.2]
#Plot the regression line over the data points.
# Import necessary libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression

# Sample data
X = np.array([1, 2, 3, 4, 5]).reshape(-1, 1)  # Reshape to a column: sklearn expects 2-D features
Y = np.array([2.1, 4.3, 6.1, 7.9, 10.2])

# Create and fit the model
model = LinearRegression()
model.fit(X, Y)

# Get the regression line values
Y_pred = model.predict(X)

# Print model parameters
print("Intercept (β0):", model.intercept_)
print("Slope (β1):", model.coef_[0])

# Plot the data points and regression line
plt.scatter(X, Y, color='blue', label='Actual Data')
plt.plot(X, Y_pred, color='red', label='Regression Line')
plt.xlabel("X")
plt.ylabel("Y")
plt.title("Simple Linear Regression")
plt.legend()
plt.grid(True)
plt.show()

In [3]:
#Question 7: Fit a Multiple Linear Regression model on this sample data:
#● Area = [1200, 1500, 1800, 2000]
#● Rooms = [2, 3, 3, 4]
#● Price = [250000, 300000, 320000, 370000]
import pandas as pd
from sklearn.linear_model import LinearRegression
from statsmodels.stats.outliers_influence import variance_inflation_factor
import statsmodels.api as sm

# Sample data
data = pd.DataFrame({
    'Area': [1200, 1500, 1800, 2000],
    'Rooms': [2, 3, 3, 4],
    'Price': [250000, 300000, 320000, 370000]
})

# Independent variables and target
X = data[['Area', 'Rooms']]
y = data['Price']

# Fit the Multiple Linear Regression model
model = LinearRegression()
model.fit(X, y)

# Print model coefficients
print("Intercept (β0):", model.intercept_)
print("Coefficients (β1, β2):", model.coef_)

# Add constant term for VIF calculation
X_vif = sm.add_constant(X)

# Calculate VIF (the 'const' row belongs to the intercept, not to a predictor)
vif_data = pd.DataFrame()
vif_data['Feature'] = X_vif.columns
vif_data['VIF'] = [variance_inflation_factor(X_vif.values, i) for i in range(X_vif.shape[1])]

print("\nVariance Inflation Factor (VIF):")
print(vif_data)

In [5]:
#Question 8: Implement polynomial regression on the following data:
#● X = [1, 2, 3, 4, 5]
#● Y = [2.2, 4.8, 7.5, 11.2, 14.7]
#Fit a 2nd-degree polynomial and plot the resulting curve.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

# Sample data
X = np.array([1, 2, 3, 4, 5]).reshape(-1, 1)
Y = np.array([2.2, 4.8, 7.5, 11.2, 14.7])

# Create 2nd-degree polynomial features (X and X^2).
# include_bias=False because LinearRegression fits the intercept itself,
# so coef_ holds exactly the two values β1 and β2.
poly = PolynomialFeatures(degree=2, include_bias=False)
X_poly = poly.fit_transform(X)

# Fit the Polynomial Regression model
model = LinearRegression()
model.fit(X_poly, Y)

# Predict on a dense grid so the fitted curve is drawn smoothly
X_grid = np.linspace(1, 5, 100).reshape(-1, 1)
Y_grid = model.predict(poly.transform(X_grid))

# Print model coefficients
print("Intercept (β0):", model.intercept_)
print("Coefficients (β1, β2):", model.coef_)

# Plot the original data and the polynomial curve
plt.scatter(X, Y, color='blue', label='Actual Data')
plt.plot(X_grid, Y_grid, color='red', label='2nd Degree Polynomial Fit')
plt.xlabel("X")
plt.ylabel("Y")
plt.title("Polynomial Regression (Degree 2)")
plt.legend()
plt.grid(True)
plt.show()

In [6]:
#Question 9: Create a residuals plot for a regression model trained on this data:
#● X = [10, 20, 30, 40, 50]
#● Y = [15, 35, 40, 50, 65]
#Assess heteroscedasticity by examining the spread of residuals.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression

# Sample data
X = np.array([10, 20, 30, 40, 50]).reshape(-1, 1)
Y = np.array([15, 35, 40, 50, 65])

# Train a simple linear regression model
model = LinearRegression()
model.fit(X, Y)

# Predict values
Y_pred = model.predict(X)

# Calculate residuals (actual minus predicted)
residuals = Y - Y_pred

# Print residuals
print("Residuals:", residuals)

# Plot residuals against X; a funnel shape would suggest heteroscedasticity
plt.scatter(X, residuals, color='purple', marker='o')
plt.axhline(y=0, color='black', linestyle='--')
plt.xlabel("X")
plt.ylabel("Residuals")
plt.title("Residuals Plot")
plt.grid(True)
plt.show()

# Assessment: the residuals do not widen or narrow systematically as X grows,
# so there is no clear sign of heteroscedasticity, although five points are
# too few for a firm conclusion.

Question 10: Imagine you are a data scientist working for a real estate company. You
need to predict house prices using features like area, number of rooms, and location.
However, you detect heteroscedasticity and multicollinearity in your regression
model. Explain the steps you would take to address these issues and ensure a robust
model.

- Answer- Handling Heteroscedasticity & Multicollinearity in Regression

 Heteroscedasticity:

 Apply a log or square root transformation to the target variable (price).

 Use Weighted Least Squares, or keep OLS and report heteroscedasticity-robust standard errors.

 Check for outliers and remove only those that are data errors.
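
As one possible illustration of the Weighted Least Squares option, the sketch below fits the same synthetic data with OLS and with WLS. It assumes the error variance is proportional to x², so each observation is weighted by 1 / x². The data, the weighting rule, and the variable names are assumptions for this example; in practice the weights have to be justified from the data.

```python
import numpy as np

# Synthetic data (assumed): the error magnitude grows in proportion to x.
x = np.arange(1, 21, dtype=float)
signs = np.where(np.arange(1, 21) % 2 == 0, 1.0, -1.0)
y = 2.0 + 3.0 * x + 0.5 * x * signs

design = np.column_stack([np.ones_like(x), x])  # columns: intercept, x

# Ordinary least squares: every observation counts equally.
ols_beta, *_ = np.linalg.lstsq(design, y, rcond=None)

# Weighted least squares: weight = 1 / (assumed error variance) = 1 / x^2.
# Scaling each row by sqrt(weight) turns WLS into an ordinary
# least-squares problem on the rescaled data.
weights = 1.0 / x**2
root_w = np.sqrt(weights)
wls_beta, *_ = np.linalg.lstsq(design * root_w[:, None], y * root_w, rcond=None)

print("OLS intercept, slope:", np.round(ols_beta, 3))
print("WLS intercept, slope:", np.round(wls_beta, 3))
```

The noisy high-x observations pull less on the WLS fit, which is the intended effect of down-weighting them.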

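 Multicollinearity:

 Remove or combine highly correlated features (e.g., rooms and area), using the Variance Inflation Factor (VIF) to identify them.

 Use PCA to replace correlated variables with uncorrelated components.

 Apply Ridge or Lasso regression to shrink unstable coefficients.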
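
The multicollinearity steps can be sketched roughly as follows: compute a VIF for each feature, then fit a ridge regression and watch the coefficients shrink as the penalty grows. The helper names `vif` and `ridge_fit`, the penalty values, and the six made-up houses are all assumptions for illustration, not real market data.

```python
import numpy as np


def vif(features):
    """VIF of each column: 1 / (1 - R^2) from regressing it on the others."""
    n, k = features.shape
    out = []
    for j in range(k):
        target = features[:, j]
        others = np.column_stack([np.ones(n), np.delete(features, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ coef
        r2 = 1.0 - (resid @ resid) / np.sum((target - target.mean()) ** 2)
        out.append(1.0 / (1.0 - r2))
    return np.array(out)


def ridge_fit(features, target, alpha):
    """Closed-form ridge on standardized features (intercept not penalized)."""
    z = (features - features.mean(axis=0)) / features.std(axis=0)
    centered = target - target.mean()
    k = z.shape[1]
    return np.linalg.solve(z.T @ z + alpha * np.eye(k), z.T @ centered)


# Made-up illustrative data: area and rooms move together closely.
area = np.array([1200, 1500, 1800, 2000, 2300, 2600], dtype=float)
rooms = np.array([2, 3, 3, 4, 4, 5], dtype=float)
price = np.array([250000, 300000, 320000, 370000, 400000, 450000], dtype=float)
X = np.column_stack([area, rooms])

vifs = vif(X)
print("VIF (area, rooms):", np.round(vifs, 2))

for alpha in (0.0, 1.0, 10.0):
    beta = ridge_fit(X, price, alpha)
    print(f"alpha={alpha:>4}: standardized coefficients = {np.round(beta, 1)}")
```

A common rule of thumb treats a VIF above about 5 to 10 as a warning sign. With `alpha=0.0` the ridge solution equals ordinary least squares on the standardized features; larger values trade a little bias for more stable coefficients.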