# Question 1: What is Simple Linear Regression (SLR)? Explain its purpose.

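 * Simple Linear Regression (SLR) is a statistical method that models the relationship between one independent variable (X) and one dependent variable (Y). It finds the best-fitting straight line (called the regression line), which is then used to predict the value of Y from X.
   The general equation of the line is:

                          Y = b₀ + b₁X + ε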


 Where:

 Y → Dependent variable (the variable we want to predict)

 X → Independent variable (the predictor)

 b₀ → Intercept (value of Y when X = 0)

 b₁ → Slope (change in Y for a one-unit change in X)

 ε → Error term (difference between the actual value of Y and the value b₀ + b₁X given by the line)

 * Purpose of Simple Linear Regression

   * Prediction:
     To predict the value of one variable (Y) based on another variable (X).
     Example: Predicting a student’s exam score (Y) from the number of study hours (X).

   * Relationship Analysis:
     To determine whether there is a linear relationship between two variables.
     Example: Studying how temperature (X) affects ice cream sales (Y).

# Question 2: What are the key assumptions of Simple Linear Regression?

  * 1. Linearity:
    The relationship between the independent variable (X) and the dependent variable (Y) should be linear.

  * 2. Independence of Errors:
    The residuals (errors) should be independent of each other.

  * 3. Homoscedasticity (Constant Variance of Errors):
    The variance of residuals should be constant across all levels of X.

  * 4. Normality of Errors:
    The residuals (errors) should be normally distributed.
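
The assumptions above are usually checked on the residuals after fitting. The following is a minimal sketch of some rough numeric checks, assuming NumPy is available. The data is synthetic and generated only for illustration, and the particular checks chosen (the Durbin-Watson statistic, a split-half variance ratio, and sample skewness) are assumptions of this sketch rather than a complete diagnostic procedure.

```python
import numpy as np

# Synthetic data (an assumption for illustration): a linear trend plus random noise.
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = 3.0 + 2.0 * x + rng.normal(0.0, 1.0, size=x.size)

# Fit a straight line; for degree 1, np.polyfit returns [slope, intercept].
slope, intercept = np.polyfit(x, y, 1)
residuals = y - (intercept + slope * x)

# Independence: the Durbin-Watson statistic lies between 0 and 4 and is
# close to 2 when consecutive residuals are uncorrelated.
durbin_watson = np.sum(np.diff(residuals) ** 2) / np.sum(residuals ** 2)

# Homoscedasticity: compare residual variance in the lower and upper halves
# of X; a ratio far from 1 hints at non-constant variance.
half = x.size // 2
variance_ratio = residuals[half:].var() / residuals[:half].var()

# Normality: sample skewness is near 0 for a symmetric distribution.
centered = residuals - residuals.mean()
skewness = np.mean(centered ** 3) / residuals.std() ** 3

print(f"Durbin-Watson:  {durbin_watson:.2f}")
print(f"Variance ratio: {variance_ratio:.2f}")
print(f"Skewness:       {skewness:.2f}")
```

These numbers are only rough indicators. Plots such as residuals against fitted values are often used alongside them, especially for judging linearity.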

# Question 3: Write the mathematical equation for a simple linear regression model and explain each term.

  * The general form of a simple linear regression model is:

                              Y = b₀ + b₁X + ε

  * Y = The dependent variable: the variable we are trying to predict or explain. It changes in response to X.
  * X = The independent variable: the predictor or input variable used to explain or predict Y.
  * b₀ = The intercept: the value of Y when X = 0. It represents where the regression line crosses the Y-axis.
  * b₁ = The slope: the change in Y for a one-unit change in X. Its sign gives the direction of the relationship and its magnitude gives the size of the change.
  * ε (epsilon) = The error term: the random deviation of the observed values from the regression line. It captures the influence of factors not included in the model.

# Question 4: Provide a real-world example where simple linear regression can be applied.
  
  * A common real-world example of **Simple Linear Regression (SLR)** is predicting **house prices based on house size**. In this case, the **dependent variable (Y)** is the price of the house, and the **independent variable (X)** is the size of the house in square feet. By collecting data on several houses, such as their sizes and selling prices, we can use SLR to find a straight-line relationship between the two. For instance, the model might be expressed as *Price = 20 + 0.045 × (Size)*, where 20 represents the base price (in lakhs) when the size is zero, and 0.045 indicates that for every additional square foot, the price increases by ₹4,500 on average. This model helps real estate agents and buyers **predict the price of a house** based on its area and understand how **size influences cost**. Simple Linear Regression is widely applied in such cases where one variable depends directly on another, such as predicting student marks from study hours or estimating sales based on advertising spend.
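
To make the example model concrete, here is a small sketch that evaluates *Price = 20 + 0.045 × (Size)* for a few house sizes. The function name and the sizes are hypothetical and chosen only for illustration; the coefficients are the ones quoted in the example above.

```python
LAKH = 100_000  # 1 lakh = 100,000 rupees

def predict_price_lakhs(size_sqft):
    """Example model from the text: Price = 20 + 0.045 * Size (price in lakhs)."""
    return 20 + 0.045 * size_sqft

# Sizes are hypothetical inputs chosen for illustration.
for size in (800, 1000, 1500):
    print(f"{size} sq ft -> {predict_price_lakhs(size):.1f} lakhs")

# The slope expressed in rupees per additional square foot.
rupees_per_sqft = 0.045 * LAKH
print(f"Each extra sq ft adds about {rupees_per_sqft:,.0f} rupees")
```

Converting the slope from lakhs to rupees (0.045 × 100,000) gives the figure of 4,500 rupees per square foot quoted in the text.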

# Question 5: What is the method of least squares in linear regression?

  * The **method of least squares** is a statistical technique used in **linear regression** to find the best-fitting straight line through a set of data points. It works by minimizing the **sum of the squared differences** between the actual observed values and the values predicted by the regression line. These differences, known as **residuals or errors**, represent how far each data point is from the fitted line. Squaring the errors before summing them prevents positive and negative deviations from cancelling out, and gives larger errors more weight. The goal is to find the line that makes this total squared error as small as possible. Mathematically, the regression line is expressed as Y = b₀ + b₁X, where b₀ is the intercept and b₁ is the slope. The least squares method calculates these values so that the line best represents the relationship between the independent variable X and the dependent variable Y. This approach is widely used because it has a simple closed-form solution and, when the standard regression assumptions hold (the Gauss-Markov conditions), it gives the best linear unbiased estimates of the coefficients.
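
For a single predictor, minimizing the sum of squared errors has a well-known closed-form solution: b₁ = Σ(xᵢ − x̄)(yᵢ − ȳ) / Σ(xᵢ − x̄)² and b₀ = ȳ − b₁x̄. The sketch below applies these formulas in plain Python to the study-hours data used in Question 9. The variable names are chosen here for illustration.

```python
# Study-hours data from Question 9.
x = [2, 4, 6, 8, 10]
y = [50, 60, 70, 80, 90]

x_mean = sum(x) / len(x)
y_mean = sum(y) / len(y)

# Slope: sum of cross-deviations divided by the sum of squared deviations of X.
s_xy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
s_xx = sum((xi - x_mean) ** 2 for xi in x)
b1 = s_xy / s_xx

# Intercept: the fitted line passes through the point of means (x_mean, y_mean).
b0 = y_mean - b1 * x_mean

# The quantity that least squares minimizes: the sum of squared residuals.
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))

print(f"b1 = {b1}, b0 = {b0}, SSE = {sse}")
```

Because this data lies exactly on a line, the sum of squared residuals is zero and the result is a slope of 5 and an intercept of 40.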

# Question 6: What is Logistic Regression? How does it differ from Linear Regression?

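 * Logistic Regression is a statistical method used to predict the probability of a categorical (usually binary) outcome based on one or more independent variables. Unlike Linear Regression, which predicts continuous values, Logistic Regression is used for discrete outcomes such as Yes/No, Pass/Fail, or Spam/Not Spam: it predicts a probability, which is then converted to a class using a threshold (commonly 0.5).

 * In Logistic Regression, the relationship between the dependent variable and the independent variables is modeled using a sigmoid (S-shaped) curve, which converts the linear output b₀ + b₁X into a probability value between 0 and 1. The two methods also differ in how they are fitted: Linear Regression typically uses least squares, while Logistic Regression is typically fitted by maximum likelihood.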
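
The sigmoid step can be sketched in a few lines of Python. The coefficients `b0` and `b1` below are hypothetical values picked for illustration, not fitted from data, and the 0.5 threshold is a common default rather than a requirement.

```python
import math

def sigmoid(z):
    """Map any real number to a value strictly between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical coefficients for a Pass/Fail model based on study hours.
b0, b1 = -4.0, 1.0

for hours in (2, 4, 6):
    linear_output = b0 + b1 * hours       # unbounded, as in linear regression
    probability = sigmoid(linear_output)  # squeezed into (0, 1)
    label = "Pass" if probability >= 0.5 else "Fail"
    print(f"hours={hours}: p(pass)={probability:.3f} -> {label}")
```

A linear output of 0 maps to a probability of exactly 0.5, so the sign of b₀ + b₁X decides the predicted class under this threshold.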

# Question 7: Name and briefly describe three common evaluation metrics for regression models.

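  * 1. Mean Absolute Error (MAE)

    MAE measures the average absolute difference between the actual and predicted values. It shows how far, on average, the predictions are from the actual outcomes, in the same units as Y.

  * 2. Mean Squared Error (MSE)

    MSE calculates the average of the squared differences between the actual and predicted values. Because the errors are squared, large errors are penalized more heavily than small ones, and the result is in squared units of Y.

  * 3. R-squared (R²) – Coefficient of Determination

    R² measures how well the regression model explains the variation in the dependent variable. It is the proportion of the variance in Y that the model accounts for, so values closer to 1 indicate a better fit.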
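
The three metrics can be computed directly from their definitions. The sketch below uses plain Python. The `actual` and `predicted` lists are made-up numbers for illustration, not results from a real model.

```python
# Made-up values for illustration only.
actual = [50, 60, 70, 80, 90]
predicted = [52, 58, 71, 79, 93]

n = len(actual)
errors = [a - p for a, p in zip(actual, predicted)]

# MAE: average of the absolute errors.
mae = sum(abs(e) for e in errors) / n

# MSE: average of the squared errors.
mse = sum(e ** 2 for e in errors) / n

# R²: 1 minus (residual sum of squares / total sum of squares around the mean).
mean_actual = sum(actual) / n
ss_res = sum(e ** 2 for e in errors)
ss_tot = sum((a - mean_actual) ** 2 for a in actual)
r2 = 1 - ss_res / ss_tot

print(f"MAE = {mae:.3f}")
print(f"MSE = {mse:.3f}")
print(f"R²  = {r2:.3f}")
```

For these numbers the absolute errors sum to 9 and the squared errors sum to 19, giving MAE = 1.8, MSE = 3.8, and R² = 1 − 19/1000 = 0.981.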

# Question 8: What is the purpose of the R-squared metric in regression analysis?

  * The R-squared (R²) metric, also known as the coefficient of determination, measures how well a regression model explains the variability of the dependent variable using the independent variable(s). In other words, it shows the proportion of variance in the dependent variable (Y) that can be explained by the independent variable(s) (X) in the model.

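  For a least squares model with an intercept, evaluated on the data it was fitted to, R² ranges between 0 and 1:

    * An R² of 0 means the model does not explain any of the variation in Y; the predictions are no better than always predicting the mean of Y.

    * An R² of 1 means the model perfectly explains all the variation in Y; the predictions match the actual values exactly.

  On new data, or for a model without an intercept, R² can be negative, which means the predictions are worse than always predicting the mean of Y.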
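
The boundary cases can be checked with the definition R² = 1 − SS_res / SS_tot. This is a minimal sketch in which the helper name `r_squared` and the prediction lists are hypothetical. The third case goes beyond the two cases listed above: it shows that predictions worse than the mean produce a negative R².

```python
def r_squared(actual, predicted):
    """R² = 1 - (sum of squared residuals) / (total sum of squares)."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot

actual = [50, 60, 70, 80, 90]

# Always predicting the mean of Y explains none of the variation.
print(r_squared(actual, [70, 70, 70, 70, 70]))   # 0.0

# Perfect predictions explain all of the variation.
print(r_squared(actual, [50, 60, 70, 80, 90]))   # 1.0

# Predictions worse than the mean (here, the trend is reversed) give a negative R².
print(r_squared(actual, [90, 80, 70, 60, 50]))   # -3.0
```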

# Question 9: Write Python code to fit a simple linear regression model using scikit-learn and print the slope and intercept.


```python
# Import necessary libraries
import numpy as np
from sklearn.linear_model import LinearRegression

# Example dataset
# X = independent variable (e.g., hours studied)
# Y = dependent variable (e.g., exam scores)
X = np.array([2, 4, 6, 8, 10]).reshape(-1, 1)  # Reshape to 2D array
Y = np.array([50, 60, 70, 80, 90])

# Create and fit the model
model = LinearRegression()
model.fit(X, Y)

# Get slope (coefficient) and intercept
slope = model.coef_[0]
intercept = model.intercept_

# Print the results
print(f"Slope (b1): {slope}")
print(f"Intercept (b0): {intercept}")
```

Output:

```
Slope (b1): 4.999999999999999
Intercept (b0): 40.00000000000001
```


# Question 10: How do you interpret the coefficients in a simple linear regression model?

  * Intercept (b₀):

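    Represents the predicted value of the dependent variable (Y) when the independent variable (X) is 0.

    Acts as the starting point or baseline of the regression line. It has a practical meaning only when X = 0 is a realistic value within the range of the data.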

  * Slope (b₁):

    Indicates the change in Y for a one-unit increase in X.

    Its sign shows the direction of the relationship, and its magnitude shows how much Y changes per unit of X:

      * Positive slope → Y increases as X increases.

      * Negative slope → Y decreases as X increases.
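
Both interpretations can be verified numerically. The sketch below uses the coefficients from the Question 9 output, rounded to b₀ = 40 and b₁ = 5. The `predict` helper is a hypothetical function written for this illustration.

```python
# Coefficients from the Question 9 output, rounded (hours studied -> exam score).
b0, b1 = 40.0, 5.0

def predict(hours):
    """Evaluate the fitted line Y = b0 + b1 * X."""
    return b0 + b1 * hours

# Intercept: the predicted score when hours studied is 0.
baseline = predict(0)

# Slope: the change in the predicted score for one extra hour of study.
# For a straight line, the starting point (6 hours here) does not matter.
effect_of_one_hour = predict(7) - predict(6)

print(f"Predicted score at 0 hours: {baseline}")
print(f"Change per additional hour: {effect_of_one_hour}")
```

In this example, a student who studies 0 hours is predicted to score 40, and each additional hour of study raises the predicted score by 5 points. The slope is positive, so Y increases as X increases.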