## Implementing Gradient Descent
**DUCAT ML Class 3**

### First Order Linear Equation

$$y = b_0 + b_1 x_1 + e$$
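The gradient-descent loop in this section minimizes the mean squared error (MSE). Write the prediction for observation $i$ as $\hat{y}_i = w_0 + w_1 x_i$, where the code's `w0` and `w1` play the roles of $b_0$ and $b_1$, and $m$ is the number of observations. The cost, and the partial derivatives that the loop computes as `dw0` and `dw1`, are:

$$
J(w_0, w_1) = \frac{1}{m}\sum_{i=1}^{m} (\hat{y}_i - y_i)^2,
\qquad
\frac{\partial J}{\partial w_0} = \frac{2}{m}\sum_{i=1}^{m} (\hat{y}_i - y_i),
\qquad
\frac{\partial J}{\partial w_1} = \frac{2}{m}\sum_{i=1}^{m} (\hat{y}_i - y_i)\, x_i
$$

Each iteration then moves every weight against its gradient: $w_k := w_k - \alpha \, \partial J / \partial w_k$.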

```python
import numpy as np
```

```python
# Feature x1: 99 values, each i (1..99) times a random integer from 1 to 9
n1 = np.array([i * np.random.randint(1, 10) for i in range(1, 100)])

# Target y: an exact linear function of n1 (b0 = 4, b1 = 2.3, no error term)
n2 = n1 * 2.3 + 4
```

```python
n1
```

```python
n2
```

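```python
# Random initial weights in [0, 1), rounded to 2 decimals
w0 = round(np.random.random(), 2)   # intercept (b0)
w1 = round(np.random.random(), 2)   # slope (b1)
alpha = 0.000001                    # learning rate; small because n1 values reach the hundreds
m = len(n1)

for i in range(1, 5001):
    yp = w0 + w1 * n1                   # predictions
    j = np.sum((yp - n2) ** 2) / m      # cost: mean squared error
    if i % 500 == 0:
        print(f"Iteration {i}, Cost = {j}")
    if j < 1:                           # stop once the MSE drops below 1
        break
    # Partial derivatives of the MSE with respect to w0 and w1
    dw0 = (2 / m) * np.sum(yp - n2)
    dw1 = (2 / m) * np.sum((yp - n2) * n1)
    # Move each weight against its gradient
    w0 = w0 - alpha * dw0
    w1 = w1 - alpha * dw1
```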
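A common way to confirm hand-derived gradients such as `dw0` and `dw1` is a finite-difference check: nudge one weight by a small `h` in each direction and compare the change in MSE with the analytic formula. A minimal sketch on a small hypothetical dataset (the data, `h` and the test weights are illustrative choices, not values from the class):

```python
import numpy as np

def mse(w0, w1, x, y):
    """Mean squared error of the line w0 + w1 * x."""
    return np.mean((w0 + w1 * x - y) ** 2)

def analytic_grad(w0, w1, x, y):
    """Gradients as in the loop: (2/m) * sum(err) and (2/m) * sum(err * x)."""
    err = w0 + w1 * x - y
    m = len(x)
    return (2 / m) * np.sum(err), (2 / m) * np.sum(err * x)

def numeric_grad(w0, w1, x, y, h=1e-6):
    """Central finite differences of the MSE with respect to w0 and w1."""
    d0 = (mse(w0 + h, w1, x, y) - mse(w0 - h, w1, x, y)) / (2 * h)
    d1 = (mse(w0, w1 + h, x, y) - mse(w0, w1 - h, x, y)) / (2 * h)
    return d0, d1

# Hypothetical small dataset following the same rule y = 2.3 * x + 4
rng = np.random.default_rng(0)
x = rng.integers(1, 10, size=20).astype(float)
y = 2.3 * x + 4
w0, w1 = 0.5, 0.5

print("analytic:", analytic_grad(w0, w1, x, y))
print("numeric: ", numeric_grad(w0, w1, x, y))
```

If the two pairs disagree beyond rounding, the gradient formula (or its sign) is wrong.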

```python
print(w0, w1)
```
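Because `n2` is an exact linear function of `n1`, a closed-form least-squares fit should recover b1 = 2.3 and b0 = 4, which gives a reference for how far gradient descent got in 5000 steps. A minimal sketch using `np.polyfit` (a closed-form fit, not gradient descent), assuming a fixed seed that the original cell does not set:

```python
import numpy as np

# Rebuild the data the same way as above; the seed is an assumption for repeatability
np.random.seed(0)
n1 = np.array([i * np.random.randint(1, 10) for i in range(1, 100)])
n2 = n1 * 2.3 + 4

# Closed-form least-squares line; for deg=1 polyfit returns [slope, intercept]
slope, intercept = np.polyfit(n1, n2, deg=1)
print(slope, intercept)   # very close to 2.3 and 4
```

With a single learning rate, the intercept usually lags furthest behind, because its gradient is not multiplied by the large `n1` values the way the slope's gradient is.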

# Causal Model - OLS

$$y = b_0 + b_1 x_1 + b_2 x_2 + e$$

with $x_1$ = `instrument` and $x_2$ = `exogenous_variable` in the code.

```python
import numpy as np
import statsmodels.api as sm

# Generate synthetic data
np.random.seed(42)
n_obs = 1000

# Placeholder draw: overwritten below, but kept so the seeded random stream
# (and therefore the summary values interpreted below) stays the same
endogenous_variable = np.random.normal(size=n_obs)

# Instrument (x1): pure noise; the outcome below does not depend on it
instrument = np.random.normal(size=n_obs)

# Exogenous variable (x2), drawn independently of the instrument
exogenous_variable = np.random.normal(size=n_obs)

# Error term
error = np.random.normal(size=n_obs)

# True relationship: y = beta * exogenous_variable + error
beta = 0.5
endogenous_variable = beta * exogenous_variable + error

# OLS of y on [const, instrument, exogenous_variable].
# This is ordinary least squares with the instrument as an extra regressor,
# not a two-stage (2SLS) instrumental-variable estimate.
X = sm.add_constant(np.column_stack((instrument, exogenous_variable)))
ols_model = sm.OLS(endogenous_variable, X)
ols_results = ols_model.fit()

# Display regression results
print(ols_results.summary())
```


Below is a short explanation of each term in the OLS regression summary:

---

### **Model Overview**

* **Dep. Variable (y):** The target variable being predicted.
* **Model (OLS):** Ordinary Least Squares linear regression.
* **Method (Least Squares):** The coefficients are chosen to minimize the sum of squared residuals (observed `y` minus predicted `y`).

---

### **Model Fit Metrics**

* **R-squared (0.203):** About **20.3%** of the variation in `y` is explained by the model.
* **Adj. R-squared (0.201):** R-squared adjusted for number of predictors; useful for model comparison.
* **F-statistic (126.7):** Tests the null hypothesis that all slope coefficients (here `x1` and `x2`) are zero, i.e. that the model explains nothing beyond the intercept.
* **Prob (F-statistic) (9.35e-50):** The p-value of that test; it is tiny, so the model is **overall significant**.
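Adjusted R-squared and the F-statistic both follow from R-squared, the number of observations $n$ and the number of predictors $k$. The simulation itself targets an R-squared near 0.2: the signal `0.5 * x2` has variance 0.25 and the error variance is 1, so 0.25 / 1.25 = 0.2. A short arithmetic sketch using the rounded summary values; because R-squared is rounded to three decimals, the F-statistic lands near 127 rather than exactly on 126.7:

```python
# Values read from the summary
r2 = 0.203      # R-squared (rounded to 3 decimals)
n = 1000        # No. Observations
k = 2           # predictors, excluding the constant

# Adjusted R-squared: penalizes extra predictors
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Overall F-test of H0: every slope coefficient is zero
f_stat = (r2 / k) / ((1 - r2) / (n - k - 1))

print(round(adj_r2, 3))   # 0.201
print(round(f_stat, 1))   # 127.0
```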

---

### **Model Size & Complexity**

* **No. Observations (1000):** Number of data points used.
* **Df Model (2):** Number of predictors, excluding the constant (`x1`, `x2`).
* **Df Residuals (997):** Degrees of freedom left after fitting: observations minus estimated coefficients, 1000 - 3 = 997.

---

### **Information Criteria**

* **Log-Likelihood (-1443.5):** Measure of model fit; higher (less negative) is better.
* **AIC (2893):** Metric balancing model fit and complexity; lower is better.
* **BIC (2908):** Similar to AIC but penalizes complexity more strongly.
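Both criteria are computed from the log-likelihood $\ln L$, the number of estimated parameters $k$ and the sample size $n$: AIC $= 2k - 2\ln L$ and BIC $= k \ln n - 2\ln L$. Taking $k = 3$ (const, `x1`, `x2`; the error variance is not counted, an assumption that is consistent with the printed values) reproduces the summary:

```python
import math

log_likelihood = -1443.5   # Log-Likelihood from the summary
n = 1000                   # observations
k = 3                      # estimated coefficients: const, x1, x2

aic = 2 * k - 2 * log_likelihood
bic = k * math.log(n) - 2 * log_likelihood

print(round(aic), round(bic))   # 2893 2908
```

BIC's penalty per parameter is $\ln n \approx 6.9$ here versus 2 for AIC, which is why BIC punishes complexity more strongly.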

---

### **Coefficients Table**

* **coef:** Estimated impact of each variable on `y`.
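* **std err:** Standard error: the estimated standard deviation of the coefficient estimate (its uncertainty).
* **t:** `coef / std err`; the test statistic for whether the coefficient differs from zero.
* **P>|t|:** Two-sided p-value of that t-test; small values (e.g. below 0.05) indicate individual significance.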
* **[0.025, 0.975]:** 95% confidence interval for the coefficient.
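The coef, std err, t and confidence-interval columns can be reproduced with plain NumPy. A minimal sketch on the same simulated data (same seed and draw order as the OLS cell), using the nonrobust formula $\widehat{\mathrm{Var}}(\hat\beta) = \hat\sigma^2 (X^T X)^{-1}$ with $\hat\sigma^2 = \text{RSS}/(n - p)$. The 95% interval uses a hard-coded critical value of about 1.962 for 997 degrees of freedom (an approximation), and p-values are omitted because they need the t distribution (for example from `scipy.stats`):

```python
import numpy as np

# Same simulated data as the OLS cell (same seed and draw order)
np.random.seed(42)
n_obs = 1000
np.random.normal(size=n_obs)                         # placeholder draw, as in the original cell
instrument = np.random.normal(size=n_obs)            # x1
exogenous_variable = np.random.normal(size=n_obs)    # x2
error = np.random.normal(size=n_obs)
y = 0.5 * exogenous_variable + error

# Design matrix with a leading column of ones: [const, x1, x2]
X = np.column_stack((np.ones(n_obs), instrument, exogenous_variable))
n, p = X.shape

# coef: least-squares solution
coef, *_ = np.linalg.lstsq(X, y, rcond=None)

# std err (nonrobust): sqrt of the diagonal of sigma^2 (X^T X)^{-1}
resid = y - X @ coef
sigma2 = resid @ resid / (n - p)         # residual variance, n - p degrees of freedom
std_err = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))

# t: coefficient divided by its standard error
t_stat = coef / std_err

# 95% CI: coef +/- t_crit * std err (t_crit hard-coded, approx. for 997 df)
t_crit = 1.962
ci_low = coef - t_crit * std_err
ci_high = coef + t_crit * std_err

for name, c, s, t, lo, hi in zip(["const", "x1", "x2"], coef, std_err, t_stat, ci_low, ci_high):
    print(f"{name:6s} coef={c:.4f} std err={s:.4f} t={t:.2f} CI=[{lo:.3f}, {hi:.3f}]")
```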

---

### **Coefficient Interpretation**

* **const:** Expected value of `y` when all predictors are zero (not significant here).
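* **x1 (instrument):** Negative estimate, but **not statistically significant** at the 5% level, which fits the simulation: `y` was generated without any dependence on the instrument.
* **x2 (exogenous_variable):** Strong positive effect on `y`, **highly statistically significant**; the data were generated with a true coefficient of 0.5.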

---

### **Covariance Type**

* **nonrobust:** Standard errors assume constant error variance (homoscedasticity) and are not adjusted for heteroscedasticity.
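If constant variance is doubtful, statsmodels can report heteroscedasticity-robust standard errors via `fit(cov_type="HC0")` (other options include `"HC1"` and `"HC3"`). A NumPy-only sketch of the HC0 (White) sandwich estimator next to the nonrobust formula, on hypothetical data whose noise grows with `x`:

```python
import numpy as np

# Hypothetical heteroscedastic data: noise standard deviation grows with x
rng = np.random.default_rng(1)
n = 500
x = rng.uniform(0, 10, size=n)
y = 1.0 + 2.0 * x + rng.normal(scale=0.5 * x, size=n)

X = np.column_stack((np.ones(n), x))
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ coef
XtX_inv = np.linalg.inv(X.T @ X)

# Nonrobust: a single common error variance for every observation
sigma2 = resid @ resid / (n - X.shape[1])
se_nonrobust = np.sqrt(np.diag(sigma2 * XtX_inv))

# HC0 sandwich: (X^T X)^{-1} [sum of e_i^2 x_i x_i^T] (X^T X)^{-1}
meat = X.T @ (X * resid[:, None] ** 2)
se_hc0 = np.sqrt(np.diag(XtX_inv @ meat @ XtX_inv))

print("nonrobust:", se_nonrobust)
print("HC0:      ", se_hc0)
```

When the noise really is constant, the two sets of standard errors tend to be close; here they generally differ.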

---

### **One-Line Summary**

> The model is statistically significant overall, but only `x2` is a strong and reliable predictor of `y`.



# sklearn
[scikit-learn LinearRegression documentation](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LinearRegression.html)

$$y = b_0 + b_1 x_1 + b_2 x_2$$

```python
import numpy as np
from sklearn.linear_model import LinearRegression
X = np.array([[1, 1], [1, 2], [2, 2], [2, 3]])
# y = 1 * x_0 + 2 * x_1 + 3
y = np.dot(X, np.array([1, 2])) + 3
reg = LinearRegression().fit(X, y)
reg.score(X, y)
# 1.0
```

```python
reg.coef_, reg.intercept_, reg.predict(np.array([[3, 5]]))
# (array([1., 2.]), np.float64(3.0...), array([16.])); the intercept is 3 up to floating-point error
```
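`LinearRegression` fits ordinary least squares with an intercept by default, so the same numbers can be recovered without scikit-learn by prepending a column of ones and solving the least-squares problem with `np.linalg.lstsq`. This is a cross-check, not a description of scikit-learn's internals:

```python
import numpy as np

# Same data as the scikit-learn example
X = np.array([[1, 1], [1, 2], [2, 2], [2, 3]])
y = np.dot(X, np.array([1, 2])) + 3

# Prepend a column of ones so the first coefficient is the intercept
X_b = np.column_stack((np.ones(len(X)), X))
beta, *_ = np.linalg.lstsq(X_b, y, rcond=None)

print(beta)                         # approximately [3. 1. 2.]: intercept, then coef_
print(np.array([1, 3, 5]) @ beta)   # approximately 16, matching reg.predict([[3, 5]])
```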

# Matrix method

$$y = b_0 + b_1 x_1 + b_2 x_2 + e$$

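```python
import numpy as np

# Sample data (note: in every row the second feature is the first + 1)
X = np.array([
    [1, 2],
    [2, 3],
    [3, 4],
    [4, 5]
])

y = np.array([6, 8, 10, 12])

# Add bias (intercept) term: each row becomes [1, x1, x2]
X_b = np.c_[np.ones((X.shape[0], 1)), X]

# Normal Equation: beta = (X^T X)^{-1} X^T y.
# Because x2 = x1 + 1 here, X^T X is singular, so np.linalg.inv either fails or
# returns meaningless values; the pseudo-inverse gives the minimum-norm
# least-squares solution instead.
beta = np.linalg.pinv(X_b.T @ X_b) @ X_b.T @ y

print("Coefficients (Matrix Method):")
# [intercept, beta1, beta2]
print(beta)
```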


```python
import numpy as np

# User inputs (float() rather than eval(), so typed text is never executed as code)
x1 = float(input("Enter x1: "))
x2 = float(input("Enter x2: "))

# Create feature vector (with bias term)
X_new = np.array([1, x1, x2])   # 1 is for intercept

print("Input Vector:", X_new)

# Prediction using matrix multiplication with beta from the previous cell
y_pred = X_new @ beta

print("Predicted Output:", y_pred)
```


### 1️⃣ Multiple Linear Regression: Matrix Method (Normal Equation)

📌 Formula

$$
\hat{\beta} = (X^T X)^{-1} X^T y
$$

Here $X$ includes a leading column of ones, so $\hat{\beta}$ is ordered as:

```
[intercept, beta1, beta2]
```
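The formula needs $X^T X$ to be invertible, which holds only when the columns of $X$ (including the column of ones) are linearly independent. A minimal sketch on hypothetical, non-collinear data, using `np.linalg.solve` on the normal equations instead of an explicit inverse (a common numerical choice, not the class's method), plus a rank check on the class sample data, where `x2 = x1 + 1`:

```python
import numpy as np

# Hypothetical data with non-collinear features, generated from y = 3 + 1*x1 + 2*x2
X = np.array([[1, 2], [2, 1], [3, 4], [4, 3], [5, 5]])
y = 3 + X @ np.array([1, 2])

X_b = np.c_[np.ones((X.shape[0], 1)), X]     # each row: [1, x1, x2]

# X^T X is invertible only if X_b has full column rank
print("rank:", np.linalg.matrix_rank(X_b))   # 3 columns, rank 3

# Solve (X^T X) beta = X^T y directly instead of forming the inverse
beta = np.linalg.solve(X_b.T @ X_b, X_b.T @ y)
print(beta)   # approximately [3. 1. 2.]: [intercept, beta1, beta2]

# The class sample data has x2 = x1 + 1, so its design matrix is rank-deficient
X_class = np.array([[1, 2], [2, 3], [3, 4], [4, 5]])
X_class_b = np.c_[np.ones((4, 1)), X_class]
print("class data rank:", np.linalg.matrix_rank(X_class_b))   # 2
```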

---

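### 2️⃣ Multiple Linear Regression: Gradient Descent

📌 Formula

With the mean squared error $J(\beta) = \frac{1}{m}\lVert X\beta - y\rVert^2$ as the cost (as in the code below), each step updates

$$
\beta := \beta - \alpha \cdot \frac{2}{m} X^T(X\beta - y)
$$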
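In vector form, one step computes all partial derivatives at once. For the MSE cost that the loop in this section computes, the gradient is $\frac{2}{m} X^T(X\beta - y)$, where $X$ carries a leading column of ones; a constant factor such as 2 can equally be absorbed into $\alpha$. A minimal vectorized sketch on hypothetical, non-collinear data (the learning rate, zero start and iteration count are assumptions chosen for this data):

```python
import numpy as np

# Hypothetical non-collinear data generated from y = 3 + 1*x1 + 2*x2
X = np.array([[1, 2], [2, 1], [3, 4], [4, 3], [5, 5]], dtype=float)
y = 3 + X @ np.array([1.0, 2.0])

X_b = np.c_[np.ones((X.shape[0], 1)), X]   # column of ones for the intercept
m = len(y)
alpha = 0.02                                # learning rate, small enough to be stable for this data
beta = np.zeros(X_b.shape[1])               # [intercept, beta1, beta2]

for _ in range(5000):
    grad = (2 / m) * X_b.T @ (X_b @ beta - y)   # gradient of the MSE
    beta = beta - alpha * grad

print(beta)   # converges toward [3, 1, 2]
```

Compared with the loop below, which updates `w0`, `w1`, `w2` one line at a time, this form does not change when more features are added.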




```python
import numpy as np

# Sample data (same as the matrix method; x2 = x1 + 1 in every row)
X = np.array([
    [1, 2],
    [2, 3],
    [3, 4],
    [4, 5]
])

y = np.array([6, 8, 10, 12])

# Random initial weights in [0, 1), rounded to 2 decimals
w0 = round(np.random.random(), 2)
w1 = round(np.random.random(), 2)
w2 = round(np.random.random(), 2)

print("Initial Weights:")
print("w0 (Intercept):", w0)
print("w1:", w1)
print("w2:", w2)

alpha = 0.001   # learning rate
m = len(y)

for i in range(1, 5001):

    # Predictions
    yp = w0 + w1 * X[:, 0] + w2 * X[:, 1]

    # Cost function (Mean Squared Error)
    j = np.sum((yp - y) ** 2) / m

    if i % 500 == 0:
        print(f"Iteration {i}, Cost = {j}")

    # Stop early once the cost is negligible
    if j < 0.0001:
        break

    # Gradients of the MSE with respect to each weight
    dw0 = (2 / m) * np.sum(yp - y)
    dw1 = (2 / m) * np.sum((yp - y) * X[:, 0])
    dw2 = (2 / m) * np.sum((yp - y) * X[:, 1])

    # Update weights
    w0 = w0 - alpha * dw0
    w1 = w1 - alpha * dw1
    w2 = w2 - alpha * dw2
```



```python
print("Final Weights:")
print("w0 (Intercept):", w0)
print("w1:", w1)
print("w2:", w2)
```
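In the class data, `x2 = x1 + 1` in every row, so the features are perfectly collinear and the MSE has no single minimizer: adding any multiple of (-1, -1, 1) to (w0, w1, w2) leaves every training prediction unchanged. Gradient descent stops at whichever of these equally good solutions its random starting point leads to. As a result, the final weights can differ from run to run even when the fit is equally good, and predictions for new inputs with `x2 != x1 + 1` can differ too. A small check with three hand-picked triples (chosen for illustration):

```python
import numpy as np

X = np.array([[1, 2], [2, 3], [3, 4], [4, 5]])
y = np.array([6, 8, 10, 12])

# Three (w0, w1, w2) triples; consecutive triples differ by (-1, -1, 1)
candidates = [(4, 2, 0), (3, 1, 1), (2, 0, 2)]

for w0, w1, w2 in candidates:
    yp = w0 + w1 * X[:, 0] + w2 * X[:, 1]
    print((w0, w1, w2), yp, np.mean((yp - y) ** 2))   # MSE is 0 for every triple
```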


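```python
# Read the two feature values for a new observation
# (float() rather than eval(), so typed text is never executed as code)
x1 = float(input("Enter x1: "))
x2 = float(input("Enter x2: "))

# Creating the NumPy array
X_new = np.array([x1, x2])
print(X_new)

# Predict with the weights learned by gradient descent
y_pred = w0 + w1 * X_new[0] + w2 * X_new[1]
print("Predicted Output:", y_pred)
```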