# Finding $\hat{w}$ and $\hat{b}$

![title](1.jpg)
![title](2.jpg)
![title](3.jpg)

# Finding expectations and variances of estimators

![title](4.jpg)
![title](5.jpg)
![title](6.jpg)
![title](7.jpg)
![title](8.jpg)
![title](9.jpg)

# Argue that in the limit, the error on $\hat{w}$ and $\hat{b}$ are approximately equal to the given variances

![title](10.jpg)
![title](11.jpg)
![title](12.jpg)

# Proving recentering produces same error in $\hat{w}$ but minimizes it in $\hat{b}$

![title](13.jpg)
![title](14.jpg)

# Proving Experimentally the estimators

In [1]:
import numpy as np
import matplotlib.pyplot as plt

## Computing $\hat{w}$, $\hat{w}'$, $\hat{b}$, and $\hat{b}'$

In [140]:
# Final arrays to store data from all iterations.

class Regression:

    def __init__(self):
        
        # Initializing lists for a particular iteration
        self.m, self.w, self.b, self.sigma_square = 200, 1, 5, 0.1
        self.y_val, self.x_dash = [], []
        self.w_hat, self.b_hat = [], []
        self.w_hat_dash, self.b_hat_dash = [], []

    def get_xdash(self, x):
        for i in x:
            self.x_dash.append(i - 101)
        return np.array(self.x_dash)

    def get_y_values(self, x):
        for i in x:
            epsilon = np.random.normal(0, self.sigma_square)
            self.y_val.append((self.w * i) + self.b + epsilon)
        return np.array(self.y_val)

    def get_w_hat(self, x, y):
        sum_x, sum_y = np.mean(x), np.mean(y)
        sum_numerator, sum_denominator = 0, 0
        for i in range(self.m):
            sum_numerator += ((x[i] - sum_x) * (y[i] - sum_y))
            sum_denominator += ((x[i] - sum_x) ** 2)
        self.w_hat.append(sum_numerator / sum_denominator)
        return np.array(self.w_hat)

    def get_w_hat_dash(self, x, y):
        sum_x, sum_y = np.mean(x), np.mean(y)
        for i in range(self.m):
            sum_numerator = (x[i] - sum_x) * (y[i] - sum_y)
            sum_denominator = (x[i] - sum_x) ** 2
        self.w_hat_dash.append(sum_numerator / sum_denominator)
        return np.array(self.w_hat_dash)

    def get_b_hat(self, x, y):
        sum_x, sum_y = np.mean(x), np.mean(y)
        return np.array([sum_y - (self.w_hat[0] * sum_x)])

    def get_b_hat_dash(self, x, y):
        sum_x, sum_y = np.mean(x), np.mean(y)
        return np.array([sum_y - (self.w_hat_dash[0] * sum_x)])

In [195]:
final_x, final_x_dash, final_y = [], [], []
final_w_hat, final_w_hat_dash = [], []
final_b_hat, final_b_hat_dash = [], []

for i in range(1):
    lr = Regression()
    x_val = np.random.uniform(100, 102, 200)
    final_x.append(x_val)

    y = lr.get_y_values(x_val)
    final_y.append(y)

    x_dash = lr.get_xdash(x_val)
    final_x_dash.append(x_dash)

    w_hat = lr.get_w_hat(x_val, y)
    final_w_hat.append(w_hat)

    w_hat_dash = lr.get_w_hat_dash(x_dash, y)
    final_w_hat_dash.append(w_hat_dash)

    b_hat = lr.get_b_hat(x_val, y)
    final_b_hat.append(b_hat)

    b_hat_dash = lr.get_b_hat_dash(x_dash, y)
    final_b_hat_dash.append(b_hat_dash)

print("The final w_hat is : {}\n".format(final_w_hat))
print("The final w_hat_dash is : {}\n".format(final_w_hat_dash))
print("The final b_hat is : {}\n".format(final_b_hat))
print("The final b_hat_dash is : {}\n".format(final_b_hat_dash))

The final w_hat is : [array([0.9934715])]

The final w_hat_dash is : [array([1.01154212])]

The final b_hat is : [array([5.66086116])]

The final b_hat_dash is : [array([106.00044959])]



## Doing it for 1000 times and finding the expected values and variances of weights and biases.

In [178]:
final_x, final_x_dash, final_y = [], [], []
final_w_hat, final_w_hat_dash = [], []
final_b_hat, final_b_hat_dash = [], []

for i in range(1000):
    lr = Regression()
    x_val = np.random.uniform(100, 102, 200)
    final_x.append(x_val)

    y = lr.get_y_values(x_val)
    final_y.append(y)

    x_dash = lr.get_xdash(x_val)
    final_x_dash.append(x_dash)

    w_hat = lr.get_w_hat(x_val, y)
    final_w_hat.append(w_hat)

    w_hat_dash = lr.get_w_hat_dash(x_dash, y)
    final_w_hat_dash.append(w_hat_dash)

    b_hat = lr.get_b_hat(x_val, y)
    final_b_hat.append(b_hat)

    b_hat_dash = lr.get_b_hat_dash(x_dash, y)
    final_b_hat_dash.append(b_hat_dash)

print("The final w_hat is : {}\n".format(final_w_hat[:10]))
print("The final w_hat_dash is : {}\n".format(final_w_hat_dash[:10]))
print("The final b_hat is : {}\n".format(final_b_hat[:10]))
print("The final b_hat_dash is : {}\n".format(final_b_hat_dash[:10]))

The final w_hat is : [array([1.01006036]), array([0.98066514]), array([0.98803768]), array([0.99103945]), array([0.99055359]), array([1.02096254]), array([0.98917179]), array([1.0107103]), array([1.00819679]), array([0.99494314])]

The final w_hat_dash is : [array([0.88611349]), array([1.24011287]), array([0.74500076]), array([1.07682903]), array([1.23886453]), array([1.00589548]), array([1.06452075]), array([0.96786359]), array([0.42280495]), array([1.03358353])]

The final b_hat is : [array([3.98562128]), array([6.95191465]), array([6.19827003]), array([5.91124894]), array([5.94552887]), array([2.8875427]), array([6.08801742]), array([3.91539118]), array([4.16465544]), array([5.511373])]

The final b_hat_dash is : [array([106.00295571]), array([105.9943962]), array([105.98092248]), array([106.00030891]), array([105.98517083]), array([106.00388992]), array([105.99109597]), array([105.99898609]), array([106.02278977]), array([106.0028361])]



## Finding  the expected values of $\hat{w}$, $\hat{b}$, $\hat{w}'$, and  $\hat{b}'$ {-}

$E[\hat{w}] = w$
So, $E[\hat{w}] = 1$ as we are given the value of w as 1.

$E[\hat{b}] = b$
So, $E[\hat{b}] = 5$ (Given)

Now we will see what are the values of the expectations we got.

In [179]:
# Taking the mean of w_hat, w_hat', b_hat', and b_hat for all 1000 iterations to get the estimation of the expected values.

print("Estimation of the expected value of w_hat is {}\n".format(np.mean(final_w_hat)))
print("Estimation of the expected value of w_hat' is {}\n".format(np.mean(final_w_hat_dash)))
print("Estimation of the expected value of b_hat is {}\n".format(np.mean(final_b_hat)))
print("Estimation of the expected value of b_hat' is {}".format(np.mean(final_b_hat_dash)))

Estimation of the expected value of w_hat is 0.9993226130808428

Estimation of the expected value of w_hat' is 1.0859801848187285

Estimation of the expected value of b_hat is 5.068158752294422

Estimation of the expected value of b_hat' is 106.00192979197905


Here, we can see that the values for $\hat{w}$ and $\hat{w}'$ are almost similar and the are comparable to the original w that we were given. This result makes sense.

Now, we can see that $\hat{b}$ and $\hat{b}'$ have a huge difference. This is because as we saw in question 4, recentering the data does not do anything to the weights but changes the value for $\hat{b}$. So, this result is also in concordance with what we found in the previous questions.

## Finding  the variances of $\hat{w}$, $\hat{b}$, $\hat{w}'$, and  $\hat{b}'$ {-}

$var[\hat{w}] = \frac{\sigma^2}{\sum^m_{i=1}(x_i - \bar{x})^2}$

$var[\hat{b}] = \sigma^2(\frac{1}{m} + \frac{\bar{x}^2}{\sum^m_{i=1}(x_i - \bar{x})^2})$

Now let's see what are the values of variances we got.

In [180]:
'''
Taking the variance of w_hat, w_hat', b_hat', and b_hat for all 1000 iterations and taking a mean to see the final variance.
'''
var_w_hat, var_w_hat_dash, var_b_hat, var_b_hat_dash = [], [], [], []


for i in range(1000):
    i_sum, j_sum = 0, 0
    mean_x_val, mean_x_dash = np.mean(final_x[i]), np.mean(final_x_dash[i])
    
    for j in range(200):
        i_sum += ((final_x[i][j] - mean_x_val)**2)
        j_sum += ((final_x_dash[i][j] - mean_x_dash)**2)
        
    current_var_w = lr.sigma_square / i_sum
    var_w_hat.append(current_var_w)
    var_b_hat.append((lr.sigma_square*0.05) + ((mean_x_val**2) * current_var_w))
    
    current_var_w_dash = lr.sigma_square / j_sum
    var_w_hat_dash.append(current_var_w_dash)
    var_b_hat_dash.append((lr.sigma_square*0.05) + ((mean_x_dash**2) * current_var_w_dash))


print("Mean w_hat variance is {} and mean w_hat' variance is {}\n".format(np.mean(var_w_hat), np.mean(var_w_hat_dash)))
print("Mean b_hat variance is {} and mean b_hat' variance is {}\n".format(np.mean(var_b_hat), np.mean(var_b_hat_dash)))
print("\nThe differences in the variances of w_hat and w_hat' is: {}\n".format
      (abs(np.mean(var_w_hat)-np.mean(var_w_hat_dash))))
print("The differences in the variances of b_hat and b_hat' is: {}\n".format
      (abs(np.mean(var_b_hat)-np.mean(var_b_hat_dash))))

Mean w_hat variance is 0.0015199179975486446 and mean w_hat' variance is 0.0015199179975486446

Mean b_hat variance is 15.509969829077079 and mean b_hat' variance is 0.005002522825888445


The differences in the variances of w_hat and w_hat' is: 0.0

The differences in the variances of b_hat and b_hat' is: 15.50496730625119



As we can see, the difference between variances of $\hat{w}$ and $\hat{w}'$ is 0. This is exactly what we saw in the previous questions that recentering the data does not affect the weights.

For the variances of $\hat{b}$ and $\hat{b}'$, we can see that after recentering the data, the variance for $\hat{b}'$ is decreased heavily. We proved this property in question 4 that after recentering the data, the var($\hat{b}$') has a subtraction term $E[x^2] - \mu^2$ which decreases the variance. This decrease is because before recentering, there was a huge difference in the eigenvalues as there was an uneven spread in one direction. But after recentering, that uneven spread was eliminated giving us relatively symmetric data around the origin. Hence, we also proved the recentering property experimentally. 

# Intuitively, why is there no change in the estimate of the slope when the data is shifted?

![title](17a.jpg)

As seen from the figure, whenever we recenter the data, the angle at which the data cloud is situated does not change. This is because we just move the data cloud up and down according to the location of it's mean but we do not change the orientation of the data at all. In the equation of regression, $w$ represents the slope of the line and $b$ represents the y-intercept.

So, whenever we shift the data, we only change the y-intercept values (as we are just moving the data cloud up and down in space) and $w$ values remain unchanged due to no alteration in the slope.

# Proving that the condition number is minimized when we recenter by taking $\mu = E[x]$

![title](15.jpg)
![title](16.jpg)
![title](a.jpeg)
![title](b.jpeg)