
# Biased Estimation Optimization

In [3]:
import numpy as np
import matplotlib.pyplot as plt

## **Setting a Reference Point for linear approximation**


A common deterministic (and therefore typically biased) approximation is a first-order Taylor expansion. Concretely, for


$$
y=\operatorname{GELU}(W x) \quad \in \mathbb{R}^d
$$
we pick a reference point $\bar{x}$ (or equivalently $\bar{z}=W \bar{x}$) and expand around it:


$$
\operatorname{GELU}(W x) \approx \operatorname{GELU}(\bar{z})+\operatorname{diag}\left(\operatorname{GELU}^{\prime}(\bar{z})\right)[W(x-\bar{x})]
$$


We can write this approximation in the form:


$$
\Phi(W) \Psi(x)
$$
by letting (so that, for $W \in \mathbb{R}^{d \times d}$, $\Phi(W)$ has shape $d \times (d+1)$ and $\Psi(x)$ has $d+1$ entries):
- $\Phi(W)=\left[\operatorname{diag}\left(\operatorname{GELU}^{\prime}(\bar{z})\right) W, \ \operatorname{GELU}(\bar{z})\right]$,
- $\Psi(x)=\left[\begin{array}{c}x-\bar{x} \\ 1\end{array}\right]$.
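
As a sanity check on this factorization, the following is a minimal sketch (not the notebook's own cell) that builds $\Phi(W)$ and $\Psi(x)$ as block matrices and confirms that their product reproduces the first-order Taylor expression. The helper name `gelu_tanh_grad`, the fixed seed, the small dimension `d = 4`, and the choice of `x` as a small perturbation of `x_bar` are all assumptions made for illustration.

```python
import numpy as np

def gelu_tanh(z):
    """Tanh approximation of GELU, applied elementwise."""
    u = np.sqrt(2 / np.pi) * (z + 0.044715 * z**3)
    return 0.5 * z * (1 + np.tanh(u))

def gelu_tanh_grad(z):
    """Elementwise derivative of gelu_tanh (hypothetical helper name)."""
    u = np.sqrt(2 / np.pi) * (z + 0.044715 * z**3)
    du = np.sqrt(2 / np.pi) * (1 + 3 * 0.044715 * z**2)
    return 0.5 * (1 + np.tanh(u)) + 0.5 * z * du / np.cosh(u) ** 2

rng = np.random.default_rng(0)             # fixed seed: an assumption, for reproducibility
d = 4                                      # small dimension chosen for illustration
W = rng.normal(size=(d, d))
x_bar = rng.normal(size=(d, 1))            # reference input
x = x_bar + 0.1 * rng.normal(size=(d, 1))  # an input near the reference point
z_bar = W @ x_bar

# Phi(W) = [diag(GELU'(z_bar)) W, GELU(z_bar)], shape (d, d + 1)
phi = np.hstack([np.diag(gelu_tanh_grad(z_bar).ravel()) @ W, gelu_tanh(z_bar)])
# Psi(x) = [x - x_bar; 1], shape (d + 1, 1)
psi = np.vstack([x - x_bar, np.ones((1, 1))])

# The Taylor expression written out directly, for comparison
taylor = gelu_tanh(z_bar) + gelu_tanh_grad(z_bar) * (W @ (x - x_bar))
approx = phi @ psi
print(phi.shape, psi.shape, np.allclose(approx, taylor))
```

Stacking the constant term as an extra column of $\Phi$ and an extra `1` in $\Psi$ is what lets the affine Taylor expression be written as a single matrix product.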



In [7]:
# Dimension of the input x and of the square weight matrix W
d = 10

x = np.random.normal(loc=0, scale=1, size=(d, 1))
print("Layer shape of x: ", x.shape)
print(x)

W = np.random.normal(loc=0, scale=1, size=(d, d))
print('\n')
print("Layer shape of W: ", W.shape)
print(W)

Layer shape of x:  (10, 1)
[[-0.55765566]
 [ 1.69378792]
 [ 1.03214268]
 [-0.27551954]
 [-0.27756137]
 [-1.73018594]
 [ 0.63939021]
 [-1.28047162]
 [ 1.83737787]
 [-2.16822452]]


Layer shape of W:  (10, 10)
[[-8.46682110e-01  1.36505708e+00  2.87656891e-01  3.34626730e-02
  -1.58077441e+00  1.89346175e+00 -1.40672095e-02  5.58742600e-01
   6.09605443e-01  3.22708705e-01]
 [ 3.57451349e-01 -7.13440647e-01  7.13866295e-01 -4.58936059e-01
  -2.50448391e-01 -8.10998777e-01 -2.02573613e+00 -1.43472654e+00
  -3.35744874e-01  1.60178692e-01]
 [ 1.31720013e+00  1.47845566e+00 -1.00296605e+00 -1.67803973e+00
   1.32427531e-01 -2.93137181e-01 -1.72885499e+00  2.45018982e-03
   6.90338349e-01 -4.79449786e-01]
 [-8.72493225e-01  4.48415981e-01 -3.03120788e-01 -5.01516041e-01
   5.00113641e-01  3.56554023e-02 -5.64200909e-01 -3.23042210e-01
  -6.70337693e-01  6.80801971e-01]
 [ 1.43276668e+00  5.24343867e-01 -1.49716112e+00 -6.13185707e-01
  -2.16403086e+00  1.10373946e+00  5.71222686e-01  1.75877

In [8]:
# Reference point in pre-activation space: z_bar = W @ x_bar, here with x_bar taken to be x itself
reference_point_z_bar = np.matmul(W, x)
print("reference_point_z_bar shape: ", reference_point_z_bar.shape)
print("reference_point_z_bar: ", reference_point_z_bar)

reference_point_z_bar shape:  (10, 1)
reference_point_z_bar:  [[-0.06939925]
 [ 0.50589647]
 [ 2.86661679]
 [-1.7840101 ]
 [-3.70490166]
 [-4.4181141 ]
 [ 1.71332087]
 [-3.9271466 ]
 [-3.26220939]
 [ 4.85742775]]





**The derivative of GELU (tanh approximation) is:**

$$
\operatorname{GELU}^{\prime}(x) = \frac{\tanh\left(\frac{\sqrt{2} \left(\frac{8943x^{3}}{200000} + x\right)}{\sqrt{\pi}}\right) + 1}{2} + \frac{x \left(\frac{26829x^{2}}{200000} + 1\right) \operatorname{sech}^{2}\left(\frac{\sqrt{2} \left(\frac{8943x^{3}}{200000} + x\right)}{\sqrt{\pi}}\right)}{\sqrt{2} \sqrt{\pi}}
$$
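
One way to see where this expression comes from (a short derivation sketch; the shorthand $u(x)$ is introduced here only for readability) is to apply the product and chain rules to the tanh form of GELU:

$$
\begin{aligned}
u(x) &= \sqrt{\tfrac{2}{\pi}}\left(x + 0.044715\,x^{3}\right), \qquad \operatorname{GELU}(x) \approx \tfrac{1}{2}\,x\,\bigl(1+\tanh u(x)\bigr), \\
\frac{d}{dx}\operatorname{GELU}(x) &\approx \tfrac{1}{2}\bigl(1+\tanh u(x)\bigr) + \tfrac{1}{2}\,x\,\operatorname{sech}^{2}\!\bigl(u(x)\bigr)\,u^{\prime}(x), \\
u^{\prime}(x) &= \sqrt{\tfrac{2}{\pi}}\left(1 + 3 \cdot 0.044715\,x^{2}\right).
\end{aligned}
$$

The constants agree because $0.044715 = 8943/200000$, $3 \cdot 0.044715 = 26829/200000$, and $\tfrac{1}{2}\sqrt{2/\pi} = 1/(\sqrt{2}\sqrt{\pi})$.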


This derivative supplies the $\operatorname{GELU}^{\prime}(\bar{z})$ factor in:

- $\Phi(W)=\left[\operatorname{diag}\left(\operatorname{GELU}^{\prime}(\bar{z})\right) W, \ \operatorname{GELU}(\bar{z})\right]$,
- $\Psi(x)=\left[\begin{array}{c}x-\bar{x} \\ 1\end{array}\right]$.

In [30]:
# The closed form in df_dx can be checked symbolically with sympy, using
# sympy.tanh and sympy.sqrt (not the NumPy versions) on a sympy.Symbol x:
#   f = 0.5 * x * (1 + sympy.tanh(sympy.sqrt(2 / sympy.pi) * (x + 0.044715 * x**3)))
#   sympy.diff(f, x)

def gelu_tanh(x):
    """Tanh approximation of GELU, applied elementwise."""
    return 0.5 * x * (1 + np.tanh(np.sqrt(2 / np.pi) * (x + 0.044715 * np.power(x, 3))))

def sech(x):
    """Hyperbolic secant, 1 / cosh(x)."""
    return 1 / np.cosh(x)

def df_dx(x):
    """Elementwise derivative of gelu_tanh (8943/200000 = 0.044715, 26829/200000 = 3 * 0.044715)."""
    # Argument shared by tanh and sech: sqrt(2/pi) * (x + 0.044715 * x**3)
    u = np.sqrt(2) * ((8943 * x**3) / 200000 + x) / np.sqrt(np.pi)
    return (np.tanh(u) + 1) / 2 + (x * ((26829 * x**2) / 200000 + 1) * sech(u)**2) / (np.sqrt(2) * np.sqrt(np.pi))

print("original equation: 0.5 * x * (1 + tanh(sqrt(2 /π) * (x + 0.044715 * x**3)))" )
print("")
print("shape of derivative equation output: ", df_dx(reference_point_z_bar).shape)
print("derivative equation output: ", df_dx(reference_point_z_bar))

original equation: 0.5 * x * (1 + tanh(sqrt(2 /π) * (x + 0.044715 * x**3)))

shape of derivative equation output:  (10, 1)
derivative equation output:  [[ 4.44716647e-01]
 [ 8.70987700e-01]
 [ 1.01647950e+00]
 [-1.08446861e-01]
 [-1.13499040e-03]
 [-4.51812686e-05]
 [ 1.11480479e+00]
 [-4.59488172e-04]
 [-5.35531522e-03]
 [ 1.00000378e+00]]
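
To gain some confidence in the closed-form derivative, one option is to compare it against a central finite difference of the GELU tanh approximation. The sketch below is self-contained and redefines the two functions; the name `gelu_tanh_grad`, the grid of test points, and the step size `h` are assumptions chosen for illustration.

```python
import numpy as np

def gelu_tanh(x):
    """Tanh approximation of GELU, applied elementwise."""
    u = np.sqrt(2 / np.pi) * (x + 0.044715 * x**3)
    return 0.5 * x * (1 + np.tanh(u))

def gelu_tanh_grad(x):
    """Closed-form derivative of gelu_tanh (hypothetical helper name)."""
    u = np.sqrt(2 / np.pi) * (x + 0.044715 * x**3)
    du = np.sqrt(2 / np.pi) * (1 + 3 * 0.044715 * x**2)
    return 0.5 * (1 + np.tanh(u)) + 0.5 * x * du / np.cosh(u) ** 2

z = np.linspace(-5.0, 5.0, 201)   # test points (assumed range)
h = 1e-5                          # finite-difference step (assumed)

# Central difference: (f(z + h) - f(z - h)) / (2h)
numeric = (gelu_tanh(z + h) - gelu_tanh(z - h)) / (2 * h)
max_err = np.max(np.abs(numeric - gelu_tanh_grad(z)))
print("max abs difference:", max_err)

# The derivative is slightly negative for moderately negative inputs,
# which is why negative entries appear in the output above.
print("derivative at -1.7840101:", gelu_tanh_grad(-1.7840101))
```

A small maximum difference indicates that the analytic expression and the function agree; it does not by itself validate how the derivative is later combined with `W`.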


In [34]:
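# NOTE: this cell multiplies W by the derivative vector, giving a (d, 1) array, and keeps only
# the first entry of GELU(z_bar). The formula for Phi(W) instead calls for
# diag(GELU'(z_bar)) @ W, shape (d, d), next to the full GELU(z_bar) vector, shape (d, 1).
phi = [np.matmul(W, df_dx(reference_point_z_bar)), gelu_tanh(reference_point_z_bar[0])]
print("W: ", W.shape)

# NOTE: this subtracts z_bar = W @ x_bar, whereas Psi(x) is defined with x - x_bar.
psi = [x - reference_point_z_bar, 1]
print("phi shape: ", [arr.shape for arr in phi])  # phi is a list, so report each array's shape
print("phi: ", phi)
print("psi shape: ", [arr.shape if isinstance(arr, np.ndarray) else arr for arr in psi])  # shape for arrays, raw value otherwise
print("psi: ", psi)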

W:  (10, 10)
phi shape:  [(10, 1), (1,)]
phi:  [array([[ 1.40639768],
       [-1.78237573],
       [-1.37464135],
       [-0.19617595],
       [ 0.26559968],
       [ 0.64208913],
       [-1.89935148],
       [ 0.48967009],
       [ 1.6592798 ],
       [-1.50024129]]), array([-0.03277977])]
psi shape:  [(10, 1), 1]
psi:  [array([[-0.48825641],
       [ 1.18789145],
       [-1.83447412],
       [ 1.50849056],
       [ 3.4273403 ],
       [ 2.68792816],
       [-1.07393066],
       [ 2.64667498],
       [ 5.09958726],
       [-7.02565228]]), 1]


# **This concludes the biased section**
We can conclude that this biased estimate of $\Phi$ and $\Psi$ is fixed once a single, randomly chosen reference point $\bar{z}$ is established; it does not rely on repeated trials or many iterations.
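
To illustrate what "biased but deterministic" means here, the sketch below measures how the first-order approximation error behaves as $x$ moves away from the reference point $\bar{x}$, and checks that repeating the computation returns the same value. It is an illustration under assumptions, not part of the original notebook: the helper names `gelu_tanh_grad` and `taylor_error`, the seed, the dimension `d = 4`, and the step sizes are all hypothetical choices.

```python
import numpy as np

def gelu_tanh(z):
    """Tanh approximation of GELU, applied elementwise."""
    u = np.sqrt(2 / np.pi) * (z + 0.044715 * z**3)
    return 0.5 * z * (1 + np.tanh(u))

def gelu_tanh_grad(z):
    """Elementwise derivative of gelu_tanh (hypothetical helper name)."""
    u = np.sqrt(2 / np.pi) * (z + 0.044715 * z**3)
    du = np.sqrt(2 / np.pi) * (1 + 3 * 0.044715 * z**2)
    return 0.5 * (1 + np.tanh(u)) + 0.5 * z * du / np.cosh(u) ** 2

def taylor_error(W, x_bar, x):
    """Norm of GELU(W x) minus its first-order expansion around x_bar."""
    z_bar = W @ x_bar
    approx = gelu_tanh(z_bar) + gelu_tanh_grad(z_bar) * (W @ (x - x_bar))
    return np.linalg.norm(gelu_tanh(W @ x) - approx)

rng = np.random.default_rng(0)       # fixed seed: an assumption
d = 4
W = rng.normal(size=(d, d))
x_bar = rng.normal(size=(d, 1))      # the single reference point
direction = rng.normal(size=(d, 1))  # direction in which x moves away from x_bar

steps = [0.01, 0.1, 1.0]             # assumed distances from the reference point
errors = [taylor_error(W, x_bar, x_bar + t * direction) for t in steps]
for t, e in zip(steps, errors):
    print(f"step {t}: error {e:.3e}")

# No sampling is involved, so recomputing gives exactly the same number.
repeat = taylor_error(W, x_bar, x_bar + steps[1] * direction)
print("deterministic:", repeat == errors[1])
```

The error is a systematic offset that depends on the distance from the reference point rather than on any random draw, which is the sense in which the estimate is biased.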