The purpose of this quick entry is to show the difference in computation-time between a for loop and a vectorized dot-product.

In [1]:
import numpy as np
import time
import math

In [2]:
x = np.random.rand(10000000)
w = np.random.rand(10000000)

In [3]:
y=0
tic = time.time()
y = np.dot(w,x)
toc = time.time()

print(toc-tic)

0.07586431503295898


In [47]:
y=0
tic = time.time()
for i in range(len(x)):
    y = y + (x[i] * w[i])
toc = time.time()

print(toc - tic)

11.33963394165039


Wow... the for loop took almost 1000 times the time as the vectorized dot-product!!!

Let's look at another example:

In [54]:
tic = time.time()
y = np.exp(x)
toc = time.time()
print(toc-tic)

0.03924441337585449


In [53]:
y = np.zeros([len(x), 1])
tic = time.time()
for i in range(len(x)):
    y[i] = math.exp(x[i])
toc = time.time()

print(toc - tic)

10.307820320129395


Once again, vectorizing speeds up the process by ALOT!!! 

Now let's take a look at the logistic regression algorithm, and what it looks like vectorized vs unvectorized:

1. $m$ is the number of datapoints in $X$
2. $n$ is the number of features each datapoint in $X$ has
3. $X$ is an $mxn$ matrix. It's got $m$ rows and $n$ columns. Each row corresponds to an individual datapoint $\vec{x}$, holding information about $n$ features each. It's randomly generated. 
4. $Y$ is a vector holding the correct labels for each of our $\vec{x}_s$ - the $i_{th}$ entry in $\vec{Y}$, which we'll call $y_i$, is the correct label for the $i_{th}$ row of $X$, which we'll call $\vec{x}_i$. Each entry (row) in $Y$ is given by the sigmoid of the linear combination of the two features *(columns)* of each row of $X: 0.2C_0 - 0.3C_1$. In other words, $0.2 \hat{i} - 0.3\hat{j}$ is the correct weight vector that we hope our logistic regression algorithm will get to. $Y$ can only hold $1_s$ and $0_s$, *(we're assuming that the correct labels are binary)* which is what the next few lines are doing - turning the labels in $Y$ into $1_s$ and $0_s$

In [81]:
m = 10 
n=2
X = np.random.rand(m, n)
Y = np.dot(X, ([.2, -.3]))

for val in range(len(Y)):
    Y[val] = (1/(1 + math.exp(-Y[val])))
Y = np.round(Y)
print(X)
print(Y)

[[0.56851308 0.18675876]
 [0.03442909 0.25015737]
 [0.29808635 0.48096666]
 [0.25995842 0.83796735]
 [0.30176686 0.62806705]
 [0.39236232 0.24862946]
 [0.56048842 0.59142251]
 [0.02554223 0.3188979 ]
 [0.84551615 0.59478436]
 [0.89047053 0.59187915]]
[1. 0. 0. 0. 0. 1. 0. 0. 0. 1.]


In [82]:
W = [.1, -.1]
b = np.random.randint(-5, 5)
E=0; db=0;
dW = [0,0]

In [83]:
for loopyloop in range(1000):
    for i in range(m):    
        z = 0
        for j in range(n):
            z += W[j]*X[i,j]
            z += b
        a = 1 / (1 + math.exp(-z))
        L = -(Y[i]*math.log(a) + (1-Y[i])*math.log(1-a))
        E += L


        dz = a-Y[i]
        for j in range(n):
            dW[j] += X[i,j]*dz

        db += dz

    db = db/m
    dW = [dw/m for dw in dW]
    E = E/m

    for w in range(len(W)):
        W[w] -= (dW[w])
    b -= db
    
    if loopyloop%20 == 0:
        print(E)

1.7942223761225804
0.5837624408275794
0.5249437099254616
0.4811939372819247
0.4473007752610365
0.4201601285414925
0.39784944448039355
0.37912342843286606
0.3631401817225821
0.3493085714069628
0.3371999346822178
0.3264948904389487
0.31695000877632223
0.30837615027819
0.30062392474136407
0.29357364803783437
0.28712823264466386
0.2812080463554878
0.2757471246991002
0.2706903349542974
0.2659912220061957
0.2616103510540692
0.2575140178164741
0.2536732342085468
0.25006292300928656
0.24666127282366065
0.24344921722628743
0.24041001099979323
0.2375288829396192
0.23479274951834933
0.23218997728470264
0.22971018455995756
0.22734407502955495
0.22508329738158786
0.22292032633999845
0.22084836136788905
0.21886124004077961
0.2169533636591942
0.21511963312057353
0.2133553934291718
0.2116563855097025
0.21001870422156949
0.20843876165746017
0.20691325496206306
0.20543913803082542
0.20401359655053725
0.2026340259274821
0.20129801171836953
0.20000331223698317
0.1987478430576132


In [80]:
print(E)
print(dW)
print(W)
print(b)

4.432921336272889e-05
[1.9962283174205204e-05, 2.477167713284318e-05]
[0.07879016806161626, -0.12630085216799286]
-5.046993099574763
