# Assignment 3: Neural networks in natural language processing

### Due Date: Oct 30 (both sections)

### Grade (100 pts, 10%)

#### Your Name:

#### Your EID:

*Note: This assignment covers material from the recording, notes, demo, and suggested readings from Lecture-08*

---

In [None]:
Name : Haoyu Wang
EID : hw468

## Questions

### 1. Dropout (50 pts)

Dropout is a regularization technique that randomly sets units in each activation layer, $a \in \mathbb{R}^{D}$, to zero and then multiplies the resultant vector elementwise by a constant $\gamma$ according to:

$$a_{dropout} \leftarrow  \gamma H \odot a$$

where $\odot$ represents the element-wise product operator and $H \in \{0, 1\}^D$ is a mask with entries drawn from 

$$\begin{cases} p(0) &= p_{dropout} \\ p(1) &= 1 - p_{dropout} \end{cases}$$

Select a scaling factor ${\gamma}$ that ensures the expected value over the activation layer remains invariant to the above operation, $E\big[ a_{dropout} \big] = E\big[ a \big]$, and provide rationale for your selection.

*Hint: You want to show that*

$$
\sum_{i=1}^D a_i = \gamma \sum_{i=1}^D a_{dropout, i}
$$

In [10]:
import numpy as np

# Create the original array of 10000 ones
A = np.ones(10000)
np.mean(A)

1.0

In [11]:
# Dropout function
def dropout(x, level):
    if level < 0. or level >= 1:
        raise ValueError('Dropout level must be in interval [0, 1[.')
    # Keep probability, 1 - p_dropout
    prob = 1. - level

    # Sample a Bernoulli mask with keep probability prob (0.6 when level = 0.4)
    random_tensor = np.random.binomial(n=1, p=prob, size=x.shape)
    print(random_tensor)
    # Dropout (out of place, so the input array is not modified)
    x = x * random_tensor
    print(x)
    # Rescale by gamma = 1 / (1 - p_dropout)
    x = x / prob

    return x

# Apply dropout to A with dropout probability 0.4
B = dropout(A, 0.4)
print("B=", B)
np.mean(B)
# The mean after dropout and rescaling is approximately the original mean of 1.0,
# so scaling by gamma = 1 / (1 - p_dropout) keeps the expected activation unchanged.

[0 1 1 ... 0 1 0]
[0. 1. 1. ... 0. 1. 0.]
B= [0.         1.66666667 1.66666667 ... 0.         1.66666667 0.        ]


0.9979999999999998
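
The experiment above suggests $\gamma = 1 / (1 - p_{dropout})$. The rationale can be sketched as follows, assuming the mask entries $H_i$ are drawn independently of $a$. Since $E[H_i] = 0 \cdot p_{dropout} + 1 \cdot (1 - p_{dropout}) = 1 - p_{dropout}$,

$$
E\big[ a_{dropout, i} \big] = \gamma \, E\big[ H_i \big] \, a_i = \gamma \, (1 - p_{dropout}) \, a_i
$$

so requiring $E\big[ a_{dropout, i} \big] = a_i$ gives $\gamma = \frac{1}{1 - p_{dropout}}$. A minimal numerical check over several dropout levels is sketched below; the helper name `inverted_dropout`, the seed, and the array size are illustrative choices, not part of the assignment.

```python
import numpy as np

def inverted_dropout(a, p_dropout, rng):
    """Hypothetical helper: zero entries with probability p_dropout, then rescale."""
    keep_prob = 1.0 - p_dropout
    # Bernoulli mask H with p(1) = 1 - p_dropout
    mask = rng.binomial(n=1, p=keep_prob, size=a.shape)
    # Scaling factor gamma = 1 / (1 - p_dropout)
    gamma = 1.0 / keep_prob
    return gamma * mask * a

rng = np.random.default_rng(0)  # fixed seed, chosen only for reproducibility
a = np.ones(100_000)

# Empirical mean after dropout for a few dropout levels
means = {p: inverted_dropout(a, p, rng).mean() for p in (0.1, 0.4, 0.7)}
for p, m in means.items():
    print(f"p_dropout={p}: mean after dropout = {m:.3f}")
```

Each printed mean should be close to 1.0, the mean of `a`, with small deviations due to sampling.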

### 2. Convolutions (50 pts)

Consider a sequence of $T$ token embeddings, $Z \in \mathbb{R}^{T \times D}$, for which $D=3$:

In [2]:
import numpy as np

Z = np.array([
    [1.3,   0.4, -0.2],
    [-3.1,  1.1,  2.1],
    [0.9,   2.8, -1.5],
    [1.3,   2.4,  0.1],
    [1.0,   1.0,  0.5],
    [3.0,  -1.4, -0.2],
    [-0.7,  1.8,  1.3]
])

and a set of convolutional filters, $W=\{ w^{(1)}, w^{(2)} \}$, and corresponding filter widths $S=\{ s^{(1)}, s^{(2)}  \}$:

In [17]:
w1 = np.array([
    [1, 1, 1],
    [1, 1, 1]
])

w2 = np.array([
    [2, 2, 2],
    [2, 2, 2],
    [2, 2, 2]
])

W = [w1, w2]
print(W)
S = [2, 3]

[array([[1, 1, 1],
       [1, 1, 1]]), array([[2, 2, 2],
       [2, 2, 2],
       [2, 2, 2]])]


In Lecture 08 we discussed a set of operations that maps $Z \in \mathbb{R}^{T \times D}$ onto $Z' \in \mathbb{R}^{N_F D}$ (in this problem $N_F = 2$). This involved three steps:

1. **Convolution**: The convolutional operation produces $N_F$ feature maps, $B^{(n)} \in \mathbb{R}^{(T - s^{(n)} + 1) \times D}$, where $n \in \{1, \dots, N_F\}$, according to:

$$
\forall_{t \in \{ 1, \dots, T - s^{(n)} + 1 \} } \; B^{(n)}_{t,j} = \sum_{t'=1}^{s^{(n)}} w^{(n)}_{t',j} \; Z_{t + t' - 1,j}
$$

2. **Max pooling**: The max pooling operation computes the max over the sequence dimension in each feature map, $B_{maxpool}^{(n)} \in \mathbb{R}^D$, according to:

$$
B_{maxpool, j}^{(n)} = \underset{1 \leq t' \leq T - s^{(n)} + 1 }{\max} B^{(n)}_{t', j}
$$

3. **Concatenation**: The resultant set of $N_F$ feature vectors are then concatenated into a single vector $Z'$ according to:

$$
Z' = \big[ B_{maxpool}^{(1)}, \dots, B_{maxpool}^{(n)}, \dots,  B_{maxpool}^{(N_F)}  \big] \in \mathbb{R}^{D \cdot N_F}
$$

In the cell below, perform these three operations to produce $Z' \in \mathbb{R}^6$ and print it.

*Hint: The max pooling operation computes the maximum over each column in $B^{(n)}$*

In [19]:
# Convolution: check the shape of Z and the first filter width
import numpy as np
t, d = Z.shape
print(t, d)
S[0]

7 3


2

In [47]:
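#1. Convolution
def convolution(w, Z, n):
    # Slide a window of S[n] rows over Z. Each window is multiplied elementwise
    # by w and summed over ALL of its entries, so the feature map has shape
    # (T - S[n] + 1, 1) rather than (T - S[n] + 1, D).
    t, d = Z.shape
    conv = []
    for i in range(t - S[n] + 1):
        window = Z[i:i + S[n], :]
        conv.append([np.sum(np.multiply(w, window))])
    return np.array(conv)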

c1=convolution(w1,Z,0)
c2=convolution(w2,Z,1)
C=[c1,c2]
print(C)

[array([[1.6],
       [2.3],
       [6. ],
       [6.3],
       [3.9],
       [3.8]]), array([[ 7.6],
       [12.2],
       [17. ],
       [15.4],
       [12.6]])]


In [58]:
#2. Max pooling
def Max_pooling(C):
    # Take the maximum of each feature map in C. Each map has a single column
    # here, so this yields one value per filter.
    pool = []
    for feature_map in C:
        pool.append([np.max(feature_map)])
    return np.array(pool)

In [62]:
MP=Max_pooling(C)
print(MP)

[[ 6.3]
 [17. ]]


In [64]:
#3. Concatenation
def concatenation(pooled):
    # Flatten the pooled feature vectors into a single 1-D vector
    Z_concate = np.reshape(pooled, -1)
    return Z_concate

In [65]:
concatenation(MP)

array([ 6.3, 17. ])
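
The cells above sum each window over all $D$ columns, which gives one value per filter and a result in $\mathbb{R}^2$. A minimal sketch that instead keeps the columns separate, so that each feature map has shape $(T - s^{(n)} + 1) \times D$ and $Z' \in \mathbb{R}^6$, might look like the following. It assumes the window for output row $t$ covers rows $t, \dots, t + s^{(n)} - 1$ of $Z$; the names `conv_per_column`, `filters`, `feature_maps`, `pooled`, and `Z_prime` are illustrative.

```python
import numpy as np

Z = np.array([
    [1.3,   0.4, -0.2],
    [-3.1,  1.1,  2.1],
    [0.9,   2.8, -1.5],
    [1.3,   2.4,  0.1],
    [1.0,   1.0,  0.5],
    [3.0,  -1.4, -0.2],
    [-0.7,  1.8,  1.3]
])

# Same filters as w1 and w2 above: all ones (width 2) and all twos (width 3)
filters = [np.ones((2, 3)), 2 * np.ones((3, 3))]

def conv_per_column(w, Z):
    """Hypothetical helper: sum w * window over rows only, keeping the D columns."""
    s = w.shape[0]   # filter width
    T = Z.shape[0]   # sequence length
    return np.array([np.sum(w * Z[t:t + s, :], axis=0) for t in range(T - s + 1)])

# 1. Convolution: one (T - s + 1, D) feature map per filter
feature_maps = [conv_per_column(w, Z) for w in filters]

# 2. Max pooling: maximum over the sequence dimension, one value per column
pooled = [B.max(axis=0) for B in feature_maps]

# 3. Concatenation: a single vector of length D * N_F
Z_prime = np.concatenate(pooled)
print(Z_prime.shape)  # (6,)
print(np.round(Z_prime, 1))
```

Under these assumptions the result is approximately `[4.0, 5.2, 1.9, 10.6, 12.6, 3.2]`. Summing each row of a feature map recovers the single-column values printed earlier, for example the maximum 6.3 for the first filter comes from the row `[2.3, 3.4, 0.6]`.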