### 1. 

### C1: Loop over each point in the first set and compute the distance from that point to every point in the second set (using the ordinary distance function)

Use a single for loop to compute the distance from each point in the first set to all points in the second set, via the function **dist_ps_fast(z, X)** defined below.

In [4]:
import numpy as np 
from time import time 

In [2]:
# squared distances from one point to each point in a set, fast 
def dist_ps_fast(z, X): 
    X2 = np.sum(X*X, 1)  # (vectorization) squared l2 norm of each X[i], can be precomputed 
                         # (sum over each row of the matrix) 
    z2 = np.sum(z*z)     # squared l2 norm of z 
    return X2 + z2 - 2*X.dot(z)    # z2 is the same for every X[i], so it can be ignored when only comparing distances 
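
The return expression in `dist_ps_fast` follows from expanding the squared Euclidean norm. As a short sketch, writing $\mathbf{x}_i$ for the i-th row of `X` (notation introduced here only for illustration):

$$
\|\mathbf{z} - \mathbf{x}_i\|_2^2
  = (\mathbf{z} - \mathbf{x}_i)^T(\mathbf{z} - \mathbf{x}_i)
  = \|\mathbf{x}_i\|_2^2 + \|\mathbf{z}\|_2^2 - 2\,\mathbf{x}_i^T\mathbf{z}
$$

The three terms correspond to `X2`, `z2` and `2*X.dot(z)`. The function therefore returns squared distances; a square root would be needed to obtain the distances themselves.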

In [6]:
d, N = 1000, 10000            # dimension (number of features) d = 1000, number of training points N = 10000
X = np.random.rand(N, d)      # N d-dimensional points, one per row 
Z = np.random.randn(100, d)   # 100 d-dimensional data points, one per row 

In [7]:
# from each point in one set to each point in another set, half fast 
def dist_ss_0(Z, X): 
    M, N = Z.shape[0], X.shape[0]    # number of data points in each set (here M = 100)
    res = np.zeros((M, N))           # M x N (100 x N) matrix holding the squared distances 
                                     # element (i, j) is the squared distance from the i-th point of Z to the j-th point of X
    for i in range(M):               # loop over the points in Z
        res[i] = dist_ps_fast(Z[i], X)
    
    return res 

### C2: Compute all pairwise distances at once (vectorization + broadcasting)

In [8]:
# from each point in one set to each point in another set, fast (vectorization + broadcasting)
def dist_ss_fast(Z, X): 
    X2 = np.sum(X*X, 1)     # squared l2 norm of each ROW of X, shape (N,)
    Z2 = np.sum(Z*Z, 1)     # squared l2 norm of each ROW of Z, shape (M,)
    return Z2.reshape(-1, 1) + X2.reshape(1, -1) - 2*Z.dot(X.T)   # (M, 1) + (1, N) - (M, N) -> (M, N)
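
As a sanity check on the broadcasting formula, the sketch below compares it with a naive double loop on a small random example. The names `pairwise_sq_dists`, `Z_small` and `X_small` are hypothetical and used only for this illustration; the function body mirrors the vectorized approach described here.

```python
import numpy as np

def pairwise_sq_dists(Z, X):
    """Squared Euclidean distances between rows of Z (M x d) and rows of X (N x d)."""
    X2 = np.sum(X * X, axis=1)          # shape (N,)
    Z2 = np.sum(Z * Z, axis=1)          # shape (M,)
    # (M, 1) + (1, N) broadcasts to (M, N); Z @ X.T is also (M, N)
    return Z2.reshape(-1, 1) + X2.reshape(1, -1) - 2 * Z @ X.T

rng = np.random.default_rng(0)
Z_small = rng.standard_normal((4, 3))   # 4 points in 3 dimensions
X_small = rng.standard_normal((5, 3))   # 5 points in 3 dimensions

# Naive reference: explicit loop over every pair of points
naive = np.array([[np.sum((z - x) ** 2) for x in X_small] for z in Z_small])
fast = pairwise_sq_dists(Z_small, X_small)

print(fast.shape)                 # (4, 5)
print(np.allclose(fast, naive))   # True
```

Both versions compute the same quantity; the vectorized one replaces the Python-level loops with a single matrix product.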

### 2. Compare the results of the two functions: 

In [9]:
t1 = time()
D3 = dist_ss_0(Z, X)
print('half fast set2set running time:', time() - t1, 's')

t1 = time()
D4 = dist_ss_fast(Z, X)
print('fast set2set running time', time() - t1, 's')
print('Result difference:', np.linalg.norm(D3 - D4))

half fast set2set running time: 4.805836915969849 s
fast set2set running time 0.26488399505615234 s
Result difference: 6.173352312305638e-11


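Remark: with a large amount of data, vectorization + broadcasting is much faster. In the run above it took about 0.26 s versus 4.8 s for the loop version (roughly 18 times faster), while the two results differ only by rounding error (norm of the difference about 6e-11). 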
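
The small non-zero result difference reflects floating-point rounding in the expanded formula. One practical consequence is that a squared distance that should be exactly zero can come out as a tiny positive or negative number. The sketch below, on hypothetical data `P` (not part of the original notebook), shows one common way to handle this before taking a square root: clipping at zero with `np.maximum`.

```python
import numpy as np

rng = np.random.default_rng(0)
P = rng.random((50, 20))            # 50 hypothetical points in 20 dimensions

sq = np.sum(P * P, axis=1)          # squared norm of each row
# squared distances from every point of P to every point of P
D2 = sq.reshape(-1, 1) + sq.reshape(1, -1) - 2 * P @ P.T

# The diagonal should be zero in exact arithmetic; with floats it is only close to zero
print(np.abs(np.diag(D2)).max())

D2_clipped = np.maximum(D2, 0.0)    # remove tiny negative values caused by rounding
D = np.sqrt(D2_clipped)             # actual distances, no NaN from negative inputs
```

Clipping is only needed when the distances themselves are required; for ranking neighbours, the squared values can be compared directly.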

In [10]:
a = np.array([[1, 2, 3, 4],    
     [5, 6, 7, 8], 
     [2, 3, 4, 5]])

In [11]:
a.reshape(-1, 1)   # N x 1 vector 

array([[1],
       [2],
       [3],
       [4],
       [5],
       [6],
       [7],
       [8],
       [2],
       [3],
       [4],
       [5]])

In [12]:
a.reshape(1, -1)    # 1 x N vector 

array([[1, 2, 3, 4, 5, 6, 7, 8, 2, 3, 4, 5]])

In [13]:
a.reshape(-1, 1) + a.reshape(1, -1)     # broadcasting: each element of the column vector is added to every element of the row vector 

array([[ 2,  3,  4,  5,  6,  7,  8,  9,  3,  4,  5,  6],
       [ 3,  4,  5,  6,  7,  8,  9, 10,  4,  5,  6,  7],
       [ 4,  5,  6,  7,  8,  9, 10, 11,  5,  6,  7,  8],
       [ 5,  6,  7,  8,  9, 10, 11, 12,  6,  7,  8,  9],
       [ 6,  7,  8,  9, 10, 11, 12, 13,  7,  8,  9, 10],
       [ 7,  8,  9, 10, 11, 12, 13, 14,  8,  9, 10, 11],
       [ 8,  9, 10, 11, 12, 13, 14, 15,  9, 10, 11, 12],
       [ 9, 10, 11, 12, 13, 14, 15, 16, 10, 11, 12, 13],
       [ 3,  4,  5,  6,  7,  8,  9, 10,  4,  5,  6,  7],
       [ 4,  5,  6,  7,  8,  9, 10, 11,  5,  6,  7,  8],
       [ 5,  6,  7,  8,  9, 10, 11, 12,  6,  7,  8,  9],
       [ 6,  7,  8,  9, 10, 11, 12, 13,  7,  8,  9, 10]])
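
The same broadcasting pattern is what combines the two vectors of squared norms in `dist_ss_fast`. A minimal sketch with two vectors of different lengths follows; `z2` and `x2` are made-up stand-ins for `Z2` and `X2`, chosen so the result is easy to read.

```python
import numpy as np

z2 = np.array([1.0, 2.0, 3.0])            # stands in for Z2, shape (M,) with M = 3
x2 = np.array([10.0, 20.0, 30.0, 40.0])   # stands in for X2, shape (N,) with N = 4

# (M, 1) + (1, N) broadcasts to (M, N): S[i, j] = z2[i] + x2[j]
S = z2.reshape(-1, 1) + x2.reshape(1, -1)

print(S.shape)   # (3, 4)
print(S)
# [[11. 21. 31. 41.]
#  [12. 22. 32. 42.]
#  [13. 23. 33. 43.]]

# np.add.outer builds the same table of pairwise sums
print(np.array_equal(S, np.add.outer(z2, x2)))   # True
```

Row i of `S` holds `z2[i]` added to every entry of `x2`, which matches the layout of the M x N distance matrix.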