# Homework 2


Hồng Thanh Hoài - 1612855

---

## 1-2: Hoeffding Inequality

**Import the required libraries**

In [1]:
import numpy as np
from math import exp

**Generate the experiment results**

In [2]:
%%time
n_coins = 1000 # number of coins
N = 10 # number of flipping for each coin
mu = 0.5
n_repeats = 100000 # run the experiment 100,000 times

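# Divide by N to get nu, the fraction of heads obtained
# in each run for each coin.
exper = np.random.binomial(N, mu, (n_repeats, n_coins)) / N

# c_1: the first coin; c_rand: a coin chosen at random in each run
# (one random column index per row, not many columns of the last row).
v1 = exper[:, 0]
v_rand = exper[np.arange(n_repeats), np.random.choice(n_coins, size=n_repeats)]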
v_min = np.min(exper, axis=1)

CPU times: total: 4.03 s
Wall time: 4.04 s


**Compute the mean of the $\nu$ values of coin $c_{min}$**

In [3]:
print('mean_vmin = %f' % (np.mean(v_min)))

mean_vmin = 0.037779


<font color=blue>**Question 1: The result is closest to answer [b] 0.01.**</font>
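As a sanity check on the simulated value, the expectation of $\nu_{min}$ can also be computed exactly from the binomial distribution, using $E[\min] = \sum_{k \ge 1} P(\min \ge k)$ and the independence of the coins. This is a minimal sketch that is not part of the original solution; `expected_nu_min` is a hypothetical helper name.

```python
from math import comb

def expected_nu_min(n_coins=1000, n_flips=10, mu=0.5):
    """Exact E[nu_min] for independent coins (hypothetical helper)."""
    # Binomial pmf of the number of heads for a single coin
    pmf = [comb(n_flips, k) * mu**k * (1 - mu)**(n_flips - k)
           for k in range(n_flips + 1)]
    expected_min_heads = 0.0
    for k in range(1, n_flips + 1):
        # P(one coin shows at least k heads)
        tail = sum(pmf[k:])
        # The coins are independent, so P(min >= k) = tail ** n_coins
        expected_min_heads += tail ** n_coins
    # Convert the expected minimum head count into a fraction of heads
    return expected_min_heads / n_flips

print('expected nu_min = %f' % expected_nu_min())
```

The exact value is about 0.0376, which agrees with the simulated mean above and is closest to 0.01 among the answer choices.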

**Check whether a coin obeys the Hoeffding inequality**

In [4]:
def check_hoeffding(v, mu):
    """
    Check whether a coin's distribution of nu satisfies the (single-bin) Hoeffding Inequality.
    Uses the global N (number of flips per coin) in the bound.
    
    Parameters
    ----------
    v : numpy array, shape (n_repeats,)
        The values of nu (fraction of heads) observed for the coin, one per run.
    mu : float
        The true probability of heads (0.5 for a fair coin).
    
    Returns
    -------
    satisfy : boolean
        The result after checking.
    """
    # Init satisfy
    satisfy = True
    for epsilon in np.arange(0.01, 0.5, 0.01):
        if np.mean(abs(v - mu) > epsilon) > 2*exp(-2*N*epsilon**2):
            satisfy = False
            break
    return satisfy

**We check $c_{1}$, $c_{rand}$ and $c_{min}$**

In [5]:
print('check satisfying hoeffding inequality')
print('c1: \t%s' % (check_hoeffding(v1, mu)))
print('c_rand: %s' % (check_hoeffding(v_rand, mu)))
print('c_min: \t%s' % (check_hoeffding(v_min, mu)))

check satisfying hoeffding inequality
c1: 	True
c_rand: True
c_min: 	False


<font color=blue>**Question 2: The result is answer [d] $c_{1}$ and $c_{rand}$.**</font>
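The reason $c_{min}$ fails is that it is selected after looking at the data, so the single-bin bound does not apply to it. The sketch below illustrates this at one value of $\epsilon$ on a much smaller simulation; the seed, the number of runs, and $\epsilon = 0.35$ are assumptions chosen for illustration, not values from the original experiment.

```python
import numpy as np
from math import exp

rng = np.random.default_rng(0)  # fixed seed, an assumption for reproducibility
n_coins, n_flips, mu = 1000, 10, 0.5
n_repeats = 2000  # far fewer runs than the experiment above, to keep it light

nu = rng.binomial(n_flips, mu, (n_repeats, n_coins)) / n_flips
nu_1 = nu[:, 0]           # first coin
nu_min = nu.min(axis=1)   # coin with the fewest heads in each run

epsilon = 0.35
bound = 2 * exp(-2 * n_flips * epsilon**2)
# Empirical probability that nu deviates from mu by more than epsilon
p_1 = np.mean(np.abs(nu_1 - mu) > epsilon)
p_min = np.mean(np.abs(nu_min - mu) > epsilon)

print('bound = %f, c_1: %f, c_min: %f' % (bound, p_1, p_min))
```

For $c_{1}$ the empirical probability stays below the bound, while for $c_{min}$ it is close to 1, because the minimum over 1000 coins is almost always 0 or 0.1.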

## 3-4: Error and Noise

<font color=blue>**Question 3:**</font>
The problem asks for the probability of error when $h$ approximates the noisy target $y$.
Here $\mu$ is the probability of error when $h$ approximates the deterministic function $f$, and $\lambda = P(y = f(x))$.
With noise, $h$ disagrees with $y$ in two cases:
+ $h(x)=f(x)$ (the hypothesis $h$ is correct) but $y\neq f(x)$, with probability $(1-\mu)*(1-\lambda)$
+ $h(x)\neq f(x)$ but $y=f(x)$, with probability $\mu*\lambda$  

Therefore, the probability of error when $h$ approximates $y$ is $P=(1-\mu)*(1-\lambda)+\mu*\lambda$  
<font color=blue>**So the correct answer is [e] $(1-\lambda)*(1-\mu)+\mu*\lambda$.**</font>
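The formula can be checked with a short Monte Carlo simulation. This is a sketch under stated assumptions: the error of $h$ and the noise are drawn independently, and the values of $\mu$ and $\lambda$ below are arbitrary illustrations, not part of the problem.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, an assumption for reproducibility
mu, lam = 0.2, 0.8              # illustrative values, not from the problem
n = 200_000

h_wrong = rng.random(n) < mu        # h(x) != f(x) with probability mu
y_flipped = rng.random(n) >= lam    # y != f(x) with probability 1 - lambda

# With binary labels, h(x) != y exactly when one of the two events occurs
p_sim = np.mean(h_wrong ^ y_flipped)
p_formula = (1 - mu) * (1 - lam) + mu * lam

print('simulated = %f, formula = %f' % (p_sim, p_formula))
```

When both events occur at once, $h$ and $y$ are both the opposite of $f(x)$ and therefore agree, which is why that case does not count as an error.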

<font color=blue>**Question 4:**</font>
At $\lambda=0.5$, the terms in $\mu$ cancel out of $P$, so the performance of $h$ is independent of $\mu$. In that case, even if $h$ matches $f$ $100\%$ of the time, it is still wrong $50\%$ of the time (because of the noise). Conversely, if $h$ disagrees with $f$ $100\%$ of the time, it is still right $50\%$ of the time.  
<font color=blue>**So the correct answer is [b] 0.5.**</font>
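
Expanding $P$ from Question 3 makes the cancellation explicit:

$$P = (1-\mu)(1-\lambda) + \mu\lambda = (1-\lambda) + \mu\,(2\lambda - 1)$$

The coefficient of $\mu$ is $2\lambda - 1$, which vanishes only at $\lambda = 0.5$, and there $P = 1 - \lambda = 0.5$.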

## 5-7: Linear Regression

**Function that generates `target_w`, the parameter vector of $f$**

In [6]:
def generate_target_w():
    """
    Generates target_w from two random, uniformly distributed points in [-1, 1] x [-1, 1].
    
    Returns
    -------
    target_w : numpy array, shape (3, 1) 
        The vector of parameters of f.
    """
    # Generate two points from a uniform distribution over [-1, 1]x[-1, 1]
    p1 = np.random.uniform(-1, 1, 2)
    p2 = np.random.uniform(-1, 1, 2)
    # Compute the target W from these two points
    target_w = np.array([p1[1]*p2[0] - p1[0]*p2[1], p2[1] - p1[1], p1[0] - p2[0]]).reshape((-1, 1))
    
    return target_w
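The expression for `target_w` follows from taking $(w_1, w_2)$ as a normal vector to the direction $p_2 - p_1$ and choosing $w_0$ so that $p_1$ lies on the line. The sketch below checks numerically that both points satisfy $w_0 + w_1 x_1 + w_2 x_2 = 0$; `line_through` is a hypothetical name and the seed is an assumption.

```python
import numpy as np

def line_through(p1, p2):
    """Weights [w0, w1, w2] of the line w0 + w1*x1 + w2*x2 = 0 through p1 and p2."""
    # (w1, w2) is normal to the direction p2 - p1;
    # w0 is chosen so that p1 lies on the line
    return np.array([p1[1]*p2[0] - p1[0]*p2[1], p2[1] - p1[1], p1[0] - p2[0]])

rng = np.random.default_rng(0)  # fixed seed, an assumption for reproducibility
p1 = rng.uniform(-1, 1, 2)
p2 = rng.uniform(-1, 1, 2)
w = line_through(p1, p2)

# Both points should evaluate to (numerically) zero
r1 = np.dot(w, [1, p1[0], p1[1]])
r2 = np.dot(w, [1, p2[0], p2[1]])
print(r1, r2)
```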

**Function that generates the data set**

In [7]:
def generate_data(N, target_w):
    """
    Generates a data set by generating random inputs and then using target_w to generate the 
    corresponding outputs.
    
    Parameters
    ----------
    N : int
        The number of examples.
    target_w : numpy array, shape (3, 1) 
        The vector of parameters of f.
    
    Returns
    -------
    X : numpy array, shape (N, 3)
        The matrix of input vectors (each row corresponds to an input vector); the first column of 
        this matrix is all ones.
    ys : numpy array, shape (N, 1)
        The vector of outputs.        
    """
    bad_data = True # `bad_data = True` means: data contain points on the target line 
                    # (this rarely happens, but just to be careful)
                    # -> y's of these points = 0 (with np.sign); 
                    #    we don't want this (y's of data must be -1 or 1)
                    # -> re-generate data until `bad_data = False`
    
    while bad_data == True:
        X = np.random.uniform(-1, 1, (N, 2))
        X = np.hstack((np.ones((N, 1)), X)) # Add 'ones' column
        ys = np.sign(np.dot(X, target_w))
        if (0 not in ys): # Good data
            bad_data = False
    
    return X, ys

**Function that runs Linear Regression**

In [8]:
def run_LR(X, ys):
    """
    Runs Linear Regression (least squares via the pseudo-inverse).
    
    Parameters
    ----------
    X : numpy array, shape (N, 3)
        The matrix of input vectors (each row corresponds to an input vector); the first column of 
        this matrix is all ones.
    ys : numpy array, shape (N, 1)
        The vector of outputs.
    
    Returns
    -------
    w : numpy array, shape (3, 1) 
        The vector of parameters of g.
    """
    X_dagger = np.dot(np.linalg.pinv(np.dot(X.T, X)), X.T)
    w = np.dot(X_dagger, ys)
    return w
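For reference, the same least-squares solution can be obtained without forming $X^TX$ explicitly. The sketch below compares the formula used in `run_LR` with `np.linalg.pinv(X)` and `np.linalg.lstsq` on random data; the data, labels, and seed are assumptions made only for the comparison.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, an assumption for reproducibility
N = 100
X = np.hstack((np.ones((N, 1)), rng.uniform(-1, 1, (N, 2))))
ys = np.sign(rng.uniform(-1, 1, (N, 1)))  # arbitrary +/-1 labels for the comparison

# Same formula as run_LR: (X^T X)^+ X^T y
w_normal = np.dot(np.dot(np.linalg.pinv(np.dot(X.T, X)), X.T), ys)
# Pseudo-inverse of X applied directly
w_pinv = np.dot(np.linalg.pinv(X), ys)
# Least-squares solver
w_lstsq = np.linalg.lstsq(X, ys, rcond=None)[0]

print(np.allclose(w_normal, w_pinv), np.allclose(w_normal, w_lstsq))
```

All three agree when `X` has full column rank, which is the typical case for randomly generated points.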

In [9]:
target_w = generate_target_w()
X, ys = generate_data(2, target_w)
w = run_LR(X, ys)
X

array([[ 1.        ,  0.83941761, -0.24939166],
       [ 1.        ,  0.85613256,  0.61430973]])

**Function for Questions 5 and 6**

In [10]:
def main_LR_5_6():
    """
    Runs Linear Regression on n_in training points over num_runs runs and prints the
    average in-sample and out-of-sample binary errors. Takes no parameters.
    """
    num_runs = 1000
    # The average in-sample error of g - the final hypothesis picked by Linear Regression
    avg_ein = 0.0
    # The average out-of-sample error of g
    avg_eout = 0.0
    # Number of in-sample points
    n_in = 100
    # Number of out-sample points (estimation purpose)
    n_out = 1000
    
    for r in range(num_runs):
        # Generate target_w
        target_w = generate_target_w()
        # Generate training set
        X_in, y_in = generate_data(n_in, target_w)
        # Generate out-of-sample dataset
        X_out, y_out = generate_data(n_out, target_w)
        
        # Run Linear Regression to pick g
        w = run_LR(X_in, y_in)
        
        # Predict on training set with found w
        predictions_in = np.dot(X_in, w)
        # Predict on out-of-sample set with found w
        predictions_out = np.dot(X_out, w)
        
        # Compute binary error between y_in/ y_out - correct output & predictions
        ein = np.mean(y_in != np.sign(predictions_in))
        eout = np.mean(y_out != np.sign(predictions_out))
        
        # Update average error
        avg_ein += (ein * 1.0 / num_runs)
        avg_eout += (eout * 1.0 / num_runs)
    
    # Print results
    print('avg_ein = %f' % (avg_ein))
    print('avg_eout = %f' % (avg_eout))

In [11]:
main_LR_5_6()

avg_ein = 0.038040
avg_eout = 0.047869


<font color=blue>**Question 5: The result is closest to answer [c] 0.01.**</font>

<font color=blue>**Question 6: The result is closest to answer [c] 0.01.**</font>

**Use Linear Regression to find the initial `w` for PLA**

In [12]:
def run_PLA(X, ys):
    """
    Runs PLA, starting from the weights found by Linear Regression.
    
    Parameters
    ----------
    X : numpy array, shape (N, 3)
        The matrix of input vectors (each row corresponds to an input vector); the first column of 
        this matrix is all ones.
    ys : numpy array, shape (N, 1)
        The vector of outputs.
    
    Returns
    -------
    iteration : int
        The number of passes PLA makes before converging (the final pass, which only
        confirms convergence, is counted as well).
    """
    # Init w using Linear Regression
    w = run_LR(X, ys)
    iteration = 0
    
    while True:
        iteration += 1
        # Compute sign for each input vector with current w
        h = np.sign(np.dot(X, w))
        
        # Find the first misclassified point, if any
        for i in range(0, X.shape[0]):
            if h[i] != ys[i]:
                change = X[i]*ys[i]
                # Update w to move toward classifying X[i] correctly
                w = w + change.reshape((-1, 1))
                break
        
        # Check if converged
        if np.array_equal(h, ys) == True:
            break
    
    return iteration
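A common variant of PLA updates on a randomly chosen misclassified point and counts only the weight updates. The sketch below shows that variant under stated assumptions: `pla_random` is a hypothetical helper, it uses 1-D label arrays rather than the `(N, 1)` arrays used above, and the target line and seed are arbitrary.

```python
import numpy as np

def pla_random(X, ys, w_init, rng, max_updates=100_000):
    """PLA that updates on a random misclassified point (hypothetical helper).

    Returns the final weights and the number of weight updates.
    """
    w = np.asarray(w_init, dtype=float).copy()
    n_updates = 0
    while True:
        misclassified = np.flatnonzero(np.sign(X @ w) != ys)
        if misclassified.size == 0:
            return w, n_updates
        if n_updates >= max_updates:
            raise RuntimeError('PLA did not converge within max_updates')
        # Pick one misclassified point at random and move w toward it
        i = rng.choice(misclassified)
        w = w + ys[i] * X[i]
        n_updates += 1

rng = np.random.default_rng(0)  # fixed seed, an assumption for reproducibility
N = 10
target_w = np.array([0.1, 1.0, -1.0])  # arbitrary separating line for the demo
X = np.hstack((np.ones((N, 1)), rng.uniform(-1, 1, (N, 2))))
ys = np.sign(X @ target_w)

w_lr = np.linalg.pinv(X) @ ys  # Linear Regression weights as the starting point
w, n_updates = pla_random(X, ys, w_lr, rng)
print('updates = %d' % n_updates)
```

With this counting convention, a run in which the Linear Regression weights already separate the data reports 0 updates.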

**Main function (PLA)**

In [13]:
def main_PLA(N):
    """
    Parameters
    ----------
    N : int
        The number of training examples.
    """
    num_runs = 1000
    # The average number of iterations PLA takes to converge
    avg_num_iterations = 0.0
    
    for r in range(num_runs):
        # Generate target_w
        target_w = generate_target_w()
        
        # Generate training set
        X, ys = generate_data(N, target_w)
        
        # Run PLA until it completely separates all the in-sample points
        num_iterations = run_PLA(X, ys)
        
        # Update average num_iterations
        avg_num_iterations += (num_iterations * 1.0 / num_runs)
    
    # Print results
    print('avg_num_iterations = %f' % (avg_num_iterations))

**Run with $N=10$**

In [14]:
main_PLA(N=10)

avg_num_iterations = 7.175000


<font color=blue>**Question 7: The result is closest to answer [a] 1.**</font>

## 8-10: Nonlinear Transformation

**Function that generates a training set with noise level `noise`; the target function is $f(x_{1}, x_{2})=sign(x_{1}^2+x_{2}^2-0.6)$**

In [15]:
def generate_data_with_noise(N, noise):
    """
    Generates a data set by generating random inputs and then using the target function
    f(x1, x2) = sign(x1^2 + x2^2 - 0.6) to generate the corresponding outputs, a fraction of 
    which is then flipped to simulate noise.
    
    Parameters
    ----------
    N : int
        The number of examples.
    noise : float
        The fraction of outputs whose sign is flipped.
    
    Returns
    -------
    X : numpy array, shape (N, 3)
        The matrix of input vectors (each row corresponds to an input vector); the first column of 
        this matrix is all ones.
    ys : numpy array, shape (N,)
        The vector of outputs.        
    """
    bad_data = True # `bad_data = True` means: data contain points on the target line 
                    # (this rarely happens, but just to be careful)
                    # -> y's of these points = 0 (with np.sign); 
                    #    we don't want this (y's of data must be -1 or 1)
                    # -> re-generate data until `bad_data = False`
    
    while bad_data == True:
        X = np.random.uniform(-1, 1, (N, 2))
        X = np.hstack((np.ones((N, 1)), X)) # Add 'ones' column
        ys = np.sign(X[:, 1]**2 + X[:, 2]**2 - 0.6)
        if (0 not in ys): # Good data
            bad_data = False
            
    # Generate simulated noise by flipping the sign of the output randomly
    indices = np.random.choice(np.arange(N), size=int(noise*N), replace=False)
    ys[indices] = -ys[indices]
    
    return X, ys

**Function that maps the array `X` of original input vectors $x$ to the array `Z` of feature vectors $z$. As specified in the problem, $z = [1, x_1, x_2, x_1x_2, x_1^2, x_2^2]^T$**

In [16]:
def transform_to_nonlinear(X, ys):
    """
    Transform the input vectors into the nonlinear feature vectors 
    z = [1, x1, x2, x1*x2, x1^2, x2^2].
    
    Parameters
    ----------
    X : numpy array, shape (N, 3)
        The matrix of input vectors (each row corresponds to an input vector); the first column of 
        this matrix is all ones.
    ys : numpy array, shape (N,)
        The vector of outputs (returned unchanged).
        
    Returns
    -------
    Z : numpy array, shape (N, 6)
        The matrix of data after transforming (each row corresponds to an input vector);
        the first column of this matrix is all ones.
    ys : numpy array, shape (N,)
        The vector of correct outputs.     
    """
    x1x2 = np.array([X[:, 1]*X[:, 2]])
    # x1 square
    x1sqr = np.array([X[:, 1]**2])
    # x2 square
    x2sqr = np.array([X[:, 2]**2])
    Z = np.concatenate((X, x1x2.T, x1sqr.T, x2sqr.T), axis=1)
    print(Z.shape)
    return Z, ys
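The same mapping can be written more compactly with `np.column_stack`, which accepts the 1-D product and square terms directly as columns. This is a minimal sketch on a two-row example; `to_feature_space` is a hypothetical name and the demo values are arbitrary.

```python
import numpy as np

def to_feature_space(X):
    """Map rows [1, x1, x2] to [1, x1, x2, x1*x2, x1^2, x2^2] (hypothetical helper)."""
    x1, x2 = X[:, 1], X[:, 2]
    # 1-D arrays are stacked as extra columns next to the original X
    return np.column_stack((X, x1 * x2, x1**2, x2**2))

X_demo = np.array([[1.0, 0.5, -0.2],
                   [1.0, -1.0, 1.0]])
Z_demo = to_feature_space(X_demo)
print(Z_demo)
```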

**Function for Question 8**

In [17]:
def main_LR_8(N, noise):
    """
    Parameters
    ----------
    X   : numpy array, shape (N, 3)
        The matrix of input vectors (each row corresponds to an input vector); the first column of 
        this matrix is all ones.
    ys  : numpy array, shape (N, 1)
        The vector of outputs.
    w   : numpy array
        The vector of parameters of g.
    """
    num_runs = 1000
    # The average test error of g - the final hypothesis picked by Linear Regression
    avg_err = 0.0
    
    for r in range(num_runs):
        # Generate training set with noise & specific target function
        X, ys = generate_data_with_noise(N, noise)
        
        # Run Linear Regression (without any transformation) to pick g
        w = run_LR(X, ys)
        
        # Prediction on X with w found
        predictions = np.dot(X, w)
        
        # Compute binary error between ys - correct output & predictions
        err = np.mean(ys != np.sign(predictions))
        
        # Update average error
        avg_err += (err * 1.0 / num_runs)
    
    # Print results
    print('avg_err = %f' % (avg_err))

**Run with $N=1000$, $noise=0.1$**

In [18]:
main_LR_8(N=1000, noise=0.1)

avg_err = 0.505361


<font color=blue>**Question 8: The result is closest to answer [d] 0.5.**</font>

**We find `w` after transforming to the _nonlinear feature vector_**

In [19]:
num_runs = 100
avg_w = 0.0
N = 1000
noise = 0.1

for r in range(num_runs):
    X, ys = generate_data_with_noise(N, noise)
    Z, ys = transform_to_nonlinear(X, ys)
    avg_w += (run_LR(Z, ys) * 1.0 / num_runs)

print(avg_w)

(1000, 6)


**We compare the error of the `w` found with that of the `w` in each answer choice**

In [20]:
num_runs = 1000
N = 1000
noise = 0.1

avg_err_w_found = 0.0
avg_err_w_a = 0.0
avg_err_w_b = 0.0
avg_err_w_c = 0.0
avg_err_w_d = 0.0
avg_err_w_e = 0.0

w_found = avg_w
w_a = np.array([-1, -0.05, 0.08, 0.13, 1.5, 1.5])
w_b = np.array([-1, -0.05, 0.08, 0.13, 1.5, 15])
w_c = np.array([-1, -0.05, 0.08, 0.13, 15, 1.5])
w_d = np.array([-1, -1.5, 0.08, 0.13, 0.05, 0.05])
w_e = np.array([-1, -0.05, 0.08, 1.5, 0.15, 0.15])
    
for r in range(num_runs): 
    X, ys = generate_data_with_noise(N, noise)
    Z, ys = transform_to_nonlinear(X, ys)
    
    # Predict on Z with w found
    predictions_w_found = np.dot(Z, w_found)
    # Predict on Z with w_a
    predictions_w_a = np.dot(Z, w_a)
    # Predict on Z with w_b
    predictions_w_b = np.dot(Z, w_b)
    # Predict on Z with w_c
    predictions_w_c = np.dot(Z, w_c)
    # Predict on Z with w_d
    predictions_w_d = np.dot(Z, w_d)
    # Predict on Z with w_e
    predictions_w_e = np.dot(Z, w_e)
        
    # Compute binary error between ys - correct output & predictions
    err_w_found = np.mean(ys != np.sign(predictions_w_found))
    err_w_a = np.mean(ys != np.sign(predictions_w_a))
    err_w_b = np.mean(ys != np.sign(predictions_w_b))
    err_w_c = np.mean(ys != np.sign(predictions_w_c))
    err_w_d = np.mean(ys != np.sign(predictions_w_d))
    err_w_e = np.mean(ys != np.sign(predictions_w_e))
        
    # Update average error
    avg_err_w_found += (err_w_found * 1.0 / num_runs)
    avg_err_w_a += (err_w_a * 1.0 / num_runs)
    avg_err_w_b += (err_w_b * 1.0 / num_runs)
    avg_err_w_c += (err_w_c * 1.0 / num_runs)
    avg_err_w_d += (err_w_d * 1.0 / num_runs)
    avg_err_w_e += (err_w_e * 1.0 / num_runs)
    
# Print results
print('avg_err_w_found = %f' % (avg_err_w_found))
print('a: %f\t b: %f\t c: %f\t d: %f\t e: %f\t' % (avg_err_w_a, avg_err_w_b, avg_err_w_c, avg_err_w_d, avg_err_w_e))

(1000, 6)


<font color=blue>**Question 9: The result is closest to answer [a] $g(x_{1},x_{2}) = sign(-1-0.05x_{1} + 0.08x_{2} + 0.13x_{1}x_{2} + 1.5x_{1}^2 + 1.5x_{2}^2)$.**</font>
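
The per-candidate bookkeeping in the comparison cell can be expressed more compactly with a dictionary. The sketch below evaluates the five candidate weight vectors on a single noisy sample rather than averaging over many runs; the seed and the single-sample setup are assumptions, and the candidate values are the ones listed in the cell above.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, an assumption for reproducibility
N, noise = 1000, 0.1

# Noisy sample from the target sign(x1^2 + x2^2 - 0.6)
x1, x2 = rng.uniform(-1, 1, (2, N))
ys = np.sign(x1**2 + x2**2 - 0.6)
flip = rng.choice(N, size=int(noise * N), replace=False)
ys[flip] = -ys[flip]
Z = np.column_stack((np.ones(N), x1, x2, x1 * x2, x1**2, x2**2))

candidates = {
    'a': [-1, -0.05, 0.08, 0.13, 1.5, 1.5],
    'b': [-1, -0.05, 0.08, 0.13, 1.5, 15],
    'c': [-1, -0.05, 0.08, 0.13, 15, 1.5],
    'd': [-1, -1.5, 0.08, 0.13, 0.05, 0.05],
    'e': [-1, -0.05, 0.08, 1.5, 0.15, 0.15],
}
# Binary error of each candidate against the noisy labels
errors = {name: np.mean(ys != np.sign(Z @ np.array(w)))
          for name, w in candidates.items()}
best = min(errors, key=errors.get)
print(errors, best)
```

Candidate [a] corresponds to a boundary close to the circle $x_1^2 + x_2^2 = 0.6$, so it should show the lowest error on such a sample.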

**$E_{out}$ for the `w` found in Question 9 (which is what Question 10 asks for) is the value of `avg_err_w_found` in the code block above:**
**$E_{out}\approx 0.1$**

<font color=blue>**Question 10: The result is closest to answer [b] 0.1.**</font>
