# Machine Learning: Programming Exercise 2

## 1 Write a function to generate an m+1 dimensional data set, of size n, consisting of m continuous independent variables (X) and one dependent binary variable (Y) defined as

$Y = \begin{cases} 
      1 & \text{if } p(y = 1|\mathbf{x}) = \frac{1}{1 + \exp(-\mathbf{x}^\top\mathbf{\beta})} > 0.5\\
      0 & \text{otherwise}
   \end{cases}$

where,  

- β is a random vector of dimensionality m + 1, representing the coefficients of the linear relationship between X and Y, and
- $\forall i \in [1, n], x_{i0} = 1$
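
Because the logistic function is monotonic and equals 0.5 at zero, the rule above amounts to checking the sign of the linear score $\mathbf{x}^\top\mathbf{\beta}$. The sketch below illustrates this on a small random example; the seed, the sizes, and the standard-normal draws are assumptions made only for illustration, not part of the exercise.

```python
import numpy as np

def sigmoid(z):
    """Logistic function 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)           # fixed seed, illustrative only
n, m = 6, 3                              # hypothetical sizes

# Design matrix with x_{i0} = 1 in the first column
X = np.hstack((np.ones((n, 1)), rng.standard_normal((n, m))))
beta = rng.standard_normal(m + 1)        # one coefficient per column of X

scores = X @ beta
labels_from_probability = (sigmoid(scores) > 0.5).astype(int)
labels_from_score = (scores > 0).astype(int)

# Both rules should assign the same label to every row
print(np.array_equal(labels_from_probability, labels_from_score))
```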

To add noise to the generated labels (Y), each label is flipped independently according to a Bernoulli distribution with probability of success θ. The larger the value of θ, the more labels are flipped.
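
One way to picture the flipping step is as an XOR between the clean labels and a Bernoulli(θ) mask. The snippet below is a minimal sketch under that reading; the values of `theta` and `n`, the seed, and the variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(42)          # fixed seed, illustrative only
theta = 0.2                              # assumed flip probability
n = 10_000                               # assumed number of labels

clean = rng.integers(0, 2, size=n)       # stand-in for the noise-free labels
flip = rng.binomial(1, theta, size=n)    # 1 marks a label to be flipped
noisy = np.bitwise_xor(clean, flip)      # 0 -> 1 and 1 -> 0 wherever flip == 1

# The observed fraction of flipped labels should be close to theta
flip_rate = np.mean(noisy != clean)
print(flip_rate)
```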

The function should take the following parameters:  

- θ: The probability of flipping the label, Y  
- n: The size of the data set
- m: The number of independent variables  

Output from the function should be:  

- X: An n × (m + 1) numpy array of independent variable values (with a 1 in the first column)
- Y: The n × 1 binary numpy array of output values  
- β: The random coefficients used to generate Y from X




In [None]:
import numpy as np

def generate_data(theta, datasetSize, m):
    # theta: probability of flipping each label Y
    # datasetSize: number of rows n; m: number of independent variables
    # Standard-normal draws are used so that X @ beta takes both signs. With
    # uniform [0, 1) draws every score is non-negative, so every label would be 1.
    beta = np.random.randn(m + 1)
    X = np.random.randn(datasetSize, m)
    X = np.hstack((np.ones((datasetSize, 1)), X))      # prepend the x_0 = 1 column
    h = 1 / (1 + np.exp(-X @ beta))                    # p(y = 1 | x)
    Y = (h > 0.5).astype(float)                        # threshold the probability at 0.5
    noise = np.random.binomial(1, theta, datasetSize)  # 1 marks a label to flip
    Y = np.where(noise == 1, 1 - Y, Y)                 # flip Y wherever noise is 1
    Y = Y.reshape(datasetSize, 1)
    return X, Y, beta

generate_data(0.5, 5, 3)

## 2 Write a function that learns the parameters of a logistic regression function given inputs  

- X: An n × m numpy array of independent variable values  
- Y: The n × 1 binary numpy array of output values
- k: the number of iterations (epochs)  
- τ: the threshold on change in Cost function value from the previous to current iteration 
- λ: the learning rate for Gradient Descent  

The function should implement the Gradient Descent algorithm as discussed in class: initialise β with random values, then update these values in each iteration by stepping against the partial derivative of the cost function with respect to each of the coefficients. The function should use only one loop, which ends after k iterations or once the change in cost function value falls below the threshold τ.
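
The exercise text does not write out the cost function. Assuming the one discussed in class is the usual average cross-entropy (negative log-likelihood), with $h_i = \frac{1}{1 + \exp(-\mathbf{x}_i^\top\mathbf{\beta})}$, the quantities involved would be:

$$J(\mathbf{\beta}) = -\frac{1}{n}\sum_{i=1}^{n}\left[\, y_i \log h_i + (1 - y_i)\log(1 - h_i) \,\right]$$

$$\frac{\partial J}{\partial \beta_j} = \frac{1}{n}\sum_{i=1}^{n}(h_i - y_i)\, x_{ij}$$

$$\beta_j \leftarrow \beta_j - \lambda\,\frac{\partial J}{\partial \beta_j}, \qquad j = 0, \dots, m$$

Under this reading, the loop stops at iteration $t$ when $t = k$ or when $\left|J^{(t)} - J^{(t-1)}\right| < \tau$, whichever comes first.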

The output should be an (m + 1)-dimensional vector of coefficients and the final cost function value.  
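
Before trusting a gradient-descent loop, it can help to compare the analytic gradient against a finite-difference estimate. The sketch below assumes the average cross-entropy cost; the helper names (`cost`, `gradient`, `numerical_gradient`), the seed, and the data sizes are hypothetical and chosen only for illustration.

```python
import numpy as np

def sigmoid(z):
    """Logistic function 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def cost(beta, X, y):
    """Average cross-entropy (assumed form of the cost function)."""
    h = sigmoid(X @ beta)
    return -np.mean(y * np.log(h) + (1 - y) * np.log(1 - h))

def gradient(beta, X, y):
    """Analytic gradient of the cost: X^T (h - y) / n."""
    return X.T @ (sigmoid(X @ beta) - y) / X.shape[0]

def numerical_gradient(beta, X, y, eps=1e-6):
    """Central finite-difference estimate, one coefficient at a time."""
    grad = np.zeros_like(beta)
    for j in range(beta.size):
        step = np.zeros_like(beta)
        step[j] = eps
        grad[j] = (cost(beta + step, X, y) - cost(beta - step, X, y)) / (2 * eps)
    return grad

rng = np.random.default_rng(1)                    # fixed seed, illustrative only
n, m = 50, 3                                      # hypothetical sizes
X = np.hstack((np.ones((n, 1)), rng.standard_normal((n, m))))
y = rng.integers(0, 2, size=n).astype(float)      # arbitrary binary labels
beta = rng.standard_normal(m + 1)

# The two gradients should agree up to finite-difference error
max_abs_diff = np.max(np.abs(gradient(beta, X, y) - numerical_gradient(beta, X, y)))
print(max_abs_diff)
```

If the two gradients disagree noticeably, the usual suspects are a shape mismatch between the predictions and the labels or a sign error in the update.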


In [126]:
import numpy as np

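def logistic_regression(X, Y, k, tau, lamda):
    # k: maximum number of iterations (epochs)
    # tau: threshold on the change in cost between consecutive iterations
    # lamda: learning rate for gradient descent
    n, m = X.shape
    X = np.insert(X, 0, 1, axis=1)        # prepend the intercept column; X is expected without it
    Y = np.asarray(Y).reshape(-1)         # flatten n x 1 labels so that H - Y stays length n
    beta = np.random.rand(m + 1)
    cost = float("inf")

    for i in range(k):
        Z = np.dot(X, beta)
        H = 1 / (1 + np.exp(-Z))
        error = H - Y
        gradient = np.dot(X.T, error) / n

        H_safe = np.clip(H, 1e-12, 1 - 1e-12)   # avoid log(0) in the cost
        cost_new = -np.mean(Y * np.log(H_safe) + (1 - Y) * np.log(1 - H_safe))
        converged = abs(cost - cost_new) < tau
        cost = cost_new                   # keep the cost that matches the current beta
        if converged:
            break
        beta -= lamda * gradient

    return beta, cost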

y = np.array([0, 1, 1])            # labels must be binary (0 or 1)
x = np.arange(15).reshape(3, 5)    # 3 samples, 5 independent variables
k = 2000
t = 0.3
l = 0.01
logistic_regression(x, y, k, t, l)

3
