# Regularized Logistic Regression
In this part of the exercise, you will implement regularized logistic regression to predict whether microchips from a fabrication plant pass quality assurance (QA). During QA, each microchip goes through various tests to ensure it is functioning correctly.
Suppose you are the product manager of the factory and you have the test results for some microchips on two different tests. From these two tests, you would like to determine whether the microchips should be accepted or rejected. To help you make the decision, you have a dataset of test results on past microchips, from which you can build a logistic regression model.

In [None]:
import numpy as np
import matplotlib.pyplot as plt
!wget https://raw.githubusercontent.com/jortegon/UQROO-Inteligencia-Artificial/master/Regresion/ex2data2.txt
data = np.genfromtxt('ex2data2.txt', delimiter=',') #load the data
X = data[:, :2] # two scores
y = data[:, 2] # accepted (1) or rejected (0)
m = len(y)     # number of samples 

#Just for indexing purposes
X = X.reshape(m,2)
y = y.reshape(m,1)

#see the values for y
print(y[:5])

In [None]:
def plotData(X, y):
    '''PLOTDATA Plots the data points X and y into the current figure
       PLOTDATA(X, y) plots the data points with + for the positive examples
       and o for the negative examples. X is assumed to be an Mx2 matrix.
    '''

    # Boolean masks for positive and negative examples; y is flattened so
    # that an (m, 1) label array selects rows of X correctly
    y_flat = np.ravel(y)
    pos = y_flat == 1
    neg = y_flat == 0
    # Plot Examples
    plt.scatter(X[pos, 0], X[pos, 1], marker='+', label='y=1')
    plt.scatter(X[neg, 0], X[neg, 1], marker='o', label='y=0')
    # Labels and Legend
    plt.xlabel('Microchip Test 1')
    plt.ylabel('Microchip Test 2')

    # Specified in plot order
    plt.legend()

plotData generates a figure where the axes are the two test scores, and the positive (y = 1, accepted) and negative (y = 0, rejected) examples are shown with different markers. The figure shows that our dataset cannot be separated into positive and negative examples by a straight line through the plot. Therefore, a straightforward application of logistic regression will not perform well on this dataset, since logistic regression on the two raw scores can only find a linear decision boundary.


In [None]:
plotData(X,y)

## Feature mapping
One way to fit the data better is to create more features from each data point. In the provided function mapFeature, we will map the features into all polynomial terms of x1 and x2 up to the sixth power.

$$
\mathrm{mapFeature}(x) = \left[ \begin{array}{c} 1 \\ x_1 \\ x_2 \\ x_1^2 \\ x_1 x_2 \\ x_2^2 \\ x_1^3 \\ \vdots \\ x_1 x_2^5 \\ x_2^6 \end{array} \right]
$$
As a result of this mapping, our vector of two features (the scores on two QA tests) has been transformed into a 28-dimensional vector. A logistic regression classifier trained on this higher-dimension feature vector will have a more complex decision boundary and will appear nonlinear when drawn in our 2-dimensional plot.

While the feature mapping allows us to build a more expressive classifier, it is also more susceptible to overfitting. In the next parts of the exercise, you will implement regularized logistic regression to fit the data and also see for yourself how regularization can help combat the overfitting problem.
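
The figure of 28 features can be checked by counting the monomials directly: for each total degree d from 0 to 6 there are d + 1 terms of the form x1^(d-j) * x2^j. The short sketch below does that count; `count_poly_terms` is a helper name introduced here for illustration and is not part of the exercise code.

```python
def count_poly_terms(degree):
    """Count monomials x1^a * x2^b with 0 <= a + b <= degree."""
    # Each total degree d contributes d + 1 monomials (j = 0, ..., d)
    return sum(d + 1 for d in range(degree + 1))

n_terms = count_poly_terms(6)
print(n_terms)  # 28
```

The count includes the constant term 1, which plays the role of the intercept column.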


In [None]:
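def mapFeature(X1, X2):
    ''' MAPFEATURE Feature mapping function to polynomial features

        MAPFEATURE(X1, X2) maps the two input features
        to polynomial features up to the sixth degree, as used in the
        regularization exercise.

        Returns a new feature array with more features, comprising
        1, X1, X2, X1.^2, X1*X2, X2.^2, X1.^3, ..., X1*X2.^5, X2.^6.

        Inputs X1, X2 must be the same size
    '''

    degree = 6
    # Treat scalar inputs as length-1 arrays so the output is always 2-D
    X1 = np.atleast_1d(X1)
    X2 = np.atleast_1d(X2)
    out = [np.ones(len(X1))]  # intercept column (the degree-0 term)
    # Start at degree 1: the degree-0 term is the ones column added above,
    # so starting at 0 would duplicate it and give 29 columns instead of 28
    for i in range(1, degree+1):
        for j in range(i+1):
            out.append( (X1**(i-j)) * (X2**j) )
    return np.array(out).T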

In [None]:
# Note that mapFeature also adds a column of ones for us, so the intercept
# term is handled
X_ext = mapFeature(X[:,0], X[:,1])
print ( X_ext[1:3,:] )

# Initialize fitting parameters
initial_theta = np.zeros(X_ext.shape[1])

# Set regularization parameter lambda to 1
lambda_t = 1


## Cost function and gradient

Now you will implement code to compute the cost function and gradient for regularized logistic regression. Recall that the regularized cost function in logistic regression is
$$ J(\theta) = \frac{1}{m} \sum^m_{i=1} \left[ -y^{(i)}\log(h_\theta(x^{(i)})) - (1-y^{(i)})\log(1-h_\theta(x^{(i)})) \right] + \frac{\lambda}{2m}\sum_{j=1}^n \theta_j^2 $$

Note that you should not regularize the parameter $\theta_0$. The gradient of the cost function is a vector where the $j^{th}$ element is defined as follows:
$$ \frac{\partial J(\theta)}{\partial\theta_j} = \frac{1}{m}\sum^m_{i=1}(h_\theta(x^{(i)}) - y^{(i)})x^{(i)}_j ~~~~~ \mathrm{for} ~ j = 0 $$

$$ \frac{\partial J(\theta)}{\partial\theta_j} = \left( \frac{1}{m}\sum^m_{i=1}(h_\theta(x^{(i)}) - y^{(i)})x^{(i)}_j \right) + \frac{\lambda}{m}\theta_j  ~~~~~ \mathrm{for} ~ j \geq 1 $$
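
The cost and the two gradient cases can also be collected into a vectorized form, which may be convenient when implementing them. The notation here is an assumption of this note rather than part of the exercise: $X$ is the matrix whose rows are the mapped examples, $g$ is the sigmoid applied elementwise, and $\tilde\theta$ is $\theta$ with its first entry ($\theta_0$) set to zero so that the intercept is not regularized.

$$ h = g(X\theta), \qquad J(\theta) = \frac{1}{m}\left[ -y^T \log h - (1-y)^T \log(1-h) \right] + \frac{\lambda}{2m}\tilde\theta^T\tilde\theta, \qquad \nabla J(\theta) = \frac{1}{m}X^T(h - y) + \frac{\lambda}{m}\tilde\theta $$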

In [None]:
def sigmoid(z):
    '''
        SIGMOID Compute sigmoid function
        g = SIGMOID(z) computes the sigmoid of z.
    '''

    # You need to return the following variables correctly 
    if not isinstance(z, np.ndarray):
        z = np.array(z)
    one = np.ones(z.shape)
    g = np.zeros(z.shape)  # placeholder so the cell runs; overwrite it below

    # ====================== YOUR CODE HERE ======================
    # Instructions: Compute the sigmoid of each value of z (z can be a matrix,
    #               vector or scalar).

    return g

In [None]:
def costFunctionReg(theta, X, y, lambda_t):
    '''
        COSTFUNCTIONREG Compute cost and gradient for regularized logistic regression
        J, grad = COSTFUNCTIONREG(theta, X, y, lambda_t) computes the cost of using theta
        as the parameter for regularized logistic regression and the gradient of the cost
        w.r.t. the parameters.
    '''

    # Initialize some useful values
    m = len(y) # number of training examples

    # You need to return the following variables correctly 
    J = 0
    grad = np.zeros(theta.shape)  # placeholder so the cell runs
    
    # ====================== YOUR CODE HERE ======================
    # Instructions: Compute the cost of a particular choice of theta
    #               You should set J to the cost.
    #               Compute the partial derivatives and set grad to the partial
    #               derivatives of the cost w.r.t. each parameter in theta
    #
    # Note: grad should have the same dimensions as theta
    #       y has shape (m, 1) while theta is 1-D, so flatten y (y.ravel())
    #       or reshape theta to keep the arithmetic from broadcasting to (m, m)

    # Accept J as a Python scalar or as a one-element array
    return float(np.squeeze(J)), grad
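
One way to check an implementation of the gradient without looking at a reference solution is to compare it against a finite-difference estimate. The sketch below is a generic central-difference checker; `numerical_gradient` and `toy_cost` are names introduced here for illustration, and the toy cost merely stands in for the real one.

```python
import numpy as np

def numerical_gradient(f, theta, eps=1e-5):
    """Central-difference estimate of the gradient of a scalar function f."""
    theta = np.asarray(theta, dtype=float)
    grad = np.zeros_like(theta)
    for j in range(theta.size):
        step = np.zeros_like(theta)
        step[j] = eps  # perturb one coordinate at a time
        grad[j] = (f(theta + step) - f(theta - step)) / (2 * eps)
    return grad

# Stand-in cost with a known gradient: f(t) = sum(t^2), so grad f = 2t
def toy_cost(t):
    return float(np.sum(t ** 2))

t0 = np.array([1.0, -2.0, 0.5])
max_diff = np.max(np.abs(numerical_gradient(toy_cost, t0) - 2 * t0))
print(max_diff)  # should be very small
```

To apply it to this exercise, one could pass `lambda t: costFunctionReg(t, X_ext, y, lambda_t)[0]` as `f` and compare the result with the `grad` that `costFunctionReg` returns at the same `theta`.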
 

In [None]:
# Compute and display initial cost and gradient for regularized logistic
# regression
cost , grad = costFunctionReg(initial_theta, X_ext, y, lambda_t)

# You should see that the cost is about 0.693.
print('Cost at initial theta (zeros): ', cost)
print(grad[:5])
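
The expected value of about 0.693 follows from the cost formula: with theta set to zeros, the hypothesis is sigmoid(0) = 0.5 for every example, each term in the sum equals -log(0.5), and the regularization term vanishes, so the cost is ln 2. The sketch below confirms this numerically; the labels in `y_demo` are made up for illustration and are not taken from the dataset.

```python
import numpy as np

# With theta = 0, h = sigmoid(0) = 0.5 for every example, whatever its features
h = 0.5
y_demo = np.array([0.0, 1.0, 1.0, 0.0, 1.0])  # made-up labels for illustration
cost_at_zero = np.mean(-y_demo * np.log(h) - (1 - y_demo) * np.log(1 - h))
print(cost_at_zero, np.log(2))
```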


In [None]:
# Minimize the cost function with scipy.optimize
from scipy.optimize import minimize

#  Run minimize (it defaults to BFGS for unconstrained problems) to obtain
#  the optimal theta. jac=True tells it that costFunctionReg returns both
#  the cost and the gradient.
options={'maxiter':400,'gtol': 1e-8, 'disp': True}
solution = minimize(costFunctionReg, initial_theta, args=(X_ext, y,lambda_t),jac=True, options=options)
cost = solution['fun']
theta = solution['x']
# Print the final cost to screen
print('Cost at theta found by minimize function: ', cost)


In [None]:
def plotDecisionBoundary(theta, X, y):
    '''
    PLOTDECISIONBOUNDARY Plots the data points X and y into the current figure
    with the decision boundary defined by theta
       PLOTDECISIONBOUNDARY(theta, X,y) plots the data points with + for the 
       positive examples and o for the negative examples. X is assumed to be 
       an Mx2 matrix of the two raw test scores; the polynomial features
       (including the intercept column) are built internally with mapFeature.
    '''
    plotData(X, y)

    # Here is the grid range
    u = np.linspace(-1, 1.5, 50)
    v = np.linspace(-1, 1.5, 50)
    
    # Evaluate theta^T mapFeature(u, v) at every grid point
    uv = [ [u_i, v_j] for u_i in u for v_j in v ]
    uv = np.array(uv)
    mapuv = mapFeature(uv[:,0],uv[:,1])
    z = np.dot(mapuv,theta)
    z = z.reshape(50,50)
    #print(z)

    # Plot z = 0
    # Notice you need to specify the single contour level 0 as a list
    plt.contour(u, v, z.T, levels=[0], linewidths=2)

    
    # Legend, specific for the exercise
    plt.legend()

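def predict(theta, X):
    '''Predict whether the label
    is 0 or 1 using learned logistic
    regression parameters '''

    # Linear score theta^T x on the mapped features
    z = np.dot(mapFeature(X[:,0],X[:,1]),theta)

    # sigmoid(z) >= 0.5 exactly when z >= 0, so threshold the score at 0
    p = np.where(z >= 0,1,0)

    return p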


In [None]:
# Compute accuracy on our training set
p = predict(theta, X)
print('Train Accuracy: ', np.count_nonzero(p == y.reshape(m))/m)

#Plot Boundary
plotDecisionBoundary(theta, X, y)