[![Binder](https://mybinder.org/badge_logo.svg)](https://lab.mlpack.org/v2/gh/mlpack/examples/master?urlpath=lab%2Ftree%2Fstudent_admission_regression_with_logistic_regression%2Fstudent-admission-logistic-regression-cpp.ipynb)

In [1]:
/**
 * @file student-admission-logistic-regression-cpp.ipynb
 *
 * A simple example usage of Logistic Regression (LR)
 * applied to the Student Admission dataset.
 *
 * We will use a logistic regression model to predict whether a student
 * gets admitted into a university (i.e., the output classes are Yes or No),
 * based on their results on past exams.
 *
 * Data from Andrew Ng's Stanford University Machine Learning Course (Coursera).
 */

In [2]:
!wget -q https://lab.mlpack.org/data/student-admission.txt

In [3]:
#include <mlpack/xeus-cling.hpp>

#include <mlpack/core.hpp>
#include <mlpack/methods/logistic_regression/logistic_regression.hpp>

In [4]:
// Header files to create and show the plot.
#define WITHOUT_NUMPY 1
#include "matplotlibcpp.h"
#include "xwidgets/ximage.hpp"

namespace plt = matplotlibcpp;

In [5]:
using namespace mlpack;

In [6]:
using namespace mlpack::regression;

In [7]:
// Read the input data.
arma::mat input;
data::Load("student-admission.txt", input);

In [8]:
// Print the first 11 data points (columns 0 to 10). mlpack stores one data
// point per column, so we transpose to show one student per row.
std::cout << input.submat(0, 0, input.n_rows - 1, 10).t() << std::endl;

   34.6237   78.0247         0
   30.2867   43.8950         0
   35.8474   72.9022         0
   60.1826   86.3086    1.0000
   79.0327   75.3444    1.0000
   45.0833   56.3164         0
   61.1067   96.5114    1.0000
   75.0247   46.5540    1.0000
   76.0988   87.4206    1.0000
   84.4328   43.5334    1.0000
   95.8616   38.2253         0



This is historical data from previous students: each student has two exam scores and a final admission result (1.0 = yes, 0.0 = no).

In [9]:
// Plot the input data.

// Select the columns (students) with label 0.0 (not admitted).
arma::mat dataset0 = input.cols(arma::find(input.row(2) == 0));

// Extract the two exam scores of those students.
std::vector<double> x0 = arma::conv_to<std::vector<double>>::from(dataset0.row(0));
std::vector<double> y0 = arma::conv_to<std::vector<double>>::from(dataset0.row(1));

// Select the columns (students) with label 1.0 (admitted).
arma::mat dataset1 = input.cols(arma::find(input.row(2) == 1.0));

// Extract the two exam scores of those students.
std::vector<double> x1 = arma::conv_to<std::vector<double>>::from(dataset1.row(0));
std::vector<double> y1 = arma::conv_to<std::vector<double>>::from(dataset1.row(1));

plt::figure_size(800, 800);

// Set the label for the legend.
std::map<std::string, std::string> m0;
m0.insert(std::pair<std::string, std::string>("label", "not admitted"));
plt::scatter(x0, y0, 4, m0);

// Set the label for the legend.
std::map<std::string, std::string> m1;
m1.insert(std::pair<std::string, std::string>("label", "admitted"));
plt::scatter(x1, y1, 4, m1);

plt::xlabel("Exam 1 Score");
plt::ylabel("Exam 2 Score");
plt::title("Student admission vs. past two exams");
plt::legend();

plt::save("./plot.png");
auto im = xw::image_from_file("plot.png").finalize();
im

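[Figure: scatter plot "Student admission vs. past two exams", Exam 1 Score on the x-axis and Exam 2 Score on the y-axis, with separate markers for admitted and not admitted students.]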

If the score on either the first or the second exam is too low, it might not be enough for admission. You need a good balance.

We model the probability of admission with the logistic function:
$P(y=1) = \frac{1}{1 + e^{-(\beta_{0} + \beta_{1} \cdot x_{1} + ... + \beta_{n} \cdot x_{n}) }}$

where $y$ is the admission result (0 or 1) and $x_{1}, ..., x_{n}$ are the exam scores.
Since in our example the admission decision is based on two exams ($x_{1}$ and $x_{2}$),
we can set $n = 2$. The next step is to find the beta parameters that best fit
the model, using our historical data as a training set.
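
To make the formula concrete, here is a minimal standalone sketch of it in plain C++, independent of mlpack. The names `Sigmoid` and `AdmissionProbability` are made up for this illustration and are not part of any library; the sketch assumes the intercept $\beta_{0}$ is stored first, followed by one weight per exam.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Logistic (sigmoid) function: maps any real number into the interval (0, 1).
double Sigmoid(const double z)
{
  return 1.0 / (1.0 + std::exp(-z));
}

// P(y = 1) for a single student (hypothetical helper, for illustration only).
// beta holds beta_0 (the intercept) followed by beta_1 ... beta_n;
// x holds the n exam scores x_1 ... x_n.
double AdmissionProbability(const std::vector<double>& beta,
                            const std::vector<double>& x)
{
  // One weight per score, plus the intercept.
  assert(beta.size() == x.size() + 1);

  double z = beta[0];
  for (std::size_t i = 0; i < x.size(); ++i)
    z += beta[i + 1] * x[i];

  return Sigmoid(z);
}
```

When the weighted sum in the exponent is zero, the function returns exactly 0.5; positive sums push the probability towards 1 and negative sums towards 0.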

In [10]:
// Split the data into the training features X (input) and the target labels y.

// Labels are the last row.
arma::Row<size_t> labels =
    arma::conv_to<arma::Row<size_t>>::from(input.row(input.n_rows - 1));
input.shed_row(input.n_rows - 1);

In [11]:
// Create and train Logistic Regression model.
//
// For more information, check out https://mlpack.org/doc/mlpack-git/doxygen/classmlpack_1_1regression_1_1LogisticRegression.html
// or uncomment the line below.
// ?LogisticRegression<>
LogisticRegression<> lr(input, labels, 0.0 /* no regularization */);

In [12]:
// Final beta parameters: the intercept first, then one weight per exam.
lr.Parameters().print()

  -25.1613    0.2062    0.2015


In [13]:
// We can use these beta parameters to plot the decision boundary on the training data.
// Two points are enough to draw a line, so we take the minimum and maximum
// exam 1 scores in the training data, padded by 2 on each side.
std::vector<double> xPlot;
xPlot.push_back(arma::min(input.row(0)) - 2);
xPlot.push_back(arma::max(input.row(0)) + 2);

std::vector<double> yPlot;
yPlot.push_back((-1.0 / lr.Parameters()(2)) * (lr.Parameters()(1) * xPlot[0] + lr.Parameters()(0)));
yPlot.push_back((-1.0 / lr.Parameters()(2)) * (lr.Parameters()(1) * xPlot[1] + lr.Parameters()(0)));
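
The expression used for `yPlot` can be read off the model. Assuming the usual 0.5 threshold, the boundary is the set of points where $P(y=1) = 0.5$, which happens exactly when the exponent of the logistic function is zero. Solving for the second exam score gives:

$$\beta_{0} + \beta_{1} \cdot x_{1} + \beta_{2} \cdot x_{2} = 0 \quad \Longrightarrow \quad x_{2} = -\frac{1}{\beta_{2}} \left( \beta_{1} \cdot x_{1} + \beta_{0} \right)$$

Here $\beta_{0}$, $\beta_{1}$ and $\beta_{2}$ correspond to `lr.Parameters()(0)`, `lr.Parameters()(1)` and `lr.Parameters()(2)`, and $x_{1}$ takes the two values stored in `xPlot`.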

In [14]:
// Plot the decision boundary.

// Select the columns (students) with label 0.0 (not admitted).
arma::mat dataset0 = input.cols(arma::find(labels == 0));

// Extract the two exam scores of those students.
std::vector<double> x0 = arma::conv_to<std::vector<double>>::from(dataset0.row(0));
std::vector<double> y0 = arma::conv_to<std::vector<double>>::from(dataset0.row(1));

// Select the columns (students) with label 1.0 (admitted).
arma::mat dataset1 = input.cols(arma::find(labels == 1.0));

// Extract the two exam scores of those students.
std::vector<double> x1 = arma::conv_to<std::vector<double>>::from(dataset1.row(0));
std::vector<double> y1 = arma::conv_to<std::vector<double>>::from(dataset1.row(1));

plt::figure_size(800, 800);
plt::scatter(x0, y0, 4);
plt::scatter(x1, y1, 4);

plt::plot(xPlot, yPlot);

plt::xlabel("Exam 1 Score");
plt::ylabel("Exam 2 Score");
plt::title("Student admission vs. past two exams");

plt::save("./decision-boundary-plot.png");
auto im = xw::image_from_file("decision-boundary-plot.png").finalize();
im

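[Figure: scatter plot "Student admission vs. past two exams", Exam 1 Score on the x-axis and Exam 2 Score on the y-axis, with the decision boundary drawn as a line.]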

The blue line is our decision boundary. If your exam scores lie below the line, the model
predicts that you will probably not be admitted to the university.
If they lie above it, you probably will be. As you can see, the boundary does not separate
the historical training data perfectly.
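
One way to quantify "not perfectly" is the fraction of training students whose predicted class matches the recorded one. The sketch below is plain C++ with a made-up helper name, `Accuracy`; it assumes the predicted labels have already been obtained (for example by classifying the training data with the model) and copied into a `std::vector`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Fraction of students whose predicted label equals the recorded label
// (hypothetical helper, for illustration only).
double Accuracy(const std::vector<std::size_t>& predicted,
                const std::vector<std::size_t>& actual)
{
  // Both vectors must describe the same, non-empty set of students.
  assert(predicted.size() == actual.size() && !actual.empty());

  std::size_t correct = 0;
  for (std::size_t i = 0; i < actual.size(); ++i)
  {
    if (predicted[i] == actual[i])
      ++correct;
  }

  return static_cast<double>(correct) / static_cast<double>(actual.size());
}
```

A value of 1.0 would mean the boundary separates every training point correctly; points on the wrong side of the line lower the value.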

In [15]:
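// Let's say that my scores are 40 on the first exam and 78 on the second.
arma::mat scores("40.0; 78.0");

// Each column of `probabilities` corresponds to one student:
// row 0 is P(not admitted), row 1 is P(admitted).
arma::mat probabilities;
lr.Classify(scores, probabilities);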

In [16]:
probabilities.print()

   0.7680
   0.2320


It looks like my probability of being admitted to the university is only about 23%.
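
As a sanity check, the same figure can be reproduced by hand from the rounded beta parameters printed earlier ($\beta_{0} = -25.1613$, $\beta_{1} = 0.2062$, $\beta_{2} = 0.2015$). Because those values are rounded to four decimals, the result only matches the model output approximately:

$$\beta_{0} + \beta_{1} \cdot 40 + \beta_{2} \cdot 78 = -25.1613 + 8.248 + 15.717 = -1.1963$$

$$P(y=1) = \frac{1}{1 + e^{1.1963}} \approx 0.232$$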