In [1]:
#pragma cling load("/home/patrick/SimLib/simulator.so")
#pragma cling add_include_path("/home/patrick/SimLib/")
#include <Simulator.h>
#include <chrono>
#include <iostream>
using namespace std::chrono;

# Parameters
## Home Field Advantage
Home field advantage accounts for the fact that the home team generally has an edge over the away team. For example, if Duke beat Gonzaga by 2 points and Gonzaga was the home team, a home field advantage of 4 would turn the result into a 2-point win for Gonzaga.
## Scale Large Margins
Scale large margins keeps lopsided scores from distorting the rankings. When a highly ranked team defeats a low-ranked team, the higher-ranked team will usually put in backup players and play conservatively, so even if it wins, its rating will almost always go down, while the lower-ranked team's rating goes up. To compensate for this, games with win margins >= 21 are scaled down so that they have less effect on the rankings.
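
The two adjustments can be sketched as follows. This is only an illustration, not the code behind `Simulator.h`: the function names `adjust_for_home_field` and `scale_large_margin`, the sign convention (margin measured as home score minus away score, chosen to match the Duke and Gonzaga example above), and the damping factor of 0.5 are all assumptions. The notebook only states that margins of 21 or more are scaled down, not how.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: home_margin is home score minus away score.
// Adding the advantage reproduces the example in the text
// (home team loses by 2, advantage of 4 -> home team wins by 2).
double adjust_for_home_field(double home_margin, double advantage) {
    return home_margin + advantage;
}

// Hypothetical helper: margins below the threshold are left alone;
// the part at or above the threshold is multiplied by an assumed factor.
double scale_large_margin(double margin, double threshold = 21.0,
                          double factor = 0.5) {
    assert(threshold > 0.0 && factor >= 0.0 && factor <= 1.0);
    const double size = std::fabs(margin);
    if (size < threshold) {
        return margin;
    }
    const double damped = threshold + factor * (size - threshold);
    return margin < 0.0 ? -damped : damped;
}
```

Under these assumptions, `adjust_for_home_field(-2, 4)` returns 2, and a 37-point win is treated as a 29-point win by `scale_large_margin(37)`.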


In [2]:
#include "xwidgets/xslider.hpp"
xw::slider<int> slider;
slider.min = 0;
slider.max = 15;
slider.value = 0;
slider.description = "Home Field Advantage";
slider.display();

#include "xwidgets/xcheckbox.hpp"
xw::checkbox checkbox;
checkbox.value = false;
checkbox.description = "Scale Large Margins?";
checkbox.indent = false;
checkbox.display();

A Jupyter widget

A Jupyter widget

In [3]:
int home_field_advantage = slider.value;
bool apply_scaling = checkbox.value;

## Results

In [4]:
auto start = high_resolution_clock::now();
run(home_field_advantage, apply_scaling);
auto stop = high_resolution_clock::now();
auto duration = duration_cast<microseconds>(stop - start);
std::cout << duration.count() << std::endl;

104 Gonzaga
75 Duke
329 Virginia
205 North Carolina
295 Texas Tech
169 Michigan St
114 Houston
137 Kentucky
289 Tennessee
330 Virginia Tech
17 Auburn
236 Purdue
168 Michigan
353 Youngstown St
195 Nevada
32 Buffalo
348 Wofford
92 Florida St
153 LSU
198 New Mexico St
149 Louisville
320 Utah St
326 VCU
184 Murray St
192 NC State
172 Mississippi St
127 Iowa St
133 Kansas
143 Lipscomb
346 Wisconsin
21 Belmont
47 Cincinnati
222 Oregon
158 Marquette
308 UCF
49 Clemson
279 St Mary's CA
160 Maryland
327 Vermont
328 Villanova
246 S Dakota St
292 Texas
285 Syracuse
89 Florida
337 Washington
67 Dayton
134 Kansas St
95 Furman
112 Hofstra
164 Memphis
218 Oklahoma
94 Fresno St
126 Iowa
305 UC Irvine
297 Toledo
142 Liberty
217 Ohio St
253 San Francisco
171 Mississippi
20 Baylor
194 Nebraska
188 N Kentucky
60 Creighton
287 TCU
123 Indiana
14 Arkansas
54 Colorado
352 Yale
11 Arizona St
177 Montana
66 Davidson
226 Penn St
83 ETSU
209 Northeastern
349 Wright St
288 Temple
166 Miami FL
321 Utah Valley
343 

In [5]:
constructCorrectVector("/home/patrick/march_madness_jupyter/Data/Correct.txt");
createLabels();
constructAxes();
double SSE = sse();
double MSE = mse();
// Note: when computed inside Simulator.cpp, SE always rounds to one for some
// reason, so it is computed here in the kernel, where it works.
#include <cmath>
double SE = std::sqrt(MSE);

In [6]:
std::cout << 6 << std::endl;

6


In [7]:
std::cout << MSE << std::endl;

275.906


In [8]:
#include "xplot/xfigure.hpp"
#include "xplot/xmarks.hpp"
#include "xplot/xaxes.hpp"
#include "xplot/xtooltip.hpp"

In [9]:
xpl::linear_scale sx, sy;
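xpl::lines line(sx, sy);
// Reference line y = x: both coordinates use the same values, so a point
// on this line is a team whose predicted and actual rankings agree.
line.x = get_y_axis();
line.y = get_y_axis();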
auto ax_x = xpl::axis::initialize(sx)
    .label("predicted")
    .finalize();
auto ax_y = xpl::axis::initialize(sy)
    .label("actual")
    .orientation("vertical")
    .side("left")
    .finalize();

In [10]:
xpl::tooltip def_tt;
def_tt.fields = std::vector<xtl::xoptional<std::string>>{"x","y"};
def_tt.labels = std::vector<xtl::xoptional<std::string>>{"Predicted Ranking","Actual Ranking"};

In [11]:
auto scatter1 = xpl::scatter::initialize(sx, sy)
   .x(get_x_axis())
   .y(get_y_axis())
   .unhovered_style(::xeus::xjson::parse(R"({"opacity": "0.5"})"))
   .tooltip(def_tt)
   .finalize();
scatter1.names = getLabels();

In [12]:
auto fig1 = xpl::figure::initialize()
    .padding_x(0.1)
    .padding_y(0.025)
    .finalize();
fig1.add_mark(scatter1);
fig1.add_axis(ax_x);
fig1.add_axis(ax_y);
fig1.add_mark(line);

# Actual vs. Predicted
This plot displays the top 64 teams. Each team's x value is its predicted ranking and its y value is its actual ranking. A team (point) close to the line represents an accurate prediction, while a team far from the line represents an inaccurate one. Hover over a point on the plot to see the team's actual ranking and the ranking the model predicted.

In [13]:
fig1

A Jupyter widget

# Metrics

## Sum of Squared Errors
The rankings are predicted with a least squares linear regression. For each game, we express the margin of victory as a linear function of the ratings of the two teams that played it (Massey, 1997). Each equation has an error term, which is the actual value minus the predicted value, and the best model minimizes the sum of the squared error terms. We can use the same quantity as a metric for how closely our predicted rankings match the actual rankings: the lower the SSE, the better the model does. This is the equation:

$$SSE = \sum \limits _{i=1} ^n(y_i-\hat{y_i})^2$$

where $y_i$ is the actual ranking of team $i$ and $\hat{y_i}$ is its predicted ranking.
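
The formula translates directly into a loop. The sketch below is an illustration only; `sum_squared_errors` is a hypothetical name and is not the `sse()` function that the notebook calls from the simulator library.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sum of squared differences between actual and predicted rankings.
// Hypothetical helper, not part of Simulator.h.
double sum_squared_errors(const std::vector<double>& actual,
                          const std::vector<double>& predicted) {
    assert(actual.size() == predicted.size());
    double total = 0.0;
    for (std::size_t i = 0; i < actual.size(); ++i) {
        const double error = actual[i] - predicted[i];
        total += error * error;
    }
    return total;
}
```

For actual rankings `{1, 2, 3}` and predicted rankings `{2, 2, 5}`, the errors are -1, 0, and -2, so the function returns 1 + 0 + 4 = 5.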

In [14]:
#include <iostream>
std::cout << "SSE = " << SSE << "\n" << "With Home Field Advantage: " << home_field_advantage
<< "\n" << "And Margin of Victory: " << std::boolalpha << apply_scaling << std::endl;

SSE = 17658
With Home Field Advantage: 0
And Margin of Victory: false


## Mean Squared Error
A more interpretable metric for a regression is the mean squared error. It is the SSE divided by the number of points, which gives the average squared error per point. The benefit is that the metric can go *down* as the number of points goes up. SSE cannot do this: every point with a nonzero error increases it, even when the new points fit the model well. This is the equation for mean squared error:

$$MSE = \frac{1}{n} \sum \limits _{i=1} ^n(y_i-\hat{y_i})^2$$
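
A small sketch makes the difference from SSE concrete. The name `mean_squared_error` is hypothetical and is not the `mse()` function provided by the simulator library.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Average squared difference between actual and predicted rankings.
// Hypothetical helper, not part of Simulator.h.
double mean_squared_error(const std::vector<double>& actual,
                          const std::vector<double>& predicted) {
    assert(actual.size() == predicted.size() && !actual.empty());
    double total = 0.0;
    for (std::size_t i = 0; i < actual.size(); ++i) {
        const double error = actual[i] - predicted[i];
        total += error * error;
    }
    return total / static_cast<double>(actual.size());
}
```

With actual rankings `{1, 3}` and predictions `{2, 5}`, the SSE is 5 and the MSE is 2.5. Adding two well-predicted teams, so that the actual rankings are `{1, 3, 4, 6}` and the predictions are `{2, 5, 5, 6}`, raises the SSE to 6 but lowers the MSE to 1.5.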

In [15]:
std::cout << "MSE = " << MSE << "\n" << "With Home Field Advantage: " << home_field_advantage
<< "\n" << "And Margin of Victory: " << std::boolalpha << apply_scaling << std::endl;

MSE = 275.906
With Home Field Advantage: 0
And Margin of Victory: false


## Standard Error
Perhaps the most readable metric for a regression is the standard error, because it is expressed in the same units as the rankings themselves. There are many different formulations of this equation, but the simplest one is:

$$\sigma = \sqrt{\frac{\sum \limits _{i=1} ^n(y_i-\hat{y_i})^2}{n}}$$

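This gives us the *standard deviation of the prediction errors* (strictly, the root mean squared error, which equals the standard deviation when the errors average to zero).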
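
As a quick consistency check, the SSE and MSE values printed in this notebook (17658 and 275.906, with home field advantage 0 and no scaling) imply $n = 64$ teams, and the standard error follows directly from the MSE:

$$n = \frac{SSE}{MSE} = \frac{17658}{275.906} \approx 64, \qquad \sigma = \sqrt{MSE} = \sqrt{275.906} \approx 16.61$$

In other words, with these settings a predicted ranking is typically off by roughly 17 places.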

In [16]:
std::cout << "SE = " << SE << "\n" << "With Home Field Advantage: " << home_field_advantage
<< "\n" << "And Margin of Victory: " << std::boolalpha << apply_scaling << std::endl;

SE = 16.6104
With Home Field Advantage: 0
And Margin of Victory: false
