In [None]:
#pragma cling add_include_path("/home/jovyan/include_files/")
#include "Game.cpp"
#include "Simulator.cpp"
#include "Team.cpp"

# Parameters
## Home Field Advantage
Home field advantage accounts for the fact that the home team generally has an edge over the away team. For example, if Duke beat Gonzaga by 2 points and Gonzaga was the home team, a home field advantage of 4 would be added to Gonzaga's margin (-2 + 4 = 2), so Gonzaga would take the win by 2 points.
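
The arithmetic in the example above can be sketched as a small helper. This is only an illustration that assumes the advantage is simply added to the home team's margin; `adjusted_home_margin` is a hypothetical name and is not a function from `Game.cpp` or `Simulator.cpp`.

```cpp
// Hypothetical helper for illustration only (not part of the included files).
// Returns the margin from the home team's point of view after the
// home field advantage has been credited to the home team.
// A negative result means the away team is credited with the win.
int adjusted_home_margin(int home_score, int away_score, int home_field_advantage) {
    return (home_score - away_score) + home_field_advantage;
}
```

With made-up scores of Gonzaga 70 (home) and Duke 72, `adjusted_home_margin(70, 72, 4)` evaluates to 2, which matches the Gonzaga-by-2 outcome described above.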
## Scale Large Margins
Scale large margins reduces the effect that lopsided scores have on the rankings. When a highly ranked team plays a low ranked team, the higher ranked team will usually put in backup players and play conservatively, so even if it wins, its rating will almost always go down, and the opposite is true for the lower ranked team. To compensate for this, games with a win margin of 21 points or more are scaled down so that they have less effect on the rankings.
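
One possible way to scale a large margin is sketched below. The 21-point threshold comes from the description above, but the scaling rule itself is an assumption made for illustration: the notebook does not state which formula the simulator uses, and both `scale_margin` and the damping factor of 0.5 are hypothetical.

```cpp
#include <cstdlib>

// Hypothetical scaling rule for illustration only.
// Margins below the threshold are returned unchanged. For larger margins,
// the first 20 points count in full and every point beyond 20 counts half.
double scale_margin(int margin) {
    const int threshold = 21;     // from the text: margins >= 21 are scaled
    const double damping = 0.5;   // assumed value, not taken from the simulator
    const int magnitude = std::abs(margin);
    if (magnitude < threshold) {
        return margin;
    }
    const double scaled = (threshold - 1) + damping * (magnitude - (threshold - 1));
    return margin < 0 ? -scaled : scaled;   // keep the sign of the original margin
}
```

Under this assumed rule a 10-point win is left at 10, while a 40-point win counts as 30.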


In [None]:
#include "xwidgets/xslider.hpp"
xw::slider<int> slider;
slider.min = 0;
slider.max = 15;
slider.value = 0;
slider.description = "Home Field Advantage";
slider.display();

#include "xwidgets/xcheckbox.hpp"
xw::checkbox checkbox;
checkbox.value = false;
checkbox.description = "Scale Large Margins?";
checkbox.indent = false;
checkbox.display();

In [None]:
int home_field_advantage = slider.value;
bool apply_scaling = checkbox.value;

## Results

In [None]:
run(home_field_advantage, apply_scaling);

In [None]:
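#include <cmath>

constructCorrectVector("/home/jovyan/Data/Correct.txt");
createLabels();
double sse = constructAxes();   // sum of squared errors
double mse = sse / 64;          // averaged over the 64 teams shown in the plot
double se = std::sqrt(mse);     // square root of the mean squared error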

In [None]:
#include "xplot/xfigure.hpp"
#include "xplot/xmarks.hpp"
#include "xplot/xaxes.hpp"
#include "xplot/xtooltip.hpp"

In [None]:
xpl::linear_scale sx, sy;
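// Reference line y = x: a point on this line is a team whose
// predicted ranking equals its actual ranking.
xpl::lines line(sx, sy);
line.x = y_axis;
line.y = y_axis;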
auto ax_x = xpl::axis::initialize(sx)
    .label("predicted")
    .finalize();
auto ax_y = xpl::axis::initialize(sy)
    .label("actual")
    .orientation("vertical")
    .side("left")
    .finalize();

In [None]:
xpl::tooltip def_tt;
def_tt.fields = std::vector<xtl::xoptional<std::string>>{"x","y"};
def_tt.labels = std::vector<xtl::xoptional<std::string>>{"Predicted Ranking","Actual Ranking"};

In [None]:
auto scatter1 = xpl::scatter::initialize(sx, sy)
   .x(x_axis)
   .y(y_axis)
   .unhovered_style(::xeus::xjson::parse(R"({"opacity": "0.5"})"))
   .tooltip(def_tt)
   .finalize();
scatter1.names = labels;

In [None]:
auto fig1 = xpl::figure::initialize()
    .padding_x(0.1)
    .padding_y(0.025)
    .finalize();
fig1.add_mark(scatter1);
fig1.add_axis(ax_x);
fig1.add_axis(ax_y);
fig1.add_mark(line);

# Actual vs. Predicted
This plot displays the top 64 teams. Each team's x value is its predicted ranking and its y value is its actual ranking. A team (point) close to the line represents an accurate prediction, while a team far away from the line represents an inaccurate one. You can hover over a point on the plot to see the team's actual ranking and the ranking the model predicted.

In [None]:
fig1

# Metrics

## Sum of Squared Errors
The method used to predict the ranking of the teams is a least squares linear regression. For any game, we want to express the margin of victory as a linear function of the ratings of the teams who played that game (Massey, 1997). Each equation has an error term, which is the actual value minus the predicted value. The best model minimizes the sum of the squared error terms. We can use the same quantity as a metric for how closely the predicted rankings match the actual rankings. The lower the SSE, the better our model does. This is the equation:

$$SSE = \sum \limits _{i=1} ^n(y_i-\hat{y_i})^2$$

where $y_i$ is the actual ranking of team $i$ and $\hat{y_i}$ is its predicted ranking.
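
The formula can be sketched directly in code. This is a minimal illustration that is independent of `constructAxes()`, whose implementation is not shown in this notebook; `sum_squared_errors` is a hypothetical name, and the two vectors are assumed to list the same teams in the same order.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper for illustration only.
// Adds up the squared difference between each actual and predicted ranking.
// If the vectors differ in length, only the overlapping part is used.
double sum_squared_errors(const std::vector<double>& actual,
                          const std::vector<double>& predicted) {
    double sse = 0.0;
    for (std::size_t i = 0; i < actual.size() && i < predicted.size(); ++i) {
        const double error = actual[i] - predicted[i];
        sse += error * error;
    }
    return sse;
}
```

For actual rankings {1, 2, 3} and predicted rankings {1, 4, 2}, the errors are 0, -2 and 1, so the SSE is 0 + 4 + 1 = 5.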

In [None]:
cout << "SSE = " << sse << "\n" << "With Home Field Advantage: " << home_field_advantage
<< "\n" << "And Scale Large Margins: " << std::boolalpha << apply_scaling << endl;

## Mean Squared Error
A more interpretable metric for a regression is the mean squared error. It is the SSE divided by the number of points, which gives the average squared error per point. The benefit is that the MSE can go *down* as the number of points goes up. The SSE cannot: every added point contributes a non-negative squared error, so the SSE never decreases, even when the added points fit the model well. This is the equation for the mean squared error:

$$MSE = \frac{1}{n} \sum \limits _{i=1} ^n(y_i-\hat{y_i})^2$$
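
A small sketch can show why the MSE may fall when a well-fitting point is added. The helper below is hypothetical and takes the per-team errors (actual minus predicted) as input; it is not how the notebook itself computes `mse`.

```cpp
#include <vector>

// Hypothetical helper for illustration only.
// Averages the squared errors; returns 0 for an empty input.
double mean_squared_error(const std::vector<double>& errors) {
    if (errors.empty()) {
        return 0.0;
    }
    double sse = 0.0;
    for (const double error : errors) {
        sse += error * error;
    }
    return sse / static_cast<double>(errors.size());
}
```

With errors {2, 2} the SSE is 8 and the MSE is 4. Adding a third point with an error of 1 raises the SSE to 9 but lowers the MSE to 3.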

In [None]:
cout << "MSE = " << mse << "\n" << "With Home Field Advantage: " << home_field_advantage
<< "\n" << "And Scale Large Margins: " << std::boolalpha << apply_scaling << endl;

## Standard Error
Perhaps the most readable metric for a regression is the standard error. There are many different forms of this equation, but the simplest one is:

$$\sigma = \sqrt{\frac{\sum \limits _{i=1} ^n(y_i-\hat{y_i})^2}{n}}$$

This gives us the *standard deviation of the prediction errors*, expressed in the same units as the rankings.
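
As a quick check of the formula, the sketch below takes a sum of squared errors and a point count and returns the standard error. `standard_error` is a hypothetical name used only for this illustration.

```cpp
#include <cmath>
#include <cstddef>

// Hypothetical helper for illustration only.
// Divides the SSE by the number of points and takes the square root.
// Returns 0 when there are no points, to avoid dividing by zero.
double standard_error(double sse, std::size_t n) {
    if (n == 0) {
        return 0.0;
    }
    return std::sqrt(sse / static_cast<double>(n));
}
```

For example, an SSE of 256 over 64 teams gives an MSE of 4 and a standard error of 2, meaning a typical prediction is off by about two ranking places.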

In [None]:
cout << "SE = " << se << "\n" << "With Home Field Advantage: " << home_field_advantage
<< "\n" << "And Scale Large Margins: " << std::boolalpha << apply_scaling << endl;