# Rundown and Sweep Analysis

This notebook walks through the algorithms evaluated on a single data set and reports their results.

The data set under analysis:

In [None]:
# Name of the data set to analyze, supplied through the LK_DATASET environment variable
dataset = Sys.getenv("LK_DATASET")
dataset
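If `LK_DATASET` is unset, `Sys.getenv()` silently returns an empty string, and the file names built below would point at paths such as `build/common-.csv`. A minimal fail-fast sketch follows; `require_dataset()` is a hypothetical helper, not part of the original notebook.

```r
# Hypothetical helper: read the data set name from an environment variable
# and stop early with a clear message if it is missing or empty.
require_dataset = function(var = "LK_DATASET") {
  # Sys.getenv() returns `unset` ("" here) when the variable is not defined
  value = Sys.getenv(var, unset = "")
  if (!nzchar(value)) {
    stop(sprintf("environment variable %s is not set", var), call. = FALSE)
  }
  value
}
```

Used as `dataset = require_dataset()`, it would replace the cell above and fail before any file is read.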

## Setup

First, we load the packages we need: dplyr for data manipulation and ggplot2 for plotting.

In [None]:
library(dplyr)
library(ggplot2)

In [None]:
# Default size (in inches) of plots rendered in the notebook
options(repr.plot.width=7, repr.plot.height=5)

Next, we build the input file names from the data set name.

In [None]:
common.fn = sprintf('build/common-%s.csv', dataset)
iicf.fn = sprintf('build/sweep-item-item-%s.csv', dataset)
svd.fn = sprintf('build/sweep-funksvd-%s.csv', dataset)
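The three names follow one pattern, `build/<kind>-<dataset>.csv`, so they can also be built together and checked before reading. A sketch under that assumption; `input_files()` and `missing_inputs()` are hypothetical helpers:

```r
# Hypothetical helper: build the notebook's three input paths for one data set,
# following the same 'build/<kind>-<dataset>.csv' pattern used above.
input_files = function(dataset, dir = "build") {
  kinds = c(common = "common", iicf = "sweep-item-item", svd = "sweep-funksvd")
  paths = file.path(dir, sprintf("%s-%s.csv", kinds, dataset))
  names(paths) = names(kinds)
  paths
}

# Return only the paths that do not exist (e.g. a sweep that has not been run)
missing_inputs = function(paths) {
  paths[!file.exists(paths)]
}
```

`missing_inputs(input_files(dataset))` returns an empty vector when every input is present.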

## Common Algorithms

For every data set, we run a common set of algorithms. First, we load their results.

In [None]:
common.results = read.csv(common.fn)
head(common.results)

Because we cross-validated, each algorithm has one result per partition, so box plots show how each metric varies across partitions.

In [None]:
ggplot(common.results) +
    aes(x=Algorithm, y=RMSE.ByUser) +
    geom_boxplot() +
    ggtitle("Per-user RMSE")
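The column name suggests that `RMSE.ByUser` computes an RMSE for each user and then averages across users, so every user counts equally no matter how many ratings they contribute. That reading is an assumption, not something this notebook confirms. The base-R sketch below contrasts it with a single RMSE over all ratings; the `user`, `rating`, and `prediction` column names are hypothetical.

```r
# Sketch: per-user RMSE averaged over users vs. one RMSE over all ratings.
# Input: a data frame with hypothetical columns user, rating, prediction.
rmse = function(err) sqrt(mean(err^2))

rmse_by_user = function(preds) {
  err = preds$prediction - preds$rating
  # one RMSE per user, then an unweighted mean across users
  per_user = tapply(err, preds$user, rmse)
  mean(per_user)
}

rmse_global = function(preds) {
  rmse(preds$prediction - preds$rating)
}
```

The two diverge when users with many ratings have systematically larger or smaller errors than users with few.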

In [None]:
ggplot(common.results) +
    aes(x=Algorithm, y=Predict.nDCG) +
    geom_boxplot() +
    ggtitle("Predict nDCG (rank effectiveness)")
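Assuming `Predict.nDCG` ranks each user's test items by predicted score and uses the true ratings as gains (an assumption about how the metric is computed), a per-user sketch looks like this. It uses the common 1 / log2(rank + 1) discount, which may differ from the discount the evaluation actually used.

```r
# Sketch of prediction nDCG for one user: rank the user's test items by
# predicted score, use the true ratings as gains, and normalize by the DCG
# of the ideal (rating-sorted) ordering. Discount: 1 / log2(rank + 1).
dcg = function(gains) {
  sum(gains / log2(seq_along(gains) + 1))
}

predict_ndcg = function(rating, prediction) {
  ranked = rating[order(prediction, decreasing = TRUE)]
  ideal = sort(rating, decreasing = TRUE)
  dcg(ranked) / dcg(ideal)
}
```

A value of 1 means the predictions order the user's test items exactly as their ratings do.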

In [None]:
ggplot(common.results) +
    aes(x=Algorithm, y=MRR) +
    geom_boxplot() +
    ggtitle("Mean Reciprocal Rank")
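Reciprocal rank is 1 divided by the position of the first relevant item in a user's recommendation list (0 if none appears), and MRR averages it over users. A base-R sketch; how the evaluation decides which test items count as relevant is not specified here, so `relevant` is simply taken as given:

```r
# Sketch: reciprocal rank for one user's top-N list. `recs` is the ranked
# vector of recommended item IDs and `relevant` the items treated as relevant.
reciprocal_rank = function(recs, relevant) {
  hits = which(recs %in% relevant)
  if (length(hits) == 0) 0 else 1 / hits[1]
}

# MRR: average the per-user reciprocal ranks
mean_reciprocal_rank = function(rec_lists, relevant_lists) {
  mean(mapply(reciprocal_rank, rec_lists, relevant_lists))
}
```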

In [None]:
ggplot(common.results) +
    aes(x=Algorithm, y=MAP) +
    geom_boxplot() +
    ggtitle("Mean Average Precision")
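The box plots show the spread across partitions visually; a small table gives the same information in numbers. A base-R sketch, where `partition_summary()` is a hypothetical helper that assumes the `Algorithm` column used above:

```r
# Sketch: for each algorithm, count the partitions and report the mean and
# standard deviation of one metric across them.
partition_summary = function(results, metric) {
  vals = results[[metric]]
  groups = results$Algorithm
  n = tapply(vals, groups, length)
  data.frame(
    Algorithm = names(n),
    Partitions = as.vector(n),
    Mean = as.vector(tapply(vals, groups, mean)),
    SD = as.vector(tapply(vals, groups, sd)),
    stringsAsFactors = FALSE
  )
}
```

For example, `partition_summary(common.results, "RMSE.ByUser")` summarizes the first box plot.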

## Sweeping Item-Item Parameters

This next experiment runs a grid search over two item-item parameters: the neighborhood size (`NNbrs`) and the normalization (`Normalization`).

In [None]:
itemitem.data = read.csv(iicf.fn)
head(itemitem.data)
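A grid search evaluates every combination of the candidate parameter values, and cross-validation repeats each combination once per partition. A sketch with `expand.grid()`; the neighborhood sizes and normalization labels are hypothetical placeholders, not the values this sweep actually used:

```r
# Sketch: enumerate a two-parameter grid. The default values are hypothetical
# placeholders; the real sweep's grid is defined by the build, not here.
item_item_grid = function(nnbrs = c(10, 20, 50, 100),
                          normalizations = c("none", "mean-center")) {
  expand.grid(NNbrs = nnbrs, Normalization = normalizations,
              stringsAsFactors = FALSE)
}

# With k cross-validation partitions, each grid point is evaluated k times
runs_needed = function(grid, partitions) nrow(grid) * partitions
```

With the placeholder values, 4 sizes times 2 normalizations gives 8 grid points, or 40 runs over 5 partitions.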

In [None]:
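# Average each metric over the cross-validation partitions for every
# parameter combination; .groups="drop" returns an ungrouped result
itemitem.agg = itemitem.data %>%
    group_by(Algorithm, NNbrs, Normalization) %>%
    summarize(Count=n(),
              RMSE=mean(RMSE.ByUser),
              Predict.nDCG=mean(Predict.nDCG),
              TopN.nDCG=mean(TopN.nDCG),
              MAP=mean(MAP),
              MRR=mean(MRR),
              .groups="drop")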
head(itemitem.agg)
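To read the winning configuration off the aggregated table rather than the plots, one option is to take the row that optimizes a chosen metric. A base-R sketch; `best_config()` is a hypothetical helper:

```r
# Sketch: pick the best row of an aggregated sweep table by one metric.
# Lower is better for error metrics (RMSE); higher for ranking metrics.
best_config = function(agg, metric, lower_is_better = TRUE) {
  vals = agg[[metric]]
  idx = if (lower_is_better) which.min(vals) else which.max(vals)
  agg[idx, , drop = FALSE]
}
```

For example, `best_config(itemitem.agg, "RMSE")` or `best_config(itemitem.agg, "MAP", lower_is_better = FALSE)`; the two need not agree.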

In [None]:
ggplot(itemitem.agg) +
    aes(x=NNbrs, y=RMSE, color=Normalization) +
    geom_line() + geom_point() +
    ggtitle("RMSE by neighborhood size")

In [None]:
ggplot(itemitem.agg) +
    aes(x=NNbrs, y=Predict.nDCG, color=Normalization) +
    geom_line() + geom_point() +
    ggtitle("Predict nDCG by neighborhood size")

In [None]:
ggplot(itemitem.agg) +
    aes(x=NNbrs, y=MRR, color=Normalization) +
    geom_line() + geom_point() +
    ggtitle("MRR by neighborhood size")

In [None]:
ggplot(itemitem.agg) +
    aes(x=NNbrs, y=MAP, color=Normalization) +
    geom_line() + geom_point() +
    ggtitle("MAP by neighborhood size")

## Sweeping FunkSVD Parameters

This next experiment runs a grid search over two FunkSVD parameters: the number of latent features (`NFeatures`) and the regularization term (`Regularization`).

In [None]:
svd.data = read.csv(svd.fn)
head(svd.data)

In [None]:
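# Average each metric over the cross-validation partitions; Regularization is
# converted to a factor so the plots use a discrete color per value
svd.agg = svd.data %>%
    group_by(Algorithm, NFeatures, Regularization=as.factor(Regularization)) %>%
    summarize(Count=n(),
              RMSE=mean(RMSE.ByUser),
              Predict.nDCG=mean(Predict.nDCG),
              TopN.nDCG=mean(TopN.nDCG),
              MAP=mean(MAP),
              MRR=mean(MRR),
              .groups="drop")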
head(svd.agg)
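Because each regularization level traces its own curve, it can help to find the best feature count separately for each level. A base-R sketch; `best_per_level()` is a hypothetical helper, and its defaults assume the column names used above:

```r
# Sketch: for each regularization level, find the feature count with the
# lowest value of a metric (lower is better, e.g. RMSE).
best_per_level = function(agg, metric = "RMSE", level = "Regularization",
                          param = "NFeatures") {
  agg = as.data.frame(agg)
  # split() groups rows by the levels of the `level` column
  rows = lapply(split(agg, agg[[level]]), function(d) {
    d[which.min(d[[metric]]), c(level, param, metric), drop = FALSE]
  })
  do.call(rbind, rows)
}
```

`best_per_level(svd.agg)` returns one row per regularization level.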

In [None]:
ggplot(svd.agg) +
    aes(x=NFeatures, y=RMSE, color=Regularization) +
    geom_line() + geom_point() +
    ggtitle("RMSE by feature count")

In [None]:
ggplot(svd.agg) +
    aes(x=NFeatures, y=Predict.nDCG, color=Regularization) +
    geom_line() + geom_point() +
    ggtitle("Predict nDCG by feature count")

In [None]:
ggplot(svd.agg) +
    aes(x=NFeatures, y=MRR, color=Regularization) +
    geom_line() + geom_point() +
    ggtitle("MRR by feature count")

In [None]:
ggplot(svd.agg) +
    aes(x=NFeatures, y=MAP, color=Regularization) +
    geom_line() + geom_point() +
    ggtitle("MAP by feature count")