# Building a recommender system


* Candidate generation: the first step is to generate recommendation candidates, items we think
might be interesting to the user based on their past behavior.
* Candidate ranking: many candidates will appear more than once and need to be combined
in some way, maybe boosting their score in the process. The main goal at this stage is to find an
optimal ranking of the candidates. The ranking stage might also have access to more information
about the candidates that it can use, such as average review scores for popular items.
* Filtering: some filtering will be required before presenting the final sorted list. This stage is
where we might eliminate recommendations for items the user has already rated. We also apply a
stop list here to remove items that are potentially offensive to the user.

The output of the filtering stage is then handed off to your display layer, where a pretty
widget of product recommendations is presented to the user. Generally speaking, candidate
generation, ranking, and filtering will live inside some distributed recommendation web service
that your web front-end talks to while rendering a page for a specific user. This is
what we call item-based collaborative filtering.
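
The three stages above can be sketched as three small functions. This is only a minimal illustration, not a production design: the function names, the `similar_items` lookup table, and the choice to score a candidate as similarity times the user's rating and to merge duplicates by summing are all assumptions made for the example.

```python
from collections import defaultdict


def generate_candidates(user_history, similar_items):
    """Collect (item, score) candidates from items similar to what the user rated."""
    candidates = []
    for item, rating in user_history.items():
        for neighbor, similarity in similar_items.get(item, []):
            # Assumed scoring: weight the neighbor by its similarity and by
            # how highly the user rated the source item.
            candidates.append((neighbor, similarity * rating))
    return candidates


def rank_candidates(candidates):
    """Merge candidates that appear more than once, then sort best first."""
    totals = defaultdict(float)
    for item, score in candidates:
        totals[item] += score  # duplicates boost the combined score
    return sorted(totals.items(), key=lambda pair: pair[1], reverse=True)


def filter_candidates(ranked, user_history, stop_list, n=10):
    """Drop already-rated and stop-listed items, then keep the top n."""
    kept = [
        item
        for item, _score in ranked
        if item not in user_history and item not in stop_list
    ]
    return kept[:n]


def recommend(user_history, similar_items, stop_list, n=10):
    candidates = generate_candidates(user_history, similar_items)
    ranked = rank_candidates(candidates)
    return filter_candidates(ranked, user_history, stop_list, n)


# Hypothetical toy data: item -> list of (similar item, similarity).
similar_items = {
    "A": [("B", 0.9), ("C", 0.5)],
    "D": [("B", 0.4), ("E", 0.8), ("A", 0.7)],
}
user_history = {"A": 5.0, "D": 4.0}  # item -> rating given by the user
stop_list = {"E"}

top = recommend(user_history, similar_items, stop_list, n=2)
print(top)  # ['B', 'C']
```

Item `B` is suggested by both rated items, so its two scores are combined and it ends up first; `E` is removed by the stop list and `A` because the user already rated it.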


### Accuracy metrics

Let's say we have $n$ ratings in our test set. For each rating, we call the rating our system
predicts $y$ and the rating the user actually gave $x$.

Mean absolute error (MAE) = $\frac{\sum_{i=1}^{n} | y_i - x_i |}{n}$

Another very popular metric is root mean square error (RMSE). Because each error is squared before
averaging, it penalizes you more when your rating prediction is way off, and less when you were
reasonably close.


Root mean square error = $\sqrt{\frac{\sum_{i=1}^{n} ( y_i - x_i )^2}{n}}$

In the recommender systems literature, RMSE is the more widely used of the two metrics.
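
To make the difference between the two formulas concrete, here is a small sketch that computes the mean of the absolute errors and the RMSE for the same predictions. The function names and the four sample ratings are made up for illustration.

```python
import math


def mae(predicted, actual):
    """Mean of the absolute errors |y_i - x_i|."""
    return sum(abs(y - x) for y, x in zip(predicted, actual)) / len(actual)


def rmse(predicted, actual):
    """Square root of the mean of the squared errors (y_i - x_i)^2."""
    squared = sum((y - x) ** 2 for y, x in zip(predicted, actual))
    return math.sqrt(squared / len(actual))


# Hypothetical test set: three predictions are close, one is off by 3 stars.
predicted = [4.0, 3.0, 5.0, 1.0]
actual = [5.0, 3.0, 4.0, 4.0]

mae_score = mae(predicted, actual)
rmse_score = rmse(predicted, actual)
print(mae_score)   # 1.25
print(rmse_score)  # about 1.658
```

The single 3-star miss contributes 3 of the 5 points of absolute error but 9 of the 11 points of squared error, which is why RMSE comes out higher than the absolute-error average here.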

### Evaluating top-n recommenders
* hit rate: you generate top-N recommendations for all of the users in your test set.
If one of the recommendations in a user's top-N list is something they actually rated,
you consider that a hit.
* $$ \frac{hits}{users} $$
* leave-one-out cross validation: for each user in our training data, intentionally remove one of
the items they rated from the training data, then compute top-N recommendations for that user. We
then test our recommender system's ability to recommend the item that was left out in the top-N
results it created for that user in the testing phase.
* Average reciprocal hit rate (ARHR): this metric is just like hit rate, but it accounts for
where in the top-N list your hits appear. Each hit contributes the reciprocal of its rank, so you
get more credit for successfully recommending an item in the top slot than in the bottom slot.
* $\frac{\sum_{i=1}^{n} \frac{1}{rank_i}}{users}$
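
The metrics in this list can be sketched together as follows. This is a minimal illustration that assumes leave-one-out has already been applied: `left_out_by_user` (a hypothetical name) holds the one item withheld from each user, and `top_n_by_user` holds the ordered list the recommender produced for that user.

```python
def hit_rate(top_n_by_user, left_out_by_user):
    """Fraction of users whose left-out item appears in their top-N list."""
    hits = sum(
        1
        for user, left_out in left_out_by_user.items()
        if left_out in top_n_by_user.get(user, [])
    )
    return hits / len(left_out_by_user)


def arhr(top_n_by_user, left_out_by_user):
    """Like hit rate, but each hit counts as 1 / rank (rank 1 is the top slot)."""
    total = 0.0
    for user, left_out in left_out_by_user.items():
        top_n = top_n_by_user.get(user, [])
        if left_out in top_n:
            rank = top_n.index(left_out) + 1  # list index is 0-based
            total += 1.0 / rank
    return total / len(left_out_by_user)


# Hypothetical results for four users with N = 3.
top_n_by_user = {
    "u1": ["A", "B", "C"],
    "u2": ["D", "E", "F"],
    "u3": ["G", "H", "I"],
    "u4": ["J", "K", "L"],
}
left_out_by_user = {"u1": "A", "u2": "F", "u3": "Z", "u4": "K"}

hr = hit_rate(top_n_by_user, left_out_by_user)
score = arhr(top_n_by_user, left_out_by_user)
print(hr)     # 0.75
print(score)  # about 0.458
```

Three of the four users get a hit, at ranks 1, 3, and 2, so the hit rate is 3/4 while ARHR is (1 + 1/3 + 1/2) / 4.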

### Coverage, diversity, and novelty

* coverage: the percentage of possible recommendations that your system is able to provide. It is
also worth watching because it gives a sense of how quickly new items in your catalog will start
to appear in recommendations.
* diversity: a measure of how broad a variety of items your recommender system is putting
in front of people.
* novelty: a measure of how popular the items you are recommending are. Recommending mostly
well-known items means low novelty.
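
These three measures can be computed in several ways. The sketch below picks one plausible definition for each, and every choice is an assumption made for illustration: coverage as the share of catalog items that appear in at least one top-N list, diversity as one minus the average pairwise similarity within each list, and novelty as the mean popularity rank of recommended items (rank 1 is the most popular). The genre-based `similarity` function is likewise hypothetical.

```python
from itertools import combinations


def catalog_coverage(top_n_by_user, catalog):
    """Share of catalog items that show up in at least one top-N list."""
    recommended = {item for items in top_n_by_user.values() for item in items}
    return len(recommended & set(catalog)) / len(catalog)


def diversity(top_n_by_user, similarity):
    """1 minus the average similarity over all item pairs within each list."""
    total, pairs = 0.0, 0
    for items in top_n_by_user.values():
        for a, b in combinations(items, 2):
            total += similarity(a, b)
            pairs += 1
    if pairs == 0:
        return 0.0
    return 1.0 - total / pairs


def novelty(top_n_by_user, popularity_rank):
    """Mean popularity rank of recommended items; higher means more obscure."""
    ranks = [
        popularity_rank[item]
        for items in top_n_by_user.values()
        for item in items
    ]
    return sum(ranks) / len(ranks)


# Hypothetical toy data.
catalog = ["A", "B", "C", "D", "E"]
genres = {"A": "action", "B": "action", "C": "drama", "D": "drama", "E": "comedy"}
popularity_rank = {"A": 1, "B": 2, "C": 3, "D": 4, "E": 5}
top_n_by_user = {"u1": ["A", "B"], "u2": ["A", "C"]}


def similarity(a, b):
    # Assumed similarity: 1.0 for the same genre, 0.0 otherwise.
    return 1.0 if genres[a] == genres[b] else 0.0


coverage_score = catalog_coverage(top_n_by_user, catalog)
diversity_score = diversity(top_n_by_user, similarity)
novelty_score = novelty(top_n_by_user, popularity_rank)
print(coverage_score, diversity_score, novelty_score)  # 0.6 0.5 1.75
```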

### Churn and responsiveness
* churn: a measure of how sensitive your recommender system is to new user behavior. If a user rates
a new movie, does that substantially change their recommendations? If so, then your churn score will be high.
* responsiveness: how quickly does new user behavior influence your recommendations?
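
One simple way to put a number on churn, assuming you can snapshot each user's top-N list before and after some new behavior arrives, is the average fraction of each list that was replaced. The function and the snapshot data below are hypothetical.

```python
def churn(before_by_user, after_by_user):
    """Average fraction of each user's earlier top-N list that is gone afterwards."""
    fractions = []
    for user, before in before_by_user.items():
        if not before:
            continue  # nothing to compare for this user
        after = after_by_user.get(user, [])
        replaced = len(set(before) - set(after))
        fractions.append(replaced / len(before))
    if not fractions:
        return 0.0
    return sum(fractions) / len(fractions)


# Hypothetical snapshots: u1 rated a new movie in between, u2 did nothing.
before_by_user = {"u1": ["A", "B", "C", "D"], "u2": ["G", "H"]}
after_by_user = {"u1": ["A", "B", "E", "F"], "u2": ["G", "H"]}

churn_score = churn(before_by_user, after_by_user)
print(churn_score)  # 0.25
```

Half of `u1`'s list changed and none of `u2`'s, giving an average of 0.25. This version ignores reordering within a list, which is a deliberate simplification.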

### A/B Tests
* You can put recommendations from different algorithms in front of different sets of users
and measure whether they actually buy, watch, or otherwise indicate interest in the recommendations you have presented.
* By always testing changes to your recommender system using controlled online experiments, you
can see whether they actually cause people to discover and purchase more new things than they would
have otherwise. At the end of the day, the results of online A/B tests are the only evaluation that matters.

