Jumper uses both frecency (the frequency and recency with which items have been visited) and match accuracy (how well the query matches the path stored in the database).
The frecency of a match measures the frequency and recency of the visits of the match. Assume that a match has been visited at times
Here
Let us now briefly motivate the definition of frecency above.
Let us first consider an item that has not been visited within the last 10 hours, so that we can neglect the term
We plot this function below:
In the case where the item has just been visited, the frecency above gets an increase of
As we can see from the plot above, the frecency will typically be a number in the range
- It does not diverge as time goes on. z uses something like `number-of-visits / time-since-last-visit`, which may explode over time (and therefore requires some "aging" mechanism).
- It only requires keeping track of the "adjusted" number of visits $\sum_i e^{-\alpha_2 (t-T_i)}$ and the time of the last visit in order to be computed.
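The second point can be illustrated with a minimal sketch. It assumes a decay rate `ALPHA_2` (the value below is only a placeholder) and a hypothetical `DecayedVisits` helper that is not part of Jumper's code. The idea is that the decayed sum can be updated in constant time from two stored numbers: each new visit first decays the stored value up to the current time, then adds 1.

```python
import math

ALPHA_2 = 3e-7  # assumed decay rate per second (placeholder value)


class DecayedVisits:
    """Maintains sum_i exp(-ALPHA_2 * (t - T_i)) with O(1) state (hypothetical helper)."""

    def __init__(self):
        self.adjusted = 0.0     # value of the decayed sum at time `last_visit`
        self.last_visit = None  # time of the most recent visit, in seconds

    def value_at(self, t):
        """Decayed sum evaluated at time t >= last_visit."""
        if self.last_visit is None:
            return 0.0
        return self.adjusted * math.exp(-ALPHA_2 * (t - self.last_visit))

    def visit(self, t):
        """Record a visit at time t: decay the stored sum to t, then add 1."""
        self.adjusted = self.value_at(t) + 1.0
        self.last_visit = t


def brute_force(visit_times, t):
    """Reference value computed from the full visit history."""
    return sum(math.exp(-ALPHA_2 * (t - T)) for T in visit_times)


visits = [0.0, 3600.0, 86400.0, 1.0e6]
tracker = DecayedVisits()
for T in visits:
    tracker.visit(T)

now = 2.0e6
print(tracker.value_at(now), brute_force(visits, now))
```

Both printed values agree up to floating-point error, so the full list of visit times never needs to be stored.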
The match accuracy evaluates how well the query entered by the user matches the path stored in the database. Similarly to the fuzzy-finders fzf or fzy, this is done using a variant of the Needleman-Wunsch algorithm.
This finds the match that maximizes
U(match) = 10 * len(query) - 9 * (number-of-splits - 1) - total-length-of-gaps + bonuses(match)
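As a rough illustration, the sketch below evaluates `U` for one given alignment. It does not perform the search over alignments, which is the job of the dynamic program. It assumes that `number-of-splits` counts the contiguous runs of matched characters and that a gap is the number of unmatched characters between two runs. The function name `match_score` and the `bonuses` argument (left at 0 here) are hypothetical.

```python
def match_score(query, positions, bonuses=0):
    """Evaluate U(match) for a query matched at increasing `positions` of a path.

    Assumption: a "split" is a contiguous run of matched characters, and a
    gap is the count of unmatched characters between two consecutive runs.
    """
    if not query or len(positions) != len(query):
        raise ValueError("need exactly one matched position per query character")

    splits = 1
    total_gap_length = 0
    for prev, cur in zip(positions, positions[1:]):
        if cur <= prev:
            raise ValueError("positions must be strictly increasing")
        if cur != prev + 1:
            splits += 1
            total_gap_length += cur - prev - 1

    return 10 * len(query) - 9 * (splits - 1) - total_gap_length + bonuses


# Query "abc" matched consecutively: no split penalty, no gaps.
contiguous = match_score("abc", [0, 1, 2])
# Same query with the last character matched 3 characters further away.
broken = match_score("abc", [0, 1, 5])
print(contiguous, broken)
```

Under these assumptions a contiguous match of three characters scores 30, while breaking it into two runs separated by three characters costs 9 for the extra split and 3 for the gap.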
The bonuses above give additional points if matches happen at special places, such as the end of the path or the beginning of words. Then the accuracy is
Based on these two values, the final score of the match is
where -b <value>
.
This additive definition is motivated by the following.
Suppose that one is fuzzy-finding a path, adding one character to the query at a time.
At first, when the query has very few characters (typically <= 2), all the paths containing these characters consecutively will have maximum accuracy.
Hence the ranking will be mostly decided by the frecency.
However, as more characters are added, the ranking will favor matches that are more accurate. The ranking will then be dominated by the accuracy of the matches.
The definitions of scores above can be motivated by the following statistical model.
Assume that the visits to a given path form a self-exciting point process, with conditional intensity
independently of the visits to the other folders.
When the user queries the database at a time
Let us model
(
The posterior probability is therefore proportional to
The ranking algorithm simply ranks the paths according to their