Merge pull request #853 from metarank/master
Docs update
vgoloviznin committed Jan 31, 2023
2 parents aeacec8 + b8172cc commit 9148705
Showing 7 changed files with 60 additions and 53 deletions.
4 changes: 2 additions & 2 deletions doc/_toc.md
@@ -21,9 +21,9 @@
* [Scalars](configuration/features/scalar.md)
* [Text](configuration/features/text.md)
* [User Profile](configuration/features/user-session.md)
* [Recommendations](configuration/recommendations/overview.md)
* [Recommendations](configuration/recommendations.md)
* [Trending items](configuration/recommendations/trending.md)
* [Similar items](configuration/recommendations/similar.md)
* [Popular items](configuration/recommendations/trending.md)
* [Models](configuration/supported-ranking-models.md)
* [Data Sources](configuration/data-sources.md)
* [Persistence](configuration/persistence.md)
2 changes: 1 addition & 1 deletion doc/api.md
@@ -5,7 +5,7 @@ Metarank's API provides an easy way to integrate Metarank with your applications
- [Feedback API](#feedback) receives the stream of events
- [Train API](#train) trains the model on your data
- [Ranking API](#ranking) provides personalized results generated by the trained model
- [Recommend API]() - retrieval of recommendations.
- [Recommend API](#recommendations) - retrieval of recommendations.
- [Prometheus endpoint](#prometheus-metrics) - to have a nice metrics dashboard about Metarank internals.

## Feedback
5 changes: 5 additions & 0 deletions doc/configuration/recommendations.md
@@ -0,0 +1,5 @@
# Recommendations in Metarank

Starting from version `0.6.x`, Metarank supports two types of recommendations:
* [Trending](recommendations/trending.md): popularity-sorted list of items with customized ordering.
* [Similar items](recommendations/similar.md): matrix-factorization collaborative filtering recommender of items you may also like.
5 changes: 0 additions & 5 deletions doc/configuration/recommendations/overview.md

This file was deleted.

50 changes: 25 additions & 25 deletions doc/configuration/recommendations/similar.md
@@ -1,44 +1,44 @@
# Similar items

A `similar` recommender model can give you items other visitors also liked, while viewing the item you're currently observing.
The `similar` recommendation model can give you items other visitors also liked while viewing the item you're currently observing.

Common use-cases for such model are:
Common use-cases for this model are:
* you-may-also-like recommendations on item page: the context of the recommendation is a single item you're viewing now.
* also-purchased widget on the cart page: the context of the recommendation is the contents of your cart.

## Underlying model

Metarank uses a variation of [Matrix Factorization](https://developers.google.com/machine-learning/recommendation/collaborative/matrix) collaborative filtering algorithm for recommendations based on a paper [Fast Matrix Factorization for Online Recommendation with Implicit Feedback](https://arxiv.org/abs/1708.05024) by X.He, H.Zhang, MY.Kan and TS.Chua.

![matrix factorization](../../img/mf.svg)

The ALS family of algorithms for recommendations decompose a sparse matrix of user-item interactions into a set of smaller dense vectors of implicit user and item features (or user and item embeddings). The cool things about these embeddings is that similar items will have similar embeddings!

So Metarank does the following:
* computes item embeddings
* pre-builds a [HNSW](https://www.pinecone.io/learn/hnsw/) index for fast lookups for similar embeddings
* on inference time (when you call the [/recommend/modelname](../../api.md) endpoint), it makes a k-NN index lookup of similar items.

Main pros and cons of such an approach:
* pros: fast even for giant inventories, simple to implement
* cons: lower precision as neural networks based methods like [BERT4rec](https://arxiv.org/abs/1904.06690), recommendations are not personalized.

There is an ongoing work in Metarank project to implement NN-based methods and make current ALS implementation personalized.

## Configuration

```yaml
similar:
type: als
interactions: [click] # which interactions to use
interactions: [click, like, purchase] # which interactions to use
factors: 100 # optional, number of implicit factors in the model, default 100
iterations: 100 # optional, number of training iterations, default 100
```

There are two important parameters in the configuration:
* `factors`: how many hidden parameters the model tries to compute. The more - the better, but slower. Usually defined within the range of 50-500.
* `iterations`: how many factor refinements attempts are made. The more - the better, but slower. Normal range - 50-300.
* `iterations`: how many factor refinement attempts are made. The more - the better, but slower. Normal range - 50-300.

Rule of thump - set these parameters low, and then increase slightly until training time becomes completely unreasonable.
Rule of thumb - set these parameters low, and then increase slightly until training time becomes completely unreasonable.

See request & response formats in the [API section](../../api.md#recommendations).

## Underlying model

Metarank uses a variation of the [Matrix Factorization](https://developers.google.com/machine-learning/recommendation/collaborative/matrix) collaborative filtering algorithm for recommendations, based on the paper [Fast Matrix Factorization for Online Recommendation with Implicit Feedback](https://arxiv.org/abs/1708.05024) by X.He, H.Zhang, MY.Kan and TS.Chua.

![matrix factorization](../../img/mf.svg)

The ALS family of algorithms for recommendations decomposes a sparse matrix of user-item interactions into a set of smaller dense vectors of implicit user and item features (or user and item embeddings). The cool thing about these embeddings is that similar items will have similar embeddings!

So Metarank does the following:
* computes item embeddings.
* pre-builds a [HNSW](https://www.pinecone.io/learn/hnsw/) index for fast lookups for similar embeddings.
* during inference (when you call the [/recommend/modelname](../../api.md#recommendations) endpoint), it makes a k-NN index lookup of similar items.
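
The lookup step above can be illustrated with a minimal Python sketch. It is not Metarank's implementation: it swaps the HNSW index for a brute-force scan, assumes cosine similarity as the distance measure, and uses made-up three-dimensional embeddings and the hypothetical helper names `cosine` and `similar_items`.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def similar_items(embeddings, item_id, k=2):
    """Return the k items whose embeddings are closest to item_id.

    Brute-force scan over all items; an HNSW index returns an
    approximation of the same result without visiting every item.
    """
    query = embeddings[item_id]
    scored = [
        (cosine(query, vector), other)
        for other, vector in embeddings.items()
        if other != item_id
    ]
    scored.sort(reverse=True)
    return [other for _, other in scored[:k]]


# Made-up item embeddings, standing in for the ALS output.
embeddings = {
    "phone": [0.9, 0.1, 0.0],
    "phone-case": [0.8, 0.2, 0.1],
    "laptop": [0.5, 0.5, 0.0],
    "socks": [0.0, 0.1, 0.9],
}

print(similar_items(embeddings, "phone", k=2))
```

A full scan costs time linear in the inventory size for every request, which is why a pre-built approximate index is the more practical choice for large catalogs.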

Main pros and cons of such an approach:
* *pros*: fast even for giant inventories, simple to implement
* *cons*: lower precision compared to neural networks based methods like [BERT4rec](https://arxiv.org/abs/1904.06690), recommendations are not personalized.

See request & response formats in the [API section](../../api.md).
*There is ongoing work in the Metarank project to implement NN-based methods and make the current ALS implementation personalized.*
27 changes: 17 additions & 10 deletions doc/configuration/recommendations/trending.md
@@ -1,8 +1,10 @@
# Trending items

`trending` recommendation model is used to highlight the most popular items on your site. But it's not about sorting items by popularity! Metarank can:
* combine multiple types of interactions: you can mix clicks and purchases with multiple weights.
* time decay: clicks made yesterday are much more important than clicks from the last months.
`trending` recommendation model is used to highlight the trending (or in other words, most popular) items in your application. But it's not just about sorting items by popularity!

Metarank can:
* combine multiple types of interactions: you can mix clicks, likes and purchases with different weights.
* time decay: clicks made yesterday are much more important than clicks from previous months.
* multiple configurations: trending over the last week, and bestsellers over the last year.

## Configuration
@@ -17,31 +19,36 @@ models:
decay: 0.8 # optional, default 1.0 - no decay
weight: 1.0 # optional, default 1.0
window: 30d # optional, default 30 days
- interaction: like
decay: 0.9
weight: 1.5
window: 60d
- interaction: purchase
decay: 0.95
weight: 3.0

```

The config above defines a trending model, accessible over the `/recommend/yolo-trending` [API endpoint](../../api.md):
* the final item score combines click and purchase events
* purchase has 3x more weight than click
* the final item score combines click, like and purchase events
* purchase has 3x more weight than click, like has 1.5x more weight than click
* purchase has less aggressive time decay
* only the last 30 days of data are used
* only the last 30 days of data are used for clicks and purchases, but 60 days are used for likes

## Time decay and weight

The final score used to sort the items is defined by the following formula:
```
score = count * weight * decay ^ days_diff(now, timestamp)
```
If there's multiple interaction types defined, each per-type score is added together for the final score.
When multiple interaction types are defined, per-type scores are added together to get the final score.
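
As a rough illustration of the formula, here is a minimal Python sketch. It assumes each event contributes `weight * decay ^ days_ago` and that events older than the window are dropped; the `SETTINGS` dict, the `trending_scores` function and the event tuples are hypothetical and only mirror the click, like and purchase values from the config above.

```python
from collections import defaultdict

# Hypothetical per-interaction settings mirroring the YAML config above.
SETTINGS = {
    "click": {"decay": 0.8, "weight": 1.0, "window_days": 30},
    "like": {"decay": 0.9, "weight": 1.5, "window_days": 60},
    "purchase": {"decay": 0.95, "weight": 3.0, "window_days": 30},
}


def trending_scores(events, settings=SETTINGS):
    """Sum decayed, weighted interactions per item.

    events: iterable of (item_id, interaction_type, days_ago) tuples.
    """
    scores = defaultdict(float)
    for item_id, kind, days_ago in events:
        cfg = settings.get(kind)
        if cfg is None or days_ago > cfg["window_days"]:
            continue  # unknown interaction type or outside the window
        scores[item_id] += cfg["weight"] * cfg["decay"] ** days_ago
    return dict(scores)


events = [
    ("a", "click", 0),     # 1.0 * 0.8^0
    ("a", "click", 1),     # 1.0 * 0.8^1
    ("b", "purchase", 2),  # 3.0 * 0.95^2
    ("b", "click", 45),    # ignored: older than the 30-day click window
]

scores = trending_scores(events)
ranked = sorted(scores, key=scores.get, reverse=True)
print(scores, ranked)
```

In this toy data a single recent purchase outranks two recent clicks, because of the 3x weight and the slower purchase decay.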

Such an unusual way of defining decay can allow a more granular control over the decaying. For example, that's how click importance is weighted for different `decay` values:
Time decay configuration allows granular control over the decaying. Here's how click importance is weighted for different `decay` values:

![decay with different options](../../img/decay.png)

We recommend setting a decay:
We recommend setting decay:
* within a range of 0.8-0.95 for 1-month periods.
* within a range of 0.95-0.99 for larger periods.
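
One way to build intuition for these ranges is to convert a `decay` value into a half-life, the number of days after which an interaction counts half as much. The Python sketch below derives it from the `decay ^ days` term of the formula above; `half_life_days` is a hypothetical helper, not part of Metarank.

```python
import math


def half_life_days(decay):
    """Days until decay ** days drops to 0.5; infinite when there is no decay."""
    if decay >= 1.0:
        return math.inf
    return math.log(0.5) / math.log(decay)


for decay in (0.8, 0.9, 0.95, 0.99):
    remaining = decay ** 30  # share of the original weight left after 30 days
    print(
        f"decay={decay}: half-life ~{half_life_days(decay):.1f} days, "
        f"weight left after 30 days: {remaining:.4f}"
    )
```

With `decay: 0.8` an interaction loses half of its weight in about three days, while with `decay: 0.99` that takes more than two months, which matches the advice to use values closer to 1.0 for longer windows.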

See request & response formats in the [API section](../../api.md).
See request & response formats in the [API section](../../api.md#recommendations).
20 changes: 10 additions & 10 deletions doc/intro.md
@@ -1,22 +1,22 @@
# What is Metarank?

Metarank is a recommendation and personalization service - a self-hosted reranking API to improve CTR and conversion.
[Metarank](https://metarank.ai) is a recommendation and personalization service - a self-hosted reranking API to improve CTR and conversion.

Main features:
* Recommendations: [trending](configuration/recommendations/trending.md) and [similar-items](configuration/recommendations/trending.md) (MF ALS).
* Recommendations: [trending](configuration/recommendations/trending.md) and [similar-items](configuration/recommendations/similar.md) (MF ALS).
* Personalization: [secondary reranking](quickstart/quickstart.md) (LambdaMART)
* A/B testing, [multiple model serving](configuration/overview.md#models)
* [Bootstrapping](quickstart/quickstart.md#quickstart) on historical traffic data
* AutoML: [automatic feature generation](howto/autofeature.md) and [model re-training](howto/model-retraining.md)
* A/B testing: [multiple model serving](configuration/overview.md#models)

## Common use-cases

Metarank is an open-source service for:
* Algorithmic feed like on FB/Twitter.
* CTR-optimized category/search page ordering on Airbnb.
* Items similar to the one you're viewing on Amazon.
* Popular items on any ecommerce store.
* Algorithmic feed like on Facebook or Twitter.
* CTR-optimized category/search page ordering like on Airbnb.
* Items similar to the one you're viewing like on Amazon.
* Popular items like on any ecommerce store.

Metarank's recommendations are based on interaction history (like clicks and purchases), and secondary reranking - on user & item metadata and a rich set of typical ranking feature generators:
Metarank can generate recommendations based on the interaction history: clicks, likes or purchases. Personalized secondary reranking can use user and item metadata and a rich set of typical ranking feature generators to provide personalized results:
* [User-Agent](configuration/features/user-session.md#user-agent-field-extractor), [Referer](configuration/features/user-session.md#referer) field parsers
* [Counters](configuration/features/counters.md#counters), [rolling window counters](configuration/features/counters.md#windowed-counter), [rates](configuration/features/counters.md#rate) (CTR & conversion)
* [categorical](configuration/features/scalar.md#index-vs-one-hot-what-to-choose) (with one-hot, label and XGBoost/LightGBM native encodings)
@@ -155,4 +155,4 @@ curl http://localhost:8080/rank/xgboost \

Check out a more in-depth [Quickstart](quickstart/quickstart.md) and full [Reference](installation.md).

If you have any questions, don't hesitate to join our [Slack](https://communityinviter.com/apps/metarank/metarank)!
If you have any questions, don't hesitate to join our [Slack](https://metarank.ai/slack)!
