From 5d46dd9652b36783ff8c29a3e1e36eee5104887f Mon Sep 17 00:00:00 2001 From: Vsevolod Goloviznin Date: Tue, 31 Jan 2023 20:01:24 +0300 Subject: [PATCH] - updated recommendation docs --- doc/_toc.md | 4 +- doc/api.md | 2 +- doc/configuration/recommendations.md | 5 ++ doc/configuration/recommendations/overview.md | 5 -- doc/configuration/recommendations/similar.md | 50 +++++++++---------- doc/configuration/recommendations/trending.md | 27 ++++++---- doc/intro.md | 20 ++++---- 7 files changed, 60 insertions(+), 53 deletions(-) create mode 100644 doc/configuration/recommendations.md delete mode 100644 doc/configuration/recommendations/overview.md diff --git a/doc/_toc.md b/doc/_toc.md index 3421cc033..c6e78886e 100644 --- a/doc/_toc.md +++ b/doc/_toc.md @@ -21,9 +21,9 @@ * [Scalars](configuration/features/scalar.md) * [Text](configuration/features/text.md) * [User Profile](configuration/features/user-session.md) - * [Recommendations](configuration/recommendations/overview.md) + * [Recommendations](configuration/recommendations.md) + * [Trending items](configuration/recommendations/trending.md) * [Similar items](configuration/recommendations/similar.md) - * [Popular items](configuration/recommendations/trending.md) * [Models](configuration/supported-ranking-models.md) * [Data Sources](configuration/data-sources.md) * [Persistence](configuration/persistence.md) diff --git a/doc/api.md b/doc/api.md index e97a3fdcf..29b4b564c 100644 --- a/doc/api.md +++ b/doc/api.md @@ -5,7 +5,7 @@ Metarank's API provides an easy way to integrate Metarank with your applications - [Feedback API](#feedback) receives the stream of events - [Train API](#train) trains the model on your data - [Ranking API](#ranking) provides personalized results generated by the trained model -- [Recommend API]() - retrieval of recommendations. +- [Recommend API](#recommendations) - retrieval of recommendations. 
- [Prometheus endpoint](#prometheus-metrics) - to have a nice metrics dashboard about Metarank internals. ## Feedback diff --git a/doc/configuration/recommendations.md b/doc/configuration/recommendations.md new file mode 100644 index 000000000..7c6d76621 --- /dev/null +++ b/doc/configuration/recommendations.md @@ -0,0 +1,5 @@ +# Recommendations in Metarank + +Starting from version `0.6.x`, Metarank supports two types of recommendations: +* [Trending](recommendations/trending.md): popularity-sorted list of items with customized ordering. +* [Similar items](recommendations/similar.md): matrix-factorization collaborative filtering recommender of items you may also like. \ No newline at end of file diff --git a/doc/configuration/recommendations/overview.md b/doc/configuration/recommendations/overview.md deleted file mode 100644 index 6d66df46d..000000000 --- a/doc/configuration/recommendations/overview.md +++ /dev/null @@ -1,5 +0,0 @@ -# Recommendations in Metarank - -Starting from version `0.6.x`, Metarank supports two types of recommendations: -* [Trending](trending.md): popularity-sorted list of items with customized ordering. -* [Similar items](similar.md): matrix-factorization collaborative filtering recommender of items you may also like. \ No newline at end of file diff --git a/doc/configuration/recommendations/similar.md b/doc/configuration/recommendations/similar.md index 9c61ca325..2e2186c39 100644 --- a/doc/configuration/recommendations/similar.md +++ b/doc/configuration/recommendations/similar.md @@ -1,44 +1,44 @@ # Similar items -A `similar` recommender model can give you items other visitors also liked, while viewing the item you're currently observing. +The `similar` recommendation model suggests items that other visitors also liked while viewing the item you're currently looking at.
-Common use-cases for such model are: +Common use-cases for this model are: * you-may-also-like recommendations on item page: the context of the recommendation is a single item you're viewing now. * also-purchased widget on the cart page: the context of the recommendation is the contents of your card. -## Underlying model - -Metarank uses a variation of [Matrix Factorization](https://developers.google.com/machine-learning/recommendation/collaborative/matrix) collaborative filtering algorithm for recommendations based on a paper [Fast Matrix Factorization for Online Recommendation with Implicit Feedback](https://arxiv.org/abs/1708.05024) by X.He, H.Zhang, MY.Kan and TS.Chua. - -![matrix factorization](../../img/mf.svg) - -The ALS family of algorithms for recommendations decompose a sparse matrix of user-item interactions into a set of smaller dense vectors of implicit user and item features (or user and item embeddings). The cool things about these embeddings is that similar items will have similar embeddings! - -So Metarank does the following: -* computes item embeddings -* pre-builds a [HNSW](https://www.pinecone.io/learn/hnsw/) index for fast lookups for similar embeddings -* on inference time (when you call the [/recommend/modelname](../../api.md) endpoint), it makes a k-NN index lookup of similar items. - -Main pros and cons of such apporach: -* pros: fast even for giant inventories, simple to implement -* cons: lower precision as neural networks based methods like [BERT4rec](https://arxiv.org/abs/1904.06690), recommendations are not personalized. - -There is an ongoing work in Metarank project to implement NN-based methods and make current ALS implementation personalized. 
- ## Configuration ```yaml similar: type: als - interactions: [click] # which interactions to use + interactions: [click, like, purchase] # which interactions to use factors: 100 # optional, number of implicit factors in the model, default 100 iterations: 100 # optional, number of training iterations, default 100 ``` There are two important parameters in the configuration: * `factors`: how many hidden parameters the model tries to compute. The more - the better, but slower. Usually defined within the rage of 50-500. -* `iterations`: how many factor refinements attempts are made. The more - the better, but slower. Normal range - 50-300. +* `iterations`: how many factor refinement passes are made. The more, the better, but slower. Normal range: 50-300. -Rule of thump - set these parameters low, and then increase slightly until training time becomes completely unreasonable. +Rule of thumb: start with low values, then increase them gradually until training time becomes unreasonable. + +See request & response formats in the [API section](../../api.md#recommendations). + +## Underlying model + +Metarank uses a variation of the [Matrix Factorization](https://developers.google.com/machine-learning/recommendation/collaborative/matrix) collaborative filtering algorithm, based on the paper [Fast Matrix Factorization for Online Recommendation with Implicit Feedback](https://arxiv.org/abs/1708.05024) by X. He, H. Zhang, M.-Y. Kan and T.-S. Chua. + +![matrix factorization](../../img/mf.svg) + +The ALS family of recommendation algorithms decomposes a sparse matrix of user-item interactions into two smaller dense matrices of implicit user and item features (user and item embeddings). The cool thing about these embeddings is that similar items end up with similar embeddings! + +So Metarank does the following: +* computes item embeddings. +* pre-builds an [HNSW](https://www.pinecone.io/learn/hnsw/) index for fast lookups of similar embeddings.
+* during inference (when you call the [/recommend/modelname](../../api.md#recommendations) endpoint), it performs a k-NN lookup in this index to find similar items. + +Main pros and cons of this approach: +* *pros*: fast even for giant inventories, simple to implement +* *cons*: lower precision compared to neural-network-based methods like [BERT4rec](https://arxiv.org/abs/1904.06690); recommendations are not personalized. -See request & response formats in the [API section](../../api.md). \ No newline at end of file +*There is ongoing work in the Metarank project to implement NN-based methods and to make the current ALS implementation personalized.* diff --git a/doc/configuration/recommendations/trending.md b/doc/configuration/recommendations/trending.md index a24a01d0f..66261e466 100644 --- a/doc/configuration/recommendations/trending.md +++ b/doc/configuration/recommendations/trending.md @@ -1,8 +1,10 @@ # Trending items -`trending` recommendation model is used to highlight the most popular items on your site. But it's not about sorting items by popularity! Metarank can: -* combine multiple types of interactions: you can mix clicks and purchases with multiple weights. -* time decay: clicks made yesterday are much more important than clicks from the last months. +The `trending` recommendation model highlights the trending (in other words, the most popular) items in your application. But it's not just about sorting items by popularity! + +Metarank can: +* combine multiple types of interactions: you can mix clicks, likes and purchases with different weights. +* time decay: clicks made yesterday are much more important than clicks made months ago. * multiple configurations: trending over the last week, and bestsellers over the last year.
## Configuration @@ -17,16 +19,21 @@ models: decay: 0.8 # optional, default 1.0 - no decay weight: 1.0 # optional, default 1.0 window: 30d # optional, default 30 days + - interaction: like + decay: 0.9 + weight: 1.5 + window: 60d - interaction: purchase decay: 0.95 weight: 3.0 + ``` The config above defines a trending model, accessible over the `/recommend/yolo-trending` [API endpoint](../../api.md): -* the final item score combines click and purchase events -* purchase has 3x more weight than click +* the final item score combines click, like and purchase events +* purchase has 3x the weight of a click, and like has 1.5x the weight of a click * purchase has less agressive time decay -* only the last 30 days of data are used +* only the last 30 days of data are used for clicks and purchases, while likes use the last 60 days ## Time decay and weight @@ -34,14 +41,14 @@ The final score used to sort the items is defined by the following formula: ``` score = count * weight * decay ^ days_diff(now, timestamp) ``` -If there's multiple interaction types defined, each per-type score is added together for the final score. +When multiple interaction types are defined, per-type scores are added together to get the final score. -Such an unusual way of defining decay can allow a more granular control over the decaying. For example, that's how click importance is weighted for different `decay` values: +The `decay` parameter gives granular control over how quickly older interactions lose importance. Here's how click importance is weighted for different `decay` values: ![decay with different options](../../img/decay.png) -We recommend setting a decay: +We recommend setting decay: * within a range of 0.8-0.95 for 1-month periods. * within a range of 0.95-0.99 for larger periods. -See request & response formats in the [API section](../../api.md). \ No newline at end of file +See request & response formats in the [API section](../../api.md#recommendations).
\ No newline at end of file diff --git a/doc/intro.md b/doc/intro.md index 7b4d67d1d..7e3944d5d 100644 --- a/doc/intro.md +++ b/doc/intro.md @@ -1,22 +1,22 @@ # What is Metarank? -Metarank is a recommendation and personalization service - a self-hosted reranking API to improve CTR and conversion. +[Metarank](https://metarank.ai) is a recommendation and personalization service - a self-hosted reranking API to improve CTR and conversion. Main features: -* Recommendations: [trending](configuration/recommendations/trending.md) and [similar-items](configuration/recommendations/trending.md) (MF ALS). +* Recommendations: [trending](configuration/recommendations/trending.md) and [similar-items](configuration/recommendations/similar.md) (MF ALS). * Personalization: [secondary reranking](quickstart/quickstart.md) (LambdaMART) -* A/B testing, [multiple model serving](configuration/overview.md#models) -* [Bootstrapping](quickstart/quickstart.md#quickstart) on historical traffic data +* AutoML: [automatic feature generation](howto/autofeature.md) and [model re-training](howto/model-retraining.md) +* A/B testing: [multiple model serving](configuration/overview.md#models) ## Common use-cases Metarank is an open-source service for: -* Algorithmic feed like on FB/Twitter. -* CTR-optimized category/search page ordering on Airbnb. -* Items similar to the one you're viewing on Amazon. -* Popular items on any ecommerce store. +* Algorithmic feed like on Facebook or Twitter. +* CTR-optimized category/search page ordering like on Airbnb. +* Items similar to the one you're viewing, like on Amazon. +* Popular items, like on any e-commerce store. -Metarank's recommendations are based on interaction history (like clicks and purchases), and secondary reranking - on user & item metadata and a rich set of typical ranking feature generators: +Metarank can generate recommendations based on the interaction history: clicks, likes or purchases.
Secondary reranking can use user and item metadata and a rich set of typical ranking feature generators to provide personalized results: * [User-Agent](configuration/features/user-session.md#user-agent-field-extractor), [Referer](configuration/features/user-session.md#referer) field parsers * [Counters](configuration/features/counters.md#counters), [rolling window counters](configuration/features/counters.md#windowed-counter), [rates](configuration/features/counters.md#rate) (CTR & conversion) * [categorical](configuration/features/scalar.md#index-vs-one-hot-what-to-choose) (with one-hot, label and XGBoost/LightGBM native encodings) @@ -155,4 +155,4 @@ curl http://localhost:8080/rank/xgboost \ Check out a more in-depth [Quickstart](quickstart/quickstart.md) and full [Reference](installation.md). -If you have any questions, don't hesitate to join our [Slack](https://communityinviter.com/apps/metarank/metarank)! \ No newline at end of file +If you have any questions, don't hesitate to join our [Slack](https://metarank.ai/slack)! \ No newline at end of file
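The trending score formula that this patch documents in `doc/configuration/recommendations/trending.md` (`score = count * weight * decay ^ days_diff(now, timestamp)`, with per-type scores added together) can be illustrated with a small Python sketch. This is not Metarank's implementation. `InteractionConfig` and `trending_score` are hypothetical names, `days_diff` is assumed to count whole elapsed days, and events older than an interaction's `window` are assumed to be dropped entirely.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class InteractionConfig:
    # Hypothetical mirror of the per-interaction YAML keys in trending.md.
    weight: float = 1.0    # default 1.0
    decay: float = 1.0     # default 1.0 means no decay
    window_days: int = 30  # default window is 30 days


def trending_score(events, configs, now):
    """Score one item: sum of weight * decay ** days_diff over its events.

    events:  iterable of (interaction_type, timestamp) pairs for the item.
    configs: dict mapping interaction type -> InteractionConfig.
    Contributions from different interaction types are simply added together.
    """
    score = 0.0
    for interaction, timestamp in events:
        cfg = configs.get(interaction)
        if cfg is None:
            continue  # interaction type not used by this model
        days = (now - timestamp).days  # assumption: whole days elapsed
        if days > cfg.window_days:
            continue  # assumption: events outside the window are ignored
        score += cfg.weight * cfg.decay ** days
    return score


# Usage, loosely following the click/purchase part of the example config.
now = datetime(2023, 1, 31, tzinfo=timezone.utc)
configs = {
    "click": InteractionConfig(weight=1.0, decay=0.8),
    "purchase": InteractionConfig(weight=3.0, decay=0.95),
}
events = [
    ("click", now),                          # 1.0 * 0.8 ** 0 = 1.0
    ("click", now - timedelta(days=1)),      # 1.0 * 0.8 ** 1 = 0.8
    ("purchase", now - timedelta(days=2)),   # 3.0 * 0.95 ** 2 = 2.7075
    ("click", now - timedelta(days=40)),     # ignored: outside the 30d window
]
print(round(trending_score(events, configs, now), 4))  # 4.5075
```

Scoring each event separately and summing gives the same result as the per-day `count * weight * decay ^ days` form, because all events from the same day share one decay factor.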