Can I use only click data? #288
Comments
Ranking/impression data is required because we use it to teach the underlying ML model which items are relevant and which are not. Generally speaking, the Learn-to-Rank approach is about training some sort of binary classifier, which is then asked the question "given items A and B, which one of them should be ranked higher?" If you have impression data showing that items A-B-C-D were displayed and item C was clicked afterwards, you can assume that items A+B+C were examined by the visitor, but only C is actually relevant from this group. So you teach the model that C should be ranked higher than A and B.

The other important reason why ranking data is essential is position information: people tend to click on the first items in the list much more frequently. In e-commerce, around 50% of clicks go to the top-5 results. So a click on position 1 does not tell you much about whether this item was relevant at all. But if the visitor scrolled to the bottom of the list and only there found something relevant enough to click, then this click is pure gold from the value perspective.

And the last point: quite a lot of our important feature extractors use the impression information, like the rate one, which computes per-item CTR/conversion rates. Which item will be clicked: the one with 100 clicks and 100k impressions, or the one with 50 clicks and 100 impressions? If you only count clicks, it's not that clear anymore, as everything looks relevant.

In recommender systems theory there is an approach to deal with this type of problem when there is no negative feedback: you can sample some random non-clicked items from the inventory and treat them as your negative samples. But it usually gives sub-par quality: it's still synthetic data, and saying that this particular pair of socks is less relevant than a teapot will probably lead to bad ranking results.
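To make the A-B-C-D example and the CTR comparison above concrete, here is a minimal Python sketch. It is only an illustration of the reasoning, not how Metarank implements it internally; the function names `preference_pairs` and `ctr` are hypothetical.

```python
from typing import List, Tuple


def preference_pairs(shown: List[str], clicked: str) -> List[Tuple[str, str]]:
    """Return (preferred, less_preferred) pairs implied by a single click.

    Assumption: items shown above the clicked one were examined and skipped;
    items below the click may never have been seen, so they yield no pairs.
    """
    click_pos = shown.index(clicked)
    return [(clicked, skipped) for skipped in shown[:click_pos]]


def ctr(clicks: int, impressions: int) -> float:
    """Click-through rate; only computable when impressions are logged."""
    return clicks / impressions if impressions else 0.0


# Items A-B-C-D were shown, C was clicked: C is preferred over A and B.
pairs = preference_pairs(["A", "B", "C", "D"], "C")
print(pairs)

# 100 clicks look better than 50 until impressions are taken into account.
popular_but_ignored = ctr(100, 100_000)
rarely_shown_but_clicked = ctr(50, 100)
print(popular_but_ignored, rarely_shown_but_clicked)
```

Without the `shown` list there is nothing to build pairs from, and without impression counts the second item's advantage is invisible.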
I'm wondering why you have this problem. Is it more about collecting historical data to train the model? In our demo (the one on demo.metarank.ai and in RanklensTest) we use only around 5k user sessions, and that is more than enough to get a real impact on the ranking, so I guess you don't need to wait years to collect enough data for initial training. From the JSON format perspective, the ranking events are the actual request bodies you send to the Metarank API to do the reranking itself, so you only need to log them and wait for some time.
Thank you so much for such a detailed explanation! How will the algorithm know the position of the click? If it is taking index position from
There is an event schema doc describing the format of the events, and there are a couple of important points from there:
So there is no single constant ranking of items needed in Metarank. Each time you present a listing (for example, search results) to a visitor, you should emit it as an event downstream. And each time the visitor clicks on an item in this particular ranking, you should also send another event with the clicked item id AND the parent ranking that resulted in this click.
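The two events described above could be sketched roughly like this in Python. The field names (`event`, `id`, `ranking`, `items`, `type`, and so on) are assumptions made for illustration and should be checked against the event schema doc; the helper functions are hypothetical. The sketch also shows one way the click position can be recovered: by looking up the clicked item in the parent ranking's item list.

```python
import time
import uuid
from typing import Any, Dict, List


def make_ranking_event(user: str, session: str, item_ids: List[str]) -> Dict[str, Any]:
    """Build a ranking (impression) event for one listing shown to a visitor."""
    return {
        "event": "ranking",                     # assumed field name
        "id": str(uuid.uuid4()),                # unique id of this listing
        "timestamp": str(int(time.time() * 1000)),
        "user": user,
        "session": session,
        "items": [{"id": i} for i in item_ids],  # order = order shown on screen
    }


def make_click_event(ranking_event: Dict[str, Any], item_id: str) -> Dict[str, Any]:
    """Build a click event that points back to its parent ranking."""
    return {
        "event": "interaction",                 # assumed field name
        "type": "click",
        "id": str(uuid.uuid4()),
        "ranking": ranking_event["id"],         # parent ranking reference
        "timestamp": str(int(time.time() * 1000)),
        "user": ranking_event["user"],
        "session": ranking_event["session"],
        "item": item_id,
    }


def click_position(ranking_event: Dict[str, Any], click_event: Dict[str, Any]) -> int:
    """Zero-based position of the clicked item within its parent ranking."""
    if click_event["ranking"] != ranking_event["id"]:
        raise ValueError("click does not belong to this ranking")
    shown = [item["id"] for item in ranking_event["items"]]
    return shown.index(click_event["item"])


ranking = make_ranking_event("user1", "session1", ["A", "B", "C", "D"])
click = make_click_event(ranking, "C")
print(click_position(ranking, click))
```

Because every click carries the id of the listing it came from, there is no need for a global item order: the position is always relative to what that visitor actually saw.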
Hi,
Thanks for stepping in to solve one of the major issues in personalized content serving.
I have a question about ranking events. I believe they are like impression events, where all impressions are grouped together for a user.
Can I use only click data? That is, I want to use only metadata and interaction events for personalization. Please suggest changes to this config so that I can skip the ranking event.