
Temporal effects cause items released on same day to be similar #475

Closed
EralpB opened this issue Jul 21, 2019 · 4 comments
EralpB commented Jul 21, 2019

Hello, I'm trying to create a recommender for Product Hunt, which basically features N products every day. It can be modelled like a news website.

After my initial investigation, I feel that items released on the same day tend to come out similar: if a user is active around that day, they will very likely upvote the top #1 and #2 products, and the system then considers those products similar since they share many common upvoters.

Have you observed this phenomenon, and is there a term for it? I lack the domain knowledge to search for research papers about this. Any guidance?

I'm thinking of incorporating negative feedback, but I don't want to make this more complex than necessary. If a user comes on day 5 and upvotes a set of products, that means they have seen the other products released on day 5 but disliked them, so I can treat those as negative implicit feedback. Even though there will still be common upvoters (since users see the same featured products), this -1 signal should give a little insight into how dissimilar some products are. What do you think of this idea; would it improve things?
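A minimal sketch of this idea, assuming upvotes live in a pandas DataFrame with `(user_id, item_id, day)` columns where the IDs are already integer indices, plus a dict mapping each day to its featured item ids (all of these names are hypothetical). LightFM's logistic loss accepts both +1 and -1 interactions, which is what makes the explicit negatives possible:

```python
# Sketch: explicit -1 feedback for featured-but-skipped items.
# `upvotes`, `featured_on_day`, `n_users` and `n_items` are
# hypothetical inputs; user/item ids are integer indices.
import numpy as np
from scipy.sparse import coo_matrix
from lightfm import LightFM

rows, cols, vals = [], [], []
for (user, day), group in upvotes.groupby(["user_id", "day"]):
    liked = set(group["item_id"])
    for item in featured_on_day[day]:
        rows.append(user)
        cols.append(item)
        # +1 for an upvote, -1 for a featured item the user saw but skipped
        vals.append(1.0 if item in liked else -1.0)

interactions = coo_matrix((vals, (rows, cols)), shape=(n_users, n_items))

# The logistic loss supports both positive (+1) and negative (-1)
# interactions, so the skipped items act as explicit negatives.
model = LightFM(loss="logistic")
model.fit(interactions, epochs=30)
```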

EthanRosenthal commented
I've had this happen a number of times. It's a difficult problem to solve. If you have high quality user or item metadata, then this can be used to "connect" old users to new users (or old products to new products). However, it's rarely been the case for me that the metadata is high enough quality, and the unique user_id and item_id features end up dominating and thus temporally clumping users and items.
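For reference, a sketch of wiring item metadata into LightFM via its `Dataset` helper; `item_tags` (a dict of item id to tag list), `all_user_ids`, and `upvote_pairs` are hypothetical inputs:

```python
# Sketch: attach item metadata so similarity can flow through shared
# tags instead of only same-day co-upvotes.
from lightfm import LightFM
from lightfm.data import Dataset

all_tags = {tag for tags in item_tags.values() for tag in tags}

dataset = Dataset()
dataset.fit(users=all_user_ids, items=item_tags.keys(), item_features=all_tags)

# By default each item also gets its own identity feature, which is
# exactly what can dominate when the metadata is weak.
item_features = dataset.build_item_features(
    (item_id, tags) for item_id, tags in item_tags.items()
)
interactions, _ = dataset.build_interactions(upvote_pairs)

model = LightFM(loss="warp")
model.fit(interactions, item_features=item_features, epochs=30)
```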

EralpB (Author) commented Jul 24, 2019

@EthanRosenthal hey thanks for your comment, the way I solved this issue is I picked only 1 like per user per day randomly, this way there are less connections (actually 0) between the items in the same day.
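The subsampling itself is a one-liner in pandas, again assuming a hypothetical `upvotes` DataFrame with `(user_id, item_id, day)` columns:

```python
# Sketch: keep one random like per user per day (pandas >= 1.1).
import pandas as pd

sampled = upvotes.groupby(["user_id", "day"]).sample(n=1, random_state=42)
```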

Now when I do similar_items I get nicely spread-out results, which is very good; the only thing I still need to check is whether they make sense at all, haha.
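LightFM itself doesn't ship a `similar_items` method (it only exposes the raw `item_embeddings` array), so a helper along these lines, cosine similarity over the learned embeddings, is one plausible reading of the lookup described here:

```python
# Sketch: nearest items by cosine similarity over LightFM's learned
# item embeddings.
import numpy as np

def similar_items(model, item_id, top_n=10):
    emb = model.item_embeddings
    norms = np.linalg.norm(emb, axis=1)
    scores = emb @ emb[item_id] / (norms * norms[item_id] + 1e-9)
    ranked = np.argsort(-scores)
    return [i for i in ranked if i != item_id][:top_n]
```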

My model performs better than random, BUT test scores are really easy to "cheat". I presume that a static recommender like "recommend the most liked products of each day (better: of each day within -5 and +5 days of when the user was active)" would score high, because people most likely upvoted those products, so it would match the 20% of interactions I held out for the test run. I don't know how to create this "recommend top items" model manually with lightfm, so I can't compare scores against it.
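A hand-rolled popularity baseline doesn't need LightFM at all. A sketch that scores global top-k popularity under the same precision@k protocol (the ±5-day window variant would also need release dates, which this omits); `train` and `test` are the usual scipy sparse user-item matrices:

```python
# Sketch: precision@k for a "recommend the most liked items" baseline,
# using the same train/test split as the LightFM model.
import numpy as np

def popularity_precision_at_k(train, test, k=10):
    popularity = np.asarray(train.sum(axis=0)).ravel()  # likes per item
    top_items = set(np.argsort(-popularity)[:k])
    test_csr = test.tocsr()
    precisions = []
    for user in range(test_csr.shape[0]):
        held_out = set(test_csr[user].indices)
        if held_out:
            precisions.append(len(held_out & top_items) / k)
    return float(np.mean(precisions))
```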

What I'm more interested in is getting similar items: even if an item is from a year ago it's still relevant, because these are products, not "news".

When I started out, I thought collaborative filtering ignored the domain completely, but more and more I feel you need to tinker a lot to come up with something that really makes sense, rather than something that just produces a high test score.

maciejkula (Collaborator) commented

In a more expressive model you might hope to capture this via contextual components: a click on a featured item would fall within a featured context, and so hopefully the observed likes would be attributed to the context rather than the user's long-term preference.

Absent this, I would suggest inverse propensity weighting. This is a reasonable reference.

The procedure is as follows:

  1. For every day in your data, compute the number of clicks a given item gets. On the basis of that number, compute a quantity that is inversely related to the number of clicks a given item has. For instance, if item A gets 3 times as many clicks as the average number of clicks on that day, its weight could be 1/3; if it gets half as many clicks as the average number, its weight could be 1 / 0.5 = 2.
  2. Use that quantity to weigh individual interactions in your training data. Say, if user X liked item A on a day where that item was not particularly popular, we will give that interaction a higher weight. If, however, user X liked that item on a day when A was featured, we would give that like a very low weight: say, 0.1.

The end effect is that you pay more attention to interactions that cannot be simply explained by popularity on any given day.

One caveat is that you may want to tune the way you compute those weights, make them nonlinear, and cap them: you don't want likes of really unpopular products to get weights of 1000.
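A minimal sketch of this procedure, assuming a hypothetical `upvotes` DataFrame with integer-indexed `(user_id, item_id, day)` columns, with weights capped as suggested and passed to LightFM through `fit`'s `sample_weight` argument:

```python
# Sketch: inverse propensity weights per (day, item), capped, then fed
# to LightFM as per-interaction sample weights.
import numpy as np
from scipy.sparse import coo_matrix
from lightfm import LightFM

# Clicks per item per day, relative to that day's average.
counts = upvotes.groupby(["day", "item_id"]).size()
relative = counts / counts.groupby(level="day").transform("mean")

# Inverse weight, capped so very unpopular items don't get huge weights.
weights = (1.0 / relative).clip(upper=5.0)

rows = upvotes["user_id"].to_numpy()
cols = upvotes["item_id"].to_numpy()
w = weights.loc[list(zip(upvotes["day"], upvotes["item_id"]))].to_numpy()

interactions = coo_matrix((np.ones(len(rows)), (rows, cols)),
                          shape=(n_users, n_items))
# sample_weight must share the interaction matrix's row/col pattern.
sample_weight = coo_matrix((w, (rows, cols)), shape=(n_users, n_items))

model = LightFM(loss="warp")
model.fit(interactions, sample_weight=sample_weight, epochs=30)
```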

EralpB (Author) commented Jul 26, 2019

Thank you very much for your insight @maciejkula, the paper is definitely a good read!

EralpB closed this as completed Jul 26, 2019