Counteract Recency Bias on Lemmy Sorting Algorithm #4432

Closed
4 tasks done
8ullyMaguire opened this issue Feb 7, 2024 · 2 comments
Labels: enhancement (New feature or request)

8ullyMaguire commented Feb 7, 2024

Requirements

  • Is this a feature request? For questions or discussions use https://lemmy.ml/c/lemmy_support
  • Did you check to see if this issue already exists?
  • Is this only a feature request? Do not put multiple feature requests in one issue.
  • Is this a backend issue? Use the lemmy-ui repo for UI / frontend issues.

Is your proposal related to a problem?

Yes, the current issue is that on Lemmy there is a recency bias in the sorting algorithm. As communities grow over time, newer posts tend to receive more votes simply because there are more active users than in the past. This skews the "top of all time" sorting, making it more likely for recent posts to appear at the top, regardless of their relative merit compared to older posts.

Describe the solution you'd like.

I propose implementing a new sorting algorithm that adjusts for the number of monthly active users at the time each post was made. This would allow for a fairer comparison of posts' scores over time, ensuring that a post's visibility in the "top of all time" sorting is more reflective of its actual popularity and impact within the community at the time it was posted.

Describe alternatives you've considered.

An alternative could be to implement a weighted scoring system that diminishes the value of a vote as the total number of active users increases. However, this might be less effective than adjusting scores based on active users at the time of posting, as it does not account for the historical context of each post.
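One possible reading of this alternative is to weight each individual vote by the inverse of the MAU in the month the vote was cast, so a vote counts for less when more users are active. The sketch below assumes that reading. `Vote`, `weighted_score`, and the `(year, month)` keys are hypothetical names for illustration, and the approach would require per-vote timestamps.

```rust
use std::collections::HashMap;

/// Hypothetical vote record: +1 or -1, plus the (year, month) it was cast in.
struct Vote {
    value: i8,
    month: (i32, u32),
}

/// Sums the votes, each weighted by 1 / MAU of the month it was cast in,
/// so a single vote is worth less when more users are active.
fn weighted_score(votes: &[Vote], mau_history: &HashMap<(i32, u32), u32>) -> f64 {
    votes
        .iter()
        .map(|v| {
            // A missing or zero MAU falls back to a weight of 1.
            let mau = mau_history.get(&v.month).copied().unwrap_or(1).max(1);
            f64::from(v.value) / f64::from(mau)
        })
        .sum()
}

fn main() {
    // Illustrative numbers only: a community that grew tenfold.
    let mut mau_history = HashMap::new();
    mau_history.insert((2021, 6), 1_000);
    mau_history.insert((2024, 1), 10_000);

    // The 2021 upvote is worth ten times the 2024 upvote: 1/1000 + 1/10000.
    let votes = vec![
        Vote { value: 1, month: (2021, 6) },
        Vote { value: 1, month: (2024, 1) },
    ];
    let score = weighted_score(&votes, &mau_history);
    assert!((score - 0.0011).abs() < 1e-12);
    println!("weighted score: {score}");
}
```

Unlike normalizing by the MAU at posting time, this weights each vote by when it was cast, so late votes on an old post are also discounted.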

Additional context

Here is a Rust sketch of a sorting algorithm for Lemmy that takes into account the number of monthly active users at the time a post was made:

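use std::collections::HashMap;

// Requires the chrono crate for timestamps.
use chrono::{DateTime, Datelike, Utc};

#[derive(Clone)]
struct Post {
    id: u32,
    score: i64, // upvotes minus downvotes, so it can be negative
    date: DateTime<Utc>,
}

struct Lemmy {
    posts: Vec<Post>,
    // Monthly active users keyed by (year, month), so every post made
    // during a month finds that month's MAU.
    mau_history: HashMap<(i32, u32), u32>,
}

impl Lemmy {
    /// Post score divided by the MAU of the month the post was made in.
    fn score_post(&self, post: &Post) -> f64 {
        let key = (post.date.year(), post.date.month());
        // A missing or zero MAU is treated as 1 (i.e. the raw score)
        // instead of dividing by zero.
        let mau = self.mau_history.get(&key).copied().unwrap_or(1).max(1);
        // Float division: integer division would truncate almost every score to 0.
        post.score as f64 / mau as f64
    }

    /// Posts sorted by normalized score, highest first.
    fn sort_posts(&self) -> Vec<Post> {
        let mut posts = self.posts.clone();
        posts.sort_by(|a, b| self.score_post(b).total_cmp(&self.score_post(a)));
        posts
    }
}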

This keeps a hashmap of the monthly active users (MAU) at different points in time. When sorting posts, it calculates a normalized score by dividing the post's score by the MAU at the time the post was made. As a result, older posts made when MAU was lower get their scores boosted relative to newer posts made when MAU was higher.
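To make the effect of the normalization concrete, the snippet below works through it with illustrative numbers (not real Lemmy data) for a community that grew from 1,000 to 10,000 monthly active users. `normalized_score` and the `(year, month)` keys are hypothetical names chosen for this sketch.

```rust
use std::collections::HashMap;

/// Hypothetical helper: a post's score divided by the MAU of the
/// (year, month) it was made in. Missing or zero MAU falls back to 1.
fn normalized_score(score: i64, month: (i32, u32), mau_history: &HashMap<(i32, u32), u32>) -> f64 {
    let mau = mau_history.get(&month).copied().unwrap_or(1).max(1);
    score as f64 / mau as f64
}

fn main() {
    // Illustrative numbers only: a community that grew tenfold.
    let mut mau_history = HashMap::new();
    mau_history.insert((2021, 6), 1_000);
    mau_history.insert((2024, 1), 10_000);

    let old_post = normalized_score(100, (2021, 6), &mau_history); // 100 / 1,000 = 0.1
    let new_post = normalized_score(300, (2024, 1), &mau_history); // 300 / 10,000 = 0.03

    // The older post ranks higher despite having fewer raw votes.
    assert!(old_post > new_post);
    println!("old: {old_post}, new: {new_post}");
}
```

Sorted by raw score, the newer post (300) would beat the older one (100); after normalization the order flips.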

8ullyMaguire added the enhancement label Feb 7, 2024
Nutomic (Member) commented Feb 7, 2024

Sorting is implemented in SQL, not Rust; you can see an example here: #3907

dessalines (Member) commented:

> Yes, the current issue is that on Lemmy there is a recency bias in the sorting algorithm. As communities grow over time, newer posts tend to receive more votes simply because there are more active users than in the past. This skews the "top of all time" sorting, making it more likely for recent posts to appear at the top, regardless of their relative merit compared to older posts.

Storing historical monthly active users just for this seems way overkill. If you want to work on this I'll leave it open, but otherwise it's not something we have time for.

8ullyMaguire closed this as not planned Feb 8, 2024
3 participants