Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Background Post Hotness generation #25

Closed
strideynet opened this issue Jul 6, 2023 · 5 comments
Closed

Background Post Hotness generation #25

strideynet opened this issue Jul 6, 2023 · 5 comments

Comments

@strideynet
Copy link
Owner

strideynet commented Jul 6, 2023

Right now, we calculate post hotness on the fly which makes cursoring and performance problematic.

We should calculate these every X minutes and persist them to a table we can join across. We should support multiple "algos" and store the last X post history.

Possibly only generate Post Hotness for posts w/in the last X hours (possibly 48/72) to keep the size of this a bit more bound.

@strideynet strideynet changed the title Improve cursor for Algorithmic Feeds Improve "what's hot" feed performanceand cursor Jul 30, 2023
@strideynet strideynet changed the title Improve "what's hot" feed performanceand cursor Improve "what's hot" feed performance and cursor Jul 30, 2023
@strideynet
Copy link
Owner Author

SELECT
    cp.*
FROM
    candidate_posts cp
        INNER JOIN candidate_actors ca ON cp.actor_did = ca.did
        INNER JOIN post_hotness ph
                   ON ph.post_uri = cp.uri AND ph.algo = @algo AND
                      ph.generated_at = @generated_at
WHERE
      cp.is_hidden = false
  AND cp.deleted_at IS NULL
  AND ca.status = 'approved'
  AND (@require_tags::TEXT[] = '{}' OR @require_tags::TEXT[] <@ cp.tags)
  AND (@exclude_tags::TEXT[] = '{}' OR NOT (@exclude_tags::TEXT[] && cp.tags))
  AND (ph.hotness < @hotness_cursor)
ORDER BY
    ph.hotness DESC
LIMIT @_limit;

Following discussion with Tolf - we like the idea of having a background generation process that spits scores out to a table that can be joined in.

@strideynet strideynet changed the title Improve "what's hot" feed performance and cursor Background Post Hotness generation Aug 13, 2023
@itstolf
Copy link
Collaborator

itstolf commented Aug 13, 2023

still writing up some design notes for this, but what do you think about different tables for different algorithms rather than just putting them all in the same table? i think my feeling is that because each algorithm is semantically distinct, it might not make a lot of sense to put them in the same table and have the "hotness" value have a very different meaning per algorithm, but i don't have a super strong opinion either way!

@strideynet
Copy link
Owner Author

still writing up some design notes for this, but what do you think about different tables for different algorithms rather than just putting them all in the same table? i think my feeling is that because each algorithm is semantically distinct, it might not make a lot of sense to put them in the same table and have the "hotness" value have a very different meaning per algorithm, but i don't have a super strong opinion either way!

I see the argument from a semantic side, as the hotness across different algos won't be comparable, but I do think that splitting the tables for it will be more pain than it's worth. It'll reduce our ability to introduce new algos dynamically in future and general housekeeping tasks will be more complex (e.g the background task that cleans out old post hotness scores).

I'm also unsure how well sqlc and other parts of our toolchain will play with this.

If we wanted to track what went into the hotness score for debugging purposes, we could probably just a JSONB field for this (especially as I doubt we'll ever search by it and it'd mostly be for debugging)

@itstolf
Copy link
Collaborator

itstolf commented Aug 13, 2023

leaving this here for now until it finds a better home:

schema

CREATE TABLE post_hotness (
    uri TEXT PRIMARY KEY,
    alg TEXT NOT NULL,
    score REAL NOT NULL,
    generated_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
CREATE INDEX post_hotness_score_idx ON post_hotness (alg, score);

formula

timebase = 2 # hours
gravity = 1.85
score = likes / (t + timebase) ** gravity

materializing query (every 5 minutes)

BEGIN;

DELETE FROM post_hotness
WHERE generated_at < NOW() - INTERVAL '30 minutes';

INSERT INTO post_hotness (uri, alg, score)
SELECT
    cp.uri,
    'classic',
    (SELECT COUNT(*) FROM candidate_likes cl WHERE cl.subject_uri = cp.uri AND cl.deleted_at IS NULL) /
        (EXTRACT(EPOCH FROM NOW() - cp.created_at) / (60 * 60) + 2) ^
        1.85
FROM candidate_posts cp
WHERE
    cp.deleted_at IS NULL AND
    cp.created_at >= NOW() - INTERVAL '48 hours';  -- only compute score over last 48 hours

COMMIT;

selection query

SELECT
    cp.*
FROM
    candidate_posts cp
INNER JOIN candidate_actors ca ON cp.actor_did = ca.did
INNER JOIN post_hotness ph
            ON ph.post_uri = cp.uri AND ph.alg = @alg AND
                ph.generated_at = @generated_at
WHERE
      cp.is_hidden = false
  AND ca.status = 'approved'
  AND (COALESCE($1::TEXT[], '{}') = '{}' OR $1::TEXT[] && cp.hashtags)
  AND ($2::BOOLEAN IS NULL OR COALESCE(cp.has_media, false) = $2)
  AND ($3::BOOLEAN IS NULL OR (ARRAY['nsfw', 'mursuit', 'murrsuit'] && cp.hashtags) = $3)
  AND (cp.indexed_at < $4)
  AND cp.deleted_at IS NULL
  AND (ph.hotness < @hotness_cursor)
ORDER BY
    ph.hotness DESC
LIMIT @_limit;

itstolf added a commit that referenced this issue Aug 14, 2023
itstolf added a commit that referenced this issue Aug 14, 2023
itstolf added a commit that referenced this issue Aug 14, 2023
itstolf added a commit that referenced this issue Aug 14, 2023
itstolf added a commit that referenced this issue Aug 14, 2023
itstolf added a commit that referenced this issue Aug 14, 2023
itstolf added a commit that referenced this issue Aug 14, 2023
itstolf added a commit that referenced this issue Aug 14, 2023
itstolf added a commit that referenced this issue Aug 14, 2023
itstolf added a commit that referenced this issue Aug 14, 2023
itstolf added a commit that referenced this issue Aug 14, 2023
itstolf added a commit that referenced this issue Aug 14, 2023
itstolf added a commit that referenced this issue Aug 14, 2023
itstolf added a commit that referenced this issue Aug 14, 2023
itstolf added a commit that referenced this issue Aug 14, 2023
itstolf added a commit that referenced this issue Aug 15, 2023
itstolf added a commit that referenced this issue Aug 15, 2023
strideynet added a commit that referenced this issue Aug 20, 2023
add hotness queries and materializer (#25)
@strideynet
Copy link
Owner Author

Completed by #127

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants