Background Post Hotness generation #25

strideynet · 2023-07-06T08:54:34Z

Right now, we calculate post hotness on the fly which makes cursoring and performance problematic.

We should calculate these every X minutes and persist them to a table we can join across. We should support multiple "algos" and store the last X post history.

Possibly only generate Post Hotness for posts w/in the last X hours (possibly 48/72) to keep the size of this a bit more bound.

strideynet · 2023-08-04T11:06:40Z

SELECT
    cp.*
FROM
    candidate_posts cp
        INNER JOIN candidate_actors ca ON cp.actor_did = ca.did
        INNER JOIN post_hotness ph
                   ON ph.post_uri = cp.uri AND ph.algo = @algo AND
                      ph.generated_at = @generated_at
WHERE
      cp.is_hidden = false
  AND cp.deleted_at IS NULL
  AND ca.status = 'approved'
  AND (@require_tags::TEXT[] = '{}' OR @require_tags::TEXT[] <@ cp.tags)
  AND (@exclude_tags::TEXT[] = '{}' OR NOT (@exclude_tags::TEXT[] && cp.tags))
  AND (ph.hotness < @hotness_cursor)
ORDER BY
    ph.hotness DESC
LIMIT @_limit;

Following discussion with Tolf - we like the idea of having a background generation process that spits scores out to a table that can be joined in.

itstolf · 2023-08-13T16:31:18Z

still writing up some design notes for this, but what do you think about different tables for different algorithms rather than just putting them all in the same table? i think my feeling is that because each algorithm is semantically distinct, it might not make a lot of sense to put them in the same table and have the "hotness" value have a very different meaning per algorithm, but i don't have a super strong opinion either way!

strideynet · 2023-08-13T16:59:00Z

still writing up some design notes for this, but what do you think about different tables for different algorithms rather than just putting them all in the same table? i think my feeling is that because each algorithm is semantically distinct, it might not make a lot of sense to put them in the same table and have the "hotness" value have a very different meaning per algorithm, but i don't have a super strong opinion either way!

I see the argument from a semantic side, as the hotness across different algos won't be comparable, but I do think that splitting the tables for it will be more pain than it's worth. It'll reduce our ability to introduce new algos dynamically in future and general housekeeping tasks will be more complex (e.g the background task that cleans out old post hotness scores).

I'm also unsure how well sqlc and other parts of our toolchain will play with this.

If we wanted to track what went into the hotness score for debugging purposes, we could probably just a JSONB field for this (especially as I doubt we'll ever search by it and it'd mostly be for debugging)

itstolf · 2023-08-13T19:33:37Z

leaving this here for now until it finds a better home:

schema

CREATE TABLE post_hotness (
    uri TEXT PRIMARY KEY,
    alg TEXT NOT NULL,
    score REAL NOT NULL,
    generated_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
CREATE INDEX post_hotness_score_idx ON post_hotness (alg, score);

formula

timebase = 2 # hours
gravity = 1.85
score = likes / (t + timebase) ** gravity

materializing query (every 5 minutes)

BEGIN;

DELETE FROM post_hotness
WHERE generated_at < NOW() - INTERVAL '30 minutes';

INSERT INTO post_hotness (uri, alg, score)
SELECT
    cp.uri,
    'classic',
    (SELECT COUNT(*) FROM candidate_likes cl WHERE cl.subject_uri = cp.uri AND cl.deleted_at IS NULL) /
        (EXTRACT(EPOCH FROM NOW() - cp.created_at) / (60 * 60) + 2) ^
        1.85
FROM candidate_posts cp
WHERE
    cp.deleted_at IS NULL AND
    cp.created_at >= NOW() - INTERVAL '48 hours';  -- only compute score over last 48 hours

COMMIT;

selection query

SELECT
    cp.*
FROM
    candidate_posts cp
INNER JOIN candidate_actors ca ON cp.actor_did = ca.did
INNER JOIN post_hotness ph
            ON ph.post_uri = cp.uri AND ph.alg = @alg AND
                ph.generated_at = @generated_at
WHERE
      cp.is_hidden = false
  AND ca.status = 'approved'
  AND (COALESCE($1::TEXT[], '{}') = '{}' OR $1::TEXT[] && cp.hashtags)
  AND ($2::BOOLEAN IS NULL OR COALESCE(cp.has_media, false) = $2)
  AND ($3::BOOLEAN IS NULL OR (ARRAY['nsfw', 'mursuit', 'murrsuit'] && cp.hashtags) = $3)
  AND (cp.indexed_at < $4)
  AND cp.deleted_at IS NULL
  AND (ph.hotness < @hotness_cursor)
ORDER BY
    ph.hotness DESC
LIMIT @_limit;

also reformatted some sql

add hotness queries and materializer (#25)

strideynet · 2023-08-20T18:40:06Z

Completed by #127

strideynet changed the title ~~Improve cursor for Algorithmic Feeds~~ Improve "what's hot" feed performanceand cursor Jul 30, 2023

strideynet changed the title ~~Improve "what's hot" feed performanceand cursor~~ Improve "what's hot" feed performance and cursor Jul 30, 2023

strideynet added the performance label Jul 30, 2023

strideynet added the backend label Aug 10, 2023

strideynet assigned itstolf Aug 13, 2023

strideynet changed the title ~~Improve "what's hot" feed performance and cursor~~ Background Post Hotness generation Aug 13, 2023

itstolf added a commit that referenced this issue Aug 14, 2023

add hotness queries and scaffolding for materializer (#25)

a45f75d

also reformatted some sql

itstolf added a commit that referenced this issue Aug 14, 2023

add hotness queries and scaffolding for materializer (#25)

8e9c764

also reformatted some sql

itstolf added a commit that referenced this issue Aug 14, 2023

add hotness queries and scaffolding for materializer (#25)

3f5694a

also reformatted some sql

itstolf added a commit that referenced this issue Aug 14, 2023

add hotness queries and scaffolding for materializer (#25)

08b1d7c

also reformatted some sql

itstolf added a commit that referenced this issue Aug 14, 2023

add hotness queries and scaffolding for materializer (#25)

62dc1dc

also reformatted some sql

itstolf added a commit that referenced this issue Aug 14, 2023

add hotness queries and scaffolding for materializer (#25)

40a6f88

also reformatted some sql

itstolf added a commit that referenced this issue Aug 14, 2023

add hotness queries and scaffolding for materializer (#25)

2dd6967

also reformatted some sql

itstolf added a commit that referenced this issue Aug 14, 2023

add hotness queries and periodic materializer (#25)

6437ea8

also reformatted some sql

itstolf added a commit that referenced this issue Aug 14, 2023

add hotness queries and periodic materializer (#25)

d33f5b4

also reformatted some sql

itstolf added a commit that referenced this issue Aug 14, 2023

add hotness queries and periodic materializer (#25)

3185591

also reformatted some sql

itstolf added a commit that referenced this issue Aug 14, 2023

add hotness queries and periodic materializer (#25)

d8cca76

also reformatted some sql

itstolf added a commit that referenced this issue Aug 14, 2023

add hotness queries and periodic materializer (#25)

f00b179

also reformatted some sql

itstolf added a commit that referenced this issue Aug 14, 2023

add hotness queries and periodic materializer (#25)

755ccd6

also reformatted some sql

itstolf added a commit that referenced this issue Aug 14, 2023

add hotness queries and periodic materializer (#25)

dd21faa

also reformatted some sql

itstolf added a commit that referenced this issue Aug 14, 2023

add hotness queries and periodic materializer (#25)

c99b044

also reformatted some sql

itstolf added a commit that referenced this issue Aug 15, 2023

add hotness queries and periodic materializer (#25)

44d85f5

also reformatted some sql

itstolf added a commit that referenced this issue Aug 15, 2023

add hotness queries and periodic materializer (#25)

c79e441

also reformatted some sql

strideynet added a commit that referenced this issue Aug 20, 2023

Merge pull request #127 from strideynet/itstolf/hot-mess

b907833

add hotness queries and materializer (#25)

strideynet closed this as completed Aug 20, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Background Post Hotness generation #25

Background Post Hotness generation #25

strideynet commented Jul 6, 2023 •

edited

Loading

strideynet commented Aug 4, 2023

itstolf commented Aug 13, 2023

strideynet commented Aug 13, 2023

itstolf commented Aug 13, 2023 •

edited

Loading

strideynet commented Aug 20, 2023

Background Post Hotness generation #25

Background Post Hotness generation #25

Comments

strideynet commented Jul 6, 2023 • edited Loading

strideynet commented Aug 4, 2023

itstolf commented Aug 13, 2023

strideynet commented Aug 13, 2023

itstolf commented Aug 13, 2023 • edited Loading

schema

formula

materializing query (every 5 minutes)

selection query

strideynet commented Aug 20, 2023

strideynet commented Jul 6, 2023 •

edited

Loading

itstolf commented Aug 13, 2023 •

edited

Loading