Background Post Hotness generation #25
SELECT
cp.*
FROM
candidate_posts cp
INNER JOIN candidate_actors ca ON cp.actor_did = ca.did
INNER JOIN post_hotness ph
ON ph.post_uri = cp.uri AND ph.algo = @algo AND
ph.generated_at = @generated_at
WHERE
cp.is_hidden = false
AND cp.deleted_at IS NULL
AND ca.status = 'approved'
AND (@require_tags::TEXT[] = '{}' OR @require_tags::TEXT[] <@ cp.tags)
AND (@exclude_tags::TEXT[] = '{}' OR NOT (@exclude_tags::TEXT[] && cp.tags))
AND (ph.hotness < @hotness_cursor)
ORDER BY
ph.hotness DESC
LIMIT @_limit;

Following discussion with Tolf - we like the idea of having a background generation process that spits scores out to a table that can be joined in.
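The `ph.hotness < @hotness_cursor` predicate in the query above is standard keyset (cursor) pagination. A minimal Python sketch of the same logic over an in-memory score list, purely for illustration (none of these names exist in the codebase):

```python
# Keyset ("cursor") pagination over precomputed hotness scores, mirroring
# `WHERE ph.hotness < @hotness_cursor ORDER BY ph.hotness DESC LIMIT n`.
# Assumes scores are unique; real code would need a tie-breaker column.

def page(rows, cursor, limit):
    """rows: list of (uri, hotness); cursor: last hotness seen, or None for page 1."""
    eligible = [r for r in rows if cursor is None or r[1] < cursor]
    eligible.sort(key=lambda r: r[1], reverse=True)
    return eligible[:limit]

rows = [("at://a", 9.0), ("at://b", 7.5), ("at://c", 5.0), ("at://d", 2.0)]
first = page(rows, None, 2)
second = page(rows, first[-1][1], 2)  # resume strictly below the last score seen
```

Because the cursor is a value from the previous page rather than an `OFFSET`, pages stay stable as new rows are inserted above the cursor, which is the point of persisting scores per `generated_at` snapshot.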
still writing up some design notes for this, but what do you think about different tables for different algorithms rather than putting them all in the same table? my feeling is that because each algorithm is semantically distinct, it might not make sense to put them in the same table and have the "hotness" value mean something very different per algorithm, but i don't have a super strong opinion either way!
I see the argument from the semantic side, as hotness across different algos won't be comparable, but I do think that splitting the tables will be more pain than it's worth. It'll reduce our ability to introduce new algos dynamically in future, and general housekeeping tasks will be more complex (e.g. the background task that cleans out old post hotness scores). I'm also unsure how well that would work in practice. If we wanted to track what went into a hotness score for debugging purposes, we could probably just use a JSONB field for this (especially as I doubt we'll ever search by it and it'd mostly be for debugging).
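If we go the JSONB route, the debug payload could just be whatever the scorer had on hand at generation time; a hypothetical sketch (the column and field names here are illustrative, not part of the schema below):

```python
import json

# Hypothetical payload for a `details JSONB` debugging column.
# JSONB only needs valid JSON text on the way in; Postgres parses it on insert.
details = {"likes": 42, "age_hours": 3.5, "timebase": 2, "gravity": 1.85}
payload = json.dumps(details, sort_keys=True)

# On the way out (e.g. in a debug endpoint), it round-trips back to a dict.
restored = json.loads(payload)
```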
leaving this here for now until it finds a better home:

schema:

CREATE TABLE post_hotness (
  uri TEXT NOT NULL,
  alg TEXT NOT NULL,
  score REAL NOT NULL,
  generated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
  PRIMARY KEY (uri, alg, generated_at)
);
CREATE INDEX post_hotness_score_idx ON post_hotness (alg, score);

formula:

timebase = 2  # hours
gravity = 1.85
score = likes / (t + timebase) ** gravity

materializing query (every 5 minutes):

BEGIN;
DELETE FROM post_hotness
WHERE generated_at < NOW() - INTERVAL '30 minutes';
INSERT INTO post_hotness (uri, alg, score)
SELECT
cp.uri,
'classic',
(SELECT COUNT(*) FROM candidate_likes cl WHERE cl.subject_uri = cp.uri AND cl.deleted_at IS NULL) /
(EXTRACT(EPOCH FROM NOW() - cp.created_at) / (60 * 60) + 2) ^
1.85
FROM candidate_posts cp
WHERE
cp.deleted_at IS NULL AND
cp.created_at >= NOW() - INTERVAL '48 hours'; -- only compute score over last 48 hours
COMMIT;

selection query:

SELECT
cp.*
FROM
candidate_posts cp
INNER JOIN candidate_actors ca ON cp.actor_did = ca.did
INNER JOIN post_hotness ph
ON ph.uri = cp.uri AND ph.alg = @alg AND
ph.generated_at = @generated_at
WHERE
cp.is_hidden = false
AND ca.status = 'approved'
AND (COALESCE($1::TEXT[], '{}') = '{}' OR $1::TEXT[] && cp.hashtags)
AND ($2::BOOLEAN IS NULL OR COALESCE(cp.has_media, false) = $2)
AND ($3::BOOLEAN IS NULL OR (ARRAY['nsfw', 'mursuit', 'murrsuit'] && cp.hashtags) = $3)
AND (cp.indexed_at < $4)
AND cp.deleted_at IS NULL
AND (ph.score < @hotness_cursor)
ORDER BY
ph.score DESC
LIMIT @_limit;
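The formula above in runnable form, matching the SQL expression in the materializing query (`t` is post age in hours, i.e. the `EXTRACT(EPOCH FROM NOW() - cp.created_at) / (60 * 60)` term); a sketch, not the implementation:

```python
# Classic gravity-decay hotness: likes / (age_hours + timebase) ** gravity.
# Mirrors the materializing query's arithmetic so scores can be sanity-checked.

TIMEBASE = 2     # hours; dampens the score of brand-new posts
GRAVITY = 1.85   # higher gravity = faster decay with age

def hotness(likes: int, age_hours: float) -> float:
    return likes / (age_hours + TIMEBASE) ** GRAVITY

fresh = hotness(10, 0.0)    # 10 likes, just posted
stale = hotness(10, 24.0)   # same likes, a day old
```

At equal like counts a newer post always outranks an older one, and the `timebase` offset keeps a single early like from dominating the feed.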
add hotness queries and materializer (#25)
Completed by #127
Right now, we calculate post hotness on the fly, which makes cursoring and performance problematic.
We should calculate these every X minutes and persist them to a table we can join across. We should support multiple "algos" and store the last X generations of post history.
Possibly only generate Post Hotness for posts w/in the last X hours (possibly 48/72) to keep the size of this table more bounded.
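The cycle described above (recompute every X minutes, keep a short history of generations, bound the candidate window) can be sketched against an in-memory store; all names and constants here are hypothetical stand-ins for the eventual SQL:

```python
from datetime import datetime, timedelta

RETENTION = timedelta(minutes=30)  # how much score history to keep (the DELETE)
WINDOW = timedelta(hours=48)       # only score recent posts (the WHERE on created_at)

def score(likes, age_h, timebase=2, gravity=1.85):
    return likes / (age_h + timebase) ** gravity

def materialize(store, posts, now):
    """store: list of (uri, alg, score, generated_at) rows.
    Prune stale generations, then append a fresh generation for recent posts."""
    store[:] = [r for r in store if now - r[3] < RETENTION]
    for uri, likes, created_at in posts:
        if now - created_at <= WINDOW:
            age_h = (now - created_at).total_seconds() / 3600
            store.append((uri, "classic", score(likes, age_h), now))

store = []
t0 = datetime(2024, 1, 1, 12, 0)
posts = [
    ("at://a", 10, t0 - timedelta(hours=1)),    # recent: gets scored
    ("at://old", 99, t0 - timedelta(hours=72)), # outside the 48h window: skipped
]
materialize(store, posts, t0)
materialize(store, posts, t0 + timedelta(minutes=35))  # first generation pruned
```

Readers select against a single `generated_at` snapshot, so a generation must stay in the table for at least one full refresh interval after it stops being the newest, which is why the retention window (30 minutes) is longer than the refresh cadence (5 minutes).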