Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adds new cache db and writes a materialized view with score every 10 minutes #1

Merged
merged 11 commits into from
Nov 16, 2024

Conversation

chapel
Copy link
Contributor

@chapel chapel commented Nov 15, 2024

This allows us to have a consistent view every 10 minutes that is very fast to query against even with hundreds of thousands of posts in the table.

Because the cache is written out of band, the slowness of the query to create the view for the cache is largely unnoticed, though you're using the sync libsql API and it might make sense to use the async one to mitigate freezing the node.js process while any long queries happen.

I did my best to port the existing cursor to the new one, I couldn't test this as a feed in the UI so I don't know how it feels but all signs point to it being an overall better experience and easier to maintain.

Long term you could create feeds based just on likes, comments, reposts, or whatever you wanted to track because the queries against the cache db are super cheap.

src/api.ts Outdated Show resolved Hide resolved
src/lib/db.ts Outdated Show resolved Hide resolved
src/lib/feed.ts Outdated Show resolved Hide resolved
Copy link
Owner

@hipstersmoothie hipstersmoothie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks amazing so far. Really improved the sql in a lot of ways.

My one questions is about the timestamp not being in the cache key, they seems important to preserve functionality.

Main requirement: as you scroll you scroll into the results you would see for the feed at the time you pulled it. Only upon refresh does the "new feed" get presented to the user

@chapel
Copy link
Contributor Author

chapel commented Nov 15, 2024

So as I mentioned above about multiple cached views to lock people into them, I can try that if that is something you feel is a must.

Copy link
Owner

@hipstersmoothie hipstersmoothie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you write some comments in the readme that explain the cache structure? I understand now but future me might not

items: Object.fromEntries(items),
cursor: `${startTime}/${(cursor.index || 0) + limit}`,
items: posts,
cursor: `${cursor.time || posts[posts.length - 1]?.dateWritten}/${
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made 2 changes. I noticed that when requesting a feed with a cursor with would return a newer time. Now it should always re-use the cursor and preserve the feed the user is looking at

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the cursor time is not from the stored write times, we give you the closest newest view that is cached. By keeping the time that isn't from the cache we end up doing that lookup each time (this is a minor performance impact) but ultimately they are getting the same view either way.

This is only a case if it is an old cursor before my changes, or if a cached view happened to be removed after a while when viewing (at which point whatever they were looking at before would have changed under their feet anyways if that makes sense).

It is a minor thing overall as either way they should get the same view of the feed so long as the cursor time isn't changing between pulls (other than in the case we set the cursor time to the one tied to the view they are getting).

I had the code this way eventually, but it seemed more correct to give back the actually viewed written date which will absolutely guarantee to view the same view.

trendingLinksHourly({ limit: 30 }),
]);
},
cronTime: "0/10 * * * *",
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also upped this to every 10 minutes instead of every hour

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make sure you have a large enough disk on your server. Just from running it a few days off and on for testing, with less than 24 cache view writes my cache.db is like is 1.6GB. You know if we cut off the low score links, hell maybe only the 0 score links, that would likely cut off a ton of items from the cache. I'll test that out.

@hipstersmoothie hipstersmoothie merged commit 1503e18 into hipstersmoothie:main Nov 16, 2024
@chapel
Copy link
Contributor Author

chapel commented Nov 16, 2024

Sorry I didn't get to writing anything in the readme yet. I've been busy with work and was planning on doing that after. I can open a new PR with that as I think it is worthwhile putting it somewhere so any context about it isn't lost.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants