-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adds new cache db and writes a materialized view with score every 10 minutes #1
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks amazing so far. Really improved the sql in a lot of ways.
My one questions is about the timestamp not being in the cache key, they seems important to preserve functionality.
Main requirement: as you scroll you scroll into the results you would see for the feed at the time you pulled it. Only upon refresh does the "new feed" get presented to the user
So as I mentioned above about multiple cached views to lock people into them, I can try that if that is something you feel is a must. |
for up to 24 hours consistent feed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you write some comments in the readme that explain the cache structure? I understand now but future me might not
items: Object.fromEntries(items), | ||
cursor: `${startTime}/${(cursor.index || 0) + limit}`, | ||
items: posts, | ||
cursor: `${cursor.time || posts[posts.length - 1]?.dateWritten}/${ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I made 2 changes. I noticed that when requesting a feed with a cursor with would return a newer time. Now it should always re-use the cursor and preserve the feed the user is looking at
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the cursor time is not from the stored write times, we give you the closest newest view that is cached. By keeping the time that isn't from the cache we end up doing that lookup each time (this is a minor performance impact) but ultimately they are getting the same view either way.
This is only a case if it is an old cursor before my changes, or if a cached view happened to be removed after a while when viewing (at which point whatever they were looking at before would have changed under their feet anyways if that makes sense).
It is a minor thing overall as either way they should get the same view of the feed so long as the cursor time isn't changing between pulls (other than in the case we set the cursor time to the one tied to the view they are getting).
I had the code this way eventually, but it seemed more correct to give back the actually viewed written date which will absolutely guarantee to view the same view.
trendingLinksHourly({ limit: 30 }), | ||
]); | ||
}, | ||
cronTime: "0/10 * * * *", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also upped this to every 10 minutes instead of every hour
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make sure you have a large enough disk on your server. Just from running it a few days off and on for testing, with less than 24 cache view writes my cache.db
is like is 1.6GB. You know if we cut off the low score links, hell maybe only the 0 score links, that would likely cut off a ton of items from the cache. I'll test that out.
Sorry I didn't get to writing anything in the readme yet. I've been busy with work and was planning on doing that after. I can open a new PR with that as I think it is worthwhile putting it somewhere so any context about it isn't lost. |
This allows us to have a consistent view every 10 minutes that is very fast to query against even with hundreds of thousands of posts in the table.
Because the cache is written out of band, the slowness of the query to create the view for the cache is largely unnoticed, though you're using the sync libsql API and it might make sense to use the async one to mitigate freezing the node.js process while any long queries happen.
I did my best to port the existing cursor to the new one, I couldn't test this as a feed in the UI so I don't know how it feels but all signs point to it being an overall better experience and easier to maintain.
Long term you could create feeds based just on likes, comments, reposts, or whatever you wanted to track because the queries against the cache db are super cheap.