Bluesky = fediverse: can't yet handle load from influx of new Brazilian users #1295

Closed
eusousu opened this issue Aug 31, 2024 · 9 comments

Comments

@eusousu

eusousu commented Aug 31, 2024

I told my friends who are migrating to Bluesky to follow https://bsky.app/profile/ap.brid.gy so I could follow them from Mastodon. But they never received the DM with their Mastodon handle, and trying to retrieve them directly from https://fed.brid.gy/[user] returns a 404.

Is it overloaded? Is it broken? Are we doing something wrong?

I followed from Mastodon, and it created my bsky handle in less than a minute. My posts are being bridged perfectly.

@Huio-op

Huio-op commented Aug 31, 2024

+1
Also having the same issue, my friends and I all followed the @ap.brid.gy account and received no messages.

@snarfed
Owner

snarfed commented Aug 31, 2024

Hey all, sorry for the trouble! Bridgy Fed's Bluesky => fediverse direction is indeed overloaded and backed up right now due to the huge wave of new users and usage (10x normal!) from Brazil. It's working through the backlog, but it's definitely behind.

I've been optimizing a few things, and I have at least one or two more things I can do that won't take a bigger redesign, but I won't have much time this weekend to work on it. We'll see. Fingers crossed!

https://bsky.app/profile/pfrazee.com/post/3l2xupwbsfy2f
https://bsky.app/profile/bnewbold.net/post/3l2yppyjo6p2k

@eusousu
Author

eusousu commented Aug 31, 2024

Ohhh, that's really a lot! Would it be possible to show an estimated processing time somewhere on the main site?

People on Bluesky are usually my least tech-savvy friends, and asking them to follow an account so that I can follow them is already hard to explain. When the process doesn't work as expected, the window of opportunity can close and they give up on the whole idea as broken (that's why I couldn't convince them to create Mastodon accounts again after the failed 2022 exodus).

@snarfed
Owner

snarfed commented Aug 31, 2024

Totally understood and agreed! And monitoring for our backlog is a good idea, at least internal if not public too. I've filed #1296 for that.

@qazmlp

qazmlp commented Aug 31, 2024

People on Bluesky are usually my least tech-savvy friends, and asking them to follow an account so that I can follow them is already hard to explain. When the process doesn't work as expected, the window of opportunity can close and they give up on the whole idea as broken (that's why I couldn't convince them to create Mastodon accounts again after the failed 2022 exodus).

On that note, it would be great if there was a DM to inform users who get caught in the spam filter of the requirements, too.
Right now it just looks like it fails silently if the account isn't old enough, if I'm not mistaken.

(I'd file this as proper feature request, but I'm a bit 🫠 today. Maybe tomorrow.)

snarfed changed the title from "Is the bridge from bluesky not working?" to "Bluesky = fediverse: can't yet handle load from influx of new Brazilian users" on Sep 1, 2024
@snarfed
Owner

snarfed commented Sep 1, 2024

So, overall firehose event rate is up something like 10x, from 100-200/s to 1-1.5k/s. I've gotten our event handling rate up a bit, but not yet enough to keep up, much less work through the backlog. So far I've tried:

  • moving all of our remaining logic (there wasn't a lot) off the main firehose client thread
  • switching CBOR parsing lib from hashberg dag-cbor to libipld (helps, but has a bug that makes us miss commits for our bridged users, which I haven't figured out yet)
  • even parallelized parsing itself

Still haven't managed to get higher than mid 100s of events/s, nor have I gotten CPU utilization above 1.5-2 cores, which implies that the websocket client itself is the bottleneck.
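As a rough illustration of the first and third items in the list above (keeping the firehose reader thread free and handing frames to workers), here is a minimal Python sketch. It is not Bridgy Fed's actual code: `run_pipeline`, `frames`, `handle`, `num_workers`, and `max_queued` are hypothetical names, and any iterable stands in for the websocket subscription.

```python
import queue
import threading


def run_pipeline(frames, handle, num_workers=4, max_queued=1000):
    """Read frames on the calling thread and process them on worker threads.

    frames: any iterable of raw frames (a stand-in for the firehose client).
    handle: callable applied to each frame (a stand-in for parse + handling).
    Returns the handler results in completion order, not arrival order.
    """
    pending = queue.Queue(maxsize=max_queued)  # bounded, so memory stays capped
    results = []
    results_lock = threading.Lock()

    def worker():
        while True:
            frame = pending.get()
            if frame is None:  # sentinel: no more work for this worker
                return
            out = handle(frame)
            with results_lock:
                results.append(out)

    workers = [threading.Thread(target=worker) for _ in range(num_workers)]
    for w in workers:
        w.start()

    # The reader loop only enqueues, so it never waits on parsing or handling
    # unless the queue is full.
    for frame in frames:
        pending.put(frame)

    for _ in workers:
        pending.put(None)  # one sentinel per worker
    for w in workers:
        w.join()
    return results


if __name__ == "__main__":
    doubled = run_pipeline(range(10), lambda n: n * 2)
    print(sorted(doubled))
```

One caveat on this design: in CPython, threads only run in parallel while the handler releases the GIL (I/O, or C extensions that release it). For CPU-bound pure-Python parsing, a process pool would be the usual substitute.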

A bit more background in snarfed/arroba#39, but only somewhat relevant.

https://bskycharts.edavis.dev/static/dynazoom.html?plugin_name=edavis.dev%2Fbskycharts.edavis.dev%2Fbsky&start_iso8601=2024-08-25T12%3A00%3A03-0700&stop_iso8601=2024-09-01T16%3A15%3A39-0700&start_epoch=1724612403&stop_epoch=1725232539&lower_limit=&upper_limit=&size_x=800&size_y=400&cgiurl_graph=%2Fmunin-cgi%2Fmunin-cgi-graph

(doesn't include a number of event types, eg #account, #identity, #tombstone, maybe deletes, etc)


@therealNAAN

I'm glad someone noticed and opened this issue, because I wondered if it was my internet or if something had happened to this service. I don't envy the tough work needed, but I'm hoping you're not working too hard! After all, breaks are important D: Especially with this massive scaling, jeez... Thank you for all your hard work, snarfed!

@snarfed
Owner

snarfed commented Sep 7, 2024

Finally managed to optimize our firehose processing from ~200 events/s up to 3-4k/s. Bluesky => fediverse bridging backlog is down from its high of 3d to just 18h now, should be caught up and back to realtime in another 5h or so.
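A back-of-the-envelope check on that kind of estimate, as a sketch only: it assumes the backlog is measured in hours of firehose events at the current incoming rate, and that both rates stay constant. The function name and parameters are illustrative, and the 1k/s incoming rate is taken from the low end of the 1-1.5k/s range mentioned earlier in the thread.

```python
def hours_to_catch_up(backlog_hours, incoming_per_s, processing_per_s):
    """Estimate hours until a backlog drains while new events keep arriving.

    Assumes the backlog holds backlog_hours worth of events at incoming_per_s,
    and that the spare capacity (processing - incoming) drains it.
    """
    spare = processing_per_s - incoming_per_s
    if spare <= 0:
        return float("inf")  # never catches up at these rates
    return backlog_hours * incoming_per_s / spare


if __name__ == "__main__":
    print(hours_to_catch_up(18, 1_000, 4_000))  # 6.0
```

With an 18h backlog, 1k/s incoming, and 4k/s processing, this gives 6 hours, which is in the same ballpark as the figure quoted above.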


snarfed added a commit that referenced this issue Sep 7, 2024
snarfed added a commit that referenced this issue Sep 7, 2024
we're now caught up back to realtime on the ATProto firehose

for #1295
snarfed added a commit that referenced this issue Sep 7, 2024
@snarfed
Owner

snarfed commented Sep 8, 2024

Done! Back to realtime.
