Add Bluesky support #1453

Closed
snarfed opened this issue Apr 11, 2023 · 50 comments
@snarfed
Owner

snarfed commented Apr 11, 2023

Both backfeed and publish. I don't plan to do this myself, I'm working on adding Bluesky to Bridgy Fed instead, but I'd be happy to review and merge PRs here.

Granary already has data conversion support, but not API support. It would be straightforward to add using lexrpc, but we probably want to wait until they add OAuth (bluesky-social/atproto#649); I don't want to ask for passwords.

@snarfed
Owner Author

snarfed commented Apr 16, 2023

Interesting, they're adding app passwords as a stopgap until they implement full OAuth. https://staging.bsky.app/profile/pfrazee.com/post/3jtivygsgk72w cc @capjamesg @aaronpk

@snarfed
Owner Author

snarfed commented May 18, 2023

granary now has much of the API support it would need, and Bridgy Fed support will still take some work, not to mention Bluesky's federation isn't on yet anyway, so maybe we should do this after all.

@cleverdevil

I would be a user of this and would also be happy to test.

@vikanezrimaya

Will be glad to test this! I think I would actually prefer to implement something internally for myself, but I've been a bridgy user for a long time for the birdsite integration, and now that birdsite is a hellhole, why not try it for Bluesky, if such an opportunity presents itself?

@snarfed
Owner Author

snarfed commented Jun 29, 2023

Generic instructions on adding a new silo: https://bridgy.readthedocs.io/#adding-a-new-silo

@JoelOtter
Contributor

Thinking of picking this up - does Bridgy have any existing flow for accepting (app) passwords or would you rather wait until it actually has OAuth?

@snarfed
Owner Author

snarfed commented Jul 15, 2023

@JoelOtter hey, awesome, that'd be great! App passwords are fine, OAuth isn't in Bluesky team's near term plans.

Bridgy uses https://github.com/snarfed/oauth-dropins (https://oauth-dropins.appspot.com/ ) for auth, so the first step is probably to add it there. You could start with the code in https://github.com/snarfed/bluesky-atom/blob/main/app.py and HTML form in https://github.com/snarfed/bluesky-atom/blob/main/templates/index.html , extract them out, and use them to add Bluesky to oauth-dropins. It wouldn't need the OAuth handlers, but it would need a BaseAuth subclass, Bridgy needs that.

Alternatively we could just do all of that in Bridgy itself, directly, and ignore oauth-dropins. Right now all silo auth in Bridgy goes through oauth-dropins handlers, but that might not be strictly necessary.

I don't know which path ^ I prefer yet. Feel free to poke at the code and see what you think.

@snarfed snarfed closed this as completed Jul 15, 2023
@snarfed snarfed reopened this Jul 15, 2023
@JoelOtter
Contributor

Thanks! I'll have a look at the dropins. From my reading of the code this evening it does seem like unpicking the auth handler stuff would be rather painful so I think adding it to oauth-dropins makes sense, as long as you don't object to that repo having an oauth dropin that isn't, y'know, oauth

@snarfed
Owner Author

snarfed commented Jul 15, 2023

Hah, understood, and yeah adding Bluesky to oauth-dropins is totally ok with me. Even ignoring granary and Bridgy itself entirely, that would be a great first step, and largely necessary.

@JoelOtter
Contributor

Spent a few hours on this today. I've added a pretty simple auth handler to oauth-dropins, and the associated bits on Bridgy. I think now I need to add some more stuff to Granary - at minimum, it's complaining that user_to_actor isn't implemented for Bluesky, and I suspect there'll be a few other things too.

Couple of notes:

  • I'm getting dependency resolution errors for google-api-core and google-cloud-core on latest main, not sure what's up with that but I had to downgrade one of them to make it work locally.
  • There are some errors happening due to (I think) version mismatches between the Pip-published modules and the ones on Git. I had to clone and link against a local lexrpc to avoid a __init__() got an unexpected keyword argument 'headers' error in Granary when it creates the Bluesky client
  • Is there a lint script or anything? I'm producing some very ugly Python

I'll keep chipping away at this. Having fun! But brain a bit melted now.

@snarfed
Owner Author

snarfed commented Jul 16, 2023

Awesome progress! Thanks for the details, and for sticking with it through those stumbling blocks.

In general you'll want to pip install -r requirements.txt and use a separate virtualenv for each project. You're right, head of each project may depend on git head of a few dependencies like lexrpc, I try to resolve those when I release, but otherwise those packages use git specifiers in requirements.txt.

I run flake8 occasionally, but it's not yet integrated into CI or anything. Feel free to use a linter locally for your code as long as it matches the rest of the project's style! And apologies for the two space indents, these projects all started long ago when I was young and foolish. 🤷‍♂️

@snarfed
Owner Author

snarfed commented Jul 16, 2023

@JoelOtter also for projects like this where you're making interdependent changes to oauth-dropins, granary, and Bridgy, I recommend installing them with -e in each other, so that your local granary uses your local oauth-dropins, and so on. Search for "at the same time" here and here.

@JoelOtter
Contributor

> @JoelOtter also for projects like this where you're making interdependent changes to oauth-dropins, granary, and Bridgy, I recommend installing them with -e in each other, so that your local granary uses your local oauth-dropins, and so on. Search for "at the same time" here and here.

Yep I've done so, the docs are great :)

@JoelOtter
Contributor

Something I've run into, not sure if it's a Bridgy thing or a Granary thing - shares/reposts on Bluesky don't appear to have an ID at all like they do on Twitter/Mastodon, which means the handling in tasks.py L156 fails as there are as2 activities without IDs coming back from Granary. What should we do with these - simply exclude them? I presume that means we won't be able to backfeed is_repost_of stuff for Bluesky.

@snarfed
Owner Author

snarfed commented Jul 17, 2023

Huh! At minimum they'll have a CID at the ATP level. Is that not exposed at the Bluesky lexicon level?

If not, we can always synthesize an id, eg [post-id]#reposted-by-[DID], but we should make sure there really is nothing available first.
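A minimal sketch of the synthesized id format suggested above. The function name and the sample values are hypothetical; only the `[post-id]#reposted-by-[DID]` pattern comes from the comment.

```python
def synthesize_repost_id(post_id: str, reposter_did: str) -> str:
    """Build a stand-in id for a repost that exposes no id of its own.

    Hypothetical helper: combines the id of the reposted post with the
    DID of the account that reposted it.
    """
    return f"{post_id}#reposted-by-{reposter_did}"


# Illustrative values only, not real records.
example_id = synthesize_repost_id("3kexamplepost", "did:plc:examplereposter")
```

Because the id is derived from the post and the reposter, the same repost would map to the same id on every poll, which is what deduplication needs.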

@JoelOtter
Contributor

Managed to get the repost URL out of the viewer block :)

@JoelOtter
Contributor

Had a bit of a breakthrough this morning! Still lots of things not handled but likes and reposts are coming through now.

[Screenshot 2023-07-20 at 11 16 40]

@snarfed
Owner Author

snarfed commented Jul 20, 2023

Whoa, that's awesome, so cool! Congrats!

@JoelOtter
Contributor

JoelOtter commented Jul 20, 2023

Here's an interesting note: Bluesky doesn't count quote posts as reposts, and there's no way to see who quote-posted something (as far as I can tell?). A quote post itself is just a regular post with an embed, like a link would be. It's an interesting implementation and I kind of like it.

EDIT: Looks like we might be able to get them from notifications though I don't super love that approach.

@snarfed
Owner Author

snarfed commented Jul 20, 2023

Interesting indeed! The UX-level semantics don't matter too much to us here, but the choices of what to backfeed, and how we fetch that, obviously do.

Bridgy/granary do have precedent for getting some backfed data from notifications, I think we currently do it for GitHub, at least, if not others.

Regardless though, feel free to ship the first version of this without quotes if you want!

@JoelOtter
Contributor

Yeah I think first version will just be backfeeding of likes, reposts and replies. Publishing after that, then quotes later probably.

@JoelOtter
Contributor

Would you prefer individual GH issues be created for each of those or shall we just use this issue to track the whole thing?

@snarfed
Owner Author

snarfed commented Jul 20, 2023

Up to you! I'm not too particular.

@JoelOtter
Contributor

Replies working now :) I'm struggling to test the actual webmention sending. It seems to discover the syndication links fine but doesn't send anything - is that a separate job on the background worker or should it be sending as part of the regular poll?

@snarfed
Owner Author

snarfed commented Jul 21, 2023

Woo, congrats! Yeah those are tasks run by Cloud Tasks. Locally, you can trigger them manually: https://bridgy.readthedocs.io/#development , search for "To test a poll or propagate task".

@JoelOtter
Contributor

Yeah turned out I needed to adjust the URL canonicaliser for Bluesky - Bluesky doesn't respond to HEAD requests, interestingly, so needed to disable the redirect testing.

Current state of affairs: it sends webmentions! I'm sending them to webmention.io which rejects them because it can't hit localhost, obviously. However I think more work needs done because the source URL looks like this:

http://localhost:8080/comment/bluesky/joelotter.com/at:/did:plc:ioz4ztghfznx4s5s4jxqiqun/app.bsky.feed.post/3k2xwstch532a/at:/did:plc:ioz4ztghfznx4s5s4jxqiqun/app.bsky.feed.post/3k2zcw7pksb2z

I think I need to do something here, either use CIDs instead of ATP URIs or just URL-encode them. Is there additional stuff I need to add to the Bluesky source in order for the /comment/bluesky/* endpoint to work, or should it Just Work™️ if I have the URLs formatted correctly?
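As a rough illustration of the URL-encoding option, the sketch below percent-encodes an at:// URI so that it fits in a single path segment. It uses only the standard library; the helper name is hypothetical and this is not Bridgy's actual code.

```python
from urllib.parse import quote, unquote

AT_URI = "at://did:plc:ioz4ztghfznx4s5s4jxqiqun/app.bsky.feed.post/3k2xwstch532a"


def encode_path_segment(value: str) -> str:
    """Percent-encode a value so it contains no raw '/' or ':'.

    Hypothetical helper. Passing safe="" tells quote() to escape '/'
    as well, which it leaves alone by default.
    """
    return quote(value, safe="")


encoded = encode_path_segment(AT_URI)
# The encoded form has no slashes, so it cannot be split or collapsed
# by path handling, and decoding restores the original URI.
assert "/" not in encoded
assert unquote(encoded) == AT_URI
```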

@JoelOtter
Contributor

Did some digging into this and I think there are a few interlinking issues with my current implementation:

  • Bridgy doesn't send the URL with encoding so the second slash after at:// gets removed
  • I can change the path handling in Flask to use e.g. /<path:post_id>, which makes it handle slashes, whether URL-escaped or not. Unfortunately it doesn't work if there are two in the path, e.g. for comment, regardless of whether they are URL-encoded or not!
  • It seems like the post activities are getting stored as AS objects, not as activities, so it breaks on loading

However, if I hack some stuff in to fix the above, it does actually at least fetch posts properly via the /post endpoint, so that's good! On the other hand I am losing my entire mind at this point and think I need a break. 😅 If it's OK with you I might push what I have up to a draft PR? Then hopefully you could spot any places where I'm doing something obviously stupid.

@snarfed
Owner Author

snarfed commented Jul 22, 2023

Huge progress! Congrats!

Yeah "post id" in granary/Bridgy is generally just an id, not a full URI, so I'd recommend switching to just CID, as you mentioned. Hopefully that should fix those URLs. Not sure about activities vs objects getting stored, but I'm happy to look at a draft PR any time.

Congrats again, seems like you're almost there!

@JoelOtter
Contributor

Have done a bit of research and I'm now unsure/confused as to whether using CIDs is the right move - CIDs are hashes of the content, so are not fixed for a given post and could feasibly change if, for example, Bluesky adds editing support. There's also no way to query the API by CID; it's essentially a content-integrity thing, I think. URIs are the only real supported thing.

What I could do is split URIs up into DIDs and Record Keys. This could then be reformatted into a URI, as we have enough information with those two bits. I'm unsure how invasive a change into Bridgy/Granary this would be though, might need to add lots of silo-specific logic which might mean extending the Source class.
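The split-and-rebuild idea could look roughly like this. It is a sketch that assumes every URI has the three-part shape seen earlier in this thread (DID, collection, record key); the class and function names are hypothetical.

```python
from typing import NamedTuple


class AtUri(NamedTuple):
    did: str
    collection: str
    rkey: str


def parse_at_uri(uri: str) -> AtUri:
    """Split an at:// URI of the form at://DID/COLLECTION/RKEY."""
    prefix = "at://"
    if not uri.startswith(prefix):
        raise ValueError(f"not an at:// URI: {uri!r}")
    parts = uri[len(prefix):].split("/")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected DID/collection/rkey in {uri!r}")
    return AtUri(*parts)


def build_at_uri(did: str, rkey: str, collection: str = "app.bsky.feed.post") -> str:
    """Rebuild the URI from a DID and record key (posts by default)."""
    return f"at://{did}/{collection}/{rkey}"


parsed = parse_at_uri(
    "at://did:plc:ioz4ztghfznx4s5s4jxqiqun/app.bsky.feed.post/3k2xwstch532a")
```

Round-tripping through `parse_at_uri` and `build_at_uri` returns the original string, so the DID and record key together carry enough information, provided the collection is known.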

Are IDs in Bridgy's DB unique per silo, per user, or globally?

@snarfed
Owner Author

snarfed commented Jul 22, 2023

Yes! Good point about CIDs. Rkeys are a great idea, those should work fine, they'll have a slash in them that would need to be URL-encoded, but that's fine. Actually even just the TID would be fine, which doesn't have the slash.

Post ids are expected to be unique per silo, not globally.

@snarfed
Owner Author

snarfed commented Jul 22, 2023

Federation is another wrinkle here. TIDs have to be unique per PDS, but not across them. I don't think we've thought through how Bridgy would handle federated PDSes, right? Maybe we punt on those for now and say use Bridgy Fed for them instead?

@JoelOtter
Contributor

The issue with Rkeys/TIDs is that they're unique per repository, which is to say, per user. Might need some more thought. Federated PDSes I don't know - my impression was that the AT protocol somehow would map a DID back to where that repo actually lives but I can't think how that would actually work. Prob best to park it for now as you say! I have a hunch that the pace of movement towards actual federation in Bluesky will be glacial.

@snarfed
Owner Author

snarfed commented Jul 22, 2023

Hmm true. Unique per repo is fine for Bridgy's usage in URLs, since those are all per user, but not as datastore keys. I can look at that a bit closer later today.

Otherwise yeah agreed on Bluesky federation in general. The DID doc itself for a given user points to that user's PDS host. If/when we want Bridgy classic to handle all that, we can, but definitely not necessary now, hard coding bsky.app is totally fine.
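For illustration, a sketch of reading the PDS host out of a DID document that has already been fetched. The service id `#atproto_pds`, the field names, and the sample document are assumptions about the DID document layout, not something stated in this thread.

```python
from typing import Optional

# Made-up sample; the shape is an assumption, the values are placeholders.
SAMPLE_DID_DOC = {
    "id": "did:plc:example",
    "service": [
        {
            "id": "#atproto_pds",
            "type": "AtprotoPersonalDataServer",
            "serviceEndpoint": "https://pds.example.com",
        }
    ],
}


def pds_endpoint(did_doc: dict) -> Optional[str]:
    """Return the PDS service endpoint from a DID document, if present.

    Hypothetical helper: looks for a service entry whose id ends with
    '#atproto_pds' and returns its serviceEndpoint.
    """
    for service in did_doc.get("service", []):
        if service.get("id", "").endswith("#atproto_pds"):
            return service.get("serviceEndpoint")
    return None
```

Returning `None` when no matching service exists lets a caller fall back to a hard-coded host, as suggested above.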

@JoelOtter
Contributor

JoelOtter commented Jul 22, 2023 via email

@snarfed
Owner Author

snarfed commented Jul 22, 2023

Hmm! It's hard to change post ids after launch, since they're stored in Response keys in the datastore. I'm fine with at:// URIs though! Let's maybe see if we can URL-encode them in the existing handler paths first? If that somehow still doesn't work, I'm open to adding query param support.

@snarfed
Owner Author

snarfed commented Jul 22, 2023

Oh and again this is awesome progress. Thank you so much! Tons of people threaten to contribute, very few actually do, and even fewer manage to do anything this substantial.

@JoelOtter
Contributor

Managed to get them encoded for the regular paths, they need double-encoded because werkzeug (I think?) decodes them once, but the slashes still break Flask's path matching somehow. I've added a little overrideable method to the Bridgy Source class which formats stuff for URLs and is a no-op for everything else.
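A small sketch of what double-encoding means here, using only the standard library. The helper name is hypothetical, and `unquote()` stands in for the single decode the framework is described as performing; this is not the method added to Bridgy's Source class.

```python
from urllib.parse import quote, unquote


def double_encode(value: str) -> str:
    """Percent-encode twice, so one decode still leaves an encoded value."""
    once = quote(value, safe="")   # '/' becomes %2F, ':' becomes %3A
    return quote(once, safe="")    # '%' becomes %25, so %2F becomes %252F


uri = "at://did:plc:ioz4ztghfznx4s5s4jxqiqun/app.bsky.feed.post/3k2xwstch532a"
twice = double_encode(uri)

# One decode yields the singly encoded form, with no raw slashes.
after_one_decode = unquote(twice)
assert "/" not in after_one_decode
# A second decode recovers the original URI.
assert unquote(after_one_decode) == uri
```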

Here's what I have working, as best I can tell:

  • Logging in, linking to Bluesky account
  • Getting likes, reposts and replies
  • Detecting which of those need backfed to the website
  • Sending the webmentions for them
  • Responding to /comment, /like endpoints correctly with the formatting produced for the webmention.

I can't test 100% if the webmentions actually work, because webmention.io can't reach localhost, but the URL sent is correct and does resolve locally, e.g. in here: https://webmention.io/www.joelotter.com/webmention/LFiNzptZ9IUaUt6Hw1a-

There are bound to be edge cases that aren't handled properly - for example I really haven't dug into the reply-to handling logic and don't know if I need to do anything specific there - but I think this is now good enough to get out into a PR. Thanks for all your help so far! I've had a lot of fun writing Python for the first time since...2014, wow.

@snarfed
Owner Author

snarfed commented Jul 24, 2023

Awesome! Really great to hear. You navigated a lot to get it to this point, I'm psyched to see the code!

Any chance you've looked at writing tests yet? They don't have to be hugely involved, usually there's not a ton of logic in Bridgy source classes, but generally I like at least basic coverage in Bridgy and granary. (Never quite got into them in oauth-dropins, it's more UI and external heavy and less easy to test, but I should eventually bite the bullet and figure them out there too.)

@JoelOtter
Contributor

JoelOtter commented Jul 24, 2023 via email

@snarfed
Owner Author

snarfed commented Jul 24, 2023

Thanks, and true! Bridgy CI clones granary and oauth-dropins at head, granary does the same with oauth-dropins, so I'm fine with PRs on downstream repos failing CI until we get the upstream PRs in, and then we check that downstream goes green before we merge.

This ^ does mean that iterating tests on CI doesn't work with cross-repo changes, and it's slow anyway, so definitely get the tests running locally if you can!

@JoelOtter
Contributor

Fair enough! I'll get some PRs out in the meantime and start getting tests together

@snarfed
Owner Author

snarfed commented Oct 25, 2023

@JoelOtter shipped this! Woo, congrats, exciting! Firing up my account, here we go... https://brid.gy/bluesky/did:plc:fdme4gb7mu7zrie7peay7tst

@snarfed
Owner Author

snarfed commented Oct 25, 2023

Woo, it's working! Lots of example webmentions backfed from Bluesky on https://snarfed.org/2023-10-06_bridgy-fed-status-update-7#comment-2868110

@aaronpk
Contributor

aaronpk commented Oct 25, 2023

Exciting! I just connected my site but it looks like it's not finding my bluesky posts. I marked up my posts with a syndication URL that has the at:// URL for the post, which should be enough for bridgy to find it. Has that not been implemented yet?

@snarfed
Owner Author

snarfed commented Oct 25, 2023

@aaronpk oh interesting! Yeah it currently expects https://bsky.app/... synd URLs, not at://. We could definitely look into adding those too though!
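Until at:// synd links are supported, a site could convert them itself. The sketch below assumes the `https://bsky.app/profile/.../post/...` layout seen in the post link earlier in this thread, and further assumes that a DID is accepted in place of a handle in that URL. The function name is hypothetical.

```python
def at_uri_to_bsky_url(uri: str) -> str:
    """Convert an at:// post URI into a bsky.app post URL.

    Assumes the URI has the form at://DID/app.bsky.feed.post/RKEY and
    that bsky.app accepts a DID where a handle would normally appear.
    """
    prefix = "at://"
    if not uri.startswith(prefix):
        raise ValueError(f"not an at:// URI: {uri!r}")
    parts = uri[len(prefix):].split("/")
    if len(parts) != 3 or parts[1] != "app.bsky.feed.post":
        raise ValueError(f"not a post URI: {uri!r}")
    did, _collection, rkey = parts
    return f"https://bsky.app/profile/{did}/post/{rkey}"


url = at_uri_to_bsky_url(
    "at://did:plc:ioz4ztghfznx4s5s4jxqiqun/app.bsky.feed.post/3k2xwstch532a")
```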

@cleverdevil

Successfully connected my site, and it is seeing both my posts and interactions, but I am getting "[No webmention targets]". I have my Bluesky profile linked on my site and I have my site linked on my Bluesky profile, so not sure what's up. Bridgy works great for my Mastodon account.

Great work on this feature! Really excited to see the ever-expanding support in Bridgy 💪

@snarfed
Owner Author

snarfed commented Oct 25, 2023

Hey @cleverdevil, thanks, good to hear from you! Looks like your posts on https://cleverdevil.io/content/statusupdates/ all have at:// URI synd links, like @aaronpk above. Bridgy currently only supports https://bsky.app/... synd links. Should be straightforward to support those though, I'll file an issue to track.

@cleverdevil

Oh, right, I remember that I did use at:// in the Known integration I wrote, and did transformations client side to https://bsky.app links, which is why I didn't notice. I'll happily wait for the change, or I'll make changes on my side.

@cleverdevil

Okay, I made a quick change that performs the transformation server-side, and now everything is working great. There are older posts that aren't yet picking up the webmention targets, but I am guessing it will catch up eventually, and posts going forward should backfeed just fine.

Thanks again for the hard work, @JoelOtter!

@snarfed
Owner Author

snarfed commented Oct 26, 2023

Lots of follow-up issues filed, but I think we can close this one out. Huge congrats and thanks @JoelOtter!!!

@snarfed snarfed closed this as completed Oct 26, 2023
5 participants