Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We鈥檒l occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bluesky: ValueError: Expected activity_id to be at:// URI; got 3k4celwmjo22p #1575

Closed
snarfed opened this issue Oct 25, 2023 · 17 comments 路 Fixed by #1582
Closed

Bluesky: ValueError: Expected activity_id to be at:// URI; got 3k4celwmjo22p #1575

snarfed opened this issue Oct 25, 2023 · 17 comments 路 Fixed by #1582

Comments

@snarfed
Copy link
Owner

snarfed commented Oct 25, 2023

First Bluesky bug! Congrats and condolences @JoelOtter 馃槑. Looks like the discover function, ie the "Resend for post" form on the user page, is unhappy. Caught by our crash reporter; I've added you to the Google Cloud project, you should be able to see it at https://console.cloud.google.com/errors/detail/CKDu3sTr_biL4AE;time=P30D?project=brid-gy if you log in with your Google account.

Stack trace:

 ValueError: Expected activity_id to be at:// URI; got 3k4celwmjo22p
at .get_activities_response ( /layers/google.python.pip/pip/lib/python3.11/site-packages/granary/bluesky.py:985 )
at .dispatch_request ( /workspace/tasks.py:491 )
at .background_handle_exception ( /workspace/flask_background.py:43 )

Full logs from one request:

[25/Oct/2023:14:26:28 -0700] POST /_ah/queue/discover HTTP/1.1 500

Params: [('source_key', 'agdicmlkLWd5ci0LEgdCbHVlc2t5IiBkaWQ6cGxjOmZkbWU0Z2I3bXU3enJpZTdwZWF5N3RzdAw'), ('post_id', '3k2b4xcjba22o')]
Got source: Bluesky(key=Key('Bluesky', 'did:plc:fdme4gb7mu7zrie7peay7tst'), auth_entity=Key('BlueskyAuth', 'did:plc:fdme4gb7mu7zrie7peay7tst'), blocked_ids=None, created=DatetimeWithNanoseconds(2023, 10, 25, 20, 44, 17, 875687, tzinfo=datetime.timezone.utc), domain_urls=['https://snarfed.org/'], domains=['snarfed.org'], features=['listen'], last_activities_cache_json=...
Source: snarfed.org (Bluesky) did:plc:fdme4gb7mu7zrie7peay7tst, https://brid.gy/bluesky/did:plc:fdme4gb7mu7zrie7peay7tst
no refresh_token
226 lexicons loaded
Using server at https://bsky.social/
com.atproto.server.createSession: {} {'identifier': 'snarfed.org', 'password': '...'}
Validating com.atproto.server.createSession parameters
Running <function post at 0x3e84c297ff60> https://bsky.social/xrpc/com.atproto.server.createSession {'identifier': 'snarfed.org', 'password': '...'}  {'User-Agent': 'Bridgy (https://brid.gy/about)', 'Content-Type': 'application/json'}
Logged into https://bsky.social/ as did:plc:fdme4gb7mu7zrie7peay7tst, setting access_token
Validating com.atproto.server.createSession output
226 lexicons loaded
Using server at https://bsky.social/
[crash]
@JoelOtter
Copy link
Contributor

Did a little digging into this this morning:

  • I think this should be resolvable by implementing base_id for Bluesky in Granary in order to have it resolve URLs to AT URIs.
  • I've done this via the web_url_to_at_uri function we already have, which for this post returns a post like this:
    at://joelotter.com/app.bsky.feed.post/3kciw4eeuzc2x
  • This is a valid AT URI per the spec! However, it appears the Bluesky API can't actually resolve that handle to a DID. Doing app.bsky.feed.getPostThread on this returns a 404. Replacing joelotter.com with the DID makes it work
  • I'm therefore slightly unsure how to resolve this DID given base_id is a static method and so presumably doesn't have access to our client...thoughts?

@snarfed
Copy link
Owner Author

snarfed commented Oct 26, 2023

  • This is a valid AT URI per the spec! However, it appears the Bluesky API can't actually resolve that handle to a DID. Doing app.bsky.feed.getPostThread on this returns a 404. Replacing joelotter.com with the DID makes it work

Interesting that getPostThread chokes on it! Hmm. This kind of surprises me; I've filed bluesky-social/atproto#1778.

For our purposes, we can resolve handles to DIDs with either com.atproto.identity.resolveHandle, against PDS or BGS or AppView, or by doing handle resolution ourselves with eg arroba.did.resolve_handle. I'm not sure if I have a preference between the two yet.

I'm therefore slightly unsure how to resolve this DID given base_id is a static method and so presumably doesn't have access to our client...thoughts?

Totally ok to change base_id to a class method if necessary! Should be backward compatible.

@JoelOtter
Copy link
Contributor

Interesting, thanks! I asked about it in the Bluesky devs matrix chat and got the following info:

Looks like the current implementation looks at the local PDS database so it would only find the thread if it is synced to the PDS.

https://github.com/bluesky-social/atproto/blob/bb039d8e4ce5b7f70c4f3e86d1327e210ef24dc3/packages/pds/src/api/app/bsky/feed/getPostThread.ts#L125

To decide if a PDS should pull the post from joelotter.com's PDS would be a matter for the local PDSs allow/deny list of PDSs it is willing to resolve content from.

@JoelOtter
Copy link
Contributor

Seems we don't actually use arroba yet in Granary! Am I ok to add it as a dependency - should I have it install from Git or from Pip?

@snarfed
Copy link
Owner Author

snarfed commented Oct 26, 2023

Hah funny, I'm in the Discord but don't really follow the matrix chat, it tended to seem more theoretical and less hands on...which fits the answer here. That answer may technically be true, but doesn't really matter since federation isn't on in prod yet, so there's only one PDS, which has every post and thread. Not to mention the AppView, which is expected to have most/all posts, even after federation, has the same problem. Both are pretty clearly requiring DID-based at:// URIs right now. 馃し

@snarfed
Copy link
Owner Author

snarfed commented Oct 26, 2023

Seems we don't actually use arroba yet in Granary! Am I ok to add it as a dependency - should I have it install from Git or from Pip?

Hmm. On second thought, base_id currently doesn't make network requests, and I'm a bit reluctant to have it start. I wonder if there's a different place to do this that would be less invasive there?

@JoelOtter
Copy link
Contributor

Could it resolve in the original post discovery? I fear we might still end up doing it in web_url_to_at_url though which gets called from all over presumably

@JoelOtter
Copy link
Contributor

Looks like it's not actually used anywhere yet. I'll see if I can make the change :)

@snarfed
Copy link
Owner Author

snarfed commented Oct 26, 2023

Hmm yeah. And web_url_to_at_url probably shouldn't make network requests either.

So this specific issue is unrelated to original post discovery, but in both cases, we primarily care about normalizing URLs for the user's own posts. We already have the user's handle and DID, so how about we start small: in original post discovery and the discover task, we look for the user's handle specifically, and if it's in the URL, convert that to DID. Otherwise, we do nothing.

I'm down to try to do more unified global conversion of bsky.app URL => at:// URIs and handles => DIDs, but I'm not sure we quite have a handle on the best approach yet.

@snarfed
Copy link
Owner Author

snarfed commented Oct 26, 2023

(Also fwiw Bridgy Fed uses web_url_to_at_uri, https://github.com/snarfed/bridgy-fed/blob/ea08bd91538e26da3b295a10999e7191cfd6ca13/atproto.py#L158 , but that hasn't shipped yet so it doesn't matter.)

@JoelOtter
Copy link
Contributor

Sounds good - should I do that in post_id do you think?

@JoelOtter
Copy link
Contributor

Or base_id really I should say

@JoelOtter
Copy link
Contributor

Wait I reread your previous comments, never mind, will need to add a new source method!

@JoelOtter
Copy link
Contributor

Just thinking, is this something where the "right" thing to do is actually to implement UrlCanonicalizer properly for Bluesky? Then this becomes trivial, probably

@snarfed
Copy link
Owner Author

snarfed commented Oct 26, 2023

Yeah I've been thinking about that! The problem is bsky.app vs at://. I expect we don't always want Bluesky's UrlCanonicalizer to convert bsky.app URLs to at:// URIs, but sometimes we do...?

@JoelOtter
Copy link
Contributor

This is one of those things where it's easy to do but hard to do right. I'll put something together now though we may want to revisit this as you say!

@snarfed
Copy link
Owner Author

snarfed commented Oct 26, 2023

Confirmed fixed! Thanks again @JoelOtter!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants