Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

User Objects with wrong source_protocol #558

Closed
snarfed opened this issue Jun 21, 2023 · 14 comments
Closed

User Objects with wrong source_protocol #558

snarfed opened this issue Jun 21, 2023 · 14 comments

Comments

@snarfed
Copy link
Owner

snarfed commented Jun 21, 2023

I don't understand this one yet, collecting data points here.

Looks like that's because a number of Web users' obj_key Objects have the wrong source_protocol, eg https://tantek.com/ and https://snarfed.org/ are currently both set to activitypub. They both have updated before this launch last night though. tantek.com's is 2023-06-20 12:52:24 PDT , snarfed.org's is 2023-06-04 8:26:43 PDT. I haven't found the request that modified tantek.com's, even searching for Wrote Object https://tantek.com/, which Object._pre_put_hook should have emitted. Haven't searched for snarfed.org's yet.

Next steps:

  • do more of those searches, eg all requests around those times
  • Set all Web's stored Objects to source_protocol=web, then retry a response and see what happens
  • in receive, consider trying multiple possible protocols for each object all the way through. We'd probably need to add protocol to Object's key id though, which would be a pain.
@snarfed
Copy link
Owner Author

snarfed commented Jun 21, 2023

cc @tantek , this is why you're not currently seeing likes/reposts. Ugh.

@snarfed
Copy link
Owner Author

snarfed commented Jun 22, 2023

Looks like this was from intra-BF follows, eg @snarfed.org@snarfed.org currently follows @tantek.com@tantek.com through BF. Ugh. When either of those accounts publishes a new post, BF delivers an AP Create to itself with object https://fed.brid.gy/r/[URL] (below). We store the Create activity itself under its full fed.brid.gy id, but we redirect-unwrap the inner object's id to and store it in an Object with that unwrapped id, https://tantek.com/2023/171/t1/anniversaries-microformats-posse, clobbering the existing one and overwriting its source_protocol to be activitypub.

{
  "id": "https://fed.brid.gy/r/https://tantek.com/2023/171/t1/anniversaries-microformats-posse#bridgy-fed-create",
  "actor": "https://fed.brid.gy/tantek.com",
  "@context": "https://www.w3.org/ns/activitystreams",
  "type": "Create",
  "object": {
    "type": "Note",
    "id": "https://fed.brid.gy/r/https://tantek.com/2023/171/t1/anniversaries-microformats-posse",
    "url": "https://fed.brid.gy/r/https://tantek.com/2023/171/t1/anniversaries-microformats-posse",
    "attributedTo": {
      "type": "Person",
      "id": "https://fed.brid.gy/tantek.com",
      "url": "https://fed.brid.gy/r/https://tantek.com/",
      "preferredUsername": "tantek.com"
      "..."
    },
    "..."
  },
  "..."
}

OK so, ugh, what do we do. I think I'll just block intra-BF follows and activities entirely for now, try to clean up existing source_protocols, and see if that helps.

Remaining open questions:

  • I suspect existing Web users' objects' source_protocol got set to ActivityPub by something else? This may not fix that.
  • I still suspect I need to make Object per-protocol and add protocol to its key id. Ugh. Ugh ugh ugh.

@snarfed
Copy link
Owner Author

snarfed commented Jun 22, 2023

We currently have 16 of these intra-BF follows (ie with to or from on fed.brid.gy) , 10 active. I'll mark them all inactive.

@snarfed
Copy link
Owner Author

snarfed commented Jun 22, 2023

Discovered one possibly relevant issue: when we loaded an Object and got one from protocol.objects_cache, and then modified that object in memory, but didn't put() it, those modifications stuck around in the cached object across requests. Ugh. Ugh ugh ugh. Fixing that to return a copy instead now.

@snarfed
Copy link
Owner Author

snarfed commented Jun 22, 2023

Wooowwww, also discovered that ndb's in-memory cache has the same bug. And that cache is on by default! I didn't realize that. Oof. Will report that, but not sure what to do. Guess I have to pass use_cache=False to all gets and puts? Since context() doesn't accept it. Grr.

@snarfed
Copy link
Owner Author

snarfed commented Jun 22, 2023

^ filed an ndb bug

snarfed added a commit that referenced this issue Jun 23, 2023
...so that modifications aren't durable in memory until we put() them. for #558
snarfed added a commit that referenced this issue Jun 23, 2023
it has a bug/weird behavior that we want to avoid, at least for now.

googleapis/python-ndb#888
#558
@snarfed
Copy link
Owner Author

snarfed commented Jun 23, 2023

I disabled all of those followers with:

for f in (
  Follower.query(
    Follower.to >= ActivityPub(id='https://fed.brid.gy/').key,
    Follower.to < ActivityPub(id='https://fed.brid.gz/').key).fetch()
  + Follower.query(
     Follower.from_ >= ActivityPub(id='https://fed.brid.gy/').key,
     Follower.from_ < ActivityPub(id='https://fed.brid.gz/').key).fetch()
  + Follower.query(Follower.from_ == Web(id='fed.brid.gy').key).fetch()
  + Follower.query(Follower.to == Web(id='fed.brid.gy').key).fetch()
  ):
  f.status = 'inactive'
  f.put()

snarfed added a commit that referenced this issue Jun 23, 2023
also other ActivityPub id validation. #558
@snarfed
Copy link
Owner Author

snarfed commented Jun 23, 2023

Similar cleanup for users, only had two of them with activitypub Objects, set them both to web.

>>> webs = Web.query().fetch()
>>> len(webs)
>>> 968
>>> User.load_multi(webs)
>>> len([w for w in webs if w.obj and w.obj.mf2])
254
>>> len([w for w in webs if w.obj and w.obj.as1])
254
>>> len([w for w in webs if w.obj and w.obj.as1 and not w.obj.mf2])
0
>>> len([w for w in webs if w.obj and w.obj.source_protocol in ('web', 'webmention')])
948
>>> len([w for w in webs if w.obj and w.obj.source_protocol in ('ap', 'activitypub')])
2
>>> [w.key for w in webs if w.obj and w.obj.source_protocol in ('ap', 'activitypub')]
[Key('MagicKey', 'evanstoner.com'), Key('MagicKey', 'kandr3s.co')]
>>> w = Web.get_by_id('https://kandr3s.co/'); w.obj.source_protocol = 'web'; w.obj.put()
Key('Object', 'kandr3s.co')
>>> w = Web.get_by_id('evanstoner.com'); w.obj.source_protocol = 'web'; w.obj.put()
Key('Object', 'https://evanstoner.com/')

@snarfed
Copy link
Owner Author

snarfed commented Jun 23, 2023

More, for posts:

>>> ts = Object.query(Object.key >= Object(id='https://tantek.com/').key,
   Object.key < Object(id='https://tantek.con/').key).fetch()
...
>>> len(ts)
>>> 152
>>> len([t for t in ts if t.source_protocol in ('ap', 'activitypub')])
17
>>> for t in ts:
  if t.source_protocol in ('ap', 'activitypub'):
    t.source_protocol = 'web'
    t.put()
...

>>> ss = Object.query(Object.key >= Object(id='https://snarfed.org/').key,
   Object.key < Object(id='https://snarfed.orh/').key).fetch()
...
>>> len(ss)
709
>>> len([s.key for s in ss if s.source_protocol in ('ap', 'activitypub')])
35
>>> for s in ss:
  if s.source_protocol in ('ap', 'activitypub'):
    s.source_protocol = 'web'
    s.put()
...

@snarfed
Copy link
Owner Author

snarfed commented Jun 23, 2023

Tested a few AP => wm likes, they all worked. OK, phew, tentatively closing. For now.

@snarfed snarfed closed this as completed Jun 23, 2023
@antont
Copy link

antont commented Nov 5, 2023

^ filed an ndb bug

would you have a link for that?

@snarfed
Copy link
Owner Author

snarfed commented Nov 6, 2023

Yup, googleapis/python-ndb#888, it's visible as a cross-reference link here in the issue just above that comment, hence the ^ arrow.

Also consider googleapis/python-ndb#765 if you're using ndb, sadly 😢.

@antont
Copy link

antont commented Nov 6, 2023

Ah, thanks. Maybe I knew that .get behaviour, I guess is intentional there and won't change.

I'm not using NBD now, but used it back in 2011-2012 for a mobile game high score list, and liked it a lot actually. Recently used postgres on the google cloud, an now we use Firebase, which Google bought many years ago, and which also provides a similar (?) document db. I'm using Pydantic to get like a schema for the otherwise ndb like schemaless firebase use. The Python SDK for Firebase supports async use.

@snarfed
Copy link
Owner Author

snarfed commented Nov 6, 2023

Personally, I doubt that get behavior is intentional, it seems pretty incorrect and dangerous to me. No one has responded yet, so 🤷.

Regardless, yes! Firebase's db Firestore (in "Datastore mode") is the backend for ndb now too, ndb itself is just a thin client library. Firestore is great! Postgres etc too.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants