ATProto: when post or profile bio text is truncated, store full text in custom field in record #1092

Closed
snarfed opened this issue May 29, 2024 · 15 comments

@snarfed
Owner

snarfed commented May 29, 2024

...or maybe always do this, regardless of whether it's truncated?

This would help with eg #986 (comment)

@mackuba

mackuba commented May 31, 2024

Bryan also suggested that it could make sense to also store the URL of the original post as another field (regardless of the (optional) embed).

@snarfed
Owner Author

snarfed commented Jun 6, 2024

From #986:

@snarfed:

Anyone know the etiquette for choosing field names for custom fields in app.bsky.* lexicons that you don't own? I'm not sure what to name these fields. eg fullText, bridgyFedFullText, fediverseFullText, bridgyFedFediverseFullText, ...?

@qazmlp:

Anyone know the etiquette for choosing field names for custom fields in app.bsky.* lexicons that you don't own? I'm not sure what to name these fields. eg fullText, bridgyFedFullText, fediverseFullText, bridgyFedFediverseFullText, ...?

I think the answer to that is basically 'don't', according to https://atproto.com/specs/lexicon#authority-and-control.

Is there a way to specify a secondary NSID similarly to how you can import multiple namespaces in AP?

@snarfed:

Is there a way to specify a secondary NSID similarly to how you can import multiple namespaces in AP?

No, there isn't. Custom fields in existing records is generally what people do right now, there's plenty of precedent for it in the ATProto ecosystem. It's true that the docs you linked say:

This is not the recommended way to extend applications, but it is not specifically disallowed.

...but there's no other realistic extension pattern yet.

I guess one way would be to store our own separate records, and publish logic for finding those records. That's much less discoverable or usable than custom fields in existing records though. I haven't heard of any external developers using "sidecar" records like that yet.

@mackuba:

Hey! I've done some digging for this today. It looks like we're still at the stage where there aren't really any clear conventions for this, or to put it differently, the stage where the conventions are being formed by what we're doing ;)

I've scanned the last few days' worth of posts for unexpected record keys, and it looks like the only one of this kind is from the Skeets app by @seboslaw, which I think uses this for the post editing feature. The key there is named skeetsAppHistory.

There are also several non-standard but very generic-sounding keys (and probably in most cases added by mistake), like: via, type, alt, labels etc. This is probably not a good idea.

Some (inconclusive) threads I've found about this:

As I understand it, something like gy.brid.fed.originalPost would probably be technically more "correct". But it looks ugly ;) Something like bridgyOriginalPost, bridgyOriginalPostContent or bridgyFedFullText looks better imho and I would go in this direction (we can bikeshed a bit about the specific name). In any case, apps will probably eventually use both of these conventions and more, and ultimately what matters is only that a key is well defined & documented so that others can make use of it if it makes sense, and to minimize the probability of collisions between 3rd party and Bluesky and between two 3rd parties by doing anything in that direction… I'd think that anything that starts with "bridgy" is extremely unlikely to accidentally create a collision with anything else.

(tagging also @bnewbold)

@snarfed
Owner Author

snarfed commented Jun 6, 2024

The other prior art from Bridgy Fed itself is the bridged-from-bridgy-fed-activitypub (and eventually bridged-from-bridgy-fed-web, etc) self labels that it adds to profiles of bridged users. Example:

{
  "uri": "at://did:plc:bg5udl25mvzg3rt7l5n2hzet/app.bsky.actor.profile/self",
  "cid": "bafyreicyi7lbvpu7zvrif4si63c32c5gc5jmizcz7kwnylsmsdvvwgsr6y",
  "value": {
    "$type": "app.bsky.actor.profile",
    "avatar": {
      "$type": "blob",
      "ref": {
        "$link": "bafkreibs7fzsi2ejadqnptikhkv5klrprdnjpikp5rropgeoj2kgssavea"
      },
      "mimeType": "image/png",
      "size": 170358
    },
    "description": "Follow me on the fediverse at [@snarfed.org](https://fed.brid.gy/r/https://snarfed.org/)! This account is just for DMs, testing, etc. 😎\n\n[bridged from https://indieweb.social/@snarfed on the fediverse by https://fed.brid.gy/ ]",
    "displayName": "Ryan Barrett",
    "labels": {
      "$type": "com.atproto.label.defs#selfLabels",
      "values": [
        {
          "val": "bridged-from-bridgy-fed-activitypub"
        }
      ]
    }
  }
}
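
As a rough illustration of how a consumer might use these self labels, here is a minimal Python sketch. It assumes the record `value` has already been fetched and parsed into a dict; the helper name `bridged_protocols` is made up for this example and is not part of Bridgy Fed or any ATProto library.

```python
# Hypothetical helper: detect Bridgy Fed's bridged-from-* self labels on a
# profile record's "value" dict, as shown in the example above.
BRIDGED_PREFIX = "bridged-from-bridgy-fed-"


def bridged_protocols(profile_value: dict) -> list[str]:
    """Return the source protocols named in bridged-from-bridgy-fed-* labels."""
    labels = profile_value.get("labels") or {}
    if labels.get("$type") != "com.atproto.label.defs#selfLabels":
        return []
    return [
        v["val"][len(BRIDGED_PREFIX):]
        for v in labels.get("values", [])
        if isinstance(v, dict)
        and isinstance(v.get("val"), str)
        and v["val"].startswith(BRIDGED_PREFIX)
    ]


# Trimmed-down copy of the profile record above.
profile = {
    "$type": "app.bsky.actor.profile",
    "displayName": "Ryan Barrett",
    "labels": {
        "$type": "com.atproto.label.defs#selfLabels",
        "values": [{"val": "bridged-from-bridgy-fed-activitypub"}],
    },
}

print(bridged_protocols(profile))
```

A profile with no `labels` object simply yields an empty list, so callers can treat "not bridged" and "no labels" the same way.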

@snarfed
Owner Author

snarfed commented Jun 11, 2024

Another useful point here: there's a ton of prior art in the standards community on this kind of vendor prefixing. Notably, my understanding is that they generally now discourage it and believe it did more harm than good in ecosystems like CSS, HTTP headers, even C++, etc where it was adopted to some degree.

Haven't found a great description of this arc and history, but https://css-tricks.com/is-vendor-prefixing-dead/ and https://alistapart.com/article/the-vendor-prefix-predicament-alas-eric-meyer-interviews-tantek-celik/ are maybe decent starting points. cc @tantek et al.

TLDR: vendor prefixes last forever, confuse developers, cause bugs, rot over time, split and hurt compatibility/interop, and lead to tools like autoprefixer that exacerbate all of those issues.

Maybe we should follow that advice here, instead of re-learning those lessons the hard way? Specifically, maybe we name generic fields like this generically, eg fullText, and let shared usage and interop emerge and maybe eventually even encourage lexicon owners to adopt them officially (aka "pave the cowpaths"), as opposed to making the same vendor prefixing mistakes with eg bridgyFullText.

@snarfed snarfed added the now label Jun 11, 2024
@snarfed
Owner Author

snarfed commented Jun 11, 2024

@mackuba interesting note here: the original text from the fediverse for posts and profile bios is often HTML, not plain text. Does that change anything about how you'd want to consume it in Skythread? I convert to plain text, so I could store that in the custom field instead, but I'm inclined to store the original HTML instead.

@mackuba

mackuba commented Jun 11, 2024

Hmm, I think it would make sense to keep the original version, i.e. with HTML. Since it's possible to make plain text from HTML but not always the other way around ;) Right now I'm showing the full text by loading it from the Mastodon API, and I'm also getting HTML there, so I had to add an HTML sanitization library for this.
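
To illustrate the "plain text from HTML" direction, here is a minimal sketch using only Python's standard library `html.parser`. It is not how Bridgy Fed or Skythread actually convert text; the class and function names are invented for the example, and it only handles `<p>` and `<br>` breaks. It also does no sanitization, so it is not a substitute for a real sanitizer when rendering the HTML itself.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects text content, turning <br> and <p> into line breaks."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "br":
            self.parts.append("\n")
        elif tag == "p" and self.parts:
            # Blank line between paragraphs, but not before the first one.
            self.parts.append("\n\n")

    def handle_data(self, data):
        self.parts.append(data)


def html_to_text(html: str) -> str:
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts).strip()


print(html_to_text("<p>a<br>b</p><p>c &amp; d</p>"))
```

Going the other way (plain text back to the original markup, links, and mentions) is lossy, which is the argument for storing the HTML.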

@mackuba

mackuba commented Jun 11, 2024

Re: vendor prefixes - I don't think this is a good comparison… because in browsers they were replaced by unprefixed properties, but crucially, ones that are only accessible if an opt-in flag is enabled in the settings (until the property is standardized). So there's no risk of naming collision between browser API and websites JS, except on the developer machines of people who know what they're doing; and no risk of collisions between experimental properties and standard ones, because it's the same browser vendor that's adding both. Here, the problem we have is potential collisions between records created/read by different 3rd party apps, and between 3rd parties and Bluesky. And fields in a record aren't hidden behind a feature flag, they just either are there or aren't there; so this is a different situation. Also, experimental JS properties almost always are the first step with the goal of eventually becoming a normal standardized property shared by everyone, and here it's not clear at all if such property would make sense to everyone or not.

@mackuba

mackuba commented Jun 11, 2024

Just thinking about this a bit more… Unlike properties in JS code, fields in records stay there forever, you can't edit old records. So if we add a field e.g. bridgyFullText now and we later decide that it's actually good to have a standard fullText field shared between apps, then we need to keep support for both properties if we want to support old records. But on the other hand, if we start adding fullText and at some point Bluesky decides to add a different fullText field that has different validation rules, it's theoretically possible that an updated record parser following the updated lexicon would start rejecting Bridgy post records, because they would suddenly no longer pass lexicon validation, since you can have any extra fields you want, but you can't have (now) built-in fields with invalid data (according to the newly updated lexicon). And if some third party apps add such a field instead but with a somewhat different meaning, it could mean that Bridgy records could potentially trigger some unexpected behavior in those apps. This sounds like possibly bigger problems than having to support old and new property names - and if such a property was added by Bridgy first prefixed and then standardized, it's likely that its definition/validation/behavior would change at least a bit anyway.
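
The collision scenario above can be sketched with a toy validator. This is a deliberately simplified model, not the real Lexicon validation logic: it assumes only that unknown fields are allowed while declared fields must match their declared type. The `validate` function, the `post_v1`/`post_v2` schemas, and the idea that an official `fullText` would be an object are all hypothetical.

```python
def validate(record: dict, properties: dict[str, type]) -> list[str]:
    """Toy check: extra fields pass, declared fields must have the declared type."""
    errors = []
    for name, expected in properties.items():
        if name in record and not isinstance(record[name], expected):
            errors.append(
                f"{name}: expected {expected.__name__}, "
                f"got {type(record[name]).__name__}"
            )
    return errors


# Hypothetical schema today, and after the owner adds an official fullText
# field with a different type than the one third parties had been writing.
post_v1 = {"text": str, "createdAt": str}
post_v2 = {**post_v1, "fullText": dict}

generic = {"text": "hi", "createdAt": "2024-06-12T22:40:00.000Z",
           "fullText": "<p>hi</p>"}
prefixed = {"text": "hi", "createdAt": "2024-06-12T22:40:00.000Z",
            "bridgyFullText": "<p>hi</p>"}

print(validate(generic, post_v1))   # old record is fine under the old schema
print(validate(generic, post_v2))   # same record now fails
print(validate(prefixed, post_v2))  # prefixed field is still just an extra
```

The old record never changes; only the schema does, which is why the generic name is the riskier choice for records that cannot be edited later.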

@bnewbold

I'm a little hesitant to weigh in because this is an area it is kind of nice to have other folks experiment with towards consensus.

But! Here are my current thoughts:

  • the situation is a bit between already-namespaced schemas like atproto Lexicons, versus a "commons" namespace like CSS or HTML. there is a clear authority for every Lexicon schema, based on the NSID, and the potential conflicts are between the "authority" (which could have any governance structure internally) and external devs. A somewhat special case is the com.atproto.* namespace, and protocol schemas like DIDs or $blob: those feel more like the CSS/HTML situation to me.
  • I'm personally fairly against using "obvious" or "idiomatic" field names in other folks' Lexicons. it removes their ability to specify the same field with a different type or different semantics (eg, bool vs integer), which would break existing records and/or software. The general hope with the Lexicon system is to make development and iteration possible without sacrificing interop and without breaking the ecosystem; letting folks build on top more safely.
  • I would recommend going with either a clear "brand" word camel-case prefix ("bridgyFullText") or NSID syntax as @mackuba recommended ("gy.brid.fed.originalPost")
  • we have some other brainstorms floating around, like maybe we could use a special reserved $-prefixed field in any object to store extensions with NSID sub-keys; this would make the extension part distinct from the original part. or having clear patterns/APIs for doing "sidecar", "wrapper", or "extension" records
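
For comparison, the naming options in the list above could look roughly like this as record shapes. This is only a sketch: `bridgyFullText` and `gy.brid.fed.originalPost` are the example names from this thread, while the `$ext` container key is purely hypothetical (the brainstorm does not name the reserved field), and `find_original` is an invented helper.

```python
ORIGINAL = "<p>hi</p>"

# Option 1: brand-word camelCase prefix.
brand_prefixed = {"text": "hi", "bridgyFullText": ORIGINAL}

# Option 2: NSID syntax used directly as the field name.
nsid_keyed = {"text": "hi", "gy.brid.fed.originalPost": ORIGINAL}

# Option 3 (brainstorm only): a reserved $-prefixed field holding extensions
# under NSID sub-keys. "$ext" is a made-up name for illustration.
ext_container = {"text": "hi", "$ext": {"gy.brid.fed.originalPost": ORIGINAL}}


def find_original(record: dict):
    """Look for the original text under any of the three shapes."""
    for key in ("bridgyFullText", "gy.brid.fed.originalPost"):
        if key in record:
            return record[key]
    return record.get("$ext", {}).get("gy.brid.fed.originalPost")


for rec in (brand_prefixed, nsid_keyed, ext_container):
    print(find_original(rec))
```

A consumer that has to check several shapes like this is the cost of not having one documented convention, which is the point about keys being "well defined & documented".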

@mackuba

mackuba commented Jun 11, 2024

@bnewbold A slight offtopic while we're talking about record fields :) has there been any thinking about a "client/app" field that tells what app made the record and could be shown on the site below posts, like the classic "Twitter for iPhone" on Twitter? I've seen that some apps add a "via" field, but in this case it would have to be standardized to be of any use, and like you've said, this is a very obvious/generic field name.

@snarfed
Owner Author

snarfed commented Jun 11, 2024

Thanks all, good conversation!

@bnewbold, definitely understood that you all probably recommend vendor prefixing! I don't know that the authority part is actually that different; standardized namespaces like CSS properties and HTTP headers have an authority too, the standards body. But you're right that lexicon owners will usually be more unilateral in how they evolve their lexicons, vs standards bodies that often try to pave cowpaths.

@mackuba re via, it seems like the same situation, right? Different providers may want to create records with the same extra info, whether it's client app or full text or anything else. Generic names let consumers avoid special cases for every provider, but risk colliding with official lexicon changes in the future. 🤷

@mackuba

mackuba commented Jun 11, 2024

Well, in this case it's like the one specific case where prefixing definitely doesn't make sense, because the value would be duplicating the key :)

@snarfed
Owner Author

snarfed commented Jun 11, 2024

For redundancy, true. But the goal of vendor prefixing isn't efficiency, it's future-proofing and avoiding possible conflict with the lexicon owner, in case they ever add an official version of the field themselves with a different type, or name, or different limits, or anything else incompatible.

snarfed added a commit to snarfed/granary that referenced this issue Jun 12, 2024
snarfed added a commit to snarfed/granary that referenced this issue Jun 12, 2024
@snarfed
Owner Author

snarfed commented Jun 12, 2024

OK! I went with bridgyOriginalDescription and bridgyOriginalUrl in app.bsky.actor.profile and bridgyOriginalText and bridgyOriginalUrl in app.bsky.feed.post. Big and wordy, I know, apologies. Still, they're rolling out now. I'm not backfilling all actor profiles, but all posts should get them going forward, and actor profiles as people update them. Examples:

Profile

at://did:plc:bg5udl25mvzg3rt7l5n2hzet/app.bsky.actor.profile/self

https://api.bsky.app/xrpc/com.atproto.repo.getRecord?repo=snarfed.indieweb.social.ap.brid.gy&collection=app.bsky.actor.profile&rkey=self

{
  "$type": "app.bsky.actor.profile",
  "displayName": "Ryan Barrett",
  "description": "Follow me on the fediverse at @snarfed.org! This account is just for DMs, testing, etc. 😎\n\n[bridged from https://indieweb.social/@snarfed on the fediverse by https://fed.brid.gy/ ]",
  "bridgyOriginalDescription": "<p>Follow me on the fediverse at <span class=\"h-card\" translate=\"no\"><a href=\"https://fed.brid.gy/r/https://snarfed.org/about\" class=\"u-url mention\">@<span>snarfed.org</span></a></span>! This account is just for DMs, testing, etc. 😎</p>",
  "bridgyOriginalUrl": "https://indieweb.social/@snarfed",
  "..."
}

Post

at://did:plc:bg5udl25mvzg3rt7l5n2hzet/app.bsky.feed.post/3kur6sp777oe2

https://api.bsky.app/xrpc/com.atproto.repo.getRecord?repo=snarfed.indieweb.social.ap.brid.gy&collection=app.bsky.feed.post&rkey=3kur6sp777oe2

{
  "$type" : "app.bsky.feed.post",
  "text" : "here's an original post with\nsome\nwhitespace\n\n#ok",
  "createdAt" : "2024-06-12T22:40:00.000Z",
  "bridgyOriginalText" : "<p>here&#39;s an original post with<br />  some<br />   whitespace</p><p><a href=\"https://indieweb.social/tags/ok\" class=\"mention hashtag\" rel=\"tag\">#<span>ok</span></a></p>",
  "bridgyOriginalUrl" : "https://indieweb.social/@snarfed/112606052401093326",
  "..."
}
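
For anyone consuming these fields, here is a small Python sketch. The field names and the `com.atproto.repo.getRecord` URL shape are taken from the examples above; the helper names `get_record_url` and `original_content` are made up for this illustration, and the record dict is a trimmed copy of the post above rather than a live fetch.

```python
from urllib.parse import urlencode

GET_RECORD = "https://api.bsky.app/xrpc/com.atproto.repo.getRecord"

# Per record type: (custom field with original HTML, standard plain-text field).
FIELDS = {
    "app.bsky.feed.post": ("bridgyOriginalText", "text"),
    "app.bsky.actor.profile": ("bridgyOriginalDescription", "description"),
}


def get_record_url(repo: str, collection: str, rkey: str) -> str:
    """Build a getRecord URL like the ones shown above."""
    query = urlencode({"repo": repo, "collection": collection, "rkey": rkey})
    return f"{GET_RECORD}?{query}"


def original_content(record: dict) -> dict:
    """Pull out the original HTML, the bridged plain text, and the source URL."""
    html_key, plain_key = FIELDS.get(record.get("$type"), (None, None))
    return {
        "html": record.get(html_key) if html_key else None,
        "plain": record.get(plain_key) if plain_key else None,
        "url": record.get("bridgyOriginalUrl"),
    }


post = {
    "$type": "app.bsky.feed.post",
    "text": "here's an original post with\nsome\nwhitespace\n\n#ok",
    "createdAt": "2024-06-12T22:40:00.000Z",
    "bridgyOriginalUrl": "https://indieweb.social/@snarfed/112606052401093326",
}

print(get_record_url("snarfed.indieweb.social.ap.brid.gy",
                     "app.bsky.feed.post", "3kur6sp777oe2"))
print(original_content(post)["url"])
```

Records written before the rollout will not have the custom fields, so `html` and `url` can be `None` and callers should fall back to the standard `text` or `description`.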

@snarfed snarfed closed this as completed Jun 12, 2024
@mackuba

mackuba commented Jun 13, 2024

Awesome! Sounds good to me 👍
