[feature] Federate status language in and out (#2366)
* [feature] Federate status language in + out

* go fmt

* tests, little fix

* improve comments

* unnest a bit

* avoid unnecessary nil check

* use more descriptive variable for contentMap

* prefer instance languages when selecting from contentMap

* update docs to reflect lang selection

* rename rdfLangString -> rdfLangs

* update comments to mention Pollable

* iter through slice instead of map
tsmethurst committed Nov 21, 2023
1 parent 1f96237 commit cfefbc0
Showing 15 changed files with 756 additions and 166 deletions.
61 changes: 61 additions & 0 deletions docs/federation/federating_with_gotosocial.md
@@ -482,3 +482,64 @@ For the convenience of remote servers, GoToSocial will always provide both the `
GoToSocial tries to parse incoming Mentions in the same way it sends them out: as a `Mention` type entry in the `tag` property. However, when parsing incoming Mentions it's a bit more relaxed with regards to which properties must be set.

GoToSocial will prefer the `href` property, which can be either the ActivityPub ID/URI or the web URL of the target; if `href` is not present, it will fall back to using the `name` property. If neither property is present, the mention will be considered invalid and discarded.

## Content, ContentMap, and Language

In line with other ActivityPub implementations, GoToSocial uses the `content` and `contentMap` fields on `Object`s to infer the content and language of incoming posts, and to set the content and language of outgoing posts.

### Outgoing

If an outgoing `Object` (usually a `Note`) has content, it will be set as stringified HTML on the `content` field.

If the `content` is in a specific user-selected language, then the `Object` will also have the `contentMap` property set to a single-entry key/value map, where the key is a BCP47 language tag, and the value is the same content from the `content` field.

For example, a post written in English (`en`) will look something like this:

```json
{
  "@context": "https://www.w3.org/ns/activitystreams",
  "type": "Note",
  "attributedTo": "http://example.org/users/i_p_freely",
  "to": "https://www.w3.org/ns/activitystreams#Public",
  "cc": "http://example.org/users/i_p_freely/followers",
  "id": "http://example.org/users/i_p_freely/statuses/01FF25D5Q0DH7CHD57CTRS6WK0",
  "url": "http://example.org/@i_p_freely/statuses/01FF25D5Q0DH7CHD57CTRS6WK0",
  "published": "2021-11-20T13:32:16Z",
  "content": "<p>This is an example note.</p>",
  "contentMap": {
    "en": "<p>This is an example note.</p>"
  },
  "attachment": [],
  "replies": {...},
  "sensitive": false,
  "summary": "",
  "tag": {...}
}
```

GoToSocial will always set the `content` field if the post has content, but it may omit the `contentMap` field if an old version of GoToSocial is in use, or if the user's language is not set or is not a recognized BCP47 language tag.
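As a rough sketch of the outgoing rules above, the following Go program builds only the `content` and `contentMap` fields of an `Object`. The helper `outgoingContent` is hypothetical and is not GoToSocial's serializer; it also assumes that a non-empty language string is already a valid BCP47 tag, whereas real code would need to validate it.

```go
package main

import (
	"encoding/json"
	"os"
)

// outgoingContent is a hypothetical helper that returns the
// content-related fields for an outgoing Object.
//
// Assumption: a non-empty lang is treated as a valid BCP47 tag;
// tag validation is deliberately left out of this sketch.
func outgoingContent(html, lang string) map[string]any {
	fields := map[string]any{}
	if html == "" {
		// No content: neither field is set.
		return fields
	}

	// `content` is always set when the post has content.
	fields["content"] = html

	// `contentMap` is a single-entry map from the language
	// tag to the same HTML that was put in `content`.
	if lang != "" {
		fields["contentMap"] = map[string]string{lang: html}
	}
	return fields
}

func main() {
	enc := json.NewEncoder(os.Stdout)
	enc.SetEscapeHTML(false) // keep "<p>" readable in the printed JSON
	enc.SetIndent("", "  ")

	fields := outgoingContent("<p>This is an example note.</p>", "en")
	if err := enc.Encode(fields); err != nil {
		panic(err)
	}
}
```

Calling `outgoingContent` with an empty language yields only the `content` field, which mirrors the case where no language is set.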

### Incoming

GoToSocial uses both the `content` and the `contentMap` properties on incoming `Object`s to determine the content and infer the intended "primary" language for that content. It uses the following algorithm:

#### Only `content` is set

Take that content only and mark language as unknown.

#### Both `content` and `contentMap` are set

Look for a language tag as key in the `contentMap`, with a value that matches the stringified HTML set in `content`.

If a match is found, use this as the post's language.

If a match is not found, keep content from `content` and mark language as unknown.

#### Only `contentMap` is set

If `contentMap` has only one entry, take the language tag and content value as the "primary" language and content.

If `contentMap` has multiple entries, we have no way of determining the intended preferred content and language of the post, since map order is not deterministic. In this case, try to pick a language and content entry that matches one of the GoToSocial instance's [configured languages](../configuration/instance.md). If no language can be matched this way, pick a language and content entry from the `contentMap` at random as the "primary" language and content.

!!! Note
    In all of the above cases, if the inferred language cannot be parsed as a valid BCP47 language tag, language will fall back to unknown.
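The incoming selection rules above can be sketched as a single Go function. This is an illustration under stated assumptions rather than GoToSocial's implementation: `pickPrimary` is a hypothetical name, instance languages are matched by exact key only, an empty language string stands for "unknown", and the final BCP47 validation step is omitted.

```go
package main

import "fmt"

// pickPrimary is a hypothetical helper that returns the "primary"
// content and language for an incoming Object. An empty lang
// means the language is unknown.
//
// Assumptions: instanceLangs are compared to contentMap keys by
// exact match, and BCP47 validation of the result is left out.
func pickPrimary(content string, contentMap map[string]string, instanceLangs []string) (text, lang string) {
	if content != "" {
		// `content` is set: keep it, and look for a contentMap
		// entry whose value matches it to learn the language.
		// With an empty or nil contentMap the loop does nothing.
		for l, c := range contentMap {
			if c == content {
				return content, l
			}
		}
		return content, ""
	}

	// Only `contentMap` is set: prefer an instance language.
	for _, l := range instanceLangs {
		if c, ok := contentMap[l]; ok {
			return c, l
		}
	}

	// Otherwise take whichever entry map iteration yields first.
	// With a single entry this is that entry; with several, Go's
	// map order is unspecified, so the pick is effectively random.
	for l, c := range contentMap {
		return c, l
	}
	return "", ""
}

func main() {
	contentMap := map[string]string{
		"en": "hey @f0x and @dumpsterqueer",
		"fr": "bonjour @f0x et @dumpsterqueer",
	}

	text, lang := pickPrimary("hey @f0x and @dumpsterqueer", contentMap, nil)
	fmt.Printf("both set: lang=%q text=%q\n", lang, text)

	text, lang = pickPrimary("", contentMap, []string{"fr"})
	fmt.Printf("only contentMap: lang=%q text=%q\n", lang, text)
}
```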
6 changes: 6 additions & 0 deletions internal/ap/ap_test.go
@@ -93,6 +93,12 @@ func noteWithMentions1() vocab.ActivityStreamsNote {

 	content := streams.NewActivityStreamsContentProperty()
 	content.AppendXMLSchemaString("hey @f0x and @dumpsterqueer")
+
+	rdfLangString := make(map[string]string)
+	rdfLangString["en"] = "hey @f0x and @dumpsterqueer"
+	rdfLangString["fr"] = "bonjour @f0x et @dumpsterqueer"
+	content.AppendRDFLangString(rdfLangString)
+
 	note.SetActivityStreamsContent(content)

 	return note
37 changes: 22 additions & 15 deletions internal/ap/extract.go
@@ -631,27 +631,34 @@ func ExtractPublicKey(i WithPublicKey) (
 	return nil, nil, nil, gtserror.New("couldn't find public key")
 }

-// ExtractContent returns a string representation of the
-// given interface's Content property, or an empty string
-// if no Content is found.
-func ExtractContent(i WithContent) string {
-	contentProperty := i.GetActivityStreamsContent()
-	if contentProperty == nil {
-		return ""
+// ExtractContent returns an intermediary representation of
+// the given interface's Content and/or ContentMap property.
+func ExtractContent(i WithContent) gtsmodel.Content {
+	content := gtsmodel.Content{}
+
+	contentProp := i.GetActivityStreamsContent()
+	if contentProp == nil {
+		// No content at all.
+		return content
 	}

-	for iter := contentProperty.Begin(); iter != contentProperty.End(); iter = iter.Next() {
+	for iter := contentProp.Begin(); iter != contentProp.End(); iter = iter.Next() {
 		switch {
-		// Content may be parsed as IRI, depending on
-		// how it's formatted, so account for this.
-		case iter.IsXMLSchemaString():
-			return iter.GetXMLSchemaString()
-		case iter.IsIRI():
-			return iter.GetIRI().String()
+		case iter.IsRDFLangString() &&
+			len(content.ContentMap) == 0:
+			content.ContentMap = iter.GetRDFLangString()
+
+		case iter.IsXMLSchemaString() &&
+			content.Content == "":
+			content.Content = iter.GetXMLSchemaString()
+
+		case iter.IsIRI() &&
+			content.Content == "":
+			content.Content = iter.GetIRI().String()
 		}
 	}

-	return ""
+	return content
 }

 // ExtractAttachments attempts to extract barebones MediaAttachment objects from given AS interface type.
5 changes: 3 additions & 2 deletions internal/ap/extractcontent_test.go
@@ -30,10 +30,11 @@ type ExtractContentTestSuite struct {

 func (suite *ExtractContentTestSuite) TestExtractContent1() {
 	note := suite.noteWithMentions1

 	content := ap.ExtractContent(note)

-	suite.Equal("hey @f0x and @dumpsterqueer", content)
+	suite.Equal("hey @f0x and @dumpsterqueer", content.Content)
+	suite.Equal("bonjour @f0x et @dumpsterqueer", content.ContentMap["fr"])
+	suite.Equal("hey @f0x and @dumpsterqueer", content.ContentMap["en"])
 }

 func TestExtractContentTestSuite(t *testing.T) {