[feature] Federate status language in and out #2366

Merged
merged 13 commits into from Nov 21, 2023
61 changes: 61 additions & 0 deletions docs/federation/federating_with_gotosocial.md
@@ -482,3 +482,64 @@ For the convenience of remote servers, GoToSocial will always provide both the `
GoToSocial tries to parse incoming Mentions in the same way it sends them out: as a `Mention` type entry in the `tag` property. However, when parsing incoming Mentions it's a bit more relaxed with regards to which properties must be set.

GoToSocial will prefer the `href` property, which can be either the ActivityPub ID/URI or the web URL of the target; if `href` is not present, it will fall back to using the `name` property. If neither property is present, the mention will be considered invalid and discarded.
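
The fallback order described above can be sketched in Go. This is an illustration only: `mentionTarget` and its parameters are hypothetical names chosen for the example, not GoToSocial's actual functions.

```go
package main

import "fmt"

// mentionTarget is a hypothetical helper: it prefers href, falls
// back to name, and reports false when neither is present, in
// which case the mention would be discarded.
func mentionTarget(href, name string) (string, bool) {
	if href != "" {
		return href, true
	}
	if name != "" {
		return name, true
	}
	return "", false
}

func main() {
	target, ok := mentionTarget("", "@someone@example.org")
	fmt.Println(target, ok)
}
```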

## Content, ContentMap, and Language

In line with other ActivityPub implementations, GoToSocial uses `content` and `contentMap` fields on `Objects` to infer content and language of incoming posts, and to set content and language on outgoing posts.

### Outgoing

If an outgoing `Object` (usually a `Note`) has content, it will be set as stringified HTML on the `content` field.

If the `content` is in a specific user-selected language, then the `Object` will also have the `contentMap` property set to a single-entry key/value map, where the key is a BCP47 language tag, and the value is the same content from the `content` field.

For example, a post written in English (`en`) will look something like this:

```json
{
  "@context": "https://www.w3.org/ns/activitystreams",
  "type": "Note",
  "attributedTo": "http://example.org/users/i_p_freely",
  "to": "https://www.w3.org/ns/activitystreams#Public",
  "cc": "http://example.org/users/i_p_freely/followers",
  "id": "http://example.org/users/i_p_freely/statuses/01FF25D5Q0DH7CHD57CTRS6WK0",
  "url": "http://example.org/@i_p_freely/statuses/01FF25D5Q0DH7CHD57CTRS6WK0",
  "published": "2021-11-20T13:32:16Z",
  "content": "<p>This is an example note.</p>",
  "contentMap": {
    "en": "<p>This is an example note.</p>"
  },
  "attachment": [],
  "replies": {...},
  "sensitive": false,
  "summary": "",
  "tag": {...}
}
```

GoToSocial will always set the `content` field if the post has content, but it may not set the `contentMap` field if an old version of GoToSocial is in use, or if the user's language is not set or is not a recognized BCP47 language tag.
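
As a rough sketch of the outgoing rule, the following Go snippet builds a pared-down `Note` with `content` always set and a single-entry `contentMap` only when a language is available. The `outgoingNote` type and `newOutgoingNote` function are hypothetical stand-ins for illustration, not GoToSocial's real serialization code, and the check that the tag is valid BCP47 is left out.

```go
package main

import (
	"encoding/json"
	"os"
)

// outgoingNote is a hypothetical, pared-down stand-in for a Note.
type outgoingNote struct {
	Type       string            `json:"type"`
	Content    string            `json:"content,omitempty"`
	ContentMap map[string]string `json:"contentMap,omitempty"`
}

// newOutgoingNote always sets content, and adds a single-entry
// contentMap only when a language tag is known for the post.
// Validating the tag as BCP47 is omitted from this sketch.
func newOutgoingNote(content, lang string) outgoingNote {
	n := outgoingNote{Type: "Note", Content: content}
	if content != "" && lang != "" {
		n.ContentMap = map[string]string{lang: content}
	}
	return n
}

func main() {
	enc := json.NewEncoder(os.Stdout)
	enc.SetEscapeHTML(false) // keep "<p>" readable in the output
	if err := enc.Encode(newOutgoingNote("<p>This is an example note.</p>", "en")); err != nil {
		panic(err)
	}
}
```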

### Incoming

GoToSocial uses both the `content` and the `contentMap` properties on incoming `Object`s to determine the content and infer the intended "primary" language for that content. It uses the following algorithm:

#### Only `content` is set

Take that content only and mark language as unknown.

#### Both `content` and `contentMap` are set

Look for a language tag as key in the `contentMap`, with a value that matches the stringified HTML set in `content`.

If a match is found, use this as the post's language.

If a match is not found, keep content from `content` and mark language as unknown.

#### Only `contentMap` is set

If `contentMap` has only one entry, take the language tag and content value as the "primary" language and content.

If `contentMap` has multiple entries, there is no way to determine the intended preferred content and language of the post, since map order is not deterministic. In this case, try to pick a language and content entry that matches one of the GoToSocial instance's [configured languages](../configuration/instance.md). If no language can be matched this way, pick a language and content entry from the `contentMap` at random as the "primary" language and content.

!!! Note
    In all of the above cases, if the inferred language cannot be parsed as a valid BCP47 language tag, language will fall back to unknown.
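
The incoming rules above can be sketched as a single Go function. This is a simplified illustration, not the actual implementation: `pickPrimary` and its parameters are hypothetical names, instance languages are matched by exact string comparison (an assumption), and the final BCP47 validity check is omitted because it would need a language-tag parser. An empty language string stands for "unknown".

```go
package main

import "fmt"

// pickPrimary is a hypothetical helper that returns the chosen
// content and language; an empty language means "unknown".
func pickPrimary(content string, contentMap map[string]string, instanceLangs []string) (string, string) {
	switch {
	case len(contentMap) == 0:
		// Only content is set: keep it, language unknown.
		return content, ""

	case content != "":
		// Both are set: look for a map value equal to content.
		for lang, c := range contentMap {
			if c == content {
				return content, lang
			}
		}
		return content, ""

	case len(contentMap) == 1:
		// Only contentMap, with a single entry: use it.
		for lang, c := range contentMap {
			return c, lang
		}
	}

	// Only contentMap, with multiple entries: prefer a language
	// configured on the instance (exact match is an assumption).
	for _, lang := range instanceLangs {
		if c, ok := contentMap[lang]; ok {
			return c, lang
		}
	}

	// No match: Go's map iteration order is unspecified, so the
	// first entry ranged over is an arbitrary pick.
	for lang, c := range contentMap {
		return c, lang
	}
	return "", ""
}

func main() {
	m := map[string]string{"en": "<p>hello</p>", "fr": "<p>bonjour</p>"}
	_, lang := pickPrimary("<p>hello</p>", m, nil)
	fmt.Println(lang) // prints "en"
}
```

Returning an empty string for an unknown language keeps the sketch short; a caller would still need to validate the returned tag before storing it.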
6 changes: 6 additions & 0 deletions internal/ap/ap_test.go
@@ -93,6 +93,12 @@ func noteWithMentions1() vocab.ActivityStreamsNote {

 	content := streams.NewActivityStreamsContentProperty()
 	content.AppendXMLSchemaString("hey @f0x and @dumpsterqueer")
+
+	rdfLangString := make(map[string]string)
+	rdfLangString["en"] = "hey @f0x and @dumpsterqueer"
+	rdfLangString["fr"] = "bonjour @f0x et @dumpsterqueer"
+	content.AppendRDFLangString(rdfLangString)
+
 	note.SetActivityStreamsContent(content)

 	return note
37 changes: 22 additions & 15 deletions internal/ap/extract.go
@@ -631,27 +631,34 @@ func ExtractPublicKey(i WithPublicKey) (
 	return nil, nil, nil, gtserror.New("couldn't find public key")
 }

-// ExtractContent returns a string representation of the
-// given interface's Content property, or an empty string
-// if no Content is found.
-func ExtractContent(i WithContent) string {
-	contentProperty := i.GetActivityStreamsContent()
-	if contentProperty == nil {
-		return ""
+// ExtractContent returns an intermediary representation of
+// the given interface's Content and/or ContentMap property.
+func ExtractContent(i WithContent) gtsmodel.Content {
+	content := gtsmodel.Content{}
+
+	contentProp := i.GetActivityStreamsContent()
+	if contentProp == nil {
+		// No content at all.
+		return content
 	}

-	for iter := contentProperty.Begin(); iter != contentProperty.End(); iter = iter.Next() {
+	for iter := contentProp.Begin(); iter != contentProp.End(); iter = iter.Next() {
 		switch {
-		// Content may be parsed as IRI, depending on
-		// how it's formatted, so account for this.
-		case iter.IsXMLSchemaString():
-			return iter.GetXMLSchemaString()
-		case iter.IsIRI():
-			return iter.GetIRI().String()
+		case iter.IsRDFLangString() &&
+			len(content.ContentMap) == 0:
+			content.ContentMap = iter.GetRDFLangString()
+
+		case iter.IsXMLSchemaString() &&
+			content.Content == "":
+			content.Content = iter.GetXMLSchemaString()
+
+		case iter.IsIRI() &&
+			content.Content == "":
+			content.Content = iter.GetIRI().String()
 		}
 	}

-	return ""
+	return content
 }

 // ExtractAttachments attempts to extract barebones MediaAttachment objects from given AS interface type.
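
The behavioural change in `ExtractContent` is that the loop no longer returns on the first string it meets; it keeps iterating and records the first value of each kind. A self-contained sketch of that "first of each kind wins" pattern follows. The `value` and `extracted` types and the `extract` function are hypothetical stand-ins for the property iterator and `gtsmodel.Content`, so this shows the control flow only, not the real API.

```go
package main

import "fmt"

// value is a hypothetical stand-in for one entry of the content
// property: either a plain string or a language-to-content map.
type value struct {
	str     string
	langMap map[string]string
}

// extracted is a hypothetical stand-in for the returned struct.
type extracted struct {
	Content    string
	ContentMap map[string]string
}

// extract walks every entry and keeps the first plain string
// and the first language map it sees, ignoring later ones.
func extract(values []value) extracted {
	var out extracted
	for _, v := range values {
		switch {
		case v.langMap != nil && len(out.ContentMap) == 0:
			out.ContentMap = v.langMap
		case v.langMap == nil && out.Content == "":
			out.Content = v.str
		}
	}
	return out
}

func main() {
	out := extract([]value{
		{str: "hey @f0x and @dumpsterqueer"},
		{langMap: map[string]string{
			"en": "hey @f0x and @dumpsterqueer",
			"fr": "bonjour @f0x et @dumpsterqueer",
		}},
	})
	fmt.Println(out.Content)
	fmt.Println(out.ContentMap["fr"])
}
```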
5 changes: 3 additions & 2 deletions internal/ap/extractcontent_test.go
@@ -30,10 +30,11 @@ type ExtractContentTestSuite struct {

 func (suite *ExtractContentTestSuite) TestExtractContent1() {
 	note := suite.noteWithMentions1
-
 	content := ap.ExtractContent(note)

-	suite.Equal("hey @f0x and @dumpsterqueer", content)
+	suite.Equal("hey @f0x and @dumpsterqueer", content.Content)
+	suite.Equal("bonjour @f0x et @dumpsterqueer", content.ContentMap["fr"])
+	suite.Equal("hey @f0x and @dumpsterqueer", content.ContentMap["en"])
 }

 func TestExtractContentTestSuite(t *testing.T) {