Hashtags with extended alphabet characters aren't recognized as hashtags, AP=>Bluesky #1131
Huh, this turned out to be more interesting than I thought. Mastodon's AS2 JSON for this post removes the umlauts from those characters in the ... but the AS2 ...
Interestingly, if you click on the #Äänestäminen hashtag chip in the UI, it goes to the hashtag page, https://mementomori.social/tags/%C3%84%C3%A4nest%C3%A4minen , which has the umlauts. They're only for show, though; evidently they're not in the underlying hashtag index. If you remove them from that URL to get https://mementomori.social/tags/Aanestaminen , it renders the hashtag without them but shows the same results.
{
"type" : "Note",
"id" : "https://mementomori.social/users/rolle/statuses/112586679114646311",
"url" : "https://mementomori.social/@rolle/112586679114646311",
"attributedTo" : "https://mementomori.social/users/rolle",
"content" : "<p>Muista käydä äänestämässä! Klo 20 asti aikaa. On tyhmää olla vaikuttamatta, kun siihen demokratiassa on mahdollisuus. Kaikille maailmassa ei tällaista suoda.</p><p><a href=\"https://mementomori.social/tags/Eurovaalit2024\" class=\"mention hashtag\" rel=\"tag\">#<span>Eurovaalit2024</span></a> <a href=\"https://mementomori.social/tags/Eurovaalit\" class=\"mention hashtag\" rel=\"tag\">#<span>Eurovaalit</span></a> <a href=\"https://mementomori.social/tags/%C3%84%C3%A4nest%C3%A4minen\" class=\"mention hashtag\" rel=\"tag\">#<span>Äänestäminen</span></a> <a href=\"https://mementomori.social/tags/Politiikka\" class=\"mention hashtag\" rel=\"tag\">#<span>Politiikka</span></a></p>",
"tag" : [
{
"href" : "https://mementomori.social/tags/eurovaalit2024",
"name" : "#eurovaalit2024",
"type" : "Hashtag"
},
{
"href" : "https://mementomori.social/tags/eurovaalit",
"name" : "#eurovaalit",
"type" : "Hashtag"
},
{
"href" : "https://mementomori.social/tags/aanestaminen",
"name" : "#aanestaminen",
"type" : "Hashtag"
},
{
"href" : "https://mementomori.social/tags/politiikka",
"name" : "#politiikka",
"type" : "Hashtag"
}
]
}
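Comparing the `content` HTML with the `tag` array above, each tag `name` looks like the visible hashtag text lowercased with its diacritics removed. A minimal Python sketch of that mapping, assuming NFKD decomposition plus dropping combining marks is a close enough approximation (this is not Mastodon's actual code, and `normalize_tag` is a hypothetical name):

```python
import unicodedata


def normalize_tag(text: str) -> str:
    """Assumed Mastodon-like normalization: strip diacritics, then lowercase."""
    # NFKD splits 'Ä' into 'A' + U+0308 COMBINING DIAERESIS
    decomposed = unicodedata.normalize("NFKD", text)
    # Drop the combining marks, keeping only the base letters
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.lower()


# Hashtag text as it appears in the post's HTML content
for shown in ["Eurovaalit2024", "Eurovaalit", "Äänestäminen", "Politiikka"]:
    print("#" + normalize_tag(shown))
# prints #eurovaalit2024, #eurovaalit, #aanestaminen, #politiikka (one per line)
```

The round trip is lossy: nothing in `aanestaminen` says which `a`s had umlauts, which is why the tag name alone can't be searched for in the content.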
I actually like this; it seems clever and a good UX idea, but it's definitely more difficult to translate. Bluesky uses index-based facets for hashtags and other rich text, but Mastodon's AS2 ... I could do something Mastodon-specific and parse ...
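For reference, a Bluesky hashtag facet is a byte range into the post text plus an `app.bsky.richtext.facet#tag` feature. A minimal Python sketch of building one, assuming the hashtag text appears verbatim in the post text (true when the tag name keeps its umlauts, not for Mastodon's stripped names). `hashtag_facet` is a hypothetical helper, and including the leading `#` in the range is this sketch's own choice:

```python
def hashtag_facet(text: str, hashtag: str) -> dict:
    """Hypothetical helper: build an app.bsky.richtext.facet dict for the
    first occurrence of '#' + hashtag in text. Offsets are UTF-8 bytes."""
    needle = "#" + hashtag
    char_start = text.find(needle)
    if char_start < 0:
        raise ValueError(f"{needle!r} not found in text")
    # Convert the character offset into a UTF-8 byte offset
    byte_start = len(text[:char_start].encode("utf-8"))
    byte_end = byte_start + len(needle.encode("utf-8"))
    return {
        "index": {"byteStart": byte_start, "byteEnd": byte_end},
        "features": [{"$type": "app.bsky.richtext.facet#tag", "tag": hashtag}],
    }


text = "Muista äänestää! #Äänestäminen"
facet = hashtag_facet(text, "Äänestäminen")
print(facet["index"])  # {'byteStart': 21, 'byteEnd': 37}
```

The `#` sits at character 17 but byte 21, because each of the four `ä`s before it takes two bytes.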
More details on Mastodon's behavior here in mastodon/mastodon#26518. No response from their team, though.
FYI, it looks like this is fixed in Iceshrimp: https://bsky.app/profile/AlderForrest.1m2lab.anvil.top.ap.brid.gy/post/3l25re3eiu7c2 , as the hashtag #härkis is working. https://1m2lab.anvil.top/
@MS-potilas nice! Or maybe it always worked in Iceshrimp? Here are the key parts of the AS2 for that post:
"content": "<p><span>h\u00e4rkisdolmiospagettikastike. Ehdottomasti jatkoon!<br><br></span><a href=\"https://1m2lab.anvil.top/tags/h\u00e4rkis\" rel=\"tag\">#h\u00e4rkis</a></p>",
"tag": [{
"type": "Hashtag",
"href": "https://1m2lab.anvil.top/tags/h%C3%A4rkis",
"name": "#h\u00e4rkis"
}]
Unlike Mastodon, Iceshrimp preserves the ...
Ah, I thought Iceshrimp was a Mastodon fork, but it's a Misskey fork, so maybe it worked from the beginning.
What if we searched the content with the umlauts removed to get the indices? Those indices would also work with the original content, umlauts included. That's simpler than parsing the content's tags etc. This would of course apply only to Mastodon. Just a thought.
Sadly, Bluesky facet indices are byte offsets into the UTF-8 text, not character/grapheme offsets, so they won't match. E.g. ...
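To make the byte-offset problem concrete: stripping the umlauts keeps character positions but shifts byte positions, since `ä` is two bytes in UTF-8 and `a` is one. A sketch of a Mastodon-specific workaround under two assumptions: hashtags in the plain text look like `#` followed by Unicode word characters (a simplified, hypothetical pattern), and the diacritic-stripping normalization below approximates Mastodon's. It matches each hashtag token in the original text against the stripped tag names, then takes byte offsets from the original text:

```python
import re
import unicodedata

# Simplified, hypothetical hashtag pattern: '#' plus Unicode word characters
HASHTAG_RE = re.compile(r"#(\w+)")


def normalize_tag(text: str) -> str:
    # Assumed Mastodon-like normalization: drop diacritics, lowercase
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c)).lower()


def tag_byte_ranges(text: str, tag_names: list[str]) -> list[tuple[int, int, str]]:
    """Return (byteStart, byteEnd, original hashtag text) for each hashtag in
    text whose normalized form matches one of the AS2 tag names."""
    wanted = {name.lstrip("#").lower() for name in tag_names}
    ranges = []
    for m in HASHTAG_RE.finditer(text):
        if normalize_tag(m.group(1)) in wanted:
            # Byte offsets come from the ORIGINAL text, umlauts included
            start = len(text[: m.start()].encode("utf-8"))
            end = start + len(m.group(0).encode("utf-8"))
            ranges.append((start, end, m.group(1)))
    return ranges


text = "Muista äänestää! #Äänestäminen #Politiikka"
print(tag_byte_ranges(text, ["#aanestaminen", "#politiikka"]))
# [(21, 37, 'Äänestäminen'), (38, 49, 'Politiikka')]

# Searching the stripped text instead gives the wrong byte offset:
print(normalize_tag(text).encode("utf-8").find(b"#aanestaminen"))  # 17, not 21
```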
I guess the solution here would be to use the ... Here's another broken example that you can use for testing:
Corresponding hashtag searches:
Whether conversion is needed could be tested either by querying the Fediverse node software, or by splitting the last path segment off the href and comparing it to the name. If they're identical, no conversion is needed.
%-encoding like that is for URLs. Here, we have the un-encoded Unicode text.
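A sketch of the href-versus-name check suggested above, decoding the %-escapes first because the href is URL-encoded while the name is plain Unicode text. `href_matches_name` is hypothetical, and comparing case-insensitively is an assumption (Mastodon lowercases tag names). For Mastodon the useful href is the one on the `<a>` link in `content`, since its `tag` href is already stripped and would trivially match:

```python
from urllib.parse import unquote, urlparse


def href_matches_name(href: str, name: str) -> bool:
    """Hypothetical check: does the last path segment of a tag href, once
    %-decoded, equal the tag name (minus '#'), ignoring case?"""
    last_segment = urlparse(href).path.rstrip("/").rsplit("/", 1)[-1]
    # %C3%A4 -> 'ä': compare decoded Unicode text, not URL-encoded bytes
    decoded = unquote(last_segment, encoding="utf-8")
    return decoded.casefold() == name.lstrip("#").casefold()


# Iceshrimp's tag: href and name agree once decoded
print(href_matches_name("https://1m2lab.anvil.top/tags/h%C3%A4rkis", "#h\u00e4rkis"))  # True
# Mastodon: the content link keeps the umlauts, the tag name doesn't
print(href_matches_name(
    "https://mementomori.social/tags/%C3%84%C3%A4nest%C3%A4minen", "#aanestaminen"))  # False
```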
AP hashtags containing extended-alphabet characters, like ä (a with umlaut) and ö (o with umlaut), aren't recognized as hashtags. They show up as plain text in Bluesky.
Example:
https://mementomori.social/@rolle/112586679114646311
https://bsky.app/profile/rolle.mementomori.social.ap.brid.gy/post/3kuikyelvzdc2
Here #Äänestäminen was not recognized as a hashtag, ...