Content is not indexed despite opt-in #22452
Comments
I believe that the major search engines are still "coming to terms" with Mastodon. I know that "in the olden days" Google had full access to the "Twitter Firehose", which significantly bumped Twitter up in the SERPs. Looking at my https://mastodon.org.uk/@rbairwell profile in private mode, I can see that the page is actually a JavaScript React.js app page, which still causes difficulties for search engines: they have to run a JavaScript-interpreting browser (such as headless Chromium) to index the site, which is slower and handles lower throughput than a basic HTML parser. I would say just give the search engines time; it may take a few months before they've fully updated their systems to "properly handle" Mastodon servers.
Fortunately, the provided meta tags allow at least partial parsing without browser emulation. This is proven by the numerous toots already indexed (see the Google search I linked). In the meantime, I diffed the static contents of an indexed link against an unindexed one and saw no significant differences, so it might really be some prioritization that delays the indexing of less popular accounts.
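To illustrate the point about meta tags, here is a minimal Python sketch of what a crawler that does not execute JavaScript can still read from a post page: it collects `<meta>` tags keyed by their `property` or `name` attribute, using only the standard-library `html.parser`. The sample markup is hypothetical and trimmed down, not copied from a real Mastodon page; running the same function against saved copies of an indexed and an unindexed post is one way to repeat the comparison described above.

```python
from html.parser import HTMLParser


class MetaTagCollector(HTMLParser):
    """Collect <meta> tags keyed by their `property` or `name` attribute."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        key = attributes.get("property") or attributes.get("name")
        if key and "content" in attributes:
            self.meta[key] = attributes["content"]


def extract_meta(html: str) -> dict:
    """Return the meta tags a crawler could read without executing JavaScript."""
    parser = MetaTagCollector()
    parser.feed(html)
    parser.close()
    return parser.meta


# Hypothetical post page: the toot text lives only in meta tags, while the
# visible body is an empty mount point that JavaScript would fill in later.
SAMPLE_PAGE = """
<html><head>
<meta property="og:title" content="Example User (@user@example.social)">
<meta property="og:description" content="Example toot text">
<meta name="description" content="Example toot text">
</head><body><div id="app"></div></body></html>
"""

if __name__ == "__main__":
    for key, value in extract_meta(SAMPLE_PAGE).items():
        print(f"{key}: {value}")
```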
Google can definitely index posts on Mastodon 4.0.2. @v-p-b My recommendation would be to sign up for Google Search Console and verify your domain: https://search.google.com/search-console/about Once you do that, you should get actionable data about the indexing status of your content (in a day or two). EDIT: Ah, you're not the instance admin. Have you contacted the admins of infosec.exchange about this?
@ineffyble No, I haven't yet. This will be my next step, and I'll post any updates here! Thanks for the feedback!
Is your profile listed in the profile directory? Before Mastodon 4, the logged-out profile directory served as a sort of sitemap, because it was easy to crawl. See https://respublicae.eu/explore as an example. https://infosec.exchange is running v4.1.0rc1+glitch, so I'm not sure how your account would be discovered by search engines. The example post also doesn't contain any real hashtags, so that's another way it can't be discovered.
I set up a site specifically to provide inbound links to my Mastodon posts (https://web.archive.org/web/20221219231050/https://infosex.exchange/), and had Google index that domain via Search Console. It didn't help much (I have since converted the domain to Akkoma, which supports search by default). My instance admin also registered infosec.exchange with Search Console, as advised by @ineffyble, but still only a couple of my posts are indexed.
The point would be not to rely on hashtags, but on proper full-text search.
On 22/01/23 13:17, buherator wrote:
The point would be not relying on hashtags, but proper full-text search.
Do search engines agree with this recommendation for discovery?
I'm not sure I follow. Search engines tend to be able to find content without hashtags, and there are examples of Google indexing Mastodon posts (with or without hashtags). It is possible that the root of the problem is that search engines currently have no way to discover new posts (and the time between crawls makes them miss posts). In that case, I'm interested in what my instance admin could do to make such a feature available to users who opt in.
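One way an admin could give crawlers a discovery path for opted-in accounts is a sitemap in the sitemaps.org format. The Python sketch below is only an illustration under stated assumptions: the `Post` record and its `opted_in` flag are hypothetical stand-ins for whatever the server actually stores, and this is not an existing Mastodon feature or API.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from datetime import datetime, timezone

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


@dataclass
class Post:
    # Hypothetical post record; `opted_in` stands in for the account's
    # search-indexing preference as stored by the server.
    url: str
    published: datetime
    visibility: str
    opted_in: bool


def build_sitemap(posts):
    """Return sitemap XML listing only public posts from opted-in accounts."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for post in posts:
        if post.visibility != "public" or not post.opted_in:
            continue  # never advertise non-public or opted-out content
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = post.url
        ET.SubElement(url, "lastmod").text = post.published.date().isoformat()
    return ET.tostring(urlset, encoding="unicode")


if __name__ == "__main__":
    posts = [
        Post("https://example.social/@alice/1",
             datetime(2023, 1, 20, tzinfo=timezone.utc), "public", True),
        Post("https://example.social/@bob/2",
             datetime(2023, 1, 21, tzinfo=timezone.utc), "public", False),
    ]
    print(build_sitemap(posts))
```

The sitemaps.org protocol caps a single sitemap file at 50,000 URLs (and 50 MB uncompressed), so a busy instance would need to split the output and list the parts in a sitemap index. Crawlers can then be pointed at it with a `Sitemap:` line in robots.txt or through Search Console.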
Search engines are increasing their crawling of major instances. I think the problem is that Mastodon doesn't adhere to the expected behavior with these options. Many parts of Mastodon are still noindex even with the opt-in set (looking at the code, it doesn't even check whether these parameters are set).
There is a lot of code in the js/jsx files that is set to noindex, like
when I think many admins expect it to be more like
but I'm not even sure this achieves much, as isIndexable may not refer to the public timeline options but rather to the unfederated property of the local instance, where a post, even if public, can't be guaranteed to exist anywhere else. I spoke to Gargron about this on Slack a while ago when I noticed my landing pages were getting bad SERPs, and Claire even asked me about it, as it wasn't using my meta descriptions. When I removed some of the
Hey all, what's the consensus here? I've been using Mastodon for over a year now on
Steps to reproduce the problem
Expected behaviour
Posts are indexed by search engines
Actual behaviour
Posts are (mostly) not indexed by search engines
Detailed description
I really want my posts to be discoverable, and while I know about Mastodon's stance on search inside the platform, having search giants do the work seems like a good option.
I opted in to search engine indexing the minute I moved to my current instance (I'm @buherator@infosec.exchange), and I even created a website that points to my posts and submitted it to Google for crawling via Search Console.
Still, I can't look up my content in Google (or Bing for that matter).
Now, I see a chance that it's just Google not handling Mastodon posts correctly: on post URLs, the content is only presented in META tags, and the visible content is generated dynamically with JavaScript, so this is something that probably needs to be handled on the search engine's side. However, some content from my instance is indexed by Google; see https://www.google.com/search?q=site%3Ainfosec.exchange+twitter .
This is a post of mine:
https://infosec.exchange/@buherator/109535230739398168
I can see no noindex attributes here, but I'm not sure if any other tags can scare crawlers away.
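A quick way to rule out `noindex` beyond eyeballing the markup is to check both the robots `<meta>` tags and the `X-Robots-Tag` HTTP response header, since either can block indexing. This Python sketch is a simplification: it treats `noindex` and `none` as blocking, reads `robots` plus one crawler-specific tag name, and does not handle `X-Robots-Tag` values prefixed with a user agent (e.g. `googlebot: noindex`). It only sees the static HTML, so any tags injected later by JavaScript are missed.

```python
from html.parser import HTMLParser


class RobotsMetaCollector(HTMLParser):
    """Collect directives from <meta name="robots"> and a crawler-specific tag."""

    def __init__(self, crawler="googlebot"):
        super().__init__()
        self.crawler = crawler.lower()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = (attributes.get("name") or "").lower()
        if name in ("robots", self.crawler):
            content = attributes.get("content") or ""
            self.directives.extend(
                d.strip().lower() for d in content.split(",") if d.strip()
            )


def is_noindexed(html, x_robots_tag=None, crawler="googlebot"):
    """Return True if meta tags or the X-Robots-Tag header say noindex/none."""
    parser = RobotsMetaCollector(crawler)
    parser.feed(html)
    parser.close()
    directives = list(parser.directives)
    if x_robots_tag:
        directives.extend(
            d.strip().lower() for d in x_robots_tag.split(",") if d.strip()
        )
    # "none" is shorthand for "noindex, nofollow".
    return "noindex" in directives or "none" in directives
```

For a live check, the header value can be taken from `urllib.request.urlopen(url).headers.get("X-Robots-Tag")` and passed alongside the decoded response body.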
As of today this is the robots.txt on my instance:
As far as I can tell this doesn't affect my posts visibility either.
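To confirm that robots.txt does not block post URLs, Python's standard `urllib.robotparser` can evaluate the rules for a given user agent. The robots.txt content in this sketch is hypothetical (the instance's actual file is not reproduced above), so paste the real file in before drawing conclusions. This module implements classic prefix matching and may not interpret path wildcards the way Googlebot does.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for illustration only; replace with the real file.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /media_proxy/
Disallow: /interact/
"""


def allowed_urls(robots_txt, urls, user_agent="Googlebot"):
    """Map each URL to whether robots.txt lets `user_agent` crawl it."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {url: parser.can_fetch(user_agent, url) for url in urls}


if __name__ == "__main__":
    checks = allowed_urls(
        SAMPLE_ROBOTS_TXT,
        [
            "https://infosec.exchange/@buherator/109535230739398168",
            "https://infosec.exchange/media_proxy/123/original",
        ],
    )
    for url, ok in checks.items():
        print(("allowed " if ok else "blocked ") + url)
```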
Any advice on how I can debug this further would be appreciated!
Specifications
Mastodon v4.0.2+glitch