Search Engine Accessibility #6402

joysfera · 2019-01-06T21:54:22Z

Expected behavior

Google, Bing and others can index each public post in Friendica and later offer them in their search indexes by direct URL to the given post.

Actual behavior

It seems to me that Google always returns the /search page of Friendica.

I am very eager to get my posts and posts in our forums indexed by search engines. What can I do for that, please?

Steps to reproduce the problem

pick one of your posts with unique keywords that has been online long enough to get indexed
use google.com or bing.com and enter the keywords there, plus the site:domain filter (not sure if bing has it)

For comparison, Google+ (yes, I'm coming from dying Google+) posts are indexed properly, so if you search for say my name "Petr Stehlík" and some keywords you'll get the unique direct URLs to the posts in the form of https://plus.google.com/+PetrStehl%C3%ADk/posts/uniqueID

Friendica version you encountered the problem

2019.01rc

Friendica source (git, zip)

as currently on nerdica.net

PHP version

as currently on nerdica.net

SQL version

as currently on nerdica.net

MrPetovan · 2019-01-06T22:13:22Z

Yeah, currently there isn't any sensible SEO in Friendica, and the internal search itself is a mess, sorry about that.

joysfera · 2019-01-06T22:23:35Z

Any idea how to improve it? For example in Google+ each of the messages in the stream (that I suppose is what the crawler gets to see) contains the following snippet:

<a href="./+PetrStehlík/posts/G1oSQWvHhTH" class="eZ8gzf" jsaction="click:WZfesd(preventDefault=true);" jsname="hJglhd" jslog="14487; track:click" aria-label="Full post view"><span class="DPvwYc rRPL7d" aria-hidden="true"></span></a>

In Friendica the full post view direct URL link is hidden in a popup window (in Frio theme, at least) and looks as follows:

<li role="menuitem"><a title="odkaz na zdroj" href="redir/123093?url=display/a85d7459-115c-3262-e803-020860534242" class="navicon plink u-url"><i class="fa fa-external-link" aria-hidden="true"></i> odkaz na zdroj</a></li>

I have no idea if pulling the direct link out of the popup menu would help the crawlers. Actually I don't even know what the crawlers see so it's hard for me to suggest what to improve to help them index it better.

Any idea, please?

MrPetovan · 2019-01-06T22:28:11Z

In a popup window? This behavior doesn't sound familiar. And the redir only appears because you're logged in. Please try the search in a private browsing window without logging in.

Otherwise, there is a host of HTML metadata that we could provide to enable search engine crawlers, including sitemaps, page info, etc... but someone™ has to do it.
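To make the sitemap idea a bit more concrete, here is a minimal sketch in Python. It is not Friendica code (Friendica is written in PHP). The function name `build_sitemap` and the idea of feeding it a list of post GUIDs are assumptions for illustration only. The `/display/<guid>` URL form is the one quoted in this thread, and the XML namespace is the one defined by the sitemaps.org protocol.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(base_url: str, post_guids: list[str]) -> str:
    """Return sitemap XML listing one /display/<guid> URL per public post.

    Hypothetical helper: how the list of public post GUIDs is obtained
    is left open here.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for guid in post_guids:
        url = ET.SubElement(urlset, "url")
        loc = ET.SubElement(url, "loc")
        loc.text = f"{base_url.rstrip('/')}/display/{guid}"
    return ET.tostring(urlset, encoding="unicode")


if __name__ == "__main__":
    print(build_sitemap("https://nerdica.net",
                        ["a85d7459-105c-3245-8fef-ffb119770705"]))
```

A crawler that reads such a file gets the direct URL of every listed post without having to discover it through the stream or a dropdown menu.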

joysfera · 2019-01-06T22:34:10Z

In a private window it's the same:

<li class="dropdown open">
				<button type="button" class="btn-link dropdown-toggle" data-toggle="dropdown" id="dropdownMenuTools-4371740" aria-haspopup="true" aria-expanded="true"><i class="fa fa-angle-down" aria-hidden="true"></i></button>
				<ul class="dropdown-menu pull-right" role="menu" aria-labelledby="dropdownMenuTools-4371740">
    				<li role="menuitem">
						<a title="odkaz na zdroj" href="https://nerdica.net/display/a85d7459-105c-3245-8fef-ffb119770705" class="navicon plink u-url"><i class="fa fa-external-link" aria-hidden="true"></i> odkaz na zdroj</a>

This is the most crucial thing for me. I can work around bugs in navigation, can remember not to post what I cannot delete, but I selected Friendica over say MeWe.com as I believed my posts would be searchable and search engines would index it for me. I'd very much need this fixed. What can I do in order to help improve the SEO stuff, please? Just let the search engines index the posts, I don't need it super efficient and win some keyword war, nothing like that. Just to let them understand that these are single posts available under given URL.

MrPetovan · 2019-01-06T22:49:50Z

The snippet you copied is from the top-left dropdown menu. It shows the original URL of the displayed post, which may be an external link if the post was first published on a remote server. I'm not sure whether it has any impact on SEO, though.

I understand your concern, but we don't have a resident SEO expert in the Friendica team, so anything we do will involve a lot of learning friction, and most of us would rather work on other stuff because it's more convenient.

joysfera · 2019-01-06T22:52:59Z

Definitely work on more convenient things. The development should and needs to be fun.
I'll look into the SEO stuff myself. However, not only am I not an SEO expert in any way, I am also totally unfamiliar with Friendica internals or anything even remotely related to them. So if you have any useful hints, please share them. Thanks.

joysfera · 2019-01-06T22:54:50Z

As for the HTML snippet: it contains the very URL the crawler needs to see and remember, which is why I searched for it on the page and posted it here. Maybe it was a nonsense idea in the first place, I don't know.

MrPetovan · 2019-01-07T00:15:32Z

Changing HTML templates is pretty straightforward. If you have specific improvements to suggest, I'd be happy to implement them.

joysfera · 2019-01-07T15:06:53Z

A good start is here: https://search.google.com/search-console/welcome
I'm just afraid that I'd need to run my own instance of Friendica first in order to prove to Google that I "own" the site, so first things first...

MrPetovan · 2019-01-07T15:25:44Z

If you're ready to do it, we can certainly help with that.

annando · 2019-01-07T16:21:49Z

Yeah, we are really happy with every person who contributes stuff!

tobiasd · 2019-01-08T06:46:41Z

Keep in mind those who do not want their posts to be easily searchable, and make any SEO measures optional.

MrPetovan · 2019-01-08T14:21:06Z

This is nonsense. These people should either make all their posts private or use a conservative robots.txt file. Everyone else should have their public posts correctly crawled by search engines.

annando · 2019-01-10T13:08:28Z

We already have a default robots.txt mechanism (/mod/robots_txt). I suggest making it configurable so that crawlers are allowed to index the profiles and the local community, but nothing more. No search, no global community, no other pages.

The other option should be some kind of "Leave me alone" setting.

AFAIK all SEO improvements depend on the robots.txt settings, so improving the SEO stuff should be no problem at all.
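A rough sketch of what such a two-mode robots.txt generator could look like, written in Python rather than Friendica's PHP. The mode names and the exact path prefixes (`/profile/`, `/community/local`) are assumptions for illustration, not the actual /mod/robots_txt behaviour. The `Allow` lines come before `Disallow: /` so that parsers using first-match evaluation and parsers using longest-match evaluation should reach the same result.

```python
from urllib.robotparser import RobotFileParser

# Assumed path prefixes for profiles and the local community page.
ALLOWED_PREFIXES = ("/profile/", "/community/local")


def build_robots_txt(mode: str) -> str:
    """Build robots.txt text for one of two hypothetical admin settings."""
    if mode == "leave_me_alone":
        # Ask every crawler to stay away from the whole site.
        lines = ["User-agent: *", "Disallow: /"]
    elif mode == "profiles_and_local_community":
        # Allow only the listed prefixes, disallow everything else.
        lines = ["User-agent: *"]
        lines += [f"Allow: {prefix}" for prefix in ALLOWED_PREFIXES]
        lines.append("Disallow: /")
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    return "\n".join(lines) + "\n"


def is_allowed(robots_txt: str, path: str, agent: str = "Googlebot") -> bool:
    """Check a path against robots.txt text with the standard-library parser."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    rules = build_robots_txt("profiles_and_local_community")
    print(rules)
    for path in ("/profile/joy", "/community/local", "/search"):
        print(path, is_allowed(rules, path))
```

Note that with only these two prefixes the individual post pages under /display/ would not be crawlable. Whether they should be added to the allowed list is a decision this sketch does not make.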

joysfera · 2019-07-28T08:20:21Z

So here is what I have found in the meantime: it turned out that Friendica itself was OK. If you try searching for, say, "Petr Stehlík ploché konektory" or "Petr Stehlík ESP8266 z bláta do louže", you'll find perfectly indexed posts under the URL domain/display/MESSAGE_ID (the former on the www.friendica.cz domain, the latter on the www.libranet.de domain), and it works just great.
Unfortunately, when leaving Google+ I chose the www.nerdica.net hosting, and there the web crawler indexing DOES NOT WORK: it either remembers the profile page URL (which is unusable because it's a stream of new posts, so you don't find the one you searched for) or it even indexes a date-based page (such as https://nerdica.net/profile/joy/2019-02-28/2019-02-01?page=0 ), but that is off, too.

It seems to be a configuration issue, right? Any idea what to search for? What could I ask the admin of www.nerdica.net to change or reconfigure, please?

annando · 2019-07-28T08:39:01Z

This can be configured in the robots.txt file. See here for details: https://support.google.com/webmasters/answer/6062596

joysfera · 2019-07-28T10:36:35Z

Hm, for comparison - libranet.de (that is indexed properly):

User-Agent: *
Disallow: /
User-Agent: Googlebot
Allow: /
User-Agent: Googlebot-Mobile
Allow: /
User-Agent: Bingbot
Allow: /
User-Agent: DuckDuckBot
Allow: /
User-Agent: yacybot
Allow: /
User-Agent: Archive.org_bot
Allow: /

and nerdica.net (that seems to be indexed improperly yet still some pages are in search engines' archives):

User-agent: *
Disallow: /settings/
Disallow: /admin/
Disallow: /message/
Disallow: /search
Disallow: /help
Disallow: /proxy

So libranet.de invites a few good crawlers by "allowing" them to index everything and shuts out all others, while nerdica.net lists a bunch of paths that are not to be indexed and doesn't say anything about the rest of the site.

Since full access is the default assumption and the explicit Allow can thus be omitted, libranet's inviting robots.txt should not behave differently from nerdica.net's, at least for the crawlers it lists. So if the cause of the bad indexing is indeed the robots.txt, then one of the paths listed as disallowed on nerdica.net must be crucial for proper message-ID indexing. However, none of the paths listed above seems to be relevant to message indexing, to me anyway.

If you disagree and feel that one of the paths could be causing the search engines' indexing issues, please tell me.
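One way to check this reasoning is to feed both files to Python's standard `urllib.robotparser` and ask whether a /display/ URL may be fetched. This is only a sketch: the helper name `can_crawl` and the agent name `SomeOtherBot` are made up for illustration, and Python's parser uses simple prefix matching with first-match evaluation, which is not guaranteed to match how Google or Bing evaluate every file. For files as simple as these two, the results should be comparable.

```python
from urllib.robotparser import RobotFileParser

# robots.txt contents as quoted above.
LIBRANET = """\
User-Agent: *
Disallow: /
User-Agent: Googlebot
Allow: /
User-Agent: Googlebot-Mobile
Allow: /
User-Agent: Bingbot
Allow: /
User-Agent: DuckDuckBot
Allow: /
User-Agent: yacybot
Allow: /
User-Agent: Archive.org_bot
Allow: /
"""

NERDICA = """\
User-agent: *
Disallow: /settings/
Disallow: /admin/
Disallow: /message/
Disallow: /search
Disallow: /help
Disallow: /proxy
"""

# Post path taken from the HTML snippet posted earlier in this thread.
POST = "/display/a85d7459-105c-3245-8fef-ffb119770705"


def can_crawl(robots_txt: str, agent: str, path: str) -> bool:
    """Return True if the given robots.txt text lets `agent` fetch `path`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    for site, rules in (("libranet.de", LIBRANET), ("nerdica.net", NERDICA)):
        # "SomeOtherBot" is a made-up crawler that neither file names.
        for agent in ("Googlebot", "SomeOtherBot"):
            print(site, agent, can_crawl(rules, agent, POST))
```

With this parser, the post path comes out as allowed for Googlebot under both files. The only difference is for crawlers that libranet.de does not list by name, which it blocks entirely. That supports the suspicion that robots.txt alone does not explain why the /display/ pages on nerdica.net are missing from the index.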

annando · 2019-07-28T11:38:11Z

I'm not a real expert in this stuff. And I must confess that I'm working more on the opposite, that is, rejecting access for search crawlers altogether. The background is that Article 17 of the EU copyright directive changed who is responsible for copyright violations. So we should do everything we can to avoid copyrighted material being shared, but also to make sure it cannot be found.

MrPetovan added UX Federation Enhancement labels Jan 6, 2019
