Improve interaction with search engines: add noindex to some routes and add structured data #559
In the next commit, the frontend will insert appropriate noindex meta tags, but for search bots that do not execute JS, it's important to also exclude these routes from indexing in the backend. We exclude everything starting with one of these prefixes:
- /~graphiql
- /~manage
- /~search
- /!
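The prefix rule above can be sketched as follows. This is only an illustration in TypeScript, not the actual backend code: the names `NOINDEX_PREFIXES`, `isNoindexPath` and `robotsHeaders` are hypothetical, and the `x-robots-tag: noindex` header is the one mentioned in this PR.

```typescript
// Prefixes taken from the commit message above.
const NOINDEX_PREFIXES: readonly string[] = ["/~graphiql", "/~manage", "/~search", "/!"];

// Hypothetical helper: true if the request path starts with an excluded prefix.
function isNoindexPath(path: string): boolean {
    return NOINDEX_PREFIXES.some(prefix => path.startsWith(prefix));
}

// Hypothetical helper: extra response headers the server would attach for a path.
function robotsHeaders(path: string): Record<string, string> {
    return isNoindexPath(path) ? { "x-robots-tag": "noindex" } : {};
}
```

Because the check is a plain prefix match, everything below an excluded prefix is covered as well, for example `/~manage/videos`.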
These tags are added in the frontend. Google (and some other search bots) execute JavaScript and respect these tags added by JS. All the routes excluded here already have an `x-robots-tag: noindex` header from the backend. This is just an additional protection in case search crawlers actually work like browsers and navigate via JS without sending another HTTP request to the backend.
See comment in code.
This makes search engines (among other bots) better understand the page structure. Hopefully, Google will show the proper breadcrumbs in the result with this.
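As a rough sketch of what breadcrumb structured data can look like, assuming it is emitted as a schema.org `BreadcrumbList` in JSON-LD (the commit itself may do this differently): `Crumb` and `breadcrumbJsonLd` are hypothetical names, and the URLs in the usage note are placeholders.

```typescript
// Hypothetical shape of one breadcrumb entry: display name plus absolute URL.
interface Crumb {
    name: string;
    url: string;
}

// Builds a schema.org BreadcrumbList as a JSON-LD string.
// Positions are 1-based, in order from the root page to the current page.
function breadcrumbJsonLd(crumbs: readonly Crumb[]): string {
    return JSON.stringify({
        "@context": "https://schema.org",
        "@type": "BreadcrumbList",
        itemListElement: crumbs.map((crumb, index) => ({
            "@type": "ListItem",
            position: index + 1,
            name: crumb.name,
            item: crumb.url,
        })),
    });
}
```

The resulting string would typically be placed in a `<script type="application/ld+json">` element on the page, for example for `[{ name: "Home", url: "https://example.org/" }, { name: "Lectures", url: "https://example.org/lectures" }]`.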
This should improve finding videos via Google or other search engines. Only video pages (not the video block, for example) emit this structured data: it's a safe start as I'm not sure how Google would display the result if there are multiple videos on the page. And in any case, we prefer people visiting the dedicated video page.
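A minimal sketch of video structured data, assuming a schema.org `VideoObject` in JSON-LD. The `VideoInfo` fields and the helper names are hypothetical and not taken from this PR; which properties the frontend actually emits is not stated here.

```typescript
// Hypothetical subset of the data a video page has available.
interface VideoInfo {
    title: string;
    description?: string;
    thumbnailUrl: string;
    created: Date;
    durationSeconds: number;
}

// Formats a duration in seconds as an ISO 8601 duration, e.g. "PT1H2M3S".
function isoDuration(totalSeconds: number): string {
    const s = Math.max(0, Math.round(totalSeconds));
    const hours = Math.floor(s / 3600);
    const minutes = Math.floor((s % 3600) / 60);
    const seconds = s % 60;
    return `PT${hours}H${minutes}M${seconds}S`;
}

// Builds a schema.org VideoObject; the description is only included if present.
function videoJsonLd(video: VideoInfo): Record<string, unknown> {
    return {
        "@context": "https://schema.org",
        "@type": "VideoObject",
        name: video.title,
        ...(video.description !== undefined ? { description: video.description } : {}),
        thumbnailUrl: video.thumbnailUrl,
        uploadDate: video.created.toISOString(),
        duration: isoDuration(video.durationSeconds),
    };
}
```

Emitting one such object only on the dedicated video page matches the cautious approach described above, since a page with several video blocks would otherwise produce several `VideoObject` entries.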
🚀 This PR was deployed at https://pr559.tobira.opencast.org. The deployment will be updated whenever someone pushes onto this PR's branch.
Should direct series/video links really be excluded unconditionally? For unlisted entities sure, but in general? If such a link is shared on some third party website and Google follows it, why should it not index it? 🤷
I think that's the main requirement behind #549. People don't want those things to appear on Google even if they are shared on a third party website.
That also talks about hidden videos/series, but you can have direct links to mounted videos/series as well. But you brought up a good argument in chat: Those have (a) better link(s) for sharing. So 🤷
It's safer to not index stuff by default than the other way around. The uploader can't be seen unless logged in and the login page is also not really useful as a search result.
Not sure whether this is the right ticket to add this (from a discussion we had a couple of months ago): ETH received reports from Google about Opencast videos embedded in their website not being properly indexed. https://support.google.com/webmasters/answer/9495631?utm_source=wnc_20211735&utm_medium=gamma&utm_campaign=wnc_20211735&utm_content=msg_110558249&hl=en describes how to optimize video publication to increase visibility for those videos. Maybe contrary to the "noindex" header of this ticket, but anyway...
This PR generally improves how search engines like Google understand Tobira. It consists of two main parts:
- Routes that we don't want to show up as search results are properly marked with `noindex`. This includes direct video/series links, `/~manage/*` routes and some more.
- Add structured data for videos and thumbnails to let search engines better understand the content.
See the commits for more information. I wasn't able to 100% test all these things yet. I will have to see how Google indexes our test deployment after merging.
There is obviously a lot more to be done. I opened an issue to track some related things: #560
Fixes #549