Improve interaction with search engines: add noindex to some routes and add structured data #559

Merged (7 commits) on Oct 10, 2022

@LukasKalbertodt LukasKalbertodt commented Oct 5, 2022

This PR generally improves how search engines like Google understand Tobira. It has two main parts:

  • Routes that we don't want to show up as search results are properly marked with noindex. This includes direct video/series links, /~manage/* routes and some more.

  • Add structured data for videos and thumbnails to let search engines better understand the content.

See the commits for more information. I haven't been able to fully test all of this yet; I will have to see how Google indexes our test deployment after merging.

There is obviously a lot more to be done. I opened an issue to track some related things: #560

Fixes #549

In the next commit, the frontend will insert appropriate noindex meta
tags, but for search bots that do not execute JS, it's important to
also exclude these routes from indexing via the backend. We exclude
everything starting with these prefixes:

- /~graphiql
- /~manage
- /~search
- /!
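
The prefix rule above can be sketched as a small pure function. This is only an illustration in TypeScript: the actual backend handler lives in a Rust file, and the names `NOINDEX_PREFIXES` and `robotsHeaders` are hypothetical, not taken from the Tobira code base.

```typescript
// Path prefixes that should not be indexed (taken from the list above).
const NOINDEX_PREFIXES = ["/~graphiql", "/~manage", "/~search", "/!"];

/** Returns the extra response headers for a request path (hypothetical helper). */
function robotsHeaders(path: string): Record<string, string> {
    const excluded = NOINDEX_PREFIXES.some(prefix => path.startsWith(prefix));
    return excluded ? { "x-robots-tag": "noindex" } : {};
}
```

Under these assumptions, `robotsHeaders("/~manage/videos")` yields the `x-robots-tag: noindex` header, while a regular page path yields no extra headers.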

The noindex meta tags are added in the frontend. Google (and some other search
bots) execute JavaScript and respect tags added by JS. All the
routes excluded here already have an `x-robots-tag: noindex` header from
the backend. This is just an additional protection in case search
crawlers actually work like browsers and navigate via JS without sending
another HTTP request to the backend.
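
A minimal sketch of how the frontend side might look, assuming a helper that runs on every client-side route change. `HeadLike`, `syncRobotsMeta` and `renderMeta` are hypothetical names; `HeadLike` is a stand-in for the few `<head>` operations involved, so the example also runs outside a browser.

```typescript
/** Hypothetical minimal view of <head>: meta tag contents keyed by `name`. */
type HeadLike = Record<string, string>;

/**
 * Adds or removes the robots meta tag so that it always matches the route
 * that is currently shown (hypothetical helper).
 */
function syncRobotsMeta(head: HeadLike, noindex: boolean): void {
    if (noindex) {
        head["robots"] = "noindex";
    } else {
        // Remove a tag left over from a previously visited noindex route.
        delete head["robots"];
    }
}

/** Renders the stand-in head as HTML, just to show the resulting markup. */
function renderMeta(head: HeadLike): string {
    return Object.keys(head)
        .map(name => `<meta name="${name}" content="${head[name]}">`)
        .join("\n");
}
```

Removing the tag again matters here: when a crawler navigates via JS from an excluded route to a regular page, no new HTTP response (and thus no fresh header) is involved.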

This structured data makes it easier for search engines (among other bots)
to understand the page structure. Hopefully, Google will show the proper
breadcrumbs in its results with this.
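
As an illustration, breadcrumb structured data is commonly expressed as a schema.org `BreadcrumbList` in JSON-LD. The sketch below assumes that format (the commit text does not say which one is used); the `Crumb` type and `breadcrumbJsonLd` function are hypothetical.

```typescript
/** One step on the path from the root page to the current page (hypothetical shape). */
interface Crumb {
    label: string;
    url: string;
}

/** Builds a schema.org `BreadcrumbList` as a JSON-LD string. */
function breadcrumbJsonLd(crumbs: Crumb[]): string {
    return JSON.stringify({
        "@context": "https://schema.org",
        "@type": "BreadcrumbList",
        itemListElement: crumbs.map((crumb, index) => ({
            "@type": "ListItem",
            position: index + 1, // positions are 1-based
            name: crumb.label,
            item: crumb.url,
        })),
    });
}
```

The resulting string would typically be placed in a `<script type="application/ld+json">` element on the page.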

This should make it easier to find videos via Google or other search engines.
Only video pages (not the video block, for example) emit this structured
data: it's a safe start, as I'm not sure how Google would display the
result if there are multiple videos on a page. And in any case, we
prefer people visiting the dedicated video page.
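
For illustration, video structured data is commonly a schema.org `VideoObject` in JSON-LD. The sketch below assumes that format; the `VideoInfo` shape and the set of emitted properties are hypothetical and not taken from the Tobira code base.

```typescript
/** Hypothetical subset of the metadata available on a video page. */
interface VideoInfo {
    title: string;
    description: string | null;
    thumbnailUrl: string;
    created: Date;
    durationMs: number;
}

/** Formats milliseconds as an ISO 8601 duration such as "PT1H2M3S". */
function isoDuration(ms: number): string {
    const totalSeconds = Math.round(ms / 1000);
    const hours = Math.floor(totalSeconds / 3600);
    const minutes = Math.floor((totalSeconds % 3600) / 60);
    const seconds = totalSeconds % 60;
    return `PT${hours}H${minutes}M${seconds}S`;
}

/** Builds a schema.org `VideoObject` as a JSON-LD string. */
function videoJsonLd(video: VideoInfo): string {
    return JSON.stringify({
        "@context": "https://schema.org",
        "@type": "VideoObject",
        name: video.title,
        // Properties set to `undefined` are dropped by JSON.stringify.
        description: video.description ?? undefined,
        thumbnailUrl: video.thumbnailUrl,
        uploadDate: video.created.toISOString(),
        duration: isoDuration(video.durationMs),
    });
}
```

Emitting exactly one such object per dedicated video page matches the cautious approach described above, since a page with several videos would need several objects.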

github-actions bot commented Oct 5, 2022

🚀 This PR was deployed at https://pr559.tobira.opencast.org. The deployment will be updated whenever someone pushes onto this PR's branch.

@JulianKniephoff JulianKniephoff left a comment

Should direct series/video links really be excluded unconditionally? For unlisted entities sure, but in general? If such a link is shared on some third party website and Google follows it, why should it not index it? 🤷

@LukasKalbertodt

I think that's the main requirement behind #549. People don't want those things to appear on Google even if they are shared on a third party website.

@JulianKniephoff

That also talks about hidden videos/series, but you can have direct links to mounted videos/series as well. But you brought up a good argument in chat: Those have (a) better link(s) for sharing. So 🤷

It's safer not to index stuff by default than the other way around.

The uploader can't be seen unless logged in, and the login page is also
not really useful as a search result.

@JulianKniephoff JulianKniephoff merged commit 8c5b47a into elan-ev:master Oct 10, 2022
@LukasKalbertodt LukasKalbertodt added the changelog:admin Changes primarily for admins label Oct 12, 2022

oas777 commented Dec 8, 2022

Not sure whether this is the right ticket to add this to (from a discussion we had a couple of months ago): ETH received reports from Google about Opencast videos embedded in their website not being properly indexed. https://support.google.com/webmasters/answer/9495631?utm_source=wnc_20211735&utm_medium=gamma&utm_campaign=wnc_20211735&utm_content=msg_110558249&hl=en describes how to optimize video publication to increase visibility for those videos. Maybe contrary to the "noindex" header of this ticket, but anyway...

@LukasKalbertodt

@oas777 I cross-posted this into #560, which tracks all further things related to SEO. I think Tobira should already do better in the regard you mentioned. Once it is deployed at ETH, we should take a look at the Google webmaster report again!

Linked issue: Make sure hidden pages are excluded from search engine results (e.g. Google)