Do not redirect users/bots to localized content based on header information #11538

Closed
slightlyoffbeat opened this issue Apr 25, 2022 · 11 comments
Labels
P2 Second level priority - Should have SEO

Comments

@slightlyoffbeat
Collaborator

slightlyoffbeat commented Apr 25, 2022

Description

Googlebot does not use the language header, and specifically asks that websites do not redirect visitors to localized content based on header information. Currently, we redirect Googlebot from any unlocalized link that it crawls to en-US content, which creates a strong indexing bias towards that locale.

Original proposal

Proposed Solution:

GET https://www.mozilla.org/ :

When the user does not declare a language, or declares a language that we do not support:
Serve the “Choose your language” page as the root domain response.
Redirect the URL …/locales/ to https://www.mozilla.org/ and remove the Languages link from the footer.

When the user declares a language that we support:
Redirect them, with a 302, to the appropriate page.

GET other unlocalized URLs:

If the user does not declare a language, or if there is no version of that page for that language:
Serve a custom 404 that offers links to all of the localized versions of that page.
(Therefore if Googlebot crawls an unlocalized URL, it gets a 404, and that non-existent URL does not appear in the index)

When the user declares a language that we support:
Redirect them, with a 302, to the appropriate page.
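
A rough sketch of the proposal above, for discussion only. The locale set, the translation table, and the function names are hypothetical placeholders, not the site's actual code, and language matching is simplified to exact tag comparison in header order (q-values are ignored).

```python
# Hypothetical data: supported locales and the locales each page is translated into.
SUPPORTED = {"en-US", "de", "fr"}
TRANSLATIONS = {"/firefox/": {"en-US", "de"}, "/about/": {"en-US", "fr"}}


def declared_languages(accept_language):
    """Return the language tags of an Accept-Language value in listed order."""
    if not accept_language:
        return []
    return [
        part.split(";")[0].strip()
        for part in accept_language.split(",")
        if part.strip()
    ]


def respond(path, accept_language=None):
    """Return (status, payload) for an unlocalized URL under the proposal."""
    langs = declared_languages(accept_language)

    if path == "/":
        for lang in langs:
            if lang in SUPPORTED:
                return 302, f"/{lang}/"
        # No declared language, or none that is supported:
        # serve the "Choose your language" page at the root.
        return 200, "choose-your-language"

    available = TRANSLATIONS.get(path, set())
    for lang in langs:
        if lang in available:
            return 302, f"/{lang}{path}"
    # Custom 404 that lists the localized versions of this page.
    return 404, sorted(available)
```

With this sketch, a request that carries no language header (as from Googlebot) gets the chooser page at `/` and a 404 with locale links everywhere else, so unlocalized URLs other than the root never resolve to en-US content.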


Success Criteria

@slightlyoffbeat slightlyoffbeat added P2 Second level priority - Should have SEO labels Apr 25, 2022
@slightlyoffbeat
Collaborator Author

After speaking to Pmac, the working team for this is @robhudson and Adria. Adria will reach out and set up time to discuss and onboard.

Let's keep Pmac and me looped in on proposals and work.

@alexgibson
Member

If the /locales/ page is something we might redirect users to, then we might also consider fixing #6454 to make the list of locales there more complete / configurable. Right now the template is just hardcoded, and is likely already incomplete.

@robhudson
Member

I've rewritten the above logic with some extra edge cases, noting where the proposed changes differ from current behavior. Can you let me know if this looks correct?

1. Requests with NO accept-language header (primarily for Googlebot):
    a. Is the URL a non-locale-prefixed page? (e.g. /credits, /robots.txt)
        - YES: render it
        - NO: pass through
    b. Is the requested URL prefixed with a locale?
        - YES: pass through
        - NO: return 404 with supported locales for path ->
    c. Are there translations for this locale and URL?
        - YES: render it
        - NO: return 404 with supported locales for path ->  
2. Requests with accept-language header (people using browsers):
    a. Is the URL a non-locale-prefixed page? (e.g. /credits, /robots.txt)
        - YES: render it
        - NO: pass through
    b. Is the requested URL prefixed with a locale?
        - YES, user requested /{locale}/{path}/: 
            i. Are there translations for this locale and URL?
                - YES: render it
                - NO: determine best match based on language header and available translations:
                    - if any, redirect to best matching locale ->
                    - if none, return 404 with supported locales for path ->
                        - Note: We currently redirect to first available translation
        - NO, user requested /{path}/:
            i. Is there a matching locale based on available translations and user's language header?
                - YES: redirect to /{locale}/{path}/ where locale is best match ->
                - NO: return 404 with supported locales for path ->
                    - Note: We currently redirect to first available translation

I'll point out the "no match" case where a user (not a bot) requested a URL but there are no available translations that match the user's accept-language preferences. Currently we redirect to the first defined translation, but perhaps it is better to 404 with the page showing all available translations for the user to choose?

I'm also not sure exactly what is meant by "remove the Languages link from the footer" but perhaps we can land these changes first, then circle back around and make the languages link and select box list of languages a bit of a nicer end-user experience?
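
The decision tree above could be sketched roughly as follows. This is only an illustration under stated assumptions: `LOCALES`, `UNLOCALIZED_PATHS`, `TRANSLATIONS`, and every function name are hypothetical, and "best match" is assumed to mean the highest q-value tag that matches an available locale exactly or by primary language. The thread does not specify how best match is computed.

```python
# Hypothetical data for illustration only.
LOCALES = {"en-US", "de", "fr"}
UNLOCALIZED_PATHS = {"/credits", "/robots.txt"}
TRANSLATIONS = {"/firefox/": ["en-US", "de"], "/about/": ["en-US", "fr"]}


def parse_accept_language(header):
    """Return language tags ordered by descending q-value ([] if no header)."""
    if not header:
        return []
    weighted = []
    for part in header.split(","):
        tag, _, params = part.strip().partition(";")
        tag, params = tag.strip(), params.strip()
        if not tag:
            continue
        q = 1.0
        if params.startswith("q="):
            try:
                q = float(params[2:])
            except ValueError:
                q = 0.0
        if q > 0:
            weighted.append((q, tag))
    # sorted() is stable, so tags with equal q keep their header order.
    return [tag for _, tag in sorted(weighted, key=lambda item: -item[0])]


def best_match(preferred, available):
    """Assumed strategy: exact tag match first, then primary-language match."""
    for tag in preferred:
        for locale in available:
            if locale.lower() == tag.lower():
                return locale
        for locale in available:
            if locale.split("-")[0].lower() == tag.split("-")[0].lower():
                return locale
    return None


def split_locale(path):
    """Split '/de/firefox/' into ('de', '/firefox/'); (None, path) if unprefixed."""
    first, _, rest = path.lstrip("/").partition("/")
    if first in LOCALES:
        return first, "/" + rest
    return None, path


def route(path, accept_language=None):
    """Return (status, payload) following steps a, b, and c of the tree."""
    if path in UNLOCALIZED_PATHS:  # step a: non-locale-prefixed page
        return 200, path
    locale, page = split_locale(path)  # step b: is there a locale prefix?
    available = TRANSLATIONS.get(page, [])
    if locale is not None and locale in available:  # translation exists
        return 200, path
    # With no header (bots) preferred is empty, so this falls through to 404.
    match = best_match(parse_accept_language(accept_language), available)
    if match is not None:
        return 302, f"/{match}{page}"
    return 404, list(available)  # 404 page listing supported locales for path
```

In this sketch the bot case and the browser case share one code path: an absent header simply yields no preferred languages, so an unprefixed or untranslated URL returns the 404 with locale links instead of a redirect to the first defined translation.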

@slightlyoffbeat
Collaborator Author

A quick note on the footer language link: I agree that I'd prefer to hold on footer changes for now.

This looks correct to me. I'd like to tag @pmac for an additional set of eyes on this.

Can we have analytics in place to see how many people hit a 404 for 2b, and perhaps even with info on their language header? It would be good to monitor.

@pmac
Member

pmac commented Apr 29, 2022

Rob and I have been discussing this a lot. We're both curious about your and Adria's thoughts on the UX impact of the 404s that we would show to real users. We could choose not to change that behavior for now, but it does seem like the right thing to do if we really don't have any match for their accept-language header values. I think that case should be a small number, but good call on making sure we can measure that impact.

@a-kyne

a-kyne commented May 19, 2022

Where are we with this? Is there a blocker?

@pmac
Member

pmac commented May 19, 2022

Making progress. @robhudson is working on it now. No blockers that I know of. It is a major change to how the site works though, so it's a lot of careful work.

@robhudson
Member

The work in the PR associated with this issue may also satisfy issue #9233 by providing a route to contribute on pages with incomplete translations.

@a-kyne

a-kyne commented Oct 5, 2022

Hi, given that there are some business issues with the redirection solution that we're not sure how to resolve right now, let's put redirection aside.
However, rather than redirecting the root domain using the current localization, we will serve a "Choose your language" page at the root domain https://www.mozilla.org/ . The page will link to all of the translated home pages, and enable Googlebot to find home pages for each of the translated languages, rather than only for en-US.
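
As a minimal sketch of what such a page needs to contain, the snippet below renders one plain link per translated home page. The locale names and the `render_language_chooser` helper are hypothetical, and the real page would be a site template rather than a string builder.

```python
from html import escape

# Hypothetical subset of locales and their display names.
HOME_PAGES = {"en-US": "English (US)", "de": "Deutsch", "fr": "Français"}


def render_language_chooser(home_pages, base_url="https://www.mozilla.org"):
    """Return an HTML list with one crawlable link per localized home page."""
    items = []
    for locale, name in sorted(home_pages.items()):
        href = f"{base_url}/{locale}/"
        items.append(
            f'<li><a href="{escape(href)}" lang="{escape(locale)}">'
            f"{escape(name)}</a></li>"
        )
    return "<ul>\n" + "\n".join(items) + "\n</ul>"
```

Because every localized home page is reachable through an ordinary link from the root, a crawler that sends no language header can still discover each of them.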

@robhudson
Member

Curious if we still need this issue open? If not, can we summarize here how things ended up? /cc @a-kyne

@a-kyne

a-kyne commented Jun 16, 2023

I was unable to access the web server data that would tell us how many times users request unlocalized URLs other than the Privacy Policy, or indeed whether Google is requesting any unlocalized URLs other than the home page. Consequently, I am still not sure whether it would be safe to change our localization strategy for requests for non-Privacy Policy unlocalized URLs.

We have redirected the root domain, however, and that is probably enough to prevent major issues from developing. (Current misdirected traffic is low: for example, about 1k impressions and 125 clicks/month on en-US URLs for search queries containing فاير.)

Let's close.
