Generate a split sitemap (also fix robots.txt) #4639

Merged 3 commits on Apr 14, 2023

Commits on Apr 14, 2023

  1. Generate a split sitemap

    `sitemap.xml.gz` remains identical.
    
    `sitemap-index.xml` can be used instead as an index file, which will
    link to `sitemap1.xml.gz`, `sitemap2.xml.gz`, ...
    
    The default index size is 2000, chosen so that each file also stays
    under Google's size limit (50k).
    reebalazs committed Apr 14, 2023
    9953c93
  2. Remove the content encoding from the sitemap.xml.gz

    Although Google treats the file the same either way, it is more
    correct to omit the content encoding gzip header: we just want to
    transfer the gzipped file as a binary, and do not want the browser
    to decode it when it is downloaded.
    
    (The other option would be to keep the content encoding header but
    name the file `.xml`, without the `.gz` ending. However, that would
    only result in larger files when saved, with no extra benefit. It
    would also be a backwards-incompatible change.)
    reebalazs committed Apr 14, 2023
    ad71c08
  3. Fix robots.txt to contain a public link

    Replace `http://backend` in the robots.txt provided by the backend
    with the public-facing URL.
    
    Also, publish the index file instead of the single file, which
    Google would reject.
    reebalazs committed Apr 14, 2023
    b18e3aa
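
The commits above describe a split sitemap and a rewritten robots.txt. The following is a minimal sketch of those two ideas, not the code from this pull request. It is written in Python purely for illustration, and every name in it (`chunk`, `build_sitemap`, `build_index`, `fix_robots`, the `https://example.com` URL) is hypothetical. It also assumes that the backend's robots.txt references the sitemap as `http://backend/sitemap.xml.gz`.

```python
import gzip
from xml.sax.saxutils import escape

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
XML_HEADER = '<?xml version="1.0" encoding="UTF-8"?>'


def chunk(urls, size=2000):
    """Split the URL list into parts of at most `size` entries each."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]


def build_sitemap(urls):
    """Return the gzipped bytes of one sitemap part (sitemapN.xml.gz)."""
    entries = "".join(f"<url><loc>{escape(u)}</loc></url>" for u in urls)
    xml = f'{XML_HEADER}<urlset xmlns="{SITEMAP_NS}">{entries}</urlset>'
    return gzip.compress(xml.encode("utf-8"))


def build_index(public_url, n_parts):
    """Return sitemap-index.xml, linking to sitemap1.xml.gz .. sitemapN.xml.gz."""
    entries = "".join(
        f"<sitemap><loc>{escape(public_url)}/sitemap{i}.xml.gz</loc></sitemap>"
        for i in range(1, n_parts + 1)
    )
    return f'{XML_HEADER}<sitemapindex xmlns="{SITEMAP_NS}">{entries}</sitemapindex>'


def fix_robots(robots_txt, internal_url, public_url):
    """Swap the internal backend URL for the public one and point the
    sitemap entry at the index file instead of the single file.
    Assumes the single file is referenced as `/sitemap.xml.gz`."""
    public = robots_txt.replace(internal_url, public_url)
    return public.replace("/sitemap.xml.gz", "/sitemap-index.xml")
```

With 4500 URLs and the default size of 2000, `chunk` yields three parts, so `build_index("https://example.com", 3)` links to `sitemap1.xml.gz` through `sitemap3.xml.gz`. Each part would be served as a binary file, without a content encoding header, as described in the second commit.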