Skip to content

Conversation

@ISSOtm
Copy link
Contributor

@ISSOtm ISSOtm commented Jul 30, 2021

Fixes #1491

Mergeable as-is, but there is optional work:

  • <lastmod> from .md file modification time?
  • Check that the generated file is under 50 MiB, as per the spec
  • Warnings if the site URL e.g. has parameters, since .join() overwrites them

I will work on them if maintainers think they're a good thing.

@avivace
Copy link

avivace commented Feb 13, 2022

Any news on this one?

@ISSOtm
Copy link
Contributor Author

ISSOtm commented Feb 15, 2022

Just rebased the PR.

@sjkim04
Copy link

sjkim04 commented Feb 21, 2022

The feature will be very helpful, also for me. Hope this will be merged!

@billy1624
Copy link

Looking forward to this feature! :D

@hervyqa
Copy link

hervyqa commented Apr 18, 2022

Any update? 🙂

@brettchalupa
Copy link
Contributor

Is there anything I can do to help get this merged?

@ehuss
Copy link
Contributor

ehuss commented Oct 15, 2022

Can you provide some more context on sitemaps in general? What benefits does it have? Shouldn't crawlers be able to crawl the summary index? I'm not very familiar with them, so if you could provide some general information, that would be helpful.

This looks like it needs some tests.

@hervyqa
Copy link

hervyqa commented Oct 16, 2022

search engines like google, yahoo, bing, etc.. usually use sitemap xml to crawl every page of the website. this is good so that the website is easily indexed by search engines.
this is related to SEO.

@schungx
Copy link

schungx commented Oct 16, 2022

I believe a site map hints to a search engine what to index and what not to index. So unnecessary or outdated files can be skipped if they are not in the map.

Site maps are also more reliable wrt page ranking these days because things like a nav bar pointing to all the pages made simply counting the number of backlinks unreliable as a metric.

@ISSOtm
Copy link
Contributor Author

ISSOtm commented Oct 16, 2022

Well, the info about why this would be useful has been said above 😛

What should be tested? I can only think of comparing the sitemap that would be output from the test book against a reference, but I believe that would produce a bunch of false positives.

@brettchalupa
Copy link
Contributor

brettchalupa commented Oct 16, 2022

@ISSOtm with regards to what should be tested, just from checking out the PR, here are my thoughts:

  • generate_sitemap tests:
    • BookItems collection passed in to verify the file is created and has the expected contents for those items passed in — this is an integration test of sorts since this function calls out to write_sitemap
    • The destination existing already and error being returned
    • passing in a site_url without a trailing slash and ensuring it works as expected
  • xml_escapes unit tests by passing in a variety of strings to ensure they're escaped as expected

I'm happy to collaborate or help on any of the tests if you want. 😄

Also just want to say +1 to this comment you left: // TODO: lastmod from src file modification time? I think that makes sense and would be useful.

@ISSOtm
Copy link
Contributor Author

ISSOtm commented Oct 22, 2022

Thanks @brettchalupa!

I'd be happy if you could give me some strings that would be useful for testing xml_escapes.

I'll begin writing the rest of the tests and the lastmod TODO this weekend/next week.

@brettchalupa
Copy link
Contributor

brettchalupa commented Oct 26, 2022

@ISSOtm here are some strings for that unit test that might be helpful (even if it's unlikely some of these may get passed through in the grand scheme of things, it's probably okay to be extra thorough):

  • "https://example.com/foo/bar.html"
  • "https://Bücher.example/foo/bar"
  • "https://'<foo'/>.com/foo?bar=1&biz=100" -- you know, more permutations that exercise what's intended to be escaped 😂

@andymac4182
Copy link

Any movement on this? It is still causing issues with Google indexing.

mirrorcult added a commit to space-wizards/mdBook-spacewizards that referenced this pull request Sep 28, 2023
commit 18404c4bc22be7cd8c598c6cf033909c92e80dfe
Merge: 333873a 2c3df28
Author: Kara <lunarautomaton6@gmail.com>
Date:   Thu Sep 28 10:50:59 2023 -0500

    Merge branch 'master' into sitemap

    # Conflicts:
    #	Cargo.toml
    #	src/renderer/html_handlebars/hbs_renderer.rs

commit 333873a
Author: ISSOtm <eldredhabert0@gmail.com>
Date:   Fri Jul 30 18:29:48 2021 +0200

    Add sitemap generation support to HTML renderer

    Fixes rust-lang#1491
@avivace
Copy link

avivace commented Oct 25, 2023

Any news here?

@ISSOtm
Copy link
Contributor Author

ISSOtm commented Oct 26, 2023

Sorry, I've been dedicating my time elsewhere, as contribution to mdBook is very difficult due to lack of available maintainer time.

I'd be happy to let someone else take this PR to the finish line in my stead.

@rustbot
Copy link
Collaborator

rustbot commented Apr 30, 2025

☔ The latest upstream changes (possibly #2681) made this pull request unmergeable. Please resolve the merge conflicts.

@rustbot rustbot added the S-waiting-on-author Status: The marked PR is awaiting some action (such as code changes) from the PR author. label Apr 30, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

S-waiting-on-author Status: The marked PR is awaiting some action (such as code changes) from the PR author.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Sitemap.xml Generate

10 participants