Crawl sitemaps #632
@fromcouch - this may be a nice issue for you to work on? If you want it, feel free to assign yourself, and let me know if you have any questions.
Yes, I will review.
@leonstafford It seems it isn't a good idea to implement the default WordPress sitemap; there are a lot of problems with multisite and multilanguage setups. You can see the comments here: Instead, I could detect whether sitemap.xml exists and parse it (or ask Yoast or another plugin directly for the URL list).
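The detect-and-parse idea could look something like the sketch below. It is illustrative only (the plugin itself is written in PHP, and the function name here is hypothetical); it handles both a plain `<urlset>` and a `<sitemapindex>` that points at child sitemaps, as Yoast and WP 5.5 emit.

```python
import xml.etree.ElementTree as ET

# Namespace used by the sitemaps.org protocol (urlset and sitemapindex).
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"

def parse_sitemap(xml_text):
    """Return (page_urls, child_sitemap_urls) parsed from a sitemap document.

    A <urlset> yields page URLs; a <sitemapindex> yields child sitemap URLs
    that the caller should fetch and parse in turn.
    """
    root = ET.fromstring(xml_text)
    pages, children = [], []
    locs = [loc.text.strip() for loc in root.iter(SITEMAP_NS + "loc") if loc.text]
    if root.tag == SITEMAP_NS + "urlset":
        pages = locs
    elif root.tag == SITEMAP_NS + "sitemapindex":
        children = locs
    return pages, children
```

A caller would loop: fetch sitemap.xml, and for every child sitemap returned, fetch and parse that too, accumulating page URLs into the crawl list.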
Yes, detecting and parsing it if it exists sounds right. I used to have some code in to detect Yoast sitemaps specifically. We can add those to a list of
Only /sitemap.xml is getting crawled; the paths below are missing: RankMath / Yoast /main-sitemap.xsl. Ideally, I would expect it like this ... EDIT: I reported this issue in a different thread. This finding was for Static HTML Output, not wp2static.
Maybe it will be easier if we read the sitemap location from robots.txt and, if none is declared there, search for:
And then crawl ...
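The robots.txt-first approach could be sketched like this. It is a minimal illustration, not plugin code; the fallback paths shown are common guesses (Yoast's sitemap_index.xml, core's wp-sitemap.xml), not the plugin's actual list.

```python
def sitemap_urls_from_robots(robots_txt, site_url):
    """Collect sitemap URLs declared via 'Sitemap:' lines in robots.txt.

    If robots.txt declares none, fall back to probing a few common
    locations (illustrative list only).
    """
    declared = []
    for line in robots_txt.splitlines():
        # Split on the first colon only, so URL schemes stay intact.
        key, _, value = line.partition(":")
        if key.strip().lower() == "sitemap" and value.strip():
            declared.append(value.strip())
    if declared:
        return declared
    common_paths = ("/sitemap.xml", "/sitemap_index.xml", "/wp-sitemap.xml")
    return [site_url.rstrip("/") + path for path in common_paths]
```

Matching the `Sitemap:` field case-insensitively follows how crawlers generally treat robots.txt fields.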
@leonstafford I have a problem here. When I read sitemaps, I get a list of all URLs. This means I can't respect the checkboxes that ask to detect posts, etc. Maybe we could add a configuration checkbox called "use sitemaps" that deactivates: Detect Custom Post Types Let me know ...
@fromcouch - ah, good point! I'd like to see a togglable option for "Use Sitemaps", which is on by default. We can allow users to check both sitemaps and any other detection option, adding a warning in the Export Log that:
Detecting too much is usually preferable to not detecting enough, especially when users have the ability to go back and adjust settings to limit detection.
This functionality should already be merged in |
WP 5.5 includes a sitemap by default. It should be pretty easy to parse and crawl the sitemaps, but we need some changes to allow adding new URLs during a crawl.
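The "adding new URLs during a crawl" requirement amounts to a crawl frontier that can grow while it is being drained. A minimal sketch, with `fetch` and `extract_urls` as assumed callables for illustration (e.g. `extract_urls` could return the URLs found in a sitemap fetched mid-crawl):

```python
from collections import deque

def crawl(seed_urls, fetch, extract_urls):
    """Breadth-first crawl whose queue may grow during the crawl.

    fetch(url) returns the page body; extract_urls(url, body) returns
    any newly discovered URLs. Both are hypothetical hooks here.
    """
    seen = set(seed_urls)
    queue = deque(seed_urls)
    crawled = []
    while queue:
        url = queue.popleft()
        body = fetch(url)
        crawled.append(url)
        for new_url in extract_urls(url, body):
            if new_url not in seen:
                seen.add(new_url)
                queue.append(new_url)  # discovered mid-crawl, queued for later
    return crawled
```

The `seen` set guards against re-crawling when sitemaps and page links both report the same URL.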