XML Sitemap - Don't include content from trashed pages #1423
Reported by Karl Bratby by email.
When you have images attached to a page or post and the page/post is moved to the trash, the image attachment pages still exist. These image attachment pages can only be viewed when you're logged into the site. If you have Media / Attachments included in the XML Sitemap, these URLs are still included in the sitemap, which results in 404 errors when the sitemap is submitted to Google.
We should not include any content linked to trashed pages. If a URL includes __trashed, we should exclude it from the sitemap.
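To make the proposed rule concrete, here is a minimal sketch of the check in Python. It is not the plugin's code: the function names and the example.com URLs are hypothetical, and it only assumes that an affected URL carries the __trashed marker somewhere in its path.

```python
from urllib.parse import urlparse

# Marker described in this issue; assumed to appear in the URL path
# of attachment pages whose parent page/post has been trashed.
TRASHED_MARKER = "__trashed"


def is_trashed_url(url: str) -> bool:
    """Return True when the URL path contains the __trashed marker."""
    return TRASHED_MARKER in urlparse(url).path


def filter_sitemap_urls(urls):
    """Keep only URLs that do not point at content under a trashed page."""
    return [url for url in urls if not is_trashed_url(url)]


# Hypothetical sample data for illustration only.
sample_urls = [
    "http://example.com/about/team-photo/",
    "http://example.com/old-page__trashed/banner-image/",
]
kept_urls = filter_sitemap_urls(sample_urls)
```

With the sample data above, only the first URL would remain in the sitemap. Matching on the URL is the simplest form of the rule; checking the status of the parent page directly would likely be more robust, but that depends on how the sitemap is generated.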
An example can be found here - http://stevemortiboy.com/sitemap.xml. If you search this sitemap for the word trashed you will see five image attachment pages. These are images that were uploaded to a page but then the page was moved to trash. We shouldn't include those image attachment pages in the sitemap because the page they were attached to no longer exists.