Sitemap.xml has a strange URL, Google will not read it #3067
The URL looks OK to me. If I click it, then I get a valid sitemap file.
This is a "url-encoded" slash character. It's perfectly normal.
Can you give more details? Is there an error message?
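To illustrate the point about the url-encoded slash, here is a minimal Python sketch (not part of webtrees) that decodes the `route` parameter of the sitemap URL from this issue and re-encodes it. It only uses the standard library `urllib.parse` module.

```python
from urllib.parse import parse_qs, quote, urlsplit

# Sitemap URL as reported in this issue.
url = "https://www.stamboomwesterman.net/index.php?route=%2Fsitemap.xml"

# parse_qs percent-decodes query values, so %2F becomes a plain slash.
route = parse_qs(urlsplit(url).query)["route"][0]

# Encoding again with no "safe" characters turns the slash back into %2F.
encoded = quote(route, safe="")

print(route)
print(encoded)
```

A crawler that follows the URL rules should treat `%2F` inside a query string the same way, so the encoded form by itself should not stop a sitemap from being read.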
I found it. The XML was all right, but Cloudflare was blocking access to the sitemap for requests with a Googlebot user agent. I have disabled Cloudflare for the time being, so there is nothing wrong with Webtrees. Thanks for your help!
No, there is still something fishy (no offence): Google still does not find the pages in the sitemap. The sitemap.xml can be read now, but according to https://www.xmlvalidation.com/index.php?id=1&L=0 it is not valid XML.
Error in the XML document:
That validator tool requires that you upload the sitemap schema first. Here is a validator that is designed to test sitemap files. It says that both your sitemapindex and sitemap files are valid.
Can you give more details?
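For a quick local check that does not depend on any online validator, something like the following Python sketch could be used. It is only a well-formedness and namespace check, not a full schema validation. The `sitemap_locations` helper and the `SAMPLE` document are made up for illustration; the namespace is the one defined by the sitemaps.org protocol.

```python
import xml.etree.ElementTree as ET

# Namespace used by both <sitemapindex> and <urlset> documents.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def sitemap_locations(xml_text):
    """Parse a sitemap or sitemap index and return its <loc> values.

    Raises xml.etree.ElementTree.ParseError if the text is not well-formed
    XML, and ValueError if the root element is not a sitemap element.
    """
    root = ET.fromstring(xml_text)
    expected = {f"{{{SITEMAP_NS}}}sitemapindex", f"{{{SITEMAP_NS}}}urlset"}
    if root.tag not in expected:
        raise ValueError(f"unexpected root element: {root.tag}")
    # Strip surrounding whitespace so indentation inside <loc> is ignored.
    return [
        loc.text.strip()
        for loc in root.iter(f"{{{SITEMAP_NS}}}loc")
        if loc.text
    ]


# Hypothetical sample with extra whitespace around the URL.
SAMPLE = """
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>
      https://www.stamboomwesterman.net/index.php?route=%2Fsitemap-tree1-INDI-0.xml
    </loc>
  </sitemap>
</sitemapindex>
"""

print(sitemap_locations(SAMPLE))
```

If this raises a parse error on the downloaded file, the XML itself is broken; if it returns the expected URLs, the problem is more likely in how the file is served.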
I found that site too :-) But another site says the first link is not right:
So I fixed the XML manually by deleting all the extra spaces in the file, and uploaded it to my site as sitemapfixed.xml. Now that site thinks it is better:
Google thinks it is better too now, but finds no URLs. Then I fixed https://www.stamboomwesterman.net/index.php?route=%2Fsitemap-tree1-INDI-0.xml too, just by aligning the XML tags. I called it sitemap-tree1-INDI-0-fixed.xml (I left only one INDI in there). You can see it likes my fixed XML files, and finds the URL! So it is about spaces and aligning!
Yes, I understand. So it must be something with PHP on my shared host, right? I asked the Dreamhost guys, but they cannot find anything. The logic seems to be: https://www.stamboomwesterman.net/sitemapfixed.xml
Continuing the journey, I am starting to like this :-) Using the Google Advanced Rest Client, I could get the 500 error from my machine. But after that, it started working and returned only 200s. I have seen that before: a new browser will give a 500 error first and work after a refresh. I enabled PHP logging, and got a decent error now:
So it disables it after it goes wrong, which is why it works the second time! So I put pcre.jit=0 in my php.ini, and now I get another error:
Maybe this is something Webtrees related, or should I tune PHP some more? Greetings,
According to the documentation, I read that the default is
When do you get this error? On every page? On the sitemap page?
Just a shared webhoster that tries to limit their users... I asked for an upgrade to a VPS already.
Only when asking for the sitemap pages. The rest of Webtrees works great. The error only shows up in the php.log: you get an HTTP 500 return code, but the browser renders the sitemap OK.
That error is in the default page template - which isn't used for the sitemap files. So, this sounds odd.
Are you 100% certain that the 500 response is for the sitemap? You should either get a valid sitemap with a 200 response - or an error page with a 500.
Yes, I can trigger it 100% of the time with https://botsimulator.com and the URL https://www.stamboomwesterman.net/index.php?route=%2Fsitemap.xml. But I have a workaround now: I downloaded all the sitemaps, saved them as local XML files on my webserver, and pointed my robots.txt and Google Search to https://stamboomwesterman.net/sitemaplocal.xml. At least my website can be found with Google and Bing :-) Let's just say that my webhoster did something funky.
I can set the User Agent using a browser plugin. The botsimulator uses the same User Agent.
There is something strange happening on your server...
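For anyone who wants to reproduce this kind of check without a browser plugin, here is a rough Python sketch that requests a URL with a Googlebot-style User-Agent and reports the HTTP status. The helper names `build_request` and `fetch_status` are made up for this example. The User-Agent string is one that Google documents for its crawler, but the bot simulator mentioned above may send a different one.

```python
import urllib.error
import urllib.request

# One of the User-Agent strings documented for Googlebot (assumed here;
# other tools may send a different variant).
GOOGLEBOT_UA = (
    "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
)


def build_request(url, user_agent=GOOGLEBOT_UA):
    """Build a GET request that carries the given User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch_status(url, user_agent=GOOGLEBOT_UA, timeout=10):
    """Return the HTTP status code the server sends for this User-Agent."""
    try:
        request = build_request(url, user_agent)
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # urlopen raises for 4xx/5xx responses; the code is still useful.
        return error.code


# Example (needs network access):
# fetch_status("https://www.stamboomwesterman.net/index.php?route=%2Fsitemap.xml")
```

Comparing the status for the Googlebot string with the status for a normal browser string would show whether the server or a proxy in front of it treats the two differently.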
Let's close it, way too much energy in this now :-D Thanks for all your help!!
https://www.stamboomwesterman.net/index.php?route=%2Fsitemap.xml is the URL of my sitemap (I manually added the workaround you made a couple of days ago: [https://github.com//issues/3065])
This %2F in the URL doesn't look right. Google Search won't read the sitemap, though I can see it in a browser.