Google indexes Template, Helpcenter, and Notion Landing Pages on my Domain #63

Open
marcklingen opened this issue Nov 24, 2020 · 17 comments


@marcklingen
Contributor

marcklingen commented Nov 24, 2020

Hi @stephenou,

Thanks again for your ongoing development and support of this project! While #19 and #18 aim to fix this issue, I would like to reiterate why it is so important to exclude all pages that are not whitelisted, either with a 404 or with a noindex header.

Yesterday, I saw that Google had indexed 374 pages on my domain, labeled "Indexed, not submitted in sitemap". Here's a screenshot with examples:
image

While I am not sure about the legal implications, it would certainly be nice to have only my own pages in the index.

I am happy to contribute to the solution; let me know how I can best help.

@stephenou
Owner

Hey @marcklingen! Yep, I agree we should fix this. The solution I came up with is to 301-redirect unlisted pages back to the homepage.

See 7f273ed for what changes you need to make in your script.

Do you mind helping me test it out before I announce it widely? Thanks!
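
For readers following along, the redirect idea can be sketched roughly as below. This is a minimal sketch, not the code from 7f273ed: `MY_DOMAIN`, `SLUG_TO_PAGE`, `LISTED_PAGE_IDS`, and `decideRedirect` are hypothetical names with illustrative values, and it assumes that Notion page IDs show up in the path as 32 hexadecimal characters.

```javascript
// Hypothetical configuration: public slug -> Notion page ID (illustrative values only).
const MY_DOMAIN = "example.com";
const SLUG_TO_PAGE = {
  "": "771ef38657244c27b9389734a9cbff44",
  "about": "9a1b2c3d4e5f60718293a4b5c6d7e8f9",
};
const LISTED_PAGE_IDS = Object.values(SLUG_TO_PAGE);

// Returns the URL to 301-redirect to, or null if the request should be
// proxied as usual.
function decideRedirect(pathname) {
  const path = pathname.slice(1); // drop the leading "/"
  const match = path.match(/[0-9a-f]{32}/);
  // A path that contains a page ID which is not one of ours is treated as unlisted.
  if (match && !LISTED_PAGE_IDS.includes(match[0])) {
    return "https://" + MY_DOMAIN + "/";
  }
  return null;
}
```

In a Worker, a non-null result could be passed to `Response.redirect(target, 301)`. Note that a path without any page ID returns `null` here and is proxied unchanged.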

@marcklingen
Contributor Author

Thanks @stephenou, the redirect works for me. I'd also suggest adding X-Robots-Tag: noindex to the response headers.
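
As a rough illustration of that suggestion, the redirect could carry the header alongside `Location`. The helper name `redirectInit` is hypothetical, and whether crawlers act on `noindex` when it arrives on a redirect response is an assumption worth verifying.

```javascript
// Builds the init object for a redirect response that also asks crawlers
// not to index the requested URL. `redirectInit` is a hypothetical helper.
function redirectInit(location) {
  return {
    status: 301,
    headers: {
      "Location": location,
      "X-Robots-Tag": "noindex",
    },
  };
}

// In a Worker this could be used as:
//   return new Response(null, redirectInit("https://example.com/"));
```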

@marcklingen
Contributor Author

marcklingen commented Nov 26, 2020

I just found an exception to the rule in the screenshot. While this solution solves most of the problem, it does not address pages that lack the characteristic page ID, e.g. /tools-and-craft/01-andy-hertzfeld and /pricing.

@ThallyssonKlein

@marcklingen How did you manage to index the domain on Google? Mine does not appear ...

@marcklingen
Contributor Author

@ThallyssonKlein If your domain is not indexed automatically, you can submit /sitemap.xml in the Search Console.

@ThallyssonKlein

@marcklingen Where does this sitemap.xml file come from?

@marcklingen
Contributor Author

@ThallyssonKlein The sitemap is generated by the worker, you can find the line here:

if (url.pathname === "/sitemap.xml") {
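
To give a sense of what such a branch might return, here is a minimal sketch of building the sitemap body from a list of slugs. It is not the worker's actual code: `buildSitemap` and its parameters are hypothetical, and it assumes the slugs are already URL-safe, so no XML escaping is applied.

```javascript
// Builds a sitemap XML string for the given domain and public slugs.
// Assumes slugs contain no characters that need XML escaping.
function buildSitemap(domain, slugs) {
  const urls = slugs
    .map((slug) => "<url><loc>https://" + domain + "/" + slug + "</loc></url>")
    .join("");
  return (
    '<?xml version="1.0" encoding="UTF-8"?>' +
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">' +
    urls +
    "</urlset>"
  );
}

// In a Worker, the string could be returned with a content-type of application/xml.
const exampleSitemap = buildSitemap("example.com", ["", "about"]);
```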

@lasharor

I had the same problem and changed the code as per 7f273ed. The pages indexed by Google now all redirect to the homepage.

@marcklingen
Contributor Author

@lasharor That does not work for pages with a readable slug, such as /pricing.

@ThallyssonKlein

image

@marcklingen How long can indexing take?

@marcklingen
Contributor Author

@ThallyssonKlein Usually it takes a couple of days but less than a week.

@ThallyssonKlein

@marcklingen I have been notified that the pages are not being read correctly

image

@stephenou
Owner

Hey Marc, yeah, unfortunately it's true that slug-only pages like /pricing slip through. One solution is to define a denylist of URLs that Notion uses for marketing, but it's hard to keep it updated when Notion adds new pages. Another solution is to define an allowlist of URLs that your site serves, but it's also hard to keep it updated when you add new pages.
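
The allowlist option could look something like the sketch below. `ALLOWED_PATHS` and `checkAllowlist` are hypothetical names with made-up entries, and a real worker would presumably also need to let through whatever asset and API paths the proxied pages load, which this sketch ignores.

```javascript
// Hypothetical allowlist of paths the site owner actually publishes.
const ALLOWED_PATHS = new Set(["/", "/about", "/blog"]);

// Returns null when the request may be proxied; otherwise returns a
// description of a 404 response that also carries a noindex header.
function checkAllowlist(pathname) {
  // Treat "/about/" and "/about" as the same path.
  const normalized =
    pathname.length > 1 ? pathname.replace(/\/+$/, "") : pathname;
  if (ALLOWED_PATHS.has(normalized)) {
    return null;
  }
  return { status: 404, headers: { "X-Robots-Tag": "noindex" } };
}
```

The maintenance cost mentioned above is visible here: every new page has to be added to `ALLOWED_PATHS` by hand before it becomes reachable.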

@vlafriday

Hi @marcklingen, I know it's been many years, but do you happen to know the solution? I now have the same indexing problem, even though sitemap.xml has been added to Google Search Console.

I see that your website (https://marcklingen.com/) works without problems. How did you do that?
Could you share and help?

Screenshot_35
Screenshot_36

@stephenou

@marcklingen
Contributor Author

Hi @vlafriday, I switched to react-notion-x for this page while still using fruition for others. Based on the screenshot (which I do not fully understand), your problem looks different from the one I had: you have a 404, rather than too many pages in the index.

@vlafriday

Is your site indexed well in Google Search Console? Does it work? Are you using Vercel to deploy the code?


@marcklingen
Contributor Author

Yes, it works well for me, and I deploy on Vercel. You could also go for a managed solution like super.so, though I have not tried them.
