Only one URL has been discovered #71

Open
jean-christophe-manciot opened this issue Mar 7, 2020 · 6 comments
@jean-christophe-manciot

Do you want to request a feature or report a bug?
bug

$ npm install -S sitemap-generator
npm WARN saveError ENOENT: no such file or directory, open '/home/actionmystique/.config/sitemap-generator/package.json'
npm notice created a lockfile as package-lock.json. You should commit this file.
npm WARN enoent ENOENT: no such file or directory, open '/home/actionmystique/.config/sitemap-generator/package.json'
npm WARN sitemap-generator No description
npm WARN sitemap-generator No repository field.
npm WARN sitemap-generator No README data
npm WARN sitemap-generator No license field.

+ sitemap-generator@8.4.2
added 39 packages from 64 contributors and audited 58 packages in 4.133s
found 0 vulnerabilities

sitemap-generator.js:

const SitemapGenerator = require('sitemap-generator');

// create generator
const generator = SitemapGenerator('https://git.sdxlive.com', {
  filepath: './sitemap.xml',
  lastMod: true,
  maxDepth: 9999,
  maxEntriesPerFile: 50000,
  stripQuerystring: true
});

// register event listeners
generator.on('done', () => {
  // sitemaps created
});

// start the crawler
generator.start();

$ node sitemap-generator.js

leads to sitemap.xml:

<?xml version="1.0" encoding="utf-8" standalone="yes" ?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://git.sdxlive.com/</loc>
    <lastmod>2020-03-07</lastmod>
  </url>
</urlset>

@lgraubner What am I missing?

@jean-christophe-manciot
Author

Same issue with sitemap-generator-cli:

$ sudo npm install -g sitemap-generator-cli
/usr/local/bin/sitemap-generator -> /usr/local/lib/node_modules/sitemap-generator-cli/index.js
+ sitemap-generator-cli@7.5.0
added 47 packages from 67 contributors in 2.363s
$ sitemap-generator --last-mod https://git.sdxlive.com

sitemap.xml:

<?xml version="1.0" encoding="utf-8" standalone="yes" ?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://git.sdxlive.com/</loc>
    <lastmod>2020-03-07</lastmod>
  </url>
</urlset>

@dhruvkaushal11

Facing the same issue. Did anyone find a workaround?

@lgraubner lgraubner added the bug label May 4, 2020
@hanshou101

hanshou101 commented Nov 21, 2020

I also encountered the same problem with my local VuePress documentation website.

@kevinvella1

We are experiencing this same problem. Any update on this?

@Wintermute79

Same here. It's simply not working: it generates a sitemap with the links on the initial URL only. No deeper crawling.

@dkoo761

dkoo761 commented Feb 25, 2023

For anyone else who comes across this: if only your root page is included in the sitemap, it usually means that your site's pages are rendered client-side by a JavaScript framework such as React or Vue. Since the sitemap crawler doesn't execute JavaScript, it just sees a mostly blank page with no links to follow. You can confirm this by running curl YOUR_DOMAIN from your terminal. If the page <body> is mostly empty and doesn't contain your actual page HTML, then you have this problem.
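The same check can be scripted. Below is a minimal sketch, not part of sitemap-generator, that fetches the raw HTML and counts the links a non-JavaScript crawler could follow. The helper names `countAnchorLinks` and `inspect` are made up for illustration, the regex is a rough heuristic rather than a real HTML parser, and the global `fetch` assumes Node.js 18 or newer.

```javascript
// Count <a href="..."> links in raw, un-rendered HTML.
// Rough heuristic only: a regex is not a full HTML parser.
function countAnchorLinks(html) {
  const matches = html.match(/<a\s[^>]*href\s*=\s*["'][^"']+["']/gi);
  return matches ? matches.length : 0;
}

// Fetch the page the way a crawler that does not run JavaScript sees it.
// Assumes Node.js 18+ for the global fetch().
async function inspect(url) {
  const response = await fetch(url);
  const html = await response.text();
  const links = countAnchorLinks(html);
  console.log(`${url}: ${links} link(s) found in the raw HTML`);
  return links;
}
```

Calling `inspect('https://YOUR_DOMAIN')` and getting zero (or only a handful of) links on a site that clearly has many pages suggests the content is being rendered client-side.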

A couple of solutions:

  1. Use server-side rendering with your frontend framework (like Next.js for React or Nuxt.js for Vue) to generate complete HTML pages on the server.

  2. Use a prerendering service like prerender.io or ostr.io to pre-render your pages for search engine crawlers. You can then build the sitemap by telling sitemap-generator to pretend it's Googlebot, so that your site returns the full prerendered HTML page to sitemap-generator. Using the CLI version:

sitemap-generator --verbose --max-concurrency 2 --user-agent "Googlebot/2.1 (+http://www.google.com/bot.html)" YOUR_DOMAIN
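For the library version used in the original report, a rough equivalent might look like the sketch below. It assumes that the `userAgent` and `maxConcurrency` options are accepted by the installed sitemap-generator release, as its README describes, so check this against your version. `buildOptions` and `crawl` are hypothetical helper names, and the `add` listener only approximates `--verbose`.

```javascript
// User agent string taken from the CLI command above.
const GOOGLEBOT_UA = 'Googlebot/2.1 (+http://www.google.com/bot.html)';

// Build generator options mirroring the CLI flags.
// Assumption: userAgent and maxConcurrency are supported by your version.
function buildOptions(userAgent) {
  return {
    filepath: './sitemap.xml',
    maxConcurrency: 2, // counterpart of --max-concurrency 2
    userAgent: userAgent, // counterpart of --user-agent
  };
}

// Start a crawl of the given domain and log progress.
function crawl(domain) {
  // Required lazily so the helpers above load without the package installed.
  const SitemapGenerator = require('sitemap-generator');
  const generator = SitemapGenerator(domain, buildOptions(GOOGLEBOT_UA));

  generator.on('add', (url) => console.log('added', url));
  generator.on('error', (error) => console.error('error', error));
  generator.on('done', () => console.log('sitemap written'));

  generator.start();
  return generator;
}
```

Calling `crawl('https://YOUR_DOMAIN')` starts the crawl. This only helps if the site (or the prerendering service in front of it) actually serves prerendered HTML to that user agent.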
