Cannot create a sitemap for www.forthea.com, my app just freezes #52

Open
Chamomile11 opened this issue Nov 26, 2018 · 5 comments

@Chamomile11

Do you want to request a feature or report a bug?

BUG

What is the current behavior?

I am building a small console Node app. I have successfully created sitemaps for several thousand websites using this package. However, when I try to use it for www.forthea.com, my app hangs forever, even when I wrap the call in a promise with a timeout timer to force it to stop, and despite the package's own built-in timeout value of 30000 ms. The app freezes but does not crash, and no errors are reported.

If the current behavior is a bug, please provide the steps to reproduce.

const sitemapGenerator = require('sitemap-generator');
const generator = sitemapGenerator('http://www.forthea.com', {
  stripQuerystring: false
});

generator.start(); // Loops forever, no errors, no sitemap, even when used with a timer.

What is the expected behavior?

The package should be able to build a sitemap for www.forthea.com, or at least report an error.

@lgraubner
Owner

Will have a look at it tomorrow.

@Chamomile11
Author

> Will have a look at it tomorrow.

Alright, thank you Lars.

@Chamomile11
Author

It seems the package blocks the main event loop altogether when parsing www.forthea.com, so even timers set via setTimeout stop firing. I worked around the problem by running my sitemap generation code in a separate child process and killing it after a predetermined amount of time, which is what I originally needed. This way it works, and I can kill the child process even if it hangs.
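
For anyone hitting the same hang, here is a minimal sketch of the child-process approach, not the exact code from my app. It assumes Node's built-in `child_process.spawnSync` with its `timeout` and `killSignal` options; the helper name `runWithDeadline` is hypothetical, and the `while (true) {}` script is only a stand-in for a script that requires sitemap-generator and calls `generator.start()`.

```javascript
const { spawnSync } = require('child_process');

// Hypothetical helper: runs a snippet of Node code in a separate process and
// force-kills it if it has not exited within `timeoutMs`.
function runWithDeadline(script, timeoutMs) {
  const result = spawnSync(process.execPath, ['-e', script], {
    timeout: timeoutMs,    // the child is killed once this many ms have elapsed
    killSignal: 'SIGKILL', // cannot be caught or ignored by a hung child
    encoding: 'utf8',
  });
  // When the deadline is hit, spawnSync reports an error with code ETIMEDOUT.
  const timedOut = Boolean(result.error && result.error.code === 'ETIMEDOUT');
  return { timedOut, status: result.status, stdout: result.stdout };
}

// Stand-in for a crawl that blocks the event loop (assumption: the real
// script would require sitemap-generator and call generator.start()).
const hung = runWithDeadline('while (true) {}', 1000);
console.log(hung.timedOut ? 'child killed after deadline' : 'child finished');
```

Because the deadline is enforced from outside the child, it does not matter that the child's own timers never fire. `spawnSync` blocks the parent while it waits; if the parent has to stay responsive, the asynchronous `spawn` combined with `setTimeout` and `child.kill('SIGKILL')` in the parent should work along the same lines.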

@lgraubner
Owner

Interesting. It would still be good to know why the main event loop is blocked for this website in particular.

@Rob-Rychs
Contributor

That is weird, I've never experienced this infinite loop error. Are you setting the crawler to respect robots.txt? @Chamomile11


3 participants