Parallel engine not working #22
Comments
Hi, can you share what version of AbotX you are using and post some initialization/configuration code and/or some log files (set to level debug)? |
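For anyone gathering the debug-level logs requested above, here is a minimal sketch. It assumes the application logs through Serilog (the output format later in this thread looks like Serilog's) and that the Serilog and Serilog.Sinks.Console packages are installed. The class and method names are hypothetical.

```csharp
using Serilog;

public static class LoggingSetup
{
    // Hypothetical helper: call once at startup, before creating the crawl engine.
    public static void ConfigureDebugLogging()
    {
        Log.Logger = new LoggerConfiguration()
            .MinimumLevel.Debug()   // emit debug-level events and above
            .WriteTo.Console()      // needs the Serilog.Sinks.Console package
            .CreateLogger();
    }
}
```

If the project configures logging some other way, raise the minimum level there instead.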
Hi,
I am using version 2.1.6. The crawler is stopping after the first page:
    private static CrawlConfigurationX GetSafeConfig()
    {
        /* The following settings will help not get your ip banned
           by the sites you are trying to crawl. The idea is to crawl
           at most 100 pages and wait 2 seconds between http requests.
        */
        return new CrawlConfigurationX
        {
            MaxPagesToCrawl = 100,
            MinCrawlDelayPerDomainMilliSeconds = 2000
        };
    }
    private static async Task DemoParallelCrawlerEngine()
    {
        var siteToCrawlProvider = new SiteToCrawlProvider();
        var config = GetSafeConfig();
        var crawlEngine = new ParallelCrawlerEngine(
            config);
        crawlEngine.Impls.SiteToCrawlProvider.AddSitesToCrawl(new List<SiteToCrawl>
        {
            new SiteToCrawl{ Uri = new Uri("https://chubbydeveloper.com"), Id = Guid.NewGuid() },
        });
        await crawlEngine.StartAsync();
    }
|
Try initializing like the following and see if that helps...
    var crawlEngine = new ParallelCrawlerEngine(
        config,
        new ParallelImplementationOverride(config)
        {
            SiteToCrawlProvider = siteToCrawlProvider;
        });

    var crawlEngine = new ParallelCrawlerEngine(
        config,
        new ParallelImplementationOverride(config,
            new ParallelImplementationContainer()
            {
                SiteToCrawlProvider = siteToCrawlProvider,
                WebCrawlerFactory = new WebCrawlerFactory(config) // Same config will be used for every crawler
            })
        );
|
Worse, it didn't even crawl one site.
Output:
[2020-04-27 23:27:56:484 +02:00] [1] [INF] - Started [1] thread for monitoring
[2020-04-27 23:27:56:486 +02:00] [7] [INF] - Engine is still running...
[2020-04-27 23:27:56:518 +02:00] [1] [INF] - Started [1] thread running ISiteToCrawlProducer of type [AbotX2.Parallel.SiteToCrawlProducer]
[2020-04-27 23:27:56:525 +02:00] [8] [INF] - Retrieving up to [5] sites to crawl
[2020-04-27 23:27:56:529 +02:00] [1] [INF] - Started [2] threads running ISiteToCrawlConsumer of type [AbotX2.Parallel.SiteToCrawlConsumer]
[2020-04-27 23:27:56:536 +02:00] [8] [INF] - Retrieved [0] sites to crawl
[2020-04-27 23:27:56:540 +02:00] [8] [INF] - ISiteToCrawlProvider [AbotX2.Parallel.SiteToCrawlProvider] is reporting that it is complete. Will not make anymore requests for sites to crawl.
[2020-04-27 23:27:57:674 +02:00] [13] [INF] - All ISiteToCrawlConsumer threads have completed
[2020-04-27 23:27:57:680 +02:00] [11] [INF] - All ISiteToCrawlProducer threads have completed
[2020-04-27 23:27:57:684 +02:00] [13] [INF] - All crawls have completed
|
So there are 2 issues here, neither of which is your fault.
1. The docs and my example suggestion above used incorrect syntax. I updated the docs and that comment above. Notice the ParallelImplementationContainer as a parameter to ParallelImplementationOverride.
2. There was a bug in the code that should now be fixed in version 2.1.7.
Let me know if that fixes your issues. |
Thanks
Will test as soon as the new version is out
Simon
|
It's out there whenever you are ready. |
Closing issue. Please reopen issue if problem persists. |
I am trying to test the parallel engine, but it is not working: it returns after the first page is crawled. I am testing with a license. I would consider upgrading if it works.
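For reference, the pieces posted in this thread can be combined into one program. This is a sketch rather than verified code: it assumes AbotX 2.1.7 or later, the `using AbotX2.*` namespace names are guesses based on the type names in the log output, and the class name is hypothetical. Adding the site to the provider before constructing the engine is an ordering choice made here, not something the thread confirms is required.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;
// Assumed namespaces; adjust to match the installed AbotX version.
using AbotX2.Parallel;
using AbotX2.Poco;

public static class ParallelCrawlDemo // hypothetical class name
{
    public static async Task Main()
    {
        // Same settings as the configuration posted in the thread.
        var config = new CrawlConfigurationX
        {
            MaxPagesToCrawl = 100,
            MinCrawlDelayPerDomainMilliSeconds = 2000
        };

        // Queue the site on the provider instance that is handed to the engine.
        var siteToCrawlProvider = new SiteToCrawlProvider();
        siteToCrawlProvider.AddSitesToCrawl(new List<SiteToCrawl>
        {
            new SiteToCrawl { Uri = new Uri("https://chubbydeveloper.com"), Id = Guid.NewGuid() }
        });

        // Corrected syntax from the thread: the container is passed as a
        // constructor argument to ParallelImplementationOverride.
        var crawlEngine = new ParallelCrawlerEngine(
            config,
            new ParallelImplementationOverride(config,
                new ParallelImplementationContainer
                {
                    SiteToCrawlProvider = siteToCrawlProvider,
                    WebCrawlerFactory = new WebCrawlerFactory(config) // same config for every crawler
                }));

        await crawlEngine.StartAsync();
    }
}
```

If the log still reports "Retrieved [0] sites to crawl", check first that the sites were added to the same provider instance that was passed to the engine.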