
[Question] Squidwarc Frontier Management and long scalable crawls #5

Open
N0taN3rd opened this issue Aug 20, 2018 · 3 comments

@N0taN3rd (Contributor)

One of the use cases I have wanted to support in Squidwarc is multiple worker crawlers populating and pulling from a single master frontier.

Another is a move from the current in-memory frontier to a more scalable frontier scheme.

Since warcworker is light years ahead in this regard 😍 (i.e. a frontend for Squidwarc with multiple crawler workers and the potential to expand into managing long crawls), I thought it best to see if warcworker has any interest in this functionality and, if so, to coordinate development 😃
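
As a rough illustration of the single-master-frontier idea, here is a minimal sketch assuming a Redis-backed queue plus a seen-set; the key names, the ioredis client, and the `crawlPage` callback are hypothetical stand-ins, not anything either project currently implements:

```ts
// Hypothetical master frontier shared by several crawler workers:
// a Redis list holds the queue of URLs to visit and a Redis set records
// URLs already enqueued, so workers both pull from and feed the same frontier.
import Redis from "ioredis";

const redis = new Redis(); // assumes Redis on localhost:6379

// Enqueue a URL only if no worker has seen it before.
async function enqueue(url: string): Promise<void> {
  const added = await redis.sadd("frontier:seen", url);
  if (added === 1) {
    await redis.lpush("frontier:queue", url);
  }
}

// Block until the master frontier yields the next URL.
async function nextUrl(): Promise<string> {
  const reply = await redis.brpop("frontier:queue", 0);
  return reply![1]; // timeout 0 blocks until an item arrives
}

// Each worker runs this loop; crawlPage stands in for the real crawler and
// returns the outlinks discovered on the page.
async function workerLoop(crawlPage: (url: string) => Promise<string[]>) {
  for (;;) {
    const url = await nextUrl();
    const outlinks = await crawlPage(url);
    await Promise.all(outlinks.map(enqueue));
  }
}
```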

@peterk (Owner) commented Aug 28, 2018

Sorry for the late reply @N0taN3rd! I would love to support further development, although I fear Squidwarc's code quality is way ahead of warcworker's right now. Please share any ideas you have. I am also looking at building a collection front end (or adapting to SFM later).

@N0taN3rd (Contributor, Author) commented Sep 5, 2018

A starting idea for this is a continuous crawl mode rather than the current one-off crawls, i.e. start a Squidwarc crawl and, once it completes, have it wait for another config to be sent to it or until it is killed.

Not sure yet about the best way to communicate with Squidwarc. Thinking websockets to keep the deps light(ish) for starters.
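
A minimal sketch of what that continuous mode could look like, assuming the ws package for the control channel; `CrawlConfig`, `runCrawl`, and the port are hypothetical stand-ins for whatever Squidwarc would actually expose:

```ts
// Sketch of a continuous crawl mode: the process stays alive and runs one
// crawl per config received over a websocket instead of exiting after a
// single one-off crawl.
import { WebSocketServer } from "ws";

interface CrawlConfig {
  seeds: string[];
  depth: number;
}

// Stand-in for the real Squidwarc crawl entry point.
async function runCrawl(config: CrawlConfig): Promise<void> {
  console.log(`would crawl ${config.seeds.length} seed(s) to depth ${config.depth}`);
}

const wss = new WebSocketServer({ port: 8787 }); // port chosen arbitrarily

wss.on("connection", (socket) => {
  socket.on("message", async (data) => {
    try {
      const config = JSON.parse(data.toString()) as CrawlConfig;
      await runCrawl(config); // run this crawl to completion, then wait for the next
      socket.send(JSON.stringify({ status: "done" }));
    } catch (err) {
      socket.send(JSON.stringify({ status: "error", message: String(err) }));
    }
  });
});
```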

@peterk (Owner) commented Sep 21, 2018

Right now the worker waits for the next item in the queue, so I guess it already works that way (but isolated from Squidwarc)? Or are you thinking of something else?

Possibly related: @Segerberg was interested in looking at settings to deduplicate and append to WARCs. Maybe there should be some kind of "set" concept, where a single crawler is responsible for deduplication and WARC appending for that set?
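
One rough way to picture that "set" idea, with the class name, digest scheme, and WARC path purely illustrative (this is not how either project currently works):

```ts
// Sketch of per-"set" dedup: each set keeps its own index of payload digests,
// and the single crawler responsible for the set checks that index before
// appending a full response record to the set's WARC (a duplicate could get
// a revisit record instead).
import { createHash } from "crypto";

class CrawlSet {
  private seenDigests = new Set<string>();

  constructor(public readonly id: string, public readonly warcPath: string) {}

  // Returns true the first time a payload is seen within this set.
  shouldWriteFullRecord(payload: Buffer): boolean {
    const digest = createHash("sha256").update(payload).digest("hex");
    if (this.seenDigests.has(digest)) {
      return false; // duplicate payload within the set
    }
    this.seenDigests.add(digest);
    return true;
  }
}

// Usage: one crawler owns the set, so the dedup state stays consistent.
const newsSet = new CrawlSet("news-sites", "collections/news-sites.warc.gz");
const body = Buffer.from("<html>example</html>");
console.log(newsSet.shouldWriteFullRecord(body)); // true  (first occurrence)
console.log(newsSet.shouldWriteFullRecord(body)); // false (duplicate)
```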
