One of the use cases I have wanted to support in Squidwarc is multiple worker crawlers populating and pulling from a single master frontier.
Another is a move from the current in-memory frontier to a more scalable frontier scheme.
Since warcworker is light years ahead in this regard 😍 (i.e. a frontend for Squidwarc with multiple crawler workers and the potential to expand into managing long crawls), I thought it best to see if warcworker has any interest in this functionality and, if so, to coordinate development 😃
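To make the "multiple workers, single master frontier" idea concrete, here is a minimal in-process sketch. It is not how Squidwarc or warcworker implement anything: `SharedFrontier`, `crawl`, and the `link_graph` dictionary (standing in for real page fetches) are hypothetical names, and threads stand in for separate crawler processes. A more scalable scheme would presumably swap the in-memory queue for an external store.

```python
import queue
import threading


class SharedFrontier:
    """Hypothetical thread-safe frontier: workers both pull from it and add to it."""

    def __init__(self, seeds=()):
        self._queue = queue.Queue()
        self._seen = set()
        self._lock = threading.Lock()
        for url in seeds:
            self.add(url)

    def add(self, url):
        """Queue a URL unless it was already seen; return True if it was queued."""
        with self._lock:
            if url in self._seen:
                return False
            self._seen.add(url)
        self._queue.put(url)
        return True

    def take(self):
        """Block until an item is available (None is the shutdown sentinel)."""
        return self._queue.get()

    def done(self):
        """Mark the item most recently taken by this worker as processed."""
        self._queue.task_done()

    def wait_until_drained(self):
        """Block until every queued URL has been taken and marked done."""
        self._queue.join()

    def stop_one_worker(self):
        """Queue one shutdown sentinel; each sentinel stops exactly one worker."""
        self._queue.put(None)


def crawl(seeds, link_graph, num_workers=3):
    """Crawl a fake link graph (url -> list of outlinks) with several workers."""
    frontier = SharedFrontier(seeds)
    crawled = []
    crawled_lock = threading.Lock()

    def worker():
        while True:
            url = frontier.take()
            if url is None:
                frontier.done()
                break
            # Outlinks are added before done() so the frontier never looks
            # drained while discovered URLs are still waiting to be queued.
            for link in link_graph.get(url, []):
                frontier.add(link)
            with crawled_lock:
                crawled.append(url)
            frontier.done()

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    frontier.wait_until_drained()
    for _ in threads:
        frontier.stop_one_worker()
    for t in threads:
        t.join()
    return crawled
```

Each URL is handed to exactly one worker, and the `_seen` set keeps a URL discovered by several workers from being queued twice.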
Sorry for the late reply, @N0taN3rd! I would love to support further development, although I fear Squidwarc's code quality is way ahead of warcworker's right now. Please share any ideas you have. I am also looking at building a collection front end (or adapting to SFM later).
A starting idea for this is a continuous crawl mode rather than the current one-off crawls, i.e. start a Squidwarc crawl and, once it completes, have it wait for another config to be sent to it or until it is killed.
I am not sure of the best way to communicate with Squidwarc. I am thinking websockets, to keep deps light(ish) for starters.
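As a rough illustration of the continuous mode described above, the loop below keeps a crawler alive between crawls and blocks until the next config arrives. This is only a sketch under assumptions: `continuous_crawl`, `run_crawl`, and the `STOP` sentinel are hypothetical names, and a standard-library `queue.Queue` stands in for whatever transport (websockets or otherwise) ends up delivering configs.

```python
import queue

# Hypothetical sentinel meaning "shut the crawler down".
STOP = object()


def continuous_crawl(configs, run_crawl):
    """Run one crawl per received config until STOP arrives.

    configs   -- a queue.Queue standing in for the real transport
    run_crawl -- hypothetical callable that performs a single crawl
    Returns the number of crawls completed.
    """
    completed = 0
    while True:
        config = configs.get()  # blocks until the next config is sent
        if config is STOP:
            break
        run_crawl(config)
        completed += 1
    return completed
```

Usage would look something like putting config dictionaries on the queue from the frontend side and putting `STOP` when the crawler should exit, instead of killing the process.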
Right now the worker is waiting for the next item in the queue so I guess it already works in that way (but isolated from Squidwarc)? Or are you thinking about something else?
Possibly related: @Segerberg was interested in looking at settings to dedup and append to WARCs. Maybe there should be some kind of "set" concept, where a single crawler is responsible for dedup and WARC appending for the set?