Configurable spider queue class #197

Closed
bezkos opened this issue Dec 26, 2016 · 4 comments · Fixed by #476

Comments


bezkos commented Dec 26, 2016

I have started to implement a custom job queue so that I can use a job queue shared through PostgreSQL.
I tried to use the SPIDER_QUEUE_CLASS setting, but I found that it is missing from Scrapyd. Is it possible to have this setting back? I want to avoid patching the Scrapyd code. I think this feature is important for anyone who wants to use Scrapyd in a multi-server environment.

Digenis changed the title from "Shared Job Queue with Postgresql" to "Configurable spider queue class" on Jan 4, 2017
Member

Digenis commented Jan 4, 2017

For the record, here is a summary of my reply from the mailing list.

The SPIDER_QUEUE_CLASS setting dates from the time when scrapyd was part of scrapy, not only as a package but as a module: scrapy/scrapy@75e2c3e
Unfortunately, the release notes do not cover all the details of scrapyd's separation as a module (and covering them would probably be impractical).

Scrapyd itself never had this setting.

Member

Digenis commented Jan 4, 2017

It seems easy to implement such a setting.

diff --git a/scrapyd/default_scrapyd.conf b/scrapyd/default_scrapyd.conf
index 0da344f..e2b0c35 100644
--- a/scrapyd/default_scrapyd.conf
+++ b/scrapyd/default_scrapyd.conf
@@ -15,4 +15,5 @@ runner      = scrapyd.runner
 application = scrapyd.app.application
 launcher    = scrapyd.launcher.Launcher
+spiderqueue = scrapyd.spiderqueue.SqliteSpiderQueue
 webroot     = scrapyd.website.Root
 
diff --git a/scrapyd/utils.py b/scrapyd/utils.py
index 602a726..c96add0 100644
--- a/scrapyd/utils.py
+++ b/scrapyd/utils.py
@@ -9,8 +9,11 @@ import json
 from twisted.web import resource
 
-from scrapyd.spiderqueue import SqliteSpiderQueue
+from scrapy.utils.misc import load_object
 from scrapyd.config import Config
 
 
+DEFAULT_SPIDERQUEUE = 'scrapyd.spiderqueue.SqliteSpiderQueue'
+
+
 class JsonResource(resource.Resource):
 
@@ -57,8 +60,9 @@ def get_spider_queues(config):
     if not os.path.exists(dbsdir):
         os.makedirs(dbsdir)
+    spiderqueue = load_object(config.get('spiderqueue', DEFAULT_SPIDERQUEUE))
     d = {}
     for project in get_project_list(config):
         dbpath = os.path.join(dbsdir, '%s.db' % project)
-        d[project] = SqliteSpiderQueue(dbpath)
+        d[project] = spiderqueue(dbpath)
     return d
 

I notice that the built-in spider queue should be merged with the sqlite priority queue,
which may eventually also be replaced by https://github.com/scrapy/queuelib
Such a plan shouldn't prevent us from introducing the above config option,
as long as the queue interface remains the same.
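
To make the idea concrete, here is a rough sketch of how a custom queue class could plug into the `spiderqueue` option from the diff above. It is not Scrapyd code. `InMemorySpiderQueue` is a hypothetical class, the method names (`add`, `pop`, `count`, `list`, `remove`, `clear`) are my assumption about the interface the built-in queue exposes, the local `load_object` is a stand-in for `scrapy.utils.misc.load_object` so that the example runs without Scrapy installed, and `get_spider_queues` takes a plain dict plus explicit `projects` and `dbsdir` arguments, which is a simplification of the real function.

```python
import importlib
import os
from collections import deque

DEFAULT_SPIDERQUEUE = 'scrapyd.spiderqueue.SqliteSpiderQueue'


def load_object(path):
    """Stand-in for scrapy.utils.misc.load_object: resolve 'pkg.module.Name'."""
    module_path, _, name = path.rpartition('.')
    module = importlib.import_module(module_path)
    return getattr(module, name)


class InMemorySpiderQueue:
    """Hypothetical queue. The method set is an assumed interface, not a verified one."""

    def __init__(self, dbpath=None):
        # The patch calls the class with a database path; a queue backed by a
        # shared server would probably ignore it or treat it as a namespace.
        self.dbpath = dbpath
        self._jobs = deque()

    def add(self, name, **spider_args):
        # Store one job as a dict holding the spider name and its arguments.
        self._jobs.append(dict(spider_args, name=name))

    def pop(self):
        # Return the oldest job, or None when the queue is empty.
        return self._jobs.popleft() if self._jobs else None

    def count(self):
        return len(self._jobs)

    def list(self):
        # Return copies so that callers cannot mutate queued jobs.
        return [dict(job) for job in self._jobs]

    def remove(self, func):
        # Drop every job for which func(job) is true; return how many were dropped.
        kept = deque(job for job in self._jobs if not func(job))
        removed = len(self._jobs) - len(kept)
        self._jobs = kept
        return removed

    def clear(self):
        self._jobs.clear()


def get_spider_queues(config, projects, dbsdir):
    """Simplified version of the patched function: one queue instance per project."""
    queue_cls = load_object(config.get('spiderqueue', DEFAULT_SPIDERQUEUE))
    return {
        project: queue_cls(os.path.join(dbsdir, '%s.db' % project))
        for project in projects
    }
```

If the class lived in a hypothetical module `myqueues.py` on the import path, the config line would read `spiderqueue = myqueues.InMemorySpiderQueue`. A PostgreSQL-backed queue for a multi-server setup would keep the same methods and replace the deque with queries against a shared table.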

Member

Digenis commented Jun 16, 2019

An update on my last comment about replacing this queue module with scrapy/queuelib: there are more projects to consider.
https://github.com/balena/python-pqueue
https://github.com/peter-wangxu/persist-queue

Contributor

jpmckinney commented Mar 8, 2023

Discussion of a replacement for the default spider queue has moved to #475.

Note that this issue was postponed "in favour of #187 solution (Unify queues/dbs)" (see #201 (comment)).

However, #187 has gone nowhere since 2016, and if we do decide to make breaking changes, we can just do that in a major version. So, this is no longer postponed.
