Per-project job limit #140

Closed
wants to merge 35 commits into from

luc4smoreira

I developed this new feature to limit the maximum number of jobs per project. Please check whether it is of interest.

@Digenis
Member

Digenis commented Apr 7, 2016

Hi,

What is your use-case for this feature?
Do you have problems with projects monopolizing resources?
If so, how did this happen?
What kind of projects do you run in the same scrapyd instance?

Btw, try running the tests locally before opening a PR
instead of waiting for TravisCI.

@luc4smoreira
Author

Hello Digenis.

I want to use Scrapyd in a production environment. There are a lot of spider projects. Some of them run only occasionally (monthly) but take about 3 days to complete all of their jobs, around 500 in total. So I don't want the other projects' jobs to be blocked when such a project starts.

I found other users that need this kind of feature too, like this one:
https://groups.google.com/forum/#!topic/scrapy-users/FME7PVpD2k8

I will work on fixing the tests today, if I have time, and push the code to this branch.
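
The starvation scenario described above can be illustrated with a minimal sketch. It is not the code from this PR: `next_job`, `max_per_project`, and the project and job names are all hypothetical, and the cap is assumed to be the same for every project.

```python
from collections import Counter


def next_job(pending, running, max_per_project):
    """Pop and return the first pending job whose project is under its cap.

    pending: list of (project, job_id) tuples in arrival order.
    running: Counter mapping project name to its number of running jobs.
    max_per_project: hypothetical cap applied to every project.
    """
    for index, (project, job_id) in enumerate(pending):
        if running[project] < max_per_project:
            pending.pop(index)
            running[project] += 1
            return project, job_id
    return None  # every pending job belongs to a project already at its cap


# A large "monthly" project queued ahead of a single "daily" job.
pending = [("monthly", "m1"), ("monthly", "m2"), ("monthly", "m3"), ("daily", "d1")]
running = Counter()
started = [next_job(pending, running, max_per_project=2) for _ in range(4)]
```

With a cap of 2, the third call skips the remaining `monthly` job and starts the `daily` one, and the fourth call returns `None` because the only job left belongs to a project at its cap.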

@Digenis
Member

Digenis commented Apr 7, 2016

I will need someone who's been more involved in the poller/launcher to review this when ready.

@luc4smoreira
Author

Sorry about the mess in the Travis history.

I fixed the unit test by mocking, using this module: https://pypi.python.org/pypi/mock

I am looking into how to add this package to the Travis build.
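
For readers unfamiliar with the approach, here is a minimal sketch of a unit test that replaces a collaborator with a mock. It is not the test from this PR: `count_pending` and the `queue` object are hypothetical. On Python 3.3 and later the same API ships in the standard library as `unittest.mock`, which is what the sketch imports; with the PyPI package linked above, `import mock` would take its place.

```python
import unittest
from unittest import mock


def count_pending(queue):
    """Hypothetical helper: ask a queue object how many jobs are waiting."""
    return queue.count()


class CountPendingTest(unittest.TestCase):
    def test_uses_queue_count(self):
        # The mock stands in for a real queue, so no storage backend is needed.
        queue = mock.Mock()
        queue.count.return_value = 3

        self.assertEqual(count_pending(queue), 3)
        queue.count.assert_called_once_with()
```

When the standard-library module is used, the CI configuration needs no extra dependency for the test to run.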

@Digenis Digenis added this to the 1.2 milestone May 22, 2016
schedules in Postgres; this behavior can be enabled through the
configuration 'enable_postgres_persist = true'. This will make it possible to
reprocess jobs.
request_count) to the database at the end of a run. In addition,
parameters were added to the configuration file for the S3 buckets
used to store the log and items files.
@Digenis Digenis mentioned this pull request Apr 9, 2021
@Digenis Digenis changed the title from "Limits the maximum jobs per project" to "Per-project job limit" Apr 13, 2021
@jpmckinney jpmckinney mentioned this pull request Sep 23, 2021
@jpmckinney
Contributor

This PR has severe conflicts. Would any of the contributors be able to resolve them? If not, I will close the PR and create an issue instead (or defer to #197 as suggested in #389).

@pawelmhm
Contributor

This introduces Postgres and RabbitMQ as dependencies, which will increase technical debt. Also, some of the things added here were already done in a simpler way using SQLite in #359, which has been merged.

So in this form this PR cannot be merged.

Ideally, we should just allow configurable pollers. Right now we load the QueuePoller class by default, but we could make it possible for people to write pollers of any complexity themselves. The same goes for the scheduler. I think Scrapyd should be basic and simple, but should provide building blocks for extending it with the functionality you need. The functionality in this PR could be added as a custom extension in a specific Scrapyd project, and Scrapyd should just let people integrate it easily by making all core components configurable.
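
One way to picture "configurable core components" is to name each component's class in the settings as a dotted path and import it at startup. The sketch below is only an illustration of that idea, not Scrapyd's actual API: `load_object`, `build_component`, `DEFAULTS`, and the `poller` key are hypothetical, and standard-library classes stand in for the default and user-written poller classes so the example runs on its own.

```python
from importlib import import_module


def load_object(path):
    """Import and return the object named by a dotted path like 'pkg.mod.Class'."""
    module_path, _, name = path.rpartition(".")
    if not module_path:
        raise ValueError(f"not a full dotted path: {path!r}")
    return getattr(import_module(module_path), name)


# Hypothetical defaults; queue.Queue is only a stand-in for a default poller class.
DEFAULTS = {"poller": "queue.Queue"}


def build_component(settings, key):
    """Return the class configured for `key`, falling back to the default."""
    return load_object(settings.get(key, DEFAULTS[key]))


# No override: the default class is used.
poller_cls = build_component({}, "poller")

# An override in the settings swaps in a different class without touching core code.
custom_cls = build_component({"poller": "queue.LifoQueue"}, "poller")
```

Under this design, a per-project job limit would live in a user-supplied poller class named in the settings, and the core would not need to know about it.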

@jpmckinney jpmckinney modified the milestones: 1.3.0, 1.4.0 May 13, 2022
@jpmckinney jpmckinney closed this Feb 2, 2023
5 participants