Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

An attempt to resolve #957, add signal to be sent when request is dropped by the scheduler #961

Merged
merged 2 commits into from Dec 2, 2014

Conversation

@ldmberman
Copy link
Contributor

@ldmberman ldmberman commented Nov 27, 2014

There are some ideas about how to solve the issue #957 in this patch. I am new to the project and thus may have missed something here.

Is it better to name the signal in a more specific way, eg request_dropped_by_dupefilter?

…pped by the scheduler
@@ -173,7 +173,9 @@ def crawl(self, request, spider):
def schedule(self, request, spider):
self.signals.send_catch_log(signal=signals.request_scheduled,
request=request, spider=spider)
return self.slot.scheduler.enqueue_request(request)

This comment has been minimized.

@kmike

kmike Nov 27, 2014
Member

I don't know if it matters, but the value of enqueue_request(request) was previously returned by schedule method. Standard Scheduler doesn't return anything from enqueue_request, but custom schedulers could. So this changes ExecutionEngines and Scheduler APIs.

@dangra @pablohoffman does it matter?

This comment has been minimized.

@nramirezuy

nramirezuy Nov 27, 2014
Contributor

I think it should keep returning the same, for backwards compatibility.

This comment has been minimized.

@ldmberman

ldmberman Nov 27, 2014
Author Contributor

What about adding a method to the scheduler which will validate the requests? Something like is_request_valid. Also, it can be called twice (in engine and in enqueue_request itself) to preserve backwards compatibility. I do not see troubles in current approach though - as far as I can see the result of this function is not used anywhere.

This comment has been minimized.

@dangra

dangra Nov 27, 2014
Member

IMO it won't break anyone code, and we haven't made any contracts on Scheduler api.

This comment has been minimized.

@kmike

kmike Nov 28, 2014
Member

I think that the updated enqueue_request API (returning False if request was not scheduled and True if it was) makes sense and it is better than just returning None.

@kmike
Copy link
Member

@kmike kmike commented Nov 27, 2014

The changes look good for me. +1 to merging once we resolve the following:

  1. It'd be good to add a test for this new signal - it should check that the signal works as expected (the existing test is more like a sanity test).
  2. We should check that ExecutionEngine and Scheduler API changes are ok.
@kmike
Copy link
Member

@kmike kmike commented Nov 28, 2014

LGTM.

@pablohoffman
Copy link
Member

@pablohoffman pablohoffman commented Dec 2, 2014

I'm OK with the api changes, @dangra ?

@nramirezuy
Copy link
Contributor

@nramirezuy nramirezuy commented Dec 2, 2014

I wonder which is the use case of this, also #977 doesn't exists here.

@dangra
Copy link
Member

@dangra dangra commented Dec 2, 2014

LGTM

@pablohoffman
Copy link
Member

@pablohoffman pablohoffman commented Dec 2, 2014

@nramirezuy he meant #957

@pablohoffman pablohoffman changed the title An attempt to resolve #977, add signal to be sent when request is dropped by the scheduler An attempt to resolve #957, add signal to be sent when request is dropped by the scheduler Dec 2, 2014
@nramirezuy
Copy link
Contributor

@nramirezuy nramirezuy commented Dec 2, 2014

What do you think about firing request_scheduled when enqueue_request(request) == True?

@ldmberman
Copy link
Contributor Author

@ldmberman ldmberman commented Dec 2, 2014

It seems that the issue author wants to monitor scheduled and dropped
requests both
02.12.2014 19:36 пользователь "Nicolás Alejandro Ramírez Quiros" <
notifications@github.com> написал:

What do you think about firing request_scheduled when enqueue_request(request)
== True?


Reply to this email directly or view it on GitHub
#961 (comment).

pablohoffman added a commit that referenced this pull request Dec 2, 2014
An attempt to resolve #957, add signal to be sent when request is dropped by the scheduler
@pablohoffman pablohoffman merged commit 1b03b12 into scrapy:master Dec 2, 2014
1 check passed
1 check passed
continuous-integration/travis-ci The Travis CI build passed
Details
@jacob1237
Copy link

@jacob1237 jacob1237 commented Dec 3, 2014

Yep, you right @ldmberman.
It will be great to have a possibility to know what requests was scheduled, and what requests was dropped.

So, if authors wants to leave request_scheduled signal as a main signal, I would suggest to create a third signal: request_accepted or request_passed, for example.

And the code will be:

def schedule(self, request, spider):
    self.signals.send_catch_log(signal=signals.request_scheduled,
            request=request, spider=spider)

    if not self.slot.scheduler.enqueue_request(request):
        self.signals.send_catch_log(signal=signals.request_dropped,
                                    request=request, spider=spider)
    else:
        self.signals.send_catch_log(signal=signals.request_passed,
                                    request=request, spider=spider)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

None yet

6 participants