Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[MGR+1] Change, document `LOG_UNSERIALIZABLE_REQUESTS` #1610

Merged
merged 4 commits into from Jul 29, 2016

Conversation

@darshanime
Copy link
Contributor

@darshanime darshanime commented Nov 21, 2015

Changed LOG_UNSERIALIZABLE_REQUESTS to SCHEDULER_DEBUG. Document it in the docs. Changed behavior to log once and collect stats.
Fixes #1592

@codecov-io
Copy link

@codecov-io codecov-io commented Nov 21, 2015

Current coverage is 83.44% (diff: 28.57%)

Merging #1610 into master will decrease coverage by 0.02%

Powered by Codecov. Last update ec1c615...d8e62e6

@darshanime darshanime changed the title Change, document `LOG_UNSERIALIZABLE_REQUESTS` [WIP] Change, document `LOG_UNSERIALIZABLE_REQUESTS` Nov 21, 2015
@darshanime
Copy link
Contributor Author

@darshanime darshanime commented Nov 21, 2015

Kindly review @curita, @nramirezuy.
The codecov is complaining because I need to write tests yet, correct ?
Where should they go ? (I am thinking about test.util_reqser.py)

@@ -28,7 +28,7 @@ def from_crawler(cls, crawler):
dupefilter = dupefilter_cls.from_settings(settings)
dqclass = load_object(settings['SCHEDULER_DISK_QUEUE'])
mqclass = load_object(settings['SCHEDULER_MEMORY_QUEUE'])
logunser = settings.getbool('LOG_UNSERIALIZABLE_REQUESTS')

This comment has been minimized.

@nramirezuy

nramirezuy Nov 23, 2015
Contributor

We never completely remove older cases. This settings should still remain in the code with a deprecation warning.

This comment has been minimized.

@darshanime

darshanime Nov 25, 2015
Author Contributor

Made the change in the latest commit

@darshanime
Copy link
Contributor Author

@darshanime darshanime commented Feb 8, 2016

@nramirezuy, can you please review the PR? Thanks.

@lopuhin
Copy link
Member

@lopuhin lopuhin commented May 23, 2016

Anything blocking the merge here?
I have no opinion on the best name for the setting, but at least mentioning the setting in the docs could save a lot of debugging time (as the serialization error could be raised in several different places).

@@ -96,4 +96,6 @@ But this will::
somearg = response.meta['somearg']
print "the argument passed is:", somearg

If you wish to log the requests that couldn't be serialized, you can set the ``SCHEDULER_DEBUG`` setting to ``True`` in the project's settings page. It is ``False`` by default.

This comment has been minimized.

@redapple

redapple Jul 25, 2016
Contributor

Could you wrap the line around 80 chars?

This comment has been minimized.

@darshanime

darshanime Jul 25, 2016
Author Contributor

okay. Also, I'll pull in the latest code and merge the conflict. Is there anything else?

This comment has been minimized.

@darshanime

darshanime Jul 25, 2016
Author Contributor

okay. Also, I'll pull in the latest code and resolve the conflict. Is there anything else?


Default: ``False``

Setting to ``True`` will log the first unserializable request encountered.

This comment has been minimized.

@redapple

redapple Jul 25, 2016
Contributor

SCHEDULER_DEBUG reads like "enable to help debugging the scheduler component."
Wording it as "it's only to log unserializable requests" feels a bit limiting.

What about something along the lines of

enable this setting to log debug information about Scrapy requests scheduler.
This currently logs if requests cannot be serialized to disk. If this happens, a line will appear in the logs (only once), and a scheduler/unserializable stats counter tracks how many times this happened.

Including an example log line would also be good.

@darshanime darshanime force-pushed the darshanime:scheduler_debug branch from a7ffc5b to 0c77b6d Jul 25, 2016
@darshanime
Copy link
Contributor Author

@darshanime darshanime commented Jul 25, 2016

@redapple kindly review!
Travis builds, codecov wants tests:/

Default: ``False``

Setting to ``True`` will log debug information about the requests scheduler.
This currently logs(only once) if the requests cannot be serialized to disk.

This comment has been minimized.

@redapple

redapple Jul 25, 2016
Contributor

missing space: logs (only once)

{'request': request, 'reason': e},
msg = ("Unable to serialize request: %(request)s - reason:"
" %(reason)s - no more unserializable requests will be"
" logged (stats being collected)")

This comment has been minimized.

@redapple

redapple Jul 25, 2016
Contributor

(stats being collected) could be changed to something like (see 'scheduler/unserializable' stats counter)

@@ -96,4 +96,8 @@ But this will::
somearg = response.meta['somearg']
print "the argument passed is:", somearg

If you wish to log the requests that couldn't be serialized, you can set the
``SCHEDULER_DEBUG`` setting to ``True`` in the project's settings page.

This comment has been minimized.

@redapple

redapple Jul 25, 2016
Contributor

You can add a direct link to the setting:

you can set the :setting:`SCHEDULER_DEBUG` to ...

Example entry in logs::

1956-01-31 00:00:00+0800 [scrapy] ERROR: Unable to serialize request: <request>

This comment has been minimized.

@redapple

redapple Jul 25, 2016
Contributor

Can you use complete log line with request and exception, obfuscating domains if needed?

@redapple
Copy link
Contributor

@redapple redapple commented Jul 26, 2016

LGTM

@darshanime darshanime changed the title [WIP] Change, document `LOG_UNSERIALIZABLE_REQUESTS` [MGR] Change, document `LOG_UNSERIALIZABLE_REQUESTS` Jul 29, 2016
@darshanime
Copy link
Contributor Author

@darshanime darshanime commented Jul 29, 2016

@redapple, can we merge this?

@redapple redapple changed the title [MGR] Change, document `LOG_UNSERIALIZABLE_REQUESTS` [MGR+1] Change, document `LOG_UNSERIALIZABLE_REQUESTS` Jul 29, 2016
@redapple
Copy link
Contributor

@redapple redapple commented Jul 29, 2016

Fine by me. @kmike , @eliasdorneles , @dangra , what do you think?

@dangra
Copy link
Member

@dangra dangra commented Jul 29, 2016

LGTM.

@eliasdorneles
Copy link
Member

@eliasdorneles eliasdorneles commented Jul 29, 2016

Looks good, merging -- thanks @darshanime !

@eliasdorneles eliasdorneles merged commit 34e7dad into scrapy:master Jul 29, 2016
1 of 2 checks passed
1 of 2 checks passed
codecov/patch 28.57% of diff hit (target 100%)
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
@darshanime darshanime deleted the darshanime:scheduler_debug branch Jul 29, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

None yet

7 participants
You can’t perform that action at this time.