Log cipher, certificate and temp key info on establishing an SSL connection #3450
Such information is super-useful, but +1 to not logging it by default, as it is only needed in very specific cases. It seems you can access settings from HTTP11DownloadHandler and HTTP10DownloadHandler, and pass arguments to DOWNLOADER_CLIENTCONTEXTFACTORY, which can pass them to ScrapyClientTLSOptions. It could also be a good chance to introduce from_settings/from_crawler support for more Scrapy components, instead of passing just specific option values - basically, create objects using
Codecov Report
@@            Coverage Diff             @@
##           master    #3450      +/-   ##
==========================================
+ Coverage   85.39%   85.45%   +0.06%
==========================================
  Files         169      165       -4
  Lines        9687     9624      -63
  Branches     1445     1446       +1
==========================================
- Hits         8272     8224      -48
+ Misses       1166     1146      -20
- Partials      249      254       +5
@wRAR tests are failing for pypy/py27/jessie, and it looks like a genuine failure - could you please take a look? Alternatively, we may say this feature is for the next Scrapy release, which doesn't support Python 2.7 (and check that this PR works with the new baseline set of dependencies).
@wRAR can you please also add a test for the new option? I'm not sure it should be super-detailed and cover all code paths for all OpenSSL versions, but checking that after enabling this option crawling still works, and there are some messages which make sense, could be nice.
        super(ScrapyClientContextFactory, self).__init__(*args, **kwargs)
        self._ssl_method = method
        if settings:
            self.tls_verbose_logging = settings['DOWNLOADER_CLIENT_TLS_VERBOSE_LOGGING']
-            self.tls_verbose_logging = settings['DOWNLOADER_CLIENT_TLS_VERBOSE_LOGGING']
+            self.tls_verbose_logging = settings.getbool('DOWNLOADER_CLIENT_TLS_VERBOSE_LOGGING')
Without it, -s DOWNLOADER_CLIENT_TLS_VERBOSE_LOGGING=0 on the command line may be evaluated as True, because self.tls_verbose_logging will be the string '0'.
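The pitfall can be reproduced without Scrapy: any non-empty string is truthy in Python, so a value that arrives as the string '0' passes an `if` check. The following is a minimal sketch; `coerce_bool` is a hypothetical helper that only approximates the kind of coercion `settings.getbool` performs, and is not Scrapy's code.

```python
def coerce_bool(value):
    """Hypothetical helper: interpret a setting value as a boolean."""
    try:
        # "0", "1", 0, 1, True and False all go through int() first.
        return bool(int(value))
    except (TypeError, ValueError):
        if value in ("True", "true"):
            return True
        if value in ("False", "false"):
            return False
        raise ValueError(f"Cannot interpret {value!r} as a boolean")


raw = "0"  # a command-line override delivers a string, not an int

naive = bool(raw)           # truthiness of a non-empty string
coerced = coerce_bool(raw)  # parsed as an integer before the bool() call
```

Here `naive` ends up True while `coerced` ends up False, which is the difference the suggested change is meant to address.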
@@ -28,9 +29,17 @@ class ScrapyClientContextFactory(BrowserLikePolicyForHTTPS):
     understand the SSLv3, TLSv1, TLSv1.1 and TLSv1.2 protocols.'
     """

-    def __init__(self, method=SSL.SSLv23_METHOD, *args, **kwargs):
+    def __init__(self, method=SSL.SSLv23_METHOD, settings=None, *args, **kwargs):
I think the common practice is to pass tls_verbose_logging, not a Settings instance, and to extract the option value in from_settings / from_crawler.
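A rough sketch of that pattern, for illustration only: `ExampleContextFactory` is a hypothetical stand-in rather than Scrapy's `ScrapyClientContextFactory`, the string `"SSLv23"` stands in for `SSL.SSLv23_METHOD`, and a plain dict stands in for a Settings object. With real Scrapy settings the lookup would presumably be `settings.getbool(...)`.

```python
class ExampleContextFactory:
    """Hypothetical stand-in; not Scrapy's actual context factory."""

    def __init__(self, method="SSLv23", tls_verbose_logging=False):
        # The constructor receives plain values and knows nothing about settings.
        self._ssl_method = method
        self.tls_verbose_logging = tls_verbose_logging

    @classmethod
    def from_settings(cls, settings, method="SSLv23"):
        # The setting name is looked up in exactly one place.
        raw = settings.get("DOWNLOADER_CLIENT_TLS_VERBOSE_LOGGING", False)
        # Simplified coercion for this sketch: only "1" and "true" enable logging.
        verbose = str(raw).strip().lower() in ("1", "true")
        return cls(method=method, tls_verbose_logging=verbose)


factory = ExampleContextFactory.from_settings(
    {"DOWNLOADER_CLIENT_TLS_VERBOSE_LOGGING": "1"}
)
```

One consequence of this split is that the class can be constructed directly in tests with `tls_verbose_logging=True`, without building a settings object.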
I'm fine with changing this, though I don't like that we need to list all optional args in the documentation and the error message, since I was going to pass yet another setting here in a different PR. Maybe we can rephrase the messages.

    @classmethod
    def from_settings(cls, settings, method=SSL.SSLv23_METHOD, *args, **kwargs):
        return cls(method=method, settings=settings, *args, **kwargs)
        if settings:
I think settings should always be passed here
Thanks @wRAR for the fix and @Gallaecio for the review!
Fix #2111. Also related to #2726, though that ticket most likely asks for programmatic access to all of this (I don't know if it's easily doable).
Output example:
I'm not sure this should be enabled by default as a lot of websites are now HTTPS and this will be printed on all requests including redirects, but I don't know how to access settings in this class.
Note that there is a lot of direct FFI code to access the temporary key params; I hope it works correctly and doesn't cause memory leaks. It also requires OpenSSL 1.0.2, but I couldn't test the check with an older OpenSSL, as I couldn't run tests on an actual jessie system.
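For illustration, a version gate of this kind might look like the sketch below. This is an assumption-laden example, not the check used in the PR: `supports_temp_key_info` and `MIN_TEMP_KEY_VERSION` are hypothetical names, and `ssl.OPENSSL_VERSION_INFO` reports the OpenSSL linked into Python's own `ssl` module, which may differ from the library pyOpenSSL actually uses.

```python
import ssl

# Minimum OpenSSL version stated in the PR description.
MIN_TEMP_KEY_VERSION = (1, 0, 2)


def supports_temp_key_info(version_info):
    """Hypothetical gate: compare (major, minor, fix) against the minimum."""
    return tuple(version_info[:3]) >= MIN_TEMP_KEY_VERSION


# The stdlib exposes a 5-tuple: (major, minor, fix, patch, status).
can_log_temp_key = supports_temp_key_info(ssl.OPENSSL_VERSION_INFO)
```

Keeping the comparison in a function that takes the version tuple as an argument makes the older-OpenSSL branch testable without an actual jessie system.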
There is other info that could be added: look at the Connection methods (easy) and at the openssl s_client output (may be harder, as with the temp key info).
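As a sketch of the "easy" part, the function below collects a few fields through methods that pyOpenSSL's `Connection` provides (`get_protocol_version_name`, `get_cipher_name`, `get_cipher_bits`, `get_peer_certificate`). The name `describe_tls_connection` is hypothetical, `conn` is assumed to be a connection whose handshake has completed, and the stub classes with their values are placeholders so the example runs without a network, not real output.

```python
def describe_tls_connection(conn):
    """Summarize a TLS connection; `conn` is assumed to be post-handshake."""
    cert = conn.get_peer_certificate()
    subject_cn = cert.get_subject().CN if cert is not None else None
    return {
        "protocol": conn.get_protocol_version_name(),
        "cipher": conn.get_cipher_name(),
        "cipher_bits": conn.get_cipher_bits(),
        "peer_subject_cn": subject_cn,
    }


# Stubs below mimic only the methods used above; values are illustrative.
class _StubName:
    CN = "example.com"


class _StubCert:
    def get_subject(self):
        return _StubName()


class _StubConnection:
    def get_peer_certificate(self):
        return _StubCert()

    def get_protocol_version_name(self):
        return "TLSv1.2"

    def get_cipher_name(self):
        return "ECDHE-RSA-AES128-GCM-SHA256"

    def get_cipher_bits(self):
        return 128


info = describe_tls_connection(_StubConnection())
```

Returning a dict rather than logging directly keeps the formatting decision with the caller, which would also suit the programmatic-access use case.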