
Log cipher, certificate and temp key info on establishing an SSL connection #3450

Merged
9 commits merged into scrapy:master on Jul 23, 2019

Conversation

@wRAR (Contributor) commented Oct 5, 2018

Fix #2111. Also related to #2726, though that ticket most likely asks for programmatic access to all of this (I don't know if that's easily doable).

Output example:

2018-10-05 20:08:36 [scrapy.core.downloader.tls] DEBUG: SSL connection to market.yandex.ru using protocol TLSv1.2, cipher ECDHE-RSA-AES128-GCM-SHA256
2018-10-05 20:08:36 [scrapy.core.downloader.tls] DEBUG: SSL connection certificate: issuer "/C=RU/O=Yandex LLC/OU=Yandex Certification Authority/CN=Yandex CA", subject "/C=RU/O=Yandex LLC/OU=ITO/L=Moscow/ST=Russian Federation/CN=market.yandex.ru"
2018-10-05 20:08:36 [scrapy.core.downloader.tls] DEBUG: SSL temp key: ECDH, P-256, 256 bits

I'm not sure this should be enabled by default: a lot of websites are HTTPS now, so this will be printed for every request, including redirects. But I don't know how to access settings in this class.

Note that there is a lot of direct FFI code to access the temporary key parameters; I hope it works correctly and doesn't cause memory leaks. It also requires OpenSSL 1.0.2, but I couldn't test the version check with an older OpenSSL, as I couldn't run the tests on an actual jessie system.

There is other info that can be added, look at the Connection methods (easy) and at the openssl s_client output (may be harder, like with the temp key info).
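For the "easy" part, the pyOpenSSL Connection API already exposes the protocol, cipher and peer certificate. A minimal sketch of formatting them into the log lines shown above (get_protocol_version_name(), get_cipher_name() and get_peer_certificate() are real pyOpenSSL Connection methods; any object exposing them works, so this is easy to exercise without a live connection):

```python
# Sketch: building the log lines shown in the output example from a
# pyOpenSSL-style Connection object. The three methods used here are
# real pyOpenSSL Connection methods; a test double exposing them works
# just as well as an established OpenSSL.SSL.Connection.
def format_connection_info(conn):
    lines = ['SSL connection using protocol %s, cipher %s'
             % (conn.get_protocol_version_name(), conn.get_cipher_name())]
    cert = conn.get_peer_certificate()
    if cert is not None:
        lines.append('SSL connection certificate: issuer "%s", subject "%s"'
                     % (cert.get_issuer(), cert.get_subject()))
    return lines
```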

Review thread on scrapy/core/downloader/tls.py (outdated, resolved)
@kmike (Member) commented Oct 5, 2018

> I'm not sure this should be enabled by default as a lot of websites are now HTTPS and this will be printed on all requests including redirects, but I don't know how to access settings in this class.

Such information is super-useful, but +1 to not log it by default, as it is only needed in very specific cases.

It seems you can access settings from HTTP11DownloadHandler and HTTP10DownloadHandler, and pass arguments to DOWNLOADER_CLIENTCONTEXTFACTORY, which can pass them to ScrapyClientTLSOptions.

It can be also a good chance to introduce from_settings/from_crawler support for more Scrapy components, instead of passing just specific option values - basically, create objects using scrapy.utils.misc.create_instance instead of creating them directly.
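A simplified stand-in for what that pattern looks like (the real helper is scrapy.utils.misc.create_instance; this sketch mirrors its precedence of from_crawler over from_settings over a plain constructor call, without the error handling the real one has):

```python
# Simplified stand-in for scrapy.utils.misc.create_instance:
# prefer from_crawler, then from_settings, then the plain constructor.
def create_instance(objcls, settings=None, crawler=None, *args, **kwargs):
    if crawler is not None and hasattr(objcls, 'from_crawler'):
        return objcls.from_crawler(crawler, *args, **kwargs)
    if settings is not None and hasattr(objcls, 'from_settings'):
        return objcls.from_settings(settings, *args, **kwargs)
    return objcls(*args, **kwargs)
```

The component itself then decides what to extract from settings, instead of its caller plucking out individual option values.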

@codecov bot commented Oct 8, 2018

Codecov Report

Merging #3450 into master will increase coverage by 0.06%.
The diff coverage is 69.69%.

@@            Coverage Diff             @@
##           master    #3450      +/-   ##
==========================================
+ Coverage   85.39%   85.45%   +0.06%     
==========================================
  Files         169      165       -4     
  Lines        9687     9624      -63     
  Branches     1445     1446       +1     
==========================================
- Hits         8272     8224      -48     
+ Misses       1166     1146      -20     
- Partials      249      254       +5
Impacted Files Coverage Δ
scrapy/core/downloader/handlers/http10.py 100% <100%> (ø) ⬆️
scrapy/settings/default_settings.py 98.67% <100%> (+0.01%) ⬆️
scrapy/utils/ssl.py 57.14% <57.14%> (ø)
scrapy/core/downloader/handlers/http11.py 92.53% <66.66%> (ø) ⬆️
scrapy/core/downloader/contextfactory.py 71.05% <77.77%> (+0.08%) ⬆️
scrapy/core/downloader/tls.py 88.88% <85.71%> (-1.12%) ⬇️
scrapy/utils/decorators.py 80% <0%> (-8%) ⬇️
scrapy/crawler.py 89.94% <0%> (-2.46%) ⬇️
scrapy/pipelines/files.py 66.53% <0%> (-1.71%) ⬇️
scrapy/extensions/feedexport.py 83.48% <0%> (-1.42%) ⬇️
... and 28 more

Two review threads on docs/topics/settings.rst (outdated, resolved)
@kmike (Member) commented Jul 10, 2019

@wRAR tests are failing for pypy/py27/jessie, and it looks like a genuine failure - could you please take a look? Alternatively, we may say this feature is for the next Scrapy release, which doesn't support Python 2.7 (and check that this PR works with the new baseline set of dependencies).

@kmike (Member) commented Jul 11, 2019

@wRAR can you please also add a test for the new option? I'm not sure it needs to be super-detailed and cover all code paths for all OpenSSL versions, but it would be nice to check that crawling still works after enabling this option, and that some sensible messages are logged.

```python
        super(ScrapyClientContextFactory, self).__init__(*args, **kwargs)
        self._ssl_method = method
        if settings:
            self.tls_verbose_logging = settings['DOWNLOADER_CLIENT_TLS_VERBOSE_LOGGING']
```
@kmike (Member) Jul 16, 2019
Suggested change:

```diff
-self.tls_verbose_logging = settings['DOWNLOADER_CLIENT_TLS_VERBOSE_LOGGING']
+self.tls_verbose_logging = settings.getbool('DOWNLOADER_CLIENT_TLS_VERBOSE_LOGGING')
```

Without it, -s DOWNLOADER_CLIENT_TLS_VERBOSE_LOGGING=0 on the command line may be evaluated as True, because self.tls_verbose_logging will be the string '0'.
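A quick illustration of the pitfall (the getbool below is only a minimal approximation of Scrapy's Settings.getbool, which coerces '0'/'1'/'true'/'false'-style strings; the point is that plain truthiness gets string values wrong):

```python
# Values set with -s NAME=0 on the command line arrive as strings, and
# any non-empty string is truthy, so a plain bool() check misfires.
# Minimal getbool-style coercion (approximation of Settings.getbool):
def getbool(value):
    if isinstance(value, str):
        return value.strip().lower() in ('1', 'true')
    return bool(value)

assert bool('0') is True      # naive truthiness: the wrong result
assert getbool('0') is False  # coerced: the intended result
assert getbool('True') is True
```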

```diff
@@ -28,9 +29,17 @@ class ScrapyClientContextFactory(BrowserLikePolicyForHTTPS):
     understand the SSLv3, TLSv1, TLSv1.1 and TLSv1.2 protocols.'
     """

-    def __init__(self, method=SSL.SSLv23_METHOD, *args, **kwargs):
+    def __init__(self, method=SSL.SSLv23_METHOD, settings=None, *args, **kwargs):
```
@kmike (Member) Jul 16, 2019

I think a common practice is to pass tls_verbose_logging, not Settings instance, and extract option value in from_settings / from_crawler.

@wRAR (Contributor, Author) Jul 18, 2019

I'm fine with changing this, though I don't like that we need to list all optional arguments in the documentation and the error message. I was going to pass yet another setting here in a different PR; maybe we can rephrase the messages.


```python
    @classmethod
    def from_settings(cls, settings, method=SSL.SSLv23_METHOD, *args, **kwargs):
        return cls(method=method, settings=settings, *args, **kwargs)

        if settings:
```
@kmike (Member) Jul 18, 2019

I think settings should always be passed here
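Putting the threads above together, the suggested shape could look roughly like this hedged sketch: the constructor takes a plain tls_verbose_logging flag, and from_settings extracts it with getbool, so a Settings object never needs to be threaded through __init__. The class name is suffixed and a minimal Settings stand-in is used to keep the example self-contained; the actually merged code may differ in details.

```python
# Hypothetical sketch of the suggested refactoring, not the merged code.
# FakeSettings is a minimal stand-in for scrapy.settings.Settings.
class FakeSettings(dict):
    def getbool(self, name, default=False):
        value = self.get(name, default)
        if isinstance(value, str):
            return value.strip().lower() in ('1', 'true')
        return bool(value)

class ScrapyClientContextFactorySketch:
    def __init__(self, method=None, tls_verbose_logging=False):
        self._ssl_method = method
        self.tls_verbose_logging = tls_verbose_logging

    @classmethod
    def from_settings(cls, settings, method=None):
        # Extract the option value here instead of passing settings in,
        # so command-line string values are coerced exactly once.
        verbose = settings.getbool('DOWNLOADER_CLIENT_TLS_VERBOSE_LOGGING')
        return cls(method=method, tls_verbose_logging=verbose)
```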

@kmike (Member) commented Jul 23, 2019

Thanks @wRAR for the fix and @Gallaecio for the review!

@kmike merged commit 9c514b9 into scrapy:master on Jul 23, 2019
2 checks passed
3 participants