Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Log cipher, certificate and temp key info on establishing an SSL connection #3450

Merged
merged 9 commits into from
Jul 23, 2019

Conversation

wRAR
Copy link
Member

@wRAR wRAR commented Oct 5, 2018

Fix #2111. Also related to #2726, though that ticket most likely asks for a programmatical access to all of this (I don't know if it's easily doable).

Output example:

2018-10-05 20:08:36 [scrapy.core.downloader.tls] DEBUG: SSL connection to market.yandex.ru using protocol TLSv1.2, cipher ECDHE-RSA-AES128-GCM-SHA256
2018-10-05 20:08:36 [scrapy.core.downloader.tls] DEBUG: SSL connection certificate: issuer "/C=RU/O=Yandex LLC/OU=Yandex Certification Authority/CN=Yandex CA", subject "/C=RU/O=Yandex LLC/OU=ITO/L=Moscow/ST=Russian Federation/CN=market.yandex.ru"
2018-10-05 20:08:36 [scrapy.core.downloader.tls] DEBUG: SSL temp key: ECDH, P-256, 256 bits

I'm not sure this should be enabled by default as a lot of websites are now HTTPS and this will be printed on all requests including redirects, but I don't know how to access settings in this class.

Note that there is a lot of direct FFI code to access the temporary key params, I hope it works correctly and doesn't cause memleaks. It also requires OpenSSL 1.0.2, but I couldn't test the check with an older OpenSSL, as I couldn't run tests on an actual jessie system.

There is other info that can be added, look at the Connection methods (easy) and at the openssl s_client output (may be harder, like with the temp key info).

scrapy/core/downloader/tls.py Outdated Show resolved Hide resolved
@kmike
Copy link
Member

kmike commented Oct 5, 2018

I'm not sure this should be enabled by default as a lot of websites are now HTTPS and this will be printed on all requests including redirects, but I don't know how to access settings in this class.

Such information is super-useful, but +1 to not log it by default, as it is only needed in very specific cases.

It seems you can access settings from HTTP11DownloadHandler and HTTP10DownloadHandler, and pass arguments to DOWNLOADER_CLIENTCONTEXTFACTORY, which can pass them to ScrapyClientTLSOptions.

It can be also a good chance to introduce from_settings/from_crawler support for more Scrapy components, instead of passing just specific option values - basically, create objects using scrapy.utils.misc.create_instance instead of creating them directly.

@codecov
Copy link

codecov bot commented Oct 8, 2018

Codecov Report

Merging #3450 into master will increase coverage by 0.06%.
The diff coverage is 69.69%.

@@            Coverage Diff             @@
##           master    #3450      +/-   ##
==========================================
+ Coverage   85.39%   85.45%   +0.06%     
==========================================
  Files         169      165       -4     
  Lines        9687     9624      -63     
  Branches     1445     1446       +1     
==========================================
- Hits         8272     8224      -48     
+ Misses       1166     1146      -20     
- Partials      249      254       +5
Impacted Files Coverage Δ
scrapy/core/downloader/handlers/http10.py 100% <100%> (ø) ⬆️
scrapy/settings/default_settings.py 98.67% <100%> (+0.01%) ⬆️
scrapy/utils/ssl.py 57.14% <57.14%> (ø)
scrapy/core/downloader/handlers/http11.py 92.53% <66.66%> (ø) ⬆️
scrapy/core/downloader/contextfactory.py 71.05% <77.77%> (+0.08%) ⬆️
scrapy/core/downloader/tls.py 88.88% <85.71%> (-1.12%) ⬇️
scrapy/utils/decorators.py 80% <0%> (-8%) ⬇️
scrapy/crawler.py 89.94% <0%> (-2.46%) ⬇️
scrapy/pipelines/files.py 66.53% <0%> (-1.71%) ⬇️
scrapy/extensions/feedexport.py 83.48% <0%> (-1.42%) ⬇️
... and 28 more

docs/topics/settings.rst Outdated Show resolved Hide resolved
docs/topics/settings.rst Outdated Show resolved Hide resolved
@kmike
Copy link
Member

kmike commented Jul 10, 2019

@wRAR tests are failing for pypy/py27/jessie, and it looks like a genuine failure - could you please take a look? Alternatively, we may say this feature is for the next Scrapy release, which doesn't support Python 2.7 (and check that this PR works with the new baseline set of dependencies).

@kmike
Copy link
Member

kmike commented Jul 11, 2019

@wRAR can you please also add a test for the new option? I'm not sure it should be super-detailed and cover all code paths for all OpenSSL versions, but checking that after enabling this option crawling still works, and there are some messages which make sense, could be nice.

super(ScrapyClientContextFactory, self).__init__(*args, **kwargs)
self._ssl_method = method
if settings:
self.tls_verbose_logging = settings['DOWNLOADER_CLIENT_TLS_VERBOSE_LOGGING']
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
self.tls_verbose_logging = settings['DOWNLOADER_CLIENT_TLS_VERBOSE_LOGGING']
self.tls_verbose_logging = settings.getbool('DOWNLOADER_CLIENT_TLS_VERBOSE_LOGGING')

Without it, -s DOWNLOADER_CLIENT_TLS_VERBOSE_LOGGING=0 in command-line may be evaluated as True, because self.tls_verbose_logging will be '0'.

@@ -28,9 +29,17 @@ class ScrapyClientContextFactory(BrowserLikePolicyForHTTPS):
understand the SSLv3, TLSv1, TLSv1.1 and TLSv1.2 protocols.'
"""

def __init__(self, method=SSL.SSLv23_METHOD, *args, **kwargs):
def __init__(self, method=SSL.SSLv23_METHOD, settings=None, *args, **kwargs):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think a common practice is to pass tls_verbose_logging, not Settings instance, and extract option value in from_settings / from_crawler.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm fine with changing this, though I don't like that we need to list all optional args in the documentation and the error message, as I was going to pass yet another setting here in a different PR, maybe we can rephrase the messages.


@classmethod
def from_settings(cls, settings, method=SSL.SSLv23_METHOD, *args, **kwargs):
return cls(method=method, settings=settings, *args, **kwargs)
if settings:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think settings should always be passed here

@kmike
Copy link
Member

kmike commented Jul 23, 2019

Thanks @wRAR for the fix and @Gallaecio for the review!

@kmike kmike merged commit 9c514b9 into scrapy:master Jul 23, 2019
@wRAR wRAR deleted the tls-logging branch May 8, 2023 10:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add option to Log actual TLS connection options/Cipher Suite on 1st connection to a domain
3 participants