Issue with running scrapy spider from script. #2473

Open
tituskex opened this Issue Jan 2, 2017 · 38 comments

tituskex commented Jan 2, 2017

Hi, I'm trying to run scrapy from a script like this:

import scrapy
from scrapy.crawler import CrawlerProcess
from scrapy.loader import ItemLoader

class PropertiesItem(scrapy.Item):  # defined in my project's items.py
    title = scrapy.Field()

class MySpider(scrapy.Spider):
    name = "basic"
    allowed_domains = ["web"]
    start_urls = ['http://www.example.com']

    def parse(self, response):
        l = ItemLoader(item=PropertiesItem(), response=response)
        l.add_xpath('title', '//h1[1]/text()')
        return l.load_item()

process = CrawlerProcess({
    'USER_AGENT': 'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)'
})

process.crawl(MySpider)
process.start()

However, when I run this script I get the following error:

  File "/Library/Python/2.7/site-packages/Twisted-16.7.0rc1-py2.7-macosx-10.11-intel.egg/twisted/internet/_sslverify.py", line 38, in <module>
    TLSVersion.TLSv1_1: SSL.OP_NO_TLSv1_1,
AttributeError: 'module' object has no attribute 'OP_NO_TLSv1_1'

Does anyone know how to fix this? Thanks in advance.

IAlwaysBeCoding (Contributor) commented Jan 3, 2017

I would try downgrading your Twisted version from Twisted==16.7.0rc1 to Twisted==16.4.1. I got some weird errors on the downloader side too when I ran my Scrapy spiders with the same version you are running.

2017-01-02 14:25:00 [scrapy] ERROR: Error downloading <GET http://www.citysearch.com/profile/645344264/jackson_ms/wright_patrick_b_md_patrick_b_wright_md.html>
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/twisted/internet/defer.py", line 1297, in _inlineCallbacks
    result = result.throwExceptionIntoGenerator(g)
  File "/usr/local/lib/python2.7/dist-packages/twisted/python/failure.py", line 393, in throwExceptionIntoGenerator
    return g.throw(self.type, self.value, self.tb)
  File "/usr/local/lib/python2.7/dist-packages/scrapy/core/downloader/middleware.py", line 43, in process_request
    defer.returnValue((yield download_func(request=request,spider=spider)))
  File "/usr/local/lib/python2.7/dist-packages/scrapy/utils/defer.py", line 45, in mustbe_deferred
    result = f(*args, **kw)
  File "/usr/local/lib/python2.7/dist-packages/scrapy/core/downloader/handlers/__init__.py", line 65, in download_request
    return handler.download_request(request, spider)
  File "/usr/local/lib/python2.7/dist-packages/scrapy/core/downloader/handlers/http11.py", line 60, in download_request
    return agent.download_request(request)
  File "/usr/local/lib/python2.7/dist-packages/scrapy/core/downloader/handlers/http11.py", line 285, in download_request
    method, to_bytes(url, encoding='ascii'), headers, bodyproducer)
  File "/usr/local/lib/python2.7/dist-packages/twisted/web/client.py", line 1631, in request
    parsedURI.originForm)
  File "/usr/local/lib/python2.7/dist-packages/twisted/web/client.py", line 1408, in _requestWithEndpoint
    d = self._pool.getConnection(key, endpoint)
  File "/usr/local/lib/python2.7/dist-packages/twisted/web/client.py", line 1294, in getConnection
    return self._newConnection(key, endpoint)
  File "/usr/local/lib/python2.7/dist-packages/twisted/web/client.py", line 1306, in _newConnection
    return endpoint.connect(factory)
  File "/usr/local/lib/python2.7/dist-packages/twisted/internet/endpoints.py", line 779, in connect
    EndpointReceiver, self._hostText, portNumber=self._port
  File "/usr/local/lib/python2.7/dist-packages/twisted/internet/_resolver.py", line 174, in resolveHostName
    onAddress = self._simpleResolver.getHostByName(hostName)
  File "/usr/local/lib/python2.7/dist-packages/scrapy/resolver.py", line 21, in getHostByName
    d = super(CachingThreadedResolver, self).getHostByName(name, timeout)
  File "/usr/local/lib/python2.7/dist-packages/twisted/internet/base.py", line 276, in getHostByName
    timeoutDelay = sum(timeout)
TypeError: 'float' object is not iterable

After downgrading to the version I had (Twisted==16.4.1), things went back to working again.

Command: pip install Twisted==16.4.1 (prefix it with sudo if you need it).

IAlwaysBeCoding (Contributor) commented Jan 5, 2017

#2479 is related to this one as well.

redapple (Contributor) commented Jan 24, 2017

@tituskex, did you manage to make it work? Did downgrading Twisted work?

pembeci commented Feb 14, 2017

Downgrading Twisted worked for me too.

kmike (Member) commented Feb 14, 2017

@pembeci what is your Scrapy version?

pembeci commented Feb 14, 2017

@kmike The latest from pip install: 1.3.2. I am running on an old machine that hasn't been upgraded in a while (Ubuntu 12.04 LTS, 32-bit), so maybe that's why I needed to downgrade Twisted.

kmike (Member) commented Feb 14, 2017

@pembeci what was the exception?
Hm, maybe it is caused by Twisted 17+ dropping pyOpenSSL < 0.16 support.

rmax (Contributor) commented Feb 14, 2017

@pembeci I would recommend using (mini)conda to get the latest releases without having to upgrade system libraries on old systems.

wzpan commented Mar 1, 2017

+1. Same problem with scrapy (1.3.2) and twisted (17.1.0).

  File "/Library/Python/2.7/site-packages/twisted/protocols/tls.py", line 63, in <module>
    from twisted.internet._sslverify import _setAcceptableProtocols
  File "/Library/Python/2.7/site-packages/twisted/internet/_sslverify.py", line 38, in <module>
    TLSVersion.TLSv1_1: SSL.OP_NO_TLSv1_1,
AttributeError: 'module' object has no attribute 'OP_NO_TLSv1_1'

kmike (Member) commented Mar 1, 2017

@wzpan what is your pyOpenSSL version?

Twisted dropped support for pyOpenSSL < 16.0.0 in the Twisted 16.4.0 release (see http://twistedmatrix.com/trac/ticket/8441); in fact it kept working for some time, but they recently removed some of the supporting code as well. Is upgrading it an option? You can check your pyOpenSSL version by running python -c 'import OpenSSL; print(OpenSSL.version.__version__)'
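Since pyOpenSSL reports a plain dotted version string, the check can be automated with a simple tuple comparison. A minimal sketch (the 16.0.0 floor is the requirement discussed above; the helper name is mine):

```python
def pyopenssl_ok(version_str, minimum=(16, 0, 0)):
    """Return True if a pyOpenSSL version string meets the given minimum."""
    parts = tuple(int(p) for p in version_str.split(".")[:3])
    parts += (0,) * (3 - len(parts))  # pad short versions like "0.13"
    return parts >= minimum

print(pyopenssl_ok("0.13.1"))  # the version that triggers the error: False
print(pyopenssl_ok("16.2.0"))  # the version that fixed it: True
```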

wzpan commented Mar 2, 2017

@kmike awesome! 👍
My pyOpenSSL version is 0.13.1. After upgrading it to 16.2.0, scrapy works like a charm!

noprom commented Mar 2, 2017

I run into this problem, too.
Here is my stacktrace:

➜  ~ scrapy shell 'http://jbk.39.net/bw_t1/'
Traceback (most recent call last):
  File "/usr/local/bin/scrapy", line 7, in <module>
    from scrapy.cmdline import execute
  File "/Library/Python/2.7/site-packages/scrapy/cmdline.py", line 9, in <module>
    from scrapy.crawler import CrawlerProcess
  File "/Library/Python/2.7/site-packages/scrapy/crawler.py", line 7, in <module>
    from twisted.internet import reactor, defer
  File "/Library/Python/2.7/site-packages/twisted/internet/reactor.py", line 38, in <module>
    from twisted.internet import default
  File "/Library/Python/2.7/site-packages/twisted/internet/default.py", line 56, in <module>
    install = _getInstallFunction(platform)
  File "/Library/Python/2.7/site-packages/twisted/internet/default.py", line 50, in _getInstallFunction
    from twisted.internet.selectreactor import install
  File "/Library/Python/2.7/site-packages/twisted/internet/selectreactor.py", line 18, in <module>
    from twisted.internet import posixbase
  File "/Library/Python/2.7/site-packages/twisted/internet/posixbase.py", line 18, in <module>
    from twisted.internet import error, udp, tcp
  File "/Library/Python/2.7/site-packages/twisted/internet/tcp.py", line 28, in <module>
    from twisted.internet._newtls import (
  File "/Library/Python/2.7/site-packages/twisted/internet/_newtls.py", line 21, in <module>
    from twisted.protocols.tls import TLSMemoryBIOFactory, TLSMemoryBIOProtocol
  File "/Library/Python/2.7/site-packages/twisted/protocols/tls.py", line 63, in <module>
    from twisted.internet._sslverify import _setAcceptableProtocols
  File "/Library/Python/2.7/site-packages/twisted/internet/_sslverify.py", line 38, in <module>
    TLSVersion.TLSv1_1: SSL.OP_NO_TLSv1_1,
AttributeError: 'module' object has no attribute 'OP_NO_TLSv1_1'

Version:

➜  ~ python --version
Python 2.7.10
➜  ~ pip list | grep Scrapy
DEPRECATION: The default format will switch to columns in the future. You can use --format=(legacy|columns) (or define a format=(legacy|columns) in your pip.conf under the [list] section) to disable this warning.
Scrapy (1.2.1)

Any help would be appreciated.

wzpan commented Mar 2, 2017

@noprom Try doing these:

pip install --upgrade scrapy
pip install --upgrade twisted
pip install --upgrade pyopenssl

noprom commented Mar 2, 2017

@wzpan
But another problem occurs:

➜  OS scrapy shell 'http://jbk.39.net/bw_t1/'
2017-03-02 20:31:05 [scrapy.utils.log] INFO: Scrapy 1.3.2 started (bot: scrapybot)
2017-03-02 20:31:05 [scrapy.utils.log] INFO: Overridden settings: {'LOGSTATS_INTERVAL': 0, 'DUPEFILTER_CLASS': 'scrapy.dupefilters.BaseDupeFilter'}
2017-03-02 20:31:05 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.telnet.TelnetConsole',
 'scrapy.extensions.corestats.CoreStats']
2017-03-02 20:31:05 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
 'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
 'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
 'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
 'scrapy.downloadermiddlewares.retry.RetryMiddleware',
 'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
 'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
 'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
 'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
 'scrapy.downloadermiddlewares.stats.DownloaderStats']
2017-03-02 20:31:05 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
 'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
 'scrapy.spidermiddlewares.referer.RefererMiddleware',
 'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
 'scrapy.spidermiddlewares.depth.DepthMiddleware']
2017-03-02 20:31:05 [scrapy.middleware] INFO: Enabled item pipelines:
[]
2017-03-02 20:31:05 [scrapy.extensions.telnet] DEBUG: Telnet console listening on 127.0.0.1:6023
2017-03-02 20:31:05 [scrapy.core.engine] INFO: Spider opened
2017-03-02 20:31:05 [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET http://jbk.39.net/bw_t1/> (failed 1 times): [<twisted.python.failure.Failure twisted.internet.error.ConnectionDone: Connection was closed cleanly.>]
2017-03-02 20:31:05 [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET http://jbk.39.net/bw_t1/> (failed 2 times): [<twisted.python.failure.Failure twisted.internet.error.ConnectionDone: Connection was closed cleanly.>]
2017-03-02 20:31:05 [scrapy.downloadermiddlewares.retry] DEBUG: Gave up retrying <GET http://jbk.39.net/bw_t1/> (failed 3 times): [<twisted.python.failure.Failure twisted.internet.error.ConnectionDone: Connection was closed cleanly.>]
Traceback (most recent call last):
  File "/usr/local/bin/scrapy", line 11, in <module>
    sys.exit(execute())
  File "/Library/Python/2.7/site-packages/scrapy/cmdline.py", line 142, in execute
    _run_print_help(parser, _run_command, cmd, args, opts)
  File "/Library/Python/2.7/site-packages/scrapy/cmdline.py", line 88, in _run_print_help
    func(*a, **kw)
  File "/Library/Python/2.7/site-packages/scrapy/cmdline.py", line 149, in _run_command
    cmd.run(args, opts)
  File "/Library/Python/2.7/site-packages/scrapy/commands/shell.py", line 73, in run
    shell.start(url=url, redirect=not opts.no_redirect)
  File "/Library/Python/2.7/site-packages/scrapy/shell.py", line 48, in start
    self.fetch(url, spider, redirect=redirect)
  File "/Library/Python/2.7/site-packages/scrapy/shell.py", line 115, in fetch
    reactor, self._schedule, request, spider)
  File "/Library/Python/2.7/site-packages/twisted/internet/threads.py", line 122, in blockingCallFromThread
    result.raiseException()
  File "<string>", line 2, in raiseException
twisted.web._newclient.ResponseNeverReceived: [<twisted.python.failure.Failure twisted.internet.error.ConnectionDone: Connection was closed cleanly.>]

It seems that there's a problem with twisted.

rmax (Contributor) commented Mar 2, 2017

@noprom The site does not complete the response when you use the default user agent (or the one you are using).

$ scrapy shell 'http://jbk.39.net/bw_t1/' --set USER_AGENT=Mozilla --loglevel INFO
2017-03-02 09:38:49 [scrapy.utils.log] INFO: Scrapy 1.3.2 started (bot: scrapybot)
2017-03-02 09:38:49 [scrapy.utils.log] INFO: Overridden settings: {'LOGSTATS_INTERVAL': 0, 'DUPEFILTER_CLASS': 'scrapy.dupefilters.BaseDupeFilter', 'USER_AGENT': 'Mozilla', 'LOG_LEVEL': 'INFO'}
2017-03-02 09:38:49 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.telnet.TelnetConsole',
 'scrapy.extensions.memusage.MemoryUsage',
 'scrapy.extensions.corestats.CoreStats']
2017-03-02 09:38:49 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
 'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
 'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
 'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
 'scrapy.downloadermiddlewares.retry.RetryMiddleware',
 'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
 'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
 'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
 'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
 'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
 'scrapy.downloadermiddlewares.stats.DownloaderStats']
2017-03-02 09:38:49 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
 'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
 'scrapy.spidermiddlewares.referer.RefererMiddleware',
 'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
 'scrapy.spidermiddlewares.depth.DepthMiddleware']
2017-03-02 09:38:49 [scrapy.middleware] INFO: Enabled item pipelines:
[]
2017-03-02 09:38:49 [scrapy.core.engine] INFO: Spider opened
2017-03-02 09:38:50 [traitlets] WARNING: Config option `pager` not recognized by `InteractiveShellEmbed`.
[s] Available Scrapy objects:
[s]   scrapy     scrapy module (contains scrapy.Request, scrapy.Selector, etc)
[s]   crawler    <scrapy.crawler.Crawler object at 0x109100d68>
[s]   item       {}
[s]   request    <GET http://jbk.39.net/bw_t1/>
[s]   response   <200 http://jbk.39.net/bw_t1/>
[s]   settings   <scrapy.settings.Settings object at 0x109100eb8>
[s]   spider     <DefaultSpider 'default' at 0x10bf23dd8>
[s] Useful shortcuts:
[s]   fetch(url[, redirect=True]) Fetch URL and update local objects (by default, redirects are followed)
[s]   fetch(req)                  Fetch a scrapy.Request and update local objects
[s]   shelp()           Shell help (print this help)
[s]   view(response)    View response in a browser
In [1]: response.body[:100]
b'\r\n<!doctype html>\r\n<html>\r\n<head>\r\n    <meta http-equiv="Content-Type" content="text/html; charset=g'

noprom commented Mar 2, 2017

@rolando
Cool! Thanks a lot.😄

noprom commented Mar 2, 2017

@wzpan
Thanks, you solved my problem.

rapliandras commented Mar 8, 2017

pip install Twisted==16.4.1

also solved mine, but to be honest the backwards incompatibility is a shame; the Twisted maintainers should really get this fixed.

eegilbert commented Mar 13, 2017

I couldn't even run scrapy by itself without the SSL error until I downgraded Twisted from 17 to 16.4.1, per @rapliandras.

redapple (Contributor) commented Mar 13, 2017

For the record, we've released "packaging fix" versions that prevent Twisted>=17 from getting installed, because branches 1.0.x, 1.1.x and 1.2.x only support Twisted<=16.6:

  • v1.0.7
  • v1.1.4
  • v1.2.3

The master branch and the recent v1.3.3 are compatible with Twisted 17+.

redapple (Contributor) commented Mar 27, 2017

So it seems that the latest Twisted does require pyOpenSSL>=0.16, but only if you add the [tls] extra, as in pip install twisted[tls].
Twisted 15.5 required pyOpenSSL>=0.13, but Twisted 16.6 requires pyOpenSSL>=0.16.
I think Scrapy should add the [tls] extra to its requirements, even if it will show a warning for Twisted<15 (the extra did not exist then). It should not prevent Scrapy from getting installed.
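In setup.py terms, the proposal would amount to something like the fragment below (a sketch; the version pins are illustrative, not Scrapy's actual requirements):

```python
# Illustrative install_requires for the proposed change; pins are examples only.
install_requires = [
    "Twisted[tls]>=13.1.0",  # the [tls] extra pulls in pyOpenSSL, service_identity, idna
    "lxml",
    "six>=1.5.2",
]
print(install_requires[0])
```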

kmike (Member) commented Mar 27, 2017

@redapple I hadn't realized it is just a warning, not an error. If adding [tls] still allows installing Twisted, then +1 to add it.

kmike (Member) commented Mar 27, 2017

It seems that pip < 6.1.0 raises an error if an extra requirement is unknown, instead of showing a warning; see pypa/pip#2142. I'm not sure what happens if Twisted < 15.0 is already installed, the user has pip < 6.1.0 (e.g. pip 1.5 is still popular), and runs pip install scrapy; does it work?

redapple (Contributor) commented Mar 28, 2017

Good point, @kmike. It does not work if one asks for Twisted<15:

$ pip install --upgrade 'pip<6.1.0'
$ pip install 'twisted<15'
$ pip install --upgrade 'twisted[tls]<15'
Successfully installed twisted-14.0.2
$ pip install --upgrade 'twisted[tls]<15'
You are using pip version 6.0.8, however version 9.0.1 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.
Requirement already up-to-date: twisted[tls]<15 in /home/paul/.virtualenvs/piptests/lib/python2.7/site-packages
  Exception:
  Traceback (most recent call last):
    File "/home/paul/.virtualenvs/piptests/local/lib/python2.7/site-packages/pip/basecommand.py", line 232, in main
      status = self.run(options, args)
    File "/home/paul/.virtualenvs/piptests/local/lib/python2.7/site-packages/pip/commands/install.py", line 339, in run
      requirement_set.prepare_files(finder)
    File "/home/paul/.virtualenvs/piptests/local/lib/python2.7/site-packages/pip/req/req_set.py", line 436, in prepare_files
      req_to_install.extras):
    File "/home/paul/.virtualenvs/piptests/local/lib/python2.7/site-packages/pip/_vendor/pkg_resources/__init__.py", line 2504, in requires
      "%s has no such extra feature %r" % (self, ext)
  UnknownExtra: Twisted 14.0.2 has no such extra feature 'tls'

If we consider upgrades to the latest Twisted, it works though, because the latest Twisted has the extra:

$ pip install --upgrade 'twisted[tls]'
You are using pip version 6.0.8, however version 9.0.1 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.
Collecting twisted[tls] from https://pypi.python.org/packages/d2/5d/ed5071740be94da625535f4333793d6fd238f9012f0fee189d0c5d00bd74/Twisted-17.1.0.tar.bz2#md5=5b4b9ea5a480bec9c1449ffb57b2052a
  Using cached Twisted-17.1.0.tar.bz2
    Installed /tmp/pip-build-RuAoHT/twisted/.eggs/incremental-16.10.1-py2.7.egg
Requirement already up-to-date: zope.interface>=3.6.0 in /home/paul/.virtualenvs/piptests/lib/python2.7/site-packages (from twisted[tls])
Collecting constantly>=15.1 (from twisted[tls])
  Using cached constantly-15.1.0-py2.py3-none-any.whl
Collecting incremental>=16.10.1 (from twisted[tls])
  Using cached incremental-16.10.1-py2.py3-none-any.whl
Collecting Automat>=0.3.0 (from twisted[tls])
  Using cached Automat-0.5.0-py2.py3-none-any.whl
Collecting pyopenssl>=16.0.0 (from twisted[tls])
  Using cached pyOpenSSL-16.2.0-py2.py3-none-any.whl
Collecting service-identity (from twisted[tls])
  Using cached service_identity-16.0.0-py2.py3-none-any.whl
Collecting idna>=0.6 (from twisted[tls])
  Using cached idna-2.5-py2.py3-none-any.whl
Requirement already up-to-date: setuptools in /home/paul/.virtualenvs/piptests/lib/python2.7/site-packages (from zope.interface>=3.6.0->twisted[tls])
Requirement already up-to-date: six in /home/paul/.virtualenvs/piptests/lib/python2.7/site-packages (from Automat>=0.3.0->twisted[tls])
Collecting attrs (from Automat>=0.3.0->twisted[tls])
  Using cached attrs-16.3.0-py2.py3-none-any.whl
Collecting cryptography>=1.3.4 (from pyopenssl>=16.0.0->twisted[tls])
  Using cached cryptography-1.8.1.tar.gz
Collecting pyasn1-modules (from service-identity->twisted[tls])
  Using cached pyasn1_modules-0.0.8-py2.py3-none-any.whl
Collecting pyasn1 (from service-identity->twisted[tls])
  Using cached pyasn1-0.2.3-py2.py3-none-any.whl
Requirement already up-to-date: packaging>=16.8 in /home/paul/.virtualenvs/piptests/lib/python2.7/site-packages (from setuptools->zope.interface>=3.6.0->twisted[tls])
Requirement already up-to-date: appdirs>=1.4.0 in /home/paul/.virtualenvs/piptests/lib/python2.7/site-packages (from setuptools->zope.interface>=3.6.0->twisted[tls])
Collecting asn1crypto>=0.21.0 (from cryptography>=1.3.4->pyopenssl>=16.0.0->twisted[tls])
  Using cached asn1crypto-0.22.0-py2.py3-none-any.whl
Collecting enum34 (from cryptography>=1.3.4->pyopenssl>=16.0.0->twisted[tls])
  Using cached enum34-1.1.6-py2-none-any.whl
Collecting ipaddress (from cryptography>=1.3.4->pyopenssl>=16.0.0->twisted[tls])
  Using cached ipaddress-1.0.18-py2-none-any.whl
Collecting cffi>=1.4.1 (from cryptography>=1.3.4->pyopenssl>=16.0.0->twisted[tls])
  Downloading cffi-1.10.0.tar.gz (418kB)
    100% |################################| 421kB 437kB/s 
Requirement already up-to-date: pyparsing in /home/paul/.virtualenvs/piptests/lib/python2.7/site-packages (from packaging>=16.8->setuptools->zope.interface>=3.6.0->twisted[tls])
Collecting pycparser (from cffi>=1.4.1->cryptography>=1.3.4->pyopenssl>=16.0.0->twisted[tls])
  Using cached pycparser-2.17.tar.gz
Installing collected packages: pycparser, cffi, ipaddress, enum34, asn1crypto, pyasn1, pyasn1-modules, cryptography, attrs, idna, service-identity, pyopenssl, Automat, incremental, constantly, twisted
(...)
Successfully installed Automat-0.5.0 asn1crypto-0.22.0 attrs-16.3.0 cffi-1.10.0 constantly-15.1.0 cryptography-1.8.1 enum34-1.1.6 idna-2.5 incremental-16.10.1 ipaddress-1.0.18 pyasn1-0.2.3 pyasn1-modules-0.0.8 pycparser-2.17 pyopenssl-16.2.0 service-identity-16.0.0 twisted-17.1.0

Is it fair to say that installing and upgrading via pip with twisted[tls] in the dependencies would work in this case (assuming Twisted>=15 is available from the package index being used)?
I may be missing something.

kmike (Member) commented Mar 28, 2017

I was asking about a different case:

  1. User already has Twisted < 15 installed (e.g. from system packages), but doesn't have Scrapy installed.
  2. Then user runs pip install scrapy, without --upgrade or specifying a version.

It seems it can fail (I've executed this in a clean virtualenv):

> pip install 'pip < 6.1.0'
..snip..
> pip install 'twisted<15'
..snip..
> pip install twisted[tls]
You are using pip version 6.0.8, however version 9.0.1 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.
Requirement already satisfied (use --upgrade to upgrade): twisted[tls] in /Users/kmike/envs/tst-scrapy/lib/python2.7/site-packages
  Exception:
  Traceback (most recent call last):
    File "/Users/kmike/envs/tst-scrapy/lib/python2.7/site-packages/pip/basecommand.py", line 232, in main
      status = self.run(options, args)
    File "/Users/kmike/envs/tst-scrapy/lib/python2.7/site-packages/pip/commands/install.py", line 339, in run
      requirement_set.prepare_files(finder)
    File "/Users/kmike/envs/tst-scrapy/lib/python2.7/site-packages/pip/req/req_set.py", line 436, in prepare_files
      req_to_install.extras):
    File "/Users/kmike/envs/tst-scrapy/lib/python2.7/site-packages/pip/_vendor/pkg_resources/__init__.py", line 2504, in requires
      "%s has no such extra feature %r" % (self, ext)
  UnknownExtra: Twisted 14.0.2 has no such extra feature 'tls'

redapple (Contributor) commented Mar 28, 2017

kmike (Member) commented Mar 28, 2017

For the record, both Debian jessie and Ubuntu 14.04 use pip 1.5 and twisted < 15.0, so these baselines are affected.

Suggesting pip install -U scrapy is OK, but not always: this will upgrade requirements like pyOpenSSL, cryptography or lxml, and installation could fail (compiling may require too much RAM, or build dependencies may be absent). It may also fail after installation, at runtime. I recall upgrading scrapy this way on Ubuntu 14.04 without using virtualenv (with pip3 install --user); installation was successful, but then cryptography failed to load, seemingly because pyOpenSSL was not able to use the OpenSSL version installed on Ubuntu 14.04.

kmike (Member) commented Mar 28, 2017

What do you think about providing a scrapy[tls] extra? After bumping requirements to Twisted[tls] >= 15.0 we can make it a no-op, and before that users can run pip install scrapy[tls]. I'm not sure it is possible to have the same package both in install_requires and in extras_require, but with a different version and extras (twisted); it needs to be checked.
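A sketch of what the proposal could look like in setuptools terms (hypothetical keyword values; as noted, whether mixing the two specifiers for the same distribution actually works would need checking):

```python
# Hypothetical setup() keyword arguments for a scrapy[tls] extra.
install_requires = ["Twisted>=13.1.0"]   # base dependency, no extra
extras_require = {
    # opt-in TLS fix: pip install scrapy[tls]
    "tls": ["Twisted[tls]>=15.0"],
}
print(extras_require["tls"][0])
```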

redapple (Contributor) commented Mar 28, 2017

I am not very fond of introducing a "tls" extra at the Scrapy level as well, as I think it could be hard to explain that it does not mean turning TLS support on or off, when to use it, etc. It's just a shame we cannot say something like twisted<15,twisted[tls]>=15 in the dependencies.

kmike (Member) commented Mar 28, 2017

Fair enough. I'm fine with documenting this in the FAQ, or maybe in a new Troubleshooting section in the install docs ("Got an AttributeError: 'module' object has no attribute 'OP_NO_TLSv1_1' exception? This happens because Twisted dropped support for older pyOpenSSL versions. Either downgrade Twisted to ... or upgrade pyOpenSSL to 0.16+.").

Sunil-Cube commented Apr 6, 2017

I had this problem too; just following these steps worked fine:
pip install -U pip
pip install --upgrade scrapy
pip install --upgrade twisted
pip install --upgrade pyopenssl

redapple (Contributor) commented Apr 6, 2017

+1 for a new Troubleshooting section in the install docs. It could be hard to keep updated, but I believe we have some common cases on StackOverflow and here.

jnikolak commented Apr 8, 2017

RHEL 7 / CentOS 7: this works for me:

pip install Twisted==16.4.1
Uninstalling Twisted-17.1.0:
Successfully uninstalled Twisted-17.1.0

babyegern commented Apr 26, 2017

@IAlwaysBeCoding you are a programming god. I just signed up to github, only to give a thumbs up. Your suggestion worked perfectly.

NatashaTing commented Nov 23, 2017

I'm using:

Python 3.6.3 |Anaconda, Inc.|
Scrapy 1.4.0
Twisted 16.4.1 (downgraded from 17.9.0)
pyOpenSSL 17.4.0

When I run pip install twisted[tls] it shows Requirement already satisfied, but I'm still getting the AttributeError: 'module' object has no attribute 'OP_NO_TLSv1_1' error when trying to run a spider. Does anyone know what to do?

EDIT: just thought I'd mention that I've also tried putting from OpenSSL import SSL in my main.py file.

originalix commented Jan 23, 2018

@wzpan Cool! You solved my problem, thanks.

richyen referenced this issue Jan 31, 2018: TLS issues? #6 (open)

tokinonagare commented Feb 10, 2018

This problem still exists in scrapy==1.5.0; you need to install Twisted==16.4.1.

tommy3531 added a commit to tommy3531/PythonDataScience that referenced this issue Oct 11, 2018

Gallaecio added a commit to Gallaecio/scrapy that referenced this issue Dec 3, 2018

Add a troubleshooting section to the installation instructions
Its initial content covers the workaround for scrapy#2473.

victor-torres added a commit to victor-torres/scrapy that referenced this issue Dec 27, 2018

diehummel commented Jan 31, 2019

Hi,
uninstall scrapy, twisted, etc. from pip2 and install them with pip3.
It works for me with twisted 18.9 and scrapy 1.6, using pip3.6 on CentOS.
Give it a try; you may need to adjust your PATH (environment) from /usr/bin to /usr/local/bin.
