Issue with running scrapy spider from script. #2473

Closed
tituskex opened this issue Jan 2, 2017 · 42 comments

@tituskex

tituskex commented Jan 2, 2017

Hi, I'm trying to run scrapy from a script like this:

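import scrapy
from scrapy.crawler import CrawlerProcess
from scrapy.loader import ItemLoader


# Stand-in for the poster's PropertiesItem, whose definition was not shown.
class PropertiesItem(scrapy.Item):
    title = scrapy.Field()


class MySpider(scrapy.Spider):
    name = "basic"
    allowed_domains = ["web"]
    start_urls = ['http://www.example.com']

    def parse(self, response):
        # Collect the text of the first <h1> into the item's "title" field.
        l = ItemLoader(item=PropertiesItem(), response=response)
        l.add_xpath('title', '//h1[1]/text()')

        return l.load_item()


process = CrawlerProcess({
    'USER_AGENT': 'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)'
})

process.crawl(MySpider)
process.start()  # blocks until the crawl finishes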

However, when I run this script I get the following error:

File "/Library/Python/2.7/site-packages/Twisted-16.7.0rc1-py2.7-macosx-10.11-intel.egg/twisted/internet/_sslverify.py", line 38, in <module>
TLSVersion.TLSv1_1: SSL.OP_NO_TLSv1_1,
AttributeError: 'module' object has no attribute 'OP_NO_TLSv1_1'

Does anyone know how to fix this? Thanks in advance.

@IAlwaysBeCoding
Contributor

I would try to downgrade your twisted version from Twisted==16.7.0rc1 to Twisted==16.4.1. I got some weird errors too on the downloader part when I ran my Scrapy spiders with the same version you are running.

2017-01-02 14:25:00 [scrapy] ERROR: Error downloading <GET http://www.citysearch.com/profile/645344264/jackson_ms/wright_patrick_b_md_patrick_b_wright_md.html>
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/twisted/internet/defer.py", line 1297, in _inlineCallbacks
    result = result.throwExceptionIntoGenerator(g)
  File "/usr/local/lib/python2.7/dist-packages/twisted/python/failure.py", line 393, in throwExceptionIntoGenerator
    return g.throw(self.type, self.value, self.tb)
  File "/usr/local/lib/python2.7/dist-packages/scrapy/core/downloader/middleware.py", line 43, in process_request
    defer.returnValue((yield download_func(request=request,spider=spider)))
  File "/usr/local/lib/python2.7/dist-packages/scrapy/utils/defer.py", line 45, in mustbe_deferred
    result = f(*args, **kw)
  File "/usr/local/lib/python2.7/dist-packages/scrapy/core/downloader/handlers/__init__.py", line 65, in download_request
    return handler.download_request(request, spider)
  File "/usr/local/lib/python2.7/dist-packages/scrapy/core/downloader/handlers/http11.py", line 60, in download_request
    return agent.download_request(request)
  File "/usr/local/lib/python2.7/dist-packages/scrapy/core/downloader/handlers/http11.py", line 285, in download_request
    method, to_bytes(url, encoding='ascii'), headers, bodyproducer)
  File "/usr/local/lib/python2.7/dist-packages/twisted/web/client.py", line 1631, in request
    parsedURI.originForm)
  File "/usr/local/lib/python2.7/dist-packages/twisted/web/client.py", line 1408, in _requestWithEndpoint
    d = self._pool.getConnection(key, endpoint)
  File "/usr/local/lib/python2.7/dist-packages/twisted/web/client.py", line 1294, in getConnection
    return self._newConnection(key, endpoint)
  File "/usr/local/lib/python2.7/dist-packages/twisted/web/client.py", line 1306, in _newConnection
    return endpoint.connect(factory)
  File "/usr/local/lib/python2.7/dist-packages/twisted/internet/endpoints.py", line 779, in connect
    EndpointReceiver, self._hostText, portNumber=self._port
  File "/usr/local/lib/python2.7/dist-packages/twisted/internet/_resolver.py", line 174, in resolveHostName
    onAddress = self._simpleResolver.getHostByName(hostName)
  File "/usr/local/lib/python2.7/dist-packages/scrapy/resolver.py", line 21, in getHostByName
    d = super(CachingThreadedResolver, self).getHostByName(name, timeout)
  File "/usr/local/lib/python2.7/dist-packages/twisted/internet/base.py", line 276, in getHostByName
    timeoutDelay = sum(timeout)
TypeError: 'float' object is not iterable

After downgrading to the version I had before (Twisted==16.4.1), everything went back to working.

command: pip install Twisted==16.4.1
If you need sudo access then add it to your command.

@IAlwaysBeCoding
Contributor

#2479 This is related to this one as well.

@redapple
Contributor

@tituskex, did you manage to make it work?
Did downgrading Twisted help?

@pembeci

pembeci commented Feb 14, 2017

Downgrading Twisted worked for me too.

@kmike
Member

kmike commented Feb 14, 2017

@pembeci what is your Scrapy version?

@pembeci

pembeci commented Feb 14, 2017

@kmike The latest from pip install: 1.3.2. I am running on an old machine that has not been upgraded for a while: Ubuntu 12.04 LTS, 32-bit.
So maybe that's why I needed to downgrade Twisted.

@kmike
Member

kmike commented Feb 14, 2017

@pembeci what was the exception?
Hm, maybe it is caused by Twisted 17+ dropping support for pyOpenSSL < 16.0.0.

@rmax
Contributor

rmax commented Feb 14, 2017

@pembeci I would recommend using (mini)conda to get the latest releases without having to upgrade system libraries on old systems.

@wzpan

wzpan commented Mar 1, 2017

+1. Same problem with Scrapy (1.3.2) and Twisted (17.1.0).

  File "/Library/Python/2.7/site-packages/twisted/protocols/tls.py", line 63, in <module>
    from twisted.internet._sslverify import _setAcceptableProtocols
  File "/Library/Python/2.7/site-packages/twisted/internet/_sslverify.py", line 38, in <module>
    TLSVersion.TLSv1_1: SSL.OP_NO_TLSv1_1,
AttributeError: 'module' object has no attribute 'OP_NO_TLSv1_1'

@kmike
Member

kmike commented Mar 1, 2017

@wzpan what is your pyOpenSSL version?

Twisted dropped support for pyOpenSSL < 16.0.0 in Twisted 16.4.0 release (see http://twistedmatrix.com/trac/ticket/8441); in fact it worked for some time, but they recently removed some of the supporting code as well. Is upgrading it an option? You can check pyOpenSSL version by running python -c 'import OpenSSL; print(OpenSSL.version.__version__)'
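
The same check can also be scripted. The following is a minimal sketch rather than anything shipped with Twisted or Scrapy: `missing_ssl_flags` and `check_installed_pyopenssl` are hypothetical helper names, and the only attribute tested is the `OP_NO_TLSv1_1` constant named in the traceback above.

```python
# Constant named in the traceback; older pyOpenSSL releases do not define it.
REQUIRED_FLAGS = ("OP_NO_TLSv1_1",)


def missing_ssl_flags(ssl_module, required=REQUIRED_FLAGS):
    """Return the names in `required` that `ssl_module` does not define."""
    return [name for name in required if not hasattr(ssl_module, name)]


def check_installed_pyopenssl():
    """Describe the installed pyOpenSSL, or say that it cannot be imported."""
    try:
        import OpenSSL
        from OpenSSL import SSL
    except ImportError:
        return "pyOpenSSL is not installed"
    version = OpenSSL.version.__version__
    missing = missing_ssl_flags(SSL)
    if missing:
        return "pyOpenSSL %s lacks: %s" % (version, ", ".join(missing))
    return "pyOpenSSL %s defines all required flags" % version


if __name__ == "__main__":
    print(check_installed_pyopenssl())
```

If the script reports a missing flag, upgrading pyOpenSSL (or downgrading Twisted, as suggested earlier in the thread) are the two remedies discussed here.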

@wzpan

wzpan commented Mar 2, 2017

@kmike awesome! 👍
My pyOpenSSL version is 0.13.1. After upgrading it to 16.2.0, scrapy works like a charm!

@noprom

noprom commented Mar 2, 2017

I ran into this problem, too.
Here is my stacktrace:

➜  ~ scrapy shell 'http://jbk.39.net/bw_t1/'
Traceback (most recent call last):
  File "/usr/local/bin/scrapy", line 7, in <module>
    from scrapy.cmdline import execute
  File "/Library/Python/2.7/site-packages/scrapy/cmdline.py", line 9, in <module>
    from scrapy.crawler import CrawlerProcess
  File "/Library/Python/2.7/site-packages/scrapy/crawler.py", line 7, in <module>
    from twisted.internet import reactor, defer
  File "/Library/Python/2.7/site-packages/twisted/internet/reactor.py", line 38, in <module>
    from twisted.internet import default
  File "/Library/Python/2.7/site-packages/twisted/internet/default.py", line 56, in <module>
    install = _getInstallFunction(platform)
  File "/Library/Python/2.7/site-packages/twisted/internet/default.py", line 50, in _getInstallFunction
    from twisted.internet.selectreactor import install
  File "/Library/Python/2.7/site-packages/twisted/internet/selectreactor.py", line 18, in <module>
    from twisted.internet import posixbase
  File "/Library/Python/2.7/site-packages/twisted/internet/posixbase.py", line 18, in <module>
    from twisted.internet import error, udp, tcp
  File "/Library/Python/2.7/site-packages/twisted/internet/tcp.py", line 28, in <module>
    from twisted.internet._newtls import (
  File "/Library/Python/2.7/site-packages/twisted/internet/_newtls.py", line 21, in <module>
    from twisted.protocols.tls import TLSMemoryBIOFactory, TLSMemoryBIOProtocol
  File "/Library/Python/2.7/site-packages/twisted/protocols/tls.py", line 63, in <module>
    from twisted.internet._sslverify import _setAcceptableProtocols
  File "/Library/Python/2.7/site-packages/twisted/internet/_sslverify.py", line 38, in <module>
    TLSVersion.TLSv1_1: SSL.OP_NO_TLSv1_1,
AttributeError: 'module' object has no attribute 'OP_NO_TLSv1_1'

Version:

➜  ~ python --version
Python 2.7.10
➜  ~ pip list | grep Scrapy
DEPRECATION: The default format will switch to columns in the future. You can use --format=(legacy|columns) (or define a format=(legacy|columns) in your pip.conf under the [list] section) to disable this warning.
Scrapy (1.2.1)

Any help would be appreciated.

@wzpan

wzpan commented Mar 2, 2017

@noprom Try doing these:

pip install --upgrade scrapy
pip install --upgrade twisted
pip install --upgrade pyopenssl

@rapliandras

pip install Twisted==16.4.1

also solved mine, but tbh backwards incompatibility is a shame
twisted guys should really get this fixed

@eegilbert

I couldn't even run scrapy by itself without the SSL error until I downgraded Twisted from 17 to 16.4.1, per @rapliandras.

@redapple
Contributor

For the record, we've released "packaging fix" versions that prevent Twisted>=17 from being installed, because branches 1.0.x, 1.1.x and 1.2.x only support Twisted<=16.6:

  • v1.0.7
  • v1.1.4
  • v1.2.3

Master branch (and the recent v1.3.3) are compatible with Twisted 17+
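
The compatibility statements above can be written down as a small lookup. This is only a sketch of what the thread says, not an official support matrix: `parse_version` and `twisted_supported` are hypothetical names, versions are compared at major.minor granularity, and any combination the thread does not mention yields `None`.

```python
import re


def parse_version(text):
    """Turn '16.7.0rc1' into (16, 7, 0); suffixes such as 'rc1' are dropped."""
    parts = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def twisted_supported(scrapy_version, twisted_version):
    """Return True/False where the thread states an answer, otherwise None."""
    scrapy_full = parse_version(scrapy_version)
    scrapy_branch = scrapy_full[:2]
    twisted_branch = parse_version(twisted_version)[:2]

    # Branches 1.0.x, 1.1.x and 1.2.x only support Twisted<=16.6.
    if (1, 0) <= scrapy_branch <= (1, 2):
        return twisted_branch <= (16, 6)

    # v1.3.3 and later are stated to be compatible with Twisted 17+.
    if scrapy_full >= (1, 3, 3) and twisted_branch >= (17, 0):
        return True

    # Anything else is not covered by the statements in this thread.
    return None
```

For example, `twisted_supported("1.2.1", "17.1.0")` is `False`, which matches the Scrapy 1.2.1 traceback reported earlier.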

@redapple
Contributor

So it seems that the latest Twisted does require pyOpenSSL>=16.0.0, but only if you add the [tls] extra, as in pip install twisted[tls].
Twisted 15.5 required pyOpenSSL>=0.13, but Twisted 16.6 requires pyOpenSSL>=16.0.0.
I think Scrapy should add the [tls] extra in its requirements, even if it will show a warning for Twisted<15 (the extra did not exist then). It should not prevent Scrapy from getting installed.

@kmike
Member

kmike commented Mar 27, 2017

@redapple I hadn't realized it is just a warning, not an error. If adding [tls] still allows Twisted to be installed, then +1 to adding it.

@kmike
Member

kmike commented Mar 27, 2017

It seems that pip < 6.1.0 raises an error if an extra requirement is unknown instead of showing a warning - see pypa/pip#2142. I'm not sure what happens if Twisted < 15.0 is already installed, the user has pip < 6.1.0 (e.g. pip 1.5 is still popular), and runs pip install scrapy - does it work?

@redapple
Contributor

redapple commented Mar 28, 2017

Good point @kmike . It does not work if one asks for Twisted<15:

$ pip install --upgrade 'pip<6.1.0'
$ pip install 'twisted<15'
$ pip install --upgrade 'twisted[tls]<15'
Successfully installed twisted-14.0.2
$ pip install --upgrade 'twisted[tls]<15'
You are using pip version 6.0.8, however version 9.0.1 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.
Requirement already up-to-date: twisted[tls]<15 in /home/paul/.virtualenvs/piptests/lib/python2.7/site-packages
  Exception:
  Traceback (most recent call last):
    File "/home/paul/.virtualenvs/piptests/local/lib/python2.7/site-packages/pip/basecommand.py", line 232, in main
      status = self.run(options, args)
    File "/home/paul/.virtualenvs/piptests/local/lib/python2.7/site-packages/pip/commands/install.py", line 339, in run
      requirement_set.prepare_files(finder)
    File "/home/paul/.virtualenvs/piptests/local/lib/python2.7/site-packages/pip/req/req_set.py", line 436, in prepare_files
      req_to_install.extras):
    File "/home/paul/.virtualenvs/piptests/local/lib/python2.7/site-packages/pip/_vendor/pkg_resources/__init__.py", line 2504, in requires
      "%s has no such extra feature %r" % (self, ext)
  UnknownExtra: Twisted 14.0.2 has no such extra feature 'tls'

If we consider upgrades to latest Twisted, it works though, because latest Twisted has the extra:

$ pip install --upgrade 'twisted[tls]'
You are using pip version 6.0.8, however version 9.0.1 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.
Collecting twisted[tls] from https://pypi.python.org/packages/d2/5d/ed5071740be94da625535f4333793d6fd238f9012f0fee189d0c5d00bd74/Twisted-17.1.0.tar.bz2#md5=5b4b9ea5a480bec9c1449ffb57b2052a
  Using cached Twisted-17.1.0.tar.bz2
    Installed /tmp/pip-build-RuAoHT/twisted/.eggs/incremental-16.10.1-py2.7.egg
Requirement already up-to-date: zope.interface>=3.6.0 in /home/paul/.virtualenvs/piptests/lib/python2.7/site-packages (from twisted[tls])
Collecting constantly>=15.1 (from twisted[tls])
  Using cached constantly-15.1.0-py2.py3-none-any.whl
Collecting incremental>=16.10.1 (from twisted[tls])
  Using cached incremental-16.10.1-py2.py3-none-any.whl
Collecting Automat>=0.3.0 (from twisted[tls])
  Using cached Automat-0.5.0-py2.py3-none-any.whl
Collecting pyopenssl>=16.0.0 (from twisted[tls])
  Using cached pyOpenSSL-16.2.0-py2.py3-none-any.whl
Collecting service-identity (from twisted[tls])
  Using cached service_identity-16.0.0-py2.py3-none-any.whl
Collecting idna>=0.6 (from twisted[tls])
  Using cached idna-2.5-py2.py3-none-any.whl
Requirement already up-to-date: setuptools in /home/paul/.virtualenvs/piptests/lib/python2.7/site-packages (from zope.interface>=3.6.0->twisted[tls])
Requirement already up-to-date: six in /home/paul/.virtualenvs/piptests/lib/python2.7/site-packages (from Automat>=0.3.0->twisted[tls])
Collecting attrs (from Automat>=0.3.0->twisted[tls])
  Using cached attrs-16.3.0-py2.py3-none-any.whl
Collecting cryptography>=1.3.4 (from pyopenssl>=16.0.0->twisted[tls])
  Using cached cryptography-1.8.1.tar.gz
Collecting pyasn1-modules (from service-identity->twisted[tls])
  Using cached pyasn1_modules-0.0.8-py2.py3-none-any.whl
Collecting pyasn1 (from service-identity->twisted[tls])
  Using cached pyasn1-0.2.3-py2.py3-none-any.whl
Requirement already up-to-date: packaging>=16.8 in /home/paul/.virtualenvs/piptests/lib/python2.7/site-packages (from setuptools->zope.interface>=3.6.0->twisted[tls])
Requirement already up-to-date: appdirs>=1.4.0 in /home/paul/.virtualenvs/piptests/lib/python2.7/site-packages (from setuptools->zope.interface>=3.6.0->twisted[tls])
Collecting asn1crypto>=0.21.0 (from cryptography>=1.3.4->pyopenssl>=16.0.0->twisted[tls])
  Using cached asn1crypto-0.22.0-py2.py3-none-any.whl
Collecting enum34 (from cryptography>=1.3.4->pyopenssl>=16.0.0->twisted[tls])
  Using cached enum34-1.1.6-py2-none-any.whl
Collecting ipaddress (from cryptography>=1.3.4->pyopenssl>=16.0.0->twisted[tls])
  Using cached ipaddress-1.0.18-py2-none-any.whl
Collecting cffi>=1.4.1 (from cryptography>=1.3.4->pyopenssl>=16.0.0->twisted[tls])
  Downloading cffi-1.10.0.tar.gz (418kB)
    100% |################################| 421kB 437kB/s 
Requirement already up-to-date: pyparsing in /home/paul/.virtualenvs/piptests/lib/python2.7/site-packages (from packaging>=16.8->setuptools->zope.interface>=3.6.0->twisted[tls])
Collecting pycparser (from cffi>=1.4.1->cryptography>=1.3.4->pyopenssl>=16.0.0->twisted[tls])
  Using cached pycparser-2.17.tar.gz
Installing collected packages: pycparser, cffi, ipaddress, enum34, asn1crypto, pyasn1, pyasn1-modules, cryptography, attrs, idna, service-identity, pyopenssl, Automat, incremental, constantly, twisted
(...)
Successfully installed Automat-0.5.0 asn1crypto-0.22.0 attrs-16.3.0 cffi-1.10.0 constantly-15.1.0 cryptography-1.8.1 enum34-1.1.6 idna-2.5 incremental-16.10.1 ipaddress-1.0.18 pyasn1-0.2.3 pyasn1-modules-0.0.8 pycparser-2.17 pyopenssl-16.2.0 service-identity-16.0.0 twisted-17.1.0

Is it fair to say that installing and upgrading via pip with twisted[tls] in dependencies would work in this case? (assuming Twisted>=15 is available from the package index being used)
I may be missing something.

@kmike
Member

kmike commented Mar 28, 2017

I was asking about a different case:

  1. User already has Twisted < 15 installed (e.g. from system packages), but doesn't have Scrapy installed.
  2. Then user runs pip install scrapy, without --upgrade or specifying a version.

It seems it can fail (I executed this in a clean virtualenv):

> pip install 'pip < 6.1.0'
..snip..
> pip install 'twisted<15'
..snip..
> pip install twisted[tls]
You are using pip version 6.0.8, however version 9.0.1 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.
Requirement already satisfied (use --upgrade to upgrade): twisted[tls] in /Users/kmike/envs/tst-scrapy/lib/python2.7/site-packages
  Exception:
  Traceback (most recent call last):
    File "/Users/kmike/envs/tst-scrapy/lib/python2.7/site-packages/pip/basecommand.py", line 232, in main
      status = self.run(options, args)
    File "/Users/kmike/envs/tst-scrapy/lib/python2.7/site-packages/pip/commands/install.py", line 339, in run
      requirement_set.prepare_files(finder)
    File "/Users/kmike/envs/tst-scrapy/lib/python2.7/site-packages/pip/req/req_set.py", line 436, in prepare_files
      req_to_install.extras):
    File "/Users/kmike/envs/tst-scrapy/lib/python2.7/site-packages/pip/_vendor/pkg_resources/__init__.py", line 2504, in requires
      "%s has no such extra feature %r" % (self, ext)
  UnknownExtra: Twisted 14.0.2 has no such extra feature 'tls'
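
Whether an installed distribution declares a given extra can be checked before asking pip for it. The sketch below is an illustration only: it relies on `importlib.metadata` from the Python 3.8+ standard library, which is not what pip 6.0.8 used in the traceback above (that was `pkg_resources`), and `declared_extras` and `has_extra` are hypothetical helper names.

```python
from importlib.metadata import PackageNotFoundError, metadata


def declared_extras(distribution_name):
    """Return the extras an installed distribution declares.

    Returns None when the distribution is not installed at all, and an
    empty list when it is installed but declares no extras.
    """
    try:
        info = metadata(distribution_name)
    except PackageNotFoundError:
        return None
    return info.get_all("Provides-Extra") or []


def has_extra(distribution_name, extra):
    """True only if the distribution is installed and declares `extra`."""
    extras = declared_extras(distribution_name)
    return extras is not None and extra in extras


if __name__ == "__main__":
    # In the failing case above, the installed Twisted had no 'tls' extra.
    print("Twisted extras:", declared_extras("Twisted"))
    print("has [tls]:", has_extra("Twisted", "tls"))
```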

@redapple
Contributor

redapple commented Mar 28, 2017 via email

@kmike
Member

kmike commented Mar 28, 2017

For the record, both Debian jessie and Ubuntu 14.04 use pip 1.5 and Twisted < 15.0, so these baselines are affected.

Suggesting pip install -U scrapy is ok, but not always - this will upgrade requirements like pyOpenSSL or cryptography or lxml, and installation could fail (compiling may require too much RAM, or build dependencies may be absent). It may also fail after installation, at runtime - I recall upgrading scrapy this way on Ubuntu 14.04 without using virtualenv (with pip3 install --user); installation was successful, but then cryptography failed to load, seemingly because pyOpenSSL was not able to use OpenSSL version installed on Ubuntu 14.04.

@kmike
Member

kmike commented Mar 28, 2017

What do you think about providing a scrapy[tls] extra? After bumping requirements to Twisted[tls] >= 15.0 we can make it a no-op, and before that users can run pip install scrapy[tls]. I'm not sure it is possible to have the same package in both install_requires and extras_require, but with a different version and extras (twisted) - it needs to be checked.
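
To make the proposal concrete, here is a hypothetical packaging fragment. It is not Scrapy's actual setup.py: the only version bound used is the `Twisted[tls] >= 15.0` mentioned above, `requirements_for` is an illustrative helper, and the sketch does not answer the open question of whether pip accepts the same package in both lists.

```python
# Hypothetical fragment illustrating the proposal; NOT Scrapy's real setup.py.

install_requires = [
    # Plain requirement, so it can still be satisfied by Twisted < 15.
    "Twisted",
]

extras_require = {
    # `pip install scrapy[tls]` would additionally request Twisted's own extra.
    "tls": ["Twisted[tls]>=15.0"],
}


def requirements_for(extras=()):
    """List the requirement strings pip would be asked to satisfy."""
    requested = list(install_requires)
    for extra in extras:
        requested.extend(extras_require[extra])
    return requested


if __name__ == "__main__":
    print(requirements_for())
    print(requirements_for(["tls"]))
```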

@redapple
Contributor

I am not very fond of introducing a "tls" extra at Scrapy level as well, as I think it could be hard to explain that it does not mean TLS support ON or OFF, when to use it, etc. It's just a shame we cannot say something like twisted<15,twisted[tls]>=15 in dependencies.

@kmike
Member

kmike commented Mar 28, 2017

Fair enough. I'm fine with documenting this in the FAQ, or maybe in a new Troubleshooting section in the Install docs ("Got an AttributeError: 'module' object has no attribute 'OP_NO_TLSv1_1' exception? This happens because Twisted dropped support for older pyOpenSSL versions. Either downgrade Twisted to ... or upgrade pyOpenSSL to 16.0.0+").

@sunilsharma07

I ran into this problem as well. The following steps fixed it for me:
pip install -U pip
pip install --upgrade scrapy
pip install --upgrade twisted
pip install --upgrade pyopenssl

@redapple
Contributor

redapple commented Apr 6, 2017

+1 for a new Troubleshooting section in Install docs. Could be hard to keep updated, but I believe we have some common cases in StackOverflow and here

@jnikolak

jnikolak commented Apr 8, 2017

On RHEL 7/CentOS 7, this works for me:

pip install Twisted==16.4.1
Uninstalling Twisted-17.1.0:
Successfully uninstalled Twisted-17.1.0

@NatashaTing

NatashaTing commented Nov 23, 2017

I'm using
Python 3.6.3 |Anaconda, Inc.|
Scrapy 1.4.0
Twisted 16.4.1 (downgraded from 17.9.0)
OpenSSL 17.4.0

When I run pip install twisted[tls] it shows Requirement already satisfied, but I'm still getting the
AttributeError: 'module' object has no attribute 'OP_NO_TLSv1_1' error when trying to run a spider. Does anyone know what to do?

EDIT: just thought I'd mention that I've also tried putting from OpenSSL import SSL in my main.py file .
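
One thing worth checking in a situation like this is which interpreter and which copies of the packages are actually in use. Python 3.5+ words this error as "module '...' has no attribute '...'", while "'module' object has no attribute" is the older wording used by Python 2, so the message may indicate that the spider is run by a different interpreter than the Python 3.6 listed above. The sketch below is a generic diagnostic under that assumption; `describe_module` is a hypothetical helper name.

```python
import importlib
import sys


def describe_module(module_name):
    """Report whether a module imports, which version it claims, and its path."""
    try:
        module = importlib.import_module(module_name)
    except Exception as exc:  # broken installs may raise more than ImportError
        return {"name": module_name, "importable": False, "error": repr(exc)}
    return {
        "name": module_name,
        "importable": True,
        "version": getattr(module, "__version__", "unknown"),
        "path": getattr(module, "__file__", "unknown"),
    }


if __name__ == "__main__":
    # Run this with the same command that launches the spider.
    print("interpreter:", sys.executable)
    print("python:", sys.version.split()[0])
    for name in ("OpenSSL", "twisted", "scrapy"):
        print(describe_module(name))
```

If the reported paths point at a different site-packages directory than the one pip installs into, the upgraded packages are not the ones being imported.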

@originalix

@wzpan Cool! you solved my problem, Thanks

@tokinonagare

This problem still exists in Scrapy 1.5.0; I needed to install Twisted==16.4.1.

tommy3531 added a commit to tommy3531/PythonDataScience that referenced this issue Oct 11, 2018
Gallaecio added a commit to Gallaecio/scrapy that referenced this issue Dec 3, 2018
Its initial content covers the workaround for scrapy#2473.
victor-torres pushed a commit to victor-torres/scrapy that referenced this issue Dec 27, 2018
@diehummel

diehummel commented Jan 31, 2019

Hi,
Uninstall Scrapy, Twisted, etc. from pip2 and install them with pip3.
It works for me with Twisted 18.9 and Scrapy 1.6, using pip3.6 on CentOS.
Give it a try.
You may need to adjust the path (environment) from /usr/bin to /usr/local/bin.

whalebot-helmsman pushed a commit to whalebot-helmsman/scrapy that referenced this issue Mar 22, 2019
Kunal614 added a commit to Kunal614/scrapy that referenced this issue Feb 28, 2020
@wRAR
Member

wRAR commented Aug 17, 2021

I wonder if it's still needed with modern Scrapy?

@Gallaecio
Member

We actually covered this in the documentation as part of #3517

But I wonder whether we should now remove that from the documentation, if it is no longer needed.
