Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PY3: use MutableMapping for python 3 #810

Merged
merged 1 commit into from Jul 24, 2014
Merged

Conversation

@felixonmars
Copy link
Contributor

@felixonmars felixonmars commented Jul 21, 2014

While Python 2.7 also has collections.MutableMapping, it's not compatible with current code (raises some metaclass errors).

This PR works in both Python 2.7 & 3.4 (3.3 not tested, but should work too).

@dangra
Copy link
Member

@dangra dangra commented Jul 21, 2014

so, in py3.4 it pass the "metaclass" associated tests that fails on py2.7 when using MutableMapping, right?

side note about DictItem, I don't remember why we need a mutable dict as class attribute, it looks wrong to me and the cause for a call to cls.fields.copy() in ItemMeta

@felixonmars
Copy link
Contributor Author

@felixonmars felixonmars commented Jul 21, 2014

Yes, in 3.4 it works fine, and I also see other projects using this import routine to gain compatibility across Python 2/3.

@kmike
Copy link
Member

@kmike kmike commented Jul 21, 2014

What about UserDict methods that are not implemented in MutableMapping? See http://stackoverflow.com/questions/11165188/how-to-achieve-the-functionality-of-userdict-dictmixin-in-python-3

By the way, it was me who broke Travis with scrapy/w3lib#23, disregard the failure.

@felixonmars
Copy link
Contributor Author

@felixonmars felixonmars commented Jul 22, 2014

Hrm, I see.

I didn't find any clue that the two affected methods were used anywhere in scrapy itself, so this looks like breaking backward compatibility for someone else when he migrates to Python 3. I would suggest to keep it this way for now, and if people are complaining about this, we can append the two methods directly to DictItem.

@kmike
Copy link
Member

@kmike kmike commented Jul 22, 2014

I'm fine with that.

@dangra
Copy link
Member

@dangra dangra commented Jul 24, 2014

I'm not convinced about using different base classes

the following is debug information about running tests with MutableMapping in Python2

$ tox -e py27 scrapy/tests/test_item.py 
GLOB sdist-make: /home/daniel/src/scrapy/setup.py
py27 inst-nodeps: /home/daniel/src/scrapy/.tox/dist/Scrapy-0.25.1.zip
py27 runtests: PYTHONHASHSEED='1543480270'
py27 runtests: commands[0] | py.test --twisted scrapy/tests/test_item.py
Traceback (most recent call last):
  File ".tox/py27/bin/py.test", line 9, in <module>
    load_entry_point('pytest==2.5.2', 'console_scripts', 'py.test')()
  File "/home/daniel/src/scrapy/.tox/py27/lib/python2.7/site-packages/_pytest/config.py", line 19, in main
    config = _prepareconfig(args, plugins)
  File "/home/daniel/src/scrapy/.tox/py27/lib/python2.7/site-packages/_pytest/config.py", line 63, in _prepareconfig
    pluginmanager=pluginmanager, args=args)
  File "/home/daniel/src/scrapy/.tox/py27/lib/python2.7/site-packages/_pytest/core.py", line 377, in __call__
    return self._docall(methods, kwargs)
  File "/home/daniel/src/scrapy/.tox/py27/lib/python2.7/site-packages/_pytest/core.py", line 388, in _docall
    res = mc.execute()
  File "/home/daniel/src/scrapy/.tox/py27/lib/python2.7/site-packages/_pytest/core.py", line 289, in execute
    res = method(**kwargs)
  File "/home/daniel/src/scrapy/.tox/py27/lib/python2.7/site-packages/_pytest/helpconfig.py", line 27, in pytest_cmdline_parse
    config = __multicall__.execute()
  File "/home/daniel/src/scrapy/.tox/py27/lib/python2.7/site-packages/_pytest/core.py", line 289, in execute
    res = method(**kwargs)
  File "/home/daniel/src/scrapy/.tox/py27/lib/python2.7/site-packages/_pytest/config.py", line 618, in pytest_cmdline_parse
    self.parse(args)
  File "/home/daniel/src/scrapy/.tox/py27/lib/python2.7/site-packages/_pytest/config.py", line 711, in parse
    self._preparse(args)
  File "/home/daniel/src/scrapy/.tox/py27/lib/python2.7/site-packages/_pytest/config.py", line 691, in _preparse
    args=args, parser=self._parser)
  File "/home/daniel/src/scrapy/.tox/py27/lib/python2.7/site-packages/_pytest/core.py", line 377, in __call__
    return self._docall(methods, kwargs)
  File "/home/daniel/src/scrapy/.tox/py27/lib/python2.7/site-packages/_pytest/core.py", line 388, in _docall
    res = mc.execute()
  File "/home/daniel/src/scrapy/.tox/py27/lib/python2.7/site-packages/_pytest/core.py", line 289, in execute
    res = method(**kwargs)
  File "/home/daniel/src/scrapy/.tox/py27/lib/python2.7/site-packages/_pytest/capture.py", line 83, in pytest_load_initial_conftests
    return __multicall__.execute()
  File "/home/daniel/src/scrapy/.tox/py27/lib/python2.7/site-packages/_pytest/core.py", line 289, in execute
    res = method(**kwargs)
  File "/home/daniel/src/scrapy/.tox/py27/lib/python2.7/site-packages/_pytest/config.py", line 674, in pytest_load_initial_conftests
    self._conftest.setinitial(args)
  File "/home/daniel/src/scrapy/.tox/py27/lib/python2.7/site-packages/_pytest/config.py", line 487, in setinitial
    self._try_load_conftest(anchor)
  File "/home/daniel/src/scrapy/.tox/py27/lib/python2.7/site-packages/_pytest/config.py", line 493, in _try_load_conftest
    self._path2confmods[None] = self.getconftestmodules(anchor)
  File "/home/daniel/src/scrapy/.tox/py27/lib/python2.7/site-packages/_pytest/config.py", line 512, in getconftestmodules
    clist.append(self.importconftest(conftestpath))
  File "/home/daniel/src/scrapy/.tox/py27/lib/python2.7/site-packages/_pytest/config.py", line 538, in importconftest
    self._conftestpath2mod[conftestpath] = mod = conftestpath.pyimport()
  File "/home/daniel/src/scrapy/.tox/py27/lib/python2.7/site-packages/py/_path/local.py", line 608, in pyimport
    __import__(pkgpath.basename)
  File "/home/daniel/src/scrapy/scrapy/__init__.py", line 61, in <module>
    from scrapy.selector import Selector
  File "/home/daniel/src/scrapy/scrapy/selector/__init__.py", line 4, in <module>
    from scrapy.selector.unified import *
  File "/home/daniel/src/scrapy/scrapy/selector/unified.py", line 7, in <module>
    from scrapy.utils.misc import extract_regex
  File "/home/daniel/src/scrapy/scrapy/utils/misc.py", line 11, in <module>
    from scrapy.item import BaseItem
  File "/home/daniel/src/scrapy/scrapy/item.py", line 85, in <module>
    class Item(DictItem):
  File "/home/daniel/src/scrapy/scrapy/item.py", line 35, in __new__
    cls = super(ItemMeta, mcs).__new__(mcs, class_name, bases, new_attrs)
TypeError: Error when calling the metaclass bases
    metaclass conflict: the metaclass of a derived class must be a (non-strict) subclass of the metaclasses of all its bases
ERROR: InvocationError: '/home/daniel/src/scrapy/.tox/py27/bin/py.test --twisted scrapy/tests/test_item.py'
_______________________________________________________________________________________________ summary _______________________________________________________________________________________________
ERROR:   py27: commands failed
@dangra
Copy link
Member

@dangra dangra commented Jul 24, 2014

I think this PR doesn't add any value if we are only merging an optional import for python3 that actually doesn't pass scrapy/tests/test_items.py, it only makes scrapy.items importable but not functional at all.

Assigning metaclasses in python3 is different than python2 so I think it is not failing for MutableMapping because __metaclass__ atrribute is ignored by Py3.

@felixonmars
Copy link
Contributor Author

@felixonmars felixonmars commented Jul 24, 2014

Hrm, I think I got the point. I'll look into it further.

@dangra
Copy link
Member

@dangra dangra commented Jul 24, 2014

# metatest.py
import six
from collections import MutableMapping

class M(type):
    pass

class A(MutableMapping):
    pass

class B(object):
    pass

class C(six.with_metaclass(M, A, B)):
    pass

C()

in Python3:

$ python --version
Python 3.4.1
$ python metatest.py 
Traceback (most recent call last):
  File "metatest.py", line 13, in <module>
    class C(six.with_metaclass(M, A, B)):
  File "/usr/lib/python3.4/site-packages/six.py", line 706, in __new__
    return meta(name, bases, d)
TypeError: metaclass conflict: the metaclass of a derived class must be a (non-strict) subclass of the metaclasses of all its bases

in python2:

$ python --version
Python 2.7.8
$ python metatest.py 
Traceback (most recent call last):
  File "metatest.py", line 13, in <module>
    class C(six.with_metaclass(M, A, B)):
  File "/home/daniel/envs/sh/lib/python2.7/site-packages/six.py", line 631, in with_metaclass
    return meta("NewBase", bases, {})
TypeError: metaclass conflict: the metaclass of a derived class must be a (non-strict) subclass of the metaclasses of all its bases
@felixonmars
Copy link
Contributor Author

@felixonmars felixonmars commented Jul 24, 2014

Hi, I've re-submitted a commit, to actually implement the missing methods from MutableMapping, and make all test cases in test_item passed.

However, since I'm not familiar with this part, I could do this wrong. For example, I noticed that __iter__ and __hash__ would be called before __init__ (seems like because of object_ref.__new__ used it as a key), so I added some workarounds - which may cause the hash to be a fixed value, which doesn't look good...

for i in getattr(self, "_values", ()):
yield i

def __hash__(self):

This comment has been minimized.

@kmike

kmike Jul 24, 2014
Member

Providing __hash__ for mutable objects can lead to strange issues. Why is it needed?

@dangra
Copy link
Member

@dangra dangra commented Jul 24, 2014

I see how __hash__ is needed and can't see a solution yet. Let me think about it but if object_ref is going to bother us we can talk about removing it from BaseItem base classes.

what is calling __iter__ before __init__?

@dangra
Copy link
Member

@dangra dangra commented Jul 24, 2014

@kmike, it is needed because of scrapy.utils.trackref#L30

In our case DictItem.__hash__ should return hash(self._values).

@kmike
Copy link
Member

@kmike kmike commented Jul 24, 2014

Is it for weakrefs? WeakKeyDictionary should use references as keys, not objects directly, so there shouldn't be a requirement for object to be hashable.

@kmike
Copy link
Member

@kmike kmike commented Jul 24, 2014

weakref docs says that "Several built-in types such as list and dict do not directly support weak references but can add support through subclassing". Did it stop working because DictItem stopped being a subclass of dict?

@dangra
Copy link
Member

@dangra dangra commented Jul 24, 2014

DictItem never was a dict subclass

happens that DictMixin doesnt define a __hash__ method so BaseItem.__hash__ is called.

MutableMapping does define a __hash__ attribute set to None than when called raise TypeError and that is enough to declare the instances as not hashable.

we can keep old behavior with:

    def __hash__(self):
        return BaseItem.__hash__(self)
@kmike
Copy link
Member

@kmike kmike commented Jul 24, 2014

ah, right, it was not even a subclass of UserDict (and UserDict is also not a subclass of dict :)

Another way to keep old behaviour: __hash__ = BaseItem.__hash__

@dangra
Copy link
Member

@dangra dangra commented Jul 24, 2014

The following patch pass tests:

$ git diff
diff --git a/scrapy/item.py b/scrapy/item.py
index ec41b21..1f1e8ac 100644
--- a/scrapy/item.py
+++ b/scrapy/item.py
@@ -76,14 +76,10 @@ class DictItem(MutableMapping, BaseItem):
         return len(self._values)

     def __iter__(self):
-        for i in getattr(self, "_values", ()):
-            yield i
+        return iter(self._values)

     def __hash__(self):
-        if hasattr(self, "_values"):
-            return hash(frozenset(self._values.items()))
-        else:
-            return 1
+        return BaseItem.__hash__(self)

     def keys(self):
         return self._values.keys()
diff --git a/tox.ini b/tox.ini
index 2bd9732..002a778 100644
--- a/tox.ini
+++ b/tox.ini
@@ -54,6 +54,10 @@ deps =
     :HPK:pytest>2.5.2
     pytest-twisted

+[testenv:py34]
+basepython = python3.4
+deps = {[testenv:py33]deps}
+
 [testenv:windows]
 commands =
     bin/runtests.bat []
$ tox -e py34 scrapy/tests/test_item.py 
GLOB sdist-make: /home/daniel/src/scrapy/setup.py
py34 create: /home/daniel/src/scrapy/.tox/py34
py34 installdeps: twisted >= 14.0.0, lxml>=3.2.4, pyOpenSSL>=0.13.1, cssselect>=0.9, queuelib>=1.1.1, w3lib>=1.5, mock, :HPK:pytest>2.5.2, pytest-twisted
py34 inst: /home/daniel/src/scrapy/.tox/dist/Scrapy-0.25.1.zip
py34 runtests: PYTHONHASHSEED='1975113793'
py34 runtests: commands[0] | py.test --twisted scrapy/tests/test_item.py
======================================= test session starts =======================================
platform linux -- Python 3.4.1 -- py-1.4.22 -- pytest-2.6.0
plugins: twisted
collected 12 items 

scrapy/tests/test_item.py ............

==================================== 12 passed in 0.11 seconds ====================================
_____________________________________________ summary _____________________________________________
  py34: commands succeeded
  congratulations :)
@dangra
Copy link
Member

@dangra dangra commented Jul 24, 2014

@kmike alternative to __hash__ method is OK for me too.

@dangra
Copy link
Member

@dangra dangra commented Jul 24, 2014

@felixonmars : what do you think if we use six.add_metaclass() instead of six.with_metaclass(). The former doesn't create an intermediate class.

@kmike
Copy link
Member

@kmike kmike commented Jul 24, 2014

@dangra i thought it was fixed in six 1.7 (https://bitbucket.org/gutworth/six/issue/66/replace-the-implementation-of). Not sure why aren't docs on pythonhosted.org up to date (people, use RTFD instead).

@kmike
Copy link
Member

@kmike kmike commented Jul 24, 2014

.. but we require six >= 1.5.2, so it is better to either update six version or to use add_metaclass as Daniel suggested.

@dangra dangra merged commit dbc9b37 into scrapy:master Jul 24, 2014
1 check failed
1 check failed
continuous-integration/travis-ci The Travis CI build failed
Details
@dangra
Copy link
Member

@dangra dangra commented Jul 24, 2014

I merged it with the changes we discussed here.
thanks!

@felixonmars
Copy link
Contributor Author

@felixonmars felixonmars commented Jul 25, 2014

That was great! I just woke up and see these :D

@felixonmars felixonmars deleted the felixonmars:py3-port branch Apr 24, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

None yet

3 participants