Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[MRG] Refactor test_hashing.py as per pytest design. #454

Merged
merged 17 commits into from
Dec 13, 2016

Conversation

kdexd
Copy link

@kdexd kdexd commented Dec 12, 2016

Third Phase PR on #411 (Succeeding PR #453)

This PR deals with refactor of tests in test_hashing.py to adopt the pytest flavor.

  • Parametrization of certain tests have been done by an accompanying fixture to protect numpy specific code using @with_numpy. Refer Move to py.test? #411 (comment) for more details.

  • Regex checking of tests has been done using pytest now.

  • tmpdir_path fixture has been removed and pytest's own tmpdir is brought back to used to keep uniformity across modules. Some older occurences (if any) will be taken care later.

  • Some cosmetic changes like reordering of imports has been included.

@kdexd
Copy link
Author

kdexd commented Dec 12, 2016

@lesteve I had kept this ready for you today ! Please review as per convenience 😄

@lesteve
Copy link
Member

lesteve commented Dec 12, 2016

The travis error indicates that with_numpy does not seem to work as expected, not sure why.

@lesteve
Copy link
Member

lesteve commented Dec 12, 2016

Note that the failing build is the one where numpy is not installed.

@kdexd
Copy link
Author

kdexd commented Dec 12, 2016

@lesteve I knew there was a problem with skipif marker. I guess the old with_numpy cannot be changed. Old with_numpy works perfectly fine.

# of same name due to 'indirect' keyword used here
# Expected results have been generated with joblib 0.9.0
[(0, {'py2': '80f2387e7752abbda2658aafed49e086',
'py3': '10a6afc379ca2708acfbaef0ab676eab'}),
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lesteve I cannot write it as:

@with_numpy
@parametrize(['to_hash', 'expected'],
             [(np.arange(100, dtype='<i8').reshape((10, 10))[:, :2],
               {'py2': '80f2387e7752abbda2658aafed49e086',
                'py3': '10a6afc379ca2708acfbaef0ab676eab'}), .... #others])
def method_name(to_hash, expected):
    # method body
    pass

I cannot use np in @parametrize args... They are still exposed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you not use a function that returns the parameters i.e. something like:

def input_params():
    return [(np.arange(100, dtype='<i8').reshape((10, 10))[:, :2],
               {'py2': '80f2387e7752abbda2658aafed49e086',
                'py3': '10a6afc379ca2708acfbaef0ab676eab'}), .... #others]

@with_numpy
@parametrize(['to_hash', 'expected'], input_params())
def method_name:
    ...

Alternatively it seems like the pytest issue you pointed at was skipping parameters inside the input_params function (or at least that is what I wildly guessed from reading it).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alternatively it seems like the pytest issue you pointed at was skipping parameters inside the input_params function (or at least that is what I wildly guessed from reading it).

I am reasonably confident the first option is not going to work but the latter one will. Here is a snippet that seems to work for me:

import pytest

try:
    import bla
except ImportError:
    bla = None


def params():
    skip = pytest.mark.skipif(bla is None, reason="bla not installed")
    return [skip('a'), skip('b')]


@pytest.mark.parametrize("param", params())
def test_foo(param):
    assert False

There may be better ways to do it though.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well yes, the first one did not work, I tested it. It is because @parametrize runs at setup and the function call also happens at setup. But skipping does not happen. Which is why the second snippet you gave works, because pytest allows providing skipif as parameters to parametrize.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here is a snippet that seems to work for me

Shall we go this way ? Giving out skipif markers ? Just because we have got one more alternative, out of the two I will vote the already done implementation because I am assuming it would look familiar to eyes of people familair with pytest (maybe ?)

Here is a link to standard example in documentation:

http://doc.pytest.org/en/latest/example/parametrize.html#apply-indirect-on-particular-arguments

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed I feel using parametrize in this case is too convoluted. So just use a dumb for loop.

@kdexd kdexd force-pushed the pytestify-test-hashing branch 2 times, most recently from ab8f008 to a5cb2c2 Compare December 12, 2016 16:06
@kdexd
Copy link
Author

kdexd commented Dec 12, 2016

The travis error indicates that with_numpy does not seem to work as expected, not sure why.

@lesteve: Although I fixed that error, there are yield based tests in test_memory.py which keep on misbehaving. So with_numpy is still the old standard decorator for this PR. I will convert it to skipif marker later.

So for now, with_numpy is unchanged and everything is green now...

import tempfile
import os
import sys
import collections
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't modify the imports order. If you have a plugin that orders imports alphabetically, disable it.

"""Smoke test hash on various types."""
# Check that 2 objects have the same hash only if they are the same.
if obj1 is obj2:
assert hash(obj1) == hash(obj2)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nope you are not testing the same as the test as it was previously this should be:

is_hash_equal = (hash(obj1) == hash(obj2)) 
assert is_hash_equal == ob1 is obj2

""" Test hashing with numpy arrays.
def three_np_arrays():
"""
Returns three numpy arrays where first two are same and third is a bit
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Docstring not PEP0258.

But it seems like having a fixture with with_numpy is a good work-around.

arr1, arr2, arr3 = three_np_arrays

# Only same arrays will have same hash
assert hash(arr1) == hash(arr2)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You modified the test here too, keep it as it was:

for obj1, obj2 in itertools.product(obj_list, obj_list):
    is_hash_equal = hash(obj1) == hash(obj2)
    is_array_equal = np.all(obj1 == obj2)
    assert is_hash_equal == is_array_equal

[('This is a string to hash',
{'py2': '80436ada343b0d79a99bfd8883a96e45',
'py3': '71b3f47df22cb19431d85d92d0b230b2'}),

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do not skip new lines, it seems a bit weird to me.

'108f6ee98e7db19ea2006ffd208f4bf1',
'bd48ccaaff28e16e6badee81041b7180']}
# Return member of list requested by parametrize marker in test below
return to_hash_list[request.param]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great try but I think this makes the test quite hard to understand. I spent 10 minutes trying to put together something using pytest features but did not get anywhere.

I would be in favour of not using parametrize in this case and do something very close to what was originally here:

@with_numpy
def test_hashes_stay_the_same_with_numpy_objects:
    to_hash_list = ...
    expected_dict = ...
    for to_hash, expected in zip(to_hash_list, expected_list):
        assert hash(to_hash), expected

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be honest, my general feeling so far is that parametrize can make the test more obscure than it was. The only advantage compared to having a loop in the test is to create one test for each input argument, so I feel parametrize should be used sparingly.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But won't this for loop stop if the first test case itself is incorrect ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's fine. Running a separate test for each parameter is nice but not that crucial. If it makes the test hard to understand then we should give up on using parametrize.

hash(a, coerce_mmap=coerce_mmap) ==
hash(m, coerce_mmap=coerce_mmap),
coerce_mmap)
assert ((hash(a, coerce_mmap=coerce_mmap) ==
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Define a temporary variable to make this less obscure to read:

is_hash_equal = hash(a, coerce_mmap=coerce_mmap) == hash(m, coerce_mmap==coerce_mmap)
assert is_hash_equal == coerce_mmap

@@ -93,8 +93,8 @@ def f(self, x):
def test_trival_hash(obj1, obj2):
"""Smoke test hash on various types."""
# Check that 2 objects have the same hash only if they are the same.
is_hash_equal = (hash(obj1) == hash(obj2))
assert is_hash_equal == (obj1 is obj2)
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this second paranthesis is needed. Not having them failed tests for me. Because probably is_hash_equal == obj1 is obj2 will be true if all are not None.

@lesteve
Copy link
Member

lesteve commented Dec 13, 2016

Oops I broke the tests by removing the parentheses, sorry about that, I'll fix them.

and rename variables.
@lesteve
Copy link
Member

lesteve commented Dec 13, 2016

Test should be fixed now. Turns out I was wrong about operators priority ..

@kdexd
Copy link
Author

kdexd commented Dec 13, 2016

I am removing the parametrization from test_hashes_stay_the_same_with_numpy_objects but I insist to keep expected in a dictionary form per test case ( {'py2': hash, 'py3': hash} as it is in @parametrize of other tests, as I will locally try to refactor this test in some acceptable and good looking manner.

py_version_str = 'py3' if PY3_OR_LATER else 'py2'
expected_list = expected_dict[py_version_str]
# Expected results have been generated with joblib 0.9.0
expected_list = [{'py2': '80f2387e7752abbda2658aafed49e086',
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you leave the expected_dict definition as it was. Remember: look at your diff and do not add changes if there is not a good reason for it.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am experimenting ways to parametrize this in a good way so I kept this diff. Nevermind, I shifted this to a separate branch.

def test_hash_numpy():
""" Test hashing with numpy arrays.
def three_np_arrays():
"""Returns three numpy arrays where first two are same and third is a bit
Copy link
Member

@lesteve lesteve Dec 13, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would remove the docstring, I don't think it adds anything.

def test_hash_numpy_arrays(three_np_arrays):
arr1, arr2, arr3 = three_np_arrays

# Only same arrays will have same hash
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would remove this comment the code is clear enough and it does not say much.


yield assert_not_equal, hash(arr1), hash(arr1.T)
# Two dicts will have same hash if they have exact same key value pairs
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove this comment too.

# Check that 2 objects have the same hash only if they are
# the same.
yield assert_equal, hash(obj1) == hash(obj2), obj1 is obj2
@parametrize('obj1, obj2', list(itertools.product(
Copy link
Member

@lesteve lesteve Dec 13, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a comment I found out while reading the doc that you can do:

input_list = [...]
@parametrize('obj1', input_list)
@parametrize('obj2', input_list)
def test_trivial_hash(obj1, obj2):
    ...

and this will have the same effect, i.e. iterating on the cartesian product of input_list with itself.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought of doing this, but then I stopped, thinking that for the first look I should not demonstrate my diff which introduces a not so special list in global scope.

But now we're on the same boat, so modifying it...

@@ -460,13 +444,13 @@ def test_hashes_stay_the_same_with_numpy_objects():
expected_list = expected_dict[py_version_str]

for to_hash, expected in zip(to_hash_list, expected_list):
yield assert_equal, hash(to_hash), expected
assert hash(to_hash) == expected[py_version_str]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You need to remove [py_version_str] to get the tests to pass I believe.



@with_numpy
@skipif(sys.platform == 'win32', reason='This test is not stable under windows'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm I missed that is that a new thing?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK it's not I see it in the diff below, sorry.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, cool 😄

@lesteve
Copy link
Member

lesteve commented Dec 13, 2016

LGTM, merging, thanks a lot!

@lesteve lesteve merged commit 117009d into joblib:master Dec 13, 2016
@kdexd kdexd deleted the pytestify-test-hashing branch December 13, 2016 12:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants