-
Notifications
You must be signed in to change notification settings - Fork 908
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Expose get_json_object_options to Python #11180
Expose get_json_object_options to Python #11180
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks really good! Nice work, @SrikarVanavasam!
OK to test |
ok to test |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @SrikarVanavasam, thanks for the PR. I have a few suggestions for improvement.
rerun tests |
Codecov Report
@@ Coverage Diff @@
## branch-22.08 #11180 +/- ##
===============================================
Coverage ? 86.30%
===============================================
Files ? 144
Lines ? 22699
Branches ? 0
===============================================
Hits ? 19590
Misses ? 3109
Partials ? 0 Continue to review full report at Codecov.
|
@gpucibot merge |
@gpucibot merge |
def test_string_get_json_object_allow_single_quotes(): | ||
gs = cudf.Series( | ||
[ | ||
""" | ||
{ | ||
"store":{ | ||
"book":[ | ||
{ | ||
'author':"Nigel Rees", | ||
"title":'Sayings of the Century', | ||
"price":8.95 | ||
}, | ||
{ | ||
"category":"fiction", | ||
"author":"Evelyn Waugh", | ||
'title':"Sword of Honour", | ||
"price":12.99 | ||
} | ||
] | ||
} | ||
} | ||
""" | ||
] | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps prefer
@pytest.fixture
def gs():
return cudf.Series(...)
def test-string_get_json_object_allow_single_quotes(gs)
...
To avoid repeating the definition of the series object in each test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good suggestion. @vyasr would this be in line with the recommendations for writing test? Or would this be better written in a pytest-case
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think @wence-'s suggestion is already a big improvement here. I would indeed prefer to use cases, but before making the argument for, I want to be clear on the fact that the difference between cases and fixtures is almost entirely one of semantics and providing nicer syntactic sugar. @wence- perhaps the discussion will help you give me some suggestions on how to improve with regards to your comment on my other PR, or maybe it'll provide you ammunition for a counterpoint that makes me change my mind!
Here's how I would write something like this with cases:
def json_object_case_single_quotes():
return cudf.Series(...)
@parametrize_with_cases(gs, cases=".", prefix="json_object_case_")
def test_string_get_json_object(gs):
....
@parametrize_with_cases("gs", cases=".", prefix="json_object_case_")
def test_string_get_json_object(gs):
....
@parametrize_with_cases("gs", cases=".", prefix="json_object_case_")
def test_string_get_json_object_strip_quotes(gs):
Some notes:
- I've included the cases in the same file here for brevity, but for a test file
test_foo.py
cases will be automatically detected fromtest_foo_cases.py
. This has the benefit of enforcing some organization that you would otherwise have to do manually. The alternatives are cluttering the test file (or worse, conftest.py). - Cases are designed for parametrization, whereas fixtures are not. By that, what I mean is that a test runs on a single fixture, not many. (pytest-cases actually allows you to relax this restriction in some ways, but not as naturally as with cases. Also, it leads to cognitive dissonance since that's a way in which pytest-cases behaves differently from pytest rather than just adding functionality, so it could confuse users)
- With cases you can select in various ways: prefix is an obvious choice, but you can also use globs or tag specific cases, for instance. That makes it much easier to create a suite of cases for a particular function, then write a set of tests for that function, and then create a many-to-many mapping between cases and the tests that those cases should be used for.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So let me see if I understand. In this instance, I think there is no difference in the behaviour. So I will try and explain what I think and then provide an example that shows differences?
A pytest.fixture
is requested by a test by name.
@pytest.fixture
def foo():
pass
def test_foo(foo):
# fixture foo is invoked
So any parametrisation of fixtures happens by params:
@pytest.fixture(params=[1,2,3])
def foo(request):
return request.param
def test_foo(foo):
# invoked three times with foo=1, foo=2, foo=3
In the example we're discussing, the object we're testing is not parameterised by anything, so we're using a fixture for OAOO, and to avoid eager initialisation. This is better than just defining a module-level variable in a number of ways (including that the object is recreated for each test in turn, so stateful modifications between tests are not a problem).
In contrast, I think for the pytest-cases
approach if I want a test argument to take multiple values I would write separate cases:
def my_case_the_first():
...
def my_case_the_second():
...
@parameterize_with_cases("foo", cases=".", prefix="my_case_")
def test_foo(foo):
... # test is called twice, first with foo=my_case_the_first(), then with foo=my_case_the_second()
So I can create a bunch of test cases and then select how I use them in my test functions.
So now my fixtures can have descriptive names, but I can bind them locally in the test to something short: I agree this makes fixtures more self-documenting.
I guess the other big thing that pytest-cases brings is that it augments the algebra of parameterized fixtures to union of sets rather than only cartesian product (though we're not using that here).
So. In sum, I can see the advantages for cases in the general case (hah!) when building lots of fixtures that one will use for many test functions (and need to manipulate in various ways).
In this particular instance, I'm not sold on it, since it seems to introduce more boilerplate than really necessary. Perhaps just using @fixture
from pytest-cases
rather than pytest
would be reasonable in this case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this particular instance, I'm not sold on it, since it seems to introduce more boilerplate than really necessary.
Just to clarify, the boilerplate that you're referring to is the need to decorate each function with parametrize_with_cases
, right? And there's not much benefit here because 1) we only have a single case, and 2) the exact same case is used in a large number of tests? If so, I think that I agree.
Using some of your points, let's try to come up with some reasonable guidelines for when using cases vs fixtures.
Use fixtures when:
- Many tests must be run on exactly the same input
- Many tests must be run on exactly the same set of inputs, and all of those inputs can be constructed as simple parametrizations.
Use cases when:
- A test must be run on many different kinds of inputs
- Given a set of cases, different tests need to run on different subsets with a nonempty intersection
- Many tests must be run on exactly the same set of inputs, and all of those inputs can be constructed as
simplecomplex parametrizations.
We would also need to clearly define "simple parametrization". To me, it's something where a single scalar parameter varies, and where that parameter is directly passed to some other function (like a constructor). Something like this:
@pytest.fixture(params=['a', 'b'])
def foo(request):
if request.param == 'a':
# Some complex initialization
elif request.param == 'b':
# Some other complex initialization
would constitute a fixture that should instead be implemented as separate cases.
Do these align with your expectations @wence- ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I think that's what I expect. Although I note that the last two points of the "use X when:" lists are the same, was the cases one meant to say "complex parameterizations" ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct, that is what I meant. Fixed.
This PR exposes
get_json_object_options
to the Python API. Addresses #10196