Expose get_json_object_options to Python #11180

SrikarVanavasam · 2022-06-30T21:39:48Z

This PR exposes get_json_object_options to the Python API. Addresses #10196

…ct_options_python

shwina

This looks really good! Nice work, @SrikarVanavasam!

shwina · 2022-07-06T15:09:26Z

OK to test

ajschmidt8 · 2022-07-06T16:14:17Z

ok to test

bdice

Hi @SrikarVanavasam, thanks for the PR. I have a few suggestions for improvement.

python/cudf/cudf/_lib/cpp/strings/json.pxd

python/cudf/cudf/_lib/strings/json.pyx

python/cudf/cudf/core/column/string.py

python/cudf/cudf/tests/test_string.py

ajschmidt8 · 2022-07-06T17:00:00Z

rerun tests

codecov · 2022-07-06T22:20:23Z

Codecov Report

❗ No coverage uploaded for pull request base (branch-22.08@f9ebaf1). Click here to learn what that means.
The diff coverage is n/a.

@@               Coverage Diff               @@
##             branch-22.08   #11180   +/-   ##
===============================================
  Coverage                ?   86.30%           
===============================================
  Files                   ?      144           
  Lines                   ?    22699           
  Branches                ?        0           
===============================================
  Hits                    ?    19590           
  Misses                  ?     3109           
  Partials                ?        0

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update f9ebaf1...3b9f14f. Read the comment docs.

python/cudf/cudf/_lib/strings/json.pyx

vyasr · 2022-07-11T21:45:02Z

@gpucibot merge

Multiple approving reviews

vyasr · 2022-07-11T21:45:57Z

@gpucibot merge

wence- · 2022-07-12T09:56:37Z

python/cudf/cudf/tests/test_string.py

+def test_string_get_json_object_allow_single_quotes():
+    gs = cudf.Series(
+        [
+            """
+            {
+                "store":{
+                    "book":[
+                        {
+                            'author':"Nigel Rees",
+                            "title":'Sayings of the Century',
+                            "price":8.95
+                        },
+                        {
+                            "category":"fiction",
+                            "author":"Evelyn Waugh",
+                            'title':"Sword of Honour",
+                            "price":12.99
+                        }
+                    ]
+                }
+            }
+            """
+        ]
+    )


Perhaps prefer

@pytest.fixture def gs(): return cudf.Series(...) def test-string_get_json_object_allow_single_quotes(gs) ...

To avoid repeating the definition of the series object in each test.

Good suggestion. @vyasr would this be in line with the recommendations for writing test? Or would this be better written in a pytest-case?

I think @wence-'s suggestion is already a big improvement here. I would indeed prefer to use cases, but before making the argument for, I want to be clear on the fact that the difference between cases and fixtures is almost entirely one of semantics and providing nicer syntactic sugar. @wence- perhaps the discussion will help you give me some suggestions on how to improve with regards to your comment on my other PR, or maybe it'll provide you ammunition for a counterpoint that makes me change my mind!

Here's how I would write something like this with cases:

def json_object_case_single_quotes(): return cudf.Series(...) @parametrize_with_cases(gs, cases=".", prefix="json_object_case_") def test_string_get_json_object(gs): .... @parametrize_with_cases("gs", cases=".", prefix="json_object_case_") def test_string_get_json_object(gs): .... @parametrize_with_cases("gs", cases=".", prefix="json_object_case_") def test_string_get_json_object_strip_quotes(gs):

Some notes:

I've included the cases in the same file here for brevity, but for a test file test_foo.py cases will be automatically detected from test_foo_cases.py. This has the benefit of enforcing some organization that you would otherwise have to do manually. The alternatives are cluttering the test file (or worse, conftest.py).

Cases are designed for parametrization, whereas fixtures are not. By that, what I mean is that a test runs on a single fixture, not many. (pytest-cases actually allows you to relax this restriction in some ways, but not as naturally as with cases. Also, it leads to cognitive dissonance since that's a way in which pytest-cases behaves differently from pytest rather than just adding functionality, so it could confuse users)

With cases you can select in various ways: prefix is an obvious choice, but you can also use globs or tag specific cases, for instance. That makes it much easier to create a suite of cases for a particular function, then write a set of tests for that function, and then create a many-to-many mapping between cases and the tests that those cases should be used for.

So let me see if I understand. In this instance, I think there is no difference in the behaviour. So I will try and explain what I think and then provide an example that shows differences?

A pytest.fixture is requested by a test by name.

@pytest.fixture def foo(): pass def test_foo(foo): # fixture foo is invoked

So any parametrisation of fixtures happens by params:

@pytest.fixture(params=[1,2,3]) def foo(request): return request.param def test_foo(foo): # invoked three times with foo=1, foo=2, foo=3

In the example we're discussing, the object we're testing is not parameterised by anything, so we're using a fixture for OAOO, and to avoid eager initialisation. This is better than just defining a module-level variable in a number of ways (including that the object is recreated for each test in turn, so stateful modifications between tests are not a problem).

In contrast, I think for the pytest-cases approach if I want a test argument to take multiple values I would write separate cases:

def my_case_the_first(): ... def my_case_the_second(): ... @parameterize_with_cases("foo", cases=".", prefix="my_case_") def test_foo(foo): ... # test is called twice, first with foo=my_case_the_first(), then with foo=my_case_the_second()

So I can create a bunch of test cases and then select how I use them in my test functions.

So now my fixtures can have descriptive names, but I can bind them locally in the test to something short: I agree this makes fixtures more self-documenting.

I guess the other big thing that pytest-cases brings is that it augments the algebra of parameterized fixtures to union of sets rather than only cartesian product (though we're not using that here).

So. In sum, I can see the advantages for cases in the general case (hah!) when building lots of fixtures that one will use for many test functions (and need to manipulate in various ways).

In this particular instance, I'm not sold on it, since it seems to introduce more boilerplate than really necessary. Perhaps just using @fixture from pytest-cases rather than pytest would be reasonable in this case.

In this particular instance, I'm not sold on it, since it seems to introduce more boilerplate than really necessary.

Just to clarify, the boilerplate that you're referring to is the need to decorate each function with parametrize_with_cases, right? And there's not much benefit here because 1) we only have a single case, and 2) the exact same case is used in a large number of tests? If so, I think that I agree.

Using some of your points, let's try to come up with some reasonable guidelines for when using cases vs fixtures.

Use fixtures when:

Many tests must be run on exactly the same input

Many tests must be run on exactly the same set of inputs, and all of those inputs can be constructed as simple parametrizations.

Use cases when:

A test must be run on many different kinds of inputs

Given a set of cases, different tests need to run on different subsets with a nonempty intersection

Many tests must be run on exactly the same set of inputs, and all of those inputs can be constructed as ~~simple~~ complex parametrizations.

We would also need to clearly define "simple parametrization". To me, it's something where a single scalar parameter varies, and where that parameter is directly passed to some other function (like a constructor). Something like this:

@pytest.fixture(params=['a', 'b']) def foo(request): if request.param == 'a': # Some complex initialization elif request.param == 'b': # Some other complex initialization

would constitute a fixture that should instead be implemented as separate cases.

Do these align with your expectations @wence- ?

Yes, I think that's what I expect. Although I note that the last two points of the "use X when:" lists are the same, was the cases one meant to say "complex parameterizations" ?

Correct, that is what I meant. Fixed.

SrikarVanavasam added 5 commits June 27, 2022 14:51

get_json_object_options

412ae7a

Merge remote-tracking branch 'origin/branch-22.08' into get_json_obje…

7291173

…ct_options_python

Merge branch 'branch-22.08' into get_json_object_options_python

b271bc8

Merge branch 'branch-22.08' into get_json_object_options_python

b638122

Added tests

dc54be1

SrikarVanavasam requested a review from a team as a code owner June 30, 2022 21:39

SrikarVanavasam requested review from trxcllnt and brandon-b-miller June 30, 2022 21:39

github-actions bot added the Python Affects Python cuDF API. label Jun 30, 2022

SrikarVanavasam added 3 - Ready for Review Ready for review by team improvement Improvement / enhancement to an existing function non-breaking Non-breaking change labels Jun 30, 2022

check-style

1c67a02

shwina approved these changes Jul 6, 2022

View reviewed changes

bdice previously requested changes Jul 6, 2022

View reviewed changes

ajschmidt8 added non-breaking Non-breaking change and removed non-breaking Non-breaking change labels Jul 6, 2022

ajschmidt8 changed the title ~~Expose get_json_object_options to Python~~ Expose get_json_object_options to Python Jul 6, 2022

reviewer feedback

a54aa9c

SrikarVanavasam requested a review from bdice July 6, 2022 20:29

shwina reviewed Jul 6, 2022

View reviewed changes

python/cudf/cudf/_lib/strings/json.pyx Outdated Show resolved Hide resolved

GetJsonObjectOptions constructor

4bcabb6

SrikarVanavasam requested a review from shwina July 7, 2022 16:31

shwina approved these changes Jul 7, 2022

View reviewed changes

trxcllnt approved these changes Jul 9, 2022

View reviewed changes

make GetJsonObjectOptions ctor keyword-only

3b9f14f

vyasr approved these changes Jul 11, 2022

View reviewed changes

rapids-bot bot merged commit 002cb1c into rapidsai:branch-22.08 Jul 11, 2022

wence- reviewed Jul 12, 2022

View reviewed changes

vyasr mentioned this pull request Jul 18, 2022

Add developer documentation for benchmarking #11122

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Expose get_json_object_options to Python #11180

Expose get_json_object_options to Python #11180

SrikarVanavasam commented Jun 30, 2022

shwina left a comment

shwina commented Jul 6, 2022

ajschmidt8 commented Jul 6, 2022

bdice left a comment

ajschmidt8 commented Jul 6, 2022

codecov bot commented Jul 6, 2022 •

edited

Loading

vyasr commented Jul 11, 2022

vyasr commented Jul 11, 2022

wence- Jul 12, 2022

shwina Jul 12, 2022

vyasr Jul 12, 2022

wence- Jul 13, 2022

vyasr Jul 14, 2022 •

edited

Loading

wence- Jul 19, 2022

vyasr Jul 20, 2022

Expose get_json_object_options to Python #11180

Expose get_json_object_options to Python #11180

Conversation

SrikarVanavasam commented Jun 30, 2022

shwina left a comment

Choose a reason for hiding this comment

shwina commented Jul 6, 2022

ajschmidt8 commented Jul 6, 2022

bdice left a comment

Choose a reason for hiding this comment

ajschmidt8 commented Jul 6, 2022

codecov bot commented Jul 6, 2022 • edited Loading

Codecov Report

vyasr commented Jul 11, 2022

vyasr commented Jul 11, 2022

wence- Jul 12, 2022

Choose a reason for hiding this comment

shwina Jul 12, 2022

Choose a reason for hiding this comment

vyasr Jul 12, 2022

Choose a reason for hiding this comment

wence- Jul 13, 2022

Choose a reason for hiding this comment

vyasr Jul 14, 2022 • edited Loading

Choose a reason for hiding this comment

wence- Jul 19, 2022

Choose a reason for hiding this comment

vyasr Jul 20, 2022

Choose a reason for hiding this comment

codecov bot commented Jul 6, 2022 •

edited

Loading

vyasr Jul 14, 2022 •

edited

Loading