Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixturize JSON tests #31191

Merged
merged 15 commits into from
Mar 19, 2020
Merged

Conversation

WillAyd
Copy link
Member

@WillAyd WillAyd commented Jan 22, 2020

Ultimately looking to split this module up into something like:

  • test_serializer.py
  • test_deserializer.py
  • test_roundtrip.py
  • test_orient.py

TBD on those being the actual modules, but for now starting with some parametrization

@WillAyd WillAyd added Testing pandas testing functions or related to the test suite IO JSON read_json, to_json, json_normalize labels Jan 22, 2020
@@ -681,10 +662,10 @@ def test_series_roundtrip_empty(self, orient, numpy):
tm.assert_series_equal(result, expected)

@pytest.mark.parametrize("numpy", [True, False])
def test_series_roundtrip_timeseries(self, orient, numpy):
data = self.ts.to_json(orient=orient)
def test_series_roundtrip_timeseries(self, orient, numpy, datetime_series):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this looks really similar to test_series_roundtrip_empty and test_series_roundtrip_object. would it make sense to something like any_series fixture for this?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea I think a great idea. I’m thinking worth doing after breaking out and setting up test_roundtrip.py with other parametrization unless a blocker here

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sounds good. This LGTM

Copy link
Member

@simonjayhawkins simonjayhawkins left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @WillAyd a few comments otherwise lgtm

@@ -474,12 +455,12 @@ def test_v12_compat(self):
df["modified"] = df["date"]
df.iloc[1, df.columns.get_loc("modified")] = pd.NaT

v12_json = os.path.join(self.dirpath, "tsframe_v012.json")
v12_json = os.path.join("tsframe_v012.json")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you could just use the filename string directly in the read_json call below?

df_unser = pd.read_json(v12_json)
tm.assert_frame_equal(df, df_unser)

df_iso = df.drop(["modified"], axis=1)
v12_iso_json = os.path.join(self.dirpath, "tsframe_iso_v012.json")
v12_iso_json = os.path.join("tsframe_iso_v012.json")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same

result = pd.read_json(data, typ="series", orient=orient, numpy=numpy)
expected = self.series.copy()
expected = string_series.copy()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

strictly don't need to copy as fixture creates new object for each invocation of test

result = pd.read_json(
data, typ="series", orient=orient, numpy=numpy, dtype=dtype
)
expected = self.objSeries.copy()
expected = object_series.copy()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same

result = pd.read_json(data, typ="series", orient=orient, numpy=numpy)
expected = self.empty_series.copy()
expected = empty_series.copy()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same

result = pd.read_json(data, typ="series", orient=orient, numpy=numpy)
expected = self.ts.copy()
expected = datetime_series.copy()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same

@@ -126,7 +126,7 @@ def series(index, _series_name, _static_values):


@pytest.fixture
def empty_series(series):
def empty_series_dti(series):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why change this? doesn't the series type depends on the _index_factory fixture.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Trying to avoid naming conflicts with the top level empty_series fixture, which I think is more general; would they not conflict if named the same?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see #24886, @jbrockmendel removed empty series fixture since it is a singleton and inlined instead. adding a top-level empty_series fixture is inconsistent with the series tests.

The fixture here is a composable fixture and more generic and IMO should not be renamed.

Does the fixture added in this PR need to be in the top level conftest or could it be just added to pandas/tests/io/json/test_pandas.py as a module level fixture?

@@ -93,13 +93,13 @@ def test_raises_on_non_datetimelike_index():

@all_ts
@pytest.mark.parametrize("freq", ["M", "D", "H"])
def test_resample_empty_series(freq, empty_series, resample_method):
def test_resample_empty_series(freq, empty_series_dti, resample_method):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIUC the all_ts decorator is running this test for date_range, period_range and timedelta_range, not just dti

@simonjayhawkins simonjayhawkins added this to the 1.1 milestone Jan 22, 2020
Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we had basically these same fixtures, then removed them, now are adding them back (in pandas/conftest.py)

can you see if these already exist in other conftests.py

we also need a script to deduplicate fixtures (and check this)

def setup(self, datapath):
self.dirpath = datapath("io", "json", "data")
@pytest.fixture(autouse=True)
def setup(self, datapath, monkeypatch):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what? why would you need to chdir?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We use this same idiom in the excel file tests to read in files directly from where they are stored. Can alternately use datapath directly in the tests at the cost of more code duplication

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a really strange pattern, would prefer to clean it up

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i find this a really strange pattern which is not standard and likely not even needed

pls fix

@WillAyd
Copy link
Member Author

WillAyd commented Jan 23, 2020

we had basically these same fixtures, then removed them, now are adding them back (in pandas/conftest.py)

With the exception of empty_series this isn't adding anything but instead moving from pandas/tests/series/conftest.py to the higher level conftest for re-use. Can alternately duplicate definitions in JSON conftest if we don't like them at the high level, but note that there are quite a few other tests areas that could reuse these rather than calling tm.make* functions directly

@jreback
Copy link
Contributor

jreback commented Jan 24, 2020

we had basically these same fixtures, then removed them, now are adding them back (in pandas/conftest.py)

With the exception of empty_series this isn't adding anything but instead moving from pandas/tests/series/conftest.py to the higher level conftest for re-use. Can alternately duplicate definitions in JSON conftest if we don't like them at the high level, but note that there are quite a few other tests areas that could reuse these rather than calling tm.make* functions directly

i think you are not seeing the big picture here. I am all for making fixtures top-level and not speciying them at a lower level. However, we have had a number of PRs moving them to lower levels, and then others moving them up.

Before we merge this we need a check to avoid duplication of fixtures.

@WillAyd
Copy link
Member Author

WillAyd commented Jan 24, 2020

Ah OK I get what you are asking. So running pytest pandas --fixtures and searching for these would manually confirm they aren't duplicated. I'll check with the pytest folks to see if they offer anything more robust

@jbrockmendel
Copy link
Member

needs rebase

@@ -648,7 +650,7 @@ def test_series_roundtrip_empty(self, orient, numpy, empty_series):
data = empty_series.to_json(orient=orient)
result = pd.read_json(data, typ="series", orient=orient, numpy=numpy)

# TODO: see what causes inconsistency
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

did this get resolved?

@jbrockmendel
Copy link
Member

unused import, otherwise LGTM

@jbrockmendel
Copy link
Member

needs rebase, ping on green

@jbrockmendel
Copy link
Member

cc @jreback im about ready to merge this, any objection?

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this needs some work

if u fixturize then remove the setup

def setup(self, datapath):
self.dirpath = datapath("io", "json", "data")
@pytest.fixture(autouse=True)
def setup(self, datapath, monkeypatch):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i find this a really strange pattern which is not standard and likely not even needed

pls fix

self.dirpath = datapath("io", "json", "data")
@pytest.fixture(autouse=True)
def setup(self, datapath, monkeypatch):
monkeypatch.chdir(datapath("io", "json", "data"))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is there even a setup left? this is very messy

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't get rid of the setup just yet because would still need to take care of all of the frame fixtures (this PR only has done Series)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But yea goal would be to get rid of it

@jreback
Copy link
Contributor

jreback commented Mar 15, 2020

can you merge master and see if you can avoid the monkeypatch

Copy link
Member

@simonjayhawkins simonjayhawkins left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd rather empty_series_dti wasn't named that way. see #31191 (comment) and #31191 (comment)

Having different fixtures with the same name shouldn't cause conflicts if the directory structure is properly organised and in this case I see no reason to rename it.

If you are renaming can you avoid the use of dti? maybe empty_series_datetimelike.

@WillAyd
Copy link
Member Author

WillAyd commented Mar 18, 2020

Having different fixtures with the same name shouldn't cause conflicts if the directory structure is properly organised and in this case I see no reason to rename it.

I don't think we are worried about this causing errors as much as being confusing; I don't know that we want to override fixtures in different packages, and Jeff was asking previously for some way to check that we explicitly don't do this (unless I misunderstood)

If you are renaming can you avoid the use of dti? maybe empty_series_datetimelike.

These are all empty series with datetimelike-typed indices, which is why I went for dti. I personally would misread series_datetimelike as referring to the values not the indices; @jbrockmendel since you have the most expertise in that area - can you offer an opinion?

@jbrockmendel
Copy link
Member

I personally would misread series_datetimelike as referring to the values not the indices

I would be at risk of this too.

This naming convention problem is not easy and is not unique to this PR, should not hold it up.

AFAICT there are three options for getting informative names:

  1. long names e.g. "datetime64_series_with_interval_index"
  2. adopt some predictable conventions to shorten these e.g. "dt64ser_ii"
  3. live with the not-really-that-bad status quo

@jreback
Copy link
Contributor

jreback commented Mar 19, 2020

lgtm. am a little concerted about changing to use empty_series_dti, but agree we need to fix this; so ok for now (maybe open a followup issue for discussion).

any more @simonjayhawkins or @jbrockmendel

@jbrockmendel
Copy link
Member

im a happy camper

@simonjayhawkins simonjayhawkins merged commit f6099bf into pandas-dev:master Mar 19, 2020
@simonjayhawkins
Copy link
Member

Thanks @WillAyd

SeeminSyed pushed a commit to CSCD01-team01/pandas that referenced this pull request Mar 22, 2020
jbrockmendel pushed a commit to jbrockmendel/pandas that referenced this pull request Mar 23, 2020
jbrockmendel pushed a commit to jbrockmendel/pandas that referenced this pull request Mar 25, 2020
@WillAyd WillAyd mentioned this pull request Mar 30, 2020
@WillAyd WillAyd deleted the json-test-cleanup branch April 12, 2023 20:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
IO JSON read_json, to_json, json_normalize Testing pandas testing functions or related to the test suite
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants