Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(bigquery): unit and system test for dataframe with int column with Nan values #39

Merged
merged 12 commits into from May 13, 2020

Conversation

@HemangChothani
Copy link
Contributor

@HemangChothani HemangChothani commented Feb 18, 2020

Fixes #22

@q-logic q-logic force-pushed the bigquery_issue_22 branch from 12345bb to b906717 Feb 18, 2020
@HemangChothani HemangChothani changed the title Bigquery issue 22 feat(bigquery): unit and system test for dataframe with int column with Nan values Feb 18, 2020
Copy link
Contributor

@plamut plamut left a comment

I think it's good, just have two suggestion on improving the skipIf decorators.

Loading

tests/system.py Outdated
@@ -736,6 +736,65 @@ def test_load_table_from_dataframe_w_automatic_schema(self):
)
self.assertEqual(table.num_rows, 3)

@unittest.skipIf(
pandas is None or pandas.__version__ < "1.0.1",
"Only `pandas version >=1.0.1` supports",
Copy link
Contributor

@plamut plamut Feb 19, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"Only `pandas version >=1.0.1` supports",
"Only `pandas version >=1.0.1` supported",

Seems like more correct? (disclaimer: not a native speaker)

Loading

tests/system.py Outdated
@@ -736,6 +736,65 @@ def test_load_table_from_dataframe_w_automatic_schema(self):
)
self.assertEqual(table.num_rows, 3)

@unittest.skipIf(
pandas is None or pandas.__version__ < "1.0.1",
Copy link
Contributor

@plamut plamut Feb 19, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thought - I think comparing plain version strings will actually work in this particular case, but cannot be generalized (e.g. somebody copies this pattern over to another test, or if we have to change the version for some reason).

Let's instead add a library like packaging as a test dependency, and use that for parsing and semantically comparing versions?

Loading

Copy link
Contributor

@plamut plamut Feb 19, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW, why pandas version 1.0.1+, as the code sample from the issue description also seems to works in pandas 1.0.0?

Loading

Copy link
Contributor Author

@HemangChothani HemangChothani Feb 20, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, but need suggestion on version comparison like should i use parse(pandas.__version__) < parse("1.0.0") or should i change the pandas required minimum version in setup.py file and skip that if PY2 or something else?

Loading

Copy link
Contributor

@plamut plamut Feb 20, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the issue was due to a bug in Pandas, it's probably best to directly compare the version of Pandas rather than a proxy value such as the Python version.

Re: bumping the version - if the new Pandas version is still compatible with Python 2.7, we can bump it (we have tests...), otherwise we will have to wait until April or so when we officially drop Python 2 support (IIRC).

Loading

Copy link
Contributor Author

@HemangChothani HemangChothani Feb 20, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

New version of pandas isn't compatible with Python 2.7, it Requires: Python >=3.6.1.

Loading

Copy link
Contributor

@plamut plamut Feb 20, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. We could still conditionally bump the pandas version for Python 3.5+ in setup.py, though, to at least guarantee correct behavior in Python 3, even though we cannot fix the bug in Python 2.7 due to pandas incompatibility.

@tswast What do you think?

P.S.: If there is an efficient way of detecting whether a column contains a NaN value, we could also emit a user warning if too old a pandas version is installed... just a thought.

Loading

Copy link
Contributor

@plamut plamut Feb 26, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tswast bump

Loading

Copy link
Contributor

@tswast tswast Feb 26, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have setuptools as a required dependency, so why aren't you using pkg_resources?

https://github.com/pydata/pandas-gbq/blob/97e9a9ee3a8f65ede29f25954b1af3f3bc606f58/pandas_gbq/gbq.py#L46

Loading

Copy link
Contributor

@tswast tswast Feb 26, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A lot of places (including Google) are pretty slow to update pandas, so I'd hesitate to require 1.0+ just yet.

Loading

tests/system.py Outdated
"Only `pandas version >=1.0.1` is supported",
)
@unittest.skipIf(pyarrow is None, "Requires `pyarrow`")
def test_load_table_from_dataframe_w_none_value(self):
Copy link
Contributor

@tswast tswast Feb 26, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The nullable Int64 data type is the real reason for the test. There are plenty of existing tests & samples with loading None values object dtype.

Loading

plamut
plamut approved these changes Mar 4, 2020
@HemangChothani HemangChothani requested a review from shollyman Mar 17, 2020
@plamut
Copy link
Contributor

@plamut plamut commented Apr 2, 2020

@shollyman Friendly ping. :)

@HemangChothani Please just bring the branch up to date in the meanwhile, thanks!

Loading

@HemangChothani
Copy link
Contributor Author

@HemangChothani HemangChothani commented Apr 29, 2020

@shollyman PTAL!

Loading

@HemangChothani
Copy link
Contributor Author

@HemangChothani HemangChothani commented May 11, 2020

@shollyman Friendly ping. :)

Loading

tests/system.py Outdated
@@ -125,6 +126,9 @@
(TooManyRequests, InternalServerError, ServiceUnavailable)
)

PANDAS_MINIUM_VERSION = pkg_resources.parse_version("1.0.0")
Copy link
Contributor

@shollyman shollyman May 12, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/MINIUM/MINIMUM/ and all the references

Loading

@HemangChothani HemangChothani merged commit 5fd840e into googleapis:master May 13, 2020
3 checks passed
Loading
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

5 participants