Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: use pandas function to check for NaN #750

Merged
merged 4 commits into from Jul 12, 2021
Merged

fix: use pandas function to check for NaN #750

merged 4 commits into from Jul 12, 2021

Conversation

@LinuxChristian
Copy link
Contributor

@LinuxChristian LinuxChristian commented Jul 11, 2021

Starting with pandas 1.0, an experimental pandas.NA value (singleton) is available to represent scalar missing values as
opposed to numpy.nan. Comparing the variable with itself results in a pandas.NA value that doesn't support type-casting
to boolean. Using the build-in pandas.isna function handles all pandas supported NaN values (None, pd.NA, np.nan, NaT).

Thank you for opening a Pull Request! Before submitting your PR, there are a few things you can do to make sure it goes smoothly:

  • Make sure to open an issue as a bug/issue before writing your code! That way we can discuss the change, evaluate designs, and agree on the general idea
  • Ensure the tests and linter pass
  • Code coverage does not decrease (if any source code was changed)
  • Appropriate docs were updated (if necessary)

Fixes #729

Starting with pandas 1.0, an experimental pandas.NA value (singleton) is available to represent scalar missing values as
opposed to numpy.nan. Comparing the variable with itself results in a pandas.NA value that doesn't support type-casting
to boolean. Using the build-in pandas.isna function handles all pandas supported NaN values.
@LinuxChristian LinuxChristian requested review from as code owners Jul 11, 2021
@LinuxChristian LinuxChristian requested review from loferris and removed request for Jul 11, 2021
@google-cla google-cla bot added the cla: yes label Jul 11, 2021
Copy link
Contributor

@plamut plamut left a comment

Thanks for the fix and the test, it generally looks good! Just made two improvement suggestions.

Loading

tests/unit/test__pandas_helpers.py Show resolved Hide resolved
Loading
tests/unit/test__pandas_helpers.py Outdated Show resolved Hide resolved
Loading
@plamut
Copy link
Contributor

@plamut plamut commented Jul 12, 2021

FWIW, the prerelease-deps failure is flakiness, while Python 3.6 unit tests failure is legitimate - we use the oldest supported depenency versions there, and the new test failed with an attribute error:

 AttributeError: module 'pandas' has no attribute 'NA'

Loading

@LinuxChristian
Copy link
Contributor Author

@LinuxChristian LinuxChristian commented Jul 12, 2021

I added one commit per comment so it's easier to review. If you would like me to squash a final version let me know.

Loading

@pytest.mark.skipif(pandas is None, reason="Requires `pandas`")
@pytest.mark.skipIf(
pandas is None or PANDAS_INSTALLED_VERSION < PANDAS_MINIUM_VERSION,
reason="Requires `pandas version >= 1.0.0` which introduces pandas.NA",
Copy link
Contributor

@plamut plamut Jul 12, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Neat to also include the reason why a minimum version is needed. 👍

Loading

plamut
plamut approved these changes Jul 12, 2021
Copy link
Contributor

@plamut plamut left a comment

LGTM, thanks for the quick update!

Nothing to worry about squashing the commits, we do that when the PR is merged.

Loading

@plamut
Copy link
Contributor

@plamut plamut commented Jul 12, 2021

It appears that the test was not skipped?

    def test_dataframe_to_json_generator(module_under_test):
        utcnow = datetime.datetime.utcnow()
        df_data = collections.OrderedDict(
            [
>               ("a_series", [pandas.NA, 2, 3, 4]),
                ("b_series", [0.1, float("NaN"), 0.3, 0.4]),
                ("c_series", ["a", "b", pandas.NA, "d"]),
                ("d_series", [utcnow, utcnow, utcnow, pandas.NaT]),
                ("e_series", [True, False, True, None]),
            ]
        )
E       AttributeError: 'NoneType' object has no attribute 'NA'

Also reproducible locally.

Edit: Ah, typo in the decorator name, I actually remember seeing that before. Fixed it myself, should be fine now.

Loading

tests/unit/test__pandas_helpers.py Outdated Show resolved Hide resolved
Loading
@plamut plamut merged commit 67bc5fb into googleapis:master Jul 12, 2021
12 checks passed
Loading
gcf-merge-on-green bot pushed a commit that referenced this issue Jul 13, 2021
Supersedes #711.


## [2.21.0](https://www.github.com/googleapis/python-bigquery/compare/v2.20.0...v2.21.0) (2021-07-13)


### Features

* Add max_results parameter to some of the `QueryJob` methods. ([#698](https://www.github.com/googleapis/python-bigquery/issues/698)) ([2a9618f](https://www.github.com/googleapis/python-bigquery/commit/2a9618f4daaa4a014161e1a2f7376844eec9e8da))
* Add support for decimal target types. ([#735](https://www.github.com/googleapis/python-bigquery/issues/735)) ([7d2d3e9](https://www.github.com/googleapis/python-bigquery/commit/7d2d3e906a9eb161911a198fb925ad79de5df934))
* Add support for table snapshots. ([#740](https://www.github.com/googleapis/python-bigquery/issues/740)) ([ba86b2a](https://www.github.com/googleapis/python-bigquery/commit/ba86b2a6300ae5a9f3c803beeb42bda4c522e34c))
* Enable unsetting policy tags on schema fields. ([#703](https://www.github.com/googleapis/python-bigquery/issues/703)) ([18bb443](https://www.github.com/googleapis/python-bigquery/commit/18bb443c7acd0a75dcb57d9aebe38b2d734ff8c7))
* Make it easier to disable best-effort deduplication with streaming inserts. ([#734](https://www.github.com/googleapis/python-bigquery/issues/734)) ([1246da8](https://www.github.com/googleapis/python-bigquery/commit/1246da86b78b03ca1aa2c45ec71649e294cfb2f1))
* Support passing struct data to the DB API. ([#718](https://www.github.com/googleapis/python-bigquery/issues/718)) ([38b3ef9](https://www.github.com/googleapis/python-bigquery/commit/38b3ef96c3dedc139b84f0ff06885141ae7ce78c))


### Bug Fixes

* Inserting non-finite floats with `insert_rows()`. ([#728](https://www.github.com/googleapis/python-bigquery/issues/728)) ([d047419](https://www.github.com/googleapis/python-bigquery/commit/d047419879e807e123296da2eee89a5253050166))
* Use `pandas` function to check for `NaN`. ([#750](https://www.github.com/googleapis/python-bigquery/issues/750)) ([67bc5fb](https://www.github.com/googleapis/python-bigquery/commit/67bc5fbd306be7cdffd216f3791d4024acfa95b3))


### Documentation

* Add docs for all enums in module. ([#745](https://www.github.com/googleapis/python-bigquery/issues/745)) ([145944f](https://www.github.com/googleapis/python-bigquery/commit/145944f24fedc4d739687399a8309f9d51d43dfd))
* Omit mention of Python 2.7 in `CONTRIBUTING.rst`. ([#706](https://www.github.com/googleapis/python-bigquery/issues/706)) ([27d6839](https://www.github.com/googleapis/python-bigquery/commit/27d6839ee8a40909e4199cfa0da8b6b64705b2e9))
gcf-merge-on-green bot pushed a commit that referenced this issue Jul 14, 2021
🤖 I have created a release \*beep\* \*boop\*
---
## [2.21.0](https://www.github.com/googleapis/python-bigquery/compare/v2.20.0...v2.21.0) (2021-07-14)


### Features

* add always_use_jwt_access ([#714](https://www.github.com/googleapis/python-bigquery/issues/714)) ([92fbd4a](https://www.github.com/googleapis/python-bigquery/commit/92fbd4ade37e0be49dc278080ef73c83eafeea18))
* add max_results parameter to some of the QueryJob methods ([#698](https://www.github.com/googleapis/python-bigquery/issues/698)) ([2a9618f](https://www.github.com/googleapis/python-bigquery/commit/2a9618f4daaa4a014161e1a2f7376844eec9e8da))
* add support for decimal target types ([#735](https://www.github.com/googleapis/python-bigquery/issues/735)) ([7d2d3e9](https://www.github.com/googleapis/python-bigquery/commit/7d2d3e906a9eb161911a198fb925ad79de5df934))
* add support for table snapshots ([#740](https://www.github.com/googleapis/python-bigquery/issues/740)) ([ba86b2a](https://www.github.com/googleapis/python-bigquery/commit/ba86b2a6300ae5a9f3c803beeb42bda4c522e34c))
* enable unsetting policy tags on schema fields ([#703](https://www.github.com/googleapis/python-bigquery/issues/703)) ([18bb443](https://www.github.com/googleapis/python-bigquery/commit/18bb443c7acd0a75dcb57d9aebe38b2d734ff8c7))
* make it easier to disable best-effort deduplication with streaming inserts ([#734](https://www.github.com/googleapis/python-bigquery/issues/734)) ([1246da8](https://www.github.com/googleapis/python-bigquery/commit/1246da86b78b03ca1aa2c45ec71649e294cfb2f1))
* Support passing struct data to the DB API ([#718](https://www.github.com/googleapis/python-bigquery/issues/718)) ([38b3ef9](https://www.github.com/googleapis/python-bigquery/commit/38b3ef96c3dedc139b84f0ff06885141ae7ce78c))


### Bug Fixes

* inserting non-finite floats with insert_rows() ([#728](https://www.github.com/googleapis/python-bigquery/issues/728)) ([d047419](https://www.github.com/googleapis/python-bigquery/commit/d047419879e807e123296da2eee89a5253050166))
* use pandas function to check for NaN ([#750](https://www.github.com/googleapis/python-bigquery/issues/750)) ([67bc5fb](https://www.github.com/googleapis/python-bigquery/commit/67bc5fbd306be7cdffd216f3791d4024acfa95b3))


### Documentation

* add docs for all enums in module ([#745](https://www.github.com/googleapis/python-bigquery/issues/745)) ([145944f](https://www.github.com/googleapis/python-bigquery/commit/145944f24fedc4d739687399a8309f9d51d43dfd))
* omit mention of Python 2.7 in `CONTRIBUTING.rst` ([#706](https://www.github.com/googleapis/python-bigquery/issues/706)) ([27d6839](https://www.github.com/googleapis/python-bigquery/commit/27d6839ee8a40909e4199cfa0da8b6b64705b2e9))
---


This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

3 participants