deps: expand pyarrow dependencies to include version 2 #368

Merged
merged 3 commits into from Nov 10, 2020

Conversation

Contributor

@tswast tswast commented Nov 4, 2020

Pyarrow 2.0 includes several bug fixes. The wire format remains the same, so it continues to be compatible with the BigQuery Storage API.
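As a rough illustration of what widening the range means, here is a minimal sketch. The requirement string and the helper `pyarrow_major_supported` are assumptions made for this example; the exact bounds used in the commit are not shown in this conversation.

```python
# Hypothetical requirement string: the exact bounds are an assumption,
# chosen only to show a range that admits both 1.x and 2.x releases.
PYARROW_REQUIREMENT = "pyarrow >= 1.0.0, < 3.0dev"


def pyarrow_major_supported(version, low=1, high=3):
    """Return True if the major version lies in the half-open range [low, high)."""
    major = int(version.split(".")[0])
    return low <= major < high


# Check a few illustrative version strings against the assumed range.
for candidate in ("0.17.1", "1.0.1", "2.0.0", "3.0.0"):
    print(candidate, pyarrow_major_supported(candidate))
```

This only compares major versions; a real install relies on pip resolving the full specifier.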

Contributor Author

@tswast tswast commented Nov 4, 2020

The test failure is relevant:

______ TestRowIterator.test_to_dataframe_timestamp_out_of_pyarrow_bounds _______

self = <tests.unit.test_table.TestRowIterator testMethod=test_to_dataframe_timestamp_out_of_pyarrow_bounds>

    @pytest.mark.xfail(
        six.PY2,
        reason=(
            "Requires pyarrow>-1.0 to work, but the latter is not compatible "
            "with Python 2 anymore."
        ),
    )
    @unittest.skipIf(pandas is None, "Requires `pandas`")
    @unittest.skipIf(pyarrow is None, "Requires `pyarrow`")
    def test_to_dataframe_timestamp_out_of_pyarrow_bounds(self):
        from google.cloud.bigquery.schema import SchemaField
    
        schema = [SchemaField("some_timestamp", "TIMESTAMP")]
        rows = [
            {"f": [{"v": "81953424000.0"}]},  # 4567-01-01 00:00:00  UTC
            {"f": [{"v": "253402214400.0"}]},  # 9999-12-31 00:00:00  UTC
        ]
        path = "/foo"
        api_request = mock.Mock(return_value={"rows": rows})
        row_iterator = self._make_one(_mock_client(), api_request, path, schema)
    
        df = row_iterator.to_dataframe(create_bqstorage_client=False)
    
        self.assertIsInstance(df, pandas.DataFrame)
        self.assertEqual(len(df), 2)  # verify the number of rows
        self.assertEqual(list(df.columns), ["some_timestamp"])
>       self.assertEqual(
            list(df["some_timestamp"]),
            [dt.datetime(4567, 1, 1), dt.datetime(9999, 12, 31)],
        )
E       AssertionError: Lists differ: [date[25 chars] 0, 0, tzinfo=<UTC>), datetime.datetime(9999, [23 chars]TC>)] != [date[25 chars] 0, 0), datetime.datetime(9999, 12, 31, 0, 0)]
E       
E       First differing element 0:
E       datetime.datetime(4567, 1, 1, 0, 0, tzinfo=<UTC>)
E       datetime.datetime(4567, 1, 1, 0, 0)
E       
E       + [datetime.datetime(4567, 1, 1, 0, 0), datetime.datetime(9999, 12, 31, 0, 0)]
E       - [datetime.datetime(4567, 1, 1, 0, 0, tzinfo=<UTC>),
E       -  datetime.datetime(9999, 12, 31, 0, 0, tzinfo=<UTC>)]

tests/unit/test_table.py:2345: AssertionError


Contributor Author

@tswast tswast commented Nov 5, 2020

Fixed the test failure in the latest commit. Even though behavior differs slightly between pyarrow 1.0 and 2.0 (as seen in the commit), I think it's worth keeping a wider version range, given Arrow's use as a core library.
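The failure log above shows UTC-aware datetimes where the test expected naive ones. The commit's actual fix is not reproduced in this conversation; the following is only a sketch of one way an assertion could tolerate both forms, assuming the sole difference is the presence of a UTC `tzinfo`. The helper name `to_naive_utc` is hypothetical.

```python
import datetime as dt


def to_naive_utc(value):
    """Return a naive datetime; aware values are converted to UTC first."""
    if value.tzinfo is not None:
        value = value.astimezone(dt.timezone.utc).replace(tzinfo=None)
    return value


# The two shapes seen in the failure log: UTC-aware and naive.
aware = [
    dt.datetime(4567, 1, 1, tzinfo=dt.timezone.utc),
    dt.datetime(9999, 12, 31, tzinfo=dt.timezone.utc),
]
naive = [dt.datetime(4567, 1, 1), dt.datetime(9999, 12, 31)]

# After normalization both lists compare equal.
assert [to_naive_utc(v) for v in aware] == naive
assert [to_naive_utc(v) for v in naive] == naive
```

Normalizing in the test keeps one assertion valid across versions, at the cost of no longer checking whether the returned values carry timezone information.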


@gcf-merge-on-green gcf-merge-on-green bot merged commit cd9febd into googleapis:master Nov 10, 2020
10 checks passed