Upgrade Docker image and pyarrow in it #677

Merged
rshkv merged 3 commits into master from wr/upgrade-image on May 1, 2020

@rshkv rshkv commented May 1, 2020

What changes were proposed in this pull request?

This is preparation for #649 and #673. Both rely on a Docker image that ships an upgraded pyarrow.

Why are the changes needed?

This unblocks upgrading pyarrow.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

No logic was touched. Existing tests.

BryanCutler and others added 3 commits May 1, 2020 15:47
…Pandas assert_frame_equals

## What changes were proposed in this pull request?

Running PySpark tests with Pandas 0.24.x causes a failure in `test_pandas_udf_grouped_map` test_supported_types:
`ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()`

This is because one of the columns is an ArrayType, and the method `sqlutils.ReusedSQLTestCase.assertPandasEqual` does not check such columns properly.

This PR removes `assertPandasEqual` and replaces it with the built-in `pandas.util.testing.assert_frame_equal`, which handles ArrayType columns properly and also prints a more useful diff between the DataFrames when an assertion fails.
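As a rough illustration of the replacement described above, here is a minimal sketch, not the code from this commit. It assumes list-valued cells as a stand-in for an ArrayType column, and it imports `assert_frame_equal` from `pandas.testing`, the location newer pandas releases expose it under (the commit message names the older `pandas.util.testing` path). The helper `frames_match` and the column names are hypothetical.

```python
import pandas as pd
from pandas.testing import assert_frame_equal


def frames_match(expected, actual):
    """Hypothetical helper: return (True, "") when the frames are equal,
    otherwise (False, <diff message produced by pandas>)."""
    try:
        assert_frame_equal(expected, actual)
    except AssertionError as err:
        return False, str(err)
    return True, ""


# List-valued cells stand in for an ArrayType column (an assumption).
expected = pd.DataFrame({"id": [1, 2], "vals": [[1.0, 2.0], [3.0]]})
same = pd.DataFrame({"id": [1, 2], "vals": [[1.0, 2.0], [3.0]]})
different = pd.DataFrame({"id": [1, 2], "vals": [[1.0, 2.0], [4.0]]})

ok, _ = frames_match(expected, same)
bad, message = frames_match(expected, different)

print(ok)       # equal frames compare cleanly
print(bad)      # the mismatch is reported as an AssertionError
print(message)  # pandas describes which column differs
```

In a test, calling `assert_frame_equal(expected, result)` directly is usually enough; the wrapper here only exists to make the diff message visible without aborting the script.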

Additionally, imports of pandas and pyarrow were moved to the top of related test files to avoid duplicating the same import many times.

## How was this patch tested?

Existing tests

Closes apache#24306 from BryanCutler/python-pandas-assert_frame_equal-SPARK-27387.

Authored-by: Bryan Cutler <cutlerb@gmail.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
…timezone due to more recent timezone updates in later JDK 8

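## What changes were proposed in this pull request?

Recent timezone definition changes in very new JDK 8 (and later) releases cause test failures. The failure below was observed on JDK 1.8.0_232. As before, the easy fix is to allow for these inconsequential variations in test results caused by differing timezone definitions.

## Why are the changes needed?

Keeps the test passing on the latest JDK releases.

## Does this PR introduce any user-facing change?

None

## How was this patch tested?

Existing tests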

Closes apache#26236 from srowen/SPARK-29578.

Authored-by: Sean Owen <sean.owen@databricks.com>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
@rshkv rshkv requested a review from robert3005 May 1, 2020 15:39
@robert3005

we should manually merge it to preserve the commits

@rshkv rshkv merged commit acb7d04 into master May 1, 2020
@rshkv rshkv deleted the wr/upgrade-image branch May 1, 2020 17:31