[SPARK-27276][PYTHON][DOCS][FOLLOW-UP] Update documentation about Arrow version in PySpark as well

## What changes were proposed in this pull request?

It looks like updating the documentation from 0.8.0 to 0.12.1 was missed.

## How was this patch tested?

N/A

Closes apache#24504 from HyukjinKwon/SPARK-27276-followup.

Authored-by: HyukjinKwon <gurwls223@apache.org>
Signed-off-by: Bryan Cutler <cutlerb@gmail.com>
HyukjinKwon authored and rshkv committed May 21, 2020
1 parent c33f870 commit c2daee4
Showing 1 changed file with 3 additions and 4 deletions.
7 changes: 3 additions & 4 deletions docs/sql-pyspark-pandas-with-arrow.md
@@ -7,7 +7,7 @@ displayTitle: PySpark Usage Guide for Pandas with Apache Arrow
* Table of contents
{:toc}

-## Apache Arrow in Spark
+## Apache Arrow in PySpark

Apache Arrow is an in-memory columnar data format that is used in Spark to efficiently transfer
data between JVM and Python processes. This currently is most beneficial to Python users that
@@ -20,7 +20,7 @@ working with Arrow-enabled data.

If you install PySpark using pip, then PyArrow can be brought in as an extra dependency of the
SQL module with the command `pip install pyspark[sql]`. Otherwise, you must ensure that PyArrow
-is installed and available on all cluster nodes. The current supported version is 0.8.0.
+is installed and available on all cluster nodes. The current supported version is 0.12.1.
You can install using pip or conda from the conda-forge channel. See PyArrow
[installation](https://arrow.apache.org/docs/python/install.html) for details.
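
As a rough illustration of the version requirement above, the following sketch checks whether the PyArrow found on a node is at least 0.12.1. Treating the documented "current supported version" as a minimum is an assumption, and the helper names (`parse_version`, `meets_minimum`, `installed_pyarrow_version`) are hypothetical, not part of PySpark or PyArrow. Version components are compared as integers because a plain string comparison would rank `"0.8.0"` above `"0.12.1"`.

```python
from importlib import metadata

# Version taken from the updated documentation text; using it as a
# minimum (rather than an exact pin) is an assumption of this sketch.
MINIMUM_PYARROW = "0.12.1"


def parse_version(text):
    """Turn '0.12.1' into (0, 12, 1), ignoring any non-numeric suffix."""
    parts = []
    for piece in text.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits or 0))
    return tuple(parts)


def meets_minimum(installed, minimum=MINIMUM_PYARROW):
    """Compare versions numerically, component by component."""
    return parse_version(installed) >= parse_version(minimum)


def installed_pyarrow_version():
    """Return the installed pyarrow version string, or None if absent."""
    try:
        return metadata.version("pyarrow")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_pyarrow_version()
    if found is None:
        print("pyarrow is not installed on this node")
    elif meets_minimum(found):
        print("pyarrow", found, "meets the documented version")
    else:
        print("pyarrow", found, "is older than", MINIMUM_PYARROW)
```

Because PyArrow must be available on all cluster nodes, a check like this would need to run on each worker, not only on the driver.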

@@ -128,8 +128,7 @@ For detailed usage, please see [`pyspark.sql.functions.pandas_udf`](api/python/p
### Supported SQL Types

Currently, all Spark SQL data types are supported by Arrow-based conversion except `MapType`,
-`ArrayType` of `TimestampType`, and nested `StructType`. `BinaryType` is supported only when
-installed PyArrow is equal to or higher than 0.10.0.
+`ArrayType` of `TimestampType`, and nested `StructType`.
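
The exceptions listed above can be sketched as a simple pre-flight check on column types. This is only an illustration: it assumes each column type is given as a string in the form Spark's `DataType.simpleString()` produces (for example `array<timestamp>`), it inspects only the outermost type of each column, and the function name `arrow_unsupported_reason` and the sample columns are hypothetical.

```python
def arrow_unsupported_reason(type_string):
    """Return which documented exception a column type hits, or None."""
    t = type_string.replace(" ", "").lower()
    if t.startswith("map<"):
        return "MapType"
    if t == "array<timestamp>":
        return "ArrayType of TimestampType"
    if t.startswith("struct<"):
        # A struct-typed column is a StructType nested inside the schema.
        return "nested StructType"
    return None


# Hypothetical column name -> type string mapping for illustration.
columns = {
    "id": "bigint",
    "tags": "array<string>",
    "seen": "array<timestamp>",
    "attrs": "map<string,int>",
}

# Collect only the columns that fall under a documented exception.
flagged = {}
for name, type_string in columns.items():
    reason = arrow_unsupported_reason(type_string)
    if reason is not None:
        flagged[name] = reason

print(flagged)
```

Columns flagged this way would have to be cast or dropped before relying on Arrow-based conversion.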

### Setting Arrow Batch Size
