[SPARK-32888][DOCS] Add user document about header flag and RDD as path for reading CSV

### What changes were proposed in this pull request?

This PR improves the user documentation of the API that loads a Dataset of strings storing CSV rows. If the `header` option is set to `true`, the API removes every line that is identical to the header, not only the first one.
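
To make the documented behavior concrete, here is a minimal plain-Python sketch. It is only a model of the behavior described above, not Spark's implementation: the helpers `drop_header_lines` and `parse_rows` and the sample data are hypothetical, and the sketch assumes the first line is the header and ignores options such as comments or blank-line handling.

```python
import csv


def drop_header_lines(lines):
    """Model of the documented behavior (hypothetical helper): treat the
    first line as the header and drop every line identical to it."""
    lines = list(lines)
    if not lines:
        return None, []
    header = lines[0]
    # Every line equal to the header is removed, not only the first one.
    data = [line for line in lines if line != header]
    return header, data


def parse_rows(lines):
    """Parse CSV lines into dicts keyed by the header's column names."""
    header, data = drop_header_lines(lines)
    if header is None:
        return []
    columns = next(csv.reader([header]))
    return [dict(zip(columns, row)) for row in csv.reader(data)]


# Hypothetical input: the header line reappears in the middle of the data,
# as it would after concatenating two CSV files.
sample_lines = ["name,age", "alice,30", "name,age", "bob,25"]
rows = parse_rows(sample_lines)
print(rows)
```

In the sketch the repeated `name,age` line disappears along with the first one, which is the effect the new note warns about. In PySpark, the corresponding call would be along the lines of `spark.read.csv(rdd_of_lines, header=True)`, where `rdd_of_lines` is an RDD of strings.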

### Why are the changes needed?

This behavior can confuse users. We should explicitly document it.

### Does this PR introduce _any_ user-facing change?

No. Only doc change.

### How was this patch tested?

Only doc change.

Closes apache#29765 from viirya/SPARK-32888.

Authored-by: Liang-Chi Hsieh <viirya@gmail.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
viirya authored and HyukjinKwon committed Sep 16, 2020
1 parent 2e3aa2f commit 550c1c9
Showing 2 changed files with 6 additions and 0 deletions.
3 changes: 3 additions & 0 deletions python/pyspark/sql/readwriter.py
@@ -430,6 +430,9 @@ def csv(self, path, schema=None, sep=None, encoding=None, quote=None, escape=Non
character. By default (None), it is disabled.
:param header: uses the first line as names of columns. If None is set, it uses the
default value, ``false``.
.. note:: if the given path is a RDD of Strings, this header
option will remove all lines same with the header if exists.
:param inferSchema: infers the input schema automatically from data. It requires one extra
pass over the data. If None is set, it uses the default value, ``false``.
:param enforceSchema: If it is set to ``true``, the specified or inferred schema will be
@@ -600,6 +600,9 @@ class DataFrameReader private[sql](sparkSession: SparkSession) extends Logging {
* If the enforceSchema is set to `false`, only the CSV header in the first line is checked
* to conform specified or inferred schema.
*
* @note if `header` option is set to `true` when calling this API, all lines same with
* the header will be removed if exists.
*
* @param csvDataset input Dataset with one CSV row per record
* @since 2.2.0
*/
