## Read data from Text Files

- Spark SQL provides `spark.read.text("file_name")` to read a file or directory of text files into a Spark DataFrame, and `dataframe.write.text("path")` to write a DataFrame out as text files.

- When reading a text file, each line becomes one row with a single string column, named `value` by default.

## Getting Started

In [0]:
from pyspark.sql import SparkSession

In [0]:
# Initialize Spark Session
spark = (SparkSession.builder
         .appName("Read Text Data")
         .getOrCreate())

### Data Source

In [0]:
%run ../DatasetSourcePath

- A text dataset is identified by its path.
- The path can be either a single text file or a directory of text files.

In [0]:
path = sourcePath + "/dataset/people.txt"
path

In [0]:
%fs head abfss://files@storage33e.dfs.core.windows.net/dataset/people.txt

In [0]:
path = sourcePath + "/dataset/people.txt"
df1 = spark.read.text(path)
df1.show()

In [0]:
df1.first()

- You can use the `lineSep` option to define the line separator.
- By default, the reader treats `\r`, `\r\n`, and `\n` as line separators.

In [0]:
df2 = spark.read.text(path, lineSep=",")
df2.show()
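To see why rows read with `lineSep=","` still contain newline characters, here is a plain-Python sketch of how the separator cuts a file into records. It is only a model of the behaviour, not Spark's implementation. The `sample` string is an assumption about what `people.txt` contains (the sample file distributed with Spark), and `split_records` is a hypothetical helper.

```python
import re

# Assumed contents of people.txt; adjust to match your own file.
sample = "Michael, 29\nAndy, 30\nJustin, 19\n"


def split_records(text, line_sep=None):
    """Approximate how the text reader splits file contents into rows."""
    if line_sep is None:
        # Default behaviour: \r\n, \r and \n each end a record.
        parts = re.split(r"\r\n|\r|\n", text)
    else:
        # Custom separator: newlines become ordinary characters.
        parts = text.split(line_sep)
    # A separator at the very end does not produce an extra empty row.
    if parts and parts[-1] == "":
        parts.pop()
    return parts


print(split_records(sample))
print(split_records(sample, line_sep=","))
```

With the comma as separator, the newline is no longer special, so it stays inside the row values (for example `" 29\nAndy"`).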

- You can also use the `wholetext` option to read each input file as a single row.

In [0]:
df3 = spark.read.text(path, wholetext=True)
df3.show()