There are several ways to pass options while creating a Data Frame.
* Using keyword arguments as part of the APIs. Keyword arguments work with `load` as well as with the direct API (`csv`).
* `spark.read.option`
* `spark.read.options`
* If the key of an option is incorrect, that option is ignored and no error is raised.

The available options and arguments vary by API, and the API to use depends on the file format.
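
Because an option with an incorrect key is ignored rather than rejected, it can help to check the keys before handing them to Spark. The following is a minimal plain-Python sketch of such a check. The helper `find_unknown_keys` and the set `KNOWN_CSV_OPTIONS` are hypothetical names introduced here, not part of Spark, and the set is only a partial, hand-maintained list of CSV option names. The sketch also assumes that option names are matched case-insensitively.

```python
# Hand-maintained, partial list of CSV reader option names (an assumption of
# this sketch; see help(spark.read.csv) for the list in your Spark version).
KNOWN_CSV_OPTIONS = {
    'sep', 'header', 'inferSchema', 'quote', 'escape', 'mode',
    'nullValue', 'multiLine', 'encoding',
    'ignoreLeadingWhiteSpace', 'ignoreTrailingWhiteSpace',
}


def find_unknown_keys(options, known=KNOWN_CSV_OPTIONS):
    """Return the option keys that are not in the known set, sorted."""
    # Compare in lower case, assuming option names are case-insensitive
    known_lower = {name.lower() for name in known}
    return sorted(key for key in options if key.lower() not in known_lower)


# 'seperator' is a deliberate typo for 'sep'
suspect = find_unknown_keys({'seperator': '|', 'header': None, 'inferSchema': True})
print(suspect)
```

Any key reported by the helper is worth a second look before it is passed to `spark.read.option` or `spark.read.options`.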

In [0]:
import getpass
username = getpass.getuser()

In [0]:
# Default behavior
# The data is split using comma as the separator
# Column names are system generated (_c0, _c1, ...)
# All the fields are of type string
orders = spark.read.csv(f'/user/{username}/retail_db_pipe/orders')

In [0]:
orders.show()
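
The effect of the default comma separator on pipe-delimited data can be illustrated without Spark. The sketch below is only an analogy that uses the `csv` module from the Python standard library, not Spark's reader, and the sample row is made up for illustration.

```python
import csv

# Illustrative pipe-delimited row (made up for this sketch)
line = '1|2013-07-25 00:00:00.0|11599|CLOSED'

# The default delimiter is comma, so the whole row comes back as one field
default_fields = next(csv.reader([line]))

# With the delimiter set to '|', the row splits into four fields
pipe_fields = next(csv.reader([line], delimiter='|'))

print(len(default_fields), len(pipe_fields))
```

In both cases every field is still a string, which mirrors the default behavior described in the comments above.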

In [0]:
# Commonly used arguments: schema, sep, quote, header, mode (to deal with corrupt records),
# inferSchema, ignoreLeadingWhiteSpace, ignoreTrailingWhiteSpace, nullValue, multiLine, etc.
help(spark.read.csv)

In [0]:
orders = spark. \
    read. \
    csv(
        f'/user/{username}/retail_db_pipe/orders',
        sep='|',
        header=None,
        inferSchema=True
    ). \
    toDF('order_id', 'order_date', 'order_customer_id', 'order_status')

In [0]:
orders.show()

In [0]:
help(spark.read.format('csv').load)

In [0]:
orders = spark. \
    read. \
    format('csv'). \
    load(
        f'/user/{username}/retail_db_pipe/orders',
        sep='|',
        header=None,
        inferSchema=True
    ). \
    toDF('order_id', 'order_date', 'order_customer_id', 'order_status')

In [0]:
orders.show()

In [0]:
help(spark.read.option)

In [0]:
orders = spark. \
    read. \
    option('sep', '|'). \
    option('header', None). \
    option('inferSchema', True). \
    csv(f'/user/{username}/retail_db_pipe/orders'). \
    toDF('order_id', 'order_date', 'order_customer_id', 'order_status')

In [0]:
orders.show()

In [0]:
orders.dtypes

In [0]:
help(spark.read.options)

In [0]:
orders = spark. \
    read. \
    options(sep='|', header=None, inferSchema=True). \
    csv(f'/user/{username}/retail_db_pipe/orders'). \
    toDF('order_id', 'order_date', 'order_customer_id', 'order_status')

In [0]:
orders.show()

In [0]:
options = {
    'sep': '|',
    'header': None,
    'inferSchema': True
}

In [0]:
orders = spark. \
    read. \
    options(**options). \
    csv(f'/user/{username}/retail_db_pipe/orders'). \
    toDF('order_id', 'order_date', 'order_customer_id', 'order_status')

In [0]:
orders.show()
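
The `**options` syntax used with `spark.read.options` is ordinary Python keyword unpacking: each key of the dict becomes a keyword argument. The sketch below shows the equivalence in plain Python. `collect_options` is a hypothetical stand-in for any function that accepts keyword arguments; it is not a Spark API.

```python
def collect_options(**kwargs):
    """Hypothetical stand-in for a method that accepts keyword arguments."""
    return kwargs


options = {
    'sep': '|',
    'header': None,
    'inferSchema': True
}

# Passing the keyword arguments one by one ...
explicit = collect_options(sep='|', header=None, inferSchema=True)

# ... is equivalent to unpacking the dict with **
unpacked = collect_options(**options)

print(explicit == unpacked)
```

Keeping the options in a dict makes it easy to reuse the same settings across several reads.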

In [0]:
orders = spark. \
    read. \
    options(**options). \
    format('csv'). \
    load(f'/user/{username}/retail_db_pipe/orders'). \
    toDF('order_id', 'order_date', 'order_customer_id', 'order_status')

In [0]:
orders.show()