# Data Sources for DataFrames and SQL Tables
As shown in Figure 4-1, Spark SQL provides an interface to a variety of data sources.
It also provides a set of common methods for reading and writing data to and from
these data sources using the Data Sources API.
In this section we will cover some of the built-in data sources, available file formats,
and ways to load and write data, along with specific options pertaining to these data
sources. But first, let’s take a closer look at two high-level Data Sources API constructs
that dictate the manner in which you interact with different data sources: DataFrameReader
and DataFrameWriter.

## DataFrameReader
DataFrameReader is the core construct for reading data from a data source into a
DataFrame. It has a defined format and a recommended pattern for usage:

```
DataFrameReader.format(args).option("key", "value").schema(args).load()
```

This pattern of chaining methods together is common in Spark and easy to read. We
saw it in Chapter 3 when exploring common data analysis patterns.
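To make the chain concrete, here is a minimal sketch that exercises every step of the pattern. It passes an explicit `schema(...)`, so Spark does not have to infer column types. It assumes Spark 3.x running locally (`master("local[*]")`). It first writes a small CSV file to a temporary directory. The file name `flights.csv`, its three columns, and the object name `ReaderPattern` are illustrative only and do not come from the book's datasets.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.Files

import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

object ReaderPattern {
  // Hypothetical schema for a tiny flights-style CSV file
  val schema: StructType = StructType(Seq(
    StructField("origin", StringType, nullable = true),
    StructField("dest", StringType, nullable = true),
    StructField("count", IntegerType, nullable = true)))

  def readFlights(spark: SparkSession): DataFrame = {
    // Write sample data to a temporary file so the example is self-contained
    val dir = Files.createTempDirectory("reader-pattern")
    val csv = dir.resolve("flights.csv")
    Files.write(csv, "origin,dest,count\nUS,CA,15\nUS,MX,7\n".getBytes(StandardCharsets.UTF_8))

    // format -> option -> schema -> load, following the pattern above
    spark.read
      .format("csv")
      .option("header", "true")
      .schema(schema)
      .load(csv.toString)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("reader-pattern").getOrCreate()
    val df = readFlights(spark)
    df.printSchema()
    df.show()
    spark.stop()
  }
}
```

Supplying the schema up front avoids the extra pass over the data that `inferSchema` needs. It also keeps column types stable if the input changes.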
Note that you can only access a DataFrameReader through a SparkSession instance.
That is, you cannot create an instance of DataFrameReader. To get an instance handle
to it, use:

```
SparkSession.read
// or
SparkSession.readStream
```

While read returns a DataFrameReader that reads from a static data source into a
DataFrame, readStream returns a DataStreamReader that reads from a streaming source.
(We will cover Structured Streaming later in the book.)
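The two handles have different static types, and the DataFrames they produce can be told apart with `isStreaming`. Here is a minimal sketch of that, assuming Spark 3.x running locally. For the batch side it parses two in-memory JSON strings. For the streaming side it uses Spark's built-in `rate` test source. The object name `ReaderHandles` is hypothetical. No streaming query is started, because `load()` only defines the source.

```scala
import org.apache.spark.sql.{DataFrame, DataFrameReader, SparkSession}
import org.apache.spark.sql.streaming.DataStreamReader

object ReaderHandles {
  def handles(spark: SparkSession): (DataFrame, DataFrame) = {
    import spark.implicits._

    // Batch handle: obtained from the session, never constructed directly
    val batchReader: DataFrameReader = spark.read
    // Streaming handle: a separate class with a similar builder-style API
    val streamReader: DataStreamReader = spark.readStream

    // Batch: parse two JSON strings held in memory, so no files are needed
    val batchDf = batchReader.json(Seq("""{"id": 1}""", """{"id": 2}""").toDS())
    // Streaming: the built-in "rate" source emits (timestamp, value) rows;
    // load() only defines the source, nothing runs until a query is started
    val streamDf = streamReader.format("rate").option("rowsPerSecond", 1L).load()

    (batchDf, streamDf)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("reader-handles").getOrCreate()
    val (batchDf, streamDf) = handles(spark)
    println(s"batch isStreaming = ${batchDf.isStreaming}")   // false
    println(s"stream isStreaming = ${streamDf.isStreaming}") // true
    spark.stop()
  }
}
```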

Each public method of DataFrameReader accepts different arguments. Table 4-1
enumerates these methods, along with a subset of their supported arguments.

While we won’t comprehensively enumerate all the different combinations of arguments
and options, the documentation for Python, Scala, R, and Java offers suggestions
and guidance. It’s worthwhile to show a couple of examples, though:

```scala
// In Scala
// Use Parquet
val file = "/databricks-datasets/learning-spark-v2/flights/summary-data/parquet/2010-summary.parquet"
val df = spark.read.format("parquet").load(file)

// Use Parquet; you can omit format("parquet") because Parquet is the default
// format (controlled by the spark.sql.sources.default configuration)
val df2 = spark.read.load(file)

// Use CSV
val df3 = spark.read.format("csv")
  .option("inferSchema", "true")
  .option("header", "true")
  .option("mode", "PERMISSIVE")
  .load("/databricks-datasets/learning-spark-v2/flights/summary-data/csv/*")

// Use JSON
val df4 = spark.read.format("json")
  .load("/databricks-datasets/learning-spark-v2/flights/summary-data/json/*")
```
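The `mode` option used in the CSV example controls what happens to rows that cannot be parsed. Here is a minimal sketch comparing the three standard modes (`PERMISSIVE`, `DROPMALFORMED`, `FAILFAST`), assuming Spark 3.x running locally. It uses a hypothetical three-row CSV file in which one `id` value is not an integer. The object name `ReadModes` is illustrative. The schema is given as a DDL string via `schema(String)`. Rows are materialized with `collect()` so that every column is actually parsed.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.Files

import scala.util.Try

import org.apache.spark.sql.{Row, SparkSession}

object ReadModes {
  // Two well-formed rows and one row whose "id" is not an integer
  private val csvText = "id,name\n1,alice\noops,bob\n3,carol\n"

  def readWithMode(spark: SparkSession, mode: String): Array[Row] = {
    val path = Files.createTempFile("modes", ".csv")
    Files.write(path, csvText.getBytes(StandardCharsets.UTF_8))
    spark.read
      .format("csv")
      .option("header", "true")
      .option("mode", mode)
      .schema("id INT, name STRING") // DDL-string schema
      .load(path.toString)
      .collect() // parses every column, so malformed rows are detected
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("read-modes").getOrCreate()
    // PERMISSIVE keeps all three rows; the unparsable id becomes null
    readWithMode(spark, "PERMISSIVE").foreach(println)
    // DROPMALFORMED silently drops the bad row, leaving two rows
    readWithMode(spark, "DROPMALFORMED").foreach(println)
    // FAILFAST raises an error when the bad row is parsed
    println(Try(readWithMode(spark, "FAILFAST")).isFailure) // true
    spark.stop()
  }
}
```

PERMISSIVE is the default. It is a reasonable choice for exploration. FAILFAST is usually safer in production pipelines, where silently nulled or dropped rows would go unnoticed.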

## DataFrameWriter
DataFrameWriter does the reverse of its counterpart: it saves or writes data to a specified
built-in data source. Unlike with DataFrameReader, you access its instance not
from a SparkSession but from the DataFrame you wish to save. It has a few recommended
usage patterns:

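```
DataFrameWriter.format(args)
  .option(args)
  .partitionBy(args)
  .save(path)

DataFrameWriter.format(args).option(args).bucketBy(args).sortBy(args).saveAsTable(table)
```

Note that bucketBy is supported only when writing to a table with saveAsTable; save(path)
rejects bucketed output. Likewise, sortBy must be used together with bucketBy.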
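Here is a minimal sketch of both writer patterns, assuming Spark 3.x running locally with its default in-memory catalog. The first pattern partitions output by `origin` and saves it to a path. The second buckets and sorts by `dest` and saves it as a table. The sample rows, the table name `flights_bucketed`, and the object name `WriterPatterns` are hypothetical. The `path` option turns the table into an external table under a temporary directory, so the example does not touch the default warehouse directory.

```scala
import java.nio.file.Files

import org.apache.spark.sql.{SaveMode, SparkSession}

object WriterPatterns {
  def run(spark: SparkSession): (Long, Long) = {
    import spark.implicits._
    val flights = Seq(("US", "CA", 15), ("US", "MX", 7), ("IN", "US", 3))
      .toDF("origin", "dest", "count")
    val baseDir = Files.createTempDirectory("writer-patterns")

    // Pattern 1: path-based save; partitionBy creates one subdirectory per
    // origin value (origin=US/, origin=IN/) under the output path
    val partitionedPath = baseDir.resolve("by_origin").toString
    flights.write
      .format("parquet")
      .mode(SaveMode.Overwrite)
      .partitionBy("origin")
      .save(partitionedPath)

    // Pattern 2: bucketBy/sortBy are accepted only by saveAsTable. The "path"
    // option stores this external table's files under baseDir
    flights.write
      .format("parquet")
      .mode(SaveMode.Overwrite)
      .option("path", baseDir.resolve("bucketed").toString)
      .bucketBy(4, "dest")
      .sortBy("dest")
      .saveAsTable("flights_bucketed")

    // Read both outputs back (Parquet is the default read format)
    (spark.read.load(partitionedPath).count(), spark.table("flights_bucketed").count())
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("writer-patterns").getOrCreate()
    println(run(spark)) // (3,3)
    spark.stop()
  }
}
```

Partitioning helps queries that filter on `origin` skip whole directories. Bucketing on `dest` lets later joins or aggregations on that column avoid a shuffle. This is also why bucketing needs the table metadata that saveAsTable records.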

To get an instance handle, use:

```
DataFrame.write
// or
DataFrame.writeStream
```
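As on the reading side, the two handles are different classes. `write` returns a `DataFrameWriter[Row]`, and on a streaming DataFrame, `writeStream` returns a `DataStreamWriter[Row]`. Here is a minimal sketch of both, assuming Spark 3.x running locally. The temporary directories, the query name `ids_sink`, and the object name `WriterHandles` are hypothetical. The streaming half reads CSV files from a directory, writes them to Spark's in-memory `memory` sink, and uses `Trigger.Once()` so the query processes the files present and then stops. Newer releases (3.4+) deprecate this trigger in favor of `Trigger.AvailableNow()`.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.Files

import org.apache.spark.sql.{DataFrameWriter, Row, SparkSession}
import org.apache.spark.sql.streaming.{DataStreamWriter, Trigger}

object WriterHandles {
  def run(spark: SparkSession): Long = {
    // Batch: DataFrame.write returns a DataFrameWriter[Row]
    val batchWriter: DataFrameWriter[Row] = spark.range(3).toDF("id").write
    batchWriter.format("parquet").save(Files.createTempDirectory("batch-out").resolve("ids").toString)

    // Streaming: a streaming DataFrame exposes writeStream instead
    val inputDir = Files.createTempDirectory("stream-in")
    Files.write(inputDir.resolve("part-0.csv"), "1\n2\n".getBytes(StandardCharsets.UTF_8))
    val streamDf = spark.readStream.schema("id INT").csv(inputDir.toString)
    val streamWriter: DataStreamWriter[Row] = streamDf.writeStream

    val query = streamWriter
      .format("memory")        // in-memory sink; meant for demos and tests
      .queryName("ids_sink")   // results become queryable under this name
      .trigger(Trigger.Once()) // process the files present now, then stop
      .start()
    query.awaitTermination()

    spark.table("ids_sink").count()
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("writer-handles").getOrCreate()
    println(run(spark)) // 2
    spark.stop()
  }
}
```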

Each method of DataFrameWriter also accepts different arguments. We list these in
Table 4-2, along with a subset of the supported arguments.

The remaining topics in the book cover reading and writing different data sources, such as Parquet (the default), JSON, CSV, Avro, and ORC, using DataFrames and Spark SQL. Please refer to the documentation for more information and code.