## Generic Load Functions

## Getting Started

In [0]:
from pyspark.sql import SparkSession

In [0]:
# Initialize Spark Session
spark = (SparkSession.builder
         .appName("Read CSV Data")
         .getOrCreate())

### Data Source

In [0]:
%run ../DatasetSourcePath

In [0]:
path = sourcePath + "/dataset/users.parquet"
users_df = spark.read.format('parquet').load(path)
users_df.show()

A data source is specified by its fully qualified name (e.g., org.apache.spark.sql.parquet), but built-in sources can also be referred to by their short names (json, parquet, jdbc, orc, libsvm, csv, text). A DataFrame loaded from any data source type can be converted into another type by naming the target source in the same way when writing it out.
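The conversion between source types mentioned above might look like the following minimal sketch. `convert_format` is a hypothetical helper, not part of PySpark; it assumes an active SparkSession is passed in and that the destination path is writable.

```python
def convert_format(spark, src_path, src_format, dst_path, dst_format, mode="overwrite"):
    """Read src_path with one data source and write it out with another.

    Hypothetical helper for illustration only. `spark` is expected to be
    an active SparkSession; the formats are source names such as
    "parquet" or "json".
    """
    # Generic reader: the source is chosen by name
    df = spark.read.format(src_format).load(src_path)
    # Generic writer: same idea, with the target source name
    df.write.format(dst_format).mode(mode).save(dst_path)
    return df
```

For example, `convert_format(spark, sourcePath + "/dataset/users.parquet", "parquet", sourcePath + "/dataset/users_json", "json")` would write the users data out as JSON; the `users_json` output directory is an assumed name.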

### Load a JSON file

In [0]:
path = sourcePath + "/dataset/people.json"
people_df = spark.read.load(path, format="json")
people_df.show()
people_df.printSchema()


### Load a CSV file

In [0]:
path = sourcePath + "/dataset/people.csv"
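# Generic loader: CSV options are passed as keyword arguments
people_df = spark.read.load(
    path,
    format="csv",
    sep=";",            # fields are separated by semicolons
    inferSchema=True,   # scan the data to infer column types
    header=True         # first line holds the column names
)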

people_df.show()

In [0]:
# Equivalent call using the CSV-specific reader; show only the first 4 rows
spark.read.csv(path, header=True, inferSchema=True, sep=";").show(4)

### Run SQL on files directly

In [0]:
path = sourcePath + "/dataset/users.parquet"
spark.sql(f"SELECT * FROM parquet.`{path}`").show()
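The same backtick syntax should also work for other built-in sources such as json. As a small sketch, the hypothetical helper `sql_on_file` below (not a Spark API) only builds the query string; the source short name and the file path are supplied by the caller.

```python
def sql_on_file(fmt, path):
    """Build a query that reads `path` directly with the data source `fmt`.

    Hypothetical helper for illustration only. The path is wrapped in
    backticks, so it must not contain a backtick itself.
    """
    return f"SELECT * FROM {fmt}.`{path}`"
```

A possible use is `spark.sql(sql_on_file("json", sourcePath + "/dataset/people.json")).show()`, which queries the JSON file from the earlier cell without loading it into a DataFrame first.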