# Readers and Writers in PySpark
    - In PySpark, readers and writers are the components that read data from external sources into DataFrames and write DataFrames back to external storage.

    - They are essential for ingesting and persisting data in a distributed computing environment.

## 1. Writers:
    - Writers are responsible for writing DataFrames to external storage systems.
    - PySpark provides various built-in writers to handle different file formats and external storage systems.

### 1.0. DataFrameWriter:
    - The `DataFrameWriter` is a high-level API in PySpark that allows you to write DataFrames to various file formats and external storage systems.
    - It is accessible through the `DataFrame.write` attribute.

**Note**: This notebook writes to DBFS paths, so it needs to run in a Databricks environment where DBFS is available.

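### 1.1. JSONWriter:
   - The `json()` method in the DataFrameWriter allows writing DataFrames to JSON files, with one JSON object per line.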

In [0]:
from pyspark.sql import SparkSession

# Create a Spark session
spark = SparkSession.builder.appName("CreateJSONFile").getOrCreate()


In [0]:
spark

In [0]:
# Sample employee data
data = [("Alice", 25), ("Bob", 30), ("Charlie", 22)]

# Create a DataFrame from the data
columns = ["Name", "Age"]
json_df = spark.createDataFrame(data, columns)

# Output location on shared DBFS storage
outpath_empjsonFile = "dbfs:/FileStore/shared_uploads/iriscloudone@outlook.com/emp.json"

# Write the DataFrame as JSON; Spark creates a directory of part files at this path
json_df.write.json(outpath_empjsonFile, mode="overwrite")

In [0]:
json_df.show()

### 1.2. CSVWriter:
   - The `csv()` method in the DataFrameWriter allows writing DataFrames to CSV files.

In [0]:
# Sample employee data
data = [("Alice", 25), ("Bob", 30), ("Charlie", 22)]

# Create a DataFrame from the data
columns = ["Name", "Age"]
csv_df = spark.createDataFrame(data, columns)

# Output location on shared DBFS storage
outpath_empcsvFile = "dbfs:/FileStore/shared_uploads/iriscloudone@outlook.com/emp.csv"

# Write the DataFrame as CSV with a header row; Spark creates a directory of part files at this path
csv_df.write.csv(outpath_empcsvFile, header=True, mode="overwrite")

In [0]:
csv_df.show()

### 1.3. ParquetWriter:
   - The `parquet()` method in the DataFrameWriter allows writing DataFrames to Parquet files.

In [0]:
# Sample employee data
data = [("Alice", 25), ("Bob", 30), ("Charlie", 22)]

# Create a DataFrame from the data
columns = ["Name", "Age"]
parquet_df = spark.createDataFrame(data, columns)

# Specify the path to save the Parquet file
outpath_empparquetFile = "dbfs:/FileStore/shared_uploads/iriscloudone@outlook.com/emp.parquet"

# Write the DataFrame in Parquet format, replacing any existing output
parquet_df.write.parquet(outpath_empparquetFile, mode="overwrite")

In [0]:
parquet_df.show()

## 2. Readers:
    - Readers are responsible for reading data from external sources and creating DataFrames from that data. 
    - PySpark provides various built-in readers to handle different file formats and data sources:

### 2.1. JSONReader:
   - PySpark can read data from JSON files into DataFrames using the `DataFrameReader`

In [0]:
output_json_df = spark.read.json(outpath_empjsonFile)
output_json_df.show()

### 2.2. CSVReader:
   - CSV is a row-oriented plain-text format. It stores no schema, so every column is read as a string unless `inferSchema` is enabled or a schema is supplied.
   - PySpark supports reading CSV files through the `DataFrameReader`

In [0]:
output_csv_df = spark.read.csv(outpath_empcsvFile, header=True, inferSchema=True)
output_csv_df.show()

### 2.3. ParquetReader:
   - The Parquet file format is a columnar storage file format that is optimized for big data processing in Spark
   - PySpark supports reading Parquet files through the `DataFrameReader`

In [0]:
# Reading data from a Parquet file into a DataFrame
output_parquet_df = spark.read.parquet(outpath_empparquetFile)
output_parquet_df.show()