## Batch computation

Before we begin with Streaming, let's go back to Spark batch computation. The most basic steps for batch processing are:
1. Read data from source file
2. Do transformation
3. Write transformed DataFrame to a file

In [None]:
# Import libraries
from pyspark.sql import SparkSession
from pyspark.sql.functions import avg, round
from pyspark.sql.types import StructType, StructField, IntegerType, StringType, DecimalType

In [None]:
# Initialize a Spark session
spark = SparkSession.builder.appName("Batch_Streaming_Comparison").getOrCreate()

In [None]:
# Defining schema for `data/batch_resource/real_estate.csv`
real_estate_schema = StructType([
    StructField('UID', IntegerType(), True),
    StructField('Location', StringType(), True),
    StructField('Price', DecimalType(11, 2), True),
    StructField('Bedrooms', IntegerType(), True),
    StructField('Bathrooms', IntegerType(), True),
    StructField('Size', IntegerType(), True),
    StructField('Price SQ Ft', DecimalType(7, 2), True),
    StructField('Status', StringType(), True),
])

### Exercise

As a warmup exercise:
1. Read CSV files from `../data/batch_resource`
2. Use the previously defined `real_estate_schema` StructType object
3. Group data by `Location`
4. Get average `Price` per `Location`. Round it to 2 decimal places
5. Sort data by average price in descending order
6. Print output to console

In [None]:
# TODO: 

real_estate_batch = ...

Compare it to what the same process looks like in Spark Structured Streaming:

In [None]:
real_estate_stream = (spark
    .readStream
    .schema(real_estate_schema)
    .csv("../data/batch_resource", header=True)
    .groupBy("Location")
    .agg(round(avg("Price"), 2).alias('AveragePrice'))
    .orderBy('AveragePrice', ascending=False)  # descending, as in the batch exercise
    .writeStream
    .outputMode("complete")
    .format("console")
    .start())


In [None]:
real_estate_stream.stop()

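The similarity between batch and streaming processing is easy to see. `readStream` and `writeStream` are the streaming counterparts of the batch `read` and `write`. The streaming version additionally sets an output mode and calls `start()`, which returns a query that keeps running until `stop()` is called.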