### Spark SQL
Spark SQL is a Spark module that makes it easier and more efficient to load and query structured and semi-structured data. We can interact with Spark SQL through SQL as well as regular Python, Java, or Scala code. Internally, Spark SQL uses extra information about the structure of the data, such as its schema, to optimize processing. The following example shows how to run SQL queries using Spark SQL.

First, we need to construct a DataFrame from a JSON dataset.

In [1]:
import findspark
findspark.init()

import pyspark
sc = pyspark.SparkContext(appName="myAppName")

In [2]:
json_strings = ['{"name":"Bob","address":{"city":"Los Angeles","state":"California"}}']

In [3]:
# Defines an RDD from the Python list.
peopleRDD = sc.parallelize(json_strings)

In [6]:
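# Creates a DataFrame from an RDD[String]; the schema is inferred from the JSON text.
from pyspark.sql import SparkSession
# getOrCreate() builds the session on top of the SparkContext created above.
spark = SparkSession.builder.getOrCreate()
people = spark.read.json(peopleRDD)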

In [7]:
people.show()

+--------------------+----+
|             address|name|
+--------------------+----+
|[Los Angeles,Cali...| Bob|
+--------------------+----+



Next, we register the DataFrame as a SQL temporary view with the function createOrReplaceTempView. We can then run SQL queries against the view with the function sql, which returns the result as a DataFrame.

In [8]:
people.createOrReplaceTempView("people")

In [9]:
sqlDF = spark.sql("SELECT * FROM people")

In [10]:
sqlDF.show()

+--------------------+----+
|             address|name|
+--------------------+----+
|[Los Angeles,Cali...| Bob|
+--------------------+----+

