# Spark SQL

Spark SQL is a Spark module that makes it easier and more efficient to load and query structured and semi-structured data. We can interact with Spark SQL through SQL queries as well as through regular Python, Java, or Scala code. Internally, Spark SQL uses the extra information it has about the structure of the data and of the computation to optimize processing. The following example shows how to run SQL queries with Spark SQL.

First, we need to construct a DataFrame from a JSON dataset. You can refer to [the official Spark SQL guide](http://spark.apache.org/docs/latest/sql-programming-guide.html#running-sql-queries-programmatically) for more information.

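```python
from pyspark.sql import SparkSession

# In a PySpark notebook `spark` and `sc` are usually predefined; getOrCreate() reuses that session.
spark = SparkSession.builder.getOrCreate()
sc = spark.sparkContext

json_strings = ['{"name":"Bob","address":{"city":"Los Angeles","state":"California"}}',
                '{"name":"Adam","address":{"city":"Seattle","state":"Washington"}}']
# Define an RDD of JSON strings from the Python list.
peopleRDD = sc.parallelize(json_strings)
# Create a DataFrame from the RDD of strings; the schema is inferred from the JSON.
people = spark.read.json(peopleRDD)
people.show()
```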

```
+--------------------+----+
|             address|name|
+--------------------+----+
|[Los Angeles,Cali...| Bob|
|[Seattle,Washington]|Adam|
+--------------------+----+
```



Next, we register the DataFrame as a SQL temporary view with `createOrReplaceTempView`, after which we can query it by name. The function *sql* runs a SQL query and returns the result as a *DataFrame*.

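```python
# Register the DataFrame as a temporary view named "people" (scoped to this SparkSession).
people.createOrReplaceTempView("people")

# Run a SQL query against the view; spark.sql returns the result as a DataFrame.
sqlDF = spark.sql("SELECT * FROM people WHERE name = 'Adam'")
sqlDF.show()
```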

```
+--------------------+----+
|             address|name|
+--------------------+----+
|[Seattle,Washington]|Adam|
+--------------------+----+
```



You can learn more about Spark SQL in [the official Spark SQL guide](http://spark.apache.org/docs/latest/sql-programming-guide.html#running-sql-queries-programmatically).