## Using Spark SQL

Let us understand how we can use Spark SQL to process data in Metastore Tables and Temporary Views.

* Once tables are created in the metastore, or temporary views are registered, we can run queries against them to perform all standard transformations.

Let us start the Spark session for this notebook so that we can execute the code provided.

If you want to practice using the terminal, here is the command to use.

```
spark2-shell \
  --master yarn \
  --name "Spark Metastore" \
  --conf spark.ui.port=0
```

In [None]:
import org.apache.spark.sql.SparkSession

val spark = SparkSession.
    builder.
    config("spark.ui.port", "0").
    appName("Spark Metastore").
    master("yarn").
    getOrCreate()

In [None]:
spark.conf.set("spark.sql.shuffle.partitions", "2")

In [None]:
import spark.implicits._

In [None]:
// Name of the OS user running the notebook, used to build the database name
val username = System.getProperty("user.name")

In [None]:
spark.catalog.setCurrentDatabase(s"${username}_airlines")

* Here are some of the transformations which can be performed.
  * Row-level transformations using functions in the SELECT clause.
  * Filtering using the WHERE clause.
  * Aggregations using GROUP BY and aggregate functions.
  * Sorting using ORDER BY or SORT BY.
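
As a minimal sketch of these four kinds of transformations, the query below runs against a small, made-up temporary view. The view name `airports_sample`, its columns, and its rows are illustrative assumptions and not part of the course data. `master("local[*]")` is set only so the sketch can run on its own; inside the notebook, `getOrCreate()` should simply return the session that is already running.

```scala
import org.apache.spark.sql.SparkSession

// Reuses the active session if one exists; local[*] is only a fallback
// so that this sketch can run outside the notebook.
val spark = SparkSession.
    builder.
    master("local[*]").
    appName("Spark SQL Transformations Sketch").
    getOrCreate()

import spark.implicits._

// Hypothetical sample data: (city, state, country, iata)
Seq(
    ("Dallas", "TX", "USA", "DFW"),
    ("Houston", "TX", "USA", "IAH"),
    ("Austin", "TX", "USA", "AUS"),
    ("Los Angeles", "CA", "USA", "LAX"),
    ("San Diego", "CA", "USA", "SAN"),
    ("Toronto", "ON", "Canada", "YYZ")
).toDF("city", "state", "country", "iata").
    createOrReplaceTempView("airports_sample")

// lower(country)  : row-level transformation in the SELECT clause
// WHERE           : filtering
// GROUP BY + count: aggregation
// ORDER BY        : sorting
val summary = spark.
    sql("""SELECT state, lower(country) AS country_lc, count(1) AS airport_cnt
           FROM airports_sample
           WHERE country = 'USA'
           GROUP BY state, lower(country)
           ORDER BY airport_cnt DESC
        """)

summary.show()
```

The Canadian row is dropped by the WHERE clause before grouping, so only the two US states remain in the result.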

### Tasks

Let us perform some tasks to understand how to process data with Spark SQL against Metastore tables or temporary views.

* Make sure a table or view has been created for airport-codes. We can use the table or view created in the previous step.

In [None]:
// listTables returns a Dataset, so call show to print its rows
spark.catalog.listTables().show()
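
To confirm that one specific table or view is available, rather than scanning the full listing, the catalog can be asked directly. This is a small sketch: it registers a throwaway view named `airport_codes_demo` (a hypothetical name, used only so the check has something to find) and relies on `spark.catalog.tableExists`, which is available from Spark 2.1 onward. The `local[*]` master is again only a fallback for running outside the notebook.

```scala
import org.apache.spark.sql.SparkSession

// Reuses the active session if one exists
val spark = SparkSession.
    builder.
    master("local[*]").
    appName("Catalog Check Sketch").
    getOrCreate()

import spark.implicits._

// Hypothetical one-row view so that the existence check can succeed
Seq(("DFW", "TX")).
    toDF("iata", "state").
    createOrReplaceTempView("airport_codes_demo")

// tableExists covers both metastore tables and temporary views
val demoExists = spark.catalog.tableExists("airport_codes_demo")
val missingExists = spark.catalog.tableExists("no_such_table_demo")

println(s"airport_codes_demo exists: $demoExists")
println(s"no_such_table_demo exists: $missingExists")

// The listing itself is a Dataset, so it can be filtered like any other
spark.catalog.
    listTables().
    filter($"name" === "airport_codes_demo").
    show()
```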

* Write a query to get the number of airports per state in the US.
  * Get only those states which have more than 10 airports.
  * Make sure the data is sorted in descending order by number of airports.

In [None]:
spark.
    sql("""SELECT state, count(1) AS airport_cnt
           FROM airport_codes
           GROUP BY state
               HAVING count(1) > 10
           ORDER BY airport_cnt DESC
        """).
  show()
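
To see how a "more than N" condition in HAVING behaves, the same query shape can be run against a small in-memory sample. The sketch below is illustrative only: the view `airport_codes_sample`, its rows, and the lowered threshold of 2 are assumptions chosen to keep the data short. It compares a strict `>` with `>=` to show that only the strict form excludes a state sitting exactly on the threshold.

```scala
import org.apache.spark.sql.SparkSession

// Reuses the active session if one exists; local[*] is only a fallback
val spark = SparkSession.
    builder.
    master("local[*]").
    appName("HAVING Threshold Sketch").
    getOrCreate()

import spark.implicits._

// Hypothetical sample: TX has 3 airports, CA has 2, NY has 1
Seq(
    ("DFW", "TX"), ("IAH", "TX"), ("AUS", "TX"),
    ("LAX", "CA"), ("SAN", "CA"),
    ("JFK", "NY")
).toDF("iata", "state").
    createOrReplaceTempView("airport_codes_sample")

// Strictly more than 2 airports: CA (exactly 2) is excluded
val moreThanTwo = spark.
    sql("""SELECT state, count(1) AS airport_cnt
           FROM airport_codes_sample
           GROUP BY state
               HAVING count(1) > 2
           ORDER BY airport_cnt DESC
        """)

// At least 2 airports: CA is included
val atLeastTwo = spark.
    sql("""SELECT state, count(1) AS airport_cnt
           FROM airport_codes_sample
           GROUP BY state
               HAVING count(1) >= 2
           ORDER BY airport_cnt DESC
        """)

moreThanTwo.show()
atLeastTwo.show()
```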