## Using Spark SQL

Let us understand how we can use Spark SQL to process data in Metastore Tables and Temporary Views.

* Once tables are created in the metastore, or temporary views are defined, we can run queries against them to perform all the standard transformations.

In [None]:
import getpass
username = getpass.getuser()

In [None]:
spark.catalog.setCurrentDatabase(f"{username}_airlines")

* Here are some of the transformations which can be performed.
  * Row Level Transformations using functions in SELECT clause.
  * Filtering using WHERE clause
  * Aggregations using GROUP BY and aggregate functions.
  * Sorting using ORDER BY or SORT BY

### Tasks

Let us perform some tasks to understand how to process data with Spark SQL using Metastore Tables or Temporary Views.
* Make sure a table or view has been created for airport-codes. We can use the table or view created in the previous step.

In [None]:
spark.catalog.listTables()

* Write a query to get the number of airports per state in the US.
  * Get only those states which have more than 10 airports.
  * Make sure data is sorted in descending order by number of airports.

In [None]:
spark. \
    sql("""SELECT state, count(1) AS airport_cnt
           FROM airport_codes_v
           GROUP BY state
               HAVING count(1) > 10
           ORDER BY airport_cnt DESC
        """). \
  show()

In [None]:
airport_count = spark. \
    sql("""SELECT state, count(1) AS airport_cnt
           FROM airport_codes_v
           GROUP BY state
               HAVING count(1) > 10
           ORDER BY airport_cnt DESC
        """)

In [None]:
from matplotlib import pyplot as plt

# Each collected Row is a (state, airport_cnt) pair, so dict() maps state to count.
airport_count_dict = dict(airport_count.collect())

In [None]:
states = list(airport_count_dict.keys())
states

In [None]:
airport_counts = list(airport_count_dict.values())
airport_counts

In [None]:
# States are categories, so a bar chart is used instead of a line plot.
plt.bar(states, airport_counts)
plt.xticks(rotation=90)
plt.xlabel('States')
plt.ylabel('Airport Counts')
plt.show()