## Reading and Processing Tables

Let us see how we can read tables using functions such as `spark.read.table` and process data using Data Frame APIs.

* We can read a table using the Data Frame API `spark.read.table("table_name")`.
* We can also prefix the table name with the database name, as in `spark.read.table("database_name.table_name")`, to read a table that belongs to a particular database.
* Reading a table returns a Data Frame.
* Once the Data Frame is created, we can use functions such as `filter` or `where`, `groupBy`, and `sort` or `orderBy` to process its data.

### Tasks
Let us see how we can create a table using data in a file and then read it into a Data Frame.

* Create a database for the **airlines** data.

```python
import getpass

# Use the OS user name to keep database names unique per user
username = getpass.getuser()
```

```python
spark.sql(f"CREATE DATABASE IF NOT EXISTS {username}_airlines")
```

```python
# Make the new database the default, so unqualified table names resolve to it
spark.catalog.setCurrentDatabase(f"{username}_airlines")
```

* Create a table named **airport_codes** from the file **airport-codes.txt**. The file has a header row, and the fields in each row are delimited by a tab character.

```python
# Directory that holds airport-codes.txt
airport_codes_path = f"/user/{username}/airlines_all/airport-codes"
```

```python
# Drop any earlier version of the table; IF EXISTS avoids an error on the first run
spark.sql(f'DROP TABLE IF EXISTS {username}_airlines.airport_codes')
```

```python
# Tab-delimited file with a header row; let Spark infer the column types
airport_codes_df = spark. \
    read. \
    csv(airport_codes_path,
        sep="\t",
        header=True,
        inferSchema=True
       )
```

```python
# Persist the Data Frame as a table in the airlines database
airport_codes_df.write.saveAsTable(f"{username}_airlines.airport_codes")
```

* Read data from the table and get the number of airports by state.

```python
# Unqualified name resolves against the current database set earlier
airport_codes = spark.read.table("airport_codes")
```

```python
type(airport_codes)
```

```python
# Show columns, types, location and other table metadata without truncation
spark.sql('DESCRIBE FORMATTED airport_codes').show(100, False)
```

```python
airport_codes. \
    groupBy("state"). \
    count(). \
    show()
```