## Creating Metastore Tables

Data Frames can be written to Metastore tables using APIs such as `saveAsTable` and `insertInto`, which are available on the `write` property of a Data Frame.

* We can create a new table from a Data Frame using `saveAsTable`. We can also create an empty table using `spark.catalog.createTable` or `spark.catalog.createExternalTable` (deprecated in favor of `createTable` since Spark 2.2).
* We can also prefix the table name with the database name to write data into a table that belongs to a particular database. If the database is not specified, the table is resolved against the current database of the session, which is `default` unless we switch to another one.
* Databases can be created using `spark.sql("CREATE DATABASE database_name")`. We can list databases using `spark.sql` or `spark.catalog.listDatabases()`.
* We can use modes such as `append`, `overwrite` and `error` with `saveAsTable`. The default is `error`.
* We can use modes such as `append` and `overwrite` with `insertInto`. The default is `append`.
* When we use `saveAsTable`, the following happens:
  * Spark checks whether the table already exists. If it does, `saveAsTable` throws an exception by default.
  * If the table does not exist, it is created.
  * Data from the Data Frame is written into the table.
  * We can alter this behavior using `mode`: we can overwrite the existing table or append to it.
* We can list the tables using `spark.catalog.listTables` after switching to the appropriate database using `spark.catalog.setCurrentDatabase`.
* We can also switch the database and list tables using `spark.sql`.

Let us start the Spark context for this Notebook so that we can execute the code provided.

If you want to use the terminal for practice, here is the command to use.

```
spark2-shell \
  --master yarn \
  --name "Joining Data Sets" \
  --conf spark.ui.port=0
```

In [None]:
import org.apache.spark.sql.SparkSession

val spark = SparkSession.
    builder.
    config("spark.ui.port", "0").
    appName("Spark Metastore").
    master("yarn").
    getOrCreate()

In [None]:
spark.conf.set("spark.sql.shuffle.partitions", "2")

In [None]:
import spark.implicits._

In [None]:
spark.catalog

### Tasks
Let us perform a few tasks to understand how to write a Data Frame into Metastore tables and also list them.
* Create a database named `${username}_db` in the metastore. We need to use `spark.sql` as there is no function to create a database under `spark.catalog`.

In [None]:
val username = System.getProperty("user.name")

In [None]:
spark.sql(s"CREATE DATABASE ${username}_db")

In [None]:
spark.catalog.setCurrentDatabase(s"${username}_db")

* List the databases using both the API and the SQL approach. As there are many databases in our environment, it might take a while to return the results.

In [None]:
spark.catalog.listDatabases()
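
The SQL approach to the same listing can be sketched as follows. This is a minimal example that assumes the `spark` session created earlier in this Notebook is still active.

```scala
// Assumes `spark` from the cells above.
// SHOW DATABASES returns a Data Frame, so we can display it with show.
// Passing false keeps long database names from being truncated.
spark.sql("SHOW DATABASES").show(false)
```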

* Create a Data Frame which contains one column named **dummy** and one row with the value **X**.

In [None]:
val l = List("X")
// Name the column explicitly; toDF without arguments would call it "value".
val df = l.toDF("dummy")

* Create a table named **dual** for the above Data Frame in the database created.

In [None]:
df.write.saveAsTable("dual")

In [None]:
spark.catalog.listTables()
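
We can also switch the database and list the tables using `spark.sql`. The sketch below assumes the `spark` session and the `username` value defined in the cells above.

```scala
// Assumes `spark` and `username` from the cells above.
// USE switches the current database, similar to spark.catalog.setCurrentDatabase.
spark.sql(s"USE ${username}_db")

// SHOW TABLES lists the tables in the current database as a Data Frame.
spark.sql("SHOW TABLES").show(false)
```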

In [None]:
spark.read.table("dual").show()
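
To see how the write modes described at the beginning of this section behave, here is a minimal sketch. It assumes `spark`, `import spark.implicits._` and `username` from the cells above. The table name `dual_modes_demo` is hypothetical and used only for this illustration, and the table name is prefixed with the database name so that it does not depend on the current database.

```scala
// Assumes `spark`, `spark.implicits._` and `username` from the cells above.
// `dual_modes_demo` is a hypothetical table name used only for this sketch.
val table = s"${username}_db.dual_modes_demo"
val demo = Seq("X").toDF("dummy")

// Default mode is error: this works only because the table does not exist yet.
demo.write.saveAsTable(table)

// append adds the rows to the existing table instead of throwing an exception.
demo.write.mode("append").saveAsTable(table)
val rowsAfterAppend = spark.read.table(table).count()

// insertInto needs an existing table and appends by default.
// Columns are matched by position, not by name.
Seq("Y").toDF("dummy").write.insertInto(table)
val rowsAfterInsert = spark.read.table(table).count()

// overwrite replaces the existing data with the contents of the Data Frame.
demo.write.mode("overwrite").saveAsTable(table)
val rowsAfterOverwrite = spark.read.table(table).count()

// Drop the demo table so that only `dual` is left in the database.
spark.sql(s"DROP TABLE $table")
```

If the demo table did not exist beforehand, the three counts should come out as 2, 3 and 1. Running the first `saveAsTable` a second time without a mode would fail, as the table already exists.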

* Let us drop the table **dual** and then the database **db**. We need to use `spark.sql` as `spark.catalog` does not have an API to drop tables or databases.

In [None]:
spark.sql("DROP TABLE dual")

In [None]:
spark.sql(s"DROP DATABASE ${username}_db")

In [None]:
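// We can use CASCADE to drop the database along with its tables.
// IF EXISTS avoids an error if the database was already dropped by the previous cell.
spark.sql(s"DROP DATABASE IF EXISTS ${username}_db CASCADE")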