#### Creating Metastore Tables using catalog

##### Data Frames can be written into Metastore tables using the `saveAsTable` and `insertInto` APIs, which are available through the `write` attribute (a `DataFrameWriter`) of a Data Frame.



*  We can create a new table from a Data Frame using `saveAsTable`.
   We can use modes such as `append`, `overwrite` and `error` (alias `errorifexists`) with `saveAsTable`. The default is `error`.

*  We can also create an empty table using `spark.catalog.createTable` or `spark.catalog.createExternalTable` (deprecated since Spark 2.2 in favor of `createTable`), and then load data into it using `insertInto`.
   We can use modes such as `append` and `overwrite` with `insertInto`. The default is `append`.

In [1]:
from pyspark.sql import SparkSession

import getpass
username = getpass.getuser()

spark = SparkSession. \
    builder. \
    config('spark.ui.port', '0'). \
    config("spark.sql.warehouse.dir", f"/user/{username}/warehouse"). \
    enableHiveSupport(). \
    appName(f'{username} | Python - Spark Metastore'). \
    master('yarn'). \
    getOrCreate()

In [2]:
spark.catalog?

[0;31mType:[0m        property
[0;31mString form:[0m <property object at 0x7f5620d6b228>
[0;31mDocstring:[0m  
Interface through which the user may create, drop, alter or query underlying
databases, tables, functions, etc.

:return: :class:`Catalog`

.. versionadded:: 2.0


##### We can use `help(spark.catalog)` to get the list of available methods.

___Create a database named demo_db (prefixed with the username) in the metastore.___

We need to use `spark.sql`, as `spark.catalog` has no function to create a database.

In [3]:
import getpass
username = getpass.getuser()

In [4]:
username

'itv736079'

In [5]:
spark.sql(f"DROP DATABASE IF EXISTS {username}_demo_db CASCADE")

In [6]:
spark.sql(f"CREATE DATABASE {username}_demo_db")

In [7]:
spark.catalog.setCurrentDatabase(f'{username}_demo_db')

# Equivalent SQL:
# spark.sql(f"USE {username}_demo_db")

In [None]:
spark.catalog.listDatabases()

In [9]:
spark.catalog.currentDatabase()

'itv736079_demo_db'

##### Create a Data Frame that contains one column named dummy and one row with the value X.

In [10]:
l = [("X", )]
df = spark.createDataFrame(l, schema="dummy STRING")

In [12]:
df.show()

+-----+
|dummy|
+-----+
|    X|
+-----+



##### Create a table named dual from the above Data Frame in the newly created database.

In [13]:
df.write.saveAsTable("dual", mode='overwrite')

In [14]:
spark.catalog.listTables()

[Table(name='dual', database='itv736079_demo_db', description=None, tableType='MANAGED', isTemporary=False)]

In [15]:
spark.read.table("dual").show()

+-----+
|dummy|
+-----+
|    X|
+-----+



In [16]:
spark.sql('SELECT * FROM dual').show()

+-----+
|dummy|
+-----+
|    X|
+-----+



##### Create an empty table and insert data into it.

In [19]:
spark.sql("DROP TABLE dual")

In [17]:
schema = df.schema

In [18]:
schema

StructType(List(StructField(dummy,StringType,true)))

In [20]:
spark.catalog.createTable('dual', schema=schema)

dummy


In [21]:
spark.catalog.listTables()

[Table(name='dual', database='itv736079_demo_db', description=None, tableType='MANAGED', isTemporary=False)]

In [22]:
df.write.insertInto('dual') # default mode append

In [23]:
spark.read.table("dual").show()

+-----+
|dummy|
+-----+
|    X|
+-----+



In [24]:
spark.sql('SELECT * FROM dual').show()

+-----+
|dummy|
+-----+
|    X|
+-----+



##### Let us drop the database along with the table dual. We need to use `spark.sql`, as `spark.catalog` does not have an API to drop tables or databases.

In [25]:
# CASCADE drops the database along with all of its tables.
spark.sql(f"DROP DATABASE IF EXISTS {username}_demo_db CASCADE")