## Creating Partitioned Tables

Spark Metastore tables can also be partitioned.

* Creating a partitioned table directly using `spark.catalog.createTable` has some challenges.
* However, if the data is already stored in directories laid out like a partitioned table (one `column=value` subdirectory per partition), we can create a partitioned table on top of that location. Such tables are typically external tables.
* Let us create a partitioned table for `orders`, partitioned by `order_month`.
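
The directory layout mentioned above can be sketched in plain Python. This is only an illustration of the `column=value` naming convention, not part of the Spark API: the helper `partition_values`, the user name `training`, and the data file name are hypothetical.

```python
from pathlib import PurePosixPath

def partition_values(file_path, base_path):
    """Return the partition columns encoded in a Hive-style path as a dict."""
    relative = PurePosixPath(file_path).relative_to(base_path)
    # Each directory level of the form name=value is one partition column;
    # the last path component is the data file itself.
    return dict(part.split('=', 1) for part in relative.parts[:-1] if '=' in part)

# Hypothetical base location and file name, for illustration only
base = '/user/training/retail_db/orders_part'
example = f'{base}/order_month=201308/part-00000.snappy.parquet'
print(partition_values(example, base))
```

A table location whose subdirectories follow this pattern is what lets the partition column be inferred from the path rather than from the data files.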

In [None]:
import getpass
username = getpass.getuser()

In [None]:
spark.sql(f'CREATE DATABASE IF NOT EXISTS {username}_retail')

In [None]:
spark.catalog.setCurrentDatabase(f'{username}_retail')

In [None]:
orders_path = '/public/retail_db/orders'

In [None]:
%%sh

hdfs dfs -ls /public/retail_db/orders

In [None]:
from pyspark.sql.functions import date_format

In [None]:
spark.sql('DROP TABLE IF EXISTS orders_part')

In [None]:
%%sh

hdfs dfs -ls /user/`whoami`/retail_db/orders_part

In [None]:
%%sh

# Remove the output of any earlier run; -f avoids an error if the directory does not exist
hdfs dfs -rm -R -f /user/`whoami`/retail_db/orders_part

In [None]:
# Derive order_month (yyyyMM) from order_date and write one subdirectory per month
spark. \
    read. \
    csv(orders_path,
        schema='''order_id INT, order_date DATE,
                  order_customer_id INT, order_status STRING
               '''
       ). \
    withColumn('order_month', date_format('order_date', 'yyyyMM')). \
    write. \
    partitionBy('order_month'). \
    parquet(f'/user/{username}/retail_db/orders_part')
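
To get a feel for the partition values the `withColumn` step is expected to produce, the same formatting can be mimicked in plain Python. This is a sketch that uses `strftime` rather than Spark's `date_format`; it assumes `'%Y%m'` matches the `'yyyyMM'` pattern for ordinary calendar dates, and the sample dates are made up for illustration.

```python
from datetime import date

def order_month(order_date):
    # '%Y%m' mirrors 'yyyyMM': four-digit year followed by two-digit month
    return order_date.strftime('%Y%m')

# Made-up sample dates, not taken from the orders data set
sample_dates = [date(2013, 7, 25), date(2013, 8, 1), date(2014, 1, 15)]
months = sorted({order_month(d) for d in sample_dates})
print(months)
```

Each distinct value corresponds to one `order_month=<value>` subdirectory under the output path.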

In [None]:
%%sh

hdfs dfs -ls -R /user/`whoami`/retail_db/orders_part

In [None]:
spark. \
    read. \
    parquet(f'/user/{username}/retail_db/orders_part/order_month=201308'). \
    show()

In [None]:
spark. \
    read. \
    parquet(f'/user/{username}/retail_db/orders_part'). \
    show()

In [None]:
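# Register the existing Parquet location as a table; because a path is given, the table is external
spark. \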
    catalog. \
    createTable('orders_part',
                path=f'/user/{username}/retail_db/orders_part',
                source='parquet'
               )

In [None]:
# Register the partition directories found under the table location in the metastore
spark.catalog.recoverPartitions('orders_part')
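
One way to confirm that the partitions were registered is to list them with `SHOW PARTITIONS`. The helper below is a minimal sketch: `list_partitions` is a hypothetical name, and it assumes the notebook's existing `spark` session is passed in.

```python
def list_partitions(spark, table_name):
    """Return the partition specs of a table, e.g. 'order_month=201308'."""
    # SHOW PARTITIONS returns a single string column named 'partition'
    rows = spark.sql(f'SHOW PARTITIONS {table_name}').collect()
    return [row['partition'] for row in rows]

# In the notebook, with the session already available as `spark`:
# list_partitions(spark, 'orders_part')
```

If the list comes back empty, the partitions have most likely not been recovered yet.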

In [None]:
spark.read.table('orders_part').show()

In [None]:
spark.sql('SELECT order_month, count(1) FROM orders_part GROUP BY order_month').show()