# Creating Iceberg Tables with Spark SQL

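This notebook demonstrates how to create and query Apache Iceberg tables from Spark using SQL. The cells assume an existing `SparkSession` named `spark` with an Iceberg catalog registered as `local`. Table names such as `local.db.products` follow Spark's `catalog.namespace.table` form.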

## 1. Create a Basic Iceberg Table (No Partitioning)

We create a simple Iceberg table named `products` using Spark SQL, insert some records, and display the data.

```python
# Basic Table (No Partitioning)
spark.sql("""
CREATE TABLE local.db.products (
    id INT,
    name STRING,
    price DOUBLE
)
USING ICEBERG
TBLPROPERTIES ('format-version' = '2')
""")
spark.sql("""
INSERT INTO local.db.products VALUES
(1, 'Laptop', 999.99),
(2, 'Phone', 499.50)
""")
spark.sql("SELECT * FROM local.db.products").show()
```

```text
+---+------+------+
| id|  name| price|
+---+------+------+
|  1|Laptop|999.99|
|  2| Phone| 499.5|
+---+------+------+
```
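If you create several tables with the same layout, it can help to generate the DDL instead of hand-editing it. The sketch below uses a hypothetical helper, `build_create_table_sql`, which is not part of Spark or Iceberg. It only assembles a string in the same shape as the statement above, and you would still execute the result with `spark.sql`. It interpolates names and values directly, so it assumes trusted input.

```python
from typing import Mapping, Optional, Sequence, Tuple


def build_create_table_sql(
    table: str,
    columns: Sequence[Tuple[str, str]],
    partitioned_by: Sequence[str] = (),
    properties: Optional[Mapping[str, str]] = None,
) -> str:
    """Render an Iceberg CREATE TABLE statement (hypothetical helper).

    Names and values are inserted verbatim, so only pass trusted input.
    """
    if not columns:
        raise ValueError("at least one column is required")
    # One indented "name TYPE" line per column, comma-separated.
    cols = ",\n".join(f"    {name} {sql_type}" for name, sql_type in columns)
    parts = [f"CREATE TABLE {table} (\n{cols}\n)", "USING ICEBERG"]
    if partitioned_by:
        # Partition expressions such as "months(sale_date)" are passed through as-is.
        parts.append(f"PARTITIONED BY ({', '.join(partitioned_by)})")
    if properties:
        props = ", ".join(f"'{k}' = '{v}'" for k, v in properties.items())
        parts.append(f"TBLPROPERTIES ({props})")
    return "\n".join(parts)


# Rebuild the DDL used for the products table above.
ddl = build_create_table_sql(
    "local.db.products",
    [("id", "INT"), ("name", "STRING"), ("price", "DOUBLE")],
    properties={"format-version": "2"},
)
print(ddl)
# spark.sql(ddl)  # run inside the notebook, where `spark` is already defined
```

Running the cell prints the same `CREATE TABLE` statement used for `products`. Keeping the helper to plain string building means it can be checked without a Spark session.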



## 2. Inspect Table Schema and Properties

We use `DESCRIBE TABLE EXTENDED` to view the schema and properties of the `products` table.

```python
spark.sql("DESCRIBE TABLE EXTENDED local.db.products").show(truncate=False)
```

```text
+----------------------------+----------------------------------------------------------------------------------------------------------------------+-------+
|col_name                    |data_type                                                                                                             |comment|
+----------------------------+----------------------------------------------------------------------------------------------------------------------+-------+
|id                          |int                                                                                                                   |NULL   |
|name                        |string                                                                                                                |NULL   |
|price                       |double                                                                                                                |NULL   |
...
```
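The `DESCRIBE TABLE EXTENDED` result is an ordinary DataFrame of `col_name`, `data_type`, and `comment` rows, so the column list can be extracted programmatically. The following is a minimal sketch, assuming the layout shown above: the column rows come first and end at a blank `col_name` row, and any later section header rows start with `#`. `schema_from_describe` is a hypothetical helper; in the notebook you would pass it the collected rows instead of the hard-coded sample.

```python
from typing import List, Optional, Sequence, Tuple

# (col_name, data_type, comment), mirroring the DESCRIBE output columns.
DescribeRow = Tuple[str, str, Optional[str]]


def schema_from_describe(rows: Sequence[DescribeRow]) -> List[Tuple[str, str]]:
    """Return (column, type) pairs from DESCRIBE TABLE EXTENDED rows."""
    schema: List[Tuple[str, str]] = []
    for col_name, data_type, _comment in rows:
        name = (col_name or "").strip()
        # Column rows end at the blank separator row or a '#' section header.
        if not name or name.startswith("#"):
            break
        schema.append((name, data_type))
    return schema


# Sample rows matching the output above; the remaining sections are omitted.
# In the notebook (assumption): rows = spark.sql("DESCRIBE TABLE EXTENDED local.db.products").collect()
sample_rows = [
    ("id", "int", None),
    ("name", "string", None),
    ("price", "double", None),
    ("", "", ""),
]
print(schema_from_describe(sample_rows))
# [('id', 'int'), ('name', 'string'), ('price', 'double')]
```

Stopping at the first separator keeps the helper independent of whatever detail sections Spark prints after the schema.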

## 3. Create a Partitioned Iceberg Table (by Month)

We create a partitioned Iceberg table named `sales`, partitioned by month of the `sale_date` column, insert data, and display the contents.

```python
# Create an Iceberg table using SQL
spark.sql("""
    CREATE TABLE local.db.sales (
        id INT,
        product STRING,
        amount DOUBLE,
        sale_date DATE
    )
    USING ICEBERG
    PARTITIONED BY (months(sale_date))
    TBLPROPERTIES (
        'format-version' = '2'
    )
""")

# Insert records
spark.sql("""
    INSERT INTO local.db.sales VALUES
    (1, 'Pen', 5.5, DATE '2024-01-15'),
    (2, 'Pencil', 2.0, DATE '2024-01-17'),
    (3, 'Notebook', 7.25, DATE '2024-02-10')
""")

# Read and display
spark.sql("SELECT * FROM local.db.sales").show()
```

```text
+---+--------+------+----------+
| id| product|amount| sale_date|
+---+--------+------+----------+
|  1|     Pen|   5.5|2024-01-15|
|  2|  Pencil|   2.0|2024-01-17|
|  3|Notebook|  7.25|2024-02-10|
+---+--------+------+----------+
```
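Because the table is partitioned with `months(sale_date)`, Iceberg derives each row's partition from `sale_date` instead of requiring a separate partition column. The Iceberg table specification defines the `month` transform as the number of months since 1970-01. The pure-Python sketch below models that calculation for the three rows inserted above, showing that the two January sales share a partition while the February sale gets its own. `iceberg_month` and `group_by_month` are illustrative helpers, not Iceberg APIs; Iceberg applies the transform itself when writing data.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, Iterable, List, Tuple

SaleRow = Tuple[int, str, float, date]


def iceberg_month(d: date) -> int:
    """Months since 1970-01, modelling the Iceberg spec's `month` transform."""
    return (d.year - 1970) * 12 + (d.month - 1)


def group_by_month(rows: Iterable[SaleRow]) -> Dict[int, List[int]]:
    """Map each month partition value to the ids of the rows it would hold."""
    partitions: Dict[int, List[int]] = defaultdict(list)
    for row_id, _product, _amount, sale_date in rows:
        partitions[iceberg_month(sale_date)].append(row_id)
    return dict(partitions)


# The same three rows inserted into local.db.sales above.
sales = [
    (1, "Pen", 5.5, date(2024, 1, 15)),
    (2, "Pencil", 2.0, date(2024, 1, 17)),
    (3, "Notebook", 7.25, date(2024, 2, 10)),
]
for month, ids in sorted(group_by_month(sales).items()):
    years, month_index = divmod(month, 12)
    print(f"partition {month} ({1970 + years}-{month_index + 1:02d}): rows {ids}")
# partition 648 (2024-01): rows [1, 2]
# partition 649 (2024-02): rows [3]
```

Since the partition value is computed from `sale_date`, queries only need to filter on `sale_date` itself; Iceberg can use that filter to skip partitions that cannot match, without the query mentioning the transform.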

