# Pyspark Usage with Delta Lake & Minio

This notebook shows how to write a CSV file directly to Minio, and also how to write and read a managed Delta Lake table in Minio.

Click the Table of Contents button in the left JupyterLab sidebar (the button on the far left of this browser window that looks like a bulleted list) to see the types of examples provided. **Make sure to run all the cells above a given section, since most examples in this notebook depend on those above them.**

## Get Environment Variables for Minio (S3) Connection

In [1]:
import pyspark
import os

In [2]:
os.environ
# You should see the ENDPOINT_URL, S3_BUCKET, AWS_ACCESS_KEY_ID, and AWS_SECRET_ACCESS_KEY environment variables.
# These environment variables are set in docker-compose.yml. The service account that PySpark uses
# to read from and write to Minio is created by the minio-init container defined in docker-compose.yml.

environ{'PATH': '/usr/local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin',
        'HOSTNAME': '13890db84cd2',
        'ENDPOINT_URL': 'http://minio:9000',
        'S3_BUCKET': 'test',
        'AWS_ACCESS_KEY_ID': 'jupyteraccesskey',
        'AWS_SECRET_ACCESS_KEY': 'jupytersupersecretkey',
        'LANG': 'C.UTF-8',
        'GPG_KEY': 'A035C8C19219BA821ECEA86B64E628F8D684696D',
        'PYTHON_VERSION': '3.11.0',
        'PYTHON_PIP_VERSION': '22.3',
        'PYTHON_SETUPTOOLS_VERSION': '65.5.0',
        'PYTHON_GET_PIP_URL': 'https://github.com/pypa/get-pip/raw/66030fa03382b4914d4c4d0896961a0bdeeeb274/public/get-pip.py',
        'PYTHON_GET_PIP_SHA256': '1e501cf004eac1b7eb1f97266d28f995ae835d30250bec7f8850562703067dc6',
        'HOME': '/root',
        'PYDEVD_USE_FRAME_EVAL': 'NO',
        'JPY_SESSION_NAME': '9f1e88a7-487b-44d6-9857-793d0e2adc20',
        'JPY_PARENT_PID': '1',
        'TERM': 'xterm-color',
        'CLICOLOR': '1',
        'PAGER': 'cat',
     

In [3]:
S3_ACCESS_KEY = os.environ.get("AWS_ACCESS_KEY_ID")
S3_BUCKET = os.environ.get("S3_BUCKET")
S3_SECRET_KEY = os.environ.get("AWS_SECRET_ACCESS_KEY")
S3_ENDPOINT = os.environ.get("ENDPOINT_URL")
# S3_ACCESS_KEY = "sparkaccesskey"
# S3_BUCKET = "test"
# S3_SECRET_KEY = "sparksupersecretkey"
# S3_ENDPOINT = "http://minio:9000"
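
If one of these variables is missing, `os.environ.get` returns `None` and the problem only surfaces later, when Spark tries to connect. A minimal sketch of a fail-fast check is shown below. The helper name `load_s3_settings` is hypothetical and not part of the original notebook; the variable names are the ones that appear in the `os.environ` output above.

```python
import os

# Environment variable names taken from the os.environ output above
REQUIRED_VARS = ("ENDPOINT_URL", "S3_BUCKET", "AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")


def load_s3_settings(env=None):
    """Return the S3 settings as a dict, or raise if any variable is unset or empty.

    Hypothetical helper, not part of the original notebook.
    """
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise KeyError(f"Missing environment variables: {', '.join(missing)}")
    return {name: env[name] for name in REQUIRED_VARS}
```

Calling `load_s3_settings()` with no argument reads from the real environment; passing a plain dict makes the check easy to try out without touching the container configuration.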

## Configure Pyspark to Connect to Minio and Enable Delta-Lake Format

In [4]:
# This cell may take some time to run the first time, as it must download the necessary spark jars
conf = pyspark.SparkConf()

## IF YOU ARE USING THE SPARK CONTAINERS, UNCOMMENT THE LINE BELOW TO OFFLOAD EXECUTION OF SPARK TASKS TO SPARK CONTAINERS
#conf.setMaster("spark://spark:7077")

conf.set("spark.jars.packages", 'org.apache.hadoop:hadoop-aws:3.3.1,io.delta:delta-core_2.12:2.1.0')
# conf.set('spark.hadoop.fs.s3a.aws.credentials.provider', 'org.apache.hadoop.fs.s3a.AnonymousAWSCredentialsProvider')
conf.set('spark.hadoop.fs.s3a.endpoint', S3_ENDPOINT)
conf.set('spark.hadoop.fs.s3a.access.key', S3_ACCESS_KEY)
conf.set('spark.hadoop.fs.s3a.secret.key', S3_SECRET_KEY)
conf.set('spark.hadoop.fs.s3a.path.style.access', "true")
conf.set("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
conf.set("spark.sql.catalog.spark_catalog", "org.apache.spark.sql.delta.catalog.DeltaCatalog")

sc = pyspark.SparkContext(conf=conf)

# sc.setLogLevel("INFO")

:: loading settings :: url = jar:file:/usr/local/lib/python3.11/site-packages/pyspark/jars/ivy-2.5.0.jar!/org/apache/ivy/core/settings/ivysettings.xml


Ivy Default Cache set to: /root/.ivy2/cache
The jars for the packages stored in: /root/.ivy2/jars
org.apache.hadoop#hadoop-aws added as a dependency
io.delta#delta-core_2.12 added as a dependency
:: resolving dependencies :: org.apache.spark#spark-submit-parent-fd9f541b-69d6-4625-a2a2-90ebf09d8fff;1.0
	confs: [default]
	found org.apache.hadoop#hadoop-aws;3.3.1 in central
	found com.amazonaws#aws-java-sdk-bundle;1.11.901 in central
	found org.wildfly.openssl#wildfly-openssl;1.0.7.Final in central
	found io.delta#delta-core_2.12;2.1.0 in central
	found io.delta#delta-storage;2.1.0 in central
	found org.antlr#antlr4-runtime;4.8 in central
	found org.codehaus.jackson#jackson-core-asl;1.9.13 in central
downloading https://repo1.maven.org/maven2/org/apache/hadoop/hadoop-aws/3.3.1/hadoop-aws-3.3.1.jar ...
	[SUCCESSFUL ] org.apache.hadoop#hadoop-aws;3.3.1!hadoop-aws.jar (443ms)
downloading https://repo1.maven.org/maven2/io/delta/delta-core_2.12/2.1.0/delta-core_2.12-2.1.0.jar ...
	[SUCCESSFUL 

23/04/15 02:38:38 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable


Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).


In [5]:
spark = pyspark.sql.SparkSession(sc)
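
The configuration cell above sets seven properties one call at a time. As a sketch, the same keys and values can be collected in a plain dictionary, which makes them easier to inspect or reuse. The function name `build_spark_settings` and the example credentials are hypothetical; the property names and package versions are copied from the `conf.set(...)` calls above.

```python
def build_spark_settings(endpoint, access_key, secret_key):
    """Return the S3A and Delta Lake settings from the cell above as key/value pairs.

    Hypothetical helper; the keys and values mirror the conf.set(...) calls above.
    """
    return {
        "spark.jars.packages": "org.apache.hadoop:hadoop-aws:3.3.1,io.delta:delta-core_2.12:2.1.0",
        "spark.hadoop.fs.s3a.endpoint": endpoint,
        "spark.hadoop.fs.s3a.access.key": access_key,
        "spark.hadoop.fs.s3a.secret.key": secret_key,
        "spark.hadoop.fs.s3a.path.style.access": "true",
        "spark.sql.extensions": "io.delta.sql.DeltaSparkSessionExtension",
        "spark.sql.catalog.spark_catalog": "org.apache.spark.sql.delta.catalog.DeltaCatalog",
    }


# Made-up credentials, for illustration only
settings = build_spark_settings("http://minio:9000", "example-access-key", "example-secret-key")
for key, value in sorted(settings.items()):
    # Mask the secret so it does not end up in notebook output
    shown = "***" if key.endswith("secret.key") else value
    print(f"{key} = {shown}")
```

In the notebook, these pairs could then be applied in one step with `conf.setAll(list(settings.items()))` before the `SparkContext` is created.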

## Read in Sample CSV Data from Local Filesystem

In [6]:
df = spark.read.option("header", "true").csv("/data/appl_stock.csv")

In [7]:
df.show()

+----------+------------------+------------------+------------------+------------------+---------+------------------+
|      Date|              Open|              High|               Low|             Close|   Volume|         Adj Close|
+----------+------------------+------------------+------------------+------------------+---------+------------------+
|2010-01-04|        213.429998|        214.499996|212.38000099999996|        214.009998|123432400|         27.727039|
|2010-01-05|        214.599998|        215.589994|        213.249994|        214.379993|150476200|27.774976000000002|
|2010-01-06|        214.379993|            215.23|        210.750004|        210.969995|138040000|27.333178000000004|
|2010-01-07|            211.75|        212.000006|        209.050005|            210.58|119282800|          27.28265|
|2010-01-08|        210.299994|        212.000006|209.06000500000002|211.98000499999998|111902700|         27.464034|
|2010-01-11|212.79999700000002|        213.000002|      

In [8]:
df.printSchema()

root
 |-- Date: string (nullable = true)
 |-- Open: string (nullable = true)
 |-- High: string (nullable = true)
 |-- Low: string (nullable = true)
 |-- Close: string (nullable = true)
 |-- Volume: string (nullable = true)
 |-- Adj Close: string (nullable = true)



## Modify Column Types

In [9]:
# The CSV reader loads every column as a string, so cast the numeric columns explicitly
for col in ["Open", "High", "Low", "Close", "Adj Close"]:
    df = df.withColumn(col, df[col].cast("double"))
for col in ["Volume"]:
    df = df.withColumn(col, df[col].cast("int"))

In [10]:
df.printSchema()

root
 |-- Date: string (nullable = true)
 |-- Open: double (nullable = true)
 |-- High: double (nullable = true)
 |-- Low: double (nullable = true)
 |-- Close: double (nullable = true)
 |-- Volume: integer (nullable = true)
 |-- Adj Close: double (nullable = true)



## Write CSV Directly to Minio (Not as a Delta Table)

In [11]:
df.write.csv(f"s3a://{S3_BUCKET}/appl_stock.csv", mode="overwrite")

23/04/15 02:38:45 WARN MetricsConfig: Cannot locate configuration: tried hadoop-metrics2-s3a-file-system.properties,hadoop-metrics2.properties
23/04/15 02:38:45 WARN AbstractS3ACommitterFactory: Using standard FileOutputCommitter to commit work. This is slow and potentially unsafe.
23/04/15 02:38:46 WARN AbstractS3ACommitterFactory: Using standard FileOutputCommitter to commit work. This is slow and potentially unsafe.


**Navigate to http://localhost:9090 and log in to the Minio Console to see the CSV file.**

(The username and password for Minio can be found in the environment variables section of the minio service definition in docker-compose.yml.)

# Write a Delta Lake Table in Minio using Spark

In [12]:
# Delta does not accept spaces in column names by default, so replace them with underscores
delta_df = df
for col in delta_df.columns:
    delta_df = delta_df.withColumnRenamed(col, col.replace(" ","_"))

In [13]:
delta_df.show()

+----------+------------------+------------------+------------------+------------------+---------+------------------+
|      Date|              Open|              High|               Low|             Close|   Volume|         Adj_Close|
+----------+------------------+------------------+------------------+------------------+---------+------------------+
|2010-01-04|        213.429998|        214.499996|212.38000099999996|        214.009998|123432400|         27.727039|
|2010-01-05|        214.599998|        215.589994|        213.249994|        214.379993|150476200|27.774976000000002|
|2010-01-06|        214.379993|            215.23|        210.750004|        210.969995|138040000|27.333178000000004|
|2010-01-07|            211.75|        212.000006|        209.050005|            210.58|119282800|          27.28265|
|2010-01-08|        210.299994|        212.000006|209.06000500000002|211.98000499999998|111902700|         27.464034|
|2010-01-11|212.79999700000002|        213.000002|      

In [14]:
delta_df.printSchema()

root
 |-- Date: string (nullable = true)
 |-- Open: double (nullable = true)
 |-- High: double (nullable = true)
 |-- Low: double (nullable = true)
 |-- Close: double (nullable = true)
 |-- Volume: integer (nullable = true)
 |-- Adj_Close: double (nullable = true)
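
Spaces are not the only characters that cause trouble. When column mapping is not enabled, Delta also rejects commas, semicolons, braces, parentheses, newlines, tabs, and equals signs in column names. The sketch below generalizes the rename above to all of these; `sanitize_column_name` is a hypothetical helper, not something the notebook or Delta provides.

```python
import re

# Characters Delta Lake rejects in column names when column mapping is not enabled
_INVALID_CHARS = re.compile(r"[ ,;{}()\n\t=]")


def sanitize_column_name(name):
    """Replace each character Delta would reject with an underscore."""
    return _INVALID_CHARS.sub("_", name)


# Column names from the CSV file used in this notebook
columns = ["Date", "Open", "High", "Low", "Close", "Volume", "Adj Close"]
print([sanitize_column_name(c) for c in columns])
```

With a Spark DataFrame, all columns could be renamed at once with `df.toDF(*[sanitize_column_name(c) for c in df.columns])`.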



## Create Month and Year columns for partitioning

In [15]:
from pyspark.sql.functions import month, year

In [16]:
delta_df = delta_df.withColumn("Month", month(delta_df.Date))
delta_df = delta_df.withColumn("Year", year(delta_df.Date))

In [17]:
delta_table_name = "appl_stock_delta_table"

In [18]:
(
    delta_df.write.format("delta")
    .partitionBy("Year", "Month")
    .option("overwriteSchema", "true")
    .save(f"s3a://{S3_BUCKET}/{delta_table_name}", mode="overwrite")
)

                                                                                

**Navigate to http://localhost:9090 and log in to the Minio Console to see the Delta Lake table.**

**Note that the Delta Lake table includes both the data partitions and the metadata log.**

(The username and password for Minio can be found in the environment variables section of the minio service definition in docker-compose.yml.)
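
To make it easier to find things in the Minio Console, the sketch below computes where a given row and a given commit file are expected to sit under the table prefix. It assumes the default Delta layout (Hive-style `column=value` partition directories in `partitionBy` order, and a `_delta_log` folder with zero-padded JSON commit files). The function names are hypothetical.

```python
from datetime import date


def partition_prefix(table_name, day):
    """Return the Hive-style partition directory for a row dated `day`."""
    return f"{table_name}/Year={day.year}/Month={day.month}"


def commit_log_key(table_name, version):
    """Return the object key of the JSON commit file for a table version."""
    return f"{table_name}/_delta_log/{version:020d}.json"


table = "appl_stock_delta_table"
print(partition_prefix(table, date(2010, 1, 4)))
print(commit_log_key(table, 0))
```

The first row of the sample data (2010-01-04) would therefore be stored in a Parquet file under `Year=2010/Month=1`, and the initial write is recorded in the commit file for version 0.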

# Read the Delta Table Back into Spark

In [19]:
new_delta_df = spark.read.format("delta").load(f"s3a://{S3_BUCKET}/{delta_table_name}")

In [20]:
new_delta_df.show()

+----------+------------------+------------------+------------------+------------------+---------+------------------+-----+----+
|      Date|              Open|              High|               Low|             Close|   Volume|         Adj_Close|Month|Year|
+----------+------------------+------------------+------------------+------------------+---------+------------------+-----+----+
|2011-08-01|397.77999900000003|        399.500011|        392.369995|        396.749989|153209000|          51.40275|    8|2011|
|2011-08-02|        397.650009|397.90000200000003|         388.35001|        388.909996|159884900|         50.387004|    8|2011|
|2011-08-03|390.98000299999995|        393.549995|         382.23999|            392.57|183127000|         50.861193|    8|2011|
|2011-08-04|        389.410007|391.32001099999997|377.34999799999997|        377.369999|217851900|         48.891888|    8|2011|
|2011-08-05|        380.440002|        383.499992|        362.570007|        373.620007|301147700

## Delete Data From Delta Table

In [21]:
from delta.tables import DeltaTable

In [22]:
delta_table = DeltaTable.forPath(spark, f"s3a://{S3_BUCKET}/{delta_table_name}")

In [23]:
delta_table.delete("Date < '2010-02-01'")

                                                                                

In [24]:
# delta_table.vacuum()

# .vacuum() is not really necessary for this example. For more info, see https://docs.delta.io/latest/delta-utility.html#remove-files-no-longer-referenced-by-a-delta-table

In [25]:
updated_df = delta_table.toDF()

In [26]:
updated_df.describe().show()
# Notice the min date due to the delete above



+-------+----------+------------------+------------------+------------------+------------------+--------------------+------------------+------------------+------------------+
|summary|      Date|              Open|              High|               Low|             Close|              Volume|         Adj_Close|             Month|              Year|
+-------+----------+------------------+------------------+------------------+------------------+--------------------+------------------+------------------+------------------+
|  count|      1743|              1743|              1743|              1743|              1743|                1743|              1743|              1743|              1743|
|   mean|      null| 314.2063570154905|317.05242083132543|310.96767053872634| 314.0739527596099| 9.307720510613884E7| 75.52596069420538| 6.610441767068273|2013.0338496844522|
| stddev|      null|185.98853126511312|187.59247321328291|184.05333687045766|185.82494613123112|5.8516940489542216E7|28.28297

                                                                                

## Use Time Travel to READ a Previous Version of the Delta Table

In [27]:
previous_df = spark.read.format("delta").option("versionAsOf", 0).load(f"s3a://{S3_BUCKET}/{delta_table_name}")
previous_df.describe().show()
# Notice the min date, showing that we are reading from a previous version

                                                                                

+-------+----------+------------------+------------------+------------------+------------------+-------------------+------------------+------------------+------------------+
|summary|      Date|              Open|              High|               Low|             Close|             Volume|         Adj_Close|             Month|              Year|
+-------+----------+------------------+------------------+------------------+------------------+-------------------+------------------+------------------+------------------+
|  count|      1762|              1762|              1762|              1762|              1762|               1762|              1762|              1762|              1762|
|   mean|      null| 313.0763111589103| 315.9112880164587|309.82824050794557| 312.9270656379114|9.422577587968218E7| 75.00174115607263|6.5499432463110105|2013.0011350737798|
| stddev|      null|185.29946803981534|186.89817686485773|183.38391664370977|185.14710361709427|6.020518776592713E7|28.57492972179
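
The read above selects a version by number with `versionAsOf`. Delta also supports selecting by time with the `timestampAsOf` reader option. The sketch below builds the option dictionary for either case and rejects ambiguous input; `time_travel_options` is a hypothetical helper.

```python
def time_travel_options(version=None, timestamp=None):
    """Build the reader options for a Delta time-travel read.

    Exactly one of `version` or `timestamp` must be given.
    """
    if (version is None) == (timestamp is None):
        raise ValueError("Pass exactly one of version or timestamp")
    if version is not None:
        return {"versionAsOf": str(version)}
    return {"timestampAsOf": timestamp}


print(time_travel_options(version=0))
print(time_travel_options(timestamp="2023-04-15 02:38:52"))
```

The result could be passed to the reader with `spark.read.format("delta").options(**time_travel_options(version=0)).load(path)`, where `path` is the same `s3a://` location used above.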

                                                                                

In [28]:
delta_table.history().show()

+-------+-------------------+------+--------+---------+--------------------+----+--------+---------+-----------+--------------+-------------+--------------------+------------+--------------------+
|version|          timestamp|userId|userName|operation| operationParameters| job|notebook|clusterId|readVersion|isolationLevel|isBlindAppend|    operationMetrics|userMetadata|          engineInfo|
+-------+-------------------+------+--------+---------+--------------------+----+--------+---------+-----------+--------------+-------------+--------------------+------------+--------------------+
|      1|2023-04-15 02:38:56|  null|    null|   DELETE|{predicate -> ["(...|null|    null|     null|          0|  Serializable|        false|{numRemovedFiles ...|        null|Apache-Spark/3.3....|
|      0|2023-04-15 02:38:51|  null|    null|    WRITE|{mode -> Overwrit...|null|    null|     null|       null|  Serializable|        false|{numFiles -> 84, ...|        null|Apache-Spark/3.3....|
+-------+------
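
The history output can also be used to pick a restore target programmatically rather than by reading the table by eye. The sketch below works on plain dictionaries so that it runs without Spark; `version_before_last` is a hypothetical helper, and the sample rows are copied from the history output above with the other columns omitted.

```python
def version_before_last(history_rows, operation):
    """Return the table version that the most recent `operation` commit was based on.

    `history_rows` is a list of dicts with at least the keys
    "version", "operation" and "readVersion".
    """
    matching = [row for row in history_rows if row["operation"] == operation]
    if not matching:
        raise LookupError(f"No {operation} commit in history")
    latest = max(matching, key=lambda row: row["version"])
    return latest["readVersion"]


# Rows copied from the history output above (other columns omitted)
history_rows = [
    {"version": 1, "operation": "DELETE", "readVersion": 0},
    {"version": 0, "operation": "WRITE", "readVersion": None},
]
print(version_before_last(history_rows, "DELETE"))
```

In the notebook, the rows could be obtained with `[row.asDict() for row in delta_table.history().collect()]`, and the returned version is the one to restore in order to undo the delete.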

## Use Time Travel to RESTORE a Previous Version of the Delta Table

In [29]:
from datetime import datetime

In [30]:
# Capture a timestamp before restoring the Delta table, so that it can be used for a timestamp-based restore later on
# (an explicit time format is used because %X depends on the locale)
pre_restore_time = datetime.now().strftime("%Y-%m-%d %H:%M:%S")

In [31]:
# Restore to a numbered version, and show the result summary of the restore operation
delta_table.restoreToVersion(0).show()

+------------------------+--------------------------+-----------------+------------------+------------------+-------------------+
|table_size_after_restore|num_of_files_after_restore|num_removed_files|num_restored_files|removed_files_size|restored_files_size|
+------------------------+--------------------------+-----------------+------------------+------------------+-------------------+
|                  251203|                        84|                0|                 1|                 0|               2919|
+------------------------+--------------------------+-----------------+------------------+------------------+-------------------+



In [32]:
delta_table.history().show()

+-------+-------------------+------+--------+---------+--------------------+----+--------+---------+-----------+--------------+-------------+--------------------+------------+--------------------+
|version|          timestamp|userId|userName|operation| operationParameters| job|notebook|clusterId|readVersion|isolationLevel|isBlindAppend|    operationMetrics|userMetadata|          engineInfo|
+-------+-------------------+------+--------+---------+--------------------+----+--------+---------+-----------+--------------+-------------+--------------------+------------+--------------------+
|      2|2023-04-15 02:39:06|  null|    null|  RESTORE|{version -> 0, ti...|null|    null|     null|          1|  Serializable|        false|{numRestoredFiles...|        null|Apache-Spark/3.3....|
|      1|2023-04-15 02:38:56|  null|    null|   DELETE|{predicate -> ["(...|null|    null|     null|          0|  Serializable|        false|{numRemovedFiles ...|        null|Apache-Spark/3.3....|
|      0|2023-0

In [33]:
## We can always un-restore (or restore a more recent version) because restoring is a metadata-only operation
##  (i.e. the data files themselves are not modified)

In [34]:
#delta_table.restoreToVersion(1)

# Instead of using restoreToVersion, we can use a timestamp to revert to the table as it was at a specific time
delta_table.restoreToTimestamp(pre_restore_time).show()

+------------------------+--------------------------+-----------------+------------------+------------------+-------------------+
|table_size_after_restore|num_of_files_after_restore|num_removed_files|num_restored_files|removed_files_size|restored_files_size|
+------------------------+--------------------------+-----------------+------------------+------------------+-------------------+
|                  248284|                        83|                1|                 0|              2919|                  0|
+------------------------+--------------------------+-----------------+------------------+------------------+-------------------+



In [35]:
delta_table.history().show()

+-------+-------------------+------+--------+---------+--------------------+----+--------+---------+-----------+--------------+-------------+--------------------+------------+--------------------+
|version|          timestamp|userId|userName|operation| operationParameters| job|notebook|clusterId|readVersion|isolationLevel|isBlindAppend|    operationMetrics|userMetadata|          engineInfo|
+-------+-------------------+------+--------+---------+--------------------+----+--------+---------+-----------+--------------+-------------+--------------------+------------+--------------------+
|      3|2023-04-15 02:39:12|  null|    null|  RESTORE|{version -> null,...|null|    null|     null|          2|  Serializable|        false|{numRestoredFiles...|        null|Apache-Spark/3.3....|
|      2|2023-04-15 02:39:06|  null|    null|  RESTORE|{version -> 0, ti...|null|    null|     null|          1|  Serializable|        false|{numRestoredFiles...|        null|Apache-Spark/3.3....|
|      1|2023-0

# Trigger Trino to Automatically Infer Schema from Delta Table and Make Data Available for End User Querying / Dashboarding

In [36]:
import requests
import json
from time import sleep

In [37]:
delta_table_name = "appl_stock_delta_table"
delta_schema_name = "my_schema"

In [60]:
# Utility function to simplify query execution against the Trino REST API
def execute_trino_query(query, statement_endpoint="http://trino:8080/v1/statement", user="admin", password=""):
    """Submit a query to Trino, poll until it completes, and return the rows as a list of dicts."""
    print(f"Executing query:\n{query}")
    res = requests.post(statement_endpoint, data=query.encode("UTF8"), auth=requests.auth.HTTPBasicAuth(user, password))

    data = []
    cols = None
    while True:
        json_res = res.json()
        state = json_res.get("stats").get("state")
        print(f"State: {state}")

        # Result rows can be spread across several responses, so accumulate them
        res_data = json_res.get("data")
        if res_data:
            data.extend(res_data)

        res_cols = json_res.get("columns")
        if res_cols:
            cols = [i["name"] for i in res_cols]

        # Keep following nextUri until Trino stops returning one
        next_uri = json_res.get("nextUri")
        if next_uri:
            sleep(.5)
            res = requests.get(next_uri)
        else:
            if state == "FAILED":
                raise Exception(res.content)
            return [dict(zip(cols, d)) for d in data]
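
The function above mixes HTTP calls with the logic that merges the responses. As a sketch of that merging logic alone, the function below takes already-decoded response dictionaries and combines them, so it can be tried without a running Trino server. The name `collect_trino_rows` and the sample pages are hypothetical; the keys (`stats.state`, `columns`, `data`, `error`) are the ones Trino uses in its statement responses.

```python
def collect_trino_rows(pages):
    """Merge a sequence of Trino statement responses into a list of row dicts.

    `pages` is an iterable of already-decoded JSON responses, in the order
    they were received. Raises RuntimeError if the final state is FAILED.
    """
    cols = None
    rows = []
    state = None
    error = None
    for page in pages:
        state = page.get("stats", {}).get("state")
        error = page.get("error") or error
        if page.get("columns"):
            cols = [c["name"] for c in page["columns"]]
        rows.extend(page.get("data") or [])
    if state == "FAILED":
        message = (error or {}).get("message", "unknown error")
        raise RuntimeError(f"Trino query failed: {message}")
    return [dict(zip(cols or [], row)) for row in rows]


# Made-up responses that imitate the shape of a short query
pages = [
    {"stats": {"state": "QUEUED"}},
    {"stats": {"state": "RUNNING"}, "columns": [{"name": "date"}, {"name": "year"}], "data": [["2013-05-01", 2013]]},
    {"stats": {"state": "FINISHED"}, "data": [["2012-06-01", 2012]]},
]
print(collect_trino_rows(pages))
```

Surfacing the `message` field of the error object, instead of the raw response body, tends to make failed statements easier to diagnose.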
                
            


## Trigger Trino to Read Delta Table Schema

In [65]:
create_schema_statement = f"""
CREATE SCHEMA IF NOT EXISTS delta.{delta_schema_name}
WITH (location = 's3a://{S3_BUCKET}/')
"""

register_table_statement = f"""CALL delta.system.register_table(schema_name => '{delta_schema_name}', table_name => '{delta_table_name}', table_location => 's3a://{S3_BUCKET}/{delta_table_name}')"""


In [66]:
for query in [create_schema_statement, register_table_statement]:
    print(execute_trino_query(query))

Executing query:

CREATE SCHEMA IF NOT EXISTS delta.my_schema
WITH (location = 's3a://test/')

State: QUEUED
State: QUEUED
State: QUEUED
State: FINISHED
[]
Executing query:
CALL delta.system.register_table(schema_name => 'my_schema', table_name => 'appl_stock_delta_table', table_location => 's3a://test/appl_stock_delta_table')
State: QUEUED
State: QUEUED
State: QUEUED
State: FINISHED
[]


## Query Data from Table 

In [67]:
LIMIT = 10
select_statement = f"SELECT * FROM delta.{delta_schema_name}.{delta_table_name}"
if LIMIT and type(LIMIT) == int:
    select_statement += f" LIMIT {LIMIT}"
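
The statement above is assembled with an f-string, which is fine for fixed names but fragile if a name ever contains unusual characters or comes from user input. The sketch below shows one way to build the same query more defensively, using double-quoted identifiers and a validated limit. Both function names are hypothetical.

```python
def quote_identifier(name):
    """Quote a SQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def build_select(catalog, schema, table, limit=None):
    """Build a SELECT * statement for catalog.schema.table with an optional LIMIT."""
    target = ".".join(quote_identifier(part) for part in (catalog, schema, table))
    statement = f"SELECT * FROM {target}"
    if limit is not None:
        # bool is a subclass of int, so exclude it explicitly
        if isinstance(limit, bool) or not isinstance(limit, int) or limit < 0:
            raise ValueError("limit must be a non-negative integer")
        statement += f" LIMIT {limit}"
    return statement


print(build_select("delta", "my_schema", "appl_stock_delta_table", limit=10))
```

The resulting string can be passed to `execute_trino_query` in the same way as `select_statement`.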

In [68]:
data = execute_trino_query(select_statement)

Executing query:
SELECT * FROM delta.my_schema.appl_stock_delta_table LIMIT 10
State: QUEUED
State: QUEUED
State: QUEUED
State: RUNNING
State: FINISHED


In [69]:
print(data)

[{'date': '2013-05-01', 'open': 444.45999900000004, 'high': 444.929996, 'low': 434.389996, 'close': 439.29000099999996, 'volume': 126727300, 'adj_close': 57.754282999999994, 'month': 5, 'year': 2013}, {'date': '2012-06-01', 'open': 569.159996, 'high': 572.650009, 'low': 560.5200120000001, 'close': 560.989983, 'volume': 130246900, 'adj_close': 72.68160999999999, 'month': 6, 'year': 2012}, {'date': '2013-05-02', 'open': 441.779991, 'high': 448.58997300000004, 'low': 440.630005, 'close': 445.519997, 'volume': 105457100, 'adj_close': 58.573352, 'month': 5, 'year': 2013}, {'date': '2013-05-03', 'open': 451.30998200000005, 'high': 453.23002599999995, 'low': 449.149986, 'close': 449.980019, 'volume': 90325200, 'adj_close': 59.159718999999996, 'month': 5, 'year': 2013}, {'date': '2012-06-04', 'open': 561.500008, 'high': 567.4999849999999, 'low': 548.499977, 'close': 564.289978, 'volume': 139248900, 'adj_close': 73.109156, 'month': 6, 'year': 2012}, {'date': '2012-06-05', 'open': 561.269989, 'h

## Create a New Delta Lake Table Using Trino 'CREATE TABLE AS'

In [44]:
statement = f"CREATE TABLE delta.{delta_schema_name}.{delta_table_name}_copy AS (SELECT * FROM delta.{delta_schema_name}.{delta_table_name} LIMIT 10)"

In [45]:
data = execute_trino_query(statement)

Executing query:
CREATE TABLE delta.my_schema.appl_stock_delta_table_copy AS (SELECT * FROM delta.my_schema.appl_stock_delta_table LIMIT 10)
State: QUEUED
State: QUEUED
State: FAILED


In [46]:
statement = f"SELECT * FROM delta.{delta_schema_name}.{delta_table_name}_copy LIMIT 10"

In [47]:
data = execute_trino_query(statement)

Executing query:
SELECT * FROM delta.my_schema.appl_stock_delta_table_copy LIMIT 10
State: QUEUED
State: QUEUED
State: FAILED


In [48]:
data

[]

# Use Superset API To Add Connection to Trino Delta Lake Database

### NOTE: THE STEPS BELOW WILL ONLY WORK IF YOU ARE ALSO USING THE SUPERSET CONTAINERS

In [49]:
SUPERSET_BASE_URL = "http://superset:8088"
TOKEN_ENDPOINT = f"{SUPERSET_BASE_URL}/api/v1/security/login"

In [50]:
data = {
  "password": "admin",
  "provider": "db",
  "refresh": True,
  "username": "admin"
}
headers = {
    "Content-Type":"application/json",
    "Accept":"application/json"
}
res = requests.post(TOKEN_ENDPOINT, data=json.dumps(data), headers=headers)

In [51]:
auth_token = res.json()["access_token"]
headers["Authorization"] = f"Bearer {auth_token}"

In [52]:
# CSRF protection is disabled in the Superset config.py, so the token request below is commented out

# CSRF_ENDPOINT = f"{SUPERSET_BASE_URL}/api/v1/security/csrf_token/"

# res = requests.get(CSRF_ENDPOINT,headers=headers)

# csrf_token = res.json()["result"]
# headers["X-CSRFToken"] = csrf_token

In [53]:
DATABASE_ENDPOINT = f"{SUPERSET_BASE_URL}/api/v1/database/"

In [54]:
data = {
    "database_name": "Delta",
    "engine": "trino",
    "configuration_method": "sqlalchemy_form",
    "catalog": [
        {
            "name": "",
            "value": ""
        }
    ],
    "sqlalchemy_uri": "trino://trino@trino:8080/delta",
    "expose_in_sqllab": True,
    "allow_ctas": True,
    "allow_cvas": True,
    "allow_dml": True,
    "extra_json": {
        "allows_virtual_table_explore": True
    },
    "extra": '{"allows_virtual_table_explore":true,"metadata_params":{},"engine_params":{},"schemas_allowed_for_file_upload":[]}'
}
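
In the payload above, `extra` is a hand-written JSON string, which is easy to break with a stray quote or comma. As a sketch, the same payload can be built by serializing a dictionary with `json.dumps`. The function name `build_database_payload` is hypothetical; the field names and values are copied from the cell above.

```python
import json


def build_database_payload(name, sqlalchemy_uri):
    """Build the database payload from the cell above, serializing `extra` from a dict."""
    extra = {
        "allows_virtual_table_explore": True,
        "metadata_params": {},
        "engine_params": {},
        "schemas_allowed_for_file_upload": [],
    }
    return {
        "database_name": name,
        "engine": "trino",
        "configuration_method": "sqlalchemy_form",
        "catalog": [{"name": "", "value": ""}],
        "sqlalchemy_uri": sqlalchemy_uri,
        "expose_in_sqllab": True,
        "allow_ctas": True,
        "allow_cvas": True,
        "allow_dml": True,
        "extra_json": {"allows_virtual_table_explore": True},
        # As in the original payload, `extra` is sent as a JSON-encoded string
        "extra": json.dumps(extra),
    }


payload = build_database_payload("Delta", "trino://trino@trino:8080/delta")
print(payload["extra"])
```

The resulting `payload` can be posted to `DATABASE_ENDPOINT` exactly like the `data` dictionary above.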

In [55]:
res = requests.post(DATABASE_ENDPOINT, data=json.dumps(data), headers=headers)

In [56]:
headers

{'Content-Type': 'application/json',
 'Accept': 'application/json',
 'Authorization': 'Bearer eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJmcmVzaCI6dHJ1ZSwiaWF0IjoxNjgxNTI2MzYyLCJqdGkiOiJlY2Q0OGEwZC05OTg3LTRmNzctYmJiYi1kOGRkNTZlZTZhMzQiLCJ0eXBlIjoiYWNjZXNzIiwic3ViIjoxLCJuYmYiOjE2ODE1MjYzNjIsImV4cCI6MTY4MTUyNzI2Mn0.h_noyLavzMEzzKNz0RCFWbiNUhExiu5eB-8irvRAm68'}

In [57]:
res.content

b'{"id":2,"result":{"allow_ctas":true,"allow_cvas":true,"allow_dml":true,"configuration_method":"sqlalchemy_form","database_name":"Delta","expose_in_sqllab":true,"extra":"{\\"allows_virtual_table_explore\\":true,\\"metadata_params\\":{},\\"engine_params\\":{},\\"schemas_allowed_for_file_upload\\":[]}","sqlalchemy_uri":"trino://trino@trino:8080/delta"}}\n'