# Databricks Connect catalog bootstrap
Use this notebook to validate a Databricks Connect setup and bootstrap Unity Catalog objects so the Kaggle-powered pipeline can land data in Delta tables. It follows the Spark Connect architecture described in the [Databricks blog on Spark Connect](https://www.databricks.com/blog/2023/04/18/spark-connect-available-apache-spark.html), which requires a clean gRPC TLS path from this client to your workspace compute.

**What you can do here**
1. Validate local environment variables, SSL certificates, and proxy behaviour.
2. Merge corporate root certificates into the trust store used by Spark Connect (required when proxies intercept TLS).
3. Probe the workspace REST API to confirm the token and host are correct.
4. Establish a Databricks Connect session and run smoke-test SQL against remote compute.
5. Create (or reuse) the target Unity Catalog catalog and schema for the pipeline.
6. Optionally load the latest CSV output into Delta tables and log the run to MLflow.

If any step fails, the notebook prints guidance for that step and continues, so one failure does not cascade into unrelated errors in later cells.
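
The fail-soft behaviour described above can be sketched as a small helper. This is an illustration only: `run_step` and `results` are hypothetical names that do not appear in the notebook, which instead wraps each cell in its own `try`/`except`.

```python
from typing import Any, Callable, Dict, Optional

# Hypothetical registry: step name -> error text, or None when the step succeeded
results: Dict[str, Optional[str]] = {}

def run_step(name: str, fn: Callable[[], Any]) -> Any:
    """Run one step and record a failure instead of raising, so later cells still run."""
    try:
        value = fn()
    except Exception as exc:  # broad on purpose: a diagnostic step must not cascade
        results[name] = f"{type(exc).__name__}: {exc}"
        print(f"{name}: failed ({results[name]})")
        return None
    results[name] = None
    print(f"{name}: ok")
    return value
```

A later cell could then check `results.get('spark_smoke_test') is None` before attempting any write, assuming a step was registered under that (hypothetical) name.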

In [None]:
# Optional: ensure required packages are present in the current environment
# !pip install --quiet mlflow databricks-connect

Defaulting to user installation because normal site-packages is not writeable
Collecting mlflow
  Obtaining dependency information for mlflow from https://files.pythonhosted.org/packages/52/fe/1ed27f800cd1709a272c6e26b78ec3d77a5ba482171ea1b5bfbcf4c067c0/mlflow-3.4.0-py3-none-any.whl.metadata
  Downloading mlflow-3.4.0-py3-none-any.whl.metadata (30 kB)
Collecting mlflow-skinny==3.4.0 (from mlflow)
  Obtaining dependency information for mlflow-skinny==3.4.0 from https://files.pythonhosted.org/packages/1b/94/7acd7c6970cc75da1fd3b550e43d8b99068032022f47b0ef224a137ec679/mlflow_skinny-3.4.0-py3-none-any.whl.metadata
  Downloading mlflow_skinny-3.4.0-py3-none-any.whl.metadata (31 kB)
Collecting mlflow-tracing==3.4.0 (from mlflow)
  Obtaining dependency information for mlflow-tracing==3.4.0 from https://files.pythonhosted.org/packages/ae/96/403b1191ccf587f19a8c94085477600d6e6b3d61a7aff46f353b20b450f9/mlflow_tracing-3.4.0-py3-none-any.whl.metadata
  Downloading mlflow_tracing-3.4.0-py3-none-any

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
anaconda-cloud-auth 0.1.3 requires pydantic<2.0, but you have pydantic 2.12.2 which is incompatible.
jupyter-server 1.23.4 requires anyio<4,>=3.1.0, but you have anyio 4.11.0 which is incompatible.
pyopenssl 23.2.0 requires cryptography!=40.0.0,!=40.0.1,<42,>=38.0.0, but you have cryptography 45.0.7 which is incompatible.


In [42]:
# Sanity check: ensure databricks-connect package is available and show config location
import importlib, json, os, pathlib
DESIRED_HTTP_PATH = os.getenv('DATABRICKS_HTTP_PATH', '/sql/1.0/warehouses/7bb142c4f4ff862e')
try:
    dbc_module = importlib.import_module('databricks.connect')
    print('databricks-connect version:', getattr(dbc_module, '__version__', 'unknown'))
    config_path = pathlib.Path.home() / '.databricks-connect'
    if config_path.exists():
        print('Found configuration file:', config_path)
        try:
            config_text = config_path.read_text()
            print('Config preview (first 400 chars):')
            print(config_text[:400])
            config_json = json.loads(config_text)
            updated = False
            if config_json.get('http_path') != DESIRED_HTTP_PATH:
                config_json['http_path'] = DESIRED_HTTP_PATH
                updated = True
            if config_json.pop('cluster_id', None) is not None:
                updated = True
            if updated:
                backup_path = config_path.with_suffix('.bak')
                backup_path.write_text(config_text)
                config_path.write_text(json.dumps(config_json, indent=2))
                print(f'Updated ~/.databricks-connect and saved backup to {backup_path}.')
            os.environ.setdefault('DATABRICKS_HTTP_PATH', config_json.get('http_path', DESIRED_HTTP_PATH))
        except Exception as read_err:
            print('Could not read/update config file:', read_err)
    else:
        print('No ~/.databricks-connect file detected; run `databricks-connect configure`.')
    databricks_connect_ok = True
except ModuleNotFoundError:
    databricks_connect_ok = False
    print('databricks-connect package is not installed. Run `pip install databricks-connect`.')
except Exception as e:
    databricks_connect_ok = False
    print('databricks-connect check error:', e)

databricks-connect version: unknown
Found configuration file: C:\Users\nnassili\.databricks-connect
Config preview (first 400 chars):
{
  "host": "https://dbc-935124bd-e5fd.cloud.databricks.com/",
  "token": "YOUR_DATABRICKS_TOKEN",
  "org_id": "0",
  "port": "15001",
  "http_path": "/sql/1.0/warehouses/7bb142c4f4ff862e"
}
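
The cell above prints the first 400 characters of the config file, which includes the `token` field. A minimal sketch of masking secrets before printing follows. `redact_config` and `SENSITIVE_KEYS` are hypothetical names, and the key list is an assumption about which fields are sensitive.

```python
import json

# Assumed names of sensitive fields; extend to match your config
SENSITIVE_KEYS = {"token", "password", "client_secret"}

def redact_config(config_text: str) -> str:
    """Return the JSON config with sensitive values masked, or a placeholder if unparseable."""
    try:
        data = json.loads(config_text)
    except json.JSONDecodeError:
        # Do not echo raw text that could not be inspected for secrets
        return "<unparseable config; preview suppressed>"
    if not isinstance(data, dict):
        return "<unexpected config shape; preview suppressed>"
    masked = {k: ("***" if k in SENSITIVE_KEYS and v else v) for k, v in data.items()}
    return json.dumps(masked, indent=2)
```

Printing `redact_config(config_text)` in place of `config_text[:400]` would keep the host and path visible while hiding the token.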


In [43]:
# Setup: env, SSL certs, and basic validation
from dotenv import load_dotenv
import os, sys, certifi, platform, urllib.parse
load_dotenv()
os.environ["GRPC_DEFAULT_SSL_ROOTS_FILE_PATH"] = certifi.where()
os.environ.setdefault("SSL_CERT_FILE", certifi.where())
host = os.getenv('DATABRICKS_HOST')
token = os.getenv('DATABRICKS_TOKEN')
cluster_id = os.getenv('DATABRICKS_CLUSTER_ID')
warehouse_id = os.getenv('DATABRICKS_WAREHOUSE_ID') or os.getenv('DATABRICKS_SERVERLESS_COMPUTE_ID')
http_path = os.getenv('DATABRICKS_HTTP_PATH') or os.getenv('DATABRICKS_SQL_HTTP_PATH')
# Derive host domain and set NO_PROXY to avoid corporate proxy breaking gRPC
try:
    parsed = urllib.parse.urlparse(host) if host else None
    domain = parsed.hostname if parsed else None
    if domain:
        existing_no_proxy = os.environ.get('NO_PROXY', '')
        tokens = [t.strip() for t in existing_no_proxy.split(',') if t.strip()]
        if domain not in tokens:
            tokens.append(domain)
        if '.databricks.com' not in tokens:
            tokens.append('.databricks.com')
        os.environ['NO_PROXY'] = ','.join(tokens)
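except Exception as proxy_err:
    # Non-fatal: keep going, but surface why NO_PROXY was left unchanged
    print('Could not update NO_PROXY:', proxy_err)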
print('Python:', sys.version)
print('OS:', platform.platform())
print('CWD:', os.getcwd())
print('GRPC_DEFAULT_SSL_ROOTS_FILE_PATH:', os.environ.get('GRPC_DEFAULT_SSL_ROOTS_FILE_PATH'))
print('SSL_CERT_FILE:', os.environ.get('SSL_CERT_FILE'))
print('NO_PROXY:', os.environ.get('NO_PROXY'))
print('DATABRICKS_HOST:', host)
print('DATABRICKS_TOKEN prefix:', (token[:8] + '...') if token else None)
print('DATABRICKS_CLUSTER_ID:', cluster_id)
print('WAREHOUSE/SERVERLESS ID:', warehouse_id)
print('HTTP_PATH:', http_path)
if not host or not token:
    print('ERROR: Missing DATABRICKS_HOST or DATABRICKS_TOKEN in .env')

Python: 3.11.5 | packaged by Anaconda, Inc. | (main, Sep 11 2023, 13:26:23) [MSC v.1916 64 bit (AMD64)]
OS: Windows-10-10.0.26100-SP0
CWD: c:\Users\nnassili\Documents\life-style-mlops\notebooks
GRPC_DEFAULT_SSL_ROOTS_FILE_PATH: c:\ProgramData\anaconda3\Lib\site-packages\certifi\cacert.pem
SSL_CERT_FILE: C:\ProgramData\anaconda3\Library\ssl\cacert.pem
NO_PROXY: dbc-935124bd-e5fd.cloud.databricks.com,.databricks.com
DATABRICKS_HOST: https://dbc-935124bd-e5fd.cloud.databricks.com
DATABRICKS_TOKEN prefix: dapi09f2...
DATABRICKS_CLUSTER_ID: None
WAREHOUSE/SERVERLESS ID: auto
HTTP_PATH: /sql/1.0/warehouses/7bb142c4f4ff862e


In [44]:
# TEMP: disable TLS verification until corporate CA is installed
import os
os.environ['DISABLE_SSL_VERIFY'] = '1'
print('DISABLE_SSL_VERIFY set for this session. Remove when corporate CA bundle is available.')

DISABLE_SSL_VERIFY set for this session. Remove when corporate CA bundle is available.


## Optional SSL/proxy override (debug only)

## Corporate CA merge (Spark Connect TLS)

In [46]:
# Merge corporate CA bundle (if provided) with certifi for Spark Connect TLS
import os, certifi, tempfile
from pathlib import Path
combined_ca_bundle = None
custom_ca_env = os.getenv('CORPORATE_CA_BUNDLE') or os.getenv('EXTRA_CA_CERTS')
if custom_ca_env:
    custom_ca_path = Path(custom_ca_env).expanduser()
    if custom_ca_path.exists():
        combined_ca_bundle = Path(tempfile.gettempdir()) / 'databricks_connect_combined_ca.pem'
        try:
            with open(certifi.where(), 'rb') as base, open(custom_ca_path, 'rb') as extra, open(combined_ca_bundle, 'wb') as out:
                out.write(base.read())
                out.write(b"\n")
                out.write(extra.read())
            os.environ['GRPC_DEFAULT_SSL_ROOTS_FILE_PATH'] = str(combined_ca_bundle)
            os.environ['SSL_CERT_FILE'] = str(combined_ca_bundle)
            os.environ['REQUESTS_CA_BUNDLE'] = str(combined_ca_bundle)
            os.environ['CURL_CA_BUNDLE'] = str(combined_ca_bundle)
            print('Combined CA bundle applied:', combined_ca_bundle)
        except Exception as ca_err:
            combined_ca_bundle = None
            print('Could not merge corporate CA bundle:', ca_err)
    else:
        print('Corporate CA path not found:', custom_ca_path)
        print('Provide a PEM file containing your organization\'s root certificates (see Spark Connect TLS guidance).')
else:
    print('Set CORPORATE_CA_BUNDLE (or EXTRA_CA_CERTS) env var to a PEM file with corporate roots if gRPC TLS fails.')

Set CORPORATE_CA_BUNDLE (or EXTRA_CA_CERTS) env var to a PEM file with corporate roots if gRPC TLS fails.
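
Once a combined bundle has been written, one quick sanity check is that it contains more certificates than the base `certifi` bundle. The sketch below only counts PEM delimiters and does not validate the certificates themselves. `count_pem_certificates` and `bundle_grew` are hypothetical helper names.

```python
import re
from pathlib import Path

# Matches one PEM-encoded certificate block, including its delimiters
PEM_CERT_RE = re.compile(
    rb"-----BEGIN CERTIFICATE-----.+?-----END CERTIFICATE-----", re.DOTALL
)

def count_pem_certificates(pem_bytes: bytes) -> int:
    """Count certificate blocks in PEM data (delimiter count only, no validation)."""
    return len(PEM_CERT_RE.findall(pem_bytes))

def bundle_grew(base_path: str, combined_path: str) -> bool:
    """Return True if the combined bundle holds more certificates than the base bundle."""
    base = count_pem_certificates(Path(base_path).read_bytes())
    combined = count_pem_certificates(Path(combined_path).read_bytes())
    return combined > base
```

For example, `bundle_grew(certifi.where(), str(combined_ca_bundle))` should be `True` after a successful merge. A `False` result suggests the corporate file was empty or not PEM-encoded.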


In [45]:
# Debug-only override: apply relaxed TLS settings when DISABLE_SSL_VERIFY is set
import ssl, warnings, os
DISABLE_SSL_VERIFY = os.getenv('DISABLE_SSL_VERIFY', 'False').lower() in ('1', 'true', 'yes')
if DISABLE_SSL_VERIFY:
    warnings.warn('SSL verification disabled – use only for local debugging.', RuntimeWarning)
    os.environ['GRPC_DEFAULT_SSL_ROOTS_FILE_PATH'] = ''
    os.environ['SSL_CERT_FILE'] = ''
    os.environ['REQUESTS_CA_BUNDLE'] = ''
    os.environ['CURL_CA_BUNDLE'] = ''
    os.environ['DATABRICKS_INSECURE'] = '1'
    ssl._create_default_https_context = ssl._create_unverified_context
    VERIFY_SSL = False
else:
    VERIFY_SSL = True



In [47]:
# REST API diagnostic: can we reach the workspace?
import requests, certifi, os
host = os.getenv('DATABRICKS_HOST')
token = os.getenv('DATABRICKS_TOKEN')
rest_api_ok = None
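# Prefer the merged corporate bundle when the CA-merge cell produced one
ca_bundle_for_rest = globals().get('combined_ca_bundle') or certifi.where()
verify_arg = str(ca_bundle_for_rest) if globals().get('VERIFY_SSL', True) else False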
if not host or not token:
    rest_api_ok = False
    print('REST check skipped: missing DATABRICKS_HOST or DATABRICKS_TOKEN')
else:
    try:
        r = requests.get(
            f"{host.rstrip('/')}/api/2.0/clusters/list",  # tolerate a trailing slash in DATABRICKS_HOST
            headers={"Authorization": f"Bearer {token}"},
            timeout=10,
            verify=verify_arg,
        )
        print('REST status:', r.status_code)
        print('REST body (first 200):', r.text[:200])
        rest_api_ok = r.status_code == 200
    except Exception as e:
        rest_api_ok = False
        print('REST connectivity error:', e)
        print('Tip: Workspace REST permissions can be stricter than Spark Connect. Spark attempts continue regardless of this result.')



REST status: 200
REST body (first 200): {}
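
The probe above treats anything other than HTTP 200 as a failure. A rough sketch of turning the status code into a more specific hint is shown below. The mapping follows general HTTP semantics rather than documented Databricks behaviour, and `interpret_rest_status` is a hypothetical helper.

```python
def interpret_rest_status(status_code: int) -> str:
    """Map an HTTP status code to a troubleshooting hint (generic HTTP semantics)."""
    if status_code == 200:
        return "ok: host and token accepted"
    if status_code == 401:
        return "unauthorized: token missing, expired, or issued for another workspace"
    if status_code == 403:
        return "forbidden: token is valid but lacks permission for this endpoint"
    if status_code == 404:
        return "not found: check DATABRICKS_HOST and the API path"
    if 500 <= status_code < 600:
        return "server error: retry later or check workspace status"
    return f"unexpected status {status_code}"
```

Calling `print(interpret_rest_status(r.status_code))` after the request would give a hint alongside the raw status.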


## Local dataset preview (runs even without Spark)

In [48]:
import pandas as pd, os
csv_rel_path = '../data/raw/Final_data.csv'
csv_abs_path = os.path.abspath(os.path.join(os.getcwd(), csv_rel_path))
print('Resolved CSV path:', csv_abs_path)
try:
    df_preview = pd.read_csv(csv_abs_path)
    print('Local dataframe shape:', df_preview.shape)
    display(df_preview.head())
except Exception as e:
    print('Local preview error:', e)

Resolved CSV path: c:\Users\nnassili\Documents\life-style-mlops\data\raw\Final_data.csv
Local dataframe shape: (20000, 54)


Unnamed: 0,Age,Gender,Weight (kg),Height (m),Max_BPM,Avg_BPM,Resting_BPM,Session_Duration (hours),Calories_Burned,Workout_Type,...,cal_from_macros,pct_carbs,protein_per_kg,pct_HRR,pct_maxHR,cal_balance,lean_mass_kg,expected_burn,Burns Calories (per 30 min)_bc,Burns_Calories_Bin
0,34.91,Male,65.27,1.62,188.58,157.65,69.05,1.0,1080.9,Strength,...,2139.59,0.500432,1.624789,0.741237,0.835985,725.1,47.777394,685.16,7.260425e+19,Medium
1,23.37,Female,56.41,1.55,179.43,131.75,73.18,1.37,1809.91,HIIT,...,1711.65,0.50085,1.514093,0.551247,0.73427,-232.91,40.809803,978.6184,1.020506e+20,High
2,33.2,Female,58.98,1.67,175.04,123.95,54.96,0.91,802.26,Cardio,...,1965.92,0.50061,1.663445,0.574534,0.708124,805.74,44.63558,654.5266,1.079607e+20,High
3,38.69,Female,93.78,1.7,191.21,155.1,50.07,1.1,1450.79,HIIT,...,1627.28,0.499533,0.862017,0.744155,0.81115,1206.21,63.007432,773.63,8.987921e+19,High
4,45.09,Male,52.42,1.88,193.58,152.88,70.84,1.08,1166.4,Strength,...,2659.23,0.500581,2.538153,0.668405,0.789751,303.6,43.347504,711.4176,5.264685e+19,Low


In [49]:
# Spark connectivity test with guard flag (always attempt session)
spark_ok = False
spark_session_error = None
spark_action_error = None
try:
    from databricks.connect import DatabricksSession
except Exception as import_error:
    spark_session_error = import_error
    print('databricks-connect import error:', import_error)
else:
    if globals().get('rest_api_ok') is False:
        print('REST API check failed earlier; attempting Spark Connect anyway because REST permissions can be narrower than Spark.')
    try:
        session_builder = DatabricksSession.builder
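        if http_path:
            # No explicit remote() call in this branch: the builder falls back to its
            # default configuration resolution (environment variables / config profile).
            print('Using http_path from Databricks Connect config.')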
        else:
            remote_kwargs = {}
            if host and token:
                remote_kwargs['host'] = host
                remote_kwargs['token'] = token
            if cluster_id:
                remote_kwargs['cluster_id'] = cluster_id
            if remote_kwargs:
                session_builder = session_builder.remote(**remote_kwargs)
        spark = session_builder.getOrCreate()
        print('Spark session created via Databricks Connect.')
        spark_ok = True
    except Exception as e:
        spark_session_error = e
        print('Spark session create error:', e)
    if spark_ok:
        try:
            print('Spark SQL test:', spark.sql('SELECT CURRENT_CATALOG() AS catalog, CURRENT_SCHEMA() AS schema').collect())
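        except Exception as e:
            # spark_ok stays True here because the session object exists, so later cells
            # still attempt Spark calls; check spark_action_error in the run summary.
            spark_action_error = e
            print('Spark action error (likely connectivity/SSL):', e)
            print('Hint: RETRIES_EXCEEDED usually means a proxy/SSL interceptor blocks gRPC. Consider importing corporate root certificates or toggling the debug override briefly.')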

Using http_path from Databricks Connect config.
Spark session created via Databricks Connect.
Spark action error (likely connectivity/SSL): [RETRIES_EXCEEDED] The maximum number of retries has been exceeded.
Hint: RETRIES_EXCEEDED usually means a proxy/SSL interceptor blocks gRPC. Consider importing corporate root certificates or toggling the debug override briefly.
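
To check whether a proxy is re-signing TLS traffic, it can help to look at who issued the certificate the client actually receives. The sketch below uses only the standard library. `issuer_summary` and `probe_tls_issuer` are hypothetical helpers. The probe opens a direct socket, so it will not work on networks that only allow outbound traffic through an explicit HTTP proxy.

```python
import socket
import ssl
from typing import Any, Dict, Optional

def issuer_summary(cert: Dict[str, Any]) -> str:
    """Flatten the 'issuer' field returned by SSLSocket.getpeercert() into 'key=value, ...'."""
    parts = []
    for rdn in cert.get("issuer", ()):
        for key, value in rdn:
            parts.append(f"{key}={value}")
    return ", ".join(parts)

def probe_tls_issuer(hostname: str, port: int = 443,
                     cafile: Optional[str] = None, timeout: float = 10.0) -> str:
    """Handshake with verification enabled and return the issuer of the server certificate.

    Raises ssl.SSLCertVerificationError if the chain is not trusted by `cafile`
    (or by the system store when `cafile` is None); that error is itself the diagnostic.
    """
    context = ssl.create_default_context(cafile=cafile)
    with socket.create_connection((hostname, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=hostname) as tls:
            return issuer_summary(tls.getpeercert())
```

If `probe_tls_issuer(domain, cafile=certifi.where())` raises a verification error but succeeds with the combined bundle, the corporate root was the missing piece. An issuer naming your organization rather than a public CA points to TLS interception.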


## Catalog and schema bootstrap (Unity Catalog)

In [50]:
import os
CATALOG_NAME = os.getenv('TARGET_CATALOG', 'lifestyle')
SCHEMA_NAME = os.getenv('TARGET_SCHEMA', 'kaggle_ingest')
TARGET_TABLE_NAME = os.getenv('TARGET_TABLE_NAME', 'people')
SAMPLE_TABLE_NAME = os.getenv('SAMPLE_TABLE_NAME', f'{TARGET_TABLE_NAME}_sample')
catalog_created = False
schema_created = False
try:
    if not globals().get('spark_ok', False):
        raise RuntimeError('Skipping catalog/schema creation: Spark not connected.')
    spark.sql(f"CREATE CATALOG IF NOT EXISTS `{CATALOG_NAME}`")
    catalog_created = True
    spark.sql(f"CREATE SCHEMA IF NOT EXISTS `{CATALOG_NAME}`.`{SCHEMA_NAME}`")
    schema_created = True
    spark.sql(f"USE `{CATALOG_NAME}`.`{SCHEMA_NAME}`")
    TARGET_TABLE_FQN = f"{CATALOG_NAME}.{SCHEMA_NAME}.{TARGET_TABLE_NAME}"
    SAMPLE_TABLE_FQN = f"{CATALOG_NAME}.{SCHEMA_NAME}.{SAMPLE_TABLE_NAME}"
    print(f"Catalog `{CATALOG_NAME}` and schema `{SCHEMA_NAME}` ready.")
except Exception as e:
    TARGET_TABLE_FQN = f'default.{TARGET_TABLE_NAME}'
    SAMPLE_TABLE_FQN = f'default.{SAMPLE_TABLE_NAME}'
    print('Catalog/schema step:', e)
    print(f'Fallback target tables will use {TARGET_TABLE_FQN}')



Catalog/schema step: [RETRIES_EXCEEDED] The maximum number of retries has been exceeded.
Fallback target tables will use default.people
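
The bootstrap cell wraps catalog and schema names in backticks inside the SQL but builds `TARGET_TABLE_FQN` without them. A small sketch of quoting every part consistently is given below. `quote_ident` and `quote_fqn` are hypothetical helpers, and they rely on the Spark SQL rule that a backtick inside a quoted identifier is escaped by doubling it.

```python
def quote_ident(name: str) -> str:
    """Backtick-quote one identifier, doubling any embedded backticks."""
    if not name:
        raise ValueError("identifier must be non-empty")
    return "`" + name.replace("`", "``") + "`"

def quote_fqn(*parts: str) -> str:
    """Join catalog, schema, and table names into a quoted fully qualified name."""
    return ".".join(quote_ident(p) for p in parts)
```

For instance, `quote_fqn(CATALOG_NAME, SCHEMA_NAME, TARGET_TABLE_NAME)` yields a name that is safe to interpolate into `DESCRIBE TABLE` or `SELECT` statements even when a value comes from an environment variable.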


In [None]:
# Guarded sample write to ensure connection works before big writes
try:
    if not globals().get('spark_ok', False):
        raise RuntimeError('Skipping sample write: Spark not connected.')
    sample_table = globals().get('SAMPLE_TABLE_FQN', 'default.people_sample')
    import pandas as pd
    sample_df = pd.DataFrame({'a': list(range(5))})
    sdf_sample = spark.createDataFrame(sample_df)
    sdf_sample.write.mode('overwrite').saveAsTable(sample_table)
    print(f'OK: sample data written -> {sample_table}')
except Exception as e:
    print('Sample write step:', e)

In [40]:
# Load full CSV and (optionally) write to Delta table
import pandas as pd, os
DO_FULL_WRITE = True  # set to False to skip writing large data during troubleshooting
if 'csv_abs_path' not in globals():
    csv_rel_path = '../data/raw/Final_data.csv'
    csv_abs_path = os.path.abspath(os.path.join(os.getcwd(), csv_rel_path))
print('Resolved CSV path:', csv_abs_path)
try:
    df = df_preview if 'df_preview' in globals() else pd.read_csv(csv_abs_path)
    print('Loaded pandas DataFrame shape:', df.shape)
    if not globals().get('spark_ok', False):
        raise RuntimeError('Skipping full write: Spark not connected.')
    target_table = globals().get('TARGET_TABLE_FQN', 'default.people')
    sdf = spark.createDataFrame(df)
    print('Spark DataFrame count:', sdf.count())
    if DO_FULL_WRITE:
        sdf.write.mode('overwrite').saveAsTable(target_table)
        print(f"OK: data written -> {target_table}")
    else:
        print('DO_FULL_WRITE is False; skipping write.')
except Exception as e:
    print('Full load/write step:', e)

Resolved CSV path: c:\Users\nnassili\Documents\life-style-mlops\data\raw\Final_data.csv
Loaded pandas DataFrame shape: (20000, 54)
Full load/write step: [RETRIES_EXCEEDED] The maximum number of retries has been exceeded.
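
Once connectivity works, the write itself may still fail: several columns in this CSV (for example `Weight (kg)` and `Session_Duration (hours)`) contain spaces and parentheses, which Delta tables reject by default unless column mapping is enabled. The sketch below shows one way to normalize names first. `sanitize_columns` is a hypothetical helper, and the lower-case snake_case convention is an assumption, not something the pipeline prescribes.

```python
import re
from typing import Dict, Iterable

def sanitize_columns(columns: Iterable[str]) -> Dict[str, str]:
    """Map original column names to lower-case names using only [0-9a-z_], kept unique."""
    mapping: Dict[str, str] = {}
    used = set()
    for original in columns:
        # Collapse every run of disallowed characters into one underscore
        clean = re.sub(r"[^0-9A-Za-z_]+", "_", str(original)).strip("_").lower() or "col"
        candidate, n = clean, 1
        while candidate in used:  # disambiguate collisions such as 'a b' and 'a-b'
            n += 1
            candidate = f"{clean}_{n}"
        used.add(candidate)
        mapping[original] = candidate
    return mapping
```

Applying `df = df.rename(columns=sanitize_columns(df.columns))` before `spark.createDataFrame(df)` would give the table Delta-friendly column names.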


## MLflow logging (optional once Spark connectivity works)

In [41]:
import mlflow, os
mlflow_ok = False
try:
    if not spark_ok:
        raise RuntimeError('Skipping MLflow logging: Spark not connected.')
    experiment_name = os.getenv('MLFLOW_EXPERIMENT_NAME', '/Shared/lifestyle-demo')
    # 'databricks' makes MLflow authenticate with DATABRICKS_HOST / DATABRICKS_TOKEN from the environment
    mlflow.set_tracking_uri('databricks')
    mlflow.set_experiment(experiment_name)
    with mlflow.start_run(run_name='connect_sanity_check'):
        mlflow.log_param('rows_loaded', int(df_preview.shape[0]) if 'df_preview' in globals() else None)
        mlflow.log_param('table_target', globals().get('TARGET_TABLE_FQN', 'default.people'))
        mlflow.log_metric('spark_connect_success', 1 if spark_ok else 0)
        mlflow_ok = True
    print(f"MLflow run logged to experiment: {experiment_name}")
except Exception as e:
    print('MLflow logging step:', e)



MLflow logging step: API request to https://dbc-935124bd-e5fd.cloud.databricks.com/api/2.0/mlflow/experiments/get-by-name failed with exception HTTPSConnectionPool(host='dbc-935124bd-e5fd.cloud.databricks.com', port=443): Max retries exceeded with url: /api/2.0/mlflow/experiments/get-by-name?experiment_name=%2FShared%2Flifestyle-demo (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1006)')))


## Run summary

In [None]:
summary = {
    'databricks_connect_ok': databricks_connect_ok if 'databricks_connect_ok' in globals() else False,
    'rest_api_ok': rest_api_ok if 'rest_api_ok' in globals() else None,
    'spark_session_error': str(spark_session_error) if 'spark_session_error' in globals() and spark_session_error else None,
    'spark_action_error': str(spark_action_error) if 'spark_action_error' in globals() and spark_action_error else None,
    'spark_ok': spark_ok if 'spark_ok' in globals() else False,
    'combined_ca_bundle': str(combined_ca_bundle) if 'combined_ca_bundle' in globals() and combined_ca_bundle else None,
    'catalog_created': globals().get('catalog_created', False),
    'schema_created': globals().get('schema_created', False),
    'target_table': globals().get('TARGET_TABLE_FQN'),
    'sample_write_attempted': bool('sdf_sample' in globals()),
    'full_write_attempted': bool('DO_FULL_WRITE' in globals() and DO_FULL_WRITE),
    'mlflow_ok': mlflow_ok if 'mlflow_ok' in globals() else False
}
print(summary)
if not summary['databricks_connect_ok']:
    print('Install/configure databricks-connect: `pip install databricks-connect` then `databricks-connect configure`.')
if summary['combined_ca_bundle'] is None:
    print('No corporate CA merged. If Spark Connect retries persist, export your proxy root certificate to PEM and set CORPORATE_CA_BUNDLE.')
if not summary['spark_ok'] or summary['spark_action_error']:
    print('Next steps:')
    if summary['spark_session_error']:
        print(' - Spark session could not be created. Confirm compute (SQL warehouse or cluster) is running and config matches.')
    if summary['spark_action_error'] and 'RETRIES_EXCEEDED' in summary['spark_action_error']:
        print(' - Spark action hit RETRIES_EXCEEDED. Merge your corporate CA (per Spark Connect TLS docs) or toggle DISABLE_SSL_VERIFY briefly for proof.')
if summary['rest_api_ok'] is False:
    print('REST API note: token/host responded with an error, but Databricks Connect can still work if Spark succeeded. Double-check permissions if REST access is needed.')

{'databricks_connect_ok': True, 'rest_api_ok': False, 'spark_session_error': None, 'spark_action_error': '[RETRIES_EXCEEDED] The maximum number of retries has been exceeded.', 'spark_ok': True, 'catalog_created': False, 'schema_created': False, 'target_table': 'default.people', 'sample_write_attempted': False, 'full_write_attempted': True, 'mlflow_ok': False}
REST API note: token/host responded with an error, but Databricks Connect can still work if Spark succeeded. Double-check permissions if REST access is needed.


In [31]:
# Verify schema and preview data (guarded)
try:
    if not globals().get('spark_ok', False):
        raise RuntimeError('Skipping verification: Spark not connected.')
    target_table = globals().get('TARGET_TABLE_FQN', 'default.people')
    spark.sql(f"DESCRIBE TABLE {target_table}").show(truncate=False)
    spark.sql(f"SELECT * FROM {target_table} LIMIT 5").show(truncate=False)
except Exception as e:
    print('Verify step:', e)

Verify step: [RETRIES_EXCEEDED] The maximum number of retries has been exceeded.
