
# Databricks

## Replicate

To replicate snapshot and incremental data of a TiDB table to Databricks:

```shell
export AWS_ACCESS_KEY_ID=<ACCESS_KEY>
export AWS_SECRET_ACCESS_KEY=<SECRET_KEY>

./tidb2dw databricks \
    --storage s3://my-demo-bucket/prefix \
    --table <database_name>.<table_name> \
    --databricks.host dbc-********-****.cloud.databricks.com \
    --databricks.endpoint 2**************4 \
    --databricks.catalog <catalog> \
    --databricks.schema <schema> \
    --databricks.token dapi******************************** \
    --databricks.credential <storage-credential>

# You can use 'SHOW STORAGE CREDENTIALS' in Databricks to check which credential names are available.
# Note that you may also need to specify these parameters:
#   --cdc.host x.x.x.x
#   --tidb.host x.x.x.x
#   --tidb.user <user>
#   --tidb.pass <pass>
# Use --help for details.
```
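If you are unsure which value to pass to `--databricks.credential`, you can list the available storage credentials from a Databricks SQL editor or notebook, as the comment above suggests:

```sql
-- List storage credentials available in the workspace;
-- pass one of the returned names to --databricks.credential.
SHOW STORAGE CREDENTIALS;
```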

## Supported DDL Operations

All DDL operations that change the table schema are supported (except index-related operations), including:

- Add column
- Drop column
- Rename column
- Drop table
- Truncate table
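For example, schema changes like the following, executed on the TiDB side, will be replicated to Databricks (a minimal sketch; the table and column names are hypothetical):

```sql
-- Hypothetical DDL statements run against TiDB; tidb2dw propagates each change.
ALTER TABLE mydb.orders ADD COLUMN note VARCHAR(255);
ALTER TABLE mydb.orders RENAME COLUMN note TO remark;
ALTER TABLE mydb.orders DROP COLUMN remark;
TRUNCATE TABLE mydb.orders;
```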

## Noteworthy

1. How to grant Databricks sufficient permissions in AWS.

2. tidb2dw relies on several key Databricks features, such as storage credentials and external tables.

3. Databricks does not support the BINARY type in external tables backed by CSV files, which tidb2dw uses. Please ensure that the table you want to replicate does not contain BINARY or VARBINARY columns (see the check after this list).

4. The type mapping from TiDB to Databricks is defined here.

5. Databricks has some limitations on modifying table schemas; for example, it does not support primary keys or foreign keys, and does not yet support default values on all kinds of storage layers.
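To verify the third point before starting replication, you can query TiDB's information schema (a minimal sketch; substitute your own database and table names):

```sql
-- Returns the BINARY/VARBINARY columns of the target table, if any;
-- the result should be empty before replicating the table with tidb2dw.
SELECT COLUMN_NAME, DATA_TYPE
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_SCHEMA = '<database_name>'
  AND TABLE_NAME = '<table_name>'
  AND DATA_TYPE IN ('binary', 'varbinary');
```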