To replicate the snapshot and incremental data of a TiDB table to Databricks:
```shell
export AWS_ACCESS_KEY_ID=<ACCESS_KEY>
export AWS_SECRET_ACCESS_KEY=<SECRET_KEY>

./tidb2dw databricks \
    --storage s3://my-demo-bucket/prefix \
    --table <database_name>.<table_name> \
    --databricks.host dbc-********-****.cloud.databricks.com \
    --databricks.endpoint 2**************4 \
    --databricks.catalog <catalog> \
    --databricks.schema <schema> \
    --databricks.token dapi******************************** \
    --databricks.credential <storage-credential>

# Run 'SHOW STORAGE CREDENTIALS' in Databricks to check which credential names are available.
# Note that you may also need to specify these parameters:
# --cdc.host x.x.x.x
# --tidb.host x.x.x.x
# --tidb.user <user>
# --tidb.pass <pass>
# Use --help for details.
```
All DDLs that change the table schema are supported (except index-related ones), including:

- Add column
- Drop column
- Rename column
- Drop table
- Truncate table
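For example, DDL statements like the following, executed against the upstream TiDB table, are replicated (database, table, and column names here are placeholders):

```sql
-- Illustrative TiDB DDL statements covered by the supported operations above
-- (my_database, my_table, and the column names are placeholders).
ALTER TABLE my_database.my_table ADD COLUMN note VARCHAR(255);
ALTER TABLE my_database.my_table RENAME COLUMN note TO remark;
ALTER TABLE my_database.my_table DROP COLUMN remark;
TRUNCATE TABLE my_database.my_table;
DROP TABLE my_database.my_table;
```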
tidb2dw relies on the following Databricks features and is subject to their limitations:

- Databricks does not support the `BINARY` type in external tables backed by the CSV files that tidb2dw uses. Make sure the table you want to replicate has no `BINARY` or `VARBINARY` columns.
- The type mapping from TiDB to Databricks is defined here.
- Databricks has some limitations on modifying table schemas: it does not support primary keys or foreign keys, and does not yet support default values in all kinds of storage layers.
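Given the `BINARY`/`VARBINARY` restriction, it can be worth checking the upstream table before starting replication. One way to do this (a sketch, with placeholder schema and table names) is to query TiDB's `INFORMATION_SCHEMA`:

```sql
-- Pre-flight check (my_database and my_table are placeholders):
-- list any BINARY/VARBINARY columns in the table you plan to replicate.
-- An empty result means the table is safe with respect to this limitation.
SELECT COLUMN_NAME, DATA_TYPE
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_SCHEMA = 'my_database'
  AND TABLE_NAME   = 'my_table'
  AND DATA_TYPE IN ('binary', 'varbinary');
```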