# Overview

Welcome to the YCSB, SQL Server, Arcion, and Databricks demo.

- Initial Setup
  - Click `Run All` ( takes about 1 min )
  - Click `View` -> `Results Only`
  - Click `View` -> `Web Terminal`, then enter `tmux attach`.  If this fails, wait about a minute and try again.
  - [Enter Arcion License and DBX Personal Access Token](#enter-arcion-license-and-dbx-personal-access-token)
  - Click `Run All` ( takes less than 1 min )
- Iterate with the following:
  - [Change YCSB Data Size](#change-ycsb-data-size)
  - [Change YCSB TPS](#change-ycsb-tps)
  - [Change Arcion Target](#change-arcion-target)
  - [Start Arcion](#start-arcion)

Use Personal Compute:
- Standard DS4_v2 (28 GB RAM), which shows about 22 GB available; use this for performance tests and demos
- Standard DS3_v2 (14 GB RAM), which shows about 8.8 GB available; this works but is a bit tight

Where the data lands in Databricks:
  - Spark (Delta Lake) uses the **Hive Metastore** catalog: 
    - Open new tab Catalog -> hive_metastore -> <your username>
    - find ycsbdense and ycsbsparse tables 
  - Lakehouse uses the **Unity Catalog**: 
    - Open new tab Catalog -> <your username> 
    - find ycsbdense and ycsbsparse tables 
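
The `<your username>` entry above appears to correspond to the value the setup cell stores in `dbx_username`, which is the current user name with `.` and `@` replaced by `_`. A minimal sketch of that substitution follows; the helper name `schema_name_for` and the example address are hypothetical, and the assumption that the catalog entry is named after this value is not confirmed by the notebook itself.

```python
import re

def schema_name_for(user: str) -> str:
    """Replace '.' and '@' with '_', as the setup cell does for dbx_username."""
    return re.sub(r"[.@]", "_", user)

# Hypothetical user name, for illustration only.
print(schema_name_for("first.last@example.com"))  # first_last_example_com
```
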

What to iterate:
- Step 1
  - Click Real-Time
  - Run just Arcion
  - Change YCSB Size
  - Watch real-time performance
- Step 2
  - Click Unity Catalog target
  - Select full replication mode
  - Run just Arcion

Note:
- When Arcion is running, bulk insert waits if not enough CPU or RAM is available 
- DBR 13 does not print output of subprocess.run
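
One way to still see subprocess output when the runtime does not show it is to capture the output and print it explicitly. The sketch below is a generic Python pattern, not something this notebook does; `run_and_print` is a hypothetical helper, and it uses the default shell rather than the `/usr/bin/bash` executable that the notebook cells pass.

```python
import subprocess

def run_and_print(command: str) -> subprocess.CompletedProcess:
    """Run a shell command, capture its output, and print it explicitly."""
    result = subprocess.run(
        command,
        shell=True,
        capture_output=True,  # collect stdout and stderr instead of inheriting them
        text=True,            # decode bytes to str
    )
    if result.stdout:
        print(result.stdout, end="")
    if result.stderr:
        print(result.stderr, end="")
    return result

result = run_and_print("echo hello")
```

The returned object also carries `returncode`, which can be checked before moving on to the next setup step.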


In [0]:
# prep python env
import subprocess
import math
import pandas as pd
import re
import ipywidgets as widgets
from ipywidgets import HBox, VBox, Label


# setup GUI elements

repl_mode = widgets.Dropdown(options=['snapshot', 'real-time', 'full'],value='snapshot',
    description='Replication:',
)
cdc_mode = widgets.Dropdown(options=['change', 'cdc'],value='change',
    description='CDC Method:',
)

snapshot_threads = widgets.BoundedIntText(value=1,min=1,max=8,
    description='Snapshot Threads:',
)

realtime_threads = widgets.BoundedIntText(value=1,min=1,max=8,
    description='Real Time Threads:',
)    

delta_threads = widgets.BoundedIntText(value=1,min=1,max=8,
    description='Delta Snapshot Threads:',
)    

dbx_destinations = widgets.Dropdown(options=['null', 'deltalake', 'unitycatalog'],value='null',
    description='Destinations:',
)
dbx_staging = widgets.Dropdown(options=['dbfs'],value='dbfs',
    description='Staging:',
)

sparse_cnt = widgets.BoundedIntText(value=1,min=1,max=100,
    description='Table Cnt:',
)
sparse_fieldcount = widgets.BoundedIntText(value=50,min=0,max=9000,
    description='# of Fields:',
)
sparse_fieldlength = widgets.BoundedIntText(value=10,min=1,max=1000,
    description='Field Len:',
)

sparse_tps = widgets.BoundedIntText(value=1,min=0,max=1000,
    description='TPS:',
)
sparse_threads = widgets.BoundedIntText(value=1,min=1,max=8,
    description='Threads:',
)
sparse_recordcount = widgets.Text(value="2K",
    description='Rec Cnt:',
)

sparse_fillpct = widgets.IntRangeSlider(value=[0,0],min=0,max=100,step=1,
    description='Fill Range:', orientation='horizontal', readout=False
)

dense_cnt = widgets.BoundedIntText(value=1,min=1,max=100,
    description='Table Cnt:',
)
dense_fieldcount = widgets.BoundedIntText(value=10,min=0,max=9000,
    description='# of Fields:',
)
dense_fieldlength = widgets.BoundedIntText(value=100,min=1,max=1000,
    description='Field Len:',
)
dense_recordcount = widgets.Text(value="1K",
    description='Rec Cnt:',
)

dense_tps = widgets.BoundedIntText(value=1,min=0,max=1000,
    description='TPS:',
)
dense_threads = widgets.BoundedIntText(value=1,min=1,max=8,
    description='Threads:',
)

dense_fillpct = widgets.IntRangeSlider(value=[0,100],min=0,max=100,step=1,
    description='Fill Range:', orientation='horizontal', readout=False
)

dbx_spark_url = widgets.Textarea(value='',
    description='Spark URL:',
)

dbx_databricks_url = widgets.Textarea(value='',
    description='Databricks URL:',
)

dbx_hostname = widgets.Textarea(value='',
    description='Hostname:',
)

dbx_username = widgets.Textarea(value='',
    description='Username:',
)

arcion_license = widgets.Textarea(value='',
    description='Arcion Lic',
)

dbx_access_token = widgets.Textarea(value='',
    description='Access Token',
)

dbx_default_catalog = widgets.Textarea(value='',
    description='Default Catalog',
)


# cluster where the notebook is running to auto populate the destinations
spark_url=""
databricks_url=""
workspaceUrl=""
username=""
try:
    cluster_id = spark.conf.get("spark.databricks.clusterUsageTags.clusterId")
    workspace_id =spark.conf.get("spark.databricks.clusterUsageTags.clusterOwnerOrgId")

    # clusterName = spark.conf.get("spark.databricks.clusterUsageTags.clusterName")

    workspaceUrl = spark.conf.get("spark.databricks.workspaceUrl") # host name

    http_path = f"sql/protocolv1/o/{workspace_id}/{cluster_id}"

    spark_url=f"jdbc:spark://{workspaceUrl}:443/default;transportMode=http;ssl=1;httpPath={http_path};AuthMech=3;UID=token;"
    databricks_url=f"jdbc:databricks://{workspaceUrl}:443/default;transportMode=http;ssl=1;httpPath={http_path};AuthMech=3;UID=token;"

except Exception:
    # spark or the cluster tags are unavailable; leave the URLs blank
    pass
dbx_spark_url.value = spark_url
dbx_databricks_url.value = databricks_url
dbx_hostname.value = workspaceUrl

try:
    username = spark.sql("SELECT current_user()").collect()[0][0]
except Exception:
    username='arcdst'
dbx_username.value = re.sub('[.@]','_',username)

try:
    dbx_default_catalog.value=spark.conf.get("spark.databricks.sql.initial.catalog.name")
except Exception:
    # no initial catalog configured; leave the field blank
    pass

# check arcion license via widget
try:
    arclicwidget=dbutils.widgets.get("Arcion License")
    if arclicwidget != "": 
        arcion_license.value=arclicwidget
except Exception:
    # no "Arcion License" notebook widget defined; enter the license in the form instead
    pass


## Enter Arcion License and DBX Personal Access Token

  - Enter `Arcion License`
  - Enter `Personal Access Token` (generate one with a **one-day** lifetime and delete it afterwards)
  - Click **Menu Bar** ->  Run -> Run All Below

Links: [Overview](#overview) | [Change YCSB Data Size](#change-ycsb-data-size) | [Change YCSB TPS](#change-ycsb-tps) | [Change Arcion Target](#change-arcion-target) | [Start Arcion](#start-arcion)
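
The `Spark URL` and `Databricks URL` fields are pre-filled by the setup cell from the cluster's workspace URL, workspace ID, and cluster ID. The sketch below repeats that string construction as a standalone function so the format is easier to inspect; `build_jdbc_urls` is a hypothetical helper, and the host name and IDs in the example are made-up placeholders.

```python
def build_jdbc_urls(workspace_url: str, workspace_id: str, cluster_id: str) -> dict:
    """Build the two JDBC URLs in the same format as the setup cell."""
    http_path = f"sql/protocolv1/o/{workspace_id}/{cluster_id}"
    tail = (
        f"{workspace_url}:443/default;transportMode=http;ssl=1;"
        f"httpPath={http_path};AuthMech=3;UID=token;"
    )
    return {
        "spark": f"jdbc:spark://{tail}",
        "databricks": f"jdbc:databricks://{tail}",
    }

# Placeholder values, for illustration only.
urls = build_jdbc_urls(
    "adb-1234567890123456.7.azuredatabricks.net",
    "1234567890123456",
    "0101-000000-abcdefgh",
)
print(urls["spark"])
print(urls["databricks"])
```

If the fields are empty when the form is displayed, the notebook could not read these values from the cluster and they need to be entered by hand.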

In [0]:
# enter license and DBX personal access token
VBox([HBox([Label('License'), arcion_license, dbx_access_token, dbx_default_catalog]),
      HBox([Label('Workspace'), dbx_spark_url, dbx_databricks_url, dbx_hostname, dbx_username]),
       ])

## Setup Arcion, YCSB and SQL Server

In [0]:
# setup tmux, arcion, ycsb, sql server
subprocess.run(f""". ./bin/setup-tmux.sh; setup_tmux '{dbx_username.value}'""",shell=True,executable="/usr/bin/bash")
subprocess.run(f"""bin/install-sqlserver.sh""",shell=True,executable="/usr/bin/bash")
subprocess.run(f"""bin/download-jars.sh""",shell=True,executable="/usr/bin/bash")
subprocess.run(f"""ARCION_LICENSE='{arcion_license.value}' bin/install-arcion.sh""",shell=True,executable="/usr/bin/bash")
subprocess.run(f"""bin/install-ycsb.sh""",shell=True,executable="/usr/bin/bash")
subprocess.run(f""". ./demo/sqlserver/run-ycsb-sqlserver-source.sh; ping_sql_cli;""",shell=True,executable="/usr/bin/bash")
subprocess.run(f""". ./demo/sqlserver/run-ycsb-sqlserver-source.sh; create_user;""",shell=True,executable="/usr/bin/bash")
subprocess.run(f""". ./demo/sqlserver/run-ycsb-sqlserver-source.sh; set_sqlserver_ram '{dbx_username.value}';""",shell=True,executable="/usr/bin/bash")

# Customize schema and size

If the `Fill Range` is unchanged, additional rows are appended to the existing tables.  
Increase `Table Cnt` to create additional tables.  

The following options are available:
- Table count (Table Cnt): The number of tables to create.  
  - Table names are `ycsbdense`, `ycsbdense2`, `ycsbdense3`, ... and `ycsbsparse`, `ycsbsparse2`, `ycsbsparse3`, ...
- Number of Fields (# of Fields): The number of fields per table.  
  - The field names are `FIELD0`, `FIELD1`, `FIELD2`, ...
- Field Length (Field Len): The length of random character data populated per field.  
- Record Count (Rec Cnt): The number of records generated per table.
  - Note the use of `K`,`M`,`B` ... suffix at the end.
- Fill Range: The relative start and end of the range of fields that are populated with data.  By default: 
    - sparse tables are all NULLs because the fill range is 0% to 0%
    - dense tables have all fields populated because the fill range is 0% to 100%

```sql
[localhost][arcsrc] 1> \describe ycsbsparse
+-------------+-------------+-----------+-------------+----------------+-------------+
| TABLE_SCHEM | COLUMN_NAME | TYPE_NAME | COLUMN_SIZE | DECIMAL_DIGITS | IS_NULLABLE |
+-------------+-------------+-----------+-------------+----------------+-------------+
| dbo         | YCSB_KEY    | int       |          10 |              0 | NO          |
| dbo         | FIELD0      | text      |  2147483647 |         [NULL] | YES         |
| dbo         | FIELD1      | text      |  2147483647 |         [NULL] | YES         |
```
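
The table naming and the fill range described above can be sketched in Python. The arithmetic in `fill_bounds` mirrors how the load cell computes `y_fillstart` and `y_fillend` from the slider percentages; how the shell script then interprets those two bounds (inclusive or exclusive) is not shown in this notebook. Both function names are hypothetical.

```python
import math

def table_names(prefix: str, count: int) -> list:
    """The first table has no suffix; later tables are numbered from 2."""
    return [prefix if i == 1 else f"{prefix}{i}" for i in range(1, count + 1)]

def fill_bounds(fill_pct: tuple, field_count: int) -> tuple:
    """Convert a (start %, end %) fill range into field index bounds."""
    start = math.ceil(fill_pct[0] * field_count / 100)
    end = int(fill_pct[1] * field_count / 100)
    return start, end

print(table_names("ycsbdense", 3))  # ['ycsbdense', 'ycsbdense2', 'ycsbdense3']
print(fill_bounds((0, 0), 50))      # (0, 0)   sparse default: nothing filled
print(fill_bounds((0, 100), 10))    # (0, 10)  dense default: all fields filled
print(fill_bounds((25, 75), 10))    # (3, 7)
```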

## Change YCSB Data Size
Make changes below and click `Run All Below`.  
Links: [Overview](#overview) | [Change YCSB Data Size](#change-ycsb-data-size) | [Change YCSB TPS](#change-ycsb-tps) | [Change Arcion Target](#change-arcion-target)
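
The `Rec Cnt` fields are free text (the defaults are `2K` and `1K`) and are passed to the shell script unchanged. To estimate the resulting row counts in Python, a sketch like the following may help. It assumes decimal multipliers (`K` = 1,000, `M` = 1,000,000, `B` = 1,000,000,000), which the notebook does not confirm; `parse_count` is a hypothetical helper and is not used by the demo scripts.

```python
import re

# Assumed multipliers; the demo's shell script may interpret suffixes differently.
_MULTIPLIERS = {"": 1, "K": 1_000, "M": 1_000_000, "B": 1_000_000_000}

def parse_count(text: str) -> int:
    """Parse a count such as '2K' into an integer under the assumed multipliers."""
    match = re.fullmatch(r"\s*(\d+)\s*([KMB]?)\s*", text.upper())
    if match is None:
        raise ValueError(f"unrecognized record count: {text!r}")
    return int(match.group(1)) * _MULTIPLIERS[match.group(2)]

print(parse_count("2K"))  # 2000
print(parse_count("1K"))  # 1000
```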

In [0]:
# show YCSB Data Controls
VBox([HBox([Label('Sparse'), sparse_cnt, sparse_fieldcount, sparse_fieldlength, sparse_recordcount, sparse_fillpct]),
    HBox([Label('Dense'),  dense_cnt, dense_fieldcount, dense_fieldlength, dense_recordcount, dense_fillpct])])

In [0]:
# run load_sparse_data_cnt and load_dense_data_cnt 
subprocess.run(f""". ./demo/sqlserver/run-ycsb-sqlserver-source.sh; 
    y_fieldcount={sparse_fieldcount.value} 
    y_fieldlength={sparse_fieldlength.value}  
    y_recordcount={sparse_recordcount.value} 
    y_fillstart={math.ceil((sparse_fillpct.value[0] * sparse_fieldcount.value) / 100)}      
    y_fillend={int((sparse_fillpct.value[1] * sparse_fieldcount.value) / 100)}      
    load_sparse_data_cnt {sparse_cnt.value};
    y_fieldcount={dense_fieldcount.value} 
    y_fieldlength={dense_fieldlength.value} 
    y_recordcount={dense_recordcount.value} 
    y_fillstart={math.ceil((dense_fillpct.value[0] * dense_fieldcount.value) / 100)}      
    y_fillend={int((dense_fillpct.value[1] * dense_fieldcount.value) / 100)}      
    load_dense_data_cnt {dense_cnt.value};
    dump_schema;
    list_table_counts""",
    shell=True,executable="/usr/bin/bash") 


In [0]:
# show loaded tables
pd.read_csv ('/tmp/list_table_counts.csv',header=None, names= ['table name','min key','max key','field count'])

# Run YCSB and Arcion in the background

## Start/Restart YCSB workload at 1 TPS

Choose the options in the UI and run the cell below it to start the workload (YCSB).  


The YCSB update workload (workload A) has separate controls for the Dense and Sparse table groups.  All tables in a group share that group's settings.  
1. Each table's TPS (target transactions per second)
   1. 0=fast as possible
   2. 1=1 TPS
   3. 10=10 TPS
2. Each table's threads (concurrency) used to achieve the desired TPS.
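
To see how the two controls interact, the sketch below computes the pacing each thread would need to hit a target rate. It assumes the target is split evenly across threads, which is a common throttling scheme but is not confirmed by this notebook; `per_thread_interval_ms` is a hypothetical helper.

```python
from typing import Optional

def per_thread_interval_ms(target_tps: int, threads: int) -> Optional[float]:
    """Milliseconds between operations on each thread, or None when unthrottled."""
    if target_tps == 0:
        return None  # 0 means run as fast as possible
    per_thread_tps = target_tps / threads  # assumed even split across threads
    return 1000.0 / per_thread_tps

print(per_thread_interval_ms(0, 1))   # None
print(per_thread_interval_ms(1, 1))   # 1000.0
print(per_thread_interval_ms(10, 2))  # 200.0
```

Under this assumption, adding threads does not raise a nonzero target; it only lowers the rate each thread has to sustain.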

## Change YCSB TPS 
Make changes below and click `Run All Below`.  
Links: [Overview](#overview) | [Change YCSB Data Size](#change-ycsb-data-size) | [Change YCSB TPS](#change-ycsb-tps) | [Change Arcion Target](#change-arcion-target) | [Start Arcion](#start-arcion)

In [0]:
# show YCSB run controls
VBox([HBox([Label('Sparse'), sparse_tps, sparse_threads]), HBox([Label('Dense'),  dense_tps, dense_threads])])

In [0]:
# start/restart YCSB run
subprocess.run(f""". ./demo/sqlserver/run-ycsb-sqlserver-source.sh; 
    kill_ycsb;
    y_target_sparse={sparse_tps.value} 
    y_target_dense={dense_tps.value} 
    y_threads_sparse={sparse_threads.value} 
    y_threads_dense={dense_threads.value} 
    y_fieldlength_sparse={sparse_fieldlength.value} 
    y_fieldlength_dense={dense_fieldlength.value} 
    start_ycsb;""",
    shell=True,executable="/usr/bin/bash")

## About Arcion

Choose the options in the UI and run the cell below it to start the replication.  

The following controls are available in the demo:  
- Arcion: replication type and CDC method  
- Threads: the degree of parallelism
- Target: null, Unity Catalog, or Delta Lake

NOTE: Full mode does not work at this time.

For SQL Server, both change tracking and CDC are available in this demo.  

Performance is mainly controlled by the thread counts of the extract and apply processes.
Additional settings can be customized by editing the YAML files below directly.
- [CDC YAML files](./demo/sqlserver/yaml/cdc/)
- [Change Tracking YAML files](./demo/sqlserver/yaml/change/)

## Change Arcion Target
Make changes below and click `Run All Below`.  
Links: [Overview](#overview) | [Change YCSB Data Size](#change-ycsb-data-size) | [Change YCSB TPS](#change-ycsb-tps) | [Change Arcion Target](#change-arcion-target) | [Start Arcion](#start-arcion)

In [0]:
# show Arcion and DBX controls

VBox([
      HBox([Label('Arcion'), repl_mode, cdc_mode]),
      HBox([Label('Target'), dbx_destinations, dbx_staging]),
      HBox([Label('Threads'), snapshot_threads, realtime_threads, delta_threads]),
      ])

## Start Arcion
Click `Run All Below` and watch the progress in `tmux`.  
Links: [Overview](#overview) | [Change YCSB Data Size](#change-ycsb-data-size) | [Change YCSB TPS](#change-ycsb-tps) | [Change Arcion Target](#change-arcion-target) | [Start Arcion](#start-arcion)
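
The start cell below passes widget values to the shell as single-quoted variable assignments, so a value that itself contains a single quote would break the command. One possible hardening, shown here only as a sketch and not as what the notebook does, is to quote each value with `shlex.quote`. The helper name `shell_assignments` and the token value are hypothetical.

```python
import shlex

def shell_assignments(settings: dict) -> str:
    """Render NAME=value lines with each value safely quoted for the shell."""
    return "\n".join(
        f"{name}={shlex.quote(str(value))}" for name, value in settings.items()
    )

# Placeholder values, for illustration only.
block = shell_assignments({
    "DSTDB_TYPE": "unitycatalog",
    "DBX_ACCESS_TOKEN": "it's-a-placeholder",
})
print(block)
```

The resulting text could be placed in the command string where the hand-written assignments are now, ahead of the `start_..._arcion` call.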

In [0]:
# start/restart Arcion
print (f"""{cdc_mode.value} {repl_mode.value}""")
subprocess.run(f""". ./demo/sqlserver/run-ycsb-sqlserver-source.sh; 
    echo $PROG_DIR;
    cd $PROG_DIR;
    kill_arcion;
    a_repltype='{repl_mode.value}'
    SRCDB_SNAPSHOT_THREADS='{snapshot_threads.value}' 
    SRCDB_REALTIME_THREADS='{realtime_threads.value}' 
    SRCDB_DELTA='{delta_threads.value}'
    DSTDB_TYPE='{dbx_destinations.value}'
    DSTDB_STAGE='{dbx_staging.value}'
    DBX_SPARK_URL='{dbx_spark_url.value}'
    DBX_DATABRICKS_URL='{dbx_databricks_url.value}'
    DBX_ACCESS_TOKEN='{dbx_access_token.value}'
    DBX_HOSTNAME='{dbx_hostname.value}'
    DBX_DBFS_ROOT='/{dbx_username.value}'
    DBX_USERNAME='{dbx_username.value}'
    start_{cdc_mode.value}_arcion;""",
    shell=True,executable="/usr/bin/bash")
