## BigQuery Spark Connector

In [None]:
%%configure -f 
{
    "conf": {
        "spark.jars": "<<<PATH_SPARK_BQ_JAR>>>"
    }
}  

## BQ Sync Python Package
If you are not using an environment that already includes the BQ Sync package, install the package at runtime.

<strong>Please note: if you schedule this notebook to run from a pipeline, you must pass the <code>_inlineInstallationEnabled</code> parameter (set to <code>True</code>) to the pipeline to enable pip install support.</strong>

In [None]:
%pip install "FabricSync>=<<<VERSION>>>" --quiet --disable-pip-version-check

### BQ Spark Connector Optimizations

In [None]:
spark.conf.set("readSessionCacheDurationMins", "1")
spark.conf.set("preferredMinParallelism", 600)  # Varies with the size of the data and of the compute environment
spark.conf.set("responseCompressionCodec", "RESPONSE_COMPRESSION_CODEC_LZ4")
spark.conf.set("bqChannelPoolSize", 80)  # Match the number of executor cores for your configuration

# For large data loads, set both HTTP timeouts to 0 (no timeout) so the BQ connection does not time out
spark.conf.set("httpConnectTimeout", 0)
spark.conf.set("httpReadTimeout", 0)
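
The hard-coded pool size above can also be derived from the Spark configuration. The following is a minimal sketch, assuming the comment refers to the per-executor `spark.executor.cores` setting. The helper name `executor_cores_from_conf` is hypothetical and not part of the connector or of FabricSync.

```python
from typing import Callable


def executor_cores_from_conf(get_conf: Callable[[str, str], str], default: int = 1) -> int:
    """Return the configured cores per executor.

    get_conf is any callable with the shape get(key, default), for example
    spark.conf.get in a notebook session. Falls back to `default` when the
    key is not set.
    """
    # Assumption: "executor cores" means the per-executor spark.executor.cores value.
    return int(get_conf("spark.executor.cores", str(default)))
```

In a notebook session this could be used as `spark.conf.set("bqChannelPoolSize", executor_cores_from_conf(spark.conf.get))`. If the total across all executors is what you intend, multiply the result by your executor count.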

# Config
The setup process creates a minimal config file based on the parameters provided.

You can update the config file at any time and upload it manually to an alternate path.

Note: If you upload to a OneLake destination, it must be in the default Lakehouse and the <code>config_json_path</code> should point to the File API path (example: <code>/lakehouse/default/Files/myconfigfile.json</code>).
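
As a quick sanity check on the note above, the sketch below tests whether a path sits under the default Lakehouse File API mount. It only inspects the path string and does not touch the file system. The helper name `is_default_lakehouse_path` is hypothetical and not part of FabricSync.

```python
from pathlib import PurePosixPath

# File API mount for the default Lakehouse, taken from the example path in the note above.
DEFAULT_LAKEHOUSE_FILES = PurePosixPath("/lakehouse/default/Files")


def is_default_lakehouse_path(path: str) -> bool:
    """Return True if `path` points to something below /lakehouse/default/Files."""
    # A path qualifies when the mount directory is one of its parent directories.
    return DEFAULT_LAKEHOUSE_FILES in PurePosixPath(path).parents


print(is_default_lakehouse_path("/lakehouse/default/Files/myconfigfile.json"))  # True
```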

In [None]:
# Run the imports first: the settings in the next cell reference SyncScheduleType.
from FabricSync.BQ.Sync import BQSync
from FabricSync.BQ.Enum import SyncScheduleType

In [None]:
config_json_path = "<<<PATH_TO_USER_CONFIG>>>"

schedule_type = SyncScheduleType.AUTO
optimize_metadata = False
credential_provider = notebookutils.credentials

# Running BQ Sync

In [None]:
bq_sync = BQSync(config_json_path, credential_provider)
bq_sync.sync_metadata()

Before you continue, carefully evaluate your config for correctness.

Once you run the next step, your load configuration is locked and cannot be changed without manually resetting the sync metadata and the synced data.
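
One way to review the config before it is locked is to load the JSON file and print it in a readable form. This is a minimal sketch using only the Python standard library. The helper name `preview_config` is hypothetical and not part of FabricSync, and it makes no assumptions about which keys the config contains.

```python
import json


def preview_config(path: str) -> dict:
    """Load a JSON config file, print it with indentation, and return it."""
    # json.load raises an error if the file is missing or is not valid JSON,
    # which surfaces a broken config before the schedule runs.
    with open(path, "r", encoding="utf-8") as f:
        config = json.load(f)
    print(json.dumps(config, indent=2, sort_keys=True))
    return config
```

Calling `preview_config(config_json_path)` in its own cell before the next step prints the file that <code>config_json_path</code> points to, so you can check it by eye.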

In [None]:
bq_sync.run_schedule(schedule_type=schedule_type, sync_metadata=False, optimize_metadata=optimize_metadata)