Spark JDBC Profiler is a collection of utility functions for profiling source databases over Spark JDBC connections.
pip install spark_jdbc_profiler
from spark_jdbc_profiler.whole_db_profiler.mysql_db_profiler import *
from spark_jdbc_profiler.segmentation_profiler.segmentation_gen import *
jdbcUsername = "test_user"
jdbcPassword = "test_pass"
jdbcHostname = "mariadb"
jdbcPort = "3306"
jdbcDatabase = "test"
jdbcUrl = f"jdbc:mysql://{jdbcHostname}:{jdbcPort}/{jdbcDatabase}?zeroDateTimeBehavior=ROUND"
connectionProperties = {"user": jdbcUsername, "password": jdbcPassword}
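# `spark` must be an existing SparkSession with a MySQL-compatible JDBC driver on its classpath.
df = profile_whole_db(spark, jdbcUrl, connectionProperties)
# Print up to 20 rows of the returned DataFrame.
df.show(n=20)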
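The connection settings above are hardcoded for a local test database. As a minimal sketch, assuming you would rather build the URL from parts and read credentials from the environment, the same `jdbcUrl` and `connectionProperties` values could be produced as follows. The helpers `build_jdbc_url` and `build_connection_properties` and the environment variable names `JDBC_USER` and `JDBC_PASSWORD` are hypothetical and are not part of spark_jdbc_profiler.

```python
import os
from urllib.parse import urlencode


def build_jdbc_url(hostname, port, database, **options):
    """Return a jdbc:mysql URL; keyword options become URL query parameters.

    Hypothetical helper for illustration, not part of spark_jdbc_profiler.
    """
    base = f"jdbc:mysql://{hostname}:{port}/{database}"
    return f"{base}?{urlencode(options)}" if options else base


def build_connection_properties(default_user="test_user", default_password="test_pass"):
    """Read credentials from the environment, falling back to the test values.

    JDBC_USER and JDBC_PASSWORD are assumed variable names.
    """
    return {
        "user": os.environ.get("JDBC_USER", default_user),
        "password": os.environ.get("JDBC_PASSWORD", default_password),
    }


# Same values as the hardcoded example, assembled from parts.
jdbcUrl = build_jdbc_url("mariadb", "3306", "test", zeroDateTimeBehavior="ROUND")
connectionProperties = build_connection_properties()
```

The resulting `jdbcUrl` and `connectionProperties` can be passed to `profile_whole_db` exactly as in the example above. Keeping credentials out of the source makes it easier to share notebooks and scripts safely.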
To contribute, read the CONTRIBUTING.md file.