
hgbink/spark-jdbc-profiler


Spark JDBC Profiler


Spark JDBC Profiler is a collection of utility functions for profiling source databases over Spark JDBC connections.

Install it from PyPI

pip install spark_jdbc_profiler

Usage

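from pyspark.sql import SparkSession

from spark_jdbc_profiler.whole_db_profiler.mysql_db_profiler import profile_whole_db
# Segmentation helpers; not used in this example.
from spark_jdbc_profiler.segmentation_profiler.segmentation_gen import *

# The MySQL JDBC driver must be available on the Spark classpath.
spark = SparkSession.builder.appName("spark-jdbc-profiler").getOrCreate()

jdbcUsername = "test_user"
jdbcPassword = "test_pass"
jdbcHostname = "mariadb"
jdbcPort = "3306"
jdbcDatabase = "test"

# zeroDateTimeBehavior=ROUND tells the driver to round zero dates ("0000-00-00") to 0001-01-01 instead of raising an error.
jdbcUrl = f"jdbc:mysql://{jdbcHostname}:{jdbcPort}/{jdbcDatabase}?zeroDateTimeBehavior=ROUND"
connectionProperties = {"user": jdbcUsername, "password": jdbcPassword}

# Profile the whole database and display the first 20 rows of the resulting DataFrame.
df = profile_whole_db(spark, jdbcUrl, connectionProperties)
df.show(n=20)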
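The JDBC URL and connection properties in the usage example are built by hand. When several databases are profiled, you could generate them with a small helper instead. The sketch below is a hypothetical helper and is not part of spark_jdbc_profiler. The name `build_mysql_jdbc_config` and its parameters are assumptions made for illustration. It only reproduces the `jdbcUrl` and `connectionProperties` values shown above.

```python
from urllib.parse import urlencode


def build_mysql_jdbc_config(host, port, database, user, password, **url_params):
    """Return (jdbc_url, connection_properties) for a MySQL/MariaDB source.

    Hypothetical helper for illustration; not part of spark_jdbc_profiler.
    Extra keyword arguments become JDBC URL query parameters,
    e.g. zeroDateTimeBehavior="ROUND".
    """
    url = f"jdbc:mysql://{host}:{port}/{database}"
    if url_params:
        # urlencode percent-escapes the values and joins the pairs with "&".
        url += "?" + urlencode(url_params)
    # Same shape as the connectionProperties dict passed to profile_whole_db.
    properties = {"user": user, "password": password}
    return url, properties


jdbcUrl, connectionProperties = build_mysql_jdbc_config(
    "mariadb", 3306, "test", "test_user", "test_pass",
    zeroDateTimeBehavior="ROUND",
)
print(jdbcUrl)  # jdbc:mysql://mariadb:3306/test?zeroDateTimeBehavior=ROUND
```

The returned pair can then be passed straight to `profile_whole_db(spark, jdbcUrl, connectionProperties)`, just as in the usage example.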

Development

Read the CONTRIBUTING.md file.
