Spark JDBC Profiler

Spark JDBC Profiler is a collection of utility functions for profiling source databases over Spark JDBC connections.

Install it from PyPI

pip install spark_jdbc_profiler

Usage

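from pyspark.sql import SparkSession

from spark_jdbc_profiler.whole_db_profiler.mysql_db_profiler import *
from spark_jdbc_profiler.segmentation_profiler.segmentation_gen import *

# Reuse the active Spark session, or start one if none exists.
# The MySQL JDBC driver must be available on the Spark classpath.
spark = SparkSession.builder.getOrCreate()

# Connection details for the source database.
jdbcUsername = "test_user"
jdbcPassword = "test_pass"
jdbcHostname = "mariadb"
jdbcPort = "3306"
jdbcDatabase = "test"

# zeroDateTimeBehavior=ROUND makes the driver round zero dates such as '0000-00-00' to a valid date instead of raising an error.
jdbcUrl = f"jdbc:mysql://{jdbcHostname}:{jdbcPort}/{jdbcDatabase}?zeroDateTimeBehavior=ROUND"
connectionProperties = {"user": jdbcUsername, "password": jdbcPassword}

# Profile the database and print up to 20 rows of the result.
df = profile_whole_db(spark, jdbcUrl, connectionProperties)
df.show(n=20)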
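
The JDBC URL and connection properties above are assembled by hand. As a minimal sketch, the same values could be built with small helpers so that extra driver options get URL-encoded consistently. The helper names build_mysql_jdbc_url and build_connection_properties are hypothetical and are not part of spark_jdbc_profiler; only the Python standard library is used.

```python
from urllib.parse import urlencode


def build_mysql_jdbc_url(hostname, port, database, **params):
    """Return a MySQL JDBC URL (hypothetical helper, not part of the package).

    Any keyword arguments are appended as URL-encoded query parameters.
    """
    url = f"jdbc:mysql://{hostname}:{port}/{database}"
    if params:
        url += "?" + urlencode(params)
    return url


def build_connection_properties(user, password):
    """Return the properties dict in the shape used in the usage example."""
    return {"user": user, "password": password}


jdbc_url = build_mysql_jdbc_url(
    "mariadb", "3306", "test", zeroDateTimeBehavior="ROUND"
)
connection_properties = build_connection_properties("test_user", "test_pass")

print(jdbc_url)
```

The resulting jdbc_url and connection_properties can be passed to profile_whole_db in place of the hand-written jdbcUrl and connectionProperties.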

Development

Read the CONTRIBUTING.md file.
