A python ETL libRary (SPETLR) for Databricks powered by Apache SPark.
Visit SPETLR official webpage: https://spetlr.com/
Start supporting DBR LTS 14.3: follow the PR.
- Cluster test submission with spetlr-tools
- Upgrade to Python 3.10
- The spetlr library (probably except SQL connection with ODBC) still supports older LTS versions between 9.1 and 13.3, but only 14.3 is tested.
- SQL ODBC driver version 18 is supported (this is a breaking change if you haven't upgraded your ODBC driver).
- Newest CosmosDB connector is tested for compatibility with DBR LTS 14.3.
- Description
- Important Notes
- Installation
- Development Notes
- Testing
- General Project Info
- Contributing
- Build Status
- Releases
- Requirements
- Contact
SPETLR has a lot of great tools for working with ETL in Databricks. To help you decide whether SPETLR fits your needs, here is a list of the core features:
- ETL framework: A common ETL framework that enables reusable transformations in an object-oriented manner. Standardized structures facilitate cooperation in large teams.
- Integration testing: A framework for creating test databases and tables before deploying to production in order to ensure reliable and stable data platforms. An additional layer of data abstraction allows full integration testing.
- Handlers: Standard connectors with commonly used options reduce boilerplate.
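To illustrate what "reusable transformations in an object-oriented manner" can look like, here is a minimal plain-Python sketch of the general extract/transform/load pattern. It is not the SPETLR API: every class name below (Step, ListExtractor, UppercaseName, ListLoader, Pipeline) is hypothetical and chosen only for this example, and it works on plain lists instead of Spark DataFrames so that it runs without pyspark.

```python
from abc import ABC, abstractmethod
from typing import Any, List


class Step(ABC):
    """One reusable ETL step (hypothetical base class, not part of SPETLR)."""

    @abstractmethod
    def run(self, data: Any) -> Any:
        ...


class ListExtractor(Step):
    """Extract step: ignores its input and emits a fixed list of rows."""

    def __init__(self, rows: List[dict]):
        self.rows = rows

    def run(self, data: Any = None) -> List[dict]:
        return list(self.rows)


class UppercaseName(Step):
    """Transform step: returns new rows with the 'name' field uppercased."""

    def run(self, data: List[dict]) -> List[dict]:
        return [{**row, "name": row["name"].upper()} for row in data]


class ListLoader(Step):
    """Load step: appends the rows to an in-memory target."""

    def __init__(self, target: List[dict]):
        self.target = target

    def run(self, data: List[dict]) -> List[dict]:
        self.target.extend(data)
        return data


class Pipeline:
    """Runs steps in order, feeding each step's output to the next."""

    def __init__(self, *steps: Step):
        self.steps = steps

    def execute(self) -> Any:
        data = None
        for step in self.steps:
            data = step.run(data)
        return data


target: List[dict] = []
Pipeline(
    ListExtractor([{"id": 1, "name": "ada"}]),
    UppercaseName(),
    ListLoader(target),
).execute()
```

Because each step only depends on the shared run interface, a transformation such as UppercaseName can be reused in several pipelines and tested on its own.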
For more information, visit SPETLR official webpage: https://spetlr.com/
This package cannot be run or tested without access to pyspark. However, installing pyspark as part of our installer caused issues when other versions of pyspark were needed. Hence we removed the dependency from our installer.
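Since pyspark is not installed together with the package, it can be useful to check for it up front. The following is a minimal sketch using only the standard library; the helper names pyspark_available and require_pyspark are made up for this example and are not part of SPETLR.

```python
import importlib.util


def pyspark_available() -> bool:
    """Return True if a pyspark package can be found on the import path."""
    return importlib.util.find_spec("pyspark") is not None


def require_pyspark() -> None:
    """Raise a descriptive error when pyspark is missing."""
    if not pyspark_available():
        raise ImportError(
            "pyspark was not found. Install a pyspark version that matches "
            "your environment, or run on a Databricks cluster."
        )
```

Calling require_pyspark() at the start of a script gives a clear message instead of a failure deep inside an import chain.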
pip install spetlr
To prepare for development, please install these additional requirements:
- Java 8
pip install -r test_requirements.txt
Then install the package locally:
python setup.py develop
After installing the dev-requirements, execute tests by running:
pytest tests
These tests are located in the ./tests/local folder and only require a Python interpreter. Pull requests will not be accepted if these tests do not pass. If you add new features, please include corresponding tests.
Tests in the ./tests/cluster folder are designed to run on a Databricks cluster.
The Pre-integration Test utilizes Azure resource deployment and can only be run by the spetlr-org admins.
To deploy the necessary Azure resources to your own Azure Tenant, run the following command:
.\.github\deploy\deploy.ps1 -uniqueRunId "yourUniqueId"
Be aware that the value of uniqueRunId should contain only lowercase letters and numbers, and its length should not exceed 12 characters.
Afterward, execute the following commands:
.\.github\submit\build.ps1
.\.github\submit\submit_test_job.ps1
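The naming rule for uniqueRunId can be checked before starting a deployment. This is a small sketch, assuming that "lowercase and numbers" means the ASCII characters a-z and 0-9 and that an empty value is not allowed; the function name is_valid_unique_run_id is hypothetical and not part of the deployment scripts.

```python
import re

# Assumption: only ASCII a-z and 0-9 are allowed, with 1 to 12 characters.
_RUN_ID_PATTERN = re.compile(r"[a-z0-9]{1,12}")


def is_valid_unique_run_id(run_id: str) -> bool:
    """Return True if run_id satisfies the documented uniqueRunId rule."""
    return _RUN_ID_PATTERN.fullmatch(run_id) is not None
```

For example, is_valid_unique_run_id("myrun01") is accepted, while "MyRun" (uppercase letters) and "abcdefghijklm" (13 characters) are rejected.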
Feel free to contribute to SPETLR. Contributions of any kind are appreciated: not only new features, but also any improvement to SPETLR.
If you have a suggestion that can enhance SPETLR, please fork the repository and create a pull request. Alternatively, you can open an issue with the "enhancement" tag.
- Fork the Project
- Create your Feature Branch (git checkout -b feature/NewSPETLRFeature)
- Commit your Changes (git commit -m 'Add some SPETLRFeature')
- Push to the Branch (git push origin feature/NewSPETLRFeature)
- Open a Pull Request
Releases to PyPI are made by a GitHub Action that needs to be triggered manually.
The library has three txt-files at the root of the repo. These files define three levels of requirements:
- requirements_install.txt - libraries required to install spetlr
- requirements_test.txt - libraries required to run unit and integration tests
- requirements_dev.txt - libraries required in the development process in order to contribute to the repo
All libraries and their dependencies are added with a fixed version to the configuration file setup.cfg using the defined requirements from requirements_install.txt.
To upgrade the dependencies in the setup.cfg file, do the following:
- Create a new branch
- Run upgrade_requirements.ps1 in your terminal
- Commit the changes the script has made to the cfg file. If there are no changes, everything is up to date.
- The PR runs all tests and ensures that the library is compliant with any updates
Note that if you want to upgrade a dependency, but not to its newest version, you can set the desired version in requirements_install.txt; the upgrade script will respect it.
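To see at a glance which entries in a requirements file carry an explicit version, a short script can list them. This is a sketch only: it assumes pins are written with the standard pip == specifier, it does not reproduce what upgrade_requirements.ps1 does internally, and the package names in the example are placeholders.

```python
import re
from typing import Dict, Iterable

# Matches lines such as "examplepkg==1.2.3" (name, "==", version).
_PINNED = re.compile(r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)\s*==\s*([^\s;#]+)")


def pinned_versions(lines: Iterable[str]) -> Dict[str, str]:
    """Return {package: version} for every line pinned with '=='."""
    pins: Dict[str, str] = {}
    for line in lines:
        match = _PINNED.match(line)
        if match:
            pins[match.group(1)] = match.group(2)
    return pins


# Placeholder content, not the real requirements_install.txt.
example_lines = [
    "unpinnedpkg",
    "examplepkg==1.2.3",
    "# a comment",
    "otherpkg >= 2.0",
]
pins = pinned_versions(example_lines)
```

To inspect the real file, pass the open file object instead of example_lines, since iterating over a text file yields its lines.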
For any inquiries, please use the SPETLR Discord Server.