This repository performs the following:
- Generates bash scripts and SQL queries from Excel and CSV mapping documents (`generate_sql_output.py`)
- Converts mapping documents from spreadsheets to CSV and, when applicable, updates the config file with the new CSV file name (`mapping_converter.py`)
- Generates DDL scripts based on config and schema files (`generate_ddl_output.py`)
This project requires Python 3.9+. To verify that a suitable distribution of Python is installed, use the version command:
$ python --version
NOTE: macOS ships with Python 2 by default as the `python` command, and will typically also include a Python 3 distribution available as the command `python3`. If this is the case on your machine, use `python3` instead for all setup steps.
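The same requirement can also be checked from inside the interpreter, which is handy for failing fast in scripts (a minimal sketch, not part of this repository):

```python
import sys

# The project requires Python 3.9 or newer (see above).
if sys.version_info < (3, 9):
    raise SystemExit(f"Python 3.9+ required, found {sys.version.split()[0]}")
print(f"OK: running Python {sys.version_info.major}.{sys.version_info.minor}")
```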
There are several ways to host this project, including:
- Within an IDE, such as PyCharm
- In a Python isolation environment, such as virtualenv
- In a Docker container
1. Clone this repository to the target build machine.
2. Create a virtual environment for the project, then activate it.
   Linux/macOS:
   $ python -m venv venv
   $ . venv/bin/activate
   Windows:
   $ python -m venv venv
   $ .\venv\Scripts\activate.bat
3. Use pip to install all project dependencies into the virtual environment:
   $ pip install -r requirements.txt
4. Run the unit tests to verify the install:
   $ python -m unittest
To generate SQL, run the `transform_generator.generate_sql_output` module from the root of this project.
Example files are included in this project's test directory and can be used to quickly test SQL generation. The following command uses the relative paths of these files and writes all output to an output folder at the project root.
$ python -m transform_generator.generate_sql_output --config_path test/Resources/positive_cases/config --schema_path test/Resources/positive_cases/schema --mapping_sheet_path test/Resources/positive_cases/mapping --project_config_path test/Resources/positive_cases/project_config/project_config_test.csv --output_datafactory output/datafactory --output_databricks output/databricks
- `--config_path`: path to the config directory for the transform generator.
- `--schema_path`: path to the directory containing .csv files for schemas.
- `--mapping_sheet_path`: path to the directory containing mapping sheets.
- `--project_config_path`: semicolon-delimited list of paths to the project config files.
- `--output_databricks`: path to the folder where Databricks output is written. If folders in this path do not exist, they will be created.
- `--output_datafactory`: path to the folder where Data Factory output is written. If folders in this path do not exist, they will be created.
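Note that `--project_config_path` accepts several files in a single argument, separated by semicolons. Splitting such a value is a plain semicolon split; for example (illustrative file names, not from this repository):

```python
# A semicolon-delimited --project_config_path value (hypothetical paths)
project_config_path = "project_config/team_a.csv;project_config/team_b.csv"

# Split into the individual project config file paths
paths = project_config_path.split(";")
print(paths)  # → ['project_config/team_a.csv', 'project_config/team_b.csv']
```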
To generate DDL, run the `transform_generator.generate_ddl_output` module from the root of this project. The following command uses the example files in the test directory:
$ python -m transform_generator.generate_ddl_output --config_path test/Resources/positive_cases/config --schema_path test/Resources/positive_cases/schema --mapping_sheet_path test/Resources/positive_cases/mapping --project_config_path test/Resources/positive_cases/project_config/project_config_test.csv --output_datafactory output/datafactory --output_databricks output/databricks
- `--config_path`: path to the config directory for the transform generator.
- `--schema_path`: path to the directory containing .csv files for schemas.
- `--mapping_sheet_path`: path to the directory containing mapping sheets.
- `--project_config_path`: semicolon-delimited list of paths to the project config files.
- `--output_databricks`: path to the folder where Databricks output is written. If folders in this path do not exist, they will be created.
- `--output_datafactory`: path to the folder where Data Factory output is written. If folders in this path do not exist, they will be created.
Building the documentation requires an activated virtual environment with the requirements.txt dependencies installed. This is accomplished by completing the first three Setup steps.
Transformation Generator uses the static site generator MkDocs to build project documentation. Once the project has been set up in a virtual environment, the documentation can be built for local development as follows:
1. Navigate to the `documentation` directory:
   $ cd ./documentation
2. Start the development server:
   $ mkdocs serve
All documentation source files are contained within the `documentation/docs` directory and are written in Markdown. By default, the built site is available at http://localhost:8000/
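MkDocs reads its configuration from a `mkdocs.yml` file in the directory `mkdocs serve` is run from. A minimal sketch of what such a file could look like for this layout (illustrative only; the project's actual `mkdocs.yml` may differ, and the site name here is an assumption):

```yaml
# documentation/mkdocs.yml — illustrative sketch
site_name: Transformation Generator   # assumed name, not confirmed by this README
docs_dir: docs                        # Markdown sources live in documentation/docs
```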
Generating a unit test coverage file, or updating a pre-existing one, can be done with the following steps:
1. Activate the virtual environment created during setup:
   $ . venv/bin/activate
2. Run the following commands to clear old data and generate a new code coverage analysis:
   $ coverage erase
   $ coverage run -m unittest
3. View the report:
   $ coverage report
Running the webserver requires an activated virtual environment with the requirements.txt dependencies installed. This is accomplished by completing the first three Setup steps.
This application provides an API which can be hosted locally. To run it, execute the `transform_generator.api.api` module from the root of the project, within a virtual environment:
$ python -m transform_generator.api.api
The API will start listening on port 8001 by default.
Swagger documentation outlining the various endpoints offered by the API can be accessed at `/docs`.