Skip to content

sidataplus/ETL-Synthea-dbt

Repository files navigation

ETL-Synthea-dbt

Using dbt (data build tool) to convert Synthea synthetic data to OMOP Common Data Model in PostgreSQL

Requirements

  • dbt-core >= 1.0.0
  • dbt-postgres >= 1.0.0
  • sqlfluff >= 2.0.0
  • sqlfluff-templater-dbt >= 2.0.0

Set-up

  1. Install dbt with Postgres adapter, sqlfluff and sqlfluff-templater-dbt
pip install dbt-postgres sqlfluff sqlfluff-templater-dbt
  1. If the installation is successful, you should have dbt CLI accessible globally. To test run
dbt --version
  1. Go to ~/.dbt/ (user home directory, i.e., not this repo directory), create profiles.yml with the example from example.profiles.yml. Fill in database credentials, read instructions in Postgres Profile config.
    dbt can write and read to one database connection at a time. dbt is not an ETL tool, as it does just the T.

  2. Test settings & DB connection, run

dbt debug

  1. Install packages for macro
dbt deps
  1. Write code. Learn more from Official free online courses.

  2. We use SQLFluff to make our SQL code clean and consistent.

Install SQLFluff and sqlfluff-templater-dbt with

pip install sqlfluff sqlfluff-templater-dbt

To lint SQL codes, run

sqlfluff lint .

To fix linting issues, run

sqlfluff fix .

or the following to force fix

sqlfluff fix -f .
  1. To test whether the code is good on our database or not, compile the model with
dbt compile

(optional: with --models like dbt run below).

  1. To run models,
  • dbt run - regular run
  • Model selection syntax (source). Specifying models can save you a lot of time by only running/testing the models that you think are relevant. However, there is a risk that you'll forget to specify an important upstream dependency so it's a good idea to understand the syntax thoroughly:
    • dbt run --models modelname - will only run modelname
    • dbt run --models +modelname - will run modelname and all parents
    • dbt run --models modelname+ - will run modelname and all children
    • dbt run --models +modelname+ - will run modelname, and all parents and children
    • dbt run --models @modelname - will run modelname, all parents, all children, AND all parents of all children
    • dbt run --exclude modelname - will run all models except modelname
    • Note that all of these work with folder selection syntax too:
      • dbt run --models folder - will run all models in a folder
      • dbt run --models folder.subfolder - will run all models in the subfolder
      • dbt run --models +folder.subfolder - will run all models in the subfolder and all parents
  1. To run models with test, use
dbt build

(optional: with --models like dbt run above).

  1. To generate doc, run dbt docs generate, serve by dbt docs serve. Or, run them together as
dbt docs generate && dbt docs serve --port 8080

Other resources

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published