Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(#1328) add guide for creating dbt models #1394

Merged
merged 16 commits into from
Sep 9, 2024
Merged

Conversation

witash
Copy link
Contributor

@witash witash commented May 17, 2024

Closes #1328
Closes #1447

@witash witash changed the title feat(#1328) add guide fro creating dbt models feat(#1328) add guide for creating dbt models May 20, 2024
@njuguna-n
Copy link
Contributor

I'll complete my review tomorrow. Time got away from me a bit today.

Copy link
Contributor

@njuguna-n njuguna-n left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good but left some minor comments.


The url is set in the `CHT_PIPELINE_BRANCH_URL` environment variable, either in .env if using docker compose, or in values.yaml if useing kubernetes.

When models are changed, CHT Sync needs to be restarted for the change to take effect; when it restarts, it automatically downloads the latest version of cht pipeline and applies than changes; for models materialized as views, they are are updated instantly, for models materialized as incmrenetal tables, the table is dropped and will be unavailable until it is rebuilt from scratch.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
When models are changed, CHT Sync needs to be restarted for the change to take effect; when it restarts, it automatically downloads the latest version of cht pipeline and applies than changes; for models materialized as views, they are are updated instantly, for models materialized as incmrenetal tables, the table is dropped and will be unavailable until it is rebuilt from scratch.
When models are changed, CHT Sync needs to be restarted for the change to take effect; when it restarts, it automatically downloads the latest version of cht pipeline and applies the changes; for models materialized as views, they are are updated instantly, for models materialized as incremental tables, the table is dropped and will be unavailable until it is rebuilt from scratch.


The url is set in the `CHT_PIPELINE_BRANCH_URL` environment variable, either in .env if using docker compose, or in values.yaml if useing kubernetes.

When models are changed, CHT Sync needs to be restarted for the change to take effect; when it restarts, it automatically downloads the latest version of cht pipeline and applies than changes; for models materialized as views, they are are updated instantly, for models materialized as incmrenetal tables, the table is dropped and will be unavailable until it is rebuilt from scratch.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be worth it to add a sentence discouraging editing of the base models given how long it would take to rebuild the tables if there is already a lot of data in the database.

|--|--|
|`uuid`|Unique identifier of the record. Note this is both the primary key for this table, and a foreign key to the couchdb table.|
|`contact_uuid`| uuid of the `contact` who made the report|
|`parent_contact_uuid`| uuid of the parent of `contact` who made the report (usually a place)|
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having this named as contact_parent_uuid as we have it in the model currently makes it clearer.

- Models to contain aggregates that may be useful for dashboards or analysis

### Form models
for each form in the CHT app, create one model which selects from `data_record where form = 'theformyouwant'`, moves the fields from the `fields` json into columns, applies any transformations that would be convenient, and if necessary, indexes them.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
for each form in the CHT app, create one model which selects from `data_record where form = 'theformyouwant'`, moves the fields from the `fields` json into columns, applies any transformations that would be convenient, and if necessary, indexes them.
for each form in the CHT app, create one model which selects from `data_record where form = 'theformyouwant'`, moves the fields from the `fields` json into columns, applies any transformations that would be convenient, and if necessary, add indexes to them.

Copy link
Member

@andrablaj andrablaj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@witash this is an excellent start to the DBT model documentation. Thank you for the detailed steps and the example queries.

My suggestions are mainly typo and style correction.

Considering @njuguna-n's latest findings about lambda views, we will need a separate PR for those changes. I suggest:

  • fixing the current styles and typos
  • creating a separate PR to be merged into this branch with the latests changes from Njuguna
  • re-doing a review.

Co-authored-by: Andra Blaj <andra@medic.org>
Co-authored-by: njuguna-n <141340177+njuguna-n@users.noreply.github.com>
Copy link
Member

@andrablaj andrablaj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some small style changes.

@andrablaj
Copy link
Member

@njuguna-n, how do you plan to proceed with updating this doc to include the latest changes (e.g. lambda views)?

@njuguna-n
Copy link
Contributor

@andrablaj I have added a section about the base lambda view here.

njuguna-n and others added 2 commits June 6, 2024 10:47
* docs: add a description about the couchdb lambda view

* Update content/en/apps/guides/data/analytics/building-dbt-models.md

Co-authored-by: Andra Blaj <andra@medic.org>

---------

Co-authored-by: Andra Blaj <andra@medic.org>
Copy link
Contributor

@njuguna-n njuguna-n left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good so far but we need to update some sections that have changed since this was first written. I think we should also add a section describing how to handle deleted documents using FK relationships and cascade deletes.

```yml
packages:
- git: "https://github.com/medic/cht-pipeline"
revision: "1.0.0"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seeing as we do not have versioning on cht-pipeline yet we should suggest using the main branch and updating this once this issue is done.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yea the thing about using the main branch is they might get changes that they don't want or are not prepared for. Maybe we wait until that issue is done boefre merging this PR

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@witash we now have versions in cht-pipeline.


## Setup

To create application specifc models, create a new dbt project as described [here](https://docs.getdbt.com/reference/commands/init) and a Github repository (it may be public or private).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
To create application specifc models, create a new dbt project as described [here](https://docs.getdbt.com/reference/commands/init) and a Github repository (it may be public or private).
To create application specific models, create a new dbt project as described [here](https://docs.getdbt.com/reference/commands/init) and a Github repository (it may be public or private).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs to be updated to include the new tables e.g. document_metadata and update the columns in the existing tables.

Copy link
Contributor

@mrjones-plip mrjones-plip left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was wondering why we need a new GH repo for dbt and Andra pointed me to these nascent docs. I understand now!

I'm suggesting some minor re-wording for clarity

witash and others added 5 commits September 2, 2024 10:27
Co-authored-by: mrjones <8253488+mrjones-plip@users.noreply.github.com>
Co-authored-by: Andra Blaj <andra@medic.org>
Co-authored-by: njuguna-n <141340177+njuguna-n@users.noreply.github.com>
@witash
Copy link
Contributor Author

witash commented Sep 4, 2024

this now depends on medic/cht-pipeline#155 which contains the contact_type models, changes to person and place models, and macros

if there's no changes from review on that, this is ready for re-review

Copy link
Contributor

@mrjones-plip mrjones-plip left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Keeping my review very narrowly scoped, trusting others are reviewing this PR more widely. As such, my two minor feedback points were addressed - ship it!

Copy link
Member

@andrablaj andrablaj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My comments are mostly typos and formatting-related. Good job at having this done!

Co-authored-by: Andra Blaj <andra@medic.org>
Copy link
Contributor

@njuguna-n njuguna-n left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is looking very good. left a couple of additional comments.

Copy link
Contributor

@njuguna-n njuguna-n left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. Great job!

Copy link
Member

@andrablaj andrablaj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

@witash witash merged commit d903430 into main Sep 9, 2024
2 checks passed
@witash witash deleted the 1328-building-dbt-models branch September 9, 2024 06:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Update documentation for model deployment Document DBT model creation in CHT Pipeline
4 participants