Ele 826 orchestration integration #900
ELE-826 Orchestration (Airflow) integration - backend
Overall goal:
Technical details:
Concrete tasks:
👋 @NoyaArie |
IDoneShaveIt left a comment
Most of the comments are small 🙂
| return DbtInvocationSchema() | ||
| def get_invocations_by_ids( | ||
| self, macro_args: Optional[dict] = None |
Instead of accepting macro_args as a single parameter, spread it into its contents.
I know the method above does it this way, but it is not a good example (it should be changed there as well).
Spreading macro_args into explicit parameters tells the reader what those "macro_args" actually are and what their real types are (what does the dict contain? which keys should I pass to it? what are their values supposed to be?).
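A minimal sketch of what this could look like, assuming a hypothetical run_macro callable standing in for whatever runner the fetcher really uses. The macro name and the "ids" key are illustrative assumptions, not taken from the actual code.

```python
from typing import Any, Callable, Dict, List


def build_macro_args(invocation_ids: List[str]) -> Dict[str, Any]:
    """Build the macro_args dict from an explicit, typed parameter."""
    # The "ids" key is an assumption made for this illustration.
    return {"ids": list(invocation_ids)}


def get_invocations_by_ids(
    run_macro: Callable[[str, Dict[str, Any]], List[dict]],
    invocation_ids: List[str],
) -> List[dict]:
    """Callers pass the IDs directly; the dict stays an internal detail."""
    # The macro name below is hypothetical.
    return run_macro("get_invocations_by_ids", build_macro_args(invocation_ids))
```

With this shape the signature itself documents which values are expected and what their types are, so callers never need to know the dict layout.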
| def get_invocations_by_ids( | ||
| self, macro_args: Optional[dict] = None | ||
| ) -> [DbtInvocationSchema]: |
| - ) -> [DbtInvocationSchema]: |
| + ) -> List[DbtInvocationSchema]: |
And don't forget to import it from typing 🙂
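To illustrate why the bracket form is worth replacing: [DbtInvocationSchema] is an ordinary one-element list object rather than a type, and type checkers such as mypy generally reject it. The sketch below uses a stand-in Schema class (hypothetical, not the project's DbtInvocationSchema).

```python
from typing import List, get_origin


class Schema:
    """Stand-in for DbtInvocationSchema, for illustration only."""


def bad() -> [Schema]:  # the annotation is a plain list containing a class
    return []


def good() -> List[Schema]:  # the annotation is a generic alias over list
    return []


bad_hint = bad.__annotations__["return"]
good_hint = good.__annotations__["return"]

print(type(bad_hint).__name__)       # a list instance, not a type
print(get_origin(good_hint) is list)
```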
| resources_latest_invocation_dict = dict() | ||
| for result in resources_latest_invocation_results: | ||
| resources_latest_invocation_dict[result["unique_id"]] = result[ | ||
| "invocation_id" | ||
| ] | ||
| return resources_latest_invocation_dict |
| - resources_latest_invocation_dict = dict() |
| - for result in resources_latest_invocation_results: |
| - resources_latest_invocation_dict[result["unique_id"]] = result[ |
| - "invocation_id" |
| - ] |
| - return resources_latest_invocation_dict |
| + resources_latest_invocation_map = {result["unique_id"]: result["invocation_id"] for result in resources_latest_invocation_results} |
| + return resources_latest_invocation_map |
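As a quick check that the comprehension behaves like the original loop, here is a small sketch with made-up rows. The unique_id and invocation_id keys come from the diff; the values are invented for the example.

```python
from typing import Dict, List

rows: List[Dict[str, str]] = [
    {"unique_id": "model.a", "invocation_id": "inv-1"},
    {"unique_id": "model.b", "invocation_id": "inv-2"},
]

# Original loop form.
loop_map: Dict[str, str] = {}
for row in rows:
    loop_map[row["unique_id"]] = row["invocation_id"]

# Suggested comprehension form.
comprehension_map = {row["unique_id"]: row["invocation_id"] for row in rows}

print(loop_map == comprehension_map)
```

Both forms keep the last value seen when a unique_id appears more than once, so the change should not alter behavior.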
| @@ -0,0 +1,22 @@ | |||
| {% macro get_resources_latest_invocation() %} | |||
| {% set dbt_run_results = ref('dbt_run_results') %} | |||
| {%- if elementary.relation_exists(dbt_run_results) -%} | |||
I don't think there is a need to check whether this table exists.
If a user ran the elementary dbt package, they should have this table.
Also, I am not sure we want to do nothing if this table does not exist.
| {% set dbt_run_results = ref('dbt_run_results') %} | ||
| {%- if elementary.relation_exists(dbt_run_results) -%} | ||
| {% set get_resources_latest_invocation_query %} | ||
| with row_numbered_run_results as ( |
Let's give it a more meaningful name than row_numbered 🙂
Something like ordered_run_results, or another name that explains what the row_number means.
| {% endif %} | ||
| {% set get_invocations_query %} | ||
| select * from {{ invocations_relation }} where invocation_id in {{ elementary.strings_list_to_tuple(ids) }} |
Let's expand * into an explicit column list so it is easy to see which data this query returns 🙂
| {% set database, schema = elementary.target_database(), target.schema %} | ||
| {% set invocations_relation = adapter.get_relation(database, schema, 'dbt_invocations') %} | ||
| {% if not invocations_relation %} | ||
| {% do elementary.edr_log('failed getting invocations relation') %} |
Did you try running this macro when there is no invocations_relation?
We used edr_log so that we could parse the results of our macro runs using the dbt runner.
I am not sure this will give you the desired effect 🤔
| {% endset %} | ||
| {% set result = elementary.run_query(get_invocations_query) %} | ||
| {% if not result %} | ||
| {% do elementary.edr_log('no invocations were found') %} |
Same comment about edr_log here.
| else: | ||
| raise NotImplementedError | ||
| def get_invocations_by_ids(self, invocations_ids: list[str]) -> DbtInvocationSchema: |
| - def get_invocations_by_ids(self, invocations_ids: list[str]) -> DbtInvocationSchema: |
| + def get_invocations_by_ids(self, invocations_ids: List[str]) -> List[DbtInvocationSchema]: |
Why? Using typing.List is deprecated:
https://peps.python.org/pep-0585/#implementation
list # typing.List
dict # typing.Dict
...
Importing those from typing is deprecated. Due to PEP 563 and the intention to minimize the runtime impact of typing, this deprecation will not generate DeprecationWarnings. Instead, type checkers may warn about such deprecated usage when the target version of the checked program is signalled to be Python 3.9 or newer. It's recommended to allow for those warnings to be silenced on a project-wide basis.
The deprecated functionality will be removed from the typing module in the first Python version released 5 years after the release of Python 3.9.0.
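For context on the exchange above: on Python 3.9 and newer the two spellings describe the same type, which can be checked at runtime. This sketch assumes a 3.9+ interpreter; on older versions list[str] raises TypeError when evaluated, so which spelling fits may depend on the Python versions the project supports.

```python
from typing import List, get_args, get_origin

builtin_alias = list[str]  # PEP 585 spelling, needs Python 3.9+
typing_alias = List[str]   # older typing-module spelling

# Both aliases resolve to the same runtime origin and type arguments.
same_origin = get_origin(builtin_alias) is get_origin(typing_alias) is list
same_args = get_args(builtin_alias) == get_args(typing_alias) == (str,)

print(same_origin, same_args)
```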
| @@ -1,3 +1,5 @@ | |||
| from typing import Dict | |||
| - from typing import Dict |
| + from typing import Dict, List |
| filters=serializable_filters, | ||
| lineage=serializable_lineage, | ||
| invocations=invocations, | ||
| resources_latest_invocation=resources_latest_invocation, |
Do you need both of the new objects in the frontend?
Include in the report information about the last invocation for each resource.