Generated result query in report contains an invalid table #1483

Open
cshlin opened this issue Apr 9, 2024 · 3 comments

Comments

cshlin commented Apr 9, 2024

Describe the bug
When I copy the result query to view the failed rows, the query I get references a temp table that no longer exists.

To Reproduce
Steps to reproduce the behavior:

  1. On the generated EDR html report, click on Results
  2. Expand a test failure, and in the Result section, click the Copy button beside the Result Query
  3. Receive the following query (see the FROM clause on line 50 in the attached .txt file)

edr.txt

Expected behavior
A query that provides me with the failed rows

Screenshots

[Screenshot: EDR]

Environment (please complete the following information):

  • edr Version: 0.14.1
  • dbt package Version: 0.14.1

cshlin added the Bug (Something isn't working) and Triage 👀 labels on Apr 9, 2024
@haritamar
Collaborator

Hi @cshlin !
Thanks for posting this issue and sorry for the delayed response.
You're right - the query references a temp table that is available during the test run but not afterwards. Instead, we need to save a query that does not require this temp table.

Any chance you'd like to contribute a fix to this?
I'd be happy to provide guidance with a possible approach to fix this.

@cshlin
Author

cshlin commented May 19, 2024 via email

@haritamar
Collaborator

Thanks @cshlin !

It's admittedly not an easy fix - but I'll write up what I think could be a way to solve it.
Essentially, I think we need to save a different query in test_results_query than the one that is actually run: a version of the query that does not rely on temp tables.
(This is needed because the metrics of all tests are only saved to the permanent data_monitoring_metrics table in dbt's on_run_end hook, which runs after the tests themselves have run.)

These areas likely need to be changed:

  1. get_read_anomaly_scores_query should accept a parameter that lets it compute the anomaly scores directly, rather than through a previously generated temp table.
  2. We should also ensure that, in this mode, get_anomaly_scores_query pulls only from data_monitoring_metrics and not from a union of it with a temporary table.
  3. store_anomaly_test_results should accept an additional saved_anomaly_scores_sql parameter containing a query that does not rely on a temp table, and test_results_query should be computed from it.
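
To illustrate why the stored query breaks and what the three changes above aim for, here is a toy sketch in Python using the standard-library sqlite3 module. It is not Elementary code: the two-column schema, the tmp_metrics table name, and the DROP TABLE that stands in for the end of the dbt run are all assumptions made for the example. Only the data_monitoring_metrics name comes from this thread.

```python
import sqlite3

# Toy model only. The schema and the tmp_metrics name are assumptions.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE data_monitoring_metrics (metric_name TEXT, metric_value REAL)"
)
conn.execute("INSERT INTO data_monitoring_metrics VALUES ('row_count', 100.0)")

# During the test run, freshly computed metrics exist only in a temp table.
conn.execute("CREATE TEMP TABLE tmp_metrics (metric_name TEXT, metric_value REAL)")
conn.execute("INSERT INTO tmp_metrics VALUES ('row_count', 5.0)")

# The query that is actually run reads the permanent table plus the temp table.
run_query = (
    "SELECT metric_value FROM data_monitoring_metrics "
    "UNION ALL SELECT metric_value FROM tmp_metrics"
)
# The query worth saving for the report reads the permanent table only.
saved_query = "SELECT metric_value FROM data_monitoring_metrics"


def try_query(sql):
    """Return the sorted metric values, or an error string if the query fails."""
    try:
        return sorted(row[0] for row in conn.execute(sql))
    except sqlite3.OperationalError as exc:
        return f"error: {exc}"


rows_during_run = try_query(run_query)

# Stand-in for the on_run_end hook: persist the new metrics, after which the
# temp table is gone (dropped here to simulate the end of the session).
conn.execute("INSERT INTO data_monitoring_metrics SELECT * FROM tmp_metrics")
conn.execute("DROP TABLE tmp_metrics")

run_after = try_query(run_query)      # fails: the temp table no longer exists
saved_after = try_query(saved_query)  # works: reads the permanent table only

print(rows_during_run, run_after, saved_after, sep="\n")
```

In this toy model the saved query returns the same rows after the run that the executed query returned during it, which is the behavior the report's Copy button needs.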

The full flow where the anomaly query is generated and the result is saved exists in each of the anomaly test implementations.
For example, in test_table_anomalies.sql these are the relevant lines:

{% set flattened_test = elementary.flatten_test(context["model"]) %}
{% set anomaly_scores_sql = elementary.get_read_anomaly_scores_query() %}
{% do elementary.store_metrics_table_in_cache() %}
{% do elementary.store_anomaly_test_results(flattened_test, anomaly_scores_sql) %}

Please let me know if you plan to look into it, and feel free to ask additional questions.
