[COST-4750] OCP Scrub disabled tags in trino. #4961

myersCody · 2024-03-08T20:32:40Z

Jira Ticket

Description

This change will move the scrubbing of disabled tags up into the trino flow.

Testing

Snip it of the original SQL:

WITH cte_ocp_node_label_line_item_daily AS (
    SELECT date(nli.interval_start) as usage_start,
        nli.node,
        nli.node_labels
    FROM openshift_node_labels_line_items_daily AS nli
    WHERE nli.source = '4e653efb-347b-473d-887b-b8162f697b44'
        AND nli.year = '2024'
        AND nli.month = '02'
        AND nli.interval_start >= DATE('2024-02-01')
        AND nli.interval_start < DATE('2024-03-01')
    GROUP BY date(nli.interval_start),
        nli.node,
        nli.node_labels
),
cte_ocp_namespace_label_line_item_daily AS (
    SELECT date(nli.interval_start) as usage_start,
        nli.namespace,
        nli.namespace_labels
    FROM openshift_namespace_labels_line_items_daily AS nli
    WHERE nli.source = '4e653efb-347b-473d-887b-b8162f697b44'
        AND nli.year = '2024'
        AND nli.month = '02'
        AND nli.interval_start >= DATE('2024-02-01')
        AND nli.interval_start < DATE('2024-03-01')
    GROUP BY date(nli.interval_start),
        nli.namespace,
        nli.namespace_labels
)
SELECT
    count(pod),
    date(interval_start),
    li.namespace,
    li.node,
    sum(pod_request_cpu_core_seconds) / 3600 as pod_req_sum,
    map_concat(
            cast(json_parse(coalesce(nli.node_labels, '{}')) as map(varchar, varchar)),
            cast(json_parse(coalesce(nsli.namespace_labels, '{}')) as map(varchar, varchar)),
            cast(json_parse(li.pod_labels) as map(varchar, varchar))
        ) as pod_labels
FROM openshift_pod_usage_line_items_daily as li
LEFT JOIN cte_ocp_node_label_line_item_daily as nli
        ON nli.node = li.node
            AND nli.usage_start = date(li.interval_start)
LEFT JOIN cte_ocp_namespace_label_line_item_daily as nsli
        ON nsli.namespace = li.namespace
            AND nsli.usage_start = date(li.interval_start)
WHERE source = '4e653efb-347b-473d-887b-b8162f697b44'
        AND year = '2024'
        AND month = '02'
        AND interval_start >= DATE('2024-02-01')
        AND interval_start < DATE('2024-03-01')
GROUP BY date(li.interval_start), li.namespace, li.node, li.source, 6
ORDER BY date(li.interval_start);

Snip it of the new SQL:

with cte_pg_enabled_keys as (
    select array_agg(key order by key) as keys
      from postgres.org1234567.reporting_enabledtagkeys
     where enabled = true
     and provider_type = 'OCP'
),
cte_ocp_namespace_label_line_item_daily AS (
SELECT DISTINCT
    date(nli.interval_start) as usage_start,
    nli.namespace,
    cast(
        map_filter(
            cast(json_parse(nli.namespace_labels) as map(varchar, varchar)),
            (k,v) -> contains(pek.keys, k)
        ) as json
    ) as namespace_labels
FROM hive.org1234567.openshift_namespace_labels_line_items_daily AS nli
CROSS JOIN cte_pg_enabled_keys AS pek
WHERE
    nli.source = '4e653efb-347b-473d-887b-b8162f697b44'
    AND nli.year = '2024'
    AND nli.month = '02'
    AND nli.interval_start >= DATE('2024-02-01')
    AND nli.interval_start < DATE('2024-03-01')
GROUP BY date(nli.interval_start),
    nli.namespace,
    3 -- needs to match the map_filter
),
cte_ocp_node_label_line_item_daily AS (
SELECT
    date(nli.interval_start) as usage_start,
    nli.node,
    cast(
        map_filter(
            cast(json_parse(nli.node_labels) as map(varchar, varchar)),
            (k,v) -> contains(pek.keys, k)
        ) as json
    ) as node_labels
FROM
    hive.org1234567.openshift_node_labels_line_items_daily AS nli
CROSS JOIN cte_pg_enabled_keys AS pek
WHERE
    nli.source = '4e653efb-347b-473d-887b-b8162f697b44'
    AND nli.year = '2024'
    AND nli.month = '02'
    AND nli.interval_start >= DATE('2024-02-01')
    AND nli.interval_start < DATE('2024-03-01')
GROUP BY
    date(nli.interval_start),
    nli.node,
    3 -- needs to match the map filter
)
SELECT
    count(pod),
    date(interval_start),
    li.namespace,
    li.node,
    sum(pod_request_cpu_core_seconds) / 3600 as pod_req_sum,
    map_concat(
            cast(coalesce(nli.node_labels, CAST('{}' AS JSON)) as map(varchar, varchar)),
            cast(coalesce(nsli.namespace_labels, CAST('{}' AS JSON)) as map(varchar, varchar)),
            map_filter(
                cast(json_parse(li.pod_labels) AS MAP(VARCHAR, VARCHAR)),
                (k, v) -> CONTAINS(pek.keys, k)
            )
        ) as pod_labels
FROM openshift_pod_usage_line_items_daily as li
CROSS JOIN cte_pg_enabled_keys AS pek
LEFT JOIN cte_ocp_node_label_line_item_daily as nli
        ON nli.node = li.node
            AND nli.usage_start = date(li.interval_start)
            AND nli.node_labels != CAST('{}' AS JSON)
LEFT JOIN cte_ocp_namespace_label_line_item_daily as nsli
        ON nsli.namespace = li.namespace
            AND nsli.usage_start = date(li.interval_start)
            AND nsli.namespace_labels != CAST('{}' AS JSON)
WHERE source = '4e653efb-347b-473d-887b-b8162f697b44'
    AND year = '2024'
    AND month = '02'
    AND interval_start >= DATE('2024-02-01')
    AND interval_start < DATE('2024-03-01')
GROUP BY date(li.interval_start), li.namespace, li.node, li.source, 6
ORDER BY date(li.interval_start);

You can compare these.

Another way to test is to just set all OCP tags as disabled, and refresh. You should now see no pod_labels in the daily summary table in postgresql.

Notes

...

maskarb · 2024-03-08T21:54:09Z

koku/api/report/test/util/model_bakery_loader.py

+            # remove disabled tag keys
+            sql_params = {
+                "start_date": self.first_start_date,
+                "end_date": self.last_end_date,
+                "report_period_ids": report_period_ids,
+                "schema": self.schema,
+            }
+            sql = pkgutil.get_data("api.report.test.util.mock_trino_sql", "ocp/remove_disabled_tags.sql")
+            sql = sql.decode("utf-8")
+            table_name = accessor._table_map["line_item_daily_summary"]
+            accessor._prepare_and_execute_raw_sql_query(table_name, sql, sql_params)


I don't understand the purpose of this change. Why not just leave this functionality where it is and just update the names and the docstring to indicate that it is used for unit testing? The purpose of the accessor is to run sql against the db, so we should leave that functionality there.

I just sent a message to the team chat about this 😂 The reason for the change is that I would like to disconnect these files from our data processing flow. I want to make clear that the only purpose this file serves now is for unit testing.

I am open to suggestions though, if we want to keep it in the accessor that is fine. I just want a cleaner way of understanding which sql files are being used for unit-testing purposes only. But that could just mean moving them to a separate directory.

codecov · 2024-03-08T21:58:22Z

Codecov Report

Merging #4961 (453185c) into main (a1efbbe) will increase coverage by 0.0%.
The diff coverage is 100.0%.

Additional details and impacted files

@@          Coverage Diff          @@
##            main   #4961   +/-   ##
=====================================
  Coverage   94.0%   94.0%           
=====================================
  Files        375     375           
  Lines      31116   31117    +1     
  Branches    3697    3697           
=====================================
+ Hits       29264   29265    +1     
  Misses      1181    1181           
  Partials     671     671

myersCody · 2024-03-11T13:20:46Z

/retest

myersCody · 2024-03-11T16:58:03Z

/retest

myersCody · 2024-03-11T20:04:30Z

/retest

myersCody · 2024-03-12T03:20:37Z

/retest

lcouzens · 2024-03-12T09:00:54Z

/retest

lcouzens · 2024-03-12T09:25:19Z

Tests are failing on bad sql.

needs fixing

lcouzens

Nice job Cody

[COST-4750] OCP Scrub disabled tags in trino.

0b34a08

github-actions bot added the smokes-required label Mar 8, 2024

myersCody added the ocp-smoke-tests pr_check will build the image and run ocp + ocp on [clouds] smoke tests label Mar 8, 2024

myersCody added 2 commits March 8, 2024 16:30

Fix unittests.

c967dd0

Final clean up.

4051a81

myersCody marked this pull request as ready for review March 8, 2024 21:42

myersCody requested review from a team as code owners March 8, 2024 21:42

maskarb reviewed Mar 8, 2024

View reviewed changes

myersCody added 2 commits March 9, 2024 20:12

Address comments.

05a31ca

Clean up /test/util/ directories.

c390fc0

Remove unnecessary change.

34bf0da

lcouzens previously approved these changes Mar 11, 2024

View reviewed changes

Remove commas

46a8bd2

myersCody marked this pull request as draft March 12, 2024 13:17

myersCody marked this pull request as ready for review March 12, 2024 15:34

lcouzens approved these changes Mar 12, 2024

View reviewed changes

Merge branch 'main' into COST-4750-OCP-scrub-disabled-tags

453185c

myersCody enabled auto-merge (squash) March 12, 2024 15:35

myersCody merged commit d939072 into main Mar 12, 2024
11 checks passed

myersCody deleted the COST-4750-OCP-scrub-disabled-tags branch March 12, 2024 16:58

maskarb mentioned this pull request Mar 20, 2024

[COST-4457] select for update enabled tags - OCP #4791

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[COST-4750] OCP Scrub disabled tags in trino. #4961

[COST-4750] OCP Scrub disabled tags in trino. #4961

myersCody commented Mar 8, 2024

maskarb Mar 8, 2024

myersCody Mar 8, 2024

myersCody Mar 8, 2024

codecov bot commented Mar 8, 2024 •

edited

myersCody commented Mar 11, 2024

myersCody commented Mar 11, 2024

myersCody commented Mar 11, 2024

myersCody commented Mar 12, 2024

lcouzens commented Mar 12, 2024

lcouzens commented Mar 12, 2024

lcouzens left a comment

[COST-4750] OCP Scrub disabled tags in trino. #4961

[COST-4750] OCP Scrub disabled tags in trino. #4961

Conversation

myersCody commented Mar 8, 2024

Jira Ticket

Description

Testing

Notes

maskarb Mar 8, 2024

Choose a reason for hiding this comment

myersCody Mar 8, 2024

Choose a reason for hiding this comment

myersCody Mar 8, 2024

Choose a reason for hiding this comment

codecov bot commented Mar 8, 2024 • edited

Codecov Report

myersCody commented Mar 11, 2024

myersCody commented Mar 11, 2024

myersCody commented Mar 11, 2024

myersCody commented Mar 12, 2024

lcouzens commented Mar 12, 2024

lcouzens commented Mar 12, 2024

lcouzens left a comment

Choose a reason for hiding this comment

codecov bot commented Mar 8, 2024 •

edited