Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[COST-4750] OCP Scrub disabled tags in trino. #4961

Merged
merged 8 commits into from
Mar 12, 2024

Conversation

myersCody
Copy link
Contributor

Jira Ticket

COST-4750

Description

This change will move the scrubbing of disabled tags up into the trino flow.

Testing

Snip it of the original SQL:

WITH cte_ocp_node_label_line_item_daily AS (
    SELECT date(nli.interval_start) as usage_start,
        nli.node,
        nli.node_labels
    FROM openshift_node_labels_line_items_daily AS nli
    WHERE nli.source = '4e653efb-347b-473d-887b-b8162f697b44'
        AND nli.year = '2024'
        AND nli.month = '02'
        AND nli.interval_start >= DATE('2024-02-01')
        AND nli.interval_start < DATE('2024-03-01')
    GROUP BY date(nli.interval_start),
        nli.node,
        nli.node_labels
),
cte_ocp_namespace_label_line_item_daily AS (
    SELECT date(nli.interval_start) as usage_start,
        nli.namespace,
        nli.namespace_labels
    FROM openshift_namespace_labels_line_items_daily AS nli
    WHERE nli.source = '4e653efb-347b-473d-887b-b8162f697b44'
        AND nli.year = '2024'
        AND nli.month = '02'
        AND nli.interval_start >= DATE('2024-02-01')
        AND nli.interval_start < DATE('2024-03-01')
    GROUP BY date(nli.interval_start),
        nli.namespace,
        nli.namespace_labels
)
SELECT
    count(pod),
    date(interval_start),
    li.namespace,
    li.node,
    sum(pod_request_cpu_core_seconds) / 3600 as pod_req_sum,
    map_concat(
            cast(json_parse(coalesce(nli.node_labels, '{}')) as map(varchar, varchar)),
            cast(json_parse(coalesce(nsli.namespace_labels, '{}')) as map(varchar, varchar)),
            cast(json_parse(li.pod_labels) as map(varchar, varchar))
        ) as pod_labels
FROM openshift_pod_usage_line_items_daily as li
LEFT JOIN cte_ocp_node_label_line_item_daily as nli
        ON nli.node = li.node
            AND nli.usage_start = date(li.interval_start)
LEFT JOIN cte_ocp_namespace_label_line_item_daily as nsli
        ON nsli.namespace = li.namespace
            AND nsli.usage_start = date(li.interval_start)
WHERE source = '4e653efb-347b-473d-887b-b8162f697b44'
        AND year = '2024'
        AND month = '02'
        AND interval_start >= DATE('2024-02-01')
        AND interval_start < DATE('2024-03-01')
GROUP BY date(li.interval_start), li.namespace, li.node, li.source, 6
ORDER BY date(li.interval_start);

Snip it of the new SQL:

with cte_pg_enabled_keys as (
    select array_agg(key order by key) as keys
      from postgres.org1234567.reporting_enabledtagkeys
     where enabled = true
     and provider_type = 'OCP'
),
cte_ocp_namespace_label_line_item_daily AS (
SELECT DISTINCT
    date(nli.interval_start) as usage_start,
    nli.namespace,
    cast(
        map_filter(
            cast(json_parse(nli.namespace_labels) as map(varchar, varchar)),
            (k,v) -> contains(pek.keys, k)
        ) as json
    ) as namespace_labels
FROM hive.org1234567.openshift_namespace_labels_line_items_daily AS nli
CROSS JOIN cte_pg_enabled_keys AS pek
WHERE
    nli.source = '4e653efb-347b-473d-887b-b8162f697b44'
    AND nli.year = '2024'
    AND nli.month = '02'
    AND nli.interval_start >= DATE('2024-02-01')
    AND nli.interval_start < DATE('2024-03-01')
GROUP BY date(nli.interval_start),
    nli.namespace,
    3 -- needs to match the map_filter
),
cte_ocp_node_label_line_item_daily AS (
SELECT
    date(nli.interval_start) as usage_start,
    nli.node,
    cast(
        map_filter(
            cast(json_parse(nli.node_labels) as map(varchar, varchar)),
            (k,v) -> contains(pek.keys, k)
        ) as json
    ) as node_labels
FROM
    hive.org1234567.openshift_node_labels_line_items_daily AS nli
CROSS JOIN cte_pg_enabled_keys AS pek
WHERE
    nli.source = '4e653efb-347b-473d-887b-b8162f697b44'
    AND nli.year = '2024'
    AND nli.month = '02'
    AND nli.interval_start >= DATE('2024-02-01')
    AND nli.interval_start < DATE('2024-03-01')
GROUP BY
    date(nli.interval_start),
    nli.node,
    3 -- needs to match the map filter
)
SELECT
    count(pod),
    date(interval_start),
    li.namespace,
    li.node,
    sum(pod_request_cpu_core_seconds) / 3600 as pod_req_sum,
    map_concat(
            cast(coalesce(nli.node_labels, CAST('{}' AS JSON)) as map(varchar, varchar)),
            cast(coalesce(nsli.namespace_labels, CAST('{}' AS JSON)) as map(varchar, varchar)),
            map_filter(
                cast(json_parse(li.pod_labels) AS MAP(VARCHAR, VARCHAR)),
                (k, v) -> CONTAINS(pek.keys, k)
            )
        ) as pod_labels
FROM openshift_pod_usage_line_items_daily as li
CROSS JOIN cte_pg_enabled_keys AS pek
LEFT JOIN cte_ocp_node_label_line_item_daily as nli
        ON nli.node = li.node
            AND nli.usage_start = date(li.interval_start)
            AND nli.node_labels != CAST('{}' AS JSON)
LEFT JOIN cte_ocp_namespace_label_line_item_daily as nsli
        ON nsli.namespace = li.namespace
            AND nsli.usage_start = date(li.interval_start)
            AND nsli.namespace_labels != CAST('{}' AS JSON)
WHERE source = '4e653efb-347b-473d-887b-b8162f697b44'
    AND year = '2024'
    AND month = '02'
    AND interval_start >= DATE('2024-02-01')
    AND interval_start < DATE('2024-03-01')
GROUP BY date(li.interval_start), li.namespace, li.node, li.source, 6
ORDER BY date(li.interval_start);

You can compare these.

Another way to test is to just set all OCP tags as disabled, and refresh. You should now see no pod_labels in the daily summary table in postgresql.

Notes

...

@myersCody myersCody added the ocp-smoke-tests pr_check will build the image and run ocp + ocp on [clouds] smoke tests label Mar 8, 2024
@myersCody myersCody marked this pull request as ready for review March 8, 2024 21:42
@myersCody myersCody requested review from a team as code owners March 8, 2024 21:42
Comment on lines 400 to 410
# remove disabled tag keys
sql_params = {
"start_date": self.first_start_date,
"end_date": self.last_end_date,
"report_period_ids": report_period_ids,
"schema": self.schema,
}
sql = pkgutil.get_data("api.report.test.util.mock_trino_sql", "ocp/remove_disabled_tags.sql")
sql = sql.decode("utf-8")
table_name = accessor._table_map["line_item_daily_summary"]
accessor._prepare_and_execute_raw_sql_query(table_name, sql, sql_params)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand the purpose of this change. Why not just leave this functionality where it is and just update the names and the docstring to indicate that it is used for unit testing? The purpose of the accessor is to run sql against the db, so we should leave that functionality there.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just sent a message to the team chat about this 😂 The reason for the change is that I would like to disconnect these files from our data processing flow. I want to make clear that the only purpose this file serves now is for unit testing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am open to suggestions though, if we want to keep it in the accessor that is fine. I just want a cleaner way of understanding which sql files are being used for unit-testing purposes only. But that could just mean moving them to a separate directory.

Copy link

codecov bot commented Mar 8, 2024

Codecov Report

Merging #4961 (453185c) into main (a1efbbe) will increase coverage by 0.0%.
The diff coverage is 100.0%.

Additional details and impacted files
@@          Coverage Diff          @@
##            main   #4961   +/-   ##
=====================================
  Coverage   94.0%   94.0%           
=====================================
  Files        375     375           
  Lines      31116   31117    +1     
  Branches    3697    3697           
=====================================
+ Hits       29264   29265    +1     
  Misses      1181    1181           
  Partials     671     671           

@myersCody
Copy link
Contributor Author

/retest

lcouzens
lcouzens previously approved these changes Mar 11, 2024
@myersCody
Copy link
Contributor Author

/retest

3 similar comments
@myersCody
Copy link
Contributor Author

/retest

@myersCody
Copy link
Contributor Author

/retest

@lcouzens
Copy link
Contributor

/retest

@lcouzens
Copy link
Contributor

Tests are failing on bad sql.

@lcouzens lcouzens dismissed their stale review March 12, 2024 09:25

needs fixing

@myersCody myersCody marked this pull request as draft March 12, 2024 13:17
@myersCody myersCody marked this pull request as ready for review March 12, 2024 15:34
Copy link
Contributor

@lcouzens lcouzens left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice job Cody

@myersCody myersCody enabled auto-merge (squash) March 12, 2024 15:35
@myersCody myersCody merged commit d939072 into main Mar 12, 2024
11 checks passed
@myersCody myersCody deleted the COST-4750-OCP-scrub-disabled-tags branch March 12, 2024 16:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ocp-smoke-tests pr_check will build the image and run ocp + ocp on [clouds] smoke tests smokes-required
Projects
None yet
3 participants