Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[COST-4367] Update HCS AWS SQL #4812

Merged
merged 8 commits into from
Dec 14, 2023
Merged

[COST-4367] Update HCS AWS SQL #4812

merged 8 commits into from
Dec 14, 2023

Conversation

cgoodfred
Copy link
Contributor

Jira Ticket

COST-4367

Description

This change will update the AWS HCS querying to use a cost estimate for CCSP data based on vCPU count.

Testing

  1. Checkout Branch
  2. Restart Koku
  3. Create and Load a yaml that matches some CCSP data.
  4. Check the minio bucket for the HCS files and make sure there are records present for the CCSP criteria
(
      lineitem_legalentity LIKE '%Amazon Web Services%'
      AND lineitem_lineitemdescription LIKE '%Red Hat%'
    )
    OR (
      lineitem_legalentity LIKE '%Amazon Web Services%'
      AND lineitem_lineitemdescription LIKE '%RHEL%'
    )
    OR (
      lineitem_legalentity LIKE '%AWS%'
      AND lineitem_lineitemdescription LIKE '%Red Hat%'
    )
    OR (
      lineitem_legalentity LIKE '%AWS%'
      AND lineitem_lineitemdescription LIKE '%RHEL%'
    )

Notes

...

@cgoodfred cgoodfred added the aws-smoke-tests pr_check will build the image and run aws + ocp on aws smoke tests label Dec 3, 2023
@cgoodfred cgoodfred self-assigned this Dec 3, 2023
Copy link

codecov bot commented Dec 3, 2023

Codecov Report

Merging #4812 (95d8068) into main (a7902d3) will decrease coverage by 0.0%.
The diff coverage is n/a.

Additional details and impacted files
@@           Coverage Diff           @@
##            main   #4812     +/-   ##
=======================================
- Coverage   94.0%   94.0%   -0.0%     
=======================================
  Files        364     364             
  Lines      30206   30206             
  Branches    3599    3599             
=======================================
- Hits       28382   28379      -3     
- Misses      1164    1165      +1     
- Partials     660     662      +2     

@cgoodfred cgoodfred marked this pull request as ready for review December 7, 2023 19:25
@cgoodfred cgoodfred requested review from a team as code owners December 7, 2023 19:25
@myersCody
Copy link
Contributor

From my understanding one of the main benefits of using UNION is that it removes duplicate rows between query 1 and query 2 from the final result.

I am trying to walk through the scenario of what happens if the same row is returned for query 1 and query 2 in my head. In query 2, you may be applying the expected rates to the lineitem_unblendedcost column. Therefore, the same row could have a different value for only the lineitem_unblendedcost column in the second query's set.

First, I wanted to get your opinion on if this scenario could even occur with the way query one and query two are currently structured? If the scenario can occur, then will the union still remove the duplicate? How does postgresql decide which value of the lineitem_unblendedcost column to keep for the final results?

@cgoodfred
Copy link
Contributor Author

From my understanding one of the main benefits of using UNION is that it removes duplicate rows between query 1 and query 2 from the final result.

I am trying to walk through the scenario of what happens if the same row is returned for query 1 and query 2 in my head. In query 2, you may be applying the expected rates to the lineitem_unblendedcost column. Therefore, the same row could have a different value for only the lineitem_unblendedcost column in the second query's set.

First, I wanted to get your opinion on if this scenario could even occur with the way query one and query two are currently structured? If the scenario can occur, then will the union still remove the duplicate? How does postgresql decide which value of the lineitem_unblendedcost column to keep for the final results?

My understanding of the way AWS reports are structured from all the real reports I've looked at, is that these would be two separate instances that can't overlap. The lineitem_lineitemdescription for EC2 instances is where we might see something like RHEL and the product_productname will always be Amazon Elastic Compute Cloud for the underlying infrastructure.

On the flip side, the Marketplace offerings is where the product name may contain Red Hat or RHEL and the lineitem_lineitemdescription on those entries is something like AWS Marketplace hourly software usage|us-east-1|m7i.2xlarge.

So I do not believe there is a valid case of both the product name and description containing Red Hat or RHEL.

As for the UNION question, these would also not be considered distinct rows if a record that we overwrite the cost on was found to match both filters since the unblendedcost would be unique between the two they would both be identified.

@myersCody
Copy link
Contributor

From my understanding one of the main benefits of using UNION is that it removes duplicate rows between query 1 and query 2 from the final result.
I am trying to walk through the scenario of what happens if the same row is returned for query 1 and query 2 in my head. In query 2, you may be applying the expected rates to the lineitem_unblendedcost column. Therefore, the same row could have a different value for only the lineitem_unblendedcost column in the second query's set.
First, I wanted to get your opinion on if this scenario could even occur with the way query one and query two are currently structured? If the scenario can occur, then will the union still remove the duplicate? How does postgresql decide which value of the lineitem_unblendedcost column to keep for the final results?

My understanding of the way AWS reports are structured from all the real reports I've looked at, is that these would be two separate instances that can't overlap. The lineitem_lineitemdescription for EC2 instances is where we might see something like RHEL and the product_productname will always be Amazon Elastic Compute Cloud for the underlying infrastructure.

On the flip side, the Marketplace offerings is where the product name may contain Red Hat or RHEL and the lineitem_lineitemdescription on those entries is something like AWS Marketplace hourly software usage|us-east-1|m7i.2xlarge.

So I do not believe there is a valid case of both the product name and description containing Red Hat or RHEL.

As for the UNION question, these would also not be considered distinct rows if a record that we overwrite the cost on was found to match both filters since the unblendedcost would be unique between the two they would both be identified.

Gotcha, I just wanted to know if we put the thought into if the same row could be returned between the two query sets. If we did the due diligence and don't believe this is a real scenario, then I think this good to go! Thanks for taking the time to dig through that with me.

myersCody
myersCody previously approved these changes Dec 8, 2023
Copy link

Quality Gate Passed Quality Gate passed

Kudos, no new issues were introduced!

0 New issues
0 Security Hotspots
No data about Coverage
No data about Duplication

See analysis details on SonarCloud

@cgoodfred cgoodfred added hot-fix-smoke-tests pr_check label to run minimal smoke tests for fast moving bug-fix and removed aws-smoke-tests pr_check will build the image and run aws + ocp on aws smoke tests labels Dec 14, 2023
@esebesto
Copy link
Contributor

Tested with full smokes locally using IQE MR with updated HCS tests -> clean pass (with the exception of 2 expected failures)

@cgoodfred
Copy link
Contributor Author

Running hot-fix-smokes as Eva has tested the branch against her smoke test changes for HCS smokes and they passed.

@cgoodfred
Copy link
Contributor Author

/retest

@cgoodfred cgoodfred merged commit bdd582d into main Dec 14, 2023
10 checks passed
@cgoodfred cgoodfred deleted the COST-4367-query-ccsp branch December 14, 2023 18:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
hot-fix-smoke-tests pr_check label to run minimal smoke tests for fast moving bug-fix smokes-required
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants