Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[AWS Billing] Add fingerprint dimension #6749

Merged
merged 3 commits into from
Jul 3, 2023

Conversation

constanca-m
Copy link
Contributor

What does this PR do?

Add dimension fingerprint for group_by field.

Currently, the dimensions defined for AWS Billing are not enough as the documents keep being overwritten. The fingerprint field solves this problem.

Checklist

  • I have reviewed tips for building integrations and this pull request is aligned with them.
  • I have verified that all data streams collect metrics or logs.
  • I have added an entry to my package's changelog.yml file.
  • I have verified that Kibana version constraints are current according to guidelines.

Related issues

Relates to #6293.

Signed-off-by: constanca-m <constanca.manteigas@elastic.co>
Signed-off-by: constanca-m <constanca.manteigas@elastic.co>
@@ -85,6 +85,9 @@
object_type: keyword
object_type_mapping_type: "*"
description: Cost explorer group by key values.
- name: group_by.fingerprint
type: keyword
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we actually need to index the fingerprint field?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, because we need to set the dimension as true.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i was meaning to keep the mapping, but add index: false and doc_values: false to save disk space.

micro optimization here for sure, but we want to ensure we are applying best practices (and micro optimizations do add up).

Copy link
Contributor Author

@constanca-m constanca-m Jun 30, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is interesting, I was not aware of those options:

The index option controls whether field values are indexed. It accepts true or false and defaults to true.

If the fingerprint is not indexed, can it still be used to create the _tsid? What if there is overlapping in documents? If we don't have the fingerprint value, then we will have to search a document based on all fields used for the fingerprint. Is that a good idea?

doc_values: false

The same problem, what if we want to search a document based on the fingerprint?

@tommyers-elastic

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My main concern is that for TSDB testing we need all dimensions to be searchable.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if there is overlapping in documents?

How do you mean?

If the fingerprint is not indexed, can it still be used to create the _tsid?

I would guess so, but let's find out.

what if we want to search a document based on the fingerprint?

Why would we ever want to do that? we would have to know the content of the document beforehand in order to generate the hash to search.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why would we ever want to do that?

The fingerprint would be a dimension, which means that if documents overlap we will lose data. In that case we can find those documents based on the set of dimensions, including the fingerprint. That is why I think it is useful to be able to search for it.

@elasticmachine
Copy link

elasticmachine commented Jun 30, 2023

💚 Build Succeeded

the below badges are clickable and redirect to their specific view in the CI or DOCS
Pipeline View Test View Changes Artifacts preview preview

Expand to view the summary

Build stats

  • Start Time: 2023-07-03T09:23:16.808+0000

  • Duration: 53 min 2 sec

Test stats 🧪

Test Results
Failed 0
Passed 194
Skipped 4
Total 198

🤖 GitHub comments

Expand to view the GitHub comments

To re-run your PR in the CI, just comment with:

  • /test : Re-trigger the build.

@constanca-m
Copy link
Contributor Author

/test

Copy link
Contributor

@tommyers-elastic tommyers-elastic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

approving with the caveat that we should look for storage optimizations on fingerprint fields, since they are not designed to be searchable by the integration user.

@elasticmachine
Copy link

🌐 Coverage report

Name Metrics % (covered/total) Diff
Packages 100.0% (15/15) 💚
Files 93.75% (15/16)
Classes 93.75% (15/16)
Methods 85.714% (240/280)
Lines 85.925% (7387/8597)
Conditionals 100.0% (0/0) 💚

@constanca-m constanca-m merged commit 557ecfd into elastic:main Jul 3, 2023
1 check passed
@constanca-m constanca-m deleted the aws-billing-dimensions branch July 3, 2023 10:17
@elasticmachine
Copy link

Package aws - 1.45.9 containing this change is available at https://epr.elastic.co/search?package=aws

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants