[Metricbeat] [GCP] Fix compute metadata #36338

Merged
merged 7 commits into elastic:main on Aug 28, 2023

Conversation

gpop63
Contributor

@gpop63 gpop63 commented Aug 16, 2023

Proposed commit message

A few things I noticed while debugging:

  1. NewMetadataService is invoked once per period set in the config (e.g. 1m), so the computeInstances map is recreated on its own each period; there is no need to remake it again in instance. If the period is 1m, is there any risk of serving stale metadata?

    func NewMetadataService(projectID, zone string, region string, regions []string, opt ...option.ClientOption) (gcp.MetadataService, error) {
        return &metadataCollector{
            projectID:        projectID,
            zone:             zone,
            region:           region,
            regions:          regions,
            opt:              opt,
            computeInstances: make(map[uint64]*computepb.Instance),
        }, nil
    }

  2. ID also calls Metadata, so in timeSeriesGrouped both end up being called:

    func (s *metadataCollector) ID(ctx context.Context, in *gcp.MetadataCollectorInputData) (string, error) {
        metadata, err := s.Metadata(ctx, in.TimeSeries)

    id, err := metadataService.ID(ctx, sdCollectorInputData)
    if err != nil {
        m.Logger().Errorf("error trying to retrieve ID from metric event '%v'", err)
        continue
    }
    metadataCollectorData, err := metadataService.Metadata(ctx, sdCollectorInputData.TimeSeries)
    if err != nil {
        m.Logger().Error("error trying to retrieve labels from metric event")
        continue
    }

  3. The ListTimeSeries call in Metric returns time series for instances from other projects, not only from the project_id specified in the config. The AggregatedList call will not be able to find those instances.

    func (r *metricsRequester) Metric(ctx context.Context, serviceName, metricType string, timeInterval *monitoringpb.TimeInterval, aligner string) timeSeriesWithAligner {
        timeSeries := make([]*monitoringpb.TimeSeries, 0)
        req := &monitoringpb.ListTimeSeriesRequest{
            Name:     "projects/" + r.config.ProjectID,
            Interval: timeInterval,
            View:     monitoringpb.ListTimeSeriesRequest_FULL,
            Filter:   r.getFilterForMetric(serviceName, metricType),
            Aggregation: &monitoringpb.Aggregation{
                PerSeriesAligner: gcp.AlignersMapToGCP[aligner],
                AlignmentPeriod:  r.config.period,
            },
        }
        it := r.client.ListTimeSeries(ctx, req)

I think the problem is that Metadata is called many times, and clearing the computeInstances map on every call would trigger the compute AggregatedList call repeatedly. This would go on forever, because that API call cannot return metadata for every source of compute metrics: Kubernetes pods and nodes (for which we cannot get metadata) also send compute metrics, not only VM instances.

By not remaking the computeInstances map, we make only one AggregatedList call per period specified in the config (1m).
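The behavior described above can be sketched with stand-in types (the real code uses computepb.Instance and gcp.MetadataService from the Beats GCP module; all names below are hypothetical). Keeping the map for the lifetime of the collector means the expensive AggregatedList lookup happens once per period, no matter how many times ID and Metadata are called:

```go
package main

import "fmt"

// instance stands in for computepb.Instance; only the caching
// behavior of the collector is modeled here.
type instance struct{ name string }

type metadataCollector struct {
	computeInstances map[uint64]*instance
	aggregatedLists  int // counts simulated AggregatedList API calls
}

// getComputeInstances populates the cache on first use. Because the
// collector itself is recreated each period, the cache naturally
// expires with the period (e.g. every 1m).
func (c *metadataCollector) getComputeInstances() map[uint64]*instance {
	if len(c.computeInstances) == 0 {
		c.aggregatedLists++ // the one expensive API call per period
		c.computeInstances[42] = &instance{name: "vm-1"}
	}
	return c.computeInstances
}

// Metadata and ID both consult the cache, so calling them back to
// back (as timeSeriesGrouped does) costs a single API call.
func (c *metadataCollector) Metadata(id uint64) *instance { return c.getComputeInstances()[id] }
func (c *metadataCollector) ID(id uint64) string          { return c.Metadata(id).name }

func main() {
	c := &metadataCollector{computeInstances: make(map[uint64]*instance)}
	_ = c.ID(42)       // triggers the single AggregatedList
	_ = c.Metadata(42) // served from the cache
	fmt.Println(c.aggregatedLists)
}
```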

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding change to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.

Author's Checklist

  • [ ]

How to test this PR locally

Related issues

Use cases

Screenshots


Logs

@botelastic botelastic bot added the needs_team Indicates that the issue/PR needs a Team:* label label Aug 16, 2023
@gpop63 gpop63 self-assigned this Aug 16, 2023
@mergify
Contributor

mergify bot commented Aug 16, 2023

This pull request does not have a backport label.
If this is a bug or security fix, could you label this PR @gpop63? 🙏.
For such, you'll need to label your PR with:

  • The upcoming major version of the Elastic Stack
  • The upcoming minor version of the Elastic Stack (if you're not pushing a breaking change)

To fix up this pull request, you need to add the backport labels for the needed branches, such as:

  • backport-v8.\d.0 is the label to automatically backport to the 8.\d branch, where \d is the digit of the minor version

@elasticmachine
Collaborator

elasticmachine commented Aug 16, 2023

💚 Build Succeeded

Build stats

  • Start Time: 2023-08-28T10:39:06.653+0000

  • Duration: 53 min 11 sec

Test stats 🧪

Test Results
Failed 0
Passed 1533
Skipped 96
Total 1629

💚 Flaky test report

Tests succeeded.

🤖 GitHub comments

To re-run your PR in the CI, just comment with:

  • /test : Re-trigger the build.

  • /package : Generate the packages and run the E2E tests.

  • /beats-tester : Run the installation tests with beats-tester.

  • run elasticsearch-ci/docs : Re-trigger the docs validation. (use unformatted text in the comment!)

@gpop63 gpop63 added the Team:Cloud-Monitoring Label for the Cloud Monitoring team label Aug 16, 2023
@botelastic botelastic bot removed the needs_team Indicates that the issue/PR needs a Team:* label label Aug 16, 2023
@gpop63 gpop63 marked this pull request as ready for review August 16, 2023 15:49
@gpop63 gpop63 requested a review from a team as a code owner August 16, 2023 15:49
@tdancheva tdancheva self-requested a review August 28, 2023 15:47
@gpop63 gpop63 merged commit 63a147a into elastic:main Aug 28, 2023
21 checks passed
@tdancheva
Contributor

@kaiyan-sheng I unskipped the test in a draft PR and the test fails because it runs on 8.8.2. Do you think it is a good idea to backport this so that we can validate it now instead of waiting for the next release?

@kaiyan-sheng
Contributor

@tdancheva good point! I would consider this as a bug fix and let's backport it.

@kaiyan-sheng kaiyan-sheng added the backport-v8.9.0 Automated backport with mergify label Sep 6, 2023
mergify bot pushed a commit that referenced this pull request Sep 6, 2023
* don't remake map

* only add instances from project_id

* fix ecs cloud region and AZ bug

* Revert "fix ecs cloud region and AZ bug"

This reverts commit 349bf6b.

* Revert "only add instances from project_id"

This reverts commit 89f6ab5.

* add changelog entry

(cherry picked from commit 63a147a)
kaiyan-sheng added a commit that referenced this pull request Sep 6, 2023
* [Metricbeat] [GCP] Fix compute metadata (#36338)

* don't remake map

* only add instances from project_id

* fix ecs cloud region and AZ bug

* Revert "fix ecs cloud region and AZ bug"

This reverts commit 349bf6b.

* Revert "only add instances from project_id"

This reverts commit 89f6ab5.

* add changelog entry

(cherry picked from commit 63a147a)

* Update CHANGELOG.next.asciidoc

---------

Co-authored-by: Gabriel Pop <94497545+gpop63@users.noreply.github.com>
Co-authored-by: kaiyan-sheng <kaiyan.sheng@elastic.co>
Scholar-Li pushed a commit to Scholar-Li/beats that referenced this pull request Feb 5, 2024
Labels
backport-v8.9.0 Automated backport with mergify Team:Cloud-Monitoring Label for the Cloud Monitoring team
4 participants