Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

metrics: refactor volume collector #2665

Merged
merged 2 commits into from
Mar 6, 2024
Merged

Conversation

derekbit
Copy link
Member

@derekbit derekbit commented Mar 4, 2024

Longhorn 8098

Which issue(s) this PR fixes:

Issue longhorn/longhorn#8098

What this PR does / why we need it:

Special notes for your reviewer:

Additional documentation or context

Copy link
Member

@innobead innobead left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which part caused the nil exception?

note: The current code will not cause the Longhorn manager pod to crash, as the panic will be intendedly handled.

@innobead
Copy link
Member

innobead commented Mar 5, 2024

@c3y1huang Why we did recover part in each collection method? It looks intendely.

@derekbit
Copy link
Member Author

derekbit commented Mar 5, 2024

Which part caused the nil exception?

				engineClientProxy, err = vc.getEngineClientProxy(e)
				if err == nil {
					defer engineClientProxy.Close()

					metrics, err = engineClientProxy.MetricsGet(e)
					if err != nil {
						vc.logger.WithError(err).Warnf("Failed to get metrics from volume %v from engine %v", e.Spec.VolumeName, e.Name)
					}
				} else {
					vc.logger.WithError(err).Warnf("Failed to get engine proxy of %v for volume %v", e.Name, v.Name)
				}

defer engineClientProxy.Close() causes nil pointer dereference.

note: The current code will not cause the Longhorn manager pod to crash, as the panic will be intendedly handled.

@innobead
Copy link
Member

innobead commented Mar 5, 2024

Which part caused the nil exception?

				engineClientProxy, err = vc.getEngineClientProxy(e)
				if err == nil {
					defer engineClientProxy.Close()

But from the log below. getEngineClientProxy returned an error and nil engineClientProxy. Why defer engineClientProxy.Close() will be invoked?

2024-03-01T10:28:17.583228766+01:00 time="2024-03-01T09:28:17Z" level=warning msg="Failed to get engine proxy of pvc-4c4da4f6-6585-4afe-8c2c-23b6624573ed-e-0 for volume pvc-4c4da4f6-6585-4afe-8c2c-23b6624573ed" func="metrics_collector.(*VolumeCollector).Collect" file="volume_collector.go:192" collector=volume error="failed to get binary client for engine pvc-4c4da4f6-6585-4afe-8c2c-23b6624573ed-e-0: cannot get client for engine pvc-4c4da4f6-6585-4afe-8c2c-23b6624573ed-e-0: engine is not running" node=core8
2024-03-01T10:28:17.583267365+01:00 time="2024-03-01T09:28:17Z" level=warning msg="Panic during collecting metrics" func="metrics_collector.(*VolumeCollector).Collect.func1" file="volume_collector.go:164" collector=volume error="runtime error: invalid memory address or nil pointer dereference" node=core8

/metrics_collector/volume_collector.go#L212-L219

func (vc *VolumeCollector) getEngineClientProxy(engine *longhorn.Engine) (c engineapi.EngineClientProxy, err error) {
	engineCliClient, err := controller.GetBinaryClientForEngine(engine, &engineapi.EngineCollection{}, engine.Status.CurrentImage)
	if err != nil {
		return nil, errors.Wrapf(err, "failed to get binary client for engine %v", engine.Name)
	}

	return engineapi.GetCompatibleClient(engine, engineCliClient, vc.ds, nil, vc.proxyConnCounter)
}

@derekbit
Copy link
Member Author

derekbit commented Mar 5, 2024

Which part caused the nil exception?

				engineClientProxy, err = vc.getEngineClientProxy(e)
				if err == nil {
					defer engineClientProxy.Close()

But from the log below. getEngineClientProxy returned an error and nil engineClientProxy. Why defer engineClientProxy.Close() will be invoked?

2024-03-01T10:28:17.583228766+01:00 time="2024-03-01T09:28:17Z" level=warning msg="Failed to get engine proxy of pvc-4c4da4f6-6585-4afe-8c2c-23b6624573ed-e-0 for volume pvc-4c4da4f6-6585-4afe-8c2c-23b6624573ed" func="metrics_collector.(*VolumeCollector).Collect" file="volume_collector.go:192" collector=volume error="failed to get binary client for engine pvc-4c4da4f6-6585-4afe-8c2c-23b6624573ed-e-0: cannot get client for engine pvc-4c4da4f6-6585-4afe-8c2c-23b6624573ed-e-0: engine is not running" node=core8
2024-03-01T10:28:17.583267365+01:00 time="2024-03-01T09:28:17Z" level=warning msg="Panic during collecting metrics" func="metrics_collector.(*VolumeCollector).Collect.func1" file="volume_collector.go:164" collector=volume error="runtime error: invalid memory address or nil pointer dereference" node=core8

/metrics_collector/volume_collector.go#L212-L219

func (vc *VolumeCollector) getEngineClientProxy(engine *longhorn.Engine) (c engineapi.EngineClientProxy, err error) {
	engineCliClient, err := controller.GetBinaryClientForEngine(engine, &engineapi.EngineCollection{}, engine.Status.CurrentImage)
	if err != nil {
		return nil, errors.Wrapf(err, "failed to get binary client for engine %v", engine.Name)
	}

	return engineapi.GetCompatibleClient(engine, engineCliClient, vc.ds, nil, vc.proxyConnCounter)
}

You're right.
Let's close the PR. BTW, I don't see any issue that can cause the exception in the remaining part.
Do you have idea?

@innobead
Copy link
Member

innobead commented Mar 5, 2024

It looks normal to me as well, but it could be a nested error, possibly due to nil except for a property of any nested property of an object.

@c3y1huang any ideas?

@derekbit
Copy link
Member Author

derekbit commented Mar 5, 2024

It looks normal to me as well, but it could be a nested error, possibly due to nil except for a property of any nested property of an object.

The refactored codes are better and clear. Do you think we can merge it?

@innobead
Copy link
Member

innobead commented Mar 5, 2024

Sounds good to me. Refactoring if having chances always. Of course, need to be careful of any regression.

innobead
innobead previously approved these changes Mar 5, 2024
Copy link
Member

@innobead innobead left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

Please update the commit message and title of this PR, as the change here doesn't really fix the nil exception issue.

@derekbit derekbit changed the title metrics: fix nil pointer dereference metrics: refactor volume collector Mar 5, 2024
@derekbit
Copy link
Member Author

derekbit commented Mar 5, 2024

LGTM.

Please update the commit message and title of this PR, as the change here doesn't really fix the nil exception issue.

Done.

@innobead
Copy link
Member

innobead commented Mar 5, 2024

@derekbit the commit message is not changed yet.

Longhorn 8098

Signed-off-by: Derek Su <derek.su@suse.com>
@innobead innobead merged commit 5d12108 into longhorn:master Mar 6, 2024
5 checks passed
@derekbit derekbit deleted the issue-8098 branch April 20, 2024 13:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants