StackOverflowError #87

Closed
skundrik opened this issue May 4, 2021 · 17 comments · Fixed by #99 or #107
Labels
bug Something isn't working

Comments

@skundrik

skundrik commented May 4, 2021

Version report

Jenkins versions report:

Jenkins: 2.263.4
OS: Linux - 2.6.32-754.36.1.el6.x86_64
---
opentelemetry:0.9

Reproduction steps

  • Installed opentelemetry plugin 0.9
  • Deleted a job build
  • Observed the following in the logs
2021-05-03 16:27:52.627+0000 [id=128871]        WARNING h.model.listeners.RunListener#report: RunListener failed
java.lang.StackOverflowError
        at hudson.model.AbstractItem.getFullName(AbstractItem.java:477)
        at hudson.model.AbstractItem.getFullName(AbstractItem.java:477)
        at hudson.model.AbstractItem.getFullName(AbstractItem.java:477)
        at io.jenkins.plugins.opentelemetry.job.OtelTraceService$RunIdentifier.fromRun(OtelTraceService.java:306)
        at io.jenkins.plugins.opentelemetry.job.OtelTraceService.getSpan(OtelTraceService.java:70)
        at io.jenkins.plugins.opentelemetry.job.OtelTraceService.setupContext(OtelTraceService.java:219)
        at io.jenkins.plugins.opentelemetry.job.opentelemetry.OtelContextAwareAbstractRunListener.onDeleted(OtelContextAwareAbstractRunListener.java:99)
        at io.jenkins.plugins.opentelemetry.job.MonitoringRunListener._onDeleted(MonitoringRunListener.java:243)
        at io.jenkins.plugins.opentelemetry.job.opentelemetry.OtelContextAwareAbstractRunListener.onDeleted(OtelContextAwareAbstractRunListener.java:100)
        at io.jenkins.plugins.opentelemetry.job.MonitoringRunListener._onDeleted(MonitoringRunListener.java:243)
        at io.jenkins.plugins.opentelemetry.job.opentelemetry.OtelContextAwareAbstractRunListener.onDeleted(OtelContextAwareAbstractRunListener.java:100)
        at io.jenkins.plugins.opentelemetry.job.MonitoringRunListener._onDeleted(MonitoringRunListener.java:243)
        at io.jenkins.plugins.opentelemetry.job.opentelemetry.OtelContextAwareAbstractRunListener.onDeleted(OtelContextAwareAbstractRunListener.java:100)
...
@skundrik skundrik added the bug Something isn't working label May 4, 2021
@cyrille-leclerc
Contributor

That's very unexpected. It looks like infinite recursion through AbstractItem.getParent(), as if getParent() were returning this.

Do you know which job type is affected? Do you use any plugins that could create special folders?

https://github.com/jenkinsci/jenkins/blob/jenkins-2.263.4/core/src/main/java/hudson/model/AbstractItem.java#L477

    @Exported
    public final String getFullName() {
        String n = getParent().getFullName();
        if(n.length()==0)   return getName();
        else                return n+'/'+getName();
    }
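
To illustrate the hypothesis above, here is a minimal sketch of the same recursion outside Jenkins. The Node class, its fields, and the overflows() helper are hypothetical stand-ins, not Jenkins types. The sketch only mirrors the shape of getFullName(): the walk ends when it reaches a root, and it never ends if an item's parent is the item itself.

```java
class FullNameDemo {
    // Hypothetical stand-in for an item in a hierarchy; not a Jenkins class.
    static class Node {
        private final String name;
        private Node parent; // null marks the root in this sketch

        Node(String name, Node parent) {
            this.name = name;
            this.parent = parent;
        }

        void setParent(Node parent) {
            this.parent = parent;
        }

        // Same shape as the quoted getFullName(): ask the parent first,
        // then append our own name.
        String getFullName() {
            String n = (parent == null) ? "" : parent.getFullName();
            return n.isEmpty() ? name : n + '/' + name;
        }
    }

    // Returns true if computing the full name exhausts the stack.
    static boolean overflows(Node node) {
        try {
            node.getFullName();
            return false;
        } catch (StackOverflowError e) {
            return true;
        }
    }

    public static void main(String[] args) {
        Node folder = new Node("folder", null);
        Node job = new Node("job", folder);
        System.out.println(job.getFullName()); // folder/job

        Node broken = new Node("broken", null);
        broken.setParent(broken); // a parent that is the item itself
        System.out.println(overflows(broken)); // true
    }
}
```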

@skundrik
Author

skundrik commented May 5, 2021

I'll try to double-check the job type. I'm not sure how easy that will be, since there is little indication of which job failed.

With regard to folder plugins, we have the standard

cloudbees-folder:6.15

but also have

gitlab-branch-source:1.5.7
workflow-multibranch:2.23

which I guess create special folders.

@cyrille-leclerc
Contributor

Thanks. We will have to look at the existing log messages to see if we can better understand. Otherwise, we may have to update the code to capture exception details.

@skundrik do you agree that this looks like a problem with the nature of the "pipeline full name"? That is, a mismatch where the Otel Plugin calls AbstractItem.getFullName() in a situation that other pieces of Jenkins code would never encounter.

@cyrille-leclerc
Contributor

FYI troubleshooting version in progress #99

OpenTelemetry Plugin Board automation moved this from In progress to Done May 7, 2021
OpenTelemetry Plugin Board automation moved this from Done to Backlog May 7, 2021
@cyrille-leclerc
Contributor

cyrille-leclerc commented May 7, 2021

We have created a new release to capture more details on the StackOverflowError and identify the cause.

Can you please test with https://github.com/jenkinsci/opentelemetry-plugin/releases/tag/opentelemetry-0.10-beta and search in the logs for the following log messages:

https://github.com/jenkinsci/opentelemetry-plugin/blob/opentelemetry-0.10-beta/src/main/java/io/jenkins/plugins/opentelemetry/job/OtelTraceService.java#L319

LOGGER.log(Level.WARNING, "Issue #87: StackOverflowError getting job name for " + jobName + "#" + run.getNumber());

or

LOGGER.log(Level.WARNING, "Issue #87: StackOverflowError getting job name for unknown job #" + run.getNumber());
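
For readers unfamiliar with this kind of diagnostic, here is a rough sketch of how a lookup can be guarded so that a StackOverflowError is logged with some context instead of only propagating. It is not the plugin's actual code: the OverflowDiagnostics class, the safeJobName() helper, and its parameters are assumptions made for illustration, and only java.util.logging is used.

```java
import java.util.function.Supplier;
import java.util.logging.Level;
import java.util.logging.Logger;

class OverflowDiagnostics {
    private static final Logger LOGGER = Logger.getLogger(OverflowDiagnostics.class.getName());

    // Hypothetical helper: evaluates nameLookup and, if it overflows the stack,
    // logs a warning with the build number and returns the fallback value.
    static String safeJobName(Supplier<String> nameLookup, int buildNumber, String fallback) {
        try {
            return nameLookup.get();
        } catch (StackOverflowError e) {
            // The stack has unwound to this frame, so there is room to log again.
            LOGGER.log(Level.WARNING, "StackOverflowError getting job name for unknown job #" + buildNumber);
            return fallback;
        }
    }

    // Deliberately unbounded recursion, used only to trigger the error.
    static String endless() {
        return endless();
    }

    public static void main(String[] args) {
        System.out.println(safeJobName(() -> "folder/job", 1, "unknown"));
        System.out.println(safeJobName(OverflowDiagnostics::endless, 2, "unknown"));
    }
}
```

A guard like this only logs when the error is thrown inside the guarded call; an overflow that surfaces in a different frame bypasses it.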

@skundrik
Copy link
Author

skundrik commented May 7, 2021

I will install and report findings.

@skundrik
Author

I have installed 0.10-beta and I have checked my logs but can't find the log message.

@cyrille-leclerc
Contributor

cyrille-leclerc commented May 10, 2021

@skundrik do you still see problems like the StackOverflowError? I'm wondering whether we caught the wrong thing or whether your problem "disappeared".

@skundrik
Author

Sorry for not making that clearer. I can still see the StackOverflowError, although it now seems to appear in different variations, such as:

java.lang.StackOverflowError
        at hudson.model.AbstractItem.getFullDisplayName(AbstractItem.java:484)
        at hudson.model.AbstractItem.getFullDisplayName(AbstractItem.java:484)
        at hudson.model.Run.getFullDisplayName(Run.java:825)
        at io.jenkins.plugins.opentelemetry.job.OtelTraceService.getSpan(OtelTraceService.java:79)
        at io.jenkins.plugins.opentelemetry.job.OtelTraceService.getSpan(OtelTraceService.java:70)
        at io.jenkins.plugins.opentelemetry.job.OtelTraceService.setupContext(OtelTraceService.java:226)
        at io.jenkins.plugins.opentelemetry.job.opentelemetry.OtelContextAwareAbstractRunListener.onDeleted(OtelContextAwareAbstractRunListener.java:99)
        at io.jenkins.plugins.opentelemetry.job.MonitoringRunListener._onDeleted(MonitoringRunListener.java:243)
        at io.jenkins.plugins.opentelemetry.job.opentelemetry.OtelContextAwareAbstractRunListener.onDeleted(OtelContextAwareAbstractRunListener.java:100)
        at io.jenkins.plugins.opentelemetry.job.MonitoringRunListener._onDeleted(MonitoringRunListener.java:243)
        at io.jenkins.plugins.opentelemetry.job.opentelemetry.OtelContextAwareAbstractRunListener.onDeleted(OtelContextAwareAbstract
java.lang.StackOverflowError
        at java.util.Arrays.hashCode(Arrays.java:4146)
        at java.util.Objects.hash(Objects.java:128)
        at io.jenkins.plugins.opentelemetry.job.OtelTraceService$RunIdentifier.hashCode(OtelTraceService.java:343)
        at java.util.concurrent.ConcurrentHashMap.computeIfAbsent(ConcurrentHashMap.java:1646)
        at io.jenkins.plugins.opentelemetry.job.OtelTraceService.getSpan(OtelTraceService.java:76)
        at io.jenkins.plugins.opentelemetry.job.OtelTraceService.getSpan(OtelTraceService.java:70)
        at io.jenkins.plugins.opentelemetry.job.OtelTraceService.setupContext(OtelTraceService.java:226)
        at io.jenkins.plugins.opentelemetry.job.opentelemetry.OtelContextAwareAbstractRunListener.onDeleted(OtelContextAwareAbstractRunListener.java:99)
        at io.jenkins.plugins.opentelemetry.job.MonitoringRunListener._onDeleted(MonitoringRunListener.java:243)
        at io.jenkins.plugins.opentelemetry.job.opentelemetry.OtelContextAwareAbstractRunListener.onDeleted(OtelContextAwareAbstractRunListener.java:100)
        at io.jenkins.plugins.opentelemetry.job.MonitoringRunListener._onDeleted(MonitoringRunListener.java:243)
        at io.jenkins.plugins.opentelemetry.job.opentelemetry.OtelContextAwareAbstractRunListener.onDeleted(OtelContextAwareAbstract
java.lang.StackOverflowError
        at java.util.AbstractMap.toString(AbstractMap.java:547)
        at com.google.common.collect.AbstractMapBasedMultimap$AsMap.toString(AbstractMapBasedMultimap.java:1323)
        at com.google.common.collect.AbstractMultimap.toString(AbstractMultimap.java:253)
        at com.google.common.collect.ArrayListMultimap.toString(ArrayListMultimap.java:65)
        at java.lang.String.valueOf(String.java:2994)
        at java.lang.StringBuilder.append(StringBuilder.java:131)
        at io.jenkins.plugins.opentelemetry.job.OtelTraceService$RunSpans.toString(OtelTraceService.java:244)
        at java.lang.String.valueOf(String.java:2994)
        at java.lang.StringBuilder.append(StringBuilder.java:131)
        at io.jenkins.plugins.opentelemetry.job.OtelTraceService.getSpan(OtelTraceService.java:79)
        at io.jenkins.plugins.opentelemetry.job.OtelTraceService.getSpan(OtelTraceService.java:70)
        at io.jenkins.plugins.opentelemetry.job.OtelTraceService.setupContext(OtelTraceService.java:226)
        at io.jenkins.plugins.opentelemetry.job.opentelemetry.OtelContextAwareAbstractRunListener.onDeleted(OtelContextAwareAbstractRunListener.java:99)
        at io.jenkins.plugins.opentelemetry.job.MonitoringRunListener._onDeleted(MonitoringRunListener.java:243)
        at io.jenkins.plugins.opentelemetry.job.opentelemetry.OtelContextAwareAbstractRunListener.onDeleted(OtelContextAwareAbstract

@cyrille-leclerc
Contributor

@skundrik the StackOverflowErrors on Arrays.hashCode() and AbstractMap.toString() make me think that the problem may not reside in AbstractItem.getFullDisplayName() but perhaps in your JVM parameters. Did you change the JVM stack size (e.g. -Xss)?

@skundrik
Author

No, it's the default size.

@skundrik
Author

This looks suspect to me:

    @Override
    public void _onDeleted(Run run) {
        super.onDeleted(run);
    }

When MonitoringRunListener.onDeleted gets called, it ends up in this implementation in the abstract parent class:

    @Override
    public final void onDeleted(Run run) {
        try (Scope ignored = getTraceService().setupContext(run)) {
            this._onDeleted(run);
        }
    }

This basically calls the one above, which in turn calls this one again, and so on until the StackOverflowError.

The MonitoringRunListener._onDeleted override seems unnecessary, as it doesn't actually provide any implementation, unlike the other _onXXXX methods.
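
The cycle described above can be reproduced in isolation. The following is a minimal sketch, not the plugin's code: ContextAwareListener, LoopingListener, and FixedListener are hypothetical stand-ins, the run is modelled as a plain String, and the tracing scope is left out. It shows the final onDeleted() delegating to _onDeleted(), an override that calls back into super.onDeleted(), and one possible fix, which is to drop the override.

```java
class DeleteListenerDemo {
    // Hypothetical stand-in for the abstract listener: the public hook is final
    // and delegates to an overridable _onDeleted().
    static abstract class ContextAwareListener {
        int calls = 0; // counts how often onDeleted() was entered

        public final void onDeleted(String run) {
            calls++;
            this._onDeleted(run);
        }

        // Default implementation: nothing to do on delete.
        public void _onDeleted(String run) {
        }
    }

    // Mirrors the suspect override: super.onDeleted() re-enters the final method,
    // which calls _onDeleted() again, and so on until the stack is exhausted.
    static class LoopingListener extends ContextAwareListener {
        @Override
        public void _onDeleted(String run) {
            super.onDeleted(run);
        }
    }

    // One possible fix: no override, so the no-op default is used.
    static class FixedListener extends ContextAwareListener {
    }

    // Returns true if delivering the event exhausts the stack.
    static boolean overflows(ContextAwareListener listener) {
        try {
            listener.onDeleted("job #1");
            return false;
        } catch (StackOverflowError e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(overflows(new LoopingListener())); // true
        System.out.println(overflows(new FixedListener()));   // false
    }
}
```

Because the error is raised in whichever frame happens to run out of stack, the topmost frames of the trace can differ from one occurrence to the next while the repeating onDeleted/_onDeleted pair underneath stays the same.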

@skundrik
Author

skundrik commented May 10, 2021

@cyrille-leclerc I think I managed to reproduce it by deleting an old build, and I have added that to the reproduction steps in the description. Can you reproduce it as well?

@cyrille-leclerc
Contributor

Great catch, @skundrik. Shame on me, it's definitely an infinite loop. I pushed PR #107.
I'll cut a new release as soon as the PR is green.

OpenTelemetry Plugin Board automation moved this from Backlog to Done May 10, 2021
cyrille-leclerc added a commit that referenced this issue May 10, 2021
@cyrille-leclerc
Contributor

@skundrik
Author

@cyrille-leclerc Can't see the error any more. 👍 LGTM

@cyrille-leclerc
Contributor

Thank you for your patience @skundrik

v1v added a commit to v1v/opentelemetry-plugin that referenced this issue Jun 21, 2021
…ailures-rate-with-provisioning

* upstream/master:
  Bump GRPC from 1.37.0 to 1.37.1
  Bump OpenTelemetry Collector from 0.26.0 to 0.27.0
  cosmetic: change the image reference
  [dashboards] Provide some dashboards to visualise the CI status
  No serialisation for the loadedStepsPlugins data structure (jenkinsci#110)
  [maven-release-plugin] prepare for next development iteration
  [maven-release-plugin] prepare release opentelemetry-0.13
  Bump the testing OpenTelemetry Collector from 0.23.0 to 0.26.0
  Bump OpenTelemetry from 1.1.0 to 1.2.0
  [maven-release-plugin] prepare for next development iteration
  [maven-release-plugin] prepare release opentelemetry-0.12-beta
  Fix jenkinsci#87 : infinite loop on delete
  [maven-release-plugin] prepare for next development iteration
  [maven-release-plugin] prepare release opentelemetry-0.11-beta
  Update src/main/java/io/jenkins/plugins/opentelemetry/job/MonitoringRunListener.java
  Fix jenkinsci#105, OpenTelemetry span attributes don't support null attributes
  Refactor: remove duplicated display name
  Default name for the ObservabilityBackend