This repository has been archived by the owner on Mar 11, 2024. It is now read-only.

NPE in test #75

Closed
etishka opened this issue Jan 24, 2018 · 6 comments

Comments

@etishka

etishka commented Jan 24, 2018

The build hangs on a Flinkspector unit test. After some time it throws a NullPointerException from inside Flinkspector code. It is not clear what causes it.

17:18:36,808 INFO org.apache.flink.runtime.minicluster.FlinkMiniCluster - Starting FlinkMiniCluster.
17:18:36,840 INFO org.apache.flink.runtime.metrics.MetricRegistryImpl - No metrics reporter configured, no metrics will be exposed/reported.
17:18:38,718 INFO akka.event.slf4j.Slf4jLogger - Slf4jLogger started
17:18:38,885 INFO org.apache.flink.runtime.blob.BlobServer - Created BLOB server storage directory /tmp/blobStore-7bcc77e5-111e-4808-8fdd-6cc199bc47c3
17:18:38,928 INFO org.apache.flink.runtime.blob.BlobServer - Started BLOB server at 0.0.0.0:41743 - max concurrent requests: 50 - max backlog: 1000
17:18:39,233 INFO org.apache.flink.runtime.jobmanager.MemoryArchivist - Started memory archivist akka://flink/user/archive_1
17:18:39,255 INFO org.apache.flink.runtime.jobmanager.JobManager - Starting JobManager at akka://flink/user/jobmanager_1.
17:18:39,258 INFO org.apache.flink.runtime.highavailability.nonha.embedded.EmbeddedLeaderService - Proposing leadership to contender org.apache.flink.runtime.jobmanager.JobManager@73c3f03f @ akka://flink/user/jobmanager_1
17:18:39,278 INFO org.apache.flink.runtime.taskexecutor.TaskManagerConfiguration - Messages have a max timeout of 1000000 ms
17:18:39,328 INFO org.apache.flink.runtime.taskexecutor.TaskManagerServices - Temporary file directory '/tmp': total 9 GB, usable 4 GB (44.44% usable)
17:18:39,332 INFO org.apache.flink.runtime.jobmanager.JobManager - JobManager akka://flink/user/jobmanager_1 was granted leadership with leader session ID Some(11ce4b11-eff3-4964-9cb7-99e5a62dfd4e).
17:18:39,393 INFO org.apache.flink.runtime.highavailability.nonha.embedded.EmbeddedLeaderService - Received confirmation of leadership for leader akka://flink/user/jobmanager_1 , session=11ce4b11-eff3-4964-9cb7-99e5a62dfd4e
17:18:39,486 INFO org.apache.flink.runtime.clusterframework.standalone.StandaloneResourceManager - Trying to associate with JobManager leader akka://flink/user/jobmanager_1
17:18:39,610 INFO org.apache.flink.runtime.io.network.buffer.NetworkBufferPool - Allocated 144 MB for network buffer pool (number of memory segments: 4612, bytes per segment: 32768).
17:18:39,639 INFO org.apache.flink.runtime.clusterframework.standalone.StandaloneResourceManager - Resource Manager associating with leading JobManager Actor[akka://flink/user/jobmanager_1#-734530174] - leader session 11ce4b11-eff3-4964-9cb7-99e5a62dfd4e
17:18:39,686 WARN org.apache.flink.runtime.query.QueryableStateUtils - Could not load Queryable State Client Proxy. Probable reason: flink-queryable-state-runtime is not in the classpath. Please put the corresponding jar from the opt to the lib folder.
17:18:39,686 WARN org.apache.flink.runtime.query.QueryableStateUtils - Could not load Queryable State Server. Probable reason: flink-queryable-state-runtime is not in the classpath. Please put the corresponding jar from the opt to the lib folder.
17:18:39,692 INFO org.apache.flink.runtime.io.network.NetworkEnvironment - Starting the network environment and its components.
17:18:39,694 INFO org.apache.flink.runtime.taskexecutor.TaskManagerServices - Limiting managed memory to 80 MB, memory will be allocated lazily.
17:18:39,713 INFO org.apache.flink.runtime.io.disk.iomanager.IOManager - I/O manager uses directory /tmp/flink-io-c0057d8c-e00b-48b4-b8e6-d04e0325babc for spill files.
17:18:39,729 INFO org.apache.flink.runtime.filecache.FileCache - User file cache uses directory /tmp/flink-dist-cache-d6c66807-0c28-450e-b9b1-5ac6c3ffb0aa
17:18:39,784 INFO org.apache.flink.runtime.filecache.FileCache - User file cache uses directory /tmp/flink-dist-cache-bf8a0fe4-1d1a-43fb-857c-9d14010da32e
17:18:39,906 INFO org.apache.flink.runtime.taskmanager.TaskManager - Starting TaskManager actor at akka://flink/user/taskmanager_1#954964096.
17:18:39,910 INFO org.apache.flink.runtime.taskmanager.TaskManager - TaskManager data connection information: 2a476f7f300008d39542925dbc4e71c0 @ localhost (dataPort=-1)
17:18:39,911 INFO org.apache.flink.runtime.taskmanager.TaskManager - TaskManager has 1 task slot(s).
17:18:39,917 INFO org.apache.flink.runtime.taskmanager.TaskManager - Memory usage stats: [HEAP: 155/348/1446 MB, NON HEAP: 29/30/-1 MB (used/committed/max)]
17:18:39,940 INFO org.apache.flink.runtime.taskmanager.TaskManager - Trying to register at JobManager akka://flink/user/jobmanager_1 (attempt 1, timeout: 500 milliseconds)
17:18:39,960 INFO org.apache.flink.runtime.instance.InstanceManager - Registered TaskManager at localhost (akka://flink/user/taskmanager_1) as 91cb3a85518a3daeec6ba199a25b1384. Current number of registered hosts is 1. Current number of alive task slots is 1.
17:18:39,969 INFO org.apache.flink.runtime.clusterframework.standalone.StandaloneResourceManager - TaskManager 2a476f7f300008d39542925dbc4e71c0 has started.
17:18:39,971 INFO org.apache.flink.runtime.taskmanager.TaskManager - Successful registration at JobManager (akka://flink/user/jobmanager_1), starting network stack and library cache.
17:18:39,990 INFO org.apache.flink.runtime.taskmanager.TaskManager - Determined BLOB server address to be localhost/127.0.0.1:41743. Starting BLOB cache.
17:18:39,996 INFO org.apache.flink.runtime.blob.PermanentBlobCache - Created BLOB cache storage directory /tmp/blobStore-594aa983-44f2-4809-b20c-2d1f04654653
17:18:40,002 INFO org.apache.flink.runtime.blob.TransientBlobCache - Created BLOB cache storage directory /tmp/blobStore-665c9ef1-bf7e-45b0-a862-b0703d0b753a
17:18:41,893 INFO org.apache.flink.runtime.client.JobSubmissionClientActor - Disconnect from JobManager null.
17:18:41,910 INFO org.apache.flink.runtime.client.JobSubmissionClientActor - Received SubmitJobAndWait(JobGraph(jobId: 27f30ea34456e47491c489bb258c77b0)) but there is no connection to a JobManager yet.
17:18:41,911 INFO org.apache.flink.runtime.client.JobSubmissionClientActor - Received job Flink Streaming Job (27f30ea34456e47491c489bb258c77b0).
17:18:41,923 INFO org.apache.flink.runtime.client.JobSubmissionClientActor - Connect to JobManager Actor[akka://flink/user/jobmanager_1#-734530174].
17:18:41,930 INFO org.apache.flink.runtime.client.JobSubmissionClientActor - Connected to JobManager at Actor[akka://flink/user/jobmanager_1#-734530174] with leader session id 11ce4b11-eff3-4964-9cb7-99e5a62dfd4e.
17:18:41,930 INFO org.apache.flink.runtime.client.JobSubmissionClientActor - Sending message to JobManager akka://flink/user/jobmanager_1 to submit job Flink Streaming Job (27f30ea34456e47491c489bb258c77b0) and wait for progress
17:18:41,943 INFO org.apache.flink.runtime.client.JobSubmissionClientActor - Upload jar files to job manager akka://flink/user/jobmanager_1.
java.lang.NullPointerException
at io.flinkspector.core.runtime.OutputHandler.processMessage(OutputHandler.java:156)
at io.flinkspector.core.runtime.OutputHandler.call(OutputHandler.java:100)
at io.flinkspector.core.runtime.OutputHandler.call(OutputHandler.java:39)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)

@lofifnc
Contributor

lofifnc commented Jan 24, 2018

Could you supply some code that triggers this exception? Without it I can only guess at the cause.

@etishka
Author

etishka commented Feb 2, 2018

It is just a generic Flinkspector test case; the exception happens before any test executes.
It only occurs on our corporate build system's Linux box (TeamCity). Developer machines and Jenkins CI work fine.

@etishka
Author

etishka commented Feb 2, 2018

BTW, looking into the code:


private Action processMessage(byte[] bytes)
        throws IOException, FlinkTestFailedException {

    if (bytes.length == 0) {
        //the subscriber has been cancelled from outside quietly finish the process
        return Action.FINISH;
    }

    if (bytes == null) {
        System.out.println("Waited too long for message from sink");
        return Action.FINISH;
    }

If bytes == null, the first check (bytes.length == 0) throws the NPE. The second check (bytes == null) can never be true, because it is only reached when bytes is non-null.
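
The failure mode described here can be reproduced in isolation. The following is a minimal sketch, not Flinkspector code: the class name `NullCheckOrderDemo`, the method `buggyOrder`, and the `Action.CONTINUE` constant are hypothetical stand-ins, and only the order of the two checks is taken from the quoted snippet.

```java
public class NullCheckOrderDemo {
    // Hypothetical stand-in for the Action type used in the quoted snippet;
    // CONTINUE is an assumption made only so the method can return something.
    enum Action { FINISH, CONTINUE }

    // Same check order as the quoted snippet: length is read before the null test.
    static Action buggyOrder(byte[] bytes) {
        if (bytes.length == 0) {   // dereferences bytes, so a null argument throws here
            return Action.FINISH;
        }
        if (bytes == null) {       // never true: a null argument has already thrown above
            return Action.FINISH;
        }
        return Action.CONTINUE;
    }

    public static void main(String[] args) {
        try {
            buggyOrder(null);
            System.out.println("no exception");
        } catch (NullPointerException e) {
            System.out.println("NullPointerException thrown by the length check");
        }
    }
}
```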

@DaveFrederick
Contributor

I ran into this bug as well. With a private patch that swaps the first two if blocks in processMessage(), the problem is resolved.
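
For readers who want to see the reordering, here is a rough sketch of the swapped checks. It is not the actual patch or the code shipped in Flinkspector: the class `ProcessMessageFixSketch`, the method `checkMessage`, and `Action.CONTINUE` are hypothetical names, and the real message handling that follows the two checks is not shown in this thread.

```java
public class ProcessMessageFixSketch {
    // Hypothetical stand-in for Flinkspector's Action type; CONTINUE is assumed.
    enum Action { FINISH, CONTINUE }

    // The two guards from the quoted snippet, with the null test moved first.
    static Action checkMessage(byte[] bytes) {
        if (bytes == null) {
            // timeout case: no message arrived from the sink
            System.out.println("Waited too long for message from sink");
            return Action.FINISH;
        }
        if (bytes.length == 0) {
            // the subscriber was cancelled from outside; finish quietly
            return Action.FINISH;
        }
        // placeholder for the real message processing, which is outside this sketch
        return Action.CONTINUE;
    }

    public static void main(String[] args) {
        System.out.println(checkMessage(null));
        System.out.println(checkMessage(new byte[0]));
        System.out.println(checkMessage(new byte[] {1}));
    }
}
```

With the null test first, a null argument takes the timeout branch instead of failing on the length access.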

@lofifnc
Contributor

lofifnc commented Feb 9, 2018

Could you open a pull request with the fix?
I'm on vacation for the next 3 weeks and don't have access to a computer, but I could at least merge it into master.

Cheers, Alex

@lofifnc
Contributor

lofifnc commented Feb 27, 2018

The bug fix has been included in release 0.8.3.

lofifnc closed this as completed Feb 27, 2018