
model predict error #991

Closed
jiangchao2014 opened this issue Sep 19, 2019 · 12 comments
jiangchao2014 commented Sep 19, 2019

When I use loadTF to load a model and predict, I get an error, and I can't understand the error message. My input JTensor shape is 1 × 224 × 224 × 3. Please help me, thanks!

My input is:
`image width:224 height:224 brands:3

inputs:[[JTensor{data=[196.0, 191.0, 195.0, 197.0, 193.0, 194.0, 202.0, 193.0, 194.0, 203.0, 194.0, 195.0, 195.0, 187.0, 185.0, 199.0, 189.0, 188.0, 202.0, 190.0, 192.0, 199.0, 189.0, 190.0, 200.0, 191.0, 194.0, 201.0, 191.0, 190.0, 197.0, 189.0, 186.0, 201.0, 191.0, 189.0, 201.0, 191.0, 190.0, 200.0, 189.0, 193.0, 202.0, 191.0, 195.0, 200.0, 194.0, 196.0, 203.0, 194.0, 199.0, 202.0, 191.0, 195.0, 201.0, 191.0, 190.0, 196.0, 188.0, 185.0, 199.0, 189.0, 190.0, 200.0, 190.0, 189.0, 202.0, 193.0, 188.0, 201.0, 190.0, 186.0, 200.0, 192.0, 190.0, 201.0, 196.0, 193.0, 198.0, 192.0, 192.0, 201.0, 192.0, 195.0, 201.0, 196.0, 193.0, 198.0, 193.0, 190.0, 203.0, 193.0, 192.0, 205.0, 194.0, 198.0, 202.0, 194.0, 191.0, 203.0, 195.0, 193.0, 204.0, 196.0, 194.0, 201.0, 196.0, 192.0, 203.0, 195.0, 192.0, 200.0, 195.0, 192.0, 207.0, 206.0, 202.0, 210.0, 218.0, 220.0, 210.0, 220.0, 221.0, 206.0, 215.0, 214.0, 206.0, 204.0, 205.0, 205.0, 200.0, 197.0, 199.0, 194.0, 191.0, 200.0, 197.0, 192.0, 202.0, 194.0, 191.0, 202.0, 193.0, 188.0, 204.0, 194.0, 193.0, 202.0, 197.0, 193.0, 202.0, 194.0, 192.0, 202.0, 196.0, 196.0, 205.0, 198.0, 192.0, 202.0, 197.0, 194.0, 198.0, 194.0, 195.0, 200.0, 192.0, 190.0, 205.0, 195.0, 193.0, 204.0, 194.0, 193.0, 205.0, 197.0, 194.0, 201.0, 193.0, 190.0, 204.0, 193.0, 189.0, 204.0, 193.0, 191.0, 201.0, 190.0, 186.0, 200.0, 190.0, 181.0, 200.0, 190.0, 181.0, 202.0, 193.0, 188.0, 203.0, 193.0, 191.0, 204.0, 193.0, 191.0, 203.0, 191.0, 191.0, 203.0, 193.0, 194.0, 206.0, 196.0, 195.0, 206.0, 198.0, 196.0, 200.0, 192.0, 189.0, 202.0, 195.0, 189.0, 204.0, 194.0, 192.0, 203.0, 193.0, 192.0, 203.0, 194.0, 195.0, 207.0, 195.0, 195.0, 201.0, 196.0, 192.0, 204.0, 199.0, 196.0, 202.0, 197.0, 193.0, 202.0, 199.0, 192.0, 203.0, 198.0, 195.0, 203.0, 199.0, 198.0, 208.0, 214.0, 212.0, 212.0, 222.0, 221.0, 212.0, 222.0, 221.0, 210.0, 212.0, 209.0, 203.0, 198.0, 195.0, 202.0, 198.0, 197.0, 202.0, 192.0, 191.0, 204.0, 195.0, 190.0, 202.0, 194.0, 191.0, 204.0, 194.0, 193.0, 
206.0, 197.0, 192.0, 205.0, 197.0, 195.0, 205.0, 195.0, 194.0, 201.0, 193.0, 191.0, 204.0, 194.0, 192.0, 204.0, 196.0, 193.0, 201.0, 193.0, 191.0, 201.0, 191.0, 192.0, 201.0, 196.0, 193.0, 199.0, 195.0, 192.0, 200.0, 195.0, 191.0, 206.0, 194.0, 194.0, 203.0, 194.0, 197.0, 206.0, 195.0, 199.0, 205.0, 197.0, 195.0, 207.0, 197.0, 198.0, 206.0, 196.0, 194.0, 205.0, 196.0, 197.0, 203.0, 194.0, 195.0, 200.0, 195.0, 192.0, 202.0, 196.0, 198.0, 202.0, 197.0, 194.0, 205.0, 197.0, 194.0, 204.0, 196.0, 193.0, 205.0, 200.0, 197.0, 203.0, 194.0, 195.0, 202.0, 197.0, 194.0, 197.0, 191.0, 191.0, 199.0, 194.0, 191.0, 207.0, 196.0, 194.0, 204.0, 194.0, 193.0, 200.0, 190.0, 189.0, 205.0, 197.0, 195.0, 204.0, 196.0, 194.0, 204.0, 199.0, 195.0, 211.0, 213.0, 212.0, 216.0, 226.0, 225.0, 215.0, 225.0, 224.0, 211.0, 215.0, 214.0, 206.0, 205.0, 201.0, 209.0, 200.0, 201.0, 206.0, 195.0, 199.0, 205.0, 195.0, 196.0, 201.0, 193.0, 191.0, 203.0, 193.0, 191.0, 203.0, 193.0, 194.0, 201.0, 196.0, 190.0, 199.0, 194.0, 191.0, 205.0, 200.0, 197.0, 204.0, 199.0, 195.0, 202.0, 197.0, 194.0, 202.0, 196.0, 196.0, 201.0, 195.0, 195.0, 204.0, 198.0, 198.0, 208.0, 204.0, 203.0, 203.0, 202.0, 198.0, 206.0, 202.0, 199.0, 206.0, 202.0, 199.0, 207.0, 202.0, 206.0, 204.0, 202.0, 203.0, 203.0, 202.0, 198.0, 202.0, 201.0, 199.0, 206.0, 206.0, 204.0, 206.0, 206.0, 206.0, 204.0, 205.0, 200.0, 203.0, 203.0, 203.0, 205.0, 206.0, 208.0, 208.0, 210.0, 209.0, 208.0, 208.0, 210.0, 211.0, 213.0, 212.0, 210.0, 212.0, 211.0, 215.0, 215.0, 217.0, 213.0, 213.0, 213.0, ... ], shape=[1, 224, 224, 3]}]]`

`------------------------------------------------------------
The program finished with the following exception:

org.apache.flink.client.program.ProgramInvocationException: Job failed. (JobID: 0a0f590771d0bd17dedbb84d00d58e6f)
at org.apache.flink.client.program.rest.RestClusterClient.submitJob(RestClusterClient.java:268)
at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:483)
at org.apache.flink.streaming.api.environment.StreamContextEnvironment.execute(StreamContextEnvironment.java:66)
at org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.execute(StreamExecutionEnvironment.java:1509)
at com.garbage.example.RunMain.main(RunMain.java:17)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:529)
at org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:421)
at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:423)
at org.apache.flink.client.cli.CliFrontend.executeProgram(CliFrontend.java:813)
at org.apache.flink.client.cli.CliFrontend.runProgram(CliFrontend.java:287)
at org.apache.flink.client.cli.CliFrontend.run(CliFrontend.java:213)
at org.apache.flink.client.cli.CliFrontend.parseParameters(CliFrontend.java:1050)
at org.apache.flink.client.cli.CliFrontend.lambda$main$11(CliFrontend.java:1126)
at org.apache.flink.runtime.security.NoOpSecurityContext.runSecured(NoOpSecurityContext.java:30)
at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:1126)
Caused by: org.apache.flink.runtime.client.JobExecutionException: Job execution failed.
at org.apache.flink.runtime.jobmaster.JobResult.toJobExecutionResult(JobResult.java:146)
at org.apache.flink.client.program.rest.RestClusterClient.submitJob(RestClusterClient.java:265)
... 18 more
Caused by: java.lang.UnsatisfiedLinkError: com.intel.analytics.zoo.pipeline.inference.OpenVinoInferenceSupportive.predict(J[F[I)Lcom/intel/analytics/zoo/pipeline/inference/JTensor;
at com.intel.analytics.zoo.pipeline.inference.OpenVinoInferenceSupportive.predict(Native Method)
at com.intel.analytics.zoo.pipeline.inference.OpenVINOModel$$anonfun$predict$1.apply(OpenVINOModel.scala:37)
at com.intel.analytics.zoo.pipeline.inference.OpenVINOModel$$anonfun$predict$1.apply(OpenVINOModel.scala:31)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at scala.collection.Iterator$class.foreach(Iterator.scala:891)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1334)
at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
at scala.collection.AbstractTraversable.map(Traversable.scala:104)
at com.intel.analytics.zoo.pipeline.inference.OpenVINOModel.predict(OpenVINOModel.scala:31)
at com.intel.analytics.zoo.pipeline.inference.InferenceModel.com$intel$analytics$zoo$pipeline$inference$InferenceModel$$predict(InferenceModel.scala:550)
at com.intel.analytics.zoo.pipeline.inference.InferenceModel$$anonfun$doPredict$2.apply(InferenceModel.scala:507)
at com.intel.analytics.zoo.pipeline.inference.InferenceModel$$anonfun$doPredict$2.apply(InferenceModel.scala:504)
at com.intel.analytics.zoo.pipeline.inference.InferenceSupportive$class.timing(InferenceSupportive.scala:42)
at com.intel.analytics.zoo.pipeline.inference.InferenceModel.timing(InferenceModel.scala:29)
at com.intel.analytics.zoo.pipeline.inference.InferenceModel.doPredict(InferenceModel.scala:504)
at com.intel.analytics.zoo.pipeline.inference.AbstractInferenceModel.predict(AbstractInferenceModel.java:133)
at com.garbage.example.PredictFlatMap.flatMap(PredictFlatMap.java:68)
at com.garbage.example.PredictFlatMap.flatMap(PredictFlatMap.java:23)
at org.apache.flink.streaming.api.operators.StreamFlatMap.processElement(StreamFlatMap.java:50)
at org.apache.flink.streaming.runtime.io.StreamInputProcessor.processInput(StreamInputProcessor.java:202)
at org.apache.flink.streaming.runtime.tasks.OneInputStreamTask.run(OneInputStreamTask.java:105)
at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:300)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:711)
at java.lang.Thread.run(Thread.java:748)`

@qiyuangong
Contributor

Hi @jiangchao2014
In most cases, this error is caused by a native shared library mismatch.

To fix or avoid that error, please switch to Ubuntu 16.04 or higher (recommended by OpenVINO). CentOS is not a good choice, because its default gcc version is lower than OpenVINO requires.

If you don't have an extra server with the required OS, you can use solution 2 in this link.

@jiangchao2014
Author

jiangchao2014 commented Sep 20, 2019

@qiyuangong Thanks. The OS is Ubuntu 16, so I don't think that is the reason. I have joined the Tianchi competition, and some people have run the program normally in the same environment.
Could there be another cause?
Looking forward to your reply.

@jiangchao2014
Author

up,up,up

@qiyuangong
Contributor

Well, this problem is definitely caused by JNI. But you are using Ubuntu 16.04, and other people's applications work well in this environment. Please print the system environment with the `env` command, and give us more details about your configuration.

@glorysdj
Contributor

@jiangchao2014 Could you please also share the code in PredictFlatMap?

@glorysdj
Contributor

The root cause of this issue is that the user broadcast the OpenVINO model to the Flink slots, but the OpenVINO model is a native object and cannot be broadcast. Please put the model-loading logic in the open function of the PredictFlatMap class; that way the bytes of the model are broadcast and the model is loaded on each Flink slot, and this issue will not occur. Please have a try. Thanks.
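The constructor-vs-open() distinction can be illustrated with a self-contained sketch (plain Java, no Flink dependency; `PredictFn` and `loadNative` are hypothetical stand-ins for the real `PredictFlatMap` and JNI load). Flink serializes the function object on the client and deserializes a copy in every task slot, so a native handle created in the constructor is lost in transit, while state created in `open()` is rebuilt locally on each slot:

```java
import java.io.*;

// Sketch: why a native model handle must be created in open(), not the constructor.
class PredictFn implements Serializable {
    private final byte[] modelBytes;     // serializable: travels to each slot
    private transient long nativeHandle; // native pointer: does NOT survive serialization

    PredictFn(byte[] modelBytes) {
        this.modelBytes = modelBytes;
        // Loading here would set nativeHandle only on the client JVM.
    }

    // Called once per task slot after deserialization (like RichFunction.open()).
    void open() {
        this.nativeHandle = loadNative(modelBytes); // rebuild the pointer locally
    }

    boolean isLoaded() { return nativeHandle != 0; }

    private static long loadNative(byte[] bytes) {
        return bytes.length; // stand-in for a real JNI load returning a pointer
    }

    // Simulate Flink shipping the function instance to a task slot.
    static PredictFn roundTrip(PredictFn fn) throws Exception {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(fn);
        }
        try (ObjectInputStream ois = new ObjectInputStream(
                new ByteArrayInputStream(bos.toByteArray()))) {
            return (PredictFn) ois.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        PredictFn fn = new PredictFn(new byte[]{1, 2, 3});
        fn.open();                              // "loaded" on the client
        PredictFn shipped = roundTrip(fn);
        System.out.println(shipped.isLoaded()); // false: transient handle was lost
        shipped.open();                         // correct place: reload on the slot
        System.out.println(shipped.isLoaded()); // true
    }
}
```

Calling predict on the shipped copy without `open()` is exactly the situation that produced the `UnsatisfiedLinkError`-style failure above: the deserialized instance holds no valid native state.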

@qiyuangong
Contributor

@jiangchao2014

Has this issue been addressed?

@jiangchao2014
Author

jiangchao2014 commented Oct 3, 2019

@qiyuangong @glorysdj Sorry, I was on National Day holiday. This is the address of PredictFlatMap; thanks for the answers. PredictFlatMap

@jiangchao2014
Author

@qiyuangong @glorysdj Thanks, I have solved it by moving loadTF into the open function, although I still don't understand the difference between the open function and the constructor. I am new to Flink.

@qiyuangong
Contributor

@jiangchao2014

When working with JNI, we use a native pointer (a `float *`) for the loaded OpenVINO model. This pointer refers to the model object on the node that loaded it, so if the pointer itself is broadcast to the cluster, other nodes cannot reach the loaded model.

@glorysdj added an open function to handle this problem by broadcasting the model bytes and loading the model on each remote node.

Issue closed. Feel free to reopen it if you have further questions.

@jason-dai reopened this Oct 5, 2019
@jason-dai
Contributor

How can we make sure that users won't make this mistake in their code? Or how can we report the error properly?

@qiyuangong
Contributor

This issue is out of date and is superseded by https://github.com/intel-analytics/analytics-zoo-core/issues/99. It will be fixed through a zoo-core patch.

@liu-shaojun transferred this issue from intel-analytics/BigDL-2.x Mar 5, 2024