
No Operation named [init_all_tables] in the Graph when using GPT2Transformer #14110

Closed · SidWeng opened this issue Dec 26, 2023 · 4 comments · Fixed by #14164

SidWeng commented Dec 26, 2023

Is there an existing issue for this?

  • I have searched the existing issues and did not find a match.

What are you working on?

I followed 14.0.GPT2_Transformer_In_SparkNLP.ipynb. However, it threw:

java.lang.IllegalArgumentException: No Operation named [init_all_tables] in the Graph
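For reference, the step that fails looks roughly like the following. This is a minimal sketch based on the notebook; the variable names, generation parameters, and example input are illustrative, not copied verbatim, though "gpt2" is the default pretrained model name.

# Minimal sketch of the notebook's GPT2 pipeline; names are illustrative.
import sparknlp
from pyspark.ml import Pipeline
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import GPT2Transformer

spark = sparknlp.start()

document_assembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("documents")

gpt2 = GPT2Transformer.pretrained("gpt2") \
    .setInputCols(["documents"]) \
    .setMaxOutputLength(50) \
    .setOutputCol("generation")

data = spark.createDataFrame([["My name is Leonardo."]]).toDF("text")
result = Pipeline().setStages([document_assembler, gpt2]).fit(data).transform(data)

# The exception surfaces when the generated text is materialized:
result.select("generation.result").show(truncate=False)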

Current Behavior

Py4JJavaError: An error occurred while calling o215.showString.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 4.0 failed 4 times, most recent failure: Lost task 0.3 in stage 4.0 (TID 45) (10.0.0.10 executor 0): java.lang.IllegalArgumentException: No Operation named [init_all_tables] in the Graph
	at org.tensorflow.Graph.outputOrThrow(Graph.java:211)
	at org.tensorflow.Session$Runner.addTarget(Session.java:406)
	at com.johnsnowlabs.ml.tensorflow.TensorflowWrapper$.runRestoreNewInit$1(TensorflowWrapper.scala:388)
	at com.johnsnowlabs.ml.tensorflow.TensorflowWrapper$.com$johnsnowlabs$ml$tensorflow$TensorflowWrapper$$processInitAllTableOp(TensorflowWrapper.scala:402)
	at com.johnsnowlabs.ml.tensorflow.TensorflowWrapper.getTFSessionWithSignature(TensorflowWrapper.scala:146)
	at com.johnsnowlabs.ml.ai.GPT2.tag(GPT2.scala:136)
	at com.johnsnowlabs.ml.ai.GPT2.$anonfun$predict$1(GPT2.scala:90)
	at scala.collection.TraversableLike.$anonfun$flatMap$1(TraversableLike.scala:293)
	at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36)
	at scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33)
	at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:198)
	at scala.collection.TraversableLike.flatMap(TraversableLike.scala:293)
	at scala.collection.TraversableLike.flatMap$(TraversableLike.scala:290)
	at scala.collection.mutable.ArrayOps$ofRef.flatMap(ArrayOps.scala:198)
	at com.johnsnowlabs.ml.ai.GPT2.predict(GPT2.scala:76)
	at com.johnsnowlabs.nlp.annotators.seq2seq.GPT2Transformer.batchAnnotate(GPT2Transformer.scala:459)
	at com.johnsnowlabs.nlp.HasBatchedAnnotate.$anonfun$batchProcess$1(HasBatchedAnnotate.scala:59)
	at scala.collection.Iterator$$anon$11.nextCur(Iterator.scala:486)
	at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:492)
	at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
	at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage2.processNext(Unknown Source)
	at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
	at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:760)
	at org.apache.spark.sql.execution.SparkPlan.$anonfun$getByteArrayRdd$1(SparkPlan.scala:364)
	at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:890)
	at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:890)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:365)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:329)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
	at org.apache.spark.scheduler.Task.run(Task.scala:136)
	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:548)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1504)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:551)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:750)

Expected Behavior

It should print the contents of generation.result, just like the example output of 14.0.GPT2_Transformer_In_SparkNLP.ipynb.

Steps To Reproduce

Run 14.0.GPT2_Transformer_In_SparkNLP.ipynb with the following Spark config:

spark.sql.extensions                               io.delta.sql.DeltaSparkSessionExtension
spark.sql.catalog.spark_catalog                    org.apache.spark.sql.delta.catalog.DeltaCatalog
spark.sql.hive.metastore.version                   2.3.9
spark.hadoop.datanucleus.autoCreateSchema          true
spark.hadoop.datanucleus.fixedDatastore            false
spark.hadoop.datanucleus.schema.autoCreateTables   true
spark.hadoop.javax.jdo.option.ConnectionURL        jdbc:mysql://mysql-xxxxx
spark.hadoop.javax.jdo.option.ConnectionUserName   xxxxx
spark.hadoop.javax.jdo.option.ConnectionPassword   xxxxx
spark.hadoop.javax.jdo.option.ConnectionDriverName com.mysql.cj.jdbc.Driver
spark.sql.warehouse.dir                            hdfs://xxxxx
spark.sql.catalogImplementation                    hive
spark.databricks.delta.schema.autoMerge.enabled    true
spark.driver.cores                                 1
spark.driver.memory                                10g
spark.executor.cores                               1
spark.executor.memory                              7g
spark.dynamicAllocation.enabled                    true
spark.shuffle.service.enabled                      true
spark.dynamicAllocation.minExecutors               1
spark.serializer                                   org.apache.spark.serializer.KryoSerializer
spark.kryo.registrator                             org.bdgenomics.adam.serialization.ADAMKryoRegistrator
spark.executor.extraJavaOptions                    -XX:+UseG1GC
spark.hadoop.io.compression.codecs                 org.seqdoop.hadoop_bam.util.BGZFEnhancedGzipCodec
spark.local.dir                                    /mnt
spark.port.maxRetries                              64
spark.kryoserializer.buffer.max                    1000M
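These are spark-defaults.conf-style entries. For anyone reproducing this programmatically rather than through spark-shell, applying the same keys when building the session would look roughly like this sketch; the redacted placeholders (xxxxx) are kept as-is and only a few entries are shown.

# Sketch only: applying some of the settings above when building the session.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
    .config("spark.kryoserializer.buffer.max", "1000M")
    .config("spark.sql.warehouse.dir", "hdfs://xxxxx")  # placeholder from the list above
    # ...remaining entries from the list above...
    .getOrCreate()
)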

Spark NLP version and Apache Spark

Spark NLP: 5.2.0
Spark: 3.3.0
pyspark: 3.3.1

Type of Spark Application

spark-shell

Java Version

openjdk version "1.8.0_392"

Java Home Directory

/usr/lib/jvm/java-1.8.0-openjdk-amd64

Setup and installation

pip install spark-nlp==5.2.0

Operating System and Version

Ubuntu 20.04

Link to your project (if available)

No response

Additional Information

spark-nlp-assembly-5.2.0.jar is placed under SPARK_HOME/jars.

@mdrobena commented:

Hi,

I have the exact same issue.

My environment:

Databricks runtime version 14.0 ML (includes Apache Spark 3.5.0, Scala 2.12)
Spark NLP: 5.2.2

@DevinTDHa (Member) commented:

Hi @SidWeng @mdrobena,

I tried to reproduce it, but there isn't an issue on my side. Could you detail your steps to recreate this issue? Additionally, could you try updating Spark NLP to the latest version and see if there is any difference?

I tried the following combinations of settings:

  1. Run the notebook on Colab with your specified version, applying the config
  2. Run the notebook locally with the specified versions, applying the config
  3. Run the notebook on Databricks with the same runtime and specified versions, following Install Spark NLP on Databricks

In all of them, I got the results without any problems. Thanks for reporting!

@mdrobena commented:

Hi @DevinTDHa,

Thanks for your reply. I just tried to run the first task (GPT2 Pipeline) as shown here.

I have updated both the PyPI spark-nlp package and the Maven library. This is my current environment:

  1. Azure Databricks cluster with 14.0 ML runtime
  2. Spark 3.5.0
  3. Spark Maven library com.johnsnowlabs.nlp:spark-nlp_2.12:5.2.3 following Install Spark NLP on Databricks
  4. PyPI spark-nlp==5.2.3

However, I still get the same error:
org.apache.spark.SparkException: Job aborted due to stage failure: Task 2 in stage 23.0 failed 4 times, most recent failure: Lost task 2.3 in stage 23.0 (TID 173) (10.139.64.12 executor 1): java.lang.IllegalArgumentException: No Operation named [init_all_tables] in the Graph

I have attached my notebook, GPT2Transformer_ OpenAI Text-To-Text Transformer.zip, to reproduce the error.

@DevinTDHa (Member) commented:

Hi @mdrobena,

Thanks for the thorough description! I was able to reproduce it with your instructions. Importantly, it needs to be run on a multi-node setup. I am looking into this problem.
