
No Operation named [init_all_tables] in the Graph when using GPT2Transformer #14110

Closed · SidWeng opened this issue Dec 26, 2023 · 4 comments · Fixed by #14164

SidWeng commented Dec 26, 2023

Is there an existing issue for this?

  • I have searched the existing issues and did not find a match.

What are you working on?

I followed 14.0.GPT2_Transformer_In_SparkNLP.ipynb. However, it threw:

java.lang.IllegalArgumentException: No Operation named [init_all_tables] in the Graph
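For reference, the step that fails looks roughly like the following. This is a minimal sketch based on the notebook; the variable names, generation parameters, and example input are illustrative, not copied verbatim, though "gpt2" is the default pretrained model name.

# Minimal sketch of the notebook's GPT2 pipeline; names are illustrative.
import sparknlp
from pyspark.ml import Pipeline
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import GPT2Transformer

spark = sparknlp.start()

document_assembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("documents")

gpt2 = GPT2Transformer.pretrained("gpt2") \
    .setInputCols(["documents"]) \
    .setMaxOutputLength(50) \
    .setOutputCol("generation")

data = spark.createDataFrame([["My name is Leonardo."]]).toDF("text")
result = Pipeline().setStages([document_assembler, gpt2]).fit(data).transform(data)

# The exception surfaces when the generated text is materialized:
result.select("generation.result").show(truncate=False)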

Current Behavior

Py4JJavaError: An error occurred while calling o215.showString.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 4.0 failed 4 times, most recent failure: Lost task 0.3 in stage 4.0 (TID 45) (10.0.0.10 executor 0): java.lang.IllegalArgumentException: No Operation named [init_all_tables] in the Graph
	at org.tensorflow.Graph.outputOrThrow(Graph.java:211)
	at org.tensorflow.Session$Runner.addTarget(Session.java:406)
	at com.johnsnowlabs.ml.tensorflow.TensorflowWrapper$.runRestoreNewInit$1(TensorflowWrapper.scala:388)
	at com.johnsnowlabs.ml.tensorflow.TensorflowWrapper$.com$johnsnowlabs$ml$tensorflow$TensorflowWrapper$$processInitAllTableOp(TensorflowWrapper.scala:402)
	at com.johnsnowlabs.ml.tensorflow.TensorflowWrapper.getTFSessionWithSignature(TensorflowWrapper.scala:146)
	at com.johnsnowlabs.ml.ai.GPT2.tag(GPT2.scala:136)
	at com.johnsnowlabs.ml.ai.GPT2.$anonfun$predict$1(GPT2.scala:90)
	at scala.collection.TraversableLike.$anonfun$flatMap$1(TraversableLike.scala:293)
	at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36)
	at scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33)
	at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:198)
	at scala.collection.TraversableLike.flatMap(TraversableLike.scala:293)
	at scala.collection.TraversableLike.flatMap$(TraversableLike.scala:290)
	at scala.collection.mutable.ArrayOps$ofRef.flatMap(ArrayOps.scala:198)
	at com.johnsnowlabs.ml.ai.GPT2.predict(GPT2.scala:76)
	at com.johnsnowlabs.nlp.annotators.seq2seq.GPT2Transformer.batchAnnotate(GPT2Transformer.scala:459)
	at com.johnsnowlabs.nlp.HasBatchedAnnotate.$anonfun$batchProcess$1(HasBatchedAnnotate.scala:59)
	at scala.collection.Iterator$$anon$11.nextCur(Iterator.scala:486)
	at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:492)
	at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
	at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage2.processNext(Unknown Source)
	at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
	at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:760)
	at org.apache.spark.sql.execution.SparkPlan.$anonfun$getByteArrayRdd$1(SparkPlan.scala:364)
	at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:890)
	at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:890)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:365)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:329)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
	at org.apache.spark.scheduler.Task.run(Task.scala:136)
	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:548)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1504)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:551)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:750)

Expected Behavior

It should print the contents of generation.result, just like the example output of 14.0.GPT2_Transformer_In_SparkNLP.ipynb.

Steps To Reproduce

Run 14.0.GPT2_Transformer_In_SparkNLP.ipynb with the following Spark config:

spark.sql.extensions                               io.delta.sql.DeltaSparkSessionExtension
spark.sql.catalog.spark_catalog                    org.apache.spark.sql.delta.catalog.DeltaCatalog
spark.sql.hive.metastore.version                   2.3.9
spark.hadoop.datanucleus.autoCreateSchema          true
spark.hadoop.datanucleus.fixedDatastore            false
spark.hadoop.datanucleus.schema.autoCreateTables   true
spark.hadoop.javax.jdo.option.ConnectionURL        jdbc:mysql://mysql-xxxxx
spark.hadoop.javax.jdo.option.ConnectionUserName   xxxxx
spark.hadoop.javax.jdo.option.ConnectionPassword   xxxxx
spark.hadoop.javax.jdo.option.ConnectionDriverName com.mysql.cj.jdbc.Driver
spark.sql.warehouse.dir                            hdfs://xxxxx
spark.sql.catalogImplementation                    hive
spark.databricks.delta.schema.autoMerge.enabled    true
spark.driver.cores                                 1
spark.driver.memory                                10g
spark.executor.cores                               1
spark.executor.memory                              7g
spark.dynamicAllocation.enabled                    true
spark.shuffle.service.enabled                      true
spark.dynamicAllocation.minExecutors               1
spark.serializer                                   org.apache.spark.serializer.KryoSerializer
spark.kryo.registrator                             org.bdgenomics.adam.serialization.ADAMKryoRegistrator
spark.executor.extraJavaOptions                    -XX:+UseG1GC
spark.hadoop.io.compression.codecs                 org.seqdoop.hadoop_bam.util.BGZFEnhancedGzipCodec
spark.local.dir                                    /mnt
spark.port.maxRetries                              64
spark.kryoserializer.buffer.max                    1000M
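These are spark-defaults.conf-style entries. For anyone reproducing this programmatically rather than through spark-shell, applying the same keys when building the session would look roughly like this sketch; the redacted placeholders (xxxxx) are kept as-is and only a few entries are shown.

# Sketch only: applying some of the settings above when building the session.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
    .config("spark.kryoserializer.buffer.max", "1000M")
    .config("spark.sql.warehouse.dir", "hdfs://xxxxx")  # placeholder from the list above
    # ...remaining entries from the list above...
    .getOrCreate()
)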

Spark NLP version and Apache Spark

Spark NLP: 5.2.0
Spark: 3.3.0
pyspark: 3.3.1

Type of Spark Application

spark-shell

Java Version

openjdk version "1.8.0_392"

Java Home Directory

/usr/lib/jvm/java-1.8.0-openjdk-amd64

Setup and installation

pip install spark-nlp==5.2.0

Operating System and Version

Ubuntu 20.04

Link to your project (if available)

No response

Additional Information

spark-nlp-assembly-5.2.0.jar is placed under SPARK_HOME/jars.

@mdrobena commented:

Hi,

I have the exact same issue.

My environment:

Databricks runtime version 14.0 ML (includes Apache Spark 3.5.0, Scala 2.12)
Spark NLP: 5.2.2

@DevinTDHa (Member) commented:

Hi @SidWeng @mdrobena,

I tried to reproduce it, but there isn't an issue on my side. Could you detail your steps to recreate this issue? Additionally, could you try updating Spark NLP to the latest version and see if there is any difference?

I tried the following combinations of settings:

  1. Run the notebook on Colab with your specified version, applying the config
  2. Run the notebook locally with the specified versions, applying the config
  3. Run the notebook on Databricks with the same runtime and specified versions, following Install Spark NLP on Databricks

In all of them, I got the results without any problems. Thanks for reporting!

@mdrobena commented:

Hi @DevinTDHa,

Thanks for your reply. I just tried to run the first task (GPT2 Pipeline) as shown here.

I have updated both the PyPI spark-nlp package and the Maven library. This is my current environment:

  1. Azure Databricks cluster with 14.0 ML runtime
  2. Spark 3.5.0
  3. Spark Maven library com.johnsnowlabs.nlp:spark-nlp_2.12:5.2.3 following Install Spark NLP on Databricks
  4. PyPI spark-nlp==5.2.3

However, I still get the same error:
org.apache.spark.SparkException: Job aborted due to stage failure: Task 2 in stage 23.0 failed 4 times, most recent failure: Lost task 2.3 in stage 23.0 (TID 173) (10.139.64.12 executor 1): java.lang.IllegalArgumentException: No Operation named [init_all_tables] in the Graph

I have attached my notebook, GPT2Transformer_ OpenAI Text-To-Text Transformer.zip, to reproduce the error.

@DevinTDHa (Member) commented:

Hi @mdrobena,

Thanks for the thorough description! I was able to reproduce it with your instructions. Importantly, it needs to be run on a multi-node setup. I am looking into this problem.
