
org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection has grown past JVM limit of 0xFFFF #485

Closed
Tagar opened this issue Dec 6, 2017 · 20 comments

Comments

@Tagar commented Dec 6, 2017

We started getting this error on wide datasets after upgrading to the latest SW 2.2.3.
It was not happening on the previous SW release, 2.2.2.

executor 16): java.lang.RuntimeException: Error while encoding: org.codehaus.janino.JaninoRuntimeException: failed to compile: org.codehaus.janino.JaninoRuntimeException: Constant pool for class org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection has grown past JVM limit of 0xFFFF

/* 001 */ public java.lang.Object generate(Object[] references) {
/* 002 */   return new SpecificUnsafeProjection(references);
/* 003 */ }
/* 004 */
/* 005 */ class SpecificUnsafeProjection extends org.apache.spark.sql.catalyst.expressions.UnsafeProjection {
/* 006 */
/* 007 */   private Object[] references;
/* 008 */   private int argValue;
/* 009 */   private java.lang.String argValue1;
/* 010 */   private boolean isNull11;
/* 011 */   private boolean value11;
/* 012 */   private boolean isNull12;

Code:

new_df = df.drop('FILE_CODE', 'ZIP_CODE', 'ZIP_PLUS_4', 'ADDRESS_KEY', 'HOUSEHOLD_KEY', 'AGILITY_ADDRESS', 'AGILITY_HOUSEHOLD')
print "Drop vars"
skippy_binary = hc.as_h2o_frame(new_df,framename='skippy_binary')
skippy_binary["SEGMENT"] = skippy_binary["SEGMENT"].asfactor()
print "H2O Frame Created"

This error happens on a dataframe with ~3k variables, but doesn't happen on a dataframe with ~800 columns, for example. But again, SW 2.2.2 didn't have this problem with the same data and the same code.

@Tagar (Author) commented Dec 6, 2017

This seems to be https://issues.apache.org/jira/browse/SPARK-18016? I wonder why we didn't hit this on 2.2.2.

@Tagar (Author) commented Dec 11, 2017

@jakubhava I opened a support case with Cloudera.
They asked if we could reproduce this outside of H2O/Sparkling Water, using just Spark/PySpark.
Could you please point me to the code that hc.as_h2o_frame() runs internally, so we can
see if we can reproduce this outside of SW?
Thank you.
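
For reference, a minimal stand-alone PySpark sketch of the kind of reproduction Cloudera asked for (not from the thread; whether it actually fails depends on the Spark build and the column count):

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("wide-df-repro").getOrCreate()

# Build a DataFrame roughly as wide as the failing one (~3k columns).
n_cols = 3000
wide = spark.range(100).select(*[F.rand().alias("c%d" % i) for i in range(n_cols)])

# Re-encoding the rows through createDataFrame(rdd, schema) is close to what
# the SW converter does internally; on affected Spark builds this compiles a
# SpecificUnsafeProjection whose constant pool can exceed the 0xFFFF limit.
spark.createDataFrame(wide.rdd, wide.schema).count()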

@jakubhava (Contributor) commented:

Hi @Tagar, thanks for the investigation!

The logic used for converting a Spark DataFrame into an H2OFrame is here: https://github.com/h2oai/sparkling-water/blob/master/core/src/main/scala/org/apache/spark/h2o/converters/SparkDataFrameConverter.scala

@mmalohlava (Member) commented:

@Tagar @jakubhava it is an interesting problem. I tracked the differences between 2.2.2 and 2.2.3 but did not find any reasonable explanation. There are several potential changes, like this one, but I do not see a reason for them to trigger https://issues.apache.org/jira/browse/SPARK-18016. @jakubhava WDYT?

@Tagar (Author) commented Dec 12, 2017

Thank you for looking into this, @jakubhava and @mmalohlava.

Cloudera Support confirms it is directly related to SPARK-18016,
although again it's strange we didn't face this problem before the upgrade.
We also tried upgrading to 2.2.4 (from 2.2.3), and users confirm today
they still have the issue on very wide datasets (broken on 3k- and 13k-column
datasets, for example, but working fine on 800-column datasets).

I also asked users to try setting spark.sql.codegen.wholeStage to false,
since the problem seems somehow related to code generation on the Spark side.
But setting spark.sql.codegen.wholeStage to false didn't change the behavior.
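
For reference, the flag can be set like this in PySpark (a sketch; as noted above, it did not change the behavior in this case):

from pyspark.sql import SparkSession

# Disable whole-stage code generation for the whole session ...
spark = (SparkSession.builder
         .appName("codegen-off")
         .config("spark.sql.codegen.wholeStage", "false")
         .getOrCreate())

# ... or toggle it at runtime on an existing session.
spark.conf.set("spark.sql.codegen.wholeStage", "false")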

@axiomoixa commented:

@Tagar Could you try Spark 2.2.1? Apparently a lot of those 64KB JVM bytecode limit bugs are fixed now.
https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315420&version=12340470

The limit on the number of columns you experienced sounds a lot like what I had experienced with MLlib: a GLM with 500 variables ran fine, but when it got up to 2k variables it errored out. See https://issues.apache.org/jira/browse/SPARK-22761, which is one of those 64KB bytecode limit bugs that is apparently fixed in Spark 2.2.1.

@Tagar (Author) commented Dec 13, 2017

@axiomoixa We use Cloudera's Spark 2.2 build; they sometimes remove certain patches and, on the other hand, backport certain other fixes. I have updated my Cloudera case and asked whether those 2.2.1 fixes for the "64KB JVM bytecode limit" made their way into Cloudera's build. Thank you for pointing that out.

@jakubhava (Contributor) commented:

I just had a quick look and I couldn't find any particular change in Sparkling Water which would cause such a dramatic change in the supported number of columns.

@axiomoixa commented Dec 13, 2017

@Tagar I must unfortunately correct myself. Apparently, those 64KB bytecode fixes are in the master branch to be released with Spark 2.3.0, but not in 2.2.1 yet.

@jakubhava (Contributor) commented:

@axiomoixa I can see some 64KB bytecode fixes already in 2.2.1; at least, that is stated in the release notes: https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315420&version=12340470

@jakubhava (Contributor) commented Dec 14, 2017

@Tagar @axiomoixa I think it was caused by this change in Sparkling Water: https://0xdata.atlassian.net/browse/SW-499. It is a good change, though.

Before, BinaryType was just ignored (no error thrown), but now it is properly handled: when we have, for example, an Array[Byte] column in a Spark DataFrame, it is expanded into a lot of new columns, which is probably the cause of the exception.

Could you please share the schema of the data you are converting, @Tagar? Or at least share whether any of the fields is BinaryType (or Array[Byte])?
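
A quick way to answer that question in PySpark (a sketch assuming the DataFrame is bound to df; not from the thread):

from pyspark.sql.types import BinaryType

# List the columns whose Spark type is BinaryType (Array[Byte] on the JVM side).
binary_cols = [f.name for f in df.schema.fields
               if isinstance(f.dataType, BinaryType)]
print(binary_cols)  # an empty list means no BinaryType columns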

@Tagar (Author) commented Dec 15, 2017

@jakubhava thank you for this information. Yes, it seems to be a good improvement.

> Before, BinaryType was just ignored (no error thrown), but now it is properly handled: when we have, for example, an Array[Byte] column in a Spark DataFrame, it is expanded into a lot of new columns, which is probably the cause of the exception.

What would be an example of such a data type? Does it mean SW-499 creates an enum-like structure for categorical features? We use PySpark primarily. Is Array[Byte] just a string data type in the Spark world, or do you mean a nested collection of elements?

> Could you please share the schema of the data you are converting, @Tagar? Or at least share whether any of the fields is BinaryType (or Array[Byte])?

I will upload the schema to H2O ticket https://support.h2o.ai/support/tickets/91559 if that's okay with you, as I can't share the schema publicly.

@Tagar (Author) commented Dec 15, 2017

@jakubhava I updated the H2O case with complete schema.
That dataframe has only 'double' and 'string' data types:

>>> print set(t for n,t in df.dtypes)
set(['double', 'string'])

So I am not sure where Array[Byte] or BinaryType would be coming from.

Thank you.

@jakubhava (Contributor) commented:

@Tagar BinaryType is the type used to represent Array[Byte].

If the dataframe has only simple types such as string and double, then change SW-499 does not affect this call, so I am still not sure why it started behaving this way.
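
For illustration, a minimal PySpark example of a column that carries BinaryType (a sketch; Python bytearray values map to BinaryType):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# A Python bytearray value becomes Spark's BinaryType, i.e. Array[Byte] on the JVM.
bin_df = spark.createDataFrame([(bytearray(b"\x00\x01"),)], ["payload"])
bin_df.printSchema()  # prints: payload: binary (nullable = true)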

@jakubhava (Contributor) commented:

My last candidate is this change, #429, particularly this line:

df.sparkSession.createDataFrame(df.rdd, renamedColsWithoutDots(df.schema, substPattern))

Starting in 2.2.3, we call this during each conversion to create a new dataframe with possibly renamed columns.

Spark, however, internally calls the method below with needsConversion set to true. It therefore creates a projection and then creates a Dataset out of the converted data. That projection might be the reason for triggering the exception above.

private[sql] def createDataFrame(
    rowRDD: RDD[Row],
    schema: StructType,
    needsConversion: Boolean) = {
  // TODO: use MutableProjection when rowRDD is another DataFrame and the applied
  // schema differs from the existing schema on any field data type.
  val catalystRows = if (needsConversion) {
    val encoder = RowEncoder(schema)
    rowRDD.map(encoder.toRow)
  } else {
    rowRDD.map{r: Row => InternalRow.fromSeq(r.toSeq)}
  }
  val logicalPlan = LogicalRDD(schema.toAttributes, catalystRows)(self)
  Dataset.ofRows(self, logicalPlan)
}

Kuba

@jakubhava (Contributor) commented Dec 15, 2017

@Tagar I think that this change https://github.com/h2oai/sparkling-water/pull/497/files might actually help in your case; however, I still need to test it. If you know how to build Sparkling Water and want to give it a try as well, feel free to build it from this PR: https://github.com/h2oai/sparkling-water/pull/497/files

@jakubhava (Contributor) commented:

Closing this issue as it is fixed by #497.

However, please note that this is just an optimisation of our code to avoid creating additional dataframes/columns. The original issue still exists in Spark and can be reproduced with a really large number of columns; without upgrading Spark, there is currently not much we can do.

@Tagar (Author) commented Dec 19, 2017

Users confirm this issue is fixed now,
so we're back to the pre-upgrade state.

Also, the root cause, https://issues.apache.org/jira/browse/SPARK-18016, was fixed and committed to Spark 2.3 today.

Thanks a lot.

@jakubhava (Contributor) commented:

Hi @Tagar, a new Sparkling Water release is out with this fix and additional fixes.

@Tagar (Author) commented Jan 4, 2018

Thank you @jakubhava! We will upgrade to 2.2.6 tonight.
