Spark4 support#17
Conversation
…rail false positive
| Returns an empty Scala immutable List (Nil), usable as Seq[_]. | ||
| Uses JavaConverters.toList() to produce an immutable.List rather than | ||
| a Stream, which is required for Py4J constructor/method lookup to succeed | ||
| across both Scala 2.12 (Spark 3.x) and Scala 2.13 (Spark 4+). |
There was a problem hiding this comment.
JavaConverters is deprecated since Scala 2.13 and removed in Scala 2.13+ in favor of scala.jdk.CollectionConverters. Since Spark 4.0 uses Scala 2.13, this will break on Spark 4. You need a compatibility shim that tries scala.jdk.CollectionConverters first and falls back to JavaConverters for Spark 3.x, or use a different approach entirely.
|
|
||
| def scala_map_to_dict(jvm, scala_map): | ||
| return dict(jvm.scala.collection.JavaConversions.mapAsJavaMap(scala_map)) | ||
| return dict(jvm.scala.collection.JavaConverters.mapAsJavaMapConverter(scala_map).asJava()) |
There was a problem hiding this comment.
Same issue: JavaConverters.mapAsJavaMapConverter is deprecated/removed in Scala 2.13 (Spark 4.0). This needs a try/except fallback to scala.jdk.CollectionConverters.
|
|
||
| def scala_map_to_java_map(jvm, scala_map): | ||
| return jvm.scala.collection.JavaConversions.mapAsJavaMap(scala_map) | ||
| return jvm.scala.collection.JavaConverters.mapAsJavaMapConverter(scala_map).asJava() |
There was a problem hiding this comment.
Same JavaConverters deprecation/removal issue for Spark 4.0 (Scala 2.13).
| """ | ||
| self._run_result = run | ||
| profile_map = self._jvm.scala.collection.JavaConversions.mapAsJavaMap(run.profiles()) # TODO from ScalaUtils | ||
| profile_map = self._jvm.scala.collection.JavaConverters.mapAsJavaMapConverter(run.profiles()).asJava() # TODO from ScalaUtils |
There was a problem hiding this comment.
Same JavaConverters issue — this will fail on Spark 4.0 (Scala 2.13). Should use the same compatibility approach as scala_utils.py.
| return jvm.scala.collection.JavaConverters.iterableAsScalaIterableConverter( | ||
| jvm.java.util.ArrayList() | ||
| ).asScala().toList() | ||
|
|
There was a problem hiding this comment.
jvm.java.util.ArrayList() creates an empty Java ArrayList, but iterableAsScalaIterableConverter(...).asScala().toList() goes through Iterable → List. This works, but the docstring says it returns Nil (immutable.Nil), which is technically scala.collection.immutable.Nil$. The actual return type depends on the Scala version. Consider just using jvm.scala.collection.immutable.Nil.toList() or jvm.scala.collection.immutable.List.empty() directly, which is simpler and avoids the JavaConverters dependency.
|
|
||
|
|
||
| SPARK_TO_DEEQU_COORD_MAPPING = { | ||
| "4.0": "com.amazon.deequ:deequ:2.0.14-spark-4.0", |
There was a problem hiding this comment.
The Deequ version 2.0.14-spark-4.0 is referenced here but doesn't appear to exist on Maven Central as of this review. Please verify this artifact is published before merging, or CI for the Spark 4.0 matrix entry will fail at runtime.
| - PYSPARK_VERSION: "3.5" | ||
| JAVA_VERSION: "11" | ||
| PYTHON_VERSION: "3.8" | ||
| - PYSPARK_VERSION: "4.0.0" |
There was a problem hiding this comment.
Spark 4.0 requires Python >=3.9, but the pyproject.toml still allows python = ">=3.8,<4". The CI matrix entry uses Python 3.9, which is fine, but there's no enforcement that Spark 4.0 won't be used with Python 3.8 by end users. Consider adding a runtime check or documenting this requirement.
|
|
||
| def _render(template_str, **kwargs): | ||
| """Render a prompt template safely. Converts {var} to $var for Template.safe_substitute | ||
| so untrusted content like PR bodies containing {braces} won't crash or leak.""" |
There was a problem hiding this comment.
_render converts all { to ${ which will break if the template contains literal ${ already (it becomes $${). Also, Template.safe_substitute uses $var or ${var} syntax, so converting {var} to ${var} will fail for any variable name containing special characters. Consider using a more robust approach or at least escaping $ signs in the template before conversion.
| self._repo = cfg.repo | ||
| self._timeout = cfg.github_api_timeout | ||
| self._dry_run = cfg.dry_run | ||
| self._repo_root = os.getenv("GITHUB_WORKSPACE", os.path.abspath(os.path.join(os.path.dirname(__file__), "..", ".."))) |
There was a problem hiding this comment.
os.path.abspath(os.path.join(os.path.dirname(__file__), '..', '..')) resolves to the repo root only when the script is at scripts/issue_bot/. If the package is installed or the working directory changes, this fallback will be wrong. The GITHUB_WORKSPACE env var should always be set in CI, but this could cause confusing failures in local testing.
| kwargs["outputConfig"] = { | ||
| "textFormat": { | ||
| "type": "json_schema", | ||
| "structure": {"jsonSchema": { |
There was a problem hiding this comment.
outputConfig with textFormat/json_schema is not a documented parameter for the Bedrock Converse API. The Converse API uses toolConfig or additionalModelRequestFields for structured output. This will likely raise a ParamValidationError or be silently ignored.
53ada86 to
ef2f117
Compare
Issue #, if available:
Description of changes:
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.