Skip to content

Spark4 support#17

Open
sudsali wants to merge 6 commits into
masterfrom
spark4-support
Open

Spark4 support#17
sudsali wants to merge 6 commits into
masterfrom
spark4-support

Conversation

@sudsali

@sudsali sudsali commented Apr 22, 2026

Copy link
Copy Markdown
Owner

Issue #, if available:

Description of changes:

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

Repository owner deleted a comment from github-actions Bot Apr 22, 2026
Repository owner deleted a comment from github-actions Bot Apr 22, 2026
Repository owner deleted a comment from github-actions Bot Apr 22, 2026
Repository owner deleted a comment from github-actions Bot Apr 23, 2026
Repository owner deleted a comment from github-actions Bot Apr 23, 2026
Repository owner deleted a comment from github-actions Bot Apr 23, 2026

@github-actions github-actions Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.


Generated by AI (model: us.anthropic.claude-opus-4-6-v1, prompt: 55f69cb5) — may not be fully accurate. Reply if this doesn't help.

Comment thread pydeequ/scala_utils.py
Returns an empty Scala immutable List (Nil), usable as Seq[_].
Uses JavaConverters.toList() to produce an immutable.List rather than
a Stream, which is required for Py4J constructor/method lookup to succeed
across both Scala 2.12 (Spark 3.x) and Scala 2.13 (Spark 4+).

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

JavaConverters is deprecated since Scala 2.13 and removed in Scala 2.13+ in favor of scala.jdk.CollectionConverters. Since Spark 4.0 uses Scala 2.13, this will break on Spark 4. You need a compatibility shim that tries scala.jdk.CollectionConverters first and falls back to JavaConverters for Spark 3.x, or use a different approach entirely.

Comment thread pydeequ/scala_utils.py

def scala_map_to_dict(jvm, scala_map):
return dict(jvm.scala.collection.JavaConversions.mapAsJavaMap(scala_map))
return dict(jvm.scala.collection.JavaConverters.mapAsJavaMapConverter(scala_map).asJava())

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same issue: JavaConverters.mapAsJavaMapConverter is deprecated/removed in Scala 2.13 (Spark 4.0). This needs a try/except fallback to scala.jdk.CollectionConverters.

Comment thread pydeequ/scala_utils.py

def scala_map_to_java_map(jvm, scala_map):
return jvm.scala.collection.JavaConversions.mapAsJavaMap(scala_map)
return jvm.scala.collection.JavaConverters.mapAsJavaMapConverter(scala_map).asJava()

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same JavaConverters deprecation/removal issue for Spark 4.0 (Scala 2.13).

Comment thread pydeequ/profiles.py
"""
self._run_result = run
profile_map = self._jvm.scala.collection.JavaConversions.mapAsJavaMap(run.profiles()) # TODO from ScalaUtils
profile_map = self._jvm.scala.collection.JavaConverters.mapAsJavaMapConverter(run.profiles()).asJava() # TODO from ScalaUtils

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same JavaConverters issue — this will fail on Spark 4.0 (Scala 2.13). Should use the same compatibility approach as scala_utils.py.

Comment thread pydeequ/scala_utils.py
return jvm.scala.collection.JavaConverters.iterableAsScalaIterableConverter(
jvm.java.util.ArrayList()
).asScala().toList()

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

jvm.java.util.ArrayList() creates an empty Java ArrayList, but iterableAsScalaIterableConverter(...).asScala().toList() goes through IterableList. This works, but the docstring says it returns Nil (immutable.Nil), which is technically scala.collection.immutable.Nil$. The actual return type depends on the Scala version. Consider just using jvm.scala.collection.immutable.Nil.toList() or jvm.scala.collection.immutable.List.empty() directly, which is simpler and avoids the JavaConverters dependency.

Comment thread pydeequ/configs.py


SPARK_TO_DEEQU_COORD_MAPPING = {
"4.0": "com.amazon.deequ:deequ:2.0.14-spark-4.0",

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Deequ version 2.0.14-spark-4.0 is referenced here but doesn't appear to exist on Maven Central as of this review. Please verify this artifact is published before merging, or CI for the Spark 4.0 matrix entry will fail at runtime.

- PYSPARK_VERSION: "3.5"
JAVA_VERSION: "11"
PYTHON_VERSION: "3.8"
- PYSPARK_VERSION: "4.0.0"

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Spark 4.0 requires Python >=3.9, but the pyproject.toml still allows python = ">=3.8,<4". The CI matrix entry uses Python 3.9, which is fine, but there's no enforcement that Spark 4.0 won't be used with Python 3.8 by end users. Consider adding a runtime check or documenting this requirement.

Comment thread scripts/issue_bot/main.py

def _render(template_str, **kwargs):
"""Render a prompt template safely. Converts {var} to $var for Template.safe_substitute
so untrusted content like PR bodies containing {braces} won't crash or leak."""

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_render converts all { to ${ which will break if the template contains literal ${ already (it becomes $${). Also, Template.safe_substitute uses $var or ${var} syntax, so converting {var} to ${var} will fail for any variable name containing special characters. Consider using a more robust approach or at least escaping $ signs in the template before conversion.

self._repo = cfg.repo
self._timeout = cfg.github_api_timeout
self._dry_run = cfg.dry_run
self._repo_root = os.getenv("GITHUB_WORKSPACE", os.path.abspath(os.path.join(os.path.dirname(__file__), "..", "..")))

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

os.path.abspath(os.path.join(os.path.dirname(__file__), '..', '..')) resolves to the repo root only when the script is at scripts/issue_bot/. If the package is installed or the working directory changes, this fallback will be wrong. The GITHUB_WORKSPACE env var should always be set in CI, but this could cause confusing failures in local testing.

kwargs["outputConfig"] = {
"textFormat": {
"type": "json_schema",
"structure": {"jsonSchema": {

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

outputConfig with textFormat/json_schema is not a documented parameter for the Bedrock Converse API. The Converse API uses toolConfig or additionalModelRequestFields for structured output. This will likely raise a ParamValidationError or be silently ignored.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant