Spark4 support by sudsali · Pull Request #17 · sudsali/python-deequ

sudsali · 2026-04-22T22:42:38Z

Issue #, if available:

Description of changes:

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.


github-actions

Generated by AI (model: us.anthropic.claude-opus-4-6-v1, prompt: 55f69cb5) — may not be fully accurate. Reply if this doesn't help.

github-actions · 2026-04-23T15:19:43Z

+    Returns an empty Scala immutable List (Nil), usable as Seq[_].
+    Uses JavaConverters.toList() to produce an immutable.List rather than
+    a Stream, which is required for Py4J constructor/method lookup to succeed
+    across both Scala 2.12 (Spark 3.x) and Scala 2.13 (Spark 4+).


JavaConverters is deprecated since Scala 2.13 in favor of scala.jdk.CollectionConverters, though it still ships in 2.13 (the object 2.13 removed is the implicit JavaConversions). Since Spark 4.0 uses Scala 2.13, this code relies on a deprecated API that a later Scala release may drop. A compatibility shim that tries scala.jdk.CollectionConverters first and falls back to JavaConverters for Spark 3.x would avoid that; alternatively, use a different approach entirely.
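A minimal sketch of such a shim for the empty-list helper. It assumes py4j raises an exception when a JVM class or method is missing (typically `Py4JError`; the sketch catches any exception). The Scala 2.13 branch calls `scala.jdk.javaapi.CollectionConverters`, the variant the Scala docs intend for Java-side callers, because `scala.jdk.CollectionConverters` exposes implicit extension methods that are awkward to reach through Py4J. `empty_scala_list` and `_first_success` are hypothetical names, and which `asScala` overload Py4J picks is an assumption to verify on a real Spark 4 session.

```python
from typing import Any, Callable, List


def _first_success(*attempts: Callable[[], Any]) -> Any:
    """Run zero-argument strategies in order and return the first result that works."""
    errors: List[Exception] = []
    for attempt in attempts:
        try:
            return attempt()
        except Exception as exc:  # py4j reports a missing JVM class/method as an error here
            errors.append(exc)
    raise RuntimeError(f"all {len(attempts)} strategies failed: {errors!r}")


def empty_scala_list(jvm: Any) -> Any:
    """Return an empty immutable Scala List on Scala 2.12 or 2.13 (hypothetical helper)."""
    return _first_success(
        # Scala 2.13+ (Spark 4): the converter object meant for Java-side callers.
        lambda: jvm.scala.jdk.javaapi.CollectionConverters.asScala(
            jvm.java.util.ArrayList()
        ).toList(),
        # Scala 2.12 (Spark 3.x): JavaConverters, as in the current PR code.
        lambda: jvm.scala.collection.JavaConverters.iterableAsScalaIterableConverter(
            jvm.java.util.ArrayList()
        ).asScala().toList(),
    )
```

Ordering the 2.13 path first means Spark 4 never touches the deprecated object, while Spark 3.x falls through to the code the PR already uses.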

github-actions · 2026-04-23T15:19:43Z


 def scala_map_to_dict(jvm, scala_map):
-    return dict(jvm.scala.collection.JavaConversions.mapAsJavaMap(scala_map))
+    return dict(jvm.scala.collection.JavaConverters.mapAsJavaMapConverter(scala_map).asJava())


Same issue: JavaConverters.mapAsJavaMapConverter is deprecated in Scala 2.13 (Spark 4.0). A try/except fallback to scala.jdk.CollectionConverters would guard against its eventual removal.
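A sketch of that try/except fallback for the map helpers, keeping the `scala_map_to_dict(jvm, scala_map)` and `scala_map_to_java_map(jvm, scala_map)` signatures from the diff. It assumes Py4J resolves `scala.jdk.javaapi.CollectionConverters.asJava` to the `scala.collection.Map` overload for a Deequ result map, and that py4j raises when a class is missing; both are assumptions to check against a live Spark 4 JVM.

```python
from typing import Any, Dict, List


def scala_map_to_java_map(jvm: Any, scala_map: Any) -> Any:
    """Convert a Scala Map to a java.util.Map on Scala 2.12 or 2.13 (sketch)."""
    strategies = [
        # Scala 2.13+ (Spark 4): Java-facing converter; Py4J must pick the
        # asJava(scala.collection.Map) overload (assumption worth verifying).
        ("scala.jdk.javaapi.CollectionConverters",
         lambda: jvm.scala.jdk.javaapi.CollectionConverters.asJava(scala_map)),
        # Scala 2.12 (Spark 3.x); deprecated but still present on 2.13.
        ("scala.collection.JavaConverters",
         lambda: jvm.scala.collection.JavaConverters.mapAsJavaMapConverter(scala_map).asJava()),
    ]
    failures: List[str] = []
    for name, convert in strategies:
        try:
            return convert()
        except Exception as exc:  # py4j surfaces a missing class/method as an error
            failures.append(f"{name}: {exc}")
    raise RuntimeError("no Scala->Java map converter worked: " + "; ".join(failures))


def scala_map_to_dict(jvm: Any, scala_map: Any) -> Dict[Any, Any]:
    # py4j exposes java.util.Map as a Python mapping, so dict() copies it.
    return dict(scala_map_to_java_map(jvm, scala_map))
```

The profiler call site could then reuse `scala_map_to_java_map(self._jvm, run.profiles())` instead of inlining the conversion.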

github-actions · 2026-04-23T15:19:43Z


 def scala_map_to_java_map(jvm, scala_map):
-    return jvm.scala.collection.JavaConversions.mapAsJavaMap(scala_map)
+    return jvm.scala.collection.JavaConverters.mapAsJavaMapConverter(scala_map).asJava()


Same JavaConverters deprecation issue for Spark 4.0 (Scala 2.13).

github-actions · 2026-04-23T15:19:43Z

        """
        self._run_result = run
-        profile_map = self._jvm.scala.collection.JavaConversions.mapAsJavaMap(run.profiles())  # TODO from ScalaUtils
+        profile_map = self._jvm.scala.collection.JavaConverters.mapAsJavaMapConverter(run.profiles()).asJava()  # TODO from ScalaUtils


Same JavaConverters issue: this relies on an API deprecated in Scala 2.13 (Spark 4.0). It should use the same compatibility approach as scala_utils.py.

github-actions · 2026-04-23T15:19:43Z

+    return jvm.scala.collection.JavaConverters.iterableAsScalaIterableConverter(
+        jvm.java.util.ArrayList()
+    ).asScala().toList()



jvm.java.util.ArrayList() creates an empty Java ArrayList, but iterableAsScalaIterableConverter(...).asScala().toList() goes through Iterable → List. This works, but the docstring says it returns Nil (immutable.Nil), which is technically scala.collection.immutable.Nil$. The actual return type depends on the Scala version. Consider just using jvm.scala.collection.immutable.Nil.toList() or jvm.scala.collection.immutable.List.empty() directly, which is simpler and avoids the JavaConverters dependency.

github-actions · 2026-04-23T15:19:43Z



 SPARK_TO_DEEQU_COORD_MAPPING = {
+    "4.0": "com.amazon.deequ:deequ:2.0.14-spark-4.0",


The Deequ version 2.0.14-spark-4.0 is referenced here but doesn't appear to exist on Maven Central as of this review. Please verify this artifact is published before merging, or CI for the Spark 4.0 matrix entry will fail at runtime.
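One way to make that verification repeatable is a small pre-merge check against the standard Maven Central layout (`repo1.maven.org/maven2/<group path>/<artifact>/<version>/<artifact>-<version>.pom`). This is a sketch: `maven_central_pom_url` and `is_published` are hypothetical helpers, and a 404 can also mean a fresh release has not finished syncing to Central yet.

```python
import urllib.error
import urllib.request


def maven_central_pom_url(coordinate: str) -> str:
    """Map 'group:artifact:version' to its POM URL in the Maven Central layout."""
    group_id, artifact_id, version = coordinate.split(":")
    return (
        "https://repo1.maven.org/maven2/"
        f"{group_id.replace('.', '/')}/{artifact_id}/{version}/"
        f"{artifact_id}-{version}.pom"
    )


def is_published(coordinate: str, timeout: float = 10.0) -> bool:
    """HEAD the POM: 200 means published, 404 means not (yet) on Central."""
    request = urllib.request.Request(maven_central_pom_url(coordinate), method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status == 200
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return False
        raise  # other HTTP errors are not evidence either way
```

A CI step could loop over `SPARK_TO_DEEQU_COORD_MAPPING.values()` and fail the job if any `is_published(...)` call returns `False`.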

github-actions · 2026-04-23T15:19:43Z

+          - PYSPARK_VERSION: "3.5"
+            JAVA_VERSION: "11"
+            PYTHON_VERSION: "3.8"
+          - PYSPARK_VERSION: "4.0.0"


Spark 4.0 requires Python >=3.9, but the pyproject.toml still allows python = ">=3.8,<4". The CI matrix entry uses Python 3.9, which is fine, but there's no enforcement that Spark 4.0 won't be used with Python 3.8 by end users. Consider adding a runtime check or documenting this requirement.
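A sketch of the runtime check, assuming the Python >= 3.9 floor this comment cites for Spark 4.0. `check_python_for_spark` is hypothetical; calling it at import time with `pyspark.__version__` is one possible wiring, not something the PR does today.

```python
import sys
from typing import Optional, Tuple

# Floor taken from the review comment: Spark 4.0 needs Python >= 3.9.
_SPARK4_MIN_PYTHON = (3, 9)


def check_python_for_spark(
    spark_version: str, python_version: Optional[Tuple[int, int]] = None
) -> None:
    """Raise early if the running Python is too old for the requested Spark major."""
    if python_version is None:
        python_version = (sys.version_info.major, sys.version_info.minor)
    spark_major = int(spark_version.split(".")[0])
    if spark_major >= 4 and tuple(python_version) < _SPARK4_MIN_PYTHON:
        raise RuntimeError(
            f"Spark {spark_version} requires Python >= "
            f"{_SPARK4_MIN_PYTHON[0]}.{_SPARK4_MIN_PYTHON[1]}, "
            f"but this interpreter is {python_version[0]}.{python_version[1]}"
        )
```

Failing fast with an explicit message is friendlier than the import or syntax errors a user would otherwise hit deep inside PySpark.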

github-actions · 2026-04-23T15:19:43Z

+
+def _render(template_str, **kwargs):
+    """Render a prompt template safely. Converts {var} to $var for Template.safe_substitute
+    so untrusted content like PR bodies containing {braces} won't crash or leak."""


_render converts all { to ${ which will break if the template contains literal ${ already (it becomes $${). Also, Template.safe_substitute uses $var or ${var} syntax, so converting {var} to ${var} will fail for any variable name containing special characters. Consider using a more robust approach or at least escaping $ signs in the template before conversion.
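A sketch of the more robust approach, assuming placeholders are plain `{identifier}` tokens (the body of `_render` is not shown in the diff). It escapes every literal `$` before conversion, rewrites only placeholders that have a value, and leaves other braces alone; `render_prompt` is a hypothetical name.

```python
import re
from string import Template

# Only identifier-shaped {name} tokens are treated as placeholders.
_PLACEHOLDER = re.compile(r"\{([_A-Za-z][_A-Za-z0-9]*)\}")


def render_prompt(template_str: str, **values: object) -> str:
    """Fill {name} placeholders without re-interpreting braces or '$' in values."""
    # 1. Escape every literal '$' first so existing '$var' / '${var}' text survives.
    escaped = template_str.replace("$", "$$")
    # 2. Rewrite only placeholders we have values for; other braces (JSON, code) stay as-is.
    converted = _PLACEHOLDER.sub(
        lambda m: "${" + m.group(1) + "}" if m.group(1) in values else m.group(0),
        escaped,
    )
    # 3. Every remaining $-placeholder has a value, so substitute() cannot raise KeyError.
    #    Substituted values are inserted verbatim and are not scanned again.
    return Template(converted).substitute(values)
```

Because `Template` never re-parses substituted text, a PR body containing `{braces}` or `${...}` comes through literally.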

github-actions · 2026-04-23T15:19:43Z

+        self._repo = cfg.repo
+        self._timeout = cfg.github_api_timeout
+        self._dry_run = cfg.dry_run
+        self._repo_root = os.getenv("GITHUB_WORKSPACE", os.path.abspath(os.path.join(os.path.dirname(__file__), "..", "..")))


os.path.abspath(os.path.join(os.path.dirname(__file__), '..', '..')) resolves to the repo root only when the script is at scripts/issue_bot/. If the package is installed or the working directory changes, this fallback will be wrong. The GITHUB_WORKSPACE env var should always be set in CI, but this could cause confusing failures in local testing.
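A sketch of a less layout-dependent fallback: keep `GITHUB_WORKSPACE` as the first choice, otherwise walk upward from the script until a repo marker appears. `find_repo_root` and the marker list (`.git`, `pyproject.toml`) are assumptions; `env` is injectable so local tests do not depend on the real environment.

```python
import os
from pathlib import Path
from typing import Mapping, Optional, Sequence, Union


def find_repo_root(
    start: Union[str, os.PathLike],
    markers: Sequence[str] = (".git", "pyproject.toml"),
    env: Optional[Mapping[str, str]] = None,
) -> Path:
    """Prefer GITHUB_WORKSPACE; otherwise return the nearest ancestor holding a marker."""
    if env is None:
        env = os.environ
    workspace = env.get("GITHUB_WORKSPACE")
    if workspace:
        return Path(workspace).resolve()
    here = Path(start).resolve()
    for candidate in (here, *here.parents):
        if any((candidate / marker).exists() for marker in markers):
            return candidate
    raise FileNotFoundError(f"no repo marker {list(markers)} found above {here}")
```

In the bot this could replace the `os.path.join(..., "..", "..")` fallback with `str(find_repo_root(os.path.dirname(os.path.abspath(__file__))))`, and an explicit `FileNotFoundError` is easier to diagnose than a silently wrong path.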

github-actions · 2026-04-23T15:19:43Z

+                kwargs["outputConfig"] = {
+                    "textFormat": {
+                        "type": "json_schema",
+                        "structure": {"jsonSchema": {


outputConfig with textFormat/json_schema is not a documented parameter for the Bedrock Converse API. The Converse API uses toolConfig or additionalModelRequestFields for structured output. This will likely raise a ParamValidationError or be silently ignored.
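If the `toolConfig` route is taken (the reviewer's claim about `outputConfig` should be checked against the current Converse API reference for the boto3 version in use), a sketch follows. It defines a single tool whose `inputSchema` is the desired JSON schema, forces the model to call it, and reads the arguments back from the `toolUse` content block. The tool name `emit_result` and both helper names are hypothetical; forcing a specific tool via `toolChoice` is supported only by some model families.

```python
from typing import Any, Dict, Mapping


def structured_output_kwargs(schema: Dict[str, Any], tool_name: str = "emit_result") -> Dict[str, Any]:
    """Extra converse() kwargs that force the model to answer through one JSON-schema tool."""
    return {
        "toolConfig": {
            "tools": [
                {
                    "toolSpec": {
                        "name": tool_name,
                        "description": "Return the result as JSON matching the schema.",
                        "inputSchema": {"json": schema},
                    }
                }
            ],
            # Forcing a specific tool is only supported by some model families.
            "toolChoice": {"tool": {"name": tool_name}},
        }
    }


def extract_tool_input(response: Mapping[str, Any], tool_name: str = "emit_result") -> Any:
    """Pull the parsed tool arguments out of a converse() response dict."""
    for block in response["output"]["message"]["content"]:
        tool_use = block.get("toolUse")
        if tool_use is not None and tool_use.get("name") == tool_name:
            return tool_use["input"]
    raise ValueError(f"response contains no toolUse block for {tool_name!r}")
```

Usage would look like `client.converse(modelId=model_id, messages=messages, **structured_output_kwargs(schema))` on a `bedrock-runtime` client, followed by `extract_tool_input(response)`.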

sudsali added 6 commits April 20, 2026 19:04

5c34b18 feat: Add AI-powered issue triage and PR review bot

1c54177 fix: Address PR review feedback - Converse API, guardrails, structured output, tests

6ec2ebd fix: Address PR review - Converse API, guardrails, structured output, tests

256124c fix: Split prompt into trusted system/untrusted user to prevent guardrail false positive

89e4e41 fix: Edge case fixes - double-backtick prevention, label type safety

80897c9 Add Spark 4.0 support via deequ:2.0.14-spark-4.0

Repository owner deleted a comment from github-actions Bot Apr 22, 2026

Repository owner deleted a comment from github-actions Bot Apr 23, 2026

github-actions Bot requested changes Apr 23, 2026


sudsali force-pushed the master branch 2 times, most recently from 53ada86 to ef2f117 on April 23, 2026 16:25

sudsali mentioned this pull request Apr 27, 2026

feat: Add AI-powered issue triage and PR review bot awslabs/python-deequ#260 (Merged)

sudsali force-pushed the master branch from f9daad0 to e669a68 on May 6, 2026 15:10


Spark4 support #17

sudsali wants to merge 6 commits into master from spark4-support

1 participant