[SPARK-42253][PYTHON] Add test for detecting duplicated error class
### What changes were proposed in this pull request?

This PR proposes to add a test that detects duplicated error class names, so that each error class stays unique.

### Why are the changes needed?

The name of each error class should be unique, so we should check for duplicates.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Manually tested by temporarily duplicating `COLUMN_IN_LIST`, which fails as below:
```shell
======================================================================
FAIL [0.006s]: test_error_classes_duplicated (pyspark.errors.tests.test_errors.ErrorsTest)
----------------------------------------------------------------------
Traceback (most recent call last):
...
AssertionError: False is not true : Duplicate error class: COLUMN_IN_LIST

----------------------------------------------------------------------
Ran 2 tests in 0.007s

FAILED (failures=1)
```
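The new test relies on the `object_pairs_hook` parameter of `json.loads`: by default, parsing a JSON object silently keeps only the last value for a repeated key, but the hook receives every (key, value) pair before the dict is built, so duplicates are still visible. A minimal standalone sketch of the same idea (the sample JSON document and the use of `ValueError` here are illustrative, not taken from the patch, which uses `assertTrue` inside the test case):

```python
import json

def detect_duplication(pairs):
    # Called by json.loads for each JSON object, with the raw list of
    # (key, value) pairs -- before duplicate keys get collapsed.
    seen = {}
    for name, message in pairs:
        if name in seen:
            raise ValueError(f"Duplicate error class: {name}")
        seen[name] = message
    return seen

# Hypothetical JSON with a repeated error-class name:
doc = '{"COLUMN_IN_LIST": {"message": ["a"]}, "COLUMN_IN_LIST": {"message": ["b"]}}'

try:
    json.loads(doc, object_pairs_hook=detect_duplication)
except ValueError as e:
    print(e)  # Duplicate error class: COLUMN_IN_LIST
```

Note that plain `json.loads(doc)` would succeed and return only the second entry, which is why the hook is needed to catch this class of mistake.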

Closes apache#39821 from itholic/SPARK-42253.

Lead-authored-by: itholic <haejoon.lee@databricks.com>
Co-authored-by: Hyukjin Kwon <gurwls223@gmail.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
(cherry picked from commit 4d37e78)
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
itholic and HyukjinKwon committed Feb 1, 2023
1 parent 94a6f2a commit fa0e1af
Showing 2 changed files with 14 additions and 6 deletions.
5 changes: 0 additions & 5 deletions python/pyspark/errors/error_classes.py
```diff
@@ -59,11 +59,6 @@
       "Argument `<arg_name>` must be a DataFrame, got <arg_type>."
     ]
   },
-  "NOT_A_DATAFRAME" : {
-    "message" : [
-      "Argument `<arg_name>` should be a DataFrame, got <arg_type>."
-    ]
-  },
   "NOT_A_DICT" : {
     "message" : [
       "Argument `<arg_name>` should be a dict, got <arg_type>."
```
15 changes: 14 additions & 1 deletion python/pyspark/errors/tests/test_errors.py
```diff
@@ -16,13 +16,15 @@
 # limitations under the License.
 #
 
+import json
 import unittest
 
+from pyspark.errors.error_classes import ERROR_CLASSES_JSON
 from pyspark.errors.utils import ErrorClassesReader
 
 
 class ErrorsTest(unittest.TestCase):
-    def test_error_classes(self):
+    def test_error_classes_sorted(self):
         # Test error classes is sorted alphabetically
         error_reader = ErrorClassesReader()
         error_class_names = list(error_reader.error_info_map.keys())
@@ -33,6 +35,17 @@ def test_error_classes(self):
                 f"after [{error_class_names[i + 1]}]",
             )
 
+    def test_error_classes_duplicated(self):
+        # Test error classes is not duplicated
+        def detect_duplication(pairs):
+            error_classes_json = {}
+            for name, message in pairs:
+                self.assertTrue(name not in error_classes_json, f"Duplicate error class: {name}")
+                error_classes_json[name] = message
+            return error_classes_json
+
+        json.loads(ERROR_CLASSES_JSON, object_pairs_hook=detect_duplication)
+
 
 if __name__ == "__main__":
     import unittest
```
