[FEATURE] Add Azure CI/CD action to aid with style guide enforcement (type hints) (#4878)

* [FEATURE] Introduce ParameterBuilder Evaluation Dependencies and Validation Dependencies 2022 03 23 66 (#4531)
* [BUGFIX] Moves testing dependencies out of core reqs (#4522)
* removing upper bound on mistune
* remove deprecated dependencies
* adds untracked dependency
* adds untracked dependency
* adds untracked dependency
* moving dependencies
* removes dependencies added to lite from core | adds missing dependencies
Co-authored-by: Chetan Kini <chetan@superconductive.com>
* [FEATURE] Convert Existing Self-Initializing Expectations to Make ExpectationConfigurationBuilder Self-Contained with its own validation_parameter_builder settings (#4547)
* [MAINTENANCE] Don't return from validate configuration methods (#4545)
* Add validate_configuration to 2 core Expectations that are passing all their tests
* Comment out examples for expect_column_values_to_match_regex.py... its test_definitions JSON has many more tests
* Update all validate_configuration methods to have type hints and return None
* Update all doc snippet references that were affected
* [DOCS] technical term tags connect to data cloud docs (#4414)
* - Adds technical tags to all documents in the Connect to data: Cloud section of the docs. (Note, the term in the <WhereToRunCode /> imported component was tagged in a different PR.)
  - Some additional editing was done to bring documents in line with the documentation and how-to guide standards.
* - Fixed extra </Tabs> and </TabItem> closures from prior commit to resolve conflicts with develop.
* - Update to include technical term tags. (#4462)
  - Minor updates to correct formatting and spelling issues.
* - Moved docs related to contributing integrations under contributing in the ToC (#4551)
  - Minor edit to title of "How to write integration documentation" to conform to ToC standards (not title cased unless containing a Technical Term).
* - Adds new image files for the intro page (#4540)
  - Updates the image file link for the overview image on the intro page
* [DOCS] clarifications on execution engines and scalability (#4539)
* - DOC-184: Specify in the tutorial that Spark and SqlAlchemy are also supported Execution Engines.
* - DOC-183: In the Execution Engine technical term page, list the class names for Execution Engines and specify that spark is supported as a scalable alternative to Pandas.
* - DOC-182: In the connect to data: overview section for "configuring your datasource's execution engine" list the class names for execution engines.
* [DOCS] technical terms for validate data advanced (#4535)
* - add support for technical term tags.
* - added technical term tags.
  - Updated how to validate data without a checkpoint to mention the replacement workflow and that additional documentation is forthcoming.
  - NOTE: how_to_implement_custom_notifications needs to be rewritten, and was not edited as part of this PR.
* - added technical term tags.
  - Updated how to validate data without a checkpoint to mention the replacement workflow and that additional documentation is forthcoming.
  - NOTE: how_to_implement_custom_notifications needs to be rewritten, and was not edited as part of this PR.
* [DOCS] technical terms for validate data actions docs (#4518)
* - Edits to bring docs up to documentation and how-to guide standards.
* - add technical term tags to documents.
  - minor formatting edits (technical terms missing capitalization, etc).
* [MAINTENANCE] Rule-Based Profiler: Refactor utilities into appropriate modules/classes for better separation of concerns (#4553)
* [MAINTENANCE] Refactor global `conftest` (#4534)
* chore: use black directives to temporarily disable linting
* chore: more black directives to temporarily disable linting
* chore: finish remaining
* refactor: start cleaning up conftest
* refactor: more refactoring of conftest
* refactor: even more refactoring of conftest
* [FEATURE] Improve diagnostic checklist details (#4548)
* Update library_metadata check to provide details when it doesn't pass
* In linting check, if snake_case doesn't match filename, show computed snake_case
* Change class name for expect_column_values_to_be_valid_ipv4.py and remove package attr
* Update convert_to_json_serializable to handle bytes
* Update build_gallery.py script to convert diagnostics to JSON in separate try/except
* Update build_gallery.py script to write expectation_library_v2.json file with indenting
* Update _check_input_validation to tell if custom assert statements are used in validate_configuration
* clean up (#4554)
* minor touch up (#4558)
* [MAINTENANCE] Refactor Anonymizer utilizing the Strategy design pattern (#4485)
* feat: init commit
* refactor: shift all logic over to base class
* feat: start impl of anonymize on Anonymizer
* feat: get ProfilerRunAnonymizer working
* refactor: remove constructor from ProfilerRunAnonymizer
* refactor: start on CheckpointRunAnonymizer
* fix: clean up broken checkpoint tests
* fix: ensure *args and **kwargs are propagated through
* refactor: start work on datasource anonymizers
* refactor: remove all anonymizers except Anonymizer from usage stats attrs
* fix: update isinstance checks
* refactor: move helper into checkpoint_run_anonymizer
* refactor: move helper into datasource_anonymizer
* refactor: make anonymize string private and place in strategy
* refactor: make anonymize batch info private and place in strategy
* refactor: move build_init_payload to Anonymizer
* refactor: make remainder of anonymize methods private
* refactor: add store info to strategy
* refactor: add dataconnector info to strategy
* refactor: consolidate profiler info and profiler run anonymization
* refactor: remove *args from signatures
* refactor: updates around checkpoint anonymization
* chore: misc cleanup of Anonymizer
* feat: final touch up before review
* chore: remove 'else' statements
* fix: ensure appropriate checkpoint method gets called
* chore: misc updates from review
* refactor: move init_payload back to usage stats
* chore: misc type hinting
* refactor: start using individual classes again
* chore: continue updating individual anonymizer classes
* feat: further updates to child classes
* feat: update anonymize_init_payload
* fix: get checkpoint payloads working
* refactor: ensure all methods have obj
* fix: misc fixes
* fix: make misc updates to conditional checks for obj
* refactor: rename ExpectationAnonymizer to ExpectationSuiteAnonymizer
* refactor: rename Checkpoint and Profiler anonymizers
* feat: leverage aggregate anonymizer downstream
* feature: conditionally create aggregate_anonymizer in constructor
* feat: add cache retrieve or instantiate util
* chore: add batch_request can_handle
* feat: ensure that salt has a default value in anonymizers
* refactor: require aggregate anonymizer in constructor
* refactor: instantiate all strategies in aggregate
* fix: fix broken tests
* refactor: rename internal getter
Co-authored-by: Don Heppner <donald.heppner@gmail.com>
* [MAINTENANCE] Remove duplicate mistune dependency
* [MAINTENANCE] Run PEP-273 checks on a schedule or release cut
* [DOCS] correct code reference line numbers and snippet tags for how to create a batch of data from an in memory data frame (#4573)
* -Corrected the line references and added <snippet> tags to source code for Spark version of guide.
* -Corrected the line references and added <snippet> tags to source code for Pandas version of guide.
* -lint reformat w/black
* -correcting line numbers after lint formatting.
* [MAINTENANCE] Package dependencies usage stats instrumentation - part 1 (#4546)
  Usage stats instrumentation of package dependencies
* [MAINTENANCE] Add DevRel team to GitHub auto-label action
* [MAINTENANCE] Add GitHub action to conditionally auto-update PR's (#4574)
* feat: add new action
* chore: add conditions
* [MAINTENANCE] Bump version of `black` in response to hotfix for Click v8.1.0 (#4577)
* chore: bump version
* chore: test change
* chore: update all instances of black
* chore: new test changes
* chore: revert test changes
* Update overview.md (#4556)
* Add missing links.
* Fix some typos
* Simplify flow and grammar in a few places
Co-authored-by: Austin Ziech Robinson <44794138+austiezr@users.noreply.github.com>
* - corrected broken link in admonition box. (#4585)
  - updated links in admonition box to point to current technical documentation rather than old core concepts documents.
* [MAINTENANCE] Minor clean-up (#4571)
  Little bit of cleanup in our execution engine and validator
* [BUGFIX] Adjust output of datetime `ParameterBuilder` to match Expectation (#4590)
* fix : misconfigured ExpectationConfigurationBuilder
* pushing fix
* clean up before submitting for review
* bugfix : remove sorting
* remove extra line
* [MAINTENANCE] Instrument package dependencies (#4583)
* Add dependencies to data_context.__init__ event
* [MAINTENANCE] Standardize DomainBuilder Constructor Arguments Ordering (#4599)
* release candidate for 0.14.13
* feat: add script
* feat: add action
* feat: update condition
* feat: update action
* feat: integrate script
* chore: ensure we checkout
* feat: misc updates
* feat: misc updates
* chore: update
* chore: update
* test: add docstrings
* chore: revert test changes
* feat: update
* feat: update
* feat: update
* feat: update
* feat: update
* feat: update
* feat: update
* feat: update
* feat: update
* feat: update
* feat: update
* feat: general cleanup and tightening of script
* chore: add docstring checker to pipeline
* docs: add docstrings and type hints
* revert to 0.14.12 state
* chore: misc fixes after review
* feat: write script
* chore: remove files from other branch
* refactor: port over script to Python
* chore: update threshold
* chore: misc cleanup
* chore: misc cleanup

Co-authored-by: Alex Sherstinsky <alexsherstinsky@users.noreply.github.com>
Co-authored-by: Austin Ziech Robinson <44794138+austiezr@users.noreply.github.com>
Co-authored-by: kenwade4 <95714847+kenwade4@users.noreply.github.com>
Co-authored-by: Rachel-Reverie <94694058+Rachel-Reverie@users.noreply.github.com>
Co-authored-by: Don Heppner <donald.heppner@gmail.com>
Co-authored-by: Anthony Burdi <anthony@superconductive.com>
Co-authored-by: Abe Gong <abegong@users.noreply.github.com>
Co-authored-by: William Shin <will@superconductive.com>
Co-authored-by: Ben Horkley <horkley@superconductive.com>
great-expectations · Apr 19, 2022 · 00b9a77
1 parent eb13884
Showing 2 changed files with 128 additions and 0 deletions.
diff --git a/azure-pipelines-dependency-graph-testing.yml b/azure-pipelines-dependency-graph-testing.yml
@@ -104,7 +104,15 @@ stages:
     dependsOn: scope_check
     pool:
       vmImage: 'ubuntu-latest'
+
     jobs:
+    - job: type_hint_checker
+      steps:
+      - script: |
+          pip install mypy # Prereq for type hint script
+          python scripts/check_type_hint_coverage.py
+        name: TypeHintChecker
+
     - job: docstring_checker
       steps:
       - bash: python scripts/check_docstring_coverage.py
@@ -114,6 +122,7 @@ stages:
     dependsOn: [lint]
     pool:
       vmImage: 'ubuntu-18.04'
+
     jobs:
       - job: import_ge
 

diff --git a/scripts/check_type_hint_coverage.py b/scripts/check_type_hint_coverage.py
@@ -0,0 +1,119 @@
+import subprocess
+from collections import defaultdict
+from typing import Dict, List, Optional
+
+TYPE_HINT_ERROR_THRESHOLD: int = (
+    2867  # This number is to be reduced as we annotate more functions!
+)
+
+
+def get_changed_files(branch: str) -> List[str]:
+    """Perform a `git diff` against a given branch.
+
+    Args:
+        branch (str): The branch to diff against (generally `origin/develop`)
+
+    Returns:
+        A list of changed files.
+    """
+    git_diff: subprocess.CompletedProcess = subprocess.run(
+        ["git", "diff", branch, "--name-only"],
+        stdout=subprocess.PIPE,
+        stderr=subprocess.PIPE,
+        universal_newlines=True,
+    )
+    return git_diff.stdout.splitlines()
+
+
+def run_mypy(directory: str) -> List[str]:
+    """Run mypy to identify functions with type hint violations.
+
+    Flags:
+        --ignore-missing-imports: Suppresses errors about imports mypy cannot resolve, for simplicity's sake (https://mypy.readthedocs.io/en/stable/running_mypy.html#missing-imports)
+        --disallow-untyped-defs:  Reports functions that are missing type annotations or are only partially annotated
+        --show-error-codes:       Allows us to label each error with its code, enabling filtering
+        --install-types:          We need common type hints from typeshed to get a more thorough analysis
+        --non-interactive:        Automatically say yes to '--install-types' prompt
+
+    Args:
+        directory (str): The target directory to run mypy against
+
+    Returns:
+        A list containing filtered mypy output relevant to function signatures
+    """
+    raw_results: subprocess.CompletedProcess = subprocess.run(
+        [
+            "mypy",
+            "--ignore-missing-imports",
+            "--disallow-untyped-defs",
+            "--show-error-codes",
+            "--install-types",
+            "--non-interactive",
+            directory,
+        ],
+        stdout=subprocess.PIPE,
+        stderr=subprocess.PIPE,
+        universal_newlines=True,
+    )
+
+    filtered_results: List[str] = _filter_mypy_results(raw_results)
+    return filtered_results
+
+
+def _filter_mypy_results(raw_results: subprocess.CompletedProcess) -> List[str]:
+    def _filter(line: str) -> bool:
+        return "error:" in line and "untyped-def" in line
+
+    return list(filter(_filter, raw_results.stdout.split("\n")))
+
+
+def render_deviations(changed_files: List[str], deviations: List[str]) -> None:
+    """Iterates through changed files in order to provide the user with useful feedback around mypy type hint violations
+
+    Args:
+        changed_files (List[str]): The files relevant to the given commit/PR
+        deviations (List[str]): mypy deviations as generated by `run_mypy`
+
+    Raises:
+        AssertionError: If the number of style guide violations exceeds the threshold
+    """
+    deviations_dict: Dict[str, List[str]] = _build_deviations_dict(deviations)
+
+    error_count: int = len(deviations)
+    print(f"[SUMMARY] {error_count} functions have untyped-def violations!")
+
+    threshold_is_surpassed: bool = error_count > TYPE_HINT_ERROR_THRESHOLD
+
+    if threshold_is_surpassed:
+        print(
+            "\nHere are violations of the style guide that are relevant to the files changed in your PR:"
+        )
+        for file in changed_files:
+            errors: Optional[List[str]] = deviations_dict.get(file)
+            if errors:
+                print(f"\n  {file}:")
+                for error in errors:
+                    print(f"    {error}")
+
+    # Chetan - 20220417 - While this number should be 0, getting the number of style guide violations down takes time
+    # and effort. In the meantime, we want to set an upper bound on errors to ensure we're not introducing
+    # further regressions. As functions are annotated in adherence with style guide standards, developers should update this number.
+    assert not threshold_is_surpassed, (
+        "A function without proper type annotations was introduced; please resolve the matter before merging. "
+        f"We expect there to be {TYPE_HINT_ERROR_THRESHOLD} or fewer violations of the style guide (actual: {error_count})"
+    )
+
+
+def _build_deviations_dict(mypy_results: List[str]) -> Dict[str, List[str]]:
+    deviations_dict: Dict[str, List[str]] = defaultdict(list)
+    for row in mypy_results:
+        file: str = row.split(":")[0]
+        deviations_dict[file].append(row)
+
+    return deviations_dict
+
+
+if __name__ == "__main__":
+    changed_files: List[str] = get_changed_files("origin/develop")
+    untyped_def_deviations: List[str] = run_mypy("great_expectations")
+    render_deviations(changed_files, untyped_def_deviations)
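
The filter-then-group step in `check_type_hint_coverage.py` can be tried out without installing or invoking mypy. The following is a minimal sketch, not the script itself: the sample output, the file paths, and the helper names `filter_untyped_defs` and `group_by_file` are hypothetical and exist only for illustration. It assumes mypy's usual `path:line: error: message  [code]` line shape and that untyped functions are reported under the `no-untyped-def` error code.

```python
from collections import defaultdict
from typing import Dict, List

# Hypothetical mypy output; the paths and line numbers are made up for illustration.
SAMPLE_OUTPUT: str = """\
great_expectations/core/util.py:10: error: Function is missing a type annotation  [no-untyped-def]
great_expectations/core/util.py:42: error: Function is missing a return type annotation  [no-untyped-def]
great_expectations/validator/validator.py:7: error: Incompatible return value type  [return-value]
Found 3 errors in 2 files (checked 2 source files)
"""


def filter_untyped_defs(output: str) -> List[str]:
    """Keep only the lines that report an untyped-def error."""
    return [
        line
        for line in output.splitlines()
        if "error:" in line and "untyped-def" in line
    ]


def group_by_file(rows: List[str]) -> Dict[str, List[str]]:
    """Group error lines by the file path that precedes the first colon."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for row in rows:
        grouped[row.split(":")[0]].append(row)
    return grouped


deviations: List[str] = filter_untyped_defs(SAMPLE_OUTPUT)
by_file: Dict[str, List[str]] = group_by_file(deviations)

# Only report errors for files touched by the (hypothetical) change set.
changed_files: List[str] = ["great_expectations/core/util.py", "docs/index.md"]
for path in changed_files:
    for error in by_file.get(path, []):
        print(error)
```

Splitting on the first colon to recover the path works for POSIX-style paths such as those on the `ubuntu-latest` agent; a Windows path with a drive letter would need different handling.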