[FEATURE] Add Azure CI/CD action to aid with style guide enforcement (type hints) (#4878)

* [FEATURE] Introduce ParameterBuilder Evaluation Dependencies and Validation Dependencies 2022 03 23 66 (#4531)

* [BUGFIX] Moves testing dependencies out of core reqs (#4522)

* removing upper bound on mistune

* remove deprecated dependencies

* adds untracked dependency

* adds untracked dependency

* adds untracked dependency

* moving dependencies

* removes dependencies added to lite from core | adds missing dependencies

Co-authored-by: Chetan Kini <chetan@superconductive.com>

* [FEATURE] Convert Existing Self-Initializing Expectations to Make ExpectationConfigurationBuilder Self-Contained with its own validation_parameter_builder settings (#4547)

* [MAINTENANCE] Don't return from validate configuration methods (#4545)

* Add validate_configuration to 2 core Expectations that are passing all their tests

* Comment out examples for expect_column_values_to_match_regex.py... its test_definitions JSON has many more tests

* Update all validate_configuration methods to have type hints and return None

* Update all doc snippet references that were affected

* [DOCS] technical term tags connect to data cloud docs (#4414)

* - Adds technical tags to all documents in the Connect to data: Cloud section of the docs. (Note, the term in the <WhereToRunCode /> imported component was tagged in a different PR.)
- Some additional editing was done to bring documents in line with the documentation and how-to guide standards.

* - Fixed extra </Tabs> and </TabItem> closures from prior commit to resolve conflicts with develop.

* - Update to include technical term tags. (#4462)

- Minor updates to correct formatting and spelling issues.

* - Moved docs related to contributing integrations under contributing in the ToC (#4551)

- Minor edit to title of "How to write integration documentation" to conform to ToC standards (not title cased unless containing a Technical Term).

* - Adds new image files for the intro page (#4540)

- Updates the image file link for the overview image on the intro page

* [DOCS] clarifications on execution engines and scalability (#4539)

* - DOC-184: Specify in the tutorial that Spark and SqlAlchemy are also supported Execution Engines.

* - DOC-183: In the Execution Engine technical term page, list the class names for Execution Engines and specify that spark is supported as a scalable alternative to Pandas.

* - DOC-182: In the connect to data: overview section for "configuring your datasource's execution engine" list the class names for execution engines.

* [DOCS] technical terms for validate data advanced (#4535)

* - add support for technical term tags.

* - added technical term tags.
- Updated how to validate data without a checkpoint to mention the replacement workflow and that additional documentation is forthcoming.
- NOTE: how_to_implement_custom_notifications needs to be rewritten, and was not edited as part of this PR.

* - added technical term tags.
- Updated how to validate data without a checkpoint to mention the replacement workflow and that additional documentation is forthcoming.
- NOTE: how_to_implement_custom_notifications needs to be rewritten, and was not edited as part of this PR.

* [DOCS] technical terms for validate data actions docs (#4518)

* - Edits to bring docs up to documentation and how-to guide standards.

* - add technical term tags to documents.
- minor formatting edits (technical terms missing capitalization, etc).

* [MAINTENANCE] Rule-Based Profiler: Refactor utilities into appropriate modules/classes for better separation of concerns (#4553)

* [MAINTENANCE] Refactor global `conftest` (#4534)

* chore: use black directives to temporarily disable linting

* chore: more black directives to temporarily disable linting

* chore: finish remaining

* refactor: start cleaning up conftest

* refactor: more refactoring of conftest

* refactor: even more refactoring of conftest

* [FEATURE] Improve diagnostic checklist details (#4548)

* Update library_metadata check to provide details when it doesn't pass

* In linting check, if snake_case doesn't match filename, show computed snake_case

* Change class name for expect_column_values_to_be_valid_ipv4.py and remove package attr

* Update convert_to_json_serializable to handle bytes

* Update build_gallery.py script to convert diagnostics to JSON in separate try/except

* Update build_gallery.py script to write expectation_library_v2.json file with indenting

* Update _check_input_validation to tell if custom assert statements are used in validate_configuration

* clean up (#4554)

* minor touch up (#4558)

* [MAINTENANCE] Refactor Anonymizer utilizing the Strategy design pattern (#4485)

* feat: init commit

* refactor: shift all logic over to base class

* feat: start impl of anonymize on Anonymizer

* feat: get ProfilerRunAnonymizer working

* refactor: remove constructor from ProfilerRunAnonymizer

* refactor: start on CheckpointRunAnonymizer

* fix: clean up broken checkpoint tests

* fix: ensure *args and **kwargs are propagated through

* refactor: start work on datasource anonymizers

* refactor: remove all anonymizers except Anonymizer from usage stats attrs

* fix: update isinstance checks

* refactor: move helper into checkpoint_run_anonymizer

* refactor: move helper into datasource_anonymizer

* refactor: make anonymize string private and place in strategy

* refactor: make anonymize batch info private and place in strategy

* refactor: move build_init_payload to Anonymizer

* refactor: make remainder of anonymize methods private

* refactor: add store info to strategy

* refactor: add dataconnector info to strategy

* refactor: consolidate profiler info and profiler run anonymization

* refactor: remove *args from signatures

* refactor: updates around checkpoint anonymization

* chore: misc cleanup of Anonymizer

* feat: final touch up before review

* chore: remove 'else' statements

* fix: ensure appropriate checkpoint method gets called

* chore: misc updates from review

* refactor: move init_payload back to usage stats

* chore: misc type hinting

* refactor: start using individual classes again

* chore: continue updating individual anonymizer classes

* feat: further updates to child classes

* feat: update anonymize_init_payload

* fix: get checkpoint payloads working

* refactor: ensure all methods have obj

* fix: misc fixes

* fix: make misc updates to conditional checks for obj

* refactor: rename ExpectationAnonymizer to ExpectationSuiteAnonymizer

* refactor: rename Checkpoint and Profiler anonymizers

* feat: leverage aggregate anonymizer downstream

* feature: conditionally create aggregate_anonymizer in constructor

* feat: add cache retrieve or instantiate util

* chore: add batch_request can_handle

* feat: ensure that salt has a default value in anonymizers

* refactor: require aggregate anonymizer in constructor

* refactor: instantiate all strategies in aggregate

* fix: fix broken tests

* refactor: rename internal getter

Co-authored-by: Don Heppner <donald.heppner@gmail.com>

* [MAINTENANCE] Remove duplicate mistune dependency

* [MAINTENANCE] Run PEP-273 checks on a schedule or release cut

* [DOCS] correct code reference line numbers and snippet tags for how to create a batch of data from an in memory data frame (#4573)

* -Corrected the line references and added <snippet> tags to source code for Spark version of guide.

* -Corrected the line references and added <snippet> tags to source code for Pandas version of guide.

* -lint reformat w/black

* -correcting line numbers after lint formatting.

* [MAINTENANCE] Package dependencies usage stats instrumentation - part 1 (#4546)

Usage stats instrumentation of package dependencies

* [MAINTENANCE] Add DevRel team to GitHub auto-label action

* [MAINTENANCE] Add GitHub action to conditionally auto-update PRs (#4574)

* feat: add new action

* chore: add conditions

* [MAINTENANCE] Bump version of `black` in response to hotfix for Click v8.1.0 (#4577)

* chore: bump version

* chore: test change

* chore: update all instances of black

* chore: new test changes

* chore: revert test changes

* Update overview.md (#4556)

* Add missing links.
* Fix some typos
* Simplify flow and grammar in a few places

Co-authored-by: Austin Ziech Robinson <44794138+austiezr@users.noreply.github.com>

* - corrected broken link in admonition box. (#4585)

- updated links in admonition box to point to current technical documentation rather than old core concepts documents.

* [MAINTENANCE] Minor clean-up (#4571)

Little bit of cleanup in our execution engine and validator

* [BUGFIX] Adjust output of datetime `ParameterBuilder` to match Expectation (#4590)

* fix : misconfigured ExpectationConfigurationBuilder

* pushing fix

* clean up before submitting for review

* bugfix : remove sorting

* remove extra line

* [MAINTENANCE] Instrument package dependencies (#4583)

* Add dependencies to data_context.__init__ event

* [MAINTENANCE] Standardize DomainBuilder Constructor Arguments Ordering (#4599)

* release candidate for 0.14.13

* feat: add script

* feat: add action

* feat: update condition

* feat: update action

* feat: integrate script

* chore: ensure we checkout

* feat: misc updates

* feat: misc updates

* chore: update

* chore: update

* test: add docstrings

* chore: revert test changes

* feat: update

* feat: update

* feat: update

* feat: update

* feat: update

* feat: update

* feat: update

* feat: update

* feat: update

* feat: update

* feat: update

* feat: general cleanup and tightening of script

* chore: add docstring checker to pipeline

* docs: add docstrings and type hints

* revert to 0.14.12 state

* chore: misc fixes after review

* feat: write script

* chore: remove files from other branch

* refactor: port over script to Python

* chore: update threshold

* chore: misc cleanup

* chore: misc cleanup

Co-authored-by: Alex Sherstinsky <alexsherstinsky@users.noreply.github.com>
Co-authored-by: Austin Ziech Robinson <44794138+austiezr@users.noreply.github.com>
Co-authored-by: kenwade4 <95714847+kenwade4@users.noreply.github.com>
Co-authored-by: Rachel-Reverie <94694058+Rachel-Reverie@users.noreply.github.com>
Co-authored-by: Don Heppner <donald.heppner@gmail.com>
Co-authored-by: Anthony Burdi <anthony@superconductive.com>
Co-authored-by: Abe Gong <abegong@users.noreply.github.com>
Co-authored-by: William Shin <will@superconductive.com>
Co-authored-by: Ben Horkley <horkley@superconductive.com>
10 people committed Apr 19, 2022
1 parent eb13884 commit 00b9a77
Showing 2 changed files with 128 additions and 0 deletions.
9 changes: 9 additions & 0 deletions azure-pipelines-dependency-graph-testing.yml
@@ -104,7 +104,15 @@ stages:
    dependsOn: scope_check
    pool:
      vmImage: 'ubuntu-latest'

    jobs:
      - job: type_hint_checker
        steps:
        - script: |
            pip install mypy # Prereq for type hint script
            python scripts/check_type_hint_coverage.py
          name: TypeHintChecker
      - job: docstring_checker
        steps:
        - bash: python scripts/check_docstring_coverage.py
@@ -114,6 +122,7 @@ stages:
    dependsOn: [lint]
    pool:
      vmImage: 'ubuntu-18.04'

    jobs:
      - job: import_ge
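
The `TypeHintChecker` step above communicates its result to the pipeline only through the script's exit status, and the script signals failure with a bare `assert`. The sketch below is a hedged illustration of that mechanism, not part of the commit: `SNIPPET` and `exit_code` are hypothetical names, and the failing assertion is a stand-in for the real check. It shows that an uncaught `AssertionError` produces a non-zero exit status, and that running the interpreter with `-O` strips the assertion so the process exits cleanly.

```python
import subprocess
import sys

# Hypothetical stand-in for the check script: a bare assert that always fails.
SNIPPET = "assert 3 <= 2, 'threshold surpassed'"


def exit_code(*interpreter_flags: str) -> int:
    """Run SNIPPET in a fresh interpreter and return its exit status."""
    completed = subprocess.run(
        [sys.executable, *interpreter_flags, "-c", SNIPPET],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    return completed.returncode


if __name__ == "__main__":
    # Uncaught AssertionError: the interpreter exits with a non-zero status.
    print(exit_code())
    # With -O, assert statements are removed, so the process exits with 0.
    print(exit_code("-O"))
```

Because the pipeline invokes plain `python`, the assertion is active there. A check that must also hold under `-O` would need an explicit `raise` or `sys.exit(1)` instead.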

119 changes: 119 additions & 0 deletions scripts/check_type_hint_coverage.py
@@ -0,0 +1,119 @@
import subprocess
from collections import defaultdict
from typing import Dict, List, Optional

TYPE_HINT_ERROR_THRESHOLD: int = (
    2867  # This number is to be reduced as we annotate more functions!
)


def get_changed_files(branch: str) -> List[str]:
    """Perform a `git diff` against a given branch.
    Args:
        branch (str): The branch to diff against (generally `origin/develop`)
    Returns:
        A list of changed files.
    """
    git_diff: subprocess.CompletedProcess = subprocess.run(
        ["git", "diff", branch, "--name-only"],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    return [f for f in git_diff.stdout.split()]


def run_mypy(directory: str) -> List[str]:
    """Run mypy to identify functions with type hint violations.
    Flags:
        --ignore-missing-imports: Omitting for simplicity's sake (https://mypy.readthedocs.io/en/stable/running_mypy.html#missing-imports)
        --disallow-untyped-defs: What is responsible for highlighting function signature errors
        --show-error-codes: Allows us to label each error with its code, enabling filtering
        --install-types: We need common type hints from typeshed to get a more thorough analysis
        --non-interactive: Automatically say yes to '--install-types' prompt
    Args:
        directory (str): The target directory to run mypy against
    Returns:
        A list containing filtered mypy output relevant to function signatures
    """
    raw_results: subprocess.CompletedProcess = subprocess.run(
        [
            "mypy",
            "--ignore-missing-imports",
            "--disallow-untyped-defs",
            "--show-error-codes",
            "--install-types",
            "--non-interactive",
            directory,
        ],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )

    filtered_results: List[str] = _filter_mypy_results(raw_results)
    return filtered_results


def _filter_mypy_results(raw_results: subprocess.CompletedProcess) -> List[str]:
    def _filter(line: str) -> bool:
        return "error:" in line and "untyped-def" in line

    return list(filter(lambda line: _filter(line), raw_results.stderr.split("\n")))


def render_deviations(changed_files: List[str], deviations: List[str]) -> None:
    """Iterates through changed files in order to provide the user with useful feedback around mypy type hint violations
    Args:
        changed_files (List[str]): The files relevant to the given commit/PR
        deviations (List[str]): mypy deviations as generated by `run_mypy`
    Raises:
        AssertionError if number of style guide violations is higher than threshold
    """
    deviations_dict: Dict[str, List[str]] = _build_deviations_dict(deviations)

    error_count: int = len(deviations)
    print(f"[SUMMARY] {error_count} functions have untyped-def violations!")

    threshold_is_surpassed: bool = error_count > TYPE_HINT_ERROR_THRESHOLD

    if threshold_is_surpassed:
        print(
            "\nHere are violations of the style guide that are relevant to the files changed in your PR:"
        )
        for file in changed_files:
            errors: Optional[List[str]] = deviations_dict.get(file)
            if errors:
                print(f"\n {file}:")
                for error in errors:
                    print(f" {error}")

    # Chetan - 20220417 - While this number should be 0, getting the number of style guide violations down takes time
    # and effort. In the meanwhile, we want to set an upper bound on errors to ensure we're not introducing
    # further regressions. As functions are annotated in adherence with style guide standards, developers should update this number.
    assert (
        threshold_is_surpassed is False
    ), f"""A function without proper type annotations was introduced; please resolve the matter before merging.
    We expect there to be {TYPE_HINT_ERROR_THRESHOLD} or fewer violations of the style guide (actual: {error_count})"""


def _build_deviations_dict(mypy_results: List[str]) -> Dict[str, List[str]]:
    deviations_dict: Dict[str, List[str]] = defaultdict(list)
    for row in mypy_results:
        file: str = row.split(":")[0]
        deviations_dict[file].append(row)

    return deviations_dict


if __name__ == "__main__":
    changed_files: List[str] = get_changed_files("origin/develop")
    untyped_def_deviations: List[str] = run_mypy("great_expectations")
    render_deviations(changed_files, untyped_def_deviations)
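
The script's filter, group, and threshold steps can be tried without invoking mypy or git. The following is a minimal sketch under stated assumptions: `SAMPLE_OUTPUT` is made-up text modeled on mypy's `path:line: error: message  [code]` layout, the paths and the `changed_files` list are hypothetical, and the helper names (`filter_untyped_defs`, `group_by_file`, `exceeds_threshold`) are illustrative rather than taken from the commit.

```python
from collections import defaultdict
from typing import Dict, List

# Illustrative mypy-style output; paths, line numbers, and messages are made up.
SAMPLE_OUTPUT = """\
pkg/a.py:3: error: Function is missing a type annotation  [no-untyped-def]
pkg/a.py:9: error: Incompatible return value type  [return-value]
pkg/b.py:1: error: Function is missing a return type annotation  [no-untyped-def]
Found 3 errors in 2 files (checked 2 source files)
"""


def filter_untyped_defs(output: str) -> List[str]:
    """Keep only error lines that carry the untyped-def error code."""
    return [
        line
        for line in output.splitlines()
        if "error:" in line and "untyped-def" in line
    ]


def group_by_file(lines: List[str]) -> Dict[str, List[str]]:
    """Group error lines by the path that precedes the first colon."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for line in lines:
        grouped[line.split(":")[0]].append(line)
    return dict(grouped)


def exceeds_threshold(error_count: int, threshold: int) -> bool:
    """Return True only when the count is strictly above the allowed maximum."""
    return error_count > threshold


if __name__ == "__main__":
    deviations = filter_untyped_defs(SAMPLE_OUTPUT)
    by_file = group_by_file(deviations)
    # Hypothetical result of `git diff --name-only`.
    changed_files = ["pkg/b.py", "README.md"]
    for path in changed_files:
        for error in by_file.get(path, []):
            print(error)
    print(exceeds_threshold(len(deviations), threshold=2))
```

Two details follow from this design. The comparison is strict, so a count equal to the threshold passes, and the threshold only ratchets down when someone lowers it by hand. Splitting on the first colon assumes relative POSIX-style paths, so a path containing a drive letter such as `C:\...` would be grouped under `C`.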
