Add QualityTest script type and contracts for it #20

Merged
merged 4 commits into from
Nov 3, 2022
1 change: 1 addition & 0 deletions .idea/pricecypher_python_sdk.iml


14 changes: 10 additions & 4 deletions README.md
@@ -26,9 +26,15 @@ datasets.get_transactions(DATASET_ID, AGGREGATE, columns)
```

### Contracts
-The `Script` or `ScopeScript` abstract classes can be extended with their abstract methods implemented to create
-scripts usable in other services. The `ScopeScript` in particular is intended for scripts that calculate values of
-certain scopes for transactions. See the documentation on the abstract functions for further specifics.
+The `Script`, `ScopeScript`, and `QualityTestScript` abstract classes can be extended with their abstract methods
+implemented to create scripts usable in other services.
+
+The `ScopeScript` in particular is intended for scripts that calculate values of certain scopes for transactions.
+
+The `QualityTestScript` is intended for scripts that check the quality of a data intake and produce a standardized
+output that can be visualized and/or used by other services.
+
+See the documentation on the abstract functions for further specifics.

## Development

@@ -51,7 +57,7 @@ The SDK that this package provides is contained in the top-level package content
## Authors

* **Marijn van der Horst** - *Initial work*
-* **Pieter Voors** - *Contracts*
+* **Pieter Voors** - *Contracts for Script and ScopeScript*

See also the list of [contributors](https://github.com/marketredesign/pricecypher_python_sdk/contributors) who participated in this project.

3 changes: 2 additions & 1 deletion setup.cfg
@@ -1,6 +1,6 @@
[metadata]
name = pricecypher-sdk
-version = 0.4.1
+version = 0.5.0
author = Deloitte Consulting B.V.
description = Python wrapper around the different PriceCypher APIs
long_description = file: README.md
@@ -23,6 +23,7 @@ install_requires =
requests>=2.14.0
marshmallow-dataclass>=8.5.3
pandas>=1.4.1
+numpy>=1.18.5
typeguard>=2.13.3

[options.packages.find]
26 changes: 26 additions & 0 deletions src/pricecypher/contracts/QualityTestScript.py
@@ -0,0 +1,26 @@
from abc import ABC, abstractmethod
from typing import Optional, Any

from pricecypher.contracts import Script, TestSuite


class QualityTestScript(Script, ABC):
    """
    The abstract QualityTestScript class serves as an interaction contract such that by extending it with its
    methods implemented, a script can be created that performs data quality tests on a dataset, which can then be
    used in a generalized yet controlled setting.
    """

    def execute(self, business_cell_id: Optional[int], bearer_token: str, user_input: dict[Any, Any]) -> Any:
        # Quality test scripts do not use user_input; delegate to execute_tests.
        return self.execute_tests(business_cell_id, bearer_token)

    @abstractmethod
    def execute_tests(self, business_cell_id: Optional[int], bearer_token: str) -> TestSuite:
        """
        Execute the data quality tests of this script.

        :param business_cell_id: Business cell to execute the script for, or None if running the script for all.
        :param bearer_token: Bearer token to use for additional requests.
        :return: Test suite containing the results of all tests performed by this script.
        """
        raise NotImplementedError
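In this contract `execute` discards `user_input` and forwards the call to `execute_tests`, so a concrete quality test script only has to implement `execute_tests`. The sketch below illustrates that delegation with local stand-ins so it runs without the SDK installed. `QualityTestScriptStandIn`, `RowCountScript`, and its `rows` constructor argument are hypothetical; the real `Script` base class is not part of this diff and may declare further abstract methods, and a real script would presumably fetch its data using `bearer_token`.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class TestSuite:
    # Trimmed stand-in with the same fields as the TestSuite dataclass in this PR.
    label: str
    key: str
    category_key: str
    test_results: list = field(default_factory=list)


class QualityTestScriptStandIn(ABC):
    # Stand-in for QualityTestScript; assumes only the two methods shown in this diff.
    def execute(self, business_cell_id: Optional[int], bearer_token: str, user_input: dict[Any, Any]) -> Any:
        # user_input is ignored; the call is forwarded to execute_tests.
        return self.execute_tests(business_cell_id, bearer_token)

    @abstractmethod
    def execute_tests(self, business_cell_id: Optional[int], bearer_token: str) -> TestSuite:
        raise NotImplementedError


class RowCountScript(QualityTestScriptStandIn):
    # Hypothetical script; a real one would fetch data with bearer_token instead of receiving rows.
    def __init__(self, rows: list[dict]):
        self.rows = rows

    def execute_tests(self, business_cell_id: Optional[int], bearer_token: str) -> TestSuite:
        # Keep the rows of the requested business cell, or all rows when business_cell_id is None.
        rows = [r for r in self.rows if business_cell_id is None or r.get("business_cell_id") == business_cell_id]
        # A real script would put TestResult objects here; a bare count keeps the sketch short.
        return TestSuite(label="Row count", key="row-count", category_key="basic", test_results=[len(rows)])


script = RowCountScript([{"business_cell_id": 1}, {"business_cell_id": 2}, {"business_cell_id": 1}])
suite = script.execute(1, "dummy-token", {})
print(suite.key, suite.test_results)  # row-count [2]
```

Because `execute_tests` is abstract, instantiating the base class directly raises `TypeError`, which forces every quality test script to provide its own implementation.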
3 changes: 3 additions & 0 deletions src/pricecypher/contracts/__init__.py
@@ -1,2 +1,5 @@
from .enums import *
from .dataclasses import *
from .Script import Script
from .ScopeScript import ScopeScript
from .QualityTestScript import QualityTestScript
85 changes: 85 additions & 0 deletions src/pricecypher/contracts/dataclasses/TestResult.py
@@ -0,0 +1,85 @@
from dataclasses import dataclass
from typing import Union

from pricecypher.contracts import TestStatus


@dataclass
class ElementTestResult:
    """
    Defines a test result of one element of a test.

    key (str): Unique identifier of this element test result (lowercase kebab-case), e.g. 'nr-null'.

    label (str): Label of the element test result for display purposes, e.g. 'NULL values'.

    value (str or int): The formatted value of the element test result, e.g. '23,734'.
    """
    key: str
    label: str
    value: Union[str, int]


@dataclass
class ElementTest:
    """
    Defines the test of a single element of a test case, having one or multiple test results. For instance, one element
    could be a single column in a test that checks the number of NULL values for all columns of a dataset.

    label (str): Label of this single element of the test for display purposes, e.g. the name of the column.

    message (str): Short message that describes the test results for display purposes, e.g. 'The column has no NULL
    values.'

    status (TestStatus): Status of the test for this single element.

    results (list[ElementTestResult]): The test results for this single element. For instance, a count of the total
    number of values, a count of the NULL values, and the percentage of NULL values.
    """
    label: str
    message: str
    status: TestStatus
    results: list[ElementTestResult]


@dataclass
class TestResult:
    """
    Defines one test case, with an overall status and the results of each of its elements.

    key (str): Unique identifier of the test (lowercase kebab-case), e.g. 'expect-no-null-values'.

    label (str): Label of the test for display purposes, e.g. 'Expect no NULL values in the dataset.'

    coverage (str): Short description of what the test covers, for display purposes, e.g. '10 columns' or
    'All transactions'.

    status (TestStatus): Overall status of the test.

    element_label (str): Label that describes what the different test elements represent, e.g. 'Column' or 'Dataset'.

    elements (list[ElementTest]): Test results of all the different elements in the test. For instance, the test
    results of all the columns of the dataset.
    """
    key: str
    label: str
    coverage: str
    status: TestStatus
    element_label: str
    elements: list[ElementTest]


@dataclass
class TestSuite:
    """
    One quality test script always produces one TestSuite response. A test suite usually contains multiple test cases.
    It also defines a category that front-ends can use to group multiple test suites together.

    label (str): Label of the test suite, e.g. 'Completeness'.

    key (str): Unique identifier of the test suite (lowercase kebab-case), e.g. 'basic-completeness'.

    category_key (str): Unique identifier of the category this test suite belongs to, e.g. 'basic' or 'advanced'.

    test_results (list[TestResult]): All test cases of this test suite, with their results.
    """
    label: str
    key: str
    category_key: str
    test_results: list[TestResult]
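The four dataclasses nest as `TestSuite` → `TestResult` → `ElementTest` → `ElementTestResult`. The following sketch assembles a NULL-value test from plain Python rows. The dataclasses are copied from this file (docstrings omitted) so the block runs on its own; `null_value_test`, its pass/fail rule (any NULL fails the column), and the sample rows are assumptions for illustration only.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Union


class TestStatus(str, Enum):
    success = "success"
    warning = "warning"
    fail = "fail"


@dataclass
class ElementTestResult:
    key: str
    label: str
    value: Union[str, int]


@dataclass
class ElementTest:
    label: str
    message: str
    status: TestStatus
    results: list[ElementTestResult]


@dataclass
class TestResult:
    key: str
    label: str
    coverage: str
    status: TestStatus
    element_label: str
    elements: list[ElementTest]


@dataclass
class TestSuite:
    label: str
    key: str
    category_key: str
    test_results: list[TestResult]


def null_value_test(rows: list[dict], columns: list[str]) -> TestResult:
    """Hypothetical NULL check: a column fails as soon as it contains a single None value."""
    elements = []
    for column in columns:
        nr_null = sum(1 for row in rows if row.get(column) is None)
        status = TestStatus.success if nr_null == 0 else TestStatus.fail
        message = "The column has no NULL values." if nr_null == 0 else f"The column has {nr_null:,} NULL value(s)."
        elements.append(ElementTest(
            label=column,
            message=message,
            status=status,
            results=[
                ElementTestResult(key="nr-values", label="Values", value=f"{len(rows):,}"),
                ElementTestResult(key="nr-null", label="NULL values", value=f"{nr_null:,}"),
            ],
        ))
    # The test as a whole fails if any of its elements (columns) failed.
    overall = TestStatus.fail if any(e.status == TestStatus.fail for e in elements) else TestStatus.success
    return TestResult(
        key="expect-no-null-values",
        label="Expect no NULL values in the dataset.",
        coverage=f"{len(columns)} columns",
        status=overall,
        element_label="Column",
        elements=elements,
    )


rows = [{"price": 10.0, "region": "NL"}, {"price": None, "region": "BE"}]
suite = TestSuite(
    label="Completeness",
    key="basic-completeness",
    category_key="basic",
    test_results=[null_value_test(rows, ["price", "region"])],
)
print(suite.test_results[0].status.value)  # fail
```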
1 change: 1 addition & 0 deletions src/pricecypher/contracts/dataclasses/__init__.py
@@ -0,0 +1 @@
from .TestResult import *
7 changes: 7 additions & 0 deletions src/pricecypher/contracts/enums/TestStatus.py
@@ -0,0 +1,7 @@
from enum import Enum


class TestStatus(str, Enum):
    success = "success"
    warning = "warning"
    fail = "fail"
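Mixing `str` into the enum makes each `TestStatus` member behave like its string value, which suits a contract whose output is consumed by other services. A short demonstration using a copy of the enum:

```python
import json
from enum import Enum


class TestStatus(str, Enum):
    success = "success"
    warning = "warning"
    fail = "fail"


# Members compare equal to their plain string values.
print(TestStatus.fail == "fail")  # True
# A status string received from another service converts back to a member.
print(TestStatus("warning") is TestStatus.warning)  # True
# The standard json module emits the bare value, without a custom encoder.
print(json.dumps({"status": TestStatus.success}))  # {"status": "success"}
```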
1 change: 1 addition & 0 deletions src/pricecypher/contracts/enums/__init__.py
@@ -0,0 +1 @@
from .TestStatus import TestStatus
2 changes: 1 addition & 1 deletion src/pricecypher/datasets.py
@@ -147,7 +147,7 @@ def get_transactions(
:param bool aggregate: If true, the transactions will be grouped on all categorical columns that have no
aggregation method specified.
:param list columns: Desired columns in the resulting dataframe. Each column must be a dict. Each column must
-have either a `representation` or a `name_dataset` specified. The following properties are optional.
+have a `representation`, `scope_id`, or `name_dataset` specified. The following properties are optional.
`filter`: value or list of values the resulting transactions should be filtered on.
`aggregate`: aggregation method that should be used for this column. When aggregating and no
aggregation method is specified, the method that is used is determined by the underlying dataset
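The revised docstring lets a column be identified by `scope_id` as well as by `representation` or `name_dataset`. A caller could check that rule before calling `get_transactions` with something like the sketch below. `validate_columns` is hypothetical (not an SDK function), it reads the rule as "at least one of the three keys", and the sample column values are made up.

```python
from typing import Any

# Keys of which at least one must identify each column (assumed reading of the get_transactions docstring).
IDENTIFYING_KEYS = ("representation", "scope_id", "name_dataset")


def validate_columns(columns: list[dict[str, Any]]) -> None:
    """Hypothetical helper: raise ValueError if a column is not a dict or has none of the identifying keys."""
    for index, column in enumerate(columns):
        if not isinstance(column, dict):
            raise ValueError(f"Column {index} must be a dict, got {type(column).__name__}.")
        if not any(key in column for key in IDENTIFYING_KEYS):
            raise ValueError(f"Column {index} needs one of: {', '.join(IDENTIFYING_KEYS)}.")


columns = [
    {"representation": "price"},
    {"scope_id": 3},
    {"name_dataset": "region", "filter": ["NL"]},
]
validate_columns(columns)  # passes silently
```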
21 changes: 21 additions & 0 deletions src/pricecypher/encoders/JsonEncoder.py
@@ -0,0 +1,21 @@
import dataclasses
import numpy as np

from json import JSONEncoder


class PriceCypherJsonEncoder(JSONEncoder):
    """
    JSON encoder that can properly serialize dataclasses and numpy numbers and arrays.
    """
    def default(self, obj):
        # is_dataclass() is also true for dataclass types, but asdict() only accepts instances.
        if dataclasses.is_dataclass(obj) and not isinstance(obj, type):
            return dataclasses.asdict(obj)
        elif isinstance(obj, np.integer):
            return int(obj)
        elif isinstance(obj, np.floating):
            return float(obj)
        elif isinstance(obj, np.ndarray):
            return obj.tolist()

        return super().default(obj)
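`default` is only called for objects the `json` module cannot serialize natively, and `dataclasses.asdict` recurses into nested dataclasses and lists, so a whole `TestSuite` can be serialized in one call. The sketch below is a stdlib-only variant of the encoder (numpy branches dropped so it runs without numpy; `StdlibJsonEncoder` and the trimmed dataclasses are hypothetical).

```python
import dataclasses
import json
from json import JSONEncoder


class StdlibJsonEncoder(JSONEncoder):
    """Trimmed variant of PriceCypherJsonEncoder without the numpy branches."""

    def default(self, obj):
        # is_dataclass() is also true for dataclass types; asdict() only accepts instances.
        if dataclasses.is_dataclass(obj) and not isinstance(obj, type):
            return dataclasses.asdict(obj)  # recurses into nested dataclasses and lists
        return super().default(obj)  # raises TypeError for unsupported objects


@dataclasses.dataclass
class ElementTestResult:
    key: str
    label: str
    value: int


@dataclasses.dataclass
class ElementTest:
    label: str
    results: list[ElementTestResult]


element = ElementTest(label="price", results=[ElementTestResult("nr-null", "NULL values", 0)])
print(json.dumps(element, cls=StdlibJsonEncoder))
# {"label": "price", "results": [{"key": "nr-null", "label": "NULL values", "value": 0}]}
```

With the SDK installed, the equivalent call would be `json.dumps(suite, cls=PriceCypherJsonEncoder)` after `from pricecypher.encoders import PriceCypherJsonEncoder`.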
1 change: 1 addition & 0 deletions src/pricecypher/encoders/__init__.py
@@ -0,0 +1 @@
from .JsonEncoder import PriceCypherJsonEncoder