input_checker

The input_checker package enables users to compare a given data frame against a benchmark data frame. The package executes that by keeping track of the key information in the benchmark data frame and cross-checking the comparison data frame against those tracked characteristics.

The package currently contains five main checks;

Null checker: ensures that columns with missing values in the benchmark data frame are the only columns with missing values in the comparison data frame
Dtype checker: ensures that columns in the comparison data frame are of the same data type as in the benchmark data frame
Categorical value checker: ensures that categorical columns in the comparison data frame only contain values that exist in the benchmark data frame
Numerical checker: ensures that the values of the numerical columns in the comparison data frame lie within the minimum and maximum range of the numerical columns in the benchmark data frame
Datetime checker: ensures that the values of datetime columns in the comparison data frame lie beyond the minimum date (optionally maximum) of datetime columns in the benchmark data frame

The package has multiple usage areas including but not limited to ensuring that data points sent to the model in live environment matches key characteristics of the data the model was initially trained on.

Here is a simple example of using input_checker to compare training data to test data;

import pandas as pd
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split

import input_checker
from input_checker.checker import InputChecker

# load and prepare sklearn wine dataset
wine = load_wine()

df_wine = pd.DataFrame(wine['data'], columns = wine['feature_names'])
df_wine['target'] = wine['target']

# split into train/test sets
df_train, df_test = train_test_split(df_wine, test_size=0.2)

# define numerical columns 
# please note; the original wine dataset only has numerical fields
# please refer to the example notebook under the examples folder for 
# using input_checker with different dtypes and missing values
numerical_columns = ['alcohol', 'malic_acid', 'ash', 'alcalinity_of_ash', 'magnesium',
       'total_phenols', 'flavanoids', 'nonflavanoid_phenols',
       'proanthocyanins', 'color_intensity', 'hue',
       'od280/od315_of_diluted_wines', 'proline']

# define input_checker
checker = InputChecker(columns=numerical_columns,
                       numerical_columns=numerical_columns) 

# fitting input_checker
checker.fit(df_train)

# compare test data frame to the training data frame
df_test_checked = checker.transform(df_test)

Installation

input_checker can be installed from PyPI simply with;

pip install input_checker

Documentation

Documentation for input_checker can be found on readthedocs.

Examples

To help get started there is an example notebook in the examples folder that shows how to use input_checker.

Build and test

The test framework we are using for this project is pytest, to run the tests follow the steps below.

First clone the repo and move to the root directory;

git clone https://github.com/lvgig/input_checker.git
cd input_checker

Then install input_checker in editable mode;

pip install -e . -r requirements-dev.txt

Then run the tests simply with pytest

pytest

Contribute

input_checker is under active development, we're super excited if you're interested in contributing! See the CONTRIBUTING.md for the full details of our working practices.

For bugs and feature requests please open an issue.

Name		Name	Last commit message	Last commit date
Latest commit History 3 Commits
.githooks		.githooks
docs		docs
examples		examples
input_checker		input_checker
tests		tests
.coveragerc		.coveragerc
.flake8		.flake8
.gitignore		.gitignore
.readthedocs.yaml		.readthedocs.yaml
CONTRIBUTING.md		CONTRIBUTING.md
Changelog.md		Changelog.md
LICENSE		LICENSE
README.md		README.md
bandit.yml		bandit.yml
requirements-dev.txt		requirements-dev.txt
requirements.txt		requirements.txt
setup.py		setup.py

License

lvgig/input_checker

Folders and files

Latest commit

History

Repository files navigation

input_checker

Installation

Documentation

Examples

Build and test

Contribute

About

Topics

Resources

License

Stars

Watchers

Forks

Languages