This repository has been archived by the owner on Jan 9, 2024. It is now read-only.

Merge pull request #40 from georgianpartners/issue_33
Implement basic integration tests
alexrallen committed Feb 20, 2019
2 parents 7f5ce68 + 787a063 commit 6896f1d
Showing 5 changed files with 117 additions and 8 deletions.
5 changes: 4 additions & 1 deletion .travis.yml
@@ -24,6 +24,9 @@ script:
 after_success:
   - poetry run coveralls

+env:
+  - FORESHADOW_TESTS="ALL"
+
 jobs:
   include:
     - python: "3.5"
@@ -35,4 +38,4 @@ jobs:
         - pip install pre-commit
         - pre-commit install-hooks
       script:
-        - pre-commit run --all-files
+        - pre-commit run --all-files
10 changes: 8 additions & 2 deletions doc/developers.rst
@@ -109,6 +109,8 @@ Making sure everything works
 If all the tests pass you're all set up!

+.. note:: Our platform also includes integration tests that assess the overall performance of our framework using the default settings on a few standard ML datasets. By default these tests are not executed; to run them, set an environment variable called `FORESHADOW_TESTS` to `ALL`.
+
 Suggested development work flow
 1. Create a branch off of development to contain your change
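
The note above about `FORESHADOW_TESTS` describes an opt-in gate for the slow tests. A minimal sketch of that gating logic follows, assuming a hypothetical helper name `integration_tests_enabled` (it is not part of Foreshadow; the commit itself uses an inverted check called `check_slow`). The environment is passed in as a mapping so the behavior can be exercised without touching the real process environment.

```python
import os


def integration_tests_enabled(environ=None):
    """Return True only when FORESHADOW_TESTS is set to exactly "ALL".

    Hypothetical helper for illustration; not part of the Foreshadow API.
    """
    if environ is None:
        environ = os.environ
    return environ.get("FORESHADOW_TESTS") == "ALL"


# The comparison is exact, so any other value leaves the tests skipped.
print(integration_tests_enabled({"FORESHADOW_TESTS": "ALL"}))
print(integration_tests_enabled({"FORESHADOW_TESTS": "all"}))
print(integration_tests_enabled({}))
```

From a shell this would correspond to something like `FORESHADOW_TESTS=ALL pytest`, assuming the suite is launched with pytest directly.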

@@ -199,10 +201,14 @@ Intents are where the magic of Foreshadow all comes together. You need to be tho

 You will need to set the :py:attr:`dtype <foreshadow.intents.BaseIntent.dtype>`, :py:attr:`children <foreshadow.intents.BaseIntent.children>`, :py:attr:`single_pipeline <foreshadow.intents.BaseIntent.single_pipeline>`, and :py:attr:`multi_pipeline <foreshadow.intents.BaseIntent.multi_pipeline>` class attributes. You will also need to implement the :py:meth:`is_intent <foreshadow.intents.BaseIntent.is_intent>` classmethod. In most cases when adding an intent you can initialize :py:attr:`children <foreshadow.intents.BaseIntent.children>` to an empty list. Set the :py:attr:`dtype <foreshadow.intents.BaseIntent.dtype>` to the most appropriate initial form of the data entering your intent.

-Use the :py:attr:`single_pipeline <foreshadow.intents.BaseIntent.single_pipeline>` field to determine the transformers that will be applied to a **single** column that is mapped to your intent. Add a **unique** name describing each step that you choose to include in your pipeline. It is important to note the utility of smart transformers here as you can now include branched logic in your pipelines deciding between different individual transformers based on the input data at runtime. The :py:attr:`multi_pipeline <foreshadow.intents.BaseIntent.multi_pipeline>` pipeline should be used to apply transformations to all columns of a specific intent after the single pipelines have been evaluated. The same rules for defining the pipelines themselves apply here as well.
+Use the :py:attr:`single_pipeline <foreshadow.intents.BaseIntent.single_pipeline>` field to determine the transformers that will be applied to a **single** column that is mapped to your intent. Add a **unique** name describing each step that you choose to include in your pipeline. This field is represented as a list of PipelineTemplateEntry objects, which are constructed using the following format: `PipelineTemplateEntry([unique_name], [class], [can_operate_on_y])`. The class is either a single transformer class or a tuple of the form `([cls], {**args})`, where args will be passed into the constructor of the transformer. The final boolean determines whether that transformer should be applied when operating on y-variables.
+
+It is important to note the utility of smart transformers here as you can now include branched logic in your pipelines deciding between different individual transformers based on the input data at runtime. The :py:attr:`multi_pipeline <foreshadow.intents.BaseIntent.multi_pipeline>` pipeline should be used to apply transformations to all columns of a specific intent after the single pipelines have been evaluated. The same rules for defining the pipelines themselves apply here as well.

 The :py:meth:`is_intent <foreshadow.intents.BaseIntent.is_intent>` classmethod determines whether a specific column maps to an intent. Use this method to apply any heuristics, logic, or methods of determining whether a raw column maps to the intent that you are defining. Below is an example intent definition that you can modify to suit your needs.

+The :py:meth:`column_summary <foreshadow.intents.BaseIntent.column_summary>` classmethod is used to generate statistical reports each time an intent operates on a column, allowing a user to examine how effective the intent will be in processing the data. These reports can be accessed by calling the :py:meth:`summarize <foreshadow.preprocessor.summarize>` method after fitting the Foreshadow object.
+
 Make **sure** to go to the parent intent and add your intent class name to the ordered :py:attr:`children <foreshadow.intents.BaseIntent.children>` field in the order of priority among the previously defined intents. The last intent in this list will be the most preferred intent upon evaluation in the case of multiple intents being able to process a column.

 Take a look at the :py:class:`NumericIntent <foreshadow.intents.NumericIntent>` implementation for an example of how to implement an intent.
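
The `PipelineTemplateEntry` format described in this hunk can be illustrated with a small, self-contained sketch. Everything here is an assumption made for illustration: the namedtuple is a stand-in for Foreshadow's real class (its field names are guessed from the documented argument order), `Scaler` and `Imputer` are placeholder transformers, and `build_steps` is a hypothetical helper, not Foreshadow's actual resolution logic.

```python
from collections import namedtuple

# Stand-in for foreshadow's PipelineTemplateEntry; the field names are
# assumptions chosen to mirror the documented argument order.
PipelineTemplateEntry = namedtuple(
    "PipelineTemplateEntry", ["unique_name", "transformer", "can_operate_on_y"]
)


class Scaler:
    """Placeholder transformer with no constructor arguments."""


class Imputer:
    """Placeholder transformer that accepts a constructor argument."""

    def __init__(self, strategy="mean"):
        self.strategy = strategy


single_pipeline_template = [
    # Tuple form: (class, kwargs passed to the constructor).
    PipelineTemplateEntry("impute", (Imputer, {"strategy": "median"}), False),
    # Bare class form: instantiated with no arguments.
    PipelineTemplateEntry("scale", Scaler, True),
]


def build_steps(template, for_y=False):
    """Turn template entries into (name, instance) pairs.

    When for_y is True, entries whose can_operate_on_y flag is False are
    dropped, mirroring the documented meaning of the final boolean.
    """
    steps = []
    for entry in template:
        if for_y and not entry.can_operate_on_y:
            continue
        if isinstance(entry.transformer, tuple):
            cls, kwargs = entry.transformer
            steps.append((entry.unique_name, cls(**kwargs)))
        else:
            steps.append((entry.unique_name, entry.transformer()))
    return steps


print([name for name, _ in build_steps(single_pipeline_template)])
print([name for name, _ in build_steps(single_pipeline_template, for_y=True)])
```

In this sketch only the `scale` step survives when building a pipeline for a response variable, because `impute` was declared with its final boolean set to `False`.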
@@ -211,4 +217,4 @@ Take a look at the :py:class:`NumericIntent <foreshadow.intents.NumericIntent>`
 Future Architecture Roadmap
 ---------------------------

-Under progress
+In progress
2 changes: 2 additions & 0 deletions doc/users.rst
@@ -336,6 +336,8 @@ other than the :code:`override` parameter itself will be passed to the override
 To use a smart transformer outside of the Intent / Foreshadow environment simply use it exactly as a sklearn transformer. When you call :code:`fit()` or :code:`fit_transform()` it automatically
 resolves which transformer to use by internally calling the overridden :code:`_get_transformer()` method.

+.. note:: Arguments passed into the constructor of a smart transformer will be passed into the fit function of the transformer it resolves to. This is primarily meant to be used alongside the override argument.
+

 Configuration
 -------------
11 changes: 6 additions & 5 deletions foreshadow/intents/base.py
@@ -67,16 +67,17 @@ class BaseIntent(metaclass=_IntentRegistry):

     single_pipeline_template = None
     """A template for single pipelines of smart transformers that affect a
-    single column in an intent
+    single column in an intent. Uses a list of PipelineTemplateEntry to
+    describe the transformers.
-    The template needs an additional boolean at the end of the tuple that
+    The template needs an additional boolean at the end of the constructor that
     determines whether the transformation can be applied to response
     variables.
     Example: single_pipeline_template = [
-        ('t1', Transformer1, False),
-        ('t2', (Transformer2, {'arg1': True}), True),
-        ('t3', Transformer1, True),
+        PipelineTemplateEntry('t1', Transformer1, False),
+        PipelineTemplateEntry('t2', (Transformer2, {'arg1': True}), True),
+        PipelineTemplateEntry('t3', Transformer1, True),
     ]
     """

97 changes: 97 additions & 0 deletions foreshadow/tests/test_integration.py
@@ -0,0 +1,97 @@
+"""
+Integration Tests
+Slow-running tests that verify the performance of the framework on simple datasets
+"""
+
+import pytest
+
+
+def check_slow():
+    import os
+
+    return os.environ.get("FORESHADOW_TESTS") != "ALL"
+
+
+slow = pytest.mark.skipif(
+    check_slow(), reason="Skipping long-running integration tests"
+)
+
+
+@slow
+def test_integration_binary_classification():
+    import foreshadow as fs
+    import pandas as pd
+    import numpy as np
+    from sklearn.datasets import load_breast_cancer
+    from sklearn.model_selection import train_test_split
+    from sklearn.linear_model import LogisticRegression
+
+    np.random.seed(1337)
+
+    cancer = load_breast_cancer()
+    cancerX_df = pd.DataFrame(cancer.data, columns=cancer.feature_names)
+    cancery_df = pd.DataFrame(cancer.target, columns=["target"])
+
+    X_train, X_test, y_train, y_test = train_test_split(
+        cancerX_df, cancery_df, test_size=0.2
+    )
+    shadow = fs.Foreshadow(estimator=LogisticRegression())
+    shadow.fit(X_train, y_train)
+
+    baseline = 0.9824561403508771
+    score = shadow.score(X_test, y_test)
+
+    assert not score < baseline * 0.9
+
+
+@slow
+def test_integration_multiclass_classification():
+    import foreshadow as fs
+    import numpy as np
+    import pandas as pd
+    from sklearn.datasets import load_iris
+    from sklearn.model_selection import train_test_split
+    from sklearn.linear_model import LogisticRegression
+
+    np.random.seed(1337)
+
+    iris = load_iris()
+    irisX_df = pd.DataFrame(iris.data, columns=iris.feature_names)
+    irisy_df = pd.DataFrame(iris.target, columns=["target"])
+
+    X_train, X_test, y_train, y_test = train_test_split(
+        irisX_df, irisy_df, test_size=0.2
+    )
+    shadow = fs.Foreshadow(estimator=LogisticRegression())
+    shadow.fit(X_train, y_train)
+
+    baseline = 0.9666666666666667
+    score = shadow.score(X_test, y_test)
+
+    assert not score < baseline * 0.9
+
+
+@slow
+def test_integration_regression():
+    import foreshadow as fs
+    import numpy as np
+    import pandas as pd
+    from sklearn.datasets import load_boston
+    from sklearn.model_selection import train_test_split
+    from sklearn.linear_model import LinearRegression
+
+    boston = load_boston()
+    bostonX_df = pd.DataFrame(boston.data, columns=boston.feature_names)
+    bostony_df = pd.DataFrame(boston.target, columns=["target"])
+
+    X_train, X_test, y_train, y_test = train_test_split(
+        bostonX_df, bostony_df, test_size=0.2
+    )
+    shadow = fs.Foreshadow(estimator=LinearRegression())
+    shadow.fit(X_train, y_train)
+
+    baseline = 0.6953024611269096
+    score = shadow.score(X_test, y_test)
+
+    assert not score < baseline * 0.9
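
Each test above accepts a score that is no more than 10% below a recorded baseline. One detail of the `assert not score < baseline * 0.9` form is that it also passes when the score is NaN, since every comparison with NaN is false. A minimal sketch of the same threshold check with an explicit NaN guard follows, assuming a hypothetical helper `meets_baseline` (not part of the commit or of Foreshadow):

```python
import math


def meets_baseline(score, baseline, tolerance=0.1):
    """Return True when score is at least (1 - tolerance) * baseline.

    Hypothetical helper for illustration. A NaN score is rejected
    explicitly rather than slipping through a negated comparison.
    """
    if math.isnan(score):
        return False
    return score >= baseline * (1.0 - tolerance)


print(meets_baseline(0.95, 0.98))
print(meets_baseline(0.50, 0.98))
print(meets_baseline(float("nan"), 0.98))

# For comparison, the negated form used in the tests accepts NaN.
print(not float("nan") < 0.98 * 0.9)
```

Whether a NaN score can actually occur with these estimators is not established by the commit; the guard is shown only to make the difference between the two forms visible.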
