multiple sensitive features - postprocessing (#288)
* take changes from other branch that touches all modules

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* get all tests working again

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* Dashboard for Census Notebook (#171)

Update the existing Census notebook for grid search to use the new dashboard. The bulk of the notebook is unchanged (including the fictional motivating scenario).

Signed-off-by: Richard Edgar <riedgar@microsoft.com>
Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* Stop installing old dashboard (#176)

Have moved notebooks off the old dashboard. Remove dependency from pipelines

Signed-off-by: Richard Edgar <riedgar@microsoft.com>
Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* Update ReadMe with Yarn instructions (#177)

Now that the dashboard tarball is no longer checked in, provide instructions on creating it in a cloned repo

Signed-off-by: Richard Edgar <riedgar@microsoft.com>
Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* Standardise input convertors for test (#178)

Create and use a standard set of convertors for use with our 'argument type' tests. This has required adding several `__init__.py` files to the `test` directory to enable `pytest` to find the common code.

Also add an 'argument type' test to `ExponentiatedGradient`

Signed-off-by: Richard Edgar <riedgar@microsoft.com>
Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* Small fixes to get the documentation appearing (#179)

Fix issues in getting documentation to appear in Sphinx.

Signed-off-by: Richard Edgar <riedgar@microsoft.com>
Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* law school notebook (#169)

* law school notebook

Signed-off-by: Miro Dudik <mdudik@gmail.com>

* Remove hypens from filename
Some copy edit fixes
Correct suspected bug in ExponentiatedGradient section

Signed-off-by: Richard Edgar <riedgar@microsoft.com>

* Didn't quite undo all my temporary changes

Signed-off-by: Richard Edgar <riedgar@microsoft.com>

* address some of the comments

Signed-off-by: Miro Dudik <mdudik@gmail.com>

* Fix typo in name

Signed-off-by: Richard Edgar <riedgar@microsoft.com>

* Improve spacing and add a comment in expgrad section

Signed-off-by: Richard Edgar <riedgar@microsoft.com>

* add AUC explanation

Signed-off-by: Miro Dudik <mdudik@gmail.com>
Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* Code cleanups (#181)

Fix some minor things:
- Make the dashboard use the same copyright notice as the rest of the code
- Some renaming of `expgrad` to `ExponentiatedGradient`

Signed-off-by: Richard Edgar <riedgar@microsoft.com>
Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* Build the widget (#185)

Add a job template which builds the widget to the PR-Gate, Nightly and Nightly-Fixed builds. Note that this does not run any tests, but just ensures that the widget builds successfully

Signed-off-by: Richard Edgar <riedgar@microsoft.com>
Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* update logging to use FileHandler instead of basicConfig (#175)

Signed-off-by: Ilya Matiach <ilmat@microsoft.com>
Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* Enable ReadTheDocs (#182)

Change how the documentation is done slightly, so that our documentation can show up on ReadTheDocs. Some additional copy-editing of the in-code documentation has been done as a result of this.

The docs should appear at:
https://fairlearn.readthedocs.io/en/latest/

Signed-off-by: Richard Edgar <riedgar@microsoft.com>
Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* Pin scikit-learn (#189)

The recent update to scikit-learn is causing a break in one of the Notebooks. Until this is debugged, pin the version

Signed-off-by: Richard Edgar <riedgar@microsoft.com>
Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* Add more flake8 checks (#187)

Add a number of extra flake8 checks:
- flake8-blind-except
- flake8-builtins
- flake8-docstrings
- flake8-logging-format
- flake8-rst-docstrings

Since these create a huge number of issues, suppress a lot of these for now in `setup.cfg` (plus a handful of special cases done inline). Put in fixes for the simpler complaints, such as:
- Separate summaries in docstrings
- Spacing within and around docstrings
- Deferring string interpolation in `logging` calls

Signed-off-by: Richard Edgar <riedgar@microsoft.com>
Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* Rename files and update license and docs (#183)

* rename  files

* update comment

* update license

* address comments

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* Fix for Law School Notebook (#191)

Tweak the Law School notebook so that it works with the latest `scikit-learn`

This enables us to unpin the version of `scikit-learn` in our `requirements.txt` file

Signed-off-by: Richard Edgar <riedgar@microsoft.com>
Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* Markdown updates based on doc bash (#186)

* address feedback from doc bash

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* latex updates

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* latex update

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* latex update

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* undo latex changes

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* remove commas

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* rephrasing postprocessing constructor requirements

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* feedback from Miro

Signed-off-by: Roman Lutz <rolutz@microsoft.com>
Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* Reorganise documentation (#192)

Reorganising how the documentation is presented, since the default style from `sphinx-apidoc` assumed we had lots of individual modules rather than larger packages

Signed-off-by: Richard Edgar <riedgar@microsoft.com>
Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* Declare 0.4.0 release (#193)

Signed-off-by: Richard Edgar <riedgar@microsoft.com>
Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* Workaround Python 3.5 issue with Linux (#194)

An issue with the `pip` install of `shap` has appeared on the Linux agents under Python 3.5. Reasons are currently obscure, but this is blocking a release. Since Python 3.5 continues to work on Windows, rely on that (pending further debugging)

Signed-off-by: Richard Edgar <riedgar@microsoft.com>
Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* fix classification bug (#201)

* Update readme for v0.4.0 (#196)

Signed-off-by: Richard Edgar <riedgar@microsoft.com>

* Pin troublesome package (#198)

During our release process, a new version of `colorama` (required by one of our dependencies) was released. This has issues with the Windows/3.7 build.

Unblock the release by pinning the version

Signed-off-by: Richard Edgar <riedgar@microsoft.com>

* fix classification bug

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* version change to address security bug (#203)

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* Fix accidental merge (#205)

Some portions of the v0.4.0 release branch were accidentally merged into master
- Making the ReadMe version suitable for PyPI
- Pinning the `colorama` version to unblock the release train

This changeset undoes these fixes in master

Signed-off-by: Richard Edgar <riedgar@microsoft.com>
Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* ReadMe Processor for Releases (#206)

Create a python script to translate `ReadMe.md` from GitHub to PyPI. This will avoid the need to create a branch to do a release.

This script is slightly dependent on the structure of the file, so if there are substantial changes to that, this script will require updating. It also assumes that a tag `v(fairlearn.__version__)` exists in the repo.

Signed-off-by: Richard Edgar <riedgar@microsoft.com>
Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* Track pip dependencies (#208)

We've had trouble with our dependencies updating and breaking our builds

Augment the build pipelines so that they publish the output of `pip freeze` to an artifact. This will aid debugging these issues. The name of both the artifact itself and the file therein can be specified.

Signed-off-by: Richard Edgar <riedgar@microsoft.com>
Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* Remove some flake8 global suppressions (#209)

After adding more `flake8` analysers, we were obliged to put in some global suppressions to keep the number of issues manageable. Start the process of removing these with D102, D103 and D401. Some of these just move the suppression to file-level, while others tweak documentation blocks to suit.

Signed-off-by: Richard Edgar <riedgar@microsoft.com>
Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* Re-enable Linux 3.5 (#210)

Roman figured out a workaround for getting `shap` installed with Linux and Python 3.5. Put this into `fairlearn`

Signed-off-by: Richard Edgar <riedgar@microsoft.com>
Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* Expand Notebook testing (#212)

Increase the variety of platforms used for testing our Jupyter Notebooks. Unable to test on MacOS at present, due to some problem installing `lightgbm`.

Signed-off-by: Richard Edgar <riedgar@microsoft.com>
Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* Improvements for pinning requirements (#213)

A better way of running our tests with pinned requirements. Rather than have a separate `requirements-fixed.txt` file, have a script to turn the `requirements.txt` file into the former. Update builds accordingly.
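For illustration, a minimal sketch of what such a pinning script could look like (the file names and helper name are hypothetical, not necessarily what the repo uses):

```python
# Hypothetical sketch of a requirements-pinning helper; the actual script may differ.
import re
from importlib.metadata import version  # Python 3.8+; older interpreters can use pkg_resources


def pin_requirements(src="requirements.txt", dst="requirements-fixed.txt"):
    """Rewrite each loose requirement in src to the exact version currently installed."""
    pinned = []
    with open(src) as f:
        for line in f:
            name = re.split(r"[\[<>=!~;\s]", line.strip(), maxsplit=1)[0]
            if not name or name.startswith("#"):
                pinned.append(line.rstrip())  # keep blank lines and comments as-is
                continue
            pinned.append("{}=={}".format(name, version(name)))
    with open(dst, "w") as f:
        f.write("\n".join(pinned) + "\n")
```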

Signed-off-by: Richard Edgar <riedgar@microsoft.com>
Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* Standardise ML argument documentation (#214)

Make our documentation of fit(), X, predict(), etc. more consistent across our various submodules.

Signed-off-by: Richard Edgar <riedgar@microsoft.com>
Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* perf test through Azure ML (#180)

* perf test first version through Azure ML

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* move some code to tempeh

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* add missing files

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* perf tests that get auth details through Azure Keyvault

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* upgrade to alpha tempeh version

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* flake8

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* exclude D100 and D103 for script generation python file

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* move variables into nightly-perf.yaml

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* azureml sdk requirement for perf test

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* remove powershell syntax

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* add cwd for tests

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* print cwd

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* remove special working directory condition

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* fix directory handling based on ADO

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* tempeh bump to a2

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* print message for debugging

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* try upper case variables

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* remove extraneous dash

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* print env var names

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* try explicitly adding variables

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* use variables directly, tempeh bump

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* add hardcoded data as variables

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* use windows instead of linux because some of the UI packages aren't available in linux

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* add wheel dependency

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* fake dashboard files

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* pass parameters for perf tests via args

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* yaml fix

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* yaml fix

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* remove waiting for run to complete

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* refactor to submit all jobs without waiting for result

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* remove obsolete gitignore line

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* tempeh bump to 0.1.11

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* azureml-sdk warning

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* pipeline improvements to use keyvault tasks

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* logically separate script generation into steps

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* simplify writing long string of = signs

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* flake8

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* replace incorrect variables in yaml

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* flake8

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* use importerror instead of modulenotfounderror for py3.5

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* add PR trigger for changes to test/perf directory

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* remove output from notebook

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* quotes for yaml variables

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* correct parameter in yaml

Signed-off-by: Roman Lutz <rolutz@microsoft.com>
Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* Documentation and flake8 (#215)

Various updates for the documentation:
- Remove another `flake8` global suppression
- Add explanations for remaining `flake8` suppressions
- Replace the `:any:` references in the documentation with appropriate ones
- Make some file-level suppressions (which may have actually turned `flake8` off entirely on the file) specific to the appropriate lines

Signed-off-by: Richard Edgar <riedgar@microsoft.com>
Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* Update Release pipeline after KV move (#216)

The KeyVault containing the PyPI secrets has been moved to a more appropriate subscription. As a result, the Release pipeline needs to be updated with the correct service connection

Signed-off-by: Richard Edgar <riedgar@microsoft.com>
Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* Basic unit tests for EqualizedOdds and DemographicParity moments (#217)

Some very basic unit tests for the `EqualizedOdds` and `DemographicParity` moment classes. These are 'pinning' tests two establish the behaviour of these classes. The `gamma` method is not yet included in these tests, since that requires a trained model.

Signed-off-by: Richard Edgar <riedgar@microsoft.com>
Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* Add more time metrics to performance tests (#219)

* perf test first version through Azure ML

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* move some code to tempeh

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* add missing files

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* perf tests that get auth details through Azure Keyvault

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* upgrade to alpha tempeh version

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* flake8

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* exclude D100 and D103 for script generation python file

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* move variables into nightly-perf.yaml

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* azureml sdk requirement for perf test

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* remove powershell syntax

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* add cwd for tests

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* print cwd

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* remove special working directory condition

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* fix directory handling based on ADO

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* tempeh bump to a2

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* print message for debugging

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* try upper case variables

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* remove extraneous dash

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* print env var names

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* try explicitly adding variables

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* use variables directly, tempeh bump

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* add hardcoded data as variables

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* use windows instead of linux because some of the UI packages aren't available in linux

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* add wheel dependency

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* fake dashboard files

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* pass parameters for perf tests via args

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* yaml fix

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* yaml fix

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* remove waiting for run to complete

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* refactor to submit all jobs without waiting for result

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* remove obsolete gitignore line

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* tempeh bump to 0.1.11

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* azureml-sdk warning

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* pipeline improvements to use keyvault tasks

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* logically separate script generation into steps

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* simplify writing long string of = signs

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* flake8

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* replace incorrect variables in yaml

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* flake8

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* use importerror instead of modulenotfounderror for py3.5

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* add PR trigger for changes to test/perf directory

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* remove output from notebook

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* quotes for yaml variables

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* correct parameter in yaml

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* add additional time-based metrics

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* adjustments to fix syntax errors and logical issues in the calculation of metrics

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* add oracle calls

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* custom metrics for execution times: min, max, mean

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* undo sphinx special docs for test/perf

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* flake8

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* flake8

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* flake8

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* flake8

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* flake8

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* update description of oracle execution time properties

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* flake8

Signed-off-by: Roman Lutz <rolutz@microsoft.com>
Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* Fix perf tests by logging lists through log_list instead of log (#221)

* perf test first version through Azure ML

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* move some code to tempeh

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* add missing files

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* perf tests that get auth details through Azure Keyvault

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* upgrade to alpha tempeh version

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* flake8

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* exclude D100 and D103 for script generation python file

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* move variables into nightly-perf.yaml

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* azureml sdk requirement for perf test

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* remove powershell syntax

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* add cwd for tests

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* print cwd

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* remove special working directory condition

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* fix directory handling based on ADO

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* tempeh bump to a2

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* print message for debugging

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* try upper case variables

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* remove extraneous dash

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* print env var names

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* try explicitly adding variables

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* use variables directly, tempeh bump

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* add hardcoded data as variables

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* use windows instead of linux because some of the UI packages aren't available in linux

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* add wheel dependency

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* fake dashboard files

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* pass parameters for perf tests via args

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* yaml fix

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* yaml fix

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* remove waiting for run to complete

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* refactor to submit all jobs without waiting for result

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* remove obsolete gitignore line

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* tempeh bump to 0.1.11

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* azureml-sdk warning

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* pipeline improvements to use keyvault tasks

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* logically separate script generation into steps

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* simplify writing long string of = signs

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* flake8

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* replace incorrect variables in yaml

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* flake8

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* use importerror instead of modulenotfounderror for py3.5

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* add PR trigger for changes to test/perf directory

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* remove output from notebook

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* quotes for yaml variables

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* correct parameter in yaml

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* add additional time-based metrics

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* adjustments to fix syntax errors and logical issues in the calculation of metrics

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* add oracle calls

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* custom metrics for execution times: min, max, mean

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* undo sphinx special docs for test/perf

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* flake8

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* flake8

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* flake8

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* flake8

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* flake8

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* update description of oracle execution time properties

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* flake8

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* bug fix for list logging

Signed-off-by: Roman Lutz <rolutz@microsoft.com>
Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* make metric logging a lot more readable and provide additional metrics to show the overhead fairlearn adds (#228)

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* Convert Notebook tests to papermill (#223)

Rather than using `nbval`, convert our notebook tests to use `papermill`. With the help of `nteract-scrapbook` we can then examine the contents of particular variables from the notebooks to ensure that we're getting the expected results.

Explicit `scrapbook` commands are required to save out values for future examination, but we don't want to include these when our users look at the notebooks. Accordingly, we include machinery for adding the necessary cells to the notebooks dynamically.
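As a rough illustration of the approach (the notebook path and scrap name below are made up, not the ones used in the build), a test can execute a notebook with `papermill` and then read recorded values back with `scrapbook`:

```python
# Illustrative only: the notebook path and scrap name are hypothetical.
import papermill as pm
import scrapbook as sb

# Execute the notebook; cells injected beforehand call sb.glue("accuracy", value) to record results.
pm.execute_notebook("example-notebook.ipynb", "example-notebook.output.ipynb")

# Read the recorded scraps from the executed notebook and check the expected results.
nb = sb.read_notebook("example-notebook.output.ipynb")
assert nb.scraps["accuracy"].data > 0.8
```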

Signed-off-by: Richard Edgar <riedgar@microsoft.com>
Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* Bump dashboard npm package to match source code (#229)

* publish latest version

* update docs for push

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* Remove unused ReST files (#233)

Two of the ReST files generated by sphinx-autodoc weren't actually used. Remove them to get rid of a warning.

Signed-off-by: Richard Edgar <riedgar@microsoft.com>
Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* Basic Moments documentation (#241)

Add some basic documentation of the `Moment` class and its subclasses.

Also:
- Turn the `n` field of the `Moment` object into a `total_samples` property
- Add `intersphinx` hook for `pandas` documentation

Signed-off-by: Richard Edgar <riedgar@microsoft.com>
Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* [WIP] create extensions to install custom plots separately & check in generated files (#240)

* check in generated javascript files and split into package with extensions

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* add installation tests, move yml files to templates directory if appropriate, delete unused and broken yml file

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* flake8

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* separate directories per package, composition with minimal fairlearn package

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* script updates to get doc and wheel builds in shape

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* update yml files and scripts to enable wheel upload per package

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* address feedback from PR by adding documentation to the pipeline definition yml files

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* remove "templates/" as location prefix for files in the templates directory itself

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* first version of widget build validation script

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* corrections in widget build validation

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* undo adjustments to completely split up packages

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* fix yaml

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* reverse code coverage build changes

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* fix yml

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* ignore install tests when necessary

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* add macos python 3.5

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* add exceptions module back to documentation

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* add name for job

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* fix characters in job name

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* fix job name

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* add image label

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* correct installation path

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* undo other changes to rst file

Signed-off-by: Roman Lutz <rolutz@microsoft.com>
Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* Add logging variant of numpy.all_close (#246)

The `numpy` package provides an `allclose` routine for comparing two arrays. Unfortunately, there's no mechanism for showing which elements failed the comparison. Put together a wrapper based on `numpy.isclose` which will print out information about failed comparisons.
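A minimal sketch of such a wrapper (the helper added here may differ in name and details), based on `numpy.isclose` with deferred string interpolation in the `logging` calls:

```python
# Illustrative sketch; the actual helper may differ in name and details.
import logging

import numpy as np

logger = logging.getLogger(__name__)


def assert_allclose_with_logging(actual, expected, rtol=1e-5, atol=1e-8):
    """Like numpy.allclose, but log every element that fails the comparison."""
    actual = np.asarray(actual)
    expected = np.asarray(expected)
    close = np.isclose(actual, expected, rtol=rtol, atol=atol)
    if not close.all():
        for idx in np.argwhere(~close):
            i = tuple(idx)
            logger.error("Mismatch at %s: %s != %s", i, actual[i], expected[i])
    return bool(close.all())
```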

Signed-off-by: Richard Edgar <riedgar@microsoft.com>
Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* Implement GroupMetricSet (#250)

Create a `GroupMetricSet` class for holding collections of grouped metrics. This is to help with AzureML integration.

There has been some (possibly unnecessary) reorganisation of things under `fairlearn/metrics` but the public interface is unchanged.

Signed-off-by: Richard Edgar <riedgar@microsoft.com>
Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* Exclude install tests in code coverage check (#251)

* ignore install tests since they'll unexpectedly work

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* add python -m before pip install

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* upgrade tempeh to v0.1.12

Signed-off-by: Roman Lutz <rolutz@microsoft.com>
Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* Replace powershell scripts with python and add Makefile (#249)

* check in generated javascript files and split into package with extensions

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* add installation tests, move yml files to templates directory if appropriate, delete unused and broken yml file

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* flake8

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* separate directories per package, composition with minimal fairlearn package

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* script updates to get doc and wheel builds in shape

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* update yml files and scripts to enable wheel upload per package

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* address feedback from PR by adding documentation to the pipeline definition yml files

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* remove "templates/" as location prefix for files in the templates directory itself

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* first version of widget build validation script

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* corrections in widget build validation

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* undo adjustments to completely split up packages

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* fix yaml

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* reverse code coverage build changes

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* fix yml

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* ignore install tests when necessary

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* add macos python 3.5

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* add exceptions module back to documentation

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* add name for job

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* fix characters in job name

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* fix job name

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* add image label

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* correct installation path

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* undo other changes to rst file

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* rewrite scripts in python

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* replace widget build script with python script

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* build_widget adjustments to make it work

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* build_widget finalization plus add ls commands to find yarn installation in ADO

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* some more paths to check

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* task -> script

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* usr/bin/yarn check

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* workingDirectory adjustment

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* add ls

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* adjustment for fairlearn root dir check

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* add ./

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* fix template

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* flake8

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* comment about set-variable-from-file script only being required in ADO

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* add makefile, update contributing guide, and replace remaining ps1 occurrences in pipeline ymls

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* add romanlutz to codeowners for scripts dir

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* fix comment

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* make process_readme a standalone script again

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* delete build_docs

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* flake8

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* makefile adjustments according to feedback

Signed-off-by: Roman Lutz <rolutz@microsoft.com>
Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* Fix pypi release yaml (#260)

* undo erroneous changes to yaml

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* undo prior erroneous change

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* replace job template usage with just a step

Signed-off-by: Roman Lutz <rolutz@microsoft.com>
Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* add CHANGES.md for v0.4.2 (#262)

* add CHANGES.md for v0.4.2

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* add general instructions to always do that

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* Update CHANGES.md

Adding `GroupMetricSet` to the changelog

Signed-off-by: Richard Edgar <riedgar@microsoft.com>

* comment out test that fails consistently only on windows

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* fix readme processing script by adding fairlearn dir to sys path, add second solution for issue 265

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* fix syntax error, flake8

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* bump version to 0.4.2

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* test with list of lists instead of single list

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

Co-authored-by: Richard Edgar <riedgar@microsoft.com>
Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* Update metric keys to match dashboard (#268)

The dashboard already had its own keys defined for mapping metric functions to strings. Update the `GroupMetricSet` to use the same keys.

Figuring out how to unify the two implementations of this mapping is left as issue #269.

Signed-off-by: Richard Edgar <riedgar@microsoft.com>
Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* Fix release blockers - widget generated files, widget validation (#267)

* add built widget file updates & fix widget build validation, as well as pypi release template for empty DEV_VERSION

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* undo DEV_VERSION change

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* add comment and link to issue

Signed-off-by: Roman Lutz <rolutz@microsoft.com>
Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* remove --assert-no-changes flag in release as well (#272)

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* Update default metrics in GroupMetricSet (#271)

Tweak the list of metrics computed by default by the `compute` method of `GroupMetricSet` to match those expected by the dashboard

Signed-off-by: Richard Edgar <riedgar@microsoft.com>
Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* set env var before installing fairlearn to correct version file name content (#274)

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* MNT use sklearn's NotFittedError instead of NotFittedException (#259)

* MNT use sklearn's NotFittedError instead of NotFittedException

Signed-off-by: adrinjalali <adrin.jalali@gmail.com>

* add to the changelog

Signed-off-by: adrinjalali <adrin.jalali@gmail.com>
Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* Updates for GroupMetricResult and GroupMetricSet (#279)

Add (in)equality operators to `GroupMetricResult` and `GroupMetricSet`, along with basic tests. These will simplify other testing in future.

Change `GroupMetricSet` so that the `groups` have to be specified as sequential integers from zero. If this is not the case, the `compute()` method will remap the supplied groups to `[0, 1, 2, ...]` and put the stringified original values into the `group_names` property. Since the keys are now sequential integers, convert the `group_names` property itself from a dictionary into a list.
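A sketch of the remapping behaviour described above (illustrative, not the actual implementation):

```python
# Illustrative sketch of the group remapping; not the actual GroupMetricSet code.
def remap_groups(groups):
    """Map arbitrary group labels to sequential integers, keeping stringified originals."""
    unique = sorted(set(groups), key=str)
    group_names = [str(g) for g in unique]      # e.g. ['a', 'b']
    index_of = {g: i for i, g in enumerate(unique)}
    remapped = [index_of[g] for g in groups]    # e.g. [0, 1, 0]
    return remapped, group_names


remapped, names = remap_groups(['a', 'b', 'a'])
assert remapped == [0, 1, 0]
assert names == ['a', 'b']
```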

Closes #275

Signed-off-by: Richard Edgar <riedgar@microsoft.com>
Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* DOC contributing: trim lines and add notes on signoff (#276)

* DOC contributing: trim lines and add notes on signoff

Signed-off-by: adrinjalali <adrin.jalali@gmail.com>

* hook

Signed-off-by: adrinjalali <adrin.jalali@gmail.com>

* modify note to point to the right answer

Signed-off-by: adrinjalali <adrin.jalali@gmail.com>
Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* Further metric changes (#281)

A number of extra changes to metrics:

- `GroupMetricResult` now dynamically calculates `maximum`, `range` etc.
- `GroupMetricSet` has a consistency check
- `GroupMetricSet` can transform itself to and from a dictionary matching the schema used by the dashboard

Signed-off-by: Richard Edgar <riedgar@microsoft.com>
Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* Preparations for v0.4.3 Release (#284)

Bump version and update Markdown files

Signed-off-by: Richard Edgar <riedgar@microsoft.com>
Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* Use kwargs in metrics (#286)

Change `metric_by_group` and `make_group_metric` to understand `**kwargs`. This removes the need for lots of small wrapper functions
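The idea, sketched with illustrative signatures rather than the actual fairlearn ones: forwarding `**kwargs` to the underlying metric lets callers pass extra arguments (such as `pos_label`) straight through, so no dedicated wrapper per metric is needed:

```python
# Illustrative sketch only; the real metric_by_group/make_group_metric signatures may differ.
import numpy as np
from sklearn.metrics import recall_score


def metric_by_group(metric_function, y_true, y_pred, group_membership, **kwargs):
    """Evaluate metric_function on each group, forwarding extra keyword arguments."""
    y_true, y_pred, groups = map(np.asarray, (y_true, y_pred, group_membership))
    return {
        g: metric_function(y_true[groups == g], y_pred[groups == g], **kwargs)
        for g in np.unique(groups)
    }


# No per-metric wrapper needed: pos_label is simply forwarded to recall_score.
by_group = metric_by_group(recall_score, [0, 1, 1, 0], [0, 1, 0, 0],
                           ['a', 'a', 'b', 'b'], pos_label=1)
```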

Signed-off-by: Richard Edgar <riedgar@microsoft.com>
Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* take changes from other branch that touches all modules

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* get all tests working again

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* squeeze instead of reshape, deselect instead of skip in pytest, utility function for compression

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* flake8

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

Co-authored-by: Richard Edgar <riedgar@microsoft.com>
Co-authored-by: MiroDudik <mdudik@gmail.com>
Co-authored-by: Ilya Matiach <ilmat@microsoft.com>
Co-authored-by: Brandon Horn <rihorn@microsoft.com>
Co-authored-by: Adrin Jalali <adrin.jalali@gmail.com>
6 people committed Feb 10, 2020
1 parent 2ffe87c commit 7253eb8
Showing 7 changed files with 531 additions and 265 deletions.
100 changes: 98 additions & 2 deletions fairlearn/_input_validation.py
@@ -3,18 +3,113 @@

import numpy as np
import pandas as pd
from sklearn.utils.validation import check_X_y, check_consistent_length, check_array


_KW_SENSITIVE_FEATURES = "sensitive_features"

_MESSAGE_X_NONE = "Must supply X"
_MESSAGE_Y_NONE = "Must supply y"
_MESSAGE_SENSITIVE_FEATURES_NONE = "Must specify {0} (for now)".format(_KW_SENSITIVE_FEATURES)
_MESSAGE_X_Y_ROWS = "X and y must have same number of rows"
_MESSAGE_X_SENSITIVE_ROWS = "X and the sensitive features must have same number of rows"
_INPUT_DATA_FORMAT_ERROR_MESSAGE = "The only allowed input data formats for {} are: {}. " \
"Your provided data was of type {}."
_EMPTY_INPUT_ERROR_MESSAGE = "At least one of sensitive_features, labels, or scores are empty."
_SENSITIVE_FEATURES_NON_BINARY_ERROR_MESSAGE = "Sensitive features contain more than two unique" \
" values"
_LABELS_NOT_0_1_ERROR_MESSAGE = "Supplied y labels are not 0 or 1"
_MORE_THAN_ONE_COLUMN_ERROR_MESSAGE = "{} is a {} with more than one column"
_NOT_ALLOWED_TYPE_ERROR_MESSAGE = "{} is not an ndarray, Series or DataFrame"
_NDARRAY_NOT_TWO_DIMENSIONAL_ERROR_MESSAGE = "{} is an ndarray which is not 2D"
_NOT_ALLOWED_MATRIX_TYPE_ERROR_MESSAGE = "{} is not an ndarray or DataFrame"

_ALLOWED_INPUT_TYPES_X = [np.ndarray, pd.DataFrame]
_ALLOWED_INPUT_TYPES_SENSITIVE_FEATURES = [np.ndarray, pd.DataFrame, pd.Series, list]
_ALLOWED_INPUT_TYPES_Y = [np.ndarray, pd.DataFrame, pd.Series, list]

_SENSITIVE_FEATURE_COMPRESSION_SEPARATOR = ","


def _validate_and_reformat_input(X, y=None, expect_y=True, enforce_binary_sensitive_feature=False,
enforce_binary_labels=False, **kwargs):
"""Validate input data and return the data in an appropriate format.
:param X: The feature matrix
:type X: numpy.ndarray or pandas.DataFrame
:param y: The label vector
:type y: numpy.ndarray, pandas.DataFrame, pandas.Series, or list
:param expect_y: if True y needs to be provided, otherwise ignores the argument; default True
:type expect_y: bool
:param enforce_binary_sensitive_feature: if True raise exception if there are more than two
distinct values in the `sensitive_features` data from `kwargs`; default False
:type enforce_binary_sensitive_feature: bool
:param enforce_binary_labels: if True raise exception if there are more than two distinct
values in the `y` data; default False
:type enforce_binary_labels: bool
"""
if y is not None:
# calling check_X_y with a 2-dimensional y causes a warning, so ensure it is 1-dimensional
if isinstance(y, np.ndarray) and len(y.shape) == 2 and y.shape[1] == 1:
y = y.squeeze()
elif isinstance(y, pd.DataFrame) and y.shape[1] == 1:
y = y.to_numpy().squeeze()

X, y = check_X_y(X, y)
y = check_array(y, ensure_2d=False, dtype='numeric')
if enforce_binary_labels and not set(np.unique(y)).issubset(set([0, 1])):
raise ValueError(_LABELS_NOT_0_1_ERROR_MESSAGE)
elif expect_y:
raise ValueError(_MESSAGE_Y_NONE)
else:
X = check_array(X)

_KW_SENSITIVE_FEATURES = "sensitive_features"
sensitive_features = kwargs.get(_KW_SENSITIVE_FEATURES)
if sensitive_features is None:
raise ValueError(_MESSAGE_SENSITIVE_FEATURES_NONE)

check_consistent_length(X, sensitive_features)
sensitive_features = check_array(sensitive_features, ensure_2d=False, dtype=None)

# compress multiple sensitive features into a single column
if len(sensitive_features.shape) > 1 and sensitive_features.shape[1] > 1:
sensitive_features = \
_compress_multiple_sensitive_features_into_single_column(sensitive_features)

if enforce_binary_sensitive_feature:
if len(np.unique(sensitive_features)) > 2:
raise ValueError(_SENSITIVE_FEATURES_NON_BINARY_ERROR_MESSAGE)

return pd.DataFrame(X), pd.Series(y), pd.Series(sensitive_features.squeeze())


def _compress_multiple_sensitive_features_into_single_column(sensitive_features):
"""Compress multiple sensitive features into a single column.
The resulting mapping converts multiple dimensions into the Cartesian product of the
individual columns.
:param sensitive_features: multi-dimensional array of sensitive features
:type sensitive_features: `numpy.ndarray`
:return: one-dimensional array of mapped sensitive features
"""
if not isinstance(sensitive_features, np.ndarray):
raise ValueError("Received argument of type {} instead of expected numpy.ndarray"
.format(type(sensitive_features).__name__))
return np.apply_along_axis(
lambda row: _SENSITIVE_FEATURE_COMPRESSION_SEPARATOR.join(
[str(row[i])
.replace("\\", "\\\\") # escape backslash and separator
.replace(_SENSITIVE_FEATURE_COMPRESSION_SEPARATOR,
"\\" + _SENSITIVE_FEATURE_COMPRESSION_SEPARATOR)
for i in range(len(row))]),
axis=1,
arr=sensitive_features)


def _validate_and_reformat_reductions_input(X, y, enforce_binary_sensitive_feature=False,
**kwargs):
# TODO: remove this function once reductions use _validate_and_reformat_input from above
if X is None:
raise ValueError(_MESSAGE_X_NONE)

@@ -47,6 +142,7 @@ def _validate_and_reformat_reductions_input(X, y, enforce_binary_sensitive_featu


def _make_vector(formless, formless_name):
# TODO: remove this function once reductions use _validate_and_reformat_input from above
formed_vector = None
if isinstance(formless, list):
formed_vector = pd.Series(formless)
Expand Down Expand Up @@ -74,9 +170,9 @@ def _make_vector(formless, formless_name):


def _get_matrix_shape(formless, formless_name):
# TODO: remove this function once reductions use _validate_and_reformat_input from above
num_rows = -1
num_cols = -1

if isinstance(formless, pd.DataFrame):
num_cols = len(formless.columns)
num_rows = len(formless.index)
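For reference, a small usage sketch of the `_compress_multiple_sensitive_features_into_single_column` helper added above: each row of a multi-column sensitive feature matrix becomes a single comma-joined value (with backslashes and separators escaped), so the combined values act as the Cartesian product of the original columns.

```python
# Usage sketch based on the helper shown above; assumes the fairlearn source tree is importable.
import numpy as np

from fairlearn._input_validation import (
    _compress_multiple_sensitive_features_into_single_column)

sensitive_features = np.array([["female", "under 40"],
                               ["male", "over 40"],
                               ["female", "over 40"]])
compressed = _compress_multiple_sensitive_features_into_single_column(sensitive_features)
print(compressed)  # ['female,under 40' 'male,over 40' 'female,over 40']
```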
77 changes: 15 additions & 62 deletions fairlearn/postprocessing/_threshold_optimizer.py
@@ -16,18 +16,15 @@

from sklearn.exceptions import NotFittedError
from fairlearn.postprocessing import PostProcessing
from fairlearn._input_validation import _validate_and_reformat_input
from ._constants import (LABEL_KEY, SCORE_KEY, SENSITIVE_FEATURE_KEY, OUTPUT_SEPARATOR,
DEMOGRAPHIC_PARITY, EQUALIZED_ODDS)
from ._roc_curve_utilities import _interpolate_curve, _get_roc
from ._interpolated_prediction import InterpolatedPredictor

# various error messages
DIFFERENT_INPUT_LENGTH_ERROR_MESSAGE = "{} need to be of equal length."
EMPTY_INPUT_ERROR_MESSAGE = "At least one of sensitive_features, labels, or scores are empty."
NON_BINARY_LABELS_ERROR_MESSAGE = "Labels other than 0/1 were provided."
INPUT_DATA_FORMAT_ERROR_MESSAGE = "The only allowed input data formats are: " \
"list, numpy.ndarray, pandas.DataFrame, pandas.Series. " \
"Your provided data was of types ({}, {}, {})"
NOT_SUPPORTED_CONSTRAINTS_ERROR_MESSAGE = "Currently only {} and {} are supported " \
"constraints.".format(DEMOGRAPHIC_PARITY, EQUALIZED_ODDS)
PREDICT_BEFORE_FIT_ERROR_MESSAGE = "It is required to call 'fit' before 'predict'."
@@ -97,7 +94,8 @@ def fit(self, X, y, *, sensitive_features, **kwargs):
:type sensitive_features: currently 1D array as numpy.ndarray, list, pandas.DataFrame,
or pandas.Series
"""
self._validate_input_data(X, sensitive_features, y)
_, _, sensitive_feature_vector = _validate_and_reformat_input(
X, y, sensitive_features=sensitive_features, enforce_binary_labels=True)

# postprocessing can't handle 0/1 as floating point numbers, so this converts it to int
if type(y) in [np.ndarray, pd.DataFrame, pd.Series]:
@@ -125,7 +123,7 @@ def fit(self, X, y, *, sensitive_features, **kwargs):
raise ValueError(NOT_SUPPORTED_CONSTRAINTS_ERROR_MESSAGE)

self._post_processed_predictor_by_sensitive_feature = threshold_optimization_method(
sensitive_features, y, scores, self._grid_size, self._flip, self._plot)
sensitive_feature_vector, y, scores, self._grid_size, self._flip, self._plot)

def predict(self, X, *, sensitive_features, random_state=None):
"""Predict label for each sample in X while taking into account sensitive features.
@@ -144,12 +142,14 @@ def predict(self, X, *, sensitive_features, random_state=None):
random.seed(random_state)

self._validate_post_processed_predictor_is_fitted()
self._validate_input_data(X, sensitive_features)
_, _, sensitive_feature_vector = _validate_and_reformat_input(
X, y=None, sensitive_features=sensitive_features, expect_y=False,
enforce_binary_labels=True)
unconstrained_predictions = self._unconstrained_predictor.predict(X)

positive_probs = _vectorized_prediction(
self._post_processed_predictor_by_sensitive_feature,
sensitive_features,
sensitive_feature_vector,
unconstrained_predictions)
return (positive_probs >= np.random.rand(len(positive_probs))) * 1

@@ -167,41 +167,18 @@ def _pmf_predict(self, X, *, sensitive_features):
:rtype: numpy.ndarray
"""
self._validate_post_processed_predictor_is_fitted()
self._validate_input_data(X, sensitive_features)
_, _, sensitive_feature_vector = _validate_and_reformat_input(
X, y=None, sensitive_features=sensitive_features, expect_y=False,
enforce_binary_labels=True)
positive_probs = _vectorized_prediction(
self._post_processed_predictor_by_sensitive_feature, sensitive_features,
self._post_processed_predictor_by_sensitive_feature, sensitive_feature_vector,
self._unconstrained_predictor.predict(X))
return np.array([[1.0 - p, p] for p in positive_probs])

def _validate_post_processed_predictor_is_fitted(self):
if not self._post_processed_predictor_by_sensitive_feature:
raise NotFittedError(PREDICT_BEFORE_FIT_ERROR_MESSAGE)

def _validate_input_data(self, X, sensitive_features, y=None):
allowed_input_types = [list, np.ndarray, pd.DataFrame, pd.Series]
if type(X) not in allowed_input_types or \
type(sensitive_features) not in allowed_input_types or \
(y is not None and type(y) not in allowed_input_types):
raise TypeError(INPUT_DATA_FORMAT_ERROR_MESSAGE
.format(type(X).__name__,
type(y).__name__,
type(sensitive_features).__name__))

if len(X) == 0 or len(sensitive_features) == 0 or (y is not None and len(y) == 0):
raise ValueError(EMPTY_INPUT_ERROR_MESSAGE)

if y is None:
if len(X) != len(sensitive_features) or (y is not None and len(X) != len(y)):
raise ValueError(DIFFERENT_INPUT_LENGTH_ERROR_MESSAGE
.format("X and sensitive_features"))
else:
if len(X) != len(sensitive_features) or (y is not None and len(X) != len(y)):
raise ValueError(DIFFERENT_INPUT_LENGTH_ERROR_MESSAGE
.format("X, sensitive_features, and y"))

if set(np.unique(y)) > set([0, 1]):
raise ValueError(NON_BINARY_LABELS_ERROR_MESSAGE)


def _threshold_optimization_demographic_parity(sensitive_features, labels, scores, grid_size=1000,
flip=True, plot=False):
@@ -443,37 +420,13 @@ def _vectorized_prediction(function_dict, sensitive_features, scores):
:type scores: list, numpy.ndarray, pandas.DataFrame, or pandas.Series
"""
# handle type conversion to ndarray for other types
sensitive_features_vector = _convert_to_ndarray(
sensitive_features, MULTIPLE_DATA_COLUMNS_ERROR_MESSAGE.format("sensitive_features"))
scores_vector = _convert_to_ndarray(scores, SCORES_DATA_TOO_MANY_COLUMNS_ERROR_MESSAGE)
sensitive_features_vector = np.array(sensitive_features)
scores_vector = np.array(scores)

return sum([(sensitive_features_vector == a) * function_dict[a].predict(scores_vector)
for a in function_dict])


def _convert_to_ndarray(data, dataframe_multiple_columns_error_message):
"""Convert the input data from list, pandas.Series, or pandas.DataFrame to numpy.ndarray.
:param data: the data to be converted into a numpy.ndarray
:type data: numpy.ndarray, pandas.Series, pandas.DataFrame, or list
:param dataframe_multiple_columns_error_message: the error message to show in case the
provided data is more than 1-dimensional
:type dataframe_multiple_columns_error_message:
:return: the input data formatted as numpy.ndarray
:rtype: numpy.ndarray
"""
if type(data) == list:
data = np.array(data)
elif type(data) == pd.DataFrame:
if len(data.columns) > 1:
# TODO: extend to multiple columns for additional group data
raise ValueError(dataframe_multiple_columns_error_message)
data = data[data.columns[0]].values
elif type(data) == pd.Series:
data = data.values
return data


def _reformat_and_group_data(sensitive_features, labels, scores, sensitive_feature_names=None):
"""Reformats the data into a new pandas.DataFrame and group by sensitive feature values.
Expand Down Expand Up @@ -535,7 +488,7 @@ def _reformat_data_into_dict(key, data_dict, additional_data):
raise ValueError(
MULTIPLE_DATA_COLUMNS_ERROR_MESSAGE.format("sensitive_features"))
else:
data_dict[key] = additional_data.reshape(-1)
data_dict[key] = additional_data.squeeze()
elif type(additional_data) == pd.DataFrame:
# TODO: extend to multiple columns for additional_data by using column names
for attribute_column in additional_data.columns:
6 changes: 6 additions & 0 deletions test/unit/constants.py
@@ -0,0 +1,6 @@
# Copyright (c) Microsoft Corporation. All rights reserved.
# Licensed under the MIT License.


MULTIPLE_SENSITIVE_FEATURE_COMPRESSION_SKIP_REASON = \
"Multiple sensitive features cannot be compressed into one-dimensional data structure."
33 changes: 30 additions & 3 deletions test/unit/input_convertors.py
@@ -4,6 +4,8 @@
import numpy as np
import pandas as pd

from fairlearn._input_validation import _compress_multiple_sensitive_features_into_single_column


def ensure_list(X):
assert X is not None
@@ -18,6 +20,19 @@ def ensure_list(X):
raise ValueError("Failed to convert to list")


def ensure_list_1d(X):
assert X is not None
if isinstance(X, list):
return X
elif isinstance(X, np.ndarray):
return X.squeeze().tolist()
elif isinstance(X, pd.Series):
return X.tolist()
elif isinstance(X, pd.DataFrame):
return X.tolist()
raise ValueError("Failed to convert to list")


def ensure_ndarray(X):
assert X is not None
if isinstance(X, list):
@@ -34,8 +49,10 @@ def ensure_ndarray(X):
def ensure_ndarray_2d(X):
assert X is not None
tmp = ensure_ndarray(X)
if len(tmp.shape) != 1:
raise ValueError("Requires 1d array")
if len(tmp.shape) not in [1, 2]:
raise ValueError("Requires 1d or 2d array")
if len(tmp.shape) == 2:
return tmp
result = np.expand_dims(tmp, 1)
assert len(result.shape) == 2
return result
@@ -46,7 +63,10 @@ def ensure_series(X):
if isinstance(X, list):
return pd.Series(X)
elif isinstance(X, np.ndarray):
return pd.Series(X)
if len(X.shape) == 1:
return pd.Series(X)
if X.shape[1] == 1:
return pd.Series(X.squeeze())
elif isinstance(X, pd.Series):
return X
elif isinstance(X, pd.DataFrame):
@@ -72,3 +92,10 @@ def ensure_dataframe(X):
ensure_ndarray_2d,
ensure_series,
ensure_dataframe]


def _map_into_single_column(matrix):
if len(np.array(matrix).shape) == 1:
return np.array(matrix)

return _compress_multiple_sensitive_features_into_single_column(matrix)
