diff --git a/README.md b/README.md index b39ecd28..a277148b 100644 --- a/README.md +++ b/README.md @@ -14,21 +14,6 @@ FairLens is an open source Python library for automatically discovering bias and measuring fairness in data. The package can be used to quickly identify bias, and provides multiple metrics to measure fairness across a range of sensitive and legally protected characteristics such as age, race and sex. -## Core Features - -Some of the main features of Fairlens are: - -- **Measuring Bias** - FairLens can be used to measure the extent and significance of biases in datasets using a wide range of statistical distances and metrics. - -- **Sensitive Attribute and Proxy Detection** - Data Scientists may be unaware of protected or sensitive attributes in their data, and potentially hidden correlations between these columns and other non-protected columns in their data. FairLens can quickly identify sensitive columns and flag hidden correlations and the non-sensitive proxies. - -- **Visualization Tools** - FairLens has a range of tools that be used to generate meaningful and descriptive diagrams of different distributions in the dataset before delving further in to quantify them. For instance, FairLens can be used to visualize the distribution of a target with respect to different sensitive demographics, or a correlation heatmap. - -- **Fairness Scorer** - The fairness scorer is a simple tool which data scientists can use to get started with FairLens. It is designed to just take in a dataset and a target variable and to automatically generate a report highlighting hidden biases, correlations, and containing various diagrams. - - -The goal of FairLens is to enable data scientists to gain a deeper understanding of their data, and helps to to ensure fair and ethical use of data in analysis and machine learning tasks. 
The insights gained from FairLens can be harnessed by the [Bias Mitigation](https://www.synthesized.io/post/synthesized-mitigates-bias-in-data) feature of the [Synthesized](https://synthesized.io) platform, which is able to automagically remove bias using the power of synthetic data. - ## Bias in my data? It's very simple to quickly start understanding any biases that may be present in your data. @@ -81,6 +66,21 @@ See some of our previous blog posts for our take on bias and fairness in ML: - [Fairness and biases in machine learning and their impact on banking and insurance](https://www.synthesized.io/post/fairness-and-biases-in-machine-learning-and-their-impact-on-banking-and-insurance) - [Fairness and algorithmic biases in machine learning and recommendations to enterprise](https://www.synthesized.io/post/fairness-and-algorithmic-biases-in-machine-learning-and-recommendations) +## Core Features + +Some of the main features of FairLens are: + +- **Measuring Bias** - FairLens can be used to measure the extent and significance of biases in datasets using a wide range of statistical distances and metrics. + +- **Sensitive Attribute and Proxy Detection** - Data scientists may be unaware of protected or sensitive attributes in their data, and of potentially hidden correlations between these columns and other non-protected columns. FairLens can quickly identify sensitive columns and flag hidden correlations and non-sensitive proxies. + +- **Visualization Tools** - FairLens has a range of tools that can be used to generate meaningful and descriptive diagrams of different distributions in the dataset before delving further into quantifying them. For instance, FairLens can be used to visualize the distribution of a target with respect to different sensitive demographics, or a correlation heatmap. + +- **Fairness Scorer** - The fairness scorer is a simple tool which data scientists can use to get started with FairLens. 
It is designed to take in just a dataset and a target variable, and to automatically generate a report highlighting hidden biases and correlations, along with various diagrams. + + +The goal of FairLens is to enable data scientists to gain a deeper understanding of their data and to help ensure the fair and ethical use of data in analysis and machine learning tasks. The insights gained from FairLens can be harnessed by the [Bias Mitigation](https://www.synthesized.io/post/synthesized-mitigates-bias-in-data) feature of the [Synthesized](https://synthesized.io) platform, which is able to automagically remove bias using the power of synthetic data. + ## Installation diff --git a/docs/_static/distance.png b/docs/_static/distance.png new file mode 100644 index 00000000..141ae86e Binary files /dev/null and b/docs/_static/distance.png differ diff --git a/docs/conf.py b/docs/conf.py index 7fbac5d8..007acda8 100644 --- a/docs/conf.py +++ b/docs/conf.py @@ -53,6 +53,7 @@ "IPython.sphinxext.ipython_console_highlighting", "IPython.sphinxext.ipython_directive", "sphinx_panels", + "sphinxcontrib.bibtex", ] autosummary_generate = True @@ -63,6 +64,8 @@ panels_add_bootstrap_css = False +bibtex_bibfiles = ["refs.bib"] + # Add any paths that contain templates here, relative to this directory. templates_path = ["_templates"] @@ -110,5 +113,7 @@ "https://cdnjs.cloudflare.com/ajax/libs/font-awesome/4.7.0/css/font-awesome.min.css", ] +mathjax_path = "https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML" + # Customization html_logo = "_static/FairLens_196x51.png" diff --git a/docs/index.rst b/docs/index.rst index 0a0a296f..675a2c46 100644 --- a/docs/index.rst +++ b/docs/index.rst @@ -11,6 +11,21 @@ FairLens is a tool to help people assess the fairness of datasets and models in .. 
panels:: :card: + intro-card text-center + --- + :fa:`balance-scale,text-black fa-4x,style=fa` + + Fairness and Bias + ^^^^^^^^^^^^^^^^^ + + Learn more about fairness and bias in data science and machine learning, and how we measure them in FairLens. + + +++ + + .. link-button:: user_guide/fairness + :type: ref + :text: Go + :classes: btn-block btn-secondary + --- :fa:`book,text-black fa-4x,style=fa` @@ -44,6 +59,23 @@ FairLens is a tool to help people assess the fairness of datasets and models in :text: Go :classes: btn-block btn-secondary + --- + :fa:`users,text-black fa-4x,style=fa` + + Contributing + ^^^^^^^^^^^^ + + Saw a typo in the documentation? Want to improve + existing functionality? The contributing guidelines will guide + you through the process of improving FairLens. + + +++ + + .. link-button:: contributing + :type: ref + :text: Go + :classes: btn-block btn-secondary + .. toctree:: :maxdepth: 3 :hidden: @@ -52,39 +84,3 @@ FairLens is a tool to help people assess the fairness of datasets and models in user_guide/index reference/index contributing - - -.. overview panel .. --- .. :fa:`balance-scale,text-black fa-4x,style=fa` - -.. Fairness and Bias .. ^^^^^^^^^^^^^^^^^ - -.. An introduction to fairness and bias in data science. Learn more about how you can assess the fairness of -.. your machine learning pipeline. - -.. +++ - -.. .. link-button:: user_guide/fairness -.. :type: ref -.. :text: Go -.. :classes: btn-block btn-secondary - -.. contribution panel -.. --- -.. :fa:`users,text-black fa-4x,style=fa` - -.. Contributing -.. ^^^^^^^^^^^^ - -.. Saw a typo in the documentation? Want to improve -.. existing functionalities? The contributing guidelines will guide -.. you through the process of improving FairLens. - -.. +++ - -.. .. link-button:: contributing -.. :type: ref -.. :text: Go -.. 
:classes: btn-block btn-secondary diff --git a/docs/refs.bib b/docs/refs.bib new file mode 100644 index 00000000..3c7b43d8 --- /dev/null +++ b/docs/refs.bib @@ -0,0 +1,22 @@ +@book{fairmlbook, + title={Fairness and Machine Learning}, + author={Solon Barocas and Moritz Hardt and Arvind Narayanan}, + publisher={fairmlbook.org}, + url={http://www.fairmlbook.org}, + year={2019} +} + +@article{gouic2020projection, + title={Projection to fairness in statistical learning}, + author={Le Gouic, Thibaut and Loubes, Jean-Michel and Rigollet, Philippe}, + journal={arXiv preprint arXiv:2005.11720}, + year={2020} +} + +@misc{compas, + title={How We Analyzed the COMPAS Recidivism Algorithm}, + author={Jeff Larson and Surya Mattu and Lauren Kirchner and Julia Angwin}, + howpublished={ProPublica}, + url={https://www.propublica.org/article/how-we-analyzed-the-compas-recidivism-algorithm}, + year={2016} +} diff --git a/docs/user_guide/fairness.rst b/docs/user_guide/fairness.rst index ef1a8e14..d78c5c83 100644 --- a/docs/user_guide/fairness.rst +++ b/docs/user_guide/fairness.rst @@ -1,125 +1,65 @@ -Fairness +Overview ======== -This is a general guide to assessing fairness in supervised learning tasks (classification, regression) -using structural datasets. - -Literature Review ----------------- - In most supervised learning problems, a model is trained on a set of features :math:`X`, to predict or estimate a target variable :math:`Y`. The resulting prediction of a trained model is denoted by :math:`R`. Additionally, we -define :math:`A` as a subset of :math:`X`, which corresponds to legally protected attributes such as -ethnicity, gender, etc. - -.. math:: +define :math:`A`, a subset of :math:`X`, which corresponds to legally protected attributes. - \underbrace{\text{Title}\hspace{2mm}\overbrace{\text{Gender}\hspace{2mm} \text{Ethnicity}}^{A}\hspace{2mm}\text{Legal Status}}_{X}\hspace{3mm}\overbrace{\text{Raw Score}}^{Y}\hspace{3mm}\overbrace{\text{Predicted Score}}^{R} +.. raw:: html -.. 
There are multiple definitions of fairness in literature. +