From 1bf55e7318a914125f60f46b2f7770d83e58278b Mon Sep 17 00:00:00 2001
From: Gaurav Gupta <47334368+gaugup@users.noreply.github.com>
Date: Wed, 20 Apr 2022 17:01:11 -0700
Subject: [PATCH] Add pre-built cohort into adult census notebook (#1243)

* [WIP] Add pre-built cohort into adult census notebook

Signed-off-by: Gaurav Gupta <gaugup@microsoft.com>

* erroranalysis version bump in raiwidgets to 0.1.31 (#1245)

* Make cohortData an empty list in case no pre-defined cohorts are injected (#1247)

Signed-off-by: Gaurav Gupta <gaugup@microsoft.com>

* Simplify the train pipeline responsibleaidashboard-census-classification-model-debugging.ipynb (#1195)

* Simplify the train pipeline responsibleaidashboard-census-classification-model-debugging.ipynb

Signed-off-by: Gaurav Gupta <gaugup@microsoft.com>

* Address code review comments

* Update notebooks/responsibleaidashboard/responsibleaidashboard-census-classification-model-debugging.ipynb

Co-authored-by: Roman Lutz <romanlutz13@gmail.com>

Co-authored-by: Roman Lutz <romanlutz13@gmail.com>
Signed-off-by: Gaurav Gupta <gaugup@microsoft.com>

* Add regression test for pre-defined cohorts in raiwidgets (#1249)

Signed-off-by: Gaurav Gupta <gaugup@microsoft.com>

* color (#1248)

* Add feature importance box & bar chart (#1241)

* refactor

* build

* build

* temp

* temp

* temp

* temp

* box

* cache

* e2e

* e2e

* fix

* e2e fix

* e2e

* fix e2e

* widget

* widget

* fix

* widget

* e2e

* e2e

* e2e

* test

* test

* PreBuilt cohorts UX changes (#1242)

* Initial SDK implementation of cohorts

Signed-off-by: Gaurav Gupta <gaugup@microsoft.com>

* Add basic validation for cohorts

Signed-off-by: Gaurav Gupta <gaugup@microsoft.com>

* Add serialized version of cohort config to ResponsibleAiDashboard

Signed-off-by: Gaurav Gupta <gaugup@microsoft.com>

* Add more tests for cohorts

Signed-off-by: Gaurav Gupta <gaugup@microsoft.com>

* fix broken builds due to pip upgrade which broke pip-tools (#1185)

* refactor matrix filter and area state to be private static (#1179)

* Change variable name

Signed-off-by: Gaurav Gupta <gaugup@microsoft.com>

* Add more cohort filters

Signed-off-by: Gaurav Gupta <gaugup@microsoft.com>

* Add cohort data to dashboard e2e

Signed-off-by: Gaurav Gupta <gaugup@microsoft.com>

* Add more cohort filters

Signed-off-by: Gaurav Gupta <gaugup@microsoft.com>

* Document various data validation for cohorts

Signed-off-by: Gaurav Gupta <gaugup@microsoft.com>

* Add new interfaces for pre-built cohort

Signed-off-by: Gaurav Gupta <gaugup@microsoft.com>

* Add more cohort filters

Signed-off-by: Gaurav Gupta <gaugup@microsoft.com>

* Add prebuilt cohort walking logic in UI and add more data validation scenarios

Signed-off-by: Gaurav Gupta <gaugup@microsoft.com>

* Add basic data validation checks

Signed-off-by: Gaurav Gupta <gaugup@microsoft.com>

* Add logic to translate the Index cohort filter

Signed-off-by: Gaurav Gupta <gaugup@microsoft.com>

* Remove commented out code

Signed-off-by: Gaurav Gupta <gaugup@microsoft.com>

* Add SDK validations for Index based cohort filter

Signed-off-by: Gaurav Gupta <gaugup@microsoft.com>

* Add code for validating classification outcome

Signed-off-by: Gaurav Gupta <gaugup@microsoft.com>

* Add error filter validations and add tests

Signed-off-by: Gaurav Gupta <gaugup@microsoft.com>

* Add fake cohorts for regression dataset

Signed-off-by: Gaurav Gupta <gaugup@microsoft.com>

* Add fake cohorts for multi-class classification dataset

Signed-off-by: Gaurav Gupta <gaugup@microsoft.com>

* Add handling of regression filter

Signed-off-by: Gaurav Gupta <gaugup@microsoft.com>

* Add support for classification outcome in UI

Signed-off-by: Gaurav Gupta <gaugup@microsoft.com>

* Add validations for Predicted Y and True Y cohort filters

Signed-off-by: Gaurav Gupta <gaugup@microsoft.com>

* Add UI code to handle predicted Y and true Y for pre-built cohort filters

Signed-off-by: Gaurav Gupta <gaugup@microsoft.com>

* Add cohort validation with test data to raiwidgets

Signed-off-by: Gaurav Gupta <gaugup@microsoft.com>

* Add tests for validating Predicted/True Y cohorts

Signed-off-by: Gaurav Gupta <gaugup@microsoft.com>

* Add UI support for TrueY/PredictedY for classification

Signed-off-by: Gaurav Gupta <gaugup@microsoft.com>

* Rename cohort_filter_list to cohort_list

Signed-off-by: Gaurav Gupta <gaugup@microsoft.com>

* Rename UI variables to match SDK

Signed-off-by: Gaurav Gupta <gaugup@microsoft.com>

* Fix duplicate cohort name

Signed-off-by: Gaurav Gupta <gaugup@microsoft.com>

* Add SDK cohorts to notebook

Signed-off-by: Gaurav Gupta <gaugup@microsoft.com>

* Add dataset validations and add categorical features

Signed-off-by: Gaurav Gupta <gaugup@microsoft.com>

* Add validations for categorical_features

Signed-off-by: Gaurav Gupta <gaugup@microsoft.com>

* Fix sorted imports

Signed-off-by: Gaurav Gupta <gaugup@microsoft.com>

* Add code for translating categorical values

Signed-off-by: Gaurav Gupta <gaugup@microsoft.com>

* Move cohort processing to a separate file

Signed-off-by: Gaurav Gupta <gaugup@microsoft.com>

* Fix code review comments

Signed-off-by: Gaurav Gupta <gaugup@microsoft.com>

* Refactor cohort translated function into different small functions

Signed-off-by: Gaurav Gupta <gaugup@microsoft.com>

* Change to lowercase for outcome

Signed-off-by: Gaurav Gupta <gaugup@microsoft.com>

* Fix code review comments

Signed-off-by: Gaurav Gupta <gaugup@microsoft.com>

* Refactor cohort_list validations and converge pytest common functions into fixtures

Signed-off-by: Gaurav Gupta <gaugup@microsoft.com>

* Add conftest into raiwidgets tests

Signed-off-by: Gaurav Gupta <gaugup@microsoft.com>

* Add validations for cohort list

Signed-off-by: Gaurav Gupta <gaugup@microsoft.com>

* Add cohortData test

Signed-off-by: Gaurav Gupta <gaugup@microsoft.com>

* Fix sorted imports

Signed-off-by: Gaurav Gupta <gaugup@microsoft.com>

* isort fix

Signed-off-by: Gaurav Gupta <gaugup@microsoft.com>

* Add UI unit tests for cohort translation

Signed-off-by: Gaurav Gupta <gaugup@microsoft.com>

* Add more checks in UI unit test

Signed-off-by: Gaurav Gupta <gaugup@microsoft.com>

* Add UI tests for regression cohorts

Signed-off-by: Gaurav Gupta <gaugup@microsoft.com>

* Remove notebook change

Signed-off-by: Gaurav Gupta <gaugup@microsoft.com>

* Fix typescript build

Signed-off-by: Gaurav Gupta <gaugup@microsoft.com>

* Change cohort filter values so that cohort filters non-zero points

Signed-off-by: Gaurav Gupta <gaugup@microsoft.com>

* Fix for empty cohort list

Signed-off-by: Gaurav Gupta <gaugup@microsoft.com>

* Simplify the train pipeline responsibleaidashboard-census-classification-model-debugging.ipynb (#1195)

* Simplify the train pipeline responsibleaidashboard-census-classification-model-debugging.ipynb

Signed-off-by: Gaurav Gupta <gaugup@microsoft.com>

* Address code review comments

* Update notebooks/responsibleaidashboard/responsibleaidashboard-census-classification-model-debugging.ipynb

Co-authored-by: Roman Lutz <romanlutz13@gmail.com>

Co-authored-by: Roman Lutz <romanlutz13@gmail.com>

* Propagate error strings instead of raising exceptions

Signed-off-by: Gaurav Gupta <gaugup@microsoft.com>

* Fix code issues

Signed-off-by: Gaurav Gupta <gaugup@microsoft.com>

* Fix code review comments

Signed-off-by: Gaurav Gupta <gaugup@microsoft.com>

* Fix code review comments

Signed-off-by: Gaurav Gupta <gaugup@microsoft.com>

Co-authored-by: Ilya Matiach <ilmat@microsoft.com>
Co-authored-by: Roman Lutz <romanlutz13@gmail.com>

* Make _cohort.py module a public module (#1253)

* Make _cohort.py a public module

Signed-off-by: Gaurav Gupta <gaugup@microsoft.com>

* Add missing file

Signed-off-by: Gaurav Gupta <gaugup@microsoft.com>

* fix notebook build failures due to pywinpty dependency release failing in python 3.6 (#1257)

* fix notebook build failures due to pywinpty dependency release failing in python 3.6

* build pywinpty from conda instead

* add lowerbound

* fixup

* fixup

* Add supported models and data types to README.md responsibleai (#1259)

Signed-off-by: Gaurav Gupta <gaugup@microsoft.com>

* make getting-started notebook a markdown file showing APIs (#1223)

* refactor tabs out of RAI dashboard into a separate component (#1256)

* Add individual causal scatter chart (#1258)

* temp

* refactor

* test

* style fix

* comment

* minor fix to url for responsibleai package in setup.py (#1260)

* Fix UX e2e tests and address code review comments

Signed-off-by: Gaurav Gupta <gaugup@microsoft.com>

* Fix eslint

Signed-off-by: Gaurav Gupta <gaugup@microsoft.com>

* Address review comments

Signed-off-by: Gaurav Gupta <gaugup@microsoft.com>

* Reset the number of samples in test dataset

Signed-off-by: Gaurav Gupta <gaugup@microsoft.com>

Co-authored-by: Ilya Matiach <ilmat@microsoft.com>
Co-authored-by: Roman Lutz <romanlutz13@gmail.com>
Co-authored-by: Bo Zhang <71688188+zhb000@users.noreply.github.com>
---
 .../modelAssessmentDatasets.ts                | 13 ++++
 .../describeModelPerformanceSideBar.ts        | 14 +++-
 ...ensus-classification-model-debugging.ipynb | 78 ++++++++++++++++++-
 3 files changed, 101 insertions(+), 4 deletions(-)
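
Note on the notebook cohorts: the sketch below is only an illustration of
which rows the new filters are expected to select, written in plain Python
rather than with raiwidgets. It assumes that METHOD_LESS and METHOD_GREATER
are strict comparisons, that METHOD_INCLUDES is a membership test, that
filters added to the same Cohort are combined with AND, and that 'Index'
refers to the row position. The rows and the scaled-down index bound are
made up for the example.

```python
# Toy rows standing in for the adult census test data (values are invented).
rows = [
    {"age": 25, "hours-per-week": 45, "marital-status": "Never-married"},
    {"age": 70, "hours-per-week": 50, "marital-status": "Divorced"},
    {"age": 40, "hours-per-week": 38, "marital-status": "Married-civ-spouse"},
    {"age": 50, "hours-per-week": 60, "marital-status": "Divorced"},
]

# Assumed reading of 'Cohort Age and Hours-Per-Week':
# age < 65 AND hours-per-week > 40 (both filters must hold).
age_and_hours = [
    i for i, row in enumerate(rows)
    if row["age"] < 65 and row["hours-per-week"] > 40
]

# Assumed reading of 'Cohort Marital-Status':
# the value is one of the listed categories.
marital_status = [
    i for i, row in enumerate(rows)
    if row["marital-status"] in ("Never-married", "Divorced")
]

# Assumed reading of 'Cohort Index': row position below the bound
# (the notebook uses 20; 2 is used here to fit the toy data).
index_cohort = [i for i in range(len(rows)) if i < 2]

print(age_and_hours)   # [0, 3]
print(marital_status)  # [0, 1, 3]
print(index_cohort)    # [0, 1]
```

With the five cohorts from the notebook plus the default "All data" cohort,
the e2e data expects six entries in cohortDropDownValues for the census
dataset.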

diff --git a/apps/widget-e2e/src/describer/modelAssessment/modelAssessmentDatasets.ts b/apps/widget-e2e/src/describer/modelAssessment/modelAssessmentDatasets.ts
index 30806fad91..c6548aeb25 100644
--- a/apps/widget-e2e/src/describer/modelAssessment/modelAssessmentDatasets.ts
+++ b/apps/widget-e2e/src/describer/modelAssessment/modelAssessmentDatasets.ts
@@ -56,6 +56,14 @@ const modelAssessmentDatasets = {
       "capital-loss"
     ],
     modelStatisticsData: {
+      cohortDropDownValues: [
+        "All data",
+        "Cohort Age and Hours-Per-Week",
+        "Cohort Marital-Status",
+        "Cohort Index",
+        "Cohort Predicted Y",
+        "Cohort True Y"
+      ],
       defaultXAxis: "Probability : <=50K",
       defaultXAxisPanelValue: "Prediction probabilities",
       defaultYAxis: "Cohort",
@@ -115,6 +123,7 @@ const modelAssessmentDatasets = {
       "s6"
     ],
     modelStatisticsData: {
+      cohortDropDownValues: ["All data"],
       defaultXAxis: "Error",
       defaultXAxisPanelValue: "Error",
       defaultYAxis: "Cohort",
@@ -180,6 +189,7 @@ const modelAssessmentDatasets = {
     ],
     isRegression: true,
     modelStatisticsData: {
+      cohortDropDownValues: ["All data"],
       defaultXAxis: "Error",
       defaultXAxisPanelValue: "Error",
       defaultYAxis: "Cohort",
@@ -272,6 +282,7 @@ const modelAssessmentDatasets = {
       "YrSold"
     ],
     modelStatisticsData: {
+      cohortDropDownValues: ["All data"],
       defaultXAxis: "Probability : Less than median",
       defaultXAxisPanelValue: "Prediction probabilities",
       defaultYAxis: "Cohort",
@@ -364,6 +375,7 @@ const modelAssessmentDatasets = {
       "YrSold"
     ],
     modelStatisticsData: {
+      cohortDropDownValues: ["All data"],
       hasModelStatisticsComponent: false,
       hasSideBar: false
     },
@@ -416,6 +428,7 @@ const modelAssessmentDatasets = {
     ],
     isMulticlass: true,
     modelStatisticsData: {
+      cohortDropDownValues: ["All data"],
       defaultXAxis: "Predicted Y",
       defaultXAxisPanelValue: "Prediction probabilities",
       defaultYAxis: "Cohort",
diff --git a/apps/widget-e2e/src/describer/modelAssessment/modelStatistics/describeModelPerformanceSideBar.ts b/apps/widget-e2e/src/describer/modelAssessment/modelStatistics/describeModelPerformanceSideBar.ts
index e540b1c9f1..d12b2f2463 100644
--- a/apps/widget-e2e/src/describer/modelAssessment/modelStatistics/describeModelPerformanceSideBar.ts
+++ b/apps/widget-e2e/src/describer/modelAssessment/modelStatistics/describeModelPerformanceSideBar.ts
@@ -19,7 +19,12 @@ export function describeModelPerformanceSideBar(
     });
 
     it("Side bar should be updated with updated values", () => {
-      cy.get(Locators.MSSideBarCards).should("have.length", 1);
+      cy.get(Locators.MSSideBarCards).should(
+        "have.length",
+        dataShape.modelStatisticsData?.cohortDropDownValues
+          ? dataShape.modelStatisticsData?.cohortDropDownValues.length
+          : 0
+      );
       cy.get(`${Locators.MSCRotatedVerticalBox} button`)
         .click()
         .get(
@@ -50,7 +55,12 @@ export function describeModelPerformanceSideBar(
       cy.get(`${Locators.MSCRotatedVerticalBox}`).contains(
         dataShape.modelStatisticsData?.defaultYAxis || "Cohort"
       );
-      cy.get(Locators.MSSideBarCards).should("have.length", 1);
+      cy.get(Locators.MSSideBarCards).should(
+        "have.length",
+        dataShape.modelStatisticsData?.cohortDropDownValues
+          ? dataShape.modelStatisticsData?.cohortDropDownValues.length
+          : 0
+      );
     });
 
     it("Should have dropdown to select cohort when y axis is changed to different value than cohort", () => {
diff --git a/notebooks/responsibleaidashboard/responsibleaidashboard-census-classification-model-debugging.ipynb b/notebooks/responsibleaidashboard/responsibleaidashboard-census-classification-model-debugging.ipynb
index 9fcb2a1390..783dafb5ad 100644
--- a/notebooks/responsibleaidashboard/responsibleaidashboard-census-classification-model-debugging.ipynb
+++ b/notebooks/responsibleaidashboard/responsibleaidashboard-census-classification-model-debugging.ipynb
@@ -252,6 +252,80 @@
     "rai_insights.compute()"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "id": "b84c6c0d",
+   "metadata": {},
+   "source": [
+    "Compose some cohorts which can be injected into the `ResponsibleAIDashboard`."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "0994b7d6",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from raiwidgets.cohort import Cohort, CohortFilter, CohortFilterMethods\n",
+    "\n",
+    "# Cohort on age and hours-per-week features in the dataset\n",
+    "cohort_filter_age = CohortFilter(\n",
+    "    method=CohortFilterMethods.METHOD_LESS,\n",
+    "    arg=[65],\n",
+    "    column='age')\n",
+    "cohort_filter_hours_per_week = CohortFilter(\n",
+    "    method=CohortFilterMethods.METHOD_GREATER,\n",
+    "    arg=[40],\n",
+    "    column='hours-per-week')\n",
+    "\n",
+    "user_cohort_age_and_hours_per_week = Cohort(name='Cohort Age and Hours-Per-Week')\n",
+    "user_cohort_age_and_hours_per_week.add_cohort_filter(cohort_filter_age)\n",
+    "user_cohort_age_and_hours_per_week.add_cohort_filter(cohort_filter_hours_per_week)\n",
+    "\n",
+    "# Cohort on marital-status feature in the dataset\n",
+    "cohort_filter_marital_status = CohortFilter(\n",
+    "    method=CohortFilterMethods.METHOD_INCLUDES,\n",
+    "    arg=[\"Never-married\", \"Divorced\"],\n",
+    "    column='marital-status')\n",
+    "\n",
+    "user_cohort_marital_status = Cohort(name='Cohort Marital-Status')\n",
+    "user_cohort_marital_status.add_cohort_filter(cohort_filter_marital_status)\n",
+    "\n",
+    "# Cohort on index of the row in the dataset\n",
+    "cohort_filter_index = CohortFilter(\n",
+    "    method=CohortFilterMethods.METHOD_LESS,\n",
+    "    arg=[20],\n",
+    "    column='Index')\n",
+    "\n",
+    "user_cohort_index = Cohort(name='Cohort Index')\n",
+    "user_cohort_index.add_cohort_filter(cohort_filter_index)\n",
+    "\n",
+    "# Cohort on predicted target value\n",
+    "cohort_filter_predicted_y = CohortFilter(\n",
+    "    method=CohortFilterMethods.METHOD_INCLUDES,\n",
+    "    arg=['>50K'],\n",
+    "    column='Predicted Y')\n",
+    "\n",
+    "user_cohort_predicted_y = Cohort(name='Cohort Predicted Y')\n",
+    "user_cohort_predicted_y.add_cohort_filter(cohort_filter_predicted_y)\n",
+    "\n",
+    "# Cohort on true target value\n",
+    "cohort_filter_true_y = CohortFilter(\n",
+    "    method=CohortFilterMethods.METHOD_INCLUDES,\n",
+    "    arg=['>50K'],\n",
+    "    column='True Y')\n",
+    "\n",
+    "user_cohort_true_y = Cohort(name='Cohort True Y')\n",
+    "user_cohort_true_y.add_cohort_filter(cohort_filter_true_y)\n",
+    "\n",
+    "cohort_list = [user_cohort_age_and_hours_per_week,\n",
+    "               user_cohort_marital_status,\n",
+    "               user_cohort_index,\n",
+    "               user_cohort_predicted_y,\n",
+    "               user_cohort_true_y]"
+   ]
+  },
   {
    "cell_type": "markdown",
    "id": "elder-fleet",
@@ -267,7 +341,7 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "ResponsibleAIDashboard(rai_insights)"
+    "ResponsibleAIDashboard(rai_insights, cohort_list=cohort_list)"
    ]
   },
   {
@@ -510,7 +584,7 @@
    "name": "python",
    "nbconvert_exporter": "python",
    "pygments_lexer": "ipython3",
-   "version": "3.7.11"
+   "version": "3.6.12"
   }
  },
  "nbformat": 4,