# Intro to Reviewers

All `Reviewer` types in this package follow the same general framework for setting up the dashboard. 

This notebook will walk you through how to interact with a basic reviewer, which extends to other `Reviewer` types.

If you want to know how to **create** your own custom reviewer, see `Developer_Jupyter_Reviewer_Tutorial`.

## Installation

1. Download the repository: `git clone git@github.com:getzlab/JupyterReviewer.git` 
1. `cd JupyterReviewer`
1. Create an environment: `conda create --name <my-env> --file requirements.txt`
1. Install package: `pip install -e .`

## Imports
For this tutorial, we just need a few packages.

In [1]:
%load_ext autoreload
%autoreload 2

In [2]:
import pandas as pd
import os

## The high level steps

There are 6 steps to get started reviewing in your Jupyter notebook:
1. Pick your reviewer
1. Instantiate the selected reviewer
1. Set up the review data
1. Set up the app
1. Set up default settings
1. Run your reviewer!

It may seem like a lot of steps, but in practice it only takes a few lines of code.
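
As a rough sketch, the six steps map onto the calls below. `set_up_reviewer` is a hypothetical helper written for this overview, not part of `JupyterReviewer`; the method names and arguments are the ones used later in this tutorial with `ExampleReviewer`, and other reviewers may expect different arguments.

```python
def set_up_reviewer(reviewer_cls, sample_df, data_pkl_fn, description, preprocessing_str):
    """Run steps 1-5 on a reviewer class and return the configured reviewer.

    Hypothetical helper: it only chains the calls shown later in this tutorial.
    """
    reviewer = reviewer_cls()  # steps 1-2: pick and instantiate the reviewer
    reviewer.set_review_data(  # step 3: freeze the data to review in a pickle file
        data_pkl_fn=data_pkl_fn,
        description=description,
        sample_df=sample_df,
        preprocessing_str=preprocessing_str,
    )
    reviewer.set_review_app(  # step 4: choose what the dashboard displays
        mut_file_col='mutations_file',
        sample_cols=['gender', 'age', 'tissue_origin'],
    )
    reviewer.set_default_review_data_annotations_configuration()  # step 5: default annotations
    reviewer.set_default_autofill()                               # step 5: default autofill
    return reviewer
```

Step 6 is then a single call on the returned object, for example `set_up_reviewer(ExampleReviewer, df, output_pkl_fn, 'My review', 'Testing preprocessing').run()`.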

First, we will go through some basic terminology. Next, we will walk through each step below using basic functionality. In the last section, we will go through more advanced options for each step.

## General Terminology

- A **Subject Type** refers to what "level" or "item" you are reviewing. For example, you may be reviewing samples (e.g. purity), participants (e.g. clinical data or comprehensive data), mutations (e.g. checking whether each one is an artifact), etc.

- A **Subject** is an individual item you are currently reviewing or manually annotating.

- A **Reviewer** is the class in the `JupyterReviewer` package that manages the data you want to review and implements a user interface for you to review each **Subject** one at a time, view and interact with its corresponding data, and make annotations.

- An **Annotation** is a value associated with a given **Subject**, derived from some kind of analysis or manual observation. A Subject can have multiple annotations. An Annotation may have constraints on what values are allowed (a list of options, a range of values, etc.).

- **Data** in this context is an object that stores a collection of tables. It includes the information to review for your Subject Type (e.g. a sample table from Terra) and the annotations you will eventually make for it. Its purpose is to "freeze" the data you are reviewing and couple the data with the annotations you make. Once this object is made, it stores all the data in a pickle file. From then on, only your annotations can be (easily) modified, and only in a specific way (through the `ReviewData` class).

- A **ReviewDataApp** is a user interface that displays your Data, such as charts or graphs of the information associated with your Subject Type, and provides a way to add annotations for the currently displayed Subject. In this package, we use `plotly.dash` to create dashboards for this purpose.

![](https://github.com/getzlab/JupyterReviewer/blob/master/images/Reviewer%20Diagram.jpg)


## Basic Reviewer Run

### 1. Select your reviewer

For this tutorial, we will use `ExampleReviewer`. Go to `JupyterReviewer/Reviewers/` to view other available prebuilt options. 

**This tutorial applies only to `Reviewer` classes that inherit from `JupyterReviewer.ReviewerTemplate`.**

Let's import `ExampleReviewer`. This reviewer is built to review some dummy sample data. 

In [3]:
from JupyterReviewer.Reviewers.ExampleReviewer import ExampleReviewer

### 2. Instantiate the selected reviewer

This step is simple: just create an object. No parameters are required.

In [4]:
my_reviewer = ExampleReviewer()

### 3. Set up the review data

At this step you give `my_reviewer` the data you want to start reviewing. 

The type of data you need depends on the type of reviewer you are using. To see what data it requires, type the reviewer's `.set_review_data()` in a cell, place your cursor at the end, and press `Shift+Tab`.

You will see the following required parameters:
- `data_pkl_fn: pathlib.Path`: Path where your data will be saved as a pickle file.
- `description: str`: A description of the data you are reviewing. It's a good idea to also include why you are reviewing it.

The remaining parameters are optional and allow you to prefill the annotation and history information. See the Advanced Reviewer Run section below for more information.

Depending on the reviewer, you may need to pass additional arguments or tables (`**kwargs`), which should be described in the docstring. Alternatively, you can look at the source code directly.

`ExampleReviewer` only requires a single dataframe.

> What is the `data_pkl_fn` for? All the data you review is used to create a `Data` object. This object saves all the data you want to review, along with the corresponding `annot_df` and `history_df` dataframes. 
> - The `annot_df` dataframe stores the annotations you are recording for each item you are reviewing
> - The `history_df` dataframe stores all the changes that have been made to the annotation table
>
> The `description` parameter is for you to describe the source of the data and what the review process is for.
>
> **The purpose of this is to always couple the annotations with the data that was actually used to review it.**


In [5]:
# Load data
fn = 'example_data/Jupyter_Reviewer_Tutorial/data_to_review_example.tsv'
df = pd.read_csv(fn, sep='\t')
df = df.set_index('sample_id')
df.head()

| sample_id | gender | age | tissue_origin | treatments_file | mutations_file |
|---|---|---|---|---|---|
| sample_0 | female | 76 | skin | ./example_data/Jupyter_Reviewer_Tutorial/treat... | ./example_data/Jupyter_Reviewer_Tutorial/mut_v... |
| sample_1 | female | 36 | lung | ./example_data/Jupyter_Reviewer_Tutorial/treat... | ./example_data/Jupyter_Reviewer_Tutorial/mut_v... |
| sample_2 | female | 67 | bone marrow | ./example_data/Jupyter_Reviewer_Tutorial/treat... | ./example_data/Jupyter_Reviewer_Tutorial/mut_v... |
| sample_3 | male | 53 | breast | ./example_data/Jupyter_Reviewer_Tutorial/treat... | ./example_data/Jupyter_Reviewer_Tutorial/mut_v... |
| sample_4 | male | 37 | skin | ./example_data/Jupyter_Reviewer_Tutorial/treat... | ./example_data/Jupyter_Reviewer_Tutorial/mut_v... |


In [6]:
mut_df = pd.read_csv(df.iloc[0]['mutations_file'], sep='\t')
mut_df.head()

|  | gene | vaf | sample_id | cov | t_alt_count | t_ref_count |
|---|---|---|---|---|---|---|
| 0 | gene_0 | 0.324378 | sample_0 | 184 | 59 | 125 |
| 1 | gene_1 | 0.598717 | sample_0 | 109 | 65 | 44 |
| 2 | gene_2 | 0.664248 | sample_0 | 189 | 125 | 64 |
| 3 | gene_3 | 0.431641 | sample_0 | 120 | 51 | 69 |
| 4 | gene_4 | 0.058704 | sample_0 | 149 | 8 | 141 |


In [10]:
# Create the output directory if it does not exist yet
os.makedirs('data', exist_ok=True)

# The pickle file that freezes the data together with its annotations
output_pkl_fn = './data/example_reviewer_data.pkl'
my_reviewer.set_review_data(data_pkl_fn=output_pkl_fn, 
                            description='Intro to reviewers review session part 2',
                            sample_df=df,
                            preprocessing_str='Testing preprocessing')


Loading existing data pkl file



Try re-running the cell above. It should give you a warning. From now on, any time the file `./data/example_reviewer_data.pkl` is passed as the `data_pkl_fn` parameter of *any* reviewer's `set_review_data()`, the reviewer simply loads whatever is currently in that pickle file. It will NOT update any of the stored attributes to match the other parameters you pass (in this case `description`, `sample_df`, `preprocessing_str`, etc.).

> **Why do this?** Often times we use `dalmatian` to pull data from Terra workspaces. Sometimes the data in Terra changes because we run workflows multiple times with different parameters. We want to avoid losing what data we were originally looking at to produce the annotations we currently have.
>
> If you do want to "update" your data, make a new session pointing to a different pickle file path. More on this later.
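
The load-if-exists behaviour can be pictured with a small stand-alone sketch. `load_or_create` is a hypothetical helper built on the standard `pickle` module; it is not how `set_review_data()` is implemented, only an illustration of why arguments passed on a second call have no effect.

```python
import os
import pickle
import tempfile
import warnings


def load_or_create(data_pkl_fn, **attributes):
    """Return the attributes frozen at `data_pkl_fn`, creating the file on first use."""
    if os.path.exists(data_pkl_fn):
        # The file already exists: load it and ignore the new arguments
        warnings.warn(f'Loading existing data pkl file {data_pkl_fn}; new arguments are ignored')
        with open(data_pkl_fn, 'rb') as fh:
            return pickle.load(fh)
    # First call: freeze the attributes on disk
    with open(data_pkl_fn, 'wb') as fh:
        pickle.dump(attributes, fh)
    return attributes


with tempfile.TemporaryDirectory() as tmp_dir:
    fn = os.path.join(tmp_dir, 'session.pkl')
    first = load_or_create(fn, description='first session')
    second = load_or_create(fn, description='second session')  # emits a warning

print(second['description'])  # first session
```

Pointing to a different file path is what starts a fresh session with new data.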


You can also see that the reviewer made changes to the input dataframe: the stored `df` has a `new_column` filled with the `preprocessing_str` value.

In [11]:
my_reviewer.list_data_attributes()

dict_keys(['index', 'description', 'annot_col_config_dict', 'annot_df', 'history_df', 'df'])

In [12]:
my_reviewer.get_data_attribute('df').head()

| sample_id | gender | age | tissue_origin | treatments_file | mutations_file | new_column |
|---|---|---|---|---|---|---|
| sample_0 | female | 76 | skin | ./example_data/Jupyter_Reviewer_Tutorial/treat... | ./example_data/Jupyter_Reviewer_Tutorial/mut_v... | Testing preprocessing |
| sample_1 | female | 36 | lung | ./example_data/Jupyter_Reviewer_Tutorial/treat... | ./example_data/Jupyter_Reviewer_Tutorial/mut_v... | Testing preprocessing |
| sample_2 | female | 67 | bone marrow | ./example_data/Jupyter_Reviewer_Tutorial/treat... | ./example_data/Jupyter_Reviewer_Tutorial/mut_v... | Testing preprocessing |
| sample_3 | male | 53 | breast | ./example_data/Jupyter_Reviewer_Tutorial/treat... | ./example_data/Jupyter_Reviewer_Tutorial/mut_v... | Testing preprocessing |
| sample_4 | male | 37 | skin | ./example_data/Jupyter_Reviewer_Tutorial/treat... | ./example_data/Jupyter_Reviewer_Tutorial/mut_v... | Testing preprocessing |


### 4. Set up the app

Depending on the reviewer, you may have options to customize how the dashboard app will be displayed. In this example, we need to provide information about which columns in the input table we want to use as the mutation file and which of the columns in the sample table to display.

Feel free to change the `sample_cols` parameter to any set of columns or order from the input dataframe above.

In [13]:
my_reviewer.set_review_app(mut_file_col='mutations_file', 
                           sample_cols=['gender', 'age', 'tissue_origin'])

### 5. Set up default settings

For now, let's just set up the default annotations to record and other settings. More details are in the Advanced section below.

In [14]:
my_reviewer.set_default_review_data_annotations_configuration()
my_reviewer.set_default_autofill()

In [15]:
my_reviewer.get_annot().head()

|  | Notes | Flag |
|---|---|---|
| sample_0 |  |  |
| sample_1 |  |  |
| sample_2 |  |  |
| sample_3 |  |  |
| sample_4 |  |  |


### 6. Run the reviewer!

You can run the app inside the notebook (`mode='inline'`) or in a separate window (`mode='external'`, the default).

If you are running your notebook on a VM, you will need to create an SSH connection and specify the host and port.

In [18]:
my_reviewer.run(collapsable=True)

Dash app running on http://0.0.0.0:8050/


You can set `collapsable=False` if you do not want collapsable components.

Go ahead and "review" some of the data by selecting different rows in the dropdown menu and entering annotations in the input form at the top left. Press `submit`, and you will see your inputs update the history table. If you change your annotations for that sample again, the annotation table keeps only the most recent change, while the history table shows both your new and previous annotations.
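
The difference between the two tables can be sketched in plain Python. This is only an illustration of the bookkeeping described above: `annotations`, `history`, and `submit` are hypothetical names, whereas the package itself stores these as the `annot_df` and `history_df` dataframes.

```python
from datetime import datetime

annotations = {}  # subject -> latest annotation values (plays the role of annot_df)
history = []      # every submitted change, in order (plays the role of history_df)


def submit(subject, **values):
    """Record a submission: overwrite the latest values and append to the history."""
    annotations.setdefault(subject, {}).update(values)
    history.append({'index': subject, 'timestamp': datetime.now(), **values})


submit('sample_0', Notes='Hello!', Flag='Keep')
submit('sample_0', Notes='Never mind', Flag='Remove')

print(annotations['sample_0'])  # {'Notes': 'Never mind', 'Flag': 'Remove'}
print(len(history))             # 2
```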


For this exercise, fill in annotations for at least 8 samples, and set the `Flag` annotation to `Remove` for at least four of them.

You can view your progress by accessing the history table:

In [14]:
# only view history rows without missing values
my_reviewer.get_history().dropna()

|  | index | timestamp | source_data_fn | Notes | Flag |
|---|---|---|---|---|---|
| 0 | 0 | 2022-07-22 11:29:52.585575 | ./data/example_reviewer_data.pkl | Hello! | Keep |
| 0 | 1 | 2022-07-22 11:30:03.995091 | ./data/example_reviewer_data.pkl | Something fishy | Remove |
| 0 | 3 | 2022-07-22 11:42:26.443766 | ./data/example_reviewer_data.pkl | Looks good | Keep |
| 0 | 6 | 2022-07-22 11:42:37.593032 | ./data/example_reviewer_data.pkl | Perfect example | Keep |
| 0 | 17 | 2022-07-22 11:42:50.034915 | ./data/example_reviewer_data.pkl | Sketchy | Remove |
| 0 | 16 | 2022-07-22 11:43:02.220459 | ./data/example_reviewer_data.pkl |  | Keep |
| 0 | 13 | 2022-07-22 11:43:11.262853 | ./data/example_reviewer_data.pkl | not sure |  |
| 0 | 0 | 2022-07-22 11:48:50.382558 | ./data/example_reviewer_data.pkl | Never mind | Remove |
| 0 | 2 | 2022-07-22 11:49:07.147002 | ./data/example_reviewer_data.pkl | MIssing driver gene | Remove |
| 0 | 2 | 2022-07-22 11:49:14.797824 | ./data/example_reviewer_data.pkl | Missing driver gene | Remove |


Now you can export this table to a file that you can then share, upload to Terra, or use for further analysis.

In [16]:
from datetime import date
export_dir = f'data/example_reviewer_data.exported_{date.today()}'
if not os.path.exists(export_dir):
    os.mkdir(export_dir)
    
my_reviewer.review_data_interface.export_data(export_dir)

## Advanced Reviewer Run

We will go through each step again, but show how you can further customize your reviewer for your needs.

Let's suppose you are done reviewing all the samples in `my_reviewer` above, and now you want to explore the samples you decided to `Remove` in more detail.

### 1. and 2. Pick and instantiate your reviewer

We will just use the same one as before, but this time create a separate reviewer

In [17]:
my_reviewer_2 = ExampleReviewer()

### 3. Set up the review data

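For this exercise, let's identify which samples you flagged as `Remove` in the previous review. 


In [None]:
# Despite its name, keep_samples_index holds the samples flagged 'Remove':
# these are the samples carried over into the next review session.
annot_df = my_reviewer.get_annot()
keep_samples_index = annot_df.loc[annot_df['Flag'] == 'Remove'].index.tolist()

print(keep_samples_index)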

It would be very useful in this new review session to know why I initially thought I should remove those samples. We can include the previous annotation and history data in this new review session. 

You may have seen in `ReviewerTemplate.set_review_data` there were several other parameters that were not discussed. Those parameters are to allow you to "pre-fill" annotations and history.

There are two main ways to do this:

1. **Manually input the `annot_df`, `annot_col_config_dict`, and `history_df` yourself.** This may be appropriate if you post-processed the annotations separately, or if you only have access to an `annot_df` but no history.
1. **Preferred: Load an existing data pickle object or exported data files.** Use this if you are simply continuing from a previous review session. Pass either the path to the data pickle file (`load_existing_data_pkl_fn`) or a directory that contains the exported tables from a review session (`load_existing_exported_data_dir`). 

> The latter method is preferred because the history tables include a column that indicates the source of the 
> annotations (column `source_data_fn`), which allows you to easily go back to that review session and read the 
> description. A workaround for the first method is to manually generate your own history table with the columns of your
> input annotation table, plus `['index', 'timestamp', 'source_data_fn']`. 
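
A minimal sketch of that workaround with pandas is shown below. `history_from_annotations` and the file path `./data/previous_annotations.tsv` are hypothetical; the column layout follows the history table shown earlier, and whether the resulting table satisfies every check the package performs is an assumption to verify against the source code.

```python
from datetime import datetime

import pandas as pd


def history_from_annotations(annot_df, source_data_fn, timestamp=None):
    """Build a history table with one row per annotated subject."""
    timestamp = timestamp if timestamp is not None else datetime.now()
    annotated = annot_df.dropna(how='all')  # skip subjects without any annotation
    history_df = annotated.reset_index(drop=True)
    # Prepend the three bookkeeping columns in the order used by the history table
    history_df.insert(0, 'index', annotated.index.to_numpy())
    history_df.insert(1, 'timestamp', timestamp)
    history_df.insert(2, 'source_data_fn', source_data_fn)
    return history_df


annot_df = pd.DataFrame(
    {'Notes': ['Something fishy', None], 'Flag': ['Remove', None]},
    index=['sample_1', 'sample_2'],
)
history_df = history_from_annotations(annot_df, './data/previous_annotations.tsv')
print(list(history_df.columns))
# ['index', 'timestamp', 'source_data_fn', 'Notes', 'Flag']
```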


**NOTE**: You still need to set up the input data you want to review yourself, since new or updated data is often available. If you want to reuse the data from the original review session, you can pull it directly from the data object or the exported files. 


For this exercise, let's pass in the pickle file directly.

In [None]:
new_output_pkl_fn = './data/example_reviewer_data_2.pkl'
my_reviewer_2.set_review_data(index=keep_samples_index,
                              data_pkl_fn=new_output_pkl_fn,
                              description="Reviewing more data to see if I can explain why these samples should be removed.",
                              sample_df=my_reviewer.get_data_attribute('df'), # reuse exactly the same data as before
                              load_existing_data_pkl_fn=output_pkl_fn,
                              preprocessing_str='New data'
                             )


### 4. Set up the app

In the previous review session, all we saw was a mutation table. What if we also want to view the treatment data?

You can add additional tables and graphs to the dashboard.

First, run `set_review_app` as before. If you have updated files to plot, make sure you reference the correct column name. In this case, we are still reviewing the original files in `mutations_file`.

In [None]:
my_reviewer_2.set_review_app(mut_file_col='mutations_file', 
                             sample_cols=['sample_id', 'gender', 'age', 'tissue_origin', 'mutations_file'])


Then you can add your own `AppComponent`. For more advanced features, see `Developer_JupyterReviewer_Tutorial.ipynb`.

For now, let's just use the built in function to add a table.

In [None]:
pd.read_csv(my_reviewer.get_data_attribute('df').iloc[0]['treatments_file'], sep='\t')

In [None]:
my_reviewer_2.app.add_table_from_path(data_table_source='df', # reference which table in the Data object to use.
                                      table_title='Treatment file',
                                      component_id='treatment-component-id',
                                      table_fn_col='treatments_file', 
                                      table_cols=['treatment_name', 'response', 'tx_start'])


### 5. Set up settings

Above, we just used default settings already implemented for us. This included what annotations to record and how they are displayed in the app. 

To create your own annotations, use `add_review_data_annotation(annot_name: str, review_data_annotation: DataAnnotation)` for each annotation you want to use. This creates a new column in the `review_data.data.annot_df` dataframe if `annot_name` does not already exist. If it does exist, the annotation's metadata is updated with `review_data_annotation`, which specifies the data type of the annotation (string, float, etc.), valid options, and default values. 

Similarly, you can specify or change how your annotation inputs are displayed in the app with `add_review_data_annotations_app_display(annot_name: str, app_display_type: str)`. `annot_name` must refer to an annotation column in the `annot_df` dataframe (created with `add_review_data_annotation()` or the default settings). `app_display_type` indicates what kind of input to display (checklist, text, float, etc.).
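
To make the idea of annotation metadata concrete, here is a stand-alone sketch of what a value type, a list of options, and a default could mean when a value is submitted. `AnnotationSpec` and its `validate` method are hypothetical and are not the package's `DataAnnotation` class, whose actual checks may differ.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class AnnotationSpec:
    """Hypothetical stand-in for annotation metadata: value type, options, default."""
    value_type: type
    options: Optional[Sequence] = None
    default: object = None

    def validate(self, value):
        """Return the value if allowed, the default if value is None, else raise."""
        if value is None:
            return self.default
        if not isinstance(value, self.value_type):
            raise TypeError(f'expected {self.value_type.__name__}, got {type(value).__name__}')
        if self.options is not None and value not in self.options:
            raise ValueError(f'{value!r} is not one of {list(self.options)}')
        return value


confidence = AnnotationSpec(value_type=str, options=['Confident', 'Unsure'], default='Confident')
print(confidence.validate(None))      # Confident
print(confidence.validate('Unsure'))  # Unsure
```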


Let's keep the default configuration, but add:
- an annotation to indicate how confident we are in removing or keeping the sample
- an annotation to record what color the mutation histogram was
- a modified display for `Notes`


In [None]:
from JupyterReviewer.Data import DataAnnotation
my_reviewer_2.set_default_review_data_annotations_configuration() # sets both annotation columns and display

# Add additional annotation
my_reviewer_2.add_review_data_annotation(
    annot_name='Confidence', 
    review_data_annotation=DataAnnotation(
        annot_value_type='string', 
        options=['Confident', 'Unsure'], 
        default='Confident'))

my_reviewer_2.add_review_data_annotation(
    annot_name='Histogram color', 
    review_data_annotation=DataAnnotation(
        annot_value_type='string', 
        options=['red', 'blue', 'green']))

# Specify display type for additional annotation
my_reviewer_2.add_review_data_annotations_app_display(
    annot_name='Confidence', 
    app_display_type='select')
my_reviewer_2.add_review_data_annotations_app_display(
    annot_name='Histogram color', 
    app_display_type='select')

# Edit 'Notes' annotation to be displayed with 'text' instead of 'textarea'
my_reviewer_2.add_review_data_annotations_app_display(
    annot_name='Notes', 
    app_display_type='text')


### 5.1 Autofill

Sometimes parts of the dashboard can calculate annotations on the fly. In this case, say we want to record what color the histogram plot was last plotted with. It would be tedious and error-prone for a reviewer to keep copying data into the corresponding input. 

Autofill allows you to simply press a button, which takes the data from the specified dashboard components and fills in the annotation input panel for you.

Depending on the app, there may already be a default autofill setting. For those, just call 
```
my_reviewer_2.set_default_autofill()
```

In this case, `ExampleReviewer` has no autofill specified. You can add your own using `add_autofill()`. This is also useful when you have added your own custom components.

You may have to look at the source code to see what components were added and what layout components have values you can access.


In [None]:
from dash.dependencies import State
my_reviewer_2.add_autofill(autofill_button_name='Mut vafs',
                           fill_value=State('mut-figure-color-radioitem', 'value'),
                           annot_col='Histogram color'
                           )

### 6. Run the app!

In [None]:
my_reviewer_2.run()

To also navigate your data through a table, pass in a dataframe whose index matches the index of the Data object. You specify which columns to include; the columns of your annotation table are added automatically.

In [None]:
# Keep `sample_id` as the index so the table index matches the Data object's index
my_reviewer_2.run(
    review_data_table_df=df.loc[keep_samples_index, ['gender', 'age', 'tissue_origin']],
    review_data_table_page_size=7)

You should see that the new inputs were added to the annotation panel. Additionally, the new table with treatment data was added at the bottom.

Select a sample from the dropdown and change the color of the histogram. 

Then, go back up to the annotation panel and press the `Mut vafs` button. This should autofill the `Histogram color` input with the color you selected.

## Final words

You can always go back and change the parameters of setting the app and annotation configurations, adding or editing the components of the app, etc.

The one thing you cannot change after the initial call is the data set by `set_review_data()`. It can only be set once per `data_pkl_fn`. If you wish to start over, you must manually delete the pickle file. Otherwise, create a new one at a different path.

This design lets you make changes to the app on the fly: you can restart the notebook without changing the annotations you have already made.

