# Intro to Reviewers

## The problem

### Why and how we review data

Part of any study is ensuring that data from multiple sources are consistent, and drawing conclusions about the data given additional context. Studies are often novel, and there are frequently steps along the way that have no existing automation techniques. Even if such techniques exist, one may still be skeptical in case the data break any of their assumptions.

Typically, those reviewing all this data open a bunch of windows to view data from different places (a clinical information spreadsheet from a collaborator, a few outputs from a Terra workflow, previous notes from another reviewer, etc.). Next, they look through all the data and keep notes in yet another document, such as a spreadsheet or digital/physical notes. Then they go row by row, sample by sample, until they finish.

### Why we need something better

While straightforward in theory, this review method is brittle, error-prone, and time-consuming.

Reviewing can take a very long time, for example when a dataset has hundreds to thousands of data points, or when the review must be repeated because something upstream changed.

Some review processes are iterative: new information arrives from another source to inform the review, or the review needs to be handed off to someone else. We should be able to easily combine old data with new data, and share that history and information with others.

Some reviews require calculations, or exploring the data in ways that a static plot cannot provide. Some Terra workflows do produce interactive HTML files, but this is rare. Sometimes a reviewer realizes partway through a review that a different kind of plot would be very informative. It should be easy to generate such a plot on the fly without having to modify or create a Terra workflow, or open a new notebook to calculate it manually.

Lastly, humans are humans, and we make mistakes. It can be very tedious to maintain and update a large spreadsheet with hundreds of rows and multiple columns to annotate. Annotation standards are difficult to enforce in this setting, and changes are difficult to track.



## The Solution: Jupyter notebook and Plotly-Dash!

Most ACBs use Jupyter notebooks for their analysis. So why not keep the review process in Jupyter notebooks too? Additionally, great tools already exist for making interactive figures and dashboards. We can use these packages to automatically consolidate information and create figures that make it easier to review data, enforce annotation standards, and track changes over time.

The `JupyterReviewer` package makes it simple to create dashboards for reviewing data. Developers and users can easily customize their dashboards to incorporate any data they like, and the package automatically gives reviewers an easy way to annotate their data, track changes, and share their annotations with others.

Below is an overview of what you need to know about this package to get started.


# Installation

1. Download the repository: `git clone git@github.com:getzlab/JupyterReviewer.git` 
1. `cd JupyterReviewer`
1. Create an environment: `conda create --name <my-env> --file requirements.txt`
1. Install package: `pip install -e .`
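
After installing, one quick way to confirm that the active environment can see the package is to look it up without importing it. This is only a minimal sketch: the helper name `is_installed` is hypothetical, and the import name `JupyterReviewer` is assumed from the import statement used later in this guide.

```python
import importlib.util


def is_installed(package_name: str) -> bool:
    """Return True if a top-level package can be found in this environment."""
    # find_spec returns None when a top-level module cannot be located
    return importlib.util.find_spec(package_name) is not None


status = "found" if is_installed("JupyterReviewer") else "not found"
print(f"JupyterReviewer: {status}")
```

If this reports "not found", the most likely cause is that the notebook kernel is running in a different environment from the one created above.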

# ReviewData object

The `ReviewData` object is simply three tables that track what data you are looking at for your review, the annotations you made, and the history of your annotations. It is meant to mirror how one might annotate by going row by row through a spreadsheet, filling in or editing the corresponding columns. The object is saved to a pickle file (at a path provided by the user), which can be shared or exported to a TSV file.

The only rule is that each row corresponds to a specific data item you want to annotate and is independent of the other rows in the table. It can be a sample, a participant, a pair, a mutation, etc.
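
To make the three-table idea concrete, here is a conceptual sketch in plain Python. `ToyReviewData` is a hypothetical stand-in written for this guide, not the package's `ReviewData` class, and the sample names and values are made up. It only illustrates how a "current annotations" table and an append-only "history" table can sit next to the data being reviewed.

```python
from datetime import datetime


class ToyReviewData:
    """Conceptual stand-in for the three tables; NOT the package's class."""

    def __init__(self, data):
        self.data = data    # table 1: item id -> values under review
        self.annot = {}     # table 2: item id -> {column: latest value}
        self.history = []   # table 3: one record per change, in order

    def annotate(self, item_id, column, value):
        if item_id not in self.data:
            raise KeyError(f"unknown item: {item_id}")
        # Overwrite the current annotation for this row and column
        self.annot.setdefault(item_id, {})[column] = value
        # Append to the history so earlier values are never lost
        self.history.append({
            "item": item_id,
            "column": column,
            "value": value,
            "timestamp": datetime.now().isoformat(),
        })


# Made-up example data: two independent rows, one per sample
review = ToyReviewData({"sample_1": {"purity": 0.8}, "sample_2": {"purity": 0.4}})
review.annotate("sample_1", "notes", "looks fine")
review.annotate("sample_1", "notes", "recheck after rerun")

print(review.annot["sample_1"]["notes"])  # only the latest value
print(len(review.history))                # every edit is retained
```

Keeping the history separate from the current annotations is what lets a second reviewer see both the present state and how it was reached.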

Recommendations:
- Do as much automation for annotations as possible first. You can then use this tool to manually check and update these annotations.
- Preprocess your files so that each sample's data renders quickly and it takes less time to switch between samples.
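
The second recommendation is about paying the loading cost once rather than on every switch. Preprocessing files ahead of time is one way to do that. A related, lighter-weight technique is to memoize the loader with `functools.lru_cache`, sketched below. The function `load_sample` and its simulated parsing are hypothetical and not part of the package.

```python
from functools import lru_cache

load_calls = []  # records every real (uncached) load, for demonstration only


@lru_cache(maxsize=None)
def load_sample(sample_id: str) -> tuple:
    """Pretend to parse an expensive per-sample file and return its rows."""
    load_calls.append(sample_id)
    # Stand-in for slow work such as reading and filtering a large file
    return tuple((sample_id, position) for position in range(3))


load_sample("sample_1")  # first visit: does the real work
load_sample("sample_2")
load_sample("sample_1")  # switching back: served from the cache

print(load_calls)  # ['sample_1', 'sample_2']
```

Caching helps within a single session. Preprocessed files also help after a restart, so the two approaches can be combined.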

How do you add and save annotations? The `ReviewDataApp` handles this. 


# ReviewDataApp object

The `ReviewDataApp` is simply a wrapper that makes it easy to review data and add annotations, with the additional benefit of adding custom tables and graphs to help you analyze and see all the relevant data at once. It is built around Plotly Dash, a package that makes it easy to create interactive figures and custom dashboards.

# Get started

1. Get a pandas dataframe with the data you want to review. Each item to review must have a unique index name (e.g. a sample_id or participant_id).
1. Pick or create a reviewer. You have two options: 
    1. Import an existing reviewer (`from JupyterReviewer.Reviewers import PurityReviewer`)
    1. Create a reviewer from scratch (see `Developer_Jupyter_Reviewer_Tutorial.ipynb`) and import it

1. Instantiate the reviewer
    1. Set the review object by passing in (1) your dataframe with the data you want to review, (2) a pickle path to save the `ReviewData` object, and (3) any additional parameters required to set up the `ReviewData` object. You can also add or change annotation columns if you'd like.
    1. Set the review app by passing in any of the required parameters
1. Run the app
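
Step 1 above asks for a unique index, and it can be worth checking this before instantiating a reviewer. The sketch below uses only the standard library. The helper name `find_duplicate_ids` and the sample identifiers are hypothetical.

```python
from collections import Counter


def find_duplicate_ids(ids):
    """Return the identifiers that appear more than once, in first-seen order."""
    counts = Counter(ids)
    return [item for item, n in counts.items() if n > 1]


# Made-up identifiers with one accidental repeat
sample_ids = ["sample_1", "sample_2", "sample_2", "sample_3"]
duplicates = find_duplicate_ids(sample_ids)
if duplicates:
    print(f"Fix these before reviewing: {duplicates}")
```

With a pandas dataframe `df`, passing `df.index` to this helper should work the same way, and `df.index.is_unique` gives the yes/no answer directly.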

Prior to running the app, you can also modify the pre-built reviewer:
- `reviewer.review_data.add_annotation({'column name': ReviewDataAnnotation()})`: Add or change an annotation column configuration.
- `reviewer.app.add_component(AppComponent(), **kwargs)`: Add a new component
- `reviewer.add_autofill()`: If you added another column with `add_annotation()` above, you can specify how to autofill the annotation panel from an existing component.

To learn how to add these customizations, see `Developer_Jupyter_Reviewer_Tutorial.ipynb`.
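
To give a feel for what an annotation column configuration is for, here is a sketch of enforcing annotation standards in plain Python. `ToyAnnotation` is a hypothetical stand-in written for this guide. It is not `ReviewDataAnnotation`, whose actual parameters are covered in the developer tutorial.

```python
class ToyAnnotation:
    """Hypothetical column rule: a value type plus an optional list of options."""

    def __init__(self, value_type, options=None):
        self.value_type = value_type
        self.options = options

    def validate(self, value):
        """Raise ValueError if `value` breaks the rule; otherwise return it."""
        if not isinstance(value, self.value_type):
            raise ValueError(
                f"expected {self.value_type.__name__}, got {type(value).__name__}"
            )
        if self.options is not None and value not in self.options:
            raise ValueError(f"{value!r} is not one of {self.options}")
        return value


# Made-up rule: a status column restricted to three allowed strings
status = ToyAnnotation(str, options=["pass", "fail", "needs follow-up"])
status.validate("pass")       # accepted
try:
    status.validate("passs")  # a typo is rejected instead of silently stored
except ValueError as err:
    print(err)
```

Rejecting a value at entry time is what a free-form spreadsheet cell cannot do, which is the gap described in the problem section.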


Once you have chosen or created your reviewer, you are ready to review data!

```python
# 1.) Instantiate your reviewer
reviewer = YourReviewer(...)

# 2.) Set the ReviewData object
reviewer.set_review_data(...)

# 3.) Set the review app
reviewer.set_review_app(...)

# 4.) Run the app
reviewer.run(...)
```
