# Developer AnnoMate Tutorial

This tutorial is for developers who want to make a Reviewer object from scratch.

If you are looking to use a pre-made Reviewer, refer to the README.md file.


# Introduction


There are 5 parts you need to generate to make a standard AnnoMate Reviewer:

**1. ReviewData Object** A ReviewData object consists of 3 pandas dataframes:
1. data: a dataframe with data that a user wants to review, row by row (e.g. samples, participants, mutations, etc.)
2. annot: a dataframe with annotations that a user wants to write for each row (e.g. notes, flags, etc.)
3. history: a timeline of changes a user makes to the annot table


**2. ReviewData Annotations** Set what kind of data the reviewer needs to enter

**3. ReviewDataApp** A ReviewDataApp is a plotly.dash application to display data in a particular way to review data in a ReviewData object, and already includes prebuilt functionality for a user to add annotations and view history of the ReviewData.

As a developer, you will define custom dash components you want to display for the type of review you are implementing (purity, mutation, etc.). This includes tables, graphs, or other components of interest. plotly.dash also enables interactivity, so you can define special functions to allow for interactive viewing of charts and graphs, and to auto-calculate values you may want to use for your annotations (more on autofill below).

When a ReviewDataApp is passed a ReviewData object, it will read the ReviewData object to render your components.

**4. ReviewData Annotation display parameters** Define how to display the annotation inputs in the app

**5. Autofill Dictionary** Autofill allows you to connect outputs of the ReviewDataApp to the annotations for the ReviewData object. At runtime, it will add buttons in the top annotation panel where a user can click and the current values of the selected components in the dash will map to the specified annotation inputs in the annotation panel.

The following tutorial will walk through each of these steps in more detail.

## What a user sees

A user of your custom Reviewer will work in a notebook like the one shown in the `User POV` section of the full example below.


# The basic structure

To build your custom reviewer, you will need to do the following:

1. Create a new class that inherits `ReviewerTemplate` in the `AnnoMate/Reviewers/` directory
1. Define 5 abstract methods (those marked with \* are optional):
    1. `gen_data()`
    1. `gen_review_app()`
    1. \*`set_default_review_data_annotations()`
    1. \*`set_default_review_data_annotations_app_display()`
    1. \*`set_default_autofill()`
    
Your file will look something like the following:
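
As a rough sketch of that shape (not AnnoMate's actual code), the skeleton below shows the two required methods and the three optional ones. To keep it runnable without AnnoMate installed, `ReviewerTemplate` is a local stand-in written for this illustration; in a real reviewer you would import it from `AnnoMate.ReviewerTemplate`. The class name `MyCustomReviewer` and the placeholder return values are hypothetical.

```python
from abc import ABC, abstractmethod


class ReviewerTemplate(ABC):
    """Local stand-in for AnnoMate's ReviewerTemplate (assumption for this sketch).

    In a real reviewer: from AnnoMate.ReviewerTemplate import ReviewerTemplate
    """

    @abstractmethod
    def gen_data(self, *args, **kwargs):
        ...

    @abstractmethod
    def gen_review_app(self, *args, **kwargs):
        ...

    # Optional hooks: the defaults here do nothing.
    def set_default_review_data_annotations(self):
        pass

    def set_default_review_data_annotations_app_display(self):
        pass

    def set_default_autofill(self):
        pass


class MyCustomReviewer(ReviewerTemplate):
    def gen_data(self, description, df, index=None, **kwargs):
        # Preprocess df here, then build and return a Data object.
        # A plain dict is used as a placeholder for that object.
        return {"description": description, "df": df, "index": index}

    def gen_review_app(self, **kwargs):
        # Instantiate a ReviewDataApp, add AppComponents, and return it.
        return "placeholder for a ReviewDataApp"

    def set_default_review_data_annotations(self):
        # Call self.add_review_data_annotation(...) once per annotation.
        pass

    def set_default_review_data_annotations_app_display(self):
        # Call self.add_annotation_display_component(...) once per annotation.
        pass

    def set_default_autofill(self):
        # Call self.add_autofill(...) once per button/annotation link.
        pass
```

Each method is described in its own section below, and the full example near the end of this tutorial shows a working version built on the real classes.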


## gen_data()

This function creates a `Data` object, which then becomes the `data` attribute of the `ReviewData` object interface. The `ReviewData` interface is an object that manages the alterations and saving of the `Data` object. 

To create a `Data` object, the following parameters must be defined, either by the developer or the user, or both:
- `description`: A string describing the ReviewData object's source of data and purpose
- `df`: pandas dataframe containing the data to review. Each row corresponds to a single subject to be annotated.
- `annot_df`, `history_df`, and `annot_col_config_dict`: can be prefilled annotation and history dataframes and their validation configurations from previous reviewer annotations.
- `index`: List of unique Subjects to review

For your custom reviewer, you may have specific plots or calculations you want to make, and known/common annotations that someone reviewing the data should use. You define these special features for your type of reviewer in `gen_data(self, ...)`. The main features would be:
1. Preprocessing the input `df` dataframe, such as precomputing data/graphs and adding columns you may need for your ReviewDataApp. 
2. `annot_col_config_dict`: a dictionary mapping `str` keys to `ReviewDataAnnotation` values. Each key is a column name in the review data object's annotation table. A `ReviewDataAnnotation` consists of:
    1. `type`: view `ReviewData.AnnotationType` Enum
    1. `options`: a list of valid values (for checklist and radioitems)
    1. `validate_input`: a named, non-local function (cannot be a lambda function) that takes a single parameter and returns True/False

Once you have done any preprocessing and defined any default annotations, you can create and return the `Data` object. 
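
The restriction on `validate_input` is presumably tied to pickling, since the review data is saved to a pickle file and the full example below notes that the function must be explicitly defined "for pickling to work". The sketch below uses only the standard library to show the difference. `can_pickle` and `lambda_validator` are hypothetical helpers written for this illustration; `validate_purity` mirrors the validator used in the full example.

```python
import pickle


def validate_purity(x):
    """Named, module-level validator: True for values in [0, 1]."""
    return 0 <= x <= 1.0


# The same check written as a lambda. Its qualified name is "<lambda>",
# so pickle cannot look it up by name when saving.
lambda_validator = lambda x: 0 <= x <= 1.0


def can_pickle(obj):
    """Return True if pickle.dumps(obj) succeeds, False otherwise."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False


if __name__ == "__main__":
    print("lambda picklable:", can_pickle(lambda_validator))
    print("named function picklable:", can_pickle(validate_purity))
```

In a regular script or notebook, the named function is expected to pickle and the lambda is not, which is why validators should be defined with `def` at the top level of a module or notebook cell.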

## set_default_review_data_annotations()

Here you define what data to collect during the review session. Use the prebuilt method `self.add_review_data_annotation()`, which will add the columns to the `Data.annot_df` table and also store the corresponding validation parameters. These are associated with the `Data` object generated from the above `gen_data()`.

Parameters for `ReviewerTemplate.add_review_data_annotation()` are:
- name: The name of the column in `annot` table. It will also be used in the app.
- review_data_annotation: a `ReviewDataAnnotation` object. Its parameters are:
    - `annot_value_type`: one of `["multi", "float", "int", "string"]`
    - `options`: list of options that are valid for this annotation
    - `validate_input`: A custom function to validate inputs
    - `default`: value to automatically fill annotations with

## gen_review_app()

This creates a dash application where you can define what components to include. The `ReviewDataApp` already has built-in components to handle iterating through the items in any `ReviewData` object, rendering the annotation inputs defined by the `ReviewData` object's `review_data_annotation_list`, and the history.

**What is Dash?**

Dash is a library that makes it easy to generate custom dashboards in python. I recommend reviewing the [Dash Tutorial](https://dash.plotly.com/installation) first before proceeding. 

In short, to create a dash app, you define:
1. Layout: how you want the app to look
2. Callbacks: functions to define interactivity with the components in your layout

The `ReviewDataApp` is built so that you can add components and interactivity without having to deal directly with some of the idiosyncrasies of the plotly dash package. 


To create your custom app, you first instantiate a `ReviewDataApp`. Then you add a series of `AppComponent`s.

**AppComponent**

You can use existing app components (see the ReviewerCatalog at [example_notebooks/ReviewerCatalog.ipynb](https://github.com/getzlab/AnnoMate/blob/master/example_notebooks/ReviewerCatalog.ipynb) for different reviewers), or create from scratch.

To create an `AppComponent` from scratch, you will specify:
- **`name`**: A string naming the particular component
- **`layout`**: Using plotly dash's html and bootstrap libraries, define how your component will look (Divs, Graphs, Tables, etc.)
- **`callback_output`**: A list of `Output()`'s. The first argument is the id of the subcomponent in your layout, and the second argument is what attribute of that component to update with your callback functions
- **`callback_input`**: A list of `Input()`'s. The arguments are similar to `Output()`. If these subcomponents' attributes change, it will run your `internal_callback` function.
- **`callback_state`**: A list of `State()`'s. The arguments are similar to `Output()`. If the `internal_callback` function is triggered, the current values of these subcomponent attributes will be passed as parameters to the `internal_callback` function.
- **`new_data_callback`**: A function whose first two arguments are assumed to be (1) the `ReviewData.data` object, and (2) an index value of the `ReviewData` object. 
    - The next parameters are defined IN ORDER of the `Input()`'s defined by the `callback_input` argument followed by the `State()`'s defined by the `callback_state`argument. 
    - The output of this function is a **list** that corresponds IN ORDER of the `Output()`'s listed in `callback_output`. This function will be called whenever a user switches to a new subject to review.
- **`internal_callback`**: A function with the EXACT same signature as `new_data_callback`. This function will be called whenever a user changes the attributes of subcomponents listed in `callback_input`.
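
The ordering rules above can be sketched with plain Python. This is only an illustration of the argument and return order: plain tuples stand in for dash's `Output()`, `Input()`, and `State()` objects, a dictionary stands in for the data object, and the component ids (`a-dropdown`) and return values are hypothetical.

```python
# Stand-ins for Output/Input/State: (component id, attribute) tuples.
callback_output = [("a-figure", "figure"), ("a-slider", "value")]
callback_input = [("a-slider", "value")]
callback_state = [("a-dropdown", "value")]


def new_data_callback(data, idx, slider_value, dropdown_value):
    """Runs when the reviewer switches to a new subject.

    Argument order: data, idx, one argument per Input, then one per State.
    """
    row = data[idx]
    figure = {"title": f"{idx} ({dropdown_value})", "y": row["values"]}
    # One return value per entry in callback_output, in the same order:
    # the figure for 'a-figure', then a reset value for 'a-slider'.
    return [figure, 0.5]


def internal_callback(data, idx, slider_value, dropdown_value):
    """Same signature; runs when an attribute listed in callback_input changes."""
    figure, _ = new_data_callback(data, idx, slider_value, dropdown_value)
    figure["vline"] = slider_value
    return [figure, slider_value]


example_data = {"sample_0": {"values": [1, 2, 3]}}
new_subject_outputs = new_data_callback(example_data, "sample_0", 0.5, "linear")
slider_moved_outputs = internal_callback(example_data, "sample_0", 0.2, "linear")
```

Both functions return a list with exactly as many entries as `callback_output`, which is the property the app relies on when it updates the layout.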

We recommend organizing your custom AppComponents into Python scripts located in an `AppComponent` directory (e.g. `MyReviewers/MyReviewers/AppComponent/MyCustomComponent.py`). Your `MyCustomComponent.py` will contain the functions that generate your layout and callbacks and that set up your custom `AppComponent` object.

**Premade components**

`ReviewDataApp` objects also include a function `add_table_from_path()` to create a simple table by reading a file whose path is stored in a column. You can use it just like `add_component()`, but you only need to specify which column in the Data object to get the file from, and which columns in the file's table to display.

**Custom args for callback functions**

Sometimes your callback functions need parameters that are specific to your reviewer type or defined by the user (e.g. pointing to a specific column name in the `ReviewData` object, specific parameters for displaying graphs, etc.). When adding a component to the app, you can pass these as keyword arguments.

```
premade_component = AppComponent(..., 
                                 new_data_callback=lambda df, idx, y: [df.loc[idx] + y], 
                                 ...)

class ExampleReviewer(ReviewerTemplate):
    ...
    
    # Specific to reviewer type
    def gen_review_app(self):
        app = ReviewDataApp()
        app.add_component(premade_component,
                          y=10) # <-------------------
        return app
        
    # OR define by the user
    def gen_review_app(self, y): # <-------------------
        app = ReviewDataApp()
        app.add_component(premade_component,
                          y=y)  # <-------------------
        return app
```



## set_default_review_data_annotations_app_display()

After the user has set the review data object's annotation data and the corresponding app, the user now has to specify how to display the annotations in the input form of the app.

`Reviewer.set_default_review_data_annotations_app_display()` is only called in the public method `Reviewer.set_default_review_data_annotations_configuration()`, which is used if the user wants to use your default annotations and display configuration. 

To define `Reviewer.set_default_review_data_annotations_app_display()`, use the prebuilt `self.add_annotation_display_component()` method for each annotation to include. The parameters are:
- `name`: Corresponding name of a column that exists in the `ReviewData.annot` table (specified in `set_default_review_data_annotations()`)
- `annot_display_component`: takes objects of type `AnnotationDisplayComponent`

**AnnotationDisplayComponent**

`AnnotationDisplayComponent` is a class of objects that define different input layouts to use when inputting your annotations. There are premade `AnnotationDisplayComponent`s, including text boxes, number inputs, checklists, and dropdown menus. Each `AnnotationDisplayComponent` has a function `gen_input_component()` which returns a layout to display in the annotation panel and take in an input. 

All `AnnotationDisplayComponent`s have 3 attributes:
- `display_output_format`: A function that converts the data from the input component into a value compatible with the annotation it is assigned to. For example, some input components automatically cast values to a string, even if the options you give them are numbers. You may need to cast the string to an int.
- `default_display_value`: A valid option to display in the annotation panel automatically if the annotation does not already have an associated value (previously reviewed). It will NOT prefill the annotation table immediately.
- `default_compatible_types`: This attribute is defined in the class. Depending on the type of input display, only certain annotation types are compatible. For example, to use a `ChecklistAnnotationDisplayComponent` or `MultiValueSelectAnnotationDisplayComponent`, it can only be associated with a 'multi' type `DataAnnotation`. 



## set_default_autofill()

Sometimes you may have a lot of annotations, or one of your components produces a value that you want to use as an annotation. It can be tedious to type things into the form manually, so the `ReviewDataApp` has functionality to link the current state of your subcomponents in the app to the annotation input panel.

You can set these "links" with `self.add_autofill()`:
- `autofill_button_name`: the name of the autofill button that triggers the fill when clicked.
- `fill_value`: A `State` or raw value to fill the corresponding `annot_col` for the current subject being reviewed
- `annot_col`: The column in the annotation table to use the `fill_value`

If you use the same `autofill_button_name` when adding other autofills, then all `fill_value` -> `annot_col` associations will be fulfilled when that button is clicked. 

*Note that you can use `State` values from different `AppComponents` under a single autofill button.

** Note that the user can also add autofills with `reviewer.add_autofill()` if they decide to add additional components or annotations to the existing reviewer.
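
The grouping behaviour described above, where several `fill_value` -> `annot_col` links share one button, can be pictured with a small standalone sketch. The nested-dictionary layout, the helper names `add_autofill_sketch` and `resolve_click`, and the use of tuples in place of dash `State` objects are all assumptions made for illustration; they are not taken from AnnoMate's implementation.

```python
# Assumed layout: {button name: {annotation column: fill value}}
autofill_sketch = {}


def add_autofill_sketch(autofill_button_name, fill_value, annot_col):
    """Register one fill_value -> annot_col link under a button name."""
    autofill_sketch.setdefault(autofill_button_name, {})[annot_col] = fill_value


def resolve_click(autofill_button_name, current_state):
    """Return {annot_col: value} for a button click.

    Tuples are treated like State references and looked up in current_state;
    any other fill value is used as a raw value.
    """
    filled = {}
    for annot_col, fill_value in autofill_sketch[autofill_button_name].items():
        if isinstance(fill_value, tuple):
            filled[annot_col] = current_state[fill_value]
        else:
            filled[annot_col] = fill_value
    return filled


# Two links registered under the same button name.
add_autofill_sketch("test-autofill-button", ("a-slider", "value"), "A_float")
add_autofill_sketch("test-autofill-button", "Option 1", "class")

# Simulate a click while the slider is at 0.4.
filled = resolve_click("test-autofill-button", {("a-slider", "value"): 0.4})
```

Here a single click fills both `A_float` (from the slider's current value) and `class` (from the raw value), matching the two `add_autofill()` calls in the full example below.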


## Optimization

Because of how App Components are produced and updated, many compute-intensive tasks (primarily file loading) occur within callback functions. This slows performance considerably, especially when working with interactive tables or plots where the data is being accessed and modified/filtered repeatedly. While one approach would be to save intermediate files locally in an accessible format (e.g. pickle files), caching is a more robust and extendable solution.

### Using functools.lru_cache
Developers can decorate any function with `@functools.lru_cache(maxsize=n)` to cache the results of up to `n` distinct function calls in memory. Ideally, use this decorator for your most compute-intensive tasks, so when they are called again with the same arguments (when returning to a previous annotation or modifying a complicated figure), the results are already in memory.
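
A minimal sketch of this pattern, using only the standard library. `load_table` is a hypothetical stand-in for an expensive step such as reading a file, and the `calls` list exists only to show how often the function body actually runs.

```python
import functools

calls = []  # records each time the expensive body really runs


@functools.lru_cache(maxsize=32)
def load_table(path):
    """Hypothetical stand-in for an expensive file load."""
    calls.append(path)
    # Return an immutable value so callers cannot alter the cached copy.
    return (("gene", "vaf"), ("TP53", 0.42))


first = load_table("sample_0.tsv")   # computed and stored in the cache
second = load_table("sample_0.tsv")  # served from the cache
info = load_table.cache_info()       # hit/miss counters kept by lru_cache
```

After these two calls the body has run once, and `cache_info()` reports one miss and one hit. `load_table.cache_clear()` empties the cache for this one function.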

Dash also provides Callback Caching, which stores data in a shared memory database or to disk. This functionality is not yet implemented in AnnoMate as most apps are designed to be run in a single thread and without deployment. However, you can read more about Dash caching [here](https://dash.plotly.com/background-callback-caching).

### cached_read_csv
A cached pandas `read_csv` function is also provided for convenience. Call `AnnoMate.AppComponents.utils.cached_read_csv()` to cache the results from your csv load, with a cache size of 32. This is especially helpful when reading data from the cloud.

> **WARNING**: `cached_read_csv` is caching based on the **file name**, NOT the **contents**. If your file contents are changing, but the file path or string is the same, then you must do one of the following:
> 1. Restart your notebook (restarts the cache)
> 2. Change your file names
> 3. Run the following code to clear the cache
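>    ```
>    import functools
>    import gc # python garbage collector
>
>    # Find every lru_cache-wrapped function currently tracked by the garbage collector
>    cached_funcs = [obj for obj in gc.get_objects() if isinstance(obj, functools._lru_cache_wrapper)]
>
>    # Clear all of their caches
>    for cached_func in cached_funcs:
>        cached_func.cache_clear()
>    ```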

### Arguments must be immutable
The `functools.lru_cache` decorator only works with arguments that can be hashed, which in practice means immutable ones. For common mutable objects (lists and dictionaries), we provide a helper decorator `@freezeargs` available for import from `AnnoMate.AppComponents.utils`. This decorator can be used to automatically transform any input lists or dicts to immutable tuples and frozendicts, which can be hashed for use with `lru_cache`.

Passing any other mutable objects into a cached function is not supported (including custom objects).
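
To illustrate the idea behind such a decorator, here is a standalone sketch. It is not AnnoMate's `@freezeargs` implementation: `freeze_args_sketch`, `HashableDict`, and `total_length` are hypothetical names, and `HashableDict` is a simple stand-in for a frozendict that assumes its values are themselves hashable after freezing.

```python
import functools


class HashableDict(dict):
    """A dict hashed by its items; a simple stand-in for a frozendict."""

    def __hash__(self):
        return hash(frozenset(self.items()))


def _freeze(value):
    """Recursively turn lists into tuples and dicts into HashableDicts."""
    if isinstance(value, list):
        return tuple(_freeze(v) for v in value)
    if isinstance(value, dict):
        return HashableDict({k: _freeze(v) for k, v in value.items()})
    return value


def freeze_args_sketch(func):
    """Freeze list/dict arguments before passing them to func."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        frozen_args = tuple(_freeze(a) for a in args)
        frozen_kwargs = {k: _freeze(v) for k, v in kwargs.items()}
        return func(*frozen_args, **frozen_kwargs)
    return wrapper


calls = []  # records each time the cached body really runs


# The freezing decorator goes on the outside so lru_cache only sees hashable values.
@freeze_args_sketch
@functools.lru_cache(maxsize=8)
def total_length(columns, options):
    calls.append((columns, options))
    return len(columns) + len(options)


first = total_length(["gene", "vaf"], {"sep": "\t"})
second = total_length(["gene", "vaf"], {"sep": "\t"})  # cache hit despite new list/dict objects
```

Without the freezing step, the first call would raise `TypeError: unhashable type: 'list'`, because `lru_cache` builds its cache key by hashing the arguments.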

### Cache size
As of now, there is no way to dynamically set the cache size for an app. As the developer, you should determine an appropriate number of function calls to keep in memory for each function you wrap with `lru_cache`.

### Development Tips
- Attempt to cache a function that works on the most generic/standard inputs for a given sample/patient. Call this cached function first and then make modifications to the resulting data.
- For cnv_suite visualization methods, many alterations (e.g. changing color, sigmas, even CN values) can be performed on an existing figure. In this case, you can cache the function that calls `plot_acr_interactive` and then modify the output based on any user changes.
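
The "cache the generic result, then modify it" tip can be sketched as follows. The figure is represented by a plain dictionary and `build_base_figure` is a hypothetical stand-in for an expensive plotting call. Copying the cached result before changing it is a precaution added in this sketch: `lru_cache` hands back the same object on every hit, so in-place edits would otherwise leak into later calls.

```python
import copy
import functools

calls = []  # records each time the expensive body really runs


@functools.lru_cache(maxsize=32)
def build_base_figure(sample_id):
    """Hypothetical stand-in for an expensive, sample-specific plotting call."""
    calls.append(sample_id)
    return {"sample": sample_id, "traces": [[0, 1, 2]], "vlines": []}


def figure_with_vline(sample_id, x):
    """Cheap, user-driven modification applied on top of the cached base figure."""
    fig = copy.deepcopy(build_base_figure(sample_id))  # never mutate the cached object
    fig["vlines"].append(x)
    return fig


fig_a = figure_with_vline("sample_0", 0.3)
fig_b = figure_with_vline("sample_0", 0.7)
```

The expensive function runs once for `sample_0`, and each user change starts from a clean copy of the cached base figure.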

# Full example

In [2]:
%load_ext autoreload
%autoreload 2

In [3]:
import pandas as pd
import numpy as np
import functools
import time
import os

In [4]:
from AnnoMate.Data import DataAnnotation
from AnnoMate.DataTypes.GenericData import GenericData
from AnnoMate.ReviewDataApp import ReviewDataApp, AppComponent
from AnnoMate.ReviewerTemplate import ReviewerTemplate
from AnnoMate.AnnotationDisplayComponent import * 
from typing import Dict, Union, List

import plotly.express as px
from plotly.subplots import make_subplots
from dash import dcc
from dash import html
from dash.dependencies import Input, Output, State
from dash.exceptions import PreventUpdate
from dash import Dash, dash_table
import dash
import dash_bootstrap_components as dbc
import functools
import plotly.graph_objects as go

# For pickling to work, need to explicitly define function
def validate_purity(x):
    return (x >= 0) and (x <= 1.0)

class PrebuiltReviewer(ReviewerTemplate):
    def gen_data(self,
                 description: str,
                 annot_df: pd.DataFrame,
                 annot_col_config_dict: Dict,
                 history_df: pd.DataFrame,
                 df: pd.DataFrame,
                 preprocessing_str: str,
                 index: List = None,   
                ):
        """
        Parameters
        ==========
        df: pd.DataFrame
            A dataframe
        
        """
        
        df['new_column'] = preprocessing_str
        
        if index is None:
            index = df.index.tolist()

        return  GenericData(index=index,
                            description=description,
                            df=df,
                            annot_df=annot_df,
                            annot_col_config_dict=annot_col_config_dict,
                            history_df=history_df
                           )
    
    def set_default_review_data_annotations(self):
        self.add_review_data_annotation('A_float', DataAnnotation('float', validate_input=validate_purity))
        self.add_review_data_annotation('rating', DataAnnotation('int', options=range(10)))
        self.add_review_data_annotation('Notes', DataAnnotation('string'))
        self.add_review_data_annotation('class', DataAnnotation('string', options=[f'Option {n}' for n in range(4)]))
    
    def gen_review_app(self, test_param) -> ReviewDataApp:
        """
        Parameters
        ==========
        
        test_param: str
            A string to demonstrate passing a parameter to a callback function
        """
        app = ReviewDataApp()
        app.add_table_from_path(data_table_source='df',
                                table_title='file', 
                                component_id='maf-component-id', 
                                table_fn_col='mutations_file', 
                                table_cols=['gene', 'vaf', 'cov', 'sample_id'])


        def gen_data_summary_table(data: GenericData, idx, cols):
            r = data.df.loc[idx]
            return [[html.H1(f'{r.name} Data Summary'), dbc.Table.from_dataframe(r[cols].to_frame().reset_index())]]

        app.add_component(AppComponent(name='sample-info-component', 
                                      layout=html.Div(children=[html.H1('Data Summary'), 
                                                         dbc.Table.from_dataframe(df=pd.DataFrame())],
                                               id='sample-info-component'
                                              ), 
                                      callback_output=[Output('sample-info-component', 'children')],
                                      new_data_callback=gen_data_summary_table, 
                                      ),
                               cols=['gender',
                                     'age', 
                                     'new_column']
                              )

        def plot_interactive_graph(data: GenericData, idx: str, slider_value, test_param):
            x = np.arange(0, 1, 0.1)
            fig = go.Figure()
            fig.add_trace(go.Scatter(x=x, y=x))
            return [fig, 0.5]

        def interactive_graph_change_lines(data: GenericData, idx: str, slider_value, test_param):
            # Rebuild the base figure, then overlay a vertical line at the slider position
            fig = plot_interactive_graph(data, idx, slider_value, test_param)[0] # cache?
            fig.add_vline(slider_value)
            return [fig, dash.no_update] # dash.no_update leaves the slider value as it is


        app.add_component(AppComponent('test-interactive-graph',
                                      html.Div(children=[dcc.Graph(figure={}, id='a-figure'), 
                                                dcc.Slider(0, 1, 0.1, value=0.5, 
                                                           id='a-slider'
                                                          )
                                               ]),
                                      new_data_callback=plot_interactive_graph,
                                      internal_callback=interactive_graph_change_lines,
                                      callback_output=[Output('a-figure', 'figure'), Output('a-slider', 'value')],
                                      callback_input=[Input('a-slider', 'value')],
                                   ),
                       test_param=test_param
                       )
        
        return app
    
    def set_default_review_data_annotations_app_display(self):
        self.add_annotation_display_component('A_float', NumberAnnotationDisplay(default_display_value=0.5))
        self.add_annotation_display_component('rating', NumberAnnotationDisplay())
        self.add_annotation_display_component('Notes', TextAnnotationDisplay())
        self.add_annotation_display_component('class', RadioitemAnnotationDisplay(default_display_value='Option 2'))
    
    def set_default_autofill(self):
        self.add_autofill('test-autofill-button', State('a-slider', 'value'), 'A_float')
        self.add_autofill('test-autofill-button', 'Option 1', 'class')
    
    

# User POV

In [5]:
fn = 'example_data/AnnoMate_Tutorial/data_to_review_example.tsv'
df = pd.read_csv(fn, sep='\t', index_col=0)
df.head()

| sample_id | gender | age | tissue_origin | treatments_file | mutations_file |
|---|---|---|---|---|---|
| sample_0 | female | 57 | skin | ./example_data/AnnoMate_Tutorial/treatments/sa... | ./example_data/AnnoMate_Tutorial/mut_vafs/samp... |
| sample_1 | male | 66 | breast | ./example_data/AnnoMate_Tutorial/treatments/sa... | ./example_data/AnnoMate_Tutorial/mut_vafs/samp... |
| sample_2 | male | 65 | lung | ./example_data/AnnoMate_Tutorial/treatments/sa... | ./example_data/AnnoMate_Tutorial/mut_vafs/samp... |
| sample_3 | male | 66 | skin | ./example_data/AnnoMate_Tutorial/treatments/sa... | ./example_data/AnnoMate_Tutorial/mut_vafs/samp... |
| sample_4 | male | 48 | bone marrow | ./example_data/AnnoMate_Tutorial/treatments/sa... | ./example_data/AnnoMate_Tutorial/mut_vafs/samp... |


In [6]:
data_dir = './data'
if not os.path.exists(data_dir):
    print('making new directory')
    os.mkdir(data_dir)

data_pkl_fn = f'{data_dir}/Prebuilt_reviewer.Dev_Reviewer.pkl'

In [7]:
test_reviewer = PrebuiltReviewer()
test_reviewer.set_review_data(
    data_pkl_fn = data_pkl_fn, 
    description='testing', 
    df = df,
    index=df.index, 
    preprocessing_str = 'A preprocessing str'
)
test_reviewer.set_review_app(test_param='testing param kwargs')
test_reviewer.set_default_review_data_annotations_configuration()
test_reviewer.set_default_autofill()

# User customization
test_reviewer.app.add_component(AppComponent('Test Add Component', html.Div(html.P('New component'))))

# More annotations
test_reviewer.review_data_interface.add_annotation('another_annotation', DataAnnotation('float'))
test_reviewer.add_annotation_display_component('another_annotation', NumberAnnotationDisplay(default_display_value=20))

# Run
test_reviewer.run(mode='external', port=8085)

Setting auto_export_path to ./data/Prebuilt_reviewer.Dev_Reviewer.auto_export
Making directory ./data/Prebuilt_reviewer.Dev_Reviewer.auto_export for auto exporting.
Using ./data/Prebuilt_reviewer.Dev_Reviewer.auto_export for auto exporting.
Dash app running on http://0.0.0.0:8085/


In [9]:
test_reviewer.get_annot()

| value | A_float | rating | Notes | class | another_annotation |
|---|---|---|---|---|---|
| sample_0 | 0.4 |  |  | Option 1 | 20.0 |
| sample_1 | 0.8 | 10.0 |  | Option 0 | 20.0 |
| sample_2 |  |  |  |  |  |
| sample_3 |  |  |  |  |  |
| sample_4 | 0.1 | 2.0 |  | Option 3 | 20.0 |
| sample_5 |  |  |  |  |  |
| sample_6 |  |  |  |  |  |
| sample_7 |  |  |  |  |  |
| sample_8 |  |  |  |  |  |
| sample_9 |  |  |  |  |  |


In [10]:
# User can reuse the app
second_data_pkl_fn = f'{data_dir}/Prebuilt_reviewer.Dev_Reviewer_2.pkl'
another_reviewer = PrebuiltReviewer()
another_reviewer.set_review_data(
    data_pkl_fn = second_data_pkl_fn, 
    description='testing copy annotations', 
    df=test_reviewer.get_data_attribute('df'),
    index=test_reviewer.get_data_attribute('df').index, 
    preprocessing_str = 'A preprocessing str',
    load_existing_data_pkl_fn=data_pkl_fn,
)

## do not need to re-add customizations
# All the data annotation configurations are saved in the pkl file
another_reviewer.app = test_reviewer.app
another_reviewer.annot_app_display_types_dict = test_reviewer.annot_app_display_types_dict
another_reviewer.autofill_dict = test_reviewer.autofill_dict # copy autofill settings
another_reviewer.run(port=8053)


Loading data from previous review with pickle file
Setting auto_export_path to ./data/Prebuilt_reviewer.Dev_Reviewer_2.auto_export
Making directory ./data/Prebuilt_reviewer.Dev_Reviewer_2.auto_export for auto exporting.
Using ./data/Prebuilt_reviewer.Dev_Reviewer_2.auto_export for auto exporting.
Dash app running on http://0.0.0.0:8053/


# Reviewer Suites and Organization

If you plan to share your reviewer with others, or create a collection of different versions of reviewers that share similar components or data types, we recommend creating a new GitHub repo. We recommend following the [AnnoMateTemplate](https://github.com/getzlab/AnnoMateTemplate) repo as a guide for organizing your different reviewers, app components, and data types.

```
MyCustomReviewer
    MyCustomReviewer
        Reviewers <-- Where you assemble a reviewer as shown in this example notebook
            MyCustomReviewerA.py
            MyCustomReviewerB.py
        AppComponents
            CustomAppComponentA.py
            CustomAppComponentB.py
            utils.py
        DataTypes
            CustomDataType.py
    .gitignore
    setup.py
    README.md

```

If you want to include your suite of reviewers in the [AnnoMate Reviewer Catalog](https://github.com/getzlab/AnnoMate/blob/master/catalog/ReviewerCatalog.ipynb), follow the instructions in the notebook. 
