## BioBLP Demo notebook

In this notebook we showcase BioBLP as a Natural Language Interface (NLI): the user describes a disease in free text, and the pre-trained model is used to predict potential drugs for it.

- Minimal UI with 3 widgets:
    - DrugBank ID input (textbox or dropdown, still to be decided)
    - Relation dropdown (Drug-Disease Association)
    - Disease description (textbox)
- Load the pre-trained BioBLP-D model
- Once the input is submitted:
    - Set the `<subject, relation>` pair
    - Tokenize the input text (the disease description) with `TextEntityPropertyPreprocessor`
    - Run a forward pass
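
The post-input steps amount to link prediction. The `<subject, relation>` part of a triple is fixed, the disease description is encoded as the other entity, and each candidate triple is scored. The sketch below shows only the scoring and ranking step. It uses a TransE-style score, f(h, r, t) = -‖h + r - t‖, on toy vectors. TransE is an assumption chosen for brevity, and the trained BioBLP-D model may use a different scoring function. The drug IDs, vectors and helper names are all hypothetical.

```python
import math

def transe_score(head, relation, tail):
    """TransE-style plausibility: closer to 0 means more plausible."""
    return -math.sqrt(sum((h + r - t) ** 2 for h, r, t in zip(head, relation, tail)))

def rank_drugs(drug_embeddings, relation, disease_embedding):
    """Rank candidate drugs (heads) for a fixed relation and disease (tail)."""
    scored = [
        (drug_id, transe_score(vec, relation, disease_embedding))
        for drug_id, vec in drug_embeddings.items()
    ]
    # Highest (least negative) score first
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Hypothetical 3-d embeddings; real ones would come from the trained model
drugs = {
    "DRUG_A": [0.1, 0.0, 0.2],
    "DRUG_B": [0.9, 0.8, 0.1],
}
relation = [0.5, 0.5, 0.0]
disease = [0.6, 0.5, 0.2]  # stand-in for the encoded disease description

ranking = rank_drugs(drugs, relation, disease)
print(ranking)
```

Here `DRUG_A` ranks first because `DRUG_A + relation` lands exactly on the disease vector.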

```python
# Core imports
import os.path as osp
import torch

from bioblp.loaders.preprocessors import TextEntityPropertyPreprocessor
from transformers import AutoTokenizer, AutoModel
```

```python
# Interactive UI imports
import ipywidgets as widgets
from IPython.display import display

# Shared layout for the widgets below
layout = widgets.Layout(width='auto', height='40px')
```

```python
# Button that triggers the prediction (the disease-description Textarea is defined separately)
btn = widgets.Button(description='Predict potential drugs for the described disease!',
                     layout=layout)

def btn_eventhandler(obj):
    # Placeholder handler: ipywidgets passes the clicked Button instance as `obj`
    print('Hello from the {} button!'.format(obj.description))

btn.on_click(btn_eventhandler)

display(btn)
```



```python
# Text area for the disease description
txtsl = widgets.Textarea(
    placeholder='Enter the description of the disease you are interested in...',
    description='Disease description')
```

```python
# Put it all together: the description text area next to the prediction button
input_widgets = widgets.HBox([txtsl, btn])
display(input_widgets)
```
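
`Button.on_click` registers a callback that ipywidgets calls with the clicked button. The current handler only prints a greeting. To turn it into a prediction handler, it can read the text area's `value` and pass that string to a prediction function. The sketch below builds the handler as a closure, so the logic can be tested outside Jupyter. `make_predict_handler` and `predict_fn` are hypothetical names, and `SimpleNamespace` stands in for the real `Textarea` and `Button`.

```python
from types import SimpleNamespace

def make_predict_handler(text_widget, predict_fn, output=print):
    """Build an on_click callback that predicts from the text widget's contents.

    `text_widget` only needs a `.value` attribute (e.g. an ipywidgets Textarea);
    `predict_fn` is a hypothetical function mapping a description to drug IDs.
    """
    def handler(button):
        # ipywidgets passes the clicked button; it is not needed here
        description = text_widget.value.strip()
        if not description:
            output("Please enter a disease description first.")
            return
        for drug_id in predict_fn(description):
            output(drug_id)
    return handler

# Stand-ins so the sketch runs without Jupyter
fake_textarea = SimpleNamespace(value="  A clinical syndrome with acute abdominal pain.  ")
messages = []
handler = make_predict_handler(fake_textarea, lambda text: ["DRUG_A", "DRUG_B"],
                               output=messages.append)
handler(SimpleNamespace(description="Predict"))
print(messages)
```

In the notebook, the handler would be wired up as `btn.on_click(make_predict_handler(txtsl, predict_fn))`, once a real `predict_fn` exists.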

```python
# Test disease: MeSH D000006 (Abdomen, Acute)
test_disease_descr = '''A clinical syndrome with acute abdominal pain that is severe, 
localized, and rapid in onset. Acute abdomen may be caused by a variety of disorders, injuries, or diseases.'''
```

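```python
# Load the pre-trained BioBLP-D model (a pickled full model object)
# Note: on PyTorch >= 2.6, torch.load defaults to weights_only=True,
# so unpickling a full model requires passing weights_only=False
base_path = osp.join('..', 'models', 'bioblpd-38uz9fjs')
model_path = osp.join(base_path, 'trained_model.pkl')
model = torch.load(model_path)
```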

```python
# Load the BioBERT tokenizer and the BioBLP text preprocessor
BASE_MODEL = 'dmis-lab/biobert-base-cased-v1.2'

tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
preprocessor = TextEntityPropertyPreprocessor(tokenizer, max_length=64)
```
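
`max_length=64` fixes the length of each tokenized description. Conceptually, longer token sequences are truncated and shorter ones are padded with the pad token ID (0 in BERT-style vocabularies). An attention mask marks which positions hold real tokens. The sketch below reproduces that idea on plain lists of token IDs. It is an illustration, not the actual implementation of `TextEntityPropertyPreprocessor`, and `pad_and_truncate` is a hypothetical helper. Unlike Hugging Face tokenizers, it does not re-append the final `[SEP]` token after truncating.

```python
PAD_ID = 0  # BERT-style vocabularies reserve ID 0 for [PAD]

def pad_and_truncate(token_ids, max_length, pad_id=PAD_ID):
    """Return (ids, attention_mask), both exactly max_length long."""
    ids = list(token_ids[:max_length])  # naive truncation: cut the tail
    mask = [1] * len(ids)               # 1 marks a real token
    padding = max_length - len(ids)
    ids += [pad_id] * padding           # fill the rest with the pad ID
    mask += [0] * padding               # 0 marks padding
    return ids, mask

# First five token IDs of the tokenized test description, padded to length 8
ids, mask = pad_and_truncate([101, 170, 7300, 9318, 1114], max_length=8)
print(ids)
print(mask)
```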

```python
# Tokenize the test description to pass as model input
tokens = tokenizer(test_disease_descr, padding=True, truncation=True, return_tensors="pt")

tokens
```

```
{'input_ids': tensor([[  101,   170,  7300,  9318,  1114, 12104, 24716,  2489,  1115,  1110,
          5199,   117, 25813,   117,  1105,  6099,  1107, 15415,   119, 12104,
         14701,  1336,  1129,  2416,  1118,   170,  2783,  1104, 11759,   117,
          5917,   117,  1137,  8131,   119,   102]]), 'token_type_ids': tensor([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
         0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]]), 'attention_mask': tensor([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
         1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]])}
```

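```python
# Select the property encoder registered under type ID 0
# (presumably the text encoder used for disease descriptions)
encoder = model.property_encoder.type_id_to_encoder[0]
```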
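
Judging by its name, `type_id_to_encoder` maps an entity-type ID to the property encoder for that type, and index `0` selects one of them. The sketch below illustrates that routing idea in plain Python. Each entity carries a type ID and a raw property, and the encoder registered for its type turns the property into a vector. The encoders, type IDs and outputs are made up for illustration. The sketch says nothing about which type ID the trained BioBLP-D model assigns to text.

```python
from typing import Callable, Dict, List, Tuple

Vector = List[float]

# Hypothetical encoders: one per entity type, each mapping a raw property to a vector
def encode_text(description: str) -> Vector:
    # Toy stand-in for a transformer text encoder: character and word counts
    return [float(len(description)), float(len(description.split()))]

def encode_sequence(sequence: str) -> Vector:
    # Toy stand-in for e.g. a protein-sequence encoder: fraction of 'A' residues
    return [sequence.count("A") / max(len(sequence), 1)]

type_id_to_encoder: Dict[int, Callable[[str], Vector]] = {
    0: encode_text,
    1: encode_sequence,
}

def encode_entities(entities: List[Tuple[int, str]]) -> List[Vector]:
    """Route each (type_id, property) pair to the encoder registered for its type."""
    return [type_id_to_encoder[type_id](prop) for type_id, prop in entities]

vectors = encode_entities([(0, "acute abdominal pain"), (1, "MAAG")])
print(vectors)
```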

```python
# Inspect the selected encoder
encoder
```

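```python
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Passing the whole BatchEncoding (`tokens`) raised:
#   TypeError: '>' not supported between instances of 'BatchEncoding' and 'int'
# The encoder compares its input with an int, so pass the token ID tensor instead
# (assumption: the encoder expects input_ids only)
with torch.no_grad():
    embedding = encoder.forward(tokens['input_ids'], device=device)
```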
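
The `TypeError` above arises because a tokenizer call returns a `BatchEncoding`. That is a dict-like container holding `input_ids`, `token_type_ids` and `attention_mask`. The encoder, by contrast, apparently expects a bare tensor of token IDs that it can compare with an integer. The sketch below reproduces the failure with a plain `dict` standing in for `BatchEncoding`. It also shows a hypothetical `unwrap_input_ids` helper that accepts either form. The `> 0` mask on the last line is only a guess at why the encoder compares with an integer: with pad ID 0, `ids > 0` marks the non-padding positions.

```python
from collections.abc import Mapping

def unwrap_input_ids(batch):
    """Return token IDs whether `batch` is a dict-like encoding or already the IDs."""
    if isinstance(batch, Mapping):  # BatchEncoding is a UserDict, hence a Mapping
        return batch["input_ids"]
    return batch

# A plain dict standing in for the tokenizer's BatchEncoding
encoding = {"input_ids": [[101, 170, 7300, 102]], "attention_mask": [[1, 1, 1, 1]]}

try:
    encoding > 0  # comparing the container itself with an int fails
except TypeError as err:
    error_message = str(err)

ids = unwrap_input_ids(encoding)
mask = [[int(token_id > 0) for token_id in row] for row in ids]  # 1 = non-padding
print(error_message)
print(mask)
```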