In [1]:
import pickle
import tensorflow as tf

from util import *
from biomedical_qa.inference.inference import Inferrer, get_model
from biomedical_qa.inference.postprocessing import DeduplicatePostprocessor, ProbabilityThresholdPostprocessor, TopKPostprocessor
from biomedical_qa.sampling.bioasq import BioAsqSampler

import matplotlib.pyplot as plt
%matplotlib inline
plt.rcParams['figure.figsize'] = (10, 6)

ModuleNotFoundError: No module named 'matplotlib'

In [2]:
import os
# These must be set before TensorFlow first initializes its CUDA devices.
os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
os.environ["CUDA_VISIBLE_DEVICES"] = "1"

In [4]:
tf.test.gpu_device_name()

''

In [3]:
from tensorflow.python.client import device_lib
device_lib.list_local_devices()

[name: "/device:CPU:0"
 device_type: "CPU"
 memory_limit: 268435456
 locality {
 }
 incarnation: 6753831379374546630, name: "/device:XLA_CPU:0"
 device_type: "XLA_CPU"
 memory_limit: 17179869184
 locality {
 }
 incarnation: 1852587920859910418
 physical_device_desc: "device: XLA_CPU device"]

In [2]:
CONFIG = "../model_checkpoints/challenge2/BioASQ_4b_all_5fold_differentsquad/fold_4/config.pickle"

sess = tf.InteractiveSession()
model = get_model(sess, CONFIG, ["cpu:0"])
inferrer = Inferrer(model, sess, beam_size=20)

Loading Model: ../model_checkpoints/challenge2/BioASQ_4b_all_5fold_differentsquad/fold_4/config.pickle
Using weights: ../model_checkpoints/challenge2/BioASQ_4b_all_5fold_differentsquad/fold_4/final_model.tf
Restoring Weights...
INFO:tensorflow:Restoring parameters from ../model_checkpoints/challenge2/BioASQ_4b_all_5fold_differentsquad/fold_4/final_model.tf


In [3]:
vocab = inferrer.models[0].embedder.vocab
rev_vocab = [""] * len(vocab)
for w, i in vocab.items():
    rev_vocab[i] = w

# Prepare Data

In [4]:
sampler = BioAsqSampler("../data/BioASQ-TaskB-testData-enriched2", ["phaseB_4b_05.json"], 16,
                        vocab=vocab, shuffle=False, types=["factoid", "list"],
                        include_answer_spans=False)

In [5]:
factoid_questions = [q for q in sampler.get_questions() if q.q_type == "factoid"]
list_questions = [q for q in sampler.get_questions() if q.q_type == "list"]

In [6]:
factoid_ids = sorted([q.id for q in factoid_questions])
list_ids = sorted([q.id for q in list_questions])
print(len(factoid_ids), len(list_ids))

33 20


# Run Model

In [7]:
predictions = inferrer.get_predictions(sampler)

In [8]:
factoid_postprocessor = DeduplicatePostprocessor().chain(TopKPostprocessor(5))
list_postprocessor = DeduplicatePostprocessor().chain(ProbabilityThresholdPostprocessor(0.05))
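
The two chains above differ only in their last step: factoid answers are cut to the five best candidates, while list answers keep every candidate above a probability of 0.05. The internals of `DeduplicatePostprocessor`, `TopKPostprocessor` and `ProbabilityThresholdPostprocessor` are not shown in this notebook, so the following is only a rough sketch of what such a chain might do on a plain list of `(answer, probability)` tuples. The function names, the case-insensitive deduplication rule and the toy candidate list are all assumptions made for illustration, not the library's behavior.

```python
def deduplicate(candidates):
    """Keep the highest-probability entry per case-insensitive answer string (assumed rule)."""
    best = {}
    for text, prob in candidates:
        key = text.strip().lower()
        if key not in best or prob > best[key][1]:
            best[key] = (text, prob)
    return sorted(best.values(), key=lambda c: c[1], reverse=True)


def top_k(candidates, k):
    """Keep the k most probable candidates."""
    return sorted(candidates, key=lambda c: c[1], reverse=True)[:k]


def probability_threshold(candidates, threshold):
    """Keep every candidate whose probability reaches the threshold."""
    return [c for c in candidates if c[1] >= threshold]


# Toy candidates (hypothetical values, not model output).
toy = [("IgG4", 0.27), ("igg4", 0.10), ("IgG4 thyroiditis", 0.31), ("IgG4RD", 0.02)]

factoid_style = top_k(deduplicate(toy), 2)                 # dedup, then top-k
list_style = probability_threshold(deduplicate(toy), 0.05)  # dedup, then threshold
print(factoid_style)
print(list_style)
```

Deduplicating before the cut matters in this sketch: otherwise near-identical strings could occupy several of the five factoid slots.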

# Factoid Analysis

In [9]:
CORRECT = 0
# Answer is written differently, but 100% correct
SHOULD_COUNT_CORRECT = 0
# Answer is probably correct (according to my understanding) or very close to a correct answer
SOMEWHAT_CORRECT = 0
# Answer candidates were given in the question, but the system failed to output one of the options
MULTIPLE_CHOICE = 0
# The correct answer is within the top 5
WITHIN_TOP5 = 0
# Answer type of the top answer is wrong
WRONG_ANSWER_TYPE = 0
# Gold answer is too long to be extracted as a short span
LONG_ANSWER = 0
# Gold answer does not appear verbatim in the context
NOT_EXTRACTABLE = 0
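
The counters above are incremented by hand after inspecting each prediction. For `NOT_EXTRACTABLE`, a first pass could be automated, assuming the label means that no gold answer occurs verbatim in the retrieved snippets. The helper below is a hypothetical sketch under that assumption (`is_extractable` is not part of `util` or `biomedical_qa`), and the example strings are taken from the printed questions further down.

```python
def is_extractable(gold_answers, context):
    """Return True if any gold answer occurs verbatim (case-insensitive) in the context."""
    haystack = context.lower()
    return any(answer.lower() in haystack for answer in gold_answers)


# Gold answer string is absent from the snippet, so it cannot be extracted as a span.
bbe = is_extractable(
    ["antiganglioside antibody"],
    "has subsequently been associated with anti-GQ1b antibodies.",
)

# Gold answer string occurs verbatim in the snippet.
selexipag = is_extractable(
    ["pulmonary arterial hypertension"],
    "Selexipag for the treatment of pulmonary arterial hypertension.",
)
print(bbe, selexipag)
```

A substring test like this only flags candidates for review. Whether a paraphrase in the context still deserves credit remains a manual decision.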

In [10]:
print_prediction(predictions[factoid_ids[0]], rev_vocab, factoid_postprocessor, with_context=True)

Id:
  56c1f039ef6e394741000052
Question:
  which antibody is implicated in the bickerstaff's brainstem encephalitis?
Answers:
 * antiganglioside antibody
Predicted Answers:
 * ('BBE', 0.049558859)
 * ('Fisher', 0.048204634)
 * ('anti-GQ1b', 0.04242089)
 * ('anti-GQ1b antibody syndrome', 0.040380526)
 * ('Fisher syndrome', 0.026518978)
In addition, BBE and Fisher syndrome, which are clinically similar and are both associated with the presence of the immunoglobulin G anti-GQ1b antibody, represent a specific autoimmune disease with a wide spectrum of symptoms that include ophthalmoplegia and ataxia.
The syndrome defined by Bickerstaff of progressive, external ophthalmoplegia and ataxia, with disturbance of consciousness or hyperreflexia, has subsequently been associated with anti-GQ1b antibodies.
An anti-GQ1b antibody syndrome h



In [11]:
NOT_EXTRACTABLE += 1
WITHIN_TOP5 += 1

In [86]:
print_prediction(predictions[factoid_ids[1]], rev_vocab, factoid_postprocessor, with_context=True)

Id:
  56c1f03bef6e394741000053
Question:
  mutation of which gene is implicated in the brain-lung-thyroid syndrome?
Answers:
 * thyroid transcription factor 1
Predicted Answers:
 * ('NKX2-1', 0.12829573)
 * ('neonatal respiratory distress syndrome', 0.064094439)
 * ('Thyroid transcription factor 1 (NKX2-1/TITF1) mutations cause brain-lung-thyroid syndrome, characterized by congenital hypothyroidism (CH), infant respiratory distress syndrome (IRDS) and benign hereditary chorea (BHC). \nThe clinical spectrum of 6 own and 40 published patients with NKX2-1 mutations ranged from the complete triad of brain-lung-thyroid syndrome (50%), brain and thyroid disease (30%), to isolated BHC (13%). \nBACKGROUND: NKX2.1 mutations have been identified in patients displaying complete or partial brain-lung-thyroid syndrome, which can include benign hereditary chorea (BHC), hypothyroidism and/or lung disease. \nBrain-lung-thyroid syndrome', 0.03243861)
 * ('', 0.028306119)
 * ('NKX2', 0.021967236)
 The d

In [13]:
SHOULD_COUNT_CORRECT += 1
WITHIN_TOP5 += 1

In [14]:
print_prediction(predictions[factoid_ids[2]], rev_vocab, factoid_postprocessor, with_context=True, context_char_limit=500)

Id:
  56c1f040ef6e394741000055
Question:
  which antibodies cause riedel thyroiditis?
Answers:
 * IgG4
Predicted Answers:
 * ('IgG4 thyroiditis', 0.31130823)
 * ('IgG4', 0.26839805)
 * ('Increased lymphangiogenesis', 0.13762234)
 * ('Immunoglobulin G4-related thyroid disease', 0.028562335)
 * ('IgG4RD', 0.021752631)
LEARNING POINTS: There are potential clinical applications of identifying subsets of patients with IgG4 thyroiditis (FVHT and Riedel thyroiditis).
The importance of IgG4 in the predictive model of thyroiditis.
Increased lymphangiogenesis in Riedel thyroiditis (Immunoglobulin G4-related thyroid disease).
The present study describes in depth a case of Riedel thyroiditis (RT) to clarify its pathogenesis and its putative inclusion in the spectrum of IgG4-related disease. 
Our findings support the in



In [15]:
WRONG_ANSWER_TYPE += 1
WITHIN_TOP5 += 1

In [82]:
print_prediction(predictions[factoid_ids[3]], rev_vocab, factoid_postprocessor, with_context=True)

Id:
  56c1f045ef6e394741000058
Question:
  selexipag is used for which disease?
Answers:
 * pulmonary arterial hypertension
Predicted Answers:
 * ('pulmonary arterial hypertension', 0.44907489)
 * ('PAH', 0.24103047)
 * ('pulmonary arterial hypertension (PAH)', 0.16990203)
 * ('treprostinil', 0.14131568)
 * ('oral, selective prostacyclin', 0.069728665)
OBJECTIVE: Selexipag is a novel, oral, selective prostacyclin (PGI2) receptor agonist in clinical development for the treatment of pulmonary arterial hypertension.
Selexipag for the treatment of pulmonary arterial hypertension.
This review was based on a PubMed search and focuses on the potential role of selexipag in the treatment of pulmonary arterial hypertension (PAH).
Selexipag showed effects on pharmacodynamic end points obtained with right heart catheterization in a Phase II trial in patie



In [17]:
CORRECT += 1
WITHIN_TOP5 += 1

In [18]:
print_prediction(predictions[factoid_ids[4]], rev_vocab, factoid_postprocessor)

Id:
  5710a592cf1c32585100002a
Question:
  which metabolite activates atxa?
Answers:
 * CO2
 * bicarbonate
Predicted Answers:
 * ('bicarbonate', 0.13786201)
 * ('Bacillus', 0.10073484)
 * ('atxA', 0.060979974)
 * ('B', 0.054563832)
 * ('Bacillus anthracis', 0.053057212)



In [19]:
CORRECT += 1
WITHIN_TOP5 += 1

In [20]:
print_prediction(predictions[factoid_ids[5]], rev_vocab, factoid_postprocessor)

Id:
  5710ade4cf1c32585100002c
Question:
  what is the suggested therapy for mycobacterium avium infection?
Answers:
 * Rifampin 10 mg/kg daily, ciprofloxacin 500 mg twice daily, clofazimine 100 mg every day, and ethambutol 15 mg/kg orally daily for 24 weeks, with or without amikacin 10 mg/kg intravenously or intramuscularly 5 days weekly for the first 4 weeks
Predicted Answers:
 * ('clarithromycin', 0.083397761)
 * ('rifampicin', 0.065752611)
 * ('ethambutol', 0.058501508)
 * ('immune deficiency syndrome', 0.039601199)
 * ('Rifampin 10 mg/kg daily', 0.039457604)



In [21]:
LONG_ANSWER += 1

In [22]:
print_prediction(predictions[factoid_ids[6]], rev_vocab, factoid_postprocessor, with_context=True)

Id:
  5710e131a5ed216440000001
Question:
  in which yeast chromosome does the rdna cluster reside?
Answers:
 * chromosome XII
 * chromosome 12
Predicted Answers:
 * ('Saccharomyces cerevisiae', 0.27682716)
 * ('Chromosome XII', 0.045510918)
 * ('Saccharomyces', 0.026726596)
 * ('', 0.017904308)
 * ('XII', 0.014451817)
Chromosome XII context is important for rDNA function in yeast
The rDNA cluster in Saccharomyces cerevisiae is located 450 kb from the left end and 610 kb from the right end of chromosome XII and consists of approximately 150 tandemly repeated copies of a 9.1 kb rDNA unit
To explore the biological significance of this specific chromosomal context, chromosome XII was split at both sides of the rDNA cluster and strains harboring deleted variants of chromosome XII consisting of 450 kb, 1500 kb (rDN



In [23]:
# Good example of a wrong answer type: the top answer is an organism, not a chromosome
WRONG_ANSWER_TYPE += 1
WITHIN_TOP5 += 1

In [24]:
print_prediction(predictions[factoid_ids[7]], rev_vocab, factoid_postprocessor, with_context=True, context_char_limit=5000)

Id:
  571366ba1174fb1755000005
Question:
  how can the fetal rhesus be determined with non-invasive testing?
Answers:
 * free fetal DNA from maternal cirulcation
Predicted Answers:
 * ('by analysis of cell-free DNA', 0.23273471)
 * ('by analysis of cell-free DNA in the maternal circulation is a rapidly evolving field', 0.13667606)
 * ('rapidly evolving field', 0.11576673)
 * ('by analysis of cell-free DNA in the maternal circulation', 0.11525083)
 * ('cell-free DNA in the maternal circulation', 0.10440883)
Determination of fetal rhesus d status by maternal plasma DNA analysis.
In this study, we assessed the feasibility of fetal RhD genotyping by analysis of cell-free fetal DNA(cffDNA) extracted from plasma samples of Rhesus (Rh) D-negative pregnant women by using real-time polymerase chain reaction (PCR).
Performing real-time PCR on cffDNA showed accurate, efficient and reliable results, allowing rapid and high throughput non invasive determination of fetal sex and RhD status in clinic

In [25]:
NOT_EXTRACTABLE += 1
SOMEWHAT_CORRECT += 1

In [26]:
print_prediction(predictions[factoid_ids[8]], rev_vocab, factoid_postprocessor, with_context=True)

Id:
  57136a7e1174fb1755000006
Question:
  how early during pregnancy does non-invasive cffdna testing allow sex determination of the fetus?
Answers:
 * 6th to 10th week of gestation
 * first trimester of pregnancy
Predicted Answers:
 * ('6th-10th', 0.49911994)
 * ('second trimester', 0.11349113)
 * ('first trimester', 0.043017577)
 * ('6th to 10th weeks', 0.036644924)
 * ('between 6th to 10th weeks', 0.034547769)
The use of cffDNA in fetal sex determination during the first trimester of pregnancy of female DMD carriers.
We determined fetal sex during the first trimester using a quantitative real-time polymerase chain reaction (PCR) assay of cffDNA in pregnant carriers of DMD.
Early fetal gender determination using real-time PCR analysis of cell-free fetal DNA during 6th-10th weeks of gestation.
Considerable 97.3% sensitivity and 97.3% specificity were obtained in fetal gender determination which is signi



In [27]:
SOMEWHAT_CORRECT += 1
WITHIN_TOP5 += 1

In [28]:
print_prediction(predictions[factoid_ids[9]], rev_vocab, factoid_postprocessor)

Id:
  57138eb21174fb175500000a
Question:
  which is the protein implicated in spinocerebellar ataxia type 3?
Answers:
 * Ataxin-3
Predicted Answers:
 * ('ataxin-3', 0.13128851)
 * ('ataxin-3 protein', 0.044707473)
 * ('Spinocerebellar ataxia type 3', 0.026197806)
 * ('ataxin', 0.014643435)
 * ('Mutant ataxin-3', 0.014237258)



In [29]:
CORRECT += 1
WITHIN_TOP5 += 1

In [30]:
print_prediction(predictions[factoid_ids[10]], rev_vocab, factoid_postprocessor, with_context=True, context_char_limit=500)

Id:
  5713c8d71174fb1755000015
Question:
  which peripheral neuropathy has been associated with ndrg1 mutations?
Answers:
 * Charcot-Marie-Tooth (CMT) 4D disease
Predicted Answers:
 * ('CMT4D disease', 0.16679578)
 * ('N-myc downstream-regulated gene 1', 0.1269552)
 * ('SETX', 0.046886794)
 * ('', 0.044508174)
 * ('human NDRG1', 0.041041661)
CMT4D disease is a severe autosomal recessive demyelinating neuropathy with extensive axonal loss leading to early disability, caused by mutations in the N-myc downstream regulated gene 1 (NDRG1)
In a previous study, we have shown that N-myc downstream-regulated gene 1 (NDRG1), classified in databases as a tumor suppressor and heavy metal-response protein, is mutated in hereditary motor and sensory neuropathy Lom (HMSNL), a severe autosomal recessive form of Charcot-Marie-Tooth (CMT) disease
In 



In [31]:
NOT_EXTRACTABLE += 1
SHOULD_COUNT_CORRECT += 1
WITHIN_TOP5 += 1

In [32]:
print_prediction(predictions[factoid_ids[11]], rev_vocab, factoid_postprocessor, with_context=True, context_char_limit=500)

Id:
  5717d64f29809bbe7a000001
Question:
  which is the cellular localization of the protein opa1?
Answers:
 * mitochondrial intermembrane space
Predicted Answers:
 * ('PKA phosphorylates perilipin', 0.031572588)
 * ('Mgm1/OPA1', 0.029592136)
 * ('Mfn2', 0.022602187)
 * ('dynamin-related GTPase', 0.016423272)
 * ('GTPase OPA1', 0.012227485)
. The subcellular distribution of mOPA1 overexpressed in COS-7 cells largely overlapped that of endogenous cytochrome c, a well known mitochondrial marker,
elease of high MW Opa-1 isoforms from the mitochondria to the cytosol
 mitochondrial fusion (opa-1)
mitochondrial fusion genes Mfn1 (mitofusin 1), Mfn2 (mitofusin 2), Opa1 (optic atrophy 1) 
Biochemical examinations indicate that both of the OPA1 isoforms are present in the intermembrane space. Submitochondrial fractionation by sucrose densit



In [33]:
WRONG_ANSWER_TYPE += 1

In [34]:
print_prediction(predictions[factoid_ids[12]], rev_vocab, factoid_postprocessor)

Id:
  5717d86029809bbe7a000003
Question:
  which gene is involved in the development of barth syndrome?
Answers:
 * Tafazzin (TAZ) gene
Predicted Answers:
 * ('tafazzin', 0.52062207)
 * ('TAZ', 0.15372121)
 * ('tafazzins', 0.080350451)
 * ('tafazzin (TAZ) gene mutation\nBarth syndrome is caused by mutations in the TAZ', 0.016136723)
 * ('glutamic acid', 0.0094389077)



In [35]:
SHOULD_COUNT_CORRECT += 1
WITHIN_TOP5 += 1

In [36]:
print_prediction(predictions[factoid_ids[13]], rev_vocab, factoid_postprocessor)

Id:
  5717dbfe7de986d80d000001
Question:
  what is the functional role of the protein drp1?
Answers:
 * mitochondrial fission
Predicted Answers:
 * ('dynamin-related protein 1', 0.1075499)
 * ('dynamin-related protein 1 (Drp1)', 0.068688847)
 * ('BNIP1 expression increased dynamin-related protein 1', 0.034827583)
 * ('BNIP1 expression increased dynamin-related protein 1 (Drp1)', 0.024955679)
 * ('mediates mitochondrial fission', 0.021791963)



In [37]:
WRONG_ANSWER_TYPE += 1

In [38]:
print_prediction(predictions[factoid_ids[14]], rev_vocab, factoid_postprocessor, with_context=True, context_char_limit=5000)

Id:
  5719f5b27de986d80d00000c
Question:
  what is the function of neu5gc (n-glycolylneuraminic acid)?
Answers:
 * Neu5Gc is an immune message to self
Predicted Answers:
 * ('sialic acid', 0.14496738)
 * ('a sialic acid synthesized', 0.028764857)
 * ('a sialic acid', 0.024278264)
 * ('can trigger immune response', 0.022610324)
 * ('cytidine monophosphate-N-acetylneuraminic acid hydroxylase', 0.02078384)
Humans lack a functional cytidine monophosphate-N-acetylneuraminic acid hydroxylase (CMAH) protein and cannot synthesize the sugar Neu5Gc, an innate mammalian signal of self
N-glycolylneuraminic acid (Neu5Gc) is an immunogenic sugar of dietary origin that metabolically incorporates into diverse native glycoconjugates in humans. 
N-Glycolylneuraminic acid (Neu5Gc) is a sialic acid synthesized by animals, but not by humans or birds. However, it can be incorporated in human cells and can trigger immune response.
Human heterophile antibodies that agglutinate animal erythrocytes are known to

In [39]:
NOT_EXTRACTABLE += 1

In [40]:
print_prediction(predictions[factoid_ids[15]], rev_vocab, factoid_postprocessor, with_context=True)

Id:
  5719f5c67de986d80d00000d
Question:
  what distinguishes lantibiotics from antibiotics?
Answers:
 * Lantibiotics are post-translationally modified natural peptides containing lanthionine
Predicted Answers:
 * ('antibiotic peptides', 0.099774949)
 * ('prepeptides', 0.044987559)
 * ('cationic peptides/proteins', 0.04388281)
 * ('dehydrobutyrine', 0.043818798)
 * ('low levels of resistance', 0.036902942)
One potentially interesting class of antimicrobials are the modified bacteriocins termed lantibiotics, which are bacterially produced, posttranslationally modified, lanthionine/methyllanthionine-containing peptides.
low levels of resistance have been reported for lantibiotics compared with commercial antibiotics
Mechanisms that hinder the action of lantibiotics are often innate systems that react to the presence of any cationic peptides/proteins or ones which result from cell well damage, rather



In [41]:
NOT_EXTRACTABLE += 1

In [42]:
print_prediction(predictions[factoid_ids[16]], rev_vocab, factoid_postprocessor, with_context=True)

Id:
  571cdd227de986d80d00000f
Question:
  which bacteria caused plague?
Answers:
 * Yersinia pestis
Predicted Answers:
 * ('Yersinia', 0.99534565)
 * ('Yersinia pestis', 0.058951233)
 * ('bubonic', 0.022814831)
 * ('bubonic plague', 0.0032212492)
 * ('enteric diseases', 0.0017863335)
 the causative bacteria Yersinia pestis as an agent of biological warfare have highlighted the need for a safe, efficacious, and rapidly producible vaccine. 
Yersinia, the causative bacteria of the bubonic plague and other enteric diseases



In [43]:
SOMEWHAT_CORRECT += 1
WITHIN_TOP5 += 1

In [44]:
print_prediction(predictions[factoid_ids[17]], rev_vocab, factoid_postprocessor, with_context=True, context_char_limit=5000)

Id:
  571ce13f7de986d80d000011
Question:
  in which cells are gasdermins expressed?
Answers:
 * epithelial cells
Predicted Answers:
 * ('upper epidermis', 0.35832062)
 * ('Immunohistochemical analysis', 0.06592714)
 * ('Immunohistochemical', 0.051259469)
 * ('sebaceous gland and preputial', 0.035447281)
 * ('gastric cancers (GCs)', 0.028525231)
Members of the novel gene family Gasdermin (Gsdm) are exclusively expressed in a highly tissue-specific manner in the epithelium of skin and the gastrointestinal tract. 
These results indicate that the mouse Gsdma and Gsdma3 genes share common function to regulate epithelial maintenance 
Gasdermin (GSDM or GSDMA), expressed in the upper gastrointestinal tract but frequently silenced in gastric cancers (GCs), regulates apoptosis of the gastric epithelium.
. Immunohistochemical analysis revealed that gasdermins are expressed specifically in cells at advanced stages of differentiation in the upper epidermis, the differentiating inner root sheath an

In [45]:
NOT_EXTRACTABLE += 1

In [46]:
print_prediction(predictions[factoid_ids[18]], rev_vocab, factoid_postprocessor, with_context=True, context_char_limit=500)

Id:
  571e12097de986d80d000017
Question:
  which protein does empagliflozin inhibit?
Answers:
 * SGLT2
Predicted Answers:
 * ('gemfibrozil', 0.11976505)
 * ('SGLT2', 0.067950003)
 * ('sodium glucose', 0.049686939)
 * ('sodium glucose cotransporter 2', 0.030933177)
 * ('sodium glucose cotransporter-2', 0.025351383)
Empagliflozin (Jardiance): a novel SGLT2 inhibitor for the treatment of type-2 diabetes.
AIMS: Empagliflozin is a selective sodium glucose cotransporter 2 (SGLT2) inhibitor that inhibits renal glucose reabsorption and is being investigated for the treatment of type 2 diabetes mellitus (T2DM). 
Effect of food on the pharmacokinetics of empagliflozin, a sodium glucose cotransporter 2 (SGLT2) inhibitor, and assessment of dose proportionality in healthy volunteers.
Safety, tolerability, pharmacokine



In [47]:
WRONG_ANSWER_TYPE += 1
WITHIN_TOP5 += 1

In [48]:
print_prediction(predictions[factoid_ids[19]], rev_vocab, factoid_postprocessor, with_context=True)

Id:
  571e14fbbb137a4b0c000001
Question:
  for which type of diabetes can empagliflozin be used?
Answers:
 * type 2 diabetes mellitus
Predicted Answers:
 * ('SGLT2', 0.20701998)
 * ('sodium-glucose co-transporter', 0.06659013)
 * ('sodium-glucose co-transporter 2', 0.057033274)
 * ('insulin', 0.053106554)
 * ('sodium glucose co-transporter-2', 0.049351662)
Empagliflozin, an SGLT2 inhibitor for the treatment of type 2 diabetes mellitus: a review of the evidence.
To review available studies of empagliflozin, a sodium glucose co-transporter-2 (SGLT2) inhibitor approved in 2014 by the European Commission and the United States Food and Drug Administration for the treatment of type 2 diabetes mellitus (T2DM).
In Phase II trials in patients with type 2 diabetes, empagliflozin provided improvements in glycosylated hemoglobin (HbA1c) and other measures of 



In [49]:
WRONG_ANSWER_TYPE += 1

In [50]:
print_prediction(predictions[factoid_ids[20]], rev_vocab, factoid_postprocessor)

Id:
  571e172bbb137a4b0c000002
Question:
  when was empagliflozin fda approved?
Answers:
 * 2014
Predicted Answers:
 * ('2014', 0.74671721)
 * ('in 2014', 0.0039122836)
 * ('empagliflozin', 0.0004895209)
 * ('sodium glucose co-transporter-2 (SGLT2) inhibitor approved in 2014', 0.00011115696)
 * ('2 diabetes mellitus', 8.5520718e-05)



In [51]:
CORRECT += 1
WITHIN_TOP5 += 1

In [83]:
print_prediction(predictions[factoid_ids[21]], rev_vocab, factoid_postprocessor, with_context=True)

Id:
  571e275dbb137a4b0c000005
Question:
  what are 'vildagliptin', 'sitagliptin', 'saxagliptin', 'alogliptin', 'linagliptin', and 'dutogliptin'?
Answers:
 * dipeptidyl peptidase-4 (DPP-4) inhibitors
Predicted Answers:
 * ('DPP-4 inhibitors', 0.21010515)
 * ("Embase search for 'vildagliptin", 0.13462703)
 * ('metformin', 0.11230746)
 * ('Medline', 0.086508729)
 * ('monotherapy', 0.08594954)
The present metaanalysis was designed to assess the effect of DPP-4 inhibitors on blood lipids, verifying possible differences across compounds of this class.METHODS: An extensive search of Medline and the Cochrane Library (any date up to December 31, 2010, restricted to randomized clinical trials, published in English) was performed for all trials containing, in any field, the words "sitagliptin," "vildagliptin," "saxagliptin," "alogliptin," "linagliptin," and/or "dutogliptin." 
Sitagliptin (MK



In [53]:
SHOULD_COUNT_CORRECT += 1
WITHIN_TOP5 += 1

In [54]:
print_prediction(predictions[factoid_ids[22]], rev_vocab, factoid_postprocessor)

Id:
  571e2beabb137a4b0c000006
Question:
  how is oct3 associated with serotonin?
Answers:
 * serotonin clearance
Predicted Answers:
 * ('Organic cation transporter 3', 0.054632295)
 * ('OCT3 mRNA', 0.042553544)
 * ('high-capacity organic cation transporter 3', 0.035562091)
 * ('inhibits their function exclusively through the latter', 0.028476657)
 * ('Organic cation transporter 3 (OCT3) is a high-capacity, low-affinity transporter that mediates bidirectional, sodium-independent transport of dopamine, norepinephrine, epinephrine, serotonin, and histamine.\nThe effect of blockade of either 5-hydroxytryptamine (5-HT)/serotonin transporter', 0.028202601)



In [55]:
print_prediction(predictions[factoid_ids[23]], rev_vocab, factoid_postprocessor, with_context=True)

Id:
  571e40a8bb137a4b0c000009
Question:
  which syndrome is associated with oatp1b1 and oatp1b3 deficiency?
Answers:
 * Rotor syndrome
Predicted Answers:
 * ('Rotor', 0.22170794)
 * ('Rotor syndrome', 0.14347503)
 * ('bilirubin reuptake', 0.05373624)
 * ('liver', 0.023573488)
 * ('Rotor syndrome was linked to mutations predicted to cause complete and simultaneous deficiencies of the organic anion transporting polypeptides OATP1B1 and OATP1B3.\nHere, we analyzed 8 Rotor-syndrome families and found that Rotor', 0.018893724)
Here, we analyzed 8 Rotor-syndrome families and found that Rotor syndrome was linked to mutations predicted to cause complete and simultaneous deficiencies of the organic anion transporting polypeptides OATP1B1 and OATP1B3.
Thus, disruption of hepatic reuptake of bilirubin glucuronide due to coexisting OATP1B1 and OATP1B3 deficiencies explains Rotor-type hyperbilirubinemia. 
Complete OATP1B1 and OATP1B3 deficiency causes human Rotor syndrome by interrupting conjugate

In [56]:
SHOULD_COUNT_CORRECT += 1
WITHIN_TOP5 += 1

In [57]:
print_prediction(predictions[factoid_ids[24]], rev_vocab, factoid_postprocessor, with_context=True, context_char_limit=5000)

Id:
  571e4293bb137a4b0c00000b
Question:
  what is the cause of episodic ataxia type 6?
Answers:
 * EAAT1 mutations
Predicted Answers:
 * ('reduced glutamate uptake', 0.26940683)
 * ('mutations in the gene encoding a glial glutamate transporter', 0.15499985)
 * ('excitatory amino acid transporter-1', 0.10309289)
 * ('reduced glutamate uptake by mutant excitatory amino acid transporter-1', 0.07833489)
 * ('glutamate uptake', 0.066270329)
There are several genetically and clinically distinct forms of this disease, and one of them, episodic ataxia type 6, is caused by mutations in the gene encoding a glial glutamate transporter, the excitatory amino acid transporter-1. So far, reduced glutamate uptake by mutant excitatory amino acid transporter-1 has been thought to be the main pathophysiological process in episodic ataxia type 6. 
Episodic ataxia type 6 represents the first human disease found to be associated with altered function of excitatory amino acid transporter anion channels and 

In [58]:
NOT_EXTRACTABLE += 1
WITHIN_TOP5 += 1

In [84]:
print_prediction(predictions[factoid_ids[25]], rev_vocab, factoid_postprocessor, with_context=True)

Id:
  571f33bd0fd6f91b68000003
Question:
  which gene is responsible for the development of sotos syndrome?
Answers:
 * NSD1 gene
Predicted Answers:
 * ('NSD1', 0.59406751)
 * ('NSD1 gene', 0.18805183)
 * ('Mutations', 0.10041367)
 * ('Mutations in NSD1', 0.074500352)
 * ('macrocephaly', 0.052492421)
Sotos syndrome is a well-known overgrowth syndrome characterized by excessive growth during childhood, macrocephaly, distinctive facial appearance and learning disability. This disorder is caused by mutations or deletions in NSD1 gene
Sotos syndrome (SoS) is a multiple anomaly, congenital disorder characterized by overgrowth, macrocephaly, distinctive facial features and variable degree of intellectual disability. Haploinsufficiency of the NSD1 gene at 5q35.3, arising from 5q35 microdeletions, p



In [60]:
SHOULD_COUNT_CORRECT += 1
WITHIN_TOP5 += 1

In [88]:
print_prediction(predictions[factoid_ids[26]], rev_vocab, factoid_postprocessor, with_context=True)

Id:
  571f5c150fd6f91b68000009
Question:
  which protein is found to be mutated in friedreich's ataxia?
Answers:
 * Frataxin
Predicted Answers:
 * ('Drosophila frataxin', 0.77032572)
 * ('Frataxin', 0.27225748)
 * ('pancreas\nFriedreich', 0.059299029)
 * ('elegans', 0.058782846)
 * ('frataxin deficiency', 0.038463593)
It is generally accepted that Friedreich's ataxia (FRDA) is caused by a deficiency in frataxin expression, a mitochondrial protein involved in iron homeostasis, which mainly affects the brain, dorsal root ganglia of the spinal cord, heart and in certain cases the pancreas
Friedreich's ataxia is a severe neurodegenerative disease caused by the decreased expression of frataxin, a mitochondrial protein that stimulates iron-sulfur (Fe-S) cluster biogenesis
In eukaryotes, frataxin deficiency (FXN) ca



In [62]:
SHOULD_COUNT_CORRECT += 1
WITHIN_TOP5 += 1

In [63]:
print_prediction(predictions[factoid_ids[27]], rev_vocab, factoid_postprocessor)

Id:
  571f5e740fd6f91b6800000b
Question:
  which enzyme is deficient in gaucher's disease?
Answers:
 * Beta glucocerebrosidase
Predicted Answers:
 * ('β-glucocerebrosidase', 0.24500941)
 * ('', 0.15059355)
 * ('glucocerebrosidase', 0.13008626)
 * ('glucocerebrosidase deficiency', 0.10309959)
 * ('Functional glucocerebrosidase', 0.081666328)



In [64]:
SHOULD_COUNT_CORRECT += 1
WITHIN_TOP5 += 1

In [65]:
print_prediction(predictions[factoid_ids[28]], rev_vocab, factoid_postprocessor)

Id:
  571f609c0fd6f91b6800000c
Question:
  which enzyme deficiency can cause gm1 gangliosidoses?
Answers:
 * β-galactosidase
Predicted Answers:
 * ('beta-galactosidase', 0.14753801)
 * ('β-hexosaminidase', 0.13878612)
 * ('β-galactosidase', 0.065652564)
 * ('beta-galactosidose', 0.037685119)
 * ('acid beta-galactosidase (beta-gal)', 0.030214785)



In [66]:
SHOULD_COUNT_CORRECT += 1
WITHIN_TOP5 += 1
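
Several of the `SHOULD_COUNT_CORRECT` cases above differ from the gold answer only in surface form, for example `β-galactosidase` versus `beta-galactosidase`. One way such cases could be caught automatically is to normalize both strings before comparing them. The sketch below is an assumption about what a lenient matcher might look like. `normalize`, `lenient_match` and the small Greek-letter table are hypothetical and are not part of the official BioASQ evaluation.

```python
import re

# Small, illustrative mapping; a real one would need more symbols.
GREEK = {"α": "alpha", "β": "beta", "γ": "gamma"}


def normalize(answer):
    """Lowercase, spell out Greek letters, and collapse punctuation to single spaces."""
    text = answer.lower()
    for symbol, name in GREEK.items():
        text = text.replace(symbol, name)
    text = re.sub(r"[^a-z0-9]+", " ", text)
    return text.strip()


def lenient_match(predicted, gold_answers):
    """Return True if the prediction equals any gold answer after normalization."""
    target = normalize(predicted)
    return any(target == normalize(gold) for gold in gold_answers)


print(lenient_match("beta-galactosidase", ["β-galactosidase"]))
print(lenient_match("β-glucocerebrosidase", ["Beta glucocerebrosidase"]))
```

This only handles spelling variants. Abbreviation cases such as `NKX2-1` for thyroid transcription factor 1 would still need a synonym resource or manual judgment.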

In [90]:
print_prediction(predictions[factoid_ids[29]], rev_vocab, factoid_postprocessor, with_context=True)

Id:
  572096c90fd6f91b6800000e
Question:
  which gene is involved in giant axonal neuropathy?
Answers:
 * GAN gene
Predicted Answers:
 * ('1634G', 0.10628324)
 * ('GAN gene\nGiant Axonal Neuropathy', 0.041052084)
 * ('gigaxonin', 0.039144479)
 * ('BTB-KELCH protein Gigaxonin', 0.039069045)
 * ('GAN', 0.037875604)
Giant axonal neuropathy (GAN) is a progressive neurodegenerative disease caused by autosomal recessive mutations in the GAN gene resulting in a loss of a ubiquitously expressed protein, gigaxonin
We describe a toddler with clinical features suggesting giant axonal neuropathy (GAN), whose diagnosis was confirmed by minimally invasive skin biopsy and corroborated by the finding of compound heterozygous mutations involving the GAN gene
Giant Axonal Neuropathy is a pediatric neurodegenerative disord



In [68]:
WRONG_ANSWER_TYPE += 1
WITHIN_TOP5 += 1

In [69]:
print_prediction(predictions[factoid_ids[30]], rev_vocab, factoid_postprocessor, with_context=True)

Id:
  5721f4b30fd6f91b68000011
Question:
  which inherited disorder is known to be caused by mutations in the nemo gene?
Answers:
 * Incontinentia pigmenti or Bloch-Sulzberger syndrome
Predicted Answers:
 * ('CYBB', 0.50796443)
 * ('Incontinentia pigmenti', 0.24256466)
 * ('incontinenia pigmenti', 0.13879405)
 * ('IP', 0.064368941)
 * ('Incontinentia pigmenti (IP)', 0.041788947)
Incontinentia pigmenti (IP) is a rare neurocutaneous disorder with a frequency of 1 in 50,000 newborn, and is associated with mutations in IKBKG gene (NEMO) in Xq28, inherited as an X-linked dominant trait
Mutations in the NEMO gene give rise to a heterogeneous group of disorders, including the X-linked dominant disorder incontinentia pigmenti
De novo NEMO gene deletion (delta4-10)--a cause of incontinentia pigmenti in a female infant
Incontinentia pigmenti (IP) is a rare, inherited, multisystem



In [70]:
NOT_EXTRACTABLE += 1
WRONG_ANSWER_TYPE += 1
WITHIN_TOP5 += 1

In [71]:
print_prediction(predictions[factoid_ids[31]], rev_vocab, factoid_postprocessor, with_context=True)

Id:
  57279ef20fd6f91b68000018
Question:
  which intraflagellar transport (ift) motor protein has been linked to human skeletal ciliopathies?
Answers:
 * Intraflagellar transport (IFT) motor protein DYNC2H1
Predicted Answers:
 * ('DYNC2H1', 0.73991364)
 * ('DYNC2H1, have been linked to human skeletal ciliopathies, including asphyxiating thoracic dystrophy (ATD; also known as Jeune syndrome), Sensenbrenner syndrome, and Mainzer-Saldino syndrome (MZSDS).\nCytoplasmic dynein-2 is the motor for retrograde intraflagellar transport (IFT), and mutations in dynein-2 are known to cause skeletal ciliopathies.\nDeficiency of IFT proteins, including DYNC2H1, underlies a spectrum of skeletal ciliopathies.\nAll six IFT-A components and their motor protein, DYNC2H1', 0.43707862)
 * ('', 0.26879317)
 * ('DYNC2H1, have been linked to human skeletal ciliopathies, including asphyxiating thoracic dystrophy (ATD; also known as Jeune syndrome), Sensenbrenner syndrome, and Mainzer-Saldino syndrome (MZSDS). \

In [72]:
NOT_EXTRACTABLE += 1
SHOULD_COUNT_CORRECT += 1
WITHIN_TOP5 += 1

In [73]:
print_prediction(predictions[factoid_ids[32]], rev_vocab, factoid_postprocessor, with_context=True, context_char_limit=500)

Id:
  5727ab040fd6f91b68000019
Question:
  which gene has been found to be mutant in lesch-nyhan disease patients?
Answers:
 * Hypoxanthine guanine phosphoribosyl transferase (HPRT) gene
Predicted Answers:
 * ('HPRT', 0.25206044)
 * ('exon 8', 0.17887814)
 * ('hypoxanthine-guanine phosphoribosyltransferase', 0.13098602)
 * ('HPRT1 gene', 0.091416165)
 * ('HPRT1', 0.066123664)
We describe a family of seven boys affected by Lesch-Nyhan disease with various phenotypes. Further investigations revealed a mutation c.203T>C in the gene encoding HGprt of all members, with substitution of leucine to proline at residue 68 (p.Leu68Pro)
Lesch-Nyhan disease (LND) is caused by deficiency of hypoxanthine guanine phosphoribosyltransferase (HPRT)
Lesch-Nyhan Disease (LND) is the result of mutations in the X-linked gene encoding the purine metabolic enzyme, hypoxanthine guanine phosph



In [74]:
NOT_EXTRACTABLE += 1
SHOULD_COUNT_CORRECT += 1
WITHIN_TOP5 += 1

## Summary

Total Questions:

In [75]:
len(factoid_questions)

33

Not Extractable:

In [76]:
NOT_EXTRACTABLE

10

In [77]:
CORRECT

4

In [78]:
SHOULD_COUNT_CORRECT

10

In [79]:
WITHIN_TOP5

24

In [80]:
WRONG_ANSWER_TYPE

8

In [81]:
SOMEWHAT_CORRECT

4
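
The raw tallies are easier to compare as fractions of the 33 factoid questions. The snippet below is a small sketch that copies the values printed in the summary cells above into a dictionary and derives per-category rates. The dictionary, the `lenient` figure (which adds `CORRECT` and `SHOULD_COUNT_CORRECT`) and the output format are illustrative assumptions, not part of the original analysis.

```python
# Values copied from the printed summary outputs above.
total = 33
counts = {
    "CORRECT": 4,
    "SHOULD_COUNT_CORRECT": 10,
    "SOMEWHAT_CORRECT": 4,
    "WITHIN_TOP5": 24,
    "WRONG_ANSWER_TYPE": 8,
    "NOT_EXTRACTABLE": 10,
}

# Fraction of all factoid questions that received each label.
rates = {name: n / total for name, n in counts.items()}
for name, rate in rates.items():
    print(f"{name:22s} {counts[name]:2d}/{total} = {rate:.1%}")

# Assumed lenient accuracy: the top answer is exactly or equivalently correct.
lenient = (counts["CORRECT"] + counts["SHOULD_COUNT_CORRECT"]) / total
print(f"{'LENIENT':22s} {lenient:.1%}")
```

Because a question can carry several labels at once (for instance `NOT_EXTRACTABLE` together with `WITHIN_TOP5`), the rates are not expected to sum to one.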