Skip to content
AUEB at BioASQ 7: Document and Snippet Retrieval
C Python Makefile Other
Branch: master
Clone or download
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Permalink
Type Name Latest commit message Commit time
Failed to load latest commit information.
Data
Evaluation
Index
Models
README.md
check_api.py
generate_bioasq_data.py
queries2galago.py
requirements.txt
retrieve_classic_IR.sh
test_api.py

README.md

aueb-bioasq7

AUEB at BioASQ 7: Document and Snippet Retrieval

Downloading data:

First of all you should download the indexed articles found in this link:

https://archive.org/details/AUEBBioASQ7Index
and extract in the the "Index" directory

Then you must download the data found in this link:
https://archive.org/details/AUEB_BioASQ_7_data
and extract in the "Data" directory

Steps:

Step 1: Extract Data.zip in "Data" directory.

Step 2: Extract galago.tar.gz and mongo.tar.gz in "Index" directory.

Step 3: Install requirements.txt using Python's pip. (We used Python's version 3.6)
pip3.6 install requirements.txt

Step 4: Start mongoDB (preferably in a screen session and then detach) ./Index/mongo/mongodb/bin/mongod --dbpath ./Index/mongo/mongo_database

Step 5: retrieve relevant documents using Galago and mongoDB.
You can/should change the paths in the script according to your needs.

If you have to test you own data you should format your questions like in the ./DATA/bioasq_data/trainining7b.json file.
sh retrieve_classic_IR.sh

Your input file should have the following format:

 {
    "questions": [
      {
        "body": "Is Hirschsprung disease a mendelian or a multifactorial disorder?", 
        "id": "55031181e9bde69634000014"
      },
      {
        "body": "What is being measured with an accelerometer in back pain patients", 
        "id": "533f9df0c45e133714000016"
      },
      ...
    ]
 }

Step 6: Load model and extract emitions
The pretrianed weights for the models can be found in folder "PretrainedWeightsAndVectors".
In the subfolder "bioasq7_bert_jpdrmm_2L_0p01_run_0" one can found the pretrained weights of JPDRMM model using Bert embeddings.
In the subfolder "bioasq_jpdrmm_2L_0p01_run_0" one can found the pretrained weights of JPDRMM model using W2V embeddings.
You can run the models using the following commands:

python extract_submition_w2v_jpdrmm.py

or

python extract_submition_bert_jpdrmm.py

for W2V-JPDRMM and BERT-JPDRMM respectively.

You should change the following paths according to your data files' paths.
For W2V-JPDRMM:

###########################################################
eval_path                   = './Evaluation/eval/run_eval.py'
retrieval_jar_path          = './Evaluation/dist/my_bioasq_eval_2.jar'
###########################################################
w2v_bin_path                = './Data/PretrainedWeightsAndVectors/pubmed2018_w2v_30D.bin'
idf_pickle_path             = './Data/PretrainedWeightsAndVectors/idf.pkl'
###########################################################
resume_from                 = './Data/bioasq_jpdrmm_2L_0p01_run_0/best_dev_checkpoint.pth.tar'
###########################################################
b                           = 5 # sys.argv[1]
f_in1                       = './Data/test_batch_{}/BioASQ-task7bPhaseA-testset{}'.format(b, b)
f_in2                       = './Data/test_batch_{}/bioasq7_bm25_top100/bioasq7_bm25_top100.test.pkl'.format(b)
f_in3                       = './Data/test_batch_{}/bioasq7_bm25_top100/bioasq7_bm25_docset_top100.test.pkl'.format(b)
odir                        = './Outputs/test_jpdrmm_high_batch{}/'.format(b)
###########################################################

For BERT-JPDRMM:

###########################################################
f_in1               = './Data/test_batch_5/BioASQ-task7bPhaseA-testset5'
f_in2               = './Data/test_batch_5/bioasq7_bm25_top100/bioasq7_bm25_top100.test.pkl'
f_in3               = './Data/test_batch_5/bioasq7_bm25_top100/bioasq7_bm25_docset_top100.test.pkl'
odir                = './Outputs/test_bert_jpdrmm_high_batch5/'
###########################################################
eval_path           = './Evaluation/eval/run_eval.py'
retrieval_jar_path  = './Evaluation/dist/my_bioasq_eval_2.jar'
###########################################################
w2v_bin_path        = './Data/PretrainedWeightsAndVectors/pubmed2018_w2v_30D.bin'
idf_pickle_path     = './Data/PretrainedWeightsAndVectors/idf.pkl'
###########################################################
resume_from         = './Data/PretrainedWeightsAndVectors/bioasq7_bert_jpdrmm_2L_0p01_run_0/best_checkpoint.pth.tar'
resume_from_bert    = './Data/PretrainedWeightsAndVectorsbioasq7_bert_jpdrmm_2L_0p01_run_0/best_bert_checkpoint.pth.tar'
cache_dir           = './Data/PretrainedWeightsAndVectors/bert_cache/'
###########################################################

Step 7:

You can’t perform that action at this time.