CliniqIR: Retrieval-Based Diagnostic Decision Support

This is the repository for the paper Retrieval-Based Diagnostic Decision Support.

Datasets

Download PubMed Abstracts.

cd CliniqIR_model
rsync -Pav ftp.ncbi.nlm.nih.gov::pubmed/baseline/\*.xml.gz Pubmed/

Download MIMIC-III datasets.
Download DC3 datasets.
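The PubMed baseline files downloaded above are gzipped XML. Each PubmedArticle element carries a PMID, an ArticleTitle, and zero or more AbstractText sections. The sketch below shows one way to stream those three fields out of a single file with the Python standard library. iter_abstracts is a hypothetical helper, not the code in Extract_Pubmed_Concepts.py. It assumes the standard PubMed element layout and skips articles that have no abstract.

```python
import gzip
import xml.etree.ElementTree as ET
from pathlib import Path
from typing import Iterator, Tuple


def iter_abstracts(path: Path) -> Iterator[Tuple[str, str, str]]:
    """Yield (pmid, title, abstract) for each article in one PubMed XML file.

    Hypothetical helper: assumes the standard PubmedArticle layout and
    skips articles that have no AbstractText.
    """
    opener = gzip.open if path.suffix == ".gz" else open
    with opener(path, "rb") as handle:
        # iterparse streams the file instead of loading the whole tree.
        for _, elem in ET.iterparse(handle, events=("end",)):
            if elem.tag != "PubmedArticle":
                continue
            citation = elem.find("MedlineCitation")
            article = citation.find("Article") if citation is not None else None
            if article is not None:
                pmid = citation.findtext("PMID", default="").strip()
                title_el = article.find("ArticleTitle")
                # itertext() keeps text inside inline markup such as <i>.
                title = "".join(title_el.itertext()).strip() if title_el is not None else ""
                # Structured abstracts have several AbstractText sections.
                parts = ["".join(t.itertext()).strip()
                         for t in article.findall("Abstract/AbstractText")]
                abstract = " ".join(p for p in parts if p)
                if abstract:
                    yield pmid, title, abstract
            elem.clear()  # free the finished article's subtree
```

For example, `for xml_file in sorted(Path("Pubmed").glob("*.xml.gz")): ...` would walk every downloaded baseline file in turn.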

Requirements

QuickUMLS
JDK 17.0.2 (jdk-17.0.2)
The Python packages listed in requirements.txt (install with pip install -r requirements.txt)

Data Preprocessing

Each model requires its input data in a specific format. Sample files are provided in the Datasets and CliniqIR_model folders. See or run Data_Pre_processing_MIMIC_III.py for MIMIC-III data preprocessing.

Use CliniqIR

The index has four fields: the PMID, the UMLS concepts extracted from the abstract, the abstract title, and the abstract text. Only the title and text fields are searchable. The Java source files are also provided to allow for custom use.
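A minimal sketch of the field layout described above, assuming simple lowercase alphanumeric tokenization: pmid and concepts are kept with each record but never indexed, while title and text are tokenized into an inverted index. AbstractRecord, build_index, and lookup are hypothetical names. The real index is built with Lucene by the provided Java code, where this split would roughly correspond to stored-only fields versus analyzed text fields; whether the Java source does exactly this is an assumption.

```python
import re
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Set, Tuple

SEARCHABLE_FIELDS = ("title", "text")  # only these are tokenized and indexed


@dataclass
class AbstractRecord:
    pmid: str                                          # stored only
    concepts: List[str] = field(default_factory=list)  # UMLS CUIs, stored only
    title: str = ""                                    # searchable
    text: str = ""                                     # searchable


def tokenize(text: str) -> List[str]:
    # Assumed analyzer: lowercase alphanumeric runs (Lucene analyzers do more).
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(
    records: Iterable[AbstractRecord],
) -> Tuple[Dict[str, Set[str]], Dict[str, AbstractRecord]]:
    postings: Dict[str, Set[str]] = defaultdict(set)  # term -> set of pmids
    store: Dict[str, AbstractRecord] = {}             # pmid -> full record
    for rec in records:
        store[rec.pmid] = rec
        for name in SEARCHABLE_FIELDS:
            for token in tokenize(getattr(rec, name)):
                postings[token].add(rec.pmid)
    return postings, store


def lookup(postings: Dict[str, Set[str]], store: Dict[str, AbstractRecord],
           term: str) -> List[AbstractRecord]:
    """Return the stored records whose title or text contains `term`."""
    return [store[pmid] for pmid in sorted(postings.get(term.lower(), set()))]
```

Searching for a CUI returns nothing here because the concepts field is stored with the record but not indexed, which mirrors the "latter two searchable" rule.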

Building the PubMed Index

Download the PubMed abstracts into the directory "CliniqIR_model/Pubmed".
Extract UMLS Concepts from PubMed Abstracts.

  cd Data_preprocessing
  python Extract_Pubmed_Concepts.py

Build the index

  cd CliniqIR_model
  java -jar Build_Pubmed_Index.jar -cp LuceneJARFiles2
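The concept-extraction step relies on QuickUMLS, whose match method returns a list of candidate groups, one per matched text span, where each candidate is a dict with keys such as "cui", "term", "similarity", and "semtypes". The sketch below turns that output into a de-duplicated list of CUIs for the concepts field. concepts_from_matches and its filtering options are hypothetical and may differ from what Extract_Pubmed_Concepts.py actually does; the QuickUMLS index path in the comment is a placeholder.

```python
from typing import Any, Dict, Iterable, List, Optional, Set

# Usage with the QuickUMLS package (index path is a placeholder):
#   from quickumls import QuickUMLS
#   matcher = QuickUMLS("/path/to/quickumls_index")
#   matches = matcher.match(abstract_text, best_match=True, ignore_syntax=False)


def concepts_from_matches(
    matches: Iterable[List[Dict[str, Any]]],
    min_similarity: float = 0.0,
    allowed_semtypes: Optional[Set[str]] = None,
) -> List[str]:
    """Collect unique CUIs, keeping the best-scoring candidate per span.

    Hypothetical helper: the threshold and semantic-type filter are
    illustrative choices, not the repository's settings.
    """
    seen: Set[str] = set()
    cuis: List[str] = []
    for group in matches:
        if not group:
            continue
        best = max(group, key=lambda cand: cand["similarity"])
        if best["similarity"] < min_similarity:
            continue
        if allowed_semtypes is not None and not (set(best["semtypes"]) & allowed_semtypes):
            continue
        if best["cui"] not in seen:  # keep first-occurrence order
            seen.add(best["cui"])
            cuis.append(best["cui"])
    return cuis
```

Keeping only the top candidate per span avoids adding several near-synonymous CUIs for the same phrase; whether that matches the repository's choice is not stated in this README.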

Searching the PubMed Index

Filter text queries by running QuickUMLS_FIltering.py in the Data_preprocessing directory.
Save the filtered queries to the file "CliniqIR_model/Queries.txt".
Search the index

  cd CliniqIR_model
  java -jar Search_Pubmed_Index.jar -cp LuceneJARFiles2
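Lucene scores documents with BM25 by default (since Lucene 6), so a plausible mental model of how the filtered queries are matched against the title and text fields is the classic BM25 formula, here with Lucene's default parameters k1 = 1.2 and b = 0.75. This is a self-contained sketch, not the logic of Search_Pubmed_Index.jar; bm25_search, its tokenizer, and the one-string-per-document input (title and abstract concatenated) are assumptions.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Tuple


def tokenize(text: str) -> List[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_search(query: str, docs: Dict[str, str], k1: float = 1.2,
                b: float = 0.75, top_k: int = 10) -> List[Tuple[str, float]]:
    """Rank docs (pmid -> title + abstract text) against a query with BM25."""
    tfs = {pmid: Counter(tokenize(text)) for pmid, text in docs.items()}
    lengths = {pmid: sum(tf.values()) for pmid, tf in tfs.items()}
    n_docs = len(docs)
    avgdl = sum(lengths.values()) / n_docs if n_docs else 0.0
    df: Counter = Counter()  # document frequency of each term
    for tf in tfs.values():
        df.update(tf.keys())

    scores: Dict[str, float] = {}
    for term in set(tokenize(query)):
        if term not in df:
            continue
        # Lucene-style IDF, which stays positive even for very common terms.
        idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
        for pmid, tf in tfs.items():
            freq = tf[term]
            if freq == 0:
                continue
            # Length normalization: longer-than-average docs are penalized.
            norm = k1 * (1 - b + b * lengths[pmid] / avgdl)
            scores[pmid] = scores.get(pmid, 0.0) + idf * freq * (k1 + 1) / (freq + norm)
    # Highest score first; ties broken by pmid for deterministic output.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]
```

Recent Lucene versions drop the constant (k1 + 1) factor from the numerator, which rescales scores but does not change the ranking.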

Evaluate CliniqIR and obtain ensemble results for MIMIC-III.

Calculate the PubMed collection frequency of each disease class label by running PubMed_Frequency.py in the Data_preprocessing directory.
Get Clinical BERT's ranks by running Clinical_BERT.py, which can be found in the Bert_models directory.
Obtain CliniqIR's query results by searching the PubMed index.
Obtain CliniqIR's ranks and the ensemble results by running Evaluate_Mimic-III.py, which can be found in the CliniqIR_model directory.
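This README does not spell out how CliniqIR's ranks and Clinical BERT's ranks are combined into the ensemble. As a stand-in, the sketch below uses reciprocal rank fusion (RRF), a standard rank-combination technique in which each label scores the sum of weight / (k + rank) over the input rankings. reciprocal_rank_fusion, k = 60, and the optional weights are assumptions for illustration, not the method implemented in Evaluate_Mimic-III.py, and the PubMed collection frequencies from the first step are not used here.

```python
from typing import Dict, List, Optional, Sequence


def reciprocal_rank_fusion(rankings: Sequence[Sequence[str]], k: int = 60,
                           weights: Optional[Sequence[float]] = None) -> List[str]:
    """Fuse several best-first label rankings into one ranking.

    Each label receives sum(weight / (k + rank)) over the rankings it
    appears in (rank starts at 1). Labels missing from a ranking get
    nothing from it.
    """
    if weights is None:
        weights = [1.0] * len(rankings)
    if len(weights) != len(rankings):
        raise ValueError("need one weight per ranking")
    scores: Dict[str, float] = {}
    for ranking, weight in zip(rankings, weights):
        for rank, label in enumerate(ranking, start=1):
            scores[label] = scores.get(label, 0.0) + weight / (k + rank)
    # Highest fused score first; ties broken alphabetically.
    return sorted(scores, key=lambda label: (-scores[label], label))
```

Each input would be the list of disease labels ordered by one model for a single clinical note. RRF only needs ranks, so the two models' raw scores never have to be calibrated against each other.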

To use Clinical BERT or the zero-shot baselines

Run Clinical_BERT.py to use Clinical BERT.
Run Zero_shot_baselines.py to use the zero-shot baselines.

Credits

Some of the structure in this repo was adopted from https://github.com/ziy/medline-indexer

Authors

Tassallah Amina Abdullahi

Reference

Soldaini, Luca, and Nazli Goharian. "QuickUMLS: a fast, unsupervised approach for medical concept extraction." MedIR Workshop, SIGIR 2016.
Eickhoff, Carsten, et al. "DC3--A Diagnostic Case Challenge Collection for Clinical Decision Support." Proceedings of the 2019 ACM SIGIR International Conference on Theory of Information Retrieval, 2019.
