MedBench

MedBench is a Russian medical language understanding benchmark: a set of NLP tasks on medical textual data in Russian.

This repository contains code and data to reproduce the results of the paper RuMedBench: A Russian Medical Language Understanding Benchmark.

Video from the AIME 2022 conference

Task Descriptions

  • RuMedTop3* is the task of predicting a diagnosis from raw medical text describing patient symptoms and complaints.

  • RuMedSymptomRec* is the task of recommending the most relevant symptom to check or verify, given an incomplete medical text.

  • RuMedDaNet is a yes/no question answering task spanning medicine-related domains (pharmacology, anatomy, therapeutic medicine, etc.).

  • RuMedNLI is a natural language inference task in the clinical domain. The data is a complete Russian translation of the MedNLI dataset (an illustrative instance format is sketched after this list).

  • RuMedNER is the task of named entity recognition in drug-related user reviews. The data is from the RuDReC repo.

  • ECG2Pathology is a multi-label pathology classification task on ECG signals from the PTB-XL dataset.

*Both tasks are based on the RuMedPrime dataset.
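
To make the question answering and inference formats concrete, here is a minimal, hypothetical sketch of single instances for RuMedDaNet and RuMedNLI. The field names and example texts are ours (shown in English for readability; the actual data is in Russian) and do not necessarily match this repository's data schema:

```python
# Hypothetical instances; field names and texts are illustrative only,
# not this repository's actual data schema. Real data is in Russian.
danet_sample = {
    "context": "Paracetamol has antipyretic and analgesic effects.",
    "question": "Can paracetamol be used to reduce fever?",
    "answer": "yes",  # binary yes/no answer
}

nli_sample = {
    "premise": "The patient denies chest pain.",
    "hypothesis": "The patient reports chest pain.",
    "label": "contradiction",  # entailment / neutral / contradiction, as in MedNLI
}
```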

Baselines & Results

We have implemented several baseline models; please see details in the paper.

Accuracy is the base metric for evaluating all tasks. For some tasks, additional metrics are used (sketched in code after the list):

  • RuMedTop3 and RuMedSymptomRec - Hit@3
  • RuMedNER - F1-score
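
Here, Hit@3 counts a prediction as correct if the true code appears among the model's top-3 ranked answers. A minimal sketch of this metric (the function below is illustrative, not part of this repository's code):

```python
def hit_at_k(y_true, y_pred_ranked, k=3):
    """Percent of samples whose true label is among the top-k ranked predictions."""
    hits = sum(true in ranked[:k] for true, ranked in zip(y_true, y_pred_ranked))
    return 100.0 * hits / len(y_true)

# With k=1 this reduces to plain accuracy; k=3 gives Hit@3
# for RuMedTop3 and RuMedSymptomRec.
```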

Test results (two-valued cells report Accuracy/Hit@3 for the first two tasks and Accuracy/F1 for RuMedNER):

| Model | RuMedTop3 | RuMedSymptomRec | RuMedDaNet | RuMedNLI | RuMedNER | ECG2Pathology | RuMedOverall |
|---|---|---|---|---|---|---|---|
| Naive | 10.58/22.02 | 1.93/5.30 | 50.00 | 33.33 | 93.66/51.96 | 1.15 | 29.53 |
| Feature-based | 49.76/72.75 | 32.05/49.40 | 51.95 | 59.70 | 94.40/62.89 | - | 58.46 |
| BiLSTM | 40.88/63.50 | 20.24/31.33 | 52.34 | 60.06 | 94.74/63.26 | - | 53.87 |
| RuBERT | 39.54/62.29 | 18.55/34.22 | 67.19 | 77.64 | 96.63/73.53 | - | 61.44 |
| RuPoolBERT | 47.45/70.44 | 34.94/52.05 | 71.48 | 77.29 | 96.47/73.15 | - | 67.20 |
| RuBioBERT* | 43.55/68.86 | 28.94/44.55 | 53.91 | 80.31 | 96.63/75.97 | - | 62.69 |
| RuBioRoBERTa* | 46.72/72.87 | 44.01/58.95 | 76.17 | 82.77 | 97.19/77.81 | - | 71.54 |
| Human | 25.06/48.54 | 7.23/12.53 | 93.36 | 83.26 | 96.09/76.18 | 39.34 | 58.13 |

We define the overall model score as the mean over all metric values (when a task has two metrics, they are averaged first).
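
As a small worked example (a sketch, not the repository's code), the overall score can be reproduced from the RuPoolBERT row of the table above; tasks without a reported value are left out:

```python
def overall_score(task_metrics):
    # Average within a task first (e.g., Accuracy and Hit@3), then across tasks.
    per_task = [sum(m) / len(m) for m in task_metrics]
    return sum(per_task) / len(per_task)

# RuPoolBERT row: Top3, SymptomRec, DaNet, NLI, NER (no ECG2Pathology result).
rupoolbert = [(47.45, 70.44), (34.94, 52.05), (71.48,), (77.29,), (96.47, 73.15)]
print(f"{overall_score(rupoolbert):.2f}")  # 67.20, matching the table
```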

* These results use the implementation from the paper RuBioRoBERTa: a pre-trained biomedical language model for Russian language biomedical text mining (repository).

You can find the extension of this benchmark (with closed test sets) on the MedBench platform.

How to Run

Please refer to the code/ directory. The code targets Python 3.7.

Contact

If you have any questions, please open a GitHub issue or email the authors.

Citation

@misc{blinov2022rumedbench,
    title={RuMedBench: A Russian Medical Language Understanding Benchmark},
    author={Pavel Blinov and Arina Reshetnikova and Aleksandr Nesterov and Galina Zubkova and Vladimir Kokh},
    year={2022},
    eprint={2201.06499},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}