Add recipe for Medical Corpus #1212

Merged 3 commits on Nov 14, 2023

@yfyeung (Contributor) commented on Nov 10, 2023

This PR adds a recipe for a dataset of simulated patient-physician medical interviews with a focus on respiratory cases.
Corpus: https://huggingface.co/datasets/yfyeung/medical
Paper: https://www.nature.com/articles/s41597-022-01423-1

Dataset Description

The simulated medical conversation dataset is available on figshare.com. It is divided into two sets of files: audio recordings of the simulated conversations in mp3 format, and the transcripts of those recordings as text files. There are 272 mp3 audio files and 272 corresponding transcript text files. Each file name consists of a three-letter category code followed by a four-digit case number within that category. The category codes are RES (respiratory), GAS (gastrointestinal), CAR (cardiovascular), MSK (musculoskeletal), and DER (dermatological).
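For illustration, the naming scheme described above could be parsed roughly as follows. This is a minimal sketch and not the code in this PR: it assumes file names such as `RES0001.mp3` (three uppercase letters immediately followed by four digits), and `parse_case_id` and `CaseId` are hypothetical names introduced only for this example.

```python
import re
from pathlib import Path
from typing import NamedTuple, Optional

# Category codes as listed in the dataset description.
CATEGORIES = {
    "RES": "respiratory",
    "GAS": "gastrointestinal",
    "CAR": "cardiovascular",
    "MSK": "musculoskeletal",
    "DER": "dermatological",
}

# Assumed layout: three uppercase letters followed by exactly four digits.
_NAME_RE = re.compile(r"^([A-Z]{3})([0-9]{4})$")


class CaseId(NamedTuple):
    code: str
    category: str
    case_number: int


def parse_case_id(filename: str) -> Optional[CaseId]:
    """Return the parsed case ID, or None if the name does not fit the scheme."""
    stem = Path(filename).stem  # drop the directory and the extension
    match = _NAME_RE.match(stem)
    if match is None or match.group(1) not in CATEGORIES:
        return None
    code = match.group(1)
    return CaseId(code, CATEGORIES[code], int(match.group(2)))
```

Because the audio file and its transcript share the same stem under this assumption, the parsed `CaseId` could also serve as a key for pairing each mp3 with its text file.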

@yfyeung changed the title from "Add Medical Corpus" to "Add recipe for Medical Corpus" on Nov 11, 2023
@pzelasko (Collaborator) left a comment

Thanks, LGTM!

@pzelasko added this to the v1.18 milestone on Nov 14, 2023
@pzelasko merged commit f3c8168 into lhotse-speech:master on Nov 14, 2023
10 checks passed
@yfyeung deleted the medical branch on Nov 14, 2023, 03:30