Some files in the repo are missing in MIMIC datasets? #11

Closed
AliRasekh opened this issue Jun 12, 2023 · 3 comments

Comments

@AliRasekh

Hello,

First, thanks for your great project and code.

While running the 1_Generate_HAIM-MIMIC-MM file, I got this error:

FileNotFoundError: [Errno 2] No such file or directory: './data/HAIM/physionet/files/mimiciv/1.0/mimic-cxr-jpg/2.0.0/mimic-cxr-2.0.0-jpeg-txt.csv'

The paths and other settings are correct, and I have downloaded and extracted the following datasets as instructed:
https://physionet.org/content/mimiciv/1.0/
https://physionet.org/content/mimic-cxr-jpg/2.0.0/

However, the second dataset contains no file named "mimic-cxr-2.0.0-jpeg-txt.csv". How can I obtain it? Also, is this the same MIMIC-CXR version that you ran your code on?

Thanks in advance

@lrsoenksen
Owner

UPDATE (Jun. 12, 2023): For the publication, our team generated the file 'mimic-cxr-2.0.0-jpeg-txt.csv' by compiling an early-release version of participant notes and text from the CXR images corresponding to MIMIC-IV. We wanted to add it to this repository, but PhysioNet.org's data policy states that we cannot share this compiled data directly via GitHub; only PhysioNet has permission to distribute the data or subsets of it. This means users need to generate their own mimic-cxr-2.0.0-jpeg-txt.csv from the notes and CXR files released on PhysioNet.org once all notes are released. The dataset structure can be inferred from the code. As of June 12, 2023, PhysioNet has not fully released these notes, but they are likely planning to do so as part of their full release of MIMIC-IV. We are very sorry for any inconvenience this may cause.
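One possible way to build such a file yourself is sketched here, under loud assumptions. It substitutes the free-text radiology reports from the MIMIC-CXR (DICOM) release, which may not match the early-release notes used in the publication. It assumes those reports are extracted under `reports_root` with the layout `files/pXX/pSUBJECT/sSTUDY.txt`, and it uses a hypothetical text column name `text`, which must be changed to whatever column the HAIM embedding code actually reads. The helper names `report_path` and `build_cxr_text_csv` are illustrative only and are not part of this repository.

```python
import os

import pandas as pd

# Hypothetical output column name; rename it to match what the HAIM code expects.
TEXT_COLUMN = 'text'


def report_path(reports_root: str, subject_id: int, study_id: int) -> str:
    """Report path, ASSUMING the MIMIC-CXR layout files/pXX/pSUBJECT/sSTUDY.txt."""
    subject = f'p{int(subject_id)}'
    return os.path.join(reports_root, 'files', subject[:3], subject, f's{int(study_id)}.txt')


def build_cxr_text_csv(metadata_csv: str, reports_root: str, out_csv: str,
                       text_column: str = TEXT_COLUMN) -> pd.DataFrame:
    """Write one row per image: its dicom_id plus its study's report text ('' if missing)."""
    df_meta = pd.read_csv(metadata_csv, usecols=['dicom_id', 'subject_id', 'study_id'])

    def read_report(row) -> str:
        path = report_path(reports_root, row.subject_id, row.study_id)
        if not os.path.exists(path):
            return ''  # no report found for this study
        with open(path, encoding='utf-8') as fh:
            return fh.read().strip()

    texts = [read_report(row) for row in df_meta.itertuples(index=False)]
    df_txt = pd.DataFrame({'dicom_id': df_meta['dicom_id'], text_column: texts})
    df_txt.to_csv(out_csv, index=False)
    return df_txt
```

The output keeps only `dicom_id` and the text column, so the existing `pd.merge(..., on='dicom_id')` step does not produce duplicated `subject_id_x`/`subject_id_y` style columns.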

@AliRasekh
Author

Thanks for your response.

One more question: if I comment out the following lines, will it cause any problems in generating the multimodal data?

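```python
import pandas as pd

# core_mimiciv_path and df_cxr are defined earlier in 1_Generate_HAIM-MIMIC-MM

# Add paths and info to images in cxr
df_mimic_cxr_jpg = pd.read_csv(core_mimiciv_path + 'mimic-cxr-jpg/2.0.0/mimic-cxr-2.0.0-jpeg-txt.csv')
df_cxr = pd.merge(df_mimic_cxr_jpg, df_cxr, on='dicom_id')
# Save (overwrites any existing mimic-cxr-2.0.0-metadata.csv with the merged table)
df_cxr.to_csv(core_mimiciv_path + 'mimic-cxr-jpg/2.0.0/mimic-cxr-2.0.0-metadata.csv', index=False)
# Read back the dataframe
```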
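As an alternative to commenting these lines out, the merge could be made conditional on the file existing. This is a minimal sketch, not code from the repository: the function name `merge_cxr_notes_if_available` is hypothetical, and it assumes `df_cxr` and `core_mimiciv_path` mean the same thing as in the snippet above.

```python
import os

import pandas as pd


def merge_cxr_notes_if_available(df_cxr: pd.DataFrame, core_mimiciv_path: str) -> pd.DataFrame:
    """Merge the optional per-image text table into df_cxr only if the CSV exists."""
    txt_path = os.path.join(core_mimiciv_path, 'mimic-cxr-jpg', '2.0.0',
                            'mimic-cxr-2.0.0-jpeg-txt.csv')
    if not os.path.exists(txt_path):
        # Without the file, keep the image metadata unchanged (no note text).
        print(f'Skipping CXR text merge; file not found: {txt_path}')
        return df_cxr
    df_mimic_cxr_jpg = pd.read_csv(txt_path)
    # Inner join, as in the original cell: images without a text row are dropped.
    return pd.merge(df_mimic_cxr_jpg, df_cxr, on='dicom_id')
```

In the notebook this would replace the read and merge lines with `df_cxr = merge_cxr_notes_if_available(df_cxr, core_mimiciv_path)`, followed by the existing save step. Downstream code that expects the text columns would still need adjusting when the file is absent.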

@lrsoenksen
Owner

It should not cause problems, but the input data will not be the same as in the publication. If you don't use the image notes, you may need to comment out downstream lines during embedding generation so that the training algorithm does not expect text input from them. Alternatively, you can create a mimic-cxr-2.0.0-jpeg-txt.csv in which every row has the same text (empty notes).
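The empty-notes option can be sketched as follows, assuming the image metadata CSV contains a `dicom_id` column (the key used by the merge). The text column name `text` and the function name `write_empty_notes_csv` are hypothetical; the column must be renamed to whatever the HAIM embedding code actually reads.

```python
import os

import pandas as pd

# Hypothetical name for the notes column; match whatever column the HAIM code reads.
TEXT_COLUMN = 'text'


def write_empty_notes_csv(metadata_csv: str, out_csv: str,
                          text_column: str = TEXT_COLUMN) -> pd.DataFrame:
    """Write one row per unique dicom_id, each with identical (empty) note text."""
    df_meta = pd.read_csv(metadata_csv, usecols=['dicom_id'])
    df_txt = df_meta.drop_duplicates().assign(**{text_column: ''})
    os.makedirs(os.path.dirname(out_csv) or '.', exist_ok=True)
    df_txt.to_csv(out_csv, index=False)
    return df_txt
```

Note that `pd.read_csv` reads empty fields back as `NaN` by default, so downstream text-embedding code may need `keep_default_na=False` or `.fillna('')` on that column.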
