Some files in the repo are missing in MIMIC datasets? #11

AliRasekh · 2023-06-12T10:19:12Z

Hello,

First, thanks for your great project and code.

While running the 1_Generate_HAIM-MIMIC-MM file, I got this error:

FileNotFoundError: [Errno 2] No such file or directory: './data/HAIM/physionet/files/mimiciv/1.0/mimic-cxr-jpg/2.0.0/mimic-cxr-2.0.0-jpeg-txt.csv'

The path and everything else are correct, and I have downloaded and extracted the following datasets as instructed:
https://physionet.org/content/mimiciv/1.0/
https://physionet.org/content/mimic-cxr-jpg/2.0.0/

But at the second link, there is no file named "mimic-cxr-2.0.0-jpeg-txt.csv". How can I access it? And is this the same MIMIC-CXR version that you ran your code on?

Thanks in advance

lrsoenksen · 2023-06-12T14:31:17Z

UPDATE (Jun. 12, 2023): For the publication, our team generated the file 'mimic-cxr-2.0.0-jpeg-txt.csv' by compiling an early-release version of the participant notes and the text from the CXR images corresponding to MIMIC-IV. We wanted to add this file to the repository, but the PhysioNet.org data policy states that we cannot share this compiled data directly via GitHub. Only PhysioNet has permission to distribute the data or subsets of it. This means users need to generate their own mimic-cxr-2.0.0-jpeg-txt.csv from the notes and CXR files released on PhysioNet.org once all notes are released. The structure of the file can be inferred from the code. As of June 12, 2023, PhysioNet has not fully released these notes, but they likely plan to do so as part of the full release of MIMIC-IV. We are very sorry for any inconvenience this may cause.

AliRasekh · 2023-06-12T14:38:52Z

Thanks for your response.

One follow-up question: if I comment out the following lines, will it cause any problems when generating the multimodal data?

    # Add paths and info to images in cxr
    df_mimic_cxr_jpg = pd.read_csv(core_mimiciv_path + 'mimic-cxr-jpg/2.0.0/mimic-cxr-2.0.0-jpeg-txt.csv')
    df_cxr = pd.merge(df_mimic_cxr_jpg, df_cxr, on='dicom_id')
    # Save
    df_cxr.to_csv(core_mimiciv_path + 'mimic-cxr-jpg/2.0.0/mimic-cxr-2.0.0-metadata.csv', index=False)
    # Read back the dataframe

lrsoenksen · 2023-06-12T14:41:59Z

It should not cause problems, but the input data would not be the same as in the publication. If you don't use the notes for the images, you may also need to comment out downstream lines during embedding generation so that the training algorithm does not expect input text from there. Alternatively, you can create a mimic-cxr-2.0.0-jpeg-txt.csv in which every row has the same text (empty notes).
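
The placeholder-file option could be sketched roughly as follows. This is only an illustration under stated assumptions, not the procedure used for the publication: the only column confirmed by the snippet above is `dicom_id` (the merge key); the function name `write_placeholder_notes` and the note column name `text` are hypothetical, and the real column names have to be inferred from the notebook code. The sketch uses only the Python standard library.

```python
import csv


def write_placeholder_notes(dicom_ids, out_path, text_column="text"):
    """Write a CSV with one row per unique dicom_id and an empty note.

    ASSUMPTION: `text_column` is a hypothetical name. Only `dicom_id`
    is confirmed by the merge call in the thread; check the notebook
    for the note column(s) it actually reads.
    """
    seen = set()
    with open(out_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["dicom_id", text_column])
        for dicom_id in dicom_ids:
            if dicom_id in seen:
                # Skip duplicates so a merge on dicom_id stays one-to-one.
                continue
            seen.add(dicom_id)
            writer.writerow([dicom_id, ""])
    # Number of data rows written (header excluded).
    return len(seen)
```

The `dicom_ids` argument would typically come from the `dicom_id` column of the CXR metadata that is already loaded (for example `df_cxr['dicom_id']` in the snippet above), with `out_path` set to the path from the error message. Note that `pd.read_csv` reads empty fields as NaN by default, so downstream code that expects strings may need something like `fillna('')` on the note column.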

lrsoenksen closed this as completed Jun 12, 2023
