Discussion: Better ways to improve team-wide understanding of MIMIC datasets #5

Closed
kdg1993 opened this issue Nov 28, 2022 · 14 comments
Labels
Discussion Extra attention is needed

Comments

@kdg1993
Collaborator

kdg1993 commented Nov 28, 2022

What

  • Discuss the important points about the MIMIC dataset that the whole team should know
  • Propose formats for analyzing and sharing those points

Why

  • MIMIC has a more complicated data structure than CheXpert
  • It was one of the key issues in last week's meeting
  • A team-wide common reference can make our meeting discussions more efficient

How

  • My simple suggestion is to create a notebook for EDA of MIMIC. Any further suggestions would be very helpful and appreciated!
@juppak
Collaborator

juppak commented Dec 5, 2022

Comment

  • The MIMIC dataset consists of 3 parts:
    1. original image files (JPG format)
    2. data description csv files
    3. labeler csv files
  • The image files and labeler files are similar to those in the CheXpert dataset.
  • Therefore, handling the data description csv files is the important part.
  • Before building a MIMIC dataloader like the CheXpert dataloader, a pre-processing step is needed.

I saw that JIEON already made that part! I think it looks good.
It would be more accurate for JIEON to explain the code. @jieonh

@jieonh
Collaborator

jieonh commented Dec 5, 2022

Comment
Matching the CheXpert and MIMIC csv formats is complete.

  • The clinical information part is exactly the same.

  • However, since view is not defined in the labeler csv for MIMIC, the view information was retrieved from the metadata csv and combined with the labeler csv.

  • The metadata has about four columns related to view, but only the ViewPosition column (which contains the frontal/lateral information, see the screenshot below) is used.

  • If necessary, the remaining three view-related columns can be compared and analyzed, but I haven't worked on that yet because I didn't think it was a priority.

  • Additionally, as I mentioned briefly in the last meeting, the row counts of the labeler csv (227835) and the metadata csv (377110) differ because the MIMIC dataset contains several images (different views) per study_id, and there are 227835 study_ids. This means all we have to do is set study_id as the index and combine the dataframes.

image
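For reference, the study_id join described above could be sketched roughly as follows. This is only an illustration on toy data, not the actual code: the dataframes, the `dicom_id` and `Pneumonia` columns, and the `Frontal/Lateral` mapping are assumptions made for the example, and the real csv files may need different handling.

```python
import pandas as pd

# Toy stand-in for the labeler csv: one row per study (assumed layout).
labels = pd.DataFrame({
    "study_id": [100, 101, 200],
    "Pneumonia": [1.0, None, 0.0],
})

# Toy stand-in for the metadata csv: one row per image, so a study_id can repeat.
metadata = pd.DataFrame({
    "dicom_id": ["a", "b", "c", "d"],
    "study_id": [100, 100, 101, 200],
    "ViewPosition": ["PA", "LATERAL", "AP", "PA"],
})

# Several images per study explains why metadata has more rows than labels.
images_per_study = metadata.groupby("study_id").size()

# Attach the study-level labels to every image of that study.
# validate="many_to_one" raises if study_id is not unique in `labels`.
merged = metadata.merge(labels, on="study_id", how="left", validate="many_to_one")

# Assumed mapping to a CheXpert-style Frontal/Lateral column;
# any ViewPosition value not listed here becomes NaN.
view_map = {"PA": "Frontal", "AP": "Frontal", "LATERAL": "Lateral"}
merged["Frontal/Lateral"] = merged["ViewPosition"].map(view_map)

print(images_per_study)
print(merged)
```

The merged frame keeps one row per image, so its length follows the metadata csv rather than the labeler csv.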

@kdg1993
Collaborator Author

kdg1993 commented Dec 5, 2022

Thank you both for your accurate & kind replies, @juppak @jieonh !! 👍

@jieonh I'm curious whether the original code behind the screenshot above also includes some kind of EDA?

@jieonh
Collaborator

jieonh commented Dec 5, 2022

@kdg1993 Could you explain in more detail what you mean by EDA?
I just did a brief analysis of the columns, missing values, and a few other things. Visualizations are not included.
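A brief column and missing-value check of this kind might look like the sketch below. It is a generic pandas example on toy data, not the code that was actually run; `summarize_columns` is a hypothetical helper name and the toy columns are assumptions.

```python
import pandas as pd


def summarize_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Return one row per column with dtype, missing counts and unique counts."""
    return pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "n_missing": df.isna().sum(),
        "pct_missing": (df.isna().mean() * 100).round(1),
        "n_unique": df.nunique(dropna=True),
    })


# Toy data standing in for one of the csv files.
toy = pd.DataFrame({
    "study_id": [100, 101, 200],
    "ViewPosition": ["PA", None, "AP"],
})

summary = summarize_columns(toy)
print(summary)
```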

@kdg1993
Collaborator Author

kdg1993 commented Dec 5, 2022

I believe you got exactly what I meant, @jieonh . What I wanted to ask about is visualization and some analysis of the four csv files.

Thanks for telling me about your current work 😄.
Since I'm digging into the MIMIC csv data right now (simply because I don't know enough about it), I just wanted to ask you to share the code if you have any.

@jieonh
Collaborator

jieonh commented Dec 5, 2022

I uploaded some code that I was working on for reference! /home/MIMIC_code/jieon/csv_lab
(It may not be well organized since it was just a personal experiment 😂)

@kdg1993
Collaborator Author

kdg1993 commented Dec 5, 2022

As far as I understand, more advanced visualization and statistical analysis of MIMIC and BRAX are needed and could help the whole team build up knowledge of the data itself.

Since @seoulsky-field plans to do the EDA, I think kyoungmin might help us with it.
Also, I'm exploring the csv data now for my own understanding. So I want to ask whether kyoungmin will handle both MIMIC and BRAX, or choose one of them.

@seoulsky-field
Owner

Thank you for sharing the code, @jieonh !

I had planned to choose either MIMIC or BRAX. (It depends on the team's current progress.)
But I think I can do both, so I'll do MIMIC first and BRAX next!

@kdg1993
Collaborator Author

kdg1993 commented Dec 5, 2022

Just a small question: does anyone have an idea which branch and directory the EDA code should go in?

For example, the feature/notebook branch or the tutorial branch?

@seoulsky-field
Owner

How about the notebook directory in the docs branch?

@kdg1993
Collaborator Author

kdg1993 commented Dec 5, 2022

The notebook directory seems fine to me too, but... I'm not quite sure about the docs branch, considering the .md file in the docs directory.

@seoulsky-field
Owner

Oh, I misunderstood and thought you wanted to make a new branch like 'feature/notebook'.
But I checked, and the 'feature/notebook' branch already exists and Yisak has uploaded notebook files to it.

So my opinion is that we should upload the EDA notebooks to the notebook directory in the feature/notebook branch.

@kdg1993
Collaborator Author

kdg1993 commented Dec 5, 2022

It looks good to me! BTW, I think naming the notebook files could be tricky, but this is not a critical issue :)

@seoulsky-field
Owner

I think "EDA_[dataset].ipynb" would be a good format,
because there is generally just one EDA notebook file per dataset.

For example, EDA_CheXpert.ipynb, EDA_MIMIC.ipynb.

@kdg1993 kdg1993 closed this as completed Dec 9, 2022
@seoulsky-field seoulsky-field added this to the Dataset: MIMIC milestone Dec 30, 2022
@seoulsky-field seoulsky-field added the Discussion Extra attention is needed label Dec 30, 2022
@seoulsky-field seoulsky-field modified the milestones: Dataset: MIMIC, Modality: X-ray Jan 11, 2023
6 participants