A few questions #5
Hi Xianlong,
Hi Ed, Thanks for the reply! Regarding question 2: have you tried training the model entirely on synthetic data? If a model that performs well on the synthetic data also performs well on the real data (kind of like training and validation sets), I think that would be a strong argument that the synthetic data is really good, am I right? Also, since you mentioned a heart-failure prediction model, I was wondering whether you also generate labels for the EHR data. For example, heart failure would be 1 and control would be 0. Can this model be used to generate labeled data, say by adding the label as the last column of the data? Thank you
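The "label as the last column" idea can be sketched as follows. This is a minimal illustration, not medGAN code: `append_label_column` and `split_label_column` are hypothetical helpers, the patient vectors are made up, and it assumes the generator emits values in [0, 1] (for example sigmoid outputs), so the generated label column has to be thresholded. Training the generator itself is not shown.

```python
from typing import List, Sequence, Tuple


def append_label_column(X: Sequence[Sequence[int]], y: Sequence[int]) -> List[List[int]]:
    """Attach each patient's label (e.g. 1 = heart failure, 0 = control) as the last column."""
    if len(X) != len(y):
        raise ValueError("X and y must have the same number of rows")
    return [list(row) + [int(label)] for row, label in zip(X, y)]


def split_label_column(
    samples: Sequence[Sequence[float]], threshold: float = 0.5
) -> Tuple[List[List[float]], List[int]]:
    """Split generated rows back into features and a binary label.

    Assumes the generator emits values in [0, 1] (e.g. sigmoid outputs),
    so the label column is thresholded rather than used as-is.
    """
    features = [list(row[:-1]) for row in samples]
    labels = [int(row[-1] >= threshold) for row in samples]
    return features, labels


# Hypothetical binary code vectors for two patients.
train = append_label_column([[1, 0, 1], [0, 1, 0]], [1, 0])
print(train)  # [[1, 0, 1, 1], [0, 1, 0, 0]]

# Hypothetical raw generator output with the label in the last column.
features, labels = split_label_column([[0.9, 0.1, 0.8, 0.7], [0.2, 0.6, 0.1, 0.3]])
print(labels)  # [1, 0]
```

With this approach a single generator learns the codes and the label jointly; the class balance of the sampled rows is whatever the generator learned, so it is worth comparing against the real data.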
Hi Xianlong, Figures 3 and 7 in my paper are exactly what you described. I trained logistic regression classifiers on both real and synthetic data, then tested them on held-out real data. There are many details that cannot be covered here, so I recommend reading my paper. You can generate a labeled dataset in many ways. You can add an additional column like you suggested, or you can develop a conditional generator. In my case, I trained two separate medGANs, one for the case dataset and the other for the control dataset. But as I said, this experiment was not rigorously conducted, so I can't say that my method is optimal. Thanks,
Cool! I didn't see the connection between these two at first. Thanks!
Hello Ed,
Thanks for sharing this great work with us!
After having trouble accessing the EHR dataset, I wondered whether we could generate synthetic data instead, which is how I came across this paper.
I have a few questions though:
1. Sequential patient data seems more useful for many tasks. Have you tried generating this kind of data (as you mentioned in future work)? For example, each patient could be treated as a matrix in which each row is a visit.
2. Have you tried any real-world tasks on the synthetic data? If so, can we trust the results we get from the synthetic data?
Thanks!
Xianlong
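The "each patient as a matrix, each row a visit" representation from question 1 can be sketched as below. The helper names and the codes (`"I50"`, `"E11"`, `"J45"`) are hypothetical examples, not taken from the paper or any dataset. Collapsing the visit axis gives a single fixed-length count vector per patient, which discards the visit order that a sequential generator would need to model.

```python
from typing import Dict, List, Sequence


def build_vocab(patients: Sequence[Sequence[Sequence[str]]]) -> Dict[str, int]:
    """Assign each distinct medical code a column index, in sorted order."""
    codes = sorted({code for visits in patients for visit in visits for code in visit})
    return {code: i for i, code in enumerate(codes)}


def visits_to_matrix(visits: Sequence[Sequence[str]], vocab: Dict[str, int]) -> List[List[int]]:
    """One row per visit, one column per code; 1 if the code was recorded at that visit."""
    matrix = []
    for visit in visits:
        row = [0] * len(vocab)
        for code in visit:
            row[vocab[code]] = 1
        matrix.append(row)
    return matrix


def aggregate_counts(matrix: List[List[int]], width: int) -> List[int]:
    """Collapse the visit axis into one count vector per patient (visit order is lost)."""
    totals = [0] * width
    for row in matrix:
        for j, value in enumerate(row):
            totals[j] += value
    return totals


# Two hypothetical patients: the first has two visits, the second has one.
patients = [[["I50", "E11"], ["I50"]], [["J45"]]]
vocab = build_vocab(patients)
m = visits_to_matrix(patients[0], vocab)
print(m)                                # [[1, 1, 0], [0, 1, 0]]
print(aggregate_counts(m, len(vocab)))  # [1, 2, 0]
```

Because patients have different numbers of visits, their matrices have different numbers of rows, so batching them for a sequence model would require padding or masking.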