Scatter plot from learned code representations #19
Comments
I showed the scatterplot of ICD9 diagnosis codes, which can be grouped by the ICD9 taxonomy (http://www.icd9data.com/2015/Volume1/default.htm).
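For readers who want to reproduce that kind of grouping, here is a minimal sketch of mapping an ICD9 diagnosis code to its top-level chapter, so the chapter can be used as the color group in a scatter plot. The function name `icd9_chapter` and the table name `ICD9_CHAPTERS` are hypothetical and not part of med2vec. The sketch assumes the codes are plain ICD9 strings such as "250.00" or "25000"; whether this matches the grouping actually used for the original plot is not confirmed in this thread.

```python
# Top-level ICD9 diagnosis chapters, keyed by the 3-digit category range.
ICD9_CHAPTERS = [
    (1, 139, "infectious and parasitic diseases"),
    (140, 239, "neoplasms"),
    (240, 279, "endocrine, nutritional, metabolic and immunity disorders"),
    (280, 289, "diseases of the blood and blood-forming organs"),
    (290, 319, "mental disorders"),
    (320, 389, "diseases of the nervous system and sense organs"),
    (390, 459, "diseases of the circulatory system"),
    (460, 519, "diseases of the respiratory system"),
    (520, 579, "diseases of the digestive system"),
    (580, 629, "diseases of the genitourinary system"),
    (630, 679, "complications of pregnancy, childbirth and the puerperium"),
    (680, 709, "diseases of the skin and subcutaneous tissue"),
    (710, 739, "diseases of the musculoskeletal system and connective tissue"),
    (740, 759, "congenital anomalies"),
    (760, 779, "conditions originating in the perinatal period"),
    (780, 799, "symptoms, signs and ill-defined conditions"),
    (800, 999, "injury and poisoning"),
]


def icd9_chapter(code):
    """Return the ICD9 chapter name for a diagnosis code string (hypothetical helper)."""
    code = code.strip().upper()
    # V and E codes are supplementary classifications outside the numeric ranges.
    if code.startswith("V"):
        return "supplementary factors (V codes)"
    if code.startswith("E"):
        return "external causes (E codes)"
    # The first three digits are the category, with or without a decimal point.
    category = int(code.split(".")[0][:3])
    for low, high, name in ICD9_CHAPTERS:
        if low <= category <= high:
            return name
    raise ValueError("not a recognized ICD9 diagnosis code: " + code)


# One group label per code; plot each group as its own series or color.
groups = [icd9_chapter(c) for c in ["250.00", "401.9", "V10.3"]]
```

Passing one such label per point to the plotting library (for example, one Highcharts series per chapter) colors the points by group instead of at random.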
Thanks Ed!
Hello Ed, my goal is to predict similar diagnosis codes using med2vec. For example, I have 140 medical codes, an embedding dimension of 200, and a hidden dimension of 2000. I ran your code and got .npz files containing six numpy.array variables: W_emb, b_output, b_hidden, b_emb, W_output, and W_hidden. Which one should be used to find similar codes, W_emb or W_output? With my input, W_emb is 140x200 and W_output is 2000x140. I ask because in https://github.com/mp2893/med2vec/issues/16#issue-403688598 (paste it into Chrome) you mentioned that W_output is used to predict neighboring visits. Also, how do we know which diagnosis code each generated embedding corresponds to? Where does the mapping between embeddings and medical codes happen? Please help clarify these points.
If you want to find similar diagnosis codes, you should use W_emb and b_emb.
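As a rough illustration of that advice, the sketch below ranks codes by cosine similarity. It rests on several assumptions not confirmed in this thread: that W_emb has one row per code (140x200 in the example above), that row i belongs to the integer code i used in the training input, and that a code vector can be taken as ReLU(W_emb[i] + b_emb). The function names `code_vectors` and `most_similar`, and the file name in the comment, are hypothetical.

```python
import numpy as np


def code_vectors(W_emb, b_emb):
    """Assumed code representation: ReLU(W_emb + b_emb), one row per code."""
    return np.maximum(W_emb + b_emb, 0.0)


def most_similar(vectors, index, k=5):
    """Return the k codes closest to `index` by cosine similarity."""
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    # Guard against all-zero rows, which can occur after the ReLU.
    unit = vectors / np.maximum(norms, 1e-12)
    sims = unit @ unit[index]
    sims[index] = -np.inf  # exclude the query code itself
    top = np.argsort(-sims)[:k]
    return [(int(i), float(sims[i])) for i in top]


# With a real model, the arrays would come from the saved file, e.g.:
#   params = np.load("model.npz")   # hypothetical file name
#   W_emb, b_emb = params["W_emb"], params["b_emb"]
# Toy stand-in: 4 codes, 3 embedding dimensions.
W_emb = np.array([[1.0, 0.0, 0.0],
                  [0.9, 0.1, 0.0],
                  [0.0, 0.0, 1.0],
                  [0.0, 1.0, 0.0]])
b_emb = np.zeros(3)

neighbors = most_similar(code_vectors(W_emb, b_emb), index=0, k=2)
```

To report actual diagnosis codes rather than row indices, the integer-to-code dictionary built during preprocessing would have to be inverted and applied to the returned indices.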
Thanks Ed. When I run the Theano code on CPU it works fine but is slow (16 hours per epoch). When I run it on GPU, it throws a segmentation fault at this line: cost = f_grad_shared(x, batchD, y, mask, iVector, jVector). Could you please help me solve this issue?
Unfortunately, that error seems to be caused by system-related issues rather than the algorithm itself (unless it's the NaN error).
Could you please tell me the versions of CUDA, the Nvidia driver, and Theano you used?
It says in the README that I used Theano 0.7.
Thanks Ed. I have also tried the TensorFlow version, but it takes 16 hours per epoch because of the huge volume of data. Could you please help me reduce the time per epoch?
That's weird. How can the job take the same amount of time (16 hours) on both CPU and GPU?
Yes Ed, I have faced some weird issues. This particular line takes the most time: cost = f_grad_shared(x, batchD, mask, iVector, jVector). After removing this line, the model works fine.
If you are using demographic information, and are using grouped codes for the softmax output label, then you are going to need that line. Otherwise, the model won't be trained at all.
OK Ed, but I didn't use demographic information, and I didn't use grouped codes either. I removed the line cost = f_grad_shared(x, batchD, mask, iVector, jVector) and generated the embeddings. The generated embeddings are quite good; I validated them using cosine similarity. Could you please confirm whether this approach is correct?
My mistake. The line you deleted is used only when you are using demographic information, but not grouped codes (see line 274 of the source code).
Is the diagnosis code representation as
Technically it should be
Hello Ed,
In Med2Vec, after creating the model file, you created a 2D scatter plot from the learned code representations. Was any grouping of the medical codes performed after creating the model file to produce the scatter plot?
I ask because in Highcharts, the coloring is based on some grouping.
Example:
https://jsfiddle.net/gh/get/library/pure/highcharts/highcharts/tree/master/samples/highcharts/demo/scatter/
I tried to create a scatter plot after running t-SNE on the embeddings. The plot is created, but there is no grouping: the colors are placed randomly and no clusters form.
Can you please help me understand this?
Thanks,
SathickIbrahim