Large dataset #5
Glad you enjoyed it. :) I haven't looked at these models particularly recently, so could you clarify for me which part is the bottleneck?
There are many places that construct large tensors. For example, computing the Euclidean distance between PhiControl and PhiTreatment in Tutorial 3:
Since the Phi layer has 200 nodes, x2 has shape (Nc, 200) and y2 has shape (Nt, 200), so dist has shape (Nc, Nt), roughly (100000, 100000) in my dataset (about 40 GB at float32). Computing that full distance matrix also takes a long time, because the cost is quadratic in the number of samples.
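If only each point's nearest neighbor is needed, one way to avoid materializing the full (Nc, Nt) matrix is to process the query rows in chunks and keep just the argmin of each chunk. The NumPy sketch below illustrates that assumption; it is not the tutorial's code, and `chunked_nearest_neighbors`, `queries`, `references`, and `chunk_size` are hypothetical names (for example, pass PhiTreatment as `queries` and PhiControl as `references`). Peak memory is about chunk_size x Nc distances instead of Nt x Nc, although the total work is still quadratic.

```python
import numpy as np


def chunked_nearest_neighbors(queries, references, chunk_size=1024, dtype=np.float64):
    """For each row of `queries`, find the closest row of `references`.

    Returns (indices, distances), each of length len(queries). Only a
    (chunk_size, len(references)) block of distances exists at any time.
    """
    queries = np.asarray(queries, dtype=dtype)
    references = np.asarray(references, dtype=dtype)
    ref_sq = np.einsum("ij,ij->i", references, references)  # ||r||^2, shape (Nr,)
    n = queries.shape[0]
    nn_idx = np.empty(n, dtype=np.int64)
    nn_dist = np.empty(n, dtype=dtype)
    for start in range(0, n, chunk_size):
        block = queries[start:start + chunk_size]                     # (b, d)
        q_sq = np.einsum("ij,ij->i", block, block)[:, None]           # (b, 1)
        # ||q - r||^2 = ||q||^2 + ||r||^2 - 2 q.r, computed for one chunk only.
        sq = q_sq + ref_sq[None, :] - 2.0 * (block @ references.T)    # (b, Nr)
        np.maximum(sq, 0.0, out=sq)  # clip small negatives caused by rounding
        best = np.argmin(sq, axis=1)
        rows = np.arange(block.shape[0])
        nn_idx[start:start + block.shape[0]] = best
        nn_dist[start:start + block.shape[0]] = np.sqrt(sq[rows, best])
    return nn_idx, nn_dist


# Hypothetical usage with random stand-ins for the Phi representations.
rng = np.random.default_rng(0)
phi_treatment = rng.normal(size=(2000, 200))
phi_control = rng.normal(size=(3000, 200))
idx, dist = chunked_nearest_neighbors(phi_treatment, phi_control, chunk_size=256)
print(idx.shape, dist.shape)  # (2000,) (2000,)
```

Passing `dtype=np.float32` roughly halves memory at some cost in precision; a smaller `chunk_size` trades speed for a lower memory peak.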
Hmm, those calculations are only used to find nearest neighbors in representation space for validation. A couple of quick solutions:
Hope this helps a bit! https://proceedings.mlr.press/v162/parikh22a.html
Thank you. This helps a lot!
Hi, this is a great tutorial! Thank you for sharing.
I have a question about implementing Dragonnet on a large dataset (200k subjects in my case). Calculating the loss requires constructing a large 200k x 200k float32 matrix (about 160 GB), which cannot fit into memory. Do you have any suggestions?
Thanks
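As described in the original Dragonnet paper (Shi et al., 2019), the training loss is a mean of per-sample terms: a squared error on the factual outcome head plus alpha times a cross-entropy on the propensity head (with an optional targeted-regularization term). Written that way, it never needs an N x N array and can be evaluated one minibatch at a time. If an (N, N) array shows up anyway, one possible cause (an assumption; the thread does not confirm it) is accidental broadcasting between a target of shape (N,) and a prediction of shape (N, 1). The NumPy sketch below uses hypothetical names `dragonnet_style_loss` and `batched_loss`, flattens every input to guard against that, and omits targeted regularization.

```python
import numpy as np


def dragonnet_style_loss(y, t, y0_pred, y1_pred, g_pred, alpha=1.0, eps=1e-7):
    """Mean over the batch of a per-sample Dragonnet-style loss.

    Squared error on the factual outcome head plus alpha * binary cross-entropy
    on the propensity head. Targeted regularization is omitted. Memory is O(n).
    """
    # Flatten every input to shape (n,) so an (n,) - (n, 1) subtraction can
    # never broadcast into an (n, n) array.
    y, t, y0_pred, y1_pred, g_pred = (
        np.reshape(np.asarray(a, dtype=np.float64), -1)
        for a in (y, t, y0_pred, y1_pred, g_pred)
    )
    # Factual prediction: y1_pred for treated units (t == 1), y0_pred otherwise.
    q_factual = np.where(t == 1, y1_pred, y0_pred)
    outcome_loss = (q_factual - y) ** 2
    g = np.clip(g_pred, eps, 1.0 - eps)  # avoid log(0)
    bce = -(t * np.log(g) + (1.0 - t) * np.log(1.0 - g))
    return float(np.mean(outcome_loss + alpha * bce))


def batched_loss(y, t, y0_pred, y1_pred, g_pred, batch_size=4096, alpha=1.0):
    """Evaluate the same mean loss over a large dataset one minibatch at a time."""
    arrays = [np.reshape(np.asarray(a, dtype=np.float64), -1)
              for a in (y, t, y0_pred, y1_pred, g_pred)]
    n = arrays[0].shape[0]
    total = 0.0
    for start in range(0, n, batch_size):
        batch = [a[start:start + batch_size] for a in arrays]
        # Weight each batch mean by its size so the result equals the full mean.
        total += dragonnet_style_loss(*batch, alpha=alpha) * batch[0].shape[0]
    return total / n
```

For instance, `np.zeros(3) - np.zeros((3, 1))` has shape (3, 3), which is the kind of silent blow-up the flattening step prevents. During training, minibatching bounds memory by the batch size rather than by the 200k subjects.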