Large cohort #15
I tried running X-shift from the command line on 112 samples (>100K cells each) using 50 threads and 100 GB of RAM. It ran for 16 hours and then hit an out-of-memory error. I can keep tweaking the submission parameters for the cluster computer, but I was wondering what the maximum number of cells this has been successfully run on. Besides sub-sampling, is there a parameter I should use to reduce the computation? My cohort will soon increase to 350 samples, and I need a method capable of handling 40 million cells.
Thanks!
Sara
Comments
Hi Sara,
Sorry to hear that. What operating system are you on? Can you please open your terminal, type "java -version", and tell me what response you are seeing?
Nikolay
Sure! See below:
CentOS Linux release 7.3.1611 (Core)
openjdk version "1.8.0_102"
OpenJDK Runtime Environment (build 1.8.0_102-b14)
OpenJDK 64-Bit Server VM (build 25.102-b14, mixed mode)
And what happens if you try running it with java -Xmx64G? Are you still running out of memory? How many FCS files do you have? How many cells do you sample from each FCS file? Do you have a record of the stack trace of the OutOfMemoryError? It could be in the Vortex.log. It would help me understand at what stage the error is happening.
Nikolay
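(For reference: -Xmx is the standard JVM heap-size flag and is passed when the tool is launched, e.g. java -Xmx64G -jar <your VorteX jar> <arguments>, where the jar name and arguments here are only placeholders that depend on your install. If the heap is exhausted, the log would typically contain a stack trace whose first line looks like java.lang.OutOfMemoryError: Java heap space or java.lang.OutOfMemoryError: GC overhead limit exceeded.)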
There are 112 FCS files, each with around 100K cells. The original error I got was from the job-submission program, not from X-shift or Java itself, so I am rerunning it. It has currently been running for 19 hours (80 threads, maximum 200 GB of RAM). Since my cohort will soon more than double in size, I was wondering if you have experience with cohorts of a comparable size and what I can do to improve the speed, besides sub-sampling.
I wanted to let you know that X-shift has now been running for 3 days and 19 hours on the 112 samples with around 100K cells per FCS file. It is running on 80 threads with 200 GB of RAM. Have you tested a cohort of a comparable size? If so, did you see runtimes like this? Thanks!
Yes, it makes sense that it's taking that long; I think it's too much for your system to handle. I suggest that you change the "limit of rows per file" setting in the data import config to a maximum of 10000. You will then end up with 1.12M cells (112 files × 10,000 rows), which should get clustered in about a day. X-shift will compute the clustering based on that 'core' set of cells and then impute the cluster assignments for the rest of the cells using nearest-neighbour classification while writing the output FCS files.
Nikolay
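To illustrate that last step, here is a minimal sketch of the idea (this is not the actual VorteX/X-shift code, and all class and method names are made up for the example): every cell that was not part of the clustered 'core' subset simply receives the cluster label of its nearest core cell.

import java.util.Arrays;

public class NearestCoreAssignment {

    // Give each remaining cell the cluster label of its closest core cell
    // (brute-force Euclidean search; a real implementation would use a spatial index).
    static int[] assignToNearestCore(double[][] coreCells, int[] coreLabels, double[][] otherCells) {
        int[] labels = new int[otherCells.length];
        for (int i = 0; i < otherCells.length; i++) {
            int best = -1;
            double bestDist = Double.POSITIVE_INFINITY;
            for (int j = 0; j < coreCells.length; j++) {
                double d = squaredDistance(otherCells[i], coreCells[j]);
                if (d < bestDist) {
                    bestDist = d;
                    best = j;
                }
            }
            labels[i] = coreLabels[best];
        }
        return labels;
    }

    static double squaredDistance(double[] a, double[] b) {
        double sum = 0;
        for (int k = 0; k < a.length; k++) {
            double diff = a[k] - b[k];
            sum += diff * diff;
        }
        return sum;
    }

    public static void main(String[] args) {
        // Two toy "core" cells that already carry cluster labels 0 and 1.
        double[][] core = { { 0.0, 0.0 }, { 10.0, 10.0 } };
        int[] coreLabels = { 0, 1 };
        // Cells that were left out of the core set get imputed labels.
        double[][] rest = { { 1.0, 0.5 }, { 9.0, 11.0 } };
        System.out.println(Arrays.toString(assignToNearestCore(core, coreLabels, rest))); // prints [0, 1]
    }
}

The point of the trick is that the expensive density-based clustering only ever sees the ~1.12M core cells; assigning the remaining ~10M cells is a much simpler nearest-neighbour lookup.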
Ok, thanks!