Large cohort #15
I tried running X-shift from the command line on 112 samples (>100K cells each) using 50 threads and 100 GB of RAM. It ran for 16 hours and then hit an out-of-memory error. I can keep tweaking the submission parameters for the cluster computer, but I was wondering what the maximum number of cells this has been successfully run on. Besides sub-sampling, is there a parameter I should use to reduce the computation? My cohort will soon increase to 350 samples, and I need a method capable of handling 40 million cells.
Thanks!
Sara
Comments
Hi Sara,
Sorry to hear that. What operating system are you on? Can you please open your terminal, type "java -version", and tell me what response you are seeing?
Nikolay
Sure! See below:
CentOS Linux release 7.3.1611 (Core)
openjdk version "1.8.0_102"
OpenJDK Runtime Environment (build 1.8.0_102-b14)
OpenJDK 64-Bit Server VM (build 25.102-b14, mixed mode)
And what happens if you try running it with java -Xmx64G? Are you still running out of memory? How many FCS files do you have? How many cells do you sample from each FCS file? Do you have a record of the stack trace of the OutOfMemoryError? It could be in the Vortex.log. It would help me understand at what stage the error is happening.
Nikolay
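(For reference: -Xmx is the standard JVM heap-size flag and is passed when the tool is launched, e.g. java -Xmx64G -jar <your VorteX jar> <arguments>, where the jar name and arguments here are only placeholders that depend on your install. If the heap is exhausted, the log would typically contain a stack trace whose first line looks like java.lang.OutOfMemoryError: Java heap space or java.lang.OutOfMemoryError: GC overhead limit exceeded.)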
There are 112 FCS files, each with around 100K cells. The original error I got was from the job-submission program, not from X-shift or Java itself, so I am rerunning it. It has currently been running for 19 hours (80 threads, maximum 200 GB of RAM). Since my cohort will soon more than double in size, I was wondering if you have experience with cohorts of a comparable size and what I can do to improve the speed, besides sub-sampling.
I wanted to let you know that X-shift has now been running for 3 days and 19 hours on the 112 samples with around 100K cells per FCS file. It is running on 80 threads with 200 GB of RAM. Have you tested a cohort of a comparable size? If so, did you see runtimes like this? Thanks!
Yes, it makes sense that it's taking that long; I think it's too much for your system to handle. I suggest that you change the "limit of rows per file" setting in the data import config to a maximum of 10000. You will then end up with 1.12M cells (112 files × 10,000 rows), which should get clustered in about a day. X-shift will compute the clustering based on that 'core' set of cells and then impute the cluster assignments for the rest of the cells using nearest-neighbour classification while writing the output FCS files.
Nikolay
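To illustrate that last step, here is a minimal sketch of the idea (this is not the actual VorteX/X-shift code, and all class and method names are made up for the example): every cell that was not part of the clustered 'core' subset simply receives the cluster label of its nearest core cell.

import java.util.Arrays;

public class NearestCoreAssignment {

    // Give each remaining cell the cluster label of its closest core cell
    // (brute-force Euclidean search; a real implementation would use a spatial index).
    static int[] assignToNearestCore(double[][] coreCells, int[] coreLabels, double[][] otherCells) {
        int[] labels = new int[otherCells.length];
        for (int i = 0; i < otherCells.length; i++) {
            int best = -1;
            double bestDist = Double.POSITIVE_INFINITY;
            for (int j = 0; j < coreCells.length; j++) {
                double d = squaredDistance(otherCells[i], coreCells[j]);
                if (d < bestDist) {
                    bestDist = d;
                    best = j;
                }
            }
            labels[i] = coreLabels[best];
        }
        return labels;
    }

    static double squaredDistance(double[] a, double[] b) {
        double sum = 0;
        for (int k = 0; k < a.length; k++) {
            double diff = a[k] - b[k];
            sum += diff * diff;
        }
        return sum;
    }

    public static void main(String[] args) {
        // Two toy "core" cells that already carry cluster labels 0 and 1.
        double[][] core = { { 0.0, 0.0 }, { 10.0, 10.0 } };
        int[] coreLabels = { 0, 1 };
        // Cells that were left out of the core set get imputed labels.
        double[][] rest = { { 1.0, 0.5 }, { 9.0, 11.0 } };
        System.out.println(Arrays.toString(assignToNearestCore(core, coreLabels, rest))); // prints [0, 1]
    }
}

The point of the trick is that the expensive density-based clustering only ever sees the ~1.12M core cells; assigning the remaining ~10M cells is a much simpler nearest-neighbour lookup.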
Ok, thanks!