Program keeps crashing R! #110

Closed
AbbyKimball opened this issue Feb 27, 2017 · 9 comments

Comments

@AbbyKimball

Hello! I've tried this a couple of times and R keeps unexpectedly closing. Is this a common problem?

@AbbyKimball
Author

Here is the code I ran; I followed the directions on the wiki:

install.packages("glmnet")
install.packages("pamr")
install.packages("ggplot2")
install.packages("ggplot2")
install.packages("Rclusterpp")
source("http://bioconductor.org/biocLite.R")
biocLite("flowCore")
biocLite("impute")
install.packages("samr")
install.packages("shiny")
install.packages("brew")
install.packages("devtools")
library("devtools")
install_github('nolanlab/citrus')
library("citrus")
citrus.launchUI()
g++-mp-4.8 -v
nano ~/.R/Makevars
install.packages("Matrix")
install.packages(c("Rcpp","RcppEigen","Rclusterpp"),type="source")
library("devtools")
install_github('nolanlab/citrus')
library("citrus")
citrus.launchUI()
library("citrus")
citrus.launchUI()

You see it begins, but seems to get stuck on the clustering part:

screen shot 2017-02-27 at 4 19 54 pm

And then I get:

screen shot 2017-02-27 at 4 27 35 pm

@SamGG

SamGG commented Feb 28, 2017

Hi,
I would try decreasing the number of events before the clustering step. 40,000 events sounds more reasonable for testing.
HTH
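
One way to reduce the number of events before clustering is to subsample each sample's event matrix at random. The following is only a sketch in base R, not what Citrus does internally: `downsample_events` is a hypothetical helper, the nine toy matrices are random numbers standing in for real FCS data, and reading the 40,000 figure as a total across all samples is an assumption.

```r
# Hypothetical helper: keep at most `n` randomly chosen events (rows) of one sample.
downsample_events <- function(events, n) {
  if (nrow(events) <= n) return(events)
  events[sample.int(nrow(events), n), , drop = FALSE]
}

set.seed(42)  # make the random subsample reproducible

# Toy stand-ins for nine samples of 20,000 events x 5 markers (random, not real data)
samples <- lapply(1:9, function(i) matrix(rnorm(20000 * 5), ncol = 5))

# Assumption: 40,000 is the total across samples, so each sample gets an equal share
per_sample <- 40000 %/% length(samples)
small <- lapply(samples, downsample_events, n = per_sample)

# Number of events kept per sample
sapply(small, nrow)
```

Drawing the same number of events from every sample keeps each sample's weight in the merged data equal, which matters when cluster sizes are later compared as percentages.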

@AbbyKimball
Author

Thanks for the reply :)

From playing around on Cytobank, it seems that lowering the number of events changes the CITRUS results. Do you know if these differences are significant?

@AbbyKimball
Author

I've tried it now with 5,000 events per sample and 1,000 events per sample. Still crashes.

@rbruggner
Collaborator

Hi - sorry you're experiencing crashing. Could you tell me what version of Mac OS X you're using?

@SamGG

SamGG commented Mar 1, 2017

Robert is definitely the guru for Macintosh issues.
Concerning the sampling, here is my opinion. Lowering the number of events too much will lead to a loss of resolution: a real effect might go unseen. I haven't tested this aspect yet. Here is what I understand of the Citrus algorithm. Let's say we are interested in comparing condition A vs B, each made up of 8 or more biological samples (this is the simplest example). Citrus builds a hierarchical clustering of events sampled from the FCS files of conditions A and B. In the resulting tree, Citrus then considers only nodes that contain more than a defined percentage of events. This user-defined threshold is applied to the merged events of conditions A and B. Therefore, if a population is 2% in condition A and 0% in condition B, its percentage in the mix will be 1% (assuming both conditions contribute equally many events). I would then recommend setting the threshold below 1% for such a population. Let's go with 0.5%. For the nodes of the hierarchical tree that aggregate more than 0.5% of events, Citrus tests whether the percentage of events is statistically different in condition A versus B. If the threshold were set to 2%, the population of interest would almost certainly be aggregated with events of a different profile, leading to an inhomogeneous cluster of events. Because of this mixing, the difference in percentage might not be statistically significant. That is the loss of resolution.
IMHO:

  • don't lower the final number of events in the merged file too much, except when testing the computational setup.
  • as events from conditions A and B are merged, set the percentage-of-events threshold to a sensible value.
  • because of the merging step, Citrus (as well as SPADE and many other algorithms) assumes that a population of events sits at the same position in the multi-parameter space in every sample, i.e. inter-individual variations in position are not adjusted and contribute to the statistical dispersion.
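
The threshold arithmetic above can be checked with a few lines of R. This is a sketch of the reasoning only, not Citrus code: the event counts are made up, and comparing with `>=` is an assumption about how the minimum cluster size is applied.

```r
# Frequency of the population of interest in each condition (values from the example)
freq_a <- 0.02  # 2% of events in condition A
freq_b <- 0.00  # 0% of events in condition B

# Assumption: both conditions contribute the same number of events to the merged file
events_a <- 20000
events_b <- 20000

# Share of the population in the merged file
merged_freq <- (freq_a * events_a + freq_b * events_b) / (events_a + events_b)
merged_freq

# Compare the merged share against the two candidate thresholds (0.5% and 2%)
thresholds <- c(0.005, 0.02)
data.frame(threshold = thresholds, passes_threshold = merged_freq >= thresholds)
```

With equal contributions the merged share is 1%, so it passes a 0.5% threshold but not a 2% one. If the conditions contribute unequal numbers of events, the merged share shifts accordingly: for example, with 4 of 9 equally sized samples in condition A it would be about 0.89%.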

@AbbyKimball
Author

Robert, I looked over the issue you sent along and added the code you provided for the potential memory leak:

install.packages(c("Rcpp","RcppEigen"),type="source")
library("devtools")
install_github('nolanlab/Rclusterpp')
install_github('nolanlab/citrus')

I am on macOS Sierra Version 10.12.1

Sam, per your recommendation here and on the linked issue, I first lowered the Minimum Cluster Size to 1 and then to 0.5. The program still crashed both times.

I read somewhere that having uneven group sizes (4 samples in one group and 5 in another in this case) could affect the algorithm. Is this really true?

I've attached screenshots of the setup.

screen shot 2017-03-01 at 12 50 37 pm

screen shot 2017-03-01 at 12 50 08 pm

screen shot 2017-03-01 at 12 49 57 pm

screen shot 2017-03-01 at 12 50 15 pm

screen shot 2017-03-01 at 12 48 59 pm

@rbruggner
Collaborator

Abby, does Citrus still crash after you've installed Rclusterpp from source, e.g.:

R> library("devtools")
R> install_github('nolanlab/Rclusterpp')
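
To confirm which package versions actually ended up installed, and to collect the OS details asked about earlier in the thread, something like the following may help. It is a sketch using base R only; `installed_version` is a hypothetical helper, and it deliberately reads the package metadata without loading the package, so a badly compiled library cannot crash the session during the check.

```r
# Report the R version and the operating system
R.version.string
Sys.info()[c("sysname", "release")]

# Hypothetical helper: version of an installed package, or NA if it is not installed.
# system.file() and packageVersion() read files on disk and do not load compiled code.
installed_version <- function(pkg) {
  if (nzchar(system.file(package = pkg))) {
    as.character(packageVersion(pkg))
  } else {
    NA_character_
  }
}

# Packages named in this thread
sapply(c("Rcpp", "RcppEigen", "Rclusterpp", "citrus"), installed_version)
```

Pasting this output into the issue gives the maintainers the OS release and package versions in one place.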

@SamGG

SamGG commented Mar 1, 2017

Dear Abby, I really think that the problem is due to some compilation issue. Parameter setup should not lead to a crash. I think Robert could also explain to you how to run the script (created by the Shiny interface) from the command line.
Below is a simple script that may help to pinpoint the faulty code. It should produce the following graphics.

library(Rclusterpp)
# Matrix size
nc = 20
nr = 10000
# Generate some random data
set.seed(123)
some.data = matrix(rnorm(nc*nr), ncol = nc)
# Define a population with a different mean value
some.data[1:1000, 1:(nc %/% 4)] = some.data[1:1000, 1:(nc %/% 4)] + 1
# Do clustering
hc = Rclusterpp.hclust(some.data)
hc
# Optional process: cut the hierarchical tree in 10 clusters
hc.cut.members = cutree(hc, k = 10)
hc.cut.mean = apply(some.data, 2, function(x) tapply(x, hc.cut.members, mean))
hc.cut.mean = as.matrix(hc.cut.mean)
plot(hclust(dist(hc.cut.mean)))
# Alternative display
image(some.data)
image(hc.cut.mean)

rplot
