Inconsistent Results with RolDE Approach without Setting Seed #3

Open
bellayqian opened this issue May 31, 2024 · 2 comments

@bellayqian

Dear Elo Lab,
I am very interested in your RolDE approach and would like to apply it to my project. However, when I ran it on your sample data without setting a seed, the results were inconsistent from run to run: each time I ran the code below, I got a different result for head(RolDE.data1, 5). Could you please explain why? I also ran it 10 times on the same input dataset and compared the significant proteins across runs; unfortunately, there was little overlap between those 10 results. Any suggestions? Thank you very much for your time and kind support!

Best,
Bella

```r
library(RolDE)
data(data1)
data("des_matrix1")
data1.res <- RolDE(data = data1, des_matrix = des_matrix1, n_cores = 3)
RolDE.data1 <- data1.res$RolDE_Results
RolDE.data1 <- RolDE.data1[order(as.numeric(RolDE.data1[, 2])), ]
head(RolDE.data1, 5)
```

@tsvali
Collaborator

tsvali commented Jun 5, 2024

Hi!

Thank you for your interest in RolDE. Indeed, there is some randomness associated with the bootstrapping procedures applied by RolDE. Thus, without setting a random seed, the results will be slightly different between runs. Regarding data1, which you tried: it is a "null" dataset of randomly generated protein expression values, with no true differential expression signal between the conditions. This is why the top proteins are essentially arbitrary and, without a fixed random seed, will differ from run to run due to RolDE's bootstrapping. If you do the same with data3 instead, which is a semi-simulated proteomics dataset with spike-in ("ups") proteins, the results should be more consistent from run to run even with different seeds, as in the following example:

```r
library(RolDE)
data("data3")
data("des_matrix3")

# Run RolDE with four different seeds and keep the top 50 proteins of each run
res_list <- list()
for (i in 1:4) {
  set.seed(i)
  data3.res <- RolDE(data = data3, des_matrix = des_matrix3, n_cores = 3)
  RolDE.data3 <- data3.res$RolDE_Results
  # Order the results by the second column, as in the original example
  RolDE.data3 <- RolDE.data3[order(as.numeric(RolDE.data3[, 2])), ]
  res_list[[i]] <- as.character(RolDE.data3[1:50, 1])
}

# Number of proteins shared by the top 10 of all four runs
length(Reduce(intersect, lapply(res_list, function(x) x[1:10])))  # 8
```
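As a minimal illustration of the seed behaviour described above, independent of RolDE itself, the sketch below uses a hypothetical helper `boot_mean()` (not part of RolDE) that averages the means of bootstrap resamples. Unseeded calls draw different resamples, so their results generally differ slightly, while calling `set.seed()` with the same value before each run reproduces the result exactly. It only demonstrates base R random number generation, not RolDE's actual bootstrapping procedure.

```r
# Hypothetical helper (not part of RolDE): average of the means of n_boot
# bootstrap resamples of x, each drawn with replacement.
boot_mean <- function(x, n_boot = 200) {
  mean(replicate(n_boot, mean(sample(x, replace = TRUE))))
}

x <- c(2.1, 3.4, 1.9, 4.2, 3.3, 2.8)

# No fixed seed: each call continues from the current RNG state,
# so the two runs use different resamples.
run_a <- boot_mean(x)
run_b <- boot_mean(x)

# Same seed before each call: identical resamples, identical result.
set.seed(1); run_c <- boot_mean(x)
set.seed(1); run_d <- boot_mean(x)
identical(run_c, run_d)  # TRUE
```

The same reasoning applies to the data1 case: when there is no real signal, small differences in the resamples are enough to reorder the top-ranked proteins.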

I hope this helps,
Best,
Tommi

@bellayqian
Author

Hi Tommi,

Thank you for your detailed explanation regarding RolDE's functionality. I appreciate your clarification on the randomness associated with bootstrapping and the nature of the datasets. Your insights on the differences between data1 and data3 are particularly helpful. I'll proceed with testing data3 as suggested.

Best,
Bella
