pretty low GRN scores obtained #8

JianmeiZhong · 2018-04-20T13:10:22Z

Hi,
I work through most part of the analysis pipeline but the results of GRN scores obtained via grnscores() in specified paths SP_10.SP_8 as tSNE and barplot shows.the grn scores have different levels compared to yours presented in several exemplary analysis,that's level of 0.0* versus 0.* .Sometimes,the GRNscores plot has one or two genes with low scores in both directions.I don't know if there's some wrong configuration in my pipeline.If my source and endpoints are correct and reliable,low scores level are possible for some datasets and it's available?

another questions asking for your guide :)
known knowledge this project based on are less subpopulations and increasing number of cells along differentiation process.So,k=120 was set according to my understanding in function buildGRN() and creatKNN() to obtain relatively less subpopulations as tSNE plot shows.k value is different a lot from yours.It might be a cause of low GRN scores?And if a path didn't go through every subpopulation in non-branched cells differentiation,gene expression dynamics with this path can represent entire process of changes?

Thank a lot !

Jm Zhong

JianmeiZhong · 2018-04-23T15:11:09Z

add:every time running the grnscores(),warning messages were generated as follows.Hope this helps.
[1] 0
There were 20 warnings (use warnings() to see them)

warnings()
Warning messages:
1: In if (length(bla) == 1 & names(bla) == r) { ... :
the condition has length > 1 and only the first element will be used
2: In if (length(bla) == 1 & names(bla) == r) { ... :
the condition has length > 1 and only the first element will be used
3: In if (length(bla) == 1 & names(bla) == r) { ... :
the condition has length > 1 and only the first element will be used
4: In if (length(bla) == 1 & names(bla) == r) { ... :

edroaldo · 2018-04-23T15:13:40Z

Hi, I am sorry for the delay! Thanks for using CellRouter. Regarding your first question: 1) There is nothing wrong with your pipeline. The values are low though. I think this could be due to dropouts in your data because the dynamics of your regulators/target genes seem heterogeneous. What you could do is the following: in the processtrajectory function, increase "neighs" from 1 to 2 or 3. This will use the second or third neighborhood of each cell along the trajectory to smoothen the kinetic trend. Maybe, by chooing neighs=1 in your dataset does not provide enough information for a good smoothing. Let me know if that works. *cellrouter <- processtrajectories(cellrouter, genes2use, path.rank=ranks[3], num.cells = 3, neighs = 2) * What could be also happing is that you may not have genes that monotonically increase or decrease in expression along your trajectory but are rather more transiently expressed. Regarding your second question: 2) I do not a larger k will affect the grnscore that much. It should not. I am trying to optimize the clustering step that CellRouter does for larger datasets. For now, choosing a larget k is totally fine. No, the selected path will not represent the entire process of differentiation. I am working on now to also include other paths dynamics. The trajectory identified represents the one where each cell along a particular path is more similar to each other, according to our optimization algorithm. There might be many other paths as well. I am extending the platform in this direction as well. Please, keep me posted on how it goes! Thanks! 2018-04-20 9:10 GMT-04:00 JianmeiZhong <notifications@github.com>:

…

Hi, I work through most part of the analysis pipeline but the results of GRN scores obtained via grnscores() in specified paths SP_10.SP_8 as tSNE and barplot shows.the grn scores have different levels compared to yours presented in several exemplary analysis,that's level of 0.0* versus 0.* .Sometimes,the GRNscores plot has one or two genes with low scores in both directions.I don't know if there's some wrong configuration in my pipeline.If my source and endpoints are correct and reliable,low scores level are possible for some datasets and it's available? [image: barplot] <https://user-images.githubusercontent.com/37535747/39050789-438495c2-44d9-11e8-97ba-bd6a4ea498d0.png> another questions asking for your guide :) known knowledge this project based on are less subpopulations and increasing number of cells along differentiation process.So,k=120 was set according to my understanding in function buildGRN() and creatKNN() to obtain relatively less subpopulations as tSNE plot shows.k value is different a lot from yours.It might be a cause of low GRN scores?And if a path didn't go through every subpopulation in non-branched cells differentiation,gene expression dynamics with this path can represent entire process of changes? [image: tsne] <https://user-images.githubusercontent.com/37535747/39050778-385bd228-44d9-11e8-9e05-571bf8013e36.jpg> Thank a lot ! Jm Zhong — You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub <#8>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AJqUR9y39O0haayxIve1In_OXKudlT-Yks5tqd4-gaJpZM4Tdb6B> .

-- Edroaldo

edroaldo · 2018-04-23T15:14:53Z

OK. I will check that out as well. I have to seen this msgs erlier. Thanks a lot for your comments! 2018-04-23 11:11 GMT-04:00 JianmeiZhong <notifications@github.com>:

…

add:every time running the grnscores(),warning messages were generated as follows.Hope this helps. [1] 0 There were 20 warnings (use warnings() to see them) warnings() Warning messages: 1: In if (length(bla) == 1 & names(bla) == r) { ... : the condition has length > 1 and only the first element will be used 2: In if (length(bla) == 1 & names(bla) == r) { ... : the condition has length > 1 and only the first element will be used 3: In if (length(bla) == 1 & names(bla) == r) { ... : the condition has length > 1 and only the first element will be used 4: In if (length(bla) == 1 & names(bla) == r) { ... : — You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub <#8 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AJqURyeBCApYEN8C2Lrmh08QHZfj68SGks5tre8OgaJpZM4Tdb6B> .

-- Edroaldo

JianmeiZhong · 2018-04-23T16:29:20Z

Thanks! I got some very useful answers from your nice explications. :)
i'll try different neigh value.and yes,the dataset contains 5000~6000 cells.It's a large one.but larger k value,longer time it takes.if k value don't affect the grnscore that much,lower k value I'll set referring to your setting.And I called back some expression of genes using scImpute(WV Li et.al,2018) before and used normalized expression as input.
As for the monotonically increase or decrease genes,many genes are observed as up or down with obvious trends along the non-brached process of differentiation(five cell-stages,for example).I just cannot figure out how the numbers of genes represented in GRNscore plot so less in some samples.
Maybe the enpoint subpopulations SP_8 are too small to represent the whole last cell-stage(SP_1,SP_2,SP_5,SP_7,SP_8,S_12,but no marker genes are found for this cell-stage)?I don't know...

JianmeiZhong · 2018-04-23T16:53:00Z

or if there's any way that subpopulation id of each cell can be customized coercively?That will be a absolutely sole path which all cells can be included. grin:

JianmeiZhong · 2018-04-23T17:03:30Z

I tried neigh=2 in another sample,nothing seems changed.

edroaldo · 2018-04-24T11:42:23Z

How many genes are you using for GRN reconstruction? I noticed that, for some datasets, a larger GRN produces more interesting results because each regulator has more target genes. If you have enough memory, you can do it using all genes. If not, you can select more genes from the pca analysis at the top of your script to have more genes in the genes2use variable. Let me know if you decide to try it. Thanks!

…

On Mon, Apr 23, 2018, 1:03 PM JianmeiZhong ***@***.***> wrote: I tried neigh=2 in another sample,nothing seems changed. [image: sample2] <https://user-images.githubusercontent.com/37535747/39141763-466ab774-475b-11e8-920f-97ce4143a0b1.png> — You are receiving this because you commented. Reply to this email directly, view it on GitHub <#8 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AJqUR5HAwFz1cgHtNcQiQUSfuEhRf-D3ks5trgligaJpZM4Tdb6B> .

edroaldo · 2018-04-24T11:44:30Z

Also, would you mind to show me the commands you are using to generate these plots. Just want to double check how your defining the cutoffs... Thanks! On Tue, Apr 24, 2018, 7:42 AM Edroaldo Lummertz da Rocha <edroaldo@gmail.com> wrote:

…

How many genes are you using for GRN reconstruction? I noticed that, for some datasets, a larger GRN produces more interesting results because each regulator has more target genes. If you have enough memory, you can do it using all genes. If not, you can select more genes from the pca analysis at the top of your script to have more genes in the genes2use variable. Let me know if you decide to try it. Thanks! On Mon, Apr 23, 2018, 1:03 PM JianmeiZhong ***@***.***> wrote: > I tried neigh=2 in another sample,nothing seems changed. > [image: sample2] > <https://user-images.githubusercontent.com/37535747/39141763-466ab774-475b-11e8-920f-97ce4143a0b1.png> > > — > You are receiving this because you commented. > Reply to this email directly, view it on GitHub > <#8 (comment)>, > or mute the thread > <https://github.com/notifications/unsubscribe-auth/AJqUR5HAwFz1cgHtNcQiQUSfuEhRf-D3ks5trgligaJpZM4Tdb6B> > . >

JianmeiZhong · 2018-05-02T10:06:13Z

mmmm....my apologies for missing your reply and my delayed respond.:)
I tried to put all genes with non-zero variance into calculation of GRN as you suggested and more genes were presented in GRNscores plot.increasing number of input genes generated more nodes and more GRN genes shown but with pretty low GRN scores mentioned before.more closer endpoint to source subpopulation was tested to have higher GRN scores,for example 0.1 for SNCA from SP_10 to SP_3.

Here are commands generating the low-scores plot and several previous steps.
### selecting genes to use as regulated along developmental trajectories
pca <- prcomp(t(ndata),scale=TRUE,center=TRUE)
loadings <- pca$rotation
num_pc <- 5
quantile <- 0.975
genes2use <- unique(as.vector(unlist(apply(loadings[,1:num_pc],2,function(x){names(x[which(abs(x) >= quantile(x,quantile))])}))))
ggrn <- buildGRN('Hs',ndata,genes2use,2,'results/GRN.R')
cellrouter <- findsubpopulations(cellrouter,90,'jaccard','results/kNN_network.gml')
cellrouter <- createKNN(cellrouter,90,'jaccard','results/paths/kNN_network_trajectory.gml')
tfs <- find_tfs(species = 'Hs')
t <- c('SP_10.SP_8')
x <- grnscores(cellrouter,tfs,t,direction='both',dir.targets='up',columns=1,width=8,height=5,flip=T,filename=paste('results/',t,sep=''))

edroaldo · 2018-05-08T16:29:48Z

I think this should be fine. I have not seen GRN score that low but your analysis is correct. So, this something related to your dataset. I am testing in other datasetsas well. Will let you know what I get. Thanks! 2018-05-02 6:06 GMT-04:00 JianmeiZhong <notifications@github.com>:

…

mmmm....my apologies for missing your reply and my delayed respond.:) I tried to put all genes with non-zero variance into calculation of GRN as you suggested and more genes were presented in GRNscores plot.increasing number of input genes generated more nodes and more GRN genes shown but with pretty low GRN scores mentioned before.more closer endpoint to source subpopulation was tested to have higher GRN scores,for example 0.1 for SNCA from SP_10 to SP_3. Here are commands generating the low-scores plot and several previous steps. ### selecting genes to use as regulated along developmental trajectories pca <- prcomp(t(ndata),scale=TRUE,center=TRUE) loadings <- pca$rotation num_pc <- 5 quantile <- 0.975 genes2use <- unique(as.vector(unlist(apply(loadings[,1:num_pc],2, function(x){names(x[which(abs(x) >= quantile(x,quantile))])})))) ggrn <- buildGRN('Hs',ndata,genes2use,2,'results/GRN.R') cellrouter <- findsubpopulations(cellrouter,90,'jaccard','results/kNN_ network.gml') cellrouter <- createKNN(cellrouter,90,'jaccard','results/paths/kNN_ network_trajectory.gml') tfs <- find_tfs(species = 'Hs') t <- c('SP_10.SP_8') x <- grnscores(cellrouter,tfs,t,direction='both',dir.targets=' up',columns=1,width=8,height=5,flip=T,filename=paste('results/',t,sep='')) — You are receiving this because you commented. Reply to this email directly, view it on GitHub <#8 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AJqUR1e5ZfBAkM5Lhgi6O2ljZFMG53E5ks5tuYUWgaJpZM4Tdb6B> .

-- Edroaldo

edroaldo · 2018-05-09T12:19:15Z

Hi, I noticed that increasing k in the createKNN function provided better results in the dataset generated by Paul et al (Cell 2015). Could you please try to increase k in the createKNN function? Please, let me know... Thanks! 2018-05-08 12:29 GMT-04:00 Edroaldo Lummertz da Rocha <edroaldo@gmail.com>:

…

I think this should be fine. I have not seen GRN score that low but your analysis is correct. So, this something related to your dataset. I am testing in other datasetsas well. Will let you know what I get. Thanks! 2018-05-02 6:06 GMT-04:00 JianmeiZhong ***@***.***>: > mmmm....my apologies for missing your reply and my delayed respond.:) > I tried to put all genes with non-zero variance into calculation of GRN > as you suggested and more genes were presented in GRNscores plot.increasing > number of input genes generated more nodes and more GRN genes shown but > with pretty low GRN scores mentioned before.more closer endpoint to source > subpopulation was tested to have higher GRN scores,for example 0.1 for SNCA > from SP_10 to SP_3. > > Here are commands generating the low-scores plot and several previous > steps. > ### selecting genes to use as regulated along developmental trajectories > pca <- prcomp(t(ndata),scale=TRUE,center=TRUE) > loadings <- pca$rotation > num_pc <- 5 > quantile <- 0.975 > genes2use <- unique(as.vector(unlist(apply(loadings[,1:num_pc],2,function(x){names(x[which(abs(x) > >= quantile(x,quantile))])})))) > ggrn <- buildGRN('Hs',ndata,genes2use,2,'results/GRN.R') > cellrouter <- findsubpopulations(cellrouter, > 90,'jaccard','results/kNN_network.gml') > cellrouter <- createKNN(cellrouter,90,'jacca > rd','results/paths/kNN_network_trajectory.gml') > tfs <- find_tfs(species = 'Hs') > t <- c('SP_10.SP_8') > x <- grnscores(cellrouter,tfs,t,direction='both',dir.targets='up' > ,columns=1,width=8,height=5,flip=T,filename=paste('results/',t,sep='')) > > — > You are receiving this because you commented. > Reply to this email directly, view it on GitHub > <#8 (comment)>, > or mute the thread > <https://github.com/notifications/unsubscribe-auth/AJqUR1e5ZfBAkM5Lhgi6O2ljZFMG53E5ks5tuYUWgaJpZM4Tdb6B> > . > -- Edroaldo

-- Edroaldo

JianmeiZhong · 2018-05-09T13:53:19Z

Ok,I'll try it and let you know how it goes.

JianmeiZhong · 2018-05-17T08:25:53Z

I tried increasing k in 3 samples but there are no better results than before.they are still pretty low when k=200 .

edroaldo · 2018-05-24T19:38:25Z

So, I think this could be more specific to your dataset then, which should be fine... I tested in other datasets I usually get higher values... Thanks! 2018-05-17 4:25 GMT-04:00 JianmeiZhong <notifications@github.com>:

…

I tried increasing k in 3 samples but there are no better results than before.they are still pretty low when k=200 . — You are receiving this because you commented. Reply to this email directly, view it on GitHub <#8 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AJqUR1fw-JPjqu9DCq3oWt07_0pU9ksiks5tzTQRgaJpZM4Tdb6B> .

-- Edroaldo

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

pretty low GRN scores obtained #8

pretty low GRN scores obtained #8

JianmeiZhong commented Apr 20, 2018

JianmeiZhong commented Apr 23, 2018

edroaldo commented Apr 23, 2018 via email

edroaldo commented Apr 23, 2018 via email

JianmeiZhong commented Apr 23, 2018

JianmeiZhong commented Apr 23, 2018

JianmeiZhong commented Apr 23, 2018

edroaldo commented Apr 24, 2018 via email

edroaldo commented Apr 24, 2018 via email

JianmeiZhong commented May 2, 2018

edroaldo commented May 8, 2018 via email

edroaldo commented May 9, 2018 via email

JianmeiZhong commented May 9, 2018

JianmeiZhong commented May 17, 2018

edroaldo commented May 24, 2018 via email

pretty low GRN scores obtained #8

pretty low GRN scores obtained #8

Comments

JianmeiZhong commented Apr 20, 2018

JianmeiZhong commented Apr 23, 2018

edroaldo commented Apr 23, 2018 via email

edroaldo commented Apr 23, 2018 via email

JianmeiZhong commented Apr 23, 2018

JianmeiZhong commented Apr 23, 2018

JianmeiZhong commented Apr 23, 2018

edroaldo commented Apr 24, 2018 via email

edroaldo commented Apr 24, 2018 via email

JianmeiZhong commented May 2, 2018

edroaldo commented May 8, 2018 via email

edroaldo commented May 9, 2018 via email

JianmeiZhong commented May 9, 2018

JianmeiZhong commented May 17, 2018

edroaldo commented May 24, 2018 via email