degPatterns results exists unsimilar pattern in a cluster #28

Open
etbuface opened this issue Aug 25, 2019 · 1 comment

@etbuface

Hi, I'm using degPatterns to cluster some genes across different time points.

Here is part of my code:

clusters <- degPatterns(log2(salld.norm), metadata = colData, time = "age", minc = 5, reduce = T, scale = T)

And here is my metadata:

ID age
1 04d
2 16d
3 28d
4 32d
5 36d
6 40d
7 44d
8 52d
9 56d
  1. In my clustering result, some groups rise over time. I then plotted the normalized counts of every gene in those groups, but they are not quite what I expected. For example, some genes in a cluster rise significantly over time, while other genes in the same cluster do not change much. Besides, some genes from another cluster look like they should be clustered with the rising genes. Why are those dissimilar genes clustered with my rising genes?

Figure 1
cluster.png
Figure 2
group4.png

  2. How should I set groupDifference to group more similar genes into one cluster? (Some of the clusters look very similar to me, and I don't know why they are split into multiple clusters.)

Figure 3
group.png

  3. I use minc = 5 to get more clusters returned and reduce = T to remove outliers from the clusters. I also use scale = T because I only care about the pattern of change, not the exact counts. But I'm also curious whether scale = T is necessary. The Kendall test is based on ranks, right? So what is the effect of scale = T? Is my understanding of the above parameters correct? I also noticed that there can be some extreme outliers when reduce = T is not used. How can these genes be clustered with the 'consensus/common' genes?
@lpantano
Owner

Hi @etbuface,

thank you for the details.

This function works in the following way:

1-make pairwise correlations between the input genes (these should be significant genes identified by some other method, like DESeq2)
2-hierarchical clustering
3-cut the tree at a given point
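
The three steps above can be sketched with base R alone. This is only an illustration on a toy matrix, not the code inside degPatterns: the gene names and values are made up, and the distance (1 - correlation), the linkage method, and the cut height are all assumptions chosen for the example.

```r
# Toy expression matrix: 6 hypothetical genes (rows) x 9 time points (columns).
up <- rbind(
  up1 = c(1, 2, 3, 4, 5, 6, 7, 8, 9),
  up2 = c(2, 1, 3, 5, 4, 6, 8, 7, 9) * 10,  # same trend, different magnitude
  up3 = c(1, 3, 2, 4, 6, 5, 7, 9, 8)
)
down <- -up
rownames(down) <- c("down1", "down2", "down3")
ma <- rbind(up, down)

# Step 1: pairwise Kendall correlation between genes.
# cor() correlates columns, so the matrix is transposed first.
gene_cor <- cor(t(ma), method = "kendall")

# Step 2: hierarchical clustering on a correlation-based distance
# (1 - correlation and average linkage are assumptions of this sketch).
tree <- hclust(as.dist(1 - gene_cor), method = "average")

# Step 3: cut the tree at a chosen height (0.5 is arbitrary here).
groups <- cutree(tree, h = 0.5)
print(groups)
```

With this toy input the rising and falling genes end up in two separate groups. Moving the cut height up or down changes how many groups come out, which is why the last step has such a large effect on the clusters you see.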

The third step is the one that defines the clusters you see. Turning on the consensus cluster option may give better clusters, but that is not always the case. This option uses the ConsensusClusterPlus package to define the groups.

It is normal to find clusters that look almost identical, but you can see there is always a small difference. I use the plot to then merge groups so that they make more sense for the biology. If that small difference is not important, it makes sense to put them all together.
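
Merging groups by hand after looking at the plot can be as simple as recoding cluster labels. The sketch below uses a made-up assignment table with hypothetical `genes` and `cluster` columns; the real degPatterns output may be organized differently, so treat the column names as assumptions.

```r
# Hypothetical gene-to-cluster table (toy data, not real degPatterns output).
assignments <- data.frame(
  genes = paste0("gene", 1:8),
  cluster = c(1, 1, 2, 2, 3, 3, 4, 4),
  stringsAsFactors = FALSE
)

# Suppose the plot shows that clusters 1 and 3 follow the same trend:
# map label 3 onto label 1 and leave every other label untouched.
merge_map <- c("3" = "1")
key <- as.character(assignments$cluster)
hit <- key %in% names(merge_map)
key[hit] <- merge_map[key[hit]]
assignments$merged <- as.integer(key)

print(table(assignments$merged))
```

Keeping the original `cluster` column next to the merged one makes it easy to undo a merge that turns out not to make biological sense.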

It is also common to find some genes that show a bigger difference when you plot the non-scaled values, but the scaled values should show the same pattern, even if the size of the difference is not equal.

There are a couple of plots in the output of the function, if you save it into a variable, that may help you define the cutoff (see benchmarking under http://lpantano.github.io/DEGreport/reference/degPatterns.html#value). Look at http://lpantano.github.io/DEGreport/reference/degPlotCluster.html to see how to plot using different cutoffs.

At the end of the day, the last step is arbitrary, and some genes will land in a cluster even if they are not similar, because when you cut the tree they have to be part of some group. That is the reason I added reduce, to remove those cases.

You are right about scale: it shouldn't make a difference. It is more of a historical parameter and I should probably remove it.
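
A quick way to convince yourself of this is to compare Kendall correlations before and after scaling each gene. The matrix below is invented for the example, and the check only covers the correlation step, not the plots or any other part of degPatterns.

```r
# Toy matrix: 3 hypothetical genes (rows) x 9 time points (columns).
ma <- rbind(
  geneA = c(5, 9, 12, 30, 41, 60, 75, 90, 120),
  geneB = c(100, 80, 95, 150, 140, 200, 260, 240, 300),
  geneC = c(50, 44, 39, 30, 33, 21, 15, 12, 8)
)

# Z-score each gene across samples; scale() works on columns, hence t().
scaled <- t(scale(t(ma)))

# Kendall correlation depends only on the ranks within each gene,
# and centering and dividing by the standard deviation keeps those ranks.
tau_raw <- cor(t(ma), method = "kendall")
tau_scaled <- cor(t(scaled), method = "kendall")

same <- isTRUE(all.equal(tau_raw, tau_scaled, check.attributes = FALSE))
print(same)
```

Since the correlations are identical, the distance matrix and the resulting tree are too, so in this sketch scaling only changes how the profiles look when plotted.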

I hope this helps.
