Designing a matrix for identifying genes that are differentially expressed across all groups #28
Hi,
Assuming you provide the MMSEQ files for the two CMP, the two GMP and the two MEP samples in sequence, you'd use the following matrices file to compare the baseline model (a single mean) to the most complex model (three means):
```
# M; no. of rows = no. of observations
0
0
0
0
0
0
# C; no. of rows = no. of observations and no. of columns = 2 (one for each model)
0 0
0 0
0 1
0 1
0 2
0 2
# P0(collapsed); no. of rows = no. of classes for model 0
1
# P1(collapsed); no. of rows = no. of classes for model 1
1 0 0
0 1 0
0 0 1
```
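This file can be generated from the sample order: one row of M and C per sample, a single class under model 0, and one class per group under model 1. Below is a minimal Python sketch that assumes the layout shown here (`#` lines separating M, C, P0 and P1, with rows in the same order as the MMSEQ files given to mmdiff). The function name `matrices_file` is hypothetical and not part of mmdiff. Extending the identity P1 to numbers of groups other than three is also an assumption, made by analogy with the three-class example.

```python
from typing import List

HEADERS = {
    "M": "# M; no. of rows = no. of observations",
    "C": "# C; no. of rows = no. of observations and no. of columns = 2 (one for each model)",
    "P0": "# P0(collapsed); no. of rows = no. of classes for model 0",
    "P1": "# P1(collapsed); no. of rows = no. of classes for model 1",
}

def matrices_file(groups: List[str]) -> str:
    """Matrices file comparing one shared mean (model 0) with one mean per group (model 1).

    `groups` gives the group of each sample, in the order the MMSEQ files are passed to mmdiff.
    """
    levels = list(dict.fromkeys(groups))          # groups in order of first appearance
    cls = {g: i for i, g in enumerate(levels)}    # model-1 class index of each group
    lines = [HEADERS["M"]] + ["0"] * len(groups)  # no nuisance covariates
    lines.append(HEADERS["C"])
    # Column 1: every sample is class 0 under model 0; column 2: its group's class under model 1.
    lines += [f"0 {cls[g]}" for g in groups]
    lines += [HEADERS["P0"], "1"]
    lines.append(HEADERS["P1"])
    # Identity P1 (one mean per class), as in the three-class example; using it
    # for other numbers of groups is an assumption.
    for i in range(len(levels)):
        lines.append(" ".join("1" if j == i else "0" for j in range(len(levels))))
    return "\n".join(lines) + "\n"

if __name__ == "__main__":
    print(matrices_file(["CMP", "CMP", "GMP", "GMP", "MEP", "MEP"]), end="")
```

Writing the returned string to a text file gives a matrices file laid out like the one above.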
On 29 Mar 2018, at 11:04, weiiioyo wrote:
Hi,
I recently read a paper called "Transcriptional diversity during lineage commitment of human blood progenitors". In the paper, the authors used mmdiff to find cell-specific genes and transcripts, and I want to reproduce their results.
Now I have three groups, CMP, GMP and MEP, with 2 replicates each. The paper describes the models as follows: "The simplest model assumes that the mean expression level is the same across cell types. The most complex model assumes that the mean expression level is different for each cell type. The remaining three models assume that two of the three cell types have the same mean expression level." I have created three matrices according to your answer in "Setting up matrix and identifying specific gene clusters", but I have no idea how to design "the most complex model". These are the matrices I have used:
mat1.txt:
# M; no. of rows = no. of observations
1 0 0
1 0 0
0 1 0
0 1 0
0 0 1
0 0 1
# C; no. of rows = no. of observations and no. of columns = 2 (one for each model)
0 0
0 0
0 1
0 1
0 1
0 1
# P0(collapsed); no. of rows = no. of classes for model 0
1
# P1(collapsed); no. of rows = no. of classes for model 1
-.5
.5
mat2.txt:
# M; no. of rows = no. of observations
1 0 0
1 0 0
0 1 0
0 1 0
0 0 1
0 0 1
# C; no. of rows = no. of observations and no. of columns = 2 (one for each model)
0 1
0 1
0 0
0 0
0 1
0 1
# P0(collapsed); no. of rows = no. of classes for model 0
1
# P1(collapsed); no. of rows = no. of classes for model 1
-.5
.5
mat3.txt:
# M; no. of rows = no. of observations
1 0 0
1 0 0
0 1 0
0 1 0
0 0 1
0 0 1
# C; no. of rows = no. of observations and no. of columns = 2 (one for each model)
0 1
0 1
0 1
0 1
0 0
0 0
# P0(collapsed); no. of rows = no. of classes for model 0
1
# P1(collapsed); no. of rows = no. of classes for model 1
-.5
.5
I don't know if I explained it clearly. If anything I said is incorrect, please let me know.
Thank you.
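The five models described in the paper correspond to the five ways of partitioning the three cell types into groups that share a mean: one group, three ways of pairing two types, or three separate groups. The hypothetical Python sketch below enumerates these partitions and prints the model-1 class labels each would place in the second column of C. It assumes the sample order CMP, CMP, GMP, GMP, MEP, MEP. It only produces the labels; how the matching P1 matrices and model comparisons are set up in mmdiff is not covered here.

```python
from typing import Dict, Iterator, List

def set_partitions(items: List[str]) -> Iterator[List[List[str]]]:
    """Yield every split of `items` into non-empty blocks; each block is a
    set of cell types assumed to share one mean."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for part in set_partitions(rest):
        # Add `first` to each existing block in turn ...
        for i in range(len(part)):
            yield part[:i] + [[first] + part[i]] + part[i + 1:]
        # ... or give it a block of its own.
        yield [[first]] + part

def class_labels(part: List[List[str]], samples: List[str]) -> List[int]:
    """Class index of each sample, numbered in order of first appearance."""
    block_of = {ct: b for b, block in enumerate(part) for ct in block}
    relabel: Dict[int, int] = {}
    labels = []
    for s in samples:
        b = block_of[s]
        relabel.setdefault(b, len(relabel))
        labels.append(relabel[b])
    return labels

if __name__ == "__main__":
    samples = ["CMP", "CMP", "GMP", "GMP", "MEP", "MEP"]
    for part in set_partitions(["CMP", "GMP", "MEP"]):
        print(part, class_labels(part, samples))
```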
@eturro Thanks a lot!!
Yes, that looks fine.
On 10 Apr 2018, at 12:42, weiiioyo wrote:
Hi
Sorry to bother you again. I have 8 cell types (2 replicates each), and I want to select cell-specific genes and transcripts based on a 9-model polytomous classification. The 9 models are the null model, under which expression does not differ between cell types, and 8 alternative models, each representing an expression difference between one cell type and the pool of the remaining cell types. My matrices are as follows:
mat1:
# M; no. of rows = no. of observations
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
# C; no. of rows = no. of observations and no. of columns = 2 (one for each model)
0 1
0 1
0 0
0 0
0 0
0 0
0 0
0 0
0 0
0 0
0 0
0 0
0 0
0 0
0 0
0 0
# P0(collapsed); no. of rows = no. of classes for model 0
1
# P1(collapsed); no. of rows = no. of classes for model 1
.5
-.5
mat2:
# M; no. of rows = no. of observations
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
# C; no. of rows = no. of observations and no. of columns = 2 (one for each model)
0 0
0 0
0 1
0 1
0 0
0 0
0 0
0 0
0 0
0 0
0 0
0 0
0 0
0 0
0 0
0 0
# P0(collapsed); no. of rows = no. of classes for model 0
1
# P1(collapsed); no. of rows = no. of classes for model 1
.5
-.5
mat3:
# M; no. of rows = no. of observations
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
# C; no. of rows = no. of observations and no. of columns = 2 (one for each model)
0 0
0 0
0 0
0 0
0 1
0 1
0 0
0 0
0 0
0 0
0 0
0 0
0 0
0 0
0 0
0 0
# P0(collapsed); no. of rows = no. of classes for model 0
1
# P1(collapsed); no. of rows = no. of classes for model 1
.5
-.5
mat4:
# M; no. of rows = no. of observations
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
# C; no. of rows = no. of observations and no. of columns = 2 (one for each model)
0 0
0 0
0 0
0 0
0 0
0 0
0 1
0 1
0 0
0 0
0 0
0 0
0 0
0 0
0 0
0 0
# P0(collapsed); no. of rows = no. of classes for model 0
1
# P1(collapsed); no. of rows = no. of classes for model 1
.5
-.5
mat5:
# M; no. of rows = no. of observations
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
# C; no. of rows = no. of observations and no. of columns = 2 (one for each model)
0 0
0 0
0 0
0 0
0 0
0 0
0 0
0 0
0 1
0 1
0 0
0 0
0 0
0 0
0 0
0 0
# P0(collapsed); no. of rows = no. of classes for model 0
1
# P1(collapsed); no. of rows = no. of classes for model 1
.5
-.5
mat6:
# M; no. of rows = no. of observations
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
# C; no. of rows = no. of observations and no. of columns = 2 (one for each model)
0 0
0 0
0 0
0 0
0 0
0 0
0 0
0 0
0 0
0 0
0 1
0 1
0 0
0 0
0 0
0 0
# P0(collapsed); no. of rows = no. of classes for model 0
1
# P1(collapsed); no. of rows = no. of classes for model 1
.5
-.5
mat7:
# M; no. of rows = no. of observations
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
# C; no. of rows = no. of observations and no. of columns = 2 (one for each model)
0 0
0 0
0 0
0 0
0 0
0 0
0 0
0 0
0 0
0 0
0 0
0 0
0 1
0 1
0 0
0 0
# P0(collapsed); no. of rows = no. of classes for model 0
1
# P1(collapsed); no. of rows = no. of classes for model 1
.5
-.5
mat8:
# M; no. of rows = no. of observations
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
# C; no. of rows = no. of observations and no. of columns = 2 (one for each model)
0 0
0 0
0 0
0 0
0 0
0 0
0 0
0 0
0 0
0 0
0 0
0 0
0 0
0 0
0 1
0 1
# P0(collapsed); no. of rows = no. of classes for model 0
1
# P1(collapsed); no. of rows = no. of classes for model 1
.5
-.5
Are these matrices correct?
Thank you.
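The eight one-versus-rest files all follow the same pattern, so they can be generated instead of typed by hand. Below is a minimal Python sketch. It assumes samples are ordered by cell type with a fixed number of replicates each, and it reuses the M, C, P0 and P1 layout of the files above. The cell-type names and the function `one_vs_rest_files` are hypothetical placeholders.

```python
from typing import Dict, List

def one_vs_rest_files(cell_types: List[str], reps: int) -> Dict[str, str]:
    """One matrices file per cell type: that type (class 1) versus the
    pooled remaining types (class 0), with no nuisance covariates in M."""
    n = len(cell_types) * reps
    files = {}
    for k, _ in enumerate(cell_types):
        lines = ["# M; no. of rows = no. of observations"] + ["0"] * n
        lines.append("# C; no. of rows = no. of observations and no. of columns = 2 (one for each model)")
        for i in range(n):
            # Sample i belongs to cell type i // reps (samples ordered by cell type).
            lines.append("0 1" if i // reps == k else "0 0")
        lines += ["# P0(collapsed); no. of rows = no. of classes for model 0", "1"]
        lines.append("# P1(collapsed); no. of rows = no. of classes for model 1")
        # P1 row for class 0 (pooled rest), then for class 1 (the target cell type).
        lines += [".5", "-.5"]
        files[f"mat{k + 1}.txt"] = "\n".join(lines) + "\n"
    return files

if __name__ == "__main__":
    names = [f"type{i}" for i in range(1, 9)]  # hypothetical cell-type names
    for name, text in one_vs_rest_files(names, reps=2).items():
        print(name, text.count("0 1"), "samples in class 1")
```

With these arguments, the C column of `"mat2.txt"` places the third and fourth samples in class 1, matching mat2 above.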
Thanks.
Hi,
1.
I suggest you read the paper for details on what the different matrices represent. The M matrix is a nuisance covariate matrix present under both models. In your "mat1", you are grouping the samples into three groups with the M matrix. However, the first group corresponds exactly to one of the two conditions, so there may be some confounding going on. Although I don't know the details of your experiment, my impression is that you should use "mat2" here. That is the same as running mmdiff with "-de 2 4", by the way.
2.
Here, "mat1" and "mat2" encode exactly the same model comparison. The signs of eta1_0 will be inverted and there will be slight differences because of stochasticity in the MCMC, but that's all. This is the same as running mmdiff with "-de 2 2", by the way.
Try making scatterplots of the posterior probabilities and of eta1_0 (for features with posterior probability greater than some threshold such as 0.2); you should find that the points lie tightly around the diagonal and the anti-diagonal, respectively.
bw
Ernest
On 8 May 2018, at 16:36, weiiioyo wrote:
@eturro Hi,
Thanks for your help; I have run mmdiff successfully. But the results look a little strange, so I have some questions for you.
• Actually, I don't know what M, C, P0 and P1 represent, respectively. I used two different matrices before, and their results were very different.
mat1:
# M; no. of rows = no. of observations
1 0 0
1 0 0
0 1 0
0 1 0
0 0 1
0 0 1
# C; no. of rows = no. of observations and no. of columns = 2 (one for each model)
0 0
0 0
0 1
0 1
0 1
0 1
# P0(collapsed); no. of rows = no. of classes for model 0
1
# P1(collapsed); no. of rows = no. of classes for model 1
.5
-.5
mat2:
# M; no. of rows = no. of observations
0
0
0
0
0
0
# C; no. of rows = no. of observations and no. of columns = 2 (one for each model)
0 0
0 0
0 1
0 1
0 1
0 1
# P0(collapsed); no. of rows = no. of classes for model 0
1
# P1(collapsed); no. of rows = no. of classes for model 1
.5
-.5
What's the difference between mat1 and mat2? When should I use mat1 and when mat2?
• I used two matrices files (which I think should generate the same results) to compare two groups, and I considered features with a posterior probability for the second model greater than 0.5 to be differentially expressed.
mat1:
# M; no. of rows = no. of observations
0
0
0
0
# C; no. of rows = no. of observations and no. of columns = 2 (one for each model)
0 1
0 1
0 0
0 0
# P0(collapsed); no. of rows = no. of classes for model 0
1
# P1(collapsed); no. of rows = no. of classes for model 1
.5
-.5
mat2:
# M; no. of rows = no. of observations
0
0
0
0
# C; no. of rows = no. of observations and no. of columns = 2 (one for each model)
0 0
0 0
0 1
0 1
# P0(collapsed); no. of rows = no. of classes for model 0
1
# P1(collapsed); no. of rows = no. of classes for model 1
.5
-.5
However, their results were different. For example:
the result of mat1:
feature_id bayes_factor posterior_probability alpha0 alpha1 eta1_0 mu_HSC_1.gene mu_HSC_2.gene mu_MPP_1.gene mu_MPP_2.gene sd_HSC_1.gene sd_HSC_2.gene sd_MPP_1.gene sd_MPP_2.gene
ENSMUSG00000002985.16 8.35589 0.481444 4.36426 4.40582 -1.6944 5.6799 5.21409 3.42422 3.42422 0.0104655 0.0133535 0.0324889 0.0324889
ENSMUSG00000034282.3 8.55887 0.487439 0.0520896 0.0505148 1.37941 -0.721467 -0.702619 0.80822 0.80822 0.0702831 0.0771662 0.0332276 0.0332276
the result of mat2:
feature_id bayes_factor posterior_probability alpha0 alpha1 eta1_0 mu_HSC_1.gene mu_HSC_2.gene mu_MPP_1.gene mu_MPP_2.gene sd_HSC_1.gene sd_HSC_2.gene sd_MPP_1.gene sd_MPP_2.gene
ENSMUSG00000029596.13 9.0208 0.500577 2.53767 2.54621 1.25705 3.21732 3.28218 1.85884 1.85884 0.0234526 0.025066 0.0503775 0.0503775
ENSMUSG00000034282.3 9.09749 0.502694 0.0778652 0.0454626 -1.3604 -0.721467 -0.702619 0.80822 0.80822 0.0702831 0.0771662 0.0332276 0.0332276
In the result of mat1, "ENSMUSG00000002985.16" and "ENSMUSG00000034282.3" are considered differentially expressed, but in the result of mat2 they are not.
Maybe I am missing some important detail, because this confuses me.
Sorry for disturbing you again.
Thanks!
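The confounding described in point 1 of the reply can be checked directly. The sketch below assumes mmdiff's mean model is linear in the columns of M, plus a per-model intercept (suggested by the alpha0/alpha1 columns in the output), plus a contrast built from C and P1. Under that assumption, if the contrast lies in the column space of [intercept | M], the data cannot separate the condition effect from the nuisance covariates. This is a generic linear-algebra identifiability check, not an mmdiff diagnostic, and the helper names are hypothetical. The same `contrast` helper also illustrates point 2: swapping the class labels in C while keeping P1 = (.5, -.5) negates every sample's coefficient, so eta1_0 changes sign.

```python
from fractions import Fraction
from typing import List, Sequence

def rank(rows: List[List[Fraction]]) -> int:
    """Rank of a small matrix via exact Gaussian elimination over fractions."""
    m = [list(r) for r in rows]
    ncols = len(m[0]) if m else 0
    r = 0
    for c in range(ncols):
        # Find a row at or below r with a non-zero entry in column c.
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        # Eliminate column c from every other row.
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c] / m[r][c]
                m[i] = [x - f * y for x, y in zip(m[i], m[r])]
        r += 1
    return r

def contrast(labels: Sequence[int], p1: Sequence[float]) -> List[Fraction]:
    """Per-sample coefficient of the single eta parameter: the P1 row of each sample's class."""
    return [Fraction(p1[k]) for k in labels]

def confounded(M: Sequence[Sequence[float]], c: Sequence[Fraction]) -> bool:
    """True if the contrast lies in the span of an intercept plus the columns of M."""
    base = [[Fraction(1)] + [Fraction(x) for x in row] for row in M]
    aug = [row + [x] for row, x in zip(base, c)]
    return rank(aug) == rank(base)

if __name__ == "__main__":
    c = contrast([0, 0, 1, 1, 1, 1], [0.5, -0.5])           # C column 2 and P1 of mat1/mat2
    M_groups = [[1, 0, 0], [1, 0, 0], [0, 1, 0], [0, 1, 0], [0, 0, 1], [0, 0, 1]]
    M_zero = [[0] for _ in range(6)]
    print("mat1 confounded:", confounded(M_groups, c))
    print("mat2 confounded:", confounded(M_zero, c))
```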