designed a matrix for identifying genes that are differential expression over all groups #28

Open
weiiioyo opened this issue Mar 29, 2018 · 7 comments

Comments


weiiioyo commented Mar 29, 2018

Hi,
I recently read a paper called "Transcriptional diversity during lineage commitment of human blood progenitors". In the paper, the authors used mmdiff to find cell-specific genes and transcripts, and I want to reproduce their results.
I have three groups: CMP, GMP and MEP, with 2 replicates in each. The paper describes its models as follows: "The simplest model assumes that the mean expression level is the same across cell types. The most complex model assumes that the mean expression level is different for each cell type. The remaining three models assume that two of the three cell types have the same mean expression level." I have created three matrices according to your answer in "Setting up matrix and identifying specific gene clusters", but I have no idea how to design "the most complex model". These are the matrices that I have used:
mat1.txt:
# M; no. of rows = no. of observations
1 0 0
1 0 0
0 1 0
0 1 0
0 0 1
0 0 1
# C; no. of rows = no. of observations and no. of columns = 2 (one for each model)
0 0
0 0
0 1
0 1
0 1
0 1
# P0(collapsed); no. of rows = no. of classes for model 0
1
# P1(collapsed); no. of rows = no. of classes for model 1
.5
-.5

mat2.txt:
# M; no. of rows = no. of observations
1 0 0
1 0 0
0 1 0
0 1 0
0 0 1
0 0 1
# C; no. of rows = no. of observations and no. of columns = 2 (one for each model)
0 1
0 1
0 0
0 0
0 1
0 1
# P0(collapsed); no. of rows = no. of classes for model 0
1
# P1(collapsed); no. of rows = no. of classes for model 1
.5
-.5

mat3.txt:
# M; no. of rows = no. of observations
1 0 0
1 0 0
0 1 0
0 1 0
0 0 1
0 0 1
# C; no. of rows = no. of observations and no. of columns = 2 (one for each model)
0 1
0 1
0 1
0 1
0 0
0 0
# P0(collapsed); no. of rows = no. of classes for model 0
1

# P1(collapsed); no. of rows = no. of classes for model 1
.5
-.5
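A quick way to catch layout slips in files like these is to check the row counts against the rules stated in the `#` header comments. The sketch below is only that: it assumes the headers describe the only shape constraints, and it assumes the number of classes for a model equals the number of distinct labels in that model's column of C. The function names `parse_matrices` and `check_matrices` are hypothetical helpers, not part of mmdiff, and passing the check says nothing about whether the models themselves are the right ones.

```python
# mat1.txt from the post above, used as the example input.
MAT1 = """\
# M; no. of rows = no. of observations
1 0 0
1 0 0
0 1 0
0 1 0
0 0 1
0 0 1
# C; no. of rows = no. of observations and no. of columns = 2 (one for each model)
0 0
0 0
0 1
0 1
0 1
0 1
# P0(collapsed); no. of rows = no. of classes for model 0
1
# P1(collapsed); no. of rows = no. of classes for model 1
.5
-.5
"""


def parse_matrices(text):
    """Split a matrices file into blocks keyed by the name before ';' in each header."""
    blocks, name = {}, None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # tolerate stray blank lines
        if line.startswith("#"):
            name = line.lstrip("#").split(";")[0].strip()
            blocks[name] = []
        elif name is not None:
            blocks[name].append([float(tok) for tok in line.split()])
    return blocks


def check_matrices(blocks):
    """Return a list of shape problems; an empty list means the headers' rules hold."""
    problems = []
    n_obs = len(blocks["M"])
    if len(blocks["C"]) != n_obs:
        problems.append(f"C has {len(blocks['C'])} rows, M has {n_obs}")
    for model in (0, 1):
        # Assumption: classes = distinct labels in this model's column of C.
        n_classes = len({row[model] for row in blocks["C"]})
        n_rows = len(blocks[f"P{model}(collapsed)"])
        if n_rows != n_classes:
            problems.append(
                f"P{model} has {n_rows} rows, C column {model} has {n_classes} classes"
            )
    return problems


print(check_matrices(parse_matrices(MAT1)))
```

The same two calls can be pointed at the text of mat2.txt and mat3.txt.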

I don't know if I have explained this clearly. If anything I said is incorrect, please let me know.
Thank you.

weiiioyo changed the title from "esigned a matrix for identifying genes that are differential expression over all groups" to "designed a matrix for identifying genes that are differential expression over all groups" on Mar 30, 2018
Owner

eturro commented Apr 8, 2018 via email

Author

weiiioyo commented Apr 9, 2018

@eturro Thanks a lot!!

Author

weiiioyo commented Apr 10, 2018

Hi,
Sorry for bothering you again. I have 8 cell types (2 replicates each), and I want to select cell-specific genes and transcripts based on a 9-model polytomous classification. The 9 models comprise the null model, under which expression does not differ between cell types, and 8 alternative models, each representing an expression difference in one cell type against the pool of the remaining cell types. My matrices are as follows:
mat1:

# M; no. of rows = no. of observations
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
# C; no. of rows = no. of observations and no. of columns = 2 (one for each model)
0 1
0 1
0 0
0 0
0 0
0 0
0 0
0 0
0 0
0 0
0 0
0 0
0 0
0 0
0 0
0 0
# P0(collapsed); no. of rows = no. of classes for model 0
1
# P1(collapsed); no. of rows = no. of classes for model 1
.5
-.5

mat2:

# M; no. of rows = no. of observations
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
# C; no. of rows = no. of observations and no. of columns = 2 (one for each model)
0 0
0 0
0 1
0 1
0 0
0 0
0 0
0 0
0 0
0 0
0 0
0 0
0 0
0 0
0 0
0 0
# P0(collapsed); no. of rows = no. of classes for model 0
1
# P1(collapsed); no. of rows = no. of classes for model 1
.5
-.5

mat3:

# M; no. of rows = no. of observations
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
# C; no. of rows = no. of observations and no. of columns = 2 (one for each model)
0 0
0 0
0 0
0 0
0 1
0 1
0 0
0 0
0 0
0 0
0 0
0 0
0 0
0 0
0 0
0 0
# P0(collapsed); no. of rows = no. of classes for model 0
1
# P1(collapsed); no. of rows = no. of classes for model 1
.5
-.5

mat4:

# M; no. of rows = no. of observations
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
# C; no. of rows = no. of observations and no. of columns = 2 (one for each model)
0 0
0 0
0 0
0 0
0 0
0 0
0 1
0 1
0 0
0 0
0 0
0 0
0 0
0 0
0 0
0 0
# P0(collapsed); no. of rows = no. of classes for model 0
1
# P1(collapsed); no. of rows = no. of classes for model 1
.5
-.5

mat5:

# M; no. of rows = no. of observations
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
# C; no. of rows = no. of observations and no. of columns = 2 (one for each model)
0 0
0 0
0 0
0 0
0 0
0 0
0 0
0 0
0 1
0 1
0 0
0 0
0 0
0 0
0 0
0 0
# P0(collapsed); no. of rows = no. of classes for model 0
1
# P1(collapsed); no. of rows = no. of classes for model 1
.5
-.5

mat6:

# M; no. of rows = no. of observations
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
# C; no. of rows = no. of observations and no. of columns = 2 (one for each model)
0 0
0 0
0 0
0 0
0 0
0 0
0 0
0 0
0 0
0 0
0 1
0 1
0 0
0 0
0 0
0 0
# P0(collapsed); no. of rows = no. of classes for model 0
1
# P1(collapsed); no. of rows = no. of classes for model 1
.5
-.5

mat7:

# M; no. of rows = no. of observations
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
# C; no. of rows = no. of observations and no. of columns = 2 (one for each model)
0 0
0 0
0 0
0 0
0 0
0 0
0 0
0 0
0 0
0 0
0 0
0 0
0 1
0 1
0 0
0 0
# P0(collapsed); no. of rows = no. of classes for model 0
1
# P1(collapsed); no. of rows = no. of classes for model 1
.5
-.5

mat8:

# M; no. of rows = no. of observations
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
# C; no. of rows = no. of observations and no. of columns = 2 (one for each model)
0 0
0 0
0 0
0 0
0 0
0 0
0 0
0 0
0 0
0 0
0 0
0 0
0 0
0 0
0 1
0 1
# P0(collapsed); no. of rows = no. of classes for model 0
1
# P1(collapsed); no. of rows = no. of classes for model 1
.5
-.5

Are these matrices correct?
Thank you.
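
The eight files above differ only in which pair of rows in the second column of C is set to 1, so they can be generated rather than typed by hand. The following is a minimal sketch that reproduces the layout shown in this comment; whether that layout is the right one for mmdiff is exactly the question being asked, so treat it as a way to avoid copy-paste slips and nothing more. The function name `one_vs_rest_matrices` and the `matN` keys are illustrative assumptions.

```python
def one_vs_rest_matrices(n_types, n_reps, target):
    """Build the text of one matrices file in which cell type `target`
    (0-based) is flagged 1 in the second column of C and all others are 0."""
    n_obs = n_types * n_reps
    lines = ["# M; no. of rows = no. of observations"]
    lines += ["0"] * n_obs
    lines.append(
        "# C; no. of rows = no. of observations and no. of columns = 2 "
        "(one for each model)"
    )
    for cell_type in range(n_types):
        flag = 1 if cell_type == target else 0
        # Replicates of the same cell type occupy consecutive rows.
        lines += [f"0 {flag}"] * n_reps
    lines += [
        "# P0(collapsed); no. of rows = no. of classes for model 0",
        "1",
        "# P1(collapsed); no. of rows = no. of classes for model 1",
        ".5",
        "-.5",
    ]
    return "\n".join(lines) + "\n"


# One file body per cell type, keyed mat1 .. mat8 as in the comment above.
files = {f"mat{t + 1}": one_vs_rest_matrices(8, 2, t) for t in range(8)}
print(files["mat3"])
```

Writing each entry of `files` to disk is left out on purpose, since the file names mmdiff is run with are not given in this thread.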

Owner

eturro commented Apr 10, 2018 via email

Author

weiiioyo commented Apr 17, 2018 via email

Author

weiiioyo commented May 8, 2018

@eturro Hi,
Thanks for your help, I have run mmdiff successfully. But the results look a little strange, so I have some questions for you.

  1. I don't actually know what M, C, P0 and P1 each represent. I previously used two different matrix files, and their results were very different.
mat1:
# M; no. of rows = no. of observations
1 0 0
1 0 0
0 1 0
0 1 0
0 0 1
0 0 1
# C; no. of rows = no. of observations and no. of columns = 2 (one for each model)
0 0
0 0
0 1
0 1
0 1
0 1
# P0(collapsed); no. of rows = no. of classes for model 0
1
# P1(collapsed); no. of rows = no. of classes for model 1
.5
-.5

mat2:

# M; no. of rows = no. of observations
0
0
0 
0
0
0
# C; no. of rows = no. of observations and no. of columns = 2 (one for each model)
0 0
0 0
0 1
0 1
0 1
0 1
# P0(collapsed); no. of rows = no. of classes for model 0
1
# P1(collapsed); no. of rows = no. of classes for model 1
.5
-.5

What's the difference between mat1 and mat2? When should I use mat1, and when mat2?

  2. I used two matrix files, which I thought would generate the same results, to compare two groups. I considered features with a posterior probability for the second model greater than 0.5 to be differentially expressed.

mat1:

# M; no. of rows = no. of observations
0
0
0
0
# C; no. of rows = no. of observations and no. of columns = 2 (one for each model)
0 1
0 1
0 0
0 0
# P0(collapsed); no. of rows = no. of classes for model 0
1
# P1(collapsed); no. of rows = no. of classes for model 1
.5
-.5

mat2:

# M; no. of rows = no. of observations
0
0
0
0
# C; no. of rows = no. of observations and no. of columns = 2 (one for each model)
0 0
0 0
0 1
0 1
# P0(collapsed); no. of rows = no. of classes for model 0
1
# P1(collapsed); no. of rows = no. of classes for model 1
.5
-.5

However, their results were different. For example:
the result of mat1:
feature_id bayes_factor posterior_probability alpha0 alpha1 eta1_0 mu_HSC_1.gene mu_HSC_2.gene mu_MPP_1.gene mu_MPP_2.gene sd_HSC_1.gene sd_HSC_2.gene sd_MPP_1.gene sd_MPP_2.gene
ENSMUSG00000002985.16 8.35589 0.481444 4.36426 4.40582 -1.6944 5.6799 5.21409 3.42422 3.42422 0.0104655 0.0133535 0.0324889 0.0324889
ENSMUSG00000034282.3 8.55887 0.487439 0.0520896 0.0505148 1.37941 -0.721467 -0.702619 0.80822 0.80822 0.0702831 0.0771662 0.0332276 0.0332276

the result of mat2:
feature_id bayes_factor posterior_probability alpha0 alpha1 eta1_0 mu_HSC_1.gene mu_HSC_2.gene mu_MPP_1.gene mu_MPP_2.gene sd_HSC_1.gene sd_HSC_2.gene sd_MPP_1.gene sd_MPP_2.gene
ENSMUSG00000029596.13 9.0208 0.500577 2.53767 2.54621 1.25705 3.21732 3.28218 1.85884 1.85884 0.0234526 0.025066 0.0503775 0.0503775
ENSMUSG00000034282.3 9.09749 0.502694 0.0778652 0.0454626 -1.3604 -0.721467 -0.702619 0.80822 0.80822 0.0702831 0.0771662 0.0332276 0.0332276

In the mat1 result, "ENSMUSG00000002985.16" and "ENSMUSG00000034282.3" are considered differentially expressed, but in the mat2 result they are not.
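
The 0.5 rule described in point 2 can be applied mechanically to the rows quoted above. The sketch below assumes only that mmdiff output is whitespace-separated with a header line, as in the excerpts; the helper names `parse_mmdiff_table` and `called_de` are hypothetical, and the data are just the four rows shown in this comment.

```python
HEADER = (
    "feature_id bayes_factor posterior_probability alpha0 alpha1 eta1_0 "
    "mu_HSC_1.gene mu_HSC_2.gene mu_MPP_1.gene mu_MPP_2.gene "
    "sd_HSC_1.gene sd_HSC_2.gene sd_MPP_1.gene sd_MPP_2.gene"
)

# Rows copied from "the result of mat1" above.
MAT1_RESULT = HEADER + """
ENSMUSG00000002985.16 8.35589 0.481444 4.36426 4.40582 -1.6944 5.6799 5.21409 3.42422 3.42422 0.0104655 0.0133535 0.0324889 0.0324889
ENSMUSG00000034282.3 8.55887 0.487439 0.0520896 0.0505148 1.37941 -0.721467 -0.702619 0.80822 0.80822 0.0702831 0.0771662 0.0332276 0.0332276
"""

# Rows copied from "the result of mat2" above.
MAT2_RESULT = HEADER + """
ENSMUSG00000029596.13 9.0208 0.500577 2.53767 2.54621 1.25705 3.21732 3.28218 1.85884 1.85884 0.0234526 0.025066 0.0503775 0.0503775
ENSMUSG00000034282.3 9.09749 0.502694 0.0778652 0.0454626 -1.3604 -0.721467 -0.702619 0.80822 0.80822 0.0702831 0.0771662 0.0332276 0.0332276
"""


def parse_mmdiff_table(text):
    """Turn a whitespace-separated table with a header line into a list of dicts."""
    lines = [line for line in text.splitlines() if line.strip()]
    header = lines[0].split()
    return [dict(zip(header, line.split())) for line in lines[1:]]


def called_de(rows, threshold=0.5):
    """Return feature IDs whose posterior probability is strictly above the threshold."""
    return [
        row["feature_id"]
        for row in rows
        if float(row["posterior_probability"]) > threshold
    ]


for label, text in (("mat1", MAT1_RESULT), ("mat2", MAT2_RESULT)):
    rows = parse_mmdiff_table(text)
    print(label, called_de(rows))
    # Show the probabilities alongside, since they all sit close to 0.5.
    for row in rows:
        print("  ", row["feature_id"], row["posterior_probability"])
```

Printing the probabilities next to the calls makes it easy to see how close each feature is to the cutoff in each run.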

Maybe I am overlooking an important detail, which is why I am confused.
Sorry for disturbing you again.
Thanks!

Owner

eturro commented May 13, 2018 via email
