"NA"s in dmltest output #20

milaeri · 2022-01-14T21:24:31Z

Hi,
I have a question about result output:

I recently fit a model with three treatments using the DMLfit.multiFactor function, and then ran DMLtest.multiFactor to test for the effect of each treatment. Long story short, I'm noticing that the output tables from each test contain NAs under stat, pvals, and fdrs for particular sets of chrs (not all of them). Does this just mean that the model was not able to detect any difference at these particular loci/positions? I would like some confirmation or explanation (in case I'm doing something wrong) so I can move on to the next stage of the analysis. Your help is much appreciated, thanks!

             chr pos stat pvals fdrs
1 NW_019289415.1  69   NA    NA   NA
2 NW_019289415.1 134   NA    NA   NA
3 NW_019289415.1 154   NA    NA   NA
4 NW_019289415.1 178   NA    NA   NA
5 NW_019289415.1 186   NA    NA   NA
6 NW_019289415.1 244   NA    NA   NA

haowulab · 2022-01-14T22:06:15Z

I have never encountered this. Are there only a few rows with NA, or are all results NA? It might be that some sites have missing data, or that the variance is 0.

milaeri · 2022-01-14T22:28:07Z

Hi Hao,

Thanks for the quick response. About 60% of the output is NAs (out of ~10e106 rows), so not all of it, but a good amount. Could sites that were not found under all treatments have caused it to report NA? Or does the DSS model already account for that prior to testing for main effects? Just thinking out loud, since I'm curious about how that works.

Thanks,
Erika

haowulab · 2022-01-14T23:33:08Z

I'm not sure. Can you check whether there are missing data? It's possible that some sites have missing data for some experimental factors, so that the regression can't run. DSS doesn't filter data. It keeps everything and reports a result whenever possible.
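One way to run that check can be sketched as follows. This is a language-agnostic illustration in Python, not DSS code: the per-sample site lists are made-up toy values, `coverage_per_site` is a hypothetical helper, and it assumes that a site absent from a sample's table counts as missing for that sample.

```python
from collections import Counter

# Toy per-sample site lists as (chr, pos) pairs; the values are invented
# for illustration and are not taken from the real data set.
samples = [
    [("NW_019289415.1", 69), ("NW_019289415.1", 134), ("NW_019289415.1", 812)],
    [("NW_019289415.1", 812), ("NW_019289415.1", 828)],
    [("NW_019289415.1", 735), ("NW_019289415.1", 812), ("NW_019289415.1", 828)],
]


def coverage_per_site(samples):
    """Count, for every site seen in any sample, how many samples report it."""
    counts = Counter()
    for sites in samples:
        counts.update(set(sites))  # set() guards against duplicated rows
    return counts


counts = coverage_per_site(samples)
union_size = len(counts)  # number of distinct sites across all samples
fully_covered = sorted(s for s, n in counts.items() if n == len(samples))

print("sites in union:", union_size)
print("sites present in every sample:", fully_covered)
```

A site that appears in only some samples has missing values for the rest, so a large gap between the size of the union and the number of fully covered sites would point to missing data.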

milaeri · 2022-01-17T19:49:31Z

Hi Hao,

I don't see any missing cases in my data, but I do notice that not all sites are equally represented in each sample (see below). Do you think that could be the issue?
Just to give some background, I'm working with RRBS data from a non-model insect (the genome is not as highly methylated as in mammals), so there are a lot of zeros in my data (see below). I'm also working with scaffolds instead of chromosomes, since the genome assembly is still in progress for this species.

List of 24 samples
$ :'data.frame': 3504291 obs. of 4 variables:
..$ chr: chr [1:3504291] "NW_019289415.1" "NW_019289415.1" "NW_019289415.1" "NW_019289415.1" ...
..$ pos: int [1:3504291] 69 134 154 178 186 244 265 380 729 734 ...
..$ X : int [1:3504291] 0 0 0 0 0 0 0 0 0 0 ...
..$ N : int [1:3504291] 1 1 1 1 1 1 1 1 1 1 ...
$ :'data.frame': 3174646 obs. of 4 variables:
..$ chr: chr [1:3174646] "NW_019289415.1" "NW_019289415.1" "NW_019289415.1" "NW_019289415.1" ...
..$ pos: int [1:3174646] 812 828 883 884 892 893 925 926 934 935 ...
..$ X : int [1:3174646] 0 0 0 0 0 0 0 0 0 0 ...
..$ N : int [1:3174646] 4 4 2 12 2 12 2 12 2 12 ...
$ :'data.frame': 2854266 obs. of 4 variables:
..$ chr: chr [1:2854266] "NW_019289415.1" "NW_019289415.1" "NW_019289415.1" "NW_019289415.1" ...
..$ pos: int [1:2854266] 735 811 812 827 828 883 884 892 893 925 ...
..$ X : int [1:2854266] 0 0 0 0 0 0 0 0 0 0 ...
..$ N : int [1:2854266] 1 1 11 1 11 2 25 2 25 1 ...

haowulab · 2022-01-18T13:46:46Z

"not all sites are equally represented for each sample" means there are missing data. DSS combines all data for all sites. If some samples don't have coverage for some sites, they are deemed as missing. However, based on your data, there shouldn't be 60% sites with missing data. Can you send me a small portion of data, such as chr21, so that I can try it?

milaeri · 2022-01-18T21:48:00Z

Hi Hao,
Thanks for clarifying. Sure, I can send you a subset of the data. What is your email address?

haowulab · 2022-01-18T21:52:34Z

It's better to put the data and your analysis script on a cloud drive such as Dropbox, so that I can download them. I don't think they'll fit in an email.

milaeri · 2022-01-18T22:18:07Z

Hi Hao,
Yes, but I think I still need your email to send you a Dropbox link.

Erika

milaeri · 2022-01-18T22:27:40Z

Perhaps Google Drive may work; let me know if it doesn't. https://drive.google.com/drive/folders/1s8OLMePJ_JsvHnPkZP34O8kfbI6VOP6S?usp=sharing

Thanks,
Erika

haowulab · 2022-01-20T10:57:05Z

I looked at your data. It is indeed caused by missing data. In the data you sent me, there are 24 samples, each with 3000-4000 CG sites. makeBSseqData combines all the data and returns an object with 10161 methylation loci (the union of all CG sites in the inputs). Many of them have data from only a few samples, so a regression cannot be performed. The function returned non-NA values for 4197 sites, which are the ones where a model can be fit. This is a reasonable result. You can just ignore and filter out the sites with NA values.
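For scale, 4197 of 10161 loci is about 41%, which leaves about 59% as NA, in line with the roughly 60% reported earlier in the thread. The filtering step itself can be sketched as below. This is a minimal Python illustration rather than DSS code: the rows and numbers are invented, NaN stands in for R's NA, and `drop_untested` is a hypothetical helper name.

```python
import math

nan = float("nan")

# Hypothetical result rows mimicking the chr/pos/stat/pvals/fdrs columns.
# All numeric values are invented; NaN plays the role of R's NA.
results = [
    {"chr": "NW_019289415.1", "pos": 69, "stat": nan, "pvals": nan, "fdrs": nan},
    {"chr": "NW_019289415.1", "pos": 812, "stat": 1.8, "pvals": 0.07, "fdrs": 0.20},
    {"chr": "NW_019289415.1", "pos": 828, "stat": -0.4, "pvals": 0.69, "fdrs": 0.80},
    {"chr": "NW_019289415.1", "pos": 134, "stat": nan, "pvals": nan, "fdrs": nan},
]


def drop_untested(rows):
    """Keep only rows whose test statistic could be computed (not NaN)."""
    return [row for row in rows if not math.isnan(row["stat"])]


tested = drop_untested(results)
share_tested = len(tested) / len(results)

print([row["pos"] for row in tested])
print(f"{share_tested:.0%} of sites were testable")
```

Filtering on the statistic is enough under the assumption that the p-value and FDR columns are NA exactly where the statistic is, as in the table at the top of the thread.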

milaeri · 2022-01-20T18:07:11Z

Hi Hao,

Thank you for the help! This makes a lot of sense now. I'm glad I can move on with the analysis.

Best,
E
