Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

pair.ia calculating p-values? #180

Closed
3 of 5 tasks
ktiedje opened this issue Apr 26, 2018 · 14 comments
Closed
3 of 5 tasks

pair.ia calculating p-values? #180

ktiedje opened this issue Apr 26, 2018 · 14 comments
Assignees

Comments

@ktiedje
Copy link

ktiedje commented Apr 26, 2018

Please place an "x" in all the boxes that apply

  • I have the most recent version of poppr and R
  • I have found a bug
  • I want to request a new feature

When I undertake SNP pairwise LD analysis (ia.pair), the plot produced provides the Ia and/or rd values. Although these are useful, I was wondering if there is a way to also provide the p-values for these multiple SNP pairwise comparison easily without having to generate all the individual pairwise SNP input files. (Similar to what is provided for the ia command).

Any assistance would be greatly appreciated.

Cheers Kathryn

ia(PopPfcrt, sample=10000, index = "Ia")
pair.ia(PopPfcrt, index = "Ia")
ia(PopPfcrt, sample=10000, index = "rbarD")
pair.ia(PopPfcrt, index = "rbarD")

@zkamvar
Copy link
Member

zkamvar commented Apr 29, 2018

Hello,

I believe this should be possible. I'll implement it in the next version of poppr (version 2.8). In the meantime, you can calculate the index of association for each pair of loci manually by using the combn() function to generate pairs of locus names and running ia() on your data set subset with those names. Here's an example:

suppressPackageStartupMessages(library("poppr"))
data("partial_clone")
pairs <- combn(locNames(partial_clone), 2)
pairs[, 1:5]
#>      [,1]      [,2]      [,3]      [,4]      [,5]     
#> [1,] "Locus_1" "Locus_1" "Locus_1" "Locus_1" "Locus_1"
#> [2,] "Locus_2" "Locus_3" "Locus_4" "Locus_5" "Locus_6"
res <- t(apply(pairs, 2, function(i) ia(partial_clone[loc = i], sample = 9, quiet = TRUE, plot = FALSE)))
rownames(res) <- apply(pairs, 2, paste, collapse = ":")
print(res, digits = 2)
#>                       Ia p.Ia   rbarD p.rD
#> Locus_1:Locus_2  -0.0079  0.3 -0.0083  0.3
#> Locus_1:Locus_3   0.2693  0.1  0.2748  0.1
#> Locus_1:Locus_4   0.3209  0.1  0.3375  0.1
#> Locus_1:Locus_5  -0.0036  0.6 -0.0038  0.6
#> Locus_1:Locus_6   0.1339  0.2  0.1378  0.2
#> Locus_1:Locus_7   0.5044  0.1  0.5415  0.1
#> Locus_1:Locus_8   0.2976  0.1  0.3024  0.1
#> Locus_1:Locus_9   0.3061  0.1  0.3272  0.1
#> Locus_1:Locus_10  0.2947  0.1  0.3070  0.1
#> Locus_2:Locus_3   0.0965  0.1  0.0969  0.1
#> Locus_2:Locus_4   0.1219  0.1  0.1219  0.1
#> Locus_2:Locus_5   0.0522  0.1  0.0522  0.1
#> Locus_2:Locus_6   0.3716  0.1  0.3721  0.1
#> Locus_2:Locus_7   0.0233  0.3  0.0234  0.3
#> Locus_2:Locus_8  -0.0263  0.9 -0.0265  0.9
#> Locus_2:Locus_9   0.0544  0.3  0.0545  0.3
#> Locus_2:Locus_10  0.1418  0.1  0.1418  0.1
#> Locus_3:Locus_4   0.2062  0.1  0.2077  0.1
#> Locus_3:Locus_5   0.1016  0.1  0.1025  0.1
#> Locus_3:Locus_6   0.1510  0.1  0.1511  0.1
#> Locus_3:Locus_7   0.4572  0.1  0.4646  0.1
#> Locus_3:Locus_8   0.3059  0.1  0.3060  0.1
#> Locus_3:Locus_9   0.2718  0.1  0.2756  0.1
#> Locus_3:Locus_10  0.0836  0.2  0.0839  0.2
#> Locus_4:Locus_5  -0.0173  0.5 -0.0173  0.5
#> Locus_4:Locus_6   0.1858  0.1  0.1864  0.1
#> Locus_4:Locus_7   0.3802  0.1  0.3809  0.1
#> Locus_4:Locus_8   0.2261  0.1  0.2284  0.1
#> Locus_4:Locus_9   0.2488  0.1  0.2491  0.1
#> Locus_4:Locus_10  0.2003  0.1  0.2004  0.1
#> Locus_5:Locus_6   0.0354  0.2  0.0356  0.2
#> Locus_5:Locus_7   0.1985  0.1  0.1987  0.1
#> Locus_5:Locus_8   0.1840  0.1  0.1863  0.1
#> Locus_5:Locus_9   0.3124  0.1  0.3125  0.1
#> Locus_5:Locus_10 -0.0124  0.6 -0.0124  0.6
#> Locus_6:Locus_7   0.2634  0.1  0.2660  0.1
#> Locus_6:Locus_8   0.1850  0.1  0.1854  0.1
#> Locus_6:Locus_9   0.2458  0.1  0.2478  0.1
#> Locus_6:Locus_10  0.1818  0.1  0.1820  0.1
#> Locus_7:Locus_8   0.4488  0.1  0.4581  0.1
#> Locus_7:Locus_9   0.4880  0.1  0.4880  0.1
#> Locus_7:Locus_10  0.4579  0.1  0.4599  0.1
#> Locus_8:Locus_9   0.2900  0.1  0.2953  0.1
#> Locus_8:Locus_10  0.2831  0.1  0.2848  0.1
#> Locus_9:Locus_10  0.3018  0.1  0.3028  0.1

Created on 2018-04-29 by the reprex package (v0.2.0).

@zkamvar
Copy link
Member

zkamvar commented Apr 29, 2018

In addition, for the code you presented, there is no need to run ia() twice with different index inputs. it will automatically calculate both indices. The index indicates which one is plotted. If you want to visualize both, you can set the option valuereturn = TRUE:

res <- ia(PopPfcrt, sample=10000, valuereturn = TRUE)
plot(res, index = "Ia")
plot(res, index = "rbarD")

@zkamvar
Copy link
Member

zkamvar commented Apr 30, 2018

I forgot to mention: I will let you know when a version of the function is available so you can test it out. What system do you run? (MacOS, Windows, Linux)

Also, the current visualization of pairwise ia presents the values on a lower triangle of a square heatmap. I'm thinking of placing the p-values on the upper triangle. Would that presentation work for you?

Best,
Zhian

@ktiedje
Copy link
Author

ktiedje commented Apr 30, 2018 via email

@zkamvar
Copy link
Member

zkamvar commented Apr 30, 2018

It would also be good if the data in the heat map could also be exported as a file as well (data.frame) so that it can be manipulated in other plotting programs.

The output of pair.ia() provides a data frame of the data by default. You can also extract the reshaped data from the visualization with ggplot2:

library("ggplot2")
p <- last_plot()
p$data

@ktiedje
Copy link
Author

ktiedje commented May 1, 2018 via email

zkamvar added a commit that referenced this issue May 1, 2018
Tweaking still needs to be done to modify the
plot and turn off the noise
zkamvar added a commit that referenced this issue May 2, 2018
zkamvar added a commit that referenced this issue May 2, 2018
Tweaking still needs to be done to modify the
plot and turn off the noise
zkamvar added a commit that referenced this issue May 2, 2018
zkamvar added a commit that referenced this issue May 2, 2018
I have to coerce logical to integer for addition now apparently
@zkamvar
Copy link
Member

zkamvar commented May 2, 2018

Hello,

I have added this feature in the pair-ia-pval branch. I have changed the visualization so that the p-values are displayed as text over the tiles like so:

suppressPackageStartupMessages(library("poppr"))
data(nancycats)
pair.ia(nancycats[pop = 9], sample = 99, quiet = TRUE)

#>                   Ia     p.Ia    rbarD     p.rD
#> fca8:fca23   2.0e-02  4.0e-01  2.0e-02  4.0e-01
#> fca8:fca43   5.0e-03  5.3e-01  5.1e-03  5.3e-01
#> fca8:fca45   4.6e-01  3.0e-02  4.6e-01  3.0e-02
#> fca8:fca77   1.1e-01  3.0e-01  1.1e-01  3.0e-01
#> fca8:fca78   4.3e-02  3.4e-01  4.6e-02  3.4e-01
#> fca8:fca90  -5.8e-02  5.2e-01 -5.8e-02  5.2e-01
#> fca8:fca96   4.3e-01  2.0e-02  4.4e-01  2.0e-02
#> fca8:fca37   5.9e-02  3.7e-01  6.0e-02  3.7e-01
#> fca23:fca43  1.2e-01  2.5e-01  1.2e-01  2.5e-01
#> fca23:fca45  3.0e-01  4.0e-02  3.1e-01  4.0e-02
#> fca23:fca77  3.5e-01  5.0e-02  3.6e-01  5.0e-02
#> fca23:fca78  6.5e-01  1.0e-02  6.9e-01  1.0e-02
#> fca23:fca90  6.7e-01  1.0e-02  6.7e-01  1.0e-02
#> fca23:fca96  2.2e-01  1.7e-01  2.3e-01  1.6e-01
#> fca23:fca37  5.6e-03  4.8e-01  5.7e-03  4.8e-01
#> fca43:fca45  1.9e-01  1.7e-01  1.9e-01  1.8e-01
#> fca43:fca77  2.7e-01  1.4e-01  2.7e-01  1.4e-01
#> fca43:fca78  1.9e-01  1.7e-01  2.0e-01  1.7e-01
#> fca43:fca90  1.3e-01  2.6e-01  1.3e-01  2.6e-01
#> fca43:fca96  2.0e-01  2.4e-01  2.0e-01  2.4e-01
#> fca43:fca37  7.5e-01  1.0e-02  7.5e-01  1.0e-02
#> fca45:fca77  1.7e-01  1.5e-01  1.7e-01  1.5e-01
#> fca45:fca78  2.8e-01  9.0e-02  2.9e-01  8.0e-02
#> fca45:fca90  2.2e-16  4.3e-01  1.4e-16  4.3e-01
#> fca45:fca96  3.5e-01  1.1e-01  3.5e-01  1.1e-01
#> fca45:fca37  1.9e-01  1.1e-01  1.9e-01  1.1e-01
#> fca77:fca78  6.7e-01  1.0e-02  6.9e-01  1.0e-02
#> fca77:fca90  3.5e-01  2.0e-02  3.5e-01  2.0e-02
#> fca77:fca96  5.1e-01  2.0e-02  5.1e-01  2.0e-02
#> fca77:fca37  3.3e-01  6.0e-02  3.3e-01  6.0e-02
#> fca78:fca90  5.7e-01  1.0e-02  5.9e-01  1.0e-02
#> fca78:fca96  5.0e-01  2.0e-02  5.0e-01  2.0e-02
#> fca78:fca37  1.6e-01  2.1e-01  1.7e-01  2.1e-01
#> fca90:fca96  2.2e-01  1.7e-01  2.2e-01  1.7e-01
#> fca90:fca37  1.4e-01  2.3e-01  1.4e-01  2.3e-01
#> fca96:fca37  2.0e-01  1.6e-01  2.0e-01  1.6e-01

Created on 2018-05-02 by the reprex package (v0.2.0).

The code is currently available on the pair-ia-pval branch. If you have a C-compiler and are familiar with installing packages that need one, you can use the remotes package to install the latest version:

remotes::install_github("grunwaldlab/poppr@pair-ia-pval")

Otherwise, please let me know what system you use and I can provide you with a binary for you to install and test.

@zkamvar
Copy link
Member

zkamvar commented May 5, 2018

Hi @ktiedje,

Does this update to pair.ia work for you? Are you able to install the new version? If so, is there anything you would like to see changed about it? If not, can you provide me with information of your operating system so I can create a binary for you to test?

Thanks,
Zhian

@zkamvar
Copy link
Member

zkamvar commented May 7, 2018

Hi @ktiedje,

Just checking to see if the update works for you. If it does, I'll merge it into the master branch and get it ready for release within a couple of weeks.

Best,
Zhian

@ktiedje
Copy link
Author

ktiedje commented May 7, 2018 via email

@ktiedje
Copy link
Author

ktiedje commented May 9, 2018 via email

@zkamvar
Copy link
Member

zkamvar commented May 9, 2018 via email

@ktiedje
Copy link
Author

ktiedje commented May 9, 2018 via email

@zkamvar
Copy link
Member

zkamvar commented May 9, 2018

Hi @ktiedje

To answer your questions:

  1. pop=X is a shortcut for selecting the Xth population from the data set. I used it here to make a quick-running example. The reason you get different p-values without pop = 9 is that it's using the entire data set.

  2. P-values are calculated out of the number of permutations (in the above case, 99) plus one for the observed value. You should use a permutation values that end in 9 to get a round number. (e. g. sample = 999). Otherwise, you can either manually manipulate the output matrix or you can manipulate the data in the ggplot object directly: https://cran.r-project.org/web/packages/poppr/vignettes/poppr_manual.html#appendix

  3. Is there an advantage of using Ia or rbarD over the other measures for calculating pairwise LD and significance? Honestly, I don't know. I would suspect that the answer is no.

@zkamvar zkamvar closed this as completed May 9, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants