genind2genalex() produces all zero genotypes with some SNP data. #231

Closed
zkamvar opened this issue Jan 25, 2021 · 1 comment · Fixed by #233

zkamvar commented Jan 25, 2021

From https://groups.google.com/g/poppr/c/FfDlDWArQsA/m/UWp8h6ISDQAJ

genind2genalex() is producing all zeroes in its output when it should produce SNP data. It's clear the error lies with genind2genalex() and not df2genind().

  suppressPackageStartupMessages(library("poppr"))
  tmp <- tempfile(fileext = ".csv")
  x <- new("genind", tab = structure(c(NA, 2L, 2L, 2L, 2L, NA, 0L, 0L,
0L, 0L, NA, 2L, 2L, 2L, 2L, NA, 0L, 0L, 0L, 0L, 1L, 1L, 2L, 2L,
1L, 1L, 1L, 0L, 0L, 1L), .Dim = 5:6, .Dimnames = list(c("TT056001.trim",
"TT060001.trim", "TT062001.trim", "TT063001.trim", "TT064001.trim"
), c("loc87_pos30.A", "loc87_pos30.G", "loc106_pos31.G", "loc106_pos31.T",
"loc345_pos27.G", "loc345_pos27.T"))), loc.fac = structure(c(1L,
1L, 2L, 2L, 3L, 3L), .Label = c("loc87_pos30", "loc106_pos31",
"loc345_pos27"), class = "factor"), loc.n.all = c(loc87_pos30 = 2L,
loc106_pos31 = 2L, loc345_pos27 = 2L), all.names = list(loc87_pos30 = c("A",
"G"), loc106_pos31 = c("G", "T"), loc345_pos27 = c("G", "T")),
    ploidy = c(2L, 2L, 2L, 2L, 2L), type = "codom", other = list(),
    call = .local(x = x, i = i, j = j, loc = ..1, drop = drop),
    pop = NULL, strata = NULL, hierarchy = NULL)
  genind2df(x) # ok
#>               loc87_pos30 loc106_pos31 loc345_pos27
#> TT056001.trim        <NA>         <NA>           GT
#> TT060001.trim          AA           GG           GT
#> TT062001.trim          AA           GG           GG
#> TT063001.trim          AA           GG           GG
#> TT064001.trim          AA           GG           GT
  genind2genalex(x, tmp)
#> Extracting the table ...
#> Warning in FUN(X[[i]], ...): NAs introduced by coercion

#> Warning in FUN(X[[i]], ...): NAs introduced by coercion

#> Warning in FUN(X[[i]], ...): NAs introduced by coercion

#> Warning in FUN(X[[i]], ...): NAs introduced by coercion

#> Warning in FUN(X[[i]], ...): NAs introduced by coercion

#> Warning in FUN(X[[i]], ...): NAs introduced by coercion

#> Warning in FUN(X[[i]], ...): NAs introduced by coercion

#> Warning in FUN(X[[i]], ...): NAs introduced by coercion

#> Warning in FUN(X[[i]], ...): NAs introduced by coercion

#> Warning in FUN(X[[i]], ...): NAs introduced by coercion

#> Warning in FUN(X[[i]], ...): NAs introduced by coercion

#> Warning in FUN(X[[i]], ...): NAs introduced by coercion

#> Warning in FUN(X[[i]], ...): NAs introduced by coercion
#> Writing the table to /tmp/RtmpnNa39T/file106e133aa9690.csv ... Done.
#> [1] "/tmp/RtmpnNa39T/file106e133aa9690.csv"
  readLines(tmp)
#> [1] "3,5,1,5,,,,"                                                  
#> [2] ",,,Pop,,,,"                                                   
#> [3] "Ind,Pop,loc87_pos30, ,loc106_pos31, ,loc345_pos27, "          
#> [4] "\"TT056001.trim\",\"Pop\",\"0\",\"0\",\"0\",\"0\",\"0\",\"0\""
#> [5] "\"TT060001.trim\",\"Pop\",\"0\",\"0\",\"0\",\"0\",\"0\",\"0\""
#> [6] "\"TT062001.trim\",\"Pop\",\"0\",\"0\",\"0\",\"0\",\"0\",\"0\""
#> [7] "\"TT063001.trim\",\"Pop\",\"0\",\"0\",\"0\",\"0\",\"0\",\"0\""
#> [8] "\"TT064001.trim\",\"Pop\",\"0\",\"0\",\"0\",\"0\",\"0\",\"0\""

Created on 2021-01-25 by the reprex package (v0.3.0)

zkamvar commented Jan 30, 2021

I have found the problem. poppr:::fill_zero() assumes that the incoming data is numeric. This procedure was bypassed in the fix for #108 by assuming that all SNP data was haploid.
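
To illustrate why a numeric assumption breaks on this data, here is a minimal base R sketch. It does not use poppr internals; it only shows what happens when nucleotide allele names like those in the reprex are coerced to numbers, compared with microsatellite-style names. The vectors are made up for illustration.

```r
# Allele names like those in the reprex above (nucleotides)
snp_alleles <- c("A", "G", "T")
# Microsatellite-style allele names (illustrative values), which convert cleanly
ssr_alleles <- c("120", "124", "128")

# Coercing nucleotides yields NA and the warning
# "NAs introduced by coercion"; suppressWarnings() hides it here
snp_numeric <- suppressWarnings(as.numeric(snp_alleles))
ssr_numeric <- as.numeric(ssr_alleles)

print(snp_numeric)
print(ssr_numeric)
```

This is the same warning that genind2genalex() repeats in the reprex output.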

The solution I'm going with is to modify the mat = FALSE flag of fill_zero -> fill_zero_locus -> generate_bruvo_mat so that it accepts non-numeric data. I am changing it to mat_type = character(0) by default and accepting one of three scenarios:

  1. character(0): should produce a data frame with one locus per column
  2. "numeric": produces a numeric matrix with one allele per column
  3. "character": produces a character matrix with one allele per column.
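
The three scenarios could be sketched roughly as follows. This is not the poppr implementation: expand_locus_sketch() is a hypothetical helper, and the "/" separator, the single `locus` column name, and the "0" fill for missing genotypes are all assumptions made for illustration. It only shows how a zero-length character default can switch between data frame, numeric matrix, and character matrix output.

```r
# Hypothetical helper (not poppr code): expands one locus of "/"-separated
# genotypes according to mat_type.
expand_locus_sketch <- function(genotypes, ploidy = 2L, mat_type = character(0)) {
  # Scenario 1: character(0) means no matrix; keep one locus per column
  if (length(mat_type) == 0L) {
    return(data.frame(locus = genotypes, stringsAsFactors = FALSE))
  }
  mat_type <- match.arg(mat_type, c("numeric", "character"))
  # Split each genotype into alleles; missing genotypes become "0" (assumption)
  alleles <- lapply(strsplit(genotypes, "/", fixed = TRUE), function(a) {
    if (anyNA(a)) rep("0", ploidy) else a
  })
  out <- do.call(rbind, alleles)  # character matrix, one allele per column
  # Scenario 2: convert to a numeric matrix; scenario 3 leaves it as character
  if (mat_type == "numeric") storage.mode(out) <- "double"
  out
}

df  <- expand_locus_sketch(c("A/A", "G/T", NA))
num <- expand_locus_sketch(c("120/124", "124/124", NA), mat_type = "numeric")
chr <- expand_locus_sketch(c("A/A", "G/T", NA), mat_type = "character")
```

With the "character" option, SNP alleles pass through without being coerced to numbers.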

zkamvar added a commit that referenced this issue Jan 30, 2021
These will now allow for numeric or character matrix output. This also
changes the argument `mat` to `mat_type` and changes the value from a
boolean to a character vector indicating the type of matrix output (if
none, no matrix output).

This commit will fix #231