Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Predicting profiles #9

Open
thoree opened this issue Nov 9, 2019 · 4 comments
Open

Predicting profiles #9

thoree opened this issue Nov 9, 2019 · 4 comments

Comments

@thoree
Copy link

thoree commented Nov 9, 2019

# A feature wish: a ranked list of possible profiles. A small example:

library(pedprobr)
#> Loading required package: pedtools
x = nuclearPed(2, father = "FA")
m1 = marker(x, "3" = 1, "4" = 1:2, alleles = 1:2)
m2 = marker(x, "3" = 1, "4" = 2, alleles = 1:2)
x = setMarkers(x, list(m1, m2))
p1 = oneMarkerDistribution(x, "FA",  partialmarker = 1, verbose = F)
p2 = oneMarkerDistribution(x, "FA",  partialmarker = 2, verbose = F)

# Apriori there are 9 possible genotype
# combinations (profiles) for 'FA' for these markers.
# The (posterior) probabilities are

p1%o%p2
#>     1/1       1/2 2/2
#> 1/1   0 0.3333333   0
#> 1/2   0 0.6666667   0
#> 2/2   0 0.0000000   0

# More generally, it would be interesting to
# have a list of the possible profiles for FA for n markers
# ranked according to how probable they are. This could then
# be used to search for "FA", say in a database of missing persons.

Created on 2019-11-09 by the reprex package (v0.3.0)

@magnusdv
Copy link
Owner

Thanks, that is an interesting idea, which shouldn't be too difficult to implement.
I'm not sure exactly how you envision the output to look like, though. You say a "list of profiles", but could you give an explicit example?

Below is a quick attempt using your example, resulting in a sorted data frame. Is this in the right direction?

library(pedprobr)
#> Loading required package: pedtools

x = nuclearPed(2, father = "FA")
m1 = marker(x, "3" = 1, "4" = 1:2, alleles = 1:2)
m2 = marker(x, "3" = 1, "4" = 2, alleles = 1:2)

p1 = oneMarkerDistribution(x, "FA",  partialmarker = m1, verbose = F)
p2 = oneMarkerDistribution(x, "FA",  partialmarker = m2, verbose = F)

# Posterior probabilities
pp = p1 %o% p2

# Indices of positive entries
idx = which(pp > 0, arr.ind = T)

# Result as data.frame
res = data.frame(
  M1 = rownames(pp)[idx[, 'row']], 
  M2 = colnames(pp)[idx[, 'col']], 
  Prob = pp[pp > 0], 
  stringsAsFactors = F)

# Sort
res[order(res$Prob, decreasing = T), , drop = F]
#>    M1  M2      Prob
#> 2 1/2 1/2 0.6666667
#> 1 1/1 1/2 0.3333333

Created on 2019-11-11 by the reprex package (v0.3.0)

@thoree
Copy link
Author

thoree commented Nov 11, 2019

Very nice! I tried this on an MP example, F5. Then one could restrict attention to markers for which EP>0. However, in this example, and I fear in most MP examples, the list will be too long and
no candidate profiles will stand out as as clearly more likely. In some cases, like the below
one from your Argentina talk I would think the profile of MP could be shortlisted

image,

@magnusdv
Copy link
Owner

Ok, here is a sketch of a function, which (I think) does more or less what you want. I don't know how it will react when the dimensions blow up, but I included a parameter maxPedMarker which limits the number of genotypes considered for each marker.

library(pedprobr)
#> Loading required package: pedtools

rankProfiles = function(x, id, markers = NULL, maxPerMarker = Inf, verbose = TRUE) {

  if(!hasMarkers(x)) 
    stop("No markers attached to pedigree")
  
  if(is.null(markers))
    markers = seq_len(nMarkers(x))

  # Extract wanted markers
  x = selectMarkers(x, markers)
  nmark = nMarkers(x)
  
  # Marker names
  mnames = name(x, 1:nmark)
  mnames[is.na(mnames)] = as.character((1:nmark)[is.na(mnames)])
  
  # Posterior distributions for each marker
  omdList = lapply(1:nmark, function(i) {
    omd = oneMarkerDistribution(x, ids = id, partialmarker = i, verbose = FALSE)
    a = omd[omd > 0, drop = F]
    
    if(maxPerMarker < length(a))
      a = sort(a, decreasing = T)[1:maxPerMarker]
    
    # Trick to ensure proper naming in `expand.grid` below
    names(dimnames(a)) = mnames[i]
    
    if(verbose) {
      print(a)
      cat("\n")
    }
    
    a
  })

  # Array with total posterior probs
  pp = Reduce(`%o%`, omdList)

  # Transform dimnames into data frame with possible profiles
  profiles = expand.grid(dimnames(pp))
  
  # Add probabilities
  prob = as.numeric(pp)
  profiles$Posterior = prob

  # Sort
  profiles = profiles[order(prob, decreasing = T), , drop = F]
  row.names(profiles) = NULL

  profiles
}

And here's an expanded version of your original example:

x = nuclearPed(2, father = "FA")
m1 = marker(x, "3" = 1, "4" = 1:2, alleles = 1:2, name = "m1")
m2 = marker(x, "3" = 1, "4" = 2, alleles = 1:2, name = "m2")
m3 = marker(x, "3" = 2, "4" = 2, alleles = 1:2, name = "m3")
x = setMarkers(x, list(m1, m2, m3))

rankProfiles(x, "FA")
#> m1
#>       1/1       1/2 
#> 0.3333333 0.6666667 
#> 
#> m2
#> 1/2 
#>   1 
#> 
#> m3
#>       1/2       2/2 
#> 0.3333333 0.6666667
#>
#>    m1  m2  m3 Posterior
#> 1 1/2 1/2 2/2 0.4444444
#> 2 1/2 1/2 1/2 0.2222222
#> 3 1/1 1/2 2/2 0.2222222
#> 4 1/1 1/2 1/2 0.1111111

Created on 2019-11-11 by the reprex package (v0.3.0)

@thoree
Copy link
Author

thoree commented Nov 12, 2019

Thanks, I will try some examples and get back to you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants