Predicting profiles #9

thoree · 2019-11-09T13:29:22Z

# A feature wish: a ranked list of possible profiles. A small example:

library(pedprobr)
#> Loading required package: pedtools
x = nuclearPed(2, father = "FA")
m1 = marker(x, "3" = 1, "4" = 1:2, alleles = 1:2)
m2 = marker(x, "3" = 1, "4" = 2, alleles = 1:2)
x = setMarkers(x, list(m1, m2))
p1 = oneMarkerDistribution(x, "FA",  partialmarker = 1, verbose = F)
p2 = oneMarkerDistribution(x, "FA",  partialmarker = 2, verbose = F)

# Apriori there are 9 possible genotype
# combinations (profiles) for 'FA' for these markers.
# The (posterior) probabilities are

p1%o%p2
#>     1/1       1/2 2/2
#> 1/1   0 0.3333333   0
#> 1/2   0 0.6666667   0
#> 2/2   0 0.0000000   0

# More generally, it would be interesting to
# have a list of the possible profiles for FA for n markers
# ranked according to how probable they are. This could then
# be used to search for "FA", say in a database of missing persons.

^{Created on 2019-11-09 by the reprex package (v0.3.0)}

magnusdv · 2019-11-11T09:28:01Z

Thanks, that is an interesting idea, which shouldn't be too difficult to implement.
I'm not sure exactly how you envision the output to look like, though. You say a "list of profiles", but could you give an explicit example?

Below is a quick attempt using your example, resulting in a sorted data frame. Is this in the right direction?

library(pedprobr)
#> Loading required package: pedtools

x = nuclearPed(2, father = "FA")
m1 = marker(x, "3" = 1, "4" = 1:2, alleles = 1:2)
m2 = marker(x, "3" = 1, "4" = 2, alleles = 1:2)

p1 = oneMarkerDistribution(x, "FA",  partialmarker = m1, verbose = F)
p2 = oneMarkerDistribution(x, "FA",  partialmarker = m2, verbose = F)

# Posterior probabilities
pp = p1 %o% p2

# Indices of positive entries
idx = which(pp > 0, arr.ind = T)

# Result as data.frame
res = data.frame(
  M1 = rownames(pp)[idx[, 'row']], 
  M2 = colnames(pp)[idx[, 'col']], 
  Prob = pp[pp > 0], 
  stringsAsFactors = F)

# Sort
res[order(res$Prob, decreasing = T), , drop = F]
#>    M1  M2      Prob
#> 2 1/2 1/2 0.6666667
#> 1 1/1 1/2 0.3333333

^{Created on 2019-11-11 by the reprex package (v0.3.0)}

thoree · 2019-11-11T11:07:42Z

Very nice! I tried this on an MP example, F5. Then one could restrict attention to markers for which EP>0. However, in this example, and I fear in most MP examples, the list will be too long and
no candidate profiles will stand out as as clearly more likely. In some cases, like the below
one from your Argentina talk I would think the profile of MP could be shortlisted

,

magnusdv · 2019-11-11T15:49:06Z

Ok, here is a sketch of a function, which (I think) does more or less what you want. I don't know how it will react when the dimensions blow up, but I included a parameter maxPedMarker which limits the number of genotypes considered for each marker.

library(pedprobr)
#> Loading required package: pedtools

rankProfiles = function(x, id, markers = NULL, maxPerMarker = Inf, verbose = TRUE) {

  if(!hasMarkers(x)) 
    stop("No markers attached to pedigree")
  
  if(is.null(markers))
    markers = seq_len(nMarkers(x))

  # Extract wanted markers
  x = selectMarkers(x, markers)
  nmark = nMarkers(x)
  
  # Marker names
  mnames = name(x, 1:nmark)
  mnames[is.na(mnames)] = as.character((1:nmark)[is.na(mnames)])
  
  # Posterior distributions for each marker
  omdList = lapply(1:nmark, function(i) {
    omd = oneMarkerDistribution(x, ids = id, partialmarker = i, verbose = FALSE)
    a = omd[omd > 0, drop = F]
    
    if(maxPerMarker < length(a))
      a = sort(a, decreasing = T)[1:maxPerMarker]
    
    # Trick to ensure proper naming in `expand.grid` below
    names(dimnames(a)) = mnames[i]
    
    if(verbose) {
      print(a)
      cat("\n")
    }
    
    a
  })

  # Array with total posterior probs
  pp = Reduce(`%o%`, omdList)

  # Transform dimnames into data frame with possible profiles
  profiles = expand.grid(dimnames(pp))
  
  # Add probabilities
  prob = as.numeric(pp)
  profiles$Posterior = prob

  # Sort
  profiles = profiles[order(prob, decreasing = T), , drop = F]
  row.names(profiles) = NULL

  profiles
}

And here's an expanded version of your original example:

x = nuclearPed(2, father = "FA")
m1 = marker(x, "3" = 1, "4" = 1:2, alleles = 1:2, name = "m1")
m2 = marker(x, "3" = 1, "4" = 2, alleles = 1:2, name = "m2")
m3 = marker(x, "3" = 2, "4" = 2, alleles = 1:2, name = "m3")
x = setMarkers(x, list(m1, m2, m3))

rankProfiles(x, "FA")
#> m1
#>       1/1       1/2 
#> 0.3333333 0.6666667 
#> 
#> m2
#> 1/2 
#>   1 
#> 
#> m3
#>       1/2       2/2 
#> 0.3333333 0.6666667
#>
#>    m1  m2  m3 Posterior
#> 1 1/2 1/2 2/2 0.4444444
#> 2 1/2 1/2 1/2 0.2222222
#> 3 1/1 1/2 2/2 0.2222222
#> 4 1/1 1/2 1/2 0.1111111

^{Created on 2019-11-11 by the reprex package (v0.3.0)}

thoree · 2019-11-12T08:01:57Z

Thanks, I will try some examples and get back to you!

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Predicting profiles #9

Predicting profiles #9

thoree commented Nov 9, 2019

magnusdv commented Nov 11, 2019

thoree commented Nov 11, 2019

magnusdv commented Nov 11, 2019

thoree commented Nov 12, 2019

Predicting profiles #9

Predicting profiles #9

Comments

thoree commented Nov 9, 2019

magnusdv commented Nov 11, 2019

thoree commented Nov 11, 2019

magnusdv commented Nov 11, 2019

thoree commented Nov 12, 2019