Assign categorical Lazarin 2014 Tiers #4

Closed
bschilder opened this issue Apr 16, 2024 · 6 comments

@bschilder
Collaborator

@NathanSkene suggested we should use the Lazarin 2014 Tier system. But I pointed out that we switched to a continuous severity score because it provides a quantitative way of sorting the phenotypes. Also, the GPT annotations don't exactly recapitulate the Lazarin criteria; it's more that we were inspired by Lazarin 2014 to generate our own, somewhat similar criteria.

One middle ground might be to create a rule-based function that attempts to approximate the Lazarin 2014 Tiers. It won't be exactly the same, but it might be useful for grouping our phenotypes into discrete severity categories.

@bschilder bschilder self-assigned this Apr 16, 2024
@bschilder
Collaborator Author

bschilder commented May 14, 2024

So after rereading Lazarin 2014, my understanding of Tiers is a bit different. Basically, each clinical characteristic is assigned a tier (1-4). The tiers are then mapped onto severity categories (Mild, Moderate, Severe, Profound) like so:

image

So perhaps it would make more sense to map our phenotypes onto these severity categories instead of the tiers.

@bschilder
Collaborator Author

Lazarin 2014 also struggled with the same ambiguity we're facing regarding the role of available treatments:

Availability of treatment is not a measure of the severity of an untreated disease. However, it was rated as highly important (more so than any sensory deficit); thus, while it is not sensible to include it in an assessment of untreated severity, it is reasonable to consider it in conjunction with severity when considering disease inclusion criteria. Unfortunately, the survey's design makes it difficult to interpret responses to this characteristic: it is not clear whether respondents believed that the presence or absence of treatment was of importance.

One thing we do improve on relative to Lazarin 2014 is the handling of "expressivity". We capture a rough approximation of it with the never/rarely/often/always classifications.
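
As a rough illustration of how the never/rarely/often/always classifications can act as a proxy for expressivity, here is a minimal sketch in base R. The function posted later in this thread treats codes 2 and 3 as "often" and "always"; the codes for "never" and "rarely" (0 and 1), the `freq_levels` name, and the example annotations are assumptions for illustration only.

```r
# Assumed ordinal coding. Only often = 2 and always = 3 are implied by the
# thread; never = 0 and rarely = 1 are assumptions.
freq_levels <- c(never = 0, rarely = 1, often = 2, always = 3)

# Hypothetical annotations for a single phenotype
annot <- c(death = "rarely", blindness = "always", cancer = "often")

# Convert the labels to ordinal codes, keeping the metric names
codes <- freq_levels[annot]
names(codes) <- names(annot)

# A metric counts towards a tier only if it occurs often or always
meets <- codes %in% c(2, 3)
names(meets) <- names(codes)
print(meets)
```

Thresholding at often/always means a feature that only rarely accompanies a phenotype does not count towards its severity class.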

@bschilder
Collaborator Author

bschilder commented May 14, 2024

Mapping our metrics onto Tiers is a bit challenging since they're quite different:
(from Table 1 in Lazarin 2014)
image

Here's my closest approximation. Notable issues:

  • Our metric "congenital onset" doesn't directly map onto any of the Lazarin criteria. Something can be congenital without necessarily causing death at an early age. That said, it's still an important feature and thus perhaps worth adding to our tier assignments.
  • Our metric "physical malformation" maps onto multiple Lazarin criteria: internal physical malformation (Tier 2) and dysmorphic features (Tier 3). We currently can't distinguish between these two situations, even though internal malformations are more likely to be severe since they affect organ systems. For now, I'm assigning our "physical malformation" to Tier 2 only.
tiers_dict <- list(
  ## Tier 1
  death=1, 
  intellectual_disability=1,
  # congenital_onset=1,
  ## Tier 2
  impaired_mobility=2, 
  physical_malformations=2,
  ## Tier 3
  blindness=3,  
  sensory_impairments=3,
  immunodeficiency=3, 
  cancer=3, 
  ## Tier 4
  reduced_fertility=4
)
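
To sanity-check the assignments, the dictionary can be inverted so that each tier lists its metrics. This is only a quick base R sketch; `metrics_by_tier` is a name introduced here for illustration.

```r
# Tier assignments copied from the list above
tiers_dict <- list(
  death = 1, intellectual_disability = 1,
  impaired_mobility = 2, physical_malformations = 2,
  blindness = 3, sensory_impairments = 3, immunodeficiency = 3, cancer = 3,
  reduced_fertility = 4
)

# Group metric names by their tier (list names are "1", "2", "3", "4")
metrics_by_tier <- split(names(tiers_dict), unlist(tiers_dict))

# Number of metrics per tier
print(lengths(metrics_by_tier))
```

The tiers are unevenly populated (Tier 3 has four metrics, Tier 4 only one), which matters for any rule that counts metrics per tier.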

@NathanSkene
Collaborator

NathanSkene commented May 14, 2024 via email

@bschilder
Collaborator Author

Severity class can be Mild, Moderate, Severe, or Profound.
I've also generated a severity class score, which is the mean, across the four tiers, of the proportion of metrics in each tier that meet our often/always threshold. This provides a way to rank phenotypes within each severity class as well.

res_coded <- HPOExplorer::gpt_annot_codify()

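## Assign a Lazarin-style severity class to one row (r) of coded annotations.
## r is expected to be a single-row data.table (e.g. .SD with by=.I).
map_severity_class <- function(r,
                               tiers_dict = list(
                                ## Tier 1
                                death=1,
                                intellectual_disability=1,
                                # congenital_onset=1,
                                ## Tier 2
                                impaired_mobility=2,
                                physical_malformations=2,
                                ## Tier 3
                                blindness=3,
                                sensory_impairments=3,
                                immunodeficiency=3,
                                cancer=3,
                                ## Tier 4
                                reduced_fertility=4
                               ),
                               inclusion_values=c(2,3), # i.e. often, always
                               return_score=FALSE){
  tiers <- unique(unlist(tiers_dict))
  ## For each tier: which of its metrics meet the inclusion threshold,
  ## and what proportion of the tier's metrics that represents
  tier_scores <- lapply(stats::setNames(tiers,paste0("tier",tiers)),
                        function(x){
    tx <- tiers_dict[unname(unlist(tiers_dict)==x)]
    counts <- r[,sapply(.SD, function(v){v %in% inclusion_values}),
               .SDcols = names(tx)]
    list(
      counts=counts,
      proportion=sum(counts)/length(tx)
    )
  })
  ## Score = mean of the per-tier proportions
  mean_proportion <- sapply(tier_scores, function(x)x$proportion)|>mean()
  ## Rule-based class assignment, checked from most to least severe
  assigned_class <- if(sum(tier_scores$tier1$counts)>1){
    ## More than one Tier 1 metric
    c("profound"=mean_proportion)
  } else if (sum(tier_scores$tier1$counts)>0 ||
             sum(c(tier_scores$tier2$counts,tier_scores$tier3$counts))>3){
    ## Exactly one Tier 1 metric, or more than three Tier 2/3 metrics combined
    c("severe"=mean_proportion)
  } else if(sum(tier_scores$tier3$counts)>0){
    ## At least one Tier 3 metric
    c("moderate"=mean_proportion)
  } else{
    c("mild"=mean_proportion)
  }
  ## Return either the named score or just the class label
  if(return_score){
    return(assigned_class)
  } else{
    return(names(assigned_class))
  }
}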

res_coded$annot_coded[,severity_class:=map_severity_class(.SD), by=.I]
res_coded$annot_coded[,severity_class_score:=map_severity_class(.SD, return_score = TRUE), by=.I]
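
To make the rule easier to follow, here is a simplified base R sketch of the same decision logic applied to a single named vector of codes rather than a data.table row. `classify_row()` and the example codes are hypothetical and not part of HPOExplorer; metrics that are missing from the input are treated as not meeting the threshold.

```r
# Tier assignments as in tiers_dict above, stored as a named vector
tiers <- c(death = 1, intellectual_disability = 1,
           impaired_mobility = 2, physical_malformations = 2,
           blindness = 3, sensory_impairments = 3,
           immunodeficiency = 3, cancer = 3,
           reduced_fertility = 4)

# Hypothetical helper mirroring the thresholds in map_severity_class()
classify_row <- function(codes, inclusion_values = c(2, 3)) {
  # TRUE where a metric is coded often (2) or always (3)
  met <- codes[names(tiers)] %in% inclusion_values
  by_tier <- split(met, tiers)               # list named "1".."4"
  n_met <- sapply(by_tier, sum)              # metrics meeting threshold per tier
  score <- mean(sapply(by_tier, mean))       # mean of per-tier proportions
  n1 <- n_met[["1"]]
  n3 <- n_met[["3"]]
  n23 <- n_met[["2"]] + n3
  cls <- if (n1 > 1) {
    "profound"
  } else if (n1 > 0 || n23 > 3) {
    "severe"
  } else if (n3 > 0) {
    "moderate"
  } else {
    "mild"
  }
  list(class = cls, score = score)
}

# One Tier 1 metric and one Tier 2 metric meet the threshold
res <- classify_row(c(death = 3, impaired_mobility = 2, blindness = 1))
print(res$class)  # "severe"
print(res$score)  # 0.25, i.e. mean of 1/2, 1/2, 0/4 and 0/1
```

Note that, with the thresholds as posted, a row meeting only Tier 2 metrics falls through to "mild", since at most two Tier 2 metrics exist and the "moderate" branch checks Tier 3 only.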

I checked that there's a correspondence between our severity scores and the severity classes assigned in this way, and indeed there is:
image

@bschilder
Collaborator Author

Now described in Results and Methods under the new section "Severity classes".

Added the violin plot to the supplement as well.
