RelativeCopyNumber #277

ahwagner · 2021-02-22T18:57:38Z

Revisit Relative CopyNumber statements, as seen in cytogenetics resources and WRT X-chromosome abnormalities.

ahwagner · 2021-12-10T15:00:22Z

@mcannon068nw follow this thread.

mbaudis · 2021-12-10T17:20:07Z

+1

mbaudis · 2021-12-11T15:26:57Z

As mentioned in an exchange w/ @ahwagner the representation of relative copy number states is essential since:

- most analyses do not deliver copy number counts - in fact the use of "copies" is extremely unusual
- CN estimates will be influenced by clonality, impurity and ploidy
- even correct CN counts can frequently (cancer genomes...) only be interpreted by knowing the base ploidy level

A main logical paradigm of CN analyses and representation is the relative level with respect to a baseline (i.e. ploidy level). In cancer genomics there is pretty much consistent use of a limited set of CN levels:

- homozygous deletion (i.e. estimated 0 copies in any ploidy)
- deletion (i.e. fewer copies than the assumed baseline; CN=1 for diploid)
- duplication / low level gain (i.e. one or few more copies than baseline; operationally one can assume this as "not more than a duplication of the base CN count", i.e. 4 in a diploid genome)
- amplification / high level gain: from ~5 to any number - possibly in the hundreds - of copies of a genomic region. However, there is no clear definition of an amplification threshold and some definitions may include the regionality and exclude e.g. events leading to multiple copies of a chromosome
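
The levels above can be sketched as a small classifier. This is only an illustration, not part of any schema: the names `RelativeCN` and `classify_relative_cn` are hypothetical, the `NEUTRAL` level is added for completeness, and the cutoff between low-level gain and amplification (more than double the baseline) is an assumption taken from the operational rule of thumb above, since there is no agreed amplification threshold.

```python
from enum import Enum


class RelativeCN(Enum):
    """Hypothetical labels for the relative CN levels listed above."""
    HOMOZYGOUS_DELETION = "homozygous deletion"
    DELETION = "deletion"
    NEUTRAL = "baseline"  # assumption: not in the list above, added for completeness
    LOW_LEVEL_GAIN = "duplication / low level gain"
    AMPLIFICATION = "amplification / high level gain"


def classify_relative_cn(copies: int, baseline: int = 2) -> RelativeCN:
    """Map an absolute copy count to a relative level, given the base ploidy."""
    if copies < 0 or baseline < 1:
        raise ValueError("copies must be >= 0 and baseline >= 1")
    if copies == 0:
        return RelativeCN.HOMOZYGOUS_DELETION
    if copies < baseline:
        return RelativeCN.DELETION
    if copies == baseline:
        return RelativeCN.NEUTRAL
    # Assumed cutoff: up to a doubling of the baseline is a low-level gain.
    if copies <= 2 * baseline:
        return RelativeCN.LOW_LEVEL_GAIN
    return RelativeCN.AMPLIFICATION
```

With these assumptions, the same count of 3 copies is a low-level gain on a diploid autosome (`baseline=2`) but the baseline state in a triploid cell line (`baseline=3`), which is why the count alone cannot be interpreted without the base ploidy.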

In practice, the current lack of a relative indication of CN state prevents use of the schema for most real-world applications representing CNV events (or requires the use of fake values).

Changes needed

- a class describing the relative copy number state (REQUIRED)
- a representation of the base copy number at the location in the given sample (OPTIONAL; e.g. 2 for autosomes in diploid cells, 1 for X/Y in males, 3 for a triploid cell line with e.g. 69,XXX etc.)

The best option would be to have an ontology for such classes, and SO would seem to be the natural place for it. However, the CNV representation there is confusing & incomplete.

Minimal pseudo-ontology for CNVs

id: CNVO:000001
label: copy number assessment
  id: CNVO:000002
  label: base ploidy
    id: CNVO:000004
    label: copy-neutral loss of heterozygosity
  id: CNVO:000003
  label: copy number variation
    id: CNVO:000005
    label: copy number loss
      id: CNVO:000007
      label: low-level copy number loss
      id: CNVO:000008
      label: homozygous deletion
    id: CNVO:000006
    label: copy number gain
      id: CNVO:000009
      label: low-level copy number gain
      id: CNVO:000010
      label: genomic amplification

I'd be happy to help work on this & am extremely flexible regarding solutions ...

ahwagner · 2021-12-14T22:07:57Z

I agree completely with @mbaudis above. I think this gets around many of the challenges of representing the assay signal (e.g. log2 ratios) and moves straight to the heart of what CNV callers predict. This is very VRS-like, in my opinion (we also avoid VAF / read depth / intensity metrics elsewhere in VRS).

+1 @mbaudis

mbaudis · 2022-02-01T09:51:32Z

As of January 18, 2022 the copy number assessment class and its tree are represented in the Experimental Factor Ontology (EFO):

id: EFO:0030063
label: copy number assessment
  |
  |-id: EFO:0030064
  | label: regional base ploidy
  |   |
  |   |-id: EFO:0030065
  |     label: copy-neutral loss of heterozygosity
  |
  |-id: EFO:0030066
    label: relative copy number variation
      |
      |-id: EFO:0030067
      | label: copy number loss
      |   |
      |   |-id: EFO:0030068
      |   | label: low-level copy number loss
      |   |
      |   |-id: EFO:0030069
      |     label: complete genomic deletion
      |
      |-id: EFO:0030070
        label: copy number gain
          |
          |-id: EFO:0030071
          | label: low-level copy number gain
          |
          |-id: EFO:0030072
             label: high-level copy number gain
             note: commonly but not consistently used for >=5 copies on a bi-allelic genome region
              |
              |-id: EFO:0030073
                 label: focal genome amplification
                 note: >-
                   commonly used for localized multi-copy genome amplification events where the
                   region does not extend >3Mb (varying 1-5Mb) and may exist in a large number of
                   copies
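
A rough sketch of how an implementation might use this tree for hierarchical retrieval. The parent links are transcribed from the tree above; the helper names `ancestors`, `is_a` and `descendants` are hypothetical and do not come from EFO or any existing library.

```python
# Child CURIE -> parent CURIE, transcribed from the EFO tree above.
PARENT = {
    "EFO:0030064": "EFO:0030063",  # regional base ploidy
    "EFO:0030065": "EFO:0030064",  # copy-neutral loss of heterozygosity
    "EFO:0030066": "EFO:0030063",  # relative copy number variation
    "EFO:0030067": "EFO:0030066",  # copy number loss
    "EFO:0030068": "EFO:0030067",  # low-level copy number loss
    "EFO:0030069": "EFO:0030067",  # complete genomic deletion
    "EFO:0030070": "EFO:0030066",  # copy number gain
    "EFO:0030071": "EFO:0030070",  # low-level copy number gain
    "EFO:0030072": "EFO:0030070",  # high-level copy number gain
    "EFO:0030073": "EFO:0030072",  # focal genome amplification
}


def ancestors(term: str) -> list[str]:
    """Return the chain of parents from `term` up to the root."""
    chain = []
    while term in PARENT:
        term = PARENT[term]
        chain.append(term)
    return chain


def is_a(term: str, ancestor: str) -> bool:
    """True if `term` equals `ancestor` or sits anywhere below it."""
    return term == ancestor or ancestor in ancestors(term)


def descendants(ancestor: str) -> list[str]:
    """All terms strictly below `ancestor`, sorted by CURIE."""
    return sorted(t for t in PARENT if t != ancestor and is_a(t, ancestor))
```

A query for "copy number gain" (`EFO:0030070`) would then also match records annotated with the more specific gain terms, e.g. `is_a("EFO:0030073", "EFO:0030070")` holds.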

ahwagner · 2022-02-24T17:54:00Z

On the upcoming 2/28 call @larrybabb and I will discuss a proposal to align the above classifications of low / high level copy number gain / loss as a Relative Copy Number class.

This class will be defined by a subject (matching the same variable from [Absolute] Copy Number) and a copy number assessment described by the integer range -2 to +2:
-2: complete copy loss
-1: low-level copy loss
0: copy neutral
1: low-level copy gain
2: high-level copy gain

The cardinality inherent to integers helps with computability over a strictly term-based system.

ahwagner · 2022-02-24T17:54:23Z

Loss-of-heterozygosity needs to be discussed in the context of genotypes.

larrybabb · 2022-02-24T20:25:56Z

per a discussion between @ahwagner and @larrybabb
Draft Relative Copy Number class proposal

-- the target region/gene/feature
subject:  region/gene/feature/allele/haplotype

-- 5 quantifiable values that correspond to the EFO copy number assessment subterms that are stable and reliable
copy number assessment:   (http://www.ebi.ac.uk/efo/EFO_0030063)
        -2 = complete loss   (http://www.ebi.ac.uk/efo/EFO_0030069)
        -1 = partial loss    (http://www.ebi.ac.uk/efo/EFO_0030068)
         0 = copy-neutral    (http://www.ebi.ac.uk/efo/EFO_0030064)
         1 = low-level gain  (http://www.ebi.ac.uk/efo/EFO_0030071)
         2 = high-level gain (http://www.ebi.ac.uk/efo/EFO_0030072)
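
As a minimal sketch, the five values in this draft could be modelled as an integer enumeration with a lookup to the EFO terms. The names `CopyNumberAssessment`, `EFO_URI`, `is_loss` and `is_gain` are hypothetical and are not the VRS schema; only the integer values and URLs are taken from the draft.

```python
from enum import IntEnum


class CopyNumberAssessment(IntEnum):
    """Hypothetical encoding of the draft's -2..+2 assessment values."""
    COMPLETE_LOSS = -2
    PARTIAL_LOSS = -1
    COPY_NEUTRAL = 0
    LOW_LEVEL_GAIN = 1
    HIGH_LEVEL_GAIN = 2


# Draft mapping from each value to its EFO term, as listed in the proposal.
EFO_URI = {
    CopyNumberAssessment.COMPLETE_LOSS: "http://www.ebi.ac.uk/efo/EFO_0030069",
    CopyNumberAssessment.PARTIAL_LOSS: "http://www.ebi.ac.uk/efo/EFO_0030068",
    CopyNumberAssessment.COPY_NEUTRAL: "http://www.ebi.ac.uk/efo/EFO_0030064",
    CopyNumberAssessment.LOW_LEVEL_GAIN: "http://www.ebi.ac.uk/efo/EFO_0030071",
    CopyNumberAssessment.HIGH_LEVEL_GAIN: "http://www.ebi.ac.uk/efo/EFO_0030072",
}


def is_loss(assessment: CopyNumberAssessment) -> bool:
    """Any negative value is a loss, whether partial or complete."""
    return assessment < 0


def is_gain(assessment: CopyNumberAssessment) -> bool:
    """Any positive value is a gain, whether low-level or high-level."""
    return assessment > 0
```

The integer ordering is what makes queries such as "any loss" a single comparison rather than a lookup over a list of terms.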

mbaudis · 2022-02-25T09:29:21Z

Great! 2 questions:

- any intention to handle "focal genome amplification", or is this too much in the "annotation realm"?
- for CN-LOH: this would then be handled as a combination of a genotype assessment (somehow expressing allelic homozygosity) and then the relative CN at the locus? Could also cover e.g. LOH with CN gain (not sure about examples but everything happens...).

ahwagner · 2022-02-28T18:02:45Z

On the GA4GH call today, some concerns were raised about the integer approach: it is confusing, and it might also cause challenges when extending to levels beyond complete loss / low loss / neutral / low gain / high gain.

mbaudis · 2022-03-03T16:40:45Z

I guess the main arguments against directly using CURIEs would be that

- VRS feels that not all of the ones from the EFO branch are suitable for this concept - at least in the way VRS is seeing it - and
- the terms themselves may not become the defaults (e.g. waiting for SO as still somehow standard ontology in the variant space though w/ some problematic/lacking concepts ATM)

... ?

OTOH - CURIE concept/definition in VRS, recommended values basically adopt term definitions from EFO (thanks), flexibility to change recommended terms while keeping structure, hierarchical retrieval in implementations (complete loss is just a subset of loss) ...

ahwagner · 2022-03-03T17:09:55Z

Mostly this is for consistency with the spec so far, where we can link to all external concepts associated with a concept, e.g. sources for Allele. Our plan is to eventually provide structured alignment to the EFO (and eventually SO?) concepts, and (when we get to producing LD-contexts) we will have explicit concept equivalency maps to these entities.

ahwagner added the Stayin' Alive Issues to exempt from stale issue processing label Feb 22, 2021

ahwagner added this to the 1.3 milestone Feb 22, 2021

ahwagner self-assigned this Feb 22, 2021

reece added Upcoming Types Schema and removed Upcoming Types labels Jul 19, 2021

ahwagner removed their assignment Dec 10, 2021

This was referenced Dec 12, 2021

copy number assessment subtree proposal The-Sequence-Ontology/SO-Ontologies#568

Open

Discuss < and > for IndefiniteRange when used with non-integer numbers #362

Closed

ahwagner mentioned this issue Dec 14, 2021

Revise CopyNumber.copies description #360

Closed

mbaudis mentioned this issue Dec 21, 2021

Develop CNV Ontology hcnv/hCNV-X#1

Open

This was referenced Jan 5, 2022

copy number assessment subtree proposal (SO, EFO) EBISPOT/efo#1404

Closed

VRS differences & alignment ga4gh-beacon/beacon-v2-Models#70

Closed

korikuzma mentioned this issue Mar 8, 2022

Add Relative Copy Number ga4gh/vrsatile-pydantic#31

Closed

ahwagner mentioned this issue Mar 10, 2022

relative copy number #382

Merged

ahwagner mentioned this issue Mar 20, 2022

Valid subjects for Relative and Absolute Copy Number statements #385

Closed

ahwagner mentioned this issue Mar 8, 2023

CopyNumber: Naming and Semantics #404

Closed
