Epic #480 "Create Adjacency" #488

Merged: 3 commits, Apr 19, 2024

58 changes: 58 additions & 0 deletions docs/source/concepts/molecular_variation/Adjacency.rst
@@ -0,0 +1,58 @@
.. _Adjacency:

Adjacency
!!!!!!!!!

.. admonition:: New in v2

   The adjacency class was added in v2 to describe structural variation.

The adjacency class is a core concept for structural variation, representing the junction between
two adjoined sequences. This class can be used on its own (e.g., for the junctions of chimeric transcript fusions)
or in higher-order structures such as :ref:`DerivativeSequence` to represent molecules derived from multiple
adjacencies (e.g., for translocations).

Definition and Information Model
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@

.. include:: ../../def/Adjacency.rst

Implementation Guidance
@@@@@@@@@@@@@@@@@@@@@@@

Sequence Locations and Directionality
#####################################

Structural variants on double-stranded nucleic acids may have an adjoined partner whose sequence is the reverse
complement of its :ref:`SequenceReference`. Such adjacencies are common in structural variation; for example,
they occur at either end of a chromosomal inversion.

To represent this, the :ref:`SequenceLocation` used by each partner of the adjacency is defined using
only one of the `start` or `end` attributes. Defining the location by `start` means that the sequence content
extends right (increases) on the :ref:`SequenceReference`, and defining the location by `end` means that the
sequence content extends left (decreases) on the :ref:`SequenceReference`.
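
As a minimal sketch of this rule (not part of the VRS specification or of any VRS library), the hypothetical
helper `partner_location` below builds a SequenceLocation-like dict that sets only `start` or only `end`,
depending on which way the partner's content extends. Field names assume the VRS 2.x JSON shapes, the
accession strings are placeholders rather than real refget identifiers, and the positions are taken as written
in the simple example below without any coordinate-system conversion.

```python
def partner_location(accession: str, position: int, extends: str) -> dict:
    """Build a SequenceLocation-like dict for one adjacency partner (hypothetical helper).

    extends="right": content increases on the reference, so only `start` is set.
    extends="left":  content decreases on the reference, so only `end` is set.
    """
    location = {
        "type": "SequenceLocation",
        "sequenceReference": {"type": "SequenceReference", "refgetAccession": accession},
    }
    if extends == "right":
        location["start"] = position
    elif extends == "left":
        location["end"] = position
    else:
        raise ValueError("extends must be 'left' or 'right'")
    return location


# Simple example: chromosome 1 extends left from 123, chromosome 2 extends right from 456.
# The accessions are placeholders, not real refget identifiers.
chr1 = partner_location("SQ.placeholder_chr1", 123, "left")
chr2 = partner_location("SQ.placeholder_chr2", 456, "right")
adjacency = {"type": "Adjacency", "adjoinedSequences": [chr1, chr2]}
print(chr1.get("end"), chr2.get("start"))  # 123 456
```

Exactly one coordinate is set per partner, so the choice of attribute itself carries the direction.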

.. figure:: ../../images/ex_simple_breakpoint.png

   **An example of a simple Adjacency.** The chromosome 1 sequence extends left from position 1:123 and so is
   defined by the location `end`. The chromosome 2 sequence extends right from position 2:456 and so is defined
   by the location `start`.

.. figure:: ../../images/ex_revcomp_breakpoint.png

   **An example Adjacency with a reverse complement partner.** The chromosome 1 sequence extends left from
   position 1:87337011 and so is defined by the location `end`. The chromosome 10 sequence *also* extends left
   from position 10:36119127 and so is *also* defined by the location `end`. Reading left to right across this
   adjacency, one therefore sees chromosome 1 reference sequence up to the junction, followed by reverse
   complement sequence from chromosome 10.
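
To make the directionality concrete, here is a minimal sketch that reads the sequence across an adjacency from
left to right, following the `start`/`end` rule described above. It assumes 0-based interbase coordinates, toy
reference strings, and a fixed flank length `k`; the function names are hypothetical and none of this is part
of the VRS schema. The first partner is read toward the junction and the second away from it; a partner whose
content lies on the "other" side of its coordinate is read as a reverse complement.

```python
# Complement table for DNA bases (upper and lower case).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def incoming_flank(ref: str, loc: dict, k: int) -> str:
    """Last k bases of the first partner, read toward the junction."""
    if "end" in loc:
        # Content lies left of `end`: read the reference forward up to the junction.
        return ref[loc["end"] - k:loc["end"]]
    # Content lies right of `start`: it reaches the junction as a reverse complement.
    return revcomp(ref[loc["start"]:loc["start"] + k])


def outgoing_flank(ref: str, loc: dict, k: int) -> str:
    """First k bases of the second partner, read away from the junction."""
    if "start" in loc:
        # Content lies right of `start`: read the reference forward.
        return ref[loc["start"]:loc["start"] + k]
    # Content lies left of `end`: it continues as a reverse complement.
    return revcomp(ref[loc["end"] - k:loc["end"]])


def junction_sequence(ref1: str, loc1: dict, ref2: str, loc2: dict,
                      k: int = 4, linker: str = "") -> str:
    """Sequence read left to right across the adjacency (k bases on each side).

    Assumes k does not run past either end of the toy reference strings.
    """
    return incoming_flank(ref1, loc1, k) + linker + outgoing_flank(ref2, loc2, k)


ref_a = "AAAACCCC"  # toy stand-in for the first reference sequence
ref_b = "GGGGTTTT"  # toy stand-in for the second reference sequence
# First partner by `end`, second by `start`: both read forward.
print(junction_sequence(ref_a, {"end": 4}, ref_b, {"start": 4}))  # AAAATTTT
# Both partners by `end`: reference sequence, then reverse complement.
print(junction_sequence(ref_a, {"end": 4}, ref_b, {"end": 4}))    # AAAACCCC
```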

Normalization
#############

Conventions for ordering sequences and handling ambiguous sequence Adjacencies are described in
:ref:`adjacency-normalization`.

Linker Sequences
################

Intervening sequences between adjoined sequences in an adjacency are called *linker sequences* and may be specified
with a :ref:`SequenceExpression`.
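
A minimal sketch of an adjacency carrying a linker, assuming the VRS 2.x JSON field names and using a
`LiteralSequenceExpression` as the linker (other sequence expression types may also be used). The helper
`make_adjacency`, the placeholder accessions, the positions, and the linker bases `TTAG` are illustrative only.

```python
from typing import Any, Dict, Optional


def make_adjacency(first: Dict[str, Any], second: Dict[str, Any],
                   linker_sequence: Optional[str] = None) -> Dict[str, Any]:
    """Assemble an Adjacency-like dict (hypothetical helper)."""
    adjacency: Dict[str, Any] = {"type": "Adjacency", "adjoinedSequences": [first, second]}
    if linker_sequence:
        # The linker is optional (0..1); here it is a literal sequence expression.
        adjacency["linker"] = {"type": "LiteralSequenceExpression", "sequence": linker_sequence}
    return adjacency


left = {
    "type": "SequenceLocation",
    "sequenceReference": {"type": "SequenceReference", "refgetAccession": "SQ.placeholder_chr1"},
    "end": 123,  # content extends left of the junction
}
right = {
    "type": "SequenceLocation",
    "sequenceReference": {"type": "SequenceReference", "refgetAccession": "SQ.placeholder_chr2"},
    "start": 456,  # content extends right of the junction
}
with_linker = make_adjacency(left, right, "TTAG")
without_linker = make_adjacency(left, right)
```

Omitting the linker simply leaves the attribute out, matching its optional cardinality.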
34 changes: 28 additions & 6 deletions docs/source/conventions/normalization.rst
@@ -185,12 +185,34 @@ the following normalization rules apply:
.. [1] Holmes JB, Moyer E, Phan L, Maglott D, Kattman B.
**SPDI: Data Model for Variants and Applications at NCBI.**
Bioinformatics. 2019. `doi:10.1093/bioinformatics/btz856`_

.. [2] Wagner AH, Babb L, Alterovitz G, Baudis M, Brush M, Cameron DL,
..., Hart RK. **The GA4GH Variation Representation Specification (VRS):
a Computational Framework for the Precise Representation and
Federated Identification of Molecular Variation.**
bioRxiv. 2021. `doi:10.1101/2021.01.15.426843`_

.. _doi:10.1101/2021.01.15.426843: https://doi.org/10.1101/2021.01.15.426843
.. _doi:10.1093/bioinformatics/btz856: https://doi.org/10.1093/bioinformatics/btz856


.. _adjacency-normalization:

Adjacency Normalization
@@@@@@@@@@@@@@@@@@@@@@@

.. admonition:: New in v2

   The adjacency class was added in v2 to describe structural variation.

.. todo: expand on the below text

.. figure:: ../images/ex_sequence_homology.png

   **Describing sequence homology as a region of ambiguity.** Adjacency coordinates may be ambiguous
   when the sequence on either side of the adjacency is homologous. This is addressed by expanding
   the region on both sides; the precise algorithm is yet to be described.

When expressed on a double-stranded nucleic acid molecule, an adjacency can be represented in either a forward
or a reverse orientation. To ensure that computed identifiers for such adjacencies are unique, a convention is
required for determining the preferred orientation. The conventional orientation is selected by applying the
following criteria in order.

1. The first of the adjoined sequences MUST have a forward orientation (location defined by `end`).
2. The adjoined sequence accessions are equal or in ascending lexicographical order.
3. The coordinates defining the adjoined sequences (`start` or `end`) are in ascending numerical order.
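
The ordered criteria can be sketched as a tie-breaking sort over the two possible orderings of the partners.
This is a minimal illustration under stated assumptions, not the normative algorithm: each partner is a
simplified dict with an `accession` and exactly one of `start`/`end`; reversing the orientation is modeled as
swapping the partner order while each partner keeps its `start`/`end` choice (those attributes describe which
side of the reference the content lies on, independent of reading direction); and the accession strings are
placeholders, so their lexicographic order need not match that of real refget accessions.

```python
from typing import Dict, Tuple

Partner = Dict[str, object]


def coordinate(loc: Partner) -> int:
    # Each partner defines exactly one of `start` or `end`.
    return loc["end"] if "end" in loc else loc["start"]


def orientation_key(pair: Tuple[Partner, Partner]) -> tuple:
    first, second = pair
    # False sorts before True, so each entry is "criterion violated?".
    return (
        "end" not in first,                         # 1. first partner defined by `end`
        first["accession"] > second["accession"],  # 2. accessions equal or ascending
        coordinate(first) > coordinate(second),    # 3. coordinates ascending
    )


def normalize_orientation(a: Partner, b: Partner) -> Tuple[Partner, Partner]:
    # Reversing orientation is modeled here as swapping partner order only.
    return min([(a, b), (b, a)], key=orientation_key)


simple = normalize_orientation({"accession": "SQ.chr2", "start": 456},
                               {"accession": "SQ.chr1", "end": 123})
same_chrom = normalize_orientation({"accession": "SQ.x", "end": 500},
                                   {"accession": "SQ.x", "end": 100})
```

Because the sketch applies the first criterion as a preference, it still returns an ordering when neither
candidate has an `end`-defined first partner; how such cases should be treated is not stated in this section.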

Binary file modified docs/source/images/ex_revcomp_breakpoint.png
Binary file modified docs/source/images/ex_sequence_homology.png
Binary file modified docs/source/images/ex_simple_breakpoint.png
6 changes: 5 additions & 1 deletion schema/vrs/def/Adjacency.rst
@@ -46,9 +46,13 @@ Some Adjacency attributes are inherited from :ref:`Variation`.
-
* - adjoinedSequences
- :ref:`IRI` | :ref:`Location`
- - 1..2
+ - 2..2
- The terminal sequence or pair of adjoined sequences that defines the adjacency.
* - linker
- :ref:`SequenceExpression`
- 0..1
- The sequence found between adjoined sequences.
* - homology
- boolean
- 0..1
- A flag indicating if coordinate ambiguity in the adjoined sequences is from sequence homology (true) or other uncertainty (false).
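
The revised constraints in this table (exactly two adjoined sequences; an optional boolean `homology` flag) can
be checked with a minimal pure-Python sketch. The function `check_adjacency` is hypothetical: it mirrors only
these two constraints and is not the project's schema validator.

```python
from typing import Any, Dict, List


def check_adjacency(obj: Dict[str, Any]) -> List[str]:
    """Return violations of the two constraints sketched here (hypothetical helper)."""
    errors: List[str] = []
    seqs = obj.get("adjoinedSequences")
    # Cardinality is now 2..2, so a lone terminal partner is no longer valid.
    if not isinstance(seqs, list) or len(seqs) != 2:
        errors.append("adjoinedSequences must contain exactly 2 items")
    # `homology` is optional, but when present it must be a boolean.
    if "homology" in obj and not isinstance(obj["homology"], bool):
        errors.append("homology must be a boolean")
    return errors


pair = {"type": "Adjacency", "adjoinedSequences": [{"end": 123}, {"start": 456}], "homology": True}
terminal = {"type": "Adjacency", "adjoinedSequences": [{"end": 123}]}
print(check_adjacency(pair))      # []
print(check_adjacency(terminal))  # ['adjoinedSequences must contain exactly 2 items']
```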
2 changes: 1 addition & 1 deletion schema/vrs/def/SequenceReference.rst
@@ -47,4 +47,4 @@ Some SequenceReference attributes are inherited from :ref:`gks.core:Entity`.
* - circular
- boolean
- 0..1
- - A boolean indicating whether a sequence is circular (true) or linear (false).
+ - A boolean indicating whether the molecule represented by the sequence is circular (true) or linear (false).
8 changes: 7 additions & 1 deletion schema/vrs/json/Adjacency
@@ -66,7 +66,7 @@
]
},
"description": "The terminal sequence or pair of adjoined sequences that defines the adjacency.",
- "minItems": 1,
+ "minItems": 2,
"maxItems": 2
},
"linker": {
@@ -82,6 +82,12 @@
"$ref": "/ga4gh/schema/vrs/2.x/json/ReferenceLengthExpression"
}
]
},
"homology": {
"type": "boolean",
"maturity": "draft",
"$comment": "This flag is under active discussion; see github.com/ga4gh/vrs/discussions/489",
"description": "A flag indicating if coordinate ambiguity in the adjoined sequences is from sequence homology (true) or other uncertainty (false)."
}
},
"required": [
2 changes: 1 addition & 1 deletion schema/vrs/json/SequenceReference
@@ -49,7 +49,7 @@
},
"circular": {
"type": "boolean",
- "description": "A boolean indicating whether a sequence is circular (true) or linear (false)."
+ "description": "A boolean indicating whether the molecule represented by the sequence is circular (true) or linear (false)."
}
},
"required": [
27 changes: 9 additions & 18 deletions schema/vrs/vrs-source.yaml
@@ -582,28 +582,19 @@ $defs:
- $refCurie: gks.common:IRI
- $ref: '#/$defs/Location'
description: The terminal sequence or pair of adjoined sequences that defines the adjacency.
- minItems: 1
+ minItems: 2
maxItems: 2
linker:
$ref: '#/$defs/SequenceExpression'
description: The sequence found between adjoined sequences.
- # homology:
- # # Only valid for breakends=2
- # type: boolean
- # default: false
- # description:
- # A flag indicating whether the location interval of the breakend
- # is due to the sequences at the breakends being homologous or
- # whether the interval is due to uncertainty regarding the actual
- # locations of the breakends.
- # terminal:
- # # TODO: can the schema encode a constraint that a terminal breakend cannot
- # # be part of a breakpoint?
- # type: boolean
- # default: false
- # description:
- # # Only valid for breakends=1
- # Indicates the end of the molecule
+ homology:
+ type: boolean
+ maturity: draft
+ $comment: This flag is under active discussion; see github.com/ga4gh/vrs/discussions/489
+ description: >-
+ A flag indicating if coordinate ambiguity in the adjoined sequences is from sequence homology
+ (true) or other uncertainty (false).

required:
- adjoinedSequences

10 changes: 5 additions & 5 deletions tests/test_definitions.yaml
@@ -9,11 +9,11 @@ tests:
image: ../../docs/images/ex_revcomp_breakpoint.png
schema: vrs
definition: Adjacency
- - test_file: terminal_breakend.yaml
- description: An adjacency with only the starting sequence location. defining the break at which the adjacency ends or terminates.
- image: ../../docs/images/ex_terminal_breakend.png
- schema: vrs
- definition: Adjacency
+ # - test_file: terminal_breakend.yaml
+ # description: An adjacency with only the starting sequence location. defining the break at which the adjacency ends or terminates.
+ # image: ../../docs/images/ex_terminal_breakend.png
+ # schema: vrs
+ # definition: Adjacency
- test_file: sequence_homology.yaml
description: An adjacency in which the two sequence locations have homologous, overlapping adjoined sequences.
image: ../../docs/images/ex_sequence_homology.png