In [None]:
from maws.dna_structure import load_dna_structure
from maws.rna_structure import load_rna_structure

dna = load_dna_structure()
rna = load_rna_structure()

ModuleNotFoundError: No module named 'heidelberg_maws'

# 1) What `dna_structure()` passes in

`DNA.py` defines six tables and then calls:

```python
Structure.Structure(
    RESIDUE_NAMES,          # Sequence[str]
    RESIDUE_LENGTH,         # Sequence[int]
    rotating_elements=ROTATIONS,     # Sequence[(name, start, bond, end_or_None)]
    backbone_elements=BACKBONE,      # Sequence[(name, start, middle_pre, bond, middle_post, end)]
    connect=CONNECT,                 # Per-residue [[append_first, append_last],[prepend_last, prepend_first], append_len, prepend_len]
    residue_path=residue_path,       # None (so no LEaP lines)
    alias=ALIASES                    # [name, alone, start, middle, end]
)
```

Because we called `dna_structure()` with no argument, `residue_path=None`.


In [2]:
from pprint import pprint

pprint(vars(dna), width=120, compact=True, sort_dicts=False)

{'residue_names': ['DGN', 'DAN', 'DTN', 'DCN', 'DG', 'DA', 'DT', 'DC', 'DG5', 'DA5', 'DT5', 'DC5', 'DG3', 'DA3', 'DT3',
                   'DC3'],
 'residue_path': None,
 'init_string': '',
 'residue_length': {'DGN': 32,
                    'DAN': 31,
                    'DTN': 31,
                    'DCN': 29,
                    'DG': 33,
                    'DA': 32,
                    'DT': 32,
                    'DC': 30,
                    'DG5': 31,
                    'DA5': 30,
                    'DT5': 30,
                    'DC5': 28,
                    'DG3': 34,
                    'DA3': 33,
                    'DT3': 33,
                    'DC3': 31},
 'connect': {'DGN': [[0, -1], [-2, 0], 1.6, 1.6],
             'DAN': [[0, -1], [-2, 0], 1.6, 1.6],
             'DTN': [[0, -1], [-2, 0], 1.6, 1.6],
             'DCN': [[0, -1], [-2, 0], 1.6, 1.6],
             'DG': [[0, -1], [-2, 0], 1.6, 1.6],
             'DA': [[0, -1], [-2, 0], 1.6, 1.6],
             'DT': 

# 2) How `Structure.__init__` builds the object

Inside the constructor, those inputs are transformed as follows.

## a) Names and LEaP init

* `self.residue_names = list(RESIDUE_NAMES)`
* `self.residue_path = None` (because we didn’t pass a path)
* `self.init_string = ""`
  Since `residue_path is None`, it doesn’t append any `loadoff`/`loadamberparams` lines.

Result you see:

```python
'residue_names': ['DGN', 'DAN', ..., 'DC3'],
'residue_path': None,
'init_string': '',
```

## b) Lengths

* `self.residue_length` becomes a dict mapping each name to its atom count, taken from `RESIDUE_LENGTH`, aligned by index.

Result (sample):

```python
'residue_length': {'DGN': 32, 'DAN': 31, ... 'DC3': 31}
```
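The index alignment can be sketched with `zip`. This is only an illustration: `build_residue_length` is a hypothetical helper name, not the constructor's actual code, and the two short lists are a subset of the values printed above.

```python
def build_residue_length(names, lengths):
    """Pair each residue name with its atom count, matching by position."""
    if len(names) != len(lengths):
        raise ValueError("names and lengths must have the same number of entries")
    return dict(zip(names, lengths))


# Subset of the DNA values shown in the printout above.
names = ["DGN", "DAN", "DTN", "DCN"]
lengths = [32, 31, 31, 29]

residue_length = build_residue_length(names, lengths)
print(residue_length["DCN"])  # 29
```

The length check is an addition of this sketch; it catches tables that have drifted out of sync before a wrong atom count can propagate into index resolution.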

## c) Connectivity

* If `connect` is provided, it copies each entry by residue index into a dict:

  ```python
  self.connect[res] = [[0,-1], [-2,0], 1.6, 1.6]
  ```

  These are polymerization rules:

  * the “append” bond joins the new residue’s atom `0` to the old tail residue’s atom `-1`
  * the “prepend” bond joins the new residue’s atom `-2` to the old head residue’s atom `0`
  * both target bond lengths are set to `1.6 Å` here

Result (same for every residue here):

```python
'connect': {'DGN': [[0, -1], [-2, 0], 1.6, 1.6], ...}
```

## d) Alias mapping

* Initializes identity mapping `[alone, start, middle, end] = [name]*4`
* Then overwrites with `ALIASES` rows of the form:
  `['A', 'DAN', 'DA5', 'DA', 'DA3']` etc.
* Stored internally **without** the first column (key), so:

  ```python
  self.alias['A'] = ['DAN', 'DA5', 'DA', 'DA3']
  ```

Result (sample):

```python
'alias': {
  'A': ['DAN','DA5','DA','DA3'],
  'C': ['DCN','DC5','DC','DC3'],
  'G': ['DGN','DG5','DG','DG3'],
  'T': ['DTN','DT5','DT','DT3'],
  # plus entries for the full deoxy names themselves (DA, DG, DC, etc.)
}
```
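A minimal sketch of the two-step alias construction described above (identity mapping first, then overrides from the alias rows). The function name `build_alias` and the shortened input lists are assumptions made for illustration; the real constructor may differ in detail.

```python
def build_alias(residue_names, alias_rows):
    """Map each name to [alone, start, middle, end], then apply overrides."""
    # Identity mapping: every residue initially aliases to itself in all four roles.
    alias = {name: [name] * 4 for name in residue_names}
    # Each row is [key, alone, start, middle, end]; the key is dropped on storage.
    for key, *forms in alias_rows:
        if len(forms) != 4:
            raise ValueError(f"alias row for {key!r} needs exactly four forms")
        alias[key] = list(forms)
    return alias


residue_names = ["DAN", "DA", "DA5", "DA3"]
alias_rows = [["A", "DAN", "DA5", "DA", "DA3"]]

alias = build_alias(residue_names, alias_rows)
print(alias["A"])   # ['DAN', 'DA5', 'DA', 'DA3']
print(alias["DA"])  # ['DA', 'DA', 'DA', 'DA']
```

The identity entries are why the printed `alias` dict also contains the full residue names: a residue that is never overridden simply maps to itself in every position.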

## e) Rotations

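* Starts each residue with a sentinel list `[None]` meaning “no rotations yet”; the sentinel does not appear in the printed result below, so it is evidently replaced once real rotations are added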
* For each `(residue, start, bond, end_or_None)` in `ROTATIONS` it appends `[start, bond, end_or_None]`
* **Negative indices are kept** here; normalization happens later (the rotation caller will use the residue length to resolve negatives).

Result (sample):

```python
'rotating_elements': {
  'DG': [[0,3,None], [3,4,None], [10,12,-6], [-6,-1,None]],
  ...
}
```

Interpretation: each triple `[start, bond, end]` describes a subchain that may be rotated about the bond between atoms `start` and `bond` (the code that consumes the table decides the exact semantics), spanning up to `end` (or to the end of the residue if `end is None`). Negative indices mean “count from the end” (Python-style).
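The grouping step can be sketched as follows. This is an illustration under assumptions: `build_rotations` is a hypothetical name, the input rows are reconstructed from the printed `DG` result rather than copied from `DNA.py`, and an empty list stands in for the `[None]` sentinel to keep the sketch short.

```python
def build_rotations(residue_names, rows):
    """Group (residue, start, bond, end_or_None) rows by residue, keeping negatives."""
    # Simplification: an empty list plays the role of the "no rotations yet" sentinel.
    rotating = {name: [] for name in residue_names}
    for residue, start, bond, end in rows:
        rotating[residue].append([start, bond, end])
    return rotating


# Rows reconstructed from the DG entry in the printout (assumed input layout).
rows = [
    ("DG", 0, 3, None),
    ("DG", 3, 4, None),
    ("DG", 10, 12, -6),
    ("DG", -6, -1, None),
]

rotating_elements = build_rotations(["DG", "DA"], rows)
print(rotating_elements["DG"])
# [[0, 3, None], [3, 4, None], [10, 12, -6], [-6, -1, None]]
```

Note that nothing here touches the residue length: the negative indices are stored exactly as given.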

## f) Backbone elements

* For each backbone tuple `(residue, start, middle_pre, bond, middle_post, end)`, it **normalizes negatives to positive indices** using the residue’s atom count, then stores:

  ```python
  self.backbone_elements[residue] = [[start, middle_pre, bond], [middle_post, end]]
  ```

  In the DNA table, these are already non-negative, so the stored values match the input.

Result (sample):

```python
'backbone_elements': {
  'DG': [[0,10,12], [27,32]],
  ...
}
```

Interpretation: that’s a compact way to encode the 5′→3′ backbone anchor atoms for each residue: left triplet `[start, middle_pre, bond]` and right pair `[middle_post, end]`.

---

### A) Sequence building via aliases

Goal: turn an abstract sequence “A C G” into LEaP-ready residue names with correct 5′/3′ end forms.

Rules in `translate()`:

* If there’s one token: use `alias[name][0]` → “alone”.
* If there are many:

  * first uses `alias[first][1]` → “start”
  * middle tokens use `alias[token][2]` → “middle”
  * last uses `alias[last][3]` → “end”

Example:

```python
dna.translate("A C G")
# alias['A'] -> ['DAN','DA5','DA','DA3']   → start="DA5"
# alias['C'] -> ['DCN','DC5','DC','DC3']  → middle="DC"
# alias['G'] -> ['DGN','DG5','DG','DG3']  → end="DG3"
# RESULT: "DA5 DC DG3"
```

So the translated string is exactly what a LEaP script expects for terminals and internal residues.
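The rules above fit in a few lines. The following is a standalone sketch, not the library's `translate()` method: it takes the alias table as an explicit argument and assumes tokens are separated by whitespace.

```python
def translate(sequence, alias):
    """Convert space-separated tokens into end-aware residue names."""
    tokens = sequence.split()
    if not tokens:
        return ""
    if len(tokens) == 1:
        return alias[tokens[0]][0]                 # "alone" form
    first = alias[tokens[0]][1]                    # "start" form
    middle = [alias[t][2] for t in tokens[1:-1]]   # "middle" forms
    last = alias[tokens[-1]][3]                    # "end" form
    return " ".join([first, *middle, last])


alias = {
    "A": ["DAN", "DA5", "DA", "DA3"],
    "C": ["DCN", "DC5", "DC", "DC3"],
    "G": ["DGN", "DG5", "DG", "DG3"],
}

print(translate("A C G", alias))  # DA5 DC DG3
print(translate("G", alias))      # DGN
```

A two-token sequence has no middle residues, so it yields just the start and end forms, for example `translate("A G", alias)` gives `"DA5 DG3"`.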

In [5]:
# 1) Human-friendly aliasing (you already have this via translate)
dna.translate("A C G")  # -> "DA5 DC DG3"

'DA5 DC DG3'

In [6]:
# 2) Make indices obvious
dna.resolve_index("DG", -1)  # -> 32  (DG length is 33)

32

### B) Polymerization (how residues “glue”)

Use `connect[residue]`:

```
[[append_first, append_last], [prepend_last, prepend_first], append_len, prepend_len]
```

* To **append** a new residue Y to the right end (tail) of an existing chain X:

  * Bond new Y’s atom `append_first` (typically 0) to the **last atom** of X’s last residue (`append_last = -1`)
  * Desired bond length `append_len` (1.6 Å)

* To **prepend** a residue Y to the left end (head) of an existing chain X:

  * Bond new Y’s atom `prepend_last` (typically -2) to the **first atom** of X’s first residue (`prepend_first = 0`)
  * Desired bond length `prepend_len` (1.6 Å)

Example: appending “DA” to the right end of a chain whose last residue is “DC” (length 30):

* `connect['DA'][0]` → `[0, -1]`
  → bond DA atom 0 to DC atom `-1` → absolute index `30-1 = 29`.
* length uses `connect['DA'][2]` → `1.6`.

The negative indices in `connect` are resolved at polymerization time using the appropriate residue length (`residue_length[name]`).
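As a sketch of that resolution step, the helpers below turn a `connect` entry into absolute atom indices. The function names and signatures are hypothetical (they are not the `Structure` methods), and each index is resolved against the length of the residue it belongs to.

```python
def resolve_index(index, length):
    """Convert a possibly negative atom index into an absolute one."""
    return index + length if index < 0 else index


def resolve_append_bond(connect_entry, new_length, prev_length):
    """Return (atom in new residue, atom in previous residue, target length)."""
    (append_first, append_last), _prepend, append_len, _prepend_len = connect_entry
    return (
        resolve_index(append_first, new_length),
        resolve_index(append_last, prev_length),
        append_len,
    )


def resolve_prepend_bond(connect_entry, new_length, next_length):
    """Return (atom in new residue, atom in following residue, target length)."""
    _append, (prepend_last, prepend_first), _append_len, prepend_len = connect_entry
    return (
        resolve_index(prepend_last, new_length),
        resolve_index(prepend_first, next_length),
        prepend_len,
    )


entry = [[0, -1], [-2, 0], 1.6, 1.6]      # connect['DA'] from the printout
residue_length = {"DA": 32, "DC": 30}

# Append DA after a chain ending in DC: DA atom 0 bonds to DC atom 29.
print(resolve_append_bond(entry, residue_length["DA"], residue_length["DC"]))   # (0, 29, 1.6)
# Prepend DA before a chain starting with DC: DA atom 30 bonds to DC atom 0.
print(resolve_prepend_bond(entry, residue_length["DA"], residue_length["DC"]))  # (30, 0, 1.6)
```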


In [7]:
# 3) Polymerization glue points you can print / log
dna.append_bond("DA")  # -> (0, -1, 1.6)  # old index relative to previous residue
dna.append_bond("DA", prev_residue_length=30)  # -> (0, 29, 1.6)

(0, 29, 1.6)

### C) Rotating elements / torsions


Take `dna.rotating_elements["DG"]`:

```python
[[0, 3, None], [3, 4, None], [10, 12, -6], [-6, -1, None]]
```

With `len(DG)=33`, the engine could interpret these as:

* Axis bond(0–3), rotate atoms 3..32
* Axis bond(3–4), rotate atoms 4..32
* Axis bond(10–12), rotate atoms 12..27
* Axis bond(27–32), rotate atoms 32..32 (just a terminal twist), or more broadly the subgraph from 32 back to 27, depending on flood-fill direction

(Exact moving-set details depend on the rotation implementation; the triple gives it the **bond** and the **span** to consider.)


### The mental model (hinge + door)

* Pick **two atoms that share a bond**: call them `i` and `j`.
* The **axis** is the straight line through that bond (vector = `coords[j] - coords[i]`, unit-normalized).
* You **hold the `i` side fixed** and **rotate the `j` side** (a bunch of atoms connected “downstream” of `j`) around that bond by some angle θ.
* That whole rotating bunch moves as a rigid body around that axis—exactly like a door on a hinge.
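The hinge picture corresponds to a rotation about an arbitrary axis, which can be written with Rodrigues' rotation formula. The sketch below uses plain Python lists so that it runs without dependencies; the function name and the toy coordinates are illustrative assumptions, not taken from the MAWS code.

```python
import math


def rotate_about_axis(point, axis_a, axis_b, theta):
    """Rotate `point` by `theta` radians about the line from axis_a to axis_b."""
    # Unit vector along the hinge bond.
    k = [b - a for a, b in zip(axis_a, axis_b)]
    norm = math.sqrt(sum(c * c for c in k))
    if norm == 0:
        raise ValueError("axis atoms must not coincide")
    k = [c / norm for c in k]
    # Work relative to a point on the axis so the axis passes through the origin.
    v = [p - a for p, a in zip(point, axis_a)]
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    k_dot_v = sum(kc * vc for kc, vc in zip(k, v))
    k_cross_v = [
        k[1] * v[2] - k[2] * v[1],
        k[2] * v[0] - k[0] * v[2],
        k[0] * v[1] - k[1] * v[0],
    ]
    # Rodrigues: v*cos(t) + (k x v)*sin(t) + k*(k.v)*(1 - cos(t))
    rotated = [
        vc * cos_t + cc * sin_t + kc * k_dot_v * (1 - cos_t)
        for vc, cc, kc in zip(v, k_cross_v, k)
    ]
    return [r + a for r, a in zip(rotated, axis_a)]


# Hinge along the z-axis: a quarter turn takes (1, 0, 0) to (0, 1, 0).
new_point = rotate_about_axis([1.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 1.0], math.pi / 2)
print([round(c, 6) for c in new_point])
```

Applying the same function to every atom on the rotating side moves that side as a rigid body, while atoms lying on the axis itself stay where they are.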

In our data, each rotation is a triple:

```
[start, bond, end_or_None]   # i, j, k  (using i=start, j=bond, k=end)
```

* The **hinge bond** is `(i—j) = (start—bond)`.
* The **rotating side** is the subgraph of atoms reachable from `j` **without crossing** back over the hinge into `i`.
* The optional **`end`** says “don’t rotate past this atom”; if `end` is `None`, rotate **all the way to the residue’s terminus** on that side.

Negative indices (like `-1`) just mean “count from the end of this residue,” so at use-time we convert with:

```
if idx < 0: idx += residue_length
```

> Those “3..32” notations you saw were shorthand. In reality we rotate a **connected subgraph**, not a numeric range.
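To make “connected subgraph” concrete, here is a breadth-first search over a toy bond list. Everything in it is an assumption for illustration: the bond list is invented, `rotating_side` is a hypothetical helper, and treating `stop` as “include this atom but do not walk past it” is one possible reading of the `end` field.

```python
from collections import deque


def rotating_side(bonds, i, j, stop=None):
    """Atoms reachable from j without passing through hinge atom i."""
    neighbours = {}
    for a, b in bonds:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    seen = {j}
    queue = deque([j])
    while queue:
        atom = queue.popleft()
        if atom == stop:
            continue  # the stop atom is included, but the walk does not go past it
        for nxt in neighbours.get(atom, ()):
            if nxt == i or nxt in seen:
                continue  # never step onto the fixed side of the hinge
            seen.add(nxt)
            queue.append(nxt)
    return sorted(seen)


# Toy molecule: a chain 0-1-2-3-4 with a branch 2-5.
bonds = [(0, 1), (1, 2), (2, 3), (3, 4), (2, 5)]

print(rotating_side(bonds, 1, 2))          # [2, 3, 4, 5]
print(rotating_side(bonds, 1, 2, stop=3))  # [2, 3, 5]
```

The branch atom `5` moves in both cases even though it is not in a contiguous index range with the rest, which is exactly why a numeric range is only shorthand.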



### Why “normalization” and when

* **Backbone anchors** are normalized to non-negative indices **immediately** (constructor) because later code dereferences them directly.
* **Rotations** keep negatives **as-is** until the mover actually uses them; at that moment it knows the residue length and can resolve `-6 → 27`, etc.

That’s all “normalization” means here: turning negative indices into absolute (≥0) indices.
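The eager versus lazy distinction can be sketched side by side. The names `resolve` and `torsions`, and the backbone row written with negative tail indices, are hypothetical; the `DG` length and rotation triples come from the printout.

```python
def resolve(idx, length):
    """Map a possibly negative index to an absolute one; None passes through."""
    if idx is None:
        return None
    return idx + length if idx < 0 else idx


LENGTH = {"DG": 33}

# Eager: a backbone row is resolved once, when the object is built.
backbone_row = ("DG", 0, 10, 12, -6, -1)  # hypothetical row with negative tail indices
name, *indices = backbone_row
start, middle_pre, bond, middle_post, end = (resolve(i, LENGTH[name]) for i in indices)
backbone_elements = {name: [[start, middle_pre, bond], [middle_post, end]]}

# Lazy: rotation triples are stored as given and resolved only when they are used.
rotating_elements = {"DG": [[0, 3, None], [3, 4, None], [10, 12, -6], [-6, -1, None]]}


def torsions(residue):
    """Resolve a residue's rotation triples at the moment they are needed."""
    length = LENGTH[residue]
    return [tuple(resolve(i, length) for i in triple) for triple in rotating_elements[residue]]


print(backbone_elements["DG"])  # [[0, 10, 12], [27, 32]]
print(torsions("DG"))           # [(0, 3, None), (3, 4, None), (10, 12, 27), (27, 32, None)]
```

The stored rotation table is left untouched by `torsions`, so the negative indices remain available for inspection.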


### Walkthrough on one residue (DG, length = 33)

`dna.rotating_elements["DG"] ==`

```
[
  [0, 3, None],
  [3, 4, None],
  [10, 12, -6],
  [-6, -1, None],
]
```

1. `[0, 3, None]`

   * Hinge bond: atoms `0—3` (axis along that bond).
   * Rotating side: everything reachable from atom `3` without crossing back into atom `0`.
   * `end=None` ⇒ rotate the **entire tail** on that side (to the residue end).

2. `[3, 4, None]`

   * Hinge: `3—4`
   * Rotating side: connected subgraph from `4` outward (can include base + sugar + phosphate further along).
   * `end=None` ⇒ again, rotate the full downstream side.

3. `[10, 12, -6]`  (with `L=33`, `-6 → 27`)

   * Hinge: `10—12`
   * Rotating side: from `12` outward, **but stop at atom 27** (inclusive).
     This defines a **local** torsion that doesn’t twist the very far tail.

4. `[-6, -1, None]`  (maps to hinge `27—32`)

   * Hinge: `27—32`
   * Rotating side: from `32` outward (essentially the terminal tip).
   * `end=None` ⇒ rotate that terminal side fully (a small end/tail twist).

We can imagine these four as progressively “closer to the end” backbone torsions—first broader (big chunk rotates), then more local.

In [3]:
dna.rotating_elements["DG"]

[[0, 3, None], [3, 4, None], [10, 12, -6], [-6, -1, None]]

In [8]:
# 4) Torsions without mental math on negatives
dna.torsions("DG")
# -> [(0,3,None), (3,4,None), (10,12,27), (27,32,None)]

[(0, 3, None), (3, 4, None), (10, 12, 27), (27, 32, None)]

### D) Backbone anchors

Example for “DG”:

```python
dna.backbone_elements['DG'] == [[0, 10, 12], [27, 32]]
```

We can read this as:

* “left side of the backbone” passes through atoms 0 → 10 → 12,
* “right side of the backbone” passes through 27 → 32.

Downstream code can use these anchors to compute dihedrals, place bonds, or define rigid segments during moves.


In [11]:
dna.backbone_elements["DG"]

[[0, 10, 12], [27, 32]]


# 4) A quick “under the hood” recap against the printout

* The big dict printed above is just the object’s state:

  * `residue_names` — the ordered universe of residue templates.
  * `residue_length` — atom counts for index normalization and bounds checking.
  * `connect` — how to bond head/tail atoms and what bond length to target.
  * `alias` — mapping to terminal/middle forms for LEaP sequences.
  * `rotating_elements` — lists of rotatable subchains per residue (indices may be negative here).
  * `backbone_elements` — pre-normalized, always non-negative anchor indices per residue.
  * `init_string` — empty because no path was provided.

# RNA.py

In [10]:
pprint(vars(rna), width=120, compact=True, sort_dicts=False)

{'residue_names': ['GN', 'AN', 'UN', 'CN', 'G', 'A', 'U', 'C', 'G5', 'A5', 'U5', 'C5', 'G3', 'A3', 'U3', 'C3'],
 'residue_path': None,
 'init_string': '',
 'residue_length': {'GN': 33,
                    'AN': 32,
                    'UN': 29,
                    'CN': 30,
                    'G': 34,
                    'A': 33,
                    'U': 30,
                    'C': 31,
                    'G5': 32,
                    'A5': 31,
                    'U5': 28,
                    'C5': 29,
                    'G3': 35,
                    'A3': 34,
                    'U3': 31,
                    'C3': 32},
 'connect': {'GN': [[0, -1], [-2, 0], 1.6, 1.6],
             'AN': [[0, -1], [-2, 0], 1.6, 1.6],
             'UN': [[0, -1], [-2, 0], 1.6, 1.6],
             'CN': [[0, -1], [-2, 0], 1.6, 1.6],
             'G': [[0, -1], [-2, 0], 1.6, 1.6],
             'A': [[0, -1], [-2, 0], 1.6, 1.6],
             'U': [[0, -1], [-2, 0], 1.6, 1.6],
             'C': [[0, -1], 