# Lab 2: demo 📓

The purpose of this lab is to decouple parsing and writing from the creation of the `RNA_Molecule` object. This is an encapsulated change: it does not alter how the user interacts with the utility, but it makes the interface more flexible and less susceptible to errors caused by future changes.
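
To make the idea of decoupling concrete, here is a minimal sketch of a facade that hands reading and writing to format-specific helpers. Every name in it (`ToyMolecule`, `ToyXYZParser`, `ToyXYZWriter`, `ToyIO`, and the made-up `XYZ` format) is hypothetical and chosen only for illustration; it is not the actual `RNA_IO` implementation, which may be organised differently.

```python
class ToyMolecule:
    """Stand-in for a molecule object: just an id and a list of (x, y, z) tuples."""
    def __init__(self, entry_id, coords):
        self.entry_id = entry_id
        self.coords = coords


class ToyXYZParser:
    """Hypothetical parser for a made-up 'XYZ' format: one 'x y z' triple per line."""
    def read(self, text, entry_id):
        coords = [
            tuple(float(value) for value in line.split())
            for line in text.splitlines()
            if line.strip()
        ]
        return ToyMolecule(entry_id, coords)


class ToyXYZWriter:
    """Hypothetical writer producing the same made-up 'XYZ' format."""
    def write(self, molecule):
        return "\n".join(f"{x:.3f} {y:.3f} {z:.3f}" for x, y, z in molecule.coords)


class ToyIO:
    """Facade: the caller names a format, the matching helper does the work."""
    def __init__(self):
        self.parsers = {"XYZ": ToyXYZParser()}
        self.writers = {"XYZ": ToyXYZWriter()}

    def read(self, text, fmt, entry_id="toy"):
        return self.parsers[fmt].read(text, entry_id)

    def write(self, molecule, fmt):
        return self.writers[fmt].write(molecule)


toy_io = ToyIO()
toy_mol = toy_io.read("1 2 3\n4 5 6", "XYZ")
round_trip = toy_io.write(toy_mol, "XYZ")
print(round_trip)
```

With this layout, supporting another format would mean registering one more parser and writer in the dictionaries, while the molecule class and the calling code stay unchanged.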

In [1]:
# -- necessary setup

import os,sys
sys.path.append(os.path.abspath('../src'))

from IO.RNA_IO import RNA_IO
from utils import pathify_pdb

## Reading from PDB File

In [2]:
rna_io=RNA_IO()
pdb_path=pathify_pdb('7EAF')
mol=rna_io.read(pdb_path, "PDB",array=False)
len(mol.get_models()[0].get_chains()['A'].get_residues())

Downloading PDB structure '7eaf'...
>> initializing new species: CALDANAEROBACTER SUBTERRANEUS SUBSP. TENGCONGENSIS <<


94

In [4]:
# array defaults to True, so passing it explicitly here is optional

rna_io=RNA_IO()
pdb_path=pathify_pdb('7EAF')
mol=rna_io.read(pdb_path, "PDB",array=True)
mol.shape
mol

Downloading PDB structure '7eaf'...
> note: Species with this name already exists, will return the same instance


array([[[ 0.0000e+00,  0.0000e+00,  0.0000e+00],
        [-1.0238e+01,  5.7800e+00, -3.8326e+01],
        [-1.2861e+01,  2.8200e+00, -4.0280e+01],
        [-1.4758e+01, -1.4300e-01, -4.0792e+01],
        [-1.5175e+01, -4.5880e+00, -4.1549e+01],
        [-1.5237e+01, -8.6290e+00, -4.0425e+01],
        [-1.2005e+01, -1.2462e+01, -3.6764e+01],
        [-1.3476e+01, -1.4405e+01, -3.4007e+01],
        [-1.4716e+01, -1.4456e+01, -2.9704e+01],
        [-1.1051e+01, -2.0912e+01, -2.5748e+01],
        [-6.7140e+00, -2.1716e+01, -2.4340e+01],
        [-3.9300e+00, -1.9415e+01, -2.2856e+01],
        [-7.0700e-01, -1.7054e+01, -2.0657e+01],
        [ 7.5130e+00, -1.4439e+01, -1.5280e+01],
        [ 1.2791e+01, -1.9966e+01, -7.1150e+00],
        [ 1.1765e+01, -1.1560e+01, -1.3740e+01],
        [ 1.2316e+01, -6.9690e+00, -1.3006e+01],
        [ 1.0237e+01, -2.9770e+00, -1.2122e+01],
        [ 9.5470e+00, -8.9500e-01, -9.2520e+00],
        [ 6.4660e+00,  4.3400e-01, -6.3550e+00],
        [ 2.7000e-02

## Writing the Output to a PDB-like Format File

In [5]:
rna_io.write(mol, "demo.pdb",'PDB')

# with open("demo.pdb", "r") as f:
#     print(f.read())

 -- writing from array -- 
RNA molecule written to demo.pdb
HEADER    RNA                                                 ????
ATOM      1 X1   UNK X  -1     -10.238   5.780 -38.326                      X1  
ATOM      2 X2   UNK X  -1     -12.861   2.820 -40.280                      X2  
ATOM      3 X3   UNK X  -1     -14.758  -0.143 -40.792                      X3  
ATOM      4 X4   UNK X  -1     -15.175  -4.588 -41.549                      X4  
ATOM      5 X5   UNK X  -1     -15.237  -8.629 -40.425                      X5  
ATOM      6 X6   UNK X  -1     -12.005 -12.462 -36.764                      X6  
ATOM      7 X7   UNK X  -1     -13.476 -14.405 -34.007                      X7  
ATOM      8 X8   UNK X  -1     -14.716 -14.456 -29.704                      X8  
ATOM      9 X9   UNK X  -1     -11.051 -20.912 -25.748                      X9  
ATOM     10 X10  UNK X  -1      -6.714 -21.716 -24.340                      X10  
ATOM     11 X11  UNK X  -1      -3.930 -19.415 -22.856        

To inspect the output, open [`demo.pdb`](./demo.pdb) in the same directory as this notebook (`demo/`).

## Reading from the Written File

In [16]:
mol1=rna_io.read("demo.pdb", "PDB")

> note: Species with this name already exists, will return the same instance
RNA_Molecule 7EAF already exists in the species CALDANAEROBACTER SUBTERRANEUS SUBSP. TENGCONGENSIS; not added again


## Writing the Content Again to Another File
To verify the integrity of the reading and writing functions, the content will be written to a second file. By comparing the two files, we can ensure that the content is identical, confirming that the functions are correctly implemented and that the PDB format is followed accurately.

In [20]:
rna_io.write(mol1, "demo1.pdb",'PDB')

# with open("demo1.pdb", "r") as f:
#     print(f.read())

RNA molecule written to demo1.pdb


Instead of printing, check out the file content in [`demo1.pdb`](./demo1.pdb), found in the current directory.
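
The file comparison mentioned in this section can also be done programmatically. The following is a minimal sketch using only the standard library; `diff_text_files` is a helper name introduced here for illustration and is not part of the lab's code. It assumes both files are plain text, and it makes no claim about whether `demo.pdb` and `demo1.pdb` actually match.

```python
import difflib
from pathlib import Path


def diff_text_files(path_a, path_b):
    """Return unified-diff lines for two text files; an empty list means identical content."""
    lines_a = Path(path_a).read_text().splitlines(keepends=True)
    lines_b = Path(path_b).read_text().splitlines(keepends=True)
    return list(
        difflib.unified_diff(lines_a, lines_b, fromfile=str(path_a), tofile=str(path_b))
    )


# Run the comparison only if both files written by the cells above are present.
if Path("demo.pdb").exists() and Path("demo1.pdb").exists():
    differences = diff_text_files("demo.pdb", "demo1.pdb")
    if differences:
        print("".join(differences))
    else:
        print("demo.pdb and demo1.pdb have identical content")
```

A line-level diff is used instead of a plain yes/no check so that any mismatch (for example in a header line) is shown directly.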

## Coarse-Grained 3D Representation



In [18]:
cg_mol=rna_io.read(pdb_path, "PDB", coarse_grained=True, atom_name="C1'") #atom_name is optional, default is "C1'"
rna_io.write(cg_mol, "demo_cg.pdb",'PDB')

> note: Species with this name already exists, will return the same instance
RNA_Molecule 7EAF already exists in the species CALDANAEROBACTER SUBTERRANEUS SUBSP. TENGCONGENSIS; not added again
RNA molecule written to demo_cg.pdb
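
As a rough illustration of what a coarse-grained representation means, the sketch below keeps a single representative atom per residue, selected by name. It assumes that `coarse_grained=True` filters atoms by `atom_name` in a similar way, which the notebook does not show. The atom records are made-up toy values, not data from 7EAF, and `coarse_grain` is a hypothetical helper.

```python
# Toy atom records: (residue_number, residue_name, atom_name, (x, y, z)).
# The values are invented for illustration only.
atoms = [
    (1, "G", "P", (0.0, 0.0, 0.0)),
    (1, "G", "C1'", (1.0, 2.0, 3.0)),
    (2, "C", "P", (4.0, 4.0, 4.0)),
    (2, "C", "C1'", (5.0, 6.0, 7.0)),
]


def coarse_grain(atom_records, atom_name="C1'"):
    """Keep only the records whose atom name equals atom_name."""
    return [record for record in atom_records if record[2] == atom_name]


cg_atoms = coarse_grain(atoms)
for residue_number, residue_name, name, xyz in cg_atoms:
    print(residue_number, residue_name, name, xyz)
```

Each residue is reduced to one point, so the structure keeps its overall shape with far fewer coordinates.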


## Reading other PDB Entries

In [19]:
from utils import get_pdb_ids_from_fam
from Families.family import Family

pdb_ids=get_pdb_ids_from_fam('SAM')
molecules=[]
fam=Family(id='RF01510',name='2dG-I ')

for pdb_id in pdb_ids:
    rna_io=RNA_IO()
    rna=rna_io.read(pathify_pdb(pdb_id),format='PDB')
    molecules.append(rna)


Family with this id already exists, will link it to the existing family
Downloading PDB structure '5fk6'...
> note: Species with this name already exists, will return the same instance
RNA_Molecule 5FK6 already exists in the species THERMOANAEROBACTER TENGCONGENSIS; not added again
Downloading PDB structure '3gx2'...
Downloading PDB structure '5fk3'...
> note: Species with this name already exists, will return the same instance
RNA_Molecule 5FK3 already exists in the species THERMOANAEROBACTER TENGCONGENSIS; not added again
Downloading PDB structure '5fkf'...
> note: Species with this name already exists, will return the same instance
RNA_Molecule 5FKF already exists in the species THERMOANAEROBACTER TENGCONGENSIS; not added again
Downloading PDB structure '4kqy'...
> note: Species with this name already exists, will return the same instance
RNA_Molecule 4KQY already exists in the species BACILLUS SUBTILIS; not added again
Downloading PDB structure '3gx6'...
Downloading PDB structure '

In [12]:
print(f'number of molecules in 2dG-I family: {len(molecules)}')
for x in molecules:
    print(f'type of item {x} in molecules: {type(x)}')

number of molecules in 2dG-I family: 29
type of item ID: 5FK6 Experiment: X-RAY DIFFRACTION THERMOANAEROBACTER TENGCONGENSIS species in molecules: <class 'Structure.RNA_Molecule.RNA_Molecule'>
type of item ID: 3GX2 Experiment: X-RAY DIFFRACTION None in molecules: <class 'Structure.RNA_Molecule.RNA_Molecule'>
type of item ID: 5FK3 Experiment: X-RAY DIFFRACTION THERMOANAEROBACTER TENGCONGENSIS species in molecules: <class 'Structure.RNA_Molecule.RNA_Molecule'>
type of item ID: 5FKF Experiment: X-RAY DIFFRACTION THERMOANAEROBACTER TENGCONGENSIS species in molecules: <class 'Structure.RNA_Molecule.RNA_Molecule'>
type of item ID: 4KQY Experiment: X-RAY DIFFRACTION BACILLUS SUBTILIS species in molecules: <class 'Structure.RNA_Molecule.RNA_Molecule'>
type of item ID: 3GX6 Experiment: X-RAY DIFFRACTION None in molecules: <class 'Structure.RNA_Molecule.RNA_Molecule'>
type of item ID: 5FK4 Experiment: X-RAY DIFFRACTION THERMOANAEROBACTER TENGCONGENSIS species in molecules: <class 'Structure.RNA_