# 1. Clean Structures and Create One CIF File per Biological Assembly

## What is the goal of this notebook? 

This notebook has two goals:

1. The first step 'cleans' the CIF files we downloaded in step 0. It removes some of the **data names** 
and **data blocks** included in the raw CIF files, and creates a new directory, 'clean_bank', for the cleaned files.


2. The second step creates a new CIF file for each [biological assembly](https://pdb101.rcsb.org/learn/guide-to-understanding-pdb-data/biological-assemblies#Anchor-BioUnit) present in a structure. Each new file is saved with the suffix +0x.cif (where x is the number of the biological assembly). This step also standardizes the **data names** and **data blocks**, in particular how the coordinate portion of the file is printed.
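
To make the naming convention concrete, here is a minimal sketch of how such file names could be built. The helper `assembly_filename` is hypothetical and not part of PDBClean, and the two-digit zero padding is an assumption read off the "+0x" pattern; the number of assemblies used for 1aw2 is illustrative only.

```python
def assembly_filename(pdb_id, assembly_number):
    """Return a name such as '1aw2+01.cif' (hypothetical helper, not PDBClean API)."""
    # Assumption: the assembly number is zero-padded to two digits.
    return f"{pdb_id}+{assembly_number:02d}.cif"


# Illustrative case: a structure with two biological assemblies gives two files.
names = [assembly_filename("1aw2", n) for n in (1, 2)]
print(names)
```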


## Import library and create working directory

In [1]:
from PDBClean import pdbclean_io, pdbutils, cleanutils

In [2]:
# Path to project directory
PROJDIR="./TIM/"

In [3]:
# Create the directory where we will store the 'clean' CIF files.
pdbclean_io.check_project(projdir=PROJDIR, level='clean_bank')

## Clean CIF files (standardize data blocks)

In [4]:
cleanutils.process(projdir=PROJDIR, step='clean', source='raw_bank', target='clean_bank')

[1/244]: 7rpn.cif
[2/244]: 2v2d.cif
[3/244]: 4poc.cif
[4/244]: 5zfx.cif
[5/244]: 1o5x.cif
[6/244]: 4gnj.cif
[7/244]: 1ydv.cif
[8/244]: 4zvj.cif
[9/244]: 4ff7.cif
[10/244]: 7qon.cif
[11/244]: 1klu.cif
[12/244]: 3qsr.cif
[13/244]: 4o54.cif
[14/244]: 2x1s.cif
[15/244]: 3py2.cif
[16/244]: 2vfh.cif
[17/244]: 1hg3.cif
[18/244]: 4obt.cif
[19/244]: 6up5.cif
[20/244]: 7sx1.cif
[21/244]: 4hhp.cif
[22/244]: 4o57.cif
[23/244]: 1nf0.cif
[24/244]: 4iot.cif
[25/244]: 5tim.cif
[26/244]: 1ml1.cif
[27/244]: 2vfi.cif
[28/244]: 2x1r.cif
[29/244]: 3gvg.cif
[30/244]: 1m7p.cif
[31/244]: 1aw2.cif
[32/244]: 7pek.cif
[33/244]: 4zz9.cif
[34/244]: 4o4v.cif
[35/244]: 4o53.cif
[36/244]: 1ney.cif
[37/244]: 6upf.cif
[38/244]: 4mva.cif
[39/244]: 2y63.cif
[40/244]: 5i3k.cif
[41/244]: 4jeq.cif
[42/244]: 4owg.cif
[43/244]: 3qst.cif
[44/244]: 5i3j.cif
[45/244]: 2y62.cif
[46/244]: 7tim.cif
[47/244]: 4o52.cif
[48/244]: 6up1.cif
[49/244]: 4o4w.cif
[50/244]: 4pod.cif
[51/244]: 1ssd.cif
[52/244]: 4br1.cif
[53/244]: 7pej.cif
[5
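
After the run finishes, it can be useful to confirm that every input file produced an output. The sketch below uses only the standard library. It assumes that 'raw_bank' and 'clean_bank' sit directly under the project directory and that the clean step keeps the original file names; neither is confirmed here. `missing_in_target` is a hypothetical helper, and the demonstration runs on a temporary directory rather than on the real project.

```python
import tempfile
from pathlib import Path


def missing_in_target(source_dir, target_dir):
    """Return sorted names of .cif files found in source_dir but not in target_dir."""
    source = {p.name for p in Path(source_dir).glob("*.cif")}
    target = {p.name for p in Path(target_dir).glob("*.cif")}
    return sorted(source - target)


# Demonstration on a throwaway directory tree so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    raw, clean = Path(tmp, "raw_bank"), Path(tmp, "clean_bank")
    raw.mkdir()
    clean.mkdir()
    for name in ("7rpn.cif", "2v2d.cif"):
        (raw / name).touch()
    (clean / "7rpn.cif").touch()  # pretend only one file was cleaned
    missing = missing_in_target(raw, clean)

print(missing)
```

Under the same assumptions, the real check would be `missing_in_target(Path(PROJDIR, "raw_bank"), Path(PROJDIR, "clean_bank"))`, and an empty list would mean no file was skipped.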

## Simplify and Split into Biological Assemblies 

In [5]:
# Create directory to store new structures
pdbclean_io.check_project(projdir=PROJDIR, level='simple_bank')

In [6]:
cleanutils.process(projdir=PROJDIR, step='simplify', source='clean_bank', target='simple_bank')

[1/244]: 7rpn.cif
[2/244]: 2v2d.cif
[3/244]: 4poc.cif
[4/244]: 5zfx.cif
[5/244]: 1o5x.cif
[6/244]: 4gnj.cif
[7/244]: 1ydv.cif
[8/244]: 4zvj.cif
[9/244]: 4ff7.cif
[10/244]: 7qon.cif
[11/244]: 1klu.cif
[12/244]: 3qsr.cif
[13/244]: 4o54.cif
[14/244]: 2x1s.cif
[15/244]: 3py2.cif
[16/244]: 2vfh.cif
[17/244]: 1hg3.cif
[18/244]: 4obt.cif
[19/244]: 6up5.cif
[20/244]: 7sx1.cif
[21/244]: 4hhp.cif
[22/244]: 4o57.cif
[23/244]: 1nf0.cif
[24/244]: 4iot.cif
[25/244]: 5tim.cif
[26/244]: 1ml1.cif
[27/244]: 2vfi.cif
[28/244]: 2x1r.cif
[29/244]: 3gvg.cif
[30/244]: 1m7p.cif
[31/244]: 1aw2.cif
[32/244]: 7pek.cif
[33/244]: 4zz9.cif
[34/244]: 4o4v.cif
[35/244]: 4o53.cif
[36/244]: 1ney.cif
[37/244]: 6upf.cif
[38/244]: 4mva.cif
[39/244]: 2y63.cif
[40/244]: 5i3k.cif
[41/244]: 4jeq.cif
[42/244]: 4owg.cif
[43/244]: 3qst.cif
[44/244]: 5i3j.cif
[45/244]: 2y62.cif
[46/244]: 7tim.cif
[47/244]: 4o52.cif
[48/244]: 6up1.cif
[49/244]: 4o4w.cif
[50/244]: 4pod.cif
[51/244]: 1ssd.cif
[52/244]: 4br1.cif
[53/244]: 7pej.cif
[5
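
Because this step can write several files per structure, a quick tally of assemblies per PDB ID helps when inspecting 'simple_bank'. The following is a minimal sketch: `count_assemblies` is a hypothetical helper, the file names in `example` are made up for illustration, and the only assumption about the real output is that the PDB ID precedes the "+" in each file name.

```python
from collections import Counter


def count_assemblies(filenames):
    """Count files per PDB ID, taking the ID as the text before '+' in each name."""
    counts = Counter()
    for name in filenames:
        stem = name[:-len(".cif")] if name.endswith(".cif") else name
        pdb_id, sep, _assembly = stem.partition("+")
        if sep:  # skip names that do not contain '+'
            counts[pdb_id] += 1
    return counts


# Made-up file names; the real ones would come from listing 'simple_bank'.
example = ["1aw2+01.cif", "1aw2+02.cif", "5tim+01.cif"]
counts = count_assemblies(example)
print(counts["1aw2"], counts["5tim"])
```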