# 1. Clean Structures and Create one CIF file per Biological Assembly

## What is the goal of this notebook? 

This notebook achieves 2 goals:

1. The first step 'cleans' the CIF files we downloaded in step 0 by removing some of the **data names** and **data blocks** included in the raw CIF files. The cleaned files are written to a new directory, 'clean_bank', which is created in this step.


2. The second step creates a new CIF file for each [biological assembly](https://pdb101.rcsb.org/learn/guide-to-understanding-pdb-data/biological-assemblies#Anchor-BioUnit) present in a structure. Each new file is saved with the suffix +0x.cif (where x is the number of the biological assembly). This step also standardizes the **data names** and **data blocks**, in particular how the coordinate portion of the file is printed.
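
To make the naming convention in step 2 concrete, here is a minimal sketch. The helper `assembly_filename` is hypothetical (it is not part of PDBClean), and the zero-padded two-digit format is an assumption inferred from the '+0x.cif' description above; the structure ID is used only as an illustration.

```python
def assembly_filename(pdb_id: str, assembly_number: int) -> str:
    """Hypothetical helper: build '<pdb_id>+0x.cif' for biological assembly x.

    Assumption: the assembly number is zero-padded to two digits,
    which matches the '+0x.cif' pattern for x in 1..9.
    """
    if assembly_number < 1:
        raise ValueError("assembly numbers are assumed to start at 1")
    return f"{pdb_id}+{assembly_number:02d}.cif"


# Illustrative call only; it says nothing about how many assemblies 7rpn has.
print(assembly_filename("7rpn", 1))
```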


## Import library and create working directory

In [1]:
from PDBClean import pdbclean_io, pdbutils, cleanutils

In [2]:
# Path to project directory
PROJDIR="./TIM/"

In [3]:
# Create the directory where we will store the 'clean' CIF files.
pdbclean_io.check_project(projdir=PROJDIR, level='clean_bank')

## Clean CIF files (standardize data blocks)

In [4]:
cleanutils.process(projdir=PROJDIR, step='clean', source='raw_bank', target='clean_bank')

[1/237]: 7rpn.cif
[2/237]: 2v2d.cif
[3/237]: 4poc.cif
[4/237]: 5zfx.cif
[5/237]: 1o5x.cif
[6/237]: 4gnj.cif
[7/237]: 1ydv.cif
[8/237]: 4zvj.cif
[9/237]: 4ff7.cif
[10/237]: 1klu.cif
[11/237]: 3qsr.cif
[12/237]: 4o54.cif
[13/237]: 2x1s.cif
[14/237]: 3py2.cif
[15/237]: 2vfh.cif
[16/237]: 1hg3.cif
[17/237]: 4obt.cif
[18/237]: 6up5.cif
[19/237]: 4hhp.cif
[20/237]: 4o57.cif
[21/237]: 1nf0.cif
[22/237]: 4iot.cif
[23/237]: 5tim.cif
[24/237]: 1ml1.cif
[25/237]: 2vfi.cif
[26/237]: 2x1r.cif
[27/237]: 3gvg.cif
[28/237]: 1m7p.cif
[29/237]: 1aw2.cif
[30/237]: 4zz9.cif
[31/237]: 4o4v.cif
[32/237]: 4o53.cif
[33/237]: 1ney.cif
[34/237]: 6upf.cif
[35/237]: 4mva.cif
[36/237]: 2y63.cif
[37/237]: 5i3k.cif
[38/237]: 4jeq.cif
[39/237]: 4owg.cif
[40/237]: 3qst.cif
[41/237]: 5i3j.cif
[42/237]: 2y62.cif
[43/237]: 7tim.cif
[44/237]: 4o52.cif
[45/237]: 6up1.cif
[46/237]: 4o4w.cif
[47/237]: 4pod.cif
[48/237]: 1ssd.cif
[49/237]: 4br1.cif
[50/237]: 2v2c.cif
[51/237]: 2x16.cif
[52/237]: 2x1u.cif
[53/237]: 1aw1.cif
[5
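
After the cleaning step finishes, a quick sanity check is to compare the number of CIF files in the source and target directories. The sketch below assumes that 'raw_bank' and 'clean_bank' are subdirectories of the project directory and that every file ends in `.cif`; `count_cif_files` is a hypothetical helper, not a PDBClean function.

```python
from pathlib import Path


def count_cif_files(projdir: str, level: str) -> int:
    """Count '*.cif' files in <projdir>/<level>; return 0 if the directory is missing.

    Assumption: each bank is a plain subdirectory of the project directory.
    """
    bank = Path(projdir) / level
    if not bank.is_dir():
        return 0
    return sum(1 for path in bank.glob("*.cif") if path.is_file())


PROJDIR = "./TIM/"
n_raw = count_cif_files(PROJDIR, "raw_bank")
n_clean = count_cif_files(PROJDIR, "clean_bank")
print(f"raw_bank: {n_raw} files, clean_bank: {n_clean} files")
```

If the cleaning step writes one output file per input file (an assumption, not something this notebook confirms), the two counts should match.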

## Simplify and Split into Biological Assemblies 

In [5]:
# Create directory to store new structures
pdbclean_io.check_project(projdir=PROJDIR, level='simple_bank')

In [6]:
cleanutils.process(projdir=PROJDIR, step='simplify', source='clean_bank', target='simple_bank')

[1/237]: 7rpn.cif
[2/237]: 2v2d.cif
[3/237]: 4poc.cif
[4/237]: 5zfx.cif
[5/237]: 1o5x.cif
[6/237]: 4gnj.cif
[7/237]: 1ydv.cif
[8/237]: 4zvj.cif
[9/237]: 4ff7.cif
[10/237]: 1klu.cif
[11/237]: 3qsr.cif
[12/237]: 4o54.cif
[13/237]: 2x1s.cif
[14/237]: 3py2.cif
[15/237]: 2vfh.cif
[16/237]: 1hg3.cif
[17/237]: 4obt.cif
[18/237]: 6up5.cif
[19/237]: 4hhp.cif
[20/237]: 4o57.cif
[21/237]: 1nf0.cif
[22/237]: 4iot.cif
[23/237]: 5tim.cif
[24/237]: 1ml1.cif
[25/237]: 2vfi.cif
[26/237]: 2x1r.cif
[27/237]: 3gvg.cif
[28/237]: 1m7p.cif
[29/237]: 1aw2.cif
[30/237]: 4zz9.cif
[31/237]: 4o4v.cif
[32/237]: 4o53.cif
[33/237]: 1ney.cif
[34/237]: 6upf.cif
[35/237]: 4mva.cif
[36/237]: 2y63.cif
[37/237]: 5i3k.cif
[38/237]: 4jeq.cif
[39/237]: 4owg.cif
[40/237]: 3qst.cif
[41/237]: 5i3j.cif
[42/237]: 2y62.cif
[43/237]: 7tim.cif
[44/237]: 4o52.cif
[45/237]: 6up1.cif
[46/237]: 4o4w.cif
[47/237]: 4pod.cif
[48/237]: 1ssd.cif
[49/237]: 4br1.cif
[50/237]: 2v2c.cif
[51/237]: 2x16.cif
[52/237]: 2x1u.cif
[53/237]: 1aw1.cif
[5
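
Because a structure can contain more than one biological assembly, 'simple_bank' may hold several files per original structure. The following sketch groups file names by structure ID to show how many assemblies each one produced. It assumes the '+0x.cif' suffix described at the top of this notebook; `group_by_structure` is a hypothetical helper and the file names in the example are made-up placeholders.

```python
from collections import defaultdict


def group_by_structure(filenames):
    """Map each structure ID to the sorted list of its assembly numbers.

    Assumption: names look like '<id>+<number>.cif'; anything else is skipped.
    """
    groups = defaultdict(list)
    for name in filenames:
        stem, sep, rest = name.rpartition("+")
        if not sep or not rest.endswith(".cif"):
            continue  # not an assembly file under the assumed convention
        number = rest[: -len(".cif")]
        if number.isdecimal():
            groups[stem].append(int(number))
    return {pdb_id: sorted(numbers) for pdb_id, numbers in groups.items()}


# Placeholder names for illustration only.
example = ["xxxx+01.cif", "xxxx+02.cif", "yyyy+01.cif", "README.txt"]
print(group_by_structure(example))
```

To apply this to the real output, one could pass in the names returned by `os.listdir` on the 'simple_bank' directory, assuming it sits inside the project directory.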