<a href="https://colab.research.google.com/github/luquelab/pyCapsid/blob/Colab/notebooks/pyCapsid_colab_notebook.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# pyCapsid Colab




## Description
This Colab notebook contains a pipeline to predict dominant dynamics and quasi-rigid mechanical units in large protein complexes. The outputs capture the molecular regions likely to be involved in structural transitions, including transitions triggering the disassembly of the protein complex.

The notebook builds on the Python package [pyCapsid](https://luquelab.github.io/pyCapsid/), which combines elastic network models, normal mode analysis, and clustering methods. The current version includes options to refine the resolution of the results for protein shells, like viral capsids. Check pyCapsid's [online documentation](https://luquelab.github.io/pyCapsid/) for further technical details.

## Quick-start guide
Follow the steps described below to obtain the dominant dynamics and quasi-rigid units of a protein complex. To help navigate the guide, we recommend displaying the Colab notebook's table of contents (open the `View` menu on the top bar and choose `Table of contents`):
1. Specify the structure to be analyzed in the [Input structure](#scrollTo=Input_structure) section. Run the code block to import the structure to pyCapsid Colab.
2. Modify the default pyCapsid parameters if necessary in the [pyCapsid parameters](#scrollTo=pyCapsid_parameters) section. Run the code block to import the parameters to pyCapsid Colab.
3. Execute the rest of the notebook, for example, by opening the Colab menu `Runtime` and choosing the option `Run all`. This will install pyCapsid, run the pipeline, and generate and store the results.
  + The pipeline will automatically compress the results in a zip file and download it. Your browser might ask for permission to download the files.
  + If you encounter any issues in the downloading process, check the section [Problems downloading results?](#scrollTo=Problems_downloading_results_)
  + The execution time and maximum size of the protein complex depend on the computing power and memory of the Colab cloud service used, which depends on the user's Colab plan. The section [Estimate time and memory](#scrollTo=Estimate_time_and_memory) gives guidance on the execution time and memory requirements for the input structure and parameters.
4. Check the downloaded report (`pyCapsid_report.*`) for a summary and interpretation of the main results. The report is available in three formats: Markdown (`*.md`), Microsoft Word (`*.docx`), and HyperText Markup Language (`*.html`). The multi-format report aims to help users adapt pyCapsid's results to their publication needs. Check the section [pyCapsid report](#scrollTo=pyCapsid_report) for further details.
   + Additional results are displayed throughout the subsections of the [Run the pyCapsid pipeline](#scrollTo=Run_the_pyCapsid_pipeline) section.
5. Modify and re-run the section [Generate advanced analysis](#scrollTo=Generate_advanced_analysis) to obtain advanced analyses using results stored during the execution of the pyCapsid pipeline.


## Issues, support, and citation
pyCapsid is distributed under the MIT License, a permissive free software license. We recommend running the Colab notebook in the [Google Chrome Browser](https://www.google.com/chrome/).

+ If you encounter any problem using pyCapsid or require additional functionality, please [open an issue on GitHub](https://github.com/luquelab/pyCapsid/issues).
+ If you use pyCapsid and would like to help support its further development, please [add a star to its GitHub repository](https://github.com/luquelab/pyCapsid).
+ If you publish any work that includes the use of pyCapsid, please follow its [online citation guide](https://luquelab.github.io/pyCapsid/acknowledgements/).

# Input structure and parameters

## Input structure
pyCapsid requires the protein complex to be encoded in the [Protein Data Bank (PDB) format](https://pdb101.rcsb.org/learn/guide-to-understanding-pdb-data/introduction).

### Source and structure
Fill in the variables in the code block below to specify the source and identifier of the structure. Follow these guidelines:
+ `pdb_source` determines whether the structure will be fetched from the Protein Data Bank (`'PDB'`) or uploaded (`'upload'`).
+ `pdb_id` stores the PDB ID used to fetch the structure online.

If the option is `'upload'`, executing the code cell below will open a prompt to choose the file from the local directory. The maximum file size allowed in the standard Colab cloud server is 2 GB.

In [None]:
# Specify the PDB source
pdb_source = 'PDB' #Values expected: 'PDB' or 'upload'

# Specify the PDBid (if the structure has to be fetched online)
pdb_id = '4oq8'

# Print option
if pdb_source == 'PDB':
  print('The structure with PDBid ' + pdb_id + ' will be fetched from the Protein Data Bank.')

elif pdb_source == 'upload':

  from google.colab import files
  uploaded = files.upload()
  pdb_file_name = list(uploaded.keys())[0] # Extract PDB file name
  print('The name of the PDB file is ' + pdb_file_name)

else:
  print('The value `'+ pdb_source +'` specified in `pdb_source` is not valid. Choose from the expected options above.')

## pyCapsid parameters

Edit the variables in the code block below to specify the main options in the pyCapsid calculations. If you do not edit anything, the default values will be used. The list below describes the variables and their options:

+ `ENM_model` specifies the elastic network model used to coarse-grain the protein complex. Four models can be specified:
  + `ANM`: Anisotropic network model with a default cutoff of 15 Å and no distance weighting.
  + `GNM`: Gaussian network model (no three-dimensional directionality) with a default cutoff of 7.5 Å and no distance weighting.
  + `U-ENM`: Unified elastic network model with a default cutoff of 7.5 Å and a default anisotropy parameter (`f_anm`) of 0.1. It is the **default** and **recommended** option.
  + `bbENM`: Backbone-enhanced elastic network model with a default cutoff of 7.5 Å and no distance weighting.

+ `n_modes` specifies the number of modes used to calculate the dynamics. It must be an integer. The default value is 200, although the code cell below sets it to 100. While 200 modes can yield good results, we recommend using more modes for larger structures. Increasing the number of modes often improves the accuracy but results in longer computation times. The `fit_modes` option described below can assist in selecting the optimal value for this parameter.

+ `fit_modes` is `True` or `False`. If `True`, pyCapsid selects the number of low-frequency modes used in further calculations by finding the number of modes (up to `n_modes`) that gives the best correlation with experimental B-factors. It also provides a plot showing how the correlation with B-factors changes with the number of modes used. If `False`, all calculated modes are used.

+ `cluster_min` specifies the minimum number of clusters used in the clustering analysis to identify the optimal quasi-rigid mechanical units. The minimum value is 3.

+ `cluster_max` specifies the maximum number of clusters used in the clustering analysis to identify the optimal quasi-rigid mechanical units. The number of residues in the structure is an upper bound. The default value is 100. The value should be at least the number of proteins in the structure. Ideally, it should be the number of proteins times the number of expected protein domains defining the protein fold.

+ `cluster_delta` specifies the step size used when exploring the range of clusters to determine the optimal quasi-rigid mechanical units. The default value is 2. Once a potential optimal result has been identified, we recommend refining the search in a sub-region around it with a smaller step.
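
The guidance on `cluster_max` and `cluster_delta` can be turned into a small helper. The sketch below is only an illustration: the function names `suggest_cluster_range` and `refine_range` are hypothetical (they are not part of pyCapsid), and treating the upper bound as inclusive is an assumption.

```python
def suggest_cluster_range(n_proteins, n_domains=1, cluster_min=4, cluster_delta=2):
    """Suggest a coarse cluster range (hypothetical helper, not part of pyCapsid).

    cluster_max is set to proteins x expected domains, following the guidance
    above. Treating the upper bound as inclusive is an assumption.
    """
    cluster_max = max(cluster_min, n_proteins * n_domains)
    counts = list(range(cluster_min, cluster_max + 1, cluster_delta))
    return cluster_min, cluster_max, cluster_delta, counts


def refine_range(candidate, half_width=6, minimum=3):
    """Return a step-1 window around a candidate optimum, never below `minimum`."""
    start = max(minimum, candidate - half_width)
    return start, candidate + half_width, 1


# Example: 60 proteins with 2 expected domains each.
toy_min, toy_max, toy_delta, toy_counts = suggest_cluster_range(60, n_domains=2)
print(toy_min, toy_max, toy_delta, len(toy_counts))

# Example: refine around a candidate optimum of 20 clusters.
print(refine_range(20))
```

A first pass with the coarse range locates a candidate optimum; a second pass with the refined window and a step of 1 narrows it down.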

In [None]:
# Specify values

## Elastic network model
ENM_model = 'U-ENM' # Values expected: 'ANM', 'GNM', 'U-ENM', and 'bbENM'.

## Number of modes used in the dynamics
n_modes = 100

## Whether to use the number of modes that maximize CC
fit_modes = True

## Cluster range and step in the optimal analysis of quasi-rigid units.
cluster_min = 4
cluster_max = 100
cluster_delta = 2

# Double-check options

## Elastic model
valid_ENM = ['ANM','GNM','U-ENM','bbENM'] # Must match the options listed for `ENM_model`
if ENM_model in valid_ENM:
  print('The ENM model used for coarse-graining is ' + ENM_model + ' .')

else:
  print('The value `'+ ENM_model +'` specified in `ENM_model` is not valid. Choose from the expected options above.')

## Modes
### Cast to non-negative integers
n_modes = abs(int(n_modes))
if n_modes > 0:
  print('The number of modes in the dynamics will be ' + str(n_modes) + ' .')
else:
  print('WARNING: The value of `n_modes` should be larger than zero.')

## Clusters
### Cast to non-negative integers
cluster_min = abs(int(cluster_min))
cluster_max = abs(int(cluster_max))
cluster_delta = abs(int(cluster_delta))
if ((cluster_min > 0) and (cluster_max > 0)):
  if cluster_min <= cluster_max:
    print('The lowest number of quasi-rigid units explored will be ' + str(cluster_min) + ' .')
    print('The largest number of quasi-rigid units explored will be ' + str(cluster_max) + ' .')
    print('The resolution of search for the optimal number of quasi-rigid units will be ' + str(cluster_delta) + ' .')

  elif cluster_min > cluster_max:
    print('WARNING: The value of `cluster_min` should be smaller than or equal to `cluster_max`.')

else:
  print('WARNING: The values of `cluster_min` and `cluster_max` should be both larger than zero.')

## Estimate time and memory

Expect small protein shells (up to \~40,000 residues) to run in less than 10 minutes and medium shells (\~80,000 residues) in 2 or more hours using the free Colab cloud service. To investigate capsids exceeding 8 GB of RAM (120,000+ residues), consider upgrading your [Colab plan](https://colab.research.google.com/signup) or installing pyCapsid locally via [GitHub](https://github.com/luquelab/pyCapsid), [PIP](https://pypi.org/project/pyCapsid/), or [Conda](https://anaconda.org/luque_lab/pycapsid), as detailed in its [online installation guide](https://luquelab.github.io/pyCapsid/installation/).
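
As a rough aid, the figures above can be wrapped in a lookup. This is a minimal sketch: `colab_size_tier` is a hypothetical helper, the thresholds are only the approximate values quoted in the text, and the classification of sizes that fall between them is an assumption.

```python
def colab_size_tier(n_residues):
    """Classify a structure by residue count using the rough figures quoted above.

    Hypothetical helper; thresholds are approximate, and how sizes between the
    quoted values behave is an assumption.
    """
    if n_residues <= 40_000:
        return 'small: expect under 10 minutes on free Colab'
    if n_residues < 120_000:
        return 'medium: expect on the order of hours on free Colab'
    return 'large: likely exceeds free Colab memory; upgrade or run locally'


for n in (12_000, 80_000, 150_000):
    print(n, '->', colab_size_tier(n))
```

Once the structure has been loaded in the pipeline, `len(calphas)` gives the residue count of the full structure.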

# Installation

The following command installs pyCapsid and the necessary components for visualizing results in this notebook.

In [None]:
!pip install --upgrade pyCapsid ipywidgets==7.7.2 nglview
from google.colab import output
output.enable_custom_widget_manager()

# Run the pyCapsid pipeline


## Extract features from input structure
This code loads the information from the input structure (PDB format) necessary for the calculations and validation in pyCapsid.



In [None]:
from pyCapsid.PDB import getCapsid

if pdb_source == 'PDB':
  # Extract the features fetching the PDB structure from the Protein Data Bank.
  pdb = pdb_id
  capsid, calphas, asymmetric_unit, coords, bfactors, chain_starts, title = getCapsid(pdb)
  print('The structure ' + pdb + ' was fetched in the pyCapsid pipeline.')

elif pdb_source == 'upload':
  pdb = pdb_file_name
  capsid, calphas, asymmetric_unit, coords, bfactors, chain_starts, title = getCapsid(pdb, local = True)
  print('The structure in the file ' + pdb + ' was input into the pyCapsid pipeline.')
else:
  print('WARNING: The PDB structure is not available.')

## Build the elastic network model (ENM)
This step uses the function `buildENMPreset` in pyCapsid to specify the elastic network model and build the associated Hessian matrix.
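
To give an idea of what an elastic network encodes, the sketch below builds a GNM-style Kirchhoff (connectivity) matrix for a toy set of points, using the 7.5 Å GNM cutoff mentioned in the parameters section. It is an illustration of the general technique only, not pyCapsid's implementation; the function `kirchhoff_matrix` and the toy coordinates are hypothetical.

```python
import math


def kirchhoff_matrix(points, cutoff=7.5):
    """Build a GNM-style Kirchhoff matrix from 3D coordinates (illustration only).

    Off-diagonal entries are -1 for pairs within `cutoff` (in angstroms); each
    diagonal entry counts the contacts of that node.
    """
    n = len(points)
    kirch = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(points[i], points[j]) <= cutoff:
                kirch[i][j] = kirch[j][i] = -1
                kirch[i][i] += 1
                kirch[j][j] += 1
    return kirch


# Toy example: four points on a line, 3.8 angstroms apart.
toy_points = [(3.8 * k, 0.0, 0.0) for k in range(4)]
toy_kirchhoff = kirchhoff_matrix(toy_points)
for row in toy_kirchhoff:
    print(row)
```

Every row of a Kirchhoff matrix sums to zero, which is why such networks have zero-frequency modes corresponding to rigid-body motions.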

In [None]:
from pyCapsid.CG import buildENMPreset
kirch, hessian = buildENMPreset(coords, preset = ENM_model)

## Perform the Normal Mode Analysis (NMA)

This section obtains the dynamics of the protein complex based on the dominant normal modes activated by thermal energy.

### Calculate the low frequency modes
This code obtains the number of low-frequency modes specified by the variable `n_modes` in the [pyCapsid parameters section](#scrollTo=b_8gyJk1wLlV). The calculation relies on the eigenvalues and eigenvectors of the Hessian matrix obtained in the [Build the elastic network model section](#scrollTo=cf28904f).
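
The idea of keeping only the lowest non-zero modes can be sketched on a toy matrix with NumPy. This is a simplified illustration using a dense solver, which only suits small matrices; it is not how `modeCalc` is implemented, and the function `lowest_modes` and the zero-mode tolerance are assumptions made for the example.

```python
import numpy as np


def lowest_modes(matrix, n_keep):
    """Return the `n_keep` lowest non-zero eigenvalues and their eigenvectors.

    Dense solver for illustration only; the tolerance used to discard
    zero-frequency modes is an arbitrary choice.
    """
    values, vectors = np.linalg.eigh(np.asarray(matrix, dtype=float))
    keep = values > 1e-8  # discard rigid-body (zero-frequency) modes
    return values[keep][:n_keep], vectors[:, keep][:, :n_keep]


# Toy Kirchhoff matrix of a four-node chain.
toy_matrix = [[1, -1, 0, 0], [-1, 2, -1, 0], [0, -1, 2, -1], [0, 0, -1, 1]]
toy_evals, toy_evecs = lowest_modes(toy_matrix, n_keep=2)
print(toy_evals)
print(toy_evecs.shape)
```

The lowest-frequency modes describe the largest collective motions, which is why a limited number of them can capture the dominant dynamics.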

In [None]:
from pyCapsid.NMA import modeCalc
evals, evecs = modeCalc(hessian, n_modes = n_modes)

### Predict, scale, and validate the B-factors
This code uses the resulting normal modes and frequencies to predict the B-factors of each alpha carbon, fits these results to the experimental values from the PDB entry, and plots the results for comparison.
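
The comparison between predicted and experimental B-factors can be sketched with two small functions. This is a hedged illustration: fitting a single scale factor by least squares through the origin is an assumption made for the example, and pyCapsid's actual fitting procedure in `fitCompareBfactors` may differ. The helper names and toy values are hypothetical.

```python
import math


def pearson_cc(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def scale_to_experiment(predicted, experimental):
    """Least-squares scale k minimizing sum((experimental - k * predicted)^2)."""
    k = sum(p * e for p, e in zip(predicted, experimental)) / sum(p * p for p in predicted)
    return k, [k * p for p in predicted]


# Toy values standing in for unscaled predictions and experimental B-factors.
toy_predicted = [1.0, 2.0, 3.0, 4.0]
toy_experimental = [10.0, 22.0, 29.0, 41.0]
toy_scale, toy_scaled = scale_to_experiment(toy_predicted, toy_experimental)
toy_cc = pearson_cc(toy_predicted, toy_experimental)
print(round(toy_scale, 3), round(toy_cc, 3))
```

The correlation coefficient does not depend on the scale factor, so the scaling sets the magnitude of the predicted B-factors while the correlation measures how well their profile matches the experiment.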

In [None]:
from pyCapsid.NMA import fitCompareBfactors
evals_scaled, evecs_scaled, cc, gamma, n_modes_best = fitCompareBfactors(evals, evecs, bfactors, pdb, fit_modes=fit_modes)

## Perform the analysis of quasi-rigid clusters (QRC)

In [None]:
from pyCapsid.NMA import calcDistFlucts
from pyCapsid.QRC import findQuasiRigidClusters

dist_flucts = calcDistFlucts(evals_scaled, evecs_scaled, coords)

cluster_start = cluster_min
cluster_stop = cluster_max
cluster_step = cluster_delta
labels, score, residue_scores  = findQuasiRigidClusters(pdb, dist_flucts, cluster_start=cluster_start, cluster_stop=cluster_stop, cluster_step=cluster_step)

## Visualize in Jupyter notebook with nglview
You can visualize the results in the notebook with nglview. The following function returns an nglview object with the results colored by cluster. See the [nglview documentation](http://nglviewer.org/nglview/release/v2.7.7/index.html) for further information.

In [None]:
# This cell will create a standard view of the capsid, which the next cell will
# modify to create the final result.
from pyCapsid.VIS import createCapsidView
view_clusters = createCapsidView(pdb, capsid)
view_clusters

In [None]:
# If the above view doesn't change coloration, run this cell again.
# In general do not run this cell until the above cell has finished rendering
from pyCapsid.VIS import createClusterRepresentation
createClusterRepresentation(pdb, labels, view_clusters)

# Add rep_type='spacefill' to represent the atoms of the capsid as spheres. This provides less information regarding the proteins but makes it easier to identify the geometry of the clusters
#createClusterRepresentation(pdb, labels, view_clusters, rep_type='spacefill')

In [None]:
# Once you've done this use this code to download the results
view_clusters.center()
view_clusters.download_image(f'{pdb}_highest_quality_clusters.png',factor=2)

Running the same code but replacing `labels` with `residue_scores` and adding `rwb_scale=True` visualizes the quality score of each residue. This is a measure of how rigid each residue is with respect to its cluster. Blue residues make up the cores of rigid clusters, and red residues represent borders between clusters.
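
The red-white-blue convention can be illustrated with a simple linear mapping from score to color. The sketch below is not pyCapsid's colormap; `red_white_blue` is a hypothetical function that only shows how low scores map to red, intermediate scores to white, and high scores to blue.

```python
def red_white_blue(score, lowest, highest):
    """Map a score to a hex color: red at `lowest`, white midway, blue at `highest`."""
    if highest == lowest:
        return '#ffffff'
    t = (score - lowest) / (highest - lowest)  # normalize to [0, 1]
    t = min(1.0, max(0.0, t))
    if t < 0.5:
        # Red to white: raise the green and blue channels.
        level = round(255 * (2 * t))
        r, g, b = 255, level, level
    else:
        # White to blue: lower the red and green channels.
        level = round(255 * (2 * (1 - t)))
        r, g, b = level, level, 255
    return f'#{r:02x}{g:02x}{b:02x}'


toy_scores = [0.0, 0.5, 1.0]
print([red_white_blue(s, min(toy_scores), max(toy_scores)) for s in toy_scores])
```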

In [None]:
# This code adds a colorbar based on the residue scores
print('Each atom in this structure is colored according to the clustering quality score of its residue.')
import matplotlib.colorbar as colorbar
import matplotlib.pyplot as plt
from pyCapsid.VIS import clusters_colormap_hexcolor
import numpy as np
hexcolor, cmap = clusters_colormap_hexcolor(residue_scores, rwb_scale=True)
fig, ax = plt.subplots(figsize=(10, 0.5))
cb = colorbar.ColorbarBase(ax, orientation='horizontal',
                            cmap=cmap, norm=plt.Normalize(np.min(residue_scores), np.max(residue_scores)))
plt.show()

# This cell will create an empty view, which the next cell will
# modify to create the final result.
from pyCapsid.VIS import createCapsidView
view_scores = createCapsidView(pdb, capsid)
view_scores

In [None]:
from pyCapsid.VIS import createClusterRepresentation
createClusterRepresentation(pdb, residue_scores, view_scores, rwb_scale=True)

In [None]:
from google.colab import output
output.enable_custom_widget_manager()

In [None]:
# Once you've done this use this code to download the results
view_scores.center()
view_scores.download_image(f'{pdb}_residue_cluster_scores.png',factor=2)

### Snapshot of capsid and asymmetric unit

In [None]:
from pyCapsid.VIS import createCapsidView
view_capsid = createCapsidView(pdb, capsid)
view_capsid

In [None]:
# Once you've done this use this code to download the results
view_capsid.center()
view_capsid.download_image(f'{pdb}_full_capsid.png',factor=2)

In [None]:
from pyCapsid.VIS import createCapsidView
import biotite.structure.io as strucio
strucio.save_structure(pdb + '_asym.pdb', asymmetric_unit, hybrid36=True)

import nglview as ngl
view_asym = ngl.NGLWidget()
from nglview.adaptor import FileStructure

view_asym.add_component(FileStructure(pdb + '_asym.pdb'))
view_asym

In [None]:
# Once you've done this use this code to download the results
view_asym.center()
view_asym.download_image(f'{pdb}_asymmetric_unit.png',factor=2)

# pyCapsid report

The code blocks below generate a report with the key results of the analysis. The report is written in three formats:
+ Markdown file (`*.md`) included in the downloaded outputs.
+ Microsoft Word file (`*.docx`) included in the downloaded outputs.
+ HTML file (`*.html`) included in the downloaded outputs.



In [None]:
# Prepare file names
file_name = 'pyCapsid_report'
file_md = file_name+'.md' # Markdown
file_html = file_name+'.html' # HTML
file_docx = file_name+'.docx' # Microsoft Word

In [None]:
# Include libraries to generate plots
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np

In [None]:
# Generate separate folder for the report
import os

parent_dir = os.getcwd()
report_dir_name = 'pyCapsid_report'
report_dir = parent_dir + '/' + report_dir_name

os.makedirs(report_dir, exist_ok=True) # Do not fail if the cell is re-run
os.chdir(report_dir)
os.getcwd()


In [None]:
# Generate figures folder
figures_dir_name = 'figures'
figures_dir = report_dir + '/' + figures_dir_name
os.makedirs(figures_dir, exist_ok=True) # Do not fail if the cell is re-run
os.chdir(figures_dir)
os.getcwd()

In [None]:
# Generate b-factors dataframe and plot

## Upload and store file
os.chdir(parent_dir)
file_bfactors = 'b_factors.npz'
b_factors = np.load(file_bfactors)
os.chdir(figures_dir)
os.getcwd()

## Generate data frame
## (!! revise method for residues in asymmetric unit)
npz = b_factors
df_bfactors = pd.DataFrame(data = [b_factors['residue_numbers'],b_factors['bfactors_experimental'],b_factors['bfactors_predicted']]).T
df_bfactors.columns = ['Residue','Experimental','Predicted']
n_res_asym = len(df_bfactors) // 60 #! residues in asymmetric unit; assumes 60 copies of the asymmetric unit (refactor)
df_bfactors_asym = df_bfactors[0:n_res_asym]
len(df_bfactors_asym)

## Generate and save plot
sns.set_style('darkgrid')
plt.ioff() # Comment out this command to display figure in the notebook
g = sns.lineplot(x = 'Residue', y = 'Experimental', data = df_bfactors_asym, label = 'Experimental')
g = sns.lineplot(x = 'Residue', y = 'Predicted', data = df_bfactors_asym, label = 'Predicted')
g.set(xlabel = 'Residue number', ylabel = r"B-factor ($\AA^2$)")
g.set(title='Protein complex\'s vibration profile (asymmetric unit)')
g.legend(frameon=False)
plt.savefig("b_factors_report.png",dpi=300)
plt.savefig("b_factors_report.pdf")
plt.savefig("b_factors_report.eps",bbox_inches='tight')
plt.savefig("b_factors_report.svg")

## Generate caption
text = 'Figure NMA.1. Empirical (blue) and predicted (orange) B-factors for each residue in the asymmetric unit.'
caption_bfactors = text

In [None]:
# Generate markdown output file
os.chdir(report_dir)
os.getcwd()
file = open(file_md,'w')
figures_subdir = './' + figures_dir.split('/')[-1]
print(figures_subdir)

# Load data needed
from datetime import date
today = date.today()

# HEADING
text = '# pyCapsid Report'
file.write('\n'+text)

text = today.strftime("%B %d, %Y")
file.write('\n'+text)

# INPUT STRUCTURE
file.write('\n')
text = '## Input structure'
file.write('\n'+text)

text = 'Identifier: ' + str(pdb)
file.write('\n'+text)

file.write('\n')
import biotite.structure as struc
n_asym_residues = struc.get_residue_count(asymmetric_unit)
text = 'Number of residues in the asymmetric unit: ' + str(n_asym_residues)
file.write('\n'+text)

file.write('\n')
n_asym_chains = struc.get_chain_count(asymmetric_unit)
text = 'Number of protein chains in the asymmetric unit: ' + str(n_asym_chains)
file.write('\n'+text)

file.write('\n')
n_full_residues = len(calphas)
text = 'Number of residues in the full structure: ' + str(n_full_residues)
file.write('\n'+text)

file.write('\n')
asym_factor = int(n_full_residues/n_asym_residues)
text = 'Multiplying factor to interpret full structure: ' + str(asym_factor)
file.write('\n'+text)

file.write('\n')
text = '+ If the multiplying factor is 1, the asymmetric units and full structure are the same.'
file.write('\n'+text)

file.write('\n')
text = '!!! Add snapshots of the asymmetric unit and full structure. Include captions for standard publication style.'
file.write('\n'+text)


# Elastic network model
file.write('\n')
text = '## Elastic network model'
file.write('\n'+text)

file.write('\n')
text = 'Elastic model used: ' + str(ENM_model)
file.write('\n'+text)
file.write('\n')
text = '+ This model implies {Include here the dictionary value of the model}'
file.write('\n'+text)
file.write('\n')
text = '!!! Add snapshot of the model approximations. Include captions following standard publication style.'
file.write('\n'+text)


file.write('\n')
text = 'Calibrated stiffness constant: ' + str(gamma)
file.write('\n'+text)
file.write('\n')
text = '+ This constant was fitted to scale the model to the structure, assuming a linear relationship between the residues fluctuations and B-factors.'
file.write('\n'+text)

file.write('\n')
text = '!!! Add snapshots of the elastic model for the asymmetric unit and full structure. Include captions following standard publication style.'
file.write('\n'+text)

# Normal modes and B-factors
CC = float(b_factors['CC'])

file.write('\n')
text = '## Normal mode analysis (NMA)'
file.write('\n'+text)

file.write('\n')
text = 'Optimal number of modes reproducing B-factors: '+ str(n_modes_best)
file.write('\n'+text)

file.write('\n')
text = '!!! Add figure plotting CC versus number of modes. Include captions following standard publication style.'
file.write('\n'+text)


file.write('\n')
text = 'Correlation between empirical and predicted B-factors: ' + str(round(CC,2))
file.write('\n'+text)

file.write('\n')
text = '!!! Add figure plotting B-factors (experimental and predicted) versus number of residues in the asymmetric unit. Include caption following standard publication style.'
file.write('\n'+text)

## B-factors figure
file.write('\n')
text = '![]('+ figures_subdir +'/b_factors_report.svg)'
file.write('\n'+text)

file.write('\n')
text = caption_bfactors
file.write('\n'+text)


file.write('\n')
text = '!!! Add some info (snapshots of min and max amplitude? movie?) regarding the dynamics.'
file.write('\n'+text)


# Quasi-rigid units
## Upload and store file
os.chdir(parent_dir)
file_cluster_results = str(pdb) + '_final_results.npz'
cluster_results = np.load(file_cluster_results)
nc = int(cluster_results['nc'])
os.chdir(report_dir)

file.write('\n')
text = '## Quasi-rigid mechanical units'
file.write('\n'+text)

file.write('\n')
text = 'Number of optimal quasi-rigid mechanical units identified: ' + str(nc)
file.write('\n'+text)

file.write('\n')
text = '!!! Add figure plotting the quality score and number of unique clusters as a function of the number of selected clusters. Include captions following standard publication style.'
file.write('\n'+text)

file.write('\n')
text = '!!! Add figure displaying a snapshot of the capsid coloring the different clusters. Include caption following standard publication style.'
file.write('\n'+text)

file.write('\n')
text = '!!! Add figure displaying a representative of each unique cluster. Include caption following standard publication style.'
file.write('\n'+text)

file.write('\n')
text = '!!! Add figure displaying the quality score of each residue in the full structure. Include caption following standard publication style.'
file.write('\n'+text)

file.write('\n')
text = '!!! Add figure displaying the quality score of each residue in the representative unique clusters. Include caption following standard publication style.'
file.write('\n'+text)

# Close output file
file.close()


In [None]:
# Generate HTML version
!python -m markdown $file_md -f $file_html

In [None]:
# Generate docx version
!pip install pypandoc

import pypandoc

pypandoc.convert_file(file_md, 'docx', outputfile=file_docx)
print(file_docx)

In [None]:
# Return to parent directory
os.chdir(parent_dir)

In [None]:
from google.colab import output
output.disable_custom_widget_manager()

# Download results
The files containing the results are compressed (ZIP format) and downloaded to the default download folder of the browser running the current Colab notebook.
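
For users running outside Colab, where the shell commands in the cell below may not be available, the compression step could be approximated with Python's standard `zipfile` module. This is a sketch under assumptions: `zip_top_level_files` is a hypothetical helper, and it collects every regular file directly inside a folder, which differs slightly from the `*.*` pattern used by the notebook.

```python
import tempfile
import zipfile
from pathlib import Path


def zip_top_level_files(folder, zip_name):
    """Compress the regular files directly inside `folder` into `zip_name`.

    Hypothetical helper for illustration; subfolders and the archive itself
    are skipped. Returns the names stored in the archive.
    """
    folder = Path(folder)
    zip_path = folder / zip_name
    stored = []
    with zipfile.ZipFile(zip_path, 'w', zipfile.ZIP_DEFLATED) as archive:
        for item in sorted(folder.iterdir()):
            if item.is_file() and item != zip_path:
                archive.write(item, arcname=item.name)
                stored.append(item.name)
    return stored


# Stand-in folder so the sketch runs on its own.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / 'example_results.txt').write_text('placeholder')
    toy_stored = zip_top_level_files(tmp, 'example_pyCapsid_results.zip')
    print(toy_stored)
```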

In [None]:
# Generate readme file for pyCapsid outputs
file = open('README_OUTPUTS.md','w')

## Intro
text = '# pyCapsid outputs\n'
file.write(text)

text = 'This file describes the content of the outputted files generated by pyCapsid.\n\n'
file.write(text)

text = "The term PDBid should be replaced by the ID of the structure imported into pyCapsid. For example, if the structure's imported ID is 4oq8, then the file listed as `PDBid.pdb` appears in the outputs as `4oq8.pdb`. The same applies to the other files.\n\n"
file.write(text)

## Output files
text ='**Output files**\n'
file.write(text)

### PDBid.pdb
text = '+ `PDBid.pdb`: File containing input protein complex in PDB format.\n'
file.write(text)

### PDBid_capsid.pdb
text = '\n+ `PDBid_capsid.pdb`: File containing the atom coordinates of the complete protein complex. It does not include meta information in the preamble. If the protein complex in `PDBid.pdb` encodes symmetries and an asymmetric unit, the atoms will be the generated structure after applying the symmetries, and the size of the file will increase accordingly. Otherwise, the number of atoms will coincide with those in the input `PDBid.pdb` file.\n'
file.write(text)

### b_factors.svg
text = '\n+ `b_factors.svg`: Vectorial file plotting the B-factors with units of Angstrom squared (Å^2), on the y-axis as a function of the residue number in the x-axis. It graphs the empirical B-factors obtained from the input `PDBid.pdb` structure (blue) and the B-factors generated using the (selected or optimal) number of modes (orange). The title of the figure includes the fitted elastic constant, gamma (mean value +/- standard deviation), used to scale the predicted results and the correlation-coefficient (CC) comparing the empirical and computational results.\n'
file.write(text)

### results_plot.svg
text = '\n+ `results_plot.svg`: Vectorial file plotting the Quality score (top panel) and Number of unique clusters (bottom panel) as a function of the Number of clusters sampled (x-axis). The black line selects the number of clusters that maximizes the quality score using a limited number of unique clusters.\n'
file.write(text)

### modes.npz
text = '+ `modes.npz`: Compressed numpy file containing two arrays with information about the normal modes:\n'
file.write(text)

#### Array 'eigen_vals'
text = '    + `eigen_vals`: Numpy array containing the eigenvalues (units??? stiffness/mass??? N/m*1/kg = 1/s^2 = Hz^2 ???) obtained in the normal mode analysis. It contains maximum number of modes requested in the input. (Why the eigenvalues are ordered from small to large??)\n'
file.write(text)

#### Array 'eigen_vecs'
text = '    + `eigen_vecs`: Numpy array containing the eigenvectors obtained in the normal mode analysis (units??? Å???). It contains n amplitudes of vibration for each mode, where n is the number of residues in the protein complex.\n'
file.write(text)

### PDBid_final_results.npz
text = '\n+ `PDBid_final_results.npz`: Compressed numpy file containing arrays with information about clustering for the residues in the asymmetric unit of the protein complex:\n'
file.write(text)

#### Array: 'labels'
text = '    + `labels`: Numpy array containing the cluster assigned to each residue. The cluster number starts at zero. If there are 10 clusters, the label for the first cluster is 0 and the last cluster is 9.\n'
file.write(text)

#### Array: 'score'
text = '    + `score`: Quality score of the clustering.\n'
file.write(text)

#### Array: 'nc'
text = '    + `nc`: Number of clusters.\n'
file.write(text)

#### Array: 'cluster_method'
text = '    + `cluster_method`: Cluster method.\n'
file.write(text)

#### Array: 'final_full_score'
text = '    + `final_full_score`: One-dimensional array containing the quality clustering score of each residue.\n'
file.write(text)

### PDBid_final_results_full.npz
text = '\n+ `PDBid_final_results_full.npz`: This file is equivalent to the file `PDBid_final_results.npz` described above, but it extends the results to all cluster numbers investigated.'
file.write(text)


# Close file
file.close()

In [None]:
# Remove the Colab generated sample_data directory
sample = !ls | grep 'sample_data'
if len(sample) != 0:
  name = sample.get_list()[0]
  !rm -r $name

# Zip and download the files with results
zip_file = str(pdb) + '_pyCapsid_results.zip'
! zip "$zip_file" *.*

zip_file_report = str(pdb) + '_pyCapsid_report.zip'
! zip -r $zip_file_report $report_dir

from google.colab import files
files.download(zip_file)
files.download(zip_file_report)

## Problems downloading results?
If you allow your browser to download the results but do not find the output results in your folder, check first the downloading progress in the [downloading code cell display](#scrollTo=gKOV73W6iYUa&line=1&uniqifier=1). The size of the files can be substantial depending on the protein complex, so the downloading process can take a while.

If you are having problems downloading the results, try downloading them directly from the `Files` panel available on the Colab side bar.

If the problems persist, generate an [issue on pyCapsid's repository](https://github.com/luquelab/pyCapsid/issues). Use the subject `Colab: Downloading issue`.


In [None]:
import numpy as np
filename = pdb + '_coords.txt'
np.savetxt(filename, coords)
from google.colab import files
files.download(filename)

# Generate advanced analysis
Adapt this section to dig deeper into the initial results, refine their interpretation, or explore alternative outputs.


The numerical results are saved as compressed `.npz` files by default and can be opened afterwards to visualize the results. This includes the ability to visualize clusterings other than the highest-scoring one. In this example, we visualize the results of clustering the capsid into 20 clusters.
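
Before visualizing, it can help to check which arrays a saved `.npz` file contains. The sketch below uses NumPy's `np.load` on a stand-in file so that it runs on its own; `summarize_npz` is a hypothetical helper, and the toy arrays only mimic the `labels`, `score`, and `nc` entries described in the outputs README.

```python
import os
import tempfile

import numpy as np


def summarize_npz(path):
    """Return {array_name: shape} for every array stored in an .npz file."""
    with np.load(path) as data:
        return {name: data[name].shape for name in data.files}


# Stand-in file so the sketch runs on its own; real results come from the pipeline.
with tempfile.TemporaryDirectory() as tmp:
    toy_path = os.path.join(tmp, 'toy_final_results.npz')
    np.savez_compressed(toy_path, labels=np.array([0, 0, 1, 1, 2]),
                        score=np.array(0.8), nc=np.array(3))
    toy_summary = summarize_npz(toy_path)
    print(toy_summary)
```

With real results, pass the path of a saved file such as `f'{pdb}_final_results.npz'` to the same function.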

In [None]:
from pyCapsid.VIS import visualizeSavedResults
results_file = f'{pdb}_final_results_full.npz' # Path of the saved results
labels_20, view_clusters = visualizeSavedResults(pdb, results_file, n_cluster=20, method='nglview')
view_clusters

In [None]:
# If the above view doesn't change coloration, run this cell again.
# In general do not run this cell until the above cell has finished rendering
from pyCapsid.VIS import createClusterRepresentation
createClusterRepresentation(pdb, labels_20, view_clusters)

# Add rep_type='spacefill' to represent the atoms of the capsid as spheres. This provides less information regarding the proteins but makes it easier to identify the geometry of the clusters
#createClusterRepresentation(pdb, labels_20, view_clusters, rep_type='spacefill')