[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/jhmlam/Inching/blob/main/GoogleColab/GoogleColab_Inching_v023_ReleaseOkay.ipynb)

# Running Inching on Colaboratory

In this notebook, we illustrate how to use Inching to analyse the vibrations of biological structures on Google Colaboratory. In general, a free Google Colaboratory user has access to the following computing resources:

* CPU. 2-core Intel(R) Xeon(R) @ 2.20GHz Family 6
* System RAM. 12.7GB
* GPU. Tesla T4, subject to availability.
* GPU RAM. 15GB memory.

To proceed, click the play button on the top left corner of each cell.
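
To see what the current runtime actually provides, a check along the following lines can be run in a scratch cell. This is a minimal sketch using only the Python standard library; the helper name `describe_runtime` is ours (not part of Inching), and it assumes that `nvidia-smi` is on the `PATH` whenever a GPU is attached.

```python
import os
import shutil
import subprocess


def describe_runtime():
    """Return the CPU count and, if one is visible, the GPU name and memory."""
    info = {"cpu_count": os.cpu_count(), "gpu": None}
    nvidia_smi = shutil.which("nvidia-smi")  # None when the NVIDIA tools are absent
    if nvidia_smi is not None:
        try:
            result = subprocess.run(
                [nvidia_smi, "--query-gpu=name,memory.total", "--format=csv,noheader"],
                capture_output=True, text=True, timeout=30,
            )
            if result.returncode == 0 and result.stdout.strip():
                info["gpu"] = result.stdout.strip().splitlines()[0]
        except (OSError, subprocess.TimeoutExpired):
            pass  # leave "gpu" as None if the query cannot run
    return info


runtime_info = describe_runtime()
print(runtime_info)
```

If `gpu` comes back as `None`, the runtime type probably needs to be switched to a GPU runtime before continuing.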

# Acknowledgement
We would like to thank colleagues from [AlphaFold](https://colab.research.google.com/github/deepmind/alphafold/blob/main/notebooks/AlphaFold.ipynb#scrollTo=VzJ5iMjTtoZw) and [ColabFold](https://colab.research.google.com/github/sokrypton/ColabFold/blob/main/beta/AlphaFold2_advanced.ipynb#scrollTo=woIxeCPygt7K) for formalising the routines for setting up Google Colab projects! Refer to the FAQ in the last cell when questions arise.





# Install Conda and Dependencies

Here we present a version that runs on Google Colaboratory, a service from Google that offers free GPU resources on a tentative basis. To run the software with strictly controlled package versions, please refer to `Inching-main/Command8A/README_InstallationOnCarc_20230221.sh` for instructions.

* Colaboratory weather report 2023-12-27. The following cell triggers a kernel restart, which Colab reports as a crash. This is expected: once the kernel has restarted, continue with the next cell.


In [None]:
import google.colab #@markdown Click the play button!
!pip install condacolab
import condacolab
condacolab.install()

Collecting condacolab
  Downloading condacolab-0.1.9-py3-none-any.whl (7.2 kB)
Installing collected packages: condacolab
Successfully installed condacolab-0.1.9
⏬ Downloading https://github.com/conda-forge/miniforge/releases/download/23.11.0-0/Mambaforge-23.11.0-0-Linux-x86_64.sh...
📦 Installing...
📌 Adjusting configuration...
🩹 Patching environment...
⏲ Done in 0:00:14
🔁 Restarting kernel...


## INSTRUCTION: After the restart, run each of the following cells ONCE. Follow the instructions in each cell.


# User Options

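`User_n_mode` is the number of eigenpairs to output (for example, `User_n_mode = 64`). `User_EED = True` enables External Explicit Deflation. `User_Eigensolver` selects the eigensolver, and `User_AnimateMode` is the number of modes written out as animations.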
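
When deflation is on, the six rigid-body modes (three translations and three rotations) are built explicitly and shifted away from the low end of the spectrum. The NumPy sketch below shows the idea on a toy elastic network. It is not the Inching implementation: the helper names `rigid_body_basis` and `toy_hessian`, the random coordinates and the dense matrices are illustrative assumptions, and the shift of 40 is borrowed from `User_HotellingShift` in the analysis cell.

```python
import numpy as np


def rigid_body_basis(xyz):
    """Orthonormal basis (3N x 6) spanning translations and rotations about the centroid."""
    n = xyz.shape[0]
    centred = xyz - xyz.mean(axis=0)
    modes = np.zeros((3 * n, 6))
    for axis in range(3):
        translation = np.zeros((n, 3))
        translation[:, axis] = 1.0
        modes[:, axis] = translation.ravel()
        unit = np.zeros(3)
        unit[axis] = 1.0
        modes[:, 3 + axis] = np.cross(unit, centred).ravel()  # infinitesimal rotation
    q, _ = np.linalg.qr(modes)  # orthonormalise the six columns
    return q


def toy_hessian(xyz, cutoff):
    """Dense elastic-network Hessian with unit springs between atoms within the cutoff."""
    n = xyz.shape[0]
    h = np.zeros((3 * n, 3 * n))
    for i in range(n):
        for j in range(i + 1, n):
            d = xyz[j] - xyz[i]
            r2 = float(d @ d)
            if r2 <= cutoff ** 2:
                block = np.outer(d, d) / r2
                h[3 * i:3 * i + 3, 3 * j:3 * j + 3] -= block
                h[3 * j:3 * j + 3, 3 * i:3 * i + 3] -= block
                h[3 * i:3 * i + 3, 3 * i:3 * i + 3] += block
                h[3 * j:3 * j + 3, 3 * j:3 * j + 3] += block
    return h


rng = np.random.default_rng(0)
xyz = rng.uniform(0.0, 10.0, size=(12, 3))   # toy coordinates, not a real structure
H = toy_hessian(xyz, cutoff=20.0)            # cutoff larger than the box: all pairs connected
Q = rigid_body_basis(xyz)
shift = 40.0                                  # assumed, mirrors User_HotellingShift
H_deflated = H + shift * (Q @ Q.T)            # Hotelling deflation of the rigid-body modes

eig_before = np.linalg.eigvalsh(H)
eig_after = np.linalg.eigvalsh(H_deflated)
n_zero_before = int(np.sum(np.abs(eig_before) < 1e-8))
print(n_zero_before, eig_after[:3])
```

In this toy case the lowest eigenvalues of the deflated matrix are the first non-rigid modes of the original one, which is the behaviour the option is meant to provide.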

In [None]:
User_n_mode = 32 #@param [32, 64] {type:"raw"}
User_Eigensolver = "InchingCTRLM" #@param ["InchingJDM", "InchingCTRLM"] {type:"raw"}
User_EED = True #@param {type:"boolean"}
User_AnimateMode = 5 #@param [5, 10, 15, 20] {type:"raw"}
User_IntegerOfIndexing = "INTEGER32" # NOTE Indexing. For extremely large systems with nnz > 2.1 billion, use INTEGER64
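
The 2.1 billion threshold in the note on `User_IntegerOfIndexing` corresponds to the largest value a signed 32-bit integer can hold. A small sketch of that decision follows; the helper `choose_index_setting` is hypothetical and only illustrates the rule of thumb. The first count is the nnz reported in the example output of this notebook.

```python
INT32_MAX = 2**31 - 1  # largest value a signed 32-bit index can hold


def choose_index_setting(nnz):
    """Return the string expected by User_IntegerOfIndexing for a given nonzero count."""
    if nnz < 0:
        raise ValueError("nnz must be non-negative")
    return "INTEGER32" if nnz <= INT32_MAX else "INTEGER64"


small_case = choose_index_setting(63_310_518)      # nnz from the example run in this notebook
large_case = choose_index_setting(3_000_000_000)   # a made-up, very large system
print(small_case, large_case)
```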

# Upload and Install

Upload a pdb/cif file of your choice. Please name it with 4 alphanumeric characters followed by a suffix, either `.pdb` or `.cif`, for example `5h2f.pdb`. Also, be reminded that

* The unit of 3-D coordinates in the `.pdb` format is assumed to be angstrom, whereas for the `.cif` format it is assumed to be nanometer. For simplicity, the uploaded file will always be renamed as `upload.{suffix}`.
* By default, we remove the first 6 rigid-body modes with zero eigenvalues by Hotelling deflation. However, if the macromolecule contains disconnected components separated from one another by more than 8 angstrom, there will be more than 6 rigid-body modes with zero eigenvalues! See how to check this quickly in our notebook `Inching-main/Notebook/Application/99_Inching_CheckConnectivity.ipynb`.
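
As a rough stand-in for the connectivity notebook mentioned above, the number of disconnected components at an 8 angstrom cutoff can be counted with SciPy. This is a sketch under our own assumptions: the helper `count_components` is hypothetical, the coordinates are a synthetic grid rather than a real structure, and the "6 zero modes per component" estimate only holds when each component is itself a rigid, non-collinear body.

```python
import numpy as np
from scipy.sparse import coo_matrix
from scipy.sparse.csgraph import connected_components
from scipy.spatial import cKDTree


def count_components(xyz_angstrom, cutoff=8.0):
    """Count groups of atoms linked by contacts shorter than the cutoff."""
    n = xyz_angstrom.shape[0]
    tree = cKDTree(xyz_angstrom)
    pairs = tree.query_pairs(r=cutoff, output_type="ndarray")  # shape (n_pairs, 2)
    adjacency = coo_matrix(
        (np.ones(pairs.shape[0]), (pairs[:, 0], pairs[:, 1])), shape=(n, n)
    )
    n_components, labels = connected_components(adjacency, directed=False)
    return n_components, labels


# Two 3x3x3 grids with 4 angstrom spacing, placed 40 angstrom apart along x.
axis = np.arange(3) * 4.0
blob_a = np.array([[x, y, z] for x in axis for y in axis for z in axis])
blob_b = blob_a + np.array([40.0, 0.0, 0.0])
xyz = np.vstack([blob_a, blob_b])

n_components, labels = count_components(xyz, cutoff=8.0)
expected_zero_modes = 6 * n_components  # estimate, see the assumptions above
print(n_components, expected_zero_modes)
```

If more than one component is reported for an uploaded structure, the default deflation of 6 modes will not remove all zero-eigenvalue modes.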

# Remark
`zip -P AAAAA10115 -r Inching-main.zip ./Inching-main/`

In [None]:
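from google.colab import files
import os
import shutil

uploaded = files.upload() #@markdown Upload a pdb/cif file of your choice.
# An empty upload would otherwise fail below with "IndexError: list index out of range"
if len(uploaded) == 0:
    raise ValueError("No file was uploaded. Re-run this cell and choose a .pdb or .cif file.")
uploaded_filename = list(uploaded.keys())[0]
print(uploaded_filename)

# Rename the upload to upload.{suffix}; the suffix decides how the coordinates are read later
uploaded_suffix = uploaded_filename.split(".")[-1]
shutil.move(uploaded_filename, "upload.%s" %(uploaded_suffix))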



# NOTE Then we install
import condacolab
from google.colab import files
from IPython.display import clear_output
condacolab.check()
#!conda install -q -y -c conda-forge -c pytorch scipy=1.8.0 pytorch=1.11.0 pandas=1.5.3 openmm=7.7.0 tqdm cupy=11.5.0 python=3.10
#!conda install -q -y -c conda-forge openmm=7.7.0 mdtraj=1.9.7 tqdm python=3.10 cudatoolkit=12.2
!conda install -q -y -c conda-forge -c nvidia -c pytorch openmm=7.7.0 mdtraj=1.9.7 tqdm python=3.10 cudatoolkit=11.8.0
on_colab = True
clear_output()

# NOTE Fast steps
#!wget https://zenodo.org/records/10443729/files/Inching-main.zip?download=1 -O /content/Inching-main.zip
!wget https://zenodo.org/records/10645601/files/jhmlam/Inching-zenodov1.0.zip -O /content/Inching-main.zip
#!unzip -P AAAAA10115 -o Inching-main.zip
!unzip -o Inching-main.zip
!cp -r jhmlam-Inching-933f839/InchingLiteInteger/ .
!rm -rf Result
!mkdir Result

clear_output()


# Running the analysis

The NVIDIA® T4 is a single-slot, low-profile GPU, but it can still reliably deliver analyses for macromolecules containing
- [X] 100 thousand atoms, 32 modes in ~5 minutes.
- [ ] 200 thousand atoms, 32 modes in ~10 minutes.


Note that the time to read/write/download is excluded from these figures and may even be longer than the calculation itself on Colab(!). For high-performance computing, please run our code locally on a Linux workstation. Follow the notebooks at `https://github.com/jhmlam/Inching/blob/main/Notebook/Application/`.
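
The analysis cell below reorders the atoms with `cuthill_order` before building the Hessian and applies `cuthill_undoorder` to the eigenvectors before saving them, so that results come back in the input atom order. The NumPy sketch below illustrates that round trip. A random permutation stands in for the ordering returned by `X_KdCuthillMckeeOrder`, and we assume, from how the notebook uses it, that the undo order is simply the inverse permutation.

```python
import numpy as np

rng = np.random.default_rng(0)
n_atoms = 8
xyz = rng.normal(size=(n_atoms, 3))   # toy coordinates in input order

order = rng.permutation(n_atoms)      # stand-in for cuthill_order
undo = np.argsort(order)              # inverse permutation, stand-in for cuthill_undoorder

reordered = xyz[order]                # what the solver works on

# Toy "eigenvectors" of shape (n_mode, n_atoms, 3), expressed in the reordered indexing
modes_reordered = np.stack([reordered * (k + 1) for k in range(2)])
modes_original = modes_reordered[:, undo, :]   # back to the input atom order

print(np.array_equal(reordered[undo], xyz))
```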


In [None]:
import glob #@markdown Click the play button
import platform
import cupy as cp
import cupyx
from cupyx.scipy import sparse as cupysparse
# A list of pdb available at different sizes
PART00_IO = True
if PART00_IO:
  pdbavail = [ "./upload.%s" %(uploaded_suffix)]
  Benchmarking_folder = "./Result/"

  User_Platform = platform.system() # Windows Darwin Linux

  User_rc_Gamma = 8.0
  User_maxleafsize = 100

  User_tol = 1e-15
  User_PlusI = 1.0

  if uploaded_suffix == 'cif':
    PDBCIF="Cif"
  else:
    PDBCIF = "Pdb"
  User_MaxIter = 15000

  # JDM Params
  User_GapEstimate = 0
  User_SolverName = 'gmres'
  User_SolverMaxIter = 20
  User_EigTolerance = 1e-12

PART00_Import = True
if PART00_Import:
   import os
   import gc
   import sys
   import pickle
   import time

   import numpy as np
   import tqdm
   import torch

   import cupy
   from cupy import cublas

   from scipy.spatial import cKDTree



   sys.path.append('.')
   sys.path.append('./InchingLiteInteger/Burn/')



   import InchingLiteInteger.util
   import InchingLiteInteger.Fuel.Coordinate.T1
   import InchingLiteInteger.Fuel.Coordinate.T2
   import InchingLiteInteger.Burn.Coordinate.T1
   import InchingLiteInteger.Burn.Coordinate.T3

   from InchingLiteInteger.Fuel.T1 import Xnumpy_SparseCupyMatrixUngappped

   import InchingLiteInteger.Burn.Visualisation.T1
   import InchingLiteInteger.Burn.Visualisation.T2

   # ============================
   # Some torch speed up tips
   # =============================

   # Turn on the cuDNN autotuner
   torch.backends.cudnn.enabled = True
   torch.backends.cudnn.benchmark = True
   # Disable debugging helpers NOTE use only after debugging
   torch.autograd.set_detect_anomaly(False)
   # Disable gradient tracking globally.
   # NOTE A bare torch.no_grad() or torch.inference_mode() call has no effect outside a `with` block
   torch.set_grad_enabled(False)
   torch.manual_seed(0)
   cupy.random.seed(seed = 0)
   os.environ['CUDA_LAUNCH_BLOCKING'] = "1" # NOTE In case any error showup
   # Reset Cuda and Torch
   device = torch.device(0)
   torch.set_default_dtype(torch.float64)
   torch.set_default_tensor_type(torch.cuda.DoubleTensor)
   try:
      InchingLiteInteger.util.TorchEmptyCache()
   except RuntimeError:
      print("The GPU is free to use. THere is no existing occupant")
   try:
      print(torch.cuda.memory_summary(device = 0, abbreviated=True))
   except KeyError:
      print("The GPU is free to use. THere is no existing occupant")



pdbfn = pdbavail[0]
devices_ = [d for d in range(torch.cuda.device_count())]
device_names_  = [torch.cuda.get_device_name(d) for d in devices_]
User_Device =  device_names_[0]


pdbid = pdbfn.split("/")[-1].split(".")[0]



print("Reading structure...")
X_df, X_top = InchingLiteInteger.util.BasicPdbCifLoading(pdbfn)
protein_xyz = X_df[['x','y','z']].to_numpy().astype(np.float64)
# NOTE Round the centroid so that centring does not destroy collinearity at the PDB decimal precision!
protein_xyz -= np.around(protein_xyz.mean(axis= 0), decimals=4)
n_atoms = protein_xyz.shape[0]




# ===============================================
# K-d Cuthill (NOTE CPU np array)
# ===================================
PART02_Cuthill = True
if PART02_Cuthill:
    # NOTE Cuthill Order and Undo
    st = time.time()
    cuthill_order, cuthill_undoorder = InchingLiteInteger.Fuel.Coordinate.T1.X_KdCuthillMckeeOrder(protein_xyz,
                                rc_Gamma = User_rc_Gamma, Reverse = True,
                                )
    protein_xyz = protein_xyz[cuthill_order,:]
    #protein_tree = cKDTree(protein_xyz, leafsize=16, compact_nodes=True, copy_data=False, balanced_tree=True, boxsize=None)


    from InchingLiteInteger.Burn.JacobiDavidsonHotellingDeflation.T1 import S_HeigvalJDMHD_HeigvecJDMHD
    from InchingLiteInteger.Burn.ThickRestartLanczosHotellingDeflation.T1 import S_HeigvalTRLMHD_HeigvecTRLMHD
    from InchingLiteInteger.Burn.ChebyshevDavidsonSubspaceIteration.T1 import S_HeigvalCDSIHD_HeigvecCDSIHD


    import InchingLiteInteger.Burn.HermitianLanczos.T2
    import InchingLiteInteger.Burn.PolynomialFilters.T0
    import InchingLiteInteger.Burn.PolynomialFilters.T2



    print('start eigsh cupy')

    mempool = cupy.get_default_memory_pool()
    pinned_mempool = cupy.get_default_pinned_memory_pool()

# ==================
# Cupy hessian
# =====================


PART03a_MakeCupyHessian = True
if PART03a_MakeCupyHessian:
    # NOTE Nnz neighborhood after cuthill
    NnzMinMaxDict, HalfNnz  = InchingLiteInteger.Fuel.Coordinate.T1.X_KdUngappedMinMaxNeighbor(protein_xyz,
                                rc_Gamma = User_rc_Gamma,
                                maxleafsize = User_maxleafsize,
                                CollectStat = False,
                                User_ReturnHalfNnz = True,
                                SliceForm= True)


    # NOTE PyTorch tensors spend extra memory when dlpack has to be called, and there are memory leaks
    #X = torch.tensor(protein_xyz, device=device, requires_grad= False)
    X = protein_xyz

    Xnumpy_SparseCupyMatrixUngapppedC = Xnumpy_SparseCupyMatrixUngappped(X, batch_head = None,
        maxleafsize = User_maxleafsize, rc_Gamma = User_rc_Gamma,
        #device  = torch.device(0),
        User_PlusI = User_PlusI,
        #dtype_temp = torch.float64,
        #X_precision = torch.cuda.DoubleTensor,
        User_DictCharmmGuiPbc = None, #Dict_Pbc,
        NnzMinMaxDict = NnzMinMaxDict)
    if User_IntegerOfIndexing == "INTEGER32":
        A, A_diag = Xnumpy_SparseCupyMatrixUngapppedC.ReturnCupyHLowerTriangleInt32(
                        User_MaxHalfNnzBufferSize = HalfNnz)
    else:
        #print('gagag')
        A, A_diag = Xnumpy_SparseCupyMatrixUngapppedC.ReturnCupyHLowerTriangleInt64(
                        User_MaxHalfNnzBufferSize = HalfNnz)

    print("Matrix Index Datatype", A.indices.dtype)
    print("Matrix Datatype",A.data.shape)

    cupy.get_default_memory_pool().free_all_blocks()
    cupy.get_default_pinned_memory_pool().free_all_blocks()
    gc.collect()







PART03b_MakeFreeModes = User_EED
if PART03b_MakeFreeModes:

    Q_HotellingDeflation = cp.zeros((6,3*n_atoms), dtype = cp.float64)
    # NOTE Translation
    for i in range(3):
        q1 = cp.zeros((n_atoms,3))
        q1[:,i] = 1/np.sqrt(n_atoms)
        Q_HotellingDeflation[i,:] = q1.flatten()
        q1 = None
        del q1
        cupy.get_default_memory_pool().free_all_blocks()
        cupy.get_default_pinned_memory_pool().free_all_blocks()



    # NOTE Rotation
    R_x = cp.array([        [0,0,0],
                            [0,0,-1],
                            [0,1,0]], dtype=cp.float64).T
    R_y = cp.array([        [0,0,1],
                            [0,0,0],
                            [-1,0,0]], dtype=cp.float64).T
    R_z = cp.array([        [0,-1,0],
                            [1,0,0],
                            [0,0,0]], dtype=cp.float64).T
    R_x = cupysparse.csr_matrix(R_x, dtype= cp.float64)
    R_y = cupysparse.csr_matrix(R_y, dtype= cp.float64)
    R_z = cupysparse.csr_matrix(R_z, dtype= cp.float64)
    gx = (cp.array(X)@R_x).flatten()
    Q_HotellingDeflation[3,:] = gx/ cp.linalg.norm(gx,ord=2)
    gy = (cp.array(X)@R_y).flatten()
    Q_HotellingDeflation[4,:] = gy/ cp.linalg.norm(gy,ord=2)
    gz = (cp.array(X)@R_z).flatten()
    Q_HotellingDeflation[5,:] = gz/ cp.linalg.norm(gz,ord=2)



    for i_FRO in range(2):
        V = Q_HotellingDeflation.T

        for ix in range(6):
            if ix == 0:
                continue
            V[:,ix] -= cp.matmul(V[:,:ix], cp.matmul( V[:, :ix].T,V[:,ix] ))
            V[:,ix] /= cp.sqrt(V[:, ix].T @ V[:, ix]) # TODO torch.matmul or mvs
            V[:,ix] -= cp.matmul(V[:,:ix], cp.matmul( V[:, :ix].T,V[:,ix] ))
            V[:,ix] /= cp.sqrt(V[:, ix].T @ V[:, ix])
        Q_HotellingDeflation = V.T

    gx = Q_HotellingDeflation[3]


    Q_HotellingDeflation = cupyx.scipy.sparse.csr_matrix(Q_HotellingDeflation, dtype = cp.float64)

    gx, gy, gz = None, None, None
    del gx, gy, gz
    cupy.get_default_memory_pool().free_all_blocks()
    cupy.get_default_pinned_memory_pool().free_all_blocks()





if User_Eigensolver == "InchingJDM":
  PART04_CalcualteEigJDM = True
else:
  PART04_CalcualteEigJDM = False
if PART04_CalcualteEigJDM:
    User_GapEstimate = 0 # NOTE Not in use.
    # NOTE Pass the rigid-body basis only when User_EED is on; otherwise no deflation is applied
    eigval, eigvec = S_HeigvalJDMHD_HeigvecJDMHD(A, A_diag,
                k = User_n_mode,
                tol = User_EigTolerance,
                maxiter = User_MaxIter,
                User_CorrectionSolverMaxiter = User_SolverMaxIter,
                User_HalfMemMode= True,
                User_IntermediateConvergenceTol=1e-3, # NOTE Do not touch for this problem
                User_GapEstimate = User_GapEstimate, # NOTE This will be used for theta - gap_estimate
                User_FactoringToleranceOnCorrection = 1e-4, # NOTE Do not touch for this problem
                User_Q_HotellingDeflation= Q_HotellingDeflation if User_EED else None,
                User_HotellingShift = 40, # NOTE 40 is generally safe for the first 64 modes; of course, if you want to guarantee it, you need to know a norm
                )



    runtime = time.time() - st
    print("RUNTIME %s" %(runtime))
    # NOTE Memory held by the CuPy pool at this point, in MiB
    peak_mem = cupy.get_default_memory_pool().used_bytes() / 1024 / 1024

    with open("%s/Eigval_InchingJDM_%s_%s_%s.pkl" %(
                Benchmarking_folder, pdbid, User_Platform,
                User_Device.replace(" ","")),"wb") as fn:
        pickle.dump(cupy.asnumpy(eigval) - User_PlusI ,fn, protocol=4)

    with open("%s/Eigvec_InchingJDM_%s_%s_%s.pkl" %(
                Benchmarking_folder, pdbid, User_Platform,
                User_Device.replace(" ","")),"wb") as fn:
        tempeigvec = cupy.asnumpy(eigvec)
        tempeigvec = tempeigvec.T
        tempeigvec = tempeigvec.reshape((int(User_n_mode),int(n_atoms),int(3)))
        pickle.dump(tempeigvec[:,cuthill_undoorder,:] ,fn, protocol=4)



    gc.collect()


if User_Eigensolver == "InchingCTRLM":
  PART04_CalcualteEigCTRLM = True
else:
  PART04_CalcualteEigCTRLM = False

if PART04_CalcualteEigCTRLM:
    User_WantedNumberEigenvalue = User_n_mode
    User_SpectrumBound = InchingLiteInteger.Burn.HermitianLanczos.T2.A_Adiag_EstimateSpectrumBound(
                            A, A_diag, User_HalfMemMode = True )
    #RitzValues = User_SpectrumBound[2]
    #print(User_SpectrumBound)
    #print(float(np.quantile(RitzValues,0.025)))
    User_SpectrumBound = (User_SpectrumBound[0], User_SpectrumBound[1])
    User_SpectrumBound = (User_PlusI, User_SpectrumBound[1].get()+1e-12)


    User_WantedInterval_ = (1.0, User_SpectrumBound[1]/10)
    User_WantedInterval = User_WantedInterval_
    # NOTE For the extremal interval it is empirically unwise to squeeze too hard, as many of
    #      these eigenvalues are close to 0+1 and the mapped function is in steep decline (take great care here).
    #      I would deliberately make the wanted number of eigenvalues larger, because the lowest will be the first to converge in this scenario anyway
    User_PolynomialParams, eigval_count_estimate, temp_User_WantedInterval = InchingLiteInteger.Burn.PolynomialFilters.T2.A_Adiag_OptimizePolynomialParamsOnMemory(
                                                    A, A_diag,
                                                    User_MaximumDegree = 5120,
                                                    User_MinimumDegree = 5,
                                                    User_DampingKernel = "Jackson",
                                                    User_ExtremalIntervalDefinition = 1e-10,
                                                    User_WantedInterval = User_WantedInterval,
                                                    User_SpectrumBound = User_SpectrumBound,
                                                    User_DesignatedStart = User_SpectrumBound[1]/10,
                                                    # NOTE Make this an underestimate of the actual number.
                                                    #      While we will have repeated modes, the risk of having a repeated mode is outweighed by the risk of non-convergence!
                                                    #
                                                    User_WantedNumberEigenvalue = int(User_WantedNumberEigenvalue-5)*3,
                                                    User_AffordableMemoryMargin = 5*5,
                                                    User_HalfMemMode = True,
                                                    User_NumberKpmTrials =  5,
                                                    User_ConvergenceRatio = 0.7, # NOTE For the extremal interval this is overridden with 0.1
                                                    )
    User_WantedInterval = (temp_User_WantedInterval[0], temp_User_WantedInterval[1])
    print(User_PolynomialParams.AdjustedDegree, temp_User_WantedInterval)

    # NOTE Q_HotellingDeflation is only built when User_EED is True
    User_Q_HotellingDeflation_ = Q_HotellingDeflation if User_EED else None
    eigval, eigvec = S_HeigvalTRLMHD_HeigvecTRLMHD(A, A_diag,


                        User_WorkspaceSizeFactor = 2 ,
                        k = User_n_mode ,
                        tol = User_EigTolerance,
                        maxiter = User_MaxIter,
                        #User_CorrectionSolverMaxiter = User_SolverMaxIter,
                        User_HalfMemMode= True,
                        #User_IntermediateConvergenceTol=1e-3, # NOTE Do not touch for this problem
                        #User_GapEstimate = User_GapEstimate, # NOTE This will be used for theta - gap_estimate
                        #User_FactoringToleranceOnCorrection = 1e-4,#1e-4, # NOTE Do not touch for this problem

                        User_Q_HotellingDeflation = User_Q_HotellingDeflation_, #Q_HotellingDeflation,
                        User_HotellingShift = -40.0,# NOTE pull to negative, we are on cheb

                        User_PolynomialParams = User_PolynomialParams,
                        )



    runtime = time.time() - st
    print("RUNTIME %s" %(runtime))
    # NOTE Memory held by the CuPy pool at this point, in MiB
    peak_mem = cupy.get_default_memory_pool().used_bytes() / 1024 / 1024

    with open("%s/Eigval_InchingJDM_%s_%s_%s.pkl" %(
                Benchmarking_folder, pdbid, User_Platform,
                User_Device.replace(" ","")),"wb") as fn:
        pickle.dump(cupy.asnumpy(eigval) - User_PlusI ,fn, protocol=4)

    with open("%s/Eigvec_InchingJDM_%s_%s_%s.pkl" %(
                Benchmarking_folder, pdbid, User_Platform,
                User_Device.replace(" ","")),"wb") as fn:
        tempeigvec = cupy.asnumpy(eigvec)
        tempeigvec = tempeigvec.T
        tempeigvec = tempeigvec.reshape((int(User_n_mode),int(n_atoms),int(3)))
        pickle.dump(tempeigvec[:,cuthill_undoorder,:] ,fn, protocol=4)



    gc.collect()




PART05_Performance = True
if PART05_Performance:
    #===================================
    # Check correct
    # =====================================
    #print(eigval)
    #print(eigvec.shape)

    import InchingLiteInteger.Burn.Krylov.T3 # NOTE Imported explicitly because it is used just below
    User_HalfMemMode = True
    if User_HalfMemMode:
        KrylovAv = InchingLiteInteger.Burn.Krylov.T3.OOC2_HalfMemS_v_KrylovAv_VOID(A, A_diag)
    else:
        KrylovAv = InchingLiteInteger.Burn.Krylov.T3.OOC2_FullMemS_v_KrylovAv_VOID(A, A_diag)
    Av = cupy.empty((n_atoms*3,)).astype(A.dtype)


    delta_lambda_list = []
    for jj in range(User_n_mode):
        KrylovAv(A,cupy.ravel(eigvec[:,jj]),Av)
        B = Av - eigval[jj]* cupy.ravel(eigvec[:,jj])

        delta_lambda_list.append(cupy.asnumpy(cublas.nrm2(B)))
        #if jj < 20:
        print(eigval[jj], cupy.asnumpy(cublas.nrm2(B)))


    eigval = cupy.asnumpy(eigval)
    n_atoms = protein_xyz.shape[0]

    GPU = "%s %s" %(User_Platform, User_Device.replace(" GPU", ""))

    performance = ["Inching (JDM %s)" %(GPU), pdbfn, n_atoms,
                    runtime, peak_mem,
                    User_Platform, User_Device,
                    User_maxleafsize]



    longperformance = []
    for i in range(len(delta_lambda_list)):
        longperformance.append(performance + [i ,delta_lambda_list[i], eigval[i] - User_PlusI])

    with open("%s/PerformanceList_InchingJDM_%s_%s_%s.pkl" %(Benchmarking_folder,
        pdbid, User_Platform, User_Device.replace(" ","")),"wb") as fn:
        pickle.dump(longperformance,fn, protocol=4)


    #del X_df#, protein_xyz
    #gc.collect()



    B = None
    A.data = None
    A.indices = None
    A.indptr = None
    Q_HotellingDeflation = None
    del Q_HotellingDeflation

    del A.data, A.indices, A.indptr
    del A, B
    Xnumpy_SparseCupyMatrixUngapppedC.X, Xnumpy_SparseCupyMatrixUngapppedC.X_unsqueezed = None, None
    del Xnumpy_SparseCupyMatrixUngapppedC.X, Xnumpy_SparseCupyMatrixUngapppedC.X_unsqueezed
    Xnumpy_SparseCupyMatrixUngapppedC = None
    del Xnumpy_SparseCupyMatrixUngapppedC
    eigvec, eigval = None, None
    del eigvec, eigval


    cupy.get_default_memory_pool().free_all_blocks()
    cupy.get_default_pinned_memory_pool().free_all_blocks()
    del X
    torch.cuda.empty_cache()
    torch.cuda.reset_peak_memory_stats(0)
    torch.cuda.memory_allocated(0)
    torch.cuda.max_memory_allocated(0)










The GPU is free to use. THere is no existing occupant
The GPU is free to use. THere is no existing occupant
Reading structure...
On Linux


100%|██████████| 152/152 [00:02<00:00, 58.52it/s]


N_neighbor within 8.0 angstrom Mean 91.66226264728547, Std 23.711226761684024
NN search in 2.6058661937713623 s


  0%|          | 0/151831 [00:10<?, ?it/s]


start eigsh cupy
On Linux


100%|██████████| 2048/2048 [00:00<00:00, 1491826.08it/s]
100%|██████████| 2048/2048 [00:06<00:00, 311.50it/s]


Mean number of Gaps > 100 is 17.2255859375. Mean Gap Length Given Gap is 549.855802483134
Max number of Gaps > 100 is 37. Max Gap Length Given Gap is 5681
Median number of Gaps > 100 is 17.0. Median Gap Length Given Gap is 336.0
Total Entry Savings 1438796716 which is 80.30246670680903 percent of a Rectangular Batch
Nnz in Hessian (L+D) is 63310518.0. This will occupy 0.4666095860302448 GB for (L+D) data and at max 0.4666095860302448 GB for all indexings. Acceptable?


100%|██████████| 2048/2048 [00:41<00:00, 49.10it/s]


Matrix Index Datatype int32
Matrix Datatype (62844725,)


100%|██████████| 300/300 [00:02<00:00, 133.22it/s]


Done Bound


100%|██████████| 5/5 [00:01<00:00,  3.61it/s]


Kpm Estiamte Mean 10072.2 Std 167.2703201407829


100%|██████████| 5/5 [00:00<00:00, 14.14it/s]


Kpm Estiamte Mean 5331.4 Std 132.05695740853642
Adjusted degree to 9 and Interval to 1.0,4.364174112664313. Estimate number of eigval is 5332
Convergence ratio 0.7


100%|██████████| 5/5 [00:00<00:00,  8.78it/s]


Kpm Estiamte Mean 2311.4 Std 69.93883041630022
Adjusted degree to 14 and Interval to 1.0,2.6820870563321564. Estimate number of eigval is 2313
Convergence ratio 0.7


100%|██████████| 5/5 [00:00<00:00,  6.02it/s]


Kpm Estiamte Mean 960.0 Std 30.364452901377952
Adjusted degree to 21 and Interval to 1.0,1.8410435281660782. Estimate number of eigval is 961
Convergence ratio 0.7


100%|██████████| 5/5 [00:01<00:00,  3.77it/s]


Kpm Estiamte Mean 467.6 Std 21.105449533236673
Adjusted degree to 33 and Interval to 1.0,1.420521764083039. Estimate number of eigval is 469
Convergence ratio 0.7


100%|██████████| 5/5 [00:02<00:00,  2.36it/s]


Kpm Estiamte Mean 255.2 Std 24.194214184387143
Adjusted degree to 53 and Interval to 1.0,1.2102608820415195. Estimate number of eigval is 256
Convergence ratio 0.7


100%|██████████| 5/5 [00:03<00:00,  1.49it/s]


Kpm Estiamte Mean 145.8 Std 13.181805642627264
Adjusted degree to 84 and Interval to 1.0,1.1051304410207599. Estimate number of eigval is 147
Convergence ratio 0.7


100%|██████████| 5/5 [00:05<00:00,  1.09s/it]


Kpm Estiamte Mean 78.4 Std 11.909659944767524
Adjusted degree to 136 and Interval to 1.0,1.05256522051038. Estimate number of eigval is 79
Convergence ratio 0.7
136 (1.0, 1.05256522051038)
There are 64 Ritz vectors, tol = 1e-12
Coarse_iter 0 Estimate at 7.383804768371553e-07. Ritz values follows

[  0.87953603   0.87042884   0.85773009   0.85481088   0.81719735
   0.80715763   0.8000678    0.75171336   0.73224811   0.72197768
   0.72050719   0.70022293   0.67612713   0.64993303   0.62425965
   0.58660809   0.58358883   0.55845117   0.53021621   0.49425734
   0.48043344   0.47530666   0.44619414   0.41914286   0.40375565
   0.38996362   0.37060972   0.35468943   0.33734035   0.32444119
   0.31287002   0.28944584   0.19212887   0.1700562    0.1514901
   0.20776515   0.20847292   0.17943964   0.16080298 -38.67207979
  -0.12934543   0.21978016   0.14034219   0.13692881   0.13176089
  -0.13987678 -38.66974258   0.13565679   0.1725776    0.2007276
   0.17231042   0.16228474 -14.91233464 -23.
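
The analysis cell checks every eigenpair through the residual norm ‖Av − λv‖₂ and subtracts `User_PlusI` from the eigenvalues before saving, since the solvers appear to work on the Hessian shifted by the identity. The dense NumPy sketch below reproduces both steps on a toy matrix. `numpy.linalg.eigh` stands in for the Inching solvers, and the random matrix is an assumption made purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
B = rng.normal(size=(30, 30))
H = B @ B.T                      # symmetric positive semi-definite toy "Hessian"
plus_i = 1.0                     # mirrors User_PlusI
A = H + plus_i * np.eye(30)      # shifted matrix handed to the eigensolver

eigval, eigvec = np.linalg.eigh(A)   # stand-in for the GPU eigensolver

k = 5                                # number of lowest modes to check
residuals = [
    float(np.linalg.norm(A @ eigvec[:, j] - eigval[j] * eigvec[:, j]))
    for j in range(k)
]
eigval_unshifted = eigval[:k] - plus_i   # what gets written to the result files

for j in range(k):
    print(eigval_unshifted[j], residuals[j])
```

Small residuals for every reported mode indicate that the eigenpairs satisfy the eigenvalue equation to the requested tolerance.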

## Generate Animation and Download

This writes the linearized motion as a `cif` file for each mode. Unfortunately, the I/O can often take much longer than the calculation itself. After downloading `InchingResult.zip`, decompress it, open the cif file in PyMOL and load the corresponding `.pml` script. Enjoy!

* By default, the `save_to_google_drive` check box is ticked; the notebook will ask for your permission to upload the result to your Google Drive. If the check box is unticked, you can download the file directly, but this can take a very long time.
* To indicate the mode shape, for each mode, a black arrow is placed on up to 10000 randomly chosen atoms among those in the top 50% of displacement magnitude. The thickness of the arrows can be increased by checking `User_ThickerArrowForDisplay`. Thicker arrows are suggested for structures with >100 thousand atoms.
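
The arrow selection rule described in the last bullet can be sketched with NumPy as follows. The helper `pick_arrow_atoms` is hypothetical and random magnitudes stand in for a real mode; the cell below additionally restricts the candidates to CA and P atoms, which is omitted here.

```python
import numpy as np


def pick_arrow_atoms(magnitude, quantile=0.5, max_arrows=10000, seed=0):
    """Pick at most max_arrows atoms whose displacement exceeds the given quantile."""
    rng = np.random.default_rng(seed)
    threshold = np.quantile(magnitude, quantile)
    candidates = np.where(magnitude > threshold)[0]
    n_pick = min(max_arrows, candidates.shape[0])
    return rng.choice(candidates, size=n_pick, replace=False)


magnitude = np.random.default_rng(1).uniform(0.1, 1.0, size=1000)  # toy magnitudes
chosen = pick_arrow_atoms(magnitude, quantile=0.5, max_arrows=100)
print(chosen.shape[0])
```

Sampling without replacement keeps each atom to at most one arrow, and the `min` guard covers small structures with fewer candidates than `max_arrows`.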

In [None]:
save_to_google_drive = True #@param {type:"boolean"}
User_ThickerArrowForDisplay = True #@param {type:"boolean"}
# =======================
# Linearize
# ==========================
pdbfn = pdbavail[0]#@markdown The zip file will be downloaded as `InchingResult_{time}.zip`. Size is printed below

User_QuantileDisplay = 0.5
User_RandomPickArrows = 10000
User_EigenvectorTwinDirection = 1

User_BigClusterArrowFloatingFactor = 0.5
User_DBscanMinDist = 1.5 # NOTE This roughly clusters the 90th-percentile arrows. The larger the value, the fewer the arrows

print("Printing animations...")
import sklearn.cluster
from InchingLiteInteger.Fuel.Coordinate.T1 import HeigvecOne_BoxCoxMagnitude
protein_xyz_ = protein_xyz
PART06_Animate = True
if PART06_Animate:
    eigvec = tempeigvec[:,cuthill_undoorder,:]
    protein_xyz = protein_xyz[cuthill_undoorder,:]
    if not User_EED:
      # NOTE Without deflation the first 6 modes are rigid-body motions, so request 6 more and skip them below
      User_AnimateMode +=6

    i_mode = 0
    for User_TheModeToShow in range(User_AnimateMode):

        if (not User_EED) and User_TheModeToShow <=5:
            continue

        if pdbfn.split(".")[-1] == 'pdb':
            nmfactor = 0.1
        else:
            nmfactor = 1

        gc.collect()

        PART06b_Logistic = True
        if PART06b_Logistic:

            #if os.path.exists("%s/%s_Animated_%s_%s.cif" %(Benchmarking_folder, pdbid, pdbid, i_mode)):
            #    continue

            # NOTE Kerneled eigvec
            deltaX_magnitude = HeigvecOne_BoxCoxMagnitude( eigvec[User_TheModeToShow,:,:],
                    User_WinsorizingWindow = (0.025, 0.975),
                    User_LogisticParam = (0.05, 1.0),
                    )

            deltaX_magnitude = np.clip(deltaX_magnitude, 0.1, 1.0)
            eigvec_unit = eigvec[User_TheModeToShow] / np.linalg.norm(eigvec[User_TheModeToShow], axis=1)[:,None]
            deltaX = deltaX_magnitude[:,None] * eigvec_unit


            InchingLiteInteger.util.SaveOneModeLinearisedAnime(
                    deltaX,
                    protein_xyz*nmfactor,
                    n_timestep = 16,
                    DIR_ReferenceStructure = pdbfn,#[:-4] + "trial.cif",
                    DIR_SaveFolder = Benchmarking_folder,
                    SaveFormat = 'cif',
                    outputlabel = 'Animated_%s_%s'%(pdbid, i_mode),
                    max_abs_deviation = 3.0*nmfactor,
                    stepsize = 1.0*nmfactor,
                    UnitMovement = False,
                    max_n_output = 32,
                    SaveSeparate = False,
                    RemoveOrig = False, # NOTE This flag removes the unmoved structure from the trajectory produced
                    User_Bfactor=deltaX_magnitude
                    )





        PART06c_Arrows = True
        if PART06c_Arrows:
          PART02_DecideWhatArrowsToPlot = True
          if PART02_DecideWhatArrowsToPlot:
              where_CaOrP = X_df.loc[X_df['name'].isin(["CA", "P"]) & ~X_df['element'].isin(["Ca"])].index.values
              where_larger = np.where((deltaX_magnitude > np.quantile(deltaX_magnitude, q = User_QuantileDisplay)))[0]
              # a ball with large displacement TODO Show the stacked detail
              where_larger_CaOrP = np.intersect1d(where_larger, where_CaOrP, assume_unique=False, return_indices=False)
              where_random = np.random.choice(where_larger_CaOrP,
                                                  size= min(User_RandomPickArrows, where_larger_CaOrP.shape[0]), replace = False)

              # TODO Make  a big arrow for those large ones only! Cluster the coordinate by dbscan.
              #      average the arrow put it in center and floating in air.
              #      Make the arrow obvious enough to indicate the direction.
              where_CaOrP_subset = where_CaOrP[::max(1, int(protein_xyz.shape[0]/User_RandomPickArrows))]


              # ======================
              # Big Arrow
              # =========================

              clustering = sklearn.cluster.DBSCAN(eps=User_DBscanMinDist, min_samples=10, metric='euclidean',
                                                  metric_params=None, algorithm='kd_tree',
                                                  leaf_size=100, p=2, n_jobs=1).fit(protein_xyz[where_larger_CaOrP,:])
              unique_clusters = np.unique(clustering.labels_)
              DBSCAN_Coord = np.zeros((unique_clusters.shape[0],3))
              DBSCAN_UnitEigvec = np.zeros((unique_clusters.shape[0],3))
              DBSCAN_UnitEigvecmag = np.zeros((unique_clusters.shape[0],1))
              for i_cluster in unique_clusters:
                  if i_cluster == -1:
                      continue
                  same_cluster = where_larger_CaOrP[np.where(clustering.labels_ == i_cluster)[0]]
                  DBSCAN_Coord[i_cluster,:] = np.mean(protein_xyz[same_cluster,:], axis=0)
                  DBSCAN_UnitEigvec[i_cluster,:] = np.mean(eigvec_unit[same_cluster,:], axis=0)
                  DBSCAN_UnitEigvecmag[i_cluster,:] = np.mean(deltaX_magnitude[same_cluster])

          #print("dbscan done")
          # ========================
          # Print arrows
          # ==========================
          PART03_PrintCgoArrows = True
          if PART03_PrintCgoArrows:
              # NOTE Pymol...
              if pdbfn.split(".")[-1] == 'pdb':
                  nmfactor_ = 10.0
              else:
                  nmfactor_ = 10.0


              #print(deltaX_magnitude)
              percentilescore_all =  np.argsort(np.argsort(deltaX_magnitude, axis=0), axis=0) / float(len(deltaX_magnitude)) # NOTE Assumed that each has a unique float
              print_cgoarrows = []

              # =================================
              # NOTE The Big Clustered Arrow
              # ==================================
              """
              for i_cluster in range(unique_clusters.shape[0]):

                  # NOTE Point to point
                  position_source = DBSCAN_Coord[i_cluster] * nmfactor_
                  direction_size = 99 * DBSCAN_UnitEigvecmag[i_cluster]
                  direction_= (User_EigenvectorTwinDirection * DBSCAN_UnitEigvec[i_cluster] *direction_size) #* deltaX_magnitude[atomindex_]*50)
                  gap = direction_* User_BigClusterArrowFloatingFactor

                  position_source += gap
                  #position_source += direction_*User_BigClusterArrowFloatingFactor
                  position_target = position_source + direction_

                  x_s, y_s, z_s = position_source[0], position_source[1], position_source[2]
                  x_t, y_t, z_t = position_target[0], position_target[1], position_target[2]
                  thickness_ = 5 # percentilescore_all[atomindex_]
                  print_cgoarrows.append("cgo_arrow [%.3f, %.3f, %.3f], [%.3f, %.3f, %.3f] " %(
                      x_s, y_s, z_s, x_t, y_t, z_t) + ', name = \"' + "ClusterArrow%s" %(i_cluster+1)+'\",' + " radius = %s, hradius = %s, hlength = %s, " %(thickness_, thickness_*2, direction_size[0]/2 ) + ' color = hotpink')
                      # hotpink black
              """
              # ===========================
              # NOTE every n CA
              # ==============================
              choice_where =  where_random # where_CaOrP_subset
              for i_whererand in range(len(choice_where)):
                  atomindex_ = choice_where[i_whererand]
                  # NOTE Point to point
                  position_source = protein_xyz[atomindex_]*nmfactor_
                  direction_= (eigvec_unit[atomindex_] * User_EigenvectorTwinDirection *25 * deltaX_magnitude[atomindex_]) #* deltaX_magnitude[atomindex_]*50)
                  position_target = position_source + direction_

                  x_s, y_s, z_s = position_source[0], position_source[1], position_source[2]
                  x_t, y_t, z_t = position_target[0], position_target[1], position_target[2]
                  if User_ThickerArrowForDisplay:
                    thickness_ = 0.5
                  else:
                    thickness_ = 0.1 # percentilescore_all[atomindex_]
                  print_cgoarrows.append("cgo_arrow [%.3f, %.3f, %.3f], [%.3f, %.3f, %.3f] " %(
                      x_s, y_s, z_s, x_t, y_t, z_t) + ', name = \"' + "Index%s" %(atomindex_+1)+'\",' + " radius = %s, hradius = %s, hlength = 10.23, " %(thickness_*1, thickness_ * 4) + ' color = black')
                      # hotpink black





          with open('./jhmlam-Inching-933f839/Notebook/Application/ArrowTemplate.pml', 'r') as f :
                  filedata = f.read()

          filedata = filedata.replace('REPLACE_WITH_FILENAME', "%s/%s_Animated_%s_%s.cif" %("./", pdbid, pdbid, i_mode))
          filedata = filedata.replace('REPLACE_WITH_ID', '%s' %(pdbid))
          filedata = filedata.replace('REPLACE_WITH_CGOARROWS', "\n".join(print_cgoarrows))
          #print(filedata)
          with open('%s/PymolSession_%s_%s.pml'%(Benchmarking_folder, pdbid, i_mode), 'w+') as f:
                  f.write(filedata)
          shutil.copy("./jhmlam-Inching-933f839/Notebook/Application/cgo_arrow.py", "%s/cgo_arrow.py" %(Benchmarking_folder))
          i_mode+=1


protein_xyz = protein_xyz_

# --- Download the predictions ---
import locale
locale.getpreferredencoding = lambda: "UTF-8"


import os
import time
timestamp = str(time.time()).split(".")[0]
print('Compressing as zip...')
os.system("zip -q -r /content/InchingResult_%s.zip /content/Result/" %(timestamp))
print("File size: %s MB" %(os.path.getsize("/content/InchingResult_%s.zip"%(timestamp))/1024/1024))


if save_to_google_drive:

  from pydrive.drive import GoogleDrive
  from pydrive.auth import GoogleAuth
  from google.colab import auth
  from oauth2client.client import GoogleCredentials
  auth.authenticate_user()
  gauth = GoogleAuth()
  gauth.credentials = GoogleCredentials.get_application_default()
  #print("You are logged into Google Drive and are good to go!")

  from google.colab import drive
  drive.mount('/content/gdrive')
  import shutil

  shutil.copy2("/content/InchingResult_%s.zip" %(timestamp),"/content/gdrive/MyDrive/InchingResult_%s.zip"%(timestamp))

  from subprocess import getoutput
  from IPython.display import HTML, display
  !apt-get install xattr > /dev/null

  #!xattr -p 'user.drive.id' '/content/gdrive/MyDrive/'
  # NOTE Uploading to Google Drive can lag, so we poll below until the file is assigned a non-local Drive ID.


  def get_shareable_link(file_path, return_URL = False):
    fid = getoutput("xattr -p 'user.drive.id' " + "'" + file_path + "'")
    #print(fid, file_path)
    if return_URL:
      return HTML(f"<a href=https://drive.google.com/file/d/{fid} target=_blank>Click HERE to download the InchingResult.zip file from your drive. Enjoy!</a>")
    else:
      return fid

  print("Preparing Google link address...")
  shareable_link = get_shareable_link("/content/gdrive/MyDrive/InchingResult_%s.zip" %(timestamp))
  while "local-" in shareable_link:
    time.sleep(2)
    shareable_link = get_shareable_link("/content/gdrive/MyDrive/InchingResult_%s.zip" %(timestamp))
    time.sleep(2)
  display(get_shareable_link("/content/gdrive/MyDrive/InchingResult_%s.zip" %(timestamp), return_URL = True))
  #print("/content/gdrive/MyDrive/InchingResult_%s.zip" %(timestamp))

else:
  files.download("/content/InchingResult_%s.zip" %(timestamp))


Printing animations...


100%|██████████| 7/7 [00:00<00:00, 56.12it/s]
100%|██████████| 7/7 [00:00<00:00, 41.05it/s]
100%|██████████| 7/7 [00:00<00:00, 56.41it/s]
100%|██████████| 7/7 [00:00<00:00, 52.21it/s]
100%|██████████| 7/7 [00:00<00:00, 52.08it/s]


Compressing as zip...




File size: 231.99533081054688 MB




Mounted at /content/gdrive
Preparing Google link address...


# Download the source code

We will release Inching and its associated eigensolvers as open-source software once the publication is finalised. Meanwhile, reviewers can download the zipped source code with the password `AAAAA10115`; please do not distribute it before publication. Cite us if you find anything useful or inspirational!

In [None]:
files.download('/content/Inching-main.zip')#@markdown Click the play button to download. Remember the password `AAAAA10115`


## FAQ & Troubleshooting



*   How long will this take?
    *   Downloading the Inching source code can take up to a few minutes.
    *   Downloading and installing through Conda can take up to a few minutes.
    *   Calculation of modes can take minutes to hours, depending on the size of your structure and on which GPU type Colab has assigned you.
*   My Colab no longer seems to be doing anything, what should I do?
    *   Some steps may take minutes to hours to complete.
    *   If nothing happens or if you receive an error message, try restarting your Colab runtime via _Runtime_ > _Restart runtime_.
    *   If this doesn’t help, try resetting your Colab runtime via _Runtime_ > _Factory reset runtime_.
*   How does this compare to a desktop version of Inching?
    *   The eigenpairs should agree within the solver's error bound, provided Conda has installed the same package versions.
*   What is a Colab?
    *   See the [Colab FAQ](https://research.google.com/colaboratory/faq.html).
*   I received a warning “Notebook requires high RAM”, what do I do?
    *   The resources allocated to your Colab vary. See the [Colab FAQ](https://research.google.com/colaboratory/faq.html) for more details.
    *   You can execute the Colab nonetheless.
*   I received an error “Colab CPU runtime not supported” or “No GPU/TPU found”, what do I do?
    *   Colab CPU runtime is not supported. Try changing your runtime via _Runtime_ > _Change runtime type_ > _Hardware accelerator_ > _GPU_.
    *   The type of GPU allocated to your Colab varies. See the [Colab FAQ](https://research.google.com/colaboratory/faq.html) for more details.
    *   If you receive “Cannot connect to GPU backend”, you can try again later to see if Colab allocates you a GPU.
    *   [Colab Pro](https://colab.research.google.com/signup) offers priority access to GPUs.
*   I received an error “ModuleNotFoundError: No module named ...”, even though I ran the cell that imports it, what do I do?
    *   Colab notebooks on the free tier time out after a certain amount of time. See the [Colab FAQ](https://research.google.com/colaboratory/faq.html#idle-timeouts). Try rerunning the whole notebook from the beginning.
*   Does this tool install anything on my computer?
    *   No, everything happens in the cloud on Google Colab.
    *   At the end of the Colab execution a zip-archive with the obtained prediction will be automatically downloaded to your computer.
*   How should I share feedback and bug reports?
    *   Please share any feedback and bug reports as an [issue](https://github.com/jhmlam/Inching/issues) on GitHub.
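
For the “No GPU/TPU found” question above, the following is a minimal sketch for checking whether the current runtime sees a GPU before starting a long calculation. It is not part of Inching: the helper name `describe_gpu` is hypothetical, and the sketch assumes that an NVIDIA runtime exposes the `nvidia-smi` command on the `PATH`, as GPU runtimes on Colab normally do.

```python
import shutil
import subprocess


def describe_gpu():
    """Return the first GPU name reported by nvidia-smi, or None if no GPU is visible.

    Hypothetical helper for illustration only; it is not an Inching function.
    """
    # nvidia-smi is normally on PATH only when an NVIDIA driver is installed.
    if shutil.which("nvidia-smi") is None:
        return None
    try:
        result = subprocess.run(
            ["nvidia-smi", "--query-gpu=name", "--format=csv,noheader"],
            capture_output=True,
            text=True,
            timeout=30,
        )
    except (OSError, subprocess.TimeoutExpired):
        return None
    # A non-zero exit code usually means the driver cannot reach a device.
    if result.returncode != 0:
        return None
    names = [line.strip() for line in result.stdout.splitlines() if line.strip()]
    return names[0] if names else None


gpu_name = describe_gpu()
if gpu_name is None:
    print("No GPU visible. Try Runtime > Change runtime type > Hardware accelerator > GPU.")
else:
    print("GPU available:", gpu_name)
```

If the cell reports no GPU, change the runtime type as described above and rerun the notebook from the beginning.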

