## Missing loops

> This notebook is intended to be run in Colab: [![https img shields io badge colabs colab_notebooks 2Fcolab thread_by_AF2_cannibalism ipynb f9ab00 logo googlecolab](https://img.shields.io/badge/colabs-colab_notebooks%2Fcolab--thread_by_AF2_cannibalism.ipynb-f9ab00?logo=googlecolab)](https://colab.research.google.com/github/matteoferla/pyrosetta_help/blob/master/colab_notebooks/colab-thread_by_AF2_cannibalism.ipynb)

This notebook adds the missing parts of a given structure by cannibalising fragments of the AlphaFold2 model.

As it operates by threading, the UniProt entry to be threaded can be that of a close homologue,
which is how threading is actually intended to be used. Do note that the alignment is done with
Biopython's `pairwise2` module, which is not really suitable for distant homologues.

This is intended for aesthetic purposes only: adding missing loops is generally bad practice, and this approach is very quirky.
The workings are explained in [this blog post](https://blog.matteoferla.com/2021/10/filling-missing-loops-by-cannibalising.html).
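
To make the idea of "missing parts" concrete, here is a minimal sketch of how the gaps could be read off a gapped pairwise alignment between the full target sequence and the template sequence. It is an illustration only, not the code used by `pyrosetta_help`: the function name `missing_spans` and the toy sequences are hypothetical, and the alignment is assumed to be already computed, with `-` as the gap character.

```python
from typing import List, Tuple


def missing_spans(aligned_target: str, aligned_template: str) -> List[Tuple[int, int]]:
    """Return 1-based inclusive ranges of target residues that have no template residue."""
    if len(aligned_target) != len(aligned_template):
        raise ValueError('aligned sequences must have equal length')
    spans = []
    start = None
    position = 0  # residue index in the ungapped target sequence
    for target_char, template_char in zip(aligned_target, aligned_template):
        if target_char == '-':
            continue  # extra residue in the template: no target residue to number
        position += 1
        if template_char == '-':
            if start is None:
                start = position  # a missing stretch opens here
        elif start is not None:
            spans.append((start, position - 1))  # the stretch closed at the previous residue
            start = None
    if start is not None:
        spans.append((start, position))  # the stretch runs to the C-terminus
    return spans


# toy alignment: residues 4-6 and residue 10 of the target are absent from the template
print(missing_spans('MKTAYIAKQR', 'MKT---AKQ-'))
```

The returned ranges are the stretches that would have to be taken from the AlphaFold2 model; everything else is covered by the template.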

#### Resource location
This notebook is from the repository [pyrosetta help](https://github.com/matteoferla/pyrosetta_help)
<!--- silly badges: --->
[![https img shields io github forks matteoferla pyrosetta_help label Fork style social logo github](https://img.shields.io/github/forks/matteoferla/pyrosetta_help?label=Fork&style=social&logo=github)](https://github.com/matteoferla/pyrosetta_help)
[![https img shields io github stars matteoferla pyrosetta_help style social logo github](https://img.shields.io/github/stars/matteoferla/pyrosetta_help?style=social&logo=github)](https://github.com/matteoferla/pyrosetta_help)
[![https img shields io github watchers matteoferla pyrosetta_help label Watch style social logo github](https://img.shields.io/github/watchers/matteoferla/pyrosetta_help?label=Watch&style=social&logo=github)](https://github.com/matteoferla/pyrosetta_help)
[![https img shields io github last commit matteoferla pyrosetta_help logo github](https://img.shields.io/github/last-commit/matteoferla/pyrosetta_help?logo=github)](https://github.com/matteoferla/pyrosetta_help)
[![https img shields io github license matteoferla pyrosetta_help logo github](https://img.shields.io/github/license/matteoferla/pyrosetta_help?logo=github)](https://github.com/matteoferla/pyrosetta_help/raw/master/LICENCE)
[![https img shields io github release date matteoferla pyrosetta_help logo github](https://img.shields.io/github/release-date/matteoferla/pyrosetta_help?logo=github)](https://github.com/matteoferla/pyrosetta_help)
[![https img shields io github commit activity m matteoferla pyrosetta_help logo github](https://img.shields.io/github/commit-activity/m/matteoferla/pyrosetta_help?logo=github)](https://github.com/matteoferla/pyrosetta_help)
[![https img shields io github issues matteoferla pyrosetta_help logo github](https://img.shields.io/github/issues/matteoferla/pyrosetta_help?logo=github)](https://github.com/matteoferla/pyrosetta_help)
[![https img shields io github issues closed matteoferla pyrosetta_help logo github](https://img.shields.io/github/issues-closed/matteoferla/pyrosetta_help?logo=github)](https://github.com/matteoferla/pyrosetta_help)

#### Authors, funding and affiliations
_Matteo Ferla_: Taylor group / Oxford Genomic Medicine theme, Wellcome Centre for Human Genetics, University of Oxford
[WCHG](https://www.well.ox.ac.uk/people/matteo-ferla)
[![https img shields io badge orcid 0000 0002 5508 4673 a6ce39 logo orcid](https://img.shields.io/badge/orcid-0000--0002--5508--4673-a6ce39?logo=orcid)](https://orcid.org/0000--0002--5508--4673) [![https img shields io badge google scholar gF bp_cAAAAJ success logo googlescholar](https://img.shields.io/badge/google--scholar-gF--bp_cAAAAJ-success?logo=googlescholar)](https://scholar.google.com/citations?user=gF--bp_cAAAAJ&hl=en) [![https img shields io twitter follow matteoferla label Follow logo twitter](https://img.shields.io/twitter/follow/matteoferla?label=Follow&logo=twitter)](https://twitter.com/matteoferla) [![https img shields io stackexchange stackoverflow r 4625475 logo stackoverflow](https://img.shields.io/stackexchange/stackoverflow/r/4625475?logo=stackoverflow)](https://stackoverflow.com/users/4625475) [![https img shields io stackexchange bioinformatics r 6322 logo stackexchange](https://img.shields.io/stackexchange/bioinformatics/r/6322?logo=stackexchange)](https://bioinformatics.stackexchange.com/users/6322) [![https img shields io badge email gmail informational logo googlemail](https://img.shields.io/badge/email-gmail-informational&logo=googlemail)](https://mailhide.io/e/Ey3RNO2G) [![https img shields io badge email Oxford informational logo googlemail](https://img.shields.io/badge/email-Oxford-informational&logo=googlemail)](https://mailhide.io/e/Y1dbgyyE)
![Ox](https://upload.wikimedia.org/wikipedia/en/thumb/2/2f/University_of_Oxford.svg/132px-University_of_Oxford.svg.png)

In [None]:
#@title Installation
#@markdown Press the play button on the top right hand side of this cell
#@markdown once you have checked the settings.
#@markdown You will be notified that this notebook is not from Google, that is normal.
!pip install rdkit-pypi rdkit-to-params biopython pyrosetta-help
!pip install --upgrade plotly

#@markdown Send error messages to errors.matteoferla.com for logging?
#@markdown See [notebook-error-reporter repo for more](https://github.com/matteoferla/notebook-error-reporter)
report_errors = False #@param {type:"boolean"}
if report_errors:
    !pip install notebook-error-reporter
    from notebook_error_reporter import ErrorServer

    es = ErrorServer(url='https://errors.matteoferla.com', notebook='fragmenstein')
    es.enable()

import os
from importlib import reload
import pyrosetta_help as ph
from typing import *

# Muppet-proofing: are we in colab?
assert ph.get_shell_mode() == 'colab', 'This is a colab notebook, if running in Jupyter notebook, the installation is different'

# ============================================================================
#@markdown ### Use Google Drive
#@markdown Optionally store your results in your google drive.
#@markdown If `use_drive` is True, you will be prompted to give permission to use Google Drive
#@markdown (you may be prompted to follow a link and possibly authenticate and then copy a code into a box)
#@markdown —**_always_ remember to check strangers' code against data theft**:
#@markdown e.g. search and look for all instances of `http`, `requests` and `post` in the code, and
#@markdown make sure the creator is not typosquatting as someone else (e.g. username `Coogle`).
use_drive = True  #@param {type:"boolean"}
if use_drive:
    ph.mount_google_drive(use_drive)

# ============================================================================
#@markdown ### PyRosetta installation
#@markdown This will install PyRosetta in this Colab notebook (~2–15 minutes depending on time of day),
#@markdown but you will require a [PyRosetta licence](https://www.pyrosetta.org/home/licensing-pyrosetta)
#@markdown (free for academics).
#@markdown To speed things up _next_ time you can download a release into your Google Drive.
#@markdown Use Google Drive for PyRosetta:

#@markdown &#x1F44D; maybe faster next time

#@markdown &#x1F44E; occupies some 10 GB, so you'll need to be on the 100 GB plan of Google Drive (it's one pound a month).

download_to_drive = False #@param {type:"boolean"}
download_path = '.' #@param {type:"string"}
#@markdown If this is the next-time, `download_to_drive` and the credentials below will be ignored if
#@markdown there's a release in `download_path`.

#@markdown The following is not the real username and password. However, the format is similar.
username = 'boltzmann'  #@param {type:"string"}
username = username.strip().lower()  # str methods return a new string, so reassign
password = 'constant'  #@param {type:"string"}
#@markdown If yours are not the academic credentials
#@markdown disable this:
hash_comparison_required = True #@param {type:"boolean"}
#@markdown &#128544; THIS FLAG IS NOT PREVENTING YOU FROM USING PLAIN ROSETTA CREDENTIALS
#@markdown AS THE CUSTOM ERROR SAYS! **REGULAR ROSETTA CREDENTIALS DO NOT WORK FOR PYROSETTA.**

if download_to_drive and not use_drive:
    raise ValueError('You said False to `use_drive` and True to `download_to_drive`? Very funny.')
elif download_to_drive:
    ph.download_pyrosetta(username=username,
                          password=password,
                          path=download_path,
                          hash_comparison_required=hash_comparison_required)
else:
    pass

ph.install_pyrosetta(username=username,
                     password=password,
                     path=download_path,
                     hash_comparison_required=hash_comparison_required)
reload(ph)

# ??????? NOTE ??????????????????????????????????????????????????????????????????????
# ? Note to code spies
# ? this is a convoluted way to install pyrosetta via pyrosetta_help, due to options.
# ? the quicker way is:
# ?
# ? >>> pip install pyrosetta-help
# ? >>> install_pyrosetta -u xxx -p xxx
# ??????????????????????????????????????????????????????????????????????????????????

# disable as appropriate:
!pip install py3Dmol
# or:
!pip install nglview
from google.colab import output  # noqa (It's a colaboratory specific repo)
output.enable_custom_widget_manager()

# Colab still runs on 3.7
# hack to enable the backport:
import sys
if sys.version_info.major != 3 or sys.version_info.minor < 8:
    !pip install singledispatchmethod
    import functools
    from singledispatchmethod import singledispatchmethod  # noqa it's okay, PyCharm, I am not a technoluddite
    functools.singledispatchmethod = singledispatchmethod
    !pip install typing_extensions
    import typing_extensions
    import typing
    for key, fun in typing_extensions.__dict__.items():
      if key not in typing.__dict__:
        setattr(typing, key, fun)

# refresh imports
import site
site.main()

In [None]:
#@title Start PyRosetta
import pyrosetta
import pyrosetta_help as ph

#@markdown Do not optimise hydrogen on loading:
no_optH = False #@param {type:"boolean"}
#@markdown Ignore (True) or raise error (False) if novel residue (e.g. ligand) —  **don't tick this**.
ignore_unrecognized_res=False  #@param {type:"boolean"}
#@markdown Use autogenerated PDB component residues? These are often weird (bad geometry, wrong match, protonated etc.), so it is best to do it properly and parameterise them: **don't tick this**.
load_PDB_components=False  #@param {type:"boolean"}
#@markdown Ignore all waters:
ignore_waters=True  #@param {type:"boolean"}

extra_options= ph.make_option_string(no_optH=no_optH,
                                  ex1=None,
                                  ex2=None,
                                  mute='all',
                                  ignore_unrecognized_res=ignore_unrecognized_res,
                                  load_PDB_components=load_PDB_components,
                                  ignore_waters=ignore_waters)


# capture to log
logger = ph.configure_logger()
pyrosetta.init(extra_options=extra_options)
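
For readers unfamiliar with Rosetta flags, the sketch below shows the kind of string that ends up in `extra_options`. It is a hypothetical stand-in (`sketch_option_string`), not the source of `ph.make_option_string`, and it assumes the usual Rosetta command-line convention: a value of `None` gives a bare flag, booleans become `true`/`false`, and anything else is written after the flag name.

```python
def sketch_option_string(**options) -> str:
    """Join keyword arguments into a Rosetta-style command-line option string."""
    parts = []
    for name, value in options.items():
        if value is None:
            parts.append(f'-{name}')  # bare flag, e.g. -ex1
        elif isinstance(value, bool):
            parts.append(f'-{name} {str(value).lower()}')  # -ignore_waters true
        else:
            parts.append(f'-{name} {value}')  # -mute all
    return ' '.join(parts)


print(sketch_option_string(no_optH=False, ex1=None, ex2=None, mute='all', ignore_waters=True))
# prints: -no_optH false -ex1 -ex2 -mute all -ignore_waters true
```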

In [None]:
#@markdown ## Load template
pdb_code = '' #@param {type:"string"}

#@markdown Alternatively, leave blank to upload a PDB file of the template structure.
from google.colab import files

#@markdown If the template has novel ligands, they will be loaded too.
#@markdown But to do this a residue type  (=topology) needs to be made or loaded.
#@markdown These are saved as "params files".
#@markdown These following options control both the "acceptor" and "donor" poses (if uploaded).

#@markdown ### Params

#@markdown * Some compounds are parameterised in the database folder of rosetta,
#@markdown others in the PDB component database (if loaded).
#@markdown * Uses the params defined in the cell of the acceptor pose.
#@markdown * If there is no topology available one will be made.
#@markdown * If a params file is present in the working folder it will use it.
#@markdown * See below or visit https://params.mutanalyst.com/ to generate them (upload them with the folder icon on the left).

#@markdown This forces it (a bit silly):
force_parameterisation = False  #@param {type:"boolean"}
#@markdown If it needs to be parameterised make it protonated for pH 7?
neutralize_params=True #@param {type:"boolean"}
save_params=True #@param {type:"boolean"}


#@markdown If a params file is present in the working folder it will use it.
#@markdown Leave this blank... otherwise  (comma separated w/ no rando spaces):
extra_params_files_to_use = '' #@param {type:"string"}
extra_params = [f for f in extra_params_files_to_use.split(',') if f]
use_all_folder_params= True  #@param {type:"boolean"}
if use_all_folder_params:
  # os.path.splitext returns a (root, extension) tuple: compare the extension only
  present_params = [filename for filename in os.listdir() if os.path.splitext(filename)[1] == '.params']
else:
  present_params = []
print('loading pose...')
if pdb_code:
  pdb_filename = ph.download_pdb(pdb_code)
  with open(pdb_filename, 'r') as fh:
    pdbblock = fh.read()
else:
  uploaded = files.upload()
  assert len(uploaded) ==1, 'wrong number of files (only one plz)'
  pdbblock = list(uploaded.values())[0].decode()  # files.upload() returns the content as bytes
template_pose = ph.parameterized_pose_from_pdbblock(pdbblock,
                                                                  wanted_ligands = [],
                                                                  force_parameterisation=force_parameterisation,
                                                                  neutralize_params=neutralize_params,
                                                                  save_params=save_params,
                                                                  overriding_params=extra_params+present_params)
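
The rule "if a params file is present in the working folder it will use it" comes down to filtering a directory listing by file extension. A minimal sketch follows; the helper name `find_params_files` and the file names are hypothetical. Note that `os.path.splitext` returns a `(root, extension)` tuple, so it is the second element that gets compared.

```python
import os
import tempfile
from typing import List


def find_params_files(folder: str = '.') -> List[str]:
    """List the file names in `folder` that end in .params, sorted alphabetically."""
    return sorted(filename for filename in os.listdir(folder)
                  if os.path.splitext(filename)[1] == '.params')


# demonstration in a throwaway folder with one params file among other files
with tempfile.TemporaryDirectory() as folder:
    for name in ('LIG.params', 'template.pdb', 'notes.txt'):
        open(os.path.join(folder, name), 'w').close()
    print(find_params_files(folder))
```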

In [None]:
#@markdown ## Case A: Steal from AlphaFold2 (monomer)
#@markdown This section assumes the easy case of the presence of a single peptide,
#@markdown for the more complex case, skip and see the next cell.
#@markdown A uniprot accession is something like P08684.
#@markdown I am aware that the RCSB PDB API does give back UniProt ids,
#@markdown but it is too much faff and it's simpler to ask the user!

uniprot = '' #@param {type:"string"}
af_pose = ph.pose_from_alphafold2(uniprot)
fragsets = ph.make_fragment_sets(af_pose)
threaded, threader, threadites = ph.thread(target_sequence=af_pose.sequence(),
                                  template_pose=template_pose,
                                  fragment_sets=fragsets)
ph.steal_ligands(template_pose, threaded)
threaded.dump_pdb('threaded.pdb')
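
Since the accession is typed in by hand, it can be worth checking its shape before requesting a model. The sketch below is an optional extra, not part of `pyrosetta_help`: the helper `is_uniprot_accession` is hypothetical, and the pattern is the accession format published in the UniProt help pages, to the best of my knowledge. It only checks the format, not whether the entry exists or has an AlphaFold2 model.

```python
import re

# 6-character (e.g. P08684) or 10-character (e.g. A0A024R161) UniProt accessions
UNIPROT_ACCESSION = re.compile(
    r'[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9]([A-Z][A-Z0-9]{2}[0-9]){1,2}'
)


def is_uniprot_accession(text: str) -> bool:
    """True if `text` looks like a UniProt accession (spaces and case are ignored)."""
    return UNIPROT_ACCESSION.fullmatch(text.strip().upper()) is not None


print(is_uniprot_accession('P08684'))  # an accession
print(is_uniprot_accession('CYP3A4'))  # a gene name, not an accession
```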

In [None]:
#@markdown ## Case B: Steal from AlphaFold2 (complex)
#@markdown This section assumes the complex case of multiple peptides (one per chain).
#@markdown Otherwise see the previous cell.

#@markdown Write chain correspondence as a comma or semicolon separated series of chain colon uniprot, 
#@markdown e.g. `A:P63092,B:P62873,G:P59768`. Due to ambiguity spaces are ignored.
uniprots = '' #@param {type:"string"}
keep_unlisted = False #@param {type:"boolean"}
# semicolons are treated as commas, so either separator works; empty entries are skipped
chain2uniprot = dict([p.split(':') for p in uniprots.replace(' ', '').replace(';', ',').split(',') if p])
combo_threaded = pyrosetta.Pose()
clean_template_pose = template_pose.clone()
pyrosetta.rosetta.core.pose.remove_nonprotein_residues(clean_template_pose)
for chain in clean_template_pose.split_by_chain():
    pdb_info = chain.pdb_info()
    letter = pdb_info.chain(1)
    if letter not in chain2uniprot:
        if keep_unlisted:
            # keep the unlisted chain as is, without threading
            pyrosetta.rosetta.core.pose.append_pose_to_pose(combo_threaded, chain, True)
        continue  # unlisted chains have no UniProt entry to thread
    uniprot = chain2uniprot[letter]
    af_pose = ph.pose_from_alphafold2(uniprot)
    fragsets = ph.make_fragment_sets(af_pose)
    threaded, threader, threadites = ph.thread(target_sequence=af_pose.sequence(),
                                      template_pose=chain,
                                      fragment_sets=fragsets)
    pyrosetta.rosetta.core.pose.append_pose_to_pose(combo_threaded, threaded, True)
# work on the combined pose, not on the last threaded chain
ph.steal_ligands(template_pose, combo_threaded)
combo_threaded.dump_pdb('threaded.pdb')
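
The chain correspondence format described in this cell (`chain:uniprot` pairs separated by commas or semicolons, spaces ignored) can be parsed with a few lines of standard-library Python. This is a sketch under those stated assumptions; the function name `parse_chain_map` is hypothetical and it is stricter than a bare `dict(...)` call in that malformed pairs raise a readable error.

```python
import re
from typing import Dict


def parse_chain_map(text: str) -> Dict[str, str]:
    """Parse 'A:P63092,B:P62873;G:P59768' into {'A': 'P63092', 'B': 'P62873', 'G': 'P59768'}."""
    mapping = {}
    for pair in re.split(r'[,;]', text.replace(' ', '')):
        if not pair:
            continue  # tolerate empty input and trailing separators
        chain, separator, accession = pair.partition(':')
        if not separator or not chain or not accession:
            raise ValueError(f'Expected chain:uniprot, got {pair!r}')
        mapping[chain] = accession
    return mapping


print(parse_chain_map('A:P63092; B:P62873,G:P59768'))
```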

In [None]:
#@title Download PDB
files.download('threaded.pdb')

In [None]:
#@title Upload to Michelanglo (optional)
#@markdown [Michelanglo](https://michelanglo.sgc.ox.ac.uk/) is a website that
#@markdown allows the creation, annotation and sharing of a webpage with an interactive protein viewport.
#@markdown ([examples](https://michelanglo.sgc.ox.ac.uk/gallery)).
#@markdown The created pages are private: they have a 1 in a quintillion chance of being guessed within 5 tries.

#@markdown Registered users (optional) can add interactive annotations to pages.
#@markdown A page created by a guest is editable by registered users with the URL to it
#@markdown (this can be altered in the page settings).
#@markdown Leave blank for guest (it will not add an interactive description):

username = ''  #@param {type:"string"}
password = ''  #@param {type:"string"}

import os
assert not os.system(f'pip3 install michelanglo-api')
import site
site.main()
from michelanglo_api import MikeAPI, Prolink
if not username:
  mike = MikeAPI.guest_login()
else:
  mike = MikeAPI(username, password)
    
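# read back the structure saved by the threading cell (Case A or Case B)
with open('threaded.pdb') as fh:
    threaded_pdbblock = fh.read()
page = mike.convert_pdb(pdbblock=threaded_pdbblock,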
                        data_selection='ligand',
                        data_focus='residue',
                        )
if username:
    page.retrieve()
    # the previous description relied on `wanted_ligands`, `code` and `donor_chain`,
    # none of which are defined in this notebook, so a plain description is used
    page.description = '## Description\n\n'
    page.description += 'Template structure with the missing loops filled in by threading '
    page.description += 'with fragments of the AlphaFold2 model.\n'
    page.commit()
page.show_link()