# Protein Structure Prediction Practical

I chose the Target 1 and Target 4 proteins for structure prediction:

## Target 1

The target protein sequence:

```text
> Target 1, 202 residues
QNYVFTSGNGIGGNGFTARVMAKVGTSYITFPCAYDAKLISFNGTVQGNITIKVRSRDIS
GIPLDQQTTNPTFTFWLCRDTTNSSLKNFSNCSNNQFTPNKTTVISKTNPNGLNAWVTST
DQNWYLGNTSGDTIYGVPNGDGKLSGTYLYSVVFSVVQFYGTSMPRLAFTVNLTPTGANS
NHLYVPDNNAAVTIGPVKFWY
```

I carried out the following steps:

1. **Identify structural templates**. I search **HHPRED** with this target. The best alignments are shown in the top two lines of figure \ref{fig:fig1}. These alignments show low sequence similarity, within the twilight zone.

 The templates proposed by HHPRED are therefore the proteins "2VTW" and "2IUM", both with a very good E-value and probability but low sequence identity.
The respective alignments are shown in figures \ref{fig:fig3} and \ref{fig:fig2}.

 A more laborious search with BLAST (PSI-BLAST and so on) gives the results in figure \ref{fig:fig11}, where the 2VTW protein is also found, in agreement with the HHPRED result. Again the identity is low: 20% for BLAST and 21% for HHPRED.
 
 Can we really rely on these proteins as structural templates? Note that the coverage is low: 39% in the case of 2VTW. This may seem a disadvantage, but the 3D structure of this protein consists of six chains, and our alignment involves only one of them. That is, only one sixth of the template's total sequence takes part in the alignment.
 
 According to the HHPRED documentation, the estimated probability of the template is the most relevant criterion for deciding whether a template is actually homologous or just a high-scoring chance hit. When it is larger than about 95%, homology is nearly certain. That is the case here.
 
 In addition, both candidate templates are very close in their secondary structure, leaving aside that 2VTW consists of 6 chains and 2IUM of 3, so we can rule out that the alignments are due to chance.
 

2. Can we model the target protein completely, or are there regions that should be modeled independently or not modeled at all? The search for signal peptides and transmembrane regions returns no hits at the **SignalP** (figure \ref{fig:fig5}) and **TMHMM** (figure \ref{fig:fig4}) websites. The **Phobius** website (figure \ref{fig:fig6}) agrees with these predictions. This supports using the two templates above for the whole sequence.


3. As the template proteins seem adequate to produce a model, I use **HHPRED** to build two: one model based on the 2VTW protein and another combining the 2VTW and 2IUM proteins.


4. I verify the alignments in **PyMOL** and they seem accurate. Figure \ref{fig:fig7} shows how similar the two models are. Figures \ref{fig:fig8} and \ref{fig:fig9} show the extremely close alignment between the models and their respective templates.


5. As a final check, I run the prediction on the general-purpose server **Robetta** (figure \ref{fig:fig12}), which confirms the choice of templates, the absence of transmembrane regions and signal peptides, and low disorder.


6. **Quality of the model** (2VTW). A query to **ProQ** (figure \ref{fig:fig13}) indicates good model quality.
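
The template-selection reasoning in step 1 can be sketched in Python. This is a minimal illustration, not part of HHPRED or BLAST: the `TemplateHit` class, the `assess` function, and the 35% twilight-zone bound are assumptions introduced here. Only the 95% probability rule of thumb and the 21% identity for 2VTW come from the text above.

```python
from dataclasses import dataclass

# Assumed thresholds for this sketch:
# - HOMOLOGY_PROBABILITY follows the "larger than 95%" rule of thumb quoted above.
# - TWILIGHT_IDENTITY (35%) is a commonly used upper bound for the twilight zone;
#   it is an assumption, not a value reported by any of the servers.
HOMOLOGY_PROBABILITY = 95.0
TWILIGHT_IDENTITY = 35.0


@dataclass
class TemplateHit:
    """One candidate template (hypothetical container for this sketch)."""
    pdb_id: str
    probability: float  # HHPRED probability, in percent
    identity: float     # sequence identity, in percent


def assess(hit: TemplateHit) -> str:
    """Return a short verdict that puts probability before identity."""
    if hit.probability <= HOMOLOGY_PROBABILITY:
        return "uncertain homology"
    if hit.identity < TWILIGHT_IDENTITY:
        return "likely homolog, twilight-zone identity"
    return "likely homolog, clear identity"


# The identity is the 21% reported for 2VTW. The probability is a placeholder,
# because the text only states that it exceeds 95%.
hit = TemplateHit("2VTW", probability=99.0, identity=21.0)
print(hit.pdb_id, "->", assess(hit))
```

Checking probability first mirrors the argument above: a low identity alone does not disqualify a template when the probability of homology is high.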

## Target 4

The target protein sequence:

```text
>Target 4, 154 residues
MKKVLLLVFLSLTWLSASAQALVPITWTAYGLTFEAPKGILVEEDTEETFLLNNSRFYIT
IQSLDSDGMTKSDLKSVLKDYANDDGVKDQSAVQEFELPQFFGTYLKGSCETDHCLYACL
MTKAAGSGFYISIIYSKENENIAEKILKSFTMEE
```

I carried out the following steps:

1. First I search **HHPRED** with this target. The best alignments are shown in the top two lines of figure \ref{fig:fig20}. These alignments are on the border of the twilight zone, but the HHPRED probability is 100%, so, as noted for Target 1, we can have confidence in the results.

 I therefore choose the protein "3HLZ" as template; it has a very good E-value and probability.
The alignment is shown in figure \ref{fig:fig21}.


2. The search for signal peptides and transmembrane regions detects a signal peptide at both the **SignalP** website (figure \ref{fig:fig23}) and the **Phobius** website (figure \ref{fig:fig22}). Both agree that the protein probably has a signal peptide. To obtain a good model we need to trim the protein at the predicted cleavage site at amino acid 22. No transmembrane regions are predicted, and the protein is predicted to be non-cytoplasmic.


3. As an additional check, we obtain the secondary structure prediction from the **Robetta** website (figure \ref{fig:fig24}).


4. Another check is to take into account the expected function of the target protein and the known functions, if any, of the templates. In this case I do not know the expected function of our protein.


5. I verify the alignments in **PyMOL** and they seem accurate (figure \ref{fig:fig25}). There are some regions that could be refined by hand.


6. Finally, I check the model quality with **ProQ** (figure \ref{fig:fig26}).
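
The signal peptide trimming from step 2 can be sketched in Python. This is an illustration under one stated assumption: "cleavage site at amino acid 22" is read as residue 22 being the first residue of the mature chain (cleavage between residues 21 and 22). If the server output instead means cleavage after residue 22, pass 23. The function name `trim_signal_peptide` and its parameter are hypothetical helpers introduced here.

```python
from typing import Tuple

# Target 4 sequence, copied from the listing above.
TARGET4 = (
    "MKKVLLLVFLSLTWLSASAQALVPITWTAYGLTFEAPKGILVEEDTEETFLLNNSRFYIT"
    "IQSLDSDGMTKSDLKSVLKDYANDDGVKDQSAVQEFELPQFFGTYLKGSCETDHCLYACL"
    "MTKAAGSGFYISIIYSKENENIAEKILKSFTMEE"
)


def trim_signal_peptide(sequence: str, first_mature_residue: int) -> Tuple[str, str]:
    """Split a sequence into (signal peptide, mature chain).

    first_mature_residue is 1-based, as in the server output.
    """
    if not 1 <= first_mature_residue <= len(sequence):
        raise ValueError("first_mature_residue is outside the sequence")
    cut = first_mature_residue - 1  # convert to a 0-based slice index
    return sequence[:cut], sequence[cut:]


# Assumption: residue 22 is the first residue kept in the model.
signal, mature = trim_signal_peptide(TARGET4, first_mature_residue=22)
print(f"signal peptide ({len(signal)} aa): {signal}")
print(f"mature chain ({len(mature)} aa) starts with: {mature[:10]}")
```

The mature chain, rather than the full sequence, is what would then be submitted for modeling.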



## Figures

\begin{figure}
  \centerline{\includegraphics{prot1/1_hhpred_prot1.png}}
  \caption{\label{fig:fig1} HHPRED output}
\end{figure}

\begin{figure}
  \centerline{\includegraphics{prot1/2_hhpred_ium_alignment.jpg}}
  \caption{\label{fig:fig2} HHPRED alignment with 2IUM protein}
\end{figure}

\begin{figure}
  \centerline{\includegraphics{prot1/2_hhpred_vtw_alignment.png}}
  \caption{\label{fig:fig3} HHPRED alignment with 2VTW protein}
\end{figure}

\begin{figure}
  \centerline{\includegraphics{prot1/3_tmhmm_transmembrane_prot1.png}}
  \caption{\label{fig:fig4} TMHMM Transmembrane prediction}
\end{figure}

\begin{figure}
  \centerline{\includegraphics{prot1/3_signalip_signal_peptide_prot1.png}}
  \caption{\label{fig:fig5} SignalP. Signal Peptide prediction}
\end{figure}

\begin{figure}
  \centerline{\includegraphics{prot1/3_phobius_transmembrane_prot1.png}}
  \caption{\label{fig:fig6} Phobius transmembrane and signal peptide prediction}
\end{figure}

\begin{figure}
  \centerline{\includegraphics{prot1/5_pymol_align_models_ium_green_2_vtw_prot1.png}}
  \caption{\label{fig:fig7} PyMOL. Both models aligned}
\end{figure}

\begin{figure}
  \centerline{\includegraphics{prot1/5_pymol_align_template_model_ium_prot1.png}}
  \caption{\label{fig:fig8} PyMOL. Model (2VTW based) aligned with template}
\end{figure}

\begin{figure}
  \centerline{\includegraphics{prot1/5_pymol_align_template_model_vtw_prot1.png}}
  \caption{\label{fig:fig9} PyMOL. Combined model (2VTW and 2IUM) aligned with template 2IUM}
\end{figure}

\begin{figure}
  \centerline{\includegraphics{prot1/3_psipred_secondary_prediction_prot1.pdf}}
  \caption{\label{fig:fig10} PSIPRED. Secondary prediction}
\end{figure}

\begin{figure}
  \centerline{\includegraphics{prot1/2_blast_prot1.png}}
  \caption{\label{fig:fig11} BLAST computed similar sequences.}
\end{figure}

\begin{figure}
  \centerline{\includegraphics{prot1/2_robetta_prot1.png}}
  \caption{\label{fig:fig12} Robetta global prediction.}
\end{figure}

\begin{figure}
  \centerline{\includegraphics{prot1/ProQ_vtw_model_prot1.png}}
  \caption{\label{fig:fig13} ProQ quality estimation.}
\end{figure}


\begin{figure}
  \centerline{\includegraphics{prot4/HHPRED_global.png}}
  \caption{\label{fig:fig20} HHPRED output}
\end{figure}

\begin{figure}
  \centerline{\includegraphics{prot4/HHPRED_3HLZ.png}}
  \caption{\label{fig:fig21} HHPRED alignment with 3HLZ protein}
\end{figure}

\begin{figure}
  \centerline{\includegraphics{prot4/Phobius_prot4.png}}
  \caption{\label{fig:fig22} Phobius prediction}
\end{figure}

\begin{figure}
  \centerline{\includegraphics{prot4/SignalP_prot4.png}}
  \caption{\label{fig:fig23} SignalP output}
\end{figure}

\begin{figure}
  \centerline{\includegraphics{prot4/02_robetta_prot4.png}}
  \caption{\label{fig:fig24} Robetta output}
\end{figure}

\begin{figure}
  \centerline{\includegraphics{prot4/PYMOL_model2_3hlz.png}}
  \caption{\label{fig:fig25} PyMOL 3D alignment with 3HLZ protein}
\end{figure}

\begin{figure}
  \centerline{\includegraphics{prot4/ProQ_model_prot4.png}}
  \caption{\label{fig:fig26} ProQ quality estimation for the 3HLZ template based model}
\end{figure}

\begin{figure}
  \centerline{\includegraphics{prot4/I-TASSER results.pdf}}
  \caption{\label{fig:fig27} I-TASSER predictions}
\end{figure}


