<a href="https://colab.research.google.com/github/keviniu/agent-shutton/blob/main/consurf_copy_kevin.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Welcome to ConSurf Colab.

To run ConSurf, press the play button on the left side of the screen.

The first run may take a while to complete because the dependencies have to be installed.


A temporary output directory will be created. Its name is the job_name you provide; if a directory with that name already exists, a numeric suffix (for example _1) is appended.

The results directory will contain a zip file with the results. Download and unzip it.
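
The archive can also be unpacked from a notebook cell instead of by hand. The following is a minimal sketch using only the Python standard library. The names `test/results.zip` and `test_results` are placeholders chosen for illustration, not names produced by ConSurf, so substitute the zip file that actually appears in your results directory.

```python
from pathlib import Path
from zipfile import ZipFile


def extract_results(zip_path, dest_dir):
    """Extract a results archive and return the sorted names of its members."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with ZipFile(zip_path) as archive:
        archive.extractall(dest)
        return sorted(archive.namelist())


if __name__ == "__main__":
    # Hypothetical path: replace with the zip file in your results directory.
    zip_file = Path("test") / "results.zip"
    if zip_file.is_file():
        for name in extract_results(zip_file, "test_results"):
            print(name)
    else:
        print(f"{zip_file} not found; adjust the path to your job directory")
```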

If there is a 3D structure (or model structure), you can use PyMOL to create an image with the conservation colors projected onto it. There are two structure files with the conservation rates stored in the B-factor column. In the file ending with _ATOMS_section_With_ConSurf_isd.pdb (or _ATOMS_section_With_ConSurf_isd.cif), areas with insufficient data are colored yellow; in the file ending with _ATOMS_section_With_ConSurf.pdb (or _ATOMS_section_With_ConSurf.cif), these areas keep their regular color. To view the image, open PyMOL and drag one of these files into the PyMOL window, then drag in either pymol.py or pymol_CBS.py (for color-blind-friendly results).
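
The values in the B-factor column can also be inspected without PyMOL. The sketch below assumes the .pdb variant of the file (not .cif) and the standard fixed-column PDB layout, and it reads one value per residue from the CA atom. The function name `read_bfactors` and the file name in the usage part are hypothetical; how the values should be interpreted is not covered here.

```python
from pathlib import Path


def read_bfactors(pdb_lines):
    """Map (chain, residue number) to the B-factor value of that residue's CA atom."""
    values = {}
    for line in pdb_lines:
        # Fixed-column PDB layout (1-based): atom name in columns 13-16,
        # chain in 22, residue number in 23-26, B-factor in 61-66.
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            values[(line[21], int(line[22:26]))] = float(line[60:66])
    return values


if __name__ == "__main__":
    # Hypothetical file name: use the *_ATOMS_section_With_ConSurf.pdb file from your results.
    pdb_file = Path("example_ATOMS_section_With_ConSurf.pdb")
    if pdb_file.is_file():
        per_residue = read_bfactors(pdb_file.read_text().splitlines())
        for (chain, number), value in sorted(per_residue.items()):
            print(chain, number, value)
    else:
        print(f"{pdb_file} not found; point this at a file from your results zip")
```

Only `ATOM` records are read, so heteroatoms such as calcium ions (which also carry the name CA) are skipped.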

Contacts: barakyariv@gmail.com

In [8]:
#@title Run ConSurf


import sys
import traceback
import socket
import getpass
import re
import os
import json
import shutil
import gzip
import requests
import subprocess
import time
import tempfile
import urllib
import tarfile


os.environ["TRANSFORMERS_VERBOSITY"] = "error"
os.environ["HF_HUB_DISABLE_TELEMETRY"] = "1"

from zipfile import ZipFile
from pathlib import Path
from IPython.display import HTML, display, IFrame

# Use HTML and CSS to style the input box
display(HTML("""
<style>
    .container { width: 100% !important; }
    .CodeMirror, .output_subarea, .output_area { max-width: 100% !important; }
    .input { width: 100% !important; }
</style>
"""))


bayesInterval = 3
ColorScale = {0 : 9, 1 : 8, 2 : 7, 3 : 6, 4 : 5, 5 : 4, 6 : 3, 7 : 2, 8 : 1}

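try:

    # not first run: `vars` is the dict created by a previous run of this cell
    root_dir = vars['root_dir']

except (TypeError, KeyError):

    # first run: `vars` is still the builtin function, so subscripting it raises TypeError
    root_dir = os.getcwd()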

vars = {}
form = {}

job_name = 'test' #@param {type:"string"}

vars['root_dir'] = root_dir
os.chdir(vars['root_dir'])

# check if directory already exists
i = 1
temp_job_name = job_name
while os.path.isdir(temp_job_name):

    temp_job_name = job_name + "_" + str(i)
    i += 1

vars['job_name'] = temp_job_name
vars['working_dir'] = vars['root_dir'] + "/" + vars['job_name'] + "/"

# create job directory
os.mkdir(vars['job_name'])

# directories for programs installed
vars['prottest'] = vars['root_dir'] + "/prottest-3.4.2/prottest-3.4.2.jar"
vars['cd_hit_dir'] = vars['root_dir'] + "/cdhit/"
vars['rate4site_dir'] = vars['root_dir'] + "/r4s_for_collab/"
vars['rate4site_slow_dir'] = vars['root_dir'] + "/r4s_for_collab_slow/"

# weights
if not os.path.isfile("WEIGHTS_READY"):

    WEIGHTS = open("WEIGHTS.BIN", 'w')
    WEIGHTS.write("""0.079961
-0.042471 0.284039 -0.115342 -0.059031 -0.057580 -0.221184 -0.622583 -0.632449 0.119240 -0.367390 -0.345141 -0.040769 -0.036835 -0.050943 -0.263045 -0.440234 0.092578 0.067303 -0.382988 -0.104377 0.076840 0.721150 -0.478015 0.034385 -0.033848 -0.226497 -0.232466 -0.785597 0.178759 -0.487048 -0.213410 -0.523454 -0.127242 -0.137284 0.171664 -0.079844 -0.104554 0.033901 -0.293857 -0.404099 -0.416435 0.157385 -0.414229 -0.296172 -0.471579 -0.306119 -0.431075 -0.313670 0.055607 0.035966 0.052927 -0.472520 -0.378305 -0.316330 0.063885 0.354014 -0.078238 0.464703 -0.377103 0.000405 -0.110874 -0.331173 -0.249899 -0.573167 -0.369664 -0.453246 -0.108283 -0.808977 0.148720 -0.042741 -0.036522 -0.367998 -0.844013 -0.248575 0.051849 0.234780 0.165519 0.469451 -0.324361 0.131725 0.080134 0.612314 0.227453 -0.125036 -0.005849 -0.156387 -0.428727 -0.896221 -0.095704 -0.251770 -0.248376 -0.462221 -0.299122 -0.144201 -0.555950 -0.263420 -0.268737 -0.528499 -0.255063 -0.139020 -0.611295 0.444935 -0.231921 -0.223407 -0.116494 -0.461209 -0.569035 -0.954768 0.150339 -0.036701 -0.129399 -0.293487 -0.253250 -0.325909 0.104611 0.242358 -0.045911 0.136231 -0.352202 -0.143680 -0.655280 -0.185911 -0.354350 -0.345799 -0.599220 -0.627056 -0.558502 1.093712 0.625240 0.160620 0.711126 0.442624 -1.440944 0.024459 -0.298804 1.393993 0.355034 0.151769 0.242346 0.240612 0.040324 0.103134 -0.648484 -0.387724 -0.268557 -0.333484 -0.346043 0.052173 0.104572 -1.094794 -0.322809 -0.557991 -0.511697 0.038476 0.007950 0.409273 0.243390 0.131082 0.008020 0.058760 0.003713 0.442034 -0.018652 -0.238415 -0.436684 -0.058566 -0.485227 -0.214224 -0.401662 0.004286 -0.148663 -0.161137 -0.139408 -0.223052 -0.306196 -0.093124 -0.418224 -0.341062 -0.532583 -0.232474 0.040335 -0.233600 -0.630569 -0.420837 -0.149225 -0.531155 -0.133665 -0.410447 -0.292123 0.145357 -0.476807 -0.399644 -0.283946 -0.224842 -0.174200 0.133730 0.139391 0.153037 -0.469379 -0.001143 -0.122711 -0.599835 -0.059480 -0.485474 -0.454370 -0.370191 
-0.427881 -0.763049 -0.107038 0.125714 0.329847 -0.015233 -0.325263 0.003464 -0.073701 0.178642 -0.052665 0.244747 -0.477071 0.051825 0.461928 0.173730 0.134013 -0.363544 -0.036649 -0.180880 -0.213129 -0.774550 -0.142420 -0.313132 -0.439931 -0.169179 -0.018252 0.036593 -0.299105 0.000470 -0.200218 -0.123280 -0.425699 -0.251190 0.077114 -0.161261 -0.000334 -0.194496 -0.126588 -0.254180 -0.265544 -0.257622 0.072041 0.012121 -0.353757 -0.269024 -0.305191 -0.194474 -0.036296 -0.301452 -0.083759 -0.390128 -0.208091 -0.614116
-0.037835
-0.135652 -0.097861 -0.124032 -0.082832 -0.079981 -0.080759 -0.080005 -0.112538 -0.108645 -0.153970 -0.090287 -0.127751 -0.022577 -0.054848 -0.075064 -0.061371 -0.065418 -0.162476 -0.188244 -0.219030 -0.155061 -0.223714 -0.122331 -0.133745 -0.064488 -0.145680 -0.072198 -0.051936 -0.168912 -0.109149 -0.005762 -0.122343 0.009825 -0.030152 -0.041595 -0.035915 -0.116701 -0.141694 -0.121796 -0.110433 -0.099315 -0.301401 -0.101548 -0.148282 -0.130996 -0.176345 -0.193819 -0.148637 -0.113018 -0.033919 -0.106760 -0.110076 -0.002691 -0.183740 -0.105777 0.052121 0.015194 0.056617 -0.096041 -0.103306 -0.163592 -0.107840 -0.186439 -0.172847 -0.120203 -0.124392 -0.098864 -0.173437 -0.171424 -0.116866 -0.087735 -0.174578 -0.009553 -0.179103 -0.122766 0.140827 -0.090366 0.012798 -0.161375 -0.099884 -0.232814 -0.222946 -0.134413 -0.200724 -0.201677 -0.152266 -0.128065 -0.079498 -0.192186 -0.043459 -0.015164 -0.007513 -0.062827 -0.185702 -0.154283 -0.082762 -0.118400 -0.259073 0.041187 -0.003137 -0.178665 -0.287254 -0.193866 -0.255345 -0.185832 -0.194552 -0.085949 0.062150 -0.225319 0.075137 -0.002692 -0.054584 -0.052849 -0.220834 -0.100336 0.068750 -0.066442 -0.010368 -0.163883 -0.056490 -0.123533 -0.211744 -0.267799 -0.188097 -0.315561 -0.265491 -0.269421 0.140139 0.330403 0.097782 0.300753 0.379973 -0.374714 -0.150125 -0.128261 0.891383 0.040080 0.036730 0.082033 0.117802 -0.229061 -0.221868 -0.286277 -0.274466 -0.157755 -0.044367 -0.121318 0.132026 -0.180808 -0.205960 -0.037371 -0.210609 0.030780 -0.063130 -0.110591 -0.070211 0.036235 0.101944 0.018493 -0.016074 -0.146690 -0.240073 -0.220570 -0.141108 -0.141487 -0.243177 -0.190067 -0.044916 -0.178520 0.033824 -0.084478 -0.034626 -0.018853 -0.158008 -0.192055 0.039844 -0.106545 -0.020606 -0.007895 -0.255355 -0.207277 -0.350054 -0.202661 -0.213441 -0.223656 -0.177883 -0.191003 -0.089601 -0.213466 -0.018869 -0.042249 -0.096639 -0.022556 -0.214572 -0.145668 0.166606 -0.027856 0.005464 -0.074335 -0.091599 -0.138469 -0.187154 
-0.212407 -0.275710 -0.092425 -0.171058 -0.103894 -0.093088 -0.200088 -0.083920 -0.121132 -0.114293 0.058626 -0.038506 -0.121911 0.116106 -0.020284 -0.024348 -0.136227 -0.008320 -0.058348 -0.102498 -0.092001 -0.116864 -0.054950 -0.081698 -0.053132 -0.136699 -0.070402 -0.129130 -0.070749 -0.144664 0.067491 -0.091968 -0.088276 -0.006360 -0.112374 -0.186199 -0.139677 -0.134099 -0.073819 -0.060381 -0.064273 -0.132142 -0.136076 -0.132970 -0.054528 -0.212359 -0.201456 -0.090024 -0.158139 -0.159991 -0.019921 -0.145357 -0.150928 -0.129886 -0.035867 -0.060223 -0.238624 -0.058521
-0.078758
-0.151740 -0.026767 -0.106500 -0.092961 -0.117010 -0.112364 -0.192851 -0.272185 -0.065880 -0.177882 -0.134294 -0.107798 -0.028889 -0.068127 -0.117018 -0.156986 -0.057003 -0.117327 -0.273595 -0.272451 -0.126404 -0.079874 -0.185071 -0.114151 -0.047380 -0.189966 -0.100589 -0.270994 -0.127206 -0.213702 -0.002939 -0.252945 0.014415 -0.038136 0.031142 -0.065138 -0.117974 -0.132185 -0.179534 -0.186496 -0.220966 -0.316856 -0.232701 -0.196354 -0.220504 -0.232680 -0.286300 -0.279093 -0.045867 0.022409 -0.056297 -0.185112 -0.082968 -0.233401 -0.074099 0.141940 0.029905 0.169479 -0.150252 -0.068862 -0.225708 -0.170601 -0.247159 -0.259554 -0.180183 -0.204830 -0.102752 -0.367822 -0.052640 -0.132881 -0.112569 -0.239671 -0.193336 -0.216353 -0.105227 0.182112 -0.041267 0.150941 -0.216789 -0.063747 -0.219734 -0.010456 -0.032662 -0.200514 -0.192833 -0.179559 -0.216095 -0.327812 -0.235238 -0.115399 -0.029876 -0.111954 -0.099778 -0.191547 -0.234776 -0.122411 -0.161412 -0.351707 -0.006454 -0.003536 -0.228784 -0.156429 -0.185776 -0.260639 -0.168103 -0.249610 -0.187173 -0.206471 -0.150917 0.032764 -0.067341 -0.102790 -0.062856 -0.257987 -0.088676 0.088700 -0.072901 0.012187 -0.224291 -0.133150 -0.274977 -0.237442 -0.321321 -0.222992 -0.419615 -0.387955 -0.338747 0.176174 0.563658 0.119931 0.396977 0.419047 -0.651059 -0.131797 -0.166447 1.066683 0.127136 0.110843 0.116229 0.164438 -0.243415 -0.111208 -0.390476 -0.321042 -0.219125 -0.085586 -0.186006 0.114595 -0.080241 -0.397100 -0.096225 -0.391717 -0.085629 -0.038488 -0.093712 -0.014351 0.112260 0.132180 0.005796 0.024345 -0.090793 -0.058719 -0.169727 -0.152258 -0.208756 -0.213912 -0.276227 -0.060369 -0.223690 -0.003176 -0.107536 -0.104605 -0.017045 -0.206944 -0.286270 -0.015647 -0.199786 -0.111666 -0.115525 -0.316241 -0.217539 -0.446536 -0.345833 -0.282246 -0.264920 -0.271352 -0.213021 -0.185402 -0.211945 0.019463 -0.088338 -0.161891 -0.072983 -0.227131 -0.175166 0.213077 0.035432 0.082961 -0.148847 -0.073226 -0.170132 -0.230370 -0.199540 
-0.334150 -0.192919 -0.220681 -0.204650 -0.318956 -0.158520 -0.070792 -0.042598 -0.178701 0.026312 -0.033983 -0.119504 0.162432 -0.031498 0.019919 -0.228353 0.018548 0.023433 -0.067525 -0.076803 -0.172050 -0.083896 -0.098566 -0.083301 -0.284714 -0.099721 -0.186871 -0.138397 -0.163712 0.045302 -0.068164 -0.133595 0.011395 -0.125378 -0.198303 -0.214062 -0.205253 -0.065363 -0.073619 -0.079477 -0.167100 -0.169177 -0.173859 -0.077143 -0.268030 -0.169336 -0.060870 -0.222505 -0.168985 -0.096238 -0.180600 -0.157466 -0.160292 -0.017769 -0.116800 -0.260166 -0.170865
-0.524720
0.154513 0.203979 0.048294 0.090322 0.055480 0.078362 0.004012 -0.082407 0.090325 0.031930 -0.051573 0.137165 -0.021265 0.059140 -0.006711 -0.023714 0.100909 0.152038 0.022316 0.074661 0.124397 0.320426 0.032491 0.171462 0.113720 0.113248 0.068659 -0.198500 0.091383 -0.061092 0.021773 -0.082033 0.030394 0.013086 0.090308 0.069989 0.056663 0.073158 -0.027319 -0.093840 -0.077431 -0.005912 0.029071 0.074618 0.095422 0.104539 0.078844 -0.041097 0.130584 0.041358 0.193946 0.008914 -0.007856 0.134291 0.055873 0.031463 0.000300 -0.007945 0.058287 0.126511 0.035880 0.110994 0.150183 0.119831 0.053810 0.079044 0.115316 -0.174369 0.245411 0.051703 0.027911 0.052232 -0.087423 0.166628 0.102278 -0.132092 0.090646 0.000045 0.058830 0.121950 0.214049 0.370781 0.241658 0.247640 0.196595 0.131533 0.070703 -0.229280 0.144735 -0.102751 -0.014604 -0.041123 0.072117 0.152531 0.008596 -0.068417 0.056369 0.034000 -0.090817 -0.039916 0.030354 0.293707 0.187306 0.236166 0.080975 0.129781 -0.069191 -0.211208 0.275564 -0.034266 -0.003660 0.017415 0.050324 0.152961 0.124033 -0.167242 0.007611 0.018227 0.103696 -0.076929 -0.281846 -0.195073 0.075917 0.182325 -0.008900 0.102061 0.131555 -0.213223 -0.024959 -0.131552 -0.289637 -0.448343 -0.019283 0.229526 0.211105 -0.573386 0.142129 0.266419 0.108788 0.061394 0.125213 0.281386 0.138985 0.242696 0.115892 0.016850 0.106856 -0.105255 0.249325 -0.135292 -0.085143 -0.044688 -0.047135 0.132880 0.132071 0.028059 0.054803 -0.108401 -0.050485 -0.059982 0.160326 0.353662 0.329035 0.119034 0.111641 0.263509 0.133962 0.006345 0.123893 -0.138080 -0.018373 -0.055900 0.043093 0.126120 0.053354 -0.095226 -0.019028 -0.149699 -0.097878 0.036043 0.164260 0.118168 0.112959 0.134726 0.169293 0.096612 0.199498 -0.146965 0.218327 0.021544 -0.041629 0.040537 0.054411 0.192245 0.152940 -0.138435 0.077090 0.002609 -0.052208 -0.042528 0.129684 0.077185 0.186645 0.187370 0.038889 0.069162 0.061597 -0.197347 0.194808 0.041196 0.135066 0.028764 0.018190 0.100267 0.093109 
-0.071298 -0.059168 -0.080016 -0.016150 -0.056207 0.124955 0.205059 0.164676 0.046918 0.070075 0.069983 0.097966 -0.200358 0.070685 -0.064280 -0.069341 0.017293 -0.039881 0.107536 0.071292 0.005646 0.025108 0.040541 0.042647 -0.043777 0.070598 0.044540 0.131586 0.078689 0.119421 0.067460 0.038218 -0.061153 0.186128 0.107766 0.024194 0.086200 -0.015730 0.051704 0.123282 0.057591 0.016769 -0.041757 0.156185 -0.064568
-3.392679
-0.045022 0.073616 -0.000408 0.160156 0.154186 0.251712 -0.033704 0.051172 0.428546 -0.027031 0.327703 0.021208 0.168122 0.315747 0.277537 0.200548 0.188944 0.206525 0.117451 -0.059984 0.010269 0.076054 -0.116692 0.055683 0.035898 -0.299356 0.019773 0.022074 0.151251 -0.128346 0.189414 0.179334 0.022655 0.420717 0.212447 -0.097653 0.027509 0.335759 0.155124 0.072035 0.129656 0.194478 0.083952 0.000398 0.097400 0.171972 -0.201141 -0.174791 0.170778 0.054058 0.211608 0.156615 0.046134 -0.399402 0.050863 0.202286 0.247494 0.491820 0.060935 0.276054 0.191318 0.260353 0.260102 -0.105599 0.190812 0.372459 0.169546 0.003022 -0.001262 -0.016418 0.311207 -0.183385 -0.150570 -0.070429 0.143529 0.720439 0.001143 0.513456 -0.137016 -0.023083 0.032894 0.193806 0.358182 -0.048925 -0.000153 0.156751 -0.013043 -0.082260 0.188459 0.415768 0.342516 0.154946 -0.068848 0.067780 -0.062038 0.250291 0.269551 0.155357 0.460478 0.548554 -0.044765 -0.001985 0.162424 -0.195813 -0.001534 0.126595 0.492953 -0.175303 0.008005 -0.043580 -0.012260 0.180904 -0.022612 0.128195 0.441880 0.617810 0.324884 0.616176 -0.232894 0.227519 1.505843 1.686601 1.598056 0.807867 1.273509 0.789232 0.309206 0.999822 2.621622 -0.367464 1.648606 1.451281 1.094744 -0.841376 -3.625247 -2.927335 -0.831452 -5.212661 -0.698117 -4.950808 0.041992 0.266384 0.049116 -0.118177 0.074138 0.288970 0.073334 -0.376111 0.019275 -0.747190 0.097097 -0.066781 0.178590 0.008632 0.435278 0.655038 0.419311 0.507827 0.217161 0.160956 0.462550 0.492895 0.473460 0.268630 0.258759 -0.016715 -0.217376 -0.238036 0.228603 -0.117788 0.100081 0.009037 0.361601 -0.429419 0.013617 0.574723 0.139799 0.557784 0.217710 -0.006795 0.500248 0.375827 0.344209 0.507256 0.275542 0.123528 0.290640 -0.282119 0.099417 -0.258653 -0.166433 -0.018108 0.248410 0.028849 -0.014118 0.820778 0.024199 0.359858 0.022365 0.035588 0.307911 0.280823 0.333445 0.036463 0.282302 0.114528 0.100341 0.043328 0.120032 -0.315219 -0.057145 -0.162257 0.410823 0.071707 -0.333178 
0.150516 0.032312 0.305116 -0.204997 0.176935 0.351771 -0.063635 0.094086 -0.123456 0.088026 0.089180 -0.121424 -0.073208 0.128705 -0.018277 0.069866 -0.232850 0.223583 -0.072594 -0.028345 0.504329 0.003772 0.019249 -0.166544 0.014105 0.065732 0.069389 0.255167 0.082785 -0.132714 0.139074 0.188195 0.022171 0.127342 0.017475 0.309935 -0.111449 0.114716 0.092440 0.180858 -0.008111 0.451164 0.344322 -0.038933 -0.000417
-0.041590
-0.132178 -0.125809 -0.129481 -0.083871 -0.060361 -0.056246 -0.038208 -0.072517 -0.142981 -0.126048 -0.068757 -0.130837 -0.015395 -0.052602 -0.052098 -0.038161 -0.067340 -0.166669 -0.117403 -0.214311 -0.136092 -0.266236 -0.094254 -0.136102 -0.063053 -0.117778 -0.048600 0.028962 -0.214789 -0.074492 -0.005237 -0.048622 0.016390 -0.015761 -0.068684 -0.034536 -0.112518 -0.137535 -0.110367 -0.056196 -0.042167 -0.255057 -0.065883 -0.118502 -0.115106 -0.150318 -0.158210 -0.090196 -0.149499 -0.041800 -0.127829 -0.058796 0.023974 -0.148427 -0.118233 0.014338 0.010101 0.004113 -0.060837 -0.108300 -0.145160 -0.108116 -0.172288 -0.141194 -0.102221 -0.096769 -0.103515 -0.078920 -0.181267 -0.095226 -0.074094 -0.123623 0.051999 -0.163724 -0.133526 0.122587 -0.103489 -0.043332 -0.137697 -0.125252 -0.240598 -0.298775 -0.169125 -0.206577 -0.204941 -0.143264 -0.098425 0.020087 -0.163173 -0.020891 -0.005268 0.046792 -0.056783 -0.177118 -0.103821 -0.057102 -0.091597 -0.225117 0.068481 -0.006780 -0.138751 -0.330845 -0.199337 -0.242210 -0.177021 -0.158106 -0.035243 0.142816 -0.225250 0.075833 0.017431 -0.048766 -0.036734 -0.199296 -0.106929 0.040741 -0.058131 -0.022288 -0.155812 -0.031227 -0.030030 -0.211709 -0.248420 -0.174684 -0.245596 -0.217578 -0.236770 0.121377 0.206832 0.089238 0.261187 0.365860 -0.259989 -0.160865 -0.110200 0.787453 0.000391 -0.007308 0.059851 0.086278 -0.206551 -0.248948 -0.211574 -0.244159 -0.131569 -0.021120 -0.110736 0.136446 -0.210287 -0.106094 -0.024033 -0.140486 0.062489 -0.064616 -0.128236 -0.099440 0.015303 0.069268 0.027189 -0.045384 -0.174615 -0.322549 -0.252268 -0.129119 -0.127833 -0.235098 -0.152967 -0.047136 -0.158150 0.063726 -0.054265 0.015198 -0.009136 -0.140124 -0.157710 0.054738 -0.071698 0.022549 0.034080 -0.220900 -0.194509 -0.324938 -0.145107 -0.181066 -0.199487 -0.130370 -0.186619 -0.052283 -0.189933 -0.035983 -0.022248 -0.065733 -0.007002 -0.195613 -0.149219 0.138675 -0.048173 -0.020118 -0.044938 -0.084325 -0.112506 -0.171200 -0.196599 
-0.250543 -0.058026 -0.142673 -0.070435 0.009976 -0.205047 -0.091593 -0.150539 -0.076336 0.059103 -0.046835 -0.120795 0.081649 -0.018661 -0.034827 -0.117086 -0.013402 -0.080044 -0.115887 -0.108078 -0.100791 -0.057655 -0.078371 -0.046533 -0.064284 -0.076863 -0.096889 -0.035407 -0.121175 0.063673 -0.101571 -0.052454 -0.023295 -0.112777 -0.163278 -0.093675 -0.088857 -0.081985 -0.065261 -0.066517 -0.121067 -0.103562 -0.103624 -0.052081 -0.152509 -0.211060 -0.095401 -0.140610 -0.123229 0.010304 -0.132269 -0.135124 -0.112554 -0.032559 -0.036863 -0.220164 -0.027542
-0.271952
-0.045395 -0.056557 -0.064905 -0.040018 -0.006837 -0.013241 -0.002977 -0.090237 -0.089319 -0.066436 -0.042569 -0.085973 -0.005157 -0.028067 -0.048009 -0.021940 -0.050250 -0.095640 -0.044304 -0.105895 -0.041250 -0.108397 -0.016584 -0.068507 -0.029634 -0.048570 -0.031929 0.018268 -0.087497 -0.023484 -0.047070 -0.010950 0.007556 -0.017551 -0.070870 -0.057478 -0.048746 -0.103563 -0.066915 -0.067710 -0.026611 -0.132975 -0.020912 -0.067232 -0.050926 -0.058340 -0.059234 -0.044551 -0.092157 -0.068601 -0.087231 -0.002799 0.012061 -0.051551 -0.060536 -0.032698 -0.002092 -0.029861 -0.018319 -0.062244 -0.070498 -0.115205 -0.099401 -0.060689 -0.049254 -0.042813 -0.073455 -0.008414 -0.112830 -0.041533 -0.041827 -0.050015 0.051119 -0.055389 -0.065314 0.017481 -0.052383 -0.074433 -0.046139 -0.079537 -0.120091 -0.190605 -0.137245 -0.090236 -0.083762 -0.063619 -0.050073 0.052787 -0.049213 0.015644 -0.052631 0.024157 -0.021599 -0.084328 -0.060639 -0.069323 -0.042671 -0.110230 0.014725 -0.044992 -0.106150 -0.181858 -0.108550 -0.122784 -0.075157 -0.059407 -0.020988 0.079720 -0.122836 0.049664 0.003600 -0.044370 -0.021544 -0.079783 -0.060791 -0.006372 -0.044595 -0.045142 -0.065355 -0.017842 -0.041831 -0.155656 -0.168019 -0.101678 -0.102315 -0.074573 -0.131361 0.022933 0.035834 0.043777 0.104031 0.168131 -0.092152 -0.065470 -0.024410 0.381073 -0.001802 -0.042548 0.017464 0.012254 -0.103151 -0.135851 -0.095654 -0.103571 -0.065123 -0.002239 -0.058437 0.034210 -0.110232 0.001864 -0.012001 -0.045981 0.032254 -0.020927 -0.085583 -0.072909 -0.029966 -0.009529 0.010964 -0.091342 -0.127509 -0.154757 -0.141770 -0.054333 -0.055695 -0.103479 -0.059540 -0.044984 -0.070868 0.055362 -0.033872 -0.006520 -0.013914 -0.048720 -0.080282 -0.010101 -0.035035 -0.012672 0.005685 -0.131039 -0.139054 -0.163211 -0.072477 -0.076171 -0.098375 -0.054348 -0.115966 -0.013842 -0.071572 -0.020503 -0.022137 -0.055190 -0.028280 -0.093389 -0.069000 0.040945 -0.040701 -0.046302 -0.011549 -0.051834 -0.092055 -0.173768 
-0.130574 -0.106170 -0.020419 -0.049567 -0.052849 0.011113 -0.109798 -0.041580 -0.092304 -0.026687 0.001739 -0.020729 -0.043661 0.017232 0.005828 0.009025 -0.016535 -0.020974 -0.072331 -0.065299 -0.074565 -0.037504 -0.030196 -0.022728 -0.045894 -0.036832 -0.041925 -0.031327 -0.039065 -0.067165 0.034849 -0.052476 -0.047802 -0.044696 -0.066765 -0.072643 -0.038657 -0.039403 -0.057635 -0.022819 -0.038915 -0.046765 -0.039010 -0.033113 -0.027183 -0.082576 -0.095628 -0.068323 -0.079280 -0.095925 0.004589 -0.061938 -0.083236 -0.055389 -0.034972 -0.039288 -0.114143 -0.043573
-0.182199
0.333577 0.402240 0.057833 0.150660 0.067994 0.109537 -0.043435 -0.239277 0.232094 0.016439 -0.039315 0.207970 0.009558 0.084781 -0.023556 -0.029111 0.183485 0.319671 0.064607 0.161129 0.264343 0.735518 0.110168 0.278684 0.161790 0.137883 0.059641 -0.328873 0.367503 -0.086449 -0.046984 -0.139189 0.029268 -0.025356 0.131801 0.033863 0.147411 0.125243 0.004550 -0.235114 -0.132314 0.051398 -0.007926 0.075810 0.070909 0.147701 0.122150 0.012046 0.285163 0.015713 0.357777 0.038930 -0.072818 0.162763 0.147995 0.100531 0.057325 0.080997 0.079210 0.281238 0.098437 0.016878 0.152984 0.100850 0.041531 0.084748 0.127100 -0.193288 0.361176 0.108822 0.112392 0.073172 -0.181214 0.226990 0.284555 -0.096773 0.233745 0.090602 0.192730 0.233476 0.431908 0.729104 0.429869 0.399562 0.444897 0.193794 0.058353 -0.366542 0.364086 -0.126459 -0.076009 -0.163273 0.078494 0.222658 -0.029707 -0.134970 0.101285 0.106026 -0.213533 -0.056406 0.018651 0.622755 0.279328 0.340790 0.212750 0.140273 -0.155088 -0.404798 0.466519 -0.051646 -0.104733 -0.036386 0.054584 0.267557 0.321469 -0.082935 0.054593 0.039709 0.180017 -0.073799 -0.376128 -0.029779 0.164219 0.238586 0.112156 0.191306 0.179340 -0.227914 0.183345 -0.147179 -0.311659 -0.522449 -0.013863 0.323889 0.226280 -0.683743 0.210250 0.168602 0.018130 -0.038754 0.194055 0.470361 0.126607 0.379459 0.089224 0.030830 0.108482 -0.215503 0.433648 -0.166473 -0.065165 -0.038147 -0.157224 0.211466 0.299139 0.243043 0.103381 -0.082295 -0.046633 -0.061255 0.255138 0.860237 0.523026 0.214367 0.163951 0.405456 0.152334 0.174879 0.171473 -0.157955 -0.018311 -0.205462 0.041257 0.190565 0.067828 -0.194655 -0.000133 -0.252547 -0.211616 0.093334 0.211760 0.147077 0.068884 0.150826 0.222989 0.101185 0.252162 -0.155209 0.392221 0.118544 -0.049227 -0.039225 0.004824 0.314964 0.289233 -0.104307 0.184015 0.142527 -0.000708 0.040517 0.144526 -0.076148 0.178027 0.216719 0.025673 0.125074 -0.016286 -0.266400 0.307535 0.085370 0.338186 0.044601 -0.107020 0.144549 0.171706 
-0.003326 0.030036 0.181491 0.090871 0.010198 0.214869 0.381655 0.163297 0.075862 0.153203 0.159458 0.067810 -0.268114 0.090157 -0.024632 -0.096695 0.016455 -0.091401 0.182737 0.012597 0.096837 0.052684 0.116400 0.082458 -0.003502 0.128264 0.216218 0.243622 0.131073 0.157021 0.104347 0.054909 0.027074 0.287800 0.085316 0.047755 0.047057 -0.081919 0.105615 0.173316 0.206435 0.063383 -0.045434 0.245405 -0.238126
2.977369
-0.163825 -0.148193 -0.539381 -0.053625 -0.014042 0.015580 0.040129 0.068626 0.104471 0.008693 0.014319 0.033902 0.256730 -0.058533 -0.127252 -0.063938 -0.003439 0.128616 0.053186 -0.006668 -0.150285 -0.028547 -0.300495 -0.203528 -0.170441 0.174345 -0.068260 -0.180999 0.186269 -0.091045 -0.238320 0.152270 0.089248 -0.026105 -0.231873 0.027399 0.388154 0.248230 0.127165 -0.086841 -0.626401 -0.181094 -0.447582 -0.457010 0.053507 0.244100 0.104963 -0.284865 -0.161025 -0.391923 0.086810 0.618345 0.125925 -0.360638 -0.114660 -0.102226 0.350878 -0.093910 0.197694 0.444243 -0.555014 -0.091094 -0.465243 -0.281863 -0.357215 -0.043547 -0.267070 -0.041294 -0.226283 -0.121937 0.144535 0.062620 0.054027 -0.176766 0.161581 0.297493 0.382143 -0.137483 0.252969 0.013659 -0.659448 -0.681797 -1.215761 -0.720721 -0.695361 -0.359942 -0.549934 -0.392816 0.119995 0.580115 0.033602 0.134216 -0.162930 -0.318197 0.094974 0.721420 0.646058 0.974587 0.316684 0.422577 -0.756116 -0.502022 -0.641515 -0.531720 -0.531165 -0.240401 -0.349240 -0.354057 0.247424 0.203356 -0.141061 -0.172185 -0.085899 -0.011301 0.130300 0.771358 0.565833 0.744581 0.153583 0.459617 -0.682627 -0.359130 -0.717771 -0.547726 -0.392511 -0.157887 0.321411 -1.404261 0.272510 -0.134869 -0.386654 -0.610925 -0.372502 -0.205246 1.897225 3.615124 1.110774 2.159541 0.224313 -0.388655 -0.113974 0.164249 -0.209217 -0.163456 -0.327062 -0.404606 -0.220949 -0.781521 0.040759 -0.335444 -0.059735 0.109019 -0.101773 -0.026008 0.233253 0.728030 0.116161 0.219268 0.355826 0.051124 -0.147746 0.101143 -0.212429 -0.295290 -0.128348 -0.466121 -0.374825 -0.650218 0.109741 -0.494414 -0.241089 -0.243229 -0.459396 -0.585304 0.106755 0.826389 0.559245 0.818260 0.041081 0.127557 0.468061 0.331585 0.376270 -0.100065 0.173053 0.126962 -0.103761 -0.836315 0.187833 -0.592226 -0.556322 -0.172010 -0.277741 -0.241643 -0.020999 0.305415 -0.197842 -0.305594 0.062626 -0.550501 0.379441 0.626009 0.327497 0.070365 0.379606 -0.183974 0.253020 -0.067718 0.175933 
-0.707898 -0.573596 -0.234710 0.016759 -0.388430 -0.106722 -0.261563 -0.236886 -0.360963 0.076597 -0.519936 -0.150382 -0.240036 -0.380615 -0.270374 -0.140048 -0.096099 -0.374396 -0.330776 0.128389 0.167291 -0.072818 -0.287860 0.011989 -0.222849 -0.116449 0.615885 0.145824 0.333600 0.043666 0.012717 -0.486615 -0.210876 -0.086740 -0.130119 -0.358112 -0.059483 -0.073635 -0.065814 0.060688 -0.303054 0.165594 0.003233 -0.022032 -0.357019 -0.044611 0.365111 -0.125719 0.038804 -0.116673 -0.332629
-0.094604
-0.039338 0.296835 -0.109780 -0.053343 -0.079219 -0.197336 -0.545583 -0.718238 0.148056 -0.352939 -0.336613 -0.087467 -0.030405 -0.042069 -0.248009 -0.426245 0.051863 0.090482 -0.399861 -0.163716 0.004398 0.692705 -0.354803 0.045932 -0.069145 -0.220988 -0.197235 -0.726658 0.201940 -0.458321 -0.211128 -0.558260 -0.106761 -0.127248 0.157355 -0.142533 -0.090229 -0.020835 -0.253951 -0.499515 -0.450339 -0.042651 -0.385696 -0.284229 -0.508064 -0.297727 -0.407110 -0.370053 0.089062 0.032335 0.074856 -0.447646 -0.353307 -0.287445 0.079439 0.300494 -0.007943 0.438043 -0.313608 0.020245 -0.151971 -0.333631 -0.314983 -0.532355 -0.347810 -0.432814 -0.082438 -0.737444 0.164247 -0.083795 -0.049421 -0.410284 -0.752655 -0.207693 0.045166 0.190905 0.166991 0.418131 -0.275685 0.111588 -0.022886 0.641094 0.249008 -0.100379 -0.026861 -0.153531 -0.425699 -0.902560 -0.087920 -0.227595 -0.239064 -0.501777 -0.245282 -0.138420 -0.508208 -0.312420 -0.256226 -0.450817 -0.307563 -0.101317 -0.528765 0.423435 -0.112638 -0.202997 -0.057583 -0.414258 -0.510479 -0.977738 0.054880 -0.039762 -0.193027 -0.314843 -0.212870 -0.314162 0.113636 0.184343 -0.062394 0.082256 -0.311104 -0.229930 -0.661264 -0.150922 -0.317935 -0.281499 -0.559787 -0.580492 -0.517852 0.754533 0.734836 0.108148 0.630953 0.382296 -1.286025 0.008342 -0.267389 1.282460 0.345797 0.160099 0.182185 0.190976 -0.065601 0.117423 -0.604518 -0.337964 -0.280926 -0.307003 -0.337597 -0.071197 0.131602 -0.962672 -0.257850 -0.620464 -0.492981 0.059145 0.025409 0.354517 0.213128 0.175706 -0.026755 0.052323 0.026497 0.529440 0.002549 -0.186243 -0.385961 -0.054022 -0.442101 -0.028723 -0.454393 -0.045693 -0.196437 -0.298465 -0.114626 -0.209493 -0.377295 -0.189588 -0.369909 -0.338174 -0.517855 -0.311570 -0.066555 -0.233133 -0.648530 -0.385981 -0.213091 -0.507662 -0.138410 -0.423533 -0.238494 0.156420 -0.376262 -0.396881 -0.280833 -0.216257 -0.165199 0.119120 0.150084 0.216046 -0.424824 -0.035148 -0.135921 -0.606469 -0.104852 -0.452003 -0.381154 
-0.342007 -0.485115 -0.764214 -0.079538 0.089738 0.275435 -0.094768 -0.302108 0.006434 -0.079492 0.172712 -0.033670 0.268834 -0.424918 0.086672 0.283847 0.132815 0.029089 -0.327448 -0.054655 -0.133993 -0.216832 -0.687770 -0.178209 -0.250897 -0.401552 -0.219359 -0.002567 0.030353 -0.290212 0.017752 -0.170495 -0.104610 -0.386807 -0.232563 0.036212 -0.050471 -0.053409 -0.164151 -0.169275 -0.232935 -0.259185 -0.329156 0.054224 -0.012304 -0.371517 -0.331643 -0.289799 -0.159334 -0.079343 -0.271762 -0.040660 -0.314192 -0.212018 -0.569574
-0.392777
0.203537 0.296640 0.065429 0.109822 0.067185 0.096914 -0.008482 -0.152957 0.140161 0.035327 -0.040236 0.183215 -0.012267 0.069784 -0.010210 -0.027530 0.128182 0.225686 0.043649 0.108298 0.185494 0.472725 0.061893 0.207828 0.128756 0.125855 0.068143 -0.245356 0.198891 -0.075380 0.013523 -0.101156 0.030953 0.006037 0.112339 0.062794 0.097405 0.093984 -0.013129 -0.142728 -0.097026 0.010063 0.019152 0.077966 0.093400 0.130726 0.098075 -0.042396 0.189559 0.042804 0.261221 0.018371 -0.019631 0.142844 0.085692 0.054201 0.023030 0.024954 0.080383 0.191031 0.064473 0.098640 0.149579 0.121460 0.051979 0.083557 0.119597 -0.195555 0.310305 0.087659 0.059454 0.069991 -0.130153 0.188555 0.172471 -0.118118 0.143640 0.029440 0.107274 0.165677 0.294418 0.524829 0.323565 0.312121 0.299076 0.153294 0.069785 -0.281318 0.232066 -0.115985 -0.035950 -0.089454 0.087497 0.183607 -0.005081 -0.081386 0.079269 0.061372 -0.121941 -0.041216 0.030541 0.419723 0.217438 0.284271 0.127450 0.138897 -0.095616 -0.282095 0.349096 -0.043567 -0.039506 0.012014 0.054252 0.209419 0.194034 -0.139690 0.029685 0.034301 0.141983 -0.070635 -0.312960 -0.136457 0.114074 0.207017 0.039599 0.142643 0.159224 -0.218020 0.058559 -0.135081 -0.310394 -0.495440 -0.006483 0.268749 0.220279 -0.635072 0.167186 0.228830 0.077136 0.024005 0.153210 0.357499 0.137760 0.305635 0.113599 0.016444 0.108366 -0.139975 0.335879 -0.148565 -0.076256 -0.036228 -0.085414 0.161532 0.188031 0.099451 0.079315 -0.105692 -0.042771 -0.050740 0.222864 0.540739 0.407701 0.158379 0.134144 0.324416 0.148800 0.067091 0.138713 -0.139825 -0.015136 -0.102607 0.037993 0.153182 0.058557 -0.131649 -0.016486 -0.182364 -0.135553 0.059650 0.186625 0.137912 0.101537 0.145659 0.182772 0.104982 0.220964 -0.158127 0.279868 0.059779 -0.042729 0.031355 0.042569 0.240120 0.203341 -0.124256 0.125841 0.060221 -0.032868 -0.010991 0.148098 0.042404 0.191024 0.210561 0.031927 0.097362 0.048109 -0.228724 0.241916 0.052664 0.209242 0.039070 -0.015631 0.126973 0.121133 
-0.037404 -0.031303 0.004051 0.027034 -0.026682 0.165797 0.271204 0.181098 0.058164 0.109116 0.102160 0.096367 -0.232021 0.079374 -0.053306 -0.067261 0.030119 -0.066380 0.143831 0.052213 0.044223 0.043300 0.069748 0.054664 -0.033082 0.102198 0.112022 0.170247 0.095496 0.137250 0.082229 0.043658 -0.033703 0.241539 0.102650 0.042315 0.082426 -0.037443 0.078071 0.151551 0.115037 0.030598 -0.046591 0.194040 -0.118081
-0.055820
-0.121479 -0.133056 -0.115169 -0.083490 -0.057763 -0.056579 -0.029135 -0.102803 -0.142941 -0.111679 -0.048034 -0.134901 -0.009248 -0.059440 -0.065897 -0.035504 -0.079382 -0.156592 -0.118349 -0.177046 -0.129994 -0.277985 -0.083963 -0.136459 -0.073243 -0.119593 -0.043485 0.024679 -0.176581 -0.047406 -0.009394 -0.051934 0.007456 -0.019545 -0.062273 -0.033026 -0.097654 -0.150597 -0.101362 -0.067320 -0.054840 -0.248828 -0.063123 -0.125825 -0.102894 -0.147863 -0.153828 -0.096040 -0.131382 -0.044115 -0.119491 -0.059995 0.017253 -0.151601 -0.111128 0.013686 0.002520 0.008727 -0.055603 -0.103418 -0.144667 -0.128232 -0.164900 -0.146209 -0.099819 -0.092378 -0.107558 -0.057313 -0.188908 -0.095534 -0.066855 -0.115029 0.047725 -0.158264 -0.131815 0.117374 -0.087353 -0.044507 -0.128326 -0.115374 -0.225108 -0.290606 -0.198992 -0.197555 -0.197049 -0.129921 -0.107397 0.022246 -0.127625 -0.006417 -0.020481 0.038151 -0.043106 -0.172641 -0.112930 -0.072291 -0.089858 -0.216876 0.063296 -0.003437 -0.149940 -0.306912 -0.198909 -0.240335 -0.162010 -0.157916 -0.041190 0.152565 -0.211830 0.070833 0.020353 -0.030620 -0.047953 -0.192854 -0.108560 0.045081 -0.061613 -0.033767 -0.146886 -0.045043 -0.035122 -0.201860 -0.245936 -0.175989 -0.230188 -0.205728 -0.229009 0.099050 0.226996 0.090465 0.240876 0.352045 -0.237687 -0.159642 -0.108561 0.764744 -0.002452 -0.015068 0.056013 0.076670 -0.190420 -0.221864 -0.208151 -0.235902 -0.128749 -0.022118 -0.094027 0.109996 -0.180476 -0.097681 -0.008762 -0.129248 0.064832 -0.074999 -0.135937 -0.105771 0.003928 0.056951 0.009376 -0.058670 -0.177704 -0.303343 -0.256221 -0.120467 -0.118024 -0.228577 -0.146067 -0.057340 -0.124134 0.060037 -0.048466 0.011748 -0.022462 -0.130877 -0.150059 0.040134 -0.074255 0.009831 0.040874 -0.218470 -0.215642 -0.303035 -0.142924 -0.170101 -0.197026 -0.124626 -0.171419 -0.044217 -0.185641 -0.034296 -0.017797 -0.075835 -0.015893 -0.191592 -0.133849 0.132224 -0.047472 -0.018077 -0.034816 -0.087102 -0.143914 -0.194778 -0.202754 
-0.247199 -0.052653 -0.123314 -0.071601 -0.000754 -0.183160 -0.089701 -0.139661 -0.078169 0.062691 -0.035014 -0.118674 0.084416 -0.020495 -0.033630 -0.078990 -0.008507 -0.094916 -0.107743 -0.103418 -0.095558 -0.056487 -0.067497 -0.047209 -0.042595 -0.068895 -0.096188 -0.062985 -0.132691 0.068711 -0.097243 -0.065401 -0.032095 -0.105592 -0.153127 -0.085402 -0.071613 -0.076555 -0.068316 -0.073121 -0.113681 -0.114025 -0.101642 -0.052198 -0.124898 -0.200499 -0.092317 -0.128149 -0.140340 0.003828 -0.126312 -0.140049 -0.119608 -0.043176 -0.036841 -0.199526 -0.004209
-0.161078
-0.172391 0.025239 -0.085512 -0.084904 -0.122630 -0.138075 -0.292537 -0.459631 0.024116 -0.212798 -0.222674 -0.103279 -0.027362 -0.054813 -0.110075 -0.200719 -0.020696 -0.041251 -0.308523 -0.242370 -0.094512 0.129123 -0.198249 -0.051768 -0.019994 -0.199654 -0.125130 -0.471003 -0.005567 -0.276041 -0.100492 -0.358800 -0.022249 -0.069341 0.055352 -0.095938 -0.110787 -0.126817 -0.196012 -0.316984 -0.297598 -0.301411 -0.285787 -0.199488 -0.296939 -0.276413 -0.291677 -0.375303 -0.008631 0.012930 -0.016319 -0.249079 -0.175029 -0.238590 -0.007563 0.183786 0.040158 0.248214 -0.169802 -0.014483 -0.274571 -0.271680 -0.275015 -0.334338 -0.222501 -0.275566 -0.106748 -0.428248 0.006957 -0.112649 -0.103941 -0.318722 -0.352560 -0.195494 -0.042672 0.205496 0.009432 0.261508 -0.195477 -0.020906 -0.137096 0.181666 0.084269 -0.154019 -0.105540 -0.171398 -0.291570 -0.539083 -0.185612 -0.166018 -0.099122 -0.235284 -0.160622 -0.169920 -0.300413 -0.226650 -0.191486 -0.361659 -0.089243 -0.057073 -0.276483 0.049469 -0.148107 -0.241054 -0.118095 -0.296521 -0.253199 -0.500722 -0.063161 -0.020703 -0.140950 -0.167000 -0.086346 -0.260960 -0.011717 0.099753 -0.071788 0.013291 -0.233057 -0.189618 -0.418945 -0.171773 -0.331955 -0.216415 -0.435723 -0.458900 -0.387893 0.206371 0.725318 0.107766 0.427584 0.385412 -0.821375 -0.091132 -0.184805 1.096339 0.196358 0.141818 0.112831 0.174096 -0.262809 -0.001910 -0.448283 -0.309763 -0.227441 -0.131561 -0.246966 0.046605 -0.017689 -0.518350 -0.149961 -0.514222 -0.209210 -0.027361 -0.029345 0.075802 0.158852 0.169768 -0.023767 0.014278 -0.041781 0.225495 -0.067278 -0.147090 -0.230889 -0.166383 -0.322870 -0.053692 -0.217476 -0.043260 -0.168465 -0.231783 -0.052413 -0.206351 -0.359356 -0.089411 -0.239557 -0.201603 -0.251934 -0.348514 -0.227247 -0.441685 -0.451012 -0.311771 -0.299639 -0.340731 -0.197510 -0.245530 -0.202427 0.058944 -0.171650 -0.227614 -0.121655 -0.209345 -0.139101 0.210194 0.075655 0.144711 -0.208947 -0.038906 -0.181463 -0.378369 -0.188499 
-0.352619 -0.223904 -0.230327 -0.304854 -0.530162 -0.083519 -0.032368 0.045086 -0.210445 -0.063815 -0.029508 -0.111865 0.195861 -0.013328 0.123889 -0.263193 0.038298 0.067218 0.029397 -0.070134 -0.184194 -0.060511 -0.100832 -0.126483 -0.424602 -0.095108 -0.194774 -0.252977 -0.201346 0.027409 -0.032838 -0.172429 -0.021257 -0.136541 -0.186234 -0.268332 -0.199512 -0.088875 -0.003836 -0.071940 -0.145804 -0.186770 -0.188114 -0.109169 -0.273979 -0.140718 -0.060033 -0.275533 -0.243945 -0.156622 -0.164890 -0.157648 -0.180747 -0.032483 -0.147635 -0.252529 -0.284086
-0.563571
0.120322 0.154458 0.041098 0.073627 0.037233 0.076409 0.017270 -0.057808 0.073238 0.026444 -0.045548 0.100748 -0.022373 0.055925 -0.000089 -0.015441 0.072839 0.119565 0.023756 0.045131 0.083379 0.237394 0.033806 0.138231 0.102737 0.102833 0.064920 -0.162085 0.047248 -0.055964 0.027962 -0.067649 0.031920 0.019389 0.074054 0.065807 0.036460 0.046454 -0.028003 -0.077238 -0.068313 -0.020336 0.031672 0.073515 0.083638 0.088389 0.065469 -0.041788 0.096842 0.041880 0.159168 0.007666 0.006462 0.118759 0.035239 0.017411 -0.012948 -0.024826 0.049392 0.085123 0.023740 0.103529 0.128443 0.113459 0.052164 0.073035 0.100633 -0.152076 0.211742 0.039854 0.013456 0.049487 -0.070176 0.140208 0.065161 -0.141333 0.062139 -0.017834 0.037236 0.089601 0.167567 0.299501 0.190394 0.207793 0.151776 0.113968 0.065488 -0.202612 0.115226 -0.087068 -0.008540 -0.025654 0.066602 0.134856 0.015033 -0.063128 0.051069 0.023213 -0.072602 -0.040732 0.027346 0.224056 0.161321 0.211990 0.057096 0.115685 -0.055343 -0.171607 0.222655 -0.032219 0.004970 0.028330 0.044867 0.124676 0.078851 -0.174356 -0.011138 -0.000943 0.088707 -0.082074 -0.265033 -0.222822 0.046524 0.157241 -0.029840 0.078976 0.106708 -0.198657 -0.046291 -0.114275 -0.265609 -0.403534 -0.027848 0.198312 0.203056 -0.525759 0.129914 0.276387 0.124617 0.078650 0.104105 0.234809 0.138051 0.202483 0.118352 0.017503 0.097943 -0.077781 0.212668 -0.124081 -0.085158 -0.047266 -0.025869 0.109906 0.096959 -0.021738 0.041489 -0.098763 -0.055070 -0.068516 0.134757 0.252116 0.277225 0.103904 0.095756 0.226866 0.121194 -0.027503 0.126593 -0.134203 -0.010807 -0.036958 0.044039 0.111074 0.039732 -0.080888 -0.020815 -0.124666 -0.075622 0.015292 0.137525 0.095861 0.101984 0.115815 0.151238 0.080594 0.169109 -0.130559 0.174649 0.011270 -0.033394 0.048167 0.060185 0.161278 0.118346 -0.143063 0.057783 -0.017055 -0.057869 -0.045772 0.107045 0.087503 0.169693 0.169809 0.045307 0.062701 0.062737 -0.165204 0.166752 0.035014 0.102036 0.021505 0.033209 0.085936 
0.077331 -0.080510 -0.067440 -0.108128 -0.028024 -0.062128 0.098899 0.161838 0.141706 0.036940 0.056320 0.052189 0.095025 -0.179686 0.064414 -0.067892 -0.067008 0.003530 -0.033432 0.077296 0.072190 -0.002364 0.027397 0.035145 0.035084 -0.050795 0.046409 0.010211 0.101925 0.078864 0.100554 0.059511 0.031974 -0.071298 0.157141 0.097137 0.017377 0.081699 -0.008919 0.047634 0.104277 0.027635 0.001932 -0.039838 0.128556 -0.039240
-0.055524
-0.121651 -0.101338 -0.121235 -0.084784 -0.067835 -0.064605 -0.037366 -0.123333 -0.143258 -0.119420 -0.053103 -0.127142 -0.018966 -0.056507 -0.050512 -0.037054 -0.069465 -0.167924 -0.118400 -0.195334 -0.131635 -0.272007 -0.104675 -0.138941 -0.071934 -0.126321 -0.051848 -0.004914 -0.191014 -0.065679 -0.003344 -0.049551 0.006970 -0.013819 -0.060370 -0.032778 -0.111196 -0.132110 -0.092390 -0.059479 -0.050894 -0.256448 -0.063320 -0.123688 -0.118327 -0.158227 -0.160815 -0.116985 -0.129793 -0.036013 -0.110564 -0.070994 0.013973 -0.150768 -0.112704 0.014372 0.002902 0.007730 -0.069568 -0.097175 -0.152530 -0.105592 -0.168924 -0.157034 -0.097092 -0.094271 -0.096795 -0.066025 -0.178140 -0.089373 -0.070440 -0.122396 0.028963 -0.149962 -0.129638 0.110476 -0.101048 -0.030802 -0.151373 -0.123410 -0.228702 -0.279224 -0.175215 -0.199674 -0.203795 -0.142241 -0.110343 0.025648 -0.163264 -0.019142 -0.010216 0.039434 -0.057636 -0.174898 -0.120690 -0.058461 -0.102664 -0.211775 0.067887 -0.006881 -0.141776 -0.319092 -0.205200 -0.239568 -0.168431 -0.169594 -0.033231 0.138793 -0.218081 0.074933 0.005742 -0.042981 -0.040221 -0.200093 -0.108539 0.043099 -0.062497 -0.024146 -0.150357 -0.044188 -0.021711 -0.208915 -0.257955 -0.179539 -0.240411 -0.210115 -0.245366 0.119732 0.233647 0.095043 0.261241 0.364013 -0.281277 -0.152620 -0.108577 0.787485 0.010426 -0.002798 0.062594 0.090302 -0.197515 -0.216695 -0.222525 -0.249882 -0.123530 -0.020037 -0.102914 0.121007 -0.189858 -0.106601 -0.022515 -0.151221 0.052090 -0.082054 -0.128048 -0.107045 0.016787 0.065745 0.007277 -0.045921 -0.166345 -0.301979 -0.251333 -0.135737 -0.122085 -0.236000 -0.158643 -0.054714 -0.141145 0.066686 -0.053127 0.004143 -0.016260 -0.143358 -0.157748 0.040642 -0.072479 0.010542 0.025920 -0.215266 -0.198187 -0.309701 -0.138896 -0.189426 -0.203630 -0.126080 -0.192996 -0.067037 -0.177639 -0.033326 -0.026226 -0.076226 -0.016337 -0.199065 -0.145941 0.134489 -0.043930 -0.017339 -0.041515 -0.084516 -0.123700 -0.191777 -0.191897 
-0.257580 -0.058458 -0.129101 -0.081872 -0.019121 -0.200639 -0.095669 -0.137097 -0.080382 0.061683 -0.046181 -0.116035 0.074372 -0.028700 -0.025087 -0.104776 0.000581 -0.093153 -0.089673 -0.115359 -0.100979 -0.065522 -0.072421 -0.051436 -0.041600 -0.067092 -0.102080 -0.062083 -0.128460 0.060374 -0.094861 -0.067093 -0.019844 -0.105964 -0.171154 -0.111553 -0.088344 -0.077191 -0.062789 -0.067233 -0.121622 -0.118219 -0.107361 -0.056076 -0.156528 -0.208740 -0.082909 -0.115473 -0.138487 -0.002550 -0.135713 -0.140876 -0.112622 -0.031032 -0.039740 -0.222639 -0.021673
-0.340880
-0.010669 -0.008709 -0.032664 -0.018814 0.011350 0.008296 0.002915 -0.095287 -0.053276 -0.044164 -0.038849 -0.056359 -0.003695 -0.010153 -0.038092 -0.035090 -0.037073 -0.066615 -0.033802 -0.075677 -0.018758 -0.040032 0.003286 -0.029362 -0.009171 -0.017405 -0.014017 -0.012406 -0.052292 -0.015908 -0.046171 -0.019382 0.011681 -0.015143 -0.044084 -0.047031 -0.035571 -0.085328 -0.054307 -0.068596 -0.024859 -0.095742 -0.011174 -0.046443 -0.027005 -0.034031 -0.030518 -0.037408 -0.060673 -0.054100 -0.046651 -0.004083 0.000992 -0.018888 -0.040737 -0.031346 -0.008029 -0.026529 -0.009894 -0.042470 -0.036844 -0.090009 -0.050305 -0.028628 -0.026699 -0.022709 -0.045672 -0.020419 -0.060096 -0.022889 -0.029577 -0.038078 0.017980 -0.023075 -0.040808 -0.018548 -0.033046 -0.059204 -0.018927 -0.057977 -0.069024 -0.108314 -0.086055 -0.041126 -0.045787 -0.024992 -0.020627 0.013067 -0.017615 0.013456 -0.043768 0.020712 -0.014131 -0.040229 -0.046692 -0.067922 -0.018519 -0.079815 -0.009910 -0.057836 -0.086255 -0.100761 -0.063211 -0.060912 -0.042272 -0.033872 -0.024101 0.033391 -0.065751 0.034346 -0.004670 -0.039934 -0.006180 -0.044691 -0.040406 -0.032319 -0.045666 -0.043812 -0.037480 -0.015752 -0.080369 -0.178663 -0.142052 -0.058952 -0.082488 -0.049549 -0.088193 -0.003946 -0.016048 0.037912 0.058860 0.075167 -0.073451 -0.019209 0.018548 0.216351 0.017831 0.002890 0.035588 0.021159 -0.060321 -0.078895 -0.063021 -0.047834 -0.032695 -0.001212 -0.033330 0.023866 -0.048300 -0.002942 -0.018228 -0.041226 0.015202 -0.004359 -0.062784 -0.067143 -0.030500 -0.029204 -0.000549 -0.086299 -0.080078 -0.077281 -0.066569 -0.018943 -0.022992 -0.044136 -0.030067 -0.045453 -0.034200 0.032039 -0.030979 -0.009580 -0.015083 -0.019785 -0.055096 -0.022597 -0.029376 -0.047693 -0.014020 -0.100962 -0.092450 -0.107647 -0.040029 -0.038917 -0.041689 -0.030097 -0.068286 -0.023911 -0.032546 -0.009965 -0.017693 -0.038751 -0.021534 -0.043604 -0.035164 0.007090 -0.016169 -0.037198 -0.009088 -0.034636 -0.063924 -0.138150 
-0.078362 -0.053076 -0.014407 -0.028168 -0.033091 -0.029238 -0.064291 -0.021520 -0.054481 -0.014342 -0.014393 -0.005188 -0.014237 -0.013621 -0.003359 0.011550 -0.008074 -0.025254 -0.047624 -0.025409 -0.025882 -0.028033 -0.011064 -0.004319 -0.028525 -0.051974 -0.019050 -0.024563 -0.044468 -0.055931 0.018788 -0.035872 -0.029706 -0.046159 -0.051239 -0.051328 -0.022407 -0.042540 -0.042409 -0.015712 -0.005086 -0.023235 -0.006871 -0.013907 -0.017394 -0.073153 -0.058088 -0.044144 -0.056008 -0.068301 -0.000228 -0.037227 -0.055310 -0.049773 -0.034661 -0.041731 -0.069481 -0.052783
-0.055524
-0.123188 -0.130947 -0.111958 -0.092500 -0.064006 -0.058321 -0.025669 -0.087031 -0.151760 -0.123802 -0.051260 -0.117508 -0.007170 -0.055864 -0.057568 -0.019901 -0.083637 -0.166667 -0.120907 -0.201315 -0.145045 -0.276746 -0.100759 -0.134633 -0.070106 -0.111931 -0.053883 0.022319 -0.210716 -0.061789 0.003330 -0.035003 0.022154 -0.012293 -0.055640 -0.035159 -0.109552 -0.143841 -0.093055 -0.049893 -0.031713 -0.241993 -0.061046 -0.121126 -0.109371 -0.146593 -0.162277 -0.098806 -0.152864 -0.046845 -0.127258 -0.062047 0.022331 -0.141547 -0.116173 0.007897 0.010447 0.006594 -0.051794 -0.100264 -0.132250 -0.102319 -0.159004 -0.138800 -0.099854 -0.089698 -0.110680 -0.064832 -0.173070 -0.095212 -0.061200 -0.106989 0.051641 -0.147268 -0.140468 0.112907 -0.097235 -0.045134 -0.135412 -0.132155 -0.241182 -0.305034 -0.194811 -0.198144 -0.205512 -0.135497 -0.104642 0.057546 -0.155201 -0.001192 -0.015004 0.039015 -0.050663 -0.174674 -0.112794 -0.054663 -0.090941 -0.215613 0.057098 -0.018033 -0.149497 -0.325976 -0.182822 -0.242581 -0.163308 -0.161563 -0.039461 0.161612 -0.215408 0.067102 0.023724 -0.033456 -0.044086 -0.190129 -0.108819 0.049387 -0.061385 -0.017780 -0.150605 -0.029813 -0.012017 -0.200655 -0.252967 -0.176066 -0.234370 -0.206350 -0.224511 0.113818 0.189642 0.091300 0.246149 0.363320 -0.248464 -0.159198 -0.108556 0.762741 -0.000994 -0.018037 0.055581 0.079560 -0.188300 -0.229577 -0.203075 -0.242469 -0.121178 -0.025496 -0.100213 0.126989 -0.209933 -0.089257 -0.003835 -0.131354 0.066194 -0.075523 -0.130078 -0.105788 0.000434 0.049314 0.011046 -0.059083 -0.176051 -0.339247 -0.247409 -0.123293 -0.121230 -0.233246 -0.149617 -0.058124 -0.145198 0.070631 -0.052749 0.024122 -0.014955 -0.128921 -0.154684 0.059350 -0.082790 0.017489 0.050102 -0.211435 -0.199979 -0.279635 -0.128665 -0.177572 -0.184523 -0.117800 -0.180919 -0.071180 -0.183225 -0.036130 -0.026228 -0.076075 -0.000932 -0.195691 -0.145029 0.133778 -0.057217 -0.034427 -0.035964 -0.080485 -0.126296 -0.173102 -0.193967 
-0.245837 -0.058266 -0.135580 -0.068270 0.012207 -0.207528 -0.086007 -0.137100 -0.061534 0.058701 -0.039873 -0.118549 0.068637 -0.020781 -0.037527 -0.098883 -0.007877 -0.080812 -0.118788 -0.104510 -0.094973 -0.050723 -0.075839 -0.050990 -0.022879 -0.070608 -0.103792 -0.038852 -0.118345 0.072276 -0.100905 -0.060607 -0.034271 -0.111141 -0.161607 -0.096421 -0.093272 -0.071481 -0.065006 -0.063472 -0.117289 -0.105645 -0.107677 -0.052738 -0.144368 -0.198796 -0.089154 -0.131919 -0.131590 0.007530 -0.119763 -0.132352 -0.112089 -0.042553 -0.042068 -0.210420 -0.009216
-0.486091
0.064477 0.081504 0.026874 0.044634 0.037261 0.056560 0.015655 -0.034844 0.028678 0.007178 -0.043457 0.036630 -0.011737 0.036687 -0.011747 -0.011310 0.036552 0.047534 0.008210 0.001234 0.046964 0.102268 0.025401 0.077903 0.067433 0.066706 0.044558 -0.082250 -0.008746 -0.029429 0.009296 -0.032597 0.020852 0.014712 0.037189 0.021274 0.008153 0.005058 -0.031590 -0.047687 -0.026414 -0.036252 0.029871 0.043835 0.054545 0.040781 0.044192 -0.034354 0.040529 0.009457 0.067047 0.013326 0.007208 0.077851 0.010252 -0.013107 -0.024258 -0.030598 0.025934 0.018936 0.015891 0.053638 0.090974 0.078377 0.029254 0.044463 0.067352 -0.086567 0.110640 0.016358 -0.008995 0.011897 -0.029884 0.079599 0.011748 -0.116792 0.017392 -0.041552 0.022211 0.030854 0.076297 0.138927 0.096433 0.117447 0.069136 0.062304 0.044012 -0.105353 0.050478 -0.049578 -0.016938 0.015190 0.045340 0.072201 0.005880 -0.063869 0.029111 -0.015537 -0.035881 -0.048838 0.002865 0.091267 0.096272 0.122886 0.021172 0.079588 -0.025038 -0.075315 0.107730 0.001750 0.017535 0.014763 0.033558 0.060733 0.023844 -0.139122 -0.021014 -0.022912 0.043800 -0.069368 -0.199700 -0.222440 -0.027857 0.083226 -0.058485 0.035489 0.035407 -0.129288 -0.064453 -0.061740 -0.147288 -0.223603 -0.033822 0.120373 0.144032 -0.310629 0.078233 0.220361 0.117385 0.075370 0.056549 0.122092 0.085682 0.111980 0.077644 0.015114 0.054596 -0.018600 0.112969 -0.071757 -0.069830 -0.050383 -0.002183 0.063235 0.031887 -0.059578 0.002797 -0.082785 -0.037662 -0.087120 0.057080 0.106612 0.150975 0.057186 0.052134 0.131403 0.087739 -0.049076 0.078160 -0.083373 -0.009038 -0.012009 0.026432 0.072579 0.015223 -0.063027 -0.026930 -0.091203 -0.044165 -0.024702 0.057721 0.031649 0.066977 0.077464 0.092783 0.043605 0.089750 -0.082724 0.092782 -0.002469 -0.018213 0.038463 0.039650 0.083644 0.057403 -0.102030 0.016508 -0.033885 -0.049141 -0.037024 0.050683 0.027857 0.095785 0.100878 0.028308 0.032504 0.039158 -0.099253 0.089059 0.019542 0.029946 0.009274 0.027114 0.043419 
0.039742 -0.071644 -0.044944 -0.082925 -0.015797 -0.067424 0.044159 0.084274 0.085030 0.017850 0.029234 0.025825 0.053502 -0.116168 0.042321 -0.057481 -0.052808 -0.016987 -0.004995 0.040141 0.047972 -0.027220 0.000804 0.005358 0.017293 -0.045098 0.014360 -0.008490 0.061663 0.046281 0.073928 0.035608 0.024990 -0.068028 0.086265 0.059148 -0.002630 0.042496 -0.001227 0.023778 0.052048 -0.013567 -0.013077 -0.037425 0.062453 -0.030843
-0.022806
-0.155360 -0.066650 -0.119927 -0.094214 -0.114445 -0.091846 -0.136155 -0.172902 -0.115423 -0.160518 -0.114837 -0.079660 -0.019202 -0.060908 -0.085990 -0.117745 -0.046584 -0.141824 -0.246103 -0.279503 -0.143058 -0.169634 -0.182320 -0.129732 -0.048653 -0.169455 -0.076733 -0.169466 -0.184966 -0.169890 0.002180 -0.172955 0.013730 -0.028397 0.004138 -0.042734 -0.116962 -0.134944 -0.167129 -0.125596 -0.168811 -0.282752 -0.171291 -0.168392 -0.172167 -0.202640 -0.248134 -0.217014 -0.086144 -0.001537 -0.083492 -0.157991 -0.029741 -0.220335 -0.097208 0.097354 0.015144 0.103367 -0.112372 -0.098831 -0.201880 -0.080104 -0.209565 -0.232628 -0.157016 -0.164620 -0.111527 -0.294555 -0.119790 -0.125459 -0.089252 -0.212894 -0.100429 -0.203062 -0.114994 0.192756 -0.069508 0.071803 -0.210157 -0.097825 -0.252878 -0.145768 -0.091934 -0.213814 -0.210380 -0.186967 -0.166408 -0.179349 -0.252828 -0.089035 -0.004230 -0.036768 -0.091384 -0.195063 -0.191896 -0.062071 -0.138017 -0.309404 0.031256 -0.004515 -0.216068 -0.256101 -0.202169 -0.267218 -0.190781 -0.231367 -0.115307 -0.060081 -0.188930 0.060493 -0.034551 -0.071494 -0.057059 -0.239721 -0.107063 0.087876 -0.059604 0.019725 -0.202176 -0.093566 -0.181354 -0.264459 -0.295692 -0.214066 -0.370961 -0.338641 -0.316944 0.234029 0.414957 0.120721 0.366519 0.432901 -0.568992 -0.149865 -0.162368 1.018977 0.081562 0.078209 0.118584 0.154448 -0.235470 -0.204636 -0.331409 -0.306707 -0.197640 -0.073851 -0.143376 0.185353 -0.169884 -0.321067 -0.066749 -0.286210 -0.022574 -0.054040 -0.122530 -0.061010 0.077675 0.097011 0.021096 0.015353 -0.124458 -0.230960 -0.206823 -0.169983 -0.187876 -0.244156 -0.240031 -0.063724 -0.267759 0.033642 -0.073610 -0.017757 -0.012209 -0.181368 -0.235108 0.063289 -0.153981 -0.050173 -0.037459 -0.267356 -0.171665 -0.417932 -0.273538 -0.254723 -0.247825 -0.224480 -0.193678 -0.145072 -0.256420 -0.011551 -0.077899 -0.105522 -0.025237 -0.228389 -0.181261 0.186442 -0.004426 0.027185 -0.116652 -0.084278 -0.121933 -0.164385 -0.178147 
-0.323689 -0.137000 -0.203485 -0.119417 -0.211980 -0.209841 -0.076433 -0.108906 -0.116705 0.063359 -0.044561 -0.147860 0.125350 -0.038749 -0.037735 -0.218418 -0.003540 0.012043 -0.084678 -0.050313 -0.171633 -0.080256 -0.104211 -0.048266 -0.235514 -0.091574 -0.171429 -0.070577 -0.144370 0.069851 -0.090666 -0.099082 -0.005035 -0.135922 -0.234739 -0.190181 -0.174084 -0.045712 -0.082394 -0.066834 -0.160113 -0.146583 -0.165018 -0.069594 -0.271854 -0.228822 -0.068588 -0.170662 -0.124861 -0.048595 -0.175959 -0.146380 -0.140901 -0.029176 -0.105962 -0.275622 -0.098183
-0.400644
0.025863 0.037665 -0.000525 0.014878 0.021414 0.030382 0.009428 -0.061600 -0.020822 -0.012624 -0.047037 -0.018928 -0.000238 0.008169 -0.024889 -0.029390 -0.004440 -0.019274 -0.016174 -0.047775 0.011759 0.029397 0.023862 0.021848 0.033655 0.024533 0.016207 -0.043899 -0.034766 -0.022761 -0.025268 -0.021901 0.021216 -0.000587 -0.008505 -0.023175 -0.015196 -0.043818 -0.042678 -0.053814 -0.023551 -0.057142 0.012955 -0.000702 0.009950 0.010907 0.012455 -0.037645 -0.018585 -0.021604 -0.001600 0.007529 0.004206 0.035953 -0.015788 -0.024328 -0.014530 -0.024328 -0.000313 -0.024849 -0.010675 -0.025084 0.016239 0.018910 -0.000875 0.004668 0.005844 -0.039350 0.023010 -0.001101 -0.019054 -0.012512 -0.002170 0.026797 -0.021990 -0.066065 -0.011275 -0.054704 0.006118 -0.013365 -0.001913 0.011002 -0.005205 0.034078 0.014057 0.015080 0.014525 -0.029288 0.011781 -0.012597 -0.031152 0.020710 0.017857 0.013660 -0.025139 -0.066415 -0.000978 -0.050104 -0.022322 -0.059622 -0.043285 -0.012837 0.013283 0.027956 -0.011257 0.025595 -0.026572 -0.011584 0.013659 0.021266 0.003634 -0.015803 0.010474 0.004674 -0.015586 -0.083654 -0.032707 -0.034903 0.000139 -0.039439 -0.136470 -0.197467 -0.094446 -0.003130 -0.072347 -0.006393 -0.032628 -0.049245 -0.045201 -0.003011 -0.023094 -0.049691 -0.053097 0.046527 0.076725 -0.027360 0.041150 0.089747 0.073798 0.036583 -0.007736 0.013534 0.007252 0.030825 0.011398 0.012524 0.005745 0.012027 0.024996 -0.023544 -0.041737 -0.042142 0.004928 0.029623 -0.024135 -0.066536 -0.021604 -0.056537 -0.013476 -0.086867 -0.017341 0.011027 0.027187 0.018723 0.018834 0.041114 0.034509 -0.053383 0.016396 -0.014569 -0.030306 -0.013292 0.012070 0.028589 -0.025140 -0.050459 -0.032340 -0.071067 -0.031607 -0.066966 -0.019422 -0.045531 -0.001855 0.011790 0.020760 0.003760 0.002608 -0.036819 0.018927 0.000181 -0.014118 -0.001634 0.007490 0.013175 0.000599 -0.047058 -0.000328 -0.037104 -0.022015 -0.032448 -0.020274 -0.064078 -0.002927 0.019071 -0.000378 0.006713 -0.004564 -0.062295 
0.007087 0.001102 -0.012645 -0.003796 -0.002150 0.021510 0.016576 -0.041921 -0.017699 -0.022329 -0.011881 -0.040682 -0.003955 0.029841 0.024216 -0.007419 -0.002425 0.008207 0.014743 -0.079612 0.010590 -0.035174 -0.051610 -0.032677 0.005272 0.000700 0.006142 -0.043504 -0.027036 -0.020666 -0.002434 -0.041886 -0.008996 -0.009557 0.022626 0.018796 0.035262 0.009018 0.003357 -0.071315 0.014485 0.007422 -0.028467 -0.017604 0.001882 -0.011179 -0.011990 -0.032207 -0.023198 -0.041724 -0.006724 -0.037791
1.505785
-2.427554 -0.948803 -1.203377 0.899563 4.836003 -0.879909 -0.390292 1.529532 -3.375759 -2.322754 1.113470 -0.850439 -1.474451 0.773085 -0.864312 -0.167740 -0.860402 0.443572 -1.095174 0.112439
-1.503021
2.424830 0.944486 1.203099 -0.894367 -4.835998 0.880314 0.384920 -1.533527 3.375663 2.326251 -1.114913 0.844994 1.476537 -0.774986 0.871945 0.166729 0.860142 -0.441613 1.096362 -0.111924
""")

    WEIGHTS.close()

    os.system("touch WEIGHTS_READY")

# coloring scripts

vars['pymol'] = "pymol.py"
vars['pymol_CBS'] = "pymol_CBS.py"

# write pymol script
PYMOL = open(vars['working_dir'] + vars['pymol'], 'w')
PYMOL.write("""

# Define a Python subroutine to colour atoms by B-factor, using predefined intervals


def colour_consurf(selection="all"):

    # Colour other chains gray, while maintaining
    # oxygen in red, nitrogen in blue and hydrogen in white
    cmd.color("gray", selection)
    cmd.util.cnc()

    # These are constants
    minimum = 0.0
    maximum = 9.0
    n_colours = 9
    # Colours are calculated by dividing the RGB colours by 255
    # RGB = [[10,125,130],[75,175,190],[165,220,230],[215,240,240],[255,255,255],
    #        [250,235,245],[250,200,220],[240,125,170],[160,40,95]]
    colours = [
                [0.039215686, 0.490196078, 0.509803922],
                [0.294117647, 0.68627451, 0.745098039],
                [0.647058824, 0.862745098, 0.901960784],
                [0.843137255, 0.941176471, 0.941176471],
                [1, 1, 1],
                [0.980392157, 0.921568627, 0.960784314],
                [0.980392157, 0.784313725, 0.862745098],
                [0.941176471, 0.490196078, 0.666666667],
                [0.62745098, 0.156862745, 0.37254902]]
    bin_size = (maximum - minimum) / n_colours

    # Loop through colour intervals
    for i in range(n_colours):

        lower = minimum + (i + 1) * bin_size
        upper = lower + bin_size
        colour = colours[i]

        # Print out B-factor limits and the colour for this group
        print(lower, " - ", upper, " = ", colour)

        # Define a unique name for the atoms which fall into this group
        group = selection + "_group_" + str(i + 1)

        # Compose a selection command which will select all atoms which are
        #	a) in the original selection, AND
        #	b) have B factor in range lower <= b < upper
        sel_string = selection + " & ! b < " + str(lower)

        if(i < n_colours):
            sel_string += " & b < " + str(upper)
        else:
            sel_string += " & ! b > " + str(upper)

        # Select the atoms
        cmd.select(group, sel_string)

        # Create a new colour
        colour_name = "colour_" + str(i + 1)
        cmd.set_color(colour_name, colour)

        # Colour them
        cmd.color(colour_name, group)


    # Create new colour for insufficient sequences
    # RGB_colour = [255,255,150]
    insuf_colour = [1, 1, 0.588235294]
    cmd.set_color("insufficient_colour", insuf_colour)

    # Colour atoms with B-factor of 10 using the new colour
    cmd.select("insufficient", selection + " & b = 10")
    cmd.color("insufficient_colour", "insufficient")




# Make command available in PyMOL
cmd.extend("colour_consurf", colour_consurf)
colour_consurf()

# Hide external molecules (hetatm selection)
cmd.select("het_atoms", "hetatm")
cmd.deselect()
cmd.hide(selection="het_atoms")
cmd.show(selection="all",representation="cartoon")

# Save PyMOL session to pse file
cmd.save("consurf_pymol_session.pse")

""")

PYMOL.close()
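
# Illustration only (hypothetical helper, not called by the pipeline): a minimal
# sketch of the B-factor bins that the pymol script above builds, assuming
# ConSurf grades 1-9 are stored in the B-factor column and 10 marks
# insufficient data.
def _sketch_consurf_bins(minimum=0.0, maximum=9.0, n_colours=9):
    """Return (group_index, lower, upper) for each colour bin, where a bin
    selects atoms with lower <= b < upper."""
    bin_size = (maximum - minimum) / n_colours
    bins = []
    for i in range(n_colours):
        lower = minimum + (i + 1) * bin_size
        bins.append((i + 1, lower, lower + bin_size))
    return bins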

# write pymol color blind script
PYMOL_CBS = open(vars['working_dir'] + vars['pymol_CBS'], 'w')
PYMOL_CBS.write("""

# Define a Python subroutine to colour atoms by B-factor, using predefined intervals


def colour_consurf_CBS(selection="all"):

    # Colour other chains gray, while maintaining
    # oxygen in red, nitrogen in blue and hydrogen in white
    cmd.color("gray", selection)
    cmd.util.cnc()

    # These are constants
    minimum = 0.0
    maximum = 9.0
    n_colours = 9
    # Colours are calculated by dividing the RGB colours by 255
    # RGB = [[15,90,35],[90,175,95],[165,220,160],[215,240,210],[255,255,255],
    #        [230,210,230],[195,165,205],[155,110,170],[120,40,130]]
    colours = [
                [0.058823529, 0.352941176, 0.137254902],
                [0.352941176, 0.68627451, 0.37254902],
                [0.647058824, 0.862745098, 0.62745098],
                [0.843137255, 0.941176471, 0.823529412],
                [1, 1, 1],
                [0.901960784, 0.823529412, 0.901960784],
                [0.764705882, 0.647058824, 0.803921569],
                [0.607843137, 0.431372549, 0.666666667],
                [0.470588235, 0.156862745, 0.509803922]]
    bin_size = (maximum - minimum) / n_colours

    # Loop through colour intervals
    for i in range(n_colours):

        lower = minimum + (i + 1) * bin_size
        upper = lower + bin_size
        colour = colours[i]

        # Print out B-factor limits and the colour for this group
        print(lower, " - ", upper, " = ", colour)

        # Define a unique name for the atoms which fall into this group
        group = selection + "_group_" + str(i + 1)

        # Compose a selection command which will select all atoms which are
        #	a) in the original selection, AND
        #	b) have B factor in range lower <= b < upper
        sel_string = selection + " & ! b < " + str(lower)

        if(i < n_colours):
            sel_string += " & b < " + str(upper)
        else:
            sel_string += " & ! b > " + str(upper)

        # Select the atoms
        cmd.select(group, sel_string)

        # Create a new colour
        colour_name = "colour_" + str(i + 1)
        cmd.set_color(colour_name, colour)

        # Colour them
        cmd.color(colour_name, group)


    # Create new colour for insufficient sequences
    # RGB_colour = [255,255,150]
    insuf_colour = [1, 1, 0.588235294]
    cmd.set_color("insufficient_colour", insuf_colour)

    # Colour atoms with B-factor of 10 using the new colour
    cmd.select("insufficient", selection + " & b = 10")
    cmd.color("insufficient_colour", "insufficient")




# Make command available in PyMOL
cmd.extend("colour_consurf_CBS", colour_consurf_CBS)
colour_consurf_CBS()

# Hide external molecules (hetatm selection)
cmd.select("het_atoms", "hetatm")
cmd.deselect()
cmd.hide(selection="het_atoms")
cmd.show(selection="all",representation="cartoon")

# Save PyMOL session to pse file
cmd.save("consurf_CBS_pymol_session.pse")

""")

PYMOL_CBS.close()

# get fonts
if not os.path.isfile("FONTS_READY"):

    print("Getting fonts.")
    os.system("wget \"https://sourceforge.net/projects/dejavu/files/dejavu/2.37/dejavu-fonts-ttf-2.37.tar.bz2\" -O \"FONTS.tar.bz2\"")
    FONTS = tarfile.open("FONTS.tar.bz2")
    FONTS.extractall()
    FONTS.close()
    os.system("touch FONTS_READY")
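
# Illustration only (hypothetical helper, not called by the pipeline): a minimal
# sketch of the marker-file pattern used by the setup steps in this cell. Each
# step runs once, then leaves an empty *_READY file so that re-running the cell
# skips it. Unlike the steps here, the sketch writes the marker only if the
# action completes without raising.
import os

def _sketch_run_once(marker, action):
    """Run action() unless the marker file exists; return True if it ran."""
    if os.path.isfile(marker):
        return False
    action()
    with open(marker, "w"):
        pass  # same effect as "touch <marker>"
    return True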

# install py3dmol
if not os.path.isfile("PY3DMOL_READY"):

    print("Installing py3dmol.")
    os.system("pip install py3Dmol")
    os.system("touch PY3DMOL_READY")

# install prank
if not os.path.isfile("PRANK_READY"):

    print("Installing prank.")
    os.system("apt-get -qq install -y prank")
    os.system("touch PRANK_READY")

# install clustalw
if not os.path.isfile("CLUSTALW_READY"):

    print("Installing CLUSTALW.")
    os.system("apt-get -qq install -y clustalw")
    os.system("touch CLUSTALW_READY")

# install prottest
if not os.path.isfile("PROTTEST_READY"):

    print("Installing prottest.")
    #os.system("git clone https://github.com/ddarriba/prottest3.git")
    import urllib.request  # "import urllib" alone does not guarantee the submodule is loaded
    response = urllib.request.urlopen("https://github.com/ddarriba/prottest3/releases/download/3.4.2-release/prottest-3.4.2-20160508.tar.gz").read()

    prottest_zip = "prottest.tar.gz"
    with open(prottest_zip, 'wb') as PROTTEST_ZIP:
        PROTTEST_ZIP.write(response)

    PROTTEST_EXTRACT = tarfile.open(prottest_zip)
    PROTTEST_EXTRACT.extractall()
    PROTTEST_EXTRACT.close()
    os.system("touch PROTTEST_READY")

# install biopython
if not os.path.isfile("BIOPYTHON_READY"):

    print("Installing biopython.")
    os.system("pip install biopython")
    os.system("touch BIOPYTHON_READY")

# install fpdf
if not os.path.isfile("FPDF_READY"):

    print("Installing fpdf.")
    os.system("pip install fpdf")
    os.system("touch FPDF_READY")

# install muscle
if not os.path.isfile("MUSCLE_READY"):

    print("Installing muscle.")
    os.system("apt-get -qq install -y muscle")
    os.system("touch MUSCLE_READY")

# install mafft
if not os.path.isfile("MAFFT_READY"):

    print("Installing mafft.")
    os.system("apt-get -qq install -y mafft")
    os.system("touch MAFFT_READY")

# install rate4site
if not os.path.isfile("RATE4SITE_READY"):

    print("Installing rate4site.")

    # create directory for rate4site
    os.system("git clone https://github.com/barakav/r4s_for_collab.git")

    # create directory for rate4site slow
    shutil.copytree(vars['rate4site_dir'], vars['rate4site_slow_dir'])

    # make rate4site
    try:

        os.chdir(vars['rate4site_dir'])
        os.system("make")
        os.chdir(vars['root_dir'])

    except Exception as e:

        print(e)
        os.chdir(vars['root_dir'])
        raise RuntimeError("Installing rate4site failed.")

    # change the make file and make rate4site slow
    try:

        os.chdir(vars['rate4site_slow_dir'])
        os.remove("Makefile") # delete regular file
        os.rename("Makefile_slow", "Makefile") # change to slow file
        os.system("make")
        os.chdir(vars['root_dir'])

    except Exception as e:

        print(e)
        os.chdir(vars['root_dir'])
        raise RuntimeError("Installing rate4site slow failed.")

    os.system("touch RATE4SITE_READY")


# install cd-hit
if not os.path.isfile("CDHIT_READY"):

    print("Installing cd-hit.")
    os.system("git clone https://github.com/weizhongli/cdhit.git")
    """
    response = urllib.request.urlopen("https://github.com/weizhongli/cdhit/archive/refs/heads/master.zip").read()

    cd_hit_zip = "cd-hit.zip"
    CD_HIT_ZIP = open(cd_hit_zip, 'wb')
    CD_HIT_ZIP.write(response)
    CD_HIT_ZIP.close()

    CD_HIT_EXTRACT = ZipFile(cd_hit_zip, 'r')
    CD_HIT_EXTRACT.extractall()
    CD_HIT_EXTRACT.close()
    """
    try:

        os.chdir("cdhit")
        os.system("make")
        os.chdir(vars['root_dir'])

    except Exception as e:

        print(e)
        os.chdir(vars['root_dir'])
        raise RuntimeError("Installing cd-hit failed.")

    os.system("touch CDHIT_READY")
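
# Illustration only (hypothetical helper, not called by the pipeline): os.system
# returns the exit status instead of raising, so the try/except blocks above do
# not see a failed "make". A minimal sketch of a checked alternative, assuming
# the standard-library subprocess module is acceptable here:
import subprocess

def _sketch_run_checked(command, cwd=None):
    """Run a shell command in cwd and raise RuntimeError on a non-zero exit."""
    result = subprocess.run(command, shell=True, cwd=cwd)
    if result.returncode != 0:
        raise RuntimeError("Command failed (%d): %s" % (result.returncode, command))
    return result.returncode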

# install colabfold
if not os.path.isfile("COLABFOLD_READY"):

    print("Installing colabfold.")
    os.system("pip install -q --no-warn-conflicts 'colabfold[alphafold-minus-jax] @ git+https://github.com/sokrypton/ColabFold'")
    os.system("pip install --upgrade dm-haiku")
    os.system("ln -s /usr/local/lib/python3.*/dist-packages/colabfold colabfold")
    os.system("ln -s /usr/local/lib/python3.*/dist-packages/alphafold alphafold")
    # patch for jax > 0.3.25
    os.system("sed -i 's/weights = jax.nn.softmax(logits)/logits=jnp.clip(logits,-1e8,1e8);weights=jax.nn.softmax(logits)/g' alphafold/model/modules.py")
    os.system("pip install -q biopython==1.81")
    os.system("touch COLABFOLD_READY")

# create matrix for pairwise alignment
if not os.path.isfile("MATRIX_READY"):

    MATRIX = open("matrix.txt", 'w')
    MATRIX.write("""#  Matrix made by matblas from blosum62.iij
#  * column uses minimum score
#  BLOSUM Clustered Scoring Matrix in 1/2 Bit Units
#  Blocks Database = /data/blocks_5.0/blocks.dat
#  Cluster Percentage: >= 62
#  Entropy =   0.6979, Expected =  -0.5209
     A    R    N    D    C    Q    E    G    H    I    L    K    M    F    P    S    T    W    Y    V    B    Z    X    *
A  4.0 -1.0 -2.0 -2.0  0.0 -1.0 -1.0  0.0 -2.0 -1.0 -1.0 -1.0 -1.0 -2.0 -1.0  1.0  0.0 -3.0 -2.0  0.0 -2.0 -1.0  0.0 -4.0
R -1.0  5.0  0.0 -2.0 -3.0  1.0  0.0 -2.0  0.0 -3.0 -2.0  2.0 -1.0 -3.0 -2.0 -1.0 -1.0 -3.0 -2.0 -3.0 -1.0  0.0  0.0 -4.0
N -2.0  0.0  6.0  1.0 -3.0  0.0  0.0  0.0  1.0 -3.0 -3.0  0.0 -2.0 -3.0 -2.0  1.0  0.0 -4.0 -2.0 -3.0  3.0  0.0  0.0 -4.0
D -2.0 -2.0  1.0  6.0 -3.0  0.0  2.0 -1.0 -1.0 -3.0 -4.0 -1.0 -3.0 -3.0 -1.0  0.0 -1.0 -4.0 -3.0 -3.0  4.0  1.0  0.0 -4.0
C  0.0 -3.0 -3.0 -3.0  9.0 -3.0 -4.0 -3.0 -3.0 -1.0 -1.0 -3.0 -1.0 -2.0 -3.0 -1.0 -1.0 -2.0 -2.0 -1.0 -3.0 -3.0  0.0 -4.0
Q -1.0  1.0  0.0  0.0 -3.0  5.0  2.0 -2.0  0.0 -3.0 -2.0  1.0  0.0 -3.0 -1.0  0.0 -1.0 -2.0 -1.0 -2.0  0.0  3.0  0.0 -4.0
E -1.0  0.0  0.0  2.0 -4.0  2.0  5.0 -2.0  0.0 -3.0 -3.0  1.0 -2.0 -3.0 -1.0  0.0 -1.0 -3.0 -2.0 -2.0  1.0  4.0  0.0 -4.0
G  0.0 -2.0  0.0 -1.0 -3.0 -2.0 -2.0  6.0 -2.0 -4.0 -4.0 -2.0 -3.0 -3.0 -2.0  0.0 -2.0 -2.0 -3.0 -3.0 -1.0 -2.0  0.0 -4.0
H -2.0  0.0  1.0 -1.0 -3.0  0.0  0.0 -2.0  8.0 -3.0 -3.0 -1.0 -2.0 -1.0 -2.0 -1.0 -2.0 -2.0  2.0 -3.0  0.0  0.0  0.0 -4.0
I -1.0 -3.0 -3.0 -3.0 -1.0 -3.0 -3.0 -4.0 -3.0  4.0  2.0 -3.0  1.0  0.0 -3.0 -2.0 -1.0 -3.0 -1.0  3.0 -3.0 -3.0  0.0 -4.0
L -1.0 -2.0 -3.0 -4.0 -1.0 -2.0 -3.0 -4.0 -3.0  2.0  4.0 -2.0  2.0  0.0 -3.0 -2.0 -1.0 -2.0 -1.0  1.0 -4.0 -3.0  0.0 -4.0
K -1.0  2.0  0.0 -1.0 -3.0  1.0  1.0 -2.0 -1.0 -3.0 -2.0  5.0 -1.0 -3.0 -1.0  0.0 -1.0 -3.0 -2.0 -2.0  0.0  1.0  0.0 -4.0
M -1.0 -1.0 -2.0 -3.0 -1.0  0.0 -2.0 -3.0 -2.0  1.0  2.0 -1.0  5.0  0.0 -2.0 -1.0 -1.0 -1.0 -1.0  1.0 -3.0 -1.0  0.0 -4.0
F -2.0 -3.0 -3.0 -3.0 -2.0 -3.0 -3.0 -3.0 -1.0  0.0  0.0 -3.0  0.0  6.0 -4.0 -2.0 -2.0  1.0  3.0 -1.0 -3.0 -3.0  0.0 -4.0
P -1.0 -2.0 -2.0 -1.0 -3.0 -1.0 -1.0 -2.0 -2.0 -3.0 -3.0 -1.0 -2.0 -4.0  7.0 -1.0 -1.0 -4.0 -3.0 -2.0 -2.0 -1.0  0.0 -4.0
S  1.0 -1.0  1.0  0.0 -1.0  0.0  0.0  0.0 -1.0 -2.0 -2.0  0.0 -1.0 -2.0 -1.0  4.0  1.0 -3.0 -2.0 -2.0  0.0  0.0  0.0 -4.0
T  0.0 -1.0  0.0 -1.0 -1.0 -1.0 -1.0 -2.0 -2.0 -1.0 -1.0 -1.0 -1.0 -2.0 -1.0  1.0  5.0 -2.0 -2.0  0.0 -1.0 -1.0  0.0 -4.0
W -3.0 -3.0 -4.0 -4.0 -2.0 -2.0 -3.0 -2.0 -2.0 -3.0 -2.0 -3.0 -1.0  1.0 -4.0 -3.0 -2.0 11.0  2.0 -3.0 -4.0 -3.0  0.0 -4.0
Y -2.0 -2.0 -2.0 -3.0 -2.0 -1.0 -2.0 -3.0  2.0 -1.0 -1.0 -2.0 -1.0  3.0 -3.0 -2.0 -2.0  2.0  7.0 -1.0 -3.0 -2.0  0.0 -4.0
V  0.0 -3.0 -3.0 -3.0 -1.0 -2.0 -2.0 -3.0 -3.0  3.0  1.0 -2.0  1.0 -1.0 -2.0 -2.0  0.0 -3.0 -1.0  4.0 -3.0 -2.0  0.0 -4.0
B -2.0 -1.0  3.0  4.0 -3.0  0.0  1.0 -1.0  0.0 -3.0 -4.0  0.0 -3.0 -3.0 -2.0  0.0 -1.0 -4.0 -3.0 -3.0  4.0  1.0  0.0 -4.0
Z -1.0  0.0  0.0  1.0 -3.0  3.0  4.0 -2.0  0.0 -3.0 -3.0  1.0 -1.0 -3.0 -1.0  0.0 -1.0 -3.0 -2.0 -2.0  1.0  4.0  0.0 -4.0
X  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  4.0 -4.0
* -4.0 -4.0 -4.0 -4.0 -4.0 -4.0 -4.0 -4.0 -4.0 -4.0 -4.0 -4.0 -4.0 -4.0 -4.0 -4.0 -4.0 -4.0 -4.0 -4.0 -4.0 -4.0 -4.0  1.0
""")

    MATRIX.close()

    os.system("touch MATRIX_READY")
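# Illustrative sketch only (hypothetical helper, not called by the pipeline):
# one way the matrix text written above could be read back into a nested
# dictionary. It assumes the layout used above: '#' comment lines, one header
# line of column symbols, then one row per symbol.
def sketch_read_matrix(matrix_text):

    rows = [row for row in matrix_text.splitlines() if row.strip() and not row.startswith("#")]
    columns = rows[0].split()
    scores = {}
    for row in rows[1:]:

        fields = row.split()
        # fields[0] is the row symbol, the rest are the scores for each column
        scores[fields[0]] = dict(zip(columns, map(float, fields[1:])))

    return scores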

os.chdir(vars['job_name'])

print("\n")

from datetime import date
from datetime import datetime
from colabfold.batch import get_msa_and_templates
from Bio import AlignIO
from Bio import SeqIO
from Bio import SearchIO
from Bio import Phylo
from Bio import Align
from Bio.Align import substitution_matrices
from google.colab import files, output
from Bio.Blast import NCBIWWW
import Bio
import fpdf
import math
import py3Dmol
import uuid

def choose_chain(PDB_name):

    # in the case of nmr we choose the first model
    try:

        PDB_FILE = open(PDB_name, 'r')

    except:

        raise Exception("Can't read the PDB file.")

    # later we find the format (PDB/mmCIF) and change the file's name
    temp_pdb = "temp_file.txt"
    try:

        PDB_TEMP = open(temp_pdb, 'w')

    except:

        raise Exception("Error: choose_chain - Can't open file for writing.")

    line = PDB_FILE.readline()
    while line != "":

        if re.match(r'^MODEL\s+2', line):

            # nmr found
            break

        else:

            PDB_TEMP.write(line)

        line = PDB_FILE.readline()

    PDB_TEMP.close()
    PDB_FILE.close()

    # we now find the chains
    PDB_FILE = open(temp_pdb, 'r')



    chains = []
    last_chain_found = ""
    chain_index = 0
    nmr_struct = False

    # for mmCIF format
    found_auth_comp_id_column = False
    found_auth_asym_id_column = False
    found_label_comp_id_column = False
    found_label_asym_id_column = False
    found_column_numbers = False
    auth_comp_id_column = 0
    auth_asym_id_column = 0
    label_comp_id_column = 0
    label_asym_id_column = 0

    line = PDB_FILE.readline()
    while line != "":

        if line[:4] == "ATOM":

            if (found_auth_comp_id_column or found_label_comp_id_column) and (found_auth_asym_id_column or found_label_asym_id_column):

                # mmCIF format
                found_column_numbers = True
                words = line.split()
                num_columns = len(words)
                if found_auth_comp_id_column:

                    acid = words[num_columns - auth_comp_id_column]

                else:

                    acid = words[num_columns - label_comp_id_column]

                if len(acid) == 3:

                    # it's an amino acid
                    if found_auth_asym_id_column:

                        chain = words[num_columns - auth_asym_id_column]

                    else:

                        chain = words[num_columns - label_asym_id_column]

                    if chain != last_chain_found:

                        chains.append(chain)
                        last_chain_found = chain

            else:

                # PDB format
                acid = line[17:20]
                if len(acid) == 3:

                    # it's an amino acid
                    chain = line[21:22]
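                    if chain == " ":

                        # blank chain ID: store it as "NONE" (the list may still be empty)
                        if chains:

                            chains[0] = "NONE"

                        else:

                            chains.append("NONE")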

                    elif chain != last_chain_found:

                        chains.append(chain)
                        last_chain_found = chain

        # ATOM not found
        # the format maybe mmCIF
        # in this case we need to know what each column means
        elif line.strip() == "_atom_site.auth_comp_id":

            found_auth_comp_id_column = True

        elif line.strip() == "_atom_site.auth_asym_id":

            found_auth_asym_id_column = True

        elif line.strip() == "_atom_site.label_comp_id":

            found_label_comp_id_column = True

        elif line.strip() == "_atom_site.label_asym_id":

            found_label_asym_id_column = True

        if found_auth_comp_id_column and not found_column_numbers:

            auth_comp_id_column += 1

        if found_auth_asym_id_column and not found_column_numbers:

            auth_asym_id_column += 1

        if found_label_comp_id_column and not found_column_numbers:

            label_comp_id_column += 1

        if found_label_asym_id_column and not found_column_numbers:

            label_asym_id_column += 1

        line = PDB_FILE.readline()

    # we found the chains
    # we change the file name according to the format
    if found_column_numbers:

        vars['cif_or_pdb'] = "cif"
        vars['pdb_file_name'] = "file.cif"

    else:

        vars['cif_or_pdb'] = "pdb"
        vars['pdb_file_name'] = "pdb_file.ent"

    os.rename(temp_pdb, vars['pdb_file_name'])
    return chains
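# Illustrative sketch only (hypothetical helper, not called by the pipeline):
# the PDB branch above relies on fixed columns, with the residue name in
# line[17:20] and the chain ID in line[21:22]. This simplified version
# collects each chain ID once, in order of first appearance, and maps a blank
# chain ID to "NONE". Unlike choose_chain, it does not handle mmCIF or NMR models.
def sketch_pdb_chains(lines):

    chains = []
    for line in lines:

        if line[:4] == "ATOM" and len(line) > 21:

            chain = line[21:22]
            if chain == " ":

                chain = "NONE"

            if chain not in chains:

                chains.append(chain)

    return chains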


def check_msa_tree_match(ref_msa_seqs, ref_tree_nodes):

    for node in ref_tree_nodes:

        if not node in ref_msa_seqs:

            raise Exception("The node %s is in the tree and not in the MSA." %node)

    for seq_name in ref_msa_seqs: #check that all the msa nodes are in the tree

        if not seq_name in ref_tree_nodes:

            raise Exception("The sequence %s is in the msa and not in the tree" %seq_name)

    vars['unique_seqs'] = len(ref_msa_seqs)
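# Illustrative sketch only (hypothetical helper, not called by the pipeline):
# the two loops above stop at the first mismatch. Set differences report every
# mismatched name at once, which may be easier to debug for large inputs.
def sketch_msa_tree_mismatches(msa_names, tree_names):

    only_in_tree = sorted(set(tree_names) - set(msa_names))
    only_in_msa = sorted(set(msa_names) - set(tree_names))
    return only_in_tree, only_in_msa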

def check_validity_tree_file():

    # checks validity of tree file and returns a list with the names of the nodes
    try:

        TREEFILE = open(vars['tree_file'], 'r')

    except:

        raise Exception("Can't read the tree file.")

    tree = TREEFILE.read()
    TREEFILE.close()
    tree = tree.replace("\n", "")
    if tree[-1] != ';':

        tree += ';'

    leftBrackets = 0
    rightBrackets = 0
    noRegularFormatChar = ""
    nodes = []
    node_name = ""
    in_node_name = False
    in_node_score = False
    for char in tree:

        if char == ':':

            if in_node_name:

                nodes.append(node_name)

            node_name = ""
            in_node_name = False
            in_node_score = True

        elif char == '(':

            leftBrackets += 1

        elif char == ')':

            rightBrackets += 1
            in_node_score = False

        elif char == ',':

            in_node_score = False

        elif char != ';':

            if char in "!@#$^&*~`{}'?<>\\" and not char in noRegularFormatChar:

                noRegularFormatChar += " '" + char + "', "

            if not in_node_score:

                node_name += char
                in_node_name = True

    if leftBrackets != rightBrackets:

        raise Exception("The tree is missing parentheses.")

    if noRegularFormatChar != "":

        raise Exception("The tree contains the following characters " + noRegularFormatChar[:-2])

    return nodes
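# Illustrative sketch only (hypothetical helper, not called by the pipeline):
# a compact way to list the leaf names of a Newick string. It assumes that a
# leaf name directly follows '(' or ',' and contains no parentheses, colons,
# commas, semicolons or white space. Labels of internal nodes follow ')' and
# are therefore skipped. Unlike the function above, it performs no validation.
import re  # already imported at the top of this cell; repeated so the sketch stands alone

def sketch_newick_leaf_names(newick):

    return re.findall(r'[(,]\s*([^():,;\s]+)', newick)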


def check_msa_format():

    MSA = open(vars['user_msa_file_name'], 'r')
    line = MSA.readline()
    while line != "":

        line = line.strip()
        if line == "":

            line = MSA.readline()
            continue

        if line[:4] == "MSF:":

            format = "msf"
            break

        elif line[0] == '>':

            format = "fasta"
            break

        elif line[0] == '#':

            format = "nexus"
            break

        elif line[0] == 'C':

            format = "clustal"
            break

        elif line[0] == 'P':

            format = "gcg"
            break

        else:

            MSA.close()
            raise Exception("Unknown format.")

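    else:

        # the loop ended without a break: the file has no non-empty lines
        MSA.close()
        raise Exception("The MSA file is empty.")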

    MSA.close()
    return format
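# Illustrative sketch only (hypothetical helper, not called by the pipeline):
# the same first-line sniffing written as a lookup table. It takes the first
# non-empty line of the file and returns None when the format is not
# recognised, leaving the error handling to the caller.
def sketch_guess_msa_format(first_line):

    line = first_line.strip()
    if line[:4] == "MSF:":

        return "msf"

    prefixes = {">": "fasta", "#": "nexus", "C": "clustal", "P": "gcg"}
    return prefixes.get(line[:1])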

def multiple_chains(chains):

    print("Please choose a chain from this list:")
    i = 1
    for chain in chains:

        print("%d. %s" %(i, chain))
        i += 1

    while True:

        chain_index = input("Enter the number of the chain.\n")
        if chain_index.isdigit():

            chain_index = int(chain_index)
            if chain_index > 0 and chain_index < i:

                form['PDB_chain'] = chains[chain_index - 1]
                print("You chose the chain %s.\n" %form['PDB_chain'])
                break

        print("Wrong input.")

def upload_PDB():

    # we get the PDB file
    while True:

        PDB_uniprot = input("Do you have a PDB/uniprot ID? (Y/N):\n")
        if PDB_uniprot.upper() == "Y":

            ID = input("Please enter your ID:\n")
            ID = ID.upper().strip()
            form['pdb_ID'] = ID
            if len(ID) == 4:

                # PDB ID
                pdb_url = "https://files.rcsb.org/download/%s.pdb" %ID
                cif_url= "https://files.rcsb.org/download/%s.cif" %ID

            else:

                # uniprot ID
                pdb_url = "https://alphafold.ebi.ac.uk/files/AF-%s-F1-model_v6.pdb" %ID
                cif_url= "https://alphafold.ebi.ac.uk/files/AF-%s-F1-model_v6.cif" %ID

            try:

                response = urllib.request.urlopen(pdb_url).read()

            except:

                try:

                    response = urllib.request.urlopen(cif_url).read()

                except:

                    raise Exception("Could not download model file.")

            PDB_name = "temp_PDB.txt"
            PDB_FILE = open(PDB_name, 'wb')
            PDB_FILE.write(response)
            PDB_FILE.close()
            vars['Used_PDB_Name'] = vars['job_name'] + "_" + ID
            print()
            break

        elif PDB_uniprot.upper() == "N":

            print("Please upload your model.\n")
            PDB_file = files.upload()
            PDB_name = (list(PDB_file))[0]
            vars['Used_PDB_Name'] = PDB_name.replace(" ", "")
            match = re.search(r'(\S+)\.', vars['Used_PDB_Name'])
            if match:

                vars['Used_PDB_Name'] = vars['job_name'] + "_" + match.group(1)

            break

        else:

            print("Wrong input.")

    # the user must choose a chain
    chains = choose_chain(PDB_name)
    if not chains:

        raise Exception("No protein chains were found in the model file.")

    if len(chains) == 1:

        form['PDB_chain'] = chains[0]
        print("The PDB has only one chain %s\n" %form['PDB_chain'])

    else:

        multiple_chains(chains)

    os.rename(PDB_name, vars['pdb_file_name'])

    if form['PDB_chain'] != "NONE":

        vars['Used_PDB_Name'] += "_" + form['PDB_chain']

def upload_sequence():

    vars['query_string'] = "Input_seq"
    vars['SEQRES_seq'] = ""
    vars['ATOM_without_X_seq'] = ""
    while True:

        vars['protein_seq_string'] = input("Please enter your sequence.\n")
        if '>' in vars['protein_seq_string']:

            print("You should upload only one sequence, without the sequence name. If you want to upload an MSA, use the correct mode.")

        else:

            break

    # delete sequence name and white spaces
    #vars['protein_seq_string'] = re.sub(r'>\S*', "", vars['protein_seq_string'])
    vars['protein_seq_string'] = re.sub(r'\s', "", vars['protein_seq_string'])
    print()
    if re.match(r'^[actguACTGUNn]+$', vars['protein_seq_string']):

        raise Exception("Your sequence is composed only of nucleotides (i.e., A, T, C, G).")

def upload_MSA():

    print("Please upload your MSA.")
    MSA_file = files.upload()
    MSA_name = (list(MSA_file))[0]

    vars['user_msa_file_name'] = MSA_name
    format = check_msa_format()
    alignment = AlignIO.read(MSA_name, format)

    MSA_FASTA = open(vars['msa_fasta'], 'w')
    seq_names = []
    seqs = []
    num_of_seq = 0
    for record in alignment:

        num_of_seq += 1

        seq_name = record.id
        seq = record.seq

        # we save the sequences and their name for two reasons
        # to let the user choose the query
        # to check if they match the tree, if there is one.
        seq_names.append(seq_name)
        seqs.append(seq)

        # we write the msa in fasta format
        MSA_FASTA.write(">%s\n%s\n" %(seq_name, seq))

    MSA_FASTA.close()

    if num_of_seq < 5:

        raise Exception("There are %d sequences in the msa. There must be at least five." %num_of_seq)

    vars['unique_seqs'] = num_of_seq
    vars['final_number_of_homologoues'] = num_of_seq

    print("\nThe names of the sequences are:\n")
    i = 1
    for seq_name in seq_names:

        print("%d. %s" %(i, seq_name))
        i += 1

    while True:

        query_number = input("\nEnter the number of the query.\n")
        if query_number.isdigit():

            query_number = int(query_number)
            if query_number > 0 and query_number < i:

                vars['query_string'] = seq_names[query_number - 1]
                vars['protein_seq_string'] = seqs[query_number - 1]
                vars['protein_seq_string'] = vars['protein_seq_string'].replace("-", "")
                vars['protein_seq_string'] = vars['protein_seq_string'].upper()
                break

        print("Wrong input.")

    print("Chosen query is %s\n" %vars['query_string'])

    try:

        MSA_FASTA = open(vars['user_msa_file_name'], 'w')

    except:

        raise Exception("Can't open file for writing")

    for i in range(len(seq_names)):

        MSA_FASTA.write(">%s\n%s\n" %(seq_names[i], seqs[i]))

    MSA_FASTA.close()

    vars['msa_SEQNAME'] = vars['query_string']

    # include Tree
    if vars['running_mode'] == "_mode_pdb_msa_tree" or vars['running_mode'] == "_mode_msa_tree":

        print("Please upload your tree.")
        Tree_file = files.upload()
        Tree_name = (list(Tree_file))[0]
        os.rename(Tree_name, vars['tree_file'])
        nodes = check_validity_tree_file()
        check_msa_tree_match(seq_names, nodes)
        print()

def create_MSA_parameters():

    # the user may want to change the default parameters
    form['E_VALUE'] = 0.0001
    form['MAX_NUM_HOMOL'] = "150"
    vars['hit_redundancy'] = 95
    form['MSAprogram'] = "MAFFT"
    form['MIN_IDENTITY'] = 35
    form['best_uniform_sequences'] = "sample"
    while True:

        print("\nMSA parameters:")
        print("1. Maximum number of homologs - %s" %form['MAX_NUM_HOMOL'])
        print("2. Maximum redundancy - %d" %vars['hit_redundancy'])
        print("3. MSA building program - %s" %form['MSAprogram'])
        print("4. Minimum identity - %d" %form['MIN_IDENTITY'])
        print("5. Homolog selection - %s" %form['best_uniform_sequences'])
        print("6. E-value cutoff - %.16g" %form['E_VALUE'])
        params = input("Would you like to change? (Y/N):\n")
        if params.upper() == "Y":

            number = input("Enter the number of the field you want to change.\n")
            if number == "1":

                MAX_NUM_HOMOL = input("What should the maximum number of homologs be?\n")
                if MAX_NUM_HOMOL.isdigit() and int(MAX_NUM_HOMOL) > 0:

                    form['MAX_NUM_HOMOL'] = MAX_NUM_HOMOL

                else:

                    print("Wrong input.")

            elif number == "2":

                MAX_REDUNDANCY = input("What should the maximum redundancy be?\n")
                if MAX_REDUNDANCY.isdigit() and int(MAX_REDUNDANCY) > 0:

                    if int(MAX_REDUNDANCY) <= 100:

                        vars['hit_redundancy'] = int(MAX_REDUNDANCY)

                    else:

                        vars['hit_redundancy'] = 99.99999999999

                else:

                    print("Wrong input.")

            elif number == "3":

                print("The programs to build the MSA are:\n1. MAFFT.\n2. MUSCLE.\n3. CLUSTALW.\n4. PRANK.")
                MSAprogram = input("Choose the number of the program.\n")
                if MSAprogram == "1":

                    form['MSAprogram'] = "MAFFT"

                elif MSAprogram == "2":

                    form['MSAprogram'] = "MUSCLE"

                elif MSAprogram == "3":

                    form['MSAprogram'] = "CLUSTALW"

                elif MSAprogram == "4":

                    form['MSAprogram'] = "PRANK"

                else:

                    print("Wrong input.")

            elif number == "4":

                MIN_IDENTITY = input("What should the minimum identity be?\n")
                if MIN_IDENTITY.isdigit() and int(MIN_IDENTITY) > 0:

                    form['MIN_IDENTITY'] = int(MIN_IDENTITY)

                else:

                    print("Wrong input.")

            elif number == "5":

                print("We either sample from the list of homologs (sample) or take the homologs closest to the query (closest).")
                best_uniform = input("What should the homolog selection method be? (sample/closest):\n")
                if best_uniform == "closest":

                    form['best_uniform_sequences'] = "closest"

                elif best_uniform == "sample":

                    form['best_uniform_sequences'] = "sample"

                else:

                    print("Wrong input.")

            elif number == "6":

                print("Sequences with higher e-value will be rejected.")
                try:

                    form['E_VALUE'] = float(input("What should the e-value be?\n"))

                except ValueError:

                    print("Wrong input.")

            else:

                print("Wrong input.")


        elif params.upper() == "N":

            print()
            break

        else:

            print("Wrong input.")
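# Illustrative sketch only (hypothetical helper, not called by the pipeline):
# the numeric fields above repeat the same "read, validate, complain" steps.
# They could share one helper such as this one. The 'read' parameter is an
# assumption added here so the prompt can be replaced in tests; it defaults
# to the built-in input(). Unlike the menu above, it keeps asking until the
# answer is valid.
def sketch_ask_positive_int(prompt, read=input):

    while True:

        answer = read(prompt).strip()
        if answer.isdigit() and int(answer) > 0:

            return int(answer)

        print("Wrong input.")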


mode = "PDB_MSA" #@param ["PDB", "PDB_MSA", "PDB_MSA_Tree", "MSA", "MSA_Tree", "Sequence"]

if mode == "PDB":

    vars['running_mode'] = "_mode_pdb_no_msa"

elif mode == "PDB_MSA":

    vars['running_mode'] = "_mode_pdb_msa"

elif mode == "PDB_MSA_Tree":

    vars['running_mode'] = "_mode_pdb_msa_tree"

elif mode == "MSA":

    vars['running_mode'] = "_mode_msa"

elif mode == "MSA_Tree":

    vars['running_mode'] = "_mode_msa_tree"

elif mode == "Sequence":

    vars['running_mode'] = "_mode_no_pdb_no_msa"
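# Illustrative sketch only (hypothetical names, not used by the pipeline):
# the if/elif chain above is a fixed mapping, so it could also be written as
# a dictionary lookup. Note one difference: an unknown mode raises KeyError
# here, whereas the chain above silently leaves vars['running_mode'] unset.
SKETCH_RUNNING_MODES = {
    "PDB": "_mode_pdb_no_msa",
    "PDB_MSA": "_mode_pdb_msa",
    "PDB_MSA_Tree": "_mode_pdb_msa_tree",
    "MSA": "_mode_msa",
    "MSA_Tree": "_mode_msa_tree",
    "Sequence": "_mode_no_pdb_no_msa",
}

def sketch_running_mode(selected_mode):

    return SKETCH_RUNNING_MODES[selected_mode]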

vars['tree_file'] = vars['job_name'] + "_Tree.txt"
vars['msa_fasta'] = vars['job_name'] + "_msa_fasta.aln" # msa copy in fasta format
"""
if vars['running_mode'] == "_mode_pdb_no_msa" or vars['running_mode'] == "_mode_pdb_msa" or vars['running_mode'] == "_mode_pdb_msa_tree":

    upload_PDB()

elif vars['running_mode'] == "_mode_no_pdb_no_msa":

    upload_sequence()

if vars['running_mode'] == "_mode_pdb_msa" or vars['running_mode'] == "_mode_msa" or vars['running_mode'] == "_mode_pdb_msa_tree" or vars['running_mode'] == "_mode_msa_tree":

    upload_MSA()

else:

   create_MSA_parameters()
"""
# choose the best substitution model
substitution_model = "Dayhoff" #@param ["choose the best substitution model using prottest","JTT","LG","mtREV","cpREV","WAG","Dayhoff"]
if substitution_model == "choose the best substitution model using prottest":

    form['SUB_MATRIX'] = "BEST"

else:

    form['SUB_MATRIX'] = substitution_model

"""
while True:

    if form['SUB_MATRIX'] == "BEST":

        print("The best substitution model is chosen using prottest.")

    else:

        print("The substitution model is %s." %form['SUB_MATRIX'])

    change_model = input("Would you like to change the substitution model? (Y/N)\n")
    if change_model.upper() == "Y":

        print("There are 6 models. The prottest program can choose the best model.\n1. JTT\n2. LG\n3. mtREV (for mitochondrial proteins)\n4. cpREV (for chloroplast proteins)\n5. WAG\n6. Dayhoff\n7. Use prottest to choose the best model.")
        model_num = input("Please select the number of the option you chose.\n")
        if model_num == "1":

            form['SUB_MATRIX'] = "JTT"

        elif model_num == "2":

            form['SUB_MATRIX'] = "LG"

        elif model_num == "3":

            form['SUB_MATRIX'] = "mtREV"

        elif model_num == "4":

            form['SUB_MATRIX'] = "cpREV"

        elif model_num == "5":

            form['SUB_MATRIX'] = "WAG"

        elif model_num == "6":

            form['SUB_MATRIX'] = "Dayhoff"

        elif model_num == "7":

            form['SUB_MATRIX'] = "BEST"

        else:

            print("Wrong input.")

    elif change_model.upper() == "N":

        print()
        break

    else:

        print("Wrong input.")
"""
# choose the rate4site algorithm: Bayesian or maximum likelihood
rate4site_algorithm = "maximum_likelihood" #@param ["Bayesian","maximum_likelihood"]
if rate4site_algorithm == "Bayesian":

    form['ALGORITHM'] = "Bayes"

else:

    form['ALGORITHM'] = "Maximum"

"""
while True:

    if form['ALGORITHM'] == "Bayes":

        print("The coloring algorithm is Bayesian.")

    else:

        print("The coloring algorithm is maximum likelihood.")

    change_algorithm = input("Would you like to change the coloring algorithm? (Y/N):\n")
    if change_algorithm.upper() == "Y":

        coloring_algorithm = input("Should the algorithm be Bayesian (b) or maximum likelihood (m)? (b/m):\n")
        if coloring_algorithm.upper() == "B":

            form['ALGORITHM'] = "Bayes"

        elif coloring_algorithm.upper() == "M":

            form['ALGORITHM'] = "Maximum"

        else:

            print("Wrong input.")

    elif change_algorithm.upper() == "N":

        break

    else:

        print("Wrong input.")
"""

def print_instructions(pdb_with_grades, cif_pdb, pdb_with_grades_isd = False):

    print("To create a PyMOL image with ConSurf colors:")
    if pdb_with_grades_isd:

        create_download_link(pdb_with_grades_isd, "Download the modified %s file showing insufficient data" %cif_pdb)
        print("or")
        create_download_link(pdb_with_grades, "the %s file hiding insufficient data" %cif_pdb)

    else:

        create_download_link(pdb_with_grades, "Download the modified %s file" %cif_pdb)

    print("which contains ConSurf's color grades.\nDownload the coloring script")
    create_download_link(vars['pymol'], "regular version")
    print("or")
    create_download_link(vars['pymol_CBS'], "color blind")
    print("1) Start the PyMOL program.\n2) Drag the %s file to the PyMOL window.\n3) Drag the PyMOL coloring script to the window.\n" %cif_pdb)


def print_legend(cbs):

    text = """<style>
.scaleRowStrecher .scaleRowStrecherInner{display: flex;}
.scaleRowStrecher .scaleRowStrecherInner .scaleColorRect{width: 50px;text-align: center;line-height: 32px;font-weight: 500;font-size: 20px;}
.scaleRowStrecher .scaleRowStrecherInner .scaleColorRect.white{color: #ffffff;}
.scaleRowStrecher .scaleRowStrecherInner .scaleColorRect span{width: 100%;}
.scaleRowStrecher .label{display: flex;padding-right: 0;}
.scaleRowStrecher .label div{width: 33.33%;text-align: center;font-weight: 500;font-size: 18px}
.scaleRowStrecher .label .leftLabel{text-align: left;}
.scaleRowStrecher .label .rightLabel{text-align: right;}
#consrvScaleDiv a.download{margin-top: 20px; color: #400080;text-decoration: underline;transition: 0.5s all ease;-webkit-transition: 0.5s all ease;-o-transition: 0.5s all ease;-moz-transition: 0.5s all ease;}
.scaleRowStrecher .insu_data{font-weight: 500;font-size: 18px;margin-top: 20px;}
.scaleRowStrecher .insu_data span{display: inline-block;vertical-align: middle;height: 25px;width: 50px;background-color: #f8f499;margin-right: 10px;}
.scaleRowStrecher {float: left; margin-top: 20px;margin-bottom: 20px;}
.result_c1{background-color: #0a7d82;color: #ffffff;}
.result_c2{background-color: #44afbf;}
.result_c3{background-color: #a5dce6;}
.result_c4{background-color: #d7f0f0;}
.result_c5{background-color: #ffffff;}
.result_c6{background-color: #faebf5;}
.result_c7{background-color: #fac8dc;}
.result_c8{background-color: #f07daa;}
.result_c9{background-color: #a0285f;color: #ffffff;}
.result_c1_CBS{background-color: #0f5a23;color: #ffffff;}
.result_c2_CBS{background-color: #5aaf5f;}
.result_c3_CBS{background-color: #a5dca0;}
.result_c4_CBS{background-color: #d7f0d2;}
.result_c5_CBS{background-color: #ffffff;}
.result_c6_CBS{background-color: #e6d2e6;}
.result_c7_CBS{background-color: #c3a5cd;}
.result_c8_CBS{background-color: #9b6eaa;}
.result_c9_CBS{background-color: #782882;color: #ffffff;}
    </style>
    """
    if cbs:

        text += """
    <div>
        <div class="scaleRowStrecher">
            <div class="scaleRowStrecherInner">
            <div class="scaleColorRect result_c1_CBS"><span>1</span></div>
            <div class="scaleColorRect result_c2_CBS"><span>2</span></div>
            <div class="scaleColorRect result_c3_CBS"><span>3</span></div>
            <div class="scaleColorRect result_c4_CBS"><span>4</span></div>
            <div class="scaleColorRect result_c5_CBS"><span>5</span></div>
            <div class="scaleColorRect result_c6_CBS"><span>6</span></div>
            <div class="scaleColorRect result_c7_CBS"><span>7</span></div>
            <div class="scaleColorRect result_c8_CBS"><span>8</span></div>
            <div class="scaleColorRect result_c9_CBS"><span>9</span></div>
        </div>
        <div class="label">
            <div class="leftLabel">Variable</div>
            <div class="centerLabel">Average</div>
            <div class="rightLabel">Conserved</div>
        </div>
        <div class="insu_data">
            <span></span>Insufficient Data
        </div>
        </div>
    </div>
            """

    else:

        text += """
    <div>
        <div class="scaleRowStrecher">
            <div class="scaleRowStrecherInner">
            <div class="scaleColorRect result_c1"><span>1</span></div>
            <div class="scaleColorRect result_c2"><span>2</span></div>
            <div class="scaleColorRect result_c3"><span>3</span></div>
            <div class="scaleColorRect result_c4"><span>4</span></div>
            <div class="scaleColorRect result_c5"><span>5</span></div>
            <div class="scaleColorRect result_c6"><span>6</span></div>
            <div class="scaleColorRect result_c7"><span>7</span></div>
            <div class="scaleColorRect result_c8"><span>8</span></div>
            <div class="scaleColorRect result_c9"><span>9</span></div>
        </div>
        <div class="label">
            <div class="leftLabel">Variable</div>
            <div class="centerLabel">Average</div>
            <div class="rightLabel">Conserved</div>
        </div>
        <div class="insu_data">
            <span></span>Insufficient Data
        </div>
        </div>
    </div>
            """

    display(HTML(text))


def print_msa_colors_FASTA_clustalwLike(grades, MSA, SeqName, cbs):

    # Print the results: the colored sequences of the MSA according to the query sequence

    Header = "ConSurf Color-Coded MSA for Job:%s Date:%s" %(vars['job_name'], vars['date'])

    blockSize = 50

    if form['DNA_AA'] == "AA":

        unknownChar = 'X'

    else:

        unknownChar = 'N'

    if cbs:

        Out_file = vars['job_name'] + "_colored_MSA_CBS.html"

    else:

        Out_file = vars['job_name'] + "_colored_MSA.html"

    try:

        OUT = open(Out_file, 'w')

    except:

        exit_on_error("sys_error", "could not open the file " + Out_file + " for writing.")

    OUT.write("<!DOCTYPE html>\n")
    OUT.write("<html lang=\"en\">\n")
    OUT.write("<head>\n")
    OUT.write("<meta charset=\"utf-8\">\n")
    OUT.write("<meta http-equiv=\"X-UA-Compatible\" content=\"IE=edge\">\n")
    OUT.write("<meta name=\"viewport\" content=\"width=device-width, initial-scale=1\">\n")
    #OUT.write("<link rel=\"stylesheet\" type=\"text/css\" href=\"%s\">\n" %CSS_File)
    OUT.write("<style>")
    if cbs:

        OUT.write("""table {
	/*	table-layout: auto; */
		table-layout: fixed;
        margin-left: 0em;
        margin-right: 0em;
        padding:1em 1em 1em 1em;
	    margin:1em 1em 1em 1em;
        border-collapse: collapse;
      }
td {
        font-family: "Courier New", Courier, monospace;
        font-size:1em;
        font-weight: bold;
        text-align: center;
        overflow:hidden;
     /*   white-space:nowrap; */
		 white-space:pre;
/*	 	padding:0.1em 0.1em 0.1em 0.1em; */
/*	 	margin:0.5em 0.5em 0.5em 0.5em;*/
		width: 1em;
      }
td.Seq_Name{
        text-align: left;
        width: 15em;
        padding-right:1em;
      }
td.Score9{
        color: #FFFFFF;
        background: #782882;
  	}
td.Score8{
        background: #9B6EAA;
        }
td.Score7{
        background: #C3A5CD;
        }
td.Score6{
        background: #E6D2E6;
        }
td.Score5{
        background: #FFFFFF;
        }
td.Score4{
        background: #D7F0D2;
	}
td.Score3{
        background: #A5DCA0;
	}
td.Score2{
        background: #5AAF5F;
        }
td.Score1{
		color: #FFFFFF;
        background: #0F5A23;
        }
td.ScoreNaN{
	    background: #808080;
	}

/* ISD COLORS */
td.Score_ISD{
        background: #FFFF96;
  	}

td.white{
        background: #FFFFFF;
        }
/* GRAPH STYLING */

.barGraph {
/*	background: url(images/horizontal_grid_line_50_pixel.png) bottom left;*/
/*	border-bottom: 3px solid #333;*/
/*	font: 9px Helvetica, Geneva, sans-serif; */
	height: 20em;
	margin: 0em 0em;
	padding: 0em;
	position: relative;
	}

.barGraph p {
	font-size:1em;
	}

.barGraph li {
/*	background: #666 url(images/bar_50_percent_highlight.png) repeat-y top right;*/
/*	border: 0.2em solid #555;*/
	border-bottom: none;
	bottom: 0em;
	color: #FFF;
	margin: 0em;
	padding: 0em 0em 0em 0em;
	position: absolute;
	list-style: none;
	text-align: center;
	width: 1em;
	}

.barGraph li.p1{ background-color:#666666; }
.barGraph li:hover {font-weight:bold;}

                  """)

    else:

        OUT.write("""table {
	/*	table-layout: auto; */
		table-layout: fixed;
        margin-left: 0em;
        margin-right: 0em;
        padding:1em 1em 1em 1em;
	    margin:1em 1em 1em 1em;
        border-collapse: collapse;
      }
td {
        font-family: "Courier New", Courier, monospace;
        font-size:1em;
        font-weight: bold;
        text-align: center;
        overflow:hidden;
     /*   white-space:nowrap; */
		 white-space:pre;
/*	 	padding:0.1em 0.1em 0.1em 0.1em; */
/*	 	margin:0.5em 0.5em 0.5em 0.5em;*/
		width: 1em;
      }
td.Seq_Name{
        text-align: left;
        width: 15em;
        padding-right:1em;
      }
td.Score9{
        color: #FFFFFF;
        background: #A0285F;
  	}
td.Score8{
        background: #F07DAA;
        }
td.Score7{
        background: #FAC8DC;
        }
td.Score6{
        background: #FAEBF5;
        }
td.Score5{
        background: #FFFFFF;
        }
td.Score4{
        background: #D7F0F0;
	}
td.Score3{
        background: #A5DCE6;
	}
td.Score2{
        background: #4BAFBE;
        }
td.Score1{
		color: #FFFFFF;
        background: #0A7D82;
        }
td.ScoreNaN{
	    background: #808080;
	}

/* ISD COLORS */
td.Score_ISD{
        background: #FFFF96;
  	}

td.white{
        background: #FFFFFF;
        }
/* GRAPH STYLING */

.barGraph {
/*	background: url(images/horizontal_grid_line_50_pixel.png) bottom left;*/
/*	border-bottom: 3px solid #333;*/
/*	font: 9px Helvetica, Geneva, sans-serif; */
	height: 20em;
	margin: 0em 0em;
	padding: 0em;
	position: relative;
	}

.barGraph p {
	font-size:1em;
	}

.barGraph li {
/*	background: #666 url(images/bar_50_percent_highlight.png) repeat-y top right;*/
/*	border: 0.2em solid #555;*/
	border-bottom: none;
	bottom: 0em;
	color: #FFF;
	margin: 0em;
	padding: 0em 0em 0em 0em;
	position: absolute;
	list-style: none;
	text-align: center;
	width: 1em;
	}

.barGraph li.p1{ background-color:#666666; }
.barGraph li:hover {font-weight:bold;}

                  """)

    OUT.write("</style>")
    OUT.write("<title>%s</title>\n" %Header)
    OUT.write("</head>\n")
    OUT.write("<body>\n") # open the body that is closed at the end of the page
    OUT.write("<H1 align=center><u>%s</u></H1>\n\n" %Header) # page heading

    [MSA_Hash, Seq_Names_In_Order] = ReadMSA(MSA)
    NumOfBlocks = (len(MSA_Hash[SeqName]) + blockSize - 1) // blockSize # ceiling division, so no empty trailing block is written

    seq_pos = 0
    ind = 0
    for block in range(0, NumOfBlocks):

        MSA_Pos = blockSize * block
        OUT.write("<table>\n")
        seqNum = 0
        seq_pos += ind
        for name in Seq_Names_In_Order:

            seqNum += 1
            OUT.write("<tr>\n")
            if name == SeqName:

                OUT.write("<td class=\"Seq_Name\"><u><b>%d %s</b></u></td>\n" %(seqNum, name))

            else:

                OUT.write("<td class=\"Seq_Name\">%d %s</td>\n" %(seqNum, name))

            ind = 0
            for (pos, char) in list(zip(MSA_Hash[SeqName], MSA_Hash[name]))[MSA_Pos : (block + 1) * blockSize]:

                if pos != "-" and pos.upper() != unknownChar:

                    ScoreClass = "Score" + str(grades[seq_pos + ind]['COLOR'])
                    if grades[seq_pos + ind]['ISD'] == 1:

                        ScoreClass = "Score_ISD"

                    OUT.write("<td class=\"%s\">%s</td>" %(ScoreClass, char))
                    ind += 1

                else:

                    OUT.write("<td class=\"white\">%s</td>" %char)

            OUT.write("</tr>\n")

        OUT.write("</table><br><br>\n")

    # print the color scale
    OUT.write("<table style = 'table-layout: auto;margin-left: 0em;margin-right: 0em;padding:1px 1px 1px 1px; margin:1px 1px 1px 1px; border-collapse: collapse;' border=0 cols=1 width=310>\n<tr><td align=center>\n<font face='Courier New' color='black' size=+1><center>\n<tr>")
    for i in range(1,10):

        OUT.write("<td class=\"%s\">%d</td>\n" %("Score" + str(i), i))

    OUT.write("</tr></font></center>\n<center><table style = 'table-layout: auto;margin-left:0em;margin-right: 0em;padding:1px 1px 1px 1px; margin:1px 1px 1px 1px; border-collapse: collapse;' border=0 cols=3 width=310>\n<tr>\n<td align=left><td align=left><b>Variable</b></td><td></td><td align=center><b>Average</b></td><td></td>\n<td align=right><b>Conserved</b></td>\n</tr><tr></tr><tr></tr>\n</table></center>\n")
    OUT.write("<table><tr><b><td class=\"Score_ISD\">X</td><td class=\"white\"> - Insufficient data - the calculation for this site was performed on less than 10% of the sequences.</b><br></td></tr></table>\n")
    OUT.write("</table>\n</body>\n</html>\n")
    OUT.close()

    create_download_link(Out_file, "Download colored MSA")

    vars['zip_list'].append(Out_file)
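

# Illustrative sketch, not part of the ConSurf pipeline. The coloring loop in
# print_msa_colors_FASTA_clustalwLike advances its grade index only on query
# characters that are neither a gap nor the unknown character. The helper below
# (a hypothetical name, introduced only for illustration) makes that mapping
# from alignment column to grade index explicit for a single query row.
def _sketch_column_to_grade_index(query_row, unknown_char="X"):

    indices = []      # one entry per alignment column; None means "no grade"
    next_index = 0    # index of the next grade to consume
    for char in query_row:

        if char != "-" and char.upper() != unknown_char:

            indices.append(next_index)
            next_index += 1

        else:

            indices.append(None)

    return indices

# Example with an assumed toy query row:
#     _sketch_column_to_grade_index("A-CXG") returns [0, None, 1, None, 2]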

def ReadMSA(msa):

    Seq_Names_In_Order = [] # list holding the sequence names in file order
    MSA_Hash = {} # dictionary mapping each sequence name to its aligned sequence
    Seq = ""
    Seq_Name = ""

    try:

        MSA = open(msa, 'r')

    except OSError:

        exit_on_error("sys_error", "ReadMSA: Can't read the MSA: " + msa)

    line = MSA.readline()
    while line != "":

        line = line.rstrip()
        match = re.match(r'^>(.*)', line)
        if match:

            if Seq != "":

                MSA_Hash[Seq_Name] = Seq
                Seq = ""
                Seq_Name = ""

            Seq_Name = match.group(1)
            Seq_Names_In_Order.append(Seq_Name)

        else:

            Seq += line

        line = MSA.readline()

    MSA.close()
    MSA_Hash[Seq_Name] = Seq # last sequence

    return(MSA_Hash, Seq_Names_In_Order)
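

# Illustrative sketch, not part of the ConSurf pipeline. The FASTA parsing done
# by ReadMSA can also be written over any iterable of lines, which makes it easy
# to try on in-memory data. The name _sketch_read_fasta_lines is hypothetical.
# Unlike ReadMSA, this sketch also keeps an entry for a header that has no
# sequence lines, and it ignores text that appears before the first header.
def _sketch_read_fasta_lines(lines):

    names_in_order = []  # sequence names in the order they appear
    sequences = {}       # name -> concatenated sequence
    current = None
    for raw in lines:

        line = raw.rstrip()
        if line.startswith(">"):

            current = line[1:]
            names_in_order.append(current)
            sequences[current] = ""

        elif current is not None:

            sequences[current] += line

    return sequences, names_in_order

# Example with assumed toy input:
#     _sketch_read_fasta_lines([">q", "AC-", "GT", ">s1", "ACCGT"])
#     returns ({'q': 'AC-GT', 's1': 'ACCGT'}, ['q', 's1'])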


def create_download_link(file, text):
    callback_id = f"download_{uuid.uuid4().hex}"

    def download_file():
        files.download(vars['working_dir'] + file)

    output.register_callback(callback_id, download_file)

    display(HTML(f'''
        <a href="#" onclick="google.colab.kernel.invokeFunction('{callback_id}', [], {{}}); return false;">
           {text}
        </a>
    '''))

def consurf_HTML_Output(cbs):

    # Print the results: the colored sequence and the B/E information
    if cbs:

        consurf_html_colors = vars['color_array_CBS']
    else:

        consurf_html_colors = vars['color_array']

    COLORS ="<html>\n<title>ConSurf Results</title>\n"
    COLORS += "<head>\n<style>\nb { float: left;}\n</style>\n</head>\n"
    COLORS += "<body bgcolor='white'>\n"
    COLORS += "\n<table border=0 width=100%>\n"
    COLORS += "<tr><td>\n"

    # print the colored sequence

    count = 1
    letter_str = ""

    number_of_pos = len(vars['gradesPE_Output'])
    for elem in vars['gradesPE_Output']:

        # print the counter above the beginning of each 10 characters
        if count % 50 == 1:

            count_num = count
            while count_num < count + 50:

                if count_num <= number_of_pos:

                    space_num = 11 - len(str(count_num))
                    spaces = ""
                    for i in range(0, space_num):

                        spaces += "&nbsp;"

                    COLORS += "<font face='Courier New' color='black' size=+1>" + str(count_num) + spaces + "</font>"

                count_num += 10

            COLORS += "<br>\n"

        # print the colored letters and 'e' for the exposed residues

        # after 50 characters - print newline
        if count % 50 == 0 or count == number_of_pos:

            if elem['ISD'] == 1: # INSUFFICIENT DATA

                letter_str += "<b><font face='Courier New' color='black' size=+1><span style='background: %s;'>%s</span></font></b><br>" %(consurf_html_colors['ISD'], elem['SEQ'])

            elif elem['COLOR'] == 9 or elem['COLOR'] == 1: # MOST OR LEAST CONSERVED

                letter_str += "<b><font face='Courier New' color='white' size=+1><span style='background: %s;'>%s</span></font></b><br>\n" %(consurf_html_colors[elem['COLOR']], elem['SEQ'])

            else:

                letter_str += "<b><font face='Courier New' color='black' size=+1><span style='background: %s;'>%s</span></font></b><br>\n" %(consurf_html_colors[elem['COLOR']], elem['SEQ'])

            COLORS += letter_str
            COLORS += "</td></tr>\n"
            COLORS += "<tr><td>\n"

            letter_str = ""

        elif count % 10 == 1 and count % 50 != 1: # after 10 characters - print a space ('&nbsp;')

            letter_str += "<b><font face='Courier New' color='black' size=+1>&nbsp;</font></b>"

            if elem['ISD'] == 1:

                letter_str += "<b><font face='Courier New' color='black' size=+1><span style='background: %s;'>%s</span> </font></b>\n" %(consurf_html_colors['ISD'], elem['SEQ'])

            elif elem['COLOR'] == 9 or elem['COLOR'] == 1: # MOST OR LEAST CONSERVED

                letter_str += "<b><font face='Courier New' color='white' size=+1><span style='background: %s;'>%s</span> </font></b>\n" %(consurf_html_colors[elem['COLOR']], elem['SEQ'])

            else:

                letter_str += "<b><font face='Courier New' color='black' size=+1><span style='background: %s;'>%s</span> </font></b>\n" %(consurf_html_colors[elem['COLOR']], elem['SEQ'])

        else:

            if elem['ISD'] == 1:

                letter_str += "<b><font face='Courier New' color='black' size=+1><span style='background: %s;'>%s</span> </font></b>\n" %(consurf_html_colors['ISD'], elem['SEQ'])

            elif elem['COLOR'] == 9 or elem['COLOR'] == 1: # MOST OR LEAST CONSERVED

                letter_str += "<b><font face='Courier New' color='white' size=+1><span style='background: %s;'>%s</span> </font></b>\n" %(consurf_html_colors[elem['COLOR']], elem['SEQ'])

            else:

                letter_str += "<b><font face='Courier New' color='black' size=+1><span style='background: %s;'>%s</span> </font></b>\n" %(consurf_html_colors[elem['COLOR']], elem['SEQ'])

        count += 1

    COLORS += "</td></tr>\n</table><br>\n"
    COLORS += "</body>\n</html>\n"
    display(HTML(COLORS))

def no_model_view(cbs = False):

    if cbs:

        print("------------------------------------------------------------------------------------------------------------------------")
        print("Color Blind View")
        # PDF of the sequence colored with the color-blind palette
        pdf_file = vars['Colored_Seq_CBS_PDF']

    else:

        print("------------------------------------------------------------------------------------------------------------------------")
        print("Regular View")
        # PDF of the sequence colored with the regular palette
        pdf_file = vars['Colored_Seq_PDF']


    print_legend(cbs)
    consurf_HTML_Output(cbs)
    create_download_link(pdf_file, "Download colored sequence pdf file")
    print_msa_colors_FASTA_clustalwLike(vars['gradesPE_Output'], vars['msa_fasta'], vars['msa_SEQNAME'], cbs)

    if not cbs:

        no_model_view(True)
        print("------------------------------------------------------------------------------------------------------------------------")

def show_py3dmol(file, file_type, cbs = False):

    # Read the ConSurf-colored structure file (its format is given by file_type)
    with open(file) as f:
        pdb_data = f.read()

    view = py3Dmol.view(width=800, height=600)
    view.addModel(pdb_data, file_type)
    view.setStyle({'cartoon': {}})

    if cbs:

        print("------------------------------------------------------------------------------------------------------------------------")
        print("Color Blind View")
        # ConSurf color palette
        colors = vars['color_array_CBS']
        pdf_file = vars['Colored_Seq_CBS_PDF']

    else:

        print("------------------------------------------------------------------------------------------------------------------------")
        print("Regular View")
        # ConSurf color palette
        colors = vars['color_array']
        pdf_file = vars['Colored_Seq_PDF']

    # Apply each color to the appropriate B-factor range
    for i in range(1, 10):

        view.setStyle({'b': i}, {'cartoon': {'color': colors[i]}})

    # Color residues with B = 10 (insufficient data) in yellow
    view.setStyle({'b': 10}, {'cartoon': {'color': 'yellow'}})

    view.zoomTo()
    view.show()
    print_legend(cbs)
    #print("\n")
    consurf_HTML_Output(cbs)
    #print("\n\n\n\n\n\n\n\n")
    create_download_link(pdf_file, "Download colored sequence pdf file")
    print_msa_colors_FASTA_clustalwLike(vars['gradesPE_Output'], vars['msa_fasta'], vars['msa_SEQNAME'], cbs)

    if not cbs:

        show_py3dmol(file, file_type, True)
        print("------------------------------------------------------------------------------------------------------------------------")


def print_selected(arr_ref, print_for_pipe):

    total_text = ""
    string = ""
    if print_for_pipe == "yes":

        string = "! select "

    else:

        string = "select "

    total_length = len(string)

    if len(arr_ref) > 0:

        for aa in arr_ref:

            aa = aa.replace(":", "")
            total_length += len(aa)
            if total_length > 80:

                if re.search(r', $', string):

                    string = string[:-2]

                total_text += string + "\n"
                if print_for_pipe == "yes":

                    string = "! select selected or %s, " %aa

                else:

                    string = "select selected or %s, " %aa

                total_length = len(string)

            else:

                string += aa + ", "
                total_length += 2

    else:

        total_text += string + "none"


    if re.search(r', $', string):

        string = string[:-2]
        total_text += string

    return total_text
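

# Illustrative sketch, not part of the ConSurf pipeline. print_selected breaks a
# long residue list into several "select" commands so that no line grows much
# beyond 80 characters. The helper below (hypothetical name) shows the same
# greedy wrapping idea in a simplified form; it is not guaranteed to break lines
# at exactly the same places as print_selected.
def _sketch_wrap_selection(residues, max_width=80):

    if not residues:

        return "select none"

    lines = []
    current = "select " + residues[0]
    for res in residues[1:]:

        candidate = current + ", " + res
        if len(candidate) > max_width:

            # the line is full: emit it and extend the selection on a new line
            lines.append(current)
            current = "select selected or " + res

        else:

            current = candidate

    lines.append(current)
    return "\n".join(lines)

# Example with assumed residue labels:
#     _sketch_wrap_selection(["ALA1", "GLY2"]) returns "select ALA1, GLY2"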

def print_rasmol(job_name, out_file, isd, ref_colors_array, chain, cbs):

    # print out new format of rasmol


    consurf_rasmol_colors = ["", "[16,200,209]", "[140,255,255]", "[215,255,255]", "[234,255,255]", "[255,255,255]", "[252,237,244]", "[250,201,222]", "[240,125,171]", "[160,37,96]", "[255,255,150]"]
    consurf_rasmol_colors_CBS = ["", "[27,120,55]", "[90,174,97]", "[166,219,160]", "[217,240,211]", "[247,247,247]", "[231,212,232]", "[194,165,207]", "[153,112,171]", "[118,42,131]", "[255,255,150]"]

    try:

        OUT = open(out_file, 'w')

    except OSError:

        exit_on_error('sys_error', "print_rasmol : Could not open the file " + out_file + " for writing.")

    OUT.write("ConSurfDB.tau.ac.il   %s   %s\nAPD N.NN\n" %(vars['date'], job_name))
    OUT.write("select all\ncolor [200,200,200]\n\n")

    i = len(ref_colors_array) - 1
    while i > 0:

        if i == 10 and not isd:

            i -= 1
            continue

        if len(ref_colors_array[i]) > 0:

            OUT.write(print_selected(ref_colors_array[i], "no"))
            OUT.write("\nselect selected and :%s\n" %chain)
            if cbs:

                OUT.write("color %s\nspacefill\n" %consurf_rasmol_colors_CBS[i])

            else:

                OUT.write("color %s\nspacefill\n" %consurf_rasmol_colors[i])

            OUT.write("define CON%d selected\n\n" %i)

        i -= 1

    OUT.close()

def create_rasmol(job_name, chain, ref_colors_array, ref_colors_array_isd):

    RasMol_file = job_name + "_jmol_consurf_colors.spt"
    RasMol_file_isd = job_name + "_jmol_consurf_colors_isd.spt"
    RasMol_file_CBS = job_name + "_jmol_consurf_colors_CBS.spt"
    RasMol_file_CBS_isd = job_name + "_jmol_consurf_colors_CBS_isd.spt"

    if chain == "NONE":

        chain = " "

    print_rasmol(job_name, RasMol_file, False, ref_colors_array, chain, False)
    print_rasmol(job_name, RasMol_file_CBS, False, ref_colors_array, chain, True)

    if len(ref_colors_array_isd[10]) > 0: # there is isd

        print_rasmol(job_name, RasMol_file_isd, True, ref_colors_array_isd, chain, False)
        print_rasmol(job_name, RasMol_file_CBS_isd, True, ref_colors_array_isd, chain, True)

def read_number(file):

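    # Reads the next whitespace-delimited number from an open file, one character at a time,
    # so the whole file is never stored in a string. Returns "" when the end of the file is reached.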
    number = ""
    char = file.read(1)
    while char and char.isspace(): # first we skip white spaces

        char = file.read(1)

    while char and not char.isspace(): # we read the number

        number += char
        char = file.read(1)

    if number != "":

        number = float(number)

    return number
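

# Illustrative sketch, not part of the ConSurf pipeline. The character-by-
# character reading done by read_number can also be expressed as a generator
# that yields every number in a stream and stops cleanly at end of file.
# The name _sketch_iter_numbers is hypothetical.
import io

def _sketch_iter_numbers(stream):

    token = ""
    while True:

        char = stream.read(1)
        if char and not char.isspace():

            token += char
            continue

        # whitespace or end of file: emit the token collected so far, if any
        if token:

            yield float(token)
            token = ""

        if not char:

            return

# Example with an assumed in-memory stream:
#     list(_sketch_iter_numbers(io.StringIO(" 1.5\n-2  3e1"))) returns [1.5, -2.0, 30.0]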


def print_pnet_file(num_pos):

    # For each position in the MSA we write one line containing the per-residue percentages
    # of that position and of the 6 positions before and after it (zero-padded at both ends).
    window = 6
    pad = []
    pnet_file = open("p.net", 'w')
    acids = ["V", "L", "I", "M", "F", "W", "Y", "G", "A", "P", "S", "T", "C", "H", "R", "K", "Q", "E", "N", "D"]

    for i in range(window):

        pad.append({})
        for acid in acids:

            pad[-1][acid] = 0

    padded_percentage_per_pos = pad + vars['percentage_per_pos'] + pad
    for i in range(window, num_pos + window):

        j = -window
        while j < window + 1:

            for acid in acids:

                if acid in padded_percentage_per_pos[i + j]:

                    pnet_file.write(str(int(padded_percentage_per_pos[i + j][acid])) + " ")

                else:

                    pnet_file.write("0 ")

            j += 1

        pnet_file.write("\n")

    pnet_file.close()
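

# Illustrative sketch, not part of the ConSurf pipeline. print_pnet_file writes,
# for every position, the percentages of a zero-padded window around it. The
# helper below (hypothetical name) builds the same kind of rows in memory, so
# the window layout can be inspected without writing p.net. With window=6 and a
# 20-letter alphabet each row has 13 * 20 = 260 entries.
def _sketch_windowed_rows(per_position, alphabet, window=6):

    pad = [{} for _ in range(window)]  # empty columns count as all zeros
    padded = pad + list(per_position) + pad
    rows = []
    for center in range(window, len(per_position) + window):

        row = []
        for offset in range(-window, window + 1):

            column = padded[center + offset]
            row.extend(int(column.get(letter, 0)) for letter in alphabet)

        rows.append(row)

    return rows

# Example with assumed toy data and a window of 1:
#     _sketch_windowed_rows([{"A": 100}, {"C": 50.7}], ["A", "C"], window=1)
#     returns [[0, 0, 100, 0, 0, 50], [100, 0, 0, 50, 0, 0]]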

def reveal_buried_exposed(buried_exposed):

    # This function predicts whether each position is buried or exposed.
    # We look at 6 positions before and 6 positions after the current position:
    # 13 positions x 20 amino acids = 260 numbers in total.
    # We multiply each number by its own weight, sum the products and add a bias.
    # The result is put in the formula 1 / (1 + e^(-x)).
    # We do this twenty times with different weights. Now we have 20 numbers.
    # We again multiply the numbers by different weights, sum them up and put the result in the formula 1 / (1 + e^(-x)).
    # We do this twice, again with different weights. Now we have two numbers.
    # As implemented below, if the first number is smaller than the second the position is marked exposed ("e"); otherwise it is marked buried ("b").

    num_pos = len(vars['percentage_per_pos'])
    print_pnet_file(num_pos)
    G_nl = 2
    G_N = [260, 20, 2]

    weights_file = open("/content/WEIGHTS.BIN", 'r')
    pnet_file = open("p.net", 'r')


    for i in range(num_pos):

        G_o = [[], [], []]
        for j in range(G_N[0]):

            G_o[0].append(read_number(pnet_file) / 100.0)

        for s in range(1, G_nl + 1):

            for k in range(G_N[s]):

                weight = read_number(weights_file)
                for j in range(G_N[s - 1]):

                    weight += read_number(weights_file) * G_o[s - 1][j]

                G_o[s].append(1 / (1 + math.exp(-weight)))


        weights_file.seek(0) # each time we use the same weights, so we move the pointer to the start
        if G_o[G_nl][0] < G_o[G_nl][1]:

            buried_exposed.append("e")

        else:

            buried_exposed.append("b")

    weights_file.close()
    pnet_file.close()

    return buried_exposed
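

# Illustrative sketch, not part of the ConSurf pipeline. reveal_buried_exposed
# evaluates a small feed-forward network: every unit adds a bias to a weighted
# sum of the previous layer and applies the logistic function 1 / (1 + e^(-x)).
# The helper below (hypothetical name) performs the same kind of forward pass
# with the weights held in memory. The layer layout used here, a list of
# (bias, weights) pairs per layer, is an assumption made for illustration and
# is not the format of WEIGHTS.BIN.
import math

def _sketch_forward_pass(inputs, layers):

    activations = list(inputs)
    for layer in layers:

        next_activations = []
        for bias, weights in layer:

            total = bias + sum(w * a for w, a in zip(weights, activations))
            next_activations.append(1.0 / (1.0 + math.exp(-total)))

        activations = next_activations

    return activations

# Example with assumed toy weights: one unit, zero bias, zero inputs.
#     _sketch_forward_pass([0.0, 0.0], [[(0.0, [1.0, 1.0])]]) returns [0.5]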


class pdbParser:

    def __init__(self):

        self.SEQRES = ""
        self.ATOM = ""
        self.ATOM_withoutX = {}
        self.type = ""
        self.MODIFIED_COUNT = 0
        self.MODIFIED_LIST = ""
        self.positions = {}
        self.max_res_details = 0
        self.num_known_atoms = 0
        self.num_known_seqs = 0




    #def read(self, file, query_chain, DNA_AA, atom_position_filename):
    def read(self, file, query_chain, DNA_AA):

        #conversion_table = {"ALA" : "A", "ARG" : "R", "ASN" : "N", "ASP" : "D", "CYS" : "C", "GLN" : "Q", "GLU" : "E", "GLY" : "G", "HIS" : "H", "ILE" : "I", "LEU" : "L", "LYS" : "K", "MET" : "M", "PHE" : "F", "PRO" : "P", "SER" : "S", "THR" : "T", "TRP" : "W", "TYR" : "Y", "VAL" : "V", "A" : "a", "T" : "t", "C" : "c", "G" : "g", "U" : "u", "I" : "i", "DA" : "a", "DT" : "t", "DC" : "c", "DG" : "g", "DU" : "u", "DI" : "i", "5CM" : "c", "5MU" : "t", "N" : "n"}
        #conversion_table = {"ALA" : "A", "ARG" : "R", "ASN" : "N", "ASP" : "D", "CYS" : "C", "GLN" : "Q", "GLU" : "E", "GLY" : "G", "HIS" : "H", "ILE" : "I", "LEU" : "L", "LYS" : "K", "MET" : "M", "PHE" : "F", "PRO" : "P", "SER" : "S", "THR" : "T", "TRP" : "W", "TYR" : "Y", "VAL" : "V", "A" : "a", "T" : "t", "C" : "c", "G" : "g", "U" : "u", "I" : "i", "DA" : "a", "DT" : "t", "DC" : "c", "DG" : "g", "DU" : "u", "DI" : "i", "5CM" : "c", "N" : "n"}
        conversion_table = {"ALA" : "A", "ARG" : "R", "ASN" : "N", "ASP" : "D", "CYS" : "C", "GLN" : "Q", "GLU" : "E", "GLY" : "G", "HIS" : "H", "ILE" : "I", "LEU" : "L", "LYS" : "K", "MET" : "M", "PHE" : "F", "PRO" : "P", "SER" : "S", "THR" : "T", "TRP" : "W", "TYR" : "Y", "VAL" : "V", "A" : "A", "T" : "T", "C" : "C", "G" : "G", "U" : "U", "I" : "I", "DA" : "A", "DT" : "T", "DC" : "C", "DG" : "G", "DU" : "U", "DI" : "I", "5CM" : "C", "N" : "N"}
        modified_residues = {"MSE" : "MET", "MLY" : "LYS", "HYP" : "PRO", "CME" : "CYS", "CGU" : "GLU", "SEP" : "SER", "KCX" : "LYS", "MLE" : "LEU", "TPO" : "THR", "CSO" : "CYS", "PTR" : "TYR", "DLE" : "LEU", "LLP" : "LYS", "DVA" : "VAL", "TYS" : "TYR", "AIB" : "ALA", "OCS" : "CYS", "NLE" : "LEU", "MVA" : "VAL", "SEC" : "CYS", "PYL" : "LYS"}
        localMODRES = {}
        FIRST = [] # chains whose first residue has already been seen
        fas_pos = 0
        chain = "" # current chain
        ENDS = [] # end of chain reached and remaining HETATM should be skipped
        last_residue_number = ""

        if DNA_AA == "Nuc":

            UnknownChar = "N"

        else:

            UnknownChar = "X"

        # open file to read MODRES
        try:

            PDBFILE = open(file, 'r')

        except:

            return 0

        try:

            MODRES_FILE = open(file + ".MODRES", 'w')

        except:

            return 0

        # read the MODRES
        line = PDBFILE.readline()
        while line != "" and not re.match(r'^ATOM', line):

            if re.match(r'^MODRES', line):

                MODRES = line[12:15].strip() # strip spaces to support NUC
                CHAIN = line[16:17]
                if CHAIN == " ":

                    CHAIN = "NONE"

                # we only look at the query chain
                if CHAIN != query_chain:

                    line = PDBFILE.readline()
                    continue

                RES = line[24:27].strip() # strip spaces to support NUC

                if MODRES not in localMODRES:

                    localMODRES[MODRES] = RES
                    MODRES_FILE.write(MODRES + "\t" + RES + "\n")

                elif localMODRES[MODRES] != RES:

                    localMODRES[MODRES] = "" # two different values to the same residue

            line = PDBFILE.readline()


        MODRES_FILE.close()
        PDBFILE.close()

        # reopen file to read all the file
        try:

            PDBFILE = open(file, 'r')

        except:

            return 0

        line = PDBFILE.readline()
        while line != "":

            line = line.strip()

            if re.search(r'^SEQRES', line): # SEQRES record

                chain_seqres = line[11:12] # get chain id
                if chain_seqres == " ":

                    chain_seqres = "NONE"

                # we skip the chain if it is not the query
                if query_chain != chain_seqres:

                    line = PDBFILE.readline()
                    continue

                # convert to one letter format
                for acid in line[19:70].split():

                    # regular conversion
                    if acid in conversion_table:

                        # add to chain
                        self.SEQRES += conversion_table[acid]
                        self.num_known_seqs += 1

                    # modified residue
                    else:

                        # count this modified residue
                        self.MODIFIED_COUNT += 1

                        # check if residue is identified
                        if acid in modified_residues and modified_residues[acid] in conversion_table:

                            self.SEQRES += conversion_table[modified_residues[acid]]
                            self.num_known_seqs += 1

                            # add to modified residue list
                            if not acid + " > " in self.MODIFIED_LIST:

                                self.MODIFIED_LIST += acid + " > " + conversion_table[modified_residues[acid]] + "\n"

                        elif acid in localMODRES and localMODRES[acid] != "" and localMODRES[acid] in conversion_table:

                            self.SEQRES += conversion_table[localMODRES[acid]]
                            self.num_known_seqs += 1

                            # add to modified residue list
                            if not acid + " > " in self.MODIFIED_LIST:

                                self.MODIFIED_LIST += acid + " > " + conversion_table[localMODRES[acid]] + "\n"

                        else:

                            # set residue name to X or N
                            self.SEQRES += UnknownChar

                            # add message to front of modified residue list
                            modified_changed_to_X_or_N_msg = "Modified residue(s) in this chain were converted to the one letter representation '" + UnknownChar + "'\n"

                            if not "Modified residue" in self.MODIFIED_LIST:

                                self.MODIFIED_LIST = modified_changed_to_X_or_N_msg + self.MODIFIED_LIST

            elif re.search(r'^ATOM', line):

                # extract atom data
                res = line[17:20].strip() # for DNA files there is only one or two letter code
                chain = line[21:22]
                pos = line[22:27].strip()

                if chain == " ":

                    chain = "NONE"

                # find modified residue
                if res in modified_residues:

                    mod_res = modified_residues[res]

                elif res in localMODRES and localMODRES[res] != "" and localMODRES[res] in conversion_table:

                    mod_res = localMODRES[res]

                else:

                    mod_res = res

                # convert residue to one letter
                if mod_res in conversion_table:

                    oneLetter = conversion_table[mod_res]

                else:

                    oneLetter = UnknownChar

                # check if we reached a new residue
                if chain not in FIRST:

                    FIRST.append(chain)
                    last_pos = pos
                    self.ATOM_withoutX[chain] = oneLetter

                elif pos != last_pos:

                    last_pos = pos
                    self.ATOM_withoutX[chain] += oneLetter

                else:

                    line = PDBFILE.readline()
                    continue

                # if the chain is not the query we only extract the sequence
                if query_chain != chain:

                    line = PDBFILE.readline()
                    continue

                self.num_known_atoms += 1

                # writing atom position file
                fas_pos += 1
                res_details = "%s:%s:%s" %(res, pos, chain)
                self.positions[fas_pos] = res_details
                if len(res_details) > self.max_res_details:

                    self.max_res_details = len(res_details)

                #CORR.write("%s\t%d\t%s\n" %(res, fas_pos, pos))

                #residue_number = int(line[22:26].strip())

                # check type
                if self.type == "":

                    if len(mod_res) < 3:

                        self.type = "Nuc"

                    else:

                        self.type = "AA"
                """
                if FIRST[chain]:

                    FIRST[chain] = False

                elif last_residue_number < residue_number:

                    while residue_number != last_residue_number + 1: # For Disorder regions

                        self.ATOM += UnknownChar
                        last_residue_number += 1

                self.ATOM += oneLetter
                last_residue_number = residue_number
                """
            elif re.search(r'^HETATM', line):

                # extract hetatm data
                res = line[17:20].strip() # for DNA files there is only one or two letter code
                chain = line[21:22]
                pos = line[22:27].strip()

                if chain == " ":

                    chain = "NONE"

                if chain in ENDS:

                    line = PDBFILE.readline()
                    continue

                # find modified residue
                if res in modified_residues:

                    mod_res = modified_residues[res]

                elif res in localMODRES and localMODRES[res] != "" and localMODRES[res] in conversion_table:

                    mod_res = localMODRES[res]

                else:

                    mod_res = res

                # convert residue to one letter
                if mod_res in conversion_table:

                    oneLetter = conversion_table[mod_res]

                else:

                    oneLetter = UnknownChar

                # check if we reached a new residue
                if chain not in FIRST:

                    FIRST.append(chain)
                    last_pos = pos
                    self.ATOM_withoutX[chain] = oneLetter

                elif pos != last_pos:

                    last_pos = pos
                    self.ATOM_withoutX[chain] += oneLetter

                else:

                    line = PDBFILE.readline()
                    continue

                # if the chain is not the query we only extract the sequence
                if query_chain != chain:

                    line = PDBFILE.readline()
                    continue

                self.num_known_atoms += 1

                # writing atom position file
                fas_pos += 1
                res_details = "%s:%s:%s" %(res, pos, chain)
                self.positions[fas_pos] = res_details
                if len(res_details) > self.max_res_details:

                    self.max_res_details = len(res_details)

                """
                residue_number = int(line[22:26].strip())

                if FIRST[chain]:

                    last_residue_number = residue_number
                    FIRST[chain] = False

                elif last_residue_number < residue_number:

                    while residue_number != last_residue_number + 1: # For Disorder regions

                        self.ATOM += UnknownChar
                        last_residue_number += 1

                self.ATOM += oneLetter
                last_residue_number = residue_number
                """
            elif re.search(r'^TER', line):

                if chain not in ENDS:

                    ENDS.append(chain)

            line = PDBFILE.readline()

        PDBFILE.close()
        #CORR.close()
        return 1

    def get_num_known_atoms(self):

        return self.num_known_atoms

    def get_num_known_seqs(self):

        return self.num_known_seqs

    def get_max_res_details(self):

        return self.max_res_details

    def get_positions(self):

        return self.positions

    def get_type(self):

        return self.type

    def get_SEQRES(self):

        return self.SEQRES

    def get_ATOM_withoutX(self):

        return self.ATOM_withoutX

    def get_MODIFIED_COUNT(self):


        return self.MODIFIED_COUNT



    def get_MODIFIED_LIST(self):

        return self.MODIFIED_LIST



class cifParser:

    def __init__(self):

        self.SEQRES = ""
        #self.ATOM = ""
        self.ATOM_withoutX = {}
        self.type = ""
        self.MODIFIED_COUNT = 0
        self.MODIFIED_LIST = ""
        self.positions = {}
        self.max_res_details = 0
        self.auth_seq_id_column = 0
        self.auth_comp_id_column = 0
        self.auth_asym_id_column = 0
        self.B_iso_or_equiv = 0


    #def read(self, file, query_chain, DNA_AA, atom_position_filename):
    def read(self, file, query_chain, DNA_AA):

        #conversion_table = {"ALA" : "A", "ARG" : "R", "ASN" : "N", "ASP" : "D", "CYS" : "C", "GLN" : "Q", "GLU" : "E", "GLY" : "G", "HIS" : "H", "ILE" : "I", "LEU" : "L", "LYS" : "K", "MET" : "M", "PHE" : "F", "PRO" : "P", "SER" : "S", "THR" : "T", "TRP" : "W", "TYR" : "Y", "VAL" : "V", "A" : "a", "T" : "t", "C" : "c", "G" : "g", "U" : "u", "I" : "i", "DA" : "a", "DT" : "t", "DC" : "c", "DG" : "g", "DU" : "u", "DI" : "i", "5CM" : "c", "N" : "n"}
        conversion_table = {"ALA" : "A", "ARG" : "R", "ASN" : "N", "ASP" : "D", "CYS" : "C", "GLN" : "Q", "GLU" : "E", "GLY" : "G", "HIS" : "H", "ILE" : "I", "LEU" : "L", "LYS" : "K", "MET" : "M", "PHE" : "F", "PRO" : "P", "SER" : "S", "THR" : "T", "TRP" : "W", "TYR" : "Y", "VAL" : "V", "A" : "A", "T" : "T", "C" : "C", "G" : "G", "U" : "U", "I" : "I", "DA" : "A", "DT" : "T", "DC" : "C", "DG" : "G", "DU" : "U", "DI" : "I", "5CM" : "C", "N" : "N"}
        #modified_residues = {"MSE" : "MET", "MLY" : "LYS", "HYP" : "PRO", "CME" : "CYS", "CGU" : "GLU", "SEP" : "SER", "KCX" : "LYS", "MLE" : "LEU", "TPO" : "THR", "CSO" : "CYS", "PTR" : "TYR", "DLE" : "LEU", "LLP" : "LYS", "DVA" : "VAL", "TYS" : "TYR", "AIB" : "ALA", "OCS" : "CYS", "NLE" : "LEU", "MVA" : "VAL", "SEC" : "CYS", "PYL" : "LYS"}

        # find the portion of the file that contains the seqres
        SEQRES_string = ""
        SEQRES_string_found = False
        in_fasta = False
        fas_pos = 0
        last_pos = 0
        current_chain = ""
        hetatm = "" # the part of the sequence that is in the HETATM lines
        hetatm_withoutX = "" # X is not added to fill the breaks in the sequence
        hetatm_pos = {} # the positions of the residues in the HETATM
        hetatm_max_res_details = 0 # maximum length of the details of the residues in the HETATM

        if DNA_AA == "Nuc":

            UnknownChar = "N"

        else:

            UnknownChar = "X"

        try:

            CIF = open(file, 'r')

        except:

            return 0

        line = CIF.readline()
        while line != "":

            if re.match(r'^_atom_site.', line):

                # we reached the atoms
                break

            if re.match(r'^_entity_poly.entity_id', line):

                while line != "":

                    if ';' in line:

                        in_fasta = not in_fasta

                    line = line.replace(";", "")

                    if '#' in line:

                        # end of _entity_poly.entity_id
                        SEQRES_string_found = True
                        break

                    elif in_fasta:

                        # delete white spaces in fasta
                        SEQRES_string += line.strip()

                    else:

                        SEQRES_string += line

                    line = CIF.readline()

            if SEQRES_string_found:

                break

            line = CIF.readline()

        if re.match(r'^_entity_poly.entity_id\s+1', SEQRES_string):

            # one seqres
            match1 = re.search(r'_entity_poly.pdbx_seq_one_letter_code_can\s+(\S+)', SEQRES_string)
            match2 = re.search(r'_entity_poly.pdbx_strand_id\s+(\S+)', SEQRES_string)
            if match1 and match2:

                seqres = match1.group(1)
                for chain in (match2.group(1)).split(','):

                    if chain == query_chain:

                        self.SEQRES = seqres

        else:

            # more than one seqres
            SEQRES_substrings = re.split(r'\d+\s+\'?poly', SEQRES_string)

            POLY = open("poly", 'w')
            for string in SEQRES_substrings:

                POLY.write(string + "\n>\n")

            POLY.close()

            SEQRES_substrings = SEQRES_substrings[1:] # delete titles
            for substring in SEQRES_substrings:

                words = substring.split()
                for chain in (words[5]).split(','):

                    if chain == query_chain:

                        self.SEQRES = words[4]

        number_of_columns = 0
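        # find which column holds which value. Each counter below is decremented on its
        # own header line and on every later "_atom_site." header line, so it ends up as
        # a negative index counted from the end of a data row.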
        auth_seq_id_column = 0
        auth_comp_id_column = 0
        auth_asym_id_column = 0
        label_seq_id_column = 0
        label_comp_id_column = 0
        label_asym_id_column = 0
        B_iso_or_equiv = 0
        found_auth_seq_id_column = False
        found_auth_comp_id_column = False
        found_auth_asym_id_column = False
        found_label_seq_id_column = False
        found_label_comp_id_column = False
        found_label_asym_id_column = False
        found_B_iso_or_equiv = False
        while line != "":

            line = line.strip()

            if line == "_atom_site.B_iso_or_equiv":

                found_B_iso_or_equiv = True

            if line == "_atom_site.auth_seq_id":

                found_auth_seq_id_column = True

            if line == "_atom_site.auth_comp_id":

                found_auth_comp_id_column = True

            if line == "_atom_site.auth_asym_id":

                found_auth_asym_id_column = True

            if line == "_atom_site.label_seq_id":

                found_label_seq_id_column = True

            if line == "_atom_site.label_comp_id":

                found_label_comp_id_column = True

            if line == "_atom_site.label_asym_id":

                found_label_asym_id_column = True

            if not re.match(r'^_atom_site.', line) and found_B_iso_or_equiv and (found_auth_seq_id_column or found_label_seq_id_column) and (found_auth_comp_id_column or found_label_comp_id_column) and (found_auth_asym_id_column or found_label_asym_id_column):

                # we identified the necessary columns
                break

            if found_B_iso_or_equiv:

                B_iso_or_equiv -= 1

            if found_auth_seq_id_column:

                auth_seq_id_column -= 1

            if found_auth_comp_id_column:

                auth_comp_id_column -= 1

            if found_auth_asym_id_column:

                auth_asym_id_column -= 1

            if found_label_seq_id_column:

                label_seq_id_column -= 1

            if found_label_comp_id_column:

                label_comp_id_column -= 1

            if found_label_asym_id_column:

                label_asym_id_column -= 1

            line = CIF.readline()

        if not found_auth_seq_id_column:

            auth_seq_id_column = label_seq_id_column

        if not found_auth_comp_id_column:

            auth_comp_id_column = label_comp_id_column

        if not found_auth_asym_id_column:

            auth_asym_id_column = label_asym_id_column

        FIRST = []
        while line.strip() != "":

            words = line.split()
            if words[0] == "ATOM" and words[1].isnumeric():

                number_of_columns = len(words)

                # extract atom data
                pos = int(words[auth_seq_id_column])
                res = words[auth_comp_id_column]
                chain = words[auth_asym_id_column]

                # if the HETATM is not at the end of the chain we add it to the sequence
                if chain == current_chain:

                    self.ATOM_withoutX[chain] += hetatm_withoutX

                else:

                    current_chain = chain

                hetatm_withoutX = ""

                # convert residue to one letter
                if res in conversion_table:

                    oneLetter = conversion_table[res]

                else:

                    oneLetter = UnknownChar

                # check if we reached a new residue
                if not chain in FIRST:

                    FIRST.append(chain)
                    last_pos = pos
                    self.ATOM_withoutX[chain] = oneLetter

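                elif pos != last_pos:

                    # remember the residue here as well; chains other than the query skip
                    # the update further down and would otherwise add one letter per atom
                    last_pos = pos
                    self.ATOM_withoutX[chain] += oneLetter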

                else:

                    line = CIF.readline()
                    continue

                # if the chain is not the query we only extract the sequence
                if query_chain != chain:

                    #hetatm = ""
                    line = CIF.readline()
                    continue
                """
                else:

                    self.ATOM += hetatm
                    hetatm = ""
                """


                # writing atom position file

                self.positions.update(hetatm_pos)
                if hetatm_max_res_details > self.max_res_details:

                    self.max_res_details = hetatm_max_res_details

                hetatm_pos = {}
                hetatm_max_res_details = 0

                #CORR.write(hetatm_pos)
                #hetatm_pos = ""


                fas_pos += 1

                res_details = "%s:%d:%s" %(res, pos, chain)
                self.positions[fas_pos] = res_details
                if len(res_details) > self.max_res_details:

                    self.max_res_details = len(res_details)

                #CORR.write("%s\t%d\t%s\n" %(res, fas_pos, pos))

                # check type
                if self.type == "":

                    if len(res) < 3:

                        self.type = "Nuc"

                    elif len(res) == 3:

                        self.type = "AA"
                """
                if FIRST[chain]:

                    FIRST[chain] = False

                elif last_pos < pos:

                    while pos != last_pos + 1: # For Disorder regions

                        self.ATOM += UnknownChar
                        last_pos += 1

                self.ATOM += oneLetter
                """
                last_pos = pos

            elif words[0] == "HETATM" and words[1].isnumeric():


                # extract atom data
                pos = int(words[auth_seq_id_column])
                res = words[auth_comp_id_column]
                chain = words[auth_asym_id_column]


                # convert residue to one letter
                if res in conversion_table:

                    oneLetter = conversion_table[res]

                else:

                    oneLetter = UnknownChar

                # check if we reached a new residue
                if not chain in FIRST:

                    FIRST.append(chain)
                    last_pos = pos
                    self.ATOM_withoutX[chain] = ""
                    hetatm_withoutX = oneLetter

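                elif pos != last_pos:

                    # remember the residue here as well; chains other than the query skip
                    # the update further down and would otherwise add one letter per atom
                    last_pos = pos
                    hetatm_withoutX += oneLetter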

                else:

                    line = CIF.readline()
                    continue

                # if the chain is not the query we only extract the sequence
                if query_chain != chain:

                    line = CIF.readline()
                    continue

                # writing atom position file
                fas_pos += 1

                res_details = "%s:%d:%s" %(res, pos, chain)
                hetatm_pos[fas_pos] = res_details
                if len(res_details) > hetatm_max_res_details:

                    hetatm_max_res_details = len(res_details)

                #hetatm_pos += "%s\t%d\t%s\n" %(res, fas_pos, pos)

                # check type
                if self.type == "":

                    if len(res) < 3:

                        self.type = "Nuc"

                    elif len(res) == 3:

                        self.type = "AA"
                """
                if FIRST[chain]:

                    FIRST[chain] = False

                elif last_pos < pos:

                    while pos != last_pos + 1: # For Disorder regions

                        hetatm += UnknownChar
                        last_pos += 1

                hetatm += oneLetter
                """
                last_pos = pos

            line = CIF.readline()

        CIF.close()
        #CORR.close()

        self.auth_seq_id_column = number_of_columns + auth_seq_id_column
        self.auth_comp_id_column = number_of_columns + auth_comp_id_column
        self.auth_asym_id_column = number_of_columns + auth_asym_id_column
        self.B_iso_or_equiv = number_of_columns + B_iso_or_equiv
        LOG.write("residue number column - %s\nresidue name column - %s\nchain id column - %s\nb-factor column - %s\n" %(self.auth_seq_id_column + 1, self.auth_comp_id_column + 1, self.auth_asym_id_column + 1, self.B_iso_or_equiv + 1))

        return 1


    def get_max_res_details(self):

        return self.max_res_details

    def get_columns(self):

        return self.auth_seq_id_column, self.auth_comp_id_column, self.auth_asym_id_column, self.B_iso_or_equiv

    def get_type(self):

        return self.type

    def get_SEQRES(self):

        return self.SEQRES

    def get_ATOM_withoutX(self):

        return self.ATOM_withoutX

    def get_MODIFIED_COUNT(self):


        return self.MODIFIED_COUNT



    def get_MODIFIED_LIST(self):

        return self.MODIFIED_LIST


    def get_positions(self):

        return self.positions
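

# Illustrative sketch, not called by the pipeline: how cifParser.read turns the
# order of the "_atom_site." header tags into column indices. A counter is
# decremented on its own tag line and on every later header line, which yields
# a negative index counted from the end of a data row; adding the number of
# columns (as done at the end of read) gives the zero-based column.
# The helper name and the sample tags in the usage note are hypothetical.
def _sketch_negative_column_offset(header_tags, wanted_tag):

    offset = 0
    found = False
    for tag in header_tags:

        if tag == wanted_tag:

            found = True

        if found:

            offset -= 1

    # 0 means the tag was not present in the header
    return offset

# Usage, assuming a four-column header:
#   tags = ["_atom_site.group_PDB", "_atom_site.id", "_atom_site.label_comp_id", "_atom_site.B_iso_or_equiv"]
#   _sketch_negative_column_offset(tags, "_atom_site.label_comp_id") gives -2,
#   so "ATOM 1 ALA 12.50".split()[-2] is the residue name.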



bayesInterval = 3
ColorScale = {0 : 9, 1 : 8, 2 : 7, 3 : 6, 4 : 5, 5 : 4, 6 : 3, 7 : 2, 8 : 1}



def insert_spaces(original, spaced_lines, B_iso_or_equiv):

    # insert extra spaces after the tmp column, to allow room for the rate4site scores.

    try:

        READ_PDB = open(original, 'r')

    except:

        return("insert_spaces: Can't open '" + original + "' for reading.")

    lines = READ_PDB.readlines()
    READ_PDB.close()


    # each ATOM row is split right after the b-factor column (zero-based index B_iso_or_equiv)
    row_pattern = re.compile(r'^((\S+\s+){'+ str(B_iso_or_equiv) + r'}\S+)\s+(\S.*)')

    max_start = 0
    max_end = 0
    for line in lines:

        line = line.strip()
        if re.match(r'^ATOM', line):

            match = row_pattern.match(line)
            if match is None:

                # no field follows the b-factor column; the row is copied unchanged below
                continue

            max_start = max(max_start, len(match.group(1)))
            max_end = max(max_end, len(match.group(3)))

    for line in lines:

        match = None
        if re.match(r'^ATOM', line):

            match = row_pattern.match(line.strip())

        if match is not None:

            # pad both halves so that the widened b-factor column lines up in every row
            line_start = match.group(1).ljust(max_start)
            line_end = match.group(3).rjust(max_end)
            spaced_lines.append(line_start + "     " + line_end + "\n")

        else:

            spaced_lines.append(line)
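

# Illustrative sketch, not called by the pipeline: the regular expression used in
# insert_spaces splits a whitespace-separated row right after the column with the
# given zero-based index. The helper name and the sample row are hypothetical.
import re

def _sketch_split_after_column(row, column_index):

    match = re.match(r'^((\S+\s+){' + str(column_index) + r'}\S+)\s+(\S.*)', row.strip())
    if match is None:

        # the row has fewer than column_index + 2 fields
        return None

    # (everything up to and including the column, everything after it)
    return match.group(1), match.group(3)

# Usage: _sketch_split_after_column("ATOM 1 N MET 17.93 A 1", 4)
# gives ("ATOM 1 N MET 17.93", "A 1"); a row that ends at the chosen column gives None.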




def check_if_rate4site_failed(r4s_log):

    res_flag = vars['r4s_out']
    if not os.path.exists(res_flag) or os.path.getsize(res_flag) == 0: # 1

        LOG.write("check_if_rate4site_failed : the file " + res_flag + " either does not exist or is empty. \n")
        return True

    try:

        R4S_RES = open(res_flag, 'r')

    except:

        LOG.write("check_if_rate4site_failed : can not open file: " + res_flag + ". aborting.\n")
        return True

    error = False
    line = R4S_RES.readline()
    while line != "":

        if "likelihood of pos was zero" in line:

            LOG.write("check_if_rate4site_failed : the line: \"likelihood of pos was zero\" was found in %s.\n" %r4s_log)
            error = True
            break

        if re.match(r'rate of pos\:\s\d\s=', line):

            # output found
            break

        if "Bad format in tree file" in line:

            exit_on_error('user_error', "check_if_rate4site_failed : There is an error in the tree file format. Please check that your tree is in the <a href = \"" + GENERAL_CONSTANTS.CONSURF_TREE_FAQ + "\">requested format</a> and reupload it to the server.<br>")

        line = R4S_RES.readline()

    R4S_RES.close()
    return error



"""
def check_validity_tree_file(nodes):

	# checks validity of tree file and returns an array with the names of the nodes
    try:

        TREEFILE = open(form['tree_name'], 'r')

    except:

        exit_on_error('sys_error', "check_validity_tree_file : can't open the file " + form['tree_name'] + " for reading.")

    tree = TREEFILE.read()
    TREEFILE.close()
    tree.replace("\n", "")
    if tree[-1] != ';':

	    tree += ';'

    try:

        TREEFILE = open(vars['working_dir'] + vars['tree_file'], 'w')

    except:

        exit_on_error('sys_error', "check_validity_tree_file : can't open the file " + vars['working_dir'] + vars['tree_file'] + " for writing.")

    TREEFILE.write(tree)
    TREEFILE.close()

    leftBrackets = 0
    rightBrackets = 0
    noRegularFormatChar = ""
    #nodes = []
    node_name = ""
    in_node_name = False
    in_node_score = False
    for char in tree:

        if char == ':':

            if in_node_name:

                nodes.append(node_name)

            node_name = ""
            in_node_name = False
            in_node_score = True

        elif char == '(':

            leftBrackets += 1

        elif char == ')':

            rightBrackets += 1
            in_node_score = False

        elif char == ',':

            in_node_score = False

        elif char != ';':

            if char in "!@#$^&*~`{}'?<>" and not char in noRegularFormatChar:

                noRegularFormatChar += " '" + char + "', "

            if not in_node_score:

                node_name += char
                in_node_name = True

    if leftBrackets != rightBrackets:

        msg = "The uploaded tree file, which appears to be in Newick format, is missing parentheses."
        exit_on_error('user_error', msg)

    if noRegularFormatChar != "":

        msg = "The uploaded tree file, which appears to be in Newick format, contains the following non-standard characters: " + noRegularFormatChar[:-2]
        exit_on_error('user_error', msg)

    LOG.write("check_validity_tree_file : tree is valid\n")

    #return nodes
"""




def get_query_seq_in_MSA():

    # returns the query sequence, with gaps, as it appears in the MSA
    try:

        FASTA = open(vars['msa_fasta'], 'r')

    except:

        exit_on_error('sys_error', "get_query_seq_in_MSA : can't open the file " + vars['msa_fasta'] + " for reading.")

    found = False
    seq = ""
    line = FASTA.readline()
    while line != "":

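        words = line.split()
        if not words:

            # skip blank lines instead of failing on an empty split
            line = FASTA.readline()
            continue

        first_word = words[0]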
        if found:

            if first_word[0] == '>':

                break

            else:

                seq += first_word

        elif first_word == ">" + vars['msa_SEQNAME']:

            found = True

        line = FASTA.readline()

    FASTA.close()
    return seq
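

# Illustrative sketch, not called by the pipeline: the same record lookup as
# get_query_seq_in_MSA, but on a list of lines instead of the file named in
# vars['msa_fasta']. The helper name and its arguments are hypothetical.
def _sketch_fasta_record(lines, name):

    found = False
    seq = ""
    for line in lines:

        fields = line.split()
        if not fields:

            # blank line
            continue

        if fields[0].startswith(">"):

            if found:

                # the next header ends the wanted record
                break

            found = fields[0] == ">" + name

        elif found:

            seq += fields[0]

    return seq

# Usage: _sketch_fasta_record([">q desc\n", "AC-G\n", "T-\n", ">other\n", "GGGG\n"], "q")
# gives "AC-GT-"; gaps are kept so that positions still refer to MSA columns.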

def get_positions_in_MSA():

    # returns the positions in the MSA where the query is not a gap
    seq = get_query_seq_in_MSA()
    positions = get_seq_legal_positions(seq)
    return positions

def get_seq_legal_positions(seq):

    # returns an array with the positions of the legal chars
    positions = []
    if form['DNA_AA'] == "AA":

        legal_chars = ["A", "C", "D", "E", "F", "G", "H", "I", "K", "L", "M", "N", "P", "Q", "R", "S", "T", "V", "W", "Y", "X"]

    else:

        legal_chars = ["A", "C", "G", "T", "U", "N"]

    for i in range(len(seq)):

        if seq[i].upper() in legal_chars:

            positions.append(i)

    return positions
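

# Illustrative sketch, not called by the pipeline: get_seq_legal_positions keeps the
# indices of a gapped sequence whose character, upper-cased, is in the alphabet.
# The helper name is hypothetical and the alphabet is passed in instead of being
# chosen from form['DNA_AA'].
def _sketch_legal_positions(gapped_seq, legal_chars):

    return [i for i, char in enumerate(gapped_seq) if char.upper() in legal_chars]

# Usage: _sketch_legal_positions("A-cX-G", set("ACGTUN")) gives [0, 2, 5];
# the gaps and the X (not a nucleotide) are left out.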



def write_MSA_percentage_file():

    # writes a file with the percentage of each acid in each position

    #index_of_pos = get_positions_in_MSA()
    percentage_per_pos = [] # percentage of each acid in each position in the MSA
    #unknown_per_pos = [] # percentage of unknown acid in each position in the MSA
    number_of_positions = len(vars['protein_seq_string'])
    #number_of_positions = len(index_of_pos)
    query_with_gaps = get_query_seq_in_MSA() # query sequence with gaps

    if form['DNA_AA'] == "AA":

        acids = ["A", "C", "D", "E", "F", "G", "H", "I", "K", "L", "M", "N", "P", "Q", "R", "S", "T", "V", "W", "Y", "X"]
        unknown = "X"

    else:

        acids = ["A", "C", "G", "T", "U", "N"]
        unknown = "N"

    # one counter dictionary per position of the query sequence
    for i in range(number_of_positions):

        percentage_per_pos.append({})
        #unknown_per_pos.append(0)

    try:

        FASTA = open(vars['msa_fasta'], 'r')

    except:

        exit_on_error('sys_error', "write_MSA_percentage_file : can't open the file " + vars['msa_fasta'] + " for reading.")

    try:

        PRECENTAGE_FILE = open(vars['Msa_percentageFILE'], 'w')

    except:

        exit_on_error('sys_error', "write_MSA_percentage_file : can't open the file " + vars['Msa_percentageFILE'] + " for writing.")

    # we find the percentage of each acid in each position
    seq = ""
    first = True
    line = FASTA.readline()
    while True:

        if line == "" or line[0] == '>':

            if first:

                first = False

            else:

                #pos = 0
                #for i in index_of_pos:
                i = 0
                for pos in range(number_of_positions):

                    while i < len(query_with_gaps) and query_with_gaps[i] == '-':

                        i += 1

                    char = seq[i]
                    if char in  acids:

                        if char in percentage_per_pos[pos]:

                            percentage_per_pos[pos][char] += 1

                        else:

                            percentage_per_pos[pos][char] = 1

                    """
                    elif char != '-':

                        unknown_per_pos[pos] += 1
                    """
                    i += 1

                seq = ""

            # this is for the last sequence in the msa
            if line == "":

                break

        else:

            seq += line.strip().upper()

        line = FASTA.readline()

    FASTA.close()

    # sort dictionaries
    for pos in range(number_of_positions):

        # we sort the amino acids but not the unknown character
        unknown_percent = percentage_per_pos[pos].pop(unknown, None)
        percentage_per_pos[pos] = dict(sorted(percentage_per_pos[pos].items(), key=lambda item: item[1], reverse=True))
        if unknown_percent:

            percentage_per_pos[pos][unknown] = unknown_percent

    # calculate the percentage
    for pos in range(number_of_positions):

        # "total" instead of "sum", so that the built-in function is not shadowed
        total = 0.0
        for char in percentage_per_pos[pos]:

            total += percentage_per_pos[pos][char]

        #total += unknown_per_pos[pos]

        for char in percentage_per_pos[pos]:

            percentage_per_pos[pos][char] = 100 * (percentage_per_pos[pos][char] / total)

        #unknown_per_pos[pos] = 100 * (unknown_per_pos[pos] / sum)

    # we write the file
    PRECENTAGE_FILE.write("\"The table details the residue variety in % for each position in the query sequence.\"\n\"Each column shows the % for that amino-acid, found in position ('pos') in the MSA.\"\n\"In case there are residues which are not a standard amino-acid in the MSA, they are represented under column 'OTHER'\"\n\npos,A,C,D,E,F,G,H,I,K,L,M,N,P,Q,R,S,T,V,W,Y,OTHER,MAX ACID,ConSurf grade\n\n")
    pos = 0
    for i in range(number_of_positions):

        if vars['protein_seq_string'][i] == unknown:

            continue

        # position
        PRECENTAGE_FILE.write("%d" %(pos + 1))

        # known acids
        for char in acids:

            PRECENTAGE_FILE.write(",")
            if char in percentage_per_pos[pos]:

                PRECENTAGE_FILE.write("%.3f" %percentage_per_pos[pos][char])

            else:

                PRECENTAGE_FILE.write("0")

        # unknown acids
        #PRECENTAGE_FILE.write(",%.3f" %unknown_per_pos[pos])

        # max acid
        keys = list(percentage_per_pos[pos].keys())
        max_acid = ",%s %.3f" %(keys[0], percentage_per_pos[pos][keys[0]])
        PRECENTAGE_FILE.write(max_acid.rstrip("0").rstrip("."))

        # ConSurf grade
        PRECENTAGE_FILE.write(",%s\n" %vars['gradesPE_Output'][pos]['COLOR'])

        pos += 1

    PRECENTAGE_FILE.close()

    vars['percentage_per_pos'] = percentage_per_pos
    #vars['unknown_per_pos'] = unknown_per_pos


    if form['DNA_AA'] != "AA":

        vars['B/E'] = False
        return

    vars['B/E'] = True
    buried_exposed = []
    reveal_buried_exposed(buried_exposed)

    pos = 0
    for element in vars['gradesPE_Output']:

        while pos < number_of_positions and vars['protein_seq_string'][pos] == unknown:

            pos += 1

        element['B/E'] = buried_exposed[pos]
        pos += 1
        if element['B/E'] == "e":

            if element['COLOR'] == 9 or element['COLOR'] == 8:

                element['F/S'] = "f"

            else:

                element['F/S'] = " "

        elif element['COLOR'] == 9:

            element['F/S'] = "s"

        else:

            element['F/S'] = " "
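

# Illustrative sketch, not called by the pipeline: the counting and normalisation that
# write_MSA_percentage_file applies to one MSA column. Characters outside the alphabet
# (gaps, for instance) are ignored and the result is ordered from the most to the least
# frequent character. Unlike the routine above, the sketch does not move the unknown
# character to the end. The helper name and its arguments are hypothetical.
def _sketch_column_percentages(column_chars, alphabet):

    counts = {}
    for char in column_chars:

        char = char.upper()
        if char in alphabet:

            counts[char] = counts.get(char, 0) + 1

    total = float(sum(counts.values()))
    ordered = sorted(counts.items(), key=lambda item: item[1], reverse=True)

    # an empty column gives an empty dictionary, so there is no division by zero
    return {char: 100 * count / total for char, count in ordered}

# Usage: _sketch_column_percentages(["A", "a", "-", "G", "A"], "ACGT")
# gives {"A": 75.0, "G": 25.0}; the first key is the "MAX ACID" of the column.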


def database_address(name):

    parts = name.split('|')
    if parts[0] == "ur":

        return("https://www.uniprot.org/uniref/UniRef90_" + parts[1])

    elif parts[0] == "up":

        return("https://www.uniprot.org/uniprot/" + parts[1])

    else:

        return("https://www.ncbi.nlm.nih.gov/nuccore/" + parts[1])

def get_details(first_line, second_line, third_line):

    query_seq = (first_line.split())[2]
    db_seq = (third_line.split())[2]

    gaps = 0
    positives = 0
    identities = 0
    length = len(query_seq)

    for char in query_seq:

        if char == "-":

            gaps += 1

    for char in db_seq:

        if char == "-":

            gaps += 1

    for char in second_line:

        if char == "+":

            positives += 1

        if char.isalpha():

            identities += 1

    details = ""

    if gaps != 0:

        details += ", Gaps = %d/%d (%d%%)" %(gaps, length, int((float(gaps) / length) * 100))

    else:

        details += ", Gaps = 0"

    if positives != 0:

        details += ", Positives = %d/%d (%d%%)" %(positives, length, int((float(positives) / length) * 100))

    else:

        details += ", Positives = 0"

    if identities != 0:

        details += ", Identities = %d/%d (%d%%)" %(identities, length, int((float(identities) / length) * 100))

    else:

        details += ", Identities = 0"

    return details


def replace_TmpFactor_Rate4Site_Scores_CIF(PdbFile, chain, HashRef, Out):

    # replace the tempFactor (B_iso_or_equiv) column in the mmCIF file

    [auth_seq_id_column, auth_comp_id_column, auth_asym_id_column, B_iso_or_equiv] = vars['pdb_object'].get_columns()

    spaced_lines = []
    insert_spaces(PdbFile, spaced_lines, B_iso_or_equiv)

    # write the PDB file and replace the tempFactor column
    # with the new one.
    try:

        WRITE_PDB = open(Out, 'w')

    except:

        exit_on_error('sys_error', "replace_TmpFactor_Rate4Site_Scores_CIF: Can't open '" + Out + "' for writing.")

    for line in spaced_lines:

        if re.match(r'^ATOM', line):

            words = line.split()
            PDBchain = words[auth_asym_id_column]
            ResNum = words[auth_seq_id_column]
            if PDBchain == chain and ResNum in HashRef:

                score = HashRef[ResNum]
                match = re.match(r'^((\S+\s+){'+ str(B_iso_or_equiv) + r'})(\S+\s+)(.+)', line)
                length_temp_fact = len(match.group(3))
                while len(score) < length_temp_fact:

                    score = score + " "

                WRITE_PDB.write(match.group(1) + score + match.group(4) + "\n")

            else:

                WRITE_PDB.write(line)

        else:

            WRITE_PDB.write(line)

    WRITE_PDB.close()



def replace_TmpFactor_Rate4Site_Scores_PDB(PdbFile, chain, HashRef, Out):

    # replace the tempFactor column in the PDB file

    # read the PDB file to an array
    try:

        READ_PDB = open(PdbFile, 'r')

    except:

        exit_on_error('sys_error', "replace_TmpFactor_Rate4Site_Scores_PDB : Can't open '" + PdbFile + "' for reading.")

    # write the PDB file and replace the tempFactor column
    # with the new one.
    try:

        WRITE_PDB = open(Out, 'w')

    except:

        exit_on_error('sys_error', "replace_TmpFactor_Rate4Site_Scores_PDB : Can't open '" + Out + "' for writing.")

    line = READ_PDB.readline()
    while line != "":

        if re.match(r'^ATOM', line):

            PDBchain = line[21:22]
            ResNum = (line[22:27]).strip()

            if PDBchain == chain and ResNum in HashRef:

                while len(HashRef[ResNum]) < 6:

                    HashRef[ResNum] = " " + HashRef[ResNum]

                WRITE_PDB.write(line[:60] + HashRef[ResNum] + line[66:])

            else:

                WRITE_PDB.write(line[:60] + "      " + line[66:])

        else:

            WRITE_PDB.write(line)

        line = READ_PDB.readline()

    READ_PDB.close()
    WRITE_PDB.close()
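

# Illustrative sketch, not called by the pipeline: in the fixed-width PDB format the
# tempFactor field occupies columns 61-66, i.e. the slice [60:66] of an ATOM line, which
# is why the routine above writes line[:60] + score + line[66:]. The score is right
# aligned to six characters; a longer value would widen the line, as it would above.
# The helper name is hypothetical.
def _sketch_set_pdb_tempfactor(atom_line, value):

    return atom_line[:60] + value.rjust(6) + atom_line[66:]

# Usage: for a line whose slice [60:66] is "  9.67", _sketch_set_pdb_tempfactor(line, "8")
# puts "     8" in its place and leaves the rest of the line untouched.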




def read_Rate4Site_gradesPE(gradesPE_file, gradesPE_hash_ref):

    # the routine matches each position in the gradesPE file with its Rate4Site grade.

    try:

        GRADES = open(gradesPE_file, 'r')

    except:

        exit_on_error("sys_error", "read_Rate4Site_gradesPE : Can't open '" + gradesPE_file + "' for reading.")

    line = GRADES.readline()
    while line != "":

        if re.match(r'^\s*\d+\s+\w', line):

            grades = line.split()
            #ResNum = re.sub(r'[a-z]', "", grades[2], flags=re.IGNORECASE)
            if grades[2] != "-":

                grades[2] = (grades[2]).split(":")[1]

            gradesPE_hash_ref[grades[2]] = grades[3]

        line = GRADES.readline()

    GRADES.close()
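

# Illustrative sketch, not called by the pipeline: the third field of a gradesPE row is
# either "-" or a label built as "residue:position:chain" (see the res_details strings
# written by the parsers). The routine above keeps only the position part as the
# dictionary key. The helper name is hypothetical.
def _sketch_residue_number(label):

    if label == "-":

        # no matching residue in the structure
        return label

    return label.split(":")[1]

# Usage: _sketch_residue_number("ALA:12:A") gives "12", a string, which is the form in
# which the residue number is later looked up in the structure file.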




def read_ConSurf_gradesPE(gradesPE_file, gradesPE_hash_ref, gradesPE_ISD_hash_ref):

    # the routine matches each position in the gradesPE file with its ConSurf grade. A grade marked with * (insufficient data) is kept as is in gradesPE_hash_ref and stored as "10" in the separate gradesPE_ISD_hash_ref.
    # the routine returns True if a * was found and False otherwise

    insufficient = False

    try:

        GRADES = open(gradesPE_file, 'r')

    except:

        exit_on_error('sys_error', "read_ConSurf_gradesPE can't open '" + gradesPE_file + "' for reading.")

    line = GRADES.readline()
    while line != "":

        if re.match(r'^\s*\d+\s+\w', line):

            grades = line.split()
            if grades[2] != "-":

                grades[2] = ((grades[2]).split(':'))[1]

            #grades[2] = re.sub(r'[a-z]', "", grades[2], flags=re.IGNORECASE)
            if re.match(r'\d\*?', grades[4]):

                # if it is insufficient color - we set its grade to "10" in the ISD hash, which marks insufficient data (colored yellow)
                if re.match(r'(\d)\*', grades[4]):

                    gradesPE_hash_ref[grades[2]] = grades[4]
                    gradesPE_ISD_hash_ref[grades[2]] = "10"
                    insufficient = True

                else:

                    gradesPE_hash_ref[grades[2]] = grades[4]
                    gradesPE_ISD_hash_ref[grades[2]] = grades[4]

        line = GRADES.readline()

    GRADES.close()

    return(insufficient)
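

# Illustrative sketch, not called by the pipeline: a ConSurf color token is one digit,
# optionally followed by "*" when the grade rests on insufficient data. The routine above
# stores the token unchanged in one dictionary and "10" in the ISD dictionary; this sketch
# only shows how the two parts of the token can be told apart. The helper name is
# hypothetical.
import re

def _sketch_split_grade(color_token):

    match = re.match(r'(\d)(\*?)', color_token)
    if match is None:

        # not a grade
        return None

    # (digit, True when the grade is marked as insufficient data)
    return match.group(1), match.group(2) == "*"

# Usage: _sketch_split_grade("3*") gives ("3", True) and _sketch_split_grade("9")
# gives ("9", False).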



def replace_TmpFactor_Consurf_Scores_CIF(atom_grades, query_chain, pdb_file, prefix):

    # Creates The ATOM section with ConSurf grades instead of the TempFactor column, creates PDB file with ConSurf grades

    pdb_with_grades = prefix + "_ATOMS_section_With_ConSurf.cif"
    pdb_with_grades_isd = prefix + "_ATOMS_section_With_ConSurf_isd.cif"
    pdb_with_scores = prefix + "_With_Conservation_Scores.cif"

    [auth_seq_id_column, auth_comp_id_column, auth_asym_id_column, B_iso_or_equiv] = vars['pdb_object'].get_columns()

    try:

        PDB = open(pdb_file, 'r')

    except:

        exit_on_error('sys_error', "could not open file '" + pdb_file + "' for reading.\n")

    try:

        GRADES = open(pdb_with_grades, 'w')

    except:

        exit_on_error('sys_error', "could not open the file '" + pdb_with_grades + "' for writing.\n")

    try:

        SCORES = open(pdb_with_scores, 'w')

    except:

        exit_on_error('sys_error', "could not open the file '" + pdb_with_scores + "' for writing.\n")

    if vars['insufficient_data']:

        try:

            GRADES_ISD = open(pdb_with_grades_isd, 'w')

        except:

            exit_on_error('sys_error', "could not open the file '" + pdb_with_grades_isd + "' for writing.\n")

    line = PDB.readline()
    while line != "":

        if line[:4] == "ATOM" or line[:6] == "HETATM":

            words = line.split()
            chain = words[auth_asym_id_column]
            residue_number = words[auth_seq_id_column]

            grade = "0"
            score = "0"
            grade_isd = "0"

            if residue_number in atom_grades and chain == query_chain:

                # getting the grade

                [grade, isd, score] = atom_grades[residue_number]

                if vars['insufficient_data']:

                    if isd == 1:

                        grade_isd = "10"

                    else:

                        grade_isd = grade

            match = re.match(r'^((\S+\s+){' + str(B_iso_or_equiv) + r'})(\S+\s+)(.+)', line)
            length_temp_fact = len(match.group(3))
            line_start = match.group(1)
            line_end = match.group(4)
            # pad the values with spaces to the width of the original temperature factor field
            grade = grade.ljust(length_temp_fact)
            score = score.ljust(length_temp_fact)
            grade_isd = grade_isd.ljust(length_temp_fact)

            GRADES.write(line_start + grade + line_end + "\n")
            SCORES.write(line_start + score + line_end + "\n")
            if vars['insufficient_data']:

                GRADES_ISD.write(line_start + grade_isd + line_end + "\n")


        else:

            GRADES.write(line)
            SCORES.write(line)
            if vars['insufficient_data']:

                GRADES_ISD.write(line)

        line = PDB.readline()

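    PDB.close()
    GRADES.close()
    SCORES.close()
    if vars['insufficient_data']:

        # close (and flush) the ISD file before it is displayed
        GRADES_ISD.close()
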
    vars['zip_list'].append(pdb_with_grades)
    vars['zip_list'].append(pdb_with_scores)
    if vars['insufficient_data']:

        show_py3dmol(pdb_with_grades_isd, "cif")
        print_instructions(pdb_with_grades, "CIF", pdb_with_grades_isd)

    else:

        show_py3dmol(pdb_with_grades, "cif")
        print_instructions(pdb_with_grades, "CIF")


def replace_TmpFactor_Consurf_Scores_PDB(atom_grades, query_chain, pdb_file, prefix):

    # Creates The ATOM section with ConSurf grades instead of the TempFactor column, creates PDB file with ConSurf grades


    pdb_with_grades = prefix + "_ATOMS_section_With_ConSurf.pdb"
    pdb_with_grades_isd = prefix + "_ATOMS_section_With_ConSurf_isd.pdb"
    pdb_with_scores = prefix + "_With_Conservation_Scores.pdb"

    try:

        PDB = open(pdb_file, 'r')

    except:

        exit_on_error('sys_error', "replace_TmpFactor_Consurf_Scores_PDB : could not open file '" + pdb_file + "' for reading.\n")

    try:

        GRADES = open(pdb_with_grades, 'w')

    except:

        exit_on_error('sys_error', "replace_TmpFactor_Consurf_Scores_PDB : could not open the file '" + pdb_with_grades + "' for writing.\n")

    try:

        SCORES = open(pdb_with_scores, 'w')

    except:

        exit_on_error('sys_error', "replace_TmpFactor_Consurf_Scores_PDB : could not open the file '" + pdb_with_scores + "' for writing.\n")

    if vars['insufficient_data']:

        try:

            GRADES_ISD = open(pdb_with_grades_isd, 'w')

        except:

            exit_on_error('sys_error', "replace_TmpFactor_Consurf_Scores_PDB : could not open the file '" + pdb_with_grades_isd + "' for writing.\n")

    line = PDB.readline()
    while line != "":

        if line.strip() == "":

            # line is empty
            line = PDB.readline()
            continue

        if line[:4] == "ATOM" or line[:6] == "HETATM":

            chain = line[21:22]
            if chain == " ":

                chain = "NONE"

            residue = (line[22:27]).strip()
            if residue in atom_grades and chain == query_chain:

                [grade, isd, score] = atom_grades[residue]
                while len(score) < 6:

                    score = " " + score

                # the TF is updated with the grades and scores
                GRADES.write(line[:60] + "     " + grade + "      \n")
                SCORES.write(line[:60] + score + line[66:])

                if vars['insufficient_data']:

                    # the TF is updated with the number from gradesPE showing isd
                    if isd == 1:

                        GRADES_ISD.write(line[:60] + "    10      \n")

                    else:

                        GRADES_ISD.write(line[:60] + "     " + grade + "      \n")

            else:

                GRADES.write(line[:60] + "            \n")
                SCORES.write(line[:60] + "            \n")
                if vars['insufficient_data']:

                    GRADES_ISD.write(line[:60] + "            \n")

        else:

            GRADES.write(line)
            SCORES.write(line)
            if vars['insufficient_data']:

                GRADES_ISD.write(line)

        line = PDB.readline()

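    PDB.close()
    GRADES.close()
    SCORES.close()
    if vars['insufficient_data']:

        # close (and flush) the ISD file before it is displayed
        GRADES_ISD.close()
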
    vars['zip_list'].append(pdb_with_grades)
    vars['zip_list'].append(pdb_with_scores)
    if vars['insufficient_data']:

        show_py3dmol(pdb_with_grades_isd, "pdb")
        print_instructions(pdb_with_grades, "PDB", pdb_with_grades_isd)

    else:

        show_py3dmol(pdb_with_grades, "pdb")
        print_instructions(pdb_with_grades, "PDB")
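

# Illustrative sketch, not part of the original pipeline: writing a value into
# the temperature factor field of a fixed-width PDB record. The field occupies
# columns 61-66 (string indices 60 to 65), which is why the code above slices at
# line[:60] and line[66:]. The helper name _set_pdb_bfactor_sketch is
# hypothetical. Unlike the grades writer above, it keeps the rest of the record.
def _set_pdb_bfactor_sketch(line, value):

    # make sure the record is long enough to hold the temperature factor field
    body = line.rstrip("\n").ljust(66)

    # right-justify the value in the 6-character field
    field = str(value).rjust(6)[:6]

    return body[:60] + field + body[66:] + "\n"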

def design_string_with_spaces_for_pipe(part_input):

    if part_input.strip() == "":

        return ""

    words = part_input.split()
    newPart = "! \"" +words[0]
    part = ""
    for word in words[1:]:

        # if adding another word to the string would yield a line that is too long - we cut it.
        if len(word) + 1 + len(newPart) > 76:

            part += newPart + " \" +\n"
            newPart = "! \"" + word

        else:

            newPart += " " + word

    part += newPart + "\" ;"

    return part



def add_pdb_data_to_pipe(pdb_file, pipe_file):

    # create the file to be shown using FGiJ. read the pdb file and concat header pipe to it.

    try:

        PIPE = open(pipe_file, 'a')

    except:

        exit_on_error('sys_error', "add_pdb_data_to_pipe: cannot open " + pipe_file + " for writing.")

    try:

        PDB_FILE = open(pdb_file, 'r')

    except:

        exit_on_error('sys_error', "add_pdb_data_to_pipe: cannot open the " + pdb_file + " for reading.")

    line = PDB_FILE.readline()
    while line != "":

        if not re.match(r'^HEADER', line):

            PIPE.write(line)

        line = PDB_FILE.readline()

    PIPE.close()
    PDB_FILE.close()



def print_selected(arr_ref, print_for_pipe):

    total_text = ""
    string = ""
    if print_for_pipe == "yes":

        string = "! select "

    else:

        string = "select "

    total_length = len(string)

    if len(arr_ref) > 0:

        for aa in arr_ref:

            aa = aa.replace(":", "")
            total_length += len(aa)
            if total_length > 80:

                if re.search(r', $', string):

                    string = string[:-2]

                total_text += string + "\n"
                if print_for_pipe == "yes":

                    string = "! select selected or %s, " %aa

                else:

                    string = "select selected or %s, " %aa

                total_length = len(string)

            else:

                string += aa + ", "
                total_length += 2

    else:

        total_text += string + "none"


    if re.search(r', $', string):

        string = string[:-2]
        total_text += string

    return total_text




def create_consurf_pipe_new(results_dir, IN_pdb_id_capital, chain, ref_header_title, final_pipe_file, identical_chains, partOfPipe, current_dir, run_number, msa_filename, query_name_in_msa = "", tree_filename = "", submission_time = "", completion_time = "", run_date = ""):


    # Create the pipe file for FGiJ

    if chain == 'NONE':

        chain = ""
        identical_chains = ""

    # read info from the pdb file
    [header_line, title_line] = ref_header_title

    if title_line == "":

        title_line = "! \"No title or compound description was found in the PDB file\";"

    else:

        title_line = design_string_with_spaces_for_pipe(title_line)

    # design the identical chains line
    identical_chains_line = "! consurf_identical_chains = \"%s\";" %identical_chains

    current_dir += "_sourcedir"
    # in case there is a source dir - we determine the var query_name_in_msa
    if os.path.exists(current_dir):

        try:

            SOURCEDIR = open(current_dir, 'r')

        except:

            exit_on_error('sys_error', "create_consurf_pipe : cannot open " + current_dir + " for reading.")

        match = re.match(r'(\d[\d\w]{3})\/(\w)', SOURCEDIR.readline())
        SOURCEDIR.close()
        if match:

            query_name_in_msa = match.group(1) + match.group(2)

    if query_name_in_msa == "":

        query_name_in_msa = IN_pdb_id_capital + chain.upper()

    # write to the pipe file
    try:

        PIPE_PART = open(partOfPipe, 'r')

    except:

        exit_on_error('sys_error', "create_consurf_pipe : cannot open " + partOfPipe + " for reading.")

    try:

        PIPE = open(final_pipe_file, 'w')

    except:

        exit_on_error('sys_error', "create_consurf_pipe : cannot open " + final_pipe_file + " for writing.")

    if header_line != "":

        PIPE.write(header_line + "\n")

    else:

        PIPE.write("HEADER                                 [THIS LINE ADDED FOR JMOL COMPATIBILITY]\n")

    PIPE.write("""!! ====== IDENTIFICATION SECTION ======
!js.init
! consurf_server = "consurf";
! consurf_version = "3.0";
! consurf_run_number = \"%s\";
! consurf_run_date = \"%s\";
! consurf_run_submission_time = \"%s\";
! consurf_run_completion_time = \"%s\";
!
! consurf_pdb_id = \"%s\";
! consurf_chain = \"%s\";
%s
! consurf_msa_filename = \"%s\";
! consurf_msa_query_seq_name = \"%s\";
! consurf_tree_filename = \"%s\";
!
""" %(run_number, run_date, submission_time, completion_time, IN_pdb_id_capital, chain, identical_chains_line, msa_filename, query_name_in_msa, tree_filename))

    titleFlag = 0
    line = PIPE_PART.readline()
    while line != "":

        if re.match(r'^~~~+', line):

            if titleFlag == 0:

                PIPE.write("! pipe_title = \"<i>ConSurf View:</i> %s chain %s.\"\n!! pipe_subtitle is from TITLE else COMPND\n!!\n! pipe_subtitle =\n%s\n" %(IN_pdb_id_capital, chain, title_line))
                titleFlag = 1

            elif chain != "":

                PIPE.write("! select selected and :%s\n" %chain)

            else:

                PIPE.write("! select selected and protein\n")

        else:

            PIPE.write(line)

        line = PIPE_PART.readline()

    PIPE_PART.close()
    PIPE.close()




def freq_array(isd_residue_color, no_isd_residue_color):

    # design the frequencies array

    consurf_grade_freqs_isd = "Array(" + str(len(isd_residue_color[10]))
    i = 1
    while i < 10:

        consurf_grade_freqs_isd += "," + str(len(isd_residue_color[i]))
        i += 1

    consurf_grade_freqs_isd += ")"

    consurf_grade_freqs = "Array(0"
    i = 1
    while i < 10:

        consurf_grade_freqs += "," + str(len(no_isd_residue_color[i]))
        i += 1

    consurf_grade_freqs += ")"

    return(consurf_grade_freqs_isd, consurf_grade_freqs)


def design_string_for_pipe(string_to_format):

    # takes a string aaaaa and returns it in this format: ! "aaa" +\n! "aa" ;

    part = string_to_format
    newPart = ""

    while len(part) > 73:

        newPart += "! \"" + part[:73] + "\" +\n"
        part = part[73:]

    newPart += "! \"" + part + "\" ;"

    return newPart
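

# Illustrative sketch, not part of the original pipeline: the same 73-character
# chunking as design_string_for_pipe, written with slicing and join. The helper
# name _chunk_for_pipe_sketch and its width parameter are hypothetical.
def _chunk_for_pipe_sketch(text, width=73):

    # split the text into pieces of at most `width` characters;
    # an empty text still yields one (empty) piece
    chunks = [text[i:i + width] for i in range(0, len(text), width)] or [""]

    # every piece but the last is continued with " +", the last one ends with " ;"
    lines = ['! "%s" +' % chunk for chunk in chunks[:-1]]
    lines.append('! "%s" ;' % chunks[-1])

    return "\n".join(lines)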


def extract_data_from_pdb(input_pdb_file):

    header = ""
    title = ""
    compnd = ""

    try:

        PDB = open(input_pdb_file, 'r')

    except:

        exit_on_error('sys_error', "extract_data_from_pdb : Could not open the file " + input_pdb_file + " for reading.")

    line = PDB.readline()
    while line != "":

        match1 = re.match(r'^HEADER', line)
        if match1:

            header = line.rstrip()

        else:

            match2 =re.match(r'^TITLE\s+\d*\s(.*)', line)
            if match2:

                title += match2.group(1) + " "

            else:

                match3 = re.match(r'^COMPND\s+\d*\s(.*)', line)
                if match3:

                    compnd += match3.group(1) + " "

                elif re.match(r'^SOURCE', line) or re.match(r'^KEYWDS', line) or re.match(r'^AUTHOR', line) or re.match(r'^SEQRES', line) or re.match(r'^ATOM', line):

                    break # no need to go over the whole pdb file

        line = PDB.readline()

    PDB.close()
    if title == "":

        return header, compnd

    else:

        return header, title


def create_part_of_pipe_new(pipe_file, unique_seqs, db, seq3d_grades_isd, seq3d_grades, length_of_seqres, length_of_atom, ref_isd_residue_color, ref_no_isd_residue_color, E_score, iterations, max_num_homol, MSAprogram, algorithm, matrix, Average_pairwise_distance, scale = "legacy"):

    # creating part of the pipe file, which contains all the non-unique information.
    # each chain will use this file to construct the final pdb_pipe file, to be viewed with FGiJ

    if scale == "legacy":

        scale_block = "!color color_grade0 FFFF96 insufficient data yellow\n!color color_grade1 10C8D1 turquoise variable\n!color color_grade2 8CFFFF\n!color color_grade3 D7FFFF\n!color color_grade4 EAFFFF\n!color color_grade5 FFFFFF\n!color color_grade6 FCEDF4\n!color color_grade7 FAC9DE\n!color color_grade8 F07DAB\n!color color_grade9 A02560 burgundy conserved"

    else:

        scale_block = "!color color_grade0 FFFF96 insufficient data yellow\n!color color_grade1 1b7837 variable\n!color color_grade2 5aae61\n!color color_grade3 a6dba0\n!color color_grade4 d9f0d3\n!color color_grade5 f7f7f7\n!color color_grade6 e7d4e8\n!color color_grade7 c2a5cf\n!color color_grade8 9970ab\n!color color_grade9 762a83 conserved"

    # design the seq3d to be printed out to the pipe file
    seq3d_grades_isd = design_string_for_pipe(seq3d_grades_isd)
    seq3d_grades = design_string_for_pipe(seq3d_grades)

    # creating the frequencies array, which corresponds to the number of residues in each grade
    [consurf_grade_freqs_isd, consurf_grade_freqs] = freq_array(ref_isd_residue_color, ref_no_isd_residue_color)

    # Taking Care of Strings
    if max_num_homol == "all":

        max_num_homol = "\"all\""

    # write to the pipe file
    try:

        PIPE = open(pipe_file, 'w')

    except:

        exit_on_error('sys_error', "create_part_of_pipe_new : cannot open the file " + pipe_file + " for writing.", 'PANIC')

    PIPE.write("""! consurf_psi_blast_e_value = %s;
! consurf_psi_blast_database = "%s";
! consurf_psi_blast_iterations = %s;
! consurf_max_seqs = %s;
! consurf_apd = %.2f;
! consurf_alignment = "%s";
! consurf_method = "%s";
! consurf_substitution_model =  "%s";
!
! consurf_seqres_length = %s;
! consurf_atom_seq_length = %s;
! consurf_unique_seqs = %s;
! consurf_grade_freqs_isd = %s;
! consurf_grade_freqs = %s;
!
! seq3d_grades_isd =
%s
!
! seq3d_grades =
%s
!
!
!! ====== CONTROL PANEL OPTIONS SECTION ======
!js.init
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
! pipe_title_enlarged = false;
! pipe_background_color = "white";
!
!! Specify the custom consurf control panel
!!
! pipe_cp1 = "consurf/consurf.htm";
!
!! If you want the frontispiece to be reset every time you enter this
!! page, use false. If this is a one-page presentation (no contents)
!! and you want to be able to return from QuickViews without resetting
!! the view, use true.
!!
! frontispiece_conditional_on_return = true;
!
!! Open the command input slot/message box to 30%% of window height.
!!
! pipe_show_commands = true;
! pipe_show_commands_pct = 30;
!
!! Don't show the PiPE presentation controls in the lower left frame.
!!
! pipe_hide_controls = true;
!
!! Hide development viewing mode links at the bottom of the control panel.
!!
! pipe_tech_info = false;
!
!! pipe_start_spinning = true; // default is PE's Preference setting.
!! top.nonStopSpin = true; // default: spinning stops after 3 min.
!!
!! ====== COLORS SECTION ======
!!
!color color_carbon C8C8C8
!color color_sulfur FFC832
!
!! Ten ConSurf color grades follow:
!!
%s
!
!
!! ====== SCRIPTS SECTION ======
!!----------------------------------------
!!
!spt #name=select_and_chain
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
!
!!----------------------------------------
!!
!spt #name=view01
! @spt consurf_view_isd
!
!!----------------------------------------
!!
!spt #name=hide_all
! restrict none
! ssbonds off
! hbonds off
! dots off
! list * delete
!
!!----------------------------------------
!! common_spt uses CPK carbon gray (or phosphorus yellow) for backbones.
!!
!spt #name=common_spt
! @spt hide_all
! select all
! color [xC8C8C8] # rasmol/chime carbon gray
! select nucleic
! color [xFFA500] # phosphorus orange
! select hetero
! color cpk
! select not hetero
! backbone 0.4
! javascript top.water=0
!
! ssbonds 0.3
! set ssbonds backbone
! color ssbonds @color_sulfur
!
! select hetero and not water
! spacefill 0.45
! wireframe 0.15
! dots 50
!
! select protein
! center selected
!
!!----------------------------------------
!!
!spt #name=consurf_view_isd
! @spt common_spt
! @for $=0, 9
! @spt select_isd_grade$
! @spt select_and_chain
! color @color_grade$
! spacefill
! @endfor
! zoom 115
!
!!----------------------------------------
""" %(E_score, db, iterations, max_num_homol, round(float(Average_pairwise_distance), 2), MSAprogram, algorithm, matrix, length_of_seqres, length_of_atom, unique_seqs, consurf_grade_freqs_isd, consurf_grade_freqs, seq3d_grades_isd, seq3d_grades, scale_block))

    lineToPrint = ""
    i = 9
    while i > 0:

        PIPE.write("!!\n!spt #name=select_isd_grade%d\n!\n" %i)
        lineToPrint = print_selected(ref_isd_residue_color[i], "yes")
        if re.search(r'select', lineToPrint):

            PIPE.write(lineToPrint + "\n")

        PIPE.write("!\n!\n!!----------------------------------------\n")
        i -= 1

    PIPE.write("!!\n!spt #name=select_isd_grade0\n")
    lineToPrint = print_selected(ref_isd_residue_color[10], "yes")
    if re.search(r'select', lineToPrint):

        PIPE.write(lineToPrint + "\n")

    PIPE.write("!\n!\n!!----------------------------------------\n")

    i = 9
    while i > 0:

        PIPE.write("!!\n!spt #name=select_grade%d\n!\n" %i)
        lineToPrint = print_selected(ref_no_isd_residue_color[i], "yes")
        if re.search(r'select', lineToPrint):

            PIPE.write(lineToPrint + "\n")

        PIPE.write("!\n!\n!!----------------------------------------\n")
        i -= 1

    PIPE.write("!!\n!spt #name=select_grade0\n! select none\n!!\n")
    PIPE.write("!! ====== END OF CONSURF PiPE BLOCK ======\n")
    PIPE.close()



def create_gradesPE(gradesPE, ref_match = "", pdb_file = "", chain = "", prefix = "", pdb_object = "", identical_chains = "", pdb_cif = "", atom_grades = ""):


    # printing the ConSurf gradesPE file

    if pdb_cif == "pdb": # this is for the pipe file

        seq3d_grades_isd = ""
        seq3d_grades = ""
        # arrays showing how to color the residues. The subarrays hold residues of the same color
        no_isd_residue_color = [[],[],[],[],[],[],[],[],[],[],[]] # no insufficient data
        isd_residue_color = [[],[],[],[],[],[],[],[],[],[],[]] # insufficient data

    try:

        PE = open(gradesPE, 'w')

    except:

        exit_on_error('sys_error', "create_gradesPE : can't open '" + gradesPE + "' for writing.")

    if form['DNA_AA'] == "AA":

        unknown_char = "X"
        PE.write("\t Amino Acid Conservation Scores\n")

    else:

        unknown_char = "N"
        PE.write("\t Nucleic Acid Conservation Scores\n")

    PE.write("\t=======================================\n\n")
    PE.write("The layers for assigning grades are as follows.\n")
    for i in range(1, len(vars['layers_array'])):

        if vars['layers_array'][i - 1] < 0:

            left_end = "%.3f" %vars['layers_array'][i - 1]

        else:

            left_end = " %.3f" %vars['layers_array'][i - 1]


        if vars['layers_array'][i] < 0:

            right_end = "%.3f" %vars['layers_array'][i]

        else:

            right_end = " %.3f" %vars['layers_array'][i]

        PE.write("from %s to %s the grade is %d\n" %(left_end, right_end, 10 - i))

    PE.write("\nIf the difference between the colors of the CONFIDENCE INTERVAL COLORS is more than 3 or the msa number (under the column titled MSA) is less than 6, there is insufficient data and an * appears in the COLOR column.\n")


    PE.write("\n- POS: The position of the acid in the sequence.\n")
    PE.write("- SEQ: The acid in one-letter code.\n")
    PE.write("- ATOM: When there's a model, the ATOM-derived sequence in three-letter code, including the acid's positions as they appear in the PDB file and the chain identifier.\n")
    PE.write("- SCORE: The normalized conservation scores.\n")
    PE.write("- COLOR: The color scale representing the conservation scores (9 - conserved, 1 - variable).\n")
    PE.write("- CONFIDENCE INTERVAL: When using the Bayesian method for calculating rates, a confidence interval is assigned to each of the inferred evolutionary conservation scores; next to it are the colors of the lower and upper bounds of the confidence interval.\n")
    if vars['B/E']:

        PE.write("- B/E: Buried (b) or Exposed (e) residue.\n")
        PE.write("- F/S: functional (f) or structural (s) residue (f - highly conserved and exposed, s - highly conserved and buried).\n")

    PE.write("- MSA DATA: The number of aligned sequences having an acid (non-gapped) from the overall number of sequences at each position.\n")
    PE.write("- RESIDUE VARIETY: The residues variety at each position of the multiple sequence alignment.\n\n")



    if form['ALGORITHM'] == "Bayes":

        CONFIDENCE_INTERVAL = "CONFIDENCE INTERVAL\t"

    else:

        CONFIDENCE_INTERVAL = ""

    # the size of the POS, ATOM and MSA DATA columns is variable

    pos_number_size = len(str(len(vars['protein_seq_string']))) # number of digits in the largest position number
    pos_column_title_size = len("POS") # size of the title of the column POS
    pos_column_size = max(pos_column_title_size, pos_number_size) # size of the column POS
    num_spaces = pos_column_size - pos_column_title_size # number of spaces to add
    while num_spaces > 0:

        PE.write(" ")
        num_spaces -= 1

    PE.write("POS\tSEQ\t")

    if ref_match != "":

        # consurf. in conseq there is no atom because there is no model
        max_res_details = pdb_object.get_max_res_details()
        atom_column_title_size = len("ATOM") # size of the title of the column ATOM
        atom_column_size = max(atom_column_title_size, max_res_details) # size of the column ATOM
        num_spaces = atom_column_size - atom_column_title_size # number of spaces to add
        while num_spaces > 0:

            PE.write(" ")
            num_spaces -= 1

        PE.write("ATOM\t")

    if vars['B/E']:

        PE.write(" SCORE\tCOLOR\t%sB/E\tF/S\t" %CONFIDENCE_INTERVAL)

    else:

        PE.write(" SCORE\tCOLOR\t%s" %CONFIDENCE_INTERVAL)

    msa_size = len(str(vars['final_number_of_homologoues'])) # size of the number of homologs
    msa_column_title_size = len("MSA DATA") # size of the title of the column MSA DATA
    msa_column_size = max(msa_column_title_size, 2 * msa_size + 1) # size of the column MSA DATA
    num_spaces = msa_column_size - msa_column_title_size # number of spaces to add
    while num_spaces > 0:

        PE.write(" ")
        num_spaces -= 1

    PE.write("MSA DATA\tRESIDUE VARIETY\n\n")

    seq_index = 0 # the index of the position in the query sequence (the rate4site output doesn't contain unknown chars)
    for elem in vars['gradesPE_Output']:

        pos = elem['POS']
        var = ""
        num_spaces = pos_column_size - len(str(pos)) # number of spaces to add
        while num_spaces > 0:

            PE.write(" ")
            num_spaces -= 1

        PE.write("%d\t" %pos)
        PE.write("  %s\t" %elem['SEQ'])

        if ref_match != "":

            # consurf, in conseq there is no atom because there is no model
            atom_3L = ref_match[pos]
            num_spaces = atom_column_size - len(atom_3L) # number of spaces to add
            while num_spaces > 0:

                PE.write(" ")
                num_spaces -= 1

            PE.write("%s\t" %atom_3L)

            # save the grade of the residue in order to write it to the pdb file
            if atom_3L != '-':

                residue_number = atom_3L.split(':')[1]
                atom_grades[residue_number] = [str(elem['COLOR']), elem['ISD'], "%6.3f" %elem['GRADE']]

        PE.write("%6.3f\t" %elem['GRADE'])
        if elem['ISD'] == 1:

            PE.write("   %d*\t" %elem['COLOR'])

        else:

            PE.write("   %d \t" %elem['COLOR'])

        if form['ALGORITHM'] == "Bayes":

            PE.write("%6.3f, " %elem['INTERVAL_LOW'])
            PE.write("%6.3f  " %elem['INTERVAL_HIGH'])
            PE.write("%d," %ColorScale[elem['INTERVAL_LOW_COLOR']])
            PE.write("%d\t" %ColorScale[elem['INTERVAL_HIGH_COLOR']])

        if vars['B/E']:

            PE.write("  %s\t  %s\t" %(elem['B/E'], elem['F/S']))

        homologs_in_pos = str(elem['MSA_NUM']) + "/" + elem['MSA_DENUM'] # number of homologs in the position
        num_spaces = msa_column_size - len(homologs_in_pos) # number of spaces to add
        while num_spaces > 0:

            PE.write(" ")
            num_spaces -= 1

        PE.write(homologs_in_pos)

        # we write the percentage of each acid in the msa
        while seq_index < len(vars['protein_seq_string']) and vars['protein_seq_string'][seq_index] == unknown_char:

            seq_index += 1

        for char in vars['percentage_per_pos'][seq_index]:

            if vars['percentage_per_pos'][seq_index][char] == 100:

                var += char

            else:

                # if the percentage is not more than one write <1%
                if vars['percentage_per_pos'][seq_index][char] > 1:

                    var += "%s %2d%%, " %(char, vars['percentage_per_pos'][seq_index][char])

                else:

                    var += "%s <1%%, " %char

        """
        if vars['unknown_per_pos'][pos - 1] != 0:

			# if the precentage is less than one write <1%
            if vars['unknown_per_pos'][pos - 1] > 1:

                var += "%s %2d%%" %(unknown_char, vars['unknown_per_pos'][pos - 1])

            else:

                var += "%s <1%%" %unknown_char
        """
        if len(vars['percentage_per_pos'][seq_index]) != 1:

            var = var[:-2] # we delete the last comma

        PE.write("\t" + var + "\n")
        seq_index += 1

        # the amino-acid in that position, must be part of the residue variety in this column
        if not re.search(elem['SEQ'], var, re.IGNORECASE):

            PE.close()
            exit_on_error('sys_error', "create_gradesPE : in position %s, the amino-acid %s does not match the residue variety: %s." %(pos, elem['SEQ'], var))

        if pdb_cif != "pdb": # the next part is for the pipe file

            continue

        # printing the residue to the rasmol script
        # assigning grades to seq3d strings
        if '-' not in atom_3L:

            atom_3L = re.search(r'(.+):', atom_3L).group(1)
            if form['DNA_AA'] == "Nuc":

                atom_3L = "D" + atom_3L

            color = elem['COLOR']
            no_isd_residue_color[color].append(atom_3L)
            if elem['ISD'] == 1:

                isd_residue_color[10].append(atom_3L)
                seq3d_grades_isd += "0"

            else:

                isd_residue_color[color].append(atom_3L)
                seq3d_grades_isd += str(color)

            seq3d_grades += str(color)

        else:

            seq3d_grades_isd += "."
            seq3d_grades += "."

    PE.write("\n\n*Below the confidence cut-off - The calculations for this site were performed on fewer than 6 non-gapped homologous sequences,\n")
    PE.write("or the confidence interval for the estimated score is equal to or larger than 4 color grades.\n")
    PE.close()

    if pdb_cif != "pdb": # the next part is for the pipe file

        return

    if seq3d_grades_isd == "" or seq3d_grades == "":

        exit_on_error('sys_error', "create_gradesPE : there is no data in the returned values seq3d_grades_isd or seq3d_grades from the routine")

    """
    # This will create the pipe file for FGiJ
    pipeFile = prefix + "_consurf_firstglance.pdb"
    pipeFile_CBS = prefix + "_consurf_firstglance_CBS.pdb" # pipe for color blind friendly
    create_pipe_file(pipeFile, pipeFile_CBS, seq3d_grades, seq3d_grades_isd, isd_residue_color, no_isd_residue_color, pdb_file, chain, (prefix).upper(), identical_chains, pdb_object)

    # create RasMol files
    create_rasmol(prefix, chain, no_isd_residue_color, isd_residue_color)
    """

def match_pdb_to_seq(ref_fas2pdb, query_seq, pdbseq, pdb_object):

    # matches the position in the seqres/msa sequence to the position in the pdb

    UnKnownChar = ""
    if form['DNA_AA'] == "AA":

        UnKnownChar = "X"

    else:

        UnKnownChar = "N"

    # creating the hash that matches the position in the ATOM fasta to its position
    # in the pdb file and also the fasta ATOM position to the correct residue
    match_ATOM = pdb_object.get_positions()
    """
    # creating the hash that matches the position in the ATOM fasta to its position
    # in the pdb file and also the fasta ATOM position to the correct residue
    try:

        MATCH = open(atom_position, 'r')

    except:

        exit_on_error('sys_error', "match_pdb_to_seq : Could not open the file " + atom_position + " for reading.")

    max_res_details = 0 # longest residue details string. Dictates the size of the ATOM column in the results summary page
    match_ATOM = {}
    line = MATCH.readline()
    while line != "":

        words = line.split()
        if len(words) == 3:

            res_details = words[0] + ":" + words[2] + ":" + chain
            match_ATOM[int(words[1])] = res_details
            if res_details > max_res_details:

                max_res_details = res_details

        line = MATCH.readline()

    MATCH.close()

    length_of_atoms = 0
    for char in pdbseq:

        if char != '-' and char != UnKnownChar:

            length_of_atoms += 1

    length_of_seqres = 0
    for char in query_seq:

        if char != '-' and char != UnKnownChar:

            length_of_seqres += 1
    """

    query_pos = 1
    pdb_pos = 1
    for pos in range(len(query_seq)):

        if query_seq[pos] != '-' and query_seq[pos] != UnKnownChar:

            if pdbseq[pos] == '-' or pdb_pos not in match_ATOM:

                ref_fas2pdb[query_pos] = '-'

            else:

                ref_fas2pdb[query_pos] = match_ATOM[pdb_pos]
                pdb_pos += 1

            query_pos += 1

        elif pdbseq[pos] != '-':

            pdb_pos += 1

    #return length_of_seqres, length_of_atoms, max_res_details
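

# Illustrative sketch, not part of the original pipeline: the walk performed by
# match_pdb_to_seq above, rewritten as a pure function so that it can be tried
# without the global `form` or a pdb_object. The function name, its parameters
# and the "RES:NUM:CHAIN" labels used in the example are assumptions made for
# this sketch only.
def _sketch_map_query_to_pdb(query_aln, pdb_aln, pdb_residues, unknown_char="X"):

    # query_aln / pdb_aln: the two rows of a pairwise alignment (equal length, '-' for gaps)
    # pdb_residues: dict from 1-based ATOM position to a residue label
    mapping = {}
    query_pos = 1
    pdb_pos = 1
    for q_char, p_char in zip(query_aln, pdb_aln):

        if q_char != '-' and q_char != unknown_char:

            if p_char == '-' or pdb_pos not in pdb_residues:

                # the query residue has no counterpart in the structure
                mapping[query_pos] = '-'

            else:

                mapping[query_pos] = pdb_residues[pdb_pos]
                pdb_pos += 1

            query_pos += 1

        elif p_char != '-':

            # a structure residue aligned to a gap or unknown char in the query
            pdb_pos += 1

    return mapping

# Example (hypothetical data):
# _sketch_map_query_to_pdb("MKT-A", "M-TGA", {1: "MET:1:A", 2: "THR:3:A", 3: "GLY:4:A", 4: "ALA:5:A"})
# maps query position 2 to '-' because the structure has a gap there.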



def find_pdb_position(ref_fas2pdb, pdb_object):

    # finds the position of the sequence in the pdb

    LOG.write("find_pdb_position(ref_fas2pdb, pdb_object)\n")

    UnKnownChar = ""
    if form['DNA_AA'] == "AA":

        UnKnownChar = "X"

    else:

        UnKnownChar = "N"

    match_ATOM = pdb_object.get_positions()
    # rate4site deletes the unknown chars
    rate4site_pos = 1
    pdb_pos = 1
    for char in vars['ATOM_without_X_seq']:

        if char != UnKnownChar:

            ref_fas2pdb[rate4site_pos] = match_ATOM[pdb_pos]
            rate4site_pos += 1

        pdb_pos += 1



def assign_colors_according_to_r4s_layers():

    LOG.write("assign_colors_according_to_r4s_layers(%s, %s)\n" %(vars['gradesPE_Output'], vars['r4s_out']))

    vars['insufficient_data'] = False
    """
    if form['DNA_AA'] == "AA":

	    #runs the PACC algorithm to calculate burried/exposed
        ref_Solv_Acc_Pred = predict_solvent_accesibility()
	    # this array connects the position to its index in ref_Solv_Acc_Pred (positions don't include unknown characters, PACC does)
        index_of_pos = get_seq_legal_positions(vars['protein_seq_string'])

    else:

        ref_Solv_Acc_Pred = ""

    if ref_Solv_Acc_Pred != "":

        vars['B/E'] = True

    else:

        vars['B/E'] = False

    """
    # we extract the data from the rate4site output
    try:

        RATE4SITE = open(vars['r4s_out'], 'r')

    except:

        exit_on_error('sys_error', "assign_colors_according_to_r4s_layers : can't open " + vars['r4s_out'])

    line = RATE4SITE.readline()
    while line != "":

        line = line.rstrip()

        if form['ALGORITHM'] == "Bayes":

            # Bayesian
            match1 = re.match(r'^\s*(\d+)\s+(\w)\s+(\S+)\s+\[\s*(\S+),\s*(\S+)\]\s+\S+\s+(\d+)\/(\d+)', line)
            if match1:

                vars['gradesPE_Output'].append({'POS' : int(match1.group(1)), 'SEQ' : match1.group(2), 'GRADE' : float(match1.group(3)), 'INTERVAL_LOW' : float(match1.group(4)), 'INTERVAL_HIGH' : float(match1.group(5)), 'MSA_NUM' : int(match1.group(6)), 'MSA_DENUM' : match1.group(7)})

        else:

            # Maximum likelihood
            match2 = re.match(r'^\s*(\d+)\s+(\w)\s+(\S+)\s+(\d+)\/(\d+)', line)
            if match2:

                vars['gradesPE_Output'].append({'POS' : int(match2.group(1)), 'SEQ' : match2.group(2), 'GRADE' : float(match2.group(3)), 'INTERVAL_LOW' : float(match2.group(3)), 'INTERVAL_HIGH' : float(match2.group(3)), 'MSA_NUM' : int(match2.group(4)), 'MSA_DENUM' : match2.group(5)})

        line = RATE4SITE.readline()

    RATE4SITE.close()

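    # we find the extreme scores. rate4site scores are lower for more conserved positions, so
    # max_cons holds the lowest score (maximum conservation) and min_cons the highest score (minimum conservation)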
    max_cons = vars['gradesPE_Output'][0]['GRADE']
    min_cons = vars['gradesPE_Output'][0]['GRADE']
    for element in vars['gradesPE_Output']:

        if element['GRADE'] < max_cons:

            max_cons = element['GRADE']

        if element['GRADE'] > min_cons:

            min_cons = element['GRADE']

    # we divide the interval between max_cons and min_cons into nine intervals (ten boundaries)
    # 4 intervals on the left side are of length 2 * |max_cons| / 9
    # 4 intervals on the right side are of length 2 * min_cons / 9
    # the middle interval runs from max_cons / 9 to min_cons / 9
    NoLayers = 10
    LeftLayers = 5
    RightLayers = 5
    ColorLayers = []
    i = 0
    while i < LeftLayers:

        ColorLayers.append(max_cons * ((9 - 2 * i) / 9.0))
        i += 1

    i = 0
    while i < RightLayers:

        ColorLayers.append(min_cons * ((1 + 2 * i) / 9.0))
        i += 1

    # each position gets a grade according to the layer its score is in
    for element in vars['gradesPE_Output']:

        i = 0
        while not 'INTERVAL_LOW_COLOR' in element or not 'INTERVAL_HIGH_COLOR' in element or not 'COLOR' in element:

            if not 'INTERVAL_LOW_COLOR' in element:

                if i == NoLayers - 1:

                    element['INTERVAL_LOW_COLOR'] = 8

                elif element['INTERVAL_LOW'] >= ColorLayers[i] and element['INTERVAL_LOW'] < ColorLayers[i + 1]:

                    element['INTERVAL_LOW_COLOR'] = i

                elif element['INTERVAL_LOW'] < ColorLayers[0]:

                    element['INTERVAL_LOW_COLOR'] = 0

            if not 'INTERVAL_HIGH_COLOR' in element:

                if i == NoLayers - 1:

                    element['INTERVAL_HIGH_COLOR'] = 8

                elif element['INTERVAL_HIGH'] >= ColorLayers[i] and element['INTERVAL_HIGH'] < ColorLayers[i + 1]:

                    element['INTERVAL_HIGH_COLOR'] = i

                elif element['INTERVAL_HIGH'] < ColorLayers[0]:

                    element['INTERVAL_HIGH_COLOR'] = 0

            if not 'COLOR' in element:

                if i == NoLayers - 1:

                    element['COLOR'] = ColorScale[i - 1]

                elif element['GRADE'] >= ColorLayers[i] and element['GRADE'] < ColorLayers[i + 1]:

                    element['COLOR'] = ColorScale[i]

            i += 1

        # there is insufficient data if the confidence interval spans more than bayesInterval color layers or the position is not empty in 5 or fewer of the homologs in the MSA
        if element['INTERVAL_HIGH_COLOR'] - element['INTERVAL_LOW_COLOR'] > bayesInterval or element['MSA_NUM'] <= 5:

            element['ISD'] = 1
            vars['insufficient_data'] = True

        else:

            element['ISD'] = 0
        """
        if vars['B/E']:

            element['B/E'] = ref_Solv_Acc_Pred[index_of_pos[element['POS'] - 1]]
            if element['B/E'] == "e":

                if element['COLOR'] == 9 or element['COLOR'] == 8:

                    element['F/S'] = "f"

                else:

                    element['F/S'] = " "

            elif element['COLOR'] == 9:

                element['F/S'] = "s"

            else:

                element['F/S'] = " "
        """
    vars['layers_array'] = ColorLayers
    LOG.write("assign_colors_according_to_r4s_layers : color layers are %s\n" %str(vars['layers_array']))
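

# Illustrative sketch, not part of the original pipeline: how the ten layer
# boundaries built in assign_colors_according_to_r4s_layers split the score range
# into nine intervals, and how a score is placed in one of them. Both helper names
# are assumptions made for this sketch; the real function also consults the
# globals ColorScale and bayesInterval, which are not reproduced here.
def _sketch_color_layers(max_cons, min_cons, layers_per_side=5):

    # max_cons: lowest (most conserved) score, min_cons: highest (most variable) score
    left = [max_cons * ((9 - 2 * i) / 9.0) for i in range(layers_per_side)]
    right = [min_cons * ((1 + 2 * i) / 9.0) for i in range(layers_per_side)]
    return left + right


def _sketch_layer_index(score, boundaries):

    # index of the interval [boundaries[i], boundaries[i + 1]) that holds the score
    if score < boundaries[0]:

        return 0

    for i in range(len(boundaries) - 1):

        if boundaries[i] <= score < boundaries[i + 1]:

            return i

    # scores at or above the last boundary fall in the last interval
    return len(boundaries) - 2

# Example: with scores ranging from -9.0 to 9.0 the boundaries are roughly
# -9, -7, -5, -3, -1, 1, 3, 5, 7, 9 and a score of 0.0 lands in the middle interval.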




def extract_diversity_matrix_info(r4s_log_file):

    # extracting diversity matrix info

    matrix_disINFO = "\"\""
    matrix_lowINFO = "\"\""
    matrix_upINFO = "\"\""

    try:

        RES_LOG = open(r4s_log_file, 'r')

    except:

        exit_on_error('sys_error', "extract_diversity_matrix_info: Can't open '" + r4s_log_file + "' for reading\n")

    line = RES_LOG.readline()
    while line != "":

        line = line.rstrip()
        match1 = re.match(r'\#Average pairwise distance\s*=\s+(.+)', line)
        if match1:

            matrix_disINFO = match1.group(1)

        else:

            match2 = re.match(r'\#lower bound\s*=\s+(.+)', line)
            if match2:

                matrix_lowINFO = match2.group(1)

            else:

                match3 = re.match(r'\#upper bound\s*=\s+(.+)', line)
                if match3:

                    matrix_upINFO = match3.group(1)
                    break

        line = RES_LOG.readline()

    RES_LOG.close()

    vars['Average pairwise distance'] = matrix_disINFO


def add_sequences_removed_by_cd_hit_to_rejected_report(cd_hit_clusters_file, rejected_fragments_file, num_rejected_homologs):

    LOG.write("add_sequences_removed_by_cd_hit_to_rejected_report : running add_sequences_removed_by_cd_hit_to_rejected_report(%s, %s, %d)\n" %(cd_hit_clusters_file, rejected_fragments_file, num_rejected_homologs))

    try:

        REJECTED = open(rejected_fragments_file, 'a')

    except:

        exit_on_error('sys_error', "add_sequences_removed_by_cd_hit_to_rejected_report: Can't open '" + rejected_fragments_file + "' for writing.")

    try:

        CDHIT = open(cd_hit_clusters_file, 'r')

    except:

        exit_on_error('sys_error', "add_sequences_removed_by_cd_hit_to_rejected_report: Can't open '" + cd_hit_clusters_file + "' for reading.\n")

    REJECTED.write("\n\t Sequences rejected in the clustering stage by CD-HIT\n\n")

    cluster_members = {}
    cluster_head = ""

    line = CDHIT.readline()
    while line != "":

        match = re.match(r'^>Cluster', line)
        if match:

            # New Cluster
            for cluster_member in cluster_members.keys():

                REJECTED.write("%d Fragment %s rejected: the sequence shares %s identity with %s (which was preserved)\n" %(num_rejected_homologs, cluster_member, cluster_members[cluster_member], cluster_head))
                num_rejected_homologs += 1

            cluster_members = {}
            cluster_head = ""

        else:

            # Clusters Members
            words = line.split()
            if len(words) > 3:

                x = words[2][1:-3] # delete the symbol > and the trailing ... from the beginning and ending of the sequence name
                if words[3] == "*":

                    cluster_head  = x

                elif len(words) > 4:

                    # words[4] holds the identity of the member to the cluster head
                    cluster_members[x] = words[4]

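        line = CDHIT.readline()

    # the last cluster is not followed by a ">Cluster" line, so we report its members here
    for cluster_member in cluster_members.keys():

        REJECTED.write("%d Fragment %s rejected: the sequence shares %s identity with %s (which was preserved)\n" %(num_rejected_homologs, cluster_member, cluster_members[cluster_member], cluster_head))
        num_rejected_homologs += 1

    CDHIT.close()
    REJECTED.close()

    vars['num_rejected_homologs'] = num_rejected_homologs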
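

# Illustrative sketch, not part of the original pipeline: grouping the lines of a
# CD-HIT cluster file into (representative, members) pairs, including the last
# cluster in the file. The function name is an assumption made for this sketch,
# and the layout of the member lines ("index length, >name... *" for the
# representative, "index length, >name... at identity" for the others) is
# assumed from the parsing code above.
def _sketch_parse_cd_hit_clusters(clstr_text):

    clusters = []  # list of (cluster_head, {member_name: identity})
    cluster_head = ""
    cluster_members = {}
    inside_cluster = False
    for line in clstr_text.splitlines():

        if line.startswith(">Cluster"):

            if inside_cluster:

                clusters.append((cluster_head, cluster_members))

            cluster_head = ""
            cluster_members = {}
            inside_cluster = True
            continue

        words = line.split()
        if len(words) > 3:

            name = words[2][1:-3]  # drop the leading > and the trailing ...
            if words[3] == "*":

                cluster_head = name

            elif len(words) > 4:

                cluster_members[name] = words[4]

    # the last cluster has no following ">Cluster" line
    if inside_cluster:

        clusters.append((cluster_head, cluster_members))

    return clusters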



def choose_homologoues_from_search_with_lower_identity_cutoff(searchType, query_seq_length, redundancyRate, frag_overlap, min_length_percent, min_id_percent, min_num_of_homologues, search_output, fasta_output, rejected_seqs, ref_search_hash, Nuc_or_AA):

    # searchType: HMMER, BLAST or MMseqs2
    # query_seq_length: Length of the query sequence
    # redundancyRate: The allowed similarity between the query and the hit
    # frag_overlap: The allowed overlap between the hits
    # min_length_percent: The hit can't be smaller than this percent of the query
    # min_id_percent: The minimum similarity between the query and the hit
    # min_num_of_homologues: Minimum number of homologs
    # search_output: Raw homolog search output
    # fasta_output: Accepted hits
    # rejected_seqs: Rejected hits
    # ref_search_hash: Hash with the e-values of the accepted hits. This is later used when choosing the final hits after CD-HIT
    # Nuc_or_AA: Amino or nucleic acid

    LOG.write("choose_homologoues_from_search_with_lower_identity_cutoff(%s, %d, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s);\n" %(searchType, query_seq_length, redundancyRate, frag_overlap, min_length_percent, min_id_percent, min_num_of_homologues, search_output, fasta_output, rejected_seqs, ref_search_hash, Nuc_or_AA))

    # Defining the minimum length a homologue should have:
    # min_length_percent (a fraction, e.g. 0.6 for 60%) of the query's length
    min_length = query_seq_length * min_length_percent

    # Reading blast/hmmer output and collect the homologues
    # Printing the selected homologues to a file and insert the e-value info to hash

    OUT_REJECT = open(rejected_seqs, 'w')
    OUT = open(fasta_output, 'w')
    RAW_OUT = open(search_output, 'r')

    num_homologoues = 0
    num_rejected = 1
    final_num_homologues = 0
    OUT_REJECT.write("\tSequences rejected for being too short, too similar, or not similar enough to the query sequence.\n\n")
    if searchType == "MMseqs2":

        # we skip the first two lines, which hold the query; the third read fetches the first hit
        line = RAW_OUT.readline()
        line = RAW_OUT.readline()
        line = RAW_OUT.readline()
        while line != "":

            if line[0] == ">":

                # this line has the sequence details
                words = line.split()

                # new seq found

                num_homologoues += 1
                seq_name = words[0][1:]

                seq_eval = float(words[3])
                seq_beg = int(words[7])
                seq_end = int(words[8])
                seq_ident = float(words[2]) * 100

            elif line[0] != "\x00":

                # this line has the sequence. We take it and save it with the details of the previous line
                seq = line.strip() # the seq in fasta with gaps
                seq = re.sub(r'-', "", seq) # delete gaps
                seq_frag_name = "%s|%d_%d|%s" %(seq_name, seq_beg, seq_end, seq_eval)

                # deciding if we take the fragment
                ans = check_if_seq_valid_with_min_id(redundancyRate, min_length, min_id_percent, seq_ident, seq, seq_name, Nuc_or_AA)

                if seq_eval > form['E_VALUE']:

                    ans = "The E-value %.16g is over the limit %.16g." %(seq_eval, form['E_VALUE'])

                # after taking the info, check if the current sequence is valid. If so, insert it into the hash
                if ans == "yes":

                    final_num_homologues += 1
                    OUT.write(">%s\n%s\n" %(seq_frag_name, seq))
                    ref_search_hash[seq_frag_name] = seq_eval

                else:

                    OUT_REJECT.write("%d Fragment %s rejected: %s\n" %(num_rejected, seq_frag_name, ans))
                    num_rejected += 1

            line = RAW_OUT.readline()

    RAW_OUT.close()
    OUT_REJECT.close()
    OUT.close()
    # Checking that the number of homologues found is legal

    if final_num_homologues < min_num_of_homologues:

        message = "Only %d unique sequences were chosen. The minimal number of sequences required for the calculation is %d. You can try to rerun with a multiple sequence alignment file of your own, increase the E value or decrease the minimal %%ID for homologs" %(final_num_homologues, min_num_of_homologues)
        exit_on_error("", message)

    vars['number_of_homologoues'] = num_homologoues
    vars['number_of_homologoues_before_cd-hit'] = final_num_homologues
    vars['num_rejected_homologs'] = num_rejected



def change_name(s_name):

    if s_name[:2] == "sp" or s_name[:2] == "tr":

        words = s_name.split("|")
        return  "up|" + words[1]

    elif s_name[:2] == "gi":

        words = s_name.split("|")
        return  "gi|" + words[1]

    elif s_name[:8] == "UniRef90":

        words = s_name.split("_")
        return "ur|" + words[1]

    elif "|" in s_name:

        words = s_name.split("|")
        return "up|" + words[1]

    else:

        return "gi|" + s_name




def check_if_seq_valid_with_min_id(redundancyRate, min_length, min_id, ident_percent, aaSeq, seqName, Nuc_or_AA):

    seq_length = len(aaSeq)

    if ident_percent >= redundancyRate:

        # the sequence identity is too high (the hit is redundant with the query)
        return "identity percent %.2f is too big" %ident_percent

    if ident_percent < min_id:

        # the sequence identity is lower than the minimum identity percent that was defined for homologues
        return "identity percent %.2f is too low (below %d)" %(ident_percent, min_id)

    elif seq_length < min_length:

        # the sequence length is shorter than the minimum sequence length
        return "the sequence length %d is too short. The minimum is %d" %(seq_length, min_length)

    return check_illegal_character(aaSeq, seqName, Nuc_or_AA)


def check_illegal_character(aaSeq, seqName, Nuc_or_AA):

    # the sequence letters should be legal for rate4site

    if Nuc_or_AA == "AA":

        # AA seq
        if not re.match(r'^[ACDEFGHIKLMNPQRSTVWYBZXacdefghiklmnpqrstvwybzx]+$', aaSeq):

            return "illegal character was found in sequence: " + seqName

    else:

        # Nuc seq
        if not re.match(r'^[ACGTUINacgtuin]+$', aaSeq):

            return "illegal character was found in sequence: " + seqName

    return "yes"

def check_if_no_overlap(max_overlap, ref_seq_details, s_bgn, s_end):

    ans = "check_if_no_overlap : no ans was picked"

    i = 0
    while i < len(ref_seq_details):

        fragment_beg = ref_seq_details[i][0]
        fragment_end = ref_seq_details[i][1]
        fragment_length = int(fragment_end) - int(fragment_beg) + 1

        if s_bgn <= fragment_beg and s_end >= fragment_end:

            # fragment is inside subjct
            return "previous fragment found %s_%s is fully inside new fragment" %(fragment_beg, fragment_end)

        elif s_bgn >= fragment_beg and s_end <= fragment_end:

            # subject is inside fragment
            return "new fragment is fully inside previous fragment found %s_%s" %(fragment_beg, fragment_end)

        elif fragment_end < s_end and fragment_end > s_bgn:

            # fragment begins before subjct
            overlap_length = fragment_end - s_bgn + 1
            if overlap_length > fragment_length * max_overlap:

                return "overlap length of fragment is %d which is greater than maximum overlap: %d" %(overlap_length, fragment_length * max_overlap)

            else:

                # when the fragment might be a good match, we can only insert it if it did not match to all the fragments
                if i == len(ref_seq_details) - 1:

                    ans = "yes"

        elif fragment_beg > s_bgn and  fragment_beg < s_end:

            # fragment begins after subjct
            overlap_length = s_end - fragment_beg + 1
            if overlap_length > fragment_length * max_overlap:

                return "overlap length of fragment is %d which is greater than maximum overlap: %d" %(overlap_length, fragment_length * max_overlap)

            else:

                # when the fragment might be a good match, we can only insert it if it did not match to all the fragments
                if i == len(ref_seq_details) - 1:

                    ans = "yes"

        elif fragment_beg >= s_end or fragment_end <= s_bgn:

            # no overlap
            if i == len(ref_seq_details) - 1:

                ans = "yes"

        i += 1

    return ans
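

# Illustrative sketch, not part of the original pipeline: the overlap test that
# check_if_no_overlap applies to each previous fragment, reduced to two small
# helpers on inclusive integer coordinates. The helper names are assumptions made
# for this sketch, and the containment cases handled above are left out.
def _sketch_overlap_length(a_beg, a_end, b_beg, b_end):

    # number of positions shared by the inclusive ranges [a_beg, a_end] and [b_beg, b_end]
    return max(0, min(a_end, b_end) - max(a_beg, b_beg) + 1)


def _sketch_overlap_too_large(fragment_beg, fragment_end, s_bgn, s_end, max_overlap):

    # max_overlap is the allowed fraction of the previous fragment's length
    fragment_length = fragment_end - fragment_beg + 1
    overlap_length = _sketch_overlap_length(fragment_beg, fragment_end, s_bgn, s_end)
    return overlap_length > fragment_length * max_overlap

# Example: a previous fragment 1-10 and a new fragment 8-20 share 3 positions,
# which exceeds 10% of the previous fragment's length but not 50%.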


def count_letters(s):

    l = 0
    for c in s:

        if c.isalpha():

            l += 1

    return l


def write_alignment(first_seq, first_seq_name, second_seq, second_seq_name, middle_line, clustalw_aln):

    # we want all the lines to begin at the same point
    while len(first_seq_name) < 15:

        first_seq_name += " "

    while len(second_seq_name) < 15:

        second_seq_name += " "

    middle_line_beginning = ""
    while len(middle_line_beginning) < 15:

        middle_line_beginning += " "

    try:

        CLUSTALW_ALN = open(clustalw_aln, 'w')

    except:

        exit_on_error('sys_error', "write_alignment : could not open " + clustalw_aln + " for writing.")

    seq_too_long_for_page = True
    while seq_too_long_for_page:

        # we show only 60 chars of the seq in each line
        if len(first_seq) <= 60:

            seq_too_long_for_page = False

        CLUSTALW_ALN.write(first_seq_name + first_seq[:60] + "\n")
        CLUSTALW_ALN.write(middle_line_beginning + middle_line[:60] + "\n")
        CLUSTALW_ALN.write(second_seq_name + second_seq[:60] + "\n\n")

        first_seq = first_seq[60:]
        middle_line = middle_line[60:]
        second_seq = second_seq[60:]

    CLUSTALW_ALN.close()

def pairwise_alignment(first_seq, second_seq, clustalw_aln = "", seq_type = ""):

    # for new Bio
    aligner = Align.PairwiseAligner()

    # Pairwise Alignment Parameters

    aligner.mode = 'global' # Can be either global or local; if undetermined, Biopython will choose the optimal algorithm

    if form['DNA_AA'] == "AA":

        aligner.substitution_matrix = substitution_matrices.load("/content/matrix.txt")

    else:

        aligner.substitution_matrix = substitution_matrices.load("/content/matrix-nuc.txt")

    #Default Gap extension and opening penalties for ClustalW are 0.2 and 10.0.
    aligner.open_gap_score = -5.0 #-10.0
    aligner.extend_gap_score = -0.20
    #aligner.target_end_gap_score = 0.0
    #aligner.query_end_gap_score = 0.0

    alignments = aligner.align(first_seq, second_seq)
    #[first_seq_with_gaps, middle_line, second_seq_with_gaps] = (str(alignments[0])).split() # old Bio
    first_seq_with_gaps = ""
    second_seq_with_gaps = ""
    alignment_string = str(alignments[0])
    lines = alignment_string.split('\n')
    for line in lines:

        words = line.split()
        if len(words) > 2:

            if words[0] == "target":

                first_seq_with_gaps += words[2]

            elif words[0] == "query":

                second_seq_with_gaps += words[2]

    matches = 0
    length_without_gaps = 0
    for i in range(len(first_seq_with_gaps)):

        if first_seq_with_gaps[i] != '-' and second_seq_with_gaps[i] != '-':

            length_without_gaps += 1
            if first_seq_with_gaps[i] == second_seq_with_gaps[i]:

                matches += 1

    # guard against alignments that have no gap-free column
    identity = (matches * 100.0) / length_without_gaps if length_without_gaps > 0 else 0.0

    if clustalw_aln == "":

        # we don't write the alignment
        return identity

    try:

        CLUSTALW_ALN = open(clustalw_aln, 'w')

    except:

        exit_on_error('sys_error', "pairwise_alignment : could not open " + clustalw_aln + " for writing.")

    if seq_type == "SEQRES":

        CLUSTALW_ALN.write("target - Seqres sequence\nquery - Atom sequence\n\n" + alignment_string)

    else:

        CLUSTALW_ALN.write("target - MSA sequence\nquery - Atom sequence\n\n" + alignment_string)

    CLUSTALW_ALN.close()
    #write_alignment(first_seq_with_gaps, seq_type + "_SEQ", second_seq_with_gaps, "ATOM_SEQ", middle_line, clustalw_aln) # old Bio
    return(first_seq_with_gaps, second_seq_with_gaps, identity)
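

# Illustrative sketch, not part of the original pipeline: the percent identity
# computed in pairwise_alignment, isolated from Biopython so that it can be
# checked on two already-aligned strings. The function name is an assumption made
# for this sketch, and returning 0.0 when no column is gap-free is a choice of
# the sketch.
def _sketch_percent_identity(aligned_first, aligned_second):

    matches = 0
    length_without_gaps = 0
    for first_char, second_char in zip(aligned_first, aligned_second):

        # only columns where neither sequence has a gap are compared
        if first_char != '-' and second_char != '-':

            length_without_gaps += 1
            if first_char == second_char:

                matches += 1

    if length_without_gaps == 0:

        return 0.0

    return (matches * 100.0) / length_without_gaps

# Example: "ACGT-" against "AGGTA" has four gap-free columns, three of which match.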


def pairwise_alignment_old(first_seq, second_seq, clustalw_aln = "", seq_type = ""):

    # for old Bio
    aligner = Align.PairwiseAligner()

    # Pairwise Alignment Parameters

    aligner.mode = 'global' # Can be either global or local; if undetermined, Biopython will choose the optimal algorithm

    if form['DNA_AA'] == "AA":

        aligner.substitution_matrix = substitution_matrices.load("/content/matrix.txt")

    else:

        aligner.substitution_matrix = substitution_matrices.load("/content/matrix-nuc.txt")

    #Default Gap extension and opening penalties for ClustalW are 0.2 and 10.0.
    aligner.open_gap_score = -5.0 #-10.0
    aligner.extend_gap_score = -0.20
    #aligner.target_end_gap_score = 0.0
    #aligner.query_end_gap_score = 0.0

    alignments = aligner.align(first_seq, second_seq)
    #[first_seq_with_gaps, middle_line, second_seq_with_gaps] = (str(alignments[0])).split() # old Bio
    alignment_string = str(alignments[0])
    lines = alignment_string.split('\n')
    first_seq_with_gaps = lines[0]
    middle_line = lines[1]
    second_seq_with_gaps = lines[2]

    matches = 0
    length_without_gaps = 0
    for i in range(len(first_seq_with_gaps)):

        if first_seq_with_gaps[i] != '-' and second_seq_with_gaps[i] != '-':

            length_without_gaps += 1
            if first_seq_with_gaps[i] == second_seq_with_gaps[i]:

                matches += 1

    # guard against alignments that have no gap-free column
    identity = (matches * 100.0) / length_without_gaps if length_without_gaps > 0 else 0.0

    if clustalw_aln == "":

        # we don't write the alignment
        return identity

    write_alignment(first_seq_with_gaps, seq_type + "_SEQ", second_seq_with_gaps, "ATOM_SEQ", middle_line, clustalw_aln) # old Bio
    return(first_seq_with_gaps, second_seq_with_gaps, identity)




class PDF(fpdf.FPDF):

    def __init__(self):

        super().__init__()
        self.lasth = 0
        self.cbs = False

    def Cell(self, w, h = 0, txt = '', border = 0, ln = 0, align = '', fill = False):

        self.cell(w, h, txt, border, ln, align, fill)
        self.lasth = h

    def Print_4_Lines_Element(self, Rows_Pos, Score, AA, Pos, Solv_Acc, font_size, Insufficient_Data, Funct_Struct = ""):


        if Pos != "":

            x = self.get_x()
            if Pos < 9:

                self.set_xy(self.get_x() + 0.5, self.get_y())

            elif Pos < 99:

                self.set_xy(self.get_x() + 1, self.get_y())

            elif Pos < 999:

                self.set_xy(self.get_x() + 1.5, self.get_y())

            elif Pos < 9999:

                self.set_xy(self.get_x() + 2, self.get_y())

            self.Print_BackgroundColor(str(Pos + 1), "", font_size, 5)
            self.set_xy(x, self.get_y())

        self.set_xy(self.get_x(), self.get_y() + self.lasth)

        self.Print_BackgroundColor(AA, 'B', font_size, Score, 3, Insufficient_Data)
        Col_Pos = self.get_x() # position on the line after printing
        self.set_xy(self.get_x() - 2.5, self.get_y() + self.lasth + 0.2)

        if Solv_Acc == "e":

            self.Print_ForegroundColor(Solv_Acc, 'B', font_size, 1)

        elif Solv_Acc == "b":

            self.Print_ForegroundColor(Solv_Acc, 'B', font_size, 2)

        self.set_xy(self.get_x() - 2, self.get_y() + self.lasth - 1.2)

        if Funct_Struct == "f":

            self.Print_ForegroundColor(Funct_Struct, 'B', font_size, 3)

        elif Funct_Struct == "s":

            self.Print_ForegroundColor(Funct_Struct, 'B', font_size, 4)

        self.set_xy(Col_Pos, Rows_Pos)



    def Print_BackgroundColor(self, txt, print_style, font_size, Color_Num, spacer = 2, isd = False):

        cbs = self.cbs

        if Color_Num == 0:

            self.set_text_color(0, 0, 0)
            self.set_fill_color(255, 255, 150)

        elif Color_Num == 1:

            if cbs:

                self.set_fill_color(10, 125, 130)

            else:

                self.set_fill_color(15, 90, 35)

            self.set_text_color(255, 255, 255)

        elif Color_Num == 2:

            if cbs:

                self.set_fill_color(75, 175, 190)

            else:

                self.set_fill_color(90, 175, 95)

        elif Color_Num == 3:

            if cbs:

                self.set_fill_color(165, 220, 230)

            else:

                self.set_fill_color(165, 220, 160)

        elif Color_Num == 4:

            if cbs:

                self.set_fill_color(215, 240, 240)

            else:

                self.set_fill_color(215, 240, 210)

        elif Color_Num == 5:

            self.set_fill_color(255, 255, 255)

        elif Color_Num == 6:

            if cbs:

                self.set_fill_color(250, 235, 245)

            else:

                self.set_fill_color(230, 210, 230)

        elif Color_Num == 7:

            if cbs:

                self.set_fill_color(250, 200, 220)

            else:

                self.set_fill_color(195, 165, 205)

        elif Color_Num == 8:

            if cbs:

                self.set_fill_color(240, 125, 170)

            else:

                self.set_fill_color(155, 110, 170)

        elif Color_Num == 9:

            if cbs:

                self.set_fill_color(160, 40, 95)

            else:

                self.set_fill_color(120, 40, 130)

            self.set_text_color(255, 255, 255)

        if isd:

            self.set_fill_color(255, 255, 150)

        self.set_font("Courier", print_style, font_size)
        width = len(txt) - 1 + spacer
        high = font_size / 2
        self.Cell(width, high, txt, 0, 0, "C", True)
        self.set_fill_color(255, 255, 255) # return to default background color (white)
        self.set_text_color(0, 0, 0) # return to default text color (black)

    def Print_NEW_Legend(self, IS_THERE_FUNCT_RES, IS_THERE_STRUCT_RES, IS_THERE_INSUFFICIENT_DATA, B_E_METHOD):

        self.set_font("", 'B', 12)
        font_size = 12
        self.ln()
        self.Cell(40, 10, "The conservation scale:", 0, 1)
        self.set_xy(18, self.get_y())
        #self.Print_BackgroundColor('?', "", 12, 0, 4, 1)
        self.Print_BackgroundColor('1', "", 12, 1, 4)
        self.Print_BackgroundColor('2', "", 12, 2, 4)
        self.Print_BackgroundColor('3', "", 12, 3, 4)
        #Average_X = self.get_x()
        self.Print_BackgroundColor('4', "", 12, 4, 4)
        self.Print_BackgroundColor('5', "", 12, 5, 4)
        self.Print_BackgroundColor('6', "", 12, 6, 4)
        self.Print_BackgroundColor('7', "", 12, 7, 4)
        #Conserved_X = self.get_x()
        self.Print_BackgroundColor('8', '', 12, 8, 4)
        self.Print_BackgroundColor('9', '', 12, 9, 4)
        self.ln()
        self.set_font("Times", 'B', 9.5)
        self.Cell(10, 6, "Variable", 0, 0, 'R', False)
        self.set_xy(28, self.get_y())
        self.Cell(15, 6, "Average", 0, 0, 'R', False)
        self.set_xy(53, self.get_y())
        self.Cell(15, 6, "Conserved", 0, 0, 'R', False)
        self.ln()
        self.ln()

        if B_E_METHOD != "no prediction":


            offset_1 = 0
            offset_2 = 0
            if B_E_METHOD == "neural network algorithm":

                offset_1 = 63.5
                offset_2 = 62

            elif B_E_METHOD == "NACSES algorithm":

                offset_1 = 57.5
                offset_2 = 56


            self.Print_ForegroundColor('e', 'B', font_size, 1)
            self.set_xy(offset_1, self.get_y())
            self.Print_ForegroundColor(" - An exposed residue according to the %s." %B_E_METHOD, 'B', font_size, 9)
            self.ln()
            #self.set_y(self.get_y() + self.lasth + 5)
            #self.set_x(self.left_margin)
            self.Print_ForegroundColor('b', 'B', font_size, 2)
            self.set_xy(offset_2, self.get_y())
            self.Print_ForegroundColor(" - A buried residue according to the %s." %B_E_METHOD, 'B', font_size, 9)
            self.ln()

        #self.set_xy(45, self.get_y() + self.lasth + 5)
        #self.y += self.lasth + 5
        #self.x = self.left_margin
        if IS_THERE_FUNCT_RES:

            self.Print_ForegroundColor('f', 'B', font_size, 3)
            self.set_xy(64.5, self.get_y())
            self.Print_ForegroundColor(" - A predicted functional residue (highly conserved and exposed).", 'B', font_size, 9)
            self.ln()
            #self.set_y(self.get_y() + self.lasth + 5)
            #self.x = self.left_margin

        if IS_THERE_STRUCT_RES:

            self.Print_ForegroundColor('s', 'B', font_size, 4)
            self.set_xy(64, self.get_y())
            self.Print_ForegroundColor(" - A predicted structural residue (highly conserved and buried).", 'B', font_size, 9)
            self.ln()
            #self.set_y(self.get_y() + self.lasth + 5)
            #self.x = self.left_margin

        if IS_THERE_INSUFFICIENT_DATA:

            self.Print_BackgroundColor('x', 'B', font_size, 0, 2, 1)
            self.set_xy(58, self.get_y())
            self.Print_ForegroundColor(" - Insufficient data - the calculation for this site was", 'B', font_size, 9)
            self.ln()
            #self.set_y(self.get_y() + self.lasth + 4)
            #self.x = self.left_margin
            self.set_xy(48, self.get_y())
            self.Print_ForegroundColor("     performed on less than 10% of the sequences.",'B', font_size, 9)
            #self.set_y(self.get_y() + self.lasth + 5)
            #self.x = self.left_margin


    def Print_ForegroundColor(self, txt, print_style, font_size, Color, spacer = 2):

        if Color == 1: # orange

            self.set_text_color(255, 153, 0)

        elif Color == 2: # green

            self.set_text_color(0, 204, 0)

        elif Color == 3: # red

            self.set_text_color(255, 0, 0)

        elif Color == 4: # blue

            self.set_text_color(0, 0, 153)

        else: # black

            self.set_text_color(0, 0, 0)

        self.set_font("Courier", print_style, font_size)
        width = len(txt) - 1 + spacer
        high = font_size / 2
        self.Cell(width, high, txt, 0, 0, 'C', True)
        self.set_fill_color(255, 255, 255) # return to default background (white)
        self.set_text_color(0, 0, 0) # return to default text color (black)




def create_pdf_regular_or_cbs(cbs, name):

    #prediction_method = prediction_method.replace('-', ' ')
    pdf = PDF()
    pdf.add_page()
    #pdf.set_font("Times", "B", 20)
    pdf.add_font('DejaVu', '', '/content/dejavu-fonts-ttf-2.37/ttf/DejaVuSans.ttf', uni=True)
    pdf.set_font('DejaVu', '', 20)
    pdf.Cell(0, 0, "ConSurf Results. Date: " + vars['date'], 0, 0, 'C')
    pdf.set_font("Times", "B", 20)
    pdf.set_y(pdf.get_y() + 10)
    pdf.cbs = cbs
    Rows_Pos = 0

    """
    ConSurf_Grades = []
    Protein_Length = 0
    IS_THERE_INSUFFICIENT_DATA = False
    IS_THERE_FUNCT_RES = False
    IS_THERE_STRUCT_RES = False
    try:

        GRADES = open(vars['gradesPE'], 'r')

    except:

        exit_on_error("create_pdf: unable to open the file " + vars['gradesPE'] + " for reading.")

    line = GRADES.readline()
    while line != "":

        words = line.split()
        if len(words) > 5 and words[0].isnumeric():

            Protein_Length += 1
            details = {}

            details["AA"] = words[1]

            if '*' in words[color_column]:

                details["COLOR"] = 0
                IS_THERE_INSUFFICIENT_DATA = True

            else:

                details["COLOR"] = int(words[color_column])

            if B_E_column is not None:

                F_S_column = B_E_column + 1
                if words[B_E_column] == 'b' or words[B_E_column] == 'e':

                    details["B_E"] = words[B_E_column]
                    IS_THERE_STRUCT_RES = True
                    if words[F_S_column] == 'f' or words[F_S_column] == 's':


                        details["F_S"] = words[F_S_column]
                        IS_THERE_FUNCT_RES = True

                    else:

                       details["F_S"] = ""

                else:

                    details["B_E"] = ""
                    details["F_S"] = ""

            else:

                details["B_E"] = ""
                details["F_S"] = ""

            ConSurf_Grades.append(details)

        line = GRADES.readline()

    GRADES.close()
    """
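    maxPosPerPage = 600
    # Set a default so the legend call after the loop cannot hit an undefined name
    # when vars['gradesPE_Output'] is empty.
    prediction_method = "neural network algorithm" if vars['B/E'] else "no prediction"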
    for elem in vars['gradesPE_Output']:

        Pos = elem['POS'] - 1
        if vars['B/E']:

            prediction_method = "neural network algorithm"
            B_E = elem['B/E']
            F_S = elem['F/S'].strip()

        else:

            prediction_method = "no prediction"
            B_E = ""
            F_S = ""

        if Pos % maxPosPerPage == 0 and Pos != 0:

            pdf.add_page()

        if Pos % 50 == 0:

            pdf.ln()
            pdf.ln()
            pdf.ln()
            pdf.ln()

            Rows_Pos = pdf.get_y()

        elif Pos % 10 == 0:

            pdf.Print_ForegroundColor("", 'B', 10, 0, 4)

        if Pos % 10 == 0:

            pdf.Print_4_Lines_Element(Rows_Pos, elem['COLOR'], elem['SEQ'], Pos, B_E, 10, elem['ISD'], F_S)

        else:

            pdf.Print_4_Lines_Element(Rows_Pos, elem['COLOR'], elem['SEQ'], "", B_E, 10, elem['ISD'], F_S)

    pdf.ln()
    pdf.ln()
    pdf.ln()
    pdf.ln()
    pdf.Print_NEW_Legend(vars['B/E'], vars['B/E'], vars['insufficient_data'],  prediction_method)

    pdf.output(name)
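

# Illustrative sketch (not part of the original ConSurf code): the loop in
# create_pdf_regular_or_cbs starts a new page every 600 positions, a new row
# every 50 positions, and prints a spacer plus a position label every 10
# positions. The hypothetical helper below restates those rules for a
# zero-based position so they can be checked without building a PDF.
def _sketch_pdf_layout_events(pos, max_pos_per_page=600, row_len=50, block_len=10):
    events = []
    if pos % max_pos_per_page == 0 and pos != 0:
        events.append("new_page")
    if pos % row_len == 0:
        events.append("new_row")
    elif pos % block_len == 0:
        events.append("block_spacer")
    if pos % block_len == 0:
        events.append("position_label")
    return events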



def conseq_create_output():

    create_gradesPE(vars['gradesPE'])
    create_pdf()
    no_model_view()


def consurf_create_output():


    r4s2pdb = {} # key: position in SEQRES/MSA, value: residue name with position in atom (i.e: ALA:22:A)

    if vars['running_mode'] == "_mode_pdb_msa" or vars['running_mode'] == "_mode_pdb_msa_tree" or vars['SEQRES_seq'] != "":

        match_pdb_to_seq(r4s2pdb, vars['seqres_or_msa_seq_with_gaps'], vars['ATOM_seq_with_gaps'], vars['pdb_object'])

    else: # no seqres and no msa

        find_pdb_position(r4s2pdb, vars['pdb_object'])


    identical_chains = find_identical_chains_in_PDB_file(vars['pdb_object'], form['PDB_chain'])

    atom_grades = {}
    create_gradesPE(vars['gradesPE'], r4s2pdb, vars['pdb_file_name'], form['PDB_chain'], vars['Used_PDB_Name'], vars['pdb_object'], identical_chains, vars['cif_or_pdb'], atom_grades)
    replace_TmpFactor_Consurf_Scores(atom_grades, form['PDB_chain'], vars['pdb_file_name'], vars['Used_PDB_Name']) # Create ATOMS section and replace the TempFactor Column with the ConSurf Grades (will create also isd file if relevant)

    create_pdf()

def extract_data_from_MSA():

    #vars['query_string'] = form['msa_SEQNAME']
    #vars['protein_seq_string'] = vars['MSA_query_seq']

    ## mode :  include msa and pdb

    if vars['running_mode'] == "_mode_pdb_msa" or vars['running_mode'] == "_mode_pdb_msa_tree":

        compare_atom_seqres_or_msa("MSA")

def no_MSA():

    # if there is pdb : we compare the atom and seqres
    if vars['running_mode'] ==  "_mode_pdb_no_msa" and ('SEQRES_seq' in vars and len(vars['SEQRES_seq']) > 0):

        # align seqres and pdb sequences
        compare_atom_seqres_or_msa("SEQRES")

    vars['max_homologues_to_display'] = 500

    blast_hash = {}

    try:

        call_mmseqs2()

    except:

        os.chdir(vars['working_dir'])
        exit_on_error("calling mmseqs2 failed.")

    # choosing homologs, create fasta file for all legal homologs
    cd_hit_hash = {}
    #vars['hit_redundancy'] = float(form['MAX_REDUNDANCY']) # Now taken as argument from user #OLD: #CONSURF_CONSTANTS.FRAGMENT_REDUNDANCY_RATE
    vars['hit_overlap'] = 0.1
    #vars['min_num_of_hits'] = GENERAL_CONSTANTS.MINIMUM_FRAGMENTS_FOR_MSA
    vars['low_num_of_hits'] = 10
    vars['HITS_fasta_file'] = "query_homolougs.txt"
    vars['HITS_rejected_file'] = vars['job_name'] + "_rejected_homolougs.txt"

    choose_homologoues_from_search_with_lower_identity_cutoff(form['Homolog_search_algorithm'], len(vars['protein_seq_string']), vars['hit_redundancy'], vars['hit_overlap'], vars['hit_min_length'], float(form['MIN_IDENTITY']), vars['min_num_of_hits'], vars['BLAST_out_file'], vars['HITS_fasta_file'], vars['HITS_rejected_file'], blast_hash, form['DNA_AA'])

    vars['cd_hit_out_file'] = "query_cdhit.out"
    vars['unique_seqs'] = cluster_homologoues(cd_hit_hash)
    LOG.write("num_of_unique_seq: %d\n" %vars['unique_seqs'])
    add_sequences_removed_by_cd_hit_to_rejected_report(vars['cd_hit_out_file'] + ".clstr", vars['HITS_rejected_file'], vars['num_rejected_homologs'])
    choose_final_homologoues(blast_hash, cd_hit_hash, float(form['MAX_NUM_HOMOL']) -1, form['best_uniform_sequences'], vars['FINAL_sequences'], vars['HITS_rejected_file'], vars['num_rejected_homologs'])
    vars['zip_list'].append(vars['HITS_rejected_file'])

    if form['DNA_AA'] == "Nuc":

        # convert rna to dna

        LOG.write("convert_rna_to_dna(%s, %s)\n" %(vars['FINAL_sequences'], vars['FINAL_sequences'] + ".dna"))
        ans = convert_rna_to_dna(vars['FINAL_sequences'], vars['FINAL_sequences'] + ".dna")
        if ans[0] == "OK":

            vars['FINAL_sequences'] += ".dna"
            LOG.write("Seqs with u or U: " + str(ans[1]) + "\n")
            for seq in ans[1]:

                print("Warning: The sequence '" + seq + "' contains a 'U' replaced by 'T'")

        else:

            exit_on_error('sys_error', ans)

    LOG.write("make_sequences_file_HTML(%s, %s)\n" %(vars['FINAL_sequences'], vars['FINAL_sequences_html']))
    make_sequences_file_HTML(vars['FINAL_sequences'], vars['FINAL_sequences_html'])

    # we save two copies of the msa, one in fasta format and another in clustal format.
    #vars['msa_fasta'] = "msa_fasta.aln"
    #vars['msa_clustal'] = "msa_clustal.aln"
    create_MSA()
    vars['msa_SEQNAME'] = vars['query_string']

    print("%d homologues were collected." %vars['number_of_homologoues'])
    create_download_link(vars['FINAL_sequences_html'], "These are the %d sequences used for creating the MSA." %vars['final_number_of_homologoues'])
    create_download_link(vars['HITS_rejected_file'], "Download the list of rejected homologues")

def call_mmseqs2():

    os.chdir(vars['root_dir'])

    msa_mode = "mmseqs2_uniref"
    pair_mode = "unpaired_paired"
    pairing_strategy = "greedy"
    result_dir = vars['job_name']

    csv_file = "/content/%s/%s.csv" %(vars['job_name'], vars['job_name'])

    CSV = open(csv_file, 'w')

    CSV.write("id,sequence\n%s,%s" %(vars['job_name'], vars['protein_seq_string']))
    CSV.close()

    result_dir = Path(result_dir)

    get_msa_and_templates(vars['job_name'], vars['protein_seq_string'], None, result_dir, msa_mode, False, None, pair_mode, pairing_strategy, 'https://api.colabfold.com', 'colabfold/google-colab-main')

    os.chdir(vars['working_dir'])


def extract_data_from_model():



    if vars['cif_or_pdb'] == "pdb":

        vars['pdb_object'] = pdbParser()

    else:

        vars['pdb_object'] = cifParser()

    vars['pdb_object'].read(vars['pdb_file_name'], form['PDB_chain'], form['DNA_AA'])


    #[vars['SEQRES_seq'], vars['ATOM_seq'], vars['ATOM_without_X_seq']] = get_seqres_atom_seq(vars['pdb_object'], form['PDB_chain'], vars['pdb_file_name'])
    vars['SEQRES_seq'] = vars['pdb_object'].get_SEQRES()
    All_atoms = vars['pdb_object'].get_ATOM_withoutX()
    if form['PDB_chain'] in All_atoms:

        vars['ATOM_without_X_seq'] = All_atoms[form['PDB_chain']]

    else:

        exit_on_error('user_error', "The chain is not in the PDB. Select the PDB chain using the flag --chain")

    analyse_seqres_atom()

    try:

        FAS = open(vars['protein_seq'], 'w')

    except:

        exit_on_error('sys_error',"cannot open the file " + vars['protein_seq'] + " for writing!")

    # we write the sequence to a fasta file for the homologues search
    # we save the name of the query string for rate4site
    if vars['SEQRES_seq'] == "":

        vars['query_string'] = "Input_seq_ATOM_" + form['PDB_chain']
        vars['protein_seq_string'] = vars['ATOM_without_X_seq']
        FAS.write(">" + vars['query_string'] + "\n" + vars['ATOM_without_X_seq'])

    else:

        vars['query_string'] = "Input_seq_SEQRES_" + form['PDB_chain']
        vars['protein_seq_string'] = vars['SEQRES_seq']
        FAS.write(">" + vars['query_string'] + "\n" + vars['SEQRES_seq'])

    FAS.close()


def create_cd_hit_output(input_file, output_file, cutoff, ref_cd_hit_hash, type):


    seq = ""
    seq_name = ""
    cmd = ""
    n = 0

    # running cd-hit

    if type == "AA":

        cmd += "%scd-hit -i %s -o %s " %(vars['cd_hit_dir'], input_file, output_file)
        if cutoff > 0.7 and cutoff <= 1: # include a cutoff of exactly 1.0, otherwise n stays 0

            n = 5

        elif cutoff > 0.6 and cutoff <= 0.7:

            n = 4

        elif cutoff > 0.5 and cutoff <= 0.6:

            n = 3

        elif cutoff > 0.4 and cutoff <= 0.5:

            n = 2

    else:

        # DNA
        cmd += "cd-hit-est -i %s -o %s " %(input_file, output_file)
        if cutoff > 0.9 and cutoff <= 1: # include a cutoff of exactly 1.0, otherwise n stays 0

            n = 8

        elif cutoff > 0.88 and cutoff <= 0.9:

            n = 7

        elif cutoff > 0.85 and cutoff <= 0.88:

            n = 6

        elif cutoff > 0.8 and cutoff <= 0.85:

            n = 5

        elif cutoff > 0.75 and cutoff <= 0.8:

            n = 4

    cmd += "-c %f -n %d -d 0" %(cutoff, n)

    submit_job_to_Q("CD-HIT", cmd)

    if not os.path.exists(output_file) or os.path.getsize(output_file) == 0:

        exit_on_error("sys_error", "create_cd_hit_output : " + str(cmd) + ": CD-HIT produced no output!\n")

    num_cd_hits = 0

    try:

        CDHIT_OUTPUT = open(output_file, 'r')

    except:

        exit_on_error('sys_error', "create_cd_hit_output : could not open the file " + output_file + " for reading.")

    # inserting chosen homologues to a hash
    seq_name = ""
    seq = ""
    for line in CDHIT_OUTPUT:

        line = line.strip()
        if line == "":

            # skip blank lines; indexing line[0] on an empty string raises IndexError
            continue

        if line[0] == ">":

            seq_name = line[1:]

        else:

            seq = line
            if not seq_name in ref_cd_hit_hash:

                num_cd_hits += 1
                ref_cd_hit_hash[seq_name] = seq

    CDHIT_OUTPUT.close()
    return num_cd_hits
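

# Illustrative sketch (not part of the original ConSurf code): the CD-HIT word
# size (-n) chosen in create_cd_hit_output depends only on the identity cutoff
# and the sequence type, so it can be expressed as a table lookup. The lower
# bounds are copied from the branches above. Treating a cutoff of exactly 1.0
# as part of the top band, and returning None for unsupported cutoffs, are
# assumptions of this sketch. All names here are hypothetical.
_SKETCH_WORD_SIZES = {
    "AA": [(0.7, 5), (0.6, 4), (0.5, 3), (0.4, 2)],
    "Nuc": [(0.9, 8), (0.88, 7), (0.85, 6), (0.8, 5), (0.75, 4)],
}


def _sketch_cd_hit_word_size(cutoff, seq_type="AA"):
    if cutoff > 1:
        return None
    # Bands are listed from highest to lowest, so the first match wins.
    for lower_bound, word_size in _SKETCH_WORD_SIZES[seq_type]:
        if cutoff > lower_bound:
            return word_size
    return None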



def make_sequences_file_HTML(plain_txt_sequences, HTML_sequences):

    try:

        HTML_SEQUENCES = open(HTML_sequences, 'w')

    except:

        exit_on_error('sys_error', "make_sequences_file_HTML : cannot open the file " + HTML_sequences + " for writing.")

    try:

        TXT_SEQUENCES = open(plain_txt_sequences, 'r')

    except:

        exit_on_error('sys_error', "make_sequences_file_HTML : cannot open the file " + plain_txt_sequences + " for reading.")

    counter = 1
    line = TXT_SEQUENCES.readline()
    while line != "":

        line = line.strip()
        if line == "":

            line = TXT_SEQUENCES.readline()
            continue

        if line[0] != ">":

            counter += 1
            HTML_SEQUENCES.write("<FONT FACE=\"courier new\" SIZE=3>" + line + "</FONT><BR>\n")

        else:

            line = line[1:]
            if line[:9] == "Input_seq":

                HTML_SEQUENCES.write("<FONT FACE=\"courier new\" SIZE=3>>%d_%s</FONT><BR>\n" %(counter, line))

            else:

                name = line.split("|")[0]
                HTML_SEQUENCES.write("<FONT FACE=\"courier new\" SIZE=3><A HREF=\"https://www.uniprot.org/uniref/%s\">>%d_%s</A></FONT><BR>\n" %(name, counter, line))

        line = TXT_SEQUENCES.readline()

    HTML_SEQUENCES.close()
    TXT_SEQUENCES.close()


def convert_rna_to_dna(Seqs, Seqs_dna):

    # replace the u with t and return the sequences names replaced

    try:

        OUT = open(Seqs_dna, 'w')

    except:

        return("convert_rna_to_dna: Can't open file " + Seqs_dna + " for writing.")

    try:

        SEQS = open(Seqs, 'r')

    except:

        return("convert_rna_to_dna: Can't open file " + Seqs + " for reading.")

    Seqs_Names = []
    seq_name = ""

    line = SEQS.readline()
    while line != "":

        line = line.rstrip()
        match1 = re.match(r'^>(.*)', line)
        if match1:

            seq_name = match1.group(1)


        elif 'u' in line or 'U' in line:

            Seqs_Names.append(seq_name)
            line = line.replace('u', 't')
            line = line.replace('U', 'T')

        OUT.write(line + "\n")
        line = SEQS.readline()

    OUT.close()
    SEQS.close()

    return("OK", Seqs_Names)
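

# Illustrative sketch (not part of the original ConSurf code): the same U -> T
# replacement as convert_rna_to_dna, applied to a list of FASTA lines in memory
# instead of files. Unlike the function above, this hypothetical helper records
# each affected sequence name only once, even if several of its lines contain U.
def _sketch_rna_to_dna_lines(fasta_lines):
    converted = []
    changed_names = []
    current_name = ""
    for line in fasta_lines:
        line = line.rstrip()
        if line.startswith(">"):
            # Header line: remember the name, leave the text untouched.
            current_name = line[1:]
        elif "u" in line or "U" in line:
            if current_name not in changed_names:
                changed_names.append(current_name)
            line = line.replace("u", "t").replace("U", "T")
        converted.append(line)
    return converted, changed_names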


def create_pdf():

    create_pdf_regular_or_cbs(True, vars['Colored_Seq_PDF'])
    create_pdf_regular_or_cbs(False, vars['Colored_Seq_CBS_PDF'])


def create_pymol(input, prefix):

    cmd = "pymol -qc " + input + " -d \"run " + vars['pymol_color_script_isd'] + "\"\n"
    cmd += "pymol -qc " + input + " -d \"run " + vars['pymol_color_script_CBS_isd'] + "\"\n"

    LOG.write("create_pymol : %s\n" %cmd)
    submit_job_to_Q("PYMOL", cmd)

    pymol_session = "consurf_pymol_session.pse"
    pymol_session_CBS = "consurf_CBS_pymol_session.pse"

    if os.path.exists(pymol_session) and os.path.getsize(pymol_session) != 0:

        os.chmod(pymol_session, 0o664)
        os.rename(pymol_session, prefix + pymol_session)
        vars['zip_list'].append(prefix + pymol_session)

    if os.path.exists(pymol_session_CBS) and os.path.getsize(pymol_session_CBS) != 0:

        os.chmod(pymol_session_CBS, 0o664)
        os.rename(pymol_session_CBS, prefix + pymol_session_CBS)
        vars['zip_list'].append(prefix + pymol_session_CBS)


def create_chimera(input, prefix):

    run_chimera(input, prefix + "consurf_chimerax_session.cxs", vars['chimera_color_script'])
    run_chimera(input, prefix + "consurf_CBS_chimerax_session.cxs", vars['chimera_color_script_CBS'])

def run_chimera(input, output, script):

    cmd = "chimerax --nogui --script '%s %s %s' --exit\n" %(script, input, output)
    LOG.write("create_chimera : %s\n" %cmd)
    submit_job_to_Q("CHIMERA", cmd)

    vars['zip_list'].append(output)

"""
def check_msa_tree_match(ref_msa_seqs):

    ref_tree_nodes = []
    check_validity_tree_file(ref_tree_nodes)
    LOG.write("check_msa_tree_match : check if all the nodes in the tree are also in the MSA\n")

    for node in ref_tree_nodes:

        if not node in ref_msa_seqs:

            exit_on_error('user_error', "The uploaded tree file is inconsistent with the uploaded MSA file. The node '" + node + "' is found in the tree file, but there is no sequence in the MSA file with that exact name. Note that the search is case-sensitive!")

    LOG.write("check_msa_tree_match : check if all the sequences in the MSA are also in the tree\n")

    for seq_name in ref_msa_seqs: #check that all the msa nodes are in the tree

        if not seq_name in ref_tree_nodes:

            exit_on_error('user_error', "The uploaded MSA file is inconsistent with the uploaded tree file. The sequence name '" + seq_name + "' is found in the MSA file, but there is no node with that exact name in the tree file. Note that the search is case-sensitive!")

    vars['unique_seqs'] = len(ref_msa_seqs)
    LOG.write("There are " + str(vars['unique_seqs']) + " in the MSA.\n")
"""

def get_info_from_msa(seq_names):

    # returns the name of the query sequence and fills the input array with the names of the sequences
    query_seq = ""
    msa_format = check_msa_format(vars['user_msa_file_name'])
    try:

        alignment = AlignIO.read(vars['user_msa_file_name'], msa_format)

    except:

        exit_on_error('sys_error', "get_info_from_msa : can't open the file " + vars['user_msa_file_name'] + " for reading.")

    try:

        MSA = open(vars['working_dir'] + vars['msa_fasta'], 'w')

    except:

        exit_on_error('sys_error', "get_info_from_msa : can't open the file " + vars['msa_fasta'] + " for writing.")

    for record in alignment:

        new_seq_name = str(record.id)
        if new_seq_name in seq_names:

            exit_on_error('user_error', "The sequence %s appears more than once in the MSA." %new_seq_name)


        seq_names.append(new_seq_name)

        seq = str(record.seq)
        if form['msa_SEQNAME'] == new_seq_name:

            query_seq = seq


        if form['DNA_AA'] == "Nuc" and ('u' in seq or 'U' in seq):

            seq = seq.replace('u', 't')
            seq = seq.replace('U', 'T')
            print("Warning: The sequence '" + new_seq_name + "' contains a 'U' replaced by 'T'")

        MSA.write(">%s\n%s\n" %(new_seq_name, seq))

    MSA.close()


    num_of_seq = len(seq_names)
    vars['unique_seqs'] = num_of_seq
    vars['final_number_of_homologoues'] = num_of_seq
    LOG.write("MSA contains " + str(num_of_seq) + " sequences\n")
    if num_of_seq < 5:

        exit_on_error('user_error',"The MSA file contains only " + str(num_of_seq) + " sequences. The minimal number of homologues required for the calculation is 5.")

    query_seq = query_seq.replace("-", "")
    query_seq = query_seq.upper()

    if query_seq == "":

        exit_on_error('user_error', "The query sequence is not in the msa. Please choose the name of the query sequence by adding the flag --query")

    vars['msa_SEQNAME'] = form['msa_SEQNAME']
    #vars['query_string'] = form['msa_SEQNAME']
    vars['MSA_query_seq'] = query_seq
    """
    # there is no input seq, use msa seq instead
    vars['protein_seq_string'] = vars['MSA_query_seq']
    try:

        QUERY_FROM_MSA = open(vars['working_dir'] + vars['protein_seq'], 'w')

    except:

        exit_on_error('sys_error', "get_info_from_msa : Could not open %s for writing." %vars['protein_seq'])

    QUERY_FROM_MSA.write(">" + form['msa_SEQNAME'] + "\n")
    QUERY_FROM_MSA.write(vars['MSA_query_seq'] + "\n")
    QUERY_FROM_MSA.close()
    """


def create_pipe_file(pipeFile, pipeFile_CBS, seq3d_grades, seq3d_grades_isd, isd_residue_color_ArrRef, no_isd_residue_color_ArrRef, pdb_file_name, user_chain, IN_pdb_id_capital, identical_chains, pdb_object):

    # CREATE PART of PIPE
    partOfPipe = "partOfPipe"
    partOfPipe_CBS = "partOfPipe_CBS"

    length_of_seqres = pdb_object.get_num_known_seqs()
    length_of_atom = pdb_object.get_num_known_atoms()

    LOG.write("create_part_of_pipe_new(%s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s)\n" %(partOfPipe, vars['unique_seqs'], "user_DB", "seq3d_grades_isd", "seq3d_grades", length_of_seqres, length_of_atom, "isd_residue_color_ArrRef", "no_isd_residue_color_ArrRef", form['E_VALUE'], form['ITERATIONS'], form['MAX_NUM_HOMOL'], form['MSAprogram'], form['ALGORITHM'], form['SUB_MATRIX'], "legacy"))
    create_part_of_pipe_new(partOfPipe, vars['unique_seqs'], "user_DB", seq3d_grades_isd, seq3d_grades, length_of_seqres, length_of_atom, isd_residue_color_ArrRef, no_isd_residue_color_ArrRef, form['E_VALUE'], form['ITERATIONS'], form['MAX_NUM_HOMOL'], form['MSAprogram'], form['ALGORITHM'], form['SUB_MATRIX'], vars['Average pairwise distance'], "legacy")


    # create the color blind friendly version
    create_part_of_pipe_new(partOfPipe_CBS, vars['unique_seqs'], "user_DB", seq3d_grades_isd, seq3d_grades, length_of_seqres, length_of_atom, isd_residue_color_ArrRef, no_isd_residue_color_ArrRef, form['E_VALUE'], form['ITERATIONS'], form['MAX_NUM_HOMOL'], form['MSAprogram'], form['ALGORITHM'], form['SUB_MATRIX'], vars['Average pairwise distance'], "cb")


    LOG.write("extract_data_from_pdb(%s)\n" %pdb_file_name)
    header_pipe = extract_data_from_pdb(pdb_file_name)


    # GET THE FILE NAMES
    msa_filename = ""
    msa_query_seq_name = ""
    if vars['user_msa_file_name'] is not None:

        msa_filename = vars['user_msa_file_name']
        msa_query_seq_name = form['msa_SEQNAME']

    tree_filename = ""
    if form['tree_name'] is not None:

        tree_filename = vars['tree_file']

    # GET THE CURRENT TIME
    completion_time = str(datetime.now().time())
    run_date = str(datetime.now().date())

    # USE THE CREATED PART of PIPE to CREATE ALL THE PIPE TILL THE PDB ATOMS (DELETE THE PART PIPE)
    LOG.write("create_consurf_pipe_new(%s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s)\n" %(vars['working_dir'], IN_pdb_id_capital, user_chain, "header_pipe", pipeFile, identical_chains, partOfPipe, vars['working_dir'], form['Run_Number'], msa_filename, msa_query_seq_name, tree_filename, vars['submission_time'], completion_time, run_date))
    create_consurf_pipe_new(vars['working_dir'], IN_pdb_id_capital, user_chain, header_pipe, pipeFile, identical_chains, partOfPipe, vars['working_dir'], form['Run_Number'], msa_filename, msa_query_seq_name, tree_filename, vars['submission_time'], completion_time, run_date)

    # USE THE CREATED PART of PIPE to CREATE ALL THE PIPE TILL THE PDB ATOMS (DELETE THE PART PIPE) - Color friendly version
    LOG.write("create_consurf_pipe_new(%s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s)\n" %(vars['working_dir'], IN_pdb_id_capital, user_chain, "header_pipe", pipeFile_CBS, identical_chains, partOfPipe_CBS, vars['working_dir'], form['Run_Number'], msa_filename, msa_query_seq_name, tree_filename, vars['submission_time'], completion_time, run_date))
    create_consurf_pipe_new(vars['working_dir'], IN_pdb_id_capital, user_chain, header_pipe, pipeFile_CBS, identical_chains, partOfPipe_CBS, vars['working_dir'], form['Run_Number'], msa_filename, msa_query_seq_name, tree_filename, vars['submission_time'], completion_time, run_date)




    # Add the PDB data to the pipe
    LOG.write("add_pdb_data_to_pipe(%s, %s)\n" %(pdb_file_name, pipeFile))
    add_pdb_data_to_pipe(pdb_file_name, pipeFile)


    # Add the PDB data to the pipe - color blind version
    LOG.write("add_pdb_data_to_pipe(%s, %s)\n" %(pdb_file_name, pipeFile_CBS))
    add_pdb_data_to_pipe(pdb_file_name, pipeFile_CBS)


def compare_atom_to_query(Query_seq, ATOM_seq, pairwise_aln, PDB_Name):

    # in case there are both seqres and atom fields, checks the similarity between the 2 sequences.

    [first_seq, second_seq, score] = pairwise_alignment(Query_seq, ATOM_seq, pairwise_aln, "QUERY")
    return(first_seq, second_seq)


def upload_protein_sequence():


    if os.path.exists(vars['uploaded_Seq']) and os.path.getsize(vars['uploaded_Seq']) != 0: # file fasta uploaded

        try:

            UPLOADED = open(vars['uploaded_Seq'], 'r')

        except:

            exit_on_error('sys_error', "upload_protein_sequence : Cannot open the file " + vars['uploaded_Seq'] + " for reading!")

        protein_seq_string = UPLOADED.read()
        UPLOADED.close()

        if protein_seq_string.count('>') > 1:

            exit_on_error('user_error', "The protein input <a href = \"%s\">sequence</a> contains more than one FASTA sequence. If you wish to upload MSA, please upload it as a file." %protein_seq_string)

        # delete sequence name and white spaces
        protein_seq_string = re.sub(r'>.*\n', "", protein_seq_string)
        protein_seq_string = re.sub(r'\s', "", protein_seq_string)

    else:

        exit_on_error('sys_error', 'upload_protein_sequence : no user sequence.')

    # we write the sequence to a file for the homologues search
    try:

        UPLOADED = open(vars['working_dir'] + vars['protein_seq'], 'w')

    except:

        exit_on_error('sys_error', "upload_protein_sequence : Cannot open the file " + vars['protein_seq'] + " for writing!")

    UPLOADED.write(">Input_seq\n" + protein_seq_string)
    UPLOADED.close()

    amino_acids = ["A", "C", "D", "E", "F", "G", "H", "I", "K", "L", "M", "N", "P", "Q", "R", "S", "T", "V", "W", "Y", "X"]
    nucleic_acids = ["A", "C", "G", "T", "U", "N"]
    if form['DNA_AA'] == "AA":

        amino = False
        for char in protein_seq_string:

            if not char.upper() in amino_acids:

                exit_on_error('user_error', "The input sequence contains the illegal character %s." %char)

            elif not char.upper() in nucleic_acids:

                amino = True

        if not amino:

            exit_on_error('user_error',"It seems that the protein input is only composed of nucleotides (i.e.: A, T, C, G). Please note that you chose to run the server based on an amino acid sequence and not a DNA / RNA sequence.<br />You may translate your sequence to amino acids and resubmit your query, or alternatively choose to analyze nucleotides.<br />")

    else:

        for char in protein_seq_string:

            if char.upper() in nucleic_acids:

                # legal nucleotide; checked first so that 'U' is not rejected as an illegal character
                continue

            elif char.upper() in amino_acids:

                exit_on_error('user_error',"It seems that the input sequence contains the amino acid %s. Please note that you chose to run the server based on a nucleotide sequence and not a protein sequence.<br />You may resubmit your query and choose to analyze amino acids.<br />" %char)

            else:

                exit_on_error('user_error', "The input sequence contains the illegal character %s." %char)

    vars['protein_seq_string'] = protein_seq_string
    vars['query_string'] = "Input_seq" # name of the sequence is saved for rate4site
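

# Illustrative sketch (not part of the original ConSurf code): the alphabet
# checks in upload_protein_sequence, written as a pure function that returns a
# short status string instead of calling exit_on_error. Accepting 'U' and 'N'
# for nucleotide input follows the nucleic_acids list above and is this
# sketch's reading of the intended behaviour. All names are hypothetical.
_SKETCH_AMINO = set("ACDEFGHIKLMNPQRSTVWYX")
_SKETCH_NUC = set("ACGTUN")


def _sketch_classify_sequence(seq, mode):
    chars = [c.upper() for c in seq]
    if mode == "AA":
        for c in chars:
            if c not in _SKETCH_AMINO:
                return "illegal:" + c
        # A protein made only of A/C/G/T/N letters is almost certainly DNA.
        if all(c in _SKETCH_NUC for c in chars):
            return "looks_like_nucleotides"
        return "ok"
    for c in chars:
        if c in _SKETCH_NUC:
            continue
        if c in _SKETCH_AMINO:
            return "amino_acid:" + c
        return "illegal:" + c
    return "ok"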


def array_to_string(array):

    string = ""
    for word in array:

        string += " " +word

    return string


def zip_all_outputs():

    zipObj = ZipFile(vars['All_Outputs_Zip'], 'w')

    for file in vars['zip_list']:

        if os.path.exists(file):

            zipObj.write(file)

    zipObj.close()
    create_download_link(vars['All_Outputs_Zip'], "Download zip with all the files.")



def find_identical_chains_in_PDB_file(pdb_Object, query_chain):

    # Looking in the PDB for chains identical to the original chain
    ATOMS = pdb_Object.get_ATOM_withoutX()

    # string with identical chains
    identicalChains = query_chain

    # looking for chains identical to the original chain
    for chain in ATOMS:

        if query_chain != chain:

            other_seq = ATOMS[chain]
            query_seq = ATOMS[query_chain]
            chain_length = len(other_seq)
            OrgChain_length = len(query_seq)

            # if length not similar, skip
            if min(OrgChain_length, chain_length)/max(OrgChain_length, chain_length) <= 0.9:

                continue

            # compare the two chains
            try:

                if pairwise_alignment(other_seq, query_seq) > 0.95:

                    identicalChains += " " + chain

            except:

                LOG.write("find_identical_chains_in_PDB_file: Error comparing the chains %s and %s\n" %(query_chain, chain))

    return identicalChains
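

# Illustrative sketch (not part of the original ConSurf code): the length
# pre-filter used in find_identical_chains_in_PDB_file, isolated as a
# hypothetical helper. Two chains are compared only when the shorter one is
# more than 90% of the length of the longer one. Returning False for two empty
# chains (to avoid dividing by zero) is an assumption of this sketch.
def _sketch_lengths_are_similar(len_a, len_b, min_ratio=0.9):
    longest = max(len_a, len_b)
    if longest == 0:
        return False
    return min(len_a, len_b) / longest > min_ratio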



def replace_TmpFactor_Rate4Site_Scores(chain, pdb_file, gradesPE, pdb_file_with_score_at_TempFactor):

    # This will create a PDB file that contains the Rate4Site scores instead of the TempFactor Column

    Rate4Site_Grades = {}
    LOG.write("read_Rate4Site_gradesPE(%s, %s)\n" %(gradesPE, str(Rate4Site_Grades)))
    read_Rate4Site_gradesPE(gradesPE, Rate4Site_Grades)

    if vars['cif_or_pdb'] == "pdb":

        LOG.write("replace_TmpFactor_Rate4Site_Scores_PDB(%s, %s, %s, %s)\n" %(pdb_file, chain, "Rate4Site_Grades", pdb_file_with_score_at_TempFactor))
        replace_TmpFactor_Rate4Site_Scores_PDB(pdb_file, chain, Rate4Site_Grades, pdb_file_with_score_at_TempFactor)

    else:

        LOG.write("replace_TmpFactor_Rate4Site_Scores_CIF(%s, %s, %s, %s)\n" %(pdb_file, chain, "Rate4Site_Grades", pdb_file_with_score_at_TempFactor))
        replace_TmpFactor_Rate4Site_Scores_CIF(pdb_file, chain, Rate4Site_Grades, pdb_file_with_score_at_TempFactor)



def replace_TmpFactor_Consurf_Scores(atom_grades, chain, pdb_file, prefix):

    # This Will create a File containing the ATOMS records with the ConSurf grades instead of the TempFactor column

    if vars['cif_or_pdb'] == "pdb":

        LOG.write("replace_TmpFactor_Consurf_Scores_PDB(atom_grades, %s, %s, %s);\n" %(chain, pdb_file, prefix))
        replace_TmpFactor_Consurf_Scores_PDB(atom_grades, chain, pdb_file, prefix)

    else:
        LOG.write("replace_TmpFactor_Consurf_Scores_CIF(atom_grades, %s, %s, %s);\n" %(chain, pdb_file, prefix))
        replace_TmpFactor_Consurf_Scores_CIF(atom_grades, chain, pdb_file, prefix)


def install_rate4site(rate4site_dir, rate4site_slow_dir):

    LOG.write("Installing rate4site.\n")

    # create directory for rate4site
    submit_job_to_Q("download_rate4site", "git clone https://github.com/barakav/r4s_for_collab.git")

    # create directory for rate4site slow
    shutil.copytree(rate4site_dir, rate4site_slow_dir)

    # make rate4site
    submit_job_to_Q("install_rate4site", "cd %s\nmake\nchmod 755 rate4site" %rate4site_dir)

    # change the make file
    os.remove(rate4site_slow_dir + "Makefile")
    os.rename(rate4site_slow_dir + "Makefile_slow", rate4site_slow_dir + "Makefile")

    # make rate4site
    submit_job_to_Q("install_rate4site", "cd %s\nmake\nchmod 755 rate4site" %rate4site_slow_dir)


import os
import torch
import numpy as np
from Bio import SeqIO
from transformers import BertModel, BertTokenizer


def run_rate4site_PLM():
    print("Bypassing Rate4Site. Initializing AI Protein Language Model (ProtBert)...")

    vars['r4s_log'] = "r4s.log"
    vars['r4s_out'] = "r4s.res"
    vars['gradesPE_Output'] = [] # Initialize early

    try:
        # 1. Extract Sequence
        records = list(SeqIO.parse(vars['msa_fasta'], "fasta"))
        query_seq_raw = str(records[0].seq).replace("-", "")
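        # ProtBert expects uppercase residues separated by spaces, with the rare
        # amino acids U, Z, O and B mapped to X (as described in the
        # Rostlab/prot_bert model card). query_seq_raw itself is left unchanged.
        query_seq_spaced = " ".join(re.sub(r"[UZOB]", "X", query_seq_raw.upper()))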

        # 2. Model Inference
        tokenizer = BertTokenizer.from_pretrained("Rostlab/prot_bert", do_lower_case=False)
        model = BertModel.from_pretrained("Rostlab/prot_bert", output_attentions=True)

        inputs = tokenizer(query_seq_spaced, return_tensors="pt")
        with torch.no_grad():
            outputs = model(**inputs)

        # 3. Handle Attention & Slicing
        # ProtBert adds [CLS] at index 0 and [SEP] at the end. We must remove them.
        attentions = outputs.attentions
        last_layer_attn = attentions[-1].squeeze(0).mean(dim=0)
        importance_scores = last_layer_attn.sum(dim=0).numpy()

        # SLICE: Remove the first and last tokens to match original sequence length
        importance_scores = importance_scores[1:-1]

        # 4. Normalize (1-9)
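        s_min, s_max = importance_scores.min(), importance_scores.max()
        if s_max > s_min:
            consurf_style_scores = 1 + 8 * (importance_scores - s_min) / (s_max - s_min)
        else:
            # All scores are equal (e.g. a one-residue sequence): avoid dividing
            # by zero and give every position the middle grade.
            consurf_style_scores = np.full_like(importance_scores, 5.0)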

        # 5. Populate gradesPE_Output with every single expected legacy key
        for i, score in enumerate(consurf_style_scores):
            grade = int(np.clip(round(score), 1, 9))
            residue_letter = query_seq_raw[i]

            entry = {
                'POS': i + 1,                  # For PDB mapping
                'SEQ': residue_letter,         # For the grades file
                'RES': residue_letter,         # For logging
                'AA': residue_letter,          # Alias
                'SEQ_POS': i + 1,              # For the diversity matrix
                'GRADE': grade,                # 1-9 integer
                'SCORE': round(float(score), 3),
                'COLOR': grade,
                'INTERVAL_LOW': round(float(score), 3),
                'INTERVAL_HIGH': round(float(score), 3),
                'INTERVAL_LOW_COLOR': grade,
                'INTERVAL_HIGH_COLOR': grade,
                'MSA_NUM': 100,
                'MSA_DENUM': '100',            # <--- FIXES KeyError: 'MSA_DENUM'
                'B/E': ' ',                    # <--- Prevents future KeyError: 'B/E'
                'F/S': ' ',                    # <--- Prevents future KeyError: 'F/S'
                'ISD': 0,                      # 0 means "Insufficient Data" flag is OFF
                'MSA_DATA': 'AI_PREDICTED'
            }
            vars['gradesPE_Output'].append(entry)

        # 6. File Mocking
        with open(vars['r4s_out'], "w") as f:
            f.write("# POS  RES  SCORE\n")
            for item in vars['gradesPE_Output']:
                f.write(f"{item['SEQ_POS']}  {item['RES']}  {item['SCORE']}\n")

        with open(vars['r4s_log'], "w") as f:
            f.write("AI-Inference successful.")

        if not os.path.exists(vars['tree_file']):
            with open(vars['tree_file'], "w") as f:
                f.write("(Query_AI:0.00001);")

        print(f"Success! Populated gradesPE_Output with {len(vars['gradesPE_Output'])} scores.")

    except Exception as e:
        print(f"AI Optimization failed: {e}")
        # If this fails, the downstream code will still see an empty list and error out,
        # which is better than proceeding with "garbage" data.
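

# Illustrative sketch (not part of the original pipeline): the 1-9 rescaling used in
# the AI scoring step above, pulled out as a standalone helper so it can be tried
# on its own. The name scale_scores_to_grades is hypothetical. It assumes plain
# min-max scaling and, as an added assumption, maps a constant score vector to
# the middle grade 5 instead of dividing by zero.
import numpy as np

def scale_scores_to_grades(raw_scores):

    raw_scores = np.asarray(raw_scores, dtype=float)
    s_min, s_max = raw_scores.min(), raw_scores.max()
    if s_max == s_min:

        return [5] * len(raw_scores)

    scaled = 1 + 8 * (raw_scores - s_min) / (s_max - s_min)
    # round to the nearest integer grade and keep it inside the 1-9 scale
    return [int(g) for g in np.clip(np.rint(scaled), 1, 9)]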



def run_rate4site_oldkevin():

    rate4site_dir = "/content/r4s_for_collab/"
    rate4site_slow_dir = "/content/r4s_for_collab_slow/"
    vars['r4s_log'] = "r4s.log" # log file
    vars['r4s_out'] = "r4s.res" # output file

    #install_rate4site(rate4site_dir, rate4site_slow_dir)

    MatrixHash = {'JTT' : '-Mj', 'MTREV' : '-Mr', 'CPREV' : '-Mc', 'WAG' : '-Mw', 'DAYHOFF' : '-Md', 'T92' : '-Mt', 'HKY' : '-Mh', 'GTR' : '-Mg', 'JC_NUC' : '-Mn', 'JC_AA' : '-Ma', 'LG' : '-Ml'}

    params = "rate4site -a '%s' -s %s -zn %s -bn -o %s -v 9" %(vars['query_string'], vars['msa_fasta'], MatrixHash[(form['SUB_MATRIX']).upper()], vars['r4s_out'])
    if vars['running_mode'] == "_mode_pdb_msa_tree" or vars['running_mode'] == "_mode_msa_tree":

        params += " -t %s" %vars['tree_file']

    if form['ALGORITHM'] == "Bayes":

        params +=  " -ib"

    else:

        params +=  " -im"

    r4s_comm = rate4site_dir + params + " -l " + vars['r4s_log']

    LOG.write("run_rate4site : running command: %s\n" %r4s_comm)
    print("The conservation scores are being calculated. Please wait.")
    submit_job_to_Q("rate4site", r4s_comm)

    # if the run failed - we rerun using the slow version
    if check_if_rate4site_failed(vars['r4s_log']):

        LOG.write("run_rate4site : The run of rate4site failed. Sending warning message to output.\nThe same run will be done using the SLOW version of rate4site.\n")
        #print("Warning: The given MSA is very large, therefore it will take longer for ConSurf calculation to finish. The results will be sent to the e-mail address provided.<br>The calculation continues nevertheless.")
        vars['r4s_log'] = "r4s_slow.log"
        r4s_comm = rate4site_slow_dir + params + " -l " + vars['r4s_log']
        LOG.write("run_rate4site : running command: %s\n" %str(r4s_comm))
        submit_job_to_Q("rate4siteSlow", r4s_comm)

        if check_if_rate4site_failed(vars['r4s_log']):

            exit_on_error('sys_error', "Both rate4site and rate4site slow failed.")

    extract_diversity_matrix_info(vars['r4s_log'])

    if not os.path.exists(vars['tree_file']):

        os.rename("TheTree.txt", vars['tree_file'])

def run_rate4site_old():

    rate4s = vars['script_dir'] + "/rate4site_bioseq/rate4site"
    rate4s_ML = vars['script_dir'] + "/rate4site_bioseq/rate4site.24Mar2010"
    rate4s_slow = vars['script_dir'] + "/rate4site_bioseq/rate4site.doubleRep"

    vars['r4s_log'] = "r4s.log" # log file
    vars['r4s_out'] = "r4s.res" # output file
    MatrixHash = {'JTT' : '-Mj', 'MTREV' : '-Mr', 'CPREV' : '-Mc', 'WAG' : '-Mw', 'DAYHOFF' : '-Md', 'T92' : '-Mt', 'HKY' : '-Mh', 'GTR' : '-Mg', 'JC_NUC' : '-Mn', 'JC_AA' : '-Ma', 'LG' : '-Ml'}

    params = " -a '%s' -s %s -zn %s -bn -o %s" %(vars['query_string'], vars['msa_fasta'], MatrixHash[(form['SUB_MATRIX']).upper()], vars['r4s_out'])
    if vars['running_mode'] == "_mode_pdb_msa_tree" or vars['running_mode'] == "_mode_msa_tree":

        params += " -t %s" %vars['tree_file']

    if form['ALGORITHM'] == "Bayes":

        params += " -ib -n 32 -v 9"
        r4s_comm = rate4s + params

    else:

        params += " -im -v 9"
        r4s_comm = rate4s_ML + params

    r4s_comm += " -l " + vars['r4s_log']
    LOG.write("run_rate4site : running command: %s\n" %r4s_comm)
    submit_job_to_Q("rate4site", r4s_comm)

    # if the run failed - we rerun using the slow version
    if check_if_rate4site_failed(vars['r4s_log']):

        vars['r4s_log'] = "r4s_slow.log"
        LOG.write("run_rate4site : The run of rate4site failed. Sending warning message to output.\nThe same run will be done using the SLOW version of rate4site.\n")
        print("Warning: The given MSA is very large, therefore it will take longer for the ConSurf calculation to finish. The calculation continues nevertheless.")
        r4s_comm = rate4s_slow + params + " -l " + vars['r4s_log']
        LOG.write("run_rate4site : running command: %s\n" %r4s_comm)
        submit_job_to_Q("rate4siteSlow", r4s_comm)

        if check_if_rate4site_failed(vars['r4s_log']):

            exit_on_error('sys_error', "Both rate4site and rate4site slow failed.")

    extract_diversity_matrix_info(vars['r4s_log'])

def find_best_substitution_model():

    try:

        # convert fasta to phylip
        msa_phy_filepath = "input_msa.phy"
        #convert_msa_format(vars['msa_fasta'], "fasta", msa_phy_filepath, "phylip-relaxed")
        AlignIO.convert(vars['msa_fasta'], "fasta", msa_phy_filepath, "phylip-relaxed")
        os.chmod(vars['msa_fasta'], 0o644)
        os.chmod(msa_phy_filepath, 0o644)

        if form['DNA_AA'] == "Nuc":

            run_jmt(msa_phy_filepath)

        else:

            run_prottest(msa_phy_filepath)

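    except Exception:

        # fall back to a default matrix that matches the alphabet of the MSA
        default_model = "JC_Nuc" if form['DNA_AA'] == "Nuc" else "JTT"
        vars['best_fit'] = "model_search_failed"
        form['SUB_MATRIX'] = default_model
        print("The evolutionary model search has failed. The %s model is chosen by default." %default_model.split("_")[0])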


def run_prottest(msa_file_path):

    output_file_path = vars['job_name'] + "_model_selection.txt"
    cmd = "java -jar %s -log disabled -i %s -AICC -o %s -S 1 -JTT -LG -MtREV -Dayhoff -WAG -CpREV -threads 1" %(vars['prottest'], msa_file_path, output_file_path)
    submit_job_to_Q("prottest", cmd)
    LOG.write("run_prottest: %s\n" %cmd)

    # read the ProtTest report and pick the model named on the "Best model" line
    with open(output_file_path, 'r') as f:

        match = re.search(r"(?<=Best model according to AICc: ).*", f.read())

    if match:

        vars['best_fit'] = "model_found"
        model = match.group()
        model = model.strip('()')
        print("The best evolutionary model was selected to be: " + model)
        create_download_link(output_file_path, "See details")
        vars['zip_list'].append(output_file_path)

    else:

        vars['best_fit'] = "model_search_failed"
        model = "JTT"
        print("The evolutionary model search has failed. The JTT model is chosen by default.")

    form['SUB_MATRIX'] = model

def run_jmt(msa_file_path):

    JMT_JAR_FILE = GENERAL_CONSTANTS.JMODELTEST2
    output_file_path = "model_selection.txt"
    cmd = "java -Djava.awt.headless=true -jar %s -d %s -t BIONJ -AICc -f -o %s" %(JMT_JAR_FILE, msa_file_path, output_file_path)
    submit_job_to_Q("jmt", cmd)
    LOG.write("run_jmt: %s\n" %cmd)

    f = open(output_file_path, 'r')

    start_reading = False
    JMT_VALID_MODELS = ["JC","HKY","GTR"]

    # extract best model from table
    line = f.readline()
    while line != "":

        if start_reading:

            split_row = line.split()
            # an empty row has no first field, so guard before indexing
            model = split_row[0] if split_row else ""
            if model in JMT_VALID_MODELS:

                f.close()
                #model = model.strip('()')
                if model == "JC":

                    form['SUB_MATRIX'] = "JC_Nuc"

                else:

                    form['SUB_MATRIX'] = model

                vars['best_fit'] = "model_found"
                print("The best evolutionary model was selected to be: " + model)
                return

        elif re.search(r'Model             -lnL    K     AICc       delta       weight   cumWeight', line, re.M):

            start_reading = True

        line = f.readline()

    vars['best_fit'] = "model_search_failed"
    form['SUB_MATRIX'] = "JC_Nuc"
    print("The evolutionary model search has failed. The JC model is chosen by default")
    f.close()
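

# Illustrative sketch (not part of the original pipeline): the table scan done in
# run_jmt, written as a pure function over lines of text so it can be tried
# without running jModelTest. The name first_valid_model is hypothetical, and the
# header pattern is a loosened assumption; the real report layout may differ.
import re

def first_valid_model(lines, valid_models=("JC", "HKY", "GTR")):

    in_table = False
    for line in lines:

        if in_table:

            fields = line.split()
            # blank rows and rows naming other models are skipped
            if fields and fields[0] in valid_models:

                return fields[0]

        elif re.search(r"Model\s+-lnL\s+K\s+AICc", line):

            in_table = True

    # no header row, or no valid model after it
    return None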


def convert_msa_format(infile, infileformat, outfile, outfileformat):

    try:

        AlignIO.convert(infile, infileformat, outfile, outfileformat)

    except Exception:

        exit_on_error('sys_error', "convert_msa_format : exception")

def create_MSA():

    if form['MSAprogram'] == "CLUSTALW":

        cmd = "clustalw -infile=%s -outfile=%s" %(vars['FINAL_sequences'], vars['msa_clustal'])
        LOG.write("create_MSA : run %s\n" %cmd)
        submit_job_to_Q("clustalw", cmd)
        convert_msa_format(vars['msa_clustal'], "clustal", vars['msa_fasta'], "fasta")

    elif form['MSAprogram'] == "MAFFT":

        cmd = "mafft --localpair --maxiterate 1000 --quiet %s > %s" %(vars['FINAL_sequences'], vars['msa_fasta'])
        LOG.write("create_MSA : run %s\n" %cmd)
        submit_job_to_Q("MAFFT", cmd)
        #convert_msa_format(vars['msa_fasta'], "fasta", vars['msa_clustal'], "clustal")

    elif form['MSAprogram'] == "PRANK":

        cmd = "prank -d=%s -o=%s -F" %(vars['FINAL_sequences'], vars['msa_fasta'])
        print("Warning: PRANK is an accurate but slow MSA program, please be patient.")
        LOG.write("create_MSA : run %s\n" %cmd)
        submit_job_to_Q("PRANK", cmd)

        if os.path.exists(vars['msa_fasta'] + ".2.fas"):

            vars['msa_fasta'] += ".2.fas"

        elif os.path.exists(vars['msa_fasta'] + ".1.fas"):

            vars['msa_fasta'] += ".1.fas"

        elif os.path.exists(vars['msa_fasta'] + ".best.fas"):

            vars['msa_fasta'] +=  ".best.fas"

        #convert_msa_format(vars['msa_fasta'], "fasta", vars['msa_clustal'], "clustal")

    elif form['MSAprogram'] == "MUSCLE":

        #cmd = "muscle -align %s -output %s" %(vars['FINAL_sequences'], vars['msa_fasta'])
        cmd = "muscle -in %s -out %s -quiet" %(vars['FINAL_sequences'], vars['msa_fasta'])
        LOG.write("create_MSA : run %s\n" %cmd)
        submit_job_to_Q("MUSCLE", cmd)
        #convert_msa_format(vars['msa_clustal'], "clustal", vars['msa_fasta'], "fasta")

    else:

        exit_on_error('user_error', "Choose one of the programs for creating the msa: clustalw, mafft, prank or muscle.")

    if not os.path.exists(vars['msa_fasta']) or os.path.getsize(vars['msa_fasta']) == 0:

        exit_on_error('user_error', "The %s program failed to create the MSA. Choose a different program to create the MSA." %form['MSAprogram'])

def choose_final_homologoues(ref_to_seqs_hash, ref_to_cd_hash, max_num_homologs, witch_unifrom, output_file, rejected_file, num_rejected_homologs):

    LOG.write("choose_final_homologoues(%s ,%s , %f, %s, %s, %s, %d)\n" %("ref_to_seqs_hash", "ref_to_cd_hash", max_num_homologs, witch_unifrom, output_file, rejected_file, num_rejected_homologs))


    query_name = ""
    query_AAseq = ""
    counter = 1


    try:

        FINAL = open(vars['FINAL_sequences'], 'w')

    except Exception:

        exit_on_error('sys_error',"choose_final_homologoues : cannot open the file %s for writing" %vars['FINAL_sequences'])

    # we write the query sequence to the file of the final homologs
    FINAL.write(">%s\n%s\n" %(vars['query_string'], vars['protein_seq_string']))
    FINAL.flush() # flush the buffer so that the size measured below includes the query sequence

    final_file_size = os.path.getsize(vars['FINAL_sequences']) # take the size of the file before we add more sequences to it

    try:

        REJECTED = open(rejected_file, 'a')

    except:

        exit_on_error('sys_error', "Can't open '" + rejected_file + "' for writing.")

    # write query details
    if query_AAseq != "":

        FINAL.write(">%s\n%s\n" %(query_name, query_AAseq))

    size_cd_hit_hash = len(ref_to_cd_hash)
    uniform = 1
    jump = 1

    if witch_unifrom == "sample":

        uniform = int(size_cd_hit_hash / max_num_homologs)
        if uniform == 0:

            uniform = 1

    final_number_of_homologoues = 1
    REJECTED.write("\n\tSequences rejected because of the requirement to select only %d representative homologs\n\n" %(max_num_homologs + 1))
    # write homologs
    for s_name in sorted(ref_to_seqs_hash.keys(), key = ref_to_seqs_hash.get):

        # write next homolog
        if s_name in ref_to_cd_hash: # and 'SEQ' in ref_to_cd_hash[s_name]:

            if counter != jump or counter > max_num_homologs * uniform:



                counter += 1
                REJECTED.write("%d %s\n" %(num_rejected_homologs, s_name))
                num_rejected_homologs += 1
                continue

            final_number_of_homologoues += 1
            FINAL.write(">%s\n%s\n" %(s_name, ref_to_cd_hash[s_name]))
            counter += 1
            jump += uniform

    FINAL.close()
    REJECTED.close()

    vars['final_number_of_homologoues'] = final_number_of_homologoues
    # check that more sequences were added to the file
    if not final_file_size < os.path.getsize(vars['FINAL_sequences']):

        exit_on_error('sys_error', "choose_final_homologoues : the file " + vars['FINAL_sequences'] + " doesn't contain sequences")
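

# Illustrative sketch (not part of the original pipeline): which ranks of the sorted
# candidate list choose_final_homologoues keeps, expressed as a pure function.
# The name kept_ranks and its arguments are hypothetical. Ranks are 1-based and
# count only candidates that survived clustering, mirroring counter/jump above.
def kept_ranks(n_candidates, max_num_homologs, mode="best"):

    step = 1
    if mode == "sample":

        # same rule as above: spread the picks evenly, never with a step below 1
        step = max(1, int(n_candidates / max_num_homologs))

    last_rank = max_num_homologs * step
    return [rank for rank in range(1, n_candidates + 1)
            if (rank - 1) % step == 0 and rank <= last_rank]

# Example: with 10 candidates and room for 3, "sample" keeps ranks 1, 4 and 7,
# while "best" keeps the first three ranks.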

def cluster_homologoues(ref_cd_hit_hash):

    msg = ""
    LOG.write("cluster_homologoues : create_cd_hit_output(%s, %s, %f, %s, %s);\n" %(vars['HITS_fasta_file'], vars['cd_hit_out_file'], vars['hit_redundancy']/100, ref_cd_hit_hash, form['DNA_AA']))
    total_num_of_hits = create_cd_hit_output(vars['HITS_fasta_file'], vars['cd_hit_out_file'], vars['hit_redundancy']/100, ref_cd_hit_hash, form['DNA_AA'])

    if form['MAX_NUM_HOMOL'].upper() == 'ALL':

        form['MAX_NUM_HOMOL'] = total_num_of_hits

    if total_num_of_hits < vars['min_num_of_hits']: # less seqs than the minimum: exit

        if total_num_of_hits <= 1:

            msg = "There is only 1 "

        else:

            msg = "There are only %d " %total_num_of_hits

        msg += "unique hits. The minimal number of sequences required for the calculation is %d. You may try to: " %vars['min_num_of_hits']
        msg += "Re-run the server with a multiple sequence alignment file of your own. Increase the Evalue. Decrease the Minimal %ID For Homologs"

        if int(form['ITERATIONS']) < 5:

            msg += " Increase the number of " + form['Homolog_search_algorithm'] + " iterations."

        msg += "\n"
        exit_on_error('user_error',msg)

    elif total_num_of_hits + 1 < vars['low_num_of_hits']: # less seqs than 10 : output a warning.

        msg = "Warning: There are "

        if total_num_of_hits + 1 < vars['number_of_homologoues_before_cd-hit']: # because we will add the query sequence itself to all the unique sequences.

            msg += "%d hits, only %d of them are" %(vars['number_of_homologoues_before_cd-hit'], total_num_of_hits+1)

        else:

            msg += str(total_num_of_hits + 1)

        msg += " unique sequences. The calculation is performed on the %d unique sequences, but it is recommended to run the program with a multiple sequence alignment file containing at least %s sequences." %(total_num_of_hits + 1, vars['low_num_of_hits'])

    else:

        msg = "There are %d %s hits. %d of them are unique, including the query. The calculation is performed on " %(vars['number_of_homologoues_before_cd-hit'], form['Homolog_search_algorithm'], total_num_of_hits + 1)

        if total_num_of_hits <= int(form['MAX_NUM_HOMOL']):

            msg += "%d unique sequences." %(total_num_of_hits + 1)

        elif form['best_uniform_sequences'] == "best":

            msg += "the %s sequences closest to the query (with the lowest E-value)." %form['MAX_NUM_HOMOL']

        else:

            msg += "a sample of %s sequences that represent the list of homologues to the query." %form['MAX_NUM_HOMOL']
            #msg += "a sample of %s <a href=\"<?=$orig_path?>/%s\" style=\"color: #400080; text-decoration:underline;\">sequences</a> that represent the list of homologues to the query." %(form['MAX_NUM_HOMOL'], vars['FINAL_sequences_html'])

    print(msg)

    #if os.path.exists(vars['HITS_rejected_file']) and os.path.getsize(vars['HITS_rejected_file']) != 0:

        #print_message_to_output("Here is the <a href=\"<?=$orig_path?>/" + vars['HITS_rejected_file'] + "\" TARGET=Rejected_Seqs style=\"color: #400080; text-decoration:underline;\">list of sequences</a> that produced significant alignments, but were not chosen as hits.")
        #print_message_to_output("<a href=\"<?=$orig_path?>/" + vars['HITS_rejected_file'] + "\" TARGET=Rejected_Seqs style=\"color: #400080; text-decoration:underline;\">Click here</a> if you wish to view the list of sequences which produced significant alignments, but were not chosen as hits.")

    return(total_num_of_hits + 1)



def submit_job_to_Q(job_name_prefix, cmd):

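    # run the command through the shell (it may contain redirections) inside the working directory
    subprocess.run(cmd, shell=True, cwd=vars['working_dir'])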


def compare_atom_seqres_or_msa(what_to_compare):

    # in case there is a msa and pdb, we check the similarity between the atom and the msa sequences
    # in case there is no msa and there are both atom and seqres sequences, we check the similarity between them

    pairwise_aln = "PDB_" + what_to_compare + ".aln"
    atom_length = len(vars['ATOM_without_X_seq'])
    alignment_score = 0
    other_query_length = len(vars['protein_seq_string'])
    query_line = {}
    atom_line = "sequence extracted from the ATOM field of the PDB file"
    query_line['SEQRES'] = "sequence extracted from the SEQRES field of the PDB file"
    query_line['MSA'] = "sequence extracted from the MSA file"

    # compare the length of sequences. output a message accordingly
    if other_query_length != 0 and other_query_length < atom_length:

        print("The %s is shorter than the %s. The %s sequence has %d residues and the ATOM sequence has %d residues. The calculation continues nevertheless." %(query_line[what_to_compare],atom_line ,what_to_compare, other_query_length, atom_length))

    if atom_length < other_query_length:

        if atom_length < other_query_length * 0.2:

            print("Warning: The %s is significantly shorter than the %s. The %s sequence has %d residues and the ATOM sequence has only %d residues. The calculation continues nevertheless." %(atom_line, query_line[what_to_compare], what_to_compare, other_query_length, atom_length))

        else:

            print("The %s is shorter than the %s. The %s sequence has %d residues and the ATOM sequence has %d residues. The calculation continues nevertheless." %(atom_line, query_line[what_to_compare], what_to_compare, other_query_length, atom_length))

    # match the sequences
    LOG.write("compare_atom_seqres_or_msa : Align ATOM and " + what_to_compare + " sequences\n")
    [vars['seqres_or_msa_seq_with_gaps'], vars['ATOM_seq_with_gaps'], alignment_score] = pairwise_alignment(vars['protein_seq_string'], vars['ATOM_without_X_seq'], pairwise_aln, what_to_compare)

    if alignment_score < 100:

        if alignment_score < 30:

            exit_on_error('user_error',"The Score of the alignment between the %s and the %s is ONLY %d%% identity. See the pairwise alignment in the file %s." %(query_line[what_to_compare], atom_line, alignment_score, pairwise_aln))

        else:

            print("The Score of the alignment between the %s and the %s is %d%% identity. The calculation continues nevertheless." %(query_line[what_to_compare], atom_line, alignment_score))


def exit_on_error(which_error, error_msg):

    complete_msg = "\n\nEXIT on error:\n\n" + error_msg + "\n"
    LOG.write(complete_msg)
    print(complete_msg)
    raise Exception("The error is not an exception")



def analyse_seqres_atom():

    # there is no ATOM field in the PDB

    if vars['ATOM_without_X_seq'] == "":

        exit_on_error('user_error', "There is no ATOM derived information in the PDB file. Please refer to the OVERVIEW for detailed information about the PDB format.")

    # there is no SEQRES field in the PDB

    if vars['SEQRES_seq'] == "":

        msg = "Warning: There is no SEQRES derived information in the PDB file. The calculation will be based on the ATOM derived sequence. "

        if vars['running_mode'] == "_mode_pdb_no_msa":

            msg += "If this sequence is incomplete, we recommend to re-run the server using an external multiple sequence alignment file, which is based on the complete protein sequence."

        LOG.write("analyse_seqres_atom : There is no SEQRES derived information in the PDB file.\n")
        print(msg)

    if form['DNA_AA'] == "AA":

        # check if seqres contains nucleic acid
        if vars['pdb_object'].get_type() == "Nuc":

            exit_on_error('user_error', "The selected chain: " + form['PDB_chain'] + " contains nucleic acid, and you have selected amino acid")

    else:

        # check if seqres contains amino acid
        #type_SEQRES = vars['pdb_object'].get_type_SEQRES()
        #if form['PDB_chain'] in type_SEQRES and type_SEQRES[form['PDB_chain']] == "AA":
        if vars['pdb_object'].get_type() == "AA":

            exit_on_error('user_error', "The selected chain: " + form['PDB_chain'] + " contains amino acid, and you have selected nucleic acid")

    # if modified residues exists, print them to the screen

    MAXIMUM_MODIFIED_PERCENT = 0.15
    MODIFIED_COUNT = vars['pdb_object'].get_MODIFIED_COUNT()
    if MODIFIED_COUNT > 0:

        if form['DNA_AA'] == "AA":

            if len(vars['SEQRES_seq']) > 0 and MODIFIED_COUNT / len(vars['SEQRES_seq']) > MAXIMUM_MODIFIED_PERCENT:

                LOG.write("MODIFIED_COUNT %d\nSEQRES_seq %s\n" %(MODIFIED_COUNT, vars['SEQRES_seq']))
                exit_on_error('user_error', "Too many modified residues were found in SEQRES field; %0.3f%% of the residues are modified, the maximum is %0.3f%%." %(100 * MODIFIED_COUNT / len(vars['SEQRES_seq']), 100 * MAXIMUM_MODIFIED_PERCENT))

            LOG.write("analyse_seqres_atom : modified residues found\n")
            print("Please note: Before the analysis took place, modified residues read from SEQRES field were converted back to the original residues:\n" + vars['pdb_object'].get_MODIFIED_LIST() + ".")

        else:

            LOG.write("analyse_seqres_atom : modified residues found\n")
            print("Please note: Before the analysis took place, modified nucleotides read from SEQRES field were converted back to the original nucleotides:\n" + vars['pdb_object'].get_MODIFIED_LIST() + ".")


def get_seqres_atom_seq(PDB_Obj, query_chain, pdb_file_name, model = False):

    # extract the sequences from the pdb

    seqres = PDB_Obj.get_SEQRES() # seqres sequence
    atom = PDB_Obj.get_ATOM() # atom sequence with gaps filled with X for amino acids and N for nucleic acids. This is only used to find the residues' place in the pdb
    atom_without_X = PDB_Obj.get_ATOM_withoutX()[query_chain] # atom sequence with gaps not filled

    if seqres == "" and atom == "":

        if model:

            return('user_error', "The protein sequence for chain '%s' was not found in SEQRES nor ATOM fields in the <a href=\"%s\">PDB file</a>." %(query_chain, pdb_file_name), "1")

        else:

            exit_on_error('user_error', "The protein sequence for chain '%s' was not found in SEQRES nor ATOM fields in the PDB file." %query_chain)

    return [seqres, atom, atom_without_X]




vars['run_log'] = "log.txt"
form['Homolog_search_algorithm'] = "MMseqs2"
form['DNA_AA'] = "AA"


LOG = open(vars['run_log'], 'w')

vars['BLAST_out_file'] = "%s/%s_all/uniref.a3m" %(vars['working_dir'], vars['job_name'])



vars['Msa_percentageFILE'] = vars['job_name'] + "_msa_aa_variety_percentage.csv"


vars['All_Outputs_Zip'] = "Consurf_Outputs_" + vars['job_name'] + ".zip"


vars['hit_min_length'] = 0.60 # minimum length of homologs
vars['min_num_of_hits'] = 5 # minimum number of homologs
vars['FINAL_sequences'] = "query_final_homolougs.fasta" # final homologs for creating the MSA
vars['FINAL_sequences_html'] = vars['job_name'] + "_final_homolougs.html" # html file showing the final homologs to the user
vars['r4s_log'] = "r4s.log" # rate4site log
vars['r4s_out'] = "r4s.res" # rate4site output
vars['r4s_slow_log'] = "r4s_slow.log" # rate4site slow log
vars['gradesPE'] = vars['job_name'] + "_consurf_grades.txt" # file with consurf output
vars['zip_list'] = []
vars['date'] = date.today().strftime("%d/%m/%Y")
vars['color_array'] = {1 : "#0A7D82", 2 : "#4BAFBE", 3 : "#A5DCE6", 4 : "#D7F0F0", 5 : "#FFFFFF", 6 : "#FAEBF5", 7 : "#FAC8DC", 8 : "#F07DAA", 9 : "#A0285F", 'ISD' : "#FFFF96"}
vars['color_array_CBS'] = {1 : "#0F5A23", 2 : "#5AAF5F", 3 : "#A5DCA0", 4 : "#D7F0D2", 5 : "#FFFFFF", 6 : "#E6D2E6", 7 : "#C3A5CD", 8 : "#9B6EAA", 9 : "#782882", 'ISD' : "#FFFF96"}

vars['Colored_Seq_PDF'] = vars['job_name'] + "_consurf_colored_seq.pdf"
vars['Colored_Seq_CBS_PDF'] = vars['job_name'] + "_consurf_colored_seq_CBS.pdf"

vars['msa_clustal'] = "msa_clustal.aln" # if the file is not in clustal format, we create a clustal copy of it

vars['protein_seq'] = "protein_seq.fas" # a fasta file with the protein sequence from PDB or from protein seq input

vars['gradesPE_Output'] = [] # an array to hold all the information that should be printed to gradesPE
# in each array's cell there is a hash for each line from r4s.res.
# POS: position of that aa in the sequence ; SEQ : aa in one letter ;
# GRADE : the given grade from r4s output ; COLOR : grade according to consurf's scale



try:

    vars['zip_list'].append(vars['tree_file'])
    vars['zip_list'].append(vars['gradesPE'])
    vars['zip_list'].append(vars['Msa_percentageFILE'])
    vars['zip_list'].append(vars['Colored_Seq_PDF'])
    vars['zip_list'].append(vars['Colored_Seq_CBS_PDF'])
    vars['zip_list'].append(vars['msa_fasta'])
    vars['zip_list'].append(vars['pymol'])
    vars['zip_list'].append(vars['pymol_CBS'])
    vars['zip_list'].append(vars['FINAL_sequences_html'])

    ## mode : include pdb

    # create a pdbParser, to get various info from the pdb file
    if vars['running_mode'] == "_mode_pdb_no_msa" or vars['running_mode'] == "_mode_pdb_msa" or vars['running_mode'] == "_mode_pdb_msa_tree":

        upload_PDB()
        extract_data_from_model()


    ## mode : only protein sequence

    # if there is only protein sequence: we upload it.
    elif vars['running_mode'] == "_mode_no_pdb_no_msa":

        upload_sequence()
        #upload_protein_sequence()

    ## mode : no msa - with PDB or without PDB

    if vars['running_mode'] == "_mode_pdb_no_msa" or vars['running_mode'] == "_mode_no_pdb_no_msa":

        create_MSA_parameters()
        no_MSA()

    ## mode : include msa

    elif vars['running_mode'] == "_mode_pdb_msa" or vars['running_mode'] == "_mode_msa" or vars['running_mode'] == "_mode_pdb_msa_tree" or vars['running_mode'] == "_mode_msa_tree":

        upload_MSA()
        extract_data_from_MSA()

    if form['SUB_MATRIX'] == "BEST":

        #vars['best_fit'] = True
        find_best_substitution_model()

    else:

        vars['best_fit'] = "model_chosen"

    run_rate4site()
    assign_colors_according_to_r4s_layers()
    write_MSA_percentage_file()

    ## mode : include pdb

    if vars['running_mode'] == "_mode_pdb_no_msa" or vars['running_mode'] == "_mode_pdb_msa" or vars['running_mode'] == "_mode_pdb_msa_tree":

        consurf_create_output()

    ## mode : ConSeq - NO PDB

    if vars['running_mode'] == "_mode_msa" or vars['running_mode'] == "_mode_no_pdb_no_msa" or vars['running_mode'] == "_mode_msa_tree":

        conseq_create_output()

    zip_all_outputs()
    create_download_link(vars['msa_fasta'], "Download Multiple Sequence Alignment in FASTA format")
    create_download_link(vars['gradesPE'], "Download Per-Residue Details")
    create_download_link(vars['Msa_percentageFILE'], "Download csv file showing residue variety per position in the MSA ")
    create_download_link(vars['tree_file'], "Download Phylogenetic Tree")

    ## Arrange The HTML Output File
    print("The calculation is done.")
    LOG.close()

except Exception as e:

    if str(e) != "The error is not an exception":

        LOG.write(traceback.format_exc())
        LOG.close()
        raise(e)

    else:

        LOG.close()







Do you have a PDB/uniprot ID? (Y/N):
Y
Please enter your ID:
P54132

The PDB has only one chain A

Please upload your MSA.


Streaming output truncated to the last 5000 lines.
14456. A0A0D2PS75_9AGAR/71-432
14457. SRR5262249_18766031/13-67
14458. A0A250XP69_9CHLO/429-551
14459. ERR1719323_1242782/71-229
14460. SRR5256885_13416807/41-125
14461. ERR1719489_182738/345-468
14462. SRR5688572_31117669/366-479
14463. A0A0K9NSY9_ZOSMR/278-611
14464. SRR2546423_381812/78-154
14465. ERR1740116_116240/534-743
14466. A0A0G1DZV3_9BACT/148-359
14467. SRR4051794_4692298/346-481
14468. SRR5882724_302418/339-456
14469. GraSoiStandDraft_45_1057281.scaffolds.fasta_scaffold4835035_1/181-382
14470. GraSoiStandDraft_45_1057281.scaffolds.fasta_scaffold4835035_1/478-544
14471. A0A0G4I1M6_9ALVE/316-645
14472. ERR1719322_136041/81-128
14473. ERR1719357_1402821/488-604
14474. ERR1719247_22510/243-569
14475. A0A0N4T593_BRUPA/186-536
14476. ERR1719341_995786/17-137
14477. SRR5579883_88560/31-358
14478. C4A095_BRAFL/1-185
14479. E6QTP4_9ZZZZ/27-360
14480. A0A2E1AYU0_9EURY/27-346
14481. A0A0E0CBZ2_9ORYZ/282-616
14482. SRR398


BertModel LOAD REPORT from: Rostlab/prot_bert
Key                                        | Status     |  | 
-------------------------------------------+------------+--+-
cls.predictions.transform.dense.weight     | UNEXPECTED |  | 
cls.seq_relationship.weight                | UNEXPECTED |  | 
cls.predictions.transform.LayerNorm.weight | UNEXPECTED |  | 
cls.predictions.transform.LayerNorm.bias   | UNEXPECTED |  | 
cls.predictions.decoder.bias               | UNEXPECTED |  | 
cls.predictions.bias                       | UNEXPECTED |  | 
cls.predictions.decoder.weight             | UNEXPECTED |  | 
cls.predictions.transform.dense.bias       | UNEXPECTED |  | 
cls.seq_relationship.bias                  | UNEXPECTED |  | 

Notes:
- UNEXPECTED	:can be ignored when loading from different task/architecture; not ok if you expect identical arch.


Success! Populated gradesPE_Output with 1417 scores.
------------------------------------------------------------------------------------------------------------------------
Regular View


0
1          11         21         31         41
MAAVPQNNLQ EQLERHSART LNNKLSLSKP KFSGFTFKKK TSSDNNVSVT
51         61         71         81         91
NVSVAKTPVL RNKDVNVTED FSFSEPLPNT TNQQRVKDFF KNAPAGQETQ
101        111        121        131        141
RGGSKSLLPD FLQTPKEVVC TTQNTPTVKK SRDTALKKLE FSSSPDSLST
151        161        171        181        191
INDWDDMDDF DTSETSKSFV TPPQSHFVRV STAQKSKKGK RNFFKAQLYT
201        211        221        231        241
TNTVKTDLPP PSSESEQIDL TEEQKDDSEW LSSDVICIDD GPIAEVHINE
251        261        271        281        291
DAQESDSLKT HLEDERDNSE KKKNLEEAEL HSTEKVPCIE FDDDDYDTDF
301        311        321        331        341
VPPSPEEIIS ASSSSSKCLS TLKDLDTSDR KEDVLSTSKD LLSKPEKMSM
351        361        371        381        391
QELNPETSTD CDARQISLQQ QLIHVMEHIC KLIDTIPDDK LKLLDCGNEL
401        411        421        431        441
LQQRNIRRKL LTEVDFNKSD ASLLGSLWRY RPDSLDGPME GDSCPTGNSM
451        461        471        481        491
KELNFSHLPS NSVSPGDCLL TTTLGKTGFS ATRKNLFERP LFNTHLQKSF


------------------------------------------------------------------------------------------------------------------------
Color Blind View


0
1 11 21 31 41 M A A V P Q N N L Q E Q L E R H S A R T L N N K L S L S K P K F S G F T F K K K T S S D N N V S V T
51 61 71 81 91 N V S V A K T P V L R N K D V N V T E D F S F S E P L P N T T N Q Q R V K D F F K N A P A G Q E T Q
101 111 121 131 141 R G G S K S L L P D F L Q T P K E V V C T T Q N T P T V K K S R D T A L K K L E F S S S P D S L S T
151 161 171 181 191 I N D W D D M D D F D T S E T S K S F V T P P Q S H F V R V S T A Q K S K K G K R N F F K A Q L Y T
201 211 221 231 241 T N T V K T D L P P P S S E S E Q I D L T E E Q K D D S E W L S S D V I C I D D G P I A E V H I N E
251 261 271 281 291 D A Q E S D S L K T H L E D E R D N S E K K K N L E E A E L H S T E K V P C I E F D D D D Y D T D F
301 311 321 331 341 V P P S P E E I I S A S S S S S K C L S T L K D L D T S D R K E D V L S T S K D L L S K P E K M S M
351 361 371 381 391 Q E L N P E T S T D C D A R Q I S L Q Q Q L I H V M E H I C K L I D T I P D D K L K L L D C G N E L
401 411 421 431 441 L Q Q R N I R R K L L T E V D F N K S D A S L L G S L W R Y R P D S L D G P M E G D S C P T G N S M
451 461 471 481 491 K E L N F S H L P S N S V S P G D C L L T T T L G K T G F S A T R K N L F E R P L F N T H L Q K S F


------------------------------------------------------------------------------------------------------------------------
To create a PyMOL image colored with the ConSurf grades:


Download the PDB file, which contains ConSurf's color grades.
Download the coloring script: pymol.py, or pymol_CBS.py for color blind results.


1) Start the PyMOL program.
2) Drag the PDB file to the PyMOL window.
3) Drag the PyMOL coloring script to the window.
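
Before opening PyMOL, it can help to check that the downloaded PDB file really carries values in the B-factor column, which is where the conservation values are stored. The sketch below reads that column from the ATOM records using the standard fixed-width PDB layout. The function names `make_atom_line` and `read_bfactor_column` and the sample values are hypothetical and are not part of ConSurf; with a real file, pass its text instead of `sample`.

```python
def make_atom_line(serial, atom, resname, chain, resseq, bfactor):
    """Build one fixed-width PDB ATOM record (coordinates are dummy values)."""
    return (
        "ATOM  {:5d} {:<4s}{:1s}{:>3s} {:1s}{:4d}{:1s}   "
        "{:8.3f}{:8.3f}{:8.3f}{:6.2f}{:6.2f}"
    ).format(serial, atom, "", resname, chain, resseq, "", 0.0, 0.0, 0.0, 1.0, bfactor)


def read_bfactor_column(pdb_text):
    """Return {(chain, residue_number): b_factor}, taking the first atom of each residue."""
    values = {}
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM"):
            continue
        chain = line[21]               # column 22: chain identifier
        resseq = int(line[22:26])      # columns 23-26: residue sequence number
        bfactor = float(line[60:66])   # columns 61-66: temperature (B) factor
        values.setdefault((chain, resseq), bfactor)
    return values


# Hypothetical three-atom sample; the B-factor values are made up for illustration.
sample = "\n".join([
    make_atom_line(1, " N", "MET", "A", 1, 9.0),
    make_atom_line(2, " CA", "MET", "A", 1, 9.0),
    make_atom_line(3, " N", "ALA", "A", 2, 1.0),
])
grades = read_bfactor_column(sample)
print(grades)
```

The sketch handles only the .pdb variant; the .cif files use a different, non-fixed-width layout and would need a separate parser.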



The calculation is done.

