# AlphaFold models analysis: main program

## Description of the materials and program

### Introduction

<div style="font-family: Arial, sans-serif; line-height: 1.5; text-align: justify;">
   
This Jupyter notebook analyses the complexes generated by different versions of AlphaFold. There are four main versions of AlphaFold available:

- AlphaFold2-Multimer v1 (v1).

- AlphaFold2-Multimer v2 (v2).

- AlphaFold2-Multimer v3 (v3).

- AlphaFold 3 (AF3).

</div>

### Description of the files and folders of AlphaFold2-Multimer

#### Complex folders


<div style="font-family: Arial, sans-serif; line-height: 1.5; text-align: justify;">
Each complex has its own folder, which stores the rest of the files. The folder is named after the complex: the identifier of the crystal structure in the PDB, followed by the chains used to build the complex.

#### PDB files

<div style="font-family: Arial, sans-serif; line-height: 1; text-align: justify;">

PDB files contain the protein structures. The names of the PDB files generated by AlphaFold-Multimer are composed of: complex, state, rank, AlphaFold version, model and recycle (except for crystals, "ranked" PDBs and Seed_0 PDBs).<br><br>

- Complex: the identifier of the complex in the PDB, composed of letters and numbers.<br><br>
- States:
  - unrelaxed: raw structures produced by AlphaFold during its iterative (recycling) process.
  
  - relaxed: the structure from the last recycle, relaxed with the AMBER force field in OpenMM. <br><br>

- Version: the AlphaFold version used (e.g. "alphafold2_multimer_v3").<br><br>


- Model:

  - Models in AlphaFold2: five predictions are generated from the same seed; they are named "model_" followed by a number.
  

    - "ranked_" followed by a number: the position of the relaxed model in the ranking based on the scores assigned by AlphaFold. These files are named only "ranked_" plus the number, with no other data.


    - "pred_" followed by a number: identifies a model generated from the same seed, with minor differences.<br><br>
    
  
  - Models in the AM versions (v1, v2, v3, v3_short): five models are generated, and the structure of each one is refined iteratively until the tol value falls below a threshold (at which point AlphaFold stops modelling) or until recycle 20 is reached.<br><br>

- Recycle: only present in non-AlphaFold2 predictions (for now).
  
  - "r" followed by a number (e.g. "_r9"): indicates the recycle of the model.


  - "Seed_0": the final structure, identical to the last recycle (at most recycle 20); it is the structure that will be relaxed.<br><br>
  
- Rank followed by a number (e.g. "rank_001"): indicates which of the five generated models is best, according to the highest score obtained in the last recycle.<br><br>


- Examples of names:


  - unrelaxed_rank_001_alphafold2_multimer_v2_model_4_seed_000_r9.pdb (standard name in AM versions).


  - relaxed_model_4_multimer_v2_pred_1.pdb (standard name in AlphaFold2 versions).


  - 3BT1.pdb (crystal).


  - ranked_0.pdb (relaxed and ranked in AlphaFold2).

  
  - unrelaxed_rank_001_alphafold2_multimer_v3_model_2_seed_000_r0 (Seed_0 example).

   
</div>
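The naming rules above can be checked with a small parser. This is a minimal sketch, not the code used in the notebook: it is a simplified and slightly stricter version of the regular expressions in section 3.2.2, the helper `parse_model_name` is hypothetical, and the example names are the ones listed above.

```python
import re

# Simplified versions of the patterns used in section 3.2.2 (illustrative)
PATTERNS = {
    "state": re.compile(r"unrelaxed|relaxed"),
    "rank": re.compile(r"rank_\d+|pred_\d+|ranked_\d+"),
    "version": re.compile(r"((?:deepfold|alphafold2_multimer)_v\d+)_model"),
    "model": re.compile(r"model_(\d+)"),
    "seed": re.compile(r"seed_(\d+)"),
    "recycle": re.compile(r"_r(\d+)$"),
}

def parse_model_name(filename):
    """Split a model file name into its components (hypothetical helper)."""
    stem = filename[:-4] if filename.endswith(".pdb") else filename
    info = {}
    for field, pattern in PATTERNS.items():
        match = pattern.search(stem)
        if match is None:
            info[field] = None
        else:
            # Use the captured group when the pattern has one, else the whole match
            info[field] = match.group(1) if pattern.groups else match.group(0)
    return info

print(parse_model_name("unrelaxed_rank_001_alphafold2_multimer_v2_model_4_seed_000_r9.pdb"))
# {'state': 'unrelaxed', 'rank': 'rank_001', 'version': 'alphafold2_multimer_v2', 'model': '4', 'seed': '000', 'recycle': '9'}
```

A name without an `_r` suffix returns `recycle=None`; in section 3.2.2 such entries are labelled `Seed_0`.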


#### Log.txt files

<div style="font-family: Arial, sans-serif; line-height: 1; text-align: justify;">
   
It gathers information about the execution of AlphaFold. The most relevant information is:

- Timestamps: each entry starts with a timestamp indicating when the event occurred, in the format "YYYY-MM-DD HH:MM:SS,sss" (Year, Month, Day, Hour, Minute, Second, Milliseconds).

- Information about the software: the first entries give the software version (ColabFold 1.5.2).

- Recycle iterations: the log then describes the iterative process of protein structure prediction, reporting metrics such as "pLDDT", "pTM", "ipTM" and "tol" for each recycle step.

- Model ranking: The final section ranks the models based on the "multimer" metric, and it mentions the relaxation times for each model.
</div>
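A minimal sketch of how the per-recycle metrics can be read from a single log.txt entry. The log line below is hypothetical: it contains only the timestamp and the `key=value` fields that the regular expressions in section 3.3 look for, and a real ColabFold log may be laid out differently. `parse_log_line` is an illustrative name.

```python
import re
from datetime import datetime

# Hypothetical log line with the fields searched for in section 3.3
LINE = ("2023-10-10 12:00:00,123 alphafold2_multimer_v3_model_1_seed_000 "
        "recycle=3 pLDDT=78.5 pTM=0.712 ipTM=0.654 tol=0.42")

TIMESTAMP = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})")
# \b stops "pTM" from matching inside "ipTM"
METRICS = re.compile(r"\b(recycle|pLDDT|pTM|ipTM|tol)=([\d.]+)")

def parse_log_line(line):
    """Return the timestamp and the per-recycle metrics found in one log line."""
    result = {}
    stamp = TIMESTAMP.match(line)
    if stamp:
        # "YYYY-MM-DD HH:MM:SS,sss": %f accepts the three millisecond digits
        result["time"] = datetime.strptime(stamp.group(1), "%Y-%m-%d %H:%M:%S,%f")
    for key, value in METRICS.findall(line):
        result[key] = int(value) if key == "recycle" else float(value)
    return result

print(parse_log_line(LINE))
```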

### Description of the files and folders of AlphaFold 3

#### JSON

Full-data JSON: detailed information about each residue.

Job request JSON: the job submitted to the AF3 server. It contains the name of the job (usually the modelled complex), the assigned (random) seed and the sequences of the molecules. Uploading this job to the AF3 server again reproduces the same results.

Summary confidences JSON: overall quality of the structure. Its main fields are:

 - "fraction_disordered": fraction of the structure predicted to be disordered; disordered regions are defined in the AF3 supplementary information.

 - "has_clash": flag (0 or 1) indicating whether the structure has significant steric clashes.

 - "iptm": interface predicted TM-score, calculated as in AF2-Multimer.

 - "num_recycles": number of recycles performed by the Pairformer (for more information, see https://elanapearl.github.io/blog/2024/the-illustrated-alphafold/).

 - "ptm": predicted TM-score, calculated as in AF2-Multimer.
 
 - "ranking_score": new AF3 score that combines iptm, ptm, fraction_disordered and has_clash to avoid hallucinations: 0.8 · ipTM + 0.2 · pTM + 0.5 · disorder − 100 · has_clash.

CIF models: the predicted structures in mmCIF format, similar to PDB files; they can be viewed with programs such as ChimeraX or PyMOL.
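The ranking_score formula can be recomputed from a summary-confidences file. The sketch below assumes the JSON keys listed above (`iptm`, `ptm`, `fraction_disordered`, `has_clash`); the values are made up and `ranking_score` is a hypothetical helper, so small differences from the value reported by the AF3 server (for example from rounding) are possible.

```python
import json

def ranking_score(summary):
    """AF3 ranking score: 0.8*ipTM + 0.2*pTM + 0.5*disorder - 100*has_clash."""
    return (0.8 * summary["iptm"]
            + 0.2 * summary["ptm"]
            + 0.5 * summary["fraction_disordered"]
            - 100 * summary["has_clash"])

# Made-up summary confidences (not from a real AF3 job)
raw = '{"iptm": 0.8, "ptm": 0.75, "fraction_disordered": 0.1, "has_clash": 0.0, "num_recycles": 10}'
summary = json.loads(raw)
print(round(ranking_score(summary), 3))  # prints 0.84
```

With has_clash = 1 the −100 term dominates, so any clashing model drops to the bottom of the ranking regardless of its ipTM.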

### Description of the program

<div style="font-family: Arial, sans-serif; line-height: 1.5; text-align: justify;">
   
The analysis in the main program is divided into 5 sections:

1. Libraries and initial values: loads the libraries that are needed.
2. Paths and selected molecules: selection of the target and the p_type.
3. Construction of the dataframes: the information in the .ene files (pyDock energies) and in the log.txt files is extracted into dataframes.
4. Final fusion and adjustments: the pyDock and log dataframes are merged into a single dataframe.
5. Ranking: the Z-scores of the model confidence, pLDDT and pyDock energies are calculated, and the models of each complex are ranked.

There are two classes of folders: some contain the PDB files from AlphaFold2 and the others were obtained with AlphaFold-Multimer (AM). They differ in how the model confidence is stored: AlphaFold2 folders store it in JSON files, whereas AM folders store it in log.txt. Therefore, the data have to be gathered differently for each class.
</div>
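In the code, folders are told apart by their names: folders from the AF3 server start with `fold`, and the complex is the second `_`-separated field; otherwise the first four characters (the PDB identifier) are used, as in sections 3.2 and 3.3. The helper below is a hypothetical sketch of that rule; the labels it returns and the example folder names are made up.

```python
def complex_from_folder(folder):
    """Return (source, complex_id) for a model folder name (hypothetical helper)."""
    if folder.startswith("fold"):
        # AF3 server job folder, e.g. "fold_3bt1_ab": the complex is the 2nd field
        return "AF3", folder.split("_")[1].upper()
    # AlphaFold2-Multimer folder: PDB identifier followed by the chains
    return "AF2-Multimer", folder[0:4]

for name in ["fold_3bt1_ab", "3BT1_AB"]:
    print(name, complex_from_folder(name))
```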


## Main program

### 1. Libraries

In [None]:
# File management
import os, zipfile 
import re 
import shutil

# Data management
import pandas as pd # used to manage dataframes
import numpy as np
from itertools import product
from Bio import PDB
from Bio.PDB import MMCIFParser, PDBIO, DSSP, NeighborSearch, Superimposer, PDBParser
from Bio.Align import PairwiseAligner
from scipy.spatial.transform import Rotation as R
from concurrent.futures import ProcessPoolExecutor, as_completed
import warnings

# Subprocess, used to call bash and run external programs such as pyDock4
import subprocess

### 2. Paths and selected molecules

Selection of the paths and molecules

In [None]:
# Directories
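main_folder="" # Path of the main project folder (contains the COMPLEX and CSV folders)

directory=f"{main_folder}/COMPLEX" # Folder with one subfolder per complex
directory_csv= f"{main_folder}/CSV/" # Folder that gathers the outputs; the trailing "/" is needed because some file names are appended directly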

print(directory)

Listing the available PDB files to check that the modelling process was completed correctly

In [None]:
# Create the output folder if it does not exist
os.makedirs(directory_csv, exist_ok=True)

# Folders of all models
carpetas = [nombre for nombre in os.listdir(directory) if os.path.isdir(os.path.join(directory, nombre))]

# PDB files of each folder: models (names ending in a number) and the crystal (<PDB id>.pdb)
archivos_pdb=[]
patron_modelos = r'(.*(\d+)\.pdb$)'
#patron_modelos = r'fold_t\d+_b[a-z]+_a_\d+_model_\d+_supeimp\.pdb' # T272
datos_carpeta={} # number of PDB files per folder
for carpeta in carpetas:
    # The pattern is rebuilt for each folder so that crystal names do not accumulate
    patron = "(" + re.escape(carpeta[0:4] + ".pdb") + ")|" + patron_modelos
    direccion = os.path.join(directory, carpeta)
    pdbs=[os.path.abspath(os.path.join(direccion, archivo)) for archivo in os.listdir(direccion) if re.match(patron, archivo)]
    datos_carpeta[carpeta] = len(pdbs)
    archivos_pdb.extend(pdbs)

In [None]:
carpetas

In [None]:
datos_carpeta

In [None]:
def numero_pdbs_by_dir(directory):
    """Print the number of PDB files in each complex folder and return the total."""
    n_archivos=0
    for x, carpeta in enumerate(carpetas, start=1):
        # Access each folder and list its .pdb files (models and crystal structure)
        direccion = os.path.join(directory, carpeta)
        patron_carpeta = "(" + re.escape(carpeta[0:4] + ".pdb") + r")|(.*(\d+)\.pdb$)"
        archivos_pdb = [archivo for archivo in os.listdir(direccion) if re.match(patron_carpeta, archivo)]
        print(x,carpeta,len(archivos_pdb))
        n_archivos += len(archivos_pdb)
    return n_archivos

print(numero_pdbs_by_dir(directory))

### 3. Data Frame creation

#### 3.1 Description of the dataframe

<div style="font-family: Arial, sans-serif; line-height: 1.5; text-align: justify;">

The dataframe built in the following steps contains these columns:

Data derived from the model's name
- **Name**: Name of the model file. Used to identify and merge data from different datasets.
- **PATH**: File path of the model. Stores the location of each PDB file for further input/output operations.
- **Complex**: Identifier of the studied complex.
- **State**: State of the structure (relaxed, unrelaxed or cristal).
- **Model**: Model number (the N in "model_N"), or "cristal"/"ranked" for crystal and ranked structures.
- **Rank**: Rank label taken from the file name (rank_N, pred_N or ranked_N).
- **Version**: AlphaFold version used to generate the model (e.g. alphafold2_multimer_v3, alphafold3).
- **Recycle**: Recycle number of the model, or "Seed_0" for the final structure.
- **Seed**: Seed value used by AlphaFold.

Data from pyDock

- **Ele**: Electrostatic energy of the complex. Measures the interaction between electric charges within the complex.
- **Desolv**: Desolvation energy. Represents the energetic cost associated with desolvating individual molecules to form the complex.
- **VDW**: Van der Waals energy. Measures the attractive and repulsive interactions between atoms that are not chemically bonded.
- **Total**: Total energy of the complex, weighted as in pyDock (Electrostatic + Desolvation + 0.1 · Van der Waals).
- **Total2**: Unweighted total energy (Electrostatic + Desolvation + Van der Waals).

Data from the AlphaFold2-Multimer log and AF3 JSON files

- **pLDDT**: predicted Local Distance Difference Test. Measures the quality of the local structure prediction.
- **pTM**: predicted TM-score (Template Modeling score). Measures the quality of the global structure prediction.
- **ipTM**: interface predicted TM-score. Measures the quality of the predicted interfaces between chains.
- **tol**: change in the structure between consecutive recycles, used to stop recycling early (AF2-Multimer only).
- **Model_confidence**: Confidence of the model, calculated as 0.8 · ipTM + 0.2 · pTM.

Z-scores and ranking (calculated within each complex)

- **MCZ-Score**: Model confidence Z-score.
- **PLDDTZ-Score**: pLDDT Z-score.
- **TEZ-Score**: Z-score of Total.
- **TE2Z-Score**: Z-score of Total2.
- **Sum_Z**: MCZ-Score minus TEZ-Score (the energy Z-score is subtracted because lower energies are better).
- **Sum2_Z**: MCZ-Score minus TE2Z-Score.
- **Z-PLT** / **Z-PLT2**: PLDDTZ-Score minus TEZ-Score / TE2Z-Score; ranked in **Ranking_PLT** / **Ranking_PLT2**.
- **Ranking_Z**: Ranking based on Sum_Z (1 = highest Sum_Z within the complex).
- **Ranking2_Z**: Ranking based on Sum2_Z.
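A worked example of the energy and Z-score columns for a single complex, using made-up values for three models. It follows the definitions above: Total weights VDW by 0.1, the Z-scores use the sample standard deviation (the pandas default), and Sum_Z subtracts the energy Z-score so that a model with good confidence and low energy ranks first.

```python
from statistics import mean, stdev

# Made-up pyDock terms and model confidences for three models of one complex
models = {
    "model_1": {"Ele": -20.0, "Desolv": 5.0, "VDW": 30.0, "conf": 0.80},
    "model_2": {"Ele": -10.0, "Desolv": 2.0, "VDW": 10.0, "conf": 0.60},
    "model_3": {"Ele": -30.0, "Desolv": 8.0, "VDW": 50.0, "conf": 0.75},
}

for m in models.values():
    m["Total"] = m["Ele"] + m["Desolv"] + 0.1 * m["VDW"]  # pyDock weighting
    m["Total2"] = m["Ele"] + m["Desolv"] + m["VDW"]       # unweighted

def zscores(values):
    # Sample standard deviation (ddof=1), the same default as pandas .std()
    mu, sd = mean(values), stdev(values)
    return [(v - mu) / sd for v in values]

names = list(models)
mcz = zscores([models[n]["conf"] for n in names])   # MCZ-Score
tez = zscores([models[n]["Total"] for n in names])  # TEZ-Score

# Energies are better when lower, so their Z-score is subtracted
sum_z = {n: c - t for n, c, t in zip(names, mcz, tez)}
ranking = sorted(names, key=sum_z.get, reverse=True)  # first = Ranking_Z 1
print(ranking)  # ['model_3', 'model_1', 'model_2']
```

model_3 has an intermediate confidence but the lowest Total, which is enough to move it to first place.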



<div style="font-family: Arial, sans-serif; line-height: 1.5; text-align: justify;">

#### 3.2 Binding energy dataframe

##### 3.2.1 Extraction of the information from .ene tables

In [None]:
# Empty dataframe that will gather all the results
total_df=pd.DataFrame()
extension_final = len(".ene")
patron = r".*\d\.ene$" # pyDock energy tables (names ending in a number)

# Iterate over each folder and collect the names of the .ene files inside
for carpeta in carpetas:
    direccion = os.path.join(directory, carpeta)
    archivos_ene = [archivo for archivo in os.listdir(direccion) if re.match(patron, archivo)]
    resultado_df = pd.DataFrame() # all the .ene data of the folder, needed to distinguish between AF3 and AF2-Multimer
    
    # Iterate over each .ene file and append its information to a single dataframe
    for archivo in archivos_ene:
        ruta_ene = os.path.join(direccion, archivo)
        print(ruta_ene)
        # skiprows=[1] skips the line right below the header
        df = pd.read_csv(ruta_ene, sep=r'\s+', skiprows=[1])
        df["Name"] = archivo[:-extension_final]
        # Path of the matching PDB model (splitext removes only the ".ene" suffix)
        df["PATH"] = os.path.splitext(ruta_ene)[0] + ".pdb"
        print(df)
        resultado_df = pd.concat([resultado_df, df], ignore_index=True)

    # Add the complex identifier; AF3 server folders start with "fold"
    if carpeta.startswith('fold'):
        resultado_df["Complex"]=carpeta.split('_')[1].upper()
    else:
        resultado_df["Complex"]=carpeta[0:4]
    
    # Concatenate the dataframe of each folder
    total_df=pd.concat([total_df,resultado_df], ignore_index=True)

total_df.to_csv(directory_csv + "pydock4_raw.csv", index=False)

In [None]:
total_df
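A standalone sketch of how one pyDock .ene table is read in the cell above. The table content is made up, and its layout (a header, one separator line, then one row per configuration) is an assumption inferred from `skiprows=[1]` and from the `Conf` and `RANK` columns that are dropped in section 5; `read_ene` is a hypothetical helper.

```python
import io
import os
import pandas as pd

# Hypothetical .ene content: header, separator line, one row
ENE_TEXT = """Conf  Ele  Desolv  VDW  Total  RANK
------------------------------------------
   1  -12.345  4.567  20.000  -5.778  1
"""

def read_ene(text, filename):
    # sep=r"\s+" splits on runs of spaces; skiprows=[1] drops the separator line
    df = pd.read_csv(io.StringIO(text), sep=r"\s+", skiprows=[1])
    stem = os.path.splitext(filename)[0]
    df["Name"] = stem
    df["PATH"] = stem + ".pdb"  # path of the matching PDB model
    return df

print(read_ene(ENE_TEXT, "unrelaxed_rank_001_alphafold2_multimer_v3_model_2_seed_000_r3.ene"))
```

`os.path.splitext` removes only the final extension, unlike `str.rstrip(".ene")`, which strips any trailing '.', 'e' and 'n' characters rather than the suffix.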

##### 3.2.2 Assignment of the data derived from the model name: state, model, rank, version, recycle and seed

In [None]:
# Loading the dataframe (a regular-expression separator is only supported by the python engine)
df = pd.read_csv(directory_csv + "pydock4_raw.csv", sep=r'\t|,', engine='python')

# Information to retrieve with regular expressions
state_pattern = re.compile(r'.nrelaxed')
version_pattern = re.compile(r"((deepfold|alphafold2_multimer)_v\d+)_model")
model_pattern = re.compile(r'model_(\d+)')
rank_pattern = re.compile(r'(rank_(\d+))|(pred_\d+)|(ranked_.*)')
recycle_pattern = re.compile(r'(_|.)r(\d{1,})')
#seed_pattern = re.compile(r'seed_([0-9]+)\.')
#seed_pattern = re.compile(r'seed_([\d]+)\.')
seed_pattern = re.compile(r'seed_([0-9]+)(?:\.|$)')

# Empty lists where the data from the file name will be gathered
state=[]
model=[]
version = []
recycle = []
rank=[]
seed=[]
# Loop to gather the information entry by entry
for linea in (df["Name"].tolist()):
    
    #State relaxed, unrelaxed
    match = state_pattern.search(linea)
    if match:
        state.append(match.group(0))
    else:
        state.append("relaxed")
    
    # Model
    match = model_pattern.search(linea)
    if match:   
        model.append(match.group(1)) 
    else:
        model.append("cristal")   
    
    #Rank
    match = rank_pattern.search(linea)
    if match:   
        rank.append(match.group(0)) 
    else:
        rank.append("unrank")
    
    #Version 
    match = version_pattern.search(linea)
    if match: 
        version.append(match.group(1))
    else:
        version.append("cristal")
    
    # Recycle
    match = recycle_pattern.search(linea)
    if match:
        recycle.append(match.group(0)[2:])
    else:
        recycle.append("Seed_0")          
    #Seed
    match = seed_pattern.search(linea)
    if match:
        seed.append(match.group(1))
    else:
        seed.append("-")

# Adding the entries to the dataframe
df["State"]=state
df["Model"]=model
df["Rank"]=rank
df["Version"]=version
df["Recycle"]=recycle
df["Seed"]=seed

# Adding additional information

df.loc[df["Rank"] == "unrank", "Version"] = "alphafold3" # AlphaFold3 models (no rank in the name)
df.loc[(df["Model"] == "cristal") & (df["Rank"] == "unrank"), ["Rank", "Recycle", "State", "Version"]] = "cristal" # Crystal entries

# Old AlphaFold2 naming (pred_N); this block may be removed
lista_valores = ["pred_0", "pred_1", "pred_2", "pred_3", "pred_4", "pred_5"]
df.loc[df["Rank"].isin(lista_valores), "Version"] = "Alphafold2"

# Add the .pdb extension so that the name matches the model file
df['Name']= df['Name']+".pdb"

# Information of the ranked_N models (not relevant; this block may be removed)
df.loc[(df["Model"] == "cristal") & (df["Rank"] != "cristal"),  "Version"] = "Alphafold2"
df.loc[(df["Model"] == "cristal") & (df["Rank"] != "cristal"), ["Model",  "Recycle"]] = "ranked"

# All the models were relaxed with AMBER in OpenMM, so every entry is labelled "relaxed"
df["State"]="relaxed"

df_pydock=df
df_pydock

In [None]:
df_pydock.to_csv(directory_csv+'/pydock4_all.csv', index=False)

#### 3.3 Retrieving the log.txt information

The pLDDT, pTM, ipTM and tol values will be gathered in a single dataframe that will be merged with df_pydock.

In [None]:
df_pydock
# Folders of all models
carpetas_log = [nombre for nombre in os.listdir(directory) if os.path.isdir(os.path.join(directory, nombre))]
#carpetas_log.remove('Version1')
carpetas_log

In [None]:
# Dataframe columns
columns = ["Complex","Model","State",'Version', 'Recycle', 'pLDDT', 'pTM', 'ipTM', 'tol','Seed']

# Patterns in the text used to gather the information
#complex_pattern = re.compile(r'((T|t)\/.*_A)') 
rank_pattern = re.compile(r'(rank_(\d+))|(pred_\d+)|(ranked_.*)')
model_pattern = re.compile(r'model_(\d+)')
state_pattern = re.compile(r'rank')
version_pattern = re.compile(r"((deepfold|alphafold2_multimer)_v\d+)_model")
recycle_pattern = re.compile(r'recycle=(\d+)')
plddt_pattern = re.compile(r'pLDDT=([\d.]+)')
ptm_pattern = re.compile(r'pTM=([\d.]+)')
iptm_pattern = re.compile(r'ipTM=([\d.]+)')
tol_pattern = re.compile(r'tol=([\d.]+)')
seed_pattern = re.compile(r'seed_([\d.]+)')
name_pattern = re.compile(r"(fold_t\d+_\d+_model_\d+)")
df_log=pd.DataFrame()

for carpeta in carpetas_log:
    directory_log=f"{directory}/{carpeta}/log.txt"
    # Loading the archive
    with open(directory_log, 'r') as file:
        lines = file.readlines()
    
    # Value extraction
    #name = None
    complex=None
    model=None
    version = None
    state=None
    recycle = None
    plddt = None
    ptm = None
    iptm = None
    tol = None
    seed = None
    data=[]
    for line in lines:
        
        # # Complex
        # match = complex_pattern.search(line)
        # if match:
        #     print()
        #     complex = match.group(0)
        #     complex=complex[2:-2]

        #Name
        # match = name_pattern.search(line)
        # if match:
        #     name = match.group(1)+'.pdb'
        # else:
        #     name =directory_log 
        #State
        match = state_pattern.search(line)
        if match:
            state="relaxed"
        else:
            state="unrelaxed"

        # Model
        match = model_pattern.search(line)
        if match:
            model= match.group(1)
        
        # Version
        match = version_pattern.search(line)
        if match:
            version = match.group(1)
        else:
            version = 'alphafold3'
            
        # Recycle
        match = recycle_pattern.search(line)
        if match:
            recycle = match.group(1)
        else:
            recycle = 'Seed_0'
        
        #  pLDDT
        match = plddt_pattern.search(line)
        if match:
            plddt = match.group(1)
        else:
            plddt = None
        
        #  pTM
        match = ptm_pattern.search(line)
        if match:
            ptm = match.group(1)
        else:
            ptm=None
        
        #  ipTM
        match = iptm_pattern.search(line)
        if match:
            iptm = match.group(1)
        else:
            iptm=None
        
        #  tol
        match = tol_pattern.search(line)
        if match:
            tol = match.group(1)
        else:
            tol="-"
        
        #seed
        match = seed_pattern.search(line)
        if match:
            seed = match.group(1)
        else:
            seed="-"
        
        # rank
        match = rank_pattern.search(line)
        if match:   
            recycle = 'Seed_0'
        # Store the values of this line
        data.append([complex,model,state,version, recycle, plddt, ptm, iptm, tol,seed])

    # Create the dataframe
    df = pd.DataFrame(data, columns=columns)
    
    # Model confidence as defined in the AF2-Multimer paper: 0.8·ipTM + 0.2·pTM
    df['ipTM'] = pd.to_numeric(df['ipTM'], errors='coerce')
    df['pTM'] = pd.to_numeric(df['pTM'], errors='coerce')
    df['Model_confidence'] = 0.8 * df['ipTM'] + 0.2 * df['pTM']
    # Complex identifier; AF3 server folders start with "fold"
    if carpeta.startswith('fold'):
        df["Complex"]=carpeta.split('_')[1].upper()
    else:
        df["Complex"]=carpeta[0:4]
    # Keep only the lines that report confidence metrics (.copy() avoids a SettingWithCopyWarning)
    df = df.dropna(subset=['Model_confidence']).copy()
    df['pLDDT'] = pd.to_numeric(df['pLDDT'], errors='coerce')

    # Assembling df_log to gather all the information
    df_log=pd.concat([df_log,df], ignore_index=True)


In [None]:
#The models were all relaxed
df_log["State"]="relaxed"
df_log.to_csv(directory_csv+'/log_all.csv', index=False)
df_log

### 4. Final fusion and adjustments

<div style="font-family: Arial, sans-serif; line-height: 1.5; text-align: justify;">Now we assemble a new dataframe that collects all the data obtained during the calculations, for a later statistical analysis.

</div>

#### 4.1 Checking for possible issues

In [None]:
#Loading the dataframes
df_pydock =pd.read_csv(directory_csv+'/pydock4_all.csv')
df_log=pd.read_csv(directory_csv+'/log_all.csv')

Looking at the dataframes

In [None]:
df_log

In [None]:
df_pydock

In [None]:
# Check the number of rows per complex in each dataframe; they should match, otherwise review the input files
unicos=set(df_pydock["Complex"])
for complejo in unicos:
    print(complejo)
    n_pydock=len(df_pydock[df_pydock["Complex"]==complejo])
    n_log=len(df_log[df_log["Complex"]==complejo])
    print("Pydock:",n_pydock, " Log:",n_log,"Difference:",n_pydock-n_log)
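The loop above only visits the complexes present in df_pydock. A vectorised alternative, sketched below with made-up data, counts the rows per complex in both dataframes at once, so a complex that appears in only one of them is still listed (with a count of 0 on the other side). `compare_counts` is a hypothetical helper; with the real data it would be called as `compare_counts(df_pydock, df_log)`.

```python
import pandas as pd

def compare_counts(df_a, df_b, key="Complex", names=("Pydock", "Log")):
    """Rows per complex in two dataframes and their difference."""
    counts = pd.concat(
        {names[0]: df_a.groupby(key).size(), names[1]: df_b.groupby(key).size()},
        axis=1,
    ).fillna(0).astype(int)  # complexes missing on one side get 0 rows
    counts["Difference"] = counts[names[0]] - counts[names[1]]
    return counts

# Toy example (made-up complexes)
a = pd.DataFrame({"Complex": ["3BT1", "3BT1", "1ABC"]})
b = pd.DataFrame({"Complex": ["3BT1"]})
print(compare_counts(a, b))
```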

#### 4.2 Merging dataframes

Determining which columns the two dataframes share and merging on them

In [None]:
# Defining the shared columns
columna4=(df_pydock.columns).tolist()
columna3=(df_log.columns).tolist()
compartidos2=list(set(columna4).intersection(columna3))

# Coercing to have the same type
df_pydock[compartidos2]=df_pydock[compartidos2].astype(str)
df_log[compartidos2]=df_log[compartidos2].astype(str)

# Merging the values
merged_df2 = df_pydock.merge(df_log, on= compartidos2, how='left')
print (compartidos2)

# Saving the results (without the index, to avoid an extra "Unnamed: 0" column when reloading)
merged_df2.to_csv(directory_csv+'/merged_df2.csv', index=False)
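To see which pyDock rows did not find a log entry in the left merge, the `indicator=True` option of `DataFrame.merge` adds a `_merge` column that is `"left_only"` for unmatched rows. The sketch below uses toy data, and `unmatched_rows` is a hypothetical helper; with the real data it would be called as `unmatched_rows(df_pydock, df_log, compartidos2)`.

```python
import pandas as pd

def unmatched_rows(left, right, keys):
    """Left-merge on the shared keys and return the rows with no match on the right."""
    merged = left.merge(right, on=keys, how="left", indicator=True)
    return merged[merged["_merge"] == "left_only"].drop(columns="_merge")

# Toy data: after astype(str) the keys compare as text, so 4 and "4" match
left = pd.DataFrame({"Complex": ["3BT1", "3BT1"], "Model": [4, 5]}).astype(str)
right = pd.DataFrame({"Complex": ["3BT1"], "Model": ["4"], "pLDDT": [80.1]})
print(unmatched_rows(left, right, ["Complex", "Model"]))
```

One caveat of the string coercion: a key column stored as float (for example because it contains NaN) becomes "4.0" after `astype(str)` and no longer matches "4".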


### 5. Ranking 

In [None]:
df_norm=pd.read_csv(directory_csv+'/merged_df2.csv')
df_norm

Z-scores of the pyDock energies and of the model confidence of the selected models, and rankings within each complex

In [None]:
# Removing unnecessary columns
columnas=['Conf','RANK']
df_norm.drop(columnas, axis=1, inplace=True)
df_norm.dropna(subset=["Complex"],inplace=True)

# Removing duplicates
df_norm=df_norm.drop_duplicates(subset=["Name"],keep="first")
duplicados = df_norm[df_norm.duplicated(subset=["Name","Version","Complex","Recycle","State"])]

# Adding Total2 column (unweighted pyDock energy)
df_norm["Total2"]=df_norm["VDW"]+df_norm["Ele"]+df_norm["Desolv"] 

# Individual Z-scores, initialisation (floats, because they are filled with floats below)
df_norm["MCZ-Score"] = 0.0 # Z-score of Model_confidence
df_norm["PLDDTZ-Score"] = 0.0 # Z-score of pLDDT
df_norm["TEZ-Score"] = 0.0 # Z-score of Total
df_norm["TE2Z-Score"] = 0.0 # Z-score of Total2

# Combined Z-scores, initialisation (energy Z-scores are subtracted: lower energy is better)
df_norm["Sum_Z"] = 0.0 # Z-score Model confidence - Z-score Total
df_norm["Sum2_Z"] = 0.0 # Z-score Model confidence - Z-score Total2
df_norm["Z-PLT"] = 0.0 # Z-score pLDDT - Z-score Total
df_norm["Z-PLT2"]= 0.0 # Z-score pLDDT - Z-score Total2

# Z-score rankings, initialisation
df_norm["Ranking_Z"] = 0.0 # Ranking by Sum_Z
df_norm["Ranking2_Z"] = 0.0 # Ranking by Sum2_Z
df_norm["Ranking_PLT"] = 0.0 # Ranking by Z-PLT
df_norm["Ranking_PLT2"] = 0.0 # Ranking by Z-PLT2

# Means and standard deviations per complex
# (grouping by a string gives scalar group names; numeric_only skips the text columns)
grouped = df_norm.groupby("Complex")
medias=grouped.mean(numeric_only=True)
sdesv=grouped.std(numeric_only=True)

# Individual Z-scores
for name, group in grouped:
    # Z-scores of model confidence, total energies and pLDDT within the complex
    df_norm.loc[group.index,"MCZ-Score"] = (group["Model_confidence"]-medias.loc[name,"Model_confidence"])/sdesv.loc[name,"Model_confidence"]
    df_norm.loc[group.index,"TEZ-Score"] = (group["Total"]-medias.loc[name,"Total"])/sdesv.loc[name,"Total"]
    df_norm.loc[group.index,"TE2Z-Score"] = (group["Total2"]-medias.loc[name,"Total2"])/sdesv.loc[name,"Total2"]
    df_norm.loc[group.index,"PLDDTZ-Score"] = (group["pLDDT"]-medias.loc[name,"pLDDT"])/sdesv.loc[name,"pLDDT"]

# Combined Z-scores
df_norm.loc[:,"Sum_Z"]=df_norm.loc[:,"MCZ-Score"]-df_norm.loc[:,"TEZ-Score"]
df_norm.loc[:,"Sum2_Z"]=df_norm.loc[:,"MCZ-Score"]-df_norm.loc[:,"TE2Z-Score"]
df_norm.loc[:,"Z-PLT"]=df_norm.loc[:,"PLDDTZ-Score"]-df_norm.loc[:,"TEZ-Score"]
df_norm.loc[:,"Z-PLT2"]=df_norm.loc[:,"PLDDTZ-Score"]-df_norm.loc[:,"TE2Z-Score"]

# Ranking within each complex (1 = highest combined Z-score)
for name, group in grouped:
    df_norm.loc[group.index,"Ranking_Z"]=df_norm.loc[group.index,"Sum_Z"].rank(ascending=False)
    df_norm.loc[group.index,"Ranking2_Z"]=df_norm.loc[group.index,"Sum2_Z"].rank(ascending=False)
    df_norm.loc[group.index,"Ranking_PLT"]=df_norm.loc[group.index,"Z-PLT"].rank(ascending=False)
    df_norm.loc[group.index,"Ranking_PLT2"]=df_norm.loc[group.index,"Z-PLT2"].rank(ascending=False)
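The two loops above can also be written without explicit iteration, using `groupby().transform()`, which returns values aligned with the original rows. The sketch below computes only MCZ-Score, TEZ-Score, Sum_Z and Ranking_Z on toy data; `add_zscores` is a hypothetical helper and the column names follow section 3.1.

```python
import pandas as pd

def add_zscores(df, group="Complex"):
    """Per-complex Z-scores and ranking, vectorised with groupby().transform()."""
    out = df.copy()
    g = out.groupby(group)
    for col, zcol in [("Model_confidence", "MCZ-Score"), ("Total", "TEZ-Score")]:
        # transform("mean"/"std") broadcasts each group's statistic back to its rows
        out[zcol] = (out[col] - g[col].transform("mean")) / g[col].transform("std")
    out["Sum_Z"] = out["MCZ-Score"] - out["TEZ-Score"]
    # Rank within each complex: 1 = highest Sum_Z
    out["Ranking_Z"] = out.groupby(group)["Sum_Z"].rank(ascending=False)
    return out

toy = pd.DataFrame({
    "Complex": ["A", "A", "A", "B", "B"],
    "Model_confidence": [0.80, 0.60, 0.75, 0.5, 0.7],
    "Total": [-12.0, -7.0, -17.0, -10.0, -20.0],
})
print(add_zscores(toy)[["Complex", "Sum_Z", "Ranking_Z"]])
```

This avoids the separate `medias`/`sdesv` tables and the per-group lookups. Ties in Sum_Z receive the average of their ranks, which is the default `method="average"` of `rank`.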



In [None]:
df_norm

In [None]:
df_norm.to_csv(directory_csv + "/df_norm_.csv",index=False)