# Practice 1: Molecular filtering: ADME and lead-likeness criteria

> **Note:** This book is available in two ways:
> 1. Downloading the repository and following the instructions in the file [README.md](https://github.com/ramirezlab/PILE/blob/main/README.md)
> 2. Clicking here on [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/ramirezlab/PILE/blob/main/2.%20De%20datos%20a%20gráficas%3A%20Propiedades%20drug-likeness%20y%20similitud%20química%20con%20python/2.3_Practice-1.en.ipynb?hl=es)

## Concepts covered

### **Pharmacokinetics**

Pharmacokinetics is the study of what happens to a compound in an organism over time<sup> **1** </sup>. It is divided into four steps: **A**bsorption, **D**istribution, **M**etabolism and **E**xcretion (ADME)<sup> **1, 2** </sup>. Sometimes **T**oxicology (ADMET) or **L**iberation (LADME) is also included.

 
<img src="img/ADME-en.jpg" alt="ADME" width="800"/>

*Figure 1*. Steps that make up the pharmacokinetics. From: [Somvanshi, Kharat, Jadhav, Thorat & Townley, 2021](https://doi.org/10.1016/B978-0-323-85050-6.00007-4)

   * **Absorption:** It refers to how much of a compound or substance enters the systemic circulation from the site of administration, and how long this takes. It depends on multiple factors, such as the ability of the compound to penetrate the intestinal wall, its solubility, the gastric emptying time, and its chemical stability in the stomach, among others<sup> **1, 2** </sup>.
   * **Distribution:** It refers to how a substance spreads throughout the body. It depends on diffusion and convection, which may be influenced by the polarity, size, or binding abilities of the drug, the fluid status of the patient, or the body habitus of the individual. Reaching an effective drug concentration at the receptor site is essential, because a medication must reach its designated compartmental destination to be effective<sup> **1, 2** </sup>.
   * **Metabolism:** It refers to the processing of the drug by the body into subsequent compounds. Metabolism can convert a drug into more water-soluble substances that are more easily excreted or, in the case of prodrugs, it is required to convert the drug into its active metabolites<sup> **1, 2** </sup>.
   * **Excretion:** It refers to the process by which the drug is eliminated from the body. Generally, the kidneys carry out excretion, either by passive filtration in the glomerulus or by secretion in the tubules<sup> **1, 2** </sup>.

### **Lipinski's rule of five:**

Lipinski's rule of five is one way to screen out compounds with probable absorption problems. This rule states that poor absorption or permeation of a drug is more probable when the chemical structure fulfils two or more of the following criteria<sup> **3** </sup>:
1. Molecular weight (MW) is greater than 500.
2. The calculated log P value is above 5.
3. There are more than 5 hydrogen bond donors (–NH–, –OH).
4. The number of hydrogen bond acceptors (–N=, –O–) is greater than 10.

It is important to know that the rule of five does not definitively categorize all well and poorly absorbed compounds, although it is simple, fast, and provides a reasonable degree of classification.
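
The criteria above can be sketched in plain Python as a simple violation counter. This is only an illustration: the helper names and the descriptor values below are hypothetical and not part of the original notebook, which computes the descriptors with RDKit later on.

```python
# Ro5 upper limits as listed above: property name -> threshold
RO5_LIMITS = {"MW": 500, "LogP": 5, "HBD": 5, "HBA": 10}


def count_ro5_violations(properties):
    """Return how many Ro5 criteria a compound exceeds."""
    return sum(properties[name] > limit for name, limit in RO5_LIMITS.items())


def likely_poorly_absorbed(properties):
    """Flag a compound when two or more criteria are exceeded."""
    return count_ro5_violations(properties) >= 2


# Hypothetical descriptor values, for illustration only
compound_a = {"MW": 350.4, "LogP": 2.1, "HBD": 2, "HBA": 5}
compound_b = {"MW": 712.9, "LogP": 6.3, "HBD": 3, "HBA": 9}

print(count_ro5_violations(compound_a), likely_poorly_absorbed(compound_a))  # 0 False
print(count_ro5_violations(compound_b), likely_poorly_absorbed(compound_b))  # 2 True
```

A compound with a single violation is not flagged, which matches the "two or more" wording of the rule.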

## Problem Statement

For an investigation of a new drug, we want to know whether it is actually absorbed by the body, whether it can cross certain barriers to reach its target, how it is metabolized, and how it is excreted from the body. With this information, doctors have greater flexibility in prescribing and administering medications, providing greater benefit with less risk and making adjustments as necessary, given the varied physiology and lifestyles of patients.

To estimate the absorption of the compounds, we will use bioinformatic tools to evaluate Lipinski's rule of five, and then we will calculate some statistics, plot them, and analyze them.

## Import the necessary libraries

In [None]:
!pip install rdkit
from rdkit import Chem
from rdkit.Chem import Descriptors
import pandas as pd
from rdkit.Chem import Draw
import numpy as np
from rdkit.Chem import QED

import seaborn as sns
import matplotlib.pyplot as plt
import matplotlib.patches as mpatches
from matplotlib.lines import Line2D
from math import pi
import os
from pathlib import Path
import requests
from io import StringIO

## Load dataset of P49841
The dataset contains the bioactive compounds against Glycogen synthase kinase-3 beta that we built in the tutorial 2.1_Dataframes.
The first thing we are going to do is import the dataset: we download the CSV file from the repository and read it into a pandas DataFrame.


In [None]:
# URL of the CSV file
csv_url = 'https://raw.githubusercontent.com/ramirezlab/PILE/main/2.%20De%20datos%20a%20gráficas%3A%20Propiedades%20drug-likeness%20y%20similitud%20química%20con%20python/data/compounds_P49841_full.csv'

# Download the file
response = requests.get(csv_url)
response.encoding = 'utf-8'  # Ensure the correct encoding

# Read the downloaded file as a pandas DataFrame
df_output = pd.read_csv(StringIO(response.text), encoding='utf-8')

# Display the first 5 rows
df_output.head()

## Lipinski's rule of five

The function below calculates the chemical properties of Lipinski's rule of five, taking the `SMILES` as input. It then defines the conditions of the rule of five and finally reports whether the compound conforms to the rule.

In [None]:
def Ro5(df):
    
    smi = df['smiles']
    m = Chem.MolFromSmiles(smi)
    
    # Calculate rule of five chemical properties
    MW = Descriptors.ExactMolWt(m)
    HBA = Descriptors.NumHAcceptors(m)
    HBD = Descriptors.NumHDonors(m)
    LogP = Descriptors.MolLogP(m)
    
    # Rule of five conditions
    conditions = [MW <= 500, HBA <= 10, HBD <= 5, LogP <= 5]
    
    # Create pandas row for conditions results with values and information whether rule of five is violated 
    return pd.Series([MW, HBA, HBD, LogP, 'yes']) if conditions.count(True) >= 3 else pd.Series([MW, HBA, HBD, LogP, 'no'])

Now we are going to apply Lipinski's rule of five to our dataset.

In [None]:
# Apply the Ro5 function to each row of the DataFrame df_output and store the result in df_rule5
df_rule5 = df_output.apply(Ro5, axis=1)

# Assign column names to the DataFrame df_rule5
df_rule5.columns = ['MW', 'HBA', 'HBD', 'LogP', 'rule_of_five_conform']

# Display the first 5 rows of the updated DataFrame
df_rule5.head()

In [None]:
# Merge the DataFrame df_output with df_rule5, adding the columns generated by the rule of five (Ro5)
df_molecule = df_output.join(df_rule5)

# Display the first 5 rows of the updated DataFrame
df_molecule.head()

In [None]:
# Filter the DataFrame to keep only compounds that comply with the rule of five (Ro5)
fil_df = df_molecule[df_molecule['rule_of_five_conform'] == 'yes']

# Print the total number of compounds in the original DataFrame
print('# of compounds:', len(df_molecule))

# Print the number of compounds that comply with the rule of five
print('# of compounds in the filtered dataset:', len(fil_df))

# Print the number of compounds that do NOT comply with Lipinski's rule of five
print("# of compounds that do not comply with Lipinski's rule of five:", (len(df_molecule) - len(fil_df)))

# Count and display how many compounds comply and how many do not comply with the rule of five
print(df_molecule.rule_of_five_conform.value_counts())

# Generate a bar chart showing the distribution of compounds that comply and do not comply with the rule of five
df_molecule.rule_of_five_conform.value_counts().plot.bar()

Now we will save the complete (unfiltered) dataset, including the rule of five columns.

In [None]:
# Create a directory named 'data/' if it does not exist
!mkdir -p data/

# Save the DataFrame df_molecule as a CSV file inside the 'data/' directory
df_molecule.to_csv('data/compounds_P49841_lipinski.csv', index=False)

## Plot the properties of the rule of five per molecule as bar plots.

In [None]:
# Import the necessary libraries
import pandas as pd
import requests
from io import StringIO

# URL of the CSV file containing the dataset
csv_url = 'https://raw.githubusercontent.com/ramirezlab/PILE/main/2.%20De%20datos%20a%20gráficas%3A%20Propiedades%20drug-likeness%20y%20similitud%20química%20con%20python/data/compounds_P49841_lipinski.csv'

# Download the file from the URL
response = requests.get(csv_url)
response.encoding = 'utf-8'  # Ensure the file is correctly interpreted with UTF-8 encoding

# Attempt to read the CSV file using 'utf-8' encoding
try:
    lipinski_comp = pd.read_csv(StringIO(response.text), encoding='utf-8')
except UnicodeDecodeError:  
    # If an encoding error occurs, try reading with 'latin1'
    lipinski_comp = pd.read_csv(StringIO(response.text), encoding='latin1')

# Display the first 10 rows of the DataFrame to verify correct data import
lipinski_comp.head(10)

In [None]:
# Select the first 5 rows of the DataFrame lipinski_comp
comp_5_lipinski = lipinski_comp.iloc[:5]

# Display the subset of the first 5 compounds
comp_5_lipinski

#### Now we will make the bar plot.

In [None]:
# Define a dictionary with the properties of Lipinski's rule of five
ro5_properties = {
    "MW": (500, "molecular weight (g/mol)"),  # Maximum molecular weight allowed: 500 g/mol
    "HBA": (10, "# HBA"),  # Maximum 10 hydrogen bond acceptors (HBA)
    "HBD": (5, "# HBD"),  # Maximum 5 hydrogen bond donors (HBD)
    "LogP": (5, "logP"),  # Maximum logP partition coefficient of 5
}

In [None]:
# Create a figure with 4 subplots in a single row (1 row, 4 columns)
fig, axes = plt.subplots(figsize=(10, 2.5), nrows=1, ncols=4)

# Create an array with X-axis positions for the 5 selected compounds
x = np.arange(1, len(comp_5_lipinski) + 1)

# Define colors for the bars of each compound
colors = ["DarkMagenta", "LightGreen", "blue", "DarkSalmon", "yellow"]

# Create subplots for each property of Lipinski's rule of five (Ro5)
for index, (key, (threshold, title)) in enumerate(ro5_properties.items()):
    # Generate a bar plot for each Ro5 property
    axes[index].bar(x, comp_5_lipinski[key], color=colors)
    
    # Add a reference line at the threshold defined by Ro5
    axes[index].axhline(y=threshold, color="black", linestyle="dashed")
    
    # Assign a title to the subplot with the property name
    axes[index].set_title(title)
    
    # Remove X-axis labels for better clarity
    axes[index].set_xticks([])

# Create a legend with molecule identifiers and the threshold line
legend_elements = [mpatches.Patch(color=color, label=row["molecule_chembl_id"]) 
                   for color, (_, row) in zip(colors, comp_5_lipinski.iterrows())]

# Add the reference line to the legend
legend_elements.append(Line2D([0], [0], color="black", ls="dashed", label="Threshold"))

# Position the legend outside the figure for better visualization
fig.legend(handles=legend_elements, bbox_to_anchor=(1.2, 0.8))

# Adjust subplot layout to avoid overlap
plt.tight_layout()

# Show the figure
plt.show()

## Plot the properties of the rule of five per molecule as scatter plots.

In [None]:
# Generate a pairplot using Seaborn. pairplot is a figure-level function that
# creates its own figure, so the size is set with `height` (inches per subplot)
# instead of calling plt.figure() first, which would only add an empty figure.
grid = sns.pairplot(
    data=lipinski_comp,  # Use the DataFrame with all compounds (Ro5 columns included)
    vars=['HBD', 'HBA', 'MW', 'LogP'],  # Select the variables to compare
    hue='rule_of_five_conform',  # Color the points based on whether they comply with the rule of five or not
    height=5,  # 4 x 4 grid of 5-inch subplots, roughly 20 x 20 inches overall
)

# Display the plot
plt.show()

# Close the figure to free up memory
plt.close()

## Plot the properties of the rule of five per molecule as a radar plot.

In [None]:
# URL of the CSV file containing the dataset
csv_url = 'https://raw.githubusercontent.com/ramirezlab/PILE/main/2.%20De%20datos%20a%20gráficas%3A%20Propiedades%20drug-likeness%20y%20similitud%20química%20con%20python/data/compounds_P49841_lipinski.csv'

# Download the file from the URL
response = requests.get(csv_url)
response.encoding = 'utf-8'  # Ensure the file is correctly interpreted with UTF-8 encoding

# Read the CSV file
lipinski_comp = pd.read_csv(StringIO(response.text), encoding='utf-8')

# Display the first 10 rows of the DataFrame to verify the correct import of the data
lipinski_comp.head(10)

Because the chemical properties of the rule of five are on different orders of magnitude, we need to transform them in order to visualize them on the radar diagram. In this case, the best way is to transform the data in such a way that the validation bounds are all 5:

- Original MW: 500 g/mol - Modified MW: 5 - Rule: MW/100 (Molecular weight (g/mol)/100)
- Original HBA: 10 - Modified HBA: 5 - Rule: HBA/2 (# H-bond acceptors/2)
- Original HBD: 5 - does not change (# H-bond donors)
- Original LogP: 5 - does not change (LogP)

Therefore, we are going to transform the `MW` and `HBA` columns (the new columns are appended at the end of the DataFrame):

In [None]:
# Scale the molecular weight by dividing it by 100 and store it in a new column
# (the column is named 'MW*100', but it holds MW / 100)
lipinski_comp['MW*100'] = lipinski_comp['MW'] / 100

# Scale the number of hydrogen bond acceptors by dividing it by 2 and store it in a new column
# (the column is named 'HBA*2', but it holds HBA / 2)
lipinski_comp['HBA*2'] = lipinski_comp['HBA'] / 2

# Display the first 10 rows of the updated DataFrame
lipinski_comp.head(10)
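
As a quick sanity check of the scaling described above, the following plain-Python sketch confirms that the chosen divisors map every Ro5 threshold onto 5. The dictionary and function names, as well as the example compound, are hypothetical and only serve the illustration.

```python
# Ro5 thresholds and the divisor that maps each one onto the common bound of 5
RO5_LIMITS = {"MW": 500, "HBA": 10, "HBD": 5, "LogP": 5}
SCALE_DIVISORS = {"MW": 100, "HBA": 2, "HBD": 1, "LogP": 1}


def scale_for_radar(properties):
    """Divide each property by its divisor so all axes share one scale."""
    return {name: value / SCALE_DIVISORS[name] for name, value in properties.items()}


# Every threshold should land on 5 after scaling
scaled_limits = scale_for_radar(RO5_LIMITS)
print(scaled_limits)

# Hypothetical compound, for illustration only
example = {"MW": 420.0, "HBA": 6, "HBD": 2, "LogP": 3.5}
print(scale_for_radar(example))
```

Because all thresholds coincide after scaling, the rule of five region can be drawn as a single regular polygon on the radar chart.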

For the radar chart we need the mean and standard deviation of the dataset, so we will calculate these two statistics for the scaled values.

In [None]:
# Calculate statistics (mean and standard deviation) of the scaled Rule of Five properties
metrics_Ro5_stats_scaled = lipinski_comp[['MW*100', 'HBA*2', 'HBD', 'LogP']].agg(["mean", "std"])

# Display the DataFrame with the calculated metrics
metrics_Ro5_stats_scaled
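
To make the two statistics concrete, here is a minimal sketch using only the standard library and made-up LogP values. Note that pandas computes `std` as the sample standard deviation by default (`ddof=1`), which corresponds to `statistics.stdev` rather than `statistics.pstdev`.

```python
import statistics

# Hypothetical LogP values, for illustration only
logp_values = [1.0, 2.0, 3.0, 4.0]

mean_logp = statistics.mean(logp_values)

# Sample standard deviation (divides by n - 1), the pandas default
sample_std = statistics.stdev(logp_values)

# Population standard deviation (divides by n), shown for comparison
population_std = statistics.pstdev(logp_values)

print(f"mean = {mean_logp:.3f}")
print(f"sample std = {sample_std:.3f}")
print(f"population std = {population_std:.3f}")
```

For small datasets the two definitions differ noticeably, so it is worth knowing which one ends up on the radar chart.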

Now let's create the function that draws the chart. The dataset must be given as input.
The function scales the data and calculates the mean and standard deviation for the radar plot.

In [None]:
def plot_radar(dataframe):
    from math import pi
    import numpy as np

    # ------- PART 0: Scaled dataset / Metrics -------
    df = dataframe.copy()  # Create a copy of the original DataFrame to avoid modifying it directly

    # Scale molecular weight and hydrogen bond acceptors
    df['MW*100'] = df['MW'] / 100
    df['HBA*2'] = df['HBA'] / 2

    # Calculate the mean and standard deviation of the properties
    metrics_Ro5_stats_scaled = df[['MW*100', 'HBA*2', 'HBD', 'LogP']].agg(["mean", "std"])
    stats_mean = metrics_Ro5_stats_scaled.loc['mean']  # Extract the mean
    stats_std = metrics_Ro5_stats_scaled.loc['std']  # Extract the standard deviation

    # ------- PART 1: Create the background of the radar chart -------

    # Number of variables to plot
    N = 4

    # Calculate the angles for each axis in the radar chart
    angles = [n / float(N) * 2 * pi for n in range(N)]
    angles += angles[:1]  # Close the polygon

    # Initialize the figure and polar axes
    fig = plt.figure(figsize=(8, 8))
    ax = plt.subplot(111, polar=True)

    # If you want to rotate the chart so the first axis is at the top:
    # ax.set_theta_offset(pi/2)
    # ax.set_theta_direction(-1)

    # Define the categories for the chart
    categories = ['MW (g/mol) / 100', '# HBA / 2', '# HBD', 'LogP']
    plt.xticks(angles[:-1], categories, size=14)  # Axis labels

    # Draw Y-axis labels
    ax.set_rlabel_position(0)
    plt.yticks([1, 3, 5, 7], ["1", "3", "5", "7"], color="grey", size=12)
    plt.ylim(0, 7)  # Upper limit of the Y-axis

    # ------- PART 2: Add data to the chart -------

    # Mean data
    data = stats_mean.values
    data = np.append(data, data[0])  # Close the polygon
    ax.plot(angles, data, linewidth=3, linestyle='solid', color='purple', label="mean")

    # Mean + standard deviation data
    data_std_up = stats_mean.values + stats_std.values
    data_std_up = np.append(data_std_up, data_std_up[0])  # Close the polygon
    ax.plot(angles, data_std_up, linewidth=2, linestyle='dashed', color='limegreen', label="mean + std")

    # Mean - standard deviation data
    data_std_down = stats_mean.values - stats_std.values
    data_std_down = np.append(data_std_down, data_std_down[0])  # Close the polygon
    ax.plot(angles, data_std_down, linewidth=2, linestyle='dashed', color='limegreen', label="mean - std")

    # Add text with the total number of compounds in the dataset
    ax.text(-np.pi/3, 8, f'# Total data: {len(dataframe)}', size=14)

    # Area corresponding to Lipinski's Rule of Five (reference values)
    ro5_properties = [5, 5, 5, 5, 5]  # Limits for MW/100, HBA/2, HBD, and LogP; the fifth value repeats the first to close the polygon
    ax.fill(angles, ro5_properties, 'thistle', alpha=0.6, label="rule of five area")

    # Position the legend outside the figure for better visualization
    plt.legend(loc='upper right', bbox_to_anchor=(1.2, 1))

    # Display the chart
    plt.show()
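
The geometry used by the function above can be isolated in a small sketch: the axes are spread evenly around the circle, and both the angle list and each data series repeat their first element so the outline closes. The helper names and values here are hypothetical and use only the standard library.

```python
from math import pi


def radar_angles(n_axes):
    """Evenly spaced angles (in radians) plus the first one repeated to close the loop."""
    angles = [i / n_axes * 2 * pi for i in range(n_axes)]
    return angles + angles[:1]


def close_polygon(values):
    """Repeat the first value at the end so the plotted line returns to its start."""
    values = list(values)
    return values + values[:1]


angles = radar_angles(4)
# Hypothetical scaled means for MW/100, HBA/2, HBD, LogP
series = close_polygon([4.2, 3.0, 2.0, 3.5])

# Both lists have N + 1 entries, so they can be plotted against each other
print(len(angles), len(series))
print([round(a, 3) for a in angles])
```

If a series is not closed in this way, its length no longer matches the angle list and the plot call fails with a shape mismatch.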

In [None]:
# We plot the radarplot for the dataset of compounds (ALL).
plot_radar(df_molecule)

### Radar plot - Ro5 conform: YES
Now we are going to repeat the process, but only with the molecules that passed the rule of five test.
We must first filter the set `rule_of_five_conform: yes`

In [None]:
# Filter the DataFrame to get only the compounds that comply with Lipinski's Rule of Five,
# then reset the index (dropping the previous one). Chaining reset_index returns a new
# DataFrame instead of modifying a filtered slice in place.
df_molecule_Ro5_yes = df_molecule[df_molecule['rule_of_five_conform'] == 'yes'].reset_index(drop=True)

# Display the dataset of compounds that comply with the Rule of Five
df_molecule_Ro5_yes

We plot the radarplot for the filtered dataset

In [None]:
plot_radar(df_molecule_Ro5_yes)

### Radar plot - Ro5 conform: NO
Now we are going to repeat the process, but only with the molecules that did not pass the rule of five test.
We must first filter the set `rule_of_five_conform: no`


In [None]:
# Filter the DataFrame to get only the compounds that do NOT comply with Lipinski's Rule of Five,
# then reset the index (dropping the previous one). Chaining reset_index returns a new
# DataFrame instead of modifying a filtered slice in place.
df_molecule_Ro5_no = df_molecule[df_molecule['rule_of_five_conform'] == 'no'].reset_index(drop=True)

# Display the dataset of compounds that do NOT comply with the Rule of Five
df_molecule_Ro5_no

We plot the radarplot for the dataset of compounds that violate the Ro5

In [None]:
plot_radar(df_molecule_Ro5_no)

## Quantitative Estimation of Drug-likeness (QED)
QED is a descriptor used to measure the drug-likeness of a compound. Although the Rule of Five (Ro5) is commonly used for this purpose, not all drugs strictly comply with its criteria. Approximately 16% of oral drugs fail to meet at least one of the Ro5 parameters, and 6% fail to meet two or more. To quantify the quality of compounds, the concept of desirability is applied, allowing for the establishment of a quantitative metric called QED (Quantitative Estimation of Drug-likeness). QED values range from zero, indicating the presence of all unfavorable properties, to one, representing a profile with all favorable properties<sup> **4** </sup>.
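
In the QED formulation<sup> **4** </sup>, the individual desirability values (each between 0 and 1) are combined through a weighted geometric mean. The sketch below illustrates only that combination step; the desirability values and weights are made up, and the function name is hypothetical. The RDKit implementation used in this notebook derives the desirabilities from fitted functions of several molecular properties, which is not reproduced here.

```python
from math import exp, log


def weighted_geometric_mean(desirabilities, weights):
    """Combine desirabilities in (0, 1] as exp(sum(w * ln d) / sum(w))."""
    weighted_log_sum = sum(w * log(d) for d, w in zip(desirabilities, weights))
    return exp(weighted_log_sum / sum(weights))


# Hypothetical desirability values and weights, for illustration only
desirabilities = [0.9, 0.8, 0.6, 0.95]
weights = [1.0, 0.5, 0.5, 1.0]

score = weighted_geometric_mean(desirabilities, weights)
print(f"combined desirability: {score:.3f}")

# A single poor desirability pulls the combined score down sharply
poor = weighted_geometric_mean([0.9, 0.8, 0.05, 0.95], weights)
print(f"with one poor property: {poor:.3f}")
```

A geometric mean penalizes one very unfavorable property more strongly than an arithmetic mean would, which is why a compound needs a balanced profile to reach a high score.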

In [None]:
# URL of the CSV file containing the compound data
csv_url = 'https://raw.githubusercontent.com/ramirezlab/PILE/main/2.%20De%20datos%20a%20gráficas%3A%20Propiedades%20drug-likeness%20y%20similitud%20química%20con%20python/data/compounds_P49841_full.csv'

# Download the file from the URL
response = requests.get(csv_url)
response.encoding = 'utf-8'  # Ensure correct encoding

# Read the CSV file into a pandas DataFrame
df_output = pd.read_csv(StringIO(response.text), encoding='utf-8')

# Verify the available columns
print(df_output.columns)

# Convert SMILES to RDKit molecules and calculate QED
def calculate_qed(smiles):
    mol = Chem.MolFromSmiles(smiles)  # Convert SMILES to RDKit Mol object
    if mol:  # Check if the conversion was successful
        return QED.qed(mol)  # Calculate QED
    return None  # Return None if calculation was not possible

# Ensure the column containing SMILES is named 'smiles' or rename it if necessary
if 'smiles' not in df_output.columns:
    # Attempt to detect the correct name of the SMILES column
    smiles_col = [col for col in df_output.columns if 'smiles' in col.lower()]
    if smiles_col:
        df_output.rename(columns={smiles_col[0]: 'smiles'}, inplace=True)
    else:
        raise ValueError("No column with SMILES values was found in the file.")

# Apply the function to calculate QED and add it as a new column
df_output['QED'] = df_output['smiles'].apply(calculate_qed)

# Display the first rows of the DataFrame with the new QED column
df_output[['smiles', 'QED']].head()

Next, we calculate the QED (Quantitative Estimation of Drug-likeness) for the compounds that comply with Lipinski's Rule of Five.

In [None]:
# Create an explicit copy of the DataFrame df_molecule_Ro5_yes and assign it to df_molecule_QED
df_molecule_QED = df_molecule_Ro5_yes.copy()

# Verify the available columns
print(df_molecule_QED.columns)

# Attempt to find the correct column containing SMILES
smiles_col = [col for col in df_molecule_QED.columns if 'smiles' in col.lower()]
if smiles_col:
    df_molecule_QED = df_molecule_QED.rename(columns={smiles_col[0]: 'smiles'})
else:
    raise KeyError("No column with SMILES values was found in df_molecule_QED.")

# Function to calculate QED
def calculate_qed(smiles):
    mol = Chem.MolFromSmiles(smiles)  # Convert SMILES to an RDKit Mol object
    return QED.qed(mol) if mol else None  # Calculate QED only if the conversion was successful

# Apply the function to each row using .loc to avoid the warning
df_molecule_QED.loc[:, 'QED'] = df_molecule_QED['smiles'].apply(calculate_qed)

# Sort by QED in descending order
df_molecule_QED_sorted = df_molecule_QED.sort_values(by='QED', ascending=False).reset_index(drop=True)

# Select only the relevant columns
columns_to_show = ['smiles', 'molecule_chembl_id', 'MW', 'HBA', 'HBD', 'LogP', 'QED']
df_molecule_QED_sorted = df_molecule_QED_sorted[columns_to_show]

# Display the first rows of the DataFrame with the new QED column
df_molecule_QED_sorted.head(10)

In [None]:
# Sort the DataFrame by the 'QED' column in descending order
df_molecule_QED_sorted = df_molecule_QED.sort_values(by='QED', ascending=False).reset_index(drop=True)

# Select only the desired columns (adjust the names as needed)
columns_to_show = ['smiles', 'molecule_chembl_id', 'MW', 'HBA', 'HBD', 'LogP', 'QED']

# Display only the selected columns
df_molecule_QED_sorted[columns_to_show].head(10)

In [None]:
# Verify the available columns in the DataFrame
print(df_molecule_QED_sorted.columns)

In [None]:
def plot_radar(dataframe):
    # ------- PART 0: Scaled dataset / Metrics -------
    df = dataframe.copy()  # Create a copy of the original DataFrame to avoid modifying it directly

    # Scale MW, HBA, and other values to bring them to a similar scale
    df['MW*100'] = df['MW'] / 100
    df['HBA*2'] = df['HBA'] / 2
    df['QED*10'] = df['QED'] * 10  # Normalize QED so that 10 is equivalent to 1

    # Calculate the mean and standard deviation of the properties
    metrics_Ro5_stats_scaled = df[['MW*100', 'HBA*2', 'HBD', 'LogP', 'QED*10']].agg(["mean", "std"])
    stats_mean = metrics_Ro5_stats_scaled.loc['mean']  # Extract the mean
    stats_std = metrics_Ro5_stats_scaled.loc['std']  # Extract the standard deviation

    # ------- PART 1: Create the background of the radar chart -------

    # Number of variables to plot
    N = 5

    # Calculate the angles for each axis in the radar chart
    angles = [n / float(N) * 2 * pi for n in range(N)]
    angles += angles[:1]  # Close the polygon

    # Initialize the figure and polar axes
    fig = plt.figure(figsize=(8, 8))
    ax = plt.subplot(111, polar=True)

    # Define the categories for the chart
    categories = ['MW (g/mol) / 100', '# HBA / 2', '# HBD', 'LogP', 'QED*10']
    plt.xticks(angles[:-1], categories, size=14)  # Axis labels

    # Draw Y-axis labels
    ax.set_rlabel_position(0)
    plt.yticks([0.5, 2.5, 5, 7, 9, 11], ["0.5", "2.5", "5", "7", "9", "11"], color="grey", size=12)
    plt.ylim(0, 11)  # Upper limit of the Y-axis

    # ------- PART 2: Add data to the chart -------

    # Mean data
    data = stats_mean.values
    data = np.append(data, data[0])  # Close the polygon
    ax.plot(angles, data, linewidth=3, linestyle='solid', color='purple', label="Mean")

    # Mean + standard deviation data
    data_std_up = stats_mean.values + stats_std.values
    data_std_up = np.append(data_std_up, data_std_up[0])  # Close the polygon
    ax.plot(angles, data_std_up, linewidth=2, linestyle='dashed', color='limegreen', label="Mean + Std. Dev.")

    # Mean - standard deviation data
    data_std_down = stats_mean.values - stats_std.values
    data_std_down = np.append(data_std_down, data_std_down[0])  # Close the polygon
    ax.plot(angles, data_std_down, linewidth=2, linestyle='dashed', color='limegreen', label="Mean - Std. Dev.")

    # Add text with the total number of compounds in the dataset
    ax.text(-np.pi/3, 8, f'# Total data: {len(dataframe)}', size=14)

    # Area corresponding to Lipinski's Rule of Five (reference values)
    ro5_properties = [5, 5, 5, 5, 10]  # Adjusted with scaled values + normalized QED
    ro5_properties = np.append(ro5_properties, ro5_properties[0])  # Close the polygon

    ax.fill(angles, ro5_properties, 'thistle', alpha=0.7, label="Lipinski's Rule of 5 + QED")

    # Add legend
    plt.legend(loc='upper right')

    # Display the chart
    plt.show()

# Call the function with the DataFrame sorted by QED
plot_radar(df_molecule_QED_sorted)

## Practice Activity

Considering what has been reviewed in this second part, write Python code that can:

Create a bar chart for the last 5 compounds that do not comply with Lipinski's Rule of Five, showing each compound against the 4 Lipinski rules with their respective limits, as well as the QED value.

Upon completion, you should prepare a document in "ipynb" format (Jupyter Notebook) that includes:

1. The proposed code for selecting the compounds to be plotted, along with its corresponding output.

2. The proposed code for creating the requested chart, along with its corresponding output.

## Conclusion

In this practice, we learned about Lipinski's rule of five as a measure to estimate a compound's oral bioavailability, and we applied the rule to a dataset in order to filter it and discard those compounds that meet two or more of the criteria. We also learned to make simple graphs, such as bar graphs, that allow us to visualize the dataset as a whole or each compound individually. In addition, we learned to make scatter plots that show the dataset against the four criteria of the Lipinski rule. Finally, we built a more complex plot, the radar plot, which allows us to compare multiple variables (Lipinski rules) on a single plot.

# References
1. Grogan, S., & Preuss, C. V. (2022). Pharmacokinetics. In StatPearls. StatPearls Publishing. http://www.ncbi.nlm.nih.gov/books/NBK557744/
2. Doogue, M. P., & Polasek, T. M. (2013). The ABCD of clinical pharmacokinetics. Therapeutic Advances in Drug Safety, 4(1), 5-7. https://doi.org/10.1177/2042098612469335
3. Turner, J. V., & Agatonovic-Kustrin, S. (2007). In silico prediction of oral bioavailability. In Comprehensive Medicinal Chemistry II (pp. 699-724). Elsevier. https://doi.org/10.1016/B0-08-045044-X/00147-4
4. Bickerton, G. R., Paolini, G. V., Besnard, J., Muresan, S., & Hopkins, A. L. (2012). Quantifying the chemical beauty of drugs. Nature Chemistry, 4(2), 90-98. https://doi.org/10.1038/nchem.1243