# DxGPT Evaluation Dashboard

### This repository contains the code and resources for generating a dashboard that facilitates the comparison of prediction scores across different models. It is designed to streamline the evaluation process of machine learning models by providing visual insights into their performance metrics.

### To view the dashboard, open this repository in Google Colab using the following link:
### [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/foundation29org/dxgpt_testing/blob/main/dashboard.ipynb)

### Then run all the cells in the notebook, in order, to see the dashboards.

In [None]:
!git clone https://github.com/foundation29org/dxgpt_testing.git
!pip install plotly
%cd dxgpt_testing
%ls

### In the previous cell, we cloned the repository and installed the necessary packages for this notebook.
### In the next cell, we list the files in the "data" folder that start with "scores" and end with ".csv", then split them into one group per dataset (URG, v2, RAMEDIS, and PUMCH_ADM) by file-name prefix.
### We then print the full list and each group to the console.

In [1]:
import os
import pandas as pd
import plotly.graph_objects as go
from plotly.subplots import make_subplots
# This file is used to generate a dashboard for easily comparing the scores of the predictions between different models.

# Filter data folder for files starting with "scores"
path = "data"
files = [f for f in os.listdir(path) if f.startswith('scores') and f.endswith('.csv')]
print(files)

files_URG = [f for f in files if f.startswith('scores_URG_Torre_Dic_200')]

files_v2 = [f for f in files if f.startswith('scores_v2')]

files_RAMEDIS = [f for f in files if f.startswith('scores_RAMEDIS')]

files_PUMCH_ADM = [f for f in files if f.startswith('scores_PUMCH_ADM')]

print("Files Urgencias HM:")
print(files_URG)

print("Files v2:")
print(files_v2)

print("Files RAMEDIS:")
print(files_RAMEDIS)

print("Files PUMCH_ADM:")
print(files_PUMCH_ADM)

['scores.csv', 'scores_RAMEDIS_gpt4_0613.csv', 'scores_medisearch_v2_gpt4turbo1106.csv', 'scores_nov.csv', 'scores_v2_mistralmoe.csv', 'scores_URG_Torre_Dic_200_gpt4turbo0409.csv', 'scores_URG_Torre_Dic_200_mistralmoe.csv', 'scores_turbo.csv', 'scores_URG_Torre_Dic_200_improved_c3opus.csv', 'scores_v2_mistral7b.csv', 'scores_medisearch_v2_improved_c3sonnet.csv', 'scores_URG_Torre_Dic_200_c3sonnet.csv', 'scores_RAMEDIS_geminipro15.csv', 'scores_RAMEDIS_c3sonnet.csv', 'scores_URG_Torre_Dic_200_gpt4_0613.csv', 'scores_claude_v2_gpt4_0613.csv', 'scores_PUMCH_ADM_mistralmoe.csv', 'scores_v2_improved_c3sonnet.csv', 'scores_v2_gpt4_0613.csv', 'scores_medisearch_v2_improved_gpt4_0613.csv', 'scores_medisearch_v2_gpt4_0613.csv', 'scores_URG_Torre_Dic_200_gpt4turbo1106.csv', 'scores_PUMCH_ADM_llama2_7b.csv', 'scores_RAMEDIS_mistralmoe.csv', 'scores_v2_c3sonnet.csv', 'scores_claude_v2_gpt4turbo1106.csv', 'scores_2.csv', 'scores_RAMEDIS_llama2_7b.csv', 'scores_RAMEDIS_gpt4turbo0409.csv', 'scores_UR
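### The prefix-based grouping in the previous cell can also be written as a single pass over the file list. This is a minimal sketch, not the notebook's code: the helper name `group_score_files` is hypothetical, and the example list mixes names from the output above with an invented `notes.txt`. As in the cell above, files that match no prefix (for example the scores_medisearch_v2 files) end up in no group.

```python
# Hypothetical helper: group score files by dataset prefix in one pass.
# The prefixes mirror the ones used in the notebook cell above.
PREFIXES = {
    'URG': 'scores_URG_Torre_Dic_200',
    'v2': 'scores_v2',
    'RAMEDIS': 'scores_RAMEDIS',
    'PUMCH_ADM': 'scores_PUMCH_ADM',
}


def group_score_files(filenames, prefixes=PREFIXES):
    """Return {group_name: [matching score files]} for every prefix."""
    groups = {name: [] for name in prefixes}
    for f in filenames:
        # Same filter as the notebook: only scores*.csv files
        if not (f.startswith('scores') and f.endswith('.csv')):
            continue
        for name, prefix in prefixes.items():
            if f.startswith(prefix):
                groups[name].append(f)
    return groups


example = ['scores_v2_c3sonnet.csv', 'scores_RAMEDIS_c3sonnet.csv',
           'scores_medisearch_v2_gpt4_0613.csv', 'notes.txt']
print(group_score_files(example))
```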

### We define a function that calculates the statistics for a single scores file.
### It takes a DataFrame with a "Score" column as input and counts the P1, P5, and P0 scores.
### From those counts it computes the strict accuracy (percentage of P1 cases) and the lenient accuracy (percentage of P1 and P5 cases), and returns the three counts followed by the two accuracy metrics.

### P1: the correct diagnosis appears in the first position.
### P5: the correct diagnosis appears in positions 2 to 5; in the tables, this column aggregates the P2, P3, P4, and P5 scores.
### P0: the correct diagnosis does not appear among the five predictions at all.
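### As a worked example of these definitions, the snippet below scores ten hypothetical cases (toy data, not real results), aggregating P2 to P5 into the P5 bucket in the same way as the function in the next cell:

```python
import pandas as pd

# Toy scores for 10 hypothetical cases (not real data): each row holds the
# position of the correct diagnosis (P1..P5), or P0 if it was not listed.
toy = pd.DataFrame({'Score': ['P1', 'P1', 'P1', 'P1', 'P2', 'P3', 'P5', 'P0', 'P0', 'P1']})

counts = toy['Score'].value_counts()
p1 = counts.get('P1', 0)                                      # hits in first position
p5 = sum(counts.get(s, 0) for s in ['P2', 'P3', 'P4', 'P5'])  # hits in positions 2 to 5
p0 = counts.get('P0', 0)                                      # misses
total = p1 + p5 + p0

strict = p1 / total * 100           # correct diagnosis ranked first
lenient = (p1 + p5) / total * 100   # correct diagnosis anywhere in the top five
print(p1, p5, p0, strict, lenient)  # 5 3 2 50.0 80.0
```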

In [2]:
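def get_stats_for_df(df):
    # Count each score label; .get() returns 0 for labels absent from this file
    counts = df['Score'].value_counts()
    count_p1 = counts.get('P1', 0)
    # "P5" aggregates every hit in positions 2 to 5
    count_p5 = df[df['Score'].isin(['P2', 'P3', 'P4', 'P5'])].shape[0]
    count_p0 = counts.get('P0', 0)

    # Calculate total number of predictions
    total_predictions = count_p1 + count_p5 + count_p0
    if total_predictions == 0:
        # No scored rows: accuracy is undefined
        return count_p1, count_p5, count_p0, float('nan'), float('nan')

    # Strict Accuracy: percentage of cases with the correct diagnosis in first position
    strict_accuracy = (count_p1 / total_predictions) * 100

    # Lenient Accuracy: percentage of cases with the correct diagnosis in the top five
    lenient_accuracy = ((count_p1 + count_p5) / total_predictions) * 100

    return count_p1, count_p5, count_p0, strict_accuracy, lenient_accuracy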


### In the next cell, we create one list per file group to store its statistics.
### We then process each file in files_v2, files_URG, files_RAMEDIS, and files_PUMCH_ADM, and append its statistics to the matching list (data_v2, data_URG, data_RAMEDIS, and data_PUMCH_ADM).

### Then we create a DataFrame for each file group
### We add each group's table to its own subplot, one row per group
### We update the layout and display the dashboard
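### The four loops in the next cell repeat the same read, summarize, and append steps. As a minimal sketch of an alternative structure (not the notebook's code), the same summary tables can be built from a dictionary of groups. It assumes in-memory DataFrames in place of `pd.read_csv(f'{path}/{file}')`; the helper names `stats_row` and `summarize_groups` and the model file names are hypothetical.

```python
import pandas as pd

COLUMNS = ['File', 'P1', 'P5', 'P0', 'Strict Accuracy', 'Lenient Accuracy']


def stats_row(name, df):
    """Build one summary row (same fields as the notebook's tables) for one scores DataFrame."""
    counts = df['Score'].value_counts()
    p1 = int(counts.get('P1', 0))
    p5 = int(sum(counts.get(s, 0) for s in ['P2', 'P3', 'P4', 'P5']))
    p0 = int(counts.get('P0', 0))
    total = p1 + p5 + p0
    strict = p1 / total * 100 if total else float('nan')
    lenient = (p1 + p5) / total * 100 if total else float('nan')
    return [name, p1, p5, p0, strict, lenient]


def summarize_groups(groups):
    """{group_name: {file_name: scores DataFrame}} -> {group_name: summary DataFrame}."""
    return {
        group: pd.DataFrame([stats_row(name, df) for name, df in files.items()], columns=COLUMNS)
        for group, files in groups.items()
    }


# Hypothetical in-memory inputs standing in for the CSV files in the data folder
groups = {
    'v2': {'scores_v2_model_a.csv': pd.DataFrame({'Score': ['P1', 'P2', 'P0', 'P1']})},
    'URG': {'scores_URG_model_b.csv': pd.DataFrame({'Score': ['P0', 'P3']})},
}
summaries = summarize_groups(groups)
print(summaries['v2'])
```

### Each resulting DataFrame can then feed one go.Table trace per subplot row, as the next cell does for df_v2, df_URG, df_RAMEDIS, and df_PUMCH_ADM.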

In [3]:
# Create a list to store the data for each file group
data_v2 = []
data_URG = []
data_RAMEDIS = []
data_PUMCH_ADM = []

# Process files_v2 and store the data
for file in files_v2:
    df = pd.read_csv(f'{path}/{file}')
    count_p1, count_p5, count_p0, strict_accuracy, lenient_accuracy = get_stats_for_df(df)
    data_v2.append([file, count_p1, count_p5, count_p0, strict_accuracy, lenient_accuracy])

# Process files_URG and store the data
for file in files_URG:
    df = pd.read_csv(f'{path}/{file}')
    count_p1, count_p5, count_p0, strict_accuracy, lenient_accuracy = get_stats_for_df(df)
    data_URG.append([file, count_p1, count_p5, count_p0, strict_accuracy, lenient_accuracy])

# Process files_RAMEDIS and store the data
for file in files_RAMEDIS:
    df = pd.read_csv(f'{path}/{file}')
    count_p1, count_p5, count_p0, strict_accuracy, lenient_accuracy = get_stats_for_df(df)
    data_RAMEDIS.append([file, count_p1, count_p5, count_p0, strict_accuracy, lenient_accuracy])

# Process files_PUMCH_ADM and store the data
for file in files_PUMCH_ADM:
    df = pd.read_csv(f'{path}/{file}')
    count_p1, count_p5, count_p0, strict_accuracy, lenient_accuracy = get_stats_for_df(df)
    data_PUMCH_ADM.append([file, count_p1, count_p5, count_p0, strict_accuracy, lenient_accuracy])

# Create DataFrames for each file group
df_v2 = pd.DataFrame(data_v2, columns=['File', 'P1', 'P5', 'P0', 'Strict Accuracy', 'Lenient Accuracy'])
df_URG = pd.DataFrame(data_URG, columns=['File', 'P1', 'P5', 'P0', 'Strict Accuracy', 'Lenient Accuracy'])
df_RAMEDIS = pd.DataFrame(data_RAMEDIS, columns=['File', 'P1', 'P5', 'P0', 'Strict Accuracy', 'Lenient Accuracy'])
df_PUMCH_ADM = pd.DataFrame(data_PUMCH_ADM, columns=['File', 'P1', 'P5', 'P0', 'Strict Accuracy', 'Lenient Accuracy'])

# Create subplots with 4 rows and 1 column, specifying subplot type as 'domain'
fig = make_subplots(rows=4, cols=1, subplot_titles=('v2 Files', 'URG Files', 'RAMEDIS Files', 'PUMCH_ADM Files'), specs=[[{'type': 'domain'}], [{'type': 'domain'}], [{'type': 'domain'}], [{'type': 'domain'}]])

# Add the first table to the first subplot
fig.add_trace(
    go.Table(
        header=dict(values=list(df_v2.columns),
                    fill_color='paleturquoise',
                    align='left'),
        cells=dict(values=[df_v2.File, df_v2.P1, df_v2.P5, df_v2.P0, df_v2['Strict Accuracy'], df_v2['Lenient Accuracy']],
                   fill_color='lavender',
                   align='left')
    ),
    row=1, col=1
)

# Add the second table to the second subplot
fig.add_trace(
    go.Table(
        header=dict(values=list(df_URG.columns),
                    fill_color='paleturquoise',
                    align='left'),
        cells=dict(values=[df_URG.File, df_URG.P1, df_URG.P5, df_URG.P0, df_URG['Strict Accuracy'], df_URG['Lenient Accuracy']],
                   fill_color='lavender',
                   align='left')
    ),
    row=2, col=1
)

# Add the third table to the third subplot
fig.add_trace(
    go.Table(
        header=dict(values=list(df_RAMEDIS.columns),
                    fill_color='paleturquoise',
                    align='left'),
        cells=dict(values=[df_RAMEDIS.File, df_RAMEDIS.P1, df_RAMEDIS.P5, df_RAMEDIS.P0, df_RAMEDIS['Strict Accuracy'], df_RAMEDIS['Lenient Accuracy']],
                   fill_color='lavender',
                   align='left')
    ),
    row=3, col=1
)

# Add the fourth table to the fourth subplot
fig.add_trace(
    go.Table(
        header=dict(values=list(df_PUMCH_ADM.columns),
                    fill_color='paleturquoise',
                    align='left'),
        cells=dict(values=[df_PUMCH_ADM.File, df_PUMCH_ADM.P1, df_PUMCH_ADM.P5, df_PUMCH_ADM.P0, df_PUMCH_ADM['Strict Accuracy'], df_PUMCH_ADM['Lenient Accuracy']],
                   fill_color='lavender',
                   align='left')
    ),
    row=4, col=1
)

# Update the layout
fig.update_layout(
    title='Prediction Scores Dashboard',
    height=1200,
)

# Display the dashboard
fig.show()

### The next cell differs in that it also separates files produced with improved prompts (file names containing "improved") from those produced with the original prompts.
### We create one list per file group and prompt type, covering the v2 and URG groups only.

### We then build the tables as in the previous cell, with separate tables for the improved-prompt files.
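### The split in the next cell relies on a substring check: any file name containing "improved" counts as an improved-prompt run. A minimal sketch of that rule as a reusable helper (the name `split_by_prompt` is hypothetical), applied to three URG file names from the listing above:

```python
def split_by_prompt(filenames, marker='improved'):
    """Split file names into (baseline, improved) lists using a substring check."""
    baseline, improved = [], []
    for f in filenames:
        # Any occurrence of the marker in the name counts as an improved-prompt file
        (improved if marker in f else baseline).append(f)
    return baseline, improved


files = ['scores_URG_Torre_Dic_200_c3sonnet.csv',
         'scores_URG_Torre_Dic_200_improved_c3opus.csv',
         'scores_URG_Torre_Dic_200_gpt4_0613.csv']
baseline, improved = split_by_prompt(files)
print(baseline)
print(improved)
```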

In [4]:
# Create lists to store the data for each file group
data_v2 = []
data_v2_improved = []
data_URG = []
data_URG_improved = []

# Process files_v2 and store the data
for file in files_v2:
    df = pd.read_csv(f'{path}/{file}')
    count_p1, count_p5, count_p0, strict_accuracy, lenient_accuracy = get_stats_for_df(df)
    if 'improved' in file:
        data_v2_improved.append([file, count_p1, count_p5, count_p0, strict_accuracy, lenient_accuracy])
    else:
        data_v2.append([file, count_p1, count_p5, count_p0, strict_accuracy, lenient_accuracy])

# Process files_URG and store the data
for file in files_URG:
    df = pd.read_csv(f'{path}/{file}')
    count_p1, count_p5, count_p0, strict_accuracy, lenient_accuracy = get_stats_for_df(df)
    if 'improved' in file:
        data_URG_improved.append([file, count_p1, count_p5, count_p0, strict_accuracy, lenient_accuracy])
    else:
        data_URG.append([file, count_p1, count_p5, count_p0, strict_accuracy, lenient_accuracy])

# Create DataFrames for each file group
df_v2 = pd.DataFrame(data_v2, columns=['File', 'P1', 'P5', 'P0', 'Strict Accuracy', 'Lenient Accuracy'])
df_v2_improved = pd.DataFrame(data_v2_improved, columns=['File', 'P1', 'P5', 'P0', 'Strict Accuracy', 'Lenient Accuracy'])
df_URG = pd.DataFrame(data_URG, columns=['File', 'P1', 'P5', 'P0', 'Strict Accuracy', 'Lenient Accuracy'])
df_URG_improved = pd.DataFrame(data_URG_improved, columns=['File', 'P1', 'P5', 'P0', 'Strict Accuracy', 'Lenient Accuracy'])

# Create subplots with 4 rows and 1 column, specifying subplot type as 'domain'
fig = make_subplots(rows=4, cols=1, subplot_titles=('v2 Files', 'v2 Improved Files', 'URG Files', 'URG Improved Files'),
                    specs=[[{'type': 'domain'}], [{'type': 'domain'}], [{'type': 'domain'}], [{'type': 'domain'}]])

# Add the tables to the respective subplots
fig.add_trace(go.Table(
    header=dict(values=list(df_v2.columns), fill_color='paleturquoise', align='left'),
    cells=dict(values=[df_v2.File, df_v2.P1, df_v2.P5, df_v2.P0, df_v2['Strict Accuracy'], df_v2['Lenient Accuracy']],
               fill_color='lavender', align='left')
), row=1, col=1)

fig.add_trace(go.Table(
    header=dict(values=list(df_v2_improved.columns), fill_color='paleturquoise', align='left'),
    cells=dict(values=[df_v2_improved.File, df_v2_improved.P1, df_v2_improved.P5, df_v2_improved.P0, df_v2_improved['Strict Accuracy'], df_v2_improved['Lenient Accuracy']],
               fill_color='lavender', align='left')
), row=2, col=1)

fig.add_trace(go.Table(
    header=dict(values=list(df_URG.columns), fill_color='paleturquoise', align='left'),
    cells=dict(values=[df_URG.File, df_URG.P1, df_URG.P5, df_URG.P0, df_URG['Strict Accuracy'], df_URG['Lenient Accuracy']],
               fill_color='lavender', align='left')
), row=3, col=1)

fig.add_trace(go.Table(
    header=dict(values=list(df_URG_improved.columns), fill_color='paleturquoise', align='left'),
    cells=dict(values=[df_URG_improved.File, df_URG_improved.P1, df_URG_improved.P5, df_URG_improved.P0, df_URG_improved['Strict Accuracy'], df_URG_improved['Lenient Accuracy']],
               fill_color='lavender', align='left')
), row=4, col=1)

# Update the layout
fig.update_layout(
    title='Prediction Scores Dashboard',
    height=1200
)

# Display the dashboard
fig.show()

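### Lastly, we create a dashboard with 4 rows and 3 columns, one row per file group (v2, v2 improved, URG, and URG improved)
### We add the tables to the first column
### We also add bar charts comparing the strict and lenient accuracy (second column) and the P1, P5, and P0 counts (third column) for each file group
### We update the layout and display the dashboard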
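### Every row of the grid in the next cell pairs a 'domain' cell (which can hold a go.Table) with two 'xy' cells (which hold Cartesian traces such as go.Bar). A minimal sketch, assuming a hypothetical helper `dashboard_grid`, that generates the same subplot titles and specs from the list of group names instead of writing them out by hand:

```python
def dashboard_grid(group_names):
    """Return (subplot_titles, specs) for one row per group and three columns."""
    titles = []
    specs = []
    for name in group_names:
        # Column order: table, accuracy bars, score bars
        titles += [f'{name} Files', f'{name} Accuracy', f'{name} Scores']
        # 'domain' hosts a go.Table; 'xy' hosts Cartesian traces such as go.Bar
        specs.append([{'type': 'domain'}, {'type': 'xy'}, {'type': 'xy'}])
    return tuple(titles), specs


titles, specs = dashboard_grid(['v2', 'v2 Improved', 'URG', 'URG Improved'])
print(len(specs), len(specs[0]))  # 4 3
print(titles[:3])                 # ('v2 Files', 'v2 Accuracy', 'v2 Scores')
```

### The result could then be passed as `make_subplots(rows=len(specs), cols=3, subplot_titles=titles, specs=specs)`.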

In [5]:
# Create subplots with 4 rows and 3 columns
fig = make_subplots(rows=4, cols=3, subplot_titles=('v2 Files', 'v2 Accuracy', 'v2 Scores',
                                                    'v2 Improved Files', 'v2 Improved Accuracy', 'v2 Improved Scores',
                                                    'URG Files', 'URG Accuracy', 'URG Scores',
                                                    'URG Improved Files', 'URG Improved Accuracy', 'URG Improved Scores'),
                    specs=[[{'type': 'domain'}, {'type': 'xy'}, {'type': 'xy'}],
                           [{'type': 'domain'}, {'type': 'xy'}, {'type': 'xy'}],
                           [{'type': 'domain'}, {'type': 'xy'}, {'type': 'xy'}],
                           [{'type': 'domain'}, {'type': 'xy'}, {'type': 'xy'}]])

# Add the tables to the respective subplots
fig.add_trace(go.Table(
    header=dict(values=list(df_v2.columns), fill_color='paleturquoise', align='left'),
    cells=dict(values=[df_v2.File, df_v2.P1, df_v2.P5, df_v2.P0, df_v2['Strict Accuracy'], df_v2['Lenient Accuracy']],
               fill_color='lavender', align='left')
), row=1, col=1)

fig.add_trace(go.Table(
    header=dict(values=list(df_v2_improved.columns), fill_color='paleturquoise', align='left'),
    cells=dict(values=[df_v2_improved.File, df_v2_improved.P1, df_v2_improved.P5, df_v2_improved.P0, df_v2_improved['Strict Accuracy'], df_v2_improved['Lenient Accuracy']],
               fill_color='lavender', align='left')
), row=2, col=1)

fig.add_trace(go.Table(
    header=dict(values=list(df_URG.columns), fill_color='paleturquoise', align='left'),
    cells=dict(values=[df_URG.File, df_URG.P1, df_URG.P5, df_URG.P0, df_URG['Strict Accuracy'], df_URG['Lenient Accuracy']],
               fill_color='lavender', align='left')
), row=3, col=1)

fig.add_trace(go.Table(
    header=dict(values=list(df_URG_improved.columns), fill_color='paleturquoise', align='left'),
    cells=dict(values=[df_URG_improved.File, df_URG_improved.P1, df_URG_improved.P5, df_URG_improved.P0, df_URG_improved['Strict Accuracy'], df_URG_improved['Lenient Accuracy']],
               fill_color='lavender', align='left')
), row=4, col=1)

# Add accuracy and score bar charts for each file group, one row per group (same order as the tables)
groups = [df_v2, df_v2_improved, df_URG, df_URG_improved]
for row, df_group in enumerate(groups, start=1):
    # Add legend entries only from the first row; legendgroup ties same-named traces across rows
    show_legend = row == 1

    # Accuracy comparison (second column)
    for metric, color in [('Strict Accuracy', 'blue'), ('Lenient Accuracy', 'green')]:
        fig.add_trace(go.Bar(x=df_group['File'], y=df_group[metric], name=metric, marker_color=color,
                             legendgroup=metric, showlegend=show_legend), row=row, col=2)

    # Score comparison (third column)
    for score, color in [('P1', 'red'), ('P5', 'orange'), ('P0', 'purple')]:
        fig.add_trace(go.Bar(x=df_group['File'], y=df_group[score], name=score, marker_color=color,
                             legendgroup=score, showlegend=show_legend), row=row, col=3)

# Update the layout
fig.update_layout(
    title='Prediction Scores Dashboard',
    height=1600,
    showlegend=True
)

# Display the dashboard
fig.show()