# Analysis of Results

This notebook analyzes the results of the machine learning models, using the performance measurements obtained during the induction phase. The model with the best performance is then selected for a more detailed evaluation across different areas and units.<br>  
**Source file:** select_202425091103-[GPT, Llama, etc].pickle<br>  

In [1]:
import logging
logging.basicConfig(level=logging.WARNING)

In [2]:
cnpq = ['cnpq_area_level_1',
        'cnpq_area_level_2',
        'cnpq_area_level_3',
        'cnpq_area_level_4']

## Reading model performance files

In [3]:
import gzip, os, pickle

In [4]:
files = [f for f in os.listdir('../results') if f.startswith('select_202425091103')]
files.sort()

In [5]:
performances, configs = [], []
for file in files:
    with gzip.open('../results/' + file, 'rb') as handle:
        performances.append(pickle.load(handle))
        configs.append(file.replace('select_202425091103-', '').replace('.pickle', '').split())

## Analyzing Model Performance

In [6]:
import pandas as pd
pd.set_option('display.max_rows', None)

In [7]:
results = [
    {'Transformador': config[1] if len(config) == 4 else config[0], 
     'Estratégia': config[2] if len(config) == 4 else config[1], 
     'Classificador': config[3] if len(config) == 4 else config[2]}
    for config in configs
]

In [8]:
df = pd.concat([pd.DataFrame(results), pd.DataFrame(performances)], axis=1)

In [9]:
df

|   | Transformador | Estratégia | Classificador | F1-score | Precision | Recall | Level 0 | Level 1 | Level 2 | Level 3 |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | - | LCPPN | BERT | 0.558247 | 0.531882 | 0.587363 | 0.779392 | 0.557961 | 0.352699 | 0.05755 |
| 1 | - | LCPPN | GPT | 0.339037 | 0.328959 | 0.349751 | 0.591145 | 0.308737 | 0.301771 | 0.806375 |
| 2 | - | LCPPN | Llama | 0.33114 | 0.312534 | 0.352102 | 0.535891 | 0.310862 | 0.339906 | 0.828571 |
| 3 | BoW | FLAT | DT | 0.416853 | 0.41732 | 0.416387 | 0.65242 | 0.369067 | 0.372373 | 0.8183 |
| 4 | BoW | FLAT | NB | 0.43328 | 0.434143 | 0.43242 | 0.662102 | 0.386541 | 0.391263 | 0.82137 |
| 5 | BoW | FLAT | RF | 0.547926 | 0.543603 | 0.552318 | 0.752184 | 0.521133 | 0.509563 | 0.845691 |
| 6 | BoW | FLAT | SVM | 0.47949 | 0.475002 | 0.484064 | 0.676033 | 0.444746 | 0.467532 | 0.831169 |
| 7 | BoW | LCPL | DT | 0.387337 | 0.376813 | 0.398466 | 0.671901 | 0.332231 | 0.342739 | 0.81039 |
| 8 | BoW | LCPL | NB | 0.360524 | 0.341081 | 0.382318 | 0.604486 | 0.32255 | 0.357497 | 0.823259 |
| 9 | BoW | LCPL | RF | 0.531616 | 0.513979 | 0.550507 | 0.755372 | 0.514994 | 0.505549 | 0.847107 |


## Selecting the best model

In [10]:
df.loc[df['F1-score'].idxmax()]

Transformador       TFIDF
Estratégia           LCPN
Classificador         SVM
F1-score         0.597351
Precision        0.573664
Recall           0.623078
Level 0          0.801417
Level 1          0.591027
Level 2          0.575325
Level 3          0.877568
Name: 46, dtype: object

## Testing the best model

In [11]:
import sys
sys.path.append(os.path.abspath('../src'))

In [12]:
from svm import SVM
from embedding import Normalizer, TFIDF


In [13]:
#TFIDF-LCPN-SVM
with gzip.open('../models/' + files[46], 'rb') as handle:
    model = pickle.load(handle)

In [14]:
# Expected label: Ciências da Saúde;Saúde Coletiva;Epidemiologia
model.predict(['Events preceding death among chikungunya virus infected patients: a systematic review Since its re-emergence in the late 1990s, there have been reports of Chikungunya fever (CHIK-F) presenting with severe or atypical findings. There is little knowledge regarding the clinical events leading to the death of patients with CHIK-F. This study aimed to systematically review the literature regarding CHIK-F and identify clinical features preceding death. We searched PubMed, Scopus, Embase, Lilacs, and IsiWeb for case-reports, case-series, or cohorts of CHIK-F reporting at least one death, up to December 2019. Fifty-seven reports were analyzed, including 2140 deaths. Data about specific clinical events that precede death are scarce. The central tendency of time between disease onset and death ranged from 2 days to 150 days. The most common clinical findings among decedents were fever (22.0%), arthralgia (15.7%), myalgia (10.7%), and headache (8.2%). Excluding pediatric populations, the reported central tendency of age among the decedents was 53 or older, with a non-weighted median of 67, ranging up to 80 years old. Authors mentioned organic dysfunction in 91.2% reports. Among all the 2140 decedents, the most common dysfunctions were cardiovascular (7.2%), respiratory (6.4%), neurological (5.4%), renal (4.2%), liver (3.0%), and hematological (1.3%) dysfunction. Exacerbation of previous diabetes (5.6%) or hypertension (6.9%) was mentioned as conditions preceding death. Currently, older age, primary neurological, cardiovascular, or respiratory dysfunction and a previous diagnosis of diabetes or hypertension are the main clinical events preceding death Chikungunya fever,Chikungunya Fever,Chikungunya virus,Death,Disease progression,Disease Progression,Mortality Journal of the Brazilian Society of Tropical Medicine Infectious Diseases ; Microbiology  ; Parasitology Immunology and Microbiology; Medicine'])

array([['Ciências da Saúde', 'Medicina', 'Clínica Médica',
        'Doenças Infecciosas e Parasitárias']], dtype='<U228')

In [15]:
# Expected label: Ciências da Saúde;Medicina;Saúde Materno-Infantil
model.predict(['Does pregabalin act in pain control after lateral pharyngoplasties and tonsillectomies? A pilot study Objective Some studies have pointed to gabapentinoids as promising medications in postoperative pain control. The objective of the present study was to evaluate the efficacy of pregabalin in reducing postoperative pain in tonsillectomy and lateral pharyngoplasties. Study design Double-blind randomized controlled trial. Setting Tertiary care center. Methods A double-blind randomized controlled trial was conducted with patients undergoing tonsillectomies and lateral pharyngoplasties between Aug 29, 2017, and Oct 31, 2020. Data of interest such as opioid consumption, pain scores, and adverse outcomes such as dizziness, nausea, headache, and sedation within 7 days following surgeries were analyzed. Results No statistically significant difference was observed in pain scores and opioid consumption between the groups studied in the pilot project. The use of pregabalin was associated with lower incidence of dizziness compared to controls. Conclusion Gabapentinoids, especially pregabalin, are drugs whose potential for controlling pain after pharyngeal surgery, such as tonsillectomy and sleep apnea surgery, still needs to be more fully evaluated. After the conclusion of the present study, we hope to answer this question about the role of pregabalin in oropharyngeal surgeries Pain,Postoperative,Pregabalin,Tonsillectomy Sleep and Breathing Otorhinolaryngology ; Neurology Medicine'])

array([['Ciências da Saúde', 'Medicina', 'Saúde Materno-Infantil', '']],
      dtype='<U228')

In [16]:
# Expected label: Ciências Biológicas;Parasitologia
model.predict(['Current understancing of the Trypanosoma cruzi-cardiomyocyte interaction Trypanosoma cruzi, the etiological agent of Chagas disease, exhibits multiple strategies to ensure its establishment and persistence in the host. Although this parasite has the ability to infect different organs, heart impairment is the most frequent clinical manifestation of the disease. Advances in knowledge of T cruzi-cardiomyocyte interactions have contributed to a better understanding of the biological events involved in the pathogenesis of Chagas disease. This brief review focuses on the current understanding of molecules involved in T cruzi-cardiomyocyte recognition, the mechanism of invasion, and on the effect of intracellular development of T cruzi on the structural organization and molecular response of the target cell Apoptosis,Apoptosis,Cardiomyocyte,Cardiomyocyte,Cell junction,Cell recognition,Cytoskeleton,Endocytosis,Endocytosis,Extracel-lular matrix,Extracellular matrix,Cell junction,Extracellular matrix,Cell recognition,Cell recognition,T. cruzi,Trypanosoma cruzi,Trypanosoma cruzi, Frontiers in Immunology Immunology ; Immunology and Allergy Immunology and Microbiology; Medicine'])

array([['Ciências Biológicas', 'Parasitologia',
        'Protozoologia de Parasitos', 'Protozoologia Parasitária Humana']],
      dtype='<U228')

In [17]:
# Expected label: Ciências Exatas e da Terra;Química;Química Orgânica
model.predict(['Antimycobacterial Profile of 5-phenyl-1,3,4-thiadiazole-2-arylhydrazone Derivatives In this work we report the tuberculostatic profile of a series of 5-phenyl-1,3,4-thiadiazole-2-arylhydrazone derivatives (1a-m). The evaluation of their in vitro antibacterial activity against Mycobacterium tuberculosis H37Rv was expressed as the minimum inhibitory concentration (MIC) in mu g/mL. The compounds 1g and 1h exhibited inhibitory activity of 6.25 mu g/mL and 1.25 mu g/mL respectively, and can be considered as a good start point for the discovery of new lead compounds in the field of multi-drug resistant tuberculosis 1,3,4-thiadiazoles,Antimycobacterial activity,N-arylhydrazones,Tuberculosis Letters in drug Design & Discovery Pharmaceutical Science ; Drug Discovery ; Molecular Medicine Biochemistry, Genetics and Molecular Biology; Pharmacology, Toxicology and Pharmaceutics'])

array([['Ciências Biológicas', 'Microbiologia', 'Microbiologia Aplicada',
        'Microbiologia Médica']], dtype='<U228')

In [18]:
# Expected label: Ciências Biológicas;Genética;Genética Molecular e de Microorganismos
model.predict(['Gene regulatory network inference and analysis of multidrug-resistant Pseudomonas aeruginosa BACKGROUND Healthcare-associated infections caused by bacteria such as Pseudomonas aeruginosa are a major public health problem worldwide. Gene regulatory networks (GRN) computationally represent interactions among regulatory genes and their targets. They are an important approach to help understand bacterial behaviour and to provide novel ways of overcoming scientific challenges, including the identification of potential therapeutic targets and the development of new drugs. OBJECTIVES The goal of this study was to reconstruct the multidrug-resistant (MDR) P. aeruginosa GRN and to analyse its topological properties. METHODS The methodology used in this study was based on gene orthology inference using the reciprocal best hit method. We used the genome of P. aeruginosa CCBH4851 as the basis of the reconstruction process. This MDR strain is representative of the sequence type 277, which was involved in an endemic outbreak in Brazil. FINDINGS We obtained a network with a larger number of regulatory genes, target genes and interactions as compared to the previously reported network. Topological analysis results are in accordance with the complex network representation of biological processes. MAIN CONCLUSIONS The properties of the network were consistent with the biological features of P. aeruginosa. To the best of our knowledge, the P. aeruginosa GRN presented here is the most complete version available to date Biologia Computacional,Biologia de Sistemas,Gene regulatory network,Multid rug resistance,Multidrug resistance,Pseudomonas aeruginosa Memories of the Oswaldo Cruz Institute Medicine  ; Microbiology Medicine'])

array([['Ciências Biológicas', 'Genética',
        'Genética Molecular e de Microorganismos', '']], dtype='<U228')

In [19]:
# Expected label: Engenharias;Engenharia Sanitária;Saneamento Ambiental;Qualidade do Ar, das Águas e do Solo
model.predict(["Regulation of the synthetic estrogen 17α-ethinylestradiol in water bodies in Europe, the United States, and Brazil The synthetic estrogen 17 alpha-ethinylestradiol, the principal component of oral contraceptives, has been identified as one of the main compounds accounting for adverse effects on the endocrine system in various species. This study aimed to analyze the state-of-the-art in legislation and guidelines for the control of this synthetic estrogen in water bodies in Europe and the United States and to draw a parallel with the Brazilian reality. Countries have generally attempted to expand the regulation and monitoring of certain emerging micropollutants not previously covered by legislation. Europe is more advanced in terms of water quality, while in the United States this estrogen is only regulated in water for human consumption. Brazil still lacks legal provisions or standards for this estrogen, which can be explained by the relatively limited maturity of the country's system for controlling water pollutants Water Quality Criteria,Water Quality Criteria,Water Quality Criteria,Water Quality Criteria,Endocrine Disruptors,Endocrine Disruptors,Endocrine Disruptors,Endocrine Disruptors,Endocrine Disruptors,Ethinyl Estradiol,Ethinyl Estradiol,Ethinylestradiol,Water quality criteria ,Water Quality Criteria Public Health Notebooks Medicine  ; Public Health, Environmental and Occupational Health Medicine"])

array([['Ciências da Saúde', 'Saúde Coletiva', 'Saúde Pública', '']],
      dtype='<U228')
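
The comment above each `model.predict` call appears to give the expected CNPq path, so each prediction can be checked level by level. The sketch below is a minimal illustration under that assumption, with levels separated by `;`. `matching_depth` is a hypothetical helper and not part of the project code.

```python
def matching_depth(expected, predicted):
    """Count how many consecutive levels, starting at the root, agree."""
    depth = 0
    for exp, pred in zip(expected.split(';'), predicted):
        if exp != pred:
            break
        depth += 1
    return depth

# First and second examples from the cells above.
epidemiology = matching_depth(
    'Ciências da Saúde;Saúde Coletiva;Epidemiologia',
    ['Ciências da Saúde', 'Medicina', 'Clínica Médica',
     'Doenças Infecciosas e Parasitárias'])
maternal = matching_depth(
    'Ciências da Saúde;Medicina;Saúde Materno-Infantil',
    ['Ciências da Saúde', 'Medicina', 'Saúde Materno-Infantil', ''])
print(epidemiology, maternal)  # 1 3
```

By this measure the first example agrees only on the broad area, while the second matches the expected path on all three levels.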

## Analyzing overall performance

In [20]:
import numpy as np
from collections import defaultdict
from sklearn.model_selection import train_test_split

In [21]:
# %load ../src/evaluate.py
from hiclass.metrics import precision, recall, f1

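def accuracy_class(y_true, y_pred, level):
    """Return, for each true class at `level`, the fraction of its samples predicted correctly."""
    total, hits = defaultdict(int), defaultdict(int)

    for t, p in zip(y_true, y_pred):
        total[t[level]] += 1
        if t[level] == p[level]:
            hits[t[level]] += 1

    return {classe: hits[classe] / total[classe] for classe in total}

def accuracy_unit(units, true, pred, level):
    """Group the true and predicted labels at `level` by unit.

    Returns a list of (unit, true_vals, pred_vals) tuples; no accuracy is computed here.
    """
    acc = []
    for unit in set(units):
        true_vals = [t[level] for u, t in zip(units, true) if u == unit]
        pred_vals = [p[level] for u, p in zip(units, pred) if u == unit]
        acc.append((unit, true_vals, pred_vals))

    return acc

def accuracy_level(y_true, y_pred, level):
    """Fraction of samples predicted correctly at `level`; samples whose true label is '' count as misses."""
    acc = [(1 if true[level] != '' and true[level] == pred[level] else 0) for true, pred in zip(y_true, y_pred)]
    return sum(acc)/len(acc)

def flatly(y_true, y_pred):
    """Accuracy at each of the four hierarchy levels."""
    return {'Level ' + str(level) : accuracy_level(y_true, y_pred, level) for level in range(4)}

def hierarchy(y_true, y_pred, type='micro'):
    """Hierarchical F1, precision and recall from hiclass (micro-averaged by default)."""
    return {'F1-score': f1(y_true, y_pred, type),
            'Precision': precision(y_true, y_pred, type),
            'Recall': recall(y_true, y_pred, type)}

def performance(y_true, y_pred):
    """Merge the hierarchical and per-level metrics into one dict (the `|` operator needs Python 3.9+)."""
    return hierarchy(y_true, y_pred) | flatly(y_true, y_pred)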
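
To make the behaviour of the flat metrics concrete, here is a small self-contained sketch that copies `accuracy_level` and `accuracy_class` from the cell above and runs them on toy data. The labels (`A`, `A1`, and so on) are hypothetical and exist only for illustration.

```python
from collections import defaultdict

# Copies of the notebook's helpers so the example runs on its own.
def accuracy_level(y_true, y_pred, level):
    acc = [(1 if t[level] != '' and t[level] == p[level] else 0)
           for t, p in zip(y_true, y_pred)]
    return sum(acc) / len(acc)

def accuracy_class(y_true, y_pred, level):
    total, hits = defaultdict(int), defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t[level]] += 1
        if t[level] == p[level]:
            hits[t[level]] += 1
    return {c: hits[c] / total[c] for c in total}

# Hypothetical two-level labels; '' marks a path that stops early.
toy_true = [['A', 'A1'], ['A', ''], ['B', 'B1'], ['B', 'B2']]
toy_pred = [['A', 'A1'], ['A', ''], ['B', 'B2'], ['A', 'A1']]

level0 = accuracy_level(toy_true, toy_pred, 0)      # 3 of 4 samples correct
level1 = accuracy_level(toy_true, toy_pred, 1)      # only ['A', 'A1'] counts
per_class0 = accuracy_class(toy_true, toy_pred, 0)  # accuracy per level-0 class
per_class1 = accuracy_class(toy_true, toy_pred, 1)  # includes the '' class
print(level0, level1, per_class0, per_class1)
```

The two functions treat empty labels differently. `accuracy_level` counts the second sample as a miss at level 1 because its true label is `''`, whereas `accuracy_class` reports the `''` class as fully correct because the prediction is also empty.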

In [22]:
df = pd.read_csv('../input/select_202425091103-translated.csv', dtype=str, na_filter=False)

In [23]:
true = df[cnpq].values.tolist()

In [None]:
pred = model.predict(df['all'])

In [None]:
performance(true, pred)
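
The `F1-score`, `Precision`, and `Recall` entries come from `hiclass.metrics`. As a rough illustration of what micro-averaged hierarchical metrics measure, the sketch below counts label overlap along each path and ignores empty levels. It is a simplified pure-Python version written under that assumption, not the hiclass code, and the labels are hypothetical.

```python
def hierarchical_micro(y_true, y_pred):
    """Micro-averaged hierarchical precision, recall and F1 over label paths.

    Assumes at least one non-empty true label and one non-empty predicted label.
    """
    overlap = n_pred = n_true = 0
    for t, p in zip(y_true, y_pred):
        t_set = set(t) - {''}   # drop empty levels from the true path
        p_set = set(p) - {''}   # drop empty levels from the predicted path
        overlap += len(t_set & p_set)
        n_pred += len(p_set)
        n_true += len(t_set)
    precision = overlap / n_pred
    recall = overlap / n_true
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Hypothetical three-level paths.
toy_true = [['A', 'A1', ''], ['B', 'B1', 'B1x']]
toy_pred = [['A', 'A1', 'A1x'], ['B', 'B2', 'B2x']]
p, r, f = hierarchical_micro(toy_true, toy_pred)
print(round(p, 3), round(r, 3), round(f, 3))
```

Here three labels overlap out of six predicted and five true, so precision is lower than recall. Predicting deeper than the true path lowers precision, while stopping short of it lowers recall.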

## Analyzing performance by level

In [26]:
import matplotlib.pyplot as plt

### Broad areas

In [27]:
data = accuracy_class(true, pred, 0)

In [28]:
data = dict(sorted(data.items(), key=lambda item: item[1], reverse=True))

In [None]:
fig, ax = plt.subplots(figsize=(15, 10))
ax.bar(data.keys(), data.values(), color='skyblue')

ax.set_ylabel('Accuracy')
ax.set_title('Accuracy by Broad Area')

ax.set_xticks(range(len(data)))
ax.set_xticklabels(data.keys(), rotation=45, ha='right')

plt.tight_layout()
plt.show()

### Areas

In [30]:
data = accuracy_class(true, pred, 1)

In [31]:
data = dict(sorted(data.items(), key=lambda item: item[1], reverse=True))

In [None]:
fig, ax = plt.subplots(figsize=(20, 10))
ax.bar(data.keys(), data.values(), color='skyblue')

ax.set_ylabel('Accuracy')
ax.set_title('Accuracy by Area')

ax.set_xticks(range(len(data)))
ax.set_xticklabels(data.keys(), rotation=45, ha='right')

plt.tight_layout()
plt.show()

### Subareas

In [33]:
data = accuracy_class(true, pred, 2)

In [34]:
data = dict(sorted(data.items(), key=lambda item: item[1], reverse=True))

In [None]:
fig, ax = plt.subplots(figsize=(30, 10))
ax.bar(data.keys(), data.values(), color='skyblue')

ax.set_ylabel('Accuracy')
ax.set_title('Accuracy by Subarea')

ax.set_xticks(range(len(data)))
ax.set_xticklabels(data.keys(), rotation=45, ha='right')

plt.tight_layout()
plt.show()
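
The introduction also mentions evaluation by unit, and `accuracy_unit` groups labels by unit without computing a score. The sketch below shows one way to reduce its output to an accuracy per unit. `unit_accuracy` is a hypothetical helper, and the unit names and labels are invented, since the notebook does not show which column holds the unit.

```python
# Copy of the notebook's helper so the example runs on its own.
def accuracy_unit(units, true, pred, level):
    acc = []
    for unit in set(units):
        true_vals = [t[level] for u, t in zip(units, true) if u == unit]
        pred_vals = [p[level] for u, p in zip(units, pred) if u == unit]
        acc.append((unit, true_vals, pred_vals))
    return acc

def unit_accuracy(units, true, pred, level):
    """Reduce accuracy_unit's grouped labels to one accuracy per unit."""
    return {unit: sum(t == p for t, p in zip(tv, pv)) / len(tv)
            for unit, tv, pv in accuracy_unit(units, true, pred, level)}

# Hypothetical units and single-level labels.
toy_units = ['U1', 'U1', 'U2']
toy_true = [['A'], ['B'], ['A']]
toy_pred = [['A'], ['A'], ['A']]
by_unit = unit_accuracy(toy_units, toy_true, toy_pred, 0)
print(by_unit)
```

The resulting dictionary can be sorted and plotted in the same way as the per-class accuracies above.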