Notebook for running tests

In [3]:
# ======= Importing libraries ======= #
# general
import pandas as pd

# NLP, deep learning
import torch
import torch.nn as nn
from sklearn.manifold import TSNE
from sklearn.preprocessing import normalize
from sentence_transformers import SentenceTransformer
from classes.Transformador import Transformador

# visualization
import matplotlib.pyplot as plt
from tqdm import tqdm # This will make a progress bar for us
import plotly.express as px


In [4]:
# Load the dataset
df = pd.read_csv('data/arxiv3.csv')

df = df.drop(columns=['Unnamed: 0', 'update_date'])

df.head()

Unnamed: 0,title,authors,abstract,categories
0,Bosonic characters of atomic Cooper pairs acro...,Y. H. Pong and C. K. Law,We study the two-particle wave function of p...,cond-mat.mes-hall
1,Measurement of the Hadronic Form Factor in D0 ...,"The BABAR Collaboration, B. Aubert, et al",The shape of the hadronic form factor f+(q2)...,hep-ex
2,Spectroscopic Properties of Polarons in Strong...,A. S. Mishchenko (1 and 2) and N. Nagaosa (1 a...,We present recent advances in understanding ...,cond-mat.str-el cond-mat.stat-mech
3,Tuning correlation effects with electron-phono...,J.P.Hague and N.d'Ambrumenil,We investigate the effect of tuning the phon...,cond-mat.str-el
4,Convergence of the discrete dipole approximati...,"Maxim A. Yurkin, Valeri P. Maltsev, Alfons G. ...",We performed a rigorous theoretical converge...,physics.optics physics.comp-ph


In [8]:
class Finder():
    def __init__(self, df, embeddings_matrix, model, transformador, tuned_matrix):
        self.df = df
        self.embeddings_matrix = embeddings_matrix
        self.tuned_matrix = tuned_matrix
        self.model = model
        self.transformador = transformador

    def predict_query(self, query, tuned=False, limit=0.5):
        query_processed = self.model.encode([query])

        if tuned:
            query_processed = self.transformador(torch.tensor(query_processed))[1].detach().numpy()
            embeddings_matrix_ = normalize(self.tuned_matrix)
        else:
            embeddings_matrix_ = normalize(self.embeddings_matrix)

        query_processed_ = normalize(query_processed.reshape(1, -1))

        # Rows and query are both L2-normalized, so this product is the cosine similarity
        R = embeddings_matrix_ @ query_processed_.T

        df_ = self.df.copy()
        relevance = R.flatten()
        df_["relevance"] = relevance

        df_filtered = df_[relevance > limit]
        df_final = df_filtered.sort_values("relevance", ascending=False)

        # Select the columns of interest
        df_final = df_final[['title', 'abstract', 'relevance']]

        # print the top 10 abstracts
        tam = min(10, len(df_final))
        for i in range(tam):
            print(df_final['abstract'].iloc[i])
            print('-----------------------------------')
            
        return df_final.head(10)
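
The ranking step inside `predict_query` can be checked in isolation. What follows is a minimal sketch, not the notebook's own code: it uses plain NumPy instead of `sklearn.preprocessing.normalize`, and the names `cosine_rank` and `toy_docs` are hypothetical, introduced only for illustration. It assumes no embedding row is the zero vector.

```python
import numpy as np

def cosine_rank(doc_embeddings, query_embedding, limit=0.5, top_k=10):
    """Return (indices, scores) of rows whose cosine similarity to the
    query exceeds `limit`, ordered from most to least similar."""
    docs = np.asarray(doc_embeddings, dtype=float)
    query = np.asarray(query_embedding, dtype=float).reshape(1, -1)

    # L2-normalize rows so that a dot product equals cosine similarity.
    # Assumption: no row has zero norm (this would divide by zero).
    docs = docs / np.linalg.norm(docs, axis=1, keepdims=True)
    query = query / np.linalg.norm(query, axis=1, keepdims=True)

    scores = (docs @ query.T).ravel()

    # Keep only rows above the threshold, then sort them by descending score.
    keep = np.flatnonzero(scores > limit)
    order = keep[np.argsort(-scores[keep])][:top_k]
    return order, scores[order]

# Hypothetical toy data: four 2-D "embeddings" and a query along the x-axis.
toy_docs = np.array([[1.0, 0.0], [1.0, 1.0], [0.0, 1.0], [-1.0, 0.0]])
idx, sims = cosine_rank(toy_docs, [2.0, 0.0], limit=0.5)
print(idx, sims)
```

With these toy vectors only the first two rows pass the 0.5 threshold, which mirrors how `limit` filters the DataFrame before the top 10 rows are returned.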




In [11]:
embeddings = torch.load('model_embedding/embeddings_bert.pt')
X_tuned = torch.load('model_embedding/embeddings_transformados.pt')
model = SentenceTransformer('all-MiniLM-L6-v2')
model.load_state_dict(torch.load('model_embedding/modelo.pth'))
transformador = Transformador(
    n_inputs=384,
    n_hidden=200
)
transformador.load_state_dict(torch.load('model_embedding/transformador.pth'))

finder = Finder(df, embeddings, model, transformador, X_tuned)


In [15]:
finder.predict_query('neural network', tuned=True, limit=0.7)

  An associative memory model and a neural network model with a Mexican-hat
type interaction are the two most typical attractor networks used in the
artificial neural network models. The associative memory model has discretely
distributed fixed-point attractors, and achieves a discrete information
representation. On the other hand, a neural network model with a Mexican-hat
type interaction uses a line attractor to achieves a continuous information
representation, which can be seen in the working memory in the prefrontal
cortex and columnar activity in the visual cortex. In the present study, we
propose a neural network model that achieves discrete and continuous
information representation. We use a statistical-mechanical analysis to find
that a localized retrieval phase exists in the proposed model, where the memory
pattern is retrieved in the localized subpopulation of the network. In the
localized retrieval phase, the discrete and continuous information
representation is achieved by 

Unnamed: 0,title,abstract,relevance
9979,Neural network model with discrete and continu...,An associative memory model and a neural net...,0.808772
5521,Learning of correlated patterns by simple perc...,Learning behavior of simple perceptrons is a...,0.804218
16765,Theory and modeling of the magnetic field meas...,The magnetic diagnostics subsystem of the LI...,0.797617
7466,A Growing Self-Organizing Network for Reconstr...,"Self-organizing networks such as Neural Gas,...",0.786276
227,A balanced memory network,A fundamental problem in neuroscience is und...,0.78016
12539,Modeling Connectivity in Terms of Network Acti...,A new complex network model is proposed whic...,0.777826
16033,Galaxy Zoo: Reproducing Galaxy Morphologies Vi...,We present morphological classifications obt...,0.776197
3732,Optimal network topologies for information tra...,This work clarifies the relation between net...,0.77326
2939,Dynamics of Neural Networks with Continuous At...,We investigate the dynamics of continuous at...,0.769132
194,Period-two cycles in a feed-forward layered ne...,The effects of dominant sequential interacti...,0.768802
