merge
leandro-driguez committed Dec 21, 2022
2 parents a2d2e9d + be54211 commit 22fee63
Showing 11 changed files with 238 additions and 68 deletions.
20 changes: 10 additions & 10 deletions docs/Marcos_part/Proyecto_Final_Sherlock_marcos_part.aux
@@ -56,22 +56,22 @@
\@writefile{toc}{\contentsline {subsection}{\numberline {5.2}Implementaci\'on.}{8}{}\protected@file@percent }
\@writefile{toc}{\contentsline {section}{\numberline {6}Evaluaci\'on de los modelos}{8}{}\protected@file@percent }
\citation{B1}
\@writefile{toc}{\contentsline {subsection}{\numberline {6.1}Modelo Booleano}{10}{}\protected@file@percent }
\@writefile{toc}{\contentsline {subsection}{\numberline {6.2}Modelo Vectorial}{11}{}\protected@file@percent }
\@writefile{toc}{\contentsline {subsection}{\numberline {6.3}Modelo Fuzzy}{12}{}\protected@file@percent }
\@writefile{toc}{\contentsline {subsection}{\numberline {6.1}Modelo Booleano}{9}{}\protected@file@percent }
\@writefile{toc}{\contentsline {subsection}{\numberline {6.2}Modelo Vectorial}{10}{}\protected@file@percent }
\citation{B2}
\@writefile{toc}{\contentsline {subsection}{\numberline {6.3}Modelo Fuzzy}{11}{}\protected@file@percent }
\citation{B6}
\citation{B1}
\@writefile{toc}{\contentsline {section}{\numberline {7}Agrupamiento}{14}{}\protected@file@percent }
\@writefile{toc}{\contentsline {subsection}{\numberline {7.1}K-means}{14}{}\protected@file@percent }
\@writefile{toc}{\contentsline {subsection}{\numberline {7.2}Objetivo perseguido}{15}{}\protected@file@percent }
\@writefile{toc}{\contentsline {subsection}{\numberline {7.3}Implementaci\'on}{15}{}\protected@file@percent }
\@writefile{toc}{\contentsline {section}{\numberline {7}Agrupamiento}{12}{}\protected@file@percent }
\@writefile{toc}{\contentsline {subsection}{\numberline {7.1}K-means}{13}{}\protected@file@percent }
\@writefile{toc}{\contentsline {subsection}{\numberline {7.2}Objetivo perseguido}{13}{}\protected@file@percent }
\@writefile{toc}{\contentsline {subsection}{\numberline {7.3}Implementaci\'on}{13}{}\protected@file@percent }
\bibcite{B1}{1}
\@writefile{toc}{\contentsline {subsection}{\numberline {7.4}Resultados}{14}{}\protected@file@percent }
\bibcite{B2}{2}
\bibcite{B3}{3}
\bibcite{B4}{4}
\bibcite{B5}{5}
\@writefile{toc}{\contentsline {subsection}{\numberline {7.4}Resultados}{16}{}\protected@file@percent }
\@writefile{toc}{\contentsline {section}{\numberline {8}Conclusiones y trabajo futuro}{16}{}\protected@file@percent }
\bibcite{B6}{6}
\gdef \@abspage@last{20}
\@writefile{toc}{\contentsline {section}{\numberline {8}Conclusiones y trabajo futuro}{15}{}\protected@file@percent }
\gdef \@abspage@last{18}
Binary file modified docs/Marcos_part/Proyecto_Final_Sherlock_marcos_part.pdf
Binary file not shown.
Binary file modified docs/Marcos_part/Proyecto_Final_Sherlock_marcos_part.synctex.gz
Binary file not shown.
22 changes: 8 additions & 14 deletions docs/Marcos_part/Proyecto_Final_Sherlock_marcos_part.tex
@@ -334,23 +334,18 @@

The Boolean model can be hard for inexperienced users, because it is difficult to express an information need as a logical expression in the query. It does not provide a ranking, so the documents shown first may not be the ones of greatest interest to the user. It does not establish any correlation between terms, so it can return documents that satisfy the query but are not the most precise matches for it. Moreover, because it treats all terms as equally important, which is not the case in practice, and only allows exact matches, it may fail to return documents that are relevant to the required information even though, for example, they are missing one term.

The test query sets of both \emph{Cranfield} and \emph{Vaswani} are not designed to be applied to a Boolean model. This is apparent from the simple fact that they are sentences and not logical expressions.
The test query sets of \emph{Cranfield}, \emph{Vaswani} and \emph{Cord19} are not designed to be applied to a Boolean model. This is apparent from the simple fact that they are sentences and not logical expressions. For this reason, since the queries are so long, contain no operators, and all of their terms are assumed to be required, in most cases no search results are obtained.


\begin{center}
\includegraphics[width=10cm]{cranfield_boolean}

\includegraphics[width=10cm]{vaswani_boolean}

\includegraphics[width=10cm]{cord19_boolean}
\end{center}
% \begin{center}
% \includegraphics[width=10cm]{boolean}
% \end{center}
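
The effect described above, where long sentence-like queries return nothing once every term is required, can be illustrated with a minimal sketch. It assumes a plain inverted index and an implicit AND between whitespace-separated terms; the function names and the toy collection are hypothetical and are not taken from the project's `BooleanModel` or from the test corpora.

```python
from collections import defaultdict


def build_index(docs):
    """Map each lowercase term to the set of ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def and_query(index, query):
    """Return the ids of documents that contain every term of the query."""
    terms = query.lower().split()
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        # Intersect with the postings of each further term (implicit AND).
        result &= index.get(term, set())
    return result


# Hypothetical toy collection, not taken from Cranfield, Vaswani or Cord19.
docs = {
    1: "boundary layer flow over a flat plate",
    2: "heat transfer in laminar boundary layer",
    3: "supersonic flow past a wedge",
}
index = build_index(docs)

short_hits = and_query(index, "boundary layer")
long_hits = and_query(index, "what is the heat transfer in a supersonic boundary layer")
print(short_hits)  # {1, 2}
print(long_hits)   # set()
```

A single query word that is missing from a document (here "what" is missing from all of them) is enough to empty the result, which is why a full-sentence query rarely matches anything under this interpretation.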

\subsection{Vector Model}

The vector model does not perform well on domain-specific corpora. \emph{Cranfield}, \emph{Vaswani} and \emph{Cord19} are all domain-specific corpora, so the evaluation results are not very good.

\begin{center}
\includegraphics[width=10cm]{PR_plot(k=300).png}
\includegraphics[width=10cm]{PR_plot(k=300).png}
\end{center}

This plot shows how the model behaves on the 3 corpora, up to the first 300 documents.
@@ -374,9 +369,7 @@
The \emph{cord19} dataset could not be used with the fuzzy model because its number of documents and terms is very large, which makes computing the correlation between every pair of terms very resource-intensive. If this correlation is precomputed, more than $20$ GB of RAM would be needed according to the calculations made (which are not presented because they are rather informal approximations). If it is instead computed at query time, only for the terms of the query, each query would take more than $5$ minutes to run. Moreover, the fuzzy model has not been extensively tested in experiments with large document collections\footnote{This idea was taken from \cite{B2}, page $38$.}.

\begin{center}
\includegraphics[width=10cm]{cranfield_fuzzy}

\includegraphics[width=10cm]{vaswani_fuzzy}
\includegraphics[width=10cm]{fuzzy}
\end{center}
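
The text does not show how the figure of more than $20$ GB was obtained, so the following is only a back-of-the-envelope sketch of how an estimate of that order can arise. It assumes a dense term-term correlation matrix with 8-byte entries; the vocabulary size of 50,000 terms is a hypothetical value chosen for illustration, not a number reported for \emph{cord19}.

```python
def correlation_matrix_bytes(num_terms, bytes_per_entry=8, symmetric=False):
    """Memory needed to store a dense term-term correlation matrix."""
    entries = num_terms * num_terms
    if symmetric:
        # Store only the upper triangle, diagonal included.
        entries = num_terms * (num_terms + 1) // 2
    return entries * bytes_per_entry


# Hypothetical vocabulary size; the source does not state the real one.
num_terms = 50_000
full_gb = correlation_matrix_bytes(num_terms) / 1e9
half_gb = correlation_matrix_bytes(num_terms, symmetric=True) / 1e9
print(f"{full_gb:.1f} GB dense, {half_gb:.1f} GB upper triangle")
```

The quadratic growth is the point of the sketch: doubling the vocabulary quadruples the storage, and exploiting symmetry only halves it.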

\section{Clustering}
@@ -434,7 +427,8 @@

\section{Conclusions and Future Work}

This work presents a proposal for modeling the information retrieval problem. General aspects of the classical vector model were detailed, as well as design decisions specific to our own interpretation of the problem.



\begin{thebibliography}{20}
\bibitem{B1} Manning C. D.: \emph{An Introduction To Information Retrieval} (2009).
26 changes: 13 additions & 13 deletions docs/Marcos_part/Proyecto_Final_Sherlock_marcos_part.toc
@@ -1,7 +1,7 @@
\babel@toc {english}{}\relax
\babel@toc {spanish}{}\relax
\babel@toc {english}{}\relax
\babel@toc {spanish}{}\relax
\babel@toc {english}{}
\babel@toc {spanish}{}
\babel@toc {english}{}
\babel@toc {spanish}{}
\contentsline {section}{\numberline {1}Introducci\'on}{1}{}%
\contentsline {section}{\numberline {2}Dise\~no del sistema}{1}{}%
\contentsline {subsection}{\numberline {2.1}Documentos}{1}{}%
@@ -24,12 +24,12 @@
\contentsline {subsection}{\numberline {5.1}Descripci\'on del modelo usado}{7}{}%
\contentsline {subsection}{\numberline {5.2}Implementaci\'on.}{8}{}%
\contentsline {section}{\numberline {6}Evaluaci\'on de los modelos}{8}{}%
\contentsline {subsection}{\numberline {6.1}Modelo Booleano}{10}{}%
\contentsline {subsection}{\numberline {6.2}Modelo Vectorial}{11}{}%
\contentsline {subsection}{\numberline {6.3}Modelo Fuzzy}{12}{}%
\contentsline {section}{\numberline {7}Agrupamiento}{14}{}%
\contentsline {subsection}{\numberline {7.1}K-means}{14}{}%
\contentsline {subsection}{\numberline {7.2}Objetivo perseguido}{15}{}%
\contentsline {subsection}{\numberline {7.3}Implementaci\'on}{15}{}%
\contentsline {subsection}{\numberline {7.4}Resultados}{16}{}%
\contentsline {section}{\numberline {8}Conclusiones y trabajo futuro}{16}{}%
\contentsline {subsection}{\numberline {6.1}Modelo Booleano}{9}{}%
\contentsline {subsection}{\numberline {6.2}Modelo Vectorial}{10}{}%
\contentsline {subsection}{\numberline {6.3}Modelo Fuzzy}{11}{}%
\contentsline {section}{\numberline {7}Agrupamiento}{12}{}%
\contentsline {subsection}{\numberline {7.1}K-means}{13}{}%
\contentsline {subsection}{\numberline {7.2}Objetivo perseguido}{13}{}%
\contentsline {subsection}{\numberline {7.3}Implementaci\'on}{13}{}%
\contentsline {subsection}{\numberline {7.4}Resultados}{14}{}%
\contentsline {section}{\numberline {8}Conclusiones y trabajo futuro}{15}{}%
Binary file added docs/Marcos_part/boolean.png
Binary file added docs/Marcos_part/fuzzy.png
4 changes: 2 additions & 2 deletions src/main.py
@@ -2,13 +2,13 @@
from fastapi.middleware.cors import CORSMiddleware
from pydantic import BaseModel
from traitlets import FuzzyEnum
from models.kmeans_based_model import VectorModelKMEANS
# from models.kmeans_based_model import VectorModelKMEANS

from models.boolean_model import BooleanModel
from models.corpus import Corpus
from models.fuzzy_model import FuzzyModel
from models.vector_model import VectorModel
from models.kmeans_based_model import VectorModelKMEANS
# from models.kmeans_based_model import VectorModelKMEANS
from models.relevance_feedback import RelevanceFeedback
import dictdatabase as ddb

102 changes: 102 additions & 0 deletions src/plots_evaluation_meassures.py
@@ -0,0 +1,102 @@
from numpy import *
import math
import matplotlib.pyplot as plt
import sys
import dictdatabase as ddb

# t = linspace(0, 2*math.pi, 400)
# a = sin(t)
# b = cos(t)
# c = a + b

# plt.plot(t, a, 'r') # plotting t, a separately
# plt.plot(t, b, 'b') # plotting t, b separately
# plt.plot(t, c, 'g') # plotting t, c separately
# plt.show()

import numpy as np
import matplotlib
import json
import matplotlib.pyplot as plt
import os


def plot_eval_meassures(data, path, plot_color):
    """Plot one precision-recall curve per corpus and save the figure in each corpus folder."""
    fig = plt.figure()
    ax = plt.axes()

    plt.xlabel("Recall")
    plt.ylabel("Precision")

    for i in range(len(data)):
        # Collect (precision, recall) pairs, skipping the "max" entry.
        # A list is used so that points with equal precision are not overwritten.
        points = [(entry["P"], entry["R"])
                  for key, entry in data[i].items() if key != "max"]
        # Sort by recall so the curve is drawn from left to right.
        points.sort(key=lambda point: point[1])

        x = [recall for _, recall in points]        # Recall
        y = [precision for precision, _ in points]  # Precision

        ax.plot(x, y, color=plot_color[i])

    # Save the same figure into every corpus directory, on any platform.
    for directory in path:
        fig.savefig(os.path.join(directory, "PR_plot.png"))

path = os.getcwd()

models = ["BooleanModel", "VectorModel", "FuzzyModel"]
cord19 = os.path.join("cord19", "trec-covid", "round1")
corpus = [("cranfield", "orange"), ("vaswani", "plum"), (cord19, "turquoise")]

# Drop a trailing "src" component when the script is run from inside src/.
root = path.removesuffix(os.sep + "src")

for model in models:
    data_list = []
    path_list = []
    for corp in corpus:
        # cord19 could not be used with the fuzzy model, so that pair is skipped.
        if model != "FuzzyModel" or corp[0] != cord19:
            temp = os.path.join(root, "ddb_storage", model, corp[0])
            path_list.append(temp)

            with open(os.path.join(temp, "k_Rank.json")) as json_file:
                data = json.load(json_file)
                data_list.append(data)

    plot_eval_meassures(data_list, path_list, [corpus[0][1], corpus[1][1], corpus[2][1]])

    print(path_list)

# path1 = path + f'/ddb_storage/VectorModel/{corpus[0][0]}'

# with open(path1 + "/k_Rank.json") as json_file:
# data1 = json.load(json_file)

# path2 = path + f'/ddb_storage/VectorModel/{corpus[1][0]}'

# with open(path2 + "/k_Rank.json") as json_file:
# data2 = json.load(json_file)

# path3 = path + f'/ddb_storage/VectorModel/{corpus[2][0]}'

# with open(path3 + "/k_Rank.json") as json_file:
# data3 = json.load(json_file)

# plot_eval_meassures([data1, data2, data3], [path1, path2, path3], [corpus[0][1], corpus[1][1], corpus[2][1]])
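
The script above reads `k_Rank.json` files whose entries expose `"P"` and `"R"` fields, but this commit does not show how those values are produced. As a hedged illustration only, the sketch below computes precision and recall at several rank cutoffs for a single query and returns them in a similar shape. The function name, the document ids and the assumption that entries are keyed by the cutoff `k` are all hypothetical.

```python
def precision_recall_at_k(ranking, relevant, ks):
    """Return {str(k): {"P": precision, "R": recall}} for each cutoff k."""
    relevant = set(relevant)
    results = {}
    for k in ks:
        top_k = ranking[:k]
        # Count how many of the first k retrieved documents are relevant.
        hits = sum(1 for doc_id in top_k if doc_id in relevant)
        precision = hits / len(top_k) if top_k else 0.0
        recall = hits / len(relevant) if relevant else 0.0
        results[str(k)] = {"P": precision, "R": recall}
    return results


# Hypothetical ranked result list and relevance judgments for one query.
ranking = ["d3", "d7", "d1", "d9", "d4"]
relevant = {"d3", "d1", "d8"}

curve = precision_recall_at_k(ranking, relevant, ks=[1, 3, 5])
for k, point in curve.items():
    print(k, point)
```

Sorting such points by recall, as the plotting function does, gives the left-to-right precision-recall curve for one corpus.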