Commit

More edits, fixed up motivation and such.
rdrake committed Apr 26, 2014
1 parent fb2e233 commit bea9916
Showing 15 changed files with 213 additions and 218 deletions.
Binary file modified thesis/document/Thesis.pdf
Binary file not shown.
6 changes: 5 additions & 1 deletion thesis/document/Thesis.tdo
@@ -1,4 +1,8 @@
\defcounter {refsection}{0}\relax
\contentsline {todo}{Write me}{iii}{section*.2}
\defcounter {refsection}{0}\relax
\contentsline {todo}{All captions must be at least 10pt}{53}{section*.75}
\contentsline {todo}{citation?}{3}{section*.11}
\defcounter {refsection}{0}\relax
\contentsline {todo}{reword?}{21}{section*.24}
\defcounter {refsection}{0}\relax
\contentsline {todo}{All captions must be at least 10pt}{52}{section*.77}
2 changes: 2 additions & 0 deletions thesis/document/Thesis.tex
@@ -45,6 +45,8 @@
\printglossaries
\end{preliminary}

%\include{introduction}

\include{data-model}

\include{have-it-all}
61 changes: 4 additions & 57 deletions thesis/document/along-came-clojure.tex
@@ -1,64 +1,11 @@
\chapter{Along Came Clojure}
\label{chap:along-came-clojure}
In this chapter we further discuss our implementation. In \cref{sec:basic-principles-fp} we discuss the basic tenets of functional programming with an emphasis on how Clojure implements these tenets. Further attention is paid to how Clojure uses these tenets to implement its concurrency model.
In this chapter we discuss the implementation details of our system. In \cref{sec:basic-principles-fp} we discuss the basic tenets of functional programming with an emphasis on how Clojure implements these tenets. Further attention is paid to how Clojure uses these tenets to implement its concurrency model.

In \cref{sec:search-with-clojure} we illustrate how Clojure's \gls{jvm} interoperability is used to interface with Lucene and \gls{jdbc} drivers in order to perform indexing and search on data. This section also covers keyword as well as graph search in document space.

In \cref{sec:web-interface} we present a simple web-based interface to the system. It functions as a demonstration of the system's capabilities.

\input{functional-programming}
\input{search-with-clojure}

\section{Web Interface}
\label{sec:web-interface}
In order to make the system more accessible to novice users, a web-based interface was created.

The first step involves searching for entities by a value. We use approximate (\(n\)-gram) string matching to find relevant values despite potential character substitutions, deletions, or additions.

\begin{figure}[H]
\centering
\includegraphics[scale=0.5]{figures/images/step-1}

\caption{Approximate string matching of values}
\label{fig:webui-step-1}
\end{figure}

Entities that match the desired values are displayed to the user. They have the option of specifying the entity as either the source, or the target.

\begin{figure}[H]
\centering
\includegraphics[scale=0.5]{figures/images/step-2}

\caption{Tabular display of entities}
\label{fig:webui-step-2}
\end{figure}

When an entity is chosen, the navigation bar at the top of the page is updated to reflect the new selection. This allows the user to hover their cursor over the respective element in order to remind themselves of their selection.

\begin{figure}[H]
\centering
\includegraphics[scale=0.5]{figures/images/step-3}

\caption{Chosen entities are displayed at the top}
\label{fig:webui-step-3}
\end{figure}

When both a source and target entity are selected, the user is able to search for the shortest path between them. They are given the option of which graph search algorithm implementation to use.

\begin{figure}[H]
\centering
\includegraphics[scale=0.5]{figures/images/step-4}

\caption{The algorithm implementation may be selected}
\label{fig:webui-step-4}
\end{figure}

A short message displaying the search duration as well as memory consumption is followed by a series of tables representing the intermediate entities between the source and target entities.

\begin{figure}[H]
\centering
\includegraphics[scale=0.5]{figures/images/step-5}

\caption{Result of a search between entities}
\label{fig:webui-step-5}
\end{figure}

This interface allows users to query the database for information in a familiar manner.
\input{web-interface}
58 changes: 0 additions & 58 deletions thesis/document/background.tex

This file was deleted.

45 changes: 41 additions & 4 deletions thesis/document/data-model.tex
@@ -1,16 +1,53 @@
% !TEX root = Thesis.tex
\chapter{A Tale of Two Data Models}
\label{chap:tale-of-two-data-models}
The term ``data model'' refers to a notation for describing data and/or information. It consists of the data structure, operations that may be performed on the data, as well as constraints placed on the data \cite{dbsys-06}.
The term ``data model'' refers to a notation for describing data and/or information. It consists of the data structure, operations that may be performed on the data, as well as a set of constraints placed on the data \cite{dbsys-06}.

In this chapter we provide background and motivation for this thesis. We will discuss the evolution of data sets, their logical models, and their corresponding query languages. We feel that modern day data sets call for a new data model with a new query languages.
In this chapter we provide background and motivation for this thesis. We discuss the evolution of data models and their corresponding query languages. We feel that modern-day data sets call for a new data model with a new query language.

We also provide a formal definition of the relational data model, discuss its merits, its shortcomings, and contrast it to the document data model. Contrary to the relational model, the document model permits fast and flexible keyword search without requiring explicit domain knowledge of the data. In addition, we demonstrate the feasibility of encoding a relational model into a document model in a lossless manner.
We provide a formal definition of the relational data model, discuss its merits, its shortcomings, and contrast it to the document data model. Contrary to the relational model, the document model permits fast and flexible keyword search without requiring explicit domain knowledge of the data.

%\input{background}
\input{background}
\input{relational-model}
\input{document-model}

\section{Problem}
Each of these data models has its own pros and cons. One must choose between highly normalized, structured data and fast, flexible keyword search.

What is needed is a hybrid model that combines the strengths of both the relational and document models. Relational data would automatically be transformed into the document model. Such a system would require some initial configuration, but little user intervention afterwards.

\section{Thesis Statement \& Scope of Research}
\begin{displayquote}
\textbf{Thesis Statement:} A system could be built that is capable of transforming data from the relational model to the document model. The transformation is reversible, allowing the original data model to be recovered.

Such a system could use the fast search capabilities of the document model, combined with the relational information of the relational model, to quickly discover related fragments of information. This process could further be sped up by conducting this search in a concurrent manner.
\end{displayquote}

In order to achieve the goals of our thesis statement, we

\begin{itemize}
\item define a formal framework to describe data sets with relational structures and text components;
\item design a collection of expressive query operators for analyzing text relational data sets; and
\item investigate implementation techniques to make the query operators performant on modern, multicore computers.
\end{itemize}

\section{Contributions}
We provide a formal definition for a system that is capable of transforming data from the relational model to the document model. By performing this transformation, we gain the flexible search characteristics of the document model.

This transformation is done in such a way that it is reversible. In order to be reversible, the relational information is encoded in a document form. This allows us to perform graph search over documents.

In addition, we investigate how a concurrent implementation of graph search affects performance.\todo{reword?}

% Motivation -> Hypothesis -> Contributions ->
% Why concurrent? Why is keyword search important?

% Improve performance and allow more flexibility

% Why dataset chosen

% 5.5 -> threats to validity (external mostly)
% 6.2 -> limitations & future work (not just about tool, also about research questions left unanswered)

\section{Outline of Thesis}
\cref{chap:along-came-clojure} describes the details of our implementation of the data transformation and the query operators for graph search in the linked document space. Our choice of a modern functional programming language for the implementation makes a high degree of concurrency possible.
