-
Notifications
You must be signed in to change notification settings - Fork 6
/
HiDb_PuttaguntaDebrayTu.tex
103 lines (76 loc) · 17.4 KB
/
HiDb_PuttaguntaDebrayTu.tex
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
\documentclass[10pt]{article}
\input{preamble}
\begin{document}
\maketitle
\begin{abstract}
We describe our experience implementing an in-memory relational database in Haskell that supports the standard CRUD (create, read, update, delete) operations while providing the requisite ACID (atomicity, consistency, isolation, durability) guarantees. We rely on Haskell's STM module to provide atomicity and isolation. We use a combination of STM, Haskell's type system, and dynamic type-checking in order to enforce consistency. We implement undo-redo logging and eventual disk writes to provide durability. We also provide a Haskell library which clients can use to connect to and send transactions to the database. We found that while the STM module greatly eased the implementation of transactions, the lack of support for de-serializing data into a dynamic type was not ideal.
\end{abstract}
\vspace{5mm}
\begin{multicols}{2}
\section{Introduction}
Haskell's \texttt{STM} module provides an alternative to lock-based programming that we suspected would be particularily well-suited to database implementation. \texttt{STM} shields the programmer from having to worry about correctness in a concurrent environment because it builds up actions but only commits them if all pointers contained in \texttt{TVar}s have not been changed since they have been read for that transaction. Uncommitted transactions can be retried, which is exactly what we need in a database implementation. At the same time, the module provides an appropriate level of granularity; if the value in a \texttt{TVar} that was not read for a transaction changes as the transaction is being built up, the transaction will still commit. Moreover, the module's \texttt{atomically :: STM a -> IO a} function provides a clean abstraction with which to group actions into a single atomic action, suggesting that \texttt{STM} would make it simple to group database operations into transactions \cite{harris}.
The fact that \texttt{TVar}s hold pointers to data structures in memory informed our choice to implement an in-memory database. In-memory databases are fast (since they obviate the need for slow disk seeks and writes), useful for small applications whose data can be held in memory, and gaining in popularity as memory becomes cheaper. One example of a popular in-memory database is Redis, a key-value store that can store data types that are more complex than traditional SQL types \cite{redis}. In contrast, we chose to implement a fairly traditional relational database, primarily because we wanted to write a database that could be swapped in for any of the most popular databases (e.g., Oracle, MySQL, PostgreSQL).
\section{Database Operations}
\subsection{Supported Syntax}
\label{simplifying_parsing_assumptions}
The database operations that we support are \texttt{CREATE TABLE}, \texttt{DROP TABLE}, \texttt{ALTER TABLE}, \texttt{SELECT}, \texttt{INSERT}, \texttt{SHOW TABLES}, \texttt{UPDATE}, and \texttt{DELETE}. Although we did not finish the parser, we intended to support the following syntax for specifying these operations:
\begin{itemize}
\item Each individual command must go on one line.
\item The list of supported types is \verb+integer+, \verb+boolean+, \verb+real+, \verb+char(+$n$\verb+)+ (which denotes an array of characters), and \verb+bit(+$n$\verb+)+ (a bit array).
\item For \texttt{CREATE TABLE}, the syntax is \texttt{CREATE TABLE} $\mathit{tablename}$ \verb+(+$\mathit{fieldname}_1$ $\mathit{type}_1$\verb+,+ \dots{} $\mathit{fieldname}_n$ $\mathit{type}_n$\verb+)+. Each type name should be one of the above specified names, and each fieldname should have no spaces or parentheses in it. In addition, one of the types may be followed by \texttt{PRIMARY KEY}.
\item For \texttt{DROP TABLE}, the syntax is \verb+DROP TABLE+ \textit{tablename}. The given tablename should not contain spaces.
\item For \texttt{ALTER TABLE}, there are two choices.
\begin{itemize}
\item \texttt{ALTER TABLE} \textit{tablename} \texttt{ADD} \textit{fieldname typename}, where the tablename and fieldname do not contain spaces, and the typename is one of the ones specified above.
\item \texttt{ALTER TABLE} \textit{tablename} \texttt{DROP} \textit{fieldname}, with the same constraints on the tablename and fieldname as above.
\end{itemize}
\item For \texttt{SELECT}, the syntax is \texttt{SELECT} \textit{fieldnames} \texttt{FROM} \textit{tablename} \texttt{WHERE} \textit{conditions}, where the table name has the same restrictions as usual, the fieldnames should be a comma-separated list of column names (but without spaces), and the conditions as follows:
\begin{itemize}
\item All types may be compared with \verb+>+, \verb+<+, \verb+>=+, \verb+<=+, and \verb+==+. Array types additionally have \verb+in+ and \verb+contains+.
\item Comparisons may be stacked with \verb+and+ and \verb+or+, and grouped within parentheses.
\item Spaces are allowed in condition statements.
\end{itemize}
That is, they must follow the following context-free grammar. Here, $\mathit{name}_1$ and $\mathit{name}_2$ denote column names or constants of the corresponding type, and $\mathit{op}$ is one of \verb+>+, \verb+<+, \verb+>=+, \verb+<=+, \verb+==+, \verb+in+, and \verb+contains+.
\begin{align*}
P &\to \mathit{name}_1\ \mathit{op}\ \mathit{name}_2\\
Q &\to (Q) \mid P \mid Q\text{ \texttt{and} } Q\mid Q\text{ \texttt{or} } Q
\end{align*}
Then, a valid condition is represented by $Q$.
\item For \texttt{INSERT}, the syntax is \texttt{INSERT INTO} \textit{tablename} \texttt{VALUES} \textit{values}, where \textit{values} should be a list separated by commas (spaces are allowed) corresponding to an entire new row (i.e. they should correspond to the types of the corresponding columns).
\item \texttt{SHOW TABLES} has no other syntax.
\item For \texttt{UPDATE}, the syntax is \texttt{UPDATE} \textit{tablename} \texttt{SET} \textit{assignments} \texttt{WHERE} \textit{conditions}, where the conditions are as for a \verb+SELECT+ call, and the assignment is of the form \textit{fieldname}\verb+=+\textit{const}, for a constant that is of the type held in that column. In particular, there should not be spaces in the assignment clause.
\item For \texttt{DELETE}, the syntax is \texttt{DELETE FROM} \textit{tablename} \texttt{WHERE} \textit{conditions}, where the conditions are as for a \verb+SELECT+ call.
\end{itemize}
\subsection{Implementation}
Each operation is implemented as a Haskell function. Operations which make changes to the database, should they succeed, should return lists of \texttt{LogOperation}s, where the \texttt{LogOperation} datatype represents log entries and we use different constructors for different types of entries (see durability section below for further discussion). Since we cannot perform IO from within STM, the calling function is responsible for actually writing these \texttt{LogOperation}s, which are instances of \texttt{Read} and \texttt{Show} to allow for easy serializability and de-serializability, to the in-memory log. If the operation was malformed (for example, the referenced table does not exist, or the user failed to specify the value for a column for which there is no default value), then we return some error message to the user. For operations such as \texttt{SELECT} that do not modify the database, we return a \texttt{String} that can be written back to the user.
It is interesting to note that while being forced to return everything that needs to be written to the log resulted in some cumbersome type signatures,
Haskell's explicit separation of pure and impure computations is also essentially what makes a safe implementation of \texttt{STM} possible; because all computation done within \texttt{STM} is effectless, there is no danger of performing an action with side-effects multiple times as the transaction is retried. In the context of our database, there is no danger of writing the same \texttt{LogOperation} to the log multiple times.
In our Haskell implementation of \texttt{SELECT}, we choose to return a \texttt{Table} rather than a \texttt{String} because while we did not implement this functionality in this verison of HiDb, in theory could we perform futher operations involving the returned table. We also chose to make use of a \texttt{Row} datatype, which is a wrapper around a \texttt{Fieldname -> Maybe Element} function. This allows us to use functions of the type \texttt{Row -> STM(Bool)} as the condition for whether a row of a table should be deleted or updated. It also allows us to express an update as \texttt{Row -> Row}.
One interesting implication of implementing these operations in Haskell is that the laziness of the language essentially makes our transactions lazy as well; the \texttt{IO} actions are not evaluated until we attempt to write the corresponding \texttt{LogOperation}s, until we attempt to write the results back to the user (i.e., the string resulting from a \texttt{SELECT}), or until another transaction does something that requires thunks from the earlier transaction to be evaluated.
\section{Parsing}
\input{parsing}
\section{Storage: Data Structures}
\lstinputlisting[caption=Main data structures for the database.]{lsting2.tex}
Note that we needed to use the \texttt{Existential Quantification} language extension in our definition of \texttt{Element}, which allows us to put any type that is an instance of \texttt{Show}, \texttt{Ord}, \texttt{Eq}, \texttt{Read}, and \texttt{Typeable} into \texttt{Element}. Note that this means that we do not statically enforce the fact that every column should contain elements of just one type. It seems that it is not possible to statically enforce this in Haskell, since when the user attempts an insert, there is no way to know statically (since the user only provides the input at runtime) whether this input will be of the same type as the other elements in the column. We therefore must enforce the type of columns at run-time in the parsing stage (as previously discussed).
We use multiple layers of \texttt{TVar}s in order to allow operations that touch different parts of the database to proceed at the same time. For example, having each \texttt{Table} be stored in a \texttt{TVar} means that if client $A$ updates a row in one table, client $B$ can concurrently update a row in another table. Moreover, consider what happens if we attempt changes that affect the database at different levels of granularity: Suppose we concurrently attempt two operations, $O_1$ and $O_2$, where $O_1$ is the insertion of a new column into a table and $O_2$ is the updating of a value in an existing column in the same table. If $O_1$ completes first, then $O_2$ will have to be retried because the pointer (which had to be read for $O_2$ to occur) in the \texttt{TVar} surrounding the \texttt{Table} type will have changed. However, if $O_2$ completes first, then $O_1$ will complete without issue because the only \texttt{TVar} that $O_2$ changed are ones that did not need to be read for $O_1$.
\section{Durability}
Almost by definition, a database needs to maintain its stored data between sessions; a crash cannot erase all of the data. As such we cannot rely solely on in-memory storage. We must push our database to disk periodically to ensure that we have a relatively recent and persistent copy of our data at all times. We call these periodic pushes \textit{checkpoints}.
\subsection{Checkpoint}
The most naive checkpointing algorithm would simply push all of the tables in memory to disk. However, this push might capture partially completed transactions, which might lead to inconsistent or corrupted data. The simple solution would be to block new transactions from starting when we want to run a checkpoint, and then do the push to disk once the transactions in progress finish. In this manner, we can ensure that no partially completed transaction is stored on disk. The obvious drawback is that we must stop all operations when we want to run a checkpoint, and then we must wait for the disk to finish its writes. This algorithm is clearly unacceptable as stopping in-memory operations for disk operations destroys the point of having an in-memory database.
Checkpointing algorithms that do not require all transactions to stop are called \textit{quiescent checkpointing}. Quiescent checkpoints necessarily push partially completed checkpoints to disk, but we can compensate by logging the operations we do. By pushing this log to disk, we can theoretically repair partially completed transactions. Formalizing this notion a bit, we log an operation whenever we modify an element in some table by storing the quadruple consisting of the table identifier, the element identifier, the old value, and the new value. If we ensure that the log corresponding to an operation is pushed to disk before the operation is pushed to disk, we can identify partially completed transactions that have been pushed to disk and then repair any inconsistencies.
The only need for this repairing is when the database is starting up, at which point it reads the stored data on disk. Once read in, there are two possible types of repairs to do for partially completed transactions - transactions that are completely logged on disk might need to be redone and transactions that are partially logged on disk might need to be undone. To distinguish between these two types of repairs, we must scan the log from the end. When we see a log entry corresponding to a transaction commit, we add the transaction to a list of transactions to be redone. When we see log entries for modifying an element of a table in a transaction not marked to be redone, we must undo the operation. After completing this first pass, we rescan the log from the start to the end, redoing transactions that we marked in the first pass. Undoing and redoing a modification is as simple as setting the element to the old and new logged value, respectively.
As currently described, the log will continuously grow, getting overwhelmingly large before long - the final detail to discuss is shrinking the log. We can delete a transaction in the log after we are sure that the entire transaction has been pushed to disk. In particular, if we start a checkpoint after a transaction has committed, we can remove that transaction from the log once the checkpoint terminates. As such, we trim and flush (again) the log after finishing a checkpoint.
\subsection{Serializing}
As our objects representing the database are wrapped in \texttt{TVar}s, we cannot define a method like \texttt{show} that converts a \texttt{Table} to a \texttt{String} without using an unsafe function such as \texttt{unsafePerformIO}. We can, however, define a function with the type \texttt{Table -> IO String}. Similarly we also can define a function with the type \texttt{String -> IO (TVar Table)}.
Since we couldn't use automatically derived instances of \texttt{show} and \texttt{read}, we were faced with two prospects - manually write the encoding for our Objects or to somehow convert them into data structures that can derive \texttt{show} and \texttt{read}. The latter route seemed safer and much easier, so we defined two (private) data structures \texttt{ShowableTable} and \texttt{ShowableColumn} that derive \texttt{show} and \texttt{read}. Converting between our regular \texttt{Table} and \texttt{Column} data structures and their showable variants is simply a matter of extracting all of the necessary variables from their \texttt{TVar} containers.
The last problem with respect to serialization is serializing the existentially quantified data structure \texttt{Element}. We must know the type of the element stored in the \texttt{Element} if we want to \texttt{read} it. As we only use three different types in our database, we simply prepend the type in \texttt{show} and then check the prefix to identify the type during \texttt{read}. We use a similar procedure to serialize the types of a column (stored as a \texttt{TypeRep}).
\section{Summary \& Future Work}
We found that Haskell's support for lightweight threads and the \texttt{STM} module were, as expected, very well-suited to database implementation. However, we also need dynamic types, which presents difficulties given Haskell's unusually strong type system. Furthermore, we found it difficult to implement the Operation module cleanly; the code indeed appears quite imperative due the constant use of \texttt{do} notation (which we could have replaced with \texttt{>>=}, but we do not believe that would have made the code any easier to read).
Given that we were unable to complete the parser and were ultimately only able to test the operations and the serialization and the deserialization of the database, logical next steps would be to complete the parser and to make it more robust. After that, implementing operations such as \texttt{INNER JOIN}, specifications such as \texttt{FOREIGN KEY}, and just generally fleshing out our implementation to include the full set of traditional SQL operations would be a logical next step. Another logical direction of exploration would be to compare the performance of this database to the relational databases whose functionality we tried to emulate.\\\\
\textit{Code is available at \url{https://github.com/susanctu/Haskell-In-Memory-DB/}. Tests verifying that operations and rehydration work are in \texttt{Main.hs}.}
\end{multicols}
\begin{thebibliography}{1}
\bibitem{harris} Harris T., Marlow S. Jones S., Herlihy M. "Composable Memory Transactions". \url{http://research.microsoft.com/pubs/67418/2005-ppopp-composable.pdf}
\bibitem{redis} Redis. \url{http://redis.io/}
\end{thebibliography}
\end{document}