Skip to content

A simple heap/Sorted file implementation of in memory database engine. This can provide insert,update and search capabilities as in any other database

Notifications You must be signed in to change notification settings

sainath09/Simple-Database-Engine

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Database Engine

This is a simple in-memory database engine to run SQL queries.

We implemented a database engine that creates, Inserts and perform simple queries like sum and group By for a dataset to understand the basics of any SQL query engine. We implemented 2 diffeerent types of data structures for storing it.

  • Heap based implementation
    • Each record of data sits on top of an other. There is no particular order in which data is sequenced.
  • Sorted file Implementation
    • Data is place in the file after sorting on one or more columns defined in the query.
    • Sorting makes query operations on sorted columns efficient.
  • We planned to implement B+ trees as in any other database engine but were not successful. Any one can use this code and implement B+ trees.

Implementation Details

We developed this engine incrementally.

  • We started it of with implementing the heap data base engine.
    • Record.cc
      • It contains all the functions required to process one record. For example toBinary and fromBinary functions convert data to and from binary format to store them in .bin file.
      • composeRecord makes the record with given schema and the source file name.
    • File.cc
      • It is the higher abstraction to Record in Db. File object contains number of records based on block size.
      • The default block size we used as in most of the database engines ins 4 Kb.
    • Schema.cc, Catalog.cc
      • They contain implemtation fuctions for schema of any table in the database. Schema of the database is stored in catalog file.
    • AbstractDbFile.cc, DBFile.cc, Heap.cc, Sort.cc
      • They implement heap data structure and sort data structure. DBFile objects abstracts the heap or sort data structure.
    • BigQ.cc , Pipe.cc, ComparisionEngine.cc and Comparsion.cc
      • BigQ uses priorty queues to sort the records with order made by ordermaker.
      • It maintains two pipes inputPipe and outputPipe. Data is fed into the input pipe which is sorted based on ** External Merge Sort Technique **
      • BigQs are used not only in the sorted data structure but also to perform Join and Group By operations.

There are few different basic operations that our database can perform Most of the operations in the database are paralized.

  • Join
  • Select from File
  • Select from a pipe - if my db is already in memory.
  • Group By
  • Sum over a column
  • Project - squezes the record to required columns by discarding others.
  • duplicate removal - removes duplicate records after sorting using BigQ.

Example Queries

SELECT ps.ps_partkey ,
ps.ps_suppkey ,
ps.ps_availqty ,
ps.ps_supplycost ,
ps.ps_comment
FROM partsupp AS ps WHERE (ps.ps_supplycost < 1.03)

SELECT SUM (l.l_extendedprice * (1 - l.l_discount)), l.l_orderkey, o.o_orderdate, o.o_shippriority
FROM customer AS c, orders AS o, lineitem AS l
WHERE (c.c_mktsegment = 'BUILDING') AND
(c.c_custkey = o.o_custkey) AND (l.l_orderkey = o.o_orderkey) AND
(l.l_orderkey < 100 OR o.o_orderkey < 100)
GROUP BY l_orderkey, o_orderdate, o_shippriority

Key Words

Keywords Keywords
SELECT GROUP
DISTINCT BY
FROM WHERE
SUM AS
AND OR
CREATE TABLE
HEAP SORTED
INSERT INTO
DROP SET
OUTPUT STDOUT
NONE ON

Data Types supported

  • INTEGER
  • STRING
  • DOUBLE

We tested our code using 1 Gb TPCH-C data set.

Required dependencies

  • The project is run on C++11 clang compiler
  • It requires flex lexer and bison parser. To install these in linux
sudo apt install clang flex bison

To Start Querying

Download the tpch data with the following commands.

cd settings/tpch-dbgen-master
make

to genetrate 10Mb data

./dbgen -s 0.01

to generate 1Gb data

./dbgen -s 1
mv *.tbl  ../../data

This project is tested with 1 Gb data.

To run the queries -

make simpleDb
./bin/simpleDb

You will be prompted to enter queries to execute.

There are some sample queires in doc/create-insert-queries.txt and doc/QueriesToExecute.

Create and Insert queries

Below is the gif to demo the create and insert queries -

Sample execution

We provided the flexibility of storing the returned query records in a file or output them into the screen.

SET OUTPUT <file_name> 

.....dumps the records into <file_name>

  • Make sure the empty file is already present in the current working directory.

Working of Sum operation


Limitations and known issues

  • All key words are case sensitive. You cannot go back and edit the query. This is the limitation from bison parser which we are working on fixing it.
  • There are few seg faults due to Join operation.
  • Although I made sure all operations are done in parallel, Join operation take a lot more time than any other operation. So, We let the join thread complete before starting other processes.
  • Output records to a file need to be created before Seting output to the file or else it is creating a seg fault

If there are segmentation faults -

  • Make sure your data and bin file paths are correct.
  • Make sure catalog file is not deleted.
  • Do not delete bin files. Drop them through the queries.
  • Clean the project and run again.

Any improvements are welcome.

About

A simple heap/Sorted file implementation of in memory database engine. This can provide insert,update and search capabilities as in any other database

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published