Skip to content

This project shows the difference between the table scan and index seek method of RDBMS and shows how optimized is the index seek method which uses a B-Tree for indexing

Notifications You must be signed in to change notification settings

skyshines/B-Tree-RDBMS-Indexing

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

1 Commit
 
 
 
 
 
 
 
 

Repository files navigation

B-Tree-RDBMS-Indexing

Java Project made with the use of Java Libraries and core concepts like Java OOPS, Multithreading, Serialization, Java Swing, etc.

Tech Stack Used

Java, Java Swing and AWT, and core Java Programming concepts like OOPS, Multithreading, Serialization, etc.

Introduction and Aim of the Project

The relational database i.e. a database in the form of rows and columns is stored in the form of extents and data pages. The extents can be understood as folders and the data pages can be understood as files. So, each row is stored in the data page sequentially. Usually, the limit of a data page is 8KB. So, after a data page gets completely filled, the next row goes into a new data page. Usually, an extent can accomodate 8 data pages i.e. its size is usually limited to 64 KB. So, after an extent has accomodated 8 data pages, a new extent is created and a new data page is created and the further rows start getting stored from there.

Table Scan Method

Now, if we want to search some data, the entire database is searched for it i.e. every extent, and every data page in those extents and every row in those data pages is searched. So, the time complexity of this searching is O(N) where N is the number of rows in the data base. This technique of searching the data is called Table Scan as the entire table is being scanned.

Index Seek Method

We know that if the data has some particular indices attached to it, the searching becomes faster. We also know that arranging the data in the form of a Binary Tree makes searching faster. In a binary tree, the data in the left sub-tree of a node is smaller than its data and the data in the right sub-tree of a node is larger than it. So, a B-Tree has the same property but not just with 2 children, with multiple children. So, we create indices upon the attributes of the relational database and for each attribute, we maintain a seperate B-Tree that stores the indices. The data is present in the last level of the B-Tree and it tells us the exact location of the data. Exact location means the extent number, data page number and the row-offset of the data. So, we first have to reach the correct index and then that index helps us reach the data. Reaching the correct index means traversing the B-Tree and it takes O(logdN) (d is in the base of the log) time where d is the number of child nodes of a node in B-Tree and N is the number of rows in the database. Then, it tells us the exact location of the data which means it takes O(1) time to reach the data after we have reached the correct index. This technique of using indices to make searching faster is called Index Seek Method.

Aim of the Project

So, the project aims at showing the differnce in speed and searching time of the Index Seek and Table Scan Searching methods and analyzing the advantages and siadvantages of both the methods.

Advantage of Index Seek Over Table Scan

The advantage of Index Seek Method over the Table Scan method is very clear. The searching time is reduced from Linear to Logarithmic.

Drawbacks of Index Seek

  1. In index seek method, let us say that there are C number of columns (attributes). So, if we want to create indices upon every attribute, we will have to scan the entire data base C times initially for creating the indices. So, the entire database will be scanned for creating the indices every time. After this, the search upon the attribute using Index Seek method will always be logarithmic.
  2. The second major drawback is extra space. We will create Index Pages. These index pages are nothing but the nodes of the B-Tree. So, extra space is used for making the searching faster.

How to run the Project?

Download the source code from GitHub to your local. Now, go to the location B-Tree-master\IndexDemo\IndexDemo\src. Here, compile the Java Client program using the command: javac Client.java Now, you have to create an empty folder at any location. This will be the folder in which the entire database will be created. So, we have to run the Java Client file and while running, provide the path of the empty folder that you have just created as a command line argument. For instance, in my case, I have created a folder named IndexPrj. So, this is the command that I run on the terminal: java Client E:\IndexPrj

image

When you do this, the Java Application should run and this is what you will see.

image

Creating Test Database

For creating the random database, you have to input the number of rows in the text area available in the Data-Panel as shown below.

image

A random test data of 10000 rows will be created in the same folder whose location you mentioned in the command line arguments.

image

image

image

You can see the data in the form of extents and data pages. You can open any random page in any random extent. You will see some random strings seperated by the pipe (|) symbol. They are Roll Number, Name, UserName and Password respectively.

image

The Metadata file contains the number of rows in the database.

image

The size of each page is maximum 8 KB and the size of each extent is maximum 64 KB.

Searching using Table Scan

Open any random extnet and random data page. Pick any row from it and choose any attribute. Let us say UserName. Now, search the UserName in the Query Panel.

For instance, the UserName with Roll Number 7658 in Page 4 of extent 5 is chosen here.

image

image

As you can see, it takes 201ms time and 35 pages were read.

Creating Indices upon an Attribute

Now, let us create indices upon UserName.

image

We can see the indices being created in the progress bar of the Project as well as in the terminal.

image

image

This is a time consuming process as the entire database is being read.

After the process is completed, we can see that indices have been created upon the UserName.

image

image

image

image

image

image

As you can see above, we have a root page and all the other pages are the nodes of the B-Tree. The metadata file consists the data that which pages are non-leaf, root, and leaves, etc.

Index Seek

So, let us now search for the same username. Now that indices have been created, Index Seek method will be used automatically by the Project.

image

So, you can see that the Search time reduced to 7ms and only 2 pages were read. This is the advantage of index seek method over the table scan method.

About

This project shows the difference between the table scan and index seek method of RDBMS and shows how optimized is the index seek method which uses a B-Tree for indexing

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages