# Overview of Data Structures (Linear Data Structures)

A data structure is a particular way of organizing data in a computer so that it can be used effectively. <br>The idea is to reduce the space and time complexities of different tasks. <br>Below is an overview of some popular linear data structures.

1. Array
2. Linked List
3. Stack
4. Queue

## Array

An array is a data structure that stores homogeneous elements at contiguous memory locations.
```
Let size of array be n.
Accessing Time: O(1) [This is possible because elements
                      are stored at contiguous locations]   
Search Time:   O(n) for Sequential Search: 
               O(log n) for Binary Search [If Array is sorted]
Insertion Time: O(n) [The worst case occurs when insertion 
                     happens at the Beginning of an array and 
                     requires shifting all of the elements]
Deletion Time: O(n) [The worst case occurs when deletion 
                     happens at the Beginning of an array and 
                     requires shifting all of the elements]
```

<b>Example:</b> <br>Suppose we want to store the marks of all students in a class. We can use an array to store them. This reduces the number of variables needed, as we don’t have to create a separate variable for the marks of every student. All marks can be accessed by simply traversing the array.
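
The costs listed above can be seen in a small sketch. It assumes a Python list as a stand-in for an array, and the function names (`sequential_search`, `binary_search`, `insert_at`) are hypothetical, chosen only for illustration.

```python
from bisect import bisect_left

def sequential_search(arr, target):
    """O(n): check each element in turn; return its index or -1."""
    for i, value in enumerate(arr):
        if value == target:
            return i
    return -1

def binary_search(sorted_arr, target):
    """O(log n): only valid when the array is sorted."""
    i = bisect_left(sorted_arr, target)
    if i < len(sorted_arr) and sorted_arr[i] == target:
        return i
    return -1

def insert_at(arr, index, value):
    """O(n): shift every element from `index` onward one slot right."""
    arr.append(None)                          # grow by one slot
    for i in range(len(arr) - 1, index, -1):
        arr[i] = arr[i - 1]                   # shift right
    arr[index] = value

marks = [35, 48, 62, 77, 90]   # marks kept in sorted order
third = marks[2]               # O(1) access by index
insert_at(marks, 0, 20)        # worst case: all five elements shift
```

Inserting at index 0 is the worst case named in the table, because every existing element has to move.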

# -----------------------------------------------------------------------------

## Linked List

A linked list is a linear data structure (like arrays) where each element is a separate object. <br>Each element (that is, a node) of a list comprises two items: the data and a reference to the next node.

#### Types of Linked List:
    
1. <b>Singly Linked List:</b> In this type of linked list, every node stores the address or reference of the next node in the list, and the last node has NULL as its next address or reference. <br>
    Eg. 1->2->3->4->NULL
    

2. <b>Doubly Linked List:</b> In this type of linked list, there are two references associated with each node: one points to the next node and one to the previous node. The advantage of this data structure is that we can traverse in both directions, and for deletion we don’t need explicit access to the previous node. <br>
    Eg. NULL<-1<->2<->3->NULL


3. <b>Circular Linked List:</b> A circular linked list is a linked list where all nodes are connected to form a circle. There is no NULL at the end. A circular linked list can be singly or doubly linked. The advantage of this data structure is that any node can be made the starting node. This is useful when implementing a circular queue with a linked list.<br>
    Eg. 1->2->3->1 [The next pointer of last node is pointing to the first]
```
Accessing time of an element : O(n)
Search time of an element : O(n)
Insertion of an Element : O(1) [If we are at the position 
                                where we have to insert 
                                an element] 
Deletion of an Element : O(1) [If we know address of node
                               preceding the node to be 
                               deleted] 
```
                               
<b>Example:</b> <br>Consider the previous example where we made an array of students’ marks. If a new student joins the class, their marks must also be added to the array. But the size of the array is fixed, and if it is already full, no new element can be added. If we instead make the array much larger than the number of students, most of it may remain empty. To reduce this wasted space, we can use a linked list, which adds a node only when a new element is introduced.<br><br>
<b>Advantages:</b><br>Insertions and deletions also become easier with a linked list, since no elements need to be shifted.<br><br>
<b>Disadvantages:</b><br>
One big drawback of a linked list is that random access is not allowed. With arrays, we can access the i’th element in O(1) time. In a linked list, it takes Θ(i) time.
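
A minimal sketch of a singly linked list follows, to make the O(1) insertion and deletion and the Θ(i) access concrete. The class and function names (`Node`, `insert_after`, `delete_after`, `get`, `to_list`) are hypothetical and not taken from any library.

```python
class Node:
    def __init__(self, data, next=None):
        self.data = data
        self.next = next          # None plays the role of NULL

def insert_after(node, data):
    """O(1): splice a new node in right after `node`."""
    node.next = Node(data, node.next)
    return node.next

def delete_after(prev):
    """O(1): unlink the node that follows `prev`, if there is one."""
    if prev.next is not None:
        prev.next = prev.next.next

def get(head, i):
    """Theta(i): follow i links from the head to reach the i'th node."""
    node = head
    for _ in range(i):
        node = node.next
    return node.data

def to_list(head):
    """Collect the data of every node, head first."""
    out = []
    while head is not None:
        out.append(head.data)
        head = head.next
    return out

head = Node(1)
second = insert_after(head, 2)
insert_after(second, 4)
insert_after(second, 3)           # list is now 1 -> 2 -> 3 -> 4 -> None
```

Note that `delete_after` needs the node before the one being removed, which is the condition stated for O(1) deletion.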

# -----------------------------------------------------------------------------

## Stack

A stack or LIFO (last in, first out) is an abstract data type that serves as a collection of elements, with two principal operations: 
- <b>push:</b> which adds an element to the collection
    
- <b>pop:</b> which removes the last element that was added. <br>

In a stack, both push and pop take place at the same end, the top of the stack. <br>It can be implemented using either an array or a linked list.
```
Insertion : O(1)
Deletion :  O(1)
Access Time : O(n) [Worst Case]
Insertion and Deletion are allowed on one end.
```

<b>Example:</b><br>
Stacks are used for maintaining function calls (the last called function must finish execution first), and recursion can always be removed with the help of an explicit stack. <br>Stacks are also used to reverse a word, to check for balanced parentheses, and in editors, where the word typed last is the first to be removed by an undo operation. <br>Similarly, stacks are used to implement the back functionality in web browsers.
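
Two of the uses mentioned above can be sketched with a Python list acting as the stack (`append` as push, `pop` as pop). The function names `is_balanced` and `reverse_word` are hypothetical.

```python
def is_balanced(text):
    """Return True if every bracket in `text` is closed in LIFO order."""
    pairs = {")": "(", "]": "[", "}": "{"}
    stack = []
    for ch in text:
        if ch in "([{":
            stack.append(ch)                      # push an opener
        elif ch in pairs:
            # A closer must match the most recently pushed opener.
            if not stack or stack.pop() != pairs[ch]:
                return False
    return not stack                              # leftovers are unclosed

def reverse_word(word):
    """Push every character, then pop them all: last in, first out."""
    stack = list(word)
    out = []
    while stack:
        out.append(stack.pop())
    return "".join(out)
```

Both functions touch only the top of the stack, so each push and pop is O(1).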

# -----------------------------------------------------------------------------

## Queue

A queue or FIFO (first in, first out) is an abstract data type that serves as a collection of elements, with two principal operations: 
- <b>enqueue:</b> the process of adding an element to the collection. (The element is added at the rear.)
- <b>dequeue:</b> the process of removing the first element that was added. (The element is removed from the front.)

It can be implemented using either an array or a linked list.
```
Insertion : O(1)
Deletion  : O(1)
Access Time : O(n) [Worst Case]
```

<b>Example:</b> <br>
A queue, as the name says, is modeled on the queues at a bus stop or train station, where the person standing at the front of the queue (who has been waiting the longest) is the first one to get a ticket. <br>Queues fit any situation where resources are shared among multiple users and served on a first-come, first-served basis. <br>Examples include CPU scheduling and disk scheduling. <br><br>Another application of queues is when data is transferred asynchronously (data not necessarily received at the same rate as it is sent) between two processes. <br>Examples include IO buffers, pipes, file IO, etc.

### Circular Queue 
A circular queue treats the underlying array as a ring. Its advantage is that it reduces wasted space in an array implementation: in an array of size n, the (n+1)’th insertion wraps around and is placed at the 0’th index, provided that slot has been freed by an earlier dequeue.
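
The wrap-around can be sketched as follows. This is one possible layout, assuming a fixed-capacity Python list plus a `front` index and an element `count`; the class name `CircularQueue` and its fields are hypothetical.

```python
class CircularQueue:
    def __init__(self, capacity):
        self.items = [None] * capacity
        self.front = 0        # index of the oldest element
        self.count = 0        # number of stored elements

    def is_empty(self):
        return self.count == 0

    def is_full(self):
        return self.count == len(self.items)

    def enqueue(self, value):
        """O(1): write at the rear, wrapping past the end of the array."""
        if self.is_full():
            raise OverflowError("queue is full")
        rear = (self.front + self.count) % len(self.items)
        self.items[rear] = value
        self.count += 1

    def dequeue(self):
        """O(1): read from the front, then advance it with wrap-around."""
        if self.is_empty():
            raise IndexError("queue is empty")
        value = self.items[self.front]
        self.items[self.front] = None
        self.front = (self.front + 1) % len(self.items)
        self.count -= 1
        return value

q = CircularQueue(3)
for v in (1, 2, 3):
    q.enqueue(v)
first = q.dequeue()   # frees index 0
q.enqueue(4)          # the fourth insertion wraps around to index 0
```

With a capacity of 3, the fourth insertion reuses index 0 instead of failing, which is the space saving described above.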

# -----------------------------------------------------------------------------
# -----------------------------------------------------------------------------

# Overview of Data Structures (Binary Tree, BST, Heap and Hash)

We have discussed an overview of arrays, linked lists, queues and stacks. <br>
Now, the following data structures are discussed.

5. Binary Tree
6. Binary Search Tree
7. Binary Heap
8. Hashing

## Binary Tree

Unlike Arrays, Linked Lists, Stack and queues, which are linear data structures, trees are hierarchical data structures.<br>
A binary tree is a tree data structure in which each node has at most two children, which are referred to as the left child and the right child.<br> It is implemented mainly using links (pointers).

### Binary Tree Representation:
A tree is represented by a pointer to the topmost node in the tree (the root). If the tree is empty, the value of root is NULL.<br> A binary tree node contains the following parts.
1. Data
2. Pointer to left child
3. Pointer to right child

A Binary Tree can be traversed in two ways:<br>
- <b>Depth First Traversal:</b> Inorder (Left-Root-Right), Preorder (Root-Left-Right) and Postorder (Left-Right-Root)
- <b>Breadth First Traversal:</b> Level Order Traversal
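
The four traversal orders listed above can be sketched as follows. The `TreeNode` class and the function names are hypothetical; each depth-first function appends to a caller-supplied list so that every node is visited once, giving O(n) overall.

```python
from collections import deque

class TreeNode:
    def __init__(self, data, left=None, right=None):
        self.data = data
        self.left = left
        self.right = right

def inorder(node, out):
    """Left, Root, Right."""
    if node is not None:
        inorder(node.left, out)
        out.append(node.data)
        inorder(node.right, out)
    return out

def preorder(node, out):
    """Root, Left, Right."""
    if node is not None:
        out.append(node.data)
        preorder(node.left, out)
        preorder(node.right, out)
    return out

def postorder(node, out):
    """Left, Right, Root."""
    if node is not None:
        postorder(node.left, out)
        postorder(node.right, out)
        out.append(node.data)
    return out

def level_order(root):
    """Breadth first: visit nodes level by level using a queue."""
    out = []
    queue = deque([root] if root is not None else [])
    while queue:
        node = queue.popleft()
        out.append(node.data)
        if node.left is not None:
            queue.append(node.left)
        if node.right is not None:
            queue.append(node.right)
    return out

#       1
#      / \
#     2   3
#    / \
#   4   5
root = TreeNode(1, TreeNode(2, TreeNode(4), TreeNode(5)), TreeNode(3))
```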

### Binary Tree Properties:
```
The maximum number of nodes at level ‘l’ = 2^(l-1)
(the root is at level 1).

Maximum number of nodes = 2^h - 1.
Here h is the height of the tree. Height is taken to be
the maximum number of nodes on a root-to-leaf path.

Minimum possible height = ceil(log2(n+1))

In Binary tree, number of leaf nodes is always one 
more than nodes with two children.

Time Complexity of Tree Traversal: O(n)
```
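
The first three properties follow from one another. A short derivation, assuming the root is at level 1 and the height $h$ counts the nodes on the longest root-to-leaf path:

$$
\begin{aligned}
\text{nodes at level } l &\le 2^{\,l-1} \\
n &\le \sum_{l=1}^{h} 2^{\,l-1} = 2^{h} - 1 \\
2^{h} \ge n + 1 &\implies h \ge \lceil \log_2 (n+1) \rceil
\end{aligned}
$$

For instance, $n = 7$ nodes fit in a tree of height 3, while an eighth node forces a height of at least 4.
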
<b>Examples:</b><br> One reason to use a binary tree, or a tree in general, is to represent things that form a hierarchy. Trees are useful in file systems, where each file is located in a particular directory and there is a specific hierarchy of files and directories. <br>Another example is storing hierarchical objects: the JavaScript Document Object Model treats an HTML page as a tree, with the nesting of tags as parent-child relations.

# -----------------------------------------------------------------------------

## Binary Search Tree

Binary Search Tree is a Binary Tree with following additional properties:
1. The left subtree of a node contains only nodes with keys less than the node’s key.
2. The right subtree of a node contains only nodes with keys greater than the node’s key.
3. The left and right subtree each must also be a binary search tree.


<b>Time Complexities:</b>
```
Search :  O(h)
Insertion : O(h)
Deletion : O(h)
Extra Space : O(n) for pointers

h: Height of BST
n: Number of nodes in BST

If Binary Search Tree is Height Balanced, 
then h = O(Log n) 

Self-Balancing BSTs such as AVL Tree, Red-Black
Tree and Splay Tree make sure that height of BST 
remains O(Log n)
```
- BSTs provide moderate access/search (quicker than linked lists and slower than arrays).
- BSTs provide moderate insertion/deletion (quicker than arrays and slower than linked lists).

<b>Examples:</b><br> Its main use is in search applications where data is constantly entering and leaving and needs to be printed in sorted order. <br>For example, on e-commerce websites, new products are added and products go out of stock while all products are listed in sorted order.
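
A minimal sketch of an unbalanced BST follows, to show the O(h) operations and why the height matters. The names (`BSTNode`, `insert`, `search`, `sorted_keys`, `height`) are hypothetical, and duplicate keys are simply ignored, which is an assumption of this sketch.

```python
class BSTNode:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None

def insert(root, key):
    """O(h): walk down one path and attach the key; returns the root."""
    if root is None:
        return BSTNode(key)
    if key < root.key:
        root.left = insert(root.left, key)
    elif key > root.key:
        root.right = insert(root.right, key)
    return root                      # equal keys are ignored

def search(root, key):
    """O(h): each comparison discards one whole subtree."""
    while root is not None and root.key != key:
        root = root.left if key < root.key else root.right
    return root is not None

def sorted_keys(root, out=None):
    """An inorder traversal of a BST yields the keys in sorted order."""
    if out is None:
        out = []
    if root is not None:
        sorted_keys(root.left, out)
        out.append(root.key)
        sorted_keys(root.right, out)
    return out

def height(root):
    """Number of nodes on the longest root-to-leaf path."""
    if root is None:
        return 0
    return 1 + max(height(root.left), height(root.right))

balanced = None
for k in [50, 30, 70, 20, 40, 60, 80]:
    balanced = insert(balanced, k)

skewed = None
for k in [1, 2, 3, 4, 5, 6, 7]:      # sorted input degenerates into a chain
    skewed = insert(skewed, k)
```

The same seven keys give a height of 3 in one insertion order and 7 in the other, which is why self-balancing variants are used when the input order cannot be controlled.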

# -----------------------------------------------------------------------------

## Binary Heap

A Binary Heap is a Binary Tree with following properties.
1) It’s a complete tree (All levels are completely filled except possibly the last level and the last level has all keys as left as possible). This property of Binary Heap makes them suitable to be stored in an array.
2) A Binary Heap is either Min Heap or Max Heap. In a Min Binary Heap, the key at root must be minimum among all keys present in Binary Heap. The same property must be recursively true for all nodes in Binary Tree. Max Binary Heap is similar to Min Heap. It is mainly implemented using array.
```
Get Minimum in Min Heap: O(1) [Or Get Max in Max Heap]
Extract Minimum in Min Heap: O(Log n) [Or Extract Max in Max Heap]
Decrease Key in Min Heap: O(Log n)  [Or Increase Key in Max Heap]
Insert: O(Log n) 
Delete: O(Log n)
```
<b>Example:</b><br> Used in implementing efficient priority queues, which in turn are used for scheduling processes in operating systems. Priority queues are also used in Dijkstra’s and Prim’s graph algorithms.<br>
The heap data structure can be used to efficiently find the k smallest (or largest) elements in an array, to merge k sorted arrays, to track the median of a stream, etc.<br>
A heap is a special-purpose data structure: it is not suited to searching for a particular element, which takes O(n) time in the worst case.
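
The array layout and the "k largest elements" use can be sketched with Python's standard `heapq` module, which maintains a min heap inside a plain list. The helper names `is_min_heap` and `k_largest` are hypothetical.

```python
import heapq

def is_min_heap(a):
    """In the array layout, the parent of index i is at (i - 1) // 2."""
    return all(a[(i - 1) // 2] <= a[i] for i in range(1, len(a)))

def k_largest(values, k):
    """Keep a min heap of the k largest values seen so far: O(n log k)."""
    if k <= 0:
        return []
    heap = []
    for v in values:
        if len(heap) < k:
            heapq.heappush(heap, v)
        elif v > heap[0]:                # heap[0] is the smallest kept value
            heapq.heapreplace(heap, v)   # pop the smallest, push v
    return sorted(heap, reverse=True)

data = [9, 4, 7, 1, 8, 2]
heap = list(data)
heapq.heapify(heap)            # build the heap in place
smallest = heap[0]             # get minimum: O(1)
first = heapq.heappop(heap)    # extract minimum: O(log n)
heapq.heappush(heap, 3)        # insert: O(log n)
```

Because a complete tree has no gaps, the parent and child positions are computed from indices and no pointers are stored.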

# -----------------------------------------------------------------------------

## Hashing

<b>Hash Function:</b><br> A function that converts a given big input key to a small practical integer value.<br>The mapped integer value is used as an index in the hash table.<br> A good hash function should have the following properties:
1) Efficiently computable.
2) Should uniformly distribute the keys (Each table position equally likely for each key)

<b>Hash Table:</b><br> An array that stores pointers to the records corresponding to a given key (for example, a phone number). An entry in the hash table is NIL if no existing key has a hash function value equal to the index of that entry.

<b>Collision Handling:</b><br> Since a hash function maps a key, which may be a big integer or a string, to a small number, two keys may map to the same value. The situation where a newly inserted key maps to an already occupied slot in the hash table is called a collision and must be handled using some collision handling technique. The following are ways to handle collisions:

<b>Chaining:</b><br>The idea is to make each cell of hash table point to a linked list of records that have same hash function value. Chaining is simple, but requires additional memory outside the table.
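
Chaining can be sketched as follows. This is an illustration under stated assumptions: a Python list stands in for each linked list, Python's built-in `hash` serves as the hash function, and the class name `ChainedHashTable` is hypothetical.

```python
class ChainedHashTable:
    def __init__(self, buckets=8):
        # One chain per table cell; an empty chain plays the role of NIL.
        self.table = [[] for _ in range(buckets)]

    def _bucket(self, key):
        return self.table[hash(key) % len(self.table)]

    def insert(self, key, value):
        bucket = self._bucket(key)
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, value)    # overwrite an existing key
                return
        bucket.append((key, value))         # colliding keys share a chain

    def search(self, key):
        for k, v in self._bucket(key):
            if k == key:
                return v
        return None

    def delete(self, key):
        bucket = self._bucket(key)
        for i, (k, _) in enumerate(bucket):
            if k == key:
                del bucket[i]
                return True
        return False

t = ChainedHashTable(buckets=8)
t.insert(1, "a")
t.insert(9, "b")     # 1 and 9 both map to cell 1 when there are 8 cells
```

Each operation scans only one chain, so the cost stays near O(1) as long as the keys are spread evenly across the cells.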

<b>Open Addressing:</b><br> In open addressing, all elements are stored in the hash table itself. Each table entry contains either a record or NIL. When searching for an element, we examine table slots one by one until the desired element is found or it is clear that the element is not in the table.
```
Space : O(n)
Search    : O(1) [Average]    O(n) [Worst case]
Insertion : O(1) [Average]    O(n) [Worst Case]
Deletion  : O(1) [Average]    O(n) [Worst Case]
```
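
Open addressing can be sketched with linear probing, which is one of several possible probe sequences and is chosen here only for simplicity. The class name `LinearProbingSet` and the two sentinel objects are hypothetical; the "deleted" marker is an assumption of this sketch, needed so that a search does not stop early at a slot that was emptied by a deletion.

```python
_EMPTY = object()      # NIL: the slot has never been used
_DELETED = object()    # marker left behind by a deletion

class LinearProbingSet:
    def __init__(self, capacity=8):
        self.slots = [_EMPTY] * capacity

    def _probe(self, key):
        """Yield slot indices starting at the hashed position, wrapping once."""
        start = hash(key) % len(self.slots)
        for step in range(len(self.slots)):
            yield (start + step) % len(self.slots)

    def add(self, key):
        free = None                      # first reusable slot on the path
        for i in self._probe(key):
            slot = self.slots[i]
            if slot is _EMPTY:
                if free is None:
                    free = i
                break                    # key cannot be further along
            if slot is _DELETED:
                if free is None:
                    free = i
            elif slot == key:
                return                   # already present
        if free is None:
            raise OverflowError("table is full")
        self.slots[free] = key

    def contains(self, key):
        for i in self._probe(key):
            slot = self.slots[i]
            if slot is _EMPTY:
                return False             # reached NIL: key is absent
            if slot is not _DELETED and slot == key:
                return True
        return False

    def remove(self, key):
        for i in self._probe(key):
            slot = self.slots[i]
            if slot is _EMPTY:
                return False
            if slot is not _DELETED and slot == key:
                self.slots[i] = _DELETED
                return True
        return False

s = LinearProbingSet(capacity=8)
for key in (1, 9, 17):     # all three hash to slot 1 in a table of 8
    s.add(key)
```

The three colliding keys end up in consecutive slots, which shows both how probing resolves collisions and why long runs of occupied slots push the worst case toward O(n).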

Hashing seems better than a BST for all of these operations. But in hashing, elements are unordered, whereas in a BST elements are stored in an ordered manner. Also, a BST is easy to implement, while hash functions can sometimes be very complex to design. In a BST, we can also efficiently find the floor and ceiling of a value.

<b>Example:</b><br> Hashing can be used to remove duplicates from a set of elements. It can also be used to find the frequency of all items. For example, in web browsers, we can check visited URLs using hashing. In firewalls, we can hash IP addresses to detect spam. Hashing can be used in any situation where we want search(), insert() and delete() in O(1) time on average.

# -----------------------------------------------------------------------------
# -----------------------------------------------------------------------------

# Overview of Data Structures (Graph, Trie, Segment Tree and Suffix Tree)

In this section, the following Data Structures are discussed.

9. Graph
10. Trie
11. Segment Tree
12. Suffix Tree

## Graph

A graph is a data structure that consists of the following two components:
- A finite set of vertices, also called nodes.
- A finite set of ordered pairs of the form (u, v), called edges. 
The pair is ordered because (u, v) is not the same as (v, u) in a directed graph (di-graph). <br>A pair of the form (u, v) indicates that there is an edge from vertex u to vertex v. <br>The edges may carry a weight/value/cost.
- V -> Number of Vertices.
- E -> Number of Edges.

Graphs can be classified in many ways; below are the two most common classifications:

### Direction :
- <b>Undirected Graph:</b> The graph in which all the edges are bidirectional.
- <b>Directed Graph:</b> The graph in which all the edges are unidirectional.

### Weight :
- <b>Weighted Graph:</b> The Graph in which weight is associated with the edges.
- <b>Unweighted Graph:</b> The graph in which there is no weight associated with the edges.

Graphs can be represented in many ways; below are the two most common representations:

Let us take the example graph below to see the two representations.

![Graph.jpg](attachment:Graph.jpg)
    
### <center>1. Adjacency Matrix Representation of the above graph</center>
    
![Adjacency%20List.jpg](attachment:Adjacency%20List.jpg)

### <center>2. Adjacency List Representation of the above Graph</center>
    
![Adjacency%20Matrix.png](attachment:Adjacency%20Matrix.png)

```
Time Complexities in case of Adjacency Matrix :
Traversal :(By BFS or DFS) O(V^2)
Space : O(V^2)

Time Complexities in case of Adjacency List :
Traversal :(By BFS or DFS) O(V+E)
Space : O(V+E)
```
<b>Examples:</b><br> The most common use of graphs is to find the shortest path in a network, as in mapping services such as Google Maps or Bing Maps. <br>Another common application is social networking websites, where friend suggestions depend on the number of intermediate connections, among other things.
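
A minimal sketch of the adjacency list representation and a breadth-first shortest path (fewest edges, unweighted graph) follows. The function names and the edge list are hypothetical; the edges are made up for illustration and are not read from the figure.

```python
from collections import deque

def build_adjacency_list(num_vertices, edges, directed=False):
    """O(V + E) space: one neighbour list per vertex."""
    adj = [[] for _ in range(num_vertices)]
    for u, v in edges:
        adj[u].append(v)
        if not directed:
            adj[v].append(u)     # an undirected edge is stored both ways
    return adj

def bfs_shortest_path(adj, source, target):
    """Fewest-edge path from source to target, or None: O(V + E)."""
    parent = {source: None}      # doubles as the visited set
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if u == target:
            path = []
            while u is not None: # follow parents back to the source
                path.append(u)
                u = parent[u]
            return path[::-1]
        for v in adj[u]:
            if v not in parent:
                parent[v] = u
                queue.append(v)
    return None

# Hypothetical undirected graph on vertices 0..4.
edges = [(0, 1), (0, 4), (1, 2), (1, 3), (1, 4), (2, 3), (3, 4)]
adj = build_adjacency_list(5, edges)
```

Every vertex is enqueued at most once and every neighbour list is scanned at most once, which is where the O(V+E) traversal bound for adjacency lists comes from.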

# -----------------------------------------------------------------------------

## Trie

Trie is an efficient data structure for searching words in dictionaries; search complexity with a trie is linear in the length of the word (or key) being searched. <br>If we store keys in a binary search tree, a well-balanced BST needs time proportional to <b>M * log N</b>, where M is the maximum string length and N is the number of keys in the tree. <br>Using a trie, we can search for a key in <b>O(M)</b> time, so it is faster than a BST.

Hashing also provides word search in <b>O(M)</b> time on average, since computing the hash of a word reads all M of its characters. 

<b>Advantages:</b><br>There are no collisions (unlike hashing), so the worst-case time complexity is also <b>O(M)</b>. <br>Most importantly, a trie supports prefix search: we can find all words beginning with a given prefix, which hashing does not support directly. 

<b>Disadvantages:</b><br>The main problem with tries is that they require a lot of extra space. <br>Tries are also known as prefix trees or digital trees; a radix tree is a space-optimized (compressed) trie.

The Trie structure can be defined as follows :

```
#define ALPHABET_SIZE 26

typedef struct trie_node
{
    int value; /* Used to mark leaf nodes (end of a key) */
    struct trie_node *children[ALPHABET_SIZE];
} trie_node_t;
```
![Trie.png](attachment:Trie.png)

<center> The leaf nodes are in <b>blue. </b></center>


```
Insert time : O(M) where M is the length of the string.
Search time : O(M) where M is the length of the string.
Space : O(ALPHABET_SIZE * M * N) where N is number of 
        keys in trie, ALPHABET_SIZE is 26 if we are 
        only considering upper case Latin characters.
Deletion time : O(M)
```

<b>Example:</b><br> The most common use of tries is to implement dictionaries, thanks to their prefix search capability. Tries are also well suited to implementing approximate matching algorithms, including those used in spell checking.<br> They are also used to search for contacts in a mobile contact list or phone directory.
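
Insertion, search and prefix search can be sketched as follows. Unlike the fixed-size `children` array in the struct above, this sketch assumes a dictionary per node, which trades the constant-time array index for lower memory use; the names `TrieNode`, `Trie`, `starts_with` and `_walk` are hypothetical.

```python
class TrieNode:
    def __init__(self):
        self.children = {}     # character -> TrieNode
        self.is_end = False    # True if a key ends at this node

class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, word):
        """O(M): create at most one node per character."""
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_end = True

    def _walk(self, prefix):
        """Follow `prefix` from the root; return the last node or None."""
        node = self.root
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return None
        return node

    def search(self, word):
        """O(M): the word is present only if its path ends on a key."""
        node = self._walk(word)
        return node is not None and node.is_end

    def starts_with(self, prefix):
        """Collect every stored word that begins with `prefix`."""
        start = self._walk(prefix)
        if start is None:
            return []
        words, stack = [], [(start, prefix)]
        while stack:
            node, text = stack.pop()
            if node.is_end:
                words.append(text)
            for ch, child in node.children.items():
                stack.append((child, text + ch))
        return words

trie = Trie()
for w in ["the", "then", "there", "answer", "any", "bye"]:
    trie.insert(w)
```

`starts_with` returns the matches in no particular order, so callers that need a sorted listing should sort the result.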

# -----------------------------------------------------------------------------

## Segment Tree

This data structure is typically used when there are many queries on a set of values.<br> These queries ask for the minimum, maximum, sum, etc. over an input range of the given set. <br>Queries may also update values in the given set. 

Segment trees are usually implemented using an array.

![Segment%20Tree.jpg](attachment:Segment%20Tree.jpg)

```
Construction of segment tree : O(N)
Query : O(log N)
Update : O(log N)
Space : O(N) [2*N-1 nodes; an array layout may allocate more slots]
```
<b>Example:</b><br> It is used when we need to find the maximum/minimum/sum/product of numbers in a range.
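
A range-sum segment tree can be sketched as follows. This uses a bottom-up array layout, one of several common variants, in which the leaves occupy indices N to 2N-1 and the parent of index i is i // 2; the class name `SumSegmentTree` and the half-open query convention are assumptions of this sketch.

```python
class SumSegmentTree:
    def __init__(self, values):
        self.n = len(values)
        # Index 0 is unused; leaves sit at n .. 2n-1.
        self.tree = [0] * self.n + list(values)
        for i in range(self.n - 1, 0, -1):          # O(N) construction
            self.tree[i] = self.tree[2 * i] + self.tree[2 * i + 1]

    def update(self, index, value):
        """O(log N): overwrite one leaf, then repair its ancestors."""
        i = index + self.n
        self.tree[i] = value
        i //= 2
        while i >= 1:
            self.tree[i] = self.tree[2 * i] + self.tree[2 * i + 1]
            i //= 2

    def query(self, left, right):
        """O(log N): sum of values[left:right] (right is exclusive)."""
        total = 0
        lo, hi = left + self.n, right + self.n
        while lo < hi:
            if lo & 1:                 # lo is a right child: take it, move on
                total += self.tree[lo]
                lo += 1
            if hi & 1:                 # hi is exclusive: step left, take it
                hi -= 1
                total += self.tree[hi]
            lo //= 2
            hi //= 2
        return total

st = SumSegmentTree([1, 3, 5, 7, 9, 11])
```

Swapping `+` for `min` or `max` (with a matching identity value in place of 0) gives the other range queries mentioned above.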

# -----------------------------------------------------------------------------

## Suffix Tree

A suffix tree is mainly used to search for a pattern in a text. The idea is to preprocess the text so that a search can be done in time linear in the pattern length. <br>Pattern searching algorithms like KMP, Z, etc. take time proportional to the text length, so a suffix tree is a great improvement when the pattern is much shorter than the text, as is generally the case.<br>
Imagine we have stored the complete works of William Shakespeare and preprocessed them. You can then search for any string in the complete works in time proportional only to the length of the pattern. But using a suffix tree may not be a good idea when the text changes frequently, as in a text editor.

A suffix tree is a compressed trie of all suffixes of the text, so the following are the high-level steps to build a suffix tree from a given text.

1. Generate all suffixes of given text.
2. Consider all suffixes as individual words and build a compressed trie.

![Suffix%20tree.jpg](attachment:Suffix%20tree.jpg)

<b>Example:</b><br> Used to find all occurrences of a pattern in a string. It is also used to find the longest repeated substring (when the text doesn’t change often), the longest common substring and the longest palindrome in a string.
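
The two construction steps can be sketched with an uncompressed trie of suffixes. This is a deliberate simplification, not a real suffix tree: it skips the compression step and needs O(n^2) time and space to build, so it only illustrates why a search costs time proportional to the pattern length. The function names and the node layout (a dictionary with `children` and `starts`) are hypothetical.

```python
def build_suffix_trie(text):
    """Insert every suffix of `text`; each node records where its path occurs."""
    root = {"children": {}, "starts": []}
    for start in range(len(text)):            # step 1: every suffix
        node = root
        for ch in text[start:]:               # step 2: insert it into the trie
            node = node["children"].setdefault(
                ch, {"children": {}, "starts": []}
            )
            node["starts"].append(start)      # this path occurs at `start`
    return root

def find_all(root, pattern):
    """Follow the pattern from the root: one step per pattern character."""
    node = root
    for ch in pattern:
        node = node["children"].get(ch)
        if node is None:
            return []                         # pattern does not occur
    return node["starts"]

trie = build_suffix_trie("banana")
```

Every occurrence of a pattern is a prefix of some suffix, so reaching a node by following the pattern is enough to list all starting positions.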

# -----------------------------------------------------------------------------
# -----------------------------------------------------------------------------

# Abstract Data Types (ADT)

Abstract Data Type (ADT) is a type (or class) for objects whose behavior is defined by a set of values and a set of operations.<br>
The definition of ADT only mentions what operations are to be performed but not how these operations will be implemented. It does not specify how data will be organized in memory and what algorithms will be used for implementing the operations. <br>It is called <b>“abstract”</b> because it gives an implementation independent view. 

<b>The process of providing only the essentials and hiding the details is known as abstraction.</b><br>
The user of a data type need not know how that data type is implemented. For example, we have been using the int, float and char data types knowing only the values they can take and the operations that can be performed on them, without any idea of how these types are implemented. <br>So a user only needs to know what a data type can do, not how it does it. We can think of an ADT as a black box which hides the inner structure and design of the data type. 

Now we’ll define three ADTs, namely:
- List ADT
- Stack ADT
- Queue ADT

## List ADT

A list contains elements of the same type arranged in sequential order.<br>
The following operations can be performed on the list:

- <b>get()</b> – Return an element from the list at any given position.
- <b>insert()</b> – Insert an element at any position of the list.
- <b>remove()</b> – Remove the first occurrence of any element from a non-empty list.
- <b>removeAt()</b> – Remove the element at a specified location from a non-empty list.
- <b>replace()</b> – Replace an element at any position by another element.
- <b>size()</b> – Return the number of elements in the list.
- <b>isEmpty()</b> – Return true if the list is empty, otherwise return false.
- <b>isFull()</b> – Return true if the list is full, otherwise return false.

## Stack ADT

A Stack contains elements of the same type arranged in sequential order. All operations take place at a single end, the top of the stack.<br>
Following operations can be performed on the Stack:

- <b>push()</b> – Insert an element at one end of the stack called top.
- <b>pop()</b> – Remove and return the element at the top of the stack, if it is not empty.
- <b>peek()</b> – Return the element at the top of the stack without removing it, if the stack is not empty.
- <b>size()</b> – Return the number of elements in the stack.
- <b>isEmpty()</b> – Return true if the stack is empty, otherwise return false.
- <b>isFull()</b> – Return true if the stack is full, otherwise return false.

## Queue ADT

A Queue contains elements of the same type arranged in sequential order. Operations take place at both ends: insertion is done at the end and deletion is done at the front. <br>
Following operations can be performed on the Queue:

- <b>enqueue()</b> – Insert an element at the end of the queue.
- <b>dequeue()</b> – Remove and return the first element of queue, if the queue is not empty.
- <b>peek()</b> – Return the first element of the queue without removing it, if the queue is not empty.
- <b>size()</b> – Return the number of elements in the queue.
- <b>isEmpty()</b> – Return true if the queue is empty, otherwise return false.
- <b>isFull()</b> – Return true if the queue is full, otherwise return false.

From these definitions, we can clearly see that they do not specify how these ADTs will be represented or how the operations will be carried out. <br>There can be different ways to implement an ADT.<br>For example, the List ADT can be implemented using arrays, a singly linked list or a doubly linked list. <br>Similarly, the Stack ADT and Queue ADT can be implemented using arrays or linked lists.
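
The separation between an ADT and its implementations can be sketched with an abstract base class. The names `StackADT`, `ArrayStack` and `LinkedStack` are hypothetical, the method names are adapted to Python style, and `isFull()` is left out on the assumption that neither implementation has a fixed capacity.

```python
from abc import ABC, abstractmethod

class StackADT(ABC):
    """States what a stack does, not how it does it."""

    @abstractmethod
    def push(self, item): ...

    @abstractmethod
    def pop(self): ...

    @abstractmethod
    def peek(self): ...

    @abstractmethod
    def size(self): ...

    def is_empty(self):
        return self.size() == 0

class ArrayStack(StackADT):
    """Implementation 1: a dynamic array; the top is the last element."""
    def __init__(self):
        self._items = []

    def push(self, item):
        self._items.append(item)

    def pop(self):
        return self._items.pop()       # raises IndexError when empty

    def peek(self):
        return self._items[-1]         # raises IndexError when empty

    def size(self):
        return len(self._items)

class LinkedStack(StackADT):
    """Implementation 2: a singly linked list; the top is the head node."""
    def __init__(self):
        self._top = None               # each node is a (data, next) pair
        self._size = 0

    def push(self, item):
        self._top = (item, self._top)
        self._size += 1

    def pop(self):
        if self._top is None:
            raise IndexError("pop from empty stack")
        item, self._top = self._top
        self._size -= 1
        return item

    def peek(self):
        if self._top is None:
            raise IndexError("peek at empty stack")
        return self._top[0]

    def size(self):
        return self._size
```

Code written against `StackADT` behaves the same with either class, which is the implementation-independent view that the word "abstract" refers to.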


## Reference: https://en.wikipedia.org/wiki/Abstract_data_type