# Skip Lists

Skip pointers are added to a posting list at indexing time, with one skip pointer placed every `sqrt(L)` positions, where `L` is the length of the posting list.
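A back-of-the-envelope cost model suggests why `sqrt(L)` is a natural spacing. This model is an assumption added here, not taken from the original notes: with a skip pointer every `k` positions, walking to a posting near the end of the list takes at most about `L/k` skips plus at most `k - 1` single steps inside the last block. Minimizing that sum over `k`:

$$
C(k) \approx \frac{L}{k} + (k - 1), \qquad
\frac{dC}{dk} = 1 - \frac{L}{k^{2}} = 0 \;\Rightarrow\; k = \sqrt{L}, \qquad
C(\sqrt{L}) \approx 2\sqrt{L} - 1
$$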

### Tradeoffs
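1. Depending on the distribution of the inputs (e.g. when many of the values we look for fall between the same two skip pointers), there won't be much skipping.
2. If we increase the number of skip pointers (shorter skip span), we make more comparisons: every position that has a skip pointer also needs a comparison against its skip target, and each successful skip covers less distance.
3. If we reduce the number of skip pointers (longer skip span), each skip covers more distance, but fewer skips succeed.
4. Skip pointers suffer when `L` keeps changing because of updates (a dynamically changing posting list): the `sqrt(L)` spacing goes stale and the pointers have to be rebuilt.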
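To make tradeoffs 2 and 3 concrete, here is a small sketch (not from the original notebook) that counts comparisons for a single lookup from the start of a sorted posting list, averaged over every possible target. It assumes docIDs `0..n-1`, a skip pointer at every index that is a multiple of `span`, and that checking a skip target costs one comparison; `lookup_comparisons` and `average_cost` are hypothetical helpers.

```python
import math


def lookup_comparisons(postings, target, span):
    """Count comparisons to find `target` in a sorted list (hypothetical helper).

    A skip pointer sits at every index that is a multiple of `span`;
    span=0 means no skip pointers at all.
    """
    n = len(postings)
    comparisons = 0
    i = 0
    while i < n:
        comparisons += 1  # compare the current posting with the target
        if postings[i] >= target:
            return comparisons  # found it, or passed where it would be
        j = i + span
        if span > 0 and i % span == 0 and j < n:
            comparisons += 1  # extra comparison against the skip target
            if postings[j] <= target:
                i = j  # successful skip
                continue
        i += 1
    return comparisons


def average_cost(n, span):
    """Average comparisons over all targets in a posting list 0..n-1."""
    postings = list(range(n))
    return sum(lookup_comparisons(postings, t, span) for t in postings) / n


n = 100
for span in (0, 2, int(math.sqrt(n)), 50):
    print(f"span={span:>2}: {average_cost(n, span):.1f} comparisons on average")
```

Under these assumptions a span of 2 does about as badly as having no skip pointers, because almost every step pays for an extra skip check, while a span of 50 rarely gets to skip; a span of `int(sqrt(n))` does best of the four.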


### Important notes
Queries on a skip list are always bounded by the length of the posting list. For example, if `word1` has a posting list of length `N`, a query on it takes **O(N)** time in the worst case, when nothing is skipped.
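The bound holds because a query only ever moves forward through each posting list. The sketch below illustrates this for a two-term query: it intersects two sorted posting lists with a merge-style walk extended with skip pointers (placed every `int(sqrt(len))` positions) and counts how many times the loop advances. `add_skips` and `intersect_with_skips` are hypothetical helpers, separate from the notebook's `SkipList`. Every iteration advances at least one pointer, so the count never exceeds the combined length of both lists; with interleaved lists no skip ever succeeds and that bound is almost reached.

```python
import math


def add_skips(postings):
    """Map index -> skip target, one pointer every int(sqrt(len)) positions."""
    span = int(math.sqrt(len(postings)))
    if span <= 1:
        return {}  # too short for skips to help
    # Only keep pointers whose target is inside the list
    return {i: i + span for i in range(0, len(postings) - span, span)}


def intersect_with_skips(p1, p2):
    """Intersect two sorted posting lists; return (matches, steps)."""
    s1, s2 = add_skips(p1), add_skips(p2)
    i = j = steps = 0
    matches = []
    while i < len(p1) and j < len(p2):
        steps += 1  # each iteration advances i, j, or both
        if p1[i] == p2[j]:
            matches.append(p1[i])
            i += 1
            j += 1
        elif p1[i] < p2[j]:
            # Follow p1's skip pointer only if its target does not overshoot p2[j]
            if i in s1 and p1[s1[i]] <= p2[j]:
                i = s1[i]
            else:
                i += 1
        else:
            # Same rule for p2 against p1[i]
            if j in s2 and p2[s2[j]] <= p1[i]:
                j = s2[j]
            else:
                j += 1
    return matches, steps


# Worst case: interleaved lists, so every skip check fails
odd, even = list(range(1, 200, 2)), list(range(2, 201, 2))
print(intersect_with_skips(odd, even))

# A sparse list against a dense one: skips on the dense list succeed
print(intersect_with_skips([50, 150], list(range(1, 201))))
```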

## The Node

Each node stores its value and, optionally, the index its skip pointer points to.

```python
import math


class Node:
    """A posting-list entry: its value plus an optional skip pointer."""

    def __init__(self, value, skip_index=None):
        self.value = value
        self.skip_index = skip_index  # index to jump to, or None

    def getValue(self):
        return self.value

    def skip(self):
        return self.skip_index
```

## The List

This class wraps a Python list of `Node` objects.

```python
# Condition: input data must be sorted
class SkipList:
    def __init__(self, data=None):
        data = data if data is not None else []
        self.skip_list = []

        length = len(data)
        # One skip pointer every int(sqrt(length)) positions
        skip_length = int(math.sqrt(length))

        curr_skip = 0
        for i in range(length):
            if i == curr_skip:
                # The last pointer may point past the end; find() ignores it
                self.skip_list.append(Node(data[i], i + skip_length))
                curr_skip = i + skip_length
            else:
                self.skip_list.append(Node(data[i]))

    def get(self, index):
        if index < 0 or index >= len(self.skip_list):
            return -1  # -1 signals out of bounds
        return self.skip_list[index].getValue()

    def getSkipIndex(self, index):
        if index < 0 or index >= len(self.skip_list):
            return -1  # -1 signals out of bounds
        return self.skip_list[index].skip()

    def getLength(self):
        return len(self.skip_list)

    def find(self, value, start_index=0):
        """Return the index of `value` (searching from start_index), or -1."""
        n = len(self.skip_list)
        # A while loop is needed: reassigning the variable of a for loop
        # does not change the next iteration, so nothing would be skipped
        i = start_index
        while i < n:
            node = self.skip_list[i]
            if node.getValue() == value:
                return i
            if node.getValue() > value:
                return -1  # sorted input: value cannot appear later

            skip_index = node.skip()
            if skip_index is not None and skip_index < n:
                skip_value = self.skip_list[skip_index].getValue()
                # Skip target is still smaller than the one we want: jump to it
                if skip_value < value:
                    print("skipped", i)
                    i = skip_index
                    continue
                # Skip target is the one we want: return it
                if skip_value == value:
                    print("skipped", i)
                    return skip_index
            i += 1
        return -1
```

```python
# Create a skip list holding 1 to 10
myList = SkipList([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
```

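**Observe that skip pointers are placed only every 3 positions**, since `int(sqrt(10)) = 3`. The last pointer, at index 9, points to index 12, which is past the end of the list, so `find` never follows it.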

```python
for i in range(myList.getLength()):
    print(myList.getSkipIndex(i))
```

```
3
None
None
6
None
None
9
None
None
12
```


**Notice how the search follows the skip pointers at indices 0, 3 and 6 to reach index 9.**

```python
print(myList.find(10, 0))
```

```
skipped 0
skipped 3
skipped 6
9
```


# Extensions

Skip pointers can also be used in inverted indexes (to speed up intersecting the posting lists of query terms) and in positional indexes.

### Positional indexing

For example, to find the phrase "to be" within the **same document**, once we have located an occurrence of "to" in that document, we can use skip pointers on the position list of "be" to jump over positions that come before it.
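A minimal sketch of that idea, assuming each word's positions within the document are already available as a sorted list. The helpers `skip_pointers` and `phrase_matches` are hypothetical, and skips are placed every `int(sqrt(len))` positions as elsewhere in this notebook:

```python
import math


def skip_pointers(positions):
    """Map index -> skip target, one pointer every int(sqrt(len)) positions."""
    span = int(math.sqrt(len(positions)))
    if span <= 1:
        return {}
    return {i: i + span for i in range(0, len(positions) - span, span)}


def phrase_matches(first_positions, second_positions):
    """Positions p where the first word occurs at p and the second at p + 1."""
    skips = skip_pointers(second_positions)
    matches = []
    j = 0
    for p in first_positions:
        wanted = p + 1
        # Move through the second word's positions, skipping when the
        # skip target does not pass the position we want
        while j < len(second_positions) and second_positions[j] < wanted:
            if j in skips and second_positions[skips[j]] <= wanted:
                j = skips[j]
            else:
                j += 1
        if j == len(second_positions):
            break  # no later positions of the second word
        if second_positions[j] == wanted:
            matches.append(p)
    return matches


# "to be or not to be": "to" at positions 0 and 4, "be" at 1 and 5
print(phrase_matches([0, 4], [1, 5]))  # prints [0, 4]
```

The cursor `j` over the positions of "be" only moves forward, so a single pass over both position lists is enough.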