# [CptS 215 Data Analytics Systems and Algorithms](https://github.com/gsprint23/cpts215)
[Washington State University](https://wsu.edu)

[Gina Sprint](http://eecs.wsu.edu/~gsprint/)
# Lists, Linked Lists, and Sets

Learner objectives for this lesson:
* Review lists, sets, and linked lists
* Perform algorithm analysis of lists, sets, and linked lists

## Acknowledgments
Content used in this lesson is based upon information in the following sources:
* [Miller and Ranum](http://interactivepython.org/runestone/static/pythonds/index.html)
* [Python Sets](https://docs.python.org/3/tutorial/datastructures.html#sets)

## Linear vs Non-Linear Data Structures
Stacks, queues, deques, and lists are examples of data collections whose items are ordered depending on how they are added or removed. Once an item is added, it stays in that position relative to the other elements that came before and came after it. Collections such as these are often referred to as linear data structures. Linear structures can be thought of as having two ends. Sometimes these ends are referred to as the "left" and the "right" or in some cases the "front" and the "rear." You could also call them the "top" and the "bottom." The names given to the ends are not significant. What distinguishes one linear structure from another is the way in which items are added and removed, in particular the location where these additions and removals occur. For example, a structure might allow new items to be added at only one end. Some structures might allow items to be removed from either end.

Sets, dictionaries, trees, and graphs are examples of non-linear data collections. For non-linear structures, the relationship of adjacency is not maintained between the elements.
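To make the idea of "two ends" concrete, here is a minimal sketch using `collections.deque` from the Python standard library, a linear structure that allows additions and removals at both ends. The variable names are illustrative only.

```python
from collections import deque

# A deque is a linear structure that allows additions and removals at both ends.
line = deque(["b", "c"])
line.append("d")        # add at the rear (right end)
line.appendleft("a")    # add at the front (left end)
print(list(line))       # ['a', 'b', 'c', 'd']

front = line.popleft()  # remove from the front
rear = line.pop()       # remove from the rear
print(front, rear, list(line))  # a d ['b', 'c']
```

Notice that the items that remain keep their relative positions no matter which end is modified.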

## Lists
A list is a *sequence of items*. In a string, the items are characters. In a list, they can be any type. Items in a list are also called *elements*. Here are common operations associated with the list implementation in Python:

|Operation|Time Complexity|Notes|
|---------|---------------|-----|
|Index []|$\mathcal{O}(1)$||
|Index assignment|$\mathcal{O}(1)$||
|`append()`|$\mathcal{O}(1)$||
|`pop()` (last item in list)|$\mathcal{O}(1)$||
|`pop(i)` (item at index `i`)|$\mathcal{O}(n)$||
|`insert(i, item)` (item at index `i`)|$\mathcal{O}(n)$||
|`del` (delete item)|$\mathcal{O}(n)$||
|Iteration|$\mathcal{O}(n)$||
|`in` (membership)|$\mathcal{O}(n)$||
|Slice `[x:y]`|$\mathcal{O}(k)$|$k$ is the size of the slice|
|`del` (deleted slice)|$\mathcal{O}(n)$||
|Slice assignment|$\mathcal{O}(n + k)$||
|`reverse()`|$\mathcal{O}(n)$||
|Concatenation|$\mathcal{O}(k)$|$k$ is the size of the list being concatenated|
|`sort()`|$\mathcal{O}(n \log_{2} n)$||
|Multiply (`*`)|$\mathcal{O}(nk)$|$k$ is the number of repetitions|
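One way to see the difference between an $\mathcal{O}(1)$ and an $\mathcal{O}(n)$ operation from the table is to time them. The sketch below uses the standard `timeit` module; the helper `time_pops` and the chosen list size are assumptions made for illustration, and the exact timings will vary from machine to machine.

```python
import timeit

def time_pops(index_expr, size=100_000, repeats=1_000):
    """Time `repeats` pops on a fresh list of `size` integers (hypothetical helper)."""
    setup = f"data = list(range({size}))"
    return timeit.timeit(f"data.pop({index_expr})", setup=setup, number=repeats)

end_time = time_pops("")     # data.pop(): remove the last item
front_time = time_pops("0")  # data.pop(0): remove the first item, shifting the rest

print(f"pop()  from the end:   {end_time:.6f} s")
print(f"pop(0) from the front: {front_time:.6f} s")
```

Popping from the front should take noticeably longer, since every remaining item has to shift one position to the left.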

A list in Python is an implementation of a list abstract data type. A list ADT may be specified as *unordered* or *ordered* (the items in the list are sorted in some way other than by position). Unless explicitly stated, lists are generally assumed to be unsorted. We will implement an unordered and ordered list ADT using linked lists.

### Unordered List Abstract Data Type
* `List()`: creates a new list that is empty. It needs no parameters and returns an empty list.
* `add(item)`: adds a new item to the list. It needs the item and returns nothing. 
* `remove(item)`: removes the item from the list. It needs the item and modifies the list. 
* `search(item)`: searches for the item in the list. It needs the item and returns a boolean value.
* `is_empty()`: tests to see whether the list is empty. It needs no parameters and returns a boolean value.
* `size()`: returns the number of items in the list. It needs no parameters and returns an integer.
* `append(item)`: adds a new item to the end of the list making it the last item in the collection. It needs the item and returns nothing.
* `index(item)`: returns the position of item in the list. It needs the item and returns the index.
* `insert(pos, item)`: adds a new item to the list at position `pos`. It needs the position and the item and returns nothing.
* `pop()`: removes and returns the last item in the list. It needs nothing and returns an item.
* `pop(pos)`: removes and returns the item at position `pos`. It needs the position and returns the item.

### Ordered List Abstract Data Type
* `OrderedList()`: creates a new ordered list that is empty. It needs no parameters and returns an empty list.
* `add(item)`: adds a new item to the list making sure that the order is preserved. It needs the item and returns nothing.
* `remove(item)`: removes the item from the list. It needs the item and modifies the list.
* `search(item)`: searches for the item in the list. It needs the item and returns a boolean value.
* `is_empty()`: tests to see whether the list is empty. It needs no parameters and returns a boolean value.
* `size()`: returns the number of items in the list. It needs no parameters and returns an integer.
* `index(item)`: returns the position of item in the list. It needs the item and returns the index. 
* `pop()`: removes and returns the last item in the list. It needs nothing and returns an item. 
* `pop(pos)`: removes and returns the item at position `pos`. It needs the position and returns the item.

## Linked Lists
A linked list is a list of linked items with no requirement that the items are stored contiguously in memory. If we know the location of the first item in memory, the first item will tell us where the second item is (they are linked), and the second item will tell us where the third item is, and so on. We need to store an external reference to the first item in the list to maintain access to the list. This external reference is called the *head* of the list. 

Each item in the list is stored in a *node*. A node is the building block of this list, storing the item data and a link to the subsequent node in the list. 

<img src="https://upload.wikimedia.org/wikipedia/commons/1/1b/C_language_linked_list.png" width="500">
(image from [https://upload.wikimedia.org/wikipedia/commons/1/1b/C_language_linked_list.png](https://upload.wikimedia.org/wikipedia/commons/1/1b/C_language_linked_list.png))

There are many different implementations of linked lists, such as singly linked, doubly linked (nodes also store links to the previous nodes in the list), ordered, circular linked (the last node stores a link to the first node), and with/without a dummy head node. Which implementation to use depends on the needs of the application.
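As a rough illustration of one of these variants, the sketch below builds a small circular doubly linked list. The names `DNode` and `link_circular` are hypothetical and are not part of this lesson's classes; this is one possible arrangement, not a full implementation.

```python
class DNode:
    """Hypothetical doubly linked node: stores data plus links in both directions."""
    def __init__(self, data):
        self.data = data
        self.prev = None
        self.next = None

def link_circular(items):
    """Builds a circular doubly linked list from items and returns its first node."""
    head = None
    for item in items:
        node = DNode(item)
        if head is None:
            node.prev = node.next = node  # a single node links to itself
            head = node
        else:
            tail = head.prev              # in a circular list, the tail is head.prev
            tail.next = node
            node.prev = tail
            node.next = head
            head.prev = node
    return head

head = link_circular(["Pullman", "Moscow", "Clarkston"])
print(head.data, head.next.data, head.prev.data)  # Pullman Moscow Clarkston
print(head.prev.next is head)                     # True
```

Because the last node links back to the first, the tail is always reachable from `head` in one step.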

Since linked lists were covered in detail in the prerequisite courses for CptS 215, we will take a look at a singly linked list implementation in detail and analyze the efficiency of common linked list operations such as `insert()` and `remove()`.

### Node Implementation

In [1]:
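class Node:
    '''
    A single node in a singly linked list. Stores an item and a link to the next node.
    '''
    def __init__(self, data):
        '''
        Creates a node holding data. The next link starts as None (no successor).
        '''
        self.data = data
        self.next = None
        
    def __str__(self):
        '''
        Returns the string representation of the node's data.
        '''
        return str(self.data)

    def get_data(self):
        '''
        Returns the item stored in the node.
        '''
        return self.data

    def get_next(self):
        '''
        Returns the next node in the list (None if this is the last node).
        '''
        return self.next

    def set_data(self, newdata):
        '''
        Replaces the item stored in the node with newdata.
        '''
        self.data = newdata

    def set_next(self, newnext):
        '''
        Links this node to newnext, making newnext its successor.
        '''
        self.next = newnext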

### (Unordered) Linked List Implementation

In [2]:
class LinkedList:
    '''
    
    '''
    def __init__(self):
        '''
        Creates a new list that is empty. It needs no parameters and returns an empty list.
        '''
        self.head = None
        
    def __str__(self):
        '''
        
        '''
        list_str = ""
        curr = self.head
        while curr is not None:
            list_str += str(curr)
            list_str += "->"
            curr = curr.get_next()
        list_str += "None"
        return list_str
        
    def add(self, item):
        '''
        Adds a new item to the list. It needs the item and returns nothing. Assume the item is not already in the list.
        '''
        temp = Node(item)
        temp.set_next(self.head)
        self.head = temp
        
    def append(self, item):
        '''
        Adds a new item to the end of the list making it the last item in the collection. 
        It needs the item and returns nothing. Assume the item is not already in the list.
        '''
        temp = Node(item)
        if self.head is None: # empty list: the new node becomes the head
            self.head = temp
        else:
            curr = self.head
            while curr.get_next() is not None:
                curr = curr.get_next()
            curr.set_next(temp)
        
    def insert(self, index, item):
        '''
        Adds a new item to the list at position index. It needs the position and the item and returns nothing. 
        Assume the item is not already in the list and there are enough existing items to have position index.
        '''
        if index == 0:
            self.add(item)
        else: # not adding at front. stop one before location
            curr = self.head
            i = 0
            while curr.get_next() is not None and i < index - 1:
                curr = curr.get_next()
                i += 1
            temp = Node(item)
            temp.set_next(curr.get_next())
            curr.set_next(temp)
        
    def pop(self, index=None):
        '''
        Removes and returns the item at position index. It needs the position and returns the item. 
        If index is not specified, removes and returns the last item in the list.
        Assume the item is in the list.
        '''    
        if index is None:
            index = self.size() - 1

        if index == 0:
            curr = self.head
            self.head = self.head.get_next()
            return curr.get_data() # return the item, not the node
        else: # not popping front. stop one before location
            curr = self.head
            i = 0
            while curr.get_next() is not None and i < index - 1:
                curr = curr.get_next()
                i += 1
            to_pop = curr.get_next()
            curr.set_next(to_pop.get_next())
            return to_pop.get_data() # return the item, not the node
    
    def remove(self, item):
        '''
        Removes the item from the list. It needs the item and modifies the list. Assume the item is present in the list.
        '''
        current = self.head
        previous = None
        found = False
        while not found:
            if current.get_data() == item:
                found = True
            else:
                previous = current
                current = current.get_next()

        if previous == None:
            self.head = current.get_next()
        else:
            previous.set_next(current.get_next())
    
    def search(self, item):
        '''
        Searches for the item in the list. It needs the item and returns the index of the item (-1 if not found).
        Combined a Boolean search(item) with index(item) function.
        '''
        current = self.head
        found = -1
        loc = 0
        while current != None and found == -1:
            if current.get_data() == item:
                found = loc
            else:
                current = current.get_next()
            loc += 1

        return found
    
    def is_empty(self):
        '''
        Tests to see whether the list is empty. It needs no parameters and returns a boolean value.
        '''
        return self.head == None
    
    def size(self):
        '''
        Returns the number of items in the list. It needs no parameters and returns an integer.
        '''
        current = self.head
        count = 0
        while current != None:
            count = count + 1
            current = current.get_next()

        return count

groceries = LinkedList()
groceries.add("Eggs")
groceries.add("Milk")
groceries.add("Apples")
groceries.add("Carrots")
print(groceries)
print(groceries.is_empty())
print(groceries.size())
print("Search index found Milk", groceries.search("Milk"))
groceries.remove("Milk")
print(groceries)
print("Search index found Milk", groceries.search("Milk"))
groceries.append("Cheese")
print(groceries)
print(groceries.pop())
print(groceries)
print(groceries.pop(2))
print(groceries)
groceries.insert(2, "Chips")
print(groceries)

Carrots->Apples->Milk->Eggs->None
False
4
Search index found Milk 2
Carrots->Apples->Eggs->None
Search index found Milk -1
Carrots->Apples->Eggs->Cheese->None
Cheese
Carrots->Apples->Eggs->None
Eggs
Carrots->Apples->None
Carrots->Apples->Chips->None


## Ordered Linked Lists
We will now consider a type of list known as an ordered list. For example, an ordered list (ascending order) containing the integers 17, 26, 31, 54, 77, and 93 stores them in exactly that sequence. Since 17 is the smallest item, it occupies the first position in the list. Likewise, since 93 is the largest, it occupies the last position.

The structure of an ordered list is a collection of items where each item holds a relative position that is based upon some underlying characteristic of the item. The ordering is typically either ascending or descending and we assume that list items have a meaningful comparison operation that is already defined. Many of the ordered list operations are the same as those of the unordered list.

### Ordered Linked List Implementation

In [3]:
class OrderedLinkedList(LinkedList):
    '''
    
    '''
    def __init__(self):
        '''
        Creates a new ordered list that is empty. It needs no parameters and returns an empty list.
        '''
        super().__init__() # LinkedList.__init__ sets self.head to None
        
        
    def add(self, item):
        '''
        Adds a new item to the list making sure that the order is preserved.
        It needs the item and returns nothing. Assume the item is not already in the list.
        '''
        current = self.head
        previous = None
        stop = False
        while current != None and not stop:
            if current.get_data() > item:
                stop = True
            else:
                previous = current
                current = current.get_next()

        temp = Node(item)
        if previous == None:
            temp.set_next(self.head)
            self.head = temp
        else:
            temp.set_next(current)
            previous.set_next(temp)
        
    def search(self, item):
        '''
        Searches for the item in the list. It needs the item and returns the index of the item (-1 if not found).
        Combined a Boolean search(item) with index(item) function.
        Early exit based on ordered nature of OrderedLinkedList.
        '''
        current = self.head
        found = -1
        loc = 0
        stop = False
        while current != None and found == -1 and not stop:
            if current.get_data() == item:
                found = loc
            else:
                if current.get_data() > item:
                    stop = True
                else:
                    current = current.get_next()
            loc += 1

        return found
        

groceries = OrderedLinkedList()
groceries.add("Eggs")
groceries.add("Milk")
groceries.add("Apples")
groceries.add("Carrots")
print(groceries)
print(groceries.is_empty())
print(groceries.size())
print("Search index found Milk", groceries.search("Milk"))
groceries.remove("Milk")
print(groceries)
print("Search index found Milk", groceries.search("Milk"))
print(groceries.pop())
print(groceries)
print(groceries.pop(1))
print(groceries)
groceries.insert(2, "Chips")
print(groceries)

Apples->Carrots->Eggs->Milk->None
False
4
Search index found Milk 3
Apples->Carrots->Eggs->None
Search index found Milk -1
Eggs
Apples->Carrots->None
Carrots
Apples->None
Apples->Chips->None


### Linked List Time Complexities

|Operation|Time Complexity|Notes|
|---------|---------------|-----|
|`add(item)`|$\mathcal{O}(1)$|$\mathcal{O}(n)$ for an ordered linked list (have to traverse nodes to find insertion location)|
|`append()`|$\mathcal{O}(n)$|Can be $\mathcal{O}(1)$ with a circular doubly linked list or by adding a `tail` reference to the `LinkedList` class|
|`insert(i, item)` (item at index `i`)|$\mathcal{O}(n)$||
|`pop(0)` (first item in list)|$\mathcal{O}(1)$||
|`pop()` (last item in list)|$\mathcal{O}(n)$|Has to traverse nodes to reach the end of the list|
|`remove(item)`|$\mathcal{O}(n)$||
|`search(item)`|$\mathcal{O}(n)$||
|`size()`|$\mathcal{O}(n)$|Can be $\mathcal{O}(1)$ if a counter attribute is maintained|
|`is_empty()`|$\mathcal{O}(1)$||

Note: You may also have noticed that the performance of this implementation differs from the actual performance given earlier for Python lists. This suggests that linked lists are not the way Python lists are implemented. The actual implementation of a Python list is based on the notion of an array.
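The table notes that `append()` and `size()` can be made $\mathcal{O}(1)$. A minimal sketch of that idea follows, assuming a separate, hypothetical `TailedLinkedList` class (with its own small `TailNode`) rather than a change to the `LinkedList` class above.

```python
class TailNode:
    """Hypothetical minimal node, separate from the lesson's Node class."""
    def __init__(self, data):
        self.data = data
        self.next = None

class TailedLinkedList:
    """Singly linked list sketch that keeps head, tail, and a size counter."""
    def __init__(self):
        self.head = None
        self.tail = None
        self.count = 0

    def append(self, item):
        """O(1): link the new node after the current tail, no traversal needed."""
        node = TailNode(item)
        if self.head is None:
            self.head = node
        else:
            self.tail.next = node
        self.tail = node
        self.count += 1

    def size(self):
        """O(1): return the maintained counter instead of walking the list."""
        return self.count

    def to_list(self):
        """O(n): collect the items in order, for display purposes."""
        items = []
        curr = self.head
        while curr is not None:
            items.append(curr.data)
            curr = curr.next
        return items

groceries = TailedLinkedList()
for item in ["Eggs", "Milk", "Apples"]:
    groceries.append(item)
print(groceries.to_list(), groceries.size())  # ['Eggs', 'Milk', 'Apples'] 3
```

The trade-off is bookkeeping: every method that changes the list would also have to keep `tail` and `count` up to date.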

## Sets
A set is an *unordered collection with no duplicate elements*. Basic uses include membership testing and eliminating duplicate entries. Set objects also support mathematical operations like union, intersection, difference, and symmetric difference. 

### Set Abstract Data Type
* `Set()`: creates a new set that is empty. It needs no parameters and returns an empty set.
* `add(item)`: adds a new item to the set making sure that there are no duplicates of the item. It needs the item and returns nothing. 
* `remove(item)`: removes the item from the set. It needs the item and modifies the set. 
* `search(item)`: searches for the item in the set. It needs the item and returns a boolean value.
* `is_empty()`: tests to see whether the set is empty. It needs no parameters and returns a boolean value.
* `size()`: returns the number of items in the set. It needs no parameters and returns an integer.
* `union(others)`: returns a new set with elements from the set and all others. Overloads the `|` operator.
* `intersection(others)`: returns a new set with elements common to the set and all others. Overloads the `&` operator.
* `difference(others)`: returns a new set with elements in the set that are not in the others. Overloads the `-` operator.
* `symmetric_difference(other)`: returns a new set with elements in either the set or other but not both. Overloads the `^` operator.
* `subset(other)`: tests whether every element in the set is in other. Returns a boolean value. Overloads the `<=` operator.
* `proper_subset(other)`: tests whether the set is a proper subset of other, that is, `set <= other and set != other`. Returns a boolean value. Overloads the `<` operator.
* `superset(other)`: tests whether every element in other is in the set. Returns a boolean value. Overloads the `>=` operator.
* `proper_superset(other)`: tests whether the set is a proper superset of other, that is, `set >= other and set != other`. Returns a boolean value. Overloads the `>` operator.

Implementing the set ADT is straightforward using a Python list with restrictions on adding items to the set. This is a practice problem included at the end of this lesson. For now, to show an example of how sets work, we will use the built-in set type in Python. Curly braces or the `set()` function can be used to create sets. 

Note: to create an empty set you have to use `set()`, not `{}`; the latter creates an empty dictionary.
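A quick check of the note above, as a small sketch with illustrative variable names:

```python
empty_braces = {}   # an empty dictionary, not a set
empty_set = set()   # an empty set

print(type(empty_braces))  # <class 'dict'>
print(type(empty_set))     # <class 'set'>

empty_set.add("wa")
empty_set.add("wa")        # adding a duplicate has no effect
print(len(empty_set))      # 1
```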

In [4]:
states = {"wa", "id", "or"}
capitals = set(("olympia", "boise", "portland"))
word = "hello"
word_set = set(word)

print(states)
print(capitals)
print("Unique letters in %s: " %(word), end="")
print(word_set)
print(len(word) == len(word_set))

print(type(states))
print("id" in states)
print("mt" in states)

{'wa', 'or', 'id'}
{'boise', 'portland', 'olympia'}
Unique letters in hello: {'e', 'l', 'o', 'h'}
False
<class 'set'>
True
False


And now testing the union, intersection, difference, symmetric difference, subset, and superset set operations:

In [5]:
word_set1 = set("hello")
word_set2 = set("help")

print("Union of:", word_set1, word_set2)
print(word_set1 | word_set2)
print(word_set1.union(word_set2))

print("Intersection of:", word_set1, word_set2)
print(word_set1 & word_set2)
print(word_set1.intersection(word_set2))

print("Difference of:", word_set1, word_set2)
print(word_set1 - word_set2)
print(word_set1.difference(word_set2))

print("Symmetric difference of:", word_set1, word_set2)
print(word_set1 ^ word_set2)
print(word_set1.symmetric_difference(word_set2))

# changing word sets
word_set1 = set("abc")
word_set2 = set("abcd")
print("Is subset of:", word_set1, word_set2)
print(word_set1 <= word_set2)
print(word_set1.issubset(word_set2))

print("Is superset of:", word_set1, word_set2)
print(word_set1 >= word_set2)
print(word_set1.issuperset(word_set2))

Union of: {'e', 'l', 'o', 'h'} {'e', 'l', 'p', 'h'}
{'e', 'l', 'o', 'p', 'h'}
{'e', 'l', 'o', 'p', 'h'}
Intersection of: {'e', 'l', 'o', 'h'} {'e', 'l', 'p', 'h'}
{'e', 'l', 'h'}
{'e', 'l', 'h'}
Difference of: {'e', 'l', 'o', 'h'} {'e', 'l', 'p', 'h'}
{'o'}
{'o'}
Symmetric difference of: {'e', 'l', 'o', 'h'} {'e', 'l', 'p', 'h'}
{'o', 'p'}
{'o', 'p'}
Is subset of: {'a', 'c', 'b'} {'d', 'a', 'c', 'b'}
True
True
Is superset of: {'a', 'c', 'b'} {'d', 'a', 'c', 'b'}
False
False
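The proper subset and proper superset operators from the set ADT behave slightly differently from `<=` and `>=`. A short sketch, with illustrative variable names:

```python
small = set("abc")
large = set("abcd")

print(small < large)   # True: small is a subset of large and small != large
print(small < small)   # False: a set is never a proper subset of itself
print(small <= small)  # True: every set is a subset of itself
print(large > small)   # True: large is a proper superset of small
```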


## Practice Problems
### 1
Assume a linked list called `cities` is defined as the following:
* Pullman (head)
* Moscow
* Clarkston
* Lewiston (tail)

For the following sequence of operations, indicate the result of each operation and show the new linked list if it changed.
1. `cities.add("Spokane")`
1. `cities.add("Seattle")`
1. `first_city = cities.pop()`
1. `index = cities.search("Clarkston")`
1. `cities.insert(1, "Colton")`
1. `cities.remove("Lewiston")`
1. 
```
while not cities.is_empty():
    print(cities.pop())
```

### 2
Assuming a circular doubly linked list (without a dummy head node), and a reference to the first node in the list is called `head`:

In [None]:
def mystery(self):
    if not self.is_empty():
        f = self.head
        b = self.head.prev
        
        while True:
            temp = f.data
            f.data = b.data
            b.data = temp
            
            f = f.next
            b = b.prev
            
            if f == b or b == f.prev: # pointers met (odd size) or crossed (even size)
                break

1. What does `mystery()` do? What is the state of the list after `mystery()` is called? Answer: It reverses the contents of the list by working from the outside in. It swaps the data of the two outermost nodes, then moves inward toward the center, stopping when `b` and `f` refer to the same node (odd-size list) or when `b` passes `f` (even-size list).
1. What is the time complexity of `mystery()`? Answer: $\mathcal{O}(n)$, because each item in the list is visited exactly once. There is only one loop, with no nested loops, recursion, or repeated division by two (logarithmic behavior).

The following two questions refer to the `DoublyLinkedCircularList` class defined below:

In [2]:
class DoublyLinkedCircularList:
    def __init__(self):
        self.head = None
        self.size = 0
        
    def is_empty(self):
        return self.size == 0

### 3
Write the `remove(data)` method for `DoublyLinkedCircularList`. This method removes the first occurrence of the data from the linked list, starting at the front of the list. The method returns `True` if successful, `False` otherwise.

### 4
Write a sort method for `DoublyLinkedCircularList`. Name your sorting method after the algorithm you are using.

### 5
Implement the set ADT using a Python list.