# Algorithms by Yandex

[youtube playlist](https://www.youtube.com/playlist?list=PL6Wui14DvQPySdPv5NUqV3i8sDbHkCKC5)

## Lesson 3. Sets

### What are sets

The elements of a set are not ordered.  
What a set should be able to do:
- Add an element
- Check for the presence of an element
- Remove an element

How a set is structured:
- Choose a function that maps each element to a small number (an index)
- Compute this function for the element
- Store the element in the list (bucket) whose index equals the value of the function

Example of adding numbers with a hash function:
- Function: the last digit of the number X, i.e. F(X) = X % 10
- To add X, compute F(X)
- Append X to the list whose index is F(X)

If there is a collision (two different numbers have the same hash), the list for that hash simply holds all of these numbers.
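A minimal sketch of the bucket layout described above, assuming a table of 10 lists and the hash F(X) = X % 10 (the names `SETSIZE` and `buckets` are illustrative). Numbers ending in the same digit collide and share one list:

```python
SETSIZE = 10
buckets = [[] for _ in range(SETSIZE)]  # one list per possible hash value

for x in [13, 7, 23, 40, 3]:
    buckets[x % SETSIZE].append(x)       # F(X) = X % 10 selects the list

print(buckets[3])  # [13, 23, 3]: a collision, all three end in 3
print(buckets[0])  # [40]
```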

To look for an element, we compute its hash and run a simple linear search through the corresponding list.

Writing to memory is usually slower than reading from it.

Once the element has been found, deleting it takes O(1) (swap it with the last element of its list and pop), but finding it takes O(K/N) on average.  
N is the size of the hash table (the number of lists)  
K is the number of stored elements, so K/N is the average length of the lists.

Adding an element to a set implemented as a hash table takes O(1) on average: the time does not depend on how many elements are already stored. In the worst case, when many elements collide and land in the same list, adding can take O(n), where n is the length of that list, because a set (unlike a multiset) must first scan the list to make sure the element is not already there. With a good hash function and a sensible table size such long lists are rare, so adding to a hash-table set is fast in practice.
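To make the worst case concrete, here is a minimal sketch of adding to a set that rejects duplicates, assuming the same 10-list layout with F(X) = X % 10. The function `add_unique` and the names `SETSIZE` and `myset` are hypothetical; the scan over the list is where the O(n) cost comes from:

```python
SETSIZE = 10
myset = [[] for _ in range(SETSIZE)]

def add_unique(x):
    """Add x only if it is not present yet (a set, not a multiset)."""
    bucket = myset[x % SETSIZE]
    for item in bucket:        # linear scan: O(length of this list)
        if item == x:
            return False       # already present, nothing is added
    bucket.append(x)           # the append itself is O(1) amortized
    return True

print(add_unique(5), add_unique(15), add_unique(5))  # True True False
```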

#### Our own multiset implementation

A multiset is a set in which each element can appear multiple times.

```python
# Initialize setsize to 10 and create a list of setsize empty lists (buckets)
setsize = 10
myset = [[] for _ in range(setsize)]

# Add an element to the set:
# the bucket index is the element modulo setsize; append the element to that bucket
def add(x):
    myset[x % setsize].append(x)

# Check whether an element is in the set:
# compute the bucket index and search that bucket linearly
def find(x):
    for now in myset[x % setsize]:
        if now == x:
            return True
    return False

# Delete one copy of an element from the set:
# compute the bucket index and find the element in that bucket,
# swap it with the last element of the bucket, then pop the last element (O(1))
def delete(x):
    xlist = myset[x % setsize]
    for i in range(len(xlist)):
        if xlist[i] == x:
            xlist[i], xlist[-1] = xlist[-1], xlist[i]
            xlist.pop()
            return
```
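For comparison, Python's standard library already has a multiset-like type, `collections.Counter`. A short sketch of the same three operations with it; note that, unlike the list-of-lists version, it stores a count per distinct element rather than every copy:

```python
from collections import Counter

bag = Counter()
for x in [3, 13, 3, 7]:
    bag[x] += 1          # add one copy of x

print(bag[3])            # 2: two copies of 3
print(13 in bag)         # True

# Remove one copy of 3; drop the key once its count reaches zero
bag[3] -= 1
if bag[3] == 0:
    del bag[3]
print(bag[3])            # 1
```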


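What can be stored in a set efficiently:
- In principle, anything can be stored
- Efficiently, only immutable objects: if an object changes after it was added, its hash changes too, and the set looks for it in the wrong list
- For immutable objects, the hash function value can be computed once, when the object is created
- The hash function should distribute elements uniformly across the lists.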
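A small sketch of why mutable objects break a hash-based set, assuming a hypothetical hash for lists of integers (the sum of the elements modulo 10) and the same list-of-lists layout. After the list is mutated in place, neither its new value nor its old value can be found:

```python
SETSIZE = 10
buckets = [[] for _ in range(SETSIZE)]

def bucket_of(key):
    # Hypothetical hash for a list of ints: sum of its elements, modulo SETSIZE
    return sum(key) % SETSIZE

def add(key):
    buckets[bucket_of(key)].append(key)

def find(key):
    return any(item == key for item in buckets[bucket_of(key)])

key = [1, 2]
add(key)                 # stored in bucket 3
print(find(key))         # True
key.append(5)            # mutate in place: the sum is now 8
print(find(key))         # False: looks in bucket 8, the list still sits in bucket 3
print(find([1, 2]))      # False: bucket 3 now holds [1, 2, 5]
```

Python avoids this by making its built-in mutable containers such as `list` unhashable, so they cannot be put into a `set` at all.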

### Amortized complexity

The problems with hash tables:
- When the table is too large, it wastes memory: O(N)
- When the table is too small, the fill factor K/N is high, so search and deletion are slow: O(K/N)
- We want a reasonable balance, for example a fill factor of at most one (i.e. K <= N). Then all operations take O(1) on average.   
  
A simple solution to this problem: when the hash table becomes full, double its size and rebuild it (re-insert every element into the new, larger table).
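A minimal sketch of the doubling strategy, assuming a table that starts with one list and doubles whenever adding would push the fill factor above one. The class `ResizableHashSet` and its `rehashed` counter are hypothetical names; the counter records how many elements all rebuilds moved in total:

```python
class ResizableHashSet:
    """Hash set that doubles its table when the fill factor would exceed 1."""

    def __init__(self):
        self.size = 1          # N: number of lists (buckets)
        self.count = 0         # K: number of stored elements
        self.buckets = [[]]
        self.rehashed = 0      # total elements moved by all rebuilds

    def _index(self, x):
        return hash(x) % self.size

    def _rebuild(self, new_size):
        old_buckets = self.buckets
        self.size = new_size
        self.buckets = [[] for _ in range(new_size)]
        for bucket in old_buckets:
            for x in bucket:
                self.buckets[self._index(x)].append(x)  # index uses the new size
                self.rehashed += 1

    def find(self, x):
        return x in self.buckets[self._index(x)]

    def add(self, x):
        if self.find(x):
            return                          # a set keeps one copy only
        if self.count == self.size:         # table is full: double and rebuild
            self._rebuild(2 * self.size)
        self.buckets[self._index(x)].append(x)
        self.count += 1


s = ResizableHashSet()
for x in range(1024):
    s.add(x)
print(s.size, s.rehashed)  # 1024 1023
```

Adding 1024 elements moved only 1023 elements during rebuilds in total, fewer than the number of insertions.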

Is the complexity of adding N elements to such a table O(N log N)?
- Start with a table of size 1
- Add N = 2^p elements, i.e. p = log N
- A rebuild, costing up to O(N), happens only at about p of these insertions (when the number of elements reaches the next power of two), which suggests O(N log N)
- **In reality, adding N elements takes O(N)**, because all rebuilds together move at most $1 + 2 + 4 + \dots + 2^p = 2^{p+1} - 1 = 2N - 1 = O(N)$ elements.

Amortized complexity is, roughly, the average running time of an operation over a sequence of operations.  
Here the amortized complexity of an insertion is O(1): there were N operations in total, and together they took O(N).  
In the worst case a single operation takes O(N), which may be unacceptable for real-time systems. 
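A quick simulation of that argument, assuming each insertion costs 1 plus the number of elements moved if it triggers a rebuild (the function `insertion_costs` is a hypothetical helper, not part of the lecture). The total stays linear while the single most expensive insertion grows with N:

```python
def insertion_costs(n):
    """Cost of each of n insertions into a table that starts with 1 slot
    and doubles when full: 1 for the insert + elements moved by a rebuild."""
    size, count, costs = 1, 0, []
    for _ in range(n):
        cost = 1
        if count == size:      # table is full: rebuild into a table twice as big
            cost += count      # every stored element is rehashed
            size *= 2
        count += 1
        costs.append(cost)
    return costs


costs = insertion_costs(1024)
print(sum(costs))  # 2047: total work is N + (N - 1), i.e. O(1) amortized per insert
print(max(costs))  # 513: one single insertion still costs O(N)
```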

### Tasks

#### Task 1.

Given a sequence of N positive numbers and a number X, find two different numbers A and B in the sequence such that A + B = X, or return the pair 0, 0 if no such pair exists.

##### O(N^2) solution

```python
# This function takes two arguments:
# nums is a list of positive numbers
# x is the target sum; we look for two different terms of the list that add up to it
def twotermswithsumx(nums, x):
    # Two nested loops check every possible pair of terms from the list
    for i in range(len(nums)):
        for j in range(i + 1, len(nums)):
            # If the two terms sum up to x, return them
            if nums[i] + nums[j] == x:
                return nums[i], nums[j]
    # If no two terms sum up to x, return (0, 0)
    return 0, 0


# Call the function with the list [1, 3, 5, 8, 2] and the target number 7
twotermswithsumx([1, 3, 5, 8, 2], 7)
```

```
(5, 2)
```

##### O(N) solution

```python
def twotermswithsumx(nums, x):
    """
    This function finds two different elements in the given list 'nums'
    such that their sum is equal to 'x', or returns 0, 0
    if no such pair exists.
    """
    # A set to store the elements seen so far
    prevnums = set()

    # Iterate over the elements of the list 'nums'
    for nownum in nums:
        # Check whether the complement of the current element (x - nownum) is in the set 'prevnums'
        if x - nownum in prevnums:
            # If yes, return the pair (nownum, x - nownum)
            return nownum, x - nownum
        # Add the current element to the set 'prevnums'
        prevnums.add(nownum)

    # Return (0, 0) if no such pair exists
    return 0, 0


twotermswithsumx([1, 3, 5, 8, 2], 7)
```

```
(2, 5)
```

#### Task 2.

Given a dictionary of N words, each at most K letters long.
Each of the M words of a text (also at most K letters long) may be written with one letter missing. For each word of the text, determine whether it is in the dictionary, either as is or with one letter missing.

##### O(NK^2 + M) solution

The function first builds a set `goodwords` containing every dictionary word together with every variant of it with one letter removed: N words, up to K deletion positions each, and O(K) to build and hash each variant, hence O(NK^2). Then, for each word of the text, it checks whether the word is in `goodwords` and appends the result to the list `ans`. Finally, it returns `ans`, a list of Boolean values telling whether each word of the text is a dictionary word, possibly with one letter missing.

```python
def wordsindict(dictionary, text):
    # Build a set of all dictionary words and all their variants
    # with one letter removed.
    goodwords = set(dictionary)
    for word in dictionary:
        for delpos in range(len(word)):
            goodwords.add(word[:delpos] + word[delpos+1:])

    # Check whether each word of the text is in the set of good words.
    ans = []
    for word in text:
        ans.append(word in goodwords)
    return ans


# An example to test the function
dictionary = ['apple', 'banana', 'cherry']
text = ['aple', 'banana', 'pear', 'cherries']

wordsindict(dictionary, text)
```

```
[True, True, False, False]
```