## 6.12 Find the first occurrence of a substring

Given two strings s (the 'search string') and t (the 'text'), find the first occurrence of s in t.

**Sol:** The brute-force solution uses two nested loops: the outer loop iterates through t, and the inner loop tests whether s occurs starting at the current index in t. In general, this costs O(mn), where m is the length of t and n is the length of s. When the two lengths are comparable, the worst case is quadratic, O(n^2).
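The two nested loops described above can be sketched as follows. This is a minimal illustration rather than a tuned implementation; the name `brute_force_find` is hypothetical and does not appear in the original notes.

```python
def brute_force_find(t: str, s: str) -> int:
    """Return the index of the first occurrence of s in t, or -1."""
    # Outer loop: every start index at which s could still fit inside t.
    for start in range(len(t) - len(s) + 1):
        # Inner loop: compare s against t character by character.
        if all(t[start + j] == s[j] for j in range(len(s))):
            return start
    return -1  # s is not a substring of t


print(brute_force_find('dnsmdnsndjeicnfjeigndj', 'mdn'))  # 3
```

When s is longer than t, the outer range is empty and the function returns -1 without any comparisons.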

There are three linear time string matching algorithms: KMP, Boyer-Moore, and Rabin-Karp. Of these, Rabin-Karp is by far the simplest to understand and implement. 
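Rabin-Karp avoids rehashing every window from scratch by using a rolling hash: the hash of the next window is derived from the previous one in a constant number of arithmetic operations. The sketch below checks that identity on a small input. The helper names `poly_hash` and `roll` are hypothetical, and the base of 26 is simply the one used in this section's implementation.

```python
import functools

BASE = 26  # same base as the rabin_karp function in this section


def poly_hash(w: str) -> int:
    """Polynomial hash of w, treating ord(c) values as digits in base BASE."""
    return functools.reduce(lambda h, c: h * BASE + ord(c), w, 0)


def roll(h: int, old: str, new: str, k: int) -> int:
    """Slide a length-k window: drop the leading char `old`, append `new`."""
    h -= ord(old) * BASE ** (k - 1)  # remove the most significant "digit"
    return h * BASE + ord(new)       # shift left and add the new one


t, k = 'dnsmdn', 3
h = poly_hash(t[:k])
for i in range(k, len(t)):
    h = roll(h, t[i - k], t[i], k)
    # The rolled value matches a hash computed from scratch.
    assert h == poly_hash(t[i - k + 1:i + 1])
```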

In [1]:
import functools

In [7]:
def rabin_karp(t: str, s: str) -> int:
    if len(s) > len(t):
        return -1 # s is not a substring of t 
    
    base = 26
    # Hash codes for the substring of t and s 
    t_hash = functools.reduce(lambda h, c: h*base+ord(c), t[:len(s)], 0)
    s_hash = functools.reduce(lambda h, c: h*base+ord(c), s, 0)
    power_s = base**max(len(s) - 1, 0) # base^(|s| - 1)
    
    for i in range(len(s), len(t)):
        # Check whether the two substrings are actually equal, to protect
        # against hash collisions.
        if t_hash == s_hash and t[i - len(s): i] == s:
            return (i - len(s)) # Found a match 
        # Uses rolling hash to compute the hash code 
        t_hash -= ord(t[i - len(s)])*power_s
        t_hash = t_hash * base + ord(t[i])
        
    # Tries to match s and t[-len(s):]
    if t_hash == s_hash and t[-len(s):] == s:
        return len(t) - len(s)
    return -1 # s is not a substring of t

In [10]:
t = 'dnsmdnsndjeicnfjeigndj'
s = 'mdn'
rabin_karp(t,s)

3
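The version above lets the hash grow as an arbitrary-precision integer, so each update gets more expensive as s gets longer. A common variant keeps the hash bounded by reducing it modulo a large prime. The following is a sketch under that assumption; the name `rabin_karp_mod` and the constants 256 and 1_000_000_007 are illustrative choices, not taken from the original notes.

```python
def rabin_karp_mod(t: str, s: str) -> int:
    """First index of s in t, or -1, using a rolling hash reduced mod `mod`."""
    if len(s) > len(t):
        return -1  # s is not a substring of t

    base, mod = 256, 1_000_000_007  # illustrative constants (assumption)
    k = len(s)
    power = pow(base, max(k - 1, 0), mod)  # base^(k-1) mod mod

    t_hash = s_hash = 0
    for i in range(k):
        t_hash = (t_hash * base + ord(t[i])) % mod
        s_hash = (s_hash * base + ord(s[i])) % mod

    for i in range(k, len(t) + 1):
        # The current window is t[i-k:i]; confirm equality on a hash match.
        if t_hash == s_hash and t[i - k:i] == s:
            return i - k
        if i < len(t):
            # Drop t[i-k] and append t[i], keeping the value in [0, mod).
            t_hash = (t_hash - ord(t[i - k]) * power) % mod
            t_hash = (t_hash * base + ord(t[i])) % mod
    return -1


print(rabin_karp_mod('dnsmdnsndjeicnfjeigndj', 'mdn'))  # 3
```

Because distinct windows can now share a hash, the explicit string comparison on a hash match is what keeps the result correct.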

## 6.11 Implement run-length encoding (RLE)

Implement run-length encoding and decoding functions. Assume the string to be encoded consists of letters of the alphabet, with no digits, and the string to be decoded is a valid encoding.

In [11]:
for i in range(0,5,2):
    print(i)

0
2
4


In [12]:
def decoding(s: str) -> str:
    count, result = 0, []
    
    for c in s:
        if c.isdigit():
            count = count*10 + int(c)
        else: # c is a letter of the alphabet
            result.append(c * count) # Appends count copies of c to result 
            count = 0
    return ''.join(result)       
        

In [13]:
s = '17e9c5l'
decoding(s)

'eeeeeeeeeeeeeeeeeccccccccclllll'

In [28]:
def encoding(s: str) -> str:
    count, result = 1, []
    
    for i in range(1, len(s) + 1):
        # Flush the current run at the end of s or when the character changes.
        if i == len(s) or s[i] != s[i - 1]:
            result.append(str(count))
            result.append(s[i - 1])
            count = 1
        else: # s[i] == s[i - 1], so the current run continues
            count += 1
    return ''.join(result)

In [29]:
s = 'jjjjjiijjeeeddeee'
encoding(s)

'5j2i2j3e2d3e'

The time complexity is O(n), where n is the length of the string. 
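As a cross-check on the two functions above, here is a compact alternative that can be used for a round-trip test. It is a sketch built on the standard library's `itertools.groupby` and `re.findall`; the names `rle_encode` and `rle_decode` are hypothetical, and the decoder assumes its input is a valid encoding of a digit-free string.

```python
import itertools
import re


def rle_encode(s: str) -> str:
    """Encode each run as <count><char>, e.g. 'aaab' -> '3a1b'."""
    return ''.join(f'{len(list(run))}{ch}' for ch, run in itertools.groupby(s))


def rle_decode(s: str) -> str:
    """Invert rle_encode; assumes s is a valid encoding."""
    # Each match is a (count, char) pair such as ('17', 'e').
    return ''.join(ch * int(n) for n, ch in re.findall(r'(\d+)(\D)', s))


original = 'jjjjjiijjeeeddeee'
encoded = rle_encode(original)
print(encoded)  # 5j2i2j3e2d3e
assert rle_decode(encoded) == original
```

Both functions make a single pass over their input, so they are also O(n).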

## 6.10 Write a string sinusoidally

Define the snakestring of s to be the left-to-right, top-to-bottom sequence in which characters appear when s is written in sinusoidal fashion. For example, the snakestring for "Hello World!" is "e lHloWrdlo!".

Observe that the result begins with the characters s[1], s[5], s[9], ..., followed by s[0], s[2], s[4], ..., and then s[3], s[7], s[11], .... Therefore, we can create the snakestring directly with three iterations through s.
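The three index patterns map directly onto Python's extended slicing, which gives a quick way to check the observation. This is an equivalent sketch rather than the loop-based version; the name `snake_string_sliced` is hypothetical.

```python
def snake_string_sliced(s: str) -> str:
    # Top row: s[1], s[5], s[9], ...
    # Middle row: s[0], s[2], s[4], ...
    # Bottom row: s[3], s[7], s[11], ...
    return s[1::4] + s[::2] + s[3::4]


print(snake_string_sliced('Hello World!'))  # e lHloWrdlo!
```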

In [None]:
def snake_string(s: str) -> str:
    result = []
    # Outputs the first row, i.e. s[1] s[5] s[9]...
    for i in range(1, len(s), 4):
        result.append(s[i])
    # Outputs the second row, i.e. s[0] s[2] s[4]...