## Strings boot camp

A palindromic string is one which reads the same when it is reversed. The program below checks whether a string is palindromic. Rather than creating a new string for the reverse of the input string, it traverses the input string forwards and backwards, thereby saving space. Notice how it uniformly handles even- and odd-length strings.

In [1]:
def is_palindromic(s: str) -> bool:
    # Notice that s[~i] for i in [0,len(s) -1] is s[-(i+1)].
    return all(s[i] == s[~i] for i in range(len(s) // 2))

In [2]:
s = 'abcde'
is_palindromic(s)

False

In [3]:
s = 'abcba'
is_palindromic(s)

True

Time Complexity: $O(n)$. 

Space Complexity: $O(1)$. 
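A small sketch of why the loop above works for both parities. `compared_pairs` is a hypothetical helper (not part of the original notes) that lists the index pairs the generator compares. For odd lengths the middle character is never compared, which is fine because it always matches itself.

```python
def is_palindromic(s: str) -> bool:
    # ~i == -(i + 1), so s[~i] is the i-th character counted from the end.
    return all(s[i] == s[~i] for i in range(len(s) // 2))


def compared_pairs(s: str) -> list:
    # Hypothetical helper: the (front, back) index pairs checked above.
    # len(s) + ~i == len(s) - 1 - i.
    return [(i, len(s) + ~i) for i in range(len(s) // 2)]


print(compared_pairs('abba'))   # even length: every character is compared
print(compared_pairs('abcba'))  # odd length: index 2 (the middle) is skipped
```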

In [4]:
'Euclid,Axiom 5,Parallel Lines'.split(',')

['Euclid', 'Axiom 5', 'Parallel Lines']

In [5]:
3* '01'

'010101'

In [6]:
','.join(('Gauss', 'Prince of Mathematicians', '1777-1855'))

'Gauss,Prince of Mathematicians,1777-1855'

In [8]:
s = 'ABTRFejis'
s.lower()

'abtrfejis'

In [9]:
'Name {name}, Rank {rank}'.format(name = 'Archimedes', rank = 3)

'Name Archimedes, Rank 3'

**Strings are IMMUTABLE** in Python, so concatenating strings requires allocating a new string. **Alternatives** to immutable strings include a list of characters in Python, which can be modified in place and joined into a string once at the end.
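A minimal illustration of the point above: assigning to a character of a `str` raises `TypeError`, while a list of characters can be edited and then joined once. Both function names are hypothetical helpers for this sketch.

```python
def str_is_immutable() -> bool:
    s = 'abc'
    try:
        s[0] = 'x'  # str does not support item assignment
    except TypeError:
        return True
    return False


def set_first_char(s: str, c: str) -> str:
    # Assumes s is non-empty.
    chars = list(s)         # a list of characters is mutable
    chars[0] = c            # in-place update, no new string yet
    return ''.join(chars)   # allocate the resulting str once


print(str_is_immutable(), set_first_char('abc', 'x'))
```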

Updating a mutable string from the front is slow, so check whether it is possible to write values from the back.
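One way to apply the "write from the back" idea, sketched under the assumption that the final length can be computed first: count the digits of a non-negative integer, preallocate a buffer, and fill it from the last position backwards, so no insertion at the front (or final reversal) is needed. `digits_from_back` is a hypothetical name, not from the original notes.

```python
def digits_from_back(x: int) -> str:
    # Assumes x >= 0. First pass: count the digits so the buffer size is known.
    n, t = 1, x
    while t >= 10:
        t //= 10
        n += 1
    buf = [''] * n
    # Second pass: write the least-significant digit into the last slot, and so on.
    for i in range(n - 1, -1, -1):
        buf[i] = chr(ord('0') + x % 10)
        x //= 10
    return ''.join(buf)


print(digits_from_back(1234))
```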

## 6.1 Interconvert strings and integers

Implement an integer to string conversion function, and a string to integer conversion function.

In [17]:
chr(5)

'\x05'

In [18]:
chr(ord('0') + 5)

'5'

In [21]:
3%10

3

In [31]:
def int_to_string(x: int) -> str:
    is_negative = False
    if x < 0:
        x, is_negative = -x, True
    s = []
    # Extract digits least-significant first; the do-while form also handles x == 0.
    while True:
        s.append(chr(ord('0') + x % 10))
        x //= 10
        if x == 0:
            break

    # Adds the negative sign back if is_negative.
    return ('-' if is_negative else '') + ''.join(reversed(s))

In [32]:
x = -7849573738
int_to_string(x)

'-7849573738'

In [45]:
import string
s = '123'

string.digits.index(s[1])


2

In [46]:
def string_to_int(s: str) -> int:
    x = 0
    if s[0] == '-':
        for i in range(1,len(s)):
            x = x*10 + string.digits.index(s[i])
        x = -x 
    else:  
        for i in range(len(s)):
            x = x*10 + string.digits.index(s[i])
    return x 

In [47]:
string_to_int(s)

123

In [48]:
s = '674959683'
string_to_int(s)

674959683

In [49]:
s = '-384589594783'
string_to_int(s)

-384589594783

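Time Complexity: $O(n)$ for both functions, where $n$ is the number of digits.

Space Complexity: $O(n)$ for `int_to_string` (the list of digits); `string_to_int` uses $O(1)$ additional space.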
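A more compact formulation of `string_to_int`, sketched with `functools.reduce`. Unlike the version above, this hypothetical variant also accepts a leading `'+'`; it still assumes the rest of the string consists only of decimal digits.

```python
import functools
import string


def string_to_int_reduce(s: str) -> int:
    # Skip one leading sign character (s[0] in '-+' is True/False, i.e. 1/0),
    # then fold the digits left to right: running * 10 + next digit.
    magnitude = functools.reduce(
        lambda running, c: running * 10 + string.digits.index(c),
        s[s[0] in '-+':], 0)
    return -magnitude if s[0] == '-' else magnitude


print(string_to_int_reduce('-384589594783'))
```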

## 6.2 Base conversion 

The base $b$ number system generalizes the decimal number system: the string "$a_{k-1}a_{k-2}\cdots a_1 a_0$", where $0 \leq a_i < b$, denotes in base-$b$ the integer $a_0 \times b^0 + a_1 \times b^1 + a_2 \times b^2 + \cdots + a_{k-1} \times b^{k-1}$.
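As a worked instance of this formula, take the string "615" in base $7$ (the same input used in the example cell further down) and re-express its value in base $13$:

$$
\begin{aligned}
615_{7} &= 6 \times 7^2 + 1 \times 7^1 + 5 \times 7^0 = 294 + 7 + 5 = 306, \\
306 &= 1 \times 13^2 + 10 \times 13^1 + 7 \times 13^0 = 169 + 130 + 7,
\end{aligned}
$$

so, with 'A' standing for $10$, the base-$13$ string is "1A7".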

Write a program that performs base conversion. The input is a string, an integer $b_1$, and another integer $b_2$. The string represents an integer in base $b_1$. The output should be the string representing the integer in base $b_2$. Assume $2 \leq b_1, b_2 \leq 16$. Use 'A' to represent $10$, 'B' to represent $11$, ..., and 'F' to represent $15$.

In [53]:
import functools

In [51]:
def convert_base(num_as_string: str, b1: int, b2: int) -> str:
    # Build the base-`base` string recursively: the last digit is
    # num_as_int % base, and the preceding digits encode num_as_int // base.
    def construct_from_base(num_as_int, base):
        return ('' if num_as_int == 0 else
                construct_from_base(num_as_int // base, base) +
                string.hexdigits[num_as_int % base].upper())

    is_negative = num_as_string[0] == '-'
    # Compute the integer value of the base-b1 string, skipping a leading '-'.
    num_as_int = functools.reduce(
        lambda x, c: x * b1 + string.hexdigits.index(c.lower()),
        num_as_string[is_negative:], 0)
    return ('-' if is_negative else '') + ('0' if num_as_int == 0 else
                                           construct_from_base(num_as_int, b2))

In [54]:
num_as_string = '615'
b1 = 7
b2 = 13
convert_base(num_as_string ,b1 ,b2)

'1A7'

Time Complexity is $O(n(1 + \log_{b_2}(b_1)))$, where $n$ is the length of `num_as_string`. First we perform $n$ multiply-and-adds to get the integer value $x$ from the string. Then we perform $O(\log_{b_2}(x))$ division-and-modulus steps to get the result. Since $x < b_1^n$ and $\log_{b_2}(b_1^n) = n \log_{b_2}(b_1)$, the second phase costs $O(n \log_{b_2}(b_1))$.
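The recursion in `construct_from_base` can also be written as a loop with `divmod`, which avoids recursion depth concerns for very long inputs. This is a sketch of that alternative (`convert_base_iterative` is a hypothetical name), under the same assumptions as above: $2 \leq b_1, b_2 \leq 16$ and a well-formed input string.

```python
import string


def convert_base_iterative(num_as_string: str, b1: int, b2: int) -> str:
    is_negative = num_as_string[0] == '-'
    # Phase 1: value of the base-b1 digits, folded left to right.
    value = 0
    for c in num_as_string[is_negative:]:
        value = value * b1 + string.hexdigits.index(c.lower())
    if value == 0:
        return '0'
    # Phase 2: peel off base-b2 digits, least significant first.
    digits = []
    while value:
        value, r = divmod(value, b2)
        digits.append(string.hexdigits[r].upper())
    return ('-' if is_negative else '') + ''.join(reversed(digits))


print(convert_base_iterative('615', 7, 13))
```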

## 6.3 Compute the spreadsheet column encoding 

Implement a function that converts a spreadsheet column id to the corresponding integer, with "A" corresponding to $1$. For example, return 4 for 'D', 27 for 'AA', 702 for 'ZZ', etc. How would you test your code?

**Sol:** This problem is essentially converting a string representing a base-26 number to the corresponding integer, except that "A" corresponds to 1, not 0.

In [55]:
def ss_decode_col_id(col:str) -> int:
    x = 0
    for c in col:
        x = x*26 + ord(c) - ord('A') + 1
    return x
        
        

In [56]:
col = 'AA'
ss_decode_col_id(col)

27

In [57]:
col = 'ZZZ'
ss_decode_col_id(col)

18278

In [58]:
col = 'ZZ'
ss_decode_col_id(col)

702

In [59]:
col = 'B'
ss_decode_col_id(col)

2

Good test cases are around boundaries, e.g., "A", "B", "Y", "Z", "AA", "AB", "ZY", "ZZ", and some random strings, e.g., "M", "BZ", "CCC".
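Beyond hand-picked boundary cases, one possible systematic test (a sketch, not from the original notes): enumerate every column id of length 1 to 3 in spreadsheet order with `itertools.product` and check that the decoder returns consecutive integers starting at 1. `check_decoder` is a hypothetical helper; the decoder is repeated so the block is self-contained.

```python
import itertools
import string


def ss_decode_col_id(col: str) -> int:
    x = 0
    for c in col:
        x = x * 26 + ord(c) - ord('A') + 1
    return x


def check_decoder(max_len: int = 3) -> int:
    # Spreadsheet order: A..Z, then AA..ZZ, then AAA..ZZZ (lexicographic per length).
    expected = 1
    for length in range(1, max_len + 1):
        for letters in itertools.product(string.ascii_uppercase, repeat=length):
            col = ''.join(letters)
            assert ss_decode_col_id(col) == expected, col
            expected += 1
    return expected - 1  # largest value checked


print(check_decoder())
```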

Time Complexity $O(n)$, where $n$ is the length of `col`.

**Variant:** Implement a function that converts an integer to the spreadsheet column id. For example, you should return "D" for 4, "AA" for 27 and "ZZ" for 702.

In [90]:
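def ss_encode_col_id(x: int) -> str:
    # Bijective base 26: subtract 1 before each divmod so that
    # 1 -> 'A', ..., 26 -> 'Z', 27 -> 'AA'. Assumes x >= 1.
    s = []
    while x > 0:
        x, r = divmod(x - 1, 26)
        s.append(chr(ord('A') + r))
    # Letters were produced least significant first.
    return ''.join(reversed(s))

In [91]:
ss_encode_col_id(26)

'Z'

In [92]:
ss_encode_col_id(702)

'ZZ'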

## 6.4 Replace and remove 

Write a program which takes as input an array of characters, and removes each 'b' and replaces each 'a' by two 'd's.

**Sol:** Library array implementations often have methods for inserting into a specific location (all later entries are shifted right, and the array is resized) and deleting from a specific location (all later entries are shifted left, and the size of the array is decremented). If the input array had such methods, we could apply them; however, the time complexity would be O(n^2), where n is the array's length, because each insertion into or deletion from the array takes O(n) time.

This problem is trivial to solve in O(n) time if we write the result to a new array: we skip 'b's, replace 'a's by two 'd's, and copy over all other characters. However, this entails O(n) additional space.
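The O(n)-space approach just described fits in a few lines and makes a convenient reference for testing the in-place version. `replace_and_remove_copy` is a hypothetical name for this sketch.

```python
def replace_and_remove_copy(chars: list) -> list:
    # Build the result in a new list: skip 'b', expand 'a' to 'd', 'd', copy the rest.
    out = []
    for c in chars:
        if c == 'a':
            out.extend('dd')
        elif c != 'b':
            out.append(c)
    return out


print(replace_and_remove_copy(list('abac')))
```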

If there are no 'a's, we can implement the function without allocating additional space with one forward iteration, skipping 'b's and copying over the other characters.

If there are no 'b's, we can implement the function without additional space as follows. First, we compute the final length of the resulting string, which is the length of the array plus the number of 'a's. We can then write the result, character by character, starting from the last character and working our way backwards.

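Combining the two methods, we can remove and replace without additional space, provided the array has enough room to hold the final result: a forward pass removes the 'b's and counts the 'a's, then a backward pass writes the result from the end, expanding each 'a' into 'dd'.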

In [121]:
def replace_and_remove(size: int, s: list) -> tuple:
    # Forward iteration: remove 'b's and count the number of 'a's.
    write_idx, a_count = 0, 0
    for i in range(size):
        if s[i] != 'b':
            s[write_idx] = s[i]
            write_idx += 1
        if s[i] == 'a':
            a_count += 1
    # Backward iteration: replace each 'a' with 'dd', starting from the end.
    # Assumes s has room for the final result (len(s) >= final_size).
    cur_idx = write_idx - 1
    write_idx += a_count - 1
    final_size = write_idx + 1
    while cur_idx >= 0:
        if s[cur_idx] == 'a':
            s[write_idx - 1:write_idx + 1] = 'dd'
            write_idx -= 2
        else:
            s[write_idx] = s[cur_idx]
            write_idx -= 1
        cur_idx -= 1
    return final_size, s

In [123]:
size = 5
s = 'abcba'
s = list(s)
print(s)

['a', 'b', 'c', 'b', 'a']

In [124]:
replace_and_remove(size, s)

(5, ['d', 'd', 'c', 'd', 'd'])

## 6.5 Test palindromicity

For the purpose of this problem, define a palindromic string to be a string which, when all the nonalphanumeric characters are removed, reads the same front to back ignoring case.

Implement a function which takes as input a string s and returns true if s is a palindromic string.


In [125]:
def is_palindrome(s:str) -> bool:
    # i moves forward, and j moves backward.
    i, j = 0, len(s) -1
    while i < j:
        # i and j both skip non-alphanumeric characters
        while not s[i].isalnum() and i <j:
            i += 1
        while not s[j].isalnum() and i<j:
            j -= 1
        if s[i].lower() != s[j].lower():
            return False
        i,j = i+1, j-1 
    return True 

In [129]:
s = 'abece.!ba'
is_palindrome(s)

True

In [131]:
def is_palindrome_pythonic(s):
    return all(a == b for a, b in zip(
        map(str.lower, filter(str.isalnum, s)),
        map(str.lower, filter(str.isalnum, reversed(s)))))

In [132]:
is_palindrome_pythonic(s)

True

We spend O(1) per character, so the time complexity is O(n), where n is the length of s. 
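For testing, a straightforward reference (a sketch with a hypothetical name) materializes the normalized characters and compares them with their reversal. It uses O(n) extra space, unlike the two-pointer version, but is easy to trust.

```python
def is_palindrome_reference(s: str) -> bool:
    # Keep alphanumerics only, lowercased, then compare with the reversed list.
    t = [c.lower() for c in s if c.isalnum()]
    return t == t[::-1]


print(is_palindrome_reference('A man, a plan, a canal: Panama'))
```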

## 6.6 Reverse all the words in a sentence 

Given a string containing a set of words separated by whitespace, we would like to transform it to a string in which the words appear in the reverse order. For example, "Alice likes Bob" transforms to "Bob likes Alice". We do not need to keep the original string. 

In [146]:
def reverse_words(s: list) -> list:
    def reverse_range(s, start, finish):
        while start < finish:
            s[start], s[finish] = s[finish], s[start]
            start, finish = start + 1, finish - 1

    # First, reverse the whole string.
    reverse_range(s, 0, len(s) - 1)

    # Then reverse each word (delimited by ' ') in place.
    start = 0
    while True:
        finish = start
        while finish < len(s) and s[finish] != ' ':
            finish += 1
        if finish == len(s):
            break
        reverse_range(s, start, finish - 1)
        start = finish + 1

    # Reverse the last word.
    reverse_range(s, start, len(s) - 1)
    return s

In [147]:
s = 'Bob likes Alice'
s = list(s)
print(s)
reverse_words(s)

['B', 'o', 'b', ' ', 'l', 'i', 'k', 'e', 's', ' ', 'A', 'l', 'i', 'c', 'e']

['A', 'l', 'i', 'c', 'e', ' ', 'l', 'i', 'k', 'e', 's', ' ', 'B', 'o', 'b']

Since we spend O(1) per character, the time complexity is O(n), where n is the length of s. The computation is done in place, i.e., the additional space complexity is O(1).
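When s is an ordinary (immutable) Python `str`, a short alternative, sketched here with a hypothetical name, is to split on single spaces and join the pieces in reverse order. Splitting on `' '` rather than calling `split()` with no argument keeps runs of spaces intact, matching the in-place version, but it uses O(n) additional space.

```python
def reverse_words_split(s: str) -> str:
    # split(' ') keeps empty strings for consecutive spaces, so spacing is preserved.
    return ' '.join(reversed(s.split(' ')))


print(reverse_words_split('Alice likes Bob'))
```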

## 6.7 The look-and-say problem 

The look-and-say sequence starts with 1. Subsequent numbers are derived by describing the previous number in terms of consecutive digits. Specifically, to generate an entry of the sequence from the previous entry, read off the digits of the previous entry, counting the number of digits in groups of the same digit. For example, the first eight entries are 1, 11, 21, 1211, 111221, 312211, 13112221, 1113213211.

Write a program that takes as input an integer n and returns the nth integer in the look-and-say sequence. Return the result as a string.

In [148]:
def look_and_say(n: int) -> str:
    def next_number(s):
        result, i = [], 0
        while i < len(s):
            count = 1
            while i + 1 < len(s) and s[i] == s[i+1]:
                i += 1
                count += 1
            result.append(str(count) + s[i])
            i += 1
        return ''.join(result)
    s = '1'
    for _ in range(1,n):
        s = next_number(s)
    return s 

In [149]:
look_and_say(8)

'1113213211'

In [150]:
look_and_say(3)

'21'

In [153]:
import itertools

In [157]:
# Pythonic solution uses the power of itertools.groupby()
def look_and_say_pythonic(n):
    s = '1'
    for _ in range(n-1):
        s = ''.join(
        str(len(list(group))) + key for key, group in itertools.groupby(s))
    return s 


In [158]:
look_and_say_pythonic(8)

'1113213211'

## 6.8 Convert from roman to decimal

The Roman numeral representation of positive integers uses the symbols I, V, X, L, C, D, M. Each symbol represents a value, with I being 1, V being 5, X being 10, L being 50, C being 100, D being 500 and M being 1000.

In this problem we give simplified rules for representing numbers in this system. Specifically, define a string over the Roman number symbols to be a valid Roman number string if symbols appear in nonincreasing order, with the following exceptions allowed:
* I can immediately precede V and X.
* X can immediately precede L and C.
* C can immediately precede D and M.

Back-to-back exceptions are not allowed, e.g., IXC is invalid, as is CDM. 

Write a program which takes as input a valid Roman number string s and returns the integer it corresponds to. 

Hint: Start by solving the problem assuming no exception cases.
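Following the hint, a sketch of the no-exceptions case (hypothetical name): when symbols appear in nonincreasing order, the value is simply the sum of the symbol values. The exceptions are then handled by subtracting a symbol's value whenever it is smaller than the symbol to its right, which is what the `reduce`-based solution does.

```python
T = {'I': 1, 'V': 5, 'X': 10, 'L': 50, 'C': 100, 'D': 500, 'M': 1000}


def roman_no_exceptions(s: str) -> int:
    # Valid only when symbols are in nonincreasing order (no IV, IX, XL, ...).
    return sum(T[c] for c in s)


print(roman_no_exceptions('LVIII'))
```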

In [4]:
import functools

In [5]:
def roman_to_integer(s: str) -> int:
    T = {'I': 1, 'V': 5, 'X': 10, 'L': 50, 'C': 100, 'D': 500, 'M': 1000}
    
    return functools.reduce(
        lambda val, i: val + (-T[s[i]] if T[s[i]] < T[s[i+1]] else T[s[i]]),
        reversed(range(len(s)- 1)), T[s[-1]])

In [6]:
roman_to_integer('XCIX')

99

In [7]:
roman_to_integer('LIX')

59

In [8]:
roman_to_integer('LVIII')

58

Each character of s is processed in O(1) time, yielding an O(n) overall time complexity, where n is the length of s. 
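The `reduce` expression above can be unrolled into an explicit right-to-left loop, which some readers may find easier to follow. This is a sketch with a hypothetical name; like the original, it assumes s is a non-empty, valid Roman number string and does not validate it.

```python
def roman_to_integer_loop(s: str) -> int:
    T = {'I': 1, 'V': 5, 'X': 10, 'L': 50, 'C': 100, 'D': 500, 'M': 1000}
    total = T[s[-1]]
    # Scan right to left: a symbol smaller than its right neighbor is subtracted.
    for i in range(len(s) - 2, -1, -1):
        if T[s[i]] < T[s[i + 1]]:
            total -= T[s[i]]
        else:
            total += T[s[i]]
    return total


print(roman_to_integer_loop('MCMXCIV'))
```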

## 6.9 Compute all valid IP addresses 

Write a program that determines where to add periods to a decimal string so that the resulting string is a valid IP address. There may be more than one valid IP address corresponding to a string, in which case you should print all possibilities.

In [9]:
def get_valid_ip_address(s: str) -> list:
    def is_valid_parts(s):
        # '00', '000', '01', etc. are not valid, but '0' is valid. 
        return len(s) == 1 or (s[0] != '0' and int(s) <= 255)
    
    result, parts = [], ['']*4
    for i in range(1, min(4,len(s))):
        parts[0] = s[:i]
        if is_valid_parts(parts[0]):
            for j in range(1, min(4, len(s) -i )):
                parts[1] = s[i: i+j]
                if is_valid_parts(parts[1]):
                    for k in range(1, min(4, len(s) -i -j)):
                        parts[2], parts[3] = s[i+j: i+j+k], s[i+j+k:]
                        if is_valid_parts(parts[2]) and is_valid_parts(parts[3]):
                            result.append('.'.join(parts))
    return result 
                        
                            
                

In [10]:
s = '1921681201'
get_valid_ip_address(s)

['19.216.81.201', '192.16.81.201', '192.168.1.201', '192.168.120.1']

In [11]:
s = '1121681201'
get_valid_ip_address(s)

['11.216.81.201', '112.16.81.201', '112.168.1.201', '112.168.120.1']

In [14]:
s = '19216811'
get_valid_ip_address(s)

['1.92.168.11',
 '19.2.168.11',
 '19.21.68.11',
 '19.216.8.11',
 '19.216.81.1',
 '192.1.68.11',
 '192.16.8.11',
 '192.16.81.1',
 '192.168.1.1']

The three nested loops try at most $3 \times 3 \times 3 = 27$ placements of the periods, and the total number of IP addresses is a constant ($2^{32}$), implying an O(1) time complexity for the above algorithm.
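An alternative way to enumerate the period placements, sketched with a hypothetical name: choose the three cut positions with `itertools.combinations`, which yields them in the same lexicographic order as the nested loops. Each part is rejected early if it is longer than 3 characters, so `int` is never called on a long string.

```python
import itertools


def valid_ips(s: str) -> list:
    def ok(part: str) -> bool:
        # '0' is fine; '00', '01', '256', '1000' are not.
        return len(part) <= 3 and (part == '0' or (part[0] != '0' and int(part) <= 255))

    result = []
    # Cut points 0 < i < j < k < len(s) guarantee four non-empty parts.
    for i, j, k in itertools.combinations(range(1, len(s)), 3):
        parts = [s[:i], s[i:j], s[j:k], s[k:]]
        if all(ok(p) for p in parts):
            result.append('.'.join(parts))
    return result


print(valid_ips('25525511135'))
```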

## 6.10 Write a string sinusoidally


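Write a program which takes as input a string s and returns the snakestring of s. When s is written sinusoidally over three rows (s[0] in the middle row, s[1] in the top row, s[2] in the middle row, s[3] in the bottom row, and so on), the snakestring is the sequence of characters read left to right, top row to bottom row. For example, the snakestring of 'Hello World!' is 'e lHloWrdlo!'.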
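To visualize the sinusoidal layout, here is a sketch (hypothetical helper `snake_rows`) that places each character of s in its row, padding the other rows with spaces. Even indices go to the middle row, indices congruent to 1 mod 4 to the top row, and indices congruent to 3 mod 4 to the bottom row.

```python
def snake_rows(s: str) -> list:
    rows = [[' '] * len(s) for _ in range(3)]
    for i, c in enumerate(s):
        # Middle row for even i; top row for i % 4 == 1; bottom row for i % 4 == 3.
        row = 1 if i % 2 == 0 else (0 if i % 4 == 1 else 2)
        rows[row][i] = c
    return [''.join(r) for r in rows]


for line in snake_rows('abcdefgh'):
    print(line)
```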

In [15]:
def snake_string(s: str) -> str:
    result = []
    # Outputs the first row, i.e. s[1], s[5], s[9], ....
    for i in range(1, len(s), 4):
        result.append(s[i])
    # Outputs the second row, i.e. s[0], s[2], s[4], ...
    for i in range(0, len(s), 2):
        result.append(s[i])
    # Outputs the last row, i.e. s[3], s[7], s[11], ...
    for i in range(3, len(s), 4):
        result.append(s[i])
    
    return ''.join(result)

In [16]:
def snake_string_pythonic(s):
    return s[1::4] + s[::2] + s[3::4]

In [17]:
s = 'Hello World!'
snake_string_pythonic(s)

'e lHloWrdlo!'

## 6.11 Implement run-length encoding 

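Run-length encoding (RLE) encodes successive repeated characters by the repetition count followed by the character. For example, the RLE of 'aaaabcccaa' is '4a1b3c2a', and decoding '3e4f2e' returns 'eeeffffee'. Assume the string to be encoded contains no digits and the string to be decoded is a valid encoding.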

In [18]:
def decoding(s: str) -> str:
    count, result = 0, []
    for c in s:
        if c.isdigit():
            count = count * 10 + int(c)
        else: # c is not a number 
            result.append(c*count) # Append count copies of c to result 
            count = 0
    return ''.join(result)

In [19]:
s = '5e2f1d3w'
decoding(s)

'eeeeeffdwww'

In [30]:
def encoding(s:str) -> str:
    count, result = 1, []
    for i in range(1, len(s)+1):
        if i == len(s) or s[i-1] != s[i]:
            result.append(str(count))
            result.append(s[i-1])
            count = 1
        else:
            count += 1


    return ''.join(result) 
        

In [31]:
s = 'jjjjiiooop'
encoding(s)

'4j2i3o1p'

The time complexity is O(n), where n is the length of the string. 
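As with look-and-say, `itertools.groupby` gives a compact encoder, and a regular expression that pairs a (possibly multi-digit) count with the following non-digit character gives a compact decoder. This is a sketch with hypothetical names, under the same assumptions as above (no digits in the text being encoded, well-formed input to the decoder).

```python
import itertools
import re


def rle_encode(s: str) -> str:
    # One "<count><char>" token per run of identical characters.
    return ''.join(str(len(list(group))) + key for key, group in itertools.groupby(s))


def rle_decode(s: str) -> str:
    # Each match is (count digits, single non-digit character).
    return ''.join(ch * int(count) for count, ch in re.findall(r'(\d+)(\D)', s))


print(rle_encode('aaaabcccaa'), rle_decode('3e4f2e'))
```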