# String Manipulation

In [2]:
# Implement a function that returns whether a character is a lowercase or uppercase English letter or a digit
def is_alphanumeric(c):
    code = ord(c)  # avoid shadowing the built-in ascii()
    upper = ord('A') <= code <= ord('Z')
    lower = ord('a') <= code <= ord('z')
    numeric = ord('0') <= code <= ord('9')
    return upper or lower or numeric

In [3]:
is_alphanumeric('(')

False

In [4]:
# Implement a function that converts a lowercase character to uppercase (or returns it unchanged if it's not a lowercase letter)
def to_uppercase(c):
    try:
        code = ord(c)
    except TypeError:  # c is not a single character
        return c
    if ord('a') <= code <= ord('z'):  # establishes whether c is lowercase
        return chr(ord('A') + code - ord('a'))
    return c

In [5]:
to_uppercase('%')

'%'

#### String mutability in Python
Strings are immutable in Python, which means that if you add a character, Python actually creates a new string, copying all elements from the existing string to the new one. Each such append takes O(len(s)) time, so we should avoid building a string with += inside a loop.

The better alternative is to collect the characters in a dynamic array and convert it to a string at the end of the operation, using `''.join(my_array)`.
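As a minimal sketch of this pattern, the hypothetical helper `repeat_char` below (not part of the original notebook) collects characters in a list and joins them once at the end:

```python
def repeat_char(c, n):
    """Build a string of n copies of c without repeated string concatenation."""
    parts = []               # dynamic array acting as a buffer
    for _ in range(n):
        parts.append(c)      # O(1) amortized per append
    return ''.join(parts)    # a single pass builds the final string


print(repeat_char('a', 5))   # aaaaa
```

The list absorbs each append cheaply, and the single `join` at the end copies every character exactly once.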

#### String Split

Without using a built-in string split method, implement a `split(s, c)` method, which receives a string `s` and a character `c` and splits `s` at each occurrence of `c`, returning a list of strings.

##### Explanation of approach
I create two empty arrays: `result`, which holds the final output, and `current`, a buffer that collects characters from `s` as the function runs. I need to visit every character in `s` looking for `c`, and storing characters in a dynamic array is more efficient than repeatedly rebuilding strings.

I use a for-loop to step through the characters in `s`, adding each one to the buffer unless it is the delimiter `c`. In that case, I join the buffered characters into a string, append it to `result`, and empty the buffer. After the loop ends, I build one final string from the buffer, append it to `result`, and return `result`.

In [9]:
def split(s, c):
    if not s:
        return []
    result = []
    current = []
    for char in s:
        if char == c:
            result.append(''.join(current))
            current = []
        else:
            current.append(char)
    result.append(''.join(current))
    return result

In [10]:
# Examples
s, c = "split by space", ' '
print(split(s, c))

s, c = "beekeeper needed", 'e'
print(split(s, c))

s, c = "/home/./..//Documents/", '/'
print(split(s, c))

['split', 'by', 'space']
['b', '', 'k', '', 'p', 'r n', '', 'd', 'd']
['', 'home', '.', '..', '', 'Documents', '']


#### Analysis
`split()` takes O(len(s)) time, since dynamic arrays take O(1) amortized time per character.

The space complexity is also O(len(s)). It includes the string, the final result array and the temporary array, each of which we might approximate to O(len(s)), although the temporary array is likely smaller.

##### Alternate approach
It's possible to optimise this approach by using two pointers and slicing the string straight into the result array, bypassing the temporary array.

In most programming languages, string slicing (or substring) operations take O(k) time, where k is the length of the substring being extracted. Python is no exception: `s[j:i]` copies the characters into a new string rather than returning a view on the original. Because the slices taken here do not overlap, their lengths sum to at most len(s), so the overall running time is still O(len(s)).

Regarding space complexity, each slice creates a new string, which is O(k) where k is the length of the slice.


In [13]:
def split(s, c):
    if not s:
        return []
    result = []
    j = 0
    for i in range(len(s)):
        if s[i] == c:
            result.append(s[j:i])
            j = i+1
    result.append(s[j:])
    return result
    

In [14]:
# Examples
s, c = "split by space", ' '
print(split(s, c))

s, c = "beekeeper needed", 'e'
print(split(s, c))

s, c = "/home/./..//Documents/", '/'
print(split(s, c))

['split', 'by', 'space']
['b', '', 'k', '', 'p', 'r n', '', 'd', 'd']
['', 'home', '.', '..', '', 'Documents', '']


#### String Join
Without using a built-in string join method, implement a `join(arr, s)` method, which receives an array of strings, `arr`, and a separator string `s`, and returns a single string made of the strings in `arr` with `s` between each consecutive pair.

Assume you have access to a function `array_to_string(arr)`, which converts an array of individual characters to a string.

In [16]:
def join(arr, s):
    # An empty arr produces an empty string.
    joined_array = []
    for i in range(len(arr)):
        if i != 0:
            # Add the separator before every string except the first
            for c in s:
                joined_array.append(c)
        for c in arr[i]:
            joined_array.append(c)

    return array_to_string(joined_array)
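
Since `array_to_string` is assumed rather than defined, running `join` locally needs a stand-in. A minimal sketch, assuming the helper simply concatenates a list of single characters (this definition is hypothetical and not part of the exercise):

```python
def array_to_string(arr):
    """Hypothetical stand-in: concatenate a list of single characters."""
    return ''.join(arr)


print(array_to_string(['a', ',', 'b']))  # a,b
```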

#### String Matching
Implement an `index_of(s, t)` method, which returns the first index where string t appears in string s or -1 if s does not contain t.

In [18]:
def index_of(s, t):
    for i in range(len(s) - len(t) + 1):
        if s[i:i+len(t)] == t:
            return i
    return -1

In [19]:
index_of("Hello", "ll")

2

#### Analysis
`join(arr, s)` takes O(n + len(arr) x len(s)) time, where n is the total number of characters across the strings in `arr`: each character of each string is appended once, and the separator `s` is appended len(arr) - 1 times, with each append costing O(1) amortized. The extra space is of the same order, since both the temporary array and the final string hold every output character.

`index_of(s, t)` tries at most len(s) - len(t) + 1 starting positions in s. At each position, slicing copies len(t) characters, which takes O(len(t)) time in Python, and comparing the slice with t takes up to O(len(t)) more, so the total worst case is O(len(s) x len(t)). The extra space is O(len(t)) for the temporary slice, which is discarded after each comparison.
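
To make the worst-case bound concrete, the sketch below counts character comparisons for a naive left-to-right search. The helper `count_comparisons` is hypothetical and written only for illustration; it compares character by character instead of slicing so that each comparison can be counted.

```python
def count_comparisons(s, t):
    """Return (index, comparisons) for a naive left-to-right search of t in s."""
    comparisons = 0
    for i in range(len(s) - len(t) + 1):
        matched = True
        for j in range(len(t)):
            comparisons += 1
            if s[i + j] != t[j]:
                matched = False
                break
        if matched:
            return i, comparisons
    return -1, comparisons


# Near-worst case: almost every starting position matches until the last character of t.
print(count_comparisons("a" * 10 + "b", "aaab"))  # (7, 32)

# Typical case: most positions fail on the first character.
print(count_comparisons("Hello", "ll"))           # (2, 4)
```

In the first call, each of the 8 starting positions costs len(t) = 4 comparisons, which is the len(s) x len(t) behaviour described above.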

In [21]:
# The following provides a slight optimization. Rather than comparing the whole of t with each slice of s, 
# it uses a for-loop to compare each letter and has a break clause to end the loop as soon as a non-match is found.

def index_of(s, t):
    if t == "":
        return 0
    lt = len(t)    
    for i in range(len(s) - lt + 1):
        for j in range(lt):
            if s[i+j] != t[j]:
                break
            else:
                if j == lt - 1:
                    return i

    return -1

In [22]:
index_of("Hello", "ll")

2

In [23]:
# Constraints:

# - The input strings can contain any valid ASCII character
# - The length of s is at most 10^5
# - The length of t is at most 10^5

In [24]:
# - t can be empty, in which case return 0
index_of("Hello", "")

0

In [25]:
# - s can be empty, in which case return -1 if t is non-empty, 0 if t is empty
index_of("", "")

0

In [26]:
index_of("", "Hello")

-1