# Data Structures and Algorithms in Python - Ch.5: Array-based Sequences
### AJ Zerouali, 2023/08/30

These are the exercises of Section 12 of Portilla's course "Data structures, algorithms and interviews", which are interview questions.

**Comment:**
Although this is supposed to be the chapter about arrays, the optimal solutions do not necessarily rely on lists and strings only. Finding an optimal algorithm also seems to rely on a much broader knowledge of the material, as one would in principle take the interview questions *after* having studied data structures and algorithms. In conclusion, I will essentially learn to make better algorithms after making the necessary mistakes/suboptimal implementations. 

## Exercise 1: Anagrams

https://github.com/jmportilla/Python-for-Algorithms--Data-Structures--and-Interviews/blob/master/02-Array%20Sequences/Array%20Sequences%20Interview%20Questions/01-Anagram-Check/01-Anagram%20Check%20.ipynb

Given two strings, check to see if they are anagrams. An anagram is when the two strings can be written using the exact same letters (so you can just rearrange the letters to get a different phrase or word).

For example:

"public relations" is an anagram of "crap built on lies."

"clint eastwood" is an anagram of "old west action"

Note: Ignore spaces and capitalization. So "d go" is an anagram of "God" and "dog" and "o d g".

In [None]:
def anagram(s1,s2):    
    pass

In [None]:
anagram('dog','god')

In [None]:
anagram('clint eastwood','old west action')

In [None]:
anagram('aa','bb')

In [None]:
"""
RUN THIS CELL TO TEST YOUR SOLUTION
"""
from nose.tools import assert_equal

class AnagramTest(object):
    
    def test(self,sol):
        assert_equal(sol('go go go','gggooo'),True)
        assert_equal(sol('abc','cba'),True)
        assert_equal(sol('hi man','hi     man'),True)
        assert_equal(sol('aabbcc','aabbc'),False)
        assert_equal(sol('123','1 2'),False)
        print("ALL TEST CASES PASSED")

# Run Tests
t = AnagramTest()
t.test(anagram)

#### Solution:

##### 1) My solution:

My approach is to take the two strings to compare, sort them after removing the spaces and putting them in lower-case, and then compare the two lists obtained. In particular, this solution requires 2 additional implementations:
1) Finding the index of the lowest value in an array (*get_min_idx()*).
2) Sorting a string (*sort_str()*).

Finding the index of the lowest value is $O(n)$ and is done once per character, so my implementation of sorting is $O(n^2)$, which makes the whole solution $O(n^2)$.

In [1]:
def get_min(in_list):
    '''
        Get min el't in a list
    '''
    if len(in_list) == 0:
        return None
    
    min_val = in_list[0]
    for x in in_list:
        if x<min_val:
            min_val = x
    
    return min_val

def get_min_idx(in_list):
    '''
        Get the index of the min el't in a list
    '''
    if len(in_list) == 0:
        return None
    
    min_idx = 0
    for i in range(len(in_list)):
        if in_list[i]<in_list[min_idx]:
            min_idx = i
    
    return min_idx

def sort_str(string):
    '''
        Function that takes a string
        as input and returns
        a sorted list of chars.
    '''
    str_list = list(string)
    out_list = []
    list_tmp = str_list
    
    for i in range(len(str_list)):
        min_idx = get_min_idx(list_tmp)
        if list_tmp[min_idx]!=" ":
            out_list.append(list_tmp[min_idx])
        list_tmp = list_tmp[:min_idx]+list_tmp[min_idx+1:]
        
    return out_list

def anagram(str_1, str_2):
    '''
        Function that determines whether 2 strings are 
        anagrams of one another.
    '''
    
    # Take lists of lowercase chars.
    # from both strings
    lst_1 = list(str_1.lower())
    lst_2 = list(str_2.lower())
    
    # Sort first list
    lst_1 = sort_str(lst_1)
    
    # Sort second list
    lst_2 = sort_str(lst_2)
    
    if len(lst_1)!=len(lst_2):
        out = False
    else:
        out = True
        for i in range(len(lst_1)):
            if lst_1[i] != lst_2[i]:
                out = False
            
    return out
    
    

In [2]:
anagram('dog','god')

True

In [3]:
anagram('clint eastwood','old west action')

True

In [4]:
anagram('public relations','crap built on lies')

True

In [5]:
anagram('aa','bb')

False

In [7]:
from nose.tools import assert_equal

class AnagramTest(object):
    
    def test(self,sol):
        assert_equal(sol('go go go','gggooo'),True)
        assert_equal(sol('abc','cba'),True)
        assert_equal(sol('hi man','hi     man'),True)
        assert_equal(sol('aabbcc','aabbc'),False)
        assert_equal(sol('123','1 2'),False)
        print("ALL TEST CASES PASSED")

# Run Tests
t = AnagramTest()
t.test(anagram)

ALL TEST CASES PASSED
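
For comparison, the same sort-and-compare idea can be sketched with Python's built-in *sorted()*, which runs in $O(n\log n)$ instead of the $O(n^2)$ of a hand-written selection sort. This is only a minimal sketch; the name *anagram_sorted* is my own and does not come from Portilla's notebook.

```python
def anagram_sorted(s1, s2):
    # Remove spaces and ignore capitalization, as the exercise requires
    clean_1 = s1.replace(" ", "").lower()
    clean_2 = s2.replace(" ", "").lower()

    # sorted() returns a list of characters; two anagrams give the same list
    return sorted(clean_1) == sorted(clean_2)


print(anagram_sorted("clint eastwood", "old west action"))  # True
print(anagram_sorted("aa", "bb"))                           # False
```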


##### 2) The optimal solution

The optimal solution to this problem is $O(n)$. Instead of sorting the two strings, we use dictionaries to count the number of occurrences of each character.

In [1]:
def anagram(string_1, string_2):
    '''
        Function that determines whether 2 strings are 
        anagrams of one another.
        
    '''
    
    # Lowercase and remove spaces
    str_1 = string_1.replace(' ', '').lower()
    str_2 = string_2.replace(' ', '').lower()
    
    # Return False if lengths aren't equal
    if len(str_1)!=len(str_2):
        return False
    
    # Init. character count dict.
    char_count_1 = {}
    char_count_2 = {}
    
    # Count occurrences
    for i in range(len(str_1)):
        try:
            char_count_1[str_1[i]] += 1
        except KeyError:
            char_count_1[str_1[i]] = 1
        
        try:
            char_count_2[str_2[i]] += 1
        except KeyError:
            char_count_2[str_2[i]] = 1
            
    return (char_count_1 == char_count_2)

In [2]:
from nose.tools import assert_equal

class AnagramTest(object):
    
    def test(self,sol):
        assert_equal(sol('go go go','gggooo'),True)
        assert_equal(sol('abc','cba'),True)
        assert_equal(sol('hi man','hi     man'),True)
        assert_equal(sol('aabbcc','aabbc'),False)
        assert_equal(sol('123','1 2'),False)
        print("ALL TEST CASES PASSED")

# Run Tests
t = AnagramTest()
t.test(anagram)

ALL TEST CASES PASSED


In [3]:
anagram('dog','god')

True

In [4]:
anagram('clint eastwood','old west action')

True

In [5]:
anagram('public relations','crap built on lies')

True

In [6]:
anagram('aa','bb')

False
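
The character counting above can also be sketched with *collections.Counter* from the standard library, which builds the count dictionary in one call. The name *anagram_counter* is hypothetical; the behaviour is meant to match the dictionary-based solution, not to replace it.

```python
from collections import Counter


def anagram_counter(s1, s2):
    # Counter maps each character to its number of occurrences
    count_1 = Counter(s1.replace(" ", "").lower())
    count_2 = Counter(s2.replace(" ", "").lower())

    # Two strings are anagrams exactly when their counts agree
    return count_1 == count_2


print(anagram_counter("public relations", "crap built on lies"))  # True
```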

For reference, here's Portilla's solution

In [None]:
def anagram2(s1,s2):
    
    # Remove spaces and lowercase letters
    s1 = s1.replace(' ','').lower()
    s2 = s2.replace(' ','').lower()
    
    # Edge Case to check if same number of letters
    if len(s1) != len(s2):
        return False
    
    # Create counting dictionary (Note could use DefaultDict from Collections module)
    count = {}
    
    
        
    # Fill dictionary for first string (add counts)
    for letter in s1:
        if letter in count:
            count[letter] += 1
        else:
            count[letter] = 1
            
    # Fill dictionary for second string (subtract counts)
    for letter in s2:
        if letter in count:
            count[letter] -= 1
        else:
            count[letter] = 1
    
    # Check that all counts are 0
    for k in count:
        if count[k] != 0:
            return False

    # Otherwise they're anagrams
    return True

## Exercise 2: Array pair sum

https://github.com/jmportilla/Python-for-Algorithms--Data-Structures--and-Interviews/blob/master/02-Array%20Sequences/Array%20Sequences%20Interview%20Questions/02-Array-Pair-Sum/02-Array%20Pair%20Sum%20.ipynb

Given an integer array, output all the **unique** pairs that sum up to a specific value k.

So the input:

pair_sum([1,3,2,2],4)

would return 2 pairs:

 (1,3)
 (2,2)

NOTE: FOR TESTING PURPOSES CHANGE YOUR FUNCTION SO IT OUTPUTS THE NUMBER OF PAIRS

### 1) My solution

My first attempt is $O(n^3)$.

In [9]:
def pair_sum(int_list, N_sum):
    '''
        Function that returns ordered couples
        of elements (i,j) in int_list that sum
        to N_sum.
        
        :param int_list: List of integers
        :param N_sum: Desired sum
        
        :return out_dict: Dictionary containing
            the list of pairs (i,j) from int_list
            that sum to N_sum, and the number of 
            these sums
    '''
    pair_list = []
    out_dict = {}
    
    for i in range(len(int_list)):
        for j in range(i+1, len(int_list)): 
            if int_list[i]+int_list[j] == N_sum:
                if ((int_list[i], int_list[j]) not in pair_list)\
                and ((int_list[j], int_list[i]) not in pair_list): # Membership test on a list is a linear search: O(len(pair_list))
                    pair_list.append((int_list[i], int_list[j]))
                    
    out_dict["pair_list"] = pair_list
    out_dict["n_pairs"] = len(pair_list)
    
    return out_dict

In [10]:

pair_sum([1,3,2,2],4)

{'pair_list': [(1, 3), (2, 2)], 'n_pairs': 2}

In [11]:
from nose.tools import assert_equal

class TestPair(object):
    
    def test(self,sol):
        assert_equal(sol([1,9,2,8,3,7,4,6,5,5,13,14,11,13,-1],10)["n_pairs"],6)
        assert_equal(sol([1,2,3,1],3)["n_pairs"],1)
        assert_equal(sol([1,3,2,2],4)["n_pairs"],2)
        print('ALL TEST CASES PASSED')
        
#Run tests
t = TestPair()
t.test(pair_sum)

ALL TEST CASES PASSED


### 2) A better solution

A better solution is $O(n)$. It relies on 2 tricks:
1) Instead of adding two numbers from the array and comparing to *N_sum*, we can compute the difference between *N_sum* and the current element of the array, and check whether that difference has already been encountered. Differences are a common trick in algorithmic efficiency.
2) When the existence of an element in a collection is in question, we can replace the search in a Python list by checking containment in a Python set. A set lookup is $O(1)$ on average instead of $O(n)$, which removes one factor of $n$ from the complexity.

Here is the proposed solution, which is $O(n)$ instead of $O(n^3)$:

In [23]:
def pair_sum(int_list, N_sum):
    '''
        Function that returns ordered couples
        of elements (i,j) in int_list that sum
        to N_sum.
        
        :param int_list: List of integers
        :param N_sum: Desired sum
        
        :return out_dict: Dictionary containing
            the list of pairs (i,j) from int_list
            that sum to N_sum, and the number of 
            these sums
    '''
    encountered_numbers = set()
    pair_set = set()
    out_dict = {}
    
    # Get rid of edge case first (keep the same keys as the normal output)
    if len(int_list)<2:
        out_dict["pair_set"] = pair_set
        out_dict["n_pairs"] = 0
        return out_dict
    
    for x in int_list:
        
        y = N_sum -x
        
        if y in encountered_numbers: # This is O(1) instead of O(n)
            pair_set.add((min(x,y), max(x,y)))
        else:
            encountered_numbers.add(x)

                    
    out_dict["pair_set"] = pair_set
    out_dict["n_pairs"] = len(pair_set)
    
    return out_dict

In [24]:
from nose.tools import assert_equal

class TestPair(object):
    
    def test(self,sol):
        assert_equal(sol([1,9,2,8,3,7,4,6,5,5,13,14,11,13,-1],10)["n_pairs"],6)
        assert_equal(sol([1,2,3,1],3)["n_pairs"],1)
        assert_equal(sol([1,3,2,2],4)["n_pairs"],2)
        print('ALL TEST CASES PASSED')
        
#Run tests
t = TestPair()
t.test(pair_sum)

ALL TEST CASES PASSED
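
One way to gain confidence in the set-based approach is to cross-check it against a brute-force enumeration on random inputs. The sketch below is only illustrative: *pair_sum_fast* is a stripped-down variant that returns just the set of pairs, and *pair_sum_brute* is a hypothetical reference implementation built on *itertools.combinations*.

```python
import random
from itertools import combinations


def pair_sum_fast(nums, target):
    """Set-based O(n) variant: returns the set of unique pairs."""
    seen, pairs = set(), set()
    for x in nums:
        y = target - x
        if y in seen:  # average O(1) lookup
            pairs.add((min(x, y), max(x, y)))
        seen.add(x)
    return pairs


def pair_sum_brute(nums, target):
    """O(n^2) reference: try every pair of positions."""
    return {(min(a, b), max(a, b))
            for a, b in combinations(nums, 2) if a + b == target}


# Compare both versions on small random arrays
random.seed(0)
for _ in range(200):
    nums = [random.randint(-5, 5) for _ in range(random.randint(0, 8))]
    target = random.randint(-6, 6)
    assert pair_sum_fast(nums, target) == pair_sum_brute(nums, target)

print(pair_sum_fast([1, 3, 2, 2], 4) == {(1, 3), (2, 2)})  # True
```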


## Exercise 3: Missing element from a list

https://github.com/jmportilla/Python-for-Algorithms--Data-Structures--and-Interviews/blob/master/02-Array%20Sequences/Array%20Sequences%20Interview%20Questions/03-Finding-the-Missing-Element/03-Find%20the%20Missing%20Element%20.ipynb

Consider an array of non-negative integers. A second array is formed by shuffling the elements of the first array and deleting a random element. Given these two arrays, find which element is missing in the second array.

Here is an example input, the first array is shuffled and the number 5 is removed to construct the second array.

Input:

finder([1,2,3,4,5,6,7],[3,7,2,1,4,6])

Output:

5 is the missing number


In [None]:
def finder(arr1,arr2):
    
    pass

In [None]:
arr1 = [1,2,3,4,5,6,7]
arr2 = [3,7,2,1,4,6]
finder(arr1,arr2)

In [None]:
arr1 = [5,5,7,7]
arr2 = [5,7,7]

finder(arr1,arr2)

In [None]:
"""
RUN THIS CELL TO TEST YOUR SOLUTION
"""
from nose.tools import assert_equal

class TestFinder(object):
    
    def test(self,sol):
        assert_equal(sol([5,5,7,7],[5,7,7]),5)
        assert_equal(sol([1,2,3,4,5,6,7],[3,7,2,1,4,6]),5)
        assert_equal(sol([9,8,7,6,5,4,3,2,1],[9,8,7,5,4,3,2,1]),6)
        print('ALL TEST CASES PASSED')

# Run test
t = TestFinder()
t.test(finder)

### 1) My solution

Here are some ideas:
- If the original array's elements are distinct, then the modified array must contain exactly one element fewer than the original. Returning the missing element can then be done in $O(n)$ time, after checking the lengths of the two arrays converted to sets.
- For an original array with repeated elements, I could use two counting dictionaries and compare them.

Here is an even better one: They're supposed to be arrays of non-negative integers. In principle, the difference between the sums of their elements should be the missing integer.


In [3]:
def finder(origin_arr, mod_array):
    '''
        Function to find the single element
        of origin_arr that is missing from mod_array
        
        :param origin_arr: Original list of non-negative integers. 
        :param mod_array: Shuffled array with one missing element
    '''
    if (len(origin_arr)-1) != len(mod_array):
        raise ValueError("origin_arr must contain exactly one element more than mod_array")
    
    origin_sum = origin_arr[-1]
    mod_sum =0 
    for i in range(len(mod_array)):
        origin_sum += origin_arr[i]
        mod_sum += mod_array[i]
        
    return origin_sum-mod_sum
    

In [4]:
"""
RUN THIS CELL TO TEST YOUR SOLUTION
"""
from nose.tools import assert_equal

class TestFinder(object):
    
    def test(self,sol):
        assert_equal(sol([5,5,7,7],[5,7,7]),5)
        assert_equal(sol([1,2,3,4,5,6,7],[3,7,2,1,4,6]),5)
        assert_equal(sol([9,8,7,6,5,4,3,2,1],[9,8,7,5,4,3,2,1]),6)
        print('ALL TEST CASES PASSED')

# Run test
t = TestFinder()
t.test(finder)

ALL TEST CASES PASSED
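
The counting-dictionary idea mentioned at the start of this section can be sketched with *collections.Counter*: subtracting two counters keeps only the keys with a positive remaining count. The name *finder_counts* is hypothetical, and the sketch assumes that exactly one element is missing.

```python
from collections import Counter


def finder_counts(origin_arr, mod_array):
    # Counter subtraction drops keys whose count falls to zero or below,
    # so only the missing element should remain.
    leftover = Counter(origin_arr) - Counter(mod_array)

    if sum(leftover.values()) != 1:
        raise ValueError("expected exactly one missing element")

    return next(iter(leftover))


print(finder_counts([5, 5, 7, 7], [5, 7, 7]))  # 5
```

Unlike the difference of sums, this version also works for elements that cannot be added, such as strings.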


### 2) More optimal solutions

Portilla's solution is here:

https://github.com/jmportilla/Python-for-Algorithms--Data-Structures--and-Interviews/blob/master/02-Array%20Sequences/Array%20Sequences%20Interview%20Questions/03-Finding-the-Missing-Element/03-Find%20the%20Missing%20Element%20-%20SOLUTION.ipynb

Portilla makes the following points:
- In an interview, it is expected that the given solution is linear if possible. Again, this is done using hash tables like Python dictionaries. The second solution in the notebook is similar to the solution of Exercise 1 above (anagrams).
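- The trick that I use in my solution could lead to overflow issues if the added numbers are very large. He refers to it as a "clever" solution, but from a standpoint of data structures it could be problematic. (Python's built-in integers have arbitrary precision, so the overflow concern applies mainly to fixed-width integer types.)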

The smartest way of solving this problem seems to be the following trick: 

In [None]:
def finder3(arr1, arr2): 
    result=0 
    
    # Perform an XOR between the numbers in the arrays
    for num in arr1+arr2: 
        result^=num 
        print(result)
        
    return result 
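
The XOR trick works because $x \oplus x = 0$ and $x \oplus 0 = x$, and because XOR is commutative and associative: every value present in both arrays cancels out, leaving only the missing one. Here is a minimal sketch of the same idea without the intermediate printing; the name *finder_xor* is my own and not from Portilla's notebook.

```python
from functools import reduce
from operator import xor


def finder_xor(arr1, arr2):
    # XOR all values of both arrays together; paired values cancel,
    # so only the element missing from arr2 survives.
    return reduce(xor, arr1 + arr2, 0)


print(finder_xor([1, 2, 3, 4, 5, 6, 7], [3, 7, 2, 1, 4, 6]))  # 5
print(finder_xor([5, 5, 7, 7], [5, 7, 7]))                    # 5
```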

## Exercise 4: Largest contiguous sum

https://github.com/jmportilla/Python-for-Algorithms--Data-Structures--and-Interviews/blob/master/02-Array%20Sequences/Array%20Sequences%20Interview%20Questions/04-Largest-Continuous-Sum/04-Largest%20Continuous%20Sum%20.ipynb

Given an array of integers (positive and negative) find the largest contiguous sum.

**Note:** Portilla calls that a continuous sum instead of a contiguous sum. By definition, a contiguous subarray consists of consecutive elements of the original array, in their original order.

An efficient algorithm for finding the largest contiguous sum is described here:

https://www.geeksforgeeks.org/largest-sum-contiguous-subarray/#

A well-known solution to this problem is called Kadane's algorithm.
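
As a minimal sketch of Kadane's algorithm, written from its standard description rather than taken from Portilla's notebook: scan the array once, keeping the best sum of a subarray ending at the current element, which is either the element alone or the element added to the previous best. The name *kadane* and the choice to raise an error on empty input are my own assumptions.

```python
def kadane(arr):
    if not arr:
        raise ValueError("arr must contain at least one element")

    # best: largest contiguous sum seen so far
    # current: largest sum of a contiguous subarray ending at the current element
    best = current = arr[0]

    for a in arr[1:]:
        # Either extend the previous subarray or start a new one at a
        current = max(a, current + a)
        best = max(best, current)

    return best


print(kadane([1, 2, -1, 3, 4, 10, 10, -10, -1]))  # 29
```

This runs in $O(n)$ time and $O(1)$ extra space, and it also handles arrays containing only negative numbers.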

In [None]:
def large_cont_sum(arr): 
    pass

In [None]:
large_cont_sum([1,2,-1,3,4,10,10,-10,-1])
#29

In [None]:

from nose.tools import assert_equal

class LargeContTest(object):
    def test(self,sol):
        assert_equal(sol([1,2,-1,3,4,-1]),9)
        assert_equal(sol([1,2,-1,3,4,10,10,-10,-1]),29)
        assert_equal(sol([-1,1]),1)
        print('ALL TEST CASES PASSED')
        
#Run Test
t = LargeContTest()
t.test(large_cont_sum)

### 1) My solution

Here are a couple of remarks:
- Given an array $(a_1,\cdots, a_n)$ of integers of length $n$, for any $k=1,\cdots,n$, there are $(n-k+1)$ contiguous subarrays of length $k$. In total then, there are $n(n+1)/2$ sums to compute and compare.
- It is possible to compute the contiguous sums recursively. The simplest way to do this is to index the sums by $(i,k)$, where $i$ is the first index of the contiguous subarray and $k$ is its length. In this case:
$$S_i^{(k)} = \sum_{j=i}^{i+k-1}a_j$$
    is the contiguous sum of the subarray $(a_i,\cdots, a_{(i+k-1)})$. We notice here that for the contiguous sum starting at $(i+1)$:
$$S_{(i+1)}^{(k)} = S_{i}^{(k+1)} - a_i,$$
    and that:
$$S_i^{(k-1)} = S_i^{(k)} - a_{(i+k-1)}.$$
- For any starting index $i$ of a contiguous subarray, the maximal subarray length $k$ is $(n-i+1)$.
- The last two points therefore allow us to reduce the time complexity of our algorithm by one order (from $O(n^3)$, when every sum is recomputed from scratch, to $O(n^2)$), provided we use a hash table to access previously computed values.

Here's my implementation:

In [10]:
def large_cont_sum(arr):
    '''
        Function to find the largest contiguous sum
        from input array arr.
        
        :param arr: Input array of numbers
        
        :return max_cont_sum: Maximal contiguous sum
        :return cont_sums: Dictionary of contiguous sums
        
    '''
    # Init.
    n = len(arr)
    cont_sums = {}
    max_cont_sum = arr[0] if n > 0 else 0 # Start from an actual sum so that all-negative arrays are handled
    arr_sum = 0
    arr_last_a = 0

    # Compute S_0^(n)
    for a in arr:
        arr_sum += a

    # Initialize variables for S_0^(k), k=1,...,n
    i = 0
    k_max = n-i #max length
    k= k_max
    cont_sums_i = {}

    # Loop to compute S_0^(k), k=1,...,n
    while k>0:
        # Compute and store sum of subarray [a_i, ..., a_(i+k-1)]
        cont_sums_i[k] = arr_sum - arr_last_a

        # Update arr_sum
        arr_sum = cont_sums_i[k]

        # Check if last contiguous sum is larger than current max
        '''
            Portilla typically replaces this line by
        max_cont_sum = max(max_cont_sum, cont_sums_i[k])
        '''
        if cont_sums_i[k]>max_cont_sum:
            max_cont_sum = cont_sums_i[k]

        # Update a_(i+k)
        arr_last_a = arr[(i+k)-1]

        # Update k
        k -= 1

    cont_sums[i] = cont_sums_i

    for i in range(1,n):
        cont_sums_i = {}
        for k in cont_sums[i-1]:
            if k>1:
                # Compute S_i^(k-1) = S_(i-1)^(k)-a_(i-1)
                cont_sums_i[k-1] = cont_sums[i-1][k] - arr[i-1]

                # Check if last contiguous sum is larger than current max
                if cont_sums_i[k-1]>max_cont_sum:
                    max_cont_sum = cont_sums_i[k-1]

        cont_sums[i] = cont_sums_i
        
    return max_cont_sum, cont_sums

In [11]:
large_cont_sum([1,2,-1,3,4,10,10,-10,-1])

(29,
 {0: {9: 18, 8: 19, 7: 29, 6: 19, 5: 9, 4: 5, 3: 2, 2: 3, 1: 1},
  1: {8: 17, 7: 18, 6: 28, 5: 18, 4: 8, 3: 4, 2: 1, 1: 2},
  2: {7: 15, 6: 16, 5: 26, 4: 16, 3: 6, 2: 2, 1: -1},
  3: {6: 16, 5: 17, 4: 27, 3: 17, 2: 7, 1: 3},
  4: {5: 13, 4: 14, 3: 24, 2: 14, 1: 4},
  5: {4: 9, 3: 10, 2: 20, 1: 10},
  6: {3: -1, 2: 0, 1: 10},
  7: {2: -11, 1: -10},
  8: {1: -1}})

In [12]:
from nose.tools import assert_equal

class LargeContTest(object):
    def test(self,sol):
        assert_equal(sol([1,2,-1,3,4,-1])[0],9)
        assert_equal(sol([1,2,-1,3,4,10,10,-10,-1])[0],29)
        assert_equal(sol([-1,1])[0],1)
        print('ALL TEST CASES PASSED')
        
#Run Test
t = LargeContTest()
t.test(large_cont_sum)

ALL TEST CASES PASSED
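
The table of sums $S_i^{(k)}$ can also be obtained from prefix sums, since $S_i^{(k)} = P_{i+k} - P_i$, where $P_j$ is the sum of the first $j$ elements (with 0-based $i$ here, matching the code rather than the 1-based formulas above). A minimal sketch, with the hypothetical name *all_contiguous_sums*; it is still $O(n^2)$ because it enumerates every contiguous subarray.

```python
from itertools import accumulate


def all_contiguous_sums(arr):
    """Return {(i, k): sum(arr[i:i+k])} for every start i and length k >= 1."""
    # prefix[j] = arr[0] + ... + arr[j-1], with prefix[0] = 0
    prefix = [0] + list(accumulate(arr))
    n = len(arr)
    return {(i, k): prefix[i + k] - prefix[i]
            for i in range(n)
            for k in range(1, n - i + 1)}


sums = all_contiguous_sums([1, 2, -1, 3, 4, 10, 10, -10, -1])
print(len(sums))           # 45, i.e. n(n+1)/2 with n = 9
print(max(sums.values()))  # 29
```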


#### Note:

I think Portilla made a mistake in this exercise. Other than "continuous" instead of contiguous, he only looked at contiguous subarrays starting at index 0.

## Exercise 5: Sentence reversal

https://github.com/jmportilla/Python-for-Algorithms--Data-Structures--and-Interviews/blob/master/02-Array%20Sequences/Array%20Sequences%20Interview%20Questions/05-Sentence-Reversal/05-Sentence%20Reversal.ipynb

Given a string of words, reverse all the words. For example, given:

'This is the best'

Return:

'best the is This'

As part of this exercise you should remove all leading and trailing whitespace. So that inputs such as:

'  space here'  and 'space here      '

both become:

'here space'

In [None]:
def rev_word(s):
    pass

In [None]:
rev_word('Hi John,   are you ready to go?')

In [None]:
rev_word('    space before')

In [None]:

"""
RUN THIS CELL TO TEST YOUR SOLUTION
"""

from nose.tools import assert_equal

class ReversalTest(object):
    
    def test(self,sol):
        assert_equal(sol('    space before'),'before space')
        assert_equal(sol('space after     '),'after space')
        assert_equal(sol('   Hello John    how are you   '),'you are how John Hello')
        assert_equal(sol('1'),'1')
        print("ALL TEST CASES PASSED")
        
# Run and test
t = ReversalTest()
t.test(rev_word)

### 1) My solution

My first attempt is to re-implement Python's *split()* method.

In [10]:
def reverse_sentence(string):
    N = len(string)
    if N == 0:
        return ""
    # Split the string into substrings separated by " "
    substr_list = []
    tmp_substr = ""
    for i in range(N):
        if string[i]==" ":
            substr_list.append(tmp_substr)
            tmp_substr = ""
        else:
            tmp_substr += string[i]
            if i == (N-1):
                substr_list.append(tmp_substr)
    
    # Fill output
    out_string = ""
    for i in range(len(substr_list)-1,-1,-1):
        if substr_list[i] != "":
            # Add a space after every word
            out_string += (substr_list[i] + " ")
    # Remove last space from output
    return out_string[:-1]

In [11]:
reverse_sentence("   sentence spaces trailing    ")

'trailing spaces sentence'

In [5]:
"""
SOLUTION TEST
"""

from nose.tools import assert_equal

class ReversalTest(object):
    
    def test(self,sol):
        assert_equal(sol('    space before'),'before space')
        assert_equal(sol('space after     '),'after space')
        assert_equal(sol('   Hello John    how are you   '),'you are how John Hello')
        assert_equal(sol('1'),'1')
        print("ALL TEST CASES PASSED")
        
# Run and test
t = ReversalTest()
t.test(reverse_sentence)

ALL TEST CASES PASSED
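
A possible refinement, sketched under the assumption that only the space character separates words (as in the solution above): collect characters and words in lists and join them at the end, instead of growing strings with repeated concatenation. Python strings are immutable, so each concatenation may copy the string built so far. The name *reverse_sentence_join* is hypothetical.

```python
def reverse_sentence_join(string):
    words = []    # completed words, in their original order
    current = []  # characters of the word being read

    for ch in string:
        if ch == " ":
            # A space closes the current word, if there is one
            if current:
                words.append("".join(current))
                current = []
        else:
            current.append(ch)

    # Do not forget a word that runs to the end of the string
    if current:
        words.append("".join(current))

    return " ".join(reversed(words))


print(reverse_sentence_join("   Hello John    how are you   "))  # you are how John Hello
```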


### 2) Comments on Portilla's solution

Portilla first proposes the obvious choice: using Python's built-in string methods, which can be done in one line as follows:

In [6]:
def rev_word(s):
    return " ".join(reversed(s.split()))

In [7]:
rev_word('   Hello John    how are you   ')

'you are how John Hello'

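On the other hand, his second solution, which is more appropriate for an interview setting, seems $O(n^2)$ at first glance because of the nested *while* loops. It is in fact $O(n)$, since the index *i* only moves forward and each character is visited a constant number of times.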

In [None]:
def rev_word3(s):
    """
    Manually doing the splits on the spaces.
    """
    
    words = []
    length = len(s)
    spaces = [' ']
    
    # Index Tracker
    i = 0
    
    # While index is less than length of string
    while i < length:
        
        # If element isn't a space
        if s[i] not in spaces:
            
            # The word starts at this index
            word_start = i
            
            while i < length and s[i] not in spaces:
                
                # Get index where word ends
                i += 1
            # Append that word to the list
            words.append(s[word_start:i])
        # Add to index
        i += 1
        
    # Join the reversed words
    return " ".join(reversed(words))

## Exercise 6: String compression

https://github.com/jmportilla/Python-for-Algorithms--Data-Structures--and-Interviews/blob/master/02-Array%20Sequences/Array%20Sequences%20Interview%20Questions/06-String-Compression/06-String%20Compression%20.ipynb

Given a string in the form 'AAAABBBBCCCCCDDEEEE' compress it to become 'A4B4C5D2E4'. For this problem, you can falsely "compress" strings of single or double letters. For instance, it is okay for 'AAB' to return 'A2B1' even though this technically takes more space.

The function should also be case sensitive, so that a string 'AAAaaa' returns 'A3a3'.

In [None]:
def compress(s):
    pass

In [None]:
compress('AAAAABBBBCCCC')

In [None]:

"""
RUN THIS CELL TO TEST YOUR SOLUTION
"""
from nose.tools import assert_equal

class TestCompress(object):

    def test(self, sol):
        assert_equal(sol(''), '')
        assert_equal(sol('AABBCC'), 'A2B2C2')
        assert_equal(sol('AAABCCDDDDD'), 'A3B1C2D5')
        print('ALL TEST CASES PASSED')

# Run Tests
t = TestCompress()
t.test(compress)

### 1) My solution

My approach is to loop over the string once while counting characters in a dictionary, and to append a character and its count to the output each time the character changes.

In [1]:
from collections import defaultdict

def str_compress(string):
    
    # Initializations
    N = len(string)
    if N == 0:
        return string
    
    char_count = defaultdict(lambda: 0)
    out_str = ""
    
    # First character
    i = 0
    char_count[string[i]]+=1
    
    # Main loop
    for i in range(1,N):
        char_count[string[i]]+=1
        
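        if string[i] != string[i-1]:
            out_str += (string[i-1] + str(char_count[string[i-1]]))
            # Reset the count so that a later run of the same character starts from 0
            char_count[string[i-1]] = 0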
        
    out_str += (string[N-1] + str(char_count[string[N-1]]))
    
    return out_str

In [4]:
"""
RUN THIS CELL TO TEST YOUR SOLUTION
"""
from nose.tools import assert_equal

class TestCompress(object):

    def test(self, sol):
        assert_equal(sol(''), '')
        assert_equal(sol('AABBCC'), 'A2B2C2')
        assert_equal(sol('AAABCCDDDDD'), 'A3B1C2D5')
        assert_equal(sol('Z'), 'Z1')
        print('ALL TEST CASES PASSED')

# Run Tests
t = TestCompress()
t.test(str_compress)

ALL TEST CASES PASSED
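
This kind of run-length compression can also be sketched with *itertools.groupby*, which groups consecutive equal characters. The name *compress_groupby* is hypothetical; the sketch treats each run separately, so a character that reappears later in the string starts a new count.

```python
from itertools import groupby


def compress_groupby(s):
    # groupby yields (character, iterator over one run of that character)
    return "".join(f"{ch}{len(list(run))}" for ch, run in groupby(s))


print(compress_groupby("AAAABBBBCCCCCDDEEEE"))  # A4B4C5D2E4
print(compress_groupby("AAAaaa"))               # A3a3
```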


## Exercise 7: Character uniqueness in a string

https://github.com/jmportilla/Python-for-Algorithms--Data-Structures--and-Interviews/blob/master/02-Array%20Sequences/Array%20Sequences%20Interview%20Questions/07-Unique-Characters-in-String/07-Unique%20Characters%20in%20String.ipynb

Given a string, determine if it is comprised of unique characters only. For example, the string 'abcde' has all unique characters and should return True. The string 'aabcde' contains duplicate characters and should return false.

In [None]:
def uni_char(s):
    pass

In [None]:
"""
RUN THIS CELL TO TEST YOUR CODE
"""
from nose.tools import assert_equal


class TestUnique(object):

    def test(self, sol):
        assert_equal(sol(''), True)
        assert_equal(sol('goo'), False)
        assert_equal(sol('abcdefg'), True)
        print('ALL TEST CASES PASSED')
        
# Run Tests
t = TestUnique()
t.test(uni_char)

### 1) My solutions

My initial approach to this problem is to use a default dictionary to count the number of occurrences of each character in a string.

The simplest way of doing this however would be to create a set from the string and just compare the lengths. This seems like an $O(n)$ algorithm, as the conversion from a string to a set will loop through all characters of the string.

In [13]:
set("abcdefg")

{'a', 'b', 'c', 'd', 'e', 'f', 'g'}

#### 1.a - One liner

In [14]:
def uni_char(string):
    chars = set(string)
    return (len(chars) == len(string))

In [15]:
from nose.tools import assert_equal


class TestUnique(object):

    def test(self, sol):
        assert_equal(sol(''), True)
        assert_equal(sol('goo'), False)
        assert_equal(sol('abcdefg'), True)
        print('ALL TEST CASES PASSED')
        
# Run Tests
t = TestUnique()
t.test(uni_char)

ALL TEST CASES PASSED


#### 1.b - An interview solution

I will make the following assumptions to solve this problem:
- We do not convert the letters of the string to lower-case.
- The function will therefore be case-sensitive.

For this solution, it is useful to know about the *collections.defaultdict* class in Python, whose constructor takes as parameter a function that assigns default values to non-existent keys.
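
As a small illustration of that behaviour (a sketch only, with the variable name *counts* chosen for the example): passing *int* as the factory gives a default value of 0, and merely reading a missing key inserts it into the dictionary.

```python
from collections import defaultdict

# int() returns 0, so every missing key starts at 0
counts = defaultdict(int)

for c in "goo":
    counts[c] += 1  # no KeyError on the first occurrence of a character

print(dict(counts))   # {'g': 1, 'o': 2}

# Reading a missing key creates it with the default value
print("z" in counts)  # False
print(counts["z"])    # 0
print("z" in counts)  # True
```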

In [32]:
from collections import defaultdict

def uni_char(string):
    
    # Inits and edge case
    all_chars_unique = True
    N = len(string)
    # Make a dictionary whose default value is 0
    char_count = defaultdict(lambda: 0)
    
    if N==0:
        return all_chars_unique, char_count
    
    # Main loop
    for c in string:
        char_count[c] += 1
        if all_chars_unique and (char_count[c]>1):
            all_chars_unique = False
    
    # output
    return all_chars_unique, char_count

In [33]:
class TestUnique(object):

    def test(self, sol):
        assert_equal(sol('')[0], True)
        assert_equal(sol('goo')[0], False)
        assert_equal(sol('abcdefg')[0], True)
        print('ALL TEST CASES PASSED')
        
# Run Tests
t = TestUnique()
t.test(uni_char)

ALL TEST CASES PASSED


### 2) Portilla's solutions

He proposes two solutions, the first one being the one-liner above. The second solution is the following:

In [None]:

def uni_char2(s):
    chars = set()
    for let in s:
        # Check if in set
        if let in chars:
            return False
        else:
            #Add it to the set
            chars.add(let)
    return True

So he simply returns False once a letter has been encountered before. I preferred my approach because it also gives me a starting point for Exercise 6.