# ACSL 2020-2021 Contest #2 - Lex String

## Intermediate Division

### PROBLEM:
Transform a given string of all capital letters so that:
- repeated blocks of letters are at the front, and
- longer blocks come first, with blocks of the same length in alphabetical order.

Each string has an associated number, m. In the final output, no group of the same character may be longer than m.

Other than sorting the groups of letters that have the same frequency in the original
string, no other rearranging is done.

For example, in the input line “MBAMMDXXMMMGGMMZ 3”, the string contains:
- one block of 3 letters (the M’s);
- four blocks of 2 letters (M, X, G, and M); and
- five single letters (M, B, A, D, and Z).

The 3-letter block comes first, then the 2-letter blocks (in alphabetical order), and finally the single letters (in alphabetical order): MMMGGMMMMXXABDMZ.

The number 3 requires that no run of a single letter be longer than 3 characters, so the run of four M’s formed by the two adjacent MM blocks is cut to three.

In this example, MMMGGMMMXXABDMZ is output.
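The block counts in this example can be checked mechanically. The following is a minimal sketch, assuming `itertools.groupby` is used to find the runs; the helper name `blocks_by_length` is made up for illustration and is not part of the contest statement.

```python
from itertools import groupby

def blocks_by_length(text):
    """Map each run length to the letters of the runs with that length (hypothetical helper)."""
    table = {}
    for letter, run in groupby(text):
        length = len(list(run))  # number of consecutive copies of this letter
        table.setdefault(length, []).append(letter)
    return table

table = blocks_by_length("MBAMMDXXMMMGGMMZ")
for length in sorted(table, reverse=True):
    print(length, sorted(table[length]))
```

For the sample string this prints one line per block length, longest first, with the letters of each length in alphabetical order.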

INPUT: There will be 5 lines of data. Each line will contain a string of no more than 100 characters, all
uppercase letters followed by a space and a positive integer that will be less than the length of the string.
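The input constraints above can be turned into a small parsing step. This is only a sketch: the function name `parse_line` and the choice to raise `ValueError` are assumptions made for illustration, since the problem statement does not say how invalid lines should be handled.

```python
def parse_line(line):
    """Split one input line into (text, m) and check the stated constraints (hypothetical helper)."""
    text, m = line.split()
    m = int(m)
    # only the capital letters A-Z are allowed
    if not (text.isascii() and text.isalpha() and text.isupper()):
        raise ValueError("string must contain only uppercase letters")
    if len(text) > 100:
        raise ValueError("string must be at most 100 characters")
    if not 0 < m < len(text):
        raise ValueError("m must be positive and less than the string length")
    return text, m

print(parse_line("MBAMMDXXMMMGGMMZ 3"))
```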


In [114]:
def upperletters_v0(text):
    """ convert an input text string to upper case letters and eliminate all non-letters
    """
    res = ""
    for c in text:
        if c.isalpha():
            res += c.upper()
            
    return res

In [115]:
def upperletters_v1(text):
    """ convert an input text string to upper case letters and eliminate all non-letters
    """
    return ''.join([c.upper() for c in text if c.isalpha()])


In [116]:
input = "This is a test with 100 Characters!"
output = upperletters_v0(input)
print(output)

THISISATESTWITHCHARACTERS


In [117]:
input = "This is a test with 100 Characters!"
output = upperletters_v1(input)
print(output)

THISISATESTWITHCHARACTERS


In [118]:
import re
def upperletters(text):
    """ convert an input text string to upper case letters and eliminate all non-letters
    """
    return re.sub('[^a-zA-Z]', '', text).upper()

In [120]:
input = "This is a test with 100 Characters!"
output = upperletters(input)
print(output)

THISISATESTWITHCHARACTERS
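The three versions agree on the test sentence, but they are not equivalent in general: `str.isalpha` also accepts non-ASCII letters, while the pattern `[^a-zA-Z]` removes them. A small sketch of the difference follows; the function names and the sample string are made up for illustration.

```python
import re

def upperletters_isalpha(text):
    # keeps every character that Python classifies as a letter
    return ''.join(c.upper() for c in text if c.isalpha())

def upperletters_regex(text):
    # keeps only the ASCII letters a-z and A-Z
    return re.sub('[^a-zA-Z]', '', text).upper()

sample = "Caf\u00e9 100!"  # "Café 100!" with a precomposed e-acute
print(upperletters_isalpha(sample))  # CAFÉ
print(upperletters_regex(sample))    # CAF
```

Since the contest input contains only the capital letters A to Z, either behaviour gives the same result there.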


In [121]:
input = "MBAMMDXXMMMGGMMZ 3"
text, m = input.split()
print(text, m)

MBAMMDXXMMMGGMMZ 3


In [99]:
from itertools import groupby

groups = groupby(text)
result = [{label:sum(1 for g in group)} for label, group in groups]
result

[{'M': 1},
 {'B': 1},
 {'A': 1},
 {'M': 2},
 {'D': 1},
 {'X': 2},
 {'M': 3},
 {'G': 2},
 {'M': 2},
 {'Z': 1}]

In [100]:
from itertools import groupby
def map_blocks(text):
    """ split text into its runs of identical letters, keeping the original order
    """
    groups = groupby(text)
    return [''.join(group) for label, group in groups]

In [101]:
print(text)
print(map_blocks(text))

MBAMMDXXMMMGGMMZ
['M', 'B', 'A', 'MM', 'D', 'XX', 'MMM', 'GG', 'MM', 'Z']


In [102]:
def map_blocks(text):
    """ same result as the groupby version, written as an explicit loop
    """
    res = []
    for c in text:
        if res and res[-1][0] == c:
            # same letter as the current run: extend it
            res[-1] += c
        else:
            # different letter (or first letter): start a new run
            res.append(c)
    return res

In [103]:
print(text)
blocks = map_blocks(text)
print(blocks)

MBAMMDXXMMMGGMMZ
['M', 'B', 'A', 'MM', 'D', 'XX', 'MMM', 'GG', 'MM', 'Z']


In [104]:
def sort_blocks(blocks):
    res = ""
    values = [len(item) for item in blocks]
    sorted_values = sorted(set(values), reverse=True)
    for v in sorted_values:
        sorted_count_blocks = sorted([item for item in blocks if len(item) == v])
        res +=''.join(sorted_count_blocks)
    return res

In [105]:
sorted_blocks = sort_blocks(blocks)
print(sorted_blocks)

MMMGGMMMMXXABDMZ
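The same ordering can also be expressed as a single sort with a compound key. This is a sketch of an alternative, not the notebook's method; the name `sort_blocks_by_key` is hypothetical.

```python
def sort_blocks_by_key(blocks):
    """Order blocks by decreasing length, then alphabetically, in one sort (hypothetical variant)."""
    # negating the length makes longer blocks sort first;
    # the block itself breaks ties alphabetically
    return ''.join(sorted(blocks, key=lambda block: (-len(block), block)))

blocks = ['M', 'B', 'A', 'MM', 'D', 'XX', 'MMM', 'GG', 'MM', 'Z']
print(sort_blocks_by_key(blocks))  # MMMGGMMMMXXABDMZ
```

A tuple key avoids the separate pass per distinct length, at the cost of being slightly less explicit about the two sorting criteria.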


In [106]:
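def truncate_string(text, m):
    """ cut every run of the same letter in text down to at most m letters
    """
    count = 1
    res = text[0]
    for i in range(1, len(text)):
        if text[i-1] == text[i]:
            count += 1
        else:
            count = 1
        # keep the letter only while its run is within the limit
        if count <= m:
            res += text[i]
    return res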

In [107]:
output = truncate_string(sorted_blocks, 3)
print(output)

MMMGGMMMXXABDMZ
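Truncation can also reuse `itertools.groupby`: each run is sliced to its first m letters. The sketch below is an alternative under that assumption, and `truncate_runs` is a made-up name.

```python
from itertools import groupby

def truncate_runs(text, m):
    """Keep at most m letters of every run of identical letters (hypothetical variant)."""
    return ''.join(''.join(run)[:m] for _, run in groupby(text))

print(truncate_runs("MMMGGMMMMXXABDMZ", 3))  # MMMGGMMMXXABDMZ
```

Unlike an index-based loop that starts from `text[0]`, this form also accepts an empty string.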


In [122]:
def lexstring(input):
    """
    A Lex String function for the ACSL 2020-2021 Contest #2 Programming Problem
    """

    text, m = input.split()

    # Step 1 - map blocks
    blocks = map_blocks(text)

    # Step 2 - sort blocks
    res = sort_blocks(blocks)

    # Step 3 - truncate any run of a letter that is longer than m
    return truncate_string(res, int(m))

In [123]:
input = "MBAMMDXXMMMGGMMZ 3"
output = lexstring(input)
print(output)

MMMGGMMMXXABDMZ


In [125]:
input = "MHHHHJLDDHHDDD 3"
output = lexstring(input)
print(output)

HHHDDDHHJLM


In [126]:
inputs =["MBAMMDXXMMMGGMMZ 3", 
         "MHHHHJLDDHHDDD 3", 
         "THETENNESSEEVOLUNTEERS 2",
         "MISSISSIPPI 3",
         "BOOOKEEEPEEERR 4"]
answers = ["MMMGGMMMXXABDMZ",
           "HHHDDDHHJLM",
           "EENNSSEEHLNORSTTUV",
           "PPSSSIIIM",
           "EEEEOOORRBKP"]
for index, input in enumerate(inputs):
    output = lexstring(input)
    if output != answers[index]:
        print(f"Error: wrong answer on input {index+1}")
    else:
        print(f"{index+1}. {output}")
         

1. MMMGGMMMXXABDMZ
2. HHHDDDHHJLM
3. EENNSSEEHLNORSTTUV
4. PPSSSIIIM
5. EEEEOOORRBKP
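For a self-contained check, the three steps can be folded into one short function and asserted against the five expected answers listed above. This is a compact sketch built on `itertools.groupby` and a tuple sort key; the name `lexstring_compact` is hypothetical.

```python
from itertools import groupby

def lexstring_compact(line):
    """Map, sort, and truncate in one function (hypothetical compact variant)."""
    text, m = line.split()
    m = int(m)
    # Step 1: split into runs of identical letters
    blocks = [''.join(run) for _, run in groupby(text)]
    # Step 2: longer blocks first, alphabetical within the same length
    ordered = ''.join(sorted(blocks, key=lambda b: (-len(b), b)))
    # Step 3: cut every run in the joined string down to m letters
    return ''.join(''.join(run)[:m] for _, run in groupby(ordered))

cases = {
    "MBAMMDXXMMMGGMMZ 3": "MMMGGMMMXXABDMZ",
    "MHHHHJLDDHHDDD 3": "HHHDDDHHJLM",
    "THETENNESSEEVOLUNTEERS 2": "EENNSSEEHLNORSTTUV",
    "MISSISSIPPI 3": "PPSSSIIIM",
    "BOOOKEEEPEEERR 4": "EEEEOOORRBKP",
}
for line, expected in cases.items():
    assert lexstring_compact(line) == expected, line
print("all 5 cases pass")
```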
