# Text Sequence Manipulation – Python String Operations

## Introduction
Complete all of the exercises in this document using only Python str methods.  Do not use regular expressions.

## Text Manipulation – Low Level of Difficulty
For each exercise below, write a function that performs the described task.
All of your functions can appear in a single code cell. You may name your functions as you see fit; however,
each function must follow the given function annotation (parameter and return types).  Replace the FxNm
placeholder with a function name of your choosing.

1. Function Annotation: FxNm( input: str ) -> dict

Write a Python program to count the number of characters (character frequency) in a string.
```
Sample String: 'google.com'
Expected Result : {'o': 3, 'g': 2, '.': 1, 'e': 1, 'l': 1, 'm': 1, 'c': 1}
```

In [None]:
# Char Freq
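def char_frequency(str1: str) -> dict:
    # 'counts' avoids shadowing the built-in dict type
    counts = {}
    for n in str1:

        if n not in counts:
            counts[n] = 1
        else:
            # increment the stored count for this character
            counts[n] += 1

        # The above 'if' could/should be replaced with
        #counts[n] = counts.get(n,0) + 1

    return counts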

print('***Count Character Frequency:')
print(char_frequency('google.com'))
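
The comment in the cell above points at `dict.get` as a shorter way to keep the tally. A minimal sketch of that variant follows; the function name `char_frequency_get` is made up for illustration and is not part of the exercise.

```python
def char_frequency_get(text: str) -> dict:
    # dict.get(key, 0) returns 0 for a character that has not been seen yet,
    # so the first occurrence is counted without a separate branch
    counts = {}
    for ch in text:
        counts[ch] = counts.get(ch, 0) + 1
    return counts

print(char_frequency_get('google.com'))
```

The resulting dictionary holds the same counts as the expected result; only the key order differs, because keys appear in first-seen order.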

2. Function Annotation: FxNm( input: str ) -> str

Write a Python program to return a string from a given string where all occurrences of its first char have been
changed to '$', except the first char itself.
```
Sample String : 'restart'
Expected Result : 'resta$t'
```

In [7]:
# First char to $
def change_char(str1: str)->str:
    # Using slice.
    # remember that strings are immutable so, .replace() returns
    # a new string where all occurrences of the character at [0] are replaced with $
    return str1[0] + str1[1:].replace(str1[0], '$')

def change_char_more_verbose(str1: str)->str:
  first_char = str1[0]
  str2 = str1[1:].replace(first_char, '$')
  return first_char + str2


print('***Change matching 1st chars to $:')
print(change_char('restart'))
print(change_char_more_verbose('restart'))

***Change matching 1st chars to $:
resta$t
resta$t


3. Function Annotation: FxNm( input: [str] ) -> int

Write a Python function that takes a list of words and returns the length of the longest word.

In [8]:
# longest word in list
def find_longest_word(words_list: [str])->int:
    # start with the assumption that the word at index 0 is the longest
    long_str = words_list[0]
    for word in words_list[1:]:
        # if this word is longer than the current longest word,
        # swap the words
        if len(word) > len(long_str):
            long_str = word
    return len(long_str)

def find_longest_word_2(words_list: [str])->int:
    # read the docs on the key argument to max
    # ie, help(max)

    long_str = max(words_list, key=len)
    return len(long_str)

print('***Find longest word:')
print(find_longest_word(["PHP", "Exercises", "Backend"]))
print(find_longest_word_2(["PHP", "Exercises", "Backend"]))

***Find longest word:
9
9
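
Both versions above assume the list has at least one word. As a hedged sketch, the lengths can also be fed to `max` directly, with `default=0` chosen here (an assumption, not part of the exercise) as the answer for an empty list. The name `longest_word_length` is illustrative.

```python
def longest_word_length(words: list) -> int:
    # take the max over the lengths themselves;
    # default=0 is returned when the list is empty
    return max((len(w) for w in words), default=0)

print(longest_word_length(["PHP", "Exercises", "Backend"]))
print(longest_word_length([]))
```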


4. Function Annotation: FxNm( input: str ) -> str

Write a Python program to remove the characters appearing at odd index values of a given string
and return the resulting string.
```
Sample String:  restart
Expected Result : rsat
```

In [None]:
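def odd_values_string(str1: str) -> str:
    # 'str1' avoids shadowing the built-in str type
    result = ""
    for i, c in enumerate(str1):
        # keep only the characters at even index values
        if i % 2 == 0:
            result += c
    return result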

print('***Remove chars at odd index values:')
print(odd_values_string('restart'))
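
The same result can be had with an extended slice, since a step of 2 starting at index 0 visits exactly the even index values. A small sketch, with `even_index_chars` as a made-up name:

```python
def even_index_chars(text: str) -> str:
    # text[::2] starts at index 0 and steps by 2: indexes 0, 2, 4, ...
    return text[::2]

print(even_index_chars('restart'))  # rsat
```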

5. Function Annotation: FxNm( input: str ) -> str

Write a Python function that reverses a string if its length is a multiple of 4 and returns the resulting string.
```
Sample String : 'rest'
Expected Result : 'tser'
```

In [3]:
# reverse multiples of 4
def reverse_string(str1:str)->str:
    if len(str1) % 4 == 0:
       # reversed() yields an iterator, so join it back into a str
       return "".join(reversed(str1))
    return str1

print('***Reverse modulo 4 strings')
print(reverse_string('rest'))

### ALTERNATIVE
# reverse multiples of 4
def reverse_string_too(str1:str)->str:
    if len(str1) % 4 == 0:
       return str1[::-1]
    return str1

print('***Reverse modulo 4 strings')
print(reverse_string_too('rest'))

***Reverse modulo 4 strings
tser
***Reverse modulo 4 strings
tser


6. Function Annotation: FxNm( input: str ) -> str

Write a Python function to convert a given string to all uppercase if it contains at least 2 uppercase characters
in the first 4 characters.  Return the result.
```
Sample String : 'reSTart'
Expected Result : 'RESTART'
```

In [None]:
# To uppercase
def to_uppercase(str1:str)->str:
    num_upper = 0
    for letter in str1[:4]:
        if letter.isupper():
            num_upper += 1

    if num_upper >= 2:
        return str1.upper()
    return str1

# To uppercase
def to_uppercase_2(str1:str)->str:
    # The following []'d code is a list comprehension.
    # See - https://docs.python.org/3/tutorial/datastructures.html#list-comprehensions
    #
    ul = [c for c in str1[:4] if c.isupper()]
    return str1 if len(ul) < 2 else str1.upper()

print('***Convert to UC if 2 UC in 1st four chars')
print(to_uppercase('reSTart'))
print(to_uppercase_2('reSTart'))
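
Because `True` counts as 1 when summed, the uppercase tally can also be written as a single `sum` over a generator. This is only a sketch of that idea; `to_uppercase_sum` is an illustrative name.

```python
def to_uppercase_sum(text: str) -> str:
    # each isupper() result is a bool; sum() adds True as 1 and False as 0
    upper_count = sum(ch.isupper() for ch in text[:4])
    return text.upper() if upper_count >= 2 else text

print(to_uppercase_sum('reSTart'))
print(to_uppercase_sum('restart'))
```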

7. Function Annotation: FxNm( input: str, input: int) -> str

Write a Python program to create a Caesar encryption.
Note : In cryptography, a Caesar cipher, also known as Caesar's cipher, the shift cipher,
Caesar's code or Caesar shift, is one of the simplest and most widely known encryption techniques.
It is a type of substitution cipher in which each letter in the plaintext is replaced by a letter
some fixed number of positions down the alphabet. For example, with a shift of 2, D would be
replaced by F, E would become G, and so on. The method is named after Julius Caesar, who used
it in his private correspondence.
```
Sample Input : ‘abc’, 2
Expected Result : cde
```

In [13]:
# the string module includes some 'letter' oriented
# constants.  string.ascii_lowercase is a str containing the
# 26 lowercase ascii letters.  Either of the following lines works
# to create that string.
# Use the 'constant value' naming convention of UPPERCASE variable name
#
from string import ascii_lowercase
A_STR = ascii_lowercase
# OR
# A_STR = 'abcdefghijklmnopqrstuvwxyz'

def caesar_encrypt(real_text:str, step:int)->str:
    answer = ''

    for c in real_text:
        # leave non-alpha characters alone
        if c.isalpha():
            # c may or may not already be lc.  But A_STR is all lc.
            index = A_STR.index(c.lower())
            # modulo (%) 26 allows the index values to wrap around from say 'z' to 'b'
            # if the step value were 2.  Think about it...
            crypt_idx = (index + step) % 26

            # uc letters should have their cipher letter equiv returned as uc.
            # Ternary operator interpretation:
            # set x to y if x is lowercase otherwise set x to uppercase y

            # The following line is Python's version of a ternary operator. Equiv to:
            # if c.islower():
            #     crypt_c = A_STR[crypt_idx]
            # else:
            #     crypt_c = A_STR[crypt_idx].upper()
            crypt_c = A_STR[crypt_idx] if c.islower() else A_STR[crypt_idx].upper()
        else:
            # characters that are not letters (not isalpha()) are not shifted. They
            # are used as is
            crypt_c = c

        answer += crypt_c

    return answer

print('***Caesar cipher')
s = caesar_encrypt('abc', 2)
print(f'\tabc=>{s}')


***Caesar cipher
	abc=>cde
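
Since a Caesar cipher is a fixed letter-for-letter substitution, it can also be expressed as a translation table. The sketch below assumes plain ASCII letters only; `caesar_translate` is a made-up name, and the negative step used for decryption is an illustration rather than part of the exercise.

```python
from string import ascii_lowercase, ascii_uppercase

def caesar_translate(text: str, step: int) -> str:
    # normalize the step so negative or large shifts wrap around correctly
    k = step % 26
    # rotate each alphabet left by k positions
    shifted_lower = ascii_lowercase[k:] + ascii_lowercase[:k]
    shifted_upper = ascii_uppercase[k:] + ascii_uppercase[:k]
    table = str.maketrans(ascii_lowercase + ascii_uppercase,
                          shifted_lower + shifted_upper)
    # characters missing from the table (digits, spaces, ...) pass through unchanged
    return text.translate(table)

secret = caesar_translate('Hello, World!', 2)
print(secret)
print(caesar_translate(secret, -2))
```

Shifting by the negated step undoes the encryption, which makes for a quick round-trip check.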


8. Function Annotation: FxNm( input: str) -> str

Write a Python program to remove existing indentation from a given string. You may assume that only
space (' ') and tab ('\t') characters are used for indentation.  Return the resulting string.
```
Sample String : ‘   abc’
Expected Result : ‘abc’
```

In [2]:
# remove indentation

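def remove_indent(text: str) -> str:
    # strip only the indentation characters (space and tab) from the
    # start of each line, then rejoin the lines with the original newlines
    lines = [line.lstrip(' \t') for line in text.split('\n')]
    return '\n'.join(lines)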

print('***text_without_Indentation')
sample_text = '''
    Python is a widely used high-level, general-purpose, interpreted,
    dynamic programming language. Its design philosophy emphasizes
    code readability, and its syntax allows programmers to express
    concepts in fewer lines of code than possible in languages such
    as C++ or Java.
    '''
s=remove_indent(sample_text)
print(f'Sample:\n{sample_text}')
print(f'Soln:\n{s}')

***text_without_Indentation
Sample:

    Python is a widely used high-level, general-purpose, interpreted,
    dynamic programming language. Its design philosophy emphasizes
    code readability, and its syntax allows programmers to express
    concepts in fewer lines of code than possible in languages such
    as C++ or Java.
    
Soln:

Python is a widely used high-level, general-purpose, interpreted,
dynamic programming language. Its design philosophy emphasizes
code readability, and its syntax allows programmers to express
concepts in fewer lines of code than possible in languages such
as C++ or Java.




9. Function Annotation: FxNm( input: str, input: str) -> str

Write a Python program to strip the set of provided characters from a string.  Return the resulting string.
```
Sample Input : ‘abc’, ‘ac’
Expected Result : ‘b’
```

In [15]:
def strip_chars(str1:str, chars:str) -> str:
    return ''.join([c for c in str1 if c not in chars])

def strip_chars_verbose(str1:str, chars:str) -> str:
    answer = ''
    for c in str1:
        if c not in chars:
            answer += c

    return answer

def strip_chars_worse(str1:str, chars:str) -> str:
    answer = str1
    for c in chars:
        answer = answer.replace(c,'')
    return answer

print("***Strip characters from string")
s = "The quick brown fox jumps over the lazy dog."
print(f'Strip vowels from => {s}')
print(strip_chars("The quick brown fox jumps over the lazy dog.", "aeiou"))
print(strip_chars_verbose("The quick brown fox jumps over the lazy dog.", "aeiou"))
print(strip_chars_worse("The quick brown fox jumps over the lazy dog.", "aeiou"))

***Strip characters from string
Strip vowels from => The quick brown fox jumps over the lazy dog.
Th qck brwn fx jmps vr th lzy dg.
Th qck brwn fx jmps vr th lzy dg.
Th qck brwn fx jmps vr th lzy dg.
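
Another str-only option is `str.translate`: the third argument of `str.maketrans` lists characters to delete. A minimal sketch, with `strip_chars_translate` as an illustrative name:

```python
def strip_chars_translate(text: str, chars: str) -> str:
    # with two empty mapping strings, the third argument names
    # the characters that translate() should remove
    table = str.maketrans('', '', chars)
    return text.translate(table)

print(strip_chars_translate('abc', 'ac'))
print(strip_chars_translate('The quick brown fox jumps over the lazy dog.', 'aeiou'))
```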


10. Function Annotation: FxNm( input: str ) -> dict

Write a Python program to count repeated characters in a string. Return a dictionary where the
keys are the repeated characters and the values are the count.
```
Sample string: 'thequickbrownfoxjumpsoverthelazydog'
Expected output: {'o': 4, 'e': 3, 'u': 2, 'h': 2, 'r': 2, 't': 2}
```

In [16]:
def count_repeated(in_str:str) -> dict:
    ans_dict = {}
    for c in in_str:
        # get() method is similar to using [] to access values for key with
        # one major difference. get() allows for a 2nd argument which is returned
        # if the key does not exist in the dictionary.
        ans_dict[c] = ans_dict.get(c,0) + 1

    # The following {}'d code is a dictionary comprehension.
    # See - https://www.datacamp.com/community/tutorials/python-dictionary-comprehension

    return {c:v for c,v in ans_dict.items() if v > 1}

# count repeated chars
def count_repeated_2(in_str:str) -> dict:
    # Counter class is provided by the collections module
    # to support convenient and rapid tallies.
    from collections import Counter
    return {c:v for c,v in Counter(in_str).items() if v > 1}

print('***Count Repeated Characters:')
d = count_repeated('thequickbrownfoxjumpsoverthelazydog')
print(f'\t{d}')
d = count_repeated_2('thequickbrownfoxjumpsoverthelazydog')
print(f'\t{d}')

***Count Repeated Characters:
	{'t': 2, 'h': 2, 'e': 3, 'u': 2, 'r': 2, 'o': 4}
	{'t': 2, 'h': 2, 'e': 3, 'u': 2, 'r': 2, 'o': 4}


11. Function Annotation: FxNm( input: str ) -> bool

Write a Python program to check if a string contains all letters of the alphabet.
```
Sample string: 'thequickbrownfoxjumpsoverthelazydog'
Expected output: True
```

In [11]:
from string import ascii_lowercase

def contains_alphabet(istr: str) -> bool:
    # compare in lowercase so that 'A' counts as 'a'
    lc_str = istr.lower()
    for c in ascii_lowercase:
        if c not in lc_str:
            return False
    return True

def contains_alphabet_2(istr: str) -> bool:
    # use set operations.  Set values will be unique.
    # create set of alphabet
    aset = set(ascii_lowercase)
    iset = set(istr.lower())
    return iset >= aset

print('***Contains all alphabet characters:')
istr = 'The quick brown fox jumps over the lazy dog'
tf = contains_alphabet(istr)
print(f'Does {istr} contain all alphabet characters? Answer - {tf}.')
tf = contains_alphabet_2(istr)
print(f'Does {istr} contain all alphabet characters? Answer - {tf}.')

***Contains all alphabet characters:
Does The quick brown fox jumps over the lazy dog contain all alphabet characters? Answer - True.
Does The quick brown fox jumps over the lazy dog contain all alphabet characters? Answer - True.


12. Function Annotation: FxNm( input: str ) -> str

Write a Python program to swap comma and dot in a string. Return the resulting string.

**Challenge:** try using the translate() and maketrans() methods of str.
```
Sample string: "32.054,23"
Expected Output: "32,054.23"
```

In [None]:
def swap_chars(text:str, c1:str, c2:str) -> str:
    # make use of the translation capabilities of strings
    translation_tbl = text.maketrans(c1+c2, c2+c1)
    answer = text.translate(translation_tbl)
    return answer

print('***Swap commas and dots')
i = '32.054,23'
s = swap_chars(i, '.', ',')
print(f'Swapped .s and ,s in {i} => {s}')
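
A translation table is a good fit here because the swap has to happen in a single pass. The sketch below contrasts it with two chained `replace` calls, where the second call also rewrites the characters the first one just produced. The variable names are illustrative.

```python
text = '32.054,23'

# chained replace: every '.' becomes ',', then every ',' (old and new) becomes '.'
naive = text.replace('.', ',').replace(',', '.')
print(naive)    # 32.054.23

# maketrans builds a dict keyed by code point: ord('.') == 46, ord(',') == 44
table = str.maketrans('.,', ',.')
print(table)    # {46: 44, 44: 46}

# translate looks up each original character once, so nothing is swapped twice
swapped = text.translate(table)
print(swapped)  # 32,054.23
```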

13. Function Annotation: FxNm( input: str ) -> str

Write a Python program to remove duplicate characters of a given string. Your solution should leave the
1st occurrence and remove all subsequent occurrences.  Other than the removals, the order of the
characters in the string should not be affected. Return the resulting string.
```
Sample string: "32.054,23"
Expected Output: "32.054,"
```

In [None]:
# remove dup chars
#
def remove_duplicate(str1):
    # Relies on dictionary having unique keys
    # multiple adds of same value to dict has no impact
    # as of Py 3.7 dict key order is preserved - oddly, this is not true of sets
    #
    key_dict = {k:0 for k in str1}
    return "".join(key_dict)

print('***Remove duplicate characters')
i = '32.054,23'
s = remove_duplicate(i)
print(f'Removed dup chars from {i} => {s}')
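
The same ordered-unique-keys idea can be written with `dict.fromkeys`, which builds the dictionary straight from the characters. A short sketch; `remove_duplicate_fromkeys` is a made-up name.

```python
def remove_duplicate_fromkeys(text: str) -> str:
    # dict.fromkeys keeps one key per distinct character, in first-seen order;
    # joining a dict iterates over its keys
    return ''.join(dict.fromkeys(text))

print(remove_duplicate_fromkeys('32.054,23'))  # 32.054,
```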

14. Function Annotation: FxNm( input: str, input: str) -> str

Write a Python program to create a string from two given strings by concatenating the characters that
are not contained by both strings.  The characters from the 1st string should appear before the characters
from the 2nd string. Return the resulting string.
```
Sample input: ‘0abcxyz’, ‘abcxyz1’
Expected Output: ‘01’
```

In [10]:
def uncommon_chars_concat(s1: str, s2: str) -> str:
    # Use the builtin python set datatype for fast membership tests.
    # See - https://docs.python.org/3/library/stdtypes.html?highlight=set#set-types-set-frozenset
    # Sets are unordered, so the output order is taken from the strings themselves.
    #
    set1 = set(s1)
    set2 = set(s2)
    # what's in s1 that's not in s2, in original order
    only_s1 = [c for c in s1 if c not in set2]
    # what's in s2 that's not in s1, in original order
    only_s2 = [c for c in s2 if c not in set1]

    # the * is the expansion operator.  Essentially
    # it extracts all of the values from the list
    # FYI - * also works with sets. ** for dictionaries
    # See https://treyhunner.com/2018/10/asterisks-in-python-what-they-are-and-how-to-use-them/
    #
    # Expand only_s1 and only_s2 into a tuple (light wt list),
    # with the 1st string's characters first
    return ''.join((*only_s1, *only_s2))

print('***Concatenating characters not in common')
s1 = '0abcxyz'
s2 = 'abcxyz1'
s3 = uncommon_chars_concat(s1, s2)
print(f'Original Strings: {s1} and {s2}')
print(f'Results: {s3}')

***Concatenating characters not in common
Original Strings: 0abcxyz and abcxyz1
Results: 01


## Text Manipulation – Moderate Level of Difficulty
15. Function Annotation: FxNm( input: [str], input: str) -> list

A researcher has gathered thousands of news articles. But she wants to focus her attention on articles including
a specific word.  Your function must meet the following criteria:

- Do not include documents where the keyword string shows up only as a part of a larger word.
  For example, if she were looking for the keyword “closed”, you would not include the string “enclosed.”
- She does not want you to distinguish upper case from lower case letters. So the phrase
  “Closed the case.” would be included when the keyword is “closed”.
- Periods or commas should not affect what is matched. “It is closed.” would be included when the
  keyword is “closed”. You may assume that source document punctuation is limited to periods and commas only.
- Return a list of the index values into the original list for all documents containing the keyword.
```
Sample input: [ "The Learn Python Challenge Casino.",
                           "They bought a car while at the casino",
                           "Casinoville" ], ‘casino’
Expected Output: [0, 1]
```

In [23]:
def word_search(doc_list: [str], keyword: str) -> list:
    '''
    Takes a list of documents (each document is a string) and a keyword.
    Returns list of the index values into the original list for all documents
    containing the keyword.
    '''

    result_list = []
    lckw = keyword.lower()

    for idx, doc in enumerate(doc_list):
        # do a bunch of steps at once - lowercase, replace , and . and split words to a list
        lst = doc.lower().replace(',',' ').replace('.',' ').split()
        if lckw in lst:
            result_list.append(idx)


    return result_list

# This application would be a good opportunity for a regular expression.
import re
def word_search_re(doc_list: [str], keyword: str) -> list:
    result_list = []

    for idx, doc in enumerate(doc_list):
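        # use the keyword to construct an re; re.escape() keeps any special
        # characters in the keyword from being read as regex syntax
        # Something new: combine f-string with r-string (ie, fr'{a}sdfasdf' or rf'{a}sdfasdf' - same diff)
        m = re.search(fr'\b{re.escape(keyword)}\b', doc, flags=re.IGNORECASE)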
        if m:
            result_list.append(idx)

    return result_list


doc_list = ["The Learn Python Challenge Casino.", "They bought a car at the casino", "Casinoville"]
l = word_search(doc_list, 'casino')
print(l)

l = word_search_re(doc_list, 'casino')
print(l)

[0, 1]
[0, 1]


16. Function Annotation: FxNm( input: str ) -> str

This function receives a (word/string) as a parameter. The objective of the function is to convert
the incoming string to its "Pig Latin" equivalent and return the Pig Latin string.

For our purposes, we will use the following as the rules for translation of a word into "Pig Latin":

1. A word is a consecutive sequence of letters (a-z, A-Z) or apostrophes. You may assume that the input to
the function will only be a single "word". Examples: Zebra , doesn't , apple
2. If a word starts with a vowel, the Pig Latin version is the original word with "way" added to the end
3. If a word starts with a consonant, or a series of consecutive consonants, the Pig Latin
version transfers ALL consonants up to the first vowel to the end of the word, and adds "ay" to the end.
4. The letter 'y' should be treated as a consonant if it is the first letter of a word,
but treated as a vowel otherwise.
5. If the original word is capitalized, the new Pig Latin version of the word should be capitalized
in the first letter.  If the original capital letter was a consonant, and thus moved, it should not
be capitalized once in its new location.

### Steps
1. Author your ‘ToPigLatin’ function
2. Convert the following test data into a Python dictionary.

Test Word (keys) | Pig Latin (values)
-----------------|--------------------
football | ootballfay
Pittsburgh | Ittsburghpay
Apple | Appleway
oink | oinkway
ontology | ontologyway
yellow | ellowyay
yttrium | iumyttray

3. Iterate over the dictionary supplying the dictionary keys as arguments to your ToPigLatin function.
4. Confirm your returned results against the dictionary values.
5. Upon confirmation, display the test word and its Pig Latin translation via print


In [11]:
# Pig Latin Converter Code
VOWELS = 'aeiou'

# TEST DATA
test_dict = dict(
    football = 'ootballfay',
    Pittsburgh = 'Ittsburghpay',
    Apple = 'Appleway',
    oink = 'oinkway',
    ontology = 'ontologyway',
    yellow = 'ellowyay',
    yttrium = 'iumyttray')

# test boundary condition.  word that starts with
# y but is followed by a consonant
# yttrium - a silvery, malleable metallic element that is found in the same ores as other rare-earth elements

def to_pig_latin(eng_word):

    if len(eng_word) == 0:
        raise ValueError("Input must contain characters.")

    # remember if I need to return a capitalized word
    to_upper = eng_word[0].isupper()

    # now, normalize the word to LC
    eng_word = eng_word.lower()

    # low hanging fruit
    if eng_word[0] in VOWELS:
        eng_word += 'way'
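        # capitalize() touches only the first letter; title() would also
        # upper-case the letter after an apostrophe (rule 1 allows apostrophes)
        return eng_word.capitalize() if to_upper else eng_word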

    # ok, word starts with consonant

    cons_str = ""
    for i,c in enumerate(eng_word):
        if i == 0 and c == 'y':
            cons_str += c
        elif c not in (VOWELS + 'y'):
            cons_str += c
        else:
            break
    else:
        raise ValueError("English words must contain a vowel.")

    new_word = eng_word[i:] + cons_str + 'ay'
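    # capitalize() touches only the first letter; title() would also
    # upper-case the letter after an apostrophe (rule 1 allows apostrophes)
    return new_word.capitalize() if to_upper else new_word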

# run the test data through the PL translation
for k,v in test_dict.items():
    pig = to_pig_latin(k)
    if pig == v:
        print(f'{k} => {pig}')
    else:
        print(f'Failure for English word: {k}. {pig} should be {v}.')

football => ootballfay
Pittsburgh => Ittsburghpay
Apple => Appleway
oink => oinkway
ontology => ontologyway
yellow => ellowyay
yttrium => iumyttray
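
Rule 1 allows apostrophes inside a word, and that matters for rule 5 when restoring the capital letter. `str.title()` treats an apostrophe as a word boundary, while `str.capitalize()` changes only the first character. The sample value below is a hypothetical Pig Latin form of "Don't", used purely for illustration.

```python
word = "on'tday"  # hypothetical lowercase Pig Latin form of "Don't"

titled = word.title()
capitalized = word.capitalize()

print(titled)       # On'Tday
print(capitalized)  # On'tday
```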
