# CSE 101: Computer Science Principles
#### Stony Brook University
#### Kevin McDonnell (ktm@cs.stonybrook.edu)
## Module 8: Iteration and Strings

**Iteration** in programming refers to a sequence of steps that is repeated one or more times.

A **loop** statement in Python implements the concept of iteration.

#### for-loops

A **for-loop** allows us to repeat a sequence of steps a fixed number of times.

Sometimes we know the number of **iterations** (repetitions) needed; other times we need to compute it.

There are two forms of the for-loop:
* one that uses a counter to keep track of how many iterations we perform
* one that simply processes every item in a collection of values without counting

#### Counter-driven for-loops

Suppose we need to perform some process 10 times. A for-loop can track the number of iterations using a **counter** (integer variable) and the `range` function:

```python
# Computes the sum 0^3 + 1^3 + ... + 9^3
total = 0
for i in range(10):
    total += i**3
total
```

```
2025
```

The value given to the `range` function is not included in the count. That is, `i` will count from 0 to 9 in this example.

We can specify a different starting value by passing it before the upper bound. The upper bound is still not included in the counting.

```python
# Computes the sum 4^3 + 5^3 + ... + 9^3
total = 0
for i in range(4, 10):
    total += i**3
total
```

```
1989
```

We can even give a step size by providing a third number:

```python
# Computes the sum 1^3 + 4^3 + 7^3
total = 0
for i in range(1, 10, 3):
    total += i**3
total
```

```
408
```
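To see exactly which values each form of `range` produces, we can convert a range to a list with `list(...)`. Lists are covered in a future module; here we use one only for display. The last line adds a case not shown above: a negative step size, which counts downward.

```python
# list(...) turns a range into a list so we can see every value it produces
print(list(range(10)))         # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
print(list(range(4, 10)))      # [4, 5, 6, 7, 8, 9]
print(list(range(1, 10, 3)))   # [1, 4, 7]
print(list(range(9, -1, -3)))  # [9, 6, 3, 0]  (a negative step counts down)
```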

A common usage of a counter-driven for-loop is to use the counter as an *index* into a string or list, the latter of which we will study in a future module.
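A minimal sketch of this idea, using a made-up string `word` for illustration. The counter `i` runs from 0 up to, but not including, `len(word)`, which is exactly the set of valid indexes into the string.

```python
word = 'Seawolf'  # hypothetical example string

# i takes the values 0, 1, ..., len(word) - 1: the valid indexes into word
for i in range(len(word)):
    print(i, word[i])  # the index and the character stored at that index
```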

#### Example: Password Strength Checker

Suppose a sign-up form on a website requires you to provide a password that is at least 10 characters in length and which contains at least 2 uppercase letters, 2 lowercase letters and 2 digit characters.

Three methods will help us write code to check if a candidate password satisfies these requirements:
* [`isupper`](https://docs.python.org/3/library/stdtypes.html#str.isupper): returns `True` if the string contains at least one cased letter and all of its cased letters are uppercase; returns `False` otherwise. For a single character, this simply tests whether it is an uppercase letter.
* [`islower`](https://docs.python.org/3/library/stdtypes.html#str.islower): returns `True` if the string contains at least one cased letter and all of its cased letters are lowercase; returns `False` otherwise
* [`isdigit`](https://docs.python.org/3/library/stdtypes.html#str.isdigit): returns `True` if the string is non-empty and all of its characters are digits; returns `False` otherwise
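A quick sketch of how these methods behave. The password checker below applies them to one character at a time; the last two lines show that on a longer string, `isupper` looks only at the cased letters.

```python
# Called on single characters, as in the password checker
print('Q'.isupper(), 'q'.isupper())  # True False
print('q'.islower(), '7'.islower())  # True False
print('7'.isdigit(), 'x'.isdigit())  # True False

# On longer strings, only the cased letters are examined
print('A1'.isupper())  # True: the only letter, 'A', is uppercase
print('12'.isupper())  # False: there are no letters at all
```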

The function below returns `True` if the password is strong, and `False`, otherwise.

```python
def password_strong(password):
    # Below is a "multiple target assignment"
    num_uppercase = num_lowercase = num_digits = 0
    if len(password) < 10:
        return False
    for i in range(len(password)):  # i serves as an index into the password
        if password[i].isupper():
            num_uppercase += 1
        elif password[i].islower():
            num_lowercase += 1
        elif password[i].isdigit():
            num_digits += 1
    if num_uppercase < 2:
        return False
    elif num_lowercase < 2:
        return False
    elif num_digits < 2:
        return False
    else:
        return True

print(password_strong('WolFIE123'))
print(password_strong('WolFIE1234'))
print(password_strong('Wolfie1234'))
print(password_strong('WOLFIe1234'))
```

```
False
True
False
False
```


#### for-each Loops

The other kind of for-loop is known informally as a *for-each* loop. The idea is that "for each" character in the string, we want to perform some operation. We create a **loop variable** that will, in turn, refer to each character in the string.
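A short sketch comparing the two styles on a hypothetical string `word`. Both loops visit the same characters in the same order, but only the counter-driven loop has an index `i` available.

```python
word = 'Wolfie'  # hypothetical example string

# for-each: ch refers to 'W', then 'o', then 'l', and so on
from_for_each = ''
for ch in word:
    from_for_each += ch + ' '

# counter-driven: i is 0, 1, 2, ...; word[i] is the character at index i
from_index = ''
for i in range(len(word)):
    from_index += word[i] + ' '

print(from_for_each)                # W o l f i e
print(from_for_each == from_index)  # True: same characters, same order
```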

Let's rewrite our password processor using this style.

#### Example: Password Strength Checker Revisited

```python
def password_strong(password):
    num_uppercase = num_lowercase = num_digits = 0
    if len(password) < 10:
        return False
    for char in password:  # char is the loop variable
        if char.isupper():
            num_uppercase += 1
        elif char.islower():
            num_lowercase += 1
        elif char.isdigit():
            num_digits += 1
    if num_uppercase < 2:
        return False
    elif num_lowercase < 2:
        return False
    elif num_digits < 2:
        return False
    else:
        return True

print(password_strong('WolFIE123'))
print(password_strong('WolFIE1234'))
print(password_strong('Wolfie1234'))
print(password_strong('WOLFIe1234'))
```

```
False
True
False
False
```


#### When Should I Use Which Form of the for-loop?

If you need to process every character in the string, you can usually use the for-each style. A for-each loop visits (processes) every character in the string, in order, but does not give you the character's index.

If you need the index of a character, or you want to process only part of the string, use the index-based approach with `range`.
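Two small hypothetical helpers, `count_doubled_letters` and `first_half` (names chosen for this sketch), illustrate situations where the index-based form is the natural fit: comparing a character with its neighbor, and processing only part of a string.

```python
def count_doubled_letters(text):
    # Count positions where a character equals the one right after it.
    # Stop one short of the end so that text[i + 1] is always a valid index.
    count = 0
    for i in range(len(text) - 1):
        if text[i] == text[i + 1]:
            count += 1
    return count


def first_half(text):
    # Process only part of the string: indexes 0 up to (not including) len(text) // 2
    half = ''
    for i in range(len(text) // 2):
        half += text[i]
    return half


print(count_doubled_letters('bookkeeper'))  # 3  ('oo', 'kk', 'ee')
print(first_half('Seawolf'))                # Sea
```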

#### Example: Clean Up a Phone Number

There are many ways to write phone numbers: with or without parentheses, dashes, periods, plus signs (for country codes), and so on.

Suppose we expect someone to type in a US/Canada phone number starting with `1` for the country code. We will extract the digits and concatenate them into a new string.

```python
phone_number = '+1 (631) 632-2456'
cleaned_number = ''  # empty string
for c in phone_number:
    if c.isdigit():
        cleaned_number += c  # we can use += to do concatenation!

if len(cleaned_number) != 11:
    print('Error with input.')
else:
    print(f'Cleaned phone number: {cleaned_number}')
```

```
Cleaned phone number: 16316322456
```


#### The `ord` and `chr` Functions

Every character is represented by an integer code called its Unicode value (or code point). To get the code for a particular character, we can call the `ord` function on the character.

```python
ord('A'), ord('Z'), ord('?')
```

```
(65, 90, 63)
```

To find the character associated with a particular code, we can call the `chr` function.

```python
chr(77), chr(47), chr(98)
```

```
('M', '/', 'b')
```

The codes for the letters are laid out sequentially: `'A'` through `'Z'` have codes 65 through 90, and `'a'` through `'z'` have codes 97 through 122. This means we can do arithmetic with these codes to move forward or backward through the alphabet.

```python
code = ord('A') + 4
chr(code)
```

```
'E'
```
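Because the uppercase codes are consecutive, a counter-driven loop over `range(ord('A'), ord('Z') + 1)` visits the code of every uppercase letter. This sketch builds the alphabet that way and also shows the gap between the uppercase and lowercase runs of codes.

```python
# Walk through the codes for 'A' to 'Z' (inclusive, hence the + 1)
alphabet = ''
for code in range(ord('A'), ord('Z') + 1):
    alphabet += chr(code)
print(alphabet)  # ABCDEFGHIJKLMNOPQRSTUVWXYZ

# Uppercase and lowercase letters are two separate runs of codes
print(ord('a') - ord('A'))  # 32
```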

We can also compute the position of a letter in the alphabet, where A is letter #0:

```python
ord('E') - ord('A')
```

```
4
```

#### Example: A Simple Encryption Scheme

In the **Caesar cipher** (encryption algorithm), a **plaintext** message is encrypted by shifting each letter forward in the alphabet by a fixed number of positions (call it $k$).

For example, if $k=3$, `'A'` becomes `'D'`, `'B'` becomes `'E'`, ..., `'X'` becomes `'A'`, etc., wrapping around the alphabet.

Encryption algorithm:
1. Map the letters A through Z to the numbers 0 through 25.
1. Add $k$ to each letter's number and then apply the remainder operator with a divisor of 26 (to wrap around). Leave non-letters untouched.
1. Map the code back to a valid letter. (Trickier than you might think!)
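To see why step 3 is tricky, this sketch encrypts the single letter `'X'` with $k=3$ in two ways. Simply adding $k$ to the character code runs past `'Z'`; working with positions 0 through 25 and using `%` wraps around correctly.

```python
k = 3
ch = 'X'

# Naive attempt: add k to the character code directly.
# ord('X') is 88, and 88 + 3 = 91 is the code for '[', not a letter.
print(chr(ord(ch) + k))  # [

# Following the three steps instead:
position = ord(ch) - ord('A')   # step 1: 'X' -> 23
shifted = (position + k) % 26   # step 2: (23 + 3) % 26 -> 0
print(chr(shifted + ord('A')))  # step 3: 0 -> 'A'
```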

```python
def caesar_encrypt(plaintext, k):
    ciphertext = ''
    for ch in plaintext:
        if ch.isupper():
            replacement = (ord(ch) - ord('A') + k) % 26 + ord('A')
            ciphertext += chr(replacement)
        elif ch.islower():
            replacement = (ord(ch) - ord('a') + k) % 26 + ord('a')
            ciphertext += chr(replacement)
        else:
            ciphertext += ch
    return ciphertext

caesar_encrypt("What's a Seawolf?", 5)  # use double-quotes to deal with the single quote
```

```
"Bmfy'x f Xjfbtqk?"
```

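Decryption shifts each letter backward by $k$. One way to sketch it is to reuse `caesar_encrypt` (repeated here so the block runs on its own) with $-k$. This relies on Python's `%` operator returning a non-negative result when the divisor is positive; for example, `(0 - 5) % 26` is `21`. The name `caesar_decrypt` is our own choice for this sketch.

```python
def caesar_encrypt(plaintext, k):
    # Same as the function above: shift letters by k, leave other characters alone
    ciphertext = ''
    for ch in plaintext:
        if ch.isupper():
            ciphertext += chr((ord(ch) - ord('A') + k) % 26 + ord('A'))
        elif ch.islower():
            ciphertext += chr((ord(ch) - ord('a') + k) % 26 + ord('a'))
        else:
            ciphertext += ch
    return ciphertext


def caesar_decrypt(ciphertext, k):
    # Shifting by -k undoes a shift by k; Python's % keeps results in 0..25
    # even when the sum is negative, e.g., (0 - 5) % 26 == 21
    return caesar_encrypt(ciphertext, -k)


print(caesar_decrypt("Bmfy'x f Xjfbtqk?", 5))  # What's a Seawolf?
```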
#### Example: UPC Codes

[UPC](https://en.wikipedia.org/wiki/Universal_Product_Code) bar codes have significantly improved many aspects of business since they were introduced in the 1970s, including:

* faster check-out of customers at point-of-sale terminals
* better inventory management
* better supply chain management
* reducing shrinkage (theft)

A UPC-A bar code consists of 11 digits followed by a 12th digit, the **check digit**, which is used to verify that the bar code's number is valid. We can think of the bar code as consisting of digits $x_1$, $x_2$, $\ldots$, $x_{11}$, with $x_{12}$ as the check digit.

The UPC-A check digit may be calculated using this [algorithm](https://en.wikipedia.org/wiki/Universal_Product_Code#Check_digit_calculation):

1. Sum the digits at odd-numbered positions (first, third, fifth,..., eleventh). In this scheme, the leftmost digit is at position 1, not 0.
1. Multiply the result by 3.
1. Sum the digits at even-numbered positions (second, fourth, sixth,..., tenth) and add the sum to the result from step 2.
1. Find the result modulo 10 (i.e., the remainder, when divided by 10) and call it $M$.
1. If $M$ is zero, then the check digit is $0$; otherwise the check digit is $10 - M$.
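The five steps above can be written compactly as a formula. The $\bmod 10$ in the second line handles step 5 in a single expression: when $M = 0$, $(10 - 0) \bmod 10 = 0$.

$$M = \big(3(x_1 + x_3 + x_5 + x_7 + x_9 + x_{11}) + (x_2 + x_4 + x_6 + x_8 + x_{10})\big) \bmod 10$$

$$x_{12} = (10 - M) \bmod 10$$

For the digits `03600029145` used below, the odd-position digits sum to $14$ and the even-position digits sum to $16$, so $M = (3 \cdot 14 + 16) \bmod 10 = 58 \bmod 10 = 8$ and $x_{12} = 10 - 8 = 2$.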

The function below takes a string of 11 digit characters from a UPC, and returns the check digit as an integer.

Our code calls the [`enumerate`](https://docs.python.org/3/library/functions.html#enumerate) function, which lets you combine the style of the index-based for-loop with a for-each loop. The second argument of `enumerate` is the starting value of the index.
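A brief sketch of what `enumerate` produces, using a short hypothetical string of digit characters.

```python
# Pair each character with a counter that starts at 1 (the second argument)
for position, digit in enumerate('036', 1):
    print(position, digit)

# With no second argument, counting starts at 0, matching the string's indexes
print(list(enumerate('036')))  # [(0, '0'), (1, '3'), (2, '6')]
```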

```python
def upc_check_digit(upc):
    checksum = 0
    # "i" is the counter variable (starting at 1); "digit" is the digit character
    for i, digit in enumerate(upc, 1):
        number = ord(digit) - ord('0')  # convert a digit character to its integer value
        if i % 2 == 1:  # test for "oddness"
            checksum += 3*number
        else:
            checksum += number
    rem = checksum % 10
    if rem != 0:
        rem = 10 - rem
    return rem

print(upc_check_digit('03600029145'))  # should be 2
print(upc_check_digit('89268500100'))  # should be 3
print(upc_check_digit('72527273070'))  # should be 6
```

```
2
3
6
```
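A related check, sketched as a hypothetical helper `upc_is_valid`: given all 12 digits, including the check digit, extend the weighted sum from steps 1 through 3 to include $x_{12}$ (at an even position, so weight 1). Since $x_{12}$ is chosen to bring that sum up to a multiple of 10, the total is a multiple of 10 exactly when the check digit is correct. This lets us verify a scanned code without computing $x_{12}$ separately.

```python
def upc_is_valid(upc12):
    # Hypothetical helper: True if upc12 is 12 digit characters with a correct check digit
    if len(upc12) != 12 or not upc12.isdigit():
        return False
    total = 0
    for i, digit in enumerate(upc12, 1):
        number = ord(digit) - ord('0')
        if i % 2 == 1:  # odd positions 1, 3, ..., 11 get weight 3
            total += 3 * number
        else:           # even positions 2, 4, ..., 12 (including the check digit) get weight 1
            total += number
    return total % 10 == 0


print(upc_is_valid('036000291452'))  # True
print(upc_is_valid('036000291453'))  # False: the check digit should be 2
```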


#### Application: Simple Compression Scheme

**Run-length encoding** is a well-known, simple compression scheme for compressing **runs** of repeated characters. If our dataset consists of such runs, then we store the length of the run and the repeated character, rather than the run itself.

For example, rather than store `'AAAAAA'`, we could compress it with the encoding `'*A6'`, where the `*` inside the string indicates the start of a run.

For runs of three or fewer characters, there is no sense in compressing the run: a run of three, such as `'AAA'`, would become the code `'*A3'`, which is just as long. Consider the run `'AA'`. Would it make sense to replace those two characters with the three-character code `'*A2'`? Certainly not!

```python
def rle(message):
    if message == '':  # an empty message encodes to '' (and avoids an error on message[0])
        return ''
    prev = message[0]
    run_length = 1  # I called this variable "count" in the video.
    encoded = ''
    for ch in message[1:]:
        if ch == prev:
            run_length += 1
        else:
            if run_length >= 4:
                encoded += '*' + prev + str(run_length)
            else:
                encoded += prev * run_length
            run_length = 1
            prev = ch
    if run_length >= 4:
        encoded += '*' + prev + str(run_length)
    else:
        encoded += prev * run_length
    return encoded

m1 = 'A'*6 + 'B'*7 + 'C' + 'DD' + 'E'*13 + 'FFF' + 'G'
m2 = 'ABCDEFG'
m3 = 'A'*17
e1 = rle(m1)
e2 = rle(m2)
e3 = rle(m3)
print(m1, e1, len(e1)/len(m1))  # Printing the *compression ratio* at the end
print(m2, e2, len(e2)/len(m2))  # of each line.
print(m3, e3, len(e3)/len(m3))
```

```
AAAAAABBBBBBBCDDEEEEEEEEEEEEEFFFG *A6*B7CDD*E13FFFG 0.5151515151515151
ABCDEFG ABCDEFG 1.0
AAAAAAAAAAAAAAAAA *A17 0.23529411764705882
```
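To confirm that no information is lost, here is a sketch of a decoder for this format; the name `rle_decode` is our own. It assumes the original message contained no `*` or digit characters, since this encoding cannot tell those apart from run markers and run lengths. It reads the encoded string one character at a time and keeps track of whether it is in the middle of a `*` run code.

```python
def rle_decode(encoded):
    # Assumes the original message had no '*' or digit characters
    decoded = ''
    run_char = ''           # character of the run being read ('' if none)
    count_text = ''         # digits of that run's length read so far
    expecting_char = False  # True right after a '*'
    for ch in encoded:
        if expecting_char:
            run_char = ch   # the character right after '*' is the repeated one
            expecting_char = False
        elif run_char != '' and ch.isdigit():
            count_text += ch  # still reading the run length
        else:
            # Any run in progress has ended, so expand it
            if run_char != '':
                decoded += run_char * int(count_text)
                run_char = ''
                count_text = ''
            if ch == '*':
                expecting_char = True
            else:
                decoded += ch  # an ordinary, uncompressed character
    if run_char != '':  # expand a run that ends the encoded string
        decoded += run_char * int(count_text)
    return decoded


print(rle_decode('*A6*B7CDD*E13FFFG'))  # AAAAAABBBBBBBCDDEEEEEEEEEEEEEFFFG
```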
