<div>
<img src="img/python_logo.png" width="100px" style="float: left; "/> 
<div style="font-size: 40px; padding-top: 20px">Python basics</div>   
<div style="font-size: 30px; padding-top: 20px">Part 3 - Conditions and Loops</div>   
</div>

 
<h2 style="clear: both">COMM4190 Spring 2025</h2>

<h3>Instructor: Matt O'Donnell (mbod@asc.upenn.edu)</h3>

-----

<div class="alert alert-info">

## Overview

* This notebook will cover:
    1. __CODE BLOCKS__ - are one or more lines of code that are executed by Python in order
        * Comments are lines that can be used in a code block to document what the code does; they are not executed.
            * It is good practice to use comments to outline the steps in a process
            * Comments are lines in a code block that begin with the `#` (hash) character
             
        * _Code indentation_ - initial indentation of a line of code shows which code block it is a part of. Different levels work a bit like bulleted lists in a document.
        <br/><br/>
    2. __CONDITIONS__ - are constructions in Python that allow you to direct the steps your code takes on the basis of the result of a test.
        * You can think of conditions like a decision tree where at a certain point you might ask 'Is this true or not?' and follow different paths depending on whether the answer is _YES_ or _NO_.
        * _Conditional constructions_ will have at least one test and associated code block
            * The syntax is:
              ```
              if TEST:
                 # code block executed if TEST is True
              ```
    3. __LOOPS__ - are constructions in Python that allow you to _walk through_ each item in a sequence (e.g. a `list` object) and carry out a series of steps (in a __code block__) with each item.
        * The most commonly used loop construction in Python is the __FOR LOOP__.
        * _For loop constructions_ will have a sequence to walk through, a pointer to the current item, and an associated code block
            * The syntax is:
              ```
              for POINTER in LIST:
                 # code block executed for each item in LIST
              ```

    4. __LOOP AND FILTER PARADIGM__ - a way to process a list and select a subset of its items.
        * This Python idiom combines a for loop with a condition in the code block to create a subset of a list based on certain criteria.
        * The syntax is:
            ```
            filtered_list = []
            for item in your_list:
                if FILTER_TEST:
                    filtered_list.append(item)
            ```
    
    
</div>
    
    
------

## 1. Code Blocks

* When more than one line of code is included in a code cell and you run the cell, each line is executed in turn in the order they appear, from top to bottom.
    * For example:

In [None]:
print('Step 1')
print('Step 2')
print('Step 3')

* This is referred to as a __CODE BLOCK__.


* As we develop a series of common steps to process text (which we will sometimes refer to as a _processing pipeline_), we will start to group them into code blocks.


* For example, let's take the process of _normalizing_ and _tokenizing_ a sentence like:
    ```
    This, is some text!
    ```
    
* We want to do three things:
    1. remove the punctuation from the string
    2. transform all the characters into the same case (e.g. uppercase)
    3. create a list of tokens by splitting on whitespace
    
    

* Here is a code block that carries out those three steps:

In [None]:
a_string = 'This, is some text!'
a_string_np = a_string.replace(',','').replace('!','')
a_string_uc = a_string_np.upper()
tokens = a_string_uc.split()
print(tokens)

### Comments

* __Comments__ are lines included in a code block that provide documentation of what the code does so that you and readers of your code can more easily follow it.


* A comment begins with a `#` (hash). Python ignores everything from the `#` to the end of that line.


* Using comments is a good way to start working out a code solution. First outline the steps you want your code to carry out to provide a template for your code.

    * For example:

In [None]:
# Step 1. remove the punctuation from the string

# Step 2. transform all the characters into the same case (e.g. uppercase)

# Step 3. create a list of tokens by splitting on whitespace

* Then you can work through and fill in the code for each step


* Add code for Step 1:

In [None]:
a_string = 'This, is some text!'

# Step 1. remove the punctuation from the string
a_string_np = a_string.replace(',','').replace('!','')

# Step 2. transform all the characters into the same case (e.g. uppercase)

# Step 3. create a list of tokens by splitting on whitespace

In [None]:
a_string_np

* Add code for Step 2:

In [None]:
a_string = 'This, is some text!'

# Step 1. remove the punctuation from the string
a_string_np = a_string.replace(',','').replace('!','')

# Step 2. transform all the characters into the same case (e.g. uppercase)
a_string_uc = a_string_np.upper()

# Step 3. create a list of tokens by splitting on whitespace

* Add code for Step 3:

In [None]:
a_string = 'This, is some text!'

# Step 1. remove the punctuation from the string
a_string_np = a_string.replace(',','').replace('!','')

# Step 2. transform all the characters into the same case (e.g. uppercase)
a_string_uc = a_string_np.upper()

# Step 3. create a list of tokens by splitting on whitespace
tokens = a_string_uc.split()

print(tokens)

### Indentation

* In Python, indentation in code blocks is important: it is part of the syntax, not just layout.


* A series of consecutive lines that have the same amount of initial indentation functions as one code block.
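One way to see that indentation is part of the syntax is to ask Python to check two small pieces of code. This sketch uses the built-in `compile()` function, which checks the syntax of a string of code without running it; the names `good_code`, `bad_code` and `indent_error` are illustrative only. In a normal code cell, the badly indented version would simply stop with an `IndentationError`.

```python
# two lines at the same (zero) indentation: a valid code block
good_code = "x = 1\ny = x + 1\n"

# the second line is indented for no reason, which Python rejects
bad_code = "x = 1\n    y = x + 1\n"

# compile() checks the syntax of a string of code without running it
compile(good_code, "<good>", "exec")
print("good_code: no syntax problems")

indent_error = None
try:
    compile(bad_code, "<bad>", "exec")
except IndentationError as err:
    # keep the error so we can inspect it, instead of stopping the cell
    indent_error = err
    print("bad_code:", type(err).__name__, "-", err.msg)
```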

## 2. Conditions

* Frequently we want to only do something if some condition is met (or is `True`)


* For instance, if we test whether a string is all in lowercase with the `.islower()` function we get back a logical value, either:
    * `True`
    
  or
    * `False`
    
* Let's create two string objects with values:
    1. `Matt`
    2. `matt`
    
  and use _named pointers_
  
    1. `name`
    2. `name_lc`
    
  to reference them.

In [None]:
name='Matt'
name_lc='matt'

* These are string objects so we can use one of the *string specific functions*, called `islower()` to ask Python whether the value in the object has characters that are **all in lowercase**.

In [None]:
name.islower()

In [None]:
name_lc.islower()

* Now let's suppose we wanted to print a message based on whether this test is `True` or `False`

In [None]:
result = name.islower()
print(f'"{name}" is all in lowercase is {result}')

In [None]:
result2 = name_lc.islower()
print(f'"{name_lc}" is all in lowercase is {result2}')

* __BUT__ what we really want is to display the statement __ONLY__ if the test is true.


* This is where we use a **CONDITIONAL**.

In [None]:
if name.islower():
    print(f'"{name}" is all in lowercase')

* No output is displayed because `name.islower()` is `False`

* The syntax for a conditional in Python is:
    ```
    if TEST:
       # code block
    ```
    
    * Where:
      * `TEST` is some Python that results in a `True` or `False` value
      * And `code block` is __one or more indented lines__ of Python

    * __NOTE__ The colon `:` after the test __MUST BE PRESENT__.
      * This signals that following indented lines (the code block) are grouped together and only to be executed if `TEST` is `True`.
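A code block under an `if` can hold more than one line. In this minimal sketch (the flag `block_ran` is a hypothetical name added only to make the effect visible), both indented lines run because the test is `True`, while the unindented last line would run either way.

```python
name_lc = 'matt'
block_ran = False   # hypothetical flag, used only to show whether the block ran

if name_lc.islower():
    # both indented lines belong to the same code block
    block_ran = True
    print(f'"{name_lc}" is all in lowercase')

# back at the left margin: this line runs whether the test was True or False
print('block_ran is', block_ran)
```

Changing `name_lc` to `'Matt'` would skip both indented lines and print `block_ran is False`.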

In [None]:
name.isupper()

In [None]:
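# name.upper() returns 'MATT', so .isupper() is True and the indented line runs
if name.upper().isupper():
    print('Is upper')

# this line is not indented, so it runs whatever the test result
print('Not part of code block')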

#### Some test operations for numbers

In [None]:
1>2

In [None]:
2>1

* To test whether two numeric values are the same __USE A DOUBLE EQUAL__

In [None]:
2==2

In [None]:
1==3

In [None]:
# not equal to
1!=3
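The remaining comparison operators follow the same pattern. A short sketch, with illustrative variable names: each comparison produces a Boolean (`bool`) value, and `==` also works on strings, where uppercase and lowercase characters count as different.

```python
lt = 1 < 2      # less than
le = 2 <= 2     # less than or equal to
ge = 3 >= 5     # greater than or equal to
print(lt, le, ge)     # prints: True True False

# every comparison returns a Boolean value
print(type(lt))       # prints: <class 'bool'>

# == compares strings too; uppercase and lowercase characters differ
same_name = 'matt' == 'Matt'
print(same_name)      # prints: False
```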

In [None]:
if 1>2:
    print("Something is wrong with Math")
    print("I better check what is going")

* We can invert a conditional test with `not`

In [None]:
1>2

In [None]:
not 1>2

In [None]:
if not 1>2:
    print("OK that's good... math is still working!")

#### IF-ELSE

* Often you want your code to do one thing if the result of the test is `True` and another different thing if the result is `False`.


* The syntax in Python for this is:

    ```
    if TEST:
       # code block 1
       # ...
    else:
       # code block 2
       # ...
    ```

* Where:
  * `TEST` is some Python that results in a `True` or `False` value
  * And `code block 1` is __one or more indented lines__ of Python to be executed if `TEST` is `True`
  * And `code block 2` is __one or more indented lines__ of Python to be executed if `TEST` is `False`
  

In [None]:
if 1>2:
    print('Something is wrong with math')
else:
    print("OK that's good... math is still working!")

In [None]:
if name.islower():
    print(f'{name} is all in lowercase characters')
else:
    print(f'{name} is NOT all in lowercase characters')
    

## 3. Using loops to walk through a sequence


* So far we have used list _indexing_ and _slicing_ to access the items in a list.


* For example, given the list of tokens:
    ```
    ['This', 'is', 'a', 'SENTENCE']
    ```
    
    

In [None]:
sentence = ['This', 'is', 'a', 'SENTENCE']

#### Example 1. Display each of the tokens on a separate line


* If we wanted to display each of the tokens on a separate line, we could combine calls to the `print()` function with indexing each of the items in turn

In [None]:
print(sentence[0])
print(sentence[1])
print(sentence[2])
print(sentence[3])

* This is reasonable when the list of items is short, but it becomes increasingly verbose and impractical as the length of the list grows.



#### Example 2. Test each token for case of first character

* And what if we wanted to do more than just display each item but show two different messages depending on whether, for instance, the item in the list began with an uppercase character or not?



In [None]:
# define string templates for each of the messages
is_uc = "'{}' begins with an uppercase character"
isnot_uc = "'{}' does not begin with an uppercase character"

* We can select the first character of a token using string indexing with index `0`

In [None]:
sentence[0][0]

* Here we have two levels of indexing:
    1. select the first item in the `list` object `sentence`
        ```
        sentence[0]
        ```

        returns the `str` object `'This'` <br/><br/>
    2. apply string indexing to access the first character of this `str` object
        ```
        'This'[0]
        ```

        returns `'T'`

  So combining them we have:
  ```
  sentence[0][0]
  ```
  
  
* Then we set up a condition with an `if-else` construct

In [None]:
if sentence[0][0].isupper():
    print(is_uc.format(sentence[0]))
else:
    print(isnot_uc.format(sentence[0]))

* To apply this to each of the four items in the list we would need to duplicate it for each item

In [None]:
# process item 1 (index 0)
if sentence[0][0].isupper():
    print(is_uc.format(sentence[0]))
else:
    print(isnot_uc.format(sentence[0]))
    
# process item 2 (index 1)
if sentence[1][0].isupper():
    print(is_uc.format(sentence[1]))
else:
    print(isnot_uc.format(sentence[1]))
    
# process item 3 (index 2)
if sentence[2][0].isupper():
    print(is_uc.format(sentence[2]))
else:
    print(isnot_uc.format(sentence[2]))
     
# process item 4 (index 3)
if sentence[3][0].isupper():
    print(is_uc.format(sentence[3]))
else:
    print(isnot_uc.format(sentence[3]))
  


* Imagine if we had 100 or 1000 or 10000 items in our list!


* This is where we can use a _for loop_, which allows us to **walk through each item in the list**.
  * The syntax is:
      ```
      for POINTER in LIST:
          # code block
      ```
      
  * The key components are:
      1. a `list` object to walk through (`LIST`)
      2. a moving reference or name pointer (`POINTER`)
      3. the `:` (colon) at the end of the line beginning with `for` 
          * WITHOUT THIS YOU WILL GET A SYNTAX ERROR
      4. some steps of code to carry out with each item (the `code block`)
          * notice the indentation
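As a minimal sketch of these components, the loop below uses `word` as the pointer (the pointer name is your choice; `word` and `n_chars` are illustrative) and a two-line code block that runs once per item, in list order. After the loop ends, the pointer still references the last item.

```python
sentence = ['This', 'is', 'a', 'SENTENCE']

# LIST is sentence, POINTER is word; note the colon at the end of the line
for word in sentence:
    # both indented lines run once for each item
    n_chars = len(word)
    print(word, 'has', n_chars, 'characters')

# after the loop, word still references the last item in the list
print('last item:', word)
```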

#### Example 1 Revisited

* Using a for loop construction we can replace:

    ```
    print(sentence[0])
    print(sentence[1])
    print(sentence[2])
    print(sentence[3])
    ```
    
  with:

In [None]:
for token in sentence:
    print(token)

* Notice the following:
    1. `sentence` is the list object to 'walk through' item-by-item from the first to the last item
    2. `token` is a name pointer used to reference the current item in the walk and this can be used in the code block to process each item
    3. the colon (`:`) at the end of the first line
    4. the code block is indented and here consists of a single line of code, `print(token)`

#### Example 2 Revisited

* The for loop construct allows us to remove the duplication of the `if-else` code block 

In [None]:
for token in sentence:
    if token[0].isupper():
        print(is_uc.format(token))
    else:
        print(isnot_uc.format(token))

* The first line, which sets up the loop, is the same as in Example 1 Revisited


* The code block has the `if-else` construct from Example 2 but uses the `token` pointer from the loop
  * the test in the `if` line selects the first character in the `str` object currently referenced by `token` and applies the `isupper()` function.
      * if the result is `True` then the indented code underneath (`print(is_uc.format(token))`) is executed
      * but if the result is `False` then the indented code in the `else` code block is executed (`print(isnot_uc.format(token))`)
  

## 4. The _Loop and Filter_ paradigm

### Filtering `list` objects using loops

* Here we have a list of words, i.e. a sentence represented as an __ORDERED SEQUENCE OF TOKENS__, where a token is a `str` object.

In [None]:
sentence = ['This', 'is', 'A', 'sentence', 'represented', 'as', 'a', 'LIST']

* We have seen using _indexing_ and _slicing_ as ways to access the items in a `list` object

In [None]:
sentence[0]

In [None]:
sentence[2:4]

* A __LOOP__ in Python is a construct that allows each item in a list to be processed in turn

In [None]:
for token in sentence:
    print(token)

* The code block in this example calls the `print()` function to display the data in the current `str` object which is referenced by the name pointer `token` at each step in the loop.


#### Create a copy of a list using a loop

* Imagine we wanted to create another `list` object called `sentence_copy`. One way to do this would be to:
  1. create an empty list called `sentence_copy`
  2. walk through all the items in `sentence` (using a for loop)
  3. and add each item to `sentence_copy` using the list `.append()` function

In [None]:
sentence_copy = []

for item in sentence:
    sentence_copy.append(item)

In [None]:
sentence_copy
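For a straight copy, slicing the whole list (`sentence[:]`) gives the same result; the loop version matters here because it is the starting point for filtering. A small sketch comparing the two (the name `sentence_slice` is illustrative), which also checks that the copy is a separate list from the original:

```python
sentence = ['This', 'is', 'A', 'sentence', 'represented', 'as', 'a', 'LIST']

# copy built item-by-item with a loop
sentence_copy = []
for item in sentence:
    sentence_copy.append(item)

# a slice covering the whole list also returns a new list with the same items
sentence_slice = sentence[:]

print(sentence_copy == sentence_slice)   # prints: True (same items, same order)

# changing the copy leaves the original untouched
sentence_copy.append('!')
print(len(sentence), len(sentence_copy))   # prints: 8 9
```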

* The following code cell repeats this but adds some calls to `print()` to provide a step-by-step display of what is happening

  1. Before the loop begins `sentence_copy` is an empty list
  2. At each step of the walk through the list `sentence` the pointer `token` references the item in `sentence` at the current position in the walk
  3. In the loop code block, we add this item to the end of the list referenced by `sentence_copy`
  4. And then display the item currently referenced by `token` and the contents of `sentence_copy`

In [None]:
tmpl = "Step {}\n\ttoken => '{}' (index={})\n\tsentence_copy {}\n\n"

sentence_copy = []

print('Before loop begins\n\tsentence_copy {}\n\n'.format(sentence_copy))

for idx, token in enumerate(sentence):
    sentence_copy.append(token)
    print(tmpl.format(idx+1, token, idx, sentence_copy))
    
print('After loop completed\n\tsentence_copy {}\n\n'.format(sentence_copy))


* This pattern is the basis of the __LOOP AND FILTER__ paradigm
    * If you have a list of items and you want to create a new list that contains only some of these items, selected on the basis of one or more criteria, then the _loop and filter_ pattern will look like:

    ```
    filtered_list = []

    for item in your_list:
        if TEST_ON_ITEM:
           filtered_list.append(item)
    ```
    
  where:
  * `your_list` is a `list` object you wish to filter
  * `TEST_ON_ITEM` is some test or function call that returns `True` or `False` based on the value referenced by `item`
    * For example, if `item` referenced a `str` object then you could test to see if all the characters are lowercase:
        ```
        if item.islower():
           ...
        ```
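Filling in the template with a concrete test may make it easier to follow. In this sketch the criterion (keep tokens longer than three characters) and the name `long_tokens` are illustrative choices, not part of the examples that follow:

```python
sentence = ['This', 'is', 'A', 'sentence', 'represented', 'as', 'a', 'LIST']

long_tokens = []                      # 1. start with an empty list

for token in sentence:                # 2. walk through every item
    if len(token) > 3:                # 3. the filter test (illustrative criterion)
        long_tokens.append(token)     # 4. keep only the items that pass the test

print(long_tokens)   # prints: ['This', 'sentence', 'represented', 'LIST']
```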

#### Select the tokens in `sentence` that are in uppercase

* We can apply the __loop and filter__ paradigm to our list of tokens, `sentence`, to select (filter) the tokens that are all in uppercase

In [None]:
UC_tokens = []

for token in sentence:
    if token.isupper():
        UC_tokens.append(token)

In [None]:
UC_tokens

* Note the components of the loop and filter paradigm:
    1. create an empty list (`UC_tokens=[]`)
    2. set up a loop to walk through each item in our list of tokens and use the pointer `token`
       ```
       for token in sentence:
       ```
    3. specify the filter criteria in a condition construction
       ```
       if token.isupper():
       ```
    4. add each token that passes the test to the new list with `UC_tokens.append(token)`

#### Select the tokens in `sentence` that are in lowercase

* We can do the same but this time selecting items that are all in lowercase

In [None]:
LC_tokens = []

for token in sentence:
    if token.islower():
        LC_tokens.append(token)

In [None]:
LC_tokens
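The filter test can also look at just part of each item. This sketch combines the first-character check from Example 2 in Section 3 with the loop and filter pattern; `initial_uc_tokens` is an illustrative name:

```python
sentence = ['This', 'is', 'A', 'sentence', 'represented', 'as', 'a', 'LIST']

initial_uc_tokens = []

for token in sentence:
    # test only the first character (index 0) of the current token
    if token[0].isupper():
        initial_uc_tokens.append(token)

print(initial_uc_tokens)   # prints: ['This', 'A', 'LIST']
```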

#### Filter into two lists in a single loop

* The two examples above could be combined into a single loop

In [None]:
UC_tokens = []
LC_tokens = []

for token in sentence:
    
    # add lowercase tokens to LC_tokens
    if token.islower():
        LC_tokens.append(token)
    
    # add uppercase tokens to UC_tokens
    if token.isupper():
        UC_tokens.append(token)

In [None]:
print('lowercase tokens', LC_tokens)
print('uppercase tokens', UC_tokens)
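Because the two tests above are separate `if` constructs, a mixed-case token such as `'This'` ends up in neither list. If every token should land in exactly one list, an `if-else` construct can be used instead. A sketch, where `other_tokens` is an illustrative name for everything that is not all lowercase:

```python
sentence = ['This', 'is', 'A', 'sentence', 'represented', 'as', 'a', 'LIST']

LC_tokens = []
other_tokens = []   # illustrative name: every token that is not all lowercase

for token in sentence:
    if token.islower():
        LC_tokens.append(token)
    else:
        # 'This', 'A' and 'LIST' land here
        other_tokens.append(token)

print('lowercase tokens', LC_tokens)
print('other tokens', other_tokens)
```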