# Text Analytics | BAIS:6100
# Module 1: Python Basics for Text Processing, Part 1

Instructor: Kang-Pyo Lee 

Topics to be covered:
- Strings (+ exercises)
- Collections: lists, tuples, dictionaries, and sets (+ exercises)
- Built-in functions for type conversion
- Operators
- Branches (+ exercises)
- Loops (+ exercises)
- User-defined functions (+ exercises)

## Data Types of Python

- Numbers
- Strings
- Collections
    - Lists
    - Tuples
    - Dictionaries
    - Sets

## Strings

In [None]:
s = "How are you?"

`s` is a string variable.

String literals can be enclosed in matching single quotes (') or double quotes ("); either is fine. 

In [None]:
print(s, type(s))

In [None]:
s = 'How are you?'
print(s, type(s))

In [None]:
s = "She said, "How are you?""
print(s)

It raises a SyntaxError because Python treats the second double quote as the end of the string, so the rest of the line is not valid code.

In [None]:
s = "She said, \"How are you?\""
print(s)

If you need double quotes in a string already enclosed in double quotes, you can put a backslash escape character (`\`) before each double quote inside the string.

In [None]:
s = 'She said, "How are you?"'
print(s)

If the string contains double quotes, you can use single quotes around the string without using a backslash, and vice versa.

In [None]:
s = "I'm learning Python."
print(s)

In [None]:
s = "1"
print(s, type(s))

In [None]:
print(1, "1")

Integer 1 and string "1" look the same when printed, but they have different data types.

In [None]:
print(type(1), type("1"))

In [None]:
s = "How are you?"
len(s)

The <b>len</b> function is a built-in Python function that returns the length of a string, a list, or any other sequence or collection.

### Difference between print(s) and s

In [None]:
s = "How are you?"

In [None]:
print(s)     # Prints the value of s.

In [None]:
s            # Displays s in its representation form (with quotes), which also shows that it is a string.

In [None]:
print(s)
print(s)

In [None]:
s
s            # Only the last expression in a cell is displayed automatically.

In [None]:
s2 = print("How are you?")

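A common mistake is assigning the output of <b>print</b> to a variable. The <b>print</b> function displays its argument but returns None, so `s2` ends up holding None. Python raises no error, which makes this mistake easy to miss.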
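
To check what `s2` actually holds, you can inspect the return value of <b>print</b> directly. A minimal sketch (the name `result` is illustrative):

```python
# print() displays its argument on the screen, but its return value is None
result = print("How are you?")   # displays: How are you?

print(result)            # displays: None
print(result is None)    # displays: True
```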

In [None]:
s2

In [None]:
print(s2)

In this course, use the <b>print</b> function only when it is specified that you should print something. 

### String Additions and Multiplications

In [None]:
s1 = "hello"
s2 = "world"
s1 + s2

The easiest way to combine two strings is to use the + operator. 

In [None]:
s1 * 3
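
The `*` operator repeats a string a given number of times, and it can be combined with `+`. A small sketch with illustrative variable names:

```python
s1 = "hello"
s2 = "world"

repeated = s1 * 3                 # "hellohellohello"
separator = "-" * 20              # a line of 20 dashes
greeting = (s1 + " ") * 2 + s2    # "hello hello world"

print(repeated)
print(separator)
print(greeting)
```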

### String Containment

In [None]:
s1 = "hello"
s2 = "hell"
s2 in s1

In [None]:
s1 in s2

The <b>in</b> operator returns True if the first operand is contained in the second.
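
For strings, <b>in</b> checks for a contiguous substring, and the check is case-sensitive. A few illustrative checks:

```python
s1 = "hello"

print("ell" in s1)        # True: "ell" appears inside "hello"
print("Hell" in s1)       # False: "H" and "h" are different characters
print("hlo" in s1)        # False: the characters must be next to each other
print("xyz" not in s1)    # True: "not in" negates the test
```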

### String Indexing and Slicing

A Python string is a sequence of characters, which means it can be indexed and sliced.

In [None]:
s = "This is text."
s

In [None]:
from IPython.display import Image
Image("classdata/images/string.png")

Python indexes start at 0, increase by 1, and end at the length minus 1, i.e., `len(s) - 1`.

In [None]:
s[0]

You can access a character in a string by referring to the index position inside square brackets.

In [None]:
s[12]

In [None]:
s[13]

A slice `s[start:end:step]` begins at index `start` (inclusive) and stops at index `end` (exclusive). The `step` parameter specifies how many positions to advance each time. All three parameters are optional.
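
Indexing and slicing treat out-of-range positions differently. `s` has 13 characters (indices 0 to 12), so indexing `s[13]` raises an IndexError, whereas slice boundaries beyond the end are simply clipped. A small sketch:

```python
s = "This is text."           # 13 characters, valid indices 0 to 12

print(s[8:13])                # text.
print(s[8:100])               # text. (the end is clipped to the string length)
print(s[0:4:2])               # Ti (indices 0 and 2)

try:
    s[13]                     # indexing past the end is an error
except IndexError as e:
    print("IndexError:", e)
```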

In [None]:
Image("classdata/images/string2.png")

In [None]:
s[0:4]

Note that `s[i:j]` will return a string starting with `s[i]` and ending with `s[j-1]`, not `s[j]`.

In [None]:
s[:4]

You can omit the starting index when the slice starts at 0. `s[:n]` is the easiest way to get the first `n` characters of a string.

In [None]:
Image("classdata/images/string3.png")

In [None]:
s[8:13]

In [None]:
s[8:]

You can omit the ending index when the slice runs to the end of the string.

In [None]:
s[:]

You can omit both the starting and ending indices when the slice starts at 0 and runs to the end; `s[:]` returns the whole string.

In [None]:
s[::2]      # Stepping by 2

In [None]:
s[::3]      # Stepping by 3

In [None]:
s[::-1]     # Stepping by -1, which means reversing the string

In [None]:
Image("classdata/images/string4.png")

Python also supports negative indices, which count backward from the end of the sequence: -1 refers to the last character, -2 to the second to last, and so on.

In [None]:
s[-1]

In [None]:
s[-5]

In [None]:
s[-5:]

`s[-n:]` is the easiest way to get the last `n` characters in a string.

In [None]:
Image("classdata/images/string5.png")

In [None]:
s

Note that the original string `s` has not changed at all. Strings are immutable, so indexing and slicing return a new string and never modify the original.

In [None]:
s = s[:4]
s

If you want to change the original variable, make sure to save the new string back in it.
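
Strings are immutable, so you cannot change a character in place; you build a new string and store it back instead. A sketch, assuming you want to lowercase only the first character:

```python
s = "This is text."

try:
    s[0] = "t"                # item assignment is not allowed on strings
except TypeError as e:
    print("TypeError:", e)

s = "t" + s[1:]               # build a new string from pieces
print(s)                      # this is text.
```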

### String Methods

In [None]:
s = "a"

In [None]:
s.islower()

The <b>islower</b> method returns True if the string is lowercase.

In [None]:
s.isupper()

The <b>isupper</b> method returns True if the string is uppercase.
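
Both methods look only at cased characters (letters) and return False if there are none. A few illustrative checks:

```python
print("abc1".islower())   # True: the digit is ignored; every letter is lowercase
print("Abc".islower())    # False: "A" is uppercase
print("Abc".isupper())    # False: mixed case is neither
print("123".islower())    # False: there are no letters at all
print("123".isupper())    # False
```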

In [None]:
s = "This is text."
s

In [None]:
s.upper()

The <b>upper</b> method returns a copy of the string with all letters converted to uppercase. Digits and symbols are left unchanged.

In [None]:
s.lower()

The <b>lower</b> method returns a copy of the string with all letters converted to lowercase. Digits and symbols are left unchanged.

In [None]:
s.count("is")

The <b>count</b> method returns the number of times a specified value appears in the string.

String comparisons and string methods in Python are case-sensitive; for example, `s.count("this")` returns 0 here because `s` contains "This" with a capital T.
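
The sketch below illustrates case sensitivity with <b>count</b>, one way to count case-insensitively, and the fact that occurrences are counted without overlapping:

```python
s = "This is text."

print(s.count("is"))           # 2: once inside "This", once as the word "is"
print(s.count("IS"))           # 0: counting is case-sensitive
print(s.upper().count("IS"))   # 2: normalize the case first
print("aaaa".count("aa"))      # 2: non-overlapping occurrences only
```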

In [None]:
s = "\tThis\nis\ntext.\n\n\n"     # \t: tab, \n: new line
print(s)

In [None]:
print(s.strip())

The <b>strip</b> method removes leading (at the beginning) and trailing (at the end) characters. By default, it removes whitespace, which includes spaces, tabs (`\t`), and newlines (`\n`).

In [None]:
print(s.lstrip())

The <b>lstrip</b> method removes only leading characters (whitespace by default).

In [None]:
print(s.rstrip())

The <b>rstrip</b> method removes only trailing characters (whitespace by default).

In [None]:
s = "This is text."
print(s.rstrip("."))

You can also pass the characters to be removed as an argument.
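
The argument is treated as a set of characters, not as a substring: any combination of them is removed from the end(s) until a character outside the set is reached. Some illustrative calls:

```python
s = "This is text."
print(s.rstrip("."))                       # This is text

print("...Hello!?!".strip(".!?"))          # Hello
print("www.example.com".strip("cmowz."))   # example
print("--a-b--".strip("-"))                # a-b (inner characters are kept)
```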

In [None]:
s = "This is text."
s.startswith("This")

The <b>startswith</b> method returns True if the string starts with the specified value, otherwise False.

In [None]:
s.endswith("?")

The <b>endswith</b> method returns True if the string ends with the specified value, otherwise False.

In [None]:
s = "This is text."
s.find("text")

The <b>find</b> method finds the index of the first occurrence of the specified value.

In [None]:
s.find("z")

It returns -1 if the value is not found.

In [None]:
s.index("text")

The <b>index</b> method works almost the same as the <b>find</b> method; the only difference is that <b>index</b> raises a ValueError when the value is not found, whereas <b>find</b> returns -1.
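
A sketch contrasting the two methods on a missing value; catching the exception with `try`/`except` is one way to handle the ValueError raised by <b>index</b>:

```python
s = "This is text."

print(s.find("is"))      # 2: the first occurrence, inside "This"
print(s.find("z"))       # -1: not found
print(s.index("text"))   # 8

try:
    s.index("z")         # index raises an error instead of returning -1
except ValueError as e:
    print("ValueError:", e)
```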

In [None]:
s.index("z")

In [None]:
s = "This is text."
s.replace(" ", "_")

The <b>replace</b> method returns a copy of the string in which every occurrence of a specified substring is replaced with another specified substring.

In [None]:
s = "This is text."
s.split()

The <b>split</b> method splits a string into a list of substrings. By default, it splits on any run of whitespace; you can also specify the separator explicitly.

In [None]:
s = "This_is_text."
s.split("_")
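
Splitting with no argument and splitting on a single space behave differently when words are separated by more than one space. A sketch with an illustrative string:

```python
spaced = "This  is   text."     # two spaces, then three spaces

print(spaced.split())           # ['This', 'is', 'text.']
print(spaced.split(" "))        # ['This', '', 'is', '', '', 'text.']
print("a,b,c".split(",", 1))    # ['a', 'b,c']: the second argument limits the number of splits
```

With no argument, a run of whitespace counts as a single separator; with `" "`, every single space is a separator, which leaves empty strings between consecutive spaces.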

In [None]:
l = ["This", "is", "text"]
" ".join(l)

The <b>join</b> method is called on a separator string and joins all items of a list of strings into one string, inserting the separator between consecutive items.

The <b>split</b> and <b>join</b> methods are opposites of each other. 
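
Because the two methods are opposites, combining them is a common way to normalize irregular whitespace in text. A small sketch with an illustrative input string:

```python
l = ["This", "is", "text"]
print("".join(l))                 # Thisistext: an empty separator simply concatenates

messy = "  This \t is\n text.  "
clean = " ".join(messy.split())   # split on any whitespace, then rejoin with single spaces
print(clean)                      # This is text.
```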

In [None]:
"+".join(l)

In [None]:
name = "Alice"
age = 30

In [None]:
s = "Name: {}, Age: {}".format(name, age)
s

The <b>format</b> method inserts the specified values, in order, into the `{}` placeholders of the string, converting each value to a string.

In [None]:
s = "Name: " + name + ", Age: " + age
s

In [None]:
s = "Name: " + name + ", Age: " + str(age)
s

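The + operator can only concatenate a string with another string, so the first attempt raises a TypeError because `age` is an integer; converting it with <b>str</b> fixes the problem. Note also that string methods return a new string rather than changing the original string.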
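
A side-by-side sketch of the approaches above (variable names are illustrative):

```python
name = "Alice"
age = 30

via_format = "Name: {}, Age: {}".format(name, age)     # format converts age for you
via_plus = "Name: " + name + ", Age: " + str(age)      # + needs an explicit str()

try:
    "Name: " + name + ", Age: " + age                  # str + int is not allowed
except TypeError as e:
    print("TypeError:", e)

print(via_format)                # Name: Alice, Age: 30
print(via_format == via_plus)    # True
```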

### Dot Notation

In [None]:
s = "This is text."

Suppose you want to perform a series of string methods, e.g., convert `s` to lowercase and then split it into a list of words. 

In [None]:
s1 = s.lower()
s1

In [None]:
s1.split()

In [None]:
s.lower().split()

Dot notation lets you chain methods: each method is applied to the outcome of the previous one. That way, you do not have to store the intermediate outcome in a variable before performing the next operation.

In [None]:
s.split().lower()

Reversing the order of chained methods does not always work. In this example, `s.split()` returns a list, not a string, and lists have no <b>lower</b> method, so Python raises an AttributeError.
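
A sketch of a longer chain that stays valid because each step returns a string until the final <b>split</b>, followed by the failing order wrapped in `try`/`except`:

```python
s = "This is text."

words = s.lower().replace(".", "").split()
print(words)             # ['this', 'is', 'text']

try:
    s.split().lower()    # split() returns a list, which has no lower method
except AttributeError as e:
    print("AttributeError:", e)
```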

## Exercises - Strings

## ▪ Python Collections

There are four types of collections in Python:

- List
- Tuple
- Dictionary: a collection of key-value mappings
- Set

It is important to choose the right type that fits your needs.

In [None]:
Image("classdata/images/collection.png")

## Lists

A list is a collection that is ordered, mutable, indexed, and written with square brackets. It allows duplicate members.

In [None]:
l = []
print(l, type(l))

In [None]:
l = list()
print(l, type(l))

In [None]:
l = ["Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
l

In [None]:
Image("classdata/images/list.png")

Indexing and slicing of lists are exactly the same as those of strings.
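
Unlike strings, lists are mutable, so you can also assign a new value to an index. A sketch with an illustrative list:

```python
months = ["Jan", "Feb", "Mar", "Apr"]
print(months[1:3])       # ['Feb', 'Mar']: slicing works just as with strings

months[0] = "January"    # lists allow item assignment
print(months)            # ['January', 'Feb', 'Mar', 'Apr']

word = "Jan"
try:
    word[0] = "j"        # strings do not
except TypeError as e:
    print("TypeError:", e)
```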

In [None]:
l[0]

In [None]:
l[:3]

In [None]:
l[-3:]

In [None]:
len(l)

The length of a list is the number of items in the list.

In [None]:
"Jan" in l

In [None]:
"Mon" in l

### List Methods

In [None]:
l1 = ["a", "b", "c"]
l2 = ["d", "e", "f"]
l1 + l2

The easiest way to combine two lists is to use the + operator, just as combining two strings. 

Note that `l1 + l2` returns a new list; neither `l1` nor `l2` is changed.

In [None]:
l = ["a", "b", "c"]
l.extend(["d", "e", "f"])
l

The <b>extend</b> method extends the list by appending all the items from another list. Note that <b>extend</b> modifies the target list in place and returns None rather than a copy.

In [None]:
l = ["a", "b", "c"]
l = l + ["d", "e", "f"]
l

Another way to extend a list is to combine the original list with the new list using the + operator and then save the outcome back in the original variable.  

In [None]:
l = ["a", "b", "c"]
l.append("d")
l

The <b>append</b> method adds a single element to the end of the list. Like <b>extend</b>, it modifies the target list in place rather than returning a copy.
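
The difference between <b>append</b> and <b>extend</b> shows up when the argument is itself a list. A sketch:

```python
appended = ["a", "b", "c"]
appended.append(["d", "e"])     # adds the whole list as ONE item
print(appended)                 # ['a', 'b', 'c', ['d', 'e']]

extended = ["a", "b", "c"]
extended.extend(["d", "e"])     # adds each item separately
print(extended)                 # ['a', 'b', 'c', 'd', 'e']

print(len(appended), len(extended))   # 4 5
```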

In [None]:
l = ["c", "e", "a", "d", "b", "f"]
l.sort()
l

The <b>sort</b> method sorts the list in ascending order by default.

In [None]:
l = ["c", "e", "a", "d", "b", "f"]
l.sort(reverse=True)
l

Setting the `reverse` parameter to True will sort the list in descending order.

In [None]:
l = ["c", "e", "a", "d", "b", "f"]
sorted(l)

Python also has a built-in function <b>sorted</b>, which works the same as the <b>sort</b> method except that it returns a new sorted list and leaves the original list unchanged.

In [None]:
l

In [None]:
sorted(l, reverse=True)
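
A common pitfall is saving the result of the <b>sort</b> method, which sorts in place and returns None. A sketch contrasting it with <b>sorted</b>:

```python
l = ["c", "e", "a", "d", "b", "f"]
returned = l.sort()      # sorts l in place
print(returned)          # None
print(l)                 # ['a', 'b', 'c', 'd', 'e', 'f']

l = ["c", "e", "a", "d", "b", "f"]
new_list = sorted(l)     # returns a new list
print(new_list)          # ['a', 'b', 'c', 'd', 'e', 'f']
print(l)                 # ['c', 'e', 'a', 'd', 'b', 'f'] (unchanged)
```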

In [None]:
l = ["c", "e", "a", "d", "b", "f"]
l.reverse()
l

The <b>reverse</b> method reverses the order of the elements in place. Do not confuse reversing a list with sorting it in descending order.

In [None]:
Image("classdata/images/list2.png")

In [None]:
l = ["c", "e", "a", "d", "b", "f"]
reversed(l)

Python also has a built-in function <b>reversed</b>. Unlike the <b>reverse</b> method, it does not modify the list; it returns a reverse iterator, which you can convert to a list with <b>list</b>.

In [None]:
list(reversed(l))

In [None]:
l[::-1]

Another way to get a reversed list is slicing with the `step` parameter set to -1.
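
A sketch comparing the three ways to reverse a list; only the <b>reverse</b> method changes the original:

```python
l = ["c", "e", "a", "d", "b", "f"]

r1 = l[::-1]              # new list via slicing
r2 = list(reversed(l))    # new list built from the reverse iterator
print(r1 == r2)           # True
print(l)                  # ['c', 'e', 'a', 'd', 'b', 'f'] (unchanged)

l.reverse()               # reverses l in place and returns None
print(l == r1)            # True
```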

In [None]:
l = ["a", "b", "c", "a", "b", "c"]
l.count("a")

The <b>count</b> method returns the number of elements with the specified value.

## Exercises - Lists

## Tuples

A tuple is a collection that is ordered, immutable, indexed, and written with round brackets. It allows duplicate members.

In [None]:
t = ()
print(t, type(t))

In [None]:
t = tuple()
print(t, type(t))

In [None]:
t = ("a", "b", "c")
print(t, type(t))

In [None]:
t[0]

Indexing and slicing of tuples work the same way as for strings and lists.

In [None]:
t[0] = "A"

Once a tuple is created, you cannot change its values because tuples are immutable; attempting to do so raises a TypeError.

In general, we use a list to store similar (homogeneous) items, whereas we use a tuple to store heterogeneous items that together describe one entity.

In [None]:
employees = [("Alice", 30, "female"), ("Bob", 25, "male"), ("Tom", 34, "male")]

In this example, `employees` is a list of three tuples, each of which consists of three items.
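
Each tuple can be unpacked into separate variables, and because tuples are immutable, an updated record has to be a new tuple. A sketch (the names `name`, `age`, `gender`, and `updated` are illustrative):

```python
employees = [("Alice", 30, "female"), ("Bob", 25, "male"), ("Tom", 34, "male")]

# Unpack one tuple into three variables (the number of names must match the tuple length)
name, age, gender = employees[0]
print(name, age, gender)          # Alice 30 female

try:
    employees[0][1] = 31          # tuples do not allow item assignment
except TypeError as e:
    print("TypeError:", e)

updated = (name, age + 1, gender) # build a new tuple instead
print(updated)                    # ('Alice', 31, 'female')
print(employees[0])               # ('Alice', 30, 'female') (unchanged)
```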

In [None]:
employees[0]

In [None]:
employees[0][0]

In [None]:
employees[0][1]

## Dictionaries

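A dictionary is a collection of key-value mappings that is mutable, indexed by keys, and written with curly brackets. Since Python 3.7, dictionaries remember the order in which keys were inserted, but you access values by key, not by position. If you look up a key in the dictionary, it returns its value, but not vice versa. It allows no duplicate keys.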

When designing a dictionary, you should think about what should be the key and what should be the value. It depends on the purpose of the dictionary.

In [None]:
d = {}
print(d, type(d))

In [None]:
d = dict()
print(d, type(d))

In [None]:
buildings = {"UCC": "University Capitol Center", "PBB": "Pappajohn Business Building"}
print(buildings, type(buildings))

`buildings` is a dictionary for UI building name abbreviations. If you look up an abbreviation, it returns its full name.

In [None]:
Image("classdata/images/dict.png")

If you look up a key in a dictionary, it returns the corresponding value, provided the key exists.

In [None]:
buildings["UCC"]

In [None]:
buildings["PBB"]

In [None]:
buildings["IMU"]

If the key is not in the dictionary, Python raises a KeyError.
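
When a key might be missing, the dictionary's <b>get</b> method returns None (or a default you supply) instead of raising a KeyError. A sketch:

```python
buildings = {"UCC": "University Capitol Center", "PBB": "Pappajohn Business Building"}

print(buildings.get("UCC"))              # University Capitol Center
print(buildings.get("IMU"))              # None (no KeyError)
print(buildings.get("IMU", "Unknown"))   # Unknown (a default of your choice)
```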

In [None]:
buildings.keys()

In [None]:
buildings.values()

In [None]:
buildings["IMU"] = "Iowa Memorial Union"
buildings
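
<b>keys</b> and <b>values</b> return view objects; wrap them in <b>list</b> if you need actual lists. Assigning to an existing key overwrites its value instead of adding a new mapping. A sketch on a separate dictionary, `campus` (the shortened building name is only an illustrative value):

```python
campus = {"UCC": "University Capitol Center", "PBB": "Pappajohn Business Building"}

campus["IMU"] = "Iowa Memorial Union"       # new key: adds a mapping
campus["PBB"] = "Pappajohn Bldg."           # existing key: overwrites the value

print(list(campus.keys()))     # ['UCC', 'PBB', 'IMU']
print(list(campus.values()))
print(list(campus.items()))    # (key, value) pairs as tuples
print(len(campus))             # 3: overwriting did not add a key
```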

In [None]:
"IMU" in buildings

You can check if a key is in a dictionary using the <b>in</b> operator.

In [None]:
"SH" in buildings

In [None]:
len(buildings)

The length of a dictionary is the number of key-value mappings in the dictionary. 

In [None]:
buildings = {
    "UCC": {"name": "University Capitol Center",
            "address": "200 South Capitol Street, Iowa City, IA 52240", 
            "year": 1981}, 
    "PBB": {"name": "Pappajohn Business Building",
            "address": "21 East Market Street, Iowa City, IA 52242",
            "year": 1993},
    "IMU": {"name": "Iowa Memorial Union", 
            "address": "125 North Madison Street, Iowa City, IA 52242", 
            "year": 1925}
}

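The value associated with a key can be any Python object, including another dictionary. Here, each abbreviation maps to a nested dictionary of details about the building.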

In [None]:
buildings["UCC"]

In [None]:
buildings["UCC"]["address"]

## Exercises - Dictionaries

## Sets

A set is a collection that is unordered, mutable, unindexed, and written with curly brackets. It allows no duplicates. Dictionaries and sets are both written with curly brackets, but a set holds only individual elements, like keys without corresponding values.
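
Because dictionaries and sets share curly brackets, `{}` on its own creates an empty dictionary; an empty set has to be created with `set()`. Duplicates are dropped as soon as a set is built. A sketch:

```python
empty1 = {}
empty2 = set()
print(type(empty1))    # <class 'dict'>
print(type(empty2))    # <class 'set'>

pets = {"cat", "dog", "cat", "bird"}   # "cat" appears twice
print(len(pets))       # 3
```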

In [None]:
s = set()
print(s, type(s))

In [None]:
s = {"cat", "dog", "bird"}
print(s, type(s))

In [None]:
s[0]

Sets are not indexed, which means you cannot access the elements by their index positions; trying to do so raises a TypeError.

In [None]:
"dog" in s

In [None]:
"cow" in s

### Set Methods

In [None]:
s.add("fish")
s

The <b>add</b> method adds an element to the set. Note that there is no <b>append</b> method in sets.

In [None]:
s.add("fish")
s

Sets do not allow duplicate values. If the element to be added already exists, it does not add the element.

In [None]:
s.update({"elephant", "horse", "whale"})
s

The <b>update</b> method updates the current set by adding the items from another set (or any other iterable).

Note that the <b>add</b> method adds a single element to a set, while the <b>update</b> method adds a group of elements. 

In [None]:
s.remove("cat")
s

The <b>remove</b> method removes the specified element from the set.
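
<b>remove</b> raises a KeyError if the element is not in the set; the <b>discard</b> method removes an element only if it is present. A sketch on an illustrative set:

```python
animals = {"dog", "bird", "fish"}

animals.discard("cow")     # no error, although "cow" is missing
try:
    animals.remove("cow")  # remove raises a KeyError for a missing element
except KeyError as e:
    print("KeyError:", e)

animals.discard("dog")
print(animals)             # {'bird', 'fish'} in arbitrary order
```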

## Built-in Functions

In [None]:
s = "I'm learning the Python programming language."
dir(s)

The <b>dir</b> function returns a list of the specified object's properties and methods.

In [None]:
help(print)

The <b>help</b> function invokes the built-in help system and displays the documentation for the given object.

In [None]:
len(s)

The <b>len</b> function returns the length of an object.

In [None]:
print(s)

The <b>print</b> function prints the specified message to the screen or other standard output device.

In [None]:
print(s)
print(s)

In [None]:
print(s, end=" ")
print(s)

The `end` parameter specifies the string printed after the last value. Its default is a newline (`\n`), which is why each <b>print</b> call normally starts a new line.

In [None]:
range(0, 10)

The <b>range</b>(start, stop) function returns a sequence of integers that starts at `start`, increments by 1, and ends at `stop` - 1. Strictly speaking, range is not a function but an immutable sequence type, which is why `range(0, 10)` is displayed as is rather than as a list of numbers.

In [None]:
list(range(0, 10))

In [None]:
list(range(10))

You can omit `start` when it is 0.

In [None]:
list(range(0, 10, 2))

The <b>range</b>(start, stop, step) function increments by `step`. 
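
A few more <b>range</b> calls, including a negative `step`, which counts down:

```python
print(list(range(5)))           # [0, 1, 2, 3, 4]
print(list(range(1, 10, 3)))    # [1, 4, 7]: stops before 10
print(list(range(10, 0, -2)))   # [10, 8, 6, 4, 2]: counts down, stops before 0
print(len(range(0, 10, 2)))     # 5: len works without building a list
```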

In [None]:
l = ["a", "b", "c"]
reversed(l)

The <b>reversed</b> function returns a reversed iterator.

In [None]:
list(reversed(l))

In [None]:
l = ["c", "a", "b"]
sorted(l)

The <b>sorted</b> function returns a sorted list.

In [None]:
sorted(l, reverse=True)

In [None]:
type(l)

The <b>type</b> function returns the type of an object.

In [None]:
l1 = ["Alice", "Bob", "Tom"]
l2 = [30, 25, 34]
zip(l1, l2)

The <b>zip</b> function returns an iterator of tuples, where the i-th tuple contains the i-th element from each of the lists.

In [None]:
list(zip(l1, l2))
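
Two details worth knowing about <b>zip</b>: it stops as soon as the shortest input runs out, and its pairs can be turned directly into a dictionary. A sketch:

```python
l1 = ["Alice", "Bob", "Tom"]
l2 = [30, 25, 34]

print(list(zip(l1, [30, 25])))   # [('Alice', 30), ('Bob', 25)]: stops at the shorter list

ages = dict(zip(l1, l2))          # pairs become key-value mappings
print(ages)                       # {'Alice': 30, 'Bob': 25, 'Tom': 34}
print(ages["Bob"])                # 25
```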

### Difference between Functions and Methods

- A function looks like this: <b>function_name(something)</b>, e.g., sorted(l1)
- A method looks like this: <b>something.method_name()</b>, e.g., l1.sort()

A method always belongs to an object (e.g., string methods only work on string objects, list methods only work on list objects, etc.), while a function does not necessarily (e.g., you can use the <b>len</b> function on strings, lists, tuples, dictionaries, sets, and other collections).

### Built-in Functions for Type Conversion

In [None]:
1 == "1"

In [None]:
s = str(1)
print(s, type(s))

The <b>str</b> function converts the specified value into a string.

In [None]:
str(1) == "1"

In [None]:
num = int(s)
print(num, type(num))

The <b>int</b> function converts the specified value into an integer.

In [None]:
num = float(s)
print(num, type(num))

The <b>float</b> function converts the specified value into a floating point number.

In [None]:
l = ["a", "b", "c", "b", "a"]
print(l, type(l))

In [None]:
s = set(l)
print(s, type(s))

The <b>set</b> function creates a set object.

Sometimes you need to convert a list to a set to take advantage of the characteristics of sets. For example, to count the number of unique items in a list, you can convert the list to a set and then count the items in the set.
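
A sketch of that idea on an illustrative sentence:

```python
words = "the cat and the dog and the bird".split()

print(len(words))          # 8 words in total
print(len(set(words)))     # 5 unique words

# A set has no order, so sort it when you need a stable listing
print(sorted(set(words)))  # ['and', 'bird', 'cat', 'dog', 'the']
```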

## Operators

### Boolean Operators

In [None]:
p = True
q = False

There are two Boolean values in Python: True and False. 

In [None]:
p & q     # bitwise AND; on Booleans it gives the same result as p and q

In [None]:
p | q     # bitwise OR; on Booleans it gives the same result as p or q

In [None]:
not p
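
Python's logical operators are the keywords <b>and</b>, <b>or</b>, and <b>not</b>. The symbols `&` and `|` are bitwise operators that happen to give the same results on Booleans, but they bind more tightly than comparisons such as `==`, so comparisons combined with them need parentheses. A sketch:

```python
p = True
q = False

print(p and q, p or q, not p)     # False True False

# On Booleans, & and | give the same results as and / or
print((p & q) == (p and q))       # True
print((p | q) == (p or q))        # True

# & and | bind more tightly than ==, so comparisons must be wrapped in parentheses
x = "P"
print((x == "p") | (x == "P"))    # True
print(x == "p" or x == "P")       # True: or binds more loosely than ==
```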

### Comparison Operators

Python supports the usual comparison operators: `==`, `!=`, `<`, `<=`, `>`, and `>=`, as well as the identity operators `is` and `is not`.

In [None]:
"a" == "b"

Do not confuse the <b>==</b> (equality) operator with the <b>=</b> (assignment) operator.
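
`==` compares values, whereas <b>is</b> checks whether two names refer to the very same object. Python 3.8 and later emit a SyntaxWarning when <b>is</b> is used with a literal such as `"a"`, because the result depends on implementation details. A sketch with lists, where the difference is visible:

```python
a = ["x", "y"]
b = ["x", "y"]
c = a

print(a == b)           # True: same contents
print(a is b)           # False: two separate list objects
print(c is a)           # True: c refers to the same object as a

# is / is not are typically used to compare with None
value = None
print(value is None)    # True
```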

In [None]:
"a" is "b"

In [None]:
"a" != "b"

In [None]:
"a" is not "b"

In [None]:
"a" < "b"

In [None]:
Image(url="https://callisto.ggsrv.com/imgsrv/FastFetch/UBER1/9781682171400_00127")

Strings are compared character by character. At the first position where they differ, the character with the lower Unicode code point is considered smaller.
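
The built-in <b>ord</b> function returns a character's Unicode code point, which makes these comparisons easy to check. A sketch:

```python
print(ord("A"), ord("a"))   # 65 97
print("A" < "a")            # True: 65 < 97
print("apple" < "banana")   # True: decided at the first character, "a" < "b"
print("Zebra" < "apple")    # True: "Z" (90) < "a" (97)
print("abc" < "abcd")       # True: a prefix is smaller than the longer string
```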

In [None]:
"A" < "a"

## ▪ Flow Control

There are three main categories of program control flow in Python:
- Branches (if, elif, else)
- Loops (for loops, while loops)
- Function calls

## Branches

An <b>if</b> statement takes the form of an <b>if</b> test, followed by one or more optional <b>elif</b> ("else if") statements and a final optional <b>else</b> statement. 

The tests and the <b>else</b> part each have an associated block of nested statements, indented under a header line.

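Suppose the <b>if</b> test is test1 with the block statements1, the <b>elif</b> test is test2 with the block statements2, and the <b>else</b> block is statements3. When test1 evaluates to True, Python executes statements1 and skips the rest. When test1 is False, it moves down to the <b>elif</b> statement and checks test2; if test2 is True, it executes statements2, and so on for any further <b>elif</b> statements. If all tests prove False, Python reaches the <b>else</b> statement and executes statements3.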

In [None]:
s = "Python"

In [None]:
if s[0].islower():
    print("S starts with a lowercase letter.")

Make sure to end the <b>if</b> test with a colon (:). When you type a colon and press Enter, Jupyter automatically indents the next line so that you can write the nested statements.

It prints nothing because the <b>if</b> test evaluates to False: "P" is uppercase.

In [None]:
if s[0].islower():
    print("S starts with a lowercase letter.")
else:               # for the rest of the cases
    print("S starts with an uppercase letter.")

In [None]:
if s[0].islower():
    print("S starts with a lowercase letter.")
elif s[0].isupper():
    print("S starts with an uppercase letter.")
else:
    print("S starts with a non-alphabetical letter.")

In [None]:
if s[0].islower():
    print("S starts with a lowercase letter.")
else:
    if s[0].isupper():
        print("S starts with an uppercase letter.")
    else:
        print("S starts with a non-alphabetical letter.")

Note that <b>if</b> statements can be nested.

Note also that this code using nested <b>if-else</b> statements works exactly the same as the previous code using <b>if-elif-else</b> statements. You can write two different versions of code that have equivalent logic and work the same, just as two different English sentences can have exactly the same meaning.

In [None]:
if (s[0] == "p") | (s[0] == "P"):
    print("S starts with p.")
else:
    print("S does not start with p.")

You can set a compound condition in a test using Boolean operators. Enclose each subcondition in round brackets: `|` and `&` bind more tightly than `==`, so without the brackets Python would try to evaluate `"p" | s[0]` and raise a TypeError.
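
A sketch of three equivalent ways to write the test above; the last normalizes the case first, so it needs only one comparison:

```python
s = "Python"

cond1 = (s[0] == "p") | (s[0] == "P")   # bitwise OR: parentheses required
cond2 = s[0] == "p" or s[0] == "P"      # logical or: no parentheses needed
cond3 = s[0].lower() == "p"             # lowercase first, then compare once

print(cond1, cond2, cond3)              # True True True
```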

## Exercises - Branches

## For Loops

A <b>for</b> loop steps through the items in a list or any other ordered sequence or iterable object. 

- It begins with a header line that specifies an assignment target (or targets), along with the object you want to step through. 
- The header is followed by a block of indented statements that you want to repeat. 
- When Python runs a <b>for</b> loop, it assigns the items in the iterable <i>object</i> to the <i>target</i> one by one and executes the loop body for each. 
- The loop body typically uses the assignment target to refer to the current item in the sequence. 

In [None]:
l = ["Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

In [None]:
for item in l:
    print(item)

Make sure to end the header line with a colon (:).

In [None]:
Image("classdata/images/for_loop.png")

In [None]:
for item in sorted(l):
    print(item)

In [None]:
s = "Python"

In [None]:
for c in s:
    print(c, end="")     # The end parameter sets a string appended after the last value.

A <b>for</b> loop can iterate over a string, one character at a time. This loop produces the same output as `print(s)`, except that no newline is printed at the end.

In [None]:
for c in s:
    print(c, end=" ")

In [None]:
for c in reversed(s):
    print(c, end="")

In [None]:
employees = [("Alice", 30), ("Bob", 25), ("Tom", 34)]

A <b>for</b> loop can iterate over any iterable object. 

In [None]:
for t in employees:
    print(t)

In [None]:
for t in employees:
    name = t[0]
    age = t[1]
    print("Name: {}, Age: {}".format(name, age))

You can access the items of each tuple by index and assign them to separate variables.

In [None]:
for name, age in employees:
    print("Name: {}, Age: {}".format(name, age))

You can also unpack each tuple directly in the header line instead of in the loop body. The header line unpacks the current tuple automatically on each iteration.

In [None]:
l1 = ["a", "b", "c"]
l2 = ["x", "y", "z"]

In [None]:
for item1, item2 in zip(l1, l2):
    print(item1 + item2)

The <b>zip</b> function yields one tuple per iteration, which the header line unpacks into `item1` and `item2`.

### Iteration Using Index

You can also iterate over an iterable object using its index.

In [None]:
l = ["Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

In [None]:
for item in l:
    print(item, end=" ")

In [None]:
for i in range(len(l)):
    item = l[i]
    print(item, end=" ")

In [None]:
Image("classdata/images/for_loop2.png")

In [None]:
for i in range(len(l)):
    item = l[i]
    print("[{}] {}".format(i, item))

Iteration using the index is useful when you need to track the current location at each iteration. 
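
The built-in <b>enumerate</b> function gives you the index and the item together, without calling <b>range</b> and <b>len</b>. A sketch with an illustrative list:

```python
months = ["Jan", "Feb", "Mar", "Apr"]

for i, item in enumerate(months):
    print("[{}] {}".format(i, item))

# enumerate yields (index, item) tuples; start= sets the first index
numbered = list(enumerate(months, start=1))
print(numbered)   # [(1, 'Jan'), (2, 'Feb'), (3, 'Mar'), (4, 'Apr')]
```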

In [None]:
for i in range(len(l)):
    if i % 3 == 2:                          # Print every 3rd item (indices 2, 5, 8, 11)
        print("[{}] {}".format(i, l[i]))

You can use the index in a condition inside the loop body. 

### Nested For Loops

In [None]:
l1 = ["a", "b", "c"]
l2 = ["x", "y", "z"]

In [None]:
for item1 in l1:               # outer for loop
    for item2 in l2:           # inner for loop
        print(item1 + item2)

The inner <b>for</b> loop runs to completion for every item of the outer <b>for</b> loop, so all nine combinations are printed.

In [None]:
for item2 in l2:               # outer for loop
    for item1 in l1:           # inner for loop
        print(item1 + item2)

The order of the output depends on which loop is the outer one and which is the inner one.

### List Comprehensions

A list comprehension is a concise way to create a new list from another list (or any iterable) in a single expression, instead of writing a <b>for</b> loop that appends items one by one.

In [None]:
l1 = ["a", "b", "c"]

In [None]:
l2 = []
for item in l1:
    l2.append(item + "*")

l2

Here, we create a new list `l2` by adding an asterisk (*) to each letter in `l1`.

In [None]:
l2 = [item + "*" for item in l1]
l2

Using list comprehension, you simply describe the process within a single line.

In [None]:
l = ["Python", "R", "SAS", "SPSS", "Matlab", "Stata"]

In [None]:
[item for item in l]

This creates a new list with exactly the same content as `l`.

In [None]:
[item.lower() for item in l]

In [None]:
[(item, item.lower()) for item in l]

In [None]:
[item for item in l if item.startswith("S")]

In [None]:
[item for item in l if len(item) > 3]
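
A filter and a transformation can be combined in one comprehension. The sketch below builds the same list with a comprehension and with an equivalent <b>for</b> loop:

```python
l = ["Python", "R", "SAS", "SPSS", "Matlab", "Stata"]

# Keep the names that start with "S" and lowercase them
lowered = [item.lower() for item in l if item.startswith("S")]
print(lowered)               # ['sas', 'spss', 'stata']

# The equivalent for loop
lowered2 = []
for item in l:
    if item.startswith("S"):
        lowered2.append(item.lower())

print(lowered == lowered2)   # True
```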

Besides being shorter, list comprehensions often run somewhat faster than equivalent <b>for</b> loops that call <b>append</b>, although the size of the speedup depends on the task and the Python version.

### Break, Continue, and Pass in Loops

- <b>break</b>: exits the current loop (past the entire loop statement)
- <b>continue</b>: jumps to the header line of the current loop and moves on to the next item
- <b>pass</b>: does nothing at all (i.e., an empty statement placeholder)

These statements are typically used in an <b>if</b> statement. 

In [None]:
l = ["a1", "a2", "b1", "b2", "c1", "c2"]

In [None]:
for item in l:
    if item.startswith("c"):
        break       # Stops!
    print(item)

The <b>for</b> loop prints each item in `l` from the beginning and stops as soon as it reaches <i>c1</i>, which is not printed.

In [None]:
for item in l:
    if item.endswith("2"): 
        continue     # Moves on to the next iteration of the loop
    print(item)            

In [None]:
for item in l:
    if item.endswith("1"):
        print(item)            

This <b>for</b> loop works the same as the previous one. Again, you can write two different versions of code that work exactly the same.

In [None]:
for item in l:
    pass

The <b>pass</b> statement is used when the syntax requires a statement but you have nothing useful to say. It is often used to code an empty body for a compound statement.

In [None]:
for item in l:
    

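Leaving the loop body empty raises an IndentationError, because a loop body must contain at least one statement; use <b>pass</b> when there is nothing to do.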

## Exercises - For Loops

## User-Defined Functions

A function is a block of reusable code that performs a specific action. The advantages of using functions include:
- reducing duplication of code
- decomposing complex problems into simpler pieces
- improving the clarity or readability of the code
- reusing code
- hiding information (implementation details)

In [None]:
s1 = "\t\t\tI'm learning Python.\n"
s1 = s1.replace(" ", "")
s1 = s1.replace("\t", "")
s1 = s1.replace("\n", "")
s1

In [None]:
s2 = "\t\t\tI'm learning data analytics.\n"


In [None]:
s3 = "\t\t\tI'm learning programming.\n"


In [None]:
def remove_all_whitespaces(s):
    s = s.replace(" ", "")
    s = s.replace("\t", "")
    s = s.replace("\n", "")
    
    return s

In [None]:
remove_all_whitespaces(s1)

We can simply call the function to do exactly the same things to a different variable. 

In [None]:
remove_all_whitespaces(s2)

In [None]:
remove_all_whitespaces(s3)

In [None]:
def add_asterisk(s):
    return s + "*"

In this example, `add_asterisk` is the function name, and `s` is its only argument (or parameter). Make sure to put a colon (:) after the parentheses. `s + "*"` is the return value of this function.

In [None]:
add_asterisk("a")

To call a function, specify the function name followed by the argument values enclosed in parentheses.

In [None]:
def combine(s1, s2):
    return s1 + s2

A function can have multiple arguments. 

In [None]:
combine("a", "b")

In [None]:
def combine_print(s1, s2):
    print("If {} combines with {}, it becomes {}.".format(s1, s2, s1 + s2))

A function does not have to return a value; this one only prints a sentence.
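
A function without a <b>return</b> statement returns None, so saving its result gives you None. A sketch reusing the definition above (the name `result` is illustrative):

```python
def combine_print(s1, s2):
    print("If {} combines with {}, it becomes {}.".format(s1, s2, s1 + s2))

result = combine_print("a", "b")   # prints: If a combines with b, it becomes ab.
print(result)                      # None: nothing was returned
```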

In [None]:
combine_print("a", "b")

In [None]:
def get_first_2_items(l):
    return l[0], l[1]

A function can return multiple values as a tuple. 

In [None]:
get_first_2_items(["a", "b", "c", "d", "e"])

In [None]:
a, b = get_first_2_items(["a", "b", "c", "d", "e"])
print(a, b)

A function's return value can be saved in a variable; when the function returns a tuple, you can unpack it into several variables at once.

In [None]:
def just_print():
    print("Hello, world!")

A function can have no arguments. 

In [None]:
just_print()

In [None]:
def remove_all_whitespaces(s):
    s = s.replace(" ", "")
    s = s.replace("\t", "")
    s = s.replace("\n", "")
    
    return s

You can write a series of operations in the function body.

In [None]:
remove_all_whitespaces("\t\t\tI'm learning the Python programming language.\n")

In [None]:
def remove_all_whitespaces2(s):
    if type(s) != str:
        return "Not a string!"
    
    s = s.replace(" ", "")
    s = s.replace("\t", "")
    s = s.replace("\n", "")
    
    return s

You can use <b>if</b> statements in the function body to make the function respond differently to the arguments. 

In [None]:
remove_all_whitespaces2(100)

In [None]:
def remove_all_whitespaces3(s):
    if type(s) != str:
        return "Not a string!"
    
    s = s.replace(" ", "")
    s = s.replace("\t", "")
    s = s.replace("\n", "")
    
    return s

    print("Can I get printed???")

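You can use <b>if</b> statements to check the soundness of arguments. Place those checks at the beginning of the function body, so that the rest of the code does not run for unsound arguments. Note also that <b>return</b> exits the function immediately: the <b>print</b> call placed after it never executes, so nothing extra is printed.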
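
A sketch of the same pattern on a hypothetical helper, `describe_length`, that validates its argument first and returns early when it is unsound:

```python
def describe_length(s):
    # Hypothetical helper: reject non-strings before doing any work
    if type(s) != str:
        return "Not a string!"

    # Only reached when s is a string
    return "{} has {} characters.".format(s, len(s))

print(describe_length(100))        # Not a string!
print(describe_length("Python"))   # Python has 6 characters.
```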

In [None]:
remove_all_whitespaces3("\t\t\tI'm learning the Python programming language.\n")

In [None]:
def get_first_n_items(l, n=3):
    return l[:n]

You can set the default value of an argument in a function.

In [None]:
get_first_n_items(["a", "b", "c", "d", "e"])

You then have the option of not specifying a value for that argument when calling the function. If you do not specify a value, the argument takes its default value when the function executes.
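
A sketch showing the default in use and two ways to override it, positionally or by keyword (the list `letters` is illustrative):

```python
def get_first_n_items(l, n=3):
    return l[:n]

letters = ["a", "b", "c", "d", "e"]
print(get_first_n_items(letters))        # ['a', 'b', 'c']: n defaults to 3
print(get_first_n_items(letters, 2))     # ['a', 'b']: positional override
print(get_first_n_items(letters, n=4))   # ['a', 'b', 'c', 'd']: keyword override
```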

## Exercises - User-Defined Functions