## Unit 4A: How to Write Functions and Use Modules

As you spend more time coding, you will notice that many of your everyday tasks are repetitive. You will find yourself writing the same or similar code for similar tasks. To make life easier, we can put our code in <b>functions</b>, which are named blocks of code that take input and perform the tasks you put inside them. For example, you can write a function that iterates through an HTML file and returns everything contained by header tags. Once the function is written, you won't need to write that code again. All you have to do is call the function with different inputs.

We have used functions before. Python has built-in functions that we have been using since Unit 1. Things like <i>print()</i>, <i>len()</i>, and <i>range()</i> are all examples of built-in functions, and <i>split()</i> and <i>strip()</i> are string methods (functions that belong to strings) that we have also used in previous units. Just as Python offers built-in functions, we can make our own as well.
    
From using Python's built-in functions, you might have noticed that some functions directly alter their target in-place, while others return a value. We won't go too deep into the former. I would just like to introduce the term <b>Object-Oriented Programming</b> (OOP for short), which you can look into once you feel comfortable with Python. For now, we will focus on writing functions that simply take some input and return some output.
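As a small illustration of the difference, compare the built-in <i>sorted()</i> function, which returns a new list and leaves the original untouched, with the list method <i>.sort()</i>, which rearranges the list in-place and returns None. The client names are sample data taken from later in this unit.

```python
clients = ['The Bakery', 'Concrete Jungle', 'HumAnS Lab']

# sorted() returns a NEW sorted list; the original list is unchanged
alphabetical = sorted(clients)
print(alphabetical)  # ['Concrete Jungle', 'HumAnS Lab', 'The Bakery']
print(clients)       # ['The Bakery', 'Concrete Jungle', 'HumAnS Lab']

# .sort() changes the list itself (in-place) and returns None
result = clients.sort()
print(clients)  # ['Concrete Jungle', 'HumAnS Lab', 'The Bakery']
print(result)   # None
```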

### Defining Functions

Defining a function is simple. The syntax is generally:

    def function_name(argument):
        do_something_with_argument
        
Just as with loops, anything contained by a function needs to be indented.

In [1]:
def print_a_message(message):
    print(message)

In [2]:
print_a_message("This function prints the message we put in the 'message' argument.")

This function prints the message we put in the 'message' argument.


In the example above, we defined a function (using <b>def</b>) called "print_a_message." This function takes an argument called "message." Notice that you can use the argument you defined as a variable inside of the function. You can also pass a variable to a function as an argument:

In [3]:
a_message = "This function prints the message we put in the 'message' argument."
print_a_message(a_message)

This function prints the message we put in the 'message' argument.


<i>*Notice that we can still call the function from any cell in this notebook as long as the notebook is open. However, if you close the notebook (or restart its kernel), you will need to re-run the cell that contains the function before you can use it again.</i>

Functions can take multiple arguments as well. In Unit 3 we searched through two lists, one of previous client names and another of previous project names. We then printed whether the client name we specified also appeared in the list of project names. We can put this in a function that takes 3 arguments: the list of previous client names, the list of previous project names, and the name of the client we want to check for.

In [4]:
#define our function
def check_client_project(previous_clients, previous_projects, client):
    
    if client in previous_clients and client in previous_projects:
        print(f"{client} had the same client name and project name.")
    else:
        print(f"{client} does not have the same client name and project name.")
        
#define our lists
previous_clients = ['Concrete Jungle', 'HumAnS Lab', 'The Bakery', 'The Carter Center', 'GreenLight Fund']
previous_projects = ['Concrete Jungle', 'Baby Kicks', 'The Bakery', 'The Carter Center', 'GreenLight Fund']

#check for Concrete Jungle
check_client_project(previous_clients, previous_projects, 'Concrete Jungle')

#check for HumAnS Lab
check_client_project(previous_clients, previous_projects, 'HumAnS Lab')

Concrete Jungle had the same client name and project name.
HumAnS Lab does not have the same client name and project name.


In the example above, notice that the code inside of the function is exactly the same as the code that we looked at in Unit 3. We have simply put it inside of a function. Putting your code inside of a function is not strictly necessary, but it does make life easier. Say we want to check whether the client and project share a name for every name in the list. We could write out the function call over and over, once for each value in the list. However, because we have stored the code in a function, we can call the function inside of a for loop:

In [5]:
for client in previous_clients:
    check_client_project(previous_clients, previous_projects, client)

Concrete Jungle had the same client name and project name.
HumAnS Lab does not have the same client name and project name.
The Bakery had the same client name and project name.
The Carter Center had the same client name and project name.
GreenLight Fund had the same client name and project name.


In the example above, calling the "check_client_project" function inside of a for loop shortened our code from 5 lines to 2 lines. Now imagine you had a list of 100 clients: this approach would shorten that code from 100 lines to 2 lines!

It is also important to point out that in the "check_client_project" function we defined above, the argument names in the function definition happened to be the same as the names of the variables we passed to it. That is only a coincidence: Python matches the values you pass in to the function's arguments by their order, not by their names. Let's explore this a bit more to make sure we understand how functions work.
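Here is a minimal sketch of that idea using a hypothetical function called "describe". The variables we pass in are named "name_a" and "name_b", not "client" and "project", and the function still works because Python only looks at the order of the values.

```python
def describe(client, project):
    # Inside the function, the two values are known only as "client" and "project"
    return f"Client: {client}, Project: {project}"

# The caller's variable names have nothing to do with the parameter names
name_a = 'The Bakery'
name_b = 'Baby Kicks'
print(describe(name_a, name_b))  # Client: The Bakery, Project: Baby Kicks

# Python matches by position, so swapping the order swaps the roles
print(describe(name_b, name_a))  # Client: Baby Kicks, Project: The Bakery
```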

In the cell below, we define a function that simply prints what the first, second, and third arguments are, so we can track the different arguments as we play with their order.

In [6]:
def demo_function(first_argument, second_argument, third_argument):
    print(f"{first_argument} is the first argument")
    print(f"{second_argument} is the second argument")
    print(f"{third_argument} is the third argument")

In [7]:
previous_clients = ['Concrete Jungle', 'HumAnS Lab', 'The Bakery', 'The Carter Center', 'GreenLight Fund']
demo_function(previous_clients[0], previous_clients[1], previous_clients[2])

Concrete Jungle is the first argument
HumAnS Lab is the second argument
The Bakery is the third argument


In the example above, we gave the first three items in the "previous_clients" list as the three arguments to our "demo_function." This shows that the function matches the values to its arguments in the order they were given. If we change the order of the arguments we give the function, we get a differently ordered output.

In [8]:
previous_clients = ['Concrete Jungle', 'HumAnS Lab', 'The Bakery', 'The Carter Center', 'GreenLight Fund']
demo_function(previous_clients[2], previous_clients[1], previous_clients[0])

The Bakery is the first argument
HumAnS Lab is the second argument
Concrete Jungle is the third argument


Since we gave the arguments in reverse order, the function printed them in reverse order. Another way to give arguments to a function, so that their order doesn't matter, is to specify by name which argument receives which input. These are called <b>keyword arguments</b>:

In [9]:
previous_clients = ['Concrete Jungle', 'HumAnS Lab', 'The Bakery', 'The Carter Center', 'GreenLight Fund']
demo_function(third_argument=previous_clients[2], second_argument=previous_clients[1], first_argument=previous_clients[0])

Concrete Jungle is the first argument
HumAnS Lab is the second argument
The Bakery is the third argument


In the example above, we told the function which argument was receiving which value by using an "=" sign. Therefore, it did not matter that the arguments were in reverse order; the function still printed them in the original order.
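You can also mix the two styles: give some arguments by position and the rest by name. The rule is that positional arguments must come first. The sketch below uses a hypothetical function, "order_arguments", that returns its three arguments as a list instead of printing them, so the result is easy to inspect.

```python
def order_arguments(first_argument, second_argument, third_argument):
    # Return the three values in parameter order, whatever order they were passed in
    return [first_argument, second_argument, third_argument]

previous_clients = ['Concrete Jungle', 'HumAnS Lab', 'The Bakery']

# One positional argument first, then keyword arguments in any order
ordered = order_arguments(previous_clients[0],
                          third_argument=previous_clients[2],
                          second_argument=previous_clients[1])
print(ordered)  # ['Concrete Jungle', 'HumAnS Lab', 'The Bakery']

# The reverse is not allowed: a positional argument after a keyword argument
# is a SyntaxError, which is why the call below is commented out.
# order_arguments(first_argument=previous_clients[0], previous_clients[1], previous_clients[2])
```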

### Returning Values From Functions

Let's recall the <i>len()</i> function. We use this function to get the length of a string or list, and this length is returned as an integer. A <b>return</b> value is the value a function hands back to the code that called it. In the functions above, all we did was print some messages. However, say we want our function to give us back a value that we can save as a variable. To illustrate how to do this, let's recreate the <i>len()</i> function.

In [10]:
def get_length(list_or_string):
    
    count = 0
    
    for item in list_or_string:
        count += 1
        
    return count

In [11]:
previous_clients = ['Concrete Jungle', 'HumAnS Lab', 'The Bakery', 'The Carter Center', 'GreenLight Fund']

#get length of list
previous_clients_length = get_length(previous_clients)
print(previous_clients_length)

#get length of string
previous_clients_first_element_length = get_length(previous_clients[0])
print(previous_clients_first_element_length)

5
15


In the example above, we defined a function called "get_length" that iterates through a list or string and adds 1 to a counter for every item. At the end of the function, we return the count. Returning the count allows us to store the function's output in a variable, which we can then use later. Notice that we saved the length of the list to the variable "previous_clients_length" and the length of the first entry of the list to "previous_clients_first_element_length." All we do here is print each of these values, but we could use them for other things as well.
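As a sketch of those "other things", the cell below reuses "get_length" to add up the number of characters in every client name. It also defines a hypothetical "print_length" function that prints a length but has no return statement; a function like that hands back None, so there is nothing useful to save.

```python
def get_length(list_or_string):
    count = 0
    for item in list_or_string:
        count += 1
    return count

def print_length(list_or_string):
    # Hypothetical: prints a length but has no return statement
    print(get_length(list_or_string))

previous_clients = ['Concrete Jungle', 'HumAnS Lab', 'The Bakery', 'The Carter Center', 'GreenLight Fund']

# Because get_length() returns its count, we can add the results together
total_characters = 0
for name in previous_clients:
    total_characters += get_length(name)
print(total_characters)  # 67

# A function without a return statement gives back None
printed_result = print_length(previous_clients)  # prints 5
print(printed_result)  # None
```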

Although we only returned an integer in the example above, you can return any data structure. A function can only return one object. However, that object can be a list containing multiple pieces of data, and we can then select the data we want from the returned list.
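As a quick sketch, the hypothetical function below returns a list holding the shortest and longest client names. It uses the built-in <i>min()</i> and <i>max()</i> functions with key=len, which compares the names by their length. Besides selecting items by index, Python lets you unpack a returned list directly into several variables, as long as the number of variables matches the number of items.

```python
def get_shortest_and_longest(names):
    # Hypothetical helper: key=len compares names by length;
    # on a tie, min() and max() keep the name that appears first
    shortest = min(names, key=len)
    longest = max(names, key=len)
    # Return one object (a list) that holds both results
    return [shortest, longest]

previous_clients = ['Concrete Jungle', 'HumAnS Lab', 'The Bakery', 'The Carter Center', 'GreenLight Fund']

# Select items from the returned list by index...
result = get_shortest_and_longest(previous_clients)
print(result[0])  # HumAnS Lab
print(result[1])  # The Carter Center

# ...or unpack the list straight into two variables
shortest, longest = get_shortest_and_longest(previous_clients)
print(shortest, '|', longest)  # HumAnS Lab | The Carter Center
```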

Let's return to the GreenLight_Fund.html page for an example of this. We will write a function that reads the page into a list and then returns the client name and the website.

In [12]:
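def get_client_and_website(html_page):
    
    client = ''
    website = ''
    
    #the text that starts each line we are looking for
    client_prefix = '<p><u>Client</u>:'
    website_prefix = '<p><u>Website</u>:'
    
    with open(html_page, 'r') as infile:
        lines = infile.readlines()
        
        for line in lines:
            line = line.strip()
            
            #get client name: slice off exactly the prefix, then drop the closing </p> tag
            if line.startswith(client_prefix):
                client = line[len(client_prefix):].replace('</p>', '').strip()
    
            #get website: take the 5th piece after splitting on '>' and drop its last 3 characters ('</a')
            if line.startswith(website_prefix):
                website = line.split('>')[4][:-3]
    
    #save the output into a list and return said list
    output = [client, website]
    
    return output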

In [13]:
client_info = get_client_and_website('GreenLight_Fund.html')

In [14]:
client_name = client_info[0]
client_website = client_info[1]

print(client_name)
print(client_website)

The GreenLight Fund - Atlanta
https://greenlightfund.org/sites/atlanta/


In the example above, we read the GreenLight_Fund.html file into a list, iterated over that list, and extracted the client name and the website. Since we wanted two outputs but a function returns only one object, we saved both into a list and returned the list. Now "client_info" is a list containing the information we returned. We then saved the first item in the list to the "client_name" variable and the second item to the "client_website" variable.
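A caution when cleaning up lines of HTML like these: <i>lstrip()</i> and <i>rstrip()</i> remove any of the characters you give them, not an exact prefix or suffix. The lines below are made up, but they follow the same client-line pattern the function checks for, and they show why slicing off an exact prefix is safer.

```python
line = '<p><u>Client</u>: Concrete Jungle</p>'
prefix = '<p><u>Client</u>:'

# lstrip() removes ANY of the characters it is given from the left end,
# so it keeps going past the prefix and also removes the "C" of "Concrete"
stripped = line.lstrip('<p><u>Client</u>: ')
print(stripped)  # oncrete Jungle</p>

# Slicing off exactly len(prefix) characters removes only the prefix
sliced = line[len(prefix):].replace('</p>', '').strip()
print(sliced)  # Concrete Jungle

# rstrip() has the same issue at the right end: the final "p" of "Shop" goes too
shop = '<p><u>Client</u>: Baby Kicks Shop</p>'.rstrip('</p>')
print(shop)  # <p><u>Client</u>: Baby Kicks Sho
```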

### An Introduction to Modules

One of the primary reasons Python is one of the most widely used programming languages is its extensive collection of <b>modules</b>. There is a lot to unpack with modules, but the easiest way to think of them is as packages of other people's functions that you can use in your own code. For example, in the previous section, we wrote a function called "get_client_and_website" that scraped the "GreenLight_Fund.html" page for the client name and website. We could package this function up as a module that other people could use in their code as well. We won't go over how to create modules, but you can think of them as collections of useful functions that don't come with Python by default.

That being said, Python does come with several modules. For example, there is a module called <b>re</b> (short for <b>regular expression</b>) that allows us to match patterns in a string. It's not important that you understand re or regular expressions. This section just introduces how to use modules by using one that comes with Python.

To tell your code that you are using a module, you use <b>import</b>.

In [15]:
import re

Now that we have imported re, we can use its associated functions:

In [16]:
website = 'https://greenlightfund.org/sites/atlanta/'

if bool(re.match('^https', website)):
    print(f"{website} is a website.")

https://greenlightfund.org/sites/atlanta/ is a website.


In the example above, we introduced a few new things. These aren't a focal point of this course, but we use them as an example to expose you to some tools that might be useful as you move forward. First, <b>bool()</b> is a function that converts a value to either True or False.
    
Inside of the bool() function, we use the <b>re.match()</b> function. Here, the <b>match()</b> function is part of the <b>re</b> module. When you import a module and want to use one of its functions, you write the module name, a period, and then the function name. The general syntax for using a function in a module is:

    module_name.function_name(arguments)

We could do a whole lesson on regular expressions. Since they aren't a main part of this course, we will gloss over the details, but regular expressions are very useful, and we encourage you to explore them on your own. Basically, the match() function checks whether the string in the website variable starts with "https". Because it does, match() finds a match, bool() returns True, and the if statement prints the message.
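To see what bool() is working with, the sketch below prints what re.match() returns on its own. When the pattern matches at the start of the string, it returns a match object (its exact printed form varies between Python versions); when it doesn't, it returns None. bool() turns the first into True and None into False. The second string is made up for illustration.

```python
import re

website = 'https://greenlightfund.org/sites/atlanta/'
other = 'greenlightfund.org'  # made-up string that does not start with "https"

# re.match() returns a match object when the pattern matches at the start...
print(re.match('^https', website))  # a match object
# ...and None when it does not
print(re.match('^https', other))    # None

# bool() turns those results into True or False
print(bool(re.match('^https', website)))  # True
print(bool(re.match('^https', other)))    # False
```

An if statement already treats a match object as true and None as false, so <i>if re.match('^https', website):</i> would work without bool(); this unit keeps bool() to make the True/False step visible.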

Also note that you can import specific functions from a module. For example, we can import just the match function like this:

In [17]:
from re import match

Now when we use match, we can just write "match()" instead of "re.match()":

In [18]:
website = 'https://greenlightfund.org/sites/atlanta/'

if bool(match('^https', website)):
    print(f"{website} is a website.")

https://greenlightfund.org/sites/atlanta/ is a website.


This method of importing is not always recommended, because you may end up importing functions with the same name from different modules; the function imported last replaces the earlier one. In that case, just use the first method of importing.
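Here is a concrete case of such a clash, using two modules that come with Python: <i>math</i> and <i>cmath</i> (math for complex numbers) both contain a function named <i>sqrt()</i>. Whichever one you import last is the one the name sqrt refers to, while importing the modules themselves keeps both available.

```python
from math import sqrt
print(sqrt(4))  # 2.0

# cmath also has a function named sqrt (it works with complex numbers).
# Importing it replaces the "sqrt" we imported from math.
from cmath import sqrt
print(sqrt(4))  # (2+0j)

# Importing the modules themselves keeps both functions available
import math
import cmath
print(math.sqrt(4))   # 2.0
print(cmath.sqrt(4))  # (2+0j)
```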

You can also rename a module when you import it. We will look at this using re, but note that this is most useful when modules have long names that you don't want to type out.

In [19]:
import re as regex

In [20]:
website = 'https://greenlightfund.org/sites/atlanta/'

if bool(regex.match('^https', website)):
    print(f"{website} is a website.")

https://greenlightfund.org/sites/atlanta/ is a website.


Again, this is useful when you have modules with long names that you want to condense into a few letters, or to make code more human readable.
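The <b>as</b> keyword also works with the "from ... import" style, which lets you give a single imported function a new name. Below, re_match is just a name chosen for illustration; a name like this can help avoid the name clashes described above while still letting you skip the module prefix.

```python
# "as" also works with "from ... import", giving one function a new name
from re import match as re_match

website = 'https://greenlightfund.org/sites/atlanta/'

if bool(re_match('^https', website)):
    print(f"{website} is a website.")  # https://greenlightfund.org/sites/atlanta/ is a website.
```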

### Practice Problems

Write a function that takes two inputs: 1) an HTML file, and 2) a key word that specifies what you want to get from that HTML file. The input HTML file will be <i>Recent_Projects_Simple.html</i>. The function must accept 6 key words that correspond to the 6 different fields provided for each project. These words are:

    1) "Project" - contained by <h2></h2> and <b></b> HTML tags
    2) "Client" - contained by <p></p> and <u></u> HTML tags
    3) "Website" - contained by <p></p> and <u></u> HTML tags
    4) "Main_Location" - contained by <p></p> and <u></u> HTML tags
    5) "Tools_Used" - contained by <p></p> and <u></u> HTML tags
    6) "Description" - contained by <p></p> and <u></u> HTML tags

For example, say the function is called "get_field". If we run this function as:

    websites = get_field('Recent_Projects_Simple.html', 'Website')

the function should return a list of every website in the HTML file. The same is true for the other fields: if the key word is "Client", the function should return a list of all of the clients in the HTML file.

Once you have written this function, test it on each of the 6 key words.

In [21]:
#write your solution below



