# Functions

## What is a function?

A function is a piece of code that takes some inputs (arguments) and calculates a value (the return value).  The arguments and return value can be of any type - integers, strings, boolean values, or more complex data types are all possible.  If the function does something besides calculate its return value, like print something to the screen, this is considered a "side effect" of the function.


Many functions have already been written for you in Python, and you just need to import the right library to use them.  But if you write code of any significant length, you will eventually want to define your own functions.
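
As a small illustration of using a function someone else wrote, the sketch below imports Python's standard `math` library and calls its `sqrt` function.  The variable name `root` is made up for this example.

```python
import math  # Standard library module of math functions

# math.sqrt takes one argument and returns its square root as a float
root = math.sqrt(16)
print(root)  # 4.0
```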

When you write a function, you state the arguments (input) you expect, then write the body of the function that calculates the value, then you write the return value.  Here's a function that appends an s to its argument.

In [None]:
def add_an_s(string):
    new_string = string + 's'
    return new_string

add_an_s('example')

## Why functions?


Without functions, code tends to get long, repetitive, and error-prone.  It gets long because every time you want to do the same thing with one small detail changed, you need to copy-paste the code and edit that detail in the new copy.  It gets error-prone because when the logic needs to change, you have to update every copy, and if you miss one, you now have a bug.



So one benefit of writing functions is to have code that is reusable for new situations, with no manual copy-pasting (and the bugginess that comes with it).


Another benefit of functions is that they make your code much more readable.  And this, in turn, helps you catch bugs.  A checkers-playing program may read at the top level:

```
while not somebody_won():
  if (player_turn == HUMAN):
    move = get_move()
    play(HUMAN, move)
  else:
    move = ai_find_best_move()
    play(AI, move)
print("Winner is " + str(get_winner()))
```

This top-level "main" code gives a nice clean outline of the logic of the game.  Each of the functions mentioned in this code could be tested separately, and it's easy to find the relevant code if you want to improve something.


We see function-like instructions in a lot of places outside computer science and data science.  Recipes may include sub-recipes - for the special sauce, for example.  Crochet and knitting instructions sometimes give little subroutines that should be repeated.  Cartooning instructions tell you how to draw figures in general, but face-drawing is a subroutine, and eye-drawing is a subroutine of that.  Everywhere you see instructions, you might see an attempt on the part of the instructor to make them more concise or more organized.

# Parts of a function definition in Python

Here's a simple function for us to dissect.

In [None]:
def add_two(my_number):
  # Adds two to the argument.
  return my_number + 2

add_two(3)

"def" followed by functionname(argument) and a colon indicates that we're defining a function.  By putting my_number between the parentheses, we've indicated that it is an argument.  While we're in the function, my_number will have whatever value was passed to the function.

In [None]:
# Repeated for the presentation
def add_two(my_number):
  # Adds two to the argument.
  return my_number + 2

add_two(3)

It's typical to put in a comment right after the first line, explaining what the function is supposed to do in just one line.

The last line of our function has a return statement.  The expression after "return" is what the function as a whole will evaluate to.  In this case, it'll evaluate to whatever was passed in, plus 2.

The final line in the code box isn't a part of the function at all; it's just to try out the function with a sample call.  We can tell it's not part of the function because it's not indented.

Let's do another example with more arguments.

In [None]:
def count_matches(to_match, my_list):
  # Counts how many times to_match appears in my_list
  count = 0
  for m in my_list:
    if to_match == m:
      count += 1
  return count

print(count_matches(5, [5, 6, 7, 5]))
print(count_matches("foo", ["foo","bar","baz"]))

This function has two arguments because it needs to know two things:  what to match, and the list to look in.  The first example call binds 5 to to_match and a list of ints to my_list, while the second example looks for "foo" in its list of strings.

In [None]:
# Repeated for presentation
def count_matches(to_match, my_list):
  # Counts how many times to_match appears in my_list
  count = 0
  for m in my_list:
    if to_match == m:
      count += 1
  return count

Notice how the "count" variable gets manipulated from beginning to end, so that when the return statement is hit, this variable has the right value to return.  It's very common to not do any significant computation in the return statement's expression, but instead return a variable holding the result.


In [None]:
# Repeated for presentation
def count_matches(to_match, my_list):
  # Counts how many times to_match appears in my_list
  count = 0
  for m in my_list:
    if to_match == m:
      count += 1
  return count

The indentation level changed over the course of the function because of the iteration and the conditional.  Python has no problem indenting an arbitrary number of times.  Each indent is a new logical block that also belongs to the blocks containing it.  The function is over at the first line of code that is indented no further than the "def" line; here, that means code that is not indented at all.

Sometimes a function is a reusable tool, and other times, it is just called once, but making that part of the program a function helps to organize the code.  Filtering inputs so that they meet certain criteria may happen only once in the program, but the filtering forms a logical step in the program that could be compartmentalized into a function.  Separating it out into a function also makes the code easier to test in isolation from the rest of the program.

In [None]:
def filter_movies(movies, min_star_rating):
    # Assume movies is a list of (movie, rating) tuples
    new_movies = []
    for movie, rating in movies:
        if rating >= min_star_rating:
            new_movies.append((movie, rating))
    return new_movies

movies = [('Casablanca', 5), ('The Avengers', 4), ('Labyrinth', 4.5), ('Minions', 3)]
filter_movies(movies, 4.5)

If a sequence of steps appears repeatedly in the code, though, that means those steps are a prime candidate for a function.  If a function is small, it may not seem necessary, but it could just serve to make some code more readable.

In [None]:
def get_rating(movie_tuple):
    # More readable way to access a movie rating
    return movie_tuple[1] # Now we don't need to remember what [1] refers to

get_rating(('Portrait of a Lady on Fire', 5))

# Exercise

Try writing a function double() that simply multiplies its argument by 2, and returns that value.

In [None]:
# TODO

# Variations

## No arguments, no return value

Here's an example that takes no arguments at all.  (This is unusual.)

In [None]:
from datetime import date

def greet_user():
  print("Hello, user!")
  print("Today's date is " + str(date.today()))

greet_user()

Having no arguments is unusual because it means whatever causes the behavior to vary is coming in through some other channel besides the arguments.  Here, the program can just ask the operating system for the current date with date.today(), and we don't need to tell it anything.


Note also that this function doesn't return anything, either - it just stops when it runs out of code.  If you try to evaluate the function, you'll get a result of "None."  This is fine if you're calling the function primarily for its "side effects," such as printing something.

In [None]:
a = greet_user()
print(a)

A simple statement of "return" returns from the function with a value of None, but it's optional since the function will return with a "None" value without it as well.

In [None]:
def greet_user():
  print("Hello, user!")
  print("Today's date is " + str(date.today()))
  return

greet_user()

Note that these features of the function - no arguments, no return value - are independent of each other.  You can have a function with an argument that returns nothing, or a function with no arguments and a return value.

In [None]:
def greet_username(name):
    # Argument but no return value example.  Notice print is not the same as a return value.
    print('Hello, ' + name + '!')
    
def get_date_string():
    # No argument but return value example.
    return str(date.today())

greet_username('class')
print('It is ' + get_date_string())

## Multiple return values

It's possible for a function to have multiple return values.  The return statement should separate the different return values with commas, and where the function is called, comma-separated variables can have these multiple values assigned to them.  (The program thinks of the return value as a single tuple.)

In [None]:
def longest_string(list_of_strings):
    # Returns the longest string and its number of characters.
    longest_len = 0
    longest_word = ""
    for s in list_of_strings:
        if len(s) > longest_len:
            longest_len = len(s)
            longest_word = s
    return longest_word, longest_len

word, length = longest_string(['apple', 'pear', 'banana'])
print(word)
print(length)
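
To see the single tuple for yourself, one option is to assign the result to just one variable.  This sketch repeats `longest_string` from above so that it runs on its own; the name `result` is made up for the example.

```python
def longest_string(list_of_strings):
    # Returns the longest string and its number of characters.
    longest_len = 0
    longest_word = ""
    for s in list_of_strings:
        if len(s) > longest_len:
            longest_len = len(s)
            longest_word = s
    return longest_word, longest_len

# With only one variable on the left, the whole tuple is stored in it
result = longest_string(['apple', 'pear', 'banana'])
print(result)        # ('banana', 6)
print(type(result))  # <class 'tuple'>
print(result[0])     # banana
```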

Our example of finding the best movie, with the best rating, from the iteration lecture is also a function that would benefit from multiple return values, since we may want to return both the movie and its rating.

In [None]:
movies = [('Casablanca', 5), ('The Avengers', 4), ('Labyrinth', 4.5), ('Minions', 3)]

def best_movie(list_of_movie_ratings):
    # Assume list_of_movie_ratings is a list of (movie, rating) tuples
    best_title = ""  # Named best_title so it doesn't reuse the function's own name
    best_rating = 0
    for movie, rating in list_of_movie_ratings:
        if rating > best_rating:
            best_title = movie
            best_rating = rating
    return best_title, best_rating

best_movie(movies)

# Exercise

Recall that ** raises a number to a power.  Write a function powers() that takes a number and returns three values:  that number squared, cubed, and raised to the fourth power.

In [None]:
# TODO

# Day 2

## Multiple return statements

There can be multiple points in a function with return statements, although some programmers consider it stylistically preferable to have just one where that is practical.  As soon as a return statement is reached and evaluated, the function quits, and any lines further down aren't executed.

In [None]:
def count_items(lst):
    # Count items but warn if the list is empty
    if (len(lst) == 0):
        print('Warning: empty list passed to count_items!')
        return 0
    print("We don't get here with an empty list")
    return len(lst)

count_items([])

In this case and many other cases, the extra return statement doesn't really need to be there.

In [None]:
def count_items(lst):
    # Count items but warn if the list is empty
    if (len(lst) == 0):
        print('Warning: empty list passed to count_items!')
    return len(lst)

One reason to return early could be that the function found something it was looking for - and there's no need to look any further.  The final return statement could be the behavior for the case where nothing is found.

In [None]:
def is_prime(n):
    if n < 2:             # 0, 1, and negatives are not prime
        return False
    for i in range(2, n): # Look for a divisor
        if n % i == 0:    # i divides n evenly, no remainder
            return False
    return True           # didn't find a divisor

print(is_prime(11))
print(is_prime(4))

## Functions calling functions

You can define functions that call other functions that you've written.  In a big project, there could be several levels of hierarchy to your code, with function A calling function B calling function C.

In [None]:
def count_longest_string(list_of_strings):
    # Count how many times the longest string appears in the list
    # Return this count
    word, length = longest_string(list_of_strings)  # We defined this function above
    return count_matches(word,list_of_strings) # We defined this function above, too

count_longest_string(['apple','banana','pear','banana'])


# Exercise (8 min)

Write a function that returns the first number in a list that is 100 or greater.  If no such number is found, return -1.

In [None]:
# TODO

# Scope and local variables

All the variables created in a function, including the arguments, are no longer accessible once the function returns.  Python cleans up those names, and any memory that nothing else refers to becomes available again.  This helps reduce bugs, because if a function "makes a mess" by creating many different variables as it executes, there's no way the code outside the function can accidentally look at a value that was intended just for the function.


In [None]:
def add5(arg):
    b = arg + 5
    return b

print(add5(7))  # Prints 12
b    # NameError: b only existed inside add5, so the cell stops here
arg  # Would raise NameError too, if the line above didn't stop the cell first

This is also an example of "encapsulation," the principle that the user of a function shouldn't need to know how it was implemented. You assume that as long as you know the inputs, outputs, and that it works, you don't need to know exactly how it works.  If you needed to know the names of a lot of variables that get modified as the function works, that wouldn't be encapsulated.


While the program "forgets" variables when it leaves functions, it's aware of variables outside the function while the function is executing.  Variables declared outside all functions are called "global variables," and they can be accessed from inside functions.  But it's better style to pass in the needed values as arguments, rather than using global variables.

Newcomers to functions often don't understand why pattern A in the next code box is better than pattern B.  They both theoretically get the job done, but the second one uses a *global variable* that breaks encapsulation.  In big code bases, hunting for where a global variable was defined, and trying to determine what might change it, is not sustainable.

In [None]:
def pattern_a(x, y, z):
  # Computes x * y - z
  return x * y - z

z = 4 # Global variable
def pattern_b(x, y):
  return x * y - z # Uses global variable

print(pattern_a(2,3,4))
print(pattern_b(2,3))

The first pattern, A, is vastly preferred.  We want to be able to figure out what a function will do from its arguments alone, and not worry about variables somewhere in the codebase having far-reaching effects.  Global variables, those accessible from all functions, are generally seen as a last resort.
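
A quick sketch of why pattern B is harder to reason about: the same call can give different answers depending on what the global happens to hold at that moment.  The functions are repeated from above so the cell runs on its own; the names `first` and `second` are made up for the example.

```python
def pattern_a(x, y, z):
    # Computes x * y - z using only its arguments
    return x * y - z

z = 4  # Global variable
def pattern_b(x, y):
    return x * y - z  # Reads whatever z holds when the function runs

first = pattern_b(2, 3)
z = 10  # Some other part of the program changes the global...
second = pattern_b(2, 3)

print(first)               # 2
print(second)              # -4, same call but a different answer
print(pattern_a(2, 3, 4))  # 2, no matter what the global z is

z = 4  # Put the global back the way we found it
```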

# Shadowing

A variable declared in a function can "shadow" a variable with the same name that lives outside the function, preventing access to the outside variable until the function is done.  Again, this means users of your function don't need to worry about what you named your variables.  Below, the local *a* shadows the *a* that lives outside the function, which keeps its value of 5.

In [None]:
def add_two(my_number):
  # Adds two to the argument.
  a = my_number + 2
  print("a is " + str(a))
  return a

a = 5
print(add_two(2)) # local "a" set to 4
print(a) # still 5; the outside "a" was never touched

Shadowing often happens with arguments, because the name of the argument is what we wanted to call the variable at the top level, too.  But the local variable in the function and the one at the top level are two different variables.

In [None]:
my_list = ['a','b','c']

def concatenate_all(my_list):
    out = ''
    for item in my_list:
        out += item
    return out

print(concatenate_all(['d','e'])) # ['d','e'] is called my_list in the function
print(concatenate_all(my_list))  # my_list is still a,b,c

# Refactoring

It might not be obvious at first what parts of code need to be broken up into functions.  You may well end up writing a piece of code only to look back and say, "Hmm, that could have been more concise with functions."  If you did copy-paste any code in the course of writing it, that might be a good signal that the code could use some reorganization.

"Refactoring" is simply the act of trying to clean up the breakdown of the code into functions - probably by turning non-function code into functions, but perhaps also cleaning up which functions do what.

Here is some code that could use a cleanup:

In [None]:
list1 = ["A","B","C","D"]
list2 = ["W","X","Y","Z"]
for item in list1:
  for item2 in list2:
    print(item+item2)
list3 = ["1","2","3"]
for item in list1:
  for item2 in list3:
    print(item+item2)


The familiarity of the second set of nested loops should clue us in to the repetitiveness.  The second batch of code differs only in the list and its name.

So instead, we can turn this into a function.  We can get new, more readable code like this:

In [None]:
def create_letter_combos(first_list, second_list):
  for m in first_list:
    for s in second_list:
      print(m + s)

create_letter_combos(["A","B","C","D"],["W","X","Y","Z"])
create_letter_combos(["A","B","C","D"],["1","2","3"])

This is a fairly small cleanup job, but it shows how the code is now a little more concise and a little more readable.
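
One possible further step, not required here, is to have the function return the combinations instead of printing them, which makes the result easy to check or reuse.  The name `letter_combos` is made up for this sketch.

```python
def letter_combos(first_list, second_list):
    # Returns a list with every item of first_list joined to every item of second_list
    combos = []
    for m in first_list:
        for s in second_list:
            combos.append(m + s)
    return combos

combos = letter_combos(["A", "B"], ["1", "2", "3"])
print(combos)  # ['A1', 'A2', 'A3', 'B1', 'B2', 'B3']
```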

# Exercise

Refactor the following code so that it calls a function (that you will write, based on the provided code), evens(), that returns just the even numbers from a list.

In [None]:
important_numbers1 = [1, 2, 3, 4, 5]
evens1 = []
for n in important_numbers1:
    if n % 2 == 0:  # Remainder when dividing by 2 is 0
        evens1.append(n)
print(evens1)
important_numbers2 = [6, 7, 8, 9, 10]
evens2 = []
for n in important_numbers2:
    if n % 2 == 0:
        evens2.append(n)
print(evens2)

In [None]:
# TODO

# Pseudocode and functions

Pseudocode is code that is written in a style closer to English than any particular programming language.  It's meant for human readers instead of being parsed by a machine.  Sketching out a program in pseudocode ahead of time can help identify what would make a good function in the program.



Here is some pseudocode for the top level of a checkers-playing program:
```
while there is no winner:
  if it's the human's turn:
    get the human's move from the keyboard
    play the human's move on the board
  else:
    search for the best-looking move for the computer
    play the best-looking move
print the winner
```

The pseudocode's high-level view of the program gives us an idea of what functions we might reasonably expect to code.  We can further sketch pseudocode for the functions themselves, until we feel like we've gone deep enough that there's no further use to pseudocode.


```
# pseudocode for looking for the best-looking move
moves = find all checkers that can move
for each move in moves:
  try the move
  find the difference between the red and black piece counts
  remember the difference if it's best so far
return the move with the best board value

```



Here, "find all checkers that can move" sounds like another good candidate for a function.  So do "try the move" and "find the difference between the red and black piece counts."  The process of writing pseudocode can reveal a structure to your program that matches a good functional decomposition of the code.
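
To make the decomposition concrete, here is one way the best-move pseudocode might turn into Python functions.  This is only a sketch: every function name and the data layout are made-up stand-ins (a move is just a label paired with how many black pieces it captures), not a real checkers engine.

```python
# All names and data layouts here are hypothetical, for illustration only.
# A "move" is a (label, black_pieces_captured) tuple.

def find_moves(available_moves):
    # Stand-in for "find all checkers that can move"
    return available_moves

def try_move(red_count, black_count, move):
    # Stand-in for "try the move": returns the piece counts after the move
    label, captured = move
    return red_count, black_count - captured

def piece_difference(red_count, black_count):
    # "find the difference between the red and black piece counts"
    return red_count - black_count

def find_best_move(red_count, black_count, available_moves):
    best_move = None
    best_difference = None
    for move in find_moves(available_moves):
        new_red, new_black = try_move(red_count, black_count, move)
        difference = piece_difference(new_red, new_black)
        # Remember the difference if it's the best so far
        if best_difference is None or difference > best_difference:
            best_move = move
            best_difference = difference
    return best_move

sample_moves = [('slide left', 0), ('single jump', 1), ('double jump', 2)]
print(find_best_move(12, 12, sample_moves))  # ('double jump', 2)
```

Each line of the pseudocode became either a function call or a short step inside `find_best_move`, so each piece can be improved or tested separately.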

# Comment conventions

There is a standard way to comment Python functions.  It's probably overkill for very simple functions, but it's a good habit to get into, and whoever needs to work with your code next, whether a grader, a coworker, or your boss, will be happy that everything is well-commented.  The style for "docstrings," the strings that provide these comments, is uniform enough that some coding tools can automatically pull them up, enabling smarter tooltips and similar features.



These comments use three double-quotes surrounding a multiline string:

In [None]:
def get_first_letter(word):
  """ Returns the first letter of a string.

  word (str):  The string to get the letter from.

  A simple function just for demo purposes.  Probably
  not useful since get_first_letter takes more characters
  to type than word[0].
  """

  return word[0]

The first line of the multiline comment should be a quick description of what the function does.  After some space, there's then a description of every argument and its expected type.  Below that is anything else a programmer ought to know.
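
Unlike an ordinary # comment, a docstring is stored on the function itself, which is how tools are able to display it.  A minimal sketch, repeating a shortened version of the function so the cell runs on its own:

```python
def get_first_letter(word):
    """Returns the first letter of a string.

    word (str):  The string to get the letter from.
    """
    return word[0]

# The docstring is available as the function's __doc__ attribute
print(get_first_letter.__doc__)
```

Calling the built-in help(get_first_letter) shows the same text along with the function's name and arguments.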

# Tests

It's a good software engineering principle to write tests for every function that you write.  This can use dedicated testing tools for your language, or it can just consist of writing function calls and comparing the results to what they should be.

In [None]:
print(get_first_letter("Shibboleth") == "S")
print(pattern_a(5,4,3) == 17)   # 5*4-3
print(count_matches("A",[]) == 0)
print(count_matches("A", ["A","A","A"]) == 3)

In tests, you want to "kick the tires" of your function and determine not only whether it works under ideal conditions, but also in the toughest of corner cases.  For count_matches, I could test whether the function does the right thing for empty lists, or for longer lists where all the items match.
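
One lightweight way to make such checks fail loudly, rather than printing True or False for you to read, is Python's built-in `assert` statement, which raises an AssertionError when its condition is False.  A sketch, repeating count_matches so the cell runs on its own; the particular test cases are just examples:

```python
def count_matches(to_match, my_list):
    # Counts how many times to_match appears in my_list
    count = 0
    for m in my_list:
        if to_match == m:
            count += 1
    return count

# Each assert is silent when its condition is True
# and stops the cell with an AssertionError otherwise
assert count_matches("A", []) == 0               # empty list
assert count_matches("A", ["A", "A", "A"]) == 3  # every item matches
assert count_matches("A", ["B", "C"]) == 0       # no item matches
assert count_matches(5, [5, 6, 7, 5]) == 2       # some items match
print("All count_matches tests passed")
```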

Some styles of programming even define the tests first, before writing the function, as a way of defining the expected behavior.

We won't adopt any official tools for testing in this class.  But I suggest that for any major piece of code, you have a cell with test calls that make it easy to tell whether the code is doing the right thing.