# Functions

*Could all the numbers be brightnesses of pixels in an image?  Cynthia tried her hypothesis...*

*...and blinked in surprise at her screen.  The brightened image was a map, clearly showing Kinshasa, in the Democratic Republic of Congo.  At the bottom of the image was the message, "If you want to help, come."*

*Well, that was exciting.  Someone in Kinshasa was broadcasting a message to come, to anyone clever enough to decipher it.  Not just anyone - a **data scientist** clever enough.  How could she pass that up?*

*But - alone?  Maybe she could get her friend Kathleen to come as well.  "Hi, I'm going on a secret mission to the DRC" may not be the most convincing opener, though.  She should send her code, she decided - the code she used to decipher the image.*

*Was that code really in a state to be used by somebody else, though?  She decided she'd clean the code up a little before sending it to Kathleen - by breaking it into functions.*

## What is a function?

A function is some code that takes some inputs (arguments) and calculates a value (the return value).  It's essentially a reusable tool in your code.  Sometimes you'll use the tools of others, by using functions from other modules; and sometimes, you'll make your own tools.

Functions organize code, make it more testable, and can reduce the overall amount of code that needs to be written.

When you write a function, you state the arguments (inputs) you expect, write the body that calculates the value, and finish with a return statement that hands back the result.  Here's a function that appends an 's' to its argument.

In [None]:
def add_an_s(my_input): # header
    # Returns the input with an 's' added to the end
    new_string = my_input + 's'
    return new_string

print(add_an_s('example') + '!')


* def add_an_s(my_input): indicates we're defining the add_an_s function, and that it should take a single argument my_input.  (This line is called the function "header.")

* It's typical to add a comment after the function header that describes what the function does.

* The lines that follow are indented and do some computation with the arguments.

* The instruction "return [value]" defines what the function will evaluate to when it is called.
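
Because a call evaluates to its return value, the result can be stored in a variable or used inside a larger expression.  Here is a minimal sketch; it repeats add_an_s so that it runs on its own, and the variable names plural and shout are just for illustration.

```python
def add_an_s(my_input):
    # Returns the input with an 's' added to the end
    return my_input + 's'

plural = add_an_s('cat')          # store the return value in a variable
shout = add_an_s('dog').upper()   # use the return value inside an expression
print(plural)
print(shout)
```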

## Why functions?

Without functions, if you want to do the same thing a little bit differently elsewhere in your code, you have to write it all over again.  This makes the code longer and more bug-prone, because you might make a mistake in copying or changing the code.  Functions let you reuse code in a more maintainable way.
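
As a rough illustration of that point, the sketch below does the same cleanup twice by hand and then once through a function.  The name clean_name and the sample strings are made up for this example.

```python
# Without a function: the same steps are written out twice
name_1 = "  alice "
clean_1 = name_1.strip().title()
name_2 = "BOB  "
clean_2 = name_2.strip().title()

# With a function: the steps are written once and reused
def clean_name(raw_name):
    # Removes surrounding spaces and capitalizes the name
    return raw_name.strip().title()

print(clean_name("  alice ") == clean_1)
print(clean_name("BOB  ") == clean_2)
```

If the cleanup rule changes later, only the body of clean_name needs to be edited.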


Let's do another example with more arguments.

In [None]:
def count_matches(to_match, my_list):
  # Counts how many times to_match appears in my_list
  count = 0
  for m in my_list:
    if to_match == m:
      count += 1
  return count

print(count_matches(5, [5, 6, 7, 5]))
print(count_matches("foo", ["foo","bar","baz"]))

* This function has two arguments because it needs to know two things:  what to match, and the list to look in.

* Notice how there's a variable "count" that is intended from the beginning to be the return value.

* Notice how functions can still contain indentation that reflects their structure, despite everything being indented for the function.

If a sequence of steps appears repeatedly in the code, that means those steps are a prime candidate for a function - although sometimes functions may exist just to organize the code.

In [None]:
def percent_gain(start, finish):
    return (finish-start)/start * 100

# Dow Jones Industrial Average, Jan 3 2022 to Jan 3 2023 (a negative result is a loss)
print(percent_gain(36585.06, 33147.25))
# S&P 500 Jan 3 2022 to Jan 3 2023
print(percent_gain(4796.56, 3839.50))
# Nasdaq Jan 3 2022 to Jan 3 2023
print(percent_gain(15832.80, 10466.48))

# Exercise (5 min)

Try writing a function as_percent() that takes a float (like 0.1) as an argument and returns a string that is the number as a percentage (like "10.0%").

In [None]:
# TODO

as_percent(0.1) # Expect "10.0%"

# Variations

## No arguments, no return value

Here's an example that takes no arguments at all.  (This is unusual.)  A call to this function also evaluates to None, since the function has no return statement.

In [None]:
from datetime import date

def greet_user():
  print("Hello, user!")
  print("Today's date is " + str(date.today()))

greet_user()
#print(greet_user()) # print to see it evaluates to None

A bare "return" statement exits the function with a value of None.  It's optional at the end of a function, since a function that reaches its last line without a return statement returns None anyway.

In [None]:
def greet_user():
  print("Hello, user!")
  print("Today's date is " + str(date.today()))
  return

print(greet_user())
#greet_user()

## Multiple return values

It's possible for a function to have multiple return values.  The return statement separates the different return values with commas, and where the function is called, comma-separated variables can have these values assigned to them.  (Python actually packs the values into a single tuple and then unpacks that tuple into the variables.)

In [None]:
def longest_customer_name(list_of_names):
    # Find the longest customer name, and how long it is
    # (maybe so we can display the names nicely later)
    longest_len = 0
    longest_name = ""
    for n in list_of_names:
        if len(n) > longest_len:
            longest_len = len(n)
            longest_name = n
    return longest_name, longest_len

name, length = longest_customer_name(['Alice', 'Bob', 'Cassia'])
print(name)
print(length)
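
To see the tuple directly, the call's result can be kept in one variable instead of being unpacked right away.  This is a small sketch with a made-up helper, min_and_max, rather than the function above.

```python
def min_and_max(numbers):
    # Returns the smallest and largest values as a pair
    return min(numbers), max(numbers)

result = min_and_max([3, 9, 4])
print(type(result))   # the single returned object is a tuple
print(result[0])      # pieces can be pulled out by position
low, high = result    # or unpacked into separate variables
print(low, high)
```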

*On the plane to the Democratic Republic of Congo, Cynthia repeatedly checked her phone for messages from Kathleen.  Nothing, nothing, nothing.*

*Until, finally - a message!  From Kathleen!  Cynthia opened it and ... "What is this crazy nonsense you sent me???" Kathleen wrote.  "What am I even looking at???"*

*Maybe I could have done more to organize my code, Cynthia thought ruefully.  Having my hard work dismissed as nonsense ... kind of sucks.*

## Multiple return statements

A function can contain return statements at several points, although some programmers prefer a single return statement where practical.  As soon as a return statement is reached, the function exits, and any lines further down are not executed.

In [None]:
def count_items(lst):
    # Count items but return None if the list is empty
    if len(lst) == 0:
        print('Warning: empty list passed to count_items!')
        return None
    print("We don't get here with an empty list")
    return len(lst)

count_items([])

One reason to return early could be that the function found something it was looking for - and there's no need to look any further.  The final return statement could be the behavior for the case where nothing is found.

Note that a % b gives the remainder when a is divided by b (when both are non-negative); thus, 8 % 2 == 0 and 14 % 5 == 4.
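
A quick sketch of the remainder operator on its own; the helper is_even is a made-up example and is not used elsewhere in this notebook.

```python
print(8 % 2)    # 0, because 2 divides 8 evenly
print(14 % 5)   # 4, because 14 = 2 * 5 + 4
print(7 % 2)    # 1, so 7 is odd

def is_even(n):
    # True when dividing n by 2 leaves no remainder
    return n % 2 == 0

print(is_even(10))
```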

In [None]:
def is_prime(n):
    if n < 2:             # 0, 1, and negative numbers are not prime
        return False
    for i in range(2, n): # Look for a divisor
        if n % i == 0:    # i divides n evenly, no remainder
            return False
    return True           # didn't find a divisor

print(is_prime(11))
print(is_prime(4))

## Functions calling functions

You can define functions that call other functions that you've written.  In a big project, there could be several levels of hierarchy to your code, with function A calling function B calling function C.

In [None]:
# Repeat these functions because it's day 2
# and we haven't run their boxes for a while
def longest_customer_name(list_of_names):
    # Find the longest customer name, and how long it is
    # (maybe so we can display the names nicely later)
    longest_len = 0
    longest_name = ""
    for n in list_of_names:
        if len(n) > longest_len:
            longest_len = len(n)
            longest_name = n
    return longest_name, longest_len

def count_matches(to_match, my_list):
  # Counts how many times to_match appears in my_list
  count = 0
  for m in my_list:
    if to_match == m:
      count += 1
  return count

def count_longest_name(list_of_names):
    # Count how many times the longest name appears in the list
    # Makes use of functions defined above
    word, length = longest_customer_name(list_of_names)
    return count_matches(word,list_of_names)

count_longest_name(['Alice','Bob','Catherine','Catherine'])


# All-together exercise

Suppose we want to write a function all_names_short_enough() that takes as arguments a list of strings and an integer character limit, and returns True only if all names have at most that many characters.  Thus all_names_short_enough(['Alice', 'Bob'], 3) would return False, but passing 5 as the second argument would make it True.

How would we do this...

* Iterating through the list, and maybe returning early?
* Using a call to longest_customer_name()?

In [None]:
# solutions in full slides, so we don't spoil the answers to the above


In [None]:
# solutions in full slides

# Scope and local variables

All the variables created in a function, including the arguments, are local to it: once the function returns, those names are no longer accessible, and Python can reclaim the memory of any values that nothing else refers to.  This helps reduce bugs, because if a function "makes a mess" by creating many different variables as it executes, there's no way the code outside the function can accidentally look at a value that was intended just for the function.


In [None]:
def add5(arg):
    b = arg + 5
    return b

add5(7) # Returns 12
#b  # Program says it doesn't know what this is
#arg  # Similarly no recollection
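
The commented-out lines above would stop the cell with a NameError.  The sketch below catches that error instead, so the cell keeps running; it assumes nothing else at the top level is named b, and the flag b_is_visible is made up for this example.

```python
def add5(arg):
    b = arg + 5   # b and arg are local to add5
    return b

result = add5(7)

try:
    b                     # look up b at the top level
    b_is_visible = True
except NameError:
    b_is_visible = False  # there is no top-level variable named b

print(result)        # the returned value itself is still available
print(b_is_visible)
```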

This is also an example of "encapsulation," the principle that the user of a function shouldn't need to know how it was implemented. You assume that as long as you know the inputs, outputs, and that it works, you don't need to know exactly how it works.  If you needed to know the names of a lot of variables that get modified as the function works, that wouldn't be encapsulated.


While the program "forgets" variables when it leaves functions, it's aware of variables outside the function while the function is executing.  Variables declared outside all functions are called "global variables," and they can be accessed from inside functions.  But it's better style to pass in the needed values as arguments, rather than using global variables.

In [None]:
def pattern_a(price, tax):
  return price * (1 + 0.01 * tax)  # Everything we need is in the arguments - good

tax = 20 # Global variable - this is worse style
def pattern_b(price):
  return price * (1 + 0.01 * tax) # Works, but less flexible, hard to debug & test

print(pattern_a(100,20))
print(pattern_b(100))
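
One way to see why pattern_b is harder to debug: if the global changes anywhere else in the program, the same call quietly gives a different answer.  This is a sketch only; the variables first and second are made up for this example.

```python
tax = 20  # global variable

def pattern_b(price):
    return price * (1 + 0.01 * tax)  # silently depends on the global

first = pattern_b(100)
tax = 5                   # some unrelated code changes the global...
second = pattern_b(100)   # ...and the same call now gives a different result
print(first)
print(second)
```

With pattern_a, the tax rate is visible in every call, so a surprising result can be traced to the arguments that were passed in.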

# Shadowing

Even though your functions can access global variables from within, as soon as you try assigning to a variable inside a function, a new local variable will be created *even though* there's a perfectly good variable of that name outside the function.  This potentially confusing behavior is called "shadowing."

In [None]:
def add_two(my_number):
  a = my_number + 2 # Shadows outer "a", now we have two a's and see this one
  print("a is " + str(a) + " inside add_two")
  return a

a = 5
print("add_two(2) is " + str(add_two(2)))
print("a is " + str(a) + " outside add_two")

Shadowing often happens with arguments, because the name of the argument is what we wanted to call the variable at the top level, too.  But the local variable in the function and the one at the top level are two different variables.

In [None]:
my_list = ['a','b','c']

def concatenate_all(my_list):
    out = ''
    for item in my_list:
        out += item
    return out

print(concatenate_all(['d','e'])) # ['d','e'] is called my_list in the function
print(concatenate_all(my_list))  # my_list is still a,b,c

# Refactoring

It might not be obvious at first what parts of code need to be broken up into functions.  You may well end up writing a piece of code only to look back and say, "Hmm, that could have been more concise with functions."  If you did copy-paste any code in the course of writing it, that might be a good signal that the code could use some reorganization.

"Refactoring" is simply the act of trying to clean up the breakdown of the code into functions - probably by turning non-function code into functions, but perhaps also cleaning up which functions do what.

Here is some code that could use a cleanup:

In [None]:
names = ["Catherine", "Donovan", "alice", "BOB"]
standardized_names = []
for name in names:
    name = name.capitalize() # Capitalize first letter, lowercase the rest
    standardized_names.append(name)
    standardized_names.sort()
jobs = ['Pilot', 'teacheR', 'firefighter', 'LIBRARIAN']
standardized_jobs = []
for job in jobs:
    job = job.capitalize()
    standardized_jobs.append(job)
    standardized_jobs.sort()
print(standardized_names)
print(standardized_jobs)


This is more readable code that makes it clear we're doing the same thing to both lists:

In [None]:
names = ["Catherine", "Donovan", "alice", "BOB"]
jobs = ['Pilot', 'teacheR', 'firefighter', 'LIBRARIAN']

def standardize_strings(string_list):
    out = []
    for s in string_list:
        s = s.capitalize()
        out.append(s)
    out.sort()
    return out

standard_names = standardize_strings(names)
standard_jobs = standardize_strings(jobs)
print(standard_names)
print(standard_jobs)

This is a fairly small cleanup job, but it shows how the code is now a little more concise, a little more readable, a little more debuggable, and maybe usable elsewhere.
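
As a sketch of the "usable elsewhere" point, the same function can be applied to a third list without writing another loop.  The cities list is hypothetical; the function is repeated so the cell runs on its own.

```python
def standardize_strings(string_list):
    # Capitalize each string, then return them in sorted order
    out = []
    for s in string_list:
        out.append(s.capitalize())
    out.sort()
    return out

cities = ['kinshasa', 'BOSTON', 'accra']   # hypothetical third list
standard_cities = standardize_strings(cities)
print(standard_cities)
print(cities)   # the original list is left unchanged
```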

# Comment conventions

There is a standard way to comment Python functions.  It's probably overkill for very simple functions, but it's a good habit to get into, and whoever needs to work with your code next, whether a grader, a coworker, or your boss, will be happy that everything is well-commented.



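Function comments are usually written as a "docstring": a multiline string surrounded by three double-quotes, placed directly under the function header in place of # comments.  Python has no dedicated multiline comment syntax; a triple-quoted string in this position is stored as the function's documentation.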

In [None]:
def get_first_letter(word):
  """ Returns the first letter of a string.

  word (str):  The string to get the letter from.

  A simple function just for demo purposes.  Probably
  not useful since get_first_letter(word) takes more characters
  to type than word[0].
  """

  return word[0]

The first line of the docstring (the triple-quoted string) should be a quick description of what the function does.  After a blank line, there's a description of every argument and its expected type.  Below that is anything else a programmer ought to know.
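
Unlike a # comment, a triple-quoted string placed first in the function body is stored on the function, in its `__doc__` attribute.  A minimal sketch, repeating a shortened version of the function above:

```python
def get_first_letter(word):
    """Returns the first letter of a string.

    word (str):  The string to get the letter from.
    """
    return word[0]

# The documentation string travels with the function object
print(get_first_letter.__doc__)
```

The built-in help(get_first_letter) displays the same text along with the function's signature.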

# Tests

It's a good software engineering principle to write tests for every function that you write.  This can use dedicated testing tools for your language, or it can just consist of writing function calls and comparing the results to what they should be.

In [None]:
# Remember to run the corresponding cells to define these functions
print(get_first_letter("Shibboleth") == "S")
print(pattern_a(100,20) == 120)
print(pattern_a(0, 20) == 0)
print(count_matches("A",[]) == 0)
print(count_matches("A", ["A","A","A"]) == 3)

In tests, you want to "kick the tires" of your function and determine not only whether it works under ideal conditions, but also in the toughest of corner cases.  For count_matches, I could test whether the function does the right thing for empty lists, or for longer lists where all the items match.

I suggest that for any major piece of code, you have a cell with test calls that make it easy to tell whether the code is doing the right thing.
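
One possible shape for such a test cell uses assert statements, which stop with an AssertionError when a check fails instead of printing False.  This is a sketch, not the only way to do it: the functions are repeated so the cell runs on its own, and math.isclose is used for the floating-point result because exact == comparisons on floats can fail from rounding.

```python
import math

def count_matches(to_match, my_list):
    # Counts how many times to_match appears in my_list
    count = 0
    for m in my_list:
        if to_match == m:
            count += 1
    return count

def pattern_a(price, tax):
    return price * (1 + 0.01 * tax)

# Each assert passes silently when true and raises AssertionError when false
assert count_matches("A", []) == 0                 # empty list
assert count_matches("A", ["A", "A", "A"]) == 3    # every item matches
assert count_matches("A", ["B", "C"]) == 0         # no item matches
assert math.isclose(pattern_a(100, 20), 120)       # float result, compared with a tolerance
print("All tests passed")
```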