
# Functions

*Cynthia blinked in surprise at her screen.  The brightened image was a map, clearly showing Kinshasa, Democratic Republic of Congo.  At the bottom of the image was the message, in Times New Roman, "If you want to help, come."*

*Well, that was exciting.  Someone in Kinshasa was broadcasting a message to come, to anyone clever enough to decipher it.  Not just anyone - a **data scientist** clever enough.  How could she pass that up?*

*But - alone?  Maybe she could get her friend Kathleen to come as well.  "Hi, I'm going on a secret mission to the DRC" may not be the most convincing opener, though.  She should send her code, she decided - the code she used to decipher the image.*

*Was that code really in a state to be used by somebody else, though?  She decided she'd clean the code up a little before sending it to Kathleen - by breaking it into functions.*

## What is a function?

A function is some code that takes some inputs (arguments) and calculates a value (the return value).  It's essentially a reusable tool in your code.  Sometimes you'll use the tools of others, by using functions from other modules; and sometimes, you'll make your own tools.

Functions organize code, make it more testable, and can reduce the overall amount of code that needs to be written.

When you write a function, you state the arguments (input) you expect, then write the body of the function that calculates the value, then you write the return value.  Here's a function that appends an s to its argument.

In [None]:
def add_an_s(string):
    new_string = string + 's'
    return new_string

add_an_s('example')

## Why functions?

Without functions, if you want to do the same thing a little bit differently elsewhere in your code, you have to write it all over again.  This makes the code longer and more bug-prone, because you might make a mistake in copying or changing the code.  Functions let you reuse code in a more maintainable way.
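
As a small illustration of the copy-paste problem, the sketch below formats two prices first with duplicated code and then with a function.  The name format_price and the sample prices are made up for this example.

```python
# Without a function: the formatting logic is copied for each price.
lunch = 12.5
dinner = 30
lunch_label = "$" + format(lunch, ".2f")
dinner_label = "$" + format(dinner, ".2f")

# With a function (format_price is a hypothetical helper):
def format_price(amount):
    # Returns the amount as a dollar string with two decimal places.
    return "$" + format(amount, ".2f")

print(format_price(lunch))   # $12.50
print(format_price(dinner))  # $30.00
```

If the format ever needs to change, the function version has one line to edit instead of one line per price.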


Another benefit of functions is that they make your code much more readable.  And this, in turn, helps you catch bugs.  A program for analyzing customer records may read at the top level:


In [None]:
# Not meant to be actually run - we didn't define the functions
records = read_customer_data('input.csv')
sales = 0
purchase_counts = []
s_names = []
for record in records:
    name, purchase_list, sale_info = parse_record(record)
    s_names.append(standardize_name(name))
    sales = update_total_sales(sales, sale_info)
    update_purchase_counts(purchase_counts, purchase_list)
write_to_file(s_names, purchase_counts, sales, 'output.csv')

Each of the functions mentioned in this code could be tested separately, and it's easy to find the relevant code if you want to improve something.
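
To make "tested separately" concrete, here is a minimal sketch of what one of those helpers might look like on its own.  The body of standardize_name is an assumption for illustration; the program above never says what it actually does.

```python
def standardize_name(name):
    # Hypothetical version: trim surrounding spaces and use title case.
    return name.strip().title()

# The helper can be checked by itself, without reading any file.
print(standardize_name("  alice SMITH "))  # Alice Smith
print(standardize_name("BOB") == "Bob")    # True
```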


We see function-like instructions in a lot of places outside computer science and data science.  Recipes may include sub-recipes - for the special sauce, for example.  Crochet and knitting instructions sometimes give little subroutines that should be repeated.  When drawing a person, you might engage in a special procedure for drawing the face.  Everywhere you see instructions, you might see subsets of instructions that are kind of like functions.

# Parts of a function definition in Python

In [None]:
def add_two(my_number):
  # Adds two to the argument.
  return my_number + 2

* def add_two(my_number): indicates we're defining the add_two function, and that it should take a single argument my_number.  (This line is called the function "header.")

* It's typical to add a comment after the function header that describes what the function does.

* The lines that follow are indented and do some computation with the arguments.

* The instruction "return [value]" defines what the function will evaluate to when it is called.

In [None]:
add_two(2)

4

Let's do another example with more arguments.

In [None]:
def count_matches(to_match, my_list):
  # Counts how many times to_match appears in my_list
  count = 0
  for m in my_list:
    if to_match == m:
      count += 1
  return count

print(count_matches(5, [5, 6, 7, 5]))
print(count_matches("foo", ["foo","bar","baz"]))

* This function has two arguments because it needs to know two things:  what to match, and the list to look in.

* Notice how there's a variable "count" that is intended from the beginning to be the return value.

* Notice how functions can still contain indentation that reflects their structure, despite everything being indented for the function.

If a sequence of steps appears repeatedly in the code, that means those steps are a prime candidate for a function - although sometimes functions may exist just to organize the code.

In [None]:
def percent_gain(start, finish):
    return (finish-start)/start * 100
# Dow Industrial Average gains from Jan 3 2022 to Jan 3 2023
print(percent_gain(36585.06, 33147.25))
# S&P 500 Jan 3 2022 to Jan 3 2023
print(percent_gain(4796.56, 3839.50))
# Nasdaq Jan 3 2022 to Jan 3 2023
print(percent_gain(15832.80, 10466.48))

-9.39675922357377
-19.953049685608025
-33.893689050578544


If a function is small, it may not seem necessary, but it could just serve to make some code more readable.

In [None]:
def get_rating(movie_tuple):
    # More readable way to access a movie rating
    return movie_tuple[1]

get_rating(('Portrait of a Lady on Fire', 5))

5

# Exercise (7 min)

Try writing a function with_tax() that takes a base price and a percentage as input, and returns the total price in dollars.  You can use round(price,2) to ensure your price has a valid number of cents.  For example, with_tax(1, 8.6) should return 1.09.

In [None]:
# with_tax(): base price plus a percentage tax, rounded to cents
def with_tax(base, percentage):
  final = base * percentage / 100 + base
  return round(final, 2)

with_tax(1, 8.6)



1.09

# Variations

## No arguments, no return value

Here's an example that takes no arguments at all.  (This is unusual.)  This example also evaluates to None, since it has no return statement.

In [None]:
from datetime import date

def greet_user():
  print("Hello, user!")
  print("Today's date is " + str(date.today()))
  # No return statement needed: without one, the function returns None

print(greet_user()) # print to see it evaluates to None, because it has no return statement

Hello, user!
Today's date is 2023-09-22
None


A simple statement of "return" returns from the function with a value of None, but it's optional since the function will return with a "None" value without it as well.

In [None]:
def greet_user():
  print("Hello, user!")
  print("Today's date is " + str(date.today()))
  return

greet_user()

Hello, user!
Today's date is 2023-08-16


## Multiple return values

It's possible for a function to have multiple return values.  The return statement should separate the different return values with commas, and where the function is called, comma-separated variables can have these multiple values assigned to them.  (The program thinks of the return value as a single tuple.)

In [None]:
def longest_customer_name(list_of_names):
    # Find the longest customer name, and how long it is
    # (maybe so we can display the names nicely later)
    longest_len = 0
    longest_name = ""
    for n in list_of_names:
        if len(n) > longest_len:
            longest_len = len(n)
            longest_name = n
    # Steps through the list once and returns both values,
    # instead of needing two separate functions
    return longest_name, longest_len
# If you don't need one of the two values, assign it to _ instead
name, length = longest_customer_name(['Alice', 'Bob', 'Cassia'])
print(name)
print(length)

Cassia
6
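
The sketch below shows that the "multiple" return values really do arrive as one tuple, and how _ can stand in for a value you don't need.  The function first_and_last is a made-up example, not part of the lecture code.

```python
def first_and_last(items):
    # Returns the first and last items of a non-empty list.
    return items[0], items[-1]

result = first_and_last(['Alice', 'Bob', 'Cassia'])
print(result)        # ('Alice', 'Cassia')
print(type(result))  # <class 'tuple'>

# Use _ for a value you don't need.
_, last = first_and_last(['Alice', 'Bob', 'Cassia'])
print(last)          # Cassia
```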


# Exercise (5 min)

For a list of numbers, we might often care about the min, the mean, and the max.  Write a function min_mean_max() that takes a list and returns all three values, using min(), statistics.mean(), and max().  You can assume the list isn't empty.

In [None]:
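from statistics import mean

def min_mean_max(n):
  # Use names other than min, max, and mean for the local variables.
  # Assigning to those names inside the function makes them local,
  # which hides the built-ins and raises UnboundLocalError.
  lowest = min(n)
  highest = max(n)
  average = mean(n)
  return lowest, average, highest  # return the values rather than print them

min_mean_max([1,2,3,4,5])

(1, 3, 5)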

In [None]:
from statistics import mean

def min_mean_max(nlist):
  return(min(nlist), mean(nlist), max(nlist))

min_mean_max([1,2,3,4,5])


(1, 3, 5)

# Functions Day 2

*On the plane to the Democratic Republic of Congo, Cynthia repeatedly checked her phone for messages from Kathleen.  Nothing, nothing, nothing.*

*Until, finally - a message!  From Kathleen!  Cynthia opened it and ... "What is this crazy nonsense you sent me???" Kathleen wrote.  "What am I even looking at???"*

*Maybe I could have done more to organize my code, Cynthia thought ruefully.  Having my hard work dismissed as nonsense ... kind of sucks.*

## Multiple return statements

A function can contain return statements at several points, although some programmers prefer a single return statement where that keeps the code clear.  As soon as a return statement is reached and evaluated, the function exits, and any lines further down aren't run.

In [None]:
def count_items(lst):
    # Count items but warn if the list is empty
    if (len(lst) == 0):
        print('Warning: empty list passed to count_items!')
        return 0
    print("We don't get here with an empty list")
    return len(lst)

count_items([])

One reason to return early could be that the function found something it was looking for - and there's no need to look any further.  The final return statement could be the behavior for the case where nothing is found.

In [None]:
def is_prime(n):
    if n < 2:             # 0, 1, and negative numbers are not prime
        return False
    for i in range(2, n): # Look for a divisor
        if n % i == 0:    # i divides n evenly, no remainder
            return False
    return True           # didn't find a divisor

print(is_prime(11))
print(is_prime(4))

## Functions calling functions

You can define functions that call other functions that you've written.  In a big project, there could be several levels of hierarchy to your code, with function A calling function B calling function C.

In [None]:
# Repeat these functions because it's day 2
# and we haven't run their cells for a while.
# Variables created inside a function are local, not global.
def longest_customer_name(list_of_names):
    # Find the longest customer name, and how long it is
    # (maybe so we can display the names nicely later)
    longest_len = 0
    longest_name = ""
    for n in list_of_names:
        if len(n) > longest_len:
            longest_len = len(n)
            longest_name = n
    return longest_name, longest_len

def count_matches(to_match, my_list):
  # Counts how many times to_match appears in my_list
  count = 0
  for m in my_list:
    if to_match == m:
      count += 1
  return count

def count_longest_name(list_of_names):
    # Count how many times the longest name appears in the list
    # Makes use of functions defined above
    word, length = longest_customer_name(list_of_names)
    return count_matches(word,list_of_names)

count_longest_name(['Alice','Bob','Catherine','Catherine'])


2

# Exercise (5 min)

Write a function all_names_short_enough that takes as arguments a list of strings and an integer character limit, and returns True only if all names have at most that many characters.  Thus all_names_short_enough(['Alice', 'Bob'], 3) would return False, but passing 5 as the second argument would make it True.

Choose one of the following ways to do this:  either iterate through the names and quit early if a too-long name is encountered; or just call longest_customer_name(), defined above, and use one of its results to decide whether the list qualifies.

In [None]:
def all_names_short_enough1(names, limit):
    for name in names:
        if len(name) > limit:
            return False
    return True

def all_names_short_enough2(names, limit):
    name, length = longest_customer_name(names)
    return length <= limit

print(all_names_short_enough1(['Alice', 'Bob'], 3))
print(all_names_short_enough1(['Alice', 'Bob'], 5))
print(all_names_short_enough2(['Alice', 'Bob'], 3))
print(all_names_short_enough2(['Alice', 'Bob'], 5))


# Scope and local variables

All the variables created in a function, including the arguments, are no longer accessible once the function returns.  Memory that was used only by those variables can then be cleaned up and made available again.  This helps reduce bugs, because if a function "makes a mess" by creating many different variables as it executes, there's no way the code outside the function can accidentally look at a value that was intended just for the function.


In [None]:
def add5(arg):
    b = arg + 5
    return b

print(add5(7))  # Prints 12
try:
    print(b)    # b existed only inside add5
except NameError as err:
    print(err)  # name 'b' is not defined
# arg is likewise unknown outside the function

This is also an example of "encapsulation," the principle that the user of a function shouldn't need to know how it was implemented. You assume that as long as you know the inputs, outputs, and that it works, you don't need to know exactly how it works.  If you needed to know the names of a lot of variables that get modified as the function works, that wouldn't be encapsulated.
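
One way to see encapsulation in action is to swap the inside of a function without changing how it is called.  The sketch below gives two versions of count_matches; the names count_matches_loop and count_matches_builtin are made up here to keep them apart.

```python
def count_matches_loop(to_match, my_list):
    # Counts matches with an explicit loop and a local counter.
    count = 0
    for m in my_list:
        if to_match == m:
            count += 1
    return count

def count_matches_builtin(to_match, my_list):
    # Same job, delegated to the list's own count() method.
    return my_list.count(to_match)

print(count_matches_loop(5, [5, 6, 7, 5]))     # 2
print(count_matches_builtin(5, [5, 6, 7, 5]))  # 2
```

A caller who only knows the inputs and the output cannot tell which version is running, and never needs to know that one of them has a variable called count.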


While the program "forgets" variables when it leaves functions, it's aware of variables outside the function while the function is executing.  Variables declared outside all functions are called "global variables," and they can be accessed from inside functions.  But it's better style to pass in the needed values as arguments, rather than using global variables.

Newcomers to functions often don't understand why pattern A in the next code box is better than pattern B.  They both theoretically get the job done, but the second one uses a *global variable* that breaks encapsulation.  In big code bases, hunting for where a global variable was defined, and trying to determine what might change it, is not sustainable.

In [None]:
def pattern_a(price, tax):
  return price * (1 + 0.01 * tax)  # Everything we need is in the arguments - good

tax = 20 # Global variable - this is worse style
def pattern_b(price):
  return price * (1 + 0.01 * tax) # Works, but less flexible, hard to debug

print(pattern_a(100,20))
print(pattern_b(100))

120.0
120.0


# Shadowing

A variable declared in a function can "shadow" a variable that lives outside the function with the same name, sharing its name and preventing access to the outside variable until the function is done.  Again, this means users of your function don't need to worry about what you named your variables.  Below, the local copy of *a* shadows the value of *a* that lives outside the function.

In [None]:
def add_two(my_number):
  # Adds two to the argument.
  a = my_number + 2
  print("a is " + str(a) + " inside add_two")
  return a

a = 5
# The function has its own a when called, even though an a with the same name exists outside
print("add_two(2) is " + str(add_two(2))) # local "a" set to 4
print("a is " + str(a) + " outside add_two")

a is 4 inside add_two
add_two(2) is 4
a is 5 outside add_two


Shadowing often happens with arguments, because the name of the argument is what we wanted to call the variable at the top level, too.  But the local variable in the function and the one at the top level are two different variables.

In [None]:
my_list = ['a','b','c']

def concatenate_all(my_list):
    out = ''
    for item in my_list:
        out += item
    return out

print(concatenate_all(['d','e'])) # ['d','e'] is called my_list in the function
print(concatenate_all(my_list))  # my_list is still a,b,c
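
The "two different variables" point can be checked directly.  In this sketch, the made-up function replace_list assigns a brand new list to its argument name, and the top-level my_list is unaffected.  It only covers assigning a new value to the name, not changing a list in place.

```python
my_list = ['a', 'b', 'c']

def replace_list(my_list):
    # Rebinds the local name only; the top-level my_list is untouched.
    my_list = ['x', 'y']
    return my_list

print(replace_list(my_list))  # ['x', 'y']
print(my_list)                # ['a', 'b', 'c']
```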

# Refactoring

It might not be obvious at first what parts of code need to be broken up into functions.  You may well end up writing a piece of code only to look back and say, "Hmm, that could have been more concise with functions."  If you did copy-paste any code in the course of writing it, that might be a good signal that the code could use some reorganization.

"Refactoring" is simply the act of trying to clean up the breakdown of the code into functions - probably by turning non-function code into functions, but perhaps also cleaning up which functions do what.

Here is some code that could use a cleanup:

In [None]:
names = ["alice", "BOB", "Catherine", "Donovan"]
standardized_names = []
for name in names:
    name = name.capitalize() # Capitalize first letter, lowercase the rest
    if len(name) > 5:
        name = name[0:5]
    standardized_names.append(name)
jobs = ['firefighter', 'LIBRARIAN', 'Pilot', 'teacheR']
standardized_jobs = []
for job in jobs:
    job = job.capitalize()
    if len(job) > 3:
        job = job[0:3]
    standardized_jobs.append(job)
print(standardized_names)
print(standardized_jobs)


The familiarity of the second loop should clue us in to the repetitiveness.  The string length limit is different, but that's easily addressed with an argument.

So instead, we can turn this into a function.  We can get new, more readable code like this:

In [None]:
def standardize_strings(string_list, char_limit):
    out = []
    for s in string_list:
        s = s.capitalize()
        if len(s) > char_limit:
            s = s[0:char_limit]
        out.append(s)
    return out

standard_names = standardize_strings(names, 5)
standard_jobs = standardize_strings(jobs, 3)
print(standard_names)
print(standard_jobs)

This is a fairly small cleanup job, but it shows how the code is now a little more concise, a little more readable, a little more debuggable, and maybe usable elsewhere.

# Exercise (2 min)

Whups, we actually wanted to add a # character to the end of any string that was truncated because it was too long, for both the names and the jobs.  Change the standardize_strings() code below to have this functionality.  Was that easier than changing the original code in every place there was truncation?

In [None]:
def standardize_strings(string_list, char_limit):
    out = []
    for s in string_list:
        s = s.capitalize()
        if len(s) > char_limit:
            s = s[0:char_limit]
        out.append(s)
    return out

standard_names = standardize_strings(names, 5)
standard_jobs = standardize_strings(jobs, 3)
print(standard_names)
print(standard_jobs)
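
One possible answer, offered as a sketch rather than the official solution: it assumes the # goes after the truncated text, so a marked string ends up one character longer than char_limit.

```python
def standardize_strings(string_list, char_limit):
    out = []
    for s in string_list:
        s = s.capitalize()
        if len(s) > char_limit:
            s = s[0:char_limit] + '#'  # Mark truncated strings
        out.append(s)
    return out

names = ["alice", "BOB", "Catherine", "Donovan"]
jobs = ['firefighter', 'LIBRARIAN', 'Pilot', 'teacheR']
print(standardize_strings(names, 5))  # ['Alice', 'Bob', 'Cathe#', 'Donov#']
print(standardize_strings(jobs, 3))   # ['Fir#', 'Lib#', 'Pil#', 'Tea#']
```

Only one line changed, and both the names and the jobs picked up the new behavior.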

# Pseudocode and functions

Pseudocode is code that is written in a style closer to English than any particular programming language.  It's meant for human readers instead of being parsed by machine.  Sketching out a program in pseudocode ahead of time can help identify what would make a good function in the program.



Here is some pseudocode for the top level of a program to analyze some data:
```
open and read the customer purchase datafile
for every row in the data:
    if it's a new customer:
        create a new customer record
    add customer's purchases to database
    if customer is eligible for rewards:
        add customer to rewards list
    add purchases to database
sort all purchases in database by popularity
remove least popular items from database
predict for each customer what they will buy
return customers, rewards list, predictions
```

Here are some benefits of writing pseudocode:

* You can show your pseudocode to another programmer, and they can catch bugs, cases you're not handling, or errors in your thinking before you ever write any real code.

* From a pseudocode outline, it's easy to identify what high-level functions need to be written.  On a team, you can then break up the work by delegating the writing of these functions.

* When writing your own code, you may find it useful to write pseudocode in comments, then gradually replace it with actual code.

* We will later study algorithms, which are step-by-step procedures for solving a problem, often designed to be fast.  These are traditionally described in pseudocode instead of a specific language like Python, so that they clearly don't depend on language-specific details.
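
The "pseudocode in comments" habit might look like the sketch below, partway through being filled in.  The function name summarize_purchases and the (customer, item) row format are assumptions made for this example; the pseudocode above doesn't specify them.

```python
def summarize_purchases(rows):
    # rows: list of (customer, item) tuples - an assumed format
    customers = []
    purchases = []
    for customer, item in rows:
        # if it's a new customer: create a new customer record
        if customer not in customers:
            customers.append(customer)
        # add customer's purchases to database
        purchases.append(item)
    # TODO (still pseudocode): sort all purchases by popularity
    # TODO (still pseudocode): remove least popular items
    return customers, purchases

print(summarize_purchases([('Alice', 'tea'), ('Bob', 'tea'), ('Alice', 'jam')]))
# (['Alice', 'Bob'], ['tea', 'tea', 'jam'])
```

The comments that still read like English mark the work that's left, and each one is a candidate for its own function.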

# Comment conventions

There is a standard way to comment Python functions, called a docstring.  It's probably overkill for very simple functions, but it's a good habit to get into, and whoever needs to work with your code next, whether a grader, a coworker, or your boss, will be happy that everything is well-commented.

A docstring is a multiline string surrounded by three double-quotes, placed immediately after the function header:

In [None]:
def get_first_letter(word):
  """ Returns the first letter of a string.

  word (str):  The string to get the letter from.

  A simple function just for demo purposes.  Probably
  not useful since get_first_letter(word) takes more
  characters to type than word[0].
  """

  return word[0]

The first line of the multiline comment should be a quick description of what the function does.  After some space, there's then a description of every argument and its expected type.  Below that is anything else a programmer ought to know.
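
Unlike a # comment, this string is stored with the function (Python calls it the function's docstring), so it can be read back later.  A minimal sketch, using a shortened version of the example above:

```python
def get_first_letter(word):
    """Returns the first letter of a string.

    word (str):  The string to get the letter from.
    """
    return word[0]

# The docstring is available as the function's __doc__ attribute.
print(get_first_letter.__doc__)
```

Calling help(get_first_letter) in a notebook cell shows the same text along with the function's header.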

# Tests

It's a good software engineering principle to write tests for every function that you write.  This can use dedicated testing tools for your language, or it can just consist of writing function calls and comparing the results to what they should be.

In [None]:
# Remember to run the corresponding cells to define these functions
print(get_first_letter("Shibboleth") == "S")
print(pattern_a(100,20) == 120)
print(count_matches("A",[]) == 0)
print(count_matches("A", ["A","A","A"]) == 3)

True
True
True
True


In tests, you want to "kick the tires" of your function and determine not only whether it works under ideal conditions, but also in the toughest of corner cases.  For count_matches, I could test whether the function does the right thing for empty lists, or for longer lists where all the items match.

We won't adopt any official tools for testing in this class.  But I suggest that for any major piece of code, you have a cell with test calls that make it easy to tell whether the code is doing the right thing.
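
Such a test cell could be sketched with Python's assert statement, which does nothing when a check passes and stops with an AssertionError when one fails.  This is one possible layout, not a class requirement.  The functions are repeated so the cell runs on its own, and math.isclose is used for the decimal result because floating-point arithmetic can be off by a tiny amount.

```python
import math

def count_matches(to_match, my_list):
    # Counts how many times to_match appears in my_list
    count = 0
    for m in my_list:
        if to_match == m:
            count += 1
    return count

def pattern_a(price, tax):
    return price * (1 + 0.01 * tax)

assert count_matches("A", []) == 0                 # corner case: empty list
assert count_matches("A", ["A", "A", "A"]) == 3    # every item matches
assert count_matches("A", ["B", "C"]) == 0         # nothing matches
assert math.isclose(pattern_a(100, 20), 120)       # compare floats approximately
print("All tests passed")
```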