# Python Functions

Functions address two needs:

- Complexity. A good function is a tidy chunk that we can refer to by name without getting bogged down in the details. We can design, develop, and debug separate functions far more easily than big balls of mud.

- Reuse. Once a function is written, we can call it from many places instead of copying the same code.

A Python function is a construct which takes in input (commonly called the “arguments” to the function), runs some code, and produces a result.

We’ve seen built-in Python functions several times before in this course – print(), input(), and help() are all functions.

Now let's create our own function:

In [2]:
# define a function:
def first_function(name):
    print("Hello, {0}".format(name))


In [3]:
#use the function:
first_function('ABC')

Hello, ABC


In [4]:
def print_all_the_names(name_list):
    for i in name_list:
        print(i)

In [4]:
print_all_the_names(['jack','jason','kate','mike'])

jack
jason
kate
mike


You can also make a function provide an output, using the **return** statement:

In [13]:
def sum_function(number_list):
    running_total = 0
    for number in number_list:
        running_total  = running_total + number
    return running_total

In [14]:
sum_function([1,3,4,6,9,0,2,66,77])

168

In [9]:
#You can return multiple things from one function:
def sum_avg_function(number_list):
    running_total = 0
    for number in number_list:
        running_total = running_total + number
    avg= float(running_total)/len(number_list)
    return running_total,avg

In [10]:
sum_avg_function([1,3,4,6,9,0,2,66,77])  #the returned values are in a tuple object.

(168, 18.666666666666668)
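
Because the two values come back as a tuple, they can be unpacked straight into separate names. A minimal sketch that repeats the sum_avg_function definition from above; the names total and average are only illustrative:

```python
def sum_avg_function(number_list):
    """Return the sum and the average of a list of numbers."""
    running_total = 0
    for number in number_list:
        running_total = running_total + number
    avg = running_total / len(number_list)
    return running_total, avg

# Unpack the returned tuple into two variables in one step.
total, average = sum_avg_function([1, 3, 4, 6, 9, 0, 2, 66, 77])
print(total)    # 168
print(average)
```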

Three ways to end the processing of a function:

- A return expression statement. The value of the expression is the result of the function.
- An empty return statement. The function processing is done. There’s no explicitly returned value. The special object None will be the result. The print() function works this way.
- The end of the indented block inside the def statement. This is equivalent to a return statement with no expression.


In [15]:
def sum_function(number_list):
    running_total = 0
    for number in number_list:
        running_total  = running_total + number
    return running_total # A return expression statement.

In [16]:
a=sum_function([1,3,4,6,9,0,2,66,77])

In [18]:
a

168

In [19]:
def sum_function(number_list):
    running_total = 0
    for number in number_list:
        running_total  = running_total + number
    return 

In [24]:
a=sum_function([1,3,4,6,9,0,2,66,77])

In [21]:
a is None

True

In [28]:
def sum_function(number_list):
    running_total = 0
    for number in number_list:
        running_total  = running_total + number
    print(running_total)

In [30]:
a=sum_function([1,3,4,6,9,0,2,66,77])

168


In [32]:
a

Functions can take an arbitrary number of parameters (including no parameters at all). Here’s a typical function with three parameters.

In [35]:
def print_customer_info(id, user, income):
    print("Customer {0}: name is {1}, has ${2} income".format(id, user, income))

In [36]:
print_customer_info(2223,"Steve", 2000)

Customer 2223: name is Steve, has $2000 income


We can provide concrete argument values to this function a number of different ways.

- By position
- By parameter name
- A mixture of the two

The most common way to provide values to functions is via positional arguments. But when a function has many parameters, or the order of the parameters isn’t obvious, we often want to provide the argument values using the parameter name as a keyword.

When we assign arguments via their keywords, we can provide the arguments in any order.
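
As a small illustration, here is a call that passes every argument by keyword, in a different order from the definition. The function is the three-parameter print_customer_info shown earlier, repeated so the sketch runs on its own:

```python
def print_customer_info(id, user, income):
    print("Customer {0}: name is {1}, has ${2} income".format(id, user, income))

# All three arguments are passed by keyword, in a different order
# from the one used in the function definition.
print_customer_info(income=2000, id=2223, user="Steve")
```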

If we want, we can mix positional and keyword argument values. We must **provide the positional arguments first,** and they will be matched against the defined parameters in order. After the positional arguments, we can provide keyword arguments.

In [69]:
print_customer_info(222,  # positional
                    user="Steve", income=2000)  # keyword

Customer 222: name is Steve, has $2000 income


Python supports the use of optional parameters by allowing us to provide default values.

**The default value for a parameter is denoted by an equals sign.** In the example below, we provide a default value for each of the three parameters.

When there’s a default value for a parameter, it means that an argument value is not required. The value given to each keyword argument in the function definition is a default value that is used if an actual argument value isn’t specified when the function is called.

In [38]:
def print_customer_info(id=111, user='ABC', income=1000):
    print("Customer {0}: name is {1}, has ${2} income".format(id, user, income))

In [40]:
print_customer_info(income=2000)

Customer 111: name is ABC, has $2000 income


**Mandatory and optional parameters can be used in the same function,** but all mandatory parameters must come before any optional parameters.
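
To see what happens when the order is reversed, the sketch below asks Python to compile a deliberately wrong definition held in a string. The function name bad_order is hypothetical, and the exact wording of the error message varies between Python versions:

```python
# Hypothetical definition with an optional parameter before a mandatory one.
bad_definition = "def bad_order(id=111, user):\n    pass\n"

try:
    compile(bad_definition, "<example>", "exec")
    definition_accepted = True
except SyntaxError as error:
    definition_accepted = False
    print("SyntaxError:", error.msg)
```

Python rejects the definition before any code runs, which is why the string and compile() are used here instead of a plain def.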

In [74]:
def print_customer_info2(id, user='ABC', income=1000):
    print("Customer {0}: name is {1}, has ${2} income".format(id, user, income))

In [76]:
print_customer_info2(id=999,income=2000)

Customer 999: name is ABC, has $2000 income


Another benefit of defining functions is that each function has its own scope. Broadly, this means that **a variable defined inside a function stays inside that function.**

This is hugely important because you’ll often define a function far away from the code that calls it (either somewhere else in the script, or even in another file entirely), and scoping ensures that the variable names you choose inside the function won’t clash with names used elsewhere.

In [77]:
def change_name():
    name = "Leo Li"
    print(name)

In [79]:
name = "Ken"
print(name)

Ken


In [80]:
change_name()

Leo Li


In [81]:
print(name)

Ken


## Lambda

You can make a Lambda object instead of a function to do something pretty simple – something that’s essentially just an expression, not even a full line of code.

In [82]:
def math(x, y):
    return x*y+x/y

In [83]:
math_lambda = lambda x,y: x*y+x/y

In [84]:
math_lambda(8,3)

26.666666666666668

The lambda object, lambda x,y: x*y+x/y is assigned to a variable named math_lambda. 

**A lambda object is effectively a function.** It’s a very small function with a body that’s no more complex than a single expression.

Notice the transformation from full function definition to lambda

def name( args ): 
    return expression

--> transformation-->

lambda args: expression

In [85]:
data = [('ON', 12), ('QC', 36), ('BC', 99)]
data.sort()
print(data)

[('BC', 99), ('ON', 12), ('QC', 36)]


In [41]:
data = [('ON', 112), ('QC', 36), ('BC', 99)]
data.sort(key=lambda item: item[1])
print(data)

[('QC', 36), ('BC', 99), ('ON', 112)]


The first example sorts a list using default comparison rules. This puts the items in order using the first item of each tuple.

The second example sorts the list using a key function. The key function is a lambda object, lambda item: item[1]. This object has a very simple expression to pick the second item of each tuple for comparison.
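
A key function does not have to be a lambda. The sketch below uses a small named function (second_item is a hypothetical helper) and the built-in sorted(), which returns a new list instead of sorting in place:

```python
data = [('ON', 112), ('QC', 36), ('BC', 99)]

def second_item(item):
    """Return the second element of a tuple (hypothetical helper)."""
    return item[1]

# sorted() returns a new list and leaves the original unchanged.
by_function = sorted(data, key=second_item)
by_lambda = sorted(data, key=lambda item: item[1])

print(by_function)
print(by_function == by_lambda)   # True
```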

To make your programs easier for others to understand, Python functions support explanatory text called **docstrings**, which you should almost always use:

In [44]:
def docstring():
    '''This is a docstring.
    It describes what the function does.

    Arguments:
    None

    Returns:
    None
    '''
    return None

Note that the docstring is a multiline string, denoted by three quotes '''. Note also that the docstring of a function is actually what **help()** displays, so by writing good docstrings you automatically make them accessible through the interpreter! You’ll see more examples of docstrings in the exercises for this section and in future sections – indeed, most of the exercises will have you coding functions from now on.

In [45]:
help(docstring)

Help on function docstring in module __main__:

docstring()
    This is a docstring.
    It describes what the function does.
    
    Arguments:
    None
    
    Returns:
    None
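
Besides help(), the docstring is also available as the function's __doc__ attribute. A small sketch, using a hypothetical add_tax function and an assumed 13% rate purely for illustration:

```python
def add_tax(price, rate=0.13):
    """Return the price with tax added.

    Arguments:
    price -- the price before tax
    rate -- the tax rate as a fraction (default 0.13)

    Returns:
    The price including tax.
    """
    return price * (1 + rate)

# The docstring is stored on the function object itself.
print(add_tax.__doc__)
```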



**General guidelines for how to use functions:**

- A function should do one thing and one thing only

- A function should have fewer than ~20 lines of code

- Most (if not all) of your program should be contained inside functions. This helps to increase the portability of your code

In [49]:
def change_name():
    """this function is used to print name
    
    Arguments:
    None
    
    Return:
    None
    """
    name = "Leo Li"
    print(name)

In [50]:
help(change_name)

Help on function change_name in module __main__:

change_name()
    this function is used to print name
    
    Arguments:
    None
    
    Return:
    None



## Use Packages - Numpy

Numpy is an open-source, third-party Python package. Its chief innovation is the implementation of the ndarray, a multidimensional array object (also simply called ‘Numpy arrays’).

There are three really big advantages to using Numpy arrays:

- They’re much easier to manage than lists when you need a 2+ dimensional array. For example, in a 2D Numpy array every row has the same number of elements, and so does every column. Lists don’t guarantee this.
- A Numpy array will hold data of a single type. This makes it much easier to predict the kind of data you’ll be analyzing. A list can’t guarantee this.
- The underlying code provides very high computational speed when Numpy arrays are used properly.

In [93]:
#load Numpy
import numpy as np
# The as np bit defines an alias, which is a shorthand way of referencing the package
# Some frequently used packages have standard aliases: np is the one for Numpy.

In [95]:
#create a Numpy array:
np_array = np.array([1.,2.,3.])

In [96]:
print(np_array)

[1. 2. 3.]


In [None]:
print(type(np_array))
# You can see that np_array is now a numpy.ndarray object, just as we wanted!
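
The single-type and rectangular-shape guarantees mentioned earlier can be checked directly. This is a minimal sketch; the variable names are only illustrative:

```python
import numpy as np

# Mixed Python types are converted to one common dtype.
mixed = np.array([1, 2, 3.5])
print(mixed.dtype)   # float64

# A 2D array always has a rectangular shape.
grid = np.array([[1, 2, 3], [4, 5, 6]])
print(grid.shape)    # (2, 3)

# A plain list of lists can be ragged without any complaint.
ragged = [[1, 2, 3], [4, 5]]
print([len(row) for row in ragged])   # [3, 2]
```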

A few other common ways to create Numpy arrays:

In [101]:
range_array = np.arange(1,10)  # uniform range
range_array

array([1, 2, 3, 4, 5, 6, 7, 8, 9])

In [106]:
linspace_array = np.linspace(1,9,9)  # another uniform range
linspace_array
#numpy.linspace(start, stop, num=50, endpoint=True, retstep=False, dtype=None, axis=0)
# Return evenly spaced numbers over a specified interval.

array([1., 2., 3., 4., 5., 6., 7., 8., 9.])

In [103]:
zero_array = np.zeros(5)  # an array of zeros
zero_array

array([0., 0., 0., 0., 0.])

In [107]:
one_array = np.ones(5)  # an array of ones
one_array

array([1., 1., 1., 1., 1.])

Two dimensional arrays:

In [110]:
twod_array_fromlist = np.array([[1, 2, 3], [4, 5, 6]])
twod_array_fromlist

array([[1, 2, 3],
       [4, 5, 6]])

In [111]:
twod_array_zeros = np.zeros((5, 5))
twod_array_zeros

array([[0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.]])

Note that numpy arrays are implicitly in “row-major order”. The structure is a list of rows. Within each row is a list of column values.
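
One way to see the row-major layout is to loop over a 2D array and then flatten it. A short sketch, assuming the default (C-style) ordering:

```python
import numpy as np

grid = np.array([[1, 2, 3], [4, 5, 6]])

# Iterating over a 2D array yields one row at a time.
for row in grid:
    print(row)

# Flattening walks along each row in turn (row-major order).
flat = grid.ravel()
print(flat)   # [1 2 3 4 5 6]
```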

You can also reshape an existing array:

In [117]:
twod_array_fromlist.reshape(3,2)

array([[1, 2],
       [3, 4],
       [5, 6]])

In [120]:
five_x_five = np.arange(25)
five_x_five

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
       17, 18, 19, 20, 21, 22, 23, 24])

In [121]:
five_x_five.reshape(5,5)
#The reshape(5, 5) turns a 25-element 1D array into a 2D array.

array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14],
       [15, 16, 17, 18, 19],
       [20, 21, 22, 23, 24]])

Some other useful attributes of numpy arrays:

In [122]:
test_array = np.arange(10).reshape(5,2)
print(test_array)

[[0 1]
 [2 3]
 [4 5]
 [6 7]
 [8 9]]


In [123]:
print(test_array.ndim)

2


In [124]:
print(test_array.shape)

(5, 2)


In [125]:
print(test_array.size)

10


In [126]:
print(test_array.dtype)

int64


The value of the ndim attribute is the number of dimensions. This is also the length of the shape attribute. The size attribute is the product of the shape. The underlying data type here is a 64-bit integer; the default integer type can vary by platform.
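
These relationships between the attributes can be verified in a few lines. The sketch also shows that a dtype can be requested explicitly; int32 is chosen only as an example:

```python
import numpy as np

test_array = np.arange(10).reshape(5, 2)

# ndim equals the number of entries in shape.
print(test_array.ndim == len(test_array.shape))

# size equals the product of the entries in shape.
print(test_array.size == np.prod(test_array.shape))

# The dtype can be requested explicitly when the array is created.
small_ints = np.arange(10, dtype=np.int32)
print(small_ints.dtype)   # int32
```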

**Indexing and Slicing for Numpy array:**

Indexing and slicing Numpy arrays works almost exactly the same as for other sequences, such as lists:

In [127]:
test_array = np.arange(20)
print(test_array)

[ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19]


In [128]:
print(test_array[5])

5


In [129]:
print(test_array[-1])

19


In [130]:
print(test_array[3:6]) #half-open end

[3 4 5]


In [131]:
print(test_array[3::2])

[ 3  5  7  9 11 13 15 17 19]


We can also slice and dice a multidimensional array:

In [146]:
list_of_lists = [ [1,2,3], [4, 5, 6] ]
array_of_lists = np.array(list_of_lists)
array_of_lists

array([[1, 2, 3],
       [4, 5, 6]])

In [148]:
print(list_of_lists[1][2]) # the third element in the second list

6


In [149]:
print(array_of_lists[1,2]) # the third element in the second row

6


In [150]:
print(array_of_lists[1][2])

6


Note the use of the comma, instead of multiple square brackets. Both variations work when we’re using numpy. In pure Python, only the first version (using brackets) will work.
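
To make the difference concrete, the sketch below tries the comma form on both objects and catches the error that the plain list raises:

```python
import numpy as np

list_of_lists = [[1, 2, 3], [4, 5, 6]]
array_of_lists = np.array(list_of_lists)

# The comma form works on the array ...
print(array_of_lists[1, 2])

# ... but a plain list rejects a tuple index.
try:
    list_of_lists[1, 2]
    list_accepts_comma = True
except TypeError as error:
    list_accepts_comma = False
    print("TypeError:", error)
```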

We mentioned that 2d arrays are in “row-major order”. Here’s what that means:

In [151]:
array_of_lists[0]

array([1, 2, 3])

When we give a partial set of indices, we’re slicing one row from the array.

To slice a column from the array, we **need to specify which rows we want.**

In [152]:
array_of_lists[:,0]
# The : for the row index means “all”

array([1, 4])

In [161]:
# a slightly more complex manipulation:
# (named input_data so that the built-in input() function is not shadowed)
input_data = np.array([
        [1, 6],
        [2, 5],
        [3, 7],
        [4, 10]
        ])

In [155]:
m = len(input_data)
m

4

In [160]:
x = np.array([np.ones(m), input_data[:, 0]]).T
x

array([[1., 1.],
       [1., 2.],
       [1., 3.],
       [1., 4.]])

Use np.ones(m) to produce an array of m one values.

Use input_data[:, 0] to slice off the first column of the input.

Putting them together with np.array([np.ones(m), input_data[:, 0]]) gives two rows instead of two columns.

The .T attribute transposes (or “pivots”) the two rows to become two columns.
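
An alternative to building two rows and transposing is to stack the 1D arrays as columns directly with np.column_stack. This is a sketch of that variation, not the approach used above; the name samples is illustrative:

```python
import numpy as np

# Same four rows as the example above; the name `samples` is illustrative.
samples = np.array([[1, 6], [2, 5], [3, 7], [4, 10]])
m = len(samples)

# Transposing a two-row array ...
via_transpose = np.array([np.ones(m), samples[:, 0]]).T

# ... gives the same result as stacking the two 1D arrays as columns.
via_column_stack = np.column_stack((np.ones(m), samples[:, 0]))

print(np.array_equal(via_transpose, via_column_stack))   # True
```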

You can use numpy arrays of **Boolean values to filter data,** which you can’t do with lists:

In [162]:
bool_list = [True,True,False,True,True]
data_list = [1,2,3,4,5]
data_list[bool_list]

TypeError: list indices must be integers or slices, not list

In [163]:
bool_array = np.array(bool_list)
data_array = np.array(data_list)
print(data_array[bool_array])

[1 2 4 5]


In [164]:
data_array = np.arange(10)
print(data_array[data_array > 5])

[6 7 8 9]


In [166]:
data_array[data_array > 5] = -999
print(data_array)

[   0    1    2    3    4    5 -999 -999 -999 -999]


For Numpy arrays, use **& (and)** and **| (or)** to combine two logical predicates into a more complex logical value:

In [170]:
data_array = np.arange(10)
filtered_data_array_and = data_array[(data_array >= 5) & (data_array < 9)] # you can not use 'and'
print(filtered_data_array_and)

[5 6 7 8]


In [169]:
filtered_data_array_or = data_array[(data_array <= 5) | (data_array >= 9)]
print(filtered_data_array_or)

[0 1 2 3 4 5 9]
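
The reason the and keyword cannot be used is that it needs a single True or False, while each comparison produces a whole array of Booleans. A short sketch of both behaviours (the mask names are illustrative):

```python
import numpy as np

data_array = np.arange(10)

# Each comparison produces a whole array of Booleans.
at_least_5 = data_array >= 5
below_9 = data_array < 9

# `&` combines the two masks element by element.
print(data_array[at_least_5 & below_9])   # [5 6 7 8]

# `and` needs a single True/False value, so NumPy refuses to guess.
try:
    at_least_5 and below_9
    and_worked = True
except ValueError as error:
    and_worked = False
    print("ValueError:", error)
```

The parentheses around each comparison in the examples above also matter, because & binds more tightly than comparison operators such as >=.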


**Math with Numpy Arrays:**

In [172]:
x_array = np.arange(10)
pi = 3.14
print(x_array*pi)

[ 0.    3.14  6.28  9.42 12.56 15.7  18.84 21.98 25.12 28.26]


In [173]:
y_array = np.arange(10,20)
print(x_array+y_array)

[10 12 14 16 18 20 22 24 26 28]


By default Numpy array math happens elementwise, which means that operations are performed on elements from each array that have the same index value.

That makes sense when the two arrays have the same shape, but what happens if they don’t? We’ve actually already seen the answer in action when we printed x_array*pi – Numpy ‘scales up’ the pi variable to the size of x_array, then does the math.

This process is called **broadcasting.** 

In [176]:
some_data = np.arange(10)
more_data = np.arange(3,33,3)
more_data

array([ 3,  6,  9, 12, 15, 18, 21, 24, 27, 30])

In [178]:
some_data + more_data

array([ 3,  7, 11, 15, 19, 23, 27, 31, 35, 39])

**General Broadcasting Rules:**

When operating on two arrays, NumPy compares their shapes element-wise. It starts with the trailing dimensions, and works its way forward. Two dimensions are compatible when

- they are equal, or
- one of them is 1

If these conditions are not met, a ValueError: operands could not be broadcast together exception is raised, indicating that the arrays have incompatible shapes.
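
The rule can be written out as a small pure-Python function. This is a sketch for illustration, not NumPy's own implementation: broadcast_shape is a hypothetical helper, and it treats a dimension that is missing from the shorter shape as 1.

```python
import numpy as np

def broadcast_shape(shape_a, shape_b):
    """Return the broadcast result shape, or None if incompatible.

    Hypothetical helper that mirrors the two rules listed above.
    """
    result = []
    # Walk both shapes from the trailing dimension towards the front.
    for i in range(1, max(len(shape_a), len(shape_b)) + 1):
        dim_a = shape_a[-i] if i <= len(shape_a) else 1  # missing dims count as 1
        dim_b = shape_b[-i] if i <= len(shape_b) else 1
        if dim_a == dim_b or dim_b == 1:
            result.append(dim_a)
        elif dim_a == 1:
            result.append(dim_b)
        else:
            return None
    return tuple(reversed(result))

print(broadcast_shape((4, 1), (5,)))   # (4, 5)
print(broadcast_shape((4,), (3, 4)))   # (3, 4)
print(broadcast_shape((4,), (5,)))     # None

# Cross-check one case against NumPy itself.
print(np.broadcast(np.ones((4, 1)), np.ones(5)).shape)   # (4, 5)
```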

In [179]:
x = np.arange(4)
xx = x.reshape(4,1)
y = np.ones(5)
z = np.ones((3,4))


In [180]:
x.shape

(4,)

In [181]:
y.shape

(5,)

In [182]:
x + y

ValueError: operands could not be broadcast together with shapes (4,) (5,) 

In [183]:
xx.shape

(4, 1)

In [184]:
y.shape

(5,)

In [185]:
xx

array([[0],
       [1],
       [2],
       [3]])

In [186]:
y

array([1., 1., 1., 1., 1.])

In [187]:
xx + y

array([[1., 1., 1., 1., 1.],
       [2., 2., 2., 2., 2.],
       [3., 3., 3., 3., 3.],
       [4., 4., 4., 4., 4.]])

In [188]:
z

array([[1., 1., 1., 1.],
       [1., 1., 1., 1.],
       [1., 1., 1., 1.]])

In [189]:
x + z

array([[1., 2., 3., 4.],
       [1., 2., 3., 4.],
       [1., 2., 3., 4.]])

Numpy also includes a wide variety of useful functions (fully optimized for speed, of course) :

In [193]:
np.random.seed(891)
random_array = np.random.randint(0,30,10)
print(random_array)

[ 9 24 11 14 18 22 14  2 25  3]


In [194]:
print(np.sum(random_array))

142


In [195]:
print(np.mean(random_array))

14.2


In [196]:
print(np.median(random_array))

14.0


In [197]:
print(np.std(random_array)) 

7.743384273042375


In [198]:
print(np.max(random_array))

25


In [199]:
print(np.argmax(random_array))  # The index of the maximum element

8


In [200]:
print(np.min(random_array))

2


In [201]:
print(np.argmin(random_array))  # The index of the minimum element

7


In [202]:
print(np.sort(random_array))

[ 2  3  9 11 14 14 18 22 24 25]


In [204]:
print(np.unique(random_array))

[ 2  3  9 11 14 18 22 24 25]


## Reading and Writing Files – the Basics
In this module, you will learn how to use the built-in file object in Python to read, write, and append to files. This is the essence of persistent data.

To open or create a simple **text file** object you can use:
    
    file = open(filename[, mode[, buffering]])

- filename refers to the name of the file you want to open (as a string; this can be an existing file or a new file that you are creating)

- mode is a string that refers to whether you are reading or writing:

    ‘r’ is for reading.
    
    ‘w’ is for writing. This will remove previous contents
    
    ‘a’ is for append.
    
- Don’t worry about buffering; the defaults are usually adequate.

You can also combine reading and writing by adding a plus sign to the mode. For example, file = open('hello.txt', 'w+') will allow you to read and write to the hello.txt file. This will create the file if it doesn’t exist. It will erase any previous contents if it does exist.
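
The effect of each mode is easiest to see side by side. The sketch below assumes Python 3, where combined read/write modes are spelled with a plus sign such as 'w+', and it works in a temporary directory so that no real files are touched:

```python
import os
import tempfile

# Work in a temporary directory so no real files are touched.
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "hello.txt")

    f = open(path, "w")      # create (or empty) the file
    f.write("first\n")
    f.close()

    f = open(path, "a")      # append to the existing contents
    f.write("second\n")
    f.close()

    f = open(path, "r")
    after_append = f.read()
    f.close()

    f = open(path, "w+")     # read and write; empties the file first
    f.write("third\n")
    f.seek(0)                # move back to the start before reading
    after_w_plus = f.read()
    f.close()

print(repr(after_append))   # 'first\nsecond\n'
print(repr(after_w_plus))   # 'third\n'
```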

Once we have a file object, we’ll use various methods of the object to read or write the text in the file.

To actually read the file object you can use:
    
    file.read([size]) #if size is omitted, reads the entire file
    
    file.readline() #finds newline character and reads a single line
    
To write to the file object:

    file.write("Hi\n") #Note the \n, which is a newline character

Here’s the way we close a file:    
    
    file.close()
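
Here is a short sketch of how read(size), readline(), and read() continue from wherever the previous call stopped. The file name lines.txt is hypothetical and lives in a temporary directory:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "lines.txt")   # hypothetical file name

    f = open(path, "w")
    f.write("one\n")
    f.write("two\n")
    f.write("three\n")
    f.close()

    f = open(path)
    first_two_chars = f.read(2)   # reads exactly two characters
    rest_of_line = f.readline()   # continues to the end of the current line
    everything_else = f.read()    # no size given: read to the end of the file
    f.close()

print(repr(first_two_chars))   # 'on'
print(repr(rest_of_line))      # 'e\n'
print(repr(everything_else))   # 'two\nthree\n'
```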


In [205]:
f=open('hello_world.txt', 'w')
f.write('hello world\n')
f.close() #Don’t forget to close the file when you are done.

**With statement:**

To be sure that a file is closed, we often use the with statement. With this, we no longer need to worry about closing the file when we are done; it is closed automatically when the block ends.

In [207]:
with open("hello_world.txt") as newFile:
    a=newFile.readlines() #Reads each line of a file into a list

In [208]:
a

['hello world\n']
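
The automatic closing can be confirmed with the file object's closed attribute. A minimal sketch, again using a temporary directory:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "hello_world.txt")

    with open(path, "w") as new_file:
        new_file.write("hello world\n")
        open_inside = not new_file.closed   # still open inside the block

    closed_after = new_file.closed          # closed once the block ends

print(open_inside)    # True
print(closed_after)   # True
```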

**Files as Iterables:**

An input file is an iterable object. It behaves like a collection of lines. We can use a file object in a for statement to iterate through the lines of the file. It often looks like this:

In [None]:
count = 0
with open('some_file.txt') as source:
    for line in source:
        # rstrip() returns a copy of the string with trailing whitespace removed
        if len(line.rstrip()) == 0:
            continue
        count += 1
print( "{0} non-blank lines".format(count) )

We’ve filtered the lines of the file by checking the length after stripping whitespace from the end of each line. A zero-length line has no content, so we use the continue statement to skip it. Every other line increments the counter of non-blank lines.

**Read CSV Files:**

To read .CSV files, we use the built-in csv library. We will “wrap” the raw file to create a csv.reader object.

In [216]:
import csv
with open("count.csv","r") as source:
    reader = csv.reader(source)
    for row in reader:
        print(row)

['date', 'count']
['2017-01-01', '20']
['2017-01-02', '21']
['2017-01-03', '22']
['2017-01-04', '23']
['2017-01-05', '24']
['2017-01-06', '25']
['2017-01-07', '26']
['2017-01-08', '27']
['2017-01-09', '28']
['2017-01-10', '29']
['2017-01-11', '30']
['2017-01-12', '31']
['2017-01-13', '32']
['2017-01-14', '33']
['2017-01-15', '34']
['2017-01-16', '35']
['2017-01-17', '36']
['2017-01-18', '37']
['2017-01-19', '38']
['2017-01-20', '39']
['2017-01-21', '40']
['2017-01-22', '41']
['2017-01-23', '42']
['2017-01-24', '43']
['2017-01-25', '44']
['2017-01-26', '45']


This will open the file in “read” mode. This is the default; we can actually omit the "r" option.

We create a reader object using the csv.reader() function. This will handle all details of CSV file parsing. The reader object is an iterable; when we reference it in the for statement, we get a sequence of rows.

Each row will be a list of column values. Remember that lists are indexed from 0, so spreadsheet column “A” will be item zero of the row.

The values will be purely text. If we want to handle values as numbers, we’ll need to do explicit conversion. If column “B” holds integers, we convert each value using int(row[1]).
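
A small sketch of that conversion, using an in-memory io.StringIO object as a stand-in for a file shaped like count.csv (the two data rows are copied from the output above):

```python
import csv
import io

# In-memory stand-in for a small file shaped like count.csv.
source = io.StringIO("date,count\n2017-01-01,20\n2017-01-02,21\n")

reader = csv.reader(source)
header = next(reader)        # skip the heading row
counts = []
for row in reader:
    counts.append(int(row[1]))   # convert the text '20' to the number 20

print(header)        # ['date', 'count']
print(sum(counts))   # 41
```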

**CSV with Pipes:**

A common variation on CSV files is to use a character like | as a delimiter. This is still – technically – a “comma” separated value file; the | is filling the role of the ,

    reader = csv.reader( source, delimiter='|' )
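
For example, the same kind of data separated by pipes could be read like this. The sketch uses an in-memory io.StringIO object in place of a real file:

```python
import csv
import io

# In-memory stand-in for a pipe-delimited file.
source = io.StringIO("date|count\n2017-01-01|20\n2017-01-02|21\n")

reader = csv.reader(source, delimiter='|')
rows = list(reader)
print(rows)
```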

**CSV with headings:**

Some CSV files will have a header line. We can use the header to build a dictionary from each line of input.

It looks like this:

In [219]:
import csv
with open("count.csv","r") as source:
    reader = csv.DictReader(source)
    for row in reader:
        print(row['date'], row['count'])

2017-01-01 20
2017-01-02 21
2017-01-03 22
2017-01-04 23
2017-01-05 24
2017-01-06 25
2017-01-07 26
2017-01-08 27
2017-01-09 28
2017-01-10 29
2017-01-11 30
2017-01-12 31
2017-01-13 32
2017-01-14 33
2017-01-15 34
2017-01-16 35
2017-01-17 36
2017-01-18 37
2017-01-19 38
2017-01-20 39
2017-01-21 40
2017-01-22 41
2017-01-23 42
2017-01-24 43
2017-01-25 44
2017-01-26 45


**Reading and Writing Columnar Data with Numpy:**

Numpy has functions to handle file I/O

In [220]:
array_to_save = np.arange(20).reshape(10,2)
print(array_to_save)

[[ 0  1]
 [ 2  3]
 [ 4  5]
 [ 6  7]
 [ 8  9]
 [10 11]
 [12 13]
 [14 15]
 [16 17]
 [18 19]]


In [221]:
np.savetxt('saved_file.csv',array_to_save,delimiter=',')

In [222]:
loaded_array = np.loadtxt('saved_file.csv',delimiter=',')
print(loaded_array)

[[ 0.  1.]
 [ 2.  3.]
 [ 4.  5.]
 [ 6.  7.]
 [ 8.  9.]
 [10. 11.]
 [12. 13.]
 [14. 15.]
 [16. 17.]
 [18. 19.]]


In [234]:
output_array_1 = np.arange(10)
output_array_2 = np.arange(10,20)
np.savetxt('file_2.txt', np.array([output_array_1,output_array_2]))

In [237]:
output_array_1 = np.arange(10)
output_array_2 = np.arange(10,20)
np.savetxt('file_3.txt', np.transpose([output_array_1,output_array_2]))
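
The two calls above differ only in layout. np.savetxt writes one line per row of the array it is given, so file_2.txt holds 2 lines of 10 values while file_3.txt holds 10 lines of 2 values. A quick sketch that checks the shapes without writing any files:

```python
import numpy as np

output_array_1 = np.arange(10)
output_array_2 = np.arange(10, 20)

# Each 1D array becomes one row ...
as_rows = np.array([output_array_1, output_array_2])

# ... or, after transposing, one column.
as_columns = np.transpose([output_array_1, output_array_2])

print(as_rows.shape)      # (2, 10)
print(as_columns.shape)   # (10, 2)
```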

Loading data into multiple 1D arrays is also easy:

In [239]:
input_array_1,input_array_2 = np.loadtxt('file_3.txt',unpack=True)
print(input_array_1)
print(input_array_2)

[0. 1. 2. 3. 4. 5. 6. 7. 8. 9.]
[10. 11. 12. 13. 14. 15. 16. 17. 18. 19.]
