## Python Functions

Functions address two needs:

- Complexity. A function is a tidy chunk that we can refer to by name without getting bogged down in the details. We can design, develop, and debug separate functions far more easily than big balls of mud.
- Reuse. A function written once can be called as many times as needed, from anywhere in the program.

In [12]:
# define a function:
def first_function(name):
    print("Hello, {0}".format(name))

In [15]:
first_function('Ken')

Hello, Ken


In [16]:
def print_all_the_names(name_list):
    for i in name_list:
        print(i)

In [17]:
print_all_the_names(['jack', 'jason', 'kate', 'mike'])

jack
jason
kate
mike


You can also make a function produce an output by using the `return` statement.

In [27]:
# Create a sum function
def sum_function(number_list):
    running_total = 0
    for number in number_list:
        running_total = running_total + number
    
    return running_total

In [29]:
sum_function([1,2,3,4,5])

15

In [30]:
# You can return multiple things from one function
def sum_avg_function(number_list):
    running_total = 0
    for number in number_list:
        running_total = running_total + number
    avg = float(running_total)/len(number_list)
    return running_total, avg

In [32]:
sum_avg_function([1,3,4,6,9,0,2,66,77]) # the returned values are packed into a tuple

(168, 18.666666666666668)
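Because the two values come back as one tuple, a caller usually unpacks them straight into separate names. A minimal sketch that repeats the `sum_avg_function` definition so the cell runs on its own:

```python
def sum_avg_function(number_list):
    running_total = 0
    for number in number_list:
        running_total = running_total + number
    avg = float(running_total) / len(number_list)
    return running_total, avg  # both values are packed into one tuple

# Tuple unpacking: one name per returned value
total, avg = sum_avg_function([1, 3, 4, 6, 9, 0, 2, 66, 77])
print(total)  # 168
```

Note that this version divides by `len(number_list)`, so calling it with an empty list would raise `ZeroDivisionError`.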

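### Three ways to end the processing of a function

A function stops running when it reaches the end of its body, when it executes a `return` statement, or when it raises an exception. A bare `return` with no value, like reaching the end of the body, hands `None` back to the caller:
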
In [37]:
# The same sum function, but with a bare return: the total is computed and then discarded
def sum_function(number_list):
    running_total = 0
    for number in number_list:
        running_total = running_total + number
    
    return

In [38]:
a = sum_function([1,3,4,6,9,0,2,66,77])

In [39]:
a

In [40]:
a is None

True
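The hypothetical `classify` function below ends in each of these ways depending on its input: an early `return` with a value, falling off the end of the body, or raising an exception. It is an illustration, not code from the original notebook:

```python
def classify(number):
    """Illustrates the different ways a function can stop running."""
    if number < 0:
        # Raise an exception: the caller receives no return value at all
        raise ValueError("number must be non-negative")
    if number == 0:
        # Return with a value: processing stops here
        return "zero"
    # Fall off the end of the body: the caller receives None
    print("positive number:", number)

print(classify(0))  # zero
print(classify(5))  # prints "positive number: 5", then None
```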

#### A function can take any number of parameters (including no parameters at all).

In [41]:
def print_customer_info(id, user, income):
    print("Customer {0}: name is {1}, has ${2} income".format(id, user, income))

In [42]:
print_customer_info(2223, "Steve", 2000)

Customer 2223: name is Steve, has $2000 income


We can supply argument values to this function in several ways:

- By position
- By parameter name
- A mixture of the two

In [45]:
print_customer_info(222, user="Steve", income=2000)

Customer 222: name is Steve, has $2000 income
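The three calling styles can be compared side by side. This sketch uses a hypothetical `format_customer_info` that returns the message instead of printing it, so the results can be checked directly:

```python
def format_customer_info(id, user, income):
    # Same message as print_customer_info, but returned instead of printed
    return "Customer {0}: name is {1}, has ${2} income".format(id, user, income)

by_position = format_customer_info(222, "Steve", 2000)
by_name = format_customer_info(income=2000, user="Steve", id=222)  # any order works
mixed = format_customer_info(222, income=2000, user="Steve")       # positional ones first

print(by_position == by_name == mixed)  # True
```

In a mixed call, positional arguments must come before keyword arguments; writing `format_customer_info(id=222, "Steve", 2000)` is a `SyntaxError`.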


### Parameters can have default values

In [46]:
def print_customer_info(id=111, user='ABC', income=1000):
    print("Customer {0}: name is {1}, has ${2} income".format(id, user, income))

In [47]:
print_customer_info(income=2000)

Customer 111: name is ABC, has $2000 income
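One caution about defaults: a default value is evaluated once, when the `def` statement runs, not on every call. With a mutable default such as a list, changes made in one call are still there in the next. The two functions below are hypothetical illustrations of the problem and the usual `None` sentinel fix:

```python
def add_item(item, basket=[]):  # this default list is created only once
    basket.append(item)
    return basket

print(add_item("apple"))  # ['apple']
print(add_item("pear"))   # ['apple', 'pear'], the same list was reused

def add_item_safe(item, basket=None):
    # Use None as a sentinel and build a fresh list on every call
    if basket is None:
        basket = []
    basket.append(item)
    return basket

print(add_item_safe("apple"))  # ['apple']
print(add_item_safe("pear"))   # ['pear']
```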


### Mandatory and optional parameters can be used in the same function
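A minimal sketch with a hypothetical `greet` function: `name` is mandatory, while `greeting` and `punctuation` are optional because they have defaults. Parameters without defaults must come before those with defaults in the `def` line:

```python
def greet(name, greeting="Hello", punctuation="!"):
    # name is mandatory; greeting and punctuation fall back to their defaults
    return "{0}, {1}{2}".format(greeting, name, punctuation)

print(greet("Ken"))                   # Hello, Ken!
print(greet("Ken", "Hi"))             # Hi, Ken!
print(greet("Ken", punctuation="."))  # Hello, Ken.
```

Calling `greet()` with no arguments raises a `TypeError`, because the mandatory `name` is missing.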

### A variable defined inside a function stays inside that function
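A short sketch of this scoping rule, using hypothetical names: assigning to a name inside a function creates a new local variable, which disappears when the function returns and does not change a global variable of the same name.

```python
total = 100  # a global variable

def compute():
    total = 5      # a new local variable that shadows the global one
    subtotal = 42  # exists only while compute() is running
    return total + subtotal

print(compute())  # 47
print(total)      # 100: the global is unchanged

try:
    print(subtotal)
except NameError as error:
    print("NameError:", error)  # subtotal is not visible outside the function
```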

## Lambda 

For something very simple, essentially a single expression rather than a full block of code, you can create a lambda object instead of defining a function with `def`.

In [1]:
def math(x,y):
    return x*y + x/y

In [2]:
math_lambda = lambda x,y : x*y + x/y

A lambda object is effectively a function: a very simple one whose body is no more complex than a single expression.

`def name(args): return expression`

becomes

`lambda args: expression`
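To check that the two forms really are interchangeable, a quick sketch calling both versions of the function from the cells above with the same arguments:

```python
def math(x, y):
    return x*y + x/y

math_lambda = lambda x, y: x*y + x/y

print(math(3, 2))         # 7.5
print(math_lambda(3, 2))  # 7.5
print(type(math_lambda))  # <class 'function'>
```

PEP 8 recommends a `def` over binding a lambda to a name as done here; lambdas are most useful when passed inline to another function, as in the sorting example that follows.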

In [3]:
data = [('ON', 112), ('BC', 99), ('QC', 36)]
data.sort()
print(data)

[('BC', 99), ('ON', 112), ('QC', 36)]


In [5]:
data.sort(key = lambda item : item[1])
print(data)

[('QC', 36), ('BC', 99), ('ON', 112)]
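Two related options, sketched below: the built-in `sorted()` accepts the same `key` argument but returns a new list instead of sorting in place, and `operator.itemgetter(1)` builds the same key function as `lambda item: item[1]`.

```python
from operator import itemgetter

data = [('ON', 112), ('BC', 99), ('QC', 36)]

# sorted() returns a new list and leaves data unchanged; reverse=True sorts descending
by_count_desc = sorted(data, key=lambda item: item[1], reverse=True)
print(by_count_desc)  # [('ON', 112), ('BC', 99), ('QC', 36)]

# itemgetter(1) is equivalent to lambda item: item[1]
by_count = sorted(data, key=itemgetter(1))
print(by_count)  # [('QC', 36), ('BC', 99), ('ON', 112)]
```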


### To make your program easier to read, use docstrings

In [6]:
def docstring():
    ''' This is a docstring
    It describes what the function does.
    
    Arguments:
    None
    
    Returns:
    None
    '''
    return None

In [10]:
help(docstring)

Help on function docstring in module __main__:

docstring()
    This is a docstring
    It describes what the function does.
    
    Arguments:
    None
    
    Returns:
    None



### General guidelines for using functions
- A function should do one thing and one thing only.
- A function should have fewer than about 20 lines of code.
- Most of your program should be contained inside functions. This makes your code more portable.

In [12]:
def change_name():
    '''this function is used to print name
    
    Arguments: 
    None
    
    Return: 
    None
    '''

In [13]:
help(change_name)

Help on function change_name in module __main__:

change_name()
    this function is used to print name
    
    Arguments: 
    None
    
    Return: 
    None



## Numpy

In [15]:
# Load Numpy
import numpy as np
# The as np bit defines an alias, which is a shorthand way of referencing the package
# Some frequently used packages have standard aliases: np is the one for Numpy.

In [17]:
# Create a Numpy array:
np_array = np.array([1,2,3])

In [18]:
print(np_array)

[1 2 3]


In [19]:
print(type(np_array))

<class 'numpy.ndarray'>


A few other ways to create Numpy arrays:

In [20]:
range_array = np.arange(1,10) # like range(): start is included, stop is excluded
range_array

array([1, 2, 3, 4, 5, 6, 7, 8, 9])

In [25]:
linspace_array = np.linspace(1,9,9) # start, stop (included), number of evenly spaced values
linspace_array

array([1., 2., 3., 4., 5., 6., 7., 8., 9.])

In [29]:
zero_array = np.zeros(5) # an array of zeros
zero_array

array([0., 0., 0., 0., 0.])

In [30]:
one_array = np.ones(5) # an array of ones
one_array

array([1., 1., 1., 1., 1.])
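The difference between `arange` and `linspace` matters most with fractional values: `arange` takes a step size and excludes the stop value, while `linspace` takes a count and includes the stop value by default. A small comparison sketch:

```python
import numpy as np

# arange: start, stop, step; the stop value is excluded
stepped = np.arange(0, 1, 0.25)   # 0.0, 0.25, 0.5, 0.75

# linspace: start, stop, number of points; the stop value is included
spaced = np.linspace(0, 1, 5)     # 0.0, 0.25, 0.5, 0.75, 1.0

print(len(stepped), len(spaced))  # 4 5
```

The NumPy documentation suggests `linspace` for non-integer steps, because floating-point rounding can make the length of an `arange` result hard to predict.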

Two-dimensional arrays:

In [32]:
twod_array_fromlist = np.array([[1, 2, 3], [4, 5, 6]])
twod_array_fromlist

array([[1, 2, 3],
       [4, 5, 6]])

In [34]:
twod_array_zeros = np.zeros((5,5))
twod_array_zeros

array([[0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.]])

Note that Numpy arrays are implicitly in "row-major order". The structure is a list of rows; within each row is a list of column values.

You can also reshape an existing array

In [35]:
twod_array_fromlist.reshape(3,2)

array([[1, 2],
       [3, 4],
       [5, 6]])

In [83]:
five_x_five = np.arange(25) # 25 values, 0 through 24
np.arange(1, 25).reshape(3, 8) # a different array: 24 values, 1 through 24, as 3 rows of 8

array([[ 1,  2,  3,  4,  5,  6,  7,  8],
       [ 9, 10, 11, 12, 13, 14, 15, 16],
       [17, 18, 19, 20, 21, 22, 23, 24]])

In [45]:
five_x_five.shape

(25,)

In [42]:
five_x_five.reshape(5,5)


array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14],
       [15, 16, 17, 18, 19],
       [20, 21, 22, 23, 24]])
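Two details about `reshape` are worth knowing: one dimension can be given as `-1` to let NumPy infer it, and the new shape must hold exactly as many elements as the original. That is why `np.arange(25)` can become 5 x 5 but not 3 x 8. A short sketch:

```python
import numpy as np

values = np.arange(25)

# -1 asks NumPy to infer this dimension from the total size (25 / 5 = 5)
grid = values.reshape(5, -1)
print(grid.shape)    # (5, 5)
print(values.shape)  # (25,): reshape returns a new array object; values keeps its shape

# The element count must match: 25 values cannot fill a 3 x 8 array
try:
    values.reshape(3, 8)
except ValueError as error:
    print("ValueError:", error)
```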

Some other useful attributes of numpy arrays:

In [48]:
test_array = np.arange(10).reshape(5,2)
print(test_array)

[[0 1]
 [2 3]
 [4 5]
 [6 7]
 [8 9]]


In [49]:
print(test_array.ndim)

2


In [50]:
print(test_array.shape)

(5, 2)


In [51]:
print(test_array.size)

10


In [52]:
print(test_array.dtype)

int32


The `ndim` attribute is the number of dimensions. `ndim`, `shape`, `size`, and `dtype` are attributes, not methods, so they are written without `()` after them.
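The `int32` shown for `dtype` above is a platform default: the default integer type depends on the platform and NumPy version (64-bit Linux and macOS typically give `int64`). When the element type matters, it can be requested explicitly with the `dtype` argument or converted with `astype`, as in this sketch:

```python
import numpy as np

# Without dtype, NumPy picks the platform's default integer type
default_ints = np.arange(10)
print(default_ints.dtype)  # int32 or int64, depending on platform and NumPy version

# Passing dtype makes the element type explicit and the same everywhere
explicit_ints = np.arange(10, dtype=np.int64)
print(explicit_ints.dtype)  # int64

# astype() returns a new array converted to another type
floats = explicit_ints.astype(np.float64)
print(floats.dtype)  # float64
```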

#### Indexing and slicing Numpy arrays:

In [53]:
test_array = np.arange(20)
print(test_array)

[ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19]


In [54]:
print(test_array[5])

5


In [55]:
print(test_array[-1])

19


In [57]:
print(test_array[3:6]) # half-open: includes index 3, excludes index 6

[3 4 5]


In [58]:
print(test_array[3::2])

[ 3  5  7  9 11 13 15 17 19]
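One important difference from Python lists: slicing a list makes a copy, but slicing a Numpy array returns a view onto the same data. Writing through the view changes the original array; `.copy()` gives an independent array. A minimal sketch:

```python
import numpy as np

test_array = np.arange(20)

window = test_array[3:6]  # a view onto the same data, not a copy
window[0] = 100           # writing through the view...
print(test_array[3])      # 100: ...changes the original array

independent = test_array[3:6].copy()  # .copy() makes a separate array
independent[0] = -1
print(test_array[3])      # still 100
```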


We can also slice and dice a multidimensional array:

In [59]:
list_of_lists = [[1, 2, 3], [4, 5, 6]]
array_of_lists = np.array(list_of_lists)
array_of_lists

array([[1, 2, 3],
       [4, 5, 6]])

In [61]:
print(list_of_lists[1][2]) # the third element in the second list

6


In [62]:
print(array_of_lists[1,2]) # the third element in the second row

6


In [63]:
print(array_of_lists[1][2])

6


Because arrays are in row-major order, a single index selects a whole row:

In [64]:
array_of_lists[0]

array([1, 2, 3])

In [66]:
array_of_lists[:,0] # the : means all rows

array([1, 4])

In [69]:
# a slightly more complex manipulation (note: the name input shadows Python's built-in input() function):
input = np.array([
        [1,6],
        [2,5],
        [3,7],
        [4,10]
        ])
len(input)

4

In [75]:
m = len(input)
m

4

In [76]:
np.ones(m)

array([1., 1., 1., 1.])

In [77]:
input[:, 0]

array([1, 2, 3, 4])

In [78]:
np.array([np.ones(m), input[:, 0]])

array([[1., 1., 1., 1.],
       [1., 2., 3., 4.]])

In [80]:
x = np.array([np.ones(m), input[:,0]]).T

In [81]:
x

array([[1., 1.],
       [1., 2.],
       [1., 3.],
       [1., 4.]])
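The `np.array([...]).T` pattern builds the matrix row by row and then transposes it. `np.column_stack` produces the same 4 x 2 array in one step by treating each 1-D input as a column. This sketch renames `input` to `input_data` (our choice) to avoid shadowing the built-in. A leading column of ones is the intercept column commonly used in linear-regression design matrices; that this is the notebook's purpose is an assumption.

```python
import numpy as np

input_data = np.array([[1, 6], [2, 5], [3, 7], [4, 10]])
m = len(input_data)

# Place a column of ones next to the first column of input_data
x = np.column_stack((np.ones(m), input_data[:, 0]))
print(x.shape)  # (4, 2)
```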

#### You can use a Numpy array of Boolean values to filter data, which you can't do with plain lists:

In [94]:
bool_list = [True, True, False, True, True]
data_list = [1,2,3,4,5]

In [95]:
bool_array = np.array(bool_list)
data_array = np.array(data_list)

In [96]:
print(data_array[bool_array])

[1 2 4 5]


In [97]:
bool_array

array([ True,  True, False,  True,  True])

In [98]:
data_array

array([1, 2, 3, 4, 5])

In [100]:
data_array[bool_array]

array([1, 2, 4, 5])

In [103]:
data_array = np.arange(10)
print(data_array[data_array > 5])

[6 7 8 9]


In [104]:
data_array > 5

array([False, False, False, False, False, False,  True,  True,  True,
        True])

### Use a Boolean mask to assign values

In [105]:
data_array[data_array > 5] = -999
print(data_array)

[   0    1    2    3    4    5 -999 -999 -999 -999]
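Mask assignment modifies the array in place. When the original should be kept, `np.where(condition, value_if_true, value_if_false)` builds a new array instead. A short sketch with the same data:

```python
import numpy as np

data_array = np.arange(10)

# np.where picks -999 where the condition holds, else the original value
replaced = np.where(data_array > 5, -999, data_array)
print(replaced)    # [   0    1    2    3    4    5 -999 -999 -999 -999]
print(data_array)  # [0 1 2 3 4 5 6 7 8 9]: the original is untouched
```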


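### For Numpy, use & (and) and | (or) to combine two logical predicates into a more complex logical value

Wrap each comparison in parentheses, because & and | bind more tightly than comparison operators such as >= and <. Python's own `and` and `or` keywords do not work element-wise on arrays.
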
In [112]:
data_array = np.arange(10)
filtered_data_array_and = data_array[(data_array >= 5) & (data_array < 9)]
print(filtered_data_array_and)

[5 6 7 8]
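The same pattern works for "or" with `|`, and `~` inverts a Boolean mask. A small sketch:

```python
import numpy as np

data_array = np.arange(10)

# | (or): values below 2 or above 7
outer = data_array[(data_array < 2) | (data_array > 7)]
print(outer)  # [0 1 8 9]

# ~ (not): invert the "is even" mask to keep the odd values
odd = data_array[~(data_array % 2 == 0)]
print(odd)  # [1 3 5 7 9]
```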


Math with Numpy Arrays

In [115]:
x_array = np.arange(10)
pi = 3.14
print(x_array * pi)

[ 0.    3.14  6.28  9.42 12.56 15.7  18.84 21.98 25.12 28.26]


In [116]:
x_array

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [118]:
y_array = np.arange(10, 20)
print(x_array + y_array)

[10 12 14 16 18 20 22 24 26 28]
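Arithmetic operators such as `*` and `+` work element by element; they do not perform linear-algebra products. For the dot product (or a matrix product on 2-D arrays), use the `@` operator. A sketch with the same two arrays:

```python
import numpy as np

x_array = np.arange(10)
y_array = np.arange(10, 20)

elementwise = x_array * y_array  # multiplies matching positions
dot = x_array @ y_array          # sum of those products (the dot product)
print(dot)  # 735
```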


### Broadcasting: different shapes

In [120]:
some_data = np.arange(10)
more_data = np.arange(3, 33, 3)
more_data

array([ 3,  6,  9, 12, 15, 18, 21, 24, 27, 30])

In [123]:
some_data + more_data

array([ 3,  7, 11, 15, 19, 23, 27, 31, 35, 39])

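General broadcasting rules:

- Shapes are compared dimension by dimension, starting from the last (trailing) dimension.
- Two dimensions are compatible when they are equal or when one of them is 1; a size-1 dimension is stretched to match the other.
- If one array has fewer dimensions, it is treated as if it had extra leading dimensions of size 1.

In the example above both arrays have the same shape, (10,), so the addition is simply element-wise. Multiplying by a scalar, as in `x_array * pi` earlier, is the simplest case of broadcasting.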
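Here is a sketch where the shapes genuinely differ: a (3, 1) column combined with a (4,) row broadcasts to a (3, 4) table, while (3,) and (4,) are incompatible.

```python
import numpy as np

column = np.arange(3).reshape(3, 1)  # shape (3, 1)
row = np.arange(4)                   # shape (4,), treated as (1, 4)

# Trailing dims: 1 vs 4 -> 4; next: 3 vs (missing, treated as 1) -> 3
table = column * 10 + row
print(table.shape)  # (3, 4)
print(table)
# [[ 0  1  2  3]
#  [10 11 12 13]
#  [20 21 22 23]]

try:
    np.arange(3) + np.arange(4)  # trailing dims 3 vs 4: neither is 1
except ValueError as error:
    print("ValueError:", error)
```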

Numpy also includes a wide variety of useful functions (optimized for speed, of course):

In [125]:
np.random.seed(891)
random_array = np.random.randint(0,30,10)
print(random_array)

[ 9 24 11 14 18 22 14  2 25  3]


In [126]:
print(np.sum(random_array))

142


In [127]:
print(np.mean(random_array))

14.2


In [128]:
print(np.median(random_array))

14.0


In [130]:
print(np.std(random_array))

7.743384273042375


In [131]:
print(np.max(random_array))

25


In [133]:
print(np.argmax(random_array)) # The index of the maximum element

8


In [134]:
print(np.min(random_array))

2


In [135]:
print(np.argmin(random_array))

7


In [136]:
print(np.sort(random_array))

[ 2  3  9 11 14 14 18 22 24 25]


In [137]:
print(np.unique(random_array))

[ 2  3  9 11 14 18 22 24 25]
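One detail about `np.std`: by default (`ddof=0`) it computes the population standard deviation, dividing by n. For the sample standard deviation, which divides by n - 1, pass `ddof=1`. The sketch writes out the ten values printed above so it does not depend on the random seed:

```python
import numpy as np

# The same ten values as random_array above
values = np.array([9, 24, 11, 14, 18, 22, 14, 2, 25, 3])

population_std = np.std(values)      # default ddof=0: divides by n
sample_std = np.std(values, ddof=1)  # ddof=1: divides by n - 1
print(round(population_std, 4))  # 7.7434
print(round(sample_std, 4))      # 8.1622
```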


## Reading and Writing Files - the Basics

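To open or create a simple text file object you can use:

f = open(file, mode='r', buffering=-1)

'r' is for reading (the default).

'w' is for writing. This erases any previous contents of the file.

'a' is for appending to the end of the file.

Ex. f = open('hello.txt', 'r+') # 'r+' opens an existing file for both reading and writing; 'rw' is not a valid mode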

#### Common methods on a file object:
    file.read([size]) # reads up to size characters; if size is omitted, reads the entire file
    
    file.readline() # reads a single line, up to and including the newline character
    
    file.write("Hi\n") # writes a string; note the \n, which is a newline character
    
    file.close() # closes the file

In [144]:
f = open('hello_world.txt', 'w')
f.write('hello world\n')
f.close() # Don't forget to close the file when you are done

#### With statement:
To be sure that a file is closed, we often use the `with` statement. The file is closed automatically when the `with` block ends, even if an error occurs inside it, so we no longer need to call `close()` ourselves.

In [145]:
with open("hello_world.txt") as newFile:
    a = newFile.readlines() # reads each line of the file into a list

In [146]:
a

['hello world\n']
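A sketch that writes and then reads a file with `with`, and checks that the file object reports itself closed once the block ends. The temporary path is our choice so the cell runs anywhere:

```python
import os
import tempfile

# Hypothetical location: a fresh temporary directory
path = os.path.join(tempfile.mkdtemp(), "hello_world.txt")

with open(path, "w") as out_file:
    out_file.write("hello world\n")
    out_file.write("second line\n")
print(out_file.closed)  # True: closed as soon as the with block ended

with open(path) as in_file:
    lines = in_file.readlines()
print(lines)  # ['hello world\n', 'second line\n']
```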

Files as iterables:


In [None]:
count = 0
with open("some_file.txt") as source:
    for line in source:  # a file object yields one line at a time
        # rstrip() returns a copy of the string with trailing whitespace (including \n) removed
        if len(line.rstrip()) == 0:
            continue  # skip blank lines
        count += 1
print("{0} non-blank lines".format(count))
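The same loop can be wrapped in a function (the name `count_non_blank_lines` is hypothetical) and condensed with a generator expression. Because the file is read one line at a time, this also works for files too large to load at once. The sketch writes a small temporary file first so it runs without `some_file.txt`:

```python
import os
import tempfile

def count_non_blank_lines(filename):
    """Return the number of lines containing something other than whitespace."""
    with open(filename) as source:
        # line.strip() is empty for blank and whitespace-only lines
        return sum(1 for line in source if line.strip())

path = os.path.join(tempfile.mkdtemp(), "some_file.txt")
with open(path, "w") as out_file:
    out_file.write("first\n\n   \nsecond\nthird\n")

print("{0} non-blank lines".format(count_non_blank_lines(path)))  # 3 non-blank lines
```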

#### Read CSV Files:

In [149]:
import csv
with open("count.csv", "r", newline="") as source: # the csv module docs recommend newline=""
    reader = csv.reader(source) # yields each row as a list of strings
    for row in reader:
        print(row)

['date', 'count']
['2017-01-01', '20']
['2017-01-02', '21']
['2017-01-03', '22']
['2017-01-04', '23']
['2017-01-05', '24']
['2017-01-06', '25']
['2017-01-07', '26']
['2017-01-08', '27']
['2017-01-09', '28']
['2017-01-10', '29']
['2017-01-11', '30']
['2017-01-12', '31']
['2017-01-13', '32']
['2017-01-14', '33']
['2017-01-15', '34']
['2017-01-16', '35']
['2017-01-17', '36']
['2017-01-18', '37']
['2017-01-19', '38']
['2017-01-20', '39']
['2017-01-21', '40']
['2017-01-22', '41']
['2017-01-23', '42']
['2017-01-24', '43']
['2017-01-25', '44']
['2017-01-26', '45']


CSV with pipes:
A common variation on CSV files is to use a character like | as a delimiter. This is still technically a "comma" separated value file; the | is simply filling the role of the comma.

reader = csv.reader(source, delimiter='|')
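A self-contained sketch: `io.StringIO` stands in for an opened file, and the two data rows are made up for illustration in the same shape as `count.csv`.

```python
import csv
import io

# io.StringIO behaves like an opened text file
source = io.StringIO("date|count\n2017-01-01|20\n2017-01-02|21\n")

reader = csv.reader(source, delimiter="|")
rows = list(reader)
print(rows)  # [['date', 'count'], ['2017-01-01', '20'], ['2017-01-02', '21']]
```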

CSV with headings:
Some CSV files have a header line. We can use the header to build a dictionary from each line of input, keyed by column name.
It looks like this:

In [150]:
import csv
with open("count.csv", "r", newline="") as source:
    reader = csv.DictReader(source)
    for row in reader:
        print(row['date'], row['count'])

2017-01-01 20
2017-01-02 21
2017-01-03 22
2017-01-04 23
2017-01-05 24
2017-01-06 25
2017-01-07 26
2017-01-08 27
2017-01-09 28
2017-01-10 29
2017-01-11 30
2017-01-12 31
2017-01-13 32
2017-01-14 33
2017-01-15 34
2017-01-16 35
2017-01-17 36
2017-01-18 37
2017-01-19 38
2017-01-20 39
2017-01-21 40
2017-01-22 41
2017-01-23 42
2017-01-24 43
2017-01-25 44
2017-01-26 45
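Writing works the same way in reverse with `csv.DictWriter`. A sketch, again using `io.StringIO` in place of a real file (with a real file you would open it with `newline=""`); the two rows are illustrative:

```python
import csv
import io

rows = [{"date": "2017-01-01", "count": 20}, {"date": "2017-01-02", "count": 21}]

target = io.StringIO()  # stands in for a file opened for writing
writer = csv.DictWriter(target, fieldnames=["date", "count"])
writer.writeheader()    # writes the header line: date,count
writer.writerows(rows)  # writes one line per dictionary

print(target.getvalue())
```

The csv module ends each row with `\r\n` by default, and everything read back through `csv.DictReader` comes back as strings (here, `'20'` rather than `20`).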


Reading and Writing Columnar Data with Numpy:

Numpy has functions to handle file I/O:

In [151]:
array_to_save = np.arange(20).reshape(10,2)
print(array_to_save)

[[ 0  1]
 [ 2  3]
 [ 4  5]
 [ 6  7]
 [ 8  9]
 [10 11]
 [12 13]
 [14 15]
 [16 17]
 [18 19]]


In [153]:
np.savetxt("saved_file.csv", array_to_save, delimiter = ',')

In [154]:
loaded_array = np.loadtxt('saved_file.csv', delimiter = ',')
print(loaded_array)

[[ 0.  1.]
 [ 2.  3.]
 [ 4.  5.]
 [ 6.  7.]
 [ 8.  9.]
 [10. 11.]
 [12. 13.]
 [14. 15.]
 [16. 17.]
 [18. 19.]]
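The loaded values are floats because `np.savetxt` writes with `fmt='%.18e'` by default and `np.loadtxt` reads floats by default. When integers are wanted, both can be told so. A sketch using a temporary path of our choosing:

```python
import os
import tempfile

import numpy as np

array_to_save = np.arange(20).reshape(10, 2)
path = os.path.join(tempfile.mkdtemp(), "saved_file.csv")  # hypothetical location

# fmt="%d" writes plain integers such as "0,1" instead of "0.000000000000000000e+00,..."
np.savetxt(path, array_to_save, delimiter=",", fmt="%d")

# dtype=int reads the columns back as integers
loaded = np.loadtxt(path, delimiter=",", dtype=int)
print(loaded.dtype.kind)  # i
```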
