# Functions, Modules, Packages

In [1]:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
import statsmodels as sm

## Modules, Imports

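In Python, a module is simply a file with the **.py** extension containing Python code. Assuming a file some_module.py that defines a function f and a constant PI, its contents can be accessed from another file in the same directory: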

In [4]:
import some_module
result = some_module.f(5)
pi = some_module.PI

In [5]:
result

7

In [6]:
pi

3.14159

Or equivalently:

In [7]:
from some_module import f, g, PI
result = g(5, PI)

By using the **as** keyword you can give imports different variable names:

In [8]:
import some_module as sm
from some_module import PI as pi, g as gf
r1 = sm.f(pi)
r2 = gf(6, pi)
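The cells above assume that a file named some_module.py exists next to the notebook, but its contents are not shown. The sketch below writes one plausible version to a temporary directory and imports it. The definitions of f and g are assumptions chosen only to match the outputs shown above (f(5) giving 7 and PI equal to 3.14159); the real module may differ.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical contents of some_module.py (an assumption, not the original file).
MODULE_SOURCE = '''
PI = 3.14159

def f(x):
    return x + 2

def g(a, b):
    return a + b
'''

# Write the module to a temporary directory and make that directory importable.
module_dir = tempfile.mkdtemp()
Path(module_dir, "some_module.py").write_text(MODULE_SOURCE)
sys.path.insert(0, module_dir)
importlib.invalidate_caches()

import some_module

module_result = some_module.f(5)
module_pi = some_module.PI
print(module_result, module_pi)  # 7 3.14159
```

In everyday use the file would simply sit in the working directory, or in an installed package, and no changes to sys.path would be needed.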

## Functions

Functions are declared with the **def** keyword and hand a value back to the caller with the **return** keyword:

In [9]:
def my_function(x, y, z=1.5):
    if z > 1:
        return z * (x + y)
    else:
        return z / (x + y)

If Python reaches the end of a function without encountering a return statement, **None** is returned automatically.

Each function can have _positional_ arguments and _keyword_ arguments. Keyword arguments are most commonly used to specify default values or optional arguments. In the preceding function, x and y are positional arguments while z is a keyword argument. This means that the function can be called in any of these ways:

In [10]:
my_function(5, 6, z=0.7)

0.06363636363636363

In [11]:
my_function(3.14, 7, 3.5)

35.49

In [12]:
my_function(10, 20)

45.0

* **Keyword arguments must follow the positional arguments (if any).**
* **You can specify keyword arguments in any order; this frees you from having to remember the order in which the function arguments were specified, only what their names are.**

It is possible to use keywords for passing positional arguments as well. In the preceding example, we could also have written:

In [13]:
my_function(x=5, y=6, z=7)
my_function(y=6, x=5, z=7)

77
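As a small check of the two rules above, the sketch below redefines my_function, calls it with keywords in a different order, and then shows that placing a positional argument after a keyword argument is rejected. The call to compile is only a device for observing the syntax error without halting the script.

```python
def my_function(x, y, z=1.5):
    if z > 1:
        return z * (x + y)
    else:
        return z / (x + y)

# Keyword arguments may appear in any order.
same_result = my_function(5, 6, z=7) == my_function(z=7, y=6, x=5)

# A positional argument after a keyword argument is a syntax error,
# so the offending call is compiled from a string to catch the exception.
try:
    compile("my_function(z=0.7, 5, 6)", "<example>", "eval")
    rejected = False
except SyntaxError:
    rejected = True

print(same_result, rejected)  # True True
```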

### Namespaces, Scope, and Local Functions

Functions can access variables in two different scopes: _global_ and _local_. An alternative and more descriptive name describing a variable scope in Python is a _namespace_.

Any variables that are assigned within a function are, by default, assigned to the local namespace. The local namespace is created when the function is called and is immediately populated by the function’s arguments. After the function finishes, the local namespace is destroyed.

In [14]:
def func():
    a = []
    for i in range(5):
        a.append(i)

In [15]:
func()

In [16]:
print(a)

NameError: name 'a' is not defined

In [17]:
a = []
def func():
    for i in range(5):
        a.append(i)
    

In [18]:
func()

In [19]:
print(a)

[0, 1, 2, 3, 4]


Assigning variables outside of the function’s scope is possible, but those variables must be declared as global via the **global** keyword:

In [23]:
z = None
def bind_a_variable():
    global z
    z = [2]

In [24]:
bind_a_variable()

In [25]:
print(z)

[2]
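To make the role of **global** concrete, here is a minimal sketch contrasting an assignment without the keyword (which creates a local name) and one with it (which rebinds the module-level name). The names counter, reset_without_global, and increment_with_global are illustrative only.

```python
counter = 0

def reset_without_global():
    # Plain assignment creates a new local name; the module-level
    # counter is left untouched.
    counter = 100
    return counter

def increment_with_global():
    # global makes the assignment rebind the module-level name.
    global counter
    counter += 1

local_value = reset_without_global()
after_local = counter
increment_with_global()
after_global = counter

print(local_value, after_local, after_global)  # 100 0 1
```

Note that mutating a module-level list in place, as in the earlier a.append(i) example, does not need **global** because the name is never reassigned. Returning a value is usually easier to follow than relying on global state.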


#### Returning Multiple Values

In [26]:
def f():
    a = 5
    b = 6
    c = 7
    return a, b, c

In [27]:
a, b, c = f()

In [28]:
a, b, c

(5, 6, 7)

In [29]:
return_value = f()

In [30]:
return_value

(5, 6, 7)

In [31]:
def f():
    a = 5
    b = 6
    c = 7
    return {'a' : a, 'b' : b, 'c' : c}

#### Functions Are Objects

Suppose we were doing some data cleaning (stripping whitespace, removing punctuation symbols, and standardizing on proper capitalization) and needed to apply a bunch of transformations to the following list of strings:

In [32]:
states = [' Alabama ', 'Georgia!', 'Georgia', 'georgia', 'FlOrIda', 'south carolina##', 'West virginia?']

One way to do this is to use built-in string methods along with the re standard library module for regular expressions:

In [33]:
import re
def clean_strings(strings):
    result = []
    for value in strings:
        value = value.strip()
        value = re.sub('[!#?]', '', value)
        value = value.title()
        result.append(value)
    return result

In [34]:
clean_strings(states)

['Alabama',
 'Georgia',
 'Georgia',
 'Georgia',
 'Florida',
 'South Carolina',
 'West Virginia']

An alternative approach that you may find useful is to make a list of the operations you want to apply to a particular set of strings:

In [35]:
def remove_punctuation(value):
    return re.sub('[!#?]', '', value)

clean_ops = [str.strip, remove_punctuation, str.title]
def clean_strings(strings, ops):
    result = []
    for value in strings:
        for function in ops:
            value = function(value)
        result.append(value)
    return result

In [36]:
clean_strings(states, clean_ops)

['Alabama',
 'Georgia',
 'Georgia',
 'Georgia',
 'Florida',
 'South Carolina',
 'West Virginia']

You can use functions as arguments to other functions like the built-in **map** function, which applies a function to a sequence of some kind:

In [37]:
map(remove_punctuation, states)

<map at 0x28518d437c8>

In [38]:
for x in map(remove_punctuation, states):
    print(x)

 Alabama 
Georgia
Georgia
georgia
FlOrIda
south carolina
West virginia
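Because map returns a lazy iterator rather than a list, it has to be consumed (for example with list) to see the results, and it can only be consumed once. The sketch below uses a shortened states list and also shows the equivalent list comprehension.

```python
import re

def remove_punctuation(value):
    return re.sub('[!#?]', '', value)

states = [' Alabama ', 'Georgia!', 'south carolina##']

mapped = map(remove_punctuation, states)

# Consuming the iterator materializes the results.
via_map = list(mapped)

# A list comprehension gives the same values.
via_comprehension = [remove_punctuation(s) for s in states]

# The map object is now exhausted, so a second pass yields nothing.
leftover = list(mapped)

print(via_map)                       # [' Alabama ', 'Georgia', 'south carolina']
print(via_map == via_comprehension)  # True
print(leftover)                      # []
```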


#### Anonymous (Lambda) Functions

Python supports _anonymous_ or _lambda_ functions: functions written as a single expression, the value of which is the return value. They are defined with the **lambda** keyword:

In [39]:
(lambda x: x * 2)(4)

8

In [40]:
equiv_anon = lambda x: x * 2

In [41]:
equiv_anon(2)

4

In [43]:
def apply_to_list(some_list, f):
    return [f(x) for x in some_list]
    
ints = [4, 0, 1, 5, 6]
apply_to_list(ints, lambda x: x * 2)

[8, 0, 2, 10, 12]

As another example, suppose you wanted to sort a collection of strings by the number of distinct letters in each string:

In [44]:
strings = ['foo', 'card', 'bar', 'aaaa', 'abab']

In [45]:
strings.sort(key=lambda x: len(set(x)))

In [46]:
strings

['aaaa', 'foo', 'abab', 'bar', 'card']

Lambda functions are called anonymous because, unlike functions declared with the def keyword, the function object is never given a descriptive name; its \_\_name\_\_ attribute is always the placeholder '<lambda>'.
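A quick way to see how the two kinds of function are named is to inspect the \_\_name\_\_ attribute of each. The names double and anon_double below are illustrative only.

```python
def double(x):
    return x * 2

anon_double = lambda x: x * 2

# A def statement records the function's own name ...
print(double.__name__)       # double

# ... while every lambda carries the same generic placeholder.
print(anon_double.__name__)  # <lambda>
```

This matters mostly for debugging: tracebacks and reprs involving lambdas are less informative than those for named functions.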

#### Currying: Partial Argument Application

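_Currying_ here means deriving new functions from existing ones by partial argument application. For example, suppose we had a trivial function that adds two numbers together: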

In [47]:
def add_numbers(x, y):
    return x + y

Using this function, we could derive a new function of one variable, add_five, that adds 5 to its argument:

In [48]:
add_five = lambda y: add_numbers(5, y)

The standard library functools module can simplify this process using the partial function:

In [49]:
from functools import partial
add_five = partial(add_numbers, 5)

In [50]:
add_five(3)

8
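partial can also fix keyword arguments, and the resulting object remembers the original function and the arguments that were frozen. The sketch below shows both; parse_binary is an illustrative name, built on the built-in int and its base parameter.

```python
from functools import partial

def add_numbers(x, y):
    return x + y

# Freeze the first positional argument.
add_five = partial(add_numbers, 5)

# Freeze a keyword argument of a built-in.
parse_binary = partial(int, base=2)

print(add_five(3))            # 8
print(parse_binary('10010'))  # 18

# The partial object exposes what it wraps.
print(add_five.func is add_numbers, add_five.args)  # True (5,)
```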

### Iterators and Generators

The **iterator protocol** is a generic way to make objects iterable. For example, iterating over a dict yields the dict keys:

In [51]:
some_dict = {'a': 1, 'b': 2, 'c': 3}

In [59]:
for key in some_dict:
    print(key)

a
b
c


When you write for key in some_dict, the Python interpreter first attempts to create an iterator out of some_dict:

In [60]:
dict_iterator = iter(some_dict)

In [61]:
dict_iterator

<dict_keyiterator at 0x28518e57868>

An iterator is any object that will yield objects to the Python interpreter when used in a context like a for loop. Most functions and methods expecting a list or list-like object will also accept any iterable object. This includes built-in functions such as min, max, and sum, and type constructors like list and tuple:

In [62]:
list(dict_iterator)

['a', 'b', 'c']
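What a for loop does with an iterator can be spelled out by hand: call iter once, then call next repeatedly until StopIteration is raised. The following is a sketch of that idea, not the interpreter's actual implementation.

```python
some_dict = {'a': 1, 'b': 2, 'c': 3}

iterator = iter(some_dict)
keys = []

while True:
    try:
        # Ask the iterator for its next item.
        key = next(iterator)
    except StopIteration:
        # The iterator signals that it is exhausted.
        break
    keys.append(key)

print(keys)  # ['a', 'b', 'c']
```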

A _generator_ is a concise way to construct a new iterable object. Whereas normal functions execute and return a single result at a time, _generators return a sequence of multiple results lazily_, pausing after each one until the next one is requested. To create a generator, use the **yield** keyword instead of return in a function:

In [66]:
def evens(n=100):
    for i in range(n):
        if i%2 == 0:
            yield i

In [69]:
even_gen = evens(10)

In [70]:
list(even_gen)

[0, 2, 4, 6, 8]

In [71]:
def squares(n=10):
    print('Generating squares from 1 to {0}'.format(n ** 2))
    for i in range(1, n + 1):
        yield i ** 2

When you actually call the generator, no code is immediately executed:

In [72]:
gen = squares()

It is not until you request elements from the generator that it begins executing its code:

In [73]:
for x in gen:
    print(x, end=' ')

Generating squares from 1 to 100
1 4 9 16 25 36 49 64 81 100 
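The lazy start can be observed directly by recording when the generator body begins to run. In this sketch the events list is a hypothetical helper used only for bookkeeping, and squares uses a smaller default than the version above.

```python
events = []

def squares(n=3):
    events.append('body started')
    for i in range(1, n + 1):
        yield i ** 2

gen = squares()
events_after_call = list(events)  # nothing has run yet

first = next(gen)                 # runs the body up to the first yield
events_after_next = list(events)

rest = list(gen)                  # resumes after the first yield

print(events_after_call)  # []
print(first)              # 1
print(events_after_next)  # ['body started']
print(rest)               # [4, 9]
```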

In [128]:
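def squares_inf():
    # Note: despite its name, this generator yields the even
    # numbers 0, 2, 4, ... without end (the same values as all_even below).
    i = 0
    while True:
        yield i
        i = i + 2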

In [129]:
def all_even():
    n = 0
    while True:
        yield n
        n += 2

In [130]:
for i in all_even():
    print(i, end=" ")
    if i == 100:
        break

0 2 4 6 8 10 12 14 16 18 20 22 24 26 28 30 32 34 36 38 40 42 44 46 48 50 52 54 56 58 60 62 64 66 68 70 72 74 76 78 80 82 84 86 88 90 92 94 96 98 100 

In [131]:
gen = squares_inf()

In [132]:
gen

<generator object squares_inf at 0x0000028518E80648>

In [133]:
squares_inf()

<generator object squares_inf at 0x0000028518E80948>

In [134]:
print(next(gen))
print(next(gen))
print(next(gen))

0
2
4


In [135]:
def fib_gen():
    a, b = 0, 1
    while True:
        yield a
        a, b = b, a + b

In [136]:
generator = fib_gen()

In [137]:
print(next(generator))
print(next(generator))
print(next(generator))
print(next(generator))
print(next(generator))
print(next(generator))

0
1
1
2
3
5


In [138]:
def fib(n):
    a, b = 0, 1
    for _ in range(n):
        yield a
        a, b = b, a + b

In [139]:
print(list(fib(4)))

[0, 1, 1, 2]


### Generator expressions

A generator expression is a concise way to make a generator. It is the generator analogue to list, dict, and set comprehensions; to create one, enclose what would otherwise be a list comprehension within parentheses instead of brackets:

In [140]:
gen = (x ** 2 for x in range(100))

In [144]:
gen

<generator object <genexpr> at 0x0000028518E80748>

In [147]:
next(gen)

0

This is completely equivalent to the following more verbose generator:

In [142]:
def _make_gen():
    for x in range(100):
        yield x ** 2


gen = _make_gen()

Generator expressions can be used instead of list comprehensions as function arguments in many cases:

In [143]:
sum(x ** 2 for x in range(100))

328350

In [96]:
dict((i, i ** 2) for i in range(5))

{0: 0, 1: 1, 2: 4, 3: 9, 4: 16}
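One practical difference from a list comprehension is that a generator expression can be consumed only once. A minimal sketch:

```python
gen = (x ** 2 for x in range(5))

first_pass = list(gen)   # consumes every value
second_pass = list(gen)  # the generator is already exhausted

# A list comprehension, by contrast, builds a reusable list.
squares_list = [x ** 2 for x in range(5)]

print(first_pass)                  # [0, 1, 4, 9, 16]
print(second_pass)                 # []
print(squares_list == first_pass)  # True
```

If the values are needed more than once, either build a list or create a fresh generator for each pass.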

In [150]:
def natural_numbers():
    """yields 1, 2, 3, ..."""
    n = 1
    while True:
        yield n
        n += 1
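An infinite generator such as natural_numbers cannot be passed to list directly, since it never ends. One way to take a finite prefix is islice from the standard library itertools module, sketched here:

```python
from itertools import islice

def natural_numbers():
    """Yield 1, 2, 3, ... without end."""
    n = 1
    while True:
        yield n
        n += 1

# Take only the first five values from the endless stream.
first_five = list(islice(natural_numbers(), 5))

print(first_five)  # [1, 2, 3, 4, 5]
```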

### Randomness

In [151]:
import random

In [152]:
four_uniform_randoms = [random.random() for _ in range(4)]

In [153]:
four_uniform_randoms

[0.4212639948995278,
 0.8268482288343075,
 0.26232111049317675,
 0.17467058960990223]

The *random* module actually produces pseudorandom (that is, deterministic) numbers based on an internal state that you can set with random.seed if you want to get reproducible results:

In [154]:
random.seed(10)

In [155]:
print(random.random())

0.5714025946899135


In [156]:
random.seed(10)

In [157]:
print(random.random())

0.5714025946899135
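random.seed resets the module-wide generator shared by all code in the process. When an isolated, reproducible stream is preferred, one option is a separate random.Random instance, as in this sketch; the names rng_a and rng_b are illustrative.

```python
import random

# Two independent generators started from the same seed.
rng_a = random.Random(10)
rng_b = random.Random(10)

seq_a = [rng_a.random() for _ in range(3)]
seq_b = [rng_b.random() for _ in range(3)]

# Same seed, same sequence.
print(seq_a == seq_b)  # True
```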


We’ll sometimes use *random.randrange*, which takes either 1 or 2 arguments and returns an element chosen randomly from the corresponding range():

In [158]:
random.randrange(10) # choose randomly from range(10) = [0, 1, ..., 9]

6

In [159]:
random.randrange(3, 6) # choose randomly from range(3, 6) = [3, 4, 5]

4

*random.shuffle* randomly reorders the elements of a list:

In [160]:
up_to_ten = list(range(10))
random.shuffle(up_to_ten)
print(up_to_ten)

[4, 5, 8, 1, 2, 6, 7, 3, 0, 9]


If you need to randomly pick one element from a list you can use *random.choice*:

In [161]:
my_best_friend = random.choice(["Alice", "Bob", "Charlie"])

In [162]:
my_best_friend

'Bob'

And if you need to randomly choose a sample of elements without replacement (i.e., with no duplicates), you can use *random.sample*:

In [163]:
lottery_numbers = list(range(60))
winning_numbers = random.sample(lottery_numbers, 6)  # six distinct numbers; the values vary from run to run

In [164]:
winning_numbers

[4, 15, 47, 23, 2, 26]

To choose a sample of elements with replacement (i.e., allowing duplicates), you can just make multiple calls to *random.choice*:

In [165]:
four_with_replacement = [random.choice(range(10)) for _ in range(4)]

In [166]:
four_with_replacement

[2, 9, 5, 6]
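As an alternative to repeated calls to random.choice, the standard library also provides random.choices (available since Python 3.6), which samples with replacement in a single call via its k parameter. A brief sketch, seeded only so that repeated runs agree with each other:

```python
import random

random.seed(0)

# Draw four values from 0..9, allowing duplicates.
four_with_replacement = random.choices(range(10), k=4)

print(len(four_with_replacement))  # 4
```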

### itertools module

The standard library itertools module has a collection of generators for many common data algorithms. For example, **groupby** takes any sequence and a function, grouping consecutive elements in the sequence by the return value of the function. Here’s an example:

In [167]:
import itertools

In [168]:
first_letter = lambda x: x[0]

In [169]:
names = ['Alan', 'Adam', 'Wes', 'Will', 'Albert', 'Steven']

In [170]:
for letter, group in itertools.groupby(names, first_letter):
    print(letter, list(group))  # group is an iterator over one run of names

A ['Alan', 'Adam']
W ['Wes', 'Will']
A ['Albert']
S ['Steven']
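Because groupby only groups consecutive elements, the letter A appears twice in the output above. To collect every name under a single key, one common approach is to sort by the same key function first, as sketched here:

```python
import itertools

names = ['Alan', 'Adam', 'Wes', 'Will', 'Albert', 'Steven']
first_letter = lambda x: x[0]

# Sorting by the grouping key makes equal keys adjacent.
sorted_names = sorted(names, key=first_letter)

grouped = {
    letter: list(group)
    for letter, group in itertools.groupby(sorted_names, first_letter)
}

print(grouped)
# {'A': ['Alan', 'Adam', 'Albert'], 'S': ['Steven'], 'W': ['Wes', 'Will']}
```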


![alt text](images/itertools.png "Some useful itertools functions")