# CHAPTER 2
# Built-in Data Structures, Functions, and Files

# 2.2 Functions 

Functions are the primary and most important method of code organization and reuse in Python. As a rule of thumb, if you anticipate needing to repeat the same or very similar code more than once, it may be worth writing a reusable function. 

Functions can also help make your code more <span style="color:red">readable</span> by giving a name to a group of Python statements. Functions are declared with the def keyword and returned from with the return keyword:

    def my_function(x, y, z=1.5):    
        if z > 1:        
            return z * (x + y)    
        else:        
            return z / (x + y)

There is no issue with having multiple return statements. If Python reaches the end of a function without encountering a return statement, None is returned automatically. Each function can have positional arguments and keyword arguments. Keyword arguments are most commonly used to specify default values or optional arguments. In the preceding function, x and y are positional arguments while z is a keyword argument. This means that the function can be called in any of these ways:

    my_function(5, 6, z=0.7) 
    my_function(3.14, 7, 3.5) 
    my_function(10, 20) 
    
The main restriction on function arguments is that the keyword arguments must follow the positional arguments (if any). You can specify keyword arguments in any order; this frees you from having to remember the order in which the function arguments were specified. You only need to remember their names.

It is possible to use keywords for passing positional arguments as well. In the preceding example, we could also have written: 
    
    my_function(x=5, y=6, z=7) 
    my_function(y=6, x=5, z=7) 
    
In some cases this can help with readability.
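
As a small sketch of the calling conventions above, the following reuses my_function from this section and checks that a positional call and a keyword call give the same result. The variable names by_position, by_keyword, and default_z are illustrative only.

```python
def my_function(x, y, z=1.5):
    if z > 1:
        return z * (x + y)
    else:
        return z / (x + y)

# The same call written positionally and with keywords in a different order
by_position = my_function(5, 6, 0.7)
by_keyword = my_function(z=0.7, y=6, x=5)
print(by_position == by_keyword)   # True

# Omitting z falls back to the default of 1.5, so this is 1.5 * (10 + 20)
default_z = my_function(10, 20)
print(default_z)                   # 45.0
```

Writing a positional argument after a keyword argument, as in my_function(x=5, 6), is rejected with a SyntaxError before the code runs.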

## 2.2.1   Namespaces, Scope, and Local Functions 

Functions can access variables in two different scopes: global and local. An alternative and more descriptive name describing a variable scope in Python is a namespace. Any variables that are assigned within a function by default are assigned to the local namespace. The local namespace is created when the function is called and immediately populated by the function’s arguments. After the function is finished, the local namespace is destroyed (with some exceptions that are outside the purview of this chapter). Consider the following function:

    def func():    
        a = []    
        for i in range(5):        
            a.append(i) 
            
When func() is called, the empty list _a_ is created, five elements are appended, and then _a_ is destroyed when the function exits. Suppose instead we had declared _a_ as follows:

    a = [] 
    def func():    
        for i in range(5):        
            a.append(i)
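
A minimal sketch of what happens when this second version runs: func can modify the module-level list because it only calls a method on the existing object and never rebinds the name a.

```python
a = []

def func():
    # a is looked up in the global namespace; append mutates that list in place
    for i in range(5):
        a.append(i)

func()
print(a)   # [0, 1, 2, 3, 4]
```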

Assigning variables outside of the function’s scope is possible, but those variables must be declared as global via the <span style="color:red; font-weight:bold;">global</span> keyword:

   

In [3]:
a = None
print(type(a))

def bind_a_variable():
    global a   
    a = []   
bind_a_variable()   

print(a) 
print(type(a))

<class 'NoneType'>
[]
<class 'list'>


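In [5]:
def bind_a_variable():
    # Without a global declaration, b is local to the function
    # and is destroyed when the function returns.
    b = []   
    
bind_a_variable()   

print(b)   # fails in a fresh session: b was never bound at module level

NameError: name 'b' is not defined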
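
To make the effect of the global declaration concrete, here is a minimal sketch that contrasts a plain assignment with a global one. The names counter, set_local, and set_global are hypothetical and chosen only for this illustration.

```python
counter = 0

def set_local():
    # Plain assignment creates a new local name; the global counter is untouched
    counter = 100
    return counter

def set_global():
    # The global declaration makes the assignment rebind the module-level name
    global counter
    counter = 100

local_result = set_local()
after_local = counter      # still 0
set_global()
after_global = counter     # now 100

print(local_result, after_local, after_global)   # 100 0 100
```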


## 2.2.2   Returning Multiple Values 

If you have programmed in Java or C++ before, one useful feature of Python that those languages do not offer is the ability to return multiple values from a function with simple syntax. Here’s an example:

    def f():    
        a = 5    
        b = 6    
        c = 7    
        return a, b, c

    a, b, c = f() 
    
In data analysis and other scientific applications, you may find yourself doing this often. What’s happening here is that the function is actually just returning one object, namely a tuple, which is then being unpacked into the result variables. In the preceding example, we could have done this instead:

    return_value = f() 
    
In this case, return_value would be a 3-tuple with the three returned variables. A potentially attractive alternative to returning multiple values like before might be to return a dict instead:

    def f():    
        a = 5    
        b = 6    
        c = 7    
        return {'a' : a, 'b' : b, 'c' : c}

This alternative technique can be useful depending on what you are trying to do.


In [5]:
# An example for a car loan calculation

def get_a_car_loan(amount, tenure, rate):
    total = amount * (1 + (tenure*(rate/100)))
    installment = total / (tenure * 12)
    return {'total_loan':total, 'installment_per_month':installment} 
    #return a dict object

result = get_a_car_loan(96000,6,2.58)
print('data type of result: '+str(type(result)))
print('Total loan amount: RM'+str(result['total_loan']))
print('Installment per month: RM'+str(result['installment_per_month']))

data type of result: <class 'dict'>
Total loan amount: RM110860.8
Installment per month: RM1539.7333333333333


In [6]:
# An example for a car loan calculation

def get_a_car_loan(amount, tenure, rate):
    total = amount * (1 + (tenure*(rate/100)))
    installment = total / (tenure * 12)
    return total, installment #return a tuple object

result = get_a_car_loan(96000,6,2.58)
print('data type of result: '+str(type(result)))
print('Total loan amount: RM'+str(result[0]))
print('Installment per month: RM'+str(result[1]))
print()
# OR you can call in this way:
tot,inst = get_a_car_loan(96000,6,2.58)
print('data type of tot: '+str(type(tot)))
print('data type of inst: '+str(type(inst)))
print('Total loan amount: RM'+str(tot))
print('Installment per month: RM'+str(inst))
print()


data type of result: <class 'tuple'>
Total loan amount: RM110860.8
Installment per month: RM1539.7333333333333

data type of tot: <class 'float'>
data type of inst: <class 'float'>
Total loan amount: RM110860.8
Installment per month: RM1539.7333333333333




## 2.2.3  Functions Are Objects 

Since Python functions are objects, many constructs can be easily expressed that are difficult to do in other languages. Suppose we were doing some data cleaning and needed to apply a bunch of transformations to the following list of strings.

In [6]:
states = ['Alabama', 'Georgia!', 'Georgia', 'georgia', 'FlOrIda',
          'south    carolina##',
          'West virginia?'] 

Anyone who has ever worked with user-submitted survey data has seen messy results like these. Lots of things need to happen to make this list of strings uniform and ready for analysis: stripping whitespace, removing punctuation symbols, and standardizing on proper capitalization. One way to do this is to use built-in string methods along with the __re__ standard library module for regular expressions:

In [7]:
import re

def clean_strings(strings):    
    result = []    
    for value in strings:        
        value = value.strip()        
        value = re.sub('[!#?]', '', value)        
        value = value.title()        
        result.append(value)    
    return result 

The result looks like this:

In [8]:
clean_strings(states)

['Alabama',
 'Georgia',
 'Georgia',
 'Georgia',
 'Florida',
 'South    Carolina',
 'West Virginia']

An alternative approach that you may find useful is to make a list of the operations you want to apply to a particular set of strings.

In [9]:
def remove_punctuation(value):    
    return re.sub('[!#?]', '', value)

clean_ops = [str.strip, remove_punctuation, str.title]

def clean_strings(strings, ops):    
    result = []    
    for value in strings:        
        for function in ops:
            value = function(value)        
        result.append(value)    
    return result 

Then we have the following:

In [10]:
clean_strings(states, clean_ops)

['Alabama',
 'Georgia',
 'Georgia',
 'Georgia',
 'Florida',
 'South    Carolina',
 'West Virginia']

A more _functional_ pattern like this enables you to easily modify how the strings are transformed at a very high level. The __clean_strings__ function is also now more reusable and generic. 
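
As a sketch of how easily the pipeline can be changed, the following adds one more step that collapses the repeated spaces still visible in 'South    Carolina'. The helper collapse_spaces is a hypothetical addition, not part of the original example; everything else is repeated from above so the block runs on its own.

```python
import re

def remove_punctuation(value):
    return re.sub('[!#?]', '', value)

def collapse_spaces(value):
    # Hypothetical extra step: replace any run of whitespace with one space
    return re.sub(r'\s+', ' ', value)

def clean_strings(strings, ops):
    result = []
    for value in strings:
        for function in ops:
            value = function(value)
        result.append(value)
    return result

states = ['Alabama', 'Georgia!', 'Georgia', 'georgia', 'FlOrIda',
          'south    carolina##', 'West virginia?']

# Only the list of operations changes; clean_strings itself is untouched
clean_ops = [str.strip, remove_punctuation, collapse_spaces, str.title]
cleaned = clean_strings(states, clean_ops)
print(cleaned)
```

The order of the operations matters: each function receives the output of the previous one.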

You can use functions as arguments to other functions like the built-in __map__ function, which applies a given function to each item of an iterable (list, tuple, etc.). In Python 3, __map__ returns a lazy iterator rather than a list; loop over it or pass it to __list__ to materialize the results.

In [11]:
print(states)

['Alabama', 'Georgia!', 'Georgia', 'georgia', 'FlOrIda', 'south    carolina##', 'West virginia?']


In [12]:
for x in map(remove_punctuation, states):  
    print(x) 

Alabama
Georgia
Georgia
georgia
FlOrIda
south    carolina
West virginia


In [13]:
for x in (states):
    print(remove_punctuation(x))
    #print(result)

Alabama
Georgia
Georgia
georgia
FlOrIda
south    carolina
West virginia


In [14]:
def myfunc(n):
    return len(n)

x = map(myfunc, ('apple', 'banana', 'cherry')) 
print(list(x))
#print(x)

[5, 6, 6]
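
One consequence of map producing an iterator is that the results can be consumed only once. A minimal sketch, with illustrative variable names:

```python
lengths = map(len, ('apple', 'banana', 'cherry'))

first_pass = list(lengths)    # consumes the iterator
second_pass = list(lengths)   # nothing left to yield

print(first_pass)    # [5, 6, 6]
print(second_pass)   # []
```

If the results are needed more than once, store the list rather than the map object.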


## 2.2.4  Anonymous (Lambda) Functions 

Python has support for so-called __anonymous__ or __lambda__ functions, which are a way of writing functions consisting of a single expression, the result of which is the return value. They are defined with the __lambda__ keyword, which has no meaning other than “we are declaring an anonymous function”. A __lambda__ function can take any number of arguments, but can only have one expression. The syntax is:

    lambda arguments : expression
  
Lambda functions are throw-away functions, i.e. they are needed only where they are created. Lambda functions are mainly used in combination with the functions filter(), map() and reduce() (in Python 3, reduce() lives in the functools module).

They are especially convenient in data analysis because, as you’ll see, there are many cases where data transformation functions will take functions as arguments. It’s often less typing (and clearer) to pass a lambda function as opposed to writing a full-out function declaration or even assigning the lambda function to a local variable. Consider this simple example:

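In [22]:
def apply_to_list(some_list, f):    
    return [f(x) for x in some_list]

ints = [4, 0, 1, 5, 6] 
apply_to_list(ints, lambda x: x*2)    # [8, 0, 2, 10, 12] (not displayed)
apply_to_list(ints, lambda x: x**2)   # [16, 0, 1, 25, 36] (not displayed)
apply_to_list(ints, lambda x: x+2)    # only the last expression is displayed

[6, 2, 3, 7, 8]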

In the above program, lambda x: x\*\*2 is one of the lambda functions. Here x is the argument and x\*\*2 is the expression that gets evaluated and returned. You could also have written [x \*\* 2 for x in ints], but here we were able to succinctly pass a custom operator to the __apply_to_list__ function. 

You can change your lambda function without having to create other new functions.

In [None]:
print(apply_to_list(ints, lambda x: x**2)) 
print(apply_to_list(ints, lambda x: x/2)) 

As another example, suppose you wanted to sort a collection of strings by the number of distinct letters in each string:

In [26]:
strings = ['card', 'amin', 'bar', 'aaaa', 'abab'] 

Here we could pass a lambda function to the list’s sort method:

In [27]:
strings.sort(key=lambda x: len(set(x)))
    
strings

['aaaa', 'abab', 'bar', 'card', 'amin']
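
A related sketch, under the assumption that you want to keep the original list unchanged and break ties alphabetically: the built-in sorted function accepts the same key argument, and a lambda can return a tuple so that strings with the same number of distinct letters are ordered by the string itself. The name ordered is illustrative.

```python
strings = ['card', 'amin', 'bar', 'aaaa', 'abab']

# Sort by number of distinct letters first, then alphabetically for ties
ordered = sorted(strings, key=lambda s: (len(set(s)), s))

print(ordered)   # ['aaaa', 'abab', 'bar', 'amin', 'card']
print(strings)   # the original list is left as it was
```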

Other examples:

In [28]:
# Example 1: A lambda function that adds 10 to the number passed in 
# as an argument; the result is then printed.

x = lambda a : a + 10
print(x(5)) 

15


In [29]:
# Example 2: A lambda function that multiplies argument a by argument b; the result is then printed 
# (Lambda functions can take any number of arguments).

x = lambda a, b : a * b
print(x(5, 6)) 

30
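
The earlier remark that lambdas are mainly combined with filter(), map() and reduce() can be sketched as follows. The list numbers and the result names are made up for illustration; reduce is imported from functools, where it lives in Python 3.

```python
from functools import reduce

numbers = [1, 2, 3, 4, 5, 6]

# filter keeps the items for which the lambda returns a true value
evens = list(filter(lambda n: n % 2 == 0, numbers))

# map applies the lambda to every item
squares = list(map(lambda n: n ** 2, numbers))

# reduce folds the sequence into a single value, here a running sum
total = reduce(lambda acc, n: acc + n, numbers)

print(evens)     # [2, 4, 6]
print(squares)   # [1, 4, 9, 16, 25, 36]
print(total)     # 21
```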


### Quick Exercise 1

Write a lambda function that sums the arguments a, b, and c, and print the result.

In [None]:
# write your code here



# 2.3 File IO

Most of this book uses high-level tools like pandas.read_csv (you will learn this later) to read data files from disk into Python data structures. However, it’s important to understand the basics of how to work with files in Python. 

Fortunately, it’s very simple, which is one reason why Python is so popular for text and file munging. To open a file for reading or writing, use the built-in open function with either a relative or absolute file path:

In [30]:
path = 'File1.txt'
f = open(path)

By default, the file is opened in read-only mode 'r'. We can then treat the file handle f like a list and iterate over the lines like so:

In [31]:
for line in f:    
    pass 

The lines come out of the file with the end-of-line (EOL) markers intact, so you’ll often see code to get an EOL-free list of lines in a file like:

In [32]:
lines = [x.rstrip() for x in open(path)]

lines

['Hi! We are learning Python now..',
 '',
 'Python is so easy to learn..',
 'I hope you can learn it well..']

When you use open to create file objects, it is important to explicitly close the file when you are finished with it. Closing the file releases its resources back to the operating system:

In [33]:
f.close()

One of the ways to make it easier to clean up open files is to use the with statement:

In [34]:
with open(path) as f:   
    lines = [x.rstrip() for x in f] 
    
lines

['Hi! We are learning Python now..',
 '',
 'Python is so easy to learn..',
 'I hope you can learn it well..']

This will automatically close the file f when exiting the with block. 
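
The following self-contained sketch checks that behavior. It writes a small throwaway file in a temporary directory (the file name example.txt and its contents are assumptions made for this illustration, not File1.txt) and inspects the closed attribute inside and after the with block.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'example.txt')

    # Create a small file to read back
    with open(path, 'w') as f:
        f.write('first line\n\nthird line\n')

    with open(path) as f:
        lines = [x.rstrip() for x in f]
        closed_inside = f.closed   # still open inside the block

    closed_after = f.closed        # closed as soon as the block exits

print(lines)                        # ['first line', '', 'third line']
print(closed_inside, closed_after)  # False True
```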

If we had typed 

    f = open(path, 'w')

a new file at File1.txt would have been created (be careful!), overwriting any existing file at that path. 

There is also the 'x' file mode, which creates a writable file but fails if the file path already exists. See Table 2.2 for a list of all valid file read/write modes. 

<br>
<center>Table 2.2: Python file modes</center>
<img src="Table2.2.jpg" style="width: 800px;">
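
As a hedged illustration of the difference between the 'x' and 'w' modes, the sketch below works in a temporary directory so that no real file is touched. The file name report.txt and the variable names are hypothetical.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'report.txt')

    # 'x' creates the file because nothing exists at this path yet
    with open(path, 'x') as f:
        f.write('created once\n')

    # A second 'x' open refuses to touch the existing file
    try:
        open(path, 'x')
        second_attempt = 'created'
    except FileExistsError:
        second_attempt = 'refused'

    # 'w' silently truncates the existing file before writing
    with open(path, 'w') as f:
        f.write('replaced\n')

    with open(path) as f:
        final_text = f.read()

print(second_attempt)     # refused
print(repr(final_text))   # 'replaced\n'
```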

For readable files, some of the most commonly used methods are __read__, __seek__, and __tell__. __read__ returns a certain number of characters from the file. What constitutes a “character” is determined by the file’s encoding (e.g., UTF-8) or simply raw bytes if the file is opened in binary mode:

In [41]:
path = 'File1.txt'
f = open(path)
f.read(50)

'Hi! We are learning Python now.. \n\nPython is so ea'

In [39]:
f.closed

False

In [40]:
f2 = open(path, 'rb')
f2.read(50)

b'Hi! We are learning Python now.. \r\n\r\nPython is so '

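The __read__ method advances the file handle’s position by the number of bytes read. __tell__ gives you the current position. In the output below, the text-mode handle f reports 52 rather than 50 because each of the two newlines it read is stored on disk as the two-byte sequence \r\n, as the binary read above shows.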

In [42]:
print(f.tell())
print(f2.tell())

52
50


__seek__ changes the file position to the indicated byte in the file:

In [43]:
print(f.seek(3))
print(f.tell())
f.read(3)

3
3


' We'

Lastly, don't forget to close the files:

In [44]:
f.close()
f2.close()
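
To illustrate the difference between characters and bytes described above, here is a small sketch using a throwaway UTF-8 file in a temporary directory. The file name sample.txt and its contents are assumptions for this example; the letter é is written with an escape and occupies two bytes in UTF-8.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'sample.txt')

    # '\u00e9' is the single character é, encoded as two bytes in UTF-8
    with open(path, 'w', encoding='utf-8') as f:
        f.write('h\u00e9llo world')

    # Text mode: read(2) returns two characters
    with open(path, encoding='utf-8') as text_file:
        text_chunk = text_file.read(2)

    # Binary mode: read(2) returns two raw bytes, and positions are byte offsets
    with open(path, 'rb') as binary_file:
        byte_chunk = binary_file.read(2)
        position_after_read = binary_file.tell()
        binary_file.seek(7)            # jump to the byte where 'world' starts
        rest = binary_file.read()

print(repr(text_chunk))       # 'hé'
print(byte_chunk)             # b'h\xc3'
print(position_after_read)    # 2
print(rest)                   # b'world'
```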

To write text to a file, you can use the file’s write or writelines methods. For example, we could create a version of File1 with no blank lines like so:

In [45]:
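# Open the source inside the with statement too, so both files are closed afterwards
with open(path) as source, open('tmp.txt', 'w') as handle:  
    handle.writelines(x for x in source if len(x) > 1)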

with open('tmp.txt') as f: 
    lines = f.readlines()

lines

['Hi! We are learning Python now.. \n',
 'Python is so easy to learn..\n',
 'I hope you can learn it well..']

In [46]:
f.closed

True
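
One detail worth keeping in mind is that writelines does not add line endings for you; the example above works because the lines read from the file already end with a newline. A minimal sketch, using a hypothetical file notes.txt in a temporary directory:

```python
import os
import tempfile

rows = ['alpha', 'beta', 'gamma']

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'notes.txt')

    # Append '\n' to each item ourselves; writelines writes strings as-is
    with open(path, 'w') as handle:
        handle.writelines(row + '\n' for row in rows)

    with open(path) as f:
        content = f.read()

print(repr(content))   # 'alpha\nbeta\ngamma\n'
```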

See Table 2.3 for many of the most commonly used file methods.

<br>
<center>Table 2.3: Important Python file methods or attributes </center>
<img src="Table2.3.jpg" style="width: 800px;">
