In [4]:
import random
import math
from math import nan, inf, isnan, isinf
import numpy as np

# Introduction to Python
## Bases of programming

### I. Core Native Data Types

- Numeric: integers, floats and complex
- Boolean (bool)
- String (str)
- Standard container types: list, tuple, dictionary, set, range

**Examples:**

The most important (scalar) data type for numerical analysis is the float. Unfortunately, not all non-complex numeric data types are floats. To input a float, it is necessary to include a . (period, dot) in the expression. This example uses the function type() to determine the data type of a variable.

In [2]:
x = 1
type(x)

int

In [3]:
y = 1.0
type(y)

float

In [4]:
xx = float(x)
type(xx)

float

In [None]:
z = 1j
type(z)

The Boolean data type is used to represent true and false, using the reserved keywords True and False. Non-zero, non-empty values generally evaluate to true when evaluated by bool(). Zero or empty values such as bool(0), bool(0.0), bool(0.0j), bool(None), bool('') and bool([]) are all false.

In [5]:
x = True
type(x)

bool

In [6]:
int(x)==1


True

In [7]:
int(False)

0

In [8]:
y = 100
bool(y)

True
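As a quick check of the truthiness rules described above, the sketch below applies bool() to the zero and empty values listed in the text and to a few non-zero, non-empty ones. The names falsy and truthy are only illustrative.

```python
# Values the text lists as false: zeros, None, and empty containers
falsy = [0, 0.0, 0.0j, None, '', []]
# Non-zero or non-empty values evaluate to true
truthy = [1, -2.5, 1j, 'a', [0]]

falsy_results = [bool(v) for v in falsy]
truthy_results = [bool(v) for v in truthy]
print(falsy_results)
print(truthy_results)
```

Note that [0] counts as true: the list is non-empty, even though its only element is zero.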

**Slicing strings:**
- s[:] - Entire string
- s[i] - Character *i*
- s[i:] - Characters *i, ..., n−1*
- s[:i] - Characters *0, ..., i−1*
- s[i:j] - Characters *i, ..., j−1*
- s[i:j:m] - Characters *i, i+m, ..., i+m⌊(j−i−1)/m⌋*
- s[-i] - Character *n−i*
- s[-i:] - Characters *n−i, ..., n−1*
- s[:-i] - Characters *0, ..., n−i−1*
- s[-j:-i] - Characters *n−j, ..., n−i−1* (requires *−j < −i*)
- s[-j:-i:m] - Characters *n−j, n−j+m, ..., n−j+m⌊(j−i−1)/m⌋*

In [9]:
text = 'Python strings are sliceable.'
print(text)
print(text[7:14])

Python strings are sliceable.
strings
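A few more of the slicing rules listed above can be checked on the same string. This is only a sketch; the values chosen for i, j and m are arbitrary.

```python
s = 'Python strings are sliceable.'
n = len(s)
i, j, m = 7, 14, 2

print(s[i:j])      # characters i, ..., j-1
print(s[i:j:m])    # every m-th character from i up to j-1
print(s[-i:])      # the last i characters

# Negative indices count from the end: s[-i] is the same as s[n-i]
same_char = s[-i] == s[n - i]
same_tail = s[-i:] == s[n - i:]
print(same_char, same_tail)
```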


**NB:** Be careful to index correctly. Indexing in Python starts at 0 and ends at len(text)-1, so text[len(text)] is out of range.

In [10]:
L = len(text)
text[L] # Error

IndexError: string index out of range

In [11]:
text[L-1]

'.'

**Lists** are a built-in container data type which holds other data. A list is a collection of other objects – floats, integers, complex numbers, strings or even other lists. Lists are essential to Python programming and are used to store collections of other values. For example, a list of floats can be used to express a vector (although the NumPy data types array and matrix are better suited). Lists also support slicing to retrieve one or more elements. Basic lists are constructed using square brackets, [], and values are separated using commas.

These examples show that lists can be regular, nested and can contain any mix of data types including other lists:

In [12]:
x = []
type(x) 

list

In [13]:
x=[1,2,3,4]
x

[1, 2, 3, 4]

In [14]:
# 2-dimensional list (list of lists)
x = [[1,2,3,4], [5,6,7,8]]
x

[[1, 2, 3, 4], [5, 6, 7, 8]]

In [15]:
# Jagged list, not rectangular
x = [[1,2,3,4] , [5,6,7]]
x

[[1, 2, 3, 4], [5, 6, 7]]

In [16]:
# Mixed data types
x = [1,1.0,1+0j,'one',None,True]
x

[1, 1.0, (1+0j), 'one', None, True]

**Slicing Lists**

Lists, like strings, can be sliced. Slicing is similar, although lists can be sliced in more ways than strings. The difference arises since lists can be multi-dimensional while strings are always 1 × n. Basic list slicing is identical to slicing strings, and operations such as x[:], x[1:], x[:1] and x[-3:] can all be used. To understand slicing, assume x is a 1-dimensional list with n elements and i ≥ 0, j > 0, i < j, m ≥ 1. Python uses 0-based indices, and so the n elements of x can be thought of as $x_0, x_1, \ldots, x_{n-1}$.

In [17]:
x = [0,1,2,3,4,5,6,7,8,9]
x[0]

0

In [18]:
x[5]

5

In [19]:
x[10] # Error

IndexError: list index out of range

In [20]:
x[4:]

[4, 5, 6, 7, 8, 9]

In [21]:
x[:4]

[0, 1, 2, 3]

In [27]:
x[-1:-4:-1]

[9, 8, 7]

In [23]:
x[-0]

0

In [24]:
x[-1]

9

In [25]:
x[-10:-1]

[0, 1, 2, 3, 4, 5, 6, 7, 8]

Lists can be multidimensional, and slicing can be done directly in higher dimensions. For simplicity, consider slicing a 2-dimensional list x = [[1,2,3,4], [5,6,7,8]]. If single indexing is used, x[0] will return the first (inner) list, and x[1] will return the second (inner) list. Since the list returned by x[0] is sliceable, the inner list can be directly sliced using x[0][0] or x[0][1:4].

In [28]:
x = [[1,2,3,4], [5,6,7,8]]
x[0]

[1, 2, 3, 4]

In [29]:
x[1]

[5, 6, 7, 8]

In [30]:
x[0][0]

1

In [31]:
x[0][1:4]

[2, 3, 4]

In [32]:
x[1][-4:-1]

[5, 6, 7]

In [34]:
x[0,0] #Error: syntax unsuitable for lists in Python

TypeError: list indices must be integers or slices, not tuple

Examples of methods for manipulating lists:

In [35]:
x.append([11,12,13])
x

[[1, 2, 3, 4], [5, 6, 7, 8], [11, 12, 13]]

In [36]:
len(x)

3

In [37]:
x.extend([11,12,13])
x

[[1, 2, 3, 4], [5, 6, 7, 8], [11, 12, 13], 11, 12, 13]

In [38]:
x.pop(4)
x

[[1, 2, 3, 4], [5, 6, 7, 8], [11, 12, 13], 11, 13]

In [39]:
x.remove(11)
x

[[1, 2, 3, 4], [5, 6, 7, 8], [11, 12, 13], 13]

Elements can also be deleted from lists using the keyword del in combination with a slice.

In [40]:
x = [0,1,2,3,4,5,6,7,8,9]
del x[1:3]
x

[0, 3, 4, 5, 6, 7, 8, 9]

In [None]:
del x[:]
x

A **tuple** is virtually identical to a list with one important difference – tuples are **immutable**. Immutability means that a tuple cannot be changed once created. It is not possible to add, remove, or replace elements in a tuple. However, if a tuple contains a mutable data type, for example a tuple that contains a list, the contents of the mutable data type can be altered. Tuples are constructed using parentheses (()) in place of the square brackets ([]) used to create lists. Tuples can be sliced in the same way as lists. A list can be converted into a tuple using tuple(), and a tuple can be converted to a list using list().

In [41]:
x =(0,1,2,3,4,5,6,7,8,9)
type(x)

tuple

In [42]:
x[-10:-5]

(0, 1, 2, 3, 4)

In [43]:
x = list(x)
type(x)

list

In [44]:
x = tuple(x)
type(x)

tuple

In [45]:
x= ([1,2],[3,4])
x[0][1] = -10
x # Contents can change, elements cannot

([1, -10], [3, 4])

Note that tuples containing a single element must contain a **comma** when created, so that x = (2,) will assign a tuple to x, while x = (2) will assign 2 to x. The latter interprets the parentheses as if they are part of a mathematical formula rather than being used to construct a tuple. x = tuple([2]) can also be used to create a single element tuple. Lists do not have this issue since square brackets do not have this ambiguity.

Tuples are immutable, and so only have the methods index and count, which behave in an identical manner to their list counterparts.

In [46]:
x =(2)
type(x)

int

In [47]:
x = (2,)
type(x)

tuple

In [48]:
x = tuple([2])
type(x)

tuple
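The two tuple methods mentioned above, index and count, and the effect of immutability can be seen in a short sketch. The try ... except construct used to catch the error is covered in Section IV; the values in x are arbitrary.

```python
x = (2, 4, 2, 8)

print(x.count(2))   # number of times 2 appears
print(x.index(8))   # position of the first 8

# Replacing an element is not allowed and raises TypeError
try:
    x[0] = 10
    replaced = True
except TypeError as err:
    replaced = False
    print('TypeError:', err)
```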

**Dictionaries** are encountered far less frequently than any of the previously described data types in numerical Python. They are, however, commonly used to pass options into other functions such as optimizers, and so familiarity with dictionaries is important. Dictionaries in Python are composed of **keys** (words) and **values** (definitions). Dictionary keys must be unique and of an immutable data type (e.g. strings, the most common key, integers, or tuples containing immutable types), and values can contain any valid Python data type. Values are accessed using keys.

In [49]:
data = {'age': 34, 'children' : [1,2], 1: 'apple'}
type(data)

dict

In [50]:
data['age']

34

Updating values:

In [51]:
data['age'] = 'xyz'
data['age']

'xyz'

Adding new key-value pairs:

In [53]:
data['name'] = 'abc'
data

{'age': 'xyz', 'children': [1, 2], 1: 'apple', 'name': 'abc'}

Deleting key-value pairs:

In [54]:
del data['age']
data

{'children': [1, 2], 1: 'apple', 'name': 'abc'}
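The requirement that keys be immutable can be illustrated with a small sketch: a tuple of immutable values works as a key, while a list is rejected. The dictionary contents here are made up for the example, and try ... except is covered in Section IV.

```python
data = {'age': 34, (0, 1): 'tuple key'}

print(data[(0, 1)])        # a tuple of immutable values is a valid key
print('age' in data)       # membership tests look at keys
print(list(data.keys()))

# A list is mutable, so it cannot be used as a key
try:
    data[[0, 1]] = 'list key'
    list_key_ok = True
except TypeError as err:
    list_key_ok = False
    print('TypeError:', err)
```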

**Sets** are collections which contain all unique elements of a collection. set and frozenset only differ in that the latter is immutable (and so can be used as a dictionary key or as an element of another set), and so set is similar to a unique list while frozenset is similar to a unique tuple. While sets are generally not important in numerical analysis, they can be very useful when working with messy data – for example, finding the set of unique tickers in a long list of tickers.

In [55]:
x = set(['MSFT','GOOG','AAPL','HPQ','MSFT'])
x

{'AAPL', 'GOOG', 'HPQ', 'MSFT'}

In [56]:
x.add('CSCO')
x

{'AAPL', 'CSCO', 'GOOG', 'HPQ', 'MSFT'}

In [57]:
y = set(['XOM', 'GOOG'])
x.intersection(y)

{'GOOG'}

In [58]:
x = x.union(y)
x

{'AAPL', 'CSCO', 'GOOG', 'HPQ', 'MSFT', 'XOM'}

In [59]:
x.remove('XOM')
x

{'AAPL', 'CSCO', 'GOOG', 'HPQ', 'MSFT'}
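The examples above use set only. A minimal sketch of the frozenset counterpart follows; the ticker values and the groups dictionary are illustrative.

```python
tickers = frozenset(['MSFT', 'GOOG', 'AAPL', 'MSFT'])
print(len(tickers))                  # duplicates are dropped

# Operations that return a new set still work
combined = tickers.union({'XOM'})
print(sorted(combined))

# In-place changes such as add are not available on a frozenset
has_add = hasattr(tickers, 'add')
print(has_add)

# Being immutable, a frozenset can serve as a dictionary key
groups = {tickers: 'tech'}
print(groups[frozenset(['AAPL', 'GOOG', 'MSFT'])])
```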

A **range** is most commonly encountered in a for loop. range(a,b,i) creates the sequence that follows the pattern $a, a+i, a+2i, \ldots, a+(m-1)i$ where $m = \lceil (b-a)/i \rceil$ (for i > 0). In other words, it finds all integers x starting with a such that a ≤ x < b and where two consecutive values are separated by i. range can also be called with 1 or 2 parameters – range(a,b) is the same as range(a,b,1) and range(b) is the same as range(0,b,1).

In [60]:
x = range(10)
type(x)

range

In [61]:
print(x)

range(0, 10)


In [62]:
list(x)

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

In [63]:
x = range(3,10)

In [64]:
list(x)

[3, 4, 5, 6, 7, 8, 9]

In [65]:
x = range(3,10,3)
list(x)

[3, 6, 9]

range is not technically a list, which is why the statement print(x) displays range(0, 10). Explicitly converting with list produces a list which allows the values to be printed. range is technically an immutable sequence that computes its values on demand, and so it does not require the storage space of a list. Note that if using Python 2.7, range and xrange are both present. There, xrange is the preferred loop iterable since it is more memory efficient, especially when iterating over a wide range of values.
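The following sketch illustrates, for Python 3, that a range behaves like a sequence without storing its values. sys.getsizeof reports the size of the object itself in bytes; the exact numbers depend on the Python build, so none are quoted here.

```python
import sys

r = range(3, 10, 3)
print(len(r), r[1], 9 in r)    # supports len, indexing and membership tests
print(list(r), list(r))        # can be iterated more than once

# Storage does not grow with the number of values
small = sys.getsizeof(range(10))
large = sys.getsizeof(range(10**9))
as_list = sys.getsizeof(list(range(1000)))
print(small, large, as_list)
```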

**Exercise**

Input the variable ex = 'Python is an interesting and useful language for numerical computing!'. Using slicing, extract:

(a) Python

(b) !

(c) computing

(d) in

Note: There are multiple answers for all.

(e) !gnitupmoc laciremun rof egaugnal lufesu dna gnitseretni na si nohtyP (Reversed)

(f) nohtyP

(g) Pto sa neetn n sfllnug o ueia optn!


### II. Importing modules

Python, by default, only has access to a small number of built-in types and functions. The vast majority of functions are located in modules, and before a function can be accessed, the module which contains the function must be imported.

Import can be used in a variety of ways. The simplest is to use from module import * which imports all functions in module. This method of using import can be dangerous since it is possible for functions in one module to be hidden by later imports from other modules. A better method is to only import the required functions. This still places functions at the top level of the namespace while preventing conflicts.

In [87]:
from numpy import log10 # Importing a specific function from NumPy
import numpy # Importing the whole module
import numpy as np # Renaming the imported module

The only difference between the last two forms is that 'import numpy' is implicitly calling 'import numpy as numpy'. When this form of import is used, functions are accessed through the “as” name. For example, the square root provided by NumPy is accessed using np.sqrt:

In [88]:
np.sqrt(9)

3.0
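To make the risk of hidden names concrete, the sketch below imports sqrt from two modules in turn. The later import replaces the earlier one, while the module-prefixed forms stay unambiguous. The names first and second are only there to record which function the bare name sqrt refers to.

```python
import math
import numpy as np

# A later import of the same name replaces the earlier one
from math import sqrt
first = sqrt
from numpy import sqrt
second = sqrt
print(first is math.sqrt, second is np.sqrt)

# Keeping the module prefix avoids the clash entirely
values = [1.0, 4.0, 9.0]
print(math.sqrt(values[1]))   # scalar input only
print(np.sqrt(values))        # works element-by-element on a list or array
```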

Useful modules:
- **math:** module providing mathematical functions which operate on built-in scalar data types (e.g. float and complex).
- **NumPy:** provides a set of array and matrix data types which are essential for statistics, econometrics and data analysis.
- **SciPy:** contains a large number of routines needed for analysis of data. The most important include a wide range of random number generators, linear algebra routines, and optimizers. SciPy depends on NumPy.
- **time:** provides various time-related functions, useful to compute execution time.
- **pandas:** provides high-performance data structures.
- **matplotlib:** provides a plotting environment for 2D plots, with limited support for 3D plotting.
- **seaborn:** package that improves the default appearance of matplotlib plots without any additional code.


### III. NumPy: Arrays and Matrices

NumPy provides the core data types for econometrics, statistics, and numerical analysis – arrays and matrices. The differences between these two data types are:

• Arrays can have 1, 2, 3 or more dimensions, and matrices always have 2 dimensions. This means that a 1 by n vector stored as an array has 1 dimension and n elements, while the same vector stored as a matrix has 2-dimensions where the sizes of the dimensions are 1 and n (in either order).

• Standard mathematical operators on arrays operate element-by-element. This is not the case for matrices, where multiplication (*) follows the rules of linear algebra. 2-dimensional arrays can be multiplied using the rules of linear algebra using dot, and, if using Python 3.5 or later, arrays can be multiplied using the symbol @. Similarly, the function multiply can be used on two matrices for element-by-element multiplication.

• Arrays are more common than matrices, and all functions are thoroughly tested with arrays. The same functions should also work with matrices, but there is an increased chance of a rare bug when using matrices.

• Arrays can be quickly treated as a matrix using asmatrix (or mat, an alias that was removed in NumPy 2.0) without copying the underlying data.

The best practice is to use arrays and to use the @ symbol for matrix multiplication. Alternatively, the asmatrix view can be used when writing linear algebra-heavy code. It is also important to test custom functions with both arrays and matrices to ensure that false assumptions about the behavior of multiplication have not been made.
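The difference between element-by-element and linear algebra multiplication on arrays can be seen in a short sketch. The arrays a and b are arbitrary 2 by 2 examples.

```python
import numpy as np

a = np.array([[1.0, 2.0], [3.0, 4.0]])
b = np.array([[0.0, 1.0], [1.0, 0.0]])

elementwise = a * b          # element-by-element product
product = a @ b              # linear algebra product
same_as_dot = np.dot(a, b)   # equivalent to a @ b for 2-dimensional arrays

print(elementwise)
print(product)
print(np.array_equal(product, same_as_dot))
```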

**Arrays** are the base data type in NumPy, and are similar to lists or tuples since they all contain collections of elements. Arrays, unlike lists, are always rectangular so that all rows have the same number of elements.

Array initialization using a list:

In [89]:
x = [0.0, 1, 2, 3, 4]
y = np.array(x)
y

array([0., 1., 2., 3., 4.])

In [90]:
type(y)

numpy.ndarray

Two (or higher) -dimensional arrays are initialized using nested lists:

In [91]:
y = np.array([[0.0, 1, 2, 3, 4], [5, 6, 7, 8, 9]])
y

array([[0., 1., 2., 3., 4.],
       [5., 6., 7., 8., 9.]])

In [92]:
np.shape(y)

(2, 5)

In [93]:
y = np.array([[[1,2],[3,4]],[[5,6],[7,8]]])
y

array([[[1, 2],
        [3, 4]],

       [[5, 6],
        [7, 8]]])

In [94]:
np.shape(y)

(2, 2, 2)

Manually initializing higher dimension arrays is tedious and error prone, and so it is better to use functions such as np.zeros or np.empty:

In [95]:
x=np.zeros((2, 2, 2))
x

array([[[0., 0.],
        [0., 0.]],

       [[0., 0.],
        [0., 0.]]])

In [96]:
np.shape(x)

(2, 2, 2)

**Matrices** are essentially a subset of arrays and behave in a virtually identical manner:

In [97]:
x = [0.0, 1, 2, 3, 4] # Any float makes all float
y = np.array(x)
type(y)

numpy.ndarray

The two important differences are:

• Matrices always have 2 dimensions

• Matrices follow the rules of linear algebra for *

In [98]:
y * y # Element-by-element

array([ 0.,  1.,  4.,  9., 16.])

1- and 2-dimensional arrays can be copied to a matrix by calling matrix on an array. Alternatively, asmatrix (or mat, in NumPy versions before 2.0) provides a faster method to coerce an array to behave like a matrix without copying any data:

In [100]:
z = np.asmatrix(x)
type(z)

numpy.matrix

In [102]:
z * z.T # Matrix product: (1 by 5) times (5 by 1)

matrix([[30.]])

**Concatenation** is the process by which one vector or matrix is appended to another. Arrays and matrices can be concatenated horizontally or vertically using np.concatenate:

In [103]:
x = np.array([[1.0,2.0],[3.0,4.0]])
y = np.array([[5.0,6.0],[7.0,8.0]])
z = np.concatenate((x,y), axis = 0) # vertical concatenation
z

array([[1., 2.],
       [3., 4.],
       [5., 6.],
       [7., 8.]])

In [104]:
z = np.concatenate((x,y),axis = 1) # horizontal concatenation
z

array([[1., 2., 5., 6.],
       [3., 4., 7., 8.]])

In [105]:
?np.concatenate

Accessing an individual element in a 1-d array:

In [106]:
x = np.array([1.0,2.0,3.0,4.0,5.0])
x[0]

1.0

Accessing an individual element in a 2-d array:

In [107]:
x = np.array([[1.0,2,3],[4,5,6]])
x[1, 2]

6.0

Pure scalar selection always returns a single element which is not an array:

In [108]:
type(x[1,2])

numpy.float64

Scalar selection can also be used to assign values in an array:

In [109]:
x = np.array([1.0,2.0,3.0,4.0,5.0])
x[0] = -5
x

array([-5.,  2.,  3.,  4.,  5.])

**Array slicing** is virtually identical to list slicing except that a simpler slicing syntax is available when using multiple dimensions. Arrays are sliced using the syntax [:,:,...,:] (where the number of dimensions of the array determines the size of the slice).

Basic slicing of 1-dimensional arrays is identical to slicing a simple list, and the returned type of all slicing operations matches the array being sliced. In 2-dimensional arrays, the first dimension specifies the row or rows of the slice and the second dimension specifies the column or columns. 

In [110]:
y = np.array([[0.0, 1, 2, 3, 4],[5, 6, 7, 8, 9]])
y

array([[0., 1., 2., 3., 4.],
       [5., 6., 7., 8., 9.]])

In [111]:
y[:1,:] # Row 0, all columns

array([[0., 1., 2., 3., 4.]])

In [112]:
y[:1] # Same as y[:1,:]

array([[0., 1., 2., 3., 4.]])

In [113]:
y[:,:1] # all rows, column 0

array([[0.],
       [5.]])

In [114]:
y[:1,0:3] # Row 0, columns 0 to 2 

array([[0., 1., 2.]])

In [115]:
y[:1][:,0:3] # Same as previous

array([[0., 1., 2.]])

In [116]:
y[:,3:] # All rows, columns 3 and 4

array([[3., 4.],
       [8., 9.]])

In [117]:
y = np.array([[[1.0,2],[3,4]],[[5,6],[7,8]]])
y[:1,:,:] # Panel 0 of 3D y

array([[[1., 2.],
        [3., 4.]]])

**NB:** In the previous examples, slice notation was always used even when only selecting 1 row or column. This was done to emphasize the difference between using slice notation, which always returns an array with the same number of dimensions, and using a scalar selector, which performs dimension reduction.

In [118]:
y = np.array([[0.0, 1, 2, 3, 4],[5, 6, 7, 8, 9]])
y[:1,:]

array([[0., 1., 2., 3., 4.]])

In [119]:
y[0,:] # Compare the outcomes

array([0., 1., 2., 3., 4.])

In [120]:
np.shape(y[0,:])

(5,)

In [122]:
np.shape(y[:1])

(1, 5)

Slicing and scalar selection can be used to assign arrays that have the same dimension as the slice:

In [123]:
x = np.array([[0.0]*3]*3) # *3 repeats the list 3 times
x

array([[0., 0., 0.],
       [0., 0., 0.],
       [0., 0., 0.]])

In [124]:
x[0,:] = np.array([1.0, 2.0, 3.0])
x

array([[1., 2., 3.],
       [0., 0., 0.],
       [0., 0., 0.]])

In [125]:
x[::2,::2] = np.array([[-99.0,-99],[-99,-99]]) # 2 by 2
x

array([[-99.,   2., -99.],
       [  0.,   0.,   0.],
       [-99.,   0., -99.]])

**NB:** NumPy attempts automatic (silent) data type conversion if an element with one data type is inserted into an array with a different type. For example, if an array has an integer data type, placing a float into the array results in the float being truncated and stored as an integer. This is dangerous, and so in most cases, arrays should be initialized to contain floats unless a considered decision is taken to use a different data type.

In [126]:
x = [0, 1, 2, 3, 4] # Integers 
y = np.array(x)
y.dtype

dtype('int64')

In [127]:
y[0] = 3.141592
y

array([3, 1, 2, 3, 4])
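One way to follow the advice above is to request a float data type when the array is created, using the dtype argument of np.array. The sketch below contrasts the two cases; the names as_int and as_float are illustrative.

```python
import numpy as np

as_int = np.array([0, 1, 2, 3, 4])                 # integer data type inferred
as_float = np.array([0, 1, 2, 3, 4], dtype=float)  # float requested explicitly

as_int[0] = 3.141592     # silently truncated to 3
as_float[0] = 3.141592   # stored as given

print(as_int.dtype, as_int)
print(as_float.dtype, as_float)
```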

Unlike lists, slices of arrays do not copy the underlying data. Instead, a slice of an array returns a view of the array which shares the data in the sliced array. This is important since changes in slices will propagate to the original array as well as to any other slices which share the same element.

In [128]:
x = np.reshape(np.arange(4.0),(2,2))
x

array([[0., 1.],
       [2., 3.]])

In [130]:
s1 = x[0,:] # First row
s2 = x[:,0] # First column
s1[0] = -3.14 # Assign first element
s2

array([-3.14,  2.  ])

In [131]:
x

array([[-3.14,  1.  ],
       [ 2.  ,  3.  ]])

If changes should not propagate to parent and sibling arrays, it is necessary to call copy on the slice. Alternatively, they can also be copied by calling array on arrays, or matrix on matrices.

In [132]:
x = np.reshape(np.arange(4.0),(2,2))
x

array([[0., 1.],
       [2., 3.]])

In [134]:
s1 = np.copy(x[0,:]) # Function copy
s2 = x[:,0].copy() # Method copy, more common
s3 = np.array(x[0,:]) # Create a new array

In [135]:
s1[0] = -3.14 # Assign first element
s2

array([0., 2.])

In [136]:
s3

array([0., 1.])

In [137]:
x

array([[0., 1.],
       [2., 3.]])
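Whether two arrays share data can also be checked directly with np.shares_memory. The sketch below applies it to a slice and to a copy, and contrasts the behavior with list slices, which are always copies.

```python
import numpy as np

x = np.reshape(np.arange(4.0), (2, 2))
view = x[0, :]           # slice: a view onto x
copied = x[0, :].copy()  # independent copy

print(np.shares_memory(x, view))     # the view shares data with x
print(np.shares_memory(x, copied))   # the copy does not

# List slices, by contrast, are copies
lst = [0, 1, 2, 3]
part = lst[:2]
part[0] = -1
print(lst)               # the original list is unchanged
```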

Exception: when using pure scalar selection the (scalar) value returned is always a copy:

In [138]:
x = np.arange(5.0)
y = x[0] # Pure scalar selection
z = x[:1] # A pure slice
y = -3.14
y # y Changes

-3.14

In [139]:
x # No propagation

array([0., 1., 2., 3., 4.])

In [140]:
z # No changes to z either

array([0.])

### IV. Flow Control, Loops and Exception Handling

**Control structures:**
- if ... elif ... else
- for
- while
- try ... except

**NB:** Python uses white space changes to indicate the start and end of flow control blocks, and so indentation matters. For example, when using if ... elif ... else blocks, all of the control blocks must have the same indentation level and all of the statements inside the control blocks should have the same level of indentation.

**Examples:**

**if ... elif ... else** blocks always begin with an if statement immediately followed by a scalar logical expression. elif and else are optional and can always be replicated using nested if statements at the expense of more complex logic and deeper nesting.

**NB:** Remember that all logicals must be scalar logical values. While it is possible to use arrays containing a single element, attempting to use an array with more than 1 element results in an error.
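The requirement for a scalar logical can be illustrated as follows. Using an array with several elements directly in an if statement raises a ValueError, while np.all and np.any reduce the array to a single logical value first. The array x is an arbitrary example.

```python
import numpy as np

x = np.array([1.0, -2.0, 3.0])

# An array with several elements has no single truth value
try:
    if x > 0:
        pass
    ambiguous = False
except ValueError as err:
    ambiguous = True
    print('ValueError:', err)

# Reduce to a scalar logical first
if np.all(x > 0):
    print('all elements are positive')
elif np.any(x > 0):
    print('at least one element is positive')
else:
    print('no element is positive')
```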

In [141]:
x = 5
if x<5:
    x += 1 
else:
    x -= 1
x

4

In [143]:
L = [3, -10, 0, nan, -5, -inf, 17, inf]
x = random.choice(L)

if isnan(x) or isinf(x):
    print('Untreatable value: ', x)
elif x >= 0:
    print('The value ', x, ' is positive or zero')
else:
    print('The value ', x, ' is strictly negative')

Untreatable value:  nan


**for** loops have the following syntax: 'for *item* in *iterable*:'. *item* is an element from *iterable*, and *iterable* can be anything that is iterable in Python. The most common examples are range, lists, tuples, arrays or matrices. The for loop will iterate across all items in *iterable*, beginning with item 0 and continuing until the final item. When using multidimensional arrays, only the outside dimension is directly iterable. For example, if x is a 2-dimensional array, then the iterable elements are x[0], x[1] and so on.

In [144]:
count = 0
for i in range(100):
    count += i
count

4950

The same kind of loop can be written using range as the iterable and len to get the number of items, so that each item is accessed through its index:

In [145]:
returns = np.random.randn(100)
count = 0
for i in range(len(returns)):
    if returns[i]<0:
        count += 1
count

58
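For comparison, the same count can be obtained without an explicit loop. This sketch assumes a fixed seed purely so that it is reproducible; it is an alternative to the loop above, not a replacement for learning it.

```python
import numpy as np

np.random.seed(0)   # fixed seed, only so the sketch is reproducible
returns = np.random.randn(100)

# Loop version, iterating directly over the elements
count_loop = 0
for r in returns:
    if r < 0:
        count_loop += 1

# Vectorized version: the comparison gives a Boolean array, and True counts as 1
count_vec = int(np.sum(returns < 0))
print(count_loop, count_vec)
```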

Nested for loops with flow control:

In [146]:
x = np.zeros((10,10))
for i in range(np.size(x,0)):
    for j in range(np.size(x,1)):
        if i<j:
            x[i,j]=i+j
        else:
            x[i,j]=i-j
            
x

array([[ 0.,  1.,  2.,  3.,  4.,  5.,  6.,  7.,  8.,  9.],
       [ 1.,  0.,  3.,  4.,  5.,  6.,  7.,  8.,  9., 10.],
       [ 2.,  1.,  0.,  5.,  6.,  7.,  8.,  9., 10., 11.],
       [ 3.,  2.,  1.,  0.,  7.,  8.,  9., 10., 11., 12.],
       [ 4.,  3.,  2.,  1.,  0.,  9., 10., 11., 12., 13.],
       [ 5.,  4.,  3.,  2.,  1.,  0., 11., 12., 13., 14.],
       [ 6.,  5.,  4.,  3.,  2.,  1.,  0., 13., 14., 15.],
       [ 7.,  6.,  5.,  4.,  3.,  2.,  1.,  0., 15., 16.],
       [ 8.,  7.,  6.,  5.,  4.,  3.,  2.,  1.,  0., 17.],
       [ 9.,  8.,  7.,  6.,  5.,  4.,  3.,  2.,  1.,  0.]])

Nested loops that are executed based on a flow control statement:

In [147]:
x = np.zeros((10,10))
for i in range(np.size(x,0)):
    if (i % 2) == 1:
        for j in range(np.size(x,1)):
            x[i,j] = i+j
    else:
        for j in range(i//2):
            x[i,j] = i-j
x

array([[ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 1.,  2.,  3.,  4.,  5.,  6.,  7.,  8.,  9., 10.],
       [ 2.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 3.,  4.,  5.,  6.,  7.,  8.,  9., 10., 11., 12.],
       [ 4.,  3.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 5.,  6.,  7.,  8.,  9., 10., 11., 12., 13., 14.],
       [ 6.,  5.,  4.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 7.,  8.,  9., 10., 11., 12., 13., 14., 15., 16.],
       [ 8.,  7.,  6.,  5.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 9., 10., 11., 12., 13., 14., 15., 16., 17., 18.]])

**NB:** It is not safe to modify the sequence of the iterable when looping over it. This means that the iterable should not change size, which can occur when using a list and the methods pop(), insert() or append() or the keyword del.

The loop below would never terminate (except for the if statement that breaks the loop) since L is being extended each iteration.

In [148]:
L = [1, 2]
for i in L:
    print(i) 
    L.append(i+2) 
    if i>5:
        break

1
2
3
4
5
6
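When a list does need to change while it is being processed, one common approach is to loop over a copy, such as the slice L[:], and modify the original. A minimal sketch with made-up values:

```python
L = [1, -2, 3, -4, 5]

# Loop over a copy (L[:]) so that removing from L does not disturb the iteration
for value in L[:]:
    if value < 0:
        L.remove(value)
print(L)
```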


for loops can be used with 2 *items* when the *iterable* is wrapped in **enumerate**, which allows the elements of the *iterable* to be directly accessed, as well as their index in the *iterable*:

In [149]:
x = np.linspace(0,100,11)
for i,y in enumerate(x):
    print('i is :', i)
    print('y is :', y)

i is : 0
y is : 0.0
i is : 1
y is : 10.0
i is : 2
y is : 20.0
i is : 3
y is : 30.0
i is : 4
y is : 40.0
i is : 5
y is : 50.0
i is : 6
y is : 60.0
i is : 7
y is : 70.0
i is : 8
y is : 80.0
i is : 9
y is : 90.0
i is : 10
y is : 100.0


**while** loops are useful when the number of iterations needed depends on the outcome of the loop contents. while loops are commonly used when a loop should only stop if a certain condition is met, such as when the change in some parameter is small. The generic structure of a while loop is:

In [None]:
# while logical: 
#     Code to run
#     Update logical

Two things are crucial when using a while loop: first, the logical expression should evaluate to true when the loop begins (or the loop will be ignored) and second, the inputs to the logical expression must be updated inside the loop. If they are not, the loop will continue indefinitely. while loops should generally be avoided when for loops are sufficient. However, there are situations where no for loop equivalent exists:

In [150]:
mu = abs(100*np.random.randn(1)) 
print(mu)
index = 1
while abs(mu) > .0001:
    mu = (mu+np.random.randn(1))/index
    index=index+1
print(mu)

[77.03848755]
[9.01332152e-05]


**Exception handling** is an advanced programming technique which can be used to produce more resilient code (often at the cost of speed). try ... except blocks are useful for running code which may fail for reasons outside of the programmer’s control. In most numerical applications, code should be deterministic and so dangerous code can usually be avoided. When it can’t, for example, if reading data from a data source which isn’t always available (e.g. a website), then try ... except can be used to attempt to execute the code, and then to do something if the code fails to execute. The generic structure of a try ... except block is:

In [None]:
# try:
#     Dangerous Code
# except ExceptionType1:
#     Code to run if ExceptionType1 is raised
# except ExceptionType2:
#     Code to run if ExceptionType2 is raised
# ...
# ...
# except:
#     Code to run if an unlisted exception type is raised

A simple example of exception handling occurs when attempting to convert text to numbers:

In [152]:
text = ('a','1','54.1','43.a') 
for t in text:
    try:
        temp = float(t)
        print(temp)
    except ValueError:
        print('Not convertible to a float')

Not convertible to a float
1.0
54.1
Not convertible to a float


In [151]:
float('a')

ValueError: could not convert string to float: 'a'
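The generic structure with several except clauses can be sketched with the same conversion task. Here float raises ValueError for text that is not a number and TypeError for objects it does not accept at all; the input values are made up for illustration.

```python
items = ('54.1', '43.a', None, [1, 2])
results = []
for item in items:
    try:
        results.append(float(item))
    except ValueError:
        # Text that does not represent a number
        results.append('ValueError')
    except TypeError:
        # Objects that float() cannot convert, such as None or a list
        results.append('TypeError')
print(results)
```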

**Exercises:**

1. Find two different methods to use a for loop to fill a 5×5 array with i × j where i is the row index, and j is the column index. One will use range as the iterable, and the other should directly iterate on the rows, and then the columns of the matrix.


2. Simulate 1000 observations from an ARMA(2,2) where $ε_t, ε_{t+1}, ...$ are white noise terms (independent identically distributed standard normal random variables). The process of an ARMA(2,2) is given by

$y_t =φ_1y_{t−1} + φ_2y_{t−2} + θ_1ε_{t−1} + θ_2ε_{t−2} + ε_t$

   Use the values $φ_1 = 1.4$, $φ_2 = −0.8$, $θ_1 = 0.4$ and $θ_2 = 0.8$. Note: A vector containing T standard normal random variables can be simulated using e = np.random.randn(T). When simulating a process, always simulate more data than needed and throw away the first block of observations to avoid start-up biases. Since this process is fairly persistent, at least 100 extra observations should be computed.
   
3. Write a code block doing the following: define two arrays of size 10 such that their elements are realizations of standard normal distribution. Create an output array of the same size with elements that:
    * are equal to 1 where the elements of the two initial arrays with the corresponding index are both positive,
    * are equal to -1 where the elements of the two initial arrays with the corresponding index are both negative,
    * are equal to 0 where the elements of the two initial arrays with the corresponding index have different signs.


### V. Functions

Python functions are very simple to declare and can occur in the same file as the main program or a standalone file. Functions are declared using the *def* keyword, and the value produced is returned using the *return* keyword. Consider a simple function which returns the square of the input, $y = x^2$:

In [10]:
def square(x):
    return x**2

# Call the function
x = 2
y = square(x) 
print(x,y)

2 4


In this example, the same Python file contains the main program – the final 3 lines – as well as the function. More complex functions can be crafted with multiple inputs:

In [11]:
def l2distance(x,y):
    z = (x-y)**2
    return z

# Call the function
x=3
y = 10
z = l2distance(x,y) 
print(x,y,z)

3 10 49


In [13]:
def l1_l2_norm(x,y): 
    d=x-y
    return sum(np.abs(d)), np.sqrt(d @ d)

# Call the function
x = np.random.randn(10)
y = np.random.randn(10)
# Using 1 output returns a tuple
z = l1_l2_norm(x,y)
print(x-y)
print("The L1 distance is ",z[0])
print("The L2 distance is ",z[1])

[ 0.31960141 -0.99299159 -0.65340149 -0.38615149  0.05531734 -1.81147177
 -0.08001426 -2.92624558 -0.53371882  0.88973825]
The L1 distance is  8.648652000195664
The L2 distance is  3.8202773750838124


In [14]:
# Using 2 output returns the values
l1,l2 = l1_l2_norm(x,y)
print("The L1 distance is ",l1)
print("The L2 distance is ",l2)

The L1 distance is  8.648652000195664
The L2 distance is  3.8202773750838124


All input variables in functions are automatically keyword arguments, so that the function can be accessed either by placing the inputs in the order they appear in the function (positional arguments), or by calling the input by their name using *keyword=value*.

In [15]:
def lp_norm(x,y,p): 
    d=x-y
    return sum(abs(d)**p)**(1/p)

# Call the function
x = np.random.randn(10)
y = np.random.randn(10)
z1 = lp_norm(x,y,2)
z2 = lp_norm(p=2,x=x,y=y)
print("The Lp distances are ",z1,z2)

The Lp distances are  2.199129386855736 2.199129386855736


**Default values** are set in the function declaration using the syntax *input=default*:

In [16]:
def lp_norm(x,y,p = 2): 
    d=x-y
    return sum(abs(d)**p)**(1/p)

# Call the function
x = np.random.randn(10)
y = np.random.randn(10)
# Inputs with default values can be ignored
l2 = lp_norm(x,y)
l1 = lp_norm(x,y,1)
print("The l1 and l2 distances are ",l1,l2)
print("Is the default value overridden?", sum(abs(x-y))==l1)

The l1 and l2 distances are  9.802619063799462 4.209929919793486
Is the default value overridden? True


**NB:** Default values should not normally be mutable (e.g. lists or arrays) since they are only initialized once, when the function is defined. Every call that omits the input then uses the same object, which means that the default value can change each time the function is called.

In [17]:
def bad_function(x = np.zeros(1)):
    print(x)
    x[0] = np.random.randn()
# Call the function
bad_function()
bad_function()
bad_function()

[0.]
[0.55546915]
[-0.26699561]


Each call to bad_function shows that x has a different value – despite the default being 0. The solution to this problem is to initialize mutable objects to 'None', and then use an if to check and initialize only if the value is 'None'. Note that tests for 'None' use the 'is' keyword rather than testing for equality using ==.

In [18]:
def good_function(x = None):
    if x is None:
        x = np.zeros(1)
    print(x)
    x[0] = np.random.randn()
    
# Call the function
good_function()
good_function()

[0.]
[0.]
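The same behavior can be reproduced without random numbers by using a list as the default. In this sketch, append_bad and append_good are hypothetical helper names; the first reuses the single list created when the function is defined, while the second creates a new list on each call.

```python
def append_bad(value, store=[]):
    # The default list is created once, when def is executed
    store.append(value)
    return store

def append_good(value, store=None):
    # A fresh list is created on each call that omits store
    if store is None:
        store = []
    store.append(value)
    return store

bad_first = list(append_bad(1))
bad_second = list(append_bad(2))    # still contains the 1 from the first call
good_first = append_good(1)
good_second = append_good(2)
print(bad_first, bad_second)
print(good_first, good_second)
```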


Most functions written as an “end user” have a known (ex ante) number of inputs. However, functions which evaluate other functions often must accept **variable numbers of input**. Variable inputs can be handled using the '\*args' (arguments) or '\*\*kwargs' (keyword arguments) syntax. The '\*args' syntax will generate a tuple containing all inputs past the required input list. For example, consider extending the $L_p$ function so that it can accept a set of p values as extra inputs (Note: in practice it would make more sense to accept an array for p):

In [19]:
def lp_norm(x,y,p = 2, *args): 
    d=x-y
    print('The L' + str(p) + ' distance is :', sum(abs(d)**p)**(1/p))
    out = [sum(abs(d)**p)**(1/p)]
    print('Number of *args:', len(args))
    for p in args:
        print('The L' + str(p) + ' distance is :', sum(abs(d)**p)**(1/p))
        out.append(sum(abs(d)**p)**(1/p))
    return tuple(out)

# Call the function
x = np.random.randn(10)
y = np.random.randn(10)
# x & y are required inputs and so are not in *args
lp = lp_norm(x,y)
# Function takes 3 inputs, so no *args
lp = lp_norm(x,y,1)
# Inputs beyond x, y and p are collected in *args
lp = lp_norm(x,y,1,2,3,4,1.5,2.5,0.5)

The L2 distance is : 5.275991097258029
Number of *args: 0
The L1 distance is : 13.938240859518313
Number of *args: 0
The L1 distance is : 13.938240859518313
Number of *args: 6
The L2 distance is : 5.275991097258029
The L3 distance is : 4.057380760150882
The L4 distance is : 3.6536036902880062
The L1.5 distance is : 7.150811852375454
The L2.5 distance is : 4.475857866933422
The L0.5 distance is : 122.05769667080303


The alternative syntax, '\*\*kwargs', generates a dictionary with all keyword inputs which are not in the function signature. One reason for using '\*\*kwargs' is to allow a long list of optional inputs without having to have an excessively long function definition. This is how this input mechanism operates in many matplotlib functions such as plot.

In [None]:
def lp_norm(x,y,p = 2, **kwargs): 
    d=x-y
    print('Number of **kwargs:', len(kwargs))
    for key in kwargs:
        print('Key :', key, ' Value:', kwargs[key])
    return sum(abs(d)**p)

# Call the function
x = np.random.randn(10)
y = np.random.randn(10)
# Inputs with default values can be ignored
lp = lp_norm(x,y,kword1=1,kword2=3.2)
# The p keyword is in the function def, so not in **kwargs
lp = lp_norm(x,y,kword1=1,kword2=3.2,p=0)
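Since dictionaries are often used to pass options into functions, it can help to see the reverse direction of '\*\*kwargs': a dictionary can be unpacked into keyword arguments with ** at the call site. The function describe and the option names below are hypothetical and serve only to illustrate the mechanism.

```python
def describe(x, scale=1.0, **kwargs):
    # kwargs holds only the keywords that are not named in the signature
    return x * scale, sorted(kwargs)

options = {'scale': 2.0, 'color': 'red', 'linewidth': 3}
# Equivalent to describe(10, scale=2.0, color='red', linewidth=3)
value, extra = describe(10, **options)
print(value, extra)
```

Here scale is matched to the named input, and the remaining keys end up in kwargs.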

**Exercise:**

Write a function which takes an array with T elements containing categorical data (e.g. 1,2,3), and returns a T by C array of indicator variables where C is the number of unique values of the categorical variable, and each column of the output is an indicator variable (0 or 1) for whether the input data belonged to that category. For example, if x = [1 2 1 1 2], then the output is

$$\begin{bmatrix} 1 & 0 \\ 0 & 1 \\ 1 & 0 \\ 1 & 0 \\ 0 & 1 \end{bmatrix}$$


   The function should provide a second output containing the categories (e.g. [1 2] in the example).
   