# McKinney Chapter 4 - Practice for Section 05

## Announcements

1. Our second DataCamp course, *Intermediate Python*, is due Friday, 1/26, at 11:59 PM
2. I will record our week 4 lecture video on McKinney chapter 5 this Thursday evening, and the week 4 pre-class quiz is due before class next Tuesday, 1/30
3. Team projects
   1. Continue to join teams on Canvas > People > Team Projects
   2. I removed the join-a-team assignment, but I will give the first project assignment in early February, so join a team by then

## 10-minute Recap

### NumPy Arrays

NumPy arrays are multidimensional data structures that store numerical data efficiently and support fast mathematical operations.

In [1]:
import this

The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!


In [2]:
import numpy as np
%precision 4

'%.4f'

In [3]:
np.arange(10)

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [4]:
np.ones(10)

array([1., 1., 1., 1., 1., 1., 1., 1., 1., 1.])

In [5]:
np.ones((2, 2))

array([[1., 1.],
       [1., 1.]])

In [6]:
np.ones((2, 2, 2, 2))

array([[[[1., 1.],
         [1., 1.]],

        [[1., 1.],
         [1., 1.]]],


       [[[1., 1.],
         [1., 1.]],

        [[1., 1.],
         [1., 1.]]]])

We will use standard normal random variables from `np.random.randn()` to tinker with ideas.
If we want to get the same random numbers (every time and across computers), we should use `np.random.seed()` to set the seed of the random number generator.

In [7]:
np.random.seed(42)
np.random.randn(2, 2)

array([[ 0.4967, -0.1383],
       [ 0.6477,  1.523 ]])
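A minimal sketch of what setting the seed buys us: resetting the seed restarts the sequence of random numbers, while drawing again without resetting gives new numbers. The names `first`, `second`, and `third` are only for illustration.

```python
import numpy as np

np.random.seed(42)
first = np.random.randn(2, 2)

np.random.seed(42)  # resetting the seed restarts the sequence
second = np.random.randn(2, 2)

third = np.random.randn(2, 2)  # no reset, so the generator moves on to new draws

print(np.array_equal(first, second))  # True
print(np.array_equal(first, third))   # False
```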

### Vectorized Functions

Vectorized computation is the process of applying an operation to an entire array or a subset of an array without using explicit loops. NumPy supports vectorized computation using universal functions (ufuncs), which are functions that operate on arrays element-wise.

In [8]:
4**(1/2)

2.0000

In [9]:
np.sqrt(4)

2.0000

In [10]:
np.arange(5)

array([0, 1, 2, 3, 4])

In [11]:
np.sqrt(np.arange(5))

array([0.    , 1.    , 1.4142, 1.7321, 2.    ])
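To make "without using explicit loops" concrete, here is a small sketch that computes the same square roots with a loop and with a ufunc. The names `looped` and `vectorized` are only for illustration.

```python
import numpy as np

x = np.arange(5)

# Explicit loop: one scalar operation per element
looped = np.array([xi**0.5 for xi in x])

# Vectorized: one ufunc call handles the whole array
vectorized = np.sqrt(x)

print(np.allclose(looped, vectorized))

# Arithmetic operators are vectorized, too
print(x * 2 + 1)
```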

### Indexing and Slicing

Indexing and slicing are techniques to access or modify specific elements or subsets of an array. NumPy also supports advanced indexing methods, such as fancy indexing and boolean indexing, which allow more flexible and complex selection of array elements.

In [12]:
np.random.seed(42) # 42 is the answer to everything from *The Hitchhiker's Guide to the Galaxy*
my_array = np.random.randn(3, 3)

my_array

array([[ 0.4967, -0.1383,  0.6477],
       [ 1.523 , -0.2342, -0.2341],
       [ 1.5792,  0.7674, -0.4695]])

We can index the "first row" with the `[0]` index.

In [13]:
my_array[0]

array([ 0.4967, -0.1383,  0.6477])

We can index the first element in the first row by chaining a `[0]` index.

In [14]:
my_array[0][0]

0.4967

A simpler syntax for `[0][0]` is `[0, 0]`, which has an $i, j$ interpretation!

In [15]:
my_array[0, 0]

0.4967

We can combine this notation with slices, like for lists!
What if we want the first two columns in the first two rows?

In [16]:
my_array[:2] # first two rows

array([[ 0.4967, -0.1383,  0.6477],
       [ 1.523 , -0.2342, -0.2341]])

In [17]:
my_array[:2][:2] # chained slices: the second [:2] slices rows again, so we still get every column

array([[ 0.4967, -0.1383,  0.6477],
       [ 1.523 , -0.2342, -0.2341]])

In [18]:
my_array[:2, :2] # first two rows and first two columns

array([[ 0.4967, -0.1383],
       [ 1.523 , -0.2342]])
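The recap mentions fancy indexing and boolean indexing, so here is a short sketch of both on the same `my_array`. The names `negatives` and `rows_2_and_0` are only for illustration.

```python
import numpy as np

np.random.seed(42)
my_array = np.random.randn(3, 3)

# Boolean indexing: a True/False mask keeps the elements where the mask is True
negatives = my_array[my_array < 0]  # 1-dimensional array of the negative values

# Fancy indexing: a list of integers picks rows in any order
rows_2_and_0 = my_array[[2, 0]]  # third row, then first row

print(negatives)
print(rows_2_and_0)
```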

## Practice

### Create a 1-dimensional array named `a1` that counts from 0 to 24 by 1.

In [19]:
a1 = np.arange(25)

a1

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
       17, 18, 19, 20, 21, 22, 23, 24])

The above is much easier than a step-by-step solution!

In [20]:
np.array(list(range(25)))

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
       17, 18, 19, 20, 21, 22, 23, 24])

### Create a 1-dimensional array named `a2` that counts from 0 to 24 by 3.

In [21]:
a2 = np.arange(0, 25, 3) # start, stop, step (if we want to give step, we must give start)

a2

array([ 0,  3,  6,  9, 12, 15, 18, 21, 24])

### Create a 1-dimensional array named `a3` that counts from 0 to 100 by multiples of 3 or 5.

In [22]:
a3 = np.array([i for i in range(101) if (i%3==0) | (i%5==0)])

a3

array([  0,   3,   5,   6,   9,  10,  12,  15,  18,  20,  21,  24,  25,
        27,  30,  33,  35,  36,  39,  40,  42,  45,  48,  50,  51,  54,
        55,  57,  60,  63,  65,  66,  69,  70,  72,  75,  78,  80,  81,
        84,  85,  87,  90,  93,  95,  96,  99, 100])

In [23]:
a3_alt = np.arange(101)
a3_alt = a3_alt[ (a3_alt%3==0) | (a3_alt%5==0) ]

a3_alt

array([  0,   3,   5,   6,   9,  10,  12,  15,  18,  20,  21,  24,  25,
        27,  30,  33,  35,  36,  39,  40,  42,  45,  48,  50,  51,  54,
        55,  57,  60,  63,  65,  66,  69,  70,  72,  75,  78,  80,  81,
        84,  85,  87,  90,  93,  95,  96,  99, 100])

How can we make sure these two answers are identical?

In [24]:
a3 == a3_alt

array([ True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True])

In [25]:
(a3 == a3_alt).all()

True

In [26]:
np.allclose(a3, a3_alt)

True
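The three checks above agree for integer arrays, but they can disagree for floats. A small sketch of the difference, with made-up arrays `a` and `b`: `==` and `np.array_equal()` compare exactly, while `np.allclose()` compares within a small tolerance.

```python
import numpy as np

a = np.array([0.1 + 0.2, 1.0])  # 0.1 + 0.2 is not exactly 0.3 in floating point
b = np.array([0.3, 1.0])

print((a == b).all())        # False: exact element-wise comparison
print(np.array_equal(a, b))  # False: exact comparison that also checks shapes
print(np.allclose(a, b))     # True: equal within a small tolerance
```

For integer results like `a3` and `a3_alt`, an exact comparison is the stricter test.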

### Create a 1-dimensional array `a4` that contains the squares of the even integers through 100,000.

How much faster is the NumPy version than the list comprehension version?

In [27]:
np.arange(0, 100_001, 2)**2

array([         0,          4,         16, ..., 1409265424, 1409665412,
       1410065408])

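On some computers, the output above is wrong because NumPy defaults to 32-bit integers, and $100{,}000^2 = 10^{10}$ exceeds the largest 32-bit signed integer ($2^{31} - 1 \approx 2.1 \times 10^9$), so the largest squares silently overflow.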
***Always check your output!***
To avoid this problem, we can force `np.arange()` to use 64-bit integers with the `dtype=` argument.

In [28]:
np.arange(0, 100_001, 2, dtype=np.int64)**2

array([          0,           4,          16, ...,  9999200016,
        9999600004, 10000000000], dtype=int64)
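We can reproduce the overflow on any computer by forcing 32-bit integers. This sketch compares the last square under both integer sizes. The names `evens_32` and `evens_64` are only for illustration.

```python
import numpy as np

evens_32 = np.arange(0, 100_001, 2, dtype=np.int32)
evens_64 = np.arange(0, 100_001, 2, dtype=np.int64)

print(np.iinfo(np.int32).max)  # 2147483647, the largest 32-bit signed integer
print((evens_32**2)[-1])       # 1410065408, because 10**10 wraps around silently
print((evens_64**2)[-1])       # 10000000000, the correct square of 100,000
```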

We can use the `%timeit` magic to time which code is faster!
The `%timeit` magic runs the code on the same line many times and reports the mean computation time.
The `%%timeit` magic with two percent signs runs the code in the same cell many times and reports the mean computation time.

In [29]:
%timeit np.arange(0, 100_001, 2, dtype=np.int64)**2

56.3 µs ± 11.6 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)


In [30]:
%timeit np.array([i**2 for i in range(0, 100_001)])

17.6 ms ± 1.82 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)


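Based on the timings above, the NumPy version is about 300 times faster (17.6 ms versus 56.3 µs)!
Note that the list comprehension squares every integer through 100,000 instead of only the even integers, so it does about twice the work; adding a step of 2 to `range()` would make the comparison like-for-like.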
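Outside of Jupyter, we can make a similar comparison with the standard-library `timeit` module. The sketch below squares only the even integers in both versions and first checks that they agree. The function names and the repetition count `n` are arbitrary choices, and timings will vary by computer.

```python
import timeit
import numpy as np

def squares_numpy():
    return np.arange(0, 100_001, 2, dtype=np.int64)**2

def squares_list():
    # dtype=np.int64 keeps the result comparable on computers that default to 32-bit integers
    return np.array([i**2 for i in range(0, 100_001, 2)], dtype=np.int64)

# Both versions should give the same answer before we compare their speed
print(np.array_equal(squares_numpy(), squares_list()))

n = 20  # number of repetitions (an arbitrary choice)
t_numpy = timeit.timeit(squares_numpy, number=n) / n
t_list = timeit.timeit(squares_list, number=n) / n
print(f'NumPy: {t_numpy:.6f} s, list: {t_list:.6f} s, ratio: {t_list / t_numpy:.0f}x')
```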

### Write a function that mimics Excel's `pv` function.

Here is how we call Excel's `pv` function:
`=PV(rate, nper, pmt, [fv], [type])`
We can use the annuity and lump sum present value formulas.

Present value of an annuity payment `pmt`:
$PV_{pmt} = \frac{pmt}{rate} \times \left(1 - \frac{1}{(1+rate)^{nper}} \right)$

Present value of a lump sum `fv`:
$PV_{fv} = \frac{fv}{(1+rate)^{nper}}$

In [31]:
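def pv(rate, nper, pmt=None, fv=None, type=None):
    # Excel treats an omitted pmt or fv as zero and assumes end-of-period payments
    if pmt is None:
        pmt = 0
    if fv is None:
        fv = 0
    if type is None:
        type = 'END'

    pv_pmt = (pmt / rate) * (1 - 1 / (1 + rate)**nper) # annuity formula (requires rate != 0)
    pv_fv = fv / (1 + rate)**nper # lump sum formula

    if type == 'BGN':
        pv_pmt *= (1 + rate) # same as pv_pmt = pv_pmt*(1 + rate); only the payments move one period earlier

    return -1 * (pv_pmt + pv_fv) # Excel's sign convention: positive cash flows have a negative present value

In [32]:
pv(rate = 0.05, nper = 1_000, pmt = 5, type = 'BGN')

-105.0000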

In [33]:
a = 5
a*= 5
a*= 5
a

125

### Write a function that mimics Excel's `fv` function.
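One possible solution sketch, not the only one. Excel's call is `=FV(rate, nper, pmt, [pv], [type])`, and we can use the future value formulas $FV_{pmt} = \frac{pmt}{rate} \times \left((1+rate)^{nper} - 1\right)$ and $FV_{pv} = pv \times (1+rate)^{nper}$. This sketch assumes `rate` is not zero and uses the strings `'END'` and `'BGN'` for `type` (Excel itself uses 0 and 1).

```python
def fv(rate, nper, pmt=None, pv=None, type=None):
    # Treat omitted arguments as zero and assume end-of-period payments
    if pmt is None:
        pmt = 0
    if pv is None:
        pv = 0
    if type is None:
        type = 'END'

    growth = (1 + rate)**nper
    fv_pmt = (pmt / rate) * (growth - 1)  # future value of the annuity payments
    fv_pv = pv * growth                   # future value of the lump sum

    if type == 'BGN':
        fv_pmt *= (1 + rate)  # beginning-of-period payments earn one more period of interest

    # Excel's sign convention: cash paid in (negative) gives a positive future value
    return -1 * (fv_pmt + fv_pv)

print(fv(rate=0.05, nper=10, pmt=-100))
print(fv(rate=0.10, nper=1, pv=-100))
```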

### Replace the negative values in `data` with -1 and positive values with +1.

In [34]:
np.random.seed(42)
data = np.random.randn(7, 7)
data

array([[ 0.4967, -0.1383,  0.6477,  1.523 , -0.2342, -0.2341,  1.5792],
       [ 0.7674, -0.4695,  0.5426, -0.4634, -0.4657,  0.242 , -1.9133],
       [-1.7249, -0.5623, -1.0128,  0.3142, -0.908 , -1.4123,  1.4656],
       [-0.2258,  0.0675, -1.4247, -0.5444,  0.1109, -1.151 ,  0.3757],
       [-0.6006, -0.2917, -0.6017,  1.8523, -0.0135, -1.0577,  0.8225],
       [-1.2208,  0.2089, -1.9597, -1.3282,  0.1969,  0.7385,  0.1714],
       [-0.1156, -0.3011, -1.4785, -0.7198, -0.4606,  1.0571,  0.3436]])
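One possible solution sketch with two approaches: `np.where()` and boolean indexing on a copy. The names `signs` and `signs_alt` are only for illustration. The two approaches treat an exact zero differently (`np.where()` sends it to +1 here, while the boolean-indexing version leaves it at 0), but `data` has no exact zeros.

```python
import numpy as np

np.random.seed(42)
data = np.random.randn(7, 7)

# np.where() picks -1 where the condition is True and +1 elsewhere
signs = np.where(data < 0, -1, 1)

# Alternative: boolean indexing on a copy, so the original data is unchanged
signs_alt = data.copy()
signs_alt[signs_alt < 0] = -1
signs_alt[signs_alt > 0] = 1

print(signs)
print(np.array_equal(signs, signs_alt))
```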

### Write a function `npmts()` that calculates the number of payments that generate $x\%$ of the present value of a perpetuity.

Your `npmts()` should accept arguments `x`, `c1`, `r`, and `g` that represent $x$, $C_1$, $r$, and $g$.
The present value of a growing perpetuity is $PV = \frac{C_1}{r - g}$, and the present value of a growing annuity is $PV = \frac{C_1}{r - g}\left[ 1 - \left( \frac{1 + g}{1 + r} \right)^t \right]$.
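One possible solution sketch. Setting $x \times \frac{C_1}{r - g} = \frac{C_1}{r - g}\left[ 1 - \left( \frac{1 + g}{1 + r} \right)^t \right]$ and solving for $t$ gives $t = \frac{\ln(1 - x)}{\ln\left(\frac{1 + g}{1 + r}\right)}$, so $C_1$ cancels. The sketch assumes an argument `x` given as a decimal (0.5 for 50%), $0 < x < 1$, and $r > g$, and it rounds up to the next whole payment.

```python
import numpy as np

def npmts(x, c1, r, g):
    # c1 cancels out of the solution, but we keep it to match the requested arguments
    t = np.log(1 - x) / np.log((1 + g) / (1 + r))
    return int(np.ceil(t))  # round up to a whole number of payments

print(npmts(x=0.5, c1=100, r=0.10, g=0.00))  # 8
print(npmts(x=0.5, c1=100, r=0.10, g=0.05))  # 15
```

Faster growth pushes more of the value into distant payments, so more payments are needed to reach the same share.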

### Write a function that calculates the internal rate of return given a NumPy array of cash flows.
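One possible solution sketch. The internal rate of return is the rate $r$ that sets the net present value $\sum_t \frac{C_t}{(1 + r)^t}$ to zero, and this sketch searches for it with Newton's method. The method and the arguments `guess`, `tol`, and `max_iter` are my choices for illustration, and the sketch assumes the first cash flow occurs at $t = 0$. Cash flows that change sign more than once can have several solutions, and the guess determines which one we find.

```python
import numpy as np

def irr(cash_flows, guess=0.1, tol=1e-8, max_iter=100):
    c = np.asarray(cash_flows, dtype=float)
    t = np.arange(c.size)  # period of each cash flow: 0, 1, 2, ...
    r = guess
    for _ in range(max_iter):
        npv = (c / (1 + r)**t).sum()
        if abs(npv) < tol:
            return r
        d_npv = (-t * c / (1 + r)**(t + 1)).sum()  # derivative of NPV with respect to r
        r -= npv / d_npv  # Newton step
    raise RuntimeError('irr() did not converge; try a different guess')

cash_flows = np.array([-100, 60, 60])
rate = irr(cash_flows)
print(rate)
```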

### Write a function `returns()` that accepts *NumPy arrays* of prices and dividends and returns a *NumPy array* of returns.

In [35]:
prices = np.array([100, 150, 100, 50, 100, 150, 100, 150])
dividends = np.array([1, 1, 1, 1, 2, 2, 2, 2])
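One possible solution sketch, using $r_t = \frac{p_t + d_t - p_{t-1}}{p_{t-1}}$. It assumes the two arrays are aligned so that `dividends[t]` is paid between `prices[t-1]` and `prices[t]`. The first price has no prior price, so the output has one fewer element than the inputs.

```python
import numpy as np

def returns(prices, dividends):
    # prices[1:] are today's prices and prices[:-1] are yesterday's prices
    return (prices[1:] + dividends[1:] - prices[:-1]) / prices[:-1]

prices = np.array([100, 150, 100, 50, 100, 150, 100, 150])
dividends = np.array([1, 1, 1, 1, 2, 2, 2, 2])

print(returns(prices, dividends))
```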

### Rewrite the function `returns()` so it returns *NumPy arrays* of returns, capital gains yields, and dividend yields.
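One possible solution sketch that returns a tuple of three arrays. The capital gains yield is $\frac{p_t - p_{t-1}}{p_{t-1}}$, the dividend yield is $\frac{d_t}{p_{t-1}}$, and the return is their sum. The tuple output and the names `cg` and `dy` are my choices for illustration.

```python
import numpy as np

def returns(prices, dividends):
    lagged = prices[:-1]                # yesterday's prices
    cg = (prices[1:] - lagged) / lagged # capital gains yields
    dy = dividends[1:] / lagged         # dividend yields
    return cg + dy, cg, dy

prices = np.array([100, 150, 100, 50, 100, 150, 100, 150])
dividends = np.array([1, 1, 1, 1, 2, 2, 2, 2])

r, cg, dy = returns(prices, dividends)
print(r)
print(cg)
print(dy)
```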

### Rescale and shift numbers so that they cover the range [0, 1]

Input: `np.array([18.5, 17.0, 18.0, 19.0, 18.0])` \
Output: `np.array([0.75, 0.0, 0.5, 1.0, 0.5])`

In [36]:
numbers = np.array([18.5, 17.0, 18.0, 19.0, 18.0])
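One possible solution sketch: subtract the minimum so the smallest value becomes 0, then divide by the range so the largest value becomes 1. The function name `rescale()` is hypothetical, and the sketch assumes the values are not all the same (otherwise the range is zero).

```python
import numpy as np

def rescale(x):
    # Shift so the minimum is 0, then scale so the maximum is 1
    return (x - x.min()) / (x.max() - x.min())

numbers = np.array([18.5, 17.0, 18.0, 19.0, 18.0])
print(rescale(numbers))
```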

### Write functions `var()` and `std()` that calculate variance and standard deviation.

NumPy's `.var()` and `.std()` methods return *population* statistics (i.e., denominators of $n$).
The pandas equivalents return *sample* statistics (denominators of $n-1$), which are more appropriate for financial data analysis where we have a sample instead of a population.


Both functions should have an argument `sample` that is `True` by default so both functions return sample statistics by default.

Use the output of `returns()` to compare your functions with NumPy's `.var()` and `.std()` methods.
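One possible solution sketch. The only difference between the sample and population versions is the denominator, $n - 1$ versus $n$. To keep the example self-contained, it computes the returns inline with the usual $r_t = \frac{p_t + d_t - p_{t-1}}{p_{t-1}}$ instead of calling `returns()`, and it compares against NumPy's `ddof=` argument, which sets the denominator to $n - \text{ddof}$.

```python
import numpy as np

def var(x, sample=True):
    x = np.asarray(x, dtype=float)
    denominator = x.size - 1 if sample else x.size
    return ((x - x.mean())**2).sum() / denominator

def std(x, sample=True):
    return np.sqrt(var(x, sample=sample))

prices = np.array([100, 150, 100, 50, 100, 150, 100, 150])
dividends = np.array([1, 1, 1, 1, 2, 2, 2, 2])
r = (prices[1:] + dividends[1:] - prices[:-1]) / prices[:-1]

print(np.isclose(var(r, sample=False), r.var()))  # population variance matches NumPy's default
print(np.isclose(var(r), r.var(ddof=1)))          # sample variance matches ddof=1
print(np.isclose(std(r, sample=False), r.std()))
print(np.isclose(std(r), r.std(ddof=1)))
```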