<p style="text-align: center; font-size: 300%"> Computational Finance </p>
<img src="img/ABSlogo.svg" alt="LOGO" style="display:block; margin-left: auto; margin-right: auto; width: 50%;">

# Dealing with Data
## More Datatypes
### NumPy Arrays
* The most fundamental data type in numerical Python is `ndarray`, provided by the NumPy package ([user guide](https://docs.scipy.org/doc/numpy/user/index.html)).
* An array is similar to a `list`, except that
  * it can have more than one dimension;
  * its elements are homogeneous (they all have the same type).
* NumPy provides a large number of functions (*ufuncs*) that operate elementwise on arrays. This allows *vectorized* code that avoids explicit loops (which are slow in Python).
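
* A minimal sketch of what vectorization means in practice: the same computation written once as an explicit loop and once with a ufunc. The variable names are illustrative only.

```python
import numpy as np

x = np.arange(1, 6, dtype='float64')  # array([1., 2., 3., 4., 5.])

# Loop version: apply the function to one element at a time
loop_result = np.empty_like(x)
for i in range(len(x)):
    loop_result[i] = np.exp(x[i])

# Vectorized version: the ufunc np.exp operates on the whole array at once
vec_result = np.exp(x)

same = np.allclose(loop_result, vec_result)  # both approaches agree
```

* Both versions give the same numbers; the vectorized one is shorter and pushes the loop into compiled code.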

#### Constructing Arrays
* Arrays can be constructed using the `array` function, which takes sequences (e.g., lists) and converts them into arrays. The data type is inferred automatically or can be specified.

In [1]:
import numpy as np
a=np.array([1, 2, 3, 4])
a.dtype

dtype('int64')

In [2]:
a=np.array([1, 2, 3, 4],dtype='float64') #or np.array([1., 2., 3., 4.])
a.dtype

dtype('float64')

* NumPy uses C data types, which differ from Python's built-in types (though `float64` is equivalent to Python's `float`).
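
* A short sketch of working with dtypes, assuming a standard NumPy installation: `astype` converts between types, and fixed-width types such as `int8` occupy a fixed number of bytes per element.

```python
import numpy as np

a = np.array([1, 2, 3, 4])               # integer dtype inferred from the data
b = a.astype('float64')                  # explicit conversion to another dtype

# A float64 element is also an instance of Python's built-in float
is_py_float = isinstance(b[0], float)

small = np.array([1, 2, 3], dtype='int8')  # fixed-width integer type
nbytes = small.itemsize                    # bytes used per element
```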

* Nested lists result in multidimensional arrays. We won't need anything beyond two-dimensional (i.e., a matrix or table).

In [3]:
a=np.array([[1., 2.], [3., 4.]]); a

array([[ 1.,  2.],
       [ 3.,  4.]])

In [4]:
a.ndim #Number of dimensions

2

In [5]:
a.shape #number of rows and columns

(2, 2)

* Other functions for creating arrays:

In [6]:
np.eye(3, dtype='float64') #identity matrix. float64 is the default dtype and can be omitted

array([[ 1.,  0.,  0.],
       [ 0.,  1.,  0.],
       [ 0.,  0.,  1.]])

In [7]:
np.ones([2,3]) #there's also np.zeros, and np.empty (which result in an uninitialized array)

array([[ 1.,  1.,  1.],
       [ 1.,  1.,  1.]])

In [8]:
np.arange(0,10,2) #like range, but creates an array instead of a list

array([0, 2, 4, 6, 8])

In [9]:
np.linspace(0,10,5) #5 equally spaced points between 0 and 10

array([  0. ,   2.5,   5. ,   7.5,  10. ])
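
* A brief sketch contrasting these constructors: `arange` takes a step size and excludes the end point, whereas `linspace` takes a number of points and includes it.

```python
import numpy as np

steps = np.arange(0, 1, 0.25)    # step size 0.25; the end point 1 is excluded
points = np.linspace(0, 1, 5)    # 5 points; the end point 1 is included
zeros = np.zeros((2, 3))         # 2x3 array of zeros; the shape is given as a tuple or list
```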

#### Indexing
* Indexing and slicing operations are similar to lists:

In [10]:
a=np.array([[1., 2.], [3., 4.]])
a[0,0] #indexing [row, column]. Equivalent to a[0][0]

1.0

In [11]:
b=a[:,0]; b #First column. Note that this yields a 1-dimensional array, not a matrix 

array([ 1.,  3.])

* Slicing returns *views* into the original array (unlike slicing lists):

In [12]:
b[0]=42

In [13]:
a

array([[ 42.,   2.],
       [  3.,   4.]])
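
* If an independent array is needed rather than a view, one option is an explicit `copy`. A minimal sketch; `np.shares_memory` is used here only to make the difference visible.

```python
import numpy as np

a = np.array([[1., 2.], [3., 4.]])

v = a[:, 0]          # slice: a view that shares memory with a
c = a[:, 0].copy()   # explicit copy: owns its own data

c[0] = 42            # modifies the copy only; a is unchanged

shares_view = np.shares_memory(a, v)
shares_copy = np.shares_memory(a, c)
```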

* Apart from indexing by row and column, arrays also support *Boolean* indexing:

In [14]:
a=np.arange(10); a

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [15]:
ind=a<5; ind

array([ True,  True,  True,  True,  True, False, False, False, False, False], dtype=bool)

In [16]:
a[ind]

array([0, 1, 2, 3, 4])
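
* Boolean masks can also be combined and used for assignment. A small sketch, with illustrative variable names:

```python
import numpy as np

a = np.arange(10)

# Combine conditions with & (and) or | (or); the parentheses are required
mask = (a > 2) & (a % 2 == 0)
selected = a[mask]       # Boolean indexing returns a copy, not a view

# Assignment through a Boolean mask modifies a in place
a[a < 5] = 0
```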

#### Arithmetic and ufuncs
* NumPy ufuncs are functions that operate elementwise:

In [17]:
a=np.arange(1,5); np.sqrt(a)

array([ 1.        ,  1.41421356,  1.73205081,  2.        ])

* Other useful ufuncs include `exp`, `log`, and `abs`.
* Basic arithmetic on arrays works elementwise:

In [18]:
a=np.arange(1,5); b=np.arange(5,9); a, b, a + b, a - b, a / b

(array([1, 2, 3, 4]),
 array([5, 6, 7, 8]),
 array([ 6,  8, 10, 12]),
 array([-4, -4, -4, -4]),
 array([0, 0, 0, 0]))
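
* The zeros shown for `a / b` above are what integer arrays give under Python 2. Assuming Python 3, `/` performs true division and `//` performs floor division, as this sketch illustrates:

```python
import numpy as np

a = np.arange(1, 5)
b = np.arange(5, 9)

true_div = a / b     # Python 3: elementwise true division, float result
floor_div = a // b   # elementwise floor division, integer result
prod = a * b         # elementwise product (not a matrix product)
```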

#### Broadcasting

* Operations between scalars and arrays are also supported:

In [19]:
np.array([1,2,3,4])+2

array([3, 4, 5, 6])

* This is a special case of a more general concept known as *broadcasting*, which allows operations between arrays of different shapes.
* NumPy compares the shapes of the two arrays dimension by dimension, starting with the trailing dimensions and working its way forward. Two dimensions are compatible if
  * they are equal, or
  * one of them is 1 (or not present).
* In the latter case, the singleton dimension is "stretched" to match the larger array.

* Example:

In [20]:
x=np.arange(6).reshape((2,3)); x #x has shape (2,3)

array([[0, 1, 2],
       [3, 4, 5]])

In [21]:
m=np.mean(x,axis=0); m #m has shape (3,)

array([ 1.5,  2.5,  3.5])

In [22]:
x-m #the trailing dimension matches, and m is `stretched` to match the 2 rows of x

array([[-1.5, -1.5, -1.5],
       [ 1.5,  1.5,  1.5]])
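
* Subtracting *row* means instead requires an extra step, because a shape of `(2,)` is not compatible with the trailing dimension 3. One possible approach, sketched here, is to add a singleton dimension with `np.newaxis`:

```python
import numpy as np

x = np.arange(6).reshape((2, 3))   # shape (2, 3)
row_means = x.mean(axis=1)         # shape (2,)

# x - row_means would raise a ValueError: trailing dimensions 3 and 2 differ.
# Reshaping to (2, 1) makes the shapes compatible; the singleton
# dimension is then stretched across the 3 columns.
centered = x - row_means[:, np.newaxis]
```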

#### Array Reductions
* *Array reductions* are operations on arrays that return scalars or lower-dimensional arrays, such as the `mean` function used above.
* They can be used to summarize information about an array, e.g., compute the standard deviation:

In [23]:
a=np.random.randn(300,3) #create a 300x3 matrix of standard normal variates
a.std(axis=0) #or np.std(a, axis=0)

array([ 0.97251912,  0.98773563,  0.95367387])

* By default, reductions work on a flattened version of the array. To reduce along rows or columns instead, pass the `axis` argument (`axis=0` reduces over the rows, giving one result per column).
* Other useful reductions are `sum`, `median`, `min`, `max`, `argmin`, `argmax`, `any`, and `all` (see help).
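
* A small sketch of how the `axis` argument changes the result of a reduction, using a 2x3 array so the shapes are easy to follow:

```python
import numpy as np

x = np.arange(6).reshape((2, 3))   # [[0, 1, 2], [3, 4, 5]]

total = x.sum()                 # no axis: reduce over the flattened array
col_sums = x.sum(axis=0)        # reduce over rows: one value per column
row_sums = x.sum(axis=1)        # reduce over columns: one value per row
row_argmax = x.argmax(axis=1)   # column index of the largest entry in each row

any_big = (x > 4).any()         # is at least one element greater than 4?
all_pos = (x > 0).all()         # are all elements positive? (x[0, 0] is 0)
```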

#### Saving Arrays to Disk

* There are several ways to save an array to disk:

In [48]:
np.save('myfile.npy', a) #save a as a binary .npy file

In [49]:
import os
print(os.listdir('.'))

['README.md', 'week2.ipynb', 'myfile.npy', 'img', 'week1.ipynb', '.ipynb_checkpoints']


In [50]:
b=np.load('myfile.npy') #load the data into variable b
os.remove('myfile.npy') #clean up

In [54]:
np.savetxt('myfile.csv', a, delimiter=',') #save as ASCII file

In [56]:
b=np.loadtxt('myfile.csv', delimiter=',') #load data into b
os.remove('myfile.csv')
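
* To check that a round trip preserves the data, the loaded array can be compared with the original. A minimal sketch; the file names are hypothetical and a temporary directory is used so the working directory stays clean.

```python
import os
import tempfile
import numpy as np

a = np.arange(6, dtype='float64').reshape((2, 3))

tmpdir = tempfile.mkdtemp()                      # scratch directory
npy_path = os.path.join(tmpdir, 'example.npy')   # hypothetical file names
csv_path = os.path.join(tmpdir, 'example.csv')

np.save(npy_path, a)                             # binary format
b = np.load(npy_path)

np.savetxt(csv_path, a, delimiter=',')           # text format
c = np.loadtxt(csv_path, delimiter=',')

binary_ok = np.array_equal(a, b)                 # binary round trip is exact
text_ok = np.allclose(a, c)                      # compare text round trip numerically

# clean up
os.remove(npy_path)
os.remove(csv_path)
os.rmdir(tmpdir)
```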

### Pandas Dataframes
#### Series

#### Dataframes

In [29]:
import pandas as pd

#### Fetching Data
* `pandas_datareader` is a package that makes it easy to fetch financial data from the web ([user manual](http://pandas-datareader.readthedocs.io/en/latest/remote_data.html)).
* It used to be included in pandas (and therefore Anaconda). In newer versions, you'll have to run `conda install pandas-datareader` to install it.

In [30]:
#!conda install -y pandas-datareader #uncomment to install. (Note ! executes shell commands)
import pandas_datareader.data as web

In [31]:
start = pd.Timestamp(2010, 1, 1) #pd.datetime is deprecated in newer pandas versions
end = pd.Timestamp(2013, 1, 27)
f = web.DataReader("AAPL", 'yahoo', start, end)
f.index

DatetimeIndex(['2009-12-31', '2010-01-04', '2010-01-05', '2010-01-06',
               '2010-01-07', '2010-01-08', '2010-01-11', '2010-01-12',
               '2010-01-13', '2010-01-14',
               ...
               '2013-01-11', '2013-01-14', '2013-01-15', '2013-01-16',
               '2013-01-17', '2013-01-18', '2013-01-22', '2013-01-23',
               '2013-01-24', '2013-01-25'],
              dtype='datetime64[ns]', name=u'Date', length=772, freq=None)

## Regression Analysis

## Plotting with `matplotlib`