### Creating Arrays

While Python lists can hold values of different data types, a NumPy array holds values of a single data type.

The easiest way to create an array is to use the `np.array()` function:

In [1]:
import numpy as np

data1=[6,7.5,8,0,1]

arr1=np.array(data1)

arr1

array([ 6. ,  7.5,  8. ,  0. ,  1. ])
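
The single-dtype rule explains why the mixed list above came out as all floats. A minimal sketch of that upcasting (the variable names are illustrative only):

```python
import numpy as np

# A list mixing ints and a float: NumPy upcasts every element to float64
mixed = np.array([1, 2.5, 3])

# A list of ints only: the result keeps an integer dtype
ints_only = np.array([1, 2, 3])

print(mixed.dtype)      # float64
print(ints_only.dtype)  # a platform-dependent integer type (int32 or int64)
```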

A nested list of equal-length lists is converted into a multidimensional array (`ndarray`):

In [2]:
data2=[[1,2,3,4],[5,6,7,8]]
arr2=np.array(data2)

# check the number of dimensions using the .ndim attribute
print(arr2.ndim)

# check the shape of the array using the .shape attribute
print(arr2.shape)

# check the data type of the array using the .dtype attribute
print(arr2.dtype)

2
(2L, 4L)
int32


Create arrays of zeros with `np.zeros` (`np.ones` works the same way for arrays of 1's)

In [3]:
np.zeros(10)

array([ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.])

In [4]:
np.zeros((4,10))

array([[ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.]])
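
`np.ones` takes the same shape argument as `np.zeros`. A short sketch, with an optional `dtype` added for illustration:

```python
import numpy as np

ones_1d = np.ones(3)                        # 1D array of three 1.0 values
ones_2d = np.ones((2, 3))                   # 2 rows x 3 columns
ones_int = np.ones((2, 3), dtype=np.int64)  # same shape, integer dtype

print(ones_2d)
```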

`np.empty` creates an array without initializing its values, so the contents are whatever happened to be in memory (not necessarily zeros). To create a multidimensional array, pass a tuple for the shape (rows, columns):

In [5]:
np.empty((2,3))

array([[ 0.,  0.,  0.],
       [ 0.,  0.,  0.]])

In [6]:
# Create a 3D array made of two 3x2 blocks

np.empty((2,3,2))

array([[[  0.00000000e+000,   6.36598737e-314],
        [  0.00000000e+000,   1.27319747e-313],
        [  1.27319747e-313,   1.27319747e-313]],

       [[  1.27319747e-313,   1.27319747e-313],
        [  0.00000000e+000,   4.44659081e-323],
        [  2.54639495e-313,   6.42285340e-323]]])

**`np.arange()`** is the array-valued version of Python's built-in **`range()`**, which creates arithmetic progressions. For `range()` the arguments must be plain integers. If the step argument is omitted, it defaults to 1. If the start argument is omitted, it defaults to 0. The full form yields the values [start, start + step, start + 2 * step, ...]. If step is positive, the last element is the largest start + i * step less than stop.

In [7]:
# an array-valued version of the built-in Python range function
np.arange(15)

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14])
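
The start, stop, and step arguments described above can be sketched as follows (the values are chosen only for illustration):

```python
import numpy as np

evens = np.arange(2, 10, 2)      # start=2, stop=10 (excluded), step=2
countdown = np.arange(5, 0, -1)  # a negative step counts down

print(evens)      # [2 4 6 8]
print(countdown)  # [5 4 3 2 1]
```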

Create an **identity matrix** (a square matrix with 1's on the diagonal and 0's elsewhere):

In [8]:
identity_mx = np.eye((4))
identity_mx

array([[ 1.,  0.,  0.,  0.],
       [ 0.,  1.,  0.,  0.],
       [ 0.,  0.,  1.,  0.],
       [ 0.,  0.,  0.,  1.]])

### Assigning Data Types for ndarrays

In [9]:
arr1=np.array([1,2,3], dtype=np.float64)
print(arr1.dtype)
arr2=np.array([1,2,3], dtype=np.float32)
print(arr2.dtype)

float64
float32


`astype()`: convert an existing array to a different data type. It returns a new array and leaves the original unchanged.

In [10]:
arr=np.array([1,2,3,4,5])
arr.dtype

float_arr=arr.astype(np.float64)

float_arr.dtype

dtype('float64')

In [11]:
arr = np.array([3.7, -1.2, -2.6, 0.5, 12.9, 10.1])
print(arr)

print(arr.astype(np.int32))

[  3.7  -1.2  -2.6   0.5  12.9  10.1]
[ 3 -1 -2  0 12 10]


Casting floating-point numbers to integers truncates the decimal part (the value is rounded toward zero). We can also convert strings to numbers using `astype()`:

In [12]:
arr = np.array([3.8, -1.2,4.3,8.8,-34.5])
print(arr)

print(arr.astype(int))

# Converting string to numbers

numeric_strings = np.array(['1.24','4.34','8.97'], dtype= np.string_)
print(numeric_strings.dtype)

numeric_strings=numeric_strings.astype(float)
print("New Data Type: {}".format(numeric_strings.dtype))


[  3.8  -1.2   4.3   8.8 -34.5]
[  3  -1   4   8 -34]
|S4
New Data Type: float64
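
Because `astype` truncates, it is not the same as rounding. A small sketch contrasting the two, using `np.rint` to round before the cast (the sample values are illustrative):

```python
import numpy as np

values = np.array([3.7, -1.2, -2.6, 12.9])

truncated = values.astype(np.int64)         # drops the fractional part
rounded = np.rint(values).astype(np.int64)  # rounds to the nearest integer first

print(truncated)  # [ 3 -1 -2 12]
print(rounded)    # [ 4 -1 -3 13]
```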


In [13]:
# Use another array's dtype to assign dtype

int_array = np.arange(10)
print(int_array)

calibers = np.array([.22,.385,.85,.999])
print(calibers)

print(int_array.astype(calibers.dtype))

[0 1 2 3 4 5 6 7 8 9]
[ 0.22   0.385  0.85   0.999]
[ 0.  1.  2.  3.  4.  5.  6.  7.  8.  9.]


### Operations between Arrays & Scalars

Arithmetic operations between equal-size arrays, or between an array and a scalar, are applied element by element:

In [14]:
arr = np.array([[1.,2.,3.],[4.,5.,6.]])
print arr
print''

# Operations
print arr*arr
print arr-arr
print arr + arr
print 1/arr
print arr **0.5 # Exponentiation

[[ 1.  2.  3.]
 [ 4.  5.  6.]]

[[  1.   4.   9.]
 [ 16.  25.  36.]]
[[ 0.  0.  0.]
 [ 0.  0.  0.]]
[[  2.   4.   6.]
 [  8.  10.  12.]]
[[ 1.          0.5         0.33333333]
 [ 0.25        0.2         0.16666667]]
[[ 1.          1.41421356  1.73205081]
 [ 2.          2.23606798  2.44948974]]
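
The cell above mostly combines an array with itself. A brief sketch of the array-with-scalar case, where the scalar is applied to every element:

```python
import numpy as np

arr = np.array([[1., 2., 3.], [4., 5., 6.]])

doubled = arr * 2    # every element multiplied by 2
shifted = arr + 10   # 10 added to every element
is_large = arr > 3   # comparisons with a scalar are element-wise too

print(doubled)
print(is_large)
```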


### Basic Indexing

We can access individual elements using `array[index]`.

A **slice** is a contiguous part of an array.

A **scalar** is a single value.

**Propagated** (or broadcast) means the scalar is assigned to every element of the selection.

In [15]:
arr = np.arange(10)
print(arr[5])
print(arr[5:10]) # the last number is not considered part of the output

# Change the values directly by assigning a scalar value to a slice.
# The scalar is propagated to the entire selection.
arr[5:11] = 0
print(arr)

5
[5 6 7 8 9]
[0 1 2 3 4 0 0 0 0 0]


An important difference from lists is that array slices are *views* on the original array, meaning that any modifications to the view will be reflected in the original array.

In [16]:
arr_slice = arr[5:11]

arr_slice[1] = 0

print(arr_slice)

arr_slice[:] = 100000000 # A colon by itself means to take the entire axis

print(arr_slice)

[0 0 0 0 0]
[100000000 100000000 100000000 100000000 100000000]
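
To see the view behaviour on the original array itself, here is a minimal sketch that also shows `.copy()` as the way to get an independent array (names are illustrative):

```python
import numpy as np

arr = np.arange(10)

view = arr[5:8]   # a slice is a view: it shares memory with arr
view[:] = 99      # writing through the view changes arr as well

independent = arr[0:3].copy()  # .copy() gives an independent array
independent[:] = -1            # arr is not affected by this assignment

print(arr)  # [ 0  1  2  3  4 99 99 99  8  9]
```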


With 2D arrays, the element at each index is a 1D array rather than a scalar:

In [17]:
arr2d = np.array([[1,2,3],[4,5,6],[7,8,9]])

arr2d[2]

array([7, 8, 9])

To access individual elements of a multidimensional array, pass a comma-separated list of indices. In a 3D array, if we omit the later indices, the returned object is the lower-dimensional ndarray consisting of all the data along the remaining dimensions:

In [18]:
# 3D Array 2 x 2 x 3
arr3d = np.array([[[1,2,3],[4,5,6]],[[7,8,9],[10,11,12]]])
print arr3d

[[[ 1  2  3]
  [ 4  5  6]]

 [[ 7  8  9]
  [10 11 12]]]


In [19]:
print(arr3d[0]) # First 2 x 3 block (index 0 along axis 0)
print''
print(arr3d[0,0]) # Row 0 of the first block
print''
print(arr3d[0,1]) # Row 1 of the first block

[[1 2 3]
 [4 5 6]]

[1 2 3]

[4 5 6]


In [20]:
old_values = arr3d[0].copy()

arr3d[0] = 42

print arr3d
print''
arr3d[0] = old_values

print arr3d

[[[42 42 42]
  [42 42 42]]

 [[ 7  8  9]
  [10 11 12]]]

[[[ 1  2  3]
  [ 4  5  6]]

 [[ 7  8  9]
  [10 11 12]]]


#### Indexing with slices

In [21]:
arr[1:6]

array([        1,         2,         3,         4, 100000000])

In [22]:
arr2d[:2]

array([[1, 2, 3],
       [4, 5, 6]])

In [23]:
arr2d[:2,1:]  # Select rows 0 and 1, columns 1 and 2

array([[2, 3],
       [5, 6]])

In [24]:
# Mixing integer indexes and slices gets you lower dimensional slices

arr2d[1,:2] # Row 1, columns 0 and 1 (the stop index 2 is excluded)

array([4, 5])

In [25]:
arr2d[2,:1] # Row axis 2,Column axis 0

array([7])

### Boolean Indexing

To generate some normally distributed random data, use the `randn` function from `numpy.random`. Selecting data from an array by boolean indexing always creates a copy of the data.

In [26]:
names = np.array(['Bob','Joe','Will','Bob','Will','Joe','Joe'])

data = np.random.randn(7,4)

print(names)
print''
print(data)

['Bob' 'Joe' 'Will' 'Bob' 'Will' 'Joe' 'Joe']

[[ 0.22686939 -1.03787842  1.04256982  0.81685705]
 [-0.62288664 -0.05943373 -2.34261394 -0.03440283]
 [ 1.6598063   0.0840991   0.25405968  2.49153799]
 [-0.885544    0.93793348  1.37813532  0.36885533]
 [-0.35930713  0.23494922  2.28398274  1.66976128]
 [ 1.31681794  1.0412515  -0.57852522  1.71161624]
 [-0.19036411 -0.00515259 -1.46820832 -0.16882016]]


Comparisons with arrays are vectorized just like arithmetic operations, and they yield boolean (True/False) arrays.

In [27]:
names == 'Bob'

array([ True, False, False,  True, False, False, False], dtype=bool)

This boolean array can be passed as an index to the array, where it selects the rows at the positions that are True. The boolean array must be the same length as the axis it is indexing. You can mix and match boolean arrays with slices or integers.

In [28]:
data[names =='Bob'] # There are 2 True's

array([[ 0.22686939, -1.03787842,  1.04256982,  0.81685705],
       [-0.885544  ,  0.93793348,  1.37813532,  0.36885533]])

In [29]:
data[names=='Bob', 2:] # mix and match boolean arrays with slices, integers

array([[ 1.04256982,  0.81685705],
       [ 1.37813532,  0.36885533]])

In [30]:
data[names=='Bob',3]

array([ 0.81685705,  0.36885533])

To select everything but 'Bob', we can use != or negate the condition using ~:

In [31]:
names != 'Bob'

array([False,  True,  True, False,  True,  True,  True], dtype=bool)

In [32]:
data[~(names=='Bob')]

array([[-0.62288664, -0.05943373, -2.34261394, -0.03440283],
       [ 1.6598063 ,  0.0840991 ,  0.25405968,  2.49153799],
       [-0.35930713,  0.23494922,  2.28398274,  1.66976128],
       [ 1.31681794,  1.0412515 , -0.57852522,  1.71161624],
       [-0.19036411, -0.00515259, -1.46820832, -0.16882016]])

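We can combine multiple boolean conditions using `&` for AND and `|` for OR. The Python keywords `and` and `or` do not work with boolean arrays, and each condition needs its own parentheses.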

In [33]:
mask = (names=='Bob') | (names=='Will')
print(mask)
print(names)

data[mask]

[ True False  True  True  True False False]
['Bob' 'Joe' 'Will' 'Bob' 'Will' 'Joe' 'Joe']


array([[ 0.22686939, -1.03787842,  1.04256982,  0.81685705],
       [ 1.6598063 ,  0.0840991 ,  0.25405968,  2.49153799],
       [-0.885544  ,  0.93793348,  1.37813532,  0.36885533],
       [-0.35930713,  0.23494922,  2.28398274,  1.66976128]])
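
The copy behaviour of boolean selection can be checked directly. A small sketch with made-up data (the arrays here are illustrative, not the random data above):

```python
import numpy as np

names = np.array(['Bob', 'Joe', 'Will', 'Bob'])
data = np.arange(8).reshape(4, 2)   # one row of data per name

selected = data[names == 'Bob']     # boolean indexing returns a copy
selected[:] = -1                    # modifying the copy leaves data untouched

# Combine conditions with & and |, wrapping each condition in parentheses
not_joe_or_will = (names != 'Joe') & (names != 'Will')

print(data)
print(not_joe_or_will)  # [ True False False  True]
```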

In [34]:
data[data < 0] = 0
data[data > 0] = 1
data

array([[ 1.,  0.,  1.,  1.],
       [ 0.,  0.,  0.,  0.],
       [ 1.,  1.,  1.,  1.],
       [ 0.,  1.,  1.,  1.],
       [ 0.,  1.,  1.,  1.],
       [ 1.,  1.,  0.,  1.],
       [ 0.,  0.,  0.,  0.]])

In [35]:
data[names != 'Joe'] = 77
data

array([[ 77.,  77.,  77.,  77.],
       [  0.,   0.,   0.,   0.],
       [ 77.,  77.,  77.,  77.],
       [ 77.,  77.,  77.,  77.],
       [ 77.,  77.,  77.,  77.],
       [  1.,   1.,   0.,   1.],
       [  0.,   0.,   0.,   0.]])

### Fancy Indexing

Fancy indexing means indexing with integer arrays or lists. Passing a list of row indices selects those rows in the given order:

In [36]:
arr = np.empty((8,4))

for i in range(8):
    arr[i] = i
    
print(arr)
print''
arr[[4,3,0,6]]

[[ 0.  0.  0.  0.]
 [ 1.  1.  1.  1.]
 [ 2.  2.  2.  2.]
 [ 3.  3.  3.  3.]
 [ 4.  4.  4.  4.]
 [ 5.  5.  5.  5.]
 [ 6.  6.  6.  6.]
 [ 7.  7.  7.  7.]]



array([[ 4.,  4.,  4.,  4.],
       [ 3.,  3.,  3.,  3.],
       [ 0.,  0.,  0.,  0.],
       [ 6.,  6.,  6.,  6.]])

Negative indices select rows counting backwards from the end:

In [37]:
arr[[-3,-5,-7]]

array([[ 5.,  5.,  5.,  5.],
       [ 3.,  3.,  3.,  3.],
       [ 1.,  1.,  1.,  1.]])

We can reshape arrays using the `reshape` method:

In [38]:
arr = np.arange(32).reshape((8,4))
arr

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15],
       [16, 17, 18, 19],
       [20, 21, 22, 23],
       [24, 25, 26, 27],
       [28, 29, 30, 31]])

In [39]:
arr[[1,5,7,2],[0,3,1,2]]

# Select (1,0) ...(5,3)...(7,1)...(2,2)

array([ 4, 23, 29, 10])
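
Passing two index lists pairs them up element by element, as above. To select the rectangular block formed by a set of rows and a set of columns instead, one option is `np.ix_`. A minimal sketch:

```python
import numpy as np

arr = np.arange(32).reshape((8, 4))

# Pairs up the indices: elements (1,0) and (5,3)
pairs = arr[[1, 5], [0, 3]]

# Selects the block formed by rows 1 and 5 and columns 0 and 3
block = arr[np.ix_([1, 5], [0, 3])]

print(pairs)  # [ 4 23]
print(block)  # [[ 4  7]
              #  [20 23]]
```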

#### Transposing Arrays and Swapping Axes

Transposing is a special form of reshaping in which rows become columns and vice versa:

In [40]:
arr = np.arange(15).reshape((3,5))

arr

array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14]])

In [41]:
arr.T

array([[ 0,  5, 10],
       [ 1,  6, 11],
       [ 2,  7, 12],
       [ 3,  8, 13],
       [ 4,  9, 14]])

To compute the **inner matrix** product, which is the transpose of a matrix times the matrix itself, use `np.dot(arr.T, arr)`:

In [42]:
arr = np.random.randn(6,3)
print(arr)

np.dot(arr.T, arr)

[[ 0.2166502   0.22676602 -1.13728509]
 [-0.22599891  0.04464421 -0.0314991 ]
 [ 0.72588707  0.86527759 -1.28447265]
 [-0.1523244   1.84638518 -0.76626155]
 [ 0.48044259  0.07202031  1.13593044]
 [ 0.87561737  0.11036596  0.06880243]]


array([[ 1.64565845,  0.51712363, -0.44894207],
       [ 0.51712363,  4.22862703, -2.69613974],
       [-0.44894207, -2.69613974,  4.82650807]])

In [43]:
arr = np.arange(6).reshape(2,3)
print arr
arr1= np.arange(12).reshape(3,4)
print''
print arr1
print''
# Multiplication 2x3 * 3x4 = 2x4
np.dot(arr,arr1)

[[0 1 2]
 [3 4 5]]

[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]]



array([[20, 23, 26, 29],
       [56, 68, 80, 92]])

For higher-dimensional arrays, `transpose` accepts a tuple of axis numbers that says how to permute the axes:

In [44]:
arr= np.arange(16).reshape(2,2,4)
arr

array([[[ 0,  1,  2,  3],
        [ 4,  5,  6,  7]],

       [[ 8,  9, 10, 11],
        [12, 13, 14, 15]]])

In [45]:
arr.transpose((1,0,2))

# The new axis order is (old axis 1, old axis 0, old axis 2): the first two axes are swapped and the last axis stays in place

array([[[ 0,  1,  2,  3],
        [ 8,  9, 10, 11]],

       [[ 4,  5,  6,  7],
        [12, 13, 14, 15]]])

ndarray has the method `swapaxes` which takes a pair of axis numbers

In [46]:

arr.swapaxes(1,2)

# Means axis 1 becomes axis 2 and axis 2 becomes axis 1

array([[[ 0,  4],
        [ 1,  5],
        [ 2,  6],
        [ 3,  7]],

       [[ 8, 12],
        [ 9, 13],
        [10, 14],
        [11, 15]]])
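
The effect of `transpose` and `swapaxes` is easier to follow when every axis has a different length. A sketch using an illustrative 2 x 3 x 4 array:

```python
import numpy as np

arr = np.arange(24).reshape(2, 3, 4)

t = arr.transpose((1, 0, 2))  # new axis order: old axis 1, old axis 0, old axis 2
s = arr.swapaxes(1, 2)        # swap only axes 1 and 2

print(t.shape)  # (3, 2, 4)
print(s.shape)  # (2, 4, 3)

# Element (i, j, k) of arr ends up at (j, i, k) in t and at (i, k, j) in s
print(arr[1, 2, 3], t[2, 1, 3], s[1, 3, 2])  # 23 23 23
```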

### Universal Functions: Elementwise array functions

Also known as ufuncs, these functions perform element-wise operations on the data in arrays.

In [47]:
arr = np.arange(10)

#Square Root
print(np.sqrt(arr))
print''
# Compute the exponent e^x of each element
print(np.exp(arr))
print ''
# Round elements to the nearest integer while preserving the dtype
np.rint(np.random.randn(8))
print''
# Return a boolean array indicating which elements are NaN
np.isnan(arr)


[ 0.          1.          1.41421356  1.73205081  2.          2.23606798
  2.44948974  2.64575131  2.82842712  3.        ]

[  1.00000000e+00   2.71828183e+00   7.38905610e+00   2.00855369e+01
   5.45981500e+01   1.48413159e+02   4.03428793e+02   1.09663316e+03
   2.98095799e+03   8.10308393e+03]




array([False, False, False, False, False, False, False, False, False, False], dtype=bool)

In [48]:
x = np.random.randn(8)
y=np.random.randn(8)
print x, y

# Binary ufuncs take 2 arrays and return 1 array; np.maximum returns the element-wise maximum

print(np.maximum(x,y))

[-0.01813621 -0.97034562 -0.75139559 -0.66368541 -1.62470233  0.04586162
 -1.91296103  0.77605912] [ 1.36162833  1.62906031  1.74887393  0.49287437  0.64549954  1.39076663
  1.41258785  0.43283502]
[ 1.36162833  1.62906031  1.74887393  0.49287437  0.64549954  1.39076663
  1.41258785  0.77605912]


In [49]:
np.add(x,y)

array([ 1.34349212,  0.65871469,  0.99747835, -0.17081105, -0.97920279,
        1.43662826, -0.50037317,  1.20889414])

In [50]:
np.subtract(x,y)

array([-1.37976455, -2.59940593, -2.50026952, -1.15655978, -2.27020188,
       -1.34490501, -3.32554888,  0.3432241 ])

In [51]:
np.multiply(x,y)

array([-0.02469478, -1.58075153, -1.31409616, -0.32711353, -1.04874461,
        0.06378282, -2.70222551,  0.33590556])

In [52]:
x = np.array([1,2,3,4])
y = np.array([2,3,4,5])

#Raises elements in first array to powers indicated in 2nd array

np.power(x,y)

array([   1,    8,   81, 1024])

In [53]:
# Perform element-wise comparison yielding a boolean array

print(np.greater(x,y))
print(np.less(x,y))
print(np.less_equal(x,y))
print(np.greater_equal(x,y))
print(np.not_equal(x,y))

[False False False False]
[ True  True  True  True]
[ True  True  True  True]
[False False False False]
[ True  True  True  True]


The `np.meshgrid` function takes two 1D arrays and outputs two 2D matrices corresponding to all pairs of (x,y) in the two arrays, so we can evaluate a function of two variables over a whole grid of points:

In [54]:
points=np.arange(-5,5,0.01)

xs, ys = np.meshgrid(points, points)
print ys

[[-5.   -5.   -5.   ..., -5.   -5.   -5.  ]
 [-4.99 -4.99 -4.99 ..., -4.99 -4.99 -4.99]
 [-4.98 -4.98 -4.98 ..., -4.98 -4.98 -4.98]
 ..., 
 [ 4.97  4.97  4.97 ...,  4.97  4.97  4.97]
 [ 4.98  4.98  4.98 ...,  4.98  4.98  4.98]
 [ 4.99  4.99  4.99 ...,  4.99  4.99  4.99]]
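
With 1000 points per axis the grids are hard to inspect. A tiny sketch with illustrative inputs shows the layout of the two outputs:

```python
import numpy as np

x = np.array([0, 1, 2])
y = np.array([10, 20])

xs, ys = np.meshgrid(x, y)  # both results have shape (len(y), len(x))
total = xs + ys             # evaluate f(x, y) = x + y at every grid point

print(xs)     # [[0 1 2]
              #  [0 1 2]]
print(ys)     # [[10 10 10]
              #  [20 20 20]]
print(total)  # [[10 11 12]
              #  [20 21 22]]
```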


In [55]:
import matplotlib.pyplot as plt

z = np.sqrt(xs ** 2 + ys ** 2)  # use the grid ys here, not the 4-element y from In [52]
z

The `numpy.where` function is a vectorized version of the expression `x if condition else y`. 

In [None]:
xarr = np.array([1.1,1.2,1.3,1.4,1.5,])
yarr = xarr + 1

print xarr, yarr

cond = np.array([True, False, True, True, False])

result = np.where(cond, xarr, yarr) #np.where(condition,value if True, value if False)
print result

In [None]:
# Task: Replace all positive values with 2 and all negative values with -2

arr = np.random.randn(4,4)
print arr

print(np.where(arr > 0, 2, -2))
print(np.where(arr > 0, "positive", "negative"))

# Set only positive values to 2. If negative, do nothing

print(np.where(arr > 0, 2, arr))

Nested if statements can be expressed as nested `np.where` calls:

In [None]:
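cond1 = np.array([True, True, False, False])  # illustrative conditions,
cond2 = np.array([True, False, True, False])  # not defined in earlier cells
np.where(cond1 & cond2, 0, 
         np.where(cond1,1, 
                  np.where(cond2, 2, 3)))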
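
To check that the nested `np.where` expression matches ordinary if/elif logic, here is a sketch that compares it with an explicit loop. The two condition arrays are assumed values, picked so that every branch is hit once:

```python
import numpy as np

cond1 = np.array([True, True, False, False])
cond2 = np.array([True, False, True, False])

vectorized = np.where(cond1 & cond2, 0,
                      np.where(cond1, 1,
                               np.where(cond2, 2, 3)))

# The same logic written as an explicit Python loop
looped = []
for c1, c2 in zip(cond1, cond2):
    if c1 and c2:
        looped.append(0)
    elif c1:
        looped.append(1)
    elif c2:
        looped.append(2)
    else:
        looped.append(3)

print(vectorized)  # [0 1 2 3]
```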

### Math and Stat Methods

In [None]:
arr = np.arange(20).reshape(5,4)
print arr
print''
print(arr.mean())
print(np.mean(arr))
print''

print arr.sum()
print(np.sum(arr))
print''

# The optional axis argument computes the statistic over the given axis
print arr.mean(axis=0) # Mean of each column (computed down the rows)
print arr.mean(axis=1) # Mean of each row (computed across the columns)
print arr.sum(0) # Sum of each column
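
The axis argument is easy to mix up, so here is a small worked sketch on an illustrative 2 x 3 array where the results can be checked by hand:

```python
import numpy as np

arr = np.arange(6).reshape(2, 3)  # [[0 1 2]
                                  #  [3 4 5]]

col_means = arr.mean(axis=0)  # collapses the rows: one value per column
row_means = arr.mean(axis=1)  # collapses the columns: one value per row
col_sums = arr.sum(axis=0)

print(col_means)  # [1.5 2.5 3.5]
print(row_means)  # [1. 4.]
print(col_sums)   # [3 5 7]
```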

In [None]:
# cumsum and cumprod do not aggregate. They produce an array of intermediate, cumulative results.

arr = np.arange(9).reshape(3,3)
print(arr)

arr.cumsum(0) # Row 0: 0, 1, 2; row 1: 0+3, 1+4, 2+5; row 2: 0+3+6, 1+4+7, 2+5+8

In [None]:
arr.cumprod(1)
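
Since these cells have no recorded output, here is a self-contained sketch of the same two calls with the results worked out by hand:

```python
import numpy as np

arr = np.arange(9).reshape(3, 3)

down_columns = arr.cumsum(0)  # running sum down each column
along_rows = arr.cumprod(1)   # running product along each row

print(down_columns)
# [[ 0  1  2]
#  [ 3  5  7]
#  [ 9 12 15]]
print(along_rows)
# [[  0   0   0]
#  [  3  12  60]
#  [  6  42 336]]
```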

In [None]:
print(arr.std()) # Standard deviation; the parentheses are needed, since arr.std alone is just the method object
print(arr.var())
print(arr.min())
print(arr.max())
print(arr.argmin()) # Return the index of min element
print(arr.argmax()) # Return the index of max element

### Methods for Boolean Arrays (T/F's)

`sum` is often used as a means of counting `True` values in a boolean array. 


In [None]:
arr = np.random.randn(100)
print arr

# How many values are positive?

print((arr > 0).sum())

The `any` method tests whether one or more values in an array are True.

The `all` method tests whether every value is True.

In [None]:
bools=np.array([False, False, True, False])

# Are there any True's in the array?
print(bools.any())

# Are all the values True in the array?
print(bools.all())

### Sorting 

Arrays can be sorted in place using the `sort()` method.

In [None]:
arr = np.random.randn(8)
print(arr)

arr.sort()
print(arr)

Multidimensional arrays can have each 1D section of values sorted in place along an axis by passing the axis number to `sort`:

In [None]:
arr = np.random.randn(5,3)
print(arr)
print''

# To sort the values within each row, use axis = 1

arr.sort(1)
print(arr)
print''
# To sort the values within each column, use axis = 0
arr.sort(0)
print(arr)

In [None]:
# To get a sorted copy and leave the original array unchanged, use np.sort()
arr = np.random.randn(5,3)
print(arr)
print''
print(np.sort(arr))
print''
print(arr)

A quick way to compute a quantile of an array is to

1) sort the array

2) select the value at the index given by the quantile fraction times the length of the array (for example, 0.05 times the length for the 5% quantile)

In [None]:
large_arr = np.random.randn(1000)

large_arr.sort()

large_arr[int(0.05*len(large_arr))] #5% quantile
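
As a cross-check of the two steps above, the sketch below compares the sort-and-index value with `np.percentile`, which interpolates between neighbouring sorted values, so the two results are close but not necessarily identical. The seed is an arbitrary choice to make the run repeatable:

```python
import numpy as np

np.random.seed(0)                # fixed seed so the run is repeatable
large_arr = np.random.randn(1000)

sorted_arr = np.sort(large_arr)  # sorted copy; large_arr is unchanged
by_index = sorted_arr[int(0.05 * len(sorted_arr))]

# np.percentile interpolates between neighbouring sorted values
by_percentile = np.percentile(large_arr, 5)

print(by_index, by_percentile)
```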

#### Unique and Other Set Logic

In [None]:
# Return a sorted array of the unique values in an array

names = np.array(['Bob','Joe','Will','Bob','Will','Joe','Joe'])
print(np.unique(names))

ints = np.array([3,3,3,2,2,2,1,1,5,4])
np.unique(ints)

In [None]:
# Test the membership of each value of one array in another. Returns True where the value is found, False where it is not

values = np.array([6,0,0,3,2,5,6])
print(np.in1d(values, [2,3,5])) 
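
NumPy has a few other set functions for 1D arrays in the same family. A short sketch with illustrative inputs:

```python
import numpy as np

a = np.array([1, 2, 3, 4])
b = np.array([3, 4, 5])

common = np.intersect1d(a, b)  # values present in both arrays
combined = np.union1d(a, b)    # values present in either array
only_a = np.setdiff1d(a, b)    # values in a that are not in b

print(common)    # [3 4]
print(combined)  # [1 2 3 4 5]
print(only_a)    # [1 2]
```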

### File Input and Output with Arrays

`np.save` saves array data on disk as an uncompressed raw binary format with file extension .npy

`np.load` loads array saved in a file

In [None]:
arr = np.arange(10)
np.save('some_array.npy', arr) # argument: (file name, array)

# Load the array from file
np.load('some_array.npy')

We can save multiple arrays in one uncompressed zip archive using `np.savez`, passing the arrays as keyword arguments. The file extension is .npz.

In [None]:
arr1=np.arange(20)

np.savez('arrays.npz', one=arr, two=arr1)

arch = np.load('arrays.npz')
print(arch['one'])
print(arch['two'])
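
A self-contained sketch of the same round trip, written for Python 3 and using a temporary directory so that no files are left behind (the temporary directory and the `with` blocks are additions for this sketch, not part of the cell above):

```python
import os
import tempfile

import numpy as np

arr = np.arange(10)
arr1 = np.arange(20)

# Work in a temporary directory so no files are left behind
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'arrays.npz')
    np.savez(path, one=arr, two=arr1)

    # The archive object is closed when the with block exits
    with np.load(path) as arch:
        keys = sorted(arch.files)
        loaded_one = arch['one']
        loaded_two = arch['two']

print(keys)  # ['one', 'two']
```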

Load data into ndarrays using np.loadtxt or np.genfromtxt

In [59]:
%cd "C:\Users\sonya\Documents\Python for Data Analysis\data"

#!text.txt  # Open file

arr = np.loadtxt('text.txt',delimiter=',', dtype='string')
print(arr)

C:\Users\sonya\Documents\Python for Data Analysis\data
['hola' 'hello' 'hey' 'howdy' 'ello']


### Linear Algebra

Includes matrix multiplication, decompositions, determinants, and other square matrix math.

With matrix multiplication, the number of columns in matrix 1 needs to match the number of rows in matrix 2. The resulting dimension is the rows of matrix 1 by the columns of matrix 2.

In [69]:
x = np.array([[1.,2.,3.],[4.,5.,6.]])
y= np.array([[6., 23.],[-1,7],[8,9]])

print x
print y

# Multiply Matrices X times Y
print(x.dot(y))  # or np.dot(x,y)
print ''
# Multiply matrices Y times X
print(y.dot(x))

[[ 1.  2.  3.]
 [ 4.  5.  6.]]
[[  6.  23.]
 [ -1.   7.]
 [  8.   9.]]
[[  28.   64.]
 [  67.  181.]]

[[  98.  127.  156.]
 [  27.   33.   39.]
 [  44.   61.   78.]]


In [71]:
print(np.ones(3))
print(x)

[ 1.  1.  1.]
[[ 1.  2.  3.]
 [ 4.  5.  6.]]


In [73]:
# A matrix product between a 2D array and a 1D array results in a 1D array 
np.dot(x,np.ones(3))

array([  6.,  15.])
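
The shape rule can be checked directly. This sketch assumes Python 3.5 or later for the `@` operator, which is equivalent to `np.dot` for 2D arrays:

```python
import numpy as np

x = np.array([[1., 2., 3.], [4., 5., 6.]])      # shape (2, 3)
y = np.array([[6., 23.], [-1., 7.], [8., 9.]])  # shape (3, 2)

product = x @ y  # (2, 3) times (3, 2) gives (2, 2); same as x.dot(y)

# x has 3 columns but only 2 rows, so x times x is not defined
try:
    np.dot(x, x)
    mismatch_raised = False
except ValueError:
    mismatch_raised = True

print(product.shape)    # (2, 2)
print(mismatch_raised)  # True
```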

`numpy.linalg` has matrix decompositions, inverses, and determinants. 

In [91]:
from numpy.linalg import inv, qr

X = np.random.randn(5,5)
print X
print''

# Transpose
mat = X.T
print "Transposed Matrix is where rows become columns and vice versa:","\n",mat
print''
# Transpose of X times X itself = inner matrix
# The result is symmetric about the diagonal
mat = X.T.dot(X)
print "Inner matrix is symmetrical along the diagonal axis:","\n",mat

[[-2.35000831  1.13964429  0.82073302  0.20367248 -0.53045475]
 [-0.08773397  0.23838804 -0.98029048  0.48833804 -0.4673859 ]
 [-0.03395577 -0.52969165  0.14824204  0.70081477 -0.33662501]
 [ 0.22188085 -0.37557086 -0.67979429 -0.34203564 -1.0792081 ]
 [-0.32459371  0.49844851 -1.17972069  2.12111182 -0.41600345]]

Transposed Matrix is where rows become columns and vice versa: 
[[-2.35000831 -0.08773397 -0.03395577  0.22188085 -0.32459371]
 [ 1.13964429  0.23838804 -0.52969165 -0.37557086  0.49844851]
 [ 0.82073302 -0.98029048  0.14824204 -0.67979429 -1.17972069]
 [ 0.20367248  0.48833804  0.70081477 -0.34203564  2.12111182]
 [-0.53045475 -0.4673859  -0.33662501 -1.0792081  -0.41600345]]

Inner matrix is symmetrical along the diagonal axis: 
[[ 5.68598149 -2.92622742 -1.61566173 -1.30966327  1.19458556]
 [-2.92622742  2.02569559  0.29041251  1.16303604 -0.33967866]
 [-1.61566173  0.29041251  3.51040901 -2.47746783  1.19731762]
 [-1.30966327  1.16303604 -2.47746783  5.38720161 -1.085455

In [81]:
# Inverse
inv(mat)

array([[ 7.22158871,  1.23520723,  0.19659844,  2.97747064,  4.25441033],
       [ 1.23520723,  0.4115676 ,  0.32986617,  0.74765313,  0.64662635],
       [ 0.19659844,  0.32986617,  1.17397359,  1.43216857,  0.05044663],
       [ 2.97747064,  0.74765313,  1.43216857,  3.3245876 ,  1.8463446 ],
       [ 4.25441033,  0.64662635,  0.05044663,  1.8463446 ,  2.68856437]])

In [95]:
print mat.dot(inv(mat)) 
print''
print "Identity Matrix:","\n",np.rint(mat.dot(inv(mat))) 
# Multiplying the inner matrix by its inverse gives the identity matrix (up to floating-point error)

[[  1.00000000e+00  -2.28455375e-15   9.69278167e-15   5.19640292e-16
    8.77519983e-15]
 [ -9.62848902e-15   1.00000000e+00  -2.57248311e-14  -1.90485784e-15
    7.71765687e-15]
 [  1.12836375e-15   1.28646115e-14   1.00000000e+00  -2.54215521e-16
   -9.69021297e-15]
 [  1.27659108e-14   1.29191983e-15  -2.68233175e-16   1.00000000e+00
   -8.16636431e-16]
 [  9.53204876e-15  -9.31240805e-15   2.44900875e-15   3.18381339e-15
    1.00000000e+00]]

Identity Matrix: 
[[ 1.  0.  0.  0.  0.]
 [ 0.  1.  0.  0.  0.]
 [ 0.  0.  1.  0.  0.]
 [ 0.  0.  0.  1.  0.]
 [ 0.  0.  0.  0.  1.]]
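
Rounding with `np.rint` hides the tiny floating-point errors. An alternative is to compare against the identity with a tolerance using `np.allclose`. A sketch on a small illustrative matrix whose inverse is easy to verify by hand:

```python
import numpy as np
from numpy.linalg import inv

mat = np.array([[4., 7.], [2., 6.]])
mat_inv = inv(mat)  # expected [[0.6, -0.7], [-0.2, 0.4]]

# Compare with a tolerance instead of rounding the product
is_identity = np.allclose(mat.dot(mat_inv), np.eye(2))

print(mat_inv)
print(is_identity)  # True
```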


In [116]:
identity = np.rint(mat.dot(inv(mat))) 

# Retrieve the diagonal elements from a square matrix as a 1D array
print np.diag(identity)

# Compute the sum of the diagonal elements
print np.trace(identity)

# Compute the matrix determinant
arr = np.arange(4).reshape(2,2)
print "Determinant of matrix: ", np.linalg.det(arr)
# Another example
print''
a = np.array([ [[1, 2], [3, 4]], [[1, 2], [2, 1]], [[1, 3], [3, 1]] ])
print a
print "Determinant of the matrices:",'',np.linalg.det(a)
print''

# Calculate the eigenvalues and eigenvectors of a square matrix
print "Eigenvectors:","\n",np.linalg.eig(a)

[ 1.  1.  1.  1.  1.]
5.0
Determinant of matrix:  -2.0

[[[1 2]
  [3 4]]

 [[1 2]
  [2 1]]

 [[1 3]
  [3 1]]]
Determinant of the matrices:  [-2. -3. -8.]

Eigenvectors: 
(array([[-0.37228132,  5.37228132],
       [ 3.        , -1.        ],
       [ 4.        , -2.        ]]), array([[[-0.82456484, -0.41597356],
        [ 0.56576746, -0.90937671]],

       [[ 0.70710678, -0.70710678],
        [ 0.70710678,  0.70710678]],

       [[ 0.70710678, -0.70710678],
        [ 0.70710678,  0.70710678]]]))


In [119]:
q, r = qr(mat)

print r
print''
print q

[[-6.82977363  3.65521487  1.61544479  2.22545714 -1.40619635]
 [ 0.         -0.92629782  3.08468588 -4.28429546  0.67057332]
 [ 0.          0.         -3.23483248  3.48359408 -2.28838134]
 [ 0.          0.          0.         -1.99062733 -0.54409875]
 [ 0.          0.          0.          0.          0.02023795]]

[[-0.83252854 -0.12614009 -0.0365845  -0.06536505 -0.53420397]
 [ 0.4284516  -0.49618269 -0.34896408  0.3519537  -0.56972329]
 [ 0.23656153  0.61996339 -0.37586633 -0.483037   -0.43021366]
 [ 0.19175793 -0.49888879  0.38590159 -0.7428518  -0.11657635]
 [-0.17490851 -0.32349157 -0.76595699 -0.29445282  0.43745587]]
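
The QR decomposition factors a matrix into an orthonormal `q` and an upper-triangular `r`. A sketch that checks those properties on a small illustrative matrix:

```python
import numpy as np
from numpy.linalg import qr

mat = np.array([[2., 1., 0.], [1., 3., 1.], [0., 1., 4.]])
q, r = qr(mat)

reconstructs = np.allclose(q.dot(r), mat)           # q times r gives back mat
q_orthonormal = np.allclose(q.T.dot(q), np.eye(3))  # columns of q are orthonormal
r_upper = np.allclose(r, np.triu(r))                # r is upper triangular

print(reconstructs, q_orthonormal, r_upper)  # True True True
```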


### Random Number Generation

The `numpy.random` module provides functions that generate whole arrays of sample values from many kinds of probability distributions.

In [126]:
# Return a 4x4 array of samples from a standard normal distribution using normal

samples = np.random.normal(size=(4,4))
print samples

[[ 0.81369624  0.47260929 -1.69542758  0.12338307]
 [-1.25890493  0.99454322 -0.7636952   0.06396738]
 [-0.15322813 -1.29329469  1.15387651 -0.51412826]
 [-0.64184537 -0.28487735 -2.30563504  0.39912537]]


In [127]:
# Draw samples from a uniform distribution
samples = np.random.rand(8).reshape(2,4)
print samples

[[ 0.07890544  0.16212604  0.23649292  0.53175503]
 [ 0.82264809  0.64595633  0.31380186  0.58712089]]


In [130]:
# Draw sample from a given low to high range with parameters: (low, high=None, size=None, dtype='l')

print np.random.randint(0,high=100,size=100)

[28 77 89 24 31 19 51 41  5 42 44 35 23 34 49  6 77 88 78 48 83 74 50 59 57
 29  7 42 77 33 76 53 90 66 39 50 84 46 12 14 87 74 96 42 58 40 59 85 72 93
  3 96  5 55 62 12 23 82 27 97 35 42 49 83 68 11 78 25 68 91 62 31 20 27 15
 14 31 65 11  1 19 50 41 71 23 88  5 18 57 16  1 77 83 25 55 53 29 17 16 79]


In [134]:
# Draw samples from a chi-square distribution with parameters : (df, size=None) as in degrees of freedom

print np.random.chisquare(10,size=100)

[  6.15726118   8.45996793   8.18585131  10.17199982  14.91242945
   6.40009675   6.09334898  10.95418069  11.31603915   7.96804831
  15.58553698   8.77145325   7.9856113    6.79911994   4.64713197
   6.17292514  12.86917202  24.266456     9.63136479  10.22511903
   8.25870989   6.26336658   6.330398    15.45797333   5.91758537
  13.55481199  12.36062908   5.61693483   7.79958091   6.57494748
  13.63562656   5.98252559   8.95040865  12.97043056   5.62752669
  11.30853827   7.63598786  16.25954407   5.70329712   2.20900636
   6.88306536  17.78085994   6.47969926  11.59154448   7.75949431
  19.08057599   5.80539818  24.32356968   5.82730183   4.3688964
  14.10227913   5.59197415  10.88705702   4.39350512   7.40903364
   9.52864581   8.19335475   7.9363302    4.69269732  12.77377056
  20.99083292  15.01625468   5.691151     5.98533936   2.61268416
   5.25665389  22.1594896    7.52444235  14.79162866  22.79632749
   7.95075973   9.87675986   9.80925996   6.4371524    9.84734178
   9.126169

In [136]:
# Draw from a uniform (0,1) distribution with parameters: (low=0.0, high=1.0, size=None)

print np.random.uniform(size = 10)

[ 0.64419279  0.72023621  0.50224517  0.60981581  0.96531622  0.80886793
  0.21554368  0.3366486   0.86566622  0.90945398]
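
All of the draws above change on every run. To make results repeatable, the generator can be seeded with `np.random.seed`. A minimal sketch, where the seed value 42 is an arbitrary choice:

```python
import numpy as np

np.random.seed(42)  # 42 is an arbitrary seed value
first = np.random.randn(3)

np.random.seed(42)  # resetting the seed restarts the sequence
second = np.random.randn(3)

same = np.array_equal(first, second)
print(same)  # True
```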


### Example: Random Walks

In [142]:
nsteps = 1000 

draws = np.random.randint(0,2,size=nsteps)
print draws
steps = np.where(draws > 0 , 1, -1)
print steps
walk = steps.cumsum()


[0 0 0 0 1 0 1 1 1 0 1 1 1 0 0 1 1 1 0 1 0 1 0 1 1 1 0 0 0 0 1 1 1 0 1 0 0
 0 1 0 1 0 1 1 1 0 0 0 0 1 0 0 0 0 1 0 1 1 1 0 0 1 1 1 0 1 1 1 1 1 1 1 1 0
 0 1 0 0 0 1 1 0 0 1 1 1 0 0 1 1 1 0 1 1 0 1 0 1 0 1 1 1 0 1 0 1 0 1 1 1 1
 1 1 0 0 1 1 0 0 0 1 1 1 0 0 1 0 1 1 1 0 1 1 0 1 0 0 1 1 0 1 1 1 0 1 0 1 1
 0 1 0 1 1 1 0 1 1 0 1 0 0 0 1 0 1 1 0 0 1 1 0 0 1 1 0 0 1 0 0 0 0 0 1 1 1
 0 0 0 0 1 0 1 1 0 0 0 0 1 0 1 0 0 0 1 1 1 0 1 0 0 1 1 1 1 0 1 1 1 1 1 0 1
 0 0 0 1 1 1 0 0 1 1 0 1 1 1 1 1 0 0 1 0 0 0 0 1 0 1 1 0 0 1 0 1 0 1 1 1 0
 0 0 0 0 1 1 1 1 1 1 1 1 1 0 0 0 0 0 1 1 1 1 1 1 1 0 0 1 0 0 1 0 0 1 0 0 1
 0 0 1 1 0 0 1 1 1 1 1 1 1 0 0 0 0 1 0 1 1 0 0 1 1 0 1 1 1 1 0 0 1 1 0 0 0
 1 0 0 0 1 1 1 1 1 1 1 0 0 1 0 1 0 0 0 1 0 1 0 0 1 1 0 1 0 0 1 1 0 1 1 1 1
 1 0 1 0 1 1 0 0 0 0 1 1 1 1 0 1 0 0 0 0 0 0 0 1 1 0 1 1 1 0 0 1 1 1 1 0 0
 0 0 0 1 1 1 1 0 0 1 1 0 1 1 1 0 1 1 1 0 0 0 0 1 0 0 0 1 1 1 0 0 0 0 0 0 1
 0 1 1 0 1 0 0 1 0 0 0 1 0 0 0 1 0 1 0 1 1 1 1 0 1 1 1 1 1 0 0 0 0 0 1 0 1
 0 1 1 0 0 0 1 0 1 0 1 0 

In [141]:
print walk

[ 1  2  3  2  3  2  3  2  1  2  1  2  3  4  5  4  3  4  5  4  5  6  5  6  7
  6  5  6  7  8  7  8  7  8  7  6  7  6  5  6  7  8  7  8  9  8  7  6  5  6
  5  6  5  4  5  4  5  6  5  6  5  4  5  4  3  2  3  4  3  4  5  4  3  4  5
  6  5  4  5  4  3  4  5  6  5  6  7  8  9  8  9  8  9  8  7  8  9  8  9  8
  7  8  9 10  9  8  7  8  9 10 11 12 11 10  9  8  7  8  7  6  5  4  5  6  7
  8  7  6  5  4  5  6  5  4  5  6  5  6  5  6  7  8  9 10 11 12 11 12 11 12
 13 14 15 16 17 18 17 18 17 18 19 20 21 22 21 20 21 20 19 20 21 22 21 22 23
 22 23 24 25 26 25 26 27 26 27 28 29 28 29 30 29 28 27 26 25 24 25 26 25 24
 23 24 23 24 23 24 25 26 25 24 25 24 23 24 23 22 23 24 23 24 23 24 23 22 21
 20 19 20 19 18 19 18 17 16 15 16 15 16 17 16 15 14 15 16 15 16 15 16 15 14
 13 14 15 16 17 18 19 18 19 18 19 18 19 18 17 18 19 20 21 20 19 18 17 16 15
 14 13 14 15 16 15 14 15 16 17 18 19 18 19 20 21 20 19 20 19 18 19 18 19 20
 21 20 19 20 19 18 19 20 21 22 21 20 21 20 19 20 19 18 17 16 15 16 15 14 13
 12 13 14 15

In [144]:
print walk.min(), walk.max()

-22 36


In [145]:
# How long did it take the random walk to get at least 10 steps away from the origin 0 in either direction?

np.abs(walk) >=10

array([False, False, False, False, False, False, False, False, False,
       False, False, False, False, False, False, False, False, False,
       False, False, False, False, False, False, False, False, False,
       False, False, False, False, False, False, False, False, False,
       False, False, False, False, False, False, False, False, False,
       False, False, False, False, False, False, False, False, False,
       False, False, False, False, False, False, False, False, False,
       False, False, False, False, False, False, False, False, False,
       False, False, False, False, False, False, False, False, False,
       False, False, False, False, False, False, False, False, False,
       False, False, False, False, False, False, False, False, False,
       False, False,  True, False,  True, False,  True, False,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,

In [153]:
print "Point at which 10 steps  were taken from 0: ",(np.abs(walk) >= 10).argmax()

Point at which 10 steps  were taken from 0:  101
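
`argmax` on a boolean array returns the index of the first `True`, because `True` is the maximum value. A sketch on a short hand-written walk (the values are illustrative), including the case where the threshold is never reached:

```python
import numpy as np

walk = np.array([1, 2, 1, 2, 3, 4, 3])

reached = np.abs(walk) >= 3
first_crossing = reached.argmax()  # index of the first True value

# argmax also returns 0 when no value is True, so check any() first
never = np.abs(walk) >= 10

print(first_crossing)               # 4
print(never.any(), never.argmax())  # False 0
```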


#### Example: Simulating Many Random Walks/Trials at Once

In [155]:
nwalks = 5000

nsteps = 1000
# Pass a 2-tuple to generate a 2D array of draws
draws = np.random.randint(0,2, size=(nwalks, nsteps))
draws

array([[1, 1, 0, ..., 0, 1, 1],
       [0, 0, 0, ..., 1, 1, 0],
       [1, 1, 1, ..., 0, 0, 1],
       ..., 
       [0, 0, 0, ..., 0, 1, 1],
       [1, 0, 0, ..., 1, 1, 0],
       [0, 1, 0, ..., 0, 0, 1]])

In [157]:
steps = np.where(draws > 0 , 1, -1)

walks = steps.cumsum(1)

walks

array([[  1,   2,   1, ..., -10,  -9,  -8],
       [ -1,  -2,  -3, ..., -80, -79, -80],
       [  1,   2,   3, ...,  22,  21,  22],
       ..., 
       [ -1,  -2,  -3, ...,  46,  47,  48],
       [  1,   0,  -1, ...,   6,   7,   6],
       [ -1,   0,  -1, ...,  46,  45,  46]])

In [158]:
print walks.max(), walks.min()

124 -118


In [160]:
hits30 = (np.abs(walks) >= 30).any(1) # Which walks ever reach 30 or -30
hits30

array([False,  True, False, ...,  True, False,  True], dtype=bool)

In [161]:
hits30.sum()

3362

In [164]:
crossing_times = (np.abs(walks[hits30]) >=30).argmax(1) # Index of the first step at which each walk reaches 30 or -30

crossing_times.mean()

499.45508625817968