# Playing with NumPy
* Although I've used NumPy for a while now, I haven't learned all of its functionality.
* Based on [Python for Data Analysis](http://shop.oreilly.com/product/0636920023784.do) by Wes McKinney

In [1]:
%install_ext https://raw.githubusercontent.com/rasbt/watermark/master/watermark.py

Installed watermark.py. To use it, type:
  %load_ext watermark




In [2]:
%load_ext watermark

In [3]:
import numpy as np

In [4]:
%watermark -d -v -m -p numpy,pandas,scipy,matplotlib

11/16/2015 

CPython 3.5.0
IPython 4.0.0

numpy 1.10.1
pandas 0.17.0
scipy 0.16.0
matplotlib 1.4.3

compiler   : GCC 4.2.1 (Apple Inc. build 5577)
system     : Darwin
release    : 14.4.0
machine    : x86_64
processor  : i386
CPU cores  : 8
interpreter: 64bit


In [5]:
data2 = [[1,2,3,4],[5,6,7,8]]
arr2 = np.array(data2)

In [6]:
arr2

array([[1, 2, 3, 4],
       [5, 6, 7, 8]])

In [7]:
arr2.ndim

2

In [8]:
arr2.shape

(2, 4)

In [9]:
arr2.dtype

dtype('int64')

In [10]:
np.empty((2,3,2))

array([[[  0.00000000e+000,  -2.68156175e+154],
        [  2.96439388e-323,   2.49653716e+237],
        [  0.00000000e+000,   0.00000000e+000]],

       [[  3.95363904e-207,   0.00000000e+000],
        [  0.00000000e+000,  -1.85611574e+204],
        [  0.00000000e+000,   8.34402697e-309]]])
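
The values printed by np.empty above are not meaningful: it allocates memory without initializing it, so the contents are whatever happened to be there. A minimal sketch contrasting it with np.zeros and np.ones, which do set every element (the variable names here are only illustrative):

```python
import numpy as np

# np.empty only allocates; the contents are arbitrary leftover memory.
scratch = np.empty((2, 3, 2))

# np.zeros and np.ones allocate and initialize every element.
zeros = np.zeros((2, 3, 2))
ones = np.ones((2, 3, 2))

# Shape and dtype are well defined even when the values are not.
print(scratch.shape, scratch.dtype)
print(zeros.sum(), ones.sum())
```

np.empty is mainly useful when every element will be overwritten before it is read.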

# Operations between Arrays and Scalars
Arrays are important because they enable you to express batch operations on data without writing any for loops. This is usually called **vectorization**. Any arithmetic operation between equal-sized arrays applies the operation elementwise:

In [16]:
arr = np.array([[1., 2., 3.], [5., 7., 9.]])

In [17]:
arr

array([[ 1.,  2.,  3.],
       [ 5.,  7.,  9.]])

Python 3.5 and NumPy 1.10 introduce a new binary operator for matrix multiplication, **@**:

In [19]:
arr @ arr.T

array([[  14.,   46.],
       [  46.,  155.]])
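
As a quick check of what the operator does, the sketch below compares it with the older np.dot spellings on the same array. It assumes Python 3.5+ and NumPy 1.10+; the variable names are illustrative.

```python
import numpy as np

arr = np.array([[1., 2., 3.], [5., 7., 9.]])

# For 2-D arrays these three spellings compute the same matrix product.
via_operator = arr @ arr.T
via_dot = np.dot(arr, arr.T)
via_method = arr.dot(arr.T)

print(np.array_equal(via_operator, via_dot))     # True
print(np.array_equal(via_operator, via_method))  # True
print(via_operator.shape)                        # (2, 2)
```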

In [20]:
1/arr

array([[ 1.        ,  0.5       ,  0.33333333],
       [ 0.2       ,  0.14285714,  0.11111111]])

In [21]:
arr ** 0.5

array([[ 1.        ,  1.41421356,  1.73205081],
       [ 2.23606798,  2.64575131,  3.        ]])
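
To make the vectorization point concrete, here is a small sketch that performs array-array arithmetic and then reproduces one result with explicit loops. The loop version is only for comparison and is not how you would normally write it.

```python
import numpy as np

arr = np.array([[1., 2., 3.], [5., 7., 9.]])

# Array-array arithmetic is applied element by element.
squared = arr * arr      # each element multiplied by itself
difference = arr - arr   # all zeros

# The explicit-loop equivalent of arr * arr.
looped = np.empty_like(arr)
for i in range(arr.shape[0]):
    for j in range(arr.shape[1]):
        looped[i, j] = arr[i, j] * arr[i, j]

print(np.array_equal(looped, squared))  # True
print(difference.sum())                 # 0.0
```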

# Universal Functions: Fast Element-wise Array Functions
A universal function, or *ufunc*, is a function that performs elementwise operations on data in ndarrays. You can think of them as fast vectorized wrappers for simple functions that take one or more scalar values and produce one or more scalar results. Many ufuncs are simple elementwise transformations, like sqrt or exp:

In [22]:
arr_2 = np.arange(20)

In [23]:
np.sqrt(arr_2)

array([ 0.        ,  1.        ,  1.41421356,  1.73205081,  2.        ,
        2.23606798,  2.44948974,  2.64575131,  2.82842712,  3.        ,
        3.16227766,  3.31662479,  3.46410162,  3.60555128,  3.74165739,
        3.87298335,  4.        ,  4.12310563,  4.24264069,  4.35889894])

In [25]:
x = np.random.randn(9)
y = np.random.randn(9)

In [26]:
np.maximum(x, y)

array([-1.09886578,  1.14519147, -0.47733132, -0.50168259, -0.81878711,
        2.51633349,  1.06050518,  0.33874671,  1.29634673])
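
Because the random inputs above change on every run, the following sketch uses small fixed arrays (chosen only for illustration) to show a binary ufunc and a ufunc that returns more than one array, np.modf.

```python
import numpy as np

a = np.array([1.5, 2.0, 3.25])
b = np.array([0.5, 4.0, -1.0])

# Binary ufunc: takes two arrays and returns the elementwise maximum,
# here 1.5, 4.0 and 3.25.
largest = np.maximum(a, b)

# np.modf is a ufunc with two outputs: the fractional and integral parts.
frac, whole = np.modf(a)

print(largest)
print(frac)
print(whole)
```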

# Linear Algebra
Linear algebra, like matrix multiplication, decompositions, determinants, and other square matrix math, is an important part of any array library. Unlike some languages like MATLAB, multiplying two two-dimensional arrays with $*$ is an elementwise product instead of a matrix dot product. So we'll rewrite the NumPy examples from Wes McKinney's book with the new **@** operator.

In [27]:
x = np.array([[1., 2., 3.], [4., 5., 6.]])

In [28]:
y = np.array([[6., 23.], [-1, 7], [8, 9]])

In [29]:
x

array([[ 1.,  2.,  3.],
       [ 4.,  5.,  6.]])

In [30]:
y

array([[  6.,  23.],
       [ -1.,   7.],
       [  8.,   9.]])

In [31]:
x @ y

array([[  28.,   64.],
       [  67.,  181.]])
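
The difference between the two operators is easiest to see on a small square array. This sketch uses a hypothetical 2x2 array, not one from the book:

```python
import numpy as np

a = np.array([[1., 2.], [3., 4.]])

# * multiplies matching elements: [[1, 4], [9, 16]]
elementwise = a * a

# @ takes row-by-column dot products: [[7, 10], [15, 22]]
matrix_prod = a @ a

print(elementwise)
print(matrix_prod)
```

For non-square inputs such as the (2, 3) and (3, 2) arrays above, only **@** is defined, and the result has shape (2, 2).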

A nice thing about the matrix decompositions in numpy.linalg is that they are implemented under the hood using the same industry-standard libraries used in other languages like MATLAB and R, such as BLAS, LAPACK, or possibly (depending on your NumPy build) the Intel MKL:

In [44]:
from numpy.linalg import inv, qr, solve

In [45]:
X = np.random.randn(5, 5)

In [46]:
mat = X.T @ X

In [47]:
mat

array([[  6.42234924,   2.08150619,   0.03211693,   2.66338807,
         -0.33702587],
       [  2.08150619,   9.2522974 ,   4.49475303,  -4.7292663 ,
         -2.22419419],
       [  0.03211693,   4.49475303,   7.35230855,  -0.1390147 ,   0.9943707 ],
       [  2.66338807,  -4.7292663 ,  -0.1390147 ,  10.35496047,
          2.15878809],
       [ -0.33702587,  -2.22419419,   0.9943707 ,   2.15878809,
          1.75132664]])

In [48]:
inv(mat)

array([[ 0.361194  , -0.45142924,  0.33029988, -0.20256795, -0.44165023],
       [-0.45142924,  1.25344022, -0.98194916,  0.33027445,  1.65541936],
       [ 0.33029988, -0.98194916,  0.9247809 , -0.22180807, -1.4351774 ],
       [-0.20256795,  0.33027445, -0.22180807,  0.25701021,  0.18960068],
       [-0.44165023,  1.65541936, -1.4351774 ,  0.18960068,  3.16954993]])
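
One way to sanity-check an inverse is to multiply it back against the original matrix and compare with the identity. The sketch below does this on a small, fixed, well-conditioned matrix (an assumption made so the result is reproducible; the random matrix above would behave similarly up to rounding).

```python
import numpy as np
from numpy.linalg import inv

# Hypothetical symmetric, diagonally dominant matrix (determinant 18).
mat = np.array([[4., 1., 0.],
                [1., 3., 1.],
                [0., 1., 2.]])

mat_inv = inv(mat)

# Both products should equal the identity up to floating-point error.
print(np.allclose(mat @ mat_inv, np.eye(3)))  # True
print(np.allclose(mat_inv @ mat, np.eye(3)))  # True
```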

In [52]:
solve(mat, inv(mat))

array([[  0.67943618,  -1.85125015,   1.54681555,  -0.43132427,
         -2.81910453],
       [ -1.85125015,   5.58861936,  -4.7370874 ,   1.12198117,
          8.99316886],
       [  1.54681555,  -4.7370874 ,   4.03747484,  -0.92546231,
         -7.68956068],
       [ -0.43132427,   1.12198117,  -0.92546231,   0.30131647,
          1.60421898],
       [ -2.81910453,   8.99316886,  -7.68956068,   1.60421898,
         15.07719752]])
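
Note what the call above computes: solve(A, B) returns the X satisfying A @ X = B, so solve(mat, inv(mat)) yields inv(mat) @ inv(mat), not the identity. A minimal sketch on a small fixed matrix (chosen for illustration, not taken from the book):

```python
import numpy as np
from numpy.linalg import inv, solve

mat = np.array([[4., 1., 0.],
                [1., 3., 1.],
                [0., 1., 2.]])
b = np.array([1., 2., 3.])

# Vector right-hand side: x satisfies mat @ x = b.
x = solve(mat, b)
print(np.allclose(mat @ x, b))  # True

# Matrix right-hand side: each column is solved independently.
z = solve(mat, inv(mat))
print(np.allclose(z, inv(mat) @ inv(mat)))  # True

# Solving against the identity recovers the inverse itself.
print(np.allclose(solve(mat, np.eye(3)), inv(mat)))  # True
```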

In [53]:
q, r = qr(mat)

In [54]:
r

array([[ -7.2655001 ,  -2.88004555,  -1.2515143 ,  -4.69457804,
          0.22060252],
       [  0.        , -11.36480608,  -6.11904856,   9.33857417,
          2.66440145],
       [  0.        ,   0.        ,  -6.02157223,  -4.9458143 ,
         -2.54484136],
       [  0.        ,   0.        ,   0.        ,  -2.7668364 ,
          0.29439234],
       [  0.        ,   0.        ,   0.        ,   0.        ,
          0.25753703]])
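
The zeros below the diagonal of r reflect the defining properties of a QR decomposition: q has orthonormal columns, r is upper triangular, and their product reconstructs the input. A short sketch checking all three on a small fixed matrix (hypothetical values, used for reproducibility):

```python
import numpy as np
from numpy.linalg import qr

mat = np.array([[4., 1., 0.],
                [1., 3., 1.],
                [0., 1., 2.]])

q, r = qr(mat)

print(np.allclose(q @ r, mat))          # True: the factors reconstruct mat
print(np.allclose(q.T @ q, np.eye(3)))  # True: q has orthonormal columns
print(np.allclose(r, np.triu(r)))       # True: r is upper triangular
```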