<h1>Getting Started with Matrices</h1>

Let's load a matrix from the seeds dataset file, which can be found at https://archive.ics.uci.edu/ml/datasets/seeds

In [3]:
import numpy as np
seeds = np.loadtxt("seeds_dataset.txt",dtype=np.float64)
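
If the file is not at hand, the same call can be tried on an in-memory string. This is only a sketch: the two rows below are the first three columns of the first two records as printed later in this notebook, and `sample_text` and `sample` are illustrative names, not part of the original code.

```python
import io
import numpy as np

# Two short whitespace-separated rows standing in for seeds_dataset.txt
# (sample_text is a hypothetical stand-in, not the real file)
sample_text = u"15.26\t14.84\t0.871\n14.88\t14.57\t0.8811\n"

# np.loadtxt accepts a file name or a file-like object;
# by default it splits each line on whitespace
sample = np.loadtxt(io.StringIO(sample_text), dtype=np.float64)
print(sample.shape)
print(sample.dtype)
```

Each line of text becomes one row of the array, and each whitespace-separated field becomes one column.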

Column meanings for the seeds dataset:
1. area A,
2. perimeter P,
3. compactness C = 4*pi*A/P^2,
4. length of kernel,
5. width of kernel,
6. asymmetry coefficient,
7. length of kernel groove,
8. class label (wheat variety, coded 1, 2 or 3).
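
As a quick sanity check on the compactness formula, we can recompute column 3 from columns 1 and 2. The numbers below are the first record as it appears in the printed outputs of this notebook (A = 15.26, P = 14.84, C = 0.871); the variable names are illustrative.

```python
import math

# First record of the dataset, copied from the notebook's printed output
area = 15.26
perimeter = 14.84
reported_compactness = 0.871

# Compactness as defined in the column list: C = 4*pi*A / P^2
compactness = 4 * math.pi * area / perimeter ** 2

print(compactness)
print(abs(compactness - reported_compactness))  # should be small (rounding only)
```

The recomputed value agrees with the stored one to about three decimal places, which is what we would expect given that the inputs are rounded.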

In [5]:
print seeds[1:10,:]

[[ 14.88    14.57     0.8811   5.554    3.333    1.018    4.956    1.    ]
 [ 14.29    14.09     0.905    5.291    3.337    2.699    4.825    1.    ]
 [ 13.84    13.94     0.8955   5.324    3.379    2.259    4.805    1.    ]
 [ 16.14    14.99     0.9034   5.658    3.562    1.355    5.175    1.    ]
 [ 14.38    14.21     0.8951   5.386    3.312    2.462    4.956    1.    ]
 [ 14.69    14.49     0.8799   5.563    3.259    3.586    5.219    1.    ]
 [ 14.11    14.1      0.8911   5.42     3.302    2.7      5.       1.    ]
 [ 16.63    15.46     0.8747   6.053    3.465    2.04     5.877    1.    ]
 [ 16.44    15.25     0.888    5.884    3.505    1.969    5.533    1.    ]]


Arrays carry several useful attributes. Try tab completion (type seeds. and press Tab) to see the available attributes and methods.

In [53]:
print type(seeds)
print seeds.ndim
print seeds.shape
print seeds.size
print seeds.dtype

<type 'numpy.ndarray'>
2
(210L, 8L)
1680
float64


In [7]:
n,p = seeds.shape #Remember tuple unpacking!

Let's initialize another array from a list.

In [13]:
onelist = [1.0]*n
onevec = np.array(onelist)
print onevec.dtype
print onevec.shape
print onevec

float64
(210,)
[ 1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.
  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.
  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.
  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.
  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.
  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.
  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.
  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.
  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.
  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.
  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.
  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.]
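
Building the list first is not strictly needed. A minimal sketch of the shortcut, assuming n = 210 as reported by `seeds.shape` above (the names `onevec_from_list` and `onevec_direct` are illustrative):

```python
import numpy as np

n = 210  # number of rows, taken from seeds.shape above

# The approach used above: build a Python list, then convert it
onevec_from_list = np.array([1.0] * n)

# np.ones builds the same vector directly (float64 by default)
onevec_direct = np.ones(n)

print(onevec_direct.dtype)
print(onevec_direct.shape)
print(np.array_equal(onevec_from_list, onevec_direct))
```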


We can transpose really easily...

In [60]:
print seeds.T.shape
print seeds.T

(8L, 210L)
[[ 15.26    14.88    14.29   ...,  13.2     11.84    12.3   ]
 [ 14.84    14.57    14.09   ...,  13.66    13.21    13.34  ]
 [  0.871    0.8811   0.905  ...,   0.8883   0.8521   0.8684]
 ..., 
 [  2.221    1.018    2.699  ...,   8.315    3.598    5.637 ]
 [  5.22     4.956    4.825  ...,   5.056    5.044    5.063 ]
 [  1.       1.       1.     ...,   3.       3.       3.    ]]


What should happen if we do the following?

In [65]:
print seeds.T.dot(onevec)
print np.sum(seeds,axis=0)

[ 3117.98    3057.45     182.9097  1181.992    684.307    777.0422
  1135.695    420.    ]
[ 3117.98    3057.45     182.9097  1181.992    684.307    777.0422
  1135.695    420.    ]
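
The two results agree because multiplying the transpose by a vector of ones adds up each column: entry j of the product is the sum over i of X[i, j]. Here is the same check on a small matrix, where the values are made up for illustration and are not from the seeds file:

```python
import numpy as np

# Small stand-in matrix (hypothetical values)
X = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])
ones = np.ones(X.shape[0])  # one entry per row of X

# Entry j of X.T.dot(ones) is the sum of column j
via_dot = X.T.dot(ones)
via_sum = X.sum(axis=0)

print(via_dot)
print(via_sum)
print(np.allclose(via_dot, via_sum))
```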


In [15]:
output = [0.]*p
output

[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]

Let's think about a way to do this without numpy...

In [43]:
def mult_ones(seeds):
    p = len(seeds[0]) #number of columns; avoids relying on the global p
    output = [0.]*p
    for row in seeds: #iterating over a 2-D array yields its rows
        for i in range(p):
            output[i] += row[i]
    return output

In [48]:
output = mult_ones(seeds) #assign outside %timeit, which does not keep variables set in the timed statement
%timeit mult_ones(seeds)
print output

1000 loops, best of 3: 544 µs per loop
[3117.9800000000014, 3057.4500000000007, 182.90969999999999, 1181.9920000000004, 684.30700000000002, 777.04219999999987, 1135.6949999999999, 420.0]


In [49]:
%timeit seeds.T.dot(onevec)

The slowest run took 22.59 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 1.85 µs per loop


The numpy version is roughly 300 times faster here (1.85 µs versus 544 µs per loop)!
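
The `%timeit` magic only works inside IPython. Outside of it, the standard `timeit` module gives a comparable measurement. The sketch below uses a random matrix with the same shape as `seeds` as a stand-in, and `column_sums_loop` is an illustrative rewrite of the loop above; absolute timings will differ from machine to machine.

```python
import timeit
import numpy as np

rng = np.random.RandomState(0)
X = rng.rand(210, 8)          # random stand-in with the same shape as seeds
ones = np.ones(X.shape[0])

def column_sums_loop(mat):
    # Pure-Python column sums, mirroring the explicit loop version
    ncols = len(mat[0])
    out = [0.0] * ncols
    for row in mat:
        for j in range(ncols):
            out[j] += row[j]
    return out

# Time 100 calls of each version
loop_time = timeit.timeit(lambda: column_sums_loop(X), number=100)
dot_time = timeit.timeit(lambda: X.T.dot(ones), number=100)

print(loop_time / dot_time)  # speed-up factor, varies by machine
```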

<h1>Matrix Slicing</h1>

In [112]:
print seeds.shape
seeds.shape = (n*p) #oops! this flattens seeds in place into a 1-D array of length n*p
print seeds
seeds = seeds.reshape((n,p))
print seeds

(210L, 8L)
[ 15.26   14.84    0.871 ...,   5.637   5.063   3.   ]
[[ 15.26    14.84     0.871  ...,   2.221    5.22     1.    ]
 [ 14.88    14.57     0.8811 ...,   1.018    4.956    1.    ]
 [ 14.29    14.09     0.905  ...,   2.699    4.825    1.    ]
 ..., 
 [ 13.2     13.66     0.8883 ...,   8.315    5.056    3.    ]
 [ 11.84    13.21     0.8521 ...,   3.598    5.044    3.    ]
 [ 12.3     13.34     0.8684 ...,   5.637    5.063    3.    ]]


In [116]:
print seeds[10:20,:]
print seeds[:,0:4]
print seeds[10:20,0:4]

print seeds[:,3:8].shape
range(20)[3:10]

(210L, 5L)


[3, 4, 5, 6, 7, 8, 9]
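
Slices follow the usual Python convention: the start index is included and the stop index is excluded, on both rows and columns. A small sketch on a made-up 10 by 4 matrix (the name `M` and its values are illustrative):

```python
import numpy as np

M = np.arange(40).reshape((10, 4))  # stand-in matrix holding 0..39

rows = M[2:5, :]     # rows 2, 3 and 4; row 5 is excluded
cols = M[:, 1:3]     # columns 1 and 2; column 3 is excluded
block = M[2:5, 1:3]  # both restrictions at once

print(rows.shape)
print(cols.shape)
print(block)
```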

In [117]:
print seeds.sum()
print seeds.mean()
print seeds.min()
print seeds.max()

10557.3759
6.28415232143
0.7651
21.18


In [119]:
ran = np.arange(20)
print ran
ran.mean()

[ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19]


9.5

In [91]:
print seeds.sum(axis=0)
print seeds.mean(axis=0)
print seeds.min(axis=0)
print seeds.max(axis=0)

[ 3117.98    3057.45     182.9097  1181.992    684.307    777.0422
  1135.695    420.    ]
[ 14.84752381  14.55928571   0.87099857   5.62853333   3.25860476
   3.70020095   5.40807143   2.        ]
[ 10.59    12.41     0.8081   4.899    2.63     0.7651   4.519    1.    ]
[ 21.18    17.25     0.9183   6.675    4.033    8.456    6.55     3.    ]
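
The `axis` argument names the dimension that gets collapsed: `axis=0` reduces over rows and leaves one value per column, while `axis=1` reduces over columns and leaves one value per row. A sketch on a small made-up matrix:

```python
import numpy as np

M = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])  # hypothetical 2x3 example

col_means = M.mean(axis=0)  # one value per column
row_means = M.mean(axis=1)  # one value per row
total = M.sum()             # no axis: reduce over every element

print(col_means)
print(row_means)
print(total)
```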


In [120]:
print seeds.cumsum(axis=0) #really handy!

[[  1.52600000e+01   1.48400000e+01   8.71000000e-01 ...,   2.22100000e+00
    5.22000000e+00   1.00000000e+00]
 [  3.01400000e+01   2.94100000e+01   1.75210000e+00 ...,   3.23900000e+00
    1.01760000e+01   2.00000000e+00]
 [  4.44300000e+01   4.35000000e+01   2.65710000e+00 ...,   5.93800000e+00
    1.50010000e+01   3.00000000e+00]
 ..., 
 [  3.09384000e+03   3.03090000e+03   1.81189200e+02 ...,   7.67807200e+02
    1.12558800e+03   4.14000000e+02]
 [  3.10568000e+03   3.04411000e+03   1.82041300e+02 ...,   7.71405200e+02
    1.13063200e+03   4.17000000e+02]
 [  3.11798000e+03   3.05745000e+03   1.82909700e+02 ...,   7.77042200e+02
    1.13569500e+03   4.20000000e+02]]
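
Note that the last row of the cumulative sum matches the column totals printed by `seeds.sum(axis=0)` earlier. The same relationship on a small made-up matrix:

```python
import numpy as np

M = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])  # hypothetical example values

running = M.cumsum(axis=0)  # running total down each column

print(running)
# The final row of the running total equals the plain column sums
print(np.array_equal(running[-1], M.sum(axis=0)))
```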


In [121]:
myrows = [2,17,97]
print seeds[myrows,0:2]
rowslarge = seeds[:,0]>20
print rowslarge
print seeds[rowslarge,:]

[[ 14.29  14.09]
 [ 15.69  14.75]
 [ 18.98  16.57]]
[False False False False False False False False False False False False
 False False False False False False False False False False False False
 False False False False False False False False False False False False
 False False False False False False False False False False False False
 False False False False False False False False False False False False
 False False False False False False False False False False False False
 False False False False False  True False False False False  True False
 False False False False  True  True  True False False False False False
 False False False False False False False False False False False False
 False False False False False False  True False False False False  True
  True False False False False False False False  True False False False
 False False False False False False False False False False False False
 False False False False False False False False False False False False
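
Both forms of indexing used in this cell, a list of row numbers and a boolean mask, can be seen more easily on a tiny array. The values below are made up for illustration:

```python
import numpy as np

M = np.array([[15.0, 1.0],
              [21.0, 2.0],
              [18.0, 3.0],
              [20.5, 4.0]])  # hypothetical example values

picked = M[[0, 2], :]  # integer list: rows 0 and 2, in that order
mask = M[:, 0] > 20    # boolean vector, one entry per row
selected = M[mask, :]  # keeps only the rows where mask is True

print(picked)
print(mask)
print(selected)
print(mask.sum())      # True counts as 1, so this is the number of matching rows
```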

<h1>Matrix operations</h1>

In [37]:
decseeds = seeds/10.0 
print decseeds
mulseeds = 15*seeds - 1.
print seeds[:,0:-1] - seeds[:,1:]

[[ 14.88    14.57     0.8811   5.554    3.333    1.018    4.956    1.    ]
 [ 14.29    14.09     0.905    5.291    3.337    2.699    4.825    1.    ]
 [ 13.84    13.94     0.8955   5.324    3.379    2.259    4.805    1.    ]
 [ 16.14    14.99     0.9034   5.658    3.562    1.355    5.175    1.    ]
 [ 14.38    14.21     0.8951   5.386    3.312    2.462    4.956    1.    ]
 [ 14.69    14.49     0.8799   5.563    3.259    3.586    5.219    1.    ]
 [ 14.11    14.1      0.8911   5.42     3.302    2.7      5.       1.    ]
 [ 16.63    15.46     0.8747   6.053    3.465    2.04     5.877    1.    ]
 [ 16.44    15.25     0.888    5.884    3.505    1.969    5.533    1.    ]]
[[ 14.88    14.57     0.8811   5.554    3.333    1.018    4.956 ]
 [ 14.29    14.09     0.905    5.291    3.337    2.699    4.825 ]
 [ 13.84    13.94     0.8955   5.324    3.379    2.259    4.805 ]
 [ 16.14    14.99     0.9034   5.658    3.562    1.355    5.175 ]
 [ 14.38    14.21     0.8951   5.386    3.312    2.462    4.

In [158]:
print np.log(decseeds)
print np.exp(decseeds)
print np.sqrt(decseeds)

[[ 0.42264993  0.39474114 -2.4406984  ..., -1.50462755 -0.65008769
  -2.30258509]
 [ 0.39743294  0.37637953 -2.42916925 ..., -2.28474517 -0.70198613
  -2.30258509]
 [ 0.3569749   0.34288023 -2.40240543 ..., -1.30970376 -0.72877436
  -2.30258509]
 ..., 
 [ 0.27763174  0.31188676 -2.42103085 ..., -0.18452398 -0.68200944
  -1.2039728 ]
 [ 0.16889854  0.27838903 -2.46263648 ..., -1.02220696 -0.68438567
  -1.2039728 ]
 [ 0.20701417  0.28818195 -2.44368793 ..., -0.57323308 -0.6806259
  -1.2039728 ]]
[[ 4.59974101  4.41055265  1.09100577 ...,  1.24869624  1.68539507
   1.10517092]
 [ 4.4282302   4.29306101  1.09210825 ...,  1.10716202  1.64148283
   1.10517092]
 [ 4.17452258  4.0918615   1.09472151 ...,  1.30983346  1.62011964
   1.10517092]
 ..., 
 [ 3.74342138  3.91964073  1.09289485 ...,  2.2967613   1.65798001
   1.34985881]
 [ 3.26741777  3.74716667  1.08894572 ...,  1.43304278  1.65599163
   1.34985881]
 [ 3.42122954  3.79619785  1.09072215 ...,  1.75716199  1.659141
   1.34985881]]
[[ 

In [159]:
seedcentered = seeds - seeds.mean(axis=0)
print seedcentered
seedcov = seedcentered.T.dot(seedcentered) / n
print seedcov

[[  4.12476190e-01   2.80714286e-01   1.42857143e-06 ...,  -1.47920095e+00
   -1.88071429e-01  -1.00000000e+00]
 [  3.24761905e-02   1.07142857e-02   1.01014286e-02 ...,  -2.68220095e+00
   -4.52071429e-01  -1.00000000e+00]
 [ -5.57523810e-01  -4.69285714e-01   3.40014286e-02 ...,  -1.00120095e+00
   -5.83071429e-01  -1.00000000e+00]
 ..., 
 [ -1.64752381e+00  -8.99285714e-01   1.73014286e-02 ...,   4.61479905e+00
   -3.52071429e-01   1.00000000e+00]
 [ -3.00752381e+00  -1.34928571e+00  -1.88985714e-02 ...,  -1.02200952e-01
   -3.64071429e-01   1.00000000e+00]
 [ -2.54752381e+00  -1.21928571e+00  -2.59857143e-03 ...,   1.93679905e+00
   -3.45071429e-01   1.00000000e+00]]
[[  8.42603482e+00   3.76045061e+00   4.16234107e-02   1.21887175e+00
    1.06183083e+00  -9.99573198e-01   1.22925132e+00  -8.20190476e-01]
 [  3.76045061e+00   1.69740663e+00   1.62541799e-02   5.59986190e-01
    4.63845575e-01  -4.24733761e-01   5.69029908e-01  -3.48809524e-01]
 [  4.16234107e-02   1.62541799e-02   
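
The centering step relies on broadcasting: the vector of column means is subtracted from every row. Dividing by n gives the biased (population) covariance, which should agree with `np.cov` when it is called with `rowvar=False` and `bias=True`. A sketch on random stand-in data (not the seeds file):

```python
import numpy as np

rng = np.random.RandomState(0)
X = rng.rand(50, 3)  # random stand-in data: 50 observations, 3 variables
n = X.shape[0]

# Subtract each column's mean from every row (broadcasting)
centered = X - X.mean(axis=0)

# Covariance with division by n, as in the cell above
cov_manual = centered.T.dot(centered) / n

# rowvar=False: columns are variables; bias=True: divide by n instead of n-1
cov_numpy = np.cov(X, rowvar=False, bias=True)

print(np.allclose(cov_manual, cov_numpy))
```

Without `bias=True`, `np.cov` divides by n-1, so the two results would differ by a factor of n/(n-1).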

In [160]:
print seedcov.trace()
print np.linalg.eig(seedcov) #TDB

13.6183448104
(array([  1.08364768e+01,   2.31997067e+00,   3.95063452e-01,
         5.42424532e-02,   8.45141665e-03,   2.64188341e-03,
         1.47287013e-03,   2.52487858e-05]), array([[  8.79582631e-01,  -1.26116017e-01,   1.08610184e-03,
         -3.03317473e-01,  -1.38879757e-01,  -1.52061885e-01,
          2.74204862e-01,   2.87709147e-02],
       [  3.93152050e-01,  -6.88744128e-02,  -3.78014389e-02,
          3.68106006e-01,   5.50519006e-01,   5.58355464e-01,
         -2.89509008e-01,  -7.14796114e-02],
       [  4.34850558e-03,   3.19106495e-03,   1.10692713e-02,
         -6.22063083e-02,  -6.73806848e-02,  -3.86618542e-02,
         -3.25631128e-02,  -9.94426218e-01],
       [  1.27612344e-01,  -3.59319716e-02,  -5.20419510e-02,
          4.78550309e-01,   2.85063369e-01,  -7.95326618e-01,
         -1.91705751e-01,  -1.21890190e-02],
       [  1.10732226e-01,  -3.08242317e-03,   5.79051315e-02,
         -3.33897486e-01,  -2.50826296e-01,  -6.20307997e-02,
         -8.950558
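
Two things are worth checking in this output: the eigenvalues add up to the trace printed above, and each column of the second array is an eigenvector. Because a covariance matrix is symmetric, `np.linalg.eigh` is an alternative to `np.linalg.eig` that returns real eigenvalues in ascending order. A sketch on a small made-up symmetric matrix:

```python
import numpy as np

S = np.array([[2.0, 1.0],
              [1.0, 3.0]])  # hypothetical symmetric matrix

# eigh is intended for symmetric (Hermitian) matrices
vals, vecs = np.linalg.eigh(S)

# The sum of the eigenvalues equals the trace
print(np.isclose(vals.sum(), S.trace()))

# Column j of vecs satisfies S v = vals[j] * v;
# vecs * vals scales column j by vals[j]
print(np.allclose(S.dot(vecs), vecs * vals))
```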

Stacking arrays:

In [123]:
ratvec = seeds[:,4] / seeds[:,5]
print ratvec.shape

(210L,)

In [127]:
np.hstack((seeds,ratvec)) #error: seeds is 2-D but ratvec is 1-D

ValueError: all the input array dimensions except for the concatenation axis must match exactly

In [130]:
ratmat = ratvec.reshape((n,1))
np.hstack((seeds,ratmat))

array([[ 15.26      ,  14.84      ,   0.871     , ...,   5.22      ,
          1.        ,   1.49122017],
       [ 14.88      ,  14.57      ,   0.8811    , ...,   4.956     ,
          1.        ,   3.2740668 ],
       [ 14.29      ,  14.09      ,   0.905     , ...,   4.825     ,
          1.        ,   1.23638385],
       ..., 
       [ 13.2       ,  13.66      ,   0.8883    , ...,   5.056     ,
          3.        ,   0.38869513],
       [ 11.84      ,  13.21      ,   0.8521    , ...,   5.044     ,
          3.        ,   0.78821568],
       [ 12.3       ,  13.34      ,   0.8684    , ...,   5.063     ,
          3.        ,   0.5275856 ]])
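
Reshaping to an (n, 1) column is one way to make the stack work. Two other options that should give the same result are indexing with `np.newaxis` and calling `np.column_stack`, which treats 1-D inputs as columns. A sketch with made-up values:

```python
import numpy as np

M = np.arange(6.0).reshape((3, 2))  # hypothetical 3x2 matrix
v = np.array([10.0, 20.0, 30.0])    # 1-D vector with one entry per row of M

a = np.hstack((M, v.reshape((-1, 1))))  # -1 lets numpy infer the row count
b = np.hstack((M, v[:, np.newaxis]))    # add a trailing axis by indexing
c = np.column_stack((M, v))             # 1-D inputs become columns

print(a)
print(np.array_equal(a, b) and np.array_equal(a, c))
```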

In [133]:
everyten = seeds[0:n:10,:]
print everyten.shape
np.vstack((seeds,everyten))

(21L, 8L)


array([[ 15.26  ,  14.84  ,   0.871 , ...,   2.221 ,   5.22  ,   1.    ],
       [ 14.88  ,  14.57  ,   0.8811, ...,   1.018 ,   4.956 ,   1.    ],
       [ 14.29  ,  14.09  ,   0.905 , ...,   2.699 ,   4.825 ,   1.    ],
       ..., 
       [ 11.41  ,  12.95  ,   0.856 , ...,   4.957 ,   4.825 ,   3.    ],
       [ 10.93  ,  12.8   ,   0.839 , ...,   5.398 ,   5.045 ,   3.    ],
       [ 12.38  ,  13.44  ,   0.8609, ...,   5.472 ,   5.045 ,   3.    ]])

In [13]:
A = np.arange(20)
print(A)
B = A
B.shape = (10,2)
print(A)

[ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19]
[[ 0  1]
 [ 2  3]
 [ 4  5]
 [ 6  7]
 [ 8  9]
 [10 11]
 [12 13]
 [14 15]
 [16 17]
 [18 19]]


In [135]:
A = np.arange(20)
B = A.copy()
A.shape = (10,2)
B.shape

(20L,)
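
The last two cells show the difference between a second name for the same array and an independent copy: `B = A` does not copy anything, so reshaping `B` also reshapes `A`, whereas `A.copy()` is unaffected. A compact sketch of the three cases (alias, copy and view), using illustrative names:

```python
import numpy as np

A = np.arange(6)
B = A                  # alias: B and A are the same object
C = A.copy()           # copy: independent data
V = A.reshape((2, 3))  # view: new shape, same underlying data

B.shape = (3, 2)       # changes A as well, since B is A
print(A.shape)
print(B is A)

V[0, 0] = 99           # writing through the view changes A's data
print(A[0, 0])
print(C[0])            # the copy still holds the original value
print(np.shares_memory(A, V), np.shares_memory(A, C))
```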

In [162]:
onesmat = np.ones((4,4))
zerosmat = np.zeros((4,4))
print onesmat
print zerosmat
ident = np.eye(4)
print ident

[[ 1.  1.  1.  1.]
 [ 1.  1.  1.  1.]
 [ 1.  1.  1.  1.]
 [ 1.  1.  1.  1.]]
[[ 0.  0.  0.  0.]
 [ 0.  0.  0.  0.]
 [ 0.  0.  0.  0.]
 [ 0.  0.  0.  0.]]
[[ 1.  0.  0.  0.]
 [ 0.  1.  0.  0.]
 [ 0.  0.  1.  0.]
 [ 0.  0.  0.  1.]]
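
These special matrices behave as expected under `dot`: the identity leaves a matrix unchanged, the zeros matrix wipes it out, and a matrix of ones sums each column. A sketch with a made-up 4 by 4 matrix `M`:

```python
import numpy as np

M = np.arange(16.0).reshape((4, 4))  # hypothetical example matrix
ident = np.eye(4)
onesmat = np.ones((4, 4))
zerosmat = np.zeros((4, 4))

# Identity: ident.dot(M) is M
print(np.array_equal(ident.dot(M), M))

# Ones: every row of onesmat.dot(M) holds the column sums of M
print(np.array_equal(onesmat.dot(M)[0], M.sum(axis=0)))

# Zeros: the product is all zeros
print(np.array_equal(zerosmat.dot(M), zerosmat))
```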
