#Matrix and Covariance

The `mat_handler.py` module contains the `Matrix` class, which is the backbone of `pyemu`. The `Matrix` class overloads the common mathematical operators and uses "auto-align" functionality to line up `Matrix` objects by row and column name for multiplication, addition, etc.



In [47]:
from __future__ import print_function
import os
import numpy as np
from pyemu import Matrix, Cov

Here is the most basic instantiation of the `Matrix` class:

In [48]:
m = Matrix()

Here we generate a `Matrix` object from a random ndarray, with explicit row and column names:

In [49]:
a = np.random.random((5, 5))
row_names = ["row_{0:02d}".format(i) for i in range(5)]
col_names = ["col_{0:02d}".format(i) for i in range(5)]
m = Matrix(x=a, row_names=row_names, col_names=col_names)
print(m)

row names: ['row_00', 'row_01', 'row_02', 'row_03', 'row_04']
col names: ['col_00', 'col_01', 'col_02', 'col_03', 'col_04']
[[ 0.25147415  0.5847474   0.83208816  0.04700621  0.61856291]
 [ 0.5354256   0.17844495  0.89679782  0.0147489   0.05251886]
 [ 0.13800263  0.70606961  0.61471933  0.55675189  0.94697765]
 [ 0.95557179  0.31426416  0.27775357  0.41331219  0.7056998 ]
 [ 0.80117973  0.30115119  0.46469282  0.92263795  0.86697007]]


#File I/O with `Matrix`
`Matrix` supports several PEST-compatible I/O routines as well as some others:

In [50]:
ascii_name = "mat_test.mat"
m.to_ascii(ascii_name)
m2 = Matrix.from_ascii(ascii_name)
print(m2)

row names: ['row_00', 'row_01', 'row_02', 'row_03', 'row_04']
col names: ['col_00', 'col_01', 'col_02', 'col_03', 'col_04']
[[ 0.25147415  0.5847474   0.83208816  0.04700621  0.61856291]
 [ 0.5354256   0.17844495  0.89679782  0.0147489   0.05251886]
 [ 0.13800263  0.70606961  0.61471933  0.55675189  0.94697765]
 [ 0.95557179  0.31426416  0.27775357  0.41331219  0.7056998 ]
 [ 0.80117973  0.30115119  0.46469282  0.92263795  0.86697007]]


In [51]:
bin_name = "mat_test.bin"
m.to_binary(bin_name)
m3 = Matrix.from_binary(bin_name)
print(m3)

row names: [u'row_00', u'row_01', u'row_02', u'row_03', u'row_04']
col names: [u'col_00', u'col_01', u'col_02', u'col_03', u'col_04']
[[ 0.25147415  0.5847474   0.83208816  0.04700621  0.61856291]
 [ 0.5354256   0.17844495  0.89679782  0.0147489   0.05251886]
 [ 0.13800263  0.70606961  0.61471933  0.55675189  0.94697765]
 [ 0.95557179  0.31426416  0.27775357  0.41331219  0.7056998 ]
 [ 0.80117973  0.30115119  0.46469282  0.92263795  0.86697007]]


`Matrix` also implements `to_dataframe()` and `to_sparse()`, which return a `pandas` `DataFrame` and a `scipy.sparse` compressed sparse row matrix, respectively:

In [52]:
print(type(m.to_dataframe()))
print(type(m.to_sparse()))
m.to_dataframe() #looks really nice in the notebook!

<class 'pandas.core.frame.DataFrame'>
<class 'scipy.sparse.csr.csr_matrix'>


,col_00,col_01,col_02,col_03,col_04
row_00,0.251474,0.584747,0.832088,0.047006,0.618563
row_01,0.535426,0.178445,0.896798,0.014749,0.052519
row_02,0.138003,0.70607,0.614719,0.556752,0.946978
row_03,0.955572,0.314264,0.277754,0.413312,0.7057
row_04,0.80118,0.301151,0.464693,0.922638,0.86697


#Convenience methods of `Matrix`

Several useful things are implemented in `Matrix` and accessed through `@property`-decorated methods. For example, the SVD components of a `Matrix` object are accessed simply by name. The SVD routine is called on demand and the components are cast to `Matrix` objects, without any extra steps from the user:

In [53]:
print(m.s)  # the singular values of m, cast to a Matrix object; the SVD is computed on demand
m.s.to_ascii("test_sv.mat") #save the singular values to a PEST-compatible ASCII file

row names: ['sing_val_1', 'sing_val_2', 'sing_val_3', 'sing_val_4', 'sing_val_5']
col names: ['sing_val_1', 'sing_val_2', 'sing_val_3', 'sing_val_4', 'sing_val_5']
[[ 2.70270211]
 [ 0.96251141]
 [ 0.81169598]
 [ 0.35718263]
 [ 0.04625848]]


In [54]:
m.v.to_ascii("test_v.mat") #the right singular vectors of m.
m.u.to_dataframe()# a data frame of the left singular vectors of m

,left_sing_vec_1,left_sing_vec_2,left_sing_vec_3,left_sing_vec_4,left_sing_vec_5
row_00,-0.398177,0.562895,-0.151057,-0.341352,0.620697
row_01,-0.280207,0.540946,0.634938,0.346112,-0.325456
row_02,-0.496273,0.128383,-0.658943,0.096846,-0.541891
row_03,-0.450653,-0.369583,0.365356,-0.684497,-0.241452
row_04,-0.559968,-0.487292,0.079648,0.534574,0.396067
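
As a rough illustration of what the on-demand SVD provides, the sketch below runs `numpy.linalg.svd` on a stand-in array and rebuilds the array from its factors. It does not use `pyemu`: the array `a` is hypothetical, and how `m.u`, `m.s` and `m.v` map onto numpy's `u`, `s` and `vt` (in particular whether `m.v` is stored transposed) is an assumption to check against the `pyemu` source.

```python
import numpy as np

# Hypothetical stand-in for the array held by a Matrix object.
a = np.random.RandomState(0).random_sample((5, 5))

# Thin SVD: a == u @ diag(s) @ vt, with s in descending order.
u, s, vt = np.linalg.svd(a, full_matrices=False)

# Rebuild the array from its factors to confirm the decomposition.
a_rebuilt = u.dot(np.diag(s)).dot(vt)
print(np.allclose(a, a_rebuilt))
```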


The `Matrix` inverse operation is accessed the same way, but requires a square matrix:

In [55]:
m.inv.to_dataframe()

,col_00,col_01,col_02,col_03,col_04
row_00,-1.797852,1.135291,0.939991,1.860548,-1.327241
row_01,-9.356031,4.781424,9.003453,4.327946,-6.971545
row_02,2.530339,-0.2685,-2.031426,-1.834315,1.922925
row_03,-4.283705,2.078789,3.292263,0.142379,-0.781584
row_04,8.113845,-4.778369,-6.410924,-2.391052,4.602693
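
A quick way to sanity-check any inverse is to multiply it back against the original. The sketch below does this with plain numpy on a hypothetical 2 x 2 array, and also shows the shortcut available for diagonal matrices stored as a vector. Whether `pyemu` takes that shortcut when `isdiagonal` is set is an assumption here, not something this notebook confirms.

```python
import numpy as np

# Hypothetical square, non-singular array.
a = np.array([[4.0, 1.0],
              [2.0, 3.0]])
a_inv = np.linalg.inv(a)

# A matrix times its inverse should give the identity (to rounding error).
print(np.allclose(a.dot(a_inv), np.eye(2)))

# For a diagonal matrix stored as a vector of its diagonal entries,
# the inverse is just the element-wise reciprocal.
d = np.array([2.0, 4.0, 8.0])
d_inv = 1.0 / d
print(np.allclose(np.diag(d).dot(np.diag(d_inv)), np.eye(3)))
```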


#Manipulating `Matrix` shape
`Matrix` supports retrieving submatrices by row and column names:

In [56]:

print(m.get(row_names="row_00",col_names=["col_01","col_03"]))

row names: ['row_00']
col names: ['col_01', 'col_03']
[[ 0.5847474   0.04700621]]


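`extract()` calls `get()` then `drop()`, so the requested rows and columns are returned as a new `Matrix` and dropped from the original (hence the working copy below):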

In [57]:
from copy import deepcopy
m_copy = deepcopy(m)
sub_m = m_copy.extract(row_names="row_00",col_names=["col_01","col_03"])
m_copy.to_dataframe()
sub_m.to_dataframe()

,col_01,col_03
row_00,0.584747,0.047006
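
The get-then-drop idea can be sketched without `pyemu` using nested lists and name lookups. The helper names `get_sub` and `drop` below are hypothetical, and the assumption that dropping removes every listed row and every listed column should be checked against the actual `Matrix.drop()` behaviour.

```python
def get_sub(x, row_names, col_names, want_rows, want_cols):
    """Return the block of x selected by name, in the requested order."""
    r_idx = [row_names.index(n) for n in want_rows]
    c_idx = [col_names.index(n) for n in want_cols]
    return [[x[r][c] for c in c_idx] for r in r_idx]


def drop(x, row_names, col_names, drop_rows, drop_cols):
    """Return x without the named rows and columns, plus the names kept."""
    keep_rows = [n for n in row_names if n not in drop_rows]
    keep_cols = [n for n in col_names if n not in drop_cols]
    kept = get_sub(x, row_names, col_names, keep_rows, keep_cols)
    return kept, keep_rows, keep_cols


# Hypothetical 3 x 3 labelled array.
rows = ["row_00", "row_01", "row_02"]
cols = ["col_00", "col_01", "col_02"]
x = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0],
     [7.0, 8.0, 9.0]]

# "extract": first get the block, then drop its rows and columns.
sub = get_sub(x, rows, cols, ["row_00"], ["col_01", "col_02"])
rest, rest_rows, rest_cols = drop(x, rows, cols, ["row_00"], ["col_01", "col_02"])
print(sub)
print(rest_rows, rest_cols, rest)
```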


#Operator overloading
The operator overloading uses the auto-align functionality as well as the `isdiagonal` flag to keep linear algebra simple. The "inner join" of the two objects' names is found, and the rows and columns are aligned accordingly before the operation is carried out:

In [58]:
#a new matrix object that is not "aligned" with m
row_names = ["row_03","row_02","row_00"]
col_names = ["col_01","col_10","col_100"]
m_mix = Matrix(x=np.random.random((3,3)),row_names=row_names,col_names=col_names)
m_mix.to_dataframe()


,col_01,col_10,col_100
row_03,0.457715,0.915726,0.724446
row_02,0.719642,0.362283,0.736331
row_00,0.539345,0.729431,0.97527


In [59]:
m.to_dataframe()

,col_00,col_01,col_02,col_03,col_04
row_00,0.251474,0.584747,0.832088,0.047006,0.618563
row_01,0.535426,0.178445,0.896798,0.014749,0.052519
row_02,0.138003,0.70607,0.614719,0.556752,0.946978
row_03,0.955572,0.314264,0.277754,0.413312,0.7057
row_04,0.80118,0.301151,0.464693,0.922638,0.86697


In [60]:
prod = m * m_mix.T
prod.to_dataframe()

,row_03,row_02,row_00
row_00,0.267648,0.420809,0.315381
row_01,0.081677,0.128416,0.096243
row_02,0.323179,0.508117,0.380815
row_03,0.143844,0.226158,0.169497
row_04,0.137842,0.216721,0.162424
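
The product above is 5 x 3 because only `col_01` is shared between the columns of `m` and the rows of `m_mix.T`, so each entry is a single product (for example, 0.584747 x 0.457715 ≈ 0.267648). A minimal pure-Python sketch of multiplication aligned on shared names follows. The function `aligned_matmul` is hypothetical and is not how `pyemu` implements it; in particular, the ordering of the shared names is an assumption.

```python
def aligned_matmul(a, a_rows, a_cols, b, b_rows, b_cols):
    """Multiply a by b using only the names shared by a's columns and b's rows."""
    common = [n for n in a_cols if n in b_rows]
    if not common:
        raise ValueError("no common names between a's columns and b's rows")
    a_idx = [a_cols.index(n) for n in common]
    b_idx = [b_rows.index(n) for n in common]
    out = [[sum(a[r][i] * b[j][c] for i, j in zip(a_idx, b_idx))
            for c in range(len(b_cols))]
           for r in range(len(a_rows))]
    return out, a_rows, b_cols


# Hypothetical operands: only "c1" appears in both a's columns and b's rows.
a = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0]]
b = [[10.0, 20.0],
     [30.0, 40.0]]
prod, prod_rows, prod_cols = aligned_matmul(
    a, ["r0", "r1"], ["c0", "c1", "c2"],
    b, ["c1", "c9"], ["k0", "k1"])
print(prod_rows, prod_cols, prod)
```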


In [61]:
prod2 = m_mix.T * m
prod2.to_dataframe()

,col_00,col_01,col_02,col_03,col_04
col_01,0.672324,0.967341,1.018292,0.615194,1.338113
col_10,1.108471,0.97011,1.083999,0.61447,1.4405
col_100,1.039131,1.317855,1.465365,0.75522,1.811796


In [62]:
(m_mix + m).to_dataframe()

,col_01
row_03,0.77198
row_02,1.425712
row_00,1.124092
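
For addition, the result above keeps only the rows and columns that both operands share (three rows and `col_01`), in the left operand's order. A pure-Python sketch of that behaviour is given below; `aligned_add` is a hypothetical helper written for illustration, not the `pyemu` implementation.

```python
def aligned_add(a, a_rows, a_cols, b, b_rows, b_cols):
    """Add the entries of a and b whose row and column names appear in both."""
    rows = [n for n in a_rows if n in b_rows]   # keep the left operand's order
    cols = [n for n in a_cols if n in b_cols]
    out = [[a[a_rows.index(r)][a_cols.index(c)] +
            b[b_rows.index(r)][b_cols.index(c)]
            for c in cols]
           for r in rows]
    return out, rows, cols


# Hypothetical operands sharing rows "r2", "r0" and column "c1".
a = [[1.0, 2.0],
     [3.0, 4.0]]
b = [[10.0, 11.0],
     [20.0, 21.0],
     [30.0, 31.0]]
total, total_rows, total_cols = aligned_add(
    a, ["r2", "r0"], ["c1", "c9"],
    b, ["r0", "r1", "r2"], ["c0", "c1"])
print(total_rows, total_cols, total)
```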


#The `Cov` derived type
The `Cov` type is designed specifically to handle covariance matrices. It makes some assumptions, such as symmetry (and, accordingly, that `row_names == col_names`).

In [63]:
c = Cov(m.newx,m.row_names)
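
Since `Cov` assumes symmetry, the random array used here is only a placeholder: it is not symmetric and so is not a valid covariance matrix. As a sketch of what a well-formed input looks like, the following uses plain numpy (not `pyemu`) to estimate a covariance matrix from hypothetical samples and checks for symmetry and a positive diagonal. The sample data and the names are made up for illustration.

```python
import numpy as np

rng = np.random.RandomState(42)
samples = rng.standard_normal((200, 3))   # 200 draws of 3 hypothetical parameters
names = ["par_a", "par_b", "par_c"]       # one name per row and per column

# Sample covariance: each column of `samples` is treated as one variable.
cov = np.cov(samples, rowvar=False)

print(cov.shape)
print(np.allclose(cov, cov.T))            # symmetry check
print(np.all(np.diag(cov) > 0))           # variances on the diagonal are positive
```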

The `Cov` class supports several additional I/O routines, including the PEST uncertainty file (.unc):

In [64]:
c.to_uncfile("test.unc")

In [65]:
c1 = Cov.from_uncfile("test.unc")
print(c1)

row names: ['row_00', 'row_01', 'row_02', 'row_03', 'row_04']
col names: ['row_00', 'row_01', 'row_02', 'row_03', 'row_04']
[[ 0.25147415  0.5847474   0.83208816  0.04700621  0.61856291]
 [ 0.5354256   0.17844495  0.89679782  0.0147489   0.05251886]
 [ 0.13800263  0.70606961  0.61471933  0.55675189  0.94697765]
 [ 0.95557179  0.31426416  0.27775357  0.41331219  0.7056998 ]
 [ 0.80117973  0.30115119  0.46469282  0.92263795  0.86697007]]


We can also build `Cov` objects implied by PEST control file parameter bounds or observation weights:

In [66]:
parcov = Cov.from_parbounds(os.path.join("henry","pest.pst"))
obscov = Cov.from_obsweights(os.path.join("henry","pest.pst"))

In [67]:
#to_dataframe for diagonal types builds a full matrix dataframe - can be costly
parcov.to_dataframe().head() 

,global_k,mult1,mult2,kr01c01,kr01c02,kr01c03,kr01c04,kr01c05,kr01c06,kr01c07,...,kr10c51,kr10c52,kr10c53,kr10c54,kr10c55,kr10c56,kr10c57,kr10c58,kr10c59,kr10c60
global_k,0.003076,0.0,0.0,0.0,0.0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
mult1,0.0,0.003076,0.0,0.0,0.0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
mult2,0.0,0.0,0.022655,0.0,0.0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
kr01c01,0.0,0.0,0.0,0.25,0.0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
kr01c02,0.0,0.0,0.0,0.0,0.25,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
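
To give a sense of how bounds can imply a variance, the sketch below treats the distance between a parameter's bounds as spanning a fixed number of standard deviations (four here) and works in log10 space for log-transformed parameters. The function name, the factor of four and the log handling are all assumptions made for illustration, not a statement of what `Cov.from_parbounds()` does internally.

```python
import math


def variance_from_bounds(lower, upper, log_transform=True, sigma_range=4.0):
    """Variance implied by bounds assumed to span `sigma_range` standard deviations."""
    if log_transform:
        # Log-transformed parameters are handled in log10 space.
        lower, upper = math.log10(lower), math.log10(upper)
    std = (upper - lower) / sigma_range
    return std ** 2


# Two orders of magnitude between the bounds of a log-transformed parameter.
print(variance_from_bounds(0.1, 10.0))
# An untransformed parameter bounded by 0 and 8.
print(variance_from_bounds(0.0, 8.0, log_transform=False))
```

Under these assumptions, bounds of 0.1 and 10 on a log-transformed parameter give a variance of about 0.25.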


In [68]:
# notice the zero-weight obs have been assigned a really large uncertainty
obscov.to_dataframe().head()

,h_obs01_1,h_obs01_2,h_obs02_1,h_obs02_2,h_obs03_1,h_obs03_2,h_obs04_1,h_obs04_2,h_obs05_1,h_obs05_2,...,c_obs12_2,c_obs13_1,c_obs13_2,c_obs14_1,c_obs14_2,c_obs15_1,c_obs15_2,pd_one,pd_ten,pd_half
h_obs01_1,4.3e-05,0.0,0.0,0.0,0.0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
h_obs01_2,0.0,1.0000000000000001e+60,0.0,0.0,0.0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
h_obs02_1,0.0,0.0,4.3e-05,0.0,0.0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
h_obs02_2,0.0,0.0,0.0,1.0000000000000001e+60,0.0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
h_obs03_1,0.0,0.0,0.0,0.0,4.3e-05,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
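
The relationship between weights and the diagonal above can be sketched as follows, assuming the common convention that a weight is the inverse of the observation's standard deviation, so the variance is `(1 / weight) ** 2`. A zero weight has no finite variance under that convention; the value of 1e+60 used below is taken from the output above. The function name and the example weights are hypothetical.

```python
def variance_from_weight(weight, zero_weight_variance=1.0e60):
    """Variance implied by a weight, assuming weight = 1 / standard deviation."""
    if weight == 0.0:
        # Zero-weight observations get a very large stand-in variance.
        return zero_weight_variance
    return (1.0 / weight) ** 2


# Hypothetical observation weights.
weights = {"obs_a": 2.0, "obs_b": 0.0, "obs_c": 0.5}
variances = {name: variance_from_weight(w) for name, w in weights.items()}
print(variances)
```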
