#Matrix and Covariance

The `mat_handler.py` module contains the `Matrix` class, which is the backbone of `pyemu`. The `Matrix` class overloads the common mathematical operators and uses an "auto-align" feature to line up `Matrix` objects by row and column name for multiplication, addition, etc.



In [1]:
from __future__ import print_function
import os
import numpy as np
from pyemu import Matrix, Cov

Here is the most basic instantiation of the `Matrix` class:

In [2]:
m = Matrix()

Here we generate a `Matrix` object from a random ndarray:

In [3]:
a = np.random.random((5, 5))
# build zero-padded row and column names: row_00 ... row_04, col_00 ... col_04
row_names = ["row_{0:02d}".format(i) for i in range(5)]
col_names = ["col_{0:02d}".format(i) for i in range(5)]
m = Matrix(x=a, row_names=row_names, col_names=col_names)
print(m)

row names: ['row_00', 'row_01', 'row_02', 'row_03', 'row_04']
col names: ['col_00', 'col_01', 'col_02', 'col_03', 'col_04']
[[ 0.66369095  0.72103958  0.01834535  0.86447659  0.06158706]
 [ 0.58838725  0.83583326  0.92311355  0.65205233  0.21823867]
 [ 0.36925743  0.83478583  0.92683265  0.69077832  0.02634063]
 [ 0.79415036  0.14491413  0.759354    0.31046653  0.92000216]
 [ 0.46282043  0.94008384  0.36848832  0.6562321   0.00725252]]


#File I/O with `Matrix`
`Matrix` supports several PEST-compatible I/O routines as well as some others:

In [4]:
ascii_name = "mat_test.mat"
m.to_ascii(ascii_name)
m2 = Matrix.from_ascii(ascii_name)
print(m2)

row names: ['row_00', 'row_01', 'row_02', 'row_03', 'row_04']
col names: ['col_00', 'col_01', 'col_02', 'col_03', 'col_04']
[[ 0.66369095  0.72103958  0.01834535  0.86447659  0.06158706]
 [ 0.58838725  0.83583326  0.92311355  0.65205233  0.21823867]
 [ 0.36925743  0.83478583  0.92683265  0.69077832  0.02634063]
 [ 0.79415036  0.14491413  0.759354    0.31046653  0.92000216]
 [ 0.46282043  0.94008384  0.36848832  0.6562321   0.00725252]]


In [5]:
bin_name = "mat_test.bin"
m.to_binary(bin_name)
m3 = Matrix.from_binary(bin_name)
print(m3)

row names: ['row_00', 'row_01', 'row_02', 'row_03', 'row_04']
col names: ['col_00', 'col_01', 'col_02', 'col_03', 'col_04']
[[ 0.66369095  0.72103958  0.01834535  0.86447659  0.06158706]
 [ 0.58838725  0.83583326  0.92311355  0.65205233  0.21823867]
 [ 0.36925743  0.83478583  0.92683265  0.69077832  0.02634063]
 [ 0.79415036  0.14491413  0.759354    0.31046653  0.92000216]
 [ 0.46282043  0.94008384  0.36848832  0.6562321   0.00725252]]


`Matrix` also implements `to_dataframe()` and `to_sparse()`, which return a `pandas` `DataFrame` and a `scipy.sparse` compressed sparse row (CSR) matrix, respectively:

In [6]:
print(type(m.to_dataframe()))
print(type(m.to_sparse()))
m.to_dataframe() #looks really nice in the notebook!

<class 'pandas.core.frame.DataFrame'>
<class 'scipy.sparse.csr.csr_matrix'>


Unnamed: 0,col_00,col_01,col_02,col_03,col_04
row_00,0.663691,0.72104,0.018345,0.864477,0.061587
row_01,0.588387,0.835833,0.923114,0.652052,0.218239
row_02,0.369257,0.834786,0.926833,0.690778,0.026341
row_03,0.79415,0.144914,0.759354,0.310467,0.920002
row_04,0.46282,0.940084,0.368488,0.656232,0.007253


#Convenience methods of `Matrix`

Several convenient features are implemented in `Matrix` and accessed through `@property`-decorated methods. For example, the SVD components of a `Matrix` object are accessed simply by name. The SVD routine is called on demand and the components are cast to `Matrix` objects, all without any extra work from the user:

In [7]:
print(m.s) #the singular values of m, cast into a Matrix object; the SVD is computed on demand
m.s.to_ascii("test_sv.mat") #save the singular values to a PEST-compatible ASCII file

row names: ['sing_val_1', 'sing_val_2', 'sing_val_3', 'sing_val_4', 'sing_val_5']
col names: ['sing_val_1', 'sing_val_2', 'sing_val_3', 'sing_val_4', 'sing_val_5']
[[ 2.88341807]
 [ 1.11456419]
 [ 0.70868147]
 [ 0.16253913]
 [ 0.04708595]]


In [8]:
m.v.to_ascii("test_v.mat") #the right singular vectors of m.
m.u.to_dataframe()# a data frame of the left singular vectors of m

Unnamed: 0,left_sing_vec_1,left_sing_vec_2,left_sing_vec_3,left_sing_vec_4,left_sing_vec_5
row_00,-0.392105,-0.361768,0.728094,0.423196,-0.078494
row_01,-0.528011,0.046383,-0.308041,-0.065625,-0.78731
row_02,-0.490273,-0.125866,-0.53903,0.457192,0.494179
row_03,-0.384254,0.853126,0.279164,-0.069094,0.204495
row_04,-0.423602,-0.35115,0.080645,-0.776402,0.296565
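The on-demand behaviour described above can be illustrated with a small sketch. The `LazySVD` class below is hypothetical and uses only `numpy`; it is not how `pyemu` implements `Matrix`, just one way a decomposition can be deferred until a component is first requested and then reused.

```python
import numpy as np


class LazySVD:
    """Hypothetical helper (not pyemu code): SVD components computed on first access."""

    def __init__(self, x):
        self.x = np.asarray(x, dtype=float)
        self._svd = None  # cache for the (u, s, vt) tuple

    def _ensure_svd(self):
        # Run the decomposition only once, the first time any component is requested.
        if self._svd is None:
            self._svd = np.linalg.svd(self.x, full_matrices=True)
        return self._svd

    @property
    def u(self):
        # left singular vectors, stored as columns
        return self._ensure_svd()[0]

    @property
    def s(self):
        # singular values as a 1-D array
        return self._ensure_svd()[1]

    @property
    def v(self):
        # numpy returns V transposed; transpose back so right singular vectors are columns
        return self._ensure_svd()[2].T


lazy = LazySVD(np.arange(6.0).reshape(3, 2))
# Rebuild the original array from its components: U[:, :k] * diag(s) * V^T
reconstructed = lazy.u[:, :2] @ np.diag(lazy.s) @ lazy.v.T
print(np.allclose(reconstructed, lazy.x))
```

Caching the result means that asking for `u`, `s`, and `v` in turn costs a single decomposition.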


The `Matrix` inverse operation is accessed the same way, but requires a square matrix:

In [9]:
m.inv.to_dataframe()

Unnamed: 0,col_00,col_01,col_02,col_03,col_04
row_00,1.634827,11.665531,-7.795345,-2.619598,-4.299779
row_01,-1.818211,-1.85026,-0.243465,0.534193,4.237218
row_02,-0.110676,2.348272,-0.089164,-0.532566,-1.841701
row_03,1.530958,-6.791201,5.842848,1.345398,-0.531936
row_04,-1.550087,-9.424745,4.86918,3.249609,4.743782


#Manipulating `Matrix` shape
`Matrix` has lots of functionality to support getting submatrices by row and column names:

In [10]:

print(m.get(row_names="row_00",col_names=["col_01","col_03"]))

row names: ['row_00']
col names: ['col_01', 'col_03']
[[ 0.72103958  0.86447659]]


`extract()` calls `get()` then `drop()`, so the submatrix is returned and then dropped from the object it was taken from:

In [11]:
from copy import deepcopy
m_copy = deepcopy(m)
sub_m = m_copy.extract(row_names="row_00",col_names=["col_01","col_03"])
m_copy.to_dataframe()
sub_m.to_dataframe()

Unnamed: 0,col_01,col_03
row_00,0.72104,0.864477


#Operator overloading
The operator overloading uses the auto-align functionality, together with the `isdiagonal` flag, to make linear algebra easy. The "inner join" of the two objects' names is found, and the rows and columns are aligned accordingly:

In [12]:
#a new matrix object that is not "aligned" with m
row_names = ["row_03","row_02","row_00"]
col_names = ["col_01","col_10","col_100"]
m_mix = Matrix(x=np.random.random((3,3)),row_names=row_names,col_names=col_names)
m_mix.to_dataframe()


Unnamed: 0,col_01,col_10,col_100
row_03,0.366383,0.894417,0.620734
row_02,0.331217,0.418396,0.622148
row_00,0.216129,0.26835,0.583555


In [13]:
m.to_dataframe()

Unnamed: 0,col_00,col_01,col_02,col_03,col_04
row_00,0.663691,0.72104,0.018345,0.864477,0.061587
row_01,0.588387,0.835833,0.923114,0.652052,0.218239
row_02,0.369257,0.834786,0.926833,0.690778,0.026341
row_03,0.79415,0.144914,0.759354,0.310467,0.920002
row_04,0.46282,0.940084,0.368488,0.656232,0.007253


In [14]:
prod = m * m_mix.T
prod.to_dataframe()

Unnamed: 0,row_03,row_02,row_00
row_00,0.264177,0.238821,0.155837
row_01,0.306235,0.276842,0.180648
row_02,0.305851,0.276495,0.180421
row_03,0.053094,0.047998,0.03132
row_04,0.344431,0.311372,0.203179


In [15]:
prod2 = m_mix.T * m
prod2.to_dataframe()

Unnamed: 0,col_00,col_01,col_02,col_03,col_04
col_01,0.55671,0.485427,0.589162,0.529385,0.359108
col_10,1.042899,0.672376,1.071886,0.798688,0.850414
col_100,1.109989,1.03008,1.05869,1.126953,0.623404
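To make the alignment in the two products above concrete, here is a minimal `numpy`-only sketch. The helper `aligned_dot` is hypothetical (it is not part of `pyemu`) and assumes that multiplication keeps only the names shared by the left operand's columns and the right operand's rows. The numbers are rounded values copied from the `m` and `m_mix` tables above, using just two rows of `m` for brevity.

```python
import numpy as np


def aligned_dot(a, a_cols, b, b_rows):
    """Multiply a by b using only the names shared by a's columns and b's rows."""
    common = [name for name in a_cols if name in b_rows]  # keep a's column order
    if not common:
        raise ValueError("no common names to align on")
    a_idx = [a_cols.index(name) for name in common]
    b_idx = [b_rows.index(name) for name in common]
    return a[:, a_idx] @ b[b_idx, :], common


# Rows row_00 and row_03 of m (rounded).
m_sub = np.array([[0.663691, 0.721040, 0.018345, 0.864477, 0.061587],
                  [0.794150, 0.144914, 0.759354, 0.310467, 0.920002]])
m_cols = ["col_00", "col_01", "col_02", "col_03", "col_04"]

# m_mix.T: its rows are m_mix's column names, its columns are row_03, row_02, row_00.
m_mix_t = np.array([[0.366383, 0.331217, 0.216129],
                    [0.894417, 0.418396, 0.268350],
                    [0.620734, 0.622148, 0.583555]])
m_mix_t_rows = ["col_01", "col_10", "col_100"]

product, shared = aligned_dot(m_sub, m_cols, m_mix_t, m_mix_t_rows)
print(shared)
print(product.round(6))
```

Only `col_01` is shared, so each row of the result is an outer product through that single column, which should agree with the `row_00` and `row_03` rows of `prod` to within rounding.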


In [16]:
(m_mix + m).to_dataframe()

Unnamed: 0,col_01
row_02,1.166003
row_03,0.511297
row_00,0.937168
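The addition result above keeps only the rows and columns the two operands have in common. A rough `numpy` sketch of that inner join follows; `aligned_add` is a hypothetical helper, not a `pyemu` function, and the order in which it returns the shared names is an assumption that may differ from the order `pyemu` produces.

```python
import numpy as np


def aligned_add(a, a_rows, a_cols, b, b_rows, b_cols):
    """Add the parts of a and b that share both a row name and a column name."""
    rows = [r for r in a_rows if r in b_rows]
    cols = [c for c in a_cols if c in b_cols]
    if not rows or not cols:
        raise ValueError("no common rows or columns to align on")
    # np.ix_ builds an open mesh so the row and column index lists select a sub-block
    a_sub = a[np.ix_([a_rows.index(r) for r in rows], [a_cols.index(c) for c in cols])]
    b_sub = b[np.ix_([b_rows.index(r) for r in rows], [b_cols.index(c) for c in cols])]
    return a_sub + b_sub, rows, cols


# m_mix, rounded from the table above.
m_mix_x = np.array([[0.366383, 0.894417, 0.620734],
                    [0.331217, 0.418396, 0.622148],
                    [0.216129, 0.268350, 0.583555]])
mix_rows = ["row_03", "row_02", "row_00"]
mix_cols = ["col_01", "col_10", "col_100"]

# A slice of m (rounded): three rows, first two columns.
m_part = np.array([[0.663691, 0.721040],
                   [0.369257, 0.834786],
                   [0.794150, 0.144914]])
part_rows = ["row_00", "row_02", "row_03"]
part_cols = ["col_00", "col_01"]

total, rows, cols = aligned_add(m_mix_x, mix_rows, mix_cols, m_part, part_rows, part_cols)
print(rows, cols)
print(total.round(6))
```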


#The `Cov` derived type
The `Cov` type is designed specifically to handle covariance matrices. It makes some assumptions, such as symmetry (and accordingly that `row_names == col_names`).

In [17]:
c = Cov(m.newx,m.row_names)

The `Cov` class supports several additional I/O routines, including the PEST uncertainty file (.unc):

In [18]:
c.to_uncfile("test.unc")

In [19]:
c1 = Cov.from_uncfile("test.unc")
print(c1)

row names: ['row_00', 'row_01', 'row_02', 'row_03', 'row_04']
col names: ['row_00', 'row_01', 'row_02', 'row_03', 'row_04']
[[ 0.66369095  0.72103958  0.01834535  0.86447659  0.06158706]
 [ 0.58838725  0.83583326  0.92311355  0.65205233  0.21823867]
 [ 0.36925743  0.83478583  0.92683265  0.69077832  0.02634063]
 [ 0.79415036  0.14491413  0.759354    0.31046653  0.92000216]
 [ 0.46282043  0.94008384  0.36848832  0.6562321   0.00725252]]


We can also build `Cov` objects implied by PEST control file parameter bounds or observation weights:

In [20]:
parcov = Cov.from_parbounds(os.path.join("henry","pest.pst"))
obscov = Cov.from_obsweights(os.path.join("henry","pest.pst"))

In [21]:
#to_dataframe for diagonal types builds a full matrix dataframe - can be costly
parcov.to_dataframe().head() 

Unnamed: 0,global_k,mult1,mult2,kr01c01,kr01c02,kr01c03,kr01c04,kr01c05,kr01c06,kr01c07,...,kr10c51,kr10c52,kr10c53,kr10c54,kr10c55,kr10c56,kr10c57,kr10c58,kr10c59,kr10c60
global_k,0.003076,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
mult1,0.0,0.003076,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
mult2,0.0,0.0,0.022655,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
kr01c01,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
kr01c02,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [22]:
# notice the zero-weight obs have been assigned a really large uncertainty
obscov.to_dataframe().head()

Unnamed: 0,h_obs01_1,h_obs01_2,h_obs02_1,h_obs02_2,h_obs03_1,h_obs03_2,h_obs04_1,h_obs04_2,h_obs05_1,h_obs05_2,...,c_obs12_2,c_obs13_1,c_obs13_2,c_obs14_1,c_obs14_2,c_obs15_1,c_obs15_2,pd_one,pd_ten,pd_half
h_obs01_1,4.3e-05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
h_obs01_2,0.0,1.0000000000000001e+60,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
h_obs02_1,0.0,0.0,4.3e-05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
h_obs02_2,0.0,0.0,0.0,1.0000000000000001e+60,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
h_obs03_1,0.0,0.0,0.0,0.0,4.3e-05,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
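As a rough illustration of how diagonal covariance entries can be implied by bounds and weights, the sketch below makes two assumptions that are not stated in this notebook: that parameter bounds span four standard deviations (in log10 space for log-transformed parameters), and that an observation weight is the inverse of its standard deviation. The function names, bounds, and weights are hypothetical and are not read from the `henry` control file; the `1.0e60` value for zero-weight observations is taken from the output above.

```python
import numpy as np

SIGMA_RANGE = 4.0         # assumption: upper - lower bound spans four standard deviations
ZERO_WEIGHT_VAR = 1.0e60  # large variance shown above for zero-weight observations


def variance_from_bounds(lower, upper, log_transformed=False):
    """Variance implied by parameter bounds under the SIGMA_RANGE assumption."""
    if log_transformed:
        lower, upper = np.log10(lower), np.log10(upper)
    return ((upper - lower) / SIGMA_RANGE) ** 2


def variance_from_weight(weight):
    """Variance implied by a weight, assuming weight = 1 / standard deviation."""
    return ZERO_WEIGHT_VAR if weight == 0.0 else (1.0 / weight) ** 2


# Hypothetical parameters: one log-transformed with bounds [0.1, 10], one untransformed in [-1, 1].
par_var = np.array([variance_from_bounds(0.1, 10.0, log_transformed=True),
                    variance_from_bounds(-1.0, 1.0)])
# Hypothetical observations: one with weight 2.0, one with zero weight.
obs_var = np.array([variance_from_weight(w) for w in (2.0, 0.0)])

# Diagonal covariance matrices, stored densely here only for display.
parcov_diag = np.diag(par_var)
obscov_diag = np.diag(obs_var)
print(parcov_diag)
print(obscov_diag)
```

Storing only the diagonal vector is much cheaper than the full matrix, which is why expanding a diagonal covariance to a dense dataframe can be costly.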
