
Fitting cross-correlators #11

Closed
evanberkowitz opened this issue Mar 15, 2022 · 5 comments

@evanberkowitz

I would like to fit a 2x2 cross-correlation matrix of data given by e.g.

    C_{ij}(\tau) = \sum_n z^{(n)}_i z^{(n)}_j \exp(-dE_n \tau)

where tau = a*(0, 1, ..., nt-1), the sum over n is in principle infinite, and the z^{(n)}_i are overlap factors labeled by the state n in the sum and the operator indices i, j, which take (in this example) 2 values. Because i and j run over the same set of operators, C is a Hermitian matrix at every tau. The parameters to fit are the zs and dEs.

(Of course, I'd also like to allow for complex z, but that's perhaps a question for a different time / repo. I can split the real and imaginary parts of my data for C into another matrix dimension, and write z=zr+zi*1j and work out the real and imaginary parts of the above spectral decomposition by hand. But this is an unneeded complication, and in fact also encounters the issue described below.)

I would like to treat my data as error-free and to use marginalization. In the script below I make up synthetic data for values of dE and z and try to marginalize in order to fit.

However, calling lsqfit.nonlinear_fit (in the try: block near the end of the script) fails in lsqfit internals.

This is (in some sense) a simultaneous fit of the same parameters to different data, which is supported in a more brute-force way. Is this sort of fitting supported? Is there something simple I can do to avoid this issue?

Does the problem arise from having

  • matrix-valued data?
  • matrix-valued fit parameters (in this case, z)?
  • something else?
#!/usr/bin/env python

import IPython

import numpy as np
import gvar as gv
import lsqfit as lsq

import matplotlib.pyplot as plt

beta = 3.0
nt = 16
tau = np.arange(nt) * beta / nt

states = 5
marginalized = 5
doublet = 2

def fcn(x, p):
    tau = x
    dE = p['dE']    #   n states in the sum
    z  = p['z']     #   n states in the sum, 2 different operators with compatible quantum numbers

    # shape (states, nt, 2, 2): one 2x2 matrix per state per time slice
    correlator = np.array([[[[
        left * right * np.exp(-dE_n * t)
        for left  in z[n]]
        for right in z[n]]
        for t in tau]
        for (n, dE_n) in enumerate(dE)]
        )

    # sum over states --> shape (nt, 2, 2)
    return np.sum(correlator, axis=0)

# Just some made up numbers that we'd like to pull out by fitting.
true_values = {
        'dE' : np.power(np.arange(states), 2),
        'z'  : np.outer(np.power(1/(1+np.arange(states)),0.5), [1,0.5])
}

# Here is our exact data.
data = fcn(tau, true_values)

print(f'{data.shape=}')

prior_guesses = {
        'dE' : np.array([gv.gvar(f'{np.power(n,2)}({np.power(n/2,2)})') for n in range(states + marginalized)]),
        'z'  : np.array([
                    [gv.gvar('0.5(0.5)')/np.power(1.0+n,0.5) for i in range(doublet)]
                    for n in range(states + marginalized)
                    ]),
}

# Separate the states into included and marginalized
ymod_prior = { k : prior_guesses[k][states:] for k in prior_guesses }
yfit_prior = { k : prior_guesses[k][:states] for k in prior_guesses }

# Subtract the marginalization from the data
ymod = data - fcn(tau, ymod_prior)
print(f'{ymod.shape=}')

# plot it to see what it looks like
if False:
    rows, cols = data.shape[1:]
    fig, axs = plt.subplots(rows, cols, sharex=True, sharey=True)

    for r in range(rows):
        for c in range(cols):
            axs[r][c].plot(tau, data[:,r,c])
            axs[r][c].set_yscale('log')

    plt.show()

# Now try to fit!
try:
    # this fails
    fit = lsq.nonlinear_fit(
            data=(tau, ymod), prior=yfit_prior, fcn=fcn, p0=None, tol=1e-6
            )
    # with error
    # /path/to/python/site-packages/lsqfit/__init__.py:1891: RuntimeWarning: divide by zero encountered in reciprocal
    #   ans, inv_wgts = _gvar.regulate(
    # /path/to/python/site-packages/lsqfit/_scipy.py:147: RuntimeWarning: invalid value encountered in multiply
    #   return numpy.asarray(f(x), float)
except:
    # so open an interpreter for investigation
    IPython.embed()
@Gattocrucco

If I may... printing your ymod array, it appears that many of the errors are practically zero:

array([[[2.12(15), 0.98(10)],
        [0.98(10), 0.41(15)]],
    ...
       [[1.03003166971607895696649848105153(21),
         0.51501583485803947848324924052577(21)],
        [0.51501583485803947848324924052577(21),
         0.25750791742901973924162462026288(21)]]], dtype=object)

The reason is that ymod contains 64 numbers in total, while ymod_prior contains just 15, so the covariance matrix of ymod has rank at most 15. If that is indeed the problem, you could solve it by marginalizing more than 64 parameters.
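This rank argument can be checked with plain numpy, independent of gvar (made-up numbers, just a sketch): a covariance propagated linearly from 15 independent sources can have rank at most 15, so a 64x64 covariance built that way necessarily has zero modes.

```python
import numpy as np

rng = np.random.default_rng(0)

n_out, n_src = 64, 15                       # 64 ymod entries, 15 marginalized parameters
J = rng.normal(size=(n_out, n_src))         # stand-in Jacobian of ymod w.r.t. the priors
S = np.diag(rng.uniform(0.1, 1.0, n_src))   # prior variances of the sources (diagonal)

cov = J @ S @ J.T                           # linearly propagated covariance of ymod

print(np.linalg.matrix_rank(cov))           # at most 15, far below 64
print(np.linalg.eigvalsh(cov).min())        # ~0: zero modes, so not positive definite
```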

@evanberkowitz
Author

I changed states=2, marginalized=40 (so ymod_prior has 120 numbers) and changed prior_guesses to

prior_guesses = {
        'dE' : np.array([gv.gvar(f'{n}({n/2})') for n in range(states + marginalized)]),
        'z'  : np.array([
                    [gv.gvar('0.5(0.5)')/np.power(1.0+n,0.5) for i in range(doublet)]
                    for n in range(states + marginalized)
                    ]),
}

so that the errors on ymod don't get so absurdly small:

array([[[0.79(30), 0.04(22)],
        [0.04(22), -0.33(30)]],
        ...
       [[1.0297(10), 0.51470(95)],
        [0.51470(95), 0.2572(10)]]], dtype=object)

But the issue nevertheless persists. It's past CET bedtime, but I hope we can discuss further!

@gplepage
Owner

I think the problem is that the covariance matrix for ymod is not positive definite, probably for the reason pointed out above. As a quick check I added the following line just before the fit (i.e., before the try:):

ymod = gv.gvar(gv.mean(ymod), gv.evalcov(ymod), verify=True)

It complains that the covariance matrix isn't positive definite:

...

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/gpl/tmp/berk.py", line 57, in <module>
    ymod = gv.gvar(gv.mean(ymod), gv.evalcov(ymod), verify=True)
  File "_gvarcore.pyx", line 924, in gvar._gvarcore.GVarFactory.__call__
ValueError: covariance matrix not positive definite

This is true even with the different values for states and marginalized.

Once you solve that problem you may need to come up with some sort of reasonable starting point p0 just to check that the fitter is working. Fit functions with exponentials can easily go crazy if you are far away from the solution, especially with very accurate data.
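To see concretely how exponentials can go crazy far from the solution, here is a two-line numpy check (illustrative numbers, not from the script): if the search wanders to a negative dE, the model overflows within a few time slices.

```python
import numpy as np

tau = np.arange(16) * 3.0 / 16     # the same tau grid as in the script above
dE_bad = -500.0                    # a stray negative energy during the search

with np.errstate(over='ignore'):
    model = np.exp(-dE_bad * tau)  # exp(+500*tau) exceeds float64 range

print(np.isinf(model).any())       # True: the fit function has blown up
```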

I am traveling through the end of next week and so you won't be hearing from me again until then.

@gplepage
Owner

As I indicated in my previous note, the covariance matrix for ymod is not positive definite in your code, so lsqfit will not work with your data. The covariance matrix probably has zero modes, but you will have to check this yourself.

To make sure lsqfit was working, I made three modifications to your code:

  1. I added a small uncorrelated error to each ymod to guarantee that there are no zero modes:
ymod.flat[:] *= gv.gvar(ymod.size * ['1.0000(1)'])
  2. I made true_values, prior_guesses, ymod_prior, and yfit_prior into gvar.BufferDict dictionaries so I could replace dE by log(dE). You do not want to fit a sum of exponentials without guaranteeing that all the energies are positive (otherwise you will likely get overflows). Using log-normal distributions for dE is one way to accomplish this. I also added 0.5 to the energies to avoid getting a prior of 0(0) for the ground state:
true_values = gv.BufferDict({
        'log(dE)' : gv.log(np.power(np.arange(states), 2) + 0.5),
        'z'  : np.outer(np.power(1/(1+np.arange(states)),0.5), [1,0.5])
})

...

prior_guesses = gv.BufferDict({
        'log(dE)' : gv.log(np.array([gv.gvar(f'{np.power(n,2)}({np.power(n/2,2)})') + 0.5 for n in range(states + marginalized)])),
        'z'  : np.array([
                    [gv.gvar('0.5(0.5)')/np.power(1.0+n,0.5) for i in range(doublet)]
                    for n in range(states + marginalized)
                    ]),
})

...

ymod_prior = gv.BufferDict({ k : prior_guesses[k][states:] for k in prior_guesses })
yfit_prior = gv.BufferDict({ k : prior_guesses[k][:states] for k in prior_guesses })
  3. I set the mean values of the prior closer to the correct values for the 2 states. I did this so the fitter would start searching at a reasonable place. With exponentials in the fit function, you are very likely to get overflows if you don't start at reasonable values. This is why in the corrfitter examples I generally start with a 1-exponential fit, followed by a 2-exponential fit, etc., where each fit uses the output of the previous fit to set the starting point p0. This is particularly important if the errors on the data are very small, as they are in your case. Here is what I changed:
prior_guesses['log(dE)'][:2] = gv.log(gv.gvar(['0.6(6)', '1.4(1.4)']))
prior_guesses['z'][0] = gv.gvar(['1(1)', '0.6(5)'])
prior_guesses['z'][1] = gv.gvar(['0.8(7)', '0.5(4)'])
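A simplified numpy sketch of what modification 1 does (illustrative numbers; the real code multiplies each entry by gv.gvar('1.0000(1)') rather than adding a fixed diagonal): a tiny uncorrelated variance on every entry lifts the zero modes, turning a singular covariance into a positive-definite one.

```python
import numpy as np

rng = np.random.default_rng(1)

# a rank-deficient covariance like ymod's: 64 values propagated from 15 sources
J = rng.normal(size=(64, 15))
cov = J @ J.T

print(np.linalg.eigvalsh(cov).min())      # ~0 up to roundoff: not positive definite

# add a small uncorrelated variance to every entry (additive stand-in for the
# multiplicative gv.gvar('1.0000(1)') trick)
eps = 1e-4
cov_reg = cov + eps**2 * np.eye(64)

print(np.linalg.eigvalsh(cov_reg).min())  # strictly positive: positive definite
```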

With these modifications I get the following fit with no error messages along the way:

Least Square Fit:
  chi2/dof [dof] = 0.24 [64]    Q = 1    logGBF = 505.81

Parameters:
      log(dE) 0   -0.69315 (77)      [ -0.51 (83) ]  
              1     0.4054 (25)      [  0.3 (1.1) ]  
          z 0,0    1.00000 (71)      [  1.0 (1.0) ]  
            0,1    0.50000 (35)      [  0.60 (50) ]  
            1,0     0.7071 (12)      [  0.80 (70) ]  
            1,1    0.35353 (83)      [  0.50 (40) ]  
---------------------------------------------------
           dE 0    0.50000 (38)      [  0.60 (50) ]  
              1     1.4999 (38)      [  1.4 (1.5) ]  

Settings:
  svdcut/n = 1e-12/0    tol = (1e-06*,1e-10,1e-10)    (itns/time = 13/0.0)

The covariance matrix for your data doesn't look very realistic to me. If you want to play with matrix fits you might use the data that comes with the etab.py example in the corrfitter distribution (look in the examples/ directory). It comes from a lattice simulation. The example is for a 4x4 fit, but you can easily change it to a 2x2 fit if you want by keeping just the 'l' and 'g' sources. The example code uses GEVP-related techniques, but you can easily fit the 2x2 matrix using a fit like yours.

@gplepage
Owner

gplepage commented Nov 2, 2022

I am doing housecleaning and am closing this thread because I believe the problem is solved by modifications to your code (see above).

@gplepage closed this as completed Nov 2, 2022