# Appendix

This notebook collects information that is only tangentially relevant to the
assignment but may be useful anyway.

## Runtimes

### TFIM simulations

I was able to run simulations up to system size $L=20$ before my sparse matrix
builder crashed the kernel while converting lists of matrix elements in COO
format to the CSR representation.
As Brenden suggested, it would be faster and less problematic to do this in Fortran
and save the sparse matrix in an intermediate step as an HDF5 dataset before
loading it into numpy, but then if all we cared about was performance we would
just write everything in Fortran.
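For reference, here is a minimal sketch of the COO-to-CSR step that stays in Python but stores the matrix elements in NumPy arrays rather than Python lists. It is not the builder from `ph121c_lxvm`: the function name `tfim_csr` is hypothetical, and the Hamiltonian convention $H = -\sum_i \sigma^z_i \sigma^z_{i+1} - h \sum_i \sigma^x_i$ in the $\sigma^z$ basis is an assumption.

```python
import numpy as np
from scipy.sparse import coo_matrix

def tfim_csr(L, h, bc='o'):
    """Hypothetical builder; assumes H = -sum sz_i sz_{i+1} - h sum sx_i."""
    dim = 2 ** L
    states = np.arange(dim, dtype=np.int64)
    # Open chains have L - 1 bonds, closed chains have L (intended for L > 2)
    bonds = L if bc == 'c' else L - 1
    diag = np.zeros(dim)
    for i in range(bonds):
        j = (i + 1) % L
        # Equal bits mean aligned spins, which contribute -1 to the energy
        aligned = ((states >> i) & 1) == ((states >> j) & 1)
        diag -= np.where(aligned, 1.0, -1.0)
    rows, cols, vals = [states], [states], [diag]
    for i in range(L):
        # sigma^x on site i flips bit i of the basis state
        rows.append(states)
        cols.append(states ^ (1 << i))
        vals.append(np.full(dim, -h))
    coo = coo_matrix(
        (np.concatenate(vals), (np.concatenate(rows), np.concatenate(cols))),
        shape=(dim, dim),
    )
    return coo.tocsr()

H = tfim_csr(4, 1.0)
print(H.format, H.shape, H.nnz)
```

Whether this avoids the crash at larger $L$ is untested here; it only illustrates the conversion with compact arrays.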

Here are some summary statistics of the runtimes at various system sizes,
averaged over the different boundary conditions and values of the parameter $h$.

In [None]:
import numpy as np
from scipy.stats import linregress
import matplotlib.pyplot as plt
%matplotlib inline

from ph121c_lxvm import tfim, data

In [None]:
d = data.hdf5.inquire(tfim.data.ARCHIVE)

#### Metadata
This is what some of the HDF5 metadata for a job looks like:

In [None]:
d[next(iter(d))]

Here the attributes '0' and '1' refer to datasets within this job.
In this case, '0' contains eigenvalues and '1' contains eigenvectors:
these are just the indices of the tuple returned by the solver.
We also know the system size, the solvers used in this job, and the time
taken for each part of the solver to complete its task.
The names of the jobs themselves are meaningless to people and are just
unique hashes of the job metadata.
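How the package computes these hashes is not shown here. One common way to derive a reproducible name from a metadata dictionary is sketched below; the function `job_name`, the choice of SHA-256, and the JSON serialization are all assumptions for illustration.

```python
import hashlib
import json

def job_name(metadata):
    """Hypothetical: derive a reproducible job name from a metadata dict."""
    # sort_keys makes the serialization independent of insertion order
    payload = json.dumps(metadata, sort_keys=True).encode('utf-8')
    return hashlib.sha256(payload).hexdigest()

name = job_name({'L': 8, 'h': 0.3, 'bc': 'o'})
print(name)
```

Jobs with identical parameters map to the same name, so a repeated job can be detected before it is rerun.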

#### Complexity

In the following program, we take the metadata and plot the growth of the
runtime average at a given system size, averaging over all other parameter
values: open and closed boundary conditions and 
$h \in \{0.3, 0.5, 0.7, 0.8, 0.85, 0.9, 0.95, 1, 1.05, 1.1, 1.15, 1.2, 1.3, 1.5, 1.7\}$.

In [None]:
# Group the per-job timings by system size L (keys are str(L))
dset = dict()
for k, v in d.items():
    timings = dset.setdefault(str(v['L']), {
        'solvertime' : [],
        'opertime' : [],
    })
    timings['opertime'].append(v['opertime'])
    timings['solvertime'].append(v['solvertime'])

In [None]:
# Collect averages for plotting
sizes = []
solvertimes = []
opertimes = []
for L in dset:
    sizes.append(int(L))
    opertimes.append(np.mean(dset[L]['opertime']))
    solvertimes.append(np.mean(dset[L]['solvertime']))

In [None]:
# Sort results by L
for i, e in enumerate(sorted(
    zip(sizes, solvertimes, opertimes)
)):
    sizes[i], solvertimes[i], opertimes[i] = e

In [None]:
%%capture plot
fig, ax = plt.subplots()

ax.set_title('Scaling of runtime')
ax.set_xlabel('dim($H_L$)')
ax.set_ylabel('Time (s)')

ax.loglog([ 2 ** e for e in sizes ], solvertimes, label='solver times')
ax.loglog([ 2 ** e for e in sizes ], opertimes, label='operator times')
ax.legend()

plt.show()

#### Results
The actual runtimes plotted on logarithmic axes are:

In [None]:
plot.show()

Even on logarithmic axes, the runtimes of the eigenvalue solver,
`scipy.sparse.linalg.eigsh`, have positive curvature.
This suggests that, over this range of sizes, the runtime grows faster than any
power law in the matrix dimension, that is, faster than
$\mathcal O (2^{\gamma L})$ for any constant $\gamma$.
By comparison, the runtimes of the operator, the function that constructs the
sparse matrix, are essentially linear in log-log space, suggesting that the
complexity of that algorithm is algebraic (a power law in $\dim(H_L)$).
Let's estimate the slope:

In [None]:
m, b, r, p, err = linregress(
    np.log10([ 2 ** e for e in sizes ]),
    np.log10(opertimes),
)
print('slope: ', m)
print('p-val: ', p)
print('stder: ', err)

So the cost of generating the sparse matrix almost certainly follows a power
law in $\dim(H_L)$, with an exponent about 10% larger than linear.
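The curvature of the solver timings can be quantified in the same spirit as the slope fit, by fitting a quadratic in log-log space and inspecting the leading coefficient. The sketch below uses synthetic timings, not the measured data, and the helper name `loglog_curvature` is hypothetical.

```python
import numpy as np

def loglog_curvature(dims, times):
    """Quadratic coefficient of a degree-2 fit of log10(times) vs log10(dims)."""
    x = np.log10(np.asarray(dims, dtype=float))
    y = np.log10(np.asarray(times, dtype=float))
    quad, lin, const = np.polyfit(x, y, 2)
    return quad

# Synthetic data on the same grid of dimensions, L = 8, 10, ..., 20
dims = 2.0 ** np.arange(8, 21, 2)
power_law = 1e-7 * dims ** 1.1             # straight line in log-log space
superpoly = 1e-7 * dims ** np.log10(dims)  # convex in log-log space

print(loglog_curvature(dims, power_law))
print(loglog_curvature(dims, superpoly))
```

A coefficient close to zero is consistent with a power law, while a clearly positive one indicates growth faster than any fixed power over the fitted range.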

It's also interesting that the operator builds the sparse matrix faster than
ARPACK can diagonalize it for $L \in \{6, 8\}$, but then it is slower until the
ARPACK runtime catches up again near $L=20$.

This is not the full story behind these implementations.
The fact that the operator causes Python to crash at $L=22$ implies
that the memory footprint of that algorithm is unreasonable, though
the runtimes alone do not reveal this as an issue.
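A rough back-of-the-envelope estimate makes the memory issue plausible. The sketch below assumes a $\sigma^z$-basis Hamiltonian with one diagonal entry and $L$ spin-flip entries per row, and it assumes the intermediate COO data are held in three Python lists of boxed numbers on 64-bit CPython. It ignores list over-allocation and any temporary copies, so it is a lower bound under those assumptions rather than a measurement.

```python
import sys

def tfim_nnz(L):
    # Assumption: one diagonal entry plus L single-spin flips per row
    return (L + 1) * 2 ** L

def csr_bytes(L, index_bytes=4, value_bytes=8):
    # data and indices arrays hold nnz entries; indptr holds dim + 1 entries
    nnz = tfim_nnz(L)
    return nnz * (index_bytes + value_bytes) + (2 ** L + 1) * index_bytes

def coo_list_bytes(L):
    # Three lists (row, col, value): an 8-byte pointer per element,
    # plus a boxed int for each index and a boxed float for each value
    per_entry = 3 * 8 + 2 * sys.getsizeof(2 ** 20) + sys.getsizeof(1.0)
    return tfim_nnz(L) * per_entry

for size in (20, 22):
    print(size, csr_bytes(size) / 2 ** 30, coo_list_bytes(size) / 2 ** 30)
```

The printed values are in GiB. Under these assumptions the list-based intermediate is several times larger than the final CSR matrix, which would explain a crash during the conversion step.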

### All calculations

These are all the parameter values for which I have calculated wavefunctions
and energies for 6 extremal eigenvalues.

```python
# All values obtained
L = range(8, 21, 2)
h = [0.3, 0.5, 0.7, 0.8, 0.85, 0.9, 0.95, 1, 1.05, 1.1, 1.15, 1.2, 1.3, 1.5, 1.7]
bc = ['o', 'c']
```

In [None]:
results = dict()
for k, v in d.items():
    method = '+'.join([v['oper'], v['solver']])
    if method not in results:
        results[method] = {
            'L' : [],
            'h' : [],
            'bc': [],
        }
    results[method]['L'].append(v['L'])
    results[method]['h'].append(v['h'])
    results[method]['bc'].append(v['bc'])
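To check that a method covers the full parameter grid, the collected values can be compared against the Cartesian product of the intended parameters. This is a sketch on a toy dictionary shaped like one entry of `results`; the helper `missing_jobs` is hypothetical, and it assumes the stored values of `h` compare exactly equal to the intended ones, which may not hold for floats read back from HDF5.

```python
from itertools import product

def missing_jobs(found, Ls, hs, bcs):
    """Return the (L, h, bc) triples of the full grid that are absent from found."""
    done = set(zip(found['L'], found['h'], found['bc']))
    return [p for p in product(Ls, hs, bcs) if p not in done]

# Toy stand-in for one entry of results
found = {'L': [8, 8, 10], 'h': [0.3, 0.5, 0.3], 'bc': ['o', 'o', 'c']}
gaps = missing_jobs(found, Ls=[8, 10], hs=[0.3, 0.5], bcs=['o', 'c'])
print(gaps)
```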