# Analysis notebook for the quals report

The main point of this document is to put together all the pretty graphs and tables that prove our hypotheses and to document how we got them.

## Obtaining main timing results and statistics
The data we're working with was generated from swizzleflow commit `49c8255170`.

We have the following files:

1. `results/2020-04-26-timings-for-initial-quals-eval`
   - `env RUSTFLAGS="-g" cargo build --features stats --release`
   - ```bash
for i in specs/swinv_like/*/*.json
do
./target/release/swizzleflow -a "$i" | tee -a results/2020-04-26-timings-for-initial-quals-eval
done
```

2. `results/2020-04-27-2d-stencil-5-timings`
   - ```bash
for i in specs/swinv_like_big/*/*.json
do
./target/release/swizzleflow -a "$i" | tee -a results/2020-04-27-2d-stencil-5-timings
done
```
   - Get yourself some coffee while that's going

3. `results/2020-04-27-stats-for-swinv-specs-for-quals`
   - Recompile with `env RUSTFLAGS="-g" cargo build --features stats --release`
   - Run the same `for` loop as in point 1, but write the output to a different file. These experiments will be slower due to statistics collection, but they give more information about what's going on.

4. For `results/2020-04-27-swizzleflow-comparisons.txt`, see that file for vague mumblings about the changes we had to make to their code to get all the data we cared about, or, like, call me if I haven't documented it better by then.
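
The `for` loops above differ only in the spec glob and the output file, so they could be driven from one helper. The following is a minimal Python sketch, not the script we actually used: `spec_commands` and `run_all` are hypothetical names, and the only things taken from the steps above are the binary path, the `-a` flag, and the append-to-file behaviour of `tee -a`. Unlike `tee`, this version only echoes each run's output once that run has finished.

```python
import glob
import subprocess
import sys


def spec_commands(spec_glob, binary="./target/release/swizzleflow"):
    """Build one `<binary> -a <spec>` argv list per matching spec file, in sorted order."""
    return [[binary, "-a", spec] for spec in sorted(glob.glob(spec_glob))]


def run_all(commands, out_path):
    """Run each command, echo its stdout, and append it to out_path (like `tee -a`)."""
    with open(out_path, "a") as out:
        for argv in commands:
            # check=True stops the batch if a run exits with a non-zero status.
            result = subprocess.run(argv, stdout=subprocess.PIPE, text=True, check=True)
            sys.stdout.write(result.stdout)
            out.write(result.stdout)


# Example call (commented out so that importing this sketch runs nothing):
# run_all(spec_commands("specs/swinv_like/*/*.json"),
#         "results/2020-04-26-timings-for-initial-quals-eval")
```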

## Obtaining results for the one point experiments
1. `git checkout one_point_test`, which has code modified to track only one point
2. `env RUSTFLAGS="-g" cargo build --features stats --release` to compile
3. We used small specs to make sure we wouldn't have issues with search performance.
4. Then run `./target/release/swizzleflow -m mats_one -a ./specs/1d-conv-16x3.json ./specs/trove-16x3.json | tee results/2020-04-28-small-specs-one-point-tracked-stats` on the branch
5. `git checkout master`, recompile
6. `./target/release/swizzleflow -a ./specs/1d-conv-16x3.json ./specs/trove-16x3.json | tee results/2020-04-28-small-specs-two-points-tracked-stats` to get the results

In [1]:
# initial setup
import sys
sys.path.append("../analysis")

In [2]:
import parsing
import extraction

from parsing import parse_file
from extraction import humanize_names, expand_target_checks, pull_spec_in

In [3]:
%matplotlib widget

In [4]:
import pandas as pd
import numpy as np

import itertools

import matplotlib.pyplot as plt

In [5]:
def fetch(dataset):
    return humanize_names(parse_file(f"../results/{dataset}"))

In [6]:
## Pruning utility experiments
one_point_raw = fetch("2020-04-28-small-specs-one-point-tracked-stats")
two_points_raw = fetch("2020-04-28-small-specs-two-points-tracked-stats")

In [7]:
def map_frames(frames, f):
    return {k: f(v) for k, v in frames.items()}
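
As a quick illustration of `map_frames`, here is a toy call on made-up frames. The spec names, the `extra` column, and all numbers below are invented for the example and are not taken from the results.

```python
import pandas as pd


def map_frames(frames, f):
    # Same helper as in the cell above, repeated so the example runs on its own.
    return {k: f(v) for k, v in frames.items()}


# Hypothetical per-spec frames; the real ones come from extraction.search_stats.
toy = {
    "spec-a": pd.DataFrame({"tested": [1, 4], "found": [0, 1], "extra": [9, 9]}),
    "spec-b": pd.DataFrame({"tested": [2, 6], "found": [0, 2], "extra": [7, 7]}),
}

# Keep a subset of columns in every frame, then reduce each frame to a scalar.
trimmed = map_frames(toy, lambda df: df[["tested", "found"]])
totals = map_frames(trimmed, lambda df: df["found"].sum())
print(totals)
```

The same two patterns (column selection, then a per-frame reduction) are what the following cells use for `cols_to_keep` and `n_solutions`.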


In [8]:
cols_to_keep = ['tested', 'found', 'failed', 'pruned', 'continued']
one_point = extraction.search_stats(one_point_raw)
two_points = extraction.search_stats(two_points_raw)
one_point = map_frames(one_point,
                       lambda v: v[cols_to_keep])
two_points = map_frames(two_points,
                        lambda v: v[cols_to_keep])

In [9]:
extraction.compute_basis_size(one_point)
extraction.compute_basis_size(two_points)

In [10]:
n_solutions = map_frames(two_points, lambda v: v['found'].sum())

In [11]:
for test, df in itertools.chain(one_point.items(), two_points.items()):
    df['redundant'] = (df['tested'] // df['basis_size']) - n_solutions[test]
    df['rel_redundant'] = df['redundant'] / (df['tested'] // df['basis_size'])
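
To make the arithmetic in this cell concrete, the sketch below redoes it by hand for levels 2 to 4 of the one-point `1d-conv-16x3` frame. The `tested` and `basis_size` values are copied from the output of the next cell. The solution count is assumed to be 1, which is the value implied by `redundant` being 0 at level 0 of that output; the notebook itself takes it from the two-point run.

```python
import pandas as pd

# Values copied from the one-point 1d-conv-16x3 output (levels 2, 3, 4).
df = pd.DataFrame({"tested": [960, 15360, 240],
                   "basis_size": [240, 16, 1]},
                  index=[2, 3, 4])
n_solutions = 1  # assumed; consistent with redundant == 0 at level 0

# Integer-divide the tested count by the basis size, as the cell above does.
candidates = df["tested"] // df["basis_size"]
df["redundant"] = candidates - n_solutions
df["rel_redundant"] = df["redundant"] / candidates
print(df)
```

For level 2 this gives 960 // 240 = 4 candidates, so 3 redundant and a relative redundancy of 3 / 4 = 0.75, matching the printed frame.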

In [12]:
(one_point, two_points)

({'1d-conv-16x3':    tested  found  failed  pruned  continued  basis_size  redundant  \
  0       1      0       0       0          1           1          0   
  1      14      0       0      10          4          14          0   
  2     960      0       0       0        960         240          3   
  3   15360      0   15120       0        240          16        959   
  4     240      1     239       0          0           1        239   
  
     rel_redundant  
  0       0.000000  
  1       0.000000  
  2       0.750000  
  3       0.998958  
  4       0.995833  ,
  'trove-8x3':    tested  found  failed  pruned  continued  basis_size  redundant  \
  0       1      0       0       0          1           1         -1   
  1      18      0       0       0         18          18         -1   
  2      54      0       0       0         54           3         16   
  3    3024      0       0       0       3024          56         52   
  4   24192      0       0   24190          2    

In [13]:
redundancies = {}
rel_redundancies = {}
for k in n_solutions:
    redundancies[k] = pd.DataFrame({'one point': one_point[k]['redundant'][2:-1],
                                     'two points': two_points[k]['redundant'][2:-1]})

    rel_redundancies[k] = pd.DataFrame({'one point': one_point[k]['rel_redundant'][2:-1],
                                        'two points': two_points[k]['rel_redundant'][2:-1]})
redundancies

{'1d-conv-16x3':    one point  two points
 2          3           0
 3        959           0, 'trove-8x3':    one point  two points
 2         16           0
 3         52           0
 4       3022           0
 5          0           0
 6         34           0}

In [27]:
pull_spec_in(redundancies).groupby('spec').plot()

[Two interactive figures, one per spec (1d-conv-16x3 and trove-8x3), drawn by the plot() call above]

spec
1d-conv-16x3    AxesSubplot(0.125,0.11;0.775x0.77)
trove-8x3       AxesSubplot(0.125,0.11;0.775x0.77)
dtype: object
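
`pull_spec_in` is imported from the `extraction` module and is not shown in this notebook. Judging only from how it is used here (its result is grouped by a `spec` column), a plausible pandas-only stand-in is to concatenate the per-spec frames and turn the dict keys into a `spec` column. The helper name `stack_by_spec` and the `level` index name are made up for illustration, and the real function may well differ.

```python
import pandas as pd


def stack_by_spec(frames):
    """Hypothetical stand-in: stack {spec: DataFrame} into one frame with a 'spec' column."""
    # Dict keys become the outer index level, which we name 'spec'.
    stacked = pd.concat(frames, names=["spec", "level"])
    # Move 'spec' out of the index so it can be used as a groupby column.
    return stacked.reset_index(level="spec")


# Redundancy counts copied from the `redundancies` output above.
redundancies = {
    "1d-conv-16x3": pd.DataFrame({"one point": [3, 959], "two points": [0, 0]},
                                 index=[2, 3]),
    "trove-8x3": pd.DataFrame({"one point": [16, 52, 3022, 0, 34],
                               "two points": [0, 0, 0, 0, 0]},
                              index=[2, 3, 4, 5, 6]),
}

combined = stack_by_spec(redundancies)
print(combined.groupby("spec")["one point"].max())
```

Calling `combined.groupby("spec").plot()` on a frame of this shape should then give one figure per spec, as in the cell above.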