# iPosition Monte Carlo Simulation

This notebook contains several Monte Carlo simulations for iPosition data. Three primary simulation methods are used to determine chance levels: naive 2D simulation, histogram data-driven simulation, and Dirichlet distribution simulation. The "actual coordinates" in each simulation are either real coordinates or random coordinates.

First, we import the pipeline. If cogrecon is not installed as a package, add the directory where it is stored on your machine to the Python path before running this cell.

In [1]:
from cogrecon.core.full_pipeline import full_pipeline, get_header_labels
from cogrecon.core.data_structures import TrialData, ParticipantData, AnalysisConfiguration

## Naive 2D Simulation

This section contains the naive 2D simulations, which pair truly random placement values with either random target points or actual target points.
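
As a rough, pipeline-free illustration of the random-versus-random case, the sketch below estimates a chance-level misplacement with plain NumPy. It assumes that misplacement is the mean Euclidean distance between each target point and its corresponding placement; that is an assumption about the metric, not something confirmed by the pipeline, and the function name `naive_chance_misplacement` is hypothetical. The closed-form value is the known mean distance between two uniform random points in a unit square.

```python
import numpy as np


def naive_chance_misplacement(items, dims=2, iterations=1000, seed=0):
    """Estimate chance-level misplacement for uniform random points (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    # Random target points and random placements, both uniform in the unit hypercube
    actual = rng.random((iterations, items, dims))
    placed = rng.random((iterations, items, dims))
    # Euclidean distance per item, then averaged within each simulated trial
    distances = np.linalg.norm(actual - placed, axis=2)
    per_trial = distances.mean(axis=1)
    return per_trial.mean(), per_trial.std()


mean_2d, std_2d = naive_chance_misplacement(items=5, iterations=20000)

# Known mean distance between two uniform random points in a unit square (about 0.5214)
closed_form = (2 + np.sqrt(2) + 5 * np.arcsinh(1)) / 15

print(mean_2d, std_2d, closed_form)
```

The simulated mean should land close to the closed-form value, which gives a quick sanity check on any 2D chance estimate that uses this distance definition.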

First, we define some global variables about our simulation.

In [5]:
sim_iterations = 1000  # For convenience, the number of iterations each simulation configuration should run

# Define the dimensions, number of items, and iterations for each test
test_configs = [
    {'dims': 2, 'items': 2, 'iterations': sim_iterations},
    {'dims': 2, 'items': 3, 'iterations': sim_iterations},
    {'dims': 2, 'items': 4, 'iterations': sim_iterations},
    {'dims': 2, 'items': 5, 'iterations': sim_iterations},
    {'dims': 2, 'items': 6, 'iterations': sim_iterations}
]

remove_columns = [4, 17, 38]  # Some columns of our output may not average or standard-deviation easily, so we remove those

save_filename = 'naive_2d_monte_carlo.p'  # The filename to save the output as we go
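
To make the role of the removed columns concrete, here is a minimal sketch of the aggregation pattern on made-up rows. The row contents and the `drop` list are hypothetical stand-ins, not actual pipeline output; the point is only that a non-numeric column has to be deleted before a NaN-aware mean and standard deviation can be taken.

```python
import numpy as np

# Hypothetical output rows: column 1 holds a label that cannot be averaged
rows = [
    [0.2, 'n/a', 1.0, np.nan],
    [0.4, 'n/a', 3.0, 2.0],
]
drop = [1]  # plays the same role as remove_columns

# Delete the problematic column from every row, then convert what remains to float
cleaned = [np.delete(np.array(row, dtype=object), drop, axis=0) for row in rows]
as_float = np.array([[float(x) for x in row] for row in cleaned])

# NaN-aware statistics skip missing values instead of propagating them
means = np.nanmean(as_float, axis=0)
stds = np.nanstd(as_float, axis=0)

print(means)
print(stds)
```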

In [6]:
import numpy as np
import numpy.random as rand
import logging
import time
import os
import pickle

# Disable some outputs that we don't need given our circumstances
logger = logging.getLogger()
logger.disabled = True
np.seterr(invalid='ignore')

{'divide': 'warn', 'invalid': 'ignore', 'over': 'warn', 'under': 'ignore'}

In [8]:
# Helper for getting the appropriate headers for columns we keep
def get_output_labels():
    headers = get_header_labels()
    headers = np.delete(headers, remove_columns)
    return headers

# Helper for printing our variables as we run
def print_read_friendly(o):
    headers = get_output_labels()
    
    row_format = "{0:55}: {1:15}"
    for h, oo in zip(headers, o):
        print(row_format.format(h, oo))

# Helper for converting our outputs to an easy-to-save format
def get_save_data(_test_configs, _output_labels, _mean_outputs, _std_outputs, _times):
    save_data = {
        'test_configs': _test_configs,
        'output_labels': _output_labels,
        'mean_outputs': _mean_outputs,
        'std_outputs': _std_outputs,
        'times': _times
    }
    return save_data
    
# Helper for saving our data (the context manager ensures the file is closed after writing)
def checkpoint_data(save_filename, data):
    with open(save_filename, 'wb') as f:
        pickle.dump(data, f)

# Helper for getting random data
# Note: relies on the global `dims` and `items` set inside the configuration loop
def get_random_data():
    actual = rand.random((items, dims))  # Random target points in the unit square
    data = rand.random((items, dims))  # Random placements in the unit square
    return actual.tolist(), data.tolist()

In [9]:
# Lists to store our main outputs
mean_outputs = []
std_outputs = []
times = []

# Iterate through our configurations
for config in test_configs:
    # Get config parameters
    dims = config['dims']
    items = config['items']
    iterations = config['iterations']
    
    # List to store each iteration output - for large iterations, this is the list that can balloon up
    outputs = []
    
    # Record start runtime
    start_time = time.time()
    
    # Iterate the number of times requested
    for _ in range(iterations):
        # Generate random data
        actual, data = get_random_data()
        
        # Run the pipeline
        output = full_pipeline(ParticipantData([TrialData(actual, data)]), AnalysisConfiguration(), visualize=False)[0]
        
        # Delete the removal columns and append the output
        output = np.delete(output, remove_columns, axis=0)
        outputs.append(output)
    
    # Record the runtime, then compute the mean and standard deviation of the outputs
    # (values are converted to float for the standard deviation to avoid errors)
    duration = time.time() - start_time
    avgs = np.nanmean(outputs, axis=0)
    stds = np.nanstd([[float(x) for x in inner] for inner in outputs], axis=0)
    
    mean_outputs.append(avgs)
    std_outputs.append(stds)
    times.append(duration)
    
    # Checkpoint/save the data to file
    checkpoint_data(save_filename, get_save_data(test_configs, get_output_labels(), mean_outputs, std_outputs, times))

    # Print a report on this configuration for the user (using this configuration's own iteration count)
    print('{0} iterations run in {1} seconds ({2} average) on {3}.'.format(iterations, duration, duration/iterations, config))
    print('_'*100)
    print_read_friendly(avgs)
    print('_'*100)
    print('_'*100)


1000 iterations run in 6.25499987602 seconds (0.00625499987602 average) on {'dims': 2, 'iterations': 1000, 'items': 2}.
____________________________________________________________________________________________________
Original Misplacement                                  :  0.504966379042
Original Swap                                          :           0.246
Original Edge Resizing                                 : 0.0959487307194
Original Edge Distortion                               :           0.953
Pre-Processed Accurate Placements                      :             2.0
Pre-Processed Inaccurate Placements                    :             0.0
Pre-Processed Accuracy Threshold                       :  0.693407322973
Deanonymized Accurate Placements                       :             2.0
Deanonymized Inaccurate Placements                     :             0.0
Deanonymized Accuracy Threshold                        :  0.616126922033
Raw Deanonymized Misplacement                    

1000 iterations run in 12.9520001411 seconds (0.0129520001411 average) on {'dims': 2, 'iterations': 1000, 'items': 5}.
____________________________________________________________________________________________________
Original Misplacement                                  :  0.521658024051
Original Swap                                          :          0.2482
Original Edge Resizing                                 :  0.188446726234
Original Edge Distortion                               :           0.997
Pre-Processed Accurate Placements                      :           3.829
Pre-Processed Inaccurate Placements                    :           1.171
Pre-Processed Accuracy Threshold                       :  0.704986619662
Deanonymized Accurate Placements                       :           3.821
Deanonymized Inaccurate Placements                     :           1.179
Deanonymized Accuracy Threshold                        :  0.485751183602
Raw Deanonymized Misplacement                     
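
The checkpoint written during the run is a plain dictionary serialized with pickle. The sketch below shows a round trip for a dictionary of the same shape, using made-up values and a temporary file so that nothing on disk is overwritten; the file name `checkpoint_demo.p` and all values are hypothetical.

```python
import os
import pickle
import tempfile

import numpy as np

# Hypothetical stand-in with the same keys as the checkpoint dictionary
checkpoint = {
    'test_configs': [{'dims': 2, 'items': 2, 'iterations': 10}],
    'output_labels': np.array(['Metric A', 'Metric B']),
    'mean_outputs': [np.array([0.5, 0.25])],
    'std_outputs': [np.array([0.1, 0.05])],
    'times': [1.23],
}

# Write and re-read the dictionary inside a temporary directory
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'checkpoint_demo.p')
    with open(path, 'wb') as f:
        pickle.dump(checkpoint, f)
    with open(path, 'rb') as f:
        restored = pickle.load(f)

# Compare plain Python values directly and NumPy arrays element-wise
round_trip_ok = (
    restored['test_configs'] == checkpoint['test_configs']
    and restored['times'] == checkpoint['times']
    and np.array_equal(restored['output_labels'], checkpoint['output_labels'])
    and all(np.array_equal(a, b) for a, b in zip(restored['mean_outputs'], checkpoint['mean_outputs']))
    and all(np.array_equal(a, b) for a, b in zip(restored['std_outputs'], checkpoint['std_outputs']))
)
print(round_trip_ok)
```

Comparing arrays with `np.array_equal` rather than `==` avoids ambiguous element-wise truth values when checking that the restored data matches.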

Load the data to confirm it saved properly.

In [10]:
with open(save_filename, 'rb') as f:  # Close the file once the data is loaded
    load_data = pickle.load(f)
print(load_data)

{'test_configs': [{'dims': 2, 'iterations': 1000, 'items': 2}, {'dims': 2, 'iterations': 1000, 'items': 3}, {'dims': 2, 'iterations': 1000, 'items': 4}, {'dims': 2, 'iterations': 1000, 'items': 5}, {'dims': 2, 'iterations': 1000, 'items': 6}], 'output_labels': array(['Original Misplacement', 'Original Swap', 'Original Edge Resizing',
       'Original Edge Distortion', 'Pre-Processed Accurate Placements',
       'Pre-Processed Inaccurate Placements',
       'Pre-Processed Accuracy Threshold',
       'Deanonymized Accurate Placements',
       'Deanonymized Inaccurate Placements',
       'Deanonymized Accuracy Threshold', 'Raw Deanonymized Misplacement',
       'Transformation Auto-Exclusion',
       'Number of Points Excluded From Geometric Transform',
       'Rotation Theta', 'Scaling', 'Translation Magnitude',
       'Geometric Distance Threshold', 'Number of Components',
       'Accurate Placements', 'Inaccurate Placements', 'True Swaps',
       'Partial Swaps', 'Cycle Swaps', 'Partia