# Experiment 1: Part Signal PDF Convergence

This notebook collects the results gathered from Experiment 1.

## Methodology
For each part we have data for, run the function that simulates PDF convergence 100 times. Randomly shuffle the part signals before each run; otherwise every run would yield the same result. For each run, record the part, the part type, the number of signals needed until convergence, and the relative variance.
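The procedure above can be sketched as follows. This is a minimal illustration, not the project's simulation function. It assumes that "convergence" means the 95% confidence interval of the running mean has a half-width no larger than 5% of the mean, and it defines relative variance as the variance of the convergence counts divided by their squared mean. Both definitions, and the names `samples_until_convergence` and `run_trials`, are hypothetical.

```python
import numpy as np


def samples_until_convergence(signals, rel_tol=0.05, z=1.96, min_samples=3):
    """Return how many signals are needed before the CI of the mean is narrow.

    Hypothetical criterion: stop once the CI half-width z * s / sqrt(n)
    is at most rel_tol * |mean|. Returns len(signals) if it never converges.
    """
    signals = np.asarray(signals, dtype=float)
    for n in range(min_samples, len(signals) + 1):
        window = signals[:n]
        mean = window.mean()
        half_width = z * window.std(ddof=1) / np.sqrt(n)
        if half_width <= rel_tol * abs(mean):
            return n
    return len(signals)


def run_trials(signals, num_trials=100, seed=0):
    """Shuffle the signals before every trial so each trial sees a new order."""
    rng = np.random.default_rng(seed)
    counts = np.array([
        samples_until_convergence(rng.permutation(signals))
        for _ in range(num_trials)
    ])
    # Hypothetical definition of relative variance: var(counts) / mean(counts)^2
    rel_var = counts.var() / counts.mean() ** 2
    return counts, rel_var
```

Shuffling with `rng.permutation` is what makes the 100 trials differ from one another, while fixing `seed` keeps the whole experiment reproducible.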
## Deliverables
- Graphs for each part's runs showing the convergence of the confidence interval (CI).
- Graphs and analysis of the combined average for each part type. What does it tell us? What can we conclude about the part type, and why does it behave that way?
- Graphs and analysis comparing the averages of the different part types. How different are they? How can we explain the differences? Do they validate our assumptions?
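For the per-part convergence graphs, one way to draw the CI is to recompute the running mean and a normal-approximation 95% interval after each additional signal. The sketch below assumes that this is the CI being tracked; `running_ci` is a hypothetical helper, not part of the project's code.

```python
import numpy as np


def running_ci(signals, z=1.96):
    """Running mean and normal-approximation CI after each additional signal.

    Returns (n, mean, lower, upper), starting at n = 2 because the sample
    standard deviation needs at least two values.
    """
    x = np.asarray(signals, dtype=float)
    n = np.arange(1, len(x) + 1)
    mean = np.cumsum(x) / n
    cum_sq = np.cumsum(x ** 2)
    # Drop n = 1, where the sample variance is undefined
    n, mean, cum_sq = n[1:], mean[1:], cum_sq[1:]
    # Sample variance from running sums: (sum(x^2) - n * mean^2) / (n - 1),
    # clamped at 0 to absorb floating-point round-off
    var = np.maximum(cum_sq - n * mean ** 2, 0.0) / (n - 1)
    half_width = z * np.sqrt(var / n)
    return n, mean, mean - half_width, mean + half_width
```

The returned arrays can be drawn with `plt.plot(n, mean)` and shaded with `plt.fill_between(n, lower, upper, alpha=0.3)`. The band narrowing around a flattening mean is the convergence that the first deliverable asks to show.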

## Source Code

The sections below contain all of the source code.

In [None]:
import mlflow
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

In [None]:
import os

user_path = '~/GitHub/matcher'  # CHANGE THIS LINE AS NEEDED FOR YOUR ENVIRONMENT
os.chdir(os.path.expanduser(user_path))


In [None]:
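def get_metrics_series(mlruns_path: str, experiment_id: str, run_id: str, metric_name: str) -> list:
    """Return every logged value of `metric_name` for one run.

    Reads MLflow's local file store directly: each line of the metric file
    has the form '<timestamp> <value> <step>', so the value is the second field.
    """
    metric_file = os.path.join(mlruns_path, experiment_id, run_id, 'metrics', metric_name)
    with open(metric_file) as f:
        return [float(line.split()[1]) for line in f if line.strip()]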
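
To see what `get_metrics_series` reads, the self-contained sketch below writes a fake metric file in the layout of MLflow's local file store and parses it with the same logic. The experiment ID `'1'`, the run ID `'abc123'`, the timestamps, and the values are made up for illustration.

```python
import os
import tempfile


def get_metrics_series(mlruns_path, experiment_id, run_id, metric_name):
    """Same parsing logic as the notebook helper: the value is the 2nd field."""
    metric_file = os.path.join(mlruns_path, experiment_id, run_id, 'metrics', metric_name)
    with open(metric_file) as f:
        return [float(line.split()[1]) for line in f if line.strip()]


# Fake file-store layout: <mlruns>/<experiment_id>/<run_id>/metrics/<metric>
with tempfile.TemporaryDirectory() as mlruns:
    metrics_dir = os.path.join(mlruns, '1', 'abc123', 'metrics')
    os.makedirs(metrics_dir)
    with open(os.path.join(metrics_dir, 'num_samples_for_convergence'), 'w') as f:
        # One '<timestamp> <value> <step>' line per logged value (made-up data)
        f.write('1700000000000 42 0\n1700000000100 37 1\n')
    values = get_metrics_series(mlruns, '1', 'abc123', 'num_samples_for_convergence')

print(values)  # [42.0, 37.0]
```

If the tracking store is not a local directory, `mlflow.tracking.MlflowClient().get_metric_history(run_id, key)` returns the same history as `Metric` objects with a `.value` attribute, without depending on the on-disk layout.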


In [None]:
experiment_id = mlflow.get_experiment_by_name(name='Experiment 1').experiment_id
# search_runs expects a list of experiment IDs
runs_df = mlflow.search_runs(experiment_ids=[experiment_id], max_results=10_000)
# A run may log num_samples_for_convergence several times; keep the full series per run
runs_df['num_samples_for_convergence'] = runs_df.apply(
    lambda row: get_metrics_series(
        mlruns_path='mlruns',
        experiment_id=experiment_id,
        run_id=row['run_id'],
        metric_name='num_samples_for_convergence'),
    axis=1)
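
Because each row of `runs_df` now holds a list, many pandas operations are easier on a long table with one row per logged value. Below is a minimal sketch using `DataFrame.explode`; the three-row DataFrame is made-up toy data standing in for `runs_df`, and `long_df` is a hypothetical name.

```python
import pandas as pd

# Toy stand-in for runs_df: one row per MLflow run, a list of logged values per row
runs_df = pd.DataFrame({
    'run_id': ['r1', 'r2', 'r3'],
    'params.part_type': ['A', 'A', 'B'],
    'num_samples_for_convergence': [[12.0, 15.0], [14.0], [30.0, 28.0, 33.0]],
})

# One row per logged value; run_id and part type are repeated for each value
long_df = (
    runs_df[['run_id', 'params.part_type', 'num_samples_for_convergence']]
    .explode('num_samples_for_convergence', ignore_index=True)
    .astype({'num_samples_for_convergence': float})
)
print(long_df.groupby('params.part_type')['num_samples_for_convergence'].mean())
```

Averaging `long_df` directly weights every logged value equally, so parts with more logged trials count more. Averaging each run's values first would weight every part equally instead.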


In [None]:
mlflow.set_experiment('Experiment 1 Analysis')
mlflow.end_run()  # close any run left open by a previous execution of this cell

# Gather every convergence sample for each part type
part_types, samples_by_type = [], []
for part_type, part_group in runs_df.groupby('params.part_type'):
    # Each row holds a list of logged values; flatten them into one array
    samples = np.concatenate(part_group['num_samples_for_convergence'].to_numpy())
    part_types.append(part_type)
    samples_by_type.append(samples)

# Output filename is illustrative; adjust as needed
graph_path = 'psig_matcher/experiments/graphs/num_samples_for_convergence_by_part_type.png'
os.makedirs(os.path.dirname(graph_path), exist_ok=True)

with mlflow.start_run():
    # Box plot of num_samples_for_convergence, one box per part type
    plt.boxplot(samples_by_type)
    plt.xticks(range(1, len(part_types) + 1), part_types)
    plt.title('Number of Samples for Convergence by Part Type')
    plt.xlabel('Part type')
    plt.ylabel('Number of samples for convergence')
    plt.savefig(graph_path)
    mlflow.log_artifact(graph_path)
    plt.clf()
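
For the comparison deliverable, a table of each part type's mean convergence count with a 95% confidence interval gives a first answer to "how different are they?". Widely separated intervals point to a genuine difference, while overlapping ones call for a formal test. This sketch assumes a long-format table with one row per logged value (column names matching `runs_df`) and uses a normal approximation for the interval. The helper name `summarize_by_part_type`, the toy data, and the variance-over-squared-mean definition of relative variance are hypothetical.

```python
import numpy as np
import pandas as pd


def summarize_by_part_type(long_df, value_col='num_samples_for_convergence',
                           group_col='params.part_type', z=1.96):
    """Count, mean, std, and normal-approximation 95% CI of the mean per part type."""
    summary = long_df.groupby(group_col)[value_col].agg(['count', 'mean', 'std'])
    sem = summary['std'] / np.sqrt(summary['count'])  # standard error of the mean
    summary['ci_low'] = summary['mean'] - z * sem
    summary['ci_high'] = summary['mean'] + z * sem
    # Hypothetical relative variance: variance / mean^2 (scale-free spread)
    summary['rel_var'] = summary['std'] ** 2 / summary['mean'] ** 2
    return summary


# Made-up toy data in long format
toy = pd.DataFrame({
    'params.part_type': ['A', 'A', 'A', 'B', 'B', 'B'],
    'num_samples_for_convergence': [12.0, 14.0, 16.0, 30.0, 30.0, 30.0],
})
print(summarize_by_part_type(toy))
```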
 