# Experiment 2: Part Type Collision Analysis

This notebook will contain gathered results from experiment 2.

# Part Type Collision Analysis
## Methodology
For each part type, run the experiment repeatedly across a range of hyperparameter settings. Specifically, isolate one hyperparameter, hold the others fixed, sweep it over a range of values, and record the computed collision rate for each value. Repeat this for every hyperparameter and every part type.
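
The one-at-a-time sweep described above can be sketched as follows. This is a minimal illustration, not the project's actual runner: `SWEEP_RANGES`, `one_at_a_time_configs`, and the swept values are hypothetical, while the hyperparameter names and baseline values are copied from the single run record printed later in this notebook.

```python
from typing import Any, Dict, Iterator, List

# Baseline values copied from the run record shown later in this notebook.
BASELINE: Dict[str, Any] = {
    "part_dim": 2,
    "confidence_bound": 0.999,
    "part_pdf_ci": 0.9999,
    "meta_pdf_ci": 0.995,
}

# Hypothetical sweep ranges, chosen only for illustration.
SWEEP_RANGES: Dict[str, List[Any]] = {
    "part_dim": [1, 2, 3],
    "confidence_bound": [0.9, 0.99, 0.999],
}


def one_at_a_time_configs(
    baseline: Dict[str, Any],
    sweep_ranges: Dict[str, List[Any]],
    part_types: List[str],
) -> Iterator[Dict[str, Any]]:
    """Yield one config per (part type, hyperparameter, value).

    In each config only the swept hyperparameter departs from the baseline.
    """
    for part_type in part_types:
        for name, values in sweep_ranges.items():
            for value in values:
                config = dict(baseline)  # copy so the baseline is never mutated
                config[name] = value
                config["part_type"] = part_type
                config["swept"] = name  # remember which hyperparameter was isolated
                yield config


configs = list(one_at_a_time_configs(BASELINE, SWEEP_RANGES, ["CONTAINER"]))
print(len(configs))
```

Each yielded config would correspond to one experiment run, so the number of runs grows with the sum of the range lengths rather than their product.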
## Deliverables
Graphs and analysis of the impact of different hyperparameter values. How do they affect the final collision rate? Why do they affect the collision rate in that way? What does this tell us?

Graphs and analysis comparing the results across different part types. Are different part types affected in the same way by the same change in hyperparameters? How close are their collision rates? What does this tell us about the relative importance of hyperparameters and part types?
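
One way to make the cross-part-type comparison concrete is a pivot table of mean collision rate per hyperparameter value and part type. The sketch below assumes a frame shaped like `run_df` later in this notebook. The rows are synthetic placeholders (not experimental results), `PIPE` is a hypothetical second part type, and `collision_rate_by_part_type` is a name introduced here for illustration.

```python
import pandas as pd


def collision_rate_by_part_type(
    run_df: pd.DataFrame,
    hyperparameter: str,
    metric: str = "monte_carlo_upper_collision_rate",
) -> pd.DataFrame:
    """Mean metric with one row per hyperparameter value and one column per part type."""
    return run_df.pivot_table(
        index=hyperparameter, columns="part_type", values=metric, aggfunc="mean"
    )


# Synthetic placeholder rows, NOT measured results.
example_df = pd.DataFrame({
    "part_type": ["CONTAINER", "CONTAINER", "PIPE", "PIPE"],
    "part_dim": [2, 3, 2, 3],
    "monte_carlo_upper_collision_rate": [0.30, 0.20, 0.40, 0.10],
})

table = collision_rate_by_part_type(example_df, "part_dim")
# Gap between the highest and lowest part type at each hyperparameter value.
spread = table.max(axis=1) - table.min(axis=1)
print(table)
print(spread)
```

A small spread at every value would suggest the hyperparameter affects all part types similarly, while a large or changing spread would point to part-type-specific behavior.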

## Source Code

The sections below contain all of our source code.

In [1]:
import mlflow
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

In [4]:
import os 

user_path = '~/GitHub/matcher'  # CHANGE THIS LINE AS NEEDED FOR YOUR ENVIRONMENT
os.chdir(os.path.expanduser(user_path))

In [30]:
def get_metrics_series(mlruns_path: str, experiment_id: str, run_id: str, metric_name: str) -> list:
    """Get a series of metric values for a given metric name."""
    with open(f'{mlruns_path}/{experiment_id}/{run_id}/metrics/{metric_name}') as f:
        file_lines = f.readlines()
    return [float(line.split()[1]) for line in file_lines]
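
`get_metrics_series` reads MLflow's file-based store directly. In that store each metric file holds one line per logged value, with whitespace-separated fields in the order timestamp, value, step, which is why the function takes field index 1. The sketch below parses all three fields from a temporary file. The sample numbers and the `parse_metric_file` and `MetricPoint` names are made up for illustration, and the on-disk layout is an internal detail that may differ between MLflow versions.

```python
import os
import tempfile
from typing import List, NamedTuple


class MetricPoint(NamedTuple):
    timestamp: int
    value: float
    step: int


def parse_metric_file(path: str) -> List[MetricPoint]:
    """Parse 'timestamp value [step]' lines and return them ordered by step."""
    points = []
    with open(path) as f:
        for line in f:
            fields = line.split()
            if not fields:
                continue  # skip blank lines
            # Assumption: a missing third field is treated as step 0.
            step = int(fields[2]) if len(fields) > 2 else 0
            points.append(MetricPoint(int(fields[0]), float(fields[1]), step))
    return sorted(points, key=lambda p: (p.step, p.timestamp))


# Made-up sample content written out of order to show the sorting.
with tempfile.TemporaryDirectory() as tmp_dir:
    sample_path = os.path.join(tmp_dir, "monte_carlo_upper_collision_rate")
    with open(sample_path, "w") as f:
        f.write("1700000000001 0.27 1\n1700000000000 0.28 0\n")
    points = parse_metric_file(sample_path)

values = [p.value for p in points]
print(values)
```

MLflow's public `MlflowClient.get_metric_history(run_id, key)` returns the same information without depending on the file layout, which may be the safer choice if the tracking store ever moves off the local filesystem.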

In [31]:
experiment_id = mlflow.get_experiment_by_name(name='Experiment 3').experiment_id
runs_df = mlflow.search_runs(experiment_ids=[experiment_id], max_results=10_000)  # experiment_ids is documented as a list
runs_df['monte_carlo_upper_collision_rate_series'] = runs_df.apply(
    lambda row: get_metrics_series(
            mlruns_path='mlruns', 
            experiment_id=experiment_id, 
            run_id=row['run_id'], 
            metric_name='monte_carlo_upper_collision_rate'), 
    axis=1)

print(runs_df.head(1)['monte_carlo_upper_collision_rate_series'])
print(runs_df.head(1)['run_id'])


Traceback (most recent call last):
  File "/opt/homebrew/Caskroom/miniforge/base/envs/matcher/lib/python3.10/site-packages/mlflow/store/tracking/file_store.py", line 279, in search_experiments
    exp = self._get_experiment(exp_id, view_type)
  File "/opt/homebrew/Caskroom/miniforge/base/envs/matcher/lib/python3.10/site-packages/mlflow/store/tracking/file_store.py", line 372, in _get_experiment
    meta = FileStore._read_yaml(experiment_dir, FileStore.META_DATA_FILE_NAME)
  File "/opt/homebrew/Caskroom/miniforge/base/envs/matcher/lib/python3.10/site-packages/mlflow/store/tracking/file_store.py", line 1082, in _read_yaml
    return _read_helper(root, file_name, attempts_remaining=retries)
  File "/opt/homebrew/Caskroom/miniforge/base/envs/matcher/lib/python3.10/site-packages/mlflow/store/tracking/file_store.py", line 1075, in _read_helper
    result = read_yaml(root, file_name)
  File "/opt/homebrew/Caskroom/miniforge/base/envs/matcher/lib/python3.10/site-packages/mlflow/utils/file_util

0    [0.2758493175418901, 0.2686923494918686, 0.274...
Name: monte_carlo_upper_collision_rate_series, dtype: object
0    6874a8cf8aa54bb5b3ffdad4de91fc8c
Name: run_id, dtype: object
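
The next cell uses `run_dicts`, which is not defined in any cell shown here. A hedged sketch of how such a dictionary could be built from a `search_runs` result follows. It relies on MLflow's convention of prefixing result columns with `params.` and `metrics.`; the `build_run_dicts` helper and the synthetic rows are assumptions introduced for illustration, not the original code.

```python
import pandas as pd

PARAM_PREFIX = "params."
METRIC_PREFIX = "metrics."


def build_run_dicts(runs_df: pd.DataFrame) -> dict:
    """Map run_id -> {name: value} for every params.* and metrics.* column."""
    run_dicts = {}
    for _, row in runs_df.iterrows():
        record = {}
        for column, value in row.items():
            for prefix in (METRIC_PREFIX, PARAM_PREFIX):
                if column.startswith(prefix):
                    # Strip the prefix so the key is the bare metric/param name.
                    record[column[len(prefix):]] = value
        run_dicts[row["run_id"]] = record
    return run_dicts


# Synthetic stand-in for an mlflow.search_runs result; values are placeholders.
example_runs_df = pd.DataFrame({
    "run_id": ["run_a", "run_b"],
    "metrics.monte_carlo_upper_collision_rate": [0.25, 0.5],
    "params.part_dim": ["2", "3"],
    "params.part_type": ["CONTAINER", "CONTAINER"],
})

run_dicts = build_run_dicts(example_runs_df)
run_df = pd.DataFrame.from_dict(run_dicts, orient="index")
# MLflow stores parameters as strings, so numeric ones need an explicit cast.
run_df["part_dim"] = pd.to_numeric(run_df["part_dim"])
print(run_df)
```

Casting the parameter columns to numeric types matters for the later grouping and plotting cells, since string values would otherwise be sorted and plotted as categories.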


In [8]:
run_df = pd.DataFrame.from_dict(run_dicts, orient='index')
print(run_df.columns)
print(run_df.head(1))

Index(['monte_carlo_upper_collision_rate', 'part_dim', 'confidence_bound',
       'part_pdf_ci', 'meta_pdf_ci', 'num_samples', 'part_type'],
      dtype='object')
                                  monte_carlo_upper_collision_rate part_dim  \
6874a8cf8aa54bb5b3ffdad4de91fc8c                          0.285976        2   

                                 confidence_bound part_pdf_ci meta_pdf_ci  \
6874a8cf8aa54bb5b3ffdad4de91fc8c            0.999      0.9999       0.995   

                                 num_samples  part_type  
6874a8cf8aa54bb5b3ffdad4de91fc8c         100  CONTAINER  


In [13]:
analysis_groups = {
    "part_dim": run_df.groupby('part_dim'),
    "confidence_bound": run_df.groupby('confidence_bound'),
    "part_pdf_ci": run_df.groupby('part_pdf_ci'),
    "meta_pdf_ci": run_df.groupby('meta_pdf_ci')
}

In [18]:
mlflow.set_experiment("Experiment 2 Analysis")

# The metric column is named 'monte_carlo_upper_collision_rate' in run_df (see
# the printed columns above); indexing 'upper_collision_rate' raised a KeyError.
metric_col = 'monte_carlo_upper_collision_rate'
graph_dir = "psig_matcher/experiments/graphs"
os.makedirs(graph_dir, exist_ok=True)  # make sure the output directory exists

for analysis_type, group in analysis_groups.items():
    x_vals = []
    y_vals = []

    # Each group holds the runs that share one value of the hyperparameter.
    for index, df in group:
        col_vals = set(df[analysis_type].to_list())
        if len(col_vals) != 1:
            raise Exception(f"More than one {analysis_type} value in group")

        x_vals.append(col_vals.pop())
        y_vals.append(df[metric_col].mean())

    graph_path = f"{graph_dir}/{analysis_type}_vs_upper_collision_rate.png"
    plt.figure()  # start a fresh figure so curves do not pile up on one plot
    plt.plot(x_vals, y_vals, label=f'{analysis_type} vs {metric_col}')
    plt.xlabel(analysis_type)
    plt.ylabel(f"Mean {metric_col} across all tested parts")
    plt.legend()
    plt.savefig(graph_path)
    plt.close()
    mlflow.log_artifact(graph_path)

---

## Conclusion

TBD.