# Plotting and Analysis

This notebook plots and analyzes the logged results of one or more simulator runs, given a fixed timing configuration.
These logs (bboxes.csv) are produced by running the simulator on experiments. The plots are meant to analyze the worm's behavior,
and to analyze the system's error and how it is affected by the different behaviors the worm exhibits.

For the analysis to be valid, all experiments analyzed by this notebook *at once* must share the same timing configuration (TimingConfig) parameters.

In [None]:
import matplotlib.pyplot as plt
from sim.config import TimingConfig
from eval.analysis import Plotter
from utils.gui_utils import UserPrompt

### Timing configuration and log files selection

In [None]:
from pprint import pprint

################################ User Input ################################

# path to the timing config file. If None, a file dialog will open to select a file
timing_config_path = "logs/time_config.json"

############################################################################

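if timing_config_path is None:
    # assumes UserPrompt.open_file returns a single path when multiple=False
    timing_config_path = UserPrompt.open_file(title="Select timing config file", filetypes=[("JSON files", "*.json")], multiple=False)

timing_config = TimingConfig.load_json(timing_config_path)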

pprint(timing_config)

In [None]:
################################ User Input ################################

# list containing paths to simulation log files.
# All of these simulations must have been run with the above timing config.
# If empty, a file dialog will open to select files.
log_files = [
    "logs/bboxes.csv",
]

############################################################################

if len(log_files) == 0:
    log_files = UserPrompt.open_file(title="Select log files", filetypes=[("Log files", "*.csv")], multiple=True)

### Plotting configuration

Note that all of the plots below accept `condition` as a parameter.
`condition` is expected to be a function with the following signature, which takes the data and returns a boolean mask over its rows:

```python
def cond_func1(input_df: pd.DataFrame) -> pd.Series:
    return (input_df["wrm_speed"] > 5) & (input_df["wrm_speed"] <= 30)
```

In Python, such functions can also be defined without a `def` statement, using the following syntax
(for more information, read about lambda functions):

```python
cond_func1 = lambda input_df: (input_df["wrm_speed"] > 5) & (input_df["wrm_speed"] <= 30)
cond_func2 = lambda input_df: input_df["phase"] == "imaging"
```
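
To make the behavior of such a condition concrete, here is a minimal sketch on a toy DataFrame. Only the column names `wrm_speed` and `phase` and the value `"imaging"` come from this notebook; the sample values, the `"moving"` phase label, and the helper names `speed_in_range` and `imaging_only` are made up for illustration. It also assumes the condition's result is used as a boolean row mask, which is how such a function would typically be applied in pandas.

```python
import pandas as pd

# Toy stand-in for the plotter's data (values are illustrative only).
df = pd.DataFrame(
    {
        "wrm_speed": [2.0, 12.0, 30.0, 45.0],
        "phase": ["imaging", "moving", "imaging", "imaging"],
    }
)


def speed_in_range(input_df: pd.DataFrame) -> pd.Series:
    """True for rows whose worm speed lies in the interval (5, 30]."""
    return (input_df["wrm_speed"] > 5) & (input_df["wrm_speed"] <= 30)


def imaging_only(input_df: pd.DataFrame) -> pd.Series:
    """True for rows recorded during the imaging phase."""
    return input_df["phase"] == "imaging"


# Conditions are boolean Series, so they combine with & (and), | (or), ~ (not).
mask = speed_in_range(df) & imaging_only(df)

# Indexing with the mask keeps only the rows where it is True.
filtered = df[mask]
print(filtered)
```

Each comparison must be wrapped in parentheses when combined with `&` or `|`, because these operators bind more tightly than `>` and `<=` in Python.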

In [None]:
pltr = Plotter(
    timing_config,
    log_paths=log_files,
    plot_height=10,  # the size of the plot figures
)

################################ User Input ################################

# initialize the data for the plotting, by performing all relevant calculations
pltr.initialize(
    n=10,
    imaging_only=True,  # if True, only the imaging-phase data will be plotted. If False, all phases are included in the plots.
    unit="sec",  # the unit of the plots. If "frame" is used, time is in frames and distance is in pixels. If "sec" is used, time is in seconds and distance is in micrometers.
    legal_bounds=(
        250,
        250,
        1400,
        1300,
    ),  # the legal bounds of the image, in format (x_min, y_min, x_max, y_max). All worm positions outside these bounds will be ignored.
)

############################################################################

#### Optional: calculate precise error

To calculate the precise error of the system, run the following cell; otherwise, skip it.
Note that the next cell might take a while to run.

For each frame, the exact pixels occupied by the worm's head are computed. This requires access to the worm images that were extracted during the experiment initialization process.
The error is then calculated as the proportion of worm pixels that fall outside the microscope view.
Because the images must be loaded from disk, the calculation is relatively slow.
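
The proportion described above can be sketched with NumPy as follows. This is a minimal illustration, not the code behind `pltr.calc_precise_error`: the helper names `worm_pixel_mask` and `fraction_outside_view`, the background-difference thresholding (suggested by the `diff_thresh` parameter), and the `(x, y, w, h)` layout of the microscope view are all assumptions.

```python
import numpy as np


def worm_pixel_mask(frame: np.ndarray, background: np.ndarray, diff_thresh: int) -> np.ndarray:
    """Mark pixels that differ from the background by more than diff_thresh (assumed method)."""
    # Cast to a signed type so the subtraction of uint8 images cannot wrap around.
    diff = np.abs(frame.astype(np.int16) - background.astype(np.int16))
    return diff > diff_thresh


def fraction_outside_view(worm_mask: np.ndarray, view_bbox: tuple) -> float:
    """Proportion of worm pixels lying outside view_bbox, given as (x, y, w, h)."""
    total = int(worm_mask.sum())
    if total == 0:
        return 0.0  # no worm pixels found; treated as zero error in this sketch
    x, y, w, h = view_bbox
    # Clamp to non-negative indices so negative coordinates do not wrap around.
    x0, y0 = max(x, 0), max(y, 0)
    x1, y1 = max(x + w, 0), max(y + h, 0)
    inside = int(worm_mask[y0:y1, x0:x1].sum())
    return (total - inside) / total


# Toy example: an empty 6x6 background and a frame with a 2x4 block of worm pixels.
background = np.zeros((6, 6), dtype=np.uint8)
frame = background.copy()
frame[2:4, 1:5] = 100

mask = worm_pixel_mask(frame, background, diff_thresh=20)
# The view covers columns 3..5, so half of the worm pixels fall outside it.
error = fraction_outside_view(mask, view_bbox=(3, 0, 3, 6))
print(error)
```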

In [None]:
# TODO: TEST

from utils.io_utils import pickle_load_object

################################ User Input ################################

background_paths = []

worm_folders_paths = []

diff_thresh = 20

############################################################################

if len(background_paths) == 0:
    background_paths = UserPrompt.open_file(
        title="Select background images",
        filetypes=[("Numpy files", "*.np")],
        multiple=True,
    )

if len(worm_folders_paths) == 0:
    raise NotImplementedError("Please provide the paths to the frame folders")

backgrounds = [pickle_load_object(path) for path in background_paths]

pltr.calc_precise_error(worm_image_paths=worm_folders_paths, backgrounds=backgrounds, diff_thresh=diff_thresh)

### Plotting and analysis

In [None]:
# print column names of the data
pprint([f"{i}: {col}" for i, col in enumerate(pltr.column_names())])

In [None]:
pltr.print_stats()

In [None]:
pltr.plot_trajectory(hue_col="log_num", condition=lambda x: x["wrm_y"] >= 0)

In [None]:
pltr.plot_speed(log_wise=True, condition=lambda x: x["wrm_speed"] <= 800)

In [None]:
pltr.plot_error(log_wise=False, error_kind="bbox", condition=lambda x: x["bbox_error"] > 1e-5)

In [None]:
pltr.plot_speed_vs_error(error_kind="dist", condition=lambda x: x["wrm_speed"] < 2000)

In [None]:
pltr.plot_deviation(percentile=0.999, log_wise=False)

In [None]:
pltr.plot_head_size()

In [None]:
pltr.describe(columns=["wrm_speed", "bbox_error", "worm_deviation"], num=9)

In [None]:
import numpy as np

# find anomalies in the data
pltr.find_anomalies(
    no_preds=True,
    min_bbox_error=1.0,
    min_dist_error=np.inf,
    min_speed=np.inf,
    min_size=300,
)