# Importing experimental data

This notebook illustrates the import of experimental data in *larvaworld* and the supporting classes and configuration structure.

First we initialize the larvaworld registry. This loads some components from disk and builds the rest on the fly.

We also set *VERBOSE = 1* to get more detailed output.

In [None]:
%load_ext param.ipython
import panel as pn

pn.extension()

# You might have to install this module to run pn.Param
# !pip install jupyter_bokeh

import larvaworld
from larvaworld.lib import util, reg, sim
from larvaworld.lib.reg.generators import LabFormat

# Import the Replay configuration class (for Example III)
from larvaworld.lib.reg.generators import ReplayConf

larvaworld.VERBOSE = 1

### The LabFormat class

Raw data can be of diverse lab-specific formats. We will start with the *LabFormat* class which supports them.

In [None]:
%params LabFormat

Let's generate a new instance

In [None]:
lf_new = LabFormat(labID="MyLab")
print(f"An instance of {lf_new.__class__}")


%params lf_new

Stored instances of the *LabFormat* class are available through the configuration registry.

The registry is retrieved from a dictionary of registry objects by the *LabFormat* key.

In [None]:
LFreg = reg.conf.LabFormat

Each lab-specific data-format configuration is stored in the registry's dictionary under a unique ID.

Let's print the IDs

In [None]:
lfIDs = LFreg.confIDs
print(f"The IDs of the stored configurations of LabFormat class are :{lfIDs}")

# The registry is supported by a nested dictionary :
LFdict = LFreg.dict

# The path where the dictionary is stored:
print(LFreg.path_to_dict)


# The configuration IDs are the keys. They correspond to a nested dictionary :
lfID = lfIDs[0]
lf0_entry = LFdict[lfID]
print()
print(f"An instance of {lf0_entry.__class__.__name__}")

# The configuration dictionary can be retrieved directly by :
lf0_entry2 = LFreg.getID(lfID)
print()
print(lf0_entry == lf0_entry2)
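
The lookup pattern used in the cell above (unique IDs as keys of a nested dictionary, with an accessor that returns the entry for a given ID) can be sketched with a toy class. This is only an illustration, not larvaworld's implementation: `ToyConfRegistry` and the `LabA`/`LabB` entries are hypothetical. Only the names `dict`, `confIDs` and `getID` mirror the attributes used above, and returning a copy from `getID` is an assumption made for this sketch.

```python
from copy import deepcopy


class ToyConfRegistry:
    """Hypothetical stand-in for a configuration registry (not larvaworld code)."""

    def __init__(self, entries):
        # Nested dictionary: configuration ID -> configuration dictionary
        self.dict = dict(entries)

    @property
    def confIDs(self):
        # The stored IDs, sorted for a stable order
        return sorted(self.dict)

    def getID(self, conf_id):
        # Return a copy so callers cannot mutate the stored entry by accident
        return deepcopy(self.dict[conf_id])


# Hypothetical entries, for illustration only
toy_reg = ToyConfRegistry(
    {
        "LabA": {"tracker": {"dt": 0.1}},
        "LabB": {"tracker": {"dt": 0.0625}},
    }
)

print(toy_reg.confIDs)
print(toy_reg.getID("LabA") == toy_reg.dict["LabA"])
```

Both access routes yield equal dictionaries, which is what the equality check in the cell above verifies for the real registry.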

In [None]:
# The configuration object can be retrieved directly by :
lf0 = LFreg.get(lfID)
print(f"The object under the ID : {lfID} is an instance of {lf0.__class__.__name__}")
print()

%params lf0

In [None]:
# The configuration object can be visualized by :
pn.Param(lf0)

In [None]:
# The configuration dictionary can be retrieved directly from the object :
lf0_entry3 = lf0.nestedConf

# As well as the parameter keys
print(lf0.param_keys)
print()

# The path where the lab data are stored:
print(lf0.path)
# print(lf0.raw_folder)

### Example I : Import datasets

Note : The data imported here are part of the core larvaworld package

In [None]:
# Let's inspect one specific lab-format configuration
id = "Schleyer"
Schleyer_lf = LFreg.get(id)

%params Schleyer_lf.tracker

Both raw and imported experimental data, as well as simulated data, are stored at a specific location in the file structure that can be accessed easily. For experimental data, each lab format has its own dedicated directory :

In [None]:
print(f"All data are stored here :\n{larvaworld.DATA_DIR}\n")

print(f"The path to the data of the {id} lab-format :\n{Schleyer_lf.path}\n")

print(
    f"Raw data to be imported should be stored here (if not otherwise specified) :\n{Schleyer_lf.raw_folder}\n"
)

print(
    f"Imported/Processed data will be stored here (if not otherwise specified) :\n{Schleyer_lf.processed_folder}"
)

Now we can import some datasets. This means converting from the native lab-specific data format to the *larvaworld* format while at the same time filtering/selecting specific entries of the data.

Here two cases are illustrated :
 - Tracks from a single dish
 - Merged tracks from all dishes under a certain directory

The import returns an instance of *LarvaDataset* that can then be used for further analysis.

By default the dataset is not stored to disk, unless we specify *save_dataset = True*.

In [None]:
# Single dish case
folder = "dish01"
kws1 = {
    "parent_dir": f"exploration/{folder}",
    "min_duration_in_sec": 90,
    "id": folder,
    "refID": f"exploration.{folder}",
    "group_id": "exploration",
}

d1 = Schleyer_lf.import_dataset(**kws1)

In [None]:
# Merged case
N = 40
kws2 = {
    "parent_dir": "exploration",
    "merged": True,
    "max_Nagents": N,
    "min_duration_in_sec": 120,
    "refID": f"exploration.{N}controls",
    "group_id": "exploration",
}

d2 = Schleyer_lf.import_dataset(**kws2)
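
To make the effect of the selection arguments concrete, here is a minimal sketch of filtering tracks by a minimum duration and capping the number of agents. It is not larvaworld's code: `select_tracks` and the sample durations are hypothetical, the argument names are borrowed from the cells above, and keeping the longest tracks when capping is an assumption made for this illustration.

```python
def select_tracks(durations, min_duration_in_sec, max_Nagents=None):
    """Keep tracks lasting at least min_duration_in_sec, longest first.

    durations: dict mapping a track ID to its duration in seconds.
    """
    kept = {k: v for k, v in durations.items() if v >= min_duration_in_sec}
    # Longest tracks first; ties are broken by ID for a reproducible order
    ranked = sorted(kept, key=lambda k: (-kept[k], k))
    return ranked if max_Nagents is None else ranked[:max_Nagents]


# Hypothetical track durations (seconds) pooled from several dishes
durations = {"a": 45.0, "b": 130.0, "c": 95.0, "d": 180.0, "e": 121.5}

print(select_tracks(durations, min_duration_in_sec=90))
print(select_tracks(durations, min_duration_in_sec=120, max_Nagents=2))
```

A stricter duration threshold combined with a cap, as in the merged case, trades the number of retained larvae for longer and more comparable tracks.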

In [None]:
print(
    f"The import method returns an instance of {d1.__class__.__name__} having the ID : {d1.id}\n"
)

s, e, c = d1.data

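print("The timeseries data (dropping NaNs) : \n")
# Only the last expression of a cell is displayed automatically, so print this one explicitly
print(s.dropna().head())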

print("The endpoint data : \n")
e

### Example II : Import downloaded data

Now we will illustrate the import functionality by downloading a publicly available dataset of *Drosophila* larva locomotion.

Go to the website below, download the zipped file and extract it into the lab-specific folder indicated below.

In [None]:
# URL of the repository. Visit for further information.
link2repo = "https://doi.gin.g-node.org/10.12751/g-node.5e1ifd/"

# The name of the zipped file to be downloaded.
filename = "Naive_Locomotion_Drosophila_Larvae.zip"

# URL of the file.
link2data = f"https://gin.g-node.org/MichaelSchleyer/Naive_Locomotion_Drosophila_Larvae/src/master/{filename}"

# Path to extract the downloaded file
dirname = "naive"
print(
    f"The path to extract the downloaded file :\n{Schleyer_lf.raw_folder}/{dirname}\n"
)
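
If you prefer to extract the downloaded archive programmatically, the following sketch uses only Python's standard library. The helper name `extract_archive` is hypothetical, and the sketch assumes the zip file has already been downloaded manually from the repository page.

```python
import zipfile
from pathlib import Path


def extract_archive(zip_path, target_dir):
    """Extract zip_path into target_dir and return the archive member names."""
    target = Path(target_dir)
    # Create the destination (and any missing parents) if it does not exist yet
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(target)
        return zf.namelist()


# Usage in this notebook (assuming the archive sits in the working directory):
# extract_archive(filename, f"{Schleyer_lf.raw_folder}/{dirname}")
```

After extraction, the per-dish folders should sit under the path printed above, so that `parent_dir` in the next cells resolves correctly.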

In [None]:
# Single dish case
folder = "box1-2017-05-18_14_48_22"
id = "imported_single_dish"
kws = {
    "parent_dir": f"{dirname}/{folder}",
    "min_duration_in_sec": 120,
    "id": id,
    "refID": f"{dirname}.{id}",
    "group_id": dirname,
}

d6 = Schleyer_lf.import_dataset(**kws)

In [None]:
d6.e.cum_dur.sort_values()

In [None]:
# Merged case
N = 50
kws2 = {
    "parent_dir": dirname,
    "merged": True,
    "max_Nagents": N,
    "min_duration_in_sec": 160,
    "refID": f"{dirname}.{N}controls",
    "group_id": dirname,
}

d100 = Schleyer_lf.import_dataset(**kws2)

d100.e.cum_dur.sort_values()

### Example III : Import data of a different format

We will now illustrate the import functionality by importing a set of 3 datasets : Fed, Sucrose and Starved.

The 3 animal groups have been subjected to different diets and are therefore in different metabolic states at the moment their locomotion is tracked. We want to compare them in order to detect any impact of metabolic state on locomotion.

Note : This example requires data existing in the *data/JovanicGroup/raw/ProteinDeprivation* folder

Also note that the tracks in the datasets above only include the body's midline and not its contour.

In [None]:
labID = "Jovanic"
Jovanic_lf = reg.conf.LabFormat.get(labID)

media_dir = "./media/3conditions"
plot_dir = f"{media_dir}/plots"
video_dir = f"{media_dir}/videos"

In [None]:
# The name of the experiment
exp = "ProteinDeprivation"

# The group IDs
gIDs = ["Fed", "Sucrose", "Starved"]

# The colors per group
palette = {
    "Fed": "black",
    "Sucrose": "red",
    "Starved": "purple",
}

In [None]:
# Here we configure the import of the data
Jovanic_lf.tracker.dt = 0.1

constraints = util.AttrDict(
    {
        "match_ids": False,
        "interpolate_ticks": True,
        "min_duration_in_sec": 20,
        "time_slice": (0, 60),
        # 'time_slice':None,
    }
)

enr_kws = util.AttrDict(
    {
        "proc_keys": ["angular", "spatial"],
        "anot_keys": ["bout_detection"],
        "traj2origin": True,
        # 'recompute' : True,
        "tor_durs": [20],
        "dsp_starts": [0],
        "dsp_stops": [40, 60],
    }
)


kws = {
    "parent_dir": exp,
    "source_ids": gIDs,
    "colors": [palette[gID] for gID in gIDs],
    # 'raw_folder': '../raw/',
    # 'proc_folder': processed_data_dir,
    "refIDs": gIDs,
    "merged": False,
    "save_dataset": True,
    "enrich_conf": enr_kws,
    **constraints,
}
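
The constraints above are expressed in seconds, while the recording is sampled once every `dt`. As a small worked example, and assuming `dt` is the tracker timestep in seconds, the conversion to timesteps looks as follows. The helper `seconds_to_ticks` is hypothetical and not part of larvaworld.

```python
def seconds_to_ticks(seconds, dt):
    """Convert a duration in seconds to a number of tracker timesteps."""
    # Rounding absorbs floating-point noise in the division
    return round(seconds / dt)


dt = 0.1  # seconds per tick, the value assigned to the tracker above

min_ticks = seconds_to_ticks(20, dt)  # from min_duration_in_sec
slice_ticks = tuple(seconds_to_ticks(t, dt) for t in (0, 60))  # from time_slice

print(min_ticks)
print(slice_ticks)
```

With these settings a track must span at least 200 timesteps to be kept, and the retained slice covers timesteps 0 to 600.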

The following cell actually imports the datasets. 

This step might take a while. 

It needs to be performed once when converting the datasets from the raw tracker-specific format (contained in the *raw* folder) to the larvaworld format (stored in the *processed* folder). 

If the datasets have already been imported they can just be loaded (from the *processed* folder). In this case you can instead run the next cell in order to load them.

In [None]:
# Import the datasets (Needs to run only once)
ds = Jovanic_lf.import_datasets(**kws)

In [None]:
# Load the datasets (If they have been imported in a previous session)
ds = [reg.loadRef(gID) for gID in gIDs]

Now that we have the data, we can generate some plots.

We will choose from the available ones :

In [None]:
# The available plots by their unique IDs
# (printed explicitly, since only the last expression of a cell is displayed)
print(reg.graphs.ks)

# The keyword arguments for all plots
plot_kws = {"datasets": ds, "save_to": plot_dir, "show": False, "subfolder": None}

In [None]:
# The trajectories of the larvae
_ = reg.graphs.run("trajectories", **plot_kws)

In [None]:
# The trajectories of the larvae aligned at the origin, colored by the respective color of the group
_ = reg.graphs.run("trajectories", mode="origin", single_color=True, **plot_kws)
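
Aligning trajectories at the origin can be pictured as a plain translation. The sketch below assumes that alignment means subtracting each track's first position, with no rotation applied; `align_to_origin` is a hypothetical helper and not the function larvaworld uses internally.

```python
def align_to_origin(xy):
    """Translate a trajectory so that its first point sits at (0, 0)."""
    x0, y0 = xy[0]
    return [(x - x0, y - y0) for x, y in xy]


# Hypothetical trajectory as a list of (x, y) positions
track = [(2.0, 3.0), (2.5, 3.0), (3.0, 4.0)]
print(align_to_origin(track))
```

Once every track starts at the same point, differences in how far and in which direction the groups spread become directly visible in a single plot.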

In [None]:
# Boxplot of some endpoint metrics
_ = reg.graphs.run("endpoint box", **plot_kws)

In [None]:
# Composite plot summarizing exploration metrics
_ = reg.graphs.run("exploration summary", **plot_kws)

Let's say we want to compare the 3 larva groups in terms of their spatial dispersal.

We will do this in increasingly elaborate ways :

1. boxplot of dispersal during the first minute. This will capture only the endpoint situation
2. timeplot of dispersal. This will capture the dispersal timecourse (mean and variance)
3. video of trajectories aligned to originate from the center of the dish
4. combined videos of the 3 groups

In [None]:
# 1. Boxplots of dispersal (mean, final, maximum) for the first 60 seconds
_ = reg.graphs.run(
    "endpoint box", ks=["dsp_0_60_mu", "dsp_0_60_fin", "dsp_0_60_max"], **plot_kws
)
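
The three keys above refer to the mean, final and maximum dispersal within the 0-60 second window. A minimal sketch of how such metrics could be computed for one track follows, assuming dispersal is the straight-line distance from the position at the start of the window and that the mean includes the starting sample. `dispersal_metrics` and the sample track are hypothetical; larvaworld's own computation may differ in detail.

```python
import math


def dispersal_metrics(xy, dt, t0, t1):
    """Mean, final and maximum distance from the position at time t0.

    xy: list of (x, y) positions sampled every dt seconds, starting at t=0.
    """
    i0 = round(t0 / dt)
    # Clip the window to the available samples
    i1 = min(round(t1 / dt), len(xy) - 1)
    x0, y0 = xy[i0]
    dsp = [math.hypot(x - x0, y - y0) for x, y in xy[i0 : i1 + 1]]
    return {"mu": sum(dsp) / len(dsp), "fin": dsp[-1], "max": max(dsp)}


# Hypothetical track sampled at 1 Hz: out along x, then partly back
track = [(0.0, 0.0), (3.0, 0.0), (6.0, 0.0), (3.0, 4.0)]
print(dispersal_metrics(track, dt=1.0, t0=0, t1=3))
```

In this example the final dispersal (5.0) is smaller than the maximum (6.0) because the track turns back towards its starting point, which is why the boxplot reports both.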

In [None]:
# 2. Dispersal of larvae from their starting point. The default time range is 0-40 seconds.
_ = reg.graphs.run("dispersal", **plot_kws)

In [None]:
# 2. Dispersal of larvae from their starting point. Now the plotted time range is 0-60 seconds.
_ = reg.graphs.run("dispersal", range=(0, 60), **plot_kws)

In [None]:
# 2. Summary of dispersal of larvae from their starting point. The default time range is 0-40 seconds.
_ = reg.graphs.run("dispersal summary", **plot_kws)

In [None]:
# 2. Summary of dispersal of larvae from their starting point. Now the plotted time range is 0-60 seconds.
_ = reg.graphs.run("dispersal summary", range=(0, 60), **plot_kws)

In [None]:
# 3. Run replay simulations and store videos


# A function that runs the replay simulation for a single dataset
def run_replay(d):
    # The display parameters
    screen_kws = {
        "vis_mode": "video",
        "show_display": False,
        "draw_contour": False,
        "draw_midline": False,
        "draw_centroid": False,
        "visible_trails": True,
        "save_video": True,
        "fps": 1,
        "video_file": d.id,
        "media_dir": video_dir,
    }

    # The replay configuration
    replay_conf = ReplayConf(
        transposition="origin", time_range=(0, 60), track_point=d.c.point_idx
    ).nestedConf

    rep = sim.ReplayRun(
        dataset=d, parameters=replay_conf, id=f"{d.refID}_replay", screen_kws=screen_kws
    )
    # print(rep.refDataset.color)
    _ = rep.run()

In [None]:
# 3. Run the replay simulation for each dataset
# (run_replay returns nothing, so there is no result to assign)
for d in ds:
    run_replay(d)

In [None]:
# 4. Combine the videos
from larvaworld.lib.util.combining import combine_videos

combine_videos(file_dir=video_dir, save_as="3conditions.mp4")