# Importing experimental data

This notebook illustrates how experimental data are imported into *larvaworld*, and introduces the supporting classes and configuration structure.

Initialize the larvaworld registry (imported below as `reg`). This loads some components from disk and builds the rest on the fly.

We also set `VERBOSE=1` to get more detailed output.

In [None]:
%load_ext param.ipython
import panel as pn
pn.extension()

# You might have to install this module to run pn.Param
# !pip install jupyter_bokeh

import larvaworld
from larvaworld.lib import reg

larvaworld.VERBOSE = 1

Raw data come in diverse lab-specific formats. We start with the *LabFormat* class, which describes such a format.

In [None]:
from larvaworld.lib.reg.generators import LabFormat

%params LabFormat

Let's generate a new instance:

In [None]:
lf_new = LabFormat(labID="MyLab")
print(f"An instance of {lf_new.__class__}")


%params lf_new

Stored instances of the *LabFormat* class are available through the configuration registry.

The registry for this class is retrieved from `reg.conf`, a dictionary of registry objects, under the *LabFormat* key.

In [None]:
LFreg = reg.conf.LabFormat

Each lab-specific data-format configuration is stored in the registry's dictionary under a unique ID.

Let's print the IDs:

In [None]:
lfIDs = LFreg.confIDs
print(f"The IDs of the stored LabFormat configurations are: {lfIDs}")

# The registry is backed by a nested dictionary:
LFdict = LFreg.dict

# The path where the dictionary is stored:
print(LFreg.path_to_dict)


# The configuration IDs are the keys; each maps to a nested dictionary:
lfID = lfIDs[0]
lf0_entry = LFdict[lfID]
print()
print(f"An instance of {lf0_entry.__class__.__name__}")

# The configuration dictionary can also be retrieved directly with:
lf0_entry2 = LFreg.getID(lfID)
print()
print(lf0_entry == lf0_entry2)
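To make this access pattern concrete, here is a minimal toy stand-in written only for illustration. `ToyConfRegistry` and `ToyLabFormat` are hypothetical names, their fields are made up, and this is not how *larvaworld* implements its registry. The sketch mirrors the calls used in this notebook: `confIDs` lists the IDs, `dict` is the nested dictionary, `getID` returns an entry and `get` builds a configuration object from it.

```python
from copy import deepcopy
from dataclasses import dataclass, field


@dataclass
class ToyLabFormat:
    # Hypothetical fields, for illustration only
    labID: str
    tracker: dict = field(default_factory=dict)


class ToyConfRegistry:
    """Hypothetical registry: configurations stored in a dict keyed by unique ID."""

    def __init__(self, factory, entries):
        self.factory = factory      # class used to build configuration objects
        self.dict = dict(entries)   # ID -> nested configuration dictionary

    @property
    def confIDs(self):
        # Stored IDs, in insertion order
        return list(self.dict)

    def getID(self, ID):
        # A deep copy, so that editing the result leaves the stored entry untouched
        return deepcopy(self.dict[ID])

    def get(self, ID):
        # Build a configuration object from the stored dictionary
        return self.factory(**self.getID(ID))


toy_reg = ToyConfRegistry(ToyLabFormat, {
    "LabA": {"labID": "LabA", "tracker": {"fps": 16}},
    "LabB": {"labID": "LabB", "tracker": {"fps": 10}},
})
print(toy_reg.confIDs)                                # ['LabA', 'LabB']
print(toy_reg.getID("LabA") == toy_reg.dict["LabA"])  # True
print(type(toy_reg.get("LabB")).__name__)             # ToyLabFormat
```

Returning a copy from `getID` is a design choice of this sketch; whether *larvaworld* copies or shares its stored entries is not shown in this notebook.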

In [None]:
# The configuration object can be retrieved directly with:
lf0 = LFreg.get(lfID)
print(f"The object under the ID : {lfID} is an instance of {lf0.__class__.__name__}")
print()

%params lf0

In [None]:
# The configuration object can be visualized interactively with:
pn.Param(lf0)

In [None]:
# The configuration dictionary can also be retrieved from the object itself:
lf0_entry3 = lf0.nestedConf

# as well as the parameter keys:
print(lf0.param_keys)
print()

# The path where the lab's data are stored:
print(lf0.path)
# print(lf0.raw_folder)

Let's inspect one specific lab-format configuration, *Schleyer*, starting with its tracker parameters:

In [None]:
id = "Schleyer"
Schleyer_lf = LFreg.get(id)

%params Schleyer_lf.tracker

Raw and imported experimental data, as well as simulated data, are stored at fixed locations in the file structure that can be accessed easily. For experimental data, each lab format has its own dedicated directory:

In [None]:
print(f"All data are stored here:\n{larvaworld.DATA_DIR}\n")

print(f"The path to the data of the {id} lab-format:\n{Schleyer_lf.path}\n")

print(f"Raw data to be imported should be stored here (unless otherwise specified):\n{Schleyer_lf.raw_folder}\n")

print(f"Imported/processed data will be stored here (unless otherwise specified):\n{Schleyer_lf.processed_folder}")
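The printed locations follow a fixed per-lab layout. The sketch below reproduces that idea with `pathlib`, assuming (hypothetically) that each lab folder holds `raw` and `processed` subfolders and that a relative `parent_dir` passed to the import is resolved inside the raw folder, as the import cells further down suggest. `toy_lab_paths` and `resolve_parent_dir` are illustrative helpers and the folder names are placeholders; the real locations are the ones printed above.

```python
from pathlib import Path


def toy_lab_paths(data_dir, lab_folder):
    # Hypothetical layout: <data_dir>/<lab_folder>/{raw,processed}
    lab_path = Path(data_dir) / lab_folder
    return {
        "path": lab_path,
        "raw_folder": lab_path / "raw",
        "processed_folder": lab_path / "processed",
    }


def resolve_parent_dir(raw_folder, parent_dir):
    # Assumption: a relative parent_dir is looked up inside raw_folder.
    # On POSIX systems an absolute parent_dir replaces raw_folder entirely (pathlib semantics).
    return Path(raw_folder) / parent_dir


toy_paths = toy_lab_paths("/tmp/larvaworld_data", "MyLab")
print(resolve_parent_dir(toy_paths["raw_folder"], "exploration/dish01").as_posix())
# /tmp/larvaworld_data/MyLab/raw/exploration/dish01
```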

Now we can import some datasets. Importing converts the data from the native lab-specific format to the *larvaworld* format while filtering and selecting specific entries (for example, by minimum track duration or maximum number of larvae).

Two cases are illustrated here:
 - Tracks from a single dish
 - Merged tracks from all dishes under a certain directory

The import returns a *LarvaDataset* instance that can then be used directly.

By default, the dataset is not stored to disk unless *save_dataset = True* is specified.
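The selection step can be pictured with the toy functions below. They are a hypothetical sketch, not *larvaworld*'s import code: track durations are given directly in seconds, tracks shorter than `min_duration_in_sec` are dropped, and in the merged case tracks from all dishes are pooled before at most `max_Nagents` of them are kept. Keeping the *longest* tracks is an assumed rule chosen for illustration; the real selection criterion may differ.

```python
def select_tracks(durations, min_duration_in_sec, max_Nagents=None):
    """Toy track selection (hypothetical).

    durations: mapping larva ID -> track duration in seconds.
    """
    # Drop tracks shorter than the minimum duration
    kept = {k: d for k, d in durations.items() if d >= min_duration_in_sec}
    if max_Nagents is not None:
        # Assumed rule: keep the longest max_Nagents tracks
        longest = sorted(kept, key=kept.get, reverse=True)[:max_Nagents]
        kept = {k: kept[k] for k in longest}
    return kept


def merge_dishes(dishes, min_duration_in_sec, max_Nagents=None):
    # Prefix IDs with the dish name so larvae from different dishes stay distinct
    pooled = {
        f"{dish}:{larva}": dur
        for dish, tracks in dishes.items()
        for larva, dur in tracks.items()
    }
    return select_tracks(pooled, min_duration_in_sec, max_Nagents)


toy_dishes = {
    "dish01": {"L1": 95.0, "L2": 60.0, "L3": 180.0},
    "dish02": {"L1": 130.0, "L2": 240.0},
}
print(select_tracks(toy_dishes["dish01"], 90))
# {'L1': 95.0, 'L3': 180.0}
print(merge_dishes(toy_dishes, 120, max_Nagents=2))
# {'dish02:L2': 240.0, 'dish01:L3': 180.0}
```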

In [None]:
# Single dish case
folder = "dish01"
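kws1 = {
    "parent_dir": f"exploration/{folder}",   # single dish, inside the raw-data folder
    "min_duration_in_sec": 90,               # minimum track duration (seconds)
    "id": folder,                            # ID of the resulting dataset
    "refID": f"exploration.{folder}",        # reference ID of the dataset
    "group_id": "exploration"                # experimental group label
}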

d1 = Schleyer_lf.import_dataset(**kws1)

In [None]:
# Merged case
N = 40
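kws2 = {
    "parent_dir": "exploration",             # directory whose dishes are merged
    "merged": True,                          # pool tracks from all dishes under parent_dir
    "max_Nagents": N,                        # maximum number of larvae to keep
    "min_duration_in_sec": 120,              # minimum track duration (seconds)
    "refID": f"exploration.{N}controls",     # reference ID of the dataset
    "group_id": "exploration",               # experimental group label
}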

d2 = Schleyer_lf.import_dataset(**kws2)


In [None]:
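from IPython.display import display

print(f"The import method returns an instance of {d1.__class__.__name__} with the ID: {d1.id}\n")

# Unpack the dataset; s and e hold the timeseries and endpoint data
s, e, c = d1.data

print("The timeseries data (dropping NaNs):\n")
# display() is needed because only the last expression of a cell is rendered
display(s.dropna().head())

print("The endpoint data:\n")
display(e)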
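As a rough picture of what `s` and `e` look like, the sketch below builds toy pandas frames. It assumes, for illustration only, a timeseries indexed by (`Step`, `AgentID`) with NaN wherever a larva is not tracked, and an endpoint table with one row per larva. Deriving `cum_dur` as the number of tracked steps times an assumed timestep `dt` is a simplification of this sketch, not necessarily how *larvaworld* computes it.

```python
import numpy as np
import pandas as pd

dt = 0.1  # assumed timestep in seconds

# Toy timeseries: one row per (Step, AgentID); Larva_2 is untracked for the first two steps
toy_index = pd.MultiIndex.from_product(
    [range(4), ["Larva_1", "Larva_2"]], names=["Step", "AgentID"]
)
toy_s = pd.DataFrame(
    {"x": [0.0, np.nan, 0.1, np.nan, 0.2, 1.0, 0.3, 1.1]}, index=toy_index
)

# Toy endpoint table: one row per larva, here only the tracked duration in seconds
tracked_steps = toy_s["x"].notna().groupby(level="AgentID").sum()
toy_e = pd.DataFrame({"cum_dur": tracked_steps * dt})

print(toy_s.dropna().head())  # first 5 of the 6 rows without NaNs
print(toy_e)                  # Larva_1: 0.4, Larva_2: 0.2
```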

Now we will illustrate the import functionality on a publicly available dataset of *Drosophila* larva locomotion.

Go to the website below, download the zipped file and extract it into the lab-specific folder printed by the next cell.

In [None]:
# URL of the repository. Visit for further information.
link2repo = "https://doi.gin.g-node.org/10.12751/g-node.5e1ifd/"

# The name of the zipped file to be downloaded.
filename = "Naive_Locomotion_Drosophila_Larvae.zip"

# URL of the file.
link2data = f"https://gin.g-node.org/MichaelSchleyer/Naive_Locomotion_Drosophila_Larvae/src/master/{filename}"

# Subdirectory of the raw-data folder into which the downloaded file is extracted
dirname = "naive"
print(f"The path to extract the downloaded file to:\n{Schleyer_lf.raw_folder}/{dirname}\n")
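The download and extraction can also be scripted. The sketch below uses only the standard library (`urllib.request.urlretrieve`, `zipfile.ZipFile.extractall`); `download_file` and `extract_archive` are hypothetical helpers, not part of *larvaworld*. The repository link above points to a web page, so the direct-download URL of the archive may differ from `link2data`, an assumption worth checking on the site. The demo at the end runs on a throwaway archive and needs no network access.

```python
import tempfile
import urllib.request
import zipfile
from pathlib import Path


def download_file(url, dest):
    # Hypothetical helper: save the resource at url to dest (requires network access).
    # url must point at the raw archive, not at a repository web page.
    dest = Path(dest)
    dest.parent.mkdir(parents=True, exist_ok=True)
    urllib.request.urlretrieve(url, dest)
    return dest


def extract_archive(zip_path, target_dir):
    # Unpack all members into target_dir and return the names found there.
    # Only extract archives from sources you trust.
    target_dir = Path(target_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(target_dir)
    return sorted(p.name for p in target_dir.iterdir())


# Demo on a throwaway archive (no download involved)
demo_dir = Path(tempfile.mkdtemp())
demo_zip = demo_dir / "toy.zip"
with zipfile.ZipFile(demo_zip, "w") as zf:
    zf.writestr("box1/data.csv", "a,b\n1,2\n")
print(extract_archive(demo_zip, demo_dir / "naive"))  # ['box1']
```

With the real archive downloaded, a call such as `extract_archive(filename, f"{Schleyer_lf.raw_folder}/{dirname}")` would unpack it at the path printed above. Check afterwards that the dish folders (e.g. `box1-2017-05-18_14_48_22`) sit directly under `naive`, in case the archive contains an extra top-level folder.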


In [None]:
# Single dish case
folder = "box1-2017-05-18_14_48_22"
id = "imported_single_dish"
kws = {
    "parent_dir": f"{dirname}/{folder}",
    "min_duration_in_sec": 120,
    "id": id,
    "refID": f"{dirname}.{id}",
    "group_id": dirname
}

d6 = Schleyer_lf.import_dataset(**kws)

In [None]:
# Cumulative duration of each imported track, in ascending order
d6.e.cum_dur.sort_values()

In [None]:
# Merged case
N = 50
kws2 = {
    "parent_dir": dirname,
    "merged": True,
    "max_Nagents": N,
    "min_duration_in_sec": 160,
    "refID": f"{dirname}.{N}controls",
    "group_id": dirname,
}

d100 = Schleyer_lf.import_dataset(**kws2)

d100.e.cum_dur.sort_values()
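After an import it can be worth checking that the filters behaved as intended. The helper below is a hypothetical sketch: it assumes that `cum_dur` holds per-larva track durations in seconds and that `min_duration_in_sec` and `max_Nagents` bound them directly, which this notebook does not confirm. It raises if more larvae than `max_Nagents` are present and otherwise returns the IDs of tracks shorter than the threshold.

```python
import pandas as pd


def check_import(cum_dur, min_duration_in_sec, max_Nagents=None):
    # Hypothetical sanity check; assumes cum_dur is a per-larva Series in seconds
    if max_Nagents is not None and len(cum_dur) > max_Nagents:
        raise ValueError(f"{len(cum_dur)} larvae exceed max_Nagents={max_Nagents}")
    # IDs of tracks shorter than the requested minimum duration
    return cum_dur[cum_dur < min_duration_in_sec].index.tolist()


# Toy values standing in for d100.e.cum_dur
toy_cum_dur = pd.Series({"Larva_1": 170.5, "Larva_2": 158.0, "Larva_3": 301.2})
print(check_import(toy_cum_dur, 160, max_Nagents=50))  # ['Larva_2']
```

On the real dataset, `check_import(d100.e.cum_dur, 160, max_Nagents=N)` would return an empty list if the assumptions above hold.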