# Basic workflow

In [2]:
from pykanto.utils.paths import pykanto_data, ProjDirs
from pykanto.dataset import SongDataset
from pykanto.parameters import Parameters

We are going to load one of the very small sample datasets that are packaged with `pykanto`; this will be enough for a first test to familiarise yourself with the package. See [working with paths and directories](../contents/2_paths-and-dirs.md) to learn how to load your own data.

This particular dataset consists of a few songs from two male great tits (_Parus major_) in [my study population](http://wythamtits.com/), Wytham Woods, Oxfordshire, UK.

In [3]:
DATASET_ID = "GREAT_TIT"
DIRS = pykanto_data(dataset=DATASET_ID)
print(DIRS)



Items held:

PROJECT: /home/nilomr/projects/pykanto/pykanto
DATA: /home/nilomr/projects/pykanto/pykanto/data
RAW_DATA: /home/nilomr/projects/pykanto/pykanto/data/segmented/great_tit
SEGMENTED: /home/nilomr/projects/pykanto/pykanto/data/segmented/great_tit
RESOURCES: /home/nilomr/projects/pykanto/pykanto/resources
REPORTS: /home/nilomr/projects/pykanto/pykanto/reports
FIGURES: /home/nilomr/projects/pykanto/pykanto/reports/figures


Now we can create a `SongDataset` object, which is the main class in `pykanto` and acts as a sort of database.

In [None]:

params = Parameters()  # Using default parameters for simplicity, which you shouldn't do!
dataset = SongDataset(DATASET_ID, DIRS, parameters=params, overwrite_dataset=True)
dataset.vocs.head()

We now have an object `dataset`, which is an instance of the `SongDataset` class and has all of its methods. For example, you might want to segment your songs into discrete notes using `pykanto`'s algorithm, which is a simple amplitude-based method that works reasonably well (based on Tim Sainburg's [vocalseg](https://github.com/timsainb/vocalization-segmentation) and Robert Lachlan's de-echoing method in [Luscinia](https://rflachlan.github.io/Luscinia/)).

In [None]:
# Segment:
dataset.segment_into_units()

# Plot an example:
for vocalisation in dataset.vocs.index[:1]:
    dataset.plot_voc_seg(vocalisation)
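
To give a feel for what "amplitude-based" segmentation means, here is a minimal, self-contained sketch of the general idea: mark a unit wherever an amplitude envelope stays above a threshold. This is not `pykanto`'s implementation (which works on spectrograms and includes de-echoing); the function name `segment_by_amplitude`, its parameters, and the toy envelope are all hypothetical and chosen only for illustration.

```python
from typing import List, Tuple


def segment_by_amplitude(
    envelope: List[float], threshold: float, min_len: int = 1
) -> List[Tuple[int, int]]:
    """Return (onset, offset) index pairs where the envelope exceeds threshold.

    Offsets are exclusive. Runs shorter than `min_len` samples are dropped.
    """
    segments = []
    onset = None
    for i, value in enumerate(envelope):
        if value > threshold and onset is None:
            onset = i  # a candidate unit starts here
        elif value <= threshold and onset is not None:
            if i - onset >= min_len:
                segments.append((onset, i))
            onset = None
    # Close a unit that runs to the end of the signal
    if onset is not None and len(envelope) - onset >= min_len:
        segments.append((onset, len(envelope)))
    return segments


# Toy amplitude envelope (made up): two loud bursts separated by near-silence
envelope = [0.0, 0.1, 0.8, 0.9, 0.7, 0.1, 0.0, 0.6, 0.9, 0.05]
units = segment_by_amplitude(envelope, threshold=0.5, min_len=2)
print(units)  # [(2, 5), (7, 9)]
```

In practice the threshold and minimum unit length are the kind of settings you would expect to tune for your own recordings, which is one reason not to rely on default parameters.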

Next, you can create spectrogram representations of the units, or of the average of the units, present in the vocalisations of each individual ID in the dataset; project and cluster them; and prepare compressed representations for use with the interactive app:

In [None]:
dataset.get_units()
dataset.cluster_ids(min_sample=5)
dataset.prepare_interactive_data()

Now you can start the interactive app in your browser by simply running:

In [None]:
dataset.open_label_app()

In [None]:
# for song_level in [True, False]:
#     dataset.parameters.update(song_level=song_level)
#     dataset.get_units()

# dataset.reload()
# for song_level in [True, False]:
#     dataset.parameters.update(song_level=song_level)
#     dataset.cluster_ids(min_sample=5)

# for song_level in [True, False]:
#     dataset.parameters.update(song_level=song_level)
#     dataset.prepare_interactive_data()

# dataset.parameters.update(song_level=True)
# dataset.open_label_app()


If you need to load an existing dataset, you can do so as follows (this requires a `ProjDirs` object):

In [None]:
# import pickle
# out_dir = DIRS.DATA / "datasets" / DATASET_ID / f"{DATASET_ID}.db"
# with open(out_dir, "rb") as f:
#     dataset = pickle.load(f)
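
The commented-out lines above load the dataset with `pickle` from a path built under the data directory. As a rough illustration of that round trip, the sketch below saves and reloads an object using the same path layout. It makes two assumptions: a plain dictionary (`toy_dataset`, with made-up contents) stands in for a real `SongDataset`, and a temporary directory plays the role of `DIRS.DATA`.

```python
import pickle
import tempfile
from pathlib import Path

DATASET_ID = "GREAT_TIT"

# Hypothetical stand-in for a dataset object; a real SongDataset holds far more
toy_dataset = {"id": DATASET_ID, "note": "toy object for illustration"}

with tempfile.TemporaryDirectory() as tmp:
    # Mirror the layout used above: <DATA>/datasets/<ID>/<ID>.db
    data_dir = Path(tmp)  # plays the role of DIRS.DATA
    out_dir = data_dir / "datasets" / DATASET_ID / f"{DATASET_ID}.db"
    out_dir.parent.mkdir(parents=True)

    # Save the object to disk ...
    with open(out_dir, "wb") as f:
        pickle.dump(toy_dataset, f)

    # ... and load it back, closing the file handle when done
    with open(out_dir, "rb") as f:
        restored = pickle.load(f)

print(restored == toy_dataset)  # True
```

Because unpickling can execute arbitrary code, only load dataset files that you created yourself or obtained from a source you trust.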