Some questions about the datasets used in the paper #2

Feiyuyu0503 · 2021-11-26T02:37:18Z

1. For the datasets in the folder 'data for training treba and feature extraction', could you explain the physical meaning of each dimension? (E.g., there are 20 elements in the first dimension; what do they mean?)

2. For the datasets in the folder 'data for classification', I observed that there are two keys, 'features' and 'annotations'. The features consist of 4 three-dimensional arrays; could you explain the meaning of each dimension? And what is the relationship between the 'features' key and the 'annotations' key? (How do I establish the correspondence between the data under the two keys?)

jenjsun · 2021-11-26T19:00:35Z

Hi! Thanks for your question.

For the 20 dimensions, there are 10 for each fly (2 flies). The dimensions are described by https://github.com/neuroethology/TREBA/blob/master/util/datasets/fly_v1/core.py#L88 and let me know if you need more details. They are computed by FlyTracker (http://www.vision.caltech.edu/xpburgos/papers/ECCV14%20Eyjolfsdottir.pdf).
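To make the "10 per fly" layout concrete, here is a minimal sketch that splits each 20-value frame into two 10-value per-fly vectors. It assumes (not confirmed in this thread) that the first 10 entries belong to fly 1 and the last 10 to fly 2; check `core.py` linked above for the actual ordering and feature names. The function `split_per_fly` and the array sizes in the example are hypothetical placeholders.

```python
import numpy as np

NUM_FLIES = 2
DIMS_PER_FLY = 10


def split_per_fly(traj: np.ndarray) -> np.ndarray:
    """Reshape (..., 20) trajectory data into (..., 2, 10).

    ASSUMPTION: the 20 values are laid out as [fly 1 (10 dims), fly 2 (10 dims)].
    """
    if traj.shape[-1] != NUM_FLIES * DIMS_PER_FLY:
        raise ValueError(f"expected last axis of size 20, got {traj.shape[-1]}")
    # Row-major reshape keeps the first 10 values in slot 0 and the last 10 in slot 1.
    return traj.reshape(*traj.shape[:-1], NUM_FLIES, DIMS_PER_FLY)


# Placeholder data: 4 sequences of 21 frames, 20 values per frame.
traj = np.random.rand(4, 21, 20)
per_fly = split_per_fly(traj)
print(per_fly.shape)  # (4, 21, 2, 10)
```

`per_fly[..., 0, :]` is then the assumed fly-1 slice and `per_fly[..., 1, :]` the fly-2 slice.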
All the entries from the keys should have the same number of elements in the sequence (N = 162372 in your example). For the annotations, the dimension of size 6 corresponds to the class index. The annotation array is full of 0 (behavior not occurring) and 1 (behavior occurring) ints. For an example of loading the data, see the provided classifier script: https://github.com/neuroethology/TREBA/blob/c522e169738f5225298cd4577e5df9085130ce8a/downstream_tasks/fly_classification/fly_classification_script.py (Also see Instructions for Downstream Tasks (Behavior Classification) in README.md if you would like to run the classifier).
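As a quick sanity check on the correspondence described above, the sketch below verifies that a feature array and an (N, 6) annotation array line up frame-by-frame and counts how many frames are labeled 1 for each behavior class. It is not the provided classifier script; the function name `check_annotations` is hypothetical, and it assumes frames lie along axis 0 of both arrays. With the real files, the arrays would presumably come from something like `np.load(path, allow_pickle=True)["annotations"]`, but see the linked script for how the data is actually loaded.

```python
import numpy as np

NUM_CLASSES = 6  # from the (N, 6) annotation shape described above


def check_annotations(features: np.ndarray, annotations: np.ndarray) -> np.ndarray:
    """Check frame alignment and return, per class, the number of frames labeled 1.

    ASSUMPTION: axis 0 indexes frames in both arrays.
    """
    if annotations.ndim != 2 or annotations.shape[1] != NUM_CLASSES:
        raise ValueError(f"expected annotations of shape (N, {NUM_CLASSES}), got {annotations.shape}")
    if features.shape[0] != annotations.shape[0]:
        raise ValueError("features and annotations must have the same number of frames")
    if not np.isin(annotations, [0, 1]).all():
        raise ValueError("annotations should contain only 0/1 values")
    # Column j counts the frames in which behavior j is occurring.
    return annotations.sum(axis=0)


# Placeholder data: 5 frames with 8 feature values each.
features = np.zeros((5, 8))
annotations = np.array([
    [1, 0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 1],
    [0, 0, 0, 0, 0, 0],
])
counts = check_annotations(features, annotations)
print(counts)  # [1 2 0 0 0 1]
```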

Feiyuyu0503 · 2021-11-27T04:41:11Z

Thank you for your reply!
I also have some other questions.
1. In the datasets for classification, for example the file named 'fly_train_features.npz', why is 'features' divided into 27 elements, and why does each of the 27 elements have shape (xxxxx, 2, 50)? What is the meaning of each dimension?

2. What is the difference between the datasets for training TREBA and the datasets for classification? For example, taking the training dataset named 'fly_train_encoding' and the classification dataset named 'fly_train_features.npz', is there any relationship between the data under the 'features' key and the data in the datasets for training TREBA?

jenjsun · 2021-11-28T06:35:25Z

The 27 elements correspond to individual videos from the Fly vs. Fly dataset. The shape is sequence length x number of flies (2) x feature dimension (50). The features are from FlyTracker (http://www.vision.caltech.edu/xpburgos/papers/ECCV14%20Eyjolfsdottir.pdf). I've attached a file with the corresponding names of the 50 dims from FlyTracker:
fly_feature_names.txt
fly_train_encoding is for extracting TREBA features from a trained model (not for training). In fly_train_features we provide pre-extracted embeddings. Also, for training TREBA, fly_train_shuffled is used: https://github.com/neuroethology/TREBA/blob/master/util/datasets/fly_v1/core.py#L21.
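Given the per-video shape described above (sequence length x 2 flies x 50 features), a common way to feed such data to a frame-level classifier is to flatten each video to (sequence length, 100) and concatenate the videos along time. The sketch below does exactly that; it is an illustration, not the repository's own preprocessing. The function name `stack_videos` is hypothetical, and with the real file the input list would presumably be obtained via something like `list(np.load("fly_train_features.npz", allow_pickle=True)["features"])`, which is an assumption about how the arrays are stored.

```python
import numpy as np


def stack_videos(per_video):
    """Flatten each (T_i, 2, 50) video to (T_i, 100) and concatenate along time.

    Each flattened row holds fly 1's 50 features followed by fly 2's
    (NumPy's row-major reshape order). Returns shape (sum of T_i, 100).
    """
    flat = []
    for i, video in enumerate(per_video):
        if video.ndim != 3 or video.shape[1:] != (2, 50):
            raise ValueError(f"video {i}: expected shape (T, 2, 50), got {video.shape}")
        flat.append(video.reshape(video.shape[0], -1))
    return np.concatenate(flat, axis=0)


# Placeholder data: three short "videos" standing in for the 27 real ones.
videos = [np.random.rand(t, 2, 50) for t in (5, 7, 3)]
stacked = stack_videos(videos)
print(stacked.shape)  # (15, 100)
```

Concatenating loses the video boundaries; keep the per-video lengths (e.g. `[v.shape[0] for v in videos]`) if you need to map rows back to videos.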

jenjsun closed this as completed Nov 26, 2021
