
Some questions about the datasets used in the paper #2

Closed
Feiyuyu0503 opened this issue Nov 26, 2021 · 3 comments

Comments

@Feiyuyu0503

1. For the datasets in the folder 'data for training treba and feature extraction', could you explain the physical meaning of each dimension? (e.g. there are 20 elements in the first dimension; what do they mean?)

(screenshot)
2. For the datasets in the folder 'data for classification', I observed that there are two keys, 'features' and 'annotations'. The features consist of four three-dimensional arrays; could you explain the meaning of each dimension? And what is the relationship between the 'features' key and the 'annotations' key (how do I establish the correspondence between the data under the two keys)?
(screenshot)

@jenjsun
Contributor

jenjsun commented Nov 26, 2021

Hi! Thanks for your question.

  1. For the 20 dimensions, there are 10 for each fly (2 flies). The dimensions are described in https://github.com/neuroethology/TREBA/blob/master/util/datasets/fly_v1/core.py#L88; let me know if you need more details. They are extracted with FlyTracker (http://www.vision.caltech.edu/xpburgos/papers/ECCV14%20Eyjolfsdottir.pdf).

  2. All the entries under the keys should have the same number of elements along the sequence (N = 162372 in your example). For the annotations, the dimension of size 6 corresponds to the class index. The annotation array is filled with 0 (behavior not occurring) and 1 (behavior occurring) ints. For an example of loading the data, see the provided classifier script: https://github.com/neuroethology/TREBA/blob/c522e169738f5225298cd4577e5df9085130ce8a/downstream_tasks/fly_classification/fly_classification_script.py (also see Instructions for Downstream Tasks (Behavior Classification) in README.md if you would like to run the classifier).
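A minimal sketch of how the two keys could be lined up, based only on the description above. It assumes the `.npz` file stores `features` and `annotations` as sequences of per-video arrays whose first axis is the frame index, and that each annotation array has shape `(N, 6)` with one 0/1 column per behavior class. It also assumes, for the 20-dim TREBA training data, that the first 10 values belong to fly 1 and the last 10 to fly 2; that ordering is a guess, so check `core.py` before relying on it. All helper names (`load_split`, `check_alignment`, `frames_with_behavior`, `split_flies`) are hypothetical; the linked `fly_classification_script.py` shows how the repository actually loads the data.

```python
import numpy as np


def load_split(path):
    """Load a classification .npz file.

    ASSUMPTION: 'features' and 'annotations' are stored as object arrays of
    per-video arrays, which is why allow_pickle=True is needed.
    """
    data = np.load(path, allow_pickle=True)
    return list(data["features"]), list(data["annotations"])


def check_alignment(features, annotations, num_classes=6):
    """Assert that each video's features and annotations cover the same frames."""
    assert len(features) == len(annotations), "one annotation array per video expected"
    for i, (feat, ann) in enumerate(zip(features, annotations)):
        assert feat.shape[0] == ann.shape[0], f"video {i}: frame counts differ"
        assert ann.shape[1] == num_classes, f"video {i}: expected {num_classes} classes"
        assert np.isin(ann, (0, 1)).all(), f"video {i}: labels must be 0 or 1"


def frames_with_behavior(annotations, class_index):
    """Return (video, frame) pairs where the given behavior column is 1."""
    return [
        (video, int(frame))
        for video, ann in enumerate(annotations)
        for frame in np.flatnonzero(ann[:, class_index] == 1)
    ]


def split_flies(x):
    """Reshape a trailing axis of 20 into (2 flies, 10 features).

    ASSUMPTION: values 0-9 belong to fly 1 and values 10-19 to fly 2.
    """
    return x.reshape(*x.shape[:-1], 2, 10)


# Synthetic stand-in for one split: two videos with 5 and 3 frames.
feats = [np.zeros((5, 2, 50)), np.zeros((3, 2, 50))]
anns = [np.zeros((5, 6), dtype=int), np.zeros((3, 6), dtype=int)]
anns[1][2, 4] = 1  # behavior class 4 occurs in frame 2 of video 1
check_alignment(feats, anns)
print(frames_with_behavior(anns, 4))  # [(1, 2)]
```

Because row `t` of each annotation array labels row `t` of the matching feature array, frame-level classification only needs the two arrays paired by video and by row.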

@jenjsun jenjsun closed this as completed Nov 26, 2021
@Feiyuyu0503
Author

Thank you for your reply!
I also have some other questions.
1. In the datasets for classification, for example the file named 'fly_train_features.npz', why is the 'features' array divided into 27 elements, and why does each of the 27 elements have shape (xxxxx, 2, 50)? What is the meaning of each dimension?
(screenshot)

2. What is the difference between the datasets for training TREBA and the datasets for classification? For example, is there any relationship between the data under the 'features' key in the classification file 'fly_train_features.npz' and the TREBA training file 'fly_train_encoding'?

@jenjsun
Contributor

jenjsun commented Nov 28, 2021

  1. The 27 elements correspond to individual videos from the Fly vs. Fly dataset. The dimensions are sequence length x number of flies (2) x feature dimension (50). The features are from FlyTracker (http://www.vision.caltech.edu/xpburgos/papers/ECCV14%20Eyjolfsdottir.pdf). I've attached a file with the names of the 50 FlyTracker feature dimensions:
    fly_feature_names.txt

  2. fly_train_encoding is for extracting TREBA features from a trained model (not for training). In fly_train_features we provide pre-extracted embeddings. Also, for training TREBA, fly_train_shuffled is used: https://github.com/neuroethology/TREBA/blob/master/util/datasets/fly_v1/core.py#L21.
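A minimal sketch of turning the 27 per-video arrays of shape (sequence length, 2 flies, 50 features) into one frame-by-feature matrix, while keeping a video id per row so frames can be traced back. The function names are hypothetical, and `read_feature_names` assumes `fly_feature_names.txt` lists one feature name per line, in the same order as the last axis; that file format is a guess, not something confirmed in this thread.

```python
import numpy as np


def stack_videos(videos):
    """Concatenate per-video (T, 2, 50) arrays into one (sum of T, 100) matrix.

    Returns the matrix and a video-id vector with one entry per row.
    """
    rows, ids = [], []
    for vid, arr in enumerate(videos):
        t, n_flies, n_feat = arr.shape
        # C-order reshape: fly 0's features first, then fly 1's.
        rows.append(arr.reshape(t, n_flies * n_feat))
        ids.append(np.full(t, vid))
    return np.concatenate(rows, axis=0), np.concatenate(ids)


def column_names(feature_names, n_flies=2):
    """Name each flattened column 'fly{k}/{feature}', matching stack_videos."""
    return [f"fly{k}/{name}" for k in range(n_flies) for name in feature_names]


def read_feature_names(path):
    """ASSUMPTION: the file has one feature name per line; blank lines are skipped."""
    with open(path) as fh:
        return [line.strip() for line in fh if line.strip()]


# Synthetic stand-in: two videos with 4 and 6 frames.
videos = [np.ones((4, 2, 50)), np.zeros((6, 2, 50))]
X, vid = stack_videos(videos)
print(X.shape, np.bincount(vid))  # (10, 100) [4 6]
```

Flattening the fly axis this way keeps the per-fly grouping recoverable: column `k * 50 + j` is feature `j` of fly `k`.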
