Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add feature vector dataloader #291

Open
wants to merge 26 commits into
base: main
Choose a base branch
from

Conversation

xianyuanliu
Copy link
Member

@xianyuanliu xianyuanliu commented Jan 22, 2022

Description

Update action_dann_lightn example with a dataloader of feature vector input. This dataloader is used in the EPIC challenge.
The trainer and models for feature vectors will be updated in the next PR after this one is merged. Coverage in video.py will be improved in the next PR because of most parts are used in training.
Changes are summarized below.

  1. Add EPIC100DatasetAccess
  2. Add new hyperparameters: CLASS_TYPE, INPUT_TYPE and NUM_SEGMENTS
  3. Improve video-related trainers and ClassNetVideo for multi-class output.

Status

Ready

Types of changes

  • Non-breaking change (fix or new feature that would not break existing functionality).
  • Breaking change (fix or new feature that would cause existing functionality to change).
  • New tests added to cover the changes.
  • In-line docstrings updated.
  • Source for documentation at docs manually updated for new API.

@xianyuanliu xianyuanliu marked this pull request as draft January 22, 2022 14:14
@github-actions github-actions bot added this to In progress in v0.1.0 Jan 22, 2022
@xianyuanliu xianyuanliu added new feature New feature/module (including request) work-in-progress Work in progress that should NOT be merged labels Jan 22, 2022
@codecov-commenter
Copy link

codecov-commenter commented Jan 23, 2022

Codecov Report

Merging #291 (2773771) into main (3706d7c) will decrease coverage by 0.39%.
The diff coverage is 80.92%.

@@            Coverage Diff             @@
##             main     #291      +/-   ##
==========================================
- Coverage   92.96%   92.56%   -0.40%     
==========================================
  Files          45       45              
  Lines        4622     4724     +102     
==========================================
+ Hits         4297     4373      +76     
- Misses        325      351      +26     
Impacted Files Coverage Δ
kale/loaddata/videos.py 73.18% <51.85%> (-13.48%) ⬇️
kale/loaddata/video_access.py 97.05% <94.91%> (-0.86%) ⬇️
kale/embed/video_feature_extractor.py 96.66% <100.00%> (+0.05%) ⬆️
kale/pipeline/video_domain_adapter.py 96.93% <100.00%> (ø)
kale/predict/class_domain_nets.py 91.89% <100.00%> (+1.89%) ⬆️
kale/prepdata/video_transform.py 88.00% <100.00%> (+1.04%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 3706d7c...2773771. Read the comment docs.

@xianyuanliu xianyuanliu marked this pull request as ready for review January 23, 2022 11:27
@xianyuanliu xianyuanliu removed the work-in-progress Work in progress that should NOT be merged label Jan 23, 2022
@@ -16,15 +16,17 @@
# Dataset
# -----------------------------------------------------------------------------
_C.DATASET = CN()
_C.DATASET.ROOT = "I:/Datasets/EgoAction/" # "/shared/tale2/Shared"
_C.DATASET.ROOT = "J:/Datasets/EgoAction/" # "/shared/tale2/Shared"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Avoid using absolute path in examples

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It may be necessary because the data is stored somewhere else (e.g. /Shared/data), not in pykale.

Copy link
Member

@shuo-zhou shuo-zhou Feb 7, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to use public dataset for demonstration? Do you expect users to change the .yaml configurations and download the data themselves?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It uses public video datasets (e.g. EPIC KITCHENS), but each is large (>10G). Therefore, users need to download and prepare data following dataset instructions before running this example. Then, of course, they need to change configurations fit for data.

Reasons that this example does not contain video dataset downloading are as follows:

  1. Some datasets provide tools or packages to boost downloading speed, but no need to add these into our pykale.
  2. Pykale and data may not be in the same hardware due to the datasets' large-scale.

I may update the readme to describe the dataset format. We may add video dataset downloading in the future when we reduce dataset storage by using some dataset tools (e.g. Hub).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Examples are used for demonstration only. It should not be run on large, and full dataset. Is there any small and public dataset can be used?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with Shuo. PyKale examples should be relatively easy to run. Large-scale data experiment can be created on separate repo under PyKale or your personal repo. We can discuss tomorrow.

# ---- setup dataset ----
seed = cfg.SOLVER.SEED
source, target, num_classes = VideoDataset.get_source_target(
source, target, dict_num_classes = VideoDataset.get_source_target(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the reason of renaming num_classes as dict_num_classes? It is not consistent with the variable names in other examples.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason is that num_classes is a dictionary with a multi-task label (verb and noun). Other examples do not consider this. But I think no need to rename num_classes now. I will change it back.

@haipinglu haipinglu removed this from In progress in v0.1.0 Aug 11, 2022
@haipinglu haipinglu added this to In progress in v0.2.0 via automation Aug 11, 2022
@github-actions
Copy link
Contributor

github-actions bot commented Oct 7, 2022

This pull request has been automatically marked as stale due to lack of activity.

@github-actions github-actions bot added the Stale label Oct 7, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
new feature New feature/module (including request) Stale
Projects
Status: In progress
v0.2.0
In progress
Development

Successfully merging this pull request may close these issues.

None yet

4 participants