Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Manifest #2763

Merged
merged 37 commits into from
Mar 24, 2021
Merged

Manifest #2763

merged 37 commits into from
Mar 24, 2021

Conversation

Marishka17
Copy link
Contributor

@Marishka17 Marishka17 commented Feb 2, 2021

Motivation and context

  • Reducing task creation time for some case
  • Unified way of working with different data types with prepared meta
  • Helps to work with remote sources in the future:
    • validity of the received data
    • reducing of traffic between data storage and server

How has this been tested?

manually, tests

Checklist

License

  • I submit my code changes under the same MIT License that covers the project.
    Feel free to contact the maintainers if that's a concern.
  • I have updated the license header for each file (see an example below)
# Copyright (C) 2021 Intel Corporation
#
# SPDX-License-Identifier: MIT

cvat/apps/engine/cache.py Outdated Show resolved Hide resolved
cvat/apps/engine/cache.py Outdated Show resolved Hide resolved
@@ -25,6 +25,8 @@
ImageFile.LOAD_TRUNCATED_IMAGES = True

from cvat.apps.engine.mime_types import mimetypes
from utils.dataset_manifest import VManifestManager, IManifestManager
from utils.dataset_manifest.core import WorkWithVideo
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest renaming this class to something more descriptive.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Marishka17 , a class name shouldn't start with a verb. It should describe a type of entity. Could you please rename?

cvat/apps/engine/media_extractors.py Outdated Show resolved Hide resolved
cvat/apps/engine/media_extractors.py Outdated Show resolved Hide resolved
cvat/apps/engine/task.py Outdated Show resolved Hide resolved
utils/dataset_manifest/core.py Outdated Show resolved Hide resolved
utils/dataset_manifest/core.py Outdated Show resolved Hide resolved
utils/dataset_manifest/core.py Outdated Show resolved Hide resolved
utils/dataset_manifest/core.py Show resolved Hide resolved
@coveralls
Copy link

coveralls commented Feb 20, 2021

Coverage Status

Coverage decreased (-0.4%) to 74.854% when pulling 71d1b9a on mk/manifest into 4f7b1f9 on develop.

@azhavoro
Copy link
Contributor

@dvkruchinin Could you please take a look at pylint check? It was failed for a reason.

@dvkruchinin
Copy link
Contributor

Could you please take a look at pylint check? It was failed for a reason.

Sure. I`ll take a look.

@dvkruchinin
Copy link
Contributor

@azhavoro I have prepared a PR #2858 to solve this error.

@Marishka17 Marishka17 changed the title [WIP] Manifest Manifest Feb 24, 2021
# Copyright (C) 2021 Intel Corporation
#
# SPDX-License-Identifier: MIT
from .core import prepare_meta, VideoManifestManager, ImageManifestManager
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will vote for renaming prepare_meta. I don't like the module name. There are multiple variants like dataset_manifest, meta_info, meta_data, etc.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also it makes sense to move prepare_meta, VideoManifestManager, ImageManifestManager into the module and use dataset_manifest inside core.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure that I understand correctly.

  1. prepare_meta - It is a function that responsible for preparing meta information, which later is used to create a manifest. Why it should be dataset_manifest, meta_info ... ?

@nmanovic
Copy link
Contributor

@Marishka17 , comments about the command line:

  • Let's make dataset_directory positional arugment as an option --output-dir with the current directory as its default value.
  • Don't analyze the whole video to print a warning about force. Need to adjust the strategy.
  • Need a progress bar. For long videos users will not understand what is going on. Let's do that in the PR. Let me know if it takes too much time to implement.

@zhiltsov-max
Copy link
Contributor

tqdm library can help with progress bars.

try:
meta_info = prepare_meta(data_type='video', media_file=source, force=args.force)
except AssertionError as ex:
if str(ex) == 'Too few keyframes':
msg = 'NOTE: prepared manifest file contains too few key frames for smooth decoding.\n' \
'Use --force flag if you still want to prepare a manifest file.'
print(msg)
sys.exit(0)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe it should exit with 2 or something like this?

@Marishka17 Marishka17 mentioned this pull request Mar 22, 2021
2 tasks
{"type":"video"}
{"properties":{"name":"video.mp4","resolution":[1280,720],"length":778}}
{"number":0,"pts":0,"checksum":"17bb40d76887b56fe8213c6fded3d540"}
{"number":135,"pts":486000,"checksum":"9da9b4d42c1206d71bf17a7070a05847"}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Marishka17 , Could you please describe how to interpret these data? Just imagine that somebody wants to write code to use these data. Give the user enough information and a couple of sentences. I don't think that you need to describe every parameter, but pts and checksum we should.

Will checksum depends on environment (codec, OS) for video files?

@@ -25,6 +25,8 @@
ImageFile.LOAD_TRUNCATED_IMAGES = True

from cvat.apps.engine.mime_types import mimetypes
from utils.dataset_manifest import VManifestManager, IManifestManager
from utils.dataset_manifest.core import WorkWithVideo
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Marishka17 , a class name shouldn't start with a verb. It should describe a type of entity. Could you please rename?


for packet in container.demux(video_stream):
for frame in packet.decode():
assert frame.pict_type.name == 'I', 'First frame is not key frame'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Marishka17 , Video is a user input. Please raise an exception instead of assert. Let's treat it as a bad input data (HTTP 400).

)
return frame.width, frame.height

class AnalyzeVideo(WorkWithVideo):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Marishka17 , What is a reason to split WorkWithVideo and AnalyzeVideo?

Could you please rename class names? Don't start them with a verb. For example, VideoStreamReader or something like that.


frame_pts, frame_dts = frame.pts, frame.dts

class PrepareImageInfo:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please rename? Also probably it is a part of a bigger class which works with images.

def content(self):
return self._content

class PrepareVideoInfo(WorkWithVideo):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Marishka17 , I believe it is a part of VideoStreamReader.

with closing(av.open(self.source_path, mode='r')) as container:
self.width, self.height = self._get_frame_size(container)

def get_task_size(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
def get_task_size(self):
def get_size(self):

meta_info.create()
return meta_info

def prepare_meta(data_type, **kwargs):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Marishka17 , can it be just a method of ImageManifestManager and VideoManifestManager? What do you think?

def partial_update(self, number, properties):
pass

#TODO:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Marishka17, please leave an appropriate comment here or remove TODO.

@nmanovic nmanovic merged commit 6c38ad0 into develop Mar 24, 2021
@nmanovic nmanovic deleted the mk/manifest branch March 24, 2021 10:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants