Any way to avoid copying and compressing files when creating a new task? #204

Open
Paperone80 opened this issue Nov 21, 2018 · 4 comments

Paperone80 commented Nov 21, 2018

Hi,

Is there any way to avoid copying and compressing images when a new task is created with Source: Shared? Also, the source is bind-mounted read-only, so it shouldn't overwrite anything.
I'm not sure what the reason is, but I have some high-resolution imagery and I am losing the important details I need to set proper attributes and polygon masks. It also takes up unnecessary time and disk space. Thanks.

I am using the CVAT GitHub version from 2018-11-19.

gzvulon commented Jan 2, 2019

Same here. I have a lot of HD images on a network share and on S3.
I'd like to map them to the /share directory and use only links, without copying any data.

vfdev-5 (Contributor) commented Sep 13, 2019

Any updates on this?

nmanovic (Collaborator) commented Sep 13, 2019

@vfdev-5, we are going to reimplement the way we serve data from the server (https://github.com/opencv/cvat/tree/az/video_stream). I hope to see that functionality merged in a month or so. Once it is merged, we will think about how to implement this feature. It will probably be possible to provide data in a predefined format. I hope to see the feature in v1.0.0, but we cannot promise.

nmanovic (Collaborator) commented Dec 24, 2019

Some notes for future reference.

Pipeline for using the original data:

  1. Prepare the data in a format that CVAT can understand and put it on your remote storage:
    • A directory with images.
    • Use a CVAT script (to be provided) to prepare the data in the right format (chunks).
    • Use a CVAT script (to be provided) to prepare the data for "protected" access (e.g. S3 with credentials). In this case, instead of the original data, the user uploads text files containing links to the original data and meta information about it.
  2. CVAT will use the remote storage as is and will not try to copy files internally. There are three main use cases:
    • For a directory with images, CVAT will convert the images into its "own format" on the fly and cache the result using DiskCache (http://www.grantjenks.com/docs/diskcache/) in a temporary directory, so subsequent accesses should be fast and the storage size will be limited.
    • If the data was prepared with the CVAT script (i.e. remote links to the required data), CVAT serves the original data as is. The user has already prepared the original data, so nothing else needs to be done.
    • For the compressed version of that data, CVAT needs to compress the chunks on the fly and cache them using DiskCache (http://www.grantjenks.com/docs/diskcache/) in a temporary directory, so subsequent accesses should be fast and the storage size will be limited. Because the original data was prepared as small chunks, producing the compressed data should be fast enough.
  3. cvat-data will be responsible for accepting the prepared data and transforming it into actual images with meta information:
    • chunkN.zip with images will be unzipped
    • chunkN.mp4 with video frames will be decoded
    • chunkN.txt with links to data will be converted to images if credentials are provided by the client.
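Step 1 of the notes can be sketched roughly as follows. This is a minimal illustration under assumptions, not the CVAT script the notes say will be provided: the function names `make_zip_chunks` and `make_link_chunks`, the set of accepted image extensions, and the chunk size are hypothetical, and the `chunkN.zip` / `chunkN.txt` naming is borrowed from step 3. It packs a directory of images into fixed-size ZIP chunks without recompressing them and writes links (e.g. to S3 objects) into text chunks. The meta information mentioned in the notes is left out.

```python
import os
import zipfile
from typing import Iterator, List, Sequence

# Assumption: which files count as images is not specified in the notes.
IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".bmp", ".tif", ".tiff"}


def _batches(items: Sequence[str], size: int) -> Iterator[Sequence[str]]:
    # Split items into consecutive groups of at most `size` elements.
    for start in range(0, len(items), size):
        yield items[start:start + size]


def make_zip_chunks(image_dir: str, out_dir: str, chunk_size: int) -> List[str]:
    """Hypothetical helper: pack images into chunk0.zip, chunk1.zip, ..."""
    names = sorted(
        n for n in os.listdir(image_dir)
        if os.path.splitext(n)[1].lower() in IMAGE_EXTS
    )
    os.makedirs(out_dir, exist_ok=True)
    paths = []
    for n, batch in enumerate(_batches(names, chunk_size)):
        path = os.path.join(out_dir, f"chunk{n}.zip")
        # ZIP_STORED copies the original bytes: no re-encoding, no quality loss.
        with zipfile.ZipFile(path, "w", compression=zipfile.ZIP_STORED) as zf:
            for name in batch:
                zf.write(os.path.join(image_dir, name), arcname=name)
        paths.append(path)
    return paths


def make_link_chunks(urls: Sequence[str], out_dir: str, chunk_size: int) -> List[str]:
    """Hypothetical helper: write one link per line into chunk0.txt, chunk1.txt, ..."""
    os.makedirs(out_dir, exist_ok=True)
    paths = []
    for n, batch in enumerate(_batches(list(urls), chunk_size)):
        path = os.path.join(out_dir, f"chunk{n}.txt")
        with open(path, "w", encoding="utf-8") as f:
            f.write("\n".join(batch) + "\n")
        paths.append(path)
    return paths
```

Storing rather than deflating is a deliberate choice here: typical image formats are already compressed, and the point of the issue is to keep the original pixels intact.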
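The on-the-fly conversion with bounded caching described in step 2 can be sketched as a small disk cache. The notes name DiskCache for this; to keep the example self-contained, this sketch swaps in a hypothetical stdlib-only `ChunkCache` class that stores each entry as a file in a temporary directory and evicts the least recently used entries once a byte budget is exceeded. The key format (`"task1/chunk0"`) and the `build` callback, which stands for whatever conversion or compression produces a chunk, are assumptions for illustration.

```python
import hashlib
import os
import tempfile
from collections import OrderedDict
from typing import Callable, Optional


class ChunkCache:
    """Hypothetical bounded on-disk cache (a stand-in for DiskCache).

    Entries are files in a temporary directory; once the total size exceeds
    size_limit bytes, least recently used entries are evicted.
    Single-process only: the LRU bookkeeping lives in memory.
    """

    def __init__(self, size_limit: int, directory: Optional[str] = None):
        self.size_limit = size_limit
        self.directory = directory or tempfile.mkdtemp(prefix="chunk-cache-")
        os.makedirs(self.directory, exist_ok=True)
        self._entries: "OrderedDict[str, int]" = OrderedDict()  # key -> size, oldest first
        self._total = 0

    def _path(self, key: str) -> str:
        # Hash the key so arbitrary keys map to safe file names.
        digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
        return os.path.join(self.directory, digest + ".bin")

    def get_or_create(self, key: str, build: Callable[[], bytes]) -> bytes:
        if key in self._entries:
            self._entries.move_to_end(key)  # cache hit: mark as most recently used
            with open(self._path(key), "rb") as f:
                return f.read()
        data = build()  # cache miss: convert or compress the original data now
        with open(self._path(key), "wb") as f:
            f.write(data)
        self._entries[key] = len(data)
        self._total += len(data)
        self._evict()
        return data

    def _evict(self) -> None:
        # Remove least recently used entries until the budget is met, but
        # always keep the newest entry even if it alone exceeds the limit.
        while self._total > self.size_limit and len(self._entries) > 1:
            old_key, size = self._entries.popitem(last=False)
            os.remove(self._path(old_key))
            self._total -= size
```

A second request for the same chunk is served from disk without calling `build` again, and total storage stays near `size_limit`. DiskCache's own `Cache` object accepts a `size_limit` setting and is safe to share between threads and processes, which this single-process sketch is not.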
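Step 3 describes cvat-data turning each prepared chunk back into images. cvat-data is a separate CVAT component, so the Python sketch below only illustrates the dispatch logic under assumptions: the function `decode_chunk`, the `(name, bytes)` frame representation, and the `fetch(link, credentials)` callback are hypothetical. `.mp4` decoding is deliberately left unimplemented because it needs a video library.

```python
import io
import os
import zipfile
from typing import Callable, List, Optional, Tuple

Frame = Tuple[str, bytes]  # (frame name or link, encoded image bytes)


def decode_chunk(
    chunk_name: str,
    payload: bytes,
    fetch: Optional[Callable[[str, dict], bytes]] = None,
    credentials: Optional[dict] = None,
) -> List[Frame]:
    """Hypothetical: turn one prepared chunk into frames, based on its extension."""
    ext = os.path.splitext(chunk_name)[1].lower()
    if ext == ".zip":
        # chunkN.zip: unzip the images, keeping archive order.
        with zipfile.ZipFile(io.BytesIO(payload)) as zf:
            return [(name, zf.read(name)) for name in zf.namelist()]
    if ext == ".txt":
        # chunkN.txt: one link per line; images are fetched only when the
        # client provided credentials (and a function that uses them).
        if credentials is None or fetch is None:
            raise PermissionError(f"{chunk_name}: credentials are required")
        links = [ln.strip() for ln in payload.decode("utf-8").splitlines() if ln.strip()]
        return [(link, fetch(link, credentials)) for link in links]
    if ext == ".mp4":
        # chunkN.mp4: decoding video frames needs a video decoder,
        # which is outside this stdlib-only sketch.
        raise NotImplementedError(f"{chunk_name}: video chunks need a video decoder")
    raise ValueError(f"unsupported chunk type: {chunk_name}")
```

Injecting `fetch` keeps storage access (S3 or any other backend) out of the decoder, so the same dispatch works whichever client supplies the credentials.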
nmanovic assigned nmanovic and unassigned azhavoro on Dec 24, 2019
nmanovic modified the milestones: 1.0.0 - Beta, 1.1.0 - Alpha on Dec 24, 2019
Projects: Data streaming (To do), Server (To do) · 5 participants