
recipe for using the Lightly Docker as API worker #630

Merged: 11 commits into master on Jan 25, 2022

Conversation

@MalteEbner (Contributor) commented Dec 14, 2021

Description

  • Links to the docs for setting up the S3 bucket

  • Links to the docs for installing the docker

  • Links to the docs for the first steps with the docker

  • Changes the S3 bucket setup tutorial to distinguish between Images and Videos

Generated report

Using the docker with an S3 bucket as remote datasource. — lightly 1.2.3 documentation.pdf

@codecov bot commented Dec 14, 2021

Codecov Report

Merging #630 (9a51601) into master (deef6be) will not change coverage.
The diff coverage is n/a.


@@           Coverage Diff           @@
##           master     #630   +/-   ##
=======================================
  Coverage   87.70%   87.70%           
=======================================
  Files          89       89           
  Lines        3433     3433           
=======================================
  Hits         3011     3011           
  Misses        422      422           

Continue to review the full report at Codecov.
Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update deef6be...9a51601.

@MalteEbner force-pushed the malte-lig-338-tutorial-for-user-story-m01 branch from 1b3887d to 23776f9 on January 17, 2022 08:56
@MalteEbner marked this pull request as ready for review on January 17, 2022 09:13
@japrescott (Contributor) left a comment:

Some comments. I am not sure the title of the tutorial is correct, as this tutorial should be more about directly using a datasource instead of needing the data locally.

Inline review comments (since resolved) on:
  • docs/source/docker/getting_started/first_steps.rst
  • docs/source/docker/integration/docker_api_worker.rst (3 comments)
@philippmwirth (Contributor) commented:
Just had a quick look and the "setting up" is too minimal and is missing a few points:

  • there should be a case distinction between video datasets and image datasets
    -> if you want to process images, choose Image when creating the dataset
    -> if you want to process videos, choose Video when creating the dataset

@MalteEbner (Contributor, Author) commented Jan 17, 2022:

> Just had a quick look and the "setting up" is too minimal and is missing a few points:
>
>   • there should be a case distinction between video datasets and image datasets
>     -> if you want to process images, choose Image when creating the dataset
>     -> if you want to process videos, choose Video when creating the dataset

Setting up the S3 datasource is covered in the dataset creation with AWS recipe: https://docs.lightly.ai/getting_started/dataset_creation/dataset_creation_aws_bucket.html
It surely belongs there and not in the docker tutorial.

@philippmwirth (Contributor) commented:
> > Just had a quick look and the "setting up" is too minimal and is missing a few points:
> >
> >   • there should be a case distinction between video datasets and image datasets
> >     -> if you want to process images, choose Image when creating the dataset
> >     -> if you want to process videos, choose Video when creating the dataset
>
> Setting up the S3 datasource is covered in the dataset creation with AWS recipe: https://docs.lightly.ai/getting_started/dataset_creation/dataset_creation_aws_bucket.html It surely belongs there and not in the docker tutorial.

All I'm saying is that if I follow the tutorial now it will not work as expected.

@MalteEbner (Contributor, Author) commented:
> All I'm saying is that if I follow the tutorial now it will not work as expected.

I adapted the tutorial of setting up the S3 bucket accordingly.

@IgorSusmelj (Contributor) left a comment:

I left a few comments. It worked well and is CRAZY SUPER COOL!!! :D

No, it's really awesome to see that this is working. Sooooo convenient.

Could you also add a line here on top? https://docs.lightly.ai/docker/overview.html
Maybe mark the active learning part not as new anymore.

This recipe requires that you already have a dataset in the Lightly Platform
configured to use the data in your AWS S3 bucket.

Follow the steps on how to `create a Lightly dataset connected to your S3 bucket <https://docs.lightly.ai/getting_started/dataset_creation/dataset_creation_aws_bucket.html>`_.
A Contributor commented:

I would assume at this point that there is an S3 bucket. The user should go to the other link to set up S3 properly, or have a look there if there are any questions.

However, for the sake of this tutorial, I would suggest adding all the elements starting from the dataset creation in the UI here as well, so we guide the user from creating a dataset in Lightly and connecting it with the S3 bucket to running the docker.

A Contributor commented:

Please also tell the user here that they need to pick images or videos depending on the dataset type they want to use.

@MalteEbner (Author) replied:

I would not add all the dataset creation elements here, for two reasons:

  • It would mean duplicating lots of content.
  • It would not work well once we also support GC storage and Azure storage.

Use your subsampled dataset
---------------------------

Once the docker run has finished, you can use your subsampled dataset as you like:
A Contributor commented:

Would add a screenshot here of the UI home screen for the dataset (e.g. how it should look).

subsample it further, or export it for labeling.
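
For context, exporting for labeling is typically done with the lightly pip package's CLI; the following is a minimal sketch with placeholder token, dataset id, and tag name, not a command taken from this PR:

.. code-block:: console

    # Placeholder values: substitute your own token, dataset id, and tag.
    lightly-download token=MY_TOKEN dataset_id=MY_DATASET_ID tag_name='sampled-tag'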

.. _ref-docker-with-datasource-datapool:

Process new data in your S3 bucket using a datapool
---------------------------------------------------
A Contributor commented:

This is sort of the killer feature, but it appears at the end and gets lost because it's just lots of text. I would try to use bullet points or images/screenshots to help. E.g.:

Running the docker for the first time:
[screenshot]

Running the docker again with a new video (we see a new cluster because the new video is very different):
[screenshot]

Only visualizing the newly added data:
[screenshot]

@MalteEbner (Author) replied:

I added much more content and screenshots.

@IgorSusmelj (Contributor) commented:

Ah, and one other thing:

  • we should mention that this only works with S3, and when to expect other sources (Azure, GCP) to work as well

@MalteEbner changed the title from "tutorial for using the Lightly Docker as API worker" to "recipe for using the Lightly Docker as API worker" on Jan 19, 2022
@MalteEbner (Contributor, Author) commented:

> we should mention that this only works with S3, and when to expect other sources (Azure, GCP) to work as well

I added which features don't work yet but will be implemented soon.

@MalteEbner force-pushed the malte-lig-338-tutorial-for-user-story-m01 branch from 44bf89d to bf6c66a on January 19, 2022 18:40

.. code-block:: console

    docker run --gpus all --rm -it \
A Contributor commented:

I would try to give a more complete docker run command. Maybe add the stopping condition?
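
For illustration, a fuller command might look like the sketch below; the image name, volume mount, and parameter values are assumptions modelled on typical Lightly Docker usage at the time, not the exact command from this PR:

.. code-block:: console

    # All values are placeholders; adjust to your environment.
    docker run --gpus all --rm -it \
        -v {OUTPUT_DIR}:/home/output_dir \
        lightly/sampling:latest \
        token=MY_TOKEN \
        dataset_id=MY_DATASET_ID \
        stopping_condition.n_samples=0.3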

@IgorSusmelj (Contributor) left a comment:

Thanks for the changes. It already looks much better. I would still add more detailed instructions on the S3 setup. Just mention the steps:

  1. create the dataset (choose videos or images depending on the data)
  2. edit the dataset and click on S3
  3. fill out the form (here I would add a screenshot; we can even recycle the one we already have :))

The reason I would add this is to give the user a good end-to-end workflow. Otherwise, they have to jump around in our docs and piece the various parts together.

@MalteEbner (Contributor, Author) commented Jan 20, 2022:

> Thanks for the changes. It already looks much better. I would still add more detailed instructions on the S3 setup. Just mention the steps:
>
>   1. create the dataset (choose videos or images depending on the data)
>   2. edit the dataset and click on S3
>   3. fill out the form (here I would add a screenshot; we can even recycle the one we already have :))
>
> The reason I would add this is to give the user a good end-to-end workflow. Otherwise, they have to jump around in our docs and piece the various parts together.

I have some questions regarding your proposal:

  • The 3 steps you proposed are basically the 6 steps in the section "Create and configure a dataset" from the docs: https://docs.lightly.ai/getting_started/dataset_creation/dataset_creation_aws_bucket.html#uploading-your-data
    Do I understand correctly that you would leave out the complete first section on setting up Amazon S3? Would you link to it? Or how would you tell the user what the access key and secret access key are?
  • How would you extend the documentation when we add GC bucket and Azure storage support? Create new tutorials for them, or extend this one?
  • Would you also try to get rid of the other references to the docs (how to download the docker, the first steps with the docker)?

However, I get that the S3 tutorial is not ideal for this, so I created a new issue: https://linear.app/lightly/issue/LIG-548/datasource-setup-recipes-use-order-fitting-milestone-0

@MalteEbner force-pushed the malte-lig-338-tutorial-for-user-story-m01 branch from 6260a46 to 9a51601 on January 25, 2022 13:02
@MalteEbner merged commit 61863ac into master on Jan 25, 2022
@MalteEbner deleted the malte-lig-338-tutorial-for-user-story-m01 branch on January 25, 2022 13:22