Integrate data labeling tools into kubeflow #1316

Closed
codeflitting opened this issue Aug 6, 2018 · 5 comments

Comments

@codeflitting
Member

The introduction

For more context, see "End Users’ Hate List and Wish List for Kubeflow in 6 Months".

The tf-operator in Kubeflow handles training jobs smoothly. In practice, however, we found that many of our ML engineers and customers at Caicloud are reluctant to use it: with smaller datasets, training often doesn't rely heavily on distributed scheduling, and local training (a single server with multiple GPU cards) suffices. So at this stage, to improve Kubeflow adoption, we also need to provide good tools that simplify coding and development for ML developers.

Data labeling stands out in our survey of users (ML engineers) because it is a very important step before training, and a necessary one for the prevailing supervised learning approaches. Hence, we propose incorporating a data labeling component into Kubeflow, possibly by integrating an existing labeling project.

Organized by the most popular data types to label, our initial thoughts are as follows:

Image and video labeling

Some of the most commonly used tools for completing machine vision tasks faster and more simply:

Text labeling

Audio labeling

Sometimes we need effective, easy-to-use labeling tools to train high-performance neural networks for sound recognition and music classification tasks. Here are some of them.

Conclusion

Labeling is an indispensable stage of data preprocessing in supervised learning. The above list is not comprehensive. We could hold a vote to choose which project to integrate into Kubeflow.

Finally, I hope that with the integration of data labeling, Kubeflow will become a platform that less experienced R&D engineers can leverage.

@xinzhangcmu

@aronchick @jlewi per discussion in the email thread, how should we prioritize it? (applying release 0.3 label with an appropriate priority?)

@jlewi
Contributor

jlewi commented Aug 6, 2018

I think this would be an amazing contribution.

Do we have someone that's willing to pick this up?

@xinzhangcmu

@jlewi great; we can take a stab, starting with a design doc

@ddutta
Member

ddutta commented Aug 9, 2018 via email

@stale

stale bot commented May 16, 2019

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot closed this as completed May 23, 2019
yanniszark pushed a commit to arrikto/kubeflow that referenced this issue Feb 15, 2021
* Update CI cluster version to 1.16

* Add retry strategy

* Remove backoff