Integrate data labeling tools into kubeflow #1316
Comments
@aronchick @jlewi per discussion in the email thread, how should we prioritize it? (applying release 0.3 label with an appropriate priority?)
I think this would be an amazing contribution. Do we have someone that's willing to pick this up?
@jlewi great; we can take a stab, starting with a design doc
This is awesome!
-Debo~
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Introduction
For more context, see "End Users' Hate List and Wish List for Kubeflow in 6 Months".
The tf-operator in Kubeflow handles training jobs smoothly. In practice, however, we have found that many of our ML engineers and customers (at Caicloud) are reluctant to use it: with smaller datasets, training often does not depend heavily on distributed scheduling, and local training (a single server with multiple GPU cards) is sufficient. At this stage, therefore, we also need to provide good tools that simplify coding and development for ML developers, to improve Kubeflow adoption.
Data labeling stood out in our survey of users (ML engineers) because it is an important step before training, and a necessary one for the prevailing supervised learning approach. We therefore propose adding a data labeling component to Kubeflow, possibly by integrating an existing labeling project.
Initial thoughts, organized by popular data type, are as follows:
Image and video labeling
Some of the most commonly used tools aimed at faster, simpler completion of machine vision tasks:
Text labeling
Audio labeling
Sometimes we need effective, easy-to-use labeling tools to train high-performance neural networks for sound recognition and music classification tasks. Here are some of them:
Conclusion
Labeling is an indispensable stage of data preprocessing in supervised learning. The list above is not comprehensive; we could hold a vote to choose which project to integrate into Kubeflow.
Ultimately, I hope that with data labeling integrated, Kubeflow will become a platform that even less experienced R&D engineers can use.